generate_XY.Rd
Creates a toy data set \(S = (X, Y)\) in which each column \(X_i\) of \(X\), for \(i = 1, \dots, p\), is sampled independently from a Gaussian distribution with mean \(\mu_i\) and standard deviation \(\sigma_i\), i.e. \(X_i \sim N(\mu_i, \sigma_i^2)\). The response is the linear combination \(Y = X\beta\), so each observation satisfies \(y_j = x_j^T \beta\) with no added noise term. The resulting data set has dimension \(n \times (p + 1)\), where the number of data points \(n\) is specified by the user.
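The generating process above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: the name `generate_XY_sketch` is hypothetical, the columns are assumed to be named X1 to Xp and Y (as in the printed examples), and it returns a plain `data.frame` rather than a tibble.

```r
# Hypothetical sketch of the generating process; not the package's source code.
generate_XY_sketch <- function(n = 100, mu = rep(0, 10), sigma = rep(1, 10),
                               beta_coefficients = 1:10) {
  p <- length(mu)
  stopifnot(length(sigma) == p, length(beta_coefficients) == p, all(sigma >= 0))
  # Column i holds n independent draws from N(mu[i], sigma[i]^2);
  # rep(..., each = n) lines the parameters up with R's column-major fill order.
  X <- matrix(rnorm(n * p, mean = rep(mu, each = n), sd = rep(sigma, each = n)),
              nrow = n, ncol = p)
  colnames(X) <- paste0("X", seq_len(p))
  # The response is an exact linear combination of the columns (no noise term).
  Y <- drop(X %*% beta_coefficients)
  data.frame(X, Y = Y)
}

set.seed(1)
S <- generate_XY_sketch(n = 60, mu = 1:4, sigma = rep(1, 4), beta_coefficients = 1:4)
dim(S)  # 60 rows, 5 columns
```

Drawing all \(n \times p\) values in one vectorised `rnorm()` call avoids a loop over columns; recycling `mu` and `sigma` with `each = n` is what keeps each column tied to its own parameters.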
generate_XY( n = 100, mu = rep(0, 10), sigma = rep(1, 10), beta_coefficients = 1:10 )
Argument | Description |
---|---|
n | desired number of data points (rows) in the data set. |
mu | a \(p\)-dimensional vector of means \(\mu_i\). |
sigma | a \(p\)-dimensional vector of non-negative standard deviations \(\sigma_i\). |
beta_coefficients | a \(p\)-dimensional vector of coefficients \(\beta_i\). |
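The table implies that `mu`, `sigma` and `beta_coefficients` must all share the same length \(p\), and that `sigma` must be non-negative. The hypothetical helper below spells out those constraints; it is an illustrative assumption, and whether `generate_XY()` itself performs these checks is not stated here.

```r
# Hypothetical helper (not part of the package): checks that the arguments are
# mutually consistent as described in the argument table, and returns p.
check_xy_args <- function(n, mu, sigma, beta_coefficients) {
  p <- length(mu)
  if (length(n) != 1 || n < 1 || n != round(n)) {
    stop("`n` must be a single positive integer")
  }
  if (length(sigma) != p || length(beta_coefficients) != p) {
    stop("`mu`, `sigma` and `beta_coefficients` must all have length p")
  }
  if (any(sigma < 0)) {
    stop("`sigma` must be non-negative")
  }
  p
}

check_xy_args(100, rep(0, 10), rep(1, 10), 1:10)  # p = 10 (the defaults)
check_xy_args(60, 1:4, rep(1, 4), 1:4)            # p = 4
```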
An \(n \times (p + 1)\) data frame (printed as a tibble) given by \(S = (X, Y)\), with columns X1 to Xp followed by Y. With the default arguments, \(n = 100\), \(p = 10\), each column \(X_i\) is sampled from \(N(0, 1)\), and the coefficients are \(\beta = (1, 2, \dots, 10)\).
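Because the columns \(X_i\) are independent and \(Y = X\beta\), the response has mean \(E[Y] = \sum_i \mu_i \beta_i\) and variance \(\mathrm{Var}(Y) = \sum_i \beta_i^2 \sigma_i^2\). The short helper below (its name `y_moments` is hypothetical) evaluates these two formulas for a given set of arguments.

```r
# Theoretical moments of Y under the model Y = X beta with independent
# columns X_i ~ N(mu_i, sigma_i^2):
#   E[Y]   = sum(mu_i * beta_i)
#   Var(Y) = sum(beta_i^2 * sigma_i^2)
y_moments <- function(mu, sigma, beta_coefficients) {
  c(mean = sum(mu * beta_coefficients),
    variance = sum(beta_coefficients^2 * sigma^2))
}

y_moments(rep(0, 10), rep(1, 10), 1:10)  # defaults: mean 0, variance 385
y_moments(1:4, rep(1, 4), 1:4)           # second example: mean 30, variance 30
```

The second result is consistent with the Y column printed in the second example, whose first ten values lie between 22.6 and 45.1, around a mean of 30 with a standard deviation of about 5.5.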
generate_XY()
#> # A tibble: 100 × 11
#>         X1      X2      X3      X4      X5      X6      X7      X8      X9
#>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1  -0.176   0.605    1.12   -1.39    1.03  0.0491   0.851   -2.90  -0.731
#>  2   0.756   0.801   0.400   0.437   -1.58 -0.0328   0.430   0.337  -0.396
#>  3  -0.980   0.318  -0.985   0.316  0.0850  -0.511   -1.06   0.765   -1.42
#>  4  -0.386   0.103  -0.503   0.195    1.21   0.356 -0.0459  -0.669  -0.109
#>  5   0.382  -0.658   0.987  -0.456 -0.0235   0.418  -0.132  -0.161  0.0743
#>  6    1.35  0.0339    2.19   0.813  0.0847   0.579  -0.319    1.51   0.450
#>  7   0.984  -0.650  -0.165   0.275  -0.325   -1.48   0.474  -0.122   0.387
#>  8  0.0866   0.911  -0.686 0.00601   0.506    1.32   0.813  -0.243  -0.258
#>  9   0.470 -0.0473   0.941    2.01   0.415    1.03   0.257   0.708    1.52
#> 10  -0.893   -1.18  -0.164   0.314   -1.18   0.317   -2.15  -0.480   0.551
#> # … with 90 more rows, and 2 more variables: X10 <dbl>, Y <dbl>

generate_XY(n = 60, mu = 1:4, sigma = rep(1, 4), beta_coefficients = 1:4)
#> # A tibble: 60 × 5
#>         X1      X2      X3      X4       Y
#>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1    1.28    3.46    2.84    3.66    31.4
#>  2    1.78    1.89    1.91    3.54    25.5
#>  3   0.779    3.33    3.55    5.02    38.2
#>  4   0.862    3.14    4.39    5.46    42.2
#>  5    1.12    1.07    1.41    3.78    22.6
#>  6    2.53    1.81    5.07    4.29    38.5
#>  7   0.344    2.92    1.30    4.26    27.1
#>  8    1.86    1.72    3.86    4.01    32.9
#>  9  -0.188    2.64    2.62    3.82    28.2
#> 10    1.54    3.20    4.85    5.65    45.1
#> # … with 50 more rows
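As a quick sanity check, the first ten printed rows of the second example can be plugged back into \(Y = X\beta\) with \(\beta = (1, 2, 3, 4)\). The printed values are rounded for display, so the recomputed response only agrees with the printed Y up to that rounding; the check assumes nothing beyond the values shown.

```r
# First ten printed rows of the second example (values as displayed, i.e. rounded).
X_printed <- matrix(c(
   1.28,  3.46, 2.84, 3.66,
   1.78,  1.89, 1.91, 3.54,
   0.779, 3.33, 3.55, 5.02,
   0.862, 3.14, 4.39, 5.46,
   1.12,  1.07, 1.41, 3.78,
   2.53,  1.81, 5.07, 4.29,
   0.344, 2.92, 1.30, 4.26,
   1.86,  1.72, 3.86, 4.01,
  -0.188, 2.64, 2.62, 3.82,
   1.54,  3.20, 4.85, 5.65
), ncol = 4, byrow = TRUE)
Y_printed <- c(31.4, 25.5, 38.2, 42.2, 22.6, 38.5, 27.1, 32.9, 28.2, 45.1)

# Recompute Y = X beta with beta = 1:4; any difference comes only from display rounding.
Y_recomputed <- drop(X_printed %*% 1:4)
max(abs(Y_recomputed - Y_printed))  # roughly 0.05
```

The agreement to within display rounding reflects the fact that the response is an exact linear function of the predictors, with no noise term.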