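Creates a toy data set \(S = (X,Y)\) in which each column \(X_i\) of the \(n \times p\) matrix \(X\) is sampled independently from a Gaussian distribution \(N(\mu_i, \sigma_i^2)\) with mean \(\mu_i\) and standard deviation \(\sigma_i\). The response is the noise-free linear combination \(Y = X\beta\), i.e. \(y_j = x_j^T \beta\) for each row \(x_j\) of \(X\). The resulting data set has dimension \(n \times (p + 1)\), where the number of data points \(n\) is specified by the user and \(p\) is the common length of mu, sigma and beta_coefficients.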
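A minimal base-R sketch of this generating process, assuming only what the description states: independent Gaussian columns and a response with no added noise. The name generate_xy_sketch is hypothetical; it returns a base data.frame rather than the tibble that generate_XY prints, and it is not the package's actual implementation.

```r
# Hypothetical illustration only; not the source of generate_XY().
generate_xy_sketch <- function(n = 100, mu = rep(0, 10), sigma = rep(1, 10),
                               beta_coefficients = 1:10) {
  p <- length(mu)
  stopifnot(length(sigma) == p, length(beta_coefficients) == p, all(sigma >= 0))
  # rnorm() recycles mean and sd element-wise; rep(..., each = n) gives every
  # entry of column i the parameters mu[i] and sigma[i], because matrix()
  # fills column by column.
  X <- matrix(rnorm(n * p, mean = rep(mu, each = n), sd = rep(sigma, each = n)),
              nrow = n, ncol = p)
  colnames(X) <- paste0("X", seq_len(p))
  # Noise-free linear response: Y = X %*% beta
  Y <- drop(X %*% beta_coefficients)
  data.frame(X, Y = Y)
}

set.seed(1)
S <- generate_xy_sketch(n = 60, mu = 1:4, sigma = rep(1, 4), beta_coefficients = 1:4)
dim(S)  # 60 rows, 5 columns: X1, X2, X3, X4, Y
```

Generating all \(n \times p\) draws in one vectorised rnorm() call avoids a per-column loop while still giving each column its own mean and standard deviation.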

generate_XY(
  n = 100,
  mu = rep(0, 10),
  sigma = rep(1, 10),
  beta_coefficients = 1:10
)

Arguments

n

desired number of data points in the data set.

mu

a \(p\)-dimensional vector of means \(\mu = (\mu_1, \ldots, \mu_p)\), one per column of \(X\).

sigma

a \(p\)-dimensional vector of non-negative standard deviations \(\sigma = (\sigma_1, \ldots, \sigma_p)\), one per column of \(X\).

beta_coefficients

a \(p\)-dimensional vector of coefficients \(\beta = (\beta_1, \ldots, \beta_p)\) used to form \(Y = X\beta\).

Value

An \(n \times (p+1)\) tibble (data frame) \(S = (X,Y)\) with columns X1 to Xp followed by Y. With the default arguments, \(n = 100\), \(p = 10\), each column \(X_i\) is sampled from \(N(0,1)\), and the \(\beta\) coefficients are 1 to 10.
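Because the columns are independent Gaussians and \(Y\) is a fixed linear combination of them, each entry of \(Y\) is itself Gaussian. The worked derivation below follows from linearity of expectation and independence; it is not stated in the original documentation:
\[
y_j = \sum_{i=1}^{p} \beta_i X_{ji} \sim N\!\left(\sum_{i=1}^{p} \beta_i \mu_i,\; \sum_{i=1}^{p} \beta_i^2 \sigma_i^2\right).
\]
With the defaults (\(\mu_i = 0\), \(\sigma_i = 1\), \(\beta_i = i\), \(p = 10\)) this gives \(y_j \sim N\!\left(0, \sum_{i=1}^{10} i^2\right) = N(0, 385)\), a standard deviation of about 19.6. For the second example below (\(\mu = \beta = (1,2,3,4)\), \(\sigma_i = 1\)), the mean and the variance are both \(1 + 4 + 9 + 16 = 30\).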

Examples

generate_XY()
#> # A tibble: 100 × 11
#>         X1      X2     X3       X4      X5      X6      X7     X8      X9
#>      <dbl>   <dbl>  <dbl>    <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
#>  1 -0.176   0.605   1.12  -1.39     1.03    0.0491  0.851  -2.90  -0.731 
#>  2  0.756   0.801   0.400  0.437   -1.58   -0.0328  0.430   0.337 -0.396 
#>  3 -0.980   0.318  -0.985  0.316    0.0850 -0.511  -1.06    0.765 -1.42  
#>  4 -0.386   0.103  -0.503  0.195    1.21    0.356  -0.0459 -0.669 -0.109 
#>  5  0.382  -0.658   0.987 -0.456   -0.0235  0.418  -0.132  -0.161  0.0743
#>  6  1.35    0.0339  2.19   0.813    0.0847  0.579  -0.319   1.51   0.450 
#>  7  0.984  -0.650  -0.165  0.275   -0.325  -1.48    0.474  -0.122  0.387 
#>  8  0.0866  0.911  -0.686  0.00601  0.506   1.32    0.813  -0.243 -0.258 
#>  9  0.470  -0.0473  0.941  2.01     0.415   1.03    0.257   0.708  1.52  
#> 10 -0.893  -1.18   -0.164  0.314   -1.18    0.317  -2.15   -0.480  0.551 
#> # … with 90 more rows, and 2 more variables: X10 <dbl>, Y <dbl>

generate_XY(n = 60, mu = 1:4, sigma = rep(1, 4), beta_coefficients = 1:4)
#> # A tibble: 60 × 5
#>        X1    X2    X3    X4     Y
#>     <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  1.28   3.46  2.84  3.66  31.4
#>  2  1.78   1.89  1.91  3.54  25.5
#>  3  0.779  3.33  3.55  5.02  38.2
#>  4  0.862  3.14  4.39  5.46  42.2
#>  5  1.12   1.07  1.41  3.78  22.6
#>  6  2.53   1.81  5.07  4.29  38.5
#>  7  0.344  2.92  1.30  4.26  27.1
#>  8  1.86   1.72  3.86  4.01  32.9
#>  9 -0.188  2.64  2.62  3.82  28.2
#> 10  1.54   3.20  4.85  5.65  45.1
#> # … with 50 more rows
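Since no noise is added, the response can be checked by hand. Recomputing the first row of the second example from the printed (rounded) values with \(\beta = (1,2,3,4)\):
\[
y_1 = 1(1.28) + 2(3.46) + 3(2.84) + 4(3.66) = 1.28 + 6.92 + 8.52 + 14.64 = 31.36 \approx 31.4,
\]
which agrees with the printed Y up to rounding.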