generate_X_cat.Rd
Creates a toy data set \(S = (X, X_{cat})\). Each column \(i\) of \(X\) is sampled independently from a Gaussian distribution with mean \(\mu_i\) and standard deviation \(\sigma_i\), i.e. \(N(\mu_i, \sigma_i^2)\). Each column of \(X_{cat}\) is categorical, sampled with replacement from a given number of categories (indexed by integers). The resulting data set has dimension \(n \times (p_1 + p_2)\), where \(n\) is the user-specified number of data points, \(p_1\) is the number of columns in \(X\) and \(p_2\) is the number of columns in \(X_{cat}\).
generate_X_cat( n = 100, mu = rep(0, 10), sigma = rep(1, 10), no_of_cat = c(4, 5) )
Argument | Description |
---|---|
n | The desired number of data points (rows) in the data set. |
mu | A \(p_1\)-dimensional vector of means \(\mu_i\), one per column of \(X\). |
sigma | A \(p_1\)-dimensional vector of non-negative standard deviations \(\sigma_i\), one per column of \(X\). |
no_of_cat | A \(p_2\)-dimensional vector whose entries give the number of categories desired for each column of \(X_{cat}\). |
An \(n \times (p_1 + p_2)\) dimensional data frame (a tibble) given by \(S = (X, X_{cat})\). In the default case, \(n = 100\), \(p_1 = 10\) and \(p_2 = 2\): the ten columns of \(X\) are sampled from \(N(0,1)\), and two categorical columns of \(X_{cat}\), with 4 and 5 categories respectively, are appended. The columns of \(X_{cat}\) are factors.
generate_X_cat()
#> # A tibble: 100 × 12
#>        X1     X2       X3      X4     X5      X6       X7     X8      X9
#>     <dbl>  <dbl>    <dbl>   <dbl>  <dbl>   <dbl>    <dbl>  <dbl>   <dbl>
#>  1  0.109 -0.441    0.552   0.136 -0.591  -0.141    0.238   1.53    1.68
#>  2  0.231  -2.18   -0.162   -1.04  0.857   -1.83    -1.30  0.286    1.01
#>  3  -1.27 -0.678     1.35    1.37  0.509   0.404     1.74  0.966   0.143
#>  4  0.621  0.729 -0.00533   0.212  0.397 -0.0138     1.46 0.0417  0.0254
#>  5 -0.642  -1.24    -2.41  -0.420  0.387  -0.102    -1.34 -0.187  -0.678
#>  6  0.662 -0.588    0.990  -0.515  0.370   0.202 -0.00785  -1.35  -0.372
#>  7  0.246   1.11    -1.23  -0.481   1.04   0.803   -0.709 -0.953    1.21
#>  8 -0.848 -0.502   0.0274 -0.0229   2.36  -0.588     1.42  -1.28  -0.251
#>  9 -0.435  0.347   -0.391  -0.236 -0.914    2.11     2.10   1.35  -0.541
#> 10  0.883   1.75     2.40 -0.0429   1.28   0.106   0.0564 0.0875 -0.0933
#> # … with 90 more rows, and 3 more variables: X10 <dbl>, X11 <fct>, X12 <fct>

generate_X_cat(n = 40, mu = 1:6, sigma = rep(1, 6), no_of_cat = c(2,3,5))
#> # A tibble: 40 × 9
#>        X1    X2    X3    X4    X5    X6 X7    X8    X9
#>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct>
#>  1  0.332  1.74  4.87  4.45  4.99  6.55 1     2     3
#>  2 -0.995  3.76  2.00  4.51  4.07  4.89 1     3     2
#>  3  0.643  1.74  3.28  4.05  5.83  5.28 1     1     1
#>  4   1.59  2.53  1.97  5.39  3.40  6.96 2     1     2
#>  5   1.05 0.836  3.00  2.71  4.86  6.96 2     1     3
#>  6   2.54  3.45  3.02  4.65  4.96  5.40 1     3     1
#>  7 -0.201  3.35  3.08  3.38  4.73  7.08 1     1     1
#>  8   1.30  1.72  2.55  5.29  4.23  5.11 2     2     5
#>  9  0.415  1.79  2.31  3.93  5.08  5.82 2     2     2
#> 10   1.20  2.13  3.24  4.35  5.50  6.96 1     1     1
#> # … with 30 more rows
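
For readers who want to see the sampling scheme spelled out, the following is a minimal base-R sketch of what the description implies. It is not the package's implementation: the function name `sketch_X_cat` and its internals are hypothetical, and unlike `generate_X_cat()` it returns a plain `data.frame` rather than a tibble. It assumes that `mu` and `sigma` have the same length \(p_1\) and that categories are labelled \(1, \dots, k\).

```r
# Hypothetical stand-in for illustration only; not the package's own code.
sketch_X_cat <- function(n = 100, mu = rep(0, 10), sigma = rep(1, 10),
                         no_of_cat = c(4, 5)) {
  stopifnot(length(mu) == length(sigma), all(sigma >= 0), all(no_of_cat >= 1))

  # Continuous block X: column i is drawn independently from N(mu[i], sigma[i]^2).
  X <- vapply(seq_along(mu),
              function(i) rnorm(n, mean = mu[i], sd = sigma[i]),
              numeric(n))
  X <- as.data.frame(matrix(X, nrow = n))

  # Categorical block X_cat: column j is sampled with replacement
  # from the integers 1..no_of_cat[j] and stored as a factor.
  X_cat <- lapply(no_of_cat, function(k) {
    factor(sample.int(k, size = n, replace = TRUE), levels = seq_len(k))
  })
  names(X_cat) <- paste0("C", seq_along(no_of_cat))
  X_cat <- as.data.frame(X_cat)

  # S = (X, X_cat), with columns named X1, ..., X(p1 + p2).
  S <- cbind(X, X_cat)
  names(S) <- paste0("X", seq_len(ncol(S)))
  S
}

set.seed(1)  # fix the random seed so the draw is reproducible
S <- sketch_X_cat(n = 40, mu = 1:6, sigma = rep(1, 6), no_of_cat = c(2, 3, 5))
dim(S)            # 40 rows and 6 + 3 = 9 columns
sapply(S, class)  # six numeric columns followed by three factor columns
```

Declaring the factor levels as `seq_len(k)` keeps all \(k\) categories in the factor even if a small sample happens to miss one of them; whether the package does the same is an assumption here.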