causal_XTY_multiple.Rd
Creates a causal data set \(S = (X, Y_i, T, Y_{obs})\) for causal inference with multiple treatments. The \(p\) columns of \(X\) are sampled independently from Gaussian distributions: column \(i\) has mean \(\mu_i\) and standard deviation \(\sigma_i\), i.e. it is drawn from \(N(\mu_i, \sigma_i^2)\). A treatment \(T\) is then sampled; more than two treatments are possible. Each \(Y_i\) is the potential outcome if treatment \(i\) is applied, and \(Y_{obs}\) is the outcome under the treatment that was actually sampled. The base outcome \(Y = X^T \beta\) is assumed to depend linearly on \(X\), and the effect of treatment \(T = i\) is additive. See Causality (Pearl 2009) for further details and a general introduction to causal inference.
causal_XTY_multiple(
  n = 100,
  mu = rep(0, 3),
  sigma = rep(1, 3),
  beta_coefficients = 1:3,
  treatment_prob = rep(0.25, 4),
  treatment_effect = c(10, 20, 30, 40)
)
Argument | Description |
---|---|
n | desired number of data points (rows) in the data set. |
mu | a \(p\)-dimensional vector of means \(\mu\), one per column of \(X\). |
sigma | a \(p\)-dimensional vector of non-negative standard deviations \(\sigma\), one per column of \(X\). |
beta_coefficients | a \(p\)-dimensional vector of coefficients \(\beta\). |
treatment_prob | a probability vector with weights summing to 1, giving the probability of each treatment. |
treatment_effect | a vector giving the additive effect of each treatment on the outcome \(Y\), one entry per treatment. |
A causal data set \(S = (X, Y_i, T, Y_{obs})\) with multiple potential outcomes, returned as a tibble. In the default case, \(n = 100\) and \(p = 3\), where \(p\) is the number of columns in \(X\). Each column \(X_i\) is sampled from \(N(0,1)\), the \(\beta\)-coefficients for the base outcome \(Y\) are 1 to 3, and the four treatments are equally likely.
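The data-generating process described above can be sketched in base R. This is a hypothetical re-implementation for illustration, not the package's source: the name `sketch_causal_XTY_multiple` is made up, it returns a plain data frame rather than a tibble, and it assumes the outcomes carry no additional noise term, which is what the printed examples suggest but the description does not state.

```r
# Hypothetical sketch of the described process; not the package's own code.
sketch_causal_XTY_multiple <- function(n = 100,
                                       mu = rep(0, 3),
                                       sigma = rep(1, 3),
                                       beta_coefficients = 1:3,
                                       treatment_prob = rep(0.25, 4),
                                       treatment_effect = c(10, 20, 30, 40)) {
  p <- length(mu)
  k <- length(treatment_effect)
  stopifnot(
    length(sigma) == p,
    length(beta_coefficients) == p,
    all(sigma >= 0),
    length(treatment_prob) == k,
    isTRUE(all.equal(sum(treatment_prob), 1))
  )

  # Column j of X is drawn independently from N(mu[j], sigma[j]^2).
  # The matrix is filled column by column, so the parameters are repeated per column.
  X <- matrix(
    rnorm(n * p, mean = rep(mu, each = n), sd = rep(sigma, each = n)),
    nrow = n, ncol = p
  )
  colnames(X) <- paste0("X", seq_len(p))

  # Base outcome: linear in X (assumed noise-free).
  Y <- as.vector(X %*% beta_coefficients)

  # Potential outcomes: base outcome plus the additive effect of each treatment.
  Y_potential <- outer(Y, treatment_effect, `+`)
  colnames(Y_potential) <- paste0("Y", seq_len(k))

  # Sample one treatment per row and pick the matching potential outcome.
  treatment <- sample.int(k, size = n, replace = TRUE, prob = treatment_prob)
  Y_observed <- Y_potential[cbind(seq_len(n), treatment)]

  data.frame(X, Y = Y, Y_potential,
             treatment = treatment, Y_observed = Y_observed)
}

set.seed(1)
d <- sketch_causal_XTY_multiple()
head(d)
```

Under these assumptions the column layout matches the printed examples: `p` covariate columns, the base outcome `Y`, one potential-outcome column per treatment, the sampled treatment, and the observed outcome.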
causal_XTY_multiple()
#> # A tibble: 100 × 10
#>         X1     X2      X3       Y    Y1    Y2    Y3    Y4 treatment Y_observed
#>      <dbl>  <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>     <int>      <dbl>
#>  1  0.0639  0.354  -0.276 -0.0562  9.94  19.9  29.9  39.9         4       39.9
#>  2  -0.919  0.530   0.384    1.29  11.3  21.3  31.3  41.3         1       11.3
#>  3   0.901 -0.311    1.45    4.62  14.6  24.6  34.6  44.6         4       44.6
#>  4  -0.798 -0.244    1.73    3.92  13.9  23.9  33.9  43.9         4       43.9
#>  5   0.668 -0.292   0.456    1.45  11.5  21.5  31.5  41.5         3       31.5
#>  6   0.155  -1.13   0.708  0.0216  10.0  20.0  30.0  40.0         4       40.0
#>  7   0.129  -1.12    2.06    4.07  14.1  24.1  34.1  44.1         1       14.1
#>  8   -1.53   2.05  0.0239    2.65  12.6  22.6  32.6  42.6         4       42.6
#>  9   0.202 -0.910   0.246  -0.879  9.12  19.1  29.1  39.1         2       19.1
#> 10  -0.718  0.458   0.272    1.02  11.0  21.0  31.0  41.0         3       31.0
#> # … with 90 more rows

causal_XTY_multiple(n = 40, mu = rep(2, 7), sigma = 1:7, beta_coefficients = 1:7,
                    treatment_prob = c(0.4, 0.1, 0.1, 0.2, 0.2), treatment_effect = 1:5)
#> # A tibble: 40 × 15
#>        X1     X2     X3    X4     X5    X6     X7     Y    Y1    Y2    Y3
#>     <dbl>  <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  0.294  -1.10 -0.118  7.32   1.70 -4.90   6.16  49.3  50.3  51.3  52.3
#>  2   1.14   3.56  -3.57 -1.25  -7.32 0.370   9.76  26.5  27.5  28.5  29.5
#>  3   1.86   4.14   1.26  9.18  -4.37  4.74  -4.60  25.0  26.0  27.0  28.0
#>  4   1.68   1.63  0.993  4.75  -6.91  1.90   6.90  52.1  53.1  54.1  55.1
#>  5   1.83   5.12   1.25  2.36 -0.543 -1.25 -0.973  8.22  9.22  10.2  11.2
#>  6  0.764   1.57   3.38  3.30  -6.68  7.25  -6.00 -4.67 -3.67 -2.67 -1.67
#>  7 0.0977   3.86  0.620  2.29   2.20  6.45   3.63  93.9  94.9  95.9  96.9
#>  8   1.91   2.82   2.18  2.93   1.38  1.01  0.675  43.5  44.5  45.5  46.5
#>  9   2.03 -0.560 -0.228  9.95  -1.06 -2.51  -9.64 -47.8 -46.8 -45.8 -44.8
#> 10   2.46  0.435  -4.49 -1.81   2.80 -5.52   15.4  71.4  72.4  73.4  74.4
#> # … with 30 more rows, and 4 more variables: Y4 <dbl>, Y5 <dbl>,
#> #   treatment <int>, Y_observed <dbl>
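The relationship between the columns can be checked directly on the printed output. The sketch below copies the first three rows of the default example (as rounded in the printout) into a hypothetical data frame `rows`, then confirms that `Y_observed` is the potential outcome selected by `treatment` and that each potential outcome differs from `Y` by roughly the corresponding entry of `treatment_effect`. Because the values are rounded for display, the contrasts are only approximately 10, 20, 30 and 40.

```r
# First three rows of the default example output, as rounded in the printout.
rows <- data.frame(
  Y          = c(-0.0562, 1.29, 4.62),
  Y1         = c(9.94, 11.3, 14.6),
  Y2         = c(19.9, 21.3, 24.6),
  Y3         = c(29.9, 31.3, 34.6),
  Y4         = c(39.9, 41.3, 44.6),
  treatment  = c(4L, 1L, 4L),
  Y_observed = c(39.9, 11.3, 44.6)
)

# Matrix of potential outcomes, one column per treatment.
potential <- as.matrix(rows[paste0("Y", 1:4)])

# The observed outcome is the potential outcome of the sampled treatment.
picked <- potential[cbind(seq_len(nrow(rows)), rows$treatment)]
stopifnot(isTRUE(all.equal(unname(picked), rows$Y_observed)))

# Contrast of each potential outcome with the base outcome Y:
# approximately the additive treatment effects, up to display rounding.
effects <- potential - rows$Y
round(effects)
```

In real data only `treatment` and `Y_observed` would be available; a simulated data set of this kind keeps every potential outcome, so an estimator's output can be compared against the known effects.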