Simulating a causal data set \(S = (X,T,Y_0, Y_1, Y_{obs})\) with potential outcomes.

Creates a causal data set \(S = (X,T,Y_0, Y_1, Y_{obs})\) for causal inference. The \(p\) columns of \(X\) are sampled independently, with column \(i\) drawn from a Gaussian distribution with mean \(\mu_i\) and standard deviation \(\sigma_i\), i.e. \(N(\mu_i, \sigma_i^2)\). The potential outcomes \(Y_0\) and \(Y_1\) are the outcomes under treatment \(T = 0\) and \(T = 1\), respectively. The binary treatment \(T\) takes the value 1 with probability \(p_{treatment}\) and 0 otherwise, and \(Y_{obs}\) is the potential outcome (either \(Y_0\) or \(Y_1\)) corresponding to the sampled treatment \(T\). The base outcome \(Y_0 = X \beta\) (i.e. \(x^T \beta\) for each row \(x\) of \(X\)) depends linearly on \(X\), and the average treatment effect is the additive effect of receiving treatment \(T = 1\), so \(Y_1\) equals \(Y_0\) plus the treatment effect. See Causality (Pearl 2009) for further details and a general introduction to causal inference.
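
The data-generating process described above can be sketched in base R as follows. This is only an illustrative sketch, not the package's source code: the function name simulate_causal_sketch is hypothetical, and it returns a plain data.frame rather than a tibble. The column names follow the example output shown on this page.

```r
# Hypothetical re-implementation for illustration; NOT the package's own code.
simulate_causal_sketch <- function(n = 100,
                                   mu = rep(0, 4),
                                   sigma = rep(1, 4),
                                   beta_coefficients = 1:4,
                                   treatment_prob = 0.5,
                                   treatment_effect = 10) {
  p <- length(mu)
  stopifnot(p >= 1, length(sigma) == p, length(beta_coefficients) == p,
            all(sigma >= 0), treatment_prob >= 0, treatment_prob <= 1)

  # Column i of X is drawn independently from N(mu[i], sigma[i]^2).
  X <- sapply(seq_len(p), function(i) rnorm(n, mean = mu[i], sd = sigma[i]))
  X <- matrix(X, nrow = n, ncol = p)  # keep an n x p matrix even if n or p is 1
  colnames(X) <- paste0("X", seq_len(p))

  # Potential outcomes: linear base outcome plus a constant additive effect.
  Y0 <- as.vector(X %*% beta_coefficients)
  Y1 <- Y0 + treatment_effect

  # Treatment indicator: 1 with probability treatment_prob, otherwise 0.
  treatment <- rbinom(n, size = 1, prob = treatment_prob)

  # The observed outcome is the potential outcome matching the treatment.
  Y_observed <- ifelse(treatment == 1, Y1, Y0)

  data.frame(X, treatment = treatment, Y0 = Y0, Y1 = Y1,
             Y_observed = Y_observed)
}

set.seed(1)
S <- simulate_causal_sketch(n = 5)
```

In this sketch the outcomes contain no additional noise term, which matches the example output on this page, where Y1 - Y0 equals the treatment effect in every row.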

causal_XTY_binary(
  n = 100,
  mu = rep(0, 4),
  sigma = rep(1, 4),
  beta_coefficients = 1:4,
  treatment_prob = 0.5,
  treatment_effect = 10
)

Arguments

n	The desired number of data points in the data set.
mu	A \(p\)-dimensional vector of column means \(\mu\).
sigma	A \(p\)-dimensional vector of non-negative column standard deviations \(\sigma\).
beta_coefficients	A \(p\)-dimensional vector of coefficients \(\beta\).
treatment_prob	A number between 0 and 1 giving the probability of treatment assignment \(p_{treatment}\).
treatment_effect	The average treatment effect, i.e. the difference between the two potential outcomes \(Y_1\) and \(Y_0\).

Value

A causal data set \(S = (X,T,Y_0, Y_1, Y_{obs})\), returned as a tibble with \(n\) rows and the columns X1, ..., Xp, treatment, Y0, Y1 and Y_observed (see the examples). In the default case, \(n = 100\) and \(p = 4\), each column \(X_i\) is sampled from \(N(0,1)\), and the beta-coefficients are 1 to 4. The default treatment probability is 0.5 (i.e. a coin flip), and the default average treatment effect is 10.

Examples

causal_XTY_binary()
#> # A tibble: 100 × 8
#>          X1     X2      X3     X4 treatment     Y0     Y1 Y_observed
#>       <dbl>  <dbl>   <dbl>  <dbl>     <dbl>  <dbl>  <dbl>      <dbl>
#>  1 -1.40    -0.387 -0.429  -0.356         0 -4.89   5.11      -4.89 
#>  2  0.255   -0.785  1.36   -1.06          1 -1.49   8.51       8.51 
#>  3 -2.44    -1.06  -0.0709  1.08          1 -0.455  9.55       9.55 
#>  4 -0.00557 -0.796 -0.272   1.18          0  2.31  12.3        2.31 
#>  5  0.622   -1.76  -2.45    0.198         1 -9.44   0.563      0.563
#>  6  1.15    -0.691  0.0655 -0.400         1 -1.64   8.36       8.36 
#>  7 -1.82    -0.559 -1.10    0.616         0 -3.77   6.23      -3.77 
#>  8 -0.247   -0.537 -0.633   1.97          1  4.68  14.7       14.7  
#>  9 -0.244    0.227 -2.06    1.88          1  1.56  11.6       11.6  
#> 10 -0.283    0.978  2.65   -1.59          1  3.27  13.3       13.3  
#> # … with 90 more rows

causal_XTY_binary(n = 40, mu = 1:7, sigma = rep(1, 7),
                  beta_coefficients = 1:7, treatment_prob = 0.75, treatment_effect = 25)
#> # A tibble: 40 × 11
#>         X1    X2    X3    X4    X5    X6    X7 treatment    Y0    Y1 Y_observed
#>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl> <dbl>      <dbl>
#>  1  2.78   2.06   2.34  3.16  3.78  5.25  6.21         1  120.  145.       145.
#>  2  1.39   1.73   3.63  2.65  2.55  5.84  6.27         1  118.  143.       143.
#>  3  0.0813 1.55   2.49  3.18  3.51  6.35  7.68         1  133.  158.       158.
#>  4 -0.584  0.589  3.27  3.37  4.57  5.71  6.77         1  128.  153.       153.
#>  5  0.916  1.49   3.47  4.82  4.06  6.10  5.49         1  129.  154.       154.
#>  6 -1.09   1.73   3.72  4.30  4.88  6.72  6.42         1  140.  165.       165.
#>  7  1.00   0.915  3.61  5.81  6.34  5.39  4.98         1  136.  161.       161.
#>  8  0.644  2.36   2.38  3.11  4.14  4.89  7.40         1  127.  152.       152.
#>  9  2.15   1.66   3.22  3.95  5.67  6.53  7.55         0  151.  176.       151.
#> 10  0.779  3.36   4.13  3.53  3.58  6.74  7.03         1  142.  167.       167.
#> # … with 30 more rows
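
As a check on the linear model, the first row of the default example above can be reproduced by hand with \(\beta = (1, 2, 3, 4)\) and a treatment effect of 10. The printed values are rounded, so the agreement is only approximate:

\[ Y_0 \approx 1(-1.40) + 2(-0.387) + 3(-0.429) + 4(-0.356) = -4.885 \approx -4.89, \qquad Y_1 = Y_0 + 10 \approx 5.11. \]

Since treatment is 0 in that row, Y_observed equals \(Y_0\), i.e. -4.89.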