---
title: "The ExactMed functions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The ExactMed functions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  prompt = TRUE,
  comment = " "
)
```


## Introduction

This document illustrates the usage and behavior of the functions `exactmed()`, `exactmed_c()` and `exactmed_cat()` through a series of examples. All three functions compute natural direct and indirect effects, as well as controlled direct effects, for a binary outcome. However, each function handles a specific type of mediator: `exactmed()` accommodates a *binary* mediator, `exactmed_c()` a *continuous* mediator and `exactmed_cat()` a *categorical* mediator. Details on the use of `exactmed()` are provided first. The usage of `exactmed_c()` and `exactmed_cat()` is similar, but differs in a few aspects described below.


In `exactmed()`, the user can specify the high levels of the mediator and outcome variables using the input parameters `hvalue_m` and `hvalue_y`, respectively (see the function help). Controlled direct effects are obtained for both possible mediator values ($m=0$ and $m=1$). Natural and controlled effects can be either unadjusted (crude) or adjusted for covariates (that is, conditional effects). By default, adjusted effect estimates are obtained with covariates fixed at their sample-specific mean values (for numerical covariates, and for categorical covariates through their associated dummy variables). Alternatively, adjusted effect estimates can be obtained at user-provided values of the covariates. Also by default, `exactmed()` includes an exposure-mediator interaction term in the outcome model, which can be removed by setting `interaction = FALSE`. Concerning interval estimates, `exactmed()` generates $95\%$ delta method confidence intervals by default. Percentile bootstrap confidence intervals can be obtained instead by specifying `boot = TRUE` in the function call; in this case, 1000 bootstrap data sets are generated by default.
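For readers unfamiliar with the delta method, the chunk below sketches the generic idea in base R: the standard error of a smooth function $g(\hat\theta)$ of estimated parameters is approximated by $\sqrt{\nabla g^\top \Sigma \nabla g}$, where $\Sigma$ is the covariance matrix of $\hat\theta$. The helper `delta_ci()` and the numbers used are hypothetical and only illustrate the principle; this is not the internal code of **ExactMed**, whose exact formulas are given in the package documentation.

```{r}
# Hypothetical helper: Wald-type CI for g(theta) via the first-order delta method.
# The gradient of g is approximated numerically by central differences.
delta_ci <- function(g, theta, sigma, level = 0.95, eps = 1e-6) {
  grad <- vapply(seq_along(theta), function(j) {
    step <- replace(numeric(length(theta)), j, eps)
    (g(theta + step) - g(theta - step)) / (2 * eps)
  }, numeric(1))
  est <- g(theta)
  se <- sqrt(drop(t(grad) %*% sigma %*% grad))
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = est, lower = est - z * se, upper = est + z * se, se = se)
}

# Toy example: odds ratio exp(b) for a log-odds coefficient b = 0.5 with SE 0.2
ci_or <- delta_ci(function(th) exp(th[1]), theta = 0.5, sigma = matrix(0.2^2))
ci_or
```

For $g = \exp$, the approximated standard error reduces to $\exp(b) \times SE(b)$, which the numerical gradient reproduces.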

In `exactmed_c()` and `exactmed_cat()`, only the high level of the outcome variable can be specified (using the input parameter `hvalue_y`). Moreover, for each scale, the controlled direct effect is computed at a mediator value or level specified by the user through the parameter `mf`. By default, this parameter is fixed at the sample-specific mean of the mediator in `exactmed_c()` and at the reference level of the mediator in `exactmed_cat()`. To use `exactmed_cat()`, the mediator must be coded as a factor variable in the data set. By default, the reference level of the mediator is the first level of the corresponding factor. The extra input parameter `blevel_m` of `exactmed_cat()` lets the user change this reference level to any other level. Note that `blevel_m` can affect the value of the controlled direct effect, but not that of the natural direct and indirect effects.

Because `exactmed()`, `exactmed_c()` and `exactmed_cat()` are very similar in use and in the options they offer, most examples below use `exactmed()`. All the `exactmed()` examples use the data set `datamed`, available after loading the **ExactMed** package; some of its features are described in its help file (`help(datamed)`). Users interested in `exactmed_c()` or `exactmed_cat()` (continuous or categorical mediator, respectively) only need to change the function name (and the data set) in these example calls. The data sets `datamed_c` and `datamed_cat`, which feature a continuous and a categorical mediator, respectively, are presented at the end of the document along with a few example calls.

Lastly, we recall that all **ExactMed** functions only work on data frames with named columns and no missing values.
 
```{r}
library(ExactMed)

head(datamed)

```
 
The following command verifies whether the data set contains any missing values:
```{r}

anyNA(datamed)  # TRUE if at least one value is missing

```
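Since all **ExactMed** functions require complete data, missing values must be handled before the call. One simple (but not always appropriate) option is a complete-case analysis, shown below with base R's `complete.cases()` on a small hypothetical data frame `toy`. Whether dropping incomplete rows is justified depends on the missing-data mechanism.

```{r}
# Hypothetical data frame with two missing values
toy <- data.frame(
  X = c(1, 0, 1, 0),
  M = c(1, NA, 0, 1),
  Y = c(0, 1, NA, 1)
)
anyNA(toy)

# Keep only the rows without any missing value (complete-case analysis)
toy_cc <- toy[complete.cases(toy), ]
toy_cc
anyNA(toy_cc)
```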
 
 
## Basic examples
 
 
Suppose that one wishes to obtain unadjusted (crude) mediation effect estimates for a change in exposure from $0$ to $1$, assuming no exposure-mediator interaction and using the delta method to construct $95\%$ confidence intervals.

In this case, a valid call to `exactmed()` would be:
 

```{r}

results1 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', 
  a1 = 1, a0 = 0, interaction = FALSE
  )  

results1

```
 
 
Mediation effects estimates adjusted for covariates are obtained through the use of the character vectors `m_cov` and `y_cov`, which contain the names of the covariates to be adjusted for in the mediator and outcome models, respectively. The following call to `exactmed()` incorporates covariates `C1` and `C2` in both the mediator and outcome models:

```{r}

results2 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,  
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  interaction = FALSE
  )

results2

```


The `exactmed()` function also allows different sets of covariates in the mediator and outcome models. For example, the following specification of `m_cov` and `y_cov` adjusts the mediator model for `C1` and `C2`, and the outcome model for `C1` only.

However, we advise against this practice unless the excluded covariates are known to be independent of the dependent variable being modeled (mediator or outcome) given the remaining covariates.


```{r}

results3 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,  
  m_cov = c('C1', 'C2'), y_cov = c('C1'), 
  interaction = FALSE
  )

results3

```


By default, `adjusted = TRUE`. If `adjusted` is set to `FALSE`, `exactmed()` ignores the vectors `m_cov` and `y_cov` and computes unadjusted (crude) effect estimates, as in the first example above:
 
```{r}

results4 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1'), 
  adjusted = FALSE, interaction = FALSE
  )

results4

```
 

To perform an adjusted mediation analysis allowing for an exposure-mediator interaction (`interaction = TRUE` by default) and using a bootstrap with $100$ resamples and initial random seed $1991$ to construct $97\%$ confidence intervals, call `exactmed()` as follows:

 
```{r, results='hide'}

results5 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  boot = TRUE, nboot = 100, bootseed = 1991, confcoef = 0.97
  )

```
 
 
```{r}

results5

```
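To make the meaning of a $97\%$ percentile bootstrap interval concrete, the base R sketch below resamples a simulated binary variable $100$ times and takes the $1.5\%$ and $98.5\%$ quantiles of the resampled estimates. The simulated data and the statistic (a simple proportion) are purely illustrative assumptions; **ExactMed** applies the percentile principle to its mediation effect estimates, and its internal resampling code may differ.

```{r}
set.seed(1991)
# Illustrative binary sample; the statistic of interest is the sample proportion
y_sim <- rbinom(200, size = 1, prob = 0.3)

# 100 bootstrap resamples (drawn with replacement), one estimate per resample
boot_est <- replicate(100, mean(sample(y_sim, replace = TRUE)))

# 97% percentile interval: the 1.5% and 98.5% quantiles of the bootstrap estimates
alpha_sim <- 1 - 0.97
boot_ci <- quantile(boot_est, probs = c(alpha_sim / 2, 1 - alpha_sim / 2))
boot_ci
```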


## Firth's penalization 

 
When separation or quasi-separation is suspected, Firth's penalization can be applied by setting the `Firth` parameter to `TRUE` (Firth penalized mediation analysis). The penalization is then applied to both the mediator and the outcome models.

Firth's penalization reduces the bias of the regression coefficient estimators in scarce or sparse data (see details in the `exactmed()` help page):

```{r, results='hide'}

results6 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), Firth = TRUE, 
  boot = TRUE, nboot = 100, bootseed = 1991, confcoef = 0.97
  )

```

 
```{r}

results6

```
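Firth's method replaces the usual logistic regression score with a modified score that adds $h_i(1/2 - \pi_i)$ to each residual $y_i - \pi_i$, where $h_i$ are the diagonal elements of the hat matrix. The base R sketch below solves the modified score equations by Newton-Raphson on a small, completely separated data set, for which ordinary maximum likelihood estimates do not exist (they diverge). The function `firth_logit()` is hypothetical and written for illustration only; it is not the implementation used by **ExactMed**.

```{r}
# Hypothetical Firth-penalized logistic regression via Newton-Raphson
firth_logit <- function(X, y, maxit = 100, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)
    info <- crossprod(X, X * w)                  # Fisher information X'WX
    info_inv <- solve(info)
    h <- w * rowSums((X %*% info_inv) * X)       # diagonal of the hat matrix
    score <- crossprod(X, y - p + h * (0.5 - p)) # Firth-modified score
    step <- drop(info_inv %*% score)
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  setNames(beta, colnames(X))
}

# Completely separated toy data: y = 1 exactly when x > 0
x_sep <- c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2)
y_sep <- as.numeric(x_sep > 0)
X_sep <- cbind(intercept = 1, x = x_sep)

fit_firth <- firth_logit(X_sep, y_sep)
fit_firth
```

The penalized estimates stay finite, which is why Firth's penalization is useful under separation.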
 
 
## Stratum-specific effects

 
The following call to `exactmed()` returns mediation effects adjusted for the covariates `C1` and `C2`, when the values of the covariates `C1` and `C2` are $0.1$ and $0.4$, respectively, assuming an exposure-mediator interaction and using the delta method to construct $95\%$ confidence intervals:
 
 
```{r}

results7 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.1, C2 = 0.4), y_cov_cond = c(C1 = 0.1, C2 = 0.4)
  )

results7

```
 
 
Covariates common to `m_cov` and `y_cov` must be assigned the same values in `m_cov_cond` and `y_cov_cond`; otherwise, `exactmed()` stops and an error message is displayed in the R console. Example:


```{r, error=TRUE, collapse=FALSE}

exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.3, C2 = 0.4), y_cov_cond = c(C1 = 0.1, C2 = 0.4)
 )


```


If the covariates specified in `m_cov_cond` (`y_cov_cond`) form a proper subset of `m_cov` (`y_cov`), the remaining covariates are set to their sample-specific mean values. Hence, the call

```{r}

results8 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.1), y_cov_cond = c(C1 = 0.1)
  )

```

 
 is equivalent to:
 
```{r}

mc2 <- mean(datamed$C2)  # sample-specific mean of C2
mc2

results9 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.1, C2 = mc2), y_cov_cond = c(C1 = 0.1, C2 = mc2)
  )

```
 
 
This can be checked by comparing the two outputs:
 
 
```{r}

all.equal(results8, results9)

```
 

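Consequently, the following call raises an error: `C2` is absent from `m_cov_cond`, so it is set to its sample mean in the mediator model, whereas `y_cov_cond` sets it to $0.4$ in the outcome model: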


```{r,error=TRUE, collapse=FALSE}

exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.1), y_cov_cond = c(C1 = 0.1, C2 = 0.4)
  )


```

 
## Categorical covariates

The `exactmed()` function also allows for categorical covariates. Covariates of this type must appear in the data frame as factor, character, or logical columns. To illustrate how `exactmed()` works with categorical covariates, we replace the covariate `C1` in the data set `datamed` with a random factor column:

```{r}

set.seed(2023)  # make the random factor (and the outputs below) reproducible
cate <- factor(sample(c("a", "b", "c"), nrow(datamed), replace = TRUE))
datamed$C1 <- cate

```


It is possible to estimate mediation effects at specific values of categorical covariates using the input parameters `m_cov_cond` and `y_cov_cond`. Note that if the targeted covariates are a mixture of numerical and categorical covariates, these parameters must be lists, rather than the atomic vectors used when the covariates are all numerical or all categorical.

For instance, to estimate mediation effects at level 'a' for `C1` and at value $0.4$ for `C2`, assuming an exposure-mediator interaction and using the delta method to construct $95\%$ confidence intervals, `exactmed()` should be called as follows:
 
 
```{r}

results10 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = list(C1 = 'a', C2 = 0.4), y_cov_cond = list(C1 = 'a', C2 = 0.4)
  )

results10

```
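The reason a list is required is that `c()` coerces all its elements to a common type: combining the character level `'a'` with the number `0.4` produces a character vector, in which `C2` is stored as the string `"0.4"`. A list keeps the type of each element, as the base R lines below show.

```{r}
cond_atomic <- c(C1 = 'a', C2 = 0.4)
cond_list <- list(C1 = 'a', C2 = 0.4)

# c() coerces 0.4 to the character string "0.4"
str(cond_atomic)

# list() keeps C1 as character and C2 as numeric
str(cond_list)
```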
 

If no value is specified for the categorical covariate `C1`, `exactmed()` computes the effects by setting each dummy variable that it creates internally for a non-reference level of `C1` to the proportion of observations in the corresponding category (which is equivalent to setting each dummy variable to its mean value):


```{r}

results11 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C2 = 0.4), y_cov_cond = c(C2 = 0.4)
  )

results11

```
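The equivalence stated above can be checked in base R on a small hypothetical factor: the column means of the dummy variables built by `model.matrix()` (treatment coding, first level as reference) coincide with the observed proportions of the non-reference levels. `model.matrix()` is used here only for illustration and is not necessarily how **ExactMed** builds its dummies internally.

```{r}
# Hypothetical categorical covariate: 3 x "a", 2 x "b", 1 x "c"
toy_cov <- data.frame(C1 = factor(c("a", "b", "c", "a", "b", "a")))

# Dummy variables for the non-reference levels of C1 (intercept column dropped)
dummies <- model.matrix(~ C1, data = toy_cov)[, -1, drop = FALSE]
dummy_means <- colMeans(dummies)
dummy_means  # C1b = 2/6, C1c = 1/6

# Observed proportions of each level of C1
level_props <- prop.table(table(toy_cov$C1))
level_props
```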
 
 
## Case-control data 

`exactmed()` can also compute mediation effects with a binary outcome and a binary mediator when the data come from a classical case-control study, in which the probability of being selected depends only on the outcome status. To do so, the true outcome prevalence (that is, the population prevalence $P(Y = hvalue\_y)$) must be known, and the `yprevalence` parameter must be set to this value. `exactmed()` accounts for the ascertainment in the sample by fitting weighted regression models with inverse-probability weighting (IPW) and robust standard errors (see details in the documentation).
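To illustrate the idea of inverse-probability weighting under outcome-dependent sampling, the base R sketch below gives cases the weight $\pi / p_s$ and controls the weight $(1 - \pi)/(1 - p_s)$, where $\pi$ is the known population prevalence (the role played by `yprevalence`) and $p_s$ the proportion of cases in the sample. The weighted proportion of cases then equals $\pi$. This is a common weighting scheme assumed here for illustration; the exact weights used by **ExactMed** are described in the `exactmed()` documentation.

```{r}
# Hypothetical case-control sample: 50 cases (Y = 1) and 50 controls (Y = 0)
y_cc <- rep(c(1, 0), times = c(50, 50))
prev <- 0.1        # assumed population prevalence P(Y = 1)
p_s <- mean(y_cc)  # proportion of cases in the sample (0.5)

# Cases are over-represented, so they are down-weighted; controls are up-weighted
w_cc <- ifelse(y_cc == 1, prev / p_s, (1 - prev) / (1 - p_s))
unique(w_cc)  # 0.2 for cases, 1.8 for controls

# The weighted proportion of cases recovers the population prevalence
weighted.mean(y_cc, w_cc)
```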

The following call to `exactmed()` returns mediation effects assuming that the data come from a case-control study and that the true outcome prevalence is $0.1$:

```{r}

results12 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', 
  a1 = 1, a0 = 0, interaction = FALSE, yprevalence = 0.1
  )

results12

```

Of note, the same optional parameters described in the previous sections are available in the case-control study context.

## Mediation analysis with a continuous mediator

As mentioned in the introduction, for a continuous mediator the **ExactMed** package provides the `exactmed_c()` function, which offers essentially the same options as `exactmed()`. The only differences are the absence of the `hvalue_m` parameter and the addition of the `mf` parameter, which sets the value of the mediator used to compute the controlled direct effect (by default, the sample-specific mean of the mediator).

For illustration, the package also makes available to the user the `datamed_c` data set containing a continuous mediator variable. Some of the features of this data set can be found in its corresponding help file (`help(datamed_c)`). We recall that the `exactmed_c()` function only works on data frames with named columns and no missing values.

```{r}

library(ExactMed)

head(datamed_c)


```

The following call to `exactmed_c()` estimates conditional mediation effects, assuming no exposure-mediator interaction in the outcome regression model:

```{r}

results13 <- exactmed_c(
  data = datamed_c, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,  
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  interaction = FALSE
  )

results13

```


To perform an adjusted mediation analysis allowing for an exposure-mediator interaction, using a bootstrap with $100$ resamples and initial random seed $1885$ to construct $95\%$ confidence intervals, and computing the controlled direct effect with the mediator set to $2$, call `exactmed_c()` as follows:

 
```{r, results='hide'}

results14 <- exactmed_c(
  data = datamed_c, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  boot = TRUE, nboot = 100, bootseed = 1885, confcoef = 0.95,
  mf = 2
  )

```
 
 
```{r}

results14

```

 
## Mediation analysis with a categorical mediator

As mentioned in the introduction, for a categorical mediator (coded as a factor) the **ExactMed** package provides the `exactmed_cat()` function, which offers essentially the same options as `exactmed()`. The only differences are the absence of the `hvalue_m` parameter and the addition of two parameters, `blevel_m` and `mf`. The first sets the reference level of the mediator, which by default is the first level of the corresponding factor. The second specifies the mediator level at which the controlled direct effect is computed. Parameter `blevel_m` therefore affects the mediator regression model and its output by fixing the reference level of the dependent variable. It does not affect the natural effects, and it affects the controlled direct effect only when `mf` is not specified by the user, since `mf` then defaults to the value of `blevel_m`.
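What a reference level is can be seen with base R's `relevel()` on a hypothetical factor: the reference level is the first level of the factor and serves as the baseline category of the regression. With `exactmed_cat()`, the reference level is changed through `blevel_m` instead, without modifying the data; the sketch below only illustrates the concept.

```{r}
# Hypothetical mediator factor
m_toy <- factor(c("a", "b", "c", "b", "a"))
levels(m_toy)    # "a" is the reference (first) level

# Move "b" to the front: it becomes the reference level
m_toy_b <- relevel(m_toy, ref = "b")
levels(m_toy_b)  # "b" "a" "c"
```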
 
 For illustration, the package also makes available to the user the `datamed_cat` data set containing a categorical mediator variable. Some of the features of this data set can be found in its corresponding help file (`help(datamed_cat)`). We recall that the `exactmed_cat()` function only works on data frames with named columns and no missing values.
 
```{r}

head(datamed_cat)

```

The following call to `exactmed_cat()` estimates conditional mediation effects, assuming no exposure-mediator interaction in the outcome regression model:

```{r}

results15 <- exactmed_cat(
  data = datamed_cat, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,  
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  interaction = FALSE
  )

results15

```

To perform an adjusted mediation analysis allowing for an exposure-mediator interaction, using a bootstrap with $100$ resamples and initial random seed $1875$ to construct $95\%$ confidence intervals, and computing the controlled direct effect at level 'c' of the mediator, call `exactmed_cat()` as follows:

 
```{r, results='hide'}

results16 <- exactmed_cat(
  data = datamed_cat, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  boot = TRUE, nboot = 100, bootseed = 1875, confcoef = 0.95,
  mf = 'c'
  )

```
 
 
```{r}

results16

```


The previous output shows that the reference level of the mediator model is, by default, the first level of the mediator factor (`blevel_m = 'a'`), whereas the controlled direct effect is computed at level 'c' of the categorical mediator, as requested through `mf = 'c'`.