Chapter 6 Univariate statistical test
In statistical analysis plan (SAP), univariate tests are recommended. In epidemiology it permits a first assessment of possible relationships between the outcome and the covariates (or possible confounders).
The choice of the test depends on the type of the variable, the shape of its distribution, and the number of groups to compare.
6.1 Quantitative data
6.1.1 Two groups comparison
To compare the distribution of a quantitative variable between 2 groups you first need to describe the distribution of the data (shape) to choose between a parametric or a non-parametric test.
Parametric tests are based on the parameters of the distribution i.e. the mean and the variance of the data series therefore the mean should be a reliable and correct parameter to summarize the distribution of the values in the 2 groups. The distributions should be similar to a Normal distribution (Bell-shape like) and not skewed. To verify this assumption one can use the Shapiro test. In addition, the 2 groups should have similar variance or the test will have to be adjusted for uneven variances. To verify this assumption one can do a variance test.
In all examples below, we are interested to see the risk factors for sleep apnea.
R
load("sleepApnea-clean.Rdata")
# globally
# shapiro.test(sleep$SleepApnea)
# But we want to compare 2 groups
by(sleep$SleepApnea, sleep$gender, shapiro.test)
## sleep$gender: female
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.86531, p-value = 0.01203
##
## ------------------------------------------------------------
## sleep$gender: male
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.93775, p-value = 6.453e-05
In both group, the p-value is below 5%. We reject H0. The sleep apnea data do not follow a Normal distribution. We should use Wilcoxon test, a non-parametric test to compare 2 groups not normally distributed.
We can try a log transformation on the data, a mathematical trick that (sometimes) helps reshaping the distribution into a Normal distribution
$SleepApnea_log <- log(sleep$SleepApnea)
sleep# To compare the 2 groups
by(sleep$SleepApnea_log, sleep$gender,shapiro.test)
## sleep$gender: female
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.95755, p-value = 0.5251
##
## ------------------------------------------------------------
## sleep$gender: male
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.98737, p-value = 0.3932
Indeed, this time we fail to reject H0 for the 2 groups. We can state now that we cannot prove that the 2 data series are different from Normal distribution and a parametric test can be used for comparison. We will be able to apply a Student t-test on the log-transformed sleep apnea data.
Stata
The analysis steps are the same in STATA and the syntax is similar.
swilk SleepApnea, by(gender)
In both groups, the p-value is below 5%. We reject H0. The sleep apnea data do not follow a Normal distribution. We should use Wilcoxon test, a non-parametric test to compare 2 groups not normally distributed.
gen SleepApnea_log = log(SleepApnea)
swilk SleepApnea_log, by(gender)
After log-transformation, we cannot show that the data distributions are different from a Normal distribution. We will be able to use a Student T-test (parametric test) on the log-transformed data series.
Homoscedasticity or homogeneity of variances
In the configuration of a parametric test, the variances have to be equal in order to perform a classical Student t-test.
The hypothesis are:
- H0: the variance of sleep apnea values among female is equal to the variance of sleep apnea values among male
- H1: the variance of sleep apnea values among female is not equal to the variance of sleep apnea values among male
- with alpha= 5%
R
The Bartlett’s test assess that the variances in each of the groups (samples) are the same.
bartlett.test(SleepApnea_log~gender, data=sleep)
##
## Bartlett test of homogeneity of variances
##
## data: SleepApnea_log by gender
## Bartlett's K-squared = 1.5169, df = 1, p-value = 0.2181
The p-value is 0.21, we fail to reject H0. The variance are homogeneous and the adjustement will be required for the t-test.
Note: The var.test() is a special case of comparing variances in two samples from normal distributions.
Student t-test
Since the log-transformed data are not different from a Normal distribution and the variances are homogeneous in both gender group, we can applied a Student T-test to the sleep apnea values to answer the following hypothesis:
- H0: On average,in log scale, the sleep apnea values among female is equal to the sleep apnea values among male
- H1: On average, in log scale, the sleep apnea values among female is not equal to the sleep apnea values among male
- with alpha= 5%
R
The output of t.test() function call takes too much room on the web page so for display purpose we stored the results in a variable that we named ttest_res. We can then retrieve elements of results by their names in the list (see help page for details). For instance, the p-value can be access as follow:
<- t.test(SleepApnea_log~gender, var.qual=TRUE, data=sleep)
ttest_res $p.value ttest_res
## [1] 0.903352
The test shows that we fail to reject H0. We cannot show any differences between gender as sleep apnea is concerned.
Note: (1) storing the results of a test in an object can become handy when one want iterate the test over several covariates. (2) By default the t.test() function is set for unequal variance.
Wilcoxon test
Non parametric tests can always be used but they tend to be less powerful, meaning that they are less likely to identify small differences in small sample.
The Wilcoxon test is based solely on the order in which the observations from the two samples fall. It is also know as the Wilcoxon rank sum test. The null hypothesis tested is:
- H0: The distribution of the sleep apnea values among female and male are equal
- H1: The distribution of the sleep apnea values among female and male are not equal
- with alpha= 5%
R
In R to perform a one- and two-sample non-parametric rank sum test you should use the wilcoxon.test() function on vectors of data. The two-sample test is also known as Mann-Whitney test but used the same the function call.
If we use it on the original sleep apnea data
by(sleep$SleepApnea, sleep$gender,shapiro.test)
## sleep$gender: female
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.86531, p-value = 0.01203
##
## ------------------------------------------------------------
## sleep$gender: male
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.93775, p-value = 6.453e-05
The conclusion is the same as with the log-transformed scale and parametric test. Here, we fail to reject H0. No difference can be shown between gender which adds evidence against gender related outcome.
6.1.2 ANOVA or Kruskal-Wallis tests
To compare more than 2 groups you should use a generalization of the T-test or Wilcoxon test.
R
In R, let’s look at the cholesterol in groups along with the sleep apnea response variable?
$chlgr <- ifelse(sleep$cholesterol <= 2,
sleep"normal",
ifelse(sleep$cholesterol > 2 &
$cholesterol<= 2.39,
sleep"bordeline high", "high"))
by(sleep$SleepApnea, sleep$chlgr,shapiro.test)
## sleep$chlgr: bordeline high
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.96073, p-value = 0.1569
##
## ------------------------------------------------------------
## sleep$chlgr: high
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.89608, p-value = 0.01509
##
## ------------------------------------------------------------
## sleep$chlgr: normal
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.91099, p-value = 0.0002685
by(sleep$SleepApnea_log, sleep$chlgr,shapiro.test)
## sleep$chlgr: bordeline high
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.96631, p-value = 0.2469
##
## ------------------------------------------------------------
## sleep$chlgr: high
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.98136, p-value = 0.9107
##
## ------------------------------------------------------------
## sleep$chlgr: normal
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.98139, p-value = 0.4677
The sleep apnea variable in its original scale is not normally distributed in 2 of the 3 groups of cholesterol, with quite an important deviation for latest (High cholesterol). In the log transformed data, all distribution are similar to a normal distribution.
Next, we look at the homoscedasticity:
bartlett.test(sleep$SleepApnea_log ~ sleep$chlgr)
##
## Bartlett test of homogeneity of variances
##
## data: sleep$SleepApnea_log by sleep$chlgr
## Bartlett's K-squared = 0.45559, df = 2, p-value = 0.7963
Finally, we can perform an ANOVA than will assess whether the variance within groups is similar than between group (H0) or not (H1).
- H0: On average,in log scale, the sleep apnea values among the different groups of cholesterol are equal
- H1: At least one cholesterol group differs from the other in term of sleep apnea values
- with alpha= 5%
<- aov(sleep$SleepApnea_log ~ sleep$chlgr)
res_anova summary(res_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## sleep$chlgr 2 0.463 0.2314 1.095 0.338
## Residuals 126 26.619 0.2113
## 1 observation deleted due to missingness
In conclusion, we fail to reject H0. We cannot demonstrate any statistically significant associated between log sleep apnea and cholesterol group levels.
If any statistically significant association was observed we could have used the TukeyHSD() to decipher which group. This post-hoc pairwise comparison is commonly performed after significant effects when there are three or more levels of a factor.
Stata
In Stata the name of the function is more in link with the actual computation:
// generate cholesterol groups
generate byte chlgr=1 if cholesterol <= 2
replace chlgr=2 if cholesterol>2
& cholesterol<=2.39replace chlgr= 3 if cholesterol>2.39
// test normality and homoscedasticity
swilk SleepApnea_log, by(chlgr)
sdtest SleepApnea_log, by(chlgr)
// ANOVA
oneway SleepApnea_log chlgr
Stata has three built-in pairwise methods (sidak, bonferroni and scheffe) in the oneway command. Although these options are easy to use, many researchers consider the methods to be too conservative for pairwise comparisons, especially when the are many levels. The Sidak method is the least conservative of the three followed, in order, by Bonferroni and Scheffe.
oneway SleepApnea_log chlgr, sidak bonferroni scheffe
Stata does not have a simple Tukey test command built-in, a few easy steps will download a user-created one that works just as well. Assuming you have not already installed the command, type findit tukeyhsd into the Command window and click on the appropriate link for download (see Reed College help page for details).
Next, you only need to call the function
tukeyhsd chlgr
6.2 Qualitative data (categorical)
Outcome and covariate can be both categorical and univariate statistical tests are then based on counts (although hypothesis are expressed in proportion).
Fisher Exact Test
The Fisher Exact test is the most accurate and powerful test for qualitative data. The only limit is the greedy permutation algorithm that your personal computer may not be in capacity to handle.
Let’s look at the high apneic patients (sleep apnea level above 69) and its possible relationship with obesity. The null hypothesis is:
- H0: Among the highly apneic patients, the proportion obese persona is equal to the proportion of normal weight persona
- H1: The proportion obese persona is not equal in the group of highly apneic patients and the low (or no) apneic patients
- with alpha= 5%
R
In R we first create the apnea_high variable as a categorical variable (factor). Next, we use the fisher.test() function call
$apnea_high <- ifelse(sleep$SleepApnea > 69, "Yes", "No")
sleep$apnea_high <- factor(sleep$apnea_high)
sleepxtabs(~ sleep$obese + sleep$apnea_high )
## sleep$apnea_high
## sleep$obese No Yes
## 0 89 8
## 1 25 7
fisher.test(sleep$obese, sleep$apnea_high)
##
## Fisher's Exact Test for Count Data
##
## data: sleep$obese and sleep$apnea_high
## p-value = 0.05407
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.8615118 10.8256470
## sample estimates:
## odds ratio
## 3.081952
We gather from the test we cannot reject H0 and there do not seem have a relationship between high sleep apnea level and obesity (p-value 0.054). Although the p-value is borderline and that observation might be due to a lack of power (by construction) as only 15 patients are in the High apnea group.
Pearson Chi-Square Test
An alternative to the Fisher Exact test is the Chi-Square test. The hypothesis are the same but there is one assumption for the Chi-Square: all expected counts should be greater than 5 otherwise the p-value will be poorly estimated.
R
# create contingency table
<- table(sleep$apnea_high, sleep$obese)
aHigh2ob # Perform the test and store the result
# in a object aHigh2ob_chisq
<- chisq.test(aHigh2ob)
aHigh2ob_chisq # After the R warning
# verify the expected counts
$expected aHigh2ob_chisq
##
## 0 1
## No 85.72093 28.27907
## Yes 11.27907 3.72093
Here one expected count is 3.72 \(<\) 5 so you should not use the result of that test which are too much approximated.