[Data] Longitudinal Study of Treatment for Depression
Subjects were classified into two groups based on their diagnosis severity (mild/severe)
In each group, subjects were randomly assigned to one of two drugs
Dichotomous assessment of each subject’s extent of suffering from mental depression was made at weeks 1, 2, and 4
[Goal] Marginal model for
\[ \text{logit}(\pi) = \alpha+\beta_1 \text{severity}+\beta_2\text{drug}+\beta_3\text{time}+\beta_4(\text{drug}\times\text{time}) \]
[Code]
depression = read.csv("../data/depression.csv",header=TRUE)
head(depression)
## drug severity response time subject
## 1 Standard Mild N 0 1
## 2 Standard Mild N 1 1
## 3 Standard Mild N 2 1
## 4 Standard Mild N 0 2
## 5 Standard Mild N 1 2
## 6 Standard Mild N 2 2
depression$drug <- relevel(depression$drug , ref = "Standard")
library(lattice)
sample.proportion = aggregate((response == "N")~ severity+drug+time, data=depression, mean)
xyplot(sample.proportion[,4] ~ factor(time)|severity, group=drug,
type="b", data=sample.proportion,
auto.key=TRUE, ylab="Sample Proportion", xlab="log2(Week)")
library(gee) # install this package with: install.packages("gee")
gee.fit.1 = gee((response=="N") ~ severity+drug*time, id=subject, data=depression, family=binomial)
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) severitySevere drugNew Drug time
## -0.02798843 -1.31391092 -0.05960381 0.48241209
## drugNew Drug:time
## 1.01744498
summary(gee.fit.1)
##
## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
## gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
## Link: Logit
## Variance to Mean Relation: Binomial
## Correlation Structure: Independent
##
## Call:
## gee(formula = (response == "N") ~ severity + drug * time, id = subject,
## data = depression, family = binomial)
##
## Summary of Residuals:
## Min 1Q Median 3Q Max
## -0.94844242 -0.40683252 0.05155758 0.38830952 0.80242231
##
##
## Coefficients:
## Estimate Naive S.E. Naive z Robust S.E. Robust z
## (Intercept) -0.02798843 0.1627083 -0.1720160 0.1741865 -0.1606808
## severitySevere -1.31391092 0.1453432 -9.0400569 0.1459845 -9.0003423
## drugNew Drug -0.05960381 0.2205812 -0.2702126 0.2285385 -0.2608042
## time 0.48241209 0.1139224 4.2345663 0.1199350 4.0222784
## drugNew Drug:time 1.01744498 0.1874132 5.4288855 0.1876938 5.4207709
##
## Estimated Scale Parameter: 0.9854113
## Number of Iterations: 1
##
## Working Correlation
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
[Result]
The estimated time effect is \(\hat\beta_3 = 0.482\) for the standard drug and \(\hat\beta_3 + \hat\beta_4 = 1.5\) for the new one
The change of slope due to new drug is \(\hat\beta_4 = 1.017\) (Robust SE = 0.188). The Wald test of no interaction, \(H_0: \beta_4 = 0\), tests a common time effect for each drug. It has z test statistic equal to \(1.017/0.188 = 5.4\) (\(p\)-value \(<\) 0.0001). There is strong evidence of faster improvement for the new drug
The severity of depression estimate is \(\hat\beta_1 = -1.314\) (Robust SE = 0.146). For each drug-time combination, the estimated odds of a normal response when the initial diagnosis was severe equal \(\exp(-1.314) = 0.27\) times the estimated odds when the initial diagnosis was mild
The estimate \(\hat\beta_2 =-0.060\) (Robust SE = 0.228) for the drug effect applies only when time = 0 (i.e., after one week), for which the interaction term does not contribute to the drug effect. It indicates an insignificant difference between the drugs after 1 week
At time \(t\), the estimated odds of normal response with the new drug are \(\exp(-0.060 + 1.017t)\) times the estimated odds for the standard drug, for each initial diagnosis level. By the final week (\(t = 2\)), this estimated odds ratio has increased to 7.2
In summary, severity, drug treatment, and time all have substantial effects on the probability of a normal response. The chance of a normal response is similar for the two drugs initially and increases with time, but it increases more quickly for those taking the new drug than the standard drug
This conclusion is consistent with the graphical representation of the sample proportions
R output also contains the working correlation matrix used:
The working correlation matrix is \(3\times3\) because there are 3 responses per subject
This working correlation matrix assumes the 3 responses are independent
R also accommodates other working correlation matrix structures: (the default), , , , , and
Responses between subjects are independent
gee.fit.2 = gee((response=="N") ~ severity+drug*time, id=subject, data=depression,
corstr = "exchangeable", family=binomial)
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) severitySevere drugNew Drug time
## -0.02798843 -1.31391092 -0.05960381 0.48241209
## drugNew Drug:time
## 1.01744498
summary(gee.fit.2)
##
## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
## gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
## Link: Logit
## Variance to Mean Relation: Binomial
## Correlation Structure: Exchangeable
##
## Call:
## gee(formula = (response == "N") ~ severity + drug * time, id = subject,
## data = depression, family = binomial, corstr = "exchangeable")
##
## Summary of Residuals:
## Min 1Q Median 3Q Max
## -0.94843397 -0.40683122 0.05156603 0.38832332 0.80238627
##
##
## Coefficients:
## Estimate Naive S.E. Naive z Robust S.E. Robust z
## (Intercept) -0.02809866 0.1625499 -0.1728617 0.1741791 -0.1613205
## severitySevere -1.31391033 0.1448627 -9.0700418 0.1459630 -9.0016667
## drugNew Drug -0.05926689 0.2205340 -0.2687427 0.2285569 -0.2593091
## time 0.48246420 0.1141154 4.2278625 0.1199383 4.0226037
## drugNew Drug:time 1.01719312 0.1877051 5.4191018 0.1877014 5.4192084
##
## Estimated Scale Parameter: 0.985392
## Number of Iterations: 2
##
## Working Correlation
## [,1] [,2] [,3]
## [1,] 1.000000000 -0.003432732 -0.003432732
## [2,] -0.003432732 1.000000000 -0.003432732
## [3,] -0.003432732 -0.003432732 1.000000000
[Result]
R output also contains the working correlation matrix used:
All the off-diagonal elements are the same, as expected
The off-diagonal elements are pretty close to 0, indicating weak correlation among responses from the same subject
GEE does not provide SE for the off-diagonal elements so a test can not be conducted
[Comments]
In the presence of clustering, specification of the independence correlation structure seems like a poor choice. Indeed, it is the least desirable option for describing within-cluster correlation. However, when working with large or complex data sets, it is not always possible to obtain GEE estimates for all of the correlation structures. In practice, the independence structure may be the only structure for which GEE estimates can be obtained