Baseline-Category Logit

[Data] Alligator Data: In a study by the Florida Game and Fresh Water Fish Commission on the foods that alligators in the wild choose to eat, 59 alligators in Lake George, Florida, were sampled and the primary food type found in the alligators stomach was recorded along with the alligator length.

[Goal] Study the relationship between the alligator length and primary food choice.

[Code]

alligator = read.csv("../data/alligator.csv",header=TRUE)

plot(food ~ length, data=alligator)

library(nnet)  # run install.packages("nnet") first
## Warning: package 'nnet' was built under R version 3.4.2
# set "Others" as the baseline; otherwise, the defualt is the first one "Fish". 

alligator$food <- relevel(alligator$food, ref = "O")

fit = multinom(food ~ length, data=alligator) 
## # weights:  9 (4 variable)
## initial  value 64.818125 
## iter  10 value 49.170785
## final  value 49.170622 
## converged
summary(fit)
## Call:
## multinom(formula = food ~ length, data = alligator)
## 
## Coefficients:
##   (Intercept)     length
## F    1.617952 -0.1101836
## I    5.697543 -2.4654695
## 
## Std. Errors:
##   (Intercept)    length
## F    1.307291 0.5170838
## I    1.793820 0.8996485
## 
## Residual Deviance: 98.34124 
## AIC: 106.3412

Goodness of Fit Statistics

observed = matrix(c(371,250,64,25, 49,45,9,5, 74,71,15,13), ncol=3)
observed
##      [,1] [,2] [,3]
## [1,]  371   49   74
## [2,]  250   45   71
## [3,]   64    9   15
## [4,]   25    5   13
hat_pi = matrix(c(0.76,0.68,0.71,0.62, 0.10,0.12,0.10,0.12, 0.14,0.20,0.19,0.26),ncol=3)
hat_pi
##      [,1] [,2] [,3]
## [1,] 0.76 0.10 0.14
## [2,] 0.68 0.12 0.20
## [3,] 0.71 0.10 0.19
## [4,] 0.62 0.12 0.26
num.obs = rowSums(observed)     
num.obs
## [1] 494 366  88  43
expected = diag(num.obs) %*% hat_pi
expected
##        [,1]  [,2]  [,3]
## [1,] 375.44 49.40 69.16
## [2,] 248.88 43.92 73.20
## [3,]  62.48  8.80 16.72
## [4,]  26.66  5.16 11.18
G2 = 2*sum(observed * log(observed/expected)) 
G2
## [1] 1.101727
X2 = sum((observed - expected)^2/expected)  
X2
## [1] 1.115243

Cumulative Logits

[Data] Party Data: Contingency table on Political Ideology and Party Affiliation

[Goal] Relates political ideology to political party affiliation.

ideology = rbind(c(1,1,80),
                 c(1,2,81),
                 c(1,3,171),
                 c(1,4,41),
                 c(1,5,55),
                 c(0,1,30),
                 c(0,2,46),
                 c(0,3,148),
                 c(0,4,84),
                 c(0,5,99))
colnames(ideology) = c("party","ideology","count")
ideology = as.data.frame(ideology) 
ideology
##    party ideology count
## 1      1        1    80
## 2      1        2    81
## 3      1        3   171
## 4      1        4    41
## 5      1        5    55
## 6      0        1    30
## 7      0        2    46
## 8      0        3   148
## 9      0        4    84
## 10     0        5    99
## mosaic plot
ideology.table = matrix(ideology$count, ncol=2, 
                         dimnames=list(ideology=c("1","2","3","4","5"), 
                                         party=c("1","0")))

library(vcd) # run install.packages("vcd") first
## Warning: package 'vcd' was built under R version 3.4.2
## Loading required package: grid
mosaic(ideology.table, shade=TRUE)

library(MASS)
fit = polr(factor(ideology) ~ party, weights=count, data=ideology)    
summary(fit)
## 
## Re-fitting to get Hessian
## Call:
## polr(formula = factor(ideology) ~ party, data = ideology, weights = count)
## 
## Coefficients:
##         Value Std. Error t value
## party -0.9745     0.1292  -7.545
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2  -2.4690   0.1318   -18.7363
## 2|3  -1.4745   0.1090   -13.5314
## 3|4   0.2371   0.0942     2.5165
## 4|5   1.0695   0.1039    10.2923
## 
## Residual Deviance: 2474.985 
## AIC: 2484.985

NOTE: The coefficient of party is the opposite of that from the SAS output. This is because the function polr is built upon the latent variable model. It models \(\alpha_j-\beta x\). The one from SAS is the one we need.