Baseline-Category Logit
- Goodness of Fit Statistics
Cumulative Logits

Baseline-Category Logit

[Data] Alligator Data: In a study by the Florida Game and Fresh Water Fish Commission on the foods that alligators in the wild choose to eat, 59 alligators in Lake George, Florida, were sampled and the primary food type found in the alligators stomach was recorded along with the alligator length.

Response outcome: primary food choice (three levels: Fish, Inverterbrates, and Others)
One explanatory variable: alligator length (continuous)

[Goal] Study the relationship between the alligator length and primary food choice.

[Code]

alligator = read.csv("../data/alligator.csv",header=TRUE)

plot(food ~ length, data=alligator)

library(nnet)  # run install.packages("nnet") first

## Warning: package 'nnet' was built under R version 3.4.2

# set "Others" as the baseline; otherwise, the defualt is the first one "Fish". 

alligator$food <- relevel(alligator$food, ref = "O")

fit = multinom(food ~ length, data=alligator)

## # weights:  9 (4 variable)
## initial  value 64.818125 
## iter  10 value 49.170785
## final  value 49.170622 
## converged

summary(fit)

## Call:
## multinom(formula = food ~ length, data = alligator)
## 
## Coefficients:
##   (Intercept)     length
## F    1.617952 -0.1101836
## I    5.697543 -2.4654695
## 
## Std. Errors:
##   (Intercept)    length
## F    1.307291 0.5170838
## I    1.793820 0.8996485
## 
## Residual Deviance: 98.34124 
## AIC: 106.3412

Goodness of Fit Statistics

observed = matrix(c(371,250,64,25, 49,45,9,5, 74,71,15,13), ncol=3)
observed

##      [,1] [,2] [,3]
## [1,]  371   49   74
## [2,]  250   45   71
## [3,]   64    9   15
## [4,]   25    5   13

hat_pi = matrix(c(0.76,0.68,0.71,0.62, 0.10,0.12,0.10,0.12, 0.14,0.20,0.19,0.26),ncol=3)
hat_pi

##      [,1] [,2] [,3]
## [1,] 0.76 0.10 0.14
## [2,] 0.68 0.12 0.20
## [3,] 0.71 0.10 0.19
## [4,] 0.62 0.12 0.26

num.obs = rowSums(observed)     
num.obs

## [1] 494 366  88  43

expected = diag(num.obs) %*% hat_pi
expected

##        [,1]  [,2]  [,3]
## [1,] 375.44 49.40 69.16
## [2,] 248.88 43.92 73.20
## [3,]  62.48  8.80 16.72
## [4,]  26.66  5.16 11.18

G2 = 2*sum(observed * log(observed/expected)) 
G2

## [1] 1.101727

X2 = sum((observed - expected)^2/expected)  
X2

## [1] 1.115243

Cumulative Logits

[Data] Party Data: Contingency table on Political Ideology and Party Affiliation

Response variable: Political ideology has a five-point ordinal scale, ranging from very liberal to very conservative.
One explanatory variable: Party Affiliation (Indicator)

[Goal] Relates political ideology to political party affiliation.

ideology = rbind(c(1,1,80),
                 c(1,2,81),
                 c(1,3,171),
                 c(1,4,41),
                 c(1,5,55),
                 c(0,1,30),
                 c(0,2,46),
                 c(0,3,148),
                 c(0,4,84),
                 c(0,5,99))
colnames(ideology) = c("party","ideology","count")
ideology = as.data.frame(ideology) 
ideology

##    party ideology count
## 1      1        1    80
## 2      1        2    81
## 3      1        3   171
## 4      1        4    41
## 5      1        5    55
## 6      0        1    30
## 7      0        2    46
## 8      0        3   148
## 9      0        4    84
## 10     0        5    99

## mosaic plot
ideology.table = matrix(ideology$count, ncol=2, 
                         dimnames=list(ideology=c("1","2","3","4","5"), 
                                         party=c("1","0")))

library(vcd) # run install.packages("vcd") first

## Warning: package 'vcd' was built under R version 3.4.2

## Loading required package: grid

mosaic(ideology.table, shade=TRUE)

library(MASS)
fit = polr(factor(ideology) ~ party, weights=count, data=ideology)    
summary(fit)

## 
## Re-fitting to get Hessian

## Call:
## polr(formula = factor(ideology) ~ party, data = ideology, weights = count)
## 
## Coefficients:
##         Value Std. Error t value
## party -0.9745     0.1292  -7.545
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2  -2.4690   0.1318   -18.7363
## 2|3  -1.4745   0.1090   -13.5314
## 3|4   0.2371   0.0942     2.5165
## 4|5   1.0695   0.1039    10.2923
## 
## Residual Deviance: 2474.985 
## AIC: 2484.985

NOTE: The coefficient of party is the opposite of that from the SAS output. This is because the function polr is built upon the latent variable model. It models \(\alpha_j-\beta x\). The one from SAS is the one we need.

Multinomial Logit Regression

Baseline-Category Logit

Goodness of Fit Statistics

Cumulative Logits