I'm fitting a multinomial logistic regression model using proc logistic in SAS with around 3.6 million observations, a 5-level outcome, and dozens of categorical predictors. I had no issues running either univariate or multivariate models with param = ref.
However, once I switched to param = glm, the multivariate models started producing the warning "The information matrix is singular and thus the convergence is questionable. specifying a larger SINGULAR= value." After some research, I found that this message suggests a multicollinearity issue in the model. I then tried a model with only 2 predictors, and it still produced the warning, even though the correlation matrix showed no correlation between the two predictors.
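The fact that even two uncorrelated predictors trigger the warning can be checked outside SAS. Below is a minimal NumPy sketch, assuming two simulated, independent categorical predictors (the names `a` and `b` are hypothetical) plus an intercept, and assuming GLM-style coding builds one 0/1 indicator column per level. Under that coding, the indicator columns of each predictor sum to the intercept column, so the design matrix loses one rank per predictor no matter how the predictors are correlated. For a binary logit the information matrix has the form X'WX with positive weights, so it is singular whenever X is rank-deficient; the generalized-logit case inherits the same deficiency.

```python
import numpy as np

def glm_design(codes, n_levels):
    # GLM-style (less-than-full-rank) coding: one indicator column per level
    return np.eye(n_levels)[codes]

def ref_design(codes, n_levels):
    # Reference coding: drop the last level's indicator, leaving k-1 columns
    return np.eye(n_levels)[codes][:, :-1]

rng = np.random.default_rng(0)
n = 10_000
a = rng.integers(0, 3, size=n)  # hypothetical 3-level predictor
b = rng.integers(0, 4, size=n)  # hypothetical 4-level predictor, independent of a

intercept = np.ones((n, 1))
X_glm = np.hstack([intercept, glm_design(a, 3), glm_design(b, 4)])
X_ref = np.hstack([intercept, ref_design(a, 3), ref_design(b, 4)])

corr_ab = np.corrcoef(a, b)[0, 1]
rank_glm = np.linalg.matrix_rank(X_glm)
rank_ref = np.linalg.matrix_rank(X_ref)

print(f"corr(a, b) = {corr_ab:.4f}")  # close to 0
print(X_glm.shape[1], rank_glm)       # 8 columns but rank 6: singular X'X
print(X_ref.shape[1], rank_ref)       # 6 columns, rank 6: full rank
```

The low correlation and the rank deficiency coexist here, which suggests the singularity comes from the coding itself rather than from collinearity between the predictors.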
As far as I know, the only difference between param = ref and param = glm is that param = glm uses less-than-full-rank reference cell coding: it creates k dummy variables for a categorical predictor with k levels, whereas param = ref creates k-1. The two parameterizations should therefore produce the same log-likelihood and the same estimates, given the same reference level. To confirm this, I compared the two models using only 2 predictors. Although param = glm throws the warning, its results are identical to those of param = ref, except that param = glm reports a set of zero estimates for the reference level of each predictor. Is that the cause?
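A small NumPy sketch can illustrate why the two fits agree and how the zero estimates can be read. It assumes simulated data, arbitrary illustrative coefficient values, and a single linear predictor (a multinomial model has one per non-reference outcome level, but each behaves the same way); it does not reproduce the SAS fit itself. Fixing the last-level coefficients at 0 in the GLM design gives exactly the reference-coded linear predictor, hence the same likelihood. Shifting every level of one predictor by a constant and the intercept by its negative leaves the linear predictor unchanged, so the likelihood is flat along that direction, which is what makes the information matrix singular.

```python
import numpy as np

def indicators(codes, n_levels):
    # One 0/1 column per level (GLM-style, less-than-full-rank coding)
    return np.eye(n_levels)[codes]

rng = np.random.default_rng(1)
n = 1_000
a = rng.integers(0, 3, size=n)  # hypothetical 3-level predictor
b = rng.integers(0, 4, size=n)  # hypothetical 4-level predictor
one = np.ones((n, 1))

# GLM-style design: intercept, a0..a2, b0..b3 (8 columns)
X_glm = np.hstack([one, indicators(a, 3), indicators(b, 4)])
# Reference design: intercept, a0..a1, b0..b2 (last level of each dropped)
X_ref = np.hstack([one, indicators(a, 3)[:, :-1], indicators(b, 4)[:, :-1]])

# Arbitrary reference-coded coefficients (illustrative values only)
beta_ref = np.array([0.5, -1.0, 0.3, 0.8, -0.2, 1.1])
# Same coefficients in GLM layout, with the aliased last levels fixed at 0
beta_glm = np.array([0.5, -1.0, 0.3, 0.0, 0.8, -0.2, 1.1, 0.0])

eta_ref = X_ref @ beta_ref
eta_glm = X_glm @ beta_glm
print(np.allclose(eta_ref, eta_glm))  # True: same linear predictor

# Direction in which the GLM likelihood is flat: -c on the intercept,
# +c on every level of a (each row has exactly one a indicator equal to 1)
c = 2.0
shift = np.array([-c, c, c, c, 0.0, 0.0, 0.0, 0.0])
print(np.allclose(X_glm @ shift, 0.0))                   # True: shift is in the null space of X
print(np.allclose(X_glm @ (beta_glm + shift), eta_glm))  # True: fit unchanged
```

Because the GLM coefficients are only identified up to such shifts, a procedure has to pick one solution (here, zeros on the last levels); the reference-coded design has no such null direction, so its information matrix can be inverted directly.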
My questions are: why does the param = glm model throw a warning while the param = ref model does not? And, more importantly, in this situation, should I trust the results of the param = ref model, given that it displayed no warning?
I appreciate any advice and suggestions. Thank you in advance.