As I mentioned on the landing page, the issue I sought to solve (or at least understand) is why my plots of logistic regression over aggregated data sometimes looked awful. Unfortunately, the article that my teammate gave me on marginal standardization didn’t solve the main issue I was trying to fix. Marginally standardized plots and plots with predictions at the mean were often similar, and neither addressed the weirdness I frequently saw.
I’m a considerably better programmer than statistician, so I think I’ll simulate a data set and do some experiments with different types of covariates. We’ll get a chance to try out this simstudy library that I saw on a blog. Here’s the vignette for reference. The code for these experiments can be found here.
For this experiment I’m going to create two outcome variables. One will be a binary response whose log odds change linearly through time, so that the incidence rises from the low 40s to the high 50s (in percent) over the entire time period. The other will be a normally distributed variable with a mean that moves from about -0.5 to 0.5 and a variance of 4. I’ve created roughly 1,000 observations for each of the 100 time points. Here are the aggregated values for each of the 100 time points:
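For anyone who wants to poke at a similar setup without simstudy, here is a rough stand-in in plain Python rather than the R code I actually ran. The exact endpoints (42% to 58% incidence), the fixed 1,000 rows per time point, the seed, and the function name `simulate_outcomes` are all assumptions I picked for illustration, so treat it as a sketch of the idea and not a copy of my simulation.

```python
import math
import random

def simulate_outcomes(n_times=100, n_per_time=1000, seed=42):
    """Return one aggregated row per time point: (t, incidence, mean of normal outcome)."""
    rng = random.Random(seed)
    # Assumed endpoints: incidence from 42% to 58%, normal mean from -0.5 to 0.5.
    lo_logit = math.log(0.42 / 0.58)
    hi_logit = math.log(0.58 / 0.42)
    rows = []
    for t in range(1, n_times + 1):
        frac = (t - 1) / (n_times - 1)  # 0 at the first time point, 1 at the last
        # Binary outcome: the log odds are linear in time.
        log_odds = lo_logit + frac * (hi_logit - lo_logit)
        p = 1 / (1 + math.exp(-log_odds))
        # Normal outcome: the mean is linear in time; variance 4 means sd 2.
        mu = -0.5 + frac * 1.0
        binary = [1 if rng.random() < p else 0 for _ in range(n_per_time)]
        normal = [rng.gauss(mu, 2.0) for _ in range(n_per_time)]
        rows.append((t, sum(binary) / n_per_time, sum(normal) / n_per_time))
    return rows

rows = simulate_outcomes()
for t, incidence, mean in (rows[0], rows[49], rows[-1]):
    print(f"t={t:3d}  incidence={incidence:.3f}  mean={mean:+.3f}")
```

Each tuple in `rows` is one aggregated time point, which is the level at which the plots in this post are drawn.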
After creating the outcome and time variables, I created a lot of binary and normal variables to act as covariates for our models. The binary versions were all created in sets of low, medium, and high incidence. Predictors that were correlated with time and/or an outcome were created in sets of 20, with the strength of the relationship increased by 5% from one predictor to the next. The types were:
Because I created the outcomes to be correlated with time, categories #2 & #3 are somewhat similar to the predictors in category #4. The difference is that, the way I programmed this, the predictors listed as being correlated only with time or only with an outcome have a stronger, direct relationship with that one variable.
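To make the "sets of 20 with gradually increasing strength" idea concrete, here is one way such a set of time-correlated binary predictors could be generated, again in plain Python as a stand-in for simstudy. The scheme is my own assumption for illustration: the hypothetical `max_slope` parameter, the 50% baseline for "medium incidence", and the reading of "5%" as steps of 0.05 on a 0-to-1 strength scale are not taken from the real code.

```python
import math
import random

def binary_predictor_set(base_p=0.5, n_levels=20, step=0.05, max_slope=2.0,
                         n_times=100, n_per_time=200, seed=7):
    """Simulate n_levels binary predictors whose dependence on time grows by `step` per level.

    Returns {strength: [incidence at each time point]}.
    """
    rng = random.Random(seed)
    base_logit = math.log(base_p / (1 - base_p))
    result = {}
    for level in range(1, n_levels + 1):
        strength = round(level * step, 2)  # 0.05, 0.10, ..., 1.00
        series = []
        for t in range(n_times):
            centered = t / (n_times - 1) - 0.5  # time rescaled to [-0.5, 0.5]
            # Assumed link: the time effect on the log odds scales with strength.
            logit = base_logit + strength * max_slope * centered
            p = 1 / (1 + math.exp(-logit))
            hits = sum(rng.random() < p for _ in range(n_per_time))
            series.append(hits / n_per_time)
        result[strength] = series
    return result

predictors = binary_predictor_set()
for strength in (0.05, 0.5, 1.0):
    series = predictors[strength]
    print(f"strength={strength:.2f}  first={series[0]:.2f}  last={series[-1]:.2f}")
```

At low strength the incidence barely drifts across the time points, while at the top of the set it trends clearly upward. Changing `base_p` would give the low and high incidence versions.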
The binary predictors of medium incidence that correlate with time only are graphed below: