This might be a very dumb question, but is it valid to compare a 2 compartment model using DV as an input versus a TMDD model using LNDV? As in, are the delta OFV values comparable to determine which one is better? Or would I need to compare the 2c LNDV vs TMDD LNDV or 2c DV vs TMDD DV model?
You should compare models for the same response variable, either DV in both models or LNDV in both models (I am assuming that’s the natural logarithm of DV). A more detailed explanation is given below for why DV and LNDV models are not directly comparable.
When LNDV is assumed to be normally distributed, this is equivalent to assuming that DV is log-normally distributed with a caveat. When sampling/simulating, simulating from DV ~ LogNormal(..) is equivalent to simulating from LNDV ~ Normal(..) and then taking the exp of the output. However, when calculating the log probability of LNDV given its normal distribution, this log probability is not equal to the log probability of DV given its corresponding log-normal distribution.
Luckily, the difference between the 2 log probabilities is a constant in the parameters and only a function of the data DV which is constant during fitting. This means that fitting both models will give you the same parameter estimates at the end, assuming the rest of the model is the same obviously. In Pumas, FOCE supports Normal and LogNormal distributions so you can use either without a significant difference in performance.
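To make the "constant in the parameters" point concrete, here is a minimal numerical sketch in plain Python (not Pumas). The `dv` values and the function names are made up for illustration, and the densities are the standard textbook normal and log-normal log densities. It evaluates both log likelihoods at a few arbitrary parameter values and checks that the gap between them never changes.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return (-math.log(sigma) - 0.5 * math.log(2 * math.pi)
            - (y - mu) ** 2 / (2 * sigma ** 2))

def lognormal_logpdf(x, mu, sigma):
    """Log density of LogNormal(mu, sigma) evaluated at x > 0."""
    return (-math.log(x) - math.log(sigma) - 0.5 * math.log(2 * math.pi)
            - (math.log(x) - mu) ** 2 / (2 * sigma ** 2))

# Hypothetical observations, invented for this sketch only.
dv = [0.8, 2.5, 4.1, 3.0, 1.2]
lndv = [math.log(x) for x in dv]

def loglik_gap(mu, sigma):
    """Normal log likelihood on LNDV minus log-normal log likelihood on DV."""
    ll_normal = sum(normal_logpdf(y, mu, sigma) for y in lndv)
    ll_lognormal = sum(lognormal_logpdf(x, mu, sigma) for x in dv)
    return ll_normal - ll_lognormal

# Three arbitrary parameter settings: the gap should be identical for all of them.
gaps = [loglik_gap(mu, sigma) for mu, sigma in [(0.0, 1.0), (0.7, 0.3), (1.5, 2.0)]]
print(gaps)
print(sum(lndv))  # the gap depends only on the data
```

Since the gap does not move when the parameters move, both objective functions have their optimum at the same parameter values, which is why the two fits agree.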
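However, the log likelihoods of the normal and log-normal models (using LNDV and DV as observations, respectively), and by extension their AICs and BICs, are not comparable. The following are the exact formulas for:
The log probability of a single observation \log(x) given a normal distribution with mean \mu and standard deviation \sigma:
$$\log p_{\text{Normal}}(\log(x) \mid \mu, \sigma) = -\log(\sigma) - \frac{1}{2}\log(2\pi) - \frac{(\log(x) - \mu)^2}{2\sigma^2}$$
and the log probability of the same observation x given a log-normal distribution with the same parameters \mu and \sigma:
$$\log p_{\text{LogNormal}}(x \mid \mu, \sigma) = -\log(x) - \log(\sigma) - \frac{1}{2}\log(2\pi) - \frac{(\log(x) - \mu)^2}{2\sigma^2}$$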
So the difference is -\log (x) for each observation. In other words, if you fit your model using LNDV and a normal distribution, you need to subtract the sum (across all observations) of LNDV from the log likelihood. The result is the log likelihood assuming the log-normal distribution model for DV. This means you can then compare it to any other model which models DV directly.
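As a rough sketch of that correction in plain Python (again not Pumas): the toy model below is just a normal distribution fitted to LNDV in closed form, and `dv`, `n_params`, and the variable names are hypothetical. In practice `ll_lndv` would be whatever log likelihood your LNDV fit reports.

```python
import math

# Hypothetical observations, invented for this sketch only.
dv = [0.8, 2.5, 4.1, 3.0, 1.2]
lndv = [math.log(x) for x in dv]
n = len(dv)

# Closed-form maximum likelihood fit of Normal(mu, sigma) to LNDV (toy model).
mu = sum(lndv) / n
sigma = math.sqrt(sum((y - mu) ** 2 for y in lndv) / n)

# Log likelihood on the LNDV scale, i.e. what the LNDV fit would report.
ll_lndv = sum(
    -math.log(sigma) - 0.5 * math.log(2 * math.pi) - (y - mu) ** 2 / (2 * sigma ** 2)
    for y in lndv
)

# The correction: subtract the sum of LNDV to get the log likelihood for DV.
ll_dv = ll_lndv - sum(lndv)

# Cross-check against the log-normal log likelihood of DV computed directly.
ll_dv_direct = sum(
    -math.log(x) - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
    for x in dv
)

n_params = 2                       # mu and sigma in this toy model
aic_dv = 2 * n_params - 2 * ll_dv  # AIC on the DV scale
print(ll_dv, ll_dv_direct, aic_dv)
```

On a -2 log likelihood scale the same correction becomes adding 2 times the sum of LNDV, so the corrected AIC and BIC go up by that amount relative to the uncorrected LNDV values.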
Note that the issue is not the normal vs log normal distribution. The issue is the response variable being different. It is fine to compare a normal distribution model for DV with a log-normal distribution model for DV. But a normal distribution model for LNDV is equivalent to a log-normal distribution model for DV which is why we need to do the above correction before comparing it to any other model for DV. I hope this helps.
What I recommend is, if you can, to just make sure you use the same response variable in both models. If you want to model LNDV as a Normal distribution, use DV ~ @. LogNormal(..) in the derived block in Pumas instead. FOCE will still work and you won’t need any error-prone corrections.
Hi Mohamed, thank you for this very thorough and helpful explanation! I appreciate you explaining things in depth. Your explanation makes perfect sense, as does the practical recommendation to use the same response variable in both models so that they are directly comparable. Thank you!