Custom distribution / logpdf / error model with Pumas

Hello all,

I’m trying to build a custom model to use in Pumas and am wondering whether it is possible. Ultimately I want a custom longitudinal Graded Response Model; first, however, I’m trying out a custom static Graded Response Model (GRM). The logpdf for a GRM is relatively easy to code (see below). The difficulty I’m having is figuring out how to get it into the Pumas NLME estimation framework (FOCE). I have this all working in a Turing.jl estimation framework and am now trying to port things over to Pumas. For a little background, a GRM predicts the discrete probabilities for a collection of ordinal (polytomous) domain scores and estimates a latent trait parameter for each individual (theta; this has IIV, hence NLME) as well as some domain parameters (no IIV).
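For concreteness, with latent trait $\theta$ and strictly increasing thresholds $b_1 < \dots < b_K$ for an item, the category probabilities that the logpdf below implements are

$$
P(Y = y \mid \theta) = \sigma(\theta - b_y) - \sigma(\theta - b_{y+1}), \qquad y = 0, 1, \dots, K,
$$

with the conventions $\sigma(\theta - b_0) \equiv 1$ and $\sigma(\theta - b_{K+1}) \equiv 0$, where $\sigma$ is the logistic function.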

This is a simple model with a simple logpdf. I have it running in Turing with @addlogprob! and have also built a custom GRM distribution within the Distributions.jl type system (with rand, pdf, and logpdf methods defined on it). However, when I try to use this in Pumas, I get an error saying the distribution is not supported for this approximation method. To be clear, Turing.jl is fine with this distribution, so it is not a construction issue. I’m assuming Pumas only works with a subset of Distributions.jl.

Is there any way to get a custom probabilistic model into a Pumas model? Either a way to get Pumas to accept the custom distribution, or some equivalent of @addlogprob!?

```julia
using Distributions
using LogExpFunctions: softplus, log1mexp
import Distributions: logpdf    # so the logpdf method below extends Distributions.logpdf

struct GradedResponseDist{T<:Real, V<:AbstractVector{<:Real}} <: DiscreteUnivariateDistribution
    θ::T
    b::V
    function GradedResponseDist(θ::T, b::V) where {T<:Real, V<:AbstractVector{<:Real}}
        length(b) ≥ 1 || throw(ArgumentError("b must have ≥ 1 threshold"))
        @inbounds for k in 1:length(b)-1
            if !(b[k] < b[k+1])
                println("b vector: ", b)
                throw(DomainError(b, "thresholds must be strictly increasing"))
            end
        end
        new{T, V}(θ, b)
    end
end

# log-mass with stable difference of sigmoids
function logpdf(d::GradedResponseDist, y::Int)
    insupport(d, y) || return -Inf
    θ, b = d.θ, d.b
    K = length(b)

    if y == 0
        # log(1 - σ(x)) = -softplus(x)
        return -softplus(θ - b[1])
    elseif y == K
        # log(σ(x)) = -softplus(-x)
        return -softplus(-(θ - b[K]))
    else
        x1 = θ - b[y]
        x2 = θ - b[y + 1]
        la = -softplus(-x1)              # log σ(x1)
        lb = -softplus(-x2)              # log σ(x2)
        return la + log1mexp(lb - la)    # log(σ(x1) - σ(x2))
    end
end

# Not shown in the original post (which also defines rand and pdf): a support
# definition so the insupport call above resolves for this type.
Distributions.insupport(d::GradedResponseDist, y::Real) = isinteger(y) && 0 ≤ y ≤ length(d.b)
```
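As a quick sanity check (illustrative values of my own, not from the post), the probability mass should sum to one over the support:

```julia
d = GradedResponseDist(0.3, [-1.0, 0.0, 1.5])        # K = 3 thresholds, scores 0..3
sum(exp(logpdf(d, y)) for y in 0:3)                  # ≈ 1.0
```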

```julia
# This @derived block does not work. The full model is omitted for brevity,
# and the error comes from the custom distribution in any case.
@derived begin
    D1 ~ @. GradedResponseDist(θ, b1)
    D2 ~ @. GradedResponseDist(θ, b2)
    D3 ~ @. GradedResponseDist(θ, b3)
    D4 ~ @. GradedResponseDist(θ, b4)
    D5 ~ @. GradedResponseDist(θ, b5)
    D6 ~ @. GradedResponseDist(θ, b6)
    D7 ~ @. GradedResponseDist(θ, b7)
end
```

Long story short: please try `LaplaceI` instead of `FOCE`.

The longer story is that FOCE requires being able to compute the Hessian approximation $\mathrm{E}\left(\frac{\partial l(\theta,\eta)}{\partial \eta}\frac{\partial l(\theta,\eta)}{\partial \eta}^T \mid \eta \right)$ for your custom error model. This expectation needs to be provided for each supported distribution, and since your custom distribution is brand new, there is no matching definition of the Hessian approximation, hence the error message. In contrast, `LaplaceI` relies on the actual Hessian instead of the approximation. The problem with `LaplaceI`, of course, is that it sometimes collapses. We have plans for making the FOCE support extensible, but it will require some work, so we are not sure when it will be ready.
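For reference, switching the approximation is just a matter of which marginal-likelihood approximation object you pass to `fit`; a minimal sketch, where `grm_model`, `pop`, and `params` stand in for your model, Population, and initial parameter NamedTuple:

```julia
# Laplace approximation with the exact Hessian (works with custom distributions)
ft = fit(grm_model, pop, params, Pumas.LaplaceI())

# versus the FOCE fit that errors for the custom distribution:
# ft = fit(grm_model, pop, params, Pumas.FOCE())
```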

Thank you for the response. That worked! I think you answered a previous question about MixtureModels where `LaplaceI` was required too, but I didn’t realize why.

After digging around, a possibly better option in this case may be to generate the probabilities as plain numbers and then construct a standard Categorical from them.
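Here is a minimal sketch of that idea (the helper name `grm_probs` and the example values are mine, not from the model). Note that `Categorical` numbers its categories `1:K+1`, so a GRM score `y ∈ 0:K` corresponds to category `y + 1`:

```julia
using Distributions
using LogExpFunctions: logistic

# Category probabilities for one item: successive differences of sigmoids,
# mirroring the logpdf above. Returns K + 1 probabilities for scores 0..K.
function grm_probs(θ::Real, b::AbstractVector{<:Real})
    cum = logistic.(θ .- b)                                    # P(Y ≥ k) for k = 1..K
    return vcat(1 - cum[1], cum[1:end-1] .- cum[2:end], cum[end])
end

# Wrap the numeric probabilities in a standard Categorical (categories 1..K+1).
d = Categorical(grm_probs(0.3, [-1.0, 0.0, 1.5]))
```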

I do have one follow-up question that seems to be related to the internals of Pumas. When modeling item response theory, you essentially have J questions (J unique DV variables), and for each of them you form a Categorical. In principle, a lot of this should be wrappable in a map call (no loop, of course, to avoid mutation). However, when I try to formulate a Pumas model like this, I keep getting a BoundsError.

For example, in the code below, I can construct p1…p3 in the @pre block and then access them in the @derived block with no problem. But if I instead build the vector of all of them and index into it in the @derived block, I end up with a BoundsError. Is there a way to do something like this?

That is, if I have J different observed variables (J DVs) of the same type, which can reasonably be treated the same and looped over, and I need the model to predict all of them, is there a way to subsume some of this into loops? I’ve tried map and @eval approaches and both end up throwing errors. I’m assuming there are some things happening on the parsing side that I just don’t understand.


```julia
# Note: J, Kj, build_thresholds, and compute_item_probs are user-defined outside the model.
grm_model = @model begin
    @param begin
        # Vector of first thresholds by item.
        b0 ∈ VectorDomain(3; init = 0.0)              # per-item base

        # θ related parameters
        μθ ∈ RealDomain(; init = 0.0)
        σ_θ ∈ RealDomain(; init = 1.0, lower = 0.0)             # SD of η; may be fixed for identifiability

        # Threshold increments per item
        δ1 ∈ VectorDomain(Kj[1]-1, init = 0.2, lower = 0.0)
        δ2 ∈ VectorDomain(Kj[2]-1, init = 0.2, lower = 0.0)
        δ3 ∈ VectorDomain(Kj[3]-1, init = 0.2, lower = 0.0)
    end

    @random begin
        η ~ Normal(0.0, σ_θ)    # one random effect for θ
    end

    @pre begin
        θ = μθ + η      # subject-specific latent trait

        deltas = [δ1, δ2, δ3]

        # Build threshold vectors for each item
        thresh_vec_of_vecs = map(i -> build_thresholds(b0[i], deltas[i], Kj[i]), 1:J)

        # Working code
        p1 = compute_item_probs(thresh_vec_of_vecs[1], θ)
        p2 = compute_item_probs(thresh_vec_of_vecs[2], θ)
        p3 = compute_item_probs(thresh_vec_of_vecs[3], θ)

        ### Not working code ###
        # p_all = map(i -> compute_item_probs(thresh_vec_of_vecs[i], θ), 1:J)


    end

    # ---------- Explicit @derived for the 3 baseline items ----------
    @derived begin

        # Working code
        D1 ~ @. Pumas.Categorical(p1)
        D2 ~ @. Pumas.Categorical(p2)
        D3 ~ @. Pumas.Categorical(p3)

        ### Not working code ###
        # p_all = map(i -> compute_item_probs(thresh_vec_of_vecs[i], θ), 1:J) # Tried this line in both pre and derived, not both.
        # D1 ~ @. Pumas.Categorical(p_all[1])
        # D2 ~ @. Pumas.Categorical(p_all[2])
        # D3 ~ @. Pumas.Categorical(p_all[3])
    end
end
```

A key difference between `@pre` and `@derived` is the handling of time.

In `@pre` we evaluate everything as a function of time. The whole block is written under the assumption that some time point has been set, and you then write code that combines the different parameters (it is all written for a generic time point `t`, which is also the reserved keyword you can use in the block).

`@derived` works differently. Each variable, even one with the same name as a variable defined in another block, is interpreted as a vector of the same length as the subject’s time vector. So if `sub.time == [0.0, 1.0, 2.0]`, then any variable referred to in `@derived` can be indexed `var[1]`, `var[2]`, `var[3]` to get the value of `var` (defined in, say, `@pre`) at time points `0.0`, `1.0`, `2.0` respectively.

So in your `@pre` block you can define `p_all` to be, at a given time point, a length-`J` vector of per-item probability vectors, but when you get to the `@derived` block you will find that `p_all` is now a length-`T` vector of those `J`-element vectors, one set of probabilities for each observed time point.

We can discuss the optimal way to write an IRT model, but focusing on the issue at hand, this means that
```
D1 ~ @. Pumas.Categorical(p_all[1])
D2 ~ @. Pumas.Categorical(p_all[2])
D3 ~ @. Pumas.Categorical(p_all[3])
```
does not work, because `p_all[1]` is the vector at the first time point, `p_all[2]` the one at the second time point, etc., and my guess is that you only have one time point? So the second and third lines will throw bounds errors.
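To make the shapes concrete, here is a plain-Julia illustration (the values are made up) of what `p_all` looks like in the two blocks:

```julia
# In @pre, at one time point: a length-J vector of per-item probability vectors.
p_all_pre = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]   # J = 3 items

# In @derived, the same name is a length-T vector holding that value at each
# observation time. With a single observation time (T = 1):
p_all = [p_all_pre]

p_all[1]        # the J probability vectors at the first (and only) time point
# p_all[2]      # BoundsError: there is no second time point
```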

Instead, you could write
```
D1 ~ map(p -> Pumas.Categorical(p[1]), p_all)
D2 ~ map(p -> Pumas.Categorical(p[2]), p_all)
D3 ~ map(p -> Pumas.Categorical(p[3]), p_all)
```
and I believe that should “work”. We can discuss how you should write these models, but I thought it was an important aspect to understand.