Blurb: Forward Sample Exploration in Soss

May 10, 2021

Over the past few months, we've made some big improvements to our tools for working with samples in Soss. This short post gives an update.

Say you have a linear model:

using Soss
using MeasureTheory
using TupleVectors

m = @model n begin
    α ~ Normal()
    β ~ Normal()
    x ~ Normal() |> iid(n)
    σ ~ Exponential(λ=1)
    y ~ For(x) do xj
        Normal(α + β * xj, σ)
    end
    return y
end
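
To make the generative story concrete, a single forward draw corresponds to something like this plain-Julia sketch (my paraphrase of the generative process; forward_draw is a made-up helper, not Soss API):

```julia
# Plain-Julia sketch of one forward draw from the model above
# (a paraphrase for intuition, not what Soss actually runs):
function forward_draw(n)
    α = randn()                       # α ~ Normal()
    β = randn()                       # β ~ Normal()
    x = randn(n)                      # x ~ Normal() |> iid(n)
    σ = -log(rand())                  # σ ~ Exponential(λ=1), via inverse CDF
    y = α .+ β .* x .+ σ .* randn(n)  # y[j] ~ Normal(α + β * x[j], σ)
    return y
end

forward_draw(3)                       # a Vector{Float64} of length 3
```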

Since the model ends with return y, calling rand "runs" the model and returns just the sampled y:

rand(m(3))
3-element Vector{Float64}:
 -1.3481528254815156
 -0.8467839966758355
 -3.2793581323670113

If you want more (iid) samples from the model, you can do:

rand(m(3),5)
5-element ArraysOfArrays.ArrayOfSimilarArrays{Float64, 1, 1, 2, ElasticArrays.ElasticMatrix{Float64, 1, Vector{Float64}}}:
 [-0.5801870698412048, -0.7831790492917818, -3.694601328454099]
 [2.0979335431889803, 3.5630270546059557, 2.769652054248688]
 [-3.1107934379484066, -2.6920071046722596, -2.1705448322142686]
 [1.208651101972964, 1.7374231087644396, 2.2250207982499965]
 [1.4589264372159674, 1.2845073278482224, 1.3285317115334254]
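
That ArrayOfSimilarArrays is a vector of samples backed by one flat matrix, and (if I remember the ArraysOfArrays API right) flatview exposes the backing array directly. A self-contained sketch, with plain randn data standing in for the model output:

```julia
using ArraysOfArrays

# Stand-in for rand(m(3), 5): five length-3 samples stored
# column-wise in a single 3×5 matrix
samples = nestedview(randn(3, 5), 1)

flatview(samples)   # the backing 3×5 Matrix, no copying
```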

Note the ArraysOfArrays type: the underlying storage is a single Matrix. Still, this quickly becomes a wall of numbers that's hard to track. We'll be adding more tools to make rand output easier to work with (note to self...). For now, let's compare simulate:

simulate(m(3), 5)
5-element TupleVector with schema (trace = (x = Vector{Float64}, y = Vector{Float64}, α = Float64, β = Float64, σ = Float64), value = Vector{Float64})
(trace = (x = [-0.51±2.0, -0.58±1.3, -0.13±0.72], y = [-0.43±1.7, -0.54±1.0, 0.05±1.1], α = -0.1±0.72, β = -0.37±0.91, σ = 0.64±0.54), value = [-0.43±1.7, -0.54±1.0, 0.05±1.1])

Much better! Now we can easily do things like

mysim = simulate(m(3), 1000)
1000-element TupleVector with schema (trace = (x = Vector{Float64}, y = Vector{Float64}, α = Float64, β = Float64, σ = Float64), value = Vector{Float64})
(trace = (x = [0.02±0.95, 0.07±0.99, 0.0±0.97], y = [-0.06±1.9, 0.04±1.9, -0.0±2.0], α = -0.001±0.97, β = 0.02±1.0, σ = 1.0±1.0), value = [-0.06±1.9, 0.04±1.9, -0.0±2.0])

Just to be clear, what we're looking at here is a summary of 1000 samples. Here's one of them:

mysim[1]
(trace = (x = [0.30021947827881323, -1.7858459256694803, 1.077205113757541], y = [0.6374538286688411, -5.9348120120560885, 0.7044161798044803], α = -0.4563348174767442, β = 2.0661080442940025, σ = 0.8741722654925329), value = [0.6374538286688411, -5.9348120120560885, 0.7044161798044803])
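
With all the draws in one place, per-variable summaries are easy. Assuming field access on the trace collects a variable's draws into a flat vector (which is what the ±-style summaries above suggest), something like this should work:

```julia
using Statistics

# Continuing from the simulation above; mysim.trace.α is
# (presumably) the Vector{Float64} of all 1000 α draws
αs = mysim.trace.α
mean(αs), std(αs)   # α has a Normal() prior, so expect roughly (0, 1)
```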

It can be handy to make the names of a TupleVector available as variables, transform them, and then wrap the result into a new TupleVector. For that, we have @with. Here are the expected y values and the studentized residuals:

mytrace = mysim.trace
@with mytrace begin
    Ey = α .+ β .* x
    r = (y - Ey) / σ
    (;Ey, r)
end
1000-element TupleVector with schema (Ey = Vector{Float64}, r = Vector{Float64})
(Ey = [-0.0±1.4, -0.04±1.4, 0.0±1.3], r = [0.03±1.0, 0.06±0.96, 0.0±1.1])

We could even be a little fancy and do

@with mysim begin
    @with trace begin
        Ey = α .+ β .* x
        r = (y - Ey) / σ
        (;Ey, r)
    end
end
1000-element TupleVector with schema (Ey = Vector{Float64}, r = Vector{Float64})
(Ey = [-0.0±1.4, -0.04±1.4, 0.0±1.3], r = [0.03±1.0, 0.06±0.96, 0.0±1.1])

You can do this with posterior samples too! Let's save that one for next time.