# Quicksand: Low-level probabilistic programming in Terra

Quicksand is a library for probabilistic programming in the Terra programming language. Since Terra is a low-level language, Quicksand programs have a level of abstraction comparable to that of C. Users can write these programs directly, or potentially use Quicksand as a compiler target for higher-level/domain-specific languages.

- Installation
- Simple Quicksand Programs
- Real Numbers
- Random Choices
- Likelihoods and Conditioning
- Functions and Loops
- Programs and Modules
- Inference
- Acknowledgments

## Installation

You’ll first need to download and build Terra.

**Important: Quicksand works with the 2016-02-26 release of Terra, but will not work with later versions (due to the switch from lazy to eager typechecking).** This project is not under active development, so there are currently no plans to overhaul Quicksand to make it compatible with future versions of Terra.

Then just add the following to your `.profile`, `.bashrc`, or equivalent file:

```
export QS=[path to quicksand root]
export TERRA_PATH="./?.t;./?/init.t;$QS/?.t;$QS/?/init.t;;"
```

To verify that everything’s working correctly, you can run the test suite and check that all tests pass:

```
terra tests/testsuite.t
```

Quicksand has been tested on OSX and should also work on Linux. On Windows, your mileage may vary.

## Simple Quicksand Programs

Let’s take a look at a simple Quicksand program–a probabilistic “Hello, World,” if you will:

```
local qs = require("qs")

local p1 = qs.program(function()
    return terra()
        var a = qs.flip(0.5)
        var b = qs.flip(0.5)
        qs.condition(a or b)
        return (a and b)
    end
end)
```

The probabilistic program `p1` draws two random values via unbiased coin flips (the 0.5 means “50% chance of coming up true”) and returns the logical AND of those two values, subject to the constraint that at least one of them is true.
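
Because this program makes only two binary choices, its posterior can be checked by exact enumeration. Here is a minimal Python sketch of the same model (a hypothetical illustration, not Quicksand code):

```python
from itertools import product

def posterior_true_prob():
    kept = []  # (return value, probability) for executions surviving the condition
    for a, b in product([True, False], repeat=2):
        if not (a or b):
            continue                  # qs.condition(a or b) rejects this execution
        kept.append((a and b, 0.25))  # each pair of fair flips has probability 1/4
    total = sum(p for _, p in kept)
    return sum(p for v, p in kept if v) / total

print(posterior_true_prob())  # 1/3
```

Of the three executions that survive the condition (TT, TF, FT), only TT returns true, so the program returns true with probability 1/3.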

Here’s a slightly more complex (and useful) example: estimating parameters of a Gaussian mixture model from data:

```
local qs = require("qs")
local std = require("qs.lib.std")
local distrib = require("qs.distrib")

local p2 = qs.program(function()

    -- Populate a global vector with synthetic data
    local data = global(std.Vector(qs.real))
    local norm = distrib.gaussian(qs.real).sample
    local terra initdata()
        data:init()
        for i=0,400 do
            data:insert(norm(-6.0, 1.0))
        end
        for i=0,200 do
            data:insert(norm(0.0, 1.0))
        end
        for i=0,100 do
            data:insert(norm(5.0, 1.0))
        end
    end
    initdata()

    -- Here's our mixture model
    return terra()
        var mixparams = qs.dirichlet(arrayof(qs.real, 1.0, 1.0, 1.0))
        var means = arrayof(qs.real, qs.gaussian(0.0, 5.0),
                                     qs.gaussian(0.0, 5.0),
                                     qs.gaussian(0.0, 5.0))
        for d in data do
            var which = qs.categorical(mixparams)
            qs.gaussian.observe(d, means[which], 1.0)
        end
        return mixparams, means
    end
end)

-- Query the model for the parameters with highest
-- posterior probability.
local infer = qs.infer(p2, qs.MAP,
                       qs.MCMC(qs.TraceMHKernel(),
                               {numsamps=50000, verbose=true}))

local terra run()
    var mixparams, means = infer()
    std.printf("mixparams: %g %g %g\n",
               mixparams[0], mixparams[1], mixparams[2])
    std.printf("means: %g %g %g\n",
               means[0], means[1], means[2])
end
run()
```

Here, the program `p2` draws mixture parameters and mixture component means from Dirichlet and Gaussian priors, respectively. For each data point, it then selects which mixture component the data point was drawn from (a latent variable), and then accounts for the probability of drawing that data point from that mixture component.
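
The generative process that `p2` encodes can be sketched in Python (a hypothetical illustration using only standard-library samplers, not Quicksand code):

```python
import random

def sample_dirichlet(alphas, rng):
    # A Dirichlet draw is a vector of gamma draws, normalized to sum to 1
    xs = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(xs)
    return [x / s for x in xs]

def sample_categorical(params, rng):
    # Inverse-CDF sampling over the mixture weights
    u, acc = rng.random(), 0.0
    for i, p in enumerate(params):
        acc += p
        if u < acc:
            return i
    return len(params) - 1

def forward_sample(data, rng):
    mixparams = sample_dirichlet([1.0, 1.0, 1.0], rng)  # like qs.dirichlet
    means = [rng.gauss(0.0, 5.0) for _ in range(3)]     # like qs.gaussian
    assignments = [sample_categorical(mixparams, rng)   # like qs.categorical
                   for _ in data]
    return mixparams, means, assignments

rng = random.Random(0)
mixparams, means, assignments = forward_sample([0.0] * 10, rng)
```

During inference, each latent assignment is scored against its data point by the `qs.gaussian.observe` statement rather than the data being sampled.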

If you run this code, you should see output like the following:

```
Iteration 50000/50000
Acceptance Ratio: 23603/50000 (47.206%)
Time: 2.753325
mixparams: 0.123469 0.348266 0.528265
means: 5.21575 -0.0444654 -5.95487
```

That’s not a bad guess, computer. Not bad at all.

You can find more examples of probabilistic programs over at the Forest repository for generative models. Most of these programs are written in Church (a LISP-based probabilistic programming language), but nearly all of the same principles apply. The online Probabilistic Models of Cognition course is also a great source for more examples and explanation.

In the rest of this document, we’ll describe the parts that go into building Quicksand programs, as well as the procedures available for performing inference on them.

## Real Numbers

Within a probabilistic program, any real (i.e. floating point) number used should have the type `qs.real`. This defaults to `double`, but can be changed by modifying the value of `qs.float` in `qs/globals.t`. Inference methods can also change the value of `qs.real` (e.g. Hamiltonian Monte Carlo).

## Random Choices

Random choices are what make probabilistic programs probabilistic. A program makes some number of random choices during its execution, and inference methods then manipulate the values of those random choices to explore the space of possible program executions.

Quicksand has a number of built-in random choice functions:

- `qs.flip(p)`: Flip a coin with weight `p` (i.e. a Bernoulli distribution).
- `qs.uniform(lo, hi)`: A number uniformly distributed between `lo` and `hi`.
- `qs.gaussian(m, sd)`: A number normally distributed with mean `m` and standard deviation `sd`.
- `qs.gamma(k, theta)`: Draw from a gamma distribution with shape parameter `k` and scale parameter `theta`.
- `qs.gammamv(m, v)`: Draw from a gamma distribution with mean `m` and variance `v`. This is just syntax sugar on top of `qs.gamma`.
- `qs.beta(a, b)`: Draw from a beta distribution with shape parameters `a` and `b`.
- `qs.betamv(m, v)`: Draw from a beta distribution with mean `m` and variance `v`. This is just syntax sugar on top of `qs.beta`.
- `qs.binomial(p, n)`: Draw from a binomial distribution with probability `p` and number of trials `n`.
- `qs.poisson(lambda)`: Draw from a Poisson distribution with average rate `lambda`.
- `qs.categorical(params)`: Draw from a categorical distribution. `params` may be either an array or a vector of real numbers.
- `qs.dirichlet(params)`: Draw from a Dirichlet distribution. `params` may be either an array or a vector of real numbers.
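
The `*mv` variants correspond to standard moment-matching conversions between (mean, variance) and the native shape parameters. A Python sketch of the usual formulas (Quicksand's own conversions live in `qs/erpdefs.t`, so treat these as illustrative):

```python
def gamma_mv_to_shape_scale(m, v):
    # A gamma with shape k and scale theta has mean k*theta and
    # variance k*theta^2, so k = m^2/v and theta = v/m
    return m * m / v, v / m

def beta_mv_to_shape_params(m, v):
    # A beta with shapes a, b has mean a/(a+b) and
    # variance a*b / ((a+b)^2 * (a+b+1))
    c = m * (1.0 - m) / v - 1.0
    return m * c, (1.0 - m) * c
```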

In addition to their native parameters, every random choice can also take a set of options, specified as a struct literal:

```
qs.gaussian(0.0, 1.0, {init=0.0, struc=false})
```

- `init`: Specifies the initial value of the random choice for MCMC inference.
- `struc`: Declares whether this random choice can affect the *structure* of the program execution trace–more specifically, whether the value of this random choice can determine the existence of other random choices. This must be a boolean compile-time constant (`true`, `false`, or a Lua boolean variable), and it defaults to `true` for all choices unless otherwise specified. In principle, whether a choice is structural can be determined automatically through program analysis. To keep the system as simple as possible, though, Quicksand requires manual specification. It is usually not difficult to identify when a choice is non-structural; in fact, most random choices are, except for some latent variables in hierarchical or recursive models. For example, in this toy program:

```
qs.program(function()
    return terra()
        var a = 0.7
        if qs.flip(0.9) then
            a = qs.beta(1.0, 5.0, {struc=false})
        end
        var b = qs.flip(a, {struc=false})
        qs.condition(b)
        return a
    end
end)
```

the first `qs.flip` is structural, because it determines whether or not the `qs.beta` gets called. However, the `qs.beta` and the second `qs.flip` are non-structural, since they don’t affect the creation of any further random choices.

Why bother with this? Annotating your random choices with `struc` tags will make most inference methods run noticeably faster. Additionally, many inference methods operate only on non-structural choices, so you must tag your random choices in order to use those methods.

You can define your own primitive random choice functions using `makeRandomChoice` in `qs/erp.t`. See `qs/erpdefs.t` for usage examples. The interface to random choice creation is subject to change, so it is not documented in any further detail here.

## Likelihoods and Conditioning

A probabilistic program is only interesting if we introduce some condition or constraint on its execution–otherwise, we could just run it forward to obtain samples from the distribution it defines.

Every random choice comes equipped with a method to observe (or condition on) a value, instead of drawing one randomly. For example:

```
qs.gaussian.observe(0.2, 0.0, 1.0)
```

This adjusts the probability of the current program execution according to the probability of observing the value `0.2` drawn from a standard normal distribution.

`qs.condition` imposes a hard constraint on a program’s execution:

```
var x = qs.beta(2.0, 2.0)
qs.condition(x > 0.5)
```

This will cause all program executions with `x <= 0.5` to be rejected. Imposing many or difficult-to-satisfy constraints in this way can cause inference to become inefficient, however.
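
To see why hard-to-satisfy constraints hurt, consider plain rejection sampling, which is what a hard constraint amounts to in the simplest case. A Python sketch (hypothetical, not Quicksand code):

```python
import random

def rejection_sample(condition, rng, max_tries=10000):
    # Keep re-running the "program" until the condition holds
    tries = 0
    while tries < max_tries:
        tries += 1
        x = rng.betavariate(2.0, 2.0)
        if condition(x):
            return x, tries
    raise RuntimeError("constraint never satisfied")

rng = random.Random(0)
x, tries = rejection_sample(lambda v: v > 0.5, rng)        # ~2 tries on average
y, tries_hard = rejection_sample(lambda v: v > 0.95, rng)  # ~140 tries on average
```

The tighter the constraint, the more executions get thrown away before one is accepted.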

`qs.conditionfunc` automatically applies the return value of a function as a condition:

```
local p = qs.program(function()
    local cf = qs.conditionfunc(terra(x: qs.real)
        return x > 0.5
    end)
    return terra()
        var x = qs.beta(2.0, 2.0)
        cf(x)
    end
end)
```

While overkill for the example above, this can be useful for encapsulating conditions that are expensive to compute. Additionally, some inference methods can sometimes skip executing these functions when it is not necessary for their operation, which can save computation.

`qs.factor` directly adds to the log probability of the current program execution:

```
var x = qs.flip(0.5)
var y = qs.flip(0.5)
if x == y then qs.factor(100.0) end
if x ~= y then qs.factor(-100.0) end
```

Useful for implementing undirected models such as Ising models and Markov Random Fields.
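
The effect of the factors in the example above can be checked by exact enumeration in Python (a hypothetical sketch, not Quicksand code): each factor is an additive log-probability term, so nearly all posterior mass ends up on executions where `x == y`.

```python
import math

def factor_posterior():
    log_weights = {}
    for x in (True, False):
        for y in (True, False):
            lw = math.log(0.25)                 # two fair coin flips
            lw += 100.0 if x == y else -100.0   # the qs.factor statements
            log_weights[(x, y)] = lw
    # Normalize in log space for numerical stability
    mx = max(log_weights.values())
    ws = {k: math.exp(v - mx) for k, v in log_weights.items()}
    z = sum(ws.values())
    return {k: w / z for k, w in ws.items()}

post = factor_posterior()
```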

`qs.factorfunc` is to `qs.factor` what `qs.conditionfunc` is to `qs.condition`. The same note about potential computation savings also applies here.

## Functions and Loops

Function calls and loops can affect the control flow structure of a program, so Quicksand provides some constructs to help inform inference methods about this structure.

All Terra functions that make random choices (or whose callees make random choices) should be wrapped with `qs.func`:

```
local f = qs.func(terra(m: qs.real, sd: qs.real)
    return qs.gaussian(m, sd)
end)
```

Slightly different syntax is necessary for recursive functions:

```
local g = qs.func()
g:define(terra(p: qs.real)
    if qs.flip(p) then
        return 0.0
    else
        return 1.0 + g(p)
    end
end)
```

Note that it is entirely possible to use unwrapped Terra functions in your probabilistic programs: Quicksand will not warn you about this (and in fact has no mechanism to do so). Inference will always be correct regardless, but using `qs.func` will make it more efficient for programs that exhibit structure change (and especially for recursive programs).

`qs.method` is a variant of `qs.func` for wrapping struct methods:

```
local struct Blob { val: qs.real }
Blob.methods.sampleVal = qs.method(terra(self: &Blob)
    self.val = qs.uniform(0.0, 1.0)
end)
```

`qs.range` provides an iterable range of integers:

```
var sum = 0
for i in qs.range(0, 10) do
    sum = sum + int(qs.flip(0.5))
end
```

Other looping constructs, such as native for loops, while loops, or other iterator-based loops, are also perfectly fine to use with Quicksand. However, `qs.range` tracks information about loop nesting, so using it allows more efficient inference in e.g. programs that use nested loops where the number of loop iterations is derived from random choices.

## Programs and Modules

Quicksand has constructs for separating probabilistic code (i.e. code that makes random choices) from vanilla Terra code.

`qs.program` defines a probabilistic program. This is the object that inference methods operate on. It takes one argument, a Lua function, and that Lua function should return a no-argument Terra function–the ‘main’ function of the program, essentially:

```
local p = qs.program(function()
    -- Define helper functions, types, etc.
    return terra()
        -- Make random choices, invoke other functions, etc.
        return [something]
    end
end)
```

Any code that makes random choices (functions, struct methods) **must** be defined inside of a `qs.program`. Code within a probabilistic program goes through a special compilation process, and attempting to define probabilistic code outside of a probabilistic program results in undefined behavior (most likely manifesting as inscrutable error messages).

For large projects, it can be useful to develop probabilistic programs modularly, instead of putting all of your code in one giant, monolithic `qs.program`. `qs.module` allows you to do this:

```
local mod = qs.module(function()
    return {
        dirichletCategorical3 = qs.func(terra()
            return qs.categorical(qs.dirichlet(array(1.0, 1.0, 1.0)))
        end)
    }
end)
```

The argument to `qs.module` is a Lua function that can return anything (but typically a Lua table of functions and types). To use this module in a probabilistic program, it has to be “opened” with the `:open` method:

```
local p = qs.program(function()
    local m = mod:open()
    return terra()
        var items = array(10, 22, 30)
        var choices : int[10]
        for i=0,10 do
            choices[i] = items[m.dirichletCategorical3()]
        end
        qs.condition(choices[0] == 22)
        return choices
    end
end)
```

Additionally, you might have some code defined inside a probabilistic program that you’d also like to use outside of it. For instance, suppose the return value of your program is a struct with some probabilistic methods, and you’d like to do some processing on it. `qs.module` facilitates this use case, too, with the `:openAs` method:

```
local mod = qs.module(function()
    local struct Vec { data: qs.real[2] }
    Vec.methods.sampleRand = qs.method(terra(self: &Vec)
        self.data[0] = qs.gaussian(0.0, 1.0, {struc=false})
        self.data[1] = qs.gaussian(0.0, 1.0, {struc=false})
    end)
    terra Vec:normSq()
        return self.data[0]*self.data[0] +
               self.data[1]*self.data[1]
    end
    return { Vec = Vec }
end)

local p = qs.program(function()
    local m = mod:open()
    return terra()
        var vecs : m.Vec[10]
        for i=0,10 do
            vecs[i]:sampleRand()
            qs.condition(vecs[i]:normSq() > 0.5)
        end
        return vecs
    end
end)

-- Function for processing the return value
local m = mod:openAs(p)
local terra totalNorm(vecs: m.Vec[10])
    var sum = vecs[0]:normSq()
    for i=1,10 do
        sum = sum + vecs[i]:normSq()
    end
    return sum
end
```

`:openAs` is necessary because probabilistic code compiles differently depending on the program that is using it. In fact, it is an error to use `:open` outside of a `qs.program`.

## Inference

Finally, we get to the real heart of Quicksand–performing inference on programs.

`qs.infer` is the entrypoint for all inference in Quicksand. It takes three arguments: a probabilistic program, an inference query to perform, and a method to use to answer that query. For example:

```
local p = qs.program(function()
    return terra()
        return qs.gaussian(0.0, 0.2, {struc=false})
    end
end)

-- qs.infer(program, query, method)
local samples = qs.infer(p, qs.Samples, qs.ForwardSample(1000))()
```

`qs.infer` returns a no-argument Terra function, which, when invoked, performs the requested inference (hence the extra set of parens at the end of the last line above).

Next, we’ll cover the interface to method and query types, including those that are built-in to Quicksand.

Inference in Quicksand is based on sampling, so inference *methods* are procedures that describe how to obtain samples from a probabilistic program. Specifically, a method is a Lua function that takes a `qs.program` and returns a Terra function that fills in a vector of samples (see `qs/infer.t` for precise details). Quicksand comes with several built-in methods:

`qs.ForwardSample` draws samples by simply running the program forward repeatedly. This ignores any likelihood or conditioning statements, which makes it useful for testing and debugging, but not a whole lot else. It takes one argument: the number of samples to draw (e.g. `qs.ForwardSample(1000)`).

`qs.WeightedRejectionSample` draws samples by forward sampling, but rejects any samples that do not satisfy conditioning statements and weights all samples by any likelihood statements. This is the simplest method that respects the full semantics of a probabilistic program, but it’s often not very efficient; it is useful mostly as a baseline for comparison / sanity checking. It also takes one argument: the number of samples to draw (e.g. `qs.WeightedRejectionSample(1000)`).
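
The semantics of this method fit in a few lines of Python (a hypothetical sketch, not Quicksand's implementation): run the program forward, drop executions that fail a condition, and carry a likelihood weight for the rest.

```python
import math, random

def weighted_rejection_sample(run_program, n, rng):
    samples = []
    while len(samples) < n:
        value, log_likelihood, conditions_ok = run_program(rng)
        if conditions_ok:  # like qs.condition: reject failing executions
            samples.append((value, math.exp(log_likelihood)))
    return samples

# Hypothetical toy program: two fair flips conditioned on (a or b);
# it has no observe statements, so every surviving sample has weight 1.
def toy_program(rng):
    a, b = rng.random() < 0.5, rng.random() < 0.5
    return (a and b), 0.0, (a or b)

rng = random.Random(0)
samples = weighted_rejection_sample(toy_program, 1000, rng)
total = sum(w for _, w in samples)
frac_true = sum(w for v, w in samples if v) / total  # near the exact 1/3
```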

`qs.MCMC` draws samples by Markov Chain Monte Carlo. This is the go-to method for inference in most cases. It takes two arguments: an MCMC transition kernel and a table of other parameters:

```
local p = qs.program(function()
    return terra()
        var sites : bool[100]
        for i=0,100 do
            sites[i] = qs.flip(0.5, {struc=false})
        end
        for i=0,99 do
            if sites[i] == sites[i+1] then
                qs.factor(100.0)
            else
                qs.factor(-100.0)
            end
        end
        return sites
    end
end)

-- qs.MCMC(kernel, params)
local samples = qs.infer(p, qs.Samples,
                         qs.MCMC(qs.TraceMHKernel(),
                                 {numsamps=1000, lag=1, verbose=true}))()
```

- `numsamps`: How many samples to draw. Defaults to 1000.
- `lag`: How many iterations of MCMC should be performed between recorded samples. Increasing this value decreases autocorrelation between samples but increases computation time. Defaults to 1.
- `verbose`: Whether Quicksand should write out MCMC analytics. Can be `true`, `false`, or a C-style `stdio.h` file handle (i.e. a `FILE*`). Defaults to `true`, in which case Quicksand writes to `stdout`.

The actual legwork of MCMC is done by *transition kernels*, which describe how to take a program execution trace and randomly modify some of its random choices such that the stationary distribution defined by doing this over and over again is the same as the distribution defined by the program. You can define your own transition kernels (see `qs/mcmc.t` for the interface and examples), but Quicksand provides several already that cover many use cases:

`qs.TraceMHKernel({doStruct, doNonstruct})`

The workhorse of probabilistic programming inference. This kernel implements the MCMC trace sampler described in Algorithm 2 of this paper. It will work on any probabilistic program, though it is not necessarily the most efficient kernel; it is often a good kernel to start with. The optional parameters `doStruct` and `doNonstruct` specify whether the kernel should operate on structural or non-structural random choices, and both default to `true` (i.e. the kernel operates on all random choices).
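
The core move this kernel makes can be sketched in Python for a fixed-structure, one-variable model (a hypothetical illustration, not Quicksand's implementation): pick one random choice in the trace, resample it from its prior, and accept or reject with the Metropolis-Hastings ratio.

```python
import math, random

def log_prior(x):
    return -0.5 * x * x  # x ~ N(0, 1), up to a constant

def log_joint(x):
    # Hypothetical model: x ~ N(0, 1); observe 1.0 ~ N(x, 0.5)
    return log_prior(x) - 0.5 * ((1.0 - x) / 0.5) ** 2

def trace_mh_step(trace, rng):
    i = rng.randrange(len(trace))        # pick one random choice
    proposal = list(trace)
    proposal[i] = rng.gauss(0.0, 1.0)    # resample it from its prior
    # With prior-resampling proposals, the prior terms cancel against
    # the proposal density, leaving the likelihood ratio
    log_alpha = (log_joint(proposal[i]) - log_joint(trace[i])
                 + log_prior(trace[i]) - log_prior(proposal[i]))
    return proposal if math.log(rng.random()) < log_alpha else trace

rng = random.Random(0)
trace, samples = [0.0], []
for _ in range(20000):
    trace = trace_mh_step(trace, rng)
    samples.append(trace[0])
mean = sum(samples) / len(samples)  # exact posterior mean is 0.8
```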

`qs.LARJKernel({annealKernel, intervals, stepsPerInterval})`

Implements the Locally-Annealed Reversible-Jump MCMC algorithm. This kernel is particularly useful for structure-changing programs where structure changes are persistently rejected under `qs.TraceMHKernel`. The `annealKernel` option should specify a kernel that operates on only non-structural choices–this is the kernel applied during the annealing transition phase of the algorithm (defaults to `qs.TraceMHKernel({doStruct=false})`). The `intervals` option specifies how many discrete interpolation intervals the algorithm should use in interpolating from the old structure to the new structure (defaults to `0`, i.e. vanilla reversible-jump MCMC), and `stepsPerInterval` specifies the number of annealing kernel applications to use at each interval (defaults to `1`).

`qs.HARMKernel({scale, doScaleAdapt})`

Implements the Hit And Run Metropolis algorithm. This kernel is only applicable to non-structural random choices with continuous values. It is very computationally efficient, updating all non-structural choices while requiring only one program evaluation. The `scale` option determines the largest possible jump the kernel will take (defaults to `1.0`), and the `doScaleAdapt` flag determines whether the kernel automatically adapts its scale to achieve a target optimal acceptance ratio (defaults to `true`).

`qs.DriftKernel({scale, doScaleAdapt, lexicalScaleSharing})`

Implements the Random Walk Metropolis algorithm using Gaussian drift proposals with a diagonal covariance matrix. This kernel is only applicable to non-structural random choices with continuous values. Like `qs.HARMKernel`, this kernel updates all non-structural choices with only one program evaluation. The `scale` option specifies the standard deviation for Gaussian proposals (defaults to `1.0`). The `doScaleAdapt` flag, when `true`, causes the kernel to automatically adapt individual scales for every random choice (defaults to `true`). If the `lexicalScaleSharing` flag is `true`, then all random choices originating from the same source code location in the program will share one adapted proposal scale (defaults to `false`). In some models that iteratively or recursively make many of the same “type” of random choice, this statistical sharing can help the kernel find better proposals more quickly (especially for new random choices introduced by structure change).
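
Drift proposals and acceptance-rate scale adaptation can be sketched in Python (a hypothetical illustration, not Quicksand's adaptation scheme):

```python
import math, random

def adapt_scale(scale, accepted, target=0.234, rate=0.05):
    # Nudge the proposal scale up after accepts and down after rejects,
    # so the long-run acceptance ratio drifts toward the target
    return scale * math.exp(rate * ((1.0 if accepted else 0.0) - target))

def drift_step(x, scale, log_density, rng):
    proposal = x + rng.gauss(0.0, scale)  # symmetric Gaussian drift
    accepted = math.log(rng.random()) < log_density(proposal) - log_density(x)
    return (proposal if accepted else x), adapt_scale(scale, accepted), accepted

rng = random.Random(0)
x, scale = 0.0, 1.0
accepts, n = 0, 20000
for _ in range(n):
    x, scale, ok = drift_step(x, scale, lambda z: -0.5 * z * z, rng)
    accepts += ok
acceptance = accepts / n  # settles near the 0.234 target
```

Real adaptive samplers decay the adaptation rate over time so the chain's stationary distribution is preserved; a fixed rate is used here only to keep the sketch short.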

`qs.HMCKernel({stepSize, numSteps, doStepSizeAdapt})`

Implements the Hamiltonian Monte Carlo algorithm. This kernel is only applicable to non-structural random choices with continuous values. It is particularly useful for cases where multiple random choices are tightly constrained via factor or observe statements. The `stepSize` option specifies the step size for the leapfrog integrator used to generate proposals (defaults to `1.0`). The `numSteps` option specifies how many leapfrog steps should be used per proposal (defaults to `1`, i.e. Langevin Monte Carlo). The `doStepSizeAdapt` flag determines whether the kernel will automatically adapt its step size to achieve a target optimal acceptance ratio (defaults to `true`); step size adaptation uses the algorithm described in this paper. This kernel is more expensive per proposal than `qs.HARMKernel` or `qs.DriftKernel`, but it can still have much higher statistical efficiency for highly-constrained models.
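
The leapfrog proposal that `stepSize` and `numSteps` control can be sketched in Python for a one-dimensional target (a hypothetical illustration, not Quicksand's implementation):

```python
import math, random

def leapfrog(q, p, grad_log_density, step_size, num_steps):
    # Leapfrog integration of Hamiltonian dynamics: half step on momentum,
    # alternating full steps, and a final half step on momentum
    p = p + 0.5 * step_size * grad_log_density(q)
    for i in range(num_steps):
        q = q + step_size * p
        g = grad_log_density(q)
        p = p + (step_size if i < num_steps - 1 else 0.5 * step_size) * g
    return q, p

def hmc_step(q, log_density, grad_log_density, step_size, num_steps, rng):
    p0 = rng.gauss(0.0, 1.0)                    # sample momentum
    q1, p1 = leapfrog(q, p0, grad_log_density, step_size, num_steps)
    h0 = -log_density(q) + 0.5 * p0 * p0        # Hamiltonian before
    h1 = -log_density(q1) + 0.5 * p1 * p1       # Hamiltonian after
    return q1 if math.log(rng.random()) < h0 - h1 else q

rng = random.Random(0)
q, samples = 0.0, []
for _ in range(5000):
    # Target N(0, 1): log density -z^2/2, gradient -z
    q = hmc_step(q, lambda z: -0.5 * z * z, lambda z: -z, 0.3, 10, rng)
    samples.append(q)
mean = sum(samples) / len(samples)
```

Because the leapfrog integrator nearly conserves the Hamiltonian, proposals can travel far through the state space while still being accepted at a high rate.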

`qs.MixtureKernel(kernels, weights)`

Constructs a kernel that stochastically alternates between multiple different kernels. For example, to make a kernel that uses `qs.LARJKernel` to change structural variables 10% of the time and uses `qs.TraceMHKernel` to change nonstructural variables otherwise:

```
qs.MixtureKernel({qs.LARJKernel({intervals=20}),
                  qs.TraceMHKernel({doStruct=false})},
                 {0.1, 0.9})
```

`qs.AnnealingKernel(kernel, annealingFn)`

Performs simulated annealing by adjusting the *temperature* of the program (i.e. a scaling factor on its log probability) as MCMC runs. For example, a simple linear annealing schedule could be defined like:

```
local maxTemp = 100
local minTemp = 0.1
qs.AnnealingKernel(qs.TraceMHKernel(),
                   terra(currIter: int, numIters: int)
                       var t = currIter/double(numIters)
                       return (1.0-t)*maxTemp + t*minTemp
                   end)
```

Note that under simulated annealing, samples collected by MCMC are no longer representative of the true distribution defined by the program. They are still perfectly useful for finding modes (i.e. maxima), however, using e.g. the `qs.MAP` query defined in the next section.

An inference *query* describes how you want `qs.infer` to interpret the samples it collects using one of the above methods. Again, you can author your own queries (see `qs/infer.t`), but Quicksand provides several common ones:

`qs.Samples` just returns the samples directly, leaving it up to you to process them further. You are responsible for freeing the memory of the returned vector (by calling `:destruct` on it).

`qs.Expectation` computes the expected value of the program by averaging the samples. If the return type of the program is a struct type, then it must overload several operators: binary `+` and `-`, binary `*` (this should be an “inner product”), and multiplication and division by a real number. This query takes an optional boolean argument that, when `true`, will cause it to also compute and return the variance of the program:

```
local mean, variance = qs.infer(p, qs.Expectation(true), [method])
```

`qs.MAP` returns the sample with the largest probability. If the return value is a struct type, then you are responsible for freeing its memory.

`qs.Autocorrelation` computes the autocorrelation of the samples and returns it in a vector (you’re responsible for its memory). You can optionally pass in “true” mean and variance values; otherwise, it will use the mean and variance of the samples themselves.
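
The quantity being estimated is the standard lag-k sample autocorrelation. A Python sketch (hypothetical, not Quicksand's exact implementation):

```python
def autocorrelation(samples, mean=None, variance=None):
    # Lag-k autocorrelation, optionally using supplied "true" moments
    n = len(samples)
    if mean is None:
        mean = sum(samples) / n
    if variance is None:
        variance = sum((x - mean) ** 2 for x in samples) / n
    return [sum((samples[i] - mean) * (samples[i + k] - mean)
                for i in range(n - k)) / (n * variance)
            for k in range(n)]

acf = autocorrelation([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
# Lag 0 is always 1; alternating data gives negative lag-1 autocorrelation
```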

`qs.Histogram` returns a normalized histogram of program return values. This only makes sense for programs with discrete return values. The histogram is stored in a hash map (see `qs/lib/hashmap.t`); again, you’re responsible for freeing its memory. If the return type of the program is a struct, then it must have the `__hash` method defined.

## Acknowledgments

Quicksand is being developed as part of the DARPA Probabilistic Programming for Advanced Machine Learning (PPAML) program. This material is based on research sponsored by DARPA under agreement number FA8750-14-2-0009. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.