Introduction

The presentation here is based on the following resources:

PPAML Summer School 2016.
Language intro (DIPPL).
Language intro (AgentModels).
Bayesian Conitive Modeling, A Practical Course; M. D. Lee and E-J. Wagenmakers, Cambridge University Press, 2013. (BCM Book)

Simple Start (Sum)

Goal: If U1 and U2 are two independent variables with uniform distribution over the integers 1, 2, and 3, what is the distribution of U1 + U2 ?



var main = function () {

  var u1 = uniformDraw([1,2,3])
  var u2 = uniformDraw([1,2,3])
  
  return u1 + u2
}

var dist = Infer(
  {method: 'enumerate'},
  main);


print("Pr(x=1) = " + Math.exp(dist.score(1)).toFixed(4))
print("Pr(x=2) = " + Math.exp(dist.score(2)).toFixed(4))
print("Pr(x=3) = " + Math.exp(dist.score(3)).toFixed(4))
print("Pr(x=4) = " + Math.exp(dist.score(4)).toFixed(4))
print("Pr(x=5) = " + Math.exp(dist.score(5)).toFixed(4))
print("Pr(x=6) = " + Math.exp(dist.score(6)).toFixed(4))

viz.table(dist);
viz.auto(dist);

display("Most likely value and its probability: ")
print(dist.MAP().val.toFixed(4) + " its Pr = " + (Math.exp(dist.MAP().score).toFixed(4)))


var variance = function(a) {
  var m = expectation(a);
  return expectation(a, function(x) { return Math.pow(x - m, 2) })
}

print("Mean: " + expectation(dist).toFixed(4))
display("Variance: " + variance(dist).toFixed(4))

Consider: What are the distributions of U1 - U2, U1 * U2, and U1 / U2? What changes if the distribution of U2 is uniformDraw([0,1,2])?

Consider: The distribution returns the log-probability of elements using dist.score(val). To get the probability, take the exponent. To print, convert to a fixed point value (here with four decimals).

Simple Start (Comparison)

Goal: If U1 and U2 are two independent variables with uniform distribution over the integers 1, 2, and 3, what is the distribution of U1 == U2 ?



var main = function () {

  var u1 = uniformDraw([1,2,3])
  var u2 = uniformDraw([1,2,3])
  
  return u1 == u2
}

var dist = Infer(
  {method: 'enumerate'},
  main);

viz.table(dist);
//viz.auto(dist);

Consider: How can you simplify the above program to draw only a single sample (from another distribution)?

Consider: How does the inference results change if we instead use this inference method: var dist = Infer( {method: 'MCMC', samples: 100}, main); (try running several times).

Consider: How much do the run-to-run differences change if we increase the number of samples to 1000 or 10000?

Several More Simple Models (AgentModels)

WebPPL is a functional subset of JavaScript, with several probabilistic constructs. To get accustomed with the syntax, read and try out the examples from this page.

Consider:

How to filter only numerical array elements? (e.g., this is useful for data cleaning)
What do the functions map, filter, and zip do? (How do they relate to loops?) See all built-in functions here.
How is recursion used to implement a geometric distribution?
See different ways to plot data in 1D or 2D.

Optionally, see also this introduction.

Another Simple Model

Goal: We have three independent coin flips, assuming they are fair. We observed at least two heads as the outcome. What is the (posterior) probability that the first coin flip resulted in a head? What is the posterior expected value of that flip?



var model = function() {
  var c1 = flip(0.5);
  var c2 = flip(0.5);
  var c3 = flip(0.5);

  condition (c1+c2+c3 >= 2)
  
  return c1;
}

var dist = Infer( {method: 'enumerate'}, model);

viz.table(dist)

print("Expected value: " + expectation(dist).toFixed(3))


viz.auto(Infer({method: 'MCMC', samples: 10000}, model));

Consider:How is the expectation of the Bernoulli variable computed? What is it equal to?

Consider:How much does the prediction change when we change the prior distribution of c1 to 0.6?

Consider:Write a function that prints the expectation for every change of the prior probability of c1 from 0.4 to 0.6. Hint: use the function 'map'.

Simple Rate Inference (Beta-Bernoulli)

Goal: Beta distribution is a common prior for the Bernoulli distribution -- because its support is [0,1], i.e., the range of the parameter of the Bernoulli distribution. We want to infer the underlying probability of the coin that is tossed three times:


  var main = function () {
  var theta = beta(1,1)
  
  var k1 = flip(theta)
  var k2 = flip(theta)
  var k3 = flip(theta)

  condition (k1+k2+k3 == 2)  
  
  return theta
}

var dist = Infer(
  {method: 'MCMC', samples: 100000},
  main);
viz.auto(dist);

Consider: How would you modify the example to model tossing 5 coins? 7 coins?

Consider: Why we cannot use the inference method 'enumerate' for this example?

Consider: Try plotting the prior (beta(1,1)). You can do it by using the inference method 'forward'. What distribution does it look like? (see also discussion later about prior redictive and posterior redictive).

Simple Rate Inference (Beta-Binomial)

Goal: Infer the underlying success rate for a binary process (a binomial distribution represents a sum of independent coin tosses). We want to determine the posterior distribution of the rate theta.


  var main = function () {
  var n = 10 //100
  var kk = 5 //50
  var theta = beta(1,1)
  
  var k = binomial(theta, n)
  
  condition (k == kk)  
  
  return theta
}

var dist = Infer(
  {method: 'MCMC', samples: 10000},
  main);
viz.auto(dist);

Consider: Let us instead have n=100 examples instead of 10 and we observe kk = 50 heads. What changes in the posterior? What remains the same?

Difference Between Two Rates (BCM Book)

Goal: We want to compare two different binomial processes.



var main = function () {
  var n1 = 10
  var n2 = 10
  var theta1 = beta(1,1)
  var theta2 = beta(1,1)

  
  var k1 = binomial(theta1, n1)
  var k2 = binomial(theta2, n2)
  
  condition (k1 == 5)
  condition (k2 == 7)
  
  var delta = theta1 - theta2
  
  return delta
}

var dist = Infer(
  {method: 'MCMC', samples: 100000},
  main);

viz.auto(dist);

Consider: What can we conclude about the individual processes and their difference? The code below may be helpful:


// same as before
var main = function () {
  var n1 = 10
  var n2 = 10
  var theta1 = beta(1,1)
  var theta2 = beta(1,1)

  
  var k1 = binomial(theta1, n1)
  var k2 = binomial(theta2, n2)
  
  condition (k1 == 5)
  condition (k2 == 7)
  
  var delta = theta1 - theta2
  
  return [theta1, theta2] // note: here we return the joint distribution
}

var dist1 = Infer(
  {method: 'MCMC', samples: 10000},
  main);

viz.auto(dist1);
viz.marginals(dist1); // we can plot the marginals directly

Return to Simple Rate Inference (BCM Book)

Goal: infer the underlying success rate for a binary process. We want to determine the posterior distribution of the rate theta.

Show how to distinguish between:

Prior distribution over parameters -- describes our initial belief.
Prior predictive distribution -- describes what data to expect from the model given the prior beliefs.
Posterior distribution over parameters -- describes our belief about the parameter distribution updated after observing data.
Posterior predictive distribution -- describes what data to expect from the model given the updated beliefs.


  var main = function () {
  var n = 10 
  
  // prior: 
  var theta = beta(1,1)
  
  // posterior: 
  var k = binomial(theta, n)
  
  condition (k == 5)  

  // prior predictive: 
  var thetaprior = beta(1,1)
  var priorpredictive = binomial(thetaprior, n)

  // posterior predictive:
  var postpredictive = binomial(theta, n)

  return [theta, priorpredictive, postpredictive]

}

var dist = Infer(
  {method: 'MCMC', samples: 10000, burn: 1000, lag: 2},
  main);
viz.marginals(dist);

HMM (PPAML 2016)

Goal: Hidden Markov Model (HMM) is a well-know probabilistic model, often used in pattern recognition and reinforcement learning.

Its basic assumption is that the (unobservable) system is transitioning between its internal states. We can observe only noisy measurements. But, we want to infer the most likely state of the system. For example, you expect some message to arrive (e.g., 0-1-0), but it may be noisy, and sometimes you may get (0-1-1) or (0-0-0). Your goal is to get the most likely intended message from the observed message.

Below, the system can be in the state 'true' or 'false' (and transition between the two with probability 0.3). In other words, if the system is in the state 'true', it will remain in this state with probability 0.7 or go to 'false' with probability 0.3; if this system is in the state 'false', it will transition to 'true' with probability 0.3 or remain 'false' with probability 0.7. The system starts from the initial state 'false'


var trueObservations = [false, true, false];

// The state transition (unobserved):
// transition to the other state with the probability 0.3
var transprob = 0.3
var transition = function(s) {
  return s ? flip(1-transprob) : flip(transprob)
}

// Noisy observation:
var noiselvl = 0.1
var observe = function(s) { 
  var changeobservation = flip(noiselvl)
  return changeobservation ? !s : s 
}

var hmm = function(n) {
  var prev = (n > 1) ? // note recursion !!
    hmm(n - 1) :       // <---- here!
  	{states: [false], observations: []} // initial state

  // moving to the next state (the system itself is probabilistic):
  var newState = transition(prev.states[prev.states.length - 1]);

  // observe the noisy data
  var newObs = observe(newState);

  // return the trace of the states and the oboservations (we compute the probability of those) 
  return {
    states: prev.states.concat([newState]),
    observations: prev.observations.concat([newObs])
  };
}

// run hmm with the three observations
var hmmrunner =   function() {
    var r = hmm(trueObservations.length);
    map2(function(o,d) {condition(o===d)}, r.observations, trueObservations)
    return r.states;
}

var stateDist = Infer({method: 'enumerate'}, hmmrunner)

viz.table(stateDist)

Consider: How is this recursion unrolling? (Suggestion: write down the execution trace on a piece of paper)

Consider: What does the function map2 do?

Consider: What happens to the posterior distribution if we change the noiselvl to 0.01 or 0.05? What happens if the noiselvl increases to 0.2?

Consider: Do your conclusions change if you extend the list of true observations to e.g., 6-7 elements?

Consider: Can this approach scale to e.g., 20 observations? Try instead running the rejection sampling: {method: 'rejection', samples: 100}, Why does it succeed or fail? Would you suggest instead a different more efficient algorithm for solving this problem?