The goals of this homework are to refresh your understanding of Bayesian inference, to work through a few manual exercises that communicate the inspiration behind the development of probabilistic programming languages and systems, and to develop familiarity with existing probabilistic programming languages and the relative ease of posing and solving Bayesian inference problems in them compared to the manual approach. This homework is long, but it will be intellectually “easy” if you have the level of probabilistic-reasoning fluency and practical coding ability required to complete the course efficiently. Many students have succeeded even when they found this first homework intellectually difficult or time-consuming from a coding perspective; be warned, however, that the level of understanding required to really grasp what is going on rises very quickly, as does the intellectual complexity of the coding tasks.
Please write up your answers using LaTeX and generate a .pdf. Include in this submitted report clickable \url{} links to the Weights and Biases (“wandb”) reports for the questions that demand them (3, 5, and 6). Please submit your homework to Gradescope using the course code and instructions distributed in class. The homework is due at midnight PST on the evening of the indicated due date.
1) (2 points) Show that the Gamma distribution is conjugate to the Poisson distribution.
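As a reminder of the shape of the argument, using the shape-rate parameterization of the Gamma and observed counts $x_1, \dots, x_n$, the posterior is proportional to the likelihood times the prior:
\[
p(\lambda \mid x_{1:n}) \propto \Big( \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \Big) \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \propto \lambda^{\alpha + \sum_i x_i - 1} e^{-(\beta + n)\lambda},
\]
which is the kernel of a $\mathrm{Gamma}(\alpha + \sum_i x_i, \beta + n)$ distribution.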
2) (2 points) Show that the Gibbs transition operator satisfies the detailed balance equation and as such can be interpreted as an MH transition operator that always accepts.
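As a hint, write the Gibbs proposal for coordinate $i$ as $q(x' \mid x) = p(x'_i \mid x_{-i})$ with $x'_{-i} = x_{-i}$, and use the factorization $p(x) = p(x_i \mid x_{-i})\, p(x_{-i})$. The MH acceptance ratio is then identically one:
\[
\frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}
= \frac{p(x'_i \mid x_{-i})\, p(x_{-i})\, p(x_i \mid x_{-i})}{p(x_i \mid x_{-i})\, p(x_{-i})\, p(x'_i \mid x_{-i})} = 1,
\]
and detailed balance $p(x)\, q(x' \mid x) = p(x')\, q(x \mid x')$ follows from the symmetry of $p(x_{-i})\, p(x_i \mid x_{-i})\, p(x'_i \mid x_{-i})$ in $x_i$ and $x'_i$.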
3) (10 points) Write code to compute, in three different ways, the probability that it is cloudy given that we observe that the grass is wet, using this Bayes net model.
Start from the following Python support code
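As an illustration of one of the three approaches, the following is a minimal exact-enumeration sketch. The CPT values below are the classic textbook numbers and are placeholders; use the tables from the support code.
\begin{verbatim}
import itertools

# Placeholder CPTs for the classic cloudy/sprinkler/rain/wet-grass network;
# substitute the values from the homework's support code.
p_cloudy = 0.5
p_sprinkler = {True: 0.1, False: 0.5}   # P(S = T | C)
p_rain = {True: 0.8, False: 0.2}        # P(R = T | C)
p_wet = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.01}  # P(W = T | S, R)

def joint(c, s, r, w):
    # Joint probability via the Bayes net factorization
    # P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R).
    p = p_cloudy if c else 1.0 - p_cloudy
    p *= p_sprinkler[c] if s else 1.0 - p_sprinkler[c]
    p *= p_rain[c] if r else 1.0 - p_rain[c]
    p *= p_wet[(s, r)] if w else 1.0 - p_wet[(s, r)]
    return p

# Exact inference by enumeration: P(C = T | W = T) = P(C = T, W = T) / P(W = T).
bools = [True, False]
num = sum(joint(True, s, r, True) for s, r in itertools.product(bools, repeat=2))
den = sum(joint(c, s, r, True) for c, s, r in itertools.product(bools, repeat=3))
print(num / den)
\end{verbatim}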
Subsequent homework will assume familiarity with, and make extensive use of, both PyTorch and Weights and Biases (“wandb”). Ensure that you sign up for Weights and Biases and are added to the cs532-2022 wandb team (send your wandb username, or the email you used to sign up for wandb, to the course wandb Slack channel). For all subsequent homework, the hand-in results will take the form of wandb reports. Make such a report for the results of this homework question and put a clickable \url{} link to it in your LaTeX-formatted hand-in.
4) (10 points) Consider the Bayesian linear regression model discussed in the lecture on graphical models. It has a joint likelihood and a prior as given in the lecture. Show and derive the updates required to compute the posterior over the model parameters. The return expression of a probabilistic program can be, for instance, a predicted quantity; this will be a posterior predictive quantity owing to probabilistic programming semantics. Here, think about the Rao-Blackwellized integral that can be analytically computed, and about why this is more efficient than, for instance, the sampling methods above.
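For reference, under the standard Gaussian forms (an assumption here, since the lecture equations are not reproduced: likelihood $y \mid X, w \sim \mathcal{N}(Xw, \sigma^2 I)$ and prior $w \sim \mathcal{N}(m_0, S_0)$), the conjugate posterior is
\[
p(w \mid X, y) = \mathcal{N}(w \mid m_N, S_N), \qquad
S_N = \big( S_0^{-1} + \sigma^{-2} X^{\top} X \big)^{-1}, \qquad
m_N = S_N \big( S_0^{-1} m_0 + \sigma^{-2} X^{\top} y \big),
\]
and the Rao-Blackwellized posterior predictive at a new input $x_*$ is the analytically computed integral
\[
p(y_* \mid x_*, X, y) = \int \mathcal{N}(y_* \mid w^{\top} x_*, \sigma^2)\, \mathcal{N}(w \mid m_N, S_N)\, dw
= \mathcal{N}\big( y_* \mid m_N^{\top} x_*, \; \sigma^2 + x_*^{\top} S_N x_* \big).
\]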
5) (10 points) Refer to HW3, Program 2. This program is written in the FOPPL syntax from the book and corresponds to a scalar version of question 4 of this homework. Starting from the Python scaffolding below, translate this FOPPL probabilistic program into Pyro and generate samples from the denoted joint posterior of the slope and bias, and from the posterior predictive distribution of the model output (the second “dimension” of the data) given a new input value of 0.0. Please write one or two sentences describing this model and the application settings in which such a posterior predictive distribution might be useful.
Start from the following Python support code
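A minimal sketch of what the translation can look like, assuming the common formulation with $\mathcal{N}(0, 10)$ priors on the slope and bias and unit-variance Gaussian observation noise; the data values are placeholders to be replaced by those in the support code.
\begin{verbatim}
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

# Placeholder data; use the (x, y) pairs from the FOPPL program.
xs = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
ys = torch.tensor([2.1, 3.9, 5.3, 7.7, 10.2])

def model(xs, ys):
    # Assumed priors; match them to the FOPPL program.
    slope = pyro.sample("slope", dist.Normal(0.0, 10.0))
    bias = pyro.sample("bias", dist.Normal(0.0, 10.0))
    with pyro.plate("data", len(xs)):
        pyro.sample("obs", dist.Normal(slope * xs + bias, 1.0), obs=ys)

mcmc = MCMC(NUTS(model), num_samples=1000, warmup_steps=500)
mcmc.run(xs, ys)
samples = mcmc.get_samples()  # joint posterior samples of slope and bias

# Posterior predictive at the new input x* = 0.0:
# one y* draw per posterior sample of (slope, bias).
x_new = 0.0
y_new = dist.Normal(samples["slope"] * x_new + samples["bias"], 1.0).sample()
\end{verbatim}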
Note that this model is the same as the model in Q4 of this homework, in which we asked you to do manual inference algorithm design and derivation. Write a sentence or two in your response to this question about which approach to solving this problem is easier: Pyro or “by hand.” Write an additional sentence or two about potential disadvantages of the Pyro approach.
6) (OPTIONAL) (10 points; extra credit) Refer to HW2, Program 5, the “Bayesian neural network.” This program is written in the FOPPL syntax from the book. Starting from the Python scaffolding below, translate this FOPPL probabilistic program into Stan and generate samples from the denoted posterior over the network weights and from the posterior predictive distribution of the network output given a new input value of 6. Please write one or two sentences describing this model and the applications in which such a posterior predictive distribution might be useful (hint: compare and contrast with simply fitting the neural network to the provided data and using the maximum-likelihood parameter estimate for prediction).
Start from the following Python support code
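Since HW2, Program 5 is not reproduced here, the sketch below assumes a one-hidden-layer tanh network with standard-normal priors on all weights and unit observation noise, driven from Python via cmdstanpy; the architecture, priors, and data are placeholders to be matched to the FOPPL program.
\begin{verbatim}
from cmdstanpy import CmdStanModel

# Assumed one-hidden-layer Bayesian neural network; adjust to HW2, Program 5.
stan_code = """
data {
  int<lower=1> N;
  int<lower=1> H;
  vector[N] x;
  vector[N] y;
  real x_new;
}
parameters {
  vector[H] w1;  // input-to-hidden weights
  vector[H] b1;  // hidden biases
  vector[H] w2;  // hidden-to-output weights
  real b2;       // output bias
}
model {
  w1 ~ normal(0, 1);
  b1 ~ normal(0, 1);
  w2 ~ normal(0, 1);
  b2 ~ normal(0, 1);
  for (n in 1:N)
    y[n] ~ normal(dot_product(w2, tanh(w1 * x[n] + b1)) + b2, 1);
}
generated quantities {
  // Posterior predictive draw of the network output at the new input.
  real y_new = normal_rng(dot_product(w2, tanh(w1 * x_new + b1)) + b2, 1);
}
"""

with open("bnn.stan", "w") as f:
    f.write(stan_code)

# Placeholder data; use the observations from the FOPPL program.
data = {"N": 5, "H": 10,
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [1.0, 4.0, 9.0, 16.0, 25.0],
        "x_new": 6.0}

fit = CmdStanModel(stan_file="bnn.stan").sample(data=data, chains=4,
                                                iter_sampling=1000)
y_new = fit.stan_variable("y_new")  # posterior predictive samples at x = 6
\end{verbatim}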