For those who weren't in class - this week we discussed some of the problems people were having with the Bayesian inference section of the assignment and went through getting started in TensorFlow. The code from the TensorFlow tutorial is below...
import tensorflow as tf
import numpy as np
There are loads of good TensorFlow tutorials online, so I'm not going to try to cover everything, but rather focus on the bare essentials to get you started. The key idea in TensorFlow is that we build computational graphs that get compiled and run, rather than working in an imperative fashion as you would in Numpy or Matlab. This is sort of abstract, so let's look at an example instead.
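To make "build first, run later" concrete, here's a tiny sketch: operations just add nodes to a graph, and nothing is actually computed until a session runs them.
import tensorflow as tf

a = tf.constant(2.0)
b = a * 3.0    # builds a node in the graph; nothing is computed yet
print(b)       # prints a Tensor object, not 6.0

with tf.Session() as sess:
    print(sess.run(b))  # 6.0 - the graph actually executes here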
If we were doing linear regression using gradient descent in numpy, our code would look like this:
# generate random input data
n = 1000 # training examples
d = 5 # features
X = np.random.randn(n*d).reshape((n,d)) # design matrix
Y = np.dot(X, np.array([2., 4., 1., -2., 3.])) + np.random.randn(n) # made up coefficients
# model code
W = np.zeros(d)
alpha = 0.1
old_loss = np.inf
for i in range(100):
    y_hat = np.dot(X, W)  # no biases because I'm lazy :)
    loss = np.mean(np.square(Y - y_hat))
    grad_loss = 1. / n * np.dot(X.T, -2 * (Y - y_hat))
    W -= alpha * grad_loss
    print(i, loss, np.round(W, 1))
    if np.abs(old_loss - loss) < 1e-6:
        break
    else:
        old_loss = loss
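For reference, the grad_loss line is just the gradient of the mean squared error, which is worth deriving by hand once:

$$ L(W) = \tfrac{1}{n}\lVert Y - XW \rVert^2 \quad\Longrightarrow\quad \nabla_W L = -\tfrac{2}{n}\, X^\top (Y - XW) $$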
If we were being more generic about things, we could write functions for some of those operations. For example, we could write the following function to calculate the loss...
def loss(x, y, W):
    y_hat = np.dot(x, W)  # no biases because I'm lazy :)
    return np.mean(np.square(y - y_hat))

loss(X, Y, W)
This is essentially a Python version of a TensorFlow "computational graph" that calculates the loss given placeholders x and y and a Variable W. In TensorFlow, we'd write the following:
x = tf.placeholder(tf.float32, shape=(None, d)) # we use None so we can choose an arbitrary batch size
y = tf.placeholder(tf.float32, shape=(None, 1))
W = tf.Variable(tf.random_normal([d,1], stddev=0.3), name='W')
b = tf.Variable(tf.zeros([1]), name='b') # let's use biases this time
y_hat = tf.matmul(x, W) + b
loss = tf.reduce_mean(tf.square(y - y_hat))
To actually run the code, you need to initialize the variables and run it in a tensorflow "session" (see the documentation for more on this).
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    loss_output = sess.run(loss, feed_dict={x: X, y: Y.reshape((-1, 1))})
    print(loss_output)
Why bother with all of this? TensorFlow gives you two big advantages over Numpy: it can differentiate the computational graph for you automatically, and the compiled graph can be run efficiently on hardware like GPUs. Here's the automatic differentiation in action:
params = tf.trainable_variables()  # equivalently, params = [W, b] in this graph
grads = tf.gradients(loss, params)
alpha = 0.1
train_op = []
for v, g in zip(params, grads):
    update = tf.assign(v, v - alpha * g)
    train_op.append(update)
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(50):
        _, loss_output, curr_par = sess.run([train_op, loss, W], feed_dict={x: X, y: Y.reshape((-1, 1))})
        print(i, loss_output, curr_par.flatten())
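That hand-rolled update loop is exactly what TensorFlow's built-in gradient descent optimizer does for you - the graph-building lines above collapse to a single call:
train_op = tf.train.GradientDescentOptimizer(alpha).minimize(loss)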
Let's do something a little more interesting and build a feed forward network.
We essentially need to write:
h1 = relu(tf.matmul(x, W1) + b1)
h2 = relu(tf.matmul(h1, W2) + b2)
...
y_hat = tf.matmul(h_{n-1}, Wn) + bn
But given that we're going to be reusing this structure over and over, it makes sense to write a feed forward layer class or function to hold the variables and perform the forward pass.
tf.reset_default_graph() # clear everything from earlier so we can build a new graph
def relu(x):
    return tf.maximum(0., x)

class FFlayer(object):
    def __init__(self, num_in, num_out):
        self.W = tf.Variable(tf.random_normal([num_in, num_out], stddev=0.3), name='W')
        self.b = tf.Variable(tf.zeros([num_out]) + 0.2, name='b')

    def __call__(self, layer_below):
        return tf.matmul(layer_below, self.W) + self.b
# inputs
x = tf.placeholder(tf.float32, shape=(None, d))
# outputs
y = tf.placeholder(tf.float32, shape=(None, 1))
#network
h1 = relu(FFlayer(d, 64)(x))
h2 = relu(FFlayer(64, 32)(h1))
y_hat = FFlayer(32, 1)(h2)
loss = tf.reduce_mean(tf.square(y - y_hat))
# Let's use one of TensorFlow's built-in optimizers (Adam) instead of our hand-rolled updates
optimizer = tf.train.AdamOptimizer(0.01)
train_op = optimizer.minimize(loss)
# Variable init
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(500):
        _, loss_output = sess.run([train_op, loss], feed_dict={x: X, y: Y.reshape((-1, 1))})
        if i % 10 == 9:
            print(i, loss_output)
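Once training is done, predictions are just another sess.run call on the y_hat node. Note that this has to happen inside the with block above (or you need to save the variables with tf.train.Saver), because the variables are freed when the session closes:
preds = sess.run(y_hat, feed_dict={x: X[:5]})
print(preds.flatten(), Y[:5])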
Writing our own classes for each layer we use gets tedious quickly (although it's a good skill to have). Fortunately there's a massive number of packages that build on top of TensorFlow to do this for you, and TensorFlow itself now ships with higher-level layer APIs such as tf.layers and tf.keras.
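For example, here's a sketch of roughly the same network as above in Keras (assuming Keras 2 - the exact arguments may differ between versions):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(d,)),  # first hidden layer
    Dense(32, activation='relu'),                    # second hidden layer
    Dense(1),                                        # linear output
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=10)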
It's also worth taking a look at PyTorch - it's imperative, so it feels more like working with Numpy / Matlab. Personally, I really like the look of it... but on the downside it doesn't have the ecosystem that TensorFlow has (yet!) - tools like TensorBoard are really nice for visualizing your training.
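To give a flavour of the imperative style, here's a minimal sketch of the same linear regression in PyTorch (assuming a recent version of torch - ops run immediately and autograd tracks gradients for you):
import torch

Xt = torch.randn(1000, 5)
Yt = Xt @ torch.tensor([2., 4., 1., -2., 3.]) + torch.randn(1000)

W = torch.zeros(5, requires_grad=True)
for i in range(100):
    loss = ((Yt - Xt @ W) ** 2).mean()  # computed right away, no session needed
    loss.backward()                     # autograd fills in W.grad
    with torch.no_grad():
        W -= 0.1 * W.grad
        W.grad.zero_()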
As an example, let's take a look at the PyTorch GAN tutorial to see how you'd build a GAN in a higher-level framework.