This post should be quick as it is just a port of the previous Keras code. For the intuition and derivation of the Variational Autoencoder (VAE), along with the Keras implementation, check the previous post. The full code is available in my Github repo: https://github.com/wiseodd/generative-models.
The networks
Let’s begin by importing the required packages.
import torch
import torch.nn.functional as nn
import torch.autograd as autograd
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os
from torch.autograd import Variable
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('../MNIST_data', one_hot=True)
mb_size = 64
Z_dim = 100
X_dim = mnist.train.images.shape[1]
y_dim = mnist.train.labels.shape[1]
h_dim = 128
c = 0
lr = 1e-3
Now, recall that in a VAE there are two networks: the encoder Q(z|X) and the decoder P(X|z). Let’s build the encoder Q(z|X) first:
def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / np.sqrt(in_dim / 2.)
    return Variable(torch.randn(size) * xavier_stddev, requires_grad=True)
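As an aside, despite the function name, the standard deviation used here simplifies to the sqrt(2 / fan_in) scaling usually associated with He initialization for ReLU layers:

$$ \frac{1}{\sqrt{\text{in\_dim}/2}} = \sqrt{\frac{2}{\text{in\_dim}}} $$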
Wxh = xavier_init(size=[X_dim, h_dim])
bxh = Variable(torch.zeros(h_dim), requires_grad=True)

Whz_mu = xavier_init(size=[h_dim, Z_dim])
bhz_mu = Variable(torch.zeros(Z_dim), requires_grad=True)

Whz_var = xavier_init(size=[h_dim, Z_dim])
bhz_var = Variable(torch.zeros(Z_dim), requires_grad=True)
def Q(X):
    h = nn.relu(X @ Wxh + bxh.repeat(X.size(0), 1))
    z_mu = h @ Whz_mu + bhz_mu.repeat(h.size(0), 1)
    z_var = h @ Whz_var + bhz_var.repeat(h.size(0), 1)
    return z_mu, z_var
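As a quick sanity check (this snippet is mine, not part of the original code), we can push a dummy batch through Q and confirm it returns a mean and a log-variance of shape (mb_size, Z_dim):

# Hypothetical check: random data standing in for a flattened MNIST batch
X_dummy = Variable(torch.randn(mb_size, X_dim))
mu_test, log_var_test = Q(X_dummy)
print(mu_test.size(), log_var_test.size())  # both should be (64, 100)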
Our Q(z|X) is a two-layer net that outputs z_mu and z_var, i.e. the mean and the log-variance that parameterize the encoded Gaussian. Next, we sample z from that distribution using the reparameterization trick:
def sample_z(mu, log_var):
    # Using reparameterization trick to sample from a gaussian
    eps = Variable(torch.randn(mb_size, Z_dim))
    return mu + torch.exp(log_var / 2) * eps
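In equation form, the reparameterization trick expresses the sample as a deterministic function of the distribution parameters and an auxiliary noise variable, which keeps the sampling step differentiable with respect to mu and log_var:

$$ z = \mu + \sigma \odot \epsilon, \qquad \sigma = \exp\!\left(\tfrac{1}{2}\log\sigma^2\right), \qquad \epsilon \sim \mathcal{N}(0, I) $$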
Let’s construct the decoder P(X|z), which is also a two-layer net:
Wzh = xavier_init(size=[Z_dim, h_dim])
bzh = Variable(torch.zeros(h_dim), requires_grad=True)

Whx = xavier_init(size=[h_dim, X_dim])
bhx = Variable(torch.zeros(X_dim), requires_grad=True)
def P(z):
    h = nn.relu(z @ Wzh + bzh.repeat(z.size(0), 1))
    X = nn.sigmoid(h @ Whx + bhx.repeat(h.size(0), 1))
    return X
Note, the use of b.repeat(X.size(0), 1) is because of this PyTorch issue: at the time of writing, a 1-D bias is not automatically broadcast when added to a 2-D matrix, so the bias has to be tiled explicitly across the batch dimension.
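On more recent PyTorch versions, which do support broadcasting, the repeat calls should be unnecessary. If you are on such a version, the encoder can likely be written more simply; here is an untested sketch (Q_broadcast is a hypothetical name, not from the original code):

def Q_broadcast(X):
    # Relies on PyTorch broadcasting the 1-D biases over the batch dimension;
    # intended to behave the same as Q above on versions that support broadcasting.
    h = nn.relu(X @ Wxh + bxh)
    z_mu = h @ Whz_mu + bhz_mu
    z_var = h @ Whz_var + bhz_var
    return z_mu, z_var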
Training
Now, the interesting stuff: training the VAE model. First, as always, at each training step we do forward, loss, backward, and update.
params = [Wxh, bxh, Whz_mu, bhz_mu, Whz_var, bhz_var,
          Wzh, bzh, Whx, bhx]
solver = optim.Adam(params, lr=lr)
for it in range(100000):
    X, _ = mnist.train.next_batch(mb_size)
    X = Variable(torch.from_numpy(X))
    # Forward
    # ...

    # Loss
    # ...

    # Backward
    # ...

    # Update
    # ...

    # Housekeeping
    for p in params:
        p.grad.data.zero_()
Now, the forward step:
z_mu, z_var = Q(X)
z = sample_z(z_mu, z_var)
X_sample = P(z)
That is it. We just call the functions we defined before. Let’s continue with the loss, which consists of two parts: the reconstruction loss and the KL-divergence of the encoded distribution:
recon_loss = nn.binary_cross_entropy(X_sample, X, size_average=False)
kl_loss = 0.5 * torch.sum(torch.exp(z_var) + z_mu**2 - 1. - z_var)
loss = recon_loss + kl_loss
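The kl_loss line is the closed-form KL-divergence between the encoded Gaussian and the standard normal prior, written in terms of the log-variance z_var that the encoder outputs (the code sums it over the whole minibatch):

$$ D_{KL}\left[\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\right] = \frac{1}{2} \sum_j \left( \exp(\log\sigma_j^2) + \mu_j^2 - 1 - \log\sigma_j^2 \right) $$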
The backward and update steps are as easy as calling a function, as we use the autograd feature of PyTorch:
# Backward
loss.backward()
# Update
solver.step()
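Putting the pieces together, the body of the training loop reads roughly as follows; the logging every 1000 iterations is my own arbitrary addition, not something required by the model:

for it in range(100000):
    X, _ = mnist.train.next_batch(mb_size)
    X = Variable(torch.from_numpy(X))

    # Forward
    z_mu, z_var = Q(X)
    z = sample_z(z_mu, z_var)
    X_sample = P(z)

    # Loss
    recon_loss = nn.binary_cross_entropy(X_sample, X, size_average=False)
    kl_loss = 0.5 * torch.sum(torch.exp(z_var) + z_mu**2 - 1. - z_var)
    loss = recon_loss + kl_loss

    # Backward
    loss.backward()

    # Update
    solver.step()

    # Housekeeping: reset gradients for the next iteration
    for p in params:
        p.grad.data.zero_()

    # Arbitrary logging interval (my addition); loss.data[0] follows the old
    # Variable-style API used throughout this post (newer PyTorch: loss.item())
    if it % 1000 == 0:
        print('Iter {}; Loss: {:.4}'.format(it, loss.data[0]))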
After that, we could inspect the loss, or visualize samples from P(X|z) every now and then to check the progress of training.
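As an illustration, here is one way the sampled images could be plotted with the matplotlib and gridspec imports from the top of the script; the 4x4 grid, the figure size, and the out/ output directory are my own choices for this sketch:

# Sample 16 latent codes from the prior and push them through the decoder
z = Variable(torch.randn(16, Z_dim))
samples = P(z).data.numpy()

fig = plt.figure(figsize=(4, 4))
gs = gridspec.GridSpec(4, 4)
gs.update(wspace=0.05, hspace=0.05)

for i, sample in enumerate(samples):
    ax = plt.subplot(gs[i])
    plt.axis('off')
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.set_aspect('equal')
    plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

# 'out/' is an arbitrary output directory chosen for this sketch
if not os.path.exists('out/'):
    os.makedirs('out/')
plt.savefig('out/vae_samples.png', bbox_inches='tight')
plt.close(fig)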
The full code can be found here: https://github.com/wiseodd/generative-models.