Deriving Contractive Autoencoder and Implementing it in Keras
In the last post, we saw many different flavors of a family of methods called Autoencoders. However, there is one more autoencoding method on top of them, dubbed Contractive Autoencoder (Rifai et al., 2011).
The idea of the Contractive Autoencoder is to make the learned representation robust towards small changes around the training examples. It achieves that by imposing a different penalty term on the representation.
The loss function for the reconstruction term is similar to the previous Autoencoders that we have seen, i.e. the squared error between the input and its reconstruction. On top of that, the Contractive Autoencoder adds a penalty on the hidden representation. Hence, the loss function is as follows:

$$\mathcal{L} = \lVert X - \hat{X} \rVert_2^2 + \lambda \lVert J_h(X) \rVert_F^2$$

where $\hat{X}$ is the reconstruction of the input $X$ and $\lambda$ controls the strength of the penalty, in which

$$\lVert J_h(X) \rVert_F^2 = \sum_{ij} \left( \frac{\partial h_j(X)}{\partial X_i} \right)^2$$

That is, the penalty term is the squared Frobenius norm of the Jacobian of the hidden representation with respect to the input, which is the sum of squares over all elements of that matrix. We could think of the Frobenius norm as the generalization of the Euclidean norm to matrices.
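To make that concrete, here is a tiny NumPy check (just an illustration; the matrix `J` below is arbitrary): the squared Frobenius norm is nothing more than the sum of squares of all entries, which is also what NumPy's built-in `'fro'` norm computes.

```python
import numpy as np

J = np.random.randn(3, 5)  # any matrix, e.g. a small Jacobian

frob_sq = np.sum(J**2)  # squared Frobenius norm: sum of squared entries

# Same value via NumPy's built-in Frobenius norm
assert np.allclose(frob_sq, np.linalg.norm(J, 'fro')**2)
```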
In the loss above, it is clearly the calculation of the Jacobian that is not straightforward. Calculating the Jacobian of the hidden layer with respect to the input is similar to the usual gradient calculation. Recall that the Jacobian is the generalization of the gradient: when a function is vector valued, its partial derivatives form a matrix called the Jacobian.
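Concretely, for a vector valued function $h : \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian stacks all the partial derivatives into an $m \times n$ matrix:

$$J_h(x) = \begin{bmatrix} \dfrac{\partial h_1}{\partial x_1} & \cdots & \dfrac{\partial h_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial h_m}{\partial x_1} & \cdots & \dfrac{\partial h_m}{\partial x_n} \end{bmatrix}$$

When $m = 1$ this collapses to a single row, which is just the (transposed) gradient.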
Let's calculate the Jacobian of the hidden layer of our autoencoder then. Let's say:

$$h_j = \sigma(z_j)$$

where

$$z_j = \sum_i W_{ij} \, x_i$$

and $\sigma$ is the sigmoid nonlinearity. That is, the $j$-th hidden unit is the sigmoid of the dot product between the input and the $j$-th column of the weight matrix. Applying the chain rule, each entry of the Jacobian is:

$$\frac{\partial h_j}{\partial x_i} = \frac{\partial \sigma(z_j)}{\partial x_i} = \sigma(z_j) \, (1 - \sigma(z_j)) \, W_{ij} = h_j (1 - h_j) \, W_{ij}$$

It looks familiar, doesn't it? Because it is exactly how we calculate the gradient during backpropagation. The difference, however, is that here we treat the input $x$ as the variable and the weights $W$ as constants, instead of the other way around.
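As a quick numerical sanity check of that formula (a sketch I am adding here, not part of the original derivation; the small weight scale is only there to keep the sigmoid away from saturation), we can compare one Jacobian entry against a finite-difference approximation:

```python
import numpy as np

x = np.random.randn(786)
W = 0.01 * np.random.randn(786, 64)  # small weights so the sigmoid is not saturated

h = 1 / (1 + np.exp(-(x @ W)))  # sigmoid hidden layer, 64 units

# Analytic Jacobian: entry (j, i) = h_j (1 - h_j) W_ij, shape 64x786
J_analytic = (h * (1 - h))[:, None] * W.T

# Finite-difference approximation of dh_j / dx_i for one (i, j) pair
i, j, eps = 0, 0, 1e-6
x_pert = x.copy()
x_pert[i] += eps
h_pert = 1 / (1 + np.exp(-(x_pert @ W)))
fd = (h_pert[j] - h[j]) / eps

assert np.isclose(J_analytic[j, i], fd, atol=1e-4)
```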
Let $\mathrm{diag}[v]$ denote the diagonal matrix with the vector $v$ on its diagonal. In matrix form, the Jacobian of the hidden layer is then:

$$J_h(X) = \mathrm{diag}[h (1 - h)] \, W^T$$

We need to form a diagonal matrix of the gradient of $h$ because the factor $h_j (1 - h_j)$ is shared by every entry in the $j$-th row of the Jacobian, i.e. it scales the whole $j$-th row of $W^T$.

As our main objective is to calculate the norm, we could simplify that in our implementation so that we don't need to construct the diagonal matrix:

$$\lVert J_h(X) \rVert_F^2 = \sum_{ij} \left[ h_j (1 - h_j) \, W_{ij} \right]^2 = \sum_j \left[ h_j (1 - h_j) \right]^2 \sum_i W_{ij}^2$$
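Before translating this into vectorized minibatch code, we can double check the simplification for a single example by actually building $\mathrm{diag}[h(1-h)] \, W^T$ (again a small sanity check of my own, not part of the original implementation):

```python
import numpy as np

x = np.random.randn(786)
W = np.random.randn(786, 64)
h = 1 / (1 + np.exp(-(x @ W)))  # sigmoid hidden layer, 64 units

# Explicit construction: J = diag[h(1 - h)] W^T, shape 64x786
J = np.diag(h * (1 - h)) @ W.T
norm_explicit = np.sum(J**2)

# Simplified version: sum_j [h_j (1 - h_j)]^2 * sum_i W_ij^2
norm_simplified = np.sum((h * (1 - h))**2 * np.sum(W**2, axis=0))

assert np.allclose(norm_explicit, norm_simplified)
```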
Translated to code:
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Let's say we have a minibatch of 32 examples and 64 hidden units.
# Our input is a 786-element vector.
X = np.random.randn(32, 786)
W = np.random.randn(786, 64)

Z = np.dot(X, W)  # 32x64
h = sigmoid(Z)    # 32x64

Wj_sqr = np.sum(W.T**2, axis=1)            # Marginalize i (note the transpose), 64
dhj_sqr = (h * (1 - h))**2                 # Squared derivative of h, 32x64
J_norm = np.sum(dhj_sqr * Wj_sqr, axis=1)  # 32, i.e. one Jacobian norm per data point
```
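With the Jacobian norm per example in hand, the full contractive objective is just the reconstruction error plus the weighted penalty. As a rough sketch continuing from the snippet above (the decoder weights `W_dec` are made up purely for illustration):

```python
# Continuing from the NumPy snippet above (X, h, J_norm).
lam = 1e-4
W_dec = np.random.randn(64, 786)  # hypothetical linear decoder weights

X_recon = np.dot(h, W_dec)               # 32x786 reconstructions
mse = np.mean((X - X_recon)**2, axis=1)  # per-example reconstruction error, 32
cae_loss = mse + lam * J_norm            # per-example contractive loss, 32
```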
Putting all of those together, we have our full Contractive Autoencoder implemented in Keras:
```python
from keras.layers import Input, Dense
from keras.models import Model
import keras.backend as K

N = 786         # input dimensionality
N_hidden = 64   # number of hidden units
N_batch = 32    # minibatch size
lam = 1e-4      # weight of the contractive penalty

inputs = Input(shape=(N,))
encoded = Dense(N_hidden, activation='sigmoid', name='encoded')(inputs)
outputs = Dense(N, activation='linear')(encoded)

model = Model(input=inputs, output=outputs)

def contractive_loss(y_true, y_pred):
    # Keras passes (y_true, y_pred) to custom losses
    mse = K.mean(K.square(y_true - y_pred), axis=1)

    W = K.variable(value=model.get_layer('encoded').get_weights()[0])  # N x N_hidden
    W = K.transpose(W)  # N_hidden x N
    h = model.get_layer('encoded').output
    dh = h * (1 - h)  # N_batch x N_hidden

    # N_batch x N_hidden * N_hidden x 1 = N_batch x 1
    contractive = lam * K.sum(dh**2 * K.sum(W**2, axis=1), axis=1)

    return mse + contractive

model.compile(optimizer='adam', loss=contractive_loss)
model.fit(X, X, batch_size=N_batch, nb_epoch=5)  # X: the data we want to reconstruct
```
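If you want to use the learned contractive representation afterwards, e.g. as features for another model, you can wrap the trained layer into a standalone encoder; a minimal sketch, assuming the `inputs` and `model` defined above (Keras 1.x style API, matching the rest of the code):

```python
# Reuse the trained 'encoded' layer as a standalone encoder
encoder = Model(input=inputs, output=model.get_layer('encoded').output)
H = encoder.predict(X)  # rows of H are the contractive representations of X
```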
And that is it! The full code can be found in my Github repository: https://github.com/wiseodd/hipsternet.
References
- Rifai, Salah, et al. “Contractive auto-encoders: Explicit invariance during feature extraction.” Proceedings of the 28th international conference on machine learning (ICML-11). 2011.