
Wasserstein GAN implementation in TensorFlow and Pytorch

Wasserstein GAN comes with the promise of stabilizing GAN training and getting rid of the mode collapse problem in GANs.
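
As a quick preview (writing $P_r$ for the data distribution and $P_\theta$ for the generator's distribution), the critic in WGAN approximates the Wasserstein distance in its Kantorovich-Rubinstein dual form:

$$
W(P_r, P_\theta) = \sup_{\lVert f \rVert_L \leq 1} \, \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_\theta}[f(x)]
$$

where $f$ ranges over 1-Lipschitz functions; the generator is then trained to make this quantity small.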

InfoGAN: unsupervised conditional GAN in TensorFlow and Pytorch

Adding a Mutual Information regularization term to a GAN turns out to give us a very nice effect: learning data representations and their properties in an unsupervised manner.
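
Concretely, InfoGAN augments the usual GAN game with a mutual information term between the latent code $c$ and the generated sample:

$$
\min_G \max_D \; V_{\mathrm{GAN}}(D, G) - \lambda \, I(c; G(z, c))
$$

Since $I(c; G(z, c))$ is intractable, it is replaced in practice by a variational lower bound, estimated with an auxiliary network $Q(c \mid x)$.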

Maximizing likelihood is equivalent to minimizing KL-Divergence

We will show that doing MLE is equivalent to minimizing the KL-Divergence between the estimator and the true distribution.
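
In one line (writing $p_{\mathrm{data}}$ for the true distribution and $q_\theta$ for our model), the equivalence is:

$$
\arg\max_\theta \, \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log q_\theta(x)] = \arg\min_\theta \, \mathrm{KL}(p_{\mathrm{data}} \,\|\, q_\theta)
$$

which follows because $\mathrm{KL}(p_{\mathrm{data}} \,\|\, q_\theta) = \mathbb{E}_{p_{\mathrm{data}}}[\log p_{\mathrm{data}}(x)] - \mathbb{E}_{p_{\mathrm{data}}}[\log q_\theta(x)]$ and the first term does not depend on $\theta$.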

Variational Autoencoder (VAE) in Pytorch

With all of those bells and whistles surrounding Pytorch, let's implement a Variational Autoencoder (VAE) using it.
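
The quantity a VAE maximizes is the evidence lower bound (ELBO):

$$
\log p_\theta(x) \geq \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))
$$

i.e. a reconstruction term plus a KL regularizer pulling the approximate posterior toward the prior.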

Generative Adversarial Networks (GAN) in Pytorch

Pytorch is a new Python Deep Learning library, derived from Torch. In contrast to Theano's and TensorFlow's symbolic operations, Pytorch uses an imperative programming style, which makes its implementations feel more "Numpy-like".
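
A tiny sketch of what that imperative style looks like (the shapes and operations here are arbitrary, just for illustration):

```python
import torch

# Tensors behave like Numpy arrays: operations run eagerly,
# so intermediate values can be inspected at any point.
x = torch.randn(3, 4)
w = torch.randn(4, 2, requires_grad=True)

y = x @ w           # executes immediately, no graph compilation step
loss = (y ** 2).sum()
print(loss.item())  # inspect the value right away

loss.backward()     # gradients are accumulated in w.grad
print(w.grad.shape)
```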

Theano for solving Partial Differential Equation problems

We all know Theano as a front-line library for Deep Learning research. However, it should be noted that Theano is a general-purpose numerical computing library, like Numpy. Hence, in this post, we will look at implementing a PDE simulation in Theano.
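
As a rough illustration of the idea (using the 2D heat equation as a stand-in example; the actual post may use a different PDE, grid, and constants), an explicit finite-difference step can be compiled with `theano.function`:

```python
import numpy as np
import theano
import theano.tensor as T

# Explicit finite-difference step for the 2D heat equation
# u_t = alpha * (u_xx + u_yy); constants chosen only for illustration.
alpha, dx, dt = 1.0, 1.0, 0.1

u = T.matrix('u')
# Discrete Laplacian of the interior points via Numpy-style slicing.
lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
       - 4 * u[1:-1, 1:-1]) / dx**2
u_next = T.set_subtensor(u[1:-1, 1:-1], u[1:-1, 1:-1] + dt * alpha * lap)

step = theano.function([u], u_next)

grid = np.zeros((32, 32), dtype=theano.config.floatX)
grid[16, 16] = 100.0  # a hot spot in the middle
for _ in range(50):
    grid = step(grid)
```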

Linear Regression: A Bayesian Point of View

You know the drill: apply the mean squared error, then descend those gradients. But what is the intuition behind that process from a Bayesian point of view?
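
The short answer: under a Gaussian noise assumption, the log-likelihood is, up to a constant, just the negative squared error:

$$
\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \sum_{i} \log \mathcal{N}(y_i \mid \mathbf{w}^{\mathsf{T}} \mathbf{x}_i, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{i} (y_i - \mathbf{w}^{\mathsf{T}} \mathbf{x}_i)^2 + \mathrm{const}
$$

so maximizing the likelihood is the same as minimizing MSE.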

MLE vs MAP: the connection between Maximum Likelihood and Maximum A Posteriori Estimation

In this post, we will see the difference between Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation.
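
In a nutshell, the two estimates differ only by the prior term:

$$
\theta_{\mathrm{MLE}} = \arg\max_\theta \, \log p(\mathcal{D} \mid \theta), \qquad
\theta_{\mathrm{MAP}} = \arg\max_\theta \, \left[ \log p(\mathcal{D} \mid \theta) + \log p(\theta) \right]
$$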

Conditional Generative Adversarial Nets in TensorFlow

Having seen the GAN, VAE, and CVAE models, it is only proper to study the Conditional GAN model next!

KL Divergence: Forward vs Reverse?

KL Divergence is a measure of how different two probability distributions are. It is asymmetric (and hence not a true distance), and each direction has its own interesting properties, especially when we use it in optimization settings, e.g. the Variational Bayes method.
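
For concreteness, the two directions are:

$$
\mathrm{KL}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right]
\qquad
\mathrm{KL}(Q \,\|\, P) = \mathbb{E}_{x \sim Q}\left[\log \frac{Q(x)}{P(x)}\right]
$$

Minimizing the forward KL $\mathrm{KL}(P \,\|\, Q)$ over $Q$ tends to be mass-covering, while minimizing the reverse KL $\mathrm{KL}(Q \,\|\, P)$, the direction used in Variational Bayes, tends to be mode-seeking.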