$$ \newcommand{\dint}{\mathrm{d}} \newcommand{\vphi}{\boldsymbol{\phi}} \newcommand{\vpi}{\boldsymbol{\pi}} \newcommand{\vpsi}{\boldsymbol{\psi}} \newcommand{\vomg}{\boldsymbol{\omega}} \newcommand{\vsigma}{\boldsymbol{\sigma}} \newcommand{\vzeta}{\boldsymbol{\zeta}} \renewcommand{\vx}{\mathbf{x}} \renewcommand{\vy}{\mathbf{y}} \renewcommand{\vz}{\mathbf{z}} \renewcommand{\vh}{\mathbf{h}} \renewcommand{\b}{\mathbf} \renewcommand{\vec}{\mathrm{vec}} \newcommand{\vecemph}{\mathrm{vec}} \newcommand{\mvn}{\mathcal{MN}} \newcommand{\G}{\mathcal{G}} \newcommand{\M}{\mathcal{M}} \newcommand{\N}{\mathcal{N}} \newcommand{\S}{\mathcal{S}} \newcommand{\I}{\mathcal{I}} \newcommand{\diag}[1]{\mathrm{diag}(#1)} \newcommand{\diagemph}[1]{\mathrm{diag}(#1)} \newcommand{\tr}[1]{\text{tr}(#1)} \renewcommand{\C}{\mathbb{C}} \renewcommand{\R}{\mathbb{R}} \renewcommand{\E}{\mathbb{E}} \newcommand{\D}{\mathcal{D}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerbig}[1]{\left \langle #1 \right \rangle} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\two}{\mathrm{II}} \newcommand{\GL}{\mathrm{GL}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\grad}[1]{\mathrm{grad} \, #1} \newcommand{\gradat}[2]{\mathrm{grad} \, #1 \, \vert_{#2}} \newcommand{\Hess}[1]{\mathrm{Hess} \, #1} \newcommand{\T}{\text{T}} \newcommand{\dim}[1]{\mathrm{dim} \, #1} \newcommand{\partder}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\rank}[1]{\mathrm{rank} \, #1} \newcommand{\inv}1 \newcommand{\map}{\text{MAP}} \newcommand{\L}{\mathcal{L}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} $$

Boundary Seeking GAN

Boundary Seeking GAN (BGAN) is a recently introduced modification of GAN training. Here, in this post, we will look at the intuition behind BGAN, and also the implementation, which consists of one line change from vanilla GAN.

Intuition of Boundary Seeking GAN

Recall, in GAN the following objective is optimized:

GAN Value Function

Following the objective above, as shown in the original GAN paper [1], the optimal discriminator $ D^*_G(x) $ is given by:

\[D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\]

Hence, if we know the optimal discriminator with respect to our generator, $ D^*_G(x) $, we are good to go, as we have this following amount by rearranging the above equation:

\[p_{data}(x) = p_g(x) \frac{D^*_G(x)}{1 - D^*_G(x)}\]

What does it tell us is that, even if we have non-optimal generator $ G $, we could still find the true data distribution by weighting $ p_g(x) $, the generator’s distribution, with the ratio of optimal discriminator for that generator.

Unfortunately, perfect discriminator is hard to get. But we can work with its approximation $ D(x) $ instead. The assumption is that if we train $ D(x) $ more and more, it becomes closer and closer to $ D^*_G(x) $, and our GAN training becomes better and better.

If we think further at the above equation, we would get $ p_{data}(x) = p_g(x) $, i.e. our generator is optimal, if the ratio of the discriminator is equal to one. If that ratio is equal to one, then consequently $ D(x) $ must be equal to $ 0.5 $. Therefore, the optimal generator is the one that can make make the discriminator to be $ 0.5 $ everywhere. Notice that $ D(x) = 0.5 $ is the decision boundary. Hence, we want to generate $ x \sim G(z) $ such that $ D(x) $ is near the decision boundary. Therefore, the authors of the paper named this method Boundary Seeking GAN (BGAN).

That statement has a very intuitive explanation. If we consider the generator to be perfect, $ D(x) $ can’t distinguish the real and the fake data. In other words, real and fake data are equally likely, as far as $ D(x) $ concerned. As $ D(x) $ has two outputs (real or fake), then, those outputs has the probability of $ 0.5 $ each.

Now, we could modify the generator’s objective in order to make the discriminator outputting $ 0.5 $ for every data we generated. One way to do it is to minimize the distance between $ D(x) $ and $ 1 - D(x) $ for all $ x $. If we do so, as $ D(x) $ is a probability measure, we will get the minimum at $ D(x) = 1 - D(x) = 0.5 $, which is what we want.

Therefore, the new objective for the generator is:

\[\min_{G} \, \mathbb{E}_{z \sim p_z(z)} \left[ \frac{1}{2} (\log D(x) - \log(1 - D(x)))^2 \right]\]

which is just an $ L_2 $ loss. We added $ \log $ as $ D(x) $ is a probability measure, and we want to undo that, as we are talking about distance, not divergence.

Implementation

This should be the shortest ever implementation note in my blog.

We just need to change the original GAN’s $ G $ objective from:

G_loss = -torch.mean(log(D_fake))

to:

G_loss = 0.5 * torch.mean((log(D_fake) - log(1 - D_fake))**2)

And we’re done. For full code, check out https://github.com/wiseodd/generative-models.

Conclusion

In this post we looked at a new GAN variation called Boundary Seeking GAN (BGAN). We looked at the intuition of BGAN, and tried to understand why it’s called “boundary seeking”.

We also implemented BGAN in Pytorch with just one line of code change.

References

Hjelm, R. Devon, et al. “Boundary-Seeking Generative Adversarial Networks.” arXiv preprint arXiv:1702.08431 (2017). arxiv
Goodfellow, Ian, et al. “Generative adversarial nets.” Advances in Neural Information Processing Systems. 2014. arxiv