Generative Adversarial Networks and their Potential
By Vladimir Bok, author of GANs in Action
This article discusses the history and meaning of Generative Adversarial Networks, and their potential.
Take 37% off GANs in Action. Just enter code fcclangr into the discount code box at checkout at manning.com.
Teaching Computers Creativity
One of the most exciting chapters in machine learning began in one of the most inconspicuous places: a Canadian bar. That is where, in 2014, Ian Goodfellow, then a PhD student at the University of Montreal, was discussing with his colleagues the challenges involved in asking a computer to generate images. To non-experts, this problem may not seem particularly noteworthy. After all, computers have achieved such extraordinary feats as reaching superhuman accuracy in face recognition and defeating the world’s top human player in a game of Go. Compared to that, the task of generating images may feel anticlimactic, at least until we understand the unique challenges involved in data creation.
Historically, machine learning algorithms have been great at recognizing patterns in existing data and using that insight for tasks such as classification and prediction. When asked to generate new data, however, computers have generally struggled. An algorithm can beat a chess grandmaster with ease or classify whether a credit card transaction is likely to be fraudulent; in contrast, any attempt at making small talk with Amazon’s Alexa or Apple’s Siri is doomed to fail. Indeed, some tasks that are trivial for us humans, such as engaging in conversation or expressing ourselves creatively, are out of reach of even the most sophisticated supercomputers.
Now that we have a better appreciation for the unique significance of data creation, let’s return to Ian and his friends at the pub. Ian’s colleagues were considering complex statistical methods to help computers grasp the various elements that constitute an image, a step they considered a prerequisite for generating realistic-looking data. In order to create something, they reasoned, a computer needs to understand it first.
Ian came up with a different approach. Instead of having a researcher devise a complex method to improve an algorithm’s understanding of an image, Ian conceived a way to have another algorithm do the teaching. The approach he proposed leverages what machines are generally good at (recognizing existing data) to help them with what they are generally bad at (producing new data). This is accomplished by pitting two algorithms against one another in a kind of cat-and-mouse game. One (the “Generator”) learns to produce realistic-looking data while the other (the “Discriminator”) learns to distinguish the synthetic examples produced by the Generator from the real examples coming from a training dataset provided by the researcher.
The resulting machine learning model, which Ian implemented later that night after returning home from the pub, came to be known as a “Generative Adversarial Network,” or “GAN” for short. That is quite a mouthful, so let’s unpack the name term by term. The word “generative” indicates the overall purpose of the model: creating new data. The data that a GAN will learn to generate depends on the choice of the training set. For example, if we want a GAN to paint like Leonardo da Vinci, we would use a training dataset of Leonardo’s artwork.
The term “adversarial” points to the game-like, competitive dynamic between two algorithms that constitute the GAN framework: the Generator and the Discriminator. The Generator’s goal is to create examples that are indistinguishable from the real data in the training set. In our example, this means producing paintings that look just like Leonardo’s. The Discriminator’s objective is to tell the fake examples produced by the Generator from the real examples coming from the training dataset. In our example, the Discriminator plays the role of an art expert assessing the authenticity of paintings believed to be Leonardo’s. The two networks are constantly trying to outwit one another: the better the Generator gets at creating convincing data, the better the Discriminator needs to be at distinguishing real examples from the fake ones.
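The opposing objectives described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formulation from Ian's paper or from the book: the function names `discriminator_loss` and `generator_loss` are hypothetical, the scores are made-up numbers standing in for the probability the Discriminator assigns to an example being real, and the losses assume binary cross-entropy with the commonly used "non-saturating" variant for the Generator.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def discriminator_loss(scores_on_real, scores_on_fake):
    """Binary cross-entropy: low when real examples score near 1 and fakes near 0."""
    real_term = sum(-math.log(s + EPS) for s in scores_on_real) / len(scores_on_real)
    fake_term = sum(-math.log(1.0 - s + EPS) for s in scores_on_fake) / len(scores_on_fake)
    return real_term + fake_term


def generator_loss(scores_on_fake):
    """Low when the Discriminator mistakes the Generator's fakes for real examples."""
    return sum(-math.log(s + EPS) for s in scores_on_fake) / len(scores_on_fake)


# Hypothetical scores: probability the Discriminator assigns to "this is real".
real_scores = [0.9, 0.8, 0.95]
weak_fakes = [0.1, 0.2, 0.05]    # the Discriminator easily spots these fakes
strong_fakes = [0.7, 0.8, 0.6]   # the Discriminator is mostly fooled

# Better fakes lower the Generator's loss and raise the Discriminator's loss.
print(generator_loss(weak_fakes) > generator_loss(strong_fakes))  # True
print(discriminator_loss(real_scores, weak_fakes)
      < discriminator_loss(real_scores, strong_fakes))  # True
```

The two comparisons show the tug-of-war in miniature: whatever helps one player's loss hurts the other's. In an actual GAN, each network would be updated in turn to reduce its own loss.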
Lastly, the word “networks” indicates the class of machine learning models most commonly used to represent the Generator and the Discriminator: neural networks. As their name suggests, these models are loosely inspired by the human brain—analogously to the nervous system, they use a set of interconnected nodes, or “neurons,” to process their computations.
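To make the idea of interconnected "neurons" a little more concrete, here is a minimal sketch of a tiny network in plain Python. Everything in it is an assumption chosen for illustration: the names `neuron` and `tiny_network`, the arbitrary weights, and the choice of a sigmoid as the squashing function. Real Generators and Discriminators are far larger and learn their weights from data.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One node: a weighted sum of its inputs plus a bias, passed through a sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def tiny_network(inputs):
    """Two hidden neurons feeding one output neuron; all weights are arbitrary."""
    hidden = [
        neuron(inputs, [0.5, -0.4], 0.1),
        neuron(inputs, [-0.3, 0.8], 0.0),
    ]
    return neuron(hidden, [1.2, -0.7], 0.05)


score = tiny_network([1.0, 2.0])
print(0.0 < score < 1.0)  # True for these inputs: the output is a sigmoid value
```

An output between 0 and 1 is convenient for a Discriminator-style network, since it can be read as the estimated probability that an input is real.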
While the mathematics underpinning GANs is fairly complex, there are many real-world analogies that may make the intuition behind them easier to understand. Above, we discussed the example of an art forger (the “Generator”) trying to fool an art expert (the “Discriminator”). The more convincing the fake paintings the forger makes, the better the art expert needs to be at determining their authenticity. This is true the other way round as well: the better the art expert is at telling whether a particular painting is genuine, the more the forger needs to improve his or her technique to avoid being caught red-handed. Another metaphor often used to describe GANs, one that Ian himself likes to use, is that of a criminal (the “Generator”) who forges money and the police (the “Discriminator”) who try to catch him. The more authentic-looking the counterfeit bills get, the better the police need to be at detecting them, and vice versa.
Wielding a Two-Edged Sword
Ever since Ian and his co-authors published a paper detailing his invention, GANs have been hailed by academics and industry experts as one of the most important innovations in deep learning. Yann LeCun, the Director of AI Research at Facebook, went as far as to say that GANs and their variations are “the coolest idea in deep learning in the last 20 years.”
The excitement is well justified. GANs have achieved remarkable results that were long thought impossible for artificial systems, such as generating photorealistic images or turning video footage of a horse into footage of a running zebra, all without the need for vast troves of painstakingly labeled training data. Unlike other advancements in machine learning that may be household names among researchers but would elicit no more than a quizzical look from just about anyone else, GANs have captured the imagination of researchers and the wider public alike. Indeed, they have been covered by The New York Times, Wired, Scientific American, and many other prominent media outlets.
Some of the spotlight focuses on the technology’s potential for mischief. At the end of an aptly titled piece about GANs—“How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos”—the New York Times journalists Cade Metz and Keith Collins discuss the worrying prospect of GANs’ being exploited to create and spread convincing misinformation, including fake video footage of statements by world leaders. Martin Giles, the San Francisco bureau chief of MIT Technology Review, echoes their concern and mentions another potential risk: in the hands of skilled hackers, GANs can be used to intuit and exploit system vulnerabilities at an unprecedented scale.
Other applications of GANs are less ominous, even beneficial. The online giant Amazon is experimenting with harnessing GANs for fashion recommendations—by analyzing countless outfits, the system will learn to produce items matching a given style. In medical research, GANs are used to augment datasets to improve diagnostic accuracy and even to aid new drug discovery. In game development, GANs can be leveraged to create new game levels and characters dynamically, without the need for human programmers and UX designers. GANs are also widely seen as an important stepping stone toward achieving so-called “general intelligence,” a single artificial system capable of matching human cognitive capacity to acquire expertise in virtually any domain—from motor skills involved in walking to language and creative skills needed to compose sonnets.
It remains to be seen whether future historians, looking back at the fateful day Ian went out drinking with his friends, will wish that he had stayed home and that the idea of two dueling neural networks had never occurred to him. Only the coming years and decades can tell whether the fears about GAN misuse will prove justified or whether any of the experimental applications will find their way toward improving the lives of patients, optimizing creative workflows, or ushering in an era of sentient supercomputers. What is certain is that GANs have unlocked a vast array of research directions and applications whose impact will not be confined to academia. Perhaps it is only fitting that GANs were invented in a pub, because we all may need a drink before this is over.
If you are interested in learning more about this remarkable technique, look no further than GANs in Action: Deep Learning with Generative Adversarial Networks, which I am co-writing with the London-based data scientist (and fellow Czech national) Jakub Langr. Also see this slide deck.
 Chaochao Lu: “Surpassing Human-Level Face Verification Performance on LFW with GaussianFace”, 2014; arXiv:1404.3840.