From Deep Learning and the Game of Go by Max Pumperla and Kevin Ferguson
This article is an excerpt from chapter 2 of Deep Learning and the Game of Go. We’re going to look at the game of Go and discuss why it’s such a good subject and learning tool for machine learning and deep learning. Plus, we’ll also discover why games in general are such a good subject for AI as well as which aspects of game playing can be solved with machine learning.
Games are a favorite subject for AI research, and it’s not just because they’re fun. They also simplify some of the complexities of real life, so we can focus on the algorithms we’re studying.
Imagine you see a comment on Twitter or Facebook: something like “ugh I forgot my umbrella.” You’d quickly conclude that your friend got caught out in the rain, despite that information not having been included anywhere in the sentence. How did you reach that conclusion? First, you applied common knowledge about what umbrellas are for. Second, you applied social knowledge about what kind of comments people bother to make: it would be very strange to say “I forgot my umbrella” on a bright sunny day.
As humans, we effortlessly factor in all this context when reading a sentence. This is not so easy for computers. Modern deep learning techniques are very effective at processing the information we supply them. However, our ability to find all the relevant information and feed it to computers is limited. Games sidestep that problem. They take place in an artificial universe where all the information you need to make a decision is spelled out in the rules.
Games are especially well suited for reinforcement learning. Recall that reinforcement learning requires repeatedly running our program and evaluating how well it has accomplished a task. Imagine you’re using reinforcement learning to train a robot to move around a building. Before the control system is finely tuned, you risk the robot falling down a flight of stairs or knocking over your furniture. Another option is to build a computer simulation of the environment the robot will operate in. This eliminates the risks of letting an untrained robot run around in the real world, but creates new problems. First, you have to invest in developing a detailed computer simulation, which is a significant project in its own right. Second, there’s always a chance that your simulation is not completely accurate.
With games, on the other hand, all we need to do is have our AI play. If it loses a few hundred thousand matches while it’s learning, so what? In reinforcement learning, games are essential to serious research. Many cutting edge algorithms were first demonstrated on Atari video games such as Breakout.
To be clear, you can sucessfully apply reinforcement learning to problems in the physical world. Many researchers and engineers have done so. But starting with games solves the problem of creating a realistic training environment, and lets you focus on the mechanics and principles of reinforcement learning.
Why write a whole book about computer Go? You might suspect that the authors are die-hard Go nuts–OK, guilty as charged. But the real reason to study Go, as opposed to chess or backgammon, is that a really strong Go AI requires deep learning. A top-tier chess engine such as Stockfish is full of chess-specific logic; you need a certain amount of knowledge about the game to write something like that. With deep learning, we can teach a computer to imitate strong Go players, even if we don’t understand what they’re doing. And that’s a powerful technique that opens up all kinds of applications, both in games and in the real world.
Chess and checkers AIs are designed around reading out the game, farther and more accurately than human players can. There are two problems with applying this technique to Go. First, we can’t read very far ahead, because there are too many moves to consider. Second, if we could read ahead, we don’t know how to evaluate if the result is good. It turns out that deep learning is the key to unlocking both these problems.
A lightning introduction to the game of Go
You don’t need to be a strong Go player to read this book, but you do need to understand the rules well enough to enforce them in a computer program. Fortunately, the rules are famously simple. In short, two players alternate placing black and white stones on a board (white always goes second). The goal is to surround as much of the board as possible with your own stones.
Although the rules are simple, Go strategy has endless depth, and we don’t even attempt to cover it in this book. If you’re interested in learning more, we provide some resources at the end of this section.
A Go board is a square grid. Stones go on the intersections, not inside the squares. The standard board is19x19, but sometimes players use a smaller board for a quick game–9×9 and 13x13boards are popular.
Notice that nine points are marked with a dot. These points are called the star points. Their main purpose is to help players judge distances on the board: they have no effect on game play.
Figure 1.A standard 19×19 Go board. The intersections marked with the dots are the star points: they are solely for players’ reference. Stones go on the intersections.
Placing and capturing stones
One player plays with black stones and the other plays with white stones. The two players alternate placing stones on the board, starting with the black player. Stones don’t move once they are on the board, although they can be captured and removed entirely. To capture your opponent’s stones, you must completely surround them with your own. Here’s how that works.
Stones of same color that are touching are considered connected together. For the purposes of connection, we only consider straight up, down, left, or right; diagonals don’t count. Any empty point touching a connected group is called a liberty of that group. Every group needs as least one liberty to stay on the board. So you can capture your opponent’s stones by filling their liberties.
Figure 2.The three black stones are connected. They have four liberties on the points marked with squares. White can capture the black stones by placing white stones on all the liberties.
When you place a stone in the last liberty of an opponent’s group, that group is captured and removed from the board. On flip side, you may not play a stone that would have zero liberties, unless you are completing a capture.
Figure 3.The white stones on the left can never be captured: black can play at neither A nor B. A black stone placed there would have no liberties, and is therefore an illegal play. On the other hand, black can play at C to capture five white stones.
There’s an interesting consequence of the capturing rules. If a group of stones has two completely separate internal liberties, it can never be captured. See Figure 3: black can’t play at A, because that black stone would have no liberties. Nor can black play at B. So black has no way to fill the last two liberties of the white group. These internal liberties are called eyes. In constrast, black can play at C to capture five white stones. That white group has only one eye, and is doomed to get captured at some point.
Although it’s not explicitly part of the rules, the idea that a group with two eyes can’t be captured is the most basic part of Go strategy. In fact, this is the only strategy we will specifically code into our bot’s logic.
Ending the game and counting
Either player may pass any turn instead of placing a stone. When both players pass consecutive turns, the game is over. Before scoring, the players identify any dead stones: stones that have no chance of making two eyes or connecting up to friendly stones. Dead stones are treated exactly the same as captures when scoring the game. If there’s a disagreement, the players can resolve it by resuming play, but this is very rare: if the status of any group is unclear, players will usually try to resolve it before passing.
The goal of the game is to surround a larger section of the board than your opponent. There are two ways to add up the score, but they nearly always give the same result.
The most common counting method is territory scoring. In this case, you get one point for every point on the board that is completely surrounded by your own stones, plus one point for every opponent’s stone that you captured. The player with more points in the winner.
An alternative counting method is area scoring. With area scoring, you get one point for every point of territory, plus another point for every stone you have on the board. Except invery rare cases, you’ll get the same winner by either method: if neither player passes early, the difference in captures will equal the difference in stones on the board.
Territory scoring is more common in casual play, but it turns out that area scoring is slightly more convenient for computers. Throughout the book, our AI assumes it’s playing under area scoring, unless otherwise noted.
In addition, the white player gets extra points as compensation for going second. This compensation is called komi. Komi is usually 6.5 points under territory scoring or 7.5 points under area scoring–the extra half point ensures there are no ties. We assume a komiof 7.5 points throughout the book.
Figure 4.The final position of 9×9 game. Dead stones are marked with an X. Black territory is marked with a triangle. White territory is marked with a square.
Figure 4 shows an example 9×9 game. Here’s how you score it:
- First, the stones marked with an X are considered dead: they count as captures, even though the players did not actually make the capture in the game. Black also made a capture earlier in the game (not shown). So black has 3 captures and white has 2 captures.
- Black has 12 points of territory: the 10 points marked with a triangle, plus the two points underneath the dead white stones.
- White has 17 points of territory: the 15 points marked with a square, plus the two points underneath the dead black stones.
- Black has 27 stones on the board, after removing the dead black stones.
- White has 25 stones on the board, after removing the dead white stones.
- By territory scoring, white has 17 points of territory + 2 captures + 6.5 komi for a total of 25.5 points. Black has 12 points of territory + 3 captures for a total of 15 points.
- By area scoring, white has 17 points of territory + 25 stones on the board + 7.5 komi for a total of 49.5 points. Black has 12 points of territory + 27 stones on the board for a total of 39 points.
- By either counting method, white wins by 10.5 points.
There’s one more way a game can end: either player can resign at any point. In a game between experienced players, it’s considered courteous to resign when you are clearly behind. For our AI to be a good opponent, it should learn to detect when it should resign.
There’s one more restriction on where you can place stones. To ensure that the game ends, it is illegal to play a move that will return the board to a previous state. Figure 5 shows an example of how this can happen.
Figure 5.An illustration of the ko rule. First, black captures one white stone. White would like to capture back by playing at A, but that would revert the board to its previous position. The ko rule forbids such a play, in order to prevent an endless cycle of capture and re-capture. White must play somewhere else on the board first.
In the diagram, black has just captured one stone. White might like to play at the point marked A to re-capture the new black stone. That, however, would put the board back in the same position it was in two moves ago. Instead, white must play somewhere else on the board first. After that, if white captures a black stone by placing at A, the global board position will be different, making the move legal. Of course, this gives black the opportunity to protect the vulnerable stone in the meantime.
This situation is called a ko, from the Japanese word for “eternity.” Players adopt special strategies when a ko is on the board, and this was a weak spot for previous generations of Go AIs.
When two players of unequal strength play, there’s a simple system to keep the game interesting. The weaker player takes black and gets to place a few stones on the board before the game starts, these stones are called handicap stones. Then the stronger player takes white, and makes the first move.
It’s traditional to place the handicap stones on the star points, but some players allow the black player to choose where to put them.
Where to learn more
- The best way to get into Go is dive in and start playing, and it’s easier than ever to find a casual game online. The Online Go Server (http://online-go.com) is a popular server where you can play directly in your web browser. Even if you just learned the rules, its ranking system will help you find a competitive game. Other popular Go servers include the Kiseido Go Server (http://gokgs.com) and Tygem (http://www.tygembaduk.com).
- Sensei’s Library (https://senseis.xmp.net) is a wiki-style reference, full of strategy, tips, history, and trivia.
- Janice Kim’s Learn To Play Go series ranks among the best English-language Go books. We highly recommend volumes 1 and 2 for complete beginners: they will very quickly get you to the point where you can make sense of a game.
What can we teach a machine?
Early in the game, it’s difficult to evaluate a particular move because of the huge number of variations in the rest of the game. Both chess and Go AIs often use an opening book: a database of opening sequences taken from expert human games. To build this, you need a collection of game records from strong players. You analyze the game records, looking for common positions. In any common position, if there is a strong consensus on the next move–say, one or two moves account for 80% of the follow-ups –you add those moves to theopening book. The bot can then consult the book when playing a game. If any early game position appears in the opening book, the bot just looks up the expert move.
In chess and checkers, where pieces are removed from the board as the game progresses, AIs also contain similar endgame databases: once there are just a few pieces on the board, you can calculate out all the variations in advance. This technique doesn’t really apply to Go, where the board fills up toward the end.
Searching game states
The core idea behind board game AI is tree search. Think about how humans play a strategy game. First, we consider a possible move for our next turn. Then we need to think of how our opponent is likely to respond,and plan how we would respond to that, and so on. We read out the variation as far as we can, and then judge if the result is good. Then we backtrack a bit and look at a different variation to see if it’s better.
This very closely describes how the tree search algorithms used in game AI work. Of course, humans can only keep a few variations in their heads at once, while computers can juggle millions with no trouble. Humans make up for their lack of raw computing power with intuition. Experienced chess and Go players are scary good at spotting the handful of moves worth considering.
Ultimately, raw computing power won out in chess. But mastering computer Go took an interesting twist: bringing human intuition to computers.
In the context of game tree search, the number of possible moves on a given turn is the branching factor.
In chess, the average branching factor is about 30. At the start of the game, each player has 20 legal options for the first move; the number increases a bit as the board opens up. At that scale, it’s realistic to read out every single possible move to 4 or 5 moves ahead, and a chess engine will read out the more promising lines much deeper than that.
By comparison, the branching factor in Go is enormous. On the first move of the game, there are 361 legal moves, and the number decreases very slowly. The average branching factor is around 250 valid moves per turn. This means looking ahead just 4 moves requires evaluating nearly 4 billion positions. It’s crucial to narrow down the number of possibilities.
Table 1.Number of positions to evaluate
|Branching factor 30||Branching factor 250|
|After 2 moves||900||62,500|
|After 3 moves||27,000||15 million|
|After 4 moves||810,000||4 billion|
|After 5 moves||24 million||1 trillion|
Rules-based approaches to move selection turn out to be mediocre at this task: it’s extremely difficult to write out rules that reliably identify the most important area of the board. Luckily, deep learning is perfectly suited to the problem. We can apply supervised learning to train a computer to imitate a human player.
We start with a large collection of game records between strong human players; online gaming servers are a great resource here. Then we replay all the games on a computer, extracting each board position and the following move. That’s our training set. With a suitably deep neural network, it’s possible to predict the human move with better than 50% accuracy. You can build a bot that just plays the predicted human move, and it’s already a credible opponent. The real power comes when you combine these move predictions with tree search: the predicted moves give you a ranked list of branches to explore.
In any game more complex than tic-tac-toe, it’s not practical read out the entire game to the end. At some point, we have to stop and pick one of the variations we’ve looked at. To do so, we take the final board position we’ve read out and assign it a numerical score. Of all the variations we analyzed, we select the move that leads to the best-scoring position. Figuring out how to compute that score is the tricky part: that’s the problem of position evaluation.
In chess AIs, position evaluation is based on logic that makes sense to chess players. You can start with simple rules like: if you capture my pawn, and I capture your rook, that’s good for me. Top chess engines go far beyond that, factoring sophisticated rules about where on the board the pieces ended up and what other pieces are blocking their movement.
In Go, position evaluation may be even more difficult than move selection. The goal of the game is to surround more territory, but it’s surprisingly difficult to count territory–the boundaries tend to remain vague until very late in the game. Counting captures doesn’t help much either; sometimes you can make it all the way to the end of a game with only a couple captures. This is another area where human intuition has traditionally reigned supreme.
Deep learning was a major breakthrough here as well. The kinds of neural networks that are suitable for move selection can also be trained to evaluate board positions. Instead of training a neural network to predict what the next move will be, we train it to predict who will win. We can design the network so it expresses its prediction as a probability, which gives us a numeric score to evaluate the board position.
How to measure our Go AI’s strength
As we work on our Go AI, we’ll naturally want to know how strong it is. We need a system to describe how strong it is, and a way to calibrate its level.
Traditional Go ranks
Go players generally use the traditional Japanese system, where players are given kyu (beginner) or dan (expert) ranks. The strongest kyu rating is 1 kyu, and larger numbers are weaker. Dan ranks go in the opposite direction: 1 dan is one level stronger than 1 kyu, and larger dan ranks are stronger.
The ratings are based on the number of handicap stones required to make up the difference in strength between two players. For example, if Alice is 2 kyu and Bob is 5 kyu, that means Alice will usually give Bob 3 handicap stones, so that they have an equal chance of winning.
Regional Go associations maintain ranks for their members, as do most online Go servers. These ranks don’t match perfectly from one rating system to another, but players understand the general meaning of traditional ranks as follows:
|25 kyu||Complete beginners who have just learned the rules|
|20 kyu– 11kyu||Beginners|
|10 kyu– 1kyu||Intermediate players|
|1 dan – 6 dan||Strong amateur players|
|7 dan||Top amateur players, close toprofessional strength|
The strongest players in the world can get certified as a professional by their regional Go association. You can safely assume that any player with a professional rank is at least as strong as an amateur 7 dan player, and possibly much stronger.
Benchmarking our Go AI
An easy way to estimate the strength of our own bot is to play against other bots of known strength. Open source Go engines such as GNU Go and Pachi provide good benchmarks. GNU Go plays at around a 5 kyu level, and Pachi is about 1 dan (Pachi’s level varies a bit, depending on how much computing power you provide it). So if we have our bot play GNU Go 100 times and it wins around 50 games, we can conclude our bot is also somewhere in the neighborhood of 5 kyu.
To get a more precise rank, we can set up our AI to play on a public Go server with a rating system. A few dozen games should be enough to get a reasonable estimate.
- Games are a popular subject for AI research because they create controlled environments with known rules.
- The strongest Go AIs today rely on machine learning rather than game-specific knowledge. Due in part to the huge number of possible variations to consider, rule-based Go AIs were not historically strong.
- Two places we can apply deep learning in Go are move selection and position evaluation.
- Move selection is the problem of narrowing the set of moves we need to consider in a particular board position. Without good move selection, our Go AI will have too many branches to read.
- Position evaluation is the problem of estimating which player is ahead and by how much. Without good position evaluation, our Go AI will have no ability to select a good variation.
- We can measure the strength of our AI by playing against widely available bots of known strength, like GNU Go or Pachi.