Solving the 3x3 Variant of 2048

SOLVING THE 3x3 VARIANT OF 2048
Brian Galebach - May 7, 2014

ABSTRACT:
The 3x3 variant of 2048 has been strongly solved for four objectives: Achieving a 256 tile, achieving a 512 tile, achieving a 1024 tile, and maximizing expected game score. For each of these four objectives, the expected value (win probability or expected additional score) has been calculated for each of the four allowable moves from any achievable board position.

2048 GAME DESCRIPTION:
The original 4x4 version of 2048 is a single-player game created in March 2014 by Gabriele Cirulli. (http://gabrielecirulli.github.io/2048/, http://en.wikipedia.org/wiki/2048_%28video_game%29).

The rules of the original game are simple:
The game is played on a 4x4 square grid. At the start of the game, two tiles randomly appear within an empty grid, and after each player move, one tile randomly appears. Each randomly appearing tile is a 2 with 90% probability or a 4 with 10% probability, and appears in any blank space with equal probability. On each turn, the player may move in any of four directions (left, up, right, or down), as long as at least one tile may slide in that direction (having an available blank space toward the direction of movement), or as long as two identical tiles are adjacent in the direction of movement. If none of the four directions are possible (no empty board spaces, and no adjacent identical tiles), the game ends.

When there are adjacent identical tiles in the direction of movement, they merge into a tile with the sum of the two tiles. In the case where three identical tiles are adjacent, the two tiles closest to the edge of the board towards the direction of movement are combined, with the third tile remaining uncombined. In the case of four identical adjacent tiles (impossible in the 3x3 variant), both pairs of tiles combine into higher tiles. After all tile merges are completed, all remaining tiles on the board slide as far as possible in the direction of the player move.

Each time a pair of tiles combines, it adds the total of those tiles (the value of the new tile) to the player's score. No points are added to the player's score for randomly appearing tiles. The main goal of the original 2048 game was to achieve the 2048 tile, but shortly after the game creation, the game was modified to allow the player to continue to strive for higher tiles and higher scores.

3x3 VARIANT
The author was motivated to seek out the 3x3 variant of 2048 (https://itunes.apple.com/us/app/2048-3x3/id866593895?mt=8, http://cyberzhg.github.io/2048/index.html?size=3) after achieving some very high scores in the original 4x4 version (Figure 1). While scoring this high in the 4x4 version is fun for the player, this does require thousands of moves and several hours to achieve, and so can become somewhat tedious. In contrast, the 3x3 variant, with its far more restrictive board, requires just minutes to achieve relatively high scores (Figure 2).

Figure 1: Highest score and board position achieved by author in the 4x4 original 2048 version (without computer assistance)

Figure 2: Highest score and board position achieved by author in the 3x3 2048 variant app (with computer assistance)

The stated goal of the 3x3 variant app is to achieve the 256 tile. It is also possible to achieve the 512 tile and the 1024 tile, but somewhat ironically, not the 2048 tile. There is not enough board space to create the 2048 tile since after two 4 tiles are combined into an 8, leaving an optimal set of 8 tiles (8, 16, 32, 64, 128, 256, 512, 1024), the new random tile of a 2 or 4 appearing in the final empty space immediately ends the game.

SOLVING THE 3x3 VARIANT
--Motivation and Overview
In order to see how closely the author's strategy matched optimal game strategy, and as a fun programming challenge, the author set out to strongly solve (http://en.wikipedia.org/wiki/Solved_game) the 3x3 variant. The first goal chosen was the stated goal of the variant, which is to achieve the 256 tile. After the game was solved for that goal, three additional goals were solved: Achieving the 512 tile, achieving the 1024 tile, and maximizing the expected value of the player's score.

For each of the four goals, each achievable position (following the random tile appearance) was calculated from the start of the game to the end of the game when no more moves are possible, or to when the the goal tile is achieved. In each of those positions, the expected value for each of the possible four moves (left, up, right, down) is calculated, where the expected value is either the probability of achieving the goal tile, or it is the expected value of the additional score that will be achieved for that move, including the score for combining any tiles on that particular move.

--Board Storage
Each achievable board position can be rotated and/or reflected to create up to 8 strategically equivalent board positions (including the original non-rotated/non-reflected position). In order to minimize the number of board positions that needed to be calculated and stored, only a board position in which the lexigraphical order (http://en.wikipedia.org/wiki/Lexigraphical_order) of the numbers read from left to right and then top to bottom appears first among its 8 possible orientations is considered.

When a new board position is generated from a player move and a random tile appearance, it is then converted to its minimum lexigraphical-order configuration. Furthermore, each of the blocks is converted to its base-2 logarithm, and empty spaces are 0. Concatenating the 9 numbers and treating this as a base-11 number (where 10 becomes "A" for the 1024 tile) allows for a board position to be stored in a single unsigned 32-bit integer. For example, in the board position shown in Figure 2, the base-11 value for the minimum configuration would be 212469578, which converts to a decimal value of 452,492,774.

Discounting the non-minimum equivalent board configurations, there are a total of 48,713,519 achievable game positions in the 3x3 variant.

--Choice of Game Solve Algorithm
Each turn is comprised of two parts, the player move, which is deterministic, and the random tile generation.

If we were to consider the player as "player 1" and the random tile generation as "player 2", we could attempt to solve the game using a minimax (http://en.wikipedia.org/wiki/Minimax) algorithm. However this would be inaccurate, since the random tile generation does not work in an adversarial fashion. (The randomness is not specifically attempting to thwart the player, even if it sometimes seems that way.) Therefore, using a minimax algorithm would find only strategies that will work for any randomly generated new positions, but would not necessarily find strategies that will work even better in the most likely randomly generated positions.

Rather than using the minimax algorithm, the correct approach is to use an "expectimax" algorithm. After each of the intermediate board positions caused by each of the four possible player moves, the randomness can generate up to 16 new board positions (a 2 or a 4 block in each of up to 8 empty spaces). Since the 2's are generated 9 times more often than the 4's, the value of the new board positions achieved from a 2 appearing must be weighted 9 times more than the positions achieved from a 4 appearing when computing the EXPECTed value of that player move. Finally, the value of the original board position, before the player move, will be the MAXimum of the values of the four possible player moves. This is the essence of the expectimax algorithm.

--Board Position Value
Using the expectimax algorithm described above, an expected value for each possible board position can be calculated by working backwards from the game-end positions. For the goals of reaching a particular tile, the value of a game-end position is 1 (100% win probability) if the board contains that tile, and 0 there are no further legal moves. For the goal of maximizing the expected value of the player's score, the value of every game-end position is 0, since no additional score will be achieved from that point in the game.

For non-game-end positions, the value of the position is the maximum of the values of the four possible player moves, where the value of each possible player move is the weighted average of the possible randomly generated positions after the appearance of a 2 or 4. For the goals of reaching a particular tile, the weighted average is computed from the win probabilities of each of the resulting positions. For the goal of maximizing expected player score, the weighted average is computed from the expected value of each of the resulting positions (the additional score expected to be achieved from that new position), plus the score achieved from the combination of tiles from the player move. So for example, if a particular player move caused two 128 tiles to combine into a 256 tile, the value of that move would be the weighted average of the possible randomly generated positions plus 256.

RESULTS AND OBSERVATIONS
--Key Results
Firstly, the expected values from the start of the game for the four goal conditions are as follows:
256 Tile: 0.99571 (99.571% probability of achieving this tile)
512 Tile: 0.73677 (73.677% probability of achieving this tile)
1024 Tile: 0.01134 (1.134% probability of achieving this tile)
Maximizing Score: 5468.5 (Expected value of final game score is 5468.5 points)

It is important to note that the above expected values are only valid for the specific moves (strategies) calculated with each particular goal in mind. For example, the strategy calculated with the goal of maximizing the expected value of the score will not achieve the 512 tile 73.677% of the time, but at a slightly lower percentage. This is because it might "bail out" in some critical position going for some surefire points, rather than forgoing those immediate points in trying for a very low-probability (but theoretically) possible path to achieving the 512 tile. Similarly, the strategy for achieving a particular tile will not quite maximize the player's expected score, nor will it quite maximize the probability of achieving a different goal tile, because its strategy is solely tuned for achieving only that particular tile.

--Optimal Strategy
The author was surprised that the probabilities of achieving the 256 and 512 tiles were as high as was calculated. The author had been achieving (without computer assistance) the 256 tile perhaps no more than 70% of the time. And in going for the 512 tile, the author was rarely getting to a specific critical position which had just a 55% probability of reaching the 512 tile. In contrast, the computer achieves the 256 tile nearly every game, and has been seen to reach the 1024 tile and a final game score of over 12,000 points.

The key difference in the optimal strategy from what the author had previously thought was good strategy is that the computer prefers to play around a corner of the board rather than in rows (or columns). The optimal strategy seems to place the four highest tiles in a corner (as opposed to placing the six highest tiles in two rows or two columns), with the lowest of those four tiles on the center square. It then uses the other five spaces to generate a tile equal to the center tile. This strategy seems more robust than the by-rows strategy in that it usually seems not to be thrown off when 4 tiles appear. However, this corner strategy seems more difficult to implement by a human, as it is very easy to make incorrect moves that move the four corner tiles out of position, or that quickly lead to a blocked situation in the other five spaces. The author has not yet mastered this corner technique.

--Viewing Results
To aid in viewing and analyzing the results, four functions were created.

The function "show2048strat", gives the winning probabilities for each of the allowable player moves given a board position and a goal tile (either 256, 512, or 1024). An example of its output is shown below.

-----
>> show2048strat([0 2 0;0 64 4;0 8 8],512);
R: 73.6759239587902%
L: 73.6754070877757%
U: 73.6685530252522%
-----

The function "show2048strat_s", gives the expected additional score for each player move, including the points generated by combining tiles in this move. An example of its output is shown below.

-----
>> show2048strat_s([0 2 0;0 64 4;0 8 8]);
R: 5128.02813702287
L: 5127.9948379673
U: 5127.53891560638
-----

The functions "play2048" and "play2048_s" allow a player to play a complete game starting from any achievable position and starting score with the goals of achieving a particular tile or maximizing expected score. After each move in which the computer automatically makes the best strategic play, the user can enter whether a 2 or a 4 was randomly generated, and in which spot it was placed. Both programs also allow the computer to play on its own from the start of the game, in which case the computer randomly generates the 2's and 4's on its own. An example of the function "play2048_s" where the computer is playing by itself is shown below.

-----
-------------------

256  32  16
128  16   8
  4   2   0

D: 2089.47838589565
R: 25.6759642823475

D: 2712 + 0 = 2712
2712 + 2089.47838589565 = 4801.47838589565
-------------------

256  32   2
128  16  16
  4   2   8

L: 2205.22465432917
R: 33.454273792

L: 2712 + 32 = 2744
2744 + 2173.22465432917 = 4917.22465432917
-------------------
-----

For each of the above two board positions shown, the expected additional point value of each of the legal player moves is given. Then the current score is given, plus the immediate value of the move from tiles combining, and a new score. On the final line, the new score is given, plus the expected value of all remaining moves in the game, and finally the expected final score of the game. In this particular example, the fact that a 2 randomly appeared in the upper-right corner had the effect of increasing the expected game value by over 100 points. Had a 4 appeared in that same spot, the resulting position would have not been as favorable in the long run.

CONCLUSIONS AND POSSIBLE FUTURE WORK
Solving the 3x3 variant of 2048 uncovered a superior strategy which seems difficult for humans to emulate: the corner strategy. To what extent would this strategy extend to the original 4x4 version?

As noted before the various strategies computed for each goal do not maximize the expected values of the other goals. It should be possible to calculate the expected values of the four goals given each of the other three strategies.

Could the expectimax algorithm be used to solve the standard 4x4 version? Without any pruning (which would lower the accuracy of the results), likely billions of times more positions would need to be considered than for the 3x3 variant. At the moment, this would not be practical with the author's computational resources.