
Smashology

Hey everyone! It has long been my dream to beat my college roommate and childhood friend, Richie, in Smash Bros for the N64. Anyone who knows Richie, or any of his family members, knows this will never happen. So once I began diving into neural networks, I knew what needed to be done: I needed to build a neural network that would surpass the master. I needed a smashbot.

Lo and behold, Stanford researchers published a paper at the end of February that did just that. Or at least, they made one that could predict moves well during training and validation given their methods, but it would not perform well near the edges of the stage. Why? Are neural networks not robust enough to play video games in real time given only camera input, like a human player? Do they really, as the researchers suggest, require access to internal game data? Let's take a look at their methods.

Their main assumption, which drives their choice of feeding half a second of frames into the neural network, is that time-based data is important to decision making in Smash Bros for the N64. They back up this claim with the observation that their network relies on previous frames more than on the current frame when predicting certain movesets. I agree that Pikachu's down-B is a special case where this holds, and that it holds for edge guarding too, but you would need far more than half a second of data to predict proper edge-guarding tactics, since those depend on whether the opponent still has a jump or a move left to use. On top of that limitation, based on my experience playing against it, the in-game AI seems unlikely to use time-based data at all; distance between the fighters looks like the deciding factor. The fact that the in-game AI can occasionally pull off combinations nearly impossible for humans using only distance data suggests that time-based data may not be as necessary as the researchers assumed for a robust model that does not fall off the edge while fighting.

In addition, they trained the network only by having one lab member play against the hardest in-game computer across multiple games. This rules out techniques like guided learning, in which you deliberately teach the network what to do in specific situations, across each relative position of the two fighters on the map, to help it generalize.

Their tooling also adds overhead. They used Fraps along with an on-screen keyboard display to continuously record training data, and I believe they also used Fraps to feed frames to the network during play. In my experience, writing data to disk and then reading it back through third-party software significantly slows a network's reaction time. They also recorded quite a bit of unnecessary data during training: their most common data point was "do nothing." There is never a time in Smash when you should be doing nothing, or at least not one you should use as a data point for a neural network. If you are not running back and forth as Captain Falcon, you should be taunting, or grabbing the edge, or throwing something their way.

In an effort to improve on their methods, I've written a set of Python scripts that record training data only on unique key presses: whenever the input changes, the script takes a screenshot of a selected screen region, names it with my own convention, and saves the pressed keys to a text file (first sketch below). Because data is recorded only when the player gives input, I can have a second player control the opponent and perform guided learning.

I carried the same screen capture into the testing process. Just by switching from third-party image software to a Python library that loads the screen data directly into a matrix during play (second sketch below), I cut my network's reaction time from 0.4 seconds down to the default refresh rate of the emulator, meaning I am currently bound by the emulator rather than by the network. That is a massive performance improvement over the paper. Since I am feeding the network only a single frame, it is possible that using 0.5 seconds' worth of frames, or a more complex network, would push that reaction time back up; for the moment the network is a slightly simplified version of NVIDIA's self-driving car network (third sketch below). I'll be collecting data with Richie as soon as possible so that the network can learn from the best. Afterwards, I'll look at adding a reinforcement learning stage that uses a separate neural network, trained to recognize the text and numbers on Smash screens, purely to quantify performance. Hopefully the student will be able to surpass the master. What are your thoughts for improvement before we get down to the final behavioral-cloning training data collection? For the curious, rough sketches of the current tooling follow.
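First, the recorder. This is a minimal sketch of the idea rather than the exact script: the screen region, the key mapping, and the file layout are placeholder assumptions. It polls the mapped keys and saves a frame only when the set of held keys changes, which is exactly what filters out the "do nothing" frames.

```python
# record.py -- keypress-triggered data collection (sketch).
# Placeholder assumptions: emulator window location, key mapping, file layout.
import os
import time

import keyboard   # pip install keyboard (may need admin/root privileges)
import mss
import mss.tools

REGION = {"top": 100, "left": 100, "width": 640, "height": 480}  # emulator window
KEYS = ["up", "down", "left", "right", "a", "b", "z"]            # mapped controller keys


def held_keys():
    """Return the set of mapped keys currently pressed."""
    return frozenset(k for k in KEYS if keyboard.is_pressed(k))


def record():
    os.makedirs("frames", exist_ok=True)
    last = frozenset()
    with mss.mss() as sct, open("labels.txt", "a") as log:
        while True:
            held = held_keys()
            # Save only when the input *changes* -- no "do nothing" frames.
            if held and held != last:
                stamp = f"{time.time():.3f}"
                shot = sct.grab(REGION)
                mss.tools.to_png(shot.rgb, shot.size, output=f"frames/{stamp}.png")
                log.write(f"{stamp} {','.join(sorted(held))}\n")
            last = held
            time.sleep(0.01)  # poll ~100x/sec, faster than the emulator's frame rate


if __name__ == "__main__":
    record()
```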
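Second, the in-play capture loop, again as a sketch with placeholder coordinates and with `predict_keys` standing in for the trained network's forward pass. The point is that `mss` hands the pixels straight to numpy as a matrix, so there is none of the write-to-disk, read-from-disk round trip you get with Fraps.

```python
# play.py -- real-time screen capture into a matrix (sketch).
import numpy as np
import mss

REGION = {"top": 100, "left": 100, "width": 640, "height": 480}  # placeholder


def predict_keys(frame):
    """Stand-in for the trained network; replace with e.g. model.predict()."""
    return []


def play():
    with mss.mss() as sct:
        while True:
            frame = np.array(sct.grab(REGION))  # BGRA pixels, already a numpy matrix
            keys = predict_keys(frame)
            # ...press `keys` in the emulator (e.g. via a virtual keyboard)...


if __name__ == "__main__":
    play()
```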
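Finally, the network itself. This is roughly what a slightly simplified version of NVIDIA's self-driving architecture looks like in Keras; the input size, the dropped convolutional layer, and the number of output buttons are my assumptions for illustration. One sigmoid per button lets the network press several buttons at once.

```python
# model.py -- simplified PilotNet-style CNN (sketch).
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, Lambda

N_BUTTONS = 7  # up/down/left/right/A/B/Z in this sketch

model = Sequential([
    Lambda(lambda x: x / 127.5 - 1.0, input_shape=(66, 200, 3)),  # normalize pixels
    Conv2D(24, (5, 5), strides=(2, 2), activation="relu"),
    Conv2D(36, (5, 5), strides=(2, 2), activation="relu"),
    Conv2D(48, (5, 5), strides=(2, 2), activation="relu"),
    Conv2D(64, (3, 3), activation="relu"),   # one 3x3 layer instead of PilotNet's two
    Flatten(),
    Dense(100, activation="relu"),
    Dense(50, activation="relu"),
    Dense(N_BUTTONS, activation="sigmoid"),  # independent per-button probabilities
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```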
