Progress in artificial intelligence agents is frequently measured by performance in games, and there’s a good reason for that: games offer a wide proficiency curve (relatively simple to pick up, but difficult to master), and they almost always have a built-in scoring system for evaluating performance. DeepMind’s agents have tackled the board game Go, as well as the real-time strategy video game StarCraft. But the Alphabet company’s most recent feat is Agent57, a learning agent that can beat the average human on each of 57 Atari games spanning a wide range of difficulty, characteristics and gameplay styles.
Being better than humans at 57 Atari games may seem like an odd benchmark against which to measure the performance of a deep learning agent, but it’s actually a standard that goes all the way back to 2012, built on a selection of Atari classics including Pitfall, Solaris, Montezuma’s Revenge and many others. Taken together, these games span a broad range of difficulty levels and require a variety of different strategies to succeed.
That makes them a great challenge for a deep learning agent, because the goal is not to build something that can find one effective, repeatable strategy for a single game. Instead, researchers build these agents and set them to these tasks to develop something that can learn across multiple, shifting scenarios and conditions. The long-term aim is a learning agent that approaches general AI: an AI that is more human-like in its ability to apply its intelligence to any problem put before it, including challenges it has never encountered before.
DeepMind’s Agent57 is remarkable because it performs better than human players on each of the 57 games in the Atari57 set. Previous agents have managed to beat human players on average, but only because they were extremely good at some of the simpler games, the ones that essentially reward a basic action-reward loop, while remaining terrible at games that demand more advanced play, including long-term exploration and memory, like Montezuma’s Revenge.
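For readers unfamiliar with that basic action-reward loop, here is a minimal, hypothetical sketch of the kind of learner that suffices for the easier games. It is not DeepMind’s code; the toy environment interface (reset, step, actions) is assumed purely for illustration.

```python
import random
from collections import defaultdict

def simple_action_reward_loop(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular epsilon-greedy Q-learning on a toy environment.

    `env` is assumed (for illustration only) to expose:
      reset() -> state, step(action) -> (next_state, reward, done), and `actions`.
    """
    q = defaultdict(float)  # learned value for each (state, action) pair
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Mostly exploit the best-known action; occasionally explore at random.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Nudge the estimate toward the observed reward plus discounted future value.
            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

A loop like this racks up points wherever immediate rewards point the way, which is exactly why agents built around it stall on games where the payoff only comes after long stretches of unrewarded exploration.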
The DeepMind team addressed this by building a distributed agent, with different computers tackling different aspects of the problem. Some are tuned to focus on novelty rewards (encountering things they haven’t encountered before), over both short- and long-term time horizons that govern when the novelty value resets. Others seek out simpler exploits, figuring out which repeated pattern yields the biggest reward. All of these results are then combined and managed by an agent equipped with a meta-controller, which lets it weigh the costs and benefits of the different approaches depending on which game it encounters.
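The article doesn’t detail how that meta-controller works internally, but one simplified way to picture it is as a bandit that chooses among policies tuned for different mixes of novelty-seeking and exploitation, based on how each has been scoring recently. The sketch below is an assumption-laden illustration (the policy settings, the window size and the random stand-in for running an episode are all hypothetical), not DeepMind’s implementation.

```python
import math
import random

class MetaController:
    """Picks which policy setting to play next, favoring whatever has paid off lately."""

    def __init__(self, policy_configs, window=90):
        # Each config pairs a hypothetical novelty-bonus weight with a discount
        # factor (time horizon), e.g. (beta, gamma).
        self.configs = policy_configs
        self.counts = [0] * len(policy_configs)
        self.returns = [[] for _ in policy_configs]
        self.window = window  # only recent episodes count, so the choice can shift per game

    def select(self):
        # Try every setting at least once, then score by an upper-confidence bound.
        total = sum(self.counts)
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        def ucb(i):
            recent = self.returns[i][-self.window:]
            mean = sum(recent) / len(recent)
            return mean + math.sqrt(2 * math.log(total) / self.counts[i])
        return max(range(len(self.configs)), key=ucb)

    def update(self, arm, episode_return):
        self.counts[arm] += 1
        self.returns[arm].append(episode_return)

# Usage: each arm stands for a policy with a different exploration style.
controller = MetaController([(0.0, 0.997), (0.1, 0.99), (0.3, 0.95)])
for episode in range(100):
    arm = controller.select()
    beta, gamma = controller.configs[arm]
    # A real agent would play one episode with this novelty weight and horizon;
    # a random score stands in for that here, purely for illustration.
    score = random.gauss(beta * 10, 1.0)
    controller.update(arm, score)
```

The point of the design is that no single exploration setting wins everywhere: a heavy novelty bonus helps on sparse-reward games like Montezuma’s Revenge, while a plain exploitative policy is better on simpler ones, and the controller learns which to lean on per game.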
In the end, Agent57 is an accomplishment, but the team says it can still be improved in a few ways. First, it’s incredibly computationally expensive to run, so they will look to streamline that. Second, it’s actually not as good at some of the simpler games as some simpler agents, even though it excels at the five games that posed the biggest challenge to previous intelligent agents. The team says it has ideas for how to make it better at those simpler games where less sophisticated agents still come out ahead.