Some of the information in this article is based on research findings that are yet to be peer-reviewed. Results are therefore regarded as preliminary and should be interpreted as such. Find out about the role of the peer review process in research here. For further information, please contact the cited source.
Researchers have published a paper documenting the ability of an innovative AI system, dubbed Go-Explore, to outperform both human and state-of-the-art algorithmic rivals at 55 classic Atari 2600 games now used as benchmarks for machine intelligence.
The study was published in Nature.
The team, led by Jeff Clune, conducted most of the initial work whilst part of Uber's AI labs, but its members have now all moved to non-profit OpenAI. They first released their initial data in 2018 in a press release. That initial foray showed that Go-Explore was able to crack two games that had proved nigh-on impossible for AI systems up to that point, Pitfall and Montezuma’s Revenge.
Speaking to Technology Networks, study co-author Joost Huizinga explained how Go-Explore was able to wipe the floor with its rivals. “There are two fairly simple principles that made that possible. The first one is explicitly remembering where you’ve been in the space that you want to explore. A lot of previous reinforcement learning algorithms use intrinsic reward, which means you get rewarded for going somewhere new, which does work, but the entire idea of an intrinsic reward is that it gets reduced over time, meaning that the second time you reach the same [place], it’s no longer as new, so you get less reward.”
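The fading reward Huizinga describes can be illustrated with a count-based novelty bonus. The sketch below is a minimal example, assuming a bonus of one over the square root of the visit count. That is one common form, chosen here for illustration, and not necessarily the one used by any of the algorithms he refers to. The names `visit_counts` and `intrinsic_reward` are hypothetical.

```python
import math
from collections import Counter

# Assumption: novelty bonus = 1 / sqrt(number of visits to this state).
visit_counts = Counter()

def intrinsic_reward(state):
    """Return a novelty bonus that shrinks each time `state` is revisited."""
    visit_counts[state] += 1
    return 1.0 / math.sqrt(visit_counts[state])

# Visit the same (hypothetical) room four times and record the bonus each time.
bonuses = [intrinsic_reward("room_1") for _ in range(4)]
print(bonuses)
```

The bonus starts at 1.0 on the first visit and has halved by the fourth, so an agent driven only by this signal has steadily less reason to go back to a room it already knows.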
AI takes on metroidvania madness
For releases such as Montezuma’s Revenge, this would lead to algorithm-controlled characters getting stuck after only exploring a portion of the game world, as these metroidvania-style games require repeat visitations to areas of a map to find new rewards. Go-Explore, by comparison, retains a memory of the places it has explored and repeatedly visits them.
Video: a "trailer" for Go-Explore. Credit: Adrien Ecoffet via YouTube
The second element that sets Go-Explore apart, Huizinga says, is that rival programs explore by taking random actions in the game world in the hope of triggering reward events. In the hazard-filled Montezuma’s Revenge, this leads more often than not to the player character getting bitten by snakes, incinerated in fires or crushed by rolling skulls. This hampers the AI’s ability to explore the game world. Go-Explore, by comparison, first returns to previously explored areas without taking random detours off the path, and only then explores, thereby avoiding repeated deaths.
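The two principles, remembering where you have been and returning before exploring, can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the authors' implementation. It uses a made-up one-dimensional corridor with two hazards, and it "returns" by replaying a stored action sequence in a deterministic world. All names (`HAZARDS`, `MOVES`, `step`, `replay`, `go_explore`) are hypothetical.

```python
import random

HAZARDS = {3, 6}          # toy hazards; landing on one ends the episode
GOAL = 9                  # right-hand end of the toy corridor
MOVES = {"back": -1, "step": 1, "jump": 2}

def step(pos, action):
    """Deterministic toy dynamics: return (new_pos, alive)."""
    new_pos = max(0, min(GOAL, pos + MOVES[action]))
    return new_pos, new_pos not in HAZARDS

def replay(actions):
    """Return phase: re-run a stored action sequence from the start."""
    pos = 0
    for action in actions:
        pos, alive = step(pos, action)
        assert alive  # stored trajectories never pass through a hazard
    return pos

def go_explore(iterations=500, explore_steps=5, seed=0):
    rng = random.Random(seed)
    archive = {0: []}  # position -> shortest safe action sequence found so far
    for _ in range(iterations):
        target = rng.choice(sorted(archive))   # pick a remembered place
        trajectory = list(archive[target])
        pos = replay(trajectory)               # first return, no random moves
        for _ in range(explore_steps):         # then explore from there
            action = rng.choice(sorted(MOVES))
            pos, alive = step(pos, action)
            if not alive:
                break                          # death ends this exploration
            trajectory.append(action)
            if pos not in archive or len(trajectory) < len(archive[pos]):
                archive[pos] = list(trajectory)
    return archive

archive = go_explore()
print(sorted(archive))      # safe positions discovered
print(archive.get(GOAL))    # a safe action sequence to the goal, if found
```

Random moves are only taken after the agent is back at a remembered position, so a death costs one short exploration rather than the whole journey there.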
After Go-Explore’s initial release, some analysts noted that the authors, at the time, relied on an algorithmic trick called domain knowledge encoding to make things easier for Go-Explore. This technique involved manually extracting details from the game, like the room that the character was in, to be able to inform the algorithm that it had found something new. In the Nature paper, Huizinga and colleagues have enhanced the system. “Instead of manually taking those features and providing them to Go-Explore, we have an automated process that takes the pixels of the screen, which is also the input to the agent itself, and then we downscale it in such a way that we can identify whether two frames are very different or effectively the same,” said Huizinga.
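The downscaling idea Huizinga describes can be sketched in a few lines. This is a minimal illustration that assumes a greyscale frame stored as a list of rows, averaged over small tiles and reduced to a handful of grey levels. The tile size, number of levels and the function name `downscale` are assumptions for the example, not the settings used in the paper.

```python
def downscale(frame, block=2, levels=4, max_value=255):
    """Average block x block pixel tiles, then quantise to `levels` grey levels."""
    rows, cols = len(frame), len(frame[0])
    cell = []
    for r in range(0, rows - rows % block, block):
        out_row = []
        for c in range(0, cols - cols % block, block):
            tile = [frame[r + i][c + j] for i in range(block) for j in range(block)]
            mean = sum(tile) / len(tile)
            out_row.append(min(levels - 1, int(mean * levels / (max_value + 1))))
        cell.append(tuple(out_row))
    return tuple(cell)  # hashable, so it can be used as a dictionary key

# Two nearly identical frames and one clearly different frame (made-up pixels).
frame_a = [[0, 0, 255, 255], [0, 0, 255, 255], [10, 10, 200, 200], [10, 10, 200, 200]]
frame_b = [[3, 1, 250, 255], [0, 2, 255, 251], [12, 9, 205, 198], [10, 11, 200, 202]]
frame_c = [[255, 255, 0, 0], [255, 255, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

print(downscale(frame_a) == downscale(frame_b))  # near-duplicates share a key
print(downscale(frame_a) == downscale(frame_c))  # different scenes do not
```

Because small pixel differences vanish after averaging and quantising, frames that look effectively the same collapse to one key, while a genuinely new scene produces a new one.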
A more robust process
This innovation meant that Go-Explore could move beyond Pitfall and Montezuma’s Revenge to now crack other Atari challenges, such as Gravitar, Berzerk and Centipede. After the team showed Go-Explore’s ability to successfully navigate the environments of these games and achieve high scores, they then showed it was capable of adapting to changes in the game world, a process called robustification. Due to budget limitations, the team was only able to show this process off in 11 games.
Criticisms of the initial release focused on the AI’s reliance on simulators, which instantly recreated particular rooms in the games tested, rather than making the AI explore again from the start. The team therefore showed that the AI could simply be instructed to go to a particular area of the game without handholding and still reach it successfully.
But what are the real-world applications of Go-Explore? Huizinga says that it will have a great deal of utility in streamlining robot platforms, by letting them solve problems much more quickly, but he is personally most excited about the applications of the algorithm in finding weaknesses in automated systems. He mentions a preprint paper that tests Go-Explore’s ability to prevent self-driving cars from colliding with pedestrians (an application that explains why Uber was interested in the technology in the first place). “Go-Explore may help in self-driving cars, not so much to directly teach the self-driving cars, but to potentially increase their safety,” concluded Huizinga.
Ecoffet A, Huizinga J, Lehman J, Stanley KO, Clune J. First return, then explore. Nature. 2021;590(7847):580-586. doi:10.1038/s41586-020-03157-9