Andrew Lucker

Originally published at Medium

(not very) Deep Learning

TensorFlow is not a unicorn; it’s just another tool

For the last year I’ve been playing around with different algorithms for playing Atari games through the OpenAI platform. The nice thing about these games is that the outcome of every action (or inaction) is deterministic: if you make all the same choices, you will get the same result.

So far the best performance I’ve found comes from the A3C algorithm. A3C is similar to DQN; however, it runs many similar variants of the same strategy in parallel to learn which actions or insights have a large impact on value and which information can be safely ignored. This is good enough to solve a few of the simpler problems, but it fails to gain any deeper insight into complex environments, and it has particular trouble with object recognition.
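
To make the "impact on value" idea a little more concrete, here is a minimal sketch of the n-step return and advantage estimate that actor-critic methods such as A3C are usually described with. The function name, the rewards, and the critic values are illustrative assumptions, not output from a real training run.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Return (returns, advantages) for one rollout segment.

    rewards[t] is the reward received after step t, values[t] is the
    critic's estimate V(s_t), and bootstrap_value is the estimate for the
    state after the last step (use 0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one that follows it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the outcome was than the critic expected.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


# Hypothetical 3-step segment: nothing happens twice, then a hit worth 30.
returns, advantages = n_step_advantages(
    rewards=[0.0, 0.0, 30.0],
    values=[10.0, 10.0, 10.0],
    bootstrap_value=10.0,
    gamma=0.5,
)
print(returns)
print(advantages)
```

A positive advantage nudges the policy toward the action that was taken and a negative one nudges it away, which is roughly how the agent sorts out what matters from what can be ignored.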

A good example is the Atari River Raid environment. Here is one of my A3C bots playing:

If you watch the clip you will see that the bot adopts the basic strategy of “avoid obstacles and keep shooting”. This would be a good strategy if not for the concept of ‘fuel’ that the game introduces. To progress further through the levels, there is a deferred need to 1) not shoot the fuel cartridges and 2) collect them by touching them. The problem is that there is no near-term penalty for ignoring the fuel cartridges, and thus the agent never learns that they are important. This bot has been fairly well trained, so I suspect that more training would not help much with this problem. It is fundamentally a problem of breadth vs. depth of value search.
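
A rough way to see why a deferred penalty is so hard to learn from: with a discount factor, anything that arrives many steps later carries almost no weight at the moment the decision is made. The sketch below assumes a discount of 0.99, and the delays are made up for illustration rather than measured from River Raid.

```python
def discounted_weight(delay_steps, gamma=0.99):
    """Weight a reward or penalty carries when it arrives delay_steps later."""
    return gamma ** delay_steps


# Illustrative delays only; not measured from the game.
for delay in (10, 100, 500, 1000):
    print(delay, round(discounted_weight(delay), 5))
```

Under that assumed discount, a penalty 500 steps away is weighted at less than 1% of its size, so "keep shooting" looks just as good with or without the fuel.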

What I would like to see more of in the AI research space is methods that combine object and feature recognition with policy reinforcement learning. This combination would simplify these game environments and help the agent cut through the immense redundancy in each one. Atari games are not like Go: not every pixel is critically important. Objects are usually larger sprites, with the exception of bullets and flak.
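
As a toy illustration of how much redundancy there is to cut, the sketch below groups same-colored neighboring pixels into objects with a plain flood fill and reports one bounding box per object. This is only a stand-in for real object recognition: the `find_sprites` name, the tiny frame, and the assumption that each sprite is a single flat color on a known background are all hypothetical.

```python
from collections import deque


def find_sprites(frame, background=0):
    """Return (top, left, bottom, right, color) for each 4-connected
    region of same-colored pixels that differs from the background."""
    height, width = len(frame), len(frame[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            color = frame[y][x]
            if color == background or seen[y][x]:
                continue
            # Breadth-first flood fill over one region.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                neighbors = ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1))
                for ny, nx in neighbors:
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and frame[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right, color))
    return boxes


# Hypothetical 5x6 frame: 0 is background, 1 is a larger sprite,
# 2 is a one-pixel bullet.
frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
sprites = find_sprites(frame)
print(sprites)
```

Here 30 pixels collapse into two objects, which is the kind of simplified input I would like the policy side to work from.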

So that is where I am stuck right now. I will be studying object recognition myself to see if I can reconcile that research with the policy learning side. Hopefully we will see deeper soon; the current pace is encouraging.