<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joy</title>
    <description>The latest articles on DEV Community by Joy (@joooyz).</description>
    <link>https://dev.to/joooyz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F647073%2Fb08f045a-82e9-4256-8d7f-9c54a7df0805.png</url>
      <title>DEV Community: Joy</title>
      <link>https://dev.to/joooyz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joooyz"/>
    <language>en</language>
    <item>
      <title>I spent $15 in DALL·E 2 credits creating this AI image, and here’s what I learned</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Fri, 19 Aug 2022 05:19:48 +0000</pubDate>
      <link>https://dev.to/joooyz/i-spent-15-in-dalle-2-credits-creating-this-ai-image-and-heres-what-i-learned-4hl1</link>
      <guid>https://dev.to/joooyz/i-spent-15-in-dalle-2-credits-creating-this-ai-image-and-heres-what-i-learned-4hl1</guid>
      <description>&lt;p&gt;&lt;em&gt;Yes, that’s a llama dunking a basketball. A summary of the process, limitations, and lessons learned while experimenting with the closed Beta version of DALL·E 2.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C9pB5XXZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2bp6jaqm1t1xol31qi40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C9pB5XXZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2bp6jaqm1t1xol31qi40.png" width="880" height="880"&gt;&lt;/a&gt;&lt;br&gt;Llama playing basketball, generated using DALL·E 2 by author.
  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This article was originally published by me on &lt;a href="https://pub.towardsai.net/i-spent-15-in-dall-e-2-credits-creating-this-ai-image-and-heres-what-i-learned-52f352912025"&gt;Medium&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;I’ve been dying to try DALL·E 2 ever since I first saw this &lt;a href="https://twitter.com/hardmaru/status/1522166259890151424"&gt;artificially generated image of a “Shiba Inu Bento Box”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Wow — now that’s disruptive technology.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For those of you unfamiliar, &lt;strong&gt;DALL·E 2 is a system created by OpenAI that can generate original images from text.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s currently in closed Beta — I signed up for the waitlist in early May and got access at the end of July. During the Beta, users receive credits (50 free in the first month, then 15 free credits every month after); each generation costs 1 credit and produces 3–4 images. You can also purchase 115 credits for US$15.&lt;/p&gt;
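&lt;p&gt;&lt;em&gt;To put the pricing in perspective, here’s a rough back-of-envelope sketch (my own arithmetic based on the Beta pricing above, not an official calculator):&lt;/em&gt;&lt;/p&gt;

```python
# Rough DALL·E 2 Beta credit math (pricing as described above):
# 1 credit per prompt, 3-4 images per prompt, 115 credits for US$15.
PACK_PRICE_USD = 15
PACK_CREDITS = 115
IMAGES_PER_PROMPT = 4  # best case

cost_per_prompt = PACK_PRICE_USD / PACK_CREDITS
cost_per_image = cost_per_prompt / IMAGES_PER_PROMPT

print(f"~${cost_per_prompt:.2f} per prompt, ~${cost_per_image:.2f} per image")
```

&lt;p&gt;&lt;em&gt;At roughly 3 cents per image in the best case, the cost adds up quickly once trial-and-error enters the picture.&lt;/em&gt;&lt;/p&gt;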

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;P.S. If you can’t wait to try it, give &lt;a href="https://huggingface.co/spaces/dalle-mini/dalle-mini"&gt;DALL·E mini&lt;/a&gt; a go for free. However, the quality of its images is generally poorer (giving rise to a &lt;a href="https://www.wired.com/story/dalle-ai-meme-machine/"&gt;host of DALL·E memes&lt;/a&gt;), and it takes about 60 seconds per prompt (DALL·E 2, in comparison, takes only 5 seconds or so).&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You’ve probably seen various cherry-picked images online showing what DALL·E 2 is capable of (provided the right creative prompt). In this article, I share a candid walkthrough of what it takes to create a usable image from scratch for the subject matter: “a llama playing basketball”. &lt;strong&gt;You might find it useful if you’re thinking of trying out DALL·E 2 yourself, or you’re just interested in understanding what it’s capable of.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The starting point
&lt;/h2&gt;

&lt;p&gt;There’s both an art and science to knowing what prompt to feed DALL·E 2. To illustrate, here are the results for “&lt;em&gt;llama playing basketball&lt;/em&gt;”:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yEOOF-AA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dwbx1aamivz5t14zrrm5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yEOOF-AA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dwbx1aamivz5t14zrrm5.png" width="880" height="230"&gt;&lt;/a&gt;&lt;br&gt;Images generated by the author using DALL·E 2 with prompt “llama playing basketball.”
  &lt;/p&gt;

&lt;p&gt;Why is DALL·E 2 inclined to generate cartoon images for this prompt? I assume it’s because it saw few, if any, real images of a llama playing basketball during training.&lt;/p&gt;

&lt;p&gt;I attempted to go a step further by adding the key term ‘&lt;em&gt;realistic photo of&lt;/em&gt;’:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YWBqn1s4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/98p4ojbutj84i0l009zy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YWBqn1s4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/98p4ojbutj84i0l009zy.png" width="880" height="216"&gt;&lt;/a&gt;&lt;br&gt;Images generated by the author using DALL·E 2 with prompt “realistic photo of llama playing basketball”
  &lt;/p&gt;

&lt;p&gt;That llama’s looking more photorealistic, but the whole image is starting to look like a botched Photoshop job. In this case, DALL·E 2 clearly needed some hand-holding to create a cohesive scene.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt engineering, aka the art of specifying exactly what you want
&lt;/h2&gt;

&lt;p&gt;In the context of DALL·E, &lt;strong&gt;prompt engineering refers to the process of designing prompts to give you the desired results.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dallery.gallery/the-dalle-2-prompt-book/"&gt;DALL·E 2 Prompt Book&lt;/a&gt; is a fantastic resource for this. It contains a detailed list of inspirations for prompts using keywords from photography and art.&lt;/p&gt;

&lt;p&gt;Why is something like this necessary? &lt;strong&gt;Because getting a usable output from DALL·E 2 is finicky&lt;/strong&gt; (especially when you’re not sure what DALL·E 2 is capable of). So much so that a &lt;a href="https://techcrunch.com/2022/07/29/a-startup-is-charging-1-99-for-strings-of-text-to-feed-to-dall-e-2/"&gt;new startup is creating a marketplace charging $1.99 for prompts&lt;/a&gt; to save you the time and money from coming up with your own.&lt;/p&gt;
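&lt;p&gt;&lt;em&gt;One way I made the trial-and-error less random was to treat each prompt as a subject plus a stack of modifiers. Here’s a minimal sketch of that idea (a hypothetical helper of my own, not part of any DALL·E tooling):&lt;/em&gt;&lt;/p&gt;

```python
# Hypothetical helper: compose a DALL·E 2 prompt from a subject plus
# style/camera/lighting modifiers, so variants are easy to iterate on.
def build_prompt(subject, *modifiers):
    return ", ".join((subject,) + modifiers)

prompt = build_prompt(
    "film still of a llama dunking a basketball",
    "low angle",
    "extreme long shot",
    "indoors",
    "dramatic backlighting",
)
print(prompt)
```

&lt;p&gt;&lt;em&gt;Swapping a single modifier (say, ‘dramatic backlighting’ for ‘vaporwave’) then only costs one line, which makes it easier to see what each keyword actually contributes.&lt;/em&gt;&lt;/p&gt;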

&lt;p&gt;My personal favorite find is “&lt;em&gt;dramatic backlighting&lt;/em&gt;”:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jBPQp1dG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6tccem0nj2l8y8mm8tzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jBPQp1dG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6tccem0nj2l8y8mm8tzw.png" width="834" height="836"&gt;&lt;/a&gt;&lt;br&gt;Now we’re talking! Images generated by the author using DALL·E 2 with prompt: “Film still of a llama dunking a basketball, low angle, extreme long shot, indoors, dramatic backlighting.”
  &lt;/p&gt;

&lt;p&gt;It’s important to tell DALL·E 2 &lt;strong&gt;exactly&lt;/strong&gt; what you want. Apparently, it’s not obvious from the context that this llama should be dressed for the occasion. However, DALL·E 2 does a great job realizing this fantasy scene when ‘&lt;em&gt;llama wearing a jersey&lt;/em&gt;’ is specified:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xx1WYuG2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j5lmj4z6x9oe9ajcbu5g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xx1WYuG2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j5lmj4z6x9oe9ajcbu5g.png" width="875" height="212"&gt;&lt;/a&gt;&lt;br&gt;Basketball dunking llama, now comes with jerseys. Images generated by author with DALL·E 2 using prompt: “film still of an alpaca wearing a jersey, dunking a basketball, low angle, long shot, indoors, dramatic backlighting, high detail.”
  &lt;/p&gt;

&lt;p&gt;It doesn’t stop there. To add some drama to the image and really get this llama flying, I needed to specify phrases such as ‘&lt;em&gt;dunking a basketball&lt;/em&gt;', ‘&lt;em&gt;action shot of…&lt;/em&gt;’, or my personal favorite: “&lt;em&gt;…llama in a jersey dunking a basketball like Michael Jordan&lt;/em&gt;”:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b8pv3SY---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nbbpkh0ee02w21rr3o02.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b8pv3SY---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nbbpkh0ee02w21rr3o02.png" width="880" height="214"&gt;&lt;/a&gt;&lt;br&gt;Michael Jordan — if he was a llama, according to DALL·E 2. Images generated by author with DALL·E 2 using prompt “film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, show from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.”.
  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip: DALL·E 2 only stores the previous 50 generations in your history tab. Make sure to save your favourite images as you go.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  You might have noticed: DALL·E 2 isn’t great at composition.
&lt;/h2&gt;

&lt;p&gt;You’d think that from the context of ‘dunking a basketball,’ it’d be obvious what the relative positions of the llama, ball, and hoop should be. More often than not, the llama dunks the wrong way, or the ball is positioned in such a way that the llama has no real hope of making the shot. &lt;strong&gt;Though all the elements of the prompt are there, DALL·E 2 doesn’t truly ‘understand’ the relationship between them.&lt;/strong&gt; &lt;a href="https://www.unite.ai/is-dall-e-2-just-gluing-things-together-without-understanding-their-relationships/"&gt;This article covers the topic in more depth&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uRXi6cbE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s7g7idcyo6gmjr5vvexu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uRXi6cbE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s7g7idcyo6gmjr5vvexu.png" width="875" height="213"&gt;&lt;/a&gt;&lt;br&gt;Image generated by author using DALL·E 2 with prompt: “Film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.”
  &lt;/p&gt;

&lt;p&gt;Another artifact of DALL·E 2 not really ‘understanding’ the scene is the occasional mix-up in textures. In the image below, the net is made out of fur (a morbid scene once you think about it):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RtsuwLOT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bryuh9hlcpisljaev4fm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RtsuwLOT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bryuh9hlcpisljaev4fm.png" width="880" height="880"&gt;&lt;/a&gt;&lt;br&gt;Image generated by author using DALL·E 2 with prompt: “Expressive photo of a llama wearing a jersey dunking a basketball like Michael Jordan, low angle, extreme wide shot, indoors, dramatic backlighting, high detail.”
  &lt;/p&gt;

&lt;h2&gt;
  
  
  DALL·E 2 struggles to generate realistic faces
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://spectrum.ieee.org/openai-dall-e-2"&gt;According to some sources&lt;/a&gt;, this may have been a deliberate attempt to avoid generating deepfakes. I thought that would only apply to human subjects, but apparently, it applies to llamas too.&lt;/p&gt;

&lt;p&gt;Some of the results were downright creepy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8m0zD-dP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kneax86idpz1ty7z60ks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8m0zD-dP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kneax86idpz1ty7z60ks.png" width="875" height="875"&gt;&lt;/a&gt;&lt;br&gt;Image generated by author using DALL·E 2 with prompt: “Dramatic photo of an llama wearing a jersey dunking a basketball like Michael Jordan, low angle, wide shot, indoors, dramatic backlighting, high detail.”
  &lt;/p&gt;

&lt;h2&gt;
  
  
  Some other limitations of DALL·E 2
&lt;/h2&gt;

&lt;p&gt;Here are some other minor issues I experienced:&lt;/p&gt;

&lt;h3&gt;
  
  
  Angles and shots are interpreted loosely
&lt;/h3&gt;

&lt;p&gt;No matter how many variants of ‘&lt;em&gt;in the distance&lt;/em&gt;’ or ‘&lt;em&gt;extreme long shot&lt;/em&gt;’ I used, it was difficult to find images where the entire llama fit within the frame.&lt;/p&gt;

&lt;p&gt;In some cases, the framing was ignored entirely:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YUVNxc6i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uzkseqxw7tue9wnd5fny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YUVNxc6i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uzkseqxw7tue9wnd5fny.png" width="880" height="216"&gt;&lt;/a&gt;&lt;br&gt;Image generated by the author using DALL·E 2 with prompt: “Dramatic film still of a llama wearing a jersey dunking a basketball, low angle, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, indoors, dramatic backlighting, high detail.”
  &lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 2 can’t spell
&lt;/h3&gt;

&lt;p&gt;I guess this shouldn’t be too surprising given that DALL·E 2 struggles to ‘understand’ the relationship between components. It is, however, capable of attempting some fully formed letters in the right context:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ks0pnCCy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4zto8dz6h1r4ggbshg1g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ks0pnCCy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4zto8dz6h1r4ggbshg1g.png" width="875" height="875"&gt;&lt;/a&gt;&lt;br&gt;Image generated by author using DALL·E 2 with prompt: “Film still of a fluffy llama in a jersey dunking a basketball like Michael Jordan, low angle, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.”
  &lt;/p&gt;

&lt;h3&gt;
  
  
  DALL·E 2 can be temperamental with complex or poorly worded prompts
&lt;/h3&gt;

&lt;p&gt;Occasionally, adding keywords or phrasing the prompt in certain ways led to results that were completely different from what was expected.&lt;/p&gt;

&lt;p&gt;In this case, the real subject of the prompt (llama wearing a jersey) was completely ignored:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C_1Vwd-Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bt4eje832evb9vyxe0oz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C_1Vwd-Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bt4eje832evb9vyxe0oz.png" width="880" height="216"&gt;&lt;/a&gt;&lt;br&gt;Now that is an impressive dunk. Images generated by author using DALL·E 2 with prompt: “A low angle, long shot, indoors, dramatic backlighting, professional photo of a llama wearing a jersey, dunking a basketball.”
  &lt;/p&gt;

&lt;p&gt;Even adding the term ‘fluffy’ led to dramatically worse performance and multiple cases where it looked like DALL·E 2 just… &lt;em&gt;broke&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UhFyaZcN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ohm658r8ksgrx5tz13fv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UhFyaZcN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ohm658r8ksgrx5tz13fv.jpeg" width="875" height="212"&gt;&lt;/a&gt;&lt;br&gt;Images generated by the author using DALL·E 2 with prompt: “Film still of a fluffy llama in a jersey dunking a basketball like Michael Jordan, high detail, indoors, dramatic backlighting.” (Image intentionally modified to blur and hide faces).
  &lt;/p&gt;

&lt;p&gt;When working with DALL·E 2, it’s important to be specific about what you want &lt;strong&gt;without&lt;/strong&gt; overstuffing the prompt or adding redundant words.&lt;/p&gt;

&lt;h2&gt;
  
  
  DALL·E 2’s ability to transfer styles is impressive
&lt;/h2&gt;

&lt;p&gt;You need to try this!&lt;/p&gt;

&lt;p&gt;Once you have your keyword subject matter, you can generate the image in an impressive number of other art styles.&lt;/p&gt;

&lt;h3&gt;
  
  
  ‘Abstract painting of…’
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bTc8tgFy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e0wo7pt1cw9rrj3krx9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bTc8tgFy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e0wo7pt1cw9rrj3krx9n.png" width="880" height="214"&gt;&lt;/a&gt;&lt;br&gt;Images generated by the author using DALL·E 2 with prompt: “Abstract painting of a llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, dramatic backlighting, indoors. In the background is a stadium full of people.”&lt;br&gt;

  &lt;/p&gt;

&lt;h3&gt;
  
  
  ‘Vaporwave’
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kbE4UKK_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jajouu8yjjsbwnq38gwo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kbE4UKK_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jajouu8yjjsbwnq38gwo.png" width="880" height="214"&gt;&lt;/a&gt;&lt;br&gt;Images generated by the author using DALL·E 2 with prompt: “Film still of a llama in a jersey dunking a basketball like Michael Jordan, dramatic backlighting, vibrant sunset, vaporwave.”&lt;br&gt;

  &lt;/p&gt;

&lt;h3&gt;
  
  
  ‘Digital art’
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nZ07LH5B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/opd36n5yxem0vujvssf1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nZ07LH5B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/opd36n5yxem0vujvssf1.png" width="875" height="215"&gt;&lt;/a&gt;&lt;br&gt;Images generated by the author using DALL·E 2 with prompt: “llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, dramatic backlighting, epic, digital art”&lt;br&gt;

  &lt;/p&gt;

&lt;h3&gt;
  
  
  ‘Screenshots from the Miyazaki anime movie’
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--86uEK0LG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rn2y4vc6eq2bo1qf8ys3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--86uEK0LG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rn2y4vc6eq2bo1qf8ys3.png" width="875" height="212"&gt;&lt;/a&gt;&lt;br&gt;Images generated by the author using DALL·E 2 with prompt: “Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie”. Thanks to the tip in &lt;a href="https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-e-2-can-and-cannot-do#Art_style_transfer"&gt;this article&lt;/a&gt;.&lt;br&gt;

  &lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;After over 100 credits (~US$13) and a lot of trial-and-error, here’s my final image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5XemmXSV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6tve9jxsyb0jcg18207g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5XemmXSV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6tve9jxsyb0jcg18207g.png" width="880" height="880"&gt;&lt;/a&gt;&lt;br&gt;My winning image. &lt;a href="https://labs.openai.com/s/HYv3Kp8ElKDAWKHq2vs76VXu"&gt;https://labs.openai.com/s/HYv3Kp8ElKDAWKHq2vs76VXu&lt;/a&gt;&lt;br&gt;

  &lt;/p&gt;

&lt;p&gt;The image isn’t perfect, but DALL·E 2 managed to fulfill about 80% of the brief.&lt;/p&gt;

&lt;p&gt;Most of the credits went towards trying to get the right combination of style, faces, and composition to work together.&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://openai.com/blog/dall-e-now-available-in-beta/#fn1"&gt;OpenAI’s DALL·E announcement&lt;/a&gt;,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“&lt;em&gt;…users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise.&lt;/em&gt;”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Expect many users to play fast and loose with these rules.&lt;/p&gt;

&lt;p&gt;As a content creator, DALL·E 2 will be most useful for creating simple illustrations, photos, and graphics for blogs and websites. I’ll be using it as an alternative to Unsplash to create blog cover images that won’t look the same as everyone else’s.&lt;/p&gt;

&lt;p&gt;If you’re about to try out DALL·E 2 yourself, &lt;strong&gt;here’s a tl;dr of tips before you start:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check out the &lt;a href="https://dallery.gallery/the-dalle-2-prompt-book/"&gt;DALL·E 2 Prompt Book&lt;/a&gt;! (Also, the fan-made &lt;a href="https://www.reddit.com/r/dalle2/comments/v3jxud/me_and_someone_else_have_created_a_prompt/"&gt;Prompt Engineering Sheet&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Be prepared to do some trial-and-error to get what you want. Fifteen free credits might sound like a lot, but it really isn’t. Expect to use &lt;strong&gt;at least&lt;/strong&gt; 15 credits to generate a usable image. DALL·E 2 is &lt;strong&gt;not&lt;/strong&gt; cheap.&lt;/li&gt;
&lt;li&gt;Don’t forget to save your favorite images as you go.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Thanks for reading!&lt;/strong&gt; I’d love to hear your experience with DALL·E 2 and welcome any thoughts or feedback.&lt;/p&gt;

&lt;p&gt;If you enjoyed reading this, here are some articles by other writers you might like as well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://jacobmartins.com/posts/how-i-used-dalle2-to-generate-the-logo-for-octosql/"&gt;How I used DALL-E 2 to Generate The Logo for OctoSQL&lt;/a&gt; by Jacob Martins&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://towardsdatascience.com/how-i-used-ai-to-reimagine-10-famous-landscape-paintings-3e2924e03f79"&gt;How I Used AI to Reimagine 10 Famous Landscape Paintings&lt;/a&gt; by Alberto Romero&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-e-2-can-and-cannot-do"&gt;What DALL-E 2 can and cannot do&lt;/a&gt; by Swimmer963&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dalle</category>
      <category>datascience</category>
      <category>ai</category>
    </item>
    <item>
      <title>7 Best machine learning communities to advance your skills in 2022</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Wed, 04 May 2022 06:13:17 +0000</pubDate>
      <link>https://dev.to/joooyz/7-best-machine-learning-communities-to-advance-your-skills-in-2022-4075</link>
      <guid>https://dev.to/joooyz/7-best-machine-learning-communities-to-advance-your-skills-in-2022-4075</guid>
      <description>&lt;p&gt;At some point during your machine learning journey you may get stuck on a problem, start to lose motivation, or find yourself unable to keep up the rapid rate of new developments. In these situations, I find communities have a lot to offer regardless of your skill level.&lt;/p&gt;

&lt;p&gt;There are tons of communities out there — however, many are inactive or poorly moderated. To help streamline your search, I’ve curated a list of what I think are some of the most active, helpful, and interesting communities to check out — selected not just on overall size. I’ve also included a couple of niche communities if you’re interested in discovering new topics to explore.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Did I miss any that you’d recommend? I’m actively on the look-out for other communities to continually improve this article. Let me know in the comments!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  1. For general discussion and latest news: r/machinelearning
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Gj30Ydwn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tpun1icajds58qe3k1dk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Gj30Ydwn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tpun1icajds58qe3k1dk.png" alt="r/machinelearning subreddit home page (screenshot taken by Author)" width="880" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reddit is home to a whole host of forums (known as subreddits) covering various aspects of machine learning. Of these, &lt;a href="https://www.reddit.com/r/MachineLearning/"&gt;r/machinelearning&lt;/a&gt; is the go-to subreddit with over 2 million members sharing machine learning projects, latest research, and discussions. It’s well-moderated and regularly contributed to by industry veterans, meaning you’ll find plenty of quality content here.&lt;/p&gt;

&lt;p&gt;If you’re looking for something a bit more beginner-friendly, I’d recommend checking out &lt;a href="https://www.reddit.com/r/learnmachinelearning/"&gt;r/learnmachinelearning&lt;/a&gt; instead. This is where you can ask beginner questions and share beginner projects for feedback (they also have a &lt;a href="https://discord.gg/G3rvFKF"&gt;Discord server&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Some other related subreddits you might find useful include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/datascience/"&gt;r/datascience&lt;/a&gt;  (500K+ members) — discussion on data science careers&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/artificial/"&gt;r/artificial&lt;/a&gt; (145K+ members) — general AI news&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/reinforcementlearning/"&gt;r/reinforcementlearning&lt;/a&gt; (20K+ members) — focused on reinforcement learning&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  2. For competitions: Kaggle
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y7oEB4mu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8yirs5qy9z98ai8xyb4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y7oEB4mu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8yirs5qy9z98ai8xyb4u.png" alt="Kaggle competitions page (screenshot taken by Author)" width="880" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kaggle.com"&gt;Kaggle&lt;/a&gt; is the biggest data science competition platform. They partner with businesses to run challenges made up of a dataset and problem statement for anyone in the world to solve. Challenge topics vary from computer vision to stock exchange predictions. Joining a competition and contributing to the &lt;a href="https://www.kaggle.com/discussion"&gt;Kaggle Forums&lt;/a&gt; can be a useful way to collaborate with others working on the same project as you. Here you’ll be able to discuss approaches, algorithms, and advice for feature engineering. &lt;/p&gt;

&lt;p&gt;If the current competitions on Kaggle aren’t to your liking, some other data science competition platforms worth checking out include &lt;a href="https://www.aicrowd.com/"&gt;AICrowd&lt;/a&gt;, &lt;a href="https://omdena.com/"&gt;Omdena&lt;/a&gt;, &lt;a href="https://machinehack.com/"&gt;MachineHack&lt;/a&gt;, &lt;a href="https://www.drivendata.org/competitions/"&gt;DrivenData&lt;/a&gt;, and &lt;a href="https://zindi.africa/competitions"&gt;Zindi&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  3. For getting started: Learn AI Together Discord
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://discord.com/invite/learnaitogether"&gt;Learn AI together&lt;/a&gt; has over 24,000 members and is one of the largest AI communities on Discord. The community is managed by Louis at &lt;a href="https://www.youtube.com/channel/UCUzGQrN-lyyc0BWTYoJM_Sg"&gt;What’s AI&lt;/a&gt; —  a YouTube channel dedicated to beginner-friendly resources on getting started in machine learning. There’s a huge list of discussion topics from AGI to Kaggle competitions to healthcare (30+ and counting!), and dedicated sections to ask questions and share resources on the latest news and events.&lt;/p&gt;

&lt;h1&gt;
  
  
  4. For NLP: Hugging Face
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FEuuBlHI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ju9ba909nq3f1kl1773g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FEuuBlHI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ju9ba909nq3f1kl1773g.png" alt="Hugging Face home page (screenshot taken by Author)" width="880" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/"&gt;Hugging Face&lt;/a&gt; started originally with open-source tools for NLP projects, but has since expanded into fields such as Computer Vision and Reinforcement Learning. On the Hugging Face platform you can download and share models, and discuss projects on their &lt;a href="https://huggingface.co/join/discord"&gt;Discord&lt;/a&gt; or &lt;a href="https://discuss.huggingface.co/"&gt;Forum&lt;/a&gt;. If you’re having trouble figuring what type of project to build, heading over to Hugging Face may be a great source of inspiration.&lt;/p&gt;

&lt;h1&gt;
  
  
  5. For reinforcement learning: Reinforcement Learning Discussion Discord
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://discord.gg/xhfNqQv"&gt;Reinforcement Learning Discussion&lt;/a&gt; is an active Discord server with over 3,000 members. It’s managed by researchers in the reinforcement learning field, and they’re particularly friendly and catering for beginners. It can be a great place to ask questions on popular courses such as &lt;a href="https://www.deepmind.com/learning-resources/reinforcement-learning-lecture-series-2021"&gt;DeepMind’s Reinforcement Learning lectures&lt;/a&gt; or &lt;a href="https://spinningup.openai.com/en/latest/"&gt;Spinning Up by OpenAI&lt;/a&gt;, share progress and experiments with public reinforcement learning environments, and stay up-to-date on the latest research (many authors will share their latest papers directly in the server).&lt;/p&gt;

&lt;h1&gt;
  
  
  6. For DALL-E and similar generative projects: LAION
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://laion.ai/"&gt;LAION&lt;/a&gt; is a not-for-profit community whose main goal is to work together to replicate OpenAI’s DALL-E. They have an active &lt;a href="https://discord.gg/xBPBXfcFHd"&gt;Discord server&lt;/a&gt; with over 3,000 members at the time of writing. It’s a great place to keep up with (and contribute to) the open-source project, discuss related audio/video/3D topics, and share your own generative project for feedback.&lt;/p&gt;

&lt;h1&gt;
  
  
  7. For AI in gaming: StarCraft II AI Arena
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZegTU3iM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bzagmv19321iamyj7rwm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZegTU3iM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bzagmv19321iamyj7rwm.jpg" alt="AI Arena ladder homepage (screenshot taken by Author)" width="880" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the achievements of DeepMind’s AlphaStar or OpenAI’s Dota 2 AI brought you into the space, you might be interested in checking out &lt;a href="https://aiarena.net/"&gt;AI Arena&lt;/a&gt;. They’re a community of researchers, practitioners, and hobbyists building both scripted and deep learning agents for StarCraft II. They have an open &lt;a href="https://discord.gg/Emm5Ztz"&gt;Discord&lt;/a&gt; for meeting others, run regular community streams on Twitch, and provide getting-started resources for creating your own agent to enter their ranked tournament ladders.&lt;/p&gt;

&lt;h1&gt;
  
  
  Closing Remarks
&lt;/h1&gt;

&lt;p&gt;I hope that this list has helped you find a new community to meet others on a similar journey and take your skills to the next level. Join one or try them all to see what suits you best. Good luck!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>7 real-world applications of reinforcement learning</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Thu, 17 Feb 2022 03:30:33 +0000</pubDate>
      <link>https://dev.to/joooyz/7-real-world-applications-of-reinforcement-learning-3l9m</link>
      <guid>https://dev.to/joooyz/7-real-world-applications-of-reinforcement-learning-3l9m</guid>
      <description>&lt;p&gt;Reinforcement learning is a subdomain of machine learning in which agents learn to make decisions by interacting with their environment. It recently gained popularity through its ability to achieve superhuman-levels of play in games like Go, Chess, Dota, and StarCraft II.&lt;/p&gt;

&lt;p&gt;In this article, I’ve put together a list of 7 examples where reinforcement learning is being applied in real-world use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Autonomous driving with Wayve
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysg6upt6zewoz084oeea.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysg6upt6zewoz084oeea.jpg" alt="Image of car"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Approaches to self-driving cars have historically involved defining logic rules. This can be difficult to scale to the countless situations that autonomous vehicles might encounter on public roads. This is where &lt;a href="https://arxiv.org/pdf/2002.00444.pdf" rel="noopener noreferrer"&gt;deep reinforcement learning may be promising&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://wayve.ai/" rel="noopener noreferrer"&gt;Wayve&lt;/a&gt; is a UK-based company that has been testing autonomous vehicles on public roads since 2018. In their paper, '&lt;a href="https://arxiv.org/pdf/1807.00412.pdf" rel="noopener noreferrer"&gt;Learning to Drive in a Day&lt;/a&gt;', they describe how they used deep reinforcement learning to train a model using a monocular image as input. The reward was the distance travelled by the vehicle without the safety driver taking control. The model was trained in a driving simulation and then deployed in the real world on a 250-meter section of road.&lt;/p&gt;

&lt;p&gt;While their autonomous vehicle technology continues to evolve, they claim that reinforcement learning continues to play a part in &lt;strong&gt;motion planning&lt;/strong&gt; (ensuring a feasible path exists between the vehicle and its destination).&lt;/p&gt;
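&lt;p&gt;&lt;em&gt;A toy sketch of the reward idea described in 'Learning to Drive in a Day': reward accrues with distance travelled, and the episode ends when the safety driver intervenes. The function and numbers below are illustrative, not taken from the paper.&lt;/em&gt;&lt;/p&gt;

```python
# A toy sketch of the reward described in 'Learning to Drive in a Day':
# reward accrues with distance travelled, and the episode ends when the
# safety driver takes control. Names and numbers are illustrative.

def episode_return(speeds_mps, intervened_at_step=None, dt=0.1):
    """Sum per-step distance rewards until the intervention step."""
    total = 0.0
    for step, speed in enumerate(speeds_mps):
        if step == intervened_at_step:
            break                  # safety driver took control
        total += speed * dt        # reward = metres travelled this step
    return total

# Driving further without intervention earns more reward, which is
# exactly the behaviour the training signal encourages.
print(episode_return([5.0] * 100, intervened_at_step=40))  # 20.0
print(episode_return([5.0] * 100))                         # 50.0
```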

&lt;h2&gt;
  
  
  2. Personalizing your Netflix recommendations
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcup7uqzgf2q36uaacmty.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcup7uqzgf2q36uaacmty.jpg" alt="Image of Netflix"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Netflix has 200 million users in over 190 countries. For each of these users, Netflix aims to present the most entertaining and relevant videos. In the presentation '&lt;strong&gt;&lt;a href="https://scale.com/blog/Netflix-Recommendation-Personalization-TransformX-Scale-AI-Insights" rel="noopener noreferrer"&gt;Netflix Explains Recommendations and Personalization&lt;/a&gt;&lt;/strong&gt;' by Justin Basilico (Director of Machine Learning and Recommender Systems at Netflix),  he describes how they achieve this by combining four key approaches: deep learning, causality, bandits &amp;amp; reinforcement learning, and objectives. &lt;/p&gt;

&lt;p&gt;The challenge is to train a model that optimizes for a user’s long-term satisfaction over immediate gratification. Reinforcement learning can help by introducing exploration, which lets the model learn about new interests over time. &lt;/p&gt;

&lt;p&gt;Justin notes that reinforcement learning is challenging to apply in this setting due to the high dimensionality and large problem space. To help with this, the team developed &lt;a href="https://dl.acm.org/doi/abs/10.1145/3460231.3474259" rel="noopener noreferrer"&gt;Accordion&lt;/a&gt; — a simulator for long-term training.&lt;/p&gt;
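&lt;p&gt;&lt;em&gt;The exploration idea can be illustrated with a toy epsilon-greedy bandit: mostly recommend the title with the best observed click-through rate, but occasionally try others so new interests can still be discovered. The titles and rates below are made up, and this is far simpler than Netflix's actual system.&lt;/em&gt;&lt;/p&gt;

```python
import random

# A toy epsilon-greedy bandit: mostly recommend the title with the best
# observed click-through rate (exploit), but sometimes pick a random
# title (explore) so that new interests can still be discovered.
# Titles and click-through rates are invented for illustration.

random.seed(42)
true_ctr = {"comedy": 0.05, "thriller": 0.12, "documentary": 0.08}
clicks = {title: 0 for title in true_ctr}
shows = {title: 0 for title in true_ctr}

def estimated_ctr(title):
    return clicks[title] / max(shows[title], 1)

def recommend(epsilon=0.1):
    if random.random() > epsilon:               # exploit
        return max(true_ctr, key=estimated_ctr)
    return random.choice(list(true_ctr))        # explore

for _ in range(50000):
    title = recommend()
    shows[title] += 1
    if random.random() > 1.0 - true_ctr[title]:  # simulated user click
        clicks[title] += 1

# Over time, recommendations concentrate on the highest-CTR title.
print(max(shows, key=lambda t: shows[t]))
```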

&lt;h2&gt;
  
  
  3. Optimizing inventory levels for Walmart
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlgscalzrccdl5fnc493.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlgscalzrccdl5fnc493.jpg" alt="Image of Walmart website"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Walmart is the world's largest retailer and grocer with over 4,650 stores. Walmart must constantly move unsold inventory to make space for new and better-selling items. The usual strategy to move unwanted stock is to implement price reductions. This is a time-consuming and laborious undertaking that requires re-labelling discounted merchandise multiple times on a store-by-store basis.&lt;/p&gt;

&lt;p&gt;To reduce operating costs, &lt;a href="https://www.youtube.com/watch?v=pxWkg2N0l9c" rel="noopener noreferrer"&gt;Walmart created an algorithm to optimize price reductions&lt;/a&gt;. The algorithm ingests sales data, operating costs, the number and type of merchandise, and the dynamic time frame by which the merchandise must be sold.&lt;/p&gt;

&lt;p&gt;The approach applies data analytics, reinforcement learning, and dynamic optimization to make automated decisions for each individual product, and is tailored to each store. The result is lowered operating costs and increased sales, with some stores experiencing up to 15% higher sales of the stock to be moved.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Improving search engine results with search.io
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qaj6dw8zdeaktyoxurh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qaj6dw8zdeaktyoxurh.jpg" alt="Image of person with smartphone"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://Search.io" rel="noopener noreferrer"&gt;Search.io&lt;/a&gt; is an AI search engine for on-site search queries. They use &lt;a href="https://www.search.io/blog/reinforcement-learning-assisted-search-ranking" rel="noopener noreferrer"&gt;both 'learn-to-rank' and reinforcement learning techniques to improve their search ranking algorithm&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Learn-to-rank involves using a machine learning model trained on a dataset of query-result pairs scored based on their relevance. One disadvantage of this technique is that the inputs (query-result pair scores) remain static. &lt;/p&gt;

&lt;p&gt;Reinforcement learning helps to improve the search algorithm over time using feedback in the form of clicks, sales, signups, etc. The challenge with applying reinforcement learning in this setting is that the search result quality typically starts out low, and needs time and data before it starts to meet customer expectations. &lt;/p&gt;
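&lt;p&gt;&lt;em&gt;One way to picture this feedback loop: blend a static learn-to-rank score with an online boost that grows as users click a result. This is only a toy sketch under invented scores, not Search.io's actual algorithm.&lt;/em&gt;&lt;/p&gt;

```python
# A toy sketch of blending a static learn-to-rank score with an online
# click signal: each click nudges a document's boost upward and decays
# the others, so rankings adapt as feedback accumulates. The documents
# and scores are invented for illustration.

static_scores = {"doc_a": 0.9, "doc_b": 0.7, "doc_c": 0.6}
boosts = {doc: 0.0 for doc in static_scores}

def record_click(doc, rate=0.05):
    for d in boosts:
        if d == doc:
            boosts[d] += rate * (1.0 - boosts[d])   # reward the click
        else:
            boosts[d] *= 1.0 - rate                 # decay the rest

def ranked():
    return sorted(static_scores,
                  key=lambda d: static_scores[d] + boosts[d],
                  reverse=True)

print(ranked())            # ['doc_a', 'doc_b', 'doc_c']
for _ in range(20):
    record_click("doc_c")  # users keep clicking the third result
print(ranked())            # ['doc_c', 'doc_a', 'doc_b']
```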

&lt;h2&gt;
  
  
  5. Improving language models with OpenAI's WebGPT
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6x2gcw2aouwkzqvrtze.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6x2gcw2aouwkzqvrtze.jpg" alt="Image of person on computer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GPT-3 is a language model used to generate human-like text. A downside of these language models is the tendency to 'hallucinate' information when performing tasks that require obscure real-world knowledge. To improve this, &lt;a href="https://openai.com/blog/webgpt/" rel="noopener noreferrer"&gt;OpenAI taught GPT-3 to use a text-based web browser&lt;/a&gt;. The model is able to search and collect information from web pages, and use it to compose answers to open-ended questions. &lt;/p&gt;

&lt;p&gt;The model is initially trained using human demonstrations. From there, the helpfulness and accuracy of the model are improved by training a reward model to predict human preferences. The system is then optimized against this reward model using either reinforcement learning or rejection sampling. The resulting system was found to be more 'truthful' than GPT-3.&lt;/p&gt;
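&lt;p&gt;&lt;em&gt;Rejection sampling (best-of-n) is the simpler of those two techniques to sketch: draw several candidate answers from the base model, score each with the reward model, and keep the best. Both functions below are stand-ins, not the real models.&lt;/em&gt;&lt;/p&gt;

```python
# A sketch of best-of-n rejection sampling: draw n candidate answers
# from the base model, score each with the learned reward model, and
# keep the highest-scoring one. Both models below are stand-ins; the
# fake reward simply prefers later drafts.

def sample_answers(question, n):
    # stand-in for sampling n candidate answers from the language model
    return [f"{question} draft {i}" for i in range(n)]

def reward_model(answer):
    # stand-in for the learned model of human preference
    return int(answer.split()[-1])

def best_of_n(question, n=4):
    candidates = sample_answers(question, n)
    return max(candidates, key=reward_model)

print(best_of_n("Why is the sky blue?"))  # Why is the sky blue? draft 3
```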

&lt;h2&gt;
  
  
  6. Trading on the financial markets with IBM's DSX platform
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqb5sel4nkfi7u7q9ls3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqb5sel4nkfi7u7q9ls3.jpg" alt="Image of financial market trading platform"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There has been reluctance in the financial industry to apply machine learning due to the high monetary risks. &lt;a href="https://medium.com/ibm-data-ai/reinforcement-learning-the-business-use-case-part-2-c175740999" rel="noopener noreferrer"&gt;In this article&lt;/a&gt;, IBM describes a trading system trained with reinforcement learning.&lt;/p&gt;

&lt;p&gt;The advantage of reinforcement learning in this setting is the ability to learn to make predictions that account for whatever effects the algorithm’s actions have had on the state of the market. This feedback loop allows the algorithm to auto-tune over time, continually making it more powerful and adaptable. The reward function is based on the profit or loss made in each trade.&lt;/p&gt;
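&lt;p&gt;&lt;em&gt;A minimal sketch of that per-trade reward signal: at each step, the agent's reward is the profit or loss realised by the position it held. Prices and positions below are invented for illustration.&lt;/em&gt;&lt;/p&gt;

```python
# A minimal sketch of a per-trade reward: at each step the agent's
# reward is the profit or loss realised by the position it held.
# Prices and positions are invented for illustration.

def trade_rewards(prices, positions):
    """positions[i] is held from prices[i] to prices[i + 1]:
    1 means long, -1 short, 0 flat."""
    rewards = []
    for i, pos in enumerate(positions):
        rewards.append(pos * (prices[i + 1] - prices[i]))
    return rewards

prices = [100.0, 102.0, 101.0, 104.0]
positions = [1, -1, 1]                   # long, short, long
print(trade_rewards(prices, positions))  # [2.0, 1.0, 3.0]
```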

&lt;p&gt;The model was assessed against a Buy-and-Hold strategy and ARIMA-GARCH (a forecasting model). They found that the model was able to capture head-and-shoulder patterns, which is a non-trivial feat.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Robotics with the University of California, Berkeley
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z782z4k21y6zqug6nxt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z782z4k21y6zqug6nxt.jpg" alt="Image of manufacturing robot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Developing controllers for robotics is a challenging task. Typical methods rely on careful modelling, but can fail when exposed to unexpected situations and environments.&lt;/p&gt;

&lt;p&gt;A team at the University of California, Berkeley tried to address this by training a &lt;a href="https://arxiv.org/pdf/2103.14295.pdf" rel="noopener noreferrer"&gt;real bipedal robot using reinforcement learning&lt;/a&gt;. The team was able to develop a model that resulted in a more diverse and robust walking control of a robot named Cassie. &lt;/p&gt;

&lt;p&gt;The deployed model was able to perform various behaviours such as changing walking heights, fast walking, walking sideways and turning in the real world. It was also robust to changes in the robot itself (e.g. partially damaged motors) and the environment (e.g. changes in ground friction and being pushed from different directions). You can watch Cassie in action in &lt;a href="https://www.youtube.com/watch?v=goxCjGPQH7U" rel="noopener noreferrer"&gt;this video&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;While reinforcement learning applications in the real world are still in their early days, I hope this list highlights the potential of the technology and the exciting progress that has already taken place. Who knows what else we might see in the next few years with ongoing developments in data collection, simulations, processing power, and research?&lt;/p&gt;

&lt;p&gt;If the field of reinforcement learning excites you, here are some of my other articles you might find useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.gocoder.one/blog/rl-tutorial-with-openai-gym" rel="noopener noreferrer"&gt;Introduction to reinforcement learning with OpenAI Gym Taxi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gocoder.one/blog/hands-on-introduction-to-deep-reinforcement-learning" rel="noopener noreferrer"&gt;A hands-on introduction to deep reinforcement learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gocoder.one/blog/reinforcement-learning-project-ideas" rel="noopener noreferrer"&gt;8+ Reinforcement learning project ideas&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Active and upcoming reinforcement learning competitions</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Thu, 28 Oct 2021 09:05:39 +0000</pubDate>
      <link>https://dev.to/joooyz/active-and-upcoming-reinforcement-learning-competitions-kei</link>
      <guid>https://dev.to/joooyz/active-and-upcoming-reinforcement-learning-competitions-kei</guid>
      <description>&lt;p&gt;Reinforcement learning (RL) is a subdomain of machine learning which involves agents learning to make decisions by interacting with their environment. While popular competition platforms like Kaggle are mainly suited for supervised learning problems, RL competitions are harder to come by.&lt;/p&gt;

&lt;p&gt;In this post, I've compiled a list of 7 ongoing and annual competitions which are suitable for RL. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Criteria:&lt;/strong&gt; any active (or upcoming) event or platform which involves a large number of individuals/teams competing for some form of incentive (e.g. prize money, co-authorships, leaderboard ranking etc.).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For AI competitions that are not necessarily tailored for RL, you might be interested in the list &lt;a href="https://www.gocoder.one/blog/ai-game-competitions-list"&gt;15 Active AI Game Competitions&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. &lt;a href="https://aws.amazon.com/deepracer/"&gt;AWS DeepRacer&lt;/a&gt; (2018 —, ongoing competition)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0tvkuoir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dtj2sw3j90jvjasdi8of.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0tvkuoir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dtj2sw3j90jvjasdi8of.png" alt="AWS DeepRacer" title="AWS DeepRacer 3D simulator" width="880" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/deepracer/"&gt;AWS DeepRacer&lt;/a&gt; is a beginner-friendly 3D racing simulator aimed at helping developers get started with RL. Participants can train models on Amazon SageMaker (first 10 hours are free) and enter monthly competitions in the form of an ongoing AWS DeepRacer League. &lt;/p&gt;

&lt;p&gt;The AWS DeepRacer League is run in a time trial format (although other challenges such as head-to-head racing exist). Top racers win prizes including merchandise, customizations, and an expenses-paid trip to Las Vegas to attend AWS re:Invent for the Championship Cup. Participants can also win or purchase a physical 1/18th-scale race car for USD 399 to test their models in the real world.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;a href="https://aiarena.net/"&gt;AIArena&lt;/a&gt; (2016 —, ongoing competition)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--infYciki--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3uygquulfg48sfm6wv83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--infYciki--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3uygquulfg48sfm6wv83.png" alt="AI Arena" title="AI Arena StarCraft II stream" width="880" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You might remember when &lt;a href="https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning"&gt;AlphaStar reached Grandmaster status&lt;/a&gt; and beat two of the world's top players in StarCraft II in 2019. StarCraft II was &lt;a href="https://deepmind.com/blog/announcements/deepmind-and-blizzard-open-starcraft-ii-ai-research-environment"&gt;originally open-sourced in 2017&lt;/a&gt; by Blizzard to accelerate AI research in highly complex environments.&lt;/p&gt;

&lt;p&gt;You can still get involved with training deep RL agents in StarCraft II with the community at &lt;strong&gt;&lt;a href="https://aiarena.net/"&gt;AIArena&lt;/a&gt;&lt;/strong&gt;. They run an ongoing ranked ladder where you can compete head-to-head against other teams. Matches are livestreamed to Twitch 24/7, with occasional community stream events.&lt;/p&gt;

&lt;p&gt;For original StarCraft, you can also check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://schnail.com/"&gt;SCHNAIL&lt;/a&gt;: Human vs AI competitions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sscaitournament.com/"&gt;SSCAIT&lt;/a&gt;: Student StarCraft AI Tournament&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. &lt;a href="https://www.gocoder.one/bomberland"&gt;Bomberland&lt;/a&gt; (2020—, ongoing competition)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0avm0MBB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/84d87fpo4fl4yeppw4oo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0avm0MBB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/84d87fpo4fl4yeppw4oo.jpg" alt="Bomberland" title="Bomberland" width="880" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.gocoder.one/bomberland"&gt;Bomberland&lt;/a&gt; is our own machine learning competition based on the classic console game, Bomberman. Teams build agents which compete head-to-head in an ongoing competition against other teams.&lt;/p&gt;

&lt;p&gt;The Bomberland environment is challenging for out-of-the-box machine learning, requiring planning, real-time decision making, and navigating both adversarial and cooperative play.&lt;/p&gt;

&lt;p&gt;The competition officially starts 3rd December 2021. Top teams win prizes including merchandise, customizations, cash, and are featured on the finale Twitch livestream.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. &lt;a href="https://www.aicrowd.com/challenges/flatland-3"&gt;Flatland&lt;/a&gt; (2019—, annual competition)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BRI84REY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ke4c9ik11yqjxlbq886a.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BRI84REY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ke4c9ik11yqjxlbq886a.JPG" alt="Flatland" title="Flatland" width="642" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.aicrowd.com/challenges/flatland-3"&gt;Flatland&lt;/a&gt; is an annual competition featured as part of NeurIPS 2020. It is designed to tackle the problem of efficiently managing dense traffic on complex railway networks. The goal is to construct the best schedule that minimizes the delay in the requested arrival time of all trains.&lt;/p&gt;

&lt;p&gt;The 2021 competition is currently being run on the AICrowd platform. Submissions are evaluated and ranked according to the total reward accumulated in a controlled setting. RL approaches are encouraged, with a separate prize track for RL submissions. Prizes this year include drones and VR headsets.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. &lt;strong&gt;&lt;a href="https://minerl.io/"&gt;MineRL&lt;/a&gt;&lt;/strong&gt; (2019—, annual competition)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--r2QrMRPS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sy5s73k7p0e21vthedmi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--r2QrMRPS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sy5s73k7p0e21vthedmi.png" alt="MineRL" title="MineRL dataset example" width="579" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://minerl.io/"&gt;MineRL&lt;/a&gt;&lt;/strong&gt; is concerned with the development of sample-efficient deep RL algorithms which can solve hierarchical, sparse reward environments using human demonstrations in Minecraft. &lt;/p&gt;

&lt;p&gt;Participants have access to a large imitation learning dataset of over 60 million frames of recorded human player data in Minecraft. The goal is to develop systems that can complete tasks such as obtaining a diamond, building a house, searching for a cave, etc.&lt;/p&gt;

&lt;p&gt;The competition has run as part of NeurIPS from 2019 to 2021 on AICrowd. Prizes include co-authorships and over $10,000 cash.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. &lt;a href="https://nethackchallenge.com/"&gt;NetHack&lt;/a&gt; (2020—, annual competition)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GWywkfHd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nqx1eawgcda9bi1mnqil.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GWywkfHd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nqx1eawgcda9bi1mnqil.JPG" alt="NetHack" title="NetHack" width="751" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nethackchallenge.com/"&gt;NetHack&lt;/a&gt; is another annual competition at NeurIPS 2021 held on AICrowd. Teams compete to build the best agents to play NetHack, an ASCII-rendered single-player dungeon crawl game. NetHack features procedurally-generated levels, with hundreds of complex scenarios, making it an extremely challenging environment for current state-of-the-art RL.&lt;/p&gt;

&lt;p&gt;Like Flatland and MineRL, submissions are ranked on a leaderboard based on score in a controlled test setting. The competition this year features a $20,000 USD cash prize pool. RL approaches are encouraged, but non-RL approaches are also accepted. &lt;/p&gt;

&lt;h2&gt;
  
  
  7. &lt;a href="https://github.com/facebookresearch/CompilerGym"&gt;CompilerGym&lt;/a&gt; (2021—, leaderboard)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YzfSI_zI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qz3zd3cc737xp90j3hxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YzfSI_zI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qz3zd3cc737xp90j3hxl.png" alt="CompilerGym" title="CompilerGym" width="880" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/facebookresearch/CompilerGym"&gt;CompilerGym&lt;/a&gt; is actually a toolkit for applying reinforcement learning to compiler optimizations, rather than a competition. However, users can submit algorithms to the &lt;a href="https://github.com/facebookresearch/CompilerGym#leaderboards"&gt;public repo leaderboard&lt;/a&gt; with their write-up and results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: competition platforms and conferences
&lt;/h2&gt;

&lt;p&gt;I prioritized competitions that are ongoing or run regularly for this list. Another good way to keep track of running competitions is to follow the competition platforms and conferences they are run as part of. Here are some worth keeping an eye on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.aicrowd.com/"&gt;AICrowd&lt;/a&gt;: Runs a combination of supervised ML competitions as well as RL competitions.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/"&gt;Kaggle&lt;/a&gt;: Mainly supervised ML/data science competitions, but also feature &lt;a href="https://www.kaggle.com/simulations"&gt;simulation competitions&lt;/a&gt; which can be good problems for RL.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://nips.cc/"&gt;NeurIPS&lt;/a&gt;: Annual conference with a competition track for various machine learning competitions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ieee-cog.org/2022/"&gt;IEEE CoGs&lt;/a&gt;: Annual conference with a competition track, specifically for research in games.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing remarks
&lt;/h2&gt;

&lt;p&gt;I hope this list has helped you find an interesting competition to check out and practise reinforcement learning in. As new competitions come and go, I'll aim to keep this list up-to-date. Good luck!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Competitive self-play with Unity ML-Agents</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Fri, 22 Oct 2021 06:47:04 +0000</pubDate>
      <link>https://dev.to/joooyz/competitive-self-play-with-unity-ml-agents-1nh6</link>
      <guid>https://dev.to/joooyz/competitive-self-play-with-unity-ml-agents-1nh6</guid>
      <description>&lt;h2&gt;
  
  
  An overview of self-play
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openai.com/blog/competitive-self-play/" rel="noopener noreferrer"&gt;Competitive self-play&lt;/a&gt; involves training an agent against itself. It was used in famous systems such as &lt;a href="https://deepmind.com/research/case-studies/alphago-the-story-so-far" rel="noopener noreferrer"&gt;AlphaGo&lt;/a&gt; and &lt;a href="https://openai.com/blog/dota-2/" rel="noopener noreferrer"&gt;OpenAI Five (Dota 2)&lt;/a&gt;. By playing increasingly stronger versions of itself, agents can discover new and better strategies.&lt;/p&gt;

&lt;p&gt;In this post, we walk through using competitive self-play in Unity ML-Agents to train agents to play volleyball. This article is also part 5 of the series '&lt;strong&gt;&lt;a href="https://dev.to/joooyz/a-hands-on-introduction-to-deep-reinforcement-learning-using-unity-ml-agents-4f8i"&gt;A hands-on introduction to deep reinforcement learning using Unity ML-Agents&lt;/a&gt;&lt;/strong&gt;'. &lt;/p&gt;

&lt;h2&gt;
  
  
  The case for self-play
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/joooyz/how-to-train-agents-to-play-volleyball-using-deep-reinforcement-learning-417b"&gt;We previously trained agents using PPO&lt;/a&gt; with the following setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Symmetric environment&lt;/li&gt;
&lt;li&gt;Both agents shared the same policy&lt;/li&gt;
&lt;li&gt;Observations: velocity, rotation, and position vectors of the agent and ball&lt;/li&gt;
&lt;li&gt;Reward function: +1 for hitting the ball over the net&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This resulted in agents that could successfully volley the ball back and forth after ~20M training steps:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmoyaw03cwnheu96dk3f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmoyaw03cwnheu96dk3f.gif" title="Trained agents playing volleyball" alt="PPO trained agents"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that the agents make 'easy' passes by aiming the ball towards the centre of the court. This is because we set the reward function to incentivize keeping the ball in play.&lt;/p&gt;

&lt;p&gt;Our aim now is to train &lt;em&gt;competitive&lt;/em&gt; agents that are rewarded for &lt;em&gt;winning&lt;/em&gt; (i.e. landing the ball in the opponent's court). We expect this will lead to agents that learn interesting strategies and make passes that are harder to return.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-play setup in ML-Agents
&lt;/h2&gt;

&lt;p&gt;To follow along with this section, you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unity ML-Agents Release 18+ (&lt;a href="https://dev.to/joooyz/an-introduction-to-machine-learning-with-unity-ml-agents-3an5"&gt;getting started instructions&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The latest version of the &lt;a href="https://github.com/CoderOneHQ/ultimate-volleyball" rel="noopener noreferrer"&gt;Ultimate Volleyball repo&lt;/a&gt; (or, you can use your own volleyball environment if you've been following the tutorial series)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Put the agents on opposing teams
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open the Ultimate Volleyball environment in Unity&lt;/li&gt;
&lt;li&gt;Open &lt;strong&gt;Assets&lt;/strong&gt; &amp;gt; &lt;strong&gt;Prefabs&lt;/strong&gt; &amp;gt; &lt;code&gt;2PVolleyballArea.prefab&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Select either the &lt;code&gt;PurpleAgent&lt;/code&gt; or &lt;code&gt;BlueAgent&lt;/code&gt; object&lt;/li&gt;
&lt;li&gt;In Inspector &amp;gt; Behavior Parameters, set &lt;code&gt;TeamId&lt;/code&gt; to 1 (the exact value doesn't matter, as long as the PurpleAgent and BlueAgent have different Team IDs):&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbofa2dc0j4uagfiybfiw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbofa2dc0j4uagfiybfiw.jpg" title="Team ID setting in ML-Agents" alt="ML-Agents Team ID"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Set up the self-play reward function
&lt;/h3&gt;

&lt;p&gt;Our previous reward function was +1 for hitting the ball over the net.&lt;/p&gt;

&lt;p&gt;For self-play, we'll switch to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;+1 to the winning team&lt;/li&gt;
&lt;li&gt;-1 to the losing team&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Open &lt;code&gt;VolleyballEnvController.cs&lt;/code&gt; and add the rewards to the &lt;code&gt;ResolveEvent()&lt;/code&gt; method:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitBlueGoal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;// blue wins&lt;/span&gt;
    &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddReward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddReward&lt;/span&gt;&lt;span class="p"&gt;(-&lt;/span&gt;&lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// turn floor blue&lt;/span&gt;
    &lt;span class="nf"&gt;StartCoroutine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;GoalScoredSwapGroundMaterial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blueGoalMaterial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RenderersList&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="m"&gt;5f&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// end episode&lt;/span&gt;
    &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;ResetScene&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitPurpleGoal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;// purple wins&lt;/span&gt;
    &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddReward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddReward&lt;/span&gt;&lt;span class="p"&gt;(-&lt;/span&gt;&lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// turn floor purple&lt;/span&gt;
    &lt;span class="nf"&gt;StartCoroutine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;GoalScoredSwapGroundMaterial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;purpleGoalMaterial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RenderersList&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="m"&gt;5f&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// end episode&lt;/span&gt;
    &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;ResetScene&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Remove &lt;code&gt;AddReward&lt;/code&gt; from the other cases&lt;/li&gt;
&lt;li&gt;You can also add a penalty for hitting the ball out of the court (in &lt;code&gt;case Event.HitOutOfBounds&lt;/code&gt;). In my experience, though, this can make it take longer for the agents to learn to hit the ball at all.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 3: Add self-play training parameters to the trainer config
&lt;/h3&gt;

&lt;p&gt;Create a new &lt;code&gt;.yaml&lt;/code&gt; file and copy in the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;behaviors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Volleyball&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;trainer_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ppo&lt;/span&gt;
    &lt;span class="na"&gt;hyperparameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2048&lt;/span&gt;
      &lt;span class="na"&gt;buffer_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20480&lt;/span&gt;
      &lt;span class="na"&gt;learning_rate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.0002&lt;/span&gt;
      &lt;span class="na"&gt;beta&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.003&lt;/span&gt;
      &lt;span class="na"&gt;epsilon&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.15&lt;/span&gt;
      &lt;span class="na"&gt;lambd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.93&lt;/span&gt;
      &lt;span class="na"&gt;num_epoch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
      &lt;span class="na"&gt;learning_rate_schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;constant&lt;/span&gt;
    &lt;span class="na"&gt;network_settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;normalize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;hidden_units&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt;
      &lt;span class="na"&gt;num_layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
      &lt;span class="na"&gt;vis_encode_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;simple&lt;/span&gt;
    &lt;span class="na"&gt;reward_signals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;extrinsic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;gamma&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.96&lt;/span&gt;
        &lt;span class="na"&gt;strength&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
    &lt;span class="na"&gt;keep_checkpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;max_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80000000&lt;/span&gt;
    &lt;span class="na"&gt;time_horizon&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;
    &lt;span class="na"&gt;summary_freq&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20000&lt;/span&gt;
    &lt;span class="na"&gt;self_play&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;window&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;play_against_latest_model_ratio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5&lt;/span&gt;
      &lt;span class="na"&gt;save_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20000&lt;/span&gt;
      &lt;span class="na"&gt;swap_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;
      &lt;span class="na"&gt;team_change&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Explaining self-play parameters
&lt;/h3&gt;

&lt;p&gt;During self-play, one of the agents will be set as the &lt;em&gt;learning agent&lt;/em&gt; and the other as the fixed policy &lt;em&gt;opponent&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;save_steps=20000&lt;/code&gt; steps, a snapshot of the learning agent's existing policy will be taken. Up to &lt;code&gt;window=10&lt;/code&gt; snapshots will be stored. When a new snapshot is taken, the oldest one is discarded. These past versions of itself become the 'opponents' that the learning agent trains against. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq14ashtaitzhhgchknw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq14ashtaitzhhgchknw.jpg" title="Self-play hyperparameters" alt="Self-play hyperparameters"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;swap_steps=10000&lt;/code&gt; steps, the opponent's policy is swapped out for a different snapshot. With probability &lt;code&gt;play_against_latest_model_ratio=0.5&lt;/code&gt;, the &lt;strong&gt;latest policy&lt;/strong&gt; (i.e. the &lt;strong&gt;strongest&lt;/strong&gt; opponent) is chosen instead of a past snapshot. This helps prevent &lt;strong&gt;overfitting&lt;/strong&gt; to a single opponent playstyle.&lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;team_change=100000&lt;/code&gt; steps, the learning-agent and opponent roles are swapped between the two teams. &lt;/p&gt;

&lt;p&gt;Feel free to play around with these default hyperparameters (more information available in the official &lt;a href="https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md#self-play" rel="noopener noreferrer"&gt;ML-Agents documentation&lt;/a&gt;). &lt;/p&gt;
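&lt;p&gt;The interaction between these parameters can be sketched in a few lines of Python. This is a hypothetical illustration of the behaviour described above, not ML-Agents' actual internals; integers stand in for policy snapshots:&lt;/p&gt;

```python
import random
from collections import deque

# Sketch of the self-play snapshot bookkeeping (hypothetical, not
# ML-Agents' implementation). Integers stand in for policies.
save_steps, window = 20_000, 10
swap_steps, latest_ratio = 10_000, 0.5

snapshots = deque(maxlen=window)   # the oldest snapshot drops off automatically
opponent = None

for step in range(0, 200_001, 10_000):
    latest_policy = step                     # pretend the policy at this step is `step`
    if step % save_steps == 0:
        snapshots.append(latest_policy)      # snapshot the learning agent's policy
    if step % swap_steps == 0:
        if random.random() > 1.0 - latest_ratio:      # true with prob. latest_ratio
            opponent = snapshots[-1]                  # play against the latest snapshot
        else:
            opponent = random.choice(list(snapshots)) # or a random past snapshot

print(len(snapshots))   # never more than `window` snapshots are kept
```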

&lt;h2&gt;
  
  
  Training with self-play
&lt;/h2&gt;

&lt;p&gt;Training with self-play in ML-Agents is done the same way as any other form of training:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Activate the virtual environment containing your installation of &lt;code&gt;ml-agents&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Navigate to your working directory, and run in the terminal:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;mlagents-learn &amp;lt;path to config file&amp;gt; --run-id=VB_1 --time-scale=1&lt;/code&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When you see the message "Start training by pressing the Play button in the Unity Editor", click ▶ within the Unity GUI.&lt;/li&gt;
&lt;li&gt;In another terminal window, run &lt;code&gt;tensorboard --logdir results&lt;/code&gt; from your working directory to observe the training process.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Self-play training results
&lt;/h2&gt;

&lt;p&gt;In a stable training run, you should see the Elo rating gradually increase. &lt;/p&gt;
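&lt;p&gt;For reference, Elo transfers rating points from the loser to the winner in proportion to how unexpected the result was. Here's a quick sketch of the standard update rule (a textbook formula; ML-Agents' exact bookkeeping may differ):&lt;/p&gt;

```python
# Standard Elo update rule (a sketch; ML-Agents' exact bookkeeping may differ).
def elo_update(rating_a, rating_b, score_a, k=16):
    """score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    change = k * (score_a - expected_a)
    return rating_a + change, rating_b - change

# Two equally rated agents: the winner takes k/2 = 8 points from the loser.
a, b = elo_update(1200.0, 1200.0, 1.0)
print(a, b)   # 1208.0 1192.0
```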

&lt;p&gt;In the diagram below, the three inflexion points correspond to the agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Learning to serve &lt;/li&gt;
&lt;li&gt;Learning to return the ball&lt;/li&gt;
&lt;li&gt;Learning more competitive shots&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszk3gdbjlwmodfdki1j5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszk3gdbjlwmodfdki1j5.jpg" title="ELO and Episode Length" alt="Tensorboard results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/joooyz/how-to-train-agents-to-play-volleyball-using-deep-reinforcement-learning-417b"&gt;Compared to our previous training results&lt;/a&gt;, I found that even after ~80M steps, the agents trained using self-play don't serve or return the ball as reliably. However, they do learn to hit some interesting shots, like hitting the ball towards the edge of the court:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1mejlpcsil20efq9faj4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1mejlpcsil20efq9faj4.gif" title="Volleyball agents trained using PPO self-play after 80M steps" alt="Trained agents using self-play playing volleyball"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you discover any other interesting playstyles, let me know!&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;Thanks for reading! I hope you found this post useful.&lt;/p&gt;

&lt;p&gt;If you have any feedback or questions, feel free to post them on the &lt;a href="https://github.com/CoderOneHQ/ultimate-volleyball" rel="noopener noreferrer"&gt;Ultimate Volleyball Repo&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>deeplearning</category>
      <category>unity3d</category>
    </item>
    <item>
      <title>8+ Reinforcement Learning Project Ideas</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Thu, 30 Sep 2021 07:00:29 +0000</pubDate>
      <link>https://dev.to/joooyz/7-reinforcement-learning-project-ideas-14fm</link>
      <guid>https://dev.to/joooyz/7-reinforcement-learning-project-ideas-14fm</guid>
      <description>&lt;p&gt;This blog post is a compilation of reinforcement learning (RL) project ideas to check out. I've tried to select projects covering a range of different difficulties, concepts, and algorithms in RL.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Solve toy problems with &lt;a href="https://gym.openai.com/" rel="noopener noreferrer"&gt;OpenAI Gym&lt;/a&gt; (beginner-friendly)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh9xo8mqizlidg1l1172.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh9xo8mqizlidg1l1172.JPG" title="Cartpole environment from OpenAI Gym" alt="Cartpole"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gym.openai.com/" rel="noopener noreferrer"&gt;OpenAI Gym&lt;/a&gt; has become the de facto standard for reinforcement learning frameworks among researchers and practitioners. Solving toy problems from the gym library will help familiarize you with this popular framework. Good starting points include &lt;a href="https://gym.openai.com/envs/CartPole-v1/" rel="noopener noreferrer"&gt;Cartpole&lt;/a&gt;, &lt;a href="https://gym.openai.com/envs/LunarLander-v2/" rel="noopener noreferrer"&gt;Lunar Lander&lt;/a&gt; and &lt;a href="https://gym.openai.com/envs/Taxi-v3/" rel="noopener noreferrer"&gt;Taxi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're interested in a step-by-step walkthrough, check out our &lt;a href="https://www.gocoder.one/blog/rl-tutorial-with-openai-gym" rel="noopener noreferrer"&gt;introductory Q-learning tutorial with Taxi&lt;/a&gt;.&lt;/p&gt;
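&lt;p&gt;To give a flavour of what tabular Q-learning looks like, here is a minimal, self-contained sketch on a made-up 5-state corridor (a hypothetical toy environment, not one of the Gym tasks above): the agent starts at state 0 and earns a reward of 1 for reaching state 4.&lt;/p&gt;

```python
import random

random.seed(0)

# Tabular Q-learning on a hypothetical 5-state corridor:
# action 1 moves right, action 0 moves left, reaching state 4 ends the episode.
n_states, n_actions = 5, 2
q = [[1.0] * n_actions for _ in range(n_states)]   # optimistic initial values
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    s = 0
    for _ in range(100):                            # cap episode length
        if random.random() > 1.0 - epsilon:         # explore with prob. epsilon
            a = random.randrange(n_actions)
        else:                                       # otherwise act greedily
            a = max(range(n_actions), key=lambda i: q[s][i])
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0
        best_next = 0.0 if s_next == 4 else max(q[s_next])  # terminal state has value 0
        q[s][a] += alpha * (r + gamma * best_next - q[s][a])
        if s_next == 4:
            break
        s = s_next

# The greedy policy should move right (action 1) from every non-terminal state.
policy = [max(range(n_actions), key=lambda i: q[s][i]) for s in range(4)]
print(policy)   # typically [1, 1, 1, 1]
```

The same update rule drives the Taxi tutorial linked above; the environment is just larger.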

&lt;h2&gt;
  
  
  2. Play Atari games from pixel input with &lt;a href="https://gym.openai.com/envs/#atari" rel="noopener noreferrer"&gt;OpenAI Gym&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feoqwtm3ahn9vz1v9chop.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feoqwtm3ahn9vz1v9chop.jpg" title="Atari environments from OpenAI Gym" alt="Atari environments"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenAI Gym also contains a suite of &lt;a href="https://gym.openai.com/envs/#atari" rel="noopener noreferrer"&gt;Atari game environments&lt;/a&gt; as part of its Arcade Learning Environment (ALE) framework. Examples include &lt;a href="https://gym.openai.com/envs/Breakout-v0/" rel="noopener noreferrer"&gt;Breakout&lt;/a&gt;, &lt;a href="https://gym.openai.com/envs/MontezumaRevenge-v0/" rel="noopener noreferrer"&gt;Montezuma's Revenge&lt;/a&gt;, and &lt;a href="https://gym.openai.com/envs/SpaceInvaders-v0/" rel="noopener noreferrer"&gt;Space Invaders&lt;/a&gt;. Environment observations are available in the form of screen input or RAM (direct observation of the Atari 2600's 128 bytes of memory).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Additional resources:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simoninithomas/Deep_reinforcement_learning_Course/blob/master/Deep%20Q%20Learning/Space%20Invaders/DQN%20Atari%20Space%20Invaders.ipynb" rel="noopener noreferrer"&gt;Jupyter notebook tutorial for Space Invaders&lt;/a&gt; by Thomas Simonini&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Simulate control tasks with &lt;a href="https://github.com/bulletphysics/bullet3" rel="noopener noreferrer"&gt;PyBullet&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7c2gzfd980o43q6qsddf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7c2gzfd980o43q6qsddf.png" title="PyBullet environment examples" alt="PyBullet"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gym provides a library of continuous physics simulations in the form of its &lt;a href="https://gym.openai.com/envs/#mujoco" rel="noopener noreferrer"&gt;MuJoCo&lt;/a&gt; environments. Since MuJoCo requires a paid license, I recommend checking out &lt;a href="https://github.com/bulletphysics/bullet3" rel="noopener noreferrer"&gt;PyBullet&lt;/a&gt; as a free open-source alternative. Using PyBullet/MuJoCo, you can teach a variety of robots to walk, run, or swim.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Create your own reinforcement learning environment with &lt;a href="https://github.com/Unity-Technologies/ml-agents" rel="noopener noreferrer"&gt;Unity ML-Agents&lt;/a&gt; (beginner-friendly)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c0yn02hrgnf3iuxqsh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c0yn02hrgnf3iuxqsh1.png" title="Unity ML-Agents example environments" alt="Unity ML-Agents"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Unity-Technologies/ml-agents" rel="noopener noreferrer"&gt;Unity ML-Agents&lt;/a&gt; is a relatively new add-on to the Unity game engine. It allows game developers to train intelligent NPCs for games and enables researchers to create graphics- and physics-rich RL environments. Project ideas to explore include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Experimenting with algorithms like PPO, SAC, GAIL, and Self-Play provided out-of-the-box&lt;/li&gt;
&lt;li&gt;Training agents in a library of 18+ environments including &lt;a href="https://github.com/Unity-Technologies/ml-agents/tree/dodgeball-env" rel="noopener noreferrer"&gt;Dodgeball&lt;/a&gt;, Soccer, and classic control problems&lt;/li&gt;
&lt;li&gt;Creating your own custom 3D RL environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Additional resources:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.gocoder.one/blog/hands-on-introduction-to-deep-reinforcement-learning" rel="noopener noreferrer"&gt;Build a 3D Volleyball RL environment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.immersivelimit.com/tutorials/reinforcement-learning-penguins-part-1-unity-ml-agents" rel="noopener noreferrer"&gt;Reinforcement Learning Penguins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.unity.com/course/ml-agents-hummingbirds" rel="noopener noreferrer"&gt;Unity Hummingbirds Course&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. Race self-driving cars with &lt;a href="https://aws.amazon.com/deepracer/" rel="noopener noreferrer"&gt;AWS DeepRacer&lt;/a&gt; (beginner-friendly)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3h9x75fnhqg9m3wfbz5t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3h9x75fnhqg9m3wfbz5t.png" title="AWS DeepRacer simulation" alt="AWS DeepRacer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/deepracer/" rel="noopener noreferrer"&gt;AWS DeepRacer&lt;/a&gt; is a 3D racing simulator designed to help developers get started with RL using Amazon SageMaker. You'll need to pay for training and evaluating your model on AWS. It features monthly competitive races as part of the &lt;a href="https://aws.amazon.com/deepracer/league/" rel="noopener noreferrer"&gt;AWS DeepRacer league&lt;/a&gt;, which awards prizes and the chance to compete at re:Invent.&lt;/p&gt;

&lt;p&gt;DeepRacer also gives you the option of purchasing a physical 1/18th-scale race car for USD 399, which lets you deploy your model in the real world.&lt;/p&gt;

&lt;p&gt;Some other open-source projects related to autonomous driving worth checking out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/AirSim" rel="noopener noreferrer"&gt;AirSim&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/carla-simulator/carla" rel="noopener noreferrer"&gt;CARLA&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. Mine diamonds in Minecraft with &lt;a href="https://minerl.io/" rel="noopener noreferrer"&gt;MineRL&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f297v2xbl4kfj9ixmec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f297v2xbl4kfj9ixmec.png" title="MineRL dataset example" alt="MineRL"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://minerl.io/" rel="noopener noreferrer"&gt;MineRL&lt;/a&gt; contains an imitation learning dataset of over 60 million frames of recorded human player data in Minecraft. The goal is to train agents that can navigate an open world and overcome inherent challenges such as tasks with lots of hierarchy and sparse rewards.&lt;/p&gt;

&lt;p&gt;MineRL is currently running two competition tracks as part of NeurIPS 2021:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://minerl.io/diamond/" rel="noopener noreferrer"&gt;Diamond&lt;/a&gt;: Obtain a diamond provided a fixed limit of raw pixel sample data and time training&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://minerl.io/basalt/" rel="noopener noreferrer"&gt;BASALT&lt;/a&gt;:  Solve almost-lifelike tasks (e.g. build a house, search for a cave)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  7. Join the community at &lt;a href="https://aiarena.net/" rel="noopener noreferrer"&gt;AIArena&lt;/a&gt; building agents for StarCraft II
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyxd21t0ehjoj2gs619i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyxd21t0ehjoj2gs619i.png" title="AI Arena StarCraft II stream" alt="AI Arena"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're looking to train agents to play highly complex mainstream games, you should check out &lt;a href="https://aiarena.net/" rel="noopener noreferrer"&gt;AIArena&lt;/a&gt;. They run regular streams and ladders for a community of researchers, practitioners, and hobbyists building deep learning agents for StarCraft II.&lt;/p&gt;

&lt;p&gt;Some other games with RL frameworks you might be interested in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://rlgym.github.io/" rel="noopener noreferrer"&gt;Rocket League&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://leaguesandbox.github.io/" rel="noopener noreferrer"&gt;League of Legends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://games.mau.se/research/the-dota2-5v5-ai-competition/" rel="noopener noreferrer"&gt;Dota 2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. Build a Chess Bot with &lt;a href="https://github.com/deepmind/open_spiel" rel="noopener noreferrer"&gt;OpenSpiel&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s6qop66ghgiwbiciqlm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s6qop66ghgiwbiciqlm.jpg" title="Image credit: DeepMind" alt="OpenSpiel"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/deepmind/open_spiel/" rel="noopener noreferrer"&gt;OpenSpiel&lt;/a&gt; by DeepMind is worth taking a look at if you've been inspired by programs like &lt;a href="https://stockfishchess.org/" rel="noopener noreferrer"&gt;StockFish&lt;/a&gt; or AlphaGo. It contains a collection of environments and algorithms for general RL and planning/search in a variety of games including Chess, Go, Backgammon, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus ideas
&lt;/h2&gt;

&lt;p&gt;Here are some additional project ideas that are also worth checking out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predict stock prices with &lt;a href="https://github.com/tensortrade-org/tensortrade" rel="noopener noreferrer"&gt;TensorTrade&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Train cooperative agents with &lt;a href="https://www.pettingzoo.ml/" rel="noopener noreferrer"&gt;PettingZoo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Build a Poker bot with &lt;a href="https://github.com/datamllab/rlcard" rel="noopener noreferrer"&gt;RLCard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Join an &lt;a href="https://www.gocoder.one/blog/ai-game-competitions-list" rel="noopener noreferrer"&gt;AI Programming competition&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing remarks
&lt;/h2&gt;

&lt;p&gt;There's a huge range of exciting projects to explore in reinforcement learning. This list is by no means comprehensive, but I hope it's given you some inspiration for your own RL project!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>beginners</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>How to train agents to play volleyball using deep reinforcement learning</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Thu, 23 Sep 2021 01:48:10 +0000</pubDate>
      <link>https://dev.to/joooyz/how-to-train-agents-to-play-volleyball-using-deep-reinforcement-learning-417b</link>
      <guid>https://dev.to/joooyz/how-to-train-agents-to-play-volleyball-using-deep-reinforcement-learning-417b</guid>
      <description>&lt;p&gt;This article is part 4 of the series '&lt;strong&gt;&lt;a href="https://dev.to/joooyz/a-hands-on-introduction-to-deep-reinforcement-learning-using-unity-ml-agents-4f8i"&gt;A hands-on introduction to deep reinforcement learning using Unity ML-Agents&lt;/a&gt;&lt;/strong&gt;'. It's also suitable for anyone interested in using Unity ML-Agents for their own reinforcement learning project.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Recap and overview&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In parts &lt;strong&gt;&lt;a href="https://dev.to/joooyz/build-a-reinforcement-learning-environment-using-unity-ml-agents-112e"&gt;2&lt;/a&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;a href="https://dev.to/joooyz/design-reinforcement-learning-agents-using-unity-ml-agents-58f0"&gt;3&lt;/a&gt;&lt;/strong&gt;, we built a volleyball environment using Unity ML-Agents. &lt;/p&gt;

&lt;p&gt;To recap, here is the reinforcement learning setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent actions&lt;/strong&gt; (4 discrete branches):

&lt;ul&gt;
&lt;li&gt;Move forward/backward&lt;/li&gt;
&lt;li&gt;Rotate clockwise/anti-clockwise&lt;/li&gt;
&lt;li&gt;Move left/right&lt;/li&gt;
&lt;li&gt;Jump&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Agent observations&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Agent's y-rotation [1 float]&lt;/li&gt;
&lt;li&gt;Agent's x,y,z-velocity [3 floats]&lt;/li&gt;
&lt;li&gt;Agent's x,y,z-normalized vector to the ball (i.e. direction to the ball) [3 floats]&lt;/li&gt;
&lt;li&gt;Ball's x,y,z-velocity [3 floats]&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Reward function:&lt;/strong&gt; +1 for hitting the ball over the net&lt;/li&gt;

&lt;/ul&gt;
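&lt;p&gt;To make the observation layout concrete, here's a minimal plain-Python sketch of how those 10 floats could be assembled. The helper name and values are hypothetical, not part of the Unity project:&lt;/p&gt;

```python
import math

# Hypothetical sketch of the 10-float observation vector described above.
def make_observation(agent_y_rot, agent_vel, agent_pos, ball_pos, ball_vel):
    dx, dy, dz = (b - a for a, b in zip(agent_pos, ball_pos))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0
    dir_to_ball = (dx / dist, dy / dist, dz / dist)  # normalized direction to ball
    # 1 (y-rotation) + 3 (agent velocity) + 3 (direction) + 3 (ball velocity) = 10
    return [agent_y_rot, *agent_vel, *dir_to_ball, *ball_vel]

obs = make_observation(0.5, (0.0, 0.0, 1.0), (0.0, 1.0, 0.0),
                       (2.0, 3.0, 0.0), (0.1, -0.2, 0.0))
assert len(obs) == 10
```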

&lt;p&gt;In this tutorial, we'll use ML-Agents to train these agents to play volleyball using the PPO reinforcement learning algorithm.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4yvsewne8i8gkw3n9dl6.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4yvsewne8i8gkw3n9dl6.gif" title="Reinforcement learning Agents playing volleyball. Trained using PPO on ~20M steps." alt="Trained PPO agents"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A note on PPO
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openai.com/blog/openai-baselines-ppo/" rel="noopener noreferrer"&gt;Proximal Policy Optimization (PPO) by OpenAI&lt;/a&gt; is an on-policy reinforcement learning algorithm. We won't go into detail, but we choose to use it here because ML-Agents provides an implementation of it out-of-the-box. It produces stable results in this environment and is also recommended by ML-Agents for use with Self-Play (which we'll cover in the next tutorial).&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up for training
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you &lt;em&gt;didn't&lt;/em&gt; follow along with the previous tutorials&lt;/strong&gt;, you can clone or download a copy of the volleyball environment here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/CoderOneHQ/ultimate-volleyball" rel="noopener noreferrer"&gt;Ultimate Volleyball Repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you &lt;em&gt;did&lt;/em&gt; follow along with the previous tutorials&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the &lt;code&gt;Volleyball.unity&lt;/code&gt; scene&lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;VolleyballArea&lt;/code&gt; object&lt;/li&gt;
&lt;li&gt;Ctrl (or CMD) + D to duplicate the object&lt;/li&gt;
&lt;li&gt;Position the &lt;code&gt;VolleyballArea&lt;/code&gt; objects so that they don't overlap&lt;/li&gt;
&lt;li&gt;Repeat 2 - 4 until you have ~16 copies of the environment&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugx0huucq26p74pbuium.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugx0huucq26p74pbuium.JPG" title="Volleyball scene containing 16x copies of the same reinforcement learning environment" alt="Volleyball Scene"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Each &lt;code&gt;VolleyballArea&lt;/code&gt; object is an exact copy of the reinforcement learning environment. All these agents act independently but share the same model. This speeds up training, since all agents contribute to training in parallel.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Selecting hyperparameters
&lt;/h2&gt;

&lt;p&gt;In your project working directory, create a file called &lt;code&gt;Volleyball.yaml&lt;/code&gt;. If you've downloaded the full Ultimate-Volleyball repo earlier, this is located in the &lt;code&gt;config&lt;/code&gt; folder.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Volleyball.yaml&lt;/code&gt; is a &lt;strong&gt;trainer configuration file&lt;/strong&gt; that specifies all the hyperparameters and other settings used during training. Paste the following inside &lt;code&gt;Volleyball.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;behaviors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Volleyball&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;trainer_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ppo&lt;/span&gt;
    &lt;span class="na"&gt;hyperparameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2048&lt;/span&gt;
      &lt;span class="na"&gt;buffer_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20480&lt;/span&gt;
      &lt;span class="na"&gt;learning_rate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.0002&lt;/span&gt;
      &lt;span class="na"&gt;beta&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.003&lt;/span&gt;
      &lt;span class="na"&gt;epsilon&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.15&lt;/span&gt;
      &lt;span class="na"&gt;lambd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.93&lt;/span&gt;
      &lt;span class="na"&gt;num_epoch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
      &lt;span class="na"&gt;learning_rate_schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;constant&lt;/span&gt;
    &lt;span class="na"&gt;network_settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;normalize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;hidden_units&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt;
      &lt;span class="na"&gt;num_layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
      &lt;span class="na"&gt;vis_encode_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;simple&lt;/span&gt;
    &lt;span class="na"&gt;reward_signals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;extrinsic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;gamma&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.96&lt;/span&gt;
        &lt;span class="na"&gt;strength&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
    &lt;span class="na"&gt;keep_checkpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;max_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20000000&lt;/span&gt;
    &lt;span class="na"&gt;time_horizon&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;
    &lt;span class="na"&gt;summary_freq&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Descriptions of the configurations are available in the &lt;a href="https://github.com/Unity-Technologies/ml-agents/blob/release_18_docs/docs/Training-Configuration-File.md" rel="noopener noreferrer"&gt;ML-Agents official documentation&lt;/a&gt;.&lt;/p&gt;
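&lt;p&gt;As a quick intuition for one of these settings: &lt;code&gt;gamma: 0.96&lt;/code&gt; discounts future rewards, so a +1 earned many steps from now is worth far less than one earned soon. A small sketch with hypothetical reward sequences:&lt;/p&gt;

```python
def discounted_return(rewards, gamma=0.96):
    # Work backwards through time: G_t = r_t + gamma * G_{t+1}
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With gamma = 0.96, a +1 reward 50 steps away is worth only about 0.13 now,
# so agents are nudged toward reaching the ball sooner rather than later.
soon = discounted_return([0.0] * 5 + [1.0])
late = discounted_return([0.0] * 50 + [1.0])
assert soon > late
```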

&lt;h2&gt;
  
  
  Training
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Make sure that Behavior Types are set to &lt;code&gt;Default&lt;/code&gt;:

&lt;ol&gt;
&lt;li&gt;Open Assets &amp;gt; Prefabs &amp;gt; &lt;code&gt;VolleyballArea.prefab&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;PurpleAgent&lt;/code&gt; object&lt;/li&gt;
&lt;li&gt;Go to Inspector window &amp;gt; Behavior Parameters &amp;gt; Behavior Type &amp;gt; Set to &lt;code&gt;Default&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Repeat for Blue Agent&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zwy1iyg153kwra8gf0r.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zwy1iyg153kwra8gf0r.jpg" title="Behavior Parameters panel" alt="Behavior Parameters"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; the Behavior Name (Volleyball) above &lt;strong&gt;must match&lt;/strong&gt; the behavior name in the &lt;code&gt;Volleyball.yaml&lt;/code&gt; trainer config file (line 2).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;(Optional) Set up a training camera so that you can view the whole scene while training.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If using the pre-built repo&lt;/strong&gt;, select the Main Camera and turn it off in the Inspector. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If using your own project,&lt;/strong&gt; create a camera object: right click in Hierarchy &amp;gt; Camera. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpslmdztibi6u9kukbdr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpslmdztibi6u9kukbdr.jpg" title="Setting up the training camera" alt="Training camera setup"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Activate the virtual environment containing your installation of &lt;code&gt;ml-agents&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Navigate to your working directory, and run in the terminal:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mlagents-learn &amp;lt;path to config file&amp;gt; &lt;span class="nt"&gt;--run-id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;VB_1 &lt;span class="nt"&gt;--time-scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Notes:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;&amp;lt;path to config file&amp;gt;&lt;/code&gt; , e.g. &lt;code&gt;config/Volleyball.yaml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;ML-Agents defaults to a time scale of 20x to speed up training. Setting the flag &lt;code&gt;--time-scale=1&lt;/code&gt; is important because the physics in this environment are time-dependent. Without it, you may notice that your agents perform differently during inference compared to training.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;ol start="5"&gt;
&lt;li&gt;
&lt;p&gt;When you see the message "Start training by pressing the Play button in the Unity Editor", click ▶ within the Unity GUI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiilj7s7p1cxon5v124c2.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiilj7s7p1cxon5v124c2.JPG" title="Unity ML-Agents interface" alt="Unity ML-Agents interface"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In another terminal window, run &lt;code&gt;tensorboard --logdir results&lt;/code&gt; from your working directory to observe the training process.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj49yte0m1almp4x8vnc.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj49yte0m1almp4x8vnc.JPG" title="Tensorboard dashboard" alt="Tensorboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="7"&gt;
&lt;li&gt;
&lt;p&gt;You can pause training at any time by clicking the ▶ button in Unity. To see how the agents are performing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Locate the results in &lt;code&gt;results/VB_1/Volleyball.onnx&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Copy this .onnx model into the Unity project&lt;/li&gt;
&lt;li&gt;Drag the model into the &lt;code&gt;Model&lt;/code&gt; field of the Behavior Parameters component. &lt;/li&gt;
&lt;li&gt;Click ▶ to watch the agents use this model for inference.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqske0idzc1v9njwmnju4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqske0idzc1v9njwmnju4.jpg" title="Setting the model in Behavior Parameters" alt="Behavior Parameters"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;To resume training, add the &lt;code&gt;--resume&lt;/code&gt; flag (e.g. &lt;code&gt;mlagents-learn config/Volleyball.yaml --run-id=VB_1 --time-scale=1 --resume&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nnrf4xx50s2yvuzdjep.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nnrf4xx50s2yvuzdjep.gif" title="Agents training in parallel" alt="Training agents"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="9"&gt;
&lt;li&gt;Leave the agents to train. At ~5M steps, you'll start to see the agents occasionally touching the ball. At ~10M steps, the agents can start to volley:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18s3jepqwe1nfm88zp8x.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18s3jepqwe1nfm88zp8x.gif" title="Agents after training for 10M steps" alt="Training agents after 10M steps"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="10"&gt;
&lt;li&gt;At ~20M steps, the agents should be able to successfully volley the ball back-and-forth!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsuo2w0bb4bcxgr6m7yu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsuo2w0bb4bcxgr6m7yu.gif" title="Reinforcement learning Agents playing volleyball. Trained using PPO on ~20M steps." alt="Trained agents"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you successfully trained agents to play volleyball in ~20M steps using PPO. Try playing around with the hyperparameters in &lt;code&gt;Volleyball.yaml&lt;/code&gt; or training for more steps to get a better result. &lt;/p&gt;

&lt;p&gt;These agents are trained to keep the ball in play. You won't be able to train &lt;em&gt;competitive&lt;/em&gt; agents (with the intention of &lt;em&gt;winning&lt;/em&gt; the game) with this setup, because it's a zero-sum game and both purple and blue agents share the same model. This is where competitive &lt;strong&gt;Self-Play&lt;/strong&gt; comes in.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>unity3d</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Bomberland: a competitive sandbox for practising machine learning</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Mon, 20 Sep 2021 00:48:51 +0000</pubDate>
      <link>https://dev.to/joooyz/bomberland-a-new-artificial-intelligence-competition-2i1k</link>
      <guid>https://dev.to/joooyz/bomberland-a-new-artificial-intelligence-competition-2i1k</guid>
      <description>&lt;h2&gt;
  
  
  Welcome to Bomberland
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.gocoder.one/bomberland?s=devto-blog" rel="noopener noreferrer"&gt;Bomberland&lt;/a&gt; is a new 1v1 AI competition developed by &lt;a href="https://www.gocoder.one?s=devto-blog" rel="noopener noreferrer"&gt;Coder One&lt;/a&gt;. It features a multi-agent adversarial environment inspired by the classic console game, Bomberman. &lt;/p&gt;

&lt;p&gt;Your task is to program an intelligent agent navigating a 2D grid world. Your agent controls a team of units collecting powerups and placing explosives, with the ultimate goal of taking your opponent down.&lt;/p&gt;

&lt;p&gt;Bomberland is a challenging problem for out-of-the-box machine learning algorithms. Be prepared to manage real-time decision making, planning, game theory, and both adversarial and cooperative play.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ybvj1iq76d3vpf87558.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ybvj1iq76d3vpf87558.gif" title="In Bomberland, each agent controls several units with the ultimate goal of taking down the opposing team." alt="Bomberland preview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  An open Bomberland arena
&lt;/h2&gt;

&lt;p&gt;Bomberland will feature an ongoing, always-on arena with an active leaderboard. Participants can get direct feedback on their strategies in 1v1 matches against other players.&lt;/p&gt;

&lt;p&gt;From time to time, we'll hold tournaments featuring live streams and prizes. Check out our previous &lt;a href="https://twitch.tv/CoderOneHQ" rel="noopener noreferrer"&gt;AI Sports Challenge streams&lt;/a&gt; for a sneak peek of what's ahead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyum6w3j7mjotdaudjnux.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyum6w3j7mjotdaudjnux.jpg" title="AI Sports Challenge 2021 live stream on Twitch" alt="AI Sports Challenge Live Stream"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Bomberland?
&lt;/h2&gt;

&lt;p&gt;We're creating Bomberland as a place for the community to explore and experiment with cutting-edge techniques, from tree search algorithms to deep reinforcement learning.&lt;/p&gt;

&lt;p&gt;You'll want to check it out if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're looking for a challenging hands-on ML project&lt;/li&gt;
&lt;li&gt;You're looking for a place to try out new libraries, frameworks, or research papers&lt;/li&gt;
&lt;li&gt;You've been fascinated by the work of companies like DeepMind and OpenAI&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The future
&lt;/h2&gt;

&lt;p&gt;We envision Bomberland evolving over time with new metas and challenges.&lt;/p&gt;

&lt;p&gt;Bomberland is part of our larger goal at Coder One to make cutting-edge ML accessible. We're focused on building out the right tools and infrastructure to support the community in progressively pushing the boundaries of what's possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Join us for Bomberland!
&lt;/h2&gt;

&lt;p&gt;The Bomberland competition is now live 🎉&lt;/p&gt;

&lt;p&gt;We have starter kits in Python and TypeScript to help you get started (and encourage any community contributions to the &lt;a href="https://github.com/CoderOneHQ/bomberland" rel="noopener noreferrer"&gt;starter kit repo&lt;/a&gt;).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.gocoder.one/bomberlands=devto-blog" rel="noopener noreferrer"&gt;Join Bomberland&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/DXpTKWQSpP" rel="noopener noreferrer"&gt;Join Discord&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>programming</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Design reinforcement learning agents using Unity ML-Agents</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Wed, 08 Sep 2021 01:22:22 +0000</pubDate>
      <link>https://dev.to/joooyz/design-reinforcement-learning-agents-using-unity-ml-agents-58f0</link>
      <guid>https://dev.to/joooyz/design-reinforcement-learning-agents-using-unity-ml-agents-58f0</guid>
      <description>&lt;p&gt;This article is part 3 of the series '&lt;a href="https://dev.to/joooyz/a-hands-on-introduction-to-deep-reinforcement-learning-using-unity-ml-agents-4f8i"&gt;A hands-on introduction to deep reinforcement learning using Unity ML-Agents&lt;/a&gt;'. It's also suitable for anyone new to Unity interested in using ML-Agents for their own reinforcement learning project.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Recap and overview&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/joooyz/build-a-reinforcement-learning-environment-using-unity-ml-agents-112e"&gt;part 2&lt;/a&gt;, we built a 3D physics-based volleyball environment in Unity. We also added rewards to encourage agents to 'volley'.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll add agents to the environment. The goal is to let them observe and interact with the environment so that we can train them later using deep reinforcement learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Letting our agents make decisions
&lt;/h2&gt;

&lt;p&gt;We want our agent to learn which actions to take given a certain state of the environment — e.g. if the ball is on our side of the court, our agent should get it before it hits the floor.&lt;/p&gt;

&lt;p&gt;The goal of reinforcement learning is to learn the &lt;strong&gt;&lt;em&gt;best&lt;/em&gt; policy&lt;/strong&gt; (a mapping of states to actions) &lt;strong&gt;that will maximise possible rewards.&lt;/strong&gt; The theory behind how reinforcement learning algorithms achieve this is beyond the scope of this series, but the courses I shared in the &lt;a href="https://dev.to/joooyz/a-hands-on-introduction-to-deep-reinforcement-learning-using-unity-ml-agents-4f8i"&gt;series introduction&lt;/a&gt; will cover it in great depth.&lt;/p&gt;

&lt;p&gt;While training, the agent will either take actions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;At random (to explore which actions lead to rewards and which don't)&lt;/li&gt;
&lt;li&gt;From its current policy (the optimal action given the current state)&lt;/li&gt;
&lt;/ol&gt;
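&lt;p&gt;ML-Agents' PPO actually explores by sampling from a stochastic policy rather than flipping an explicit coin, but the trade-off between the two modes above can be sketched with a hypothetical epsilon-style chooser:&lt;/p&gt;

```python
import random

# Hypothetical epsilon-style chooser, purely for intuition.
def choose_action(policy_action, action_space, explore_prob):
    if random.random() >= explore_prob:
        return policy_action  # 2. exploit: the policy's preferred action
    return random.choice(action_space)  # 1. explore: act at random

random.seed(0)
picks = [choose_action(1, [0, 1, 2], explore_prob=0.2) for _ in range(1000)]
assert picks.count(1) > 800  # mostly exploits, occasionally explores
```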

&lt;p&gt;ML-Agents provides a convenient &lt;strong&gt;Decision Requester&lt;/strong&gt; component which will handle the alternation between these for us during training.&lt;/p&gt;

&lt;p&gt;To add a Decision Requester:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select the &lt;strong&gt;PurpleAgent&lt;/strong&gt; game object (within the &lt;strong&gt;PurplePlayArea&lt;/strong&gt; parent).&lt;/li&gt;
&lt;li&gt;Add Component &amp;gt; Decision Requester. &lt;/li&gt;
&lt;li&gt;Leave the Decision Period at its default value of 5.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9zqswh97373ihi7xhu9.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9zqswh97373ihi7xhu9.JPG" title="Decision Requester" alt="Decision Requester"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining the agent behavior
&lt;/h2&gt;

&lt;p&gt;Both agents are already set up with the &lt;code&gt;VolleyballAgent.cs&lt;/code&gt; script and &lt;strong&gt;Behavior Parameters&lt;/strong&gt; component (which we'll come back to later).&lt;/p&gt;

&lt;p&gt;In this part we'll walk through &lt;code&gt;VolleyballAgent.cs&lt;/code&gt;.  This script contains all the logic that defines the agents' actions and observations. It contains some helper methods already:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Start()&lt;/code&gt; — called before the first frame update. Grabs the parent Volleyball environment and saves it to a variable &lt;code&gt;envController&lt;/code&gt; for easy reference to its methods later.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Initialize()&lt;/code&gt; — called when the &lt;strong&gt;agent&lt;/strong&gt; is first initialized. Grabs some useful constants and objects. Also sets &lt;code&gt;agentRot&lt;/code&gt; to ensure symmetry so that the same policy can be shared between both agents.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MoveTowards()&lt;/code&gt;, &lt;code&gt;CheckIfGrounded()&lt;/code&gt; &amp;amp; &lt;code&gt;Jump()&lt;/code&gt; — from ML-Agents sample projects. Used for jumping.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OnCollisionEnter()&lt;/code&gt; — called when the Agent collides with something. Used to update &lt;code&gt;lastHitter&lt;/code&gt; to decide which agent gets penalized if the ball is hit out of bounds or rewarded if hit over the net.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Adding an agent in Unity ML-Agents usually involves extending the base &lt;code&gt;Agent&lt;/code&gt; class, and implementing the following methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OnActionReceived()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Heuristic()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CollectObservations()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OnEpisodeBegin()&lt;/code&gt; (&lt;strong&gt;Note:&lt;/strong&gt; usually used for resetting starting conditions. We don't implement it here, because the reset logic is already defined at the environment-level in &lt;code&gt;VolleyballEnvController&lt;/code&gt;. This makes more sense for us since we also need to reset the ball and not just the agents.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Agent actions
&lt;/h2&gt;

&lt;p&gt;At a high level, the Decision Requester will select an action for our agent to take and trigger &lt;code&gt;OnActionReceived()&lt;/code&gt;. This in turn calls &lt;code&gt;MoveAgent()&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;MoveAgent()&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This method resolves the selected action. &lt;/p&gt;

&lt;p&gt;Within the &lt;code&gt;MoveAgent()&lt;/code&gt; method, start by declaring vector variables for our agent's direction and rotation movements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;dirToGo&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zero&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;rotateDir&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zero&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'll also add a 'grounded' check to see whether it's possible for the agent to jump:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;grounded&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;CheckIfGrounded&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
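&lt;p&gt;&lt;code&gt;CheckIfGrounded()&lt;/code&gt; isn't shown in this post. A common way to implement it is a short downward raycast from the agent (the ray length here is an illustrative assumption; tune it to your collider size):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;bool CheckIfGrounded()
{
    // Cast a short ray straight down; a hit means the agent is standing on something
    return Physics.Raycast(transform.position, Vector3.down, 0.55f);
}
&lt;/code&gt;&lt;/pre&gt;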



&lt;p&gt;The actions passed into this method (&lt;code&gt;actionBuffers.DiscreteActions&lt;/code&gt;) will be an array of integers, which we'll map to behaviors. The order in which we assign them isn't important, as long as it stays consistent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;dirToGoForwardAction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;act&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;rotateDirAction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;act&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;dirToGoSideAction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;act&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;jumpAction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;act&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Unity, every object has a &lt;code&gt;transform&lt;/code&gt; component that stores its position, rotation, and scale. We'll use it to create a vector pointing in the direction we want our agent to move.&lt;/p&gt;

&lt;p&gt;Based on the previous assignment order, this is how we'll map our actions to behaviors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;dirToGoForwardAction&lt;/code&gt;: Do nothing [0] | Move forward [1] | Move backward [2]&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rotateDirAction&lt;/code&gt;: Do nothing [0] | Rotate clockwise [1] | Rotate anti-clockwise [2]&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dirToGoSideAction&lt;/code&gt;: Do nothing [0] | Move left [1] | Move right [2] &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;jumpAction&lt;/code&gt;: Don't jump [0] | Jump [1]&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Add to the &lt;code&gt;MoveAgent()&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirToGoForwardAction&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dirToGo&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grounded&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="m"&gt;1f&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirToGoForwardAction&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dirToGo&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grounded&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="m"&gt;1f&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;speedReductionFactor&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rotateDirAction&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rotateDir&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;up&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rotateDirAction&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rotateDir&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;up&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirToGoSideAction&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dirToGo&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grounded&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="m"&gt;1f&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;speedReductionFactor&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirToGoSideAction&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dirToGo&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grounded&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="m"&gt;1f&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;speedReductionFactor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jumpAction&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="n"&gt;jumpingTime&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="m"&gt;0f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;grounded&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;Jump&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;code&gt;volleyballSettings.speedReductionFactor&lt;/code&gt; is a constant that slows backward and strafe movement to feel more 'realistic'.&lt;/p&gt;
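&lt;p&gt;For reference, &lt;code&gt;volleyballSettings&lt;/code&gt; is a plain component holding tunable constants. A minimal sketch might look like this (the field values are illustrative assumptions; see &lt;code&gt;VolleyballSettings.cs&lt;/code&gt; in the project repo for the real ones):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public class VolleyballSettings : MonoBehaviour
{
    // Illustrative values only; check the project repo for the actual settings
    public float agentRunSpeed = 1.5f;
    public float speedReductionFactor = 0.75f; // slows backward/strafe movement
    public float agentJumpHeight = 2.75f;
    public float agentJumpVelocity = 777f;
    public float agentJumpVelocityMaxChange = 10f;
    public float fallingForce = 150f;
}
&lt;/code&gt;&lt;/pre&gt;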

&lt;p&gt;Next, apply the movement using Unity's provided &lt;code&gt;Rotate&lt;/code&gt; and &lt;code&gt;AddForce&lt;/code&gt; methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Rotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rotateDir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fixedDeltaTime&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="m"&gt;200f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;agentRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddForce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agentRot&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dirToGo&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agentRunSpeed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ForceMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VelocityChange&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, add in the logic for controlling jump behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// makes the agent physically "jump"&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jumpingTime&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;jumpTargetPos&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agentRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;jumpStartingPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agentJumpHeight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;agentRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;agentRot&lt;/span&gt;&lt;span class="p"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dirToGo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nf"&gt;MoveTowards&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jumpTargetPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agentRb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agentJumpVelocity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agentJumpVelocityMaxChange&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// provides a downward force to end the jump&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!(&lt;/span&gt;&lt;span class="n"&gt;jumpingTime&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="n"&gt;grounded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;agentRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddForce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;down&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fallingForce&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ForceMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Acceleration&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// controls the jump sequence&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jumpingTime&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;jumpingTime&lt;/span&gt; &lt;span class="p"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fixedDeltaTime&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
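&lt;p&gt;The &lt;code&gt;Jump()&lt;/code&gt; and &lt;code&gt;MoveTowards()&lt;/code&gt; helpers used above aren't shown in this section. Here's a hedged sketch, with the field names and jump duration inferred from how they're called (check the project repo for the actual implementations):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;void Jump()
{
    // Start the jump timer and remember where the jump began
    jumpingTime = 0.2f; // illustrative duration
    jumpStartingPos = agentRb.position;
}

void MoveTowards(Vector3 targetPos, Rigidbody rb, float targetVel, float maxVel)
{
    // Steer the rigidbody's velocity toward the jump target,
    // limiting how much the velocity can change per physics step
    Vector3 moveToPos = targetPos - rb.worldCenterOfMass;
    Vector3 velocityTarget = Time.fixedDeltaTime * targetVel * moveToPos;
    if (float.IsNaN(velocityTarget.x) == false)
    {
        rb.velocity = Vector3.MoveTowards(rb.velocity, velocityTarget, maxVel);
    }
}
&lt;/code&gt;&lt;/pre&gt;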



&lt;h3&gt;
  
  
  &lt;code&gt;Heuristic()&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;To test that we've resolved the actions properly, let's implement the &lt;code&gt;Heuristic()&lt;/code&gt; method. This maps keyboard input to actions, so that we can playtest as a human controller.&lt;/p&gt;

&lt;p&gt;Add to &lt;code&gt;Heuristic()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;discreteActionsOut&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;actionsOut&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DiscreteActions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// rotate right&lt;/span&gt;
    &lt;span class="n"&gt;discreteActionsOut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UpArrow&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// forward&lt;/span&gt;
    &lt;span class="n"&gt;discreteActionsOut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// rotate left&lt;/span&gt;
    &lt;span class="n"&gt;discreteActionsOut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DownArrow&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// backward&lt;/span&gt;
    &lt;span class="n"&gt;discreteActionsOut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LeftArrow&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// move left&lt;/span&gt;
    &lt;span class="n"&gt;discreteActionsOut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RightArrow&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// move right&lt;/span&gt;
    &lt;span class="n"&gt;discreteActionsOut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;discreteActionsOut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KeyCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Space&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save your script and return to the Unity editor.&lt;/p&gt;

&lt;p&gt;In the Behavior Parameters component of the PurpleAgent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set Behavior Type to Heuristic Only. This will call the &lt;code&gt;Heuristic()&lt;/code&gt; method.&lt;/li&gt;
&lt;li&gt;Set up the Actions:

&lt;ol&gt;
&lt;li&gt;Discrete Branches = 4

&lt;ol&gt;
&lt;li&gt;Branch 0 Size = 3 [No movement, move forward, move backward]&lt;/li&gt;
&lt;li&gt;Branch 1 Size = 3 [No rotation, rotate clockwise, rotate anti-clockwise]&lt;/li&gt;
&lt;li&gt;Branch 2 Size = 3 [No movement, move left, move right]&lt;/li&gt;
&lt;li&gt;Branch 3 Size = 2 [No jump, jump]&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F92njkrobu5gf1fvjbiq7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F92njkrobu5gf1fvjbiq7.jpg" title="How to set up actions in Behavior Parameters" alt="Setup actions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Press ▶️ in the editor and you'll be able to control your agent: W/S (or up/down arrows) to move forward and backward, A/D to rotate, left/right arrows to strafe, and the space bar to jump! &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; It might be easier to playtest if you comment out the &lt;code&gt;EndEpisode()&lt;/code&gt; calls in &lt;code&gt;ResolveEvent()&lt;/code&gt; of &lt;code&gt;VolleyballEnvController.cs&lt;/code&gt; to stop the episode resetting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observations
&lt;/h2&gt;

&lt;p&gt;Observations are how our agent 'sees' its environment. &lt;/p&gt;

&lt;p&gt;In ML-Agents, there are 3 types of observations we can use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vectors&lt;/strong&gt; — "direct" information about our environment (e.g. a list of floats containing the position, scale, velocity, etc of objects)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raycasts&lt;/strong&gt; —  "beams" that shoot out from the agent and detect nearby objects&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Visual/camera input&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this project, we'll implement &lt;strong&gt;vector observations&lt;/strong&gt; to keep things simple. &lt;strong&gt;The goal is to include only the observations that are relevant for making an informed decision.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With some trial and error, here's what I decided to use for observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent's y-rotation [1 float]&lt;/li&gt;
&lt;li&gt;Agent's x,y,z-velocity [3 floats]&lt;/li&gt;
&lt;li&gt;Agent's normalized x,y,z-vector to the ball (i.e. direction to the ball) [3 floats]&lt;/li&gt;
&lt;li&gt;Agent's distance to the ball [1 float]&lt;/li&gt;
&lt;li&gt;Ball's x,y,z-velocity [3 floats]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a total of &lt;strong&gt;11 vector observations&lt;/strong&gt;. Feel free to experiment with different observations. For example, you might've noticed that the agent knows nothing about its opponent. This works fine for training a simple agent that can bounce the ball over the net, but it won't produce a competitive agent that plays to win.&lt;/p&gt;
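&lt;p&gt;For example, if you wanted the agent to 'see' its opponent, you could add something like this inside &lt;code&gt;CollectObservations()&lt;/code&gt; (&lt;code&gt;opponentRb&lt;/code&gt; is a hypothetical &lt;code&gt;Rigidbody&lt;/code&gt; reference you'd need to wire up yourself, and the observation Space Size would grow accordingly):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical extra observation: direction to the opponent (3 floats)
sensor.AddObservation((opponentRb.position - agentRb.position).normalized);
&lt;/code&gt;&lt;/pre&gt;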

&lt;p&gt;Also note that your choice of observations depends on your goal. If you're trying to replicate a 'real-world' scenario, these observations make less sense: it would be very unlikely for a player to 'know' these exact values about the environment.&lt;/p&gt;

&lt;p&gt;To add observations, you'll need to implement the Agent class &lt;code&gt;CollectObservations()&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;CollectObservations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;VectorSensor&lt;/span&gt; &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Agent rotation (1 float)&lt;/span&gt;
    &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddObservation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rotation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Vector from agent to ball (direction to ball) (3 floats)&lt;/span&gt;
    &lt;span class="n"&gt;Vector3&lt;/span&gt; &lt;span class="n"&gt;toBall&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;ballRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)*&lt;/span&gt;&lt;span class="n"&gt;agentRot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ballRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ballRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)*&lt;/span&gt;&lt;span class="n"&gt;agentRot&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddObservation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toBall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Distance from the ball (1 float)&lt;/span&gt;
    &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddObservation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toBall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;magnitude&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Agent velocity (3 floats)&lt;/span&gt;
    &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddObservation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agentRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;velocity&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Ball velocity (3 floats)&lt;/span&gt;
    &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddObservation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ballRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;velocity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddObservation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ballRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;velocity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;*&lt;/span&gt;&lt;span class="n"&gt;agentRot&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddObservation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ballRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;velocity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;*&lt;/span&gt;&lt;span class="n"&gt;agentRot&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we'll finish setting up the Behavior Parameters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set &lt;strong&gt;Behavior Name&lt;/strong&gt; to 'Volleyball'. Later, this is how our trainer will know which agent to train.&lt;/li&gt;
&lt;li&gt;Set Vector Observation:

&lt;ol&gt;
&lt;li&gt;Space Size: 11&lt;/li&gt;
&lt;li&gt;Stacked Vectors: 1&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febyz3o4t8f8kspdp1ni6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febyz3o4t8f8kspdp1ni6.jpg" title="How to set up observations in Behavior Parameters" alt="Setup observations"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;You're now all set up to train your reinforcement learning agents.&lt;/p&gt;

&lt;p&gt;If you get stuck, check out the pre-configured &lt;code&gt;BlueAgent&lt;/code&gt;, or see the full source code in the &lt;a href="https://github.com/CoderOneHQ/ultimate-volleyball" rel="noopener noreferrer"&gt;Ultimate Volleyball project repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the next section, we'll train our agents using &lt;a href="https://openai.com/blog/openai-baselines-ppo/" rel="noopener noreferrer"&gt;PPO&lt;/a&gt; — a state-of-the-art RL algorithm provided out of the box by Unity ML-Agents.&lt;/p&gt;

&lt;p&gt;If you have any feedback or questions, please let me know!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>unity3d</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Build a reinforcement learning environment using Unity ML-Agents</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Thu, 02 Sep 2021 01:16:57 +0000</pubDate>
      <link>https://dev.to/joooyz/build-a-reinforcement-learning-environment-using-unity-ml-agents-112e</link>
      <guid>https://dev.to/joooyz/build-a-reinforcement-learning-environment-using-unity-ml-agents-112e</guid>
      <description>&lt;p&gt;This article is part 2 of the series '&lt;a href="https://dev.to/joooyz/a-hands-on-introduction-to-deep-reinforcement-learning-using-unity-ml-agents-4f8i"&gt;A hands-on introduction to deep reinforcement learning using Unity ML-Agents&lt;/a&gt;'. It's also suitable for anyone new to Unity interested in using ML-Agents for their own reinforcement learning project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap and overview
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/joooyz/an-introduction-to-machine-learning-with-unity-ml-agents-3an5"&gt;previous post&lt;/a&gt;, I went over how to set up ML-Agents and train an agent.&lt;/p&gt;

&lt;p&gt;In this article, I'll walk through how to build a 3D physics-based volleyball environment in Unity. We'll use this environment later to train agents that can successfully play volleyball using deep reinforcement learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up the court
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Download or clone the starter project from this &lt;a href="https://github.com/CoderOneHQ/ultimate-volleyball-starter" rel="noopener noreferrer"&gt;repo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Open Unity Hub and go to Projects &amp;gt; Add.&lt;/li&gt;
&lt;li&gt;Select the 'ultimate-volleyball-starter' project folder. You might see some warning messages in the Console, but they're safe to ignore for now.&lt;/li&gt;
&lt;li&gt;From the &lt;strong&gt;Project&lt;/strong&gt; tab in Unity, navigate to &lt;strong&gt;Assets&lt;/strong&gt; &amp;gt; &lt;strong&gt;Scenes.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Load the &lt;code&gt;Volleyball.unity&lt;/code&gt; scene.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Project&lt;/strong&gt; tab go to &lt;strong&gt;Assets&lt;/strong&gt; &amp;gt; &lt;strong&gt;Prefabs&lt;/strong&gt; and drag the &lt;code&gt;VolleyballArea.prefab&lt;/code&gt; object into the scene.&lt;/li&gt;
&lt;li&gt;Save the project.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgtv5cxa19a3h04qaj9t.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgtv5cxa19a3h04qaj9t.gif" title="Dragging prefab into the Volleyball scene" alt="add-prefab-to-scene.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you click &lt;strong&gt;Play&lt;/strong&gt; ▶️ above the Scene viewer, you'll notice some odd behavior, because we haven't yet added any physics or logic to define how the game objects should interact. We'll do that in the next section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up the environment
&lt;/h2&gt;

&lt;p&gt;⚠ Before we start, open the &lt;strong&gt;VolleyballArea prefab&lt;/strong&gt; (Project panel &amp;gt; Assets &amp;gt; Prefabs). We'll make our edits to the base prefab, so that they are reflected in all instances of this prefab. This will come in handy later when we duplicate our environment multiple times for parallel training.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F830nbzhz2ffghu4bpnk8.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F830nbzhz2ffghu4bpnk8.JPG" title="Editing a Prefab in Unity" alt="prefab-view.JPG"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Volleyball&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Make our volleyball subject to Unity's physics engine:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the &lt;strong&gt;Hierarchy&lt;/strong&gt; panel, expand the &lt;strong&gt;VolleyballArea&lt;/strong&gt; object and select the &lt;strong&gt;Volleyball&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;From the &lt;strong&gt;Inspector&lt;/strong&gt; panel, set the tag to &lt;code&gt;ball&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Add Component&lt;/strong&gt; &amp;gt; &lt;strong&gt;Rigidbody&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Set Mass = 3, Drag = 1, and Angular Drag = 1. Feel free to experiment with these values: a heavier ball makes the environment 'harder'.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Add 'bounciness' to our ball:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a &lt;strong&gt;Sphere Collider&lt;/strong&gt; component.&lt;/li&gt;
&lt;li&gt;Set Radius to 0.15.&lt;/li&gt;
&lt;li&gt;From the &lt;strong&gt;Project&lt;/strong&gt; panel, go to &lt;strong&gt;Assets&lt;/strong&gt; &amp;gt; &lt;strong&gt;Materials &amp;gt; Physic Materials&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Drag &lt;code&gt;Bouncy.physicMaterial&lt;/code&gt; into the 'Material' slot.&lt;/li&gt;
&lt;li&gt;You can double-click &lt;code&gt;Bouncy.physicMaterial&lt;/code&gt; to change the 'bounciness'.&lt;/li&gt;
&lt;/ol&gt;
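&lt;p&gt;For reference, the same bouncy setup can also be built from code. This is only a sketch: the &lt;code&gt;bounciness&lt;/code&gt; value and combine mode below are assumptions, not necessarily what &lt;code&gt;Bouncy.physicMaterial&lt;/code&gt; actually uses.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Sketch only: create a bouncy physic material at runtime.
// The values are illustrative; inspect Bouncy.physicMaterial for the real ones.
var bouncy = new PhysicMaterial();
bouncy.bounciness = 0.8f;
bouncy.bounceCombine = PhysicMaterialCombine.Maximum;

// Assign it to the ball's sphere collider.
var sphereCollider = GetComponent&amp;lt;SphereCollider&amp;gt;();
sphereCollider.material = bouncy;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;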

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyavr4myfoqciyniltfan.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyavr4myfoqciyniltfan.jpg" title="Inspector panel for Volleyball object" alt="volleyball-components.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both blue and purple agent cubes have already been set up for you in a similar way to the Volleyball.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Ground&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Select the &lt;strong&gt;Ground&lt;/strong&gt; game object.&lt;/li&gt;
&lt;li&gt;From the &lt;strong&gt;Inspector&lt;/strong&gt; panel, set the tag to &lt;code&gt;walkableSurface&lt;/code&gt;. This is used later to check whether the agent is 'grounded' before allowing its jump action.&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;Box Collider&lt;/strong&gt; component. This is used to register collisions with other game objects containing Rigid Body components. Without it, they will just fall through the ground.&lt;/li&gt;
&lt;/ol&gt;
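&lt;p&gt;As a hedged sketch of how the &lt;code&gt;walkableSurface&lt;/code&gt; tag is typically used for a grounded check (the agent script in the repo may implement this differently, e.g. with a raycast):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Sketch: track whether the agent is touching the ground via collision tags.
bool isGrounded;

void OnCollisionStay(Collision collision)
{
    if (collision.gameObject.CompareTag("walkableSurface"))
    {
        isGrounded = true; // jumping is only allowed while grounded
    }
}

void OnCollisionExit(Collision collision)
{
    if (collision.gameObject.CompareTag("walkableSurface"))
    {
        isGrounded = false;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;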

&lt;h3&gt;
  
  
  &lt;strong&gt;Goals&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Goals are represented by a thin layer on top of the ground. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Expand the &lt;strong&gt;BluePlayArea&lt;/strong&gt; and &lt;strong&gt;PurplePlayArea&lt;/strong&gt; parent objects.&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;Box Collider&lt;/strong&gt; to both the &lt;strong&gt;BlueGoal&lt;/strong&gt; and &lt;strong&gt;PurpleGoal&lt;/strong&gt; game objects.&lt;/li&gt;
&lt;li&gt;Check the 'Is Trigger' box for both goals.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgad8t0dnfcoufaw3gm7z.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgad8t0dnfcoufaw3gm7z.JPG" title="Trigger setting checked in Inspector panel" alt="collider-is-trigger.JPG"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a collider is set as a trigger, it no longer registers physics-based collisions. So even though the goals are placed above the ground layer, the agents are really standing on the Ground collider we created earlier.&lt;/p&gt;

&lt;p&gt;Marking the goals as triggers lets us use the &lt;code&gt;OnTriggerEnter&lt;/code&gt; method later to detect when the ball has entered a goal's collider.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Net&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Select the &lt;strong&gt;Net&lt;/strong&gt; game object within &lt;strong&gt;VolleyballNet&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;Box Collider&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click the '&lt;strong&gt;Edit Collider&lt;/strong&gt;' icon.&lt;/li&gt;
&lt;li&gt;Click and drag the bottom node of the green collider so that it covers the entire height of the net. Feel free to play around with the thickness. The intention here is to create a physical 'blocker' that will prevent the ball from going under or around the net.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxrspqoxb01b2p3aswdf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxrspqoxb01b2p3aswdf.gif" title="Adjusting collider of net" alt="net-collider.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Some shortcuts: Alt+click to rotate, middle-click to pan, middle mouse wheel to zoom in/out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Boundaries&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;There are three invisible boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OuterBoundaries&lt;/strong&gt; (checks for ball going out of bounds)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BlueBoundary&lt;/strong&gt; (checks for ball going into the blue side of court)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PurpleBoundary&lt;/strong&gt; (checks for ball going into the purple side of court)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Colliders, tags, and triggers for these boundaries have already been set up for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scripting the environment
&lt;/h2&gt;

&lt;p&gt;In this section, we'll add scripts that define the environment behavior (e.g. what happens when the ball hits the floor or when the episode starts).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;&lt;code&gt;VolleyballSettings.cs&lt;/code&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Our first script will simply hold some constants that we'll reuse throughout the project.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go back to the &lt;strong&gt;Volleyball&lt;/strong&gt; Scene and select the &lt;strong&gt;VolleyballSettings&lt;/strong&gt; game object.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Inspector&lt;/strong&gt;, you'll see a Script component attached. Double-click the &lt;strong&gt;VolleyballSettings&lt;/strong&gt; script to open it in your IDE of choice.&lt;/li&gt;
&lt;li&gt;You should see the following:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;agentRunSpeed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1.5f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;agentJumpHeight&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;2.75f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;agentJumpVelocity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;777&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;agentJumpVelocityMaxChange&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Slows down strafe &amp;amp; backward movement&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;speedReductionFactor&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0.75f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Material&lt;/span&gt; &lt;span class="n"&gt;blueGoalMaterial&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Material&lt;/span&gt; &lt;span class="n"&gt;purpleGoalMaterial&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Material&lt;/span&gt; &lt;span class="n"&gt;defaultMaterial&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// This is a downward force applied when falling to make jumps look less floaty&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;fallingForce&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;150&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; there is also a &lt;code&gt;ProjectSettingsOverride.cs&lt;/code&gt; script provided. This contains additional default settings related to time-stepping and resolving physics.&lt;/p&gt;
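&lt;p&gt;The exact contents of that script aren't shown here, but an override of this kind can be as simple as the following sketch (the 0.02s fixed timestep is the value this tutorial relies on; everything else about the real file may differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Sketch only: the real ProjectSettingsOverride.cs in the repo may differ.
using UnityEngine;

public class ProjectSettingsOverride : MonoBehaviour
{
    void Awake()
    {
        Time.fixedDeltaTime = 0.02f; // physics timestep used by FixedUpdate()
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;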

&lt;p&gt;Go back to the Unity editor and select the &lt;strong&gt;VolleyballSettings&lt;/strong&gt; game object. You should see that these variables are available in the &lt;strong&gt;Inspector&lt;/strong&gt; panel.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;&lt;code&gt;VolleyballController.cs&lt;/code&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This script is attached to the &lt;strong&gt;Volleyball&lt;/strong&gt; game object and lets us detect when the ball has hit our boundary or goal trigger.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the &lt;code&gt;VolleyballController.cs&lt;/code&gt; script attached to the Volleyball.&lt;/li&gt;
&lt;li&gt;At the start of our &lt;code&gt;VolleyballController : MonoBehaviour&lt;/code&gt; class (above the &lt;code&gt;Start()&lt;/code&gt; method), declare the variables:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;HideInInspector&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;VolleyballEnvController&lt;/span&gt; &lt;span class="n"&gt;envController&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;GameObject&lt;/span&gt; &lt;span class="n"&gt;purpleGoal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;GameObject&lt;/span&gt; &lt;span class="n"&gt;blueGoal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;Collider&lt;/span&gt; &lt;span class="n"&gt;purpleGoalCollider&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;Collider&lt;/span&gt; &lt;span class="n"&gt;blueGoalCollider&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Save the script.&lt;/li&gt;
&lt;li&gt;In the Unity editor, click the &lt;strong&gt;Volleyball&lt;/strong&gt; game object.&lt;/li&gt;
&lt;li&gt;Drag the &lt;strong&gt;PurpleGoal&lt;/strong&gt; game object into the Purple Goal slot in the Inspector. &lt;/li&gt;
&lt;li&gt;Drag the &lt;strong&gt;BlueGoal&lt;/strong&gt; game object into the Blue Goal slot in the Inspector.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftccf32d1gc9q3loz2ri5.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftccf32d1gc9q3loz2ri5.JPG" title="Script component for Volleyball object" alt="volleyball-controller.JPG"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will allow us to access their child objects later.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Start()&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;This method is called once, before the first frame update. It will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetch the PurpleGoal &amp;amp; BlueGoal Colliders themselves (the components that register physics-based collisions) using the &lt;code&gt;GetComponent&amp;lt;Collider&amp;gt;&lt;/code&gt; method:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;purpleGoalCollider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;purpleGoal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetComponent&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Collider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;span class="n"&gt;blueGoalCollider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blueGoal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetComponent&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Collider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Fetch the &lt;code&gt;VolleyballEnvController&lt;/code&gt; component from the parent &lt;strong&gt;VolleyballArea&lt;/strong&gt; game object and assign it to the variable &lt;code&gt;envController&lt;/code&gt; for easier reference later.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;envController&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GetComponentInParent&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;VolleyballEnvController&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy these statements into the &lt;code&gt;Start()&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;envController&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GetComponentInParent&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;VolleyballEnvController&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
    &lt;span class="n"&gt;purpleGoalCollider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;purpleGoal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetComponent&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Collider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
    &lt;span class="n"&gt;blueGoalCollider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blueGoal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetComponent&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Collider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;OnTriggerEnter(Collider other)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This method is called whenever the ball enters a trigger collider.&lt;/p&gt;

&lt;p&gt;Some scenarios to detect are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ball hits the floor/goals&lt;/li&gt;
&lt;li&gt;Ball goes out of bounds&lt;/li&gt;
&lt;li&gt;Ball is hit over the net (to encourage volleying for training later)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This method will detect each scenario and pass this information to &lt;code&gt;envController&lt;/code&gt; (which we'll add in the next section). Copy the following block into this method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gameObject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompareTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"boundary"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ball went out of bounds&lt;/span&gt;
    &lt;span class="n"&gt;envController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ResolveEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitOutOfBounds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gameObject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompareTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"blueBoundary"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ball hit into blue side&lt;/span&gt;
    &lt;span class="n"&gt;envController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ResolveEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitIntoBlueArea&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gameObject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompareTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"purpleBoundary"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ball hit into purple side&lt;/span&gt;
    &lt;span class="n"&gt;envController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ResolveEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitIntoPurpleArea&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gameObject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompareTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"purpleGoal"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ball hit purple goal (blue side court)&lt;/span&gt;
    &lt;span class="n"&gt;envController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ResolveEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitPurpleGoal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gameObject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompareTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"blueGoal"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ball hit blue goal (purple side court)&lt;/span&gt;
    &lt;span class="n"&gt;envController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ResolveEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitBlueGoal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;&lt;code&gt;VolleyballEnvController.cs&lt;/code&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This script holds all the main logic for the environment: the max steps it should run for, how the ball and agents should spawn, when the episode should end, how rewards should be assigned, etc.&lt;/p&gt;

&lt;p&gt;In the sample skeleton script, some variables and helper methods are already provided:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Start()&lt;/code&gt; — fetch the components and objects we'll need for later&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;UpdateLastHitter()&lt;/code&gt; — keeps track of which agent was last in control of the ball&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GoalScoredSwapGroundMaterial()&lt;/code&gt; — changes the color of the ground (helps us visualise which agent scored)&lt;/li&gt;
&lt;/ul&gt;
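&lt;p&gt;To give a sense of what these helpers do, here's a hedged sketch (signatures and details are assumptions; see the repo for the actual implementations):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Sketch: remember which team last touched the ball.
public void UpdateLastHitter(Team team)
{
    lastHitter = team;
}

// Sketch: flash the ground with the scoring team's material, then restore it.
// Assumes a groundRenderer reference and the materials from VolleyballSettings.
IEnumerator GoalScoredSwapGroundMaterial(Material mat, float time)
{
    groundRenderer.material = mat;
    yield return new WaitForSeconds(time);
    groundRenderer.material = defaultMaterial;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;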

&lt;p&gt;&lt;code&gt;FixedUpdate()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This is called by the Unity engine on every physics update, i.e. every &lt;code&gt;Time.fixedDeltaTime&lt;/code&gt; = 0.02 seconds (set in &lt;code&gt;ProjectSettingsOverride.cs&lt;/code&gt;), independent of the rendered frame rate.&lt;/p&gt;

&lt;p&gt;This will control the max number of updates (i.e. 'steps') the environment takes before we interrupt the episode (e.g. if the ball gets stuck somewhere).&lt;/p&gt;

&lt;p&gt;Add the following to &lt;code&gt;void FixedUpdate()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
&lt;span class="c1"&gt;/// Called every step. Control max env steps.&lt;/span&gt;
&lt;span class="c1"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;FixedUpdate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;resetTimer&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resetTimer&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MaxEnvironmentSteps&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;MaxEnvironmentSteps&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EpisodeInterrupted&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EpisodeInterrupted&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nf"&gt;ResetScene&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ResetScene()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This controls the starting spawn behavior.&lt;/p&gt;

&lt;p&gt;Our goal is to learn a model that allows our agent to return the ball from its side of the court no matter where the ball is sent. To help with training, we'll randomise the starting conditions of the agents and ball within some reasonable boundaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
&lt;span class="c1"&gt;/// Reset agent and ball spawn conditions.&lt;/span&gt;
&lt;span class="c1"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;ResetScene&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;resetTimer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;lastHitter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// reset last hitter&lt;/span&gt;

    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;AgentsList&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// randomise starting positions and rotations&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;randomPosX&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Range&lt;/span&gt;&lt;span class="p"&gt;(-&lt;/span&gt;&lt;span class="m"&gt;2f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;randomPosZ&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Range&lt;/span&gt;&lt;span class="p"&gt;(-&lt;/span&gt;&lt;span class="m"&gt;2f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;randomPosY&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0.5f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3.75f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// depends on jump height&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;randomRot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Range&lt;/span&gt;&lt;span class="p"&gt;(-&lt;/span&gt;&lt;span class="m"&gt;45f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;45f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localPosition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;randomPosX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;randomPosY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;randomPosZ&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eulerAngles&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;randomRot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetComponent&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Rigidbody&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;().&lt;/span&gt;&lt;span class="n"&gt;velocity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// reset ball to starting conditions&lt;/span&gt;
    &lt;span class="nf"&gt;ResetBall&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
&lt;span class="c1"&gt;/// Reset ball spawn conditions&lt;/span&gt;
&lt;span class="c1"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;ResetBall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;randomPosX&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Range&lt;/span&gt;&lt;span class="p"&gt;(-&lt;/span&gt;&lt;span class="m"&gt;2f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;randomPosZ&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;6f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;randomPosY&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;6f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;8f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// alternate ball spawn side&lt;/span&gt;
    &lt;span class="c1"&gt;// -1 = spawn blue side, 1 = spawn purple side&lt;/span&gt;
    &lt;span class="n"&gt;ballSpawnSide&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;ballSpawnSide&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ballSpawnSide&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ball&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localPosition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;randomPosX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;randomPosY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;randomPosZ&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ballSpawnSide&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ball&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localPosition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;randomPosX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;randomPosY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;randomPosZ&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;ballRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;angularVelocity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zero&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;ballRb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;velocity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zero&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ResolveEvent()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This method will resolve the scenarios we defined earlier in &lt;code&gt;VolleyballController.cs&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We can use this method to assign rewards in different ways to encourage different types of behavior. In general, it's good practice to keep rewards within the range [-1, 1].&lt;/p&gt;

&lt;p&gt;To keep it simple, our goal for now is to train agents that can bounce the ball back and forth and keep the ball in play. We'll assign a reward of +1 each time an agent hits the ball over the net using the &lt;code&gt;AddReward(1f)&lt;/code&gt; method in the corresponding scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitIntoBlueArea&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lastHitter&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Purple&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddReward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitIntoPurpleArea&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lastHitter&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Blue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddReward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We won't assign any rewards for now if a goal is scored or the ball is hit out of bounds. If either of these scenarios happens, we'll just end the episode. Add the following code block to the sections indicated by the &lt;code&gt;// end episode&lt;/code&gt; comment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nf"&gt;ResetScene&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what &lt;code&gt;ResolveEvent&lt;/code&gt; should look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
&lt;span class="c1"&gt;/// Resolves scenarios when ball enters a trigger and assigns rewards&lt;/span&gt;
&lt;span class="c1"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;ResolveEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt; &lt;span class="n"&gt;triggerEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;triggerEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitOutOfBounds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lastHitter&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Blue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// apply penalty to blue agent&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lastHitter&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Purple&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// apply penalty to purple agent&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c1"&gt;// end episode&lt;/span&gt;
            &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nf"&gt;ResetScene&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitBlueGoal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;// blue wins&lt;/span&gt;

            &lt;span class="c1"&gt;// turn floor blue&lt;/span&gt;
            &lt;span class="nf"&gt;StartCoroutine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;GoalScoredSwapGroundMaterial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blueGoalMaterial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RenderersList&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="m"&gt;5f&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

            &lt;span class="c1"&gt;// end episode&lt;/span&gt;
            &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nf"&gt;ResetScene&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitPurpleGoal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;// purple wins&lt;/span&gt;

            &lt;span class="c1"&gt;// turn floor purple&lt;/span&gt;
            &lt;span class="nf"&gt;StartCoroutine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;GoalScoredSwapGroundMaterial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;volleyballSettings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;purpleGoalMaterial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RenderersList&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="m"&gt;5f&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

            &lt;span class="c1"&gt;// end episode&lt;/span&gt;
            &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EndEpisode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nf"&gt;ResetScene&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitIntoBlueArea&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lastHitter&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Purple&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="n"&gt;purpleAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddReward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HitIntoPurpleArea&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lastHitter&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Blue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="n"&gt;blueAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddReward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when you click Play ▶️ you should see the environment working correctly: the ball is affected by gravity, the agents can stand on the ground, and the episode resets when the ball hits the floor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrmuc0zfwavbxa7dzj6t.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrmuc0zfwavbxa7dzj6t.gif" title="Finished environment with physics" alt="finished-environment.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;You should now have a volleyball environment ready for our agents to train in. It will assign our agents rewards to encourage a certain type of behavior (volleying the ball back and forth).&lt;/p&gt;

&lt;p&gt;In the next section, we'll design our agents, giving them actions to choose from and a way to observe their environment.&lt;/p&gt;

&lt;p&gt;If you have any feedback or questions, please let me know!&lt;/p&gt;

</description>
      <category>unity3d</category>
      <category>tutorial</category>
      <category>machinelearning</category>
      <category>reinforcementlearning</category>
    </item>
    <item>
      <title>A hands-on introduction to deep reinforcement learning using Unity ML-Agents</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Thu, 26 Aug 2021 09:21:48 +0000</pubDate>
      <link>https://dev.to/joooyz/a-hands-on-introduction-to-deep-reinforcement-learning-using-unity-ml-agents-4f8i</link>
      <guid>https://dev.to/joooyz/a-hands-on-introduction-to-deep-reinforcement-learning-using-unity-ml-agents-4f8i</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;⚠ Note: this series is still a work in progress.&lt;/p&gt;

&lt;p&gt;This series is up to date with the latest ML-Agents Release 18.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Purpose
&lt;/h2&gt;

&lt;p&gt;There are plenty of great reinforcement learning (RL) courses out there. Just to name a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://deepmind.com/learning-resources/-introduction-reinforcement-learning-david-silver"&gt;Introduction to Reinforcement Learning by David Silver&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://spinningup.openai.com/en/latest/"&gt;Spinning Up by OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.coursera.org/specializations/reinforcement-learning"&gt;Reinforcement Learning Specialization by University of Alberta&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But if you're anything like me, you might prefer a 'learning by doing' approach. With hands-on experience upfront, it may be easier for you to grasp the theory and math behind the algorithms later.&lt;/p&gt;

&lt;p&gt;In this series, I'll walk you through how to use &lt;a href="https://unity.com/products/machine-learning-agents"&gt;Unity ML-Agents&lt;/a&gt; to build a volleyball environment and train agents to play in it using deep RL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--52lWUlSw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lr2zmstgz4f1ppihy8g5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--52lWUlSw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lr2zmstgz4f1ppihy8g5.gif" alt="Ultimate Volleyball" width="600" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ML-Agents?
&lt;/h2&gt;

&lt;p&gt;ML-Agents is an add-on for Unity (a game development platform). &lt;/p&gt;

&lt;p&gt;It lets us design a complex physics-rich environment without needing to build any of the physics simulation logic ourselves. It also lets us experiment with state-of-the-art RL algorithms without having to set up any boilerplate code or install additional libraries. The nice graphics and interface are a plus.&lt;/p&gt;

&lt;h2&gt;
  
  
  A (very brief) overview of reinforcement learning
&lt;/h2&gt;

&lt;p&gt;In a nutshell, think about how you might teach a dog a new trick, like telling it to sit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If it performs the trick correctly (it sits), you’ll &lt;strong&gt;reward&lt;/strong&gt; it with a treat (&lt;em&gt;positive feedback&lt;/em&gt;) ✔️&lt;/li&gt;
&lt;li&gt;If it doesn’t sit correctly, it doesn’t get a treat (&lt;em&gt;negative feedback&lt;/em&gt;) ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By continuing to do things that lead to positive outcomes, the dog will learn to sit when it hears the command in order to get its treat. Reinforcement learning is a subdomain of machine learning that involves training an ‘&lt;strong&gt;agent&lt;/strong&gt;’ (&lt;em&gt;the dog&lt;/em&gt;) to learn the correct sequences of &lt;strong&gt;actions&lt;/strong&gt; to take (&lt;em&gt;sitting&lt;/em&gt;) in its &lt;strong&gt;environment&lt;/strong&gt; (&lt;em&gt;in response to the command ‘sit’&lt;/em&gt;) in order to maximize its &lt;strong&gt;reward&lt;/strong&gt; (&lt;em&gt;getting a treat&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;This can be illustrated more formally as:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--okObq8DS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x1zs6o1jtkn5ccfxgnwr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--okObq8DS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x1zs6o1jtkn5ccfxgnwr.png" alt="Sutton and Barto" width="528" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="http://incompleteideas.net/book/bookdraft2017nov5.pdf"&gt;Sutton &amp;amp; Barto&lt;/a&gt;&lt;/p&gt;
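&lt;p&gt;The interaction loop in the diagram above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: &lt;code&gt;env&lt;/code&gt; and &lt;code&gt;agent&lt;/code&gt; are stand-ins for any environment and policy, not a real library API.&lt;/p&gt;

```python
# Minimal sketch of the agent-environment loop from Sutton & Barto.
# `env` and `agent` are hypothetical stand-ins, not a real library API.

def run_episode(env, agent, max_steps=100):
    state = env.reset()                         # initial observation S_0
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # agent picks A_t from S_t
        state, reward, done = env.step(action)  # env returns S_{t+1}, R_{t+1}
        total_reward += reward                  # accumulate the return
        if done:                                # episode over (e.g. ball hit floor)
            break
    return total_reward
```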

&lt;p&gt;For more on the theory, check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html"&gt;A (Long) Peek into Reinforcement Learning by Lilian Weng&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf"&gt;Reinforcement Learning: An Introduction by Sutton &amp;amp; Barto&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>unity3d</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>20+ Active machine learning and data science communities</title>
      <dc:creator>Joy</dc:creator>
      <pubDate>Wed, 18 Aug 2021 07:22:00 +0000</pubDate>
      <link>https://dev.to/joooyz/20-active-machine-learning-and-data-science-communities-21gk</link>
      <guid>https://dev.to/joooyz/20-active-machine-learning-and-data-science-communities-21gk</guid>
      <description>&lt;p&gt;Whether you're a beginner or veteran in machine learning and data science, you might be interested in a place to ask questions, share projects, or join discussions on the latest developments.&lt;/p&gt;

&lt;p&gt;There are many great communities out there for this, but it can be difficult to choose which one (and some may no longer be active or well-maintained). &lt;/p&gt;

&lt;p&gt;To help you, I've compiled an up-to-date list of 20+ active machine learning and data science communities grouped by platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. &lt;strong&gt;Reddit&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Reddit hosts many active forums dedicated to all areas of AI, machine learning, and data science.&lt;/p&gt;

&lt;p&gt;Here's a list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/MachineLearning/"&gt;r/machinelearning&lt;/a&gt; (2M+ members)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/datascience/"&gt;r/datascience&lt;/a&gt;  (500K+ members)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/learnmachinelearning/"&gt;r/learnmachinelearning&lt;/a&gt; (200K+ members)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/artificial/"&gt;r/artificial&lt;/a&gt; (145K+ members)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/deeplearning/"&gt;r/deeplearning&lt;/a&gt; (60K+ members)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/ArtificialInteligence/"&gt;r/artificialinteligence&lt;/a&gt; (50K+ members)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reddit.com/r/reinforcementlearning/"&gt;r/reinforcementlearning&lt;/a&gt; (20K+ members)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're just getting started, I recommend checking out &lt;a href="https://www.reddit.com/r/learnmachinelearning/"&gt;r/learnmachinelearning&lt;/a&gt;. It's a welcoming community for sharing beginner questions, projects, and resources (they also have a &lt;a href="https://discord.gg/G3rvFKF"&gt;Discord server&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;With over 2 million members, &lt;a href="https://www.reddit.com/r/MachineLearning/"&gt;r/machinelearning&lt;/a&gt; will likely be your go-to. It's more heavily moderated than the other subreddits, but you'll be sure to find all the latest important news, research papers, and discussions here (you might even bump into industry veterans like &lt;a href="https://twitter.com/hardmaru"&gt;@hardmaru&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Discord
&lt;/h2&gt;

&lt;p&gt;Discord is an instant messaging platform with private servers that anyone can join using an invite link. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/R8Bcbf4"&gt;r/learnmachinelearning&lt;/a&gt; (7K+ members): a complimentary server for the &lt;a href="https://www.reddit.com/r/learnmachinelearning/"&gt;subreddit&lt;/a&gt; community, with dedicated channels for sharing projects, asking questions, and studying popular MOOC courses together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.com/invite/learnaitogether"&gt;Learn AI together&lt;/a&gt; (16K+ members): the largest Discord community dedicated to AI with a &lt;strong&gt;heap&lt;/strong&gt; of great resources to check out. You'll find discussion topics for anything from memes to AGI here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.com/invite/pQFXHK4"&gt;Fundamentals of ML&lt;/a&gt; (2K+ members): dedicated to those particularly interested in the theory and math behind ML, but also for general ML discussion, projects, and questions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discordapp.com/invite/UYNaemm"&gt;Data Science&lt;/a&gt; (12K+ members): a community of data science professionals and enthusiasts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/7XWy7DW"&gt;The Data Share&lt;/a&gt; (6K+ members): a community-driven server moderated by part of team from &lt;a href="https://towardsdatascience.com/"&gt;Towards Data Science&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Facebook
&lt;/h2&gt;

&lt;p&gt;Facebook groups can be another way to meet others in the field. Here are some of the largest and most active groups:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.facebook.com/groups/machinelearningforum/"&gt;Data Mining / Machine Learning / Artificial Intelligence&lt;/a&gt; (130K+ members): an open group for discussing and sharing information across the general areas of data and AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.facebook.com/groups/1955664064497065/"&gt;Artificial Intelligence and Machine Learning&lt;/a&gt; (170K+ members): a private beginner-friendly group for people to share resources and learnings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.facebook.com/groups/199938307171587/"&gt;Global Artificial Intelligence, Machine Learning and Deep Learning&lt;/a&gt; (20K+ members): a private group for data scientists, investors, researchers, and corporates to discuss the latest in AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Other platforms
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://kaggle.com/"&gt;Kaggle&lt;/a&gt; is a well-known data science competition platform. It boasts a community of over 5 million users, where you can compete and share data sets and projects (in the form of notebooks).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/groups/4298680/"&gt;The Machine Learning and Data Science LinkedIn group&lt;/a&gt; is a community of professionals interested in the space. This includes engineers, data scientists, recruiters, business leaders, and more. It might be particularly worth checking out if you are looking to network or find a new role.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There are plenty of great communities out there to check out whether you're a beginner or an industry veteran. I'll be keeping this list up to date, so if there's something you think is missing, please let me know!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
