<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel</title>
    <description>The latest articles on DEV Community by Daniel (@jerec).</description>
    <link>https://dev.to/jerec</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F915016%2Fc08284ed-0267-4a95-8008-385317a0aeeb.png</url>
      <title>DEV Community: Daniel</title>
      <link>https://dev.to/jerec</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jerec"/>
    <language>en</language>
    <item>
      <title>AWS DeepRacer Student League Guide</title>
      <dc:creator>Daniel</dc:creator>
      <pubDate>Wed, 07 Feb 2024 14:30:47 +0000</pubDate>
      <link>https://dev.to/jerec/aws-deepracer-student-league-guide-2l38</link>
      <guid>https://dev.to/jerec/aws-deepracer-student-league-guide-2l38</guid>
      <description>&lt;p&gt;I've already written a &lt;a href="https://dev.to/jerec/comprehensive-starter-guide-to-aws-deepracer-1o9b"&gt;starter guide&lt;/a&gt; for AWS DeepRacer, that can be used both for Regular League and Student League that outlines some valuable information for getting started, but in this guide I will outline in detail the default configuration for Student League. I recommended you read the starter guide to learn about DeepRacer for Cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who is it for?
&lt;/h3&gt;

&lt;p&gt;Any student 16 or older, whether in high school, home school, or university. There are generally scholarship prizes, and the season is typically paired with a Udacity Scholarship for various Nanodegrees.&lt;/p&gt;

&lt;h3&gt;
  
  
  Log Analysis
&lt;/h3&gt;

&lt;p&gt;Currently, there isn't a way for Student League racers to do any log analysis; the only package they can download from the console is the car package used to load the model onto the car. None of those files give valuable insight into training performance.&lt;/p&gt;

&lt;h4&gt;
  
  
  Ways to Analyze the Performance
&lt;/h4&gt;

&lt;p&gt;Since you don't have log analysis and only have a limited number of training hours on the Student League console, you can do some mock testing by passing sample data sets into your reward function, for example in Excel. The only feasible way to robustly test your reward function is to use DeepRacer for Cloud on your own hardware. That way you can save your 10 hours until you are ready to use them to get the best performance possible. I'll outline the defaults that apply to both the AWS console and DRfC.&lt;/p&gt;
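&lt;p&gt;As a rough illustration of that mock-testing idea, here is a minimal Python sketch that calls a simple, made-up reward function with hand-picked sample parameter values; the numbers are illustrative, not real log data:&lt;/p&gt;

```python
# A minimal sketch of offline mock testing: call a simple (made-up)
# reward function with hand-picked sample parameter values.
def reward_function(params):
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    half_width = 0.5 * track_width
    if distance_from_center >= half_width:
        return 1e-3  # likely off track
    # Reward falls off linearly as the car drifts from the center line
    return float(1.0 - distance_from_center / half_width)

sample_params = {
    'track_width': 0.76,           # sample value, in meters
    'distance_from_center': 0.10,  # sample value, in meters
}
print(reward_function(sample_params))  # roughly 0.74
```

&lt;p&gt;Swap in your own reward function and a handful of sample dictionaries to confirm the rewards come out the way you expect before spending any console hours.&lt;/p&gt;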

&lt;h3&gt;
  
  
  Action Space
&lt;/h3&gt;

&lt;p&gt;Student League racers have no control over the action space within the console. It is strictly defined as a continuous action space, with a speed range of 0.5 to 1 and a steering angle of -30 to 30 degrees.&lt;/p&gt;

&lt;p&gt;In DRfC (see the &lt;a href="https://dev.to/jerec/comprehensive-starter-guide-to-aws-deepracer-1o9b"&gt;starter guide&lt;/a&gt;), this is defined as a JSON document like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;action_space&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;speed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;steering_angle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sensor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;FRONT_FACING_CAMERA&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;neural_network&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DEEP_CONVOLUTIONAL_NETWORK_SHALLOW&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;training_algorithm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;clipped_ppo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;action_space_type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;continuous&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;version&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;💡It is important to note that the speed passed to the reward function will never exceed the max speed in your action space. A lot of people get confused by this, partly because the training video displays an erroneous speed value. While the math behind it is sound, something is fundamentally throwing off the displayed output, which I believe is caused by timing issues in how the data is read, likely related to Python's Global Interpreter Lock (GIL).&lt;/p&gt;
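&lt;p&gt;To make that concrete, here is a small sketch (my own illustration, not AWS's actual simulator code) of why the speed seen by the reward function always stays inside the configured bounds in a continuous action space:&lt;/p&gt;

```python
import random

# Sketch (not AWS's actual code): in a continuous action space, the
# speed handed to the reward function is sampled inside the configured
# bounds; the Student League defaults are used here.
SPEED_LOW, SPEED_HIGH = 0.5, 1.0

def sample_speed():
    return random.uniform(SPEED_LOW, SPEED_HIGH)

for _ in range(1000):
    s = sample_speed()
    assert s >= SPEED_LOW and SPEED_HIGH >= s  # never exceeds the max
```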

&lt;h3&gt;
  
  
  Hyperparameters
&lt;/h3&gt;

&lt;p&gt;Again, this is another item the Student League has no control over, which adds another layer of difficulty: the discount factor is 0.999, meaning the reinforcement learning looks at the whole lap to calculate the future reward when making a decision, not just a little bit ahead. This is also defined by a JSON document within DRfC. These are the defaults the Student League is stuck with while training.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;batch_size&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;beta_entropy&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;discount_factor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;e_greedy_value&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;epsilon_steps&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;exploration_type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;categorical&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;loss_type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;huber&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lr&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0003&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;num_episodes_between_training&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;num_epochs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stack_size&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;term_cond_avg_score&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;350.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;term_cond_max_episodes&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sac_alpha&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Reward Function
&lt;/h3&gt;

&lt;p&gt;As I said in my starter guide, there are many different parameters you can pick from, and some are for object avoidance, which isn't needed for time trial races. Again, see my &lt;a href="https://dev.to/jerec/comprehensive-starter-guide-to-aws-deepracer-1o9b"&gt;starter guide&lt;/a&gt; for additional details on your reward function. Here, I'm instead going to outline some issues I've seen from other racers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The minimum and maximum reward cannot exceed 10k. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If it does, this will either cause a validation error or cause training to stop.&lt;/p&gt;
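&lt;p&gt;One defensive option (my own suggestion, not an official pattern) is to clamp the final value before returning it, so an unbounded formula can never cross that limit:&lt;/p&gt;

```python
MAX_REWARD = 10_000.0  # the 10k limit mentioned above

def clamp_reward(raw_reward):
    # Clamp the reward into [-MAX_REWARD, MAX_REWARD] so a runaway
    # formula can't trip the validator or stop training.
    return float(max(-MAX_REWARD, min(MAX_REWARD, raw_reward)))
```

&lt;p&gt;You would call clamp_reward on the final value just before the return statement in your reward function.&lt;/p&gt;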

&lt;ul&gt;
&lt;li&gt;Imports reporting that a module is not available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DeepRacer doesn't use the latest NumPy, so if you attempt to import or use a module that doesn't exist in that version of NumPy, it generally reports as a syntax error.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logic errors and syntax errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The validator only passes in a small sample set of data. Your carefully crafted reward function might not throw a validation error but later cause your training to stop; this is generally because the sample set never fully exercised your reward function. Logic errors can also cause unexpected behavior. Always test your reward function with sample parameters to see whether you get the expected reward.&lt;/p&gt;

&lt;p&gt;Additionally, to avoid some of these errors, it is best practice to initialize your variables before any logic statements. This avoids edge cases where you reference a variable that was never assigned.&lt;/p&gt;
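&lt;p&gt;A minimal sketch of that pattern: the reward gets a small default up front, so every code path returns something:&lt;/p&gt;

```python
def reward_function(params):
    # Initialize the reward before any logic so that no branch can
    # leave it unassigned (avoids "referenced before assignment").
    reward = 1e-3

    if params['all_wheels_on_track']:
        reward = 1.0

    return float(reward)
```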

&lt;ul&gt;
&lt;li&gt;Expecting speed to exceed the max speed of the action space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've said it above, and I'll say it again because I've had to explain this several times before it finally clicks: speed will never exceed the maximum speed you have set in your action space.&lt;/p&gt;

&lt;h3&gt;
  
  
  Udacity Scholarship
&lt;/h3&gt;

&lt;p&gt;Generally, there are a few steps for the Udacity Scholarship; one is completing a lap in under X minutes. You'll submit to the virtual circuit track, found on the left-hand side of the navigation panel, to meet this requirement. This step can be difficult to achieve depending on the length of the track.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deepracer</category>
      <category>learning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Comprehensive starter guide to AWS DeepRacer</title>
      <dc:creator>Daniel</dc:creator>
      <pubDate>Thu, 01 Feb 2024 19:13:50 +0000</pubDate>
      <link>https://dev.to/jerec/comprehensive-starter-guide-to-aws-deepracer-1o9b</link>
      <guid>https://dev.to/jerec/comprehensive-starter-guide-to-aws-deepracer-1o9b</guid>
      <description>&lt;h1&gt;
  
  
  Comprehensive starter guide to AWS DeepRacer
&lt;/h1&gt;

&lt;h3&gt;
  
  
  What is AWS DeepRacer?
&lt;/h3&gt;

&lt;p&gt;AWS DeepRacer is the fastest way to get started with machine learning and the basics of coding. Your current career path or age doesn't matter; the only thing that matters is your will to learn. So what is AWS DeepRacer? It is a 1/18th-scale autonomous RC racing league. The cars are 100% self-driving and use reinforcement learning to train themselves to get around the track. You are the developer, and you provide only minimal input.&lt;/p&gt;

&lt;p&gt;Normally machine learning is very complex and takes time to build, but AWS neatly packages it into a user-friendly experience. We’ll shortly delve into the three sections a developer controls: hyperparameters, the action space, and the reward function parameters. But as I said, this is a comprehensive guide.&lt;/p&gt;

&lt;p&gt;I’ll be referring to the AWS console and to DeepRacer for Cloud, which I'll shorten to DRfC. The AWS console refers to training directly on AWS-provided infrastructure, but since AWS DeepRacer is open source, a group of us took what AWS provided and packaged it into a system known as DRfC. This not only unlocks some additional features but also allows you to train on your own platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  Physical Racing vs Virtual Racing
&lt;/h3&gt;

&lt;p&gt;Virtual and physical racing are two completely different beasts, and many racers struggle with the changes between the two. In virtual racing, the track we train on is usually the same one we submit the model to, so we can get away with high speeds in the action space and with overfitting.&lt;/p&gt;

&lt;p&gt;Physical racing, however, is very sensitive to high speed and overfitting. High speed translates into putting a higher throttle on the car, which can show up as the vehicle stalling out on the track or simply not wanting to move without some assistance. A well-generalized model performs better in the real world.&lt;/p&gt;

&lt;p&gt;There are two common misconceptions about actual racing. First, the car is no longer learning; it is now exploiting the knowledge it has collected in the model. Second, the reward function has no role in the decision-making process. The only input the car relies on is the camera, which it combines with the sum of its experience, the weights embedded in the machine learning model, to make a probabilistic choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  DeepRacer League vs DeepRacer Student League
&lt;/h3&gt;

&lt;p&gt;The Student League is a pared-down AWS console that lets a younger generation learn machine learning at an early age. The season is usually paired with a chance at an AI Scholarship from Udacity, plus the usual prizes. You can still use a Student League model on a physical car and be competitive. Entry is reserved for people 16+. Since there are limitations within the Student League, I will note when a feature isn’t available.&lt;/p&gt;

&lt;p&gt;The DeepRacer League is the adult version without the training wheels. You have full control of all the features available to you, and there are prizes for performing well during the season, which can end with a trip to AWS re:Invent. Entry is restricted to 18+.&lt;/p&gt;

&lt;h3&gt;
  
  
  Policies
&lt;/h3&gt;

&lt;p&gt;I’m only going to talk about policies at a high level, but just like you don’t talk about Fight Club, we don’t talk about SAC. While SAC is an available option, it is very sensitive to hyperparameters, and (as far as we know) no one in the DeepRacer League has been successful with it.&lt;/p&gt;

&lt;p&gt;It is best to keep to PPO policy especially when starting your journey.&lt;/p&gt;

&lt;h3&gt;
  
  
  Convergence
&lt;/h3&gt;

&lt;p&gt;Convergence is the ultimate goal for any ML engineer training a model. This is the point where the car is no longer exploring its environment but instead exploiting its knowledge to get around it. On the charts, convergence usually shows up where training lap completion and evaluation lap completion meet. If they aren’t meeting, you might need to adjust your hyperparameters. Hyperparameters tend to trade off two outcomes that constantly fight each other: learning stability versus convergence. If they aren’t set right, the model might not learn enough to reach convergence, or might take a long time to get there.&lt;/p&gt;

&lt;p&gt;Another method for seeing whether a car is converging, which is easier within DRfC, is watching entropy. Entropy is simply how much uncertainty is left in the system. If the model is moving in the right direction, entropy will decrease over time; if it isn’t, you’ll see it increase. You can see this in the AWS console once you download the logs, but with DRfC we can watch it live. Entropy will be near 1 when the car starts doing laps and as low as 0.2-0.4 for a well-balanced reward function.&lt;/p&gt;
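&lt;p&gt;A crude way to check that trend, sketched in Python with made-up entropy values standing in for a real training log:&lt;/p&gt;

```python
# Made-up entropy readings from a hypothetical training run, one per
# iteration. A downward trend suggests the model is converging.
entropy_log = [1.0, 0.95, 0.9, 0.8, 0.7, 0.55, 0.45, 0.38, 0.33, 0.30]

early = sum(entropy_log[:3]) / 3   # average of the first iterations
late = sum(entropy_log[-3:]) / 3   # average of the latest iterations
converging = early - late > 0.2    # the threshold is an arbitrary choice
print(converging)  # True for this sample log
```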

&lt;h3&gt;
  
  
  Log Analysis
&lt;/h3&gt;

&lt;p&gt;Before I get into Hyperparameters that affect convergence, let's talk about how to visibly see convergence within the environment.&lt;/p&gt;

&lt;p&gt;Log analysis is key to success: it shows you how your model is performing, where you can make improvements, and whether the reward function is favoring the ultimate goal, which is to collect the most points possible. I’ll go into a little more detail on this later.&lt;/p&gt;

&lt;p&gt;The AWS DeepRacer community provides several tools for analyzing your model, and all you need are the training logs. I’m not going into depth on how to do this, but the repo for the main tool I use can be found here: &lt;a href="https://github.com/aws-deepracer-community/deepracer-analysis"&gt;https://github.com/aws-deepracer-community/deepracer-analysis&lt;/a&gt;. There are several others available, including one made by JPMC known as Log Guru. I’ll be writing a more in-depth guide to deepracer-analysis in the coming months.&lt;/p&gt;

&lt;p&gt;The Student League can’t do log analysis because they can’t download the logs, but they can in theory analyze the reward in one of two ways. First, create a sample run and pass the values into the reward function to see the outcomes.&lt;/p&gt;

&lt;p&gt;Second, set up DRfC to test the reward function before burning the 10 hours. Due to speed differences in hardware, you’ll need to approximate the AWS console within DRfC by looking at the number of iterations and matching them closely to get the expected performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hyperparameters
&lt;/h3&gt;

&lt;p&gt;💡Not Available in Student League&lt;/p&gt;

&lt;p&gt;Hyperparameters are settings that affect the training performance of the model. Sometimes these need to be tweaked to turn what might seem like a bad reward function into a good one. Generally, when starting out you should leave these at their defaults, except for the discount factor, which I would suggest setting to 0.99.&lt;/p&gt;
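&lt;p&gt;A quick back-of-the-envelope way to see why that suggestion matters: a common rule of thumb approximates the discount factor's effective look-ahead horizon as 1/(1-γ):&lt;/p&gt;

```python
# Rule-of-thumb "effective horizon": roughly how many future steps
# still contribute meaningfully to the discounted return.
def effective_horizon(gamma):
    return 1.0 / (1.0 - gamma)

print(effective_horizon(0.999))  # ~1000 steps: most of a lap
print(effective_horizon(0.99))   # ~100 steps: a much shorter look-ahead
```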

&lt;div class="table-wrapper-paragraph"&gt;&lt;table id="e00dcdfe-1ffd-4c1f-b80c-e622460f780c"&gt;
&lt;thead&gt;&lt;tr id="4eb0b08e-1d0e-4d6e-a868-8c05ca6b16fd"&gt;
&lt;th id="=kX?"&gt;Hyperparameters&lt;/th&gt;
&lt;th id="D]?s"&gt;Description&lt;/th&gt;
&lt;th id="Xc@M"&gt;Platform Availability&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr id="d748a687-b82e-4a33-a31d-d2abd6f5b06f"&gt;
&lt;th id="=kX?"&gt;Batch Size&lt;/th&gt;
&lt;td id="D]?s"&gt;the number of experiences that will be sampled at random used to update the model&lt;br&gt;&lt;br&gt;reducing can promote a more stable policy&lt;br&gt;
&lt;/td&gt;
&lt;td id="Xc@M"&gt;both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="dbdd8448-2d17-42a7-9d53-22d2e19cffdd"&gt;
&lt;th id="=kX?"&gt;Beta Entropy&lt;/th&gt;
&lt;td id="D]?s"&gt;it is a value of uncertainty that is added to the policy. The lower the number the less uncertain. Increasing this will promote exploration over exploitation at the cost of convergence.&lt;/td&gt;
&lt;td id="Xc@M"&gt;both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="2237bf2f-5d8b-4ffa-8998-4527ff16ea08"&gt;
&lt;th id="=kX?"&gt;Epochs&lt;/th&gt;
&lt;td id="D]?s"&gt;The number of iterations through the training data to update the model. &lt;br&gt;&lt;br&gt;this can also affect learning, the trade-off is lower value higher stability at the cost of convergence&lt;br&gt;
&lt;/td&gt;
&lt;td id="Xc@M"&gt;both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="ba8b2a99-85b6-43e1-bccb-50ba0c528f60"&gt;
&lt;th id="=kX?"&gt;Learning Rate&lt;/th&gt;
&lt;td id="D]?s"&gt;Control how much Gradient descent updates the weights.&lt;br&gt;&lt;br&gt;decreasing can increase stability but at the cost of time.&lt;br&gt;
&lt;/td&gt;
&lt;td id="Xc@M"&gt;both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="1f3653cf-e917-4477-9937-51512f251404"&gt;
&lt;th id="=kX?"&gt;Discount factor&lt;/th&gt;
&lt;td id="D]?s"&gt;How far in the future the car looks ahead to calculate the possible rewards it collects. &lt;br&gt;&lt;br&gt;If the environment is noisy decreasing this will help stability to focus on more short-term reward. More later&lt;br&gt;
&lt;/td&gt;
&lt;td id="Xc@M"&gt;Both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="99f50977-f641-4f72-81ef-fbd2831e18c8"&gt;
&lt;th id="=kX?"&gt;Exploration Type&lt;/th&gt;
&lt;td id="D]?s"&gt;Two choices Categorical which is default, and e-greedy. &lt;br&gt;&lt;br&gt;With epsilon-greedy, you choose a value for ε (epsilon), typically between 0 and 1. The agent will explore (choose a random action) with probability ε and exploit (choose the best-known action) with probability (1 - ε). DRfC only&lt;br&gt;&lt;br&gt;In categorical exploration, you model the exploration as a probability distribution over actions. Instead of having a single ε value, you define a probability distribution over actions, where each action has a probability associated with it. This distribution can be learned and updated as the agent interacts with the environment. AWS Console and DRfC&lt;br&gt;
&lt;/td&gt;
&lt;td id="Xc@M"&gt;DRfC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="6acabed2-0118-4d73-aa34-cddec12b3d51"&gt;
&lt;th id="=kX?"&gt;e greedy&lt;/th&gt;
&lt;td id="D]?s"&gt;Promotes the difference between exploration and exploration when using e-greedy exploration type&lt;/td&gt;
&lt;td id="Xc@M"&gt;DRfC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="3a8029ee-b1af-48d9-9a5d-311da369c55a"&gt;
&lt;th id="=kX?"&gt;&lt;strong&gt;epsilon steps&lt;/strong&gt;&lt;/th&gt;
&lt;td id="D]?s"&gt;(Unconfirmed) I believe this is part of the PPO Clipping that controls how much a policy can deviate once a certain number of steps is hit during training. Higher promotes more exploration&lt;/td&gt;
&lt;td id="Xc@M"&gt;DRfC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="0b7ee5ff-759a-4bfc-b5a4-41915545f230"&gt;
&lt;th id="=kX?"&gt;Loss type&lt;/th&gt;
&lt;td id="D]?s"&gt;the objective function used to update the network’s weights. &lt;br&gt;&lt;br&gt;values Huber, and Mean squared error&lt;br&gt;
&lt;/td&gt;
&lt;td id="Xc@M"&gt;Both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr id="66397ffc-98fb-4ea5-ad8b-8aefd805289e"&gt;
&lt;th id="=kX?"&gt;Number of Episodes&lt;/th&gt;
&lt;td id="D]?s"&gt;The number of sessions used to collect data before the policy update.&lt;br&gt;&lt;br&gt;For more complex problems increasing can lead to a stable model&lt;br&gt;
&lt;/td&gt;
&lt;td id="Xc@M"&gt;Both&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Action Space
&lt;/h3&gt;

&lt;p&gt;💡Not available in the Student League, which is fixed to a continuous action space with speed from 0.5 to 1 and steering from -30 to 30.&lt;/p&gt;

&lt;p&gt;As a developer, you will think about how you want your vehicle to behave and craft a reward function to achieve the desired results, similar to how you would train an animal to perform a trick by giving it a treat every time it performs the desired action. Currently, there are two different choices available.&lt;/p&gt;

&lt;p&gt;Discrete is a list of fixed speeds paired with steering angles, and DeepRacer selects from this list. How it selects is affected by the exploration type, which can only be changed in DRfC.&lt;/p&gt;

&lt;p&gt;With continuous, you only set your minimum and maximum steering angle and speed. The car selects a random float between those values, so during training it may never pick the same value twice.&lt;/p&gt;
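&lt;p&gt;Here is an illustrative sketch (my own, not the simulator's code) of the difference between the two action-space styles; the speeds and angles are example values:&lt;/p&gt;

```python
import random

# Discrete: the agent picks from a fixed list of (speed, steering) pairs.
discrete_actions = [
    {'speed': 1.0, 'steering_angle': -30},
    {'speed': 1.5, 'steering_angle': 0},
    {'speed': 1.0, 'steering_angle': 30},
]
discrete_choice = random.choice(discrete_actions)

# Continuous: the agent samples floats between the configured bounds,
# so the exact same action may never be chosen twice.
continuous_choice = {
    'speed': random.uniform(0.5, 4.0),
    'steering_angle': random.uniform(-30.0, 30.0),
}
```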

&lt;h3&gt;
  
  
  Reward Function Parameters
&lt;/h3&gt;

&lt;p&gt;Parameters are all the different values that get passed into the reward function during training. This happens every 1/15th of a second: the actions picked during that time are passed to the reward function to calculate the reward. There is a wide range of parameters, and some are only used for object avoidance. You’ll need to think carefully about which values suit your end goals to weave into your reward function. Incidentally, that 1/15th of a second is what the parameters call a step.&lt;/p&gt;
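&lt;p&gt;That step rate gives you a quick way to estimate how many times the reward function runs per lap; the lap time below is a made-up example:&lt;/p&gt;

```python
# At 15 reward-function calls (steps) per second, estimate the number
# of steps in a lap. The lap time is an illustrative example value.
STEPS_PER_SECOND = 15
lap_time_seconds = 12.0
steps_per_lap = STEPS_PER_SECOND * lap_time_seconds
print(steps_per_lap)  # 180.0
```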

&lt;p&gt;🏎️ Note: speed is a hard value based on the action space. It is best to think of speed as throttle rather than the actual velocity of the car’s current state.&lt;/p&gt;

&lt;p&gt;A comprehensive list can be found here: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/deepracer/latest/developerguide/deepracer-reward-function-input.html"&gt;Input parameters of the AWS DeepRacer reward function - AWS DeepRacer&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Reward Function
&lt;/h3&gt;

&lt;p&gt;Crafting a reward function can get pretty overwhelming with all the parameters available. Typically, people take one of two approaches when they first start: they either do reward = speed in hopes of generating a fast model, or they make the reward function overly complex. Complexity is fine if it uses only a few parameters; some of my best models have used at most two parameters.&lt;/p&gt;

&lt;p&gt;Use reward shaping where you can in your reward function. What is reward shaping? It is where you still give a reward, even a small one, when the action is off from what you desire. For example, below is the AWS-provided center line function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reward_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="dl"&gt;'''&lt;/span&gt;&lt;span class="s1"&gt;
    Example of rewarding the agent to follow center line
    &lt;/span&gt;&lt;span class="dl"&gt;'''&lt;/span&gt;

    &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;Read&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="nx"&gt;parameters&lt;/span&gt;
    &lt;span class="nx"&gt;track_width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;track_width&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;distance_from_center&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;distance_from_center&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;Calculate&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="nx"&gt;markers&lt;/span&gt; &lt;span class="nx"&gt;that&lt;/span&gt; &lt;span class="nx"&gt;are&lt;/span&gt; &lt;span class="nx"&gt;increasingly&lt;/span&gt; &lt;span class="nx"&gt;further&lt;/span&gt; &lt;span class="nx"&gt;away&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;center&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;
    &lt;span class="nx"&gt;marker_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;track_width&lt;/span&gt;
    &lt;span class="nx"&gt;marker_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;track_width&lt;/span&gt;
    &lt;span class="nx"&gt;marker_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;track_width&lt;/span&gt;

    &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;Give&lt;/span&gt; &lt;span class="nx"&gt;higher&lt;/span&gt; &lt;span class="nx"&gt;reward&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;car&lt;/span&gt; &lt;span class="nx"&gt;is&lt;/span&gt; &lt;span class="nx"&gt;closer&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;center&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="nx"&gt;and&lt;/span&gt; &lt;span class="nx"&gt;vice&lt;/span&gt; &lt;span class="nx"&gt;versa&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nx"&gt;distance_from_center&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;marker_1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nx"&gt;reward&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="nx"&gt;elif&lt;/span&gt; &lt;span class="nx"&gt;distance_from_center&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;marker_2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nx"&gt;reward&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;
    &lt;span class="nx"&gt;elif&lt;/span&gt; &lt;span class="nx"&gt;distance_from_center&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;marker_3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nx"&gt;reward&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nx"&gt;reward&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;  &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;likely&lt;/span&gt; &lt;span class="nx"&gt;crashed&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;close&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;off&lt;/span&gt; &lt;span class="nx"&gt;track&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;reward&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this reward shaping in mind, the center line function can be simplified to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reward_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="dl"&gt;'''&lt;/span&gt;&lt;span class="s1"&gt;
    Example of rewarding the agent to follow center line
    &lt;/span&gt;&lt;span class="dl"&gt;'''&lt;/span&gt;

    &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;Read&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="nx"&gt;parameters&lt;/span&gt;
    &lt;span class="nx"&gt;distance_from_center&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;distance_from_center&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nx"&gt;reward&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;distance_from_center&lt;/span&gt;



    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;reward&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this simplification, the car has some freedom to deviate from the center line and still receive a decent reward, but the further away it gets, the lower the reward.&lt;/p&gt;
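&lt;p&gt;One caveat with the simplified version: &lt;code&gt;1 - distance_from_center&lt;/code&gt; works in raw meters, so on a wide track the reward can shrink toward zero or even go negative. A sketch of a variant normalized by track width (my own illustration, not an official sample function) keeps the reward between 0 and 1 on any track:&lt;/p&gt;

```python
# Sketch of a normalized center-line reward (illustrative, not the
# official AWS sample): 1.0 at the center line, near 0 at the track edge.
def reward_function(params):
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']

    # Half the track width is the distance from the center line to the edge.
    reward = 1 - (distance_from_center / (track_width / 2))
    return float(max(reward, 1e-3))  # small floor so the reward stays positive
```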

&lt;p&gt;Now that we've talked about simplifying and reward shaping, let's address the elephant in the room: using reward = speed, or even reward = speed + other parameters, is just plain bad for you.&lt;/p&gt;

&lt;p&gt;Earlier I said 1 step is 1/15th of a second; this value is key for the concept I’m about to show you. First, you need to understand that the car’s ultimate goal is to earn the highest total reward possible. This means that if you don’t do log analysis on your reward function, the car might learn bad behavior in pursuit of that goal. Let’s take a car that has two speed values, 0.5 m/s and 1 m/s, on a 30-meter track.&lt;/p&gt;

&lt;p&gt;💡 This is an oversimplification of the whole process, but it helps you understand why things can happen that you do not expect.&lt;/p&gt;

&lt;p&gt;We’ll assume constant speed for these two scenarios.&lt;/p&gt;

&lt;p&gt;30 meters / 1 m/s = 30 seconds&lt;/p&gt;

&lt;p&gt;30 meters / 0.5 m/s = 60 seconds&lt;/p&gt;

&lt;p&gt;Now we know the time it would take for each speed to get around the track, but we also need to account for the number of steps in one second, which is 15.&lt;/p&gt;

&lt;p&gt;30 seconds * 15 = 450 steps&lt;/p&gt;

&lt;p&gt;60 seconds * 15 = 900 steps&lt;/p&gt;

&lt;p&gt;Now we multiply the step count by speed, because that was the reward: reward = speed.&lt;/p&gt;

&lt;p&gt;450 steps * 1 reward = 450 points&lt;/p&gt;

&lt;p&gt;900 steps * 0.5 reward = 450 points&lt;/p&gt;

&lt;p&gt;So now you see that if you just use the speed parameter, you end up with the same number of points either way. In a real scenario this might end up with a car speeding up, slowing down, or just settling on a slow speed if you pair speed with another parameter. For example, with speed + distance from the center line, if the distance-from-center reward is always 1, your new formula becomes steps * speed reward + steps * distance reward. You’ve quickly ruined your speed. &lt;/p&gt;
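&lt;p&gt;The arithmetic above can be checked with a few lines (a back-of-the-envelope sketch assuming constant speed and 15 steps per second):&lt;/p&gt;

```python
# Back-of-the-envelope check of the totals above: with reward = speed,
# total reward per lap comes out the same at any constant speed.
TRACK_LENGTH_M = 30
STEPS_PER_SECOND = 15

def total_reward(speed_mps):
    lap_time_s = TRACK_LENGTH_M / speed_mps   # 30 s at 1 m/s, 60 s at 0.5 m/s
    steps = lap_time_s * STEPS_PER_SECOND     # 450 steps vs 900 steps
    return steps * speed_mps                  # per-step reward is the speed

print(total_reward(1.0))   # 450.0
print(total_reward(0.5))   # 450.0
```

&lt;p&gt;The speed cancels out of the total, which is exactly why the car has no incentive to go faster.&lt;/p&gt;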

&lt;p&gt;Another issue that can occur during training is snaking, or zig-zagging. This is typically because the car has learned it can collect a higher total reward by maximizing its time on track. If you aren’t in the Student League, you can generally fix some of these issues by adjusting the discount factor. The discount factor can make a seemingly bad reward function behave better by limiting how many future steps the car looks at to make its decisions.&lt;/p&gt;
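&lt;p&gt;As a rough illustration (my own sketch, not the exact DeepRacer internals): the discount factor gamma weights the rewards the car expects at future steps, so a lower gamma makes the car care mostly about the next few steps rather than a long zig-zag payoff:&lt;/p&gt;

```python
# Illustrative sketch of a discounted return: G = sum(gamma**t * r_t).
# A lower gamma shrinks the influence of far-future steps on each decision.
def discounted_return(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

rewards = [1.0] * 10                      # ten steps of reward 1
print(discounted_return(rewards, 0.999))  # nearly the full 10.0
print(discounted_return(rewards, 0.5))    # about 2.0: only early steps matter
```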

</description>
      <category>machinelearning</category>
      <category>deepracer</category>
      <category>learning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Build your own AI Image Generation</title>
      <dc:creator>Daniel</dc:creator>
      <pubDate>Thu, 25 Aug 2022 16:16:00 +0000</pubDate>
      <link>https://dev.to/jerec/build-your-own-ai-image-generation-54a</link>
      <guid>https://dev.to/jerec/build-your-own-ai-image-generation-54a</guid>
      <description>&lt;h2&gt;
  
  
  Impact
&lt;/h2&gt;

&lt;p&gt;The latest trend on the internet is easy access to image generators like Midjourney, DALL-E 2, or Stable Diffusion. A few of these sit behind paywalls. I'll show you how, with a few lines of code, you can get started running your own in an AWS Sagemaker Notebook instance!&lt;/p&gt;

&lt;p&gt;Thanks to the public release of Stable Diffusion, you can download and run the AI model that generates the images yourself. Sometimes the results can be horrifying, and other times just elegant. The examples provided below were simply "realistic pikachu" and "old man with a hat". Results will be different each time the prompt is run!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98nf8sxz7x8grju8k6h3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98nf8sxz7x8grju8k6h3.png" alt="AI Generated Image with prompt realistic Pikachu"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7fedxlrrqdmqy322or7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7fedxlrrqdmqy322or7.png" alt="AI Generated Image with old man with a hat"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is hard to believe this person isn't real. However, do these image generators have practical applications other than artwork? Of course: they'll have an impact on the design industry, where with a few simple prompts someone can get an inspiring new design, such as a piece of furniture! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao352dw1vfru9iweqtnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao352dw1vfru9iweqtnl.png" alt="Generated with prompt modern desk"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;1. Sign up for an account on &lt;a href="https://huggingface.co/CompVis/stable-diffusion-v1-4" rel="noopener noreferrer"&gt;Huggingface&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accept Terms of Services.&lt;/li&gt;
&lt;li&gt; Generate &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;token&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Launching the notebook
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;WARNING&lt;/strong&gt;: Sagemaker Notebook costs money to run! Make sure you clean up the instance as soon as you are done in order to avoid costs! Proceed at your own risk!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Login to AWS Console.&lt;/li&gt;
&lt;li&gt;Search for Amazon Sagemaker, and select it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5kmp97ggc8jyqifo877.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5kmp97ggc8jyqifo877.PNG" alt="Amazon Sagemaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on Notebook Instance from left hand menu&lt;/li&gt;
&lt;li&gt;Click Create Instance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8pknj3bohu7jdt8jr7e.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8pknj3bohu7jdt8jr7e.PNG" alt="Create Instance"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select a name for the instance. The instance type MUST be an accelerated computing instance! Select ml.p2.xlarge or better.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoudc9g0v9xn8sfw2p0r.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoudc9g0v9xn8sfw2p0r.PNG" alt="Instance settings"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Leave permissions and encryption at their defaults&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open up network settings and set the instance into the default VPC; it will need internet access. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faj4bggf688xf1y33zdzh.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faj4bggf688xf1y33zdzh.PNG" alt="Network Settings"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Launch Instance!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Configuring the notebook
&lt;/h2&gt;

&lt;p&gt;If you've followed along so far, it will take a bit for the instance to be prepared. Once it's up, we need to make some changes to the instance and add some packages; we'll work inside an existing conda environment.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Once the instance is ready, select Open Jupyter Lab.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6xcyumeq3kcbrm4uck6.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6xcyumeq3kcbrm4uck6.PNG" alt="Open Jupyter lab"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Select Terminal&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Activate the Conda Environment&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source /home/ec2-user/anaconda3/etc/profile.d/conda.sh
conda activate pytorch_38
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Install required packages for this to work.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install diffusers==0.2.3 transformers scipy
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure huggingface-cli login
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;huggingface-cli login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will prompt you for the &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;token&lt;/a&gt; from earlier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output if successful&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;code&gt;Login successful&lt;br&gt;
Your token has been saved to /home/ec2-user/.huggingface/token&lt;br&gt;
Authenticated through git-credential store but this isn't the helper defined on your machine.&lt;br&gt;
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code!
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We're done with the configuration! Select File -&amp;gt; New -&amp;gt; New Notebook.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy and paste the following code into the cell&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler


lms = LMSDiscreteScheduler(
    beta_start=0.00085, 
    beta_end=0.012, 
    beta_schedule="scaled_linear"
)

# this will substitute the default PNDM scheduler for K-LMS  
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    scheduler=lms,
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]  

image.save("astronaut_rides_horse.png")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Click Run! &lt;strong&gt;NOTE:&lt;/strong&gt; the first run will take a bit as it downloads the AI model. The final output should be a beautiful photo and should look different from mine! &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foq6c1gxu9204j433frms.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foq6c1gxu9204j433frms.PNG" alt="Final Results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You can use this to generate many different types of images by editing the prompt and file name; the results will almost never be the same. This is just a basic framework, and a much more complex system could be built to generate images from a web request: easily launched on an EC2 instance with a GPU attached, generating images and pushing them to S3. &lt;/p&gt;
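&lt;p&gt;As a small sketch of that parameterization (a hypothetical helper of my own, not part of the notebook above), you could derive the output file name from the prompt so each prompt in a batch saves to its own image:&lt;/p&gt;

```python
import re

# Hypothetical helper: turn a free-text prompt into a safe file name,
# so a batch of prompts can each save to their own image file.
def filename_for(prompt, ext="png"):
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{slug}.{ext}"

for p in ["realistic pikachu", "old man with a hat", "modern desk"]:
    print(filename_for(p))
# realistic_pikachu.png
# old_man_with_a_hat.png
# modern_desk.png
```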

&lt;p&gt;&lt;strong&gt;WARNING&lt;/strong&gt; DO NOT FORGET TO TERMINATE NOTEBOOK TO STOP CHARGES&lt;/p&gt;

&lt;h2&gt;
  
  
  Source:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/huggingface/diffusers/releases/tag/v0.2.3" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
