I trained a Robot Arm: What I failed to learn.

First, there is so much to learn.

Understanding foundational ML concepts and having AI-accelerated workflows don't mean you can just jump in without going through the hard curve of learning. I learned that the expensive way.

Reinforcement Learning (RL) is distinct from other ML fields. Even though they share boundaries, RL has concepts that even hardcore ML engineers won't grasp immediately.

My first mistake was trying to skip steps. I was ambitious; that was glaring. I wanted results ASAP (my self-destructive habit of posing to the world). I was way too focused on seeing it work without applying my intuition to the hard details.

After completing my first RL project with AI's help, I could feel it in my gut: I had learned nothing, or at least too little to justify my acclaimed achievement. That's when I went back to relearn. It took time, but it was rewarding.

Now that you've heard the story of my life, let's get technical.

The Setup

The robot arm is the Pusher from Gymnasium, with 7 degrees of freedom (DOF). It's a multi-jointed robot arm similar to a human arm, with shoulder, elbow, forearm, and wrist joints. The goal is to move a target cylinder (the object) to a goal position using the robot's end effector (the fingertip). (Source: the Gymnasium documentation.)

In RL environments, we usually deal with discrete or continuous action spaces. Discrete action spaces have a finite set of actions and are usually easier to learn, even with low compute. Continuous action spaces offer an effectively infinite set of possible actions: gradients can explode, the agent can get stuck in a local minimum or fail to learn at all, and training takes a lot of time.
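
A quick way to feel that difference is to print the action spaces directly in Gymnasium. This is a minimal sketch, assuming gymnasium with the MuJoCo extras installed; I'm using Pusher-v5 here, and CartPole is just an illustrative discrete counterpart:

```python
import gymnasium as gym

# Pusher (7-DOF arm): continuous control, one torque per joint
pusher = gym.make("Pusher-v5")
print(pusher.action_space)
# e.g. Box(-2.0, 2.0, (7,), float32) -> effectively infinite action choices

# CartPole: discrete control, only two possible actions (push left / push right)
cartpole = gym.make("CartPole-v1")
print(cartpole.action_space)
# Discrete(2)
```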

Our robot is a continuous action space environment. Our battle is long, and the environment is as unpredictable as a raging sea.

A cool intuition here would be:

Since discrete action spaces are easier and faster to learn, what if we discretise the action space so we can use algorithms that work well in discrete settings, like DQN? The problem is that the number of actions grows exponentially with the degrees of freedom (source: the DDPG paper). Hence, the curse of dimensionality.
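
To make that blow-up concrete, here's a rough back-of-the-envelope sketch (the bin counts are just illustrative): with 7 joints and k bins per joint, a DQN-style agent would face k^7 distinct actions.

```python
# Rough illustration of the curse of dimensionality: discretising each of the
# Pusher's 7 joint torques into k bins gives k**7 distinct actions for a
# DQN-style agent to enumerate.
dof = 7
for bins in (3, 5, 11):
    print(f"{bins} bins per joint -> {bins ** dof:,} discrete actions")

# 3 bins per joint -> 2,187 discrete actions
# 5 bins per joint -> 78,125 discrete actions
# 11 bins per joint -> 19,487,171 discrete actions
```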

Pusher robot performance at 20k episodes (~2 million timesteps)

The Training Journey

I initially tried SAC (Soft Actor-Critic). After around 2 million timesteps, it had failed to learn anything significant. That was around 14 hours of training on my CPU. Along the way I introduced tricks: checkpointing (suggested by Mave), interval video recording, low-end optimisations, and free GPU runs on Google Colab.
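
For context, a setup like that can be sketched roughly with Stable-Baselines3. The library choice, hyperparameters, save frequency, and paths below are my assumptions for illustration, not an exact record of my run:

```python
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import CheckpointCallback

env = gym.make("Pusher-v5")

# Save a checkpoint every 50k steps so a long CPU / Colab run can be resumed
checkpoint_cb = CheckpointCallback(
    save_freq=50_000,
    save_path="./checkpoints/",
    name_prefix="sac_pusher",
)

model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=2_000_000, callback=checkpoint_cb)
model.save("sac_pusher_final")
```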

I made significant progress on the workflow, but it was hard to get anything encouraging from the agent at that stage. The reward curve was signaling that it wasn't learning, so I had to terminate the run.

Best model at 2 million timesteps. It's obvious it ain't learning. 😓

But after I dug further, I saw that I could improve my results by trying a different approach: HER (Hindsight Experience Replay). HER is a replay technique used with off-policy algorithms, designed for applications where admissible behaviors aren't necessarily known. Previous approaches required careful reward shaping and in-depth domain knowledge. HER sidesteps this by relabelling the goals in failed episodes as if they were the intended ones, so the agent can learn from sparse, unshaped reward signals.
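
In Stable-Baselines3, HER comes as a replay buffer that you plug into an off-policy algorithm like SAC. It needs a goal-conditioned environment (dict observations with achieved_goal and desired_goal), which the standard Pusher isn't out of the box, so this sketch uses FetchPush-v2 from gymnasium-robotics as a stand-in; the environment name and hyperparameters are my assumptions:

```python
import gymnasium as gym
import gymnasium_robotics  # registers the goal-conditioned Fetch environments
from stable_baselines3 import SAC, HerReplayBuffer

# A goal-conditioned stand-in for the pushing task: dict observations with
# "observation", "achieved_goal", and "desired_goal" keys, plus a sparse reward.
env = gym.make("FetchPush-v2")

model = SAC(
    "MultiInputPolicy",  # needed for dict observation spaces
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,                  # relabel 4 virtual goals per transition
        goal_selection_strategy="future",  # pick new goals from later in the episode
    ),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```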

What's Next?

Just like Andrew Ng wrote in "The Batch" (huge fan😊):

"The single biggest predictor of how rapidly a team makes progress building an AI agent lies in their ability to drive a disciplined process for evals (measuring the system's performance) and error analysis (identifying the causes of errors). It's tempting to shortcut these processes and quickly attempt fixes to mistakes rather than slowing down to identify root causes. But evals and error analysis can lead to much faster progress."

He also emphasized that without understanding how computers work, you can't just "vibe code" your way to greatness. Fundamentals are essential.

The ability to understand concepts and apply them is worth more than producing results without adequate understanding. That's exactly what I was doing: shortcutting the process, chasing results without grasping the fundamentals.

So, what will I do differently? I'll check out the original algorithm papers before starting implementation. I'll digest first. I'll also improve my RL algorithm debugging skills - because understanding why something fails is just as important as making it work.

Stay tuned for my learning updates.

Till then,

Keep Learning,
Samuel Ibiyemi
