Dipti M

A Step-by-Step Guide to Reinforcement Learning in R

Machine learning algorithms are generally grouped into three categories:

  • Supervised learning algorithms – for classification and regression tasks.
  • Unsupervised learning algorithms – for clustering and pattern discovery.
  • Reinforcement learning algorithms – for learning by interaction and feedback.

We have already discussed supervised and unsupervised learning in earlier articles. In this article, we turn to the third category: reinforcement learning (RL). Unlike the other two, reinforcement learning is not about passively learning from static datasets. Instead, it is about actively exploring and learning from an environment, using trial-and-error, rewards, and penalties.

Let’s dive in step by step.

Table of Contents

  • Reinforcement learning in real life
  • Typical reinforcement process
  • Reinforcement learning workflow
  • Divide and rule: breaking down RL
  • Implementing reinforcement learning in R
  • Example 1: MDPtoolbox
  • Example 2: ReinforcementLearning package
  • Real-world applications of RL
  • Conclusion

Reinforcement Learning in Real Life

Consider how students learn in schools. A teacher introduces a concept, solves a few examples, and then expects students to practice similar problems independently. Students get feedback—sometimes correct, sometimes wrong—and adjust their approach over time.

This trial-and-error cycle is exactly what reinforcement learning embodies. The “agent” (learner) interacts with an “environment,” makes decisions (actions), receives “feedback” (rewards or penalties), and gradually learns the best strategy (policy).

Typical Reinforcement Process

In RL, the machine acts as the student. It:

  • Takes an action in a given state.
  • Observes the result (new state + reward).
  • Updates its knowledge to maximize long-term rewards.

Unlike supervised learning, RL doesn’t require labeled training data. Instead, it learns from experience. This makes RL especially useful where predefined datasets are unavailable—like game-playing, robotics, navigation, and adaptive decision-making.
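
To make the loop concrete, here is a minimal, self-contained Q-learning sketch in base R. The step_env function, the state and action sets, and all parameter values are invented stand-ins for a real environment, not taken from any package:

states  <- 1:4
actions <- 1:4

# Hypothetical environment: random transitions, state 4 is the goal
# and pays +10, every other move costs -1. Purely illustrative.
step_env <- function(s, a) {
  s_new <- sample(states, 1)
  r <- if (s_new == 4) 10 else -1
  list(next_state = s_new, reward = r)
}

Q <- matrix(0, nrow = length(states), ncol = length(actions))
alpha   <- 0.1  # learning rate
gamma   <- 0.9  # discount factor
epsilon <- 0.1  # exploration probability

s <- 1
for (i in 1:1000) {
  # Epsilon-greedy: mostly exploit the best known action, sometimes explore
  a <- if (runif(1) < epsilon) sample(actions, 1) else which.max(Q[s, ])
  out <- step_env(s, a)
  # Q-learning update: nudge Q[s, a] toward reward + discounted best future value
  Q[s, a] <- Q[s, a] + alpha * (out$reward + gamma * max(Q[out$next_state, ]) - Q[s, a])
  s <- out$next_state
}
Q  # learned action values per state

The rest of the article introduces the formal vocabulary behind this loop.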

Reinforcement Learning Workflow

At the core of RL lies the Markov Decision Process (MDP). It has five elements:

  • States (S): Possible situations the agent can be in.
  • Actions (A): Choices available in each state.
  • Rewards (R): Feedback for each action taken.
  • Policy (π): A strategy that defines which action to take in each state.
  • Value (V): Expected cumulative reward from a given state under a policy.
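
For reference, the value in the last bullet has a standard definition: it is the expected discounted return when starting in state s and following policy π, with discount factor γ between 0 and 1:

V^π(s) = E_π[ R₁ + γ·R₂ + γ²·R₃ + … | S₀ = s ]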

The goal: learn an optimal policy π* that maximizes cumulative reward over time.

Divide and Rule: Breaking Down RL

To make RL manageable, we define:

  • Policies: The rules guiding agent actions.
  • Rewards/Penalties: Metrics to evaluate performance.
  • Training limits: How long the agent should explore.

For example, consider a simple 3×3 grid navigation problem. The agent must go from “Start” to “Exit” while avoiding a “Pit.” Each step has a small penalty (to encourage shorter paths), falling into the pit has a large penalty, and reaching the exit gives a positive reward. Over iterations, the agent learns the best route.
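
Before reaching for a package, it helps to see how such a grid could be encoded. The sketch below is one possible layout; the positions of the Pit and Exit and the exact reward values are arbitrary illustrative choices:

# One possible reward layout for the 3x3 grid (values are illustrative)
grid_rewards <- matrix(-1, nrow = 3, ncol = 3)  # small penalty per step
grid_rewards[2, 2] <- -10                       # the Pit: large penalty
grid_rewards[3, 3] <- 10                        # the Exit: positive reward
grid_rewards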

Implementing Reinforcement Learning in R

R offers multiple ways to implement RL. Two popular options are:

  • MDPtoolbox package – good for small Markov decision process problems.
  • ReinforcementLearning package (from GitHub) – useful for toy problems and simulations.

Example 1: MDPtoolbox

The MDPtoolbox package lets us define states, actions, and rewards in a compact form.

install.packages("MDPtoolbox")

library(MDPtoolbox)

# Define actions (up, down, left, right) in a 2x2 grid.
# Each transition matrix row sums to 1 (transition probabilities).

up <- matrix(c(1,0,0,0, 0.7,0.2,0.1,0, 0,0.1,0.2,0.7, 0,0,0,1), nrow=4, byrow=TRUE)
down <- matrix(c(0.3,0.7,0,0, 0,0.9,0.1,0, 0,0.1,0.9,0, 0,0,0.7,0.3), nrow=4, byrow=TRUE)
left <- matrix(c(0.9,0.1,0,0, 0.1,0.9,0,0, 0,0.7,0.2,0.1, 0,0,0.1,0.9), nrow=4, byrow=TRUE)
right <- matrix(c(0.9,0.1,0,0, 0.1,0.2,0.7,0, 0,0,0.9,0.1, 0,0,0.1,0.9), nrow=4, byrow=TRUE)

Actions <- list(up=up, down=down, left=left, right=right)

# Define rewards: -1 per move, +10 for reaching the goal state

Rewards <- matrix(c(-1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, 10,10,10,10),
nrow=4, byrow=TRUE)

# Solve using policy iteration

solver <- mdp_policy_iteration(P=Actions, R=Rewards, discount=0.1)
solver$policy
solver$V
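
MDPtoolbox also ships a validator, mdp_check(), which confirms that each transition matrix is a proper stochastic matrix and that the reward matrix has compatible dimensions; it returns an empty string when everything is consistent. Mapping the policy indices back to action names also makes the output easier to read:

# Sanity-check the problem definition (empty string means it is valid)
mdp_check(P = Actions, R = Rewards)

# solver$policy holds the index of the best action per state;
# map it back to the action names for readability
names(Actions)[solver$policy]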

This approach is ideal for structured problems like grid-worlds or decision trees.

Example 2: ReinforcementLearning Package

The ReinforcementLearning package, available from GitHub, provides an easy-to-use framework for model-free RL tasks.

# Install once from GitHub (requires the devtools package)
devtools::install_github("nproellochs/ReinforcementLearning")

library(ReinforcementLearning)

# Use the package's built-in 2x2 gridworld environment

states <- c("s1","s2","s3","s4")
actions <- c("up","down","left","right")

# Generate experience: sample N random state transitions from the environment

sequences <- sampleExperience(N=1000, env=gridworldEnvironment,
states=states, actions=actions)

# Train the model on the sampled experience

solver_rl <- ReinforcementLearning(sequences,
s="State", a="Action", r="Reward", s_new="NextState")

solver_rl$Policy
solver_rl$Reward
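
The training behaviour can be tuned through the control argument, which accepts the learning rate alpha, the discount factor gamma, and the exploration rate epsilon. The values below are illustrative, not recommendations:

# Re-train with explicit learning parameters (illustrative values)
solver_rl2 <- ReinforcementLearning(sequences,
s = "State", a = "Action", r = "Reward", s_new = "NextState",
control = list(alpha = 0.1, gamma = 0.5, epsilon = 0.1))

solver_rl2$Policy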

The package also includes a tic-tac-toe dataset with ~400,000 steps, allowing RL experiments on game-playing.
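
Assuming the dataset is bundled with the version you install, loading and training on it should look roughly like this (training on the full dataset can take a while, so treat this as a sketch):

# Load the bundled tic-tac-toe experience data and inspect it
data("tictactoe")
head(tictactoe)

# Train on the full dataset; a single iteration is enough for a first pass
model_ttt <- ReinforcementLearning(tictactoe,
s = "State", a = "Action", r = "Reward", s_new = "NextState",
iter = 1)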

Real-World Applications of RL

Reinforcement learning is more than toy problems—it powers:

  • Game AI: AlphaGo and AlphaZero by DeepMind.
  • Robotics: Teaching robots to walk, grasp, or navigate.
  • Finance: Portfolio optimization and algorithmic trading.
  • Healthcare: Personalized treatment strategies.
  • Recommendation systems: Adaptive personalization based on user behavior.

R may not be the most common language for large-scale RL (Python dominates), but it remains useful for teaching, prototyping, and experimenting with small environments.

Conclusion

Reinforcement learning represents a paradigm shift in machine learning—learning by doing rather than passively consuming data. Using R’s MDPtoolbox and ReinforcementLearning packages, we can experiment with simple problems, understand policies, and build intuition about how agents learn.

Though R is not the go-to tool for production-level RL, its packages provide an excellent way to grasp the fundamentals. From grid navigation to tic-tac-toe, these experiments lay the foundation for tackling real-world RL applications like robotics, finance, and AI-driven personalization.

As RL continues to evolve, blending human-like learning with AI, it holds the promise of creating systems that don’t just process data but adapt, explore, and improve over time.

We deliver end-to-end analytics expertise through Power BI consulting, trusted AI Consulting, and specialized Tableau Consulting Services, helping businesses transform data into actionable insights.
