Nithin Bharadwaj
Python Robot Control: Build Intelligent Machines with ROS, Computer Vision, and PID Systems


Python might not be the first language you think of for controlling machines, but it has become my secret weapon for building and managing robotic systems. It acts as the brains of the operation—a high-level conductor that tells lower-level systems what to do. It’s perfect for taking data from sensors, making decisions, and sending commands to motors, all while being flexible enough for quick tests and serious deployment.

I often use it as the main logic layer on small, affordable computers like the Raspberry Pi. These devices connect to the real world through pins and ports, and Python scripts are what bring them to life, turning code into physical movement and reaction.

Let me show you some of the most effective methods I use to make this happen.


The Robot Operating System, or ROS, is a game-changer. Think of it as a communication network designed specifically for robots. Different parts of a robot—like a camera, a laser scanner, or a motor controller—can run as independent programs called "nodes." ROS lets these nodes find each other and share information seamlessly through "topics."

A camera node can publish video data to a topic called /camera. A navigation node can subscribe to that same topic to see what the camera sees, while also subscribing to a /laser_scan topic from a distance sensor. It then calculates a safe path and publishes velocity commands to a /motor_control topic.

Here’s a simplified look at how you might structure a navigation node in ROS 2 with Python.

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import LaserScan
from geometry_msgs.msg import Twist
import numpy as np

class SimpleNavigator(Node):
    def __init__(self):
        super().__init__('simple_navigator')
        # Listen to the laser scanner
        self.scan_sub = self.create_subscription(
            LaserScan, '/scan', self.scan_received, 10)
        # Talk to the motor controller
        self.cmd_pub = self.create_publisher(Twist, '/cmd_vel', 10)
        self.safe_distance = 0.5  # meters

    def scan_received(self, msg):
        # Check the distance readings
        ranges = np.array(msg.ranges)
        # Treat 'no reading' values (inf or NaN) as maximum range
        ranges = np.nan_to_num(ranges, nan=msg.range_max, posinf=msg.range_max)
        closest = np.min(ranges)

        # Decide what to do
        move_cmd = Twist()
        if closest > self.safe_distance:
            # Path is clear, go forward
            move_cmd.linear.x = 0.3
            self.get_logger().info('Moving forward.')
        else:
            # Too close! Stop and turn.
            move_cmd.angular.z = 0.5
            self.get_logger().warn('Obstacle detected, turning.')

        # Send the command
        self.cmd_pub.publish(move_cmd)

def main():
    rclpy.init()
    node = SimpleNavigator()
    rclpy.spin(node)  # Keeps the node running
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

This pattern is powerful. You can plug in new sensors or algorithms by just having them publish or subscribe to the right topics, without rewriting everything else.
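This decoupling is easy to see even without ROS installed. Here's a toy in-process "topic bus" (plain Python, with invented names, purely for illustration) that mimics the publish/subscribe pattern: each "node" knows only topic names, never the other nodes.

```python
from collections import defaultdict

class TopicBus:
    """A minimal stand-in for ROS topic routing."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

bus = TopicBus()
commands = []

# "Navigation node": turns scan data into motion commands
def on_scan(ranges):
    cmd = 'forward' if min(ranges) > 0.5 else 'turn'
    bus.publish('/cmd_vel', cmd)

bus.subscribe('/scan', on_scan)          # navigation listens to the sensor
bus.subscribe('/cmd_vel', commands.append)  # "motor node" listens to navigation

# "Sensor node": publishes two scans
bus.publish('/scan', [2.0, 1.5, 3.0])   # path clear
bus.publish('/scan', [0.3, 1.5, 3.0])   # obstacle ahead
print(commands)  # → ['forward', 'turn']
```

Swapping in a different sensor or planner only requires publishing or subscribing to the same topic names, which is the property that makes the real ROS graph so composable.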


Robots need to know where they are. A single sensor is often unreliable—a GPS might drop out indoors, an accelerometer drifts over time. The solution is to combine them, a process called sensor fusion. My go-to mathematical tool for this is the Kalman Filter.

Imagine you're estimating a robot's position. You have a prediction from its wheel encoders (it should have moved 1 meter), and you have a measurement from a camera (it sees the robot at 1.2 meters). The Kalman Filter intelligently blends these, trusting the prediction or the measurement more based on their known reliability.

It constantly cycles: predict the new state, then update that prediction with fresh sensor data. Here’s a basic example fusing position and velocity.

import numpy as np

class SimpleKalmanFilter:
    def __init__(self, initial_pos, initial_vel):
        # State: [position, velocity]
        self.state = np.array([initial_pos, initial_vel])
        # Uncertainty (covariance) matrix
        self.P = np.eye(2) * 100  # Start with high uncertainty

        # Model matrices
        self.F = np.array([[1, 1.0],  # State transition: pos_new = pos + dt*vel (dt = one step)
                           [0, 1]])   # vel_new = vel
        self.Q = np.eye(2) * 0.01     # Small process noise (model isn't perfect)

        # Measurement matrix: We only measure position directly
        self.H = np.array([[1, 0]])
        self.R = np.array([[0.1]])    # Measurement noise (sensor accuracy)

    def predict(self):
        # Project the state forward
        self.state = self.F @ self.state
        # Update the uncertainty
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[0]  # Return predicted position

    def update(self, measurement):
        # Calculate Kalman Gain (how much to trust the measurement)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)

        # Update state with measurement
        y = measurement - self.H @ self.state  # Innovation (measurement residual)
        self.state = self.state + K @ y

        # Update uncertainty
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.state[0]  # Return updated position

# Let's test it
kf = SimpleKalmanFilter(initial_pos=0, initial_vel=1.0)
true_position = 0
for step in range(20):
    true_position += 0.95  # Robot moves ~0.95m per step (slightly slippery)
    predicted = kf.predict()
    # Simulate a noisy sensor reading
    measurement = true_position + np.random.randn() * 0.3
    estimated = kf.update(measurement)
    print(f"Step {step}: True={true_position:.2f}, Measured={measurement:.2f}, Estimated={estimated:.2f}")

The output shows the filter smoothing out the noisy measurements, tracking the true position much closer than the sensor could alone. This principle scales up to fuse data from inertial measurement units (IMUs), GPS, cameras, and more.
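The "trust whichever source is more reliable" behavior comes down to inverse-variance weighting, which is exactly what the Kalman gain computes. Here's a standalone sketch fusing the wheel-encoder and camera readings from the earlier example (the variances are illustrative, not from a real sensor):

```python
def fuse(est_a, var_a, est_b, var_b):
    # Inverse-variance weighting: the fused estimate leans toward
    # the source with the smaller variance (higher confidence).
    K = var_a / (var_a + var_b)   # gain, same role as the Kalman gain
    fused = est_a + K * (est_b - est_a)
    fused_var = (1 - K) * var_a   # fused variance is always smaller than either input
    return fused, fused_var

# Wheel encoders say 1.0 m (variance 0.04); the camera says 1.2 m (variance 0.01)
pos, var = fuse(1.0, 0.04, 1.2, 0.01)
print(f"Fused position: {pos:.2f} m, variance: {var:.3f}")
```

Because the camera is four times more precise here, the fused estimate of 1.16 m sits much closer to its reading, and the combined variance (0.008) is lower than either sensor alone could achieve.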


Turning a command like "go to position X" into precise motor movement requires a dedicated controller. The Proportional-Integral-Derivative (PID) controller is the workhorse here. It calculates a control signal (like motor voltage) based on the error between where you want to be (setpoint) and where you are (feedback).

  • Proportional: Reacts to the current error. Bigger error, bigger reaction.
  • Integral: Reacts to the accumulation of past error. Corrects for steady, small offsets.
  • Derivative: Predicts future error based on its rate of change. Damps the system to prevent overshooting.

Here’s a PID controller for a motor's position.

import time

class PIDController:
    def __init__(self, Kp, Ki, Kd, setpoint=0):
        self.Kp, self.Ki, self.Kd = Kp, Ki, Kd
        self.setpoint = setpoint
        self.reset()

    def reset(self):
        self.integral = 0.0
        self.prev_error = 0.0
        self.prev_time = time.time()

    def compute(self, measurement):
        now = time.time()
        dt = now - self.prev_time
        if dt <= 0:
            dt = 1e-6  # Avoid division by zero

        error = self.setpoint - measurement

        # P term
        P = self.Kp * error

        # I term (with clamping to prevent "windup")
        self.integral += error * dt
        I = self.Ki * self.integral

        # D term
        derivative = (error - self.prev_error) / dt
        D = self.Kd * derivative

        # Total output
        output = P + I + D

        # Remember for next time
        self.prev_error = error
        self.prev_time = now

        return output

# Simulating a motor
class SimulatedMotor:
    def __init__(self):
        self.position = 0.0
        self.velocity = 0.0
        self.load = 0.1  # Simulating friction

    def apply_voltage(self, voltage, dt=0.01):
        # Simple physics: voltage creates force, force changes velocity
        force = voltage - self.load * self.velocity
        self.velocity += force * dt
        self.position += self.velocity * dt
        return self.position

# Test: Command the motor to hold at position 10
pid = PIDController(Kp=2.0, Ki=0.5, Kd=0.1, setpoint=10.0)
motor = SimulatedMotor()

print("Time(s) | Target | Position | Control Signal")
for i in range(200):
    pos = motor.position
    control = pid.compute(pos)
    new_pos = motor.apply_voltage(control)
    if i % 20 == 0:
        print(f"{i*0.01:6.2f} | {10:7.1f} | {new_pos:8.3f} | {control:12.3f}")

You’ll see the control signal start high to move the motor, then adjust to hold it steady at the setpoint, fighting against the simulated friction. Tuning the Kp, Ki, and Kd values is something of an art: the goal is a fast, stable response without overshoot or oscillation.
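If you want a systematic starting point rather than pure trial and error, the classic Ziegler–Nichols recipe is a common heuristic: raise Kp alone until the output oscillates steadily, record that ultimate gain Ku and the oscillation period Tu, then derive all three gains from them. A sketch (the Ku and Tu values below are purely illustrative):

```python
def ziegler_nichols(Ku, Tu):
    # Classic Ziegler-Nichols rules for a full PID controller:
    # Kp = 0.6*Ku, Ti = Tu/2, Td = Tu/8, expressed as Ki = Kp/Ti and Kd = Kp*Td.
    Kp = 0.6 * Ku
    Ki = 1.2 * Ku / Tu
    Kd = 0.075 * Ku * Tu
    return Kp, Ki, Kd

# Suppose the motor oscillates steadily at Kp = 4.0 with a 0.8 s period:
Kp, Ki, Kd = ziegler_nichols(Ku=4.0, Tu=0.8)
print(f"Kp={Kp:.3f}, Ki={Ki:.3f}, Kd={Kd:.3f}")
```

Treat the result as a starting point: Ziegler–Nichols tends to produce an aggressive response, so you'll usually back off Kp and Ki a little for a smoother robot.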


For a robot to interact with the world, it needs to see. This is where computer vision comes in, and Python’s OpenCV library is incredibly powerful for real-time processing.

A common first task is detecting colored objects. Let’s say we want a robot to find a bright green ball.

import cv2
import numpy as np

def find_green_ball(frame):
    # Convert from BGR (OpenCV default) to HSV color space
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Define range for "green" in HSV
    lower_green = np.array([40, 50, 50])
    upper_green = np.array([80, 255, 255])

    # Create a mask: white for green pixels, black for everything else
    mask = cv2.inRange(hsv, lower_green, upper_green)

    # Clean up the mask with some filtering
    mask = cv2.erode(mask, None, iterations=2)
    mask = cv2.dilate(mask, None, iterations=2)

    # Find contours (blobs) in the mask
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    if contours:
        # Find the largest green blob
        largest_contour = max(contours, key=cv2.contourArea)
        ((x, y), radius) = cv2.minEnclosingCircle(largest_contour)
        if radius > 10:  # Ignore tiny noise
            # Draw a circle around it
            cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)
            cv2.putText(frame, "Green Ball", (int(x), int(y - 10)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
            return frame, (int(x), int(y), int(radius))
    return frame, None

# Use a webcam
cap = cv2.VideoCapture(0)
print("Looking for a green object... Press 'q' to quit.")

while True:
    ret, frame = cap.read()
    if not ret:
        break

    processed_frame, ball_data = find_green_ball(frame)
    if ball_data:
        x, y, r = ball_data
        print(f"Ball center: ({x}, {y}), Radius: {r}")  # Data for the robot to use

    cv2.imshow('Vision', processed_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

This code gives the robot the (x, y) coordinates of the ball in the camera image. With camera calibration, you could convert this to a real-world direction. This is the foundation for tasks like object tracking, line following, or even using advanced deep learning models for recognition.
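As a rough sketch of that conversion, a pinhole-camera model with a known horizontal field of view maps a pixel column to a bearing angle. The 640-pixel width and 60-degree FOV below are assumed values, not calibrated ones; a real robot would get these from camera calibration.

```python
import math

def pixel_to_bearing(x_pixel, image_width=640, hfov_deg=60.0):
    # Pinhole model: derive the focal length in pixels from the
    # horizontal field of view, then convert the pixel offset to an angle.
    f = (image_width / 2) / math.tan(math.radians(hfov_deg / 2))
    offset = x_pixel - image_width / 2   # pixels right of image center
    return math.degrees(math.atan2(offset, f))

# Ball detected at x = 480 in a 640-px-wide frame:
print(f"{pixel_to_bearing(480):.1f} degrees right of center")
```

That bearing (about 16 degrees here) is exactly the kind of error signal you can hand to a PID controller to steer the robot toward the ball.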


These techniques form a toolkit. In a real project, you’d combine them: use ROS nodes to manage communication, fuse IMU and wheel data to track location, use PID loops to control each wheel’s speed, and employ computer vision to identify a target. Python ties it all together, allowing you to focus on the high-level logic of what you want the robot to do, while it handles the complex math and coordination underneath. It makes the journey from a simple idea to a moving, sensing machine not just possible, but remarkably accessible.
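As a tiny illustration of that glue role, here's a hypothetical control step (the function name and gain are invented for this sketch) that turns the ball detector's pixel offset into an angular velocity a ROS node could publish on /cmd_vel:

```python
def steering_command(ball_x, image_width=640, Kp=0.005):
    # Vision gives a pixel coordinate; a P-controller turns the
    # centering error into an angular velocity command.
    error = (image_width / 2) - ball_x   # positive if the ball is left of center
    return Kp * error                    # rad/s; a real robot would clamp this

print(steering_command(480))  # ball right of center → negative, i.e. turn right
```

Each piece stays simple on its own; the composition is what makes the robot behave intelligently.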
