Avirup Basu

Posted on Sep 21

IoT & AI - Explore ways of integrating physical and digital systems

#iot #ai #gemini #python

Introduction

The world of IoT involves physical devices. With the use of the right technology, each of these devices get smarter. AI in IoT is not new. Infact its been there for decades. Think about robotics in general. Modern day robotics have transformed a lot of industries. The greatest modernisation in this space was achieved by automobile factories. Today, as we wittness the world is adapting to the use of LLM and generative AI, there are new ways how devices can not only become smarter but also fundamentally change the way how we interact with them.
In this article, we will explore the following.

How IoT can be intelligent
How LLMs can be used in IoT
What is AI on edge
A modern era where machines are not only smart but intelligent

Smart & Intelligent devices

Let's directly put it in this way
"All intelligent devices are smart but all smart devices aren't intelligent"

Well, I am sure somebody mentioned the statement much before I did.

Essentially, smart devices are those devices from where we can monitor and control the devices in a modern way either through Internet or through a cell phone or something similar. They are not totally off-grid devices and they have connectivity and sensors.
In the world of IoT, smart devices are defined by rules. For example,
if temperature > 30, do this else do something else. Typically, the logic will be "if this then that" logic.

But when it comes to intelligent devices or systems, its different as its not strictly defined through rules but through different mathematical models, or statistical analysis or ML. There is always an intelligent factor to it. These devices and systems go beyond than just automation they do a lot more than that.

However, it is to be noted that in order to bring in intelligence into devices, the foundational layer to make it smart MUST exist.

Intelligence Internet of Things

To get started with IoT and intelligence, lets have a look at some of the key aspects where intelligence can be used.

Computer vision: An area where ML and AI lies at its core. AI is extensively used in computer vision. Starting from self driving cars all the way to crowd simulation, AI is actively being used in computer vision.
Device telemetry: All systems generate data. Be it a time-series data or something else, telemetry is the key to understand how a device or a machine is performing. A lot of ML models using in-depth statiscal analysis is being used to analyse device telemetry. Models like ARIMA are used to predict based on historical data.
Control systems: This is an area where we get the power to not only analyse telemetry but also to control the device in itself. Its essentially giving machines the direct power of AI. A lot of this depends on the data that is incoming from the sensors. A classic example of this are self-driving cars.
Edge analytics and Intelligence: Running ML models locally to do AI on edge. Its actively being used in almost all major industrial IoT systems. Here the models arent executed through API calls but runs locally and thus these are used in areas with less or no bandwidth and in air-gapped systems.
Natural language interfaces: With the advent of LLM, the way how we interact with systems are changing. IoT is no different. We can now talk to our devices. In this article we are going to cover this section where we control a device using Gemini.

What are we going to build?

We're going to build a system that involves an ESP32 / Arduino to control two LEDs green and yellow. It also contains a DHT22 sensor. Then we will have a python script which will use Gemini and Serial Port Communication to communicate with the device. Writing simple commands will execute the commands on the device.

Components:

ESP32 or Arduino
Breadboard
Yellow LED
Red LED
DHT22 sensor
Connecting cables (USB & Jumper)

Design

Let's understand the system using a design first.
The below image is the pinout of ESP32 which I will be using to design the system on.

Based on the above image, we will do the following connections.

I definitely need to work on my circuit diagram skills.

Take the referrence of the below table for referrence.

ESP32 Pin	Connected To	Component	Notes
GPIO 4	Anode (long leg)	Red LED	Controls Red LED
GPIO 16	Anode (long leg)	Yellow LED	Controls Yellow LED
GPIO 17	Data Pin	DHT11/DHT22 Sensor	Reads temperature & humidity
+5V	Vcc Pin	DHT11/DHT22 Sensor	Powers the sensor
GND	Cathodes + GND	Red LED, Yellow LED, DHT11/DHT22	Common ground for all components

Based on the above connections, we should be good to write the code for the device. We're going to use the Arduino IDE for burning the code to the ESP32.


#include <Arduino.h>
#include <Adafruit_Sensor.h>
#include <DHT.h>

// Pin assignments
#define LED_YELLOW 4
#define LED_RED    16
#define DHTPIN     17
#define DHTTYPE    DHT22

DHT dht(DHTPIN, DHTTYPE);
String cmd;

void setup() 
{
  pinMode(LED_YELLOW, OUTPUT);
  pinMode(LED_RED, OUTPUT);
  digitalWrite(LED_YELLOW, LOW);
  digitalWrite(LED_RED, LOW);

  Serial.begin(115200);
  while (!Serial) { delay(10); }

  dht.begin();
  Serial.println("READY");
}

void loop() 
{
  while (Serial.available()) 
  {
    char c = Serial.read();
    if (c == '\n' || c == '\r') 
    {
      cmd.trim();
      if (cmd.length() > 0) 
      {
        handleCommand(cmd);
        cmd = "";
      }
    } 
    else 
    {
      if (cmd.length() < 100) cmd += c; 
    }
  }
}

void handleCommand(String s) 
{
  s.toUpperCase();

  if (s == "YELLOW ON") 
  {
    digitalWrite(LED_YELLOW, HIGH);
    Serial.println("OK");
  } 
  else if (s == "YELLOW OFF") 
  {
    digitalWrite(LED_YELLOW, LOW);
    Serial.println("OK");
  } 
  else if (s == "RED ON") {
    digitalWrite(LED_RED, HIGH);
    Serial.println("OK");
  } 
  else if (s == "RED OFF") 
  {
    digitalWrite(LED_RED, LOW);
    Serial.println("OK");
  } 
  else if (s == "READ DHT") 
  {
    float t = dht.readTemperature();  
    float h = dht.readHumidity();
    if (isnan(t) || isnan(h)) 
    {
      Serial.println("ERR");
    } 
    else 
    {
      Serial.print("DHT TEMP="); 
      Serial.print(t, 2);
      Serial.print(" HUM=");     
      Serial.println(h, 2);
    }
  } 
  else 
  {
    Serial.println("ERR");
  }
}

The above code is straightforward. Let's see how it works.

In the first section of the code, we initialise the header files, do the pin assignments based on the schematic above and finally create some objects which will be used later on. We also define some global variables.
We also use libraries like the following.

Adafruit_Sensor.h
DHT.h The above two are responsible for getting the value from the DHT sensor.


#include <Arduino.h>
#include <Adafruit_Sensor.h>
#include <DHT.h>

// Pin assignments
#define LED_YELLOW 4
#define LED_RED    16
#define DHTPIN     17
#define DHTTYPE    DHT22

DHT dht(DHTPIN, DHTTYPE);
String cmd;

Next, we write the setup method. This essentially runs one time whenever the board is loaded.

void setup() 
{
  pinMode(LED_YELLOW, OUTPUT);
  pinMode(LED_RED, OUTPUT);
  digitalWrite(LED_YELLOW, LOW);
  digitalWrite(LED_RED, LOW);

  Serial.begin(115200);
  while (!Serial) { delay(10); }

  dht.begin();
  Serial.println("READY");
}

We've defined the LED pins as output and set it to default condition as low. We also initiatlised the Serial port communication through which the ESP32 will be communicating with the PC. In this block, we also initialise the sensor which will be recording the values that are captured by the sensor.

Now, that all the setup has been completed, let's head to loop method. It is an infinite loop that executes infinitely and all the logic goes inside here.


void loop() 
{
  while (Serial.available()) 
  {
    char c = Serial.read();
    if (c == '\n' || c == '\r') 
    {
      cmd.trim();
      if (cmd.length() > 0) 
      {
        handleCommand(cmd);
        cmd = "";
      }
    } 
    else 
    {
      if (cmd.length() < 100) cmd += c; 
    }
  }
}

In this block, we simply handle the Serial port communication and call the handleCommand method to execute an action.

The last else is responsible for collecting all the characters and making sure the max length is less than 100.

Finally, in the handleCommand method, we simply perform the action based on the command received.

void handleCommand(String s) 
{
  s.toUpperCase();

  if (s == "YELLOW ON") 
  {
    digitalWrite(LED_YELLOW, HIGH);
    Serial.println("OK");
  } 
  else if (s == "YELLOW OFF") 
  {
    digitalWrite(LED_YELLOW, LOW);
    Serial.println("OK");
  } 
  else if (s == "RED ON") {
    digitalWrite(LED_RED, HIGH);
    Serial.println("OK");
  } 
  else if (s == "RED OFF") 
  {
    digitalWrite(LED_RED, LOW);
    Serial.println("OK");
  } 
  else if (s == "READ DHT") 
  {
    float t = dht.readTemperature();  
    float h = dht.readHumidity();
    if (isnan(t) || isnan(h)) 
    {
      Serial.println("ERR");
    } 
    else 
    {
      Serial.print("DHT TEMP="); 
      Serial.print(t, 2);
      Serial.print(" HUM=");     
      Serial.println(h, 2);
    }
  } 
  else 
  {
    Serial.println("ERR");
  }
}

Now, before we head to the main intelligence part where we use Gemini, we can test the above using Serial Port Communication. In Arduino IDE, there is a Serial terming. On entering the correct command, the corresponding action should be executed.

Thus, if you type commands like yellow on, red on, read dht, it should execute the action and return the corresponding response as defined in handleCommand() method.

Once the above is sorted, lets approach the driver section from where the code will be driven through Serial port communication and Gemini.

Implementation of the Gemini Driver

This is going to be the place from where we drive the entire logic. We ideally want to do the following.

Maintain a context of the conversation
Control the LEDs through natural language
Get the DHT sensor raw values and then summarise it.

To implement the following, we will use python and the respective modules. Before proceeding forward make sure to have the API key.

Before getting started with the code, lets check-out the flow of the logic first. The flow will consist of the following.

Initiate the modules
Prepare the prompt
Capture the user input
Embedd the user imput inside the propmt.
Initiate the request while maintaining context
Get the response from the LLM
Send Serial command to the device by decoding the response

API usage - Google Generative AI

We are interested in the use of python-genai module. Some of the key features to look for are the following.

System instructions:

These are like a rule book for the LLM. They wont change during each chat or command. They define the persona of the LLM. Here we are using this to define exactly what we want. The system instruction used for this project is mentioned below.

You are an IoT command router for an ESP32 device. Convert user inputs into EXACTLY one of:
- YELLOW ON
- YELLOW OFF
- RED ON
- RED OFF
- READ DHT
- NONE

Rules:
- Use conversational context to resolve pronouns like "it".
- If the user implies on/off without color, infer from the most recently referenced LED.
- If the user asks for temp/humidity, use READ DHT.
- If unclear or unrelated, return NONE.

Return ONLY the command token (no explanations, punctuation, or extra text).

This means that the LLM response will be limited to the above which makes it a lot easier for us to handle the commands to be sent to the ESP32.

Client steup & Authentication

As was mentioned earlier, we need the API key to use this API.
We can use environment variables to read the API key.

client = genai.Client(api_key=api_key)

Chat session & Context memory

This is one of the most key features. We need to maintain the context of the conversation. For example, if I switch on the LEDs in sequence, lets say red and then yellow followed by prompting "switch it off", that should switch off the yellow LED. How do we handle it?

chat = client.chats.create(
        model="gemini-2.5-flash",  # or "gemini-1.5-flash"
        config=types.GenerateContentConfig(system_instruction=SYSTEM_INSTRUCTION),
        history=[],  # explicit, equivalent to a fresh session
    )

In the above block of code, the client.chats.create() initiates a chat session. The context is preserved internally by the library. Follow up chats will be taken care by:

resp = chat.send_message(user_input)

That way, we can maintain the context.

Code flow

Let's have a look at the entire code and then have a look at the key blocks.

import os
import sys
import time
import json
import serial

from google import genai
from google.genai import types

BAUD = 115200

SYSTEM_INSTRUCTION = """
You are an IoT command router for an ESP32 device. Convert user inputs into EXACTLY one of:
- YELLOW ON
- YELLOW OFF
- RED ON
- RED OFF
- READ DHT
- NONE

Rules:
- Use conversational context to resolve pronouns like "it".
- If the user implies on/off without color, infer from the most recently referenced LED.
- If the user asks for temp/humidity, use READ DHT.
- If unclear or unrelated, return NONE.

Return ONLY the command token (no explanations, punctuation, or extra text).
"""

ALLOWED = {"YELLOW ON", "YELLOW OFF", "RED ON", "RED OFF", "READ DHT", "NONE"}

def choose_port() -> str:
    try:
        from serial.tools import list_ports
        ports = list(list_ports.comports())
        if not ports:
            print("No serial ports found. Enter manually.")
            return input("Serial port (e.g., COM5, /dev/ttyUSB0, /dev/cu.usbserial-XXXX): ").strip()
        print("Available ports:")
        for i, p in enumerate(ports, 1):
            print(f"  {i}. {p.device} - {p.description}")
        sel = input("Select port number or type a path: ").strip()
        if sel.isdigit():
            idx = int(sel)
            if 1 <= idx <= len(ports):
                return ports[idx-1].device
        return sel
    except Exception:
        return input("Serial port (e.g., COM5, /dev/ttyUSB0, /dev/cu.usbserial-XXXX): ").strip()

def open_serial(port: str, baud: int) -> serial.Serial:
    ser = serial.Serial(port=port, baudrate=baud, timeout=1.0)
    time.sleep(2.0)  # allow ESP32 to (auto)reset and print boot logs
    boot = ser.read_all().decode(errors="ignore").strip()
    if boot:
        print(f"[device] {boot}")
    return ser

def parse_dht_line(line: str):
    # Expected: "DHT TEMP=<C> HUM=<%>"
    if not line.startswith("DHT"):
        return None
    try:
        parts = line.replace("DHT", "").strip().split()
        t = float(parts[0].split("=")[1])
        h = float(parts[1].split("=")[1])
        return {"temp_c": t, "hum_pct": h}
    except Exception:
        return None

def summarize_dht(client: genai.Client, dht: dict) -> str:
    payload = json.dumps(dht)
    resp = client.models.generate_content(
        model="gemini-2.5-flash",  # or "gemini-1.5-flash" if you prefer
        contents=payload,
        config=types.GenerateContentConfig(
            system_instruction=(
                "You are an IoT analyst. Given DHT22 readings, write a concise summary (<=40 words) "
                "and one recommended action. If temp > 30°C, suggest cooling/ventilation. "
                "If humidity > 70%, suggest dehumidification. Use plain, human-readable language."
            )
        ),
    )
    return (resp.text or "").strip()

def main():
    api_key = "<YOUR API KEY>"
    if not api_key:
        print("ERROR: GOOGLE_API_KEY not set.", file=sys.stderr)
        sys.exit(1)

    # Create SDK client (Gemini Developer API)
    client = genai.Client(api_key=api_key)

    # Create a chat session with system instruction; start with empty history
    chat = client.chats.create(
        model="gemini-2.5-flash",  # or "gemini-1.5-flash"
        config=types.GenerateContentConfig(system_instruction=SYSTEM_INSTRUCTION),
        history=[],  # explicit, equivalent to a fresh session
    )

    port = choose_port()
    ser = open_serial(port, BAUD)

    print("\nType instructions (e.g., 'turn on yellow light', 'switch off red', 'what's the temperature').")
    print("Type 'quit' to exit.\n")

    try:
        while True:
            user_input = input("> ").strip()
            if user_input.lower() in {"quit", "exit", "q"}:
                break

            # Send the user message into the same chat session
            resp = chat.send_message(user_input)
            cmd = (resp.text or "").strip().upper()
            print(f"[gemini] {cmd}")

            if cmd not in ALLOWED or cmd == "NONE":
                print("[info] No action taken.")
                # Keep the chat aware that no action occurred (optional)
                try:
                    chat.send_message("NOTE: No valid device action taken for that request.")
                except Exception:
                    pass
                continue

            # Send command to device
            ser.write((cmd + "\n").encode())
            reply = ser.readline().decode(errors="ignore").strip()
            print(f"[device] {reply}")

            # Add device reply back into the chat for context (non-fatal on errors)
            try:
                chat.send_message(f"Device reply: {reply}. Acknowledge and remember this context.")
            except Exception as e:
                print(f"[warn] Could not add device reply to context: {e}", file=sys.stderr)

            # If the device returned DHT readings, generate a short summary
            dht = parse_dht_line(reply)
            if dht:
                try:
                    summary_text = summarize_dht(client, dht)
                    print(f"[gemini-summary] {summary_text}")
                except Exception as e:
                    print(f"[warn] DHT summary failed: {e}", file=sys.stderr)

    finally:
        try:
            ser.close()
        except Exception:
            pass

if __name__ == "__main__":
    main()

The choose_port() method simply displays the list of serial devices connected to the PC and takes in the input from the user for the python script to connect to. It relies on the usage of the library pyserial.

Additionally, we have open_serial method to establish the Serial port communication.

I have a video specifically on the usage of Python and Serial port communication. It can be viewed here.

Moving forward, we have another helper method

We've a method which parses the response received from the ESP32 for temperature and humidity data-points.

This is followed by another method summarise_dht() which simply uses Gemini to summarise the results in a descriptive way. It uses generate_content and system_instructions to do the same.

def summarize_dht(client: genai.Client, dht: dict) -> str:
    payload = json.dumps(dht)
    resp = client.models.generate_content(
        model="gemini-2.5-flash",  # or "gemini-1.5-flash" if you prefer
        contents=payload,
        config=types.GenerateContentConfig(
            system_instruction=(
                "You are an IoT analyst. Given DHT22 readings, write a concise summary (<=40 words) "
                "and one recommended action. If temp > 30°C, suggest cooling/ventilation. "
                "If humidity > 70%, suggest dehumidification. Use plain, human-readable language."
            )
        ),
    )
    return (resp.text or "").strip()

Finally, we have the main method, which establishes the following flow.

READ api key
Create the SDK client
Connect to ESP32
Trigger an infinite loop to take in input from the user
Use chat.send_message() to initiate the chat in a contextual way
Summarise dht values if triggered.

The main core logic lies in the inifinite loop section.

while True:
            user_input = input("> ").strip()
            if user_input.lower() in {"quit", "exit", "q"}:
                break

            # Send the user message into the same chat session
            resp = chat.send_message(user_input)
            cmd = (resp.text or "").strip().upper()
            print(f"[gemini] {cmd}")

            if cmd not in ALLOWED or cmd == "NONE":
                print("[info] No action taken.")
                # Keep the chat aware that no action occurred (optional)
                try:
                    chat.send_message("NOTE: No valid device action taken for that request.")
                except Exception:
                    pass
                continue

            # Send command to device
            ser.write((cmd + "\n").encode())
            reply = ser.readline().decode(errors="ignore").strip()
            print(f"[device] {reply}")

            # Add device reply back into the chat for context (non-fatal on errors)
            try:
                chat.send_message(f"Device reply: {reply}. Acknowledge and remember this context.")
            except Exception as e:
                print(f"[warn] Could not add device reply to context: {e}", file=sys.stderr)

            # If the device returned DHT readings, generate a short summary
            dht = parse_dht_line(reply)
            if dht:
                try:
                    summary_text = summarize_dht(client, dht)
                    print(f"[gemini-summary] {summary_text}")
                except Exception as e:
                    print(f"[warn] DHT summary failed: {e}", file=sys.stderr)

To execute the above codebase, make sure the dependencies are installed. For the same, just generate a requirements.txt file and paste the following.

google-genai>=0.3.0
pyserial>=3.5

Then simply execute pip install -r requirements.txt and all of the depedencies should get installed.

Execute the codebase and enjoy. Please make sure to connect your board first.

In the above screenshot, you can see how it works.

Conclusion & Summary

Through this article, we have established the following.

Establish a communication with a physical IoT device
Communicate with Gemini using the SDKs.
Understand the core concepts of AI + IoT through an elementery project.

In the upcoming articles, we are going to explore these more in-depth and explore the exciting world of intelligent & smart devices.

DEV Community