
MOHIT BHAT


Building a Web3 Camera That Proves Photos Are Real, Powered by Gemini CLI

Built with Google Gemini: Writing Challenge

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

So I built this thing called LensMint. It's a physical Web3 camera that solves something I've been thinking about for a while: in a world full of AI-generated images, how do you actually prove a photo is real?

The concept is pretty simple, but building it was honestly a wild ride. Picture a Raspberry Pi 4 with a camera module strapped to it, a little touchscreen display, and a whole lot of cryptography running underneath. You snap a photo, and the camera cryptographically signs it with a key that's literally derived from the hardware itself. Then it uploads to Filecoin for permanent storage, mints an ERC-1155 NFT on Ethereum, kicks off a zero-knowledge proof to verify authenticity on-chain, and throws a QR code on screen so anyone nearby can claim their own edition. All of that happens in seconds, right there on the device.

Now here's the thing. This project ended up being around 7,000+ lines of code across Python, JavaScript, Solidity, and bash scripts. Hardware stuff, blockchain contracts, ZK proof pipelines, decentralized storage, a full-stack web app. That's a LOT of ground to cover for one person.

The reason I was able to actually ship it? Google Gemini CLI. It became my constant companion throughout this build. Not just for generating code, but for thinking through architecture, debugging weird errors at 2am, and navigating SDKs I'd never touched before. Let me walk you through the whole thing.

Demo

GitHub Repository: lensmint-camera-gemini
Video Demo: Watch on YouTube

🔧 Hardware Components

Raspberry Pi 4 Board

Camera Module

3.5 inch Touchscreen Display

UPS HAT Battery Module

Power Supply and Cables

SD Card and Storage

All Components Together

💻 Backend Screenshots

(Nine screenshots of the backend service in action.)

📷 Physical Camera Final Images

(Photos of the finished camera: front, side, and back views; the display, UI preview, capture mode, and QR code screens; several shots of the camera in action; and the full final build.)

The Problem LensMint Solves

Think about this for a second. Right now, anyone can fire up Midjourney or DALL-E and generate a photorealistic image of basically anything. That's cool for art, but it's a nightmare for trust. A photojournalist captures something incredible in the field, and people ask "is that even real?" A photographer tries to license their work, and clients wonder if AI made it. Event organizers have no reliable way to prove who actually showed up.

This bugged me enough to build something about it. LensMint attacks it from four angles:

Proof of Authenticity. Every photo is signed by a cryptographic key that's derived from the physical hardware itself (the CPU serial number, MAC address, camera sensor properties, and a persisted salt). The signature proves which exact device captured the image. This isn't a software key that can be copied and spoofed on another machine; the key is deterministic to the hardware.

Permanent Storage. Photos go straight to Filecoin via the Synapse SDK. Not a cloud server. Not someone's S3 bucket. Decentralized, permanent storage that nobody can take down.

On-Chain Verification. Every photo becomes an ERC-1155 NFT. The device must be registered in our DeviceRegistry smart contract before it can mint. Then vlayer generates a TLS-notarized web proof of the metadata, compresses it into a RISC Zero zero-knowledge proof, and submits it on-chain for verification. The blockchain confirms the photo is real without revealing any private data.

Easy Distribution. After capturing a photo, the camera shows a QR code. Anyone nearby can scan it, enter their wallet address on a clean web page, and receive an edition NFT. No wallet setup on the camera. No complicated flows. Scan, paste, done.

The Architecture, Briefly

LensMint has five main components that all talk to each other:

The Camera App is a Python/Kivy application running on the Raspberry Pi. It handles the live camera preview, photo capture, image signing, and QR code display. The identity system generates a SECP256k1 ECDSA keypair from hardware identifiers. When you capture a photo, it computes a SHA-256 hash and signs it with the device's private key.
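The hash-and-sign step can be sketched in pure Python. This is a minimal illustration, not LensMint's actual identity module: the identifier values and salt below are made up, and real SECP256k1 signing needs a third-party library (the stdlib has none), so that part is left as a comment.

```python
import hashlib

def derive_key_seed(cpu_serial: str, mac: str, sensor_id: str, salt: bytes) -> bytes:
    """Deterministically derive 32 bytes of private-key material from
    hardware identifiers plus a persisted salt.
    Same hardware + same salt -> same key material, every boot."""
    material = "|".join([cpu_serial, mac, sensor_id]).encode() + salt
    return hashlib.sha256(material).digest()

def hash_photo(image_bytes: bytes) -> str:
    """SHA-256 digest of the captured image; this is what gets signed."""
    return hashlib.sha256(image_bytes).hexdigest()

# Hypothetical example values, not real device identifiers:
seed = derive_key_seed("10000000abcdef01", "dc:a6:32:01:02:03", "imx219", b"salt")
# `seed` would be fed to a SECP256k1 library (e.g. `coincurve` or `ecdsa`)
# to construct the actual signing keypair -- omitted here.
```

The key property worth noting is that the derivation is a pure function of its inputs, which is what makes the identity survive reboots without storing a private key anywhere.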

The Hardware Web3 Service is a Node.js/Express backend also running on the Pi. It receives captured images from the camera app, uploads them to Filecoin, creates claims on the public server, mints NFTs by calling the smart contracts, and orchestrates the entire ZK proof pipeline. It also polls for pending edition requests and mints them automatically.

Smart Contracts on Ethereum Sepolia include the DeviceRegistry (which tracks authorized cameras), the LensMintERC1155 (which handles NFT minting with full provenance metadata), and the LensMintVerifier (which validates RISC Zero ZK proofs on-chain).

The Public Claim Server runs on Render and serves the claim pages. When someone scans the QR code, they land on a page with an animated NFT card preview where they can enter their wallet address.

The Owner Portal is a React/Vite web app with Privy wallet integration for device owners to manage their cameras and gallery.

```
Raspberry Pi 4 (Camera + Touchscreen)
    │
    ├── Camera App (Python/Kivy)
    │     └── Hardware Identity (SECP256k1 keys from hardware)
    │
    └── Web3 Service (Node.js)
          ├── Filecoin (Synapse SDK for permanent storage)
          ├── Ethereum Sepolia (Smart contracts)
          │     ├── DeviceRegistry.sol
          │     ├── LensMintERC1155.sol
          │     └── LensMintVerifier.sol (RISC Zero)
          ├── vlayer (ZK proof generation)
          └── Public Claim Server (Render)
                └── Owner Portal (React/Vite + Privy)
```

How Gemini CLI Powered the Entire Build

Okay, this is the part I'm most excited to talk about because honestly, without Gemini CLI, I don't think this project would exist.

Building LensMint meant juggling at least five completely different technology domains at the same time. I2C battery monitoring and camera modules on a Raspberry Pi. Python GUI development with Kivy. Solidity smart contracts. Zero-knowledge proof pipelines (which I'd never touched before). Full-stack web development with React and Express. Any one of these could be its own project. I was trying to do all of them at once.

Gemini CLI is what made that possible. Let me break down what it is, how it actually works, and then show you what it looked like in practice.

What Is Google Gemini?

Before I get into the CLI specifically, let me talk about Gemini itself because understanding the model helps you use the tool better.

Google Gemini is Google's family of multimodal AI models. When people say "Gemini," they're talking about large language models built by Google DeepMind that can understand and generate text, code, images, audio, and video. The model family includes different sizes: Gemini Ultra for the most complex tasks, Gemini Pro for balanced performance, and Gemini Flash for speed-optimized workloads.

What makes Gemini interesting for developers specifically is a few things:

Massive context window. Gemini models support context windows up to 1 million tokens (and Gemini 2.5 Pro pushes even further). For coding, this is huge. It means the model can hold your entire codebase in its working memory and reason about relationships between files that are thousands of lines apart. When I was working on the connection between my Python camera app and my Node.js backend, Gemini could see both codebases simultaneously.

Native code understanding. Gemini was trained on enormous amounts of code across many languages. It doesn't just pattern-match syntax. It understands data flow, error handling patterns, security implications, and architectural trade-offs. When I asked it about my Solidity access control patterns, it caught a reentrancy issue that I had totally missed.

Multimodal capabilities. While I primarily used the text/code capabilities, Gemini can also reason about images, which is relevant for a camera project. The model's ability to understand visual context alongside code context opens up interesting possibilities.

Function calling and tool use. Gemini models can interact with external tools, APIs, and your local environment. This is what makes the CLI integration so powerful. It's not just answering questions in isolation. It can read files, understand project structure, and ground its responses in your actual code.

What Is Gemini CLI?

Gemini CLI is Google's open-source command-line tool that brings all of Gemini's capabilities directly into your terminal. Think of it as having a senior developer sitting next to you who can instantly read any file in your project, understand how everything connects, and help you write, debug, or refactor code.

Here's what makes it different from just chatting with Gemini in a browser:

It lives in your project. When you run gemini in your project directory, it automatically understands your file structure, your dependencies, your existing patterns. You don't have to copy-paste code into a chat window and explain what everything does. It already knows.

It remembers context across turns. You can have a multi-turn conversation where you start by discussing architecture, move into implementation, hit a bug, debug it, and then continue building. Gemini CLI holds the thread of the entire conversation. This was essential for LensMint because I'd often start with "I need a module that does X," get a first draft, test it, hit an error, paste the error back in, and iterate until it worked.

It can read and reference your files. You can say "look at my web3Service.js and create a similar service for vlayer" and it will actually read that file, understand the patterns, and produce code that follows the same conventions. This consistency across a codebase matters way more than people think.

Getting started is dead simple. You install it, authenticate with your Google account, and you're ready. No complex configuration. No API key management headaches. Just npm install -g @google/gemini-cli, authenticate, and go.

The capabilities I leaned on the most:

Code generation that fits your project. When I asked Gemini CLI to create my vlayerService.js module, it had already seen my web3Service.js and filecoinService.js. The generated code followed the same error handling patterns, the same logging style, the same module structure. I didn't have to explain any of that. It just picked it up.

Cross-language reasoning. LensMint has Python talking to Node.js talking to Solidity. I could say something like "the Python hardware identity module exports the private key to a file, and the Node.js service needs to read it, what's the cleanest way to bridge this?" and Gemini CLI gave me a solution that understood both ecosystems properly. The getHardwareKey.js bridge module came directly from that kind of conversation.

Debugging with real stack traces. When stuff broke (and oh man, stuff broke a lot), I'd paste the error right into Gemini CLI and get back debugging advice that referenced my actual file names and function signatures. Not generic "check your imports" advice. Real, contextual diagnosis.

Architecture thinking. Some of my most valuable Gemini CLI sessions were before I wrote any code at all. I'd describe what I wanted to build and explore different approaches. Should the camera app talk directly to the blockchain? Should there be a backend service in between? How should the claim system work? These conversations saved me from going down dead ends.

Concrete Examples from the Build

Alright, theory is nice, but let me show you specific moments where Gemini CLI saved my skin.

The Hardware Identity System. Generating a deterministic ECDSA keypair from hardware identifiers sounds simple enough on paper. In practice? A dozen edge cases I hadn't thought about. What if someone swaps the camera module? What if wlan0 isn't available on that particular Pi? How do you persist the salt safely so the key survives reboots? I brought all these questions to Gemini CLI and we worked through the fallback chain together: try /proc/cpuinfo for CPU serial, fall back to a default. Try wlan0 MAC address, then eth0, then a placeholder. Store the salt at /boot/.device_salt with restricted permissions, keep a backup at ~/.lensmint/.device_salt_backup. That kind of careful defensive programming came directly from iterating with Gemini. I wouldn't have thought of half those fallback scenarios on my own.
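The fallback chain described above might look roughly like this in Python. This is a sketch under my own assumptions about file layout and defaults, not the project's actual code; the placeholder values are invented.

```python
import pathlib
import re

def read_cpu_serial(cpuinfo_path: str = "/proc/cpuinfo") -> str:
    """Try to read the Pi's CPU serial from /proc/cpuinfo;
    fall back to a fixed default if it's missing or unreadable."""
    try:
        text = pathlib.Path(cpuinfo_path).read_text()
        match = re.search(r"^Serial\s*:\s*(\w+)", text, re.MULTILINE)
        if match:
            return match.group(1)
    except OSError:
        pass
    return "0000000000000000"  # default when no serial is exposed

def read_mac(interfaces=("wlan0", "eth0")) -> str:
    """Try wlan0 first, then eth0, then a placeholder --
    the same fallback order described in the article."""
    for iface in interfaces:
        path = pathlib.Path(f"/sys/class/net/{iface}/address")
        try:
            return path.read_text().strip()
        except OSError:
            continue
    return "00:00:00:00:00:00"
```

The salt handling would follow the same pattern: try the primary location at /boot/.device_salt, fall back to the backup copy, and create both (with restricted permissions) on first boot.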

Solidity Contract Design. The interaction between DeviceRegistry, LensMintERC1155, and LensMintVerifier required careful design. The ERC1155 contract needs to verify that the caller is a registered and active device before allowing minting. The Verifier contract needs to validate RISC Zero proofs against expected parameters (notary fingerprint, URL pattern, queries hash). Gemini CLI helped me design the access control patterns and the data structures for storing provenance metadata on-chain. It also caught a reentrancy issue I'd missed in an early draft of the mintEdition function.

The ZK Proof Pipeline. This was the most complex part of the build. The flow goes: call the vlayer Web Prover API to create a TLS-notarized web proof of the metadata endpoint, then send that to the ZK Prover API for RISC Zero compression, extract specific fields using JMESPath queries, and finally submit the proof on-chain. Gemini CLI helped me understand vlayer's API structure (which was new to me), design the extraction queries, handle the async proof generation flow, and troubleshoot proof validation failures on-chain.
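JMESPath itself is a third-party query library, so as a rough stdlib stand-in, here's a tiny dotted-path extractor that illustrates the general idea of pulling specific fields out of a nested metadata document. The metadata shape below is invented for the example; it is not vlayer's actual format.

```python
def extract(doc, path: str):
    """Minimal stand-in for a JMESPath-style query: walk a dotted path
    through nested dicts and lists, e.g. 'metadata.device.id'
    or 'editions.0.owner'."""
    cur = doc
    for part in path.split("."):
        if isinstance(cur, list):
            cur = cur[int(part)]  # numeric segments index into lists
        else:
            cur = cur[part]       # everything else is a dict key
    return cur

# Hypothetical metadata document:
metadata = {"metadata": {"device": {"id": "lensmint-01"},
                         "image_hash": "ab12"}}
assert extract(metadata, "metadata.device.id") == "lensmint-01"
```

Real JMESPath expressions are far more powerful (filters, projections, multiselect), but the core job in the pipeline is the same: pin down exactly which fields of the notarized response the proof commits to.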

Kivy UI on a Small Screen. Building a camera interface for a 3.5-inch touchscreen is surprisingly tricky. Everything needs to be touch-friendly with large buttons. The live preview needs to run at 30 FPS without blocking the UI thread. Gemini CLI helped me structure the Kivy layout, implement the texture-based camera preview (using Clock.schedule_interval for frame updates), and design the gallery view with thumbnail grids. It also helped me figure out the correct approach for overlaying QR codes on the live preview without disrupting the camera feed.

Bridging Python and Node.js. The camera app exports the device private key so the Node.js backend can use it for blockchain transactions. This required a careful bridge: the Python side writes the key to a .device_key_export file, and the Node.js side reads it with multiple fallback strategies (read file, run Python export script, or execute inline Python). Gemini CLI helped me design this bridge robustly with proper error handling and caching (the key is cached for one hour to avoid repeated file reads or Python process spawning).
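The real reader lives on the Node.js side (getHardwareKey.js), but the caching-plus-permissions logic is easy to sketch in Python. The file path handling and the 0600 mode here are my assumptions, following common practice for key files, not the project's exact implementation.

```python
import os
import pathlib
import time

_KEY_CACHE = {"key": None, "loaded_at": 0.0}
CACHE_TTL = 3600  # cache the key for one hour, as described in the article

def export_device_key(path: str, key_hex: str) -> None:
    """Python side of the bridge: write the key with owner-only
    permissions so other local users can't read it."""
    p = pathlib.Path(path)
    p.write_text(key_hex)
    p.chmod(0o600)

def read_device_key(path: str) -> str:
    """Reader side of the bridge: return the cached key if it's fresh,
    otherwise re-read the export file. Caching avoids repeated file
    reads (or, in the real Node.js service, re-spawning Python)."""
    now = time.time()
    if _KEY_CACHE["key"] and now - _KEY_CACHE["loaded_at"] < CACHE_TTL:
        return _KEY_CACHE["key"]
    key = pathlib.Path(path).read_text().strip()
    _KEY_CACHE.update(key=key, loaded_at=now)
    return key
```

The same structure maps onto the article's Node.js fallback strategies: each fallback (read file, run export script, inline Python) just becomes another way to populate the cache.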

The Development Workflow

My daily rhythm with Gemini CLI settled into something like this:

  1. Think out loud. I'd describe what I needed in plain English ("I need a service that uploads images to Filecoin using the Synapse SDK, handles payment setup, and returns PieceCIDs") and Gemini would help me think through the interface, the error cases, and how it should connect to everything else.

  2. Get a first draft. Gemini CLI would produce a working scaffold. Because it could see my other files, the code already felt like it belonged in the project.

  3. Break things, fix things. I'd test, hit errors, paste the stack trace back into Gemini CLI, get a fix, test again. This loop was incredibly fast because Gemini already understood the full context.

  4. Wire things together. When connecting components across languages, Gemini CLI helped me keep data formats, API contracts, and error handling consistent. It would flag things like "hey, your Python side sends image_hash but your Express endpoint expects imageHash" before those bugs ever hit runtime.

  5. Stress test mentally. Gemini CLI was great at playing devil's advocate. "What happens if the Filecoin upload succeeds but the NFT mint fails? What if the user scans the QR code after the claim expires? What if the device wallet doesn't have enough ETH for gas?" These questions led to retry logic, error recovery, and UX improvements I wouldn't have thought to add.
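Step 4's image_hash-vs-imageHash mismatch is the classic cross-language bug. One cheap guard, my own illustration rather than anything from the project, is to normalize payload keys once at the boundary so the mismatch can never reach runtime:

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case identifier to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)

def normalize_payload(payload: dict) -> dict:
    """Convert a Python-side snake_case payload into the camelCase
    keys a JavaScript endpoint conventionally expects."""
    return {snake_to_camel(key): value for key, value in payload.items()}

normalize_payload({"image_hash": "ab12", "device_id": "lensmint-01"})
# -> {"imageHash": "ab12", "deviceId": "lensmint-01"}
```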

Tips and Tricks for Getting the Most Out of Gemini CLI

After spending weeks with Gemini CLI on this project, I picked up some patterns that made a real difference. If you're about to start using it, these might save you some time.

1. Always Start From Your Project Root

This sounds obvious but it matters. When you launch Gemini CLI from your project's root directory, it can traverse your file tree and understand how things connect. If you launch it from some random directory, you lose all that context. I always cd into lensmint-camera/ before starting a session.

2. Give It the Big Picture First

Before diving into code, spend your first prompt explaining what the project does. Something like "This is a Web3 camera built on Raspberry Pi. It captures photos, signs them cryptographically, uploads to Filecoin, and mints NFTs on Ethereum. I'm about to work on the ZK proof pipeline." This framing helps Gemini give you much better answers for the rest of the conversation because it understands where each piece fits.

3. Reference Your Existing Files Explicitly

Instead of describing your code patterns from scratch, just point Gemini at an existing file. "Look at how I structured filecoinService.js and create a similar service for vlayer integration" produces way better results than "create a vlayer service with good error handling." It mirrors your actual style.

4. Use It for Architecture Decisions Early

Some of my highest-value Gemini CLI sessions produced zero code. I'd explain what I wanted to build and ask it to help me think through trade-offs. "Should the camera app call the blockchain directly via Python, or should I put a Node.js service in between?" The answer (use Node.js because ethers.js is way more mature than Python's web3 libraries for what I needed) saved me from a painful refactor later.

5. Keep Conversations Focused

I learned this the hard way. Don't try to solve everything in one massive conversation. If you're debugging a Filecoin upload issue, don't suddenly switch to asking about your Solidity contracts. Start a new session. Gemini CLI holds context within a conversation, but keeping topics focused gets you better answers.

6. Paste Full Error Messages, Not Summaries

When debugging, paste the complete stack trace. Don't paraphrase it. Don't say "I'm getting a timeout error." Paste the actual error with line numbers, file paths, all of it. Gemini CLI can trace through your code to find root causes, but only if it has the full picture.

7. Ask It to Review, Not Just Generate

One underused workflow: write your code yourself, then ask Gemini CLI to review it. "Here's my mintEdition function in Solidity. Can you spot any issues?" It caught a reentrancy vulnerability in my contract that I'd completely missed. Code review is where it really shines because it brings a fresh perspective to code you've been staring at for hours.

8. Use It to Learn New SDKs

Whenever I hit a new SDK (Filecoin Synapse, vlayer, Privy), my first move was to describe what I wanted to accomplish and ask Gemini CLI for an implementation. Even if the generated code wasn't perfectly up to date with the latest API, it gave me the right mental model and structure. I'd then adjust the specifics by checking the official docs. It's way faster than reading docs cold.

9. Ask "What Could Go Wrong?"

After implementing a feature, I'd ask Gemini CLI something like "what are the failure modes for this upload flow?" The answers were consistently useful. It would remind me about network timeouts, partial failures, race conditions, and edge cases that only show up in production. A lot of LensMint's retry logic and error recovery exists because of these conversations.
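The retry logic those conversations led to can be as simple as exponential backoff wrapped around each flaky step. This is a generic sketch (the function name and defaults are mine, not LensMint's):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Run a flaky step (e.g. a storage upload) with exponential backoff:
    wait base_delay, then 2x, then 4x... between attempts.
    Re-raises the last error if every attempt fails."""
    last_err = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as err:
            last_err = err
            if i < attempts - 1:
                sleep(base_delay * (2 ** i))
    raise last_err
```

Injecting `sleep` keeps the helper testable without real delays; in production you'd also want to retry only on transient errors (timeouts, 5xx) rather than every exception.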

How It Looks in Action

The physical camera is a Raspberry Pi 4 in a custom enclosure with a touchscreen showing the live camera preview. When you press the capture button:

  1. The screen briefly flashes to confirm capture
  2. The image is signed, uploaded, and minted (status indicators show each step)
  3. A QR code appears on screen
  4. Anyone nearby scans the QR code on their phone
  5. They see a claim page with an animated preview of the NFT
  6. They enter their wallet address and receive an edition NFT

The entire flow from capture to QR code display takes about 10 to 15 seconds. Edition minting after someone submits their wallet address takes another 15 to 30 seconds depending on network conditions.
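The steps above can be sketched as one small orchestrator that reports status at each stage, which is what drives those on-screen indicators. This is my simplified illustration with injected step functions, not LensMint's actual service code:

```python
def run_capture_pipeline(image_bytes, sign, upload, mint, make_qr, on_status=print):
    """Drive the capture-to-QR flow, reporting each step so a display
    can show status indicators. Every step is injected as a function,
    so the hardware/storage/blockchain pieces stay swappable."""
    on_status("signing")
    signature = sign(image_bytes)       # sign the image hash with the device key
    on_status("uploading")
    cid = upload(image_bytes)           # push to permanent storage, get a content ID
    on_status("minting")
    token_id = mint(cid, signature)     # mint the NFT with provenance metadata
    on_status("qr")
    return make_qr(token_id)            # QR payload pointing at the claim page
```

Keeping the steps injectable also makes the failure modes explicit: any step can raise, and the caller decides whether to retry, roll back, or surface an error on screen.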

What I Learned

ZK proofs aren't as scary as they look. I'll be honest, I went into this project pretty intimidated by zero-knowledge proofs. The math papers are intense. But vlayer's Web Prover and ZK Prover APIs abstract most of that away. You're basically telling it "prove that this server actually returned this data" and "compress that proof for on-chain verification." The hard part was getting the JMESPath extraction queries right and understanding the journal data format the verifier contract expects. Once that clicked, the ZK pipeline became just another API call. If you've been avoiding ZK proofs because they seem too complex, give them a shot. The tooling has gotten really good.

Hardware identity is an underappreciated idea. Having a device generate its own deterministic crypto identity from its physical components is elegant in a way that surprised me. The Raspberry Pi literally IS its own Ethereum wallet. No seed phrases. No key files to back up (well, just the salt). Same hardware, same keys, every time. I keep thinking about where else this pattern could work. IoT devices, supply chain tracking, sensor networks. Anytime you need to prove which physical device did something.

Bridging languages is harder than it sounds. Getting Python and Node.js to play nice on the same Pi took way more thought than I expected. File-based IPC, spawning subprocesses, caching keys, handling errors that cross language boundaries. These problems don't show up in tutorials but they dominate real-world systems. If your project spans multiple languages, budget extra time for the glue code.

Small screens force good design. Building for a 3.5-inch touchscreen means you literally cannot have a complex UI. Every screen gets one job. The camera preview is full-screen with tiny overlays. The QR code is big and centered. The gallery is a simple grid. Having that constraint actually made the UX way better than if I'd had a big screen to fill.

Gemini CLI compresses the learning curve. Before Gemini CLI, running into an unfamiliar SDK meant hours of reading docs, searching for examples, and trial-and-error. With it, I could describe what I needed, get a working implementation, and iterate from there. It doesn't eliminate the learning. You still need to understand what you're doing. But it compresses the time dramatically. Filecoin integration that would have taken a full day took a couple of hours.

Google Gemini Feedback

What Worked Really Well

Cross-language understanding is best in class. I could discuss Python, JavaScript, and Solidity in the same conversation, and Gemini CLI tracked the data flow across all three. When I said "the Python module signs the image hash and sends it to the Express endpoint, which then passes it to the Solidity contract's mintOriginal function," it understood the entire chain and could spot inconsistencies I'd missed. That kind of cross-language reasoning saved me hours of debugging integration issues.

It learns your style. Gemini CLI's ability to read my project files and understand my existing patterns meant the code it generated actually felt like mine. It picked up my logging conventions, my error handling style, how I organize modules. This matters more than people realize. When you come back to maintain the code six months later, you don't want half the codebase in one style and half in another.

Debugging is a superpower. Pasting a stack trace and getting back a diagnosis that references my specific file names, variables, and function signatures is night and day compared to generic debugging advice. Multiple times it identified root causes that I'm pretty sure would have cost me hours to find on my own.

Architecture brainstorming is maybe the highest ROI use case. The conversations I had before writing code might have been the most valuable of all. Gemini CLI talked me out of doing blockchain calls directly from Python and into building a separate Node.js service instead. That turned out to be exactly the right call because ethers.js is significantly more mature than Python's web3 options for what I needed.

Where It Struggled

I want to be honest here because I think balanced feedback is more useful than hype.

Brand new SDKs are hit or miss. The Filecoin Synapse SDK was pretty new when I was building, and Gemini CLI sometimes generated code with outdated API signatures. Totally understandable since training data has a cutoff. I just had to cross-reference with the actual SDK source code and docs in those cases. Not a dealbreaker, but worth knowing about.

Raspberry Pi weirdness. Some things are just... Pi-specific. Picamera2's quirks with different camera module versions, I2C communication edge cases with the UPS HAT, display driver issues. Gemini CLI gave reasonable approximations but couldn't always nail the Pi-specific details. I ended up writing a fix_picamera2.sh script for installation edge cases that no AI could fully predict.

Very long sessions can drift. In marathon debugging sessions (20+ back and forth exchanges), I occasionally noticed earlier context fading. My workaround was simple: start a new conversation with a fresh summary of where I was. Splitting conversations by component (one for the camera app, one for the blockchain stuff, one for the ZK pipeline) worked better than one giant thread.

The Bottom Line

Gemini CLI genuinely changed the speed at which I could build this project. LensMint touches hardware, cryptography, blockchain, decentralized storage, zero-knowledge proofs, and full-stack web dev. That's an absurd amount of ground for one developer. Without Gemini CLI holding context across all those domains simultaneously, this project would have taken months. It took weeks.

It's not magic. I want to be clear about that. You still need to understand what you're building. You still need to review every line of generated code. You still hit edge cases that you have to solve yourself. But it compresses the distance between having an idea and having working code to a degree that genuinely surprised me.

If you're working on something ambitious, especially if it crosses multiple tech domains like LensMint does, give Gemini CLI a real shot. Open your terminal, point it at your project, and start talking to it like you'd talk to a teammate. You might be surprised how far you get in an afternoon.
