Physical AI is transforming how machines interact with the real world - and DEV316 showed us exactly how to make it work in production. Kohei "Max" Matsushita, AWS Community Hero and Tech Evangelist at Soracom, delivered an incredibly practical session on running generative AI inference locally using AWS IoT Greengrass, complete with live robot arm demonstrations.
Max's opening set the stage perfectly: "AI has moved from the digital world, the cyber world, into the real world. AI is no longer just on the screen. It can move things in the physical world."
Watch the full session:
What is Physical AI?
Max defined physical AI with three core capabilities that machines need to operate in the real world:
Responsiveness - AI can react fast to what it sees and feels, letting machines move quickly when things change
Autonomy - AI can do more than follow orders. It can think and act by itself, operating on its own and creating new value
Collaboration - AI can understand human intent and work naturally with people, where humans and machines act as one team with one goal
The session featured a compelling real-world example from LINKWIZ, a Japanese company that gives robot arms vision using 3D scanners. Traditional robot arms use "teaching" - repeating fixed movements. If an object is in a different position or angle, the robot cannot pick it up. Humans end up working for the robot, manually positioning objects.
With LINKWIZ's solution using VLA (Vision Language Action) models, the robot uses cameras as vision and AI as its brain. The system can see the shape and angle of objects and adjust the robot arm's movement accordingly - demonstrating all three principles of physical AI working together.
The Latency Question: Local vs Cloud
The heart of the session addressed a critical question: Where should AI inference run for physical applications?
Max demonstrated this with a brilliant live demo using two synchronized robot arms. First, they were connected via USB with minimal latency, and the arms moved in perfect sync. Then he routed the same signals through an LTE network to a server in US East and back - the delay was immediately visible, measuring 500-600 milliseconds.
His key insight: "Some say there is a 100 millisecond limit. When latency goes past that point, humans feel that something is wrong." He compared it to the response time after pushing a button on a vending machine - a perfect reference point for acceptable latency.
But Max was clear that Cloud AI isn't ruled out for physical applications: "If this latency is acceptable for your system, then Cloud inference is still possible, but you also need a stable connection to the Cloud."
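To make that reference point concrete, here is a minimal Python sketch that times one round trip to a remote inference endpoint and compares it against the 100 millisecond budget. The endpoint URL and payload are hypothetical placeholders, not anything shown in the session.

```python
# Hedged sketch: measure one round trip to a (hypothetical) remote
# inference endpoint and compare it to the 100 ms reference point.
import time

import requests

ENDPOINT = "https://inference.example.com/invoke"  # placeholder URL
BUDGET_MS = 100  # the "100 millisecond limit" mentioned in the talk

payload = {"image": "<base64-encoded camera frame>"}  # placeholder payload

start = time.perf_counter()
response = requests.post(ENDPOINT, json=payload, timeout=5)
elapsed_ms = (time.perf_counter() - start) * 1000

verdict = "within" if elapsed_ms <= BUDGET_MS else "over"
print(f"round trip: {elapsed_ms:.0f} ms ({verdict} the {BUDGET_MS} ms budget)")
```

Running a check like this from the actual deployment site, over the actual LTE or WiFi link, is what tells you whether Cloud inference fits your use case.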
Three Cases for Local Edge AI
Max identified three scenarios where local inference becomes essential:
Camera and Robot Combination - Systems that must run inference and act quickly on production lines (for example, one object per second). If inference runs far away, the object is already gone.
Safety - When there is danger, the system cannot wait for network latency and must keep working even when offline. This must be local.
Privacy and Security - Sensitive camera data must be handled with care. One way is to keep the processing local.
Vision Language Action (VLA) Models
The session provided an excellent explanation of VLA models - the key technology for physical AI. While Large Language Models (LLMs) take text and images as input and output text and images, VLA models take real-world input from sensors and cameras and generate movement as output.
Max demonstrated this with a Raspberry Pi 5 and simple web camera running VLA models from Hugging Face. The robot arm could detect objects (first a black box, then a white box after a model update) and pick them up from different positions - something traditional rule-based teaching cannot do.
His comparison was striking: "Traditional teaching does not have vision. The rule-based method works well when the same task repeats many times in a short time. But if the task is a little different, or if no human helps, we need a different approach."
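To illustrate the shape of that approach, here is a minimal, hypothetical perceive-infer-act loop in Python. The camera, VLA model, and robot arm interfaces are stand-in placeholders rather than the actual demo code, which used a Raspberry Pi 5, a web camera, and VLA models from Hugging Face.

```python
# Hypothetical perceive -> infer -> act loop for a VLA-style policy.
# All three interfaces below are placeholders, not the session's stack.
import time

def read_camera_frame():
    """Placeholder: grab the latest frame from the web camera."""
    return b"raw-frame-bytes"

def infer_action(frame, instruction):
    """Placeholder: a VLA model maps (image, text) to joint targets."""
    return [0.0, 0.1, -0.2, 0.0, 0.3, 0.0]

def send_to_arm(joint_targets):
    """Placeholder: write joint targets to the robot arm controller."""
    pass

LATENCY_BUDGET_S = 0.100  # the 100 ms reference point from the talk

while True:
    start = time.perf_counter()
    frame = read_camera_frame()
    action = infer_action(frame, "pick up the white box")
    send_to_arm(action)
    elapsed = time.perf_counter() - start
    if elapsed > LATENCY_BUDGET_S:
        print(f"warning: control loop took {elapsed * 1000:.0f} ms")
```

The key difference from rule-based teaching is that the camera frame and the instruction both feed the model on every cycle, so the arm can adapt when the object's position or angle changes.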
The Core Philosophy: Updateability
Max's most important message wasn't about choosing the right model - it was about staying updateable:
"Today, latest AI models may be outdated tomorrow. So we must build a way to keep AI models updateable and always use the best one. Don't lock your system to one model."
This is where AWS IoT Greengrass becomes critical. Max explained: "In the AWS Cloud, Amazon Bedrock gives us many models... but on the device, model choice is still hard because the device has limits. We can install only one or two models."
The solution? Build systems that can update models in the field. "An AI application that cannot be updated becomes old very fast. AI needs not only code and data, it also needs to be modifiable."
Live Demo: Updating VLA Models with AWS IoT Greengrass
Max demonstrated the complete workflow for deploying and updating AI models at the edge:
Architecture Overview:
- VLA models stored on Hugging Face
- Docker containers built with models bundled inside
- Container images pushed to Amazon Elastic Container Registry (Amazon ECR)
- Deployment commands sent through AWS IoT Greengrass
- Local device runs docker pull and starts containers
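As a rough illustration of the build-and-push part of this workflow, the sketch below uses the Docker SDK for Python and boto3. The repository name, tag, and region are hypothetical assumptions, and this is just one way to script what Max showed: build the image with the model bundled inside, authenticate against Amazon ECR, and push.

```python
# Hedged sketch: build a container with the VLA model bundled inside,
# then push it to Amazon ECR. Repository, tag, and region are placeholders.
import base64

import boto3
import docker

REPO = "123456789012.dkr.ecr.us-east-1.amazonaws.com/vla-inference"  # placeholder
TAG = "white-box-v1"

# Build from a Dockerfile that copies or downloads the model into the image.
docker_client = docker.from_env()
image, _ = docker_client.images.build(path=".", tag=f"{REPO}:{TAG}")

# Exchange AWS credentials for a temporary ECR login.
ecr = boto3.client("ecr", region_name="us-east-1")
auth = ecr.get_authorization_token()["authorizationData"][0]
username, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
docker_client.login(username=username, password=password, registry=auth["proxyEndpoint"])

# Push the image so the Greengrass device can pull it later.
for line in docker_client.images.push(REPO, tag=TAG, stream=True, decode=True):
    print(line)
```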
The Demo Flow:
First, Max showed the black box detection model running on the Raspberry Pi, successfully picking up black objects from different positions.
Then, live on stage, he:
- Modified the Dockerfile to use a white box detection model from Hugging Face
- Built the new Docker image
- Pushed it to Amazon ECR
- Created a new AWS IoT Greengrass recipe version
- Deployed the update to the device
The device automatically pulled the new container, stopped the old model, and started the new one. The robot arm then successfully detected and picked up only white objects, ignoring other colored objects around it.
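On the AWS side, the last two steps of that flow can be scripted with boto3 along the lines of the sketch below. The component name, version, recipe contents, and target ARN are illustrative assumptions, not the exact recipe from the demo.

```python
# Hedged sketch: register a new Greengrass component version whose recipe
# runs the updated container image, then deploy it to the target devices.
# Names, versions, and ARNs are placeholders.
import json

import boto3

gg = boto3.client("greengrassv2", region_name="us-east-1")

IMAGE = "123456789012.dkr.ecr.us-east-1.amazonaws.com/vla-inference:white-box-v1"

recipe = {
    "RecipeFormatVersion": "2020-01-25",
    "ComponentName": "com.example.vla.inference",  # placeholder component name
    "ComponentVersion": "1.0.1",                   # bump for every model update
    "Manifests": [{
        "Platform": {"os": "linux"},
        "Lifecycle": {
            # Pull the new image and (re)start the inference container.
            "Run": f"docker run --rm {IMAGE}"
        },
    }],
}

gg.create_component_version(inlineRecipe=json.dumps(recipe).encode())

gg.create_deployment(
    targetArn="arn:aws:iot:us-east-1:123456789012:thinggroup/robot-arms",  # placeholder
    deploymentName="vla-white-box-update",
    components={"com.example.vla.inference": {"componentVersion": "1.0.1"}},
)
```

Because each model change is just a new component version, rolling back means deploying the previous version - exactly the updateability Max argued for.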
Handling Large AI Models at the Edge
Max addressed a critical practical challenge: AI model file sizes. "Even a small model is about one gigabyte. It's huge. And a VLA model is often over 15 gigabytes."
He outlined three deployment methods with AWS IoT Greengrass:
AWS IoT Greengrass Artifact Files - Simple but limited to 2GB
Docker Container (Recommended) - Bundle the model inside the container image. The limit depends on Amazon ECR. This is the method Max demonstrated and recommends because containers make development and testing much easier, and AWS IoT Greengrass can manage them well.
Download with Script - Download files using scripts in the AWS IoT Greengrass recipe. This avoids most limits but requires managing download verification and execution yourself.
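As a sketch of that third method, the snippet below downloads a model file at component startup and verifies a SHA-256 checksum before using it. The URL, destination path, and expected hash are placeholders; the point is that with this approach the verification logic is yours to write and maintain.

```python
# Hedged sketch of the "download with script" method: fetch the model,
# verify its checksum, and refuse to run if the file is corrupt.
# URL, path, and hash below are placeholders.
import hashlib
import pathlib

import requests

MODEL_URL = "https://example.com/models/vla-white-box.safetensors"  # placeholder
EXPECTED_SHA256 = "<expected sha256 hex digest>"                    # placeholder
DEST = pathlib.Path("/greengrass/v2/work/models/vla-white-box.safetensors")

def download_and_verify():
    DEST.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    with requests.get(MODEL_URL, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(DEST, "wb") as f:
            # Stream in 8 MB chunks so a 15 GB model never sits in memory.
            for chunk in resp.iter_content(chunk_size=8 * 1024 * 1024):
                f.write(chunk)
                digest.update(chunk)
    if digest.hexdigest() != EXPECTED_SHA256:
        DEST.unlink()
        raise RuntimeError("model checksum mismatch - refusing to use the file")

if __name__ == "__main__":
    download_and_verify()
```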
Max's recommendation: "Bundle the AI model in the container, as you saw in the Dockerfile. It is very simple to bundle the model this way. Containers make development and testing much easier."
Network Considerations
Max emphasized two important infrastructure points:
Network Environment - You don't need constant connectivity, but deploying and updating AI models requires a network that can move large amounts of data quickly. Indoors you can use WiFi, but outdoors or when WiFi isn't available, LTE is an option. Max mentioned that his company, Soracom, provides mobile connectivity for IoT and AI. He also noted that broadband satellite networks like Amazon Kuiper are another option.
File Size Management - With models ranging from 1GB to 15GB+, you need new knowledge to handle them. Using containers helps, but you still need skills to keep container layers small.
Key Takeaways
Physical AI is Real and Accessible - Max demonstrated that physical AI is already possible with simple hardware like Raspberry Pi 5 and web cameras
Measure Your Latency Requirements - The 100 millisecond barrier is a useful reference point. If your system can tolerate higher latency, Cloud inference remains viable
Three Cases for Local AI - Fast camera/robot combinations, safety-critical applications, and privacy/security requirements
Updateability is Critical - Don't lock your system to one model. Build for continuous updates from day one
AWS IoT Greengrass Enables Edge AI - Brings Cloud-style updates straight to devices, turning a difficult hardware challenge into a manageable software problem
Container-Based Deployment - Bundling models in Docker containers provides the best balance of simplicity, manageability, and flexibility
VLA Models Bridge Digital and Physical - Vision Language Action models extend generative AI from cyberspace into the physical world
Max's closing message captured the session's spirit perfectly: "AI will continue to grow and may become the new normal. Let's use its power in the real world with updateable local and edge AI, together with AWS IoT Greengrass."
About This Series
This post is part of DEV Track Spotlight, a series highlighting the incredible sessions from the AWS re:Invent 2025 Developer Community (DEV) track.
The DEV track featured 60 unique sessions delivered by 93 speakers from the AWS Community - including AWS Heroes, AWS Community Builders, and AWS User Group Leaders - alongside speakers from AWS and Amazon. These sessions covered cutting-edge topics including:
- GenAI & Agentic AI - Multi-agent systems, Strands Agents SDK, Amazon Bedrock
- Developer Tools - Kiro, Kiro CLI, Amazon Q Developer, AI-driven development
- Security - AI agent security, container security, automated remediation
- Infrastructure - Serverless, containers, edge computing, observability
- Modernization - Legacy app transformation, CI/CD, feature flags
- Data - Amazon Aurora DSQL, real-time processing, vector databases
Each post in this series dives deep into one session, sharing key insights, practical takeaways, and links to the full recordings. Whether you attended re:Invent or are catching up remotely, these sessions represent the best of our developer community sharing real code, real demos, and real learnings.
Follow along as we spotlight these amazing sessions and celebrate the speakers who made the DEV track what it was!