Swapin Vidya

Building Smaller Graph Neural Networks for Edge Healthcare Systems

How I explored INT8 quantization, biological graphs, and CPU-only inference using PyTorch Geometric.


Healthcare AI is often discussed in terms of massive cloud infrastructure and expensive GPUs.

But many real-world systems do not operate inside large datacenters.

Small clinics, portable medical systems, rural deployments, and edge diagnostic devices frequently depend on:

  • low-power CPUs
  • limited memory
  • unstable connectivity
  • compact hardware environments

That raises an important engineering question:

Can graph neural networks become smaller and more deployable without completely losing their predictive behavior?

This project explores that question using biological graph data, Graph Neural Networks (GNNs), and manual INT8 quantization.


The Project

BioGraph-Edge-Quantizer GitHub Repository

The repository focuses on:

  • biological graph inference
  • resource-aware deployment
  • CPU-only execution
  • model compression
  • reproducible benchmarking

The system uses:

  • Python
  • PyTorch Geometric
  • GraphSAGE
  • TorchScript
  • Laravel API integration

The goal was not to build a “medical AI product.”

Instead, the focus was:

understanding how graph-based AI systems behave under hardware constraints.


Why Graphs Matter in Biology

Many biological systems naturally behave like graphs.

For example:

  • proteins interact with other proteins
  • genes regulate other genes
  • molecular pathways form connected networks

In this project, the graph structure comes from protein interaction relationships inspired by the STRING dataset.

Each node represents a protein.
Each edge represents a relationship or interaction.

The model then attempts a binary node classification task.

Simplified example:

  • Does this protein belong to a target functional category?
  • Is this interaction pattern significant?

This is where Graph Neural Networks become useful.


Why Use Graph Neural Networks?

Traditional neural networks process:

  • images
  • text
  • tabular data

But biological systems are highly interconnected.

GNNs are useful because they learn:

  • relationships
  • neighborhood behavior
  • graph structure

This project uses GraphSAGE, which is designed for inductive graph learning.

That means:

  • the model can generalize to unseen nodes
  • inference is more flexible for evolving graphs
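To make that concrete, here is a minimal sketch of what a GraphSAGE-style layer computes: each node combines its own features with the mean of its neighbors' features, then passes the result through a learned linear map. This is a hand-rolled illustration in plain PyTorch, not the repository's actual code (which uses PyTorch Geometric's `SAGEConv`):

```python
import torch
import torch.nn as nn

class TinySAGELayer(nn.Module):
    """Mean-aggregation GraphSAGE layer: h_v = W [x_v || mean(x_u for u in N(v))]."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # edge_index is a [2, num_edges] tensor of (source, target) pairs
        src, dst = edge_index
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])                        # sum neighbor features
        deg = torch.zeros(x.size(0)).index_add_(0, dst, torch.ones(dst.size(0)))
        agg = agg / deg.clamp(min=1).unsqueeze(-1)            # mean over neighbors
        return self.lin(torch.cat([x, agg], dim=-1))

# 4 proteins with 3-dim features, a few interactions;
# 2 output logits for the binary node classification task
x = torch.randn(4, 3)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])
layer = TinySAGELayer(3, 2)
logits = layer(x, edge_index)   # shape: [4, 2]
```

Because the layer only looks at a node's local neighborhood, the same weights apply to nodes the model never saw during training, which is what makes the approach inductive.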

The Real Problem: Edge Deployment

Most machine learning tutorials stop after:

“The model works.”

But deployment creates a different set of challenges:

  • memory limits
  • latency stability
  • CPU constraints
  • model size
  • reproducibility

This project explores:

  • INT8 weight packing
  • TorchScript deployment
  • bounded inference variance
  • edge-device behavior
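For the TorchScript part, the general pattern is: compile the module, save it, and load it back for CPU-only inference. The sketch below uses a plain linear stand-in model rather than the repository's GNN, and an in-memory buffer in place of the usual `.pt` file:

```python
import io
import torch
import torch.nn as nn

# A stand-in model; the real project scripts its GNN instead.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

scripted = torch.jit.script(model)          # compile to TorchScript
buf = io.BytesIO()
torch.jit.save(scripted, buf)               # normally a .pt file on disk
buf.seek(0)

loaded = torch.jit.load(buf, map_location="cpu")
x = torch.randn(1, 8)
with torch.no_grad():
    out_original = model(x)
    out_loaded = loaded(x)                  # round-trip gives the same outputs
```

The loaded artifact no longer needs the original Python class definitions, which is what makes it convenient to ship to an edge device.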

What is INT8 Quantization?

Most neural networks store weights using FP32 (32-bit floating point values).

Quantization reduces precision.

Instead of:

  • 32-bit weights

we use:

  • 8-bit integer weights

The tradeoff:

  • smaller model
  • lower memory usage
  • possible accuracy reduction

In this project:

  • weights were manually converted to INT8
  • scale factors were stored separately
  • dequantization happened during inference
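The mechanics of that manual scheme can be sketched like this. This is a symmetric per-tensor variant; the repository's exact packing format may differ:

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor quantization: map weights onto the int8 range [-127, 127]."""
    scale = w.abs().max().item() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale            # int8 weights, plus the FP32 scale stored separately

def dequantize(q: torch.Tensor, scale: float) -> torch.Tensor:
    """Recover approximate FP32 weights at inference time."""
    return q.float() * scale

w = torch.randn(64, 32)        # a weight matrix, stored in 1/4 the memory as int8
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = (w - w_hat).abs().max().item()   # rounding error is bounded by scale / 2
```

Storing 8 bits instead of 32 is where the roughly 4x size reduction comes from; the bounded rounding error is why the accuracy drop can stay small.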

The Results

The interesting part was not raw speed.

It was understanding where quantization actually helps.

x86 Laptop Results

Hardware:

  • Intel i5-10210U
  • 8 GB RAM
  • Windows 11

Results:

  • ~75% reduction in model size
  • only a marginal latency improvement
  • accuracy drop of less than 1%

That initially seemed disappointing.

But the explanation matters.

Graph neural networks are often:

  • memory-bound
  • aggregation-heavy

The bottleneck was not matrix multiplication alone.

It was:

  • graph traversal
  • feature movement
  • neighbor aggregation
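One way to see this kind of bottleneck is to time repeated CPU inference and look at the latency distribution, not just a single number. A minimal harness (illustrative, not the repository's benchmark script):

```python
import statistics
import time

def benchmark(fn, warmup: int = 3, runs: int = 30):
    """Time fn() repeatedly and report median and worst-case latency in ms."""
    for _ in range(warmup):
        fn()                                   # warm caches and lazy allocations
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}

stats = benchmark(lambda: sum(i * i for i in range(10_000)))  # stand-in workload
```

Timing the full forward pass against the matrix multiplications alone is what reveals whether aggregation and feature movement, rather than arithmetic, dominate.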

ARM Edge Device Results

The same experiment was tested on:

  • Raspberry Pi 4
  • Cortex-A72 CPU
  • 4 GB RAM

This time the gains became more noticeable.

Why?

Because smaller devices have:

  • tighter memory limits
  • smaller cache capacity
  • lower memory bandwidth

In those environments:

  • reduced model size matters more
  • memory pressure becomes a real constraint

This is an important observation for:

  • hardware vendors
  • embedded AI systems
  • edge healthcare infrastructure

Current System Architecture

The repository currently separates:

  • ML inference
  • API infrastructure

ML Layer

Python + PyTorch Geometric

API Layer

Laravel-based gateway

Current flow:

```
Laravel → Python subprocess → GNN inference → API response
```

This is intentionally simple for experimentation.
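On the Python side, a subprocess contract like this can be as simple as JSON in, JSON out. A sketch, with hypothetical field names rather than the repository's actual schema:

```python
import json

def handle_request(raw: str) -> str:
    """Parse one JSON request from the gateway and return a JSON response.

    Inside the subprocess, this would be wrapped by reading sys.stdin
    and writing the result to sys.stdout for Laravel to capture.
    """
    req = json.loads(raw)
    node_ids = req["node_ids"]
    # Placeholder inference: the real handler would run the quantized GNN here.
    preds = [0 for _ in node_ids]
    return json.dumps({"predictions": preds})

# Local demo of the request/response round-trip:
response = handle_request(json.dumps({"node_ids": [0, 1, 2]}))
```

The cost of this design is one interpreter startup per request, which is part of the subprocess overhead the repository documents.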

The repository also documents:

  • subprocess overhead
  • scalability limitations
  • future migration plans

Why This Matters for Medical Informatics

Medical informatics is not only about large AI models.

It is also about:

  • interoperability
  • infrastructure
  • reproducibility
  • deployment reliability
  • hardware-aware engineering

Even experimental systems benefit from:

  • deterministic execution
  • controlled benchmarking
  • transparent limitations

This project is not a clinical system.
It is a systems-engineering exploration around biological graph inference.


Important Limitations

A good engineering project should clearly state its limitations.

Current limitations include:

  • limited benchmarking scope
  • no clinical validation
  • subprocess overhead
  • no distributed inference
  • limited quantization optimization
  • prototype-level architecture

The project intentionally avoids claiming:

  • medical accuracy
  • production readiness
  • diagnostic capability

Why I Shared This Publicly

I wanted to document:

  • how edge AI systems behave
  • where quantization helps
  • where it does not
  • how biological graph workloads differ from standard AI pipelines

Many tutorials simplify deployment problems.

But practical ML engineering often involves:

  • bottlenecks
  • memory constraints
  • unstable performance behavior
  • tradeoffs between accuracy and footprint

Understanding those tradeoffs is valuable for:

  • developers
  • researchers
  • hardware engineers
  • students entering medical informatics

Repository

GitHub

BioGraph-Edge-Quantizer Repository


Areas Open for Collaboration

I would especially love feedback from people working in:

  • graph neural networks
  • embedded inference
  • medical informatics
  • edge hardware systems
  • PyTorch optimization
  • ONNX / TVM / ExecuTorch
  • biological network analysis

About the Author

Swapin Vidya

Interested in:

  • edge AI systems
  • reproducible ML infrastructure
  • biological graph computing
  • hardware-aware inference pipelines
  • healthcare-oriented systems engineering

GitHub:
Swapin Vidya GitHub

ORCID:
Swapin Vidya ORCID
