How I explored INT8 quantization, biological graphs, and CPU-only inference using PyTorch Geometric.
Healthcare AI is often discussed in terms of massive cloud infrastructure and expensive GPUs.
But many real-world systems do not operate inside large datacenters.
Small clinics, portable medical systems, rural deployments, and edge diagnostic devices frequently depend on:
- low-power CPUs
- limited memory
- unstable connectivity
- compact hardware environments
That raises an important engineering question:
Can graph neural networks become smaller and more deployable without completely losing their predictive behavior?
This project explores that question using biological graph data, Graph Neural Networks (GNNs), and manual INT8 quantization.
The Project
BioGraph-Edge-Quantizer GitHub Repository
The repository focuses on:
- biological graph inference
- resource-aware deployment
- CPU-only execution
- model compression
- reproducible benchmarking
The system uses:
- Python
- PyTorch Geometric
- GraphSAGE
- TorchScript
- Laravel API integration
The goal was not to build a “medical AI product.”
Instead, the focus was:
understanding how graph-based AI systems behave under hardware constraints.
Why Graphs Matter in Biology
Many biological systems naturally behave like graphs.
For example:
- proteins interact with other proteins
- genes regulate other genes
- molecular pathways form connected networks
In this project, the graph structure comes from protein interaction relationships inspired by the STRING dataset.
Each node represents a protein.
Each edge represents a relationship or interaction.
The model then attempts a binary node classification task.
Simplified examples:
- Does this protein belong to a target functional category?
- Is this interaction pattern significant?
This is where Graph Neural Networks become useful.
Why Use Graph Neural Networks?
Traditional neural networks process:
- images
- text
- tabular data
But biological systems are highly interconnected.
GNNs are useful because they learn:
- relationships
- neighborhood behavior
- graph structure
This project uses GraphSAGE, which is designed for inductive graph learning.
That means:
- the model can generalize to unseen nodes
- inference is more flexible for evolving graphs
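The repository uses PyG's SAGEConv layers; the snippet below is only a plain-PyTorch sketch of the mean-aggregation idea behind GraphSAGE, with a hypothetical toy graph (the feature sizes, weights, and edges are made up for illustration).

```python
import torch

def sage_mean_layer(x, edge_index, w_self, w_neigh):
    """One GraphSAGE-style layer: combine each node's own features
    with the mean of its neighbors' features."""
    src, dst = edge_index                  # edges point src -> dst
    n = x.size(0)
    agg = torch.zeros_like(x)
    agg.index_add_(0, dst, x[src])         # sum neighbor features per node
    deg = torch.zeros(n).index_add_(0, dst, torch.ones(src.size(0)))
    agg = agg / deg.clamp(min=1).unsqueeze(1)   # turn sums into means
    return torch.relu(x @ w_self + agg @ w_neigh)

# Toy graph: 3 proteins, 4-dim features, undirected edges 0-1 and 1-2.
x = torch.randn(3, 4)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
w_self = torch.randn(4, 8)
w_neigh = torch.randn(4, 8)
h = sage_mean_layer(x, edge_index, w_self, w_neigh)
print(h.shape)  # torch.Size([3, 8])
```

Because the layer only needs a node's local neighborhood, the same weights apply to nodes that were never seen during training, which is what makes the approach inductive.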
The Real Problem: Edge Deployment
Most machine learning tutorials stop after:
“The model works.”
But deployment creates a different set of challenges:
- memory limits
- latency stability
- CPU constraints
- model size
- reproducibility
This project explores:
- INT8 weight packing
- TorchScript deployment
- bounded inference variance
- edge-device behavior
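The TorchScript part is worth a concrete sketch. A stand-in model is compiled and saved below (the repository scripts its quantized GraphSAGE instead; the file name here is arbitrary):

```python
import torch

# A stand-in model; the real project exports its GNN this way.
model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
model.eval()

scripted = torch.jit.script(model)   # compile to TorchScript
scripted.save("model_scripted.pt")   # self-contained artifact

# The saved file loads without the original Python class definition,
# which simplifies CPU-only deployment on edge devices.
restored = torch.jit.load("model_scripted.pt")
out = restored(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 4])
```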
What is INT8 Quantization?
Most neural networks store weights using FP32 (32-bit floating point values).
Quantization reduces precision.
Instead of 32-bit floating-point weights, we store 8-bit integer weights.
The tradeoff:
- smaller model
- lower memory usage
- possible accuracy reduction
In this project:
- weights were manually converted to INT8
- scale factors were stored separately
- dequantization happened during inference
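A minimal sketch of that scheme, assuming symmetric per-tensor quantization (the repository's exact packing format and scale granularity may differ):

```python
import torch

def quantize_int8(w):
    """Map an FP32 weight tensor to INT8 plus a per-tensor scale."""
    scale = w.abs().max() / 127.0   # symmetric: zero-point is 0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation at inference time."""
    return q.to(torch.float32) * scale

w = torch.randn(64, 64)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = (w - w_hat).abs().max().item()
print(q.dtype)             # torch.int8
print(err < scale.item())  # True: rounding error stays below one scale step
```

Each INT8 weight costs one byte instead of four, and the FP32 scale factors stored alongside are negligible by comparison.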
The Results
The interesting part was not raw speed.
It was understanding where quantization actually helps.
x86 Laptop Results
Hardware:
- Intel i5-10210U
- 8 GB RAM
- Windows 11
Results:
- ~75% reduction in model size
- very small latency improvement
- accuracy drop below 1%
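The ~75% figure follows directly from the storage math (the parameter count below is hypothetical, just to show the ratio):

```python
# FP32 weights: 4 bytes each; INT8 weights: 1 byte each.
n_params = 100_000                 # hypothetical parameter count
fp32_bytes = n_params * 4
int8_bytes = n_params * 1 + 8      # plus a few bytes of scale factors
reduction = 1 - int8_bytes / fp32_bytes
print(f"{reduction:.1%}")  # 75.0%
```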
That initially seemed disappointing.
But the explanation matters.
Graph neural networks are often:
- memory-bound
- aggregation-heavy
The bottleneck was not matrix multiplication alone.
It was:
- graph traversal
- feature movement
- neighbor aggregation
ARM Edge Device Results
The same experiment was then repeated on:
- Raspberry Pi 4
- Cortex-A72 CPU
- 4 GB RAM
This time the gains became more noticeable.
Why?
Because smaller devices have:
- tighter memory limits
- smaller cache capacity
- lower memory bandwidth
In those environments:
- reduced model size matters more
- memory pressure becomes a real constraint
This is an important observation for:
- hardware vendors
- embedded AI systems
- edge healthcare infrastructure
Current System Architecture
The repository currently separates:
- ML inference
- API infrastructure
ML Layer
Python + PyTorch Geometric
API Layer
Laravel-based gateway
Current flow:
Laravel → Python subprocess → GNN inference → API response
This is intentionally simple for experimentation.
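The Python side of that flow can be sketched as a handler that parses a JSON request and returns a JSON response (the field names and dummy scores are assumptions for illustration, not the repository's actual interface):

```python
import json

def handle_request(raw_json):
    """Parse a request from the Laravel gateway, run inference,
    and return a JSON response string."""
    request = json.loads(raw_json)
    node_ids = request["node_ids"]
    # Placeholder for the real TorchScript GNN forward pass.
    scores = {str(n): 0.5 for n in node_ids}
    return json.dumps({"scores": scores})

# In the real flow, Laravel spawns the script as a subprocess and
# exchanges this JSON over stdin/stdout.
print(handle_request('{"node_ids": [101, 202]}'))
```

Spawning a fresh interpreter per request is the main source of the subprocess overhead noted below.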
The repository also documents:
- subprocess overhead
- scalability limitations
- future migration plans
Why This Matters for Medical Informatics
Medical informatics is not only about large AI models.
It is also about:
- interoperability
- infrastructure
- reproducibility
- deployment reliability
- hardware-aware engineering
Even experimental systems benefit from:
- deterministic execution
- controlled benchmarking
- transparent limitations
This project is not a clinical system.
It is a systems-engineering exploration around biological graph inference.
Important Limitations
A good engineering project should clearly state its limitations.
Current limitations include:
- limited benchmarking scope
- no clinical validation
- subprocess overhead
- no distributed inference
- limited quantization optimization
- prototype-level architecture
The project intentionally avoids claiming:
- medical accuracy
- production readiness
- diagnostic capability
Why I Shared This Publicly
I wanted to document:
- how edge AI systems behave
- where quantization helps
- where it does not
- how biological graph workloads differ from standard AI pipelines
Many tutorials simplify deployment problems.
But practical ML engineering often involves:
- bottlenecks
- memory constraints
- unstable performance behavior
- tradeoffs between accuracy and footprint
Understanding those tradeoffs is valuable for:
- developers
- researchers
- hardware engineers
- students entering medical informatics
Repository
GitHub
BioGraph-Edge-Quantizer Repository
Areas Open for Collaboration
I would especially love feedback from people working in:
- graph neural networks
- embedded inference
- medical informatics
- edge hardware systems
- PyTorch optimization
- ONNX / TVM / ExecuTorch
- biological network analysis
About the Author
Swapin Vidya
Interested in:
- edge AI systems
- reproducible ML infrastructure
- biological graph computing
- hardware-aware inference pipelines
- healthcare-oriented systems engineering
GitHub:
Swapin Vidya GitHub
ORCID:
Swapin Vidya ORCID