Rob Johnston

Posted on Feb 2 • Originally published at robjohnston.github.io

Depend on Abstractions: Testing Without the $50,000 Spectrometer

#scientificprogramming #solidprinciples #dependencyinversion

Part 5 of the SOLID Principles for Scientific Programmers series

The Problem Every Scientist Has Faced

You've written a data acquisition script that reads from your laboratory spectrometer—a $50,000 instrument that's booked solid for the next two weeks. Your analysis code works perfectly... when you have access to the hardware.
Then your advisor asks: "Can you run the analysis on last month's data to compare?" You can't—the code only reads from the live spectrometer. "Can you test the new averaging algorithm before the experiment?" You can't—testing requires the actual hardware. "The spectrometer is down for calibration—can you still work on the code?" You can't. Your colleague in another lab wants to use your analysis but has different equipment? Your code is hardwired to your specific sensor.

Your analysis logic is imprisoned by hardware dependencies. Brilliant scientific algorithms rendered useless by tight coupling. Your beautiful working code has become a rigid, untestable monolith.

This is the problem the Dependency Inversion Principle (DIP) solves.

What Is the Dependency Inversion Principle?

The Dependency Inversion Principle states:

High-level modules should not depend on low-level modules. Both should depend on abstractions.

Abstractions should not depend on details. Details should depend on abstractions.

In plain English: Your important scientific logic shouldn't be hardwired to specific hardware, file formats, or external services. Instead, it should depend on interfaces or abstract descriptions of what it needs. The concrete implementations can then be swapped out as needed.

Before You Refactor: Is It Worth It?

DIP adds abstraction layers that take time to design and implement. Before refactoring, consider:

Do you need to test without hardware? This is DIP's killer feature for scientists.
Will you swap implementations (file → database, real sensor → simulated, etc.)?
Do multiple people work on the code? DIP enables parallel development.
Is the dependency causing pain (hardware unavailable, slow tests, inflexible code)?

If you're blocked because you can't test without the $50K spectrometer connected, DIP refactoring is essential.

A Real Example: The Problem

Let's look at a temperature monitoring system for a materials science experiment. Here's what you might write first:

import serial
import csv
from datetime import datetime

class TemperatureMonitor:
    def __init__(self):
        # Hardcoded dependency on specific hardware
        self.sensor = serial.Serial('/dev/ttyUSB0', baudrate=9600)
        # Hardcoded dependency on specific file format
        self.output_file = 'temperature_data.csv'

    def collect_data(self, duration_seconds):
        """Collect temperature data for specified duration."""
        results = []
        start_time = datetime.now()

        # total_seconds() gives the full elapsed time; .seconds wraps around after 24 hours
        while (datetime.now() - start_time).total_seconds() < duration_seconds:
            # Read from serial sensor
            raw_data = self.sensor.readline()
            temperature = float(raw_data.decode().strip())
            timestamp = datetime.now()

            results.append((timestamp, temperature))

            # Check if temperature is in safe range
            if temperature > 100:
                print(f"WARNING: Temperature {temperature}°C exceeds safe limit!")

        # Save to CSV
        with open(self.output_file, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['Timestamp', 'Temperature'])
            writer.writerows(results)

        return results

# Usage
monitor = TemperatureMonitor()
data = monitor.collect_data(3600)  # Collect for 1 hour

Problems with This Design

This code works, but it has serious problems:

Can't test without hardware: You need the physical sensor connected to run any tests
Can't reuse the logic: The safety check and data collection logic is tied to this specific sensor
Can't work with historical data: No way to run the same analysis on previously collected data
Can't switch output formats: What if you want to save to a database instead of CSV?
Can't simulate failures: How do you test what happens when the sensor malfunctions?

The Solution: Dependency Inversion

Let's refactor this using the Dependency Inversion Principle. First, we define abstractions for the things we depend on:

BEFORE (tight coupling):         AFTER (dependency inversion):

┌─────────────────────┐         ┌──────────────────────────────────┐
│ TemperatureMonitor  │         │ TemperatureMonitor               │
│                     │         │                                  │
│ creates:            │         │ depends on:                      │
│  └─> SerialSensor   │         │  └─> TemperatureSensor (abstract)│
│  └─> CSVFile        │         │  └─> DataStorage (abstract)      │
└─────────────────────┘         └──────────────────────────────────┘
         │                                   △
         │ (rigid)                           │ (flexible)
         ▼                          ┌────────┴────────┐
┌─────────────────┐                 │                 │
│ Hardware        │         ┌──────────┐      ┌──────────┐
│ (must exist)    │         │  Serial  │      │   Mock   │
└─────────────────┘         │  Sensor  │      │  Sensor  │
                            └──────────┘      └──────────┘
  ❌ Can't test!                     ✅ Can test anytime!

from abc import ABC, abstractmethod
from datetime import datetime
from typing import List, Tuple

# ABSTRACTION: What we need from a temperature source
class TemperatureSensor(ABC):
    """Abstract interface for any temperature data source."""

    @abstractmethod
    def read_temperature(self) -> float:
        """Read current temperature in Celsius."""
        pass

# ABSTRACTION: What we need from a data storage mechanism
class DataStorage(ABC):
    """Abstract interface for storing temperature measurements."""

    @abstractmethod
    def save(self, data: List[Tuple[datetime, float]]) -> None:
        """Save temperature data."""
        pass

# HIGH-LEVEL MODULE: Now depends only on abstractions
class TemperatureMonitor:
    def __init__(self, sensor: TemperatureSensor, storage: DataStorage):
        # Dependencies are injected, not created internally
        self.sensor = sensor
        self.storage = storage

    def collect_data(self, duration_seconds: int) -> List[Tuple[datetime, float]]:
        """Collect temperature data for specified duration."""
        results = []
        start_time = datetime.now()

        # total_seconds() gives the full elapsed time; .seconds wraps around after 24 hours
        while (datetime.now() - start_time).total_seconds() < duration_seconds:
            temperature = self.sensor.read_temperature()
            timestamp = datetime.now()

            results.append((timestamp, temperature))

            # Business logic is now independent of hardware details
            if temperature > 100:
                print(f"WARNING: Temperature {temperature}°C exceeds safe limit!")

        self.storage.save(results)
        return results

Now we create concrete implementations of our abstractions:

# CONCRETE IMPLEMENTATIONS: Different sensors and different storage

import serial
import csv

class SerialTemperatureSensor(TemperatureSensor):
    def __init__(self, port: str, baudrate: int = 9600):
        self.sensor = serial.Serial(port, baudrate=baudrate)

    def read_temperature(self) -> float:
        raw_data = self.sensor.readline()
        return float(raw_data.decode().strip())

class CSVStorage(DataStorage):
    def __init__(self, filename: str):
        self.filename = filename

    def save(self, data: List[Tuple[datetime, float]]) -> None:
        with open(self.filename, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['Timestamp', 'Temperature'])
            writer.writerows(data)

# USAGE: Production - same as before
monitor = TemperatureMonitor(
    sensor=SerialTemperatureSensor('/dev/ttyUSB0'),
    storage=CSVStorage('temperature_data.csv')
)
data = monitor.collect_data(3600)

Why This Is Better: Flexibility Unlocked

Now that we've inverted the dependencies, we can easily create alternative implementations:

1. Test Without Hardware

import random

# MOCK IMPLEMENTATIONS: For testing without hardware
class MockTemperatureSensor(TemperatureSensor):
    """Simulated sensor for testing."""

    def __init__(self, base_temp: float = 25.0, noise: float = 0.5):
        self.base_temp = base_temp
        self.noise = noise

    def read_temperature(self) -> float:
        # Simulate realistic temperature readings
        return self.base_temp + random.uniform(-self.noise, self.noise)

class InMemoryStorage(DataStorage):
    """Store data in memory for testing."""

    def __init__(self):
        self.data = []

    def save(self, data: List[Tuple[datetime, float]]) -> None:
        self.data = data

# Now we can test without any hardware!
test_monitor = TemperatureMonitor(
    sensor=MockTemperatureSensor(base_temp=25.0),
    storage=InMemoryStorage()
)
# The mock returns instantly, so even one second yields a very large number of readings
test_data = test_monitor.collect_data(1)
print(f"Collected {len(test_data)} test measurements")

Benefit: Develop and test anywhere, anytime. No hardware booking required.
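
The mock above draws its noise from the global random module, so two runs see different readings. If you want reproducible test data, one option is to give the mock its own seeded generator. This is a minimal sketch: SeededMockSensor and its seed parameter are hypothetical additions, and the TemperatureSensor interface is repeated only so the block runs on its own.

```python
import random
from abc import ABC, abstractmethod


class TemperatureSensor(ABC):
    """Same interface as above, repeated so this block is self-contained."""

    @abstractmethod
    def read_temperature(self) -> float:
        """Read current temperature in Celsius."""


class SeededMockSensor(TemperatureSensor):
    """Hypothetical variant of MockTemperatureSensor with reproducible noise."""

    def __init__(self, base_temp: float = 25.0, noise: float = 0.5, seed: int = 0):
        self.base_temp = base_temp
        self.noise = noise
        # A private generator keeps this sensor independent of global random state.
        self._rng = random.Random(seed)

    def read_temperature(self) -> float:
        return self.base_temp + self._rng.uniform(-self.noise, self.noise)


# Two sensors built with the same seed produce identical sequences.
sensor_a = SeededMockSensor(seed=42)
sensor_b = SeededMockSensor(seed=42)
readings_a = [sensor_a.read_temperature() for _ in range(5)]
readings_b = [sensor_b.read_temperature() for _ in range(5)]
```

With a fixed seed, a failing test can be rerun against exactly the same simulated readings.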

2. Test Edge Cases Safely

class FailingTemperatureSensor(TemperatureSensor):
    """Simulate sensor failures for testing error handling."""

    def __init__(self, fail_after: int = 10):
        self.read_count = 0
        self.fail_after = fail_after

    def read_temperature(self) -> float:
        self.read_count += 1
        if self.read_count > self.fail_after:
            raise IOError("Sensor connection lost!")
        return 25.0

# Test failure handling
monitor = TemperatureMonitor(
    sensor=FailingTemperatureSensor(fail_after=5),
    storage=InMemoryStorage()
)
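# Calling collect_data() raises IOError on the sixth read,
# so now you can test your error handling:
try:
    monitor.collect_data(1)
except IOError as exc:
    print(f"Sensor failure detected: {exc}")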

Benefit: Simulate dangerous conditions (overheating, sensor failures) without risk.
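
A failure like this can also be pinned down in an automated test. The sketch below uses unittest's assertRaises. To stay short it replaces the monitor with a hypothetical read_n helper (not part of the code above) that simply takes n readings; the sensor classes are repeated so the block runs on its own.

```python
import unittest
from abc import ABC, abstractmethod


class TemperatureSensor(ABC):
    @abstractmethod
    def read_temperature(self) -> float:
        """Read current temperature in Celsius."""


class FailingTemperatureSensor(TemperatureSensor):
    """Returns 25.0 for the first `fail_after` reads, then raises IOError."""

    def __init__(self, fail_after: int = 10):
        self.read_count = 0
        self.fail_after = fail_after

    def read_temperature(self) -> float:
        self.read_count += 1
        if self.read_count > self.fail_after:
            raise IOError("Sensor connection lost!")
        return 25.0


def read_n(sensor: TemperatureSensor, n: int) -> list:
    """Hypothetical stand-in for the monitor loop: take n readings."""
    return [sensor.read_temperature() for _ in range(n)]


class TestSensorFailure(unittest.TestCase):
    def test_reads_succeed_before_failure(self):
        sensor = FailingTemperatureSensor(fail_after=5)
        self.assertEqual(read_n(sensor, 5), [25.0] * 5)

    def test_failure_propagates(self):
        sensor = FailingTemperatureSensor(fail_after=5)
        with self.assertRaises(IOError):
            read_n(sensor, 6)


# Run the suite programmatically so the block also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSensorFailure)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same pattern should carry over to TemperatureMonitor itself: wrap the collect_data call in assertRaises and inject the failing sensor.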

3. Reuse Logic With Different Sources

class HistoricalDataSensor(TemperatureSensor):
    """Replay previously recorded data."""

    def __init__(self, filename: str):
        with open(filename, 'r') as f:
            reader = csv.reader(f)
            next(reader)  # Skip header
            self.temperatures = [float(row[1]) for row in reader]
        self.index = 0

    def read_temperature(self) -> float:
        if self.index >= len(self.temperatures):
            raise IndexError("No more historical data")
        temp = self.temperatures[self.index]
        self.index += 1
        return temp

# Analyze last week's data with the same code!
historical_monitor = TemperatureMonitor(
    sensor=HistoricalDataSensor('last_week_data.csv'),
    storage=InMemoryStorage()
)

Benefit: Same analysis code works with live data, historical data, or simulations.
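
One thing to watch: a replayed file is finite, while collect_data loops on elapsed time, so a replay sensor will eventually raise IndexError. Below is a rough sketch of one way to consume a finite source cleanly. StreamReplaySensor (a variant that reads from any open text stream) and the drain helper are hypothetical names, and the sample rows are made up for illustration.

```python
import csv
import io
from abc import ABC, abstractmethod


class TemperatureSensor(ABC):
    @abstractmethod
    def read_temperature(self) -> float:
        """Read current temperature in Celsius."""


class StreamReplaySensor(TemperatureSensor):
    """Hypothetical variant of HistoricalDataSensor that reads from any
    open text stream instead of a filename."""

    def __init__(self, stream):
        reader = csv.reader(stream)
        next(reader)  # Skip header
        self.temperatures = [float(row[1]) for row in reader]
        self.index = 0

    def read_temperature(self) -> float:
        if self.index >= len(self.temperatures):
            raise IndexError("No more historical data")
        temp = self.temperatures[self.index]
        self.index += 1
        return temp


def drain(sensor: TemperatureSensor) -> list:
    """Hypothetical helper: read until the sensor signals end of data."""
    readings = []
    while True:
        try:
            readings.append(sensor.read_temperature())
        except IndexError:
            return readings


# Made-up sample rows in the Timestamp,Temperature layout that CSVStorage writes.
sample = io.StringIO(
    "Timestamp,Temperature\n"
    "2024-01-01 00:00:00,24.8\n"
    "2024-01-01 00:00:01,101.5\n"
    "2024-01-01 00:00:02,25.1\n"
)
readings = drain(StreamReplaySensor(sample))
over_limit = [t for t in readings if t > 100]
```

Accepting a stream rather than a filename also means the replay sensor can be tested with io.StringIO, with no file on disk.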

4. Swap Components Freely

import json
import sqlite3

class JSONStorage(DataStorage):
    """Save data as JSON."""

    def __init__(self, filename: str):
        self.filename = filename

    def save(self, data: List[Tuple[datetime, float]]) -> None:
        json_data = [
            {"timestamp": ts.isoformat(), "temperature": temp}
            for ts, temp in data
        ]
        with open(self.filename, 'w') as f:
            json.dump(json_data, f, indent=2)

class DatabaseStorage(DataStorage):
    """Save data to SQLite database."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''
            CREATE TABLE IF NOT EXISTS temperatures
            (timestamp TEXT, temperature REAL)
        ''')

    def save(self, data: List[Tuple[datetime, float]]) -> None:
        self.conn.executemany(
            'INSERT INTO temperatures VALUES (?, ?)',
            [(ts.isoformat(), temp) for ts, temp in data]
        )
        self.conn.commit()

# Same monitoring code, different storage!
monitor_json = TemperatureMonitor(
    sensor=SerialTemperatureSensor('/dev/ttyUSB0'),
    storage=JSONStorage('temps.json')
)

monitor_db = TemperatureMonitor(
    sensor=SerialTemperatureSensor('/dev/ttyUSB0'),
    storage=DatabaseStorage('temps.db')
)

Benefit: Change storage without touching analysis logic.

The Testing Advantage

The real power of DIP becomes clear when writing tests. Here's a complete example:

import unittest

class TestTemperatureMonitor(unittest.TestCase):
    def setUp(self):
        """Set up test fixtures."""
        self.storage = InMemoryStorage()
        self.sensor = MockTemperatureSensor(base_temp=25.0)
        self.monitor = TemperatureMonitor(self.sensor, self.storage)

    def test_collects_data(self):
        """Test that data collection works."""
        data = self.monitor.collect_data(5)
        self.assertGreater(len(data), 0)
        self.assertEqual(len(self.storage.data), len(data))

    def test_temperature_in_range(self):
        """Test that temperatures are reasonable."""
        data = self.monitor.collect_data(5)
        for timestamp, temp in data:
            self.assertGreater(temp, 20)
            self.assertLess(temp, 30)

    def test_handles_high_temperature(self):
        """Test warning for high temperatures."""
        hot_sensor = MockTemperatureSensor(base_temp=150.0)
        monitor = TemperatureMonitor(hot_sensor, self.storage)

        # Would print warnings, but doesn't crash
        data = monitor.collect_data(5)
        self.assertGreater(len(data), 0)

# Run tests without any hardware connected!
if __name__ == '__main__':
    unittest.main()

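No hardware, no external files, no network: just reliable tests you can run anywhere. Each test still spends its full five seconds in the collection loop, and because the mock sensor returns instantly, those five seconds produce a very large number of readings (and, for the hot sensor, warnings), so keep test durations short.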
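
One way to remove the remaining wall-clock dependency is to treat time as just another injected dependency. The sketch below is an assumption about how that could look, not part of the original design: Clock, SystemClock, FakeClock, ConstantSensor and the free function collect are all hypothetical names.

```python
import time
from abc import ABC, abstractmethod
from typing import List, Tuple


class Clock(ABC):
    """Hypothetical abstraction over 'what time is it?', in seconds."""

    @abstractmethod
    def now(self) -> float: ...


class SystemClock(Clock):
    """Production clock backed by the monotonic system timer."""

    def now(self) -> float:
        return time.monotonic()


class FakeClock(Clock):
    """Test clock that advances by a fixed step every time it is read."""

    def __init__(self, step: float = 1.0):
        self.step = step
        self.current = 0.0

    def now(self) -> float:
        value = self.current
        self.current += self.step
        return value


class ConstantSensor:
    """Stand-in sensor that always reports the same temperature."""

    def __init__(self, value: float):
        self.value = value

    def read_temperature(self) -> float:
        return self.value


def collect(sensor, clock: Clock, duration_seconds: float) -> List[Tuple[float, float]]:
    """Same idea as collect_data, but time comes from the injected clock."""
    results = []
    start = clock.now()
    while True:
        t = clock.now()
        if t - start >= duration_seconds:
            break
        results.append((t, sensor.read_temperature()))
    return results


# Ten simulated seconds pass without any real waiting.
data = collect(ConstantSensor(25.0), FakeClock(step=1.0), duration_seconds=10)
```

With the fake clock, the number of readings is fixed by the step size, so a test can assert on it exactly. In production you would pass SystemClock() instead.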

Real-World Consequences of Tight Coupling

When code depends directly on hardware and external systems:

Graduate student scenario:

Student A writes analysis code tightly coupled to the lab's spectrometer
Student B needs to work on the same code
Only one person can develop at a time—hardware conflict
Testing requires booking lab time and connecting equipment
Bug appears only with certain samples—can't reproduce in testing
Student graduates, code stops working when hardware is upgraded
New student spends months deciphering hardware-dependent code

The problem: The valuable analysis logic is imprisoned by hardware dependencies. Brilliant algorithms become useless when equipment changes.

With DIP: Analysis logic is independent. Test with mock data, develop anywhere, swap hardware freely. The valuable scientific code survives equipment upgrades.

Red Flags That You Need DIP

Watch for these warning signs:

You can't run tests without physical hardware connected
"Just use the production database for development"
Code won't compile/run unless external services are available
You write if testing: ... else: ... branches throughout your code
Team members fight over access to shared hardware
You comment out tests because they require equipment
Changes to hardware require rewriting business logic
You can't work on code while equipment is in use
Switching from file to database requires massive refactoring
Your test suite takes hours because it talks to real systems

If you can't test without expensive equipment, you need DIP.

Common Mistakes: Over-Abstraction

The biggest DIP mistake: abstracting everything.

❌ Don't abstract:

# WRONG: Abstracting basic operations
class Adder(ABC):
    @abstractmethod
    def add(self, a, b): pass

class NumpyAdder(Adder):
    def add(self, a, b):
        return a + b  # Ridiculous!

Just use a + b directly.

✅ Do abstract:

# RIGHT: Abstracting external dependencies
class TemperatureSensor(ABC):
    @abstractmethod
    def read_temperature(self): pass

class SerialSensor(TemperatureSensor):
    def read_temperature(self):
        # Complex hardware communication
        ...

Rule of thumb: Abstract at the boundary where your code meets the outside world (hardware, files, network, databases). Don't abstract internal logic.

Practical Refactoring Strategy

If you have existing code that's tightly coupled, here's how to refactor it:

Step 1: Identify your dependencies (hardware, file I/O, external services)

Step 2: Create abstract interfaces for each dependency

Step 3: Refactor your main class to accept dependencies through its constructor

Step 4: Create concrete implementations of the abstractions

Step 5: Create mock/test implementations

Step 6: Update your code to inject dependencies

You don't have to do this all at once! Start with the dependency that causes you the most pain (usually hardware).

Example: Refactoring tightly-coupled code:

import numpy as np
import serial

# NOTE: schematic example. DataSource and MockDataSource are placeholders,
# defined the same way as TemperatureSensor and MockTemperatureSensor above.

# BEFORE: Tightly coupled
class Analyzer:
    def __init__(self):
        self.sensor = serial.Serial('/dev/ttyUSB0')  # ← Tight coupling

    def analyze(self):
        data = self.sensor.readline()  # ← Can't test without hardware
        return np.mean(data)  # (real code must parse the raw bytes into numbers first)

# Steps 1-3: Extract interface, inject dependency
class Analyzer:
    def __init__(self, sensor: DataSource):  # ← Now flexible
        self.sensor = sensor

    def analyze(self):
        data = self.sensor.read()  # ← Works with any source
        return np.mean(data)

# Steps 4-6: Can now test!
mock = MockDataSource()
analyzer = Analyzer(mock)
result = analyzer.analyze()  # ← No hardware needed!
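
For a version of the "after" code you can paste and run, here is a minimal sketch. The DataSource interface and MockDataSource class are assumptions about what the placeholders might look like, and it uses statistics.mean from the standard library in place of np.mean so that it has no third-party dependencies.

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import List


class DataSource(ABC):
    """Hypothetical interface: anything that can hand back a batch of readings."""

    @abstractmethod
    def read(self) -> List[float]: ...


class MockDataSource(DataSource):
    """Returns a fixed batch so results are predictable in tests."""

    def __init__(self, values=None):
        self.values = list(values) if values is not None else [1.0, 2.0, 3.0]

    def read(self) -> List[float]:
        return list(self.values)


class Analyzer:
    def __init__(self, source: DataSource):
        # The dependency is injected, not created internally.
        self.source = source

    def analyze(self) -> float:
        return mean(self.source.read())


analyzer = Analyzer(MockDataSource([20.0, 22.0, 24.0]))
result = analyzer.analyze()  # mean of the three mock readings
```

A serial-backed DataSource would do the byte parsing inside its own read method, so Analyzer never sees raw bytes.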

How DIP Relates to Other SOLID Principles

DIP completes the SOLID toolkit:

SRP: TemperatureMonitor has one job (monitoring), not creating sensors
OCP: Add new sensor types without modifying TemperatureMonitor
LSP: All TemperatureSensor implementations are substitutable
ISP: Each abstraction is focused (sensor vs storage, not combined)
DIP: High-level logic depends on abstractions, not concrete hardware

Together, these principles create code that's maintainable, testable, and flexible—exactly what long-running scientific projects need.

Performance Notes

DIP adds a layer of indirection (calling through an interface), but this overhead is negligible compared to actual I/O operations (reading sensors, writing files, database queries).

The real performance benefit: you can optimize or swap implementations without changing calling code. Need a faster storage format? Implement FastBinaryStorage. Need caching? Implement CachedSensor. The abstraction layer enables optimization.
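
To make the caching idea concrete, here is a rough sketch of what a CachedSensor might look like. The class body, the max_age parameter, the injectable now callable and the CountingSensor stand-in are all assumptions for illustration; the article only names the idea.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional


class TemperatureSensor(ABC):
    @abstractmethod
    def read_temperature(self) -> float: ...


class CountingSensor(TemperatureSensor):
    """Stand-in for a slow device; counts how often it is actually read."""

    def __init__(self, value: float = 25.0):
        self.value = value
        self.reads = 0

    def read_temperature(self) -> float:
        self.reads += 1
        return self.value


class CachedSensor(TemperatureSensor):
    """Hypothetical wrapper: reuse the last reading for max_age seconds."""

    def __init__(self, inner: TemperatureSensor, max_age: float,
                 now: Callable[[], float] = time.monotonic):
        self.inner = inner
        self.max_age = max_age
        self.now = now
        self._value: Optional[float] = None
        self._read_at: Optional[float] = None

    def read_temperature(self) -> float:
        t = self.now()
        # Hit the wrapped sensor only on the first call or once the cache is stale.
        if self._read_at is None or t - self._read_at >= self.max_age:
            self._value = self.inner.read_temperature()
            self._read_at = t
        return self._value


# Drive the cache with a hand-controlled time source so the demo is deterministic.
fake_time = [0.0]
slow = CountingSensor()
cached = CachedSensor(slow, max_age=1.0, now=lambda: fake_time[0])

cached.read_temperature()  # first call reads the device
cached.read_temperature()  # served from the cache
fake_time[0] = 1.5
cached.read_temperature()  # cache expired, device is read again
```

Because CachedSensor is itself a TemperatureSensor, it can be passed to TemperatureMonitor without changing the monitor.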

When to Skip DIP

DIP adds upfront complexity. Skip it for:

Quick exploratory scripts (notebooks and one-off analyses)
One-person projects where you're certain the dependencies won't change
Very simple programs with no testing requirements
Prototypes where you're still figuring out what you need

But consider adding it later when you move from exploratory → production territory:

The code becomes mission-critical
Multiple people need to work on it
You need automated testing
The dependencies start causing pain

Summary

The Dependency Inversion Principle transforms rigid, hardware-dependent scientific code into flexible, testable software. By depending on abstractions instead of concrete implementations, you can:

Test without hardware
Reuse analysis logic with different data sources
Swap components easily
Simulate edge cases
Work on code while equipment is unavailable

The key insight: Your valuable scientific logic should be independent of the messy details of how data gets in and out of your system.

Your Turn

Identify one pain point in your current code where DIP would help
Identify the dependency (sensor, instrument, file format)
Create an abstraction for that dependency
Create a mock implementation for testing
Refactor your code to accept the dependency
Write tests using the mock

Start with just one dependency—the one causing you the most pain.

Series Conclusion

This completes our journey through the five SOLID principles for scientific programming. We've covered:

SRP: Separating concerns so each class has one job
OCP: Extending functionality without modifying tested code
LSP: Ensuring subclasses are truly substitutable
ISP: Creating lean, focused interfaces
DIP: Decoupling logic from hardware dependencies

Together, these principles help you write code that's maintainable, testable, and flexible—code that survives lab equipment changes, team turnover, and evolving requirements.

Remember: Don't apply all principles to all code. Start simple, refactor when pain appears, and use these principles as tools to solve specific problems, not as rigid rules.

Your exploratory Jupyter notebook doesn't need SOLID. Your production pipeline that runs daily for two years? That's where SOLID shines.

Have questions or examples from your own scientific code? Share them in the comments below!
