
Rob Johnston

Originally published at robjohnston.github.io

Depend on Abstractions: Testing Without the $50,000 Spectrometer

Part 5 of the SOLID Principles for Scientific Programmers series

The Problem Every Scientist Has Faced

You've written a data acquisition script that reads from your laboratory spectrometer—a $50,000 instrument that's booked solid for the next two weeks. Your analysis code works perfectly... when you have access to the hardware.
Then your advisor asks: "Can you run the analysis on last month's data to compare?" You can't—the code only reads from the live spectrometer. "Can you test the new averaging algorithm before the experiment?" You can't—testing requires the actual hardware. "The spectrometer is down for calibration—can you still work on the code?" You can't. Your colleague in another lab wants to use your analysis but has different equipment? Your code is hardwired to your specific sensor.

Your analysis logic is imprisoned by hardware dependencies. Brilliant scientific algorithms rendered useless by tight coupling. Your beautiful working code has become a rigid, untestable monolith.

This is the problem the Dependency Inversion Principle (DIP) solves.

What Is the Dependency Inversion Principle?

The Dependency Inversion Principle states:

High-level modules should not depend on low-level modules. Both should depend on abstractions.

Abstractions should not depend on details. Details should depend on abstractions.

In plain English: Your important scientific logic shouldn't be hardwired to specific hardware, file formats, or external services. Instead, it should depend on interfaces or abstract descriptions of what it needs. The concrete implementations can then be swapped out as needed.

Before You Refactor: Is It Worth It?

DIP adds abstraction layers that take time to design and implement. Before refactoring, consider:

  • Do you need to test without hardware? This is DIP's killer feature for scientists
  • Will you swap implementations? (file → database, real sensor → simulated, etc.)
  • Do multiple people work on the code? DIP enables parallel development
  • Is the dependency causing pain? (hardware unavailable, slow tests, inflexible code)

If you're blocked because you can't test without the $50K spectrometer connected, DIP refactoring is essential.

A Real Example: The Problem

Let's look at a temperature monitoring system for a materials science experiment. Here's what you might write first:

import serial
import csv
from datetime import datetime

class TemperatureMonitor:
    def __init__(self):
        # Hardcoded dependency on specific hardware
        self.sensor = serial.Serial('/dev/ttyUSB0', baudrate=9600)
        # Hardcoded dependency on specific file format
        self.output_file = 'temperature_data.csv'

    def collect_data(self, duration_seconds):
        """Collect temperature data for specified duration."""
        results = []
        start_time = datetime.now()

        while (datetime.now() - start_time).total_seconds() < duration_seconds:
            # Read from serial sensor
            raw_data = self.sensor.readline()
            temperature = float(raw_data.decode().strip())
            timestamp = datetime.now()

            results.append((timestamp, temperature))

            # Check if temperature is in safe range
            if temperature > 100:
                print(f"WARNING: Temperature {temperature}°C exceeds safe limit!")

        # Save to CSV
        with open(self.output_file, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['Timestamp', 'Temperature'])
            writer.writerows(results)

        return results

# Usage
monitor = TemperatureMonitor()
data = monitor.collect_data(3600)  # Collect for 1 hour

Problems with This Design

This code works, but it has serious problems:

  1. Can't test without hardware: You need the physical sensor connected to run any tests
  2. Can't reuse the logic: The safety check and data collection logic is tied to this specific sensor
  3. Can't work with historical data: No way to run the same analysis on previously collected data
  4. Can't switch output formats: What if you want to save to a database instead of CSV?
  5. Can't simulate failures: How do you test what happens when the sensor malfunctions?

The Solution: Dependency Inversion

Let's refactor this using the Dependency Inversion Principle. First, we define abstractions for the things we depend on:

BEFORE (tight coupling):         AFTER (dependency inversion):

┌─────────────────────┐         ┌──────────────────────────────────┐
│ TemperatureMonitor  │         │ TemperatureMonitor               │
│                     │         │                                  │
│ creates:            │         │ depends on:                      │
│  └─> SerialSensor   │         │  └─> TemperatureSensor (abstract)│
│  └─> CSVFile        │         │  └─> DataStorage (abstract)      │
└─────────────────────┘         └──────────────────────────────────┘
         │                                   △
         │ (rigid)                           │ (flexible)
         ▼                          ┌────────┴────────┐
┌─────────────────┐                 │                 │
│ Hardware        │         ┌──────────┐      ┌──────────┐
│ (must exist)    │         │  Serial  │      │   Mock   │
└─────────────────┘         │  Sensor  │      │  Sensor  │
                            └──────────┘      └──────────┘
  ❌ Can't test!                     ✅ Can test anytime!

In code, the abstractions and the refactored monitor look like this:

from abc import ABC, abstractmethod
from datetime import datetime
from typing import List, Tuple

# ABSTRACTION: What we need from a temperature source
class TemperatureSensor(ABC):
    """Abstract interface for any temperature data source."""

    @abstractmethod
    def read_temperature(self) -> float:
        """Read current temperature in Celsius."""
        pass

# ABSTRACTION: What we need from a data storage mechanism
class DataStorage(ABC):
    """Abstract interface for storing temperature measurements."""

    @abstractmethod
    def save(self, data: List[Tuple[datetime, float]]) -> None:
        """Save temperature data."""
        pass

# HIGH-LEVEL MODULE: Now depends only on abstractions
class TemperatureMonitor:
    def __init__(self, sensor: TemperatureSensor, storage: DataStorage):
        # Dependencies are injected, not created internally
        self.sensor = sensor
        self.storage = storage

    def collect_data(self, duration_seconds: int) -> List[Tuple[datetime, float]]:
        """Collect temperature data for specified duration."""
        results = []
        start_time = datetime.now()

        while (datetime.now() - start_time).total_seconds() < duration_seconds:
            temperature = self.sensor.read_temperature()
            timestamp = datetime.now()

            results.append((timestamp, temperature))

            # Business logic is now independent of hardware details
            if temperature > 100:
                print(f"WARNING: Temperature {temperature}°C exceeds safe limit!")

        self.storage.save(results)
        return results


Now we create concrete implementations of our abstractions:

# CONCRETE IMPLEMENTATIONS: Different sensors and different storage

import serial
import csv

class SerialTemperatureSensor(TemperatureSensor):
    def __init__(self, port: str, baudrate: int = 9600):
        self.sensor = serial.Serial(port, baudrate=baudrate)

    def read_temperature(self) -> float:
        raw_data = self.sensor.readline()
        return float(raw_data.decode().strip())

class CSVStorage(DataStorage):
    def __init__(self, filename: str):
        self.filename = filename

    def save(self, data: List[Tuple[datetime, float]]) -> None:
        with open(self.filename, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['Timestamp', 'Temperature'])
            writer.writerows(data)

# USAGE: Production - same as before
monitor = TemperatureMonitor(
    sensor=SerialTemperatureSensor('/dev/ttyUSB0'),
    storage=CSVStorage('temperature_data.csv')
)
data = monitor.collect_data(3600)

Why This Is Better: Flexibility Unlocked

Now that we've inverted the dependencies, we can easily create alternative implementations:

1. Test Without Hardware

import random

# MOCK IMPLEMENTATIONS: For testing without hardware
class MockTemperatureSensor(TemperatureSensor):
    """Simulated sensor for testing."""

    def __init__(self, base_temp: float = 25.0, noise: float = 0.5):
        self.base_temp = base_temp
        self.noise = noise

    def read_temperature(self) -> float:
        # Simulate realistic temperature readings
        return self.base_temp + random.uniform(-self.noise, self.noise)

class InMemoryStorage(DataStorage):
    """Store data in memory for testing."""

    def __init__(self):
        self.data = []

    def save(self, data: List[Tuple[datetime, float]]) -> None:
        self.data = data

# Now we can test without any hardware!
test_monitor = TemperatureMonitor(
    sensor=MockTemperatureSensor(base_temp=25.0),
    storage=InMemoryStorage()
)
test_data = test_monitor.collect_data(60)  # Note: runs for 60 real seconds
print(f"Collected {len(test_data)} test measurements")

Benefit: Develop and test anywhere, anytime. No hardware booking required.

2. Test Edge Cases Safely

class FailingTemperatureSensor(TemperatureSensor):
    """Simulate sensor failures for testing error handling."""

    def __init__(self, fail_after: int = 10):
        self.read_count = 0
        self.fail_after = fail_after

    def read_temperature(self) -> float:
        self.read_count += 1
        if self.read_count > self.fail_after:
            raise IOError("Sensor connection lost!")
        return 25.0

# Test failure handling
monitor = TemperatureMonitor(
    sensor=FailingTemperatureSensor(fail_after=5),
    storage=InMemoryStorage()
)
try:
    monitor.collect_data(60)  # Raises on the sixth read
except IOError as exc:
    print(f"Caught expected failure: {exc}")  # Now you can test your error handling!

Benefit: Simulate dangerous conditions (overheating, sensor failures) without risk.
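
One way to turn the failure scenario above into an automated check is sketched below. It is self-contained, so it repeats trimmed-down copies of the classes from this post (the safety warning is omitted, and `save_calls` is a counter added here purely for the test). Treat it as an illustration rather than a finished test suite.

```python
import unittest
from abc import ABC, abstractmethod
from datetime import datetime


class TemperatureSensor(ABC):
    @abstractmethod
    def read_temperature(self) -> float:
        """Read current temperature in Celsius."""


class FailingTemperatureSensor(TemperatureSensor):
    """Works for `fail_after` reads, then simulates a lost connection."""

    def __init__(self, fail_after: int = 10):
        self.read_count = 0
        self.fail_after = fail_after

    def read_temperature(self) -> float:
        self.read_count += 1
        if self.read_count > self.fail_after:
            raise IOError("Sensor connection lost!")
        return 25.0


class InMemoryStorage:
    def __init__(self):
        self.data = []
        self.save_calls = 0  # Added for this sketch: lets the test see if save() ran

    def save(self, data):
        self.save_calls += 1
        self.data = data


class TemperatureMonitor:
    """Trimmed copy of the monitor from this post (safety warning omitted)."""

    def __init__(self, sensor, storage):
        self.sensor = sensor
        self.storage = storage

    def collect_data(self, duration_seconds):
        results = []
        start_time = datetime.now()
        while (datetime.now() - start_time).total_seconds() < duration_seconds:
            results.append((datetime.now(), self.sensor.read_temperature()))
        self.storage.save(results)
        return results


class TestSensorFailure(unittest.TestCase):
    def test_failure_propagates_and_nothing_is_saved(self):
        sensor = FailingTemperatureSensor(fail_after=5)
        storage = InMemoryStorage()
        monitor = TemperatureMonitor(sensor, storage)

        # The sixth read raises, so collect_data() exits long before 60 s
        with self.assertRaises(IOError):
            monitor.collect_data(60)

        self.assertEqual(sensor.read_count, 6)
        self.assertEqual(storage.save_calls, 0)
```

The last assertion documents a consequence of the current design: when the sensor fails, the readings collected before the failure are never saved. Whether partial data should be kept is a decision for your experiment; the point is that the mock makes the behavior visible.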

3. Reuse Logic With Different Sources

class HistoricalDataSensor(TemperatureSensor):
    """Replay previously recorded data."""

    def __init__(self, filename: str):
        with open(filename, 'r') as f:
            reader = csv.reader(f)
            next(reader)  # Skip header
            self.temperatures = [float(row[1]) for row in reader]
        self.index = 0

    def read_temperature(self) -> float:
        if self.index >= len(self.temperatures):
            raise IndexError("No more historical data")
        temp = self.temperatures[self.index]
        self.index += 1
        return temp

# Analyze last week's data with the same code!
historical_monitor = TemperatureMonitor(
    sensor=HistoricalDataSensor('last_week_data.csv'),
    storage=InMemoryStorage()
)

Benefit: Same analysis code works with live data, historical data, or simulations.
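
One practical wrinkle: `collect_data` loops on elapsed time, while a replayed file simply runs out of rows, at which point `HistoricalDataSensor` raises `IndexError`. A possible adaptation is sketched below. `SampleCountMonitor`, `collect_samples`, and the `alerts` list are hypothetical names introduced for this example, and the three CSV rows are made-up values.

```python
import csv
import os
import tempfile
from datetime import datetime


class HistoricalDataSensor:
    """Replay previously recorded data (same logic as the class above)."""

    def __init__(self, filename):
        with open(filename, "r", newline="") as f:
            reader = csv.reader(f)
            next(reader)  # Skip header
            self.temperatures = [float(row[1]) for row in reader]
        self.index = 0

    def read_temperature(self):
        if self.index >= len(self.temperatures):
            raise IndexError("No more historical data")
        temp = self.temperatures[self.index]
        self.index += 1
        return temp


class InMemoryStorage:
    def __init__(self):
        self.data = []

    def save(self, data):
        self.data = data


class SampleCountMonitor:
    """Hypothetical monitor variant that stops after a fixed number of readings."""

    def __init__(self, sensor, storage, limit=100.0):
        self.sensor = sensor
        self.storage = storage
        self.limit = limit
        self.alerts = []  # Temperatures that exceeded the limit

    def collect_samples(self, n):
        results = []
        for _ in range(n):
            temperature = self.sensor.read_temperature()
            results.append((datetime.now(), temperature))
            if temperature > self.limit:
                self.alerts.append(temperature)
        self.storage.save(results)
        return results


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny made-up recording so the example needs no external file
    path = os.path.join(tmp, "last_week_data.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Timestamp", "Temperature"])
        writer.writerows([
            ["2024-01-01T00:00:00", 98.5],
            ["2024-01-01T00:00:01", 101.2],
            ["2024-01-01T00:00:02", 99.0],
        ])
    sensor = HistoricalDataSensor(path)

monitor = SampleCountMonitor(sensor, InMemoryStorage())
replayed = monitor.collect_samples(len(sensor.temperatures))
print(f"Replayed {len(replayed)} readings, {len(monitor.alerts)} above the limit")
```

Note that the replayed readings get fresh timestamps, because the sensor interface returns only a temperature. If the recorded timestamps matter for your analysis, the abstraction would need to return them as well.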

4. Swap Components Freely

import json
import sqlite3

class JSONStorage(DataStorage):
    """Save data as JSON."""

    def __init__(self, filename: str):
        self.filename = filename

    def save(self, data: List[Tuple[datetime, float]]) -> None:
        json_data = [
            {"timestamp": ts.isoformat(), "temperature": temp}
            for ts, temp in data
        ]
        with open(self.filename, 'w') as f:
            json.dump(json_data, f, indent=2)

class DatabaseStorage(DataStorage):
    """Save data to SQLite database."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''
            CREATE TABLE IF NOT EXISTS temperatures
            (timestamp TEXT, temperature REAL)
        ''')

    def save(self, data: List[Tuple[datetime, float]]) -> None:
        self.conn.executemany(
            'INSERT INTO temperatures VALUES (?, ?)',
            [(ts.isoformat(), temp) for ts, temp in data]
        )
        self.conn.commit()

# Same monitoring code, different storage!
monitor_json = TemperatureMonitor(
    sensor=SerialTemperatureSensor('/dev/ttyUSB0'),
    storage=JSONStorage('temps.json')
)

monitor_db = TemperatureMonitor(
    sensor=SerialTemperatureSensor('/dev/ttyUSB0'),
    storage=DatabaseStorage('temps.db')
)

Benefit: Change storage without touching analysis logic.
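
Swappable storage also pays off when testing the storage classes themselves. A small sketch, repeating the `DatabaseStorage` class so it runs on its own: passing SQLite's special `:memory:` path creates a throwaway database, so the SQL can be checked without leaving files behind. The two sample readings are made up for illustration.

```python
import sqlite3
from datetime import datetime


class DatabaseStorage:
    """Save data to SQLite database (same logic as the class above)."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS temperatures "
            "(timestamp TEXT, temperature REAL)"
        )

    def save(self, data):
        self.conn.executemany(
            "INSERT INTO temperatures VALUES (?, ?)",
            [(ts.isoformat(), temp) for ts, temp in data],
        )
        self.conn.commit()


# ":memory:" is SQLite's in-memory database; it vanishes when the connection closes
storage = DatabaseStorage(":memory:")
storage.save([
    (datetime(2024, 1, 1, 12, 0, 0), 24.8),
    (datetime(2024, 1, 1, 12, 0, 1), 25.1),
])

rows = storage.conn.execute(
    "SELECT timestamp, temperature FROM temperatures ORDER BY timestamp"
).fetchall()
print(rows)
```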

The Testing Advantage

The real power of DIP becomes clear when writing tests. Here's a complete example:

import unittest

class TestTemperatureMonitor(unittest.TestCase):
    def setUp(self):
        """Set up test fixtures."""
        self.storage = InMemoryStorage()
        self.sensor = MockTemperatureSensor(base_temp=25.0)
        self.monitor = TemperatureMonitor(self.sensor, self.storage)

    def test_collects_data(self):
        """Test that data collection works."""
        data = self.monitor.collect_data(5)
        self.assertGreater(len(data), 0)
        self.assertEqual(len(self.storage.data), len(data))

    def test_temperature_in_range(self):
        """Test that temperatures are reasonable."""
        data = self.monitor.collect_data(5)
        for timestamp, temp in data:
            self.assertGreater(temp, 20)
            self.assertLess(temp, 30)

    def test_handles_high_temperature(self):
        """Test warning for high temperatures."""
        hot_sensor = MockTemperatureSensor(base_temp=150.0)
        monitor = TemperatureMonitor(hot_sensor, self.storage)

        # Would print warnings, but doesn't crash
        data = monitor.collect_data(5)
        self.assertGreater(len(data), 0)

# Run tests without any hardware connected!
if __name__ == '__main__':
    unittest.main()

No hardware, no external files, no network—just fast, reliable tests.
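
One caveat: each test above calls `collect_data(5)`, so it still waits five real seconds, and the mock sensor is polled as fast as the CPU allows during that time. The clock is an outside-world dependency too, so the same inversion can be applied to it. A minimal sketch, assuming a hypothetical `ClockedMonitor` whose `clock`, `sleep`, and `sample_interval` parameters are not part of the code above:

```python
import time
from typing import Callable, List, Tuple


class FakeClock:
    """Test double for time: sleep() advances the clock instead of waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


class ConstantSensor:
    """Stand-in sensor that always reports the same temperature."""

    def __init__(self, value: float = 25.0) -> None:
        self.value = value

    def read_temperature(self) -> float:
        return self.value


class ClockedMonitor:
    """Hypothetical variant of TemperatureMonitor with injected time functions.

    Storage is left out to keep the sketch short.
    """

    def __init__(
        self,
        sensor,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.sensor = sensor
        self.clock = clock
        self.sleep = sleep

    def collect_data(
        self, duration_seconds: float, sample_interval: float = 1.0
    ) -> List[Tuple[float, float]]:
        """Return (elapsed_seconds, temperature) pairs, one per interval."""
        results = []
        start = self.clock()
        while self.clock() - start < duration_seconds:
            results.append((self.clock() - start, self.sensor.read_temperature()))
            self.sleep(sample_interval)
        return results


# In tests, an hour of "monitoring" finishes immediately
fake = FakeClock()
monitor = ClockedMonitor(ConstantSensor(25.0), clock=fake.time, sleep=fake.sleep)
readings = monitor.collect_data(3600, sample_interval=1.0)
print(f"Collected {len(readings)} readings")
```

In production the defaults (`time.monotonic` and `time.sleep`) apply, so the monitor samples once per interval in real time. In tests the fake clock makes the number of readings deterministic, which also makes assertions on the collected data much tighter.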

Real-World Consequences of Tight Coupling

When code depends directly on hardware and external systems:

Graduate student scenario:

  • Student A writes analysis code tightly coupled to the lab's spectrometer
  • Student B needs to work on the same code
  • Only one person can develop at a time—hardware conflict
  • Testing requires booking lab time and connecting equipment
  • Bug appears only with certain samples—can't reproduce in testing
  • Student graduates, code stops working when hardware is upgraded
  • New student spends months deciphering hardware-dependent code

The problem: The valuable analysis logic is imprisoned by hardware dependencies. Brilliant algorithms become useless when equipment changes.

With DIP: Analysis logic is independent. Test with mock data, develop anywhere, swap hardware freely. The valuable scientific code survives equipment upgrades.

Red Flags That You Need DIP

Watch for these warning signs:

  • You can't run tests without physical hardware connected
  • "Just use the production database for development"
  • Code won't compile/run unless external services are available
  • You write if testing: ... else: ... branches throughout your code
  • Team members fight over access to shared hardware
  • You comment out tests because they require equipment
  • Changes to hardware require rewriting business logic
  • You can't work on code while equipment is in use
  • Switching from file to database requires massive refactoring
  • Your test suite takes hours because it talks to real systems

If you can't test without expensive equipment, you need DIP.

Common Mistakes: Over-Abstraction

The biggest DIP mistake: abstracting everything.

❌ Don't abstract:

# WRONG: Abstracting basic operations
class Adder(ABC):
    @abstractmethod
    def add(self, a, b): pass

class NumpyAdder(Adder):
    def add(self, a, b):
        return a + b  # Ridiculous!

Just use a + b directly.

✅ Do abstract:

# RIGHT: Abstracting external dependencies
class TemperatureSensor(ABC):
    @abstractmethod
    def read_temperature(self): pass

class SerialSensor(TemperatureSensor):
    def read_temperature(self):
        # Complex hardware communication
        ...

Rule of thumb: Abstract at the boundary where your code meets the outside world (hardware, files, network, databases). Don't abstract internal logic.

Practical Refactoring Strategy

If you have existing code that's tightly coupled, here's how to refactor it:

Step 1: Identify your dependencies (hardware, file I/O, external services)

Step 2: Create abstract interfaces for each dependency

Step 3: Refactor your main class to accept dependencies through its constructor

Step 4: Create concrete implementations of the abstractions

Step 5: Create mock/test implementations

Step 6: Update your code to inject dependencies

You don't have to do this all at once! Start with the dependency that causes you the most pain (usually hardware).

Example: Refactoring tightly coupled code:

import numpy as np
import serial

# BEFORE: Tightly coupled
class Analyzer:
    def __init__(self):
        self.sensor = serial.Serial('/dev/ttyUSB0')  # ← Tight coupling

    def analyze(self, n_readings=10):  # n_readings is an illustrative parameter
        # ← Can't test without hardware
        # One reading per line, as in the sensor example earlier
        data = [float(self.sensor.readline().decode().strip())
                for _ in range(n_readings)]
        return np.mean(data)

# Steps 1-3: Extract interface, inject dependency
class Analyzer:
    def __init__(self, sensor: DataSource):  # ← Now flexible
        self.sensor = sensor

    def analyze(self):
        data = self.sensor.read()  # ← Works with any source
        return np.mean(data)

# Steps 4-6: Can now test!
mock = MockDataSource()
analyzer = Analyzer(mock)
result = analyzer.analyze()  # ← No hardware needed!
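
The snippet above uses `DataSource` and `MockDataSource` without defining them. A minimal sketch of what they might look like follows. It assumes `read()` returns one batch of numeric readings (that return type is a guess), and it uses the standard library's `statistics.mean` in place of NumPy so the example has no third-party dependencies.

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import List, Optional


class DataSource(ABC):
    """Assumed interface: anything that can hand back a batch of readings."""

    @abstractmethod
    def read(self) -> List[float]:
        """Return one batch of readings."""


class MockDataSource(DataSource):
    """Returns canned values; the defaults are arbitrary test data."""

    def __init__(self, values: Optional[List[float]] = None):
        self.values = list(values) if values is not None else [1.0, 2.0, 3.0]

    def read(self) -> List[float]:
        return list(self.values)


class Analyzer:
    def __init__(self, sensor: DataSource):
        self.sensor = sensor

    def analyze(self) -> float:
        return mean(self.sensor.read())


analyzer = Analyzer(MockDataSource([20.0, 22.0, 24.0]))
result = analyzer.analyze()
print(result)  # 22.0
```

A real implementation (for example, one wrapping `serial.Serial`) would subclass `DataSource` in the same way and do the parsing inside `read()`, keeping `Analyzer` unaware of the hardware.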

How DIP Relates to Other SOLID Principles

DIP completes the SOLID toolkit:

  • SRP: TemperatureMonitor has one job (monitoring), not creating sensors
  • OCP: Add new sensor types without modifying TemperatureMonitor
  • LSP: All TemperatureSensor implementations are substitutable
  • ISP: Each abstraction is focused (sensor vs storage, not combined)
  • DIP: High-level logic depends on abstractions, not concrete hardware

Together, these principles create code that's maintainable, testable, and flexible—exactly what long-running scientific projects need.

Performance Notes

DIP adds a layer of indirection (calling through an interface), but this overhead is negligible compared to actual I/O operations (reading sensors, writing files, database queries).

The real performance benefit: you can optimize or swap implementations without changing calling code. Need a faster storage format? Implement FastBinaryStorage. Need caching? Implement CachedSensor. The abstraction layer enables optimization.
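
As a sketch of the caching idea, `CachedSensor` below wraps any sensor and reuses its last reading for a short time. The `max_age_seconds` and `clock` parameters, and the `CountingSensor` stand-in, are assumptions made for this illustration.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional


class TemperatureSensor(ABC):
    @abstractmethod
    def read_temperature(self) -> float:
        """Read current temperature in Celsius."""


class CountingSensor(TemperatureSensor):
    """Stand-in for a slow device; counts how often it is actually read."""

    def __init__(self) -> None:
        self.reads = 0

    def read_temperature(self) -> float:
        self.reads += 1
        return 25.0


class CachedSensor(TemperatureSensor):
    """Wraps another sensor and reuses its last reading while it is fresh."""

    def __init__(
        self,
        inner: TemperatureSensor,
        max_age_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.inner = inner
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._value: Optional[float] = None
        self._read_at: Optional[float] = None

    def read_temperature(self) -> float:
        now = self.clock()
        stale = self._read_at is None or now - self._read_at >= self.max_age_seconds
        if stale:
            self._value = self.inner.read_temperature()
            self._read_at = now
        return self._value


# A controllable clock keeps the demonstration deterministic
fake_now = [0.0]
device = CountingSensor()
sensor = CachedSensor(device, max_age_seconds=1.0, clock=lambda: fake_now[0])

sensor.read_temperature()  # First call reads the device
sensor.read_temperature()  # Served from the cache
fake_now[0] = 1.5
sensor.read_temperature()  # Cache expired, so the device is read again
print(f"Device was read {device.reads} times for 3 requests")
```

Because `CachedSensor` is itself a `TemperatureSensor`, it can be passed to `TemperatureMonitor` unchanged. Caching does hide changes that happen within the cache window, so it is a poor fit for readings that feed a safety check.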

When to Skip DIP

DIP adds upfront complexity. Skip it for:

  • Quick exploratory scripts (notebooks and one-off analyses)
  • One-person projects where you're certain the dependencies won't change
  • Very simple programs with no testing requirements
  • Prototypes where you're still figuring out what you need

But consider adding it later when you move from exploratory → production territory:

  • The code becomes mission-critical
  • Multiple people need to work on it
  • You need automated testing
  • The dependencies start causing pain

Summary

The Dependency Inversion Principle transforms rigid, hardware-dependent scientific code into flexible, testable software. By depending on abstractions instead of concrete implementations, you can:

  • Test without hardware
  • Reuse analysis logic with different data sources
  • Swap components easily
  • Simulate edge cases
  • Work on code while equipment is unavailable

The key insight: Your valuable scientific logic should be independent of the messy details of how data gets in and out of your system.

Your Turn

  1. Identify one pain point in your current code where DIP would help
  2. Identify the dependency (sensor, instrument, file format)
  3. Create an abstraction for that dependency
  4. Create a mock implementation for testing
  5. Refactor your code to accept the dependency
  6. Write tests using the mock

Start with just one dependency—the one causing you the most pain.


Series Conclusion

This completes our journey through the five SOLID principles for scientific programming. We've covered:

  • SRP: Separating concerns so each class has one job
  • OCP: Extending functionality without modifying tested code
  • LSP: Ensuring subclasses are truly substitutable
  • ISP: Creating lean, focused interfaces
  • DIP: Decoupling logic from hardware dependencies

Together, these principles help you write code that's maintainable, testable, and flexible—code that survives lab equipment changes, team turnover, and evolving requirements.

Remember: Don't apply all principles to all code. Start simple, refactor when pain appears, and use these principles as tools to solve specific problems, not as rigid rules.

Your exploratory Jupyter notebook doesn't need SOLID. Your production pipeline that runs daily for two years? That's where SOLID shines.


Have questions or examples from your own scientific code? Share them in the comments below!

