In this post, we'll take a simple implementation of an Epsilon-Greedy Recommender in Java and check whether it follows the SOLID principles. Then, we'll see how to refactor it for better maintainability, extensibility, and testability.
*The Example Code*
import java.util.Random;

public class EpsilonGreedyRecommender {
    private int nItems;
    private double epsilon;
    private int[] counts;      // number of times each item was recommended
    private double[] values;   // running mean reward per item
    private Random random;

    public EpsilonGreedyRecommender(int nItems, double epsilon) {
        this.nItems = nItems;
        this.epsilon = epsilon;
        this.counts = new int[nItems];
        this.values = new double[nItems];
        this.random = new Random();
    }

    public int recommend() {
        // Explore: with probability epsilon, pick a random item.
        if (random.nextDouble() < epsilon) {
            return random.nextInt(nItems);
        }
        // Exploit: otherwise pick the item with the highest estimated value.
        int bestIndex = 0;
        for (int i = 1; i < nItems; i++) {
            if (values[i] > values[bestIndex]) {
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public void update(int item, double reward) {
        counts[item]++;
        // Incremental mean: shift the estimate toward the new reward by 1/n.
        values[item] += (reward - values[item]) / counts[item];
    }

    public double[] getValues() {
        return values;
    }

    public int[] getCounts() {
        return counts;
    }
}
- SRP: Single Responsibility Principle
Definition:
A class should have only one reason to change; it should have a single responsibility.
Analysis:
This class is doing multiple things:
- Storing the bandit state (counts, values)
- Implementing the selection policy (recommend())
- Updating statistics (update())
This means any change to the policy logic, or to how state is stored, requires modifying the same class.
Verdict: SRP is partially violated. We have multiple responsibilities in one place.
- OCP: Open/Closed Principle
Definition:
Classes should be open for extension but closed for modification.
Analysis:
If we want to switch to a different policy (e.g., Softmax, UCB), we would have to edit the recommend() method directly.
Better design: define a SelectionPolicy interface and plug in different implementations.
Verdict: OCP is violated. Adding new policies requires modifying the class.
- LSP: Liskov Substitution Principle
Definition:
Subtypes must be substitutable for their base types without changing program correctness.
Analysis:
We don't have inheritance here, so there is nothing to violate.
Verdict: LSP is respected.
- ISP: Interface Segregation Principle
Definition:
Clients should not be forced to depend on interfaces they do not use.
Analysis:
Since we have no interfaces at all, there's no problem here.
Verdict: ISP is respected.
- DIP: Dependency Inversion Principle
Definition:
Depend on abstractions, not on concrete implementations.
Analysis:
The class creates its own Random instance. This is a direct dependency on a concrete class, which makes testing harder (no way to inject a predictable RNG).
Better design: inject Random as a dependency via the constructor (or use an interface).
Verdict: DIP is violated. We depend on a concrete Random instance that the class creates itself.
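To make the testability point concrete, here is a minimal sketch of constructor injection. `InjectedRecommender` and `FixedRandom` are hypothetical names invented for this illustration and are not part of the original code; the selection logic is copied from `recommend()` above.

```java
import java.util.Random;

// Hypothetical variant of the recommender that receives its Random from the caller.
class InjectedRecommender {
    private final double epsilon;
    private final double[] values;
    private final Random random;

    InjectedRecommender(double[] initialValues, double epsilon, Random random) {
        this.values = initialValues.clone();
        this.epsilon = epsilon;
        this.random = random;
    }

    int recommend() {
        if (random.nextDouble() < epsilon) {
            return random.nextInt(values.length); // explore
        }
        int bestIndex = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[bestIndex]) {
                bestIndex = i;
            }
        }
        return bestIndex; // exploit
    }
}

// Test double (an assumption for this sketch): a Random subclass returning scripted values.
class FixedRandom extends Random {
    private final double nextDoubleValue;
    private final int nextIntValue;

    FixedRandom(double nextDoubleValue, int nextIntValue) {
        this.nextDoubleValue = nextDoubleValue;
        this.nextIntValue = nextIntValue;
    }

    @Override
    public double nextDouble() {
        return nextDoubleValue;
    }

    @Override
    public int nextInt(int bound) {
        return nextIntValue;
    }
}
```

With epsilon set to 0.1, `new FixedRandom(0.99, 2)` forces the exploit branch and `new FixedRandom(0.0, 2)` forces the explore branch, so each path can be asserted on its own. When only reproducibility is needed, a seeded `new Random(42L)` is a lighter alternative.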
Summary Table

| Principle | Status | Notes |
|---|---|---|
| SRP | Partially violated | Multiple responsibilities (state + policy + update logic) |
| OCP | Violated | Cannot add new policies without modifying code |
| LSP | Respected | No inheritance, no violation |
| ISP | Respected | No large interfaces, no violation |
| DIP | Violated | Direct dependency on Random, hard to test |
Refactored Design
Let's refactor the code to follow SOLID:
- Introduce a SelectionPolicy interface (Strategy Pattern)
- Inject Random from outside to improve testability
*Step 1: Define the Policy Interface*
public interface SelectionPolicy {
    int select(double[] values);
}
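Because `SelectionPolicy` has a single abstract method, a policy can also be written as a lambda. As a small sketch, a purely greedy policy (epsilon-greedy with epsilon set to 0) might look like the following. `GreedyPolicies` and `greedy()` are names made up for this example, and the interface is repeated (package-private) so the snippet compiles on its own.

```java
// Repeated from Step 1 so this snippet is self-contained.
interface SelectionPolicy {
    int select(double[] values);
}

// Hypothetical holder class for ready-made policies.
final class GreedyPolicies {
    private GreedyPolicies() {}

    // Always picks the index with the highest estimated value (the first one wins on ties).
    static SelectionPolicy greedy() {
        return values -> {
            int bestIndex = 0;
            for (int i = 1; i < values.length; i++) {
                if (values[i] > values[bestIndex]) {
                    bestIndex = i;
                }
            }
            return bestIndex;
        };
    }
}
```

A policy like this needs no random source at all, which makes it a convenient baseline in tests.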
*Step 2: Implement Epsilon-Greedy Policy*
import java.util.Random;

public class EpsilonGreedyPolicy implements SelectionPolicy {
    private final double epsilon;
    private final Random random;

    public EpsilonGreedyPolicy(double epsilon, Random random) {
        this.epsilon = epsilon;
        this.random = random;
    }

    @Override
    public int select(double[] values) {
        int nItems = values.length;
        // Explore with probability epsilon.
        if (random.nextDouble() < epsilon) {
            return random.nextInt(nItems);
        }
        // Otherwise exploit the best current estimate.
        int bestIndex = 0;
        for (int i = 1; i < nItems; i++) {
            if (values[i] > values[bestIndex]) {
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
*Step 3: Make the Bandit Class Focus on State*
public class Bandit {
    private final int[] counts;
    private final double[] values;
    private final SelectionPolicy policy;

    public Bandit(int nItems, SelectionPolicy policy) {
        this.counts = new int[nItems];
        this.values = new double[nItems];
        this.policy = policy;
    }

    public int recommend() {
        return policy.select(values);
    }

    public void update(int item, double reward) {
        counts[item]++;
        values[item] += (reward - values[item]) / counts[item];
    }
}
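To see the three pieces work together, here is a small simulation sketch. The nested `SelectionPolicy`, `EpsilonGreedyPolicy`, and `Bandit` are condensed copies of the steps above so the file compiles on its own. `BanditDemo`, `run`, the click probabilities, and the seed are assumptions for this example, and the copy of `Bandit` gains a `getValues()` accessor only so the demo can read the estimates.

```java
import java.util.Arrays;
import java.util.Random;

final class BanditDemo {

    interface SelectionPolicy {
        int select(double[] values);
    }

    static final class EpsilonGreedyPolicy implements SelectionPolicy {
        private final double epsilon;
        private final Random random;

        EpsilonGreedyPolicy(double epsilon, Random random) {
            this.epsilon = epsilon;
            this.random = random;
        }

        @Override
        public int select(double[] values) {
            if (random.nextDouble() < epsilon) {
                return random.nextInt(values.length);
            }
            int bestIndex = 0;
            for (int i = 1; i < values.length; i++) {
                if (values[i] > values[bestIndex]) {
                    bestIndex = i;
                }
            }
            return bestIndex;
        }
    }

    static final class Bandit {
        private final int[] counts;
        private final double[] values;
        private final SelectionPolicy policy;

        Bandit(int nItems, SelectionPolicy policy) {
            this.counts = new int[nItems];
            this.values = new double[nItems];
            this.policy = policy;
        }

        int recommend() {
            return policy.select(values);
        }

        void update(int item, double reward) {
            counts[item]++;
            values[item] += (reward - values[item]) / counts[item];
        }

        // Added for the demo only: returns a copy of the current estimates.
        double[] getValues() {
            return values.clone();
        }
    }

    // Simulates `rounds` recommendations against 0/1 rewards and returns the estimates.
    static double[] run(double[] clickProbabilities, double epsilon, int rounds, long seed) {
        Random random = new Random(seed);
        Bandit bandit = new Bandit(clickProbabilities.length,
                new EpsilonGreedyPolicy(epsilon, random));
        for (int t = 0; t < rounds; t++) {
            int item = bandit.recommend();
            // Reward is 1 with the item's (hypothetical) click probability, else 0.
            double reward = random.nextDouble() < clickProbabilities[item] ? 1.0 : 0.0;
            bandit.update(item, reward);
        }
        return bandit.getValues();
    }

    public static void main(String[] args) {
        double[] estimates = run(new double[] {0.1, 0.5, 0.8}, 0.1, 10_000, 42L);
        System.out.println(Arrays.toString(estimates));
    }
}
```

Because the `Random` is seeded and passed in from outside, two runs with the same arguments produce the same estimates, which is what makes the simulation usable in a regression test.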
Now:
SRP is respected: Bandit only manages state and its updates, and EpsilonGreedyPolicy only handles selection.
OCP is respected: we can add new policies without touching Bandit.
DIP is respected: Random is injected through the constructor, so tests can pass a seeded or stubbed instance.
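As an illustration of the OCP point, here is a sketch of a softmax (Boltzmann) policy that plugs into the same interface without any change to `Bandit`. `SoftmaxPolicy` and its `temperature` parameter are assumptions made for this example, the interface is repeated so the snippet stands alone, and the code assumes `values` is non-empty.

```java
import java.util.Random;

// Repeated from Step 1 so this snippet is self-contained.
interface SelectionPolicy {
    int select(double[] values);
}

// Hypothetical policy: samples index i with probability
// proportional to exp(values[i] / temperature).
final class SoftmaxPolicy implements SelectionPolicy {
    private final double temperature;
    private final Random random;

    SoftmaxPolicy(double temperature, Random random) {
        if (temperature <= 0.0) {
            throw new IllegalArgumentException("temperature must be positive");
        }
        this.temperature = temperature;
        this.random = random;
    }

    @Override
    public int select(double[] values) {
        // Subtract the maximum before exponentiating for numerical stability.
        double max = values[0];
        for (double v : values) {
            max = Math.max(max, v);
        }
        double[] weights = new double[values.length];
        double total = 0.0;
        for (int i = 0; i < values.length; i++) {
            weights[i] = Math.exp((values[i] - max) / temperature);
            total += weights[i];
        }
        // Inverse-CDF sampling: walk the cumulative weights until the threshold is passed.
        double threshold = random.nextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (threshold < cumulative) {
                return i;
            }
        }
        return weights.length - 1; // guard against floating-point rounding
    }
}
```

Swapping it in would be a one-line change at construction time, for example `new Bandit(nItems, new SoftmaxPolicy(0.1, new Random()))`. Note that a policy such as UCB also needs the per-item counts, which `select(double[] values)` does not expose, so supporting it would likely mean widening the interface.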
Key Takeaways
Applying SOLID makes your code easier to extend and maintain.
Using interfaces and dependency injection helps make your code testable and more robust.
Even small classes can benefit from SOLID, especially if you expect the algorithm to evolve over time.
💡 What do you think? Would you keep the state and policy together for small projects, or always split them like this?