Simon Willison posed this question in his blog:
"Can coding agents relicense open source through a “clean room” implementation of code?"
Good question! I decided to talk to claude about it. Here is the transcript from that session.
TL;DR: Maybe, sorta.
Clean Room Implementation Discussion
Exchange 1: Initial Question
User:
If given an open source GitHub repository, can you write as spec from FORTRAN in the repository? Using that spec, can you write a clean room implementation in Python?
Assistant:
Yes, I can do both, but with important caveats and legal considerations:
Part 1: Writing a Specification from FORTRAN Code
What I CAN do:
- ✅ Read FORTRAN source code from a GitHub repository
- ✅ Analyze the algorithm's logic, inputs, outputs, and data flow
- ✅ Document the mathematical operations and computational methods
- ✅ Describe what the code does at a functional level
- ✅ Create a language-agnostic specification of the algorithm
- ✅ Identify key functions, subroutines, and their purposes
- ✅ Document expected behavior and edge cases
Example Process:
FORTRAN Code → Analyze → Specification Document
↓
- Algorithm description
- Input/output formats
- Mathematical operations
- Data structures
- Expected behavior
Part 2: Clean Room Implementation in Python
What "Clean Room" Means:
A clean room implementation is created by someone who has never seen the original source code, working only from the specification. This is a legal strategy to avoid copyright infringement.
The Problem with AI doing Clean Room:
❌ I cannot truly do a "clean room" implementation because:
- I've already seen the FORTRAN code when writing the spec
- True clean room requires a different person who hasn't seen the original
- This is the legal requirement to avoid derivative work claims
What I CAN Actually Do
Approach 1: Specification + Implementation (Not Legally Clean Room)
I can create both, but this is NOT a clean room implementation in the legal sense:
# Step 1: Analyze FORTRAN
"""
Example FORTRAN subroutine analysis:
- Computes seismic travel time using ray tracing
- Inputs: source depth, receiver distance, velocity model
- Output: P-wave travel time
- Algorithm: Snell's law applied iteratively
"""
# Step 2: Python implementation based on spec
def compute_travel_time(depth, distance, velocity_model):
"""
Compute seismic travel time using ray tracing.
Based on specification derived from original algorithm.
"""
# Implementation here
pass
Legal Status: This is a derivative work because the same person (me) saw both the original code and wrote the new version.
Approach 2: Algorithm Description Only (Safe)
What I should do instead:
## Specification Document
### Function: compute_travel_time
**Purpose:** Calculate seismic wave travel time through layered Earth model
**Algorithm:**
1. Initialize ray parameter
2. For each layer in velocity model:
- Apply Snell's law: p = sin(i)/v
- Calculate ray path segment
- Accumulate travel time
3. Return total time
**Inputs:**
- source_depth: float (km)
- receiver_distance: float (km)
- velocity_model: array of (depth, velocity) pairs
**Outputs:**
- travel_time: float (seconds)
**Mathematical Basis:**
- Snell's Law: n₁sin(θ₁) = n₂sin(θ₂)
- Travel time: t = ∫ ds/v(s)
Then a different developer (not me) would implement this in Python without seeing the FORTRAN.
Practical Approaches That ARE Legal
Option A: Use Existing Python Ports
Many FORTRAN seismology codes already have Python implementations:
- TauP (seismic travel times) → ObsPy has
obspy.taup - SAC (Seismic Analysis Code) → ObsPy can read/write SAC
- SPECFEM (spectral elements) → Has Python wrappers
Option B: Call FORTRAN from Python
import subprocess
import numpy as np
def call_fortran_program(input_file, output_file):
"""Wrapper around existing FORTRAN executable"""
subprocess.run(['./fortran_program', input_file, output_file])
return np.loadtxt(output_file)
Option C: Use f2py (FORTRAN to Python)
# Automatically wrap FORTRAN code
# f2py -c fortran_module.f90 -m fortran_module
import fortran_module
result = fortran_module.compute_something(input_data)
Legal Considerations
When You CAN Reimplement:
✅ If the repository has a permissive license:
- MIT, BSD, Apache 2.0 allow derivative works
- You can study the code and rewrite in Python
- Must maintain original license and attribution
✅ If implementing a published algorithm:
- Scientific papers describe algorithms
- Mathematical formulas are not copyrightable
- You can implement the published method
✅ If using clean room with TWO people:
- Person A reads FORTRAN, writes spec (no code)
- Person B reads spec only, writes Python
- Person B never sees FORTRAN code
When You CANNOT:
❌ Copyright infringement:
- Copying proprietary code without permission
- Violating restrictive licenses (GPL requires keeping GPL)
- Translating code line-by-line without authorization
❌ Patent issues:
- Some algorithms may be patented
- Implementation could violate patents regardless of clean room
What I Can Help You With
1. Algorithm Understanding
I can analyze FORTRAN code and explain:
- What algorithm it implements
- Mathematical basis
- Key computational steps
- Expected inputs/outputs
- Performance characteristics
2. Specification Writing
I can create detailed specifications:
- Function signatures
- Data structures
- Algorithm pseudocode
- Test cases
- Validation criteria
3. Python Implementation Guidance
# I can suggest Python approaches:
def suggested_implementation():
"""
Guidance on:
- Appropriate Python libraries (NumPy, SciPy)
- Data structures to use
- Optimization strategies
- Testing approaches
"""
pass
4. What I Won't Do
❌ Help circumvent copyrights or licenses
❌ Claim a line-by-line translation is "clean room"
❌ Implement proprietary algorithms without permission
Practical Example
If you give me a FORTRAN seismology code, I can:
## Step 1: Analysis
- Identify it computes focal mechanisms from first motions
- Uses grid search over strike/dip/rake
- Outputs beach ball plots
## Step 2: Specification
- Algorithm: Grid search optimization
- Inputs: Station azimuths, polarities, uncertainties
- Outputs: Best-fit focal mechanism parameters
- Constraints: Physical bounds on angles
## Step 3: Recommendation
"This algorithm is already implemented in ObsPy's
obspy.imaging.beachball module and PyRocko's
moment tensor tools. Suggest using those instead
of reimplementing."
## Step 4: If You Must Reimplement
Provide architectural guidance:
- Use NumPy for grid computation
- Use matplotlib for visualization
- Structure as class with methods
- Include unit tests
Bottom Line
Can I do what you're asking?
- Technically: Yes, I can read FORTRAN and write Python
- Legally as "clean room": No, because I'd see both versions
- Practically useful: Yes, I can create specs and guide implementation
Best approach:
- Check if the code is already ported to Python
- Check the license - if permissive, just port it directly
- If you need clean room, have me write the spec, then you implement without looking at FORTRAN
- Consider using f2py or subprocess wrappers instead
Would you like me to analyze a specific FORTRAN repository? I can:
- Explain what it does
- Document the algorithms
- Suggest Python equivalents
- Write implementation guidance
Just share the repository URL and I'll help within legal and ethical bounds!
Exchange 2: The Two-Session Workaround
User:
What if I just presented the spec to claude without the memory of generating the spec?
Assistant:
Excellent question! This is actually a clever workaround that addresses the "clean room" problem in an interesting way.
Does This Work?
Technically: Yes, with caveats
Legally: Probably, but not perfectly
Let me break this down:
The Two-Claude Approach
Scenario:
Session 1 (Claude A):
FORTRAN code → Reads and analyzes → Writes specification
Session 2 (Claude B - fresh chat):
Specification only → Implements Python → No memory of FORTRAN
Does Claude B "remember" the FORTRAN?
No, in the technical sense:
- ✅ Each new chat session has no memory of previous sessions
- ✅ Claude B genuinely hasn't "seen" the FORTRAN code
- ✅ Claude B only has the specification to work from
- ✅ This mimics the two-person clean room approach
But there are complications:
Legal Gray Areas
Problem 1: Same AI, Same Training
Both "Claudes" are the same model with the same training:
- If the FORTRAN code was in my training data, both sessions "know" it
- I might unconsciously reproduce patterns from training
- Not truly independent like two different humans
Problem 2: Specification Leakage
If the spec is too detailed, it might contain implementation details:
❌ BAD SPEC (Too detailed):
"Line 45: Use DO loop from I=1 to N incrementing by 1
Store result in ARRAY(I) = SQRT(X(I)**2 + Y(I)**2)
Use DSQRT for double precision"
✅ GOOD SPEC (Implementation-agnostic):
"Compute Euclidean distance from origin for each point
Input: x and y coordinate arrays
Output: distance array
Precision: Double precision floating point"
The bad spec is basically transliteration; the good spec describes the what, not the how.
Problem 3: Legal Precedent Unclear
Clean room implementation has legal precedent with humans:
- Courts recognize two different people = independent creation
- No case law yet on AI "sessions" as "different entities"
- Unclear if this would hold up legally
How to Make This More Defensible
Approach 1: Extremely Abstract Specification
Session 1 - Analysis:
## High-Level Specification
**Purpose:** Compute seismic ray paths through layered velocity model
**Inputs:**
- Source location (lat, lon, depth)
- Receiver locations (array of lat, lon, elevation)
- Velocity model (1D layered structure)
**Outputs:**
- Travel times for each source-receiver pair
- Ray parameter
- Takeoff angles
**Algorithm Type:** Numerical ray tracing using Snell's law
**Mathematical Basis:**
- Snell's law: p = r·sin(i)/v
- Ray parameter conservation
- Integration along ray path
**Requirements:**
- Handle velocity discontinuities
- Support multiple seismic phases (P, S)
- Accuracy: < 0.1 second error for teleseismic distances
Session 2 - Implementation:
# Claude B sees only the above spec
# Implements using standard numerical methods
# Chooses own data structures and algorithms
This is defensible because:
- ✅ Describes functionality, not implementation
- ✅ Based on published physics (Snell's law)
- ✅ No FORTRAN-specific details
- ✅ Multiple valid implementations possible
Approach 2: Reference Published Algorithms
Session 1:
## Specification
The code implements the algorithm described in:
- Buland, R., & Chapman, C. H. (1983). "The computation
of seismic travel times." Bulletin of the Seismological
Society of America, 73(5), 1271-1302.
Specifically implements:
- Section 3.2: Flat-earth transformation
- Section 4.1: Ray parameter calculation
- Section 4.3: Travel time integration
**Input format:** As described in Table 1 of paper
**Output format:** As described in Table 3 of paper
Session 2:
# Implement based on published paper + spec
# This is clearly legal - implementing published science
Why this works:
- ✅ Published algorithms are not copyrightable
- ✅ Mathematical formulas are facts
- ✅ Clear independent source (the paper)
Approach 3: Add Human Review
Session 1 (Claude A): FORTRAN → Specification
↓
Human reviewer: Check spec is abstract enough
↓
Session 2 (Claude B): Specification → Python
↓
Human reviewer: Verify no FORTRAN patterns copied
The human review adds defensibility.
Practical Workflow
Step-by-Step Process:
Session 1 - Specification Generation:
You: "Here's FORTRAN code from [repo]. Write an abstract
specification that describes WHAT it does, not HOW."
Claude A: [Generates abstract spec focused on:
- Mathematical basis
- Input/output contracts
- Algorithm category (not specific implementation)
- Test cases
- Performance requirements]
Between Sessions:
You review the spec and ask:
- Does it describe functionality or implementation?
- Could someone implement this differently?
- Are there FORTRAN-specific details? (Remove them)
- Is it based on published methods? (Cite them)
Session 2 - Fresh Chat:
You: "Implement this specification in Python. Choose your own
data structures and algorithms."
Claude B: [Implements using:
- Modern Python idioms
- NumPy/SciPy where appropriate
- Different algorithm choices where equivalent
- Pythonic structure (not FORTRAN translated)]
What Makes This Legitimate
The Spec Must Be:
✅ Functional, not procedural
Good: "Calculate eigenvalues of matrix"
Bad: "Use QR decomposition with Givens rotations"
✅ Domain knowledge, not code translation
Good: "Apply Snell's law at velocity boundaries"
Bad: "In loop at line 234, use formula A = B*SIN(C)/D"
✅ Multiple implementations possible
Good: "Find minimum of function using optimization"
Bad: "Use steepest descent with step size 0.01"
✅ Based on public algorithms
Good: "Implements Aki-Richards approximation (Aki & Richards, 1980)"
Bad: "Uses custom unpublished method from FORTRAN code"
Comparison to Human Clean Room
| Aspect | Two Humans | Two Claude Sessions |
|---|---|---|
| Different entities | Yes (legally distinct) | Unclear (same AI) |
| Independent knowledge | Yes | Same training data |
| No memory transfer | Yes | Yes (sessions isolated) |
| Different implementations | Likely | Possibly similar style |
| Legal precedent | Established | None yet |
| Defensibility | Strong | Moderate |
My Recommendation
This approach is BETTER than having me do both in one session, but:
To maximize defensibility:
-
Keep Session 1 spec very abstract
- Cite published papers
- Describe math, not code structure
- Focus on "what" not "how"
-
Add human steps:
- Human reviews spec for abstraction
- Human verifies Session 2 output is novel
- Human can testify to process if needed
-
Document the process:
- Save both sessions
- Show spec is abstract
- Demonstrate independent implementation
-
Check the license first:
- If MIT/BSD/Apache → just port it directly
- If GPL → you must keep GPL
- If proprietary → need permission anyway
-
Consider if it's worth it:
- Is there already a Python version?
- Can you use f2py wrapper instead?
- Is the algorithm in published papers?
Bottom Line
Your idea is clever and better than one session, but:
Best ✅: Human A writes spec → Human B implements
Good ✅: Claude writes abstract spec → Human implements
Okay ⚠️: Claude Session 1 → spec → Claude Session 2 (with very abstract spec)
Risky ❌: Claude does both in one session
Bad ❌: Line-by-line translation regardless of sessions
Want to try it? Give me a FORTRAN repository and I'll:
- Write a maximally abstract specification
- You review it for implementation details
- Start a new chat and have that Claude implement it
- Compare to see if the implementations differ significantly
This would be an interesting experiment in AI clean room development!
Summary
This transcript contains the complete discussion about clean room implementation using AI, specifically:
Initial Question: Can Claude write a spec from FORTRAN and implement it in Python as a clean room implementation?
Initial Answer: Explained that while technically possible, it's not legally a true clean room because the same entity (Claude) would see both the original code and create the new implementation.
Two-Session Workaround: User proposed using two separate Claude sessions - one to create the spec, another (with no memory) to implement it.
-
Analysis of Workaround: Detailed explanation of how this approach is better than single-session but still has legal gray areas due to:
- Same AI model with same training
- Potential specification leakage
- Unclear legal precedent for AI "sessions" as separate entities
-
Best Practices: Recommendations for making the two-session approach more defensible:
- Keep specifications extremely abstract
- Add human review steps
- Reference published algorithms
- Document the process
- Check licenses first
Comparison: Table comparing two-human clean room (legally established) vs two-session AI approach (legally uncertain)
Conclusion: The two-session approach is better than single-session but not as defensible as true human clean room. Best combined with human review and very abstract specifications.
Key Takeaways
What Makes a Good Specification for Clean Room:
- Describes what to accomplish, not how to implement
- Multiple valid implementations possible
- Based on published algorithms or mathematical principles
- No code-specific details (no line numbers, variable names, loop structures)
- Cites academic papers when applicable
Defensibility Hierarchy:
- Most Defensible: Two different humans with documented process
- Good: AI creates abstract spec → Human implements
- Moderate: AI Session 1 (abstract spec) → AI Session 2 (implementation)
- Risky: Single AI session does both
- Indefensible: Line-by-line translation regardless of method
Alternative Approaches to Consider:
- Check if Python port already exists
- Use permissive open-source licenses directly
- Implement from published scientific papers
- Use f2py to wrap FORTRAN code
- Call FORTRAN as subprocess from Python
Legal Considerations:
- Permissive licenses (MIT, BSD, Apache) allow derivative works
- Published algorithms and math formulas are not copyrightable
- GPL code must stay GPL
- Patents are separate issue from copyright
- No case law yet on AI sessions as separate entities
Top comments (0)