Sophia Parafina

Relicensing Open Source with an AI

Simon Willison posed this question in his blog:

"Can coding agents relicense open source through a “clean room” implementation of code?"

Good question! I decided to talk to Claude about it. Here is the transcript from that session.

TL;DR: Maybe, sorta.

Clean Room Implementation Discussion

Exchange 1: Initial Question

User:
If given an open source GitHub repository, can you write a spec from the FORTRAN in the repository? Using that spec, can you write a clean room implementation in Python?

Assistant:
Yes, I can do both, but with important caveats and legal considerations:

Part 1: Writing a Specification from FORTRAN Code

What I CAN do:

  • ✅ Read FORTRAN source code from a GitHub repository
  • ✅ Analyze the algorithm's logic, inputs, outputs, and data flow
  • ✅ Document the mathematical operations and computational methods
  • ✅ Describe what the code does at a functional level
  • ✅ Create a language-agnostic specification of the algorithm
  • ✅ Identify key functions, subroutines, and their purposes
  • ✅ Document expected behavior and edge cases

Example Process:

FORTRAN Code → Analyze → Specification Document
                ↓
        - Algorithm description
        - Input/output formats
        - Mathematical operations
        - Data structures
        - Expected behavior

Part 2: Clean Room Implementation in Python

What "Clean Room" Means:
A clean room implementation is created by someone who has never seen the original source code, working only from the specification. This is a legal strategy to avoid copyright infringement.

The Problem with AI doing Clean Room:
❌ I cannot truly do a "clean room" implementation because:

  1. I've already seen the FORTRAN code when writing the spec
  2. True clean room requires a different person who hasn't seen the original
  3. This is the legal requirement to avoid derivative work claims

What I CAN Actually Do

Approach 1: Specification + Implementation (Not Legally Clean Room)

I can create both, but this is NOT a clean room implementation in the legal sense:

# Step 1: Analyze FORTRAN
"""
Example FORTRAN subroutine analysis:
- Computes seismic travel time using ray tracing
- Inputs: source depth, receiver distance, velocity model
- Output: P-wave travel time
- Algorithm: Snell's law applied iteratively
"""

# Step 2: Python implementation based on spec
def compute_travel_time(depth, distance, velocity_model):
    """
    Compute seismic travel time using ray tracing.
    Based on specification derived from original algorithm.
    """
    # Implementation here
    pass

Legal Status: This is a derivative work because the same person (me) saw both the original code and wrote the new version.

Approach 2: Algorithm Description Only (Safe)

What I should do instead:

## Specification Document

### Function: compute_travel_time

**Purpose:** Calculate seismic wave travel time through layered Earth model

**Algorithm:**
1. Initialize ray parameter
2. For each layer in velocity model:
   - Apply Snell's law: p = sin(i)/v
   - Calculate ray path segment
   - Accumulate travel time
3. Return total time

**Inputs:**
- source_depth: float (km)
- receiver_distance: float (km)  
- velocity_model: array of (depth, velocity) pairs

**Outputs:**
- travel_time: float (seconds)

**Mathematical Basis:**
- Snell's Law: sin(θ₁)/v₁ = sin(θ₂)/v₂
- Travel time: t = ∫ ds/v(s)

Then a different developer (not me) would implement this in Python without seeing the FORTRAN.
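
As a rough illustration of what an implementer working only from the specification above might produce, here is a minimal sketch of the layer-by-layer loop. It assumes flat, constant-velocity layers and an already known ray parameter p = sin(i)/v. The function name `layered_travel_time`, the `(thickness, velocity)` layer format, and the returned horizontal distance are assumptions made for this example; none of them come from the session or from any FORTRAN code.

```python
import math


def layered_travel_time(ray_parameter, layers):
    """Trace a down-going ray through flat layers (hypothetical helper).

    ray_parameter: p = sin(i)/v in s/km, constant along the ray (Snell's law).
    layers: list of (thickness_km, velocity_km_per_s) tuples, top to bottom.
    Returns (travel_time_s, horizontal_distance_km) for the layers crossed.
    """
    total_time = 0.0
    total_distance = 0.0
    for thickness, velocity in layers:
        # Snell's law gives the incidence angle in this layer: sin(i) = p * v
        sin_i = ray_parameter * velocity
        if sin_i >= 1.0:
            # The ray cannot enter this layer; it turns at the boundary above.
            break
        cos_i = math.sqrt(1.0 - sin_i ** 2)
        # Path length in the layer is thickness / cos(i); time = length / v
        total_time += thickness / (velocity * cos_i)
        total_distance += thickness * sin_i / cos_i
    return total_time, total_distance
```

A full implementation of the spec would still have to search for the ray parameter whose horizontal distance matches `receiver_distance`; that step is left out of this sketch.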

Practical Approaches That ARE Legal

Option A: Use Existing Python Ports

Many FORTRAN seismology codes already have Python implementations:

  • TauP (seismic travel times) → ObsPy has obspy.taup
  • SAC (Seismic Analysis Code) → ObsPy can read/write SAC
  • SPECFEM (spectral elements) → Has Python wrappers

Option B: Call FORTRAN from Python

import subprocess
import numpy as np

def call_fortran_program(input_file, output_file):
    """Wrapper around existing FORTRAN executable"""
    # check=True raises CalledProcessError if the FORTRAN program exits with an error
    subprocess.run(['./fortran_program', input_file, output_file], check=True)
    return np.loadtxt(output_file)

Option C: Use f2py (FORTRAN to Python)

# Automatically wrap FORTRAN code
# f2py -c fortran_module.f90 -m fortran_module

import fortran_module

result = fortran_module.compute_something(input_data)

Legal Considerations

When You CAN Reimplement:

If the repository has a permissive license:

  • MIT, BSD, Apache 2.0 allow derivative works
  • You can study the code and rewrite in Python
  • Must maintain original license and attribution

If implementing a published algorithm:

  • Scientific papers describe algorithms
  • Mathematical formulas are not copyrightable
  • You can implement the published method

If using clean room with TWO people:

  • Person A reads FORTRAN, writes spec (no code)
  • Person B reads spec only, writes Python
  • Person B never sees FORTRAN code

When You CANNOT:

Copyright infringement:

  • Copying proprietary code without permission
  • Violating restrictive licenses (GPL requires keeping GPL)
  • Translating code line-by-line without authorization

Patent issues:

  • Some algorithms may be patented
  • Implementation could violate patents regardless of clean room

What I Can Help You With

1. Algorithm Understanding

I can analyze FORTRAN code and explain:
- What algorithm it implements
- Mathematical basis
- Key computational steps
- Expected inputs/outputs
- Performance characteristics

2. Specification Writing

I can create detailed specifications:
- Function signatures
- Data structures
- Algorithm pseudocode
- Test cases
- Validation criteria

3. Python Implementation Guidance

# I can suggest Python approaches:
def suggested_implementation():
    """
    Guidance on:
    - Appropriate Python libraries (NumPy, SciPy)
    - Data structures to use
    - Optimization strategies
    - Testing approaches
    """
    pass

4. What I Won't Do

❌ Help circumvent copyrights or licenses
❌ Claim a line-by-line translation is "clean room"
❌ Implement proprietary algorithms without permission

Practical Example

If you give me a FORTRAN seismology code, I can:

## Step 1: Analysis
- Identify it computes focal mechanisms from first motions
- Uses grid search over strike/dip/rake
- Outputs beach ball plots

## Step 2: Specification
- Algorithm: Grid search optimization
- Inputs: Station azimuths, polarities, uncertainties
- Outputs: Best-fit focal mechanism parameters
- Constraints: Physical bounds on angles

## Step 3: Recommendation
"This algorithm is already implemented in ObsPy's 
obspy.imaging.beachball module and Pyrocko's
moment tensor tools. Suggest using those instead 
of reimplementing."

## Step 4: If You Must Reimplement
Provide architectural guidance:
- Use NumPy for grid computation
- Use matplotlib for visualization
- Structure as class with methods
- Include unit tests

Bottom Line

Can I do what you're asking?

  • Technically: Yes, I can read FORTRAN and write Python
  • Legally as "clean room": No, because I'd see both versions
  • Practically useful: Yes, I can create specs and guide implementation

Best approach:

  1. Check if the code is already ported to Python
  2. Check the license - if permissive, just port it directly
  3. If you need clean room, have me write the spec, then you implement without looking at FORTRAN
  4. Consider using f2py or subprocess wrappers instead

Would you like me to analyze a specific FORTRAN repository? I can:

  • Explain what it does
  • Document the algorithms
  • Suggest Python equivalents
  • Write implementation guidance

Just share the repository URL and I'll help within legal and ethical bounds!


Exchange 2: The Two-Session Workaround

User:
What if I just presented the spec to Claude without the memory of generating the spec?

Assistant:
Excellent question! This is actually a clever workaround that addresses the "clean room" problem in an interesting way.

Does This Work?

Technically: Yes, with caveats
Legally: Probably, but not perfectly

Let me break this down:

The Two-Claude Approach

Scenario:

Session 1 (Claude A): 
  FORTRAN code → Reads and analyzes → Writes specification

Session 2 (Claude B - fresh chat):
  Specification only → Implements Python → No memory of FORTRAN

Does Claude B "remember" the FORTRAN?

No, in the technical sense:

  • ✅ Each new chat session has no memory of previous sessions
  • ✅ Claude B genuinely hasn't "seen" the FORTRAN code
  • ✅ Claude B only has the specification to work from
  • ✅ This mimics the two-person clean room approach

But there are complications:

Legal Gray Areas

Problem 1: Same AI, Same Training

Both "Claudes" are the same model with the same training:

  • If the FORTRAN code was in my training data, both sessions "know" it
  • I might unconsciously reproduce patterns from training
  • Not truly independent like two different humans

Problem 2: Specification Leakage

If the spec is too detailed, it might contain implementation details:

❌ BAD SPEC (Too detailed):
"Line 45: Use DO loop from I=1 to N incrementing by 1
 Store result in ARRAY(I) = SQRT(X(I)**2 + Y(I)**2)
 Use DSQRT for double precision"

✅ GOOD SPEC (Implementation-agnostic):
"Compute Euclidean distance from origin for each point
 Input: x and y coordinate arrays
 Output: distance array
 Precision: Double precision floating point"

The bad spec is basically transliteration; the good spec describes the what, not the how.
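
To make the contrast concrete, here is a minimal sketch of what someone handed only the good spec might write. The function name `distances_from_origin` and the choice of plain Python lists are assumptions for this example. Nothing in it mirrors the DO loop, the `ARRAY(I)` indexing, or `DSQRT` from the bad spec, because the good spec never mentions them.

```python
import math


def distances_from_origin(xs, ys):
    """Return the Euclidean distance from the origin for each (x, y) pair."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same length")
    # Python floats are IEEE 754 doubles, which covers the spec's precision line.
    return [math.hypot(float(x), float(y)) for x, y in zip(xs, ys)]
```

With NumPy arrays, `np.hypot(x, y)` would be an equally valid reading of the same spec, which is the point: the good spec leaves that choice to the implementer.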

Problem 3: Legal Precedent Unclear

Clean room implementation has legal precedent with humans:

  • Courts recognize two different people = independent creation
  • No case law yet on AI "sessions" as "different entities"
  • Unclear if this would hold up legally

How to Make This More Defensible

Approach 1: Extremely Abstract Specification

Session 1 - Analysis:

## High-Level Specification

**Purpose:** Compute seismic ray paths through layered velocity model

**Inputs:**
- Source location (lat, lon, depth)
- Receiver locations (array of lat, lon, elevation)
- Velocity model (1D layered structure)

**Outputs:**
- Travel times for each source-receiver pair
- Ray parameter
- Takeoff angles

**Algorithm Type:** Numerical ray tracing using Snell's law

**Mathematical Basis:**
- Snell's law: p = r·sin(i)/v
- Ray parameter conservation
- Integration along ray path

**Requirements:**
- Handle velocity discontinuities
- Support multiple seismic phases (P, S)
- Accuracy: < 0.1 second error for teleseismic distances

Session 2 - Implementation:

# Claude B sees only the above spec
# Implements using standard numerical methods
# Chooses own data structures and algorithms

This is defensible because:

  • ✅ Describes functionality, not implementation
  • ✅ Based on published physics (Snell's law)
  • ✅ No FORTRAN-specific details
  • ✅ Multiple valid implementations possible

Approach 2: Reference Published Algorithms

Session 1:

## Specification

The code implements the algorithm described in:
- Buland, R., & Chapman, C. H. (1983). "The computation 
  of seismic travel times." Bulletin of the Seismological 
  Society of America, 73(5), 1271-1302.

Specifically implements:
- Section 3.2: Flat-earth transformation
- Section 4.1: Ray parameter calculation
- Section 4.3: Travel time integration

**Input format:** As described in Table 1 of paper
**Output format:** As described in Table 3 of paper

Session 2:

# Implement based on published paper + spec
# This is clearly legal - implementing published science

Why this works:

  • ✅ Published algorithms are not copyrightable
  • ✅ Mathematical formulas are facts
  • ✅ Clear independent source (the paper)

Approach 3: Add Human Review

Session 1 (Claude A): FORTRAN → Specification
         ↓
Human reviewer: Check spec is abstract enough
         ↓
Session 2 (Claude B): Specification → Python
         ↓
Human reviewer: Verify no FORTRAN patterns copied

The human review adds defensibility.
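
A human reviewer could get a first pass from a small script before reading the spec closely. The sketch below flags lines that look like they describe code rather than behavior. The pattern list and the name `find_spec_leaks` are hypothetical and deliberately crude; a clean result would not show that a spec is abstract enough, only that these obvious markers are absent.

```python
import re

# Hypothetical markers of implementation detail leaking into a spec.
LEAK_PATTERNS = {
    "line reference": re.compile(r"\bline\s+\d+\b", re.IGNORECASE),
    "FORTRAN loop": re.compile(r"\bDO\s+loop\b", re.IGNORECASE),
    "FORTRAN intrinsic": re.compile(r"\b(DSQRT|DSIN|DCOS|DABS)\b"),
    "array indexing": re.compile(r"\b[A-Z][A-Z0-9_]*\(I\)"),
}


def find_spec_leaks(spec_text):
    """Return (line_number, label) pairs for spec lines that match a marker."""
    findings = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        for label, pattern in LEAK_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

Run against the two example specs from Problem 2, this would flag every line of the bad spec and none of the good one.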

Practical Workflow

Step-by-Step Process:

Session 1 - Specification Generation:

You: "Here's FORTRAN code from [repo]. Write an abstract 
     specification that describes WHAT it does, not HOW."

Claude A: [Generates abstract spec focused on:
          - Mathematical basis
          - Input/output contracts
          - Algorithm category (not specific implementation)
          - Test cases
          - Performance requirements]

Between Sessions:

You review the spec and ask:
- Does it describe functionality or implementation?
- Could someone implement this differently?
- Are there FORTRAN-specific details? (Remove them)
- Is it based on published methods? (Cite them)

Session 2 - Fresh Chat:

You: "Implement this specification in Python. Choose your own
     data structures and algorithms."

Claude B: [Implements using:
          - Modern Python idioms
          - NumPy/SciPy where appropriate
          - Different algorithm choices where equivalent
          - Pythonic structure (not FORTRAN translated)]

What Makes This Legitimate

The Spec Must Be:

Functional, not procedural

Good: "Calculate eigenvalues of matrix"
Bad:  "Use QR decomposition with Givens rotations"

Domain knowledge, not code translation

Good: "Apply Snell's law at velocity boundaries"
Bad:  "In loop at line 234, use formula A = B*SIN(C)/D"

Multiple implementations possible

Good: "Find minimum of function using optimization"
Bad:  "Use steepest descent with step size 0.01"

Based on public algorithms

Good: "Implements Aki-Richards approximation (Aki & Richards, 1980)"
Bad:  "Uses custom unpublished method from FORTRAN code"

Comparison to Human Clean Room

| Aspect | Two Humans | Two Claude Sessions |
| --- | --- | --- |
| Different entities | Yes (legally distinct) | Unclear (same AI) |
| Independent knowledge | Yes | Same training data |
| No memory transfer | Yes | Yes (sessions isolated) |
| Different implementations | Likely | Possibly similar style |
| Legal precedent | Established | None yet |
| Defensibility | Strong | Moderate |

My Recommendation

This approach is BETTER than having me do both in one session, but:

To maximize defensibility:

  1. Keep Session 1 spec very abstract

    • Cite published papers
    • Describe math, not code structure
    • Focus on "what" not "how"
  2. Add human steps:

    • Human reviews spec for abstraction
    • Human verifies Session 2 output is novel
    • Human can testify to process if needed
  3. Document the process:

    • Save both sessions
    • Show spec is abstract
    • Demonstrate independent implementation
  4. Check the license first:

    • If MIT/BSD/Apache → just port it directly
    • If GPL → you must keep GPL
    • If proprietary → need permission anyway
  5. Consider if it's worth it:

    • Is there already a Python version?
    • Can you use f2py wrapper instead?
    • Is the algorithm in published papers?
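
For the "Document the process" step, one possible way to show later that Session 2 received exactly the reviewed spec is to record a hash of it alongside the reviewer's notes. This is only a sketch: the function names and the record fields are assumptions for this example, and nothing here has been tested as legal evidence.

```python
import hashlib
import json


def spec_fingerprint(spec_text):
    """Return a SHA-256 hex digest of the spec exactly as handed to Session 2."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def build_process_record(spec_text, reviewer, notes):
    """Bundle the spec fingerprint with who reviewed it and what they checked."""
    record = {
        "spec_sha256": spec_fingerprint(spec_text),
        "reviewer": reviewer,
        "notes": notes,
    }
    # sort_keys keeps the output stable so records can be compared or diffed
    return json.dumps(record, indent=2, sort_keys=True)
```

The JSON string can be saved next to the two session transcripts; any later change to the spec text would produce a different fingerprint.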

Bottom Line

Your idea is clever and better than one session, but:

Best ✅: Human A writes spec → Human B implements
Good ✅: Claude writes abstract spec → Human implements

Okay ⚠️: Claude Session 1 → spec → Claude Session 2 (with very abstract spec)
Risky ❌: Claude does both in one session
Bad ❌: Line-by-line translation regardless of sessions

Want to try it? Give me a FORTRAN repository and I'll:

  1. Write a maximally abstract specification
  2. You review it for implementation details
  3. Start a new chat and have that Claude implement it
  4. Compare to see if the implementations differ significantly

This would be an interesting experiment in AI clean room development!
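
For the comparison in step 4, a crude starting point is a token-level similarity score between two source texts, using the standard library's `difflib`. The function name `token_similarity` and the tokenizer regex are assumptions for this sketch. A low score would not establish independence and a high score would not establish copying; it is only a quick signal for where a human should look more closely.

```python
import difflib
import re


def token_similarity(source_a, source_b):
    """Return a 0.0-1.0 similarity ratio between two source texts.

    Texts are lowercased and split into identifiers, numbers, and single
    punctuation characters, so whitespace and letter case are ignored.
    """
    pattern = r"[a-z_]\w*|\d+|\S"
    tokens_a = re.findall(pattern, source_a.lower())
    tokens_b = re.findall(pattern, source_b.lower())
    return difflib.SequenceMatcher(None, tokens_a, tokens_b).ratio()
```

Because it works on raw tokens, the score is most meaningful when both texts are in the same language, for example two Python implementations produced from the same spec.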


Summary

This transcript contains the complete discussion about clean room implementation using AI, specifically:

  1. Initial Question: Can Claude write a spec from FORTRAN and implement it in Python as a clean room implementation?

  2. Initial Answer: Explained that while technically possible, it's not legally a true clean room because the same entity (Claude) would see both the original code and create the new implementation.

  3. Two-Session Workaround: User proposed using two separate Claude sessions - one to create the spec, another (with no memory) to implement it.

  4. Analysis of Workaround: Detailed explanation of how this approach is better than single-session but still has legal gray areas due to:

    • Same AI model with same training
    • Potential specification leakage
    • Unclear legal precedent for AI "sessions" as separate entities
  5. Best Practices: Recommendations for making the two-session approach more defensible:

    • Keep specifications extremely abstract
    • Add human review steps
    • Reference published algorithms
    • Document the process
    • Check licenses first
  6. Comparison: Table comparing two-human clean room (legally established) vs two-session AI approach (legally uncertain)

  7. Conclusion: The two-session approach is better than single-session but not as defensible as true human clean room. Best combined with human review and very abstract specifications.


Key Takeaways

What Makes a Good Specification for Clean Room:

  • Describes what to accomplish, not how to implement
  • Multiple valid implementations possible
  • Based on published algorithms or mathematical principles
  • No code-specific details (no line numbers, variable names, loop structures)
  • Cites academic papers when applicable

Defensibility Hierarchy:

  1. Most Defensible: Two different humans with documented process
  2. Good: AI creates abstract spec → Human implements
  3. Moderate: AI Session 1 (abstract spec) → AI Session 2 (implementation)
  4. Risky: Single AI session does both
  5. Indefensible: Line-by-line translation regardless of method

Alternative Approaches to Consider:

  • Check if Python port already exists
  • Use permissive open-source licenses directly
  • Implement from published scientific papers
  • Use f2py to wrap FORTRAN code
  • Call FORTRAN as subprocess from Python

Legal Considerations:

  • Permissive licenses (MIT, BSD, Apache) allow derivative works
  • Published algorithms and math formulas are not copyrightable
  • GPL code must stay GPL
  • Patents are separate issue from copyright
  • No case law yet on AI sessions as separate entities
