"AI-Powered Development: Building a Java/Python LoRA Model Without Writing a Single Line of Code"

Hello everyone,

I am an aspiring software engineer from Japan, currently transitioning from a non-technical background and seeking new opportunities.

This article serves as my portfolio, documenting an experiment I conducted to answer a single question: "Can you build a fine-tuned LLM without writing a single line of code yourself?"

As English is not my first language, this post has been carefully translated and refined from the original Japanese version, which you can find on Qiita here.

My goal is to share the valuable lessons I learned about AI-driven development with a global audience. The complete source code is also available on GitHub.

Introduction

I'm currently enrolled in a vocational training program for Java and Python development in Japan.
In this era of rapid AI evolution, I found myself asking: "How should engineers approach coding in the AI age?"
To explore this question, I embarked on an experimental project: "Having AI write all the code for LoRA fine-tuning."

Through this challenge, I gained clarity on the skills required for AI-era engineers and what we should focus on learning.
In this article, I'll share the insights and practical lessons learned from this journey.

🔗 Table of Contents

1. Why I Decided to Have AI Write LoRA
2. Tools Used
3. Implementation Approach
4. What I Had AI Write
5. Challenges and AI's Mistakes
6. Performance Evaluation
7. Reflection
8. Conclusion


1. Why I Decided to Have AI Write LoRA

The Catalyst

I noticed that existing generative AI models struggle with niche, highly specialized content, and realized that this could be addressed efficiently through additional training on specialized data.

Once I understood that incorporating custom training data could solve the problem, I explored ways to achieve it and adopted LoRA: an efficient training method that specializes a model on prepared data by training only a small set of additional parameters while keeping the pre-trained weights frozen.
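To make that concrete, here is a minimal sketch of the LoRA idea using Hugging Face's peft library; the model id and hyperparameters are illustrative, not the exact values from this project:

```python
# Minimal sketch of the LoRA idea with Hugging Face's peft library.
# Model id follows the article's naming (verify the exact id on Hugging Face);
# hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("elyza/ELYZA-japanese-CodeLlama-7b")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Only the small adapter matrices receive gradients, which is why a 7B model becomes trainable on a single consumer GPU.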

Why Have AI Write It?

Writing LoRA code from scratch is extremely challenging for a programming beginner, but we have a modern tool at our disposal: AI.
That led me to ask: "Can this challenge be solved by having AI write everything?" So I put it into action.


2. Tools Used

  • Development Environment
    • Cursor: using the Agent feature with Claude 3.7 Sonnet
    • Claude Desktop: used alongside Cursor for error handling and to spread token usage

I have monthly subscriptions to both Cursor and Claude Pro.

For questions about errors and issues, I used Claude Desktop instead of Cursor's built-in Ask feature.
*Note: During the later stages of development, Claude 4.0 was released, so issues encountered from that point were addressed using Claude 4.0.

  • Why I Didn't Use Claude Code, Windsurf, or Devin
    • Token usage was unpredictable during the planning phase
    • Claude Code wasn't available for Pro users at that time
    • I had already subscribed to Cursor and Claude Pro plans

3. Implementation Approach

First, I outlined the roadmap to implementation:

  • [1] Prerequisites and Setup
  • [2] Downloading the Base Model
  • [3] Formatting Custom Datasets
  • [4] Executing LoRA Fine-tuning
  • [5] Implementing Inference Code
  • [6] Deploying Model and Results

[1] Prerequisites and Setup

Since I was learning Java and Python, I decided to build an LLM specialized in these languages.
The envisioned end product was a conversational AI similar to ChatGPT.

I decided to use Docker to publish the final product on GitHub and enable others to reproduce the same environment.
Additionally, since GPU usage was necessary, I defined CUDA utilization as a prerequisite.
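As a quick sanity check before any training, it's worth confirming the container actually sees the GPU; a minimal PyTorch check:

```python
import torch

# Confirm the container sees the GPU before starting CUDA-dependent work.
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 4070 Ti SUPER"
else:
    print("CUDA not visible; check the NVIDIA runtime / --gpus flag")
```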

[2] Downloading the Base Model

I selected ELYZA-japanese-CodeLlama-7b.

The selection criteria: a 7B model suited my PC specs, the model has strong code generation capabilities, and it was pre-trained with a focus on Japanese.

  • Alternatives Considered:

    • Mistral-7B: High versatility but inferior in code specialization
    • Gemma-7B: Similarly more general-purpose
    • Larger models: Excluded due to VRAM resource constraints
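For reference, pulling a model to local disk is typically a one-liner with huggingface_hub; the repo id below follows the model name as given in the article, so verify the exact id on Hugging Face:

```python
# Download the base model weights, tokenizer, and config to the local cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("elyza/ELYZA-japanese-CodeLlama-7b")
print(local_dir)  # local path containing the downloaded files
```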

[3] Formatting Custom Datasets

I started by extracting data through web scraping from the following three sources:

  • GitHub
  • Qiita
  • AtCoder from CodeContests

The selection criteria were as follows:

  • GitHub
    • Limited to highly-rated repositories with 1000+ stars
    • High-quality code that has undergone code review
    • Learning production-level code structures
  • Qiita
    • Abundant technical explanations in Japanese
    • Enhanced learning effect through code-explanation pairs
    • Strengthening Japanese code generation capabilities
  • AtCoder
    • Code patterns for algorithmic thinking
    • Efficient code under constraints
    • Practical solutions from competitive programming

Qiita was initially a scraping target but was later removed due to dataset quality issues, which I'll detail later.

After extraction, I converted these into JSON format for fine-tuning.
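The article doesn't reproduce the generated formatting code, but a typical instruction-style JSONL conversion looks roughly like this sketch; the field names and record layout are my assumptions, not the project's actual schema:

```python
# Hypothetical sketch: converting scraped code samples into instruction-style
# JSONL for fine-tuning. Field names are assumptions, not the project's schema.
import json

def to_record(source: str, language: str, code: str, description: str) -> dict:
    return {
        "instruction": f"Write {language} code for the following task.",
        "input": description,
        "output": code,
        "meta": {"source": source, "language": language},
    }

samples = [
    to_record("github", "python", "def add(a, b):\n    return a + b", "Add two numbers."),
]

with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for rec in samples:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```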

  • [4] Executing LoRA Fine-tuning
  • [5] Implementing Inference Code
  • [6] Deploying Model and Results

These three steps ran into issues, so I'll describe them in section 5 below.
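For orientation before the error stories, the training step itself typically boils down to something like the following sketch with transformers, datasets, and peft; the hyperparameters, paths, and field names are illustrative, not the project's actual generated code:

```python
# Orientation sketch of a LoRA training run. Hyperparameters, paths, and the
# "instruction"/"output" fields are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

base_id = "elyza/ELYZA-japanese-CodeLlama-7b"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(base_id),
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

dataset = load_dataset("json", data_files="dataset.jsonl")["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["instruction"] + "\n" + ex["output"],
                                           truncation=True, max_length=1024))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./lora-out", per_device_train_batch_size=4,
                           num_train_epochs=3, fp16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("./lora-out")  # writes only the adapter weights
```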


4. What I Had AI Write

Everything!

Here's the initial prompt I provided:

Full Prompt

Please build according to the following requirements.
Implementation is intended for Ubuntu under Docker environment, so please generate Dockerfile and Docker Compose as well.
Also, don't forget CUDA setup as we'll be using GPU for tuning.
Expected GPU specs: RTX 4070 Ti Super with 16GB VRAM.
Expected physical memory: 32GB.
Expected CPU: i7-14700K.

Download ELYZA-japanese-CodeLlama-7b (URL omitted in this article) to local environment and build a local CodeLLaMA by LoRA fine-tuning with Java/Python data.
Required datasets are as follows:
- GitHub
- Qiita
- AtCoder from CodeContests
These will be scraped.

Also, scraping intervals should be 1 second for GitHub and 2 seconds for Qiita.
Total scraping count should be 100 items each for both GitHub and Qiita.

For AtCoder data, please scrape from the following URL:
[URL omitted in this article]

API tokens for CodeLLaMA, GitHub, and Qiita will be input by the user later.
Since users will input environment variable settings, please include environment variables in the code.

Next, proceed to dataset formatting.
Please output code to format the scraped datasets into JSON.

After formatting, please output code for LoRA fine-tuning.
Then implement inference code.

Finally, document all these procedures including execution commands in a README.MD file.

Of course, this alone didn't work perfectly.


5. Challenges and AI's Mistakes

Docker Compose Issues

  • Problems:
    • The generated Docker Compose didn't properly allocate CPU and memory
    • AtCoder decompression was estimated to take about 26-27 hours
  • Root Cause:
    • Despite providing host PC specs, appropriate Docker Compose wasn't generated
  • Solution:
    • Added supplementary prompts considering WSL environment + memory/GPU/CPU allocation instructions
    • Since I was using a WSL→Docker connection, I also optimized .wslconfig (manual adjustment)

Here's the corrective prompt I provided:

Docker Compose Reconfiguration Prompt

The current Docker Compose configuration doesn't fully utilize the host PC's specifications.
Host PC environment:
i7-14700K with 20 cores (8P+12E), 64GB memory, GPU is RTX 4070 Ti Super (16GB VRAM).
Please reconfigure to use 16 CPU cores, up to 48GB memory maximum, and allocate 10GB GPU memory.
Since the environment is built with Host PC→WSL→Docker connection, please leave about 4GB memory for WSL.
RTX 4070 Ti Super is a non-MIG compatible device, so please allocate 10GB memory using an alternative approach.

After the fix, the build properly utilized the host PC's resources.
In hindsight, I should have checked the generated configuration against the hardware more carefully, a point for reflection.
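For reference, resource caps like the ones in the prompt are usually expressed in Compose roughly as below; the service and image names are placeholders, and only the limits mirror the prompt:

```yaml
# Sketch of Compose resource limits matching the prompt above.
# Service and image names are placeholders.
services:
  lora:
    image: lora-finetune:latest   # hypothetical image name
    deploy:
      resources:
        limits:
          cpus: "16"              # 16 of the 20 host cores
          memory: 48gb            # leave headroom for WSL on the host
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Note that Compose alone can't cap GPU memory at 10GB on a non-MIG card; that is usually handled inside the training process instead, for example with PyTorch's torch.cuda.set_per_process_memory_fraction.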

API Errors During GitHub Scraping

  • Problems:
    • Frequent API errors due to ambiguous scraping constraints
  • Root Cause:
    • Undefined constraints that should have been determined
    • Improper filtering settings
  • Solution:
    • Clearly specified scraping targets
    • Improved filtering and sorting to enhance scraping accuracy
    • Defined appropriate resource allocation
    • Rewrote with additional constraint instructions based on the above

Constraints Prompt

Please add the following constraints to the GitHub scraper:
- API interval should be 1 second
- Skip files with encoding: none
- Add file size limit (process only files under 1MB)
- Implement filtering to detect only .py and .java files
- Extract only repositories with 1000+ stars
- Set default maximum directory traversal depth to 2
- Since resources are ample, set parallel processing that won't hit API limits

These constraints helped eliminate waste and optimize processing.
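To illustrate, here is a condensed sketch of those constraints applied against the GitHub REST API; it is not the AI-generated code from the project, and token handling and pagination are simplified:

```python
# Condensed sketch of the constraints above on the GitHub REST API.
# Not the project's generated code; pagination and retries are omitted.
import os
import time
import requests

API = "https://api.github.com"
HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}

def search_repos(language: str, min_stars: int = 1000, per_page: int = 10):
    """Return repositories with at least `min_stars` stars for a language."""
    params = {"q": f"language:{language} stars:>={min_stars}",
              "sort": "stars", "per_page": per_page}
    resp = requests.get(f"{API}/search/repositories", headers=HEADERS, params=params)
    resp.raise_for_status()
    return resp.json()["items"]

def fetch_files(repo_full_name: str, path: str = "", depth: int = 0, max_depth: int = 2):
    """Walk a repo tree up to max_depth, yielding paths of small .py/.java files."""
    resp = requests.get(f"{API}/repos/{repo_full_name}/contents/{path}", headers=HEADERS)
    resp.raise_for_status()
    time.sleep(1)  # 1-second API interval, per the constraints
    for entry in resp.json():
        if entry["type"] == "dir" and depth < max_depth:
            yield from fetch_files(repo_full_name, entry["path"], depth + 1, max_depth)
        elif (entry["type"] == "file"
              and entry["name"].endswith((".py", ".java"))
              and entry.get("size", 0) < 1_000_000):  # skip files of 1MB or more
            yield entry["path"]
```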

Fine-tuning Errors

  • Problem:
    • TensorBoard not installed error
  • Root Cause:
    • TensorBoard wasn't installed
  • Solution:
    • Resolved by installing TensorBoard

The AI-generated code was missing the necessary library installation definition.
It was an interesting discovery that AI makes human-like mistakes and isn't perfect.
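For context, transformers' Trainer only needs TensorBoard when logging is routed to it, which is easy to miss in a requirements file; a minimal illustration (values are placeholders):

```python
# Why the error occurs: Trainer imports tensorboard only when logging is
# routed to it. Values below are placeholders.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./lora-out",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    logging_dir="./logs",
    report_to="tensorboard",   # requires `pip install tensorboard` at runtime
)
```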

Multiple Errors During Inference

  • Problems:
    • Failed when loading LoRA Adapter for inference
    • First and second training attempts produced responses that endlessly repeated "oreferrer" (likely a fragment of the HTML rel="noreferrer" attribute left in the scraped data)
  • Root Cause:
    • Dataset cleaning wasn't optimized
    • Base model produced good results, confirming the issue was on the LoRA side
  • Solution:
    • Re-cleaned the dataset and re-executed LoRA training

The issue was resolved after the third training attempt.
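For reference, loading a trained adapter for inference with peft typically looks like the sketch below; paths and generation settings are illustrative:

```python
# Minimal sketch of loading a trained LoRA adapter for inference with peft.
# Paths and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "elyza/ELYZA-japanese-CodeLlama-7b"   # base model as named in the article
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16,
                                            device_map="auto")

# The adapter directory must contain adapter_config.json and the adapter
# weights; a mismatch here is a common cause of load failures.
model = PeftModel.from_pretrained(base, "./lora-out")
model.eval()

inputs = tokenizer("Write a Python function that reverses a string.",
                   return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```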

Benchmark Errors

  • Problems:
    • All questions failed during HumanEval benchmark testing with LoRA
    • Simple tests close to the training data showed good results
  • Root Cause:
    • LoRA trained on 3-space indented code while Python PEP 8 standard is 4 spaces
    • Failed to properly incorporate Qiita articles into the dataset
    • Determined the issue was with the trained model quality, not the evaluation method
  • Solution:
    • Decided to rebuild from dataset formation, excluding Qiita
    • Instructed code modifications for re-formation and re-training accordingly

Note: While I understood intellectually that "dataset quality determines model performance," this failure made me acutely aware of its importance. This was a valuable lesson that could only be gained through practice.
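As an illustration of the kind of cleaning involved, a filter like the following sketch would catch both the odd indentation and stray HTML artifacts; it is not the project's actual script, and the "output" field name is assumed from the earlier formatting sketch:

```python
# Illustrative cleaning pass, not the project's actual script: drop samples
# whose indentation is not a multiple of 4 spaces (PEP 8) and strip stray
# HTML link artifacts such as "noreferrer" left over from scraping.
import json
import re

def indentation_ok(code: str) -> bool:
    """Reject code whose leading indentation is not a multiple of 4 spaces."""
    for line in code.splitlines():
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if stripped and indent % 4 != 0:
            return False
    return True

HTML_ARTIFACTS = re.compile(r"\b(?:noreferrer|noopener)\b")

def clean(records):
    for rec in records:
        code = HTML_ARTIFACTS.sub("", rec["output"])  # "output" field assumed
        if indentation_ok(code):
            rec["output"] = code
            yield rec

with open("dataset.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

with open("dataset.clean.jsonl", "w", encoding="utf-8") as f:
    for rec in clean(records):
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```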

After re-execution, the project was completed successfully. Let me proceed with the accuracy report.


6. Performance Evaluation

Benchmark

  • Used HumanEval via BigCode's evaluation tooling
  • Executed the standard 164 problems
  • Compared pass@1 across Code Llama 7B, ELYZA-japanese-Llama-7b, our LoRA model, and GPT-4
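Here, pass@1 is the fraction of problems a model solves with a single generated completion per task. For reference, producing the samples that get scored looks roughly like the sketch below, using OpenAI's human-eval package; since the project used BigCode's tooling, treat this as a simplified stand-in rather than the actual evaluation script:

```python
# Simplified sketch of producing samples for HumanEval scoring with OpenAI's
# `human-eval` package; a stand-in, not the project's evaluation script.
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    """Placeholder: call the fine-tuned model here and return only the code body."""
    raise NotImplementedError

problems = read_problems()  # the standard 164 problems
samples = [
    dict(task_id=task_id,
         completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)

# Then score pass@1 from the command line:
#   evaluate_functional_correctness samples.jsonl
```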

Score Comparison

Model comparison of pass@1 scores:

[Figure: pass@1 score comparison by model, as a bar chart (pass1-score-bar-chart.png) and a line chart (pass1-score-line-chart.png)]

While GPT-4's score is in a different league, our tuning steadily improved performance over the base model.
This demonstrates that specialization is effective even with a small-scale dataset,
and I believe the results suggest that a larger dataset and further refinement could close the gap with higher-tier models.

Next, the pie chart shows the degree of performance improvement from the base model through our tuning:

[Figure: degree of improvement over the base model after tuning (lora-improvement-diagram.png)]

(Note: benchmark runs showed score fluctuations of approximately ±3%.)


7. Reflection

What Went Well

  • Completed the Project

The experience of setting a goal and seeing it through to completion was significant.
I was able to build a proper roadmap to the outcome, and feel my approach was generally correct.

  • Written Entirely by AI

I achieved my goal as planned.
When given appropriate prompts, AI can implement solutions to even highly complex problems at a high level.
Understanding architecture and being able to articulate it clearly remains essential for providing appropriate specifications.

  • Embracing New Knowledge

While I had built Java applications before, that work didn't take me far beyond what I already knew.
This project required starting from scratch in a new domain, but I felt no resistance to learning it.

The fact that generative AI alone can accomplish this reinforced a key insight for me: in this new era of creation, "small teams or individuals can rapidly develop valuable services with the right ideas." This realization was a major gain.

Having previously self-taught video editing, I had relatively low resistance to new domains, but this experience further strengthened my ability to approach new challenges from scratch with a positive mindset.

Points for Improvement

  • Underestimating the Scope

The cause was misjudging the scale of the development.
This turned out to be a medium-scale project of roughly 2,000 lines. For larger projects, rather than Cursor's Agent, I should have adopted Claude Code, which excels at parallel development: dividing work into phases and tickets with clear role separation makes debugging and error resolution easier.

While the improvement might seem modest given the small training scale, discovering that even a small dataset can yield gains was valuable.

  • Dataset Selection

Using existing data sources made the dataset relatively easy to assemble.
Experiencing first-hand how careless selection can muddy the data was extremely valuable.

Considering actual operation, I feel I could have tried more niche subjects for dataset formation.
Next time, I plan to tackle development with user experience in mind.

  • Understanding Architecture

My understanding of LLM architecture was insufficient, which seemed to cause frequent errors especially around inference.
With deeper understanding, I could have provided more detailed initial prompts and reduced the amount of rework needed.
In the future, I want to prioritize time for understanding the overall technical landscape before implementation to improve development accuracy.

Challenges

My personal challenge is determining how much AI can be leveraged in actual workplace settings.

  • Main Challenges:
    • Is the configuration suitable for maintenance and operations?
    • Is the code free from spaghetti code patterns?
    • What about security design?
    • AI might not perform well with proprietary frameworks
    • Can it be applied when joining ongoing projects?

These challenges are difficult to see in individual development alone, and I'm eager to learn through practical work experience. My next challenge is understanding how to meet workplace requirements such as code maintainability in team development and security levels demanded by products.

Future Applications

By fine-tuning LLMs, we can adapt them to a wide range of fields.
The success of LoRA fine-tuning depends on how well we can transform highly unique or specialized fields into quality training data. Therefore, deepening knowledge in structuring domain expertise and dataset design becomes a more important differentiating factor than technical implementation.

Next Steps

  • Building on the insights and confidence gained from this project, I'm currently challenging myself with more applied development.

I'm currently building an MCP client, applying professional software development practices to its creation.
Specifically, it's an Android app themed around personalization.
I'm documenting this creation process as well and will publish it later.


8. Conclusion

Through this project, I didn't directly acquire the skill of writing LoRA code from scratch by hand.
However, I was able to question the essence of "writing code" and practice a new form of engineering that maximizes the use of AI as a tool to achieve objectives - and this was the greatest gain of all.

It was also a valuable opportunity to test how far I could push my output capabilities when standing in the engineering arena, and I realized that "learning with AI" is an extremely meaningful approach for me.

Even within the AI trend, LoRA fine-tuning is a topic that allows deep learning of architectural principles and approaches, and the modern environment is sufficiently equipped for beginners to get started.

By touching upon the simple truth that "the quality of questions determines outcomes," I gained significant learning in developing "questioning skills" - something often lacking in beginners.

I hope this helps others who are similarly thinking about creating something with AI.
And I myself want to continue challenging further applications and contributions based on this experience.

Thank you for reading to the end.

If you have any opinions or concerns, please feel free to let me know in the comments.
I would like to incorporate them into future articles and activities.

Contact Information

If you're interested in this project or in me personally, please feel free to contact me.
