
Vitalii K

Approaching Refactoring with Thinking models

While writing an MVP seems easy with modern AI tools, working on a real-world project remains a highly senior task.
Let's see how thinking models like O1 and Gemini-thinking can be useful for complex tasks such as analyzing and refactoring a real-world project.

Why do refactoring with AI?

  1. For AI-developed projects: While Cursor Composer can create whole features, it tends to forget about the bigger picture, such as architecture and system design. This article shows how thinking models can be employed to address this weakness by splitting refactoring into a list of smaller tasks that Composer should be able to handle.

  2. For regular projects: AI can do a full project code review in seconds and provide meaningful improvement suggestions from a third-person perspective. It can act as a senior team member who will explain their reasoning, answer your questions, and propose solutions.

Let's do it

1. Selecting a model to use:

What are we looking for from our model?

  • Large context window, so you can feed your entire code base or a big chunk of it
  • Thinking capabilities. Ideally, we want it to be able to understand our project
  • It is a one-time job, so we can afford the best available model, and it should not be that expensive

In my case, I will use O1 and Gemini-2.0-flash-thinking-exp. They are among the best models available on the market and are also integrated into Cursor.

2. Selecting a project

I’ve randomly selected screenshot-to-code. While it is definitely not the Linux repo, it is old enough to have accumulated tech debt and is far more complex than the MVP projects people are building with AI.
git clone https://github.com/abi/screenshot-to-code

3. Generating context for the model:

We want to feed the entire project to our model as context. The simplest way is to concatenate everything into one file, and there is already a tool for that: code2prompt.

pip install code2prompt
code2prompt --path ./screenshot-to-code --output project_summary.md

The output will look like project_summary.md
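
If you would rather not pull in another dependency, the same idea fits in a short Python script. This is only a minimal sketch under my own assumptions (which file extensions to keep and which directories to skip are mine, not code2prompt's):

import os

# Minimal stand-in for code2prompt: walk the repo and concatenate
# selected source files into one markdown file, one section per file.
KEEP = {".py", ".ts", ".tsx", ".js", ".css", ".html", ".md", ".json", ".toml"}
SKIP_DIRS = {".git", "node_modules", "dist", "__pycache__"}

def build_summary(repo_path, output_path):
    with open(output_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(repo_path):
            dirs[:] = [d for d in dirs if d not in SKIP_DIRS]  # prune noisy directories
            for name in sorted(files):
                if os.path.splitext(name)[1] not in KEEP:
                    continue
                path = os.path.join(root, name)
                rel = os.path.relpath(path, repo_path)
                out.write(f"## {rel}\n\n")  # file header
                with open(path, encoding="utf-8", errors="ignore") as src:
                    out.write(src.read() + "\n\n")  # file contents

build_summary("./screenshot-to-code", "project_summary.md")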

4. Checking context size

We want to check whether our model can read such a large context file.
pip install token-count
token-count --file project_summary.md

For me it says 332,387 tokens, while O1 can read 200k tokens and Gemini 1M tokens.
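
If you already have Python around, tiktoken gives a comparable estimate. Again just a sketch: the o200k_base encoding is my assumption for recent OpenAI models, and Gemini uses its own tokenizer, so treat the number as approximate.

import tiktoken  # pip install tiktoken

with open("project_summary.md", encoding="utf-8") as f:
    text = f.read()

# o200k_base is an OpenAI encoding used by newer models; other models tokenize slightly differently
encoding = tiktoken.get_encoding("o200k_base")
print(len(encoding.encode(text)), "tokens")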

5. What if the code base is too big?

I would not say it is easy to work with big projects using AI, and I haven’t seen anyone else do it, but there are at least two things I would try:

  • Summarize the code first: a well-known approach in data analysis that works when applied correctly. Why not replace the implementation with just its API documentation, or employ another AI to produce a short explanation of each module? (See the sketch after this list.)

  • Per-module analysis: As engineers, we have already invented many ways to work with complex systems: application layers, feature modules, libraries, etc. How about defining the expected system design and checking whether each module fits it?
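
To make the first idea more concrete, here is a rough sketch of per-module summarization with the OpenAI Python SDK. The model name, the prompt, and the backend/ directory layout are my own assumptions for illustration, not something taken from the screenshot-to-code repo:

from pathlib import Path
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_module(module_dir):
    # Concatenate the module's sources and ask a cheap model for API-style documentation.
    sources = "\n\n".join(
        f"# {path}\n{path.read_text(errors='ignore')}"
        for path in sorted(module_dir.rglob("*.py"))
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any cheap, capable summarizer will do here
        messages=[{
            "role": "user",
            "content": "Summarize this module as short API documentation: "
                       "list the public functions and what they do.\n\n" + sources,
        }],
    )
    return response.choices[0].message.content

# One short summary per backend package instead of the full sources.
backend = Path("./screenshot-to-code/backend")
summaries = {d.name: summarize_module(d) for d in backend.iterdir() if d.is_dir()}

The resulting summaries can then be concatenated into a much smaller project_summary.md and fed to the thinking model exactly as before.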

6. Prompt engineering

Prompt engineering is another beast. You can check some open-source prompt collections, but I would suggest using your 🧠 to think through and explain what you want as a result and what specific context you know about the project. AI can do a lot, but it cannot read your mind.

After playing a bit, I've come up with this one:

Here's our current code base @project_summary.md. Can you propose improvements or a refactoring plan? Give me a bullet list of such improvements with priorities (High/Medium/Low) and a short explanation of why this improvement is needed and what has to be done for each item.

7. Generating a refactoring plan

I've used a new Cursor chat for each model with the same prompt and saved responses to files like *_suggestions.md.

Reviewing Results:

Below is a table of the summarized improvements and my subjective ranking.

| Improvement | O1 (priority) | Gemini-2.0-flash-thinking-exp (priority) | Subjective ranking |
| --- | --- | --- | --- |
| Centralize Configuration Files | ❌ | ✅ (High) | (Medium) Makes sense; configs will be easier to support in the future |
| Improve Component Organization in Frontend | ✅ (High) | ✅ (High) | (High) Definitely worth doing; it will only get worse in the future |
| Standardize Naming Conventions / Linter Rules | ✅ (Medium) | ✅ (Medium) | (High) A good linter should be able to solve it |
| Group Backend Routes | ❌ | ✅ (Medium) | (Medium-Low) Only if we are planning to add more routes |
| Review Utility and Helper Functions | ✅ (Medium) | ✅ (Low) | (Medium) Why are there two places for utility functions at all? |
| Consolidate Test Directories | ✅ (Low) | ✅ (Low) | (Medium) Definitely yes |
| Add Consistent Error Handling | ✅ (High) | ❌ | (Medium) Good point, especially if we are aiming for good code quality and not just a demo |
| Improve TypeScript Strictness | ✅ (High) | ❌ | (Low) Up to the developers' taste; TypeScript strictness is already kind of painful for some people |
| Extract Reusable Layout Components | ✅ (Medium) | ❌ | (Medium) Sounds good, but need to take a look at what can be reused |
| Improve Comments and Documentation | ✅ (Low) | ❌ | (Medium) Always a trade-off between development speed and documentation; if only there were a way to employ AI to do it |
| Optimize for Performance Where Relevant | ✅ (Low) | ❌ | (Low) Only spend time on it if there is a clear bottleneck |
| Enhance Testing Coverage | ✅ (Low) | ❌ | (Medium) Will help reach product maturity; maybe let AI do this too, never seen a developer willing to add tests when a feature is already working |

How viable are those suggestions?

Almost all of them make sense (at least to me) and are worth considering, depending on my project's needs.

Was it able to understand the project?

Hard to say for sure; even the number of suggestions differs between the two. Still, both models highlighted the same four areas of improvement:

  • (High) Component Organization in Frontend
  • (Medium/High) Standardize Naming Conventions / Linter Rules
  • (Medium/Low) Review Utility and Helper Functions
  • (Low) Consolidate Test Directories

So there is definitely some consistency and depth of understanding of the project.

Also, I guarantee you that if two developers were asked the same question, they would have two quite different lists. In any case, you will need to consolidate those answers and prioritize them based on team priorities, workload, and upcoming goals.

Conclusion

So, did I get what I needed? The answer is largely yes: I gained a clear, prioritized refactoring roadmap and saved time scoping out big changes.
Refactoring remains an iterative process grounded in your team’s experience, code reviews, and business goals. In that sense, AI isn’t a silver bullet; it’s a catalyst.
Like an experienced team member, 🤖✨ can offer an outside perspective, explain its reasoning, and turn its suggestions into action plans.
Use it wisely, and you’ll find yourself more in control of your codebase than ever before.
