TL;DR:
I "vibecoded" a custom Python tool to convert legacy .fb2 e-books into structured Markdown (perfect for LLMs/RAG/Obsidian) and Plain Text. No ads, no bloat, just 150 lines of code.
- The App: Live Streamlit Demo
- The Code:
FitHappensML / fb2-to-md-converter
A lightweight CLI and web tool to convert FB2 books to Markdown or plain text. Includes a built-in Streamlit reader and smart formatting support.
FB2 to TXT/MD Converter & Reader
- Try the Live Demo: fb2-to-md-converter.streamlit.app
- Read the story behind the code: Vibecoding Your Way Out of Format Hell (Medium)
This project is a Python-based utility for converting .fb2 (FictionBook) files into .txt (plain text) or .md (Markdown). The tool offers two interfaces: a user-friendly web UI built with Streamlit for reading and converting, and a command-line interface (CLI) for fast processing and automation.
Key Features
- Dual Interfaces
  - Web UI (Streamlit): Upload your files, read books directly in the browser, and download the result in your desired format.
  - Command-Line Interface (CLI): Quickly convert files from your terminal, perfect for scripting and batch processing.
- Smart Formatting: An option to convert FB2 tags (like subtitles and emphasis) into corresponding Markdown syntax.
- Dual Export Formats: Save your books as clean .txt or as formatted .md files.
- Built-in Reader…
Hello, fellow builders!
If you're anything like me, you probably have a digital hoard of books. In my case, it's a massive collection of .fb2 (FictionBook) files. Solid format, XML-based, widely supported... until you need to feed it into a Large Language Model (LLM).
Here's the problem: LLMs eat text, not XML tags.
I needed to convert my entire library into clean, structured Markdown. I needed headers to actually be headers (### Chapter 1), and emphasis to be italics (*wow*), so the model understands the semantic structure of the narrative.
I looked at existing tools.
- The Desktop Apps: Bloated, require installation, often Windows-only 90s relics.
- The Online Converters: "Upload clean_code.fb2... waiting... Download your file after watching this 30s ad". No thanks.
- The Scripts: Most just strip all tags blindly, turning a beautiful dialogue into a wall of text.
If you are a developer in 2026, you don't hunt for software. You vibecode it.
It is faster to tailor a bespoke suit of a script than to shop for ill-fitting off-the-rack solutions. Plus, when you build it, you own the pipeline.
So, I built my own FB2 to Markdown converter. It has a CLI for batch processing and a Streamlit UI because sometimes I just want to read a chapter in the browser.
Here is how I did it, and how you can do it too.
The Strategy: FB2 is just XML
Don't overcomplicate it. An FB2 file is just an XML file with a specific schema. We don't need a heavy e-book library; we need BeautifulSoup, plus lxml installed for its XML parser.
Here is the core logic. We parse the XML, look for specific tags (<subtitle>, <emphasis>), and map them to Markdown.
The Core Converter
I created a converter.py. The trick is handling nested tags. A paragraph <p> might contain <emphasis> inside it.
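from bs4 import BeautifulSoup
from bs4.element import Tag


def _get_formatted_text(tag: Tag) -> str:
    """Recursively process tags to keep italics and bolding."""
    parts = []
    for item in tag.children:
        if isinstance(item, Tag):
            if item.name == 'emphasis':
                # FB2 <emphasis> becomes Markdown italics
                parts.append(f'*{item.get_text(strip=True)}*')
            elif item.name == 'strong':
                # FB2 <strong> becomes Markdown bold
                parts.append(f'**{item.get_text(strip=True)}**')
            else:
                # Any other inline tag: recurse so nested formatting survives
                parts.append(_get_formatted_text(item))
        else:
            # Plain text node: keep it as-is, including surrounding spaces
            parts.append(str(item))
    return "".join(parts)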
This recursive function is the secret sauce. Unlike a plain get_text() call, which flattens everything into one string, it preserves the vibe of the text.
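To make the difference concrete, here is a small self-contained sketch that runs the same recursion on a one-paragraph sample. The name `format_inline` and the sample string are illustrative, and `html.parser` is used only so the demo does not depend on lxml; the converter itself uses `lxml-xml`.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def format_inline(tag: Tag) -> str:
    """Illustrative copy of the recursive walk: <emphasis> becomes *italics*."""
    parts = []
    for item in tag.children:
        if isinstance(item, Tag):
            if item.name == "emphasis":
                parts.append(f"*{item.get_text(strip=True)}*")
            else:
                parts.append(format_inline(item))
        else:
            parts.append(str(item))
    return "".join(parts)


sample = "<p>She said <emphasis>wow</emphasis> and left.</p>"
p = BeautifulSoup(sample, "html.parser").find("p")

flat = p.get_text()                # formatting is lost
squeezed = p.get_text(strip=True)  # each text node is stripped, then joined
formatted = format_inline(p)       # emphasis survives as Markdown

print(flat)       # She said wow and left.
print(squeezed)   # She saidwowand left.
print(formatted)  # She said *wow* and left.
```

Note the middle result: `get_text(strip=True)` strips every text node before joining them, so words on either side of an inline tag run together. Calling `get_text().strip()` avoids that.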
Then, the main loop allows us to choose between "Raw Text" and "Smart Formatting":
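def convert_fb2_to_txt(fb2_content: str, smart_formatting: bool = False) -> str:
    # The 'lxml-xml' parser requires the lxml package
    soup = BeautifulSoup(fb2_content, 'lxml-xml')
    text_parts = []

    # Extract Metadata (Title, Author)
    description = soup.find('description')
    # ... extraction logic ...

    # Extract Body
    body = soup.find('body')
    if body is None:
        # Malformed file without a <body>: return what we have
        return "".join(text_parts)

    for element in body.find_all(['p', 'subtitle', 'empty-line']):
        if element.name == 'p':
            if smart_formatting:
                text_parts.append(_get_formatted_text(element).strip() + '\n\n')
            else:
                # get_text(strip=True) would glue words around inline tags together
                text_parts.append(element.get_text().strip() + '\n\n')
        elif element.name == 'subtitle':
            # Boom: Semantic headers for the LLM (Markdown mode only)
            prefix = '### ' if smart_formatting else ''
            text_parts.append(f"{prefix}{element.get_text().strip()}\n\n")
        elif element.name == 'empty-line':
            # Keep the author's vertical spacing
            text_parts.append('\n')
    return "".join(text_parts)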
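The metadata step is elided in the function above. As a rough idea of what such a step could look like, here is a hypothetical helper, `extract_metadata`, which is not taken from the original code. It assumes the standard FB2 layout, where `<title-info>` holds `<book-title>` and one or more `<author>` elements with `<first-name>`, `<middle-name>`, and `<last-name>` children. The demo parses with `html.parser` only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup


def extract_metadata(soup: BeautifulSoup) -> dict:
    """Hypothetical helper: read the book title and author names from <title-info>."""
    title_info = soup.find("title-info")
    if title_info is None:
        return {"title": None, "authors": []}

    title_tag = title_info.find("book-title")
    title = title_tag.get_text().strip() if title_tag is not None else None

    authors = []
    for author in title_info.find_all("author"):
        name_parts = []
        for field in ("first-name", "middle-name", "last-name"):
            tag = author.find(field)
            if tag is not None and tag.get_text().strip():
                name_parts.append(tag.get_text().strip())
        if name_parts:
            authors.append(" ".join(name_parts))
    return {"title": title, "authors": authors}


sample = """<FictionBook><description><title-info>
<author><first-name>Leo</first-name><last-name>Tolstoy</last-name></author>
<book-title>War and Peace</book-title>
</title-info></description></FictionBook>"""

meta = extract_metadata(BeautifulSoup(sample, "html.parser"))
print(meta)
```

The returned title and authors could then be written at the top of the output, for example as a Markdown heading followed by an author line.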
The Interface: Streamlit for Instant Gratification
I love CLI, but sometimes I want to drag-and-drop. Streamlit is perfect for this. It takes 5 minutes to build a UI that looks decent.
In app.py, I process the file and offer two flavors of download:
import streamlit as st
from converter import convert_fb2_to_txt

st.title("FB2 Reader & Converter")
uploaded_file = st.file_uploader("Upload .fb2", type=['fb2'])

if uploaded_file:
    # Read the bytes and decode them (this assumes the file is UTF-8)
    content = uploaded_file.getvalue().decode('utf-8')

    # Dual Conversion
    plain_text = convert_fb2_to_txt(content, smart_formatting=False)
    markdown_text = convert_fb2_to_txt(content, smart_formatting=True)

    # The Reader View
    st.markdown(markdown_text)

    # Sidebar Downloads
    st.sidebar.download_button("Download .md", markdown_text, file_name="book.md")
    st.sidebar.download_button("Download .txt", plain_text, file_name="book.txt")
This gives me immediate visual verification. I can see if the <subtitle> tags are actually rendering as headers before I commit to converting my whole library.
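One caveat with the upload step: it decodes the bytes as UTF-8, and many older FB2 files declare a different encoding (windows-1251 is common). A minimal sketch of a more forgiving decode step is below. The helper name `decode_fb2_bytes` is hypothetical, and it assumes the XML declaration sits in the first 200 bytes and is written in an ASCII-compatible encoding.

```python
import re


def decode_fb2_bytes(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode FB2 bytes using the encoding named in the XML declaration, if any."""
    # Peek at the start of the file; the declaration itself is plain ASCII
    head = raw[:200].decode("ascii", errors="ignore")
    match = re.search(r'<\?xml[^>]*encoding=["\']([\w.-]+)["\']', head)
    encoding = match.group(1) if match else fallback
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong declared encoding: fall back and replace bad bytes
        return raw.decode(fallback, errors="replace")


legacy = '<?xml version="1.0" encoding="windows-1251"?><p>Привет</p>'.encode("windows-1251")
modern = "<p>Hello</p>".encode("utf-8")

print(decode_fb2_bytes(legacy))
print(decode_fb2_bytes(modern))
```

In app.py this would stand in for the `.decode('utf-8')` call, for example `content = decode_fb2_bytes(uploaded_file.getvalue())`.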
The CLI: For the Serious Batching
Finally, cli.py. Because I'm not going to drag-and-drop 500 books.
import argparse
from converter import convert_fb2_to_txt

# ... setup format args ...

if args.format == 'md':
    result = convert_fb2_to_txt(content, smart_formatting=True)
else:
    result = convert_fb2_to_txt(content, smart_formatting=False)
Now I can just run:
python cli.py "War_and_Peace.fb2" -f md
And get a perfect Markdown file ready for RAG (Retrieval-Augmented Generation) or fine-tuning.
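For batch runs, the argument setup could look roughly like the sketch below. Everything here other than the `-f` flag and `convert_fb2_to_txt` is an assumption: the function names, the `txt` default, accepting several input files at once, and writing each result next to its source file are illustrative choices, not the original cli.py.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argument setup: one or more input files plus an output format."""
    parser = argparse.ArgumentParser(description="Convert FB2 files to Markdown or plain text.")
    parser.add_argument("inputs", nargs="+", type=Path, help="one or more .fb2 files")
    parser.add_argument("-f", "--format", choices=["md", "txt"], default="txt",
                        help="output format (default: txt)")
    return parser


def output_path_for(source: Path, fmt: str) -> Path:
    """Write the result next to the source file, swapping the extension."""
    return source.with_suffix(f".{fmt}")


def main(argv=None) -> None:
    # Imported here so the parsing helpers above can be used without converter.py
    from converter import convert_fb2_to_txt

    args = build_parser().parse_args(argv)
    for source in args.inputs:
        # Assumes UTF-8 input, like the snippets above
        content = source.read_text(encoding="utf-8")
        result = convert_fb2_to_txt(content, smart_formatting=(args.format == "md"))
        target = output_path_for(source, args.format)
        target.write_text(result, encoding="utf-8")
        print(f"{source} -> {target}")
```

Calling `main()` from an `if __name__ == "__main__":` guard turns this into a script. With `nargs="+"`, the shell can expand a glob such as `*.fb2` into the input list.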
Why "Vibecode" it?
Could I have found a tool to do this? Probably.
Would it handle the specifics of <subtitle> tags nested in <section> blocks exactly how I wanted? No.
By spending an hour writing this, I now have a tool that is:
- Fast: No uploads, no ads.
- Private: My books stay on my machine.
- Correct: The output is formatted exactly for my LLM's consumption.
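As a small illustration of why the header formatting matters for RAG, the `###` lines give a retrieval pipeline natural split points. The sketch below is one simple way to chunk the converter's Markdown output by those headers. The function name `split_markdown_by_headers` and the chunk shape are hypothetical, and a real pipeline would probably also cap chunk size.

```python
def split_markdown_by_headers(markdown_text: str, marker: str = "### ") -> list[dict]:
    """Split Markdown into chunks, starting a new chunk at every header line."""
    chunks = []
    title, lines = None, []

    def flush():
        text = "\n".join(lines).strip()
        # Skip an empty preamble, but keep titled chunks even if they have no body
        if title is not None or text:
            chunks.append({"title": title, "text": text})

    for line in markdown_text.splitlines():
        if line.startswith(marker):
            flush()
            title, lines = line[len(marker):].strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


book = "Intro paragraph.\n\n### Chapter 1\n\nFirst.\n\n### Chapter 2\n\nSecond.\n"
for chunk in split_markdown_by_headers(book):
    print(chunk)
```

Each chunk keeps its chapter title, which can be stored as metadata alongside the embedded text.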
In the era of AI, the ability to quickly whip up data transformation scripts is a superpower. Don't be afraid to reinvent the wheel if the tire on the existing wheel is flat.
Happy coding!
