Real-world AI systems aren’t built on tutorials. They start with foundational tools. Here’s how I built my own — and why every serious engineer should too.
🛠️ Problem:
Most Python learners finish courses with throwaway scripts.
I finished mine (Python for Everybody) by building a real system: KRAWLIX — a CLI Knowledge Crawler that fetches, stores, and structures topic summaries like the base layer of an AI assistant.
🚀 Features:
- Pure Python: no third-party libraries, just the standard library's sqlite3 and urllib.
- Fetches summaries from the DuckDuckGo and Wikipedia APIs.
- Stores results as both .txt files and rows in a local SQLite database.
- Fault-tolerant, modular, and CLI-driven: built for real workflows, not just demos.
 
Full code: GitHub Repo
1️⃣ Project Structure
The repo isn't a flat script; it's structured like a real project.
Directory layout:
krawlix/
│
├── main.py               # CLI entrypoint
├── crawler/              # Core logic modules
│     ├── fetch.py
│     ├── db_writer.py
│     └── utils.py
├── db/                   # SQLite database(s)
├── summaries/            # Text file outputs
├── data/                 # Input topics.txt and test files
├── failed_topics.txt     # Log for failed fetches
├── README.md
└── ... (tests, demo, etc.)
2️⃣ How It Works
A. The CLI Entry Point (main.py)
- Takes an input file (data/topics.txt) with one topic per line
- For each topic:
  - Fetches a summary from DuckDuckGo or Wikipedia (via fetch.py)
  - Saves it to both a .txt file and the SQLite DB (via db_writer.py)
  - Logs failed fetches to failed_topics.txt
 
 
Code:
import sys
import os
from crawler.fetch import fetch_summary
from crawler.db_writer import create_table, insert_summary, save_summary_to_file
from crawler.utils import get_timestamp
def crawl_topics(topics_file_path):
    """
    this function reads topics from a text file
    and fetches summaries for each of them.
    """
    if not os.path.exists(topics_file_path):
        print("File not found:", topics_file_path)
        return
    create_table()
    topics = []
    # Read each topic from the file and append it to the 'topics' list
    with open(topics_file_path, "r") as file:
        for line in file:
            line = line.strip()
            if line != "":
                topics.append(line)
    for topic in topics:
        print("\n")
        print("Fetching Summary for: ",topic)
        result = fetch_summary(topic)
        if result:
            result["created_at"] = get_timestamp()
            insert_summary(result)
            save_summary_to_file(result)
            print(f"Summary for {topic} saved in DB")
            filename = result["topic"].replace(" ", "_") + ".txt"
            print(f"{filename} file created\n")
        else:
            print("No Summary found for:", topic)
            with open("failed_topics.txt","a",encoding="utf-8") as fail_log:
                fail_log.write(topic + f", {get_timestamp()}" + "\n")
# Handle input from the CLI
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python main.py data/topics.txt")
    else:
        topics_file = sys.argv[1]
        crawl_topics(topics_file)
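A typical run against a small topics file looks roughly like this (illustrative output, trimmed):
$ python main.py data/topics.txt

Fetching summary for: SQLite
Summary for SQLite saved in DB
SQLite.txt file created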
B. Fetcher Module (crawler/fetch.py)
- Uses the DuckDuckGo API as the primary source, with Wikipedia as a fallback.
- Handles network errors and empty results.
 
Code:
import urllib.request
import urllib.parse
import json
def fetch_summary(topic):
    # Fetch a summary for the given topic: DuckDuckGo first, Wikipedia as fallback
    # Returns a dictionary with the summary, or None if both sources fail
    base_url = "https://api.duckduckgo.com/"
    params = {
        'q': topic,
        'format': 'json',
        'no_redirect': '1',
        'no_html': '1'
    }
    query_string = urllib.parse.urlencode(params)
    full_url = base_url + "?" + query_string
    try:
        with urllib.request.urlopen(full_url) as response:
            data = response.read()
            json_data = json.loads(data)
            summary = json_data.get("Abstract", "").strip()
            url = json_data.get("AbstractURL", "").strip()
            if summary:
                return {
                    "topic": topic,
                    "summary": summary,
                    "source": "DuckDuckGo",
                    "source_url": url
                }
    except Exception as e:
        print("DuckDuckGo fetch failed:", e)
    # If DuckDuckGo gave us nothing, try Wikipedia
    try:
        wiki_base_url = "https://en.wikipedia.org/api/rest_v1/page/summary/"
        query_string = urllib.parse.quote(topic)
        wiki_url = wiki_base_url + query_string
        with urllib.request.urlopen(wiki_url) as response:
            data = response.read()
            json_data = json.loads(data)
            summary = json_data.get("extract", "").strip()
            url = json_data.get("content_urls", {}).get("desktop", {}).get("page", "")
            if summary:
                return {
                    "topic": topic,
                    "summary": summary,
                    "source": "Wikipedia",
                    "source_url":url
                }
    except Exception as e:
        print("Wikipedia fetch failed", e)
    return None
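To sanity-check the fetcher on its own, a quick snippet like this (illustrative, run from the project root) shows the shape of the dictionary it returns:
from crawler.fetch import fetch_summary

result = fetch_summary("Alan Turing")
if result:
    print(result["source"])        # "DuckDuckGo" or "Wikipedia"
    print(result["source_url"])    # link back to the original page
    print(result["summary"][:80])  # first 80 characters of the summary text
else:
    print("No summary found")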
C. Storage Modules
File: crawler/utils.py
from datetime import datetime
def get_timestamp():
    return datetime.now().strftime("%d-%m-%Y %H-%M-%S")
File: crawler/db_writer.py
import sqlite3
import os
DB_PATH = os.path.join("db","krawlix.sqlite")
def create_table():
    # Create the knowledge table if it doesn't exist
    os.makedirs(os.path.dirname(DB_PATH), exist_ok=True)  # make sure the db/ folder exists
    connect = sqlite3.connect(DB_PATH)
    cur = connect.cursor()
    cur.execute('''
        CREATE TABLE IF NOT EXISTS knowledge(
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            topic TEXT UNIQUE,          -- UNIQUE so INSERT OR IGNORE skips repeat topics
            summary TEXT,
            source TEXT,
            source_url TEXT,
            created_at TEXT
        )
    ''')
    connect.commit()
    connect.close()
def insert_summary(summary_data):
    # Insert summary into knowledge table
    connect = sqlite3.connect(DB_PATH)
    cur = connect.cursor()
    cur.execute('''
        INSERT OR IGNORE INTO knowledge (topic, summary, source, source_url, created_at)
        VALUES (?, ?, ?, ?, ?)
    ''',
    (
        summary_data['topic'],
        summary_data['summary'],
        summary_data['source'],
        summary_data['source_url'],
        summary_data['created_at']
    ))
    connect.commit()
    connect.close()
def save_summary_to_file(summary_data, folder="summaries"):
    # save summary to text file inside summaries/folder
    if not os.path.exists(folder):
        os.makedirs(folder)
    filename = summary_data["topic"].replace(" ", "_") + ".txt"
    filepath = os.path.join(folder, filename)
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(summary_data["summary"])
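After a crawl, a minimal sketch like this (assuming the default db/krawlix.sqlite path) is enough to inspect what landed in the database:
import sqlite3

connect = sqlite3.connect("db/krawlix.sqlite")
cur = connect.cursor()
# Show the five most recently inserted rows
rows = cur.execute(
    "SELECT topic, source, created_at FROM knowledge ORDER BY id DESC LIMIT 5"
)
for topic, source, created_at in rows:
    print(f"{created_at} | {source} | {topic}")
connect.close()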
3️⃣ How To Run
1. Prepare your input:
Edit data/topics.txt with one topic per line (see the sample after this list).
2. Run:
python main.py data/topics.txt
3. Outputs:
- summaries/: each topic saved as a separate .txt file
- db/krawlix.sqlite: SQLite DB with all summaries
- failed_topics.txt: any failed topics, for troubleshooting
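A sample data/topics.txt (these topics are just placeholders; any one-topic-per-line list works):
Python (programming language)
SQLite
Web crawler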
4️⃣ What Sets KRAWLIX Apart
- Modular folder structure: not a monolithic script, but reusable, maintainable modules
- No third-party libraries: runs anywhere with a basic Python 3 install
- Error logging & resilience: failures don't stop the pipeline
- Built for extension: easily add new sources (Google, LLMs), new outputs (Markdown, CSV), or convert it into an API (see the sketch below)
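As one example of that extensibility, here is a hypothetical CSV exporter (not part of the repo) that stays standard-library-only:
import csv
import sqlite3

def export_to_csv(db_path="db/krawlix.sqlite", out_path="knowledge.csv"):
    # Dump every row of the knowledge table into a CSV file
    connect = sqlite3.connect(db_path)
    cur = connect.cursor()
    rows = cur.execute(
        "SELECT topic, summary, source, source_url, created_at FROM knowledge"
    ).fetchall()
    connect.close()
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["topic", "summary", "source", "source_url", "created_at"])
        writer.writerows(rows)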
 
5️⃣ Lessons Learned & AI Relevance
“The habits that make KRAWLIX robust are the same that make AI systems scale: modularity, clean storage, error handling, CLI-first design.”
KRAWLIX is now ready to plug into RAG pipelines or agent stacks, or to be wrapped with FastAPI.
Built from Python for Everybody principles — but leveled up.
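For the FastAPI idea, a rough sketch of the wrapper (hypothetical, and it would add fastapi/uvicorn as external dependencies) could look like this:
from fastapi import FastAPI, HTTPException
from crawler.fetch import fetch_summary

app = FastAPI()

@app.get("/summary/{topic}")
def get_summary(topic: str):
    # Reuse the existing fetcher; return 404 when neither source has a summary
    result = fetch_summary(topic)
    if result is None:
        raise HTTPException(status_code=404, detail="No summary found")
    return result
Saved as, say, api.py, it would run with uvicorn api:app --reload.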
#1FahadShah #python #cli #opensource #sqlite #ai #buildinpublic #scraping #api #web
If you want a full step-by-step walk-through, advanced features, or want to see this project evolve into an API or LLM pipeline — let me know in the comments!
🚀 Follow My Build Journey
- GitHub: github.com/1FahadShah
 - Medium: 1fahadshah.medium.com
 - LinkedIn: linkedin.com/in/1fahadshah
 - Twitter/X: x.com/1FahadShah
 - Hashnode: hashnode.com/@1FahadShah
 - Personal Site: 1fahadshah.com (Launching soon!)
 
I post every new tool, deep-dive, and lesson learned—always with code, always with execution. Got questions, want to collaborate, or building something similar? Drop a comment or DM me!