
amekusa03

Rescuing 216 Pages from the GeoCities Era: How I Built an HTML-to-Blogger Tool

WeCoded 2026: Echoes of Experience 💜

This is a submission for the 2026 WeCoded Challenge: Echoes of Experience

I wanted to move an old website written in plain HTML over to Google Blogger.

Introduction

The outdoor club's website (which was called a homepage back then) was archived when GeoCities shut down. The last update was in April 2014.
If it had vanished completely I could have accepted that, but seeing it survive only as a frozen archive was just too sad.

Okay, I'll do this!
I told the old site's administrator (a friend), "It's okay, let me do it."
For now, I decided to host the new site on Vercel's Hobby plan.
Then I checked the old site's HTML.
What? 216 pages.
Doing this by hand is impossible!
The old site was beautifully made by my friend while he was learning HTML.
I tried cleaning it with AI, but my instructions were bad and it just didn't work.
With the help of another friend, I aimed to create a stylish new site using React and other things in VS Code.
While doing all that, Windows started to become unstable and eventually crashed.
The data was still there, so I reinstalled Windows and DevStudio. But I had lost the product ID. It was completely over.
So I installed Ubuntu, installed VS Code, and decided to develop in Python. (I had never used it before.)
While the installation was in progress, I also changed the new site's operational policy.

  • Members can post
  • Members can also upload videos

With these two additional requirements, I switched the new site to Google Blogger ♫

Getting Started with Python

I started with "Hello, Python" and then wrote a CLI that cleans up the tags in HTML files.
Next, I tried posting to Blogger through its API, and I started to get the hang of it.
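
For readers curious what tag cleaning can look like, here is a minimal sketch using only the standard library's html.parser. It is not the code from my tool: the DROP_TAGS and DROP_ATTRS sets and the clean_html name are assumptions made for illustration, and the real rules may differ.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: presentational markup to strip. Adjust to taste.
DROP_TAGS = {"font", "center"}
DROP_ATTRS = {"color", "bgcolor", "face", "size", "style"}


class TagCleaner(HTMLParser):
    """Re-emit HTML while skipping presentational tags and attributes.

    Comments and doctype declarations are not re-emitted in this sketch.
    """

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _open(self, tag, attrs, close=""):
        if tag in DROP_TAGS:
            return  # drop the tag itself, keep whatever it wraps
        kept = []
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            if value is None:
                kept.append(name)
            else:
                kept.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join([tag] + kept) + close + ">")

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def clean_html(source: str) -> str:
    """Return source with presentational tags and attributes removed."""
    cleaner = TagCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)
```

Calling clean_html on a page's text returns the same markup without the font tags and color attributes, so the design can be handled by the Blogger theme instead.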

Software Requirements Definition

Ruthless decision

  • HTML history management
  • Cleaning HTML
    • I want to manage the design entirely in Blogger, so I'm removing font and color settings.
  • Keyword extraction and registration
    • Blogger allows you to set keywords for each post, so I want to use words that appear in the HTML text as keywords (e.g., kayaking, bouldering).
  • Location extraction and registration
    • Blogger also lets you attach a location to each post (Akihabara, Mt. Fuji, etc.).
  • Remove Exif data from images and add watermarks.
  • Upload articles and images
  • GUI
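
The keyword requirement above could be sketched roughly as follows. This is an illustration, not the tool's actual code: the KEYWORDS list and the extract_keywords name are hypothetical, and the whole-word match assumes space-delimited text (for Japanese text, a plain substring test would be needed instead).

```python
import re
from html.parser import HTMLParser

# Hypothetical keyword vocabulary for illustration only.
KEYWORDS = ["kayaking", "bouldering", "camping"]


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_keywords(html_source, keywords=KEYWORDS):
    """Return the keywords that occur in the visible text, in list order."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    text = " ".join(parser.parts).lower()
    found = []
    for word in keywords:
        # Whole-word match so "camping" does not fire on "glamping".
        if re.search(r"\b" + re.escape(word.lower()) + r"\b", text):
            found.append(word)
    return found
```

The same pattern would work for locations by swapping in a list of place names; the matched words can then be attached to the post when it is uploaded.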

Implementation

Just as you all expected, I'm totally hooked.

  • I was struggling to get used to Ubuntu's IME, so I installed Fcitx5 and managed to work around the problem.

  • I discovered that the Blogger API doesn't have image-related features → I settled on a standard, somewhat easier procedure that mixes manual steps with automatic image linking.

  • I couldn't get used to Python's indentation-based nesting → I ended up tracing the nesting with my finger.

  • A library version problem came up → I misunderstood what a Python virtual environment (venv) was → I fumbled around trying to build an environment in VirtualBox → I realized it was something you do in the terminal, and my mind nearly went blank.

  • I accidentally sent about 200 Blogger API requests without using a timer → I received an account lock warning email → I sent an apology email to Google and my account was restored the next day.

  • I installed Wine and Excel (MS Office) to compile the location data → got stuck in cell input mode and switched to LibreOffice → discovered Nominatim geocoding, which solved the problem.

  • As insurance, I tried running Windows in a virtual machine → Too slow, and not enough HDD space. In the end the license was too expensive, so I gave up.

  • I asked the AI to do various things, but it didn't understand the language specifications and approved things haphazardly → I ended up having to rewrite almost everything.

  • I needed to write code to build the GUI with Tkinter → I had the AI write the base and then just stared at the source code.

  • Tracing Python arguments → Why isn't a copy being made? → Wait, objects are still passed by reference even when you put them on a queue?! And so on, intense debugging.

  • I can't do it well, so I end up pressing the keys too hard → My family tells me it's too loud, and I almost cry.
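
In hindsight, the burst of about 200 API requests could have been avoided with a small throttle. The sketch below is an assumption-laden illustration, not the code in my tool: send_throttled and its send callback are hypothetical names, and the one-second default is a placeholder, not an official Blogger quota figure.

```python
import time


def send_throttled(items, send, min_interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call send(item) for each item, at least min_interval seconds apart.

    sleep and clock are injectable so the pacing can be tested without
    actually waiting.
    """
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            # Wait out whatever remains of the interval since the last call.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(send(item))
    return results
```

Here send would be whatever function posts a single article; wrapping the upload loop this way keeps the request rate steady even when there are a couple of hundred pages to post.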

Reaffirming established theories

When it comes to technology, you have to invest time, money, or wisdom.
For someone like me who lacks wisdom, it turned out to be time.
However, I think I gained a lot.
You can't know the taste of pudding until you try it.
Ubuntu is wonderful. I dabbled in it a little for work, but I didn't know about the community at that time. Ultimately, what's important is human connection.

I would appreciate your help with debugging.

I've tried to code it so that it should work on Windows and Mac as long as you're using a Python environment.
It requires Google API authentication credentials, which is a bit of a hassle, but I'd appreciate it if you could debug it.
Even just a review would be encouraging.
Please feel free to point out any issues with the Python coding!

Introduction:

Video:

Code:

amekusa03 / html-blogger

Supports converting HTML for Google Blogger and uploading it

HTML to Blogger Ver0.98

A desktop application that automatically processes local HTML files and images and posts them to Blogger.
It performs HTML cleaning, adds watermarks to images, assigns keywords and location information, and uploads to Blogger.

Qiita Article

Explanation article for this tool (Qiita)

Introduction Video

Introduction Video

Main Features

  • HTML Cleaning: Removes unnecessary tags and normalizes formatting for posts.
  • Image Processing: Removes EXIF data and adds watermarks.
  • Metadata Assignment: Automatically adds keywords (search tags) and location information (georss tags) by analyzing the article content.
  • Blogger Upload: Uploads image links and articles as drafts using the Blogger API.
  • GUI Operation: User-friendly GUI with progress visualization and error recovery features.

Processing Flow

Processing is executed in the following order:

  1. import_file.py: Imports files from the source folder to the working folder.
  2. serial_file.py: Converts file names into sequential numbering format.
  3. clean_html.py: Cleans up HTML…
