DEV Community

GokuScraper悟空爬虫


I Built a Feishu-to-Markdown Tool, and Overworked Office Workers Kept It Alive for Free


1. The Pain Points of Feishu (Lark)

Feishu (known globally as Lark) is genuinely great for writing, but exporting your work is a nightmare. The official export feature is buried deep in the menus, and the formatting often gets completely messed up. If you want to feed your docs into Obsidian, Notion, or GitHub, you have to spend ages tweaking them manually.

What’s even more annoying are the public documents shared by others. The web version restricts copying and won’t let you download at all. If you want to save it locally for later reading, you're just out of luck. What are you supposed to do, transcribe it by hand?

So, I built a web app called the Goku Feishu-to-Markdown Exporter. You just paste a link, and it downloads a Markdown file directly, automatically grabbing the images from the document and zipping them all up.

No extensions to install, no login required. Just open the page and use it.

To be honest, I didn't write the core conversion code. I found an MIT-licensed open-source project on GitHub, wrapped it in a UI using Streamlit, and threw it up on the cloud. Rather than asking folks to clone the repo and set it up themselves, it made more sense to just provide an out-of-the-box online tool. And of course, my wrapper code is open source too.
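The conversion itself comes from that open-source project, so I won't reproduce it here. But the wrapper's main job, packing the converted Markdown and its downloaded images into one zip, is simple enough to sketch with the standard library. This is an illustrative sketch, not the project's actual code; `bundle_markdown` and the `images/` layout are my own invented names:

```python
import io
import zipfile


def bundle_markdown(md_text: str, images: dict) -> bytes:
    """Pack a converted Markdown doc plus its images into a single zip.

    `images` maps each original image URL to its already-downloaded
    bytes; actually fetching them is out of scope for this sketch.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, (url, data) in enumerate(images.items()):
            local_path = f"images/img_{i}.png"
            # Rewrite the remote link so the doc still renders offline.
            md_text = md_text.replace(url, local_path)
            zf.writestr(local_path, data)
        zf.writestr("document.md", md_text)
    return buf.getvalue()
```

In a Streamlit UI, a byte string like this can be handed straight to `st.download_button`, so the visitor receives one file and never sees any of the plumbing.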

2. Surprisingly, People Kept Showing Up

When I tossed this site onto the cloud, I knew about a specific catch.


Streamlit Cloud has a strict policy: if your app goes 24 hours without traffic, it gets aggressively put to sleep. The next person to open it has to sit there like an idiot for a minute or two while the container spins back up.

Hey, you get what you pay for when you're on the free tier.

So, I didn't have high expectations at first. I figured the site would spend most of its life napping. If someone used it, lucky them; if not, it would just lie dormant.

Then, after it had been running for a while, I peeked at the backend analytics and froze.

It hadn't gone to sleep. Not once.

It was online and stable every single day. Instant load times whenever I checked, as if the 24-hour sleep policy didn't even exist.

But wait—I definitely didn’t write a keep-alive script, nor did I pay for a premium tier. Who the hell was constantly hammering away at this scrappy little site that doesn’t even have a login page?

3. They Were All Real People

When I opened the backend visitor logs, I had to laugh.


Anonymous Altocumulus, Levitating Danish... strings of random aliases drifting through the server like a bunch of cyber ghosts.

I didn’t come up with those names. To protect privacy, Streamlit dynamically assigns these quirky pseudonyms to unauthenticated visitors. If you check the backend without user accounts, all you see is this parade of "anonymous pastries" and "unidentified weather phenomena" popping in and out.

At this point, you might wonder: Was it just search engine crawlers artificially bumping up the traffic?

Impossible.

Streamlit uses a Single Page Application (SPA) architecture. The UI is fully rendered via JavaScript, and communication runs entirely over persistent WebSocket connections. Your run-of-the-mill crawler—like a Python requests script, a curl command, or any basic scraper—will only ever see an empty HTML skeleton. They physically can't establish a WebSocket session.

In Streamlit's analytics, no session means no visit. Crawlers can't even get through the front door, let alone get assigned a name.
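You can demonstrate the point with nothing but the standard library. The HTML below is an illustrative stand-in for the kind of near-empty shell a Streamlit server hands to a plain HTTP GET (the real markup differs in its details); strip the tags the way a naive scraper would, and there is essentially nothing to index:

```python
import re

# Illustrative skeleton, roughly what a JS-rendered SPA serves to a
# plain HTTP GET. This is an assumption, not Streamlit's real markup.
SKELETON = """
<!doctype html>
<html>
  <head><title>Streamlit</title></head>
  <body>
    <noscript>You need to enable JavaScript to run this app.</noscript>
    <div id="root"></div>
    <script src="/static/js/main.js"></script>
  </body>
</html>
"""


def visible_text(html: str) -> str:
    """What a naive scraper 'sees': drop script/noscript, strip tags."""
    html = re.sub(r"<(script|noscript)[^>]*>.*?</\1>", "", html, flags=re.S)
    text = re.sub(r"<[^>]+>", "", html)
    return " ".join(text.split())


print(visible_text(SKELETON))  # → "Streamlit" (a title, and no content)
```

All of the actual UI arrives later, over the WebSocket session that only a real browser opens.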

So, this bizarre list of "Danish pastries" and "Altocumulus clouds" in my backend? Every single one of them was a real, live human being.

The ones creeping in at 2 or 3 AM were programmers still wrestling with bugs, desperately trying to back up some technical specs from Feishu. The ones showing up at 6 or 7 AM were product managers rolling out of bed to organize their knowledge bases, converting a competitor's public docs into Markdown for their own notes.

Between scattered time zones and the grim reality of around-the-clock crunch culture, every few hours someone, somewhere, was reliably opening the site, pasting a link, and hitting download.

They probably had no idea Streamlit has a 24-hour hibernation policy. They didn't know they were acting as "keep-alive" pings. They just needed the tool, used it, and left.

But every single click reset that 24-hour countdown.

No scripts, no black magic. Just an army of exhausted office workers, driven by pure necessity, manually keeping my server awake.

4. It Accidentally Became a Cyber Confessional

This tiny site has an incredibly stable Daily Active User (DAU) count of around 10 people. The number is so small that I couldn’t be bothered to build an actual user system. No database, no auth, no isolation. As long as it works, we’re good.

But that led to a totally unexpected side effect.

Because I never bothered to clear the input field state, whenever the next person opened the page, they’d see the leftover Feishu document link pasted by the previous user.
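I never pinned down exactly which layer was preserving it, but the effect is what you get from keeping state at module scope, which a Streamlit process shares across every session it serves. A toy simulation of that pattern (`render_page`, `last_link`, and the URL are hypothetical names for illustration, not my actual code):

```python
# Module-level state: one copy per server process, shared by all visitors.
last_link = ""


def render_page(submitted_link=None):
    """Simulate one visitor's page load.

    Returns what the input box shows on load: whatever the previous
    visitor left behind. If this visitor submits a link, it becomes
    the prefill the next visitor will see.
    """
    global last_link
    prefill = last_link
    if submitted_link is not None:
        last_link = submitted_link
    return prefill


# Visitor A arrives at night, finds an empty box, and pastes a link...
first = render_page("https://feishu.example/docx/abc123")
# ...and visitor B, hours later, finds that link already sitting there.
second = render_page()
print(repr(first), "->", repr(second))
```

Per-visitor widget state would have avoided this; I simply never wired it up, and the bug quietly became a feature.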

Sometimes I’d go to the site and see a completely foreign link sitting in the text box. I'd click it—sometimes it was a public manifesto from an influencer, sometimes an open document from a startup team, or maybe just someone’s public reading notes.

On this cold, free-tier server, that residual link became a strange sort of beacon. No chatboxes, no avatars, no social features—but when you saw it, you knew you weren't the only one grinding over documents in the middle of the night.

A tiny, leftover bug had turned into a digital safe haven.

Occasionally, I'd see that someone had pasted a link to a private, permission-locked document. Obviously, that extraction would fail. My tool can only grab public docs; anything behind a login wall is off-limits.

But they pasted it anyway.

Which means they really tried. Maybe out of sheer desperation, or just holding onto a "what if it actually works" kind of hope. Seeing links like that felt both funny and a little sad—just another poor soul driven mad by Feishu's export restrictions.

5. Turn Software into SaaS, and You Win

Honestly, there are plenty of open-source Feishu exporters out there. A quick GitHub search brings up several.

The problem is, almost all of them require you to jump through hoops. You either have to clone the repo, set up a Python environment, and run CLI commands, or you have to download some executable and configure a bunch of settings. Realistically, very few people are going to endure that workflow. Most users see the words "Please install dependencies" and immediately close the tab.

All I did was one thing: skip all those steps.

No downloads, no configuration, no installs. Open the page, paste the link, hit download, done. I turned a piece of software directly into a SaaS.

I didn't write the core code, and the UI is as bare-bones as it gets. But it has one massive advantage: if you show it to someone with zero technical background, they instantly know how to use it.

Put simply: keep the complexity for yourself, and give the simplicity to the user.

If you can find the pain point that forces someone to suffer for 10 minutes every day, and whittle that down to 10 seconds, you’ve built something that will survive on its own without a dime of marketing.

Wrapping Up

Looking back, the reason this sketchy little site didn’t get killed by Streamlit’s 24-hour sleep policy wasn’t because I summoned any keep-alive dark magic. It was simply because of one thing: it’s stupidly easy to use.

Finding pain points can be complex, or it can be simple. You just have to locate the dirtiest, most tedious chore that people spend 10 minutes a day complaining about, hack together an MVP as fast as humanly possible, throw it online, and make the experience zero-friction. You don't have to manage the rest.

Real demand will keep it alive for you.

Those overworked tech folks will use their own physical presence to stomp your server awake in the middle of the night. That pack of Anonymous Altocumulus and Levitating Danish had no idea they were doing a good deed, but every time they needed the tool, clicked in, and exported a file, they gave the container another 24-hour lease on life.

The server doesn’t sleep, because the users' bare necessity pumps oxygen into it every single day.

Thanks

Thanks for tuning in, folks! If you found this interesting, don't hesitate to like, share, and spread the word!

To catch my posts as soon as they drop, don't forget to star/bookmark this page so you don't lose it.

Alright, that's all for today.
