DEV Community

filtede98
Stop using BeautifulSoup: Convert any webpage to clean Markdown in 1 second

If you're still doing this:


  from bs4 import BeautifulSoup                                   
  import requests                                                                                                       

  response = requests.get("https://example.com")                                                                        
  soup = BeautifulSoup(response.text, "html.parser")              

  # Remove scripts, styles...                 
  for tag in soup(["script", "style", "nav", "footer"]):
      tag.decompose()

  text = soup.get_text()                      
  # Now clean up whitespace...                                                                                          
  lines = (line.strip() for line in text.splitlines())            
  text = '\n'.join(line for line in lines if line)                                                                      

  ...you're working way too hard. And because get_text() flattens the page to plain text, you lose all the structure: headings, tables, code blocks, and link targets are gone.

  There's a better way

  One API call. Any URL. Clean Markdown back in under 1 second.   

  curl -X POST https://wtmapi.com/api/v1/convert \                
    -H "x-api-key: YOUR_KEY" \                                                                                          
    -H "Content-Type: application/json" \ 
    -d '{"url": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map"}'          

  What you get back                       

  Instead of a blob of plain text, you get structured Markdown:                                                         

  # Array.prototype.map()                                                                                               

  The **map()** method of Array instances creates a new array
  populated with the results of calling a provided function                                                             
  on every element in the calling array.                          

  ## Syntax                               

  map(callbackFn)                                                                                                       
  map(callbackFn, thisArg)

  ## Examples                                                     

  const numbers = [1, 4, 9];                  
  const roots = numbers.map((num) => Math.sqrt(num));
  // roots is now [1, 2, 3]

  Headings, code blocks, bold, links, tables — all preserved.

  BeautifulSoup vs WTM API                                                                                              

  ┌─────────────┬─────────────────────────┬───────────────────────────────┐                                             
  │             │      BeautifulSoup      │            WTM API            │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Output      │ Raw text                │ Structured Markdown           │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Headings    │ Lost                    │ Preserved (h1-h6)             │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Code blocks │ Lost                    │ Preserved with language hints │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤                                             
  │ Tables      │ Lost                    │ Converted to Markdown tables  │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤                                             
  │ Links       │ Lost                    │ Absolute URLs preserved       │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Setup       │ 10-50 lines of code     │ 1 API call                    │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Speed       │ Depends on your code    │ < 1 second                    │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Maintenance │ You maintain the parser │ Zero                          │                                             
  └─────────────┴─────────────────────────┴───────────────────────────────┘

  Python example                                                  

  import requests

  response = requests.post(
      "https://wtmapi.com/api/v1/convert",
      headers={
          "x-api-key": "wtm_your_key",
          "Content-Type": "application/json"
      },
      json={"url": "https://en.wikipedia.org/wiki/Mars"},
      timeout=30,  # don't hang forever on a slow page
  )
  response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body

  data = response.json()
  markdown = data["data"]["markdown"]
  print(f"Got {data['data']['length']} chars in {data['meta']['response_time_ms']}ms")
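
The Python example indexes straight into data["data"]["markdown"], which raises a bare KeyError if the response has any other shape. Here is a minimal sketch of a more defensive parse. It assumes only the success layout used in the example (data.markdown); the post doesn't show what a failed conversion returns, so the error handling is a guess, and extract_markdown and ConversionError are hypothetical names, not part of the API.

```python
from typing import Any


class ConversionError(Exception):
    """Raised when a payload lacks the expected data.markdown field."""


def extract_markdown(payload: dict[str, Any]) -> str:
    """Return the Markdown string from a decoded /convert response.

    Hypothetical helper: only the success shape (data.markdown) comes from
    the example in the post; how the API reports errors is an assumption.
    """
    data = payload.get("data")
    if not isinstance(data, dict):
        raise ConversionError(f"no 'data' object in response (keys: {sorted(payload)})")
    markdown = data.get("markdown")
    if not isinstance(markdown, str) or not markdown.strip():
        raise ConversionError("response contained no Markdown")
    return markdown
```

Call it as extract_markdown(response.json()) and catch ConversionError wherever you'd rather skip a URL than crash the whole run.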

  Works great with LangChain too              

  pip install langchain-wtmapi                                                                                          

  from langchain_wtmapi import WTMApiLoader                                                                             

  loader = WTMApiLoader(                                          
      urls=["https://docs.python.org/3/tutorial/"],                                                                     
      api_key="wtm_your_key",                 
  )                                                                                                                     
  docs = loader.load()                                            
  # Ready for your RAG pipeline                                                                                         
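
Preserved headings matter for RAG because they let you chunk by section instead of by character count. Below is a rough sketch of heading-aware splitting using only the standard library. It isn't what the LangChain loader does internally (the post doesn't say); split_by_heading is a hypothetical helper, and it assumes ATX-style headings (# through ######) and triple-backtick code fences.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # Markdown code-fence marker


def split_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into sections, one per ATX heading (# to ######)."""
    sections: list[dict] = []
    level, title, body = 0, "", []
    in_fence = False

    def flush() -> None:
        # Close the section being built; skip an empty preamble.
        text = "\n".join(body).strip()
        if title or text:
            sections.append({"level": level, "title": title, "text": text})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # a '#' inside code is a comment, not a heading
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level, title, body = len(match.group(1)), match.group(2), []
        else:
            body.append(line)
    flush()
    return sections
```

Each section keeps its heading level and title, so you can store them as chunk metadata and show the reader where an answer came from.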

  When to still use BeautifulSoup                                                                                       

  To be fair, BeautifulSoup is still great when you need to:                                                            
  - Extract specific elements (e.g. all prices on a page)
  - Parse XML/RSS feeds                                                                                                 
  - Work offline without API calls                                                                                      
  - Have full control over the parsing logic

  But if you just need web content as Markdown (for RAG, content migration, or documentation archival), an API call is
  simpler and faster, and it gives you better output.
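
For the archival case, one way to store the results is one .md file per URL, with the source recorded at the top. This is just a sketch under my own assumptions: filename_for_url and archive_markdown are hypothetical helpers, and the naming scheme ignores query strings, so two URLs that differ only in their query would overwrite each other.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def filename_for_url(url: str) -> str:
    """Build a filesystem-safe .md filename from a URL's host and path."""
    parts = urlparse(url)
    raw = f"{parts.netloc}{parts.path}"
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return f"{slug or 'index'}.md"


def archive_markdown(url: str, markdown: str, out_dir: Path) -> Path:
    """Write converted Markdown to out_dir and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename_for_url(url)
    # An HTML comment keeps the source URL without affecting rendering.
    target.write_text(f"<!-- source: {url} -->\n\n{markdown}", encoding="utf-8")
    return target
```

Pass in the markdown string you got back from the API, for example archive_markdown(url, markdown, Path("archive")).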

  Try it free                                 

  Live demo at https://wtmapi.com — 3 free conversions without signing up. Free tier: 50 calls/month.

  What do you think? Would love to hear what URLs you test it on.