YURII DE.

BeautifulSoup: REPLACEMENT CHARACTER

BeautifulSoup logs this warning when it cannot decode some of the input bytes: "Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER."
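
For context, the REPLACEMENT CHARACTER is U+FFFD. The sketch below uses only the Python standard library to show how it ends up in text when bytes are decoded with the wrong codec. It illustrates the general mechanism, not BeautifulSoup's own code, and the sample string is made up for the example.

```python
# Hypothetical sample: Cyrillic text stored as windows-1251 bytes.
raw = "Привет".encode("windows-1251")

# Decoding with the wrong codec and errors="replace" swaps every
# undecodable byte for U+FFFD, the REPLACEMENT CHARACTER.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the codec the bytes were written in recovers the text.
right = raw.decode("windows-1251")

print("\ufffd" in wrong)  # True
print(right)              # Привет
```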

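To fix it, decode the raw bytes with UnicodeDammit yourself, passing the encodings you expect, and hand the resulting Unicode string to BeautifulSoup. More in the docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#unicode-dammit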

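from bs4 import BeautifulSoup, UnicodeDammit

# Inside a scraper class; content is the raw response body as bytes.
# UnicodeDammit tries the listed encodings first, in order.
self.bs = BeautifulSoup(
    UnicodeDammit(
        content,
        ["latin-1", "iso-8859-1", "windows-1251"]
    ).unicode_markup,
    "html.parser")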
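
To see which encoding was actually picked, UnicodeDammit exposes `original_encoding` and `contains_replacement_characters`. Here is a minimal sketch, assuming a made-up windows-1251 snippet in place of real page content and a single candidate encoding:

```python
from bs4 import BeautifulSoup, UnicodeDammit

# Hypothetical sample standing in for a downloaded page (raw bytes).
raw = "<p>Привет, мир</p>".encode("windows-1251")

# Suggest the encoding we expect; UnicodeDammit tries it first.
dammit = UnicodeDammit(raw, ["windows-1251"])

# Inspect the result before parsing.
print(dammit.original_encoding)
print(dammit.contains_replacement_characters)

soup = BeautifulSoup(dammit.unicode_markup, "html.parser")
print(soup.p.get_text())
```

Order matters here: the suggested encodings are tried in sequence, and latin-1 (another name for iso-8859-1) can decode any byte sequence, so entries listed after it are unlikely to be reached. If the result looks wrong, it may help to put the most specific encoding first.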