DEV Community

YURIIDE


BeautifulSoup: REPLACEMENT CHARACTER

Problem: when parsing a page, BeautifulSoup logs the warning "Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER."
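Roughly speaking, UnicodeDammit (which BeautifulSoup uses internally) tries a series of candidate encodings, and this warning appears when none of them can decode the bytes strictly, so it falls back to decoding with errors replaced by U+FFFD. The sketch below is a simplified, hypothetical model of that fallback in plain Python (decode_with_fallback is not a bs4 function, and the sample bytes are made up). It also shows why putting latin-1 first makes the warning go away: latin-1 maps every byte to a character, so it never fails.

```python
def decode_with_fallback(data: bytes, candidates: list[str]) -> tuple[str, bool]:
    """Simplified model of UnicodeDammit's fallback (hypothetical, not bs4 code).

    Returns the decoded text and whether replacement characters were used.
    """
    # First pass: a candidate wins only if it decodes every byte strictly.
    for encoding in candidates:
        try:
            return data.decode(encoding), False
        except UnicodeDecodeError:
            continue
    # Last resort: decode with errors="replace", which substitutes
    # U+FFFD REPLACEMENT CHARACTER for bytes that cannot be decoded.
    return data.decode(candidates[0], errors="replace"), True


# 0xE9 followed by a space is invalid UTF-8; 0x81 is undefined in windows-1252.
raw = b"caf\xe9 \x81"

text, replaced = decode_with_fallback(raw, ["utf-8", "windows-1252"])
print(replaced, "\ufffd" in text)  # True True

# latin-1 accepts any byte, so the strict pass succeeds and nothing is replaced.
text, replaced = decode_with_fallback(raw, ["latin-1", "utf-8", "windows-1252"])
print(replaced, text == "caf\xe9 \x81")  # False True
```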

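Fix: decode the raw bytes yourself with UnicodeDammit, giving it a list of candidate encodings, and pass the resulting Unicode string to BeautifulSoup. More details: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#unicode-dammit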

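from bs4 import BeautifulSoup, UnicodeDammit

# `content` holds the raw bytes of the page (e.g. response.content from requests).
dammit = UnicodeDammit(
    content,
    # Candidate encodings, tried in order. latin-1 (same codec as
    # iso-8859-1) decodes any byte sequence, so later entries are never used.
    ["latin-1", "iso-8859-1", "windows-1251"],
)
# Parse the already-decoded string, so BeautifulSoup does no decoding itself.
soup = BeautifulSoup(dammit.unicode_markup, "html.parser")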
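One caveat with that encoding list: because the candidates are tried in order and latin-1 never fails, a page that is really windows-1251 (Cyrillic) is decoded as latin-1 and comes out as mojibake, with no warning at all. A minimal sketch, assuming the page is windows-1251 (the sample markup is made up for illustration), comparing the two orderings and checking which encoding UnicodeDammit actually used via original_encoding:

```python
from bs4 import BeautifulSoup, UnicodeDammit

# Hypothetical input: Cyrillic markup encoded as windows-1251 bytes.
raw = "<p>Привет</p>".encode("windows-1251")

# latin-1 first: it always succeeds, so windows-1251 is never tried.
latin_first = UnicodeDammit(raw, ["latin-1", "iso-8859-1", "windows-1251"])
print(latin_first.original_encoding)  # latin-1
print(latin_first.contains_replacement_characters)  # False, yet the text is mojibake

# The encoding you actually expect goes first; latin-1 stays as a catch-all.
cyrillic_first = UnicodeDammit(raw, ["windows-1251", "latin-1"])
print(cyrillic_first.original_encoding)  # windows-1251

soup = BeautifulSoup(cyrillic_first.unicode_markup, "html.parser")
print(soup.p.get_text())  # Привет
```

Keeping latin-1 at the end of the list still guarantees a strict decode (so the REPLACEMENT CHARACTER warning cannot appear), while giving the expected encoding the first chance.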
