Introduction
XML and HTML are everywhere, from APIs to scraped websites. In this post, I’ll show you how to use lxml, a fast Python library for parsing and manipulating XML and HTML, built on the C libraries libxml2 and libxslt.
Installation
pip install lxml
Parsing XML
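from lxml import etree
# A small XML document with two <item> children
xml_data = '''<root><item>One</item><item>Two</item></root>'''
root = etree.fromstring(xml_data)
# findall('item') returns the direct <item> children of <root>
for item in root.findall('item'):
    print(item.text)  # Output: One, then Two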
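Parsing is only half of it; the same tree can be modified and written back out. Here is a minimal sketch of that, reusing the sample document from above. The new item's text and its id attribute are made up for illustration.

```python
from lxml import etree

xml_data = '<root><item>One</item><item>Two</item></root>'
root = etree.fromstring(xml_data)

# Append a new <item> child, then give it text and an attribute.
# (The 'Three' text and the id value are illustrative only.)
new_item = etree.SubElement(root, 'item')
new_item.text = 'Three'
new_item.set('id', '3')

# Serialize the modified tree back to a str instead of bytes.
result = etree.tostring(root, encoding='unicode')
print(result)
```

Passing encoding='unicode' makes tostring() return a regular Python string; without it you get bytes.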
Parsing HTML
from lxml import html
html_content = '<html><body><h1>Hello</h1></body></html>'
tree = html.fromstring(html_content)
heading = tree.xpath('//h1/text()')
print(heading[0]) # Output: Hello
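Real-world HTML is rarely as tidy as the snippet above. As a rough sketch, here is how lxml.html copes with markup that has unclosed tags; the messy string is an invented example, not taken from a real page.

```python
from lxml import html

# Deliberately sloppy markup: the <p> and <b> tags are never closed.
messy = '<html><body><h1>Hello</h1><p>Some <b>bold text</body></html>'
tree = html.fromstring(messy)

# The parser repairs the tree, so the <p> element can still be queried.
paragraph = tree.xpath('//p')[0]

# text_content() collects the text of the element and all its descendants.
text = paragraph.text_content()
print(text)
```

text_content() is handy when you want the visible text of an element without caring about the inline tags inside it.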
XPath Basics
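XPath is a query language for selecting nodes in an XML or HTML tree. In lxml, the xpath() method evaluates an expression against an element and returns a list of matches. For example, this selects the href attribute of every <a> element in the document: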
links = tree.xpath('//a/@href')
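To make that concrete, here is a small self-contained sketch with a few common XPath patterns: selecting attributes, filtering with a predicate, and selecting text. The page content and URLs are invented sample data.

```python
from lxml import html

# Invented sample page with three <a> elements, one of them without href.
page = '''<html><body>
  <a href="/home" class="nav">Home</a>
  <a href="https://example.com/docs">Docs</a>
  <a class="nav">No link</a>
</body></html>'''
tree = html.fromstring(page)

# Every href attribute on any <a> element.
all_links = tree.xpath('//a/@href')

# Predicate: only <a> elements whose class attribute equals "nav".
nav_links = tree.xpath('//a[@class="nav"]/@href')

# Text of the <a> elements that actually have an href attribute.
link_texts = tree.xpath('//a[@href]/text()')

print(all_links)
print(nav_links)
print(link_texts)
```

Note that xpath() always returns a list here, so an expression that matches nothing gives you an empty list rather than an error.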
Error Handling & Best Practices
- Use try/except around parsing calls; lxml raises etree.XMLSyntaxError when the XML is malformed
- Validate the structure before relying on it, for example by checking that the expected elements are present
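The two points above could look something like this in practice. This is a sketch, not the one right way to do it: parse_items is a hypothetical helper name, and returning None on failure is just one possible convention.

```python
from lxml import etree


def parse_items(xml_text):
    """Return the text of each <item>, or None if the input is unusable.

    Hypothetical helper for illustration only.
    """
    try:
        root = etree.fromstring(xml_text)
    except etree.XMLSyntaxError as exc:
        # Raised by lxml when the XML is not well-formed.
        print(f'Could not parse XML: {exc}')
        return None

    # Structural check: this sketch expects <root> as the top-level element.
    if root.tag != 'root':
        print(f'Unexpected top-level element: {root.tag}')
        return None

    return [item.text for item in root.findall('item')]


print(parse_items('<root><item>One</item></root>'))  # well-formed input
print(parse_items('<root><item>One</root>'))         # mismatched tag
print(parse_items('<other/>'))                       # wrong structure
```

Catching etree.XMLSyntaxError specifically, rather than a bare except, keeps unrelated bugs from being silently swallowed.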
Real-world Use Cases
- Scraping
- Working with config files
- Parsing API responses
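As a quick illustration of the config-file case, here is a sketch that reads settings from a small XML document into a dictionary. The <config> and <setting> element names and their values are assumptions made up for this example, not a standard format.

```python
from lxml import etree

# Invented config layout: each <setting> has a name attribute and a text value.
config_xml = '''<config>
  <setting name="host">localhost</setting>
  <setting name="port">8080</setting>
</config>'''
root = etree.fromstring(config_xml)

# Map each setting's name attribute to its text content.
settings = {s.get('name'): s.text for s in root.findall('setting')}

# Values come back as strings, so convert where a number is needed.
port = int(settings['port'])

print(settings)
print(port)
```

The same pattern carries over to XML API responses: parse the payload, pick out the elements you need, and convert the text to the types your code expects.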
Conclusion
lxml gives you superpowers for XML and HTML parsing. Whether you're a beginner or an advanced dev, it’s worth mastering!