To help you get started with XPath, this section builds a basic understanding of XPath quickly and introduces how it is used in the web scraping tool Octoparse.
Table of contents:
- What is XPath?
- How to write an XPath?
- What is an XPath tool?
1. What is XPath?
XPath (XML Path Language) is a query language for selecting elements from an XML/HTML document. It helps you find an element in the whole document precisely and quickly.
Web pages are generally written in a language called HTML. If you load a web page in a browser (Chrome, Firefox, etc.), you can usually view the corresponding HTML doc by pressing the F12 key. Everything you see on the web page can be found within the HTML, such as images, blocks of text, links, etc.
This image shows a part of an HTML doc. It's easy to see that there are 3 levels of elements in this HTML section.
Level 1: Bookstore
Level 2: Book
Level 3: Title, author, year and price.
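To make the three levels concrete, here is a minimal sketch that walks a small document and prints each tag with its level. The sample document is made up for illustration (the text values are placeholders, not the content of the image above), and `tag_levels` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
SAMPLE = """
<bookstore>
  <book>
    <title>Example Title</title>
    <author>Example Author</author>
    <year>2020</year>
    <price>10.00</price>
  </book>
</bookstore>
"""

def tag_levels(element, level=1):
    """Yield (level, tag) for an element and all of its descendants."""
    yield level, element.tag
    for child in element:
        yield from tag_levels(child, level + 1)

root = ET.fromstring(SAMPLE.strip())
levels = list(tag_levels(root))

for level, tag in levels:
    # Indent each tag by its depth so the nesting is visible.
    print("  " * (level - 1) + f"Level {level}: {tag}")
```

The indentation in the printed output mirrors the nesting: bookstore contains book, and book contains title, author, year and price.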
Text enclosed in angle brackets (< >) is called a tag. An HTML element usually consists of a start tag and an end tag, with the content inserted in between.
<tagname>Content goes here...</tagname>
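If you want to see the start tag, content, and end tag as separate pieces, a small sketch with Python's built-in `html.parser` can log them in order. The `<p>` tag and the `TagLogger` class name are assumptions chosen for the example.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record start tags, content, and end tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start tag", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags.
        if data.strip():
            self.events.append(("content", data.strip()))

    def handle_endtag(self, tag):
        self.events.append(("end tag", tag))

parser = TagLogger()
# <p> is just an example element; any tag name works the same way.
parser.feed("<p>Content goes here...</p>")
parser.close()

for kind, value in parser.events:
    print(f"{kind}: {value}")
```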
XPath uses "/" to connect tags of different levels, from top to bottom, to specify the location of an element. In our example, if we want to locate the element "author", the XPath would be /bookstore/book/author.
That is pretty similar to a path in a file structure, as the image below shows.
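The file-structure comparison can be checked with a tiny sketch: splitting an XPath on "/" gives the same kind of step-by-step list as the parts of a file path. Both paths below are hypothetical examples based on the bookstore document.

```python
from pathlib import PurePosixPath

# Hypothetical paths, used only to compare the two notations.
xpath = "/bookstore/book/author"
file_path = PurePosixPath("/bookstore/book/author.txt")

# Each non-empty piece between slashes is one level of the document.
xpath_steps = [step for step in xpath.split("/") if step]

# parts[0] is the root "/", so skip it to get the folder and file names.
path_parts = list(file_path.parts[1:])

print(xpath_steps)  # ['bookstore', 'book', 'author']
print(path_parts)   # ['bookstore', 'book', 'author.txt']
```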
We can conclude that XPath is the address for locating a precise place in an HTML doc.
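As a rough illustration of using a path as an address, the sketch below selects every author element from a made-up bookstore document with Python's standard `xml.etree.ElementTree`. One assumption to keep in mind: ElementTree implements only a subset of XPath and searches relative to the element you call it on, so the path is written `./book/author` from the bookstore root rather than starting with `/bookstore`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
SAMPLE = """
<bookstore>
  <book>
    <title>Example Title A</title>
    <author>Example Author A</author>
  </book>
  <book>
    <title>Example Title B</title>
    <author>Example Author B</author>
  </book>
</bookstore>
"""

root = ET.fromstring(SAMPLE.strip())  # root is the <bookstore> element

# Walk down the levels: bookstore (root) -> book -> author.
authors = [node.text for node in root.findall("./book/author")]
print(authors)  # ['Example Author A', 'Example Author B']
```

Because the path describes a location rather than a single node, it returns every element found at that address, here one author per book.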
2. How to write an XPath?
Writing an XPath is easy if you understand the structure of HTML and the syntax of XPath.
Sounds easy? Yet it takes some time to learn. Here are some useful tutorials for beginners, at least for me.
To make things easier for you, here is a cheat sheet of helpful XPath expressions to help you quickly target any element in the HTML.
*Note that attribute and text values are all case-sensitive.
*For a more exhaustive list of XPath expressions, check this out.
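Here is a small sketch of a few common expression patterns, including the case-sensitivity point from the note above. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, and a made-up document; the `category` attribute, the text values, and the `titles` helper are all assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
SAMPLE = """
<bookstore>
  <book category="web">
    <title>Learning XPath</title>
    <author>Example Author A</author>
  </book>
  <book category="cooking">
    <title>Cooking Basics</title>
    <author>Example Author B</author>
  </book>
</bookstore>
"""

root = ET.fromstring(SAMPLE.strip())

def titles(path):
    """Return the title text of every book matched by the path."""
    return [book.findtext("title") for book in root.findall(path)]

# Select by attribute value.
by_attribute = titles("./book[@category='web']")
# Same attribute with different letter case: no match.
wrong_case = titles("./book[@category='WEB']")
# Select by the text of a child element.
by_child_text = titles("./book[author='Example Author B']")
# Select by position (XPath positions start at 1).
first_book = titles("./book[1]")
# Select all title elements at any depth below the root.
all_titles = [node.text for node in root.findall(".//title")]

print(by_attribute)   # ['Learning XPath']
print(wrong_case)     # []
print(by_child_text)  # ['Cooking Basics']
print(first_book)     # ['Learning XPath']
print(all_titles)     # ['Learning XPath', 'Cooking Basics']
```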
3. What is an XPath tool?
We know the basic rules of writing an XPath, so we can start writing one. Congratulations!
Yet how can we know whether the XPath is correct or not? In this case, we should use an XPath tool to help with verification.
I would love to recommend 2 XPath tools.
- Octoparse XPath Tool
Octoparse offers an XPath tool to help you write XPath easily.
- Chrome Add-on: XPath Helper
XPath Helper is a superb Chrome extension that lets you look up an XPath by simply hovering over an element in the browser. You can also edit the XPath query directly in the console. You get the results immediately, so you know whether your XPath is working correctly or not.
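The same "try it and see the result" idea can be approximated in a script. This is only a sketch under assumptions: `check_xpath` is a hypothetical helper, the document is made up, and it relies on Python's standard `xml.etree.ElementTree`, which supports a subset of XPath and rejects paths that start with "/" when searching from an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
SAMPLE = """
<bookstore>
  <book>
    <title>Example Title</title>
    <author>Example Author</author>
  </book>
</bookstore>
"""

def check_xpath(xml_text, path):
    """Return (ok, details): matched text values, or a message on failure."""
    root = ET.fromstring(xml_text.strip())
    try:
        matches = root.findall(path)
    except SyntaxError as exc:
        # ElementTree raises SyntaxError for paths it cannot handle.
        return False, f"invalid path: {exc}"
    if not matches:
        return False, "valid path, but nothing matched"
    return True, [(node.text or "").strip() for node in matches]

print(check_xpath(SAMPLE, "./book/author"))
print(check_xpath(SAMPLE, "./book/writer"))
# A leading "/" is not accepted by ElementTree when searching from an element.
print(check_xpath(SAMPLE, "/bookstore/book/author"))
```

Seeing both the matched values and the failure reason is what makes a verification tool useful: an empty result and a malformed path are different problems with different fixes.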
That's the end of the article. If you have better ideas on how to learn XPath more effectively, please leave a comment below!