My Intro to XPath! (Guest Starring XML)

Header image by Bob Jenkins on Wikimedia

Why the interest in XPath?

Once more I’m reminiscing about conversations with programmers I’ve known. I seem to do this a lot.

Today, I’ve remembered being encouraged to learn XPath. I was even given a link to a game called XPath Diner to entice me further. Despite having beaten it multiple times (it’s fun!), I've never actually sat down to research what XPath is or its uses.
It's past time that I change this!

What is XML?

XPath stands for XML Path Language, and XML stands for Extensible Markup Language. Before diving into XPath, we need to learn a little bit about XML. XML is written in a format that is both human- and machine-readable. It is similar to HTML, except it has no predefined tags: you create your own!

All XML data is stored as plain text, which makes it accessible on just about any kind of hardware or software. This is super handy because there are multitudes of unique file & format types for similar kinds of data, such as different programs having their own custom file types for document writing, spreadsheet building, illustration rendering, etc.

There are also some limits on what file types can be opened between different operating systems without finding special conversion software.
So, say you need to send files to a client, or coworker, who uses not only a different operating system, but also a different program. This carries the risk of data loss and other time-consuming troubleshooting issues. But with XML, everything is written, stored, and transferred as a basic plain text file.

However, if you open up the XML file you’ve written, or received, it will not be formatted; you will see the raw XML code. So, while an XML file can be written in a basic text editor, you must find, or write, a program to translate it into its structured form.
XPath… does not do this. But it’s still important!
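
To give a feel for what such a translating program might look like, here is a minimal sketch. It assumes Python and its built-in xml.etree.ElementTree module (my pick, nothing XML requires it), and the critter_note document and parse_note helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up XML document, held as nothing more than a plain text string.
xml_text = """
<critter_note>
  <critter_name>Pippington</critter_name>
  <critter_age>5</critter_age>
</critter_note>
"""

def parse_note(text):
    """Turn raw XML text into a dictionary of tag name -> text value."""
    root = ET.fromstring(text.strip())
    # Each direct child of the root becomes one dictionary entry.
    return {child.tag: child.text for child in root}

note = parse_note(xml_text)
print(note)
```

The same text file could just as easily be read by a program in another language on another operating system, which is the whole appeal.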

What is XPath?

Once more, XPath stands for XML Path Language. We now know what XML is, but the ‘Path’ part is a reference to how it uses Path Notation (like a URL to a website) to navigate XML documents.

XPath was established by the World Wide Web Consortium in 1999 with version 1.0. Since then, version 2.0 was released in 2007 (with a second edition in 2010), version 3.0 in 2014, and 3.1 in 2017. It was initially created “to provide a common syntax and semantics for functionality shared between XSL Transformations and XPointer.” (XML Cover Pages)

XPath is a query language specifically for navigating XML documents in order to find & retrieve data. Though you can view your XML document's code in your preferred text editor and search it just by scrolling, scanning pages & pages of code then copying the data elsewhere is just not efficient.

XPath treats XML documents like node trees, with the node at the very top (the parent of the outermost element) being referred to as the ‘root’. There are seven kinds of “nodes” it takes into consideration: attribute, comment, element, namespace, processing instruction, text, and the root.
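
To see those node kinds in a real tree, here is a rough sketch that assumes Python's built-in xml.dom.minidom module. The DOM it builds is only an approximation of XPath's own model (namespace nodes, for example, are not represented), and the tiny document and the count_node_kinds helper are made up for illustration.

```python
from collections import Counter
from xml.dom import minidom, Node

# Made-up document with an attribute, a comment, a processing instruction and text.
xml_text = (
    '<critter_stats id="001"><!-- bunny --><?note checked?>'
    "<critter_name>Pippington</critter_name></critter_stats>"
)

def count_node_kinds(text):
    """Walk a parsed document and tally each kind of node encountered."""
    names = {
        Node.DOCUMENT_NODE: "root",
        Node.ELEMENT_NODE: "element",
        Node.TEXT_NODE: "text",
        Node.COMMENT_NODE: "comment",
        Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    }
    counts = Counter()

    def visit(node):
        counts[names.get(node.nodeType, "other")] += 1
        # Attributes hang off an element rather than appearing among its children.
        if node.nodeType == Node.ELEMENT_NODE:
            counts["attribute"] += node.attributes.length
        for child in node.childNodes:
            visit(child)

    visit(minidom.parseString(text))
    return dict(counts)

kinds = count_node_kinds(xml_text)
print(kinds)
```

Note that the tally includes one 'root' that is separate from the two elements.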

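XPath traverses the node-filled trees of the XML forest and allows you to target specific XML elements, whether by name or as the nth element that's been given a multiuse identifier. You can do some text manipulation, such as changing the case from upper to lower and vice versa (using the upper-case() and lower-case() functions added in version 2.0). XPath itself only reads the document, but tools built on it can use an expression to locate the fields in a message whose values need updating.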

XPath Syntax & Code Examples

Here’s an XML example I made to experiment with XPath expressions:

<!-- Custom tags -->
<critter_sanctuary>

  <!-- Custom Class/Ids/However you want to refer to them -->
  <critter_roster id="tenants">

    <critter_species category="bunnies">

      <critter_stats id="001">
        <critter_name>Pippington</critter_name>
        <critter_age>5</critter_age>
        <critter_breed>Netherland Dwarf</critter_breed>
        <critter_markings>Black with Silver flecks</critter_markings>
      </critter_stats>

      <critter_stats id="002">
        <critter_name>Fluffsworth</critter_name>
        <critter_age>6</critter_age>
        <critter_breed>Lionhead</critter_breed>
        <critter_markings>Orange with a cream underbelly</critter_markings>
      </critter_stats>

      <critter_stats id="003">
        <critter_name>George</critter_name>
        <critter_age>1</critter_age>
        <critter_breed>Mixed</critter_breed>
        <critter_markings>Milk chocolate brown with dark chocolate spots</critter_markings>
      </critter_stats>

    </critter_species>

    <critter_species category="turtles">

      <critter_stats id="004">
        <critter_name>Bowser</critter_name>
        <critter_age>35</critter_age>
        <critter_breed>Koopa</critter_breed>
        <critter_markings>Orange with a green shell and red hair</critter_markings>
      </critter_stats>

      <critter_stats id="005">
        <critter_name>Michelangelo</critter_name>
        <critter_age>15</critter_age>
        <critter_breed>Ninja</critter_breed>
        <critter_markings>Bright green with a forest green shell and orange mask</critter_markings>
      </critter_stats>

      <critter_stats id="006">
        <critter_name>Crush</critter_name>
        <critter_age>150</critter_age>
        <critter_breed>Sea Turtle</critter_breed>
        <critter_markings>Green with a brown shell and spots</critter_markings>
      </critter_stats>

    </critter_species>

  </critter_roster>

</critter_sanctuary>

Let’s say that someone works for a pet daycare facility, and for roll call reasons they need a list of every name of the day's attendees.
This will select every <critter_name> tag in the document, along with the value stored in it:

//critter_name
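
If you want to try that roll call outside a browser tool, here is a minimal sketch assuming Python's built-in xml.etree.ElementTree, which understands only a subset of XPath. SAMPLE is a trimmed-down copy of the roster and roll_call is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample roster (two species, three critters).
SAMPLE = """
<critter_sanctuary>
  <critter_roster id="tenants">
    <critter_species category="bunnies">
      <critter_stats id="001"><critter_name>Pippington</critter_name></critter_stats>
      <critter_stats id="002"><critter_name>Fluffsworth</critter_name></critter_stats>
    </critter_species>
    <critter_species category="turtles">
      <critter_stats id="004"><critter_name>Bowser</critter_name></critter_stats>
    </critter_species>
  </critter_roster>
</critter_sanctuary>
"""

def roll_call(xml_text):
    """Return the text of every critter_name element, in document order."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree wants a leading "." so the search starts at the parsed root element.
    return [el.text for el in root.findall(".//critter_name")]

names = roll_call(SAMPLE)
print(names)  # ['Pippington', 'Fluffsworth', 'Bowser']
```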

Perhaps you need a list of all the turtles. This will search for the <critter_species> tag with the turtles category and return its information:

/critter_sanctuary/critter_roster/critter_species[@category='turtles']
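
Here is a hedged sketch of the same kind of category lookup, again assuming Python's built-in xml.etree.ElementTree. Its paths start from the already-parsed root element, so the leading /critter_sanctuary step becomes a plain "." here. The sample data only holds bunnies and turtles, so the sketch also shows that a category missing from the data simply comes back empty. species_names is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample roster.
SAMPLE = """
<critter_sanctuary>
  <critter_roster id="tenants">
    <critter_species category="bunnies">
      <critter_stats id="001"><critter_name>Pippington</critter_name></critter_stats>
    </critter_species>
    <critter_species category="turtles">
      <critter_stats id="004"><critter_name>Bowser</critter_name></critter_stats>
      <critter_stats id="005"><critter_name>Michelangelo</critter_name></critter_stats>
    </critter_species>
  </critter_roster>
</critter_sanctuary>
"""

def species_names(xml_text, category):
    """List the names stored under the critter_species with the given category."""
    root = ET.fromstring(xml_text.strip())
    # The [@category='...'] predicate keeps only species whose attribute matches.
    path = f"./critter_roster/critter_species[@category='{category}']"
    return [
        name.text
        for species in root.findall(path)
        for name in species.iter("critter_name")
    ]

turtles = species_names(SAMPLE, "turtles")
hamsters = species_names(SAMPLE, "Hamsters")
print(turtles)   # ['Bowser', 'Michelangelo']
print(hamsters)  # []
```

Attribute values are matched exactly, capital letters included, so 'Turtles' would not match 'turtles'.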

Or you just need to know which pets are the ones with the Ids 002 & 004:

/critter_sanctuary/critter_roster/critter_species/critter_stats[@id='002' or @id='004']
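
One caveat if you try this one in Python: the built-in xml.etree.ElementTree module does not understand 'or' inside a predicate. The sketch below is therefore a stand-in for the expression, not a direct translation. It selects every critter_stats element with a plain path and does the id check in Python. names_by_id is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample roster.
SAMPLE = """
<critter_sanctuary>
  <critter_roster id="tenants">
    <critter_species category="bunnies">
      <critter_stats id="001"><critter_name>Pippington</critter_name></critter_stats>
      <critter_stats id="002"><critter_name>Fluffsworth</critter_name></critter_stats>
    </critter_species>
    <critter_species category="turtles">
      <critter_stats id="004"><critter_name>Bowser</critter_name></critter_stats>
    </critter_species>
  </critter_roster>
</critter_sanctuary>
"""

def names_by_id(xml_text, wanted_ids):
    """Map each wanted id to that critter's name."""
    root = ET.fromstring(xml_text.strip())
    stats = root.findall("./critter_roster/critter_species/critter_stats")
    # The 'or' from the XPath predicate turns into a set membership test.
    return {
        s.get("id"): s.findtext("critter_name")
        for s in stats
        if s.get("id") in wanted_ids
    }

found = names_by_id(SAMPLE, {"002", "004"})
print(found)  # {'002': 'Fluffsworth', '004': 'Bowser'}
```

A full XPath engine would evaluate the original expression as written.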

With the above examples you can really see how an XPath expression is like a URL. They are also examples of using some XPath selectors.
‘/’ A forward slash selects elements starting from the root node, then each specified child element.
‘//’ A double forward slash selects matching elements no matter where they sit in the document.
‘[ ]’ Brackets are used to include ‘Predicates’, which are used for finding nodes with specific values.
‘@’ is used to specify attributes you wish to target.

XPath also uses axes, which are keywords that represent a relationship to the currently specified node.
One of them is ‘following’, used to select everything in the document after the closing tag of the targeted node. This will target all the <critter_stats> data that’s listed AFTER the bunny category:

//critter_species[@category='bunnies']/following::critter_stats
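
Python's built-in xml.etree.ElementTree module has no 'following' axis, so the sketch below imitates it by hand rather than running the expression itself. It walks the elements in document order and keeps the critter_stats that come after the target while skipping anything nested inside it. following_stats is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample roster.
SAMPLE = """
<critter_sanctuary>
  <critter_roster id="tenants">
    <critter_species category="bunnies">
      <critter_stats id="001"><critter_name>Pippington</critter_name></critter_stats>
      <critter_stats id="002"><critter_name>Fluffsworth</critter_name></critter_stats>
    </critter_species>
    <critter_species category="turtles">
      <critter_stats id="004"><critter_name>Bowser</critter_name></critter_stats>
      <critter_stats id="005"><critter_name>Michelangelo</critter_name></critter_stats>
    </critter_species>
  </critter_roster>
</critter_sanctuary>
"""

def following_stats(xml_text, category):
    """Ids of critter_stats that start after the given species element has closed."""
    root = ET.fromstring(xml_text.strip())
    target = root.find(f".//critter_species[@category='{category}']")
    if target is None:
        return []
    inside = set(target.iter())   # the target itself plus all of its descendants
    ordered = list(root.iter())   # every element, in document order
    start = ordered.index(target)
    return [
        el.get("id")
        for el in ordered[start:]
        if el.tag == "critter_stats" and el not in inside
    ]

after_bunnies = following_stats(SAMPLE, "bunnies")
print(after_bunnies)  # ['004', '005']
```

Asking for what follows the turtles returns an empty list, since nothing comes after that category closes.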

Another way to target elements is using node tests. “A node test is part of an expression to retrieve one or more nodes.” (Quackit)
The following XPath expression uses the ‘text()’ node test to target all the text nodes that are children of the <critter_name> elements.

//critter_name/child::text()

Other node tests are:
node() - Selects any node on the targeted axis.
attribute() - Selects any attribute node on the targeted axis (XPath 2.0 and later).
comment() and element() work the same way, though element() also needs XPath 2.0 or later.

Using XPath

XPath, and its BFF XML, can be used together with many programming languages (C#, Java, Python, & Ruby, just to name a few) in just about any editor or IDE.

While doing my research, and playing around with browser apps like XPather, I wondered how easy it would be to use XPath (and XML) with JavaScript in two of the IDEs I’m most familiar with: Replit & VS Code. Setup for both was super quick and easy!

When using XPath in VS Code, you must install the 'xpath' & 'xmldom' packages in your Node.js environment. In the VS Code Terminal, you run:
npm install xpath
AND:
npm install xmldom
OR, to install both at once:
npm install xmldom xpath
You're good to go!

In Replit:
Open the Packages tab from the Tools section on the left menu bar
Then enter ‘XPath’ in the search bar & click the 'Install' button beneath the XPath result.
You're good to go! Again!

Now you just need to follow some handy documentation on setting up your first XPath file in JavaScript.

Conclusion

Learning about XPath was more involved than I expected, but that’s not a bad thing! There are just so many kinds of XML documents you can make that I kept Branching off the main Tree to read about them. But I was always able to find the right Path to get back on track.
I feel like I have a much better understanding of what XPath is and its usefulness now. So I will take this confidence and go replay XPath Diner with fresh eyes!