XML & XPath & XSLT


Tips on using and transforming XML. I primarily use the XML::LibXML and XML::LibXSLT Perl modules for my XML processing. I usually recommend against XML::Simple, as I have wasted, and have seen others waste, too much time fiddling with the data structures it produces. Consider stepping up from XML::Simple to XML::LibXML.

XML will not suit all workflows: alternatives range from JSON to Markdown to YAML, among many others, depending on the need.

XPath

Consult the XML Path Language (XPath) specification to learn XPath. XSLT makes heavy use of XPath. Not every XPath implementation supports the full range of features found in the XPath specification. HTML::Selector::XPath can convert CSS2 selectors to equivalent XPath expressions. Find a tool with which to experiment, such as xpath-tester, which is used in the examples below, along with xpath-position-example.xml:

<xml> x
<one a="a1">1</one>
<two a="a2">2</two>
<three> 33 <!-- context node in examples -->
<first a="f1">1st</first>
<second a="f2">2nd</second>
<third a="f3">3rd</third>
<fourth/>
<fifth a="f5">5th</fifth>
</three>
<four a="a4">4</four>
<five a="a5">5</five>
</xml>

Examples

Excessively complicated XPath expressions may simply not work, may be hard to debug, or may otherwise be difficult to understand and support. Splitting them up is akin to using multiple regular expressions instead of a single long mess, or several lines of code instead of cramming too much into a single expression. Keep XPath expressions simple, and handle logic, iteration, and recursion in XSLT or other software outside of XPath.
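As a rough sketch of this advice, the following uses Python's standard xml.etree.ElementTree module rather than the Perl modules mentioned above (an assumption made so the example runs without extra installs; ElementTree supports only a subset of XPath). The sample document is inlined, and the variable names are illustrative only. A simple expression selects the candidates, and ordinary code does the rest:

```python
import xml.etree.ElementTree as ET

# The xpath-position-example.xml document from above, inlined for convenience.
XML = """<xml> x
<one a="a1">1</one>
<two a="a2">2</two>
<three> 33 <!-- context node in examples -->
<first a="f1">1st</first>
<second a="f2">2nd</second>
<third a="f3">3rd</third>
<fourth/>
<fifth a="f5">5th</fifth>
</three>
<four a="a4">4</four>
<five a="a5">5</five>
</xml>"""

root = ET.fromstring(XML)
context = root.find("three")  # the context node

# Simple expression: child elements of the context node that have an "a" attribute.
with_attr = context.findall("*[@a]")

# Logic kept outside XPath: skip the first match and collect the text content.
later_texts = [el.text for el in with_attr[1:]]

print([el.tag for el in with_attr])  # ['first', 'second', 'third', 'fifth']
print(later_texts)                   # ['2nd', '3rd', '5th']
```

The same selection could probably be squeezed into one longer expression in a full XPath implementation, but the two-step form is easier to debug when a step returns nothing.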

xmlns

XML namespaces (xmlns) can be troublesome to deal with, notably default namespaces that are declared without a prefix (undec.xml):

<undec xmlns="http://example.org/undec/1.0/">
<a>a</a>
<b>b</b>
</undec>

A default (unprefixed) namespace requires custom registration in the XPath software being used, such as via the registerNs method of XML::LibXML::XPathContext, and elements in that namespace must then be written with the registered prefix in XPath expressions. Some expressions will still match elements in such a namespace (//* for example, as * matches elements regardless of namespace), though this should not be relied on instead of properly registering the namespace.

$ xpath-tester undec.xml '//u:a'
a
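The same idea can be sketched with Python's standard xml.etree.ElementTree module, used here only as a stand-in for the Perl tooling (an assumption; ElementTree takes a prefix-to-URI mapping per call rather than a registerNs method). The prefix u is an arbitrary local choice; only the namespace URI has to match the document:

```python
import xml.etree.ElementTree as ET

# The undec.xml document from above, inlined for convenience.
UNDEC = """<undec xmlns="http://example.org/undec/1.0/">
<a>a</a>
<b>b</b>
</undec>"""

root = ET.fromstring(UNDEC)

# Map a locally chosen prefix to the namespace URI used by the document.
ns = {"u": "http://example.org/undec/1.0/"}

# Without a prefix, "a" means an element in no namespace, so nothing matches.
unprefixed = root.findall(".//a")

# With the mapped prefix, the namespaced element is found.
prefixed = root.findall(".//u:a", ns)

print(len(unprefixed))               # 0
print([el.text for el in prefixed])  # ['a']
```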

XSLT

The XSL Transformations (XSLT) specification details the XSLT language. The XSLT stylesheets for this website are available online. Tricks in those stylesheets include, among others, handling acronym elements differently depending on whether or not the acronym has already been shown on the page. XSLT makes heavy use of XPath expressions.