Monthly Archives: November 2013

Transforming and Validating XML with Python and lxml

XML isn’t nearly as sexy as JSON these days, but it’s still out there in the wild. And it is powerful. For example, it’s pretty awesome that you can assemble an XSL transform to parse XML and turn it into newly formatted XML. It’s also pretty awesome that you can verify XML against a schema to ensure the XML meets all requirements (say, for example, that an ID be unique across all instances) — that the XML is “valid.”

If you are a front-end developer, chances are that you make a series of HTTP requests and receive data — it’s a pretty common thing. For the purposes of this post, we’ll assume that data is XML. But, there’s a problem: the XML is not using the tags you need for your application. So, you apply an XSL transform. Your application makes many assumptions about the format of this massaged data, so you employ a schema or XSD to validate each assumption.

There’s also a pretty good chance that the folks maintaining these services want to tinker. So it would be immensely helpful to be able to quickly test out each URL to be confident that changes made to services won’t negatively affect your application. It would be wise to structure these as actual unit tests, but that is beyond the scope of my focus here.

Commence the tool-making! This seemed like a perfect candidate for Python, so I hopped to it. After some googling, I quickly got the impression that lxml was the perfect library for the job, able to handle both XML transforms and XSD validation. It couldn’t have been easier to work with.

I whipped up a Python script to read URLs from a designated text file, iterate over each one, hit the URL, transform the XML, validate the XML, and write any validation errors to a log file. Pretty straight-forward, and I can now validate all of my URLs at a moment’s notice, and have a full report generated in seconds.

Below is my script: