Monthly Archives: June 2013

Boy Meets Python

Last week I needed a quick solution to convert a CSV file to an XML file, and because C# is my primary language, I was able to throw this together in less than 10 minutes:

So what does this have to do with Python? Well, this weekend, I had the sudden urge to learn some Python. I wanted to build something that a.) would force me to learn a few things about the language, and b.) had value. The CSV to XML converter was fresh in my mind, and so I thought it would be a great way to begin my Python journey.

To start, I installed Python on Windows. I downloaded the installer from http://www.python.org/getit/, and was writing Python in just a few minutes. Pretty painless.

Writing Python was slightly awkward at first, but I quickly got the hang of things. Having taken the time to learn LINQ and lambda expressions a few years ago certainly helped.

Command line arguments were a breeze using argparse. Within minutes I had a way to specify the CSV input file and the XML output file. It isn’t absolutely necessary, but argparse makes specifying expected parameters easy, and comes prepackaged with --help. Nice.

Next, I stumbled upon csv, which was certainly helpful. But, again, I’m pretty sure I could have survived without it, treating the input file as a standard text file and reading one line at a time.

A long time ago I got into the habit of encapsulating file I/O with using() in C#. It felt awkward acquiring a file handle and having to call close() on it explicitly, but once I discovered Python’s with keyword, I felt right at home.

The rest of the script, which is really the meat of the conversion, required me to learn a little bit about lists and strings. I’m an avid user of string.Format(...) in C# and was happy to see that I could call format(...) in Python.

I began by reading in the first line, which always contains the headers. I wanted to form a string format something to the effect of <row col0="{0}" col1="{1}"/> that I could use when processing each subsequent line. I discovered the join() method on the string, and thought that might allow me to dynamically assemble the attributes. Calling join() on the string " " and passing to it a collection of strings generated by an iterator that iterates over the headers cleverly assembles the format string — in one line of code! (I felt pretty stupid when I realized that the string in C# also has this feature.)

The last remaining piece was processing each line of the CSV file. This was trivial once I generated a format string, with one exception. For each line, I thought I could call format() on the format string, pass in the list of values from the line, and write the newly constructed string to the file. The problem was, format() is expecting comma-delimited parameters, and I was holding onto a list of values as strings. Simply passing the reference to the list, line, was not sufficient. To my surprise, I discovered that I could essentially dereference the list (as such: *line), satisfying format().

And that completed the exercise! I won’t admit how long it took me to write, but let’s just say it took longer than 10 minutes.

Below is the script:

(Having spent the time to set up the row format in Python, I thought I should go back and use the same approach in C# , complete with using Join(), for a more apples-to-apples comparison.)

The mere fact that I did all of this on Windows felt slightly sacrilegious, so I decided to go back and conduct the same exercise, this time on Linux — Ubuntu 13.04 to be exact.

Ubuntu ships with Python installed, so technically there were even fewer steps to get started. But, it ships with v2.7.4, and the script I wrote on Windows apparently uses language features that didn’t exist until v3.x. So, I grabbed Python 3.3.2 for Linux from http://www.python.org/getit/, and followed these excellent instructions so that I could have both v2.7.4 and v3.3.2 installed simultaneously. Once installed, the script I wrote on Windows ran equally well on Linux.

It was clear during this exercise that I merely scratched the surface with Python. It appears to have quite an exhaustive API, contains many of the same constructs that I’m used to in C#, and I will not hesitate to use it for all of my future scripting needs.