More Python Fun

It’s no secret that, recently, I’ve been teaching myself Python. A couple of weeks ago, I wrote a Python script to convert a CSV file to an XML file, and that whetted my appetite for more.

Earlier today, I discovered Anaconda from Continuum Analytics, which comes with IPython Notebook. Not only is it a really nice tool for learning Python, but you can also plot points! This would have made Calculus way more fun 15 years ago!

At any rate, I started fooling around with some basic list slicing, list comprehension and the functional favorites: filter, map and reduce. IPython Notebook made this incredibly simple. Wanting to tackle something a bit more complicated, I sought out a coding interview problem.
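A few of the one-liners I fooled around with looked something like this (the sample values are throwaways):

from functools import reduce

numbers = [3, 1, 4, 1, 5, 9, 2, 6]
first_three = numbers[:3]                            # list slicing
squares = [n * n for n in numbers]                   # list comprehension
evens = list(filter(lambda n: n % 2 == 0, numbers))  # filter
doubled = list(map(lambda n: n * 2, numbers))        # map
total = reduce(lambda a, b: a + b, numbers)          # reduce (moved to functools in Python 3)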

The problem is this: you’re provided an initial collection of integers, and you are to produce a sum of the highest non-adjacent integers in the collection. It sounds challenging, but when you break it up into smaller pieces, it’s pretty trivial.

I started by building a min heap of the original collection so that I could pop off the largest values in order. A max heap is technically more appropriate, but Python’s heapq module, which turns a list into a heap, only supports min heaps. To compensate, I simply inverted the values by multiplying each by -1.

The index of each item is also critical in determining whether adjacent items have already been applied toward the sum. So instead of pushing the raw value onto the heap, I pushed a tuple containing the value and its index.

With the heap fully constructed, the next thing needed was some way of keeping track of which items were used toward the sum. I chose the simple solution of creating a list of boolean values, each initialized to False, such that when an item at the same index is used toward the sum, its value is changed to True.

While popping items off the heap, each item’s neighbors are examined to determine whether it’s a candidate for the sum. If it is, its value is added to a final list, from which a sum can easily be reduced.

Here’s the full script:
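Or at least, a sketch of it; the function name max_non_adjacent_sum and the sample input at the bottom are illustrative:

import heapq
from functools import reduce

def max_non_adjacent_sum(values):
    # Build a min heap of (inverted value, index) tuples so that
    # popping yields the largest original values first.
    heap = [(-value, index) for index, value in enumerate(values)]
    heapq.heapify(heap)

    # Track which indices have already been applied toward the sum.
    used = [False] * len(values)
    picked = []

    while heap:
        inverted, index = heapq.heappop(heap)
        # The item is a candidate only if neither neighbor has been used.
        left_used = index > 0 and used[index - 1]
        right_used = index < len(values) - 1 and used[index + 1]
        if not (left_used or right_used):
            used[index] = True
            picked.append(-inverted)

    # The final list, from which a sum can easily be reduced.
    return reduce(lambda total, value: total + value, picked, 0)

print(max_non_adjacent_sum([3, 7, 4, 6, 5]))  # 7 + 6 = 13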

Could this problem be solved in other ways, perhaps with fewer allocations or greater speed? Quite possibly, but remember, this was just an exercise to flex my new Python muscles.


Boy Meets Python

Last week I needed a quick solution to convert a CSV file to an XML file, and because C# is my primary language, I was able to throw this together in less than 10 minutes:

So what does this have to do with Python? Well, this weekend, I had the sudden urge to learn some Python. I wanted to build something that a.) would force me to learn a few things about the language, and b.) had value. The CSV to XML converter was fresh in my mind, and so I thought it would be a great way to begin my Python journey.

To start, I installed Python on Windows. I downloaded the installer from http://www.python.org/getit/, and was writing Python in just a few minutes. Pretty painless.

Writing Python was slightly awkward at first, but I quickly got the hang of things. Having taken the time to learn LINQ and lambda expressions a few years ago certainly helped.

Command line arguments were a breeze using argparse. Within minutes I had a way to specify the CSV input file and the XML output file. It isn’t absolutely necessary, but argparse makes specifying expected parameters easy, and comes prepackaged with --help. Nice.
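Declaring the two expected parameters takes just a few lines (the argument names here are my best guess):

import argparse

parser = argparse.ArgumentParser(description='Convert a CSV file to an XML file.')
parser.add_argument('input', help='path to the CSV input file')
parser.add_argument('output', help='path to the XML output file')
args = parser.parse_args()  # --help comes for free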

Next, I stumbled upon csv, which was certainly helpful. But, again, I’m pretty sure I could have survived without it, treating the input file as a standard text file and reading one line at a time.

A long time ago I got into the habit of encapsulating file I/O with using() in C#. In Python, it felt awkward acquiring a file handle and having to call close() on it explicitly, but once I discovered the with keyword, I felt right at home.

The rest of the script, which is really the meat of the conversion, required me to learn a little bit about lists and strings. I’m an avid user of string.Format(...) in C# and was happy to see that I could call format(...) in Python.

I began by reading in the first line, which always contains the headers. I wanted to form a format string something to the effect of <row col0="{0}" col1="{1}"/> that I could use when processing each subsequent line. I discovered the string’s join() method, and thought it might allow me to dynamically assemble the attributes. Calling join() on the string " " and passing it a generator expression over the headers cleverly assembles the format string — in one line of code! (I felt pretty stupid when I realized that C#’s string also has this feature.)
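Concretely, with a few made-up headers, the one-liner assembles the format string like so:

headers = ['name', 'age', 'city']
attributes = ' '.join('{}="{{{}}}"'.format(header, index)
                      for index, header in enumerate(headers))
row_format = '<row {}/>'.format(attributes)
# row_format == '<row name="{0}" age="{1}" city="{2}"/>'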

The last remaining piece was processing each line of the CSV file. This was trivial once I had generated a format string, with one exception. For each line, I thought I could call format() on the format string, pass in the list of values from the line, and write the newly constructed string to the file. The problem was, format() expects separate positional arguments, and I was holding onto a list of values as strings. Simply passing the reference to the list, line, was not sufficient. To my surprise, I discovered that I could essentially dereference the list (as such: *line), satisfying format().
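Continuing the made-up example from above:

line = ['Alice', '30', 'Portland']
xml_row = row_format.format(*line)  # the * unpacks the list into separate arguments
# xml_row == '<row name="Alice" age="30" city="Portland"/>'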

And that completed the exercise! I won’t admit how long it took me to write, but let’s just say it took longer than 10 minutes.

Below is the script:
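Or something close to it; this is a sketch reassembled from the steps above, and the <rows> and <row> element names are stand-ins:

import argparse
import csv

parser = argparse.ArgumentParser(description='Convert a CSV file to an XML file.')
parser.add_argument('input', help='path to the CSV input file')
parser.add_argument('output', help='path to the XML output file')
args = parser.parse_args()

with open(args.input, newline='') as csv_file, open(args.output, 'w') as xml_file:
    reader = csv.reader(csv_file)

    # The first line always contains the headers; use them to build a
    # reusable format string like '<row name="{0}" age="{1}"/>'.
    headers = next(reader)
    attributes = ' '.join('{}="{{{}}}"'.format(header, index)
                          for index, header in enumerate(headers))
    row_format = '  <row {}/>\n'.format(attributes)

    xml_file.write('<rows>\n')
    for line in reader:
        # The * dereferences the list of values into separate arguments.
        xml_file.write(row_format.format(*line))
    xml_file.write('</rows>\n')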

(Having spent the time to set up the row format in Python, I thought I should go back and use the same approach in C#, complete with using Join(), for a more apples-to-apples comparison.)

The mere fact that I did all of this on Windows felt slightly sacrilegious, so I decided to go back and conduct the same exercise, this time on Linux — Ubuntu 13.04 to be exact.

Ubuntu ships with Python installed, so technically there were even fewer steps to get started. But, it ships with v2.7.4, and the script I wrote on Windows apparently uses language features that didn’t exist until v3.x. So, I grabbed Python 3.3.2 for Linux from http://www.python.org/getit/, and followed these excellent instructions so that I could have both v2.7.4 and v3.3.2 installed simultaneously. Once installed, the script I wrote on Windows ran equally well on Linux.

It was clear during this exercise that I merely scratched the surface with Python. It appears to have quite an extensive API, it contains many of the same constructs that I’m used to in C#, and I will not hesitate to use it for all of my future scripting needs.


A date with JSON

I don’t work with JSON every day. In fact, I hadn’t used it at all until the beginning of this year, when I made REST calls to Twitter and retrieved gobs of tweets as JSON.

I’m now working on a project that contains collections of immutable C# objects, and those objects need to make their way to ActionScript. Given that ActionScript is based on ECMAScript, it seems appropriate to serialize these objects as JSON so that ActionScript might easily consume them.

During my Twitter tinkering, I was using an older version of the .NET runtime, and I had no other choice but to rely on third party libraries for JSON support, lest I roll my own. This time, I have the latest and greatest at my fingertips, and I decided to take it for a test drive.

JavaScriptSerializer started off well, for the most part. I could easily serialize any object with a single line of code:

var myObj = new MyObject(...);
var jsonText = new JavaScriptSerializer().Serialize(myObj);

It’s simple, and perfectly innocent. Deserializing, however, proved to be slightly more difficult.

var deserializedObj = new JavaScriptSerializer().Deserialize<MyObject>(jsonText);

This would have worked, but the Deserialize() method depends on the existence of a default constructor, and invokes each property setter individually. That’s fine if you’re working with mutable objects, but for concurrency reasons, I insisted that my objects be immutable.

The overload of Deserialize() produced the same results.

var deserializedObj = new JavaScriptSerializer().Deserialize(jsonText, typeof(MyObject));

There was one last method on the JavaScriptSerializer class that had some potential: DeserializeObject().

var objGraph = new JavaScriptSerializer().DeserializeObject(jsonText);

DeserializeObject() returned a dictionary of objects keyed by string. I added a constructor to MyObject specifically to consume it. This worked, but I wasn’t pleased with having to add a separate constructor, and I wondered what I might do if types didn’t match up properly.

I continued to capture my assumptions as unit tests, and everything seemed to be working decently … until I hit a DateTime object. I would serialize a DateTime, and it would deserialize as a DateTime four hours ahead. Something was clearly awry.

A quick Google search landed me at Scott Hanselman’s post from earlier this year in which he exposed JSON’s poor support for dates, and pointed out that Json.NET does a much, much better job.

I’ve used Json.NET in the past, and so with confidence, I fired up NuGet, downloaded Json.NET, and within the span of about five minutes, was able to produce this:

var jsonText = JsonConvert.SerializeObject(myObj);
var deserializedObj = JsonConvert.DeserializeObject<MyObject>(jsonText);

And, voila! Notice how I’m providing the type? Json.NET is intelligent enough to invoke my constructor with the proper values rather than just relying on a default constructor and invoking each property setter.

The conclusion? I spent the better part of a day trying to work around the shortcomings of JavaScriptSerializer, and Json.NET solved all of my problems in minutes. Microsoft, take note!


Paranoid Android

Two years ago I finally convinced my wife that we should trade in our lame feature phones for a couple of nice, capable smartphones. I had been making the case for months that we should have our calendars digital and synced with each other. This came to a head when we double-booked a weekend back in the summer of 2010 in which we were supposed to be a.) on vacation in Cape May, N.J., for an extended weekend, and b.) attending our close friends’ 25th wedding anniversary party. The result? We drove to the shore, and we attended the party. Attending the party required us to give up an entire day of our vacation; missing the party was not an option. I had the vacation on my calendar, she had the party on her calendar. I rested my case.

So began the quest for finding the best smartphone available. I had whittled the options down to the iPhone 4 and the Droid Incredible. We were Verizon Wireless customers at the time, and the iPhone was still an AT&T exclusive. I was hoping that the Incredible would do the job so we could stay on Verizon’s network, but watching my wife interact with the UI was painful. My wife is a very smart person, but she found the Incredible’s UI navigation to be, well, not so incredible. When she picked up the iPhone, however, she quickly was able to find her way around. It was a done deal. We were getting iPhones. In fact, she loved the device so much that she recommended it to our friends, and many of them now have iPhones. Quite a stark contrast from her protest just a few months prior!

That was two years ago. Now, our two-year contract is up, and we’ve found AT&T’s signal to be atrocious in many of the areas we travel. It’s particularly bad at our home, so I installed a 3G MicroCell device that basically turned our iPhones into VoIP phones. (AT&T charged me $150 for the device, when in fact they should have paid me! But that’s a story for another day.) It’s really great, when it works. Sometimes it doesn’t work, and that causes me a lot of grief. We are unquestionably switching back to Verizon.

I could have gotten an Android device on AT&T’s network for myself, but I hesitated, because:

At this point, though, I’m ready to ditch Apple. If my wife decides to get a new iPhone (and she can now that the iPhone is no longer exclusive to AT&T), she knows the UI inside and out, and doesn’t need my help. And that’s exactly why I don’t want a new iPhone. Sure, they still make an incredible piece of hardware, but iOS is getting pretty tired. People obviously are eating it up, but I keep wanting more from my phone and Apple just isn’t doing it for me these days. Meanwhile, Android has made huge strides in the last two years, and that’s what I like to see — progress! Android still has challenges with OS fragmentation and carrier rollout of updates, but Apple withholds many new iOS features from older devices. For example, when iOS 5 came out, my iPhone 4 didn’t get Siri. Sure, many people will tell you that Siri sucks, but I never got the chance to judge for myself. Fragmentation, anyone? Oh, and don’t get me started on those stupid new connectors that require a $30 adapter to use older cables, or the big maps failure.

What I want more than anything is a Nexus phone on Verizon’s network, if for no other reason than to minimize OS fragmentation and reduce carrier influence over the device. But, based on the rumors I’ve been reading online, the upcoming Nexus phone might not make it onto Verizon’s network and/or not have MicroSD support. If any of this materializes, I will likely end up with the Droid Razr HD (unless someone can talk me out of it). It’s also tempting to grab the Samsung Galaxy Nexus on the cheap, but there is no guarantee that it will continue to be a candidate for future Android releases.

I have another week or two to mull over the options. Hopefully some fresh details will emerge about the upcoming Nexus phone. After the new phones are purchased, I will post a review.


Hello, GitHub

I’m a little behind the times. I’ve been committing my code to an SVN repository for the last five years, and the interwebs are all about Git these days. I decided to take it for a test drive. But with what?

Almost a year ago, I started working on a data structures library in C# — not a useful library, just something I whipped up to essentially prove that I could still write a linked list and perform operations on it (such as reversing and sorting). It’s hardly something to brag about, but it’s a perfect test candidate for Git.

I went looking for my project, and boy, does my ignorance know no bounds. When I authored this library, I thought it would be best to test it out using a console app, spitting out values to the console, and inspecting the output by hand. Obviously, I should have just expressed that as a series of unit tests, but I probably felt that doing so would slow my progress with the library. Stupid. So, I converted the console code to a series of unit tests in MSTest. (I’ve only ever used NUnit, but hey, this is an adventure, right?) And besides, I was going to be committing this code to a public repository on GitHub, and I wanted to spare myself from embarrassment.

From there, I created an account with GitHub, and set up my first repository. Next, I installed GitHub for Windows to clone the repository on my local machine.

Everything was in place. But, to make things a little easier, I installed the Git Source Control Provider plugin for Visual Studio, which also required the installation of straight-up Git. (By the way, GitHub’s instructions for setting up Git are very good.) Using this plugin was certainly a welcome change from managing my source outside the IDE.

Via the Git plugin, I added all of the necessary files to the repository from within Visual Studio. Perfect. At this point, I had my data structures solution in Visual Studio 2012, complete with unit tests, and code fully committed to the local Git repository. The next logical step was to click “sync” in GitHub for Windows to sync all of my local changes with GitHub. Right? I did that, and received this gem:

GitHub for Windows: Unstaged Changes Error

OK, so how do I “stage” my changes? Sure, I could open the shell, but do what, exactly? Well, GitHub for Windows seemed to offer no solution, so the shell quickly became my only option. I fired up Git Bash (part of Git), and typed git commit to see what would happen. The response was that there was nothing to commit. Ah, so git commit is only local. I typed git help to see what other options were available, and git push looked like it had some potential. I tried that out, was prompted for my GitHub credentials, and bam — all of my local commits were pushed up to GitHub.

I’ve only gotten started. I have more code to add to the project, and I will try adding it from multiple machines to get a feel for the experience of team collaboration with Git/Github. Look for those details in a future post.

Update: GitHub for Windows JUST made a liar out of me. I repeated all of these steps on another machine, made a small change, synced from GitHub for Windows, and pushed the changes to GitHub without any problems. Go figure!

Update 2: It wasn’t GitHub’s fault; it was mine. It helps when you point Visual Studio’s Git plugin to Git running on Windows.

Incorrect Path to Git for Windows

It helps when you provide a valid path to Git for Windows.


A Strange Loop in St. Louis

I traveled to St. Louis last week to attend my first Strange Loop conference. (It is a strange name. The FAQ offers an explanation, but because I get the feeling that most of its attendees would like to “strangle OOP,” I’m a little suspicious.)

ANYway, Strange Loop is a fairly new conference (this is its fourth year) aimed at attracting developers from various disciplines. Talks cover a variety of areas, including databases (both big and small), emerging languages, web and mobile development, etc. I first learned about the conference less than a year ago when I watched Rich Hickey’s keynote at last year’s conference. I was hooked.

From talking to other attendees, I gathered that the overall theme changes from year to year, and this year there seemed to be lots of discussion about databases — in particular their methods of persistence, and the relevance of transactions and/or ACID. Opinions varied widely, and it was both amusing and informative to watch it play out.

Strange Loop is About to Begin

Strange Loop is about to begin.

This year’s venue was the Peabody Opera House, which is gorgeous. There was ample room for attendees, the facility was clean and well kept, the acoustics were great, the WiFi was excellent, and the coffee was plentiful. The organizers of this event did a tremendous job putting all of this together.

Day 0: Pre-Conference and Pre-Party

I did not attend the Emerging Language Camp pre-sessions, but I heard they were terrific. I did, however, attend the pre-party at the Schlafly Tap Room, enjoyed an Oktoberfest and listened to Teddy Presberg and the Restoration Organ Trio tear it up for about an hour and a half. Those guys were excellent. It was the perfect way to end the day, and to start off the conference.

Day 1: Databases, Bootstraps, Lies and Oh! The Arch

Michael Stonebraker keynoted the event with his talk titled “In-Memory Databases – The Future is Now!” Michael talked about what he calls “NewSQL” (that is neither traditional SQL nor NoSQL) and his implementation of it. I’m a little suspicious because the whole thing is single-threaded, and in my mind, that processor had better be bangin’ to handle all of that work. To be fair, though, multi-threaded databases incur a ton of overhead dealing with concurrency, and by making the database itself single-threaded, a tremendous amount of complexity can be eliminated.

I remember not long ago when Twitter Bootstrap came out, and I made a mental note to check it out. Well, Howard Lewis Ship beat me to it and gave a very nice overview, complete with working examples. I’m not an active web developer, and am not up to date with all the latest frameworks (and there are many — it makes my head hurt, to be honest), but the promise of a cross-platform, consistent look and feel coupled with ease of use makes Bootstrap quite compelling.

Stuart Sierra gave a talk titled “Functional Design Patterns,” in which he identified a series of coding patterns much like the Gang of Four’s, but specific to functional programming. One pattern he introduced was the state/event pattern, which, from what I can tell, is the same thing as event sourcing. He introduced a number of other patterns that I probably would have appreciated more if I were more proficient in a functional language. Apparently, many of these patterns are “monadic” in nature.

(Everyone laughs here when someone says “monads.” Either everyone has a 10-year-old’s sense of humor because it rhymes with “gonads,” or there’s some kind of inside joke that I don’t know about.)

Gary Bernhardt blew our minds with his talk titled “A Whole New World,” wherein he developed a new console, complete with interactive visuals. Impressive, no? Indeed … until we realized that he built the whole thing in Keynote. Lame! His point wasn’t lost on me though. We should not settle for old, dusty tools. Sure, the console is tied to the kernel, but so what?! We should want better tools, and we should build them … just not in Keynote.

The Gateway Arch

I snuck out of the conference long enough to check out the Gateway Arch. It’s even more majestic in person!

Amanda Laucher and Paul Snively duked it out in their “Types vs. Tests: An Epic Battle” talk. Epic is kind of a strong word, but the talk was interesting. They worked on some code katas independently, and neither seemed to agree on the initial approach. E.g., do I write out all of my unit tests first, or define all of my types? They did seem to agree, however, that both types and tests have their places. As a C# developer, I’m quite comfortable and proficient with types. But, I’m really dragging my feet with TDD, and I was glad to hear that I’m not the only one who dreads the thought of writing tons of unit tests. Say what you want, but it’s downright dreadful.

Rich Hickey challenged my thinking with respect to databases with his talk, titled “The Database as a Value.” Relational databases historically have been a necessity in order to minimize storage space. Today, however, storage is cheap and ubiquitous, allowing developers to consider data persistence alternatives. Rich made the argument that developers commit code to version control systems without regard for space consumption, and such systems keep each revision forever. No relational database offers that, but we should want it. In fact, we should demand it! (Permalinks, anyone?) Not only is Rich demanding it, he’s building it.

Day 2: Neuroscience, Philosophy, Abstractions and Expressions

For the keynote, Jeff Hawkins gave us an education on the neocortex of the brain and how Numenta is using this information to build learning systems. The neocortex is a predictive modeling/memory system, not a computing system, and Numenta’s system(s) are built similarly. Artificial intelligence has a bright future, and Jeff envisions a future in which it’s used for continued information discovery, particularly in areas unsuitable for humans. (Think deep space travel.)

Matt Butcher proposed that all of computer science can be traced back to Plato, much like A. N. Whitehead suggested that all philosophy is a footnote to Plato. Matt theorized that Plato’s focus on being would likely place him in the OOP camp, and that Aristotle would be a champion of functional programming as he was far more interested in the becoming, the perpetual change of state. And, to wit, if you attempt to combine both philosophies, you end up with Scala. (Laughter ensued, but I’ve never written a line of Scala, so the joke was lost on me. I bet it was really funny though.)

I went on to attend three other talks, all of which were interesting:

In summary, the conference accomplished exactly what it set out to do. It attracted really smart people from various disciplines, covered a variety of topics, and blew people’s minds. Quality was high, all around. In fact, even the tunes were great! I can’t think of a single thing to really complain about, and that speaks volumes because complaining is something I’m incredibly skilled at. (OK, the hotel WiFi SUCKED, but that’s about it.)

For those of you who did not attend, all of the talks have been recorded and will eventually show up on InfoQ. Also, talk contents (slides, code, etc.) can be found here.


A Splash of Regular Expressions

Several times each year I find myself in need of regular expressions. And each time, I crack my knuckles, and start googling for a quickstart guide.

Today was one such day. A colleague sent me a comma-delimited text file of values, and I needed to weed out some duplicates, and reformat the remainder as XML. I opened the text file in Notepad++ and went to work.

A quick replace of "," with ",\n" yielded about 6,500 lines. A manual edit was going to take way too long. Regular expressions were going to be my friend.

First, to weed out the dupes. Fortunately, each dupe contained an underscore followed by a unique integer. Inserting "_\d+" into my existing expression matched up with all of the dupes, and replacing each match with nothing shortened my file to about 1,300 lines. Super.

\d{3}[+-np]{1}\d{5}_\d+,\W

Now all I needed to do was drop the comma, and surround the remaining values with <ID> and </ID> tags. Piece of cake, except I forgot to insert parentheses into the expression, and this caused me grief when trying to use "\1" in the “Replace with” box. Note to self: next time, don’t forget about the parentheses!

Find what: (\d{3}[+-np]{1}\d{5})(,)

Replace with: <ID>\1</ID>

The comma maps to "\2", and I didn’t use that in my replacement expression, so the comma simply disappeared.
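For the curious, the same three steps translate into a quick Python sketch (the file names and the sample data layout are made up, and I’ve moved the hyphen to the front of the character class so it isn’t mistaken for a range):

import re

with open('values.txt') as f:
    text = f.read()

# Step 1: put each comma-delimited value on its own line.
text = text.replace(',', ',\n')

# Step 2: weed out the dupes, which carry an _N suffix.
text = re.sub(r'\d{3}[-+np]\d{5}_\d+,\W', '', text)

# Step 3: wrap each remaining value in <ID> tags, dropping the comma.
text = re.sub(r'(\d{3}[-+np]\d{5})(,)', r'<ID>\1</ID>', text)

with open('values.xml', 'w') as f:
    f.write(text)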

The rest of the XML formatting was simple: adding the header and footer tags.

Sources that helped me today: