Software Boundaries

I’ve given a lot of thought to the notion of boundaries in software lately, ever since I watched this excellent talk titled “Boundaries” by Gary Bernhardt at Ruby Conf 12 a few months back. It spurred in me a new appreciation for boundaries in software, particularly as they relate to design and testability, imperative versus functional approaches, and so on.

Largely for amusement, but also as an act of gentle self-reinforcement, I reread Erik Dietrich’s colorful blog post titled “Visualization Mnemonics for Software Principles,” which gives a great overview of the SOLID principles and the Law of Demeter. It struck me that there is a common thread across all of these principles: boundaries.

Each principle more or less establishes a boundary, and instructs how it should be respected.

First, there’s the Law of Demeter. Roughly, it says to talk only to your immediate collaborators: you shouldn’t hand over more information to a method or component than is necessary for that method or component to function. The invocation of that method or component is the boundary, and at that boundary, provide only what is necessary.
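
As a tiny illustration, here’s the difference between reaching across that boundary and respecting it (the Car/Engine names and the fuel threshold are invented for the example):

```python
class Engine:
    def __init__(self, fuel_level):
        self.fuel_level = fuel_level

class Car:
    def __init__(self, engine):
        self.engine = engine

# Disrespects the boundary: the function digs through Car to find what it wants.
def needs_fuel_bad(car):
    return car.engine.fuel_level < 5

# Respects the boundary: only the value the function actually needs crosses it.
def needs_fuel(fuel_level):
    return fuel_level < 5
```

The second version is also trivially testable: no Car, no Engine, just a number.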

This continues with the SOLID principles.

The Single Responsibility Principle, or SRP, is pretty self-explanatory. A component/class/function/whatever should only do one thing. This essentially promotes composability, where you can assemble a larger thing that does many things from smaller, singular pieces. When SRP is violated, the boundaries between responsibilities are blurred.
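
A toy sketch of that composability, in Python (the config format and function names are made up for illustration):

```python
def strip_line(line):
    # One job: normalize whitespace on a single line.
    return line.strip()

def is_setting(line):
    # One job: decide whether a line holds a setting.
    return bool(line) and not line.startswith("#")

def parse_setting(line):
    # One job: split one "key = value" line into a pair.
    key, value = line.split("=", 1)
    return key.strip(), value.strip()

def parse_config(text):
    # Composition: the larger behavior is assembled from the singular pieces.
    lines = (strip_line(l) for l in text.splitlines())
    return dict(parse_setting(l) for l in lines if is_setting(l))
```

Each piece can be tested, replaced, or reused on its own.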

The Open/Closed Principle states that components should be open for extension, closed for modification. It’s describing a boundary. Here’s this component that may or may not have multiple responsibilities, but you aren’t permitted to meddle with those responsibilities directly. Instead, a specific interface — a boundary — is provided that allows you to alter the overall behavior by extension, preserving the default behaviors.
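
A minimal sketch of that boundary in Python (the report classes here are hypothetical):

```python
class Report:
    """Closed for modification: render's flow stays fixed.
    Open for extension: format_row is the interface you may override."""

    def format_row(self, row):
        return ", ".join(str(v) for v in row)

    def render(self, rows):
        return "\n".join(self.format_row(r) for r in rows)

class PipeReport(Report):
    # Alters the overall behavior by extension, preserving the defaults.
    def format_row(self, row):
        return " | ".join(str(v) for v in row)
```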

The Liskov Substitution Principle, which is pretty specific to OOP, says that all derived types should be able to act as stand-ins for their ancestors. When this principle is violated, you end up with a derived object that only appears to be like all the others. It’s a boundary-within-a-boundary, wherein the imposter derivatives disrespect the boundary their ancestors have established.
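
The classic textbook illustration of such an imposter, sketched in Python (Bird and Penguin are the stock example, not anything from the post):

```python
class Bird:
    def fly(self):
        return "airborne"

class Penguin(Bird):
    # An imposter derivative: it looks like a Bird but breaks the contract.
    def fly(self):
        raise NotImplementedError("penguins don't fly")

def launch(bird):
    # Written against Bird's promise; a Penguin can't stand in here.
    return bird.fly()
```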

The Interface Segregation Principle favors smaller, more digestible interfaces instead of larger, heavier ones. In a way, it’s just applying SRP to interfaces. It also has a Law-of-Demeter feel to it, given that smaller interfaces require less overall definition to be satisfied. This principle is reinforcing boundaries between responsibilities.
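
A quick sketch of segregated interfaces in Python, using abstract base classes (the printer/scanner device is the usual illustration, hypothetical here):

```python
from abc import ABC, abstractmethod

# Two small interfaces instead of one fat "MachineInterface".
class Printer(ABC):
    @abstractmethod
    def print_doc(self, doc): ...

class Scanner(ABC):
    @abstractmethod
    def scan(self): ...

class SimplePrinter(Printer):
    # Satisfies only the small interface it actually needs.
    def print_doc(self, doc):
        return "printed: " + doc

class Copier(Printer, Scanner):
    # A heavier device composes the small interfaces.
    def print_doc(self, doc):
        return "printed: " + doc

    def scan(self):
        return "scanned"
```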

The Dependency Inversion Principle, which is practically at odds with encapsulation, calls for components to code against abstractions rather than the concrete. It forces a boundary where perhaps there previously was none. Instead of a component taking responsibility for instantiating dependencies, there’s a boundary where, abstractly, that dependency can be supplied, or “injected.”
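
A minimal sketch of that injected boundary in Python (the mailbox/store names are invented for the example):

```python
from abc import ABC, abstractmethod

class MessageStore(ABC):
    # The abstraction: a boundary where previously there was none.
    @abstractmethod
    def save(self, message): ...

class InMemoryStore(MessageStore):
    def __init__(self):
        self.messages = []

    def save(self, message):
        self.messages.append(message)

class Mailbox:
    def __init__(self, store):
        # The dependency is supplied ("injected"), not instantiated here.
        self.store = store

    def receive(self, message):
        self.store.save(message)
```

Because Mailbox codes against MessageStore rather than a concrete database, a test can hand it the in-memory stand-in.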

Another boundary-oriented principle that is familiar to many, but is not a member of the SOLID elite, is Don’t Repeat Yourself, or DRY. It is, I think, oft misunderstood, as it is applied literally by squashing code duplication. But, it can and should be applied more generally to concepts. By consolidating a concept into a single place, be it a component or function, you’re establishing a firm boundary around it. When a concept is scattered about, the boundary is once again blurred.

These widely accepted principles are hardly orthogonal; they are bound by boundary.

“What” is Agile?

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

That’s the manifesto. An agile organization recognizes and values these basic principles.

The use of the word over is deliberate. It doesn’t say instead of or not. The stuff on the right is important, but the stuff on the left is more important. The recognition of this preference is the quintessential definition of agile.

What’s also intentional is the absence of instruction and ceremony. Nowhere in the manifesto, for example, does it say that in order to produce working software, you must have a daily scrum. Scrum is merely an inconsequential process. A person doesn’t become a master carpenter by simply clutching a hammer, and neither is a team agile if it adopts scrum (or any other software development process or tool).

Put differently, the manifesto doesn’t pontificate on the how — there’s nothing imperative about agile. There’s a clear separation of concerns: the manifesto declares the what, and the team implements the how.

When the how satisfies the what, the team is an agile one. The manifesto is not a list of instructions; it’s the acceptance criteria.

A quick blurb about automated/unit testing

I know many of us place value on unit tests. But, like most things in life, benefits bring costs along for the ride. It’s a routine exercise in prudence.

I think unit tests can be valuable. I also think they can be very expensive. And it’s very easy for the costs to surpass the benefits if we’re not careful.

I’m especially fond of Gerard Meszaros’ perspective on automated testing. In his presentation, he makes the argument that tests deserve the same level of craftsmanship and care as our production code, and provides some excellent techniques to distill these tests such that they lower the costs associated with writing and maintaining them.

The Unit Skeptic

I’ve been spending a lot of time lately thinking about unit tests. There’s a divide between developers who champion their use and those who are skeptical. I fall firmly into the former camp, but having reflected on my own experience, I think I understand why the other camp is as large as I perceive it to be.

A portion of the skeptics have actually tried it, but had a poor experience. The tests they wrote broke often. The tests also were long and complex, and when they broke, they took a long time to repair (or were simply deleted). This experience is familiar, and mine was no exception.

As soon as I thought I understood the value proposition, I immediately started to write tests against an existing codebase because I wanted what the tests promised. (First mistake.) I installed NUnit, created some test fixtures, compiled … I was off to the races! From there, the intuitive thing to do was to mimic the path of execution I expected a given class to take. (Second mistake.) Does that sound like an integration or service-level test? You bet, but everything compiled and all of the tests passed, so I had no reason to believe I was doing anything wrong.

I understood that there was to be an element of isolation to these tests, and the classes I was testing had multiple dependencies and cross-cutting statics. TypeMock to the rescue! (Third mistake.) Writing mocks sucked the life out of me. Writing mocks is, I just, ugh. Graphing out the order of each method call and hardcoding each return value on an external module is one of the least exciting things one can do with a computer.

A couple of weeks later, I had a small suite of passing “unit” tests! It felt good. Until they started to break.

As soon as I refactored code, tests broke. I could understand if it was a class that I had written tests for, but the mocks? Changes to the object I was mocking also broke the mocks — go figure! Ugh, and they were such a pain to write. This wasn’t supposed to happen. Grokking the broken tests (and their mocks) and then fixing them was no fun. Tests were supposed to encourage refactoring, not discourage it. Test maintenance was becoming expensive while providing no tangible benefit. As more tests broke, more tests were deleted.

I took a break from unit tests. But you couldn’t read a blog or listen to a podcast without someone plugging test-driven development (TDD) and all of its greatness. Despite my skepticism, it was hard to ignore that there was something different about my experience versus TDD: when the tests are written.

With TDD, you write tests while you’re writing your code. (Purists would have you writing the tests first, but I’m still not there yet. Baby steps.) Determined not to miss out on something great, I gave it another go, but this time with a feature that hadn’t yet been written. I established a cadence of writing a small piece of code (e.g., a method) and then writing tests for it. I discovered pretty quickly that my tests were influencing my architecture in positive ways. For example, I started parameterizing context. If my production method needed to know the current time, I’d pass it in as a parameter versus making a hardcoded call to DateTime.Now. Then, my unit test could provide a fixed context. Before I knew it, I was using full-blown dependency injection to parameterize context. Not only was this better architecture, but the tests were so much easier to write. (And, no mocks ftw!)
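
That time-parameterizing idea is easy to sketch. Here’s the same pattern in Python rather than C# (the function name and the 30-minute window are made up for illustration):

```python
from datetime import datetime, timedelta

def is_session_expired(started_at, now):
    # 'now' is passed in rather than read from a hardcoded datetime.now()
    # call, so a test can pin the clock wherever it likes.
    return now - started_at > timedelta(minutes=30)
```

A unit test then supplies a fixed context, e.g. `is_session_expired(datetime(2013, 5, 1, 12, 0), datetime(2013, 5, 1, 13, 0))`, with no mocking required.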

This second pass at unit testing was a far more rewarding experience. In addition to tests taking less time to write and having a positive architectural influence, they were also less brittle. Since this experience, I’ve unit tested nearly every new chunk of code I’ve written, with zero regret.

That’s all well and good for new stuff, but how about that legacy code — the stuff that many newcomers are tempted to write tests for? My first attempt at this was an epic failure. I haven’t given up, but clearly it is something that takes more work than just authoring a few test fixtures. And I won’t attempt it again until after I’ve read Working Effectively with Legacy Code, because I understand that Michael Feathers imparts lots of wisdom in this space. (It’s on my short list; I’m hoping to absorb it in the coming weeks.)

Having gone through the exercise of journaling my past experience with unit testing, I suppose I can understand some of the skepticism among those who’ve tried it. But, when the movement first began, there was a lot less guidance available. It’s also important to understand that a unit testing framework is just like any other tool in that it can be used in any number of ways, including ways in which it was never intended.

Finding the Shortest Path

I’ve known about Dijkstra’s algorithm for a long time, but never took the time to review it and then try to implement it on my own to prove whether I really understood the concept. Until today. I stumbled upon Eoin Bailey’s explanation of Dijkstra’s algorithm, and found it to be quite helpful.

As I was reviewing the algorithm, it struck me that I could probably use a min heap in order to keep track of which node to visit next. Fortunately, a few months ago I wrote a series of C# extension methods to “heapify” a list in exactly the same way that heapq does for Python. It was incomplete (and still is), but enough of the methods were in place that I could make use of it.

I ran into a few bugs, particularly when a longer path was calculated. It turns out the incomplete min heap had a few bugs in it. Once those were ironed out, the algorithm implementation seemed to work flawlessly.

My Dijkstra’s algorithm implementation is contained in my slowly growing DataStructures project on GitHub, if you’re interested in taking a peek.
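
My implementation is in C#, but the shape of the algorithm is easy to sketch in Python using the standard library’s heapq in place of my homegrown heap (the adjacency-list graph representation here is an assumption):

```python
import heapq

def dijkstra(graph, source):
    """graph maps each node to a list of (neighbor, weight) pairs;
    returns the shortest distance from source to every reachable node."""
    dist = {source: 0}
    visited = set()
    heap = [(0, source)]  # the min heap tracks which node to visit next
    while heap:
        d, node = heapq.heappop(heap)  # always the nearest unvisited node
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist
```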

GUID for JavaScript

This morning, I needed to be able to generate a GUID in JavaScript. Like any developer, I hit up the interwebs for some help, and landed here. (Gotta love Stack Overflow.)

Anyway, I ended up going with this implementation, submitted by broofa.

function newGuid(){
    return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
        var r = Math.random()*16|0, v = c == 'x' ? r : (r&0x3|0x8);
        return v.toString(16);
    });
}
It’s exactly what I was looking for, all nice and tidy. Happy coding!

Transforming and Validating XML with Python and lxml

XML isn’t nearly as sexy as JSON these days, but it’s still out there in the wild. And it is powerful. For example, it’s pretty awesome that you can assemble an XSL transform to parse XML and turn it into newly formatted XML. It’s also pretty awesome that you can verify XML against a schema to ensure the XML meets all requirements (say, for example, that an ID be unique across all instances) — that the XML is “valid.”

If you are a front-end developer, chances are that you make a series of HTTP requests and receive data — it’s a pretty common thing. For the purposes of this post, we’ll assume that data is XML. But, there’s a problem: the XML is not using the tags you need for your application. So, you apply an XSL transform. Your application makes many assumptions about the format of this massaged data, so you employ a schema or XSD to validate each assumption.

There’s also a pretty good chance that the folks maintaining these services want to tinker. So it would be immensely helpful to be able to quickly test out each URL to be confident that changes made to services won’t negatively affect your application. It would be wise to structure these as actual unit tests, but that is beyond the scope of my focus here.

Commence the tool-making! This seemed like a perfect candidate for Python, so I hopped to it. After some googling, I quickly got the impression that lxml was the perfect library for the job, able to handle both XML transforms and XSD validation. It couldn’t have been easier to work with.

I whipped up a Python script to read URLs from a designated text file, iterate over each one, hit the URL, transform the XML, validate the XML, and write any validation errors to a log file. Pretty straightforward: I can now validate all of my URLs at a moment’s notice and have a full report generated in seconds.

Below is my script:
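
(The original listing didn’t survive in this copy of the post, so what follows is a minimal sketch of such a script with lxml; the file names urls.txt, transform.xsl, schema.xsd, and validation_errors.log are assumptions, not the originals.)

```python
import logging
from urllib.request import urlopen

from lxml import etree

def transform_and_validate(xml_bytes, transform, schema):
    """Apply the XSLT to the raw XML, then validate the result against
    the schema; returns the transformed tree and a list of error strings."""
    doc = etree.fromstring(xml_bytes)
    result = transform(doc)
    if schema.validate(result):
        return result, []
    return result, [str(err) for err in schema.error_log]

def main():
    transform = etree.XSLT(etree.parse("transform.xsl"))
    schema = etree.XMLSchema(etree.parse("schema.xsd"))
    logging.basicConfig(filename="validation_errors.log", level=logging.ERROR)

    with open("urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    for url in urls:
        _, errors = transform_and_validate(urlopen(url).read(), transform, schema)
        for error in errors:
            logging.error("%s: %s", url, error)

# main()  # point the file names above at real files before running
```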