Friday, December 13, 2019

Good tests tell a story. Good stories don't follow a template.

Each word I write drops a little more of me onto the page. In time, I will be the book, the book will be me, and the story will be told. - Very Short Story

Kent Beck and Kelly Sutton have been sharing a series of very helpful byte-sized videos about the desirable properties of tests. The one on readability suggests that tests should read like a story. In it, they explain with an example and a movie analogy how a test should be a self-contained story. I'd like to push that analogy of a test telling a story and the idea of self-containment further.

So, first, here's the test code as shown in the video:

RSpec.describe MinimumWage::HourlyWageCalculator do
  describe '.calculate' do
    subject { described_class.calculate(hours, rate, zip_code) }
    
    let(:hours) { 40 }
    let(:rate) { 8 }
    let(:zip_code) { nil }
    
    it "calculates the total hourly wages" do
      expect(subject).to eq(320)
    end
    
    context "zip_code is in San Francisco" do
      let(:zip_code) { 94103 }
      
      context "rate is below $15 per hour" do
        let(:rate) { 14 }
        
        it "uses $15 as the rate" do
          expect(subject).to eq(600)
        end
      end

      context "rate is above $15 per hour" do
        let(:rate) { 16 }
        
        it "uses the given rate" do
          expect(subject).to eq(640)
        end
      end
    end
  end
end

Apart from the reasons discussed in the video, I think the test fails to tell a good story for a number of reasons, mostly to do with what appears to be conventional RSpec usage.

First off, Kelly begins his explanation of the story with: "For this class, for this method, here's what should happen... [...] reading this aloud in my head, I'm, like, 'For the calculate method, it calculates the total hourly wages.'"

Now, I don't think talking about methods makes for a great story. Look at the output from running the tests for a moment:

MinimumWage::HourlyWageCalculator
  .calculate
    calculates the total hourly wages
  ...

I've explained elsewhere why that's not a great story, and how you could tell a better one by emphasizing behavior over structure.

Next, when Kelly moves the magic number 40 (hours/week) to a method, Kent claims that the test is less readable now than it was before. He explains that the relationship between hours and rate and the expected value is really clear if we can just see the 40 "right there". I think that, using a stricter definition of "right there", we can improve the clarity of our tests even further. A DRYing process like the one Kent talks about (extracting some setup code, moving the setup to a factory, ...) is just one way we may end up moving related elements from "right there" to further apart. However, the boilerplate RSpec style I see in most code bases and code examples is a much more common cause of unnecessary distance between related elements relevant to a test.

Kent describes how even a 2-line test where you've gone too far with DRYing becomes completely incomprehensible unless you've read the other [n] lines of code outside the test. Let's look at how that's true for tests in this example even without the DRYing, just due to the RSpec DSL boilerplate.

it "calculates the total hourly wages" do
  expect(subject).to eq(320)
end

To understand that, you'd first have to know this:
subject { described_class.calculate(hours, rate, zip_code) }
But to understand that, you'd have to know this:
RSpec.describe MinimumWage::HourlyWageCalculator do
and this:
let(:hours) { 40 }
and this:
let(:rate) { 8 }
and this:
let(:zip_code) { nil }
Consider this DSL-light version of the same tests instead:

RSpec.describe MinimumWage::HourlyWageCalculator do
  def weekly_wage(rate:, zip_code: nil, work_hours: 40)
    MinimumWage::HourlyWageCalculator.calculate(work_hours, rate, zip_code)
  end

  it "calculates the total weekly wages" do
    expect(weekly_wage(rate: 8)).to eq(320)
  end

  context "zip_code is in San Francisco" do
    let(:zip_code_in_sf) { 94103 }

    it "uses $15 as the rate when rate is below $15 per hour" do
      expect(weekly_wage(rate: 14, zip_code: zip_code_in_sf)).to eq(600)
    end

    it "uses the given rate when rate is above $15 per hour" do
      expect(weekly_wage(rate: 16, zip_code: zip_code_in_sf)).to eq(640)
    end
  end
end

This version eschews the multiple lets in favor of a fixture built using Inline Setup. It hides Irrelevant Information from the tests behind an evocatively named Creation Method. Besides using less space to convey the same information, I claim this alternative scores a lot better on the self-contained story dimension. Pull out any individual test (i.e. just the it block), and you still have the whole story without any reference to any external or surrounding context. For instance, look at this example:

it "uses $15 as the rate when rate is below $15 per hour" do
  expect(weekly_wage(rate: 14, zip_code: zip_code_in_sf)).to eq(600)
end

Looking at it, you can tell, using Kent's own words:
  • Here's the characters (rate: $14, zip code: somewhere in SF, duration: a week)
  • Here's the action (compute the weekly wage)
  • Here're the consequences (the computed wage of 600, proving the rate used was $15)
For comparison, here's the original test stripped down to the bare essentials required to tell the story of just that one case:

RSpec.describe MinimumWage::HourlyWageCalculator do
  describe '.calculate' do
    subject { described_class.calculate(hours, rate, zip_code) }

    let(:hours) { 40 }

    context "zip_code is in San Francisco" do
      let(:zip_code) { 94103 }

      context "rate is below $15 per hour" do
        let(:rate) { 14 }

        it "uses $15 as the rate" do
          expect(subject).to eq(600)
        end
      end
    end
  end
end
That's 15 lines of screen space (discounting blank lines) used up to tell the 3-line story we've just seen above in the DSL-light version.

So, let's not forget that having a DSL doesn't mean having to use a DSL. Use your testing framework's convenience helpers sparingly. Strive to write each test as a self-contained story.

Wednesday, August 29, 2018

Emphasize behavior over structure

Behavior is the mirror in which everyone shows their image. - Johann Wolfgang von Goethe

RSpec is a Behaviour-Driven Development tool for Ruby programmers. Created shortly after JBehave in 2005, its raison d'ĂȘtre has been writing human-readable specifications in the BDD tradition - accessible to not just developers but also business users, analysts, and testers. Over the years, though, that idea of accessibility to an audience beyond just developers seems to have been lost on many. I suspect that's because many view RSpec as little more than a testing tool with a DSL rather than the BDD tool that it's meant to be.

This narrower developer-focused view leads to dubious guidelines like the ones documented on Better Specs. One of the very first ones there is about how to describe your methods.

How to describe your methods

 

The only guideline needed on that topic is: Don't do it. Describe behaviors, not methods.

Instead, you get guidelines like: Be clear about what method you are describing. For instance, use the Ruby documentation convention of . (or ::) when referring to a class method's name and # when referring to an instance method's name.

Below is an example of a supposedly good practice according to that guideline.

# (supposedly) good
describe Lemon do
  describe "#yellow?" do
    context "when ripe" do
      it "returns true"
    end
  end
end

# vs (supposedly) bad
describe Lemon do
  it "is yellow if ripe"
end

When trying to reason about good practice, I find it always instructive to go back to the origins of an idea. Here too, Dan North's article introducing BDD offers great insight. Some quotes:
"[...] when they wrote the method name in the language of the business domain, the generated documents made sense to business users, analysts, and testers."
Notice how, in contrast to the above example, the emphasis is on the language of the business domain (e.g. Lemon is yellow if ripe), rather than the language of code (e.g. Lemon#yellow?, when ripe, returns true) that would make little sense to someone other than a developer.
"This sentence template - The class should do something - means you can only define a test for the current class."
Notice how the reference is to a class, not a method. The class, not its methods, should do something. Why should anyone care about whether Lemon#yellow? returns true rather than whether Lemon is yellow?
"If you find yourself writing a test whose name doesn't fit this template, it suggests the behaviour may belong elsewhere."
Consider this Active Record test (assuming Lemon is an Active Record model).

describe Lemon do
  describe ".ripe" do
    it "returns only ripe lemons"
  end
  describe ".rotten" do
    it "returns only rotten lemons"
  end
end

This test just can't be written in the form 'Lemon should do something' following the suggested template. Because after all, we aren't really describing a Lemon here. What we're describing is the equivalent of a Lemon Repository (or Lemon Basket, perhaps, in the UbiquitousLanguage of the domain) or Data Access Object or whatever floats your goat when it comes to persistence. Active Record sadly conflates a model's class and its persistence mechanism, even though they're two clearly distinct concepts with distinct concerns. If the two concepts weren't conflated, you'd write tests that don't need to describe methods anymore but can describe the behavior/responsibilities of a class, as below:

describe LemonBasket do
  it "should provide only ripe lemons"
  it "should provide only rotten lemons"
end

Another way to think about this is to remember that a class is an object, and class methods are, therefore, really instance methods of the (singleton) class object. So, if you include descriptions of .class_methods and #instance_methods in the description of a class, you're really mixing the descriptions of two different classes (pun intended) of objects. Consider this sample from RSpec's own documentation:

RSpec.describe Widget, :type => :model do
  it "has none to begin with" do
    expect(Widget.count).to eq 0
  end

  it "has one after adding one" do
    Widget.create
    expect(Widget.count).to eq 1
  end
end

That description just doesn't make any sense as soon as you try to read it as sentences that'd appear in RSpec output formatted as documentation.
Widget
 has none to begin with
 has one after adding one
That doesn't make sense because what you're really describing is conceptually something like a WidgetWarehouse.
WidgetWarehouse
 has no widgets to begin with
 has one widget after adding (storing?) one
With Active Record, the Widget class above doubles up as a singleton WidgetWarehouse (WidgetDAO/WidgetRepository/whatever). But instead of letting that incidental implementation detail drive the structure of your tests, you could recognize that they're essentially two distinct concepts/concerns, and describe them in independent tests rather than first mixing them up in the same class test and then trying to distinguish them through syntactic conventions.
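
To make that concrete, here's a minimal sketch of what those examples might look like when the warehouse concern is described on its own. WidgetWarehouse, store and widget_count are hypothetical names invented purely for illustration (nothing in the original example defines them); the point is only that each description now reads as a behavior of a single coherent concept:

RSpec.describe "a widget warehouse" do
  it "has no widgets to begin with" do
    expect(WidgetWarehouse.new.widget_count).to eq 0
  end

  it "has one widget after storing one" do
    warehouse = WidgetWarehouse.new
    warehouse.store(Widget.new)
    expect(warehouse.widget_count).to eq 1
  end
end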

Moreover, method descriptions aren't conducive to true BDD or TDD style in general. The behaviors of an object are known before the tests, but the structure (method names, signatures, etc.) quite often emerges. For instance, the behavior WidgetWarehouse has one widget after adding one is known at the time of writing the test, but the method that'd provide that behavior isn't (.add or .store? - I shouldn't have to make up my mind before writing the first test.).

Using cutesy syntactic conventions to describe classes based on their structure allows, even encourages, you to sidestep such considerations and suppresses the feedback generation mechanism that's at the heart of TDD and BDD. Focusing on and describing behavior forces you to think more deeply about the behavior and where it belongs.

Thursday, April 19, 2018

Dual track development - myths and anti-patterns

People just don't read. It's a miracle you've read this far yourself. - Jeff Patton

But people do look at pictures. (- Jeff Patton, again). And sometimes they misinterpret them.

That's why Jeff Patton felt the need to write "Dual Track Development is not Duel Track".

And because people sometimes misinterpret and then misrepresent pictures and ideas and concepts, someone wrote this mischaracterization of not one but at least three first-class concepts: dual-track development, product discovery and refactoring.

This is my attempt to counter the SemanticDiffusion of those concepts by busting some myths and listing some anti-patterns as mentioned in that article.

What not to think, aka myths


Myth 1: Dual-track scrum is an organizational structure separating the effort to deliver working software from the effort to understand user needs and discover the best solution.

It's not an organizational structure.

It's a way of working - a pattern which lots of people doing rigorous design and validation work in Agile development have arrived at. It may be an (intra-team) work organization mechanism, but treating it as an organizational structure is just the wrong way to do it, because...

It's not about separating the effort to deliver from the effort to understand.

Jeff Patton has used two different visualizations to represent the dual-track model.


Intertwined model
Conjoined model

Neither is perfect, but each one is really trying to convey how Agile thinking and product thinking work together in the same process.

The original paper about dual-track development (when it wasn't called that) says:
"... we work with developers very closely through design and development. Although the dual tracks depicted seem separate, in reality, designers need to communicate every day with developers."
Discovery work happens concurrently and continuously with development work.

It's not (really) about discovering the best solution.

It's primarily about discovering:
  • enough about the problems we're solving to make good decisions about what we should build
  • if we're building the right thing (not the best solution) and if we'll get value in return
  • if people have the problem we're solving and will happily try, use, and adopt our solution
  • risks and risky assumptions in our ideas and solution options
The primary objective of discovery is really validating whether the product is worth building. Once that validation is achieved, the best solution may have been either discovered in the process (as a happy byproduct), or it may be subsequently invented using any number of techniques. Coming up with the best solution may require a lot of tactical design decisions, but these are generally not the risky decisions that need validation through discovery.

Myth 2: Discovery is no more than purposeful iteration on solution options.

Discovery is a whole lot more about validation of problems to solve, opportunities and ideas than it is about iteration on solution options.

Myth 3: Knowledge comes from having thought things through beforehand. The discovery process is where that detailed analysis takes place.

The ability to think things through comes from knowledge, not the other way round. Without knowledge, thinking things through will only generate ideas, options and assumptions (which is all great, too - and you may feed those into discovery).

But no, discovery isn't where detailed analysis takes place - it's where exploration and validation take place. Yes, to actually start building software, you'll still need to make a lot of tactical design decisions. It's often the case that designers need to spend time refining and finishing design work in preparation for writing code. That's why Jeff Patton clarifies, "OK, it's really 3 kinds of work".

Myth 4: Refactoring is natural when you’re pivoting.

Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. Pivoting, on the other hand, is a structured course correction designed to test a new fundamental hypothesis.

One requires preserving external behavior, while the other is hard to imagine without changing external behavior. So, what's natural when you're pivoting is some form of rework - maybe a redesign, maybe a rewrite - but refactoring doesn't seem to be it.

Myth 5: If you're experiencing frequent refactoring, you probably aren't putting enough effort into product discovery. Too much refactoring? You may need to invest in product discovery.

Contrarily, if you're experiencing infrequent refactoring, you probably aren't putting enough effort into product upkeep and evolution. Too much rework? You may need to invest in evolutionary engineering skills.

Myth 6: Those weeks of refactoring are a direct drag on speed.
Development work focuses on predictability and quality. Discovery work focuses on fast learning and validation.

Refactoring is what enables you to maximize learning velocity in discovery and delivery velocity and predictability in, well, delivery. The knowledge that you can come back and safely and systematically improve quality (read: refactor) allows you to avoid wasting time over-investing in quality when you're uncertain about the solution. Reflecting your learning in the code and keeping its theory consistent and structure healthy through MercilessRefactoring brings predictability and speed to development because you aren't constantly thrown off by surprises lurking in the code.

On the other hand, if you're spending weeks refactoring, you're likely not refactoring (but doing some other form of rework) or, ironically, not refactoring enough.

Myth 7: And "we didn't think of that" is still a flaw even if you weren't "supposed to."

If you weren't supposed to think of something, and so you didn't, it's not a flaw - it's just called not being psychic.

What not to do, aka anti-patterns


Anti-pattern 1: Each track is given a general mandate: discover new needs, or deliver working software. Taking the baton first, the Discovery track is responsible for validating hypotheses about solving user problems, and preparing user stories for the Delivery track.

This sounds suspiciously like talk of two teams rather than two tracks within a single team and a single process. If the whole team is responsible for product success, not just getting things built, then the whole team understands and contributes to both kinds of work.

Anti-pattern 2: A sampling of "how it plays out in my current environment":
- The initial solution concepts are built from the needs brought to the team by Product Management. These often come from known functionality needs, but sometimes directly from client conversations.
- Product Manager provides a prioritized list of user needs & feature requests and helps to explain the need for each.
- The Discovery track generates an understanding of requirements, wireframes and architectural approach.
- The Delivery track takes the artifacts generated in the Discovery track and creates the detailed user stories required to deliver on those ideas.
- The handoff between Discovery and Delivery turned out to be a little hairy.

First, if you already have a prioritized list of requirements/known needs (that only sometimes come directly from client conversations) and can already explain the need for each, where's the element of discovery? You either don't need it, or aren't doing it (right).

Secondly, instead of building a shared understanding of user needs and potential solutions by working together as a whole team, if one group of people is generating an understanding of requirements and handing off artifacts to another group of people, don't be surprised if the handoff turns out to be a little hairy.

Anti-pattern 3: [A variety of specialists] whiteboard wireframes and talk through happy path and edge cases to see if a design holds water. When it doesn't, changes are made and the talk-through happens again. Simple features may require only a few iterations; more complex ones may be iterated on for several weeks off and on, woven in with other responsibilities. When the team is confident in the design, [...] members of client facing teams are invited to a walk-through, and select clients are recruited to provide feedback.

The team iterating until it feels 'a design holds water' isn't learning and isn't validation and isn't discovery. Sure, it's progressive refinement of ideas and solutions through deliberate analysis and thinking things through. And those are wonderful things and should be done. However, that's not discovery, so don't call it that.

What to do instead


I've tried to clarify what dual-track development is not, without properly explaining what it is. Others have done a much better job of it than I could at this point, so here're some starting points:
Agile engineering is equally crucial, because without that you'd end up with FlaccidDualTrack (similar to FlaccidScrum - mentally substitute Scrum with dual-track development in that article and it all remains just as true, if not more). For a highly relevant engineering perspective, check out:

Monday, February 8, 2016

Cognitive load testing, anyone?

There are burdens which are bad and blameworthy, and these it is our duty at once to cast away. - James Hamilton

A team-mate recently sought help over email while debugging a piece of JavaScript code that he was trying to get working. He had a working reference implementation in Python, but the JS equivalent didn't seem to be working. Below are the Python and JS snippets he shared, asking if anybody could spot the difference.

Python code:

self.Loan_amount/((1.0-(1.0/(1+(self.Interest_rate/(self.Payment_peryear*100)))**(self.Payment_peryear*self.Payment_period)))/(self.Interest_rate/(self.Payment_peryear*100)))

JS code:

var paymentPerYear = totalPayments($scope.loan.interestRepaymentTerm);
parseInt($scope.loan.loanAmount) / ((1.0 - (1.0 / Math.pow(1 + (parseFloat($scope.loan.interest) / (paymentPerYear * 100)), (paymentPerYear * parseInt($scope.loan.tenor))))) / (parseFloat($scope.loan.interest) / (paymentPerYear * 100)));

It took just a glance for me to realize that spotting the difference in the pieces of code as they were written was gonna take me dangerously close to the limits on my capacity for processing information, what with up to 7 - 2 = 5 levels of nesting in the code. So, instead of trying to rise up to the challenge, I decided to begin by lowering the challenge for myself (a strategy I've found generally useful in software development).

The code and the situation here, even though almost trivial, present an instructive example of how programmers often fail to consider - to their own and their teams' detriment - the effects of cognitive load when both reading and writing code. At such a small scale, such failures may seem like minor condonable infractions, but they add up superlinearly to the difficulty of understanding code, and consequently to that of debugging and modifying it. So, let's look at some of the problems.

First, for the JS code:
  • The last line is the meat of the solution, and it's supposed to be a simple mathematical formula to calculate equated periodic installments. However, the formula, which is the signal here, seems to be somewhat drowning in the noise of incidental and irrelevant details like $scope, parseInt and parseFloat.
  • The names are a bit misleading. paymentPerYear seems to indicate it stands for an amount, whereas it really is meant to be the number of paymentsPerYear (note the plural). interest should actually be interestRate.
Now, for the Python code:
  • The only noise is the self references. While I really appreciate the beauty and the power of the self in Python, I feel it often hurts readability by just the slightest amount.
  • Thelackofanyspacinginthecodemakesithardertocomprehend.
  • The inconsistent naming forces you to do a double-take every once in a while. You see the first variable, Loan_amount, and tentatively assume the naming convention to be 'snake case starting with an uppercase first letter'. You see the second variable, Interest_rate, and it kinda confirms your assumption. And then you see... wait a minute, Payment_peryear. Double-take... Payment_per_year. Ah, that wasn't too bad. Except that now you know you can't just cruise along at a comfortable consistent speed on this road - you gotta be prepared to backtrack every once in a while, or simply slow down throughout.
Now, coming to spotting the difference. When diffing as well as debugging, I find it useful to strip out all the noise so you can focus on the signal and analyze it. When diffing, the commonalities are the noise, while the differences are the signal. In order to strip out the commonalities though, you first need to play them up and make them obvious. So, here's what I did to make the commonalities (or is it the differences?) stand out:
  • insert some spacing for readability
  • get rid of the following extraneous things: self, $scope.loan., parseInt and parseFloat
  • factor out some duplication into a variable: r = interestRate / (paymentPerYear * 100)
  • factor out a computation into a variable: n = paymentPerYear * tenor
  • rename loanAmount to P
And here's what I got:

P / ((1.0 - (1.0 / ((1 + r) ** n))) / r) # Python

P / ((1.0 - (1.0 / Math.pow(1 + r, n))) / r) // JS

Or better yet (just the Python version):

(P * r) / (1.0 - (1.0 / ((1 + r) ** n))) # Python

In both cases, I see a somewhat convoluted form of the formula for periodic amortization payment. I can't spot anything but syntactical differences between the two versions, and the only clue about what could be wrong with the JS code I have is a few guesses. But that's beside the point.

The point is, I had to expend what I think was undue effort to transform both versions of the code into a form that wouldn't strain my brain before I could say anything about them. The code would've turned out much better if the original author(s) had paid attention to minimizing the cognitive load for themselves and future readers and maintainers. With up to six levels of nesting (requiring the building up and subsequent winding down of a stack that deep in your mind) and the problems listed above, and having seen what the mathematical formula actually is, it would almost seem as if the original version was set up to obfuscate the solution rather than reveal it. This last version, with two-thirds the levels of nesting, is a direct and concise transliteration of the mathematical formula into programming language syntax, making it so transparent and easy to test that it greatly minimizes, even nearly obviates, the possibility of bugs and the need to debug. As a bonus, one wouldn't need to seek help to spot the differences between versions in two different programming languages.
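
To make the "easy to test" claim a little more concrete, here's a minimal sketch - in Ruby, purely for illustration, with a method name and keyword arguments I've made up - of what such a direct transliteration could look like:

# Periodic amortization payment: (P * r) / (1 - (1 + r)**-n),
# where P is the principal, r the per-period interest rate and
# n the total number of payments.
def periodic_payment(principal:, annual_rate_percent:, payments_per_year:, years:)
  r = annual_rate_percent / (payments_per_year * 100.0)
  n = payments_per_year * years
  (principal * r) / (1.0 - (1.0 / ((1 + r)**n)))
end

# e.g. a 100,000 loan at 12% nominal interest, paid monthly over 10 years:
periodic_payment(principal: 100_000, annual_rate_percent: 12,
                 payments_per_year: 12, years: 10)
# => roughly 1434.71

A single expectation against a value like that exercises the whole formula, which is about all the testing this code should need.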

In his excellent book "Specification by Example", Gojko Adzic says this about examples used to illustrate specifications:
With traditional specifications, examples appear and disappear several times in the software development process. Everyone [e.g. business analysts, developers and testers] invents their own examples, but there's nothing to ensure that these examples are even consistent, let alone complete. In software development, this is why the end result is often different from what was expected at the beginning. To avoid this, we have to prevent misinterpretation between different roles and maintain one source of truth.
I think this description very nearly fits what most often happens with program source code as well. Every reader/maintainer ends up creating a less noisy representation of the code in their head while trying to understand it, and most often, in the absence of MercilessRefactoring, there it disappears into a black hole, only to be recreated the next time the code is read. So, the burden of the cognitive load is created once when writing the code, but is borne by everyone who has to read the code every time they do it... "create once carry forever", anyone?

We commonly subject our code to load tests to understand how much load it can handle. How about if our code could undergo some form of cognitive load tests so we could gauge how much cognitive load it would impose? Now, I wish "cognitive load testing" were a thing.

Wednesday, April 18, 2012

Playing the game as if you meant it

"Acting has to do with saying it as if you meant it, so for me the words are always very important. It's very important for me to know my lines, know them so well that I don't have to think about them." - Christopher Walken

A few weeks back, I did the Game of Life kata in Ruby. After the first two times, I got bored of implementing eql? and hash methods for the Location class I ended up with. So, the third time, I tried to make it more interesting by doing the exercise in a 'TDD as if you meant it' style. I'd been very intrigued by the idea ever since I read about it, but hadn't really tried it myself. Here's a quick recap of the rules:
  1. write exactly ONE failing test
  2. make the test from (1) pass by first writing implementation code IN THE TEST
  3. create a new implementation method/function by:
    1. doing extract method on implementation code created as per (2), or
    2. moving implementation code as per (2) into an existing implementation method 
  4. only ever create new methods IN THE TEST CLASS
  5. only ever create implementation classes to provide a destination for extracting a method created as per (4).
  6. populate implementation classes by doing move method from a test class into them
  7. refactor as required
  8. go to (1)
I've published my code, and tried to make it so that following the commits from the first to the last should give one a good idea of almost every step I followed. The rest of this post is a walk through the micro-process of TDD as if you meant it (TAIYMI) as I applied it to the Game of Life. So, here we go...

The first five commits are quite unremarkable, except that, in accordance with rules 1-4 above, all the implementation code is in the rspec "test class".


describe "In the next gen, a grid with" do
  [[], [[0,0]], [[0,0], [0,1]]].each do |cells|
    context "#{cells.size} live cells" do
      it "should have no live cells" do
        g = cells
        ng = next_gen(g)
        ng.should == []
      end
    end
  end

  def next_gen(grid)
    []
  end
end
At this stage, I have tests for grids with 0, 1 and 2 cells, all of which become empty (of any live cells) in the next generation. The next step, where I write a test for 3 live cells in a diagonal, is where it starts to become interesting.


context "3 live cells in a diagonal" do
  it "should have only the middle cell alive" do
    g = [[0,0],
                [1,1],
                       [2,2]]
    ng = next_gen(g)
    pending "calculation of neighbors"
    ng.should == [[1,1]]
  end
end

I need to keep it pending "calculation of neighbors". Traditionally, at this point, I think I might've stubbed a call to calculate neighbors of a Cell/Location to make this test pass, and later gone on to test-drive the calculation of neighbors. But the "rules of the game" prohibit me from doing so. So, I'll calculate neighbors first.

The test for "neighbors of origin" looks quite silly, but I don't know how else to write it "as if I meant it".


describe "a location's neighbors" do
  it "should be all locations offset by a single row and/or column" do
    location = [0,0]
    neighbors = [[-1,-1], [-1,0], [-1,1],
                 [ 0,-1],         [ 0,1],
                 [ 1,-1], [ 1,0], [ 1,1]]
    neighbors.should =~ [[-1,-1], [-1,0], [-1,1],
                         [ 0,-1],         [ 0,1],
                         [ 1,-1], [ 1,0], [ 1,1]]
  end
end

I probably already have something in mind about how my next test will look. But first, I extract neighbors_of(location) and the test doesn't look that silly anymore, while laying the ground for the next test, which is to find the neighbors of location [1,2].


describe "neighbors of location" do
  it "[0,0] should be [-1,-1], [-1,0], [-1,1], [ 0,-1], [ 0,1], [ 1,-1], [ 1,0], [ 1,1] in any order" do
    location = [0,0]
    neighbors_of(location).should =~ [[-1,-1], [-1,0], [-1,1],
                                      [ 0,-1],         [ 0,1],
                                      [ 1,-1], [ 1,0], [ 1,1]]
  end

  it "[1,2] should be [0,1], [0,2], [0,3], [1,1], [1,3], [2,1], [2,2], [2,3] in any order" do
    location = [1,2]
    neighbors = neighbors_of(location).map { |loc| [loc[0] + location[0], loc[1] + location[1]] }
    neighbors.should =~ [[0,1], [0,2], [0,3],
                         [1,1],        [1,3],
                         [2,1], [2,2], [2,3]]
  end

  def neighbors_of(location)
    [[-1,-1], [-1,0], [-1,1],
     [ 0,-1],         [ 0,1],
     [ 1,-1], [ 1,0], [ 1,1]]
  end
end

The one thing I don't like about the two neighbors_of tests now is that they don't read as it "should ..." anymore. And worse, the "it" declarations duplicate the programmatic assertions. That's one thing I often dislike about rspec tests that include expected data (rather than a statement about the expected state) in the it declaration. In cases like this, I don't know how to avoid that (and I'd love to hear suggestions).

With the ability to find neighbors_of(location), I can now go and finish that pending test for 3 live cells. I make it pass by writing code in the test method first, and then extracting it into a method, so that I end up with next_gen(grid).


context "3 live cells in a diagonal" do
  it "should have only the middle cell alive" do
    g = [[0,0],
                [1,1],
                       [2,2]]
    ng = next_gen(g)
    ng.should == [[1,1]]
  end
end

def next_gen(grid)
  grid.select { |location| neighbors_of(location).select { |neighbor| grid.include?(neighbor) }.count == 2 }
end

Extracting the alive? method next pushes me for the first time towards rules 5 and 6 (to create implementation classes as destination for moving a method).


def next_gen(grid)
  @grid = grid
  @grid.select { |location| live_neighbors_of(location).count == 2 }
end

def live_neighbors_of(location)
  neighbors_of(location).select { |neighbor| alive?(neighbor) }
end

def alive?(location)
  @grid.include?(location)
end

I now have an instance variable, indicating the need for an object that can hold it. In preparation for a move method, I convert grid from an instance variable to a method parameter (aside: I think the variable grid should've been called live_locations in the first place).


def next_gen(grid)
  grid.select { |location| live_neighbors_of(location, grid).count == 2 }
end

def live_neighbors_of(location, grid)
  neighbors_of(location).select { |neighbor| alive?(neighbor, grid) }
end

def alive?(location, grid)
  grid.include?(location)
end

Creation of the Grid class with the next_gen method next isn't quite an atomic refactoring, but I think it's simple enough that I can spare myself the tedium of writing each individual step.


class Grid
  attr_reader :live_locations

  def initialize(live_locations)
    @live_locations = live_locations 
  end

  def next_gen
    live_locations.select { |location| live_neighbors_of(location).count == 2 }
  end

  def live_neighbors_of(location)
    neighbors_of(location).select { |neighbor| alive?(neighbor) }
  end

  def alive?(location)
    live_locations.include?(location)
  end
end

That's followed by another test (and, of course, implementation and refactoring) for live cells with 3 live neighbors. That takes care of all the rules relating to live cells. Handling the dead cells is easily done. I'm pleasantly surprised that I can still make the test pass by writing the implementation code in the test method first.


context "3 live cells in a row" do
  it "should have instead 3 live cells in a column, with the middle cell unchanged" do
    g = [[0,0], [0,1], [0,2]]
    grid = Grid.new(g)
    ng = grid.next_gen + g.flat_map { |location| neighbors_of(location) }.select { |location| grid.live_neighbors_of(location).count == 3 }.uniq
    ng.should =~ [[-1,1], [0,1], [1,1]]
  end
end

After moving the flat_map into next_gen and a little bit of refactoring, we're done implementing all the rules of the game (of life) following the rules of the game (of TAIYMI). With tests to keep me in check, I can now go wild with refactoring. And I do, and come up with the following:


class Grid
  attr_reader :live_cells

  def initialize(live_cells)
    @live_cells = live_cells.uniq
  end

  def next_gen
    surviving(live_cells) + vivifying(dead(neighbors_of_all(live_cells)))
  end

  def neighbors_of_all(cells)
    cells.flat_map { |cell| neighbors_of(cell) }.uniq
  end

  def vivifying(cells)
    cells.select { |cell| vivifying?(cell) }
  end

  def vivifying?(cell)
    live(neighbors_of(cell)).count == 3 
  end

  def dead(cells)
    cells.reject { |cell| alive?(cell) }
  end

  def surviving(cells)
    cells.select { |cell| surviving?(cell) }
  end

  def surviving?(cell)
    [2, 3].include?(live(neighbors_of(cell)).count)
  end

  def live(cells)
    cells.select { |cell| alive?(cell) }
  end

  def alive?(cell)
    live_cells.include?(cell)
  end
end

Note that in the next_gen method, I don't really need to pass around live_cells as a method parameter, since it's there in the instance state for any method to grab. But I choose to pass it, simply because


surviving(live_cells) + vivifying(dead(neighbors_of_all(live_cells)))

is a much more explicit and complete formulation of the solution than, say

surviving + vivifying(dead(neighbors_of_all))

The latter is terser, but the former, even with its superfluousness, is more succinct.

Note that I've now replaced all uses of 'location' with 'cell', because the distinction I had to maintain in my earlier OO solution between them (a cell has a location and is either alive or dead, while a location is simply the coordinate of a cell) is no longer necessary. Using 'cell' enables framing the entire solution in terms that come straight from the domain language.

Just for good measure, I throw in another pair of tests for live and dead cells with 4 live neighbors, respectively. But now, I don't like how the spec declaration for those two tests read. Here's the rspec output documentation for the grid:
In the next gen, a grid with
  0 live cells
    should have no live cells
  1 live cells
    should have no live cells
  2 live cells
    should have no live cells
  3 live cells in a diagonal
    should have only the middle cell alive
  4 live cells in a square block
    should remain unchanged
  3 live cells in a row
    should have instead 3 live cells in a column, with the middle cell unchanged
  5 live cells forming a +
    should have instead 8 live cells forming a hollow square
  8 live cells forming a hollow square
    should have instead 8 live cells forming a diamond
Those last two tests don't quite serve as documentation, I'm afraid, because they make neither the initial nor the expected state obvious. While that's true to some extent even for the other tests, I don't terribly mind the slight mental computation they require. Also, the special way in which I've been laying out the arrays to visually represent the grid has the potential to be misleading, in that the coordinate values don't have to match up with their visual placement. But (hint, hint), suggestions for improvement anywhere in my solution, approach, or process are highly welcome.

In conclusion, though, a comparison of my earlier solution - with its classes for Cell, Location and Grid - to this one seems to support the observation that classes developed like this tend to have very little or no state, coming much closer to the functional programming paradigm.

Sunday, November 14, 2010

Maybe foreach is the new if

"Maybe . . . Maybe not. But at least it would be something real." - Nikita, in La Femme Nikita

Last weekend, I spent some time translating Nat Pryce's implementation of a Maybe type for Java to C#. You can find the motivation behind, and an introduction to, the construct in his blog post. He also suggests some better ways of avoiding nulls, BTW. But I'll stick to Maybe for now, and jump directly to some questions, observations and thoughts that arose from a discussion I had with my colleague, Victor, about the type and its usage.

First came the question of how to introduce such a construct into an already long-running project. My answer would be: piecemeal. You simply use it in new code you write, or code you refactor or otherwise modify in the normal course of work. You don't have to touch code you wouldn't have otherwise touched. So, there'll be some methods that declare they may return a nonexistent value (through the use of Maybe), and some others that'll hand you a null without bothering to set expectations. But having even a few methods that'll declare the possibility of nonexistence is better than having none, since you have that many fewer surprises to deal with.

Next, there were some questions about how code using the type would look. So, here's an if-else structure using a null value:


@Test
public void ifThen() throws Exception {
    String foo = "foo";

    if (foo != null) {
        assertThat(foo, equalTo("foo"));
    }
    else {
        fail("should not have been called");
    }
}

Here's the same structure using a Maybe:


@Test
public void ifThen() throws Exception {
    Maybe<String> foo = definitely("foo");

    if (foo.isKnown()) for (String s : foo) {
        assertThat(s, equalTo("foo"));
    }
    else {
        fail("should not have been called");
    }
}

They look quite similar, but the Maybe example is interesting for two reasons. The first is the use of the Iterable interface (through the for loop) for accessing the 'wrapped' value. It may feel itchy to use a for loop to access what you know is a single value, but we'll address that soon. The second is the clever placement of the for statement on the same line as the if, effectively de-emphasizing the iteration aspect, and making the overall structure look no(t much) different from a regular if-else structure. Too bad, none of the automatic code formatters I've used is clever enough to see the cleverness of the formatting and retain it.

Getting back to the itchiness of the for loop for accessing the single value. After the discussion with Victor, I felt as uncomfortable with the single-element-iteration as he did. Even though it hadn't bothered me at all earlier, it now bothered me enough that I had to find an alternative. Since we were talking about my C# version of Maybe, on my way home, it struck me that one could use the .First() LINQ extension method like this:


if (foo.IsKnown())
{
    Assert.That(foo.First(), Is.EqualTo("foo"));
}

instead of:


if (foo.IsKnown()) foreach (var s in foo)
{
    Assert.That(s, Is.EqualTo("foo"));
}

In case of an unknown value, .First() would throw an exception. Just what you'd expect. So, that's a solution if you don't like the for loop. As an aside, the if check in the last foreach example is redundant. It's needed only if it's to be followed by an else clause. Maybe foreach is the new if.

What's so wrong with the for loop, though? I realized the reason the iteration felt awkward to me is that I knew there'd never be more than one value. Isn't an iteration really meant for multiple values? Somehow, despite the fact that Maybe implements Iterable, I expected it to hide that fact from me. I assumed the iterability is more or less a hack to be exploited by collections APIs (Guava for Java, LINQ for .net). But reflecting further, these assumptions and expectations seemed to be based on a fuzzy feeling about the form of an iterable, not on a published contract, or even some kinda logic, for that matter.

For methods that return a collection of objects, an empty collection is an acceptable - I'd even say, the only acceptable - return value (some people return a null... ouch!). Returning an empty collection as a NullObject in such cases allows one to avoid the clutter of the SentinelPattern. The iterable/collection interface allows client code to treat all return values - whether with zero, one or many elements - uniformly. So, if a collection with zero or many is fine, a collection with zero or one should be just as fine. Which is exactly what the Maybe type is - an iterable that promises to yield exactly either zero or one element. But instead of being ashamed of being an iterable, it's proud of being one with a stronger guarantee.
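
If it helps to see that contract outside Java or C#, here's a toy Ruby sketch of the same idea (my own made-up version, not Nat Pryce's API): an Enumerable that promises to yield either zero elements or exactly one.

class Maybe
  include Enumerable

  # Only the two factory methods below can construct instances,
  # so the zero-or-one promise can't be broken from outside.
  private_class_method :new

  def self.definitely(value)
    new([value])
  end

  def self.unknown
    new([])
  end

  def initialize(values)
    @values = values
  end

  # Yielding zero or one value is the whole contract.
  def each(&block)
    @values.each(&block)
  end

  def known?
    @values.any?
  end
end

foo = Maybe.definitely("foo")
foo.each { |s| puts s }             # prints "foo" exactly once
foo.first                           # => "foo" (courtesy of Enumerable)
Maybe.unknown.each { |s| puts s }   # yields nothing

Client code gets exactly the uniform treatment described above: zero-or-one behaves just like zero-or-many, only with a stronger guarantee.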

BTW, Maybe would definitely() be a worthy type addition to the Babbage language, next to its MIGHT DO and WHY NOT? statements.

Tuesday, May 12, 2009

TDD and the status quo in software

"Status quo, you know, that is Latin for the mess we're in." - Ronald Reagan

I just had this question put to me: "Is TDD appropriate for the RAD (rapid application development) model?". Wikipedia lists XP as one of the flavors of RAD. So, I can only say that for at least one particular flavor of RAD, TDD isn't just appropriate, but essential to its definition.

I think it's about time the debate about TDD was turned on its head. Rather than practitioners or advocates of TDD being expected to provide answers to such questions (and back them up with numbers), I think it's the skeptics that should have to prove the inappropriateness of TDD for RAD or any other context.

I've been a long-time follower of the TDD group on Yahoo!, and I've often seen individuals posting requests there for empirical studies and other arguments to be presented to managers in order to convince them to enforce/encourage/bless the adoption of TDD. Despite being a firm believer in TDD, I don't think it'll be too difficult for skeptics to dismiss the results of most such studies I've come across, mainly because we CannotMeasureProductivity, and proving effectiveness would require measurement and comparison. Not saying such a study is impossible, but it's very difficult, and I haven't seen it done yet.

But I'll bet the difficulty in proving by numbers is equally real for the skeptics. Because who's measuring the amounts (and effects) of any of the following?
  • Duplicated code
  • Unreadable code
  • Untestable code
  • Minutes spent writing superfluous comments
  • Hours spent debugging
  • Hours spent blocked because another team-member is working on the same file you need to edit
  • Hours spent navigating through multiple screens, manually testing every code edit through the user interface
  • Days spent creating ER diagrams, flow charts, UML diagrams
  • Days spent transferring knowledge to co-workers, because it's all in the mind, not in the code
  • Days spent bug-fixing
All of these are unquestioningly accepted because, in our trade, that's the status quo. No one knows how much the TechnicalDebt we accumulate when we fail to RefactorMercilessly is holding us down. No one knows if that Big Rewrite we had to undertake, and will surely have to again undertake if we continue to ignore the TechnicalDebt, could've turned out totally evitable had we regularly made the minimum payments in the form of small continuous refactorings. So, when you don't have to justify CopyAndPasteProgramming, why should you be expected to justify TestFirstProgramming? If anything, it's the lack of automated tests (written first or last) that should be questioned.

I wish it were the status quo that's questioned before we reached the point of questioning the appropriateness of what is arguably the only professional way of writing software.

I also wish the Flying Spaghetti Monster were considered no lesser a God than, say, Jesus Christ.