
Showing posts with label tdd. Show all posts

Wednesday, August 29, 2018

Emphasize behavior over structure

Behavior is the mirror in which everyone shows their image. - Johann Wolfgang von Goethe

RSpec is a Behaviour-Driven Development tool for Ruby programmers. Created shortly after JBehave in 2005, its raison d'ĂȘtre has been writing human-readable specifications in the BDD tradition - accessible to not just developers but also business users, analysts, and testers. Over the years, though, that idea of accessibility to an audience beyond just developers seems to have been lost on many. I suspect that's because many view RSpec as little more than a testing tool with a DSL rather than the BDD tool that it's meant to be.

This narrower developer-focused view leads to dubious guidelines like the ones documented on Better Specs. One of the very first ones there is about how to describe your methods.

How to describe your methods


The only guideline needed on that topic is: Don't do it. Describe behaviors, not methods.

Instead, you get guidelines like: Be clear about what method you are describing. For instance, use the Ruby documentation convention of . (or ::) when referring to a class method's name and # when referring to an instance method's name.

Below is an example of a supposedly good practice according to that guideline.

# (supposedly) good
describe Lemon do
  describe "#yellow?" do
    context "when ripe" do
      it "returns true"
    end
  end
end

# vs (supposedly) bad
describe Lemon do
  it "is yellow if ripe"
end

When trying to reason about good practice, I find it always instructive to go back to the origins of an idea. Here too, Dan North's article introducing BDD offers great insight. Some quotes:
[...] when they wrote the method name in the language of the business domain, the generated documents made sense to business users, analysts, and testers.
Notice how, in contrast to the above example, the emphasis is on the language of the business domain (e.g. Lemon is yellow if ripe), rather than the language of code (e.g. Lemon#yellow?, when ripe, returns true) that would make little sense to someone other than a developer.
This sentence template – The class should do something – means you can only define a test for the current class.
Notice how the reference is to a class, not a method. The class, not its methods, should do something. Why should anyone care about whether Lemon#yellow? returns true rather than whether Lemon is yellow?
If you find yourself writing a test whose name doesn’t fit this template, it suggests the behaviour may belong elsewhere.
Consider this Active Record test (assuming Lemon is an Active Record model).

describe Lemon do
  describe ".ripe" do
    it "returns only ripe lemons"
  end
  describe ".rotten" do
    it "returns only rotten lemons"
  end
end

This test just can't be written in the form 'Lemon should do something' following the suggested template. Because after all, we aren't really describing a Lemon here. What we're describing is the equivalent of a Lemon Repository (or Lemon Basket, perhaps, in the UbiquitousLanguage of the domain) or Data Access Object or whatever floats your goat when it comes to persistence. Active Record sadly conflates a model's class and its persistence mechanism, even though they're two clearly distinct concepts with distinct concerns. If the two concepts weren't conflated, you'd write tests that don't need to describe methods anymore but can describe the behavior/responsibilities of a class, as below:

describe LemonBasket do
  it "should provide only ripe lemons"
  it "should provide only rotten lemons"
end
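To make the separation concrete, here's a plain-Ruby sketch of what a LemonBasket along those lines might look like. The class, its ripeness flag, and the method names are all hypothetical, invented for illustration; with Active Record, `ripe` and `rotten` would instead be scopes on the model class.

```ruby
# Hypothetical sketch: a Lemon knows its own ripeness, while the basket
# owns the collection-level behavior the spec above describes.
Lemon = Struct.new(:ripe) do
  def ripe?
    ripe
  end

  def rotten?
    !ripe
  end
end

class LemonBasket
  def initialize(lemons)
    @lemons = lemons
  end

  # "provides only ripe lemons"
  def ripe
    @lemons.select(&:ripe?)
  end

  # "provides only rotten lemons"
  def rotten
    @lemons.select(&:rotten?)
  end
end
```

The point isn't the particular method names; it's that the spec describes the basket's behavior, and the names could emerge later without the spec changing.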

Another way to think about this is to remember that a class is an object, and class methods are, therefore, really instance methods of the (singleton) class object. So, if you include descriptions of .class_methods and #instance_methods in the description of a class, you're really mixing the descriptions of two different classes (pun intended) of objects. Consider this sample from RSpec's own documentation:

RSpec.describe Widget, :type => :model do
  it "has none to begin with" do
    expect(Widget.count).to eq 0
  end

  it "has one after adding one" do
    Widget.create
    expect(Widget.count).to eq 1
  end
end

That description just doesn't make any sense as soon as you try to read it as sentences that'd appear in RSpec output formatted as documentation.
Widget
 has none to begin with
 has one after adding one
That doesn't make sense because what you're really describing is conceptually something like a WidgetWarehouse.
WidgetWarehouse
 has no widgets to begin with
 has one widget after adding (storing?) one
With Active Record, the Widget class above doubles up as a singleton WidgetWarehouse (WidgetDAO/WidgetRepository/whatever). But instead of letting that incidental implementation detail drive the structure of your tests, you could recognize that they're essentially two distinct concepts/concerns, and describe them in independent tests rather than first mixing them up in the same class test and then trying to distinguish them through syntactic conventions.
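Pulled apart like that, the warehouse concept needs nothing from Active Record. A minimal plain-Ruby sketch (the `store` method name is my guess; as noted above, it could just as well be `add`):

```ruby
# Hypothetical WidgetWarehouse: the collection concern, separated from the
# Widget model class it was conflated with.
class WidgetWarehouse
  def initialize
    @widgets = []
  end

  # "has no widgets to begin with" / "has one widget after storing one"
  def count
    @widgets.size
  end

  def store(widget)
    @widgets << widget
    widget
  end
end
```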

Moreover, method descriptions aren't conducive to true BDD or TDD style in general. The behaviors of an object are known before the tests, but the structure (method names, signatures, etc.) quite often emerges. For instance, the behavior WidgetWarehouse has one widget after adding one is known at the time of writing the test, but the method that'd provide that behavior isn't (.add or .store? - I shouldn't have to make up my mind before writing the first test.).

Using cutesy syntactic conventions to describe classes based on their structure allows, even encourages, you to sidestep such considerations and suppresses the feedback generation mechanism that's at the heart of TDD and BDD. Focusing on and describing behavior forces you to think more deeply about the behavior and where it belongs.

Wednesday, April 18, 2012

Playing the game as if you meant it

"Acting has to do with saying it as if you meant it, so for me the words are always very important. It's very important for me to know my lines, know them so well that I don't have to think about them." - Christopher Walken

A few weeks back, I did the Game of Life kata in Ruby. After the first two times, I got bored of implementing eql? and hash methods for the Location class I ended up with. So, the third time, I tried to make it more interesting by doing the exercise in a 'TDD as if you meant it' style. I'd been very intrigued by the idea ever since I read about it, but hadn't really tried it myself. Here's a quick recap of the rules:
  1. write exactly ONE failing test
  2. make the test from (1) pass by first writing implementation code IN THE TEST
  3. create a new implementation method/function by:
    1. doing extract method on implementation code created as per (2), or
    2. moving implementation code as per (2) into an existing implementation method 
  4. only ever create new methods IN THE TEST CLASS
  5. only ever create implementation classes to provide a destination for extracting a method created as per (4).
  6. populate implementation classes by doing move method from a test class into them
  7. refactor as required
  8. go to (1)
I've published my code, and tried to make it so that following the commits from the first to the last should give one a good idea of almost every step I followed. The rest of this post is a walk through the micro-process of TDD as if you meant it (TAIYMI) as I applied it to the Game of Life. So, here we go...

The first five commits are quite unremarkable, except that, in accordance with rules 1-4 above, all the implementation code is in the rspec "test class".


describe "In the next gen, a grid with" do
  [[], [[0,0]], [[0,0], [0,1]]].each do |cells|
    context "#{cells.size} live cells" do
      it "should have no live cells" do
        g = cells
        ng = next_gen(g)
        ng.should == []
      end
    end
  end

  def next_gen(grid)
    []
  end
end
At this stage, I have tests for grids with 0, 1 and 2 cells, all of which become empty (of any live cells) in the next generation. The next step, where I write a test for 3 live cells in a diagonal, is where it starts to become interesting.


context "3 live cells in a diagonal" do
  it "should have only the middle cell alive" do
    g = [[0,0],
                [1,1],
                       [2,2]]
    ng = next_gen(g)
    pending "calculation of neighbors"
    ng.should == [[1,1]]
  end
end

I need to keep it pending "calculation of neighbors". Traditionally, at this point, I think I might've stubbed a call to calculate neighbors of a Cell/Location to make this test pass, and later gone on to test-drive the calculation of neighbors. But the "rules of the game" prohibit me from doing so. So, I'll calculate neighbors first.

The test for "neighbors of origin" looks quite silly, but I don't know how else to write it "as if I meant it".


describe "a location's neighbors" do
  it "should be all locations offset by a single row and/or column" do
    location = [0,0]
    neighbors = [[-1,-1], [-1,0], [-1,1],
                 [ 0,-1],         [ 0,1],
                 [ 1,-1], [ 1,0], [ 1,1]]
    neighbors.should =~ [[-1,-1], [-1,0], [-1,1],
                         [ 0,-1],         [ 0,1],
                         [ 1,-1], [ 1,0], [ 1,1]]
  end
end

I probably already have something in mind about how my next test will look. But first, I extract neighbors_of(location) and the test doesn't look that silly anymore, while laying the ground for the next test, which is to find the neighbors of location [1,2].


describe "neighbors of location" do
  it "[0,0] should be [-1,-1], [-1,0], [-1,1], [ 0,-1], [ 0,1], [ 1,-1], [ 1,0], [ 1,1] in any order" do
    location = [0,0]
    neighbors_of(location).should =~ [[-1,-1], [-1,0], [-1,1],
                                      [ 0,-1],         [ 0,1],
                                      [ 1,-1], [ 1,0], [ 1,1]]
  end

  it "[1,2] should be [0,1], [0,2], [0,3], [1,1], [1,3], [2,1], [2,2], [2,3] in any order" do
    location = [1,2]
    neighbors = neighbors_of(location).map { |loc| [loc[0] + location[0], loc[1] + location[1]] }
    neighbors.should =~ [[0,1], [0,2], [0,3],
                         [1,1],        [1,3],
                         [2,1], [2,2], [2,3]]
  end

  def neighbors_of(location)
    [[-1,-1], [-1,0], [-1,1],
     [ 0,-1],         [ 0,1],
     [ 1,-1], [ 1,0], [ 1,1]]
  end
end

The one thing I don't like about the two neighbors_of tests now is that they no longer read as it "should ...". And worse, the "it" declarations duplicate the programmatic assertions. That's something I often dislike about rspec tests which include expected data (rather than a statement about the expected state) in the it declaration. In cases like this, I don't know how to avoid that (and I'd love to hear suggestions).

With the ability to find neighbors_of(location), I can now go and finish that pending test for 3 live cells. I make it pass by writing code in the test method first, and then extracting it into a method, so that I end up with next_gen(grid).


context "3 live cells in a diagonal" do
  it "should have only the middle cell alive" do
    g = [[0,0],
                [1,1],
                       [2,2]]
    ng = next_gen(g)
    ng.should == [[1,1]]
  end
end

def next_gen(grid)
  grid.select { |location| neighbors_of(location).select { |neighbor| grid.include?(neighbor) }.count == 2 }
end

Extracting the alive? method next pushes me for the first time towards rules 5 and 6 (to create implementation classes as destination for moving a method).


def next_gen(grid)
  @grid = grid
  @grid.select { |location| live_neighbors_of(location).count == 2 }
end

def live_neighbors_of(location)
  neighbors_of(location).select { |neighbor| alive?(neighbor) }
end

def alive?(location)
  @grid.include?(location)
end

I now have an instance variable, indicating the need for an object that can hold it. In preparation for a move method, I convert grid from an instance variable to a method parameter (aside: I think the variable grid should've been called live_locations in the first place).


def next_gen(grid)
  grid.select { |location| live_neighbors_of(location, grid).count == 2 }
end

def live_neighbors_of(location, grid)
  neighbors_of(location).select { |neighbor| alive?(neighbor, grid) }
end

def alive?(location, grid)
  grid.include?(location)
end

Creation of the Grid class with the next_gen method next isn't quite an atomic refactoring, but I think it's simple enough that I can spare myself the tedium of writing each individual step.


class Grid
  attr_reader :live_locations

  def initialize(live_locations)
    @live_locations = live_locations 
  end

  def next_gen
    live_locations.select { |location| live_neighbors_of(location).count == 2 }
  end

  def live_neighbors_of(location)
    neighbors_of(location).select { |neighbor| alive?(neighbor) }
  end

  def alive?(location)
    live_locations.include?(location)
  end
end

That's followed by another test (and, of course, implementation and refactoring) for live cells with 3 live neighbors. That takes care of all the rules relating to live cells. Handling the dead cells is easily done. I'm pleasantly surprised that I can still make the test pass by writing the implementation code in the test method first.


context "3 live cells in a row" do
  it "should have instead 3 live cells in a column, with the middle cell unchanged" do
    g = [[0,0], [0,1], [0,2]]
    grid = Grid.new(g)
    ng = grid.next_gen + g.flat_map { |location| neighbors_of(location) }.select { |location| grid.live_neighbors_of(location).count == 3 }.uniq
    ng.should =~ [[-1,1], [0,1], [1,1]]
  end
end

After moving the flat_map into next_gen and a little bit of refactoring, we're done implementing all the rules of the game (of life) following the rules of the game (of TAIYMI). With tests to keep me in check, I can now go wild with refactoring. And I do, and come up with the following:


class Grid
  attr_reader :live_cells

  def initialize(live_cells)
    @live_cells = live_cells.uniq
  end

  def next_gen
    surviving(live_cells) + vivifying(dead(neighbors_of_all(live_cells)))
  end

  def neighbors_of_all(cells)
    cells.flat_map { |cell| neighbors_of(cell) }.uniq
  end

  def vivifying(cells)
    cells.select { |cell| vivifying?(cell) }
  end

  def vivifying?(cell)
    live(neighbors_of(cell)).count == 3 
  end

  def dead(cells)
    cells.reject { |cell| alive?(cell) }
  end

  def surviving(cells)
    cells.select { |cell| surviving?(cell) }
  end

  def surviving?(cell)
    [2, 3].include?(live(neighbors_of(cell)).count)
  end

  def live(cells)
    cells.select { |cell| alive?(cell) }
  end

  def alive?(cell)
    live_cells.include?(cell)
  end
end

Note that in the next_gen method, I don't really need to pass around live_cells as a method parameter, since it's there in the instance state for any method to grab. But I choose to pass it, simply because


surviving(live_cells) + vivifying(dead(neighbors_of_all(live_cells)))

is a much more explicit and complete formulation of the solution than, say

surviving + vivifying(dead(neighbors_of_all))

The latter is terser, but the former, even with its superfluousness, communicates the solution more completely.

Note that I've now replaced all uses of 'location' with 'cell', because the distinction I had to maintain between them in my earlier OO solution (a cell has a location and is either alive or dead, while a location is simply the coordinate of a cell) is no longer necessary. Using 'cell' lets the entire solution be framed in terms drawn straight from the domain language.

Just for good measure, I throw in another pair of tests for live and dead cells with 4 live neighbors, respectively. But now, I don't like how the spec declaration for those two tests read. Here's the rspec output documentation for the grid:
In the next gen, a grid with
  0 live cells
    should have no live cells
  1 live cells
    should have no live cells
  2 live cells
    should have no live cells
  3 live cells in a diagonal
    should have only the middle cell alive
  4 live cells in a square block
    should remain unchanged
  3 live cells in a row
    should have instead 3 live cells in a column, with the middle cell unchanged
  5 live cells forming a +
    should have instead 8 live cells forming a hollow square
  8 live cells forming a hollow square
    should have instead 8 live cells forming a diamond
Those last two tests don't quite serve as documentation, I'm afraid, because they make neither the initial nor the expected state obvious. While that's true to some extent even for the other tests, I don't terribly mind the slight mental computation they require. Also, the special way in which I've been laying out the arrays to visually represent the grid has the potential to be misleading, in that the coordinate values don't have to match up with their visual placement. But (hint, hint), suggestions for improvement anywhere in my solution, approach, or process are highly welcome.

In conclusion, though, a comparison of my earlier solution - with its classes for Cell, Location and Grid - to this one seems to support the observation that classes developed like this tend to have very little or no state, coming much closer to the functional programming paradigm.
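That observation is easy to make concrete: stripped of the class, the whole solution collapses into two pure functions over a set of live cells. Below is my own stateless reformulation, not the published solution; names and structure are mine.

```ruby
# Hypothetical functional sketch of the Game of Life rules: no instance
# state, just pure functions from live cells to live cells.
module Life
  module_function

  # All eight cells offset from the given cell by one row and/or column.
  def neighbors_of(cell)
    r, c = cell
    [-1, 0, 1].product([-1, 0, 1])
              .reject { |dr, dc| dr == 0 && dc == 0 }
              .map { |dr, dc| [r + dr, c + dc] }
  end

  def next_gen(live_cells)
    live = live_cells.uniq
    # A live cell with 2 or 3 live neighbors survives.
    survivors = live.select { |cell| [2, 3].include?((neighbors_of(cell) & live).size) }
    # A dead cell with exactly 3 live neighbors comes alive.
    births = live.flat_map { |cell| neighbors_of(cell) }.uniq
                 .reject { |cell| live.include?(cell) }
                 .select { |cell| (neighbors_of(cell) & live).size == 3 }
    survivors + births
  end
end
```

The Grid class in the final solution is essentially this, with `live_cells` promoted to (the only piece of) instance state.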

Tuesday, May 12, 2009

TDD and the status quo in software

"Status quo, you know, that is Latin for the mess we're in." - Ronald Reagan

I just had this question put to me: "Is TDD appropriate for the RAD (rapid application development) model?". Wikipedia lists XP as one of the flavors of RAD. So, I can only say that for at least one particular flavor of RAD, TDD isn't just appropriate, but essential to its definition.

I think it's about time the debate about TDD was turned on its head. Rather than practitioners or advocates of TDD being expected to provide answers to such questions (and back them up with numbers), I think it's the skeptics that should have to prove the inappropriateness of TDD for RAD or any other context.

I've been a long-time follower of the TDD group on Yahoo!, and I've often seen individuals posting requests there for empirical studies and other arguments to be presented to managers in order to convince them to enforce/encourage/bless the adoption of TDD. Despite being a firm believer in TDD, I don't think it'll be too difficult for skeptics to dismiss the results of most such studies I've come across, mainly because we CannotMeasureProductivity, and proving effectiveness would require measurement and comparison. Not saying such a study is impossible, but it's very difficult, and I haven't seen it done yet.

But I'll bet the difficulty in proving by numbers is equally real for the skeptics. Because who's measuring the amounts (and effects) of any of the following?
  • Duplicated code
  • Unreadable code
  • Untestable code
  • Minutes spent writing superfluous comments
  • Hours spent debugging
  • Hours spent blocked because another team-member is working on the same file you need to edit
  • Hours spent navigating through multiple screens, manually testing every code edit through the user interface
  • Days spent creating ER diagrams, flow charts, UML diagrams
  • Days spent transferring knowledge to co-workers, because it's all in the mind, not in the code
  • Days spent bug-fixing
All of these are unquestioningly accepted because, in our trade, that's the status quo. No one knows how much the TechnicalDebt we accumulate when we fail to RefactorMercilessly is holding us down. No one knows if that Big Rewrite we had to undertake, and will surely have to again undertake if we continue to ignore the TechnicalDebt, could've turned out totally evitable had we regularly made the minimum payments in the form of small continuous refactorings. So, when you don't have to justify CopyAndPasteProgramming, why should you be expected to justify TestFirstProgramming? If anything, it's the lack of automated tests (written first or last) that should be questioned.

I wish it were the status quo that's questioned before we reached the point of questioning the appropriateness of what is arguably the only professional way of writing software.

I also wish the Flying Spaghetti Monster were considered no lesser a God than, say, Jesus Christ.