
Showing posts with label cognition. Show all posts

Friday, December 13, 2019

Good tests tell a story. Good stories don't follow a template.

Each word I write drops a little more of me onto the page. In time, I will be the book, the book will be me, and the story will be told. - Very Short Story

Kent Beck and Kelly Sutton have been sharing a series of very helpful byte-sized videos about the desirable properties of tests. The one on readability suggests that tests should read like a story. In it, they explain with an example and a movie analogy how a test should be a self-contained story. I'd like to push that analogy of a test telling a story and the idea of self-containment further.

So, first, here's the test code as shown in the video:

RSpec.describe MinimumWage::HourlyWageCalculator do
  describe '.calculate' do
    subject { described_class.calculate(hours, rate, zip_code) }
    
    let(:hours) { 40 }
    let(:rate) { 8 }
    let(:zip_code) { nil }
    
    it "calculates the total hourly wages" do
      expect(subject).to eq(320)
    end
    
    context "zip_code is in San Francisco" do
      let(:zip_code) { 94103 }
      
      context "rate is below $15 per hour" do
        let(:rate) { 14 }
        
        it "uses $15 as the rate" do
          expect(subject).to eq(600)
        end
      end

      context "rate is above $15 per hour" do
        let(:rate) { 16 }
        
        it "uses the given rate" do
          expect(subject).to eq(640)
        end
      end
    end
  end
end

Apart from the reasons discussed in the video, I think the test fails to tell a good story for a number of reasons mostly to do with what appears to be conventional RSpec usage.

First off, Kelly begins his explanation of the story with: For this class, for this method, here's what should happen... [...] reading this aloud in my head, I'm, like, "For the 'calculate' method, it calculates the total hourly wages."

Now, I don't think talking about methods makes for a great story. Look at the output from running the tests for a moment:

MinimumWage::HourlyWageCalculator
  .calculate
    calculates the total hourly wages
  ...

I've explained elsewhere why that's not a great story, and how you could tell a better one by emphasizing behavior over structure.

Next, when Kelly moves the magic number 40 (hours/week) to a method, Kent claims that the test is less readable now than it was before. He explains that the relationship between hours and rate and the expected value is really clear if we can just see the 40 right there. I think that using a stricter definition of right there, we can vastly improve the clarity of our tests further. A DRYing process like the one Kent talks about (extracting some setup code, moving the setup to a factory, ...) is just one way we may end up moving related elements from right there to further apart. However, the boilerplate RSpec style I see in most code bases and code examples is a much more common cause of unnecessary distance between related elements relevant to a test.

Kent describes how even a 2-line test where you've gone too far with DRYing becomes completely incomprehensible unless you've read the other [n] lines of code outside the test. Let's look at how that's true for tests in this example even without the DRYing, just due to the RSpec DSL boilerplate.

it "calculates the total hourly wages" do
  expect(subject).to eq(320)
end

To understand that, you'd first have to know this:
subject { described_class.calculate(hours, rate, zip_code) }
But to understand that, you'd have to know this:
RSpec.describe MinimumWage::HourlyWageCalculator do
and this:
let(:hours) { 40 }
and this:
let(:rate) { 8 }
and this:
let(:zip_code) { nil }
Consider this DSL-light version of the same tests instead:

RSpec.describe MinimumWage::HourlyWageCalculator do
  def weekly_wage(rate:, zip_code: nil, work_hours: 40)
    MinimumWage::HourlyWageCalculator.calculate(work_hours, rate, zip_code)
  end

  it "calculates the total weekly wages" do
    expect(weekly_wage(rate: 8)).to eq(320)
  end

  context "zip_code is in San Francisco" do
    let(:zip_code_in_sf) { 94103 }

    it "uses $15 as the rate when rate is below $15 per hour" do
      expect(weekly_wage(rate: 14, zip_code: zip_code_in_sf)).to eq(600)
    end

    it "uses the given rate when rate is above $15 per hour" do
      expect(weekly_wage(rate: 16, zip_code: zip_code_in_sf)).to eq(640)
    end
  end
end

This version eschews the multiple lets in favor of a fixture built using Inline Setup. It hides Irrelevant Information from the tests behind an evocatively named Creation Method. Besides using less space to convey the same information, I claim this alternative scores a lot better on the self-contained story dimension. Pull out any individual test (i.e. just the it block), and you still have the whole story without any reference to any external or surrounding context. For instance, look at this example:

it "uses $15 as the rate when rate is below $15 per hour" do
  expect(weekly_wage(rate: 14, zip_code: zip_code_in_sf)).to eq(600)
end

Looking at it, you can tell, using Kent's own words:
  • Here's the characters (rate: $14, zip code: somewhere in SF, duration: a week)
  • Here's the action (compute the weekly wage)
  • Here're the consequences (computed wage being 600, proving the rate used was $15)
For comparison, here's the original test stripped down to the bare essentials required to tell the story of just that one case:

RSpec.describe MinimumWage::HourlyWageCalculator do
  describe '.calculate' do
    subject { described_class.calculate(hours, rate, zip_code) }

    let(:hours) { 40 }

    context "zip_code is in San Francisco" do
      let(:zip_code) { 94103 }

      context "rate is below $15 per hour" do
        let(:rate) { 14 }

        it "uses $15 as the rate" do
          expect(subject).to eq(600)
        end
      end
    end
  end
end
That's 15 lines of screen space (discounting blank lines) used up to tell the 3-line story we've just seen above in the DSL-light version.

So, let's not forget that having a DSL doesn't mean having to use a DSL. Use your testing framework’s convenience helpers sparingly. Strive to write each test as a self-contained story.

Monday, February 8, 2016

Cognitive load testing, anyone?

There are burdens which are bad and blameworthy, and these it is our duty at once to cast away. - James Hamilton

A team-mate recently sought help over email while debugging a piece of JavaScript code that he was trying to get working. He had a working reference implementation in Python, but the JS equivalent didn't seem to be working. Below are the Python and JS snippets he shared, asking if anybody could spot the difference.

Python code:

self.Loan_amount/((1.0-(1.0/(1+(self.Interest_rate/(self.Payment_peryear*100)))**(self.Payment_peryear*self.Payment_period)))/(self.Interest_rate/(self.Payment_peryear*100)))

JS code:

var paymentPerYear = totalPayments($scope.loan.interestRepaymentTerm);
parseInt($scope.loan.loanAmount) / ((1.0 - (1.0 / Math.pow(1 + (parseFloat($scope.loan.interest) / (paymentPerYear * 100)), (paymentPerYear * parseInt($scope.loan.tenor))))) / (parseFloat($scope.loan.interest) / (paymentPerYear * 100)));

It took just a glance for me to realize that spotting the difference in the pieces of code as they were written was gonna take me dangerously close to the limits on my capacity for processing information, what with up to 7 - 2 = 5 levels of nesting in the code. So, instead of trying to rise up to the challenge, I decided to begin by lowering the challenge for myself (a strategy I've found generally useful in software development).

The code and the situation here, even though almost trivial, present an instructive example of how programmers often fail to consider - to their own and their teams' detriment - the effects of cognitive load when both reading and writing code. At such a small scale, such failures may seem like minor condonable infractions, but they add up superlinearly to the difficulty of understanding, and resultantly to that of debugging and modifying code. So, let's look at some of the problems.

First, for the JS code:
  • The last line is the meat of the solution, and it's supposed to be a simple mathematical formula to calculate equated periodic installments. However, the formula, which is the signal here, seems to be somewhat drowning in the noise of incidental and irrelevant details like $scope, parseInt and parseFloat.
  • The names are a bit misleading. paymentPerYear seems to indicate it stands for an amount, whereas it really is meant to be the number of paymentsPerYear (note the plural). interest should actually be interestRate.
Now, for the Python code:
  • The only noise is the self references. While I really appreciate the beauty and the power of the self in Python, I feel it often hurts readability by just the slightest amount.
  • Thelackofanyspacinginthecodemakesithardertocomprehend.
  • The inconsistent naming forces you to do a double-take every once in a while. You see the first variable, Loan_amount, and tentatively assume the naming convention to be 'snake case starting with an uppercase first letter'. You see the second variable, Interest_rate, and it kinda confirms your assumption. And then you see... wait a minute, Payment_peryear. Double-take... Payment_per_year. Ah, that wasn't too bad. Except that now you know you can't just cruise along at a comfortable consistent speed on this road - you gotta be prepared to backtrack every once in a while, or simply slow down throughout.
Now, coming to spotting the difference. When diffing as well as debugging, I find it useful to strip out all the noise so you can focus on the signal and analyze it. When diffing, the commonalities are the noise, while the differences are the signal. In order to strip out the commonalities though, you first need to play them up and make them obvious. So, here's what I did to make the commonalities (or is it the differences?) stand out:
  • insert some spacing for readability
  • get rid of the following extraneous things: self, $scope.loan., parseInt and parseFloat
  • factor out some duplication into a variable: r = interestRate / (paymentPerYear * 100)
  • factor out a computation into a variable: n = paymentPerYear * tenor
  • rename loanAmount to P
And here's what I got:

P / ((1.0 - (1.0 / ((1 + r) ** n))) / r) # Python

P / ((1.0 - (1.0 / Math.pow(1 + r, n))) / r) // JS

Or better yet (just the Python version):

(P * r) / (1.0 - (1.0 / ((1 + r) ** n))) # Python

In both cases, I see a somewhat convoluted form of the formula for periodic amortization payment. I can't spot anything but syntactical differences between the two versions, and the only clue about what could be wrong with the JS code I have is a few guesses. But that's beside the point.
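For reference, the formula both snippets appear to encode is the usual periodic amortization payment, P * r / (1 - (1 + r)^-n). Here is a minimal Python sketch of it with descriptive names. The function names and parameter names are my own and do not come from either original snippet, and it assumes a non-zero interest rate (r = 0 would divide by zero).

```python
def periodic_payment(loan_amount, annual_rate_percent, payments_per_year, years):
    """Equated periodic installment for a fully amortizing loan.

    All names here are hypothetical, chosen for readability.
    Assumes annual_rate_percent is non-zero.
    """
    # Rate per period as a fraction; the originals divide by (paymentPerYear * 100).
    r = annual_rate_percent / (payments_per_year * 100)
    # Total number of payments over the life of the loan.
    n = payments_per_year * years
    return (loan_amount * r) / (1.0 - (1.0 / ((1 + r) ** n)))


def original_form(P, r, n):
    """The shape both original snippets reduce to once the noise is stripped."""
    return P / ((1.0 - (1.0 / ((1 + r) ** n))) / r)
```

Both functions compute the same value, since dividing P by (x / r) is the same as multiplying P by r and dividing by x. Only the amount of bracket-matching the reader has to do differs.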

The point is, I had to expend effort I think was undue to transform both the versions of the code into a form that wouldn't strain my brain before I could say anything about them. The code would've turned out much better if the original author(s) had paid attention to minimizing the cognitive load for themselves and future readers and maintainers. With up to six levels of nesting (requiring the building up and subsequent winding down of a stack that deep in your mind) and the problems listed above, and having seen what the mathematical formula actually is, it would almost seem as if the original version was set up to obfuscate the solution rather than reveal it. This last version, with two-thirds the levels of nesting, is a direct and concise transliteration of the mathematical formula into programming language syntax, making it so transparent and easy to test that it highly minimizes, even nearly obviates, the possibility of bugs and the need to debug. As a bonus, one wouldn't need to seek help to spot the differences between versions in two different programming languages.
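The nesting counts are easy to check mechanically. The sketch below counts the deepest level of parenthesis nesting in a string. The helper name max_paren_depth is made up for this illustration, and it is only a crude proxy: it ignores parentheses inside string literals or comments and says nothing about naming or spacing.

```python
def max_paren_depth(expression):
    """Return the deepest level of parenthesis nesting in a source string."""
    depth = 0
    deepest = 0
    for ch in expression:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


# The original Python one-liner, copied verbatim from the email.
original = (
    "self.Loan_amount/((1.0-(1.0/(1+(self.Interest_rate/(self.Payment_peryear*100)))"
    "**(self.Payment_peryear*self.Payment_period)))/(self.Interest_rate/(self.Payment_peryear*100)))"
)
# The last simplified version.
simplified = "(P * r) / (1.0 - (1.0 / ((1 + r) ** n)))"

print(max_paren_depth(original))    # 6
print(max_paren_depth(simplified))  # 4
```

Six versus four matches the "two-thirds the levels of nesting" figure.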

In his excellent book "Specification by Example", Gojko Adzic says this about examples used to illustrate specifications:
With traditional specifications, examples appear and disappear several times in the software development process. Everyone [e.g. business analysts, developers and testers] invents their own examples, but there’s nothing to ensure that these examples are even consistent, let alone complete. In software development, this is why the end result is often different from what was expected at the beginning. To avoid this, we have to prevent misinterpretation between different roles and maintain one source of truth.
I think this description very nearly fits what most often happens with program source code as well. Every reader/maintainer ends up creating a less noisy representation of the code in their head while trying to understand it, and most often, in the absence of MercilessRefactoring, there it disappears into a black hole, only to be recreated the next time the code is read. So, the burden of the cognitive load is created once when writing the code, but is borne by everyone who has to read the code every time they do it... "create once carry forever", anyone?

We commonly subject our code to load tests to understand how much load they can handle. How about if our code could undergo some form of cognitive load tests so we could gauge how much cognitive load it would impose? Now, I wish "cognitive load testing" were a thing.