This is What Sufficiently Tested Looks Like

I recently read this page on How SQLite Is Tested.

This is seriously well-tested software. They have 100% branch coverage for their most common configuration (I have in the past recommended against trying to achieve this goal). Their test code outweighs their actual source on a SLoC basis by a factor of over 1000. They even do memory allocation failure insertion testing, and write() failure insertion testing.
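For anyone who hasn't met failure insertion testing before, here is a minimal sketch of the idea in Python. It is not SQLite's harness; `FailingWriter`, `save_records` and `run_failure_sweep` are hypothetical names I made up for illustration. The assumption is the usual sweep pattern: rig the first write() to fail, run the operation, check it failed cleanly, then rig the second write() to fail, and so on until the operation runs to completion without hitting an injected failure.

```python
import io


class FailingWriter:
    """Hypothetical file-like wrapper whose Nth write() call raises OSError."""

    def __init__(self, target, fail_on):
        self.target = target
        self.fail_on = fail_on  # 1-based index of the write() call that fails
        self.calls = 0

    def write(self, data):
        self.calls += 1
        if self.calls == self.fail_on:
            raise OSError("injected write failure")
        return self.target.write(data)


def save_records(out, records):
    """Hypothetical code under test: writes one line per record."""
    for rec in records:
        out.write(rec + "\n")


def run_failure_sweep(records):
    """Fail the 1st write, then the 2nd, and so on, until a run completes.

    Returns the number of runs in which a failure was actually injected.
    """
    fail_on = 1
    while True:
        sink = io.StringIO()
        writer = FailingWriter(sink, fail_on)
        try:
            save_records(writer, records)
        except OSError:
            # The invariant checked after a failure: only whole lines
            # ever reached the underlying sink.
            written = sink.getvalue()
            assert written == "" or written.endswith("\n")
            fail_on += 1
            continue
        # No injected failure was reached, so the output must be complete.
        assert sink.getvalue() == "".join(r + "\n" for r in records)
        return fail_on - 1
```

Calling `run_failure_sweep(["a", "b", "c"])` exercises three separate failure points, one per write. The same shape works for allocation failures in C, with an instrumented allocator standing in for the wrapper.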

I look at this and my first reaction is that I want to cry from frustration and envy. Why can’t I work on software that is as well tested as this?

Basically, the answer is that most software teams don’t treat testing as a priority and do it begrudgingly. It’s like documentation, but even more boring.

As a software developer I understand this attitude entirely. “We’ll test it another day”. “I want to fix one more bug today”. “There isn’t time”. “Hey, it worked when I wrote it, have you broken it?” I’ve said all these things at one time or another.

But as a person who cares about software, this attitude really grinds my guts. Writing tests is a useful and under-appreciated skill that should be a mandatory part of the programming experience. Like performing code reviews.

This is why one of the first things I did when starting to work on the Cyrus IMAP server at Opera two years ago was ask, “What’s our test suite?” The answer was “Test suite? What test suite?” Months of hard yakka later, we now have two test suites for Cyrus IMAP.

Unit tests: written in C using the CUnit unit test framework, these implement small tactical tests which access APIs in the Cyrus libraries directly.

Cassandane integration tests: written in Perl using the Test::Unit unit test framework and a lot of supporting Perl code, these start up temporary, carefully managed instances of Cyrus servers, one or more per test, and connect to them using the Mail::IMAPTalk IMAP client module.

As you can see from the Cyrus Continuous Integration server, there are now nearly five hundred test cases. Those tests are run twice daily.

It’s not enough. It’s not nearly enough. There should be ten times as many tests. There are entire subsystems and modes of operation (coughmurdercough) which are more or less completely untested but have experienced non-trivial changes since the last major release. Our line coverage is 29%.

But on the other hand… when I started, our line coverage was 0%. The test suites are useful; they’ve found bugs; we’ve even had people who aren’t me contribute to them. We suck much less than when I started. I feel proud, like I’ve made a difference to the project’s culture.

And BTW I do not recommend CUnit, it’s rubbish. That’s why I wrote NovaProva instead.
