TDD Kata 06 - Roman Numerals - schmengler-se.de

This is my weekly Kata post. Read the first one to learn what it is all about.

Last week: Karate Chop

This is going to be a longer post because my goal was to follow the kata description and try out five different approaches. I will explain all of them:

Approach 1: Object oriented PHP

My goal in this approach was to use as little PHP functions as possible and implement the algorithm in an object oriented way

The final solution had the following classes:

interface IntegerHaystack
{
    public function positionOf(int $needle) : int;
    public function isEmpty() : bool;
}
class SortedIntegers implements IntegerHaystack
{
    public function __construct(int ...$integers) { ... }
    ...
}
class SortedIntegersRange implements IntegerHaystack
{
    public function __construct(SortedIntegers $sortedIntegers, int $indexFrom, int $indexTo) { ... }
    ...
}
class SortedIntegersElement implements IntegerHaystack
{
    public function __construct(SortedIntegers $sortedIntegers, int $index) { ... }
    ...
}
class ListEmpty extends \RuntimeException { }
class NotFound extends \RuntimeException { }

SortedIntegers is an immutable list of integers and SortedIntegersRange represents a range within a SortedIntegers object. I’ll explain SortedIntegersElement later.

The test case itself was one very simple PHPUnit test with data provider. I created a function according to the specification that looked like this in the end:

function chop(int $needle, int ...$haystack) : int
{
    try {
        $sorted = new SortedIntegers(...$haystack);
        return $sorted->positionOf($needle);
    } catch (NotFound $e) {
        return -1;
    } catch (ListEmpty $e) {
        return -1;
    }
}

Before my final refactoring, SortedIntegersRange::positionOf looked like this:

    public function positionOf(int $needle) : int
    {
        if ($this->indexTo > $this->indexFrom) {
            switch ($this->sortedIntegers->at($this->indexMiddle()) <=> $needle) {
                case 1:
                    return $this->subRangeTo($this->indexMiddle() - 1)->positionOf($needle);
                case -1:
                    return $this->subRangeFrom($this->indexMiddle() + 1)->positionOf($needle);
                default:
                    return $this->indexMiddle();
            }
        }
        if ($this->sortedIntegers->at($this->indexFrom) === $needle) {
            return $this->indexFrom;
        }
        throw new NotFound();
    }

I was not happy with it because all the conditions. For example it was not obvious that the second if statement is made for the case that the range only has one element. After I made it explicit with a private method isSingleElement(), I noticed that it would not make a difference to always check the first element before halving the range if we assume that a comparison between two integers in the list costs the same than a comparision between two indexes.

Thus, the next iteration was:

    public function positionOf(int $needle) : int
    {
        try {
            if ($this->sortedIntegers->at($this->indexFrom) === $needle) {
                return $this->indexFrom;
            }
            switch ($this->sortedIntegers->at($this->indexMiddle()) <=> $needle) {
                case 1:
                    return $this->subRangeTo($this->indexMiddle() - 1)->positionOf($needle);
                case -1:
                    return $this->subRangeFrom($this->indexMiddle() + 1)->positionOf($needle);
                default:
                    return $this->indexMiddle();
            }
        } catch (\OutOfRangeException $e) {
            throw new NotFound();
        }
    }

I handled the case of a one-element range with no matching element by throwing out of range exceptions in subRangeFrom() and subRangeTo() if the start or end point are not within the current range.

Now the method looked a bit clearer, but using exceptions for control flow seemed like cheating, so I gave the previous approach another try. I introduced a new class for a range with one element, SortedIntegersElement, a method isEmpty() that returns true if from > to and made the rest a bit more explicit.

I considered replacing the switch statement and the spaceship operator with “if” conditions, but switching the arguments and reordering the cases already helped a lot to make it more clear:

    public function positionOf(int $needle) : int
    {
        if ($this->isEmpty()) {
            throw new NotFound;
        }
        if ($this->isSingleElement()) {
            return $this->firstElement()->positionOf($needle);
        }
        switch ($needle <=> $this->valuePivot()) {
            case -1:
                return $this->firstHalf()->positionOf($needle);
            case 0:
                return $this->indexPivot();
            case 1:
                return $this->secondHalf()->positionOf($needle);
        }
    }

Now it is clear that there are three cases, empty range, single element range and range with >1 elements. Also all lower level details are hidden in private methods, following the “Single Level of Abstraction” principle.

Off-by-one Errors

As the Kata description predicted, I ran into an off-by-one error because I first used the position of the pivot element as new border for the subset instead of its neighbors. This resulted in infinite recursion in some cases.

Full solution

Approach 2: Esoteric

I knew it would be difficult to come up with five different approaches, so I tried to think outside the box early and solved the Kata in CJam, an esoteric programming language (What’s this?). CJam is stack based, but also has variables, and all operators/functions consist of only one or two characters.

This is the program

ri:A;{r:I}{RIi+:R;}wR,1-:N;W:F;{FW=TN1+<e&}{NTm2/iT+:P;RP=A=P-1?:F;RP=A<{P1+:T;}{P1-:N;}?}wF

I “worked” with languages like this before for fun, but bringing TDD to them was new to me. Since CJam does not have its own testing capacities, I wrote the test in Bash:

#!/usr/bin/bash
function test {
    RES=`echo $1 |  java -jar ./cjam-0.6.5.jar ./kc.cjam`
    if [ "$RES" == "$2" ]; then
            echo  -n "."
    else
            FAILURES=$FAILURES"Input $1 - expected $2, got $RES\n"
            echo -n "F"
    fi
}
FAILURES=""
test "3" -1
test "3 1" -1
test "1 1" 0

test "1 1 2" 0
test "2 1 2" 1

test "2 3 5 7" -1
test "3 3 5 7" 0
test "4 3 5 7" -1
test "5 3 5 7" 1
test "6 3 5 7" -1
test "7 3 5 7" 2
test "8 3 5 7" -1

echo
echo -e $FAILURES

And using TDD worked really well here. So, having no test framework is no excuse to write no tests!

Errors

I avoided the same off-by-one error from the first approach, lesson learned! I still ran into infinite loops sometimes, but only because I still was figuring out how the various CJam operators work.

Approach 3: Recursive Ruby Function

On the third day I implemented a recursive function in Ruby that passes a part of the array to itself:

def chop needle, haystack
    begin
        chop_recursive needle, haystack
    rescue
        return -1
    end
end
def chop_recursive needle, haystack
    pos = (haystack.length / 2).floor
    if haystack[pos] == needle then
        return pos
    elsif haystack.length <= 1 then raise 'not found' elsif haystack[pos] > needle then
        return chop_recursive(needle, haystack[0..pos-1])
    else
        offset = pos+1
        return offset + chop_recursive(needle, haystack[offset..-1])
    end
end

Errors

I did not want to add offset as parameter to the recursive function, and so one issue that occured was that if the needle was not found in the second half of the array, the offset was added to -1. I solved it using a “not found” exception eventually.

Approach 4: Plain old loop in PHP

On the fourth day, I went back to PHP, but in an old school procedural way.

<?php
declare(strict_types=1);
namespace Kata\Loop;

function chop(int $needle, int ...$haystack) : int
{
    $left = 0;
    $right = count($haystack) - 1;
    do {
        $pos = middle($left, $right);
        if ($left > $right || !isset($haystack[$pos])) {
            return -1;
        }
        if ($haystack[$pos] < $needle) {
            $left = $pos + 1;
        } else {
            $right = $pos - 1;
        }
    }
    while ($haystack[$pos] !== $needle);
    return $pos;
}

function middle(int $left, int $right) : int
{
    return $left + (int)floor(($right - $left) / 2);
}

I realized that the algorithm is pretty much what I did in CJam (while loop, with $left and $right pointers), just in a different language.

I started with a single “if”, which worked for 0-1 elements, then refactored it into a loop. What was interesting is that as soon as I got the loop working for two elements, it also worked with more elements, without further ado. I’m not sure if I did too much in advance or the form of a non-linear while loop forced me to do it right.

Approach 5: –

I’m honest, I did not have time to come up with a really different approach. I thought about using a functional language or get familiar with ES6, but the initial effort did not fit into my schedule. So instead of doing nothing, I wrote a recursive function again, this time in PHP. Not much to tell here, except that it’s quite ugly, with array_slice() and so on.

Sixth Kata: Roman Numerals

Our next kata is “Roman Numerals”: Use TDD to implement a converter of (arabic) numbers to roman numbers (I, III, III, IV, V, …). The system is described here. With which numbers will you start? As a bonus, try to implement the reverse as well, a converter from roman to arabic.