TDD Kata 10 – Anagrams

This is my weekly Kata post. Read the first one to learn what it is all about.

Last week: Print Diamond

To the Kata description
Scroll down to this weeks Kata description

My first implementation suffered from the anticipated problem: Starting with tests for the whole output for ‘A’, then ‘B’, then ‘C’, the full algorithm was implemented after the second or third test in one big step.

<?php
namespace Katas\Diamond;

class DiamondTest extends \PHPUnit_Framework_TestCase
{
    /**
     * @dataProvider dataDiamond
     */
    public function testDiamond($letter, $expected)
    {
        $diamond = new Diamond($letter);
        $this->assertEquals($expected, $diamond->__toString());
    }
    public static function dataDiamond()
    {
        $b = <<<TXT
 A
B B
 A
TXT;
        $c = <<<TXT
  A
 B B
C   C
 B B
  A
TXT;
        $d = <<<TXT
   A
  B B
 C   C
D     D
 C   C
  B B
   A
TXT;
        $e = <<<TXT
    A
   B B
  C   C
 D     D
E       E
 D     D
  C   C
   B B
    A
TXT;


        return [
            ['A', 'A'],
            ['B', $b],
            ['C', $c],
            ['D', $d],
            ['E', $e],
        ];
    }
}

In the following days, I exposed some internal properties per row to be able to test them one after each other, before tieing them together: the letter, the left offset and the gap between left and right letter. This is how the test looked like in the end for PHP:

<?php
declare(strict_types=1);

namespace Katas\Diamond;

class DiamondTest extends \PHPUnit_Framework_TestCase
{

    public static function dataLetters()
    {
        return [
            ['A',                'A'],
            ['B',           'A', 'B', 'A'],
            ['C',      'A', 'B', 'C', 'B', 'A'],
            ['D', 'A', 'B', 'C', 'D', 'C', 'B', 'A'],
        ];
    }

    /**
     * @dataProvider dataLetters
     */
    public function testLetters(string $letter, string ...$expectedLetters)
    {
        $diamond = new Diamond($letter);
        $this->assertEquals($expectedLetters, $diamond->letters());
    }

    public static function dataOffsets()
    {
        return [
            ['A',          0],
            ['B',       1, 0, 1],
            ['C',    2, 1, 0, 1, 2],
            ['D', 3, 2, 1, 0, 1, 2, 3],
        ];
    }
    /**
     * @dataProvider dataOffsets
     */
    public function testOffsets($letter, int ...$expectedOffsets)
    {
        $diamond = new Diamond($letter);
        $this->assertEquals($expectedOffsets, $diamond->offsets());
    }

    public static function dataGaps()
    {
        return [
            ['A',          -1],
            ['B',       -1, 1, -1],
            ['C',    -1, 1, 3, 1, -1],
            ['D', -1, 1, 3, 5, 3, 1, -1],
        ];
    }
    /**
     * @dataProvider dataGaps
     */
    public function testGaps($letter, int ...$expectedGaps)
    {
        $diamond = new Diamond($letter);
        $this->assertEquals($expectedGaps, $diamond->gaps());
    }

    public static function dataPrint()
    {
        $A = "A";
        $B = <<<TXT
 A
B B
 A
TXT;
        $C = <<<TXT
  A
 B B
C   C
 B B
  A
TXT;
        $D = <<<TXT
   A
  B B
 C   C
D     D
 C   C
  B B
   A
TXT;
        return [
            ['A', $A],
            ['B', $B],
            ['C', $C],
            ['D', $D],
        ];
    }

    /**
     * @dataProvider dataPrint
     */
    public function testPrint($letter, $expectedOutput)
    {
        $diamond = new Diamond($letter);
        $this->assertEquals($expectedOutput, $diamond->__toString());
    }
}

Here is the same in Ruby:

class PrintDiamondTest < Minitest::Test

    def assert_diamond_letters letter, expected
        assert_equal expected, Diamond.new(letter).letters
    end
    def test_diamond_letters
        assert_diamond_letters 'A', ['A']
        assert_diamond_letters 'B', ['A', 'B', 'A']
        assert_diamond_letters 'C', ['A', 'B', 'C', 'B', 'A']
    end

    def assert_diamond_offsets letter, expected
        assert_equal expected, Diamond.new(letter).offsets
    end
    def test_diamond_offsets
        assert_diamond_offsets 'A', [0]
        assert_diamond_offsets 'B', [1, 0, 1]
        assert_diamond_offsets 'C', [2, 1, 0, 1, 2]
    end
    
    def assert_diamond_gaps letter, expected
        assert_equal expected, Diamond.new(letter).gaps
    end
    def test_diamond_gaps
        assert_diamond_gaps 'A', [-1]
        assert_diamond_gaps 'B', [-1, 1, -1]
        assert_diamond_gaps 'C', [-1, 1, 3, 1, -1]
    end
    
    def assert_diamond_output letter, expected
        assert_equal expected, Diamond.new(letter).to_s
    end
    def test_diamond_output
        a = "A"
        b = <<-TXT
 A
B B
 A
TXT
        c = <<-TXT
  A
 B B
C   C
 B B
  A
TXT
        assert_diamond_output 'A', a.chomp
        assert_diamond_output 'B', b.chomp
        assert_diamond_output 'C', c.chomp
    end
end

And the implementation:

class Diamond
    def initialize letter
        @letter = letter
    end

    def to_s
        letters
            .zip(offsets, gaps)
            .map do |letter, offset, gap|
                ' ' * offset + letter + (gap > 0 ? ' ' * gap + letter : '')
            end
            .join "\n"
    end
    
    def letters
        mirror ('A'..@letter).to_a
    end
    
    def offsets
        mirror (0..distance).to_a.reverse
    end

    def gaps
        mirror (-1..distance * 2 - 1).step(2).to_a
    end
    
    private
    
    def mirror array
        array.concat(array.reverse[1..-1])
    end

    def distance
        @letter.ord - 'A'.ord 
    end
    
end

What did I learn? Divide and conquer! If you have to implement too much at once to make the next test pass, write a different test and start to test parts instead of the whole result. Even if that means to make previously internal properties public. If this is a problem, you can still make them private later and throw away the tests.

Tenth Kata: Anagrams

Given a file containing one word per line, print out all the combinations of words that are anagrams; each line in the output contains all the words from the input that are anagrams of each other. For example, your program might include in its output:

kinship pinkish
enlist inlets listen silent
boaster boaters borates
fresher refresh
sinks skins
knits stink
rots sort

If you run this on the word list here (http://codekata.com/data/wordlist.txt) you should find 20683 sets of anagrams (a total of 48162 words), including all-time favorites such as

crepitus cuprites pictures piecrust
paste pates peats septa spate tapes tepas
punctilio unpolitic
sunders undress
For added programming pleasure, find the longest words that are anagrams, and find the set of anagrams containing the most words (so “parsley players replays sparely” would not win, having only four words in the set).

Source and full description: http://codekata.com/kata/kata06-anagrams/

if working with files in tests turns out difficult for whatever reason, feel free to use other forms of input/output. Bonus points if you still manage to optimize for big word lists after making it just work.