Unit Test Generated PDFs with PHPUnit and PDFBox

Generating PDF documents is among the features that are hard to test with Unit Tests.

The command line tool PDFBox with the option ExtractText comes in handy:

This application will extract all text from the given PDF document.

This allows us to test the textual content of the document or to search for specific strings inside it.
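Extracted text tends to wrap wherever the PDF layout wraps, so a plain strpos() on the raw output can miss a phrase that spans two lines. A minimal sketch of a whitespace-tolerant check follows. The helper names normalizeWhitespace() and pdfTextContains() and the sample text are made up for this illustration; they are not part of PDFBox.

```php
<?php
/**
 * Collapse every run of whitespace (including line breaks) to a single space.
 */
function normalizeWhitespace($text)
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

/**
 * Check whether $needle occurs in the extracted text, ignoring line wrapping.
 * Hypothetical helper, for illustration only.
 */
function pdfTextContains($extractedText, $needle)
{
    return strpos(normalizeWhitespace($extractedText), normalizeWhitespace($needle)) !== false;
}

// Sample input standing in for the ExtractText output (invented for this example)
$text = "Invoice No. 2012-001\nTotal amount:\n  119.00 EUR\n";
var_dump(pdfTextContains($text, 'Total amount: 119.00 EUR')); // bool(true)
```

In a PHPUnit test case this could be wrapped in assertTrue().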

It gets interesting with the option -html, which converts the PDF to HTML instead. Thus structure and formatting become at least remotely testable.

Unfortunately the tool does not work with streams, so we have to use temporary files. Here is a simple example of a function that receives a PDF document as a string, converts it to HTML with PdfBox and returns the HTML string:

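/**
 * @param string $streamIn binary string with generated PDF
 * @return string HTML string
 */
function htmlFromPdf($streamIn)
{
  // tempnam() needs a directory and a file name prefix
  $pdf = tempnam(sys_get_temp_dir(), 'pdf');
  file_put_contents($pdf, $streamIn);
  $txt = tempnam(sys_get_temp_dir(), 'htm');
  // escapeshellarg() protects against spaces and special characters in the paths
  exec('java -jar pdfbox-app-x.y.z.jar ExtractText -encoding UTF-8 -html '
    . escapeshellarg($pdf) . ' ' . escapeshellarg($txt));
  $streamOut = file_get_contents($txt);
  unlink($pdf);
  unlink($txt);
  return $streamOut;
}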

For regression tests or refactoring it is sometimes enough to test that the generated PDF did not change in comparison to a reference PDF. This could be achieved with a hash value, but the PDF itself is not binary identical every time, probably due to timestamps. A hash of the converted HTML is sufficient, though:

        // In PHPUnit test case:
        $converter = new PdfBox();
        $html = $converter->htmlFromPdfStream($pdf);
        $this->assertEquals('336edd9ee49b57e6dba5dc04602765056ce05b91', sha1($html), 'Hash of PDF content');

In this example I use a self-written class PdfBox, which encapsulates the call to Apache PdfBox. The code is available under BSD Licence on GitHub: PHP PdfBox

PHP PdfBox

Requirements

  1. Java Runtime Environment, with “java” in the system path. To test this, run java -version on the command line. If you see information about the Java version, everything is fine.
  2. Apache PdfBox as executable JAR file. You can download it here: http://pdfbox.apache.org/downloads.html
  3. The PHP function exec() for executing system commands must not be disabled. On shared hosts it usually is disabled for security reasons; for local execution of Unit Tests it shouldn’t be a problem to allow exec(). PHP-CLI, i.e. PHP on the command line, usually uses a different php.ini configuration file than PHP-CGI for the web. The command php --ini shows which INI files are loaded in CLI mode. If necessary, edit these to remove exec from the disable_functions list.
  4. A PSR-0 compatible autoloader, as shipped with most frameworks. Otherwise you will need to include the individual PHP files yourself.
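Whether exec() is usable can also be checked from PHP itself before the tests run. The following is only a sketch: the function names isListedAsDisabled() and canUseExec() are invented for this example, and it checks both function_exists() and the disable_functions setting as a belt-and-braces approach.

```php
<?php
/**
 * Tell whether $function appears in a disable_functions style list
 * (comma separated, as in php.ini).
 */
function isListedAsDisabled($function, $disableFunctions)
{
    $listed = array_map('trim', explode(',', strtolower($disableFunctions)));
    return in_array(strtolower($function), $listed, true);
}

/**
 * Tell whether exec() can be called in the current PHP configuration.
 */
function canUseExec()
{
    return function_exists('exec')
        && !isListedAsDisabled('exec', (string) ini_get('disable_functions'));
}

echo canUseExec()
    ? "exec() is available\n"
    : "exec() is disabled, check the files listed by php --ini\n";
```

Run it with the same PHP binary that runs the tests, because CLI and web configurations can differ.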

Usage

First you’ll have to specify the full path to the PdfBox JAR. Afterwards you can call the conversion methods, for example:

use SGH\PdfBox;

//$pdf = GENERATED_PDF;
$converter = new PdfBox;
$converter->setPathToPdfBox('/usr/bin/pdfbox-app-1.7.0.jar');
$text = $converter->textFromPdfStream($pdf);
$html = $converter->htmlFromPdfStream($pdf);
$dom  = $converter->domFromPdfStream($pdf);

The following conversion methods exist:

  • string textFromPdfStream($content, $saveToFile = null)
  • string htmlFromPdfStream($content, $saveToFile = null)
  • DomDocument domFromPdfStream($content, $saveToFile = null)
  • string textFromPdfFile($fileName, $saveToFile = null)
  • string htmlFromPdfFile($fileName, $saveToFile = null)
  • DomDocument domFromPdfFile($fileName, $saveToFile = null)

The first parameter is either the PDF as binary string ($content) or the file name of a PDF ($fileName). The second parameter, if provided, is a file name for the output. The text or HTML will be saved in this file.

A few additional PdfBox options can be useful as well:

// Only extract pages 2-5
$converter->getOptions()
    ->setStartPage(2)
    ->setEndPage(5);

// ignore corrupt PDF objects
$converter->getOptions()
    ->setForce(true);

Everything else should be clear from the PhpDoc comments. Happy Testing!

PHP: References and Memory

Never ever use references in PHP just to reduce memory load. PHP handles that perfectly with its internal copy-on-write mechanism. Example:

$a = str_repeat('x', 100000000); // Memory used ~ 100 MB
$b = $a;                         // Memory used ~ 100 MB
$b = $b . 'x';                   // Memory used ~ 200 MB
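The figures in the comments above can be checked with memory_get_usage(). Here is a small sketch; the function name measureCopyOnWrite() is made up for this illustration, and it uses a smaller string so that it stays well below typical memory limits:

```php
<?php
/**
 * Report the memory in use (in bytes, relative to the start) after creating
 * a string, after assigning it to a second variable and after modifying
 * that second variable.
 */
function measureCopyOnWrite($size)
{
    $start = memory_get_usage();

    $a = str_repeat('x', $size);      // allocates the string once
    $created = memory_get_usage() - $start;

    $b = $a;                          // both variables share the same string
    $assigned = memory_get_usage() - $start;

    $b .= 'x';                        // writing forces a separate copy for $b
    $written = memory_get_usage() - $start;

    return array('created' => $created, 'assigned' => $assigned, 'written' => $written);
}

print_r(measureCopyOnWrite(10000000));
```

The value for "assigned" should stay close to "created", while "written" should be roughly twice as large.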

You should only use references if you know exactly what you are doing and need them for functionality (and that’s almost never, so you could as well just forget about them). PHP references are quirky and can result in some unexpected behaviour.

Question and Answer on StackOverflow

Problem: Double Accents (“^^”) in Windows 7

Something you should know: If you are using Windows 7 and get double accents when typing, you might have a serious problem, because this seems to be a side effect of any active keylogger. While it’s nice that keyloggers cannot run entirely unnoticed, it is annoying if you use a program that you want to log your keyboard actions.

In my case, the text expander FastFox was acting as a desired keylogger to supply shortcuts while I typed.

http://repos.zend.com/deb/zend.key 404 Not Found

If you want to install Zend Server CE on a Debian-based Linux (e.g. Ubuntu Server) with apt-get and follow one of the many installation guides out there, you might stumble upon this error, like I did.

http://repos.zend.com/deb/zend.key 404 Not Found

The solution is simple: The URL of the key has changed recently! Use http://repos.zend.com/zend.key and it will be fine!

Of course, the official guide has it right.

4 Tools To Make Magento Development Easier

I compiled a list of the most important Magento specific development tools in my toolbox. Additions are most welcome!

Greetings to the Magento Stammtisch Aachen guys, this is for you 😉

Magneto (sic!) Debug

http://www.magentocommerce.com/magento-connect/magneto-debug-8676.html

A free extension that integrates a developer toolbar in the Magento Layout where you find various information about the current page:

  • Request route
  • Loaded modules
  • Configuration summary
  • SQL profiler
  • Used layout handles
  • Rendered blocks/templates
  • Instantiated block classes (rendered or not)
  • One-click cache clearing

CommerceBug

http://store.pulsestorm.net/products/commerce-bug

Another integrated developer toolbar for $49.95 that offers mostly the same features but with two important additions:

  • Complete merged layout XML with all handles
  • Ability to log the information for all requests

EcomDev PHPUnit

http://www.ecomdev.org/shop/code-testing/php-unit-test-suite.html

PHPUnit extension for unit tests and integration tests. Includes database fixtures, configuration tests and more. See PDF documentation for details.

MageTool

https://github.com/alistairstead/MageTool

Command line tool for shop management and extension development. Some features:

  • Automatically create extensions and class files
  • Manage configuration
  • Manage cache, compiler and indexer

Print All SQL Queries in Magento

Activate the Zend SQL Profiler with the following node in your local.xml:

    <resources>
     <default_setup>
      <connection>
       <profiler>1</profiler>
      </connection>
     </default_setup>
    </resources>

Then you can access the profiler somewhere in your code and retrieve a lot of information about all executed queries:

$profiler = Mage::getSingleton('core/resource')
    ->getConnection('core_write')->getProfiler();

To simply output all queries:

print_r($profiler->getQueryProfiles());
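The print_r() output gets long quickly. A more readable listing could be built as in the following sketch. The function name formatQueryProfiles() is invented for this example. It only assumes that each profile object offers getQuery() and getElapsedSecs(), as Zend_Db_Profiler_Query does, and that getQueryProfiles() returns false when nothing was profiled.

```php
<?php
/**
 * Turn query profiles into one line per query, slowest first.
 *
 * @param array|false $profiles return value of $profiler->getQueryProfiles()
 * @return string
 */
function formatQueryProfiles($profiles)
{
    if (!$profiles) {
        return "No queries profiled.\n";
    }
    // Sort descending by elapsed time
    usort($profiles, function ($a, $b) {
        if ($a->getElapsedSecs() == $b->getElapsedSecs()) {
            return 0;
        }
        return $a->getElapsedSecs() < $b->getElapsedSecs() ? 1 : -1;
    });
    $lines = '';
    foreach ($profiles as $profile) {
        $lines .= sprintf("%8.2f ms  %s\n", $profile->getElapsedSecs() * 1000, $profile->getQuery());
    }
    return $lines;
}

// In Magento: echo formatQueryProfiles($profiler->getQueryProfiles());
```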

Question and answer on StackOverflow

Magento: Rewrite Shipping Method

There is a way to rewrite carrier classes of shipping methods but it is not obvious and required me to browse the shipping module source:

If you look at Mage_Shipping_Model_Config, you will discover that the code used as parameter for Mage::getModel() is taken from the store configuration. This code is NOT the standard code like 'shipping/carrier_tablerate', so overriding it the usual way does not work.

First you have to find out what this code is. For example, I wanted to override the matrixrate carrier, so I tested it like this:

$carrierConfig = Mage::getStoreConfig('carriers/matrixrate');
var_dump($carrierConfig['model']);

Yes, you can put this code anywhere on the page temporarily, but it is useful to have a separate file for such things that can be run from the command line (starting with Mage::app() to initialize Magento).

In my case the code was matrixrate_shipping/carrier_matrixrate, so I had to change my config.xml like this:

<global>
    <models>
        <matrixrate_shipping>
            <rewrite>
                <carrier_matrixrate>my_class_name</carrier_matrixrate>
            </rewrite>
        </matrixrate_shipping>
    </models>
</global>

instead of

<global>
    <models>
        <matrixrate>
            <rewrite>
                <carrier_matrixrate>my_class_name</carrier_matrixrate>
            </rewrite>
        </matrixrate>
    </models>
</global>

Question and answer on StackOverflow

PHP: “Mocking” built-in functions like time() in Unit Tests

A common problem in Unit Testing in PHP is testing something that depends on the current time. For a deterministic test it should be possible to set the time in your test script without really changing the system settings. In this article I’ll describe how it is usually done with OOP and then come to an alternative solution with much less code that makes use of the new features in PHP 5.3.

The usual approach would be a wrapper class like this:

class Calendar
{
    public function time()
    {
        return time();
    }
    public function date($format, $time = null)
    {
        return date($format, $time ?: $this->time());
    }
    // ...
}

Now any class that uses date/time functions has to be modified to use the Calendar class via Dependency Injection:

class SomeClass
{
    /**
     * @var Calendar
     */
    private $calendar;

    public function __construct(Calendar $calendar = null)
    {
        $this->calendar = $calendar ?: new Calendar;
    }
    public function oneHourAgo()
    {
        return $this->calendar->date('H:i:s', $this->calendar->time() - 3600);
    }
}

Then you mock the Calendar class in your tests and pass it to the test subject. I won’t go into further details because you probably know the concept of mocking and how to do this in your favourite unit testing framework. After all, this article is not about mocking classes, because I have:

A simpler solution with namespaces

If you are using PHP 5.3 namespaces you are lucky because you won’t need all this overhead and probably no changes in your classes at all. The trick is to override built-in functions in your current namespace. Consider this namespaced version of the class from above:

namespace My\Namespace;

class SomeClass
{
    public function oneHourAgo()
    {
        return date('H:i:s', time() - 3600);
    }
}

As you can see, no overhead, just a straightforward call to date() and time(). To test this with specific times we implement a test case as follows (Example in PHPUnit but works as well with other frameworks):

namespace My\Namespace;

require_once 'PHPUnit/Framework/TestCase.php';

/**
 * Override time() in current namespace for testing
 *
 * @return int
 */
function time()
{
	return SomeClassTest::$now ?: \time();
}

class SomeClassTest extends \PHPUnit_Framework_TestCase
{
	/**
	 * @var int $now Timestamp that will be returned by time()
	 */
	public static $now;

	/**
	 * @var SomeClass $someClass Test subject
	 */
	private $someClass;

	/**
	 * Create test subject before test
	 */
	protected function setUp()
	{
		parent::setUp();
		$this->someClass = new SomeClass;
	}
	/**
	 * Reset custom time after test
	 */
	protected function tearDown()
	{
		self::$now = null;
	}

	/*
	 * Test cases
	 */
	public function testOneHourAgoFromNoon()
	{
		self::$now = strtotime('12:00');
		// oneHourAgo() formats with 'H:i:s', so seconds are part of the result
		$this->assertEquals('11:00:00', $this->someClass->oneHourAgo());
	}
	public function testOneHourAgoFromMidnight()
	{
		self::$now = strtotime('0:00');
		$this->assertEquals('23:00:00', $this->someClass->oneHourAgo());
	}
}

The crucial point here is that we implement a new function named exactly like a built-in function. You cannot replace functions, but since this one is defined in the namespace \My\Namespace it does not replace anything. In fact it is a new function with the fully qualified name \My\Namespace\time().

The test subject now calls time() as an unqualified name, so PHP looks for the function in the current namespace first. That is \My\Namespace\time() in our example. I recommend the section about name resolution rules in the manual for further reading.
Important Implication: It does not work if you use the global functions with fully qualified names (i.e. \time()) in your test subjects!
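The resolution rule and its implication can be tried out without any test framework. The following standalone sketch uses invented names (namespace Demo\Clock, class FakeTime, the two oneHourAgo functions); it is meant as an illustration of the mechanism, not as a replacement for the test case above.

```php
<?php
namespace Demo\Clock;

// Holder for the fake timestamp; null means "use the real clock".
final class FakeTime
{
    public static $now = null;
}

// Shadows the built-in time() for unqualified calls inside this namespace.
function time()
{
    return FakeTime::$now !== null ? FakeTime::$now : \time();
}

// Unqualified call: resolves to Demo\Clock\time(), so the fake time is used.
function oneHourAgo()
{
    return date('H:i:s', time() - 3600);
}

// Fully qualified call: always the built-in time(), the override is bypassed.
function oneHourAgoQualified()
{
    return date('H:i:s', \time() - 3600);
}

date_default_timezone_set('UTC');
FakeTime::$now = gmmktime(12, 0, 0, 1, 1, 2012); // noon UTC
echo oneHourAgo(), "\n";          // prints 11:00:00
echo oneHourAgoQualified(), "\n"; // still based on the real clock
FakeTime::$now = null;            // back to the real clock
```

Note that date() is not overridden here, so the unqualified call falls back to the built-in function.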

You can implement this function however you like. I decided to make the return value configurable within the test case via a static property that gets reset after each test; if it is not set, the real time is used.

I hope this solution helps. It may feel hackish, but for me it made testing a lot easier!