PDF Generation in Magento 2

The Magento core methods for generating PDF files are rather inflexible. An alternative is to use tools that convert HTML to PDF.

In our current Magento 2 project, which we (integer_net) are developing together with Stämpfli AG, there is a requirement to dynamically create a PDF catalog from selected products, with almost the same layout as the product lists in the shop. Generating this PDF from HTML was therefore the obvious choice.

In this article I present our solution, which integrates wkhtmltopdf with the Magento layout. At the end you will find a link to the base module on GitHub.

Minimal Example

This is how easily the module can be used if you want to deliver a layout as PDF instead of HTML:

use Staempfli\Pdf\Model\View\PdfResult;

class Example extends \Magento\Framework\App\Action\Action
{
  public function execute()
  {
    $result = $this->resultFactory->create(PdfResult::TYPE);
    return $result;
  }
}

So we use the same mechanism as the core result types, but with a new “Result” type (PDF instead of “Page” or “Layout”).

But this is just the minimal example. You can modify the PDF in many ways (with $result->addPageOptions() and $result->addGlobalOptions()). Other actions are possible as well, such as sending the PDF by email or saving it as a file. The code doesn’t even have to run within a controller.

Module Structure

Let’s have a look behind the scenes:

    wkhtmltopdf             File based command line tool
        ^
        |
    PHP WkHtmlToPdf         Slim PHP wrapper (external library)
        ^
        |
+-------|-------------------+
|       |                   |
|   PDF Engine Adapter      |
|       |                   |
|       v                   |
|   Independent Service     |   Staempfli_Pdf Magento 2 module
|       ^                   |
|       |                   |
|   Magento Integration     |
|       |                   |
+-------|-------------------+
        |
        v
    Magento Framework    

The PDF Engine: wkhtmltopdf

“wk” in wkhtmltopdf stands for WebKit, the HTML rendering engine in Safari and formerly Chrome. To be exact, it uses Qt WebKit.
That means that instead of inventing its own HTML rendering engine, as Dompdf does for instance, it uses a real browser engine and “prints” the result to PDF. You can choose whether the “print” or the “screen” style sheets should be used.

It’s a command line tool that receives one or more HTML files or URLs as input and writes a PDF file.

Depending on the installation, a virtual display may be needed (a running Xvfb server or the xvfb-run wrapper). This is well described in the documentation of phpwkhtmltopdf, in the section “Installation of wkhtmltopdf”.

PHP WkHtmlToPdf

This useful library provides a PHP interface for wkhtmltopdf and also takes care of creating temporary files where needed. Additional command line options can be passed as an array, which keeps the library forward compatible with new wkhtmltopdf options.
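
To illustrate why passing options as an array is forward compatible, here is a minimal sketch of how such an array could be turned into a command line. It is not the library's actual code: the function name sketchBuildCommand() is made up for this example, and it only assumes the convention that entries with a numeric key are flags without a value, while entries with a string key carry a value.

```php
<?php
// Hypothetical helper, NOT part of phpwkhtmltopdf: builds a wkhtmltopdf
// command line from an options array without knowing the options in advance.
function sketchBuildCommand(array $options, string $input, string $output): string
{
    $parts = ['wkhtmltopdf'];
    foreach ($options as $name => $value) {
        if (is_int($name)) {
            // Entry like ['print-media-type']: a flag without a value
            $parts[] = '--' . $value;
        } else {
            // Entry like ['orientation' => 'Landscape']: an option with a value
            $parts[] = '--' . $name . ' ' . escapeshellarg((string) $value);
        }
    }
    // Input and output file names always come last
    $parts[] = escapeshellarg($input);
    $parts[] = escapeshellarg($output);

    return implode(' ', $parts);
}

$command = sketchBuildCommand(
    ['print-media-type', 'orientation' => 'Landscape'],
    'catalog.html',
    'catalog.pdf'
);
echo $command, PHP_EOL;
```

Because the helper never checks option names against a fixed list, an option introduced in a later wkhtmltopdf version would pass through unchanged. In this sketch only values are escaped; option names are assumed to come from trusted code.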

Adapter

For a cleaner object-oriented interface I wrote an adapter. The main reason was testability: in unit tests, the engine should be replaceable by a fake implementation.
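
The idea can be sketched as follows. All names here (SketchPdfEngine, SketchFakeEngine, sketchRenderCatalog) are hypothetical and do not match the module's real classes; the point is only that code which depends on an interface can be tested without wkhtmltopdf being installed.

```php
<?php
// Hypothetical engine interface; the real adapter in the module may look different.
interface SketchPdfEngine
{
    public function addPage(string $html, array $options): void;

    public function toString(): string;
}

// Fake for unit tests: records the pages instead of calling wkhtmltopdf.
final class SketchFakeEngine implements SketchPdfEngine
{
    /** @var array[] one entry per added page, with keys "html" and "options" */
    public $pages = [];

    public function addPage(string $html, array $options): void
    {
        $this->pages[] = ['html' => $html, 'options' => $options];
    }

    public function toString(): string
    {
        return 'FAKE PDF with ' . count($this->pages) . ' page source(s)';
    }
}

// Code under test depends on the interface only, never on a concrete engine.
function sketchRenderCatalog(SketchPdfEngine $engine, array $productNames): string
{
    foreach ($productNames as $name) {
        $engine->addPage('<h1>' . htmlspecialchars($name) . '</h1>', []);
    }

    return $engine->toString();
}

$fake = new SketchFakeEngine();
echo sketchRenderCatalog($fake, ['Chair', 'Table']), PHP_EOL;
```

A test can then assert on what was handed to the fake engine instead of inspecting a binary PDF.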

Service

The actual domain logic is quite straightforward: it consists of a handful of classes with a few short methods each. One central class is PdfOptions. The options object represents options for wkhtmltopdf and for the PHP wrapper. It is basically an SPL ArrayObject. All currently documented options are available as constants, so you get IDE support and don’t constantly have to look things up in the wkhtmltopdf documentation.

Example:

$pdf->addOptions(
  new PdfOptions(
    [
      PdfOptions::KEY_GLOBAL_TITLE => 'Layout Example',
      PdfOptions::KEY_PAGE_ENCODING => PdfOptions::ENCODING_UTF_8,
      PdfOptions::KEY_GLOBAL_ORIENTATION => PdfOptions::ORIENTATION_LANDSCAPE,
    ]
  )
);
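
For readers who want to picture such an options class, here is a minimal sketch built on ArrayObject. SketchOptions is a hypothetical stand-in, not the module's PdfOptions: the constant values are assumed to mirror the wkhtmltopdf option names (--title, --encoding, --orientation), and the merge() method exists only in this example.

```php
<?php
// Hypothetical stand-in for an options class based on SPL ArrayObject.
class SketchOptions extends \ArrayObject
{
    // Assumed to map to the wkhtmltopdf options --title, --orientation, --encoding
    const KEY_GLOBAL_TITLE = 'title';
    const KEY_GLOBAL_ORIENTATION = 'orientation';
    const KEY_PAGE_ENCODING = 'encoding';

    const ORIENTATION_LANDSCAPE = 'Landscape';
    const ENCODING_UTF_8 = 'UTF-8';

    // Returns a new object; values from $other win on conflicting string keys.
    public function merge(SketchOptions $other): SketchOptions
    {
        return new static(array_merge($this->getArrayCopy(), $other->getArrayCopy()));
    }
}

$defaults = new SketchOptions(
    [SketchOptions::KEY_PAGE_ENCODING => SketchOptions::ENCODING_UTF_8]
);
$custom = new SketchOptions(
    [SketchOptions::KEY_GLOBAL_ORIENTATION => SketchOptions::ORIENTATION_LANDSCAPE]
);
$merged = $defaults->merge($custom);
print_r($merged->getArrayCopy());
```

Since the object behaves like an array, it can be handed to code that expects plain option arrays, while the constants give the IDE something to autocomplete.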

Since every HTML document passed to wkhtmltopdf can have its own options, we need a combination of an HTML string and a PdfOptions object as the source. For this, we provide an interface SourceDocument, which has two implementations in the Magento integration (see next section):

namespace Staempfli\Pdf\Api;
interface SourceDocument
{
  /**
   * @param Medium $medium
   * @return void
   */
  public function printTo(Medium $medium);
}
interface Medium
{
  /**
   * Takes HTML and prints it
   *
   * @param string $html
   * @param Options $options
   * @return Medium
   */
  public function printHtml($html, Options $options);
}

Why not getHtml() and getOptions()? I try to avoid getters and setters so that objects are not treated as mere data containers. The API above is inspired by the article “Printers Instead of Getters”, and the “Printer” pattern described there works quite well in this case. The Medium implementations PdfCover and PdfAppendContent encapsulate the PDF engine.
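
A self-contained sketch of the “Printer” idea follows. The Sketch* names are hypothetical and simplified (options are a plain array instead of an Options object), and the recording medium only stands in for a real one that would feed the PDF engine.

```php
<?php
// Simplified, hypothetical versions of the two roles.
interface SketchMedium
{
    public function printHtml(string $html, array $options): SketchMedium;
}

interface SketchSourceDocument
{
    public function printTo(SketchMedium $medium): void;
}

// The document knows its HTML and options but never exposes them via getters.
final class SketchHtmlDocument implements SketchSourceDocument
{
    private $html;
    private $options;

    public function __construct(string $html, array $options)
    {
        $this->html = $html;
        $this->options = $options;
    }

    public function printTo(SketchMedium $medium): void
    {
        // The document decides what gets printed; the medium decides where it goes.
        $medium->printHtml($this->html, $this->options);
    }
}

// Collects what was printed; a real medium would pass it on to the PDF engine.
final class SketchRecordingMedium implements SketchMedium
{
    public $printed = [];

    public function printHtml(string $html, array $options): SketchMedium
    {
        $this->printed[] = [$html, $options];

        return $this;
    }
}

$medium = new SketchRecordingMedium();
$document = new SketchHtmlDocument('<p>Hello</p>', ['encoding' => 'UTF-8']);
$document->printTo($medium);
```

The caller never pulls data out of the document. It only tells the document which medium to print to, so HTML and options always travel together.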

Magento Integration

Here it gets interesting. In Magento 2, controller actions should return a “result” object from their execute() method (Controller\ResultInterface). Results, in turn, have to be able to populate a “response” object (App\ResponseInterface). This is usually the HTTP response, which Magento sends after the action has been executed.

In the core, you’ll find the following result types:

  • Layout: renders a layout
  • Page: specific implementation of “Layout”, renders the layout for the current controller with all handles and the HTML head
  • Redirect: renders an HTTP redirect
  • Json: renders JSON, for XHR requests (“AJAX”)
  • Raw: renders arbitrary content, for example for file downloads
  • Forward: a special case that does not render a response on its own but triggers another dispatch in the front controller with changed parameters, so it essentially calls another controller

Now we need a new result type that renders the layout but converts it to PDF before returning it. My first approach was to inherit from the Page result and override the render() method. This worked, but it was not very clear, and the tight coupling to the layout implementation annoyed me. Following “Favor Composition Over Inheritance”, the PDF result now creates a new instance of the page result and lets it render into a “PDF Response” instead of an HTTP response.

The PDF response implements not only ResponseInterface but also SourceDocument, so it can be passed to our PDF conversion service (this again is handled by the PdfResult class).

But we could not do entirely without extending the Page result, because its render method has plugins that assume the passed response is an HTTP response. So we now have a class PageResultWithoutHttp with an additional method renderNonHttpResult(). This also allows further modifications, like replacing “http://” URLs with “file://” URLs to avoid additional requests for images, CSS and JavaScript (currently not implemented).
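
Since the URL replacement is only an idea so far, the following is merely a sketch of how it could work, not code from the module. The function name sketchLocalizeAssetUrls() and the example base URL and directory are assumptions. It rewrites only src and href attributes that start with the given base URL.

```php
<?php
// Hypothetical helper: the article mentions this rewrite only as a possible extension.
// $baseUrl and $documentRoot are example values that the caller would have to supply.
function sketchLocalizeAssetUrls(string $html, string $baseUrl, string $documentRoot): string
{
    $baseUrl = rtrim($baseUrl, '/') . '/';
    $fileBase = 'file://' . rtrim($documentRoot, '/') . '/';

    // Match src="..." and href="..." attributes whose value starts with the base URL.
    $pattern = '/\b(src|href)=(["\'])' . preg_quote($baseUrl, '/') . '/i';

    return preg_replace_callback(
        $pattern,
        function (array $match) use ($fileBase) {
            // Keep the attribute name and quote character, swap only the URL prefix.
            return $match[1] . '=' . $match[2] . $fileBase;
        },
        $html
    );
}

$html = '<img src="http://shop.example/static/logo.png">'
    . '<a href="http://other.example/">x</a>';
$localized = sketchLocalizeAssetUrls($html, 'http://shop.example/static', '/var/www/pub/static');
echo $localized, PHP_EOL;
```

Passing only the base URL of static assets keeps ordinary links in the document untouched. A real implementation would also have to make sure that the mapped files actually exist on disk.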

Extended Usage

If you need to render more than the current layout, or if the PDF should not be offered for download but, for example, sent by email, the result object can still be used. It has a renderSourceDocument() method, which returns the PdfResponse object with the rendered HTML without generating an HTTP response.

This way, the following is possible:

# additionally generate a table of contents:

$this->pdf->appendTableOfContents(
    new PdfOptions(
        [
            PdfOptions::KEY_TOC_HEADER_TEXT => 'Overview',
        ]
    )
);

# Render layout, using PdfResult:

/** @var PdfResult $result */
$result = $this->resultFactory->create(PdfResult::TYPE);
$source = $result->renderSourceDocument();
$this->pdf->appendContent($source);

# Generate PDF:

$pdfFileContents = $this->pdf->file()->toString();

$this->pdf should be instantiated using Staempfli\Pdf\Model\PdfFactory. This factory takes care of some global settings that can be configured in the Magento backend, for example the path to wkhtmltopdf.

Alternative without layout

If you only want to render single blocks without the whole default layout (in particular without Magento’s HTML head), you can use Staempfli\Pdf\Block\PdfTemplate instead.

This is a Magento block which implements the SourceDocument interface, so it can be converted to PDF by our module. By default it uses the container template, i.e. it renders all blocks that were added as children with addChild(). Those can be arbitrary Magento blocks.

It can look as follows:

/** @var PdfTemplate $pdfBlock */
$pdfBlock = $this->_view->getLayout()->createBlock(PdfTemplate::class);
$pdfBlock->addChild('test-full-html', Template::class, ['template' => 'Bdk_PdfTest::test-full-html.phtml']);
$this->pdf->addOptions(
  new PdfOptions(
    [
      PdfOptions::KEY_PAGE_ENCODING => PdfOptions::ENCODING_UTF_8,
    ]
  )
);
$this->pdf->appendContent($pdfBlock);

The Module

The module is freely available at https://github.com/staempfli/magento2-module-pdf and already works as described. The first “stable” release will be created as soon as it is used in production successfully.

Detailed documentation with examples will follow as well. Until then, the source code itself is mostly documented. Let me know if you want to use the module and what for. Of course, we are also happy about collaboration!


This article was originally published in German on the webguys.de Magento Advent Calendar: Türchen 04