CSV Processing In Magento

A development principle , not only with Magento, is that you shouldn’t try to reinvent the wheel and especially use the functions of the used framework wherever possible. Magento has many more or less known universal helpers, in the helper classes in Mage_Core as well as in lib/Varien and of course in the Zend Framework.

A classic is for example JSON encoding. Although PHP has its built-in functions json_encode and json_decode, but they have some shortcomings that are compensated for in the Zend_Json implementation. So Zend_Json::encode() has a cycle check, Magento added support for inline translations within JSON strings in Mage_Core_Helper_Data::jsonEncode(). Thus in Magento you always should use Mage::helper('core')->jsonEncode() (and jsonDecode).

Varien_File_Csv

How is it with processing CSV files? Since import and export works with CSV files in the standard implementation, Magento should have somthing, right? Presenting Varien_File_Csv! Well, I’ll anticipate the result: except for very simple tasks with small files, the class is not useful at all.

What the class can do

  • Reading and writing arrays from and to CSV file with object oriented interface
  • Complete reading the array with one column as key
  • Own fputcsv implementation that always uses “enclosures”, i.e. sets quotation marks around values even if they don’t contain whitespace. The PHP function fputcsv() can’t do that.

What it can not do

  • Sequential read/write
  • Iterator/generator interfaces

Especially the missing sequential reading lead me to switch an import script for customer data from Varien_File_Csv to a custom solution, which reduced the memory usage by 95% at once. Here is the code of my CSV iteraror:

class Int_ListImport_Model_Csv_Iterator implements Iterator
{
    const CSV_DELIMITER = '|';
    /**
     * @var string path to CSV
     */
    protected $_filename;
    /**
     * @var string[]
     */
    protected $_columns = array();
    /**
     * @var string
     */
    protected $_keyColumn;
    /**
     * @var string[]
     */
    protected $_currentRow = array();
    /**
     * @var int
     */
    protected $_currentRowNumber = 0;
    /**
     * @var resource
     */
    protected $_filePointer;

    public function __construct($filename, $keyColumn)
    {
        $this->_filename = $filename;
        $this->_keyColumn = $keyColumn;
    }

    public function open()
    {
        if (!$this->_filePointer) {
            $this->_filePointer = fopen($this->_filename, 'r');
        }
        return $this;
    }

    public function close()
    {
        if ($this->_filePointer) {
            fclose($this->_filePointer);
        }
        return $this;
    }

    public function current()
    {
        return $this->_currentRow;
    }
    public function next()
    {
        $this->_readRow();
    }
    public function key()
    {
        if (is_array($this->_keyColumn)) {
            return array_reduce($this->_keyColumn, function($carry, $keyPartColumn) {
                $carry .= $this->_currentRow[$keyPartColumn] . '|';
                return $carry;
            }, '|');
        } else {
            return $this->_currentRow[$this->_keyColumn];
        }
    }
    public function valid()
    {
        return $this->_currentRow !== false;
    }
    public function rewind()
    {
        $this->open();
        fseek($this->_filePointer, 0);
        $this->_currentRowNumber = 0;
        $this->_readHead();
        $this->_readRow();
    }
    protected function _readHead()
    {
        $this->_columns = fgetcsv($this->_filePointer, null, self::CSV_DELIMITER);
        if (strpos(join('', $this->_columns), "\r") !== false) {
            throw Mage::exception('Int_ListImport', 'The CSV file contains Windows or Mac line breaks (CR). Please convert it to Unix line breaks (LF)');
        }
        return $this;
    }
    protected function _readRow()
    {
        ++$this->_currentRowNumber;
        $this->_currentRow = fgetcsv($this->_filePointer, null, self::CSV_DELIMITER);
        if ($this->_currentRow === false) {
            if (feof($this->_filePointer)) {
                $this->close();
                return $this;
            } else {
                throw Mage::exception('Int_ListImport', 'Read error in row ' . $this->_currentRowNumber);
            }
        }
        $this->_currentRow = array_combine($this->_columns, $this->_currentRow);
        if ($this->_currentRow === false) {
            throw Mage::exception('Int_ListImport', 'Column count does not match in row ' . $this->_currentRowNumber);
        }

        return $this;
    }
}

Usage

$iterator = new Int_ListImport_Model_Csv_Iterator(
    'path/to/file.csv', 'name_of_key_column');
foreach ($iterator as $key => $row) {
    //...
}

It’s also possible to pass an array of column names to get a combined key. As you can see, the code is still coupled to the Magento module Int_ListImport, so if you want to use it, at least change the error handling.

Only later I discovered that the class SplFileObject of the Standard PHP Library (SPL) already can be used as iterator for CSV files, so I could have used it as foundation.

$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl('|');
foreach ($file as $row) {
    list ($fruit, $quantity) = $row;
    // Do something with values
}

Source: http://de.php.net/manual/en/splfileobject.setcsvcontrol.php

Furthermore, amongst the really good League of Extraordinary Packages there is a package for CSV processing that is based on SplFileObject. But both lack the feature of Varien_FIle_Csv, using a column as array key.

Comparison

Let’s compare the classes of Varien and the PHP League, together with SPL only:

Varien_File_Csv SplFileObject League\Csv
Sequential reading x x
Sequential writing x x
Define a column as key x
Enforce enclosures x
Iterator interface x x
Generator interface
Filters x

Conclusion

If you need the features of Varien_File_Csv, i.e. enforced enclosures or columns as array keys, you are better of implementing them by yourself and if possible, use SplFileObject or League\Csv as a foundation.

One Reply to “CSV Processing In Magento”

  1. Just stumble on your blog post. As the maintainer of League\Csv I should point that
    since version 7.0, League\Csv has been able to enforce enclosure. I’ve written a blog post about it on my blog.

    And with the release of version 8.0, the package has been update and should be able to accomplish everything you’ve noted in your table. Funny enough, the changes where made without me knowing about Varien_File_Csv since I’m not a Magento user to begin with.

Comments are closed.