Installing the Google/ Bing XML Sitemap Splitter & Generator Magento Extension & Not Compromising on the Memory

The Magento XML sitemap Splitter & Generator is a very useful Extension for both large and small-sized stores. This extension helps in creating and organizing sitemaps by splitting or generating search engine compliant sitemaps.

Since Magento does not currently offer this functionality, the XML sitemap splitter & generator extension has proved to be of great significance. However, to best utilize this, you need to take care of the integration process on your website/ web store and prevent bad usage of memory.

There are two aspects that we should take care of when we code:

    • memory usage
    • cpu cycles

A common understanding today is ‘Computers today are powerful, memory cheap, so we don’t need optimization’ (Microsoft developers probably used to think this when they developed Vista).

People relying on the exponential approach don’t realize that you can multiply the power of cpu with the number of times you want but the root square will still be a bad enemy to defeat.

(Internet security and cryptography is based on this idea so if you find a way to crumble it let me know )

In this article we have a bad example of memory usage that we found out while implementing the Magento XML Sitemap Extension. Here are the two files involved. If you don’t have magento installation you can click on the links to see the code)

/app/code/core/Mage/Sitemap/Model/Sitemap.php
/app/code/core/Mage/Sitemap/Model/Resource/Catalog/Product.php

The first file is not very interesting, it just calls a function in product.php (->getCollection($storeId);) and iterates on the returned array, we need it just to understand the scenario.

This is the part that we need:

/**
 * Generate products sitemap
 */
 $changefreq = (string)Mage::getStoreConfig('sitemap/product/changefreq', $storeId);
 $priority = (string)Mage::getStoreConfig('sitemap/product/priority', $storeId);
 $collection = Mage::getResourceModel('sitemap/catalog_product')->getCollection($storeId);
 foreach ($collection as $item) {
 $xml = sprintf('<url><loc>%s</loc><lastmod>%s</lastmod><changefreq>%s</changefreq><priority>%.1f</priority></url>',
 htmlspecialchars($baseUrl . $item->getUrl()),
 $date,
 $changefreq,
 $priority
 );
 $io->streamWrite($xml);
 }
 unset($collection);

What is interesting here is the last line of code unset($collection); by this line seems clear that the developer was aware about the memory consumption.

Let us see the second file and the relative issue ..

The insane approach

Product.php is a resource model that executes a custom query (this is good: reduce Mysql load) in the method getCollection($storeId); there is a part that is totally not optimized:

while ($row = $query-&gt;fetch()) {
$product = $this-&gt;_prepareProduct($row);
$products[$product-&gt;getId()] = $product;
}

return $products;

This code before give back $products array, it iterates the rows returned by the query and for everyone calls _prepareProduct($row);

/**
* Prepare product
*
* @param array $productRow
* @return Varien_Object
*/
protected function _prepareProduct(array $productRow)
{
$product = new Varien_Object();
$product-&gt;setId($productRow[$this-&gt;getIdFieldName()]);
$productUrl = !empty($productRow['url']) ? $productRow['url'] : 'catalog/product/view/id/' . $product-&gt;getId();
$product-&gt;setUrl($productUrl);
return $product;
}

There are 2 performance issues here:
It does N iteration, just to prepare an array and then it will iterates again other N times ( in Sitemap.php ), so it reads all data 2 times.

It creates a big array with N obj, that is very memory consuming because we are speaking about total products (on a big store this can produce more than 100.000 obj)

The result is that if you launch sitemap generation on a medium/large store, you will get a nice black page and a ‘Allowed Memory Size Exhausted’ message.

This is good as I was not able to try/catch this error and then manage it. –

Now I will add some log to measure the performance and test it on one big store that we manage.
Just remember that Magento by default has access to 256mb of memory. (Look in default .htaccess )

We can note the total memory usage before the execution and after it.

something like this:

/* DEBUG MEMORY */
$baseMemory = memory_get_usage();
$totMem = (int) str_replace('M', '', ini_get('memory_limit'));

Mage::log('------------------------------------------------------------------');
Mage::log('Total Php Memory ("memory_limit"):'.$totMem.'M');
Mage::log('FreeMem: ' . ( $totMem - ($baseMemory / 1024 / 1024)));
$i = 1;
/* DEBUG MEMORY */

while ($row = $query-&gt;fetch()) {
$product = $this-&gt;_prepareProduct($row);
$products[$product-&gt;getId()] = $product;

/* DEBUG MEMORY */

if (($i % 10000) == 0) {
$mem = (memory_get_usage() - $baseMemory);
$media = $mem / $i;
Mage::log($i . " - FreeMem: " . ( $totMem - ($mem / 1024 / 1024)) . ' --- ' . $media);
}
$i++;
/* DEBUG MEMORY */
}

$i % 10000 condition means every 10000 cycle (products/row).

Now we’ll carry out the execution of the script and will look in /var/log/system.log for result:

2012-05-11T20:44:00+00:00 DEBUG (7): ------------------------------------------------------------------
2012-05-11T20:44:00+00:00 DEBUG (7): Total Php Memory ("memory_limit"):256M
2012-05-11T20:44:00+00:00 DEBUG (7): FreeMem: 229.54512023926
2012-05-11T20:44:01+00:00 DEBUG (7): 10000 - FreeMem: 234.45269012451 --- 2259.3992
2012-05-11T20:44:03+00:00 DEBUG (7): 20000 - FreeMem: 212.66645050049 --- 2271.926
2012-05-11T20:44:05+00:00 DEBUG (7): 30000 - FreeMem: 192.04777526855 --- 2235.2922666667
2012-05-11T20:44:06+00:00 DEBUG (7): 40000 - FreeMem: 169.10470581055 --- 2277.908
2012-05-11T20:44:08+00:00 DEBUG (7): 50000 - FreeMem: 148.45632171631 --- 2255.3544
2012-05-11T20:44:09+00:00 DEBUG (7): 60000 - FreeMem: 127.8698348999 --- 2239.2369333333
2012-05-11T20:44:11+00:00 DEBUG (7): 70000 - FreeMem: 102.72708892822 --- 2295.9756571429
2012-05-11T20:44:13+00:00 DEBUG (7): 80000 - FreeMem: 82.065711975098 --- 2279.7915
2012-05-11T20:44:15+00:00 DEBUG (7): 90000 - FreeMem: 61.429832458496 --- 2266.9067555556
2012-05-11T20:44:18+00:00 DEBUG (7): 100000 - FreeMem: 40.794189453125 --- 2256.59648

You can see at the beginning the we have 229Mb of free ram and after 100000 cycle it is about to finish: every 10000 iteration consume about 20mb and each obj about 2200 bytes
Yes Totally Insane (or just Digging the grave ) at least if your customer has a real shop 🙂

Possible Solution

One solution to avoid this is to pass the obj $query directly to the calling function and manage it there, without iterate 2 times on data and ,for sure, without create any obj for every row.

So getCollection($storeId) code will become something like this:

/**
* Get category collection array
*
* @return array
*/
public function getCollection($storeId)
{
$products = array();

$store = Mage::app()-&gt;getStore($storeId);
/* @var $store Mage_Core_Model_Store */

if (!$store) {
return false;
}

$urCondions = array(
'e.entity_id=ur.product_id',
'ur.category_id IS NULL',
$this-&gt;_getWriteAdapter()-&gt;quoteInto('ur.store_id=?', $store-&gt;getId()),
$this-&gt;_getWriteAdapter()-&gt;quoteInto('ur.is_system=?', 1),
);
$this-&gt;_select = $this-&gt;_getWriteAdapter()-&gt;select()
-&gt;from(array('e' =&gt; $this-&gt;getMainTable()), array($this-&gt;getIdFieldName()))
-&gt;join(
array('w' =&gt; $this-&gt;getTable('catalog/product_website')),
'e.entity_id=w.product_id',
array()
)
-&gt;where('w.website_id=?', $store-&gt;getWebsiteId())
-&gt;joinLeft(
array('ur' =&gt; $this-&gt;getTable('core/url_rewrite')),
join(' AND ', $urCondions),
array('url' =&gt; 'request_path')
)
;

$this-&gt;_addFilter($storeId, 'visibility', Mage::getSingleton('catalog/product_visibility')-&gt;getVisibleInSiteIds(), 'in');
$this-&gt;_addFilter($storeId, 'status', Mage::getSingleton('catalog/product_status')-&gt;getVisibleStatusIds(), 'in');

$query = $this-&gt;_getWriteAdapter()-&gt;query($this-&gt;_select);

return $query;
}

and the related Product.php part will change in this way:

/**
* Generate products sitemap
*/
$changefreq = (string) $productConf-&gt;getChangefreq();
$priority = (string) $productConf-&gt;getPriority();

$queryCollection = Mage::getResourceModel('sitemapEnhanced/catalog_product')-&gt;getCollection($this-&gt;getStoreId(), $catId, $filterOutOfStock, $filterInStock);

while ($productRow = $queryCollection-&gt;fetch())
{
$prodId = $productRow['entity_id'];
$url = !empty($productRow['url']) ? $productRow['url'] : 'catalog/product/view/id/' . $prodId;

$xml = sprintf('&lt;url&gt;&lt;loc&gt;%s&lt;/loc&gt;&lt;lastmod&gt;%s&lt;/lastmod&gt;&lt;changefreq&gt;%s&lt;/changefreq&gt;&lt;priority&gt;%.1f&lt;/priority&gt;&lt;/url&gt;', htmlspecialchars($this-&gt;_baseUrl . $url), $this-&gt;_date, $changefreq, $priority);
$io-&gt;streamWrite($xml);
}
unset($collection);

In this way the memory used is constant and doesn’t get exhausted iteration by iteration: we don’t create and store anything so every cycle php internal engine can keep on cleaning the garbage.

Let us know what’s your way of making good use of storage.

About Nosheen

I am a writer, reader, and a part time adventure and travel enthusiast. The other three things that vie for my mind share are dark chocolate, coffee, and photography. I am highly motivated by user perspectives and addressing the common human experience when I write.
This entry was posted in All Posts, Magento, Recommended, Web Development and tagged , , , , , . Bookmark the permalink.

Comments are closed.