Project Origins

The Magmi Project Origins

As everyone knows, Magento is a powerful e-commerce platform built on top of the excellent Zend Framework. It has plenty of features and covers many aspects of the e-commerce workflow.

Magento has a clean, extensible but complex core object model, and it enables external data import and export through its Dataflow feature or its APIs.

A good design but...

The problem comes with an implementation philosophy that treats object reuse as a religion.

Many complex operations are done inside the Magento object architecture when dealing with product values (whether you want to use a product, list products, or modify/create products).

While the abstraction levels are good and should be kept for safety and reusability, they ruin the batch operation use case.

The code called to modify a single attribute of an item, or all of them, is exactly the same (this is so "elegant"). Roughly speaking, this code does the following:

   load all attributes of the item, passing through the Magento pseudo-ORM collection layer
   apply the new value (also performing value checking)
   save all attributes of the item, passing through the Magento pseudo-ORM collection layer (this will hopefully include the single value that you modified)

So, no need for exception cases: this has the elegance of solving all checks at once, even dependency checks between attributes.
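
To make the contrast concrete, here is a minimal sketch of the three steps above next to a direct single-value update. It is Python against a toy in-memory SQLite table, not Magento code: the `product_attr` table and the functions `make_catalog`, `set_attr_generic` and `set_attr_direct` are all invented for illustration, and the real Magento layer does far more work at each step.

```python
import sqlite3


def make_catalog(n_products=3):
    """Build a toy catalog: one row per (product, attribute) pair (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE product_attr ("
        "product_id INTEGER, attr TEXT, value TEXT, "
        "PRIMARY KEY (product_id, attr))"
    )
    rows = [
        (pid, attr, value)
        for pid in range(1, n_products + 1)
        for attr, value in (("name", f"item {pid}"), ("price", "9.99"), ("qty", "0"))
    ]
    db.executemany("INSERT INTO product_attr VALUES (?, ?, ?)", rows)
    return db


def set_attr_generic(db, product_id, attr, value):
    """Generic path: load every attribute, change one, write every attribute back."""
    loaded = dict(
        db.execute(
            "SELECT attr, value FROM product_attr WHERE product_id = ?",
            (product_id,),
        )
    )
    if attr not in loaded:  # stand-in for the value checking step
        raise KeyError(attr)
    loaded[attr] = str(value)
    db.executemany(
        "UPDATE product_attr SET value = ? WHERE product_id = ? AND attr = ?",
        [(v, product_id, a) for a, v in loaded.items()],
    )
    return len(loaded)  # number of rows written


def set_attr_direct(db, product_id, attr, value):
    """Direct path: touch only the one value that changed."""
    cur = db.execute(
        "UPDATE product_attr SET value = ? WHERE product_id = ? AND attr = ?",
        (str(value), product_id, attr),
    )
    return cur.rowcount  # number of rows written


db = make_catalog()
generic_writes = set_attr_generic(db, 1, "qty", 5)
direct_writes = set_attr_direct(db, 2, "qty", 7)
```

With three attributes per product, the generic path writes three rows to change one value while the direct path writes one. A real product carries many more attributes, so the gap grows accordingly.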

The sum of Magento's internal operations bound to product manipulation is such that around 98% of the execution time is spent outside of actual database access.

... that does not scale on batch operations

A simple use case: modifying the product qty on a whole catalog of around 20k products will take roughly 20k times the time needed to modify a full product.

We are talking about several seconds per object, so the Dataflow feature, which is just a wrapper around the standard internal product API calls, crawls at a snail's pace of 20 to 30 products per minute.

Do the math yourself: at 20 products per minute, 20k products take 1000 minutes, around 16 hours on a "standard" dual core.
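
The count above is simple enough to check in a few lines of Python. The function name `full_catalog_minutes` is made up for this example, and the rates are just the 20 and 30 products per minute quoted above.

```python
def full_catalog_minutes(n_products, products_per_minute):
    """Minutes needed to process a whole catalog at a constant rate."""
    return n_products / products_per_minute


for rate in (20, 30):
    minutes = full_catalog_minutes(20_000, rate)
    print(f"{rate}/min: {minutes:.0f} min, {minutes / 60:.1f} h")
# prints:
# 20/min: 1000 min, 16.7 h
# 30/min: 667 min, 11.1 h
```

Even at the optimistic end of the range, a full catalog pass still eats most of a night.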

And this is enforced by the internal design: you've just been trapped.

Even when running from cron, you need to choose your time window for catalog updates carefully.

You idiot, use the "Number of records" field

Magento improved somewhat by adding "packetized" operations that accumulate object instances in RAM before doing multiple inserts/updates in the database in a single request.

Here again, alas, that means consuming huge amounts of RAM (a Magento product instance is a complex object that includes many hierarchy levels) for a still suboptimal result.

They want to optimize externally what cannot scale internally.

So the same complex operations are somehow multiplexed, but the result is that 90+% of the processing time is still spent outside the database layer.
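
For readers who have not met the idea, here is a rough sketch of what "packetized" means: buffer a number of pending changes in memory, then flush them to the database in one batch. This is plain Python with SQLite, not Magento's implementation. The `stock` table and the names `make_stock` and `packetized_update` are assumptions made for the example.

```python
import sqlite3


def make_stock(n_products):
    """Toy stock table with every quantity set to 0 (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stock (product_id INTEGER PRIMARY KEY, qty INTEGER)")
    db.executemany(
        "INSERT INTO stock VALUES (?, 0)",
        [(pid,) for pid in range(1, n_products + 1)],
    )
    return db


def packetized_update(db, updates, packet_size):
    """Buffer up to packet_size (qty, product_id) pairs, then flush them as one batch."""
    packet = []
    flushes = 0
    for item in updates:
        packet.append(item)  # everything buffered here sits in RAM until the flush
        if len(packet) >= packet_size:
            db.executemany("UPDATE stock SET qty = ? WHERE product_id = ?", packet)
            packet.clear()
            flushes += 1
    if packet:  # flush the last, partially filled packet
        db.executemany("UPDATE stock SET qty = ? WHERE product_id = ?", packet)
        flushes += 1
    return flushes


db = make_stock(10)
flushes = packetized_update(db, [(5, pid) for pid in range(1, 11)], packet_size=4)
```

In this sketch each buffered item is a tiny tuple, so a large packet costs almost nothing. The point made above is that when every buffered item is a full product instance, the same trick trades database round trips for a lot of memory.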

Some time ago, in a weird fictional dream, the quest began....

On a bridge, at the edge of a dark forest, a silhouette emerges from the surrounding shadows.

It wears a hooded robe; under the hood I can see no face, only two red glowing eyes....

- Thou Shalt Not Access the Magento Database, Unbeliever!!!!!

- And why, sir?

- It's way too complex for you to understand!

- Don't take it badly, but I do understand it, and apart from some strange implementation choices made both for legacy preservation and for other unknown motivations, the initial intent was damn good!

- Thanks, but I warned you: no one who crossed the bridge came back with anything good. The lucky few just went mad and wander in the eternal shadows where their pretension led them.

- I'll take the challenge. No risk, since I'm already mad!!!

- You fool, you seem guided by some strange faith. My powers of persuasion have no effect on you. I cannot stop you.....

The silhouette disappeared into the surrounding shadows and I crossed the bridge that separated me from the Magento database internals. My quest had just begun, on the Magento forums.

After some long nights, the quest led me to create this project!

Welcome, stranger!!!!