Posts in Development


E-commerce product migration with DOMDocument()

16 Jan 2018

When building a new e-commerce site to replace an existing one, I usually try to get hold of a copy of the database in use so that I can save the client significant set-up time by simply mapping products, categories and so on across to the new database. If the agency in question is helpful that is rarely a problem, and it is my preferred solution. Sometimes, however, the client has had a poor experience with their current agency, or the agency is simply unhelpful and on occasion deliberately obstructive. Sadly it does happen, and I have even come across cases where the current agency has suspected that the client will be going elsewhere and simply taken the existing site offline with no warning. I've been working around one such situation recently, in which the client was faced with the prospect of having to recreate many thousands of products in the new site. Not ideal. With no access to the database a different approach was required, which is where PHP's Document Object Model (DOM) comes in.

The PHP DOM extension provides a very handy API for operating on structured XML/HTML documents. Given that the product pages on the existing site all used a common template with identifiable nodes for the various key product parameters, therein lay the solution to saving the client hundreds of hours of tedious effort. This case will have a lot in common with a number of development situations, so I figured I would share my solution here so that you can take from it what you will.

I'd already culled the product category structure from the existing site so what follows deals with the products themselves together with images, and any assignments to those categories.

My e-commerce platform is built upon the CodeIgniter 3 framework, so the scripts are presented in the context of a CI controller that is part of the build in question. It is easily adapted to any other context, though, and really is just a bit of procedural code. Using a controller simply made it easier to do all the necessary work of mapping the harvested data into the local site under development.

The solution assumes that the site being examined has a proper XML sitemap. If it doesn't then some sort of recursive function starting from the category menu would be a good place to start in terms of harvesting all the site URLs.
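If there is no sitemap, the crawl suggested above needs a way of pulling the internal links out of each fetched page. The following is only a sketch under assumptions: the function name extractInternalLinks() and the example markup are mine and not part of the controller described here, and it only handles root-relative and absolute same-host hrefs.

```php
<?php
/**
 * Hypothetical helper (not part of the original controller): collect the
 * unique same-site links found in a chunk of HTML.
 */
function extractInternalLinks($html, $baseUrl)
{
    if (trim((string) $html) === '') {
        return array(); // nothing to parse
    }

    $links = array();
    $host  = parse_url($baseUrl, PHP_URL_HOST);

    $dom = new DOMDocument();
    // Real-world markup is rarely valid, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));

        if ($href === '' || $href[0] === '#') {
            continue; // nothing to follow
        }

        if (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
            // root-relative link: prefix the base URL
            $href = rtrim($baseUrl, '/') . $href;
        }

        // keep only absolute URLs that point at the same host
        if (parse_url($href, PHP_URL_HOST) === $host) {
            $links[$href] = true; // array keys de-duplicate
        }
    }

    return array_keys($links);
}

// Example markup standing in for a fetched category menu
$html = '<ul><li><a href="/_shop/bikes">Bikes</a></li>'
      . '<li><a href="http://www.somesite.com/_shop/bikes">Bikes again</a></li>'
      . '<li><a href="http://elsewhere.example/page">Offsite</a></li></ul>';

$links = extractInternalLinks($html, 'http://www.somesite.com');
print_r($links);
```

A crawler would then fetch each returned URL, run the same function on the result, and keep a list of visited URLs so it never requests the same page twice.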

I haven't really included anything by way of error handling since this controller only gets called manually by me and I'm interested only in its utility, but it could easily be turned into a tool with a nice user interface and so on.

It all worked well and in a matter of minutes successfully recreated thousands of products in the local site. Huge timesaver.

 

1. The basic controller + index function.


class Get_products extends Site_Controller 
{

    private $baseurl = 'http://www.somesite.com'; //the baseurl of the site being examined


    function index()
    {
        ini_set('memory_limit', '-1');
        set_time_limit(0);

        $sitemap = $this->baseurl.'/sitemap.xml';

        //get all the urls
        $urls = $this->parseSitemap($sitemap);

        if(!empty($urls))
        {

            echo 'Processing '.count($urls).' urls...';

            $n = 0;

            foreach ($urls as $url)
            {
                if($this->parseProduct($url))
                {
                    $n++;
                } 
            }

            echo $n.' products were successfully processed.';
        }
        else
        {
            echo 'No urls were found';
        }

       return;

    }



}

 

2. Parse the Sitemap

    function parseSitemap($sitemap)
    {
        /** This function simply gets all the URLs in the sitemap. Assuming the sitemap is structured correctly, URLs are wrapped by the loc tag.
        *  In this case all product URLs contain the string '_shop', so I'm ignoring any that don't.
        *  It's not critical since ultimately the product page structure is used to determine if the URL is a product or not, but this just saves a bit of overhead.
        */
        
        $urls = array(); 
        $DomDocument = new DOMDocument(); 
        $DomDocument->preserveWhiteSpace = false;
        $DomDocument->load($sitemap); 
        $DomNodeList = $DomDocument->getElementsByTagName('loc'); 

        foreach($DomNodeList as $url) { 
            if(stripos($url->nodeValue, '_shop') !== FALSE)
            {
                $urls[] = $url->nodeValue; 
            }
        }

        return $urls;
    }

 

3. Parse the product

This function does the work of picking through a retrieved product page. It includes calls to a number of helper functions which are reproduced with explanations below this one.

function parseProduct($url)
    {

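        $html = $this->fetch_html($url);

        if(empty($html))
        {
            return FALSE; // request failed or returned nothing, so there is nothing to parse
        }

        $dom = new DOMDocument();

        $dom->preserveWhiteSpace = false; // only has an effect if set before loading
        @$dom->loadHTML($html); // @ hides the warnings that real-world markup usually triggers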

        /**
         * In this case the product page structure uses an h1 tag for the product title.
         *     If no title is found then ignore the URL as it's not a product.
         *  Call to helper function elementByClass() to search the DOM for the appropriate element 
         */

        $className = 'product-title';
        $tagName = 'h1';
        $element = $this->elementByClass($dom, $tagName, $className);

        if($element !== false)
        {
            
            $productTitle = $element->nodeValue;

            /**
             * Subsequent product parameters can be discovered using the same method based on tag and class.
             * I'd already retrieved all the category names in use by the site so grabbing the product category assignment also so I can set up categories.
             * In this case the existing site had a 1 to 1 relationship between products and categories. If dealing with a one to many then if the sitemap has unique URLs
             * then simply look for duplicate products in the function saveProduct() and do category assignments as appropriate (assuming your new site can handle a one to many relationship).
             */

            // Look for a category name

            $className = 'detailProductCat';
            $tagName = 'div';
            $element = $this->elementByClass($dom, $tagName, $className);

            if($element !== false)
            {
                $productCategory = $element->nodeValue;
            }
            else
            {
                $productCategory = null;
            }

            // And for a product description

            $className = 'detailProductDesc';
            $tagName = 'div';
            $element = $this->elementByClass($dom, $tagName, $className);

            if($element !== false)
            {
                $productDescription = strip_empty_paras($this->innerHTML($element));
            }
            else
            {
                $productDescription = null;
            }

            // Now find a price.. in this case the site being analyzed didn't permit different prices for various options on a given product.

            $element = $dom->getElementsByTagName('h2')->item(0); // item() returns null when the page has no h2
            if($element !== null)
            {
                $productPrice = preg_replace('/[^0-9.]/','',$element->nodeValue);
            }
            else
            {
                $productPrice = null;
            }

            /** Product Options
            * The site being analyzed used a <select> to present different variations of a given product.
            * So if the product has options find those by finding the select and iterating over the select options.
            * If no <select> is found then it must be a single product with no choices.
            */

            $className = 'cartDdlOptions';
            $tagName = 'select';
            $element = $this->elementByClass($dom, $tagName, $className);

            
            $productOptions = array();
            if($element !== false)
            {
                $options = $element->getElementsByTagName('option');
                foreach ($options as $option) {
                    $productOptions[] = $option->nodeValue;
                }
                
            }
            
            /** PRODUCT IMAGES use the same philosophy. In this case the site used a carousel plugin so it was easy to identify the appropriate classname.
            * Images are copied to a local directory for later use.
            * In this case the source site generated image srcs dynamically so typically an image source could look like "/_loadimage.aspx?ID=172236"
            * so the following includes a call to a function that looks in the headers sent to determine the image type to save as.
            */
    

            $className = 'cycle-slide';
            $tagName = 'div';
            $element = $this->elementByClass($dom, $tagName, $className);
            $imagePaths = array();

            if($element !== false)
            {
                $images = $element->getElementsByTagName('img');
                $i = 0;
                $savePath = 'imagesTemp/';

                foreach ($images as $image)
                {
                    $src = $this->baseurl.$image->getAttribute('src');

                    //get the file contents

                    $imageString = file_get_contents($src);  

                    if($imageString !== FALSE)
                    {
                        //and work out the file type. Only interested in jpg, gif, or png in this case.

                        $type = $this->find_file_type($src);

                        if($type == 'gif' || $type == 'jpg' || $type == 'jpeg' || $type == 'png')
                        {
                            $ext = str_replace('e', '', $type); //i know jpeg is a valid extension but I don't like it...

                            //save the file with a nice, SEO friendly filename. Codeigniter has a handy helper function, url_title(), that does a good job of cleaning up strings for URLs.

                            $save = file_put_contents($savePath.url_title($productTitle).'-'.$i.'.'.$ext,$imageString);
                            if($save !== FALSE)
                            {
                                $imagePaths[] = $savePath.url_title($productTitle).'-'.$i.'.'.$ext; // same extension as the file just saved
                                $i++;
                            }
                        }

                    }
                }
            }

            $product = array(
                'productTitle' => $productTitle,
                'productCategory' => $productCategory,
                'productDescription' => $productDescription,
                'productPrice' => $productPrice,
                'productOptions' => $productOptions,
                'imagePaths' => $imagePaths
            );

            // Pass the product data to the saveProduct function that does whatever your own e-commerce platform needs in terms of database and file structure.
            return $this->saveProduct($product);
            
        }
        
        return FALSE;

    }

4. Get the HTML

Simple cURL request to fetch the HTML for a given URL

function fetch_html($url)
    {
        

        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $html = curl_exec($ch);
        curl_close($ch);

        return $html;
    }

 

5. Find elements by class

The site being examined used specific classes to identify key areas of markup in the product template. We need to grab those to get to the product parameters. PHP's DOMDocument doesn't include a direct means of accessing nodes by classname so this function takes care of that.

function elementByClass($domParent, $tagName, $className)
    {
        /** PHP's DOMDocument class doesn't include a direct means of identifying nodes by classname,
        * but you can iterate over the matching tags looking for the appropriate class attribute.
        * Note that stripos() does a case-insensitive substring match, so 'product-title' also matches 'product-title-small'.
        * I only want the first instance but have structured the function to provide an array of nodes should it be needed.
        */

        $nodes = array();

        $childNodes = $domParent->getElementsByTagName($tagName);

        foreach ($childNodes as $node)
        {
            if (stripos($node->getAttribute('class'), $className) !== FALSE)
            {
                $nodes[] = $node;

                //you could just do this if you only ever want the first node.
                //return $node;
            }
        }

        //in this case I just want the first

        if(!empty($nodes[0]))
        {
            return $nodes[0];
        }

        return FALSE;
    }
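
For what it's worth, DOMXPath can do the class lookup in a single query. The sketch below is an alternative rather than what the controller here uses, and the function name elementsByClassXPath() is made up for the example. Unlike a substring check it matches whole class names only, so 'product-title' will not match 'product-title-small'.

```php
<?php
/**
 * Hypothetical alternative to elementByClass(): use DOMXPath to find
 * elements whose class attribute contains $className as a whole word.
 */
function elementsByClassXPath(DOMDocument $dom, $tagName, $className)
{
    $xpath = new DOMXPath($dom);

    // Pad the class attribute with spaces so that only whole class names match.
    // Assumption: $tagName and $className are trusted strings with no quotes in them.
    $query = sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tagName,
        $className
    );

    return $xpath->query($query); // a DOMNodeList, possibly empty
}

// Example markup standing in for a fetched product page
$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body>'
    . '<h1 class="product-title-small">Not this one</h1>'
    . '<h1 class="big product-title">Blue Widget</h1>'
    . '</body></html>'
);

$nodes = elementsByClassXPath($dom, 'h1', 'product-title');
$first = $nodes->item(0); // null if nothing matched

echo $first !== null ? $first->nodeValue : 'no match';
```

Whether the stricter matching is what you want depends on the markup of the site being harvested, so check a few pages by hand first.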

 

6. Inner HTML

The product description exists over multiple paragraphs with tags I was keen to preserve so this helper function does just that.

function innerHTML( $parentNode )
    {
        /* Neat helper function extracts the inner HTML of a DOM node
        * credit to https://kuttler.eu/en/post/php-innerhtml/  for saving me time
         */

        $innerHTML = '';
        $elements = $parentNode->childNodes;

        foreach( $elements as $element ) { 
            if ( $element->nodeType == XML_TEXT_NODE ) {
                $text = $element->nodeValue;
                $innerHTML .= $text;
            }     
            elseif ( $element->nodeType == XML_COMMENT_NODE ) {
                $innerHTML .= '';
            }     
            else {
                $innerHTML .= '<';
                $innerHTML .= $element->nodeName;
                if ( $element->hasAttributes() ) { 
                    $attributes = $element->attributes;
                    foreach ( $attributes as $attribute )
                        $innerHTML .= " {$attribute->nodeName}='{$attribute->nodeValue}'" ;
                }     
                $innerHTML .= '>';
                $innerHTML .= $this->innerHTML( $element );
                $innerHTML .= "</{$element->nodeName}>";
            }     
        }     
        return $innerHTML;
    }
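
A shorter route to much the same result, assuming PHP 5.3.6 or later, is to let DOMDocument::saveHTML() serialise each child node. This is a sketch and not the function used in this build; the name innerHTMLViaSaveHTML() is hypothetical. It expects an element taken from a loaded document, and unlike the helper above it keeps comments and writes attributes with double quotes.

```php
<?php
/**
 * Hypothetical alternative to innerHTML(): serialise each child node with
 * DOMDocument::saveHTML(), which accepts a node argument (PHP 5.3.6+).
 */
function innerHTMLViaSaveHTML(DOMNode $parentNode)
{
    $innerHTML = '';

    foreach ($parentNode->childNodes as $child) {
        // ownerDocument is the DOMDocument the node was parsed into
        $innerHTML .= $parentNode->ownerDocument->saveHTML($child);
    }

    return $innerHTML;
}

// Example markup standing in for a product description block
$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body><div class="detailProductDesc">'
    . '<p>First <strong>paragraph</strong>.</p><p>Second.</p>'
    . '</div></body></html>'
);

$div = $dom->getElementsByTagName('div')->item(0);
echo innerHTMLViaSaveHTML($div);
```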

 

7.  Image file types

Browsers use the content-type header rather than file extension to determine the type of image file being served. In this case because the site under examination served image data dynamically it's necessary to know what the image type is such that the image can be copied and saved correctly. This function does a simple examination of the headers served from the image src.

function find_file_type($image_src)
    {
        /* Browsers use the content-type header to understand if something is an image and what kind it is.
        * This function simply uses PHP's built-in get_headers() to get the headers returned at the image src URL and returns the type if it's an image.
        */

        $headers = get_headers($image_src); // returns false on failure

        if(!empty($headers))
        {
            foreach ($headers as $h)
            {
                //just looking for an "image/*" string, e.g. "Content-Type: image/jpeg"

                if(stripos($h, 'image/') !== FALSE)
                {
                    $dat = array();

                    //extract the type substring. A trailing ";" (e.g. "; charset=...") is optional so don't require it.
                    preg_match('/image\/([a-z0-9.+-]+)/i', $h, $dat);
                    if(!empty($dat[1]))
                    {
                        return strtolower($dat[1]);
                    }

                }
            }
        }

        return false;
    }
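
Since the image bytes have already been downloaded by the time the type is needed, another option is to sniff the type from the data itself and skip the second request for the headers. The following is a sketch of that alternative, not what I actually ran; imageTypeFromString() is a hypothetical name and the tiny base64 GIF is just test data.

```php
<?php
/**
 * Hypothetical alternative to find_file_type(): detect the image type from
 * the downloaded bytes instead of requesting the headers separately.
 */
function imageTypeFromString($imageString)
{
    // getimagesizefromstring() (PHP 5.4+) returns false for non-image data
    $info = @getimagesizefromstring($imageString);

    if ($info === false || empty($info['mime']) || strpos($info['mime'], 'image/') !== 0) {
        return false;
    }

    // 'image/jpeg' becomes 'jpeg', 'image/png' becomes 'png', and so on
    return substr($info['mime'], strlen('image/'));
}

// A 1x1 GIF, base64-encoded, stands in for the result of file_get_contents($src)
$imageString = base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');

var_dump(imageTypeFromString($imageString));
var_dump(imageTypeFromString('not an image'));
```

The return values line up with the header-based function, so the gif/jpg/jpeg/png check in the product parser would not need to change.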

 

8. Save Product

Just whatever you need to do here...

function saveProduct($product)
    {

        /* function contains whatever you need to do to create the product  in the context of your own site
        * In my case various database operations around products, product options, category assignments, and setting up the file structure for the product images.
        * for the record my e-commerce platform maintains product images in separate folders for each product, it makes user management of them much simpler than having a single repository with thousands of pictures.
        .
        .
        .
        .
        */

    }

 

 

MySQL conditional composite join with a subquery (Ecommerce, Sage Accounts Import)

05 Jan 2018

I'm going to try and make the effort to be a bit more forthcoming with useful dev stuff.. stuff that isn't necessarily obvious that I come across from time to time. So to kick that off here's a little tidbit I needed to figure out just now. My e-commerce platform uses separate tables for products, product variations (eg large, small, black, red.. whatever) and product stock. I won't reproduce the tables in full here but essentially consider the 'products' table as a list of primary, or parent products. The 'product_options' table contains all the children, if applicable, of those products with their own SKUs, prices and so on.
Product stock is maintained in a separate table, for various reasons, which contains fields for the product ID, option ID (if applicable), current stock level, and fields for tracking stock movement.

In building a tool for importing product stock data from Sage Accounts I needed a query that would give me a single flat array of products with stock levels. The join would be conditional on whether or not a product had child products, and if it did the join would work on composite fields (i.e. product ID and option ID). Now there are many ways of skinning the proverbial SQL cat but this is how I did it: a subquery to get a flat list of products/product variations, with the JOIN condition to the stock table inside a CASE statement.

I have not benchmarked it for performance as I don't really care: the query is run once as an admin task during the parsing and error checking of an imported CSV file that in this case runs to in excess of 20,000 records. In that context a few milliseconds either way is not a worry.

 

SELECT a.*,b.stock 
FROM 
(
    SELECT p.product_id, p.name, p.price,o.option_name, p.sku, o.option_id, o.sku AS option_sku, o.option_price
    FROM products p 
    LEFT JOIN product_options o ON p.product_id = o.parent_id
    WHERE p.deleted = 0 AND (o.deleted = 0 OR o.deleted IS NULL) 
) AS a 
LEFT JOIN product_stock b ON (CASE WHEN a.option_id IS NULL  
                                   THEN b.product_id = a.product_id
                                   ELSE (b.product_id = a.product_id AND a.option_id = b.option_id)
                               END)
ORDER BY a.sku ASC;

 

There. Might come in handy if you have a similar problem especially if you don't necessarily always find SQL syntax completely intuitive.

 

ArtEye Creative Consultancy

22 Dec 2016

Earlier this year and in conjunction with Design Room Cornwall we launched a new website for Cornwall-based ArtEye. The brief was quite demanding with requirements for a high end brochure feel appropriate to the market in which ArtEye operates, rich magazine-style layouts, lots of portfolios, a members area with paid subscriptions, premium content, client preferences and access to consultancy services, amongst others. The end result has been very well received and over the few months since the site was launched is growing to be a terrific resource of fine art related material. To achieve the magazine style layouts while still retaining a super intuitive admin user interface I developed a number of 'widgets' for the editors that allow a variety of content wrappers to be inserted into single editor instances that can then be configured and styled to achieve the complex layouts required. It means that essentially the page layouts can be created on the fly inside the editors by an admin user without any need for multiple templates, or complex editor setups. The site uses the mikesimagination.net-developed modular content management system (CMS) and hence is well placed to accommodate updates and changes for years into the future with minimal overheads as the business evolves.

Anyway, the site has stacks of content and some super photography so rather than go on and on about it on here you might as well just go and look... https://www.arteye.co.uk/.

A few screengrabs below too. Do get in touch if you'd like to know more about the work behind the site, the features or any related query.. especially if you're looking for a developer to build something super...

High end website design for ArtEye Creative Consultancy

 


 

Banshee Bikes New Zealand

04 Oct 2016

Another highly rewarding project that launched recently. I've written about it in much more detail in the portfolio here, so for the purposes of this blog post here's a quick summary. 

Built for Banshee Bikes New Zealand this instance of the mikesimagination.net custom bicycle builder / web configurator application represents the ultimate evolution of the application. Able to cope with the multiple standards and component compatibility problems that exist within the cycling world - it permits the admin to define relationships between components and categories at multiple levels - within categories, within sets of categories or globally. From the point of view of the end user the path through the configurator is completely unrestricted with all products/components available at once with compatibility being calculated on the fly as the user makes selections.

Completed builds can be saved, shared on social media, printed and financed - a finance calculator is included that permits the user to create their own flexible finance package before being forwarded on to complete their application.

 

 

Responsive email newsletter editor

30 Dec 2015

Camtec Photo in Montréal have a specialist online newsletter that can have any number of distinct sections/chapters with embedded galleries as well as inline content. I built an editor that allows that online edition to be converted into an email newsletter - bilingual English and French - that is responsive so it can adapt to display well in the extensive variety of mobile and tablet email clients available, as well as desktop clients. The tool can of course also be used to create other types of email newsletter but the automated import of online editions is a key time saver for the staff. Test newsletters can be sent in both languages to the logged in user before choosing to send to the entire mailing list. Recipients of course have tools for managing their subscription preferences - including choice of language. Sending to the full list is managed by polling the list in chunks so that up to date progress reports can be returned to the browser and errors caught and handled appropriately.

 

They really liked it :-)

Responsive email newsletter editor web application


Custom Configurator App

30 Dec 2015

Time for an update on the custom bicycle configurator application that I introduced in my last post. It's just about complete and live with just a couple of jobs left to do, essentially the social media sharing tools so that users can share their builds around the Facebook-and-Twitter-verse. All of the other functionality is complete: aside from building bikes, users can save their builds, print them, ask questions about them (component choices, sizing etc.) and apply for finance online via the V12 Retail Finance API. The chaps at Cycle Logic are still populating the categories so the MTB flavour is not yet available at the time of writing, but the road flavour is just about populated with the exception of a few fork choices to go with the rather lovely Seven Cycles custom frames.

 

It's coded as a module for my e-commerce platform so integrates directly with the product catalogue and is fully user configurable such that it can be set up with any number of categories, and could quite easily be used for pretty much any situation where a product can be configured - not just bicycles.. kayaks, boats, prefab buildings, wetsuits... lots of different sporting goods applications spring to mind. It can also be set-up to run as a standalone application in a sub-directory or subdomain alongside an existing site.

 

A few screenshots follow but really it's just better to go and have a play. It's here:

 http://www.cyclelogic.co.uk/configurator/index/asphalt
