Posts in Ecommerce


E-commerce product migration with DOMDocument()

16 Jan 2018

When building a new e-commerce site to replace an existing one, I usually try to get hold of a copy of the database in use so that I can save the client significant set-up time by simply mapping products, categories and so on across to the new database. If the agency in question is generally helpful that is rarely a problem, and it is my preferred solution. Sometimes, however, the client has had a poor experience with their current agency, and/or the agency is simply unhelpful and on occasion deliberately obstructive. Sadly it does happen, and I have even come across cases where the current agency has suspected that the client will be going elsewhere and simply taken the existing site offline with no warning. I've been working around one such situation recently, in which the client was potentially faced with the extended workload of recreating many thousands of products in the new site. Not ideal. With no access to the database a different approach was required... which is where PHP's Document Object Model (DOM) comes in.

The PHP DOM provides a very handy API for operating on structured XML/HTML documents. Given that the product pages on the existing site all used a common template with identifiable nodes for the various key product parameters, therein lay the solution to saving the client hundreds of hours of tedious effort. This case will have a lot in common with a number of development situations, so I figured I would share my solution here such that you can take from it what you will.

I'd already culled the product category structure from the existing site so what follows deals with the products themselves together with images, and any assignments to those categories.

My e-commerce platform is built upon the CodeIgniter 3 framework, so the scripts are presented in the context of a CI controller that is part of the build in question. It is easily adapted to any other context, though, and really is just a bit of procedural code. Using a controller was just easier when it came to doing all the necessary stuff to map the harvested data into the local site under development.

The solution assumes that the site being examined has a proper XML sitemap. If it doesn't then some sort of recursive function starting from the category menu would be a good place to start in terms of harvesting all the site URLs.
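If there is no sitemap, something along these lines could harvest the URLs instead. It is only a sketch: the function names extractLinks() and crawlSite() are made up for illustration, it uses a queue rather than literal recursion, and it assumes links are either absolute or root-relative. The $fetch argument is any callable that returns the HTML for a URL, so fetch_html() from section 4 would do.

```php
<?php
// Sketch only: collect same-site URLs by following links from a start page.
// extractLinks() and crawlSite() are hypothetical helpers, not part of the original controller.

function extractLinks($html, $baseurl)
{
    $links = array();
    if ($html === false || trim($html) === '') {
        return $links;
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // skip empty hrefs and in-page anchors
        }
        if (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
            $href = rtrim($baseurl, '/') . $href; // root-relative link
        }
        if (stripos($href, $baseurl) === 0) {
            $links[$href] = true; // keep same-site links only, de-duplicated
        }
    }
    return array_keys($links);
}

function crawlSite($startUrl, $baseurl, callable $fetch, $maxPages = 5000)
{
    $queue = array($startUrl);
    $seen = array($startUrl => true);

    // $maxPages is a rough safety cap, not an exact limit
    while (!empty($queue) && count($seen) <= $maxPages) {
        $url = array_shift($queue);
        foreach (extractLinks($fetch($url), $baseurl) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return array_keys($seen);
}
```

The returned list can then be fed to the same product parsing loop as the sitemap URLs.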

I haven't really included anything by way of error handling since this controller only gets called manually by me. I'm interested only in its utility, but it could easily be turned into a tool with a nice user interface and so on.

It all worked well and in a matter of a few minutes successfully recreated thousands of products in the local site. Huge timesaver.

While you're at it you can use the same approach to write all the 301 redirects you'll need to map the old product URLs to the new ones, ready for when the new site goes live.
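As a rough sketch of the redirect idea, assuming the new site runs on Apache and that you have built an array mapping each old product URL to its new path while saving the products: buildRedirects() is a made-up name, and the output is a list of mod_alias "Redirect 301" lines for an .htaccess file.

```php
<?php
// Sketch only: turn an old-URL => new-path map into Apache "Redirect 301" lines.
// buildRedirects() is a hypothetical helper; the map itself is assumed to come from your own save routine.

function buildRedirects(array $map)
{
    $lines = array();
    foreach ($map as $oldUrl => $newPath) {
        // Redirect matches on the URL path only, so strip the scheme and host
        $oldPath = parse_url($oldUrl, PHP_URL_PATH);
        if (empty($oldPath) || $oldPath === $newPath) {
            continue; // nothing to redirect
        }
        $lines[] = 'Redirect 301 ' . $oldPath . ' ' . $newPath;
    }
    return $lines;
}

$lines = buildRedirects(array(
    'http://www.somesite.com/_shop/blue-widget' => '/products/blue-widget',
));
echo implode("\n", $lines) . "\n";
```

Note that the Redirect directive does not match on query strings, so old URLs that identify the product by a query parameter would need mod_rewrite rules instead.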

 

1. The basic controller + index function.


class Get_products extends Site_Controller 
{

   private $baseurl = 'http://www.somesite.com'; //the baseurl of the site being examined


    function index()
    {
        ini_set('memory_limit', '-1');
        set_time_limit(0);

        $sitemap = $this->baseurl.'/sitemap.xml';

        //get all the urls
        $urls = $this->parseSitemap($sitemap);

        if(!empty($urls)) {

            echo 'Processing '.count($urls).' urls...';

            $n = 0;

            foreach ($urls as $url) {
                if($this->parseProduct($url))
                {
                    $n++;
                } 
            }
            echo $n.' products were successfully processed.';
        }
        else {
            echo 'No urls were found';
        }
       return;
    }

    //The functions in sections 2 to 8 below are methods of this class and belong here.

}

 

2. Parse the Sitemap

    function parseSitemap($sitemap)
    {
        /** This function simply gets all the URLs in the sitemap. Assuming the sitemap is structured correctly URLs are wrapped by the loc tag.
        *    In this case all product URLs contain the string '_shop' so I am ignoring any that don't.
        *  It's not critical since ultimately the product page structure is used to determine if the URL is a product or not, but this just saves a bit of overhead.
        */
        
        $urls = array(); 
        $DomDocument = new DOMDocument(); 
        $DomDocument->preserveWhiteSpace = false;
        $DomDocument->load($sitemap); 
        $DomNodeList = $DomDocument->getElementsByTagName('loc'); 

        foreach($DomNodeList as $url) { 
            if(stripos($url->nodeValue, '_shop') !== false) {
                $urls[] = $url->nodeValue; 
            }
        }
        return $urls;
    }
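For trying the filtering out without hitting the live site, here is a small standalone variant that works on an XML string. The name urlsFromSitemapXml() and the sample sitemap are made up for illustration. It also checks the return value of loadXML(), which the controller method above skips.

```php
<?php
// Sketch only: same idea as parseSitemap(), but working on a string and checking that the XML loaded.
// urlsFromSitemapXml() is a hypothetical name.

function urlsFromSitemapXml($xml, $needle)
{
    $urls = array();
    if (trim((string) $xml) === '') {
        return $urls;
    }

    libxml_use_internal_errors(true); // keep parse warnings out of the output
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();

    if ($loaded === false) {
        return $urls; // not well-formed XML
    }

    foreach ($doc->getElementsByTagName('loc') as $loc) {
        $url = trim($loc->nodeValue);
        if (stripos($url, $needle) !== false) {
            $urls[] = $url;
        }
    }
    return $urls;
}

$sample = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        . '<url><loc>http://www.somesite.com/_shop/widget</loc></url>'
        . '<url><loc>http://www.somesite.com/about</loc></url>'
        . '</urlset>';

print_r(urlsFromSitemapXml($sample, '_shop'));
```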

 

3. Parse the product

This function does the work of picking through a retrieved product page. It includes calls to a number of helper functions which are reproduced with explanations below this one.

function parseProduct($url)
    {

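        $html = $this->fetch_html($url);

        //nothing to parse if the request failed or came back empty
        if($html === false || trim($html) === '') {
            return false;
        }

        $dom = new DOMDocument();

        $dom->preserveWhiteSpace = false; //only has an effect if set before the document is loaded

        libxml_use_internal_errors(true); //HTML 5 tags that libxml doesn't recognise will cause errors on load, this will suppress those.

        @$dom->loadHTML($html);
        
        libxml_clear_errors();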

        /**
         * In this case the product page structure uses an h1 tag for the product title.
         *     If no title is found then ignore the URL as it's not a product.
         *  Call to helper function elementByClass() to search the DOM for the appropriate element 
         */

        $className = 'product-title';
        $tagName = 'h1';
        $element = $this->elementByClass($dom, $tagName, $className);

        if($element !== false) {
            
            $productTitle = $element->nodeValue;

            /**
             * Subsequent product parameters can be discovered using the same method based on tag and class.
             * I'd already retrieved all the category names in use by the site so grabbing the product category assignment also so I can set up categories.
             * In this case the existing site had a 1 to 1 relationship between products and categories. If dealing with a one to many then if the sitemap has unique URLs
             * then simply look for duplicate products in the function saveProduct() and do category assignments as appropriate (assuming your new site can handle a one to many relationship).
             */

            // Look for a category name

            $className = 'detailProductCat';
            $tagName = 'div';
            $element = $this->elementByClass($dom, $tagName, $className);

            if($element !== false) {
                $productCategory = $element->nodeValue;
            }
            else {
                $productCategory = null;
            }

            // And for a product description

            $className = 'detailProductDesc';
            $tagName = 'div';
            $element = $this->elementByClass($dom, $tagName, $className);

            if($element !== false) {
                $productDescription = strip_empty_paras($this->innerHTML($element));
            }
            else {
                $productDescription = null;
            }

            // Now find a price.. in this case the site being analyzed didn't permit different prices for various options on a given product.

            $element = $dom->getElementsByTagName('h2')->item(0); //item() returns null, not false, when there is no h2
            if($element !== null) {
                $productPrice = preg_replace('/[^0-9.]/','',$element->nodeValue);
            }
            else {
                $productPrice = null;
            }

            /** Product Options
            * The site being analyzed used a <select> to present different variations of a given product.
            * So if the product has options find those by finding the select and iterating over the select options.
            * If no <select> is found then it must be a single product with no choices.
            */

            $className = 'cartDdlOptions';
            $tagName = 'select';
            $element = $this->elementByClass($dom, $tagName, $className);

            
            $productOptions = array();
            if($element !== false) {
                $options = $element->getElementsByTagName('option');
                foreach ($options as $option) {
                    $productOptions[] = $option->nodeValue;
                }
            }
            
            /** PRODUCT IMAGES use the same philosophy. In this case the site used a carousel plugin so it was easy to identify the appropriate classname.
            * Images are copied to a local directory for later use.
            * In this case the source site generated image srcs dynamically so typically an image source could look like "/_loadimage.aspx?ID=172236"
            * so the following includes a call to a function that looks in the headers sent to determine the image type to save as.
            */
    

            $className = 'cycle-slide';
            $tagName = 'div';
            $element = $this->elementByClass($dom, $tagName, $className);
            $imagePaths = array();

            if($element !== false) {
                $images = $element->getElementsByTagName('img');
                $i = 0;
                $savePath = 'imagesTemp/';

                foreach ($images as $image) {
                    $src = $this->baseurl.$image->getAttribute('src');

                    //get the file contents

                    $imageString = file_get_contents($src);  

                    if($imageString !== false) {
                        //and work out the file type. Only interested in jpg, gif, or png in this case.

                        $type = $this->find_file_type($src);

                        if($type == 'gif' || $type == 'jpg' || $type == 'jpeg' || $type == 'png') {
                            $ext = str_replace('e', '', $type); //I know jpeg is a valid extension but I don't like it...

                            //save the file with a nice, SEO friendly filename. Codeigniter has a handy helper function, url_title(), that does a good job of cleaning up strings for URLs.

                            $filePath = $savePath.url_title($productTitle).'-'.$i.'.'.$ext;
                            $save = file_put_contents($filePath, $imageString);
                            if($save !== false) {
                                $imagePaths[] = $filePath; //record the path with the real extension rather than a hardcoded .jpg
                                $i++;
                            }
                        }
                    }
                }
            }

            $product = array(
                'productTitle' => $productTitle,
                'productCategory' => $productCategory,
                'productDescription' => $productDescription,
                'productPrice' => $productPrice,
                'productOptions' => $productOptions,
                'imagePaths' => $imagePaths
            );

            // Pass the product data to the saveProduct function that does whatever your own e-commerce platform needs in terms of database and file structure.
            return $this->saveProduct($product);
            
        }
        
        return false;

    }
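One thing to watch with the price clean-up above: stripping everything except digits and dots works for a heading like "£1,299.00", but if the heading holds two prices they get glued together. A slightly more defensive sketch, assuming UK/US style numbers (comma for thousands, dot for decimals) and using a made-up function name cleanPrice():

```php
<?php
// Sketch only: pull the first number out of a price string.
// cleanPrice() is a hypothetical helper; it assumes "1,299.00" style formatting.

function cleanPrice($text)
{
    // a digit, then any digits or thousands commas, then an optional decimal part
    if (preg_match('/\d[\d,]*(?:\.\d+)?/', $text, $m)) {
        return str_replace(',', '', $m[0]);
    }
    return null; // no number found, e.g. "Call for price"
}

var_dump(cleanPrice('£1,299.00'));
var_dump(cleanPrice('From £10.00 to £20.00'));
var_dump(preg_replace('/[^0-9.]/', '', 'From £10.00 to £20.00')); // the strip-everything approach, for comparison
```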

4. Get the HTML

Simple cURL request to fetch the HTML for a given URL

function fetch_html($url)
    {
        

        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $html = curl_exec($ch);
        curl_close($ch);

        return $html;
    }

 

5. Find elements by class

The site being examined used specific classes to identify key areas of markup in the product template. We need to grab those to get to the product parameters. PHP's DOMDocument doesn't include a direct means of accessing nodes by classname so this function takes care of that.

function elementByClass(&$domParent, $tagName, $className)
    {
        /** PHPs DOMDocument() class doesn't include a direct means of identifying nodes by classname.
        * But you can iterate over childnodes looking for the appropriate class attribute
        * I only want the first instance but have structured the function to provide an array of nodes should it be needed
        */ 

        $nodes = array();

        $childNodes = $domParent->getElementsByTagName($tagName);
        $tagCount = 0;

        foreach ($childNodes as $node) {
            if (stripos($node->getAttribute('class'), $className) !== FALSE) {
                $nodes[] = $node;

                //you could just do this if you only ever want the first node.
                //return $node  
            }
        }
           
           //in this case I just want the first

           if(!empty($nodes[0]))  {
               return $nodes[0];
           }
           else {
               return false;
           }
           
    }
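An alternative worth knowing about is DOMXPath, which ships with PHP's DOM extension. The sketch below uses a made-up function name, elementsByClassXPath(), and matches the class as a whole word. That makes it stricter than the stripos() check above, which would also match 'product-title-small' when looking for 'product-title'. It assumes the tag and class names contain no quote characters.

```php
<?php
// Sketch only: find elements by tag and class using XPath.
// elementsByClassXPath() is a hypothetical name; it returns a DOMNodeList.

function elementsByClassXPath(DOMDocument $dom, $tagName, $className)
{
    $xpath = new DOMXPath($dom);

    // pad the class attribute with spaces so the class name only matches as a whole token
    $query = sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tagName,
        $className
    );

    return $xpath->query($query);
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<html><body><h1 class="big product-title">Blue Widget</h1><h1 class="product-title-small">Other</h1></body></html>');
libxml_clear_errors();

$nodes = elementsByClassXPath($dom, 'h1', 'product-title');
if ($nodes->length > 0) {
    echo $nodes->item(0)->nodeValue . "\n";
}
```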

 

6. Inner HTML

The product description exists over multiple paragraphs with tags I was keen to preserve so this helper function does just that.

function innerHTML( $parentNode )
    {
        /* Neat helper function extracts the inner HTML of a DOM node
        * credit to https://kuttler.eu/en/post/php-innerhtml/  for saving me time
         */

        $innerHTML = '';
        $elements = $parentNode->childNodes;

        foreach( $elements as $element ) { 
            if ( $element->nodeType == XML_TEXT_NODE ) {
                $text = $element->nodeValue;
                $innerHTML .= $text;
            }     
            elseif ( $element->nodeType == XML_COMMENT_NODE ) {
                $innerHTML .= '';
            }     
            else {
                $innerHTML .= '<';
                $innerHTML .= $element->nodeName;
                if ( $element->hasAttributes() ) { 
                    $attributes = $element->attributes;
                    foreach ( $attributes as $attribute )
                        $innerHTML .= " {$attribute->nodeName}='{$attribute->nodeValue}'" ;
                }     
                $innerHTML .= '>';
                $innerHTML .= $this->innerHTML( $element );
                $innerHTML .= "</{$element->nodeName}>";
            }     
        }     
        return $innerHTML;
    }
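For what it's worth, DOMDocument::saveHTML() accepts a node argument (PHP 5.3.6 and later), so a shorter variant is possible. This is a sketch with a made-up name, innerHTMLNative(), and its output may differ slightly from the function above in whitespace and attribute quoting. Unlike the version above, it also keeps HTML comments.

```php
<?php
// Sketch only: inner HTML by serialising each child node with saveHTML().
// innerHTMLNative() is a hypothetical name.

function innerHTMLNative(DOMNode $parentNode)
{
    $html = '';
    foreach ($parentNode->childNodes as $child) {
        $html .= $parentNode->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<html><body><div class="detailProductDesc"><p>Two <b>bold</b> words</p></div></body></html>');
libxml_clear_errors();

$div = $dom->getElementsByTagName('div')->item(0);
echo innerHTMLNative($div) . "\n";
```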

 

7.  Image file types

Browsers use the content-type header rather than file extension to determine the type of image file being served. In this case because the site under examination served image data dynamically it's necessary to know what the image type is such that the image can be copied and saved correctly. This function does a simple examination of the headers served from the image src.

function find_file_type($image_src)
    {
        /* browsers use the content-type header to understand if something is an image and what kind it is.
        * this function simply uses PHPs built-in get_headers() to get the headers returned at the image src url and returns the type if it's an image.
        */

        $type = null;

        $headers = get_headers($image_src);
        
        if(!empty($headers)) {
            foreach ($headers as $h) {
                //just looking for an "image/*" string
                if(strpos($h, 'image/') !== FALSE)
                {
                    $dat = array();
                    //extract the type substring; the header may or may not carry a trailing ";" parameter
                    preg_match("/image\/([A-Za-z0-9.+-]+)/", $h, $dat);
                    if(!empty($dat[1])) {
                        return $dat[1];
                    }
                }
            }
        }

        return false;
    }
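Since get_headers() makes a second request for every image, another option is to look at the first few bytes of the data already fetched with file_get_contents(). The sketch below, with the made-up name imageTypeFromBytes(), checks the standard file signatures for JPEG, PNG and GIF only.

```php
<?php
// Sketch only: work out the image type from the file signature (magic bytes).
// imageTypeFromBytes() is a hypothetical helper; it only knows jpg, png and gif.

function imageTypeFromBytes($imageString)
{
    if (strncmp($imageString, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpg';
    }
    if (strncmp($imageString, "\x89PNG\r\n\x1A\n", 8) === 0) {
        return 'png';
    }
    if (strncmp($imageString, 'GIF87a', 6) === 0 || strncmp($imageString, 'GIF89a', 6) === 0) {
        return 'gif';
    }
    return false; // not one of the types of interest
}

var_dump(imageTypeFromBytes("\x89PNG\r\n\x1A\n" . 'rest of file'));
```

In parseProduct() this could be called with $imageString in place of the find_file_type($src) call.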

 

8. Save Product

Just whatever you need to do here...

function saveProduct($product)
    {

        /* function contains whatever you need to do to create the product  in the context of your own site
        * In my case various database operations around products, product options, category assignments, and setting up the file structure for the product images.
        * for the record my e-commerce platform maintains product images in separate folders for each product, it makes user management of them much simpler than having a single repository with thousands of pictures.
        .
        .
        .
        .
        */

    }

 

 

MySQL conditional composite join with a subquery (Ecommerce, Sage Accounts Import)

05 Jan 2018

I'm going to try and make the effort to be a bit more forthcoming with useful dev stuff.. stuff that isn't necessarily obvious that I come across from time to time. So to kick that off here's a little tidbit I needed to figure out just now. My e-commerce platform uses separate tables for products, product variations (eg large, small, black, red.. whatever) and product stock. I won't reproduce the tables in full here but essentially consider the 'products' table as a list of primary, or parent products. The 'product_options' table contains all the children, if applicable, of those products with their own SKUs, prices and so on.
Product stock is maintained in a separate table for various reasons, that contains fields for the product ID, option ID (if applicable), current stock level, and fields for tracking stock movement.

In building a tool for importing product stock data from Sage Accounts I needed a query that would give me a single flat array of products with stock levels. The join would be conditional on whether or not a product had child products, and if it did the join would work on composite fields (i.e product ID and option ID). Now there are many ways of skinning the proverbial SQL cat but this is how I did it.. a subquery to get a flat list of products/product variations with the JOIN  condition to the stock table inside a CASE statement.

I have not benchmarked it for performance as I don't really care, the query is run once as an admin task during the parsing and error checking of an imported CSV file that in this case runs to in excess of 20,000 records, in that context a few milliseconds either way is not a worry.

 

SELECT a.*,b.stock 
FROM 
(
    SELECT p.product_id, p.name, p.price,o.option_name, p.sku, o.option_id, o.sku AS option_sku, o.option_price
    FROM products p 
    LEFT JOIN product_options o ON p.product_id = o.parent_id
    WHERE p.deleted = 0 AND (o.deleted = 0 OR o.deleted IS NULL) 
) AS a 
LEFT JOIN product_stock b ON (CASE WHEN a.option_id IS NULL  
                                   THEN b.product_id = a.product_id
                                   ELSE (b.product_id = a.product_id AND a.option_id = b.option_id)
                               END)
ORDER BY a.sku ASC;
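To make the join rule concrete, here is the same matching logic applied to plain PHP arrays. It is purely an illustration with made-up data and made-up function names (stockMatches(), flattenWithStock()); the real work is of course done by MySQL.

```php
<?php
// Illustration only: the CASE-based join condition, emulated on arrays.

function stockMatches(array $row, array $stock)
{
    if ($row['option_id'] === null) {
        // parent product with no options: match on product ID alone
        return $stock['product_id'] === $row['product_id'];
    }
    // product variation: match on the composite of product ID and option ID
    return $stock['product_id'] === $row['product_id']
        && $stock['option_id'] === $row['option_id'];
}

function flattenWithStock(array $rows, array $stockRows)
{
    $out = array();
    foreach ($rows as $row) {
        $matched = false;
        foreach ($stockRows as $stock) {
            if (stockMatches($row, $stock)) {
                $out[] = $row + array('stock' => $stock['stock']);
                $matched = true;
            }
        }
        if (!$matched) {
            $out[] = $row + array('stock' => null); // a LEFT JOIN keeps rows with no stock record
        }
    }
    return $out;
}

$rows = array(
    array('product_id' => 1, 'option_id' => null),
    array('product_id' => 2, 'option_id' => 10),
    array('product_id' => 2, 'option_id' => 11),
    array('product_id' => 3, 'option_id' => null),
);
$stockRows = array(
    array('product_id' => 1, 'option_id' => null, 'stock' => 5),
    array('product_id' => 2, 'option_id' => 10, 'stock' => 3),
    array('product_id' => 2, 'option_id' => 11, 'stock' => 0),
);

print_r(flattenWithStock($rows, $stockRows));
```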

 

There. Might come in handy if you have a similar problem especially if you don't necessarily always find SQL syntax completely intuitive.

 

ArtEye Creative Consultancy

22 Dec 2016

Earlier this year and in conjunction with Design Room Cornwall we launched a new website for Cornwall-based ArtEye. The brief was quite demanding with requirements for a high end brochure feel appropriate to the market in which ArtEye operates, rich magazine-style layouts, lots of portfolios, a members area with paid subscriptions, premium content, client preferences and access to consultancy services, amongst others. The end result has been very well received and over the few months since the site was launched is growing to be a terrific resource of fine art related material. To achieve the magazine style layouts while still retaining a super intuitive admin user interface I developed a number of 'widgets' for the editors that allow a variety of content wrappers to be inserted into single editor instances that can then be configured and styled to achieve the complex layouts required. It means that essentially the page layouts can be created on the fly inside the editors by an admin user without any need for multiple templates, or complex editor setups. The site uses the mikesimagination.net-developed modular content management system (CMS) and hence is well placed to accommodate updates and changes for years into the future with minimal overheads as the business evolves.

Anyway, the site has stacks of content and some super photography so rather than go on and on about it on here you might as well just go and look... https://www.arteye.co.uk/.

A few screengrabs below too. Do get in touch if you'd like to know more about the work behind the site, the features or any related query.. especially if you're looking for a developer to build something super...

[Screengrabs of the ArtEye Creative Consultancy website]

 

Elementum Journal

28 Jun 2016

I have quite a backlog of material to post as the last few months have been very busy, so by way of starting to make a dent in that backlog I thought I would take a moment to mention something I was privileged to become involved with this past winter. It is also easy for me to write about, as all the hard work like thinking of appropriate words to use has been done for me ready for the launch, which happened, to a terrific reception, back in April here in Cornwall at the Porthleven Festival.

 

Elementum Journal cover image

 

Elementum is a journal of new writing and visual arts that explores our connection to the natural world. We bring together the scientist’s findings with the artist’s response, the ecologist’s observation with the writer’s reflection, to offer the reader an immersive and insightful publication.

 

Surrounded by the Atlantic, we draw inspiration from the rocks and ruins, wildlife and flora, sea, sky and folklore of the Cornish peninsula. In seeking the spirit of place we will also look beyond these borders to other stories that link people with nature and landscape.

 

Guided by a different theme for each edition, Elementum will publish three times a year.

 


Elementum Journal preview

 


Elementum is based in Cornwall and it was a real privilege to be invited onboard with the project. It’s going to be beautiful and I do genuinely feel lucky to have been asked to build it. Do check out the website, and you can find out more about the site in my portfolio. You can also download an issue 1 preview from the ‘In Print’ page, and the online shop for purchasing subscriptions, fine art prints and so on is accepting orders in advance of the first print edition. If you're interested please do sign up for the newsletter via the form in the footer of the site to keep up to date as the launch continues. Having seen some of what’s coming from the writers, photographers and artists already on-board it will be well worth it.

 

Elementum Journal e-commerce website

Custom Configurator App

30 Dec 2015

Time for an update on the custom bicycle configurator application that I introduced in my last post. It's just about complete and live with just a couple of jobs left to do: essentially the social media sharing tools, so that users can share their builds around the facebook-and-twitter-verse. All of the other functionality is complete. Aside from building bikes, users can save their builds, print them, ask questions about them (component choices, sizing etc etc) and apply for finance online via the V12 Retail Finance API. The chaps at Cycle Logic are still populating the categories so the MTB flavour is not yet available at the time of writing, but the road flavour is just about populated with the exception of a few fork choices to go with the rather lovely Seven Cycles custom frames.

 

It's coded as a module for my e-commerce platform so integrates directly with the product catalogue and is fully user configurable such that it can be set up with any number of categories, and could quite easily be used for pretty much any situation where a product can be configured - not just bicycles.. kayaks, boats, prefab buildings, wetsuits... lots of different sporting goods applications spring to mind. It can also be set-up to run as a standalone application in a sub-directory or subdomain alongside an existing site.

 

A few screenshots follow but really it's just better to go and have a play. It's here:

 http://www.cyclelogic.co.uk/configurator/index/asphalt

Project Round Up

29 Jul 2015

I thought I would take a moment to talk about a few of the things I'm working on at the moment. Each deserves a post of its own as they all have interesting aspects to them, but by way of a quick run down prior to more detail... here's a summary:

 

 

Advanced Chemical Intermediates

 

I'm really enjoying working on this one, coming from an engineering/scientific background the subject matter is interesting in its own right but in addition to that the project had a set of very specific requirements that have required some original thinking with regards to the best way to deliver.. a high level of satisfaction with this one.

 

Advanced Chemical Intermediates, ACINTs, to use their own words "specialise in the synthesis of new chemical entities, functionalised intermediates and building blocks". Their primary customer is the drug discovery industry  - offering a range of chemistry services including custom synthesis, contract research, and consultancy services.  They were referred to me having failed to find an agency able to commit to delivery on their requirements.

ACINTs use a specialised offline database tool for managing their product catalogue. One of their requirements was that they continue to use that database rather than move everything into the cloud. It's an old tool and there is no direct means of accessing it remotely so one of the key requirements was a flexible, multi-step import/export process between that tool and the cloud-based database driving the website.

 

The public site also incorporates a graphical tool that permits a user to draw a structure and find matches on that in the database... as a result of that I learned about SMILES strings (I really like the diversity of stuff I get asked to do :-). The editor instance itself is a JavaScript app that has a number of methods available that allow a developer to do cool things like, in my case, extract the canonical SMILES string to match against the products database and pre-load the editor with a molecular structure.

Other things... with more than 3500 products in the database there also had to be a way to offer a very structured search for users so I built a tool that allows the users to make choices based on Functional Groups and/or Ring Systems as a way of making a very efficient, targeted search. It's working really well.

 

I suppose really the key with this was to build something very much better in terms of tools, usability and search engine effectiveness than their competitors. It's not live yet but will be very soon, and based on feedback so far we seem to have achieved, and surpassed, that goal. Result!

 

 

ADI Access

 

"RoomMate, a real issue, a revolutionary solution". I mentioned this one briefly in an earlier post but it's now up and running and working really well. There is also a good overview on the project crowdfunding page if you'd like to know more about the background of it.

 

The 'uniqueness' of this project centers around the configuring of devices by the user during the order process. The admin tools offer a drag and drop interface for designing the available configurations of a device, i.e. a device configuration is divided into any number of steps and then within each of those steps the user can choose from a number of selections that describe the installation location of the device. When a customer orders and configures a device online the application builds the MP3 files appropriate to the chosen configuration and delivers those to the manufacturer's portal ready for direct download into the devices. This was also a project I was asked to pick up after a previous developer was unable to deliver a solution that didn't involve lots of manual work building the device configurations each time an order was received.

It's a great product with a terrific team behind it that has the potential to impact millions of lives so it has been particularly satisfying to build something that facilitates that.

Looking to the future there will be versions of the device that are designed to be updateable with new configurations in the field - for example a hotel might have a number of devices that can be placed in rooms on request and each room might be different - so a customer portal that allows a device to be reconfigured and updated from any PC with a USB port.

 

 

Pure Nuff Stuff

 

I recently launched a new ecommerce site for Penzance-based natural skincare specialists Pure Nuff Stuff. They're a business that currently sells within the UK and overseas and is looking to grow its export sales. It's a very bespoke site with the full set of tools available on my ecommerce platform, plus tools for their European wholesalers, order fulfilment agents and so on. Wholesalers love the ability to register/validate their VAT numbers and the fact that it seamlessly shows prices in some 140 currencies, all with real time exchange rates. It's allowed them to revise their own pricing to match the market. The ordinary customers appear to love it too, with ease of access to the products and ease of ordering being top of the agenda. It is a personal bugbear that I do see some very cool looking ecommerce sites, but when it comes to actually trying to find what you want or actually buy something then it's too often a complete ballache. For that reason I tend to favour very simple layouts for selling online that put the products in the forefront and allow the site owner to feature new products, seasonal products etc etc without having to engage a developer or designer to do the work.

I'm not publicising the site too much in my portfolio yet because the new product photography isn't ready yet so we're using pictures off the old site. They're not great.. but when the new stuff, courtesy of Exile Design, is done, and in combo with some great graphic design from Heather Allen, it will look fab. They've been, and are, terrific fun to work with which is great, and makes it even more satisfying to be able to make such a big difference to their business "can't TELL you how happy I am with how it works. And everyone here is just gob-smacked at how much time this saves us every day, honestly, we're such happy bunnies".

 

Ok that'll do for now.. there are a number of other bits and pieces ongoing: an online bike configurator and finance application tool for CycleLogic, enhancements and new modules for ProCare Sports Medicine and Camtec Photo in Montréal, WorldBlu's business learning/gamification platform continues to evolve, and a bunch of other stuff. Oh, I even got asked to develop some funkiness in conjunction with the most excellent Exile Design for a Surfers Against Sewage campaign. Not dull :-)
