Squarepeg Systems


For price quotes or general questions on what we can do for you, please use the following form. Someone will be back with you in one workday, but usually within four hours during the week. If it's something urgent, please call 402-577-0641

Searching For What You Can't Describe

I've been working the Powerhouse Distributing site for about a year and a half. The site has been live for not quite a year, yet I've been busy with changes, whether it's new functionality, fixing bugs, server maintenance, etc. It's been a lot of fun and they are a great customer to work with.

We've had to make quite a few adjustments along the way but the biggest one is with how search is done. Powerhouse Distributing seemed like a normal ecommerce site at the beginning, but when you look closer they aren't at all - because users know what they want to buy, but they don't know what to call it. Let me explain . .

Powerhouse Distribution sells replacement parts for power tools. So, if the guard breaks on your DeWalt saw, you can buy a new guard instead of replacing the entire saw. The trick is -- all you know is that it's a guard on a DeWalt circular saw. You may know the model number of the saw, but that is about it. Normally, because of wear and tear on the tools, a lot of users only have a partial model number. Therefore the user only has partial information of what they are looking for (an upper guard from a DeWalt circular saw whose model number begins with DC3) and they somehow have to get to here.

This is subtly, yet significantly, different than how we usually shop. I know that I want a science fiction book but not always sure which one I want. I go to Amazon, search for scifi books, and then pick one. Or I want to get a Thomas the Tank Engine toy for my son -- I know the exact one I want, and I know how to type that. But, in the case of users of the Powerhouse site, they know exactly what they want (they can look it) but it's hard to describe. How do you make a search for that?

The original search implementation used Whoosh via the django-haystack interface. I'd admit that the original implementation was bit naive, but it did the job of the original set to tools. But as Powerhouse added tools over the next few months, Whoosh took longer to index the data and searches on the site also took longer. Finally, there was a large jump in Whoosh's index time. When the site had 1500 tools, Whoosh was taking two hours every night to index them.

Not only was the backend jobs taking longer, the site was suffering too. I didn't know how bad it was until I started monitoring the site with New Relic. New Relic was able to show me that searches were taking an average of 24 seconds to complete. If that wasn't bad enough, the memory on the server shot up to 95% when someone did a search and it took hours to drop to a reasonable level (that is, if no one else did a search. And someone always did). So we had a bad user experience with something that brought the server to it's knees. It was time to re-think how we did search.

My first inclination was to use ElasticSearch. I've used it before, it's works really well and I knew that it could easily handle the load. But it seemed too heavyweight for this. I didn't need yet another process to monitor, and Powerhouse only updated their site once a day, so most of the time ElasticSearch would be idle. So a lot of that power would be wasted.

I made the choice early on to use PostgreSQL as the database behind Powerhouse's site and, once again, PostgreSQL comes to the rescue with it's full text search functionality. I modeled an implementation based on this excellent post to integrate it with Django. The initial results were fast -- very fast, in fact. I coded up a more robust solution and had Powerhouse test it out. After making a few tweaks based on their feedback, we shut off the old and turned on the new.

Through monitoring I can now tell that the PostgreSQL-powered search now returns in 84ms. That's right -- 0.084 seconds, as opposed to 24 seconds. And doing a search did not have any noticeable effect on the server itself -- it's memory use doesn't change at all. However the best part is that orders suddenly increased after we put in the new search.. I asked Powerhouse if they did anything different with their Google Ads (the main source of their traffic) and they said they didn't. The only change to the site was the new search.

The increase of sales makes perfect sense -- are you going to wait for 24 seconds to a search to come back? Of course not! Making search not be annoying had an immediate return on investment. What business wouldn't want to see that?

posted at:Friday, 10 August 2012 | postgresql, sites, satchmo, django