About a year ago, I had a wreck. I totaled my car. I took my eyes off the road to look at my son in the back seat. I put two of my children in danger. Luckily, everything turned out alright.
This week, I attended the Velocity Conference. It’s not my first time; I have attended every one except last year's. Velocity is all about Web Performance and Operations, and I mostly attended the web and mobile performance tracks. I was quickly reminded (like, first day, first session) of many things I have been wanting to implement to help me know how DealNews.com is doing performance-wise. So, as I often do at conferences, I started hacking. This was Tuesday.
By Wednesday morning, I had some stats. Those stats led to more questions, so I refactored some of the stats I was collecting. By dinner, I had good data about our page performance. I was pissed... at myself. As I said before, I didn’t attend Velocity in 2012. In 2012, I attended other things not related to web performance. In doing so, I took my eye off the road. Or in this case, off the performance of DealNews.com.
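The post doesn't say how those stats were collected or what DealNews runs on. As a rough illustration only, here is a minimal Python sketch of per-section page timing, assuming the page is built in named sections (header, body, and so on) and that samples are kept in memory. `timed`, `timings`, and `average_ms` are hypothetical names; a real setup would ship samples to a metrics backend rather than a dict.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: section name -> elapsed times in milliseconds.
timings = defaultdict(list)


@contextmanager
def timed(section):
    """Record the wall-clock time spent inside a named section of a page build."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[section].append((time.perf_counter() - start) * 1000.0)


def average_ms(section):
    """Mean elapsed time for a section, or 0.0 if it was never timed."""
    samples = timings.get(section, [])
    return sum(samples) / len(samples) if samples else 0.0


# Usage: wrap each part of the page build in its own timer.
with timed("header"):
    time.sleep(0.01)  # stand-in for the real header work

print(f"header: {average_ms('header'):.1f} ms")
```

Timing each section separately is what makes it possible to say where the time goes, rather than relying on a theory about it.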
Now, we still get an A from WebPageTest for first byte. We don’t really get any bad scores. We aren’t doing poorly. The site performance is just nowhere near where I want it to be, and nowhere near where I have been telling people it was. We deliver the first byte in around 500ms for a request that can use the cache well. We render the above-the-fold content in about 1.5 seconds. I have seen way worse sites out there. But sometimes, it’s about yardsticks.
You see, if you are measuring with a broken, worn-out yardstick, it may not be an actual yard. You need to measure with the latest, greatest, laser-cut yardstick. So, when I compare DealNews performance with others, I look to the best of the best.
Amazon, ShopZilla, and others have openly talked about performance and business success being directly correlated. If that is true for DealNews, there is low-hanging fruit to improve our business. And apparently that fruit has been hanging so long it is rotting.
I have already found 480ms in the header that I can trim. I am not sure yet how much I can reduce it, but it can be faster. I am hoping to get it down to 100ms. That would be a huge savings, as our header currently finishes in about 980ms on average; it would cut more than 25% of our header load time. And that is just the first thing I have found.
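A quick check on that estimate, assuming the 480ms component drops all the way to the 100ms target and nothing else in the header changes:

```latex
\text{savings} = 480\,\text{ms} - 100\,\text{ms} = 380\,\text{ms},
\qquad
\frac{380\,\text{ms}}{980\,\text{ms}} \approx 0.388 \approx 39\%
```

So "more than 25%" is a conservative way of putting it; hitting the target would remove closer to 39% of the average header time.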
I saw other good talks that will help me get back on track as well. One talked about premature optimization. Before I put in the new metrics, I had a theory about what was taking up that time. I was wrong. Not totally wrong: that thing is still taking 150ms, so it is next on the list. But the other issue bothers me more, because I assumed it was a non-issue and it caught me by surprise.
I will post anything I think is useful to the general public. Right now, it looks like code and feature bloat. That is not all that interesting.
One such product was Urchin. Years ago, we were users of Urchin, the product and company that Google purchased to create GA. It was the last really good log analyzer out there. We were kind of excited when Google bought them, as we were looking at moving to their hosted solution, which eventually became GA. At the time, however, they were young, and we decided to go with Omniture, the 800-pound gorilla in the space. However, Omniture's pricing meant that increased traffic brought increased cost, with no added value in their numbers and no new features in their product. So, we left them and turned back to GA. In addition, we started using Yahoo's recently acquired product, Yahoo Analytics (formerly IndexTools). They were both free, so we figured, why not use them both? All this was over 2 years ago.
Recently we decided to try to use the GA API to get page view, visit, and unique visitor data about parts of our site so we could, at least at a high level, tie the numbers together in one report. This gets me to the meat of this post. What we found was quite disturbing. This is a query for a single day for a particular segment of our site. We used the Data Feed Query Explorer to craft our queries for the data we wanted. This was the result.
As you can see, some pages received visitors but no visits. Pages consistently received more visitors than visits, which I find quite odd. Sometimes our internal logging would show activity on pages and Google would simply not show any activity for that page for an entire month. Now, I know they have a cutoff. I believe it is still 10,000, the old Urchin number. (C'mon Google, you are Google. You still have this?) They will only store up to 10,000 unique items (page URLs in this case) for a given segment of their reporting data: 10k referring domains, 10k pages, and so on. Perhaps that is what happened? We dug in and found that was not the case. Besides, it shows visitors but not visits. What is up with that? Maybe it's a date thing. We should look at more days. This is an entire month for the same filters.
Whoa, we still have visitors and no visits. In addition, there are lots of multiples of 17: 17 and 34 both appear a lot. We found them littered throughout our results when filtering on the page path. This has to be made-up data or something. I don't see how it could be based in any reality.
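To hunt for these patterns across many exported rows rather than by eye, a small filter helps. This is a hypothetical Python sketch: the `Row` type and its field names are assumptions, and the checks only encode the expectations described above (visits at least as large as visitors, and a suspicious recurrence of multiples of 17, checked in both columns since the post doesn't say which one). They do not account for how GA itself scopes each metric.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One exported report row (hypothetical field names)."""
    page_path: str
    visits: int
    visitors: int


def flag_anomalies(rows, suspicious_factor=17):
    """Return (page_path, reason) pairs for rows matching the odd patterns."""
    flagged = []
    for r in rows:
        if r.visitors > 0 and r.visits == 0:
            flagged.append((r.page_path, "visitors but no visits"))
        elif r.visitors > r.visits:
            flagged.append((r.page_path, "more visitors than visits"))
        # Nonzero values that are exact multiples of the suspicious factor.
        for name, value in (("visits", r.visits), ("visitors", r.visitors)):
            if value and value % suspicious_factor == 0:
                flagged.append((r.page_path, f"{name} is a multiple of {suspicious_factor}"))
    return flagged


# Hypothetical sample rows, not real DealNews data.
sample = [
    Row("/deals/a", visits=0, visitors=17),
    Row("/deals/b", visits=5, visitors=3),
]
for path, reason in flag_anomalies(sample):
    print(path, reason)
```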
The odd data was not limited to this apparently missing data. There is apparently extra data in there too. In another discussion about the impact of recent social marketing efforts, we went to the analytics to see what kind of traffic social networking sites were sending our way. When comparing GA to YA and our internal numbers, we were left baffled. Sorry, no numbers allowed here; super corporate secrets and all that. But I can tell you that Google Analytics claims that several large, well-known referring sites send us 5x to 10x the traffic that our other tracking reports. I even wrote custom code to follow users through our logs, to see whether GA might be crediting a visitor's second visit of the day to the referrer from their earlier visit, but I could never find any pattern that matched their data. There simply is no way this is right.
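The post doesn't include that log-following code. Below is a minimal Python sketch of the same idea under stated assumptions: each log entry is a hypothetical `(visitor_id, timestamp, referrer)` tuple, a visit ends after 30 minutes of inactivity (an assumed timeout), and only a visit's first hit carries its referrer. It tallies visits per referrer two ways, once crediting each visit to its own referrer and once crediting later visits to the visitor's first external referrer of the day, so both tallies can be compared against the GA report.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumed inactivity timeout that ends a visit.
SESSION_GAP = timedelta(minutes=30)


def split_visits(hits):
    """Split one visitor's time-sorted (timestamp, referrer) hits into visits."""
    visits, current, last = [], [], None
    for ts, ref in hits:
        if last is not None and ts - last > SESSION_GAP:
            visits.append(current)
            current = []
        current.append((ts, ref))
        last = ts
    if current:
        visits.append(current)
    return visits


def referrer_counts(log):
    """Tally visits per referrer two ways.

    log: iterable of (visitor_id, datetime, referrer_domain or None).
    per_visit:    each visit credited to the referrer on its own first hit.
    first_of_day: each visit credited to the visitor's first external
                  referrer that day, once one has been seen.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, ref in log:
        by_visitor[visitor].append((ts, ref))

    per_visit, first_of_day = Counter(), Counter()
    for hits in by_visitor.values():
        hits.sort(key=lambda hit: hit[0])
        day_ref = {}
        for visit in split_visits(hits):
            ts, ref = visit[0]
            per_visit[ref] += 1
            day = ts.date()
            if day not in day_ref and ref is not None:
                day_ref[day] = ref
            first_of_day[day_ref.get(day, ref)] += 1
    return per_visit, first_of_day


# Hypothetical log: one visitor arrives from a social site, returns directly later.
t0 = datetime(2013, 6, 1, 9, 0)
log = [
    ("v1", t0, "social.example"),
    ("v1", t0 + timedelta(hours=2), None),  # later direct visit, same day
]
per_visit, first_of_day = referrer_counts(log)
print(per_visit["social.example"], first_of_day["social.example"])  # prints: 1 2
```

If GA's referrer totals matched the `first_of_day` tally far better than the `per_visit` one, that would point to this kind of attribution; per the post, no such pattern turned up.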
There is good news. If you stick with the high-level numbers like overall page views, visitors, visits, and new vs. returning visitors, the numbers appear to line up well. In all cases, the differences among our three sources are within acceptable ranges. But, apparently, drilling down too deep into the data with GA will yield some very unsavory answers.
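What counts as an "acceptable range" isn't specified. One simple way to make that check repeatable is to compare each source's total to the median of the three. The 5% tolerance, the source names, and the totals below are assumptions for illustration only.

```python
from statistics import median


def relative_deviations(counts):
    """Fractional difference of each source's total from the median total."""
    mid = median(counts.values())
    if mid == 0:
        raise ValueError("median total is zero; deviations are undefined")
    return {source: (n - mid) / mid for source, n in counts.items()}


def agree(counts, tolerance=0.05):
    """True if every source is within `tolerance` of the median (0.05 = 5%)."""
    return all(abs(d) <= tolerance for d in relative_deviations(counts).values())


# Hypothetical daily page-view totals from the three sources.
print(agree({"ga": 1000, "ya": 1030, "internal": 980}))  # prints: True
```

Using the median rather than any single source as the baseline avoids treating one tool as ground truth, which matters when the tools disagree the way the referrer numbers do.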
This got me thinking that I had not looked at the browser stats very much lately. dealnews has a very odd graph on browser statistics; we do not follow the industry averages. Our audience is predominantly tech savvy (that does not mean geeks). Our users don't just use the stuff that is installed on the computer when they get it. This kind of proves my point about ad blocking even more: we have non-moron users and they still don't block ads.
(Table: browser share, % of visits)
As you can see, Firefox is very prevalent on our site. We generally test in IE7/8, Firefox 3, Safari, and Chrome. I will occasionally test a major change in Opera. Typically, well-formed HTML and CSS works fine in Opera, so everything is all good.
As for operating systems, Windows still dominates, but I would guess we have more Macs than the average site.
(Table: operating system share, % of visits)
Interesting that iPhone beats out Linux. That is just another sign to me that Linux is still not a real choice for real people. Whether that is a product issue from OEMs or user choice is debatable. It is notable that most of our company uses Macs, but I don't think we make up a speck of that traffic. If we did, our home state of Alabama would be our most dominant. It isn't. We are very typical in that regard: California is number one, and we only have one employee there.