Taking Your Eyes Off The Road


Source: http://www.flickr.com/photos/viernest/3380560365/

About a year ago, I had a wreck. I totaled my car. I took my eyes off the road to look at my son in the back seat. I put two of my children in danger. Luckily, everything turned out alright.

This week, I have been attending the Velocity Conference. It’s not my first time; I have attended all of them but the one last year. Velocity is all about Web Performance and Operations. I attended mostly web and mobile performance tracks. I was quickly reminded (like, first day, first session) of many things I have been wanting to implement to help me know how DealNews.com is doing performance-wise. So, like I often do at conferences, I started hacking. This was Tuesday.

By Wednesday morning, I had some stats. Those stats led to more questions. I refactored some of the stats I was collecting. By dinner, I had good data about our page performance. I was pissed.... at myself. As I said before, I didn’t attend Velocity in 2012. In 2012, I attended other things not related to web performance. In doing so, I took my eye off the road. Or in this case, off the performance of DealNews.com.

Now, we still get an A from WebPageTest for first byte. We don’t get any bad scores, really. We aren’t doing poorly. The site performance is just nowhere near where I want it to be. And it is nowhere near where I have been telling people it was. We deliver the first byte in around 500ms for a request that can use cache well. We draw the above-the-fold content in about 1.5 seconds. I have seen way worse sites out there. But, sometimes, it’s about yardsticks.


Source: http://www.flickr.com/photos/billhd/3048457153/

You see, if you are measuring using a broken, worn out yardstick, it may not be an actual yard. You need to measure using the latest, greatest, laser-cut yardstick. So, when I compare DealNews performance with others, I look to the best of the best.

Amazon, ShopZilla, and others have openly talked about performance and business success being directly correlated. If that is true for DealNews, there is low-hanging fruit to improve our business. And apparently that fruit is rotting, it has been hanging so long.

I have already found 480ms in the header I can trim down. I am not sure yet how much I can reduce it, but it can be faster. I am hoping I can get it down to 100ms. That would be a huge savings as our header currently finishes in about 980ms on average. That would be cutting more than 25% of our header load time completely out. And that is just the first thing I have found.

I saw other good talks that will help me get back on track as well. One talked about premature optimization. Before I put in the new metrics, I had a theory on what was taking up that time. I was wrong. Not totally wrong. That thing is still taking 150ms, so it is next on the list. But, the other issue is clearly more problematic to me since I assumed it was a non-issue and it caught me by surprise.

If you are asking “Brian, how are you doing this?” I am glad you asked. I am using the window.performance.timing object available in newer browsers. After the onload event fires, a script gathers up this data and sends it back to our servers in an XHR request. Server-side code then takes that data, does a little math where needed, and sends it all through StatsD, which in turn shoves it into Graphite. That lets me build graphs and get the data as JSON. That second part is key, as I will want to put some automated monitoring on this data to keep an eye on when it may go bad again. There were a lot of talks this week about detecting faults and anomalies as well. So, I will put that to good use with the help of a coworker who loves the hard math problems. If you don’t have those things in your stack already, SOASTA mPulse appears to be a good option. I was impressed with Philip Tellis from SOASTA in his talk about JavaScript load blocking. Since the mPulse code runs in a JavaScript tag, I was happy to hear he was so concerned with how it affected their users’ performance.
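For anyone wondering what that collection looks like in practice, here is a minimal sketch of the browser side. The /perf-beacon endpoint and the metric names are placeholders I made up for illustration; the window.performance.timing fields are the real Navigation Timing properties.

    // Minimal sketch of a Navigation Timing beacon.
    // "/perf-beacon" and the metric names are placeholders; the
    // window.performance.timing fields come from the Navigation Timing API.
    window.addEventListener("load", function () {
        // Wait one tick so loadEventEnd has been filled in.
        setTimeout(function () {
            if (!window.performance || !window.performance.timing) {
                return; // older browser, nothing to report
            }
            var t = window.performance.timing;
            var metrics = {
                firstByte: t.responseStart - t.navigationStart,
                domReady:  t.domContentLoadedEventEnd - t.navigationStart,
                onload:    t.loadEventEnd - t.navigationStart
            };
            var xhr = new XMLHttpRequest();
            xhr.open("POST", "/perf-beacon", true);
            xhr.setRequestHeader("Content-Type", "application/json");
            xhr.send(JSON.stringify(metrics));
            // On the server, a small handler hands these values to StatsD
            // as timers, and StatsD flushes them into Graphite.
        }, 0);
    });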

I will post anything I think is useful to the general public. Right now, it looks like code and feature bloat. That is not all that interesting.

Lies, Damned Lies and Google Analytics

Google Analytics (GA) has changed the world of web analytics. It used to be that you only had applications that analyzed logs from your web server. Those were OK for the first few years. But, with bots (and especially ones that lied about being a bot) and more complex web architectures, those logs became less useful for understanding how your users used your web site.

One such product was Urchin. Years ago, we were users of Urchin, the product and company that Google purchased to create GA. It was the last really good log analyzer out there. We were kind of excited when Google bought them, as we were looking at going to their hosted solution, which eventually became GA. At the time, however, they were young, and we decided to go with Omniture, the 800-pound gorilla in the space. However, Omniture's prices were such that increased traffic meant increased cost with no additional value in their numbers and no new features from their product. So, we left them. We turned back to GA. In addition, we started using Yahoo's recently acquired product, Yahoo Analytics (formerly IndexTools). They were both free, so we figured why not use them both. All this was over two years ago.

The one big thing lacking for us in all these tools was the ability to tie actual user activity in the analytics to revenue. You see, dealnews does some affiliate-based business. This means that we get a percentage of sales after the transactions have all closed. So, we may not see revenue for a user action for days or even weeks. All the analytics packages are made for shopping carts. In a shopping cart, you know at checkout how much the user spent, so you can inject that into the system along with their page view. We don't have that luxury. To try and fill this gap, we started keeping our own JavaScript-based session data in logs for various purposes, in addition to the two analytics systems.
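To give a rough idea of what I mean by that, here is a sketch of the kind of client-side logging involved. The dn_session cookie name and the /session-log endpoint are placeholders, not our actual implementation.

    // Sketch of JavaScript-based session logging. The dn_session cookie
    // name and the /session-log endpoint are placeholders.
    function getSessionId() {
        var match = document.cookie.match(/(?:^|; )dn_session=([^;]+)/);
        if (match) {
            return match[1];
        }
        var id = Date.now().toString(36) + Math.random().toString(36).slice(2);
        document.cookie = "dn_session=" + id + "; path=/";
        return id;
    }

    // Record the page view with the session id so that affiliate revenue
    // reported days or weeks later can be joined back to this activity.
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/session-log", true);
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.send(JSON.stringify({
        session: getSessionId(),
        page: location.pathname,
        referrer: document.referrer,
        ts: Date.now()
    }));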

Recently we decided to try and use the GA API to get page views, visits and unique visitor data about parts of our site so we could, at least at a high level, tie the numbers all together in one report. This gets me to the meat of this post. What we found was quite disturbing. This is a query for a single day for a particular segment of our site. We used the Data Feed Query Explorer to craft our queries for the data we wanted. This was the result.



As you can see, some pages received visitors but no visits. Pages consistently received more visitors than visits, which I find quite odd. Sometimes we found that our internal logging would show activity on pages and Google would simply not show any activity for that page for an entire month. Now, I know they have a cutoff. I believe it is still 10,000, the old Urchin number. (C'mon Google, you are Google. You still have this?) They will only store up to 10,000 unique items (page URLs in this case) for a given segment of their reporting data. So, 10k referring domains, 10k pages, etc. Perhaps that is what happened? We dug in and found that not to be the case. And besides, it shows visitors but not visits. What is up with that? Maybe it's a date thing. We should look at more days. This is an entire month for the same filters.



Whoa, we still have visitors and no visits. In addition, there are lots of multiples of 17 in there; 17 and 34 both appear a lot. We found that littered throughout our results when filtering on the page path. This has to be made up data or something. I don't see how it could be based on anything real.
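For context, the queries behind those numbers were roughly of this shape. This is only a sketch of the kind of query the Data Feed Query Explorer builds against the old Data Export API feed; the table ID, dates, and page path filter below are placeholders, not our real values.

    // Sketch of a Data Export API query like the ones the Data Feed
    // Query Explorer builds. The table id, dates, and filter are placeholders.
    var params = {
        "ids":         "ga:12345678",                  // placeholder profile id
        "dimensions":  "ga:pagePath",
        "metrics":     "ga:pageviews,ga:visits,ga:visitors",
        "filters":     "ga:pagePath=~^/some-section/", // placeholder section filter
        "start-date":  "2011-06-01",                   // placeholder dates
        "end-date":    "2011-06-30",
        "max-results": "10000"
    };
    var query = Object.keys(params).map(function (k) {
        return encodeURIComponent(k) + "=" + encodeURIComponent(params[k]);
    }).join("&");
    var feedUrl = "https://www.google.com/analytics/feeds/data?" + query;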

The oddities were not limited to apparently missing data. There is extra data in there too. In another discussion, about the impact of recent social marketing efforts, we went to the analytics to see what kind of traffic social networking sites were sending our way. When comparing GA to YA and our internal numbers, we were left baffled. Sorry, no numbers allowed here - super corporate secrets and all that. But, I can tell you that Google Analytics claims that several large, well known referring sites send us 5x to 10x the traffic that our other tracking reports. I even wrote custom code to try and follow a user through our logs, to see if maybe they were attributing a visitor's second visit in the day to the referrer of their earlier visit, but I could never find any pattern that matched their data. There simply is no way this is right.

There is good news. If you stick with the high level numbers like overall page views, visitors, visits, and things like new vs. returning visitors, the numbers appear to line up well. In all cases, the differences among our three sources are within acceptable ranges. But, apparently, drilling down too deep into the data with GA will yield some very unsavory answers.

State of the Browsers and Ad Blocking

In my last post, about CSS layout and ads, a commenter brought up that the dealnews.com web site did not handle extensions like Ad Block very gracefully. To which I responded that I don't care. To which he responded with download counts. Well, the reason I don't care is that ad impressions, when compared to page views on dealnews.com, are within 2% of each other. So, at most, less than 2% of users are blocking ads. In reality, that difference is going to include some DNS failures, network issues, or something else. I would bet our logo graphic has about the same gap. The reality is that normal people don't block ads. In my opinion, if you make your money by working on the web, you shouldn't either. I should add that this site's (my geeky blog) ad views were about 16% lower than the recorded page views. So, geeks block ads more, I guess. But, geeks have dominated the web for a long time.
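If you wanted to measure this directly in the browser, rather than comparing ad server impressions to page views the way we did, one common approach is to check whether the ad container actually rendered and beacon the result back. This is just an illustrative sketch, not what we run; the element id and endpoint are made up.

    // Illustrative ad-block check, not what dealnews runs.
    // "ad-slot-top" and "/adblock-beacon" are made-up placeholders.
    window.addEventListener("load", function () {
        var slot = document.getElementById("ad-slot-top");
        // If the ad script was blocked, the slot is usually missing,
        // hidden, or left with zero height.
        var blocked = !slot || slot.offsetHeight === 0;
        var xhr = new XMLHttpRequest();
        xhr.open("POST", "/adblock-beacon", true);
        xhr.setRequestHeader("Content-Type", "application/json");
        xhr.send(JSON.stringify({ blocked: blocked, page: location.pathname }));
    });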

This got me thinking that I had not looked at the browser stats very much lately. dealnews has a very odd browser statistics graph. We do not follow the industry averages. Our audience is predominantly tech savvy (that does not mean geeks). Our users don't just use the stuff that is installed on the computer when they get it. This kind of proves my point about ad blocking even more. We have non-moron users and they still don't block ads.



Browser             % of Visits
Internet Explorer   42.34%
Firefox             36.94%
Safari               9.55%
Chrome                8.34%
Mozilla               1.46%
Opera                 0.68%
Netscape              0.41%
Avant                 0.08%
Camino                0.06%
IE Mobile             0.02%

As you can see, Firefox is very prevalent on our site. We generally test in IE7/8, Firefox 3, Safari and Chrome. I will occasionally test a major change in Opera. Typically, well-formed HTML and CSS work fine in Opera, so everything is all good.

As for operating systems, Windows still dominates, but I would guess we have more Macs than the average site.



OS          % of Visits
Windows     82.95%
Macintosh   11.27%
iPhone       3.80%
Linux        1.19%
Android      0.17%

Interesting that iPhone beats out Linux. That is just another sign to me that Linux is still not a real choice for real people. Whether that is a product issue from OEMs or a matter of user choice is debatable. It is notable that most of our company uses Macs. I don't think we make up a speck of that traffic though. If we did, our home state of Alabama would be our most dominant state. It isn't. We are very typical in that regard: California is number one. We only have one employee there.