Developers and Entropy

This is a selfish blog post. I read a great blog post titled "Why You Need To Hire Great Developers" but I can no longer find it in my browser history or chat history. It talks about entropy creators versus entropy reducers, and how bad we are at figuring out which one someone is during the hiring process. I wanted to mention it here so my followers could read it and so I can find it again the next time I go looking for it.

Lock Wait Timeout Errors or Leave Your Data on the Server

If you use MySQL with InnoDB (most everyone does) then you will likely see this error at some point. There is sometimes confusion about what it means. Let me try to explain it.

Let's say we have a connection to the database called A. Connection A tries to update a row, but it receives a lock wait timeout error. That does not mean that connection A did anything wrong. It means that another connection, call it B, is also updating a row that connection A wants to update. Connection B has an open transaction that has not been committed yet, so MySQL won't let connection A update that row until B finishes or A's wait times out. Make sense?
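
Here is a minimal sketch of that scenario in PHP with mysqli. The table, columns and credentials are all made up; the point is only that B holds an uncommitted row lock when A tries to write.

<?php

$a = new mysqli("localhost", "user", "pass", "mydb"); // made-up credentials
$b = new mysqli("localhost", "user", "pass", "mydb");

// Connection B opens a transaction, updates a row and does NOT commit.
// InnoDB now holds a row lock for id = 1.
$b->query("START TRANSACTION");
$b->query("UPDATE accounts SET balance = balance - 10 WHERE id = 1");

// Connection A tries to update the same row. It blocks until B commits
// or innodb_lock_wait_timeout expires. On timeout, A gets MySQL error
// 1205, even though A did nothing wrong.
if (!$a->query("UPDATE accounts SET balance = balance + 10 WHERE id = 1")) {
    echo $a->errno . ": " . $a->error . "\n"; // 1205: Lock wait timeout exceeded
}

?>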

The first mistake people may make is looking at the code that throws the error to find a solution. It is hardly ever the code that throws the error that is the problem. In our case, it was code doing a simple insert into a table. I had a look at our processing logs around the time the errors were thrown and found a job that was running during that window. I then looked for code in that job that updates the table that was locked. That was where the problem lay.

So, why does this happen? Well, there can be very legitimate reasons. There can also be very careless reasons. The genesis of this blog post was some code that appeared to be legitimate at first, but upon further inspection was careless. This is basically what the code did.

  1. Start Transaction on database1
  2. Clear out some old data from the table
  3. Select a bunch of data from database2.table
  4. Loop in PHP, updating each row in its own query to update one column
  5. Select a bunch of data from database2.other_table
  6. Loop in PHP, updating each row in its own query to update another column
  7. Commit database1

This code ran in about 20 minutes on the data set we had. It kept a transaction open the whole time. It appeared legit at first because you can't simply join the data: there are sums and counts involved with one-to-many relationships, so a join would duplicate rows and inflate the sums and counts. It also looks legit because you have to pull data from one database into another. However, there is a solution for this. We need to stop pulling all this data into PHP land and let it stay on the server where it lives. So, I changed it to this.

  1. Create temp table on database2 to hold mydata
  2. Select data from database2.table into my temp table
  3. Select data from database2.other_table into my temp table
  4. Move my temp table using extended inserts via PHP from database2 to database1
  5. Start Transaction on database1
  6. Clear out some old data from the table
  7. Do a multi-table bulk update of my real table using the temp table (see the sketch after this list)
  8. Commit database1
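
Step 7 is the interesting one. The multi-table bulk update might look something like this. It is only a sketch; all of the table and column names here are invented, not our real schema.

<?php

// One UPDATE that joins the copied temp table to the real table,
// replacing thousands of single-row updates from PHP.
$db1 = new mysqli("localhost", "user", "pass", "db1"); // made-up credentials

$db1->query("
    UPDATE real_table AS r
    INNER JOIN temp_copy AS t ON t.id = r.id
    SET r.sum_col   = t.sum_col,
        r.count_col = t.count_col
");

?>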

This runs in 3 minutes and only holds the transaction open for about 90 seconds. Our lock wait timeout on this server is 50 seconds, though. However, we have a three-try retry rule for any lock wait timeout in our DB code. So, this should allow our current workload to be processed without any data loss.
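
For what it is worth, the retry rule is nothing fancy. Here is a rough sketch of the idea, not our actual DB layer. Error 1205 is MySQL's lock wait timeout error code.

<?php

$db  = new mysqli("localhost", "user", "pass", "mydb"); // made-up credentials
$sql = "UPDATE real_table SET sum_col = sum_col + 1 WHERE id = 1"; // made-up query

// Retry up to 3 times, but only for lock wait timeouts.
for ($try = 1; $try <= 3; $try++) {
    if ($db->query($sql)) {
        break; // success
    }
    if ($db->errno != 1205) {
        break; // some other error, don't retry
    }
}

?>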

So, why did this help so much? We are not moving data from MySQL to PHP over and over. This applies to any language, not just PHP. The extended inserts for moving the temp table from one db to another really help. That is the fastest part of the whole thing. It moves about 2 million records from one to the other in about 1.5 seconds.
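
If you have not seen extended inserts, the idea is to batch many rows into one INSERT statement instead of one query per row. A rough sketch, with made-up names and an arbitrary batch size:

<?php

$db1 = new mysqli("host1", "user", "pass", "db1"); // made-up credentials
$db2 = new mysqli("host2", "user", "pass", "db2"); // made-up credentials

// Move rows from db2's temp table to db1 in batches of 1,000.
$result = $db2->query("SELECT id, sum_col, count_col FROM my_temp_table");

$values = array();
while ($row = $result->fetch_row()) {
    $values[] = "(" . implode(",", array_map("intval", $row)) . ")";
    if (count($values) == 1000) {
        $db1->query("INSERT INTO temp_copy (id, sum_col, count_col) VALUES " . implode(",", $values));
        $values = array();
    }
}
if ($values) {
    $db1->query("INSERT INTO temp_copy (id, sum_col, count_col) VALUES " . implode(",", $values));
}

?>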

So, if you see a lock wait timeout, don't think you should sleep longer between retries. And don't dissect the code that is throwing the error. You have to dig in and find what else is running when it happens. Good luck.

Bonus: If you have memory issues in your application code, these techniques can help with those too.

Scaling 101 - We are Failing the Next Generation

The other day Twitter was down and I had no place to comment on Twitter being down. This got us talking about scaling at work. I was reminded of the recently posted slides from Instagram about their scaling journey. They are great slides. There is only one problem I have with them. They are the same slides about scaling you would have found in 2000.

I have to say, I like Instagram. My daughter has something like 1,000 followers on Instagram. And good for them for being bought by Facebook for a bajillion dollars. This is not really a dig on them. This is a dig on our industry. Why did Instagram have to learn the hard way how to scale their app? I want to point out some of their issues and discuss why it's silly they had to learn this the hard way.

Single machine somewhere in LA

Why, in this day and age, would anyone deploy an app to the app store when the backend is all on one server? I am not a big proponent of the cloud, but that has to be better than a single server in a rack somewhere. And L.A.? Go for Dallas or somewhere geographically neutral.

favicon.ico

So, this is one of the biggest mistakes I see in all of web application development. People use Apache or Nginx or whatever and have a mod_rewrite rule that sends ALL requests into their application stack. They do this because they are lazy. They want to write whatever code they want later and just have the request picked up without any work. At dealnews, we don't do that. We have controllers. But, we specify in our Apache config what paths are routed to those controllers. The most general we have is:

    RewriteCond %{REQUEST_URI} (\/|\.html|\.php)$
    RewriteRule ^.+$ /torpedo.php/$1 [L]

So, if the request ends with / or .html or .php, send it to the controller. This is the controller our proxy servers use. Any other requests like robots.txt, images, etc. are all served off disk by Apache. No sense having a PHP process handle that. It's crazy. This is not Instagram's fault. They likely followed some examples they found that other developers before them put on the internet. Why are we doing this to people? Is it some rite of passage? I suspect that the problem is actually that 90% of the web does not require any scaling. Only 10% of the sites and services out there have to actually worry about load. So, this never comes up.

So, the next part of their slides basically glorifies the scaling process. This is another problem with people in our field. We thrive on the chaos. These are our glory days. Let's face it, most geeks never won the high school football championship. The days when we are faced with a huge scaling challenge are our glory days. I know I have had that problem. And I played sports as a youth. But, nothing is better than my 2006 war story about a Yahoo front page link. Man, I rocked that shit. But, you know what? The fact that I had to struggle through that means I did not do my job. We should never have been facing that issue. It should have just all worked. That is what it does now. It just works. We don't even know when we get a spike now. It just works. The last thing I want in my life now is an unexpected outage that I have to RAGE on to get the site up again. That leaves me feeling like a failure.

Then they realize they are out of their depth. They need to do things they never thought they would need to do. Why? Why did they not know they would need to do these things? Unexpected growth? Maybe. Why is this not common knowledge though? With all the talk of the cloud being awesome for scaling, why is this not a button on the AWS dashboard labeled [SCALE] that you just push? That is what the cloud does, right?

In the end, Instagram learned, the hard way again, that you have to build your own architecture to solve your problem. I learned it the hard way. LiveJournal learned it the hard way. Facebook, Twitter, etc. etc. They have all learned the hard way that there is no single solution for massive scale. You have to be prepared to build the architecture that solves your problem in a way that you can manage. But, there are basic building blocks of all scaling that need to be in place. You should never, ever, ever start with an application on a single server that reads and writes directly to the database with no cache in place. Couch, Mongo, blah blah blah. Whatever. They all need cache in front of them to scale. Just build it in from the start.

Instagram was storing images. Why were they surprised when they ran out of room for them? I just can't fathom that. It's images. They don't compress. You have to put them somewhere. This has to be an education issue. LiveJournal solved this in 2003 with MogileFS and Gearman. Why did they not build their architecture on top of that to start with? Poor education, that is why.

One thing they bring up that has me bugged as well is monitoring. There is no good solution for this. Everyone ends up rolling their own solution using a few different tools. The tools are often the same, but the metrics, and how they are reported and monitored, are all different. I think there is a clear need in the industry for some standards in this area.

if you’re tempted to reinvent the wheel ... don't

I did find this slide funny. Reading these slides, for me, is like watching them reinvent the very wheel that is scaling. This has all been done before. Why are they having to learn it the hard way?

don’t over-optimize or expect to know ahead of time how site will scale

I take exception to this slide, however. You should have some idea how your app will scale before you deploy it. It is irresponsible and cowboy to deploy and think "oh, we will fix it later". That is true no matter what you are doing. Don't give me the lines about being a start up and all that. It is just irresponsible to deploy something you know won't hold up. For what it is worth, I think these guys really had no clue. And that is because we, as an industry, did not make it known to them what they were up against.

How do we fix this?

Stop comparing stuff you don't understand

I normally don't do this. When I see someone write a blog post I don't agree with, I often just dismiss it and go on. But, this particular one caught my attention. It was titled PHP vs Node.js: Yet Another Versus. The summary was:

Node.js = PHP + Apache + Memcached + Gearman - overhead

What the f**k? Are you kidding me? Clearly this person has NEVER used memcached or Gearman in a production environment that had any actual load.

Back in the day, when URLs and filesystems had a 1:1 mapping, it made perfect sense to have a web server separate from the language it is running. But, nowadays, any PHP app with attractive URLs running behind the Apache web server is going to need a .htaccess file, which tells the server a regular expression to check before serving up a file. Sound complex and awkward with unnecessary overhead? That’s because it is.

Node has a web server built in. Some people call this a bad thing, I call those people crazy. Having the server built in means that you don’t have the awkward .htaccess config thing going on. Every request is understood to go through the same process, without having to hunt through the filesystem and figure out which script to run.

He believes that PHP inside Apache requires a .htaccess file. Welcome to 1999. I have not used a .htaccess file since then. Anyone that cares at all about scaling Apache would disable .htaccess files. And as for running regexes, how does he propose you decide the code path in the controller of his Node.js app? Something somewhere has to decide what code is going to answer a given request. mod_rewrite is wire speed fast and compiled in C. Neither JavaScript nor PHP code could ever beat that.

The official website is quite ugly and outdated.

Really? You choose your tools based on that? I don't know what to say.

Since a PHP process starts, does some boilerplate work, performs the tasks the user actually wants, and then dies, data is not persistent in memory. You can keep this data persistent using third party tools like Memcache or traditional database, but then there is the overhead of communicating with those external processes.

He clearly has no understanding of how memcached is supposed to be used. You don't put things in memcached so you can use them on the next request on this server. You put things in memcached so you can use them in any request on any server in your server pool. If you just have one web server, you can write Perl CGI scripts. Performance and uptime are not important to you. If you want to share things across requests in PHP, APC and XCache fill the need very well.
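
For example, with APC it is one call to stash something in local shared memory and one call to get it back on a later request. A quick sketch; the key name and TTL are arbitrary:

<?php

// Share data across requests on a single server with APC.
$config = apc_fetch("site_config");
if ($config === false) {
    // Build the expensive thing here. This array stands in for real work.
    $config = array("built_at" => time());
    apc_store("site_config", $config, 300); // cache it for 5 minutes
}

?>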

The number one bottleneck with web apps is not the time it takes to calculate CPU hungry operations, but rather network I/O. If you need to respond to a client request after making a database call and sending an email, you can perform the two actions and respond when both are complete.

This sums up the mythical magic of Node.js. People think that just because your code is not "running", the server is somehow not doing anything. The process does not get to go do other shit. No, that would be multi-threaded. And Node.js is not multi-threaded. It is single-threaded. That means the process can only be doing one thing at a time. If you are waiting on a DB call, you are waiting. I don't care what world you think you live in. You are waiting on that DB call. How your code is structured is irrelevant to how computers actually work. The event-driven nature of Node.js is much like OOP. You are abstracting yourself from how computers really work. The further you get from that, the less you will be able to control the computer.

Node.js is a very new, unstable, untested platform. If you are going to be building a large corporate scale app with a long lifetime, Node.js is not a good solution. The Node API’s are changing almost daily, and large/longterm apps will need to be rewritten often.

So, if I plan on making a living, don't use Node.js. Got it. We finally agree on something.

Being so new, it doesn’t have a lot of baggage leftover from days of old. Having a server built in, the stack is a lot simpler, there are less points of failure, and there is more control over what you can do with HTTP responses (ever try overwriting the name of the web server using PHP?).

No, why would you? PHP is not the web server. Apache or Nginx would be your web server.
  • Are you building some sort of daemon? Use Node.
  • Are you making a content website? Use PHP.
  • Do you want to share data between visitors? Use Node.
  • Are you a beginner looking to make a quick website? Use PHP.
  • Are you going to run a bunch of code in parallel? Use Node.
  • Are you writing software for clients to run on shared hosts? Use PHP.
  • Do you want push events from the server to the client using websockets? Use Node.
  • Does your team already know PHP? Use PHP.
  • Does your team already know frontend JavaScript? Node would be easier to learn.
  • Are you building a command line script? Both work.

Yes, of course you would not build a daemon in PHP. Do you plan to share that same data across servers? He already told us Node is not multi-threaded, so how can it run code in parallel? Websockets have a ton of their own pain to deal with that is not even related to Node vs. PHP. Things like proxies. If I was building a command line tool and wanted to use JavaScript, I would just use V8.

Listen, I write code in PHP and JavaScript all day. I also use some Ruby, Lua and even dabble in C. I am not a language snob. Use what works for you. I do, however, take exception when people write about things they clearly have no idea about. He claims to have written a lot of PHP. He has clearly never deployed a lot of PHP in an environment that matters. If you are building small sites that don't have a lot of traffic, you can use anything. If you are building massive sites that have to scale, any technology is going to require a full understanding of what it takes to scale it out. I leave you with this wisdom that his blog post reminded me of.

PHP Coding Standards

Update: Matthew Weier O'Phinney, one of the core members of the group, has cleared up the naming history in the comments.

During the /dev/hell podcast at Tek12, someone asked the guys their opinion about PSR. I did not know what PSR was by that name. A quick search led me to the Google Group named PHP Standards Working Group. I had vaguely remembered a consortium of frameworks, libraries and applications that were organizing to attempt to make their projects cooperate better. But, this did not sound like the same project. Another search and I found the PHP Framework Interoperability Group on GitHub. A bit more searching led me to a post where, apparently, the PHP FIG changed their name at some point, citing people not knowing what FIG meant. But, this is not a history post. The group had done some work on setting a standard for autoloaders in PHP. This is a very good thing and much needed. That is a real thing that impacts real developers.

The person asking the question had asked about PSR1 and PSR2. These are the first two standards proposals from the group and they deal with coding standards. There were mixed feelings in the room about the proposals. I asked (being me, probably with very little tact) why, in 2012, a group of really smart people were still discussing coding standards such as tabs vs. spaces. Because this is what immediately came to mind for me.


[Image: xkcd #927, "Standards"]
Source: http://xkcd.com/927/

There are already coding standards for PHP and any other language out there. Why does anyone need to make a new one? For Phorum we chose the PEAR standard (ok, with 2 minor modifications). On top of that, every one of the projects in this group already has coding standards. Why not just pick one of those? Are 10 projects that currently have their own standards really going to all change to something else? I highly doubt it. My guess is that, at best, they will all end up with a modified version of the group's standards.

This reminds me a lot of open source licenses. There are tons of these things. And in the end, most of the open source licenses represent the same idea (GPL has its issues, I know). I suppose you could say that most of them fall into GPL-like or BSD-like. Anyhow, I quit worrying about having my own license years ago. I now just use a BSD style license that you can generate with any of several online BSD license generators.

When I voiced my concern about what is, in my opinion, a waste of very smart people's time, my good friend Cal Evans (he has bled in my car, so I think he is my friend, and I hope he feels the same) said that I was misunderstanding the point of the group. It was a group of projects that were collaborating to try and use similar standards and practices to make the PHP OSS community better. And that is exactly what I thought PHP FIG was. However, the group name is now "PHP Standards Working Group". That reminds me of the W3C HTML Working Group. And in my mind that means a group that is deciding the future of a technology. In addition, the proposal being discussed is titled "PSR-1, a standard coding convention for PHP". If you pair that with the name of the group, it sounds very authoritative. And I don't think that is by accident. If I were heading up such an effort, I would hope that every PHP developer on the planet would follow it too.

If you saw Terry Chay's keynote at the PHP Community Conference last year, he talked about frameworks and platforms. He pointed out that the reason people like Facebook share their data center technology is the hope that people will start using it and it will become common, meaning the equipment they custom build gets cheaper and the people they hire are already familiar with it. But, if the point of the group is *only* cooperation between large OSS PHP projects, I wish they would pick a name that is more indicative of that. As it stands, when I landed on the page, my immediate assumption was that this group's intention was to dictate to the rest of the PHP world how to write their PHP code.

In the end, cooperation is good. And if these guys want to cooperate, I say more power to them. I just hope they get into the really good things soon. Like, can we talk about a maximum number of files, functions or classes used for any one single page execution? *That* would be valuable to the PHP community. I can deal with funny formatting. I can't deal with poorly performing code that is dragged down in abstraction and extension. Or how about things like *never* running queries inside loops that are reading results from another query? That would be a great thing to make examples of and show people the best practices. Tabs vs. spaces? That should have been solved 10 years ago. When in doubt, PHP code should do what the PHP core does. This is PHP we are talking about. Would it not make sense for the people who write PHP to write code that is somewhat similar in style to the code of those who make PHP? C and PHP syntax are very, very similar. So, why don't we all just refer to the PHP CODING_STANDARDS file when in doubt and not worry about the little stuff that does not affect performance?
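
To make that queries-inside-loops example concrete, here is the anti-pattern and the fix, sketched with made-up table names:

<?php

$db = new mysqli("localhost", "user", "pass", "mydb"); // made-up credentials

// The anti-pattern: one query per row of another query's results.
$result = $db->query("SELECT id FROM users");
while ($user = $result->fetch_assoc()) {
    // A round trip to the database for every single user. Ouch.
    $db->query("SELECT COUNT(*) FROM orders WHERE user_id = " . (int)$user["id"]);
}

// The better way: let the database do it in one query.
$counts = $db->query("
    SELECT u.id, COUNT(o.user_id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
");

?>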

So, as of a few minutes ago, I have joined the group. If for no other reason, just to see what is discussed. Perhaps I should follow the advice I give people when they ask for features in my projects and do something about my issues and worries.

Living in the Prove It Culture

Engineering cultures differ from shop to shop. I have been in the same culture for 13 years, so I am not an expert on all the different types. Before that I was living in Dilbert world. The culture there was really weird. The ideas were never yours. It was always some need that some far-off person had. A DBA, a UI "expert" and some product manager would dictate what code you wrote. Creativity was stifled and met with resistance.

I then moved to the early (1998) days of the web. It was a start up environment. In the beginning there were just two of us writing code. So, we thought everything we did was awesome. Then we added some more guys. Lucky for us, we mostly hired well. The good hires were type A personalities that had skills we didn't have. They challenged us and we challenged them. On top of that, we had a CEO who had been a computer hacker in his teens. So, he had just enough knowledge to challenge us as well. Over the years we kept hiring more and more people. We always asked in the interview if the person could take criticism and if they felt comfortable defending their ideas. We decided to always have a white board session. We would ask them questions and have them work it out on a white board or talk it out with us in a group setting. The point was not to see if they always knew the answer. The point was to see how they worked in that setting. Looking back, the hires that did not work out also did not excel in that phase of the interview. The ones that have worked out always questioned our methods in the interview. They did not belittle our methods or dismiss them. They just asked questions. They would ask if we had tried this or that. Even if we could quickly explain why our method was right for us, they still questioned it. They challenged us.

When dealing with people outside the engineering team, we subconsciously applied these same tactics. The philosophy came to be that if you came to us with an idea, you had to throw it up on the proverbial wall. We would then try to knock it down. If it stuck, it was probably a good idea. Some people could handle this and some could not. The ones that could not did not always get their ideas pushed through. That does not always mean they were bad ideas. And that is maybe the downside of this culture. But, it has worked pretty well for us.

We apply this to technology too. My first experience on Linux was with RedHat. The mail agent I cut my teeth on was qmail. I used djbdns. When Daniel Beckham, our now director of operations, came on, he had used sendmail and bind. He immediately challenged qmail. I went through some of the reasons I preferred it. He took more shots. In the end, he agreed that qmail was better than sendmail. However, his first DNS setup for us was bind. It took a few more years of bind hell for him to come around to djbdns.

When RedHat splintered into RedHat Enterprise and Fedora, we tried out Fedora on one server. We found it to be horribly unstable. It got the axe. We looked around for other distros and found a not very well known one called Gentoo, known as the ricer distro of the Linux world. We installed it on one server to see what it was all about. I don't remember now whose idea it was. Probably not mine. We eventually found it to be the perfect distro for us. It let us compile our core tools like Apache, PHP and MySQL while at the same time using a package system. We never trusted RPMs for those things on RedHat. Sure, bringing a server online took longer, but it was so worth it. Eventually we bought in and it is now the only distro in use here.

We have done this over and over and over. From the fact that we all use Macs now thanks to Daniel and his willingness to try it out at our CEO's prodding to things like memcached, Gearman, etc. We even keep evaluating the tools we already have. When we decided to write our own proxy we discounted everything we knew and evaluated all the options. In the end, Apache was known and good at handling web requests and PHP could do all we needed in a timely, sane manner. But, we looked at and tested everything we could think of. Apache/PHP had to prove itself again.

Now, you might think that a culture of skepticism like this would lead to new employees having a hard time getting any traction. Quite the opposite. Because we hire people that fit the culture, they can have a near immediate impact. We have a problem I want solved, and a developer that has been here less than a year suggested that Hadoop may be a solution, but he was not sure we would use it. I recently sent this in an email to the whole team in response to that:

The only thing that is *never* on the table is using a Windows server. If you can get me unique visitors for an arbitrary date range in milliseconds and it requires Hadoop, go for it.

You see, we don't currently use Hadoop here. But, if that is what it takes to solve my problem and you can prove it will work, we will use it.

Recently we had a newish team member suggest we use a SAN for our development servers to use as a data store. Specifically he suggested we could use it to house our MySQL data for our development servers. We told him he was insane. SANs are magical boxes of pain. He kept pushing. He got Dell to come in and give us a test unit. Turns out it is amazing. We can have a hot copy of our production database on our dev slices in about 3 minutes. A full, complete copy of our production database in 3 minutes. Do you know how amazing that is? Had we not had the culture we do and had not hired the right person that was smart enough to pull it off and confident enough to fight for the solution, we would not have that. He has been here less than a year and has had a huge impact to our productivity. There is talk of using this in production too. I am still in the "prove it" mode on this. We will see.

I know you will ask how our dev db setup works, so here you go:
1. Replicate production over VPN to home office
2. Write MySQL data on SAN
3. Stop replication, flush tables, snapshot FS
4. Copy snapshot to a new location
5. On second dev server, umount SAN, mount new snapshot
6. Restart MySQL all around
7. Talk in dev chat how bad ass that is

We had a similar thing happen with our phone system. We had hired a web developer that previously worked for a company that created custom Asterisk solutions. When our proprietary PBX died, he stepped up and proved that Asterisk would work for us. Not a job for a web developer. But he was confident he could make it work. It now supports 3 offices and several home-based users worldwide. He, too, had only been here a short time when that happened.

Perhaps it sounds like a contradiction. It may sound like we just hop on any bandwagon technology out there. But no. We still use MySQL. We are still on 5.0 in fact. It works. We are evaluating Percona 5.5 now. We tried MySQL 5.1. We found no advantage, and the Gentoo package maintainer found it to be buggy. So, we did not switch. We still use Apache. It works. Damn well. We do use Apache with the worker MPM with PHP, which is supposedly bad. But, it works great for us. But, we had to prove it would work. We ran a single node with worker for months before trusting it.

Gearman was begrudgingly accepted. The idea of daemonized PHP code was not a comforting one. But once you write a worker and use it, you feel like a god. And then you see the power. Next thing you know, it is a core, mission critical part of your infrastructure. That is how it is with us now. In fact, Gearman has gone from untrusted to the go-to tech. When someone proposes a solution that does not involve Gearman, someone will ask if part of the problem can be solved using Gearman instead of whatever idea they have. There is then a discussion about why it is or is not a good fit. Likewise, if you want to build a daemon to listen on a port and answer requests, the question is "Why can't you just use Apache and a web service?" And it is a valid question. If you can solve your problem with a web service on already proven tech, why build something new?
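
If you have never written one, a worker is only a few lines of PHP with the pecl gearman extension. A bare-bones sketch; the function name is made up:

<?php

$worker = new GearmanWorker();
$worker->addServer("127.0.0.1", 4730); // default gearmand port

// Register a function this worker can perform.
$worker->addFunction("resize_image", function (GearmanJob $job) {
    $data = $job->workload();
    // ... do the actual work here ...
    return $data;
});

// Wait for jobs forever.
while ($worker->work());

?>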

This culture is not new. We are not unique. But, in a world of "brogramming" where "engineers" rage on code that is awesome before it is even working and people are said to be "killing it" all the time, I am glad I live in a world where I have to prove myself everyday. I am the most senior engineer on the team. And even still I get shot down. I often pitch an idea in dev chat and someone will shoot it down or point out an obvious flaw. Anyone, and I mean anyone, on the team can question my code, ideas or decisions and I will listen to them and consider their opinion. Heck, people outside the team can question me too. And regularly do. And that is fine. I don't mind the questions. I once wrote here that I like to be made to feel dumb. It is how I get smarter. I have met people that thought they were smarter than everyone else. They were annoying. I have interviewed them. It is hard to even get through those interviews.

Is it for everyone? Probably not. It works for us. And it has gotten us this far. You can't get comfortable though. If you do foster this type of culture, there is a risk of getting comfortable. If you start thinking you have solved all the hard problems, you will look up one day and realize that you are suffering. Keep pushing forward and questioning your past decisions. But before you adopt the latest and greatest new idea, prove that the decisions your team makes are the right ones at every step. Sometimes that will take a five minute discussion and sometimes it will take a month of testing. And other times, everyone in the room will look at something and think "Wow that is so obvious how did we not see it?" When it works, it is an awesome world to live in.

Errors when adding/subtracting dates using seconds

This just came up again for me today. I have said it before, but even I get lazy and forget. When doing math with dates, such as adding days, it is really easy to think this works:
<?php

$date = "2011-11-01";

// add 15 days

$new_date = date("Y-m-d", strtotime($date) + (86400 * 15));

// $new_date should be 2011-11-16 right?

echo $new_date;

?>

This yields `2011-11-15`. The problem is that it assumes there are only 86400 seconds in every day. There are, in fact, not. On the days when the clocks change for daylight savings time, there is either one hour more or one hour less than that. In addition, there are leap seconds put into our time system to keep us in line with the sun. There is one this year, 2012, on June 30th in fact. Since they don't happen with the regularity that daylight savings time does, it may be easy to forget them. Luckily, for this problem, the solution is the same. You have two choices, and the one you choose depends on the particular problem you have. For the simple problem above, you can simply let strtotime take care of it for you.
$new_date = date("Y-m-d", strtotime($date." +15 days"));
strtotime() is the most awesome date/time related function in all of computer programming. I have written about it before. It handles all those nasty weird seconds issues. But, if you are not solving a problem this simple or you are reading this and need help with another language, there is another solution. You do all your date math at noon. Simply only run code that does date math during lunch time. No, just joking. That would be silly. What I mean is to adjust all your date/time variables to represent the time at 12:00 hours.
$new_date = date("Y-m-d", strtotime($date." 12:00:00") + 86400 * 15);
Now that you are doing date math at noon, you will be safe for almost any date range you are doing math on. Daylight savings time always gives an hour and takes an hour every year, so those cancel each other out. It would take a whole lot of leap seconds to shift the offset between the start date and the end date enough to make this technique stop working, but it is technically feasible. So, if you are adding hundreds of years to dates, this won't work for you. There, disclaimer added.

UPDATE: I should point out that the examples in this post only apply to US time zones that recognize Daylight Savings Time, and the rules for DST as they apply after the changes in 2007.

Google is breaking the Web

So, Google announced today that they would start doing a couple of new things. First, they are going to start sending all logged in users to their SSL enabled search page. Second, they claim they are going to stop sending search terms to sites in the referring URL. They claim all of this is for your security. The first one I buy. SSL is better for those people on public wifi, no doubt. The second, I don't think so.

Let's back up a step. For those that don't know how the web works, here is a quick lesson. When you click on a link, your browser connects to the new site to get the page. Part of that communication tells the site you are going to what page you were on when you clicked the link. This is a good thing. There is never any information passed between Google and your site. It's all between the browser on your computer and the site you are asking your computer to load. It helps site owners know who is linking to them. In the case of search engines, the referring URLs often contain the search terms someone typed in to find your site. This is helpful for lots of reasons. None of them involve a user's security.
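
On the receiving end, the referring URL is just a request header. In PHP, for example, it shows up as $_SERVER["HTTP_REFERER"] (yes, misspelled, thanks to the HTTP spec):

<?php

// The referring URL arrives as a request header on the new site.
if (!empty($_SERVER["HTTP_REFERER"])) {
    $ref = parse_url($_SERVER["HTTP_REFERER"]);
    // For a Google search click, $ref["host"] would be www.google.com and
    // $ref["query"] would contain the q= search term.
    error_log("Referred by: " . $ref["host"]);
}

?>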

Ok, so, Google claims they are going to remove your search terms. But, my tests show they are removing the whole referring URL. You will not know which users are coming from Google at all. Let me show you. This is what I did.
  1. I typed http://www.google.com/ into my browser
  2. I searched for dealnews
  3. I clicked on the first link, which is the dealnews.com front page.

Using a tool called HTTPFox, I am able to see what information is being passed between my computer and the web sites. This is what I see:
  1. http://www.google.com/ with no referring URL
  2. http://www.google.com/#hl=en&sugexp=kjrmc&cp=5&gs_id=j&xhr=t&q=dealnews&qe=ZGVhbG4&qesig=YtB_HodN2qCOIiqwx_wetA&pkc=AFgZ2tlle01GJ99f38Ol-HvrY0sbiq4vzJfAPDSXGQ2js5QqyHGJ9-5HIgoFXbUujrU81pfyhEVO8jpmFouC09MG1fRbqd0GVA&pf=p&sclient=psy-ab&site=&source=hp&pbx=1&oq=dealn&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=7b65204da701ddb7&biw=1295&bih=1406 with no referring URL, because Google uses JavaScript to load the search results.
  3. http://www.google.com/url?sa=t&source=web&cd=1&sqi=2&ved=0CCwQFjAA&url=http%3A%2F%2Fdealnews.com%2F&rct=j&q=dealnews&ei=EPOdTtaUN4XOiAKZlIntCQ&usg=AFQjCNEN2YJ8XgSAJm6FOUqK2PuBUOkfxA&sig2=N2jBSsJb8sgPsrTkGgFCfw&cad=rja with a referring URL of http://www.google.com/
  4. http://dealnews.com/ with a referring URL of http://www.google.com/url?sa=t&source=web&cd=1&sqi=2&ved=0CCwQFjAA&url=http%3A%2F%2Fdealnews.com%2F&rct=j&q=dealnews&ei=EPOdTtaUN4XOiAKZlIntCQ&usg=AFQjCNEN2YJ8XgSAJm6FOUqK2PuBUOkfxA&sig2=N2jBSsJb8sgPsrTkGgFCfw

As you can see, when my browser requested http://dealnews.com/, it sent a referring URL telling that site that Google was linking to dealnews. In that URL you will see q=dealnews. That is the search term I typed into Google. Now, let's see what happens when I do the same thing over SSL.
  1. https://www.google.com/ with no referring URL
  2. Redirected to https://encrypted.google.com/ with no referring URL
  3. https://encrypted.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=f&xhr=t&q=dealnews&tok=wzChADhZTTjwPuXR1iOwSA&pf=p&sclient=psy-ab&site=&source=hp&pbx=1&oq=dealnews&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=47f2f62d0e6da959&biw=1295&bih=1406 with no referring URL, because Google uses JavaScript to load the search results.
  4. https://encrypted.google.com/url?sa=t&source=web&cd=1&sqi=2&ved=0CCsQFjAA&url=http%3A%2F%2Fdealnews.com%2F&rct=j&q=dealnews&ei=x_edTvjlGeKviQKzmdHqCQ&usg=AFQjCNEN2YJ8XgSAJm6FOUqK2PuBUOkfxA&sig2=OEhW8Z_BhHcCboIzu_Z2zQ with a referring URL of https://encrypted.google.com/
  5. http://dealnews.com/ with no referring URL.

So, if dealnews.com does not get a referring URL, how do we know the visit came from Google? This is the quote from the Google blog post:
When you search from https://www.google.com, websites you visit from our organic search listings will still know that you came from Google, but won't receive information about each individual query.

I ask you: how will the site know that if there is no referring URL? Referring URLs are a fundamental part of the web. If Google wants to strip the search data off the URL, that is one thing. It is not great, IMO, but whatever. But not sending referrers at all is just wrong and should be changed.

If you care, please share this post. Tweet it, +1 it, whatever. This is just bad news for the web.

Edit: I wanted to make sure everyone knew: I observed the same behavior in both Firefox 7 and the latest Google Chrome.

Edit 2: I have also confirmed with the Apache access logs that no referring URL was sent.

Lightbox using YUI

I made this a while back but have not blogged about it. I needed a lightbox for a project. So, I looked around. As usual, none of them were right for me. In addition, I am not on the jQuery bandwagon. No particular reason. I just never latched on to the weird syntax (they made a function named $ so it could kind of sort of read like PHP, come on). Anyhow, this is not a jQuery hate post. I did, however, take a liking to YUI. It is more my style. Less magic stuff and more of a real library. I feel like I can use it where I need to and not use it when I don't. Again, not a YUI love fest post. Long story short, I chose to make my lightbox with the help of YUI. I have just uploaded the code to GitHub. Maybe someone else will find it useful. I have an example page as well.

Features
  • Lightbox and contents can be created via:
    • Javascript
    • Existing markup
    • Asynchronous (AJAX) request (contents only)
  • Lightbox will close when ESC is pressed
  • Rounded corners and box shadow using CSS3
  • Exposes an onClose event that can be subscribed to

Compatibility
  • Firefox 3.5+
  • Chrome 4+
  • Safari 5+
  • Internet Explorer 9 (Animation is not working due to YUI's lack of opacity support in IE9)
  • Internet Explorer 8 (Via IE9 in IE8 compatibility mode. No rounded corners, no shadows)
  • Internet Explorer 7 (Via IE9 in IE7 compatibility mode. No rounded corners, no shadows)
  • Opera 10.53

YUI Requirements
  • Dom
  • Event
  • Connection Core
  • JSON
  • Selector
  • Animation (if animate set to true)