I have been working on adding some sharing features to dealnews.com. Dealing with Facebook and Twitter has been nothing if not frustrating. Neither one seems to understand how to properly deal with escaping a URL. At best they do it one way, but not all ways. At worst, they flat out don't do it right. I thought I would share what we found out so that someone else my be helped by our research.

Facebook

Facebook has two main ways to encourage sharing of your site on Facebook. The older way is to "Share" a page. The second, newer, cooler way to promote your page/site on Facebook is with Facebook's Like Button. Both have the same bug. I will focus on Share as it is easier to show examples of sharing. To do this, you make a link and send it to a special landing page on Facebook's site. But, lets say my URL has a comma in it. If it does, Facebook just blows up in horrible fashion. The users of Phorum have run into this problem too. In Phorum, we dealt with register_globals in a unique way long ago. We just don't use traditional query strings on our URLs. Instead of the traditional var1=1&var2=2 format, we decided to use a comma delimited query string. 1,2,3,var4=4 is a valid Phorum URL query string.

According to RFC 3986, a query string is made up of:
query = *( pchar / "/" / "?" )
where pchar is defined as:
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
and finally, sub-delims is defined as:
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / 
"*" / "+" / "," / ";" / "="
That is RFC talk for "A query string can have an un-encoded comma in it as a delimiter." So, in Phorum we have URLs like http://www.phorum.org/phorum5/read.php?61,145041,145045. That is the post in Phorum talking about Facebook's problem. It is a valid URL. The commas do not need to be escaped. They are delimiters much like an & would be in a traditional URL. So, what happens when you share this URL on Facebook? Well, a share link would look like http://www.facebook.com/share.php?u=http%3A%2F%2Fwww.phorum.org%2Fphorum5%2Fread.php%3F61%2C146887%2C146887. If I go to that share page and then look in my Apache logs I see this:
66.220.149.247 - - [18/Nov/2010:00:47:51 -0600] "GET /phorum5/read.php?61%2C146887%2C146887 HTTP/1.1" 302 26 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
Facebook sent %2C instead of comma? It decoded the other stuff in the URL. The slashes, the question mark, all of it. So, what is their deal with commas? Well, maybe I can hack Facebook and not send an encoded URL to the share page. Nope, same thing. So, they are proactively encoding commas in URL's query strings.

This has two effects. The first is that the share app attempts to pull in the title, description, etc. from the page. In this case, we redirect the request as the query string is invalid for a Phorum message page. So, they end up getting the main Phorum page. In the case of dealnews, we usually throw a 400 HTTP error when we get invalid query strings. Neither of these get the user what he wanted. The second problem is that the URL that is clickable when the user has shared the URL is not valid. So, the whole thing was just a huge waste of time.

I have submitted this to the Facebook Bugzilla. The only work around is to use a URL shortener or don't use commas in your URLs. Just make sure the shortener does not use commas. I guess you could use special URLs for Facebook that used something besides comma that are then redirected to the real URL with commas. I don't know what that character is, I am just guessing.

Twitter

Twitter's issues deal with their transition from their old interface to their new interface. Twitter is in the process of (or is done with) rolling a new UI on their site. The link in the old site to share something on Twitter was something like: http://twitter.com/home?status=[URL encoded text here]. This worked pretty darn well. You could put any valid URL encoded text in there and it worked. However, that now redirects you to their new interface's way of updating your status and they don't encode things right.

If I want to tweet "I love to eat pork & beans" I would make the URL http://twitter.com/home?status=I+love+to+eat+pork+%26+beans. Twitter then takes that, decodes the query string and redirects me to http://twitter.com/?status=I%20love%20to%20eat%20pork%20&%20beans. The problem is that they did not re-encode the &. It is in the bare URL. So, when I land on my twitter page, my status box just says "I love to eat pork ". Which while true, is not what I mean to tweet. This bug has been submitted to Twitter, but has yet to be fixed.

The second problem is with the new site and how they deal with validly encoded spaces. Spaces can be escaped two ways in a URL. The first, older way (which the PHP function urlencode uses) is to encode spaces as a plus (+) sign. This comes from the standard for how forms submit (or used to submit) data. It is understood by all browsers. The second way comes from the later RFC's written about URLs. They state that spaces in a URL should be escape like other characters by replacing a space with %20. The old Twitter UI would accept either one just fine. And, if you send that to the old status update URL it will redirect you (see above) with %20 in the URL instead of +. However, if you send + to the new Twitter UI, as above, you get "I+love+to+eat+pork+&+beans" in your status box. The only solution is to not send + has an encoding for space to Twitter. In PHP you can use the function rawurlencode to do this. It conforms to the RFC(s) on URL encoding. Doing so, with thew new linking pattern generates the URL http://twitter.com/?status=I%20love%20to%20eat%20pork%20%26%20beans which works great. This was also reported to Twitter as a bug by our team.

So, maybe that will help someone out that is having issues with sharing your site on the two largest social networks. Good luck with your social media development.