Thursday, April 29, 2004
Description of Google Bug
Here's a picture of the bug in action using the query Garrett sent.
Initially the query Garrett sent me that was causing the errors was C + nº + B. But after running a ton of searches and reading through a seemingly endless amount of HTML generated by Google's results pages, I found what appears to cause the issue.
Initially I thought it was the special character causing the problem (nº), but after conducting some incredibly technical analysis (running a bunch of searches) I found that the actual trigger was a "B" preceded by a "C" with two or more spaces between the two characters.
The following phrase will not cause the results to be corrupt.
But reverse the letters and the results are corrupted.
You'll notice that Google replaces spaces (" ") in a search string with the "+" sign. When a user types in a search string with spaces between the words, Google encodes those spaces as "+" signs because a URL cannot have any spaces in it. A URL with a literal space in it will not be handled properly, so the space has to be converted, either to "+" or to its percent-encoded hexadecimal equivalent, "%20". You're probably wondering what the hexadecimal equivalent of "+" is, and that's a great question. The hex code for "+" is "%2B". If Google encoded spaces as "%20" instead of "+" it would probably fix this issue, but I'm not sure that's the answer, because there are probably hundreds of thousands of functions we are unaware of that rely on a space being encoded as a "+".
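To make the two encodings concrete, here is a small sketch using Python's standard urllib.parse module. It only illustrates how spaces and "+" signs are generally encoded in a query string; it assumes nothing about how Google builds its URLs, and the variable names are my own.

```python
from urllib.parse import quote, quote_plus

query = "C  B"  # a C and a B separated by two spaces

# Form-style encoding: each space becomes "+"
plus_encoded = quote_plus(query)

# Percent-encoding: each space becomes "%20"
percent_encoded = quote(query)

# A literal "+" in the search string has to become "%2B",
# otherwise it would be read back as a space
literal_plus = quote_plus("C + B")

print(plus_encoded)     # C++B
print(percent_encoded)  # C%20%20B
print(literal_plus)     # C+%2B+B
```

Either form is a legal way to write a space inside a query string, which is why a literal "+" needs its own "%2B" code.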
Besides, if the phrase C B causes this formatting problem but B C doesn't, then the above solution isn't the answer. As I dug into the results I noticed that Google's code that highlights the search phrase in the URL was trying to highlight its own <b> tags. So instead of highlighting the string like this...
<b>b</b>b it was trying to highlight it like this
As you can see, Google's trying to bold its own bold tags by parsing them out of the results. Could it be that Google is caching specific queries, keeping those highlighted results in its result set, and then reparsing the URLs and applying the highlighting again? Not sure, but it's possible. But that doesn't explain why we don't see the same corruption in the title of the site, because highlighting is applied there as well as in the main description of the result. Also, why is it that reversing the order of the keyphrase causes this bug to arise?
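Here is a minimal sketch of how a highlighter could end up bolding its own bold tags. This is purely hypothetical: naive_highlight is a made-up function, not Google's code, and it assumes the highlighter wraps one search term at a time without checking whether it is inside markup it just added.

```python
import re

def naive_highlight(text, terms):
    """Wrap every occurrence of each term in <b> tags, one term at a time.

    Hypothetical illustration only: it has no idea that tags added for an
    earlier term are markup rather than text.
    """
    for term in terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        text = pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text)
    return text

# "c" is wrapped first, then the "b" pass matches the letter b inside
# the <b> and </b> tags that were just inserted.
corrupted = naive_highlight("c b", ["c", "b"])

# "b" is wrapped first; the "c" pass finds no c inside any tag.
clean = naive_highlight("c b", ["b", "c"])

print(corrupted)  # <<b>b</b>>c</<b>b</b>> <b>b</b>
print(clean)      # <b>c</b> <b>b</b>
```

Under that assumption the order of the terms matters, which at least resembles what I saw. It would not explain why other letters followed by a B came back clean, so treat it as one possible mechanism and nothing more.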
Unfortunately, I don't think I can answer all of these questions without looking at code and that's not going to happen. I'll forward this description to Google and see what happens.
As a side note, I tried replacing the "+" in the URL with "%20" and received the same scrambled results. Even replacing both +'s with %20 (giving C%20%20B) caused the same results. I also tried several two-letter combinations and the only one that caused this error was C B; no others caused it.
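One likely reason the "%20" version behaves the same is that both spellings decode to the identical search string before any highlighting code sees it. The sketch below shows that with Python's standard urllib.parse; the two URLs and the search_terms helper are assumptions made up for illustration, not the exact URLs I tested.

```python
from urllib.parse import parse_qs, urlsplit

def search_terms(url):
    """Return the decoded value of the q parameter of a search URL."""
    return parse_qs(urlsplit(url).query)["q"][0]

# Illustrative URLs only; the q parameter name is an assumption
plus_form = "http://www.google.com/search?q=C++B"
percent_form = "http://www.google.com/search?q=C%20%20B"

# Both forms decode to a C and a B separated by two spaces
print(repr(search_terms(plus_form)))     # 'C  B'
print(repr(search_terms(percent_form)))  # 'C  B'
```

If the server decodes the query this way, the choice between "+" and "%20" never reaches whatever code is producing the corrupted output.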