May 2009 12

Google announced Rich Snippets for its search results today, which means you’ll have finer-grained control over the brief summary users see when your site appears on the search results page.

Webmasters can now start to participate in Google’s semantic interpretation of content
The release also marks the first time Google has let us tell its robots a little about the meaning of our texts. Previously, any semantic interpretation of page information was done by Google exclusively, and webmasters had very little control over this process.

This clearly marks the beginning of the public phase of a broader strategy to let webmasters participate in Google’s interpretation of the semantic web. We knew Google was deploying semantic indexing, but until now webmasters had no direct say in the process.

First, an example of a Rich Snippet
Below is an example of a legacy, or “poor”, snippet.
[Image: example of a legacy “poor” snippet]

And now, Google’s example of a rich snippet, as shown on the official Google Webmaster blog.
[Image: Google’s rich snippet example for Drooling Dog BBQ, from the Google Webmaster Central blog]
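A snippet like this is driven by structured markup on the reviewed page; Google’s documentation for Rich Snippets covers microformats (and RDFa) for this. As a rough sketch only, the JavaScript below builds hReview-style markup. The class names follow the hReview microformat draft; the function names and all sample values are hypothetical, and Google’s own example markup may differ in detail.

```javascript
// Escape characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: builds hReview microformat markup for one review.
// Class names (hreview, item, fn, rating, reviewer, vcard, summary) come
// from the hReview draft; the input field names are made up for this sketch.
function hReview({ itemName, rating, reviewer, summary }) {
  return [
    '<div class="hreview">',
    `  <span class="item"><span class="fn">${escapeHtml(itemName)}</span></span>`,
    `  <span class="rating">${escapeHtml(rating)}</span>`,
    `  <span class="reviewer vcard"><span class="fn">${escapeHtml(reviewer)}</span></span>`,
    `  <span class="summary">${escapeHtml(summary)}</span>`,
    "</div>",
  ].join("\n");
}

// Example with invented values:
console.log(hReview({
  itemName: "Drooling Dog BBQ",
  rating: "4.5",
  reviewer: "A. Reviewer",
  summary: "Ribs & brisket",
}));
```

The output is ordinary HTML, so the same content stays readable to visitors while the class names tell a parser which part is the item, the rating and the reviewer.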

Semantic Indexing
“Semantic” derives from the Greek word “semantikos” and, in a nutshell, refers to the meanings of words. Syntax, in the loose sense we’ll use here, refers to the graphic representation of words: literally just the sequences of glyphs (characters) that make up text.

Syntactically, Google is a sequence of the glyphs G, o, o, g, l and e.

Semantically, Google™ is a brand name belonging to a corporation whose main product is online search, complemented by several peripheral services including contextual ads, email, news, newsgroups and much more.

As you can see, syntax and semantics are complementary: one allows you to “draw” something, and the other defines that something so you know what you’re looking at. When we search the web, it’d be really nice if both concepts worked together. Google has been working on this for a long time now, and this is the first time it has given webmasters the chance to participate in the process.

Legacy Syntactic Search
For a long time, all we’ve had is syntactic search: the engines looked for sequences of letters, not their meanings. You told a search engine you wanted “webmaster tips” and the engine searched for the word webmaster followed by any number of spaces and then the word tips. Variations could include reversing the order of the search terms (searching for tips, a few spaces, and then webmaster, for instance) or allowing other words to show up in between. The tricks you can play with syntactic search are nearly endless, but it remains a simple search for the terms provided.
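To make the idea concrete, here is a toy syntactic matcher in JavaScript. It is only a sketch of the kind of literal matching described above, not how any real engine is implemented; the function name and its `maxGap`/`allowReversed` options are hypothetical. Because it compares character sequences only, “webmasters tips” does not match “webmaster tips”.

```javascript
// Escape regex metacharacters so search terms are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Toy syntactic matcher (hypothetical): looks only at character sequences.
//  - default: `first`, then any amount of whitespace, then `second`
//  - maxGap: allow up to maxGap other words between the two terms
//  - allowReversed: also accept the terms in reverse order
function syntacticMatch(text, first, second, { maxGap = 0, allowReversed = false } = {}) {
  const a = escapeRegExp(first);
  const b = escapeRegExp(second);
  const gap = maxGap > 0 ? `(?:\\S+\\s+){0,${maxGap}}` : "";
  const patterns = [new RegExp(`\\b${a}\\s+${gap}${b}\\b`, "i")];
  if (allowReversed) {
    patterns.push(new RegExp(`\\b${b}\\s+${gap}${a}\\b`, "i"));
  }
  return patterns.some((re) => re.test(text));
}

console.log(syntacticMatch("Ten webmaster   tips", "webmaster", "tips"));
console.log(syntacticMatch("tips for every webmaster", "webmaster", "tips",
  { maxGap: 2, allowReversed: true }));
```

Every variation is still just a different pattern over the same letters; nothing in the matcher knows what a webmaster is.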

Semantic Search
Semantic search is much more complicated and involves the meanings of words instead of just their graphic, literal representation, as we’ve exemplified above with the word “Google”.

Suppose you search for the word “Nikon”. A well-designed semantic system will know that Nikon is overwhelmingly associated with photographic equipment, and it will add “photography”, for example, as a term strongly coupled to the subject being searched. A real-life system would actually add as many strongly related terms as it has available.

You couldn’t follow the same strategy with a search for Yamaha, because Yamaha builds everything from motorcycles to saxophones, so the system can’t assume the search is about any single subject.

A semantic system would figure this out algorithmically, by interlinking the data it has about Nikon until it “reaches the conclusion” that Nikon shows up almost exclusively in connection with photographic products. It would also “discover” that Yamaha is interlinked with a large set of industries.

Therefore a semantic system would search through a much richer universe, providing not only literal results for Nikon but also filtering through several related keyword spaces.

By adding synonyms of the semantically related terms into the mix, you further enrich the search experience. With due care and a technically sound implementation, you can make the system seem pretty smart to the end user.
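The Nikon/Yamaha reasoning above can be sketched in a few lines: count how often a term co-occurs with known subjects, expand the query only when one subject clearly dominates, and then add that subject’s synonyms. Everything here is an assumption for illustration: the counts are invented, the 0.8 threshold is arbitrary, and the tables are hypothetical stand-ins for what a real engine would derive from its own data.

```javascript
// Hypothetical co-occurrence counts: how often each brand appears on pages
// already classified under a subject. The numbers are invented.
const cooccurrences = new Map([
  ["nikon", { photography: 940, optics: 45 }],
  ["yamaha", { motorcycles: 410, music: 380, marine: 150, audio: 120 }],
]);

// Hypothetical synonym table for subjects.
const synonyms = new Map([
  ["photography", ["photo", "photographic"]],
]);

// Returns the subject accounting for at least `threshold` of a term's
// co-occurrences, or null when no single subject dominates.
function dominantSubject(counts, threshold = 0.8) {
  const entries = Object.entries(counts);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  if (total === 0) return null;
  const [subject, count] = entries.reduce((best, entry) =>
    entry[1] > best[1] ? entry : best
  );
  return count / total >= threshold ? subject : null;
}

// Expands each query word with its dominant subject (if any) and that
// subject's synonyms. Ambiguous words such as "yamaha" are left alone.
function expandQuery(query, threshold = 0.8) {
  const terms = new Set();
  for (const word of query.toLowerCase().split(/\s+/).filter(Boolean)) {
    terms.add(word);
    const counts = cooccurrences.get(word);
    const subject = counts ? dominantSubject(counts, threshold) : null;
    if (subject !== null) {
      terms.add(subject);
      for (const synonym of synonyms.get(subject) ?? []) terms.add(synonym);
    }
  }
  return [...terms];
}

console.log(expandQuery("Nikon lenses"));
console.log(expandQuery("Yamaha"));
```

With these made-up numbers, photography accounts for about 95% of Nikon’s co-occurrences, so the query is expanded; Yamaha’s top subject covers under 40%, so it is not.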

Added System Complexity
All this comes at the cost of a steep increase in the search system’s complexity, as you might have imagined by now. Any semantic system would also know that “complex” is a synonym of “very expensive”.

Google is likely investing billions in this technology. Smart as Google is as an enterprise, we shouldn’t doubt for a minute how seriously it takes semantic search, artificial intelligence and the larger project of thoroughly indexing and searching a more semantic web.

What the launch of Rich Snippets means is that Google is releasing the tip of the semantic iceberg to us webmasters. Internally this is probably “Alpha” software, and it’ll stay “Beta” for us webmasters for a long time (by now it’s an endearing Google tradition to have a small “Beta” attached to its logos).

In the future, search systems will be able to interlink more and more dimensions of semantically marked-up data, providing astonishing depth of knowledge in response to very simple search terms.

(Semantically) relevant links:
Fill in this form to let Google know you’re interested in participating
Google Webmaster Central Blog: Introducing Rich Snippets
Google documentation: Marking up structured data
Definition of Semantics at Wikipedia
Hands on, how to mark your code up: About microformats

BTW, Google just made Drooling Dog BBQ famous! You can check out the official Drooling Dog BBQ site and its eclectic menu (including BBQ for breakfast!).

May 2009 29

This is old news, but I only found out about it today and thought it was a pretty interesting service. Maybe others who haven’t heard of it will find it useful as well.

Google hosts the following libraries for you:
* jQuery
* jQuery UI
* Prototype
* script.aculo.us
* MooTools
* Dojo
* SWFObject
* Yahoo! User Interface Library (YUI)

You can use any of these libraries from your site through Google’s loader:

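<script src="http://www.google.com/jsapi"></script>
<script>
// Load the latest 1.x release of jQuery
google.load("jquery", "1");

// jQuery isn't available until the loader has finished fetching it,
// so run code that depends on it from the loader's callback
google.setOnLoadCallback(function() {
  // Now you can use jQuery normally
  $("body").append("<p>jQuery is ready</p>");
});
</script>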

Google updates the libraries as new versions are released and has committed to hosting them indefinitely.

Find out more at the official Google AJAX Libraries API page.

May 2009 29

I spent the whole afternoon trying to download the Chromium source code tarball. Using Firefox, the download apparently timed out at about 20MB, yet it was listed as “complete”.

Of course, that tarball is damaged. If you try to extract it, you’ll get something along the lines of:

stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Apparently, Chromium does not offer a BitTorrent alternative, so I had to make the HTTP download work. First, we determine the tarball URL by inspecting the source code of http://build.chromium.org/buildbot/archives/chromium_tarball.html

<a href="chromium.r16847.tgz">download the file here</a>
<script>window.location.href = "chromium.r16847.tgz";</script>

Knowing the URL, we can try the download with wget:

[root@hendrix chromium]# wget http://build.chromium.org/buildbot/archives/chromium.r16847.tgz
--22:04:21--  http://build.chromium.org/buildbot/archives/chromium.r16847.tgz
=> `chromium.r16847.tgz.1'
Resolving build.chromium.org... 74.125.65.118
Connecting to build.chromium.org|74.125.65.118|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 677,220,596 (646M) [application/x-gzip]

The problem persisted with wget. After around 20MB I got:
22:10:14 (62.17 KB/s) - Connection closed at byte 22394485. Retrying.

Every 5 to 6 minutes (roughly every 20MB), the connection was closed. Fortunately, wget retried automatically every time and resumed the download where it had left off:
22:16:28 (57.79 KB/s) - Connection closed at byte 44458788. Retrying.
22:22:15 (62.61 KB/s) - Connection closed at byte 66517299. Retrying.
22:28:17 (59.59 KB/s) - Connection closed at byte 88428114. Retrying.
22:34:21 (67.80 KB/s) - Connection closed at byte 113358009. Retrying.
22:40:58 (55.17 KB/s) - Connection closed at byte 135483127. Retrying.
... about 30 retries later ...
00:28:59 (95.51 KB/s) - `chromium.r16847.tgz' saved [677220596/677220596]
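Resuming works because wget asks the server for only the missing part of the file with an HTTP Range request (a server that supports this answers with 206 Partial Content). The JavaScript sketch below shows the bookkeeping involved: it pulls the byte offset out of a log line like the ones above, builds the matching Range header, and checks completeness against the Length the server reported. The helper names are hypothetical; this is an illustration, not wget’s code.

```javascript
// Hypothetical helper: extract N from a "Connection closed at byte N" line.
function closedAtByte(logLine) {
  const match = /Connection closed at byte (\d+)/.exec(logLine);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: the Range header a resuming client sends.
// "bytes=N-" requests everything from 0-based offset N to the end.
function rangeHeaderFor(bytesOnDisk) {
  if (!Number.isSafeInteger(bytesOnDisk) || bytesOnDisk < 0) {
    throw new RangeError("bytesOnDisk must be a non-negative integer");
  }
  return { Range: `bytes=${bytesOnDisk}-` };
}

// The download is done once the local size equals the reported length.
function isComplete(bytesOnDisk, contentLength) {
  return bytesOnDisk === contentLength;
}

const line = "22:10:14 (62.17 KB/s) - Connection closed at byte 22394485. Retrying.";
const offset = closedAtByte(line);
console.log(rangeHeaderFor(offset));
console.log(isComplete(offset, 677220596));
```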

The problem could be at my ISP, but I haven’t had any problems with other downloads lately. Anyway, now I’ll be able to play around with the Chromium source code, and I’m really looking forward to that.

May 2009 30

The Earth API allows JavaScript code to control a browser plugin that provides Google Earth-like capabilities. Google’s “Hello Earth” sample is the simplest possible Earth API application: it just instantiates an Earth canvas and lets you interact with it.
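Here is a minimal sketch in the spirit of Google’s Hello Earth sample (not a verbatim copy). It assumes the page loads http://www.google.com/jsapi and contains an element with the id “map3d”, which is an assumption of this sketch; the `typeof google` guard is only there so the file can also be loaded outside a browser page.

```javascript
// The Earth plugin instance, set once creation succeeds.
let ge = null;

function init() {
  // Ask the loader to create a plugin instance inside the "map3d" element
  // (assumed to exist on the page), with success and failure callbacks.
  google.earth.createInstance("map3d", initCallback, failureCallback);
}

function initCallback(instance) {
  ge = instance;
  // Make the Earth window visible so the user can interact with it.
  ge.getWindow().setVisibility(true);
}

function failureCallback(errorCode) {
  // Typically reached when the plugin is missing or fails to start.
  console.error("Earth plugin failed to load:", errorCode);
}

// Only run the loader where jsapi is present (i.e. in a browser page).
if (typeof google !== "undefined") {
  google.load("earth", "1");
  google.setOnLoadCallback(init);
}
```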

I think this is a really neat plugin: you can develop games, trivia quizzes and so on with it, and it’s fully programmable in JavaScript from within your browser.

Google also publishes additional Earth API examples.

You need the Earth Plugin installed for the sample to run.

Jun 2009 01

It may be a mistake to speak at such an early stage, but my first impression of Bing.com was a very positive one. The system still has a few rough edges: some snippets were made up of nonsensical text, a search using allinurl, for example, returned a completely blank page, and so forth. If Gmail is still BETA, this is definitely ALPHA code, but pretty good ALPHA code at that.

I was planning on waiting a few more weeks for the media hoopla to die down a bit before giving it a try, but this interview with Woz moved Bing.com up my priority list. What, Steve Wozniak praising a presentation by Steve Ballmer? Something’s different here...

So, I’ve decided to give Bing.com a try. It’ll be my default search engine for the near future, and I’ll see how productive I am with it.