May 2009 11

Nearly 2 years ago Slashdot covered the news that Airbus was to include Linux in every seat on the new A380. Well, we just found out(the fun way) that the A330 already does.


Linux 2.4 a bordo de Airbus da TAP

Kudos to Fabio Pili for capturing this during his trip to Portugal!

May 2009 11

In May last year I updated our CPAN installation as I normally do every 3 months or so. The routine will seem familiar to all Perl developers out there. On the CPAN shell just run install Bundle::* for most core modules, test, be happy. This one particular time it got a little more complicated than that. It was my own Perl Chernobyl – a simple procedure that should have taken 30 minutes became a system meltdown.

Flashback
Since 1999 my trust in CPAN has been unshakable, the Perl community itself has been one of the biggest quality assurance teams ever grouped together(perhaps rivaled only by Linux, but that was a few years after Perl was out there). What’s most impressive is a complete absence of micro-management by Larry Wall. Perl is an entirely autonomous slightly organized anarchic community of hackers that do what they do because they love Perl, not because a czarish group leader is spiking them with a fork.

Larry Wall’s object encapsulation principle, that you shouldn’t go into other people’s houses because you weren’t invited, not because the door is locked and the owner has a gun, prevails in the Perl community as well.

I was involved with Hibernate from early 2006 up until late 2007 in many projects. When I finished my work with Java I went looking for similar ORM alternatives in Perl, to try out on my own projects. Heck, even Dr. Dobbs had me featured as a Hibernate developer in Brazil…so I desperately needed to get back to Perl.

I tested Rose::DB(fast!) and Class::DBI without much success, not that I have anything against these great modules, just that I was missing something…and even I didn’t know exactly what it was.

 My homegrown can of wORMs
I also forgot the “lazyness” virtue for a bit and tried my own solution, which unfortunately ended up on a production system. My custom ORM was becoming a monster, its shadow pending over me scared me to heck and just the thought of maintaining my own ORM code freaked me out. It was dirty, clever code, full of hacks, cryptic variables, and too many specifics for the system it was running under(a large tourism system that grouped years and years of heterogenous data), so I couldn’t release it as a CPAN module to share my development nor reuse it on my other projects. All in all, the lesson was taken: lazyness IS a programmer virtue. Cultivate it. Time to find a Perl ORM on CPAN.

So…to make a long story short, fast forward and for over a year now I’ve been in love with the DBIx::Class ORM framework.

It doesn’t feature “too much magic”, it lets you hack some SQL in there when you need it, it reuses other CPAN modules for the SQL generation and is overall a very granular, very well architected system. All this without losing the Perl attitude, it feels like Perl and it just works, period.

I went full throttle with DBIx::Class, or DBIC as they call it. Since I couldn’t find any books on the subject(are there any yet?) I read all I could read about it online. The DBIC documentation has matured with time, as did my own projects using the modules. Slowly I’ve been migrating thousands and thousands of lines of code from my older projects to DBIC code. Right now I’ve developed this natural feel for DBIC. It’s consistent, works a dandy. The DBIC community is alive and active, giving support and kindly answering all kinds of questions on the dbix-class@lists.scsys.co.uk mailing list.

Back to May 2008 and my CPAN upgrade
So, as I settled with DBIx::Class, I got used to upgrading the system every so often. DBIC is still under 1.0, so it’s changing constantly. I kept my routine upgrades blindly, religiously, happening until May 2008.

That’s when an otherwise tranquil CPAN upgrade went sour. It was the first time a CPAN module in release status broke my system.

After the CPAN ritual was over everything seemed fine until screens went blank on my browser. No data was returned from my HTTP requests. I went over to check the Apache server logs and this is the war zone I found:
[Tue May 27 06:01:16 2008] [notice] child pid 14013 exit signal Segmentation fault (11)
[Tue May 27 06:01:47 2008] [notice] child pid 13698 exit signal Segmentation fault (11)
[Tue May 27 06:02:04 2008] [notice] child pid 13678 exit signal Segmentation fault (11)
[Tue May 27 06:56:07 2008] [notice] child pid 13999 exit signal Segmentation fault (11)
[Tue May 27 06:57:28 2008] [notice] child pid 14000 exit signal Segmentation fault (11)
[Tue May 27 06:58:10 2008] [notice] child pid 14004 exit signal Segmentation fault (11)
[Tue May 27 06:59:48 2008] [notice] child pid 14008 exit signal Segmentation fault (11)
[Tue May 27 07:00:17 2008] [notice] child pid 14002 exit signal Segmentation fault (11)
(Actually, this sample of log entries was taken at the NEXT DAY from the upgrade…. I had been getting my butt kicked by this problem since the day before…if you got this far, then read on.)

I suspected my application first, went over several pieces of it and couldn’t replicate the problem. Sometimes it worked, sometimes it didn’t. Worst kind of bug…these have a way to freak our mental logic unit(worse, it blows the otherwise slo-blo patience fuse).

I added a forced DBI reconnect to every page and the problem seemed solved, until it happened yet again. Blank page, segmentation fault. The absence of some proper error messages on the main httpd log meant it crashed before the logging phase of the Apache request loop.

This couldn’t be good, but I hadn’t realised that much at this point.

At first I was sure it was the Linux instance of DLL hell …I thought maybe my latest DBD::mysql linked to some conflicting libmysqlclient version. Maybe I forgot to run ldconfig… But then ldconfig didn’t work… Slowly I tried all the simpler tricks, none worked.

So I recompiled MySQL. It didn’t matter really, it wasn’t a critical MySQL host. It was, though, the main Apache server front, and it was down… Pressure everywhere, customers, everyone asking when the system would be back up…and there I was recompiling Apache, MySQL and other 20 or so libraries, hoping the impedance would match at some point and everything would just work again.

Each full system recompilation took over 20 minutes. It’s a custom Apache server wholly patched by yours truly to work as an optimized(hopefully, when it works) application server for Perl apps.

Was it PHP that was messing around with mod_perl? Tried it. Was it mod_ssl meddling with the Apache guts and knocking me down? Tried it. Nothing seemed to work.

The dreaded gdb session
I finally realized I’d have to feed a httpd core into gdb and try to find out what was happening from the recovered black box. Hopefully the httpd process core would reveal the culprit.

Recompile Apache one more time, this time editing line 282 of the configure script, adding CFLAGS=-g to add debug symbols and information to the binary. (In the meantime the server is down, and the cell phone is on silent, 27 calls unanswered.)

So, a few minutes later we have a brand new httpd binary, nice and stuffed with debug symbols.

Adding
CoreDumpDirectory /usr/local/apache/coredump/
to our httpd.conf would guarantee we get some juicy cores dumped for us at that location.

We tell the kernel we want core dumps by using ulimit -c (not activated on the Fedora family by default, but if you’re lucky to use Slackware you’ll find the cores are always there).

ulimit -c 10240 (10 MB limit for core files)

And Bob’s your uncle. A few minutes later and we have a TON of cores under /usr/local/apache/coredump/ to examine. I go with the lowest PID first to examine the earliest possible casualties, then I chose some random PID’s to examine. Here we go…

# gdb ../bin/httpd -c core.1035
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “i686-pc-linux-gnu”…

[... snip ....]

Program terminated with signal 11, Segmentation fault.
[New process 1045]
#0 0×084b739c in Perl_av_undef ()

OK. What the heck is Perl_av_undef()?! I didn’t have much time to investigate. Phone calls were piling up as quickly as core files…and as curious as I was, I needed solutions, not overnight hacking sessions into the Perl source code…

I know DBIC uses some large data structures passed in and out of functions. Class instances can be HUGE data structures that reference a deeply nested spiderweb of related classes, on and on recursively.

Researching Perl_av_undef() on Google gives me the clue I need: it’s something related to deeply nested recursive data structures blowing up the Perl stack, AV probably means ArrayValue and so on.

So I restart the Apache server without a kernel stack limit:

# ulimit -s unlimited
(Actually, I chose 100MB for the original test. But I later found that when testing, you better exaggerate to really eliminate a possibility. This time we’re trying to rule out stack problems here.)

Restarted the httpd server, and for a few minutes I thought we’d solved the problem… Relief only lasted about 5 minutes…and the core files reappeared.

By now you could tell I was a pretty desperate man. The stack problem was only part of the overall grim picture.

I decided to check the core files and see what had changed.

# gdb /usr/local/apache/bin/httpd -c core.1047
[... snip ...]
Program terminated with signal 11, Segmentation fault.
[New process 1047]
#0 0×00b9e46b in mysql_ping () from /usr/local/mysql//lib/mysql/libmysqlclient.so.15

And that’s when things started to turn around. Mysql was my first suspect after all….

Back to gdb:
(gdb) where
#0 0×00b9e46b in mysql_ping () from /usr/local/mysql//lib/mysql/libmysqlclient.so.15
#1 0×004d99f2 in XS_DBD__mysql__db_ping (cv=0×94d2680) at mysql.xs:519

#2 0×001edbf4 in XS_DBI_dispatch () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/DBI/DBI.so
#3 0×084bfa43 in Perl_pp_entersub ()
#4 0×084be32e in Perl_runops_standard ()
#5 0×084ba15d in Perl_call_sv ()
#6 0×0043ae8f in EMBPERL2_CallStoredCV () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/Embperl/Embperl.so
#7 0×00453cf0 in embperl_Execute () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/Embperl/Embperl.so
#8 0×0045a008 in ProviderEpRun_GetContentIndex () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/Embperl/Embperl.so
#9 0×0045891a in Cache_GetContentIndex () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/Embperl/Embperl.so
#10 0×00434a63 in ProcessFile () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/Embperl/Embperl.so
#11 0×004355da in embperl_RunRequest () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/Embperl/Embperl.so
#12 0×00435b8c in embperl_ExecuteRequest () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/Embperl/Embperl.so
#13 0×0043233f in XS_Embperl__Req_ExecuteRequest () from /usr/lib/perl5/site_perl/5.10.0/i686-linux/auto/Embperl/Embperl.so
#14 0×084bfa43 in Perl_pp_entersub ()
#15 0×084be32e in Perl_runops_standard ()
#16 0×084ba15d in Perl_call_sv ()
#17 0×08371d3b in perl_call_handler (sv=0×96a8d20, r=0×9541c14, args=0×0) at mod_perl.c:1668
#18 0×083723f9 in perl_run_stacked_handlers (hook=0×85ef03b “PerlHandler”, r=0×9541c14, handlers=0×96a8cb0)
at mod_perl.c:1381
#19 0×08373861 in perl_handler (r=0×9541c14) at mod_perl.c:904
#20 0×08395497 in ap_invoke_handler (r=0×9541c14) at http_config.c:476
#21 0×083acd2f in process_request_internal (r=0×9541c14) at http_request.c:1299
#22 0×083ad1a0 in ap_internal_redirect (new_uri=0×951427c “/index.html”, r=0×9512ae4)
at http_request.c:1440
#23 0×00386fe4 in mod_gzip_redir1_handler () from /usr/local/apache/modules/mod_gzip.so
#24 0×00385219 in mod_gzip_handler () from /usr/local/apache/modules/mod_gzip.so
#25 0×08395497 in ap_invoke_handler (r=0×9512ae4) at http_config.c:476
#26 0×083acd2f in process_request_internal (r=0×9512ae4) at http_request.c:1299
#27 0×083acd8c in ap_process_request (r=0×9512ae4) at http_request.c:1315
#28 0×083a2ef6 in child_main (child_num_arg=12) at http_main.c:4971
#29 0×083a3227 in make_child (s=0×87be05c, slot=12, now=1211894763) at http_main.c:5150
#30 0×083a32c2 in startup_children (number_to_start=23) at http_main.c:5178
#31 0×083a3a66 in standalone_main (argc=3, argv=0xbfe95934) at http_main.c:5525
#32 0×083a434f in main (argc=3, argv=0xbfe95934) at http_main.c:5883

The call stack trace is very clear: we get that the problem happens with the native mysql_ping() C function called from an XS stub.

Now Google is our friend and we quickly arrive at this bug report from May 20 2008, precisely a week before my fantastic upgrade voyage.

Suggested solution is regression to DBD::mysql version 4.006… How frustrating. CPAN only installs the latest version, so we gotta do this one manually.

# tar xzvf DBD-mysql-4.006.tar.gz
# cd DBD-mysql-4.006
# perl Makefile.PL
# make
# make install
# /usr/local/apache/bin/apachectl restart

It’s been a year since….and the bug is still there. A poster on the original bug report discussion thread says 4.010 fixes the issue. I haven’t tested that one yet.

May 2009 11

Fravia - Fjalar RaviaIt’s been widely reported that Fravia passed away on Sunday, May 3 2009.

Legendary searcher and programmer Fravia became widely known in the mid-to-late nineties for his essays on reverse engineering. Focusing on the ideological aspect of reversing, as a way to achieve knowledge of the inner workings of society and the market, rather than on piracy or software vandalism, Fravia influenced a whole generation of computer users who wished to learn more about x86 assembly language, software internals, undocumented operating system features and low level debugging.

During his later years he devoted to teaching and exploring the art of searching the Deep Web, focusing on techniques to enable a searcher(identified by ~S~) find any kind of information online. Fravia’s searching tutorials include advanced techniques for many types of information retrieval. Fravia made heavy use of symbolism and allegory to transmit information. Software reversers were often identified by placing a + prefix to their nicknames. A most senior reverser, known only as ORC had the highest ranked prefix, ++ORC, as a tribute to his higher knowledge of low level programming and binary software analysis.

In 2007 Fravia reported he had cancer. His battle with the disease lasted 2 years. When, in late April 2009, he finally got word from his physician that the metastasis had gotten out of control and that the situation was irreversible, he gave us a Sweet and Sour goodbye on a page called swangsong.htm.

As of May 11 2009 all of Fravia’s sites are down(update: they’ve been put back up as of late May 2009). A tribute was posted by members of his family on his home page shortly after his passing. I hope they can restore the sites and maintain them online as a most valuable source of timeless information, although Fravia did empower his followers to search and find caches and copies of his teachings even with his sites taken offline. As he put it, he gave us all “powerful weapons” for seeking knowledge.

May his memory live forever through the knowledge he so kindly shared with us all throughout the years.

May 2009 11

This is one of my favorite shell tips, from one of my favorite UNIX books: Unix Power Tools, written by a whole gang of überhackers.

The shell hat trick ^this^that substitutes this from the last command for that on a new command line, and then executes the new command, all in one line.

Practical Example
Suppose you’re working on CWD /usr/sites/yoursite.com/htdocs/

You probably got there by running
[root@hendrix ~]# cd /usr/sites/yoursite.com/htdocs/

Then you realize you need to switch to the logs directory under yoursite.com, which means substituting htdocs for logs on that cd command.

That’s where the hat trick comes into play.

[root@hendrix ~]# ^htdocs^logs

And voilà, the last cd command is repeated with htdocs substituted for logs. Getting used to using the hat trick can save you hundreds of keystrokes each session and it gets you used to thinking of finding patterns in your shell work so you can avoid repetition and thus deploy lazyness, one of programmers’ main virtues, according to Larry Wall.

May 2009 12

Google has announced Rich Snippets for their search results today, meaning you’ll be able to have finer grained control over the brief summary that is presented to users when they see your site on the search results page.

Webmasters can now start to participate in Google’s Semantic interpretation of content
The release also marks the first time Google has let us tell its robots a little about the meaning of our texts. Previously, any semantic interpretation of page information was done by Google exclusively, and webmasters had very little control over this process.

This clearly marks the beginning of the public phase of a broader strategy to allow webmasters to participate in Google’s interpretation of the semantic web. We knew Google was deploying semantic indexing, but webmasters never had ANY control whatsoever over the process.

First, an example of a Rich Snippet
Below is an example of a legacy, or “poor snippet”.
Rich Snippet, Poor Snippet

And now, Google’s example of a rich snippet, as shown on the official Google Webmaster blog.
Wealthy snippet - Drooling Dog BBQ from Google webmaster blog rich snippet example.

Semantic Indexing
“Semantic” derives from the Greek word “semantikos” and, in a nutshell, refers to the meanings of words. Syntax, in turn, refers to the graphic representation of words, literally just the sequences of glyphs(characters) which make up text.

Syntactically, Google is a sequence of the glyphs G, o, o, g, l and e.

Semantically, Google(TM) is a brand name belonging to a corporation whose main product is online search systems, aggregated by several peripheral services including contextual ads, email services, news, newsgroups and much more.

As you can see, syntax and semantics are complementary, one allows you to “draw” something, the other defines that something so you know what you’re looking at. When we search the web, it’d be really nice if both these concepts worked together. Google has been working on this for a long time now, and this is the first time they’ve given webmasters the chance to participate in the process.

Legacy Syntactic Search
For a long time, all we’ve had is syntactic search – the engines looked for sequences of letters, not their meanings. You told a search engine you wanted “webmaster tips” and the engine searched for the word webmaster followed by any number of spaces and then the word tips. Variations could include reversing the order of the search terms, searching for tips , a few spaces, and then webmaster for instance, or allowing words to show up in between. The tricks you can play with syntactic search are nearly endless, but it remains a simple search for the terms provided.

Semantic Search
Semantic search is much more complicated and involves the meanings of words instead of just their graphic, literal representation, as we’ve exemplified above with the word “Google”.

Suppose you search for the word “Nikon”. A well designed semantic system will know Nikon produces photographic equipment and nothing else, and it will add “photography”, for example, as a strongly coupled word to the subject being searched. A real life system will actually add as many strongly related terms as it had available.

You couldn’t follow the exact same strategy with a search for Yamaha, for example, because Yamaha builds everything from motorcycles to saxophones. So you couldn’t just assume the search is for a certain subject.

A semantic system would figure this out algorithmically, by interlinking the data it has about Nikon until it “reaches the conclusion” that Nikon only shows up related to photographic products. It would also “discover” that Yamaha is interlinked to a large set of industries.

Therefore a semantic system would search through a much richer universe, providing not only literal results for Nikon but also filtering through several related keyword spaces.

By adding synonyms of the semantically related terms into the mix, you further enrich the search experience. With due care and a technically sound implementation, you can make the system seem pretty smart to the end user.

Added System Complexity
All this comes at the cost of exponential growth in the search system’s complexity, as you might have imagined by now. Any semantic system would also know that “complex” is a synonym of “very expensive”.

Google is likely investing billions into this technology. As smart as Google is as an enterprise, we shouldn’t doubt for a minute how seriously they’re taking semantic search, artificial intelligence and the larger project of being able to thoroughly index and search a more semantic web.

What the launch of Rich Snippets means is that Google is releasing the tip of the semantic iceberg to us webmasters. As you can imagine, internally this is probably “Alpha” software, and it’ll be “Beta” for a long time with us webmasters(it’s sort of a sympathetic Google tradition by now to have a small Beta attached to its logos).

In the future, search systems will be able interlink more and more dimensions of semantically marked data, providing astonishing depth of knowledge in response to very simple search terms.

(Semantically)Relevant links:
Fill this form to let Google know you’re interested in participating
Google Webmaster Central Blog: Introducing Rich Snippets
Google documentation: Marking up structured data
Definition of Semantics at Wikipedia
Hands on, how to mark your code up: About microformats

BTW, Google just made Drooling Dog BBQ famous! You can check out the official Drooling Dog BBQ site and their eclectic menu(Including BBQ for Breakfast!) by clicking here!