Dec 2011 15

SOPA is based on the absurd principle that sites shall be co-responsible for criminal acts committed by their members. It turns web sites into accomplices for copyright violations perpetrated by people the site owners have never met. It is a confession of defeat by authorities – they are unable to enforce copyright law in the new reality of 2011, thus they will remove everybody’s rights online because of their particular failure.

Let’s draw a parallel here. Blaming Reddit, YouTube and even Wikipedia for any copyrighted content submitted by their users is akin to blaming gun makers for homicides committed using their product. You will not, in our lifetimes, see gun makers criminally convicted for murders committed using their product. It’s been tried, and it’s always been a lost cause. Civil lawsuits have been won and damages have been granted for violent gun-related crimes, but a criminal conviction has never been attained, precisely because you can’t be criminally responsible for a crime you didn’t commit (or actively participate in, in some way).

Why should SOPA, then, convict websites like Wikipedia of a felony if anyone is able to submit copyrighted content to them? It is obvious that if SOPA is passed, it may be used against websites by the copyright holders themselves. Copyright owners, who have been known to spread their own content as a “honey pot” in order to implicate downloaders, could easily sabotage sites like Wikipedia by submitting their own content, taking screenshots and filing lawsuits for copyright infringement.

SOPA has the potential to create a sea of lawsuits, by everyone against everyone. Just upload your copyrighted work and file a criminal complaint against any given site. This old trick has been used against me in the past, by a photographer.

One website I worked on was sued by a photographer who claimed one of his photos was used without authorization. We checked, and the photo had been uploaded just days before the lawsuit was filed, so it is unlikely this photographer casually came across it on the internet. Most likely, we were the victims of that very photographer. We also found that this person had over 300 identical lawsuits against sites which allowed the upload of photos. Basically, this photographer was making a living by submitting his photos and then filing lawsuits. As of this writing, we’re still defending ourselves against his accusations. Now imagine if SOPA were in effect here: we’d be criminally responsible, along with 300 others, for a crime we didn’t commit.

SOPA has the potential of destroying the collaborative nature of the Internet and it must be stopped. Do your part!

Nov 2011 15

Google penalizes any site which copies content from another with the intent of taking the original author’s traffic. I wonder if they have a special deal with some sites in order to present their content on Google’s search results directly?

Here’s a simple example. I’ll ask Google to define what monopoly means.

I take the definition and search for it in quotes, on Google itself. Which gives me the following result:

The definition which Google showed on its search result is exactly the same as the one provided by the Oxford Dictionary. But the user never reached the source of the original definition of “monopoly”, never registered for that site’s services, never even saw the logo of Oxford.

I wonder if dictionary sources are getting some compensation for allowing this?

Sep 2011 19

If, by any chance, you decide to reset your Twitter password and you receive the message via Gmail, you’ll find that it’s currently impossible to proceed because only a part of the message is displayed.

The solution is to choose “View Original Message” from the drop-down menu located at the top right-hand corner of the Gmail message. There you’ll find the link, which you should copy and paste into a new browser tab.

Aug 2011 15

In the first days of August I was in charge of setting up a new Apache server for a medium-traffic site. The backend was a rather large system that had finally passed all tests on the x86 64-bit platform. We’d finally be leaving 32 bits behind.

After a few pleasant hours of the usual custom compilations, package upgrades and pre-requisites checking, the migration was finally done. To our surprise, all went perfectly – not a single glitch.

Compared to some of my past experiences, such a successful migration, with zero complaints or technical issues, would have been surreal to imagine a few years ago.

The peace was short-lived, as expected: several hours after the launch, Apache started randomly returning blank pages and producing 500 Server Errors.

The only clue I had was this error_log entry, repeated thousands of times:

(103)Software caused connection abort: cache: error returned while trying to return disk cached data

So mod_cache was ruining our sleep.
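To get a feel for how often the error fires, and whether it tracks traffic, a rough sketch like the one below can bucket the matching error_log entries by hour. The function name is hypothetical, and it assumes log lines start with a timestamp in the `[Sat Aug 06 21:09:02 2011]` form that Apache 2.2 writes by default.

```shell
#!/bin/sh
# Hypothetical helper: count mod_cache "connection abort" errors per hour.
# Assumes each log line begins with a timestamp like [Sat Aug 06 21:09:02 2011].
count_cache_aborts_per_hour() {
    log_file="$1"
    # Keep only the cache error lines, then tally them by "Mon DD HH:00".
    grep -F '(103)Software caused connection abort: cache' "$log_file" |
        awk '{
            split($4, t, ":")                  # $4 is HH:MM:SS
            n[$2 " " $3 " " t[1] ":00"]++      # $2 is the month, $3 the day
        }
        END { for (k in n) print n[k], k }' |
        sort -k2
}

# Example call (the log path is an assumption):
#   count_cache_aborts_per_hour /var/log/httpd/error_log
```

Each output line is a count followed by the hour bucket, which makes it easier to see whether the errors cluster around busy periods.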

What didn’t work

Since you’ve likely Googled or Binged the problem and read all the suggestions in mailing lists and forums, here are some steps I tried that still didn’t work:

  • Disabled SELinux – no luck. (Who knows, in previous lives I had issues with Apache serving and SELinux.)
  • Changed mounted disks for the cache data – no luck. (Maybe we had a bad filesystem, who knows.)
  • I thought it could be related to Amazon EBS virtual drive latency, so I put the cache directory in the instance’s RAM (using tmpfs) – no luck.
  • Tried reducing CacheDirLevels and CacheDirLength to one. Nope. After 15 minutes or so, the errors returned.
  • Set htcacheclean to run every 30 minutes, allowing only 250 MB of cache data (/usr/sbin/htcacheclean -p /var/cache/apache/ -l250M -d30). No luck – the errors still appeared, seemingly randomly.

There was apparently no specific file type that triggered the error. PHP scripts, Perl programs, WordPress, MediaWiki and our in-house systems – all equally affected.

Note that each time the cache settings were changed, we started with a fresh cache directory (/var/cache/apache1, apache2, apache3 … apacheN). Once we found a solution, we went back to /var/cache/apache, after cleaning it out first.

After reading a dozen or so related complaints in mailing lists and unsuccessfully trying their recommendations, I figured it was time to go to the Apache documentation and see which directives we could tweak.

mod_disk_cache gives us only 5 configuration options: CacheDirLength, CacheDirLevels, CacheMaxFileSize, CacheMinFileSize and CacheRoot. The mod_disk_cache documentation explains each in detail. After testing several combinations of these directives, each time starting with a fresh cache directory, the error would return after a few thousand (or so) requests. Debugging this issue is especially hard because the problem doesn’t happen as soon as the Apache server is started – it takes a while to replicate and test.

The Solution

What finally worked was this combination of CacheMaxFileSize and CacheDisable for the images directory:

CacheMaxFileSize 64000
CacheDisable /images

Limiting cached files to 64000 bytes (roughly 64KB) and making sure the /images directory was not being cached solved the problem. It’s been a week now and we’ve had zero of the dreaded “Software caused connection abort” error 103 messages. Having to block the images directory came as a shock to us, because none of the errors we examined were triggered by serving an image. The errors seemed random and happened for HTML files, PHP and Perl scripts and so forth.

So what’s the cause? I have no idea. Folks wanted the site back up and running, so we had zero time left for long debugging sessions. Under normal traffic, the errors only showed up after a few minutes. The initial install took longer to present the problem because we performed the main migration late on a Sunday night, when traffic was considerably lower.
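Since the root cause was never pinned down, one thing worth checking is what the 64000-byte cap actually excludes. The sketch below lists files under a cache root that are larger than a given threshold. The function name is hypothetical, and it assumes the size of a file on disk roughly reflects the size of the cached response, which I have not verified against mod_disk_cache’s storage layout.

```shell
#!/bin/sh
# Hypothetical helper: list files under a cache directory larger than a
# byte threshold (defaults to the 64000 used for CacheMaxFileSize above).
list_oversized_cache_files() {
    cache_root="$1"
    max_bytes="${2:-64000}"
    # "-size +Nc" matches files strictly larger than N bytes.
    find "$cache_root" -type f -size +"${max_bytes}"c
}

# Example call against one of the earlier test directories
# (this only makes sense on a cache populated without the size cap):
#   list_oversized_cache_files /var/cache/apache1
```

Run against a cache populated before the fix, this would show whether large objects were being stored outside /images as well.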

Additional Info


# httpd -l
Compiled in modules:
core.c
prefork.c
http_core.c
mod_so.c

# httpd -v
Server version: Apache/2.2.15 (Unix)
Server built: Apr 9 2011 08:58:28

# uname -a
Linux hostname_here 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

# Cache-related directives in httpd.conf:
CacheRoot /var/cache/apache/
CacheEnable disk /
CacheMaxFileSize 64000
CacheDirLevels 1
CacheDirLength 1
CacheDisable /images
CacheDefaultExpire 176400
CacheIgnoreHeaders Set-Cookie
CacheIgnoreNoLastMod On

Aug 2011 06

The error_log files for my DBIx::Class-based sites were absolutely packed with warnings about n-to-n relationships getting out of hand, such as this line:

[Sat Aug 06 21:09:02 2011] [warn] [1144]ERR: 32: Warning in Perl code: DBIx::Class::ResultSet::next(): Prefetching multiple has_many rels addresses and addresses at the same level (person) will explode the number of row objects retrievable via ->next or ->all. Use at your own risk. at /chili/beans/Lib/Process.pm line 38

I’m aware of the dangers of a badly implemented ORM schema and I took care to add enough constraints to my query so that next() won’t blow up my memory space. And I never use all() … at all().

So I searched for ways to quiet these messages, because on a heavy-traffic site these log files reach gigabytes every day. Plus, logging all that stuff surely slows the site down. Here’s how I solved it. If you came here looking for an elegant solution, please hit Back now.


# cd /usr/local/share/perl5/DBIx/Class
# vi ResultSource.pm
:/Prefetching # this will search for the first occurrence of the word Prefetching
#carp (
# "Prefetching multiple has_many rels ${last} and ${pre} "
# .(length($as_prefix)
# ? "at the same level (${as_prefix}) "
# : "at top level "
# )
# . 'will explode the number of row objects retrievable via ->next or ->all. '
# . 'Use at your own risk.'
#);

Comment out the multiline carp() call, and Bob’s your uncle. A clean error_log from now on.
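If you’d rather not edit an installed module (the change is lost on the next DBIx::Class upgrade), a possible alternative is to drop the warning lines before they reach disk. This is only a sketch and not something I ran on the site: Apache lets ErrorLog point at a piped program, and the function name, script path and log path below are all hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads log lines on stdin and prints everything except
# the DBIx::Class prefetch warning shown above.
filter_prefetch_warnings() {
    # -F treats the pattern as a fixed string, -v inverts the match.
    grep -v -F 'Prefetching multiple has_many rels'
}

# Possible wiring as a piped log (both paths are assumptions):
#   ErrorLog "|/usr/local/bin/filter-errlog.sh"
# with the script ending in:
#   filter_prefetch_warnings >> /var/log/httpd/error_log
```

This keeps the log files small, but Perl still generates each warning, so it does nothing for the runtime cost of producing them.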