I’ve been working on a Reddit-based site that’ll be hosted at Amazon EC2 and, as you probably know, Reddit currently only provides VMWare and Virtualbox images for testing and development. EC2 is an ideal test environment because I can boot an image, play geek god with my own Reddit clone and then shut it down as needed – all in a matter of minutes.
But first I needed a Reddit-ready Amazon EC2 AMI so that making mistakes in between hacks wouldn’t require restarting the whole painstaking installation process over and over. I decided to share my Reddit AMI so that more developers will use the Reddit system; hopefully more sites will crop up using this impressive code base. I remember the excitement of reading the Slashdot code back in the day and how much I learned from that experience, especially since it was a system which handled millions of concurrent users with very few, if any, errors.
Log into your Amazon Web Services account, click on the EC2 tab, then on the AMI menu on the left. In the “Viewing:” filter, select “All Images” (this will take a while to load). Search for “reddit” and you’ll find a public AMI with the ID below.
Current AMI ID: ami-56e81c3f
SSH Username: ec2-user
Password: reddit
Disclaimer: You must tighten the security of this AMI if you wish to use it in a production environment. This AMI is provided as-is, 100% free of charge, no guarantees offered.
Note: You’ll need to open TCP port 9090 on your security group to view the site remotely. Then access http://YOUR_AMAZON_PUBLIC_DNS:9090/
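If you prefer the command line to the web console, opening the port might look like the sketch below. It assumes the present-day AWS CLI (the aws command), which is not what this post used, and the group name and address range are placeholders you supply. The helper names are made up for illustration. The first function only prints the command so you can review it before running anything.

```shell
#!/bin/sh
# Sketch only: prints an AWS CLI command instead of running it.
# Assumption: the "aws" CLI is installed and configured; the group name
# and CIDR are placeholders chosen by the caller.
open_port_cmd() {
    group=$1   # security group name
    cidr=$2    # address range allowed in, e.g. your own IP as x.x.x.x/32
    echo "aws ec2 authorize-security-group-ingress --group-name $group --protocol tcp --port 9090 --cidr $cidr"
}

# Builds the URL from the note above for a given public DNS name.
reddit_url() {
    echo "http://$1:9090/"
}
```

For example, open_port_cmd default 203.0.113.5/32 prints the full command, which you can then paste into your shell once it looks right.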
If everything goes ok, you should be staring at your very own Reddit like the Reddit admins see it! Unlimited internets and endless karma. But feels kinda lonely, don’t you think?
Note: You need to configure it further to get it fully functional, modify your template and finish the installation. This AMI is just a starting point.

The first decision was whether to download an official image for VMWare or Virtualbox and somehow convert it to an AMI, or to create my own by building and installing the needed dependencies. By virtue of laziness, I went for the image conversion route. What could possibly go wrong?
First, I downloaded the VMWare virtual machine image, which is kindly provided by the Reddit folks here.
I found this article via Yahoo! and followed the instructions.
A coffee mug later, I was surprised by a message long forgotten in the days of cheap Terabyte disks: “Out of space on device”.
It turns out "cat *.raw >> output.raw" is NOT a very good idea. So watch your step when you copy and paste instructions from the web.
After substituting output.img for output.raw and waiting about 15 minutes, the commands finally returned and I found a delicate 21 gigabyte elephant staring back at me. There is no way I can upload that to Amazon EC2, even bzip2'd down to 10 gigabytes. I didn’t dig much into why the qemu-img command did this, but it seems that every vmdk chunk is exploded into a 2 gigabyte image, so when you concatenate them you add all that unused image space into one large file. I guess the vmdk conversion route works best if you have a single-file VMWare image.
Having hacked at the image conversion attempt for a couple of hours, I finally went down the conservative route: download and install packages. This is pretty boring stuff, so let’s get this step out of the way already.
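In hindsight, a quick size check before concatenating would have saved me the coffee. The sketch below is not part of the instructions I followed. The helper names are made up, and it only compares the apparent size of the chunk files against the free space on the target filesystem.

```shell
#!/bin/sh
# Apparent size in bytes of all files given as arguments.
# /dev/null is appended so wc always prints a "total" line, even for one file.
total_bytes() {
    wc -c "$@" /dev/null | tail -n 1 | awk '{print $1}'
}

# Free space in kilobytes on the filesystem that holds directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeeds only if the files ($2 onwards) would fit in directory $1.
fits_in() {
    dir=$1
    shift
    need_kb=$(( ($(total_bytes "$@") + 1023) / 1024 ))
    [ "$need_kb" -lt "$(free_kb "$dir")" ]
}
```

Something like fits_in /mnt *.raw && cat *.raw > /mnt/output.img would then refuse to start when the result cannot fit (the /mnt path is only an example).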
Run sudo su - and proceed with the following steps as root.
Note: Even if you’re only a regular Amazon customer, you can still quickly sign up for AWS and get your EC2 instance running in minutes.
Below is the command history I used to get a base system together.
The Python module configuration wasn’t as straightforward as I wished, mainly because Reddit uses old versions of some packages which have been superseded by incompatible ones. WebHelpers, for example, needs to be 0.3 – you can’t go with the newer versions. The most recent WebHelpers do not include the webhelpers.rails.* subdirectory.
yum install python-setuptools
yum install python-imaging
wget http://www.cython.org/release/Cython-0.13.tar.gz
tar xzvf Cython-0.13.tar.gz
cd Cython-0.13
python setup.py install
yum install libevent
yum install libevent-devel
wget http://memcached.googlecode.com/files/memcached-1.4.5.tar.gz
tar xzvf memcached-1.4.5.tar.gz
cd memcached-1.4.5
./configure --prefix=/usr
make
make install
wget http://launchpad.net/libmemcached/1.0/0.44/+download/libmemcached-0.44.tar.gz
tar xzvf libmemcached-0.44.tar.gz
cd libmemcached-0.44
export CFLAGS="${CFLAGS} -march=i486" # without this, you'll probably run into undefined references to `__sync_fetch_and_add_4'
./configure --prefix=/usr
make
make install
# the biggest package is also the simplest!
yum install postgresql-server
yum install postgresql
# postgresql-devel provides the libs, pg_config script et al
yum install postgresql-devel
# libpqxx requires a build
# Find current links at http://pqxx.org/development/libpqxx/wiki/DownloadPage
wget http://pqxx.org/download/software/libpqxx/libpqxx-3.1.tar.gz
tar xzvf libpqxx-3.1.tar.gz
cd libpqxx-3.1
./configure --prefix=/usr
make
make install
cd ..
mkdir package
wget http://cr.yp.to/daemontools/daemontools-0.76.tar.gz
tar xzvf daemontools-0.76.tar.gz
cd admin/daemontools-0.76/
package/install
Here you may run into the following issue:
./load envdir unix.a byte.a
/usr/bin/ld: errno: TLS definition in /lib/libc.so.6 section .tbss mismatches non-TLS reference in envdir.o
/lib/libc.so.6: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [envdir] Error 1
Which, of course, you can fix by editing compile/error.h
Substitute extern int errno;
With #include <errno.h>
Rerun package/install – you should be fine now.
(What, no patch? Pardon the laziness once again; it’s a one-line fix, so you probably don’t need a patch for that.)
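If you would rather script the edit than open an editor, the substitution described above can be sketched as follows. The function name is made up for illustration, and it assumes the declaration sits on a line of its own, as it does in the error.h I edited.

```shell
#!/bin/sh
# Replaces the bare "extern int errno;" declaration in the file given as $1
# with "#include <errno.h>". A .orig backup is kept next to the file.
fix_errno_decl() {
    file=$1
    cp "$file" "$file.orig" &&
    sed 's/^extern int errno;$/#include <errno.h>/' "$file.orig" > "$file"
}
```

Run it as fix_errno_decl compile/error.h from the daemontools-0.76 directory, then rerun package/install.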
# Some required XML libs
yum install libxml2
yum install libxml2-devel
yum install libxslt
yum install libxslt-devel
# Now let’s get Erlang installed. We’ll need it for RabbitMQ Server
# First, we need curses
yum install ncurses-devel
# This will take a while…
wget http://www.erlang.org/download/otp_src_R14B_erts-5.8.1.1.tar.gz # source is over 57MBytes – HUGE!
tar xzvf otp_src_R14B_erts-5.8.1.1.tar.gz
cd otp_src_R14B
./configure --prefix=/usr
make
make install
# Get RabbitMQ Server from rpm package
wget http://www.rabbitmq.com/releases/rabbitmq-server/v2.1.0/rabbitmq-server-2.1.0-1.noarch.rpm
rpm --nodeps -i rabbitmq-server-2.1.0-1.noarch.rpm # unfortunately we need --nodeps because we installed Erlang from source
Head over to the Cassandra site, click on Download for the mirror list, and find the closest mirror link.
Copy the link location and, as usual:
wget CLOSESTMIRROR
tar xvzf apache-cassandra-0.6.5-bin.tar.gz
cd apache-cassandra-0.6.5
mkdir -p /var/log/cassandra
chown -R `whoami` /var/log/cassandra
mkdir -p /var/lib/cassandra
chown -R `whoami` /var/lib/cassandra
# Fire Cassandra up
./bin/cassandra -f
Pfeeew. Now we’re done with the pre-requisites.
Now, let’s follow the Reddit setup instructions to get an initial instance up and running.
I’ve run virtual Apache servers since 1997, and my usual setup is to have public files under /www/sites – You may choose any other location. Basically, this varies with every UNIX distribution and every system I’ve seen, so feel free to adopt your own strategy here.
cd r2
sudo python setup.py develop # look, ma, no hands!
Here you may run into trouble with lxml-2.3beta1.tar.gz. libxml2 installs its headers under /usr/include/libxml2, but C programs include them relative to that directory, as in #include <libxml/xmlversion.h>. Unless the build is clever enough to add -I/usr/include/libxml2 to the compiler flags, the header is not found and this step fails. lxml 2.3 happens to fail here on my system.
There’s probably more than one way of fixing this; I just went with the simplest.
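One possible way around it, which may or may not match what you end up doing on your system, is to hand the include path to the compiler through CFLAGS before rerunning setup.py. The sketch below assumes xml2-config (shipped with libxml2-devel) is installed and falls back to the path mentioned above when it is not. The function name is made up.

```shell
#!/bin/sh
# Prints compiler flags that put the libxml2 headers on the include path.
# Uses xml2-config when available, otherwise falls back to the usual
# /usr/include/libxml2 location (an assumption about your system).
libxml2_cflags() {
    if command -v xml2-config >/dev/null 2>&1; then
        xml2-config --cflags
    else
        echo "-I/usr/include/libxml2"
    fi
}
```

You would then retry with something like CFLAGS="$(libxml2_cflags)" python setup.py develop.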
Let’s move on. Now you’ll run into an issue with pycassa, that is because “pycassa has moved to http://github.com/pycassa/pycassa“. So, we’ll build this one by hand.
Back to /www/sites/r/reddit.com/r2 and we retry python setup.py install
We now find that it does not detect the newer pycassa. We must edit setup.py to replace the old URL by hand and then retry. Easy. Edit setup.py using your favorite editor (</tongueincheek>), search for pycassa and replace the old URL with the one above.
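For the editor-averse, the same substitution can be sketched as a small shell function. The function name is made up, and the old URL is deliberately left as an argument: copy it from your own setup.py rather than trusting anything hardcoded here. The replacement is literal, so dots and slashes in the URLs need no escaping.

```shell
#!/bin/sh
# Replaces every occurrence of the literal string $2 with $3 in file $1,
# keeping a .bak copy of the original.
replace_in_file() {
    file=$1
    old=$2
    new=$3
    [ -n "$old" ] || return 1
    cp "$file" "$file.bak" &&
    awk -v old="$old" -v new="$new" '{
        out = ""
        # Walk the line, swapping each literal match and moving past it.
        while ((i = index($0, old)) > 0) {
            out = out substr($0, 1, i - 1) new
            $0 = substr($0, i + length(old))
        }
        print out $0
    }' "$file.bak" > "$file"
}
```

Usage would be along the lines of replace_in_file setup.py "OLD_PYCASSA_URL" "http://github.com/pycassa/pycassa", where OLD_PYCASSA_URL stands for whatever link your setup.py currently contains.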
setup.py should now complete successfully. Now onto the almighty make – it should run without issues.
Create a postgres user if you don’t have one. Then we follow the Reddit instructions and create the user and set directory permissions accordingly.
# start and configure rabbitmq for the 1st time
/sbin/service rabbitmq-server start # start RabbitMQ queue service
rabbitmqctl add_vhost /
rabbitmqctl add_user reddit reddit
rabbitmqctl set_permissions -p / reddit ".*" ".*" ".*"
Now we need to work on the Cassandra configuration for the “permacache”. Follow the quick instructions from the “Set up Cassandra” section on the Reddit guide.
If all goes well, you should now be able to start Reddit.
paster serve --reload example.ini http_port=9090 (I used 9090 because Cassandra took 8080.)
The Reddit system is tested daily by millions of users. It’s a great software system for you to learn more about the RabbitMQ message queue server, Pylons, the amazing Cassandra distributed storage database and improve your overall web development foo, based on a tried and tested code base.
Open source gives us the opportunity to see the inner workings of systems which handle billions of requests daily, something that was completely out of reach for the student and neophyte back in the 1980s and early 90s. Reading the Reddit source and playing with it live is an opportunity to learn or improve your Python skills and to study enterprise-level architecture strategies that work.
Reading dead code is one thing. Reading the same code which is currently handling a billion requests somewhere on the WWW is an entirely different experience. It’s something which is always on my mind when I study the Apache sources: this code handles over 60% of all the HTTP requests out there every second. Having access to such code is of incalculable value.
I hope you have fun and learn tons from playing with the Reddit system.
You can grab the rpm for erlang from the EPEL repo.
Much faster than compiling it yourself.
http://download.fedora.redhat.com/pub/epel/5/i386/repoview/erlang.html
r12b is a bit old, but -5 is new enough to fulfill the rabbitmq reqs.
Any chance you could make one that uses an EBS-backed root device, so that people could spawn micro instances of this?
Looks like the AMI has been moved. It does not show up on the search and manually adding the ami path produces the following error:
HTTP 301 (Moved Permanently) response for URL http://zefonseca.s3.amazonaws.com:80/ami/reddit/image.manifest.xml: check your manifest path is correct and in the correct region.
We’ve had changes made at my workplace. I’ll ask about this and let you know. Thanks for noting.
Anyone having problems with the app establishing a Cassandra connection with the default setup? The application boots fine and everything, however no links can be submitted. Can’t figure out why this would be happening, especially if it worked for the OP.
When I built the Reddit AMI I had issues with Cassandra and it turned out I was using a newer Cassandra, when Reddit used to work with an older one.
The Cassandra schema creation method has changed between the version used by Reddit and the current version out there. In the earlier versions you’d describe keyspaces (IIRC) in a configuration file. Now you create keyspaces more like in a regular RDBMS, via the CLI.
Also I think the authentication has changed between versions. It’s been a while since I’ve messed with it, but I recall something to that effect. In fact, Cassandra is a rapidly evolving system, changing wildly from one version to the next, to the point that you need new glue between your app and Thrift with every new version.
Try reverting to the older version; the AMI worked just dandy and I had Reddit working upon booting up.
Thanks for the quick response, Zen. Are you saying that at the time you saved the AMI that all reddit functions were working, including submission? If that’s the case then I’m confused as to why the version of Cassandra you installed is not working for me on the stock AMI.
Hey Zen -
You’re awesome. I’m inspired. I’m also a total n00b with this stuff. Would you be willing to help as a gun for hire?
Here’s as far as I got: http://ec2-107-21-157-166.compute-1.amazonaws.com:9090/
Thanks for posting the fix for the ucspi build problem. I’d been looking all over the place for that.
It’s slightly ironic when what gums up the works is an error in error.h.