If you’ve downloaded and installed Cassandra 0.7.0, you may have noticed that neither Net::Cassandra nor Net::Cassandra::Easy works with it. The Thrift interface changed in 0.7.0, and the Perl modules available on CPAN still use the old method signatures.
Even if you generate your code from the brand new interface/cassandra.thrift interface file, you’ll discover that the Perl examples and code snippets available on the web no longer work. You won’t even get past the login phase, because the CassandraClient::login() method signature has changed as well.
So, let’s get Perl to talk to Cassandra 0.7.0.
First, is Cassandra 0.7.0 up and running?
This post assumes you have Cassandra 0.7.0 up and running on your machine. If not, please download it here. Remember, this post is about 0.7.0 – things have changed and are likely to do so again in future versions. Let’s stick with 0.7.0 for now.
All you have to do is download the tarball, extract it, change into the apache-cassandra-0.7.0 directory and run bin/cassandra -f. The examples below work with the default install.
If you happen to rename apache-cassandra-0.7.0, then please adapt path names accordingly throughout this text.
Thrift?
Thrift is a nifty little protocol that does, broadly, what SOAP and CORBA do: it’s RPC glue between distributed systems. Say you’ve got a do_something() server function written in Python that you’d like to call, from across the world, from a Perl app. Thrift lets you define a single interface for both sides, so that the Perl app can just call do_something(); the Python server receives the request and its reply travels back transparently through the inverse process.
Cassandra is written in Java, and Thrift allows Perl to talk to Cassandra locally or from across the Internet using sockets. Basically, that’s all we need to know about Thrift. Here are the nitty gritty details, if you have the time.
Binary Thrift Installation
If your favorite packaging solution hosts a Thrift binary package, by all means do try that. Yum doesn’t:
# yum search thrift
Loaded plugins: presto, refresh-packagekit
Warning: No matches found for: thrift
No Matches found
Installation from Source
First, download the latest Thrift tarball from here.
# tar xzvf thrift-0.5.0.tar.gz
# cd thrift-0.5.0
# ./configure --prefix=/usr
# make
# make install
configure may give you errors about missing language interpreters. You’ll need to install all requirements before proceeding.
Note on Ruby: the Ruby generator build core dumped during installation. Since I won’t be using the RB interface, and don’t have the time to debug that, I just substituted the Ruby Makefiles with dummy versions that always return success. Ugly hack, I know, but I just need the Perl generator to work. You probably can disable Ruby from the configure script as well.
Now that you have Thrift installed, change to the interface/ subdirectory under where you installed Cassandra.
cd /opt/apache-cassandra-0.7.0/interface/
Generate the Perl interface modules:
thrift --gen perl cassandra.thrift
If all goes well, you should now have a subdirectory called ./gen-perl containing three Perl modules. Don’t worry: all the Cassandra:: namespace packages are contained in those three files. Thrift just seems to like embedding a ton of packages into a single source file.
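Before moving on, a quick sanity check can confirm the generator produced what the client script expects. This is a minimal sketch: missing_modules is a helper name made up for this post, and the three file names are inferred from the use statements in the script later in this post, not from the Thrift documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Files the client script loads via "use Cassandra::..." (an inference
# from the script in this post, not an official list).
my @expected = qw(Cassandra/Cassandra.pm Cassandra/Constants.pm Cassandra/Types.pm);

# Hypothetical helper: returns the expected files that are missing under $dir.
sub missing_modules {
    my ($dir) = @_;
    my @missing;
    for my $rel (@expected) {
        my $path = File::Spec->catfile($dir, split(m{/}, $rel));
        push @missing, $rel unless -f $path;
    }
    return @missing;
}

# Adjust the path for your environment.
my $gen_dir = '/opt/apache-cassandra-0.7.0/interface/gen-perl';
my @missing = missing_modules($gen_dir);
print @missing
    ? "Missing under $gen_dir: @missing\n"
    : "All generated modules found under $gen_dir\n";
```

If anything is reported missing, re-run the thrift command from the interface/ directory and check its output for errors.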
We’re all set to run our tests now.
Don’t try to run the Thrift API examples from the Cassandra developer wiki. They’re meant for an older version of the Thrift interface and will not work with Cassandra 0.7.0 (and that’s the whole point of this post).
Create a Cassandra Keyspace and Column Family via the CLI
Keyspaces are roughly comparable to databases under MySQL, or schemas under Oracle, DB2 and PostgreSQL. A keyspace groups all our column families (roughly analogous to tables in an RDBMS, but only roughly). In the following steps, we’ll create a keyspace named test1 and a single column family (CF) named “people”. We’ll later add personalities to the people CF using our Perl script.
cd /opt/apache-cassandra-0.7.0/
bin/cassandra-cli
Welcome to cassandra CLI.
Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] connect localhost/9160;
Connected to: "ZenCluster" on localhost/9160
[default@unknown] create keyspace test1;
679d7c8e-273c-11e0-99e9-e700f669bcfc
[default@unknown] use test1;
Authenticated to keyspace: test1
[default@test1] create column family people;
9be2ae7f-273c-11e0-99e9-e700f669bcfc
If your output looks similar to the above, then you’ve successfully created a keyspace for your application (create keyspace). We then “authenticated” with the keyspace (use test1) without a password, because the default allow-all authenticator, org.apache.cassandra.auth.AllowAllAuthenticator, lets everyone in (not to be used in production!).
The “use test1;” syntax may seem familiar to MySQL users: with that command we’ve told Cassandra we’ll be working with the test1 keyspace. To ease the transition for beginners, you may consider a keyspace to be roughly analogous to a relational schema. That should help you remember the purpose of the use command.
Keep in mind that NoSQL stores are entirely different from relational databases, but these mnemonics may help you get over the initial learning curve.
Note that, since Cassandra 0.7.0, the keyspaces defined in conf/cassandra.yaml are ignored: you must define keyspaces and column families through the CLI. Also note that conf/storage-conf.xml has become conf/cassandra.yaml.
Back to Perl.
As mentioned before, if you ran any of the examples copied verbatim off the Cassandra wiki, you’ve probably found out by now that they don’t work against Cassandra 0.7.0. The example snippets were written for an older version of Cassandra, and the Thrift-generated Cassandra::Cassandra modules now expose a different interface. I kept hitting a wall for a while before I decided to dive into the generated code to find the correct interface.
You may also have run into an annoying transport error (“Transport is unable to read 4 bytes”, or something like that). Since I didn’t have time to investigate further, I tried the inverse of a recommendation given for a similar PHP issue: use FramedTransport instead of BufferedTransport. After that, it worked OK. This is consistent with the Cassandra 0.7 Thrift server using framed transport by default, so the client has to match.
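To see why the choice of transport matters, here is a minimal sketch of what framing means on the wire. Thrift’s framed transport prefixes every message with a 4-byte big-endian length, so a peer expecting frames that receives unframed data will probably fail while reading that header. The function names frame_message and unframe_message are made up for this illustration and are not part of the Thrift Perl library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap a payload in a frame: 4-byte big-endian length, then the bytes.
sub frame_message {
    my ($payload) = @_;
    return pack('N', length($payload)) . $payload;
}

# Split one frame off the front of a buffer. Returns (payload, rest),
# or nothing if the buffer does not yet hold a complete frame.
sub unframe_message {
    my ($buffer) = @_;
    return if length($buffer) < 4;                 # header incomplete
    my $len = unpack('N', substr($buffer, 0, 4));
    return if length($buffer) < 4 + $len;          # body incomplete
    return (substr($buffer, 4, $len), substr($buffer, 4 + $len));
}

my $frame = frame_message('hello');
# prints "frame is 9 bytes, header is 00000005"
printf "frame is %d bytes, header is %s\n", length($frame), unpack('H8', $frame);

my ($payload, $rest) = unframe_message($frame . 'trailing');
print "payload: $payload, left over: $rest\n";
```

You never need to do this by hand: Thrift::FramedTransport, used in the script below, takes care of the framing for you.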
#!/usr/bin/perl -w
use strict;
use warnings;
# Change for your environment
use lib '/opt/apache-cassandra-0.7.0/interface/gen-perl/';
use Cassandra::Cassandra;
use Cassandra::Constants;
use Cassandra::Types;
use Thrift;
use Thrift::BinaryProtocol;
use Thrift::Socket;
use Thrift::FramedTransport;
use Data::Dumper;
# localhost and 9160 are the defaults for the Thrift RPC listener (rpc_address and rpc_port in conf/cassandra.yaml)
my $socket = new Thrift::Socket('localhost', 9160);
my $transport = new Thrift::FramedTransport($socket,1024,1024);
my $protocol = new Thrift::BinaryProtocol($transport);
my $client = new Cassandra::CassandraClient($protocol);
eval {
$transport->open();
my $keyspace = 'test1';
my $row_key = 'people_code_1';
# ColumnParent tells the API the ColumnFamily or SuperColumn we're working on
my $column_parent = new Cassandra::ColumnParent({column_family => "people"});
my $consistency_level = Cassandra::ConsistencyLevel::ONE;
my $auth_request = new Cassandra::AuthenticationRequest();
# accessing object internals directly seems to be standard practice on the Thrift-generated code
$auth_request->{credentials} = { username => 'user', password => 'pass' };
$client->login($auth_request);
$client->set_keyspace($keyspace);
# Cassandra uses the timestamp to resolve conflicting writes to the same column
my $timestamp = time;
my $column = new Cassandra::Column();
$column->{name} = 'name';
$column->{value} = 'Jon Stewart';
$column->{timestamp} = $timestamp;
$client->insert($row_key, $column_parent, $column, $consistency_level);
$column->{name} = 'tv_show';
$column->{value} = 'The Daily Show';
$client->insert($row_key, $column_parent, $column, $consistency_level);
# -- INSERT ANOTHER TV PERSONALITY ---
$row_key = 'people_code_2'; # this is analogous to a primary key; you'll later look this person up by this key
$column->{name} = 'name';
$column->{value} = 'Stephen Colbert';
$column->{timestamp} = time;
$client->insert($row_key, $column_parent, $column, $consistency_level);
$column->{name} = 'tv_show';
$column->{value} = 'The Colbert Report';
$client->insert($row_key, $column_parent, $column, $consistency_level);
# -- LET's QUERY THE PEOPLE COLUMN FAMILY TO FIND OUT WHO WE HAVE ON FILE ---
my $slice_range = new Cassandra::SliceRange();
$slice_range->{start} = "";
$slice_range->{finish} = "";
my $predicate = new Cassandra::SlicePredicate();
$predicate->{slice_range} = $slice_range;
# let's load user with primary key = 'people_code_1'
my $result = $client->get_slice('people_code_1', $column_parent, $predicate, $consistency_level);
print "'people_code_1': " . Dumper($result) . "\n";
# now, let's load user with primary key = 'people_code_2'
$result = $client->get_slice('people_code_2', $column_parent, $predicate, $consistency_level);
print "'people_code_2': " . Dumper($result) . "\n";
# nice, eh?
$transport->close();
};
if ($@) {
warn(Dumper($@));
}
1;
Your output should look like this:
'people_code_1': $VAR1 = [
bless( {
'super_column' => undef,
'column' => bless( {
'timestamp' => '1295829074',
'ttl' => undef,
'value' => 'Jon Stewart',
'name' => 'name'
}, 'Cassandra::Column' )
}, 'Cassandra::ColumnOrSuperColumn' ),
bless( {
'super_column' => undef,
'column' => bless( {
'timestamp' => '1295829074',
'ttl' => undef,
'value' => 'The Daily Show',
'name' => 'tv_show'
}, 'Cassandra::Column' )
}, 'Cassandra::ColumnOrSuperColumn' )
];
'people_code_2': $VAR1 = [
bless( {
'super_column' => undef,
'column' => bless( {
'timestamp' => '1295829074',
'ttl' => undef,
'value' => 'Stephen Colbert',
'name' => 'name'
}, 'Cassandra::Column' )
}, 'Cassandra::ColumnOrSuperColumn' ),
bless( {
'super_column' => undef,
'column' => bless( {
'timestamp' => '1295829074',
'ttl' => undef,
'value' => 'The Colbert Report',
'name' => 'tv_show'
}, 'Cassandra::Column' )
}, 'Cassandra::ColumnOrSuperColumn' )
];
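The dump shows that get_slice() returns an array reference of Cassandra::ColumnOrSuperColumn objects, each wrapping a Cassandra::Column. As a minimal sketch, the helper below (columns_to_hash is a name made up for this post) flattens such a result into a plain hash. It runs against stand-in data shaped like the dump above, so no server is needed to try it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: flatten a get_slice() result into a name => value hash.
sub columns_to_hash {
    my ($result) = @_;
    my %row;
    for my $cosc (@$result) {
        my $column = $cosc->{column} or next;   # skip entries without a plain column
        $row{ $column->{name} } = $column->{value};
    }
    return \%row;
}

# Stand-in for the value returned by $client->get_slice(...), copied from the dump.
my $result = [
    bless({
        super_column => undef,
        column       => bless({
            name => 'name', value => 'Jon Stewart',
            timestamp => '1295829074', ttl => undef,
        }, 'Cassandra::Column'),
    }, 'Cassandra::ColumnOrSuperColumn'),
    bless({
        super_column => undef,
        column       => bless({
            name => 'tv_show', value => 'The Daily Show',
            timestamp => '1295829074', ttl => undef,
        }, 'Cassandra::Column'),
    }, 'Cassandra::ColumnOrSuperColumn'),
];

my $person = columns_to_hash($result);
# prints "Jon Stewart hosts The Daily Show"
print "$person->{name} hosts $person->{tv_show}\n";
```

In the real script you would pass the return value of get_slice() straight to the helper. Reading the hash keys directly mirrors how the generated code itself accesses object internals.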
Thanks for posting this! A colleague of mine has been pushing to use Cassandra in our next project, but the lack of a usable Perl client had me really worried. I’ve been searching the web for the last couple of days trying to get Perl to work with Cassandra 0.7.0. Your post has been a tremendous help, and just in time!
Sidney
Hi Sidney, thanks for your kind words. I hope your project using Perl and Cassandra is a success.
Best wishes,
Jose
Hello,
A colleague of mine and I are attempting a reconstruction of our database implementation, and are so far very impressed with Cassandra 0.7.0. Like many before me, I’m sure, that’s why I’ve found this tutorial.
I’m able to call out each individual variable from get_range_slices in PHP, but not Perl.
In PHP:
$paged_result[0]->key;
$paged_result[0]->columns[0]->column->name;
I am unable to reproduce anything similar in Perl, and the comments in the Thrift and Cassandra modules leave much to the imagination.
Other than all that, great article, it has helped tremendously. Thanks!
I retract my question!
To all those who have had a similar problem, it is solved rather simply (I am not a Perl developer):
$paged_result->[0]->{key};
etc.
Thanks anyway, and again, great article!
Bryan, thanks for your comments.
As the Perl modules are machine-generated by the Thrift compiler, they do not come with any embedded documentation. The only way I got through the first few steps of getting Perl and Cassandra to get along was to read the generated module sources.
Cassandra is maturing very quickly and I’m sure pretty soon we’ll see more Perl information available for it. For the time being, reading the Java examples and comparing to the Perl interface might give you clues, then peeking into the generated modules will provide the precise interfaces and return values. Good luck with your implementation.
Regards,
Ze
Has anyone been successful using Perl/Thrift to insert a SuperColumn or SuperColumnFamily?