Open Credo

January 6, 2014 | Cassandra

New features in Cassandra 2.0 – Lightweight Transactions on Insert

The team over at Cucumber Pro recently posted a sneak peek on their blog, demonstrating some key features of their offering.

As more of a technical user of Cucumber, there isn’t much that’s new or ground-breaking for me – almost every feature is already available through your preferred IDE combined with a few plugins.

WRITTEN BY

David Borsos

David Borsos

New features in Cassandra 2.0 – Lightweight Transactions on Insert

Cassandra 2.0 transactions

In our last post we gave a brief overview of Cassandra 2.0’s new compare-and-set (lightweight transaction) commands. This time we are going to give a more detailed walkthrough about how conditional INSERT statements work.

Scenario

Perhaps the most common situation when such a feature would be required is the registration of new users to an application. Usually you will have some kind of a form where the soon-to-be users can choose a userId, a password and supply some basic information, such as their names. For obvious reasons you’ll want to keep userIds unique across the system.

Before compare-and-set

The naive approach to insert a user in a pre-2.0 Cassandra environment would look something like this:

public void registerUser(User user, String password) {
    PreparedStatement insertStmt = session.prepare(
        "INSERT INTO USERS (userid, first_name, last_name, " +
        "password, registered_at) " +
        "VALUES (?, ?, ?, ?, ?)");

    Statement stmt = insertStmt.bind(
        user.getUserId(),
        user.getFirstName(),
        user.getLastName(),
        password,
        new Date());

    session.execute(stmt);
}

Let’s try it:

registerUser(new User("jdoe86", "John", "Doe"), "johnspassword");

We then query the database:

select * from users;

userid  | first_name | last_name | password      | registered_at
--------+------------+-----------+---------------+--------------------------
jdoe86  | John       | Doe       | johnspassword | 2013-11-11 17:51:34+0000

(1 rows)

So far so good. Let’s add another user:

registerUser(new User("jdoe86", "Jane", "Doe"), "janespassword");

And if we query again:

select * from users;

userid  | first_name | last_name | password      | registered_at
--------+------------+-----------+---------------+--------------------------
jdoe86  | Jane       | Doe       | janespassword | 2013-11-11 17:53:20+0000

(1 rows)

Ooops… What happened to John?

Since the users table is defined with userId as the rowkey and considering that Cassandra makes no distinction between INSERT and UPDATE, using the same userIds will be considered as a write operation on the same row and hence will overwrite the previous record.

One might consider trying to do a lookup with the same userId prior to adding the new data, but since there is no correlation between the read and the subsequent write (you may even end up executing them against different nodes of your cluster), it is possible that operations from another process would interleave with these commands and change the data meanwhile. Or, since Cassandra is only eventually consistent, we might not yet see changes that already have been made.

The bottom line is that even if we read that there are no users called “jdoe86” and happily proceed with the INSERT, we still might end up with overwriting existing data.

Using compare-and-set operations

Here is where the new compare-and-set feature comes into the picture. Let’s create a second version of the registerUser method:

public void checkAndRegisterUser(User user, String password) {
    PreparedStatement insertStmt = session.prepare(
        "INSERT INTO USERS (userid, first_name, last_name, " +
        "password, registered_at) " +
        "VALUES (?, ?, ?, ?, ?) " +
        "IF NOT EXISTS");

    Statement stmt = insertStmt.bind(
        user.getUserId(),
        user.getFirstName(),
        user.getLastName(),
        password,
        new Date());

    session.execute(stmt);
}

Let’s truncate the users table to start in a clean state, and try adding the same users again, but using the improved code:

checkAndRegisterUser(new User("jdoe86", "John", "Doe"), "johnspassword");
checkAndRegisterUser(new User("jdoe86", "Jane", "Doe"), "janespassword");

And after querying the table:

select * from users;

userid  | first_name | last_name | password      | registered_at
--------+------------+-----------+---------------+--------------------------
jdoe86  | John       | Doe       | johnspassword | 2013-11-11 18:05:53+0000

(1 rows)

That is pretty much what you would expect to happen, if we already have a user with userId “jdoe86” then we don’t attempt to insert again. However, there is one piece of functionality still to be added.

Determining Success

If we were using an RDBMS with either a unique/primary key constraint on the userId column or read-and-write in a transaction, either an exception would be thrown or we would be able to see what is in the database. Either way we would be able to somehow signal the second user trying to insert that something has gone wrong. Fortunately it is possible to do so with Cassandra, too.

You probably have noticed that in Cassandra’s Java driver there’s no distinction between the read and the modification operations. In both cases you simply invoke:

session.execute();

This operation always returns with a com.datastax.driver.core.ResultSet object. In case of a “normal” update/insert command even though the returned ResultSet is not-null, it’ll contain no data.

A compare-and-set style insert is different. Let’s slightly modify the code of checkAndRegisterUser by adding the following lines, and printRow which is a custom method that prints column headers and row values to stdout:

ResultSet rs = session.execute(stmt);
Row row = rs.one();
printRow(row);

When executed again on a truncated table, the output will be:

[applied]
true
[applied] userid first_name last_name password registered_at
false jdoe86 John Doe johnspassword Tue Nov 12 09:17:12 GMT 2013

Notice the following:

  • Instead of an empty ResultSet, now we get one with a single row, every time
  • There is a column labelled [applied] that is not part of the original table
  • When the insert was unsuccessful, because the IF condition was evaluated false, we actually get back the row that was preventing the insert from happening

Using this information we can now enhance the registerUser method further:

public void checkAndRegisterUserWithFeedback(User user, String password) throws UserAlreadyExistsException {
    PreparedStatement insertStmt = session.prepare(
        "INSERT INTO USERS (userid, first_name, last_name, " +
        "password, registered_at) " +
        "VALUES (?, ?, ?, ?, ?) " +
        "IF NOT EXISTS");

    Statement stmt = insertStmt.bind(
        user.getUserId(),
        user.getFirstName(),
        user.getLastName(),
        password,
        new Date());

    ResultSet rs = session.execute(stmt);
    Row row = rs.one();
    if (row.getBool("[applied]")) {
        log("Successfully added user " + user);
    } else {
        throw new UserAlreadyExistsException(
            "This userid has already been registered at " +
        row.getDate("registered_at") + " by " +
        row.getString("first_name") + " " +
        row.getString("last_name"));
    }
}

Let’s make use of the new, improved method:

try {
    cas.checkAndRegisterUserWithFeedback(
        new User("jdoe86", "John", "Doe"), "johnspassword");
    cas.checkAndRegisterUserWithFeedback(
        new User("jdoe86", "Jane", "Doe"), "janespassword");
} catch (UserAlreadyExistsException ex) {
    log(ex.getMessage());
}

When executed against a truncated table, the output shows the second insert failing:

2013-11-12 09:49:00:000 - Successfully added user [jdoe86, John, Doe]
2013-11-12 09:49:00:000 - This userid has already been registered at Tue Nov 12 09:49:00 GMT 2013 by John Doe

Next time we are going to cover the update command.

RETURN TO BLOG

SHARE

Twitter LinkedIn Facebook Email

SIMILAR POSTS

Blog