New features in Cassandra 2.0 – More on Lightweight Transactions

David Borsos

December 2, 2013


Perhaps the most important of Cassandra's selling points is its fully distributed architecture and the ease with which a cluster can be extended with virtually any number of nodes. Implementing classical RDBMS-style transactions of the “put locks on the database, modify the data, then commit the transaction” kind is simply not feasible in such an architecture, because that approach doesn't scale well.

Dropping support for transactions is not a problem in many use cases. However, there are scenarios in multi-threaded or multi-user environments where some form of mutually exclusive operation would be quite useful.

Cassandra 2.0 Lightweight transactions (compare-and-set)

As we mentioned in our previous post, Cassandra 2.0 introduces a new feature to CQL: optimistic-lock-style “lightweight transactions”, or “compare-and-set” DML operations.

INSERT INTO USERS(userid, first_name, last_name) VALUES ('OpenCredo', 'Open', 'Credo') IF NOT EXISTS;

"The CQL language has been extended with the IF clause for INSERT and UPDATE commands, which lets the user invoke a data modification operation pending some condition specified in the IF part, and have a guaranteed isolation for the test and the modification: no other process can change the values while such a compare-and-set command runs.

The addition of lightweight transactions enables use cases that simply weren’t possible to implement safely in prior releases. This is achieved without compromising Cassandra’s ability to scale, as would be the case if traditional RDBMS-style transactions relying on locking were used."

Pro

Increased feature parity with relational databases
No need for any external service, tool or synchronization mechanism
Retains its good scalability characteristics

Con

Does not offer true ACID transactions
First release included major bugs, which may now be fixed
Operations become slower

About terminology

Cassandra documentation consistently refers to this feature as “Lightweight Transactions” (or “Compare and Set”). It's important to point out that even though this newly introduced mechanism covers certain well-known use cases for transactions and can be very useful, it still does not give you anything like the full power of an ACID transaction.

In our posts we are going to refer to them as compare-and-set operations, which much more accurately describes what these commands actually do: in a single, atomic operation, they compare the value of a column in the database and apply a modification depending on the result of the comparison.

However, Cassandra still does not support the execution of a sequence of arbitrary operations in an atomic and isolated way, like transactions in relational databases. Using the term “compare-and-set” avoids any potential confusion with RDBMS transactions.
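To make the compare-and-set idea concrete, here is a minimal in-memory analogy in Java. It is only a sketch: it does not talk to Cassandra, involves no Paxos, and the class and method names (`CompareAndSetSketch`, `insertIfNotExists`, `updateIf`) are hypothetical names chosen for illustration. It relies on `ConcurrentHashMap`, whose `putIfAbsent` and `replace` methods perform the check and the modification as one atomic step, which is roughly the behaviour the IF clause gives you for a single row.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory analogy only: no Cassandra, no Paxos, no replication.
final class CompareAndSetSketch {
    // Hypothetical stand-in for a table: userid -> last_name.
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    // Roughly analogous to INSERT ... IF NOT EXISTS.
    // Returns true only if no row existed and the new one was stored.
    boolean insertIfNotExists(String userId, String lastName) {
        return rows.putIfAbsent(userId, lastName) == null;
    }

    // Roughly analogous to UPDATE ... IF last_name = expected.
    // The comparison and the write happen as one atomic step.
    boolean updateIf(String userId, String expected, String updated) {
        return rows.replace(userId, expected, updated);
    }

    String get(String userId) {
        return rows.get(userId);
    }

    public static void main(String[] args) {
        CompareAndSetSketch users = new CompareAndSetSketch();
        System.out.println(users.insertIfNotExists("OpenCredo", "Credo")); // true
        System.out.println(users.insertIfNotExists("OpenCredo", "Other")); // false
        System.out.println(users.updateIf("OpenCredo", "Credo", "Ltd"));   // true
        System.out.println(users.updateIf("OpenCredo", "Credo", "Again")); // false
        System.out.println(users.get("OpenCredo"));                        // Ltd
    }
}
```

Note what the sketch cannot do: there is no way to group several of these calls into one atomic unit, which is exactly the difference from an RDBMS transaction described above.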

A short warning before we start

Even though Cassandra introduced support for compare-and-set operations, there were several bugs in the initial release, such as:

CAS does not always correctly replay inProgress rounds (fixed in 2.0.1)
CAS may return false but still commit the insert (fixed in 2.0.1)
Paxos replay of in progress update is incorrect (fixed in 2.0.1)

Most of these issues were fixed in the later (2.0.1, 2.0.2) server releases, but compare-and-set operations are still considered fresh additions to Cassandra, and we recommend some level of caution before you start using them.

You should at least:

Consider whether you really need to use them
Stay up to date with the latest bugs and releases of Cassandra
Be prepared to fix any data that might get corrupted because of implementation errors

If you are interested in a more detailed analysis of Cassandra's strange behaviour in certain cases, we recommend reading through this article. Some of the issues it describes are specific to Cassandra 2.0's new features (and some of these have already been fixed), while others have been around for a while.

In our posts we are going to use the Java driver for Cassandra, version 2.0.0-beta2. As it is marked “beta”, it might not be your first choice for running a production system.

Performance considerations and other surprises

Compare-and-set operations are very useful, but unfortunately they come with a price tag attached. Cassandra has historically been very fast at writing data, but a compare-and-set is no longer a simple write: there is a lot more going on in the background, and it requires more communication between the nodes of any non-trivial Cassandra cluster.

In addition to the overhead created by the Paxos protocol, compare-and-set operations will always use at least a consistency level effectively equivalent to QUORUM (called SERIAL) when writing, even if you explicitly specify a lower level such as ANY or ONE.

Setting consistency level to ALL will make the write execute on all replicas if the condition is met, but the comparison itself is executed against a QUORUM number of nodes. As a result, a write operation with ALL consistency level that fails to meet the specified check may not throw an Exception, even if some replica nodes are not accessible.
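As a rough illustration of why a failed condition check can succeed even when a replica is down, the sketch below computes how many replicas a quorum requires for a given replication factor, using the usual quorum definition of (replication factor / 2) + 1 with integer division. The class and method names (`QuorumSketch`, `quorum`, `toleratedFailures`) are hypothetical and exist only for this example; nothing here queries a real cluster.

```java
// Arithmetic sketch only: it does not inspect or contact a Cassandra cluster.
final class QuorumSketch {
    // Number of replicas that must respond for a quorum: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    // Number of replicas that can be unreachable while a quorum is still possible.
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        for (int rf : new int[] {1, 2, 3, 5}) {
            System.out.println("RF=" + rf
                    + " quorum=" + quorum(rf)
                    + " tolerates=" + toleratedFailures(rf));
        }
    }
}
```

With a replication factor of 3, for example, the comparison needs only 2 replicas, so it can complete with one replica unreachable, whereas a write at ALL needs all 3.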

Enforcing the minimum SERIAL level of write consistency adds some performance overhead in itself. Beyond that, in our brief tests a compare-and-set operation took measurably longer to execute than even a non-conditional QUORUM-level write.

In any case you need to carefully consider whether you want (and really need) to use compare-and-set, but at least now there is a choice to be made.

What's coming

In the next articles we're going to cover the conditional INSERT and UPDATE statements, showing how to use them via the Java driver and describing in detail what we think you can use them for.

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.
