December 2, 2013 | Cassandra
Perhaps the most important of Cassandra’s selling points is its completely distributed architecture and the ease with which a cluster can be extended with virtually any number of nodes. Implementing classical RDBMS-style transactions, which “put locks on the database, modify the data, then commit the transaction”, is simply not feasible in such an architecture because that approach does not scale well.
Dropping support for transactions is not a problem in many use cases. However, there are some scenarios in multi-threaded or multi-user environments where some form of mutually exclusive operation would be quite useful.
As we already mentioned in our previous post, Cassandra 2.0 introduces a new feature to CQL: optimistic-lock-style “lightweight transactions”, also known as “compare-and-set” DML operations. For example:
INSERT INTO USERS(userid, first_name, last_name) VALUES ('OpenCredo', 'Open', 'Credo') IF NOT EXISTS;
“The CQL language has been extended with the IF clause for INSERT and UPDATE commands, which lets the user invoke a data modification operation pending some condition specified in the IF part, and have a guaranteed isolation for the test and the modification: no other process can change the values while such a compare-and-set command runs.
The addition of lightweight transactions enables use cases that simply weren’t possible to implement safely in prior releases. This is achieved without compromising Cassandra’s ability to scale, as would be the case if traditional RDBMS-style transactions relying on locking were used.”
Cassandra documentation consistently refers to this feature as “Lightweight Transactions or (Compare and Set)”. It’s important to point out that even though this newly introduced mechanism covers certain well-known use cases for transactions and can be very useful, it still does not give you anything like the full power of an ACID transaction.
In our posts we are going to refer to them as compare-and-set operations, which much more accurately describes what these commands actually do: in a single, atomic operation they compare the value of a column in the database and apply a modification depending on the result of the comparison.
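The semantics described above can be illustrated with a small in-memory analogy in Java. This is only a sketch and not Cassandra code: the class `CompareAndSetSketch` and its methods are hypothetical names, and a `ConcurrentHashMap` stands in for a table keyed by `userid`. It shows the two shapes of the operation: write only if the row is absent, and write only if the current value matches an expected one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a table keyed by userid (not Cassandra code).
final class CompareAndSetSketch {
    private final ConcurrentMap<String, String> lastNameByUserId = new ConcurrentHashMap<>();

    // Analogue of INSERT ... IF NOT EXISTS: the write is applied only when the key is absent.
    // Returns true when the write was applied.
    boolean insertIfNotExists(String userId, String lastName) {
        return lastNameByUserId.putIfAbsent(userId, lastName) == null;
    }

    // Analogue of UPDATE ... IF last_name = expected: the check and the write
    // happen as one atomic step, so no other thread can change the value in between.
    boolean updateIf(String userId, String expected, String newValue) {
        return lastNameByUserId.replace(userId, expected, newValue);
    }

    // Plain read, used here only to inspect the result.
    String get(String userId) {
        return lastNameByUserId.get(userId);
    }
}
```

In both methods the boolean result tells the caller whether the condition held and the modification was applied, so a caller that loses a race can see that and react to it.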
Cassandra still does not support executing a sequence of arbitrary operations in an atomic and isolated way, as transactions in relational databases do. Using the term “compare-and-set” avoids any potential confusion with RDBMS transactions.
Even though Cassandra introduced the support for compare-and-set operations, there were several bugs in the initial release, such as
Most of these issues were fixed in the later (2.0.1, 2.0.2) server releases, but compare-and-set operations are still considered fresh additions to Cassandra, and we recommend some level of caution before you start using them.
You should at least
If you are interested in a more detailed analysis of Cassandra’s strange behaviour in certain cases, we would recommend reading through this article. Some of the issues it describes are specific to Cassandra 2.0’s new features (and some of these have already been fixed), while others have been around for a while.
In our posts we are going to use the Java driver for Cassandra, version 2.0.0-beta2. Being marked as “beta”, it might not be your first choice for running a production system.
Compare-and-set operations are very useful, but unfortunately they come with a price tag attached. Cassandra has historically been very fast at writing data, but a compare-and-set is not a simple write as before: there is a lot more going on in the background, and it requires more communication between the nodes of any non-trivial Cassandra cluster.
In addition to the overhead created by the Paxos protocol used to implement them, compare-and-set operations will always use at least a consistency level effectively equivalent to QUORUM (called SERIAL) when writing, even if you explicitly specify a lower level, e.g. ANY or ONE.
Setting the consistency level to ALL will make the write execute on all replicas if the condition is met, but the comparison itself is executed against a quorum of nodes. As a result, a write operation with ALL consistency level that fails to meet the specified check may not throw an Exception, even if some replica nodes are not accessible.
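To make the quorum figure concrete, here is a small arithmetic sketch in Java. It assumes the usual definition of a quorum as a strict majority of the replicas; the class and method names are hypothetical and are not part of any Cassandra API.

```java
// Hypothetical helper illustrating quorum arithmetic (not part of any Cassandra API).
final class QuorumSketch {
    // A quorum is a strict majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be at least 1");
        }
        return replicationFactor / 2 + 1;
    }

    // Number of replicas that can be unreachable while a quorum can still be contacted.
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }
}
```

With a replication factor of 3, for example, the comparison needs 2 replicas while a write at ALL needs all 3. With one replica down, the condition can therefore still be evaluated, and if it is not met there is no write left to fail.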
Enforcing the minimum SERIAL level of write consistency adds some performance overhead in itself, but in our brief tests a compare-and-set operation took measurably longer to execute even than a non-conditional QUORUM-level write.
In any case you need to carefully consider whether you want (and really need) to use compare-and-set, but at least now there is a choice to be made.
In the next articles we’re going to cover the conditional INSERT and UPDATE statements, showing how to use them via the Java driver and describing in detail what we think you can use them for.