CockroachDB is a distributed SQL (“NewSQL”) database developed by Cockroach Labs and has recently reached a major milestone: the first production-ready, 1.0 release. We at OpenCredo have been following the progress of CockroachDB for a while, and we think it’s a technology of great potential to become the go-to solution for a having a general-purpose database in the cloud.
My recent blogpost I explored a few cases where using Cassandra and Spark together can be useful. My focus was on the functional behaviour of such a stack and what you need to do as a developer to interact with it. However, it did not describe any details about the infrastructure setup that is capable of running such Spark code or any deployment considerations. In this post, I will explore this in more detail and show some practical advice in how to deploy Spark and Apache Cassandra.
In recent years, Cassandra has become one of the most widely used NoSQL databases: many of our clients use Cassandra for a variety of different purposes. This is no accident as it is a great datastore with nice scalability and performance characteristics.
However, adopting Cassandra as a single, one size fits all database has several downsides. The partitioned/distributed data storage model makes it difficult (and often very inefficient) to do certain types of queries or data analytics that are much more straightforward in a relational database.
Google has recently made its internal Spanner database available to the wider public, as a hosted solution on Google Cloud. This is a distributed relational/transactional database used inside for various Google projects (including F1, the advertising backend), promising high throughput, low latency and 99.999% availability. As such it is an interesting alternative to many open source or other hosted solutions. This whitepaper gives a good theoretical introduction into Spanner.
One of the default Cassandra strategies to deal with more sophisticated queries is to create CQL tables that contain the data in a structure that matches the query itself (denormalization). Cassandra 3.0 introduces a new CQL feature, Materialized Views which captures this concept as a first-class construct.
Microservice-style software architectures have many benefits: loose coupling, independent scalability, localised failures, facilitating the usage of polyglot data persistence tools or multiple programming languages.
However, they also introduce other challenges. A major one is the fact that the end-user functionality of the system will ultimately emerge as a composition of multiple services. This significantly increases the complexity of deploying the system. In addition, because we lose the concept of “versions” of the system, it becomes harder to answer questions like “what capabilities are in production?” and “when is a new feature considered ‘done’?”.
Last year some of us attended the London Spring eXchange where we encountered a new and interesting tool that Pivotal was working on: Spring Boot. Since then we had the opportunity to see what it’s capable of in a live project and we were deeply impressed.
Perhaps the most important of Cassandra’s selling points is its completely distributed architecture and its ability to easily extend the cluster with virtually any number of nodes. Implementing a classical RDBMS-style transaction consisting of “put locks on the database, modify the data, then commit the transaction”-style operations are simply not feasible in such an architecture (i.e. that doesn’t scale well).
Cassandra 2.0 was released in early September this year and came with some interesting new features, including “lightweight transactions” and triggers.
Despite the rising interest in the various non-relational databases in recent years, there are still numerous use-cases for which a relational database system is a better choice. The latest major release of Cassandra (version 2.0) provides some interesting features that aim to close this gap, and offers its fast and distributed storage engine enhanced with new options that will make users’ lives easier.
Strictly Necessary Cookies
Strictly Necessary Cookie should be enabled at all times so that we can save your preferences for cookie settings.
If you disable this cookie, we will not be able to save your preferences. This means that every time you visit this website you will need to enable or disable cookies again.