Showing all posts by David Borsos
June 15, 2017 | Data Engineering
CockroachDB is a distributed SQL (“NewSQL”) database developed by Cockroach Labs and has recently reached a major milestone: the first production-ready, 1.0 release. We at OpenCredo have been following the progress of CockroachDB for a while, and we think it’s a technology of great potential to become the go-to solution for a having a general-purpose database in the cloud.
My recent blogpost I explored a few cases where using Cassandra and Spark together can be useful. My focus was on the functional behaviour of such a stack and what you need to do as a developer to interact with it. However, it did not describe any details about the infrastructure setup that is capable of running such Spark code or any deployment considerations. In this post, I will explore this in more detail and show some practical advice in how to deploy Spark and Apache Cassandra.
In recent years, Cassandra has become one of the most widely used NoSQL databases: many of our clients use Cassandra for a variety of different purposes. This is no accident as it is a great datastore with nice scalability and performance characteristics.
However, adopting Cassandra as a single, one size fits all database has several downsides. The partitioned/distributed data storage model makes it difficult (and often very inefficient) to do certain types of queries or data analytics that are much more straightforward in a relational database.
Google has recently made its internal Spanner database available to the wider public, as a hosted solution on Google Cloud. This is a distributed relational/transactional database used inside for various Google projects (including F1, the advertising backend), promising high throughput, low latency and 99.999% availability. As such it is an interesting alternative to many open source or other hosted solutions. This whitepaper gives a good theoretical introduction into Spanner.
February 16, 2017 | Cassandra
One of the default Cassandra strategies to deal with more sophisticated queries is to create CQL tables that contain the data in a structure that matches the query itself (denormalization). Cassandra 3.0 introduces a new CQL feature, Materialized Views which captures this concept as a first-class construct.
March 2, 2016 | Microservices
Microservice-style software architectures have many benefits: loose coupling, independent scalability, localised failures, facilitating the usage of polyglot data persistence tools or multiple programming languages.
However, they also introduce other challenges. A major one is the fact that the end-user functionality of the system will ultimately emerge as a composition of multiple services. This significantly increases the complexity of deploying the system. In addition, because we lose the concept of “versions” of the system, it becomes harder to answer questions like “what capabilities are in production?” and “when is a new feature considered ‘done’?”.
Last year some of us attended the London Spring eXchange where we encountered a new and interesting tool that Pivotal was working on: Spring Boot. Since then we had the opportunity to see what it’s capable of in a live project and we were deeply impressed.
February 17, 2014 | Cassandra
In our previous posts we gave an overview of Cassandra’s new compare-and-set (lightweight transaction) commands and a more detailed look into the API for using them when inserting new rows into the database.
In this third post, we are going to cover update statements. We recommend reading the previous posts, as there are some details which are the same for inserts and updates which are not repeated here.
January 6, 2014 | Cassandra
The team over at Cucumber Pro recently posted a sneak peek on their blog, demonstrating some key features of their offering.
As more of a technical user of Cucumber, there isn’t much that’s new or ground-breaking for me – almost every feature is already available through your preferred IDE combined with a few plugins.
December 2, 2013 | Cassandra
Perhaps the most important of Cassandra’s selling points is its completely distributed architecture and its ability to easily extend the cluster with virtually any number of nodes. Implementing a classical RDBMS-style transaction consisting of “put locks on the database, modify the data, then commit the transaction”-style operations are simply not feasible in such an architecture (i.e. that doesn’t scale well).
November 14, 2013 | Cassandra
Cassandra 2.0 was released in early September this year and came with some interesting new features, including “lightweight transactions” and triggers.
Despite the rising interest in the various non-relational databases in recent years, there are still numerous use-cases for which a relational database system is a better choice. The latest major release of Cassandra (version 2.0) provides some interesting features that aim to close this gap, and offers its fast and distributed storage engine enhanced with new options that will make users’ lives easier.