October 24, 2017 | Data Engineering
Cockroach Labs, the creators of CockroachDB, are coming to London for the first time since their 1.0 GA release in May 2017. They will be taking time to talk about “The Hows & Whys of a Distributed SQL Database” at the Applied Data Engineering meetup, hosted and run by us here at OpenCredo.
We have been interested in CockroachDB for a while now, and have published our initial impressions of the release on our blog. We thought this would be the perfect time to do a bit of a Q&A before the event! I put a few questions to Raphael Poss, a core Software Engineer at Cockroach Labs.
Cockroach Labs: CockroachDB is a SQL database for building global cloud services. It falls under the category of NewSQL: relational databases that can scale horizontally. To understand the emergence of these NewSQL databases, it is important to take a step back and look at how the ways we use data have evolved over the last several decades. The “original” databases were relational systems with transactional guarantees to ensure data correctness. As technology evolved, however, particularly with the explosion of the internet, the amount of data needing to be stored and retrieved rose exponentially. By the early 2000s, traditional SQL databases were simply incapable of handling the scale that companies and organizations needed to keep up with the demands of web services.
The solution to this conundrum was NoSQL databases. Built to scale first and foremost, NoSQL databases were fast and gigantic, but sacrificed consistency, causing a whole new set of problems for businesses and individuals alike. NewSQL databases like CockroachDB offer the best of both worlds, promising the correctness of SQL and the scale of NoSQL. CockroachDB is a prime example of this NewSQL nexus and takes it a step further by being built from the ground up for cloud-native and hybrid deployments.
Cockroach Labs: To deal with the sheer volume of data today, businesses have been moving to distributed architectures in recent years. Adapting to this new approach is a major challenge companies currently face. The most widely used relational databases are difficult to scale out and are unable to handle the dynamic nature of cloud environments. NoSQL databases, on the other hand, have come of age alongside the cloud. While these NoSQL solutions can take advantage of the elastic scale cloud deployments offer, they do so at the cost of basic features such as ACID transactions and the consistency and correctness of SQL. CockroachDB is a cloud-native database: it is built specifically to work in the cloud, scales well, and does so without sacrificing the inherent advantages of a SQL database. It is the best of both worlds.
Cockroach Labs: Yes, we do support JOINs! We wanted CockroachDB to be as easy to use as possible, so we made the decision early on to support the Postgres wire protocol, which means it works out of the box with Postgres drivers and ORMs like Hibernate and SQLAlchemy. Our distributed SQL implementation makes it easy for us to support small and large use cases and for developers to scale their applications seamlessly. CockroachDB’s SQL coverage is ever expanding, with ACID transactions, secondary indexes, arrays, and foreign keys already supported. We’ve also been publicly documenting the progress of distributed JOINs and distributed query processing, with the most recent installment being “Local and Distributed Query Processing in CockroachDB”.
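As an aside from us at OpenCredo: the point about the Postgres wire protocol can be illustrated with an ordinary Postgres driver. The sketch below is ours, not Cockroach Labs', and rests on several assumptions: a local, insecure single-node cluster listening on port 26257 with the `root` user, a database we have called `shop`, and two hypothetical tables, `customers` and `orders`. It uses the third-party `psycopg2` driver, which is only imported when a query is actually run.

```python
# Minimal sketch: talking to CockroachDB through a standard Postgres driver.
# Assumptions (not from the interview): a local insecure cluster on port 26257,
# user "root", a database named "shop", and hypothetical tables
# customers(id, name) and orders(id, customer_id).


def build_dsn(host="localhost", port=26257, database="shop",
              user="root", sslmode="disable"):
    """Build a Postgres-style connection URL pointing at a CockroachDB node."""
    return f"postgresql://{user}@{host}:{port}/{database}?sslmode={sslmode}"


# A plain SQL JOIN, written exactly as it would be for Postgres.
JOIN_QUERY = """
SELECT c.name, count(o.id) AS order_count
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY c.name
"""


def fetch_order_counts(dsn):
    """Run the JOIN and return (name, order_count) rows.

    Needs a reachable cluster and the psycopg2 package, so the import is
    deferred until this function is called.
    """
    import psycopg2  # third-party Postgres driver

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(JOIN_QUERY)
            return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Only prints the connection URL; call fetch_order_counts(build_dsn())
    # once a cluster with the assumed tables is available.
    print(build_dsn())
```

Nothing in the code is CockroachDB-specific apart from the port number, which is the practical consequence of speaking the Postgres wire protocol.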
Cockroach Labs: We released CockroachDB 1.0 in May, which marked the introduction of CockroachDB for production-ready workloads. Earlier this month, we launched CockroachDB 1.1, which specifically helps teams reach production faster than ever. With our most recent release, we were excited to also share case studies from a few customers, including internet giant Baidu, which is using CockroachDB for an application that sees 50 million writes and 2 TB of data a day.
Now we are working on 1.2, which we expect to release in April. CockroachDB 1.2 will focus on enabling deployments for global data architectures, particularly for companies with strict data domiciling requirements and global customer bases. We’ll also continue investing in expanding our SQL coverage, including releasing JSONB support and SQL sequences. In pursuit of the constant goal of increased performance, users can expect a significant improvement in throughput, latencies, and predictability. You can find a more detailed roadmap on GitHub.
If you are in London and interested in hearing more of what Raphael Poss has to say, please do sign up for the Applied Data Engineering Meetup.