OpenCredo

Growing Online Marketplace Tackles Expanding Data Needs with Apache Cassandra

Kaidee is Thailand’s largest used goods online marketplace, with over 1.5 million items for sale and 7 million monthly visitors. As a result of continual innovation and evolution, they have grown significantly since their inception.

www.kaidee.com

THE CHALLENGE

Kaidee were looking for a solution that would scale seamlessly, support the continued expansion of their data requirements, and automate repetitive tasks. They were anticipating growth from 400 million to several billion records per day within the upcoming year. As a result of increased user activity, they needed to store an ever-growing volume of data, which could subsequently be accessed and analysed across the business. Their technology solution of choice was Apache Cassandra. Alongside improving their use of this technology, they also wanted their in-house teams to gain Cassandra skills. Aware of OpenCredo’s expertise with emerging technologies, Kaidee asked for our help in both optimising their Cassandra implementation and training their teams to support the solution.

THE SOLUTION

Discovery

We began our work with Kaidee with an investigation into their current data set-up and its scalability. Kaidee were already using Cassandra on a smaller scale, and some employees were familiar with the basics.

As things stood, Kaidee were using rolling window data storage to maintain a relatively stable data volume, discarding individual legacy records. As such, scaling out of the cluster had not been a requirement. Moving forward, Kaidee wanted to use Cassandra to gather more data and automate repetitive tasks, freeing up employees to work on more complex tasks, whilst also reducing errors in the underlying technical solution. There was a further requirement for high performance in terms of response times.
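The rolling window approach described above can be illustrated with a minimal in-memory sketch. The source does not say how Kaidee implemented it, so the class name, the retention parameter and the assumption that records arrive in timestamp order are all hypothetical; in Cassandra itself, a similar effect is often achieved by expiring rows with a time-to-live.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: int  # seconds since some epoch (illustrative unit)
    payload: str


class RollingWindowStore:
    """Hypothetical store that keeps only records inside a retention window."""

    def __init__(self, retention_seconds: int) -> None:
        self.retention_seconds = retention_seconds
        self._records = deque()  # oldest record on the left

    def append(self, record: Record) -> None:
        # Assumption: records arrive in non-decreasing timestamp order.
        self._records.append(record)
        self._expire(now=record.timestamp)

    def _expire(self, now: int) -> None:
        # Discard every record that has aged out of the window.
        cutoff = now - self.retention_seconds
        while self._records and self._records[0].timestamp <= cutoff:
            self._records.popleft()

    def __len__(self) -> int:
        return len(self._records)
```

Because old records are dropped as new ones arrive, the stored volume tracks the arrival rate within the window rather than the total history, which is why scaling out had not yet been needed.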

Cassandra is designed to record large amounts of data, is extremely fast and reliable, and has a dependable implementation of distributed counters. Kaidee’s CTO, Mark Hollow, had prior experience using Cassandra at scale, and identified two mandatory requirements:

Use Case identification: Clearly identify the use cases where Cassandra would add value, and also the ones it was unsuitable for.

Embedding Cassandra knowledge: Ensure Kaidee’s development and operations teams acquired deep knowledge and hands-on skills, to use Cassandra optimally and avoid common pitfalls.

Delivery

Use Case identification

Kaidee used Cassandra for a subset of their data storage requirements. We looked at use cases where Cassandra was the best fit, and others where a different storage solution (e.g. a graph database) would be more appropriate. From this and our discovery work, we established a primary use case: the need to record classified and display advert impressions, as events and counts. This data would be used for live scheduling of sellers’ paid promotions, and tailored delivery of promoted classifieds to buyers.
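To make "events and counts" concrete, here is a minimal in-memory sketch of recording each impression both as an individual event and as a running count. It is not Kaidee's data model: the class, the method names and the choice of grouping by advert and day are assumptions made for illustration. In Cassandra, the grouping key would correspond roughly to a partition key, and the running totals to counter columns.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


class ImpressionLog:
    """Hypothetical recorder for advert impressions as events and counts."""

    def __init__(self) -> None:
        # (advert_id, day) -> list of (timestamp, kind); assumed grouping key
        self.events = defaultdict(list)
        # (advert_id, day, kind) -> number of impressions
        self.counts = Counter()

    def record(self, advert_id: str, kind: str, when: datetime) -> None:
        # kind is "classified" or "display", following the use case above
        day = when.date().isoformat()
        self.events[(advert_id, day)].append((when, kind))
        self.counts[(advert_id, day, kind)] += 1

    def count(self, advert_id: str, day: str, kind: str) -> int:
        # Counter returns 0 for keys that were never incremented
        return self.counts[(advert_id, day, kind)]


log = ImpressionLog()
ts = datetime(2016, 1, 1, 9, 30, tzinfo=timezone.utc)
log.record("advert-1", "classified", ts)
log.record("advert-1", "classified", ts)
log.record("advert-1", "display", ts)
```

Keeping the raw events alongside the counts means the totals can serve fast lookups, such as live scheduling, while the events remain available for later analysis.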

Embedding Cassandra Knowledge

We ran a tailored five-day Skills Workshop with Kaidee’s Development and Operations teams. Whilst the broad format and content were agreed upfront, we allowed flexibility in order to respond to specific learning and business needs as they arose. The workshops were hands-on for the majority of the time, and covered a breadth of Cassandra-related skills:

Understanding System Internals:

Beginning with the basics, we explored Cassandra’s strengths and weaknesses, as well as its distributed nature, internal architecture and storage mechanisms. Cassandra is a Java application, and for Kaidee’s engineers (with little to no Java experience) it was important to cover some basics of the Java Virtual Machine. From this foundation, we were able to help Kaidee deepen their understanding of the Cassandra features of most value to them. We did this by devising a series of exercises designed to validate designs and diagnose operational problems.

Working with Cassandra:

By delving into Cassandra’s memory model, we showed Kaidee how to diagnose some of the most common Cassandra performance problems. We complemented this with hands-on teaching of the most common operational tasks, automation options and less common operational pitfalls.

Responding to Kaidee’s needs, we gave special attention to the following:

  • Manually setting up a multi-node cluster
  • Efficient data modelling
  • Configuring the application-side driver for Cassandra
  • Backing up and restoring data
  • Resizing the cluster
  • Replacing nodes
  • Repairing data for inter-node consistency

Furthermore, we dedicated considerable effort to ensuring Kaidee had the knowledge and confidence to use the tools shipped with Cassandra.

Mastering the Data Modelling Workflow:

Kaidee wanted their teams to be able to develop data models. To facilitate this, we ran through some general data modelling exercises, involving business stakeholders to arrive at solutions that served the business’ needs.

THE RESULT

As a result of our work, we were able to leave Kaidee’s Development and Operations teams with:

  • A breadth of practical Cassandra skills, covering features, limitations and implementation.
  • Confidence to expand Cassandra usage to more use cases related to the storage of critical data.
  • Ability to support a scalable production cluster and respond to Kaidee’s continual growth.