Kaidee is Thailand’s largest online marketplace for used goods, with over 1.5 million items for sale and 7 million monthly visitors. Through continual innovation, they have grown significantly since their inception.
Kaidee needed a solution that would scale, support continued expansion of their data requirements, and automate repetitive tasks.
They anticipated growth from 400 million to several billion records per day within the coming year. Increased user activity meant storing an ever-growing volume of data that could then be accessed and analysed across the business.
Their technology of choice was Apache Cassandra. As well as improving their use of it, they wanted their in-house teams to gain Cassandra skills. Aware of OpenCredo’s expertise with emerging technologies, Kaidee asked for our help in both optimising their Cassandra implementation and training their teams to support the solution.
The partnership incorporated both Discovery and Delivery phases.
DISCOVERY
We began by investigating Kaidee’s current data set-up and its scalability. Kaidee were already using Cassandra on a smaller scale, and some employees were familiar with the basics.
At the time, Kaidee used rolling window data storage, discarding older records to keep data volume relatively stable, so scaling out the cluster had not been necessary. Going forward, Kaidee wanted to use Cassandra to gather more data and to automate repetitive tasks, freeing employees for more complex work and reducing errors in the underlying technical solution. Fast response times were a further requirement.
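A rolling window of this kind can be pictured with a minimal Python sketch. This is not Kaidee’s implementation: the class name, the 30-day retention period and the sample records are assumptions made for illustration. In Cassandra itself, per-record expiry is usually handled with time-to-live (TTL) settings rather than application code.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class RollingWindow:
    """Keeps only records newer than a fixed retention period (hypothetical helper)."""

    def __init__(self, retention):
        self.retention = retention
        self._records = deque()  # (timestamp, payload) pairs, oldest first

    def add(self, timestamp, payload):
        # Assumes records arrive in timestamp order.
        self._records.append((timestamp, payload))
        self.expire(now=timestamp)

    def expire(self, now):
        # Drop every record strictly older than the cutoff.
        cutoff = now - self.retention
        while self._records and self._records[0][0] < cutoff:
            self._records.popleft()

    def __len__(self):
        return len(self._records)


start = datetime(2020, 1, 1, tzinfo=timezone.utc)  # arbitrary example date
window = RollingWindow(retention=timedelta(days=30))  # assumed retention
window.add(start, "record-a")
window.add(start + timedelta(days=20), "record-b")
window.add(start + timedelta(days=45), "record-c")  # pushes record-a out
remaining = len(window)
```

Because old records leave as new ones arrive, the stored volume tracks recent activity rather than total history, which is why the cluster had not needed to grow.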
Cassandra is designed to record large amounts of data, is extremely fast and reliable, and has a dependable implementation of distributed counters. Kaidee’s CTO, Mark Hollow, had prior experience using Cassandra at scale, and identified two mandatory requirements:
DELIVERY
Use Case identification
Kaidee used Cassandra for a subset of their data storage requirements. We looked at use cases where Cassandra was the best fit, and others where a different storage solution (e.g. a graph database) would be more appropriate.
From this and our discovery work, we established a primary use case: the need to record classified and display advert impressions, as events and counts. This data would be used for live scheduling of sellers’ paid promotions, and tailored delivery of promoted classifieds to buyers.
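To make the distinction between events and counts concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code and not Kaidee’s data model; the names (`ImpressionStore`, `advert_id`, `kind`) and the per-day bucketing are assumptions. It shows the two shapes of data the use case implies: a log of individual impressions grouped by advert and day, and a running count per advert and day that can be read without scanning the events.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Impression:
    advert_id: str
    kind: str  # hypothetical values: "classified" or "display"
    seen_at: datetime


class ImpressionStore:
    """In-memory stand-in for an event table plus a counter table."""

    def __init__(self):
        # (advert_id, day) -> list of events, similar to a time-bucketed partition
        self.events = defaultdict(list)
        # (advert_id, day) -> running total, similar to a counter column
        self.counts = defaultdict(int)

    def record(self, impression):
        key = (impression.advert_id, impression.seen_at.date())
        self.events[key].append(impression)
        self.counts[key] += 1

    def count(self, advert_id, day):
        # .get avoids creating an empty entry for adverts never seen
        return self.counts.get((advert_id, day), 0)


store = ImpressionStore()
when = datetime(2020, 1, 1, 9, 30, tzinfo=timezone.utc)  # arbitrary example time
store.record(Impression("ad-1", "classified", when))
store.record(Impression("ad-1", "classified", when.replace(hour=10)))
store.record(Impression("ad-2", "display", when))

ad1_count = store.count("ad-1", when.date())
ad3_count = store.count("ad-3", when.date())
```

Keeping the count alongside the events lets a scheduler check how many impressions a promotion has received so far with a single lookup, while the raw events remain available for later analysis.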
Embedding Cassandra Knowledge
We ran a tailored five-day skills workshop with Kaidee’s Development and Operations teams. Whilst the broad format and content were agreed upfront, we allowed flexibility in order to respond to specific learning and business needs as they arose.
The workshops were hands-on for the majority of the time, and covered a breadth of Cassandra-related skills:
Understanding System Internals:
Beginning with the basics, we explored Cassandra’s strengths and weaknesses, as well as its distributed nature, internal architecture and storage mechanisms. Cassandra is a Java application, and for Kaidee’s engineers (with little to no Java experience) it was important to cover some basics of the Java Virtual Machine.
From this foundation, we were able to help Kaidee deepen their understanding of the Cassandra features of most value to them. We did this by devising a series of exercises designed to validate designs and diagnose operational problems.
Working with Cassandra:
By delving into Cassandra’s memory model, we showed Kaidee how to diagnose some of the most common Cassandra performance problems. We complemented this with hands-on teaching of the most common operational tasks, automation options and less common operational pitfalls.
Responding to Kaidee’s needs, we gave special attention to:
We dedicated considerable effort to ensuring Kaidee had the knowledge and confidence to use the tools shipped with Cassandra.
Mastering the Data Modelling Workflow:
Kaidee wanted their teams to be able to develop data models. To facilitate this, we ran through some general data modelling exercises, involving business stakeholders to arrive at solutions that served the business’ needs.
As a result of our partnership, we were able to leave the Kaidee team with: