Kaidee is Thailand’s largest consumer-to-consumer online marketplace, with over 1.5 million items for sale and 650,000 daily visitors. They have chosen Apache Cassandra to store the vast number of events that are generated by user activity and are crucial to analytics across several operational domains. Kaidee were looking for a solution that would scale seamlessly and keep up with the continuing growth of their business and data requirements. They expect a smooth transition from 400 million records a day to several billion a day in the following year.
Current Use Case: Storing Events and Counts
Kaidee are using Cassandra to record classified and display ad impressions as events and counts. This data is used for live scheduling of sellers’ paid promotions, delivering unique views of promoted classifieds to buyers. This is an excellent use case for Cassandra, which is designed to record large amounts of data: it is extremely fast and reliable at writing, and its newer versions have a dependable implementation of distributed counters. Response times are subject to performance requirements, whereas consistency can be traded off, because the platform can slightly over-deliver if necessary. On top of that, Kaidee use rolling window analysis on saved data and periodically discard old records, so data volume is relatively stable and scaling out the cluster has not been needed so far. All of the above has made using Cassandra easy and risk-free, even without extensive prior experience.
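As a rough illustration of the rolling-window idea described above, the following Python sketch keeps per-day impression counts in memory and drops the days that fall outside the window. It does not talk to Cassandra and is not Kaidee's implementation; the class name `RollingImpressionCounts`, the ad identifiers and the seven-day window are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


class RollingImpressionCounts:
    """In-memory stand-in for per-day impression counters (hypothetical)."""

    def __init__(self, window_days=7):
        self.window_days = window_days
        # Maps day -> ad_id -> number of impressions recorded on that day.
        self._counts = defaultdict(lambda: defaultdict(int))

    def _cutoff(self, today):
        """First day that still belongs to the window ending today."""
        return today - timedelta(days=self.window_days - 1)

    def record(self, ad_id, day):
        """Record one impression of ad_id on the given day."""
        self._counts[day][ad_id] += 1

    def discard_old(self, today):
        """Drop every day that falls outside the rolling window."""
        cutoff = self._cutoff(today)
        for day in [d for d in self._counts if d < cutoff]:
            del self._counts[day]

    def total(self, ad_id, today):
        """Sum impressions of ad_id over the window ending today."""
        cutoff = self._cutoff(today)
        return sum(
            per_ad.get(ad_id, 0)
            for day, per_ad in self._counts.items()
            if cutoff <= day <= today
        )
```

Because old days are removed as new ones arrive, the amount of retained data stays roughly constant, which is the property that has kept the data volume stable here.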
Along with its current use cases, Kaidee would like to use Cassandra for additional workflows that involve gathering data for automatic decision making, which is currently performed by dedicated staff members. While the prospect of automating repetitive tasks and freeing up employees to work on more complex tasks is promising, it also leaves less room for error in the underlying technical solution.
“This engagement let OpenCredo deliver immediate value to the client in a very short time.”
“Our clients often acquire a deeper understanding of Cassandra the hard way – through fighting serious issues in production environments. We were genuinely happy to help Kaidee avoid this painful experience by preparing their teams for real-life scenarios in advance,” said Tareq Abedrabbo, OpenCredo CEO. “As a result of our workshop, the team has enough knowledge to avoid most common pitfalls, and if things do go wrong, they are also equipped with an arsenal of operational and problem-solving skills.”
The Goal: Cassandra the Right Way
Kaidee CTO, Mark Hollow, had prior experience using Cassandra at scale and identified two mandatory prerequisites to expanding its use at Kaidee. First and foremost, it was important to be able to clearly distinguish the use cases in which Cassandra would add value from the ones for which it was not suitable. Secondly, Mark wanted his development and operations teams to have knowledge and skills solid enough to avoid the common pitfalls and use Cassandra the right way from the beginning. Kaidee turned to OpenCredo for help with preparing the teams for practical work with Cassandra.
Kaidee were already using Cassandra successfully on a smaller scale and had several employees familiar with the basics of the database. Management was eager to leverage Cassandra to boost the efficiency of business processes through automation. However, this was approached with a reasonable amount of caution. For a growing business to rely on the new solution, it has to be stable and predictable from the very beginning, both technically and operationally. Kaidee had to ensure that the engineering teams possessed deep knowledge of Cassandra and were prepared to apply it in everyday work. This could not be achieved efficiently through individual learning or classroom-style training.
The Solution: A Tailored All-Inclusive Skills Workshop
Kaidee opted for an intensive five-day workshop tailored to their team and use cases. Although the contents of the workshop were agreed upon in advance, their order and depth of coverage were continuously adjusted to the knowledge and pace of the participants. By working through hands-on sessions and live demos at every stage of the workshop, we achieved a high average level of fundamental knowledge and skills across the entire team. Regardless of their role, both development and operations teams now possess a full set of Cassandra skills – from manually setting up a multi-node cluster to efficient data modelling and configuring the application-side driver for Cassandra.
Tailoring the workshop to the specific needs of Kaidee's teams was of paramount importance. On the one hand, the speed of comprehension among participants was above average and allowed for a faster-than-expected pace, which required adjusting the daily topic coverage. On the other hand, it became necessary to go into a higher-than-expected level of detail on several topics, as the team asked questions which unfolded into discussions of possible solutions. This flexibility allowed OpenCredo to transfer a vast amount of practical experience to the team in just one week. While highly customised, the coaching workshop was largely focused on the following topics.
Understanding System Internals
To truly understand the strengths and weaknesses of Cassandra, it is necessary to understand its distributed nature, internal architecture and storage mechanism. We started learning Cassandra from the ground up by looking at its basic operations together with how they work behind the scenes. This proved invaluable to the team, who as a result were able to reason successfully about efficient and inefficient ways to use Cassandra from day one. We explored the internal implementation of the features most important to Kaidee in greater detail through a series of exercises aimed at teaching the team how to validate their designs and diagnose operational problems.
Expert Approach to Working With Cassandra
Among the main goals of this engagement was to teach Kaidee's teams to work with Cassandra independently and to eliminate their need for external support. We dedicated considerable effort to ensuring Kaidee had the knowledge and confidence to use the tools shipped with Cassandra to understand its inner workings as well as to diagnose, investigate and solve various problems.
We looked at Cassandra as an application written in Java and discussed its memory model, a basic understanding of which helps diagnose some of the most common Cassandra performance problems. For engineers who use little to no Java, it was important to cover some basics of the Java Virtual Machine for a better understanding of Cassandra internals. Finally, we covered common operational tasks, their implementation and automation options, and lesser-known operational pitfalls.
Crucial operations such as backing up and restoring data, resizing the cluster, replacing nodes and repairing data for inter-node consistency were given special attention. Following a series of hands-on exercises, the team is ready to operationally support a growing production cluster.
Mastering the Data Modelling Workflow
Kaidee currently store mostly time series data, events and counters, and wanted the team to have practical data modelling skills to develop models for the new use cases. In preparation for solving Kaidee-specific use cases, we applied the knowledge about Cassandra internals from the early days of the workshop to general data modelling exercises, identifying the various tradeoffs and learning how to mitigate them. The exercises included working closely with business stakeholders to jointly come up with feasible, scalable and sustainable solutions.
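One tradeoff that commonly comes up when modelling time series data in Cassandra is partition size: keeping every event for an item under a single partition key lets that partition grow without bound, while adding a time bucket to the key keeps partitions bounded at the cost of reading several of them for longer time ranges. The sketch below shows only the bucketing step in plain Python. It is a hypothetical illustration, not one of the models discussed in the workshop; the function names, the `ad_id` identifier and the choice of one bucket per UTC day are assumptions.

```python
from datetime import datetime, timezone


def partition_key(ad_id, event_time):
    """Return a hypothetical (ad_id, day_bucket) partition key for an event.

    event_time is expected to be a timezone-aware datetime.
    """
    # Normalise to UTC so the same instant always lands in the same bucket.
    day_bucket = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (ad_id, day_bucket)


def group_by_partition(events):
    """Group (ad_id, event_time) pairs by the partition they would land in."""
    partitions = {}
    for ad_id, event_time in events:
        key = partition_key(ad_id, event_time)
        partitions.setdefault(key, []).append(event_time)
    return partitions
```

With daily buckets, events recorded for the same item on either side of midnight UTC fall into two different partitions, so a query covering a week would need to read seven partitions per item.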
The last section of the workshop was dedicated to reviewing and extending Kaidee use cases for Cassandra. By the end of the training the team was collectively ready to reason about existing data models and design new solutions with minimal guidance. When discussing new use cases, participants productively debated various approaches to each problem and came up with viable solutions.
Best Use Cases for Cassandra
Kaidee use Cassandra only for the subset of their data storage requirements for which it is best suited. We looked at the types of use case where Cassandra is the best fit and others where a different storage solution, for example a graph database, is more appropriate. Another of the use cases discussed would ideally be solved using Cassandra as part of a wider architectural solution, alongside a search engine and/or an in-memory cache.
The Outcome: Ready for More Cassandra Use Cases
As a result of the workshop, Kaidee are now confident that their level of technical expertise is sufficient to bring more use cases into Cassandra and to let it store some of their business-critical data. The engineering teams have a wide range of practical skills with Cassandra and a good fundamental understanding of its features, limitations and internal implementation, broad enough to support them in everyday work and deeper learning.
OpenCredo believes great IT consultancy is founded on deep and broad real-world experience. Our highly capable and experienced people provide applied knowledge of the latest technologies and organisational transformation methodologies. These capabilities stretch across the entire spectrum of the software development process, from architecture through software engineering to operations (and ‘DevOps’), and encompass transformational skills such as C-level strategy creation, organisational design and change management.
OpenCredo are transparent, pragmatic and objective-driven, and we act as a trusted advisor to our clients. We are founded on excellence in software engineering, and appreciate that sustainable and lasting change often occurs when transformation is driven throughout an organisation, not just at the technology level. We deliver tangible value, not just slide decks or fluff.