To conclude our recent blog series about Apache Spark, on 16 May we presented a use case showing how Spark Structured Streaming can generate real-time alerts of suspicious activity in an AWS-based infrastructure.
Cassandra is a highly performant database for storing large amounts of data and serving the queries it has been optimised for. However, when it comes to analysing and gaining broader insight from the captured data, Cassandra can be cumbersome to work with, and may not be as performant or scalable as needed. This article demonstrates how you can combine Apache Spark with Apache Cassandra in practice to better deal with such scenarios.
The ability to write and run ad hoc Spark queries is helpful for getting immediate insight into specific data problems, but what happens when those queries need to form part of a larger software system? Matt takes you on a journey through taking existing Spark code (in fact the same demo code used in David's first article) and refactoring it to make it more testable.
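The core of the refactoring idea is language-agnostic: pull the transformation logic out of the job's entry point into a pure function that takes data in and returns data out, so it can be unit-tested without a cluster, a Cassandra instance, or AWS credentials. A minimal sketch in plain Python — the function and field names here are hypothetical illustrations, not taken from the demo code:

```python
# Sketch of the testability refactoring: instead of building the whole
# pipeline inline in main(), extract the decision logic into a pure
# function with no Spark or I/O dependencies.

def flag_suspicious(events, threshold=10):
    """Pure transformation: count events per source and return the
    sources whose event count meets the threshold."""
    counts = {}
    for event in events:
        counts[event["source"]] = counts.get(event["source"], 0) + 1
    return [source for source, count in counts.items() if count >= threshold]

# A unit test for this needs no SparkSession and no external services:
events = [{"source": "10.0.0.1"}] * 12 + [{"source": "10.0.0.2"}] * 3
assert flag_suspicious(events, threshold=10) == ["10.0.0.1"]
```

The same separation works inside a real Spark job: the driver code handles reading and writing, while functions like this one carry the logic that the tests exercise.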