As the final piece of our recent blog series about Apache Spark, on 16 May we presented a use case: using Spark Structured Streaming to generate real-time alerts of suspicious activity in an AWS-based infrastructure.
If you missed the webcast or would like to watch it again, here is a recording. You can also view it on our YouTube channel.
Check out the full blog series, too: Spark, the pragmatic bits
Data analytics using Cassandra and Spark by David Borsos
Cassandra is a highly performant database when used to store large amounts of data and to run the queries for which it has been optimized. However, when it comes to analyzing the captured data and gaining broader insight from it, Cassandra can be cumbersome to work with, and may not be as performant and scalable as needed. This article demonstrates how you can combine Apache Spark with Apache Cassandra in practice to deal with such scenarios more effectively.
Deploy Spark with an Apache Cassandra cluster by David Borsos
This post shows how you can deploy the open-source version of Apache Spark alongside an Apache Cassandra cluster. It also includes a programmable infrastructure code example.
Testing Spark by Matt Long
The ability to write and run ad hoc Spark queries is helpful for getting immediate insight into certain data problems, but what happens when these queries need to form part of a bigger software system? Matt takes you on a journey through taking existing Spark code (in fact, the same demo code used in David’s first article) and refactoring it to make it more testable.