Open Credo

13 items found: Search results for "spark" in all categories x

Detecting stolen AWS credential usage with Apache Spark – Webinar Recording

May 22, 2017 | Data Analysis

Detecting stolen AWS credential usage with Apache Spark – Webinar Recording

As a final piece of our recent blog series about Apache Spark on 16 May we have presented details of a use-case about using Spark Structured Streaming to generate real-time alerts of suspicious activity in an AWS-based infrastructure.

 

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.

Read More Read More

Testing a Spark Application

May 9, 2017 | Cassandra

Testing a Spark Application

Data analytics isn’t a field commonly associated with testing, but there’s no reason we can’t treat it like any other application. Data analytics services are often deployed in production, and production services should be properly tested. This post covers some basic approaches for the testing of Cassandra/Spark code. There will be some code examples, but the focus is on how to structure your code to ensure it is testable!

 

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.

Read More Read More

Deploy Spark with an Apache Cassandra cluster

May 2, 2017 | Cassandra, Data Engineering

Deploy Spark with an Apache Cassandra cluster

My recent blogpost I explored a few cases where using Cassandra and Spark together can be useful. My focus was on the functional behaviour of such a stack and what you need to do as a developer to interact with it. However, it did not describe any details about the infrastructure setup that is capable of running such Spark code or any deployment considerations. In this post, I will explore this in more detail and show some practical advice in how to deploy Spark and Apache Cassandra.

Read More Read More

New Blog Series: Spark – The Pragmatic Bits

April 25, 2017 | Cassandra, Data Analysis, Data Engineering

New Blog Series: Spark – The Pragmatic Bits

Apache Spark is a powerful open source processing engine which is fast becoming our technology of choice for data analytic projects here at OpenCredo. For many years now we have been helping our clients to practically implement and take advantage of various big data technologies including the like of Apache Cassandra amongst others.

 

 

Read More Read More

Data Analytics using Cassandra and Spark

March 23, 2017 | Cassandra, Data Analysis, Data Engineering

Data Analytics using Cassandra and Spark

In recent years, Cassandra has become one of the most widely used NoSQL databases: many of our clients use Cassandra for a variety of different purposes. This is no accident as it is a great datastore with nice scalability and performance characteristics.

However, adopting Cassandra as a single, one size fits all database has several downsides. The partitioned/distributed data storage model makes it difficult (and often very inefficient) to do certain types of queries or data analytics that are much more straightforward in a relational database.

Read More Read More

Watch ‘Elastic Analytics with Spark, Mesos and Docker’

May 13, 2015 | Software Consultancy

Watch ‘Elastic Analytics with Spark, Mesos and Docker’

Listen to Brenden Matthews discuss Elastic Analytics with Spark, Mesos and Docker as filmed at the most recent London Mesos User Group Meetup.

In this talk, Brenden Matthews discusses how he provided elastic analytics to Airbnb and how the Mesosphere DCOS can easily bring the same type of infrastructure to your own environments.

Read More Read More

Ingesting Big Data into Neo4j – Part 2

February 16, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 2

Check out Part 2 of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.

Read More Read More

Event Replaying with Hazelcast Jet

February 13, 2017 | Data Engineering

Event Replaying with Hazelcast Jet

One of the stated intentions behind the design of Java 8’s Streams API was to take better advantage of the multi-core processing power of modern computers. Operations that could be performed on a single, linear stream of values could also be run in parallel by splitting that stream into multiple sub-streams, and combining the results from processing each sub-stream as they became available.

Read More Read More

How Not To Use Cassandra Like An RDBMS (and what will happen if you do)

September 15, 2016 | Cassandra

How Not To Use Cassandra Like An RDBMS (and what will happen if you do)

Cassandra isn’t a relational database management system, but it has some features that make it look a bit like one. Chief among these is CQL, a query language with an SQL-like syntax. CQL isn’t a bad thing in itself – in fact it’s very convenient – but it can be misleading since it gives developers the illusion that they are working with a familiar data model, when things are really very different under the hood.

Read More Read More

Patterns of Successful Cassandra Data Modelling

September 6, 2016 | Cassandra

Patterns of Successful Cassandra Data Modelling

A growing number of clients are asking OpenCredo for help with using Apache Cassandra and solving specific problems they encounter. Clients have different use cases, requirements, implementation and teams but experience similar issues. We have noticed that Cassandra data modelling problems are the most consistent cause of Cassandra failing to meet their expectations. Data modelling is one of the most complex areas of using Cassandra and has many considerations.

Read More Read More

Fulfilling the promise of Apache Cassandra performance

August 24, 2016 | Cassandra

Fulfilling the promise of Apache Cassandra performance

At OpenCredo we are seeing an increase in adoption of Apache Cassandra as a leading NoSQL database for managing large data volumes, but we have also seen many clients experiencing difficulty converting their high expectations into operational Cassandra performance. Here we present a high-level technical overview of the major strengths and limitations of Cassandra that we have observed over the last few years while helping our clients resolve the real-world issues that they have experienced.

Read More Read More

OpenCredo: First DataStax Partner in the UK to achieve Cassandra Certification
Join the London Mesos User Group to find out more about Elastic Analytics