19 items found: Search results for "spark" in all categories
May 22, 2017 | Data Analysis
As the final piece of our recent blog series about Apache Spark, on 16 May we presented the details of a use case: using Spark Structured Streaming to generate real-time alerts of suspicious activity in an AWS-based infrastructure.
This blog is written exclusively by the OpenCredo team. We do not accept external contributions.
Join us as we conclude our recent Apache Spark series with a webinar that will explore the use case of “Detecting stolen AWS credential usage with Spark”
May 9, 2017 | Cassandra
Data analytics isn’t a field commonly associated with testing, but there’s no reason we can’t treat it like any other application. Data analytics services are often deployed in production, and production services should be properly tested. This post covers some basic approaches for the testing of Cassandra/Spark code. There will be some code examples, but the focus is on how to structure your code to ensure it is testable!
May 2, 2017 | Cassandra, Data Engineering
In my recent blog post I explored a few cases where using Cassandra and Spark together can be useful. My focus was on the functional behaviour of such a stack and what you need to do as a developer to interact with it. However, it did not describe the infrastructure setup needed to run such Spark code, or any deployment considerations. In this post, I will explore this in more detail and offer some practical advice on how to deploy Spark and Apache Cassandra.
April 25, 2017 | Cassandra, Data Analysis, Data Engineering
Apache Spark is a powerful open source processing engine which is fast becoming our technology of choice for data analytics projects here at OpenCredo. For many years now we have been helping our clients to implement and take advantage of various big data technologies, including the likes of Apache Cassandra.
March 23, 2017 | Cassandra, Data Analysis, Data Engineering
In recent years, Cassandra has become one of the most widely used NoSQL databases: many of our clients use Cassandra for a variety of different purposes. This is no accident as it is a great datastore with nice scalability and performance characteristics.
However, adopting Cassandra as a single, one-size-fits-all database has several downsides. The partitioned/distributed data storage model makes it difficult (and often very inefficient) to run certain types of queries or data analytics that are much more straightforward in a relational database.
May 13, 2015 | Software Consultancy
Listen to Brenden Matthews discuss Elastic Analytics with Spark, Mesos and Docker as filmed at the most recent London Mesos User Group Meetup.
In this talk, Brenden Matthews discusses how he provided elastic analytics to Airbnb and how the Mesosphere DCOS can easily bring the same type of infrastructure to your own environments.
February 16, 2023 | Blog, Data Analysis, Neo4j
Check out Part 2 of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.
Join us for Applied Data Engineering Meet Up #5 with Ben Evans, Co-Founder of jClarity, who will be talking about some of their work with Hazelcast Jet and Spark.
Applied Data Engineering is a meetup for all things Data! Join us for our first meetup on the 19th of July.
Join OpenCredo at Devoxx UK 2017. We are pleased to announce that we are sponsoring and attending Devoxx UK this year. The Devoxx Family annually welcomes over 11,000 developers to events in Belgium, France, UK, Poland, Morocco & USA. Devoxx UK returns to London 11th – 12th May, 2017. They will again welcome amazing speakers and attendees for the very best developer content and […]
February 13, 2017 | Data Engineering
One of the stated intentions behind the design of Java 8’s Streams API was to take better advantage of the multi-core processing power of modern computers. Operations that could be performed on a single, linear stream of values could also be run in parallel by splitting that stream into multiple sub-streams, and combining the results from processing each sub-stream as they became available.
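To make the idea concrete, here is a minimal sketch (not taken from the post itself) of the same computation run on a sequential stream and on a parallel one. The class name `ParallelSumSketch` and the method `sumOfSquares` are hypothetical, chosen only for illustration; `LongStream.rangeClosed`, `parallel()`, `map` and `sum` are standard Java 8 Streams API calls.

```java
import java.util.stream.LongStream;

// Hypothetical illustration class; the name is not from the original post.
class ParallelSumSketch {

    // Sums the squares of 1..n. When 'parallel' is true, the runtime may split
    // the range into sub-ranges, process them on multiple threads, and combine
    // the partial sums into one result.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream values = LongStream.rangeClosed(1, n);
        if (parallel) {
            values = values.parallel();
        }
        return values.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        long sequential = sumOfSquares(1_000, false);
        long parallel = sumOfSquares(1_000, true);
        // Both modes should agree, because addition is associative.
        System.out.println(sequential == parallel);
    }
}
```

The pipeline is identical in both modes; only the `parallel()` call changes how it is executed. This works safely here because the mapping function has no side effects and the combining operation is associative, so the order in which partial results are merged does not affect the answer.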
September 15, 2016 | Cassandra
Cassandra isn’t a relational database management system, but it has some features that make it look a bit like one. Chief among these is CQL, a query language with an SQL-like syntax. CQL isn’t a bad thing in itself – in fact it’s very convenient – but it can be misleading since it gives developers the illusion that they are working with a familiar data model, when things are really very different under the hood.
September 6, 2016 | Cassandra
A growing number of clients are asking OpenCredo for help with using Apache Cassandra and solving specific problems they encounter. Clients have different use cases, requirements, implementations and teams, but experience similar issues. We have noticed that Cassandra data modelling problems are the most consistent cause of Cassandra failing to meet their expectations. Data modelling is one of the most complex areas of using Cassandra and has many considerations.
August 24, 2016 | Cassandra
At OpenCredo we are seeing an increase in adoption of Apache Cassandra as a leading NoSQL database for managing large data volumes, but we have also seen many clients experiencing difficulty converting their high expectations into operational Cassandra performance. Here we present a high-level technical overview of the major strengths and limitations of Cassandra that we have observed over the last few years while helping our clients resolve the real-world issues that they have experienced.
It’s now easier than ever to achieve elastic analytics for your company! Join Mesosphere’s Systems Architect, Brenden Matthews, for his discussion on ‘Elastic Analytics with Spark, Mesos & Docker’.