Data Analysis Archives

November 22, 2023 | Blog, Data Analysis

Let’s Flink on EKS: Data Lake Primer

Check out the latest blog by Our Senior Consultant Howard Hill where he offers an engineer’s guide to streamlining real-time data using an open-model infrastructure.

September 19, 2023 | Blog, Data Analysis

Yow! London – Searching for Research Fraud in OpenAlex with Graph Data Science (Recording)

Check out our Lead Consultant Ebru Cucen and Sage Publishing Data Scientist Adam Day co-present on “Research Fraud Detection in OpenAlex with Graph Data Science” at YOW! London 2022.

March 9, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 3

Check out the last part of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.

February 16, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 2

Check out Part 2 of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.

February 1, 2023 | Blog, Data Analysis, Neo4j

NODES 2022 – Neo4j Online Developer Education Summit 2022 – Tracing your data’s DNA

As data becomes ubiquitous and deeply interconnected, tracing where who or which system that data comes from – its lineage – will create bigger problems and opportunities for us on the horizon. Watch the recording of James Bowkett talk from NODES 2022 – Neo4j Online Developer Education Summit 202 on ‘Tracing Your Data’s DNA.’

January 26, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 1

Fahran Wallace and Ebru Cucen’s most recent blog post is part 1 of a three-part series. They investigate how OpenCredo ingested 400 million nodes with a billion relationships into Neo4j.

December 7, 2022 | Blog, Data Analysis

Seeing Through The Modern Data Stack

Learn more about data processing and analytics, which are essential systems in modern enterprises but are frequently overlooked aspects of modern data architectures, by reading Mateus Pimenta’s latest blog.

June 16, 2022 | Data Analysis, Data Engineering

Lunch & Learn: Scala in 2022

In this lunch & learn session Matt Farrow shares the new developments in Scala and what it could all mean for Scala’s future.

May 26, 2022 | Data Analysis, Data Engineering

Devoxx UK 2022 – Tracing Your Data’s DNA

As data becomes ubiquitous and deeply interconnected, tracing where who or which system that data comes from – its lineage – will create bigger problems and opportunities for us on the horizon. Watch the recording of James Bowkett’s talk from Devoxx UK on ‘Tracing Your Data’s DNA’

March 24, 2022 | Data Analysis, Data Engineering

Lunch & Learn: A Closer Look at Databricks

In this Lunch & Learn session our Senior Consultant, Seb Margineanu shares an overview of the Databricks Lakehouse Platform by exploring the evolution of Databricks.

May 22, 2017 | Data Analysis

Detecting stolen AWS credential usage with Apache Spark – Webinar Recording

As a final piece of our recent blog series about Apache Spark on 16 May we have presented details of a use-case about using Spark Structured Streaming to generate real-time alerts of suspicious activity in an AWS-based infrastructure.

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.

April 25, 2017 | Cassandra, Data Analysis, Data Engineering

New Blog Series: Spark – The Pragmatic Bits

Apache Spark is a powerful open source processing engine which is fast becoming our technology of choice for data analytic projects here at OpenCredo. For many years now we have been helping our clients to practically implement and take advantage of various big data technologies including the like of Apache Cassandra amongst others.

March 23, 2017 | Cassandra, Data Analysis, Data Engineering

Data Analytics using Cassandra and Spark

In recent years, Cassandra has become one of the most widely used NoSQL databases: many of our clients use Cassandra for a variety of different purposes. This is no accident as it is a great datastore with nice scalability and performance characteristics.

However, adopting Cassandra as a single, one size fits all database has several downsides. The partitioned/distributed data storage model makes it difficult (and often very inefficient) to do certain types of queries or data analytics that are much more straightforward in a relational database.

March 7, 2017 | Data Analysis, GCP

Google Cloud Spanner: our first impressions

Google has recently made its internal Spanner database available to the wider public, as a hosted solution on Google Cloud. This is a distributed relational/transactional database used inside for various Google projects (including F1, the advertising backend), promising high throughput, low latency and 99.999% availability. As such it is an interesting alternative to many open source or other hosted solutions. This whitepaper gives a good theoretical introduction into Spanner.

January 23, 2017 | Data Analysis

What I Don’t Like About Error Handling in Go, and How to Work Around It

More often than not, people who write Go have some sort of opinion on its error handling model. Depending on your experience with other languages, you may be used to different approaches. That’s why I’ve decided to write this article, as despite being relatively opinionated, I think drawing on my experiences can be useful in the debate. The main issues I wanted to cover are that it is difficult to force good error handling practice, that errors don’t have stack traces, and that error handling itself is too verbose.

October 13, 2016 | Data Analysis

From Java to Go, and Back Again

In Lisp, you don’t just write your program down toward the language, you also build the language up toward your program. As you’re writing a program you may think “I wish Lisp had such-and-such an operator.” So you go and write it. Afterward you realize that using the new operator would simplify the design of another part of the program, and so on. Language and program evolve together…In the end your program will look as if the language had been designed for it. And when language and program fit one another well, you end up with code which is clear, small, and efficient – Paul Graham, Programming Bottom-Up

August 5, 2015 | Data Analysis, Data Engineering

Building a Google analytics dashboard with Python3, Tornado and deploying it on OpenShift (for free)

A few weeks ago, we thought about building a Google analytics dashboard to give us easy access to certain elements of our Google Analytics web traffic. We saw some custom dashboards for bloggers, but nothing quite right for our goal, since we wanted the data on a big screen for everyone in the office to view.

February 8, 2012 | Data Analysis, Data Engineering

A Simple Introduction to Complex Event Processing – Stock Ticker End-to-End Sample

Most of the important players in this space are large IT corporations like Oracle and IBM with their commercial (read expensive) offerings.

While most of CEP products offer some great features, it’s license model and close code policy doesn’t allow developers to play with them on pet projects, which would drive adoption and usage of CEP in every day programming.

Blog Category: Data Analysis