Open Credo

Showing all blogs in category Data Analysis

Detecting stolen AWS credential usage with Apache Spark – Webinar Recording

May 22, 2017 | Data Analysis

As the final piece of our recent blog series about Apache Spark, on 16 May we presented a use case: using Spark Structured Streaming to generate real-time alerts on suspicious activity in an AWS-based infrastructure.

 

New Blog Series: Spark – The Pragmatic Bits

April 25, 2017 | Cassandra, Data Analysis, Data Engineering

Apache Spark is a powerful open source processing engine that is fast becoming our technology of choice for data analytics projects here at OpenCredo. For many years now we have been helping our clients implement and take practical advantage of various big data technologies, including the likes of Apache Cassandra.

Data Analytics using Cassandra and Spark

March 23, 2017 | Cassandra, Data Analysis, Data Engineering

In recent years, Cassandra has become one of the most widely used NoSQL databases: many of our clients use Cassandra for a variety of different purposes. This is no accident as it is a great datastore with nice scalability and performance characteristics.

However, adopting Cassandra as a single, one-size-fits-all database has several downsides. Its partitioned, distributed storage model makes certain types of queries and data analytics difficult (and often very inefficient), even though they are much more straightforward in a relational database.
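
To make the partitioning point concrete, here is a minimal sketch in plain Python. It is not Cassandra's API or its real token algorithm: the three-node cluster, the `user_id` partition key, the sample rows, and the names `node_for` and `ToyCluster` are all hypothetical, chosen only to show why a lookup by partition key is cheap while a filter on any other column has to visit every node.

```python
import hashlib
from collections import defaultdict

NODE_COUNT = 3  # hypothetical cluster size, for illustration only


def node_for(partition_key: str) -> int:
    """Map a partition key to a node.

    Illustrative only: Cassandra uses a token ring (Murmur3 by default),
    not an MD5 hash taken modulo the node count.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NODE_COUNT


class ToyCluster:
    """A toy model of rows spread across nodes by partition key."""

    def __init__(self):
        # node id -> partition key -> list of rows
        self.nodes = defaultdict(lambda: defaultdict(list))

    def insert(self, user_id, row):
        self.nodes[node_for(user_id)][user_id].append(row)

    def by_partition_key(self, user_id):
        """Cheap: the key tells us which single node holds the data."""
        node = node_for(user_id)
        return list(self.nodes[node].get(user_id, [])), 1

    def by_other_column(self, column, value):
        """Expensive: every node and every partition must be scanned."""
        matches = []
        for node in range(NODE_COUNT):
            for rows in self.nodes[node].values():
                matches.extend(r for r in rows if r.get(column) == value)
        return matches, NODE_COUNT


if __name__ == "__main__":
    cluster = ToyCluster()
    cluster.insert("alice", {"country": "UK", "amount": 10})
    cluster.insert("bob", {"country": "UK", "amount": 5})
    cluster.insert("carol", {"country": "DE", "amount": 7})

    rows, nodes_touched = cluster.by_partition_key("alice")
    print("by key:", rows, "nodes touched:", nodes_touched)

    rows, nodes_touched = cluster.by_other_column("country", "UK")
    print("by country:", rows, "nodes touched:", nodes_touched)
```

In this toy model the second query touches all nodes regardless of how few rows match, which is the kind of access pattern that tends to be better served by a separate analytics engine such as Spark.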

Google Cloud Spanner: our first impressions

March 7, 2017 | Data Analysis, GCP

Google has recently made its internal Spanner database available to the wider public as a hosted solution on Google Cloud. This is a distributed relational/transactional database used inside Google for various projects (including F1, the advertising backend), promising high throughput, low latency and 99.999% availability. As such, it is an interesting alternative to many open source or other hosted solutions. This whitepaper gives a good theoretical introduction to Spanner.

What I Don’t Like About Error Handling in Go, and How to Work Around It

January 23, 2017 | Data Analysis

More often than not, people who write Go have some sort of opinion on its error handling model. Depending on your experience with other languages, you may be used to different approaches. That’s why I’ve decided to write this article: although it is relatively opinionated, I think drawing on my experience can be useful in the debate. The main issues I want to cover are that it is difficult to enforce good error handling practice, that errors don’t have stack traces, and that error handling itself is too verbose.

From Java to Go, and Back Again

October 13, 2016 | Data Analysis

“In Lisp, you don’t just write your program down toward the language, you also build the language up toward your program. As you’re writing a program you may think ‘I wish Lisp had such-and-such an operator.’ So you go and write it. Afterward you realize that using the new operator would simplify the design of another part of the program, and so on. Language and program evolve together… In the end your program will look as if the language had been designed for it. And when language and program fit one another well, you end up with code which is clear, small, and efficient.” – Paul Graham, Programming Bottom-Up

Building a Google analytics dashboard with Python3, Tornado and deploying it on OpenShift (for free)

August 5, 2015 | Data Analysis, Data Engineering

A few weeks ago, we thought about building a Google Analytics dashboard to give us easy access to certain elements of our web traffic data. We saw some custom dashboards for bloggers, but nothing quite right for our goal, since we wanted the data on a big screen for everyone in the office to view.

A Simple Introduction to Complex Event Processing – Stock Ticker End-to-End Sample

February 8, 2012 | Data Analysis, Data Engineering

Most of the important players in this space are large IT corporations like Oracle and IBM, with their commercial (read: expensive) offerings.

While most CEP products offer some great features, their licence models and closed-source policies do not allow developers to play with them on pet projects, which would otherwise drive adoption and usage of CEP in everyday programming.
