Search results for
"apache"

59 items found: Search results for "apache" in all categories x

February 10, 2022 | Data Engineering

Lunch & Learn: Data Pipelines with Apache Hop & Pentaho Data Integration

In this lunch & learn session, Fahran Wallace looks at how Apache Hop and its still active parent project Pentaho Data Integration help tackle data pipeline.

May 22, 2017 | Data Analysis

Detecting stolen AWS credential usage with Apache Spark – Webinar Recording

As a final piece of our recent blog series about Apache Spark on 16 May we have presented details of a use-case about using Spark Structured Streaming to generate real-time alerts of suspicious activity in an AWS-based infrastructure.

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.

May 2, 2017 | Cassandra, Data Engineering

Deploy Spark with an Apache Cassandra cluster

My recent blogpost I explored a few cases where using Cassandra and Spark together can be useful. My focus was on the functional behaviour of such a stack and what you need to do as a developer to interact with it. However, it did not describe any details about the infrastructure setup that is capable of running such Spark code or any deployment considerations. In this post, I will explore this in more detail and show some practical advice in how to deploy Spark and Apache Cassandra.

August 24, 2016 | Cassandra

Fulfilling the promise of Apache Cassandra performance

At OpenCredo we are seeing an increase in adoption of Apache Cassandra as a leading NoSQL database for managing large data volumes, but we have also seen many clients experiencing difficulty converting their high expectations into operational Cassandra performance. Here we present a high-level technical overview of the major strengths and limitations of Cassandra that we have observed over the last few years while helping our clients resolve the real-world issues that they have experienced.

February 16, 2015 | Software Consultancy

How to Write Your Own Apache Mesos Framework?

Apache Mesos is often explained as being a kernel for the data-centre; meaning that cluster resources (CPU, RAM, …) are tracked and offered to “user space” programs (i.e. frameworks) to do computations on the cluster.

February 22, 2024 | Blog, Kafka

Use Case: A Kafka Backup Solution

Check out Peter Vegh’s latest blog where he explores a bespoke Kafka backup framework, discussing the approach, architecture and design process to implement and deliver the solution.

November 22, 2023 | Blog, Data Analysis

Let’s Flink on EKS: Data Lake Primer

Check out the latest blog by Our Senior Consultant Howard Hill where he offers an engineer’s guide to streamlining real-time data using an open-model infrastructure.

June 8, 2023 | Blog, Kafka

Kafka: Navigating GDPR Compliance

Check out Greg Nuttall’s latest blog where he looks at the challenges posed by GDPR’s “Right to be Forgotten” in the context of Apache Kafka, and delves deeper into three strategies for overcoming them.

March 9, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 3

Check out the last part of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.

February 16, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 2

Check out Part 2 of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.

January 26, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 1

Fahran Wallace and Ebru Cucen’s most recent blog post is part 1 of a three-part series. They investigate how OpenCredo ingested 400 million nodes with a billion relationships into Neo4j.

August 2, 2022 | Blog, Kafka

Controlling Kafka Data Flows using Open Policy Agent

Read Matt Farrow’s blog as he explores the potential for using Open Policy Agent to filter and mask data being sent to and read from Apache Kafka.

March 24, 2022 | Data Analysis, Data Engineering

Lunch & Learn: A Closer Look at Databricks

In this Lunch & Learn session our Senior Consultant, Seb Margineanu shares an overview of the Databricks Lakehouse Platform by exploring the evolution of Databricks.

March 10, 2022 | Data Engineering, Open Source

Lunch & Learn: GraphQL on Springboot

In this lunch & learn session, Ebru Cucen and Alberto Faedda explore the historical background of GraphQL with case examples and a demo of how it can be used.

March 3, 2022 | AWS, Open Source, Software Consultancy

Lunch & Learn: Secure Pipelines Enforcing policies using OPA

Watch our Lunch & Learn by Hieu Doan and Alberto Faedda as they share how engineers and security teams can secure their software development processes with the Secure Pipelines application.

February 11, 2022 | AWS, Cloud, GCP, Kubernetes, Microservices, Open Source, Software Consultancy

DZone Repost: Testing Serverless Functions

Serverless functions are easy to install and upload, but we can’t ignore the basics. This article looks at different strategies related to testing serverless functions.

January 31, 2022 | Blog, Data Engineering

Making Sense of Data with RDF* vs. LPG

There are two camps of Graph database, one side is RDF, where they are strict with their format, and somewhat limited for their extensibility. The other side is LPG, where they can define labels to the relationships. With its recent extension, RDF now allows users to add properties, thus becoming RDF*. In this blog, Ebru explores the structural and performance differences between LPG and RDF*.

July 20, 2021 | Blog, Data Engineering, Kafka

Kafka vs RabbitMQ: The Consumer-Driven Choice

Message and event-driven systems provide an array of benefits to organisations of all shapes and sizes. At their core, they help decouple producers and consumers so that each can work at their own pace without having to wait for the other – asynchronous processing at its best.

In fact, such systems enable a whole range of messaging patterns, offering varying levels of guarantees surrounding the processing and consumption options for clients. Take for example the publish/subscribe pattern, which enables one message to be broadcast and consumed by multiple consumers; or the competing consumer pattern, which enables a message to be processed once but with multiple concurrent consumers vying for the honour—essentially providing a way to distribute the load. The manner in which these patterns are actually realised however, depends a lot on the technology used, as each has its own approach and unique tradeoffs.

In this article we will explore how this all applies to RabbitMQ and Apache Kafka, and how these two technologies differ, specifically from a message consumer’s perspective.

April 20, 2021 | Data Engineering, Machine Learning, Software Consultancy

Machine Learning at scale: first impressions of Kubeflow

Our recent client was a Fintech who had ambitions to build a Machine Learning platform for real-time decision making. The client had significant Kubernetes proficiency, ran on the cloud, and had a strong preference for using free, open-source software over cloud-native offerings that come with lock-in. Several components were spiked with success (feature preparation with Apache Beam and Seldon for model serving performed particularly strongly). Kubeflow was one of the next technologies on our list of spikes, showing significant promise at the research stage and seemingly a good match for our client’s priorities and skills.

That platform slipped down the client’s priority list before completing the research for Kubeflow, so I wanted to see how that project might have turned out. Would Kubeflow have made the cut?

September 22, 2020 | AWS, Blog, Cassandra, Cloud, DevOps, Open Source

Decision time with AWS Keyspaces

With the upcoming Cassandra 4.0 release, there is a lot to look forward to. Most excitingly, and following a refreshing realignment of the Open Source community around Cassandra, the next release promises to focus on fundamentals: stability, repair, observability, performance and scaling.

We must set this against the fact that Cassandra ranks pretty highly in the Stack Overflow most dreaded databases list and the reality that Cassandra is expensive to configure, operate and maintain. Finding people who have the prerequisite skills to do so is challenging.

April 2, 2020 | Machine Learning

It’s 2020, can I have my ML models now?

Recent years have seen many companies consolidate all their data into a data lake/warehouse of some sort. Once it’s all consolidated, what next?

Many companies consolidate data with a field of dreams mindset – “build it and they will come”, however a comprehensive data strategy is needed if the ultimate goals of an organisation are to be realised: monetisation through Machine Learning and AI is an oft-cited goal. Unfortunately, before one rushes into the enticing world of machine learning, one should lay more mundane foundations. Indeed, in data science, estimates vary between 50% to 80% of the time taken is devoted to so-called data-wrangling. Further, Google estimates ML projects produce 5% ML code and 95% “glue code”. If this is the reality we face, what foundations are required before one can dive headlong into ML?

July 30, 2019 | Blog, Kafka

Kafka Connect – Source Connectors: A detailed guide to connecting to what you love.

Writing your own Kafka source connectors with Kafka Connect. In this blog, Rufus takes you on a code walk, through the Gold Verified Venafi Connector while pointing out the common pitfalls

February 20, 2019 | DevOps, Hashicorp, Kafka, Open Source

Securing Kafka using Vault PKI

Creating and managing a Public Key Infrastructure (PKI) could be a very straightforward task if you use appropriate tools. In this blog post, I’ll cover the steps to easily set up a PKI with Vault from HashiCorp, and use it to secure a Kafka Cluster.

January 23, 2018 | Data Engineering, DevOps

A Pragmatic Introduction to Machine Learning for DevOps Engineers

Machine Learning is a hot topic these days, as can be seen from search trends. It was the success of Deepmind and AlphaGo in 2016 that really brought machine learning to the attention of the wider community and the world at large.

August 8, 2017 | Cassandra

Riak, the Dynamo paper and life beyond Basho

Recently, the sad news has emerged that Basho, which developed the Riak distributed database, has gone into receivership. This would appear to present a problem for those who have adopted the commercial version of the Riak database (Riak KV) supported by Basho.

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.

May 9, 2017 | Cassandra

Testing a Spark Application

Data analytics isn’t a field commonly associated with testing, but there’s no reason we can’t treat it like any other application. Data analytics services are often deployed in production, and production services should be properly tested. This post covers some basic approaches for the testing of Cassandra/Spark code. There will be some code examples, but the focus is on how to structure your code to ensure it is testable!

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.

April 25, 2017 | Cassandra, Data Analysis, Data Engineering

New Blog Series: Spark – The Pragmatic Bits

Apache Spark is a powerful open source processing engine which is fast becoming our technology of choice for data analytic projects here at OpenCredo. For many years now we have been helping our clients to practically implement and take advantage of various big data technologies including the like of Apache Cassandra amongst others.

March 23, 2017 | Cassandra, Data Analysis, Data Engineering

Data Analytics using Cassandra and Spark

In recent years, Cassandra has become one of the most widely used NoSQL databases: many of our clients use Cassandra for a variety of different purposes. This is no accident as it is a great datastore with nice scalability and performance characteristics.

However, adopting Cassandra as a single, one size fits all database has several downsides. The partitioned/distributed data storage model makes it difficult (and often very inefficient) to do certain types of queries or data analytics that are much more straightforward in a relational database.

February 16, 2017 | Cassandra

Everything you need to know about Cassandra Materialized Views

One of the default Cassandra strategies to deal with more sophisticated queries is to create CQL tables that contain the data in a structure that matches the query itself (denormalization). Cassandra 3.0 introduces a new CQL feature, Materialized Views which captures this concept as a first-class construct.

February 13, 2017 | Data Engineering

Event Replaying with Hazelcast Jet

One of the stated intentions behind the design of Java 8’s Streams API was to take better advantage of the multi-core processing power of modern computers. Operations that could be performed on a single, linear stream of values could also be run in parallel by splitting that stream into multiple sub-streams, and combining the results from processing each sub-stream as they became available.

January 25, 2017 | Cassandra

The Three ‘R’s of Distributed Event Processing

One of the simplest and best-understood models of computation is the Finite State Machine (FSM). An FSM has fixed range of states it can be in, and is always in one of these states. When an input arrives, this triggers a transition in the FSM from its current state to the next state. There may be several possible transitions to several different states, and which transition is chosen depends on the input.

October 10, 2016 | Cassandra

Cassandra – The Good, The Bad and the Ugly Webinar Recording

In the culmination of our blog series on the topic, on October 6th 2016 OpenCredo Consultants Dominic Fox, Alla Babkina and Guy Richardson, and hosted by Marco Cullen, presented the common design and implementation issues that they have come across in real-world Apache Cassandra deployments.

September 27, 2016 | Cassandra, Data Engineering

Common Problems with Cassandra Tombstones

If there is one thing to understand about Cassandra, it is the fact that it is optimised for writes. In Cassandra everything is a write including logical deletion of data which results in tombstones – special deletion records. We have noticed that lack of understanding of tombstones is often the root cause of production issues our clients experience with Cassandra. We have decided to share a compilation of the most common problems with Cassandra tombstones and some practical advice on solving them.

September 15, 2016 | Cassandra

How Not To Use Cassandra Like An RDBMS (and what will happen if you do)

Cassandra isn’t a relational database management system, but it has some features that make it look a bit like one. Chief among these is CQL, a query language with an SQL-like syntax. CQL isn’t a bad thing in itself – in fact it’s very convenient – but it can be misleading since it gives developers the illusion that they are working with a familiar data model, when things are really very different under the hood.

September 6, 2016 | Cassandra

Patterns of Successful Cassandra Data Modelling

A growing number of clients are asking OpenCredo for help with using Apache Cassandra and solving specific problems they encounter. Clients have different use cases, requirements, implementation and teams but experience similar issues. We have noticed that Cassandra data modelling problems are the most consistent cause of Cassandra failing to meet their expectations. Data modelling is one of the most complex areas of using Cassandra and has many considerations.

August 26, 2016 | Cassandra

New Blog Series: Cassandra – What You May Learn the Hard Way

At OpenCredo we have been working with Cassandra since 2012 and we are big fans of both open source Apache Cassandra and the capabilities of DataStax Enterprise. Over the years we have collected a great deal of experience throughout the company on how to deliver the benefits of Cassandra in real world projects and have also seen some common pitfalls that businesses have fallen into.

May 10, 2016 | Data Engineering, White Paper

Concursus: Event Sourcing for the Internet of Things

In this technical report, we present Concursus, a framework for developing distributed applications using CQRS and event sourcing patterns within a modern, Java 8-centric, programming model. Following a high-level survey of the trends leading towards the adoption of these patterns, we show how Concursus simplifies the task of programming event sourcing applications by providing a concise, intuitive API to systems composed of event processing middleware.

March 14, 2016 | Software Consultancy

Test Automation Concepts – Parallel test execution

Test automation provides fast feedback on regressions. In order to achieve this tests need to execute quickly, something which becomes more of a problem as test suites grow. This is especially true of tests which exercise a user interface where the interaction with the system is slower.

A good way to address this is to have your tests execute in parallel rather than consecutively. Given sufficient resources this allows your execution time to remain low almost indefinitely as more scenarios are added to the suite.

March 2, 2016 | DevOps, Microservices

Is it Time for Your ‘Microservices Checkup’?

Many of our clients are currently implementing applications using a ‘microservice’-based architecture. Increasingly we are hearing from organisations that are part way through a migration to microservices, and they want our help with validating and improving their current solution. These ‘microservices checkup’ projects have revealed some interesting patterns, and because we have experience of working in a wide-range of industries (and also have ‘fresh eyes’ when looking at a project), we are often able to work alongside teams to make significant improvements and create a strategic roadmap for future improvements.

January 8, 2016 | Microservices

The Seven Deadly Sins of Microservices (Redux)

Many of our clients are in the process of investigating or implementing ‘microservices’, and a popular question we often get asked is “what’s the most common mistake you see when moving towards a microservice architecture?”. We’ve seen plenty of good things with this architectural pattern, but we have also seen a few recurring issues and anti-patterns, which I’m keen to share here.

November 3, 2015 | Software Consultancy

JavaOne: Debugging Java Applications Running in Docker

My JavaOne experience was rather busy this year, what with three talks presented in a single day! The first of these talks “Debugging Java Apps in Containers: No Heavy Welding Gear Required” was delivered with my regular co-presenter Steve Poole, from IBM, and we shared our combined experiences of working with Java and Docker over the past year.

October 31, 2015 | Microservices

JavaOne: Building a Microservice Development Ecosystem (Video)

Microservices: Some Assembly Required

Over the past few weeks I’ve been writing an OpenCredo blog series on the topic of “Building a Microservice Development Ecosystem”, but my JavaOne talk of the same title crept up on me before I managed to finish the remaining posts. I’m still planning to finish the full blog series, but in the meantime I thought it would be beneficial to share the video and slides associated with the talk, alongside some of my related thinking. I’ve been fortunate to work on several interesting microservice projects at OpenCredo, and we’re always keen to share our knowledge or offer advice, and so please do get in touch if we can help you or your organisation.

September 20, 2015 | Microservices

Microservice Platforms: Some Assembly [Still] Required. Part Two

Working Locally with Microservices

Over the past five years I have worked within several projects that used a ‘microservice’-based architecture, and one constant issue I have encountered is the absence of standardised patterns for local development and ‘off the shelf’ development tooling that support this. When working with monoliths we have become quite adept at streamlining the development, build, test and deploy cycles. Development tooling to help with these processes is also readily available (and often integrated with our IDEs). For example, many platforms provide ‘hot reloading’ for viewing the effects of code changes in near-real time, automated execution of tests, regular local feedback from continuous integration servers, and tooling to enable the creation of a local environment that mimics the production stack.

August 26, 2015 | Cloud

Microservice Platforms: Some Assembly [Still] Required. Part One

The challenges of building and deploying microservices

Unless you’ve been living under a rock for the last year, you’ll undoubtedly know that microservices are the new hotness. An emerging trend that I’ve observed is that the people who are actually using microservices in production tend to be the larger well-funded companies, such as Netflix, Gilt, Yelp, Hailo etc., and each organisation has their own way of developing, building and deploying.

News | August 5, 2015

#MesosCon comes to Europe

August 5, 2015 | Cloud, GCP, Kubernetes

Embracing Disruptive Innovation: OpenCredo Partners with Google

Why OpenCredo partnered with Google

Recently OpenCredo chose to partner with Google in order to share knowledge and resources around the Google Cloud Platform offerings. Our clients come in many shapes and sizes, but typically all of them realise three disruptive truths of the modern IT industry: the (economic) value of cloud; the competitive advantage of continuous delivery; and the potential of hypothesis and data-driven product development to increase innovation (as popularised by the Lean Startup / Lean Enterprise motto of ‘build, measure, learn’).

News | June 5, 2015

OpenCredo: First DataStax Partner in the UK to achieve Cassandra Certification

May 13, 2015 | Software Consultancy

Watch ‘Elastic Analytics with Spark, Mesos and Docker’

Listen to Brenden Matthews discuss Elastic Analytics with Spark, Mesos and Docker as filmed at the most recent London Mesos User Group Meetup.

In this talk, Brenden Matthews discusses how he provided elastic analytics to Airbnb and how the Mesosphere DCOS can easily bring the same type of infrastructure to your own environments.

News | April 20, 2015

Round up from the Kubernetes London meetup / Vol.2

News | April 15, 2015

Join the London Mesos User Group to find out more about Elastic Analytics

News | March 30, 2015

OpenCredo and Container Solutions Partner to Deliver Emerging Technologies

January 30, 2015 | Software Consultancy

Traits with Java 8 Default Methods

When I first started programming in Scala a few years ago, Traits was the language feature I was most excited about. Indeed, Traits give you the ability to compose and share behaviour in a clean and reusable way. In Java, we tend deal with the same concerns by grouping crosscutting behaviour in abstract base classes that are subsequently extended every time we need to access shared behaviour in part or in total.

October 23, 2014 | Cassandra

Spring Data Cassandra Overview

Spring Data Cassandra (SDC) is a community project under the Spring Data (SD) umbrella that provides convenient and familiar APIs to work with Apache Cassandra.

December 2, 2013 | Cassandra

New features in Cassandra 2.0 – More on Lightweight Transactions

Perhaps the most important of Cassandra’s selling points is its completely distributed architecture and its ability to easily extend the cluster with virtually any number of nodes. Implementing a classical RDBMS-style transaction consisting of “put locks on the database, modify the data, then commit the transaction”-style operations are simply not feasible in such an architecture (i.e. that doesn’t scale well).

September 19, 2013 | Software Consultancy

Cross platform test automation with Appium, WebDriver and Cucumber-JVM

This post will give an overview of mobile testing using Appium. We will integrate tests for a native Android application into an existing Cucumber-JVM based set of acceptance tests and demonstrate multi platform testing from a single set of BDD scenarios. The sample code for this can be found here.

July 2, 2013 | Software Consultancy

Running Cucumber JVM tests in parallel

This blog post will address the issue of slow test runs when using Cucumber JVM with web automation tools such as WebDriver to perform acceptance testing on a web application.

If your team is using continuous integration this becomes especially noticeable, forcing teams to either wait for acceptance tests to complete before deploying or having to ignore the bulk of the tests.

The sample code and description in this post will show you how to convert your suite to running tests in parallel, something which has historically been problematic with unanswered questions and outstanding Cucumber-JVM bugs on the subject. Tying the described approach in with Selenium Grid2 will allow you distribute your testing across several machines if your suite is especially large or slow.

February 4, 2013 | Software Consultancy

API Testing with Cucumber

This API will in future be used by a mobile client and by third parties, making it important to verify that it is functionally correct as well as clearly documented.

An additional requirement in our case is for the tests to form a specification for the API to allow front and back end developers agree on the format in advance. This is something that BDD excels at, making it natural to continue to use Cucumber. This post will focus on the difficulties of attaining the appropriate level of abstraction with Cucumber while retaining the technical detail required for specification.

January 10, 2013 | DevOps

A dive into saltstack

Recently I have started looking into SaltStack as a solution that does both config management and orchestration. It is a relatively new project started in 2011, but it has a growing fanbase among Sys Admins and DevOps Engineers. In this blog post I will look into Salt as a promising alternative, and comparing it to Puppet as a way of exploring its basic set of features.

October 25, 2012 | DevOps

Overriding Puppet resources with class inheritance

A common issue with Puppet manifests is a clash of resource definitions that appears in the Puppet log file as: ‘Duplicate definition: File[resource-name] is already defined; cannot redefine at…’

This issue is likely to occur when building a service stack from reusable Puppet modules and you try to alter a resource that has already been defined in modules that the service depends on.

Building a service from reusable Puppet modules has a couple of benefits. It saves you time when you don’t have to write new modules every time you build a new service. It also improves the quality when you build your service from modules that are trusted, tried and tested.

Search results for"apache"

Microservices: Some Assembly Required

Working Locally with Microservices

The challenges of building and deploying microservices

Search results for
"apache"