Search results for
"fast data"

63 items found: Search results for "fast data" in all categories x

March 9, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 3

Check out the last part of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.

February 16, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 2

Check out Part 2 of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.

January 26, 2023 | Blog, Data Analysis, Neo4j

Ingesting Big Data into Neo4j – Part 1

Fahran Wallace and Ebru Cucen’s most recent blog post is part 1 of a three-part series. They investigate how OpenCredo ingested 400 million nodes with a billion relationships into Neo4j.

December 7, 2022 | Blog, Data Analysis

Seeing Through The Modern Data Stack

Learn more about data processing and analytics, which are essential systems in modern enterprises but are frequently overlooked aspects of modern data architectures, by reading Mateus Pimenta’s latest blog.

January 31, 2022 | Blog, Data Engineering

Making Sense of Data with RDF* vs. LPG

There are two camps of Graph database, one side is RDF, where they are strict with their format, and somewhat limited for their extensibility. The other side is LPG, where they can define labels to the relationships. With its recent extension, RDF now allows users to add properties, thus becoming RDF*. In this blog, Ebru explores the structural and performance differences between LPG and RDF*.

March 23, 2017 | Cassandra, Data Analysis, Data Engineering

Data Analytics using Cassandra and Spark

In recent years, Cassandra has become one of the most widely used NoSQL databases: many of our clients use Cassandra for a variety of different purposes. This is no accident as it is a great datastore with nice scalability and performance characteristics.

However, adopting Cassandra as a single, one size fits all database has several downsides. The partitioned/distributed data storage model makes it difficult (and often very inefficient) to do certain types of queries or data analytics that are much more straightforward in a relational database.

September 6, 2016 | Cassandra

Patterns of Successful Cassandra Data Modelling

A growing number of clients are asking OpenCredo for help with using Apache Cassandra and solving specific problems they encounter. Clients have different use cases, requirements, implementation and teams but experience similar issues. We have noticed that Cassandra data modelling problems are the most consistent cause of Cassandra failing to meet their expectations. Data modelling is one of the most complex areas of using Cassandra and has many considerations.

October 30, 2015 | Software Consultancy

JavaOne: Thinking Fast and Slow with Software Development

Think More About Thinking…

Having completed my marathon 3-talks-in-one-day at JavaOne on Wednesday, I’m now in a position to share all of the slides and supporting material. First up is the content associated with my “Thinking Fast and Slow with Software Development” talk. I’ve already blogged about an earlier version of this presentation on the OpenCredo blog, and so won’t go into more detail here. However, I will include the video recording (thanks to the JavaOne team for this!), the latest version of the slides, and a much requested reading list!

News | October 8, 2013

OpenCredo partners with Datastax

April 2, 2020 | Machine Learning

It’s 2020, can I have my ML models now?

Recent years have seen many companies consolidate all their data into a data lake/warehouse of some sort. Once it’s all consolidated, what next?

Many companies consolidate data with a field of dreams mindset – “build it and they will come”, however a comprehensive data strategy is needed if the ultimate goals of an organisation are to be realised: monetisation through Machine Learning and AI is an oft-cited goal. Unfortunately, before one rushes into the enticing world of machine learning, one should lay more mundane foundations. Indeed, in data science, estimates vary between 50% to 80% of the time taken is devoted to so-called data-wrangling. Further, Google estimates ML projects produce 5% ML code and 95% “glue code”. If this is the reality we face, what foundations are required before one can dive headlong into ML?

[Past event] …Think Software 2018, London

Join us at …think software 2018! Take one afternoon to learn about some of the most important topics of the day, hear from outstanding speakers all with hard-won experience delivering projects.

View All Past Events

February 11, 2022 | AWS, Cloud, GCP, Kubernetes, Microservices, Open Source, Software Consultancy

DZone Repost: Testing Serverless Functions

Serverless functions are easy to install and upload, but we can’t ignore the basics. This article looks at different strategies related to testing serverless functions.

September 2, 2021 | Blog, Cloud, Kubernetes

Running the Cloud from your Kubernetes Cluster

In this blog, Stuart compares the new approach of deploying cloud resources as Kubernetes custom resources rather than the (now) typical approach using Terraform – or cloud specific: CloudFormation (AWS), Deployment Manager (GCP). He also identifies what resources are suitable for this approach and which ones are not.

July 20, 2021 | Blog, Data Engineering, Kafka

Kafka vs RabbitMQ: The Consumer-Driven Choice

Message and event-driven systems provide an array of benefits to organisations of all shapes and sizes. At their core, they help decouple producers and consumers so that each can work at their own pace without having to wait for the other – asynchronous processing at its best.

In fact, such systems enable a whole range of messaging patterns, offering varying levels of guarantees surrounding the processing and consumption options for clients. Take for example the publish/subscribe pattern, which enables one message to be broadcast and consumed by multiple consumers; or the competing consumer pattern, which enables a message to be processed once but with multiple concurrent consumers vying for the honour—essentially providing a way to distribute the load. The manner in which these patterns are actually realised however, depends a lot on the technology used, as each has its own approach and unique tradeoffs.

In this article we will explore how this all applies to RabbitMQ and Apache Kafka, and how these two technologies differ, specifically from a message consumer’s perspective.

April 20, 2021 | Data Engineering, Machine Learning, Software Consultancy

Machine Learning at scale: first impressions of Kubeflow

Our recent client was a Fintech who had ambitions to build a Machine Learning platform for real-time decision making. The client had significant Kubernetes proficiency, ran on the cloud, and had a strong preference for using free, open-source software over cloud-native offerings that come with lock-in. Several components were spiked with success (feature preparation with Apache Beam and Seldon for model serving performed particularly strongly). Kubeflow was one of the next technologies on our list of spikes, showing significant promise at the research stage and seemingly a good match for our client’s priorities and skills.

That platform slipped down the client’s priority list before completing the research for Kubeflow, so I wanted to see how that project might have turned out. Would Kubeflow have made the cut?

December 11, 2020 | Cloud, Cloud Native, Kubernetes, Microservices

WebAssembly – Where is it going?

“WebAssembly is a safe, portable, low-level code format designed for efficient execution and compact representation.” – W3C

In this blog, I’ll cover the different applications of Wasm and WASI, some of the projects that are making headway, and the implications for modern architectures and distributed systems.

September 22, 2020 | AWS, Blog, Cassandra, Cloud, DevOps, Open Source

Decision time with AWS Keyspaces

With the upcoming Cassandra 4.0 release, there is a lot to look forward to. Most excitingly, and following a refreshing realignment of the Open Source community around Cassandra, the next release promises to focus on fundamentals: stability, repair, observability, performance and scaling.

We must set this against the fact that Cassandra ranks pretty highly in the Stack Overflow most dreaded databases list and the reality that Cassandra is expensive to configure, operate and maintain. Finding people who have the prerequisite skills to do so is challenging.

May 31, 2018 | DevOps

Self-testing infrastructure-as-code

As traditional operations has embraced the concept of code, it has benefited from ideas already prevalent in developer circles such as version control. Version control brings the benefit that not only can you see what the infrastructure was, but you can also get reviews of changes by your peers before the change is made live; known to most developers as Pull Request (PR) reviews.

May 16, 2018 | Microservices

Heuristics for Identifying Service Boundaries

To identify service boundaries, it is not enough to consider (business) domains only. Other forces like organisational communication structures, and – very important – time, strongly suggest that we should include several other criteria in our considerations.

April 18, 2018 | Microservices

Microservices Anti-patterns: It’s All About the People

Quite a few of the anti-patterns we observe today on microservices projects are strongly related to how people approach the problem. Given their nature, these anti-patterns tend to be deeply ingrained and self-sustaining. Addressing them starts with increased awareness and by changing ways of approaching the problem, rather than by the introduction of yet another technical tool or framework.

January 23, 2018 | Data Engineering, DevOps

A Pragmatic Introduction to Machine Learning for DevOps Engineers

Machine Learning is a hot topic these days, as can be seen from search trends. It was the success of Deepmind and AlphaGo in 2016 that really brought machine learning to the attention of the wider community and the world at large.

October 24, 2017 | Data Engineering

Q&A with Cockroach Labs – creators of CockroachDB

Cockroach Labs, the creators of CockroachDB are coming to London for the first time since their 1.0 GA Release in May 2017. They will be taking time to talk about “The Hows & Whys of a Distributed SQL Database” at the Applied Data Engineering meetup, hosted and run by us here at OpenCredo.
We have been interested in CockroachDB for a while now, including publishing our initial impressions of the release on our blog. We thought this would be the perfect time to do a bit of a Q&A before the event! I posed Raphael Poss, a core Software Engineer at Cockroach Labs a few questions.

August 8, 2017 | Cassandra

Riak, the Dynamo paper and life beyond Basho

Recently, the sad news has emerged that Basho, which developed the Riak distributed database, has gone into receivership. This would appear to present a problem for those who have adopted the commercial version of the Riak database (Riak KV) supported by Basho.

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.

[Past event] DevOpsCon 2017

Join Daniel Bryant and Antonio Cobo at DevOpsCon 2017; The Conference for Continuous Delivery, Microservices, Docker, Clouds & Lean Business! DevOpsCon will be running from the 12th-15th of June, and offers you a glimpse at popular topics such as innovative infrastructure and modern lean business culture through hands-on workshops, sessions and keynotes.

View All Past Events

June 15, 2017 | Data Engineering

CockroachDB: First Impressions

CockroachDB is a distributed SQL (“NewSQL”) database developed by Cockroach Labs and has recently reached a major milestone: the first production-ready, 1.0 release. We at OpenCredo have been following the progress of CockroachDB for a while, and we think it’s a technology of great potential to become the go-to solution for a having a general-purpose database in the cloud.

May 9, 2017 | Cassandra

Testing a Spark Application

Data analytics isn’t a field commonly associated with testing, but there’s no reason we can’t treat it like any other application. Data analytics services are often deployed in production, and production services should be properly tested. This post covers some basic approaches for the testing of Cassandra/Spark code. There will be some code examples, but the focus is on how to structure your code to ensure it is testable!

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.

February 13, 2017 | Data Engineering

Event Replaying with Hazelcast Jet

One of the stated intentions behind the design of Java 8’s Streams API was to take better advantage of the multi-core processing power of modern computers. Operations that could be performed on a single, linear stream of values could also be run in parallel by splitting that stream into multiple sub-streams, and combining the results from processing each sub-stream as they became available.

January 25, 2017 | Cassandra

The Three ‘R’s of Distributed Event Processing

One of the simplest and best-understood models of computation is the Finite State Machine (FSM). An FSM has fixed range of states it can be in, and is always in one of these states. When an input arrives, this triggers a transition in the FSM from its current state to the next state. There may be several possible transitions to several different states, and which transition is chosen depends on the input.

October 4, 2016 | Software Consultancy

Announcing GOTO Accelerate 2016: Business Strategy, Situational Awareness and Innovation

As many of you know, OpenCredo are part of the global Trifork family, and as such have access to the combined knowledge and experience of many technology and business leaders throughout the group. Getting public access to all of this expertise and technical leadership can be tricky – until now. GOTO Accelerate is a one-day business focused conference that has emerged from the very successful GOTO technology events.

September 27, 2016 | Cassandra, Data Engineering

Common Problems with Cassandra Tombstones

If there is one thing to understand about Cassandra, it is the fact that it is optimised for writes. In Cassandra everything is a write including logical deletion of data which results in tombstones – special deletion records. We have noticed that lack of understanding of tombstones is often the root cause of production issues our clients experience with Cassandra. We have decided to share a compilation of the most common problems with Cassandra tombstones and some practical advice on solving them.

August 24, 2016 | Cassandra

Fulfilling the promise of Apache Cassandra performance

At OpenCredo we are seeing an increase in adoption of Apache Cassandra as a leading NoSQL database for managing large data volumes, but we have also seen many clients experiencing difficulty converting their high expectations into operational Cassandra performance. Here we present a high-level technical overview of the major strengths and limitations of Cassandra that we have observed over the last few years while helping our clients resolve the real-world issues that they have experienced.

June 15, 2016 | Software Consultancy

You Are Ignoring Non-functional Testing

It’s as simple as that – and as a consultant, it’s a problem I see all the time. Testing is always focused on functional testing. Non-functional testing, by comparison, is treated like a second class citizen. This means that functional requirements get refined, and non-functional requirements are ignored until the very end.

June 3, 2016 | Software Consultancy

The Destructor Pattern

Complexity warning

In this post, I’m going to take something extremely simple, unfold it into something disconcertingly complex, and then fold it back into something relatively simple again. The exercise isn’t entirely empty: in the process, we’ll derive a more powerful (because more generic) version of the extremely simple thing we started with. I’m describing the overall shape of the journey now, because programmers who don’t love complexity for its own sake often find the initial “unfolding” stage objectionable, and then have trouble regarding the eventual increase in fanciness as worth the struggle.

May 31, 2016 | Kubernetes

Introducing KubeFuse: A File System for Kubernetes

Do you ever wake up and think to yourself: oh geez, Kubernetes is awesome, but I wish I could browse and edit my services and replication controllers using the file system? No? Well, in any case, this is now possible.

March 17, 2016 | Software Consultancy

Easy API Simulation with Hoverfly JUnit Rule

In order to be able to regularly release an application, your automated tests must be set up to give you fast and reliable feedback loop. If bugs are only found during a long and expensive multi-service or end-to-end test run, it can be a hinderance to fast delivery. Unfortunately I have often seen this problem in development environments: a huge suite of clunky, flaky and slow end-to-end tests which test the full functionality of the application as opposed to being more lightweight and reflecting basic user journeys. This produces the “ice cream cone” anti-pattern of test coverage, where unit tests aren’t providing the kind of coverage and feedback they need to.

March 3, 2016 | Software Consultancy

React/Redux boilerplate

In this post, I’ll be sharing some React/Redux boilerplate code that Vince Martinez and I have been developing recently. It’s primarily aimed at developers who are familiar with the React ecosystem, so if you are new to React and/or Redux, you might like to have a look at Getting Started with React and Getting Started with Redux.

January 15, 2016 | Software Consultancy

Facebook’s React, and the Signal:Noise Ratio

So, you’ve started to hear a lot about React, the Javascript library developed by Facebook, but is it something you need to investigate? It’s time to distil the signal from the noise, position React amongst its rivals, and provide an indication of where it currently would – and would not – be a suitable fit.

January 8, 2016 | Microservices

The Seven Deadly Sins of Microservices (Redux)

Many of our clients are in the process of investigating or implementing ‘microservices’, and a popular question we often get asked is “what’s the most common mistake you see when moving towards a microservice architecture?”. We’ve seen plenty of good things with this architectural pattern, but we have also seen a few recurring issues and anti-patterns, which I’m keen to share here.

December 1, 2015 | Software Consultancy

(Spring) Booting Hazelcast

This post introduce some of the basic features of Hazelcast, some of its limitations, how to embed it in a Spring Boot application and write integration testings. This post is intended to be the first of a series about Hazelcast and its integration with Spring (Boot). Let’s start from the basics.

November 23, 2015 | Software Consultancy

Ruby Web Acceptance Testing Framework

Over a year ago, my colleague Tristan posted on the OpenCredo blog about a test automation quick start framework. It’s a prepackaged framework you can clone and get going with testing instantly, rather than wasting your time rebuilding your framework every single project. We have used this framework successfully used on many of our internal projects, and it relies upon a Java, Cucumber-JVM and Selenium stack.

October 18, 2015 | Cloud, DevOps

Our Thoughts on DevOps and Cloud at JAX London

DevOps, Cloud and Microservices: “All Hail the Developer King/Queen”

Last week Steve Poole and I were once again back at the always informative JAX London conference talking about DevOps and the Cloud. This presentation built upon our previous DevOps talk that was presented last year, and focused on the experiences that Steve and I had encountered over the last year (the slides for our 2014 “Moving to a DevOps” mode talk can be found on SlideShare, and the video on Parleys).

October 16, 2015 | Software Consultancy

Join us at the Inaugural ContainerSched Conference

Interested in Containers and Schedulers?

OpenCredo is helping Skillsmatter with the organisation of the inaugural ContainerSched conference, and we were last night in attendance at CodeNode, working our way to finalising the program alongside the Skillsmatter team. I’m pleased to say that the provisional lineup looks great (speaker acceptance emails are being sent out over the next few days), and so I wanted to share the details of some of the great content we have confirmed already.

October 1, 2015 | Data Engineering

Introduction to Akka Streams – Getting started

Going reactive

Akka Streams, the new experimental module under Akka project has been finally released in July after some months of development and several milestone and RC versions. In this series I hope to gently introduce concepts from the library and demonstrate how it can be used to address real-life stream processing challenges.

Akka Streams is an implementation of the Reactive Streams specification on top of Akka toolkit that uses actor based concurrency model. Reactive Streams specification has been created by the number of companies interested in asynchronous, non-blocking, event based data processing that can span across system boundaries and technology stacks.

August 26, 2015 | Cloud

Microservice Platforms: Some Assembly [Still] Required. Part One

The challenges of building and deploying microservices

Unless you’ve been living under a rock for the last year, you’ll undoubtedly know that microservices are the new hotness. An emerging trend that I’ve observed is that the people who are actually using microservices in production tend to be the larger well-funded companies, such as Netflix, Gilt, Yelp, Hailo etc., and each organisation has their own way of developing, building and deploying.

August 16, 2015 | Kubernetes

ContainerSched 2015 – Call for Papers Now Open!

ContainerSched 2015

ContainerShed 2015 Over the last few years there has been exponential growth in the interest of containers and schedulers – technology such as Docker, rkt, Mesos, and Kubernetes are now commonplace within the IT domain, and with the rise of microservices, these technologies are set to become even more popular. Skillsmatter are keen to drive forward the discussions and knowledge sharing within this area of technology, and have created a conference focused exclusively on containers and schedulers: ContainerSched!

August 10, 2015 | Cloud, Software Consultancy

Boot my (secure)->(gov) cloud

As a company, we at OpenCredo are heavily involved in automation and devOps based work, with a keen focus on making this a seamless experience, especially in cloud based environments. We are currently working within HMRC, a UK government department to help make this a reality as part of a broader cloud broker ecosystem project. In this blog post, I look to provide some initial insight into some of the tools and techniques employed to achieve this for one particular use case namely:
With pretty much zero human intervention, bar initiating a process and providing some inputs, a development team from any location, should be able to run “something”, which, in the end, results in an isolated, secure set of fully configured VM’s being provisioned within a cloud provider (or providers) of choice.

August 5, 2015 | Data Analysis, Data Engineering

Building a Google analytics dashboard with Python3, Tornado and deploying it on OpenShift (for free)

A few weeks ago, we thought about building a Google analytics dashboard to give us easy access to certain elements of our Google Analytics web traffic. We saw some custom dashboards for bloggers, but nothing quite right for our goal, since we wanted the data on a big screen for everyone in the office to view.

July 8, 2015 | Software Consultancy

A deep dive into Angular 2.0

I was quite excited around autumn last year when Google started to work on a new version of Angular (Angular 2.0) which promised to revolutionise web development. There were rumours that Angular 2.0 wouldn’t be backward compatible with its predecessor, and would be written in Google’s AtScript which is a JavaScript based language on top of Microsoft’s TypeScript. The lack of backwards compatibility raised some concerns, especially for the clients that we had used Angular at. But, lets not get ahead of ourselves here….

News | February 18, 2015

Supporting the G-Cloud 6 framework …

December 2, 2013 | Cassandra

New features in Cassandra 2.0 – More on Lightweight Transactions

Perhaps the most important of Cassandra’s selling points is its completely distributed architecture and its ability to easily extend the cluster with virtually any number of nodes. Implementing a classical RDBMS-style transaction consisting of “put locks on the database, modify the data, then commit the transaction”-style operations are simply not feasible in such an architecture (i.e. that doesn’t scale well).

August 2, 2013 | Software Consultancy

Configuration Management with Flexible Contexts

Configuration management was born in the pre-cloud era. Remember the days when acquiring a super powerful multi core server felt like winning the jackpot? Infrastructure was a slightly different place back then. Yet for all the recent developments in DevOps, its legacy is still with us.

February 19, 2013 | Software Consultancy

Withstanding the test of time – Part 2

How to create robust tests for Spring based applications

This blog post continues on from Part 1 which discussed types of tests and how to create robust tests. Part 2 will examine techniques to help whip a test suite in to shape and resolve common issues that slow everything down. The approaches in this post will focus on spring based applications, but the concepts can be applied to other frameworks too.

January 10, 2013 | DevOps

A dive into saltstack

Recently I have started looking into SaltStack as a solution that does both config management and orchestration. It is a relatively new project started in 2011, but it has a growing fanbase among Sys Admins and DevOps Engineers. In this blog post I will look into Salt as a promising alternative, and comparing it to Puppet as a way of exploring its basic set of features.

January 7, 2013 | Software Consultancy

The factors pushing many organisations towards Continuous Delivery with Infrastructure as Code and what’s next…

The practice of continuous integration in which build servers are used to build and perform testing of code is now widespread and mainstream.

While not all teams have adopted continuous integration effectively, its increasing adoption has led many to start to look for additional opportunities to improve the cost, quality and speed of delivery with which software targeted to meet business needs can be released into production environments.

Traditionally Continuous Integration addresses the question of “does the software build and pass our unit and integration test suites?”. This is often insufficient.

December 18, 2012 | Software Consultancy

Withstanding the test of time

The first thing most people think of when they start a project with the good intentions of test driven development is: write a test first. That’s great, and something I would fully encourage. However, diving in to writing tests without forethought, especially on large projects with a lot of developers can lead to new problems that TDD is not going to solve. With some upfront thinking (but not big upfront design!) a large team can avoid problems later down the line by considering some important and desirable traits of a large and rapidly changing test suite.

February 13, 2012 | Software Consultancy

48 hours – The Pursuit of Proper Performance

Recently we were approached by a client to do some performance testing of the web application they had written. The budget allowed two days for this task. Ok. No problem. Yes, we can. Naturally I had one or another question though…

Search results for"fast data"