49 items found: Search results for "big data" in all categories x
March 9, 2023 | Blog, Data Analysis, Neo4j
Check out the last part of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.
February 16, 2023 | Blog, Data Analysis, Neo4j
Check out Part 2 of Ebru Cucen and Fahran Wallace’s blog series, in which they discuss their experience ingesting 400 million nodes and a billion relationships into Neo4j and what they discovered along the way.
January 26, 2023 | Blog, Data Analysis, Neo4j
Fahran Wallace and Ebru Cucen’s most recent blog post is part 1 of a three-part series. They investigate how OpenCredo ingested 400 million nodes with a billion relationships into Neo4j.
February 1, 2023 | Blog, Data Analysis, Neo4j
As data becomes ubiquitous and deeply interconnected, tracing where who or which system that data comes from – its lineage – will create bigger problems and opportunities for us on the horizon. Watch the recording of James Bowkett talk from NODES 2022 – Neo4j Online Developer Education Summit 202 on ‘Tracing Your Data’s DNA.’
May 26, 2022 | Data Analysis, Data Engineering
As data becomes ubiquitous and deeply interconnected, tracing where who or which system that data comes from – its lineage – will create bigger problems and opportunities for us on the horizon. Watch the recording of James Bowkett’s talk from Devoxx UK on ‘Tracing Your Data’s DNA’
July 31, 2018 | Machine Learning
Machine Learning, alongside a mature Data Science, will help to bring IT and business closer together. By leveraging data for actionable insights, IT will increasingly drive business value. Agile and DevOps practices enable the continuous delivery of business value through productionised machine learning models and software delivery.
September 6, 2016 | Cassandra
A growing number of clients are asking OpenCredo for help with using Apache Cassandra and solving specific problems they encounter. Clients have different use cases, requirements, implementation and teams but experience similar issues. We have noticed that Cassandra data modelling problems are the most consistent cause of Cassandra failing to meet their expectations. Data modelling is one of the most complex areas of using Cassandra and has many considerations.
September 22, 2020 | AWS, Blog, Cassandra, Cloud, DevOps, Open Source
With the upcoming Cassandra 4.0 release, there is a lot to look forward to. Most excitingly, and following a refreshing realignment of the Open Source community around Cassandra, the next release promises to focus on fundamentals: stability, repair, observability, performance and scaling.
We must set this against the fact that Cassandra ranks pretty highly in the Stack Overflow most dreaded databases list and the reality that Cassandra is expensive to configure, operate and maintain. Finding people who have the prerequisite skills to do so is challenging.
At the time of this post, the UK is making steps to exit from an unprecedented lockdown measures for the Coronavirus. Much of the UK workforce are still making efforts to work-from-home with mainly key workers operating – at risk – in public. Many industries have shut down completely. Consequently, many businesses are reflecting on what happens next and how do we better mitigate future pandemic events?
April 2, 2020 | Machine Learning
Recent years have seen many companies consolidate all their data into a data lake/warehouse of some sort. Once it’s all consolidated, what next?
Many companies consolidate data with a field of dreams mindset – “build it and they will come”, however a comprehensive data strategy is needed if the ultimate goals of an organisation are to be realised: monetisation through Machine Learning and AI is an oft-cited goal. Unfortunately, before one rushes into the enticing world of machine learning, one should lay more mundane foundations. Indeed, in data science, estimates vary between 50% to 80% of the time taken is devoted to so-called data-wrangling. Further, Google estimates ML projects produce 5% ML code and 95% “glue code”. If this is the reality we face, what foundations are required before one can dive headlong into ML?
May 18, 2023 | Blog, Kubernetes
Check out Matthew Revell-Gordon’s latest blog as he explores building a local Kubernetes test cluster to better mimic cloud-based deployments, using Colima, Kind, and MetalLB.
September 2, 2021 | Blog, Cloud, Kubernetes
July 20, 2021 | Blog, Data Engineering, Kafka
Message and event-driven systems provide an array of benefits to organisations of all shapes and sizes. At their core, they help decouple producers and consumers so that each can work at their own pace without having to wait for the other – asynchronous processing at its best.
In fact, such systems enable a whole range of messaging patterns, offering varying levels of guarantees surrounding the processing and consumption options for clients. Take for example the publish/subscribe pattern, which enables one message to be broadcast and consumed by multiple consumers; or the competing consumer pattern, which enables a message to be processed once but with multiple concurrent consumers vying for the honour—essentially providing a way to distribute the load. The manner in which these patterns are actually realised however, depends a lot on the technology used, as each has its own approach and unique tradeoffs.
In this article we will explore how this all applies to RabbitMQ and Apache Kafka, and how these two technologies differ, specifically from a message consumer’s perspective.
April 20, 2021 | Data Engineering, Machine Learning, Software Consultancy
Our recent client was a Fintech who had ambitions to build a Machine Learning platform for real-time decision making. The client had significant Kubernetes proficiency, ran on the cloud, and had a strong preference for using free, open-source software over cloud-native offerings that come with lock-in. Several components were spiked with success (feature preparation with Apache Beam and Seldon for model serving performed particularly strongly). Kubeflow was one of the next technologies on our list of spikes, showing significant promise at the research stage and seemingly a good match for our client’s priorities and skills.
That platform slipped down the client’s priority list before completing the research for Kubeflow, so I wanted to see how that project might have turned out. Would Kubeflow have made the cut?
December 11, 2020 | Cloud, Cloud Native, Kubernetes, Microservices
“WebAssembly is a safe, portable, low-level code format designed for efficient execution and compact representation.” – W3C
In this blog, I’ll cover the different applications of Wasm and WASI, some of the projects that are making headway, and the implications for modern architectures and distributed systems.
August 14, 2019 | Culture, News
Here, in our little nook of the internet, we usually write about our experiences of emerging technologies, the tricky coding problems we’ve solved and how we have enhanced our clients’ businesses. We do this because we are very proud of this work. It truly matters.
Today is a little different. Today I’d like to share something a little more personal. Something that brings loneliness, kindness and technology together in an oxytocin-generating, slightly awkward embrace. Because hugs matter too.
May 16, 2018 | Microservices
To identify service boundaries, it is not enough to consider (business) domains only. Other forces like organisational communication structures, and – very important – time, strongly suggest that we should include several other criteria in our considerations.
August 8, 2017 | Cassandra
Recently, the sad news has emerged that Basho, which developed the Riak distributed database, has gone into receivership. This would appear to present a problem for those who have adopted the commercial version of the Riak database (Riak KV) supported by Basho.
This blog is written exclusively by the OpenCredo team. We do not accept external contributions.
July 11, 2017 | Cloud, Cloud Native
Over the years, OpenCredo’s projects have become increasingly tied to the public cloud. Our skills in delivering cloud infrastructure and cloud native applications have deepened and the range of cloud projects we are able to take on has grown. From enterprise cloud brokers to cloud platform migration in restricted compliance environments, our ability to deliver on the cloud is now a core component of our value proposition.
May 22, 2017 | Data Analysis
As a final piece of our recent blog series about Apache Spark on 16 May we have presented details of a use-case about using Spark Structured Streaming to generate real-time alerts of suspicious activity in an AWS-based infrastructure.
This blog is written exclusively by the OpenCredo team. We do not accept external contributions.
April 25, 2017 | Cassandra, Data Analysis, Data Engineering
Apache Spark is a powerful open source processing engine which is fast becoming our technology of choice for data analytic projects here at OpenCredo. For many years now we have been helping our clients to practically implement and take advantage of various big data technologies including the like of Apache Cassandra amongst others.
February 28, 2017 | Software Consultancy
As Kotlin’s 1.1 release draws closer, I’ve been looking at some of the new language features it supports. Type aliases may seem like a relatively minor feature next to coroutines, but as I will show in this blog post, they can open up a new programming idiom, particularly when combined with extension functions.
September 27, 2016 | Cassandra, Data Engineering
If there is one thing to understand about Cassandra, it is the fact that it is optimised for writes. In Cassandra everything is a write including logical deletion of data which results in tombstones – special deletion records. We have noticed that lack of understanding of tombstones is often the root cause of production issues our clients experience with Cassandra. We have decided to share a compilation of the most common problems with Cassandra tombstones and some practical advice on solving them.
August 26, 2016 | Kubernetes
This post is the first of a series of three tutorial articles introducing a sample, tutorial project, demonstrating how to provision Kubernetes on AWS from scratch, using Terraform and Ansible.
July 3, 2016 | DevOps
Several of us from the OpenCredo team were in attendance at the inaugural EU edition of the DevOps Enterprise Summit conference. We have been big fans of the two previous US versions, and have watched the video recordings of talks (2014, 2015) with keen interest as many of our DevOps transformation clients are very much operating in the ‘enterprise’ space.
May 31, 2016 | Kubernetes
Do you ever wake up and think to yourself: oh geez, Kubernetes is awesome, but I wish I could browse and edit my services and replication controllers using the file system? No? Well, in any case, this is now possible.
January 8, 2016 | Microservices
Many of our clients are in the process of investigating or implementing ‘microservices’, and a popular question we often get asked is “what’s the most common mistake you see when moving towards a microservice architecture?”. We’ve seen plenty of good things with this architectural pattern, but we have also seen a few recurring issues and anti-patterns, which I’m keen to share here.
November 3, 2015 | Software Consultancy
My JavaOne experience was rather busy this year, what with three talks presented in a single day! The first of these talks “Debugging Java Apps in Containers: No Heavy Welding Gear Required” was delivered with my regular co-presenter Steve Poole, from IBM, and we shared our combined experiences of working with Java and Docker over the past year.
November 1, 2015 | Microservices
To use or not to use hypermedia (HATEOAS) in a REST API, to attain the Level 3 of the famous Richardson Maturity Model. This is one of the most discussed subjects about API design.
The many objections make sense (“Why I hate HATEOAS“, “More objections to HATEOAS“…). The goal of having fully dynamic, auto-discovering clients is still unrealistic (…waiting for AI client libraries).
However, there are good examples of successful HATEOAS API. Among them, PayPal.
October 31, 2015 | Microservices
Over the past few weeks I’ve been writing an OpenCredo blog series on the topic of “Building a Microservice Development Ecosystem”, but my JavaOne talk of the same title crept up on me before I managed to finish the remaining posts. I’m still planning to finish the full blog series, but in the meantime I thought it would be beneficial to share the video and slides associated with the talk, alongside some of my related thinking. I’ve been fortunate to work on several interesting microservice projects at OpenCredo, and we’re always keen to share our knowledge or offer advice, and so please do get in touch if we can help you or your organisation.
October 18, 2015 | Cloud, DevOps
Last week Steve Poole and I were once again back at the always informative JAX London conference talking about DevOps and the Cloud. This presentation built upon our previous DevOps talk that was presented last year, and focused on the experiences that Steve and I had encountered over the last year (the slides for our 2014 “Moving to a DevOps” mode talk can be found on SlideShare, and the video on Parleys).
October 16, 2015 | Software Consultancy
OpenCredo is helping Skillsmatter with the organisation of the inaugural ContainerSched conference, and we were last night in attendance at CodeNode, working our way to finalising the program alongside the Skillsmatter team. I’m pleased to say that the provisional lineup looks great (speaker acceptance emails are being sent out over the next few days), and so I wanted to share the details of some of the great content we have confirmed already.
September 20, 2015 | Microservices
Over the past five years I have worked within several projects that used a ‘microservice’-based architecture, and one constant issue I have encountered is the absence of standardised patterns for local development and ‘off the shelf’ development tooling that support this. When working with monoliths we have become quite adept at streamlining the development, build, test and deploy cycles. Development tooling to help with these processes is also readily available (and often integrated with our IDEs). For example, many platforms provide ‘hot reloading’ for viewing the effects of code changes in near-real time, automated execution of tests, regular local feedback from continuous integration servers, and tooling to enable the creation of a local environment that mimics the production stack.
September 14, 2015 | Cloud, DevOps
If you are operating in the programmable infrastructure space, you will hopefully have come across Terraform, a tool from HashiCorp which is primarily used to manage infrastructure resources such as virtual machines, DNS names and firewall settings across a number of public and private providers (AWS, GCP, Azure, …).
August 5, 2015 | Cloud, GCP, Kubernetes
Why OpenCredo partnered with Google
Recently OpenCredo chose to partner with Google in order to share knowledge and resources around the Google Cloud Platform offerings. Our clients come in many shapes and sizes, but typically all of them realise three disruptive truths of the modern IT industry: the (economic) value of cloud; the competitive advantage of continuous delivery; and the potential of hypothesis and data-driven product development to increase innovation (as popularised by the Lean Startup / Lean Enterprise motto of ‘build, measure, learn’).
February 16, 2015 | Software Consultancy
Apache Mesos is often explained as being a kernel for the data-centre; meaning that cluster resources (CPU, RAM, …) are tracked and offered to “user space” programs (i.e. frameworks) to do computations on the cluster.
November 19, 2014 | Microservices
Undeniably, there is a growing interest in microservices as we see more organisations, big and small, evaluating and implementing this emerging approach. Despite its apparent novelty, most concepts and principles underpinning microservices are not exactly new – they are simply proven and commonsense software development practices that now need to be applied holistically and at a wider scale, rather than at the scale of a single program or machine. These principles include separation of concerns, loose coupling, scalability and fault-tolerance.
February 24, 2014 | Cloud Native, Microservices
Last year some of us attended the London Spring eXchange where we encountered a new and interesting tool that Pivotal was working on: Spring Boot. Since then we had the opportunity to see what it’s capable of in a live project and we were deeply impressed.
January 28, 2014 | Software Consultancy
A while ago I published this blog post about writing tests for mobile applications using Appium and cucumber-jvm.
In this post, I will look at an alternative approach to testing an Android native application using Cucumber-Android.
Throughout the post I will draw comparisons between Appium and Cucumber-Android, the goal being to determine the best approach for testing an android application using Cucumber. I will focus on the ease of configuration and use, speed of test runs and quality of reporting.
February 19, 2013 | Software Consultancy
This blog post continues on from Part 1 which discussed types of tests and how to create robust tests. Part 2 will examine techniques to help whip a test suite in to shape and resolve common issues that slow everything down. The approaches in this post will focus on spring based applications, but the concepts can be applied to other frameworks too.
January 10, 2013 | DevOps
Recently I have started looking into SaltStack as a solution that does both config management and orchestration. It is a relatively new project started in 2011, but it has a growing fanbase among Sys Admins and DevOps Engineers. In this blog post I will look into Salt as a promising alternative, and comparing it to Puppet as a way of exploring its basic set of features.
March 21, 2012 | Software Consultancy
Event processing Language (EPL) enables us to write complex queries to get the most out of our event stream in real time, using SQL-like syntax.
EPL allows us to use full power of aggregation of the high volume event stream to get required results with the minimal latency. In this blog we are going to explore some aspects of numerical aggregation of data with high precision BigDecimal values. We will also demonstrate how you can add you own aggregation function to Esper engine and use them in EPL statements.
February 8, 2012 | Data Analysis, Data Engineering
Most of the important players in this space are large IT corporations like Oracle and IBM with their commercial (read expensive) offerings.
While most of CEP products offer some great features, it’s license model and close code policy doesn’t allow developers to play with them on pet projects, which would drive adoption and usage of CEP in every day programming.