December 1, 2015 | Software Consultancy
This post introduce some of the basic features of Hazelcast, some of its limitations, how to embed it in a Spring Boot application and write integration testings. This post is intended to be the first of a series about Hazelcast and its integration with Spring (Boot). Let’s start from the basics.
WRITTEN BY
I’m not writing “yet another introduction to Hazelcast”. Just a few words about it. Hazelcast is a In-Memory Data Grid tool. Beyond buzzwords, it has a number of interesting low-level features:
Above these low-level features, it provides some high-level services. It is becoming popular as distributed cache provider or to support (web) session replication. It also has a out-of-the-box MapReduce interface.
For this post I’m using a very limited set of them.
There is no Hazelcast Server.
Hazelcast is a library, deployed with (and initialised by) your Java application. The Java application launch the Hazelcast instance on starting, than it joins the cluster. A thin-client option allows connecting to external nodes, without storing any data locally.
Starting a Hazelcast instance in a Spring Boot application is easy:
The following example uses Spring Boot 1.3.0. This provides Hazelcast 3.5.3 as managed dependency.
…
org.springframework.boot
spring-boot-starter-parent
1.3.0.RELEASE
…
com.hazelcast
hazelcast
…
@Configuration
public class HazelcastConfiguration {
@Bean
public Config config() {
return new Config(); // Set up any non-default config here
}
}
Spring Boot automatically start a Hazelcast instance when it finds both:
On startup, an instance of com.hazelcast.core.HazelcastInstance is then added to the Spring Application Context.
The interface for creating the Config object is fluent:
return new Config().addMapConfig(
new MapConfig()
.setName("accepted-messages")
.setEvictionPolicy(EvictionPolicy.LRU)
.setTimeToLiveSeconds(2400))
.setProperty("hazelcast.logging.type","slf4j");
Only the Config bean need to be changed, moving beyond development toward production environments. From a single Hazelcast instance to a cluster. Spring Profiles and Spring Boot Cloud features may be easily used for controlling that.
Also switching from a “heavyweight” cluster member, holding data in its own memory, to a lightweight client, connecting to external Hazelcast instances, is controlled by the Configuration bean.
The HazelcastInstance is your interface to the Hazelcast cluster.
@Service
public class ChatService {
@Autowired
private HazelcastInstance instance;
private void send(ChatMessage message) {
instance.getQueue(“messages”).offer(message);
}
...
}
The HazelcastInstance is heavy weight to start. Must be singleton. But retrieving a distributed object from the instance is lightweight: It only returns a proxy to the distributed object.
To demonstrate some of the basic features and the integration with Spring Boot, I’m using is a very simple application. A chat, exposing a REST API:
https://github.com/opencredo/springboot-hazelcast-example/tree/master
Messages are held in (distributed) Queues before being polled by recipients.
To make the send operation idempotent (dropping duplicated messages), received messages IDs are stored in a collection. This collection has a TTL, not to keep them forever. Note that I had to use a (distributed) Map and not a Set, as only Maps supports TTL at the moment.
Note that this application has many limitations. For example:
Integration tests are fundamental, as there is no way to effectively mock Hazelcast.
If you rely only on their java.util.* or java.util.concurrent.* interfaces, you might mock the distributed collections. But as they do not behave exactly as the local implementations, it is not a good idea.
The sample application includes a couple of test examples, at two different levels.
The first example (ChatServiceHazelcastImplTest) is an integration test at service level. A quasi-unit test not using Spring. Only the service under test and a single Hazelcast Instance are loaded.
The HazelcastInstance is initialised before every test:
private ChatService service;
@Before
public void setUp() {
final HazelcastInstance instance = Hazelcast.newHazelcastInstance();
service = new ChatServiceImpl(instance);
}
…and shutdown after:
@After
public void shutdown( ) {
Hazelcast.shutdownAll();
}
This is time-consuming, but works fine and is essential to keep tests independent.
Note that we are using a single Hazelcast instance for testing, even though the application may be deployed on multiple nodes forming a cluster. Clustering is really transparent to the implementation, at least until you have to test network partition resilience or performances.
Beyond initialisation, the test implementation has nothing special:
@Test(timeout = 10000L)
public void testSendAndReceive() throws Exceptions {
final ChatMessage message = new ChatMessage("mssg-id", "sender", "recipient", "text");
service.send(message);
List received = new ArrayList<>();
while(received.isEmpty()) {
received = service.receive("recipient");
}
assertNotNull(received);
assertEquals(1, received.size());
assertEquals(message, received.get(0));
}
[added after the initial porting]
As Vik’s comment pointed out, in Hazelcast source code there is a factory for creating networkless Hazelcast instances, useful for testing: TestHazelcastInstanceFactory. This is still a real (not a mock) instance, but makes (quasi-)unit tests a bit faster.
To use it, you have to add to the POM the Hazelcast dependency with “test” classifier and “test” scope:
com.hazelcast
hazelcast
tests
test
Then use it to create the instances:
private static TestHazelcastInstanceFactory testInstanceFactory = new TestHazelcastInstanceFactory();
@Before
public void setUp() {
final HazelcastInstance instance = testInstanceFactory.newHazelcastInstance();
...
}
The second example (ChatControllerTest) is an end-to-end test of the API endpoints. This test starts the full stack and the Application Context, leveraging Spring Boot. It makes actual HTTP requests using RestAssured.
@RunWith(SpringJUnit4ClassRunner.class)
@SpringApplicationConfiguration(classes = ChatApplication.class)
@WebIntegrationTest("server.port=0") // Use a random free port
public class ChatControllerTest {
@Value("${local.server.port}")
private int port;
@Before
public void setUp() {
RestAssured.port = port;
}
...
}
Note that setup and teardown don’t even mention Hazelcast. We rely on Spring Boot for starting the Hazelcast instance. We don’t have to initialise it before every test. Unfortunately, you cannot shut it down too, for a reported, but never solved issue. So, all the tests methods reuse the same Hazelcast instance, with potential data clashes.
We know this is not good. Unfortunately, at the moment, the only way to work around it is using separate test classes for each test method. Digging into Hazelcast codebase, it appears they have written a custom Spring JUnit runner, but their tests reuse the same instance for all the test methods (sic).
Test implementations make requests and check responses, using RestAssured:
@Test
public void testSendThenReceive() throws Exception {
final String chatMessageJson = "{"messageUid" : "A123", "sender" : "aSender", " +
""recipient" : "aRecipient", "text" : "this is the text" }";
given().contentType(ContentType.JSON).body(chatMessageJson).
when().post("/messages").
then().assertThat().statusCode(200);
Thread.sleep(500);
when().get("/messages/bRecipient").
then().assertThat().statusCode(200).
and().contentType(ContentType.JSON).
and().body("size()", equalTo(1)).
and().body("[0].messageUid", equalTo("A123")).
and().body("[0].sender", equalTo("aSender")).
and().body("[0].recipient", equalTo("aRecipient")).
and().body("[0].text", equalTo("this is the text"));
}
Hazelcast has some important limitations, to be kept in mind.
Everything is in-memory. Hazelcast provides a mechanism for making collections persistent, potentially using any SQL and No-SQL external store with a Java interface. But persistence is optional, always requires a custom implementation and may seriously hit performance if not properly implemented.
The data distribution over the cluster is reasonably transparent, from the developer point of view. But beware, not everything is really distributed. Some structures are just replicated for High Availability. Maps are partitioned among cluster members (plus backups for HA). Sets, Queues and Topics are just replicated for HA. So, for example, all the entries of a Queue are in one cluster member’s memory.
Only the enterprise edition supports storing data externally of your application’s JVM Heap. The risk of OutOfMemoryError crashing your application or slowdowns triggered by the Garbage Collector is very real.
The configuration must be identical for all the nodes in the cluster and the configuration cannot be changed without restarting the member.
“Distribution” imply serialising objects over the wire, so all members must have same class version. For data, you may use DTO and implement a custom, versioned serialisation mechanism to avoid it, but for distributed processing it might be harder.
On large clusters redeploying all nodes at the same time could be impossible. These limitations might become a DevOps’ nightmares, if not properly considered from the start of the project.
The “transparency” of distributed objects may cause performance issues, if directly exposed to flaky, naive implementations.
Similarly to the n+1 queries problem when you iterate a Hibernate lazy-loaded Collection, the following simple code may cause serious problems. The Map is partitioned over the cluster and it will trigger a number of over-the-network calls:
Map<String,Person> map = instance.getMap("myMap");
List results = new ArrayList();
for(Map.Entry<String,Person> entry : map.entrySet()) {
if ( entry.getValue().getName().equals("Lorenzo") )
results = entry.getValue();
}
The trick is sending the query (value == “foo”) to all the members, returning only the entries you are interested in:
Set results = map.values( Predicates.equals("name", "Lorenzo") );
Hazelcast is actually much more than a distributed in-memory object store. The most interesting feature is the ability to distribute processing. This makes Hazelcast closer to Oracle Coherence, Gemfire or Inifinispan rather than to Redis or Ehcache.
It is fairly easy to launch parallel processes, distributed among multiple JVMs. Ideally moving the execution close to the data they are going to process. Common use cases are batch processing, analytics, ETL…
Native client is Java and available to other JVM languages. The enterprise edition also provides C++ and .Net clients. As cache provider (only), it may be used by any language with a Memcache client (e.g. a PHP). Finally, a REST API is also available:
$ curl -v -X POST -H "Content-Type: text/plain" -d "bar" http://127.0.0.1:5701/hazelcast/rest/maps/mapName/foo
Hazelcast provides more Spring supporting features I have not used in this post:
Integrating Hazelcast in a Spring Boot application is straightforward. An instance (or lightweight client) is automatically embedded and started with the Spring application, exposing a very Spring-style singleton interface.
Configuration-by-environment may be controlled by the usual Spring mechanisms, without hacks.
No Hazelcast mock so no truly unit testing. But integration testing is feasible (…and, to be honest, how tricky is mocking RDBMS?).
Hazelcast impose some important limitations to application design and CI/deployment pipeline. All of them have to be known and considered from the very start of the project.
In the next post of this series I will introduce Hazelcast transaction support and the (missing!) integration with Spring-managed transactions.
This blog is written exclusively by the OpenCredo team. We do not accept external contributions.
Agile India 2022 – Systems Thinking for Happy Staff and Elated Customers
Watch Simon Copsey’s talk from the Agile India Conference on “Systems Thinking for Happy Staff and Elated Customers.”Lean-Agile Delivery & Coaching Network and Digital Transformation Meetup
Watch Simon Copsey’s talk from the Lean-Agile Delivery & Coaching Network and Digital Transformation Meetup on “Seeing Clearly in Complexity” where he explores the Current…When Your Product Teams Should Aim to be Inefficient – Part 2
Many businesses advocate for efficiency, but this is not always the right goal. In part one of this article, we explored how product teams can…