(Spring) Booting Hazelcast

What is Hazelcast?

I’m not writing “yet another introduction to Hazelcast”, just a few words about it. Hazelcast is an In-Memory Data Grid tool. Beyond buzzwords, it has a number of interesting low-level features:

Distributed collections: Sets, Lists, Maps…
Distributed primitives for multi-JVM concurrency: Locks, Semaphores, Atomic Longs, Atomic References …
Distributed messaging support: Topics, Queues
Distributed execution: Executor Services, Entry processing, Distributed Queries…
Distributed transactions, potentially participating in XA transactions.

On top of these low-level features, it provides some high-level services. It is becoming popular as a distributed cache provider or to support (web) session replication. It also has an out-of-the-box MapReduce interface.

For this post I’m using a very limited set of them.

Deployment

There is no Hazelcast Server.

Hazelcast is a library, deployed with (and initialised by) your Java application. The Java application launches the Hazelcast instance on startup, then the instance joins the cluster. A thin-client option allows connecting to external nodes, without storing any data locally.

Hazelcast in a Spring Boot Application

Starting a Hazelcast instance in a Spring Boot application is easy:

Include the com.hazelcast:hazelcast dependency
Initialise a com.hazelcast.config.Config Spring bean <– I personally consider this option more à la Spring Boot
OR
Put hazelcast.xml in the classpath root
OR
Put a Hazelcast XML config file at the location specified by the spring.hazelcast.config property
(see Spring Boot docs)

The following example uses Spring Boot 1.3.0, which provides Hazelcast 3.5.3 as a managed dependency.

pom.xml

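…
<parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>1.3.0.RELEASE</version>
</parent>
…
<dependency>
  <groupId>com.hazelcast</groupId>
  <artifactId>hazelcast</artifactId>
</dependency>
…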

Configuration

@Configuration
public class HazelcastConfiguration {
 @Bean
 public Config config() {
   return new Config(); // Set up any non-default config here
 }
}

Spring Boot automatically starts a Hazelcast instance when it finds both:

Hazelcast on the classpath
A com.hazelcast.config.Config bean.

On startup, an instance of com.hazelcast.core.HazelcastInstance is then added to the Spring Application Context.

From Development toward Production

The interface for creating the Config object is fluent:

return new Config().addMapConfig( 
  new MapConfig()
    .setName("accepted-messages")
    .setEvictionPolicy(EvictionPolicy.LRU)
    .setTimeToLiveSeconds(2400))
  .setProperty("hazelcast.logging.type","slf4j");

Only the Config bean needs to change when moving beyond development toward production environments, from a single Hazelcast instance to a cluster. Spring Profiles and Spring Boot Cloud features may easily be used to control that.

Also switching from a “heavyweight” cluster member, holding data in its own memory, to a lightweight client, connecting to external Hazelcast instances, is controlled by the Configuration bean.

Hazelcast Interface

The HazelcastInstance is your interface to the Hazelcast cluster.

@Service
public class ChatService {
 @Autowired
 private HazelcastInstance instance;
 
 public void send(ChatMessage message) {
   instance.getQueue("messages").offer(message);
 }
 ...
}

The HazelcastInstance is heavyweight to start, so it must be a singleton. But retrieving a distributed object from the instance is lightweight: it only returns a proxy to the distributed object.

A Sample Messaging Application

To demonstrate some of the basic features and the integration with Spring Boot, I’m using a very simple application: a chat, exposing a REST API:
https://github.com/opencredo/springboot-hazelcast-example/tree/master

Messages are held in (distributed) Queues before being polled by recipients.

To make the send operation idempotent (dropping duplicated messages), the IDs of received messages are stored in a collection. This collection has a TTL, so the IDs are not kept forever. Note that I had to use a (distributed) Map and not a Set, as only Maps support TTL at the moment.
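The de-duplication idea can be sketched with plain java.util.concurrent interfaces. This is a minimal sketch, not the code of the sample application: the class and method names are hypothetical, and messages are reduced to Strings. Hazelcast's IMap extends ConcurrentMap and IQueue extends BlockingQueue, so the distributed objects could be passed in the same way. The TTL does not appear in the code, because it is assumed to be set on the map configuration, as in the MapConfig example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: names are illustrative, not taken from the sample application.
class IdempotentSender {
    // With Hazelcast this could be instance.getMap("accepted-messages")
    private final ConcurrentMap<String, Boolean> acceptedIds;
    // With Hazelcast this could be instance.getQueue("messages")
    private final BlockingQueue<String> queue;

    IdempotentSender(ConcurrentMap<String, Boolean> acceptedIds, BlockingQueue<String> queue) {
        this.acceptedIds = acceptedIds;
        this.queue = queue;
    }

    /** Returns true if the message was queued, false if its ID had already been accepted. */
    boolean send(String messageId, String text) {
        // putIfAbsent is atomic: only the first caller for a given ID gets null back
        if (acceptedIds.putIfAbsent(messageId, Boolean.TRUE) != null) {
            return false; // duplicate, silently dropped
        }
        return queue.offer(text);
    }

    /** Drains every message currently waiting in the queue. */
    List<String> receive() {
        List<String> drained = new ArrayList<>();
        queue.drainTo(drained);
        return drained;
    }

    public static void main(String[] args) {
        IdempotentSender sender =
                new IdempotentSender(new ConcurrentHashMap<>(), new LinkedBlockingQueue<>());
        sender.send("A123", "first attempt");
        sender.send("A123", "retry of the same message"); // dropped as a duplicate
        System.out.println(sender.receive());
    }
}
```

Using a Map keyed by message ID, rather than a Set, costs nothing here: the value is a dummy flag and only the key matters.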

Note that this application has many limitations. For example:

Queues have no TTL and no persistence: They might ‘explode’ if a recipient never polls, and will be lost if all nodes restart.
The recipient polling is not transactional: If any error occurs during the operation, messages are lost.

(Integration) Testing: No Mock Available

Integration tests are fundamental, as there is no way to effectively mock Hazelcast.

If you rely only on their java.util.* or java.util.concurrent.* interfaces, you might mock the distributed collections. But as they do not behave exactly as the local implementations, it is not a good idea.
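One concrete difference, as a sketch: with the default binary in-memory format, a Hazelcast Map stores serialized values, so get() hands back a copy and changing that copy does nothing until you put() it again. A local HashMap used as a mock returns the very same object and hides the problem. The CopyingMap below is a hypothetical stand-in that imitates this behaviour with plain Java serialization; it is not Hazelcast code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class CopySemanticsDemo {
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    /** Hypothetical stand-in for a distributed map: stores bytes, returns fresh copies. */
    static class CopyingMap<V extends Serializable> {
        private final Map<String, byte[]> store = new HashMap<>();

        void put(String key, V value) {
            store.put(key, toBytes(value));
        }

        @SuppressWarnings("unchecked")
        V get(String key) {
            byte[] bytes = store.get(key);
            return bytes == null ? null : (V) fromBytes(bytes);
        }
    }

    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Increments the counter stored under "c" in place (no put) and reads it again. */
    static int incrementWithoutPut(Function<String, Counter> getter) {
        getter.apply("c").value++;
        return getter.apply("c").value;
    }
}
```

Called with a HashMap's get, incrementWithoutPut sees the increment; called with the CopyingMap's get, the increment is lost. A service that passes its tests against the local mock could therefore still be wrong against the real cluster.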

The sample application includes a couple of test examples, at two different levels.

Integration Tests at Service Level

The first example (ChatServiceHazelcastImplTest) is an integration test at service level. A quasi-unit test not using Spring. Only the service under test and a single Hazelcast Instance are loaded.

The HazelcastInstance is initialised before every test:

private ChatService service;

@Before
public void setUp() {
   final HazelcastInstance instance = Hazelcast.newHazelcastInstance();
   service = new ChatServiceImpl(instance);
}

…and shutdown after:

@After
public void shutdown( ) {
  Hazelcast.shutdownAll();
}

This is time-consuming, but works fine and is essential to keep tests independent.

Note that we are using a single Hazelcast instance for testing, even though the application may be deployed on multiple nodes forming a cluster. Clustering is really transparent to the implementation, at least until you have to test network partition resilience or performance.

Beyond initialisation, the test implementation has nothing special:

@Test(timeout = 10000L)
public void testSendAndReceive() throws Exception {
  final ChatMessage message = new ChatMessage("mssg-id", "sender", "recipient", "text");
  service.send(message);

  List<ChatMessage> received = new ArrayList<>();
  while(received.isEmpty()) {
    received = service.receive("recipient");
  }
  assertNotNull(received);
  assertEquals(1, received.size());
  assertEquals(message, received.get(0));
}

[added after the initial porting]
As Vik’s comment pointed out, in Hazelcast source code there is a factory for creating networkless Hazelcast instances, useful for testing: TestHazelcastInstanceFactory. This is still a real (not a mock) instance, but makes (quasi-)unit tests a bit faster.

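To use it, you have to add the Hazelcast dependency to the POM with the “tests” classifier and “test” scope:

<dependency>
  <groupId>com.hazelcast</groupId>
  <artifactId>hazelcast</artifactId>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>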

Then use it to create the instances:

private static TestHazelcastInstanceFactory testInstanceFactory = new TestHazelcastInstanceFactory();
@Before
public void setUp() {
  final HazelcastInstance instance = testInstanceFactory.newHazelcastInstance();
  ...
}

End-to-end Integration Tests

The second example (ChatControllerTest) is an end-to-end test of the API endpoints. This test starts the full stack and the Application Context, leveraging Spring Boot. It makes actual HTTP requests using RestAssured.

@RunWith(SpringJUnit4ClassRunner.class)
@SpringApplicationConfiguration(classes = ChatApplication.class)
@WebIntegrationTest("server.port=0") // Use a random free port
public class ChatControllerTest {
  @Value("${local.server.port}")
  private int port;

  @Before
  public void setUp() {
    RestAssured.port = port;
  }
  ...
}

Note that setup and teardown don’t even mention Hazelcast. We rely on Spring Boot for starting the Hazelcast instance, so we don’t have to initialise it before every test. Unfortunately, you cannot shut it down either, because of a reported but never solved issue. So all the test methods reuse the same Hazelcast instance, with potential data clashes.

We know this is not good. Unfortunately, at the moment, the only way to work around it is using a separate test class for each test method. Digging into the Hazelcast codebase, it appears they have written a custom Spring JUnit runner, but their tests reuse the same instance for all the test methods (sic).

Test implementations make requests and check responses, using RestAssured:

@Test
public void testSendThenReceive() throws Exception {
  final String chatMessageJson = "{\"messageUid\" : \"A123\", \"sender\" : \"aSender\", " +
     "\"recipient\" : \"aRecipient\", \"text\" : \"this is the text\" }";

  given().contentType(ContentType.JSON).body(chatMessageJson).
  when().post("/messages").
  then().assertThat().statusCode(200);
  Thread.sleep(500);

  when().get("/messages/aRecipient").
  then().assertThat().statusCode(200).
  and().contentType(ContentType.JSON).
  and().body("size()", equalTo(1)).
  and().body("[0].messageUid", equalTo("A123")).
  and().body("[0].sender", equalTo("aSender")).
  and().body("[0].recipient", equalTo("aRecipient")).
  and().body("[0].text", equalTo("this is the text"));
}

Some Limitations: Memory and Redeployment

Hazelcast has some important limitations, to be kept in mind.

Everything is in-memory. Hazelcast provides a mechanism for making collections persistent, potentially using any SQL and No-SQL external store with a Java interface. But persistence is optional, always requires a custom implementation and may seriously hit performance if not properly implemented.

The data distribution over the cluster is reasonably transparent, from the developer’s point of view. But beware, not everything is really distributed. Some structures are just replicated for High Availability. Maps are partitioned among cluster members (plus backups for HA). Sets, Queues and Topics are just replicated for HA. So, for example, all the entries of a Queue are in one cluster member’s memory.
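A rough sketch of why this happens, under stated simplifications. Hazelcast splits the data space into a fixed number of partitions (271 by default) and derives the partition of a Map entry from the hash of its serialized key. A Queue, by contrast, is placed as a whole, based on its name. In the sketch below hashCode() stands in for the real hash and partitions are assigned to members round-robin; both are assumptions made for illustration, and none of this is Hazelcast code.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration only; this is not Hazelcast code.
class PartitionSketch {
    static final int PARTITION_COUNT = 271; // Hazelcast's default number of partitions

    // Assumption: hashCode() stands in for the hash of the serialized key.
    static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Assumption: partitions are spread round-robin over the members.
    static int memberOf(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // Counts how many of the keys "user-0" .. "user-(entries-1)" each member would own.
    static Map<Integer, Integer> entriesPerMember(int entries, int memberCount) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < entries; i++) {
            int member = memberOf(partitionOf("user-" + i), memberCount);
            counts.merge(member, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Map entries are spread over the members, key by key
        System.out.println("Map entries per member: " + entriesPerMember(1000, 3));
        // A queue is placed by its name, so all of its items share one member
        System.out.println("Queue 'messages' lives on member "
                + memberOf(partitionOf("messages"), 3));
    }
}
```

The practical consequence is the one stated above: a Map grows with the cluster, while a single Queue can never hold more than one member's memory allows.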

Only the enterprise edition supports storing data outside your application’s JVM Heap. The risk of an OutOfMemoryError crashing your application, or of slowdowns triggered by the Garbage Collector, is very real.

The configuration must be identical for all the nodes in the cluster and the configuration cannot be changed without restarting the member.

“Distribution” implies serialising objects over the wire, so all members must have the same class versions. For data, you may use DTOs and implement a custom, versioned serialisation mechanism to avoid it, but for distributed processing it might be harder.
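One way such a versioned mechanism could look, as a minimal sketch: the DTO writes a version byte first, and the reader uses it to decide which fields to expect. PersonDto, its fields and the version numbers are hypothetical. The read and write methods are written against java.io.DataInput and DataOutput; Hazelcast's ObjectDataInput and ObjectDataOutput extend those interfaces, so the same logic could be called from a DataSerializable implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical DTO; field names and version numbers are illustrative.
class PersonDto {
    static final byte CURRENT_VERSION = 2;

    final String name;
    final String email; // added in version 2; null when read from a version 1 payload

    PersonDto(String name, String email) {
        this.name = name;
        this.email = email;
    }

    void writeTo(DataOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION);  // the version always goes first
        out.writeUTF(name);
        out.writeBoolean(email != null); // presence flag for the optional field
        if (email != null) {
            out.writeUTF(email);
        }
    }

    static PersonDto readFrom(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version < 1 || version > CURRENT_VERSION) {
            throw new IOException("Unsupported PersonDto version: " + version);
        }
        String name = in.readUTF();
        String email = null;
        if (version >= 2 && in.readBoolean()) {
            email = in.readUTF(); // only version 2 payloads carry this field
        }
        return new PersonDto(name, email);
    }

    // Convenience wrappers for trying the format without a cluster
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeTo(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PersonDto fromBytes(byte[] bytes) {
        try {
            return readFrom(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only covers new code reading old data. A member still running the old class rejects payloads with a higher version, so during a rolling upgrade the writers would have to keep emitting the old format until every member is upgraded.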

On large clusters, redeploying all nodes at the same time could be impossible. These limitations might become a DevOps nightmare, if not properly considered from the start of the project.

The Risks of Transparency

The “transparency” of distributed objects may cause performance issues, if directly exposed to flaky, naive implementations.

Similarly to the n+1 queries problem when you iterate a Hibernate lazy-loaded Collection, the following simple code may cause serious problems. The Map is partitioned over the cluster and it will trigger a number of over-the-network calls:

IMap<String,Person> map = instance.getMap("myMap");
List<Person> results = new ArrayList<>();
for(Map.Entry<String,Person> entry : map.entrySet()) {
 if ( entry.getValue().getName().equals("Lorenzo") )
   results.add(entry.getValue());
}

The trick is sending the query (name == “Lorenzo”) to all the members, returning only the entries you are interested in:

// values(Predicate) is defined on IMap, not on java.util.Map
Collection<Person> results = map.values( Predicates.equal("name", "Lorenzo") );

More than just Distributed Objects

Hazelcast is actually much more than a distributed in-memory object store. The most interesting feature is the ability to distribute processing. This makes Hazelcast closer to Oracle Coherence, Gemfire or Infinispan than to Redis or Ehcache.

It is fairly easy to launch parallel processes, distributed among multiple JVMs, ideally moving the execution close to the data they are going to process. Common use cases are batch processing, analytics, ETL…

The native client is Java and is available to other JVM languages. The enterprise edition also provides C++ and .Net clients. As a cache provider (only), it may be used from any language with a Memcache client (e.g. PHP). Finally, a REST API is also available:

$ curl -v -X POST -H "Content-Type: text/plain" -d "bar" http://127.0.0.1:5701/hazelcast/rest/maps/mapName/foo

More Spring Support

Hazelcast provides more Spring support features that I have not used in this post:

Support for using Hazelcast as Spring Cache provider
@SpringAware annotation, to enable distributed objects to be Spring managed
A custom XML namespace, if you want to configure your application by XML

Conclusions

Integrating Hazelcast in a Spring Boot application is straightforward. An instance (or lightweight client) is automatically embedded and started with the Spring application, exposing a very Spring-style singleton interface.

Configuration-by-environment may be controlled by the usual Spring mechanisms, without hacks.

There is no Hazelcast mock, so no true unit testing. But integration testing is feasible (…and, to be honest, how tricky is mocking an RDBMS?).

Hazelcast imposes some important limitations on application design and on the CI/deployment pipeline. All of them have to be known and considered from the very start of the project.

In the next post of this series I will introduce Hazelcast transaction support and the (missing!) integration with Spring-managed transactions.

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.
