Debugging Java Applications Running in Containers

Shared experiences

After a brief introduction to the talk, Steve and I provided an overview of our experiences of working with Java and Docker. Over the past year Steve has been looking at rolling out a large-scale internal container-based platform within IBM. Accordingly, he has approached the topic of Java and Docker from a ‘bottom-up’ perspective, gathering requirements and analysing what such a platform must provide.

Over the same period I have been approaching the combination of Java and Docker from a ‘top-down’ perspective, as many of our clients at OpenCredo have been experimenting with these technologies. As a result, I’ve had a front-row seat (alongside my OpenCredo colleagues) for the early adoption of Java applications running in Docker, and for the opportunities and challenges of working with container technology.

As with many of the talks that Steve and I have presented together, our experiences nicely complement each other, and preparing this presentation led to some interesting conversations and insights for us both (and hopefully for you too!).

Debugging techniques: Start simple…

Steve kicked off the talk by introducing core Docker concepts, such as Dockerfiles and ‘docker run’, and then covered basic Java debugging techniques using ‘mvn exec:java’ and the IntelliJ IDEA debugger. We also introduced ‘docker ps’ and the ever-useful ‘docker exec -it’ command, which starts an interactive process (such as a shell) inside a running container so that you can execute arbitrary commands there.

Steve is a big fan of Kitematic, the GUI container-management application that is part of the Docker Toolbox, and he demonstrated how it can be used to open a shell in a running container in order to execute debugging utilities without having to type ‘docker exec’ yourself. We also covered the basics of remote debugging Java applications running in Docker and, referencing the great instructions from Patrick McCarthy, highlighted the gotchas that have troubled us.
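One gotcha worth checking first is whether the debug agent flags actually reached the JVM, for example when they are set in an environment variable that the container’s entrypoint never passes to ‘java’. The sketch below, an illustrative assumption rather than anything from the talk, lists the JVM’s own input arguments via ‘RuntimeMXBean.getInputArguments()’ and looks for the JDWP agent; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

final class DebugAgentCheck {

    // Returns the first JVM argument that loads the JDWP debug agent, if any.
    // Matches both the modern -agentlib:jdwp form and the legacy -Xrunjdwp form.
    static Optional<String> findJdwpArgument(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp"))
                .findFirst();
    }

    public static void main(String[] args) {
        // JVM options this process was started with (not the main class or program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Optional<String> jdwp = findJdwpArgument(jvmArgs);
        if (jdwp.isPresent()) {
            System.out.println("Remote debugging enabled: " + jdwp.get());
        } else {
            System.out.println("No JDWP agent found; a remote debugger cannot attach to this JVM.");
        }
    }
}
```

A typical agent setting is ‘-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005’, and the port must also be published from the container (e.g. ‘docker run -p 5005:5005 …’). On JDK 9 and later a bare port in ‘address’ binds to localhost only, so inside a container you would need ‘address=*:5005’.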

Finally, in this introductory part of the talk we covered the basics of the Java command-line debugging tools, such as jps, jstat and jstack (we referenced the JDK 8 versions of these tools), as I have found them useful for getting key metrics and hints about what is going on within a JVM without attaching a profiler or debugger.
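jps, jstat and jstack ship with the JDK, so they may be missing from an image built on a JRE. As a rough in-process alternative (our illustrative sketch, not a replacement for the real tools), the standard ‘java.lang.management’ API exposes similar data: heap usage, per-collector GC counts and times, and a thread dump. The class name is hypothetical; in practice you might expose this via a diagnostic endpoint.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;

final class JvmSnapshot {

    // Overall heap usage in MiB; jstat reports a finer per-generation breakdown.
    static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 if no maximum is defined.
        return String.format("heap used=%dMB committed=%dMB max=%dMB",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }

    // Collection count and cumulative collection time per collector, similar to jstat's GC columns.
    static void printGcStats() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("gc %s: count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }

    // A lightweight thread dump in the spirit of jstack; ThreadInfo.toString() truncates long stacks.
    static int printThreadDump() {
        ThreadInfo[] threads = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : threads) {
            System.out.print(info);
        }
        return threads.length;
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
        printGcStats();
        System.out.println("threads dumped: " + printThreadDump());
    }
}
```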

How do you think about containers?

In part two of the talk Steve attempted to banish misconceptions about Docker that he has heard from developers new to the technology, and I added my thoughts about cloud technology, which can bring some similar challenges. Typical restrictions include minimal operating systems, limited resources (often with contention), and applications or platforms that don’t fully respect the Docker resource encapsulation model (for example, parts of the /proc filesystem, such as /proc/meminfo and /proc/cpuinfo, are not cgroup-aware and report host-wide values). This can make debugging even more challenging, as the problem space is now larger, and as we all know, the key part of debugging is locating the issue!
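One way to see this mismatch is to compare what the JVM believes it has with what the container’s cgroup actually allows. JVMs of this era (JDK 8 before its container-support backports) derived defaults such as heap size from host-wide values. The sketch below is a hypothetical diagnostic: the cgroup file paths are assumptions that depend on the host (cgroup v1 versus v2) and on how the container is configured.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

final class ContainerLimits {

    // Assumed locations: cgroup v1 memory controller, then the cgroup v2 unified hierarchy.
    static final Path CGROUP_V1_MEMORY_LIMIT = Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes");
    static final Path CGROUP_V2_MEMORY_LIMIT = Paths.get("/sys/fs/cgroup/memory.max");

    // Parses a cgroup memory limit value; "max" (cgroup v2) means unlimited.
    static Optional<Long> parseLimit(String raw) {
        String value = raw.trim();
        if (value.isEmpty() || value.equals("max")) {
            return Optional.empty();
        }
        try {
            return Optional.of(Long.parseLong(value));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Reads a limit file, returning empty if it is missing or unreadable (e.g. not running under cgroups).
    static Optional<Long> readLimit(Path path) {
        try {
            List<String> lines = Files.readAllLines(path);
            return lines.isEmpty() ? Optional.empty() : parseLimit(lines.get(0));
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // What the JVM thinks it has to work with.
        System.out.println("JVM availableProcessors = " + rt.availableProcessors());
        System.out.println("JVM maxMemory (bytes)   = " + rt.maxMemory());
        // What the cgroup allows (cgroup v1 reports "unlimited" as a very large number).
        Optional<Long> limit = readLimit(CGROUP_V1_MEMORY_LIMIT);
        if (!limit.isPresent()) {
            limit = readLimit(CGROUP_V2_MEMORY_LIMIT);
        }
        System.out.println("cgroup memory limit     = "
                + limit.map(String::valueOf).orElse("not found or unlimited"));
    }
}
```

If ‘maxMemory’ is close to the host’s RAM while the cgroup limit is much smaller, the JVM can grow past its container limit and be killed, which is exactly the kind of confusing failure described above.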


Real world case studies

In the third part of the talk we presented a series of real-world case studies. Rather than duplicate the information already contained in the slides below, I’ll simply summarise here the issues that I’ve commonly encountered when working with Java and Docker:

Watch the free disk space within the container (particularly if you are using a framework such as Mesos / Marathon or Kubernetes, where container disk quotas can be assigned), as being unable to write logs to disk can cause strange behaviour in both applications and Docker itself.
Also watch the number of free inodes on the Docker host and within containers, particularly if you are building a lot of containers and not explicitly removing them (for example, if you are building images as part of a Jenkins continuous delivery pipeline).
When restricting the memory available to a Java process, ensure that you add an allowance for JVM overhead in addition to the -Xmx heap size. For example, the JVM itself requires memory to operate, the class loaders require memory to load classes into Metaspace (PermGen prior to Java 8), and every thread created requires additional native memory for its stack.
/dev/random may block on a host running a lot of containers, as all containers share the host kernel’s entropy pool, so entropy will typically be low and easily exhausted. As a result, applications may fail to start, crash when attempting to allocate a session, or block when generating random numbers (including UUIDs and other cryptographic operations). Using /dev/urandom can be a solution, e.g. with the JVM flag ‘-Djava.security.egd=file:/dev/urandom’, but be aware of some of the potential issues with this.
Watch out for Java’s slightly wonky DNS caching, particularly if your application talks to a load balancer whose attached containers may change IP address (for example, when performing an application update or container restart). The JVM option ‘-Dsun.net.inetaddr.ttl=<<TTL in seconds>>’ can be useful.
Ensure that shared physical resources, such as network, CPU and memory, are not under high contention, as this may result in intermittent issues.
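The memory allowance mentioned above is essentially arithmetic: heap plus the non-heap areas. A minimal sketch follows; every figure in it is a hypothetical assumption for illustration (not a recommendation from the talk), and the real values should be measured for your own application.

```java
final class ContainerMemoryBudget {

    // Rough estimate of the container memory limit a JVM needs, in MiB. Every input is an
    // assumption to be measured for your own application. Other native allocations
    // (direct buffers, GC bookkeeping, JNI libraries) are not modelled individually,
    // which is why an explicit safety margin is added.
    static long estimateMb(long heapMb, long metaspaceMb, int threadCount,
                           long stackKbPerThread, long codeCacheMb, long safetyMarginMb) {
        long threadStacksMb = (threadCount * stackKbPerThread) / 1024;
        return heapMb + metaspaceMb + threadStacksMb + codeCacheMb + safetyMarginMb;
    }

    public static void main(String[] args) {
        // Hypothetical figures: -Xmx512m, ~128 MiB of Metaspace, 200 threads at the
        // 1 MiB default stack size on 64-bit Linux, 48 MiB of code cache, 64 MiB margin.
        long limitMb = estimateMb(512, 128, 200, 1024, 48, 64);
        System.out.println("Suggested container memory limit: ~" + limitMb + " MiB");
    }
}
```

With these inputs the estimate is 952 MiB, well above the 512 MiB heap. Thread stacks are reserved rather than necessarily fully committed, so this errs on the conservative side; JDK 8’s Native Memory Tracking (‘-XX:NativeMemoryTracking=summary’ plus ‘jcmd <pid> VM.native_memory summary’) can replace the guesses with measurements.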
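To check whether entropy starvation is behind a slow start or a hang, one illustrative approach (our assumption, not a technique from the talk) is to print the configured entropy source and time the first UUID generation, since ‘UUID.randomUUID()’ is backed by ‘SecureRandom’. A delay of seconds rather than milliseconds on first use can hint at a blocking source; the class name is hypothetical.

```java
import java.security.SecureRandom;
import java.security.Security;
import java.util.UUID;

final class EntropyCheck {

    // Times one UUID generation. UUID.randomUUID() uses SecureRandom, so the first call
    // also includes seeding, which is where a blocking entropy source tends to show up.
    static long timeRandomUuidMillis() {
        long start = System.nanoTime();
        UUID uuid = UUID.randomUUID();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Generated " + uuid + " in " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Security property from the java.security file, and the system-property override.
        System.out.println("securerandom.source = " + Security.getProperty("securerandom.source"));
        System.out.println("java.security.egd   = " + System.getProperty("java.security.egd"));
        System.out.println("default SecureRandom algorithm = " + new SecureRandom().getAlgorithm());
        timeRandomUuidMillis();
    }
}
```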
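Besides the ‘sun.net.inetaddr.ttl’ system property, the JDK documents the ‘networkaddress.cache.ttl’ security property, which can be set in the ‘java.security’ file or programmatically early at startup. A minimal sketch, assuming a 30-second positive TTL and a 5-second negative TTL are acceptable for your environment (both values are illustrative):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

final class DnsCacheConfig {

    // Call before the first hostname lookup: these properties are typically read once,
    // when the JVM's address cache policy is first initialised.
    static void configureDnsCacheTtl(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) throws UnknownHostException {
        configureDnsCacheTtl(30, 5);
        // "localhost" is used only so the example runs anywhere; substitute your load balancer's hostname.
        InetAddress address = InetAddress.getByName("localhost");
        System.out.println("Resolved " + address + " with networkaddress.cache.ttl="
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The reason this matters: in JDK 8, if a security manager is installed and no TTL is configured, successful lookups are cached forever, so a load balancer that moves to a new IP can be missed until the JVM restarts.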

Our key learnings

Steve and I concluded the talk with an insight into our key learnings from working with Java and Docker over the past year:

Instrument all the things! Although instrumentation has a cost (typically in processing time and network bandwidth), we recommended initially instrumenting heavily: the individual services/applications, the OS and the system as a whole. Once you are comfortable with the system running in production, you can dial back the monitoring.
Be careful when recording and analysing aggregated metrics, as an average across many instances can hide a single misbehaving one. Sometimes individual service metrics provide valuable insight (and can also hint at localised issues). Although the general advice when working with containers is to think ‘cattle, not pets’, the reality of the situation sometimes dictates that you work with ‘prize bulls’.
Distributed tracing tooling such as Zipkin (and its Java instrumentation library, Brave) is worth its weight in gold for the insight it can provide. Applying correlation identifiers for use with tooling like Zipkin can also be useful for tracing individual requests through logging tooling such as Kibana (using the Mapped Diagnostic Context, or MDC).
In-situ monitoring of containers and their applications can be done by querying the Docker stats endpoint with curl, or by using a Java shell component such as CRaSH.
We suggested ‘graphing everything’, even if it at first appears insignificant, and we like InfluxDB, Telegraf, Grafana, Datadog, AWS CloudWatch and Prometheus (although other tooling does exist).
The Elasticsearch-Logstash-Kibana (ELK) stack is an essential logging solution. Don’t forget to ship logs from ephemeral containers and VMs as soon as possible (either via a Logstash log shipper, or by logging to a non-ephemeral directory that is mounted into the container).
“Log like an operator” was a key bit of advice. When writing log statements in Java code (or any code), try to think about what information would be useful to someone diagnosing an issue, and remember that when things go wrong in production, this person is typically an operator.
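The correlation-identifier idea can be sketched without any logging library. In the example below a ‘ThreadLocal’ stands in for SLF4J’s MDC (which plays this role in real applications, via ‘MDC.put’ and ‘MDC.remove’), and ‘java.util.logging’ is used only to keep the example self-contained. All names are hypothetical; the inbound ID would typically come from a request header, such as Zipkin’s B3 headers.

```java
import java.util.UUID;
import java.util.function.Supplier;
import java.util.logging.Logger;

final class CorrelationIds {

    private static final Logger LOG = Logger.getLogger(CorrelationIds.class.getName());

    // Stand-in for an MDC: a per-thread slot holding the current request's correlation ID.
    private static final ThreadLocal<String> CURRENT_ID = new ThreadLocal<>();

    // Runs work with a correlation ID bound to this thread, reusing the inbound ID if present
    // so the same ID follows the request across services; otherwise generates a new one.
    static <T> T withCorrelationId(String inboundId, Supplier<T> work) {
        String id = (inboundId == null || inboundId.isEmpty()) ? UUID.randomUUID().toString() : inboundId;
        CURRENT_ID.set(id);
        try {
            return work.get();
        } finally {
            CURRENT_ID.remove(); // avoid leaking IDs into pooled threads
        }
    }

    static String currentId() {
        return CURRENT_ID.get();
    }

    // Prefixes each log line with the correlation ID so log tooling can filter on it.
    static void log(String message) {
        LOG.info("[correlationId=" + currentId() + "] " + message);
    }

    public static void main(String[] args) {
        String result = withCorrelationId("abc-123", () -> {
            log("handling order request");
            return currentId();
        });
        System.out.println("Processed with ID " + result);
    }
}
```

Searching the log store for one ID then returns every line that a single request produced, across all the containers it touched.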
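When querying the Docker stats endpoint (for example ‘GET /containers/<id>/stats?stream=false’ on the Docker Engine API, via ‘curl --unix-socket /var/run/docker.sock’), the response contains cumulative CPU counters rather than a percentage. The sketch below applies the calculation the ‘docker stats’ CLI uses on Linux to the ‘cpu_stats’ and ‘precpu_stats’ fields; JSON parsing is omitted, and the sample numbers in ‘main’ are made up.

```java
final class DockerStatsCpu {

    // CPU percentage from the current and previous samples in a Docker stats response:
    //   cpuDelta    = cpu_stats.cpu_usage.total_usage - precpu_stats.cpu_usage.total_usage
    //   systemDelta = cpu_stats.system_cpu_usage      - precpu_stats.system_cpu_usage
    //   cpuPercent  = (cpuDelta / systemDelta) * onlineCpus * 100
    static double cpuPercent(long totalUsage, long preTotalUsage,
                             long systemUsage, long preSystemUsage, int onlineCpus) {
        long cpuDelta = totalUsage - preTotalUsage;
        long systemDelta = systemUsage - preSystemUsage;
        if (cpuDelta <= 0 || systemDelta <= 0) {
            return 0.0; // no progress between samples, e.g. on the very first read
        }
        return ((double) cpuDelta / systemDelta) * onlineCpus * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical counters (nanoseconds) for a container on a 4-CPU host.
        double pct = cpuPercent(400_000_000L, 200_000_000L,
                20_000_000_000L, 19_000_000_000L, 4);
        System.out.printf("CPU: %.1f%%%n", pct);
    }
}
```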
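As one example of ‘graphing everything’, InfluxDB’s line protocol is the plain-text format its write API accepts: a measurement name, optional comma-separated tags, fields, and a nanosecond timestamp, with integer fields suffixed by ‘i’. The sketch below formats a JVM heap reading as one such point; the measurement name ‘jvm_heap’ and the ‘host’ tag value are hypothetical, and sending the line to InfluxDB (or via Telegraf) is left out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Locale;
import java.util.concurrent.TimeUnit;

final class InfluxLineProtocol {

    // Formats: <measurement>,<tag_key>=<tag_value> <field>=<int>i,<field>=<int>i <timestamp_ns>
    // Tag values are assumed not to need escaping (no spaces, commas or equals signs).
    static String heapPoint(String host, long usedBytes, long committedBytes, long timestampNanos) {
        return String.format(Locale.ROOT, "jvm_heap,host=%s used=%di,committed=%di %d",
                host, usedBytes, committedBytes, timestampNanos);
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long nowNanos = TimeUnit.MILLISECONDS.toNanos(System.currentTimeMillis());
        // "my-container" is a placeholder; the container's hostname is one possible choice.
        System.out.println(heapPoint("my-container", heap.getUsed(), heap.getCommitted(), nowNanos));
    }
}
```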

Summary

Both Steve and I believe that debugging is still an essential skill, even in the age of TDD, disposable microservices and ephemeral containers. Maybe, just maybe, with the potential rise in system complexity, it is a more important skill than ever before?

Finding, isolating and replicating an issue is vital to systematically locating and fixing a bug. Debugging Java applications running within a container doesn’t require a massive shift in mindset, as a lot of the old tooling and approaches still work. However, container and cloud technology do add some challenges to the debugging process. Knowledge is your friend here, and we recommend building your ‘debugging toolbox’ with the information contained within this presentation.

Finally, monitoring, logging and alerting are essential components of container operation and debugging, regardless of programming language choice.

Talk slides

I’ve uploaded the slides that Steve and I created to SlideShare, and you can find a preview here:

J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required" from Daniel Bryant

Please do get in touch!

At OpenCredo we are early adopters of many technologies, including container and cluster management technology, so please do get in touch if you require information or guidance in your organisation. Although we are passionate about exploring emerging technology, we are also pragmatic, and we won’t recommend containers (or indeed any technology) unless we believe they are a good fit for your situation. Contact me at @danielbryantuk or daniel.bryant@opencredo.com.