A Post-Mortem on a Docker Compatibility Break
Have you ever had a perfectly working Docker Compose stack that mysteriously stopped working after a routine software update? It’s a frustrating experience that can consume hours of debugging. This post is a chronicle of just such a problem, involving a local Elastic Stack, Docker’s recent versions, and a simple, yet critical, configuration oversight.
The stack in question was a straightforward setup for local development, enabling a quick start for Elasticsearch, Kibana, and the APM Server. The key to its simplicity was the environment variable xpack.security.enabled=false, which effectively disabled security for a seamless, local-only experience.
The configuration looked like this:
version: "3.9"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.16.1
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
      - "9600:9600"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    restart: always
  kibana:
    image: docker.elastic.co/kibana/kibana:8.16.1
    container_name: kibana
    depends_on:
      - elasticsearch
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - xpack.apm.enabled=true
    ports:
      - "5601:5601"
    restart: always
  apm-server:
    image: docker.elastic.co/apm/apm-server:8.16.1
    container_name: apm-server
    depends_on:
      - elasticsearch
    environment:
      - APM_SERVER_LICENSE=trial
      - X_PACK_SECURITY_USER=elastic
      - X_PACK_SECURITY_PASSWORD=changeme
    ports:
      - "8200:8200"
    restart: always
This setup worked flawlessly for months. But after a hiatus and a few Docker updates, the stack refused to start. Hours went into trying different versions, troubleshooting network issues, and experimenting with additions such as Fleet and health checks, all without success. What finally got the stack running again was rolling back to a four-year-old version of Docker (20.10.x).
The question was: what had changed?
The Root Cause: A Subtle Security Misalignment
The culprit wasn’t a major Docker bug but a subtle incompatibility in the configuration that was handled differently by newer Docker versions. The issue lies with the apm-server configuration.
Even though security was explicitly disabled in the elasticsearch service with xpack.security.enabled=false, the apm-server was still configured to use authentication with X_PACK_SECURITY_USER=elastic and X_PACK_SECURITY_PASSWORD=changeme.
In older Docker versions, the APM server’s attempt to authenticate against an unsecured Elasticsearch instance seems to have failed silently or been tolerated, allowing the stack to proceed. Recent versions of Docker and the Elastic Stack appear to be stricter about this kind of mismatch: the APM server could not authenticate against the unsecured Elasticsearch instance, hit a fatal startup error, and the stack never came up.
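A mismatch like this is easy to miss by eye, so it can help to check for it mechanically. What follows is only a minimal sketch, not part of the original setup: it assumes the Compose file has already been parsed into a Python dict (the YAML parsing step is not shown), that every service uses the list-style environment seen above, and that the two credential variable names from this post are the ones worth flagging. The function names are hypothetical.

```python
from typing import Dict, List

# Credential variables taken from the compose file in this post (an assumption;
# extend the set for other setups).
CREDENTIAL_VARS = {"X_PACK_SECURITY_USER", "X_PACK_SECURITY_PASSWORD"}


def env_to_dict(env: List[str]) -> Dict[str, str]:
    """Turn Compose's list-style environment ("KEY=value") into a dict."""
    result = {}
    for item in env:
        key, _, value = item.partition("=")
        result[key] = value
    return result


def find_auth_mismatches(services: Dict[str, dict]) -> List[str]:
    """Return services that set credentials while Elasticsearch has security off."""
    es_env = env_to_dict(services.get("elasticsearch", {}).get("environment", []))
    if es_env.get("xpack.security.enabled", "true").lower() != "false":
        return []  # security is on (the 8.x default), so credentials are expected
    return [
        name
        for name, svc in services.items()
        if CREDENTIAL_VARS.intersection(env_to_dict(svc.get("environment", [])))
    ]


# The relevant parts of the compose file above, written out by hand.
services = {
    "elasticsearch": {
        "environment": ["discovery.type=single-node", "xpack.security.enabled=false"]
    },
    "apm-server": {
        "environment": [
            "APM_SERVER_LICENSE=trial",
            "X_PACK_SECURITY_USER=elastic",
            "X_PACK_SECURITY_PASSWORD=changeme",
        ]
    },
}
print(find_auth_mismatches(services))  # prints ['apm-server']
```

The check is deliberately narrow: it only knows about the variables named in this post and says nothing about whether a given image actually reads them.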
The Solution: A Simple YAML Fix
The solution is to simply align the security settings across all services. Since Elasticsearch is running without security, the APM server should also be configured to connect without authentication.
By removing the authentication environment variables from the apm-server service, the stack starts correctly on the latest Docker versions.
Here is the corrected docker-compose.yml:
version: "3.9"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.16.1
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
      - "9600:9600"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    restart: always
  kibana:
    image: docker.elastic.co/kibana/kibana:8.16.1
    container_name: kibana
    depends_on:
      - elasticsearch
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - xpack.apm.enabled=true
    ports:
      - "5601:5601"
    restart: always
  apm-server:
    image: docker.elastic.co/apm/apm-server:8.16.1
    container_name: apm-server
    depends_on:
      - elasticsearch
    # The fix is here: remove security environment variables
    environment:
      - APM_SERVER_LICENSE=trial
    ports:
      - "8200:8200"
    restart: always
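Once the stack is up, a quick way to confirm that each service answers without credentials is to probe the published ports from the host. The sketch below is one possible check, not a definitive one: it assumes the port mappings from the compose file above, and the helper name http_status and the chosen paths are illustrative. A status of 401 would suggest security is still enabled somewhere, while None means the service is not reachable at all.

```python
import urllib.error
import urllib.request
from typing import Optional

# Skip any system-wide proxy settings: every target here is on localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of an unauthenticated GET, or None if unreachable."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, e.g. 401 if security is still on
    except OSError:
        return None  # connection refused, timeout, name resolution failure, ...


# Host ports as published in the compose file above (assumed unchanged).
ENDPOINTS = {
    "elasticsearch": "http://localhost:9200/",
    "kibana": "http://localhost:5601/api/status",
    "apm-server": "http://localhost:8200/",
}
```

Calling http_status on each URL in ENDPOINTS shortly after docker compose up -d shows at a glance which service, if any, is refusing connections or asking for credentials. Services can take a while to become ready, so a None right after startup is not necessarily a failure.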
This experience highlights an important lesson in development: what works today may not work tomorrow due to underlying changes in a platform’s behavior. While a quick downgrade can get you back on track, a deeper investigation into the root cause often leads to a more robust and forward-compatible solution.