DDEV and Docker Healthchecks (Tech Note)
What are Docker Container Healthchecks?
Docker provides Container Health Checks as a means of verifying whether a container is behaving correctly, whether its services or configuration are working right. Normally a healthcheck is a script selected via the Dockerfile configuration or a Docker Compose specification.
How does DDEV use Healthchecks?
DDEV’s primary use of healthchecks is to make sure that everything is working inside a container before declaring it ready for use (and continuing startup). For example, during ddev start, DDEV starts ddev-webserver and ddev-dbserver by issuing a docker-compose up. It then waits for healthy status on each of these. For ddev-webserver, this means that the web server is able to serve files, that php-fpm or gunicorn is able to interpret scripts, and that mailpit is running. For ddev-dbserver, it means that the database server in the db container is able to respond to queries and is fully functional.
Without healthchecks, containers will show up as ready when they’re actually not. The ddev-dbserver is probably most vulnerable to this, because it can take a few seconds for the mysqld or postgres server to become ready. If we didn’t wait for the healthy signal, traffic could try to flow before the various components were ready for it, causing ugly failures.
Avoiding CPU and battery usage
There is one more key trick DDEV does with its healthchecks. During startup, we want to find out as quickly as possible when the container is healthy, so that the start can continue immediately, so we check frequently until it’s ready. After that, not much goes wrong, so once the web or db container has become healthy, the healthcheck script slows the checks down to about one every 60 seconds. Essentially, if a healthy status has already been detected, the script sleeps 60 seconds before running the next check, which avoids unnecessary use of CPU and battery.
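The slowdown can be sketched in Compose syntax as follows. This is a minimal illustration, not DDEV’s actual healthcheck script: the service name myservice, the marker file /tmp/healthy, the 59-second sleep, and the probed URL are all assumptions chosen for the example, and it assumes the image provides a shell and curl. Note that timeout has to be longer than the sleep, otherwise Docker would abort the check while it is still sleeping.

```yaml
services:
  myservice:   # hypothetical service name
    healthcheck:
      test:
        - CMD-SHELL
        - |
          # If an earlier check already succeeded, wait before probing again,
          # so a healthy container is only checked about once a minute.
          if [ -f /tmp/healthy ]; then sleep 59; fi
          # The real probe: record success in the marker file,
          # or clear the marker so checks become frequent again.
          if curl --fail -s http://localhost/ >/dev/null; then
            touch /tmp/healthy
          else
            rm -f /tmp/healthy
            exit 1
          fi
      interval: 1s    # probe every second until the first success
      timeout: 70s    # must exceed the sleep above
```

In a sketch like this, the container’s startup script would also need to remove the marker file, so that a restarted container is checked quickly again.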
How can extra services like add-ons use Healthchecks?
Most extra services added by add-ons should have healthchecks, so that other services (like the web container) don’t try to use them before they are ready. Almost all of the Java-based services like solr and elasticsearch need this desperately, because it takes them quite a while to come up, and if PHP code tries to use them before they’re ready, things go wrong.
The ddev-solr add-on’s healthcheck waits until the Apache Solr server on port 8983 responds successfully to a request by using:

    healthcheck:
      test: ["CMD-SHELL", "curl --fail -s localhost:8983/solr/"]
Where does DDEV check for healthy status?
- ddev start checks the web and db containers for healthy status before starting any web_extra_daemons.
- Then, after everything else is done, ddev start waits for all containers, including those from additional services like solr or elasticsearch.
What are the components of a Healthcheck?
The Dockerfile and Compose Spec documents explain the syntax of healthcheck.
We’ll use the Compose syntax to take a look:

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
      start_interval: 5s

- test is either built into the Docker image or added in the docker-compose recipe. In most DDEV core images, it’s specified in the Dockerfile, and it’s usually in the form of a script, for example, ddev-webserver’s healthcheck.sh.
- interval and start_interval are how often the test script or command should be run while we’re waiting. Most of our containers are set for 1 second, meaning that we keep trying every second on failure. start_interval is an override of interval for use during the start_period, but it can only be used where the Docker server is v25 or greater, so we can’t use it consistently yet, as some Docker providers used with DDEV are not on v25 yet.
- timeout is how long the system should wait for the test before giving up and trying again.
- retries is how many times it will try the test before declaring the container unhealthy.
- start_period is probably the most important for DDEV. If we set the start_period to a reasonable value, we can give up waiting for the container at that point:
  “start period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.”
While start_period is a good gauge of how long we should wait, its default is zero, so if an add-on service does not provide it, it’s zero. However, the default for interval is 30s, and the default for retries is 3, so in that situation, assuming that the test fails right away, it should be 90s (3 × 30s) before the container is declared unhealthy. (However, in Docker v25+, start_interval defaults to 5s, which is a very different situation, possibly resulting in only 15s before reporting unhealthy.)
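As a sketch of what this means for an add-on, the fragment below spells out the values for a hypothetical service instead of relying on the defaults. The service name, the probed URL, and the numbers are assumptions for illustration only; the point is that an explicit start_period gives a slow-starting service room to boot before failures start counting.

```yaml
services:
  myservice:   # hypothetical add-on service
    healthcheck:
      test: ["CMD-SHELL", "curl --fail -s http://localhost:8080/"]   # hypothetical probe
      interval: 1s        # probe every second instead of the 30s default
      timeout: 10s        # give up on a single probe after 10 seconds
      retries: 3          # same as the Docker default
      start_period: 60s   # failures in the first 60s don't count toward retries
```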
How long does DDEV wait for the services to become healthy?
During ddev start we wait for the maximum of all containers’ start_period or the default_container_timeout value from .ddev/config.yaml. The default value for the wait time is 120s. In other words, DDEV will wait for 120s for all containers to become ready unless default_container_timeout is set to a different value.
What about ddev snapshot restore?
ddev snapshot restore is a very special case, because we’re starting the ddev-dbserver with a specific job to do, and it can’t be declared healthy until after that job is done. And that job is a restore using mariabackup, xtrabackup, or pg_dump.
Some people have huge databases to restore using snapshot restore, so:
- During restore, we raise the default_container_timeout to 600s (10 minutes) to give some extra space.
- That still isn’t enough for some huge databases, so it’s possible to change the .ddev/config.yaml value of default_container_timeout to a larger value.
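For example, a project that restores very large snapshots might raise the value like this. The 1800 here is an arbitrary illustration, not a recommendation; the value is in seconds.

```yaml
# .ddev/config.yaml
# Wait up to 30 minutes for containers to become ready.
# Illustrative value only; pick one that fits your database size.
default_container_timeout: "1800"
```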
Isn’t this all a little confusing?
Yes, it’s confusing. I wrote this tech note because I have already struggled with doing this wrong more than once, and am in the process of fixing the code yet another time in fix: default_container_timeout should work right, fixes #5133 again. The key confusion for me has been the idea of timeout in Docker, which is “when to give up on the healthcheck command”, and the idea of default_container_timeout in DDEV, which is “how long should I wait for the container to become ready”. For DDEV, the idea most closely related to “how long should I wait” is the start_period in Docker Compose.
Contributions welcome!
Your suggestions to improve this tech note are welcome. You can do a PR to this blog adding your techniques. Info and a training session on how to do a PR to anything in ddev.com is at DDEV Website For Contributors.
Join us for the next DDEV Live Contributor Training. Sign up at DDEV Live Events Meetup.