Issue – Liveness probes causing other containers to restart
While migrating from Mesos to a Kubernetes cluster, we transformed the Marathon configuration files into Kubernetes YAML files.
Upon deployment we observed that the containers were restarting constantly. One of the migrated configurations was health checks: the Mesos health checks had been translated into Kubernetes liveness probes.
The Kubernetes documentation revealed that the concept of health checks is divided into several kinds of probes, which offer finer control.
Let’s go over the different probes offered in Kubernetes.
Liveness probes
Liveness probes are used to restart containers based on certain conditions. For example, if the container is unresponsive, or is deemed unhealthy based on the URL configured in the probe, Kubernetes restarts the container in an attempt to fix the problem. A liveness probe is therefore a feature Kubernetes provides to automatically handle scenarios that would otherwise require manual intervention.
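As a minimal sketch, a liveness probe in a container spec might look like this (the /healthz path, port, and timing values are assumed for illustration, not taken from our actual configuration):

```yaml
# Hypothetical liveness probe: Kubernetes restarts the container
# if GET /healthz fails failureThreshold times in a row.
livenessProbe:
  httpGet:
    path: /healthz   # assumed health endpoint
    port: 8080       # assumed container port
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
```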
Readiness probes
Readiness probes indicate when a container or service is ready to accept requests. While a readiness probe is failing, the pod is simply removed from the Service's endpoints rather than restarted. Therefore, in most cases readiness probes are the right place to configure health checks.
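A minimal readiness probe looks much the same; again the endpoint, port, and timings below are assumed values:

```yaml
# Hypothetical readiness probe: while this fails, the pod is taken
# out of the Service's endpoints; the container is NOT restarted.
readinessProbe:
  httpGet:
    path: /ready     # assumed readiness endpoint
    port: 8080       # assumed container port
  initialDelaySeconds: 5
  periodSeconds: 10
```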
Startup probes
These probes come in handy when the service or container in question is slow to start. While a startup probe is running, the liveness and readiness probes are disabled; they only take over once the startup probe has succeeded.
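For example, a startup probe like the sketch below (endpoint and numbers are assumptions) would give a slow-starting container up to 30 × 10 = 300 seconds to come up before the other probes kick in:

```yaml
# Hypothetical startup probe: the container gets
# failureThreshold * periodSeconds to start before
# liveness/readiness probes begin checking it.
startupProbe:
  httpGet:
    path: /healthz   # assumed health endpoint
    port: 8080       # assumed container port
  failureThreshold: 30
  periodSeconds: 10
```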
But what made all the services restart all the time?
Probing a bit further into the Kubernetes documentation, we found that we should have used readiness probes in place of liveness probes.
The URL configured in the liveness probe configuration of service A was actually the health check URL of a dependent service, service B. If for some reason service B was unhealthy, the liveness probe would cause service A to restart. So even though the problem lay with service B, the services/containers that depended on B were getting restarted, which in turn caused the services that depended on service A to be restarted.
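To make the failure mode concrete, here is a sketch of the misconfiguration and the fix (service names, the /health/dependencies path, and the port are illustrative, not our real values):

```yaml
# Problematic: service A's liveness probe hits an endpoint that
# checks service B. If B is down, Kubernetes restarts A, even
# though A's own process is perfectly fine.
livenessProbe:
  httpGet:
    path: /health/dependencies   # hypothetical endpoint that calls service B
    port: 8080

# Better: dependency checks belong in a readiness probe, so A is
# merely taken out of rotation until B recovers, instead of being
# restarted in a cascade.
readinessProbe:
  httpGet:
    path: /health/dependencies   # same dependency check, safer semantics
    port: 8080
```

With this split, a liveness probe, if kept at all, should only check the container's own health, so a restart happens only when restarting can actually fix something.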