Failed ELB Health Checks Post Deployment
If you’re seeing a Task Stopped Reason: Task failed ELB health checks
error after an ECS deployment like this:
Time took: 3m 54s
Software shipped!
Stack: demo-dev
Service: demo-dev-EcsService-7pIXpAWS6VVR
Tasks: Running: 1 Desired: 1 Min: 1 Max: 3
Application ELB: demo-Elb-N2F2H887WJON-658045478.us-west-2.elb.amazonaws.com
[
{
"Task": "029e357a8ff4490e8b7edb2702f38d50",
"Name": "web",
"Release": "demo-dev:8",
"Started": "1 minutes ago",
"Status": "RUNNING",
"Notes": null
},
{
"Task": "4a37407823cd4accb75170f6eda92887",
"Name": "web",
"Release": "demo-dev:8",
"Started": "2 minutes ago",
"Status": "STOPPED",
"Notes": "Task Stopped Reason: Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:us-west-2:111111111111:targetgroup/demo-Targe-1GXHAH9WXSU9V/8e6d68f88cd88d12)."
}
]
This might mean that the health_check_interval_seconds
and unhealthy_threshold_count
are configured too aggressively for your app. For example, a settings unhealthy_threshold_count = 2
and health_check_interval_seconds = 10
means a total of 20s
before the ELB will see the task as unhealthy.
ECS sends a soft kill signal to the Docker container during a deployment. If the app process doesn’t fully stop the container after 30s, ECS sends a hard kill -9
signal. However, 30s > 20s
so you may intermittently see the error reported. Essentially, the app hasn’t fully finished cleaning up yet and the ELB detects it’s unhealthy before it can.
In this case, you might want to increase the unhealthy_threshold_count = 3
, so the ELB waits 30s
before seeing the task as unhealthy.