
Backend health

Automatically evict and restore unhealthy backend endpoints with passive health checking.

Agentgateway continuously tracks the health of the endpoints behind a backend and can automatically remove, or evict, endpoints that return errors, then gradually restore them as they recover. This passive health checking (also known as outlier detection) is built into the load balancer, so it applies to any backend, including regular Kubernetes Services, not just LLM providers.

Unlike active health checks that probe endpoints on a schedule, passive health checking observes the responses from real traffic. When an endpoint’s responses match an unhealthy condition that you define, its health score drops. If the score crosses the eviction threshold, the gateway stops sending new requests to that endpoint for a backoff period, then returns it to the pool to see whether it recovered.

Before you begin

  1. Set up an agentgateway proxy.
  2. Install the httpbin sample app.

How backend health checking works

You configure backend health checking in the health field of a backend policy. The health field has two parts:

  • unhealthyCondition: A CEL expression that is evaluated against each response. When the expression returns true, the response is counted as unhealthy. If you do not set this field, any 5xx response or connection failure is treated as unhealthy, which lowers the endpoint’s health score but does not trigger eviction on its own.
  • eviction: The settings that control when an unhealthy endpoint is evicted and how it recovers, such as how many consecutive failures to allow before eviction (consecutiveFailures), how long to evict the endpoint for (duration), and the health score to restore it with (restoreHealth).
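To show how unhealthyCondition and consecutiveFailures work together, the following POSIX shell sketch replays a sequence of status codes and reports which response would trigger eviction. It is a simplified model under stated assumptions, not agentgateway's implementation. The function name eviction_index is hypothetical. It hard-codes the condition response.code >= 500 from the example later on this page, assumes that any healthy response resets the consecutive-failure count, and ignores the health score and connection failures.

```shell
#!/bin/sh
# Hypothetical model of consecutive-failure eviction (not agentgateway code).
# $1:    consecutiveFailures threshold
# stdin: one HTTP status code per line
# Prints the 1-based position of the response that would trigger eviction,
# or "none" if the threshold is never reached.
eviction_index() {
  local threshold=$1 code n=0 run=0
  while IFS= read -r code || [ -n "$code" ]; do
    [ -z "$code" ] && continue   # skip blank lines
    n=$((n + 1))
    # Same test as the example unhealthyCondition: response.code >= 500.
    if [ "$code" -ge 500 ]; then
      run=$((run + 1))
      if [ "$run" -ge "$threshold" ]; then
        echo "$n"
        return 0
      fi
    else
      # A healthy response breaks the run of consecutive failures.
      run=0
    fi
  done
  echo none
}

# Usage: printf '503\n503\n200\n503\n503\n503\n' | eviction_index 3
```

With a threshold of 3, the sequence 503, 503, 200, 503, 503, 503 reaches eviction at the sixth response, because the 200 in the middle breaks the first run of failures.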

When every endpoint of a backend is evicted, the load balancer falls back to selecting from the evicted endpoints rather than failing requests entirely. As a result, you typically observe eviction in action only when a backend has multiple endpoints and at least some of them are healthy.
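The fallback can be pictured as a selection step that prefers healthy endpoints and widens to evicted ones only when nothing else is left. The following sketch is a hypothetical illustration: the eligible_endpoints function and its name:state argument format are made up for this example and do not reflect how agentgateway represents endpoints internally.

```shell
#!/bin/sh
# Hypothetical model of the all-evicted fallback (not agentgateway code).
# Each argument is "name:state", where state is "healthy" or "evicted".
# Prints the space-separated endpoints the load balancer may pick from.
eligible_endpoints() {
  local ep name healthy="" all=""
  for ep in "$@"; do
    name=${ep%%:*}
    all="${all:+$all }$name"
    if [ "${ep#*:}" = "healthy" ]; then
      healthy="${healthy:+$healthy }$name"
    fi
  done
  # Prefer healthy endpoints; if every endpoint is evicted, fall back to all
  # of them instead of failing the request.
  if [ -n "$healthy" ]; then
    echo "$healthy"
  else
    echo "$all"
  fi
}

# Usage: eligible_endpoints a:healthy b:evicted c:healthy
```

In this model, eligible_endpoints a:evicted b:evicted still returns both endpoints, which is why a single-endpoint backend keeps receiving traffic even after it is evicted.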

Configure backend health checking

The following example evicts an httpbin endpoint after it returns three consecutive 5xx responses, keeps it out of the pool for 30 seconds, and then restores it with full health. Restoring full health does not guarantee that the endpoint has recovered. If it keeps failing, it is evicted again, but each subsequent eviction lasts longer because the duration uses a multiplicative backoff. This backoff prevents a tight evict-restore-fail loop from sending a steady stream of traffic to a persistently broken endpoint. To restore the endpoint more cautiously, set restoreHealth below 100 so that it returns with a degraded health score and receives less traffic until it proves healthy.
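The following sketch illustrates the two recovery settings. This page does not document agentgateway's backoff multiplier or any upper bound on the eviction time, so the hypothetical eviction_seconds function assumes that the duration doubles with each consecutive eviction and applies no cap. The hypothetical internal_restore_health function mirrors the conversion that the config dump shows later in this guide, where a restoreHealth of 100 appears as 1; it assumes that other values scale linearly (for example, 50 becomes 0.5).

```shell
#!/bin/sh
# Hypothetical: eviction time for the Nth consecutive eviction of an endpoint.
# ASSUMPTION: multiplicative backoff with a default factor of 2 and no cap.
# $1: base duration in seconds, $2: eviction number (1 = first), $3: factor
eviction_seconds() {
  local base=$1 n=$2 factor=${3:-2} d
  d=$base
  while [ "$n" -gt 1 ]; do
    d=$((d * factor))
    n=$((n - 1))
  done
  echo "$d"
}

# Hypothetical: convert a 0-100 restoreHealth value to the 0-1 fraction that
# the config dump reports. ASSUMPTION: linear scaling for values below 100.
internal_restore_health() {
  awk -v h="$1" 'BEGIN { print h / 100 }'
}

# Usage: eviction_seconds 30 3      (third eviction with a 30s base)
#        internal_restore_health 100
```

Under these assumptions, an endpoint that keeps failing with duration: 30s is evicted for 30, 60, and then 120 seconds.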

  1. Create an AgentgatewayPolicy that applies backend health settings to the httpbin Service. Because the policy targets a Service, create it in the same namespace as the Service.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayPolicy
    metadata:
      name: httpbin-health
      namespace: httpbin
    spec:
      targetRefs:
      - group: ""
        kind: Service
        name: httpbin
      backend:
        health:
          unhealthyCondition: 'response.code >= 500'
          eviction:
            consecutiveFailures: 3
            duration: 30s
            restoreHealth: 100
    EOF

    The following settings configure the policy:

    • targetRefs: The backend to apply the health settings to. This example targets the httpbin Kubernetes Service (group: "", kind: Service). You can also target an AgentgatewayBackend.
    • backend.health.unhealthyCondition: A CEL expression that is evaluated against each response. When it returns true, the response counts as unhealthy. This example treats any 5xx response as unhealthy.
    • backend.health.eviction.consecutiveFailures: The number of consecutive unhealthy responses required before the endpoint is evicted.
    • backend.health.eviction.duration: The base amount of time to evict the endpoint for. Subsequent evictions use a multiplicative backoff.
    • backend.health.eviction.restoreHealth: The health score, from 0 to 100, to assign the endpoint when it returns from eviction. Set a value below 100 for gradual recovery, or 100 to restore it with full health.
  2. Port-forward the gateway proxy on port 15000.

    kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 15000
  3. Get the config dump and verify that the health policy is applied to the httpbin Service.

    Example jq command:

    curl -s http://localhost:15000/config_dump | jq '[.policies[] | select(.name.name == "httpbin-health")] | .[0]'

    Example output: The gateway reports your unhealthyCondition as unhealthyExpression, and normalizes the restoreHealth value of 100 to its internal value of 1 (100%).

    {
      "key": "httpbin/httpbin-health:health:httpbin/httpbin.httpbin.svc.cluster.local",
      "name": {
        "kind": "AgentgatewayPolicy",
        "name": "httpbin-health",
        "namespace": "httpbin"
      },
      "target": {
        "backend": {
          "service": {
            "hostname": "httpbin.httpbin.svc.cluster.local",
            "namespace": "httpbin"
          }
        }
      },
      "policy": {
        "backend": {
          "health": {
            "unhealthyExpression": "response.code >= 500",
            "eviction": {
              "duration": "30s",
              "restoreHealth": 1,
              "consecutiveFailures": 3
            }
          }
        }
      }
    }
  4. Send requests to the httpbin app to confirm that healthy traffic still flows. The /headers endpoint returns a 200 response code, and the /status/503 endpoint simulates an unhealthy backend response that matches your unhealthyCondition.

    curl -i "http://${INGRESS_GW_ADDRESS}:80/headers" -H "host: www.example.com"
    curl -i "http://${INGRESS_GW_ADDRESS}:80/status/503" -H "host: www.example.com"

    The /headers request returns a 200 response, and the /status/503 request returns a 503. Even if enough 503 responses evict the single httpbin endpoint, the gateway falls back to that evicted endpoint instead of failing requests, so traffic does not visibly shift. To observe eviction shifting traffic away from an unhealthy endpoint, scale the backend to multiple endpoints.
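When you send a burst of requests, for example before and after you scale httpbin to more endpoints, it can help to count the status codes instead of reading each response. The following sketch is a hypothetical helper (tally_codes is not part of agentgateway) that reads one status code per line, such as the output of curl -w '%{http_code}', and prints each distinct code with its count. The usage comment assumes the INGRESS_GW_ADDRESS variable from the setup guide.

```shell
#!/bin/sh
# Hypothetical helper: reads one HTTP status code per line on stdin and
# prints "<code> <count>" for each distinct code, sorted by code.
tally_codes() {
  # NF skips blank lines; the awk array counts occurrences per code.
  awk 'NF { n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}

# Usage (requires a running gateway):
# for i in 1 2 3 4 5 6 7 8 9 10; do
#   curl -s -o /dev/null -w '%{http_code}\n' \
#     "http://${INGRESS_GW_ADDRESS}:80/headers" -H "host: www.example.com"
# done | tally_codes
```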

Cleanup

You can remove the resources that you created in this guide.
kubectl delete AgentgatewayPolicy httpbin-health -n httpbin