Rate Limiting

Rate limiting is an effective mechanism to control the throughput of traffic destined for a target host. It caps how much network traffic downstream clients can send within a given timeframe.

Rate limiting is most useful when a large number of clients send traffic to a target host: if the target host becomes backed up, the downstream clients will overwhelm it. In this scenario it is extremely difficult to configure a circuit breaking limit on each downstream host tight enough that the system operates normally during typical request patterns but still prevents cascading failure when the system starts to fail. In such scenarios, rate limiting traffic to the target host is effective.

OSM supports server-side rate limiting per target host, also referred to as local per-instance rate limiting.

Configuring local per-instance rate limiting

OSM leverages its UpstreamTrafficSetting API to configure rate limiting attributes for traffic directed to an upstream service. We use the term upstream service to refer to a service that receives connections and requests from clients and returns responses. The specification enables configuring local rate limiting attributes for an upstream service at both the connection and request level. OSM leverages Envoy’s local rate limiting functionality to implement per-instance local rate limiting at each upstream host.

Each UpstreamTrafficSetting configuration targets an upstream host defined by the spec.host field. For a Kubernetes service my-svc in the namespace my-namespace, the UpstreamTrafficSetting resource must be created in the namespace my-namespace, and spec.host must be an FQDN of the form my-svc.my-namespace.svc.cluster.local.
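
For illustration, a minimal UpstreamTrafficSetting targeting such a service might look like the sketch below. This assumes the policy.openservicemesh.io/v1alpha1 API version; the resource name, service, and namespace are hypothetical.

    apiVersion: policy.openservicemesh.io/v1alpha1
    kind: UpstreamTrafficSetting
    metadata:
      # The resource must be created in the same namespace as the upstream service.
      name: my-svc-rate-limit
      namespace: my-namespace
    spec:
      # FQDN of the upstream service this configuration targets.
      host: my-svc.my-namespace.svc.cluster.local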

Local rate limiting is applicable at both the TCP (L4) connection and HTTP request level, and can be configured using the rateLimit.local attribute in the UpstreamTrafficSetting resource. TCP settings apply to both TCP and HTTP traffic, while HTTP settings apply only to HTTP traffic. Rate limiting at both levels is enforced using a token bucket rate limiter.

Rate limiting TCP connections

TCP connections can be rate limited per unit of time. An optional burst limit can be specified to allow a burst of connections above the baseline rate, accommodating connection spikes within a short interval of time. TCP rate limiting is applied as a token bucket rate limiter at the network filter chain of the upstream service’s inbound listener. Each incoming connection processed by the filter consumes a single token. If a token is available, the connection is allowed; if no tokens are available, the connection is immediately closed.

The following attributes, nested under spec.rateLimit.local.tcp, configure rate limiting for TCP connections; a sample configuration follows the list:

  • connections: The number of connections allowed per unit of time before rate limiting occurs on all backends belonging to the upstream host specified via the spec.host field in the UpstreamTrafficSetting configuration. This setting is applicable to both TCP and HTTP traffic.

  • unit: The period of time within which connections over the limit will be rate limited. Valid values are second, minute and hour.

  • burst: The number of connections above the baseline rate that are allowed in a short period of time.
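
For example, the following sketch limits TCP connections to the upstream host to a baseline of 100 per minute, with a short-term burst allowance of 10 additional connections. It reuses the hypothetical my-svc service from above; the limit values are illustrative.

    apiVersion: policy.openservicemesh.io/v1alpha1
    kind: UpstreamTrafficSetting
    metadata:
      name: tcp-rate-limit
      namespace: my-namespace
    spec:
      host: my-svc.my-namespace.svc.cluster.local
      rateLimit:
        local:
          tcp:
            connections: 100 # baseline: 100 connections per minute
            unit: minute
            burst: 10        # short bursts of up to 10 extra connections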

Refer to the TCP local rate limiting API for additional information regarding API usage.

Rate limiting HTTP requests

HTTP requests can be rate limited per unit of time. An optional burst limit can be specified to allow a burst of requests above the baseline rate, accommodating request spikes within a short interval of time. HTTP rate limiting is applied as a token bucket rate limiter at the virtual host and/or HTTP route level at the upstream backend, depending on the rate limiting configuration. Each incoming request processed by the filter consumes a single token. If a token is available, the request is allowed; if no tokens are available, the request receives the configured rate limit status.

HTTP request rate limiting can be configured at the virtual host level by specifying the rate limiting attributes nested under the spec.rateLimit.local.http field. Alternatively, rate limiting can be configured per HTTP route allowed on the upstream backend by specifying the rate limiting attributes as a part of the spec.httpRoutes field. It is important to note that when configuring rate limiting per HTTP route, the route must match an HTTP path that has already been permitted by a service mesh policy; otherwise the rate limiting policy will be ignored.
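
To illustrate where each scope is configured, the following spec fragment sketches both forms side by side; the path and limit values are hypothetical.

    spec:
      host: my-svc.my-namespace.svc.cluster.local
      # Virtual host level: applies to all HTTP routes on the upstream backend.
      rateLimit:
        local:
          http:
            requests: 100
            unit: minute
      # Per-route level: applies only to requests matching this path, which
      # must already be permitted by a service mesh policy.
      httpRoutes:
        - path: /api
          rateLimit:
            local:
              requests: 10
              unit: minute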

The following rate limiting attributes can be configured for HTTP traffic; an example combining them follows the list:

  • requests: The number of requests allowed per unit of time before rate limiting occurs on all backends belonging to the upstream host specified via the spec.host field in the UpstreamTrafficSetting configuration.

  • unit: The period of time within which requests over the limit will be rate limited. Valid values are second, minute and hour.

  • burst: The number of requests above the baseline rate that are allowed in a short period of time.

  • responseStatusCode: The HTTP status code to use for responses to rate limited requests. The code must be in the 400-599 (inclusive) error range and must be a status code supported by Envoy. If not specified, a default of 429 (Too Many Requests) is used.

  • responseHeadersToAdd: The list of HTTP headers as key-value pairs that should be added to each response for requests that have been rate limited.
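
Putting these attributes together, a virtual host level HTTP rate limit might be sketched as follows. The resource name, limit values, and response header are illustrative, not prescribed by the API.

    apiVersion: policy.openservicemesh.io/v1alpha1
    kind: UpstreamTrafficSetting
    metadata:
      name: http-rate-limit
      namespace: my-namespace
    spec:
      host: my-svc.my-namespace.svc.cluster.local
      rateLimit:
        local:
          http:
            requests: 100           # baseline: 100 requests per minute
            unit: minute
            burst: 20               # short bursts of up to 20 extra requests
            responseStatusCode: 429 # the default; must be in the 400-599 range
            responseHeadersToAdd:   # added to responses for rate limited requests
              - name: x-rate-limited
                value: "true"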

Demos

To learn more about configuring rate limiting, refer to the following demo guides: