Request Hedging: Fighting Tail Latencies

In systems with tight latency SLAs, addressing tail latencies is critical because the occasional slow request can significantly degrade the user experience.

Imagine a request path that spans multiple services. Each service contributes its own latency to the overall response, so a single slow hop can dominate the end-to-end time.

Request hedging offers a simple solution to this problem by sending multiple duplicate requests and using the first response that comes back.

Understanding Request Hedging

Request hedging is conceptually simple: instead of waiting for a single request to complete, you send multiple identical requests (potentially to different backend servers) and use whichever response arrives first.

When to Use Request Hedging

Hedging is particularly valuable when:

  • Tail latency matters: In user-facing applications where consistent response times are crucial
  • Resources aren’t constrained: You have spare capacity to handle duplicate requests
  • Operations are idempotent: The same request can be processed multiple times without side effects
  • Backend variability exists: Some servers may be slower than others due to resource contention, garbage collection, etc.

How Hedging Differs from Retries

While both strategies involve sending multiple requests, there’s a key difference:

  • Retries: Send a second request only after the first one fails or times out
  • Hedging: Send additional requests proactively, without waiting for the first to fail

This distinction makes hedging much more effective at reducing tail latency.
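The timing difference can be seen in a small sketch. The backend latencies and timeouts below are made-up values chosen to make the contrast obvious: the retry path pays the full timeout before its second attempt, while the hedged path fires its duplicate after only a short delay.

```python
import asyncio

async def call_backend(delay: float) -> str:
    # Stand-in for a real RPC; sleeps to simulate backend latency.
    await asyncio.sleep(delay)
    return f"done after {delay}s"

async def retry_after_timeout(timeout: float = 0.5) -> str:
    # Retry: the second attempt starts only after the first times out.
    try:
        return await asyncio.wait_for(call_backend(1.0), timeout)
    except asyncio.TimeoutError:
        # Total latency is roughly timeout + retry duration.
        return await call_backend(0.1)

async def hedge(hedge_delay: float = 0.1) -> str:
    # Hedging: the duplicate starts proactively after a short delay,
    # while the first request is still in flight; first response wins.
    first = asyncio.create_task(call_backend(1.0))
    await asyncio.sleep(hedge_delay)
    second = asyncio.create_task(call_backend(0.1))
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    return done.pop().result()
```

With these numbers, the hedged call finishes in roughly 0.2s while the retry path takes roughly 0.6s, even though both ultimately get their answer from the fast attempt.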

Hedging Strategies

Several variations of request hedging exist:

  • Naive hedging: Send multiple requests simultaneously
  • Delayed hedging: Send the first request, then send a second after a delay if the first hasn’t completed
  • Adaptive hedging: Adjust hedging behavior based on recent performance patterns
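The naive strategy is the easiest to sketch: fire every copy at once and keep whichever finishes first. The `flaky_backend` function below is a hypothetical stand-in that is usually fast but occasionally very slow, which is exactly the tail behavior hedging targets.

```python
import asyncio
import random

async def naive_hedge(request_fn, copies: int = 2):
    """Naive hedging: fire all copies at once and return the first result."""
    tasks = [asyncio.create_task(request_fn()) for _ in range(copies)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the losers' work is wasted backend load
    return done.pop().result()

async def flaky_backend() -> float:
    # Simulated backend: fast 90% of the time, very slow 10% of the time.
    delay = 2.0 if random.random() < 0.1 else 0.05
    await asyncio.sleep(delay)
    return delay
```

With two copies, hitting the slow path requires both copies to be slow, so a 10% tail probability drops to about 1%; the cost is that every request doubles the backend load, which is why the delayed variant is usually preferred.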

Implementing Request Hedging with Python Asyncio

Let’s implement a delayed hedging strategy using Python’s asyncio:

import asyncio
import time
import aiohttp
import logging
from typing import List, TypeVar, Callable, Awaitable, Optional, Any

T = TypeVar('T')

async def hedged_request(
    request_fn: Callable[[], Awaitable[T]],
    hedge_delay: float = 0.1,
    max_hedges: int = 3,
    timeout: Optional[float] = None
) -> T:
    """
    Execute a request with hedging.

    Args:
        request_fn: Async function that makes the actual request
        hedge_delay: Seconds to wait before sending each additional request
        max_hedges: Maximum number of concurrent requests (including the original)
        timeout: Total timeout for all hedged requests

    Returns:
        Result from the first successful request

    Raises:
        asyncio.TimeoutError: If all requests time out
        Exception: Any exception raised by all request attempts
    """

    # Keep track of sent requests and their tasks
    hedge_tasks = []
    request_count = 0
    exceptions = []

    # Set up a flag to track if we've already returned a result
    got_result = False

    async def execute_request(request_id: int) -> T:
        nonlocal got_result

        try:
            start_time = time.time()
            logging.info(f"Starting request {request_id}")

            result = await request_fn()

            # If we haven't already returned a result, log the success
            if not got_result:
                got_result = True
                duration = time.time() - start_time
                logging.info(f"Request {request_id} succeeded after {duration:.3f}s")

            return result

        except Exception as e:
            logging.warning(f"Request {request_id} failed with: {e}")
            exceptions.append(e)
            raise

    # Create the initial request
    request_count += 1
    main_task = asyncio.create_task(execute_request(request_count))
    hedge_tasks.append(main_task)

    # Set up the overall timeout if specified
    deadline: Optional[asyncio.Task] = None
    if timeout:
        deadline = asyncio.create_task(asyncio.sleep(timeout))
        hedge_tasks.append(deadline)

    # Main hedging loop
    while request_count < max_hedges:
        # Wait for either a result or the hedge delay
        delay_task = asyncio.create_task(asyncio.sleep(hedge_delay))

        # Create a list of pending tasks to await
        pending = [t for t in hedge_tasks if not t.done()] + [delay_task]

        if not pending:
            # All tasks have completed or failed
            break

        # Wait for the first task to complete
        done, pending = await asyncio.wait(
            pending,
            return_when=asyncio.FIRST_COMPLETED
        )

        # Cancel the delay task if it's still pending
        if not delay_task.done():
            delay_task.cancel()

        # Check if we have a deadline and it's done
        if timeout and deadline in done:
            # Cancel all other tasks
            for task in hedge_tasks:
                if not task.done() and task != deadline:
                    task.cancel()
            raise asyncio.TimeoutError("All hedged requests timed out")

        # Check if we got a result from an existing task
        for task in done:
            if task in hedge_tasks and task != deadline:
                try:
                    result = task.result()

                    # Cancel all other pending tasks
                    for t in hedge_tasks:
                        if not t.done() and t != deadline and t != task:
                            t.cancel()

                    return result
                except Exception:
                    # This task failed, continue with hedging
                    pass

        # If we've reached the hedge delay and haven't returned yet, send another request
        if delay_task in done:
            request_count += 1
            hedge_task = asyncio.create_task(execute_request(request_count))
            hedge_tasks.append(hedge_task)

    # Wait for the remaining tasks, returning the first success
    remaining = [t for t in hedge_tasks if t != deadline and not t.done()]
    while remaining:
        done, remaining = await asyncio.wait(
            remaining,
            return_when=asyncio.FIRST_COMPLETED
        )

        # Check if any succeeded
        for task in done:
            try:
                return task.result()
            except Exception:
                # This task failed; keep waiting on the rest
                pass

    # If we get here, all requests failed
    if exceptions:
        raise exceptions[0]  # Re-raise the first exception
    else:
        raise RuntimeError("All hedged requests failed with unknown errors")

Now let’s create a practical example using this implementation to make HTTP requests:

async def fetch_with_hedging(url: str) -> str:
    """Fetch a URL with request hedging."""

    # Define our actual request function
    async def make_request():
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()

    # Use our hedging function
    return await hedged_request(
        request_fn=make_request,
        hedge_delay=0.1,  # Send a new request after 100ms
        max_hedges=3,     # Send at most 3 requests
        timeout=2.0       # Total timeout of 2 seconds
    )

Real-World Considerations

While request hedging can dramatically improve tail latency, it comes with important trade-offs:

1. Increased Load

By definition, hedging generates additional load on your backends. This amplification factor needs careful consideration:

  • Start with minimal hedging (2 total requests)
  • Monitor backend load closely
  • Consider only hedging a percentage of requests
  • Implement circuit breakers to stop hedging during high load
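Hedging only a percentage of requests can be sketched with a simple probabilistic gate. Everything here (the function name, the 5% default, the single delayed duplicate) is illustrative, not a standard API.

```python
import asyncio
import random

async def maybe_hedged(request_fn, hedge_fraction: float = 0.05,
                       hedge_delay: float = 0.1):
    """Hedge only a fraction of requests to bound the extra backend load.

    With hedge_fraction=0.05, at most ~5% of requests are duplicated,
    capping the load amplification at roughly 1.05x.
    """
    if random.random() >= hedge_fraction:
        return await request_fn()  # the common case: no hedge at all

    # Hedged path: the original request plus one delayed duplicate.
    first = asyncio.create_task(request_fn())
    try:
        # shield() keeps `first` running even if this wait times out.
        return await asyncio.wait_for(asyncio.shield(first), hedge_delay)
    except asyncio.TimeoutError:
        second = asyncio.create_task(request_fn())
        done, pending = await asyncio.wait(
            {first, second}, return_when=asyncio.FIRST_COMPLETED
        )
        for t in pending:
            t.cancel()
        return done.pop().result()
```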

2. Idempotent Operations

Hedging is only safe when the operation itself is idempotent: duplicate reads are harmless, but a duplicate write (for example, charging a payment twice) is not.
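One common way to make a write safe to hedge is an idempotency key: the server remembers the result for each key and returns the cached result for duplicate deliveries. The toy `PaymentService` below is a hypothetical in-memory sketch of that pattern, not a real payment API.

```python
import uuid

class PaymentService:
    """Toy service that deduplicates writes by idempotency key, so a
    hedged (duplicate) request cannot charge the same payment twice."""

    def __init__(self):
        self._processed: dict[str, str] = {}

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key in self._processed:
            # Duplicate delivery (e.g. a hedged copy): return cached result.
            return self._processed[idempotency_key]
        receipt = f"receipt-{uuid.uuid4().hex[:8]}-{amount}"
        self._processed[idempotency_key] = receipt
        return receipt
```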

3. Adaptive Hedging

In production systems, adaptive hedging helps absorb latency bursts without over-hedging:

  • Only hedge when observed response times exceed a chosen percentile of recent latencies
  • Route hedged requests preferentially to instances that have recently been fast
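The percentile-based variant can be sketched by tracking a sliding window of recent latencies and deriving the hedge delay from a high percentile, so only requests already slower than the vast majority trigger a duplicate. The class name, window size, and warm-up threshold below are illustrative choices.

```python
from collections import deque

class AdaptiveHedgeDelay:
    """Derives the hedge delay from a high percentile of recent latencies,
    so only requests slower than most recent traffic get hedged."""

    def __init__(self, window: int = 1000, percentile: float = 0.95):
        self._latencies: deque = deque(maxlen=window)
        self._percentile = percentile

    def record(self, latency_s: float) -> None:
        # Call this with the observed latency of every completed request.
        self._latencies.append(latency_s)

    def hedge_delay(self, default: float = 0.1) -> float:
        if len(self._latencies) < 20:
            return default  # not enough data yet; fall back to a fixed delay
        ordered = sorted(self._latencies)
        idx = min(int(len(ordered) * self._percentile), len(ordered) - 1)
        return ordered[idx]
```

The returned value would replace the fixed `hedge_delay` parameter: if p95 latency is 50ms, a request still in flight after 50ms is in the slowest 5% and is worth hedging.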

Conclusion

Request hedging is a simple but powerful approach for improving tail latency in systems. By sending multiple requests and using the first response, you can significantly reduce the impact of occasional slowdowns and provide a more consistent user experience.

This post is licensed under CC BY 4.0 by the author.