If your Discord webhook starts returning 429 Too Many Requests, it’s not a bug: it’s Discord telling you to slow down. Keep ignoring it and you risk a temporary IP-level block that silences every webhook sent from that address. Get it right and you can push thousands of messages per minute reliably, spread across webhooks and channels.

This guide covers exactly how Discord’s rate limits work for webhooks in 2026, the headers you must read, and the retry strategy that survives bursts and sustained load.

TL;DR — The Numbers You Need

Per-webhook limit: ~30 requests per 60 seconds (per webhook URL)
Per-channel limit: 5 requests per 5 seconds (shared across all webhooks in the same channel)
Global limit: 50 requests per second (per IP / token)
On 429: read Retry-After and wait that many seconds before retrying
On X-RateLimit-Global: true: stop all requests for the cooldown — not just the one that failed
Cloudflare ban: more than 10,000 invalid requests (401, 403, or 429 responses) in 10 minutes → IP temporarily blocked (commonly reported as about 1 hour)

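Treat the per-webhook and per-channel figures as observed values, not guarantees: Discord’s documentation advises against hard-coding limits. Always parse the response headers; hardcoded sleeps are fragile.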

How Rate Limit Headers Work

Every webhook response includes these headers:

X-RateLimit-Limit: 5
X-RateLimit-Remaining: 4
X-RateLimit-Reset: 1714499200.123
X-RateLimit-Reset-After: 1.234
X-RateLimit-Bucket: 80c17d2f203122d936070c88c8d10f33

Header	Meaning
`X-RateLimit-Limit`	Total requests allowed in this bucket
`X-RateLimit-Remaining`	Requests left before hitting the limit
`X-RateLimit-Reset`	Unix timestamp (seconds) when the bucket refills
`X-RateLimit-Reset-After`	Seconds until refill (preferred — clock-skew-safe)
`X-RateLimit-Bucket`	Bucket hash — group requests by this, not by route

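Use X-RateLimit-Reset-After, not X-RateLimit-Reset. Both values come from Discord, but Reset-After is a relative duration, so it stays correct even when your machine’s clock has drifted. Reset is an absolute timestamp that you would have to compare against your local clock.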
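
As a rough illustration of reading these headers, the sketch below collects them into a small record and decides whether to pause. `BucketState`, `parse_rate_limit_headers`, and `seconds_to_wait` are hypothetical names invented for this example, not part of any Discord library. It assumes a plain mapping of header names to strings with the casing shown above (the `headers` mapping on a `requests` response is case-insensitive, so it should work there too).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class BucketState:
    """Snapshot of one rate-limit bucket (hypothetical helper type)."""
    bucket: Optional[str]
    limit: Optional[int]
    remaining: Optional[int]
    reset_after: Optional[float]  # seconds until the bucket refills


def parse_rate_limit_headers(headers: Mapping[str, str]) -> BucketState:
    """Read the X-RateLimit-* headers; any missing header becomes None."""
    def _get(name, cast):
        raw = headers.get(name)
        return cast(raw) if raw is not None else None

    return BucketState(
        bucket=headers.get("X-RateLimit-Bucket"),
        limit=_get("X-RateLimit-Limit", int),
        remaining=_get("X-RateLimit-Remaining", int),
        reset_after=_get("X-RateLimit-Reset-After", float),
    )


def seconds_to_wait(state: BucketState) -> float:
    """Pause only when the bucket is exhausted and a refill time is known."""
    if state.remaining == 0 and state.reset_after is not None:
        return state.reset_after
    return 0.0


# Example with the header values shown above, but with the bucket exhausted
state = parse_rate_limit_headers({
    "X-RateLimit-Limit": "5",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset-After": "1.234",
    "X-RateLimit-Bucket": "80c17d2f203122d936070c88c8d10f33",
})
print(seconds_to_wait(state))
```

Keying stored state by `state.bucket` rather than by URL follows the table’s advice to group requests by bucket hash.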

The 429 Response Body

When you exceed a limit, you get HTTP 429 with a JSON body:

{
  "message": "You are being rate limited.",
  "retry_after": 0.523,
  "global": false,
  "code": 0
}

retry_after is in seconds as a float (millisecond precision since 2020)
global: true means the global 50/s limit was hit — back off everything
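code: Discord’s JSON error code, not a ban indicator (30007, for example, means the channel has reached its maximum number of webhooks). A Cloudflare-level block typically arrives as a 429 without this JSON body or the X-RateLimit-* headers, so don’t assume the body parses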
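
Because a 429 body is not guaranteed to be the JSON shown above (for instance, if something in front of Discord answers instead), it can help to parse it defensively. The following is a minimal sketch under that assumption. `parse_429` and its `default_wait` parameter are hypothetical names for this example; it takes the raw body text and a header mapping rather than any specific HTTP client’s response object.

```python
import json
from typing import Mapping, Tuple


def parse_429(body_text: str, headers: Mapping[str, str],
              default_wait: float = 1.0) -> Tuple[float, bool]:
    """Return (seconds_to_wait, is_global) for a 429 response."""
    try:
        data = json.loads(body_text)
    except ValueError:
        data = None
    if not isinstance(data, dict):
        data = {}  # non-JSON or unexpected body: fall back to headers

    # Either the body flag or the X-RateLimit-Global header marks a global limit
    is_global = bool(data.get("global", False)) or (
        headers.get("X-RateLimit-Global", "").lower() == "true"
    )

    # Prefer the body's retry_after, then the Retry-After header, then a default
    raw = data.get("retry_after")
    if raw is None:
        raw = headers.get("Retry-After")
    try:
        wait = float(raw)
    except (TypeError, ValueError):
        wait = default_wait
    return wait, is_global


sample = '{"message": "You are being rate limited.", "retry_after": 0.523, "global": false, "code": 0}'
print(parse_429(sample, {}))
```

The returned wait is in seconds, so it can be passed directly to `time.sleep()`.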

Minimal Safe Sender (Python)

import time
import requests

WEBHOOK_URL = "https://discord.com/api/webhooks/ID/TOKEN"

def send(content: str, max_retries: int = 5):
    payload = {"content": content}
    for attempt in range(max_retries):
        r = requests.post(WEBHOOK_URL, json=payload, timeout=10)

        # Success
        if r.status_code in (200, 204):
            remaining = r.headers.get("X-RateLimit-Remaining")
            reset_after = r.headers.get("X-RateLimit-Reset-After")
            # Proactively pause if we're about to hit the limit
            if remaining == "0" and reset_after:
                time.sleep(float(reset_after) + 0.05)
            return True

        # Rate limited
        if r.status_code == 429:
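            try:
                data = r.json()
            except ValueError:
                data = {}  # body was not JSON (e.g. answered by a proxy)
            # Prefer the body's retry_after, then the Retry-After header
            wait = float(data.get("retry_after") or r.headers.get("Retry-After") or 1)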
            is_global = data.get("global", False)
            print(f"429 — waiting {wait:.2f}s (global={is_global})")
            time.sleep(wait + 0.05)  # tiny buffer for jitter
            continue

        # Server error → exponential backoff
        if 500 <= r.status_code < 600:
            backoff = (2 ** attempt) + 0.5
            time.sleep(backoff)
            continue

        # Bad request — no point retrying
        r.raise_for_status()

    return False

send("Production deploy completed")

Key behaviors:

Honors retry_after from the 429 body
Adds 50ms buffer to avoid edge-case re-failures
Pauses proactively when X-RateLimit-Remaining hits 0
Falls back to exponential backoff on 5xx errors

Per-Channel vs Per-Webhook Buckets

This trips up most people. Two different webhooks pointing to the same channel share a per-channel rate limit. So if you have:

Webhook A → #alerts (used by CI)
Webhook B → #alerts (used by monitoring)

…and both burst-send simultaneously, you’ll hit 5 requests / 5 seconds faster than you’d expect.

Solution: route different message classes to different channels, or queue and serialize sends through a single worker per channel.
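
One way to serialize sends per channel is a FIFO queue with a single worker thread for each channel. The sketch below is illustrative only. `ChannelSender`, its methods, the channel label, and the placeholder webhook URLs are all hypothetical, and the actual HTTP call is injected as `send_fn` so that the retry logic from the earlier sender can be plugged in.

```python
import queue
import threading
from typing import Callable, Dict


class ChannelSender:
    """One FIFO queue and one worker thread per channel (hypothetical helper)."""

    def __init__(self, send_fn: Callable[[str, dict], None]):
        self._send_fn = send_fn  # e.g. a function that POSTs payload to a webhook URL
        self._queues: Dict[str, queue.Queue] = {}
        self._lock = threading.Lock()

    def submit(self, channel: str, webhook_url: str, payload: dict) -> None:
        """Queue a message; the first message for a channel starts its worker."""
        with self._lock:
            q = self._queues.get(channel)
            if q is None:
                q = queue.Queue()
                self._queues[channel] = q
                threading.Thread(target=self._worker, args=(q,), daemon=True).start()
        q.put((webhook_url, payload))

    def _worker(self, q: queue.Queue) -> None:
        while True:
            webhook_url, payload = q.get()
            try:
                # Only this thread sends for the channel, so sends never overlap
                self._send_fn(webhook_url, payload)
            except Exception as exc:
                print(f"send failed: {exc}")
            finally:
                q.task_done()

    def wait(self) -> None:
        """Block until every queued message has been handed to send_fn."""
        with self._lock:
            queues = list(self._queues.values())
        for q in queues:
            q.join()


# Demo with a fake send function; both webhooks target the same channel
sent = []
sender = ChannelSender(lambda url, payload: sent.append((url, payload["content"])))
sender.submit("#alerts", "WEBHOOK_A_URL", {"content": "CI build passed"})
sender.submit("#alerts", "WEBHOOK_B_URL", {"content": "CPU usage high"})
sender.wait()
```

Because Webhook A and Webhook B share one queue here, a burst from CI and a burst from monitoring are sent one after another instead of racing for the same per-channel budget.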

Exponential Backoff with Jitter (JavaScript)

For high-throughput systems, add jitter so multiple workers don’t retry in lockstep:

async function send(payload, attempt = 0) {
  const res = await fetch(process.env.WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });

  if (res.ok) return res;

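  if (res.status === 429 && attempt < 5) {
    // The body may not be JSON if something other than Discord answered
    const body = await res.json().catch(() => ({}));
    const retryAfter = Number(body.retry_after ?? res.headers.get('Retry-After') ?? 1);
    const wait = (retryAfter + Math.random() * 0.1) * 1000; // seconds → ms
    await new Promise(r => setTimeout(r, wait));
    return send(payload, attempt + 1);
  }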

  if (res.status >= 500 && attempt < 5) {
    const backoff = (2 ** attempt + Math.random()) * 1000;
    await new Promise(r => setTimeout(r, backoff));
    return send(payload, attempt + 1);
  }

  throw new Error(`Webhook failed: ${res.status}`);
}

Adding Math.random() to the wait time prevents the thundering herd when many clients retry the same 429.
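
For Python senders, the same idea can be sketched as two small delay helpers. This version uses "full jitter" (a uniform draw between zero and a capped exponential ceiling), which is a common variant and not the additive jitter shown in the JavaScript above. The function names, `base`, `cap`, and `max_jitter` are assumptions chosen for illustration.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff for 5xx errors: uniform in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


def rate_limit_delay(retry_after: float, max_jitter: float = 0.1,
                     rng: Optional[random.Random] = None) -> float:
    """For 429s: never wait less than retry_after; add a small random extra."""
    source = rng if rng is not None else random
    return retry_after + source.uniform(0.0, max_jitter)


# Delays are in seconds and can be passed straight to time.sleep()
for attempt in range(4):
    print(attempt, round(backoff_delay(attempt), 3))
```

Jitter for a 429 is only ever added on top of `retry_after`, because retrying earlier than Discord asked would just produce another 429.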

Token Bucket — The Production Pattern

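For sustained sending, enforce the limit locally before each request. The class below is, strictly speaking, a sliding-window log (it stores one timestamp per send) rather than a classic token bucket, but it serves the same purpose: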

import time
import threading
from collections import deque

import requests  # WEBHOOK_URL is the constant defined in the earlier sender

class WebhookLimiter:
    def __init__(self, max_per_window: int = 30, window_s: float = 60.0):
        self.max = max_per_window
        self.window = window_s
        self.timestamps: deque[float] = deque()
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop expired timestamps
                while self.timestamps and now - self.timestamps[0] > self.window:
                    self.timestamps.popleft()
                if len(self.timestamps) < self.max:
                    self.timestamps.append(now)
                    return
                wait = self.window - (now - self.timestamps[0]) + 0.01
            # Sleep outside the lock, then re-check. Re-entering acquire()
            # while holding a non-reentrant Lock would deadlock.
            time.sleep(wait)

limiter = WebhookLimiter(max_per_window=25)  # leave headroom

def send(content: str):
    limiter.acquire()
    requests.post(WEBHOOK_URL, json={"content": content}, timeout=10)

This keeps a single process under 25 sends per minute, regardless of network latency, retries, or thread count. It does not coordinate across processes or hosts; those need shared state.
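
The limiter above records a timestamp for every send. For comparison, a classic token bucket keeps only a token count that refills at a steady rate, which allows short bursts up to its capacity. The sketch below is one possible shape, not a definitive implementation. `TokenBucket`, `try_acquire`, and the injectable `clock` parameter are hypothetical names, and the rate and capacity in the usage note are assumptions to tune against the headers you actually observe.

```python
import threading
import time


class TokenBucket:
    """Classic token bucket: refills continuously, allows bursts up to capacity."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self._tokens = capacity     # start full
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self) -> None:
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> float:
        """Take one token and return 0.0, or return seconds until one is available."""
        with self._lock:
            self._refill()
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return 0.0
            return (1.0 - self._tokens) / self.rate

    def acquire(self) -> None:
        """Block until a token is available; sleeps outside the lock."""
        while True:
            wait = self.try_acquire()
            if wait == 0.0:
                return
            time.sleep(wait)
```

For example, `TokenBucket(rate=25 / 60, capacity=5)` would average about 25 sends per minute while permitting bursts of up to 5. Call `acquire()` before each POST, as with `limiter.acquire()` above.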

Checklist: Avoid the Cloudflare Ban

Always parse Retry-After — never hardcode sleeps
Treat global: true as a full-stop, not just for the failed request
Validate payloads before sending (skip Discord rejection round-trips)
Use a single queue per channel, not per webhook
Log 429 frequency — if it’s > 1% of requests, your sender logic is wrong
Cache the bucket from X-RateLimit-Bucket if you need shared state across processes
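
To illustrate the "validate payloads before sending" item, here is a minimal pre-flight check. It covers only two of Discord’s documented limits (2000 characters of content and 10 embeds per message) and assumes a JSON-only payload with no file uploads. `validate_payload` and the constant names are hypothetical, and a real validator would need more rules than this.

```python
from typing import List

MAX_CONTENT_CHARS = 2000   # documented limit for message content
MAX_EMBEDS = 10            # documented limit for embeds per message


def validate_payload(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    content = payload.get("content")
    embeds = payload.get("embeds")

    # A JSON-only message needs something to display
    if not content and not embeds:
        problems.append("payload needs non-empty 'content' or at least one embed")

    if content is not None and not isinstance(content, str):
        problems.append("'content' must be a string")
    elif content and len(content) > MAX_CONTENT_CHARS:
        problems.append(f"'content' is {len(content)} chars (max {MAX_CONTENT_CHARS})")

    if embeds is not None:
        if not isinstance(embeds, list):
            problems.append("'embeds' must be a list")
        elif len(embeds) > MAX_EMBEDS:
            problems.append(f"{len(embeds)} embeds (max {MAX_EMBEDS})")

    return problems


print(validate_payload({"content": "Production deploy completed"}))
```

Rejecting a bad payload locally costs nothing, whereas sending it costs a request and returns a 400 that no amount of retrying will fix.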

Common Mistakes

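Mistake 1: Treating retry_after as milliseconds. It’s seconds (since 2020). The header X-RateLimit-Reset-After is also seconds. Convert only at the timer boundary: pass the value straight to time.sleep() in Python, and multiply by 1000 for setTimeout in JavaScript. Getting the unit wrong either retries almost immediately (more 429s) or stalls for 1000 times too long and looks like you’re not retrying at all.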

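Mistake 2: Retrying non-429 4xx errors. Only 429, 500, 502, 503, 504 are retryable. A 400 Bad Request means your payload is invalid, so fix it rather than resend it. A 401, 403, or 404 usually means the webhook URL is wrong or the webhook was deleted. Discord counts 401, 403, and 429 responses as invalid requests, so hammering them accelerates the Cloudflare ban.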

Mistake 3: Spawning a thread per message. Threads share the same IP and same rate limit pool. You don’t get parallelism for free — you get more 429s.

Mistake 4: Ignoring X-RateLimit-Remaining: 0. The next request will fail. Pause proactively.
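
Since threads share one rate limit pool, a cooldown observed by one thread should pause all of them. One possible shape for that is a shared "do not send before" deadline, sketched below. `CooldownGate` and its methods are hypothetical names, and the injectable `clock` and `sleep` parameters exist only to make the class easy to test.

```python
import threading
import time


class CooldownGate:
    """Shared 'do not send before' deadline for all sender threads."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._resume_at = 0.0
        self._lock = threading.Lock()

    def pause_for(self, seconds: float) -> None:
        """Start or extend the cooldown; never shortens an existing one."""
        with self._lock:
            self._resume_at = max(self._resume_at, self._clock() + seconds)

    def remaining(self) -> float:
        """Seconds left in the cooldown, or 0.0 if sending is allowed."""
        with self._lock:
            return max(0.0, self._resume_at - self._clock())

    def wait_until_clear(self) -> None:
        """Call before every send; sleeps outside the lock."""
        while True:
            delay = self.remaining()
            if delay <= 0.0:
                return
            self._sleep(delay)


# Intended use (sketch):
#   gate.wait_until_clear()              # before each POST, in every thread
#   on a 429 with global: true           -> gate.pause_for(retry_after)
#   on X-RateLimit-Remaining: 0          -> gate.pause_for(reset_after)
gate = CooldownGate()
```

A single gate instance shared by every worker turns "global: true" into a full stop for the whole process rather than just for the request that failed.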

Test It in Our Builder

Want to see exactly what payload your code is sending? Open the Discord Webhook builder, craft your embed, hit “Send”, and inspect the network panel — you’ll see the headers Discord returns and can validate your retry logic against real responses.

For higher-throughput automation, also check our guides on scheduled messages and automation workflows.
