guestlist

Python library reference · v0.1

Overview

guestlist is a Python library and hosted service. For any URL it returns whether AI agents are likely to be allowed through — based on continuous probes from real browsers across many domains.

The verdict is a tier: green, yellow, orange, red, or unknown. Check before you crawl; skip the dead ends.

Install

pip install guestlist-tools

Requires Python 3.10 or newer. The only runtime dependency is httpx.

Quickstart

Set GUESTLIST_API_KEY in your environment, then:

from guestlist import check

results = check([
    "https://news.ycombinator.com",
    "https://nytimes.com",
])
for r in results:
    print(r.domain, r.tier)

Authentication

The library reads the API key from the GUESTLIST_API_KEY environment variable. You can also pass it explicitly to the Guestlist constructor:

from guestlist import Guestlist

gl = Guestlist(api_key="gst_…")

Resolution order: constructor argument, then environment variable. If neither is set, a ConfigError is raised at construction time. All requests are sent with Authorization: Bearer <api_key>.
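
The resolution order above can be sketched in a few lines. This is an illustration, not the library's source: resolve_api_key, auth_headers, and the local ConfigError class are hypothetical stand-ins so the snippet runs without guestlist installed, and treating an empty string as "not set" is an assumption.

```python
import os


class ConfigError(Exception):
    """Local stand-in for guestlist's ConfigError so the sketch is self-contained."""


def resolve_api_key(explicit=None, environ=None):
    """Return the key using the documented order: argument first, then environment."""
    env = os.environ if environ is None else environ
    # Assumption: an empty string counts as "not set" in both places.
    key = explicit or env.get("GUESTLIST_API_KEY")
    if not key:
        raise ConfigError("no API key: pass api_key=... or set GUESTLIST_API_KEY")
    return key


def auth_headers(api_key):
    """Build the Authorization header described in the text."""
    return {"Authorization": f"Bearer {api_key}"}
```

Passing environ explicitly, as the sketch allows, makes the precedence easy to unit-test without touching the real process environment.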

check(urls)

Module-level convenience function. Lazily constructs a default Guestlist client on first call and forwards.

from guestlist import check

results = check(["https://example.com", "https://example.org"])

Accepts any iterable of URL strings and returns a list[Result] in input order. Empty input returns an empty list without an HTTP request. The library auto-batches into 100-URL chunks under the hood, so passing 1,000 URLs is one call from your perspective.
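
For readers curious what the 100-URL batching might look like, here is a minimal sketch. The helper names batched and check_in_batches are hypothetical, and send_batch stands in for whatever performs the single HTTP request per chunk; the library's internals may differ.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

BATCH_SIZE = 100  # chunk size stated in the text above


def batched(urls: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` URLs, preserving input order."""
    it = iter(urls)
    while chunk := list(islice(it, size)):
        yield chunk


def check_in_batches(urls: Iterable[str], send_batch: Callable[[list[str]], list]) -> list:
    """Call send_batch once per chunk and concatenate the results in input order."""
    results = []
    for chunk in batched(urls):
        results.extend(send_batch(chunk))
    return results
```

Because batched yields nothing for an empty iterable, send_batch is never called in that case, which matches the "no HTTP request for empty input" behavior described above.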

Guestlist class

Construct your own client when you need to override the base URL, set a custom timeout, or manage multiple clients in one process.

from guestlist import Guestlist

gl = Guestlist(
    api_key="gst_…",                        # or omit to read GUESTLIST_API_KEY
    base_url="https://api.guestlist.tools", # default
    timeout=30.0,                           # seconds, per HTTP request
)
results = gl.check(urls)
gl.close()

The client is safe to use as a context manager; the __exit__ handler calls close().

with Guestlist() as gl:
    results = gl.check(urls)

Result

One frozen dataclass per input URL, in input order.

@dataclass(frozen=True)
class Result:
    url: str
    domain: str
    tier: Tier
    success_rate: float | None
    n_samples: int
    confidence: float
    blocker_detected: Blocker | None
    last_tested_at: datetime | None

  • url — the URL you sent in, echoed back.
  • domain — the registrable domain we matched against (e.g. x.com).
  • tier — the verdict. See Tiers.
  • success_rate — rolling fraction of recent probes that succeeded for that domain. None when there is no data yet.
  • n_samples — how many probes back the verdict.
  • confidence — scales with sample count, capped at 1.0.
  • blocker_detected — see Blockers. None when nothing specific was identified.
  • last_tested_at — timezone-aware UTC datetime of the most recent probe. None when there is no data yet.

When the service has no data for a domain, you will get tier=Tier.UNKNOWN, success_rate=None, n_samples=0, confidence=0.0, blocker_detected=None, and last_tested_at=None. This is common in the first weeks after launch for less-trafficked domains.

Tiers

Tier is a str enum, so comparing to the raw string works:

if r.tier == "green":
    ...
if r.tier in (Tier.GREEN, Tier.YELLOW):
    ...

Tier      Meaning
green     Almost always succeeds. Use confidently.
yellow    Usually succeeds. Expect occasional retries.
orange    Sometimes succeeds. Use a stealth or proxy fallback if you have one.
red       Rarely succeeds. Skip; it's blocked, not flaky.
unknown   Not enough probes yet. Try once, don't loop.
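
One way to turn the tier guidance into crawl decisions is sketched below. The Tier class here is a local stand-in that mirrors the five documented values so the snippet runs on its own, and plan with its action names is purely hypothetical.

```python
from enum import Enum


class Tier(str, Enum):
    """Stand-in mirroring the documented tiers; the real enum ships with guestlist."""
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"
    UNKNOWN = "unknown"


def plan(tier):
    """Map a tier to a crawl action. The action names are illustrative only."""
    tier = Tier(tier)  # accepts either a Tier member or its raw string value
    if tier is Tier.GREEN:
        return "fetch"
    if tier is Tier.YELLOW:
        return "fetch_with_retries"
    if tier is Tier.ORANGE:
        return "fetch_via_fallback"
    if tier is Tier.RED:
        return "skip"
    return "fetch_once"  # unknown: try a single time, do not loop
```

Because the enum subclasses str, Tier.GREEN == "green" evaluates to True, which is why both comparison styles work.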

Blockers

Blocker is a str enum naming the bot-protection vendor or failure mode last observed for a domain. blocker_detected is None when nothing specific was identified.

  • cloudflare
  • akamai
  • datadome
  • imperva
  • perimeterx
  • connection_failed
  • unknown

URL matching

URLs are resolved server-side to their registrable domain; the verdict is for that domain's apex page. The path, query, fragment, and subdomain are not part of the lookup today.

https://api.x.com/users/123  →  x.com
https://www.instagram.com    →  instagram.com
https://m.bbc.co.uk/world    →  bbc.co.uk

So www., m., and other subdomains collapse to the registrable domain, but distinct effective TLDs are kept separate: bbc.co.uk ≠ bbc.com. The Result.domain field shows exactly what was matched.
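
To predict locally which domain a URL will collapse to, a rough approximation is sketched below. It is a toy: the resolution actually happens server-side, registrable_domain is a hypothetical helper, and the hard-coded suffix set is an assumption standing in for the full Public Suffix List.

```python
from urllib.parse import urlsplit

# Toy suffix set for illustration only; a real resolver needs the full Public Suffix List.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}


def registrable_domain(url: str) -> str:
    """Approximate the registrable domain by dropping the path, query, and subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Keep three labels when the last two form a known multi-label suffix, otherwise two.
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])
```

Applied to the three example URLs above, the sketch yields x.com, instagram.com, and bbc.co.uk. Result.domain remains the authoritative answer.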

One consequence to keep in mind: a site like Instagram has a hard apex login wall but many deep public paths load fine — the tier reflects the apex, not the deep page. Per-path tiering is on the roadmap.

Errors

All exceptions inherit from GuestlistError. Catch the base class for a single handler, or catch the subclasses for finer control.

GuestlistError              # base class
├── ConfigError             # missing api_key, bad base_url
├── AuthenticationError     # HTTP 401
├── RateLimitError          # HTTP 429 after retry; carries .retry_after
├── APIError                # other 4xx/5xx after retries; carries .status_code, .detail
└── NetworkError            # connection or timeout error after retry

import logging
from time import sleep

from guestlist import check, GuestlistError, RateLimitError

log = logging.getLogger(__name__)

try:
    results = check(urls)
except RateLimitError as e:
    # retry_after may be missing; fall back to 60 seconds
    sleep(e.retry_after or 60)
except GuestlistError as e:
    log.warning("guestlist failed: %s", e)

Retries & timeouts

The client handles transient failures automatically before raising:

  • 429 Rate limit — one retry, honoring Retry-After up to 60 seconds, then RateLimitError.
  • 5xx Server error — two retries with exponential backoff (250 ms, then 1 s), then APIError.
  • Network or timeout — one retry, then NetworkError.
  • Other 4xx — raised immediately, no retry.

Default request timeout is 30 seconds. Override with Guestlist(timeout=…).
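
The status-code rules above can be sketched as a small loop. This is not the client's actual implementation: request_with_retries, the send callable returning (status, retry_after), and the exception classes are local stand-ins, keeping the 429 and 5xx retry counters independent is an assumption, and the network/timeout case is omitted for brevity.

```python
import time


class AuthenticationError(Exception): ...
class RateLimitError(Exception): ...
class APIError(Exception): ...


SERVER_BACKOFF = (0.25, 1.0)  # seconds: 250 ms, then 1 s
MAX_RETRY_AFTER = 60.0        # cap on the honored Retry-After value


def request_with_retries(send, sleep=time.sleep):
    """Call send() until it succeeds or the documented retry budget is spent.

    send() is a hypothetical callable returning (status_code, retry_after_or_None).
    """
    rate_limit_retries = 0
    server_retries = 0
    while True:
        status, retry_after = send()
        if status == 429:
            if rate_limit_retries >= 1:  # one retry only
                raise RateLimitError(retry_after)
            rate_limit_retries += 1
            sleep(min(retry_after or 0.0, MAX_RETRY_AFTER))
        elif 500 <= status < 600:
            if server_retries >= len(SERVER_BACKOFF):  # two retries only
                raise APIError(status)
            sleep(SERVER_BACKOFF[server_retries])
            server_retries += 1
        elif status == 401:
            raise AuthenticationError(status)
        elif 400 <= status < 500:
            raise APIError(status)  # other 4xx: raised immediately, no retry
        else:
            return status
```

Injecting sleep as a parameter lets a test record the waits instead of actually pausing.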