guestlist
Python library reference · v0.1
Overview
guestlist is a Python library and hosted service. For any URL it returns whether AI agents are likely to be allowed through — based on continuous probes from real browsers across many domains.
The verdict is a tier: green, yellow, orange, red, or unknown. Check before you crawl; skip the dead ends.
Install
pip install guestlist-tools
Requires Python 3.10 or newer. The only runtime dependency is httpx.
Quickstart
Set GUESTLIST_API_KEY in your environment, then:
from guestlist import check
results = check([
    "https://news.ycombinator.com",
    "https://nytimes.com",
])
for r in results:
    print(r.domain, r.tier)
Authentication
The library reads the API key from the GUESTLIST_API_KEY environment variable. You can also pass it explicitly to the Guestlist constructor:
from guestlist import Guestlist
gl = Guestlist(api_key="gst_…")
Resolution order: constructor argument, then environment variable. If neither is set, a ConfigError is raised at construction time. All requests are sent with Authorization: Bearer <api_key>.
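The resolution order can be pictured with a short stand-alone sketch. It does not import guestlist: `resolve_api_key`, `auth_headers`, and the local `ConfigError` class are hypothetical names used only to illustrate "constructor argument first, then environment variable", and the library's internals may differ. Treating an empty string as "not set" is also an assumption.

```python
import os
from typing import Mapping, Optional


class ConfigError(Exception):
    """Local stand-in for guestlist's ConfigError, for illustration only."""


def resolve_api_key(explicit: Optional[str] = None,
                    environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the explicit key if given, else GUESTLIST_API_KEY, else raise."""
    env = os.environ if environ is None else environ
    # Assumption: an empty string counts as "not set".
    key = explicit or env.get("GUESTLIST_API_KEY")
    if not key:
        raise ConfigError("no API key: pass api_key or set GUESTLIST_API_KEY")
    return key


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header in the documented Bearer format."""
    return {"Authorization": f"Bearer {api_key}"}
```

Passing `environ` explicitly keeps the sketch easy to test without touching the real process environment.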
check(urls)
Module-level convenience function. Lazily constructs a default Guestlist client on first call and forwards.
from guestlist import check
results = check(["https://example.com", "https://example.org"])
Accepts any iterable of URL strings and returns a list[Result] in input order. Empty input returns an empty list without an HTTP request. The library auto-batches into 100-URL chunks under the hood, so passing 1,000 URLs is one call from your perspective.
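To make the batching behavior concrete, here is a minimal sketch of splitting an iterable into 100-URL chunks while preserving input order. The helper name `chunked` is hypothetical and this is not the library's own code; it only shows the idea, including why empty input produces no batches and therefore no request.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(urls: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` URLs, preserving input order."""
    it = iter(urls)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return  # input exhausted; empty input yields nothing at all
        yield batch


urls = [f"https://example.com/{i}" for i in range(250)]
batches = list(chunked(urls))
print([len(b) for b in batches])  # [100, 100, 50]
```

Because the helper consumes a plain iterator, it works with generators as well as lists.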
Guestlist class
Construct your own client when you need to override the base URL, set a custom timeout, or manage multiple clients in one process.
from guestlist import Guestlist
gl = Guestlist(
    api_key="gst_…",  # or omit to read GUESTLIST_API_KEY
    base_url="https://api.guestlist.tools",  # default
    timeout=30.0,  # seconds, per HTTP request
)
results = gl.check(urls)
gl.close()
The client is safe to use as a context manager; the __exit__ handler calls close().
with Guestlist() as gl:
    results = gl.check(urls)
Result
One frozen dataclass per input URL, in input order.
@dataclass(frozen=True)
class Result:
    url: str
    domain: str
    tier: Tier
    success_rate: float | None
    n_samples: int
    confidence: float
    blocker_detected: Blocker | None
    last_tested_at: datetime | None
- url: the URL you sent in, echoed back.
- domain: the registrable domain we matched against (e.g. x.com).
- tier: the verdict. See Tiers.
- success_rate: rolling fraction of recent probes that succeeded for that domain. None when there is no data yet.
- n_samples: how many probes back the verdict.
- confidence: scales with sample count, capped at 1.0.
- blocker_detected: see Blockers. None when nothing specific was identified.
- last_tested_at: timezone-aware UTC datetime of the most recent probe. None when there is no data yet.
When the service has no data for a domain, you will get tier=Tier.UNKNOWN, success_rate=None, n_samples=0, confidence=0.0, blocker_detected=None, and last_tested_at=None. This is common in the first weeks after launch for less-trafficked domains.
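Since success_rate can be None, code that formats or sorts results needs a guard. The sketch below uses a local stand-in dataclass, `FakeResult`, with a subset of the documented fields so that it runs without the library; `FakeResult` and `describe` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeResult:
    """Stand-in with a subset of the documented Result fields (illustration only)."""
    domain: str
    tier: str
    success_rate: Optional[float]
    n_samples: int
    confidence: float


def describe(r) -> str:
    """Format a result, handling the no-data case where success_rate is None."""
    if r.success_rate is None:
        return f"{r.domain}: no data yet"
    return f"{r.domain}: {r.tier}, {r.success_rate:.0%} over {r.n_samples} probes"
```

Checking `success_rate is None` rather than truthiness matters here, because a measured rate of 0.0 is real data and should not be reported as missing.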
Tiers
Tier is a str enum, so comparing to the raw string works:
if r.tier == "green":
    ...
if r.tier in (Tier.GREEN, Tier.YELLOW):
    ...
| Tier | Meaning |
|---|---|
| green | Almost always succeeds. Use confidently. |
| yellow | Usually succeeds. Expect occasional retries. |
| orange | Sometimes succeeds. Use a stealth or proxy fallback if you have one. |
| red | Rarely succeeds. Skip — it's blocked, not flaky. |
| unknown | Not enough probes yet. Try once, don't loop. |
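
One way to act on the table is a small lookup from tier to crawl action. The sketch below defines a local `Tier` stand-in that mirrors the documented values so it runs without the library; the action labels and the `plan` helper are hypothetical, so substitute whatever your crawler actually does.

```python
from enum import Enum


class Tier(str, Enum):
    """Local stand-in mirroring the documented tier values."""
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"
    UNKNOWN = "unknown"


# Hypothetical action labels, one per tier, following the table's guidance.
ACTIONS = {
    Tier.GREEN: "fetch",
    Tier.YELLOW: "fetch_with_retries",
    Tier.ORANGE: "fetch_via_fallback",
    Tier.RED: "skip",
    Tier.UNKNOWN: "fetch_once",
}


def plan(tier) -> str:
    """Map a tier (enum member or raw string) to an action label."""
    return ACTIONS[Tier(tier)]
```

Because the enum subclasses str, `plan("red")` and `plan(Tier.RED)` resolve to the same entry.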
Blockers
Blocker is a str enum naming the bot-protection vendor or failure mode last observed for a domain. blocker_detected is None when nothing specific was identified.
- cloudflare
- akamai
- datadome
- imperva
- perimeterx
- connection_failed
- unknown
URL matching
URLs are resolved server-side to their registrable domain; the verdict is for that domain's apex page. The path, query, fragment, and subdomain are not part of the lookup today.
https://api.x.com/users/123 → x.com
https://www.instagram.com → instagram.com
https://m.bbc.co.uk/world → bbc.co.uk
So www., m., and other subdomains collapse to the registrable domain, but distinct effective TLDs are kept separate: bbc.co.uk ≠ bbc.com. The Result.domain field shows exactly what was matched.
One consequence to keep in mind: a site like Instagram has a hard apex login wall but many deep public paths load fine — the tier reflects the apex, not the deep page. Per-path tiering is on the roadmap.
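Because several input URLs can collapse to the same domain, it may be useful to group them client-side before calling check. The sketch below approximates the collapse with a tiny hard-coded set of multi-label suffixes. That set is an assumption for illustration: a faithful implementation would consult the full Public Suffix List, and the service's own matching is authoritative. `registrable_domain` and `dedupe_by_domain` are hypothetical helper names.

```python
from urllib.parse import urlsplit

# Tiny illustrative subset; the real Public Suffix List has thousands of entries.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}


def registrable_domain(url: str) -> str:
    """Approximate the registrable domain of a URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Keep three labels when the last two form a known multi-label suffix.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def dedupe_by_domain(urls):
    """Keep the first URL seen for each approximate registrable domain."""
    first_seen = {}
    for u in urls:
        first_seen.setdefault(registrable_domain(u), u)
    return list(first_seen.values())
```

After a real call, compare against Result.domain rather than this approximation, since that field shows exactly what the service matched.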
Errors
All exceptions inherit from GuestlistError. Catch the base class for a single handler, or catch the subclasses for finer control.
GuestlistError              # base class
├── ConfigError             # missing api_key, bad base_url
├── AuthenticationError     # HTTP 401
├── RateLimitError          # HTTP 429 after retry; carries .retry_after
├── APIError                # other 4xx/5xx after retries; carries .status_code, .detail
└── NetworkError            # connection or timeout error after retry
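import logging
from time import sleep
from guestlist import check, GuestlistError, RateLimitError
log = logging.getLogger(__name__)
try:
    results = check(urls)
except RateLimitError as e:
    sleep(e.retry_after or 60)  # fall back to 60 s when retry_after is unset
except GuestlistError as e:
    log.warning("guestlist failed: %s", e)
Retries & timeouts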
The client handles transient failures automatically before raising:
- 429 Rate limit: one retry, honoring Retry-After up to 60 seconds, then RateLimitError.
- 5xx Server error: two retries with exponential backoff (250 ms, then 1 s), then APIError.
- Network or timeout: one retry, then NetworkError.
- Other 4xx: raised immediately, no retry.
Default request timeout is 30 seconds. Override with Guestlist(timeout=…).
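The retry rules can be summarized as a function from failure type to the waits before each retry. This is a sketch under stated assumptions, not the client's actual code: `retry_delays` is a hypothetical name, the wait before the single network retry is not documented (0 is used here), "up to 60 seconds" is read as a cap on Retry-After, and a missing Retry-After header is treated as a wait of 0.

```python
from typing import List, Optional

MAX_RETRY_AFTER = 60.0        # documented upper bound for honoring Retry-After
SERVER_BACKOFF = [0.25, 1.0]  # documented 5xx backoff: 250 ms, then 1 s


def retry_delays(status: Optional[int],
                 retry_after: Optional[float] = None) -> List[float]:
    """Return the wait (seconds) before each retry of a failed request.

    status=None stands for a network or timeout failure.
    An empty list means the error is raised immediately.
    """
    if status is None:
        return [0.0]  # one retry; the wait itself is an assumption
    if status == 429:
        # Assumption: cap Retry-After at 60 s; a missing header waits 0 s.
        return [min(retry_after or 0.0, MAX_RETRY_AFTER)]
    if 500 <= status <= 599:
        return list(SERVER_BACKOFF)
    return []  # other 4xx: no retry
```

The length of the returned list is the number of retries, so a 5xx failure implies up to three attempts in total before APIError is raised.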