

Only one model kept 404'ing — a preview-alias time bomb meets branch asymmetry

Ascendy Engineering


TL;DR

Background — why does only one die

We had an agent chat that picks among several LLM providers. But one model — only that one — 404’d every time you picked it. The rest worked fine.

“Only one model fails” is itself a strong clue. If the shared path (auth, network, request format) were broken, everything would usually break. One model breaking alone means that model takes a different branch.

Cause 1 — branch asymmetry (fall-through)

The culprit was in the routing. The runtime model-selection middleware explicitly mapped a client only for some providers; the problematic one had no mapping. For that provider the resolver returned None, so no override was applied and the request fell through to the agent’s base model.

# Branch asymmetry: some providers explicitly mapped, the rest → None → base model
def _resolve_client(model_name: str):
    if model_name in PROVIDER_A_ALIASES:
        return client_a
    if model_name in PROVIDER_B_ALIASES:
        return client_b
    if model_name in PROVIDER_C_ALIASES:
        return client_c
    return None  # an unmapped provider → None → falls through to the agent's base model

The other providers were explicitly mapped, so they went safely to their own client. Only the one missing a mapping was exposed directly to the base model id — and that base id was the problem.
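The fall-through described above can be reproduced in a few lines. This is a minimal sketch, not our middleware: the alias table, the client names, the `BASE_MODEL` id, and all three functions are hypothetical stand-ins. The last function shows one possible way to make a missing mapping fail loudly instead of silently reaching the base model.

```python
from typing import Optional

# Hypothetical base id the agent uses when no override is applied.
BASE_MODEL = "base-model-preview"

# Hypothetical alias table: providers A, B and C are mapped, provider D is not.
CLIENTS_BY_ALIAS = {
    "provider-a-model": "client_a",
    "provider-b-model": "client_b",
    "provider-c-model": "client_c",
}


def resolve_client(model_name: str) -> Optional[str]:
    """Return the mapped client, or None when the provider has no mapping."""
    return CLIENTS_BY_ALIAS.get(model_name)


def effective_target(model_name: str) -> str:
    """Mimic the asymmetric routing: no override means the base model is used."""
    client = resolve_client(model_name)
    return client if client is not None else BASE_MODEL


def resolve_client_strict(model_name: str) -> str:
    """Alternative that raises instead of silently falling through."""
    client = resolve_client(model_name)
    if client is None:
        raise LookupError(f"no client mapped for model {model_name!r}")
    return client
```

With the strict variant, picking the unmapped model fails at resolution time with a message naming the model, rather than surfacing later as a 404 from an unrelated base id.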

Cause 2 — the base was a preview alias

The base model id the fall-through reached was pinned to a preview alias. Preview aliases can be deprecated and shut down on the provider’s schedule (ours was retired after a GA successor shipped). Once shut down, calls to that id return 404 NOT_FOUND (“no longer available”).

In our case, the provider on that fall-through path was Gemini. A preview model was promoted to GA, the old preview alias disappeared (model lifecycle is documented publicly), and that one path died with a 404. The preview id we’d hardcoded was a time bomb.

The fix — a GA id + a regression guard

The immediate fix was a one-liner: replace the base with a GA id. On top of that, we swept the codebase and replaced models from a soon-to-sunset generation too.

The key part is the regression guard we added next. We did not pin “the currently correct model id”: that id will itself be retired someday, so a test that pins it is brittle and breaks at every generation bump. Instead, the guard checks for forbidden patterns.

# Regression guard: don't pin the right id, forbid the retire/preview pattern
for client in configured_model_clients:
    mid = model_id(client)
    assert "preview" not in mid          # forbid preview aliases that can be deprecated/shut down
    assert SUNSET_GENERATION not in mid  # forbid a soon-to-be-retired generation

Model ids are inherently perishable. This guard doesn’t catch every perishable id — it blocks known risky patterns (preview aliases, a specific sunset generation) and complements the provider’s lifecycle monitoring. Still, blocking that pattern without breaking on every generation bump beats pinning the exact answer.
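As a self-contained illustration, the same guard can be written as a small helper plus a test function. This is a sketch under stated assumptions: `find_perishable_ids`, the `"gen-1"` value for `SUNSET_GENERATION`, and the sample ids are all hypothetical, and a real test would read the ids from the configured clients instead of a literal list.

```python
from typing import Iterable, List

# Hypothetical marker for a generation scheduled for retirement.
SUNSET_GENERATION = "gen-1"
FORBIDDEN_MARKERS = ("preview", SUNSET_GENERATION)


def find_perishable_ids(model_ids: Iterable[str]) -> List[str]:
    """Return the ids that contain a forbidden marker (case-insensitive)."""
    return [
        mid
        for mid in model_ids
        if any(marker in mid.lower() for marker in FORBIDDEN_MARKERS)
    ]


def test_no_perishable_model_ids() -> None:
    # Stand-in for the ids collected from the configured model clients.
    configured = ["vendor-gen-2-pro", "vendor-gen-3-flash"]
    offenders = find_perishable_ids(configured)
    assert offenders == [], f"perishable model ids configured: {offenders}"
```

Returning the list of offenders, rather than asserting inside the loop, lets the failure message name every risky id at once. Substring matching is deliberately coarse, so a marker should be chosen so that it does not also match newer generations.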

How to avoid the same trap next time

What’s next


Authorship & citation: This post was written by Ascendy Engineering and may be re-cited with attribution. If you find an error, please let us know via a GitHub issue.


Tags: llm, model-lifecycle, regression-testing, incident-prevention, multi-provider