Production chatbot (FastAPI) - Tex | Memory API for agents

This recipe puts the quickstart loop behind one HTTP route:

1. Recall memory for the incoming message.
2. Call your model.
3. Store the user and assistant turns.

Create one Tex client per process and reuse it across requests.

Layout

File	Purpose
`deps.py`	Cached Tex client
`chat.py`	`/chat` route
`main.py`	FastAPI app entry

Lay out the package

Create this structure:

app/
├── deps.py        # cached Tex client
├── main.py        # FastAPI entry
└── chat.py        # /chat route

Rename app/ if you want, but use the same package name in the uvicorn command (for example, app.main:app becomes bot.main:app).

Cache your Tex client

Read secrets from the environment and construct Tex once:

deps.py

from functools import cache
from tex import Tex
import os

@cache
def tex_client() -> Tex:
    return Tex(
        api_key=os.environ["TEX_API_KEY"],
        base_url=os.environ.get(
            "TEX_BASE_URL", "https://api.getmetacognition.com"
        ),
        timeout=10,
    )

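Pass tex_client itself to FastAPI's Depends(...), as in Depends(tex_client), rather than calling it. Because the function is cached, every route receives the same Tex instance and shares its connection pool.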
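To see why the @cache decorator gives every request the same client, here is a minimal sketch using a stand-in class. FakeClient is hypothetical and only counts constructions; it is not the Tex SDK. functools.cache memoizes the zero-argument factory, so the constructor should run once per process.

```python
from functools import cache


class FakeClient:
    """Hypothetical stand-in for Tex; counts how many times it is built."""

    instances = 0

    def __init__(self) -> None:
        FakeClient.instances += 1


@cache
def fake_client() -> FakeClient:
    # The body runs on the first call only; later calls return the cached object.
    return FakeClient()


first = fake_client()
second = fake_client()
print(first is second, FakeClient.instances)
```

In tests, calling fake_client.cache_clear() drops the cached instance so the next call builds a fresh one.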

Build the chat route

Derive session_id from the user and the chat. Recall with a small top_k. If Tex times out or quota is exhausted, answer without memory. Then store both sides of the turn:

chat.py

from datetime import datetime, timezone
from fastapi import APIRouter, Depends, Header
from pydantic import BaseModel
from tex import Tex, RateLimitError, APITimeoutError
from .deps import tex_client

router = APIRouter()

def now_iso() -> str:
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

class ChatBody(BaseModel):
    text: str
    session_id: str

@router.post("/chat")
def chat(
    body: ChatBody,
    x_user_id: str = Header(...),
    tex: Tex = Depends(tex_client),
):
    sid = f"u_{x_user_id}-{body.session_id}"

    memory: list[str] = []
    try:
        hits = tex.recall(q=body.text, session_id=sid, top_k=5)
        memory = [h.text for h in hits.hits.turns]
    except (RateLimitError, APITimeoutError):
        memory = []

    answer = your_llm.complete(body.text, memory=memory)

    tex.conversations.remember(
        session_id=sid,
        turns=[
            {"role": "user", "text": body.text, "timestamp": now_iso()},
            {"role": "assistant", "text": answer, "timestamp": now_iso()},
        ],
    )

    return {"answer": answer}

Replace your_llm.complete(...) with your model call.
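How you pass memory to the model depends on your provider. As one possible shape, the sketch below folds recalled snippets into a single prompt string. build_prompt and its wording are hypothetical and not part of Tex or any model SDK.

```python
def build_prompt(user_text: str, memory: list[str]) -> str:
    """Hypothetical helper: prepend recalled snippets to the user message."""
    if not memory:
        # Recall failed or found nothing: send the message unchanged.
        return user_text
    context = "\n".join(f"- {snippet}" for snippet in memory)
    return f"Relevant earlier turns:\n{context}\n\nUser: {user_text}"


print(build_prompt("What did I order?", ["I ordered a blue kettle."]))
```

With an empty memory list the helper returns the message as is, which matches the route's fallback when recall is skipped.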

Expose the app

Mount the router once:

main.py

from fastapi import FastAPI
from .chat import router

app = FastAPI()
app.include_router(router)

Run locally

Export your key and launch uvicorn:

export TEX_API_KEY=tex_live_...
uvicorn app.main:app --reload

Send POST /chat with JSON {"text":"...","session_id":"..."} and header x-user-id.
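As a sketch of that request from Python, the standard library is enough. The URL assumes uvicorn's default local address (http://127.0.0.1:8000), and the user, session, and message values are made-up examples. build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, user_id: str, session_id: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request matching the route above."""
    payload = json.dumps({"text": text, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json", "x-user-id": user_id},
    )


req = build_chat_request("http://127.0.0.1:8000", "42", "support", "Hello again")

# To send it against a running server:
# with urllib.request.urlopen(req, timeout=15) as resp:
#     print(json.load(resp)["answer"])
```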

Full files

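If you prefer one copy block, paste these files. Only deps.py appears below; chat.py and main.py are identical to the listings in the steps above.

deps.py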

from functools import cache
from tex import Tex
import os

@cache
def tex_client() -> Tex:
    return Tex(
        api_key=os.environ["TEX_API_KEY"],
        base_url=os.environ.get(
            "TEX_BASE_URL", "https://api.getmetacognition.com"
        ),
        timeout=10,
    )

Production tweaks

Run `remember` in the background

Do not make the user wait for remember. Hand it to FastAPI's BackgroundTasks, which runs the call after the response has been sent:

from fastapi import BackgroundTasks

@router.post("/chat")
def chat(
    body: ChatBody,
    bg: BackgroundTasks,
    x_user_id: str = Header(...),
    tex: Tex = Depends(tex_client),
):
    # ... recall + answer as in chat.py, producing sid, answer,
    # and the user_turn / assistant_turn dicts ...
    bg.add_task(
        tex.conversations.remember,
        session_id=sid,
        turns=[user_turn, assistant_turn],
    )
    return {"answer": answer}
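The snippet above refers to user_turn and assistant_turn without defining them. A minimal sketch of building them, assuming the same dict shape and timestamp format as chat.py (build_turns is a hypothetical helper name):

```python
from datetime import datetime, timezone


def now_iso() -> str:
    # Same format as chat.py: UTC, ISO 8601, with a trailing "Z".
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")


def build_turns(user_text: str, answer: str) -> list[dict[str, str]]:
    """Hypothetical helper: the two turn dicts passed to remember."""
    user_turn = {"role": "user", "text": user_text, "timestamp": now_iso()}
    assistant_turn = {"role": "assistant", "text": answer, "timestamp": now_iso()}
    return [user_turn, assistant_turn]


turns = build_turns("Hi", "Hello!")
print([t["role"] for t in turns])
```

Building the turns before calling bg.add_task means the timestamps reflect when the exchange happened, not when the background task eventually runs.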

Bound recall latency

Lower the client timeout, for example Tex(timeout=2.0) instead of the timeout=10 used in deps.py, and catch APITimeoutError. If recall is slow, the route then answers without memory instead of blocking the whole chat request.
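The fallback can be factored into a small wrapper. The sketch below is generic: recall_or_empty is a hypothetical helper, and the built-in TimeoutError stands in for the SDK exceptions so the example runs without Tex installed. In the real route you would pass (RateLimitError, APITimeoutError) as the fallback exceptions.

```python
import logging
from typing import Callable

logger = logging.getLogger("chat")


def recall_or_empty(
    recall: Callable[[], list[str]],
    fallback_on: tuple[type[BaseException], ...] = (TimeoutError,),
) -> list[str]:
    """Hypothetical helper: run recall, return [] if it fails in an expected way."""
    try:
        return recall()
    except fallback_on as exc:
        # Degrade to a memory-less answer and leave a trace for operators.
        logger.warning("recall skipped: %r", exc)
        return []


def slow_recall() -> list[str]:
    # Simulates a recall call that exceeds the client timeout.
    raise TimeoutError("simulated slow recall")


print(recall_or_empty(slow_recall))
print(recall_or_empty(lambda: ["earlier turn"]))
```

Only the listed exception types trigger the fallback; anything unexpected still propagates, so real bugs are not silently hidden.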

Add a health probe

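from fastapi import Depends
from fastapi.responses import JSONResponse
from tex import Tex
from .deps import tex_client

@app.get("/healthz")
def healthz(tex: Tex = Depends(tex_client)):
    try:
        tex.usage.today()  # confirms Tex is reachable with this key
        return {"ok": True}
    except Exception as e:
        # FastAPI does not read a (body, status) tuple as a status code, so set 503 explicitly.
        return JSONResponse(status_code=503, content={"ok": False, "error": str(e)})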
