Add a Chat Agent
The AI Gateway tutorial wired a
LanguageModel against your gateway and called generateText /
streamText from a single Worker route. Real chat apps need one
more thing: somewhere to keep the conversation between turns. This
tutorial wires that piece — a Durable Object whose state.storage
backs Effect’s Chat.Persistence, so each chat instance is its own
resumable session.
Reuse the gateway as a LanguageModel
The same aiGateway.model({...}) call from the previous tutorial
gives us a Layer<LanguageModel.LanguageModel, …> — provider-
agnostic, so the chat code below works against any Effect AI provider
once you swap the layer.
```ts
import * as Cloudflare from "alchemy/Cloudflare";
import { Gateway } from "./AiGateway.ts";

const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);
const languageModel = aiGateway.model({
  client: aiGateway,
  model: "@cf/meta/llama-3.1-8b-instruct",
  parameters: { temperature: 0.7, maxTokens: 1024 },
});
```

Generate one-shot text
Provide the layer to a route and call LanguageModel.generateText:
```ts
import * as Cloudflare from "alchemy/Cloudflare";
import * as Effect from "effect/Effect";
import { LanguageModel } from "effect/unstable/ai";
import { HttpServerRequest } from "effect/unstable/http/HttpServerRequest";
import * as HttpServerResponse from "effect/unstable/http/HttpServerResponse";
import { Gateway } from "./AiGateway.ts";

export default class Api extends Cloudflare.Worker<Api>()(
  "Api",
  { main: import.meta.path },
  Effect.gen(function* () {
    const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);
    const languageModel = aiGateway.model({
      client: aiGateway,
      model: "@cf/meta/llama-3.1-8b-instruct",
    });

    return {
      fetch: Effect.gen(function* () {
        const request = yield* HttpServerRequest;
        const url = new URL(request.url, "http://api");
        const prompt = url.searchParams.get("prompt") ?? "Say hello.";

        if (url.pathname === "/generate") {
          const response = yield* LanguageModel.generateText({ prompt }).pipe(
            Effect.orDie,
          );
          return yield* HttpServerResponse.json({ text: response.text });
        }

        return HttpServerResponse.text("Not Found", { status: 404 });
      }).pipe(Effect.provide(languageModel)),
    };
  }).pipe(Effect.provide(Cloudflare.AiGatewayBindingLive)),
) {}
```

Effect.provide(languageModel) installs the model layer for the
fetch handler. From there LanguageModel.generateText (and
streamText) Just Work — typed errors, structured response parts,
and full Effect composition.
Stateful chat needs persistence
A one-shot generateText forgets everything after it returns. To
have a conversation, you need to keep [user, assistant, user, …]
turns and replay them on each request. Effect ships exactly that
abstraction — Chat.Service — but it needs somewhere to store the
history. The natural fit on Cloudflare is a Durable Object: each DO
instance is one chat session, and its state.storage survives
restarts and hibernation.
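Stripped of Effect and Cloudflare, the replay loop can be sketched in a few lines of plain TypeScript. Everything here is illustrative (the Turn type, makeChatSession, and the stub callModel are made-up names, not Effect's API); it only shows the shape of the problem Chat.Service solves:

```typescript
// A dependency-free sketch of the replay loop behind a chat service.
// `callModel` stands in for the real language model call.
type Turn = { role: "user" | "assistant"; content: string };

function makeChatSession(callModel: (history: Turn[]) => string) {
  const history: Turn[] = [];
  return {
    history,
    send(prompt: string): string {
      // Replay: the model always sees the full prior conversation.
      history.push({ role: "user", content: prompt });
      const reply = callModel(history);
      history.push({ role: "assistant", content: reply });
      return reply;
    },
  };
}

// Stub model: reports how many turns it was shown.
const session = makeChatSession((h) => `seen ${h.length} turn(s)`);
session.send("My name is Sam.");  // model sees 1 turn
session.send("What is my name?"); // model sees 3 turns
console.log(session.history.length); // → 4
```

Chat.Service is this loop plus typed errors and pluggable storage; the rest of this tutorial supplies the storage.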
Alchemy provides Cloudflare.DurableObjectChatPersistence — a Layer
that satisfies Effect’s BackingPersistence interface using the
DO’s own storage. Stack Chat.layerPersisted on top and you have a
fully persistent chat store with no extra infrastructure.
Create the ChatAgent DO
Create src/ChatAgent.ts. The outer init phase binds the AI
Gateway and constructs the model layer:
```ts
import * as Cloudflare from "alchemy/Cloudflare";
import * as Effect from "effect/Effect";
import { Gateway } from "./AiGateway.ts";

export default class ChatAgent extends Cloudflare.DurableObjectNamespace<ChatAgent>()(
  "ChatAgent",
  Effect.gen(function* () {
    const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);
    const languageModel = aiGateway.model({
      client: aiGateway,
      model: "@cf/meta/llama-3.1-8b-instruct",
      parameters: { temperature: 0, maxTokens: 256 },
    });

    return Effect.gen(function* () {
      // per-instance setup — added below
      return {};
    });
  }),
) {}
```

The outer Effect.gen runs once when the DO class is bound to a
Worker; the inner one runs every time a new DO instance is
constructed.
Wire persistence in the instance phase
Inside the inner Effect, stack the persistence layers and resolve a
single Chat for this DO instance:
```ts
return Effect.gen(function* () {
  const persistence = yield* Chat.Persistence;
  const chat = yield* persistence.getOrCreate("session");
  return {};
}).pipe(
  Effect.provide(
    Layer.mergeAll(
      languageModel,
      Chat.layerPersisted({ storeId: "chat" }).pipe(
        Layer.provideMerge(Cloudflare.DurableObjectChatPersistence),
      ),
    ),
  ),
);
```

The layer chain has three pieces:
- DurableObjectChatPersistence — produces a BackingPersistence from state.storage.
- Chat.layerPersisted({ storeId: "chat" }) — produces a Chat.Persistence keyed by chat id, persisted via the backing.
- languageModel — the Workers AI LanguageModel.LanguageModel.
Layer.provideMerge plugs DurableObjectChatPersistence into
Chat.layerPersisted while still exposing BackingPersistence to
the handler (the reset helper reads it to hard-delete a thread).
Layer.mergeAll then composes that with the independent
languageModel, and a single Effect.provide(...) installs the
whole thing on the handler.
persistence.getOrCreate("session") looks up the chat by id; on
the first call it creates a fresh empty one and saves it, on every
subsequent call it rehydrates the saved history from
state.storage. The DO instance itself is the session boundary —
each unique DO name is its own conversation.
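The getOrCreate contract can be sketched against a plain Map standing in for state.storage. All names here are illustrative, not the library's types:

```typescript
// Sketch of getOrCreate semantics over a key-value backing store.
type Turn = { role: "user" | "assistant"; content: string };

const storage = new Map<string, Turn[]>();

function getOrCreate(chatId: string): Turn[] {
  const existing = storage.get(chatId);
  if (existing) return existing; // later calls: rehydrate saved history
  const fresh: Turn[] = [];
  storage.set(chatId, fresh);    // first call: create empty and save
  return fresh;
}

const a = getOrCreate("session");
a.push({ role: "user", content: "hi" });
const b = getOrCreate("session"); // same id → same history
console.log(b.length); // → 1
```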
Expose a send RPC method
Add a method that takes a prompt, runs generateText against the
persisted chat, and returns the new assistant message plus the
conversation length:
```ts
return Effect.gen(function* () {
  const persistence = yield* Chat.Persistence;
  const chat = yield* persistence.getOrCreate("session");

  return {
    send: (prompt: string) =>
      Effect.gen(function* () {
        const response = yield* chat.generateText({ prompt });
        const history = yield* Ref.get(chat.history);
        return { text: response.text, turns: history.content.length };
      }).pipe(Effect.orDie),
  };
}).pipe(
  Effect.provide(
    Layer.mergeAll(
      languageModel,
      Chat.layerPersisted({ storeId: "chat" }).pipe(
        Layer.provideMerge(Cloudflare.DurableObjectChatPersistence),
      ),
    ),
  ),
);
```

chat.generateText does four things in one call: it appends the user
message to history, calls the language model, appends the assistant
response, and saves the updated history back through
Chat.Persistence. Crash mid-flight, restart the DO, send another
prompt, and the next turn picks up exactly where the last completed
one left off.
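That resumability hinges on saving before returning. A minimal sketch of the idea, with a Map standing in for state.storage and canned replies standing in for the model (all names here are illustrative):

```typescript
// Save-per-turn resumability: the store outlives the session object,
// the way state.storage outlives a hibernated DO instance.
type Turn = { role: "user" | "assistant"; content: string };

const store = new Map<string, Turn[]>();

function sendTurn(chatId: string, prompt: string, reply: string): number {
  const history = store.get(chatId) ?? [];
  history.push({ role: "user", content: prompt });
  history.push({ role: "assistant", content: reply });
  store.set(chatId, history); // saved before returning: a later crash loses nothing
  return history.length;
}

sendTurn("session", "My name is Sam.", "Got it, Sam!");
// "Restart": no in-memory session survives, only the store.
const turns = sendTurn("session", "What is my name?", "Your name is Sam.");
console.log(turns); // → 4
```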
Bind the DO and route to it
Section titled “Bind the DO and route to it”In your Worker’s init phase, yield the DO class to register the
binding, then forward /chat?id=… to the matching instance:
```ts
import * as Cloudflare from "alchemy/Cloudflare";
import * as Effect from "effect/Effect";
import { HttpServerRequest } from "effect/unstable/http/HttpServerRequest";
import * as HttpServerResponse from "effect/unstable/http/HttpServerResponse";
import ChatAgent from "./ChatAgent.ts";

export default class Api extends Cloudflare.Worker<Api>()(
  "Api",
  { main: import.meta.path },
  Effect.gen(function* () {
    const chatAgents = yield* ChatAgent;

    return {
      fetch: Effect.gen(function* () {
        const request = yield* HttpServerRequest;
        const url = new URL(request.url, "http://api");

        if (url.pathname === "/chat") {
          const id = url.searchParams.get("id") ?? "default";
          const prompt = url.searchParams.get("prompt") ?? "";
          const result = yield* chatAgents.getByName(id).send(prompt);
          return yield* HttpServerResponse.json(result);
        }

        // …existing routes
      }),
    };
  }),
) {}
```

chatAgents.getByName(id) returns a typed RPC stub —
send(prompt): Effect<{ text: string; turns: number }> — and the
runtime ferries the call to the DO whose name matches id. Each
unique id is a separate conversation backed by its own DO storage.
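The name-to-instance mapping behaves like a lazily filled registry. A rough sketch of that shape, with ChatAgentStub as a hypothetical stand-in for the real DO class (not Alchemy's API):

```typescript
// Name-based routing: one instance per name, created on first use.
class ChatAgentStub {
  turns = 0;
  send(_prompt: string) {
    this.turns += 2; // one user + one assistant message per call
    return { turns: this.turns };
  }
}

const instances = new Map<string, ChatAgentStub>();

function getByName(name: string): ChatAgentStub {
  let agent = instances.get(name);
  if (!agent) {
    agent = new ChatAgentStub();
    instances.set(name, agent);
  }
  return agent;
}

getByName("alice").send("My name is Sam.");
const alice = getByName("alice").send("What is my name?"); // same instance
const bob = getByName("bob").send("Hello");                // fresh instance
console.log(alice.turns, bob.turns); // → 4 2
```

The real runtime adds the part a Map cannot: the instance for a given name lives on exactly one machine, so all requests for that name see the same storage.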
Try it
Deploy and have a two-turn conversation against the same chat id:
```sh
bun alchemy deploy

curl "$(bun alchemy stack output url)/chat?id=alice&prompt=My%20name%20is%20Sam.%20Remember%20it."
# → {"text":"Got it, Sam!","turns":2}

curl "$(bun alchemy stack output url)/chat?id=alice&prompt=What%20is%20my%20name%3F"
# → {"text":"Your name is Sam.","turns":4}
```

The second request lands in the same DO instance as the first
(idFromName is deterministic, so the same name always maps to the
same instance). Even if Cloudflare had
hibernated the DO between the two calls, the persisted history is
read back from state.storage before the model runs — the model
sees the full conversation, not just the latest prompt.
Hit /chat?id=bob&prompt=… and you’re talking to a brand-new
conversation in a brand-new DO. Each id is a session; each session
is durable; no extra database, queue, or KV store needed.
What you have now
- A Workers AI LanguageModel exposed through your AI Gateway, with caching, rate limiting, and logs intact.
- A Chat.Persistence layered on top of state.storage via DurableObjectChatPersistence — drop-in compatible with every other Effect AI helper (generateObject, tool-calls, …).
- A typed ChatAgent DO with a single send RPC method that does one full conversation turn — load history, generate, save.
Every layer in this stack is independently swappable: replace
aiGateway.model({...}) with OpenAiLanguageModel.layer({...}) and
you’re talking to GPT instead, without touching the persistence
code; swap DurableObjectChatPersistence for
Persistence.layerBackingMemory
in tests and you keep history in-process; add tools to the chat by
passing a toolkit to generateText. The DO body doesn’t need to
change for any of those.
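The swap works because the chat logic only ever sees a narrow storage interface. A sketch of that seam, with a hypothetical Backing interface loosely mirroring the role BackingPersistence plays (none of these names are the library's):

```typescript
// The chat logic depends on an interface, not a concrete store,
// so memory and durable implementations are interchangeable.
interface Backing {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// In tests: keep history in-process, like a memory-backed layer.
const memoryBacking = (): Backing => {
  const m = new Map<string, string>();
  return { get: (k) => m.get(k), set: (k, v) => m.set(k, v) };
};

// The "chat" only ever touches the interface.
function appendTurn(backing: Backing, chatId: string, turn: string): string[] {
  const history: string[] = JSON.parse(backing.get(chatId) ?? "[]");
  history.push(turn);
  backing.set(chatId, JSON.stringify(history));
  return history;
}

const backing = memoryBacking();
appendTurn(backing, "session", "user: hi");
const history = appendTurn(backing, "session", "assistant: hello");
console.log(history.length); // → 2
```

A durable implementation would satisfy the same interface with reads and writes against real storage; appendTurn would not change.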