# Add an AI Gateway
You’ve now wired Durable Objects, hibernatable WebSockets, a container, and a Workflow into your Worker. The last piece in the Cloudflare track is an AI Gateway: a stable, account-scoped endpoint that fronts every model provider (Workers AI, OpenAI, Anthropic, Bedrock, …) and gives you caching, rate limiting, retries, DLP, and a single dashboard of every request, token, and cost.

In Alchemy the gateway is a single resource. Once you’ve declared and bound it, `.model({...})` returns an Effect `LanguageModel.LanguageModel` Layer, and from there you’re using the same `generateText`/`streamText` APIs you’d use against any other provider.
## Declare the gateway

Create `src/AiGateway.ts` with a single resource definition. The two flags below enable response caching (60-second TTL) and request logging: every prompt, completion, latency, and token count will show up in the AI Gateway dashboard.
```ts
import * as Cloudflare from "alchemy/Cloudflare";

export const Gateway = Cloudflare.AiGateway("Gateway", {
  cacheTtl: 60,
  collectLogs: true,
});
```

Every prop is optional, but explicit defaults make the intent visible. We’ll tune more knobs at the end of the tutorial.
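If you’d rather lean on the defaults, the declaration shrinks accordingly. A minimal sketch, assuming omitted props simply fall back to the gateway’s defaults:

```ts
import * as Cloudflare from "alchemy/Cloudflare";

// Minimal form: no caching tweaks, no logging flags, just the gateway.
// (Assumption: an empty props object is accepted and every prop defaults.)
export const Gateway = Cloudflare.AiGateway("Gateway", {});
```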
## Add it to the stack

```ts
import { Gateway } from "./src/AiGateway.ts";
import Api from "./src/Api.ts";
// …Alchemy, Cloudflare, and Effect imports from the earlier steps

export default Alchemy.Stack(
  "CloudflareWorkerExample",
  { providers: Cloudflare.providers(), state: Cloudflare.state() },
  Effect.gen(function* () {
    const api = yield* Api;
    const gateway = yield* Gateway;

    return {
      url: api.url.as<string>(),
      gatewayId: gateway.gatewayId,
    };
  }),
);
```

`yield* Gateway` registers the resource so it gets created/updated on the next deploy. `gateway.gatewayId` is exposed as a stack output so you can find it in the dashboard.
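After a deploy, that output is one CLI call away (the same `stack output` subcommand the Try it section below uses for `url`):

```sh
bun alchemy deploy
bun alchemy stack output gatewayId
```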
## Bind the gateway into the Worker

`Cloudflare.AiGateway.bind(Gateway)` returns a typed client whose methods are wrapped in Effect: `run` for raw inference, `getUrl` for the gateway endpoint, `getLog`/`patchLog` for the request log, and `model({...})` for building a `LanguageModel` layer.
```ts
import * as Cloudflare from "alchemy/Cloudflare";
import * as Effect from "effect/Effect";

import { Gateway } from "./AiGateway.ts";

export default class Api extends Cloudflare.Worker<Api>()(
  "Api",
  { main: import.meta.path },
  Effect.gen(function* () {
    const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);

    return {
      fetch: Effect.gen(function* () {
        // …existing routes
      }),
    };
  }).pipe(Effect.provide(Cloudflare.AiGatewayBindingLive)),
) {}
```

`Cloudflare.AiGatewayBindingLive` is the runtime side of the binding. Provide it once at the bottom of the Init layer chain and every `bind(...)` further up will resolve.
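Only `model({...})` produces a layer; the other methods are ordinary Effects you can run during Init or inside a handler. A rough sketch (the method names come from the binding description above, but the exact signatures are assumptions):

```ts
// Hypothetical usage of the non-layer methods on the bound client.
const logGatewayEndpoint = Effect.gen(function* () {
  const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);

  // getUrl resolves the stable account-scoped endpoint (signature assumed).
  const endpoint = yield* aiGateway.getUrl();
  yield* Effect.log(`AI Gateway endpoint: ${endpoint}`);
});
```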
## Build a LanguageModel layer

Call `aiGateway.model({...})` with a Workers AI model id and parameters. The result is a `Layer<LanguageModel.LanguageModel, …>` that satisfies Effect’s standard AI service.
```ts
Effect.gen(function* () {
  const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);

  const languageModel = aiGateway.model({
    client: aiGateway,
    model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    parameters: { temperature: 0.7, maxTokens: 1024 },
  });

  return {
    fetch: Effect.gen(function* () {
      // …
    }),
  };
})
```

`parameters` is the per-call default; any individual call can still override it. The Init phase is the right place to build this layer: construction is pure, and the binding factory only exists here.
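Nothing limits you to a single layer; the same binding can hand out several, each with its own defaults. For instance (same `model()` API as above, only the `parameters` differ), a stricter layer for deterministic tasks:

```ts
// A second layer from the same binding, tuned for reproducible output.
const deterministicModel = aiGateway.model({
  client: aiGateway,
  model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
  parameters: { temperature: 0, maxTokens: 256 },
});
```

Provide whichever layer fits the route; the handlers below stay unchanged.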
## Generate text on /generate

`LanguageModel.generateText` returns a typed response with `text`, `finishReason`, structured token usage, and any `toolCalls`. Provide the `languageModel` layer to the handler and call it like any other Effect.
```ts
import { LanguageModel } from "effect/unstable/ai";
import { HttpServerRequest } from "effect/unstable/http/HttpServerRequest";
import * as HttpServerResponse from "effect/unstable/http/HttpServerResponse";

return {
  fetch: Effect.gen(function* () {
    const request = yield* HttpServerRequest;
    const url = new URL(request.url, "http://api");

    if (url.pathname === "/generate" && request.method === "POST") {
      const body = (yield* request.json) as { prompt?: string };
      const prompt = body.prompt?.trim() ?? "Say hello in one sentence.";

      const response = yield* LanguageModel.generateText({ prompt }).pipe(
        Effect.orDie,
      );

      return yield* HttpServerResponse.json({
        text: response.text,
        usage: {
          inputTokens: response.usage.inputTokens.total,
          outputTokens: response.usage.outputTokens.total,
        },
      });
    }

    return HttpServerResponse.text("Not Found", { status: 404 });
  }).pipe(Effect.provide(languageModel)),
};
```

`Effect.provide(languageModel)` makes `LanguageModel.LanguageModel` available to every handler in `fetch`. `Effect.orDie` collapses `AiError` to a defect so a model failure surfaces as a 500; if you need typed handling instead, `Effect.catchTag("AiError", …)` works.
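For comparison, a sketch of the typed variant. It assumes `HttpServerResponse.json` accepts a `{ status }` option (as `HttpServerResponse.text` does above) and renders the error by stringifying it:

```ts
// Typed handling: map an AiError to a 502 with a body instead of a defect.
return yield* LanguageModel.generateText({ prompt }).pipe(
  Effect.flatMap((response) =>
    HttpServerResponse.json({ text: response.text }),
  ),
  Effect.catchTag("AiError", (error) =>
    HttpServerResponse.json({ error: String(error) }, { status: 502 }),
  ),
);
```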
## Try it

```sh
bun alchemy deploy

curl -X POST "$(bun alchemy stack output url)/generate" \
  -H "content-type: application/json" \
  -d '{"prompt":"Write a haiku about Effect"}'
```

The first call takes ~1–2 seconds. Send the exact same prompt again and it returns in milliseconds: that’s the `cacheTtl: 60` config doing its job. Open the Cloudflare dashboard → AI → AI Gateway → your gateway and you’ll see both requests, with the second flagged as a cache hit, plus latency and token usage on every entry.
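You don’t need the dashboard to see the cache working; run the same request twice under `time` (the body must be byte-identical so the cache key matches):

```sh
# Run this twice: the first call is a full model round-trip,
# the second should come back from the gateway cache in milliseconds.
time curl -s -X POST "$(bun alchemy stack output url)/generate" \
  -H "content-type: application/json" \
  -d '{"prompt":"Write a haiku about Effect"}' > /dev/null
```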
## Stream tokens on /stream

For chat-style UIs you want tokens to arrive as the model produces them, not in one big response. `LanguageModel.streamText` returns an Effect `Stream` of typed response parts: `text-delta`, `tool-call`, `finish`, and so on. Pipe it through `Sse.encode` and `HttpServerResponse.stream` to get a server-sent-event stream that flushes through the Worker → edge → client without buffering.
```ts
import * as Stream from "effect/Stream";
import * as Sse from "effect/unstable/encoding/Sse";

if (url.pathname === "/stream" && request.method === "POST") {
  const body = (yield* request.json) as { prompt?: string };
  const prompt = body.prompt?.trim() ?? "Tell me a haiku about Effect.";

  const stream = LanguageModel.streamText({ prompt }).pipe(
    Stream.provide(languageModel),
    Sse.encode,
  );

  return HttpServerResponse.stream(stream, {
    headers: {
      "content-type": "text/event-stream",
      "cache-control": "no-cache",
      "x-accel-buffering": "no",
    },
  });
}
```

`Stream.provide(languageModel)` is the stream-aware equivalent of `Effect.provide`: the `LanguageModel` needs to be available for the entire lifetime of the stream, not just the initial setup.
```sh
curl -N -X POST "$(bun alchemy stack output url)/stream" \
  -H "content-type: application/json" \
  -d '{"prompt":"Write a haiku about Effect, slowly."}'
```

`-N` disables curl’s response buffering so you see each SSE event as soon as the Worker flushes it.
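On the client side, `EventSource` only supports GET, so a POST-based SSE stream is usually read with `fetch` and a streaming reader. A minimal browser-side sketch (`apiUrl` is a hypothetical variable holding the stack’s `url` output):

```ts
const res = await fetch(`${apiUrl}/stream`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ prompt: "Write a haiku about Effect" }),
});

// Decode and print each chunk as the Worker flushes it; every chunk
// carries one or more `data: …` events produced by Sse.encode.
const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(value);
}
```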
## Tune caching, rate limits, and DLP

Every prop on `Cloudflare.AiGateway` maps to an update API call: no replacement, no downtime. A production-grade config might look like:
```ts
export const Gateway = Cloudflare.AiGateway("Gateway", {
  id: "prod-gateway",
  cacheTtl: 300,
  cacheInvalidateOnUpdate: true,
  rateLimitingInterval: 60,
  rateLimitingLimit: 100,
  rateLimitingTechnique: "sliding",
  collectLogs: true,
  logManagement: 100_000,
  logManagementStrategy: "DELETE_OLDEST",
  authentication: true,
});
```

Bumping `cacheTtl`, `rateLimitingLimit`, or toggling `authentication` is a single `bun alchemy deploy` away; the diff updates the gateway in place.
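To sanity-check the rate limit, send more requests than `rateLimitingLimit` allows within one interval and tally the status codes. The rejected status is the gateway’s to choose; 429 is the usual convention but an assumption here. Varying the prompt keeps the cache out of the picture:

```sh
# 110 requests against a 100-per-60s sliding window: expect some rejections.
for i in $(seq 1 110); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST "$(bun alchemy stack output url)/generate" \
    -H "content-type: application/json" \
    -d "{\"prompt\":\"ping $i\"}"
done | sort | uniq -c
```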
## What you have now

- A `Cloudflare.AiGateway` resource with caching, logging, and (when you want them) rate limiting and auth.
- An Effect `LanguageModel.LanguageModel` Layer that proxies Workers AI through that gateway.
- `/generate` and `/stream` routes that use the standard `LanguageModel.generateText` and `streamText` APIs: provider-agnostic, so swapping Workers AI for OpenAI or Anthropic later is a layer-level change, not a code-level one.
The natural next step is to give the model memory. The Add a Chat Agent tutorial wraps the same `LanguageModel` with `Chat.Service` and stores the conversation in a Durable Object’s `state.storage`, so every session is resumable across requests, restarts, and hibernation.

For the wider API surface (`generateObject`, `Toolkit`, structured outputs, and how the same `LanguageModel` composes inside an HTTP API or RPC handler) see the Effect AI guide.