Workers AI
Use AI Gateway for analytics, caching, and security on requests to Workers AI.
To interact with a REST API, update the URL used for your request:
- Previous: https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model_id}
- New: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/{model_id}
For these parameters:
- {account_id} is your Cloudflare account ID.
- {gateway_id} refers to the name of your existing AI Gateway.
- {model_id} refers to the model ID of the Workers AI model.
First, generate an API token with Workers AI Read access and use it in your request.
```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{"prompt": "What is Cloudflare?"}'
```

```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/huggingface/distilbert-sst-2-int8 \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{ "text": "Cloudflare docs are amazing!" }'
```

Workers AI supports OpenAI compatible endpoints for text generation (/v1/chat/completions) and text embedding models (/v1/embeddings). This allows you to use the same code as you would for your OpenAI commands, but swap in Workers AI easily.
```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/v1/chat/completions \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "@cf/meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is Cloudflare?"
      }
    ]
  }'
```
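Because these endpoints mirror the OpenAI API shape, you can also reuse an existing OpenAI SDK client by pointing its base URL at the gateway. The sketch below assumes the openai npm package and a Cloudflare API token with Workers AI Read access supplied via an environment variable; the embedding model name is only an illustration.

```typescript
// Sketch: reuse the OpenAI SDK against the gateway's OpenAI-compatible endpoints.
// {account_id} and {gateway_id} are placeholders you supply yourself.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.CF_API_TOKEN, // Cloudflare API token with Workers AI Read access
  baseURL:
    "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/v1",
});

// Text generation via /v1/chat/completions
const chat = await client.chat.completions.create({
  model: "@cf/meta/llama-3.1-8b-instruct",
  messages: [{ role: "user", content: "What is Cloudflare?" }],
});
console.log(chat.choices[0].message.content);

// Text embeddings via /v1/embeddings (model name shown as an example)
const embedding = await client.embeddings.create({
  model: "@cf/baai/bge-base-en-v1.5",
  input: "Cloudflare docs are amazing!",
});
console.log(embedding.data[0].embedding.length);
```

Apart from the base URL, the application code stays the same, while the gateway keeps handling analytics, caching, and security for each request.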
To include an AI Gateway within your Worker, add the gateway as an object in your Workers AI request.

```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run(
      "@cf/meta/llama-3.1-8b-instruct",
      {
        prompt: "Why should you use Cloudflare for your AI inference?",
      },
      {
        gateway: {
          id: "{gateway_id}",
          skipCache: false,
          cacheTtl: 3360,
        },
      },
    );
    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
```

Workers AI supports the following parameters for AI gateways:
- id (string) - Name of your existing AI Gateway. Must be in the same account as your Worker.
- skipCache (boolean, default: false) - Controls whether the request should skip the cache.
- cacheTtl (number) - Controls the Cache TTL.
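As a usage sketch of these parameters, the fragment below reuses the Env binding from the Worker above and sets skipCache to true so a particular request always bypasses the gateway cache; the model and prompt are illustrative.

```typescript
// Sketch: per-request cache control, reusing the Env/AI binding from the Worker above.
async function freshAnswer(env: Env): Promise<Response> {
  const response = await env.AI.run(
    "@cf/meta/llama-3.1-8b-instruct",
    { prompt: "Summarize today's status update." },
    {
      gateway: {
        id: "{gateway_id}", // must be in the same account as this Worker
        skipCache: true, // bypass the gateway cache for this request
      },
    },
  );
  return new Response(JSON.stringify(response));
}
```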