The Anannas Responses API provides a unified interface for building advanced AI agents capable of executing complex tasks autonomously. This API is compatible with OpenAI’s Responses API format and supports multimodal inputs, reasoning capabilities, and seamless tool integration.
Stateless API Implementation: This API implements the stateless version of the Responses API. Unlike OpenAI’s stateful Responses API, Anannas does not maintain conversation state between requests. Each request is completely independent, and you must include the full conversation history in every request.
Beta API: This API is in beta and may introduce breaking changes. Use with caution in production environments.

Base URL

https://api.anannas.ai/api/v1/responses

Authentication

All requests require authentication using your Anannas API key:
import requests

response = requests.post(
    "https://api.anannas.ai/api/v1/responses",
    headers={
        "Authorization": "Bearer <ANANNAS_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-5-mini",
        "input": "Hello, world!",
    },
)

Core Features

  • Stateless Design: Each request is independent; you manage conversation history client-side
  • Multimodal Support: Handle various input types, including text, images, and audio
  • Reasoning Capabilities: Access advanced reasoning with configurable effort levels
  • Tool Integration: Utilize function calling with support for parallel execution
  • Streaming Support: Receive responses in real-time as they’re generated
Managing Conversation History: Since this API is stateless, you must include the complete conversation history in each request. Include all previous user messages and assistant responses in the input array to maintain context, as in the sketch below.
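A minimal sketch of client-side history management, assuming (per the stateless design) that the assistant's output items can be passed back in the input array; the message shapes follow the Input Format and Output Format sections below:
import requests

API_URL = "https://api.anannas.ai/api/v1/responses"
HEADERS = {
    "Authorization": "Bearer <ANANNAS_API_KEY>",
    "Content-Type": "application/json",
}

history = []  # the full transcript, resent with every request

def send(text):
    # Append the new user turn to the client-side history.
    history.append({
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": text}],
    })
    resp = requests.post(API_URL, headers=HEADERS, json={
        "model": "openai/gpt-5-mini",
        "input": history,
    })
    resp.raise_for_status()
    data = resp.json()
    # Carry the assistant's output items forward as context for the next turn.
    history.extend(data["output"])
    return data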

Request Format

The Responses API uses a structured request format with an input array containing conversation messages:
type ResponsesRequest = {
  // Required
  model: string;
  input: ResponsesMessage[];

  // Optional
  instructions?: string;
  response_format?: { type: 'json_object' };
  metadata?: { [key: string]: string };
  temperature?: number;
  max_output_tokens?: number;
  stream?: boolean;
  
  // Tool calling
  tools?: Tool[];
  tool_choice?: 'auto' | 'none' | { type: 'function'; name: string };
  parallel_tool_calls?: boolean;

  // Advanced parameters
  top_p?: number;
  top_k?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  stop?: string[];
  seed?: number;

  // Reasoning
  reasoning?: {
    effort?: 'minimal' | 'low' | 'medium' | 'high';
    max_tokens?: number;
    exclude?: boolean;
  };

  // Anannas-specific
  provider?: ProviderPreferences;
  modalities?: string[];
  audio?: AudioConfig;
  mcp?: MCPConfig;
  prompt_cache_key?: string;
};
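As a concrete example, a request combining several of these optional fields might look like this (all values are illustrative):
import requests

response = requests.post(
    "https://api.anannas.ai/api/v1/responses",
    headers={
        "Authorization": "Bearer <ANANNAS_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-5-mini",
        "input": "Summarize the plot of Hamlet in two sentences.",
        "instructions": "You are a concise literary assistant.",
        "temperature": 0.7,
        "max_output_tokens": 200,
        "reasoning": {"effort": "low"},
        "metadata": {"session": "demo-123"},
    },
)
print(response.json()["output"])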

Response Format

The Responses API returns a structured response with an output array:
type ResponsesResponse = {
  id: string;
  object: 'response';
  model: string;
  created: number;
  output: ResponsesOutputItem[];
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
  metadata?: { [key: string]: any };
};
Example Response:
{
  "id": "resp_abc123",
  "object": "response",
  "model": "openai/gpt-5-mini",
  "created": 1693350000,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! How can I help you today?"
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
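Because the reply arrives as a list of output items, client code typically walks the output array and collects the text parts. A small helper sketch:
def extract_text(response):
    """Concatenate all output_text parts from message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)

# With the example above: extract_text(...) == "Hello! How can I help you today?"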

Input Format

The input field accepts either a simple string or an array of message objects.
Simple String Input:
{
  "model": "openai/gpt-5-mini",
  "input": "What is the capital of France?"
}
Structured Message Input:
{
  "model": "openai/gpt-5-mini",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "What is the capital of France?"
        }
      ]
    }
  ]
}
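The structured form is also how multimodal inputs are expressed. The exact content-part types for images are not documented in this section; the sketch below assumes OpenAI-style input_image parts with an image_url field, so verify against your target provider:
payload = {
    "model": "openai/gpt-5-mini",
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is in this image?"},
                # "input_image" / "image_url" follow OpenAI's Responses API
                # conventions and are an assumption here, not confirmed above.
                {"type": "input_image", "image_url": "https://example.com/photo.jpg"},
            ],
        }
    ],
}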

Output Format

The output array contains one or more output items. Each item can be:
  • Message: Text response from the model
  • Function Call: Tool/function invocation
  • Error: Error information
Message Output:
{
  "type": "message",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "output_text",
      "text": "The capital of France is Paris."
    }
  ]
}
Function Call Output:
{
  "type": "message",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "function_call",
      "function_call": {
        "id": "call_abc123",
        "name": "get_weather",
        "arguments": "{\"location\": \"Paris\"}"
      }
    }
  ]
}
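Note that arguments arrives as a JSON-encoded string, so it must be parsed before dispatching to your own function. A small extraction sketch based on the shape above:
import json

def get_function_calls(response):
    """Collect function_call parts from the output items shown above."""
    calls = []
    for item in response.get("output", []):
        for part in item.get("content", []):
            if part.get("type") == "function_call":
                fc = part["function_call"]
                calls.append({
                    "id": fc["id"],
                    "name": fc["name"],
                    "args": json.loads(fc["arguments"]),  # '{"location": "Paris"}' -> dict
                })
    return calls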

Prompt Caching

Prompt caching allows you to reduce costs and latency by reusing cached prompt prefixes. Anannas supports two caching methods depending on the provider:

OpenAI Models (prompt_cache_key)

For OpenAI models, use the prompt_cache_key parameter:
{
  "model": "openai/gpt-5-mini",
  "input": "Hello, world!",
  "prompt_cache_key": "my-cache-key-123"
}
Pricing:
  • Cache reads: Cached input tokens are billed at 50% of the original input token price
  • Cache writes: No additional cost for creating the cache
How it works:
  1. First request with a prompt_cache_key creates the cache
  2. Subsequent requests with the same key reuse the cached prefix
  3. Cache is automatically managed by the provider
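In practice, reuse one key across requests that share a long common prefix. A sketch, assuming the shared prefix sits in the instructions field (the key name and prefix text are illustrative):
import requests

SHARED_PREFIX = "You are a support agent for ExampleCorp. <long policy text here>"

def ask(question):
    return requests.post(
        "https://api.anannas.ai/api/v1/responses",
        headers={"Authorization": "Bearer <ANANNAS_API_KEY>"},
        json={
            "model": "openai/gpt-5-mini",
            "instructions": SHARED_PREFIX,
            "input": question,
            # The same key on every request, so the cached prefix is reused.
            "prompt_cache_key": "support-agent-v1",
        },
    ).json()

first = ask("How do I reset my password?")    # first request creates the cache
second = ask("What are your support hours?")  # reuses the cached prefix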

Anthropic Models (cache_control)

For Anthropic Claude models, use the cache_control object in message content:
{
  "model": "anthropic/claude-sonnet-4",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Your system instructions here...",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "5m"
          }
        }
      ]
    }
  ]
}
Pricing:
  • Cache creation: Cache creation tokens are billed at 1.25x (125%) of the original input token price
  • Cache reads: Cached input tokens are billed at 0.1x (10%) of the original input token price - a 90% discount
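As a worked example, at an illustrative input price of $3.00 per million tokens, cache creation would bill at $3.75 per million (1.25x) and cache reads at $0.30 per million (0.1x).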
Anthropic-specific requirements:
  • Maximum of 4 content blocks can have cache_control per request
  • Cache expires after 5 minutes (TTL: "5m")
  • cache_control must be added to individual content parts within messages
  • Only "ephemeral" cache type is supported
Example with system message:
{
  "model": "anthropic/claude-sonnet-4",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant. Always be concise.",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "5m"
          }
        },
        {
          "type": "text",
          "text": "What is 2+2?"
        }
      ]
    }
  ]
}

Other Providers

  • Grok: Supports caching similar to OpenAI's, with cached tokens billed at 10% of the input price (a 90% discount)
  • Nebius: Supports caching with provider-specific pricing
  • TogetherAI: Supports caching with provider-specific pricing

Monitoring Cache Usage

Cache usage is included in the response usage object:
{
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 500,
    "total_tokens": 1500,
    "cache_read_input_tokens": 800,
    "cache_creation_input_tokens": 200
  }
}
  • cache_read_input_tokens: Number of tokens read from cache (discounted pricing)
  • cache_creation_input_tokens: Number of tokens used to create the cache (Anthropic only, 1.25x pricing)
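A quick sketch for logging cache effectiveness from the usage object (field names as shown above):
def log_cache_usage(response):
    usage = response.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    hit_rate = read / prompt if prompt else 0.0
    print(f"prompt={prompt} cache_read={read} cache_created={created} "
          f"hit_rate={hit_rate:.0%}")

# With the example above: prompt=1000 cache_read=800 cache_created=200 hit_rate=80%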

Error Handling

The API returns standard HTTP status codes and error responses:
{
  "error": {
    "message": "Invalid request parameters",
    "type": "invalid_request_error",
    "code": "invalid_parameter"
  }
}
Common status codes:
  • 400 Bad Request: Invalid request parameters
  • 401 Unauthorized: Missing or invalid API key
  • 429 Too Many Requests: Rate limit exceeded
  • 500 Internal Server Error: Server error
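A defensive client sketch that maps these status codes to client behavior (the retry and backoff policy is illustrative):
import time

import requests

def post_with_retry(payload, retries=3):
    for attempt in range(retries):
        resp = requests.post(
            "https://api.anannas.ai/api/v1/responses",
            headers={"Authorization": "Bearer <ANANNAS_API_KEY>"},
            json=payload,
        )
        if resp.status_code == 429:
            time.sleep(2 ** attempt)  # rate limited: back off and retry
            continue
        if resp.status_code >= 400:
            err = resp.json().get("error", {})
            raise RuntimeError(
                f"{resp.status_code} {err.get('type')}: {err.get('message')}"
            )
        return resp.json()
    raise RuntimeError("Rate limited after retries")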

Rate Limits

Rate limits are applied per API key. See the Limits documentation for details.
