Beta API
This API is in beta and may have breaking changes. Use with caution in production environments.
Base URL
Authentication
All requests require authentication using your Anannas API key:
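A minimal sketch; the base URL and model id below are assumptions, so substitute the values from your Anannas dashboard and model catalog:

```python
import os
import requests

# Assumed base URL -- substitute the real endpoint from your dashboard.
BASE_URL = "https://api.anannas.ai/v1"

response = requests.post(
    f"{BASE_URL}/responses",
    headers={"Authorization": f"Bearer {os.environ['ANANNAS_API_KEY']}"},
    json={"model": "openai/gpt-4o", "input": "Hello!"},  # illustrative model id
)
print(response.json())
```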
Core Features
- Stateless Design: Each request is independent; you manage conversation history client-side
- Multimodal Support: Handle various input types, including text, images, and audio
- Reasoning Capabilities: Access advanced reasoning with configurable effort levels
- Tool Integration: Utilize function calling with support for parallel execution
- Streaming Support: Receive responses in real-time as they’re generated
Managing Conversation History
Since this API is stateless, you must include the complete conversation history in each request. Include all previous user messages and assistant responses in the input array to maintain context.
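A sketch of the pattern, reusing the assumed base URL and model id from above, and assuming the assistant's message is the first item in the output array:

```python
import os
import requests

BASE_URL = "https://api.anannas.ai/v1"  # assumed, as above
HEADERS = {"Authorization": f"Bearer {os.environ['ANANNAS_API_KEY']}"}

history = []  # the full conversation lives on the client

def ask(text: str) -> dict:
    history.append({"role": "user", "content": text})
    r = requests.post(f"{BASE_URL}/responses", headers=HEADERS,
                      json={"model": "openai/gpt-4o", "input": history})
    r.raise_for_status()
    message = r.json()["output"][0]  # assumes the message is the first output item
    history.append({"role": "assistant", "content": message["content"]})
    return message

ask("What is the capital of France?")
ask("What is its population?")  # resolves "its" because the history is resent
```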
Request Format
The Responses API uses a structured request format with an input array containing conversation messages:
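For example, a request body of the following shape would be POSTed to the endpoint shown earlier (schema assumed to mirror the OpenAI Responses API; the model id is illustrative):

```python
request_body = {
    "model": "openai/gpt-4o",  # illustrative model id
    "input": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet."},
    ],
}
```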
Response Format
The Responses API returns a structured response with an output array:
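A sketch of the shape, assuming it mirrors the OpenAI Responses API; field names and nesting may differ:

```python
response_body = {
    "id": "resp_abc123",  # illustrative
    "model": "openai/gpt-4o",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hamlet, prince of..."}],
        },
    ],
    "usage": {"input_tokens": 24, "output_tokens": 96},
}
```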
Input Format
The input field accepts either a simple string or an array of message objects:
Simple String Input:
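For example (model id illustrative):

```python
# Simple string input -- shorthand for a single user message.
simple = {"model": "openai/gpt-4o", "input": "What is the capital of France?"}

# The equivalent message-array form.
structured = {
    "model": "openai/gpt-4o",
    "input": [{"role": "user", "content": "What is the capital of France?"}],
}
```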
Output Format
The output array contains one or more output items. Each item can be one of the following (see the handling sketch after this list):
- Message: Text response from the model
- Function Call: Tool/function invocation
- Error: Error information
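A dispatch sketch over a parsed response; the exact type strings ("message", "function_call", "error") and the nested content structure are assumptions based on the list above:

```python
def handle_output(resp_json: dict) -> None:
    for item in resp_json.get("output", []):
        if item["type"] == "message":
            # Assumes content is a list of text parts, as sketched earlier.
            print("".join(part.get("text", "") for part in item["content"]))
        elif item["type"] == "function_call":
            print("tool call:", item.get("name"), item.get("arguments"))
        elif item["type"] == "error":
            print("error:", item)
```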
Prompt Caching
Prompt caching allows you to reduce costs and latency by reusing cached prompt prefixes. Anannas supports two caching methods, depending on the provider.
OpenAI Models (prompt_cache_key)
For OpenAI models, use the prompt_cache_key parameter:
- Cache reads: Cached input tokens are billed at 50% of the original input token price
- Cache writes: No additional cost for creating the cache
- First request with a prompt_cache_key creates the cache
- Subsequent requests with the same key reuse the cached prefix
- Cache is automatically managed by the provider
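A sketch of a request body using the parameter (model id and key name illustrative; the prompt_cache_key parameter itself is the one named above):

```python
stable_prefix = "You are a support assistant for Example Corp. ..."  # long, reused

request_body = {
    "model": "openai/gpt-4o",  # illustrative model id
    "input": [
        {"role": "system", "content": stable_prefix},
        {"role": "user", "content": "How do I reset my password?"},
    ],
    # Send the same key on every request that shares this prefix.
    "prompt_cache_key": "support-bot-v1",
}
```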
Anthropic Models (cache_control)
For Anthropic Claude models, use the cache_control object in message content:
- Cache creation: Cache creation tokens are billed at 1.25x (125%) of the original input token price
- Cache reads: Cached input tokens are billed at 0.1x (10%) of the original input token price - a 90% discount
- A maximum of 4 content blocks can have cache_control per request
- Cache expires after 5 minutes (TTL: "5m")
- cache_control must be added to individual content parts within messages
- Only the "ephemeral" cache type is supported
Other Providers
- Grok: Supports caching similar to OpenAI with cached tokens at 10% of input price (90% discount)
- Nebius: Supports caching with provider-specific pricing
- TogetherAI: Supports caching with provider-specific pricing
Monitoring Cache Usage
Cache usage is included in the response usage object:
- cache_read_input_tokens: Number of tokens read from cache (discounted pricing)
- cache_creation_input_tokens: Number of tokens used to create the cache (Anthropic only, 1.25x pricing)
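A small helper for reading these fields from a parsed response (the field nesting is an assumption):

```python
def report_cache_usage(resp_json: dict) -> None:
    usage = resp_json.get("usage", {})
    print("cache reads: ", usage.get("cache_read_input_tokens", 0))
    print("cache writes:", usage.get("cache_creation_input_tokens", 0))  # Anthropic only
```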
Error Handling
The API returns standard HTTP status codes and error responses:
- 400 Bad Request: Invalid request parameters
- 401 Unauthorized: Missing or invalid API key
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Server error
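A sketch of client-side handling, with a simple exponential backoff on 429 (URL and headers as in the earlier examples):

```python
import time
import requests

def post_with_retry(url: str, headers: dict, body: dict, attempts: int = 3) -> dict:
    for attempt in range(attempts):
        r = requests.post(url, headers=headers, json=body)
        if r.status_code == 429:
            time.sleep(2 ** attempt)  # back off, then retry
            continue
        r.raise_for_status()          # raises on 400/401/500
        return r.json()
    raise RuntimeError("still rate limited after retries")
```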
Rate Limits
Rate limits are applied per API key. See the Limits documentation for details.
Next Steps
- Learn basic usage with simple text requests
- Explore streaming responses for real-time interactions
- Integrate tool calling for function execution
- Configure reasoning capabilities for advanced problem-solving