Provider compatibility
Arcade evaluations support both OpenAI and Anthropic. Each provider has different requirements for schemas and message formats.
Provider comparison
| Feature | OpenAI | Anthropic |
|---|---|---|
| Tool name rules | Alphanumeric, -, _ (max 64 chars) | Alphanumeric, _ only |
| Schema format | function.parameters (JSON Schema) | input_schema (JSON Schema) |
| Strict mode | Yes (opt-in via strict: true) | No (standard JSON Schema) |
| Optional params | Required list + null unions | Only required params in required |
| Message roles | system, user, assistant, tool, function | user, assistant (system separate) |
| Tool calling format | tool_calls array | tool_use content blocks |
Tool name normalization
Arcade uses dotted notation for tool names (e.g., Weather.GetCurrent), but providers don’t allow dots in function names.
How normalization works
Tool names are automatically normalized:
from arcade_core.converters.utils import normalize_tool_name
normalize_tool_name("Weather.GetCurrent") # Returns: "Weather_GetCurrent"
normalize_tool_name("Google.Search") # Returns: "Google_Search"
When models make tool calls, normalized names are resolved back to the original names.
Denormalization is lossy
Reversing normalization can’t distinguish between original dots and underscores:
| Original Name | Normalized | Denormalized | Correct? |
|---|---|---|---|
| Google.Search | Google_Search | Google.Search | ✅ |
My_Tool.Name | My_Tool_Name | My.Tool.Name | ❌ |
Tool_Name | Tool_Name | Tool.Name | ❌ |
Best practice: Use only dots OR only underscores in names, never both.
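The lossy round trip in the table above can be reproduced with a minimal sketch. The helpers below are hypothetical stand-ins, not Arcade's implementation: they assume normalization replaces every dot with an underscore and that a naive reversal replaces every underscore with a dot.

```python
def normalize(name: str) -> str:
    """Hypothetical stand-in: replace every dot with an underscore."""
    return name.replace(".", "_")


def naive_denormalize(name: str) -> str:
    """Hypothetical stand-in: replace every underscore with a dot."""
    return name.replace("_", ".")


def round_trips(name: str) -> bool:
    """Return True if the name survives normalize -> denormalize unchanged."""
    return naive_denormalize(normalize(name)) == name


for original in ["Google.Search", "My_Tool.Name", "Tool_Name"]:
    restored = naive_denormalize(normalize(original))
    status = "ok" if round_trips(original) else "lossy"
    print(f"{original} -> {normalize(original)} -> {restored} ({status})")
```

Only names that use dots alone survive the round trip, which is why mixing separators is discouraged.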
Name collision
Don’t register both dotted and underscore versions of the same name:
# ❌ Avoid this - creates collision
suite.add_tool_definitions([
    {"name": "Weather.GetCurrent", ...},
    {"name": "Weather_GetCurrent", ...},  # Collision!
])
The registry accepts both formats for lookups, but they resolve to the same internal name.
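A quick pre-registration check can catch such collisions. The sketch below is illustrative only: `find_name_collisions` is a hypothetical helper, and it assumes normalization simply replaces dots with underscores.

```python
from collections import defaultdict


def find_name_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group names by normalized form; return groups with more than one distinct name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        key = name.replace(".", "_")  # assumed normalization rule
        if name not in groups[key]:
            groups[key].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


collisions = find_name_collisions(
    ["Weather.GetCurrent", "Weather_GetCurrent", "Google.Search"]
)
print(collisions)
```

An empty result means no two names in the list share a normalized form.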
OpenAI strict mode
OpenAI’s strict mode enforces structured outputs by transforming JSON Schema. This happens automatically in evaluations.
Schema transformations
1. Unsupported keywords are stripped:
# Input schema
{
    "type": "integer",
    "minimum": 0,
    "maximum": 100,
    "default": 50
}
# Transformed for OpenAI
{
    "type": ["integer", "null"]
}
Stripped keywords:
- Validation: minimum, maximum, minLength, maxLength, pattern, format
- Metadata: default, nullable, minItems, maxItems
2. Optional parameters become required with null unions:
# Input schema
{
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "default": "celsius"}
    },
    "required": ["city"]
}
# Transformed for OpenAI
{
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": ["string", "null"]}  # Now in union with null
    },
    "required": ["city", "units"],  # units added to required
    "additionalProperties": false
}
3. Enums are stringified:
# Input schema
{
    "type": "integer",
    "enum": [0, 1, 2]
}
# Transformed for OpenAI
{
    "type": "string",
    "enum": ["0", "1", "2"]
}
4. Additional properties are forbidden:
All objects get "additionalProperties": false to enforce strict validation.
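The four transformations can be combined into one recursive pass. The following is a rough sketch under stated assumptions, not Arcade's actual converter: `to_strict_schema` and `STRIPPED_KEYWORDS` are hypothetical names, and edge cases such as `anyOf`, array items, or optional enums are ignored.

```python
import copy
from typing import Any

# Keywords the section above lists as stripped in strict mode.
STRIPPED_KEYWORDS = {
    "minimum", "maximum", "minLength", "maxLength", "pattern", "format",
    "default", "nullable", "minItems", "maxItems",
}


def to_strict_schema(schema: dict[str, Any], optional: bool = False) -> dict[str, Any]:
    """Hypothetical helper: apply the four documented transformations to one schema node."""
    # 1. Strip unsupported keywords (deep-copy the rest so the input is untouched).
    out = {k: copy.deepcopy(v) for k, v in schema.items() if k not in STRIPPED_KEYWORDS}

    # 3. Stringify enum values and switch the declared type to string.
    if "enum" in out:
        out["enum"] = [str(value) for value in out["enum"]]
        out["type"] = "string"

    if out.get("type") == "object":
        originally_required = set(out.get("required", []))
        properties = out.get("properties", {})
        # 2. Every property becomes required; formerly optional ones get a null union.
        out["properties"] = {
            name: to_strict_schema(sub, optional=name not in originally_required)
            for name, sub in properties.items()
        }
        out["required"] = list(properties)
        # 4. Forbid additional properties on every object.
        out["additionalProperties"] = False

    if optional and isinstance(out.get("type"), str):
        out["type"] = [out["type"], "null"]
    return out


weather = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "default": "celsius"},
    },
    "required": ["city"],
}
strict = to_strict_schema(weather)
print(strict)
```

Because the original schema is left untouched, it remains available for applying defaults later.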
Why defaults still work
Even though default is stripped from schemas, defaults are still applied during evaluation. Here’s why:
- The evaluation framework stores the original schema with defaults
- OpenAI sends null for optional parameters in strict mode
- The framework applies defaults when args are missing OR null
# Model sends:
{"city": "Seattle", "units": null}
# Framework applies default:
{"city": "Seattle", "units": "celsius"}
This behavior ensures consistent evaluation results regardless of provider.
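The default-filling step can be sketched in a few lines. This is an illustration of the behavior described above, not the framework's code: `apply_defaults` is a hypothetical helper that reads defaults from the original (untransformed) schema.

```python
from typing import Any


def apply_defaults(args: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Fill in schema defaults for arguments that are missing or None."""
    result = dict(args)
    for name, prop in schema.get("properties", {}).items():
        if "default" in prop and result.get(name) is None:
            result[name] = prop["default"]
    return result


original_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "default": "celsius"},
    },
    "required": ["city"],
}

# A null argument (OpenAI strict mode) and a missing argument (Anthropic)
# end up as the same evaluated arguments.
from_openai = apply_defaults({"city": "Seattle", "units": None}, original_schema)
from_anthropic = apply_defaults({"city": "Seattle"}, original_schema)
print(from_openai == from_anthropic)
```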
Anthropic schema format
Anthropic uses standard JSON Schema with minimal transformation.
Key differences from OpenAI
1. Schema field name:
# OpenAI format
{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {...}  # ← Note: "parameters"
    }
}
# Anthropic format
{
    "name": "get_weather",
    "input_schema": {...}  # ← Note: "input_schema"
}
2. No strict mode transformations:
Anthropic accepts the schema as-is:
- Validation keywords are preserved
- Optional params stay optional
- Enums keep original types
- Defaults are kept (but not sent to model)
3. Only required params in required list:
{
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "default": "celsius"}
    },
    "required": ["city"]  # Only city is required
}
Message format conversion
Arcade evaluations use OpenAI message format internally. When using Anthropic, messages are converted automatically.
System messages
OpenAI:
[
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
]
Anthropic:
# system → separate parameter
system = "You are helpful."
messages = [
    {"role": "user", "content": "Hello"},
]
Tool calls
OpenAI:
{
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"city": "Seattle"}'
            }
        }
    ]
}
Anthropic:
{
    "role": "assistant",
    "content": [
        {
            "type": "tool_use",
            "id": "call_123",
            "name": "get_weather",
            "input": {"city": "Seattle"}
        }
    ]
}
Tool results
OpenAI:
{
    "role": "tool",
    "tool_call_id": "call_123",
    "content": "Sunny, 72°F"
}
Anthropic:
{
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "call_123",
            "content": "Sunny, 72°F"
        }
    ]
}
Message conversion happens automatically. You don’t need to handle it manually.
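For readers curious about what the conversion involves, the three mappings above can be sketched as a single function. This is a simplified illustration, not Arcade's converter: `openai_to_anthropic` is a hypothetical name, and the sketch does not merge consecutive tool results or handle non-string content.

```python
import json
from typing import Any


def openai_to_anthropic(
    messages: list[dict[str, Any]],
) -> tuple[str, list[dict[str, Any]]]:
    """Return (system prompt, Anthropic-style messages) for OpenAI-style input."""
    system_parts: list[str] = []
    converted: list[dict[str, Any]] = []
    for msg in messages:
        role = msg["role"]
        if role == "system":
            # System messages move to a separate parameter.
            system_parts.append(msg["content"])
        elif role == "assistant" and msg.get("tool_calls"):
            blocks: list[dict[str, Any]] = []
            if msg.get("content"):
                blocks.append({"type": "text", "text": msg["content"]})
            for call in msg["tool_calls"]:
                blocks.append({
                    "type": "tool_use",
                    "id": call["id"],
                    "name": call["function"]["name"],
                    # OpenAI carries arguments as a JSON string; Anthropic uses an object.
                    "input": json.loads(call["function"]["arguments"]),
                })
            converted.append({"role": "assistant", "content": blocks})
        elif role == "tool":
            # Tool results become tool_result blocks inside a user message.
            converted.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": msg["tool_call_id"],
                    "content": msg["content"],
                }],
            })
        else:
            converted.append({"role": role, "content": msg["content"]})
    return "\n".join(system_parts), converted


system, converted = openai_to_anthropic([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "", "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Seattle"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_123", "content": "Sunny, 72°F"},
])
print(system)
print(json.dumps(converted, indent=2, ensure_ascii=False))
```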
Writing provider-agnostic evaluations
Follow these guidelines to ensure evaluations work with both providers:
1. Use simple tool names
Prefer names without dots or underscores:
# ✅ Good
"GetWeather"
"SearchGoogle"
"SendMessage"
# ⚠️ Acceptable (use only one separator)
"Weather.GetCurrent"
"Google.Search"
# ❌ Avoid (mixing separators)
"My_Tool.GetData"
"Tool_Name.With_Mixed"
2. Use MCP-style tool definitions
Define tools using MCP format:
{
    "name": "GetWeather",
    "description": "Get current weather for a city",
    "inputSchema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["city"]
    }
}
3. Don’t rely on strict mode behavior
Don’t assume specific schema transformations:
# ❌ Don't rely on null unions
{
    "type": ["string", "null"]  # Only in OpenAI strict mode
}
# ✅ Use optional parameters
{
    "type": "string"
}
# In required list: OpenAI adds null union, Anthropic keeps as-is
# Not in required list: Both treat as optional
4. Handle optional parameters consistently
Use defaults for optional parameters:
{
    "type": "object",
    "properties": {
        "units": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "default": "celsius"
        }
    },
    "required": []
}
With both providers, the evaluation framework applies the default when the parameter is missing or null.
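To see why an MCP-style definition is a convenient common denominator, here is a hedged sketch of how one definition could map onto each provider's tool format. `to_openai_tool` and `to_anthropic_tool` are hypothetical helpers; they only rename fields as described in the schema-format section and skip strict-mode transformations and name normalization.

```python
from typing import Any


def to_openai_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Wrap an MCP-style definition in OpenAI's function-tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["inputSchema"],
        },
    }


def to_anthropic_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Rename inputSchema to input_schema for Anthropic."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool["inputSchema"],
    }


mcp_tool = {
    "name": "GetWeather",
    "description": "Get current weather for a city",
    "inputSchema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

openai_tool = to_openai_tool(mcp_tool)
anthropic_tool = to_anthropic_tool(mcp_tool)
print(openai_tool["function"]["parameters"] == anthropic_tool["input_schema"])
```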
Testing with multiple providers
Run evaluations with both providers to verify compatibility:
arcade evals . \
    --use-provider openai:gpt-4o \
    --use-provider anthropic:claude-sonnet-4-5-20250929
Comparing results
Results show provider-specific behavior:
Suite: Weather Tools
Case: Get weather for city
Model: gpt-4o -- Score: 1.00 -- PASSED
Model: claude-sonnet-4-5-20250929 -- Score: 1.00 -- PASSED
Common differences
Parameter handling:
OpenAI might send:
{"city": "Seattle", "units": null}
Anthropic might send:
{"city": "Seattle"}
Both are evaluated identically because defaults are applied.
Tool name format:
Both providers see normalized names (Weather_GetCurrent), but your test expectations use original names (Weather.GetCurrent).
Common pitfalls
Avoid these common mistakes when working with multiple providers:
- Using dots and underscores together
  # ❌ Don't mix separators
  "My_Tool.GetData"
  # ✅ Use one consistently
  "MyTool.GetData" or "MyTool_GetData"
- Relying on specific schema transformations
  # ❌ OpenAI-specific null unions
  {"type": ["string", "null"]}
  # ✅ Use optional parameters
  {"type": "string"}  # Not in required list
- Forgetting to test with both providers
  # ✅ Always test both
  arcade evals . \
      --use-provider openai:gpt-4o \
      --use-provider anthropic:claude-sonnet-4-5-20250929
Troubleshooting
Tool name mismatch
Symptom: Evaluation reports “tool not found”
Solution: Check whether the tool name uses dots. The normalized name (with underscores) should match:
# Original: "Weather.GetCurrent"
# Normalized: "Weather_GetCurrent"
# Expected: ExpectedMCPToolCall("Weather_GetCurrent", {...})
Schema validation errors
Symptom: OpenAI returns validation errors
Solution: Check if your schema uses unsupported strict mode keywords. These are automatically stripped, but might affect expected behavior.
Missing optional parameters
Symptom: Anthropic doesn’t provide optional parameters
Solution: This is expected. Optional parameters may be omitted. Ensure defaults are defined in your schema.
Enum type mismatches
Symptom: OpenAI converts numeric enums to strings
Solution: Use string enums in your schema:
# ✅ Use string enums
{"type": "string", "enum": ["low", "medium", "high"]}
# ❌ Avoid numeric enums
{"type": "integer", "enum": [0, 1, 2]}  # Converted to ["0", "1", "2"]
Next steps
- Create an evaluation suite with provider-agnostic tests
- Run evaluations with multiple providers
- Explore capture mode to see actual tool calls