
Provider compatibility

Arcade evaluations support both OpenAI and Anthropic models. Each provider has different requirements for tool schemas and message formats.

Provider comparison

| Feature | OpenAI | Anthropic |
|---|---|---|
| Tool name rules | Alphanumeric, -, _ (max 64 chars) | Alphanumeric, _ only |
| Schema format | function.parameters (JSON Schema) | input_schema (JSON Schema) |
| Strict mode | Yes (opt-in via strict: true) | No (standard JSON Schema) |
| Optional params | Required list + null unions | Only required params in required |
| Message roles | system, user, assistant, tool, function | user, assistant (system separate) |
| Tool calling format | tool_calls array | tool_use content blocks |

Tool name normalization

Arcade uses dotted notation for tool names (e.g., Weather.GetCurrent), but providers don’t allow dots in function names.

How normalization works

Tool names are automatically normalized:

Python
from arcade_core.converters.utils import normalize_tool_name

normalize_tool_name("Weather.GetCurrent")  # Returns: "Weather_GetCurrent"
normalize_tool_name("Google.Search")       # Returns: "Google_Search"

When models make tool calls, the normalized names are resolved back to the original tool names.

Denormalization is lossy

Reversing normalization can’t distinguish between original dots and underscores:

| Original Name | Normalized | Denormalized | Correct? |
|---|---|---|---|
| Google.Search | Google_Search | Google.Search | ✅ Yes |
| My_Tool.Name | My_Tool_Name | My.Tool.Name | ❌ No |
| Tool_Name | Tool_Name | Tool.Name | ❌ No |
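
To make the ambiguity concrete, here is a minimal sketch (not Arcade’s implementation) of a naive denormalizer that swaps every underscore back to a dot. It recovers purely dotted names correctly but mangles names that legitimately contain underscores.

Python
# Naive denormalizer, used only to illustrate why reversal is ambiguous.
def naive_denormalize(name: str) -> str:
    return name.replace("_", ".")

print(naive_denormalize("Google_Search"))  # "Google.Search"  (correct)
print(naive_denormalize("My_Tool_Name"))   # "My.Tool.Name"   (original was "My_Tool.Name")
print(naive_denormalize("Tool_Name"))      # "Tool.Name"      (original was "Tool_Name")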

Best practice: Use only dots OR only underscores in names, never both.

Tool name collisions

Don’t register both dotted and underscore versions of the same name:

Python
# ❌ Avoid this - creates collision
suite.add_tool_definitions([
    {"name": "Weather.GetCurrent", ...},
    {"name": "Weather_GetCurrent", ...},  # Collision!
])

The registry accepts both formats for lookups but they resolve to the same internal name.
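
A minimal sketch of why this collides, assuming a registry keyed by normalized names (illustrative only, not Arcade’s actual registry code):

Python
# Keying tool definitions by their normalized name is what makes the dotted
# and underscore variants collide.
def normalize(name: str) -> str:
    return name.replace(".", "_")

registry: dict[str, dict] = {}

for definition in [
    {"name": "Weather.GetCurrent"},
    {"name": "Weather_GetCurrent"},  # normalizes to the same key
]:
    key = normalize(definition["name"])
    if key in registry:
        raise ValueError(f"Tool name collision: {definition['name']!r} -> {key!r}")
    registry[key] = definition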

OpenAI strict mode

OpenAI’s strict mode enforces structured outputs by transforming JSON Schema. This happens automatically in evaluations.
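
For reference, this is roughly what a strict-mode tool definition looks like when sent to OpenAI’s Chat Completions API. The values are illustrative; Arcade builds this payload for you during evaluations.

Python
# Illustrative strict-mode tool definition for OpenAI (example values only).
strict_tool = {
    "type": "function",
    "function": {
        "name": "Weather_GetCurrent",  # normalized tool name
        "description": "Get current weather for a city",
        "strict": True,                # opt in to structured outputs
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": ["string", "null"]},  # optional -> null union
            },
            "required": ["city", "units"],  # every property is listed
            "additionalProperties": False,
        },
    },
}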

Schema transformations

1. Unsupported keywords are stripped:

Python
# Input schema { "type": "integer", "minimum": 0, "maximum": 100, "default": 50 } # Transformed for OpenAI { "type": ["integer", "null"] }

Stripped keywords:

  • Validation: minimum, maximum, minLength, maxLength, pattern, format
  • Metadata: default, nullable, minItems, maxItems

2. Optional parameters become required with null unions:

Python
# Input schema { "type": "object", "properties": { "city": {"type": "string"}, "units": {"type": "string", "default": "celsius"} }, "required": ["city"] } # Transformed for OpenAI { "type": "object", "properties": { "city": {"type": "string"}, "units": {"type": ["string", "null"]} # Now in union with null }, "required": ["city", "units"], # units added to required "additionalProperties": false }

3. Enums are stringified:

Python
# Input schema { "type": "integer", "enum": [0, 1, 2] } # Transformed for OpenAI { "type": "string", "enum": ["0", "1", "2"] }

4. Additional properties are forbidden:

All objects get "additionalProperties": false to enforce strict validation.

Why defaults still work

Even though default is stripped from schemas, defaults are still applied during evaluation. Here’s why:

  1. The evaluation framework stores the original schema with defaults
  2. OpenAI sends null for optional parameters in strict mode
  3. The framework applies defaults when args are missing OR null
Python
# Model sends: {"city": "Seattle", "units": null}
# Framework applies default: {"city": "Seattle", "units": "celsius"}

This behavior ensures consistent evaluation results regardless of provider.
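
A simplified sketch of that default-filling step is shown below. The helper name is hypothetical; the framework’s internal implementation may differ.

Python
# Hypothetical helper illustrating the behavior described above: a default is
# applied when the argument is missing OR explicitly null/None.
def apply_defaults(schema: dict, args: dict) -> dict:
    filled = dict(args)
    for name, prop in schema.get("properties", {}).items():
        if "default" in prop and filled.get(name) is None:
            filled[name] = prop["default"]
    return filled

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "default": "celsius"},
    },
    "required": ["city"],
}

print(apply_defaults(schema, {"city": "Seattle", "units": None}))
# {'city': 'Seattle', 'units': 'celsius'}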

Anthropic schema format

Anthropic uses standard JSON Schema with minimal transformation.

Key differences from OpenAI

1. Schema field name:

Python
# OpenAI format
{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {...}  # ← Note: "parameters"
    }
}

# Anthropic format
{
    "name": "get_weather",
    "input_schema": {...}  # ← Note: "input_schema"
}

2. No strict mode transformations:

Anthropic accepts the schema as-is:

  • Validation keywords are preserved
  • Optional params stay optional
  • Enums keep original types
  • Defaults are kept (but not sent to model)

3. Only required params in required list:

Python
{ "type": "object", "properties": { "city": {"type": "string"}, "units": {"type": "string", "default": "celsius"} }, "required": ["city"] # Only city is required }

Message format conversion

Arcade evaluations use OpenAI message format internally. When using Anthropic, messages are converted automatically.

System messages

OpenAI:

Python
[ {"role": "system", "content": "You are helpful."}, {"role": "user", "content": "Hello"}, ]

Anthropic:

Python
# system → separate parameter
system = "You are helpful."
messages = [
    {"role": "user", "content": "Hello"},
]

Tool calls

OpenAI:

Python
{ "role": "assistant", "content": "", "tool_calls": [ { "id": "call_123", "type": "function", "function": { "name": "get_weather", "arguments": '{"city": "Seattle"}' } } ] }

Anthropic:

Python
{ "role": "assistant", "content": [ { "type": "tool_use", "id": "call_123", "name": "get_weather", "input": {"city": "Seattle"} } ] }

Tool results

OpenAI:

Python
{ "role": "tool", "tool_call_id": "call_123", "content": "Sunny, 72°F" }

Anthropic:

Python
{ "role": "user", "content": [ { "type": "tool_result", "tool_use_id": "call_123", "content": "Sunny, 72°F" } ] }

Message conversion happens automatically. You don’t need to handle it manually.
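
If you want a mental model of that conversion, the simplified sketch below (a hypothetical helper, not the framework’s internal code) covers the three cases above: moving the system message to a separate parameter, turning tool_calls into tool_use blocks, and turning tool results into tool_result blocks.

Python
import json

def to_anthropic(openai_messages: list) -> tuple:
    """Convert OpenAI-style messages to (system, anthropic_messages). Sketch only."""
    system = None
    messages = []
    for msg in openai_messages:
        if msg["role"] == "system":
            system = msg["content"]  # becomes the separate system parameter
        elif msg["role"] == "assistant" and msg.get("tool_calls"):
            messages.append({
                "role": "assistant",
                "content": [
                    {
                        "type": "tool_use",
                        "id": call["id"],
                        "name": call["function"]["name"],
                        "input": json.loads(call["function"]["arguments"]),
                    }
                    for call in msg["tool_calls"]
                ],
            })
        elif msg["role"] == "tool":
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": msg["tool_call_id"],
                    "content": msg["content"],
                }],
            })
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    return system, messages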

Writing provider-agnostic evaluations

Follow these guidelines to ensure evaluations work with both providers:

1. Use simple tool names

Prefer names without dots or underscores:

Python
# ✅ Good
"GetWeather"
"SearchGoogle"
"SendMessage"

# ⚠️ Acceptable (use only one separator)
"Weather.GetCurrent"
"Google.Search"

# ❌ Avoid (mixing separators)
"My_Tool.GetData"
"Tool_Name.With_Mixed"

2. Use MCP-style tool definitions

Define tools using the MCP format:

Python
{ "name": "GetWeather", "description": "Get current weather for a city", "inputSchema": { "type": "object", "properties": { "city": {"type": "string"}, "units": {"type": "string", "enum": ["celsius", "fahrenheit"]} }, "required": ["city"] } }

3. Don’t rely on strict mode behavior

Don’t assume specific schema transformations:

Python
# ❌ Don't rely on null unions
{
    "type": ["string", "null"]  # Only in OpenAI strict mode
}

# ✅ Use optional parameters
{
    "type": "string"
}
# In required list: OpenAI adds null union, Anthropic keeps as-is
# Not in required list: Both treat as optional

4. Handle optional parameters consistently

Use defaults for optional parameters:

Python
{ "type": "object", "properties": { "units": { "type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius" } }, "required": [] }

Both providers will apply the default when the parameter is missing.

Testing with multiple providers

Run evaluations with both providers to verify compatibility:

Terminal
arcade evals . \
  --use-provider openai:gpt-4o \
  --use-provider anthropic:claude-sonnet-4-5-20250929

Comparing results

Results show provider-specific behavior:

PLAINTEXT
Suite: Weather Tools
  Case: Get weather for city
    Model: gpt-4o -- Score: 1.00 -- PASSED
    Model: claude-sonnet-4-5-20250929 -- Score: 1.00 -- PASSED

Common differences

Parameter handling:

OpenAI might send:

JSON
{"city": "Seattle", "units": null}

Anthropic might send:

JSON
{"city": "Seattle"}

Both are evaluated identically because defaults are applied.

Tool name format:

Both providers see normalized names (Weather_GetCurrent), but your test expectations use original names (Weather.GetCurrent).

Common pitfalls

Avoid these common mistakes when working with multiple providers:

  1. Using dots and underscores together

    Python
    # ❌ Don't mix separators
    "My_Tool.GetData"

    # ✅ Use one consistently
    "MyTool.GetData" or "MyTool_GetData"
  2. Relying on specific schema transformations

    Python
    # ❌ OpenAI-specific null unions
    {"type": ["string", "null"]}

    # ✅ Use optional parameters
    {"type": "string"}  # Not in required list
  3. Forgetting to test with both providers

    Terminal
    # ✅ Always test both
    arcade evals . \
      --use-provider openai:gpt-4o \
      --use-provider anthropic:claude-sonnet-4-5-20250929

Troubleshooting

Tool name mismatch

Symptom: The evaluation reports that a tool was not found

Solution: Check whether the tool name uses dots. The normalized name (with underscores) should match:

Python
# Original: "Weather.GetCurrent" # Normalized: "Weather_GetCurrent" # Expected: ExpectedMCPToolCall("Weather_GetCurrent", {...})

Schema validation errors

Symptom: OpenAI returns validation errors

Solution: Check whether your schema uses keywords that strict mode doesn’t support. These are stripped automatically, but their removal might affect the behavior you expect.

Missing optional parameters

Symptom: Anthropic doesn’t provide optional parameters

Solution: This is expected. Optional parameters may be omitted. Ensure defaults are defined in your schema.

Enum type mismatches

Symptom: OpenAI converts numeric enums to strings

Solution: Use string enums in your schema:

Python
# ✅ Use string enums
{"type": "string", "enum": ["low", "medium", "high"]}

# ❌ Avoid numeric enums
{"type": "integer", "enum": [0, 1, 2]}  # Converted to ["0", "1", "2"]
