
Provider compatibility

Arcade evaluations support both OpenAI and Anthropic models. Each provider has different requirements for tool schemas and message formats.

Provider comparison

| Feature | OpenAI | Anthropic |
|---|---|---|
| Tool name rules | Alphanumeric, -, _ (max 64 chars) | Alphanumeric, _ only |
| Schema format | function.parameters (JSON Schema) | input_schema (JSON Schema) |
| Strict mode | Yes (opt-in via strict: true) | No (standard JSON Schema) |
| Optional params | Required list + null unions | Only required params in required |
| Message roles | system, user, assistant, tool, function | user, assistant (system separate) |
| Tool calling format | tool_calls array | tool_use content blocks |

Tool name normalization

Arcade uses dotted notation for tool names (e.g., Weather.GetCurrent), but providers don’t allow dots in function names.

How normalization works

Tool names are automatically normalized:

Python
from arcade_core.converters.utils import normalize_tool_name

normalize_tool_name("Weather.GetCurrent")  # Returns: "Weather_GetCurrent"
normalize_tool_name("Google.Search")       # Returns: "Google_Search"

When models make tool calls, the normalized names are resolved back to the original tool names.

Denormalization is lossy

Reversing normalization can’t distinguish between original dots and underscores:

| Original Name | Normalized | Denormalized | Correct? |
|---|---|---|---|
| Google.Search | Google_Search | Google.Search | ✅ Yes |
| My_Tool.Name | My_Tool_Name | My.Tool.Name | ❌ No |
| Tool_Name | Tool_Name | Tool.Name | ❌ No |
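
To make the ambiguity concrete, here is a minimal sketch (not Arcade’s implementation) of a naive denormalizer that swaps every underscore back to a dot. It recovers purely dotted names correctly but mangles names that legitimately contain underscores.

Python
# Naive denormalizer, used only to illustrate why reversal is ambiguous.
def naive_denormalize(name: str) -> str:
    return name.replace("_", ".")

print(naive_denormalize("Google_Search"))  # "Google.Search"  (correct)
print(naive_denormalize("My_Tool_Name"))   # "My.Tool.Name"   (original was "My_Tool.Name")
print(naive_denormalize("Tool_Name"))      # "Tool.Name"      (original was "Tool_Name")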

Best practice: Use only dots OR only underscores in names, never both.

Tool name collisions

Don’t register both dotted and underscore versions of the same name:

Python
# ❌ Avoid this - creates collision
suite.add_tool_definitions([
    {"name": "Weather.GetCurrent", ...},
    {"name": "Weather_GetCurrent", ...},  # Collision!
])

The registry accepts both formats for lookups but they resolve to the same internal name.
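
A minimal sketch of why this collides, assuming a registry keyed by normalized names (illustrative only, not Arcade’s actual registry code):

Python
# Keying tool definitions by their normalized name is what makes the dotted
# and underscore variants collide.
def normalize(name: str) -> str:
    return name.replace(".", "_")

registry: dict[str, dict] = {}

for definition in [
    {"name": "Weather.GetCurrent"},
    {"name": "Weather_GetCurrent"},  # normalizes to the same key
]:
    key = normalize(definition["name"])
    if key in registry:
        raise ValueError(f"Tool name collision: {definition['name']!r} -> {key!r}")
    registry[key] = definition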

OpenAI strict mode

OpenAI’s strict mode enforces structured outputs by transforming JSON Schema. This happens automatically in evaluations.
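
For reference, this is roughly what a strict-mode tool definition looks like when sent to OpenAI’s Chat Completions API. The values are illustrative; Arcade builds this payload for you during evaluations.

Python
# Illustrative strict-mode tool definition for OpenAI (example values only).
strict_tool = {
    "type": "function",
    "function": {
        "name": "Weather_GetCurrent",  # normalized tool name
        "description": "Get current weather for a city",
        "strict": True,                # opt in to structured outputs
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": ["string", "null"]},  # optional -> null union
            },
            "required": ["city", "units"],  # every property is listed
            "additionalProperties": False,
        },
    },
}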

Schema transformations

1. Unsupported keywords are stripped:

Python
# Input schema { "type": "integer", "minimum": 0, "maximum": 100, "default": 50 } # Transformed for OpenAI { "type": ["integer", "null"] }

Stripped keywords:

  • Validation: minimum, maximum, minLength, maxLength, pattern, format
  • Metadata: default, nullable, minItems, maxItems

2. Optional parameters become required with null unions:

Python
# Input schema { "type": "object", "properties": { "city": {"type": "string"}, "units": {"type": "string", "default": "celsius"} }, "required": ["city"] } # Transformed for OpenAI { "type": "object", "properties": { "city": {"type": "string"}, "units": {"type": ["string", "null"]} # Now in union with null }, "required": ["city", "units"], # units added to required "additionalProperties": false }

3. Enums are stringified:

Python
# Input schema { "type": "integer", "enum": [0, 1, 2] } # Transformed for OpenAI { "type": "string", "enum": ["0", "1", "2"] }

4. Additional properties are forbidden:

All objects get "additionalProperties": false to enforce strict validation.

Why defaults still work

Even though default is stripped from schemas, defaults are still applied during evaluation. Here’s why:

  1. The evaluation framework stores the original schema with defaults
  2. OpenAI sends null for optional parameters in strict mode
  3. The framework applies defaults when args are missing OR null
Python
# Model sends: {"city": "Seattle", "units": null}
# Framework applies default: {"city": "Seattle", "units": "celsius"}

This behavior ensures consistent evaluation results regardless of provider.
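
A simplified sketch of that default-filling step is shown below. The helper name is hypothetical; the framework’s internal implementation may differ.

Python
# Hypothetical helper illustrating the behavior described above: a default is
# applied when the argument is missing OR explicitly null/None.
def apply_defaults(schema: dict, args: dict) -> dict:
    filled = dict(args)
    for name, prop in schema.get("properties", {}).items():
        if "default" in prop and filled.get(name) is None:
            filled[name] = prop["default"]
    return filled

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "default": "celsius"},
    },
    "required": ["city"],
}

print(apply_defaults(schema, {"city": "Seattle", "units": None}))
# {'city': 'Seattle', 'units': 'celsius'}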

Anthropic schema format

Anthropic uses standard JSON Schema with minimal transformation.

Key differences from OpenAI

1. Schema field name:

Python
# OpenAI format
{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {...}  # ← Note: "parameters"
    }
}

# Anthropic format
{
    "name": "get_weather",
    "input_schema": {...}  # ← Note: "input_schema"
}

2. No strict mode transformations:

Anthropic accepts the schema as-is:

  • Validation keywords are preserved
  • Optional params stay optional
  • Enums keep original types
  • Defaults are kept (but not sent to model)

3. Only required params in required list:

Python
{ "type": "object", "properties": { "city": {"type": "string"}, "units": {"type": "string", "default": "celsius"} }, "required": ["city"] # Only city is required }

Message format conversion

Arcade evaluations use OpenAI message format internally. When using Anthropic, messages are converted automatically.

System messages

OpenAI:

Python
[ {"role": "system", "content": "You are helpful."}, {"role": "user", "content": "Hello"}, ]

Anthropic:

Python
# system → separate parameter
system = "You are helpful."
messages = [
    {"role": "user", "content": "Hello"},
]

Tool calls

OpenAI:

Python
{ "role": "assistant", "content": "", "tool_calls": [ { "id": "call_123", "type": "function", "function": { "name": "get_weather", "arguments": '{"city": "Seattle"}' } } ] }

Anthropic:

Python
{ "role": "assistant", "content": [ { "type": "tool_use", "id": "call_123", "name": "get_weather", "input": {"city": "Seattle"} } ] }

Tool results

OpenAI:

Python
{ "role": "tool", "tool_call_id": "call_123", "content": "Sunny, 72°F" }

Anthropic:

Python
{ "role": "user", "content": [ { "type": "tool_result", "tool_use_id": "call_123", "content": "Sunny, 72°F" } ] }

Message conversion happens automatically. You don’t need to handle it manually.
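
If you want a mental model of that conversion, the simplified sketch below (a hypothetical helper, not the framework’s internal code) covers the three cases above: moving the system message to a separate parameter, turning tool_calls into tool_use blocks, and turning tool results into tool_result blocks.

Python
import json

def to_anthropic(openai_messages: list) -> tuple:
    """Convert OpenAI-style messages to (system, anthropic_messages). Sketch only."""
    system = None
    messages = []
    for msg in openai_messages:
        if msg["role"] == "system":
            system = msg["content"]  # becomes the separate system parameter
        elif msg["role"] == "assistant" and msg.get("tool_calls"):
            messages.append({
                "role": "assistant",
                "content": [
                    {
                        "type": "tool_use",
                        "id": call["id"],
                        "name": call["function"]["name"],
                        "input": json.loads(call["function"]["arguments"]),
                    }
                    for call in msg["tool_calls"]
                ],
            })
        elif msg["role"] == "tool":
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": msg["tool_call_id"],
                    "content": msg["content"],
                }],
            })
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    return system, messages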

Writing provider-agnostic evaluations

Follow these guidelines to ensure evaluations work with both providers:

1. Use simple tool names

Prefer names without dots or underscores:

Python
# ✅ Good
"GetWeather"
"SearchGoogle"
"SendMessage"

# ⚠️ Acceptable (use only one separator)
"Weather.GetCurrent"
"Google.Search"

# ❌ Avoid (mixing separators)
"My_Tool.GetData"
"Tool_Name.With_Mixed"

2. Use MCP-style tool definitions

Define tools using the MCP format:

Python
{ "name": "GetWeather", "description": "Get current weather for a city", "inputSchema": { "type": "object", "properties": { "city": {"type": "string"}, "units": {"type": "string", "enum": ["celsius", "fahrenheit"]} }, "required": ["city"] } }

3. Don’t rely on strict mode behavior

Don’t assume specific schema transformations:

Python
# ❌ Don't rely on null unions
{
    "type": ["string", "null"]  # Only in OpenAI strict mode
}

# ✅ Use optional parameters
{
    "type": "string"
}
# In required list: OpenAI adds null union, Anthropic keeps as-is
# Not in required list: Both treat as optional

4. Handle optional parameters consistently

Use defaults for optional parameters:

Python
{ "type": "object", "properties": { "units": { "type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius" } }, "required": [] }

Both providers will apply the default when the parameter is missing.

Testing with multiple providers

Run evaluations with both providers to verify compatibility:

Terminal
arcade evals . \
  --use-provider openai:gpt-4o \
  --use-provider anthropic:claude-sonnet-4-5-20250929

Comparing results

Results show provider-specific behavior:

PLAINTEXT
Suite: Weather Tools
  Case: Get weather for city
    Model: gpt-4o -- Score: 1.00 -- PASSED
    Model: claude-sonnet-4-5-20250929 -- Score: 1.00 -- PASSED

Common differences

Parameter handling:

OpenAI might send:

JSON
{"city": "Seattle", "units": null}

Anthropic might send:

JSON
{"city": "Seattle"}

Both are evaluated identically because defaults are applied.

Tool name format:

Both providers see normalized names (Weather_GetCurrent), but your test expectations use original names (Weather.GetCurrent).

Common pitfalls

Avoid these common mistakes when working with multiple providers:

  1. Using dots and underscores together

    Python
    # ❌ Don't mix separators
    "My_Tool.GetData"

    # ✅ Use one consistently
    "MyTool.GetData" or "MyTool_GetData"
  2. Relying on specific schema transformations

    Python
    # ❌ OpenAI-specific null unions
    {"type": ["string", "null"]}

    # ✅ Use optional parameters
    {"type": "string"}  # Not in required list
  3. Forgetting to test with both providers

    Terminal
    # ✅ Always test both
    arcade evals . \
      --use-provider openai:gpt-4o \
      --use-provider anthropic:claude-sonnet-4-5-20250929

Troubleshooting

Tool name mismatch

Symptom: The evaluation reports that a tool was not found

Solution: Check whether the tool name uses dots. The normalized name (with underscores) should match:

Python
# Original: "Weather.GetCurrent" # Normalized: "Weather_GetCurrent" # Expected: ExpectedMCPToolCall("Weather_GetCurrent", {...})

Schema validation errors

Symptom: OpenAI returns validation errors

Solution: Check whether your schema uses keywords that strict mode doesn’t support. These are stripped automatically, but their removal might affect the behavior you expect.

Missing optional parameters

Symptom: Anthropic doesn’t provide optional parameters

Solution: This is expected. Optional parameters may be omitted. Ensure defaults are defined in your schema.

Enum type mismatches

Symptom: OpenAI converts numeric enums to strings

Solution: Use string enums in your schema:

Python
# ✅ Use string enums
{"type": "string", "enum": ["low", "medium", "high"]}

# ❌ Avoid numeric enums
{"type": "integer", "enum": [0, 1, 2]}  # Converted to ["0", "1", "2"]
