Create Batch Manufacturing Record for Metformin Tablets
Task Completion · Batch Record · hard
Cross-Model Comparison
| Model | Score | Latency | Tokens In | Tokens Out |
|---|---|---|---|---|
| GPT-5.4 | 91.5% | 83.6s | 461 | 7,148 |
| GPT-5.4 mini | 90.5% | 24.3s | 461 | 5,254 |
| Qwen3.5-397B-A17B | 90.5% | 47.6s | 522 | 5,631 |
| Mistral Large 3 675B | 88.5% | 101.5s | 504 | 6,472 |
| DeepSeek-R1 | 88.5% | 117.4s | 472 | 4,550 |
| DeepSeek-V3.2 | 88.5% | 109.5s | 472 | 4,248 |
| Qwen3.5-35B-A3B | 86.1% | 43.4s | 523 | 6,107 |
| Claude Haiku 4.5 | 86.0% | 89.7s | 549 | 14,977 |
| GPT-5.4 nano | 85.0% | 29.5s | 461 | 6,076 |
| Claude Opus 4.6 | 84.3% | 199.0s | 550 | 16,000 |
| Gemini 3.1 Pro | 82.5% | 75.7s | 498 | 5,228 |
| Claude Sonnet 4.6 | 78.5% | 137.8s | 550 | 16,000 |
| Mistral Small 2603 | 78.1% | 15.2s | 516 | 3,174 |
| Llama 4 Maverick | 67.6% | 88.4s | 469 | 1,692 |
| Gemini 3 Flash | 67.5% | 14.2s | 498 | 2,134 |
| Llama 3.3 70B Instruct | 51.5% | 38.4s | 479 | 1,289 |
| DeepSeek-R1-Distill-Qwen-32B | 45.3% | 174.1s | 495 | 1,720 |
| Llama 4 Scout | 43.8% | 16.9s | 469 | 1,202 |
| Gemini 3.1 Flash-Lite | 42.5% | 4.6s | 500 | 1,043 |
Tags
batch_record · tablet_manufacturing · wet_granulation · documentation