Generate EM Trending Report from Scattered Data
Task Completion · Em Report · hard
Cross-Model Comparison
| Model | Score | Latency | Tokens In | Tokens Out |
|---|---|---|---|---|
| Claude Sonnet 4.6 | 95.0% | 207.0s | 1,097 | 14,026 |
| Claude Opus 4.6 | 94.8% | 212.6s | 1,097 | 12,719 |
| DeepSeek-R1 | 94.3% | 54.5s | 1,042 | 2,001 |
| Qwen3.5-35B-A3B | 91.6% | 37.0s | 1,355 | 5,316 |
| GPT-5.4 | 91.5% | 44.5s | 1,037 | 3,121 |
| Gemini 3.1 Pro | 88.0% | 281.2s | 1,384 | 5,510 |
| Qwen3.5-397B-A17B | 87.5% | 74.5s | 1,355 | 5,115 |
| Gemini 3 Flash | 83.1% | 8.2s | 1,384 | 1,229 |
| Claude Haiku 4.5 | 78.2% | 35.0s | 1,096 | 4,736 |
| GPT-5.4 mini | 75.7% | 13.7s | 1,037 | 2,691 |
| Llama 4 Maverick | 71.0% | 40.3s | 1,041 | 992 |
| GPT-5.4 nano | 68.8% | 14.6s | 1,037 | 3,058 |
| DeepSeek-V3.2 | 68.3% | 75.6s | 1,042 | 1,920 |
| Mistral Large 3 675B | 67.8% | 32.2s | 1,363 | 2,233 |
| Llama 3.3 70B Instruct | 65.3% | 26.7s | 1,044 | 723 |
| Mistral Small 2603 | 65.0% | 12.8s | 1,375 | 2,445 |
| DeepSeek-R1-Distill-Qwen-32B | 64.8% | 145.2s | 1,341 | 2,187 |
| Gemini 3.1 Flash-Lite | 59.8% | 4.2s | 1,384 | 877 |
| Llama 4 Scout | 48.8% | 9.7s | 1,041 | 839 |
Tags
environmental_monitoring, trending, data_analysis