Artificial Analysis, an independent benchmarking platform, evaluated providers serving GPT-OSS-120B across latency, throughput, and price. In these tests, Clarifai’s Compute Orchestration delivered 0.27 s Time to First Token (TTFT) and 313 tokens per second at a blended price near $0.16 per 1M tokens. These results place Clarifai in the benchmark’s “most attractive” quadrant for high speed and low price.
Artificial Analysis benchmarks focus on three core metrics that map directly to production workloads:
Time to First Token (TTFT): the delay from request to the first streamed token. Lower TTFT improves responsiveness in chatbots, copilots, and agent loops.
Tokens per second (throughput): the average streaming rate, a strong indicator of completion speed and efficiency.
Blended price per million tokens: a normalized cost metric that accounts for both input and output tokens, allowing apples-to-apples comparisons across providers.
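To make the blended metric concrete, here is a minimal sketch of the calculation. The 3:1 input-to-output weighting and the per-token rates below are illustrative assumptions, not Clarifai’s published prices; see Artificial Analysis’s methodology for the exact weighting it applies.

```python
# Blended price = weighted average of input and output prices per 1M tokens.
# The 3:1 weighting here is an assumption for illustration only.

def blended_price_per_million(input_price: float, output_price: float,
                              input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Return the weighted-average price per 1M tokens."""
    return (input_price * input_weight + output_price * output_weight) / (input_weight + output_weight)

# Hypothetical per-1M-token rates that would land near the $0.16 blended figure:
print(blended_price_per_million(input_price=0.09, output_price=0.36))  # -> 0.1575
```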
On GPT-OSS-120B, Clarifai achieved:
TTFT: 0.27 s
Throughput: 313 tokens/sec
Blended price: $0.16 per 1M tokens
Overall: Ranked in the benchmark’s “most attractive” quadrant for speed and cost efficiency
These numbers validate Clarifai’s ability to balance low latency, high throughput, and cost optimization—key factors for scaling large models like GPT-OSS-120B.
Below is a comparison of output speed versus price across major providers for GPT-OSS-120B. Clarifai stands out in the “most attractive quadrant,” combining high throughput with competitive pricing.
Output Speed vs. Price
The chart below compares latency (time to first token) against output speed. Clarifai demonstrates one of the lowest latencies while maintaining top-tier throughput—placing it among the best-in-class providers.
Latency vs. Output Speed
Clarifai’s Compute Orchestration is designed to maximize performance and efficiency regardless of the underlying hardware.
Key elements include:
Autoscaling and right-sizing: Dynamic scaling ensures resources adapt to workload spikes while minimizing idle costs.
GPU fractioning and efficiency: Techniques that maximize utilization by running multiple models or tenants on the same GPU fleet.
Runtime flexibility: Support for frameworks such as TensorRT-LLM, vLLM, and SGLang across GPU generations like H100 and B200, giving teams the flexibility to optimize for either latency or throughput.
This orchestration-first approach matters for GPT-OSS-120B, a compute-intensive Mixture-of-Experts model, where careful tuning of schedulers, batching strategies, and runtime choices can drastically affect performance and cost.
For developers and platform teams, Clarifai’s benchmark performance translates into clear benefits when deploying GPT-OSS-120B in production:
Faster, smoother user experiences
With a median TTFT of ~0.27 s, applications deliver near-instant feedback. In multi-step agent workflows, where each tool call waits on the previous model response, lower TTFT compounds across calls and significantly reduces end-to-end response times.
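A back-of-envelope sketch of that compounding effect (the 0.27 s figure is Clarifai’s benchmarked median; the five-step loop and the 0.50 s comparison point are illustrative assumptions):

```python
# Why TTFT compounds in agent loops: each sequential model call pays
# the first-token wait again. Numbers below are illustrative.
steps = 5  # assumed number of sequential model calls in one workflow

for ttft in (0.27, 0.50):  # benchmarked Clarifai TTFT vs. a hypothetical slower provider
    print(f"TTFT {ttft:.2f}s x {steps} steps = {ttft * steps:.2f}s waiting for first tokens")
```

At five steps, that difference alone is over a second of perceived latency before any tokens stream.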
Improved cost efficiency
High throughput (~313 tokens/sec) combined with ~$0.16 per 1M tokens allows teams to serve more requests per GPU hour while keeping budgets predictable.
Operational flexibility
Teams can choose between latency-optimized or throughput-optimized runtimes and scale seamlessly across infrastructures, avoiding vendor lock-in.
Applicable to diverse use cases
Enterprise copilots: faster draft generation and real-time assistance
RAG and analytics pipelines: efficient summarization of long documents with lower costs
Agentic workflows: repeated tool calls with minimal latency overhead
Benchmarks are useful, but the best way to evaluate performance is to try the model yourself. Clarifai makes it simple to experiment and integrate GPT-OSS-120B into real workflows.
You can directly explore GPT-OSS-120B in Clarifai’s Playground with an interactive UI—perfect for rapid experimentation, prompt design, and side-by-side model comparisons.
Try GPT-OSS-120B in the Playground
For production use, GPT-OSS-120B is fully accessible through Clarifai’s OpenAI-compatible API. This means you can integrate the model with the same tooling and workflows you already use for OpenAI models—while benefiting from Clarifai’s orchestration efficiency and cost-performance advantages.
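As a minimal sketch of what that looks like in Python (the base URL and model identifier below are assumptions for illustration; confirm the exact values in Clarifai’s documentation):

```python
# Calling GPT-OSS-120B through Clarifai's OpenAI-compatible endpoint
# with the standard openai client. base_url and model are assumed
# values; check Clarifai's docs for the exact endpoint and model ID.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",  # assumed endpoint
    api_key="YOUR_CLARIFAI_PAT",  # Clarifai personal access token
)

response = client.chat.completions.create(
    model="https://clarifai.com/openai/chat-completion/models/gpt-oss-120b",  # assumed model ID
    messages=[{"role": "user", "content": "Summarize the benefits of MoE models in two sentences."}],
)
print(response.choices[0].message.content)
```

Because the client is the stock openai package, pointing an existing OpenAI integration at Clarifai is typically just a change of base_url, api_key, and model.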
Developers can call GPT-OSS-120B across a wide range of environments, including:
Python (Clarifai Python SDK, OpenAI-compatible API, gRPC)
Node.js (Clarifai SDK, OpenAI-compatible clients, Vercel AI SDK)
JavaScript, PHP, Java, cURL and more
This flexibility allows you to integrate GPT-OSS-120B directly into your existing pipelines with minimal code changes.
See the Clarifai Inference documentation for details on authentication, supported SDKs, and advanced features like streaming, batching, and deployment flexibility.
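For example, streaming works through the same OpenAI-compatible client shown earlier; this sketch assumes the same base URL and model identifier:

```python
# Streaming completion: tokens print as they arrive, so TTFT (not
# total completion time) governs perceived responsiveness.
stream = client.chat.completions.create(
    model="https://clarifai.com/openai/chat-completion/models/gpt-oss-120b",  # assumed model ID
    messages=[{"role": "user", "content": "Draft a short release note."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```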
Artificial Analysis’s independent evaluation of GPT-OSS-120B highlights Clarifai as one of the leading platforms for speed and cost efficiency. By combining fast token streaming (313 tok/s), low latency (0.27 s TTFT), and a competitive blended price ($0.16/M tokens), Clarifai delivers the kind of performance that matters most for production-scale inference.
For ML and engineering teams, this means more responsive user experiences, efficient infrastructure utilization, and confidence in scaling GPT-OSS-120B without unpredictable costs. Read the full Artificial Analysis benchmarks.
If you’d like to discuss these results or have questions about running GPT-OSS-120B in production, join us in our Discord Channel. Our team and community are there to help with deployment strategies, GPU choices, and optimizing your AI infrastructure.