September 16, 2025

Artificial Analysis Benchmarks on GPT-OSS-120B: Clarifai Ranks at the Top for Performance and Cost-Efficiency

Artificial Analysis, an independent benchmarking platform, evaluated providers serving GPT-OSS-120B across latency, throughput, and price. In these tests, Clarifai’s Compute Orchestration delivered 0.27 s Time to First Token (TTFT) and 313 tokens per second at a blended price near $0.16 per 1M tokens. These results place Clarifai in the benchmark’s “most attractive” quadrant for high speed and low price.

Inside the Benchmarks: How Clarifai Stacks Up

Artificial Analysis benchmarks focus on three core metrics that map directly to production workloads:

  • Time to First Token (TTFT): the delay from request to the first streamed token. Lower TTFT improves responsiveness in chatbots, copilots, and agent loops.

  • Tokens per second (throughput): the average streaming rate, a strong indicator of completion speed and efficiency.

  • Blended price per million tokens: a normalized cost metric that accounts for both input and output tokens, allowing apples-to-apples comparisons across providers.
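
To make the blended metric concrete, here is a minimal sketch of how a blended price is typically computed. The 3:1 input-to-output weighting follows Artificial Analysis’s published convention (verify on their methodology page), and the per-direction prices below are hypothetical placeholders chosen only to illustrate how a blend near $0.16 can arise.

```python
# Blended price per 1M tokens, assuming a 3:1 input:output weighting
# (Artificial Analysis's convention). The per-direction prices in the
# example call are illustrative placeholders, not Clarifai's rate card.

def blended_price(input_price_per_m: float, output_price_per_m: float,
                  input_ratio: float = 3.0, output_ratio: float = 1.0) -> float:
    """Weighted average price per 1M tokens across input and output."""
    total = input_ratio + output_ratio
    return (input_price_per_m * input_ratio
            + output_price_per_m * output_ratio) / total

# Hypothetical per-direction prices that land near the benchmark's blend:
print(blended_price(0.09, 0.36))  # -> 0.1575, i.e. ~$0.16 per 1M tokens
```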

On GPT-OSS-120B, Clarifai achieved:

  • TTFT: 0.27 s 

  • Throughput: 313 tokens/sec

  • Blended price: $0.16 per 1M tokens

  • Overall: Ranked in the benchmark’s “most attractive” quadrant for speed and cost efficiency

These numbers validate Clarifai’s ability to balance low latency, high throughput, and cost optimization—key factors for scaling large models like GPT-OSS-120B.

Below is a comparison of output speed versus price across major providers for GPT-OSS-120B. Clarifai stands out in the “most attractive quadrant,” combining high throughput with competitive pricing.

Figure: Output Speed vs. Price (10 Sep 2025)

The chart below compares latency (time to first token) against output speed. Clarifai demonstrates one of the lowest latencies while maintaining top-tier throughput, placing it among the best-in-class providers.

Figure: Latency vs. Output Speed (10 Sep 2025)

GPU and Hardware-Agnostic Inference at Scale with Clarifai

Clarifai’s Compute Orchestration is designed to maximize performance and efficiency regardless of the underlying hardware.

Key elements include:

  • Vendor-agnostic deployment: Seamlessly deploy models on any CPU, GPU, or accelerator in our SaaS, your own cloud or on-premises infrastructure, or in air-gapped environments without lock-in.

  • Autoscaling and right-sizing: Dynamic scaling ensures resources adapt to workload spikes while minimizing idle costs.

  • GPU fractioning and efficiency: Techniques that maximize utilization by running multiple models or tenants on the same GPU fleet.

  • Runtime flexibility: Support for frameworks such as TensorRT-LLM, vLLM, and SGLang across GPU generations like H100 and B200, giving teams the flexibility to optimize for either latency or throughput.

This orchestration-first approach matters for GPT-OSS-120B, a compute-intensive Mixture-of-Experts model, where careful tuning of schedulers, batching strategies, and runtime choices can drastically affect performance and cost.
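
As an illustration of those runtime choices, the sketch below shows the kind of batching and parallelism knobs exposed by an open runtime like vLLM. This is not Clarifai’s internal configuration: the model id is the public Hugging Face repo for GPT-OSS-120B, and the parallelism and batching settings are hypothetical and hardware-dependent.

```python
# Illustrative only: generic vLLM knobs of the kind the paragraph
# describes. NOT Clarifai's internal setup; values are hypothetical.
from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-120b",   # public Hugging Face repo for GPT-OSS-120B
    tensor_parallel_size=4,        # shard the MoE across 4 GPUs (hardware-dependent)
    max_num_seqs=64,               # smaller batches favor latency; larger favor throughput
    gpu_memory_utilization=0.90,   # leave headroom for KV-cache growth
)

params = SamplingParams(max_tokens=256, temperature=0.7)
outputs = llm.generate(["Summarize the benefits of MoE models."], params)
print(outputs[0].outputs[0].text)
```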

What these results mean for engineering teams

For developers and platform teams, Clarifai’s benchmark performance translates into clear benefits when deploying GPT-OSS-120B in production:

  1. Faster, smoother user experiences
    With a median TTFT of ~0.27 s, applications deliver near-instant first feedback. In multi-step agent workflows, lower TTFT compounds across calls, significantly reducing end-to-end response times (see the back-of-envelope sketch after this list).

  2. Improved cost efficiency
    High throughput (~313 tokens/sec) combined with ~$0.16 per 1M tokens allows teams to serve more requests per GPU hour while keeping budgets predictable.

  3. Operational flexibility
    Teams can choose between latency-optimized or throughput-optimized runtimes and scale seamlessly across infrastructures, avoiding vendor lock-in.

  4. Applicable to diverse use cases

    • Enterprise copilots: faster draft generation and real-time assistance

    • RAG and analytics pipelines: efficient summarization of long documents with lower costs

    • Agentic workflows: repeated tool calls with minimal latency overhead
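
For a rough sense of how these numbers combine, here is a back-of-envelope sketch using the published benchmark figures. The workflow shape (five sequential steps, 200 output tokens per step) is hypothetical, and the cost line counts output tokens only, so it understates the true blended cost.

```python
# Back-of-envelope estimate using the published benchmark numbers
# (0.27 s TTFT, 313 tok/s, $0.16 per 1M blended tokens). The workflow
# shape below is hypothetical.

TTFT_S = 0.27          # time to first token
TOKENS_PER_S = 313     # streaming throughput
PRICE_PER_M = 0.16     # blended $ per 1M tokens

def agent_workflow_latency(steps: int, output_tokens_per_step: int) -> float:
    """Each sequential agent step pays TTFT once, then streams its output."""
    per_step = TTFT_S + output_tokens_per_step / TOKENS_PER_S
    return steps * per_step

# A hypothetical 5-step agent loop emitting 200 tokens per step:
latency = agent_workflow_latency(5, 200)    # ~4.5 s end to end
cost = 5 * 200 / 1_000_000 * PRICE_PER_M    # ~$0.00016, output tokens only
print(f"{latency:.1f} s, ${cost:.6f}")
```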

Try out GPT-OSS-120B

Benchmarks are useful, but the best way to evaluate performance is to try the model yourself. Clarifai makes it simple to experiment and integrate GPT-OSS-120B into real workflows.

1. Test in the Playground

You can directly explore GPT-OSS-120B in Clarifai’s Playground with an interactive UI—perfect for rapid experimentation, prompt design, and side-by-side model comparisons.

Try GPT-OSS-120B in the Playground

2. Access via the API

For production use, GPT-OSS-120B is fully accessible through Clarifai’s OpenAI-compatible API. This means you can integrate the model with the same tooling and workflows you already use for OpenAI models—while benefiting from Clarifai’s orchestration efficiency and cost-performance advantages.

Broad SDK and runtime support

Developers can call GPT-OSS-120B across a wide range of environments, including:

  • Python (Clarifai Python SDK, OpenAI-compatible API, gRPC)

  • Node.js (Clarifai SDK, OpenAI-compatible clients, Vercel AI SDK)

  • JavaScript, PHP, Java, cURL, and more

This flexibility allows you to integrate GPT-OSS-120B directly into your existing pipelines with minimal code changes.

Python example (OpenAI-compatible API)
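
Below is a minimal sketch using the standard OpenAI Python client pointed at Clarifai’s OpenAI-compatible endpoint. The base URL follows the pattern in Clarifai’s documentation, but confirm both it and the exact model identifier on the GPT-OSS-120B model page before relying on them.

```python
# Minimal sketch: standard OpenAI Python client against Clarifai's
# OpenAI-compatible endpoint. Verify the base URL and the exact model
# identifier in the Clarifai docs / model page.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",  # Clarifai's OpenAI-compatible endpoint
    api_key=os.environ["CLARIFAI_PAT"],                    # your Clarifai Personal Access Token
)

response = client.chat.completions.create(
    model="https://clarifai.com/openai/chat-completion/models/gpt-oss-120b",  # confirm on the model page
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Mixture-of-Experts in two sentences."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```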

See the Clarifai Inference documentation for details on authentication, supported SDKs, and advanced features like streaming, batching, and deployment flexibility.
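
Since low TTFT pays off most when streaming, here is a sketch of a streaming call, reusing the client and (assumed) model identifier from the example above.

```python
# Streaming sketch (reusing the client above): tokens arrive as they are
# generated, so users see output after roughly one TTFT rather than
# waiting for the full completion.
stream = client.chat.completions.create(
    model="https://clarifai.com/openai/chat-completion/models/gpt-oss-120b",
    messages=[{"role": "user", "content": "Draft a short product update."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```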

Conclusion

Artificial Analysis’s independent evaluation of GPT-OSS-120B highlights Clarifai as one of the leading platforms for speed and cost efficiency. By combining fast token streaming (313 tok/s), low latency (0.27 s TTFT), and a competitive blended price ($0.16/M tokens), Clarifai delivers the kind of performance that matters most for production-scale inference.

For ML and engineering teams, this means more responsive user experiences, efficient infrastructure utilization, and confidence in scaling GPT-OSS-120B without unpredictable costs. Read the full Artificial Analysis benchmarks.

If you’d like to discuss these results or have questions about running GPT-OSS-120B in production, join us in our Discord Channel. Our team and community are there to help with deployment strategies, GPU choices, and optimizing your AI infrastructure.