  • US-based AI research lab

    Arcee maintains 170–195 TPS while scaling through 6× usage growth with Clarifai

    Clarifai is the sole hosting partner for Trinity Mini; its Compute Orchestration enables Arcee to deploy updates with zero downtime while handling billions of tokens daily.

  • Background

    Arcee AI is a US-based AI research lab focused on building state-of-the-art open-weight foundation models and developer tooling. Arcee’s models give customers a path from using AI to owning their AI, offering an alternative to closed-model APIs and Chinese open-weight models; they can be deployed on secure infrastructure and customized on customer data.

    Arcee works with ISVs, the public sector, and regulated industries. Their models are commonly deployed for coding and agentic workflows.

  • Information

    Use Case: Dedicated inference for SLMs

    Industry: AI-Native

    Client: Arcee AI

Details

Challenge

Before deploying on Clarifai, Arcee faced a critical infrastructure challenge: operating and scaling GPU-backed inference reliably while maintaining the rapid release velocity that differentiated them in the market.

As a research lab releasing foundation models faster than any other AI lab, Arcee needed infrastructure that could match their pace. This meant getting new models from development to production quickly without sacrificing performance. As adoption of Trinity Mini grew across OpenRouter and Arcee's native API, the model needed to handle rapidly increasing production traffic without performance degradation.

Supporting this growth required GPU infrastructure that could scale dynamically to handle usage spikes while enabling zero-downtime deployments, all without introducing operational complexity that would slow down their release cycles.


Solution

Arcee deployed Trinity Mini, its 26B-parameter sparse mixture-of-experts language model with 3B active parameters, on Clarifai's Compute Orchestration to support production inference at scale. As the sole hosting partner for Trinity Mini, Clarifai enables Arcee to serve the model across OpenRouter and its native API platform.
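
For context, here is a minimal sketch of how an application might query Trinity Mini through OpenRouter's OpenAI-compatible API. The model slug and prompt are illustrative assumptions, not confirmed identifiers; check OpenRouter's model catalog for the actual slug.

```python
# Minimal sketch: calling Trinity Mini via OpenRouter's OpenAI-compatible endpoint.
# The model slug below is an assumption for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # OpenRouter's OpenAI-compatible endpoint
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="arcee-ai/trinity-mini",  # assumed slug; verify before use
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
)
print(response.choices[0].message.content)
```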

Clarifai's Compute Orchestration provides fully managed GPU infrastructure with autoscaling that dynamically provisions resources based on real-time traffic patterns. The platform enables Arcee to evaluate and deploy different GPU instance types to optimize for performance and cost. This eliminated the operational overhead of manual infrastructure management.
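
The scaling logic itself is Clarifai's. As a purely generic illustration of the kind of traffic-driven decision involved (not Clarifai's implementation or API), a toy replica calculation might look like this:

```python
# Toy illustration of traffic-driven autoscaling (not Clarifai's implementation).
# Chooses a GPU replica count from observed request rate and per-replica capacity.
import math

def desired_replicas(requests_per_sec: float,
                     capacity_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 32) -> int:
    """Scale replicas to cover current traffic, within configured bounds."""
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))

# Example: 45 req/s against replicas that each sustain ~6 req/s -> 8 replicas.
print(desired_replicas(45, 6))
```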

Results

Since deploying Trinity Mini on Clarifai, Arcee has scaled production inference across OpenRouter and its native API platform without performance degradation.

From January 6 to February 5, usage on OpenRouter increased 6× month over month. During this surge, Clarifai's Compute Orchestration maintained stable throughput of 170 to 195 tokens per second, demonstrating the infrastructure's ability to handle rapid growth while preserving performance.
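
Throughput figures like this can be sanity-checked from the client side. Below is a rough sketch of measuring streamed throughput, reusing the illustrative OpenRouter client from above; it is not Arcee's or Clarifai's benchmarking setup, and chunk counts only approximate token counts.

```python
# Sketch: estimating tokens-per-second (TPS) from a streamed completion.
# Counts streamed chunks as a rough proxy for generated tokens.
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

start = time.monotonic()
n_chunks = 0
stream = client.chat.completions.create(
    model="arcee-ai/trinity-mini",  # assumed slug; verify before use
    messages=[{"role": "user", "content": "Summarize the benefits of autoscaling GPU inference."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        n_chunks += 1
elapsed = time.monotonic() - start
print(f"~{n_chunks / elapsed:.1f} tokens/s (chunk-count approximation)")
```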

6×

Handled 6× MoM surge in usage

170–195 TPS

Sustained high-throughput inference

Zero

Downtime during model launches and updates

  • Over the last 6 months Arcee has released models with more velocity than any other AI lab, which we are very proud of. Clarifai moves as fast as we do, getting new models optimized and deployed to our users within hours, and scaling to billions of tokens per day.

    Davis Stone, Head of Growth
