
This blog post focuses on new features and improvements. For a comprehensive list, including bug fixes, please see the release notes.
Clarifai’s Compute Orchestration lets you deploy models on your own compute, control how they scale, and decide where inference runs across clusters and nodepools.
As AI systems move beyond single inference calls toward long-running tasks, multi-step workflows, and agent-driven execution, orchestration needs to do more than just start containers. It needs to manage execution over time, handle failure, and route traffic intelligently across compute.
This release builds on that foundation with native support for long-running pipelines, model routing across nodepools and environments, and agentic model execution using Model Context Protocol (MCP).
AI systems don’t break at inference. They break when workflows span multiple steps, run for hours, or need to recover from failure.
Today, teams rely on stitched-together scripts, cron jobs, and queue workers to manage these workflows. As agent workloads and MLOps pipelines grow more complex, this setup becomes hard to operate, debug, and scale.
With Clarifai 12.0, we’re introducing Pipelines, a native way to define, run, and manage long-running, multi-step AI workflows directly on the Clarifai platform.
Most AI platforms are optimized for short-lived inference calls. But real production workflows look very different:
Multi-step agent logic that spans tools, models, and external APIs
Long-running jobs like batch processing, fine-tuning, or evaluations
End-to-end MLOps workflows that require reproducibility, versioning, and control
Pipelines are built to handle this class of problems.
Clarifai Pipelines act as the orchestration backbone for advanced AI systems. They let you define container-based steps, control execution order or parallelism, manage state and secrets, and monitor runs from start to finish, all without bolting together separate orchestration infrastructure.
Each pipeline is versioned, reproducible, and executed on Clarifai-managed compute, giving you fine-grained control over how complex AI workflows run at scale.
Let's walk through how Pipelines work, what you can build with them, and how to get started using the CLI and API.
At a high level, a Clarifai Pipeline is a versioned, multi-step workflow made up of containerized steps that run asynchronously on Clarifai compute.
Each step is an isolated unit of execution with its own code, dependencies, and resource settings. Pipelines define how these steps connect, whether they run sequentially or in parallel, and how data flows between them.
You define a pipeline once, upload it, and then trigger runs that can execute for minutes, hours, or longer.
Initialize a pipeline project
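If you have the Clarifai CLI installed (it ships with the Python SDK), scaffolding a project is a single command. The subcommand below follows the same pattern as the existing clarifai model commands; if your CLI version differs, check clarifai --help or the Pipelines docs for the exact syntax.

```bash
# Install or upgrade the Clarifai Python package, which includes the CLI.
pip install --upgrade clarifai

# Scaffold a new pipeline project (command assumed to mirror `clarifai model init`).
clarifai pipeline init my-pipeline
```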
This scaffolds a complete pipeline project using the same structure and conventions as Clarifai custom models.
Each pipeline step follows the exact same footprint developers already use when uploading models to Clarifai: a configuration file, a dependency file, and an executable Python entrypoint.
A typical scaffolded pipeline looks like this:
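The layout below is representative rather than exact: step and pipeline names are placeholders, and the scaffold generated by your CLI version may nest files slightly differently (custom model projects, for example, place the entrypoint inside a versioned subdirectory).

```
my-pipeline/
├── config.yaml               # pipeline-level: step order, parameters, dependencies
├── step_a/
│   ├── config.yaml           # step inputs, runtime, and compute requirements
│   ├── requirements.txt      # Python dependencies for this step
│   └── pipeline_step.py      # execution logic for the step
└── step_b/
    ├── config.yaml
    ├── requirements.txt
    └── pipeline_step.py
```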
At the pipeline level, config.yaml defines how steps are connected and orchestrated, including execution order, parameters, and dependencies between steps.
Each step is a self-contained unit that looks and behaves just like a custom model:
config.yaml defines the step’s inputs, runtime, and compute requirements
requirements.txt specifies the Python dependencies for that step
pipeline_step.py contains the actual execution logic, where you write code to process data, call models, or interact with external systems
This means building pipelines feels immediately familiar. If you’ve already uploaded custom models to Clarifai, you’re working with the same configuration style, the same versioning model, and the same deployment mechanics—just composed into multi-step workflows.
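For orientation, here is a minimal sketch of the kind of information a pipeline-level config.yaml carries: which step directories make up the pipeline and how their execution is orchestrated. Field names and nesting here are illustrative assumptions, not the authoritative schema; the scaffold and the Pipelines documentation are the source of truth.

```yaml
# Illustrative sketch only: field names may differ from the real schema.
pipeline:
  id: prep-train-eval
  user_id: your-user-id
  app_id: your-app-id
  step_directories:       # steps are built and versioned from these folders
    - step_a
    - step_b
  orchestration_spec:     # execution order, parallelism, retries
    # Pipelines run on Argo Workflows under the hood, so the orchestration
    # spec follows an Argo-style workflow definition (see the docs).
    ...
```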
Upload the pipeline
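From the root of the scaffolded project, uploading is again one CLI call (the subcommand is assumed to mirror clarifai model upload; see the docs if it differs in your version):

```bash
# Build, containerize, and version every step, then register the pipeline.
# Run from the project root; pass a path or config file if your CLI version requires one.
clarifai pipeline upload
```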
Clarifai builds and versions each step as a containerized artifact, ensuring reproducible runs.
Run the pipeline
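Triggering a run is also a single command; it starts an asynchronous execution of the latest pipeline version. Flags for selecting a specific version or passing run parameters are covered in the docs, and the invocation below is the simplest assumed form:

```bash
# Kick off an asynchronous run of the uploaded pipeline and follow its status.
clarifai pipeline run
```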
Once running, you can monitor progress, inspect logs, and manage executions directly through the platform.
Under the hood, pipeline execution is powered by Argo Workflows, allowing Clarifai to reliably orchestrate long-running, multi-step jobs with proper dependency management, retries, and fault handling.
Pipelines are designed to support everything from automated MLOps workflows to advanced AI agent orchestration, without requiring you to operate your own workflow engine.
Note: Pipelines are currently available in Public Preview.
You can start trying them today and we welcome your feedback as we continue to iterate. For a step-by-step guide on defining steps, uploading pipelines, managing runs, and building more advanced workflows, check out the detailed documentation here.
With this release, Compute Orchestration now supports model routing across multiple nodepools within a single deployment.
Model routing allows a deployment to reference multiple pre-existing nodepools through a deployment_config.yaml. These nodepools can belong to different clusters and can span cloud, on-prem, or hybrid environments.
Here’s how model routing works:
Nodepools are treated as an ordered priority list. Requests are routed to the first nodepool by default.
A nodepool is considered fully loaded when queued requests exceed the configured age or quantity thresholds and the deployment has reached its max_replicas, or when the nodepool has reached its maximum instance capacity.
When this happens, the next nodepool in the list is automatically warmed and a portion of traffic is routed to it.
The deployment’s min_replicas applies only to the primary nodepool.
The deployment’s max_replicas applies independently to each nodepool, not as a global sum.
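To make the priority-list behavior concrete, here is a hedged sketch of how the nodepool section of a deployment_config.yaml can be laid out. Field names and nesting are illustrative assumptions; the Multi-Nodepool Deployment documentation has the exact schema.

```yaml
# Illustrative only: an ordered list of pre-existing nodepools.
# Traffic goes to the first entry until it is fully loaded, then spills over.
deployment:
  id: llm-prod
  nodepools:
    - id: gpu-nodepool-us-east     # primary: min_replicas applies only here
      compute_cluster_id: aws-cluster
    - id: gpu-nodepool-onprem      # spillover target, warmed on demand
      compute_cluster_id: onprem-cluster
  autoscale_config:
    min_replicas: 1                # primary nodepool only
    max_replicas: 4                # enforced per nodepool, not as a global sum
```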
This approach enables high availability and predictable scaling without duplicating deployments or manually managing failover. Deployments can now span multiple compute pools while behaving as a single, resilient service.
Read more about Multi-Nodepool Deployment here.
Clarifai expands support for agentic AI systems by making it easier to pair agent-aware models with Model Context Protocol (MCP) integrations. Models can discover, call, and reason over both custom and open-source MCP servers during inference, while remaining fully managed on the Clarifai platform.
You can upload models with agentic capabilities by using the AgenticModelClass, which extends the standard model class to support tool discovery and execution. The upload workflow is the same as for existing custom models, using the same project structure, configuration files, and deployment process.
Agentic models are configured to work with MCP servers, which expose tools that the model can call during inference.
Key capabilities include:
Iterative tool calling within a single predict or generate request
Tool discovery and execution handled by the agentic model class
Support for both streaming and non-streaming inference
Compatibility with the OpenAI-compatible API and Clarifai SDKs
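As a usage sketch, once an agentic model is deployed you can call it through Clarifai's OpenAI-compatible endpoint with the standard openai client; tool discovery and iterative tool calls against the configured MCP servers are handled by the agentic model class on the server side. The model URL below is a placeholder for your own deployed agentic model.

```python
from openai import OpenAI

# Clarifai exposes an OpenAI-compatible endpoint; authenticate with a Personal Access Token (PAT).
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key="YOUR_CLARIFAI_PAT",
)

# Placeholder model URL: point this at the agentic model you uploaded.
response = client.chat.completions.create(
    model="https://clarifai.com/your-user-id/your-app/models/your-agentic-model",
    messages=[{"role": "user", "content": "Look up today's weather for Berlin and summarize it."}],
)

print(response.choices[0].message.content)
```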
A complete example of uploading and running an agentic model is available here. This repository shows how to upload a GPT-OSS-20B model with agentic capabilities enabled using the AgenticModelClass.
Clarifai already supports deploying custom MCP servers, allowing teams to build their own tool servers and run them on the platform. This release expands that capability by making it easy to deploy public MCP servers directly on the platform.
Public MCP servers can now be uploaded using a simple configuration, without requiring teams to host or manage the server infrastructure themselves. Once deployed, these servers can be shared across models and workflows, allowing agentic models to access the same tools.
This example demonstrates how to deploy a public, open-source MCP server on Clarifai as an API endpoint.
We’ve introduced a new Pay-As-You-Go (PAYG) plan to make billing simpler and more predictable for self-serve users.
The PAYG plan has no monthly minimums and far fewer feature gates. You prepay credits, use them across the platform, and pay only for what you consume. To improve reliability, the plan also includes auto-recharge, so long-running jobs don’t stop unexpectedly when credits run low.
To help you get started, every verified user receives a one-time $5 welcome credit, which can be used across inference, Compute Orchestration, deployments, and more. You can also claim an additional $5 for your organization.
If you want a deeper breakdown of how prepaid credits work, what’s changing from previous plans, and why we made this shift, get more details in this blog.
Clarifai is now available as an inference provider in the Vercel AI SDK. You can use Clarifai-hosted models directly through the OpenAI-compatible interface in @ai-sdk/openai-compatible, without changing your existing application logic.
This makes it easy to swap in Clarifai-backed models for production inference while continuing to use the same Vercel AI SDK workflows you already rely on. Learn more here.
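A minimal sketch of wiring Clarifai in through @ai-sdk/openai-compatible is shown below. The base URL and the model identifier are assumptions based on Clarifai's OpenAI-compatible endpoint, so check the provider documentation for the exact values to use.

```typescript
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import { generateText } from 'ai';

// Assumed endpoint: Clarifai's OpenAI-compatible API, authenticated with a PAT.
const clarifai = createOpenAICompatible({
  name: 'clarifai',
  baseURL: 'https://api.clarifai.com/v2/ext/openai/v1',
  apiKey: process.env.CLARIFAI_PAT,
});

const { text } = await generateText({
  // Placeholder model identifier: use the Clarifai model you want to call.
  model: clarifai('https://clarifai.com/openai/chat-completion/models/gpt-oss-20b'),
  prompt: 'Summarize what Compute Orchestration does.',
});

console.log(text);
```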
We’ve published two new open-weight reasoning models from the Ministral 3 family on Clarifai:
A compact reasoning model designed for efficiency, offering strong performance while remaining practical to deploy on realistic hardware.
Ministral-3-14B-Reasoning-2512
The largest model in the Ministral 3 family, delivering reasoning performance close to much larger systems while retaining the benefits of an efficient open-weight design.
Both models are available now and can be used across Clarifai’s inference, orchestration, and deployment workflows.
We’ve made a few targeted improvements across the platform to improve usability and day-to-day workflows.
Added cleaner filters in the Control Center, making charts easier to navigate and interpret.
Improved the Team & Logs view to ensure today’s audit logs are included when selecting the last 7 days.
Enabled stopping responses directly from the right panel when using Compare mode in the Playground.
This release includes a broad set of improvements to the Python SDK and CLI, focused on stability, local runners, and developer experience.
Improved reliability of local model runners, including fixes for vLLM compatibility, checkpoint downloads, and runner ID conflicts.
Introduced better artifact management and interactive config.yaml creation during the model upload flow.
Expanded test coverage and improved error handling across runners, model loading, and OpenAI-compatible API calls.
Several additional fixes and enhancements are included, covering dependency upgrades, environment handling, and CLI robustness. Learn more here.
You can start building with Clarifai Pipelines today to run long-running, multi-step workflows directly on the platform. Define steps, upload them with the CLI, and monitor execution across your compute.
For production deployments, model routing lets you scale across multiple nodepools and clusters with built-in spillover and high availability.
If you’re building agentic systems, you can also enable agentic model support with MCP servers to give models access to tools during inference.
Pipelines are available in public preview. We’d love your feedback as you build.