Most AI development begins locally. You experiment with model architectures, fine-tune them on small datasets, and iterate until the results look promising. But when it’s time to test the model in a real-world pipeline, things quickly become complicated.
You usually have two choices: upload the model to the cloud even for simple testing, or set up your own API, managing routing, authentication, and security just to run it locally.
Neither approach works well if you’re:
Working on smaller or resource-limited projects
Needing access to local files or private data
Building for edge or on-prem environments where cloud access isn’t practical
Local Runners let you serve AI models, MCP servers, or agents directly from your laptop, workstation, or internal server, securely and seamlessly via a Public API. You don’t need to upload your model or manage any infrastructure. Simply run it locally, and Clarifai takes care of the API handling, routing, and integration.
Once running, the Local Runner establishes a secure connection to Clarifai's control plane. Any API request sent to your model is routed to your machine, processed locally, and returned to the client. From the outside, it behaves like a Clarifai-hosted model, while all computation occurs on your local hardware.
With Local Runners, you can develop, test, and serve real models entirely on your own hardware while still exposing them through a secure, public Clarifai API.
Now that you understand the benefits and capabilities of Local Runners, let’s see how you can run Hugging Face models locally and expose them securely.
The Hugging Face Toolkit in Clarifai CLI enables you to download, configure, and run Hugging Face models locally while exposing them securely through a public API. You can test, integrate, and iterate on models directly from your local environment without managing any external infrastructure.
First, install the Clarifai Package. This also provides the Clarifai CLI:
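```bash
# Installs the Clarifai Python SDK along with the clarifai CLI
pip install clarifai
```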
Next, log in to Clarifai to link your local environment to your account. This allows you to manage and expose your models.
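```bash
clarifai login
```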
Follow the prompts to enter your User ID and Personal Access Token (PAT). If you need help obtaining these, refer to the documentation.
If you plan to access private Hugging Face models or repositories, generate a token from your Hugging Face account settings and set it as an environment variable:
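For example, on macOS or Linux, assuming the standard `HF_TOKEN` variable that the Hugging Face libraries read (substitute your actual token):

```bash
export HF_TOKEN="YOUR_HUGGING_FACE_TOKEN"
```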
Finally, install the Hugging Face Hub library to enable model downloads and integration:
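```bash
pip install huggingface_hub
```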
With these steps complete, your environment is ready to initialize and run Hugging Face models locally with Clarifai.
Use the Clarifai CLI to initialize and configure any supported Hugging Face model locally with the Toolkit:
By default, this command downloads and sets up the unsloth/Llama-3.2-1B-Instruct model in your current directory. If you want to use a different model, specify it with the --model-name flag and pass the full model name from Hugging Face. For example:
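The commands look roughly like this (the `--toolkit` flag shown here is an assumption; run `clarifai model init --help` to confirm the exact options in your CLI version):

```bash
# Scaffold the default model (unsloth/Llama-3.2-1B-Instruct) in the current directory
clarifai model init --toolkit huggingface

# Or scaffold a specific Hugging Face model instead
clarifai model init --toolkit huggingface --model-name "Qwen/Qwen2.5-1.5B-Instruct"
```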
Note: Some models can be very large and require significant memory or GPU resources. Make sure your machine has enough compute capacity to load and run the model locally before initializing it.
Now, once you run the above command, the CLI will scaffold the project for you. The generated directory structure will look like this:
model.py – Contains the logic for loading the model and running predictions.
config.yaml – Holds model metadata, compute resources, and checkpoint configuration.
requirements.txt – Lists the Python dependencies required for your model.
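Roughly, the generated layout looks like this (the top-level folder name depends on the model you initialized, and in some CLI versions model.py sits inside a numbered version subfolder such as 1/):

```
your-model/
├── model.py          # model loading and inference logic
├── config.yaml       # metadata, checkpoints, and compute settings
└── requirements.txt  # Python dependencies
```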
model.py
Once your project scaffold is ready, the next step is to configure your model’s behavior in model.py. By default, this file includes a class called MyModel that extends ModelClass from Clarifai. Inside this class, you’ll find four main methods ready for use:
load_model() – Loads checkpoints from Hugging Face, initializes the tokenizer, and sets up streaming for real-time output.
predict() – Handles single-prompt inference and returns responses. You can adjust parameters such as max_tokens, temperature, and top_p.
generate() – Streams outputs token by token, useful for live previews.
chat() – Manages multi-turn conversations and returns structured responses.
You can use these methods as-is, or customize them to fit your specific model behavior. The scaffold ensures that all core functionality is already implemented, so you can get started with minimal setup.
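As a rough sketch of that shape, assuming a Transformers-based text model: this is not the exact generated code, and the ModelClass import path and the @ModelClass.method decorator are assumptions based on the scaffold, so keep whatever the CLI produced and only edit the method bodies.

```python
# Simplified sketch of a scaffolded model.py -- illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Import path assumed from the generated scaffold; keep the one the CLI wrote.
from clarifai.runners.models.model_class import ModelClass


class MyModel(ModelClass):
    def load_model(self):
        """Download checkpoints and initialize the tokenizer/model once at startup."""
        repo_id = "unsloth/Llama-3.2-1B-Instruct"
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)
        self.model = AutoModelForCausalLM.from_pretrained(
            repo_id, torch_dtype=torch.float16, device_map="auto"
        )

    @ModelClass.method  # decorator name assumed from the scaffold
    def predict(self, prompt: str, max_tokens: int = 256,
                temperature: float = 0.7, top_p: float = 0.9) -> str:
        """Single-prompt inference: encode, generate, decode."""
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output = self.model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            do_sample=True,
        )
        # Strip the prompt tokens and return only the newly generated text.
        return self.tokenizer.decode(
            output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
```

The scaffolded generate() and chat() methods follow the same pattern, with token-by-token streaming and chat-template handling layered on top.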
config.yaml
The config.yaml file defines model metadata and compute requirements. For Local Runners, most defaults work, but it’s important to understand each section:
checkpoints – Specifies the Hugging Face repository and token for private models.
inference_compute_info – Defines compute requirements. For Local Runners, you can typically use defaults. When deploying on dedicated infrastructure, you can customize accelerators, memory, and CPU based on the model requirements.
model – Contains metadata such as app_id, model_id, model_type_id, and user_id. Replace YOUR_USER_ID with your own Clarifai user ID.
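As an illustration only, the file has roughly this shape; the field names and values below are approximate, and the config.yaml the CLI generated is the source of truth:

```yaml
model:
  id: "llama-3_2-1b-instruct"
  user_id: "YOUR_USER_ID"
  app_id: "local-runner-app"
  model_type_id: "text-to-text"

checkpoints:
  type: "huggingface"
  repo_id: "unsloth/Llama-3.2-1B-Instruct"
  hf_token: "YOUR_HF_TOKEN"   # only needed for private or gated repos

inference_compute_info:
  cpu_limit: "1"              # defaults are usually fine for Local Runners
  cpu_memory: "8Gi"
  num_accelerators: 0
```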
Finally, the requirements.txt file lists all Python dependencies required for your model. You can add any additional packages your model needs to run.
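For a Transformers-based text model like this one, it typically contains something along these lines (the scaffold pins exact versions):

```
torch
transformers
accelerate
```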
Once your model is configured, you can launch it locally using the Clarifai CLI:
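In recent CLI releases the command is along the lines of the following; if it differs in your version, check `clarifai model --help`:

```bash
# Starts a Local Runner serving the model in the current directory
clarifai model local-runner
```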
This command starts a Local Runner instance on your machine. The CLI automatically handles all necessary setup, so you don’t need to manually configure infrastructure.
After the Local Runner starts, you’ll receive a public Clarifai URL. This URL acts as a secure gateway to your locally running model. Any requests made to this endpoint are routed to your local environment, processed by your model, and returned through the same endpoint.
Once your Hugging Face model is running locally and exposed via the Clarifai Local Runner, you can send inference requests to it from anywhere — using either the OpenAI-compatible endpoint or the Clarifai SDK.
Use the OpenAI client to send a request to your locally running Hugging Face model:
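A minimal sketch, assuming Clarifai's OpenAI-compatible endpoint at https://api.clarifai.com/v2/ext/openai/v1, your Personal Access Token as the API key, and the public model URL printed when your runner starts:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",  # Clarifai's OpenAI-compatible endpoint
    api_key="YOUR_CLARIFAI_PAT",
)

response = client.chat.completions.create(
    # Use the public model URL shown when the Local Runner starts.
    model="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    messages=[{"role": "user", "content": "What is the future of AI?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```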
You can also interact directly through the Clarifai SDK, which provides a lightweight interface for inference:
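A minimal sketch with the Clarifai Python SDK; the predict call maps to the predict() method defined in model.py, and its exact keyword arguments here are an assumption:

```python
from clarifai.client import Model

model = Model(
    url="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    pat="YOUR_CLARIFAI_PAT",
)

# Calls the predict() method defined in model.py.
result = model.predict(prompt="Explain Local Runners in one sentence.")
print(result)
```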
You can also experiment with generate() for real-time streaming output and chat() for multi-turn interactions.
With this setup, your Hugging Face model runs entirely on your local hardware — yet remains accessible via Clarifai’s secure public API.
Local Runners give you full control over where your models run — without sacrificing integration, security, or flexibility.
You can prototype, test, and serve real workloads on your own hardware while still using Clarifai’s platform to route traffic, handle authentication, and scale when needed.
You can try Local Runners for free with the Free Tier, or upgrade to the Developer Plan at $1/month for the first year to connect up to 5 Local Runners with unlimited hours. Read more in the documentation to get started.