LM Studio makes it incredibly easy to run and experiment with open-source large language models (LLMs) entirely on your local machine, with no internet connection or cloud dependency required. You can download a model, start chatting, and explore responses while maintaining full control over your data.
But what if you want to go beyond the local interface?
Let’s say your LM Studio model is up and running locally, and now you want to call it from another app, integrate it into production, share it securely with your team, or connect it to tools built around the OpenAI API.
That’s where things get tricky. LM Studio runs models locally, but it doesn’t natively expose them through a secure, authenticated API. Setting that up manually would mean handling tunneling, routing, and API management on your own.
That’s where Clarifai Local Runners come in. Local Runners let you serve AI models, MCP servers, or agents directly from your laptop, workstation, or internal server, securely and seamlessly via a public API. You do not need to upload your model or manage any infrastructure. Run it locally, and Clarifai handles the API, routing, and integration.
Once running, the Local Runner establishes a secure connection to Clarifai’s control plane. Any API request sent to your model is routed to your machine, processed locally, and returned to the client. From the outside, it behaves like a Clarifai-hosted model, while all computation happens on your local hardware.
With Local Runners, you can:
Run models on your own hardware: Use laptops, workstations, or on-prem servers with full access to local GPUs and system tools.
Keep data and compute private: Avoid uploading anything, which is useful for regulated environments and sensitive projects.
Skip infrastructure setup: No need to build and host your own API; Clarifai provides the endpoint, routing, and authentication.
Prototype and iterate quickly: Test models in real pipelines without deployment delays, and inspect requests and outputs live.
Connect to local files and private APIs: Let models access your file system, internal databases, or OS resources without exposing your environment.
Now that the benefits are clear, let’s see how to run LM Studio models locally and expose them securely via an API.
The LM Studio Toolkit in the Clarifai CLI enables you to initialize, configure, and run LM Studio models locally while exposing them through a secure public API. You can test, integrate, and iterate directly from your machine without standing up infrastructure.
Note: Download LM Studio and keep it open while the Local Runner is running. The runner launches and communicates with LM Studio through its local port to load, serve, and run model inference.
Install the Clarifai package and CLI
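Assuming a standard Python 3 environment, the package (which includes the CLI) installs with pip:

```bash
pip install --upgrade clarifai
```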
Log in to Clarifai
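Authenticate the CLI with your Clarifai account:

```bash
clarifai login
```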
Follow the prompts to enter your User ID and Personal Access Token (PAT). If you need help obtaining these, refer to the documentation.
Use the Clarifai CLI to initialize and configure an LM Studio model locally. Only models available in the LM Studio Model Catalog and in GGUF format are supported.
Initialize the default example model
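With the toolkit support described above, initialization is a single command (exact flags may vary between CLI versions):

```bash
clarifai model init --toolkit lmstudio
```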
By default, this creates a project for the LiquidAI/LFM2-1.2B LM Studio model in your current directory.
If you want to work with a specific model rather than the default LiquidAI/LFM2-1.2B, use the --model-name flag to specify the full model name from the LM Studio Model Catalog. For example:
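```bash
# "qwen/qwen3-4b" is a placeholder; substitute the exact identifier of a
# GGUF-format model from the LM Studio Model Catalog
clarifai model init --toolkit lmstudio --model-name "qwen/qwen3-4b"
```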
Note: Some models are large and require significant memory. Ensure your machine meets the model’s requirements before initializing.
Once you run the command, the CLI scaffolds the project for you. The generated directory structure will look something like this (typical of Clarifai model scaffolds; the exact layout may vary by CLI version):
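```
├── 1/
│   └── model.py
├── config.yaml
└── requirements.txt
```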
The scaffold includes an LMstudioModelClass that extends OpenAIModelClass. It defines how your Local Runner interacts with LM Studio’s local runtime.
Key methods:
load_model() – Launches LM Studio’s local runtime, loads the selected model, and connects to the server port using the OpenAI-compatible API interface.
predict() – Handles single-prompt inference with optional parameters such as max_tokens, temperature, and top_p. Returns the complete model response.
generate() – Streams generated tokens in real time for interactive or incremental outputs.
You can use these implementations as-is or modify them to align with your preferred request and response structures.
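As a rough sketch of the shape of that class (the import path and signatures here are assumptions; the generated 1/model.py is authoritative):

```python
# Sketch of 1/model.py, not the verbatim scaffold; the import path is assumed.
from clarifai.runners.models.openai_class import OpenAIModelClass


class LMstudioModelClass(OpenAIModelClass):
    """Bridges Clarifai's runner interface to LM Studio's local server."""

    def load_model(self):
        # Launch LM Studio's runtime, load the model named in config.yaml,
        # and attach an OpenAI-compatible client to the configured local port.
        ...

    def predict(self, prompt, max_tokens=512, temperature=0.7, top_p=0.9):
        # Single-prompt inference: forward the prompt to LM Studio and
        # return the complete response.
        ...

    def generate(self, prompt, **kwargs):
        # Streaming inference: yield tokens as LM Studio produces them.
        ...
```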
The config.yaml file defines model identity, runtime, and compute metadata for your LM Studio Local Runner:
model – Includes id, user_id, app_id, and model_type_id (for example, text-to-text).
toolkit – Specifies lmstudio as the provider. Key fields include:
model – The LM Studio model to use (e.g., LiquidAI/LFM2-1.2B).
port – The local port the LM Studio server listens on.
context_length – Maximum context length for the model.
inference_compute_info – For Local Runners, this is mostly optional, because the model runs entirely on your local machine and uses your local CPU/GPU resources. You can leave defaults as-is. If you plan to deploy the model on Clarifai’s dedicated compute, you can specify CPU/memory limits, number of accelerators, and GPU type to match your model requirements.
build_info – Specifies the Python version used for the runtime (e.g., 3.12).
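Putting those fields together, a config.yaml might look roughly like this (values are illustrative; the file generated by the CLI is authoritative):

```yaml
model:
  id: lmstudio-model
  user_id: your-user-id
  app_id: your-app-id
  model_type_id: text-to-text

toolkit:
  provider: lmstudio
  model: LiquidAI/LFM2-1.2B
  port: 1234              # LM Studio's default local server port
  context_length: 4096

inference_compute_info:   # mostly optional for Local Runners; used for hosted deploys
  cpu_limit: "1"
  cpu_memory: 8Gi
  num_accelerators: 0

build_info:
  python_version: "3.12"
```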
Finally, the requirements.txt file lists Python dependencies your model needs. Add any extra packages required by your logic.
Start a Local Runner that connects to LM Studio’s runtime:
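From inside the project directory:

```bash
clarifai model local-runner
```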
If contexts or defaults are missing, the CLI will prompt you to create them. This ensures compute contexts, nodepools, and deployments are set in your configuration.
After startup, you will receive a public Clarifai URL for your local model. Requests sent to this endpoint route securely to your machine, run through LM Studio, then return to the client.
Once your LM Studio model is running locally and exposed via the Clarifai Local Runner, you can send inference requests from anywhere using the OpenAI-compatible API or the Clarifai SDK.
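For example, here is a minimal sketch using the standard OpenAI Python client pointed at Clarifai's OpenAI-compatible endpoint (the base URL and model URL follow Clarifai's conventions; substitute your own IDs):

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at Clarifai's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"],  # your Clarifai Personal Access Token
)

response = client.chat.completions.create(
    # Models are addressed by their Clarifai URL; replace with your own IDs.
    model="https://clarifai.com/your-user-id/your-app-id/models/lmstudio-model",
    messages=[{"role": "user", "content": "Explain what a Local Runner does in one sentence."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```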
You can also experiment with the generate() method for real-time streaming.
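Reusing the client from the previous sketch, a streaming request looks like this:

```python
stream = client.chat.completions.create(
    model="https://clarifai.com/your-user-id/your-app-id/models/lmstudio-model",
    messages=[{"role": "user", "content": "Write a haiku about local inference."}],
    stream=True,  # tokens arrive incrementally instead of in one response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```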
Local Runners give you full control over where your models execute without sacrificing integration, security, or flexibility. You can prototype, test, and serve real workloads on your own hardware, while Clarifai handles routing, authentication, and the public endpoint.
You can try Local Runners for free with the Free Tier, or upgrade to the Developer Plan at $1 per month for the first year to connect up to 5 Local Runners with unlimited hours.