Fading Coder

Build a Local AI Assistant with GPUStack and OpenClaw to Eliminate Token Costs

Tech · May 10

Before integrating OpenClaw, you need to deploy a model on GPUStack and retrieve the service endpoint. This walkthrough uses Qwen3.5-35B-A3B as an example, covering custom inference backend setup, model deployment, and obtaining connection details.

Requirements

  • GPUStack version v2.0.3
  • Custom inference backend image: swr.cn-south-1.myhuaweicloud.com/gpustack/vllm-openai:qwen3_5
  • Model weights: Qwen/Qwen3.5-35B-A3B

Important: OpenClaw requires a minimum context window of 16K tokens; 128K+ is recommended.


1. Configure Custom Inference Backend

In the GPUStack console, navigate to Inference Backends → Edit vLLM → Add Version. Enter the image and any environment variables.

2. Deploy the Model

Deploy using these parameters:

--tensor-parallel-size=2
--mm-encoder-tp-mode data
--mm-processor-cache-type shm
--reasoning-parser qwen3
--enable-auto-tool-choice
--tool-call-parser qwen3_coder
--speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
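
Taken together, the flags above correspond to a vllm serve invocation roughly like the following. This is a sketch only: GPUStack assembles the actual command from the backend configuration, and the model path here assumes the weights listed in the requirements.

```shell
# Illustrative only — GPUStack builds the real command from the backend config.
vllm serve Qwen/Qwen3.5-35B-A3B \
  --tensor-parallel-size 2 \
  --mm-encoder-tp-mode data \
  --mm-processor-cache-type shm \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
```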

If you encounter error 803 (system has unsupported display driver / CUDA driver combination), add the following environment variable:

LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/lib/x86_64-linux-gnu

3. Gather Connection Details

Record these three values from GPUStack:

  • API Base URL
  • Model ID
  • API Key (create in GPUStack under API keys)
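
Before wiring up OpenClaw, you can sanity-check the three values with a direct request to the OpenAI-compatible endpoint. The placeholders below stand in for the values you just recorded:

```shell
# Replace the placeholders with your API Base URL, API Key, and Model ID.
curl <api-base-url>/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <api-key>" \
  -d '{"model": "<model-id>", "messages": [{"role": "user", "content": "ping"}]}'
```

A JSON response with a chat completion confirms the endpoint, key, and model ID are all correct.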

Setting Up the Feishu Bot Application

Account Requirements

A personal Feishu account cannot create bot applications. Use an enterprise/organization account. You can create one for free:

  1. On the desktop client, click the three dots (⋯) → Log in with another account
  2. Choose Create new account
  3. Select Enterprise or Organization Administrator

Follow the prompts to set a name and organization name.

Create the Application

  1. Open Feishu Open Platform
  2. Log in with the enterprise account
  3. Click Create Enterprise Self-built App
  4. Provide a name and description (icon optional)

Enable Bot Capability

In the left menu, go to Add Application Capability and add Bot.

Import Permissions

Navigate to Permissions Management → Batch Import and replace the default permissions JSON with the following:

{
  "scopes": {
    "tenant": [
      "aily:file:read",
      "aily:file:write",
      "application:application.app_message_stats.overview:readonly",
      "application:application:self_manage",
      "application:bot.menu:write",
      "contact:contact.base:readonly",
      "contact:user.employee_id:readonly",
      "corehr:file:download",
      "event:ip_list",
      "im:chat.access_event.bot_p2p_chat:read",
      "im:chat.members:bot_access",
      "im:message",
      "im:message.group_at_msg:readonly",
      "im:message.p2p_msg:readonly",
      "im:message:readonly",
      "im:message:send_as_bot",
      "im:resource"
    ],
    "user": [
      "aily:file:read",
      "aily:file:write",
      "im:chat.access_event.bot_p2p_chat:read"
    ]
  }
}

Watch out for non-breaking spaces when copying from some sources.
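
One way to guard against that is to strip any non-breaking spaces (UTF-8 bytes C2 A0) and confirm the file still parses as JSON before pasting it into Batch Import. A minimal sketch, assuming you saved the JSON as perms.json (the demo input here is hypothetical):

```shell
# Demo input: a permissions fragment containing a non-breaking space (C2 A0).
printf '{"scopes": {"tenant":\xc2\xa0["im:message"]}}\n' > perms.json

# Replace each non-breaking space with a regular space; bash ANSI-C quoting
# ($'...') passes the raw bytes to sed, so no \xHH regex support is needed.
sed -i $'s/\xc2\xa0/ /g' perms.json

# Verify the cleaned file parses as JSON (a non-zero exit means it does not).
python3 -m json.tool perms.json > /dev/null && echo "perms.json is valid JSON"
```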

After importing, create and publish a new version for the permissions to take effect. Also record the App ID and App Secret; OpenClaw will ask for them during configuration.


Installing and Configuring OpenClaw

This demo uses Ubuntu 24.04.

Quick Installation

curl -fsSL https://openclaw.ai/install.sh | bash

The script automatically installs Node.js and Git dependencies.

For manual installation (recommended if you prefer fnm + pnpm), finish with:

openclaw onboard --install-daemon

Interactive Configuration

  1. Model/Auth Provider: Choose Custom Provider (Any OpenAI or Anthropic compatible endpoint)
  2. Enter the GPUStack API Base URL and API Key
  3. Channel: Select Feishu / Lark
  4. Provide the App ID and App Secret from your Feishu app
  5. Group chat policy: Choose Open - respond in all groups (requires mention)

Adjust Context Window (Mandatory)

By default, OpenClaw uses a context length of 4096 tokens, which falls short of the 16K minimum it requires. Edit the configuration:

vim ~/.openclaw/openclaw.json

Set the context_length value to at least 16384 (131072 matches the recommended 128K context). Then restart the gateway:

openclaw gateway restart
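
For reference, the edited portion of ~/.openclaw/openclaw.json might look like this. This is a sketch: surrounding keys are omitted, and the exact layout may vary between OpenClaw versions.

```json
{
  "context_length": 131072
}
```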

Set Event Subscription Method

In your Feishu app settings, change the Event Subscription method to Long Connection and add the Receive Messages event. Remember to create and publish a new version after these changes.


First Authorization and Connectivity Test

  1. Send a message to the bot in Feishu.
  2. The bot will reply with a Pairing request and a code.
  3. On your server, run:
openclaw pairing approve feishu <Pairing-Code>

If you see repeated pairing prompts accompanied by a duplicate plugin id detected error, fix it with:

rm -rf ~/.openclaw/extensions/feishu
openclaw gateway restart

Example: Making the Bot Star a GitHub Repository

Create a GitHub Personal Access Token (PAT)

  • Use Tokens (classic)
  • Grant the repo scope

Set the Token

Add to ~/.openclaw/.env:

GITHUB_TOKEN=your_token_here

Restart the gateway:

openclaw gateway restart

Then in Feishu, ask the bot to star a repo, e.g., gpustack/gpustack.
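
Under the hood, starring a repository is a single call to GitHub's REST API; the bot's tool call is equivalent to something like the following, using the PAT from ~/.openclaw/.env:

```shell
# Star gpustack/gpustack via GitHub's PUT /user/starred/{owner}/{repo} endpoint.
curl -X PUT \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/user/starred/gpustack/gpustack
```

A 204 No Content response indicates the star was applied.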


Built-in Commands

  • /new – Start a new conversation
  • /status – Check bot status
  • /reset – Reset conversation context
  • /model – View or switch the model

Useful OpenClaw CLI Commands

openclaw logs --follow
openclaw doctor
openclaw gateway --help
openclaw dashboard
openclaw tui


Conclusion: AI as Infrastructure, Not a Consumable

Token anxiety stems from treating AI as an external metered resource. When the model runs on your own GPU, inference capacity, context, and tool calling become part of your infrastructure rather than a per-call cost. The combination of GPUStack and OpenClaw turns AI from a cost center into a persistent, always-available productivity tool. If you have GPU resources available, try this setup and integrate AI directly into your daily workflows.
