Fading Coder

Build a Local AI Assistant with GPUStack and OpenClaw to Eliminate Token Costs

Tech · May 10

Before integrating OpenClaw, you need to deploy a model on GPUStack and retrieve the service endpoint. This walkthrough uses Qwen3.5-35B-A3B as an example, covering custom inference backend setup, model deployment, and obtaining connection details.

Requirements

  • GPUStack version v2.0.3
  • Custom inference backend image: swr.cn-south-1.myhuaweicloud.com/gpustack/vllm-openai:qwen3_5
  • Model weights: Qwen/Qwen3.5-35B-A3B

Important: OpenClaw requires a minimum context window of 16K tokens; 128K+ is recommended.


1. Configure Custom Inference Backend

In the GPUStack console, navigate to Inference Backends → Edit vLLM → Add Version. Enter the image and any environment variables.

2. Deploy the Model

Deploy using these parameters:

--tensor-parallel-size=2
--mm-encoder-tp-mode data
--mm-processor-cache-type shm
--reasoning-parser qwen3
--enable-auto-tool-choice
--tool-call-parser qwen3_coder
--speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
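
Taken together, the flags above correspond to a vllm serve invocation roughly like the following. This is a sketch only: GPUStack assembles the actual command from the backend configuration, and the model path here assumes the weights listed in the requirements.

```shell
# Illustrative only — GPUStack builds the real command from the backend config.
vllm serve Qwen/Qwen3.5-35B-A3B \
  --tensor-parallel-size 2 \
  --mm-encoder-tp-mode data \
  --mm-processor-cache-type shm \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
```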

If you encounter error 803 (system has unsupported display driver / CUDA driver combination), add the following environment variable:

LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/lib/x86_64-linux-gnu

3. Gather Connection Details

Record these three values from GPUStack:

  • API Base URL
  • Model ID
  • API Key (create in GPUStack under API keys)
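
Before wiring up OpenClaw, you can sanity-check the three values with a direct request to the OpenAI-compatible endpoint. The placeholders below stand in for the values you just recorded:

```shell
# Replace the placeholders with your API Base URL, API Key, and Model ID.
curl <api-base-url>/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <api-key>" \
  -d '{"model": "<model-id>", "messages": [{"role": "user", "content": "ping"}]}'
```

A JSON response with a chat completion confirms the endpoint, key, and model ID are all correct.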

Setting Up the Feishu Bot Application

Account Requirements

A personal Feishu account cannot create bot applications. Use an enterprise/organization account. You can create one for free:

  1. On the desktop client, click the three dots (⋯) → Log in with another account
  2. Choose Create new account
  3. Select Enterprise or Organization Administrator

Follow the prompts to set a name and organization name.

Create the Application

  1. Open Feishu Open Platform
  2. Log in with the enterprise account
  3. Click Create Enterprise Self-built App
  4. Provide a name and description (icon optional)

Enable Bot Capability

In the left menu, go to Add Application Capability and add Bot.

Import Permissions

Navigate to Permissions Management → Batch Import and replace the default permissions JSON with the following:

{
  "scopes": {
    "tenant": [
      "aily:file:read",
      "aily:file:write",
      "application:application.app_message_stats.overview:readonly",
      "application:application:self_manage",
      "application:bot.menu:write",
      "contact:contact.base:readonly",
      "contact:user.employee_id:readonly",
      "corehr:file:download",
      "event:ip_list",
      "im:chat.access_event.bot_p2p_chat:read",
      "im:chat.members:bot_access",
      "im:message",
      "im:message.group_at_msg:readonly",
      "im:message.p2p_msg:readonly",
      "im:message:readonly",
      "im:message:send_as_bot",
      "im:resource"
    ],
    "user": [
      "aily:file:read",
      "aily:file:write",
      "im:chat.access_event.bot_p2p_chat:read"
    ]
  }
}

Watch out for non-breaking spaces when copying from some sources.
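
One way to guard against that is to strip any non-breaking spaces (UTF-8 bytes C2 A0) and confirm the file still parses as JSON before pasting it into Batch Import. A minimal sketch, assuming you saved the JSON as perms.json (the demo input here is hypothetical):

```shell
# Demo input: a permissions fragment containing a non-breaking space (C2 A0).
printf '{"scopes": {"tenant":\xc2\xa0["im:message"]}}\n' > perms.json

# Replace each non-breaking space with a regular space; bash ANSI-C quoting
# ($'...') passes the raw bytes to sed, so no \xHH regex support is needed.
sed -i $'s/\xc2\xa0/ /g' perms.json

# Verify the cleaned file parses as JSON (a non-zero exit means it does not).
python3 -m json.tool perms.json > /dev/null && echo "perms.json is valid JSON"
```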

After importing, create and publish a new version for the permissions to take effect. Also record the App ID and App Secret; OpenClaw will ask for them during configuration.


Installing and Configuring OpenClaw

This demo uses Ubuntu 24.04.

Quick Installation

curl -fsSL https://openclaw.ai/install.sh | bash

The script automatically installs Node.js and Git dependencies.

For manual installation (recommended if you prefer fnm + pnpm), finish with:

openclaw onboard --install-daemon

Interactive Configuration

  1. Model/Auth Provider: Choose Custom Provider (Any OpenAI or Anthropic compatible endpoint)
  2. Enter the GPUStack API Base URL and API Key
  3. Channel: Select Feishu / Lark
  4. Provide the App ID and App Secret from your Feishu app
  5. Group chat policy: Choose Open - respond in all groups (requires mention)

Adjust Context Window (Mandatory)

By default, OpenClaw uses a context length of 4096 tokens, which falls short of the 16K minimum it requires. Edit the configuration:

vim ~/.openclaw/openclaw.json

Set the context_length value to at least 16384 (131072 matches the recommended 128K context). Then restart the gateway:

openclaw gateway restart
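
For reference, the edited portion of ~/.openclaw/openclaw.json might look like this. This is a sketch: surrounding keys are omitted, and the exact layout may vary between OpenClaw versions.

```json
{
  "context_length": 131072
}
```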

Set Event Subscription Method

In your Feishu app settings, change the Event Subscription method to Long Connection and add the Receive Messages event. Remember to create and publish a new version after these changes.


First Authorization and Connectivity Test

  1. Send a message to the bot in Feishu.
  2. The bot will reply with a Pairing request and a code.
  3. On your server, run:
openclaw pairing approve feishu <Pairing-Code>

If you see repeated pairing prompts accompanied by a duplicate plugin id detected error, fix it with:

rm -rf ~/.openclaw/extensions/feishu
openclaw gateway restart

Example: Making the Bot Star a GitHub Repository

Create a GitHub Personal Access Token (PAT)

  • Use Tokens (classic)
  • Grant the repo scope

Set the Token

Add to ~/.openclaw/.env:

GITHUB_TOKEN=your_token_here

Restart the gateway:

openclaw gateway restart

Then in Feishu, ask the bot to star a repo, e.g., gpustack/gpustack.
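
Under the hood, starring a repository is a single call to GitHub's REST API; the bot's tool call is equivalent to something like the following, using the PAT from ~/.openclaw/.env:

```shell
# Star gpustack/gpustack via GitHub's PUT /user/starred/{owner}/{repo} endpoint.
curl -X PUT \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/user/starred/gpustack/gpustack
```

A 204 No Content response indicates the star was applied.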


Built-in Commands

  • /new – Start a new conversation
  • /status – Check bot status
  • /reset – Reset conversation context
  • /model – View or switch the model

Useful OpenClaw CLI Commands

openclaw logs --follow
openclaw doctor
openclaw gateway --help
openclaw dashboard
openclaw tui


Conclusion: AI as Infrastructure, Not a Consumable

Token anxiety stems from treating AI as an external metered resource. When the model runs on your own GPU, inference capacity, context, and tool calling become part of your infrastructure rather than a per-call cost. The combination of GPUStack and OpenClaw turns AI from a cost center into a persistent, always-available productivity tool. If you have GPU resources available, try this setup and integrate AI directly into your daily workflows.
