Implementation Guide for StarCoder2 Code Generation with PyTorch on DCU Hardware
The StarCoder2 suite comprises three architecture variants at 3 billion, 7 billion, and 15 billion parameters. The models were trained on 3.3 to 4.3 trillion code tokens drawn from The Stack v2 dataset, covering over 600 programming languages.
Architectural Details
The model architecture is derived from the StarCoderBase framework with key enhancements:
- RoPE Positional Encoding: learned absolute position embeddings are replaced with Rotary Position Embeddings (RoPE) to improve length extrapolation during sequence generation.
- Grouped Query Attention (GQA): the Multi-Query Attention (MQA) modules are replaced with GQA, which trades inference throughput against model quality depending on the chosen number of key/value heads (see the sketch below).
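To make the MQA-to-GQA change concrete, below is a minimal sketch of the grouping mechanism in PyTorch (illustrative only; the function name and tensor layout are assumptions, not the repository's implementation):

import torch

def grouped_query_attention(q, k, v):
    # q: (batch, num_q_heads, seq, head_dim)
    # k, v: (batch, num_kv_heads, seq, head_dim); num_kv_heads divides num_q_heads
    group_size = q.shape[1] // k.shape[1]
    # Each key/value head is shared by a contiguous group of query heads
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

MQA is the special case with a single key/value head; standard multi-head attention is the case where the two head counts match.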
Environment Setup
Dependencies and runtime environments can be configured using one of three methods below. Ensure version compatibility across Python, PyTorch, and driver toolkits.
Containerized Deployment
Adjust paths and image identifiers according to your local registry configuration.
docker pull [REGISTRY_URL]/starcoder:2.1.0-py310-dtk24
docker run -it \
--name starcoder-container \
-v /host/code:/workspace/code:rw \
-v /opt/shared:/opt/shared:ro \
--shm-size=80g \
--privileged=true \
--device=/dev/kfd \
--device=/dev/dri/ \
--group-add video \
[IMAGE_ID] bash
Once inside the container:
cd /workspace/code/starcoder2_pytorch
pip install -r requirements.txt -i https://pypi.org/simple
export MODEL_HF_ENDPOINT=https://hf-mirror.com
Local Build Configuration
Alternatively, build directly from source files.
cd docker
DOCKER_BUILDKIT=1 docker build --no-cache -t starcoder-latest .
Then start the container with the same docker run flags shown in the containerized deployment section above.
Conda & Toolkit Installation
For direct hardware access, ensure strict version alignment between the toolkit drivers, Python environment, and torch bindings.
| Component | Version |
|---|---|
| DTK Drivers | dtk24.04 |
| Python | 3.10.x |
| PyTorch | 2.1.0 |
Install remaining dependencies:
pip install -r requirements.txt -i https://pypi.org/simple
export MODEL_HF_ENDPOINT=https://hf-mirror.com
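After installation, a quick sanity check confirms that the PyTorch bindings can see the accelerators (on DTK/ROCm builds, DCU devices are typically exposed through the torch.cuda interface):

import torch

print(torch.__version__)          # expect a 2.1.0 build with the DTK suffix
print(torch.cuda.is_available())  # True once the DCU devices are visible
print(torch.cuda.device_count())  # number of visible devices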
Dataset Preparation
Fine-tuning examples are drawn from bigcode/the-stack-smol. For instance, the Rust subset resides at data/rust within the dataset.
Directory structure typically includes:
data/
├── assembly/data.json
└── rust/data.json
...
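Before training, it is worth confirming that a subset loads as expected. A short check with the datasets library (a sketch; the path follows the layout above, and "content" is the source-code field in the-stack-smol):

from datasets import load_dataset

ds = load_dataset("json", data_files="data/rust/data.json", split="train")
print(ds)                       # row count and column names
print(ds[0]["content"][:200])   # first 200 characters of the first sample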
Training Procedure
Configuration parameters should be defined within the training script file.
Essential arguments include:
- dataset_name: path to the prepared data directory.
- model_name: location of the base pretrained checkpoint.
Execution command:
chmod +x train_script.sh
bash train_script.sh
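train_script.sh presumably drives a Python fine-tuning entry point. As orientation, here is a minimal sketch of what such a script typically wires together (hypothetical throughout; only the dataset_name and model_name arguments come from above, and the actual script may differ substantially):

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "models/starcoder2-7b"   # base pretrained checkpoint (assumed path)
dataset_name = "data/rust"            # prepared data directory

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the raw source code; "content" is the code field in the-stack-smol
ds = load_dataset("json", data_files=f"{dataset_name}/data.json", split="train")
ds = ds.map(lambda ex: tokenizer(ex["content"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="checkpoints", max_steps=100,
                           per_device_train_batch_size=1, learning_rate=2e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()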
Inference Workflow
Inference relies on the Hugging Face Transformers library. Pretrained weights must be placed in the designated models directory before execution.
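If the weights are not yet available locally, one way to fetch them is via huggingface_hub (a sketch; the target directory is an assumption and should match wherever model_name points):

from huggingface_hub import snapshot_download

# Download every file in the Hub repo into a local directory (assumed path)
snapshot_download("bigcode/starcoder2-7b", local_dir="models/starcoder2-7b")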
Run the following command to initialize the generation process:
HIP_VISIBLE_DEVICES=0 python run_inference.py
You may edit the run_inference.py script, or update its internal model_name variable, to point to custom weight locations.
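For reference, a minimal generation loop along the lines of what such a script performs (a sketch; the weights path is an assumption):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "models/starcoder2-7b"   # assumed local weights directory
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

HIP_VISIBLE_DEVICES restricts which devices the process can see, mirroring the role of CUDA_VISIBLE_DEVICES on NVIDIA hardware.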
Performance Benchmarks
Training conducted on bigcode/the-stack-smol/data/rust yielded the following results after 100 steps:
| Device Config | Train Loss | Steps |
|---|---|---|
| 2x A800 | 1.2758 | 100 |
| 2x K100 | 1.2772 | 100 |
Model Artifacts Structure
The pre-trained package typically organizes files as follows:
starcoder2-7b/
├── config.json
├── generation_config.json
├── merges.txt
├── model.safetensors.index.json
├── model-00001-of-00003.safetensors
├── model-00002-of-00003.safetensors
├── model-00003-of-00003.safetensors
├── special_tokens_map.json
├── tokenizer_config.json
└── vocab.json
References & Sources
- Project Repository: GitLab ModelZoo, starcoder2_pytorch
- Official Paper: StarCoder 2 and The Stack v2
- Hugging Face Hub: bigcode/starcoder2-7b
- Dataset Repo: bigcode/the-stack-smol