Using HuggingFace Mirror to Download Models and Datasets Smoothly

Webpage Download Method

Search for your target model or dataset on the mirror site (https://hf-mirror.com), open its page, and go to the "Files and versions" tab to download individual files.
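The download links in that file list appear to follow the Hub's usual "resolve" URL layout, so a link can also be assembled by hand. The sketch below is a minimal illustration under that assumption. build_resolve_url is a hypothetical helper, and the file names are examples rather than a guarantee of what each repository contains.

```python
from urllib.parse import quote

MIRROR = "https://hf-mirror.com"  # mirror endpoint used throughout this article


def build_resolve_url(repo_id, filename, revision="main", repo_type="model", endpoint=MIRROR):
    """Build a direct file URL, assuming the mirror keeps the Hub's resolve layout."""
    # Dataset repositories sit under a "datasets/" prefix; model repositories have none.
    prefix = "datasets/" if repo_type == "dataset" else ""
    # Escape the revision fully, but keep "/" in the file path for nested files.
    return f"{endpoint}/{prefix}{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


print(build_resolve_url("mistralai/Mistral-7B-v0.3", "config.json"))
print(build_resolve_url("allenai/c4", "README.md", repo_type="dataset"))
```

The resulting URL can be passed to a browser or any HTTP download tool. Gated repositories still require an authenticated request.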

HuggingFace CLI Method

huggingface-cli is Hugging Face's official command-line tool. It ships with the huggingface_hub package and can download entire repositories or individual files.

1. Install Dependencies

pip install -U huggingface_hub

2. Configure Environment Variable

Linux/macOS Terminal (persist by adding to ~/.bash_profile or ~/.zshrc)

export HF_ENDPOINT="https://hf-mirror.com"

Windows PowerShell (persist via Environment Variables GUI)

$env:HF_ENDPOINT = "https://hf-mirror.com"
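To confirm that the variable is visible to Python processes started from the same shell, a quick check like the following may help. active_endpoint is a hypothetical helper written for this article, and the fallback value is simply the Hub's public address.

```python
import os

DEFAULT_ENDPOINT = "https://huggingface.co"  # the Hub's public address, used when nothing is set


def active_endpoint(environ=None):
    """Return the endpoint a process with this environment would be pointed at."""
    env = os.environ if environ is None else environ
    return env.get("HF_ENDPOINT", DEFAULT_ENDPOINT)


# Prints https://hf-mirror.com when the export above is in effect for this shell.
print(active_endpoint())
```

If the default address is printed instead, the variable was probably set in a different terminal session or not exported.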

3. Download Operations

  • Download a Model (e.g., mistralai/Mistral-7B-v0.3)
huggingface-cli download --resume-download mistralai/Mistral-7B-v0.3 --local-dir mistral-7b
  • Download a Dataset (e.g., allenai/c4)
huggingface-cli download --repo-type dataset --resume-download allenai/c4 --local-dir c4-dataset

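Add --local-dir-use-symlinks False to write real files into the target directory instead of symbolic links into the cache. Recent huggingface_hub releases resume interrupted downloads by default and deprecate both --resume-download and --local-dir-use-symlinks, so both flags can be omitted there.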
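The same downloads can be started from Python with snapshot_download from huggingface_hub. The sketch below is one possible arrangement rather than the only way to do it: download_repo is a hypothetical wrapper, and it sets HF_ENDPOINT before the library is imported on the assumption that the library reads the variable at import time.

```python
import os

# Assumption: huggingface_hub reads HF_ENDPOINT when it is first imported,
# so the variable is set here, before any import of the library.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")


def download_repo(repo_id, local_dir, repo_type="model"):
    """Download a full repository snapshot into local_dir and return its path."""
    # Imported lazily so the environment variable above is already in place.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir)


# Example calls (they start real, large downloads, so they are left commented out):
# download_repo("mistralai/Mistral-7B-v0.3", "mistral-7b")
# download_repo("allenai/c4", "c4-dataset", repo_type="dataset")
```

setdefault leaves an HF_ENDPOINT value that is already exported in the shell untouched, so the script does not override the user's own configuration.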

HFD Tool Method

HFD is a download utility developed by the mirror site. It is built on git and aria2 for stable, resumable transfers.

1. Install HFD

wget https://hf-mirror.com/hfd/hfd.sh
chmod +x hfd.sh
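Because the script relies on external programs, it can be worth checking that they are installed before the first run. The following is a minimal sketch: missing_tools is a hypothetical helper, and the default tool list (git and aria2c) is taken from the description above rather than from the script's own requirements.

```python
import shutil


def missing_tools(tools=("git", "aria2c")):
    """Return the names in tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Install these before running hfd.sh:", ", ".join(missing))
else:
    print("git and aria2c are available.")
```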

2. Configure Environment Variable

Same as Step 2 of the HuggingFace CLI Method.

3. Download Operations

  • Download a Model (e.g., meta-llama/Llama-2-7b-hf)
./hfd.sh meta-llama/Llama-2-7b-hf --tool aria2c -x 8
  • Download a Dataset (e.g., squad)
./hfd.sh squad --dataset --tool aria2c -x 8

Non-Intrusive Environment Variable Method

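Set the HF_ENDPOINT variable inline, for a single command only, when running your Python script. All Hugging Face Hub API and download requests made by that process are routed through the mirror, and neither your shell profile nor your code needs to change.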

HF_ENDPOINT=https://hf-mirror.com python your_llm_training_script.py
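The inline form above is POSIX shell syntax. Where a launcher script is more convenient, roughly the same effect can be obtained by passing a modified environment to a child process. This is a sketch under that assumption: run_with_mirror is a hypothetical helper, and the script name in the usage comment is the placeholder from the command above.

```python
import os
import subprocess
import sys


def run_with_mirror(argv, endpoint="https://hf-mirror.com", **run_kwargs):
    """Run argv with HF_ENDPOINT set only in the child process's environment."""
    # Copy the current environment so the parent process stays unchanged.
    child_env = {**os.environ, "HF_ENDPOINT": endpoint}
    return subprocess.run(argv, env=child_env, check=True, **run_kwargs)


# Example (placeholder script name):
# run_with_mirror([sys.executable, "your_llm_training_script.py"])
```

Only the child process sees the mirror address, which keeps the approach non-intrusive in the same way as the one-line shell form.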
