Getting Started with the Hugging Face Transformers Library
Installation and Model Selection
Begin by installing the library: pip install transformers
Available models, grouped by language, can be browsed at: https://huggingface.co/languages
Using Pipelines
Pipelines provide the simplest way to use pre-trained models. The workflow involves:
- Selecting a model from Hugging Face
- Loading it with the appropriate pipeline
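Both steps collapse into a few lines with pipeline(). Omitting the model name falls back to a default checkpoint for the task (for sentiment analysis, an English DistilBERT fine-tuned on SST-2 at the time of writing), so treat this as a sketch rather than a production setup:

```python
from transformers import pipeline

# Load a sentiment-analysis pipeline; a model name could also be passed explicitly
classifier = pipeline("sentiment-analysis")

# The pipeline handles tokenization, inference, and post-processing in one call
result = classifier("I love using this library!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```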
Sentiment Analysis Example
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# This checkpoint is a Chinese sentiment classifier built on a BERT-style architecture,
# so it loads with the Bert* classes (and expects Chinese input in practice)
tokenizer = BertTokenizer.from_pretrained('IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment')
model = BertForSequenceClassification.from_pretrained('IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment')

text = 'Feeling unhappy today'
# Encode the text, run the model without tracking gradients,
# then convert the raw logits into class probabilities
with torch.no_grad():
    output = model(torch.tensor([tokenizer.encode(text)]))
print(torch.nn.functional.softmax(output.logits, dim=-1))
Saving Models
save_dir = "./model_save"
tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)
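Both objects can later be restored from that same directory with no network access. A sketch of the round trip, using bert-base-uncased so the snippet is self-contained (in practice the save calls above already produced the directory):

```python
from transformers import AutoModel, AutoTokenizer

save_dir = "./model_save"

# Save once (downloads the checkpoint on first run)
AutoTokenizer.from_pretrained("bert-base-uncased").save_pretrained(save_dir)
AutoModel.from_pretrained("bert-base-uncased").save_pretrained(save_dir)

# Reload entirely from disk
tokenizer = AutoTokenizer.from_pretrained(save_dir)
model = AutoModel.from_pretrained(save_dir)
```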
Available Pipeline Tasks
- "sentiment-analysis": Text classification
- "question-answering": QA systems
- "text-generation": Text generation
- "translation": Language translation
- "summarization": Text summarization
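Any of these task names can be passed to pipeline() in the same way. For instance, a question-answering sketch (no model name given, so the first call downloads the task's default checkpoint):

```python
from transformers import pipeline

qa = pipeline("question-answering")

# Extractive QA: the answer is a span copied from the context
result = qa(
    question="What does the pipeline API wrap?",
    context="The pipeline API wraps tokenization, model inference, and post-processing.",
)
print(result["answer"])
```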
Core Components
Tokenization
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoding = tokenizer("Demonstrating the Transformers library")
print(encoding)
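The returned encoding is a dict-like object; its fields can be inspected and mapped back to tokens (ids shown below are specific to bert-base-uncased):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoding = tokenizer("Demonstrating the Transformers library")

print(encoding["input_ids"])       # token ids, with [CLS] (101) first and [SEP] (102) last
print(encoding["attention_mask"])  # 1 for real tokens, 0 for padding
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))  # ids back to strings
```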
Model Loading
from transformers import AutoModel
model = AutoModel.from_pretrained('bert-base-uncased')
Model Inference
inputs = tokenizer("Sample text", return_tensors="pt")
outputs = model(**inputs)
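The base AutoModel returns hidden states rather than task predictions. A quick way to see this (shapes assume bert-base-uncased, whose hidden size is 768):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Sample text", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per input token: (batch, seq_len, hidden_size)
print(outputs.last_hidden_state.shape)
```

A task head (classification, QA, generation) is what turns these vectors into predictions, which is why the fine-tuning section below adds one on top.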
Text Generation
from transformers import pipeline
generator = pipeline("text-generation")
generator("Recent events in California")
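Generation length and sampling can be controlled with keyword arguments; the values below are illustrative, and the default checkpoint (gpt2 at the time of writing) may change between library versions:

```python
from transformers import pipeline

generator = pipeline("text-generation")

# Sample two continuations of up to 30 new tokens each
outputs = generator(
    "Recent events in California",
    max_new_tokens=30,
    num_return_sequences=2,
    do_sample=True,
)
for out in outputs:
    print(out["generated_text"])
```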
Fine-Tuning Models
Classification Model
from torch import nn
from transformers import AutoModel
class CustomClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Linear(768, 2)

    def forward(self, x):
        outputs = self.encoder(**x)
        # Use the [CLS] token's final hidden state as the sentence representation
        return self.classifier(outputs.last_hidden_state[:, 0, :])
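A forward pass through this classifier looks as follows (the class is repeated here so the sketch is self-contained; the linear head is untrained, so the logits are meaningless until fine-tuning):

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class CustomClassifier(nn.Module):  # same class as above
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Linear(768, 2)

    def forward(self, x):
        outputs = self.encoder(**x)
        return self.classifier(outputs.last_hidden_state[:, 0, :])

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = CustomClassifier()

# Tokenize a small batch; padding=True aligns the two sequences
batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(batch)
print(logits.shape)  # one score per class per sentence: (2, 2)
```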
Sequence-to-Sequence Model
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-en-zh')
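A translation sketch with this checkpoint, which translates English to Chinese (the Marian tokenizer additionally requires the sentencepiece package to be installed):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Helsinki-NLP/opus-mt-en-zh')
model = AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-en-zh')

# Encode the source sentence and let the decoder generate the translation
inputs = tokenizer("Hello, world", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=40)
translation = tokenizer.decode(generated[0], skip_special_tokens=True)
print(translation)
```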
Advanced Techniques
Prompt Engineering
def create_prompt(text):
    return f'Overall sentiment is [MASK]. {text}'

def get_label_mapping(tokenizer):
    return {
        'positive': {'token': 'good', 'id': tokenizer.convert_tokens_to_ids("good")},
        'negative': {'token': 'bad', 'id': tokenizer.convert_tokens_to_ids("bad")}
    }
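These helpers can be combined with a masked-language model: the probability the model assigns to 'good' versus 'bad' at the [MASK] position acts as a zero-shot sentiment score. A minimal sketch, assuming a BERT-style masked LM such as bert-base-uncased:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def create_prompt(text):
    return f'Overall sentiment is [MASK]. {text}'

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')

label_ids = {
    'positive': tokenizer.convert_tokens_to_ids('good'),
    'negative': tokenizer.convert_tokens_to_ids('bad'),
}

# Locate the [MASK] position in the encoded prompt
inputs = tokenizer(create_prompt("The movie was fantastic"), return_tensors="pt")
mask_pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero()[0].item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

# Compare the logits of the two label words at the masked position
scores = {label: logits[tid].item() for label, tid in label_ids.items()}
print(max(scores, key=scores.get))
```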