Fading Coder

One Final Commit for the Last Sprint


Getting Started with the Hugging Face Transformers Library

Tech · May 9

Installation and Model Selection

Begin by installing the library:

pip install transformers

Available models can be browsed at https://huggingface.co/languages

Using Pipelines

Pipelines provide the simplest way to use pre-trained models. The workflow involves:

  1. Selecting a model from Hugging Face
  2. Loading it with the appropriate pipeline
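As a minimal sketch of these two steps (no checkpoint is pinned here, so the library falls back to a default English sentiment model; the input sentence is arbitrary):

```python
from transformers import pipeline

# Step 1 + 2 in one call: the task name selects a default model,
# and the pipeline wraps tokenizer, model, and post-processing.
classifier = pipeline("sentiment-analysis")
result = classifier("I love this library!")
print(result)  # a list with one dict containing 'label' and 'score'
```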

Sentiment Analysis Example

from transformers import BertForSequenceClassification, BertTokenizer
import torch

# This checkpoint is a Chinese sentiment classifier; despite the
# "Roberta" in its name, it loads with the BERT classes.
tokenizer = BertTokenizer.from_pretrained('IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment')
model = BertForSequenceClassification.from_pretrained('IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment')

text = 'Feeling unhappy today'
with torch.no_grad():
    output = model(torch.tensor([tokenizer.encode(text)]))
# Softmax turns the two logits into class probabilities
print(torch.nn.functional.softmax(output.logits, dim=-1))

Saving Models

save_dir = "./model_save"
tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)

Available Pipeline Tasks

  • "sentiment-analysis": Text classification
  • "question-answering": QA systems
  • "text-generation": Text generation
  • "translation": Language translation
  • "summarization": Text summarization
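For example, the question-answering task can be exercised like this (the context string is made up for illustration, and no specific checkpoint is pinned, so the library's default QA model is used):

```python
from transformers import pipeline

# Extractive QA: the answer is a span copied out of the context.
qa = pipeline("question-answering")
result = qa(
    question="What does the pipeline API do?",
    context="The pipeline API wraps a tokenizer and a model "
            "behind a single call for common NLP tasks.",
)
print(result["answer"])
```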

Core Components

Tokenization

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoding = tokenizer("Demonstrating the Transformers library")
print(encoding)
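For a BERT-style tokenizer the encoding is a dict-like object; a quick way to inspect its parts (same checkpoint as above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoding = tokenizer("Demonstrating the Transformers library")

# Token ids, wrapped in the special [CLS] ... [SEP] markers
print(encoding["input_ids"])
# 1 for every real token (no padding in a single-sentence encoding)
print(encoding["attention_mask"])
# Map the ids back to their token strings
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
```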

Model Loading

from transformers import AutoModel
model = AutoModel.from_pretrained('bert-base-uncased')

Model Inference

inputs = tokenizer("Sample text", return_tensors="pt")
outputs = model(**inputs)
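The base model returns one hidden vector per input token; a quick shape check (assuming bert-base-uncased, whose hidden size is 768):

```python
from transformers import AutoModel, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("Sample text", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch_size, sequence_length, hidden_size) — one 768-dim vector per token
print(outputs.last_hidden_state.shape)
```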

Text Generation

from transformers import pipeline
generator = pipeline("text-generation")
print(generator("Recent events in California"))

Fine-Tuning Models

Classification Model

from torch import nn
from transformers import AutoModel

class CustomClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Pre-trained BERT encoder plus a fresh two-class linear head
        self.encoder = AutoModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Linear(768, 2)

    def forward(self, x):
        # x is the dict returned by the tokenizer (input_ids, attention_mask, ...)
        outputs = self.encoder(**x)
        # Classify from the hidden state of the [CLS] token (position 0)
        return self.classifier(outputs.last_hidden_state[:, 0, :])
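To make this concrete, here is a sketch of a single training step on a toy two-example batch (the class is repeated so the snippet is self-contained; the example sentences, labels, optimizer, and learning rate are illustrative assumptions, not a full training recipe):

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class CustomClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Linear(768, 2)

    def forward(self, x):
        outputs = self.encoder(**x)
        return self.classifier(outputs.last_hidden_state[:, 0, :])

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = CustomClassifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a toy batch (labels are made up)
batch = tokenizer(["great movie", "terrible movie"],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

logits = model(batch)          # shape: (2, 2)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(loss.item())
```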

Sequence-to-Sequence Model

from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-en-zh')
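Such a model translates via generate(); a short sketch using the same English-to-Chinese checkpoint (the input sentence is arbitrary, and the sentencepiece package must be installed for this tokenizer):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = 'Helsinki-NLP/opus-mt-en-zh'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Machine learning is fun.", return_tensors="pt")
# Autoregressively decode the translation, then strip special tokens
generated = model.generate(**inputs)
translation = tokenizer.decode(generated[0], skip_special_tokens=True)
print(translation)
```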

Advanced Techniques

Prompt Engineering

def create_prompt(text):
    return f'Overall sentiment is [MASK]. {text}'

def get_label_mapping(tokenizer):
    return {
        'positive': {'token': 'good', 'id': tokenizer.convert_tokens_to_ids("good")},
        'negative': {'token': 'bad', 'id': tokenizer.convert_tokens_to_ids("bad")}
    }

