
HuggingFace

2022-04-23 10:48:00 【qq1033930618】

Contents

One 、 Official website
Two 、 Model download
Three 、 Pipeline
Four 、 Tokenizer
Five 、 AutoClass
Six 、 AutoTokenizer

One 、 Official website

huggingface.co

Two 、 Model download

Install the transformers package into the environment:

conda install -n <env_name> transformers

Download a model automatically on first use (the string in quotes is the model name):

from transformers import BertTokenizer, BertModel
# output_hidden_states=True makes the model also return the hidden states of every layer
model = BertModel.from_pretrained('bert-base-chinese', output_hidden_states=True)
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')

Default download location:

/home/<username>/.cache/huggingface/transformers
(newer transformers releases cache models under ~/.cache/huggingface/hub instead)
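The cache folder can be moved with environment variables. The sketch below mirrors how transformers releases from around the time of this post chose the folder: TRANSFORMERS_CACHE first, then HF_HOME/transformers, then $XDG_CACHE_HOME/huggingface/transformers, then the ~/.cache default. The helper name resolve_cache_dir is hypothetical, and it is a simplification (it ignores the legacy PYTORCH_TRANSFORMERS_CACHE variable); newer versions use a different layout.

```python
import os


def resolve_cache_dir(env=None):
    """Hypothetical helper: pick the transformers cache folder from environment variables.

    Priority (highest first): TRANSFORMERS_CACHE, HF_HOME/transformers,
    $XDG_CACHE_HOME/huggingface/transformers, ~/.cache/huggingface/transformers.
    """
    env = os.environ if env is None else env
    # HF_HOME replaces the base folder; otherwise use XDG_CACHE_HOME (or ~/.cache) + /huggingface
    hf_home = env.get("HF_HOME", os.path.join(env.get("XDG_CACHE_HOME", "~/.cache"), "huggingface"))
    default_path = os.path.join(os.path.expanduser(hf_home), "transformers")
    # An explicit TRANSFORMERS_CACHE overrides everything else
    return env.get("TRANSFORMERS_CACHE", default_path)


print(resolve_cache_dir())
```

With none of the variables set, this returns the /home/<username>/.cache/huggingface/transformers path listed above.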

Manual download:
1. Search for the model name in the search bar at the top of huggingface.co.
2. On the model page, open the Files and versions tab next to Model card and download the files.
3. Pass the local path where the files are saved:

model = BertModel.from_pretrained('./model', output_hidden_states=True)
tokenizer = BertTokenizer.from_pretrained('./model/vocab.txt')
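Note: BertModel.from_pretrained takes the path of the folder, whereas BertTokenizer.from_pretrained is given vocab.txt, not tokenizer.json. (Recent transformers versions deprecate passing a single file here; passing the folder that contains vocab.txt also works.)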
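Before loading from a local folder, it can help to check that the manual download is complete. This is a minimal sketch with a hypothetical helper, missing_model_files; the expected file list is an assumption for a BERT checkpoint with PyTorch weights (some repositories ship model.safetensors instead of pytorch_model.bin).

```python
import os

# Assumed contents of a BERT checkpoint folder: config, PyTorch weights, WordPiece vocabulary
EXPECTED_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")


def missing_model_files(folder, expected=EXPECTED_FILES):
    """Hypothetical helper: return the expected files that are absent from `folder`."""
    return [name for name in expected if not os.path.isfile(os.path.join(folder, name))]


print(missing_model_files("./model"))
```

An empty list means './model' can be passed to from_pretrained; otherwise the printed names still need to be downloaded from the Files and versions tab.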

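Speed up the download with a mirror:

model = BertModel.from_pretrained('bert-base-chinese', mirror='tuna')
(The mirror argument comes from transformers releases of that time and has since been deprecated; check the documentation of the installed version.)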
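A mirror speeds things up by serving the same files from a different host. The sketch below builds the download URL of one file in a Hub repository using the /<repo>/resolve/<revision>/<file> layout; reading the HF_ENDPOINT environment variable follows recent huggingface_hub versions, while the helper name hub_file_url and the mirror host in the example are hypothetical.

```python
import os


def hub_file_url(repo_id, filename, revision="main", endpoint=None):
    """Hypothetical helper: URL of one file in a model repo on the Hub (or on a mirror).

    The endpoint defaults to the HF_ENDPOINT environment variable, then huggingface.co.
    A mirror only has to serve the same /<repo>/resolve/<revision>/<file> layout.
    """
    if endpoint is None:
        endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
    return f"{endpoint.rstrip('/')}/{repo_id}/resolve/{revision}/{filename}"


print(hub_file_url("bert-base-chinese", "vocab.txt"))
```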

3、 ... and 、 The Conduit pipeline

Use a pretrained model directly through pipeline:

from transformers import pipeline

# Sentiment-analysis pipeline; with no model given, a default English model is downloaded
classifier = pipeline("sentiment-analysis")
# A single string returns a list with one dict whose keys are 'label' and 'score'
print(classifier("We are very happy to show you the Transformers library."))
# Several inputs can be passed as a list; one dict is returned per input
results = classifier(["We are very happy to show you the Transformers library.", "We hope you don't hate it."])
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
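The 'score' in each result is a probability derived from the model's raw logits. The sketch below shows that post-processing step under two assumptions: softmax over the classes, and the id2label mapping {0: "NEGATIVE", 1: "POSITIVE"} of the default English model. The function name postprocess and the logit values are illustrative, not real model outputs.

```python
import math


def postprocess(logits, id2label):
    """Sketch: turn raw logits into the {'label', 'score'} dict a pipeline returns."""
    # Subtract the largest logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the class with the highest probability
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(postprocess([-1.2, 3.4], id2label))
```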

Load audio from a dataset (speech recognition)

pip install datasets
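from datasets import load_dataset
from transformers import pipeline
# Give both the task (speech recognition) and the model; with only the task, a default model is used. device=0 = first GPU (device=-1 for CPU)
speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
# Assumption: the SUPERB ASR test split, as in the transformers quicktour of that time; any dataset with a "file" column of audio paths works
dataset = load_dataset("superb", name="asr", split="test")
files = dataset["file"]
print(speech_recognizer(files[:4]))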
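facebook/wav2vec2-base-960h is a CTC model: it predicts one token per audio frame, and those frame predictions are collapsed into text. The sketch below shows greedy CTC decoding with a toy vocabulary (the ids are made up; the real processor loads its own vocabulary file). It assumes id 0 is the blank/padding token and "|" is the word delimiter.

```python
def ctc_greedy_decode(frame_ids, id2token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding: collapse repeated ids, drop blanks, turn '|' into spaces."""
    chars = []
    prev = None
    for i in frame_ids:
        # Emit a token only when it differs from the previous frame and is not the blank
        if i != prev and i != blank_id:
            chars.append(id2token[i])
        prev = i
    return "".join(chars).replace(word_delimiter, " ").strip()


toy_vocab = {0: "<pad>", 1: "|", 2: "H", 3: "I", 4: "E", 5: "L", 6: "O"}
# Per-frame argmax ids: repeats collapse, and the blank (0) keeps the double L apart
frames = [2, 2, 4, 0, 5, 5, 0, 5, 6, 1, 1, 3]
print(ctc_greedy_decode(frames, toy_vocab))
```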

Four 、 Tokenizer

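from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load a specific model and its matching tokenizer, then hand both to the pipeline
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
# The multilingual model also accepts French input; print the prediction
print(classifier("Nous sommes très heureux de vous présenter la bibliothèque Transformers."))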
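This nlptown model rates text from 1 to 5 stars, so its labels look like "1 star" or "4 stars" rather than POSITIVE/NEGATIVE. The sketch below converts such a label to a number and, as an optional extra, computes a probability-weighted rating from five logits. Both helper names are hypothetical, and the assumption that logit index i means (i + 1) stars should be checked against the model's config.

```python
import math


def stars_from_label(label):
    """'4 stars' -> 4, '1 star' -> 1."""
    return int(label.split()[0])


def expected_stars(logits):
    """Probability-weighted average rating, assuming index i corresponds to (i + 1) stars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return sum((i + 1) * e / total for i, e in enumerate(exps))


print(stars_from_label("5 stars"), expected_stars([0.0, 0.0, 0.0, 0.0, 0.0]))
```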

Five 、 AutoClass

An AutoClass automatically infers and loads the correct architecture from the name or path of a pretrained model. AutoTokenizer, covered next, is one such class.
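Every model repository includes a config.json, and an AutoClass uses its model_type field to decide which concrete class to build. The toy sketch below shows that dispatch idea; the registry MODEL_TYPE_TO_CLASS and the function resolve_architecture are hypothetical, and the real mapping inside transformers covers far more model types and returns classes rather than names.

```python
import json

# Toy registry from config.json "model_type" to a model class name
MODEL_TYPE_TO_CLASS = {
    "bert": "BertModel",
    "distilbert": "DistilBertModel",
    "gpt2": "GPT2Model",
}


def resolve_architecture(config_json):
    """Pick a model class name from the model_type field of a config.json string."""
    model_type = json.loads(config_json)["model_type"]
    try:
        return MODEL_TYPE_TO_CLASS[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model_type: {model_type!r}") from None


print(resolve_architecture('{"model_type": "bert", "hidden_size": 768}'))
```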

Six 、 AutoTokenizer

A tokenizer splits text into tokens (words or subword pieces) and converts them into numbers the model can understand.
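BERT-style tokenizers split each word with WordPiece: take the longest prefix found in the vocabulary, then continue with "##"-prefixed pieces. The toy sketch below shows only that step for a single word with a tiny made-up vocabulary; real tokenizers also lowercase (for uncased models) and split on whitespace and punctuation first.

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of one word (toy sketch)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until it is in the vocab
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no valid split: the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens


vocab = {"un", "##aff", "##able", "happy"}
print(wordpiece("unaffable", vocab))
```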

from transformers import AutoTokenizer
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
print(encoding)
Output:
{'input_ids': [101, 11312, 10320, 12495, 19308, 10114, 11391, 10855, 10103, 100, 58263, 13299, 119, 102],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
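In the output, 101 and 102 are BERT's [CLS] and [SEP] tokens, token_type_ids are all 0 because there is only one sentence, and attention_mask is all 1 because nothing was padded. When several sentences are batched (for example with tokenizer([...], padding=True)), shorter ones are padded and their mask gets 0s. The sketch below reproduces that padding logic by hand; pad_batch is a hypothetical helper, pad_id=0 matches [PAD] in BERT vocabularies, and the middle token ids are arbitrary examples.

```python
def pad_batch(batch_ids, pad_id=0):
    """Right-pad id lists to equal length and build attention masks (1 = real token, 0 = padding)."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# 101 = [CLS] and 102 = [SEP]; the ids in between are arbitrary examples
batch = pad_batch([[101, 11312, 10320, 102], [101, 119, 102]])
print(batch)
```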