Hugging Face tutorial (Chinese translation): Using tokenizers from Tokenizers
2022-08-09 16:59:00 【wwlsm_zql】
Using tokenizers from the Tokenizers library
PreTrainedTokenizerFast depends on the Tokenizers library. Tokenizers obtained from the Tokenizers library can be loaded very simply into Transformers.
Before getting into the details, let's first create a dummy tokenizer in a few lines of code:
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.pre_tokenizer = Whitespace()
files = [...]
tokenizer.train(files, trainer)
We now have a tokenizer trained on the files we defined. We can either keep using it in the current runtime, or save it to a JSON file for future reuse.
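Because the `files` list above is elided, the training step cannot be run as written. As a minimal sketch, the same setup can be trained on an in-memory corpus with `train_from_iterator`; the three sentences in `corpus` are hypothetical stand-ins, not data from the tutorial.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Hypothetical toy corpus standing in for the elided `files` list.
corpus = [
    "hello world",
    "hello tokenizers",
    "the world of tokenizers",
]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    show_progress=False,  # keep the output quiet for a tiny corpus
)

# Train from an in-memory iterator instead of from files on disk.
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Encode a sentence and inspect the resulting tokens and ids.
encoding = tokenizer.encode("hello world")
print(encoding.tokens)
print(encoding.ids)
```

The exact tokens depend on the merges learned from the corpus, so a real training set will produce a different vocabulary.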
Loading directly from the tokenizer object
Let's see how to use this tokenizer object in the Transformers library. The PreTrainedTokenizerFast class allows easy instantiation by accepting the instantiated tokenizer object as an argument:
from transformers import PreTrainedTokenizerFast
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
This object can now be used with all the methods shared by the Transformers tokenizers! For more information, see the tokenizer page.
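To illustrate what the shared methods look like in practice, here is a hedged sketch that wraps a freshly trained tokenizer and pads a small batch. The toy `corpus` is hypothetical, and passing `unk_token` and `pad_token` to the constructor is an addition not shown in the tutorial: the wrapper does not infer special tokens from the tokenizer object, and padding needs a pad token to be declared.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer
from transformers import PreTrainedTokenizerFast

# Hypothetical toy corpus; any iterable of strings would do.
corpus = ["hello world", "hello tokenizers", "the world of tokenizers"]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    show_progress=False,
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Declare the special tokens explicitly so that padding works.
fast_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="[UNK]",
    pad_token="[PAD]",
)

# Calling the wrapper on a batch returns input ids and an attention mask;
# padding=True pads the shorter sequence to the length of the longest one.
batch = fast_tokenizer(["hello world", "hello"], padding=True)
print(batch["input_ids"])
print(batch["attention_mask"])
```

The padded positions in the second sequence carry the id of `[PAD]` and a 0 in the attention mask.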
Loading from a JSON file
To load a tokenizer from a JSON file, let's first save our tokenizer:
tokenizer.save("tokenizer.json")
The path to which we saved this file can be passed to the PreTrainedTokenizerFast initialization method using the tokenizer_file parameter:
from transformers import PreTrainedTokenizerFast
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
This object can now be used with all the methods shared by the Transformers tokenizers! For more information, see the tokenizer page.
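As a quick sanity check of the JSON round trip, the sketch below saves a tokenizer to a temporary directory, reloads it both with `Tokenizer.from_file` and with `PreTrainedTokenizerFast(tokenizer_file=...)`, and compares the ids each one produces. The toy `corpus` and the temporary path are assumptions made for illustration.

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer
from transformers import PreTrainedTokenizerFast

# Hypothetical toy corpus used only to obtain a trained tokenizer.
corpus = ["hello world", "hello tokenizers", "the world of tokenizers"]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    show_progress=False,
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tokenizer.json")
    tokenizer.save(path)

    # Reload with the Tokenizers library and with the Transformers wrapper.
    reloaded = Tokenizer.from_file(path)
    fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file=path)

text = "hello world"
original_ids = tokenizer.encode(text).ids
reloaded_ids = reloaded.encode(text).ids
fast_ids = fast_tokenizer(text)["input_ids"]

# No post-processor is configured, so no extra special tokens are added
# and all three encodings should agree.
print(original_ids == reloaded_ids == fast_ids)
```

Both loaders read the file eagerly, which is why the temporary directory can be cleaned up before encoding.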
This article is a Chinese translation of the Hugging Face tutorial, for study purposes only.