当前位置：网站首页>Deploying sbert model based on torchserve ＜ semantic similarity task ＞

Deploying sbert model based on torchserve ＜ semantic similarity task ＞

2022-04-23 02:29:00 【Weiyaner】

List of articles

1 Get the model
2 install torchserve
3 Encapsulate models and interfaces
4 Deployment model
Relevant error reports and solutions

This is a question about how to use TorchServe Deploy pre trained HuggingFace Sentence transformers Model guide .

Mission ： semantic similarity （ Return to ）
Algorithm model ：Sentence BERT
Pre training model ：TinyBERT_L6_ch
frame ：torch 1.11
System environment ：Ubuntu 20.04 （ Cloud server CPU）

1 Get the model

The channel of the model comes from 2 aspect

Get the pre training model directly
stay huggingface transformers Official website , You can download the relevant pre training model .
Fine tuned model on task dataset
After fine tuning the local dataset , Save better models , Get the model and configuration file .

from sentence_transformers import SentenceTransformer
smodel = SentenceTransformer(model_dir_path)
smodel.save('model_name')

obtain ：
Insert picture description here
Mainly use the three files inside , Namely pytorch_model.bin,config.json,vocab.txt.

2 install torchserve

TorchServe from Java Realization , Therefore, the latest version of OpenJDK To run the .

apt install openjdk-11-jdk

If win Environmental Science , Refer to here for manual installation ：https://blog.csdn.net/qq_41873673/article/details/108027074

Next , In order to ensure a clean environment , by TorchServe Create a new Conda Environment and activate .

conda create -n torchserve
source activate torchserve

Next install TorchServe The dependencies of .

conda install sentencepiece torch-model-archiver psutil pytorch torchserve torchvision torchtext

If you want to use GPU example , Additional software packages are required .

conda install cudatoolkit=10.1

Now the dependencies have been installed , Can be cloned TorchServe The repository .

git clone https://github.com/pytorch/serve.git
cd serve

Environment setting is complete , Next, continue to model encapsulation .

3 Encapsulate models and interfaces

Before encapsulating the model and interface, you need to prepare the model and interface files

3.1 Prepare the model

stay serve Create a new folder in the directory , Used to save the files related to the above three models .

mkdir Transformer_model/TinyBERT_L6_ch
mv pytorch_model.bin vocab.txt config.json Transformer_model/TinyBERT_L6_ch/

3.2 Prepare interface documents

Handler It can be TorchServe The name of the built-in processor or py Path to file , To handle customized TorchServe Reasoning logic . It is mainly divided into three parts , Namely preprocess,inference and postprocess, Represents the processing of data , Model reasoning and post-processing .

TorchServe The library supports the following handlers :image_classifier, object_detector, text_classifier, image_segmenter.

There is no supporting interface for semantic similarity tasks , Need to write based on personal data .

handler.py

import json 
import zipfile 
from json import JSONEncoder 
import numpy as np 
import os
class NumpyArrayEncoder(JSONEncoder): 
    def default(self, obj): 
        if isinstance(obj, np.ndarray): 
            return obj.tolist() 
        return JSONEncoder.default(self, obj)
class SentenceTransformerHandler(object): 
    def __init__(self):
        super(SentenceTransformerHandler, self).__init__() 
        self.initialized = False 
        self.embedder = None
    def initialize(self, context):
        properties = context.system_properties 
        model_dir = properties.get("model_dir") 
        self.embedder = SentenceTransformer(model_dir) 
        self.initialized = True
    def preprocess(self, data): 
        ## 
        inputs = data[0].get("data") 
        if inputs is None: 
            inputs = data[0].get("body") 
        inputs = inputs.decode('utf-8') 
        inputs = json.loads(inputs) 
        sentences= inputs['queries'] 
        return sentences 
    def inference(self, data): 
        query_embeddings = self.embedder.encode(data) 
        return query_embeddings
    def postprocess(self, data):
        return [json.dumps(data,cls=NumpyArrayEncoder)]

And then through run_handler.py call , As an interface
run_handler.py

from handler import SentenceTransformerHandler
_service = SentenceTransformerHandler()
def handle(data, context): 
    """ Entry point for SentenceTransformerHandler handler """ 
    try: 
        if not _service.initialized: 
            print('ENTERING INITIALIZATION')
        if data is None: 
            return None
        data = _service.preprocess(data) 
        data = _service.inference(data) 
        data = _service.postprocess(data)
        return data
    except Exception as e: 
        raise Exception("Unable to process input data. " + str(e))

3.3 encapsulation

stay TinyBERT_L4_ch Under the folder , Execute the encapsulation command

 Three model files 
1.  Model parameter file （torch） pytorch_model.bin
2.  Model configuration file config.json
3.  Thesaurus   vocab.txt  chinese 21128, english 30522

 Two interface files ：
1. handler.py   Define interface functions 
2. run_handler.py   Call the interface class defined above , As handler Pass in 

 Encapsulate models and interfaces 
torch-model-archiver --model-name sbert_tiny4ch \
--version 1.0 \
--serialized-file pytorch_model.bin  \
--export-path /root/weiyan/serve/model_store  \
--handler run_handler.py \
--extra-files "handler.py,config.json,vocab.txt"  \
--runtime python3 -f

Encapsulate the command

torch-model-archiver --model-name sbert_tiny4ch --version 1.0 --serialized-file pytorch_model.bin --export-path /root/weiyan/serve/model_store --handler run_handler.py --extra-files "handler.py,config.json,vocab.txt"  --runtime python3 -f

stay model-store Folder gets a sbert_tiny4ch.mar file .

4 Deployment model

4.1 start-up torchserve

stay serve Under the table of contents , Create a new folder , Store the configuration file config.properties, Record about the model path and file name .

mkdir Configs
vi sbert_tiny4_config.properties

sbert_tiny4_config.properties

model_store=/root/weiyan/serve/model_store
load_models=sbert_tiny4ch.mar

start-up torchserve

torchserve --ncs --start --ts-config sbert_tiny4_config.properties

Then print out a series of model information
Insert picture description here
If there is no error in the middle , Then the model deployment is successful .

4.2 Model reasoning

Use the relevant commands to view the model deployment

1.ping Connection status

curl http://localhost:8080/ping

{
    
  "status": "Healthy"
}

2. Model loading

curl http://localhost:8081/models

{
    
  "models": [
    {
    
      "modelName": "sbert_tiny4ch",
      "modelUrl": "sbert_tiny4ch.mar"
    }
  ]
}

4.3 Semantic similarity reasoning

because SBERT about querys the embeddings expression , So the output of the model is also embeddings, The similarity between the two can be calculated in the above handler.py In file , You can also get the reasoning result, that is embeddings after , Operate again .

Calculate the similarity after selecting here
sbert_tiny4ch.py

import requests
import json
from scipy import spatial
sentences = [' I go skiing '," I'm going skiing "]
data = {
    'data':json.dumps({
    'queries':sentences})}
print(data)
response = requests.post('http://localhost:8080/predictions/sbert_tiny4ch',data = data)
print(response)
if response.status_code==200:
    vectors = response.json()
    sim = 1 - (spatial.distance.cosine(vectors[0], vectors[1]))
    print(' The similarity is ：',sim)

Execute the script

python sbert_tiny4ch.py

Return the similarity result

{
    'data': '{"queries": ["\\u4eca\\u5929\\u53bb\\u6ed1\\u96ea", "\\u6211\\u8981\\u53bb\\u6ed1\\u51b0"]}'}
<Response [200]>
 The similarity is ： 0.9452057554006862

Relevant error reports and solutions

Query results 404

Report errors

{
    
  "code": 404,
  "type": "ResourceNotFoundException",
  "message": "Requested resource is not found, please refer to API document."
}

reason ：

 Wrong port used , There are three ports 8080,8081,8082, It's really not good. Try it all .
 commonly ,:8080/predictions.
 see models Use 8081.
ping Use 8080

Query results 503

prediction error , Generally, there is an error when encapsulating the model and interface , My experience is handler.py Among them process Wrong function , Cause problems in processing data , Naturally, there is no way to return the result of reasoning .

Generally, the models are tested locally before they are deployed online , So make sure there are no errors in the model file itself .

see logs

Running torchserve Under the directory of , Will generate a logs Folder , It automatically saves some logs, among model.logs Save the operation information of the model , Including the above errors Believe in information , You can go back to the log to see , What exactly is wrong with the model .