'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]]}
The tokenizer also accepts truncation and max_length arguments, which truncate each sequence at max_length:
tokenizer(raw_inputs, padding=True, truncation=True, max_length=7)

The return_tensors argument is also important: it specifies which type of tensors to return, "pt" for PyTorch and "tf" for TensorFlow:
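The combined effect of padding and truncation can be sketched in plain Python. This is a toy illustration only, not the real tokenizer: the "token ids" below are made up, and real tokenizers split on subwords using a learned vocabulary, not on whitespace.

```python
def toy_tokenize(batch, padding=True, truncation=False, max_length=None):
    # Fake encoding: map each whitespace-split word to a made-up id.
    encoded = [[len(w) + 100 for w in s.split()] for s in batch]
    if truncation and max_length is not None:
        # Cut every sequence down to at most max_length tokens.
        encoded = [ids[:max_length] for ids in encoded]
    if padding:
        # Pad every sequence to the length of the longest one with id 0,
        # and mark padded positions with 0 in the attention mask.
        longest = max(len(ids) for ids in encoded)
        attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in encoded]
        encoded = [ids + [0] * (longest - len(ids)) for ids in encoded]
    else:
        attention_mask = [[1] * len(ids) for ids in encoded]
    return {"input_ids": encoded, "attention_mask": attention_mask}

out = toy_tokenize(["a much longer example sentence here", "short one"],
                   padding=True, truncation=True, max_length=4)
```

The longer sentence is truncated to 4 tokens and the shorter one is padded up to 4, which is exactly why the attention_mask printed above ends in a run of zeros for the shorter input.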
tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

The model itself is loaded with AutoModel.from_pretrained:

model = AutoModel.from_pretrained(checkpoint)
Once the model is loaded, the tokenizer's output can be passed directly to it:
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs)

For classification tasks, AutoModelForSequenceClassification loads the same checkpoint with a sequence-classification head on top:

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
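The reason model(**inputs) works is plain Python dict unpacking: ** expands the tokenizer's output dict into keyword arguments, so model(**inputs) is the same as model(input_ids=..., attention_mask=...). A minimal sketch with a stand-in model function (fake_model and the ids below are invented for illustration, not part of transformers):

```python
def fake_model(input_ids, attention_mask):
    # Stand-in for a model's forward pass: count the real (unmasked) tokens
    # in each sequence instead of computing hidden states.
    return [sum(mask) for mask in attention_mask]

inputs = {"input_ids": [[101, 2023, 102], [101, 102, 0]],
          "attention_mask": [[1, 1, 1], [1, 1, 0]]}

# ** unpacks the dict into keyword arguments, matching the parameter names.
lengths = fake_model(**inputs)
```

This is why the keys the tokenizer produces (input_ids, attention_mask) must match the argument names the model's forward method expects.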