Sentiment analysis aims to automatically identify and extract subjective information from text, such as sentiment tendencies, stances, evaluations, and opinions. It covers a wide range of tasks, including sentence-level sentiment classification, aspect-level sentiment classification, opinion extraction, and emotion classification. Sentiment analysis is an important research direction in artificial intelligence with high academic value; it also has important applications in consumer decision-making, public opinion analysis, personalized recommendation, and related areas, giving it high commercial value.
Baidu recently released the sentiment pre-training model SKEP (Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis). SKEP enhances pre-training with sentiment knowledge and surpasses the previous state of the art (SOTA) across 14 typical Chinese and English sentiment analysis tasks. This work was accepted at ACL 2020.
Paper: https://arxiv.org/abs/2005.05635
To make this leading sentiment analysis technology easy for developers and commercial partners to adopt, Baidu has open-sourced the SKEP-based sentiment pre-training code together with Chinese and English pre-trained models in Senta. In addition, to further lower the barrier to entry, the SKEP open-source project includes an industry-oriented, one-click sentiment analysis prediction tool: only a few lines of code are needed to run SKEP-based sentiment pre-training and model prediction.
SKEP is a sentiment-knowledge-enhanced pre-training algorithm proposed by Baidu's research team. It automatically mines sentiment knowledge with an unsupervised method and then uses that knowledge to build the pre-training objectives, so the model learns to understand sentiment semantics. SKEP provides a unified and strong sentiment representation for a wide range of sentiment analysis tasks.
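To make the idea concrete: according to the paper, SKEP replaces random token masking with sentiment masking, hiding automatically mined sentiment words (and aspect-sentiment pairs) and training the model to recover them. The snippet below is only a rough illustration of that masking step, not Senta's actual implementation; the tiny lexicon, the masking budget, and the function name are all invented for the example.

```python
import random

# Toy lexicon standing in for SKEP's automatically mined sentiment words.
SENTIMENT_WORDS = {"great", "terrible", "love", "boring", "excellent"}
MASK = "[MASK]"

def sentiment_masking(tokens, max_mask_ratio=0.1):
    """Mask sentiment words in preference to ordinary tokens (illustrative only)."""
    budget = max(1, int(len(tokens) * max_mask_ratio))
    sentiment_positions = [i for i, tok in enumerate(tokens) if tok.lower() in SENTIMENT_WORDS]
    chosen = set(random.sample(sentiment_positions, min(budget, len(sentiment_positions))))
    masked = [MASK if i in chosen else tok for i, tok in enumerate(tokens)]
    labels = {i: tokens[i] for i in chosen}  # recovery targets for the word-level objective
    return masked, labels

masked, labels = sentiment_masking("the acting is great but the plot is boring".split())
print(masked)   # e.g. ['the', 'acting', 'is', '[MASK]', 'but', 'the', 'plot', 'is', 'boring']
print(labels)   # e.g. {3: 'great'}
```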
Baidu's research team validated SKEP on three typical sentiment analysis tasks, sentence-level sentiment classification, aspect-level sentiment classification, and opinion extraction (Opinion Role Labeling), covering 14 Chinese and English datasets in total. Using the general-purpose pre-trained model ERNIE (an internal version) as initialization, SKEP improves over ERNIE by about 1.2% on average and over the previous SOTA by about 2% on average. Detailed results are shown in the table below:
| Task | Dataset | Language | Metric | Previous SOTA | SKEP | Dataset link |
|---|---|---|---|---|---|---|
| Sentence-level sentiment classification | SST-2 | English | ACC | 97.50 | 97.60 | Download link |
| | Amazon-2 | English | ACC | 97.37 | 97.61 | Download link |
| | ChnSentiCorp | Chinese | ACC | 95.80 | 96.50 | Download link |
| | NLPCC2014-SC | Chinese | ACC | 78.72 | 83.53 | Download link |
| Aspect-level sentiment classification | Sem-L | English | ACC | 81.35 | 81.62 | Download link |
| | Sem-R | English | ACC | 87.89 | 88.36 | Download link |
| | AI-challenge | Chinese | F1 | 72.87 | 72.90 | Not yet released |
| | SE-ABSA16_PHNS | Chinese | ACC | 79.58 | 82.91 | Download link |
| | SE-ABSA16_CAME | Chinese | ACC | 87.11 | 90.06 | Download link |
| Opinion extraction | MPQA-H | English | b-F1/p-F1 | 83.67/77.12 | 86.32/81.11 | Download link |
| | MPQA-T | English | b-F1/p-F1 | 81.59/73.16 | 83.67/77.53 | Download link |
| | COTE_BD | Chinese | F1 | 82.17 | 84.50 | Download link |
| | COTE_MFW | Chinese | F1 | 86.18 | 87.90 | Download link |
| | COTE_DP | Chinese | F1 | 84.33 | 86.30 | Download link |
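As a rough sanity check on the "about 2% over the previous SOTA" figure, averaging the per-dataset gains in the table (and, as our own simplification, averaging the two F1 values reported for the MPQA rows) gives roughly 1.8 points:

```python
# Quick check of the average SKEP gain over the previous SOTA, from the table above.
rows = {
    "SST-2":          (97.50, 97.60),
    "Amazon-2":       (97.37, 97.61),
    "ChnSentiCorp":   (95.80, 96.50),
    "NLPCC2014-SC":   (78.72, 83.53),
    "Sem-L":          (81.35, 81.62),
    "Sem-R":          (87.89, 88.36),
    "AI-challenge":   (72.87, 72.90),
    "SE-ABSA16_PHNS": (79.58, 82.91),
    "SE-ABSA16_CAME": (87.11, 90.06),
    # MPQA rows report b-F1/p-F1; averaging the two is our own simplification.
    "MPQA-H":         ((83.67 + 77.12) / 2, (86.32 + 81.11) / 2),
    "MPQA-T":         ((81.59 + 73.16) / 2, (83.67 + 77.53) / 2),
    "COTE_BD":        (82.17, 84.50),
    "COTE_MFW":       (86.18, 87.90),
    "COTE_DP":        (84.33, 86.30),
}
gains = [skep - sota for sota, skep in rows.values()]
print(f"average gain over previous SOTA: {sum(gains) / len(gains):.2f}")  # ~1.82 points
```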
The repository is organized as follows:

.
├── README.md
├── requirements.txt
├── senta            # Senta core code: models, data readers, tokenization, etc.
├── script           # Entry scripts for each sentiment analysis task; they run training and prediction via the config files
├── config           # Task configuration files specifying the model, hyperparameters, data, etc.

To lower the barrier to entry, Baidu has integrated an industry-oriented, one-click sentiment analysis prediction tool into the SKEP open-source project. Installation and usage are described below.
This repository can be installed either via pip or from source. In both cases PaddlePaddle must be installed first; please refer to the PaddlePaddle installation documentation.
Install with pip:

python -m pip install Senta

Or install from source:

git clone https://github.com/baidu/Senta.git
cd Senta
python -m pip install .

A quick-start example:

from senta import Senta
my_senta = Senta()
# List the supported sentiment pre-training models. SKEP models initialized from ERNIE 1.0 large (Chinese), ERNIE 2.0 large (English), and RoBERTa large (English) are available.
print(my_senta.get_support_model()) # ["ernie_1.0_skep_large_ch", "ernie_2.0_skep_large_en", "roberta_skep_large_en"]
# List the supported prediction tasks
print(my_senta.get_support_task()) # ["sentiment_classify", "aspect_sentiment_classify", "extraction"]
# Whether to run on GPU
use_cuda = True # set to True or False
# Chinese sentence-level sentiment classification
my_senta.init_model(model_class="ernie_1.0_skep_large_ch", task="sentiment_classify", use_cuda=use_cuda)
texts = ["中山大学是岭南第一学府"]
result = my_senta.predict(texts)
print(result)
# Chinese aspect-level sentiment classification
my_senta.init_model(model_class="ernie_1.0_skep_large_ch", task="aspect_sentiment_classify", use_cuda=use_cuda)
texts = ["百度是一家高科技公司"]
aspects = ["百度"]
result = my_senta.predict(texts, aspects)
print(result)
# Chinese opinion extraction
my_senta.init_model(model_class="ernie_1.0_skep_large_ch", task="extraction", use_cuda=use_cuda)
texts = ["唐 家 三 少 , 本 名 张 威 。"]
result = my_senta.predict(texts)
print(result)
# English sentence-level sentiment classification (with the SKEP-ERNIE2.0 model)
my_senta.init_model(model_class="ernie_2.0_skep_large_en", task="sentiment_classify", use_cuda=use_cuda)
texts = ["a sometimes tedious film ."]
result = my_senta.predict(texts)
print(result)
# English aspect-level sentiment classification (with the SKEP-ERNIE2.0 model)
my_senta.init_model(model_class="ernie_2.0_skep_large_en", task="aspect_sentiment_classify", use_cuda=use_cuda)
texts = ["I love the operating system and the preloaded software."]
aspects = ["operating system"]
result = my_senta.predict(texts, aspects)
print(result)
# English opinion extraction (with the SKEP-ERNIE2.0 model)
my_senta.init_model(model_class="ernie_2.0_skep_large_en", task="extraction", use_cuda=use_cuda)
texts = ["The JCC would be very pleased to welcome your organization as a corporate sponsor ."]
result = my_senta.predict(texts)
print(result)
# English sentence-level sentiment classification (with the SKEP-RoBERTa model)
my_senta.init_model(model_class="roberta_skep_large_en", task="sentiment_classify", use_cuda=use_cuda)
texts = ["a sometimes tedious film ."]
result = my_senta.predict(texts)
print(result)
# English aspect-level sentiment classification (with the SKEP-RoBERTa model)
my_senta.init_model(model_class="roberta_skep_large_en", task="aspect_sentiment_classify", use_cuda=use_cuda)
texts = ["I love the operating system and the preloaded software."]
aspects = ["operating system"]
result = my_senta.predict(texts, aspects)
print(result)
# English opinion extraction (with the SKEP-RoBERTa model)
my_senta.init_model(model_class="roberta_skep_large_en", task="extraction", use_cuda=use_cuda)
texts = ["The JCC would be very pleased to welcome your organization as a corporate sponsor ."]
result = my_senta.predict(texts)
print(result)
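Because each `init_model` call loads a different checkpoint, it can be convenient to wrap the quickstart above in a small helper that initializes a model once and then reuses it. The sketch below uses only the `Senta` calls shown above (`get_support_model`, `get_support_task`, `init_model`, `predict`); the helper name, its defaults, and the assumption that `predict` accepts a list with more than one text are ours.

```python
from senta import Senta

def build_predictor(model_class="ernie_1.0_skep_large_ch",
                    task="sentiment_classify", use_cuda=False):
    """Initialize a Senta/SKEP model once and return its predict function."""
    senta = Senta()
    # Fail early if the requested model or task is not supported.
    assert model_class in senta.get_support_model(), model_class
    assert task in senta.get_support_task(), task
    senta.init_model(model_class=model_class, task=task, use_cuda=use_cuda)
    return senta.predict

# Reuse one loaded model for several inputs (multi-text batching is an assumption,
# not documented above; fall back to one text per call if needed).
predict = build_predictor(use_cuda=False)
print(predict(["中山大学是岭南第一学府", "这部电影太无聊了"]))
```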
Descriptions of the datasets used in this project, together with download instructions and usage examples, are given below.

This project can also be used to reproduce the results reported in the paper Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis. An example of how to reproduce them follows:
# Download the English SKEP sentiment pre-training model initialized from RoBERTa (SKEP-RoBERTa for short)
sh download_roberta_skep_large_en.sh

# Fine-tune and predict with SKEP-RoBERTa on English sentence-level sentiment classification (example data: SST-2)
sh ./script/run_train.sh ./config/roberta_skep_large_en.SST-2.cls.json   # fine-tuning
sh ./script/run_infer.sh ./config/roberta_skep_large_en.SST-2.infer.json # prediction

# Fine-tune and predict with SKEP-RoBERTa on English aspect-level sentiment classification (example data: Sem-L)
sh ./script/run_train.sh ./config/roberta_skep_large_en.absa_laptops.cls.json   # fine-tuning
sh ./script/run_infer.sh ./config/roberta_skep_large_en.absa_laptops.infer.json # prediction

# Fine-tune and predict with SKEP-RoBERTa on English opinion extraction (example data: MPQA)
sh ./script/run_train.sh ./config/roberta_skep_large_en.MPQA.orl.json   # fine-tuning
sh ./script/run_infer.sh ./config/roberta_skep_large_en.MPQA.infer.json # prediction

Note: to reproduce the results on the datasets from the paper, adjust the parameter settings of the corresponding tasks as described in the paper.
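If several of these fine-tune/inference rounds need to run back to back, the shell entry points above can also be driven from a short Python script. The config paths are exactly the ones listed above; the loop itself is just a convenience wrapper of our own.

```python
import subprocess

# (train config, inference config) pairs taken from the commands above.
RUNS = [
    ("./config/roberta_skep_large_en.SST-2.cls.json",
     "./config/roberta_skep_large_en.SST-2.infer.json"),
    ("./config/roberta_skep_large_en.absa_laptops.cls.json",
     "./config/roberta_skep_large_en.absa_laptops.infer.json"),
    ("./config/roberta_skep_large_en.MPQA.orl.json",
     "./config/roberta_skep_large_en.MPQA.infer.json"),
]

for train_cfg, infer_cfg in RUNS:
    subprocess.run(["sh", "./script/run_train.sh", train_cfg], check=True)  # fine-tune
    subprocess.run(["sh", "./script/run_infer.sh", infer_cfg], check=True)  # predict
```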
If you use the code, models, or methods from this project, please cite our work in the relevant documentation or papers:
@inproceedings{tian-etal-2020-skep,
title = "{SKEP}: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis",
author = "Tian, Hao and
Gao, Can and
Xiao, Xinyan and
Liu, Hao and
He, Bolei and
Wu, Hua and
Wang, Haifeng and
Wu, Feng",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.374",
pages = "4067--4076",
abstract = "Recently, sentiment analysis has seen remarkable advance with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that they are widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Se