
A Hands-On Guide to NLP from Scratch, Built on HAI Applications

{"type":"doc","content":[
{"type":"heading","attrs":{"id":"39e6b163-76a6-4225-b19b-67a3caa3853d","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","text":"A \"Language Code\" Anyone Can Understand"}]},
{"type":"paragraph","attrs":{"id":"1f44c0bb-b0aa-4a0e-a906-7b28a0100f5a","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"In a comment section flooded with social-media posts, how does AI instantly tell an angry rant from sincere praise? How does a shopping platform automatically file \"terrible quality\" as a negative review and \"great value for money\" as a positive one? The technology at work behind this is natural language processing (NLP). In this article, we use sentiment analysis as an entry point and, with examples and analogies, walk you through the complete lifecycle of an NLP project. No specialist background is required: by following the steps, you can build a program that classifies the sentiment of product reviews yourself."}]},
{"type":"image","attrs":{"id":"888fdfd4-0a28-4d03-908b-e5aebfb31ecf","src":"https://developer.qcloudimg.com/editor/image/5421023/20250319-c1cf0a6b.png","extension":"png","align":"center","alt":"","showAlt":false,"href":"","boxShadow":"","width":1018,"aspectRatio":"1.413889","status":"success","showText":true,"isPercentage":false,"percentage":0,"isHoverDragHandle":false}},
{"type":"heading","attrs":{"id":"76337965-dbac-4278-bda3-5137d8a9e5d2","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"How Sentiment Analysis Works"}]},
{"type":"paragraph","attrs":{"id":"8650f5bb-1ae2-4125-abe8-2be2cca1c175","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"As the name suggests, sentiment analysis means having a computer decide whether a piece of text is "},{"type":"text","marks":[{"type":"bold"}],"text":"positive"},{"type":"text","text":" (expressing favorable feelings, such as \"very good\" or \"great value\") or "},{"type":"text","marks":[{"type":"bold"}],"text":"negative"},{"type":"text","text":" (expressing unfavorable feelings, such as \"awful\" or \"I regret buying it\"). To get there, the computer goes through the following steps:"}]},
{"type":"orderedList","attrs":{"id":"f509ce7d-09c6-4074-abad-be8038cf9433","start":1,"isHoverDragHandle":false},"content":[{"type":"listItem","attrs":{"id":"bd12e9d0-ddbc-49a6-8e9f-52c1036a3ebe"},"content":[{"type":"paragraph","attrs":{"id":"4a51c6e4-6b39-4872-8a94-e77a6681b4c4","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Data collection"},{"type":"text","text":": gather text labeled with sentiment, such as e-commerce reviews or social-media comments."}]}]},{"type":"listItem","attrs":{"id":"69727c09-17a4-481f-bc4e-2fe7e72acc03"},"content":[{"type":"paragraph","attrs":{"id":"f8887362-8ed3-431d-bb04-f959aa640adc","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Text preprocessing"},{"type":"text","text":": strip irrelevant characters, segment the text into words, remove stop words, and so on, so that the text is \"clean\"."}]}]},{"type":"listItem","attrs":{"id":"3336dfaf-d4a0-4c2d-b231-06117250a342"},"content":[{"type":"paragraph","attrs":{"id":"57e8598a-b262-43a9-979d-d6e6b793f255","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Feature extraction"},{"type":"text","text":": convert text into numbers a model can work with, for example using "},{"type":"text","marks":[{"type":"code"}],"text":"TF-IDF"},{"type":"text","text":" to weight how important each word is to a given text."}]}]},{"type":"listItem","attrs":{"id":"f6c027ac-0181-47e9-bbbf-e838bc903633"},"content":[{"type":"paragraph","attrs":{"id":"3916fcbb-e4a6-4259-9489-f3a42f24d088","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Model training"},{"type":"text","text":": choose a suitable machine-learning model (such as a support vector machine, SVM) and let it learn which words are associated with \"positive\" or \"negative\"."}]}]},{"type":"listItem","attrs":{"id":"cd72764e-5f94-4f64-a45b-ccaa6c5f3ab0"},"content":[{"type":"paragraph","attrs":{"id":"391b2eee-d553-4ca7-9bd0-6df9c3678ad9","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Model evaluation"},{"type":"text","text":": measure the model's performance on held-out test data, then tune it accordingly."}]}]},{"type":"listItem","attrs":{"id":"6d0a44dc-7c0b-41bb-bd7f-9a5f5723f671"},"content":[{"type":"paragraph","attrs":{"id":"010bf0b1-f67c-4fdb-a594-ee840f99a300","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Sentiment prediction"},{"type":"text","text":": label new text as positive or negative."}]}]}]},
{"type":"paragraph","attrs":{"id":"8879548a-27c8-4395-88fa-27639e828eef","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Next, we follow this workflow to build a "},{"type":"text","marks":[{"type":"bold"}],"text":"Chinese sentiment analysis system"},{"type":"text","text":" step by step."}]},
{"type":"heading","attrs":{"id":"da64f5aa-ea6c-4659-9a14-5bfbc3ddf922","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Preparation: Installing the Required Libraries"}]},
{"type":"paragraph","attrs":{"id":"f7f417cb-d219-462c-b5fd-c471cb291bb2","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Before writing any code, we need to install a few Python libraries for word segmentation, feature extraction, and machine-learning modeling. Run the following in a JupyterLab cell to install them:"}]},
{"type":"codeBlock","attrs":{"id":"6d73a9b8-4e93-4812-8972-026986cc4a02","language":"python","theme":"atom-one-dark","runtimes":0,"isHoverDragHandle":false,"key":"","languageByAi":"python"},"content":[{"type":"text","text":"%pip install jieba pandas scikit-learn wordcloud matplotlib"}]},
{"type":"paragraph","attrs":{"id":"9f02a840-6843-4739-8399-89071bfebd96","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Here, "},{"type":"text","marks":[{"type":"code"}],"text":"jieba"},{"type":"text","text":" handles Chinese word segmentation, "},{"type":"text","marks":[{"type":"code"}],"text":"pandas"},{"type":"text","text":" holds the dataset as a table, "},{"type":"text","marks":[{"type":"code"}],"text":"scikit-learn"},{"type":"text","text":" provides the machine-learning models, "},{"type":"text","marks":[{"type":"code"}],"text":"wordcloud"},{"type":"text","text":" generates word clouds, and "},{"type":"text","marks":[{"type":"code"}],"text":"matplotlib"},{"type":"text","text":" handles visualization."}]},
{"type":"image","attrs":{"id":"e8c30cf2-be2f-473e-8487-6959c388cbe3","src":"https://developer.qcloudimg.com/editor/image/5421023/20250317-9b04d754.png","extension":"png","align":"center","alt":"","showAlt":false,"href":"","boxShadow":"","width":1100,"aspectRatio":"5.296296","status":"success","showText":true,"isPercentage":false,"percentage":0,"isHoverDragHandle":false}},
{"type":"heading","attrs":{"id":"b18dbd98-b480-4ceb-ac39-31dedb86e30a","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Data Preparation"}]},
{"type":"paragraph","attrs":{"id":"e73ca0b6-1344-4723-ab60-9d80e7b890ff","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"For demonstration, we create a small simulated dataset of ten e-commerce reviews, each with a sentiment label (1 = positive, 0 = negative):"}]},
{"type":"bulletList","attrs":{"id":"2a038ac5-6ab2-4301-8b69-c71f2e8a64e0","isHoverDragHandle":false},"content":[{"type":"listItem","attrs":{"id":"85f447b8-60a4-4566-bbc8-7fe169721a52"},"content":[{"type":"paragraph","attrs":{"id":"db86909d-4968-4ac2-b3c0-c3185cb13b66","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"code"}],"text":"data"},{"type":"text","text":" is a dictionary holding the review texts under "},{"type":"text","marks":[{"type":"code"}],"text":"text"},{"type":"text","text":" and their sentiment labels under "},{"type":"text","marks":[{"type":"code"}],"text":"sentiment"},{"type":"text","text":"."}]}]},{"type":"listItem","attrs":{"id":"ee4f037c-e57d-4a68-953c-ad714b220a03"},"content":[{"type":"paragraph","attrs":{"id":"ace243e2-d1b8-4bdd-8a6f-cff606dd733c","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"code"}],"text":"pd.DataFrame(data)"},{"type":"text","text":" converts it into a "},{"type":"text","marks":[{"type":"code"}],"text":"pandas"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"code"}],"text":"DataFrame"},{"type":"text","text":" for the steps that follow."}]}]}]},
{"type":"codeBlock","attrs":{"id":"1501a25f-2587-4f26-8361-b81d217ca2e6","language":"python","theme":"atom-one-dark","runtimes":0,"isHoverDragHandle":false,"key":"","languageByAi":"python"},"content":[{"type":"text","text":"# === Import libraries ===\nimport re\nimport jieba\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom wordcloud import WordCloud\nfrom sklearn.svm import SVC\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.metrics import classification_report\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.feature_extraction.text import TfidfVectorizer\n\n# === Custom Chinese stop-word list ===\nstopwords = set(\"\"\"\n的 了 是 就 和 在 有 我 这 个 也 不 都 要 还 又 很 让 之 与 等 而\n啊 呀 呢 吧 哦 哇 嘛 嗯 唉 啦 哟 么 哪 什么 怎么 为什么\n\"\"\".split())\n\n# === Dataset (simulated e-commerce reviews) ===\ndata = {\n    \"text\": [\n        \"手机颜值超高,运行速度特别快,拍照效果惊艳!\",  # great looks, very fast, stunning photos\n        \"物流慢得离谱,等了一个月才到货\",  # absurdly slow delivery, took a month\n        \"性价比一般,电池续航没有宣传的那么好\",  # so-so value, battery life worse than advertised\n        \"客服小姐姐态度超好,问题解决得非常快\",  # very friendly support, problem solved quickly\n        \"商品有严重质量问题,完全不能用\",  # serious quality problems, unusable\n        \"这个价格能买到这样的品质真的很划算\",  # this quality at this price is a real bargain\n        \"包装破损严重,明显是二手商品\",  # badly damaged packaging, clearly second-hand\n        \"操作界面非常人性化,老人也能轻松使用\",  # user-friendly, even elderly people can use it\n        \"广告宣传和实物差距太大,感觉被欺骗\",  # far from the ads, feels like a scam\n        \"系统流畅不卡顿,游戏体验特别棒\"  # smooth with no lag, great for gaming\n    ],\n    \"sentiment\": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]  # 1 = positive, 0 = negative\n}\ndf = pd.DataFrame(data)"}]},
{"type":"heading","attrs":{"id":"10ec3727-caff-4dff-8cae-5743ad2a6761","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Text Preprocessing"}]},
{"type":"paragraph","attrs":{"id":"4f0d8a7a-e9ac-48b4-b5ec-56dfe9d59a6e","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Computers cannot work with raw Chinese text directly, and written Chinese has no spaces between words, so we first apply "},{"type":"text","marks":[{"type":"bold"}],"text":"text preprocessing"},{"type":"text","text":", which consists of:"}]},
{"type":"orderedList","attrs":{"id":"689243b3-e40f-4545-ade3-28ff46b7a67a","start":1,"isHoverDragHandle":false},"content":[{"type":"listItem","attrs":{"id":"439f68d1-78c1-462b-9b19-9c9719ccb772"},"content":[{"type":"paragraph","attrs":{"id":"4973f114-e9e6-463b-a59f-21de52c1925f","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Regex cleaning: remove every character that is not Chinese, including punctuation, digits, Latin letters, and URLs"}]}]},{"type":"listItem","attrs":{"id":"135fafcd-119d-4d79-80d0-86919fb626a7"},"content":[{"type":"paragraph","attrs":{"id":"2afea324-d139-4b51-abf6-79e5923839cf","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Word segmentation: split the unbroken run of characters into a sequence of words"}]}]},{"type":"listItem","attrs":{"id":"c7c0ad01-e78a-4007-a04f-196ee34ecf4f"},"content":[{"type":"paragraph","attrs":{"id":"d6f6256e-0637-4e18-93f5-7e74f14aac07","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Stop-word filtering: drop function words such as \"的\" and \"了\" that carry little meaning"}]}]},{"type":"listItem","attrs":{"id":"e9866efb-bef3-4c45-a772-ccaa72670b10"},"content":[{"type":"paragraph","attrs":{"id":"17b79f13-a3c4-4c02-aaf6-08d42753289e","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Length filtering: keep only words of at least two characters. This also discards single-character sentiment words such as \"好\" (good) and \"差\" (bad), a trade-off worth keeping in mind"}]}]}]},
{"type":"paragraph","attrs":{"id":"721ff918-d744-4bc1-8b41-0dd414910816","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"We use "},{"type":"text","marks":[{"type":"code"}],"text":"jieba"},{"type":"text","text":" for segmentation, together with the stop-word list defined in the data-preparation code:"}]},
{"type":"codeBlock","attrs":{"id":"36a07339-4961-4580-9125-92f2356a8725","language":"python","theme":"atom-one-dark","runtimes":0,"isHoverDragHandle":false,"key":"","languageByAi":"python"},"content":[{"type":"text","text":"# === Chinese preprocessing function ===\ndef chinese_text_processing(text):\n    # Keep only Chinese characters (U+4E00 to U+9FA5); punctuation, digits and Latin letters are deleted\n    text = re.sub(r'[^\\u4e00-\\u9fa5]', '', text)\n    # Segment in jieba's accurate mode (the default)\n    words = jieba.lcut(text)\n    # Drop stop words and single-character tokens\n    words = [word for word in words if len(word) > 1 and word not in stopwords]\n    # Join with spaces so TfidfVectorizer can split the tokens again\n    return ' '.join(words)\n\n# Apply the preprocessing to every review\ndf['processed'] = df['text'].apply(chinese_text_processing)"}]},
{"type":"bulletList","attrs":{"id":"0b966ce8-e406-4b66-8d0c-7782d61fdb78","isHoverDragHandle":false},"content":[{"type":"listItem","attrs":{"id":"56915c1a-d7c3-4c3d-a0f5-507436307cb2"},"content":[{"type":"paragraph","attrs":{"id":"2389a085-a5bd-40f4-9b3e-b51b6840755a","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"code"}],"text":"re.sub(r'[^\\u4e00-\\u9fa5]', '', text)"},{"type":"text","text":" keeps only characters in the range U+4E00 to U+9FA5, which covers common Chinese characters."}]}]},{"type":"listItem","attrs":{"id":"83909151-2d15-471c-acb4-4592490a87aa"},"content":[{"type":"paragraph","attrs":{"id":"e9ef1586-4162-4b61-9c24-589703e69e42","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"code"}],"text":"jieba.lcut(text)"},{"type":"text","text":" segments the sentence and returns a list of words."}]}]},{"type":"listItem","attrs":{"id":"fff4fad3-1647-453e-858f-e60945eedb85"},"content":[{"type":"paragraph","attrs":{"id":"21a63829-2dda-4166-93f1-ffb70d59226f","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"code"}],"text":"stopwords"},{"type":"text","text":" is the stop-word set used to drop frequent words that carry little meaning."}]}]},{"type":"listItem","attrs":{"id":"4f1e3a09-666d-4933-9786-9f3b07f01485"},"content":[{"type":"paragraph","attrs":{"id":"3675186e-bbc1-4dc7-b21b-7e8c34aed957","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"code"}],"text":"df['processed'] = df['text'].apply(chinese_text_processing)"},{"type":"text","text":" applies the preprocessing function to every row of the "},{"type":"text","marks":[{"type":"code"}],"text":"text"},{"type":"text","text":" column and stores the result in a new "},{"type":"text","marks":[{"type":"code"}],"text":"processed"},{"type":"text","text":" column."}]}]}]},
{"type":"paragraph","attrs":{"id":"a9e96429-f052-491a-af08-d0b11b2df54b","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"After processing, \"手机颜值超高,运行速度特别快,拍照效果惊艳!\" (\"The phone looks great, runs really fast, and takes stunning photos!\") might become "},{"type":"text","marks":[{"type":"bold"}],"text":"\"手机 颜值 超高 运行 速度 特别快 拍照 效果 惊艳\""},{"type":"text","text":"; the exact split depends on jieba's dictionary."}]},
{"type":"heading","attrs":{"id":"d1940063-400f-45b2-856e-d1ef455f4617","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Visualizing the Data"}]},
{"type":"paragraph","attrs":{"id":"925afcce-39e9-44d4-9bad-631f1ed85134","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"Before training the model, we can use a "},{"type":"text","marks":[{"type":"bold"}],"text":"word cloud"},{"type":"text","text":" to see which words appear most often in the reviews."}]},
{"type":"codeBlock","attrs":{"id":"67da8dee-f87d-4af7-8b96-8721262ec895","language":"python","theme":"atom-one-dark","runtimes":0,"isHoverDragHandle":false,"key":"","languageByAi":"python"},"content":[{"type":"text","text":"# === Generate a word cloud ===\nplt.figure(figsize=(10, 6))\nwordcloud = WordCloud(\n    font_path='simhei.ttf',  # a font with Chinese glyphs is required; adjust the path for your system\n    width=800,\n    height=600,\n    background_color='white'\n).generate(' '.join(df['processed']))\nplt.imshow(wordcloud, interpolation='bilinear')\nplt.title(\"Keyword cloud of user reviews\")  # English title: matplotlib's default font lacks Chinese glyphs\nplt.axis(\"off\")\nplt.show()"}]},
{"type":"heading","attrs":{"id":"06efecd4-0576-4128-9f39-786d417272fa","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Building the Machine-Learning Model (Feature Engineering)"}]},
{"type":"paragraph","attrs":{"id":"bf836343-1671-4707-8392-a38a768710da","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"We use "},{"type":"text","marks":[{"type":"code"}],"text":"TF-IDF"},{"type":"text","text":" to extract text features and a "},{"type":"text","marks":[{"type":"bold"}],"text":"support vector machine (SVM)"},{"type":"text","text":" to classify them:"}]},
{"type":"orderedList","attrs":{"id":"9fa6f30d-c053-4b67-91ec-e058cb37d330","start":1,"isHoverDragHandle":false},"content":[{"type":"listItem","attrs":{"id":"bec5ce7d-5cae-41a1-ac83-ee1923394502"},"content":[{"type":"paragraph","attrs":{"id":"57e9d01e-597d-433b-bdd0-a0ecd4faf3c2","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"TfidfVectorizer: extracts TF-IDF features. ngram_range=(1,2) uses both single words and pairs of adjacent words, and max_features=500 keeps at most the 500 terms with the highest frequency across the training data."}]}]},{"type":"listItem","attrs":{"id":"c1d5c5ee-8efa-45c2-a5a9-532c2d616e37"},"content":[{"type":"paragraph","attrs":{"id":"4fc8db50-7c32-4c01-9e54-69a607a58f67","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"SVC(kernel='linear', probability=True): classifies with a support vector machine using a linear kernel; probability=True additionally enables predict_proba."}]}]},{"type":"listItem","attrs":{"id":"e11a2700-19c4-434d-9724-a16b187a1763"},"content":[{"type":"paragraph","attrs":{"id":"277f8aab-f797-4542-bf4b-e54e8adbcdad","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"train_test_split(df['processed'], df['sentiment'], test_size=0.3, stratify=df['sentiment']): holds out 30% of the data for testing (3 of the 10 reviews) and keeps the ratio of positive to negative reviews as similar as possible in both splits."}]}]}]},
{"type":"codeBlock","attrs":{"id":"dbd71b94-bb5e-42eb-9c10-1a47b743a86c","language":"python","theme":"atom-one-dark","runtimes":0,"isHoverDragHandle":false,"key":"","languageByAi":"python"},"content":[{"type":"text","text":"# === Build the machine-learning pipeline ===\npipeline = Pipeline([\n    ('tfidf', TfidfVectorizer(\n        ngram_range=(1, 2),  # single words and pairs of adjacent words\n        max_features=500)),  # keep at most the 500 most frequent terms\n    ('clf', SVC(kernel='linear', probability=True))  # linear-kernel SVM with probability estimates\n])\n\n# === Split the dataset ===\nX_train, X_test, y_train, y_test = train_test_split(\n    df['processed'],\n    df['sentiment'],\n    test_size=0.3,\n    stratify=df['sentiment'],  # keep the class ratio similar in both splits\n    random_state=42\n)\n\n# Train the model\npipeline.fit(X_train, y_train)"}]},
{"type":"heading","attrs":{"id":"cbc957e6-9e1d-4eb4-a04a-dfbd7ad96567","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Model Evaluation"}]},
{"type":"paragraph","attrs":{"id":"7975a352-20e1-426d-8fb7-4d88fbce72d8","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"After training, we evaluate the model on the test set. "},{"type":"text","marks":[{"type":"code"}],"text":"classification_report(y_test, y_pred)"},{"type":"text","text":" reports precision, recall, and F1-score for each class, plus overall accuracy."}]},
{"type":"codeBlock","attrs":{"id":"527e6ebb-b387-460b-969c-4b5d944a10c9","language":"python","theme":"atom-one-dark","runtimes":0,"isHoverDragHandle":false,"key":"","languageByAi":"python"},"content":[{"type":"text","text":"# === Evaluate the model ===\nprint(\"\\n=== Model evaluation report ===\\n\")\ny_pred = pipeline.predict(X_test)\nprint(classification_report(\n    y_test, y_pred,\n    labels=[0, 1],\n    target_names=['negative', 'positive'],\n    zero_division=0  # report 0 instead of warning when a class is never predicted\n))"}]},
{"type":"paragraph","attrs":{"id":"a62d1b1b-3512-4a00-a951-bab40c14d691","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"With only 3 reviews in the test set, these scores are very noisy: a single misclassification moves accuracy by about 33 percentage points. Consistently high accuracy and F1-score on a much larger held-out set are needed before concluding that the model works well."}]},
{"type":"image","attrs":{"id":"90549490-b414-4380-b838-997f4d664aa0","src":"https://developer.qcloudimg.com/editor/image/5421023/20250317-6b0183bc.png","extension":"png","align":"center","alt":"","showAlt":false,"href":"","boxShadow":"","width":763,"aspectRatio":"1.067133","status":"success","showText":true,"isPercentage":false,"percentage":0,"isHoverDragHandle":false}},
{"type":"heading","attrs":{"id":"b1e27e84-b198-43b5-88c9-f57f6b9552aa","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","marks":[{"type":"bold"}],"text":"Sentiment Prediction"}]},
{"type":"paragraph","attrs":{"id":"da42c664-3cdd-42c4-876f-4cc092de9122","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"We can now use the trained model to predict the sentiment of new reviews. "},{"type":"text","marks":[{"type":"code"}],"text":"predict_proba"},{"type":"text","text":" returns a probability for each class. We take the most probable class as the prediction and report its probability as the confidence. As the scikit-learn documentation notes, "},{"type":"text","marks":[{"type":"code"}],"text":"SVC"},{"type":"text","text":" estimates these probabilities with internal cross-validation. They can therefore disagree with "},{"type":"text","marks":[{"type":"code"}],"text":"predict"},{"type":"text","text":" and are not meaningful on a dataset this small."}]},
{"type":"codeBlock","attrs":{"id":"bc86e5a8-237e-483d-9540-4472a55a3545","language":"python","theme":"atom-one-dark","runtimes":0,"isHoverDragHandle":false,"key":"","languageByAi":"python"},"content":[{"type":"text","text":"# === Prediction function ===\ndef predict_sentiment(text):\n    processed_text = chinese_text_processing(text)\n    # Take the label and its confidence from the same probability vector so they agree;\n    # SVC's predict() can disagree with predict_proba() on small datasets\n    proba = pipeline.predict_proba([processed_text])[0]\n    best = int(np.argmax(proba))\n    label = pipeline.classes_[best]\n    return {\n        'text': text,\n        'sentiment': 'positive' if label == 1 else 'negative',\n        'confidence': f\"{proba[best] * 100:.1f}%\"\n    }\n\n# === Test cases ===\ntest_cases = [\n    \"这次购物体验真是糟糕透顶!\",  # what a terrible shopping experience\n    \"物超所值,绝对五星好评!\",  # great value, definitely five stars\n    \"中规中矩,没什么特别的感觉\",  # average, nothing special\n    \"这手机烫得可以煎鸡蛋了\",  # the phone gets hot enough to fry an egg\n    \"操作流畅得让人感动\"  # so smooth it is almost moving\n]\n\nprint(\"\\n=== Prediction test ===\")\nfor case in test_cases:\n    res = predict_sentiment(case)\n    print(f\"「{res['text']}」 → {res['sentiment']} (confidence: {res['confidence']})\")"}]},
{"type":"image","attrs":{"id":"636a73bc-a50f-4c2d-8dfc-e3298936f373","src":"https://developer.qcloudimg.com/editor/image/5421023/20250317-a4f5d5de.png","extension":"png","align":"center","alt":"","showAlt":false,"href":"","boxShadow":"","width":988,"aspectRatio":"2.398058","status":"success","showText":true,"isPercentage":false,"percentage":0,"isHoverDragHandle":false}},
{"type":"heading","attrs":{"id":"bcc89970-c63d-4a53-bf5a-ce234a5aac27","textAlign":"inherit","indent":0,"level":2,"isHoverDragHandle":false},"content":[{"type":"text","text":"Summary"}]},
{"type":"paragraph","attrs":{"id":"425c3a3c-4fd2-4ead-ae31-085adbdcfaf8","textAlign":"inherit","indent":0,"color":null,"background":null,"isHoverDragHandle":false},"content":[{"type":"text","text":"In this tutorial, we built a complete "},{"type":"text","marks":[{"type":"bold"}],"text":"Chinese sentiment analysis system"},{"type":"text","text":", covering "},{"type":"text","marks":[{"type":"bold"}],"text":"text preprocessing, feature extraction, model training, and prediction"},{"type":"text","text":". The prediction results are not yet satisfactory, which is to be expected from a model trained on only seven reviews. Future improvements include training on a much larger labeled dataset or using "},{"type":"text","marks":[{"type":"bold"}],"text":"deep learning (such as BERT)"},{"type":"text","text":" to improve accuracy."}]}
]}
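The regex-cleaning step in `chinese_text_processing` can be tried on its own with only the standard library. This sketch reuses the article's character class. The sample string, including its URL and "5G", is made up for illustration. It also shows a variant, `clean_keep_boundaries`, which is our assumption and not the article's code. The variant turns each run of removed characters into a single space, so clause boundaries survive instead of adjacent clauses being glued together before segmentation.

```python
import re

# The article's character class: anything outside U+4E00..U+9FA5 is removed
NON_CHINESE = re.compile(r'[^\u4e00-\u9fa5]')
# Variant (an assumption, not the article's code): collapse each removed run into one space
NON_CHINESE_RUN = re.compile(r'[^\u4e00-\u9fa5]+')


def clean(text: str) -> str:
    """Delete every non-Chinese character, as the article does."""
    return NON_CHINESE.sub('', text)


def clean_keep_boundaries(text: str) -> str:
    """Replace each run of non-Chinese characters with one space, keeping clause boundaries."""
    return NON_CHINESE_RUN.sub(' ', text).strip()


sample = "手机颜值超高,运行速度特别快! http://example.com 5G"
print(clean(sample))                  # 手机颜值超高运行速度特别快
print(clean_keep_boundaries(sample))  # 手机颜值超高 运行速度特别快
```

With the space-preserving variant, jieba returns the whitespace as separate tokens. The existing `len(word) > 1` filter then drops them.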
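The length rule in `chinese_text_processing` (`len(word) > 1`) deserves a caution, because many strong Chinese sentiment words are single characters. The stdlib sketch below applies that rule to an assumed segmentation of "质量太差" ("the quality is too poor"). It compares the result with a variant that whitelists a few one-character sentiment words. The token list and the `KEEP_SINGLE` set are hypothetical illustrations, not jieba's guaranteed output or part of the original code.

```python
# Single-character entries from the article's stop-word list (a subset, for brevity)
stopwords = {"的", "了", "是", "很", "不", "也"}

# Hypothetical whitelist of one-character sentiment words (an assumption, not in the article)
KEEP_SINGLE = {"好", "差", "慢", "快", "棒", "烂"}


def article_filter(words):
    """The article's rule: drop stop words and every one-character token."""
    return [w for w in words if len(w) > 1 and w not in stopwords]


def sentiment_aware_filter(words):
    """Variant: additionally keep whitelisted one-character sentiment words."""
    return [w for w in words
            if (len(w) > 1 or w in KEEP_SINGLE) and w not in stopwords]


tokens = ["质量", "太", "差"]  # assumed segmentation of "质量太差"
print(article_filter(tokens))          # ['质量']
print(sentiment_aware_filter(tokens))  # ['质量', '差']
```

The article's rule keeps only the neutral noun, so the review loses its sentiment. Even the variant cannot handle negation: if segmentation splits "不好" into "不" and "好", the stop-word list removes "不" (not) and leaves "好" (good). Negation words generally need dedicated handling in sentiment analysis.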
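The word cloud in the visualization step sizes each word by how often it occurs in the processed reviews. The stdlib sketch below counts those frequencies and scales them relative to the most frequent word, which is roughly how relative sizes come about. The two processed reviews are hypothetical. `WordCloud` also applies processing of its own, such as collocation detection, so this is only an approximation of what it draws.

```python
from collections import Counter

# Hypothetical processed reviews: space-separated tokens from the preprocessing step
processed = [
    "手机 颜值 超高 运行 速度 特别快",
    "系统 流畅 运行 速度 游戏 体验",
]

# Count each token across all reviews
counts = Counter(word for doc in processed for word in doc.split())

# Scale relative to the most frequent token (1.0 = largest size in the cloud)
top = max(counts.values())
relative = {word: n / top for word, n in counts.items()}

for word, n in counts.most_common(2):
    print(word, n, relative[word])
```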
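`ngram_range=(1, 2)` in the pipeline means each review contributes its single words plus every pair of adjacent words. Below is a stdlib sketch of that expansion. It is written to mirror how scikit-learn joins word n-grams with a single space, and it uses part of the article's processed example sentence. The function name `word_ngrams` is our own.

```python
def word_ngrams(tokens, min_n=1, max_n=2):
    """Return all n-grams with min_n <= n <= max_n, each joined by a single space."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


tokens = "手机 颜值 超高".split()
print(word_ngrams(tokens))
# ['手机', '颜值', '超高', '手机 颜值', '颜值 超高']
```

Bigrams can capture short phrases that single words miss. The cost is a larger and sparser vocabulary, which matters when there are only a handful of training reviews.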
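To make "TF-IDF weights how important each word is" concrete, here is a stdlib sketch of the weighting `TfidfVectorizer` applies with its default settings (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`). The steps are:

1. Multiply each raw term count by idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is how many of them contain t.
2. L2-normalize each document vector.

The three tiny, already-segmented documents are made up, and only unigrams are used, for brevity.

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF with TfidfVectorizer's defaults: raw counts, smoothed idf, L2 normalization."""
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: c * idf[term] for term, c in counts.items()}
        # L2-normalize each document (guard against empty documents)
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return idf, vectors


# Made-up, already-segmented documents
docs = ["物流 太慢 质量 一般", "质量 很好 物流 很快", "质量 很好 价格 划算"]
idf, vectors = tfidf(docs)
print(round(idf["质量"], 4), round(idf["划算"], 4))  # 1.0 1.6931
```

A term that appears in every document, such as 质量 (quality) here, gets the minimum idf of 1. It therefore counts for less than rarer, more distinctive terms like 划算 (good value). When `max_features=500` is set, scikit-learn first keeps only the 500 terms with the highest total count across the corpus and then applies this weighting.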
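`classification_report` summarizes per-class precision, recall, and F1-score along with overall accuracy. The stdlib sketch below computes these from first principles to show what each number means. The label vectors are hypothetical, sized like the article's 3-review test set, and illustrate how coarse the scores become with so few samples.

```python
def prf(y_true, y_pred, positive):
    """Precision, recall, and F1 for the class `positive` (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical 3-review test set: 1 = positive, 0 = negative
y_true = [1, 0, 1]
y_pred = [1, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("positive P/R/F1:", prf(y_true, y_pred, 1))
print("negative P/R/F1:", prf(y_true, y_pred, 0))
```

A single wrong prediction out of three drops accuracy from 1.0 to about 0.67. This is why a much larger test set is needed before the scores can be trusted.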
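Why can `predict` and `predict_proba` disagree for `SVC(probability=True)`? According to scikit-learn's documentation, the probabilities come from Platt scaling fitted with internal cross-validation, so they are not tied to the decision threshold at 0. The stdlib sketch below applies the Platt sigmoid P(positive | f) = 1 / (1 + exp(A·f + B)) to a decision value f. The coefficients A and B are hypothetical, chosen only to show the effect, and are not fitted on the article's data.

```python
import math


def platt_probability(decision_value, a, b):
    """Platt sigmoid: probability of the positive class for an SVM decision value."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


# Hypothetical sigmoid coefficients (in practice they are fitted by cross-validation)
A, B = -2.0, 0.5

f = 0.1  # decision value slightly above 0, so the decision function says "positive"
p_pos = platt_probability(f, A, B)
label_from_decision = 1 if f > 0 else 0
label_from_proba = 1 if p_pos >= 0.5 else 0
print(f"P(positive) = {p_pos:.3f}")
print("decision says", label_from_decision, "- probability says", label_from_proba)
```

Taking both the label and its confidence from the same probability vector (its argmax) at least keeps the two consistent. With only seven training reviews, however, the probabilities themselves should not be read as calibrated confidence.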
