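After fitting a TfidfVectorizer, you can get the highest TF-IDF value for each class of words with the following steps (in this example, each document is treated as its own class):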
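from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

documents = [
    "This is document 1.",
    "This is document 2.",
    "Document 3 is different from the others."
]

# Fit the vectorizer and convert the sparse result to a dense array:
# one row per document, one column per vocabulary term.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)
tfidf_array = tfidf_matrix.toarray()

# Column index -> term. get_feature_names_out() requires scikit-learn >= 1.0;
# the older get_feature_names() is no longer available in recent releases.
feature_names = vectorizer.get_feature_names_out()

# For each row, record the term with the largest TF-IDF value.
max_tfidf_per_class = []
for i in range(tfidf_array.shape[0]):
    max_tfidf_idx = np.argmax(tfidf_array[i])  # on ties, the first index wins
    max_tfidf_value = tfidf_array[i][max_tfidf_idx]
    max_tfidf_word = feature_names[max_tfidf_idx]
    max_tfidf_per_class.append((max_tfidf_word, max_tfidf_value))

for word, value in max_tfidf_per_class:
    print("Word: {}, TF-IDF value: {}".format(word, value))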
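The steps above retrieve the word with the highest TF-IDF value for each class. TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure of how important a term is to a document within a collection of documents. It is commonly used in text classification, information retrieval, and text summarization.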