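I've been reading for hours, and every time I fix one error I run into another.
I'm trying to use NLTK to generate various n-grams (unigrams, bigrams, trigrams, ...) from the words found in a CSV file (sample attached).
Sorry, this is probably really simple. Still, any help would be appreciated!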
import re
import os
import csv
from collections import Counter
from nltk.util import ngrams
from nltk import word_tokenize
import nltk
nltk.download('punkt')
cwd = os.getcwd()
ngrams = open(os.path.join(cwd, "combined.csv"),
              "r", encoding="utf8")
with ngrams as f:
    reader = csv.DictReader(f, delimiter=',')
    keywords = [item['Keyword'] for item in reader]
    string = " ".join(keywords)
    # token = nltk.word_tokenize(string)
    unigrams = ngrams(string, 1)
    bigrams = ngrams(string, 2)
    trigrams = ngrams(string, 3)
    print(trigrams)

Error:
  File "ngram.py", line 27, in <module>
    unigrams = ngrams(string, 1)
TypeError: '_io.TextIOWrapper' object is not callable

combined.csv >>
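Keyword
'k cup',
cup of coffee,
"coffee pods",
"coffee beans"
"won't pour water",
"won't pump water",
"how long do k cups last",
he won't pump water,
"'keurig troubleshooting",
"cheap k cups",
"folding ad",
'tea cup',
"water won't come out",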
Posted on 2020-12-05 11:53:01
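Your mistake is that you reused the name ngrams: it refers both to the open file and to the NLTK function. Once you assign the file to ngrams, the imported function is no longer reachable under that name, so ngrams(string, 1) tries to call the file object.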
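The clash can be reproduced without NLTK or a CSV file. The sketch below is only an illustration: the local ngrams function is a hypothetical stand-in for nltk.util.ngrams, and io.StringIO stands in for the opened combined.csv.

```python
import io


def ngrams(sequence, n):
    """Hypothetical stand-in for nltk.util.ngrams: tuples of n consecutive items."""
    items = list(sequence)
    return (tuple(items[i:i + n]) for i in range(len(items) - n + 1))


words = ["k", "cup", "coffee"]

# At this point the name still refers to the function, so the call works.
before = list(ngrams(words, 2))

# Rebinding the same name to a file-like object hides the function.
ngrams = io.StringIO("Keyword\nk cup\n")

try:
    ngrams(words, 2)  # now this tries to call the file-like object
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(before)
print(error_message)
```

The second print shows the same kind of "object is not callable" message as the traceback in the question, only with the StringIO type name instead of TextIOWrapper.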
A possible fix:
with open(os.path.join(cwd, "combined.csv"),
          "r", encoding="utf8") as ngrams_file:
    reader = csv.DictReader(ngrams_file, delimiter=',')
    keywords = [item['Keyword'] for item in reader]
    string = " ".join(keywords)
    # token = nltk.word_tokenize(string)
    # ngrams still refers to the NLTK function here
    unigrams = ngrams(string, 1)
    bigrams = ngrams(string, 2)
    trigrams = ngrams(string, 3)
    print(trigrams)

Posted on 2020-12-05 11:53:13
There is a name clash between the NLTK function and the file handle. You need to rename the file handle or rewrite the with construct:
with open(os.path.join(cwd, "combined.csv"), "r", encoding="utf8") as f:
    # your operations and the ngrams calls go here

https://stackoverflow.com/questions/65156554
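One more thing to watch for once the name clash is gone: passing a plain string to ngrams produces character n-grams, because a string is iterated character by character. The sketch below shows one way to get word-level n-grams instead. It rests on several assumptions: SAMPLE_CSV is made-up data rather than the real combined.csv, word_ngrams is a hypothetical pure-Python helper that yields the same kind of tuples as nltk.util.ngrams(tokens, n), and str.split is used as a simple tokenizer where nltk.word_tokenize could be used instead.

```python
import csv
import io
from collections import Counter

# Assumed sample data standing in for combined.csv (not the asker's real file).
SAMPLE_CSV = 'Keyword\n"k cup"\n"coffee pods"\n"cheap k cup"\n'


def word_ngrams(tokens, n):
    """Hypothetical helper: tuples of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# The file handle gets its own name, so nothing else is overwritten.
with io.StringIO(SAMPLE_CSV) as csv_file:
    reader = csv.DictReader(csv_file)
    keywords = [row["Keyword"] for row in reader]

# Tokenize each keyword separately so no n-gram spans two keywords.
bigram_counts = Counter()
for keyword in keywords:
    tokens = keyword.split()  # simple whitespace tokenizer (an assumption)
    bigram_counts.update(word_ngrams(tokens, 2))

print(bigram_counts.most_common(1))
```

Counting per keyword is a design choice; joining all keywords into one string first, as in the question, would also create n-grams that cross keyword boundaries.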