I'm very new to Keras, and I've run into a problem when trying to print the shape of X_train so that I can use it as input_shape. My code so far is as follows:
df = pd.read_csv(pathname, encoding = "ISO-8859-1")
df = df[['content_cleaned', 'meaningful']]
df = df.sample(frac=1) #Shuffling the data
X = np.asarray(df[['content_cleaned']])
y = np.asarray(df[['meaningful']])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)
tokenizer = Tokenizer()
X_train = keras.preprocessing.text.Tokenizer(num_words=100)
X_test = keras.preprocessing.text.Tokenizer(num_words=100)
encoder = LabelBinarizer()
encoder.fit(y_train)
y_train = encoder.transform(y_train)
encoder.fit(y_test)
y_test = encoder.transform(y_test)
print(X_train.shape)
The code fails on the final print statement. Error message:
AttributeError: 'Tokenizer' object has no attribute 'shape'
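Again, I'm new to this and can't figure out how to get past this error. Any help would be greatly appreciated!
Edit: I made some changes to the code to try to implement another user's suggestion. Here is the changed code: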
# Create tokenizer
tokenizer = Tokenizer(num_words=100) #No row has more than 100 words.
#Tokenize the predictors (text)
X_train = tokenizer.sequences_to_matrix(X_train, mode="binary")
X_test = tokenizer.sequences_to_matrix(X_test, mode="binary")
It fails on the line that assigns X_train. The error message is:
TypeError: '>=' not supported between instances of 'str' and 'int'
Edit 2: After making the following changes, the code runs, but when I run the print command nothing is printed:
X_train = tokenizer.sequences_to_matrix(int(input(X_train)), mode="binary")
X_test = tokenizer.sequences_to_matrix(int(input(X_test)), mode="binary")
Answered on 2018-12-24 10:47:36
I believe this is because, although you first set it to a NumPy array...
X = np.asarray(df[['content_cleaned']])
...and populate it with data...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)
...you then reassign it to a Tokenizer object, which does not have a 'shape' attribute.
X_train = keras.preprocessing.text.Tokenizer(num_words=100)
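One way to avoid the reassignment is to keep the Tokenizer in its own variable and let it convert the text into an array. The following is only a minimal sketch, not the asker's actual pipeline: `train_texts` and `test_texts` are hypothetical stand-ins for the split `content_cleaned` data, and the import path assumes the Keras bundled with TensorFlow. It uses `texts_to_matrix`, which accepts raw strings. `sequences_to_matrix` expects lists of integer word indices, which is likely why passing strings to it raised the `'>=' not supported between instances of 'str' and 'int'` error.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical stand-ins for the split 'content_cleaned' text (not from the question).
train_texts = ["the cat sat on the mat", "dogs chase cats", "the mat is red"]
test_texts = ["the dog sat", "a red cat"]

# Keep the Tokenizer in its own variable so the data variables are never overwritten by it.
tokenizer = Tokenizer(num_words=100)

# Build the vocabulary from the training text only.
tokenizer.fit_on_texts(train_texts)

# texts_to_matrix takes raw strings and returns a NumPy array
# with one row per text and num_words columns.
X_train = tokenizer.texts_to_matrix(train_texts, mode="binary")
X_test = tokenizer.texts_to_matrix(test_texts, mode="binary")

print(X_train.shape)
print(X_test.shape)
```

With the original DataFrame, the text would probably need to be selected with single brackets, as in `df['content_cleaned']`, so that each element is a plain string rather than a one-element row.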
https://stackoverflow.com/questions/53908686