I am importing Twitter data with Flume.
I added the following line to my Flume conf:
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
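For context, that keywords line normally sits inside a larger agent definition. A minimal sketch of such a config is below; the `TwitterSource` class name follows the common Cloudera tutorial layout, and the agent/channel/sink names and all credential values are assumptions, not taken from the question:

```properties
# Sketch of a Flume agent for Twitter ingestion (names and the
# TwitterSource class are assumptions; credentials are placeholders).
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <your-consumer-key>
TwitterAgent.sources.Twitter.consumerSecret = <your-consumer-secret>
TwitterAgent.sources.Twitter.accessToken = <your-access-token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your-access-token-secret>
TwitterAgent.sources.Twitter.keywords = hadoop, bigdata, mahout

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000

TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/
```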
How do I install Mahout on Ubuntu 12.04?
sudo apt-get install mahout
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package mahout
https://ccp.cloudera.com/display/CDHDOC/Mahout+Installation
To install Mahout on an Ubuntu or other Debian syste
I cannot find a way to configure Mahout correctly. This is what happens when I try to run the "donut.csv" example from the book "Mahout in Action":
Running on hadoop, using /home/myname/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/myname/mahout/mahout-examples-0.7-job.jar
Not a valid JAR: C:\home\myname\mahout\mahout-examples-0.7-job.jar
Where do I have to change the parameter?
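One hint in the error is that the reported jar path is Windows-style (`C:\home\...`), which suggests the launcher script is resolving `MAHOUT_HOME` through a Windows shell such as Cygwin. Since `bin/mahout` derives the `MAHOUT-JOB` jar from `MAHOUT_HOME`, a minimal sketch of checking that variable (the path `/home/myname/mahout` is taken from the question, everything else is an assumption):

```shell
# bin/mahout builds the MAHOUT-JOB path from $MAHOUT_HOME, so point it
# at the Unix-style directory that actually contains the job jar:
export MAHOUT_HOME=/home/myname/mahout
# Show which jar(s) the launcher would pick up; if no file matches,
# the unexpanded pattern is printed instead of a path:
echo "$MAHOUT_HOME"/mahout-examples-*-job.jar
```

If the echoed path still comes out in `C:\...` form, the problem is in the shell environment doing the path translation rather than in the jar itself.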
Following the instructions provided, I have installed Mahout on the Bitnami AMI ami-02fb006b (and on several other AMIs, otherwise I would not be asking this question):
I always get stuck when I try to run ./examples/bin/build-reuters.sh. Here is the output of the command:
> Please select a number to choose the corresponding clustering
> algorithm
> 1. kmeans clustering
> 2. lda clustering Enter your choice : 1 ok. You chose 1 and we
I am trying to run the 20 newsgroups classification example in Mahout. I have set MAHOUT_LOCAL=true; the classifier does not display the confusion matrix and emits the following warning:
ok. You chose 2 and we'll use naivebayes
creating work directory at /tmp/mahout-work-cloudera
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /tmp/mahout-work-cloudera/20news-all
+ mkdir /tmp/maho
This is the error I get when I run mvn clean install in the trunk directory. Some tests fail, and mahout-core is not installed. Any suggestions as to what might be going wrong? I have Maven installed, and I know it is installed correctly because mvn --version works.
Here is the full output:
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.mahout:mahout-integration:jar:0.7-SNAPSHOT
[WARNING]
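Since the build is dying while running tests, one common way to make progress (a workaround, not a fix for the underlying test failures) is to install while skipping test execution. `-DskipTests` is a standard Maven flag; whether it is appropriate for your situation depends on why the tests fail:

```shell
# Compiles the tests but does not run them, so the modules still get
# installed into the local repository despite the failing tests.
mvn clean install -DskipTests
```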
I installed mahout-0.7 and hadoop-1.2.1 on Linux (CentOS), with Hadoop set up as multi-node. I created a user named hadoop and installed Mahout and Hadoop under the path /home/hadoop/opt/. I set MAHOUT_HOME, HADOOP_HOME, MAHOUT_LOCAL, ... in the .bashrc file of the hadoop user's environment:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific al
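For reference, the relevant exports in that .bashrc would typically look like the sketch below; the exact paths are assumptions inferred from the /home/hadoop/opt layout described above:

```shell
# Illustrative environment for the hadoop user (paths are assumptions):
export HADOOP_HOME=/home/hadoop/opt/hadoop-1.2.1
export MAHOUT_HOME=/home/hadoop/opt/mahout-0.7
export MAHOUT_LOCAL=true   # note: forces local mode, bypassing Hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$MAHOUT_HOME/bin"
```

Note that leaving MAHOUT_LOCAL set keeps Mahout in local mode even on a multi-node cluster, which matters for the MAHOUT_LOCAL questions below.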
I am trying to run the 20 newsgroups classification example in Mahout. I set MAHOUT_LOCAL=true; the classifier does not display the confusion matrix and gives the following warning:
ok. You chose 1 and we'll use cnaivebayes
creating work directory at /tmp/mahout-work-cloudera
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /tmp/mahout-work-cloudera/20news-all
+ mkdir
When I run mahout spark-itemsimilarity from the terminal with the input path pointing to a directory, I get the following error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:119)
at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityD
Before this, I had set MAHOUT_LOCAL=TRUE to make the program run locally. I am using OS X 10.9, and my ~/.bash_profile contains:
export MAHOUT_LOCAL="TRUE"
Now I want the program to run on the Hadoop file system. How do I unset MAHOUT_LOCAL to do this?
I tried:
export MAHOUT_LOCAL=""
source ~/.bash_profile
and then ran the job. But I still get:
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
MAH
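One plausible explanation (an assumption about how bin/mahout tests the variable) is that the script checks whether MAHOUT_LOCAL is *set* rather than whether it is non-empty, so exporting an empty string still triggers local mode. In that case `unset` is the right tool; a small shell sketch of the difference:

```shell
# A variable exported as "" is still *set*; ${VAR+x} expands to "x"
# when VAR is set (even to the empty string) and to "" when it is not.
export MAHOUT_LOCAL=""
if [ -n "${MAHOUT_LOCAL+x}" ]; then echo "set (even though empty)"; fi

unset MAHOUT_LOCAL            # actually removes the variable
if [ -z "${MAHOUT_LOCAL+x}" ]; then echo "unset"; fi
```

Also remember that `source ~/.bash_profile` cannot remove a variable that the file no longer mentions; run `unset MAHOUT_LOCAL` in the current shell (or open a fresh one) before launching the job.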
I have installed Hadoop on Windows using Hortonworks. I then downloaded Mahout and successfully ran:
%HADOOP_HOME%\bin\hadoop jar C:\mahout-distribution-0.7\mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCURRENCE --input i --output o
However, this is one of the few modules that works.
When I run other commands, such as:
%HADOOP_HOME%/bin/hadoop jar C:\ma
I am trying to import Mahout into Eclipse on a Mac. I installed Apache Mahout using the following commands:
$wget -c http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.tar.gz
$tar zxf mahout-distribution-0.9.tar.gz
$cd mahout-distribution-0.9
$mvn eclipse:eclipse
When I run the last step, I get the error:
Goal requires a project to execute but there is no POM in
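`mvn eclipse:eclipse` is a project goal, so it needs a pom.xml in the working directory, and the binary distribution archive generally does not ship one; the source distribution does (the `-src` tarball name below is an assumption). A small sketch of checking before running the goal:

```shell
# eclipse:eclipse requires a POM in the directory it runs from. The
# binary tarball (mahout-distribution-0.9.tar.gz) has no pom.xml; the
# source tarball (mahout-distribution-0.9-src.tar.gz, name assumed) does.
run_eclipse_goal() {
  if [ -f "$1/pom.xml" ]; then
    (cd "$1" && mvn eclipse:eclipse)
  else
    echo "no pom.xml in $1: use the source distribution instead"
  fi
}
run_eclipse_goal "$(mktemp -d)"   # a POM-less dir takes the warning branch
```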
There is a mahout-math project. How can I build it into a jar? I need to use the latest code, because the only build I could find is broken. The error I get:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.14.1:test (default-test) on project mahout-math: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.14.1:test failed: The forked V
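A "forked VM" failure usually means the separate JVM that Surefire spawns to run the tests crashed or ran out of memory. Two common things to try (both are standard Maven/Surefire flags, though whether they cure this particular crash is an assumption):

```shell
# Option 1: skip test execution entirely so the mahout-math jar still
# builds and installs into the local repository.
mvn clean install -DskipTests
# Option 2: give the forked test JVM more heap via Surefire's argLine
# parameter (1024m is an arbitrary illustrative value).
mvn clean install -DargLine="-Xmx1024m"
```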