follow all steps in the hadoop-3.1.3 cluster setup on linux guide first,
then switch to the root user:
su
tar -xvzf /opt/software/spark-3.1.1-bin-hadoop3.2.tgz -C /opt/module
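confirm the archive extracted where expected (directory name taken from the tarball above):
ls /opt/module/spark-3.1.1-bin-hadoop3.2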
vi /etc/profile
add the following 4 lines:
export SPARK_HOME="/opt/module/spark-3.1.1-bin-hadoop3.2"
export PATH=$PATH:$SPARK_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
source the profile or re-login:
source /etc/profile
cd /opt
spark-submit --version
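this should print the Spark 3.1.1 version banner; if the command is not found, re-check the exports, e.g.:
echo $SPARK_HOME
echo $PATH | tr ':' '\n' | grep spark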
cd $SPARK_HOME
cp conf/spark-defaults.conf.template conf/spark-defaults.conf
vi conf/spark-defaults.conf
add:
spark.master yarn
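client is already the default deploy mode, but it can be pinned explicitly in the same file if desired (optional):
spark.submit.deployMode client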
start hdfs and yarn:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
use jps to check that the ResourceManager process is running
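a rough sketch of what jps should show, assuming the master also runs the HDFS daemons as in the hadoop guide (the exact list depends on how the cluster is laid out):
jps
# on the master: NameNode, SecondaryNameNode, ResourceManager
# on each worker: DataNode, NodeManager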
spark-submit --master yarn --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.1.jar
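in client mode the result is printed in the driver output (a line starting with "Pi is roughly"); to confirm the job actually ran on YARN, list finished applications:
yarn application -list -appStates FINISHED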
put a test file into hdfs:
cd ~
wget -O alice.txt https://www.gutenberg.org/files/11/11-0.txt
hdfs dfs -mkdir inputs
hdfs dfs -put alice.txt inputs
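verify the upload before reading it from Spark:
hdfs dfs -ls inputs
hdfs dfs -cat inputs/alice.txt | head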
run spark-shell
and read the test file:
spark-shell --master yarn --deploy-mode client
val input = sc.textFile("inputs/alice.txt")
// count the number of non-blank lines
input.filter(line => line.length()>0).count()
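the same check can also be done through the SparkSession (spark) that spark-shell provides alongside sc; a minimal equivalent:
val ds = spark.read.textFile("inputs/alice.txt")
ds.filter(_.length > 0).count()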
vi $SPARK_HOME/conf/spark-defaults.conf
add the following 3 lines to cap driver, application master, and executor memory:
spark.driver.memory 512m
spark.yarn.am.memory 512m
spark.executor.memory 512m
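the same limits can also be passed per job instead of globally, e.g. for the SparkPi run above (these are standard spark-submit options):
spark-submit --master yarn --driver-memory 512m --executor-memory 512m --conf spark.yarn.am.memory=512m --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.1.jar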
if spark-submit fails because a different jdk is first on the PATH, try the following commands:
which java
ls -l /usr/bin/java
mv /usr/bin/java /usr/bin/java2
java -version
java -version should now report jdk 1.8.
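on distributions that manage java through the alternatives system, a less destructive option than renaming /usr/bin/java is:
update-alternatives --config java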
Spark application web UI at http://master:4040 (available only while an application is running)
YARN web UI at http://master:8088/
to download archived versions of the packages, see http://archive.apache.org/dist/spark/
for more information, see:
https://spark.apache.org/docs/latest/running-on-yarn.html
https://www.linode.com/docs/guides/install-configure-run-spark-on-top-of-hadoop-yarn-cluster/
https://sparkbyexamples.com/spark/spark-setup-on-hadoop-yarn/