Install Apache Spark Standalone in CentOS 7

In this tutorial I will show you how to easily install Apache Spark Standalone on CentOS 7. First of all, you have to install Java on your machine.

Step #1: Install Java
[root@sparkCentOs pawel] sudo yum install java-1.8.0-openjdk
[root@sparkCentOs pawel] java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
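Some tools locate the JDK through JAVA_HOME rather than the PATH, so it can be worth setting it now. A minimal sketch of persisting it; the path below is an assumption based on the usual CentOS 7 OpenJDK 1.8 package layout, so verify it on your machine first:

```shell
# Persist JAVA_HOME for future shells. The path is an assumption --
# confirm the real one on your machine with: readlink -f $(which java)
echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' >> ~/.bash_profile
source ~/.bash_profile
echo "$JAVA_HOME"
```
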
Step #2: Install Scala
In the second step, install Scala.
[root@sparkCentOs pawel] wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
[root@sparkCentOs pawel] tar xvf scala-2.11.8.tgz
[root@sparkCentOs pawel] sudo mv scala-2.11.8 /usr/lib
[root@sparkCentOs pawel] sudo ln -s /usr/lib/scala-2.11.8 /usr/lib/scala
[root@sparkCentOs pawel] export PATH=$PATH:/usr/lib/scala/bin
Step #3: Installation of Apache Spark
Now we will download Apache Spark from the official website and install it on your machine.
# Download Spark
[root@sparkCentOs pawel] wget http://ftp.ps.pl/pub/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel] tar xf spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel] mkdir /usr/local/spark
[root@sparkCentOs pawel] cp -r spark-2.3.1-bin-hadoop2.7/* /usr/local/spark
[root@sparkCentOs pawel] export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
[root@sparkCentOs pawel] PATH=$PATH:$HOME/bin:/usr/local/spark/bin
[root@sparkCentOs pawel] source ~/.bash_profile
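Note that the `export` lines above only affect the current shell session, even though the last command re-reads ~/.bash_profile. To make the settings survive a new login, the same variables can be appended to ~/.bash_profile first. A sketch, assuming the install paths used in this tutorial (the SPARK_HOME variable is a common Spark convention, not something set earlier in this guide):

```shell
# Append the Scala/Spark settings to ~/.bash_profile so they persist
# across logins. Paths assume the layout used in this tutorial.
cat >> ~/.bash_profile <<'EOF'
export SPARK_HOME=/usr/local/spark
export SPARK_EXAMPLES_JAR=$SPARK_HOME/examples/jars/spark-examples_2.11-2.3.1.jar
export PATH=$PATH:/usr/lib/scala/bin:$SPARK_HOME/bin
EOF
source ~/.bash_profile
```

After this, `spark-shell` and `scala` resolve from the PATH in every new session.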
Step #4: Run Spark Shell
Run the Spark shell and verify that Spark is working correctly.
[root@sparkCentOs pawel]# spark-shell
2018-08-20 19:57:30 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://sparkCentOs:4040
Spark context available as 'sc' (master = local[*], app id = local-1534795057680).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
Let’s type some code 🙂
scala> val data = spark.sparkContext.parallelize(
     |   Seq("I like Spark",
     |       "Spark is awesome",
     |       "My first Spark job is working now and is counting these words")
     | )
data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> val wordCounts = data.flatMap(row => row.split(" ")).
     |   map(word => (word, 1)).reduceByKey(_ + _)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[3] at reduceByKey at <console>:25

scala> wordCounts.foreach(println)
(Spark,3)
(is,3)
(first,1)
(My,1)
(now,1)
(job,1)
(I,1)
(like,1)
(words,1)
(awesome,1)
(and,1)
(counting,1)
(working,1)
(these,1)
That’s all about how to install Apache Spark Standalone in CentOS 7. Enjoy!