How to install Apache Spark Standalone on CentOS 7 – see how easy it is, in 5 minutes!


Step #1: Install Java

In this tutorial I will show you how to easily install Apache Spark Standalone on CentOS 7. First, install Java on your machine.

[root@sparkCentOs pawel] sudo yum install java-1.8.0-openjdk
[root@sparkCentOs pawel] java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
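If the version banner does not appear, it is worth confirming that a JDK is actually on your PATH before continuing. A minimal sketch of such a check (the fallback message is my own suggestion, not part of the original output):

```shell
# Hypothetical sanity check: confirm a java binary is visible on PATH.
if command -v java >/dev/null 2>&1; then
  msg="java found at $(command -v java)"
else
  msg="java not found - try: sudo yum install java-1.8.0-openjdk"
fi
echo "$msg"
```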

Step #2: Install Scala

In the second step, install Scala.

[root@sparkCentOs pawel] wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
[root@sparkCentOs pawel] tar xvf scala-2.11.8.tgz
[root@sparkCentOs pawel] sudo mv scala-2.11.8 /usr/lib
[root@sparkCentOs pawel] sudo ln -s /usr/lib/scala-2.11.8 /usr/lib/scala
[root@sparkCentOs pawel] export PATH=$PATH:/usr/lib/scala/bin

Step #3: Install Apache Spark

Now we will download Apache Spark from the official website and install it on your machine.

# Download Spark
[root@sparkCentOs pawel] wget http://ftp.ps.pl/pub/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel] tar xf spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel] mkdir /usr/local/spark
[root@sparkCentOs pawel] cp -r spark-2.3.1-bin-hadoop2.7/* /usr/local/spark
[root@sparkCentOs pawel] export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
[root@sparkCentOs pawel] PATH=$PATH:$HOME/bin:/usr/local/spark/bin
[root@sparkCentOs pawel] source ~/.bash_profile
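One caveat: the `export` and `PATH` lines above only affect the current shell session, while the final command sources `~/.bash_profile`, which was never actually edited. A minimal sketch of persisting the same settings (paths assume the directory layout used in this tutorial):

```shell
# Append the Spark and Scala environment settings to ~/.bash_profile
# so they survive a logout; paths match the directories from this tutorial.
cat >> ~/.bash_profile <<'EOF'
export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
export PATH=$PATH:/usr/lib/scala/bin:/usr/local/spark/bin
EOF
source ~/.bash_profile
```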

Step #4: Run Spark Shell

Run the Spark shell and verify that Spark is working correctly.

[root@sparkCentOs pawel]# spark-shell
2018-08-20 19:57:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://sparkCentOs:4040
Spark context available as 'sc' (master = local[*], app id = local-1534795057680).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
 
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.
 
scala>

Let’s type some code 🙂

scala> val data = spark.sparkContext.parallelize(
    Seq("I like Spark","Spark is awesome",
    "My first Spark job is working now and is counting these words")
)
data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:23
 
scala> val wordCounts = data.flatMap(row => row.split(" ")).
        map(word => (word, 1)).reduceByKey(_ + _)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[3] at reduceByKey at <console>:25
 
scala> wordCounts.foreach(println)
(Spark,3)
(is,3)
(first,1)
(My,1)
(now,1)
(job,1)
(I,1)
(like,1)
(words,1)
(awesome,1)
(and,1)
(counting,1)
(working,1)
(these,1)
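The job above is the classic map/reduce word count. As a sanity check on the expected output, the same counting can be sketched with ordinary shell tools (no Spark needed): `tr` splits each line into words, and `uniq -c` does the counting.

```shell
# Count words in the same three sentences with plain shell tools.
counts=$(printf '%s\n' \
  "I like Spark" \
  "Spark is awesome" \
  "My first Spark job is working now and is counting these words" \
  | tr ' ' '\n' | sort | uniq -c | sort -rn)
echo "$counts"
```

This is only an illustration of the logic; in the Spark version the same split-and-count happens in parallel across partitions, which is why the output order differs from run to run.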

That’s all about how to install Apache Spark Standalone on CentOS 7. Enjoy!

If you enjoyed this post, please leave a comment below or share it on Facebook, Twitter, LinkedIn, or another social media site. Thanks in advance!
