How to install Apache Spark Standalone in CentOS 7? Check how easy it is in 5 minutes!

Step #1: Install Java

In this tutorial I will show you how to install Apache Spark Standalone on CentOS 7. First, install Java on your machine and check the installed version.

[root@sparkCentOs pawel]# sudo yum install java-1.8.0-openjdk
[root@sparkCentOs pawel]# java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
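
If you want to check the Java version from a script rather than by eye, the sketch below may help. The helper name `java_major_version` is hypothetical, not part of Java or Spark. It assumes the version appears in double quotes on the first line, as in the output above, and it handles both the older "1.8.0_161" numbering and the newer "11.0.2" numbering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the Java major version from the first line
# of `java -version` output.
java_major_version() {
    local line="$1" version
    # Pull out the quoted version string, e.g. 1.8.0_161
    version=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        "")  return 1 ;;                                   # no version string found
        1.*) version=${version#1.}                         # legacy scheme: 1.8.0_161 -> 8.0_161
             printf '%s\n' "${version%%.*}" ;;             # -> 8
        *)   printf '%s\n' "${version%%[.+-]*}" ;;         # newer scheme: 11.0.2 -> 11
    esac
}

# Example with the first line of the output shown above; prints 8
java_major_version 'openjdk version "1.8.0_161"'
```

On a real machine you would pass the live output instead: `java_major_version "$(java -version 2>&1 | head -n 1)"`. The `2>&1` is needed because `java -version` writes to standard error.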

Step #2: Install Scala

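In the second step, install Scala: download the archive, unpack it to /usr/lib, create a version-independent symlink, and add its bin directory to your PATH.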

[root@sparkCentOs pawel]# wget
[root@sparkCentOs pawel]# tar xvf scala-2.11.8.tgz
[root@sparkCentOs pawel]# sudo mv scala-2.11.8 /usr/lib
[root@sparkCentOs pawel]# sudo ln -s /usr/lib/scala-2.11.8 /usr/lib/scala
[root@sparkCentOs pawel]# export PATH=$PATH:/usr/lib/scala/bin
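
Note that the `export PATH=...` line only affects the current shell session, and running it several times adds the same directory to PATH again and again. A small sketch of a guard against that follows. `path_append` is a hypothetical helper name, not a standard command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already on PATH, nothing to do
        *) PATH="${PATH:+$PATH:}$1" ;;        # append, adding ':' only if PATH is non-empty
    esac
    export PATH
}

path_append /usr/lib/scala/bin
path_append /usr/lib/scala/bin   # the second call is a no-op
```

After this, `scala -version` should be found by the shell, provided the archive was unpacked as shown above.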

Step #3: Install Apache Spark

Now download Apache Spark from the official website and install it on your machine.

# Download Spark and copy it to /usr/local/spark
[root@sparkCentOs pawel]# wget
[root@sparkCentOs pawel]# tar xf spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel]# mkdir /usr/local/spark
[root@sparkCentOs pawel]# cp -r spark-2.3.1-bin-hadoop2.7/* /usr/local/spark
# Set the environment for the current session. To make it permanent,
# add the two export lines to ~/.bash_profile and reload it with source.
[root@sparkCentOs pawel]# export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
[root@sparkCentOs pawel]# export PATH=$PATH:$HOME/bin:/usr/local/spark/bin
[root@sparkCentOs pawel]# source ~/.bash_profile
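
The `source ~/.bash_profile` step only picks up the Spark settings if they are actually stored in that file. One possible way to store them without creating duplicates on repeated runs is sketched below. `ensure_line` is a hypothetical helper, and the example writes to a scratch file so it is safe to try; for a real setup you would pass `"$HOME/.bash_profile"` instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file unless an identical line is already present.
ensure_line() {
    local file="$1" line="$2"
    touch "$file"
    # -F: treat the line as a fixed string, -x: match whole lines only
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file instead of the real ~/.bash_profile
profile=$(mktemp)
ensure_line "$profile" 'export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar'
ensure_line "$profile" 'export PATH=$PATH:$HOME/bin:/usr/local/spark/bin'
ensure_line "$profile" 'export PATH=$PATH:$HOME/bin:/usr/local/spark/bin'   # no duplicate is written
cat "$profile"
```

The single quotes matter here: they keep `$PATH` and `$HOME` unexpanded, so the values are resolved each time the profile is loaded rather than frozen at the moment of writing.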

Step #4: Run Spark Shell

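Run the Spark shell to verify that Spark works correctly. The line with master = local[*] shows that the shell started Spark in local mode, using all available cores.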

[root@sparkCentOs pawel]# spark-shell
2018-08-20 19:57:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://sparkCentOs:4040
Spark context available as 'sc' (master = local[*], app id = local-1534795057680).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.

Let’s type some code 🙂

scala> val data = spark.sparkContext.parallelize(
    Seq("I like Spark","Spark is awesome",
    "My first Spark job is working now and is counting these words"))
data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:23
scala> val wordCounts = data.flatMap(row => row.split(" ")).
        map(word => (word, 1)).reduceByKey(_ + _)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[3] at reduceByKey at <console>:25
scala> wordCounts.foreach(println)
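
To cross-check what the Spark job prints, you can compute the same word counts with standard command-line tools. This is only a sanity check on the three sentences from the session above, not a replacement for the Spark job. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# The three input sentences from the spark-shell session above
sentences='I like Spark
Spark is awesome
My first Spark job is working now and is counting these words'

# Put one word per line, group identical words with sort, count each group
# with uniq -c, then list the most frequent words first.
word_counts=$(printf '%s\n' "$sentences" | tr -s ' ' '\n' | sort | uniq -c | sort -k1,1nr -k2)
printf '%s\n' "$word_counts"
```

"Spark" and "is" each appear 3 times and every other word appears once, so the Spark job should report (Spark,3) and (is,3) among its pairs. The order in which `wordCounts.foreach(println)` prints the pairs is not guaranteed.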

That’s all about how to install Apache Spark Standalone in CentOS 7. Enjoy!
