How to install Apache Spark Standalone on CentOS?

Step #1: Install Java

First of all, you have to install Java on your machine.

[root@sparkCentOs pawel] sudo yum install java-1.8.0-openjdk
[root@sparkCentOs pawel] java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
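
Spark itself only needs java on the PATH, but some related tools expect JAVA_HOME to be set. If you want to set it too, the path below is the usual OpenJDK location on CentOS (an assumption; verify yours with readlink -f $(which java)):

[root@sparkCentOs pawel] echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' >> ~/.bash_profile
[root@sparkCentOs pawel] source ~/.bash_profile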

Step #2: Install Scala

In the second step, install Scala.

[root@sparkCentOs pawel] wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
[root@sparkCentOs pawel] tar xvf scala-2.11.8.tgz
[root@sparkCentOs pawel] sudo mv scala-2.11.8 /usr/lib
[root@sparkCentOs pawel] sudo ln -s /usr/lib/scala-2.11.8 /usr/lib/scala
[root@sparkCentOs pawel] export PATH=$PATH:/usr/lib/scala/bin
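
Note that this export only applies to the current shell session. To make it permanent, append it to ~/.bash_profile as well (a minimal sketch, assuming you use bash):

[root@sparkCentOs pawel] echo 'export PATH=$PATH:/usr/lib/scala/bin' >> ~/.bash_profile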

Step #3: Install Apache Spark

Now we will download Apache Spark from the official website and install it on your machine. The commands below use a mirror; if it is no longer available, the same release is kept in the Apache archive at https://archive.apache.org/dist/spark/spark-2.3.1/.

# Download Spark
[root@sparkCentOs pawel] wget http://ftp.ps.pl/pub/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel] tar xf spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel] mkdir /usr/local/spark
[root@sparkCentOs pawel] cp -r spark-2.3.1-bin-hadoop2.7/* /usr/local/spark
[root@sparkCentOs pawel] export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
[root@sparkCentOs pawel] PATH=$PATH:$HOME/bin:/usr/local/spark/bin
[root@sparkCentOs pawel] source ~/.bash_profile
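
One caveat: the two exports above only live in the current session, and the source ~/.bash_profile call on the last line only helps if the variables are actually written into that file. A minimal way to persist them (assuming the install paths used above):

[root@sparkCentOs pawel] echo 'export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar' >> ~/.bash_profile
[root@sparkCentOs pawel] echo 'export PATH=$PATH:/usr/local/spark/bin' >> ~/.bash_profile
[root@sparkCentOs pawel] source ~/.bash_profile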

Step #4: Run Spark Shell

Run the Spark shell and verify that Spark is working correctly.

[root@sparkCentOs pawel]# spark-shell
2018-08-20 19:57:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://sparkCentOs:4040
Spark context available as 'sc' (master = local[*], app id = local-1534795057680).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
 
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.
 
scala>
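
Before typing anything bigger, a quick sanity check: sc is the SparkContext the shell created for us, and its version should match the release we just installed.

scala> sc.version
res0: String = 2.3.1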

Let’s type some code 🙂

scala> val data = spark.sparkContext.parallelize(
    Seq("I like Spark","Spark is awesome",
    "My first Spark job is working now and is counting these words")
)
data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:23
 
scala> val wordCounts = data.flatMap(row => row.split(" ")).
        map(word => (word, 1)).reduceByKey(_ + _)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[3] at reduceByKey at <console>:25
 
scala> wordCounts.foreach(println)
(Spark,3)
(is,3)
(first,1)
(My,1)
(now,1)
(job,1)
(I,1)
(like,1)
(words,1)
(awesome,1)
(and,1)
(counting,1)
(working,1)
(these,1)
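
The same aggregation can also be expressed through the higher-level Dataset API on top of the spark session the shell already provides. A minimal sketch (not part of the original run; output omitted) that reuses the data RDD from above:

scala> import spark.implicits._

scala> // convert the RDD to a Dataset[String] and split lines into words
scala> val words = data.toDS().flatMap(line => line.split(" "))

scala> // group by the default "value" column of a Dataset[String] and count
scala> words.groupBy("value").count().show()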

If you enjoyed this post, please leave a comment below or share it on Facebook, Twitter, LinkedIn or another social network. Thanks in advance!
