In this tutorial I will show you how to install Apache Spark Standalone on CentOS 7.
Introduction
Apache Spark
Apache Spark is an open-source distributed computing system designed for large-scale data processing. It is a fast, flexible engine that can process structured, semi-structured, and unstructured data.
Spark provides a range of tools and libraries for data processing, including support for SQL, machine learning, and stream processing. It also offers an interactive programming model that makes it easier to develop and deploy distributed applications.
Spark is designed to scale: it can run on a single machine or on a cluster of hundreds of machines, which makes it a common choice for big data processing.
Overall, Apache Spark is a versatile data processing platform suited to a wide range of tasks and applications.
CentOS
CentOS (Community Enterprise Operating System) is a Linux distribution based on Red Hat Enterprise Linux (RHEL). It is a free and open-source operating system designed to be stable, reliable, and secure.
CentOS 7 is the seventh major release of the CentOS operating system. It was released in 2014 and is based on RHEL 7. It includes a number of improvements and new features, such as support for the XFS file system (the default in this release), support for Docker, and the ability to use the system as a router.
CentOS 7 is a popular choice for servers and other enterprise-level systems because of its stability and security features. It is also widely used in the hosting industry and for building web servers and other Internet-facing systems.
To use CentOS 7, you need to install it on a computer or server. You can download the CentOS 7 installation media (such as an ISO image) from the CentOS website and create a bootable USB drive or DVD. Once the installation media is prepared, boot your computer from it and follow the prompts to install CentOS 7.
Install Spark On CentOS 7
To install Spark on CentOS 7, follow the steps below:
Step #1: Install Java
First, install Java. Spark runs on the Java Virtual Machine, so Java must be present on your system before you install Apache Spark on CentOS 7. The commands below install OpenJDK 8 and then print the installed version.
[root@sparkCentOs pawel]# sudo yum install java-1.8.0-openjdk
[root@sparkCentOs pawel]# java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
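If java -version reports that the command is not found, Java is not installed yet. You can install it by running the following command. Note that the java-1.8.0-openjdk package provides the Java runtime, which is enough to run Spark; the full JDK with the compiler is in the java-1.8.0-openjdk-devel package.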
yum install java-1.8.0-openjdk
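Before moving on, it can help to confirm which Java version is actually on your PATH, because Spark 2.3.x expects Java 8 or newer. The snippet below is only a minimal sketch: the helper name java_major_version is hypothetical (it is not part of Java or Spark), and it assumes the usual version strings such as "1.8.0_161" or "11.0.2".

```shell
#!/bin/sh
# Hypothetical helper: turn a Java version string into its major version,
# e.g. "1.8.0_161" -> 8 and "11.0.2" -> 11.
java_major_version() {
    version="$1"
    case "$version" in
        # Legacy scheme used up to Java 8: 1.<major>.<minor>
        1.*) version="${version#1.}" ;;
    esac
    # Keep everything before the first '.', '_' or '-'
    printf '%s\n' "${version%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr; the version is the quoted token on line 1
    raw="$(java -version 2>&1 | head -n 1 | awk -F '"' '{print $2}')"
    echo "Found Java major version: $(java_major_version "$raw")"
else
    echo "java not found on PATH; install it first"
fi
```

If the reported major version is lower than 8, install java-1.8.0-openjdk as shown above before continuing.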
Step #2: Install Scala
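In the second step, install Scala. The commands below download Scala 2.11.8, unpack it into /usr/lib, and add its bin directory to the PATH.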
[root@sparkCentOs pawel]# wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
[root@sparkCentOs pawel]# tar xvf scala-2.11.8.tgz
[root@sparkCentOs pawel]# sudo mv scala-2.11.8 /usr/lib
[root@sparkCentOs pawel]# sudo ln -s /usr/lib/scala-2.11.8 /usr/lib/scala
[root@sparkCentOs pawel]# export PATH=$PATH:/usr/lib/scala/bin
Step #3: Installation of Apache Spark
Now we will download Apache Spark and install it on your machine. The commands below fetch the Spark 2.3.1 build for Hadoop 2.7 from a mirror. If the mirror no longer hosts this release, look for it in the Apache release archive instead.
# Download Spark
[root@sparkCentOs pawel]# wget http://ftp.ps.pl/pub/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel]# tar xf spark-2.3.1-bin-hadoop2.7.tgz
# Copy the unpacked distribution to /usr/local/spark
[root@sparkCentOs pawel]# mkdir /usr/local/spark
[root@sparkCentOs pawel]# cp -r spark-2.3.1-bin-hadoop2.7/* /usr/local/spark
# Point to the bundled examples and put the Spark binaries on the PATH
[root@sparkCentOs pawel]# export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
[root@sparkCentOs pawel]# export PATH=$PATH:$HOME/bin:/usr/local/spark/bin
[root@sparkCentOs pawel]# source ~/.bash_profile
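Keep in mind that export only changes the current shell session, so the settings are gone after you log out. One way to keep them is to append the lines to your profile file. The snippet below is a rough sketch, assuming a Bash login shell that reads ~/.bash_profile; the PROFILE_FILE variable and the append_once helper are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Assumption: login shells read ~/.bash_profile. PROFILE_FILE is a hypothetical
# variable so the snippet can be pointed at a different file.
PROFILE_FILE="${PROFILE_FILE:-$HOME/.bash_profile}"

# Hypothetical helper: append a line to the profile only if it is not there yet,
# so running the snippet twice does not duplicate entries.
append_once() {
    line="$1"
    touch "$PROFILE_FILE"
    grep -qxF -- "$line" "$PROFILE_FILE" || printf '%s\n' "$line" >> "$PROFILE_FILE"
}

# Single quotes keep $PATH unexpanded, so it is evaluated at login time.
append_once 'export PATH=$PATH:/usr/lib/scala/bin:/usr/local/spark/bin'
append_once 'export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar'
```

After that, source ~/.bash_profile (or a new login) picks up the Scala and Spark paths used in this tutorial.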
Step #4: Run Spark Shell
Run the Spark shell and verify that Spark is working correctly.
[root@sparkCentOs pawel]# spark-shell
2018-08-20 19:57:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://sparkCentOs:4040
Spark context available as 'sc' (master = local[*], app id = local-1534795057680).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
This starts the Spark shell, an interactive environment for running Spark commands. The master = local[*] entry in the output shows that Spark is running locally and using all available cores. You can use the Spark shell to try out Spark code and run small jobs interactively.
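If you prefer a check that does not need an interactive session, the Spark distribution ships a run-example script in its bin directory that can run the bundled SparkPi example. The snippet below is a minimal sketch, assuming Spark was copied to /usr/local/spark as in Step #3; the function name spark_smoke_test is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the bundled SparkPi example and succeed only if
# the expected result line appears. The argument is the Spark install directory.
spark_smoke_test() {
    spark_home="${1:-/usr/local/spark}"
    if [ ! -x "$spark_home/bin/run-example" ]; then
        echo "run-example not found under $spark_home/bin" >&2
        return 1
    fi
    # Spark logs go to stderr; the result line is printed on stdout.
    "$spark_home/bin/run-example" SparkPi 10 2>/dev/null | grep "Pi is roughly"
}

spark_smoke_test /usr/local/spark || echo "Spark smoke test failed"
```

When the installation works, the script prints a single line starting with "Pi is roughly".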
Let’s type some code 🙂
scala> val data = spark.sparkContext.parallelize(Seq("I like Spark", "Spark is awesome", "My first Spark job is working now and is counting these words"))
data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> val wordCounts = data.flatMap(row => row.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[3] at reduceByKey at <console>:25

scala> wordCounts.foreach(println)
(Spark,3)
(is,3)
(first,1)
(My,1)
(now,1)
(job,1)
(I,1)
(like,1)
(words,1)
(awesome,1)
(and,1)
(counting,1)
(working,1)
(these,1)
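To sanity-check these counts without Spark, the same word count can be sketched with standard Unix tools. This is only an illustration of the logic, not how Spark computes it: splitting on spaces plays the role of flatMap, and sort with uniq -c plays the role of map plus reduceByKey. The function name word_counts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count words in the same three sentences used in the
# Spark example and print them as (word,count) pairs.
word_counts() {
    printf '%s\n' \
        "I like Spark" \
        "Spark is awesome" \
        "My first Spark job is working now and is counting these words" |
    tr ' ' '\n' |   # one word per line (the flatMap step)
    sort |          # bring identical words next to each other
    uniq -c |       # count each run of identical words (the reduceByKey step)
    awk '{print "(" $2 "," $1 ")"}'   # format as (word,count)
}

word_counts
```

The pairs come out sorted alphabetically here, while Spark prints them in no particular order, so compare the counts and not the ordering.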
Summary
That’s all about how to install Apache Spark Standalone on CentOS 7. Enjoy! For more information on installing and using Apache Spark, consult the Spark documentation or visit the Spark website.