Apache Spark is an open-source, distributed computing system designed for large-scale data processing. It is a fast, flexible processing engine that can handle data in a variety of formats, including structured, semi-structured, and unstructured data.
Spark provides a range of tools and libraries for data processing, including support for SQL, machine learning, and stream processing. It also offers a flexible and interactive programming model that allows you to easily develop and deploy distributed applications.
Spark is designed to be highly scalable, so it can handle data processing tasks of any size. It can run on a single machine or on a cluster of hundreds of machines, making it an ideal choice for big data processing tasks.
Overall, Apache Spark is a powerful and versatile data processing platform that is well-suited for a wide range of data processing tasks and applications.
CentOS (Community Enterprise Operating System) is a Linux distribution that is based on Red Hat Enterprise Linux (RHEL). It is a free and open-source operating system that is designed to be stable, reliable, and secure.
CentOS 7 is the seventh major release of the CentOS operating system. It was released in 2014 and is based on RHEL 7. It includes a number of improvements and new features, such as XFS as the default file system, support for Docker, and systemd as the init system.
CentOS 7 is a popular choice for servers and other enterprise-level systems due to its stability and security features. It is also widely used in the hosting industry and for building web servers and other Internet-facing systems.
To use CentOS 7, you will need to install it on a computer or server. You can download the CentOS 7 installation media (such as an ISO image) from the CentOS website and create a bootable USB drive or DVD. Once the installation media is prepared, you can boot your computer from the media and follow the prompts to install CentOS 7.
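Before writing the image to USB, it is worth verifying the download against its published checksum. The sketch below demonstrates the mechanism with a stand-in file; for a real install, substitute the ISO filename and the SHA-256 value from the CentOS download page.

```shell
# Checksum verification with a stand-in file -- image.iso here is a
# placeholder, not the real CentOS ISO.
printf 'demo' > image.iso
sum=$(sha256sum image.iso | awk '{print $1}')   # compute the local checksum
echo "$sum  image.iso" | sha256sum -c -         # prints "image.iso: OK" on a match
```

Note the two spaces between the checksum and the filename; `sha256sum -c` expects the same format it emits.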
Install Spark on CentOS 7
To install Spark on CentOS 7, follow these steps:
Step #1: Installation of Java
[root@sparkCentOs pawel] sudo yum install java-1.8.0-openjdk
[root@sparkCentOs pawel] java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
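If you want the version check to be scriptable, the major version can be extracted from the `java -version` output. A small sketch, working on a sample output line (an assumption, since `java` may not be installed where you run this); on the server you would feed it with `java -version 2>&1 | head -n1`:

```shell
# Parse the quoted version string from a sample `java -version` line.
line='openjdk version "1.8.0_161"'
ver=$(echo "$line" | awk -F'"' '{print $2}')   # second quoted field -> 1.8.0_161
case "$ver" in
  1.8.*) echo "Java 8 detected" ;;
  *)     echo "unexpected version: $ver" ;;
esac
```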
Step #2: Installation of Scala
[root@sparkCentOs pawel] wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
[root@sparkCentOs pawel] tar xvf scala-2.11.8.tgz
[root@sparkCentOs pawel] sudo mv scala-2.11.8 /usr/lib
[root@sparkCentOs pawel] sudo ln -s /usr/lib/scala-2.11.8 /usr/lib/scala
[root@sparkCentOs pawel] export PATH=$PATH:/usr/lib/scala/bin
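The symlink above means a later Scala upgrade only requires repointing /usr/lib/scala at a new versioned directory, with PATH untouched. A stand-in demonstration of the same pattern (the `demo/` paths are throwaway scratch directories, not the real /usr/lib ones):

```shell
# Versioned directory plus a stable symlink, in a scratch location.
mkdir -p demo/scala-2.11.8/bin
ln -sfn scala-2.11.8 demo/scala   # link target is resolved relative to demo/
readlink demo/scala               # shows the versioned directory the link points at
# A later upgrade would just be: ln -sfn scala-2.12.x demo/scala
```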
Step #3: Installation of Apache Spark
# Download Spark
[root@sparkCentOs pawel] wget http://ftp.ps.pl/pub/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel] tar xf spark-2.3.1-bin-hadoop2.7.tgz
[root@sparkCentOs pawel] mkdir /usr/local/spark
[root@sparkCentOs pawel] cp -r spark-2.3.1-bin-hadoop2.7/* /usr/local/spark
[root@sparkCentOs pawel] export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
[root@sparkCentOs pawel] export PATH=$PATH:$HOME/bin:/usr/local/spark/bin
[root@sparkCentOs pawel] source ~/.bash_profile
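Exports typed at the prompt vanish when the shell exits, so for the `source ~/.bash_profile` step to matter, the variables have to be written into that file first. A minimal sketch using a stand-in profile file (on the server this would be ~/.bash_profile itself):

```shell
PROFILE=./bash_profile.demo             # stand-in for ~/.bash_profile
cat >> "$PROFILE" <<'EOF'
export SPARK_EXAMPLES_JAR=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
export PATH=$PATH:$HOME/bin:/usr/local/spark/bin
EOF
grep '^export' "$PROFILE"               # both export lines are now recorded
```

The quoted `<<'EOF'` heredoc keeps `$PATH` and `$HOME` literal in the file so they expand at login time, not at write time.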
Step #4: Run Spark Shell
Run the Spark shell to verify that Spark is working correctly.
[root@sparkCentOs pawel]# spark-shell
2018-08-20 19:57:30 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://sparkCentOs:4040
Spark context available as 'sc' (master = local[*], app id = local-1534795057680).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
Let’s type some code 🙂
scala> val data = spark.sparkContext.parallelize(
     |   Seq("I like Spark", "Spark is awesome",
     |     "My first Spark job is working now and is counting these words")
     | )
data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD at parallelize at <console>:23

scala> val wordCounts = data.flatMap(row => row.split(" ")).
     |   map(word => (word, 1)).reduceByKey(_ + _)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD at reduceByKey at <console>:25

scala> wordCounts.foreach(println)
(Spark,3)
(is,3)
(first,1)
(My,1)
(now,1)
(job,1)
(I,1)
(like,1)
(words,1)
(awesome,1)
(and,1)
(counting,1)
(working,1)
(these,1)
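For an input this small, the same counts can be cross-checked outside Spark with standard Unix tools, which makes a handy sanity test of the result above:

```shell
# Split the same three lines into words and count them with tr/sort/uniq;
# the top of the output should match Spark: 3 Spark, 3 is, 1 for the rest.
printf '%s\n' "I like Spark" "Spark is awesome" \
  "My first Spark job is working now and is counting these words" |
  tr ' ' '\n' | sort | uniq -c | sort -rn
```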
That’s all about how to install Apache Spark standalone on CentOS 7. Enjoy! For more information on installing and using Apache Spark, consult the Spark documentation on the Spark website.