
Creating a Spark Session object, which tells Spark how to access a cluster, is the first thing a Spark application must do. The SparkSession holds the details of your application, and through it you obtain the SparkContext and SQLContext instances that expose Spark's functionality.

At its core, every Spark application consists of a driver program that runs the user's main function and executes a number of parallel operations on a cluster.


Spark Free Tutorials

This post is a part of the Spark Free Tutorial series. Check out the rest of the Spark tutorials, which you can find on the right sidebar of this page! Stay tuned!


What Is RDD?

The primary abstraction Spark offers is the resilient distributed dataset (RDD), a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. RDDs are created by parallelizing an existing Scala collection in the driver application, or by referencing a file in the Hadoop file system (or any other file system supported by Hadoop) as the starting point for a new RDD.

Additionally, users can ask Spark to persist an RDD in memory so that it can be reused efficiently across parallel operations. RDDs also recover automatically from node failures.
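
Here is a minimal sketch of both ideas (the file path is only an illustrative assumption, and the builder call is explained later in this post):

import org.apache.spark.sql.SparkSession

object RddExample extends App {
  // A local session just for this sketch.
  val spark = SparkSession.builder().master("local[*]").appName("rdd-example").getOrCreate()
  val sc = spark.sparkContext

  // Create an RDD by parallelizing an existing Scala collection in the driver...
  val numbers = sc.parallelize(1 to 1000)

  // ...or by referencing a file in HDFS or another Hadoop-supported file system (hypothetical path):
  // val lines = sc.textFile("hdfs:///data/input.txt")

  // Ask Spark to keep the RDD in memory so it can be reused by several operations.
  numbers.cache()

  println(numbers.sum())   // the first action computes the partitions and caches them
  println(numbers.count()) // the second action reuses the cached data

  spark.stop()
}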

Spark Driver VS Spark Executor

Each Spark application consists of:

Spark Driver is like a boss: it manages the whole application. It decides which part of the job will be done on which Executor and collects information from the Executors about task statuses.

Spark Executors are the workers: they run the tasks assigned by the driver and report their status back to it.

The communication must be bidirectional. In the Hadoop world, when an application is submitted to YARN for acceptance, the requested resources have to be granted. The Spark Driver is then started on one of the Hadoop nodes and the executors on the other nodes (the Spark Driver can also run on the same machine as one of the executors).
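
For example, when running on YARN you can declare how many executors to request and how much memory they get when the session is built. This is only a minimal sketch: the master URL, instance count and memory sizes below are illustrative assumptions, not values taken from this post.

import org.apache.spark.sql.SparkSession

// A sketch of requesting resources from YARN: the driver is placed on one node,
// and YARN starts the requested executors on other nodes of the cluster.
val spark = SparkSession
  .builder()
  .master("yarn")                          // submit to a YARN resource manager
  .appName("driver-and-executors-example")
  .config("spark.executor.instances", "4") // how many executors to request
  .config("spark.executor.memory", "2g")   // memory per executor
  .config("spark.executor.cores", "2")     // cores per executor
  .getOrCreate()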


Spark Session VS Spark Context

Spark Session is the main object in Spark – it’s the entry point of each Spark application.

Spark Context is a field of the Spark Session object and is used to operate on RDDs.

SQL Context, similarly, is a field of the Spark Session object and is used to execute operations on DataFrames and Datasets.
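
A short sketch of how these objects relate in code (assuming a Spark Session built as shown later in this post):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("contexts-example").getOrCreate()

// The Spark Context lives inside the Spark Session and is used to work with RDDs.
val sc = spark.sparkContext
val rdd = sc.parallelize(Seq(1, 2, 3))

// The SQL Context is also exposed by the Spark Session; it works with DataFrames and Datasets,
// although in modern Spark you usually call these methods on the session itself.
val sqlContext = spark.sqlContext
val df = spark.range(3).toDF("id")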

To visualize these dependencies: the Spark Session is the outer object that contains both the Spark Context and the SQL Context.

Create SparkSession Object

First of all, to start working with Spark we need to create a SparkSession instance. It takes just a few lines of code. That's all! Now you can start your journey with Apache Spark!

  import org.apache.spark.sql.SparkSession

  val spark: SparkSession = SparkSession
    .builder()
    // Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]" to run
    // locally with 4 cores, "local[*]" to run with all available cores, or "spark://master:7077"
    // to run on a Spark standalone cluster.
    .master("local[*]")
    .appName("BigData-ETL.com")
    .getOrCreate()
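
As a quick sanity check (a minimal sketch), you can print the version reported by the session and run a trivial query before stopping it:

  // Print the Spark version reported by the session.
  println(s"Spark version: ${spark.version}")

  // Create a tiny Dataset and print it to the console.
  spark.range(5).show()

  // Release the resources when the application is done.
  spark.stop()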

Summary

That’s all about how to create a Spark Session in Scala. Now you are ready to write your first Spark application. Don’t waste time, let’s go to the next section!

Could You Please Share This Post? 
I appreciate It And Thank YOU! :)
Have A Nice Day!
