[SOLVED] Apache Spark Check If The File Exists On HDFS? – 1 Min Solution!

Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.

HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. It relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project and is now an Apache Hadoop subproject. The project URL is https://hadoop.apache.org/hdfs/.

Source: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

Problem

To check from Apache Spark whether a file exists on HDFS, we will use the FileSystem and Path classes from the org.apache.hadoop.fs package.

Spark 2.0 or higher

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object Test extends App {

  val spark = SparkSession.builder
    // Master is set to local[*] because this example runs on a local machine.
    // In production, the master is set by the spark-submit command.
    .master("local[*]")
    .appName("BigDataETL - Check if file exists")
    .getOrCreate()

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // exists returns a Boolean: true if the path exists, false otherwise
  val fileExists = fs.exists(new Path("<path_to_file>"))

  if (fileExists) println("File exists!")
  else println("File doesn't exist!")

}


Spark 1.6 to 2.0

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sparkConf = new SparkConf().setAppName("BigDataETL - Check if file exists")
  val sc = new SparkContext(sparkConf)

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(sc.hadoopConfiguration)

  // exists returns a Boolean: true if the path exists, false otherwise
  val fileExists = fs.exists(new Path("<path_to_file>"))

  if (fileExists) println("File exists!")
  else println("File doesn't exist!")

}
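
Note that FileSystem.get returns the default file system from the Hadoop configuration, and exists returns true for directories as well as files. The sketch below is one possible variation, not the only way to do it: it resolves the file system from the path itself (so fully qualified URIs such as hdfs://... or file://... also work) and distinguishes regular files from directories. The object name HdfsChecks and its method names are hypothetical helpers introduced here for illustration; they are not part of Spark or Hadoop.

import java.io.FileNotFoundException

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper object, shown only as an illustration.
object HdfsChecks {

  // True if the path exists, whether it is a file or a directory.
  def pathExists(pathStr: String, conf: Configuration): Boolean = {
    val path = new Path(pathStr)
    // Resolve the file system from the path's scheme; a path without
    // a scheme falls back to the default file system in conf.
    val fs = path.getFileSystem(conf)
    fs.exists(path)
  }

  // True only if the path exists and is a regular file (not a directory).
  def isExistingFile(pathStr: String, conf: Configuration): Boolean = {
    val path = new Path(pathStr)
    val fs = path.getFileSystem(conf)
    // getFileStatus throws FileNotFoundException for a missing path.
    try fs.getFileStatus(path).isFile
    catch { case _: FileNotFoundException => false }
  }
}

With a SparkSession named spark, as in the Spark 2.0 example, it could be called as HdfsChecks.isExistingFile("<path_to_file>", spark.sparkContext.hadoopConfiguration).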

That's all for this topic: checking from Apache Spark whether a file exists on HDFS. Enjoy!
