Apache Spark: W jaki sposób sprawdzić czy plik istnieje na HDFS?

Problem

Wykorzystamy do tego klase FileSystem oraz Path z biblioteki org.apache.hadoop.fs.

Spark 2.0 lub wyższy

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object Test extends App {

  val spark = SparkSession.builder
    // I set master to local[*], because I run it on my local computer.
    // I production mode master will be set from spark-submit command.
    .master("local[*]")
    .appName("BigDataETL - Check if file exists")
    .getOrCreate()

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // This methods returns Boolean (true - if file exists, false - if file doesn't exist
  val fileExists = fs.exists(new Path("<parh_to_file>"))

  if (fileExists) println("File exists!")
  else println("File doesn't exist!")

}

 

 Od Spark 1.6 do 2.0

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sparkConf = new SparkConf().setAppName(s"BigDataETL - Check if file exists")
  val sc = new SparkContext(sparkConf)

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(sc.hadoopConfiguration)

  // This methods returns Boolean (true - if file exists, false - if file doesn't exist
  val fileExists = fs.exists(new Path("<parh_to_file>"))

  if (fileExists) println("File exists!")
  else println("File doesn't exist!")

}

 

 

Jeśli spodobał Ci się ten post to napisz proszę komentarz poniżej oraz udostępnij ten post na swoim Facebook’u, Twitter’ze, LinkedIn lub innej stronie z mediami społecznościowymi.
Z góry dzięki!

Please follow and like us:

Dodaj komentarz

Close Menu
Social media & sharing icons powered by UltimatelySocial

Enjoy this blog? Please spread the word :)