Apache Spark: how to rename or delete a file from HDFS?

In this short post I will show you how to change the name of the file(s) that Apache Spark writes to HDFS, or simply rename or delete any file.

Rename file / files

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object Test extends App {

  val spark = SparkSession.builder
    // I set master to local[*], because I run it on my local computer.
    // In production mode, the master will be set from the spark-submit parameters.
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // Base path where Spark will produce the output file
  val basePath = "/bigdata_etl/spark/output"
  val newFileName = "renamed_spark_output"

  // Change file name from Spark generic to new one
  fs.rename(new Path(s"$basePath/part-00000"), new Path(s"$basePath/$newFileName"))

}
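
Keep in mind that, depending on the Spark version and the output format, the generated file is usually not named exactly part-00000 but something like part-00000-<uuid>.csv. Below is a minimal sketch (the object name and paths are just examples, and it assumes a single part file was produced, e.g. after coalesce(1)) that finds the part file with globStatus before renaming it:

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object RenamePartFile extends App {

  val spark = SparkSession.builder
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  val basePath = "/bigdata_etl/spark/output"
  val newFileName = "renamed_spark_output"

  // Find the part file regardless of the suffix Spark appended to it
  val partFiles = fs.globStatus(new Path(s"$basePath/part-*"))

  // Rename only if exactly one part file was produced
  if (partFiles != null && partFiles.length == 1) {
    fs.rename(partFiles.head.getPath, new Path(s"$basePath/$newFileName"))
  }

}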

 

Removal of file / files

You can use the Scala process DSL (I described another example in the post: Scala: how to run a shell command from the code level?) or the FileSystem class from the org.apache.hadoop.fs package.

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession
import scala.sys.process._

object SparkDeleteFile extends App {
  val spark = SparkSession.builder
    // I set master to local[*], because I run it on my local computer.
    // In production mode, the master will be set from the spark-submit parameters.
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // Delete directories recursively using FileSystem class
  fs.delete(new Path("/bigdata_etl/data"), true)
  // Delete using the Scala process DSL
  "hdfs dfs -rm -r /bigdata_etl/data".!

  // Delete a single file using the FileSystem class (recursive flag set to false)
  fs.delete(new Path("/bigdata_etl/data/file_to_delete.dat"), false)
  // Delete using the Scala process DSL
  "hdfs dfs -rm /bigdata_etl/data/file_to_delete.dat".!

}
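
Note that fs.delete simply returns false when the path does not exist, so an extra check is not strictly required, but if you want to be explicit you can guard the call with fs.exists. A minimal sketch (the object name and path are just examples):

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object SparkDeleteFileSafely extends App {

  val spark = SparkSession.builder
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // Example path - adjust to your own data
  val fileToDelete = new Path("/bigdata_etl/data/file_to_delete.dat")

  // Delete only if the file exists; the second argument disables recursive delete
  if (fs.exists(fileToDelete)) {
    fs.delete(fileToDelete, false)
  }

}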

 

If you enjoyed this post, please leave a comment below or share it on Facebook, Twitter, LinkedIn or another social media site.
Thanks in advance!
