Apache Spark: how to rename or delete a file from HDFS?

In this short post I will show you how to change the name of the file (or files) that Apache Spark writes to HDFS, and more generally how to rename or delete any file on HDFS.

Rename file / files

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object Test extends App {

  val spark = SparkSession.builder
    // I set master to local[*] because I run it on my local computer.
    // In production mode the master will be set from the spark-submit parameters.
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // Base path where Spark produces the output file
  val basePath = "/bigdata_etl/spark/output"
  val newFileName = "renamed_spark_output"

  // Rename the Spark-generated file (part-00000) to the new name
  fs.rename(new Path(s"$basePath/part-00000"), new Path(s"$basePath/$newFileName"))

}
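
Note that fs.rename returns a Boolean (false when the rename fails) instead of throwing an exception, and a Spark job typically writes one part-* file per partition, not just part-00000. Below is a minimal sketch that renames every part file with an index suffix; the object name RenameAllPartFiles and the suffix scheme are my own, so adjust them to your layout.

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object RenameAllPartFiles extends App {

  val spark = SparkSession.builder
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  val basePath = "/bigdata_etl/spark/output"
  val newFileName = "renamed_spark_output"

  // Find every part-* file produced by Spark and rename it with an index suffix,
  // e.g. part-00000 -> renamed_spark_output-0
  fs.globStatus(new Path(s"$basePath/part-*")).zipWithIndex.foreach {
    case (status, i) =>
      val renamed = fs.rename(status.getPath, new Path(s"$basePath/$newFileName-$i"))
      if (!renamed) println(s"Could not rename ${status.getPath}")
  }

}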


Removal of file / files

You can use the Scala process DSL (I described another example of it in the post: Scala: how to run a shell command from the code level?) or the FileSystem class from the org.apache.hadoop.fs package.

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession
import scala.sys.process._

object SparkDeleteFile extends App {

  val spark = SparkSession.builder
    // I set master to local[*] because I run it on my local computer.
    // In production mode the master will be set from the spark-submit parameters.
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  // Create FileSystem object from Hadoop Configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // Delete directories recursively using FileSystem class
  fs.delete(new Path("/bigdata_etl/data"), true)
  // Delete using Scala DSL
  s"hdfs dfs -rm -r /bigdata_etl/data/" !

  // Delete file
  fs.removeAcl(new Path("/bigdata_etl/data/file_to_delete.dat"))
  // Delete using Scala DSL
  s"hdfs dfs -rm /bigdata_etl/data/file_to_delete.dat" !

}
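
Like rename, fs.delete returns a Boolean (false when the path does not exist) rather than throwing, so an explicit fs.exists check is useful when you want meaningful logging. Below is a minimal sketch along those lines; the object name SafeDeleteFile is my own, and it assumes the same paths as above.

package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object SafeDeleteFile extends App {

  val spark = SparkSession.builder
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  val target = new Path("/bigdata_etl/data/file_to_delete.dat")

  // fs.delete simply returns false when the path is missing,
  // so check fs.exists first to log what actually happened
  if (fs.exists(target)) {
    val deleted = fs.delete(target, false) // false = do not recurse (plain file)
    println(s"Deleted $target: $deleted")
  } else {
    println(s"$target does not exist, nothing to delete")
  }

}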


If you enjoyed this post, please leave a comment below or share it on Facebook, Twitter, LinkedIn or another social media site.
Thanks in advance!
