In this short post I will show you how to rename or delete files that Apache Spark writes to HDFS.
Rename file / files
package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object Test extends App {

  val spark = SparkSession.builder
    // Master is set to local[*], because I run it on my local computer.
    // In production mode, master will be set from spark-submit.
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  // Create a FileSystem object from the Hadoop configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // Base path where Spark will produce the output file
  val basePath = "/bigdata_etl/spark/output"
  val newFileName = "renamed_spark_output"

  // Change the file name from the generic Spark one to the new one
  fs.rename(new Path(s"$basePath/part-00000"), new Path(s"$basePath/$newFileName"))
}
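Spark usually writes more than one part file, so renaming only part-00000 is not always enough. Here is a minimal sketch of how you could rename all of them with FileSystem.globStatus; it assumes the fs and basePath values from the example above, and the renamed_spark_output_$i target names are just an illustration:

  // A sketch: rename every part-* file in basePath, numbering the results.
  // Assumes fs and basePath from the example above; the target names are hypothetical.
  fs.globStatus(new Path(s"$basePath/part-*"))
    .zipWithIndex
    .foreach { case (status, i) =>
      fs.rename(status.getPath, new Path(s"$basePath/renamed_spark_output_$i"))
    }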
Removal of file / files
You can use the Scala process DSL (I described another example of it in the post: Scala: how to run a shell command from the code level?) or use the FileSystem class from the org.apache.hadoop.fs package.
package com.bigdataetl

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

import scala.sys.process._

object SparkDeleteFile extends App {

  val spark = SparkSession.builder
    // Master is set to local[*], because I run it on my local computer.
    // In production mode, master will be set from spark-submit.
    .master("local[*]")
    .appName("BigDataETL")
    .getOrCreate()

  // Create a FileSystem object from the Hadoop configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // Delete a directory recursively using the FileSystem class
  fs.delete(new Path("/bigdata_etl/data"), true)
  // Delete it using the Scala process DSL
  "hdfs dfs -rm -r /bigdata_etl/data/".!

  // Delete a single file (recursive = false)
  fs.delete(new Path("/bigdata_etl/data/file_to_delete.dat"), false)
  // Delete it using the Scala process DSL
  "hdfs dfs -rm /bigdata_etl/data/file_to_delete.dat".!
}
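Note that fs.delete typically just returns false rather than throwing when the path does not exist, but it can be clearer to check first. A small sketch, assuming the same fs object as above:

  // A sketch: delete the file only if it exists (assumes the fs object above)
  val target = new Path("/bigdata_etl/data/file_to_delete.dat")
  if (fs.exists(target)) {
    fs.delete(target, false) // recursive = false for a single file
  }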
If you enjoyed this post, please add a comment below or share it on Facebook, Twitter, LinkedIn, or another social media page.
Thanks in advance!
I needed to create an HDFS directory (mkdir) in a Spark/Scala application, and this really helped. Thanks!
Thanks John! I am very glad it has helped you! 🙂
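For anyone else looking for the mkdir part mentioned in the comment above: the same FileSystem object can create directories too. A minimal sketch, assuming the fs object from the examples above and a hypothetical target path:

  // Create an HDFS directory, including any missing parents (like 'hdfs dfs -mkdir -p')
  fs.mkdirs(new Path("/bigdata_etl/new_directory"))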