How to save data as ORC, Parquet, Text or CSV in Hive (or any other file type)? – 4 types, 1 easy approach!


In this post I will show you a few ways to export data from Hive to a CSV file, as well as how to save it in other formats such as ORC, Parquet or plain text. For this tutorial I have prepared a table called "test_csv_data" with a few records in it.
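The table definition itself is not shown in this post; a minimal sketch that matches the schema reported later in the driver log (id int, name varchar(20), age int) would be:

```sql
-- Minimal sketch of the demo table; column names and types taken
-- from the FieldSchema entries in the driver log further down.
CREATE TABLE test_csv_data (
  id   INT,
  name VARCHAR(20),
  age  INT
);
```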

Save Data ORC Parquet Text CSV in Hive

Method 1: INSERT OVERWRITE LOCAL DIRECTORY…

Please find the HiveQL syntax below. I use the "INSERT OVERWRITE LOCAL DIRECTORY" statement to create a CSV file from the result of "SELECT * FROM test_csv_data". With this method we have to execute the statement from the hive or beeline command line, or from Hue, for instance.

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/bigdataetl/hive2csv_insOverLocal' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','  
LINES TERMINATED BY "\n"
SELECT * FROM test_csv_data;
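The same statement can also write the other formats from the title: since Hive 0.11, "INSERT OVERWRITE [LOCAL] DIRECTORY" accepts a STORED AS clause, so replacing the ROW FORMAT section with STORED AS PARQUET or STORED AS ORC exports the query result in those formats instead. The target directories below are hypothetical, chosen to mirror the CSV example:

```sql
-- Hypothetical variants of the export above: same query, different file formats.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/bigdataetl/hive2parquet_insOverLocal'
STORED AS PARQUET
SELECT * FROM test_csv_data;

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/bigdataetl/hive2orc_insOverLocal'
STORED AS ORC
SELECT * FROM test_csv_data;
```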

Command output:

19/12/20 08:19:33 INFO ql.Driver: Compiling command(queryId=hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246): INSERT OVERWRITE LOCAL DIRECTORY '/tmp/bigdataetl/hive2csv_insOverLocal' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','  
LINES TERMINATED BY "\n"
SELECT * FROM test_csv_data
19/12/20 08:19:34 INFO ql.Driver: Semantic Analysis Completed
19/12/20 08:19:34 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:test_csv_data.id, type:int, comment:null), FieldSchema(name:test_csv_data.name, type:varchar(20), comment:null), FieldSchema(name:test_csv_data.age, type:int, comment:null)], properties:null)
19/12/20 08:19:34 INFO ql.Driver: Completed compiling command(queryId=hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246); Time taken: 0.108 seconds
19/12/20 08:19:34 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/12/20 08:19:34 INFO ql.Driver: Executing command(queryId=hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246): INSERT OVERWRITE LOCAL DIRECTORY '/tmp/bigdataetl/hive2csv_insOverLocal' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','  
LINES TERMINATED BY "\n"
SELECT * FROM test_csv_data
19/12/20 08:19:34 INFO ql.Driver: Query ID = hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246
19/12/20 08:19:34 INFO ql.Driver: Total jobs = 1
19/12/20 08:19:34 INFO ql.Driver: Launching Job 1 out of 1
19/12/20 08:19:34 INFO ql.Driver: Starting task [Stage-1:MAPRED] in serial mode
19/12/20 08:19:34 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
19/12/20 08:19:34 INFO mr.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
19/12/20 08:19:34 INFO mr.ExecDriver: adding libjars: file:/usr/lib/hive/lib/hive-contrib.jar
19/12/20 08:19:34 ERROR mr.ExecDriver: yarn
19/12/20 08:19:34 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/12/20 08:19:34 INFO exec.Task: Starting Job = job_1576766377926_0012, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1576766377926_0012/
19/12/20 08:19:34 INFO exec.Task: Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1576766377926_0012
19/12/20 08:19:40 INFO exec.Task: Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
19/12/20 08:19:40 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
19/12/20 08:19:40 INFO exec.Task: 2019-12-20 08:19:40,838 Stage-1 map = 0%,  reduce = 0%
19/12/20 08:19:47 INFO exec.Task: 2019-12-20 08:19:47,130 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.6 sec
19/12/20 08:19:48 INFO exec.Task: MapReduce Total cumulative CPU time: 1 seconds 600 msec
19/12/20 08:19:48 INFO exec.Task: Ended Job = job_1576766377926_0012
19/12/20 08:19:48 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/12/20 08:19:48 INFO exec.Task: Copying data to local directory /tmp/bigdataetl/hive2csv_insOverLocal from hdfs://quickstart.cloudera:8020/tmp/hive/cloudera/ee559ee2-25dd-41b7-bd65-a0767a7a03b2/hive_2019-12-20_08-19-33_928_8164278149162791118-1/-mr-10000
19/12/20 08:19:48 INFO ql.Driver: MapReduce Jobs Launched: 
19/12/20 08:19:48 INFO ql.Driver: Stage-Stage-1: Map: 1   Cumulative CPU: 1.6 sec   HDFS Read: 3712 HDFS Write: 70 SUCCESS
19/12/20 08:19:48 INFO ql.Driver: Total MapReduce CPU Time Spent: 1 seconds 600 msec
19/12/20 08:19:48 INFO ql.Driver: Completed executing command(queryId=hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246); Time taken: 14.245 seconds
19/12/20 08:19:48 INFO ql.Driver: OK

Method 2: hive -e

With this method we first create a shell script and then execute it. In short: the script fetches the data from Hive and writes the output to a file.

Create shell script

[root@quickstart bigdataetl]# vim hive2csv_hive.sh

# Put this command into shell file
hive -e 'select * from test_csv_data' | sed 's/[\t]/,/g'  > hive2csv_hive.csv

Execute the script and see the command output.

[root@quickstart bigdataetl]# ./hive2csv_hive.sh 
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.161 seconds, Fetched: 8 row(s)
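Hive's default CLI output is tab-separated, so the sed expression in the script simply swaps tabs for commas. You can verify that substitution locally without a cluster; the two sample rows below are made up:

```shell
# Hive prints columns separated by tabs; this is the same sed substitution
# from the script, applied to two made-up rows instead of real Hive output.
printf '1\tTom\t25\n2\tAnna\t31\n' | sed 's/[\t]/,/g'
```

Note that interpreting \t inside the bracket expression as a tab is a GNU sed behavior; on other sed implementations you may need to type a literal tab character instead.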

Method 3: beeline

beeline -u 'jdbc:hive2://localhost:10000/default' --outputformat=csv2 -f hive2csv_beeline.sql > hive2csv_beeline.csv
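The -f flag points beeline at a SQL file. The contents of hive2csv_beeline.sql are not shown here, but judging from the "Executing command" line in the log below it holds just the export query:

```sql
-- Assumed contents of hive2csv_beeline.sql
SELECT * FROM test_csv_data;
```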

Command output:

scan complete in 4ms
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.7.0)
Driver: Hive JDBC (version 1.1.0-cdh5.7.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:test_csv_data.id, type:int, comment:null), FieldSchema(name:test_csv_data.name, type:varchar(20), comment:null), FieldSchema(name:test_csv_data.age, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191220080303_a0466887-2fba-4b57-a6cc-81e11243c81c); Time taken: 0.113 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191220080303_a0466887-2fba-4b57-a6cc-81e11243c81c): select * from test_csv_data
INFO  : Completed executing command(queryId=hive_20191220080303_a0466887-2fba-4b57-a6cc-81e11243c81c); Time taken: 0.002 seconds
INFO  : OK
8 rows selected (0.371 seconds)
Closing: 0: jdbc:hive2://localhost:10000/default

Results

Below you can find the generated files under the /tmp/bigdataetl path:

[root@quickstart bigdataetl]# find "$(pwd)"
/tmp/bigdataetl
/tmp/bigdataetl/hive2csv_beeline.csv
/tmp/bigdataetl/hive2csv_insOverLocal
/tmp/bigdataetl/hive2csv_insOverLocal/000000_0
/tmp/bigdataetl/hive2csv_insOverLocal/.000000_0.crc
/tmp/bigdataetl/hive2csv_beeline.sql
/tmp/bigdataetl/hive2csv_hive.csv
/tmp/bigdataetl/hive2csv_hive.sh

That's all about how to save data as ORC, Parquet, Text or CSV in Hive!

If you enjoyed this post, please add a comment below and share it on Facebook, Twitter, LinkedIn or another social media site.
Thanks in advance!
