In this post I will show you a few ways to export data from Hive to a CSV file, and how the same approach extends to other formats such as ORC, Parquet and plain text. For this tutorial I have prepared a table, “test_csv_data”, and loaded a few records into it.
Save Data ORC Parquet Text CSV in Hive
Method 1: INSERT OVERWRITE LOCAL DIRECTORY…
Please find the HiveQL syntax below. I use the “INSERT OVERWRITE LOCAL DIRECTORY” statement to write the result of “SELECT * FROM test_csv_data” to a CSV file. With this method we have to execute the HiveQL through the hive or beeline command line, or through Hue, for instance.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/bigdataetl/hive2csv_insOverLocal' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY "\n" SELECT * FROM test_csv_data;
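The same SELECT can also feed other storage formats. A minimal sketch using CREATE TABLE AS SELECT (CTAS), where the target table names are illustrative and not from the original post:

```sql
-- Assumption: table names test_data_parquet / test_data_orc are made up for this example.
-- CTAS writes the query result into a new table stored as Parquet:
CREATE TABLE test_data_parquet STORED AS PARQUET AS
SELECT * FROM test_csv_data;

-- The same pattern works for ORC:
CREATE TABLE test_data_orc STORED AS ORC AS
SELECT * FROM test_csv_data;
```

The resulting files then live under the table's location in HDFS instead of a local directory.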
Command output:
19/12/20 08:19:33 INFO ql.Driver: Compiling command(queryId=hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246): INSERT OVERWRITE LOCAL DIRECTORY '/tmp/bigdataetl/hive2csv_insOverLocal' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY "\n" SELECT * FROM test_csv_data
19/12/20 08:19:34 INFO ql.Driver: Semantic Analysis Completed
19/12/20 08:19:34 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:test_csv_data.id, type:int, comment:null), FieldSchema(name:test_csv_data.name, type:varchar(20), comment:null), FieldSchema(name:test_csv_data.age, type:int, comment:null)], properties:null)
19/12/20 08:19:34 INFO ql.Driver: Completed compiling command(queryId=hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246); Time taken: 0.108 seconds
19/12/20 08:19:34 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/12/20 08:19:34 INFO ql.Driver: Executing command(queryId=hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246): INSERT OVERWRITE LOCAL DIRECTORY '/tmp/bigdataetl/hive2csv_insOverLocal' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY "\n" SELECT * FROM test_csv_data
19/12/20 08:19:34 INFO ql.Driver: Query ID = hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246
19/12/20 08:19:34 INFO ql.Driver: Total jobs = 1
19/12/20 08:19:34 INFO ql.Driver: Launching Job 1 out of 1
19/12/20 08:19:34 INFO ql.Driver: Starting task [Stage-1:MAPRED] in serial mode
19/12/20 08:19:34 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
19/12/20 08:19:34 INFO mr.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
19/12/20 08:19:34 INFO mr.ExecDriver: adding libjars: file:/usr/lib/hive/lib/hive-contrib.jar
19/12/20 08:19:34 ERROR mr.ExecDriver: yarn
19/12/20 08:19:34 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/12/20 08:19:34 INFO exec.Task: Starting Job = job_1576766377926_0012, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1576766377926_0012/
19/12/20 08:19:34 INFO exec.Task: Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1576766377926_0012
19/12/20 08:19:40 INFO exec.Task: Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
19/12/20 08:19:40 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
19/12/20 08:19:40 INFO exec.Task: 2019-12-20 08:19:40,838 Stage-1 map = 0%, reduce = 0%
19/12/20 08:19:47 INFO exec.Task: 2019-12-20 08:19:47,130 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.6 sec
19/12/20 08:19:48 INFO exec.Task: MapReduce Total cumulative CPU time: 1 seconds 600 msec
19/12/20 08:19:48 INFO exec.Task: Ended Job = job_1576766377926_0012
19/12/20 08:19:48 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/12/20 08:19:48 INFO exec.Task: Copying data to local directory /tmp/bigdataetl/hive2csv_insOverLocal from hdfs://quickstart.cloudera:8020/tmp/hive/cloudera/ee559ee2-25dd-41b7-bd65-a0767a7a03b2/hive_2019-12-20_08-19-33_928_8164278149162791118-1/-mr-10000
19/12/20 08:19:48 INFO ql.Driver: MapReduce Jobs Launched:
19/12/20 08:19:48 INFO ql.Driver: Stage-Stage-1: Map: 1 Cumulative CPU: 1.6 sec HDFS Read: 3712 HDFS Write: 70 SUCCESS
19/12/20 08:19:48 INFO ql.Driver: Total MapReduce CPU Time Spent: 1 seconds 600 msec
19/12/20 08:19:48 INFO ql.Driver: Completed executing command(queryId=hive_20191220081919_33941327-d364-42ea-bfa4-7eb2739ef246); Time taken: 14.245 seconds
19/12/20 08:19:48 INFO ql.Driver: OK
Method 2: hive -e
With this method we first have to create a shell/bash script and then execute it. In short: the script fetches data from Hive using "hive -e" and writes the output to a file.
Create shell script
[root@quickstart bigdataetl]# vim hive2csv_hive.sh

# Put this command into the shell file
hive -e 'select * from test_csv_data' | sed 's/[\t]/,/g' > hive2csv_hive.csv
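The sed substitution in the script can be tried without a Hive installation. Here is a minimal sketch that feeds tab-separated rows (the format "hive -e" prints) through the same command; the sample values are made up, and interpreting "\t" inside the bracket expression relies on GNU sed:

```shell
# Simulate tab-separated rows like those printed by `hive -e 'select * from test_csv_data'`
# (the sample ids/names/ages are illustrative, not from the real table),
# then apply the same sed substitution to turn tabs into commas (GNU sed).
printf '1\tJohn\t30\n2\tAnna\t25\n' \
  | sed 's/[\t]/,/g' > /tmp/hive2csv_demo.csv
cat /tmp/hive2csv_demo.csv
```

Note that this simple substitution breaks if the data itself contains tabs or commas; for such data the beeline method below with --outputformat=csv2 is safer because it quotes values.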
Execute the script and see the command output.
[root@quickstart bigdataetl]# ./hive2csv_hive.sh
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.161 seconds, Fetched: 8 row(s)
Method 3: beeline
beeline -u 'jdbc:hive2://localhost:10000/default' --outputformat=csv2 -f hive2csv_beeline.sql > hive2csv_beeline.csv
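The -f flag tells beeline to run the statements from the file hive2csv_beeline.sql, whose contents the post does not show. A minimal sketch, assuming it holds the same query as the other methods:

```sql
-- Assumed contents of hive2csv_beeline.sql (not shown in the original post)
SELECT * FROM test_csv_data;
```

With --outputformat=csv2 beeline emits proper CSV (values containing commas are quoted), which is then redirected to hive2csv_beeline.csv.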
Command output:
scan complete in 4ms
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.7.0)
Driver: Hive JDBC (version 1.1.0-cdh5.7.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:test_csv_data.id, type:int, comment:null), FieldSchema(name:test_csv_data.name, type:varchar(20), comment:null), FieldSchema(name:test_csv_data.age, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20191220080303_a0466887-2fba-4b57-a6cc-81e11243c81c); Time taken: 0.113 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20191220080303_a0466887-2fba-4b57-a6cc-81e11243c81c): select * from test_csv_data
INFO : Completed executing command(queryId=hive_20191220080303_a0466887-2fba-4b57-a6cc-81e11243c81c); Time taken: 0.002 seconds
INFO : OK
8 rows selected (0.371 seconds)
Closing: 0: jdbc:hive2://localhost:10000/default
Results
Below you can find the generated files under the /tmp/bigdataetl path:
[root@quickstart bigdataetl]# find "$(pwd)"
/tmp/bigdataetl
/tmp/bigdataetl/hive2csv_beeline.csv
/tmp/bigdataetl/hive2csv_insOverLocal
/tmp/bigdataetl/hive2csv_insOverLocal/000000_0
/tmp/bigdataetl/hive2csv_insOverLocal/.000000_0.crc
/tmp/bigdataetl/hive2csv_beeline.sql
/tmp/bigdataetl/hive2csv_hive.csv
/tmp/bigdataetl/hive2csv_hive.sh
That’s all about How to Save Data ORC Parquet Text CSV in Hive!
If you enjoyed this post, please leave a comment below and share it on Facebook, Twitter, LinkedIn or other social media.
Thanks in advance!