Hello! In this short tutorial I will give you a hint how you can convert the data in Hive from one to another format without any additional application.
Hint: Just copy data between Hive tables
Let’s concern the following scenario:
- You have data in CSV format in table “data_in_csv”
- You would like to have the same data but in ORC format in table “data_in_parquet”
Step #1 – Make copy of table but change the “STORED” format
You have table in CSV format like below:
CREATE TABLE data_in_csv ( id Int, name String, age Int ) PARTITIONED BY (INGESTION_ID BIGINT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ',', "quoteChar" = '"', "escapeChar" = '\' ) STORED AS TEXTFILE;
Now we will create the same table but in ORC format:
CREATE TABLE data_in_orc ( id int, name string, age int ) PARTITIONED BY (INGESTION_ID BIGINT) STORED AS ORC tblproperties ("orc.compress"="SNAPPY");
Step #2 – Copy the data between tables
Now, when you have created these two tables we will just copy the data from first to new one. The conversion will be done by Hive engine. You don’t have to know how it was performed 🙂
INSERT OVERWRITE TABLE DATA_IN_ORC PARTITION (INGESTION_ID) SELECT ID, NAME, AGE, INGESTION_ID FORM DATA_IN_CSV;
For Avro and Parquet examples
-- Avro format CREATE TABLE data_in_avro ( id int, name string, age int ) PARTITIONED BY (INGESTION_ID BIGINT) STORED AS AVRO; -- Parquet format CREATE TABLE data_in_parquet ( id int, name string, age int ) PARTITIONED BY (LOADING_DATE STRING) STORED AS STORED AS PARQUET;
If you enjoyed this post please add the comment below or share this post on your Facebook, Twitter, LinkedIn or another social media webpage.
Thanks in advanced!
good
Thanks! 🙂