hdfs - Inserting records in Hive by appending -


I am executing INSERT statements on a Hive external table. I noticed that with each new INSERT, a new file is created in the HDFS path referenced by the external table. My questions are:

  1. Is it possible to have newly inserted data appended to an existing file instead of creating new files?

  2. Can I control this in such a way that once a file reaches a certain size, say 1 MB, Hive creates a new file in its place for incoming inserts?

Cloudera's documentation says:

The INSERT syntax appends data to a table. Existing data files are left as-is, and the inserted data is put into one or more new data files.

Hive appends to the table, not to the underlying files.

You can coerce Hive to rebuild the table using CREATE TABLE ... AS SELECT while forcing the number of reducers to one. This copies the fragmented files into one table and combines them into a single location in HDFS. You can then swap the files in HDFS.

You could place the files in a holding area, check the size of the files there in HDFS, and move them on as above. It may be easier to temporarily hold the files on the local file system and then move them over.
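As a sketch of that holding-area idea, assuming hypothetical paths /staging/mytable and /warehouse/mytable and a configured HDFS client, the size check and move could look like this:

```shell
#!/usr/bin/env bash
# Sketch only: /staging/mytable and /warehouse/mytable are hypothetical
# paths, and this requires a working hdfs client on the machine.

STAGING=/staging/mytable
TARGET=/warehouse/mytable
THRESHOLD=$((1024 * 1024))   # 1 MB, per the question

# Total size in bytes of everything under the staging directory
SIZE=$(hdfs dfs -du -s "$STAGING" | awk '{print $1}')

if [ "$SIZE" -ge "$THRESHOLD" ]; then
    # Move the accumulated files into the table's location in one step
    hdfs dfs -mv "$STAGING"/* "$TARGET"/
fi
```

The same check could be done against a local holding directory with du before pushing the files into HDFS with hdfs dfs -put.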

To combine the files into a new file using Hive, you can try:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.compress.intermediate=false;
set hive.exec.compress.output=false;
set hive.exec.reducers.max=1;

create table if not exists db.table stored as textfile as select * from db.othertable;

Here db.othertable is the table with multiple fragmented files. db.table will contain a single text file with the combined data.
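As an alternative to the CTAS workaround above, Hive can also merge small output files at the end of a job, or compact files in place for ORC tables. Whether these settings take effect depends on your Hive version and execution engine, so treat this as a sketch:

```sql
-- Ask Hive to merge small output files at the end of map/reduce jobs.
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
-- Trigger a merge when the average output file size is below this (bytes)
set hive.merge.smallfiles.avgsize=16000000;
-- Target size for the merged files (bytes)
set hive.merge.size.per.task=256000000;

-- For ORC tables only: rewrite small files into larger ones in place
-- (db.table is the table from the example above)
ALTER TABLE db.table CONCATENATE;
```

With these merge settings enabled, subsequent INSERTs still create new files, but Hive consolidates them as part of the job rather than requiring a manual rebuild.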

I know this is not ideal and is more of a workaround.
