hdfs - Inserting records into Hive by appending -
I am executing INSERT statements on a Hive external table. I noticed that with each new INSERT, a new file is created in the HDFS path referenced by the external table. My questions are:
Is it possible to have newly inserted data appended to an existing file instead of a new file being created each time?
Can I control this in such a way that Hive only creates a new file once the current one reaches a given size, e.g. 1 MB, and places incoming inserts into it until then?
Cloudera's documentation says:
"The INSERT syntax appends data to a table. Existing data files are left as-is, and the inserted data is put into one or more new data files."
So Hive appends to the table, not to the underlying files.
You can coerce Hive to build compacted tables by using CREATE TABLE ... AS SELECT and forcing the number of reducers to one. This copies the fragmented files into one table and combines them in a single location in HDFS; you can then swap the files in HDFS.
Alternatively, you could place incoming files in a holding area, check the total size of the files there in HDFS, and move them over as above once a threshold is reached. It may be easier to temporarily hold the files on the local file system and then move them over.
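As a rough sketch of that holding-area idea, assuming the files are staged on the local file system first: the directory, the 1 MB threshold, and the HDFS target path below are all illustrative stand-ins, not from the original post. Note that `du -sb` (bytes) is GNU-specific.

```shell
#!/usr/bin/env bash
# Hypothetical local holding area; replace with your own staging directory.
HOLD=$(mktemp -d)
THRESHOLD=$((1024 * 1024))   # 1 MB, matching the size mentioned in the question

# Simulate a few small incoming files (stand-ins for per-insert output files).
printf 'row-a\n' > "$HOLD/part-1"
printf 'row-b\n' > "$HOLD/part-2"

# Total bytes currently held (du -sb reports apparent size in bytes, GNU coreutils).
total=$(du -sb "$HOLD" | cut -f1)

if [ "$total" -ge "$THRESHOLD" ]; then
    # Placeholder target path; the real path would be the external table's location.
    echo "threshold reached: would run: hdfs dfs -put $HOLD/* /path/to/external/table/"
else
    echo "holding: $total bytes so far"
fi
```

A cron job or a small daemon could run this check periodically and push the batch to HDFS only when the threshold is crossed.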
To combine the fragmented files into a new file using Hive, you can try:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.compress.intermediate=false;
set hive.exec.compress.output=false;
set hive.exec.reducers.max=1;
create table if not exists db.table stored as textfile as select * from db.othertable;

Here db.othertable is the table that has multiple fragmented files, and db.table will end up with a single text file containing the combined data.
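To then swap the compacted output into the external table's location, the move might look like the following. These commands only work against a running Hadoop cluster, and every path shown is a hypothetical stand-in for your actual warehouse layout:

```
# Hypothetical paths; adjust to your warehouse and external table locations.
# Clear the fragmented files from the external table's directory.
hdfs dfs -rm /user/hive/external/mytable/*
# Move the single compacted file produced by the CTAS into place.
hdfs dfs -mv /user/hive/warehouse/db.db/table/* /user/hive/external/mytable/
```

After the move, a `SELECT` against the external table should return the same rows as before, now backed by one file.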
I know this is not ideal and is more of a workaround.