hadoop - How to pick a dynamic file name from HDFS while inserting into a Hive table
I have a Hive table and need to write a workflow that runs as a daily job and looks for that day's file in this location:
/data/data_yyyy-mm-dd.csv
/data/data_2015-07-07.csv
/data/data_2015-07-08.csv
...
Each day the workflow should automatically pick up the file name and load the data into the Hive table (mytable).
The load script I am writing looks like this:
LOAD DATA INPATH "/data/${filepath}" OVERWRITE INTO TABLE mytable;
When I run this as a plain Hive job I can set filepath to data_2015-07-07.csv myself, but how can an Oozie coordinator pick the path name automatically based on the date?
I tried setting the workflow parameter in the Oozie coordinator to:
clicklog_${yyyy}-{month}-{day}.csv
After checking the Oozie coordinator documentation, I found the solution, and it is simple and straightforward: whatever value is configured for the parameter in the Hive workflow is ignored, and the Oozie coordinator fills it in instead.
So the Hive workflow is:
<workflow-app name="workflow__" xmlns="uri:oozie:workflow:0.5">
  <start to="hive-cfc5"/>
  <kill name="kill">
    <message>action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="hive-cfc5">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobtracker}</job-tracker>
      <name-node>${namenode}</name-node>
      <job-xml>/user/hive-site.xml</job-xml>
      <script>/user/sub/create.hql</script>
    </hive>
    <ok to="hive-2ade"/>
    <error to="kill"/>
  </action>
  <action name="hive-2ade">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobtracker}</job-tracker>
      <name-node>${namenode}</name-node>
      <job-xml>/user/hive-site.xml</job-xml>
      <script>/user/sub/load_query.hql</script>
      <param>filepath=test_2015-06-26.csv</param>
    </hive>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <end name="end"/>
</workflow-app>
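If relying on the coordinator to override the hard-coded value is not desirable, the load action can reference the property explicitly instead. A minimal sketch of that variant, assuming the coordinator passes a property named filepath (as in this answer) and that load_query.hql reads ${filepath}:

```xml
<action name="hive-2ade">
  <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobtracker}</job-tracker>
    <name-node>${namenode}</name-node>
    <job-xml>/user/hive-site.xml</job-xml>
    <script>/user/sub/load_query.hql</script>
    <!-- Assumption: "filepath" is supplied by the coordinator's <configuration>;
         it is forwarded to the Hive script instead of a fixed file name. -->
    <param>filepath=${filepath}</param>
  </hive>
  <ok to="end"/>
  <error to="kill"/>
</action>
```

With this form the workflow fails fast at submission if filepath is not defined, rather than silently loading a stale file.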
I then scheduled the same workflow in an Oozie coordinator, simply setting the filepath parameter to:
test_${yyyy}-{month}-{day}.csv
<coordinator-app name="my_coordinator" frequency="*/60 * * * *" start="${start_date}" end="${end_date}" timezone="America/Los_Angeles" xmlns="uri:oozie:coordinator:0.2">
  <controls>
    <execution>FIFO</execution>
  </controls>
  <action>
    <workflow>
      <app-path>${wf_application_path}</app-path>
      <configuration>
        <property>
          <name>filepath</name>
          <value>test_${yyyy}-{month}-{day}.csv</value>
        </property>
        <property>
          <name>oozie.use.system.libpath</name>
          <value>true</value>
        </property>
        <property>
          <name>start_date</name>
          <value>2015-07-07T14:50Z</value>
        </property>
        <property>
          <name>end_date</name>
          <value>2015-07-14T07:23Z</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
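In a hand-written coordinator.xml, one way to build the dated file name is with the coordinator EL functions coord:nominalTime() and coord:formatTime(). The fragment below is only a sketch of the filepath property, assuming the date in the file name should match the action's nominal time (which Oozie formats in its processing timezone, UTC by default):

```xml
<property>
  <name>filepath</name>
  <!-- Assumption: the file for a run is named after that run's nominal date,
       e.g. test_2015-07-07.csv for an action scheduled on 2015-07-07. -->
  <value>test_${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}.csv</value>
</property>
```

Because the value is derived from the nominal time rather than the wall clock, a delayed or rerun action should still resolve to the file for its scheduled day.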
I used a cron-style frequency to run it every 60 minutes (*/60 * * * *) and check whether a file matching the above pattern is available.
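As an alternative to polling on a schedule, a coordinator can declare the file as an input dataset so that the action waits until the path exists. The following is a rough sketch rather than a tested configuration: it assumes daily files under /data as in the question, a job property named nameNode, and an illustrative initial-instance; the names daily_csv, input, and inputfile are hypothetical.

```xml
<coordinator-app name="my_coordinator" frequency="${coord:days(1)}"
                 start="${start_date}" end="${end_date}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <datasets>
    <!-- One dataset instance per day; YEAR/MONTH/DAY are resolved per instance. -->
    <dataset name="daily_csv" frequency="${coord:days(1)}"
             initial-instance="2015-07-07T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/data/data_${YEAR}-${MONTH}-${DAY}.csv</uri-template>
      <!-- Empty done-flag: the existence of the path itself marks the data as ready. -->
      <done-flag></done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- The action is held until the current day's instance is available. -->
    <data-in name="input" dataset="daily_csv">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${wf_application_path}</app-path>
      <configuration>
        <property>
          <name>inputfile</name>
          <!-- Resolves to the full URI of the matched dataset instance. -->
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

In this variant the Hive script would load from the full URI passed in inputfile instead of concatenating /data/ with a file name.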