hadoop - How to pick Dynamic File Name from HDFS while inserting into Hive Table


I have a Hive table. I need to write a workflow whose daily job looks for a file in this location:

/data/data_yyyy-mm-dd.csv
/data/data_2015-07-07.csv
/data/data_2015-07-08.csv
...

Each day the workflow should automatically pick up that day's file name and load the data into the Hive table (mytable).

I am writing the load script as follows:

load data inpath "/data/${filepath}" overwrite into table mytable;

When I run the same script as a plain Hive job, I can set filepath to data_2015-07-07.csv myself. But how can an Oozie coordinator automatically pick the file name for the current date?

I tried setting the workflow parameter in the Oozie coordinator to:

clicklog_${yyyy}-{month}-{day}.csv 

After checking the Oozie coordinator documentation, I found the solution. It is simple and straightforward: whatever value is configured for the parameter in the Hive workflow is ignored, and the Oozie coordinator fills it in.

So the Hive workflow is:

<workflow-app name="workflow__" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-cfc5"/>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive-cfc5">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/user/hive-site.xml</job-xml>
            <script>/user/sub/create.hql</script>
        </hive>
        <ok to="hive-2ade"/>
        <error to="kill"/>
    </action>
    <action name="hive-2ade">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/user/hive-site.xml</job-xml>
            <script>/user/sub/load_query.hql</script>
            <param>filepath=test_2015-06-26.csv</param>
        </hive>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <end name="end"/>
</workflow-app>
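One way to make the hand-off from the coordinator explicit is to have the Hive action reference the property instead of a fixed file name. The fragment below is a minimal sketch of the second action written that way. It assumes the coordinator passes a property named filepath, as in this post, and that jobTracker and nameNode are defined in the job properties.

```xml
<action name="hive-2ade">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>/user/hive-site.xml</job-xml>
        <script>/user/sub/load_query.hql</script>
        <!-- ${filepath} is resolved from the job configuration, for example
             from a <property> set in the coordinator's <configuration> block -->
        <param>filepath=${filepath}</param>
    </hive>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

With this form the script still sees a Hive variable called filepath, so the load statement itself does not need to change.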

Then I scheduled the same workflow in an Oozie coordinator, simply setting the filepath parameter to:

test_${yyyy}-{month}-{day}.csv

<coordinator-app name="my_coordinator"
  frequency="*/60 * * * *"
  start="${start_date}" end="${end_date}" timezone="America/Los_Angeles"
  xmlns="uri:oozie:coordinator:0.2"
  >
  <controls>
    <execution>FIFO</execution>
  </controls>
  <action>
    <workflow>
      <app-path>${wf_application_path}</app-path>
      <configuration>
        <property>
          <name>filepath</name>
          <value>test_${yyyy}-{month}-{day}.csv</value>
        </property>
        <property>
          <name>oozie.use.system.libpath</name>
          <value>true</value>
        </property>
        <property>
          <name>start_date</name>
          <value>2015-07-07T14:50Z</value>
        </property>
        <property>
          <name>end_date</name>
          <value>2015-07-14T07:23Z</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
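Oozie's coordinator EL also provides coord:nominalTime() and coord:formatTime(), which can build the dated file name directly. The fragment below is a sketch of the filepath property written that way. It assumes the file is named after the date of the action's nominal (scheduled) time, and the test_ prefix is taken from the example above.

```xml
<configuration>
    <property>
        <name>filepath</name>
        <!-- coord:nominalTime() is the scheduled time of this coordinator action;
             coord:formatTime() renders it with a SimpleDateFormat-style pattern -->
        <value>test_${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}.csv</value>
    </property>
</configuration>
```

For an action whose nominal time falls on 2015-07-07, this should resolve to test_2015-07-07.csv. The date is taken from the schedule, not from the wall clock at execution time, so a delayed or re-run action still targets the file for its own day.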

I used a cron-style frequency to run it every 60 minutes (*/60 * * * *) and check whether a file matching the above pattern is available.
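If the goal is to start the load only once the day's file actually exists, an Oozie coordinator can also declare the file as an input dataset and wait for it. The fragment below is a sketch and is not taken from the original post: the dataset name daily_file, the event name input, the property name inputfile, the daily frequency, the UTC timezone and the initial-instance value are all assumptions for illustration. Inside uri-template, Oozie resolves ${YEAR}, ${MONTH} and ${DAY} for each dataset instance, and an empty done-flag makes the existence of the path itself the readiness signal.

```xml
<coordinator-app name="my_coordinator"
                 frequency="${coord:days(1)}"
                 start="${start_date}" end="${end_date}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <datasets>
        <!-- One dataset instance per day, located by its dated file name -->
        <dataset name="daily_file" frequency="${coord:days(1)}"
                 initial-instance="2015-07-07T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/data_${YEAR}-${MONTH}-${DAY}.csv</uri-template>
            <done-flag></done-flag>
        </dataset>
    </datasets>
    <input-events>
        <!-- The action waits until the current day's instance exists -->
        <data-in name="input" dataset="daily_file">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${wf_application_path}</app-path>
            <configuration>
                <property>
                    <name>inputfile</name>
                    <!-- Full URI of the resolved dataset instance -->
                    <value>${coord:dataIn('input')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

Because coord:dataIn returns the full URI rather than just the file name, the Hive script in this variant would load from "${inputfile}" instead of "/data/${filepath}", with the workflow passing it through as a param.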

