Deployment of artifacts to Hadoop cluster


Is there a pattern for how to deploy applications (jar files) to a Hadoop cluster? I am not talking about MapReduce jobs, but about deploying applications for Spark, Flume, etc.

Within the Hadoop ecosystem, deployment alone is not sufficient. You also need to restart services, deploy configurations (e.g. via Ambari), and so forth.

I have not found any specific tools. Is my assumption correct that you use standard automation tools like Maven/Jenkins and fill in the missing parts yourself?

I am just wondering if I have overlooked something. I do not want to reinvent the wheel ;)

If you are managing your Hadoop ecosystem with a management tool, you can use Ambari or Cloudera Manager. You will still need to stop and restart services after configuration and library changes. If the ecosystem is managed outside of those tools, you have the option of managing the jars with tools like Puppet or Salt. Currently, we use Salt because of its push/pull abilities.
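To make the stop/restart step concrete, here is a minimal sketch of how a deploy script might ask Ambari to restart a service over its REST API. It only builds the requests and does not send them. The host, cluster name, and service name are placeholders I made up, and the endpoint shape (a PUT on the service resource with a target state of INSTALLED for stopped or STARTED for running, plus the X-Requested-By header) is what I understand the Ambari v1 API to expect, so check it against your Ambari version before relying on it.

```python
import json
import urllib.request


def ambari_service_state_request(base_url, cluster, service, state, context):
    """Build (but do not send) a PUT request asking Ambari to move a service
    to the given state. "INSTALLED" means stopped, "STARTED" means running."""
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    body = {
        "RequestInfo": {"context": context},
        "Body": {"ServiceInfo": {"state": state}},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # Ambari rejects modifying requests that lack this header.
        headers={"X-Requested-By": "ambari", "Content-Type": "application/json"},
    )


def restart_requests(base_url, cluster, service):
    """Return the stop request and the start request, in the order to send them."""
    return [
        ambari_service_state_request(base_url, cluster, service, "INSTALLED", f"Stop {service}"),
        ambari_service_state_request(base_url, cluster, service, "STARTED", f"Start {service}"),
    ]


# Hypothetical values, for illustration only.
for req in restart_requests("http://ambari.example:8080", "mycluster", "FLUME"):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

A real script would also have to add authentication and wait for the stop operation to finish before sending the start request.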

If you are talking about applications such as jobs running on Spark, you only need to provide the Hadoop URL in the file path. For example: spark-submit --class my.dev.org.sparkdriver --properties-file mysparkprops.conf wordcount-shaded.jar hdfs://servername/input/file/sample.txt hdfs://servername/output/sparkresults
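If the submit step is driven from a deploy script (for instance one called by Jenkins), the same command can be assembled programmatically. This is only a sketch: the helper name spark_submit_command is mine, and the values are the ones from the example above.

```python
import shlex


def spark_submit_command(main_class, properties_file, app_jar, app_args):
    """Return the spark-submit argument list for a job whose input and
    output locations are passed as hdfs:// URLs."""
    return [
        "spark-submit",
        "--class", main_class,
        "--properties-file", properties_file,
        app_jar,
        *app_args,
    ]


cmd = spark_submit_command(
    "my.dev.org.sparkdriver",
    "mysparkprops.conf",
    "wordcount-shaded.jar",
    [
        "hdfs://servername/input/file/sample.txt",
        "hdfs://servername/output/sparkresults",
    ],
)

# Print the command for inspection; to actually launch the job,
# pass the list to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Keeping the command as a list rather than one string avoids quoting problems if a path ever contains spaces.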

For applications that have dependencies on third-party jar files, you have the option of shading the job's jar file to prevent the libraries of different applications from interfering with each other. The downside is that the application jar file can get big. I use Maven, so I added the maven-shade-plugin artifact and use the default scope (compile) for the dependencies.
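To judge whether shading is worth the larger jar, it can help to see which classes would actually collide. The following is a rough sketch, not part of the Maven setup described above: it treats jars as plain zip archives and reports every .class entry that occurs in more than one of them. The function name and the jar paths in the usage comment are hypothetical.

```python
import zipfile
from collections import defaultdict


def find_class_collisions(jar_paths):
    """Map each .class entry found in more than one jar to the jars containing it."""
    owners = defaultdict(list)
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            # set() guards against a jar that lists the same entry twice.
            for name in set(zf.namelist()):
                if name.endswith(".class"):
                    owners[name].append(jar)
    return {name: jars for name, jars in owners.items() if len(jars) > 1}


# Usage with hypothetical paths:
# collisions = find_class_collisions(["app-a.jar", "app-b.jar"])
# for name, jars in sorted(collisions.items()):
#     print(name, "appears in", ", ".join(jars))
```

Matching entry names only says that two jars ship a class with the same fully qualified name. It does not tell you whether the versions are incompatible.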

