{"id":613,"date":"2018-05-01T20:27:22","date_gmt":"2018-05-01T18:27:22","guid":{"rendered":"http:\/\/dekarlab.de\/wp\/?p=613"},"modified":"2020-05-23T15:34:59","modified_gmt":"2020-05-23T13:34:59","slug":"remote-submit-of-spark-jobs","status":"publish","type":"post","link":"https:\/\/dekarlab.de\/wp\/?p=613","title":{"rendered":"Remote submit of spark jobs"},"content":{"rendered":"<p>Remote submit is a powerful feature of Apache Spark. Why is it needed? For example, you can experiment with different versions of Spark, independently of the version installed on the cluster. Or, if you have no direct access to the cluster, you can start your Spark jobs remotely.<br \/>\n<!--more--><br \/>\nHow does it work?<\/p>\n<p><strong>First<\/strong>, you need the client configuration files from your cluster. Copy them to the <strong>remote machine<\/strong>:<\/p>\n<ul>\n<li>core-site.xml<\/li>\n<li>hadoop-env.sh<\/li>\n<li>hdfs-site.xml<\/li>\n<li>log4j.properties<\/li>\n<li>mapred-site.xml<\/li>\n<\/ul>\n<p>In Cloudera you can download them using these steps: <a href=\"https:\/\/www.cloudera.com\/documentation\/enterprise\/5-3-x\/topics\/cm_mc_client_config.html\">Client Configuration Files<\/a>.<\/p>\n<p><strong>Second<\/strong>, you should download a Spark distribution from <a href=\"https:\/\/spark.apache.org\/\">Apache<\/a>, or take it from your cluster, and put it on the <strong>remote machine<\/strong>.<\/p>\n<p><strong>Third<\/strong>, before running spark-submit on the <strong>remote machine<\/strong>, you should specify the location of the configuration files:<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\nexport HADOOP_CONF_DIR=&lt;folder with conf files&gt;\r\n<\/pre>\n<p><strong>Fourth<\/strong>, you should prepare a jar with your application.<\/p>\n<p>Hence, on the <strong>remote machine<\/strong> you have:<\/p>\n<ul>\n<li>the client configuration files from the cluster<\/li>\n<li>a Spark distribution<\/li>\n<li>a jar with your application<\/li>\n<\/ul>\n<p>After starting spark-submit on the <strong>remote 
machine<\/strong>, you will see in the logs that Spark packs all the needed files on the <strong>remote machine<\/strong>, uploads them to the cluster, and then starts execution:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n....\r\nINFO Client: Uploading resource file:\/spark-assembly.jar -&gt; hdfs:\/\/spark-assembly.jar\r\nINFO Client: Uploading resource file:\/app.jar -&gt; hdfs:\/\/app.jar\r\n...\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Remote submit is a powerful feature of Apache Spark. Why is it needed? For example, you can experiment with different versions of Spark, independently of the version installed on the cluster. Or, if you have no direct access to the cluster, you can start your Spark jobs remotely.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0},"categories":[25],"tags":[57,28],"_links":{"self":[{"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=\/wp\/v2\/posts\/613"}],"collection":[{"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=613"}],"version-history":[{"count":10,"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=\/wp\/v2\/posts\/613\/revisions"}],"predecessor-version":[{"id":624,"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=\/wp\/v2\/posts\/613\/revisions\/624"}],"wp:attachment":[{"href":"https:\/\/dekarlab.de\/wp\/index.php?res
t_route=%2Fwp%2Fv2%2Fmedia&parent=613"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=613"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dekarlab.de\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=613"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}