== Spark Walkthrough ==


Spark programming is somewhat different from normal python programming. This section will walk you through a script to help you learn how to work with Spark. You may find this script useful as a template for building variables on top of [[wikiq]] data.


This section presents a pyspark program that  


This will set up the cluster. Make sure you start up the cluster from the same session you used to run get_spark_nodes -- otherwise, the startup script doesn't have access to the assigned node list and will fail. Take note of the node that is assigned to be the master, and use that information to set your $SPARK_MASTER environment variable, for example <code>export SPARK_MASTER="n0650"</code>. The program <code>spark-submit</code> submits your script to the running Spark cluster.
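
For example, assuming the startup script reports <code>n0650</code> as the master and your analysis script is named <code>my_wikiq_analysis.py</code> (both placeholders), a submission might look roughly like this; <code>7077</code> is Spark's default standalone master port, and the <code>--master</code> flag may be unnecessary if your <code>spark-defaults.conf</code> already points at the master:

<syntaxhighlight lang="bash">
# Placeholder node name -- use the master node you were actually assigned.
export SPARK_MASTER="n0650"

# Submit the script to the running standalone cluster.
spark-submit --master spark://$SPARK_MASTER:7077 my_wikiq_analysis.py
</syntaxhighlight>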


=== Monitoring the cluster ===


From a login node (hyak):




<syntaxhighlight lang="bash">
     ssh -L localhost:8989:localhost:8989 hyak -N -f && ssh -L localhost:4040:localhost:4040 hyak -N -f
</syntaxhighlight>
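
Once the tunnels are up (and assuming the corresponding forwards from the login node to the Spark master are also in place), the web interfaces should be reachable from a local browser; 4040 is the default port for the Spark application UI. A rough sketch for checking on and tearing down the background tunnels:

<syntaxhighlight lang="bash">
# The Spark application UI should now be visible at http://localhost:4040

# The tunnels run in the background (-f); list them and stop them when done.
pgrep -af "ssh -L localhost:4040"
pkill -f "ssh -L localhost:8989"
pkill -f "ssh -L localhost:4040"
</syntaxhighlight>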


# You might try tweaking memory management options in <code>$SPARK_HOME/conf/spark-env.sh</code> and <code>$SPARK_HOME/conf/spark-defaults.conf</code>. Decreasing the number of executors and the total memory allocated to executors should make Spark more resilient, at the cost of performance.
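
As a rough illustration (the values below are examples only, not settings tuned for Hyak), more conservative limits could be appended to <code>$SPARK_HOME/conf/spark-defaults.conf</code> like this; <code>spark.cores.max</code> and <code>spark.executor.memory</code> are standard Spark properties:

<syntaxhighlight lang="bash">
# Cap the total cores and the per-executor memory the application may use.
cat >> $SPARK_HOME/conf/spark-defaults.conf <<'EOF'
spark.cores.max        16
spark.executor.memory  8g
EOF
</syntaxhighlight>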


=== Errors While Starting Cluster ===
Sometimes I get errors like this:  




Usually it seems to happen if I relinquish my spark cluster (whether I use the kill script or not) and then immediately restart one. The error goes away if I shut down and wait a minute or two before re-launching; my assumption is that there's some cleanup work being done behind the scenes that the scheduler doesn't know about, and I need to let that finish.

And sometimes, when trying to launch Spark, I get errors like <code>scontrol: error: host list is empty</code>. This means I'm in a session that doesn't know which nodes are assigned to the Spark cluster, and it will launch a dysfunctional cluster. Run the stop-all script and then <code>start_spark_cluster</code> again from the same session you landed in when you ran <code>get_spark_nodes</code>.
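
A minimal sketch of that recovery sequence, assuming the helper scripts are invoked by the names used above (the exact commands on your system may differ):

<syntaxhighlight lang="bash">
# 1. Tear down the broken cluster (your usual kill/stop script), then wait a bit.
stop-all

# 2. Request nodes again; this lands you in a new session on the cluster.
get_spark_nodes

# 3. From inside that same session, start the cluster so the startup script
#    can see the assigned node list.
start_spark_cluster
</syntaxhighlight>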

When I get errors about full logs, I go in and clean up the temporary files directory that the error message refers to.
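
When that happens, a quick check of what is taking up the space might look like the following, with a placeholder path standing in for whatever directory the error message actually names:

<syntaxhighlight lang="bash">
# /path/named/in/the/error is a placeholder -- use the directory the error reports.
du -sh /path/named/in/the/error

# Remove files older than a week (double-check the path before deleting anything).
find /path/named/in/the/error -type f -mtime +7 -delete
</syntaxhighlight>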