CommunityData:Hyak Spark
== Spark Walkthrough ==
Spark programming is somewhat different from normal Python programming. This section will walk you through a script to help you learn how to work with Spark. You may find this script useful as a template for building variables on top of [[wikiq]] data.
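Before diving into the walkthrough script itself, here is a minimal, self-contained sketch of the general shape of such a program. The column names and the aggregation are illustrative assumptions only; a real script would read wikiq's tab-separated output rather than an in-memory stand-in.

<syntaxhighlight lang="python">
# Minimal pyspark sketch: build a derived variable on top of edit data.
# The editor/date_time columns mirror wikiq-style output, but this tiny
# in-memory DataFrame is a stand-in; a real script would instead do
# something like: spark.read.csv("wikiq_output.tsv", sep="\t", header=True)
from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.master("local[2]").appName("sketch").getOrCreate()

df = spark.createDataFrame(
    [("alice", "2020-01-01"), ("alice", "2020-01-02"), ("bob", "2020-01-01")],
    ["editor", "date_time"],
)

# One derived variable: number of edits per editor.
counts = df.groupBy("editor").agg(f.count("*").alias("n_edits"))
result = {row["editor"]: row["n_edits"] for row in counts.collect()}
spark.stop()
print(result)
</syntaxhighlight>

The same groupBy/agg pattern scales unchanged from this toy example to a full wikiq dump once the cluster is running.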
This section presents a pyspark program that
=== Monitoring the cluster ===
From a login node (ikt):
<syntaxhighlight lang="bash">
<syntaxhighlight lang="bash">
ssh -N -f -L localhost:8989:localhost:8989 ikt && ssh -N -f -L localhost:4040:localhost:4040 ikt
</syntaxhighlight>
# You might try tweaking memory management options in <code>$SPARK_HOME/conf/spark-env.sh</code> and <code>$SPARK_HOME/conf/spark-defaults.conf</code>. Decreasing the number of executors and the total memory allocated to executors should make Spark more resilient, at the cost of performance.
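For instance, a sketch of what lowering those knobs might look like in <code>spark-defaults.conf</code>. The values below are illustrative placeholders, not recommendations tuned for Hyak:

<syntaxhighlight lang="text">
# $SPARK_HOME/conf/spark-defaults.conf -- illustrative values only
spark.executor.instances   4
spark.executor.memory      4g
spark.driver.memory        4g
</syntaxhighlight>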
=== Launch of workers failed ===
Sometimes I get errors like this:
Usually it seems to happen if I relinquish my spark cluster (whether I use the kill script or not) and then immediately restart one. The error goes away if I shut down and wait a minute or two before re-launching; my assumption is that there's some cleanup work being done behind the scenes that the scheduler doesn't know about, and I need to let that finish.