# You might try tweaking memory management options in <code>$SPARK_HOME/conf/spark-env.sh</code> and <code>$SPARK_HOME/conf/spark-defaults.conf</code>. Decreasing the number of executors and the total memory allocated to executors should make Spark more resilient, at the cost of performance; a rough sketch of the relevant settings follows below.
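Which options to change depends on the job and the node allocation. The snippets below are an illustrative sketch only: the property and variable names are standard Spark settings, but the numbers are placeholders rather than Hyak-specific recommendations.

<pre>
# $SPARK_HOME/conf/spark-defaults.conf
# Placeholder values, not Hyak recommendations.
# Lowering executor memory and capping total cores (and hence executors)
# trades throughput for resilience.
spark.executor.memory  16g
spark.executor.cores   4
spark.cores.max        16
spark.driver.memory    8g
</pre>

<pre>
# $SPARK_HOME/conf/spark-env.sh
# Placeholder values: these cap what each standalone worker can hand out to executors.
export SPARK_WORKER_MEMORY=64g
export SPARK_WORKER_CORES=16
</pre>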


=== Errors While Starting Cluster ===
Sometimes I get errors like this:




Usually this happens when I relinquish my spark cluster (whether or not I use the kill script) and then immediately start another one. The error goes away if I shut down, wait a minute or two, and then re-launch; my assumption is that some cleanup happens behind the scenes that the scheduler doesn't know about, and I need to let it finish.
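In practice the recovery looks something like this. I'm assuming the standard Spark <code>stop-all.sh</code> script (or the kill script) is what does the relinquishing, and that a couple of minutes is a long enough pause; adjust as needed.

<pre>
$SPARK_HOME/sbin/stop-all.sh   # or the kill script; relinquish the cluster
sleep 120                      # let the behind-the-scenes cleanup finish
start_spark_cluster            # relaunch once things have settled
</pre>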
And sometimes, when trying to launch Spark, I get errors like this:

<pre>
scontrol: error: host list is empty
</pre>

This means I'm in a session that doesn't know which nodes are assigned to the spark cluster, and launching from it will produce a dysfunctional cluster. Run <code>stop-all</code> and then <code>start_spark_cluster</code> from the same session you landed in when you ran <code>get_spark_nodes</code>.
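In other words, the recovery looks roughly like this (the command names are the wrapper scripts mentioned above; exact paths and invocations may differ on your setup):

<pre>
# From the session that get_spark_nodes originally dropped you into:
stop-all               # shut down the dysfunctional cluster
start_spark_cluster    # relaunch; this session knows which nodes were assigned
</pre>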
When I get errors about full logs, I go in and clean up the temporary files directory that the error message refers to.
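The directory to clean is whatever path the error message names. As a purely hypothetical sketch (the path below is a placeholder, not an actual Hyak location):

<pre>
# Substitute the directory named in the "full logs" error message.
rm -rf /path/named/in/the/error/*
</pre>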