= Getting Started with Spark =

== Spark on Hyak ==

If you are already set up on Hyak following the instructions on [[CommunityData:Hyak]], then you should already have a working Spark installation on Hyak. Test this by running <code>pyspark</code> from a Hyak cluster node (running it directly on the login node will give you an insufficient memory error). You should see this:

<syntaxhighlight lang="bash">
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
</syntaxhighlight>

If so, you are ready to start running Spark programs on a single node. If you don't see this, then you need to check your <code>$SPARK_HOME</code>, <code>$PATH</code>, <code>$JAVA_HOME</code>, and <code>$PYTHONPATH</code> environment variables.

If you are using the [[CommunityData:Hyak-Mox | cdsc mox setup]] then you should have a working Spark configuration in your environment already. Otherwise, you'll need to have the following in your .bashrc (remember to source .bashrc or log in again to load these changes):

<syntaxhighlight lang="bash">
export JAVA_HOME='/com/local/java/'
export PATH="$JAVA_HOME/bin:$PATH"
export SPARK_HOME='/com/local/spark'
export PATH="$SPARK_HOME/bin":$PATH
export PYTHONPATH="$SPARK_HOME/python:"$PYTHONPATH
export TMPDIR="/com/users/[YOU]/tmpdir"
</syntaxhighlight>

You can also run Spark programs on many nodes, but this requires additional steps. These are described below.

== Spark Walkthrough ==

Spark programming is somewhat different from normal Python programming. This section walks you through a script to help you learn how to work with Spark. You may find this script useful as a template for building variables on top of [[wikiq]] data.

This section presents a pyspark program that:

# Reads wikiq tsvs.
# Computes the nth edit for each editor.
# For edits that were reverted, identifies the edit that made the revert.
# Outputs tsvs with the new variables.

The script is on ikt here: <code>/com/users/nathante/mediawiki_dump_tools/wikiq_users/wikiq_users_spark.py</code>

<syntaxhighlight lang="python">
#!/usr/bin/env python3
import sys
from pyspark import SparkConf
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql import Window
import pyspark.sql.functions as f
from pyspark.sql import types
import argparse
import glob
from os import mkdir
from os import path
</syntaxhighlight>

This part imports some Python utilities that we will use. You can pretty safely treat the <code>SparkConf</code>, <code>SparkSession</code>, and <code>SQLContext</code> imports as magic that creates a Spark environment that supports working with Spark's SQL features. <code>Window</code> is used to create window functions; we will use a window function to count the nth edit made by each editor. <code>import pyspark.sql.functions as f</code> provides built-in functions that can be applied to data in Spark dataframes. <code>types</code> are the data types that we will use to specify the schema for reading wikiq files.
<syntaxhighlight lang="python">
def parse_args():
    parser = argparse.ArgumentParser(description='Create a dataset of edits by user.')
    parser.add_argument('-i', '--input-file', help='Tsv file of wiki edits. Supports wildcards ', required=True, type=str)
    parser.add_argument('-o', '--output-dir', help='Output directory', default='./output', type=str)
    parser.add_argument('--output-format', help = "[csv, parquet] format to output",type=str)
    parser.add_argument('--num-partitions', help = "number of partitions to output",type=int, default=1)
    args = parser.parse_args()
    return(args)
</syntaxhighlight>

Above is just a function to build a command line interface.

<syntaxhighlight lang="python">
if __name__ == "__main__":
    conf = SparkConf().setAppName("Wiki Users Spark")
    spark = SparkSession.builder.getOrCreate()
</syntaxhighlight>

Now we are in the main part of the script. The above two lines complete setting up Spark. If you are going to run this program on a multi-node cluster, it is nice to set the AppName to something friendly, since this name is used by the job monitoring tools.

<syntaxhighlight lang="python">
args = parse_args()
files = glob.glob(args.input_file)
files = [path.abspath(p) for p in files]
</syntaxhighlight>

Spark is designed to read and write lists of files. The <code>args.input_file</code> argument uses <code>glob</code> to accept wildcards. The above lines build a list of files from the argument.

<syntaxhighlight lang="python">
reader = spark.read
</syntaxhighlight>

This creates a reader object that can read files. We are starting to get down to business. Next we will specify the schema for the files that we will read in. This is important so that Spark can run efficiently and operate on the correct data types.

<syntaxhighlight lang="python">
# build a schema
struct = types.StructType().add("anon",types.StringType(),True)
struct = struct.add("articleid",types.LongType(),True)
struct = struct.add("date_time",types.TimestampType(), True)
struct = struct.add("deleted",types.BooleanType(), True)
struct = struct.add("editor",types.StringType(),True)
struct = struct.add("editor_id",types.LongType(), True)
struct = struct.add("minor", types.BooleanType(), True)
struct = struct.add("namespace", types.LongType(), True)
struct = struct.add("revert", types.BooleanType(), True)
struct = struct.add("reverteds", types.StringType(), True)
struct = struct.add("revid", types.LongType(), True)
struct = struct.add("sha1", types.StringType(), True)
struct = struct.add("text_chars", types.LongType(), True)
struct = struct.add("title",types.StringType(), True)
</syntaxhighlight>

This is a little bit tedious, but it is necessary for Spark to work effectively on tsv data. If you are reading a binary format such as Parquet (which is recommended, and easy to create using Spark), then you can skip this.

<syntaxhighlight lang="python">
df = reader.csv(files, sep='\t', inferSchema=False, header=True, mode="PERMISSIVE", schema = struct)
</syntaxhighlight>

This reads the data into a Spark dataframe. Spark dataframes are more like SQL tables than they are like pandas DataFrames. Spark dataframes are pretty abstract and can live in memory or on disk. Operations on Spark dataframes are lazily evaluated: Spark will not actually run computations on your data until it has to. Calling <code>df.show()</code> will print the dataframe and trigger execution. <code>mode="PERMISSIVE"</code> stops Spark from giving up if it hits malformed rows.

<syntaxhighlight lang="python">
df = df.repartition(args.num_partitions)
</syntaxhighlight>

The first thing to do after reading the data is to <code>repartition</code> it. This determines the number of files that Spark will output. Choosing the right number of partitions isn't really an exact science: having more partitions makes some operations more efficient and can make other operations slower. The rule of thumb is that the number of partitions should increase linearly with the amount of data; 500 partitions seems pretty good for English Wikipedia. If you are interested, [https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-rdd-partitions.html this page] is a good reference.
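If you want to check how a dataframe is currently partitioned, Spark can tell you. This is a small interactive check, not part of the walkthrough script; it assumes the <code>df</code> defined above:

<syntaxhighlight lang="python">
# ask Spark how many partitions df currently has;
# this is a metadata lookup and does not trigger a full computation
print(df.rdd.getNumPartitions())
</syntaxhighlight>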
Now we are ready to build some variables. The first thing we are going to do is to create a new column, <code>editor_id_or_ip</code>.

<syntaxhighlight lang="python">
# replace na editor ids
df = df.select('*',f.coalesce(df['editor_id'],df['editor']).alias('editor_id_or_ip'))
</syntaxhighlight>

The first argument to <code>select</code>, <code>'*'</code>, causes it to return all the columns. Next we call the <code>coalesce</code> function, which creates a new column with the value of <code>editor_id</code> if <code>editor_id</code> is not null and the value of <code>editor</code> if <code>editor_id</code> is null. The call to <code>alias</code> gives the new column a name. If you know SQL, this pattern will look familiar; you could write it as <code>SELECT *, COALESCE(editor_id, editor) AS editor_id_or_ip</code>.

Next we are going to identify the edits that revert each edit. <code>reverteds</code> lists the edits that the edit has reverted.

<syntaxhighlight lang="python">
# assign which edit reverted what edit
reverteds_df = df.filter(~ df.reverteds.isNull()).select(['revid','reverteds'])
</syntaxhighlight>

This line creates a new Spark dataframe out of the rows of the first dataframe that have a value for <code>reverteds</code>, with the columns <code>revid</code> and <code>reverteds</code>.

<syntaxhighlight lang="python">
reverteds_df = reverteds_df.select("*", f.split(reverteds_df.reverteds,',').alias("reverteds_new"))
</syntaxhighlight>

The above line converts <code>reverteds</code> from a string to an array.

<syntaxhighlight lang="python">
reverteds_df = reverteds_df.drop("reverteds")
reverteds_df = reverteds_df.withColumnRenamed("reverteds_new", "reverteds")
</syntaxhighlight>

The above two lines remove the old <code>reverteds</code> column, which was a string, and replace it with the array column. This is required because, unlike pandas, Spark dataframes do not have a column assignment syntax.

<syntaxhighlight lang="python">
reverteds_df = reverteds_df.select(reverteds_df.revid.alias('reverted_by'), f.explode(reverteds_df.reverteds).alias('reverted_id'))
</syntaxhighlight>

The most important part of the above is the call to <code>explode</code>. Explode 💥 unfolds the array so that we get one row for each element of the array. Now we can join <code>reverteds_df</code> with <code>df</code> to put the <code>reverted_by</code> column in <code>df</code>.

<syntaxhighlight lang="python">
df = df.join(reverteds_df, df.revid == reverteds_df.reverted_id, how='left_outer')
df = df.drop("reverted_id")
del(reverteds_df)
</syntaxhighlight>

Join the two tables so that each revision that was reverted gets a value for <code>reverted_by</code>. There are many kinds of joins and there is some detail on this in the [[#Join help|Join help section of this page]]. The join is a <code>left_outer</code> join, so we keep all the rows of <code>df</code>, even the rows that don't have a value for <code>reverted_id</code> in <code>reverteds_df</code>. We remove the redundant <code>reverted_id</code> column (note that <code>drop</code> returns a new dataframe, so the result has to be assigned back to <code>df</code>) and are done with building <code>reverted_by</code>. Since we aren't going to use <code>reverteds_df</code> again, we can call <code>del(reverteds_df)</code>. This tells Spark it is free to remove the object from storage, which can improve performance.
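At this point you can sanity-check the join by looking at a few rows that were reverted. This is just an interactive check, not part of the script; remember that <code>show()</code> triggers execution, so it can take a while on a large wiki:

<syntaxhighlight lang="python">
# print a few reverted revisions along with the revision that reverted them
df.filter(~ df.reverted_by.isNull()).select('revid', 'editor', 'reverted_by').show(5)
</syntaxhighlight>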
Next we add a column that counts the number of times a given editor has made a revert (this is called a ''cumulative count'').

<syntaxhighlight lang="python">
# sort by datetime
df = df.orderBy(df.date_time.asc())
</syntaxhighlight>

<code>orderBy</code> sorts the dataframe by date.

<syntaxhighlight lang="python">
win = Window.orderBy('date_time').partitionBy('editor_id_or_ip')
</syntaxhighlight>

The above defines a <code>WindowSpec</code>, which is a kind of object that can be used to define rolling aggregations. We are going to use the <code>rank</code> function to perform the cumulative count, and <code>rank</code> requires a <code>WindowSpec</code>. The WindowSpec that we made says that we are grouping at the level of <code>editor_id_or_ip</code> and that we want to operate on each row of each group in chronological order.

<syntaxhighlight lang="python">
# count reverts
reverts_df = df.filter(df.revert==True).select(['revid','editor_id_or_ip','date_time','revert'])
</syntaxhighlight>

The above creates a new table that only has reverts.

<syntaxhighlight lang="python">
reverts_df = reverts_df.withColumn('editor_nth_revert',f.rank().over(win))
</syntaxhighlight>

This applies the <code>rank</code> function over the window to perform the cumulative count of the reverts. The <code>withColumn</code> function adds a new column to the dataframe called <code>editor_nth_revert</code>.

<syntaxhighlight lang="python">
df = df.join(reverts_df, ["revid",'editor_id_or_ip','date_time','revert'], how='left_outer')
del(reverts_df)
</syntaxhighlight>

Above we perform the join to add the new column to <code>df</code>. We join on all of the columns <code>["revid",'editor_id_or_ip','date_time','revert']</code> so that duplicate columns are not created in <code>df</code>.

<syntaxhighlight lang="python">
# count edits
df = df.withColumn('year', f.year(df.date_time))
df = df.withColumn('month',f.month(df.date_time))
</syntaxhighlight>

Here we use <code>withColumn</code> again to illustrate creating some calendar variables from <code>date_time</code>.

<syntaxhighlight lang="python">
df = df.withColumn('editor_nth_edit',f.rank().over(win))
</syntaxhighlight>

We can reuse the <code>WindowSpec</code> to get the cumulative count for all edits, as opposed to only reverts.

<syntaxhighlight lang="python">
# output
if not path.exists(args.output_dir):
    mkdir(args.output_dir)
if args.output_format == "csv" or args.output_format == "tsv":
    df.write.csv(args.output_dir, sep='\t', mode='overwrite',header=True,timestampFormat="yyyy-MM-dd HH:mm:ss")
</syntaxhighlight>

Instead of writing our output to a single file, we output to a directory. Spark will write one file for each partition to the directory.

<syntaxhighlight lang="python">
# format == "parquet"
else:
    df.write.parquet(args.output_dir, mode='overwrite')
</syntaxhighlight>

It is also easy to write to parquet.
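One advantage of parquet is that it stores the schema with the data, so a later Spark session can read it back without rebuilding the schema struct by hand. A minimal sketch (the path here is just an example; use whatever you passed as <code>--output-dir</code>):

<syntaxhighlight lang="python">
# read the parquet output back in a later script; no schema needs to be specified
df = spark.read.parquet("/com/output/wikiq-users-enwiki-20180301-parquet/")
</syntaxhighlight>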
== Starting a Spark cluster with many nodes on Hyak ==

It is pretty easy to start up a multi-node cluster on Hyak. If you have <code>/com/local/bin</code> in your <code>$PATH</code>, then you should be able to run:

<syntaxhighlight lang="bash">
get_spark_nodes.sh 4
</syntaxhighlight>

This checks out 4 nodes that can be used as a Spark cluster. The cluster will have 4 worker nodes; one of these is also the "master" node. When you run <code>get_spark_nodes.sh</code> you will be routed to the machine that will become the master. If you only want 2 nodes, just do:

<syntaxhighlight lang="bash">
get_spark_nodes.sh 2
</syntaxhighlight>

After you get the nodes and have a shell on the master node, run:

<syntaxhighlight lang="bash">
start_spark_cluster.sh
</syntaxhighlight>

This will set up the cluster. Make sure you start up the cluster from the same session you used to run <code>get_spark_nodes.sh</code>; otherwise, the startup script doesn't have access to the assigned node list and will fail. Take note of the node that is assigned to be the master, and use that information to set your <code>$SPARK_MASTER</code> environment variable, for example <code>export SPARK_MASTER="n0650"</code>.

The program <code>spark-submit</code> submits your script to the running Spark cluster:

<syntaxhighlight lang="bash">
spark-submit --master spark://$SPARK_MASTER:18899 your_script.py [Arguments to your script here]
</syntaxhighlight>

For example, we can submit the script we used in the walkthrough as:

<syntaxhighlight lang="bash">
spark-submit --master spark://$SPARK_MASTER:18899 wikiq_users_spark.py --output-format tsv -i "/com/output/wikiq-enwiki-20180301/enwiki-20180301-pages-meta-history*.tsv" -o "/com/output/wikiq-users-enwiki-20180301-tsv/" --num-partitions 500
</syntaxhighlight>

When you have a Spark cluster running, it serves some nice monitoring tools on ports 8989 and 4040 of the master. You can build an ssh tunnel between your laptop and these nodes to monitor the progress of your Spark jobs.

When you are done with the cluster, you should shut it down using the script in <code>$SPARK_HOME/sbin</code>:

<syntaxhighlight lang="bash">
$SPARK_HOME/sbin/stop-all.sh
</syntaxhighlight>

=== Monitoring the cluster ===

From a login node (hyak):

<syntaxhighlight lang="bash">
ssh -L localhost:8989:localhost:8989 $SPARK_MASTER -N -f && ssh -L localhost:4040:localhost:4040 $SPARK_MASTER -N -f
</syntaxhighlight>

From your laptop:

<syntaxhighlight lang="bash">
ssh -L localhost:8989:localhost:8989 hyak -N -f && ssh -L localhost:4040:localhost:4040 hyak -N -f
</syntaxhighlight>

Point your browser to localhost:8989 to see the cluster status and to localhost:4040 to monitor jobs.

== Setting up Spark on your laptop ==

You might want a working standalone Spark installation on your laptop, and you should develop your Spark code on your laptop before running it on Hyak. To get Spark working on your laptop you need to first install the Oracle Java Development Kit (Oracle JDK) and then install Spark.

=== Installing Java ===

To install Java, all that should be required is to download and unzip the software, then set the <code>$JAVA_HOME</code> environment variable and add the Java programs to your <code>$PATH</code>. We have Java 8 on Hyak.

# Download the Java JDK appropriate for your operating system from [http://www.oracle.com/technetwork/pt/java/javase/downloads/jdk8-downloads-2133151.html here].
# Unpack the archive where you want, for example <code>/home/you/Oracle_JDK</code>.
# Edit your environment variables (i.e. in your .bashrc) to:

<syntaxhighlight lang="bash">
JAVA_HOME=/home/you/Oracle_JDK/
PATH=$JAVA_HOME/bin:$PATH
</syntaxhighlight>

=== Installing Spark ===

Now we can install Spark.

# Download the latest version of Spark 'prebuilt for Apache Hadoop 2.7 or later' from [http://spark.apache.org/downloads.html here].
# Extract the archive (e.g. to <code>/home/you/spark</code>).
# Set the <code>$SPARK_HOME</code> environment variable and update your path, i.e. update your .bashrc to set <code>SPARK_HOME=/home/you/spark</code>.
# Test your Spark install by running <code>$SPARK_HOME/bin/pyspark</code>. You should see this:

<syntaxhighlight lang="bash">
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
</syntaxhighlight>

For working with Python, you also want to add pyspark (the Spark Python bindings) to your <code>$PYTHONPATH</code>. Add to your .bashrc:

<syntaxhighlight lang="bash">
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
</syntaxhighlight>
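Once Java, Spark, and <code>$PYTHONPATH</code> are set up, you can check that the Python bindings work outside the pyspark shell. This is a quick local smoke test (the app name and example data are arbitrary), not part of the Hyak workflow:

<syntaxhighlight lang="python">
from pyspark.sql import SparkSession

# start a local Spark session that uses all the cores on your laptop
spark = SparkSession.builder.master("local[*]").appName("laptop-test").getOrCreate()

# build a tiny dataframe and count its rows to confirm the install works
df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'letter'])
print(df.count())

spark.stop()
</syntaxhighlight>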