Internal Job Scheduling
Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users).
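For instance, here is a minimal Scala sketch of that pattern, submitting two independent actions from two threads against the same SparkContext (the datasets and numbers are illustrative placeholders, not from any particular workload):
import org.apache.spark.sql.SparkSession

object ParallelJobsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parallel-jobs-demo").getOrCreate()
    val sc = spark.sparkContext

    // Two independent actions submitted from separate threads; the thread-safe
    // scheduler can run their tasks concurrently if the cluster has spare capacity.
    val t1 = new Thread(() => {
      val count = sc.parallelize(1 to 1000000).map(_ * 2).count()
      println(s"job 1 count: $count")
    })
    val t2 = new Thread(() => {
      val sum = sc.parallelize(1 to 1000000).map(_.toLong).reduce(_ + _)
      println(s"job 2 sum: $sum")
    })

    t1.start(); t2.start()
    t1.join(); t2.join()
    spark.stop()
  }
}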
FIFO -
- FIFO is the default job scheduler that Spark uses, and it is suitable for most cases.
- As the name suggests, the FIFO scheduler runs jobs in the order they are queued, based on the available resources.
- To start with, the first job in the queue gets priority on all the resources it needs to run its stages and tasks. If it does not need all of them, Spark shares the rest with the next job in the queue, which comes in handy when we are running a lot of smaller, lightweight jobs.
- The main problem arises when a job at the head of the queue is long-running or resource intensive: the smaller jobs behind it are stuck waiting, or the last job in the queue takes very long to even start, causing the overall time to shoot up.
FAIR -
- Starting in Spark 0.8, Spark introduced a second mechanism called fair scheduling. In this mode, tasks from multiple jobs are scheduled in a round-robin fashion.
- This is especially useful when there are multiple actions that are independent of each other.
- This mode shares all the available resources in a "fair" way between the tasks across jobs.
- Using the fair scheduler can prove to be more than 20% faster if the application really fits this particular use case.
- One more good thing about this mode is that you can create different pools and set priorities for each of them. For that, we use the spark.scheduler.allocation.file property to manage the pools through an XML file. This file lets you define the schedulingMode, weight, and minShare of each pool. A sample file looks something like this [source: Spark documentation guide], and a sketch of wiring it up follows the sample:
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
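Here is a minimal sketch of putting the fair scheduler to work, assuming the file above is saved at /path/to/fairscheduler.xml (the path and pool names are illustrative):
import org.apache.spark.sql.SparkSession

// Enable the fair scheduler and point Spark at the pool definitions above.
val spark = SparkSession.builder()
  .appName("fair-scheduling-demo")
  .config("spark.scheduler.mode", "FAIR")
  .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
  .getOrCreate()

// Jobs triggered from this thread land in the "production" pool ...
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "production")
spark.range(1000000L).count()

// ... while jobs from another thread can be routed to the "test" pool.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "test")
spark.range(1000000L).count()

// Clearing the property sends subsequent jobs back to the default pool.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", null)
Note that spark.scheduler.pool is a thread-local property, so in a multi-threaded server each request-handling thread can pick its own pool.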
Internal Resource Allocation
Spark has its own way of allocating and deallocating resources for an application. Spark applications run under one of the supported cluster managers, such as Standalone, YARN, or Mesos. There are two major ways of allocating resources to any Spark application: Static Resource Allocation and Dynamic Resource Allocation.
Static Resource Allocation-
- With static allocation, each application is given a fixed maximum amount of resources it can use, up to the resources available in the cluster or on an individual machine.
- This means an application running in standalone mode will hold onto all the resources it was given at submission time for as long as it runs.
- This approach is optimal when you know your data and the exact resource utilisation of a particular application; you can block out exactly that many resources and dedicate them to it. But for most applications, that allocation will be either over-provisioned or under-provisioned at some point during the job's execution.
- It introduces a high chance of starving other applications: if we over-provision a job that ends up using less than half of its allocated resources (perhaps due to a lack of parallelism), it still holds all of them and leaves the next application in line with nothing.
- There is no elastic scaling, so keeping the allocation optimal requires manual intervention every time our data size changes.
- So it is not recommended unless you have analysed your data and application thoroughly. A configuration sketch of this mode follows this list.
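A minimal sketch of a statically allocated application; the numbers here are illustrative placeholders, not recommendations, and which settings apply depends on your cluster manager:
import org.apache.spark.sql.SparkSession

// Fixed executors, cores, and memory, held for the lifetime of the application.
val spark = SparkSession.builder()
  .appName("static-allocation-demo")
  .config("spark.dynamicAllocation.enabled", "false")
  .config("spark.executor.instances", "10")  // fixed executor count (YARN/Kubernetes)
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "8g")
  .config("spark.cores.max", "40")           // total-core cap (standalone/Mesos)
  .getOrCreate()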
Dynamic Resource Allocation
- On the other end of the spectrum, we have Dynamic Resource Allocation, which, as the name suggests, dynamically adjusts resources based on workload/demand.
- Very useful when we are sharing a resource pool with a bunch of other applications.
- Works best in most of the cases where static allocation does not, especially when the data and its size are variable.
- To enter the dynamic allocation invite-only party, you need only one configuration: spark.dynamicAllocation.enabled. Set this property to "true" to truly enjoy the party (pun intended).
- Along with that, if you need to, you can play with two more config properties, spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors, to set lower and upper bounds on the resource consumption of any application.
- If you have already worked with Spark, that is the most basic stuff, but the following are a few of the less explored Spark properties when it comes to dynamic resource allocation (a configuration sketch follows this list):
- spark.shuffle.service.enabled = true enables the external shuffle service. This service preserves the shuffle files written by executors so the executors can be safely removed or decommissioned.
- spark.dynamicAllocation.executorIdleTimeout removes idle executors after this amount of time.
- spark.dynamicAllocation.initialExecutors sets the initial number of executors to run if dynamic allocation is enabled. If --num-executors (or spark.executor.instances) is set and is larger than this value, it is used as the initial number of executors instead.
- spark.dynamicAllocation.schedulerBacklogTimeout: if dynamic allocation is enabled and there have been pending tasks backlogged for more than this duration, new executors will be requested. Additionally, the number of executors requested in each round increases exponentially from the previous round: an application will add 1 executor in the first round, then 2, 4, 8, and so on in subsequent rounds.
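Putting those together, here is a minimal sketch of a dynamically allocated application; the bounds and timeouts are illustrative and should be tuned to your workload:
import org.apache.spark.sql.SparkSession

// Dynamic allocation with an external shuffle service and explicit bounds.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-demo")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")            // preserve shuffle files
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "50")
  .config("spark.dynamicAllocation.initialExecutors", "5")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  .getOrCreate()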
That is enough about Spark scheduling and tuning around it; I hope this is helpful. This is the second blog in the Spark Performance Tuning series, and I have a couple more coming along the same lines.
Until then, Keep Learning and Keep Optimising!