SHEPARD

Scheduling on HEterogeneous Platforms using Application Resource Demands

The compute requirements of applications have increased due to demands from high-performance computing, big data and real-time analytics. This has driven the emergence of specialised compute resources, such as FPGAs and GPUs, which offer significant performance gains for particular workloads.

Traditionally, applications target CPU-only platforms, which means an application can be written once and be expected to run reasonably well on most hardware. In the Cloud, application deployment and performance could be managed through virtual machine provisioning, so both the Cloud and its applications have evolved around homogeneous environments. The introduction of acceleration technologies is a disruptive force that brings heterogeneity to the Cloud. These technologies typically require specific languages, tools or programming paradigms, increasing development time and cost. Beyond writing device-specific code, developers face a further overhead in workload management: they must now determine when and where portions of their application should use the available compute resources.

Therefore, while heterogeneous environments offer many opportunities to improve application performance, they also come with their own specific challenges which must be addressed to make them viable for both developers and the Cloud.

Approach

To better accommodate heterogeneous compute devices in applications we created the SHEPARD framework. Applications submit tasks to SHEPARD, and the framework determines which device each task should run on, based on the available implementations, the estimated task performance and the current level of demand on each device. The two main challenges addressed by SHEPARD are:

  1. Managed Tasks: decoupling application development from the target hardware, thereby reducing the associated overhead
  2. Runtime Allocation: allowing applications to be deployed to any hardware platform and dynamically target workload to whichever compute devices are available

Managed Tasks

Managed tasks allow developers to call tasks in a high-level manner, supplying only the name of the task they wish to perform and the parameters it requires. Once a managed task is executed, its placement is determined by the SHEPARD framework. This mechanism allows device-specific code to be removed from applications entirely.

Managed tasks also remove the need to build workload management strategies into applications. This is particularly important because an individual application has no knowledge of the workloads generated by other applications; any static allocation it makes will therefore be sub-optimal in the presence of external workloads.

Of course, device-specific implementations must be provided at some point. For this purpose, the SHEPARD framework maintains a platform repository, which stores device-specific implementations of tasks together with cost models that describe their performance on the devices they support. When an application executes a task at runtime, SHEPARD uses this repository to dynamically load and run the appropriate implementation.

Managed tasks therefore separate device-specific code from high-level application logic, allowing two types of developer to be identified:

  1. Application Developers, who are concerned with high-level tasks that the application needs to perform; and
  2. Expert Developers, responsible for optimised and device-specific codes that can be accessed and loaded at runtime by SHEPARD.
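This separation of roles can be illustrated with a minimal sketch. All names here (`PlatformRepository`, `submit`, the `"vector_add"` task) are hypothetical, not SHEPARD's actual API: the point is only that the application supplies a task name and parameters, while device-specific implementations live in a repository populated by expert developers.

```python
# Hypothetical sketch of the managed-task idea; names are illustrative.

class PlatformRepository:
    """Stores device-specific task implementations (registered by expert developers)."""
    def __init__(self):
        self._impls = {}  # task name -> {device name: callable}

    def register(self, task, device, impl):
        self._impls.setdefault(task, {})[device] = impl

    def implementations(self, task):
        return self._impls.get(task, {})


def submit(repo, task, *args):
    """Application-facing call: only the task name and parameters are supplied.
    Device choice is hidden inside the framework (here: simply pick any
    registered implementation; SHEPARD's real policy is cost-model driven)."""
    impls = repo.implementations(task)
    if not impls:
        raise KeyError(f"no implementation registered for task {task!r}")
    device, impl = next(iter(impls.items()))  # placement policy lives in the framework
    return impl(*args)


# Expert developer: registers a CPU implementation of a "vector_add" task.
repo = PlatformRepository()
repo.register("vector_add", "cpu", lambda a, b: [x + y for x, y in zip(a, b)])

# Application developer: no device-specific code, just a named task call.
print(submit(repo, "vector_add", [1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```

A GPU or FPGA implementation of the same task would be registered under a different device name, leaving the application-side call unchanged.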

Runtime Allocation

When allocating tasks, the SHEPARD framework aims to load balance all tasks in the system across all available devices. This avoids any single device becoming a bottleneck while others sit idle, a likely scenario under static task allocation. By placing tasks dynamically at runtime, application developers can be assured that applications will take advantage of the best devices available, and cloud providers can be assured that their hardware will be continually utilised.

When a task needs to be executed, the SHEPARD framework performs a number of steps to determine when and where it should run:

  1. Determine, from the SHEPARD repository, which devices have implementations that support the required task
  2. Use cost models to estimate the runtime of the task on each supported device
  3. Sum the lengths of all tasks already queued to each device to estimate how long the new task must wait before it can begin executing
  4. Choose the device that yields the lowest turnaround time, i.e. expected wait time plus estimated execution time
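The steps above can be sketched as a single selection loop. The cost models and queue figures here are invented for illustration; SHEPARD's actual models and units are not shown in this text.

```python
# Sketch of the allocation steps, under assumed cost models: for each device
# with an implementation, turnaround = queued work (wait) + estimated
# execution time, and the device minimising this sum is chosen.

def choose_device(task_size, cost_models, queued_time):
    """cost_models: device -> function(task_size) -> estimated runtime.
    queued_time: device -> total estimated runtime of tasks already queued."""
    best_device, best_turnaround = None, float("inf")
    for device, model in cost_models.items():      # step 1: supported devices
        exec_time = model(task_size)               # step 2: cost-model estimate
        wait_time = queued_time.get(device, 0.0)   # step 3: queued work = wait
        turnaround = wait_time + exec_time         # step 4: minimise turnaround
        if turnaround < best_turnaround:
            best_device, best_turnaround = device, turnaround
    return best_device, best_turnaround


# Hypothetical figures: the GPU executes the task faster, but it is heavily
# loaded, so the idle CPU yields the lower turnaround time.
cost_models = {"cpu": lambda n: 0.1 * n, "gpu": lambda n: 0.01 * n}
queued = {"cpu": 0.0, "gpu": 50.0}
print(choose_device(100, cost_models, queued))  # ('cpu', 10.0)
```

Note how the queue term makes the decision load-aware: with an empty GPU queue the same call would choose the GPU instead.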

By allowing task implementations to be loaded and task allocations to be made at runtime, applications can dynamically adjust their behaviour to fit the resources available. This means that in a cloud as flexible as HARNESS, applications can now match this flexibility to take advantage of all resources that are available.

Further Reading

  1. E. O’Neill, J. McGlone, P. Milligan and P. Kilpatrick. SHEPARD: Scheduling on HEterogeneous Platforms using Application Resource Demands. In 22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2014
  2. E. O’Neill, J. McGlone, J.G.F. Coutinho, A. Doole, C. Ragusa, O. Pell and P. Sanders. Cross Resource Optimisation of Database Functionality Across Heterogeneous Processors. In IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2014