Architecture

The HARNESS prototype architecture described on this page was implemented to validate our approach and has been deployed on two testbeds: Grid’5000, a large-scale research infrastructure for parallel and distributed computing experiments, and the Imperial testbed, which provides clusters of hardware accelerators and heterogeneous storage.

The HARNESS architecture is split into three layers: (1) a platform layer that manages applications, (2) an infrastructure layer responsible for managing resources, and (3) a virtual execution layer where the applications actually run. Next, we explain each HARNESS layer in more detail.

HARNESS architecture diagram

The Platform Layer

The Platform layer includes the Frontend, the Director and the Application Manager components:

  • The Frontend is a Web server that provides a graphical interface where users can manage their applications. To deploy an application, a user must upload an application manifest and a Service-Level Objective (SLO) document. The application manifest describes the structure of an application and the resources it needs, while the SLO specifies the functional and non-functional parameters of one particular execution of the application. The purpose of the Frontend is to provide an intuitive graphical user interface for end users and to translate all user actions into HTTPS/JSON requests that are sent to the Director (a sketch of this interaction follows the list).
  • The Director is in charge of authenticating users and instantiating one Application Manager for each application instance submitted by a user. Once it has created an Application Manager instance, it forwards all subsequent management requests about this application to that instance.
  • The Application Manager is in charge of controlling the execution of one particular application. It is a generic, application-agnostic component, so a new Application Manager does not need to be developed for every new application. The Application Manager operates in a virtual machine (VM) provisioned using HARNESS cloud resources. This VM runs a dedicated program in charge of interpreting application manifests and SLOs, building performance models for arbitrary applications, choosing the type and number of resources that an application needs to execute within its SLO, provisioning these resources, deploying the application’s code and data on the provisioned resources, and finally collecting application-level feedback during and after execution.
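As an illustration of the Frontend-to-Director interaction, the sketch below submits an application manifest and an SLO over HTTPS/JSON. The endpoint path, JSON field names and authentication header are assumptions made for illustration; they are not the actual HARNESS API.

```python
# Hypothetical sketch of the Frontend-to-Director interaction; the endpoint
# path, JSON field names and token header are illustrative assumptions.
import requests

DIRECTOR_URL = "https://director.example.org/v1"  # assumed base URL

manifest = {
    # Describes the structure of the application and the resources it needs.
    "application": "my-app",
    "components": [
        {"name": "worker", "resource_types": ["vm", "fpga"]},
    ],
}

slo = {
    # Functional and non-functional parameters of one particular execution.
    "max_execution_time_s": 3600,
    "max_cost": 10.0,
}

# The Frontend submits the manifest and SLO on behalf of the authenticated
# user; the Director then instantiates an Application Manager for this run.
response = requests.post(
    f"{DIRECTOR_URL}/applications",
    headers={"X-Auth-Token": "user-token"},  # assumed authentication scheme
    json={"manifest": manifest, "slo": slo},
    timeout=30,
)
response.raise_for_status()
print("Application instance created:", response.json())
```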

Whenever the Director or an Application Manager needs to provision resources in the HARNESS platform (either to create a new application manager or to run an actual application), it sends a resource provisioning request to the Infrastructure layer.
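For instance, when the Director needs a VM to host a new Application Manager, the provisioning request it sends might look like the following minimal sketch; the request schema is an illustrative assumption, not the actual interface of the Infrastructure layer.

```python
# Hypothetical resource-agnostic provisioning request sent by the Director to
# the Infrastructure layer to host a new Application Manager.
# The field names and values are illustrative assumptions.
application_manager_request = {
    "resources": [
        {
            "type": "vm",  # resource-agnostic resource type
            "attributes": {"cpus": 2, "memory_mb": 4096, "disk_gb": 20},
        }
    ],
    "constraints": [],  # no placement constraints for a single VM
}
```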

The Infrastructure Layer

The Infrastructure layer is in charge of managing all cloud resources and making them available on demand. Its key components are the Cross-Resource Scheduler (CRS) and the Infrastructure Resource Managers (IRMs). The CRS handles high-level requests involving multiple resources, while the IRMs translate resource-agnostic management requests into resource-specific requests. The currently implemented IRMs are IRM-NET, IRM-NOVA, IRM-NEUTRON, IRM-SHEPARD and IRM-XtreemFS. Beneath the IRMs sit components that manage specific resources, including OpenStack, SHEPARD, the MaxelerOS Orchestrator and the XtreemFS Scheduler.

  • The Cross-Resource Scheduler (CRS) is in charge of handling resource provisioning requests. In particular, it processes both single-resource requests and requests for groups of heterogeneous resources, with optional placement constraints between resources; for example, an application may request one VM and one FPGA located close to each other. Using the network proximity maps provided by IRM-NET, the CRS decides which set of physical resources should be chosen to accommodate each request (see the sketch after this list). Once this selection has been made, it delegates the actual provisioning of the resources to the corresponding IRMs. Each IRM is in charge of managing a specific type of heterogeneous resource, including VMs, GPGPUs, FPGAs, storage and network devices.
  • The Nova Resource Manager (IRM-NOVA) is in charge of managing VMs running on traditional server machines. It is a thin layer that converts resource-agnostic provisioning requests issued by the CRS into OpenStack Nova requests. Managing the creation of virtual machines over a pool of physical machines is a well-studied problem, so we rely on the popular and well-supported OpenStack Nova to supply this functionality.
  • The SHEPARD Resource Manager (IRM-SHEPARD) is in charge of managing hardware accelerator resources: GPGPUs, FPGAs and dataflow engines (DFEs). It translates resource-agnostic provisioning requests issued by the CRS into requests for locally installed accelerators and, via the MaxelerOS Orchestrator, for networked DFEs.
  • The Networked Resource Manager (IRM-NET) provides the CRS with up-to-date maps of the physical resources that are part of the cloud. In particular, these maps contain network proximity measurements taken pairwise between the physical resources, such as latency and available bandwidth. This information allows the CRS to service allocation requests with placement constraints, such as allocating two VMs with a particular latency requirement. This component also handles bandwidth reservations, allowing virtual links to be allocated. Finally, IRM-NET supports subnet and public IP allocations by delegating these requests to IRM-NOVA and IRM-NEUTRON: users can request one or more subnets, assign VMs to them, and assign public IPs to individual VMs.
  • The Neutron Resource Manager (IRM-NEUTRON) is in charge of managing subnet and public IP resources. It is a thin layer that converts resource-agnostic provisioning requests issued by the CRS into OpenStack Neutron requests.
  • The XtreemFS Resource Manager (IRM-XtreemFS) is in charge of managing data storage device reservations. It therefore translates resource-agnostic provisioning requests issued by the CRS into concrete requests that can be processed by the XtreemFS Scheduler.
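To make the placement step concrete, the following sketch selects hosts for a request containing one VM and one FPGA under a latency constraint, using a pairwise proximity map of the kind IRM-NET maintains. The data structures and the exhaustive search are illustrative assumptions, not the actual CRS algorithm.

```python
from itertools import product

# Hypothetical pairwise latency map (in milliseconds) of the kind IRM-NET
# measures between physical hosts; the values are illustrative.
latency_ms = {
    ("host-a", "host-b"): 0.2,
    ("host-a", "host-c"): 1.5,
    ("host-b", "host-c"): 1.3,
}

# Hosts able to accommodate each requested resource type (illustrative).
candidates = {
    "vm": ["host-a", "host-b", "host-c"],
    "fpga": ["host-b", "host-c"],
}

def pairwise_latency(h1, h2):
    """Symmetric latency between two hosts (0 if co-located)."""
    if h1 == h2:
        return 0.0
    return latency_ms.get((h1, h2), latency_ms.get((h2, h1), float("inf")))

def place(request_types, max_latency_ms):
    """Pick one host per requested resource so that every pairwise latency
    stays within the constraint; return the lowest-latency feasible placement."""
    best, best_cost = None, float("inf")
    for assignment in product(*(candidates[t] for t in request_types)):
        cost = max(pairwise_latency(a, b) for a in assignment for b in assignment)
        if cost <= max_latency_ms and cost < best_cost:
            best, best_cost = dict(zip(request_types, assignment)), cost
    return best

# Request one VM and one FPGA placed within 0.5 ms of each other.
print(place(["vm", "fpga"], max_latency_ms=0.5))
```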

The following components are part of the infrastructure and handle specific types of resources:

  • OpenStack Nova Controller is the interface via which IRM-NOVA can request the creation or deletion of VMs on specific physical machines. It is a standard OpenStack component that we adapted to accept the placement decisions made by the CRS. The OpenStack Nova controller operates by issuing requests to the OpenStack Nova Compute components installed on each physical host.
  • OpenStack Nova Compute is a daemon running on each physical host of the system that manages VMs on the local physical machine. It is a standard OpenStack component.
  • The MaxelerOS Orchestrator supports the allocation of networked DFEs located in MPC-X chassis. It provides a way to reserve DFEs for IRM-SHEPARD. These accelerators will then be available to applications over the local network.
  • SHEPARD Compute is a daemon running on each physical host which reports the hardware accelerators installed in that host to IRM-SHEPARD and performs reservations of those resources.
  • OpenStack Neutron Controller is the interface by which IRM-NEUTRON can request the creation or deletion of subnet and public IP resources. It operates by issuing requests to the OpenStack Neutron Agent components installed on each physical host.
  • OpenStack Neutron Agent is a daemon running on each physical host of the system, which controls the management of several services including DHCP and L3 networking. It is a standard OpenStack component.
  • XtreemFS is a fault-tolerant distributed file system that provides three kinds of services: (1) the Directory Service (DIR), (2) Metadata and Replica Catalogs (MRCs), and (3) Object Storage Devices (OSDs). The DIR tracks status information about the OSDs, MRCs, and volumes. Each volume's metadata is managed by one MRC, while file contents are spread over an arbitrary subset of OSDs.
  • The XtreemFS Scheduler handles the reservation and release of data volumes used by HARNESS applications. Data volumes are characterised by their size, the type of access they are optimised for (random vs. sequential), and the number of provisioned IOPS (see the sketch below). The scheduler is in charge of creating data volumes such that these requested properties are respected. The actual creation and deletion of data volumes is handled via the XtreemFS DIR, while the actual data and metadata are stored in the OSDs and the MRC, respectively.
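As a concrete illustration of these volume properties, the sketch below shows the kind of reservation that might flow from IRM-XtreemFS to the XtreemFS Scheduler; the field names and the helper function are hypothetical, not the actual scheduler interface.

```python
# Hypothetical data volume reservation of the kind IRM-XtreemFS could forward
# to the XtreemFS Scheduler; the field names are illustrative assumptions.
volume_request = {
    "size_gb": 100,              # requested capacity
    "access_pattern": "random",  # "random" or "sequential" access optimisation
    "provisioned_iops": 500,     # sustained I/O operations per second
}

def reserve_volume(request):
    """Illustrative placeholder: a real scheduler would select OSDs that can
    sustain the requested capacity, access pattern and IOPS, create the volume
    via the DIR, and have its metadata managed by an MRC."""
    print("Reserving {size_gb} GB optimised for {access_pattern} access "
          "at {provisioned_iops} IOPS".format(**request))

reserve_volume(volume_request)
```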

The Virtual Execution Layer

The Virtual Execution layer is composed of reserved VMs where the application is deployed and executed. In addition to the application itself, the VMs contain components (APIs and services) that support the deployment process, and also allow the application to interact with (reserved) resource instances. These components include:

  • The ConPaaS agent performs management actions on behalf of the Application Manager: it configures its VM, installs application code and data, configures access to heterogeneous resources such as GPGPUs, FPGAs and XtreemFS volumes, starts the application, and finally collects application-level feedback during execution.
  • The Executive is a process that, given a set of provisioned heterogeneous compute resources, selects the most appropriate resource for a given application task (a sketch follows this list).
  • The XtreemFS client is in charge of mounting XtreemFS volumes in the VMs and making them available as regular local directories.
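To illustrate the Executive's role, the sketch below selects among provisioned compute resources for a single task using per-resource runtime estimates; the figures and the selection rule are illustrative assumptions, not the actual Executive implementation.

```python
# Hypothetical runtime estimates (in seconds) for one application task on each
# kind of compute resource; a real Executive would derive such figures from
# performance models or profiling.
estimated_runtime_s = {
    "cpu": 120.0,    # run the task on the VM's own CPUs
    "gpgpu": 18.0,   # offload to a provisioned GPGPU
    "dfe": 9.5,      # offload to a provisioned DFE
}

provisioned = {"cpu", "gpgpu"}  # resources actually reserved for this run

def select_resource(estimates, available):
    """Pick the provisioned resource with the lowest estimated runtime."""
    usable = {r: t for r, t in estimates.items() if r in available}
    return min(usable, key=usable.get)

print(select_resource(estimated_runtime_s, provisioned))  # -> "gpgpu"
```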