HEPCloud Project

The goal of the Fermilab HEPCloud Project is to extend the current Fermilab computing facility to transparently provide access to disparate resources including commercial and community clouds, grid federations and HPC centers.

The Fermilab Scientific Computing Division supports several types of dedicated and shared resources for data- and compute-intensive scientific work. These resources have historically been limited to those provisioned by and hosted at Fermilab or to remote resources made available through the Open Science Grid. The resources may be dedicated or shared and, in some cases, are offered only at low priority, so their use may be preempted by higher-priority demands. To reliably meet peak demand, Fermilab has had to provision with the forecasted peak in mind rather than the median or mean demand. This is not always cost-effective, since some resources sit underused during off-peak periods even with the resource sharing enabled by grids. It can also lower scientific productivity if the forecasted demand is too low, since there is a long lead time to significantly increase local or remote resources.
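The cost of sizing for peak rather than average demand can be made concrete with a small back-of-the-envelope calculation. The demand figures below are hypothetical, purely for illustration, and not Fermilab data:

```python
# Illustrative only: hypothetical demand numbers, not Fermilab data.
# A facility sized for forecasted peak demand sits partly idle whenever
# actual demand falls below that peak.

def utilization(provisioned_cores, hourly_demand):
    """Average fraction of provisioned cores actually in use."""
    used = [min(d, provisioned_cores) for d in hourly_demand]
    return sum(used) / (provisioned_cores * len(hourly_demand))

# A toy week of hourly core demand: long quiet stretches plus one peak.
demand = [10_000] * 150 + [60_000] * 18  # 168 hours total

peak_sized = utilization(60_000, demand)   # provision for the peak
mean_sized = utilization(15_358, demand)   # provision near the mean

print(f"peak-sized facility utilization: {peak_sized:.0%}")
print(f"mean-sized facility utilization: {mean_sized:.0%}")
```

In this toy scenario the peak-sized facility is mostly idle, while the mean-sized facility is well utilized but cannot cover the peak at all, which is exactly the tension elastic provisioning is meant to resolve.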


[Figure: An illustration of provisioning for average vs. provisioning for peak.]


The HEPCloud facility will enable experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction, regardless of where the resources are located. It will also allow Fermilab to provision computing resources more efficiently and cost-effectively: by drawing on a mix of local and remote resources that is transparent to the scientific collaborations, the facility gains the elasticity to respond to demand peaks without over-provisioning local resources.
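The core idea of such elastic provisioning can be sketched as a simple allocation rule: satisfy demand from local capacity first, then burst the remainder to the cheapest remote resource. This is only an illustrative sketch; the function, resource names, and prices below are hypothetical and do not reflect HEPCloud's actual decision logic:

```python
# A minimal sketch of elastic provisioning: fill demand from local
# capacity first, then burst the remainder to remote resources in
# order of increasing price. All names and prices are hypothetical.

def plan(demand_cores, local_free, remote_offers):
    """Return a list of (source, cores) allocations covering demand_cores.

    remote_offers: list of (name, available_cores, price_per_core_hour).
    """
    allocations = []
    local = min(demand_cores, local_free)
    if local:
        allocations.append(("local", local))
    remaining = demand_cores - local
    for name, avail, _price in sorted(remote_offers, key=lambda o: o[2]):
        if remaining <= 0:
            break
        take = min(remaining, avail)
        allocations.append((name, take))
        remaining -= take
    return allocations

offers = [("commercial_cloud", 50_000, 0.05), ("hpc_allocation", 20_000, 0.02)]
print(plan(70_000, 15_000, offers))
```

When demand fits within local capacity, no remote resources are engaged at all; only the overflow is bought elsewhere, which is what keeps the mix cost-effective.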


Project Phases

The HEPCloud project is currently in its fourth phase.

  • Phase One, the CMS use case, was successfully completed in September 2016. The project team demonstrated the cost-effective use of commercial cloud services for experiment workflows and laid the foundation for realizing the HEPCloud concept. [Read CMS experiment use case], [Read NOvA experiment use case].
  • Phase Two was successfully completed in December 2017. This phase built upon the success of Phase One by continuing to provision cloud resources while increasing the reach by focusing on high performance computing (HPC) facilities. During this phase, we worked with the National Energy Research Scientific Computing Center (NERSC) to accomplish this goal.
  • Phase Three focused on further development of the decision engine, extension of the facility to include additional cloud providers, and the transition to operations.
  • Phase Four, our current phase, focuses on further development of the Decision Engine Framework as well as the Decision Channel design and modules. It also covers Edge Services and other requirements for expanding the facility to HPC sites. Feedback from reviews and operations, support for transactional provisioning, and support for MPI will be incorporated to ensure a stable and sustainable product for the facility.