Phase 1 – CMS use case

To meet its growing computing needs, CMS has investigated the use of resources beyond the traditional grid-provided systems, including commercially provided computing services. In early 2015, CMS chose Amazon Web Services (AWS), the market leader in cloud Infrastructure-as-a-Service, to show that it could increase the global processing capacity of CMS by a significant fraction for an extended period. The test was also intended to deliver useful simulated physics events to the collaboration for analysis at production scale.

CMS was awarded a 9 to 1 matching grant from AWS that allowed the purchase of $300k of credits for computing, storage, and network charges for an investment of $30k. The size of the award was based on an estimate of what it would cost to do one month of large-scale processing. Additionally, a conditional cost waiver was granted for exporting data: as long as the export costs remained under 15% of the total monthly bill, and the data were transmitted across research networks such as ESnet, the export charges would be waived entirely. This discount program was so successful that it has been extended to researchers at all academic and research institutions.
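The grant and waiver arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and parameters are hypothetical, the "total monthly bill" is assumed to include the export charges themselves, and "under 15%" is read as a strict inequality. The actual AWS terms may differ in these details.

```python
def matched_credits(investment, match_ratio=9):
    """Total credits from an N-to-1 matching grant.

    With a 9 to 1 match, every dollar invested is matched by nine more,
    so the total is ten times the investment.
    """
    return investment * (1 + match_ratio)


def export_charge_due(total_monthly_bill, export_cost,
                      via_research_network, waiver_threshold=0.15):
    """Export charge still owed after applying the conditional waiver.

    Assumption (not stated in the text): total_monthly_bill includes
    export_cost, and the waiver is all-or-nothing.
    """
    if total_monthly_bill <= 0:
        return 0.0
    export_share = export_cost / total_monthly_bill
    if via_research_network and export_share < waiver_threshold:
        return 0.0  # both conditions met: charges waived entirely
    return float(export_cost)


if __name__ == "__main__":
    print(matched_credits(30_000))
    # Illustrative figures only, not taken from the CMS bill.
    print(export_charge_due(100_000, 10_000, via_research_network=True))
    print(export_charge_due(100_000, 20_000, via_research_network=True))
```

Under these assumptions, an investment of $30k yields the $300k of credits quoted above, and the waiver applies only when both conditions (export share and research-network transfer) hold at the same time.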

For details about the CMS Amazon Web Services Investigation, please see: HEPCloud, a new paradigm for HEP facilities: CMS Amazon Web Services Investigation

With dedicated resources, it is important that experiments plan for steady and continuous use over long periods of time, as shown on the left side of the following diagram. Experience from Run 1 and Run 2 at the LHC shows, however, that the computing needs of experiments are not constant over time. Data (re)processing, simulation data generation, and reconstruction tend to come in bursts, with an irregular time structure dictated by software release, conference, and data-taking schedules. The right side of the diagram shows a case where processing and simulation are done in bursts.


The following figure shows a comparison of the scale of processing on AWS to other global CMS activity.


The tests performed by Fermilab and CMS on AWS have demonstrated that it is possible to use dynamically provisioned cloud resources to execute many CMS workflows at large scale. As shown in the above figure, the HEPCloud Facility was able to increase the amount of resources available to CMS by 33%. When viewed in terms of the expansion of the Tier-1 facilities, as shown below, the effect is even larger.


If there are individual sites that want to satisfy their processing pledges through purchased computing services, it should be possible to maintain a similar production efficiency by enabling dedicated interface systems for grid services on top of dynamically provisioned resources.