Researchers at the University of Helsinki have developed a scalable and secure cloud computing infrastructure for CMS data analysis at CERN. The solution is a hybrid one that combines the advantages of grid and cloud systems. The infrastructure is also expected to support other scientific applications.
The solution has been designed to run the Compact Muon Solenoid (CMS) analysis framework. CMS is a general-purpose detector for the Large Hadron Collider (LHC) operating at the European Organization for Nuclear Research (CERN). Since CMS has 150 million data channels receiving data from LHC proton-proton collisions 40 million times per second, reliable data archiving and analysis are the foundation of the experiment's science.
The CMS analysis framework was originally designed to run on distributed infrastructures managed by grid technologies. It is inherently complex, often with dependencies on legacy components that are difficult to provide or maintain. With the fast-paced development of virtualization technologies, cloud computing has become the dominant solution for improving utilization and flexibility in allocating computing resources while reducing infrastructure costs.
Taking this into account, the team led by Professor Sasu Tarkoma and Professor Paula Eerola from the University of Helsinki adopted a hybrid approach: building a data processing infrastructure that supports legacy components from the grid infrastructure, while establishing a reliable, scalable and secure infrastructure with the available state-of-the-art cloud technologies.
"After two years of continuous development and testing at the Computer Science HPC Ukko Cluster in Finland, the solution can now be regarded as production-ready for CMS data analysis," says Professor Paula Eerola.
Flexible open source data analysis with resources on demand
The entire design is based on open source components, allowing researchers and other interested parties to implement, test and deploy their own scientific cloud infrastructure.
In the context of this work, the researchers prototyped a hybrid solution that combines the advantages of grid and cloud systems. In the deployed architecture, when no grid resources are available, the workload can be off-loaded to a public or a private cloud, since the provisioning overhead of clouds may take less time than waiting for grid resources to become available, or vice versa.
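The off-loading decision described above can be sketched in a few lines. This is an illustrative model only: the function name, thresholds and timings below are assumptions for the sake of the example, not details of the production system.

```python
# Hypothetical cost model for choosing between grid and cloud back-ends.
# The timings are illustrative assumptions, not measured values.

GRID_QUEUE_ESTIMATE_S = 1800   # assumed wait for a free grid slot
CLOUD_PROVISION_S = 300        # assumed time to boot cloud VMs


def choose_backend(grid_slots_free: int,
                   grid_queue_estimate_s: float = GRID_QUEUE_ESTIMATE_S,
                   cloud_provision_s: float = CLOUD_PROVISION_S) -> str:
    """Prefer idle grid slots; otherwise off-load to the cloud
    when provisioning it beats the estimated grid queue wait."""
    if grid_slots_free > 0:
        return "grid"
    if cloud_provision_s < grid_queue_estimate_s:
        return "cloud"
    return "grid-queue"


print(choose_backend(4))   # idle grid slots available -> "grid"
print(choose_backend(0))   # no slots, cloud boots faster -> "cloud"
```

The same comparison works in the other direction: if the grid queue estimate drops below the cloud provisioning overhead, the job simply waits in the grid queue.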
To address security issues pertaining to hybrid clouds and increase flexibility, the system builds on the Host Identity Protocol (developed at the Helsinki Institute for Information Technology HIIT in collaboration with Ericsson Research Finland), which improves three aspects of how applications address each other. First, it supports persistent identifiers: applications use virtual addresses that remain static even if a virtual machine using HIP migrates to another network. Second, it supports heterogeneous addressing, with IPv4 and IPv6 being easily interchangeable. Third, the HIP namespace provides unique identifiers that cannot be forged or 'stolen', while also creating a secure communication tunnel between the end-points.
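The core idea behind the first two properties is the separation of identity from location. The sketch below mimics that separation with a plain dictionary; it is not a HIP implementation, and all names in it are hypothetical. The identifier is derived from a public key, so it stays stable while the underlying locator changes between IPv4 and IPv6.

```python
import hashlib

# Illustrative sketch only: mimics HIP's identity/locator split
# with a dictionary; not an implementation of the protocol.


class HostIdentityMap:
    """Map a persistent host identifier (a hash of a public key)
    to the host's current network locator (IPv4 or IPv6)."""

    def __init__(self):
        self._locators = {}

    @staticmethod
    def identity(public_key: bytes) -> str:
        # Derived from the key, so it cannot be forged without the
        # key and does not change when the host migrates.
        return hashlib.sha256(public_key).hexdigest()[:32]

    def update_locator(self, host_id: str, locator: str) -> None:
        self._locators[host_id] = locator   # e.g. after VM migration

    def resolve(self, host_id: str) -> str:
        return self._locators[host_id]


hid = HostIdentityMap.identity(b"example-public-key")
registry = HostIdentityMap()
registry.update_locator(hid, "192.0.2.10")      # IPv4 locator
registry.update_locator(hid, "2001:db8::10")    # migrated to IPv6
assert registry.resolve(hid) == "2001:db8::10"  # identity unchanged
```

Applications in such a scheme keep talking to the persistent identifier; only the mapping underneath is updated when a virtual machine moves.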
The solution is now in production at the CSC (IT Center for Science) Pouta cloud in Finland, providing high-energy physics researchers at the University of Helsinki with the capability to analyze data without hold-ups, dynamically scaling resources on demand to private or public clouds such as Amazon EC2 and Google Compute Engine.
"The novel network approach allows users to expand or shrink their application infrastructure on-demand over public and private clouds in a secure manner," Professor Sasu Tarkoma explains.
The infrastructure is also expected to support other scientific applications that require a secure and reliable execution environment for data analysis.
The work has been funded by the Academy of Finland.