An adaptive framework for the execution of data-intensive MapReduce applications in the Cloud
Cloud computing technologies play an increasingly important role in realizing data-intensive applications by offering a virtualized compute and storage infrastructure that can scale on demand. A programming model that has gained a lot of interest in this context is MapReduce, which simplifies the processing of large-scale distributed data volumes, usually on top of a distributed file system layer. In this paper we report on a self-configuring adaptive framework for developing and optimizing data-intensive scientific applications on top of Cloud and Grid computing technologies and the Hadoop framework. Our framework relies on a MAPE-K loop, known from autonomic computing, to optimize the configuration of data-intensive applications at three abstraction layers: the application layer, the MapReduce layer, and the resource layer. By evaluating monitored resources, the framework configures the layers and allocates the resources on a per-job basis. The evaluation of configurations relies on historic data and a utility function that ranks different configurations with respect to the incurred costs. The optimization framework has been integrated into the Vienna Grid Environment (VGE), a service-oriented application development environment for providing applications on HPC systems, clusters, and Clouds as services. An experimental evaluation of our framework has been undertaken with a data-analysis application from the field of molecular systems biology.
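The configuration-ranking idea described in the abstract can be illustrated with a minimal sketch. All names, parameters, and the cost model below are illustrative assumptions, not the paper's actual implementation: candidate configurations span the three abstraction layers, and the plan step of a MAPE-K loop selects the configuration with the highest utility, where utility is the negated monetary cost estimated from historic throughput data.

```python
# Hypothetical sketch of cost-based configuration ranking in a MAPE-K loop.
# All figures and names are assumptions for illustration only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Configuration:
    chunk_size_mb: int   # application-layer parameter
    reduce_tasks: int    # MapReduce-layer parameter
    num_nodes: int       # resource-layer parameter

# Toy "knowledge base": historic per-node throughput (MB/s) by chunk size.
HISTORIC_THROUGHPUT = {64: 20.0, 128: 25.0, 256: 22.0}

NODE_COST_PER_HOUR = 0.10  # assumed price per node-hour

def estimated_runtime_s(cfg: Configuration, input_mb: float) -> float:
    """Estimate job runtime from historic throughput (ideal scaling assumed)."""
    per_node = HISTORIC_THROUGHPUT.get(cfg.chunk_size_mb, 15.0)
    return input_mb / (per_node * cfg.num_nodes)

def utility(cfg: Configuration, input_mb: float) -> float:
    """Rank configurations: lower monetary cost gives higher utility."""
    hours = estimated_runtime_s(cfg, input_mb) / 3600.0
    return -(hours * cfg.num_nodes * NODE_COST_PER_HOUR)

def plan(candidates: list[Configuration], input_mb: float) -> Configuration:
    """'Plan' step of the MAPE-K loop: pick the best-ranked configuration."""
    return max(candidates, key=lambda c: utility(c, input_mb))

candidates = [
    Configuration(chunk_size_mb=64, reduce_tasks=8, num_nodes=4),
    Configuration(chunk_size_mb=128, reduce_tasks=16, num_nodes=8),
    Configuration(chunk_size_mb=256, reduce_tasks=32, num_nodes=16),
]
best = plan(candidates, input_mb=100_000)
```

With these assumed numbers the 128 MB / 8-node configuration wins: its higher per-node throughput outweighs the extra nodes, minimizing estimated node-hours and thus cost.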
Authors |
- Köhler, Martin
- Kaniovskyi, Yuriy
- Benkner, Siegfried
Category |
Paper in Conference Proceedings or in Workshop Proceedings (Paper) |
Event Title |
The First International Workshop on Data Intensive Computing in the Clouds (DataCloud 2011) |
Divisions |
Scientific Computing |
Event Type |
Conference |
Publisher |
IEEE |
Date |
May 2011 |