When: Wednesday, 2014-May-28, 11h30-12h00
Where: FCUL-DI, room C6.3.38
Presenter: Pedro Costa
MapReduce is used to process large amounts of data on hundreds or thousands of machines. Hadoop MapReduce, its open-source implementation, tolerates machine failures and file corruption. Although such platforms tolerate the most common faults, they are not prepared to deal with cloud faults, such as the outage of a whole datacenter. In this smalltalk I will present a Hadoop proxy that allows MapReduce computations to scale out to multiple clouds and to tolerate cloud faults. Our solution was designed with four objectives from the outset. First, it requires no modification to the Hadoop framework. Second, it tolerates Byzantine faults and cloud outages. Third, it does so at reasonable cost, using several techniques to minimize computation, storage, and network data transfers. Fourth, it guarantees acceptable performance in terms of execution time.