InterIMAGE Cloud Platform: Towards the Architecture of an Open-Source, Distributed Platform for Automatic, Knowledge-Based Image Interpretation
ISBN 978-85-88783-11-9
Authors
1Ferreira, R.; 2Oliveira, D.; 3Happ, P.; 4Costa, G.; 5Feitosa, R.; 6Bentes, C.
1PONTIFICAL CATHOLIC UNIVERSITY OF RIO DE JANEIRO Email: rsilva@ele.puc-rio.br
2PONTIFICAL CATHOLIC UNIVERSITY OF RIO DE JANEIRO Email: darioaugusto@gmail.com
3PONTIFICAL CATHOLIC UNIVERSITY OF RIO DE JANEIRO Email: patrick@ele.puc-rio.br
4PONTIFICAL CATHOLIC UNIVERSITY OF RIO DE JANEIRO Email: gilson@ele.puc-rio.br
5PONTIFICAL CATHOLIC UNIVERSITY OF RIO DE JANEIRO Email: raul@ele.puc-rio.br
6RIO DE JANEIRO STATE UNIVERSITY Email: cris@eng.uerj.br
Abstract
Large-scale data processing has become a common challenge for many recent applications. The increasing availability of very high resolution remote sensing image data requires traditional image analysis methods to be adapted into scalable solutions. Moreover, the inadequacy of traditional infrastructure and tools for processing these ever-growing datasets has become evident. Clusters of low-cost, high-performance commodity computers have emerged as one of the main approaches to this problem. Such distributed systems can be conveniently built to support data- and compute-intensive applications. In this context, MapReduce provides a highly scalable, reliable and cost-effective framework for storing and processing massive data in such clusters. Systems based on the MapReduce model, e.g. Hadoop, have long been available and have proven efficient for the analysis of large datasets in many applications, such as machine learning, web analytics, and medical and remote sensing image analysis. In this paper we describe the architecture of the InterIMAGE Cloud Platform (ICP), an open-source, knowledge-based, distributed platform for the automatic interpretation of remote sensing data over MapReduce. The platform is a redesign of the original system that provides a robust, reliable and more flexible architecture for dealing with extremely large datasets. In short, ICP: (i) enables users to embed their knowledge into the system by creating semantic networks and operator graphs; (ii) allows programmers to extend the system straightforwardly by adding external libraries or their own algorithms; (iii) delivers the robustness of MapReduce and distributed computing without the drawbacks of programming MapReduce directly. In a typical workflow, the user defines a directed graph representing the operators to be executed and the data flows involved in the interpretation. This graph is automatically translated into MapReduce processes that are executed seamlessly in the cluster.
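The idea of translating an operator graph into MapReduce processes can be illustrated with a minimal sketch. It is not ICP's actual API and it does not use Hadoop: the function names (`run_map_reduce`, `execute_graph`), the two toy operators (`threshold`, `classify`) and the pixel records are all hypothetical, and the MapReduce job is simulated in-process in Python. The sketch only assumes that each operator can be expressed as one map function plus one reduce function, and that operators run in dependency order.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def run_map_reduce(records, mapper, reducer):
    """In-process stand-in for one MapReduce job (no Hadoop involved)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            groups[key].append(value)       # shuffle: group values by key
    # reduce phase, with keys visited in sorted order for determinism
    return [reducer(key, values) for key, values in sorted(groups.items())]


def execute_graph(operators, edges, source):
    """Run every operator as one simulated job, in dependency order.

    operators: name -> (mapper, reducer)            (hypothetical layout)
    edges:     name -> list of upstream operator names;
               an operator with no upstream reads `source`.
    """
    graph = {name: set(edges.get(name, ())) for name in operators}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = edges.get(name, [])
        if upstream:
            records = [r for u in upstream for r in outputs[u]]
        else:
            records = source
        mapper, reducer = operators[name]
        outputs[name] = run_map_reduce(records, mapper, reducer)
    return outputs


# Toy operator 1: count bright and dark pixels per tile.
def threshold_map(record):
    tile, value = record
    yield (tile, "bright" if value >= 128 else "dark"), 1


def threshold_reduce(key, counts):
    tile, label = key
    return (tile, label, sum(counts))


# Toy operator 2: label each tile with its majority class.
def classify_map(record):
    tile, label, count = record
    yield tile, (count, label)


def classify_reduce(tile, values):
    return (tile, max(values)[1])


operators = {
    "threshold": (threshold_map, threshold_reduce),
    "classify": (classify_map, classify_reduce),
}
edges = {"classify": ["threshold"]}  # data flows threshold -> classify
pixels = [("t1", 200), ("t1", 40), ("t1", 30), ("t2", 180), ("t2", 220)]

result = execute_graph(operators, edges, pixels)
print(result["classify"])
```

In this sketch the user only declares operators and the edges between them; the ordering and the map, shuffle and reduce steps are derived from the graph, which mirrors the separation between graph definition and MapReduce execution described above.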
Keywords
Remote Sensing; Distributed Systems; Image Analysis