Research Projects

Workload-driven query planning and optimization using machine learning

Mitacs Accelerate
Principal Investigator and Lead Researcher
The applications of Machine Learning (ML) and Deep Learning (DL) have proliferated across most areas of traditional computer science, and the data management discipline is no exception. Rule-based modules are being replaced by ML/DL-based counterparts that effectively ‘mine the rules’ from experience, and approaches that rely on crude statistics are rapidly being superseded by ones that ‘learn’ the functional dependencies, correlations, and skewness of the underlying data. In this project, we develop novel techniques to be integrated into a learned query optimizer. The research is conducted with the IBM team that develops Db2, IBM's well-known relational DBMS, and the goal is to integrate the resulting learned optimizer into Db2.

Workload-driven query planning and optimization using machine learning

IBM CAS
Principal Investigator and Lead Researcher
We aim to exploit information available from the underlying data and the workload, as well as feedback from the optimizer and the runtime, in order to continuously learn the best strategies for enumerating join orders and estimating execution costs with improved accuracy, leading to faster query execution. We initially investigate properties of partial or complete join graphs that correlate with the associated cost, which can help perform early pruning in the current join ordering process. This information can eventually be used to design and develop a machine learning model that learns the patterns associated with higher execution costs, as sketched below.
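As an illustrative sketch of this idea, the snippet below featurizes candidate join graphs and trains a regressor whose cost predictions drive early pruning. The features, names, and choice of model here are assumptions made for illustration, not the project's actual design.

```python
# Illustrative sketch: learn a cost model over join-graph features and use it
# to prune expensive join orders early. All features and the model choice are
# assumptions for illustration, not the project's actual implementation.
from dataclasses import dataclass
from typing import List
from sklearn.ensemble import GradientBoostingRegressor

@dataclass
class JoinGraph:
    num_relations: int       # relations joined so far (partial or complete graph)
    num_edges: int           # join predicates among them
    est_cardinality: float   # optimizer's cardinality estimate for the result
    max_table_rows: float    # size of the largest base table involved

def featurize(g: JoinGraph) -> List[float]:
    return [g.num_relations, g.num_edges, g.est_cardinality, g.max_table_rows]

def train_cost_model(graphs: List[JoinGraph], costs: List[float]):
    # Train on (join graph, observed execution cost) pairs from runtime feedback.
    model = GradientBoostingRegressor()
    model.fit([featurize(g) for g in graphs], costs)
    return model

def should_prune(model, candidate: JoinGraph, best_cost_so_far: float) -> bool:
    # During enumeration, discard candidates whose predicted cost already
    # exceeds the best complete plan found so far.
    predicted = model.predict([featurize(candidate)])[0]
    return predicted > best_cost_so_far
```

In such a scheme, observed execution costs would be fed back into training as queries run, so the model is continuously refined as the workload evolves.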

Creation of an Assignment System Employing Data Analytics for the Improvement of the Document Translation Process of the Translation Bureau

Federal Bureau of Translation, Canada
Principal Investigator and Lead Researcher
The Translation Bureau stores metadata for the documents it translates, collected over more than 12 years. This project performs analytics on this metadata to gain useful and valuable insights and to reveal past trends in how documents were assigned to translators. The insights gained through exploratory data analytics will be used to create a document assignment system for the Translation Bureau that is more efficient and productive than the existing one.

Managing the Performance of Big Data Analytics on Heterogeneous Infrastructures

Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant
Principal Investigator and Lead Researcher
For a range of major scientific computing challenges that span fundamental and applied science, the deployment of Big Data Applications (BDAs) on large-scale systems, such as internal or external clouds, private clusters, or even distributed public volunteer resources (crowd computing), needs to be offered with guarantees of predictable performance and utilization cost. Currently, however, this is not possible, because scientific communities lack the technology, at the levels of both modeling and analytics, to identify the key characteristics of BDAs and their impact on performance. There are also few data or simulations available that address the role of system operation and infrastructure in defining overall performance. This project will fill this gap by producing a deeper understanding of how to optimize the deployment of BDAs on hybrid large-scale infrastructures.

IoT2Edge: Allocating selected IoT processing and storage activities to edge nodes to optimize performance and resource consumption ensuring interoperability

Horizon 2020, Scientific Excellence (RAWFIE-OC3-SCI)
Principal Investigator and Lead Researcher
IoT2Edge aims to promote scientific excellence in the areas of edge computing and semantic interoperability in IoT environments via experimentation based on the RAWFIE federated testbed infrastructure. The IoT2Edge semantic interoperability mechanisms will be based on open specifications/standards and will be pluggable on top of the RAWFIE platform. The edge computing modules will enable dynamic offloading of resource-intensive processes to optimize resource usage across edge/fog/cloud nodes. To this end, IoT2Edge will develop new modules in the form of RAWFIE experiment supportive software (SSW). These will be evaluated via demanding experiments driven by an IoT-enabled emergency detection use case, while the generated datasets will be carefully curated and made available to the research community. The developed SSW is envisioned to extend the RAWFIE architecture by enhancing its openness and to be made available for other types of experiments. NTUA has broad expertise in all related technical domains (i.e., IoT, semantics, optimization, resource allocation, FIRE+) and a long record of relevant successful projects.

Deployment Optimization for Big Data Applications on Hybrid Large-Scale Computing Infrastructures

Swiss National Science Foundation (SNSF) PRN Big Data Project
Applicant and Lead Researcher
For a range of major scientific computing challenges that span fundamental and applied science, the deployment of Big Data Applications (BDAs) on large-scale systems, such as internal or external clouds, private clusters, or even distributed public volunteer resources (crowd computing), needs to be offered with guarantees of predictable performance and utilization cost. Currently, however, this is not possible, because scientific communities lack the technology, at the levels of both modeling and analytics, to identify the key characteristics of BDAs and their impact on performance. There are also few data or simulations available that address the role of system operation and infrastructure in defining overall performance. This project will fill this gap by producing a deeper understanding of how to optimize the deployment of BDAs on hybrid large-scale infrastructures. Using a novel combination of Big Data analytics and modeling results, we aim to improve the performance of three major scientific infrastructures: the Worldwide LHC Computing Grid (WLCG) at CERN in high-energy physics, Vital-IT (part of the Swiss Institute of Bioinformatics (SIB)) in bioinformatics, and Baobab, the high-performance computing cluster of the University of Geneva.

ASAP: An Adaptive, highly Scalable Analytics Platform (2014-2017)

FP7-ICT-2013-11, ICT-2013.4.2 (Scalable data analytics), Specific Targeted Research Projects (STReP)
Applicant from the University of Geneva and Lead Researcher
Data analytics tools have become essential for harnessing the power of the data deluge. Current technologies are restrictive, as their efficacy is usually bound to a single data and compute model, often depending on proprietary systems. This project proposes a unified, open-source execution framework for scalable big data analytics. The project makes the following innovative contributions: (a) a general-purpose task-parallel programming model; (b) a modeling framework that constantly evaluates the cost, quality, and performance of data and computational resources in order to decide on the most advantageous storage, indexing, and execution pattern available; (c) a unique adaptation methodology that enables the analytics expert to amend a submitted task at an initial or later stage; and (d) a state-of-the-art visualization engine that enables the analytics expert to obtain accurate, intuitive results of the analytics tasks she has initiated in real time.

An open market for cloud data services (2013-2014)

Swiss National Science Foundation (SNSF) Project
Co-applicant from the University of Geneva and Lead Researcher
This project aims to fill the gap between providers and consumers of data services that exists in today’s cloud business by offering an all-inclusive solution for the provision of efficient and appropriate data services, encapsulated in SLAs capable of expressing these qualities. We propose the exchange of cloud data services in an open market, where cloud providers and their customers can freely advertise the offered and requested data services and make contracts for service provisioning.

Data services in cloud federations and big data analytics (2012-2015)

University of Geneva
Applicant and Lead Researcher
The paradigm of cloud computing is rapidly gaining ground as an alternative to traditional information technology, since it combines utility and grid computing. Very recently, research in the data management field has focused on the provision of cloud data services, i.e., the transparent management of data residing in the cloud, taking advantage of the elasticity of the cloud infrastructure to maximize performance. Data management mainly includes data storage and maintenance, as well as data access. Towards this end, we take cloud computing research a step further and explore the possibilities of offering data services in federations of clouds. Our main interest lies in the management of big analytical data, taking advantage of the possibilities offered by a cloud federation and exploring the limits of data and workload execution and migration.

Efficient data management for scientific applications (2008-2010)

École Polytechnique Fédérale de Lausanne
Postdoctoral Researcher
Several scientific applications are constrained by the complexity of manipulating massive datasets. Observation-based sciences, such as astronomy, face immense data placement problems, whereas simulation-based sciences, such as earthquake modelling, must deal with computational complexity. Efficient data management, by means of automated data placement and computational support, can push the frontiers of scientists' ability to explore and understand massive scientific datasets.

Hyperion Project (2002-2010)

University of Toronto
Graduate and Post-Graduate Researcher
The Hyperion Project focuses on the principles and control of information sharing in a P2P database system, in which data are structured and queries are complex.
More information on the Hyperion Project can be found at: http://www.cs.toronto.edu/db/hyperion
06/04-07/04: Academic visit to the University of Toronto to work on the Hyperion Project.

Data management for location-based services of mobile nodes (2005-2008)

ΠΕΝΕΔ 2003 03ΕΔ291, Ministry of Development - General Secretariat of Research and Technology, Greece
Co-applicant and Lead Researcher as a PhD student
The project aims at the study and development of techniques for the efficient management of information that depends on the position of mobile nodes, such as humans or vehicles (location-based services). The techniques guarantee the management of enormous volumes of spatio-temporal and topic-based data that are collected and shared through networks, as well as the online servicing of multiple user requests.

Management of the semantic web: models and algorithms for the processing of semantic content (2004-2006)

Pythagoras: Reinforcement of research teams in universities - Greek Ministry of Education
PhD student
The semantic web suffers from a lack of structure, semantics, and meta-information, which are obstacles to efficient web information search. Current solutions use hierarchical schemas that tag the available data semantically. The goal of this project is to model and manage hierarchical schemas, offer query processing over such schemas, and guarantee their autonomy and distribution.