  1. Explainable Machine Learning - A Probabilistic Approach

    This project aims to develop new explainable ML algorithms based on probabilistic graphical models (PGMs) that address a wide range of general machine learning problems (e.g., anomaly detection, entity profiling, and supervised and unsupervised learning) as well as applied problems in bioinformatics (Sebastiani et al., 2007), environmental modelling (Aguilera et al., 2011) and digital imaging (aesthetic quality evaluation (Deng et al., 2017) and semantic localization (Tompson et al., 2015)). With the results of this project, we aim to show that PGMs are a promising ML model family for real-life explainable AI (XAI) systems (Gunning, 2017) involved in safety-critical and high-stakes decisions. The main objective of this project is therefore to generate a set of methodological developments in the field of machine learning using probabilistic models, with a solid and innovative theoretical foundation that makes them explainable. The project also addresses some specific applications as a way to validate the proposed methodological developments.
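    As a flavor of why PGMs lend themselves to explanation, the following Python sketch (illustrative only, not the project's code) uses the pgmpy library to build a toy Bayesian network for anomaly detection. Every prediction is a posterior computed over an explicit, human-readable dependency structure, and the way that posterior shifts as evidence accumulates is itself an explanation:

        # Toy Bayesian network: an anomaly influences two observable sensors.
        from pgmpy.models import BayesianNetwork
        from pgmpy.factors.discrete import TabularCPD
        from pgmpy.inference import VariableElimination

        model = BayesianNetwork([("Anomaly", "SensorA"), ("Anomaly", "SensorB")])
        model.add_cpds(
            TabularCPD("Anomaly", 2, [[0.95], [0.05]]),          # P(Anomaly)
            TabularCPD("SensorA", 2, [[0.9, 0.2], [0.1, 0.8]],   # P(SensorA | Anomaly)
                       evidence=["Anomaly"], evidence_card=[2]),
            TabularCPD("SensorB", 2, [[0.8, 0.3], [0.2, 0.7]],   # P(SensorB | Anomaly)
                       evidence=["Anomaly"], evidence_card=[2]),
        )

        infer = VariableElimination(model)
        # The posterior, and how it changes as evidence is added, is the explanation.
        print(infer.query(["Anomaly"], evidence={"SensorA": 1}))
        print(infer.query(["Anomaly"], evidence={"SensorA": 1, "SensorB": 1}))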

  2. DeepProb - Deep Probabilistic Modeling in Machine Learning. Applications to Genomics and Ecology

    Machine Learning has established itself at the core of the business models of leading companies, and society in general is quickly taking advantage of this technology in a wide variety of application areas. Deep Learning has been key to the expansion of Machine Learning and Artificial Intelligence in recent years. However, Deep Learning methods are criticized for their black-box nature, which seriously limits their interpretability, and for their inability to handle model uncertainty (i.e., to know what they do not know). These issues are preventing the adoption of this technology in many critical applications such as medical diagnosis, where doctors (and patients) demand explanations of why a model makes a given prediction, and where models should refrain from giving confident answers to tasks they have not been specifically trained for. Addressing these issues will make this technology safer and more trustworthy and, in consequence, more widely adopted by society. The DeepProb project intends to pave the way for the next generation of Machine Learning methods by introducing Deep Probabilistic Modeling. By developing an appropriate probabilistic component and relying on Bayesian statistics, we plan to solve the above-mentioned drawbacks of Deep Learning while keeping the effectiveness of those models and producing scalable methodologies for inference and learning. All the developments will be made available to the community as open-source software tools. The new methods will be instantiated and tested in two high-impact use cases in genomics and ecology, respectively: prediction of gene duplicability in plants and rural abandonment forecasting.
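    The abstract does not commit to a particular technique, but one inexpensive illustration of a model "knowing what it doesn't know" is Monte Carlo dropout (Gal & Ghahramani, 2016), which turns a standard deep network into an approximate Bayesian one by keeping dropout active at prediction time. A minimal PyTorch sketch with a hypothetical network:

        import torch
        import torch.nn as nn

        # Small regression network with dropout (architecture chosen for illustration).
        net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5),
                            nn.Linear(64, 1))

        def predict_with_uncertainty(x, samples=100):
            net.train()  # keep dropout stochastic at prediction time
            with torch.no_grad():
                preds = torch.stack([net(x) for _ in range(samples)])
            return preds.mean(dim=0), preds.std(dim=0)

        mean, std = predict_with_uncertainty(torch.randn(5, 10))
        # Inputs far from the training data should come back with a large std,
        # i.e. the model reports what it does not know.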

  3. Probabilistic Programming Languages for the Development of Intelligent Applications on Large Volumes of Data

    This project proposes the development of a probabilistic programming language compatible with a distributed computing model. The language will be implemented as an API, so that users can mix probabilistic and non-probabilistic code in their applications, and it will be integrated into Apache Spark and Apache Flink, two platforms for processing large volumes of data. It will be developed as an open-source software project, freely available to the scientific and professional community interested in applying machine learning (ML) techniques to large volumes of data.
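    The project's language itself is not shown here; as a flavor of what mixing probabilistic and non-probabilistic code through an API looks like, the sketch below uses Pyro, an existing probabilistic programming library for Python (the Spark/Flink integration the project targets is not part of this example):

        import torch
        import pyro
        import pyro.distributions as dist
        from pyro.infer import MCMC, NUTS

        def model(data):
            # Probabilistic code: a latent variable declared through the API...
            mu = pyro.sample("mu", dist.Normal(0.0, 10.0))
            # ...interleaved with ordinary, non-probabilistic Python.
            scale = 1.0 if data.numel() > 5 else 2.0
            with pyro.plate("obs", data.numel()):
                pyro.sample("x", dist.Normal(mu, scale), obs=data)

        data = torch.randn(10) + 3.0                     # synthetic observations
        mcmc = MCMC(NUTS(model), num_samples=200, warmup_steps=100)
        mcmc.run(data)
        print(mcmc.get_samples()["mu"].mean())           # posterior mean of mu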

  4. Probabilistic Graphical Models for Scalable Data Analysis

    The main objective of this project is to generate a set of methodological developments in the areas of PGMs and scalable data analysis, sufficiently well grounded and innovative to be incorporated into the toolbox for massive data processing. One target context is the analysis of document collections and their subsequent use to satisfy users' information needs effectively and efficiently. These textual sources are not only large but also growing continuously, which makes their scalable treatment and analysis a real challenge. In a complementary way, the project intends to produce the software tools needed to apply these methodological developments to real problems. The purpose of this project is thus twofold: to generate new knowledge of high scientific quality within the field of scalable data analytics, and to enable technology transfer through the software produced.
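    The abstract does not name a concrete model, but latent Dirichlet allocation (LDA) is a standard PGM for document collections, and its online variational Bayes learner scales to large, growing corpora through mini-batches. A minimal scikit-learn sketch over a toy corpus:

        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.feature_extraction.text import CountVectorizer

        docs = ["probabilistic models for text analysis",
                "scalable inference over massive data",
                "users resolve information needs with search"]

        counts = CountVectorizer().fit_transform(docs)   # bag-of-words matrix
        lda = LatentDirichletAllocation(n_components=2, learning_method="online",
                                        batch_size=2, random_state=0)
        theta = lda.fit_transform(counts)                # per-document topic mixtures
        print(theta.round(2))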