This project aims to develop new explainable ML algorithms based on PGMs, addressing both
a wide range of general machine learning problems (e.g., anomaly detection, entity
profiling, and supervised and unsupervised learning) and domain application problems
in bioinformatics (Sebastiani et al., 2007), environmental modelling (Aguilera et al., 2011)
and digital imaging (aesthetic quality evaluation (Deng et al., 2017) and
semantic localization (Tompson et al., 2015)). With the results of this project,
we aim to show that PGMs are a promising family of ML models for real-life explainable
AI (XAI) systems involved in safety-critical and high-stakes decisions (Gunning, 2017).
Therefore, the main objective of this project is to generate a set of methodological
developments in the field of machine learning using probabilistic models, with a solid
and innovative theoretical foundation that makes them explainable. The project also
addresses specific applications as a practical way to validate the proposed
methodological developments.
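As a small illustration of the kind of explainability PGMs afford, the sketch below builds a toy discrete Bayesian network and answers a diagnostic query by inspecting posteriors over unobserved variables. It is a minimal sketch rather than a project deliverable; it assumes the pgmpy library, and the anomaly-detection structure and probability values are invented for illustration.

```python
# Minimal sketch: a toy discrete Bayesian network for anomaly detection.
# Assumes the pgmpy library (class names vary slightly across versions);
# the structure and CPDs are invented for illustration only.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Both a true anomaly and a sensor fault can produce an abnormal reading.
model = BayesianNetwork([("Anomaly", "Reading"), ("SensorFault", "Reading")])

cpd_anomaly = TabularCPD("Anomaly", 2, [[0.95], [0.05]])       # P(Anomaly)
cpd_fault = TabularCPD("SensorFault", 2, [[0.90], [0.10]])     # P(SensorFault)
cpd_reading = TabularCPD(                                      # P(Reading | Anomaly, SensorFault)
    "Reading", 2,
    [[0.99, 0.40, 0.30, 0.05],   # Reading = normal
     [0.01, 0.60, 0.70, 0.95]],  # Reading = abnormal
    evidence=["Anomaly", "SensorFault"], evidence_card=[2, 2],
)
model.add_cpds(cpd_anomaly, cpd_fault, cpd_reading)
assert model.check_model()

# The explanation is the posterior itself: given an abnormal reading,
# how much of the evidence points to a true anomaly versus a faulty sensor?
infer = VariableElimination(model)
print(infer.query(["Anomaly"], evidence={"Reading": 1}))
print(infer.query(["SensorFault"], evidence={"Reading": 1}))
```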
Machine Learning has established itself at the core of the business models of leading companies, and society in general is quickly taking advantage of this technology in a wide variety of application areas. Deep Learning has been the key driver of the expansion of Machine Learning and Artificial Intelligence in recent years. However, Deep Learning methods are criticized for their black-box nature, which severely limits their interpretability, and for their inability to handle model uncertainty (i.e., to know what they don't know). These issues are preventing the adoption of this technology in many critical applications such as medical diagnosis, where doctors (and patients) demand explanations of why a model makes a given prediction, as well as models that refrain from giving confident answers when asked to solve tasks they have not been specifically trained for. Addressing these issues will make this technology safer and more trustworthy and, in consequence, much more widely adopted by society.

The DeeProb project intends to pave the way to the next generation of Machine Learning methods by introducing Deep Probabilistic Modeling. By appropriately developing a probabilistic component and relying on Bayesian statistics, we plan to solve the above-mentioned drawbacks of Deep Learning while keeping the effectiveness of those models and producing scalable methodologies for inference and learning. All the developments will be made available to the community as open source software tools. The new methods will be instantiated and tested in two use cases with remarkable impact in genomics and ecology, respectively: prediction of gene duplicability in plants and rural abandonment forecasting.
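The claim that a probabilistic treatment lets a model "know what it doesn't know" can be made concrete even in a linear setting. The sketch below is a minimal, self-contained example (NumPy only, not the DeeProb methodology): a Bayesian linear regression whose closed-form predictive variance grows for inputs far from the training data, which is exactly the behaviour a purely point-estimate model lacks. The data and hyperparameter values are invented for illustration.

```python
# Minimal sketch (not the DeeProb methodology): Bayesian linear regression
# whose predictive variance reflects how far a query point lies from the data.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D training data, observed only on the interval [0, 2].
X = rng.uniform(0.0, 2.0, size=20)
y = 1.5 * X + 0.3 * rng.normal(size=20)

def design(x):
    """Feature map [1, x] so the model has a bias term."""
    x = np.atleast_1d(x)
    return np.column_stack([np.ones_like(x), x])

alpha, beta = 1.0, 1.0 / 0.3**2      # prior precision, noise precision (assumed known)
Phi = design(X)

# Closed-form Gaussian posterior over the weights.
S_inv = alpha * np.eye(2) + beta * Phi.T @ Phi
S = np.linalg.inv(S_inv)
m = beta * S @ Phi.T @ y

# Predictive mean and standard deviation: uncertainty grows outside the observed range.
for x_query in (1.0, 5.0):
    phi = design(x_query)
    mean = (phi @ m).item()
    var = (1.0 / beta + phi @ S @ phi.T).item()
    print(f"x={x_query}: mean={mean:.2f}, std={np.sqrt(var):.2f}")
```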
This project proposes the development of a probabilistic programming language
compatible with a distributed computing model. This language will be implemented as
an API so that users can mix probabilistic and non-probabilistic code in their applications,
and it will be integrated into Spark and Flink, two platforms for processing large
volumes of data. The language will be developed as an open source software project
freely available to the scientific and professional community interested in the use of
machine learning (ML) techniques on large volumes of data.
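To make the Spark side concrete: learning the parameters of a discrete probabilistic model typically reduces to counting sufficient statistics, which is an embarrassingly parallel map-reduce job. The sketch below uses plain PySpark with an invented two-variable toy dataset; it is not the proposed language's API, only an illustration of how such counts distribute over a cluster before a conditional probability table is assembled on the driver.

```python
# Minimal sketch (not the proposed API): distributed sufficient statistics
# for a conditional probability table P(B | A), computed with plain PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cpt-sufficient-stats").getOrCreate()
sc = spark.sparkContext

# Toy dataset of (A, B) observations; in practice this would be read from HDFS/S3.
data = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]
rdd = sc.parallelize(data)

# Map each record to a count of 1 and reduce by key: the sufficient statistics.
joint_counts = rdd.map(lambda ab: (ab, 1)).reduceByKey(lambda x, y: x + y).collectAsMap()
parent_counts = rdd.map(lambda ab: (ab[0], 1)).reduceByKey(lambda x, y: x + y).collectAsMap()

# Maximum-likelihood CPT assembled on the driver from the aggregated counts.
cpt = {ab: n / parent_counts[ab[0]] for ab, n in joint_counts.items()}
print(cpt)   # e.g. {(0, 0): 0.75, (0, 1): 0.25, (1, 1): 0.75, (1, 0): 0.25}

spark.stop()
```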
The main objective of this project is to generate a set of methodological developments in the
areas of PGMs and scalable data analysis, sufficiently well grounded and innovative to be
incorporated into the catalogue of tools for massive data processing. One of the application
contexts considered is the analysis of document collections and their subsequent use to satisfy
users' information needs effectively and efficiently. These textual sources are usually not only
large but also continuously growing, which makes their scalable processing and analysis a real
challenge. In a complementary way, the project intends to produce the software tools needed to
apply these methodological developments to real problems. The purpose of this project is therefore
twofold: generating new knowledge of high scientific quality within the field of scalable data
analytics, and enabling technology transfer through the software produced.
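As a small illustration of the kind of probabilistic analysis meant for document collections (the real setting is large-scale and would run on the distributed platforms mentioned above), the sketch below fits a latent Dirichlet allocation topic model to a few toy documents with scikit-learn and prints the top words per topic. The corpus, topic count, and other parameter values are invented for illustration.

```python
# Minimal sketch: probabilistic topic modelling (LDA) on a toy corpus.
# Uses scikit-learn; the project setting targets much larger, distributed collections.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "probabilistic graphical models for scalable data analysis",
    "bayesian networks and inference over large datasets",
    "information retrieval from large document collections",
    "topic models help users satisfy their information needs",
]

# Bag-of-words representation of the documents.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Two latent topics, estimated with variational inference.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Top words per topic: an interpretable summary of the collection.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```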