1. Understanding Dynamic Ecological Networks with AI (AI:EcoNet)

    The AI:EcoNet project aims to explore one of the grand challenges of modern ecology: predicting how species interactions in complex biological networks change across time and space. Species form intricate interaction networks (such as food webs) that are essential for ecosystem stability, yet these networks are increasingly disrupted by human activities, climate change, and biodiversity loss. Traditional models fall short of capturing this complexity, making it difficult to anticipate how ecosystems will respond to future pressures.

    This project will pioneer Graph Representation Learning (GRL) methods tailored to ecological networks. By developing scalable models for signed, temporal, and attributed networks, we will reveal how interactions are lost, gained, or rearranged as species respond to shifting climates.
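
    As a purely illustrative sketch of the kind of data the project targets, the snippet below encodes a toy signed, temporal, attributed interaction network and performs one step of sign-aware, recency-weighted neighbourhood aggregation with NumPy. The species, attributes, interactions and weighting scheme are invented here and are not the project's actual models.

```python
# Toy signed, temporal, attributed ecological network (invented for illustration).
import numpy as np

# Nodes (species) with simple attribute vectors, e.g. [body_mass, trophic_level]
species = ["plant", "herbivore", "predator"]
X = np.array([[0.1, 1.0],
              [1.0, 2.0],
              [5.0, 3.0]])

# Signed, time-stamped edges: (source, target, sign, year)
# +1 = positive association, -1 = predation/competition (toy convention)
edges = [(0, 1, -1, 2010),   # herbivore consumes plant
         (1, 2, -1, 2010),   # predator consumes herbivore
         (0, 2, +1, 2020)]   # hypothetical newly observed positive association

def embed(X, edges, year, decay=0.1):
    """One sign-aware aggregation step, down-weighting older interactions."""
    H = X.copy()
    for s, t, sign, t_obs in edges:
        w = sign * np.exp(-decay * (year - t_obs))  # recency-weighted signed edge
        H[t] += w * X[s]
        H[s] += w * X[t]
    # Row-normalise so embeddings are comparable across species
    return H / np.linalg.norm(H, axis=1, keepdims=True)

print(embed(X, edges, year=2021))
```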

  2. Illuminating Microbial Dark Matter through Data Science (DarkScience)

    This project, Illuminating Microbial Dark Matter through Data Science (DarkScience), tackles one of the biggest mysteries of life on Earth: the fact that most microbes—tiny organisms that are essential for our health, the environment, and even the discovery of new medicines—remain completely unknown. While microbes drive processes that sustain ecosystems and human life, only a fraction of their vast diversity has been mapped because most species are nearly impossible to isolate in a lab. The DarkScience project brings together biology and advanced computer science to change this. By combining cutting-edge DNA sequencing with powerful machine learning, the team will develop new methods to recover and analyze genomes of these hidden microbes and link them to environmental data. This will open the door to discovering new species, understanding how microbial communities shape our planet, and unlocking solutions to challenges like antibiotic resistance, waste recycling, and climate change.
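
    To make the "recover and analyze genomes of hidden microbes" step concrete, here is a minimal, hypothetical sketch of metagenomic binning: grouping assembled DNA fragments (contigs) into putative genomes by their k-mer composition. The toy contigs, the tetranucleotide features and the use of k-means are illustrative choices only; real pipelines rely on much richer signals such as coverage across samples and marker genes.

```python
# Toy metagenomic binning sketch: cluster synthetic contigs by 4-mer composition.
from itertools import product
import random

import numpy as np
from sklearn.cluster import KMeans

random.seed(0)
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetramers

def kmer_profile(seq, k=4):
    """Normalised k-mer frequency vector of a DNA sequence."""
    counts = {km: 0 for km in KMERS}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    v = np.array([counts[km] for km in KMERS], dtype=float)
    return v / v.sum()

def toy_contig(gc_bias, length=2000):
    """Synthetic contig mimicking a genome with a given GC bias (illustration only)."""
    weights = [1 - gc_bias, gc_bias, gc_bias, 1 - gc_bias]  # A, C, G, T
    return "".join(random.choices("ACGT", weights=weights, k=length))

# Two hidden "genomes" with different composition, several contigs each
contigs = [toy_contig(0.3) for _ in range(5)] + [toy_contig(0.7) for _ in range(5)]
X = np.vstack([kmer_profile(c) for c in contigs])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # contigs from the same toy genome should share a cluster label
```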

  3. Explainable Machine Learning - A Probabilistic Approach

    This project aims to develop new explainable ML algorithms based on probabilistic graphical models (PGMs), addressing a wide range of general machine learning problems (such as anomaly detection, entity profiling, and supervised and unsupervised learning) as well as applied problems in bioinformatics (Sebastiani et al., 2007), environmental modelling (Aguilera et al., 2011) and digital image analysis (aesthetic quality evaluation (Deng et al., 2017) and semantic localization (Tompson et al., 2015)). With the results of this project, we aim to show that PGMs are a promising family of ML models for real-life explainable AI (XAI) systems supporting safety-critical and high-stakes decisions (Gunning, 2017). Therefore, the main objective of this project is to generate a set of methodological developments in the field of machine learning using probabilistic models, with a solid and innovative theoretical foundation that makes them explainable. The project also addresses some specific applications as a way to effectively validate the proposed methodological developments.
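
    As a toy illustration of why PGMs lend themselves to explanation, the following from-scratch sketch builds a three-variable discrete Bayesian network and answers a query by exact enumeration; the structure and probability tables are invented for this example and do not come from the project.

```python
# Tiny discrete Bayesian network with exact inference by enumeration.
# Structure and probabilities are toy values chosen for illustration only.
from itertools import product

# Variables: C (cloudy), R (rain), W (wet grass); each is 0/1.
P_C = {1: 0.5, 0: 0.5}
P_R_given_C = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # P(R | C)
P_W_given_R = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}   # P(W | R)

def joint(c, r, w):
    """Full joint probability factorised according to the network structure."""
    return P_C[c] * P_R_given_C[c][r] * P_W_given_R[r][w]

def posterior_rain(wet):
    """P(R = 1 | W = wet), summing the joint over the hidden variable C."""
    num = sum(joint(c, 1, wet) for c in (0, 1))
    den = sum(joint(c, r, wet) for c, r in product((0, 1), repeat=2))
    return num / den

# The posterior itself is the explanation: observing wet grass raises belief in rain.
print(f"P(rain | wet grass) = {posterior_rain(1):.3f}")
print(f"P(rain | dry grass) = {posterior_rain(0):.3f}")
```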

  4. DeepProb - Deep Probabilistic Modeling in Machine Learning. Applications to Genomics and Ecology

    Machine Learning has established itself at the core of the business models of leading companies, and society in general is quickly taking advantage of this technology in a wide variety of application areas. Deep Learning has been the key driver of the expansion of Machine Learning and Artificial Intelligence in recent years. However, Deep Learning methods are criticized for their black-box nature, which seriously limits their interpretability, and for their inability to handle model uncertainty (i.e. to know what they don't know). These issues are preventing the adoption of this technology in many critical applications such as medical diagnosis, where doctors (and patients) demand explanations of why a model is making a given prediction, as well as models that refrain from giving confident answers when asked to solve a task they have not been specifically trained for. Addressing these issues will make this technology safer, more trustworthy and, in consequence, much more widely adopted by society. The DeepProb project intends to pave the way to the next generation of Machine Learning methods by introducing Deep Probabilistic Modeling. By developing an appropriate probabilistic component and relying on Bayesian statistics, we plan to address the above-mentioned drawbacks of Deep Learning while keeping the effectiveness of those models and producing scalable methodologies for inference and learning. All the developments will be made available to the community as open source software tools. The new methods will be instantiated and tested in two use cases with remarkable impact, respectively, in genomics and ecology: prediction of gene duplicability in plants and forecasting of rural abandonment.
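
    A minimal sketch of the "knowing what you don't know" idea, shown on the simplest probabilistic model rather than a deep one: Bayesian linear regression with a closed-form posterior, whose predictive variance grows away from the training data. The data, feature map and hyperparameters below are toy values chosen for illustration and are not the project's methods.

```python
# Bayesian linear regression with a closed-form posterior and predictive uncertainty.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.0, 25.0              # prior precision on weights, noise precision

def features(x):
    """Simple polynomial feature map phi(x) = [1, x, x^2]."""
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

# Training data only on a narrow interval, so uncertainty should grow outside it
x_train = rng.uniform(-1.0, 1.0, size=20)
y_train = np.sin(2 * x_train) + rng.normal(0.0, 0.2, size=20)

Phi = features(x_train)
S_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi   # posterior precision
S = np.linalg.inv(S_inv)
m = beta * S @ Phi.T @ y_train                              # posterior mean weights

def predict(x):
    """Posterior predictive mean and standard deviation at inputs x."""
    phi = features(x)
    mean = phi @ m
    var = 1.0 / beta + np.sum(phi @ S * phi, axis=-1)
    return mean, np.sqrt(var)

for x in (0.0, 3.0):  # inside vs. far outside the training range
    mu, sd = predict(np.array([x]))
    print(f"x = {x:>4}: mean = {mu[0]: .3f}, predictive std = {sd[0]:.3f}")
```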

  5. Probabilistic Programming Languages for the Development of Intelligent Applications on Large Volumes of Data

    This project proposes the development of a probabilistic programming language compatible with a distributed computing model. The language will be implemented as an API, so that users can mix probabilistic and non-probabilistic code in their applications, and it will be integrated into Apache Spark and Apache Flink, two platforms for processing large volumes of data. The language will be developed as an open source software project, freely available to the scientific and professional community interested in applying machine learning (ML) techniques to large volumes of data.
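
    The snippet below is a hypothetical, self-contained illustration of this style of use: ordinary data-processing code organised as map and reduce steps over partitions (as Spark or Flink would execute them), wrapped around a small probabilistic model, here a Beta-Binomial conjugate update. None of the names or the model come from the project's actual language; they are stand-ins for illustration.

```python
# Hypothetical sketch: a probabilistic model embedded in partition-wise processing code.
import random

random.seed(0)

# Non-probabilistic part: raw click logs split into partitions, as a cluster would hold them
partitions = [[random.random() < 0.3 for _ in range(1000)] for _ in range(4)]

def local_sufficient_stats(partition):
    """Map step: each worker reduces its partition to (successes, trials)."""
    return sum(partition), len(partition)

def merge(stats):
    """Reduce step: sufficient statistics simply add across partitions."""
    stats = list(stats)
    return sum(s for s, _ in stats), sum(n for _, n in stats)

# Probabilistic part: conjugate Beta(1, 1) prior updated with the merged statistics
successes, trials = merge(map(local_sufficient_stats, partitions))
a, b = 1 + successes, 1 + (trials - successes)
print(f"posterior mean click rate = {a / (a + b):.3f}  (Beta({a}, {b}))")
```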

  6. Probabilistic Graphical Models for Scalable Data Analysis

    The main objective of this project is to generate a set of methodological developments in the areas of PGMs and scalable data analysis, sufficiently grounded and innovative to be incorporated into the toolbox for massive data processing. One of the application contexts considered is the analysis of document collections and their subsequent use to satisfy users' information needs effectively and efficiently. These textual sources are usually large and continuously growing, which makes treating and analysing them in a scalable way a real challenge. In a complementary way, the project intends to produce the software tools necessary to apply these methodological developments to real problems. The purpose of this project is thus twofold: to generate new knowledge of high scientific quality within the field of scalable data analytics and to enable technology transfer through the software produced.
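
    As one concrete example of a PGM routinely applied to document collections, the sketch below fits Latent Dirichlet Allocation with scikit-learn on a handful of toy documents; the documents and the number of topics are invented, and the project itself targets far larger, continuously growing collections with scalable inference.

```python
# Latent Dirichlet Allocation on a toy document collection (illustration only).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "machine learning models for text classification",
    "scalable inference in probabilistic graphical models",
    "football match results and league standings",
    "basketball playoffs and final scores",
]

counts = CountVectorizer().fit(docs)
X = counts.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:4]           # four highest-weight words per topic
    print(f"topic {k}:", ", ".join(terms[i] for i in top))
```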