Adrien - Tuesday, December 2, 2025

🧪 When artificial intelligence learns to sniff out novel molecules

A new machine-learning tool speeds up the identification of previously unknown molecules in natural extracts. Grounded in decision theory, it learns to "think" like an expert by cross-referencing the results of several chemical analysis programs, classifying the compounds present and flagging those never identified before. It is a powerful approach for exploring the still poorly understood secrets of biodiversity.

Nature holds a vast reservoir of potentially useful molecules (medicines, flavors, materials...), many of which have never been identified. But discovering them is like looking for a needle... in a haystack of data! Chemists searching for new natural molecules rely on mass spectrometry, a technique that measures the masses of the fragments a molecule produces when it breaks apart after ionization. The way a molecule fragments, and the masses of those fragments, which the instrument measures very precisely, depend directly on its chemical structure.



By comparing these "spectral signatures" to those of known molecules, we can deduce the composition of the sample and sometimes even identify new compounds. These analyses generate mountains of data that need to be interpreted. Until now, scientists had to manually compare the outputs of several software programs that each assign a structure to a signal, relying on different databases and models that do not always agree, at the risk of missing a discovery.
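To make the idea of comparing spectral signatures concrete, here is a minimal Python sketch of one common way to compare two fragment spectra: cosine similarity on binned peaks. It is an illustration only. The article does not say which similarity measure the tools in question use, and the function names, the peak format (pairs of fragment mass and intensity), and the bin width are all assumptions made for this example.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Group (mass, intensity) peaks into fixed-width mass bins (hypothetical helper)."""
    binned = {}
    for mass, intensity in peaks:
        key = round(mass / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Return a similarity in [0, 1]: 1 for identical peak patterns, 0 for no shared bins."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


# Made-up peak lists, for illustration only.
unknown = [(91.05, 40.0), (119.08, 100.0), (147.07, 25.0)]
reference = [(91.05, 35.0), (119.08, 100.0), (163.10, 10.0)]
print(cosine_similarity(unknown, reference))
```

A score close to 1 suggests the unknown signal resembles the reference compound, while a low score against every reference is one hint that the compound may not be in the database.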

To overcome this impasse, an interdisciplinary team of chemists and computer scientists from CNRS, Université Paris-Saclay, and Université Paris Sciences & Lettres designed MS2DECIDE, a program that acts as an intelligent arbitrator between these tools. Inspired by decision theory, it learns to combine the results of different software programs while considering their reliability and degree of agreement, just as a human expert would.

The program thus calculates a "knownness" score, which is used to rank all the molecules present in a natural sample by their probability of already being known and listed in databases. The lower this score, the more likely the molecule is to be new.
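The ranking idea can be sketched in a few lines of Python. This is not MS2DECIDE's actual formula, which the article does not give. It assumes, purely for illustration, that each annotation tool returns a match confidence between 0 and 1 and that the combined score is a reliability-weighted average. The tool names, weights, and sample values are hypothetical.

```python
def knownness(tool_scores, reliability):
    """Reliability-weighted average of per-tool match confidences (illustrative assumption)."""
    total_weight = sum(reliability[tool] for tool in tool_scores)
    weighted_sum = sum(reliability[tool] * score for tool, score in tool_scores.items())
    return weighted_sum / total_weight


def rank_by_novelty(candidates, reliability):
    """Sort candidates from lowest to highest knownness, so likely-new molecules come first."""
    scored = [(name, knownness(scores, reliability)) for name, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical tools and reliabilities; a laboratory could tune these to its own expertise.
reliability = {"tool_a": 0.5, "tool_b": 0.3, "tool_c": 0.2}

# Hypothetical match confidences for three signals in a sample.
candidates = {
    "signal_1": {"tool_a": 0.90, "tool_b": 0.80, "tool_c": 0.95},
    "signal_2": {"tool_a": 0.10, "tool_b": 0.20, "tool_c": 0.05},
    "signal_3": {"tool_a": 0.60, "tool_b": 0.20, "tool_c": 0.70},
}

for name, score in rank_by_novelty(candidates, reliability):
    print(name, round(score, 3))
```

In this toy setting, a signal that every tool fails to match confidently gets a low score and rises to the top of the list, which is where a chemist would look first for a new compound.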

Tested on about a hundred compounds mixed in an "artificial" sample, including six never-before-identified ones, MS2DECIDE ranked all six new molecules among the top ten positions. In a second, real-world trial, it was applied to a sample of an African plant, Pleiocarpa mutica, known to contain monoterpene indole alkaloids, complex natural molecules with often remarkable biological properties (anti-tumor, anti-malarial, analgesic...). The program highlighted a novel alkaloid whose properties can now be evaluated.

The tool, presented in the journal Chemistry-Methods, could transform chemical prospecting. In the future, each laboratory could even adjust MS2DECIDE, which is open access, to its own expertise, so that the machine adopts the chemist's "perspective." A promising alliance between human reasoning and algorithmic intelligence.

Author: AVR