An innovative algorithm to elucidate the structure of unknown compounds using tandem mass spectrometry and NMR data

Elisabeth Ortega-Carrasco(1), Tatiana Radchenko(1,2), Guillem Plasencia(3), Ismael Zamora(1,3)
1. Lead Molecular Design, S.L., Sant Cugat del Valles, Barcelona, Spain; 2. Universitat Pompeu Fabra, Pl. de la Merce, 10-12, Barcelona, Spain; 3. Molecular Discovery, Ltd., London, UK

Introduction The interpretation of tandem mass spectrometry (MS/MS) data is often a bottleneck in many areas. The task becomes harder when the scientist has no prior information about the structure of the analyzed compound. Several algorithms have been developed to facilitate structure determination from MS/MS data. Unfortunately, most of them query the input data against a database of interpreted MS/MS spectra, which restricts the potential results to the compounds contained in that dataset. The algorithm presented here differs from other software in the origin of the dataset against which the input MS/MS data are searched.

Methods The methodology consists of three parts. The first is the creation of the database. Users can choose a set of compounds from their own data or take them from an external database. Those compounds are then fragmented and stored individually in the database. In the second part, the m/z values from the input MS/MS spectrum are queried against the database, and the retrieved fragments are rationally combined to build a set of candidates. In the last part, all candidates are fragmented and compared with the peaks of the input MS/MS spectrum. To reduce the number of results, the NMR spectra of all candidates are predicted and compared with the experimental NMR data of the unknown compound.

Preliminary Data The algorithm has been developed and successfully tested using a small set of compounds from the pharmacological field. One of them is 10P-909 (PubChem CID: 1480036; IUPAC name: 2-chloro-5-[[4-[3-(trifluoromethyl)phenyl]piperazin-1-yl]methyl]-1,3-thiazole). The procedure used to elucidate this benchmark compound is described in the following lines. The first step was the creation of the database, for which we used a set of 500,000 structures from PubChem.
Those compounds were then fragmented using in-house code, generating a final set of 6 million independent fragments. Next, the algorithm was fed the required parameters: the m/z of the unknown structure, the tolerance on this m/z value, the ion mode, the adduct type and the MS/MS data. In this case study, the m/z of the unknown compound is 362.0703, acquired in positive ion mode with adduct type [M+H]+. The tolerance was set to 3 ppm because of the quality of the acquisition. The original MS/MS input contained a total of 18 peaks, reduced to half (9 peaks) after removing isotopes. Querying the m/z values of the input retrieved a total of 6492 fragments from the database. Their rational combination yielded up to 6500 solutions. To reduce the number of solutions, each one was fragmented and then compared with the original MS/MS data. Close to 2000 results matched the 9 peaks of the original spectrum. Then, to obtain a more accurate result, both the 1H and 13C NMR spectra of the best-matching structures were predicted by an in-house program and later compared with the NMR data of the unknown structure. In our case, the 1H, 13C NMR and COSY spectra of the original unknown helped us to place the structure of 10P-909 in the first position of the ranking.

Novel Aspect This algorithm operates with real fragments instead of using existing MS/MS spectra to predict the structure of the unknown compound.
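
The tolerance and peak-comparison steps described in the Preliminary Data can be illustrated with a minimal Python sketch. It is not the authors' in-house code: the function names, the symmetric ppm window, the proton mass constant and the "number of matched peaks" score are all assumptions made for illustration, and the abstract does not state how candidates are actually scored. Only the precursor m/z (362.0703), the [M+H]+ adduct and the 3 ppm tolerance come from the text.

```python
from bisect import bisect_left

# Approximate proton mass in Da; used here only to illustrate the [M+H]+ adduct.
PROTON_MASS = 1.007276


def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds of a symmetric ppm tolerance window."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def neutral_mass_from_protonated(mz):
    """Estimate the neutral monoisotopic mass from a singly charged [M+H]+ m/z."""
    return mz - PROTON_MASS


def count_matched_peaks(observed, predicted, ppm):
    """Count observed peaks that have at least one predicted fragment m/z
    inside their ppm window (hypothetical scoring, not the authors' method)."""
    predicted_sorted = sorted(predicted)
    matched = 0
    for mz in observed:
        low, high = ppm_window(mz, ppm)
        # First predicted value that is >= low; it matches if it is also <= high.
        i = bisect_left(predicted_sorted, low)
        if i < len(predicted_sorted) and predicted_sorted[i] <= high:
            matched += 1
    return matched


def rank_candidates(observed, candidates, ppm):
    """Rank candidates (name -> predicted fragment m/z list) by matched peaks,
    best first; ties are broken alphabetically by name."""
    scored = [
        (count_matched_peaks(observed, fragments, ppm), name)
        for name, fragments in candidates.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored]


if __name__ == "__main__":
    # Values taken from the case study: precursor m/z and tolerance.
    low, high = ppm_window(362.0703, 3.0)
    print(f"3 ppm window: {low:.4f} to {high:.4f}")
    print(f"neutral mass estimate: {neutral_mass_from_protonated(362.0703):.4f}")
```

In a workflow like the one described, `rank_candidates` would be called with the 9 de-isotoped peaks as `observed` and the in-silico fragments of each candidate as `candidates`. A peak count is the simplest possible score, and an intensity-weighted score would be an equally plausible choice.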