Machine learning methods

Machine learning (ML) algorithms, and more recently deep learning (DL) methods, have proven to perform well in many chemistry-related fields and are therefore widely used in drug design and toxicity prediction. Given a labeled data set with known outcomes, an ML algorithm learns to identify the often highly non-linear combinations of physico-chemical and structural features in the underlying data (e.g. compounds, protein structures or complexes) that may be responsible for their (toxic) effect.
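To make this concrete, here is a minimal sketch of a generic "featurize, then learn" workflow, assuming RDKit and scikit-learn are installed. The choice of Morgan fingerprints as structural features and a random forest as the non-linear learner is illustrative only and not necessarily what the individual projects below use; the molecules and their binary labels are hypothetical placeholders, not measured data.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from sklearn.ensemble import RandomForestClassifier

# Morgan (ECFP-like) fingerprints encode structural features as a bit vector
_morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def featurize(smiles: str) -> np.ndarray:
    """Convert a SMILES string into a 2048-bit Morgan fingerprint array."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    fp = _morgan.GetFingerprint(mol)
    arr = np.zeros((0,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)  # resizes arr to fpSize bits
    return arr


# Toy training set; the 0/1 "toxic" labels are HYPOTHETICAL placeholders,
# not experimental results.
train_smiles = ["CCO", "CC(=O)O", "CCN", "c1ccccc1", "Clc1ccccc1", "c1ccc2ccccc2c1"]
train_labels = [0, 0, 0, 1, 1, 1]

X_train = np.stack([featurize(s) for s in train_smiles])

# A random forest combines many decision trees and can therefore model
# non-linear interactions between fingerprint bits.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, train_labels)

query = np.stack([featurize("Brc1ccccc1")])
print(model.predict_proba(query)[:, 1])  # estimated probability of label 1
```

In practice, the featurization step is where descriptor research plugs in, and the classifier can be swapped for a deep neural network without changing the overall shape of the pipeline.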

Machine learning based toxic endpoint prediction

Determining the toxicity of compounds is vital to identify their harmful effects on humans, animals, plants and the environment. We focus on several aspects of machine learning (ML) in the context of toxicity prediction, e.g. investigating novel descriptors, the applicability of models to external data, and the interpretability of deep learning (DL) models in terms of toxicophores. These investigations aim to bring us closer to the vision of transforming toxicology into a predictive science and reducing the number of animal tests.

Deep learning based virtual screening

DeeplearningVS is a project that studies a novel rescoring method based on deep learning (DL) techniques to improve the accuracy of docking results and thereby the outcome of structure-based virtual screening.

Data augmentation for molecular property prediction using deep learning

Deep learning requires large amounts of data, which remain scarce for physico-chemical properties and bioactivity. Here, we exploit the fact that one compound can be represented by various SMILES strings as a means of data augmentation, and we explore several augmentation techniques. The best strategies lead to the Maxsmi models, i.e. the models that maximize performance under SMILES augmentation. These models are trained on four data sets, including experimental solubility, lipophilicity and bioactivity measurements, and are available for predictions on novel compounds.
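One generic way to obtain several SMILES strings for the same compound is to randomly renumber its atoms and write a non-canonical SMILES for each permutation. The sketch below assumes RDKit; the helper names `augment_smiles` and `augment_dataset` are hypothetical and this is not the Maxsmi implementation, only an illustration of the underlying idea that every variant describes the same molecule and inherits its label.

```python
import random

from rdkit import Chem


def augment_smiles(smiles: str, n_variants: int, seed: int = 0) -> list[str]:
    """Return up to n_variants distinct SMILES strings for the same molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    rng = random.Random(seed)
    atom_order = list(range(mol.GetNumAtoms()))
    variants: list[str] = []
    seen: set[str] = set()
    # Small molecules have few distinct SMILES, so cap the number of attempts
    for _ in range(10 * n_variants):
        rng.shuffle(atom_order)
        shuffled = Chem.RenumberAtoms(mol, atom_order)
        # canonical=False writes the string following the shuffled atom order
        variant = Chem.MolToSmiles(shuffled, canonical=False)
        if variant not in seen:
            seen.add(variant)
            variants.append(variant)
            if len(variants) == n_variants:
                break
    return variants


def augment_dataset(records, n_variants, seed=0):
    """Expand (smiles, label) pairs; every variant inherits its molecule's label."""
    return [
        (variant, label)
        for smiles, label in records
        for variant in augment_smiles(smiles, n_variants, seed)
    ]


# Placeholder labels for illustration only, not measured values
records = [("CC(=O)Oc1ccccc1C(=O)O", 1.0), ("CCO", 2.0)]
for smiles, label in augment_dataset(records, n_variants=3):
    print(smiles, label)
```

Because every variant canonicalizes back to the same molecule, the augmented set multiplies the number of training strings without adding any new chemical information, which is why the number of variants per compound is itself a parameter worth exploring.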

Molecular Property Representation and Optimization using Transformers

In this project, we explore the potential of transformer models to optimize molecular properties and/or generate new molecules with desired non-toxic properties. Transformer models are well known for capturing long-range relationships within an input sequence. This ability makes them promising candidates for capturing dependencies within small molecules that may be important for understanding and controlling toxic properties. The transformer architecture provides generalizability through pre-training on large unlabeled datasets followed by fine-tuning on small downstream datasets.
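The long-range behavior comes from self-attention: every token computes a weight for every other token, so the first and last characters of a SMILES string interact directly, however far apart they are. The NumPy sketch below shows single-head scaled dot-product attention over a SMILES string. It is an assumption-laden toy, not the project's model: the character-level tokenizer is hypothetical, the embeddings and projection matrices are random and untrained, and positional encodings, multiple heads and feed-forward layers are omitted.

```python
import numpy as np


def tokenize(smiles: str) -> list[str]:
    # Hypothetical character-level tokenizer; real SMILES tokenizers keep
    # multi-character atoms such as "Cl", "Br" or "[nH]" as single tokens.
    return list(smiles)


def scaled_dot_product_attention(q, k, v):
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (seq_len, seq_len) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
tokens = tokenize(smiles)
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

d_model = 16
embedding = rng.normal(size=(len(vocab), d_model))  # random, untrained
x = embedding[[vocab[t] for t in tokens]]  # (seq_len, d_model)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3))

out, attn = scaled_dot_product_attention(x @ w_q, x @ w_k, x @ w_v)
print(out.shape, attn.shape)  # (21, 16) (21, 21)
print(attn[0, -1])  # weight from the first token to the last token
```

During pre-training, these projection matrices are learned from unlabeled molecules; fine-tuning then adapts them, together with a small task-specific output layer, to a labeled downstream dataset.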

Historical Virtual Control Groups

Historical Virtual Control Groups: one step forward into the future of animal testing in toxicology. Our goal is to reduce the number of animals used in experiments. Starting from an exceptional dataset provided by members of the eTRANSAFE consortium, we derive virtual control groups and incorporate them into animal testing approaches, thus supporting a 3R (replacement, reduction, refinement) strategy.

Kinodata-3D

Machine learning models, and especially deep learning models, require large datasets for training. Since such datasets are rare in the drug design landscape, particularly those containing protein-ligand complex information, we assess the use of in silico structural docking data for machine learning. To this end, we perform template docking with the OpenEye software on a large kinase activity dataset (kinodata), following the complex generation pipeline developed in KinoML.

KinoML

MORPHology-based Endocrine DisrUptor Screening

The computational part of the MORPHology-based Endocrine DisrUptor Screening (Morpheus) project aims to develop deep learning models to predict the effects of substances and identify characteristic fingerprints based on morphological and molecular input data.