Machine learning (ML) algorithms, and more recently deep learning (DL) methods, have been shown to perform well in various chemistry-related fields and are therefore widely used in drug design and toxicity prediction. Given a labeled data set with known outcomes, an ML algorithm learns to identify the often highly non-linear combinations of physico-chemical and structural features in the underlying data (e.g. compounds, protein structures or complexes) that may be responsible for a (toxic) effect.
Determining the toxicity of compounds is vital for identifying their harmful effects on humans, animals, plants and the environment. We focus on several aspects of machine learning (ML) in the context of toxicity prediction, e.g. investigating novel descriptors, the applicability of models to external data, and the interpretability of deep learning (DL) models (toxicophores). These investigations aim to bring us closer to the vision of transforming toxicology into a predictive science and reducing the amount of animal testing.
DeeplearningVS is a project that studies a novel rescoring method based on deep learning (DL) techniques to enhance the accuracy of docking results and improve the outcome of structure-based virtual screening.
Deep learning requires large amounts of data, which remain scarce for physico-chemical and bioactivity measurements. Here, we exploit the fact that one compound can be represented by various SMILES strings as a means of data augmentation, and we explore several augmentation techniques. The best strategies lead to the Maxsmi models, which maximize performance under SMILES augmentation. These models are trained on four data sets, including experimental solubility, lipophilicity, and bioactivity measurements, and are available for prediction on novel compounds.
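The idea that one compound has several valid SMILES strings, each of which can serve as an extra training example with the same label, can be illustrated with a minimal sketch. This is not the Maxsmi implementation: a real pipeline would rely on a cheminformatics toolkit to enumerate SMILES for arbitrary molecules. The helper names `reverse_linear_smiles` and `augment` are hypothetical, the toy function only handles unbranched, acyclic chains (where reading the chain from the other end gives an equivalent string), and the labels are arbitrary placeholders rather than measured values.

```python
import re

# Toy tokenizer (assumption): organic-subset atoms and explicit double/triple bonds only.
_TOKEN = re.compile(r"Cl|Br|[BCNOSPFI]|[=#]")


def reverse_linear_smiles(smiles: str) -> str:
    """Write an unbranched, acyclic SMILES chain starting from its other end."""
    tokens = _TOKEN.findall(smiles)
    # Reject anything the toy tokenizer cannot fully cover (branches, rings, brackets, ...).
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES for this toy example: {smiles!r}")
    return "".join(reversed(tokens))


def augment(dataset):
    """Return the dataset with each alternative SMILES added under its parent's label."""
    augmented = []
    for smiles, label in dataset:
        variants = [smiles]
        alternative = reverse_linear_smiles(smiles)
        if alternative != smiles:  # palindromic strings such as "CCC" yield nothing new
            variants.append(alternative)
        augmented.extend((variant, label) for variant in variants)
    return augmented


if __name__ == "__main__":
    toy_data = [("CCO", 0.0), ("CCC", 1.0)]  # labels are arbitrary placeholders
    print(augment(toy_data))
```

Here ethanol appears twice in the augmented set ("CCO" and "OCC") with the same label, while propane contributes a single entry because its reversed string is identical. Skipping such duplicates is one possible design choice; whether duplicates are kept or removed is itself an augmentation strategy that can be varied.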
In this project, we aim to explore the potential of transformer models to optimize molecular properties and/or generate new molecules with desired non-toxic properties. Unlike many established machine and deep learning methods, self-supervised learning models, e.g. those based on the transformer architecture, provide generalizability by being pre-trained on large unlabeled datasets and then fine-tuned on small downstream datasets. The transformer model is highly versatile and can perform molecular property prediction, optimization, and/or generation.
Historical Virtual Control Groups: one step forward into the future of animal testing in toxicology. Our goal is to reduce the number of animals used in experiments. Starting with an exceptional dataset provided by members of the eTRANSAFE consortium, we derive virtual control groups and incorporate them into animal testing approaches, thus enabling a 3R (replace, reduce, refine) strategy.
Machine learning models, and especially deep learning models, require large datasets for training. As such datasets, especially those containing protein-ligand complex information, are rare in the drug design landscape, we assess the use of in silico structural docking data for machine learning. To this end, we perform template docking using the OpenEye software on a large kinase activity dataset (kinodata), following the complex generation pipeline developed in kinoml.