TeachOpenCADD

Open source data and software are increasingly generated, developed, and used in computer-aided drug design (CADD). This makes it possible to build modular pipelines for reproducible and reusable research, and to explore and contribute to open software code. While the code and usage of such software are usually well documented, its full potential for CADD projects often goes unrealized, especially for beginners, because application examples that combine different toolkits are lacking.

TeachOpenCADD is a teaching platform offering tutorials on central topics in cheminformatics and structural bioinformatics. The tutorials contain theoretical background and practical implementations using open source data and software. Implementations are available in two formats. Interactive Jupyter notebooks demonstrate how to set up code-based pipelines in Python. The same topics are also available as KNIME workflows, an alternative to code-based pipelines in which an intuitive, drag-and-drop graphical interface is used to string together pre-implemented code units (nodes) with standardized functionality.

TeachOpenCADD is suitable for self-study training and classroom teaching, but can also serve as a starting point in research projects. The platform is freely available on GitHub and open to contributions from the community.

Figure: The TeachOpenCADD platform offers tutorials covering a step-by-step pipeline to propose novel EGFR kinase inhibitors with concepts from cheminformatics (green) and structural bioinformatics (orange).

TeachOpenCADD topics

TeachOpenCADD offers teaching material on common tasks in computer-aided drug design. Currently, the following topics are available:

Cheminformatics
  1. Compound data acquisition: ChEMBL
  2. Molecular filtering: ADME and lead-likeness criteria
  3. Molecular filtering: Unwanted substructures
  4. Ligand-based screening: Compound similarity
  5. Compound clustering
  6. Maximum common substructures
  7. Ligand-based screening: Machine learning
Structural bioinformatics
  8. Protein data acquisition: Protein Data Bank (PDB)
  9. Ligand-based pharmacophores
  10. Binding site similarity

Topics 1-10 are available as Python-based Jupyter notebooks; topics 1-8 are additionally available as KNIME workflows.
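
To give a flavor of what a code-based step in such a pipeline can look like, the following is a minimal sketch of the idea behind the compound similarity topic: ranking library compounds by their Tanimoto coefficient to a query. It is not taken from the TeachOpenCADD notebooks. The function names, the compound names, and the toy fingerprints (sets of on-bit indices) are hypothetical, and a real pipeline would presumably derive fingerprints from molecular structures with a cheminformatics toolkit rather than write them by hand.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def rank_by_similarity(query_bits, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints (on-bit indices), not real compound data.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8},
    "cmpd_C": {2, 3, 5},
}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The coefficient is the number of shared on-bits divided by the number of on-bits set in either fingerprint, so identical fingerprints score 1 and fingerprints with no bits in common score 0.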

Software and resources

People

Collaborators:
  • Greg Landrum · KNIME
  • Daria Goldmann · KNIME

Funding

  • Bundesministerium für Bildung und Forschung, grant ID 031A262C
  • Deutsche Forschungsgemeinschaft (DFG), grant ID VO 2353 / 1-1
  • HaVo-Stiftung, Ludwigshafen, Germany
  • Open Access Publication Fund of Charité – Universitätsmedizin Berlin
  • Stiftung Charité (Einstein BIH Visiting Fellow Project)
  • “SUPPORT für die Lehre” program (Förderung innovativer Lehrvorhaben) of Freie Universität Berlin

Publications