15/12/2021 – AI3SD Autumn Seminar X: Molecules & Data : AI 4 Scientific Discovery

This was the tenth and final event of the AI3SD Autumn Seminar Series, which ran from October 2021 to December 2021. The seminar was hosted online as a Zoom webinar on the theme of Molecules & Data and consisted of two talks on the subject. Below are the videos of the talks and the speaker biographies. The full playlist of this seminar can be found here.

Data-Driven Molecular Design in Computational Toxicology – Dr Barbara Zdrazil

Barbara Zdrazil is a group leader at the University of Vienna, and works as a safety data scientist for the European Bioinformatics Institute (EMBL-EBI). Barbara’s research is concentrated on integrating Data Science approaches into the Computational Molecular Design process. She focuses on off-targets (mainly hepatic uptake transporters of the SLC family), and develops automated computational techniques to link heterogeneous data sources, perform bioactivity profiling, and generate predictive models, especially for toxicity predictions. In addition, Barbara is interested in large-scale data analyses, including time trend analyses, using public domain data. At EMBL-EBI, Barbara contributes to Open Targets, a project which aims to enable systematic target identification and prioritization. Barbara received her PhD from the University of Vienna, where she mainly focused on ligand-based models for P-glycoprotein inhibitors. In her postdoctoral studies at the University of Düsseldorf she focused on structure-based modeling of DNA polymerase inhibitors. Barbara has contributed to many EU-funded projects (including Open PHACTS and EU-ToxRisk) and led a nationally funded FWF project on the modeling of hepatic transporters from 2017 to 2021. In 2019, Barbara completed her Habilitation in Pharmacoinformatics at the University of Vienna. Since 2020, Barbara has also been working as Associate Editor for the Journal of Cheminformatics.

Q & A

Q1: What does the hydropathy index measure or indicate?

It indicates the relative hydrophobicity and hydrophilicity of the amino acids, and this was basically all we knew about the transporters in the beginning. From it we assumed there were most probably 12 transmembrane helices, but beyond that we didn’t really know much more.
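The idea behind a hydropathy plot can be sketched in a few lines: average a per-residue hydropathy value over a sliding window and count the stretches that stay strongly hydrophobic, which are candidates for transmembrane helices. The sketch below uses the Kyte-Doolittle scale with the commonly cited window of 19 residues and threshold of 1.6; whether these are the settings used in the talk is an assumption. The function names and the toy sequence are hypothetical and do not represent a real transporter.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def count_hydrophobic_segments(profile, threshold=1.6):
    """Count contiguous runs of windows whose mean exceeds the threshold."""
    count = 0
    inside = False
    for score in profile:
        if score > threshold and not inside:
            count += 1  # a new hydrophobic stretch starts here
        inside = score > threshold
    return count


# Hypothetical toy sequence: two hydrophobic stretches separated by a polar one.
toy_sequence = "L" * 20 + "D" * 20 + "I" * 20
toy_profile = hydropathy_profile(toy_sequence)
print(count_hydrophobic_segments(toy_profile))
```

Applied to a real transporter sequence, the number of segments found this way is only a rough estimate of the helix count, which is consistent with the speaker's point that little else was known at the outset.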

Q2: Can Cryo-EM produce a 3D structure of a transmembrane protein?

Of course, yes. It can, and many more structures have been released over the past few years, but unfortunately not yet for any OATP member. For some of the SLC transporters there have been a few, yes, and they were obtained much more easily than with the more conventional methods.

Finding Small Molecules in Big Data – Associate Professor Emma Schymanski

Associate Professor Emma Schymanski is head of the Environmental Cheminformatics (ECI) group at the Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg. In 2018 she received a Luxembourg National Research Fund (FNR) ATTRACT Fellowship to establish her group in Luxembourg, following a six-year postdoc at Eawag, the Swiss Federal Institute of Aquatic Science and Technology, and a PhD at the Helmholtz Centre for Environmental Research (UFZ) in Leipzig, Germany. Before undertaking her PhD, she worked as a consulting environmental engineer in Perth, Australia. She has over 90 publications and a book, and is involved in many collaborative software efforts. Her research combines cheminformatics and computational (high resolution) mass spectrometry approaches to elucidate the unknowns in complex samples, primarily with non-target screening, and to relate these to environmental causes of disease. An advocate for open science, she is involved in and organizes several European and worldwide activities to improve the exchange of data, information and ideas between scientists to push progress in this field, including NORMAN Network activities (e.g. NORMAN-SLE https://www.norman-network.com/nds/SLE/), MassBank (https://massbank.eu/MassBank/), MetFrag (https://msbi.ipb-halle.de/MetFrag/) and PubChemLite for Exposomics (https://doi.org/10.1186/s13321-021-00489-0).

Q & A

Q1: How are you finding the buy-in to FAIR templates? We are very pro making our data FAIR and I know a lot of people are, but equally it’s still a problem that not all data is, so how have you found that?

I’m not aware of anyone having directly used the template yet. The articles really haven’t been out there for very long, but basically with the NORMAN Suspect List Exchange we deal with this data and with the mapping. We have been dealing with it for years, and PubChem has equally been dealing with it for years. Let’s just say we work on a slight modification thereof. We have header mapping files that do this, so we haven’t gone back and retrospectively put all of the Suspect List Exchange into this template. Rather, we’re working with our data, and we started putting out these articles because we can see there’s a very consistent pattern in the information that you want to use. If people provide it with very consistent headers, it is extremely easy for others to digest and add value to. And it’s really not that hard. What we’ve found from experience with contributors is that if there’s no template out there, people are not sure what to provide. But if there’s a template out there and they’re unsure, you can just point them to the template and then they’re like, “oh, OK, this is not so hard”, and so we’re hoping that over time it will help raise awareness and keep growing the data. For the NORMAN Suspect List Exchange, we never imagined how big it would be. We’ve now got 89 lists and we’re about a tenth of PubChem contributors (they have 828 contributors). We have 89 contributors, so for a small environmental initiative, it’s growing.
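As a rough illustration of what a header mapping file does, the sketch below renames the columns of a contributed list to one consistent set of headers and drops columns that have no mapping. The header names, the mapping itself and the function name are hypothetical; they are not the actual NORMAN Suspect List Exchange or PubChem template fields.

```python
import csv
import io

# Hypothetical mapping from contributor headers (lower-cased) to template headers.
HEADER_MAP = {
    "compound": "Name",
    "name": "Name",
    "smiles": "SMILES",
    "inchikey": "InChIKey",
    "inchi key": "InChIKey",
    "cas": "CAS_RN",
    "cas number": "CAS_RN",
}


def harmonise_headers(csv_text):
    """Return rows keyed by template headers; unmapped columns are dropped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        clean = {}
        for header, value in row.items():
            if header is None or value is None:
                continue  # skip ragged rows with extra or missing cells
            template_header = HEADER_MAP.get(header.strip().lower())
            if template_header is not None:
                clean[template_header] = value.strip()
        rows.append(clean)
    return rows


# A contributed list with its own header spelling and one unmapped column.
contributed = (
    "Compound,CAS number,InChI Key,Comment\n"
    "Caffeine,58-08-2,RYYVLZVUVIJVGH-UHFFFAOYSA-N,stimulant\n"
)
harmonised = harmonise_headers(contributed)
print(harmonised)
```

When contributors already follow a template, the mapping becomes close to the identity, which is the practical benefit described above.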