On 5 December 2018 we launched AI3SD officially! Here is a blog post written by Michelle Pauli about the event. Full coverage of all the presentations is available in the main report, also written by Michelle Pauli, at https://eprints.soton.ac.uk/427810/, and an online version of the launch booklet can be found here.
Will AI save the NHS?
The pace of progress in AI has been picking up dramatically. The recent launch meeting of the EPSRC-funded AI3SD Network+ brought together experts in the field to explore new developments in machine learning and drug discovery and set the scene for the next three years of innovation and collaboration.
Will artificial intelligence (AI) be the saviour of the NHS? It might sound far-fetched but it was one of the most compelling assertions offered at the launch meeting of AI3SD Network+ (Artificial Intelligence and Augmented Intelligence for Automated Investigations for Scientific Discovery) in London earlier this month.
The claim was made by Professor Jackie Hunter, director of BenevolentAI, who kicked off the meeting by highlighting the sheer unsustainable inefficiency of the drug discovery industry. With only 6% of molecules currently reaching market, what other sector would tolerate a failure rate of 94%? It takes 12-15 years to get a drug to market, and most people in the industry never work on something that makes it through to clinical trials, let alone patients. She outlined how AI will reduce costs and improve success rates, describing the “tsunami of evidence” of the last few years and the inability to harness such data at scale until AI is applied to that space. Using machine learning will reduce timelines in all phases of drug discovery: hypothesis generation, drug target validation, lead discovery and lead optimisation. In drug development it will have equally powerful effects, from rapid interrogation of study data, identification of novel biomarkers and enhanced mining of clinical data for patterns of response, to better real-world outcomes monitoring. AI is, she says, “already transforming R&D and the pace will only accelerate. It will require cultural change, different ways of working, different ways of funding and new business models.”
Professor Hunter was talking to more than 100 delegates from a wide mix of universities, commercial companies, government agencies and research organisations. They had come to hear a range of perspectives on the topic of AI and chemical synthesis, and to participate in an exciting new EPSRC-funded network which will demonstrate how cutting-edge artificial and augmented intelligence technologies can be used to push the boundaries of scientific discovery.
The launch took place in the month when DeepMind announced that its latest AI program, AlphaFold, had taken a significant first step in solving the ‘protein folding’ problem, coming top of a competition to predict the 3D shapes of proteins. Professor Adam Prugel-Bennett touched on these developments in his fascinating overview of machine learning, particularly the progress DeepMind’s programs AlphaGo and AlphaGo Zero have made in the Chinese board game Go in the last three years, and how this exemplifies the field’s sudden, recent change in pace. The ImageNet large scale visual recognition challenge also gives a clear indication of progress with image classification: in the competition’s first year in 2010 every team got at least 25% wrong; by 2017, 29 out of 38 teams got less than 5% wrong – a “superhuman performance”.
Professor Prugel-Bennett was keen to explode the myth that AI is some kind of machine that does impeccable logic. He emphasised instead that it is “just reducing errors”: recognising patterns very well and using them to make judgements that reduce errors on particular tasks. There are certain areas where making fewer errors is clearly of benefit, such as fraud and spam detection, self-driving cars and, of course, medical diagnosis. However, the training data needs to be there.
Early work in deep learning used supervised learning, where there were labels for the data. In the last three years the excitement has been around unsupervised learning: working with the data alone and learning the patterns within it. Two of the main techniques for doing this are Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs). With object recognition now mainstream, the machine learning community is looking at more demanding tasks, including visual question answering, which involves understanding both natural language and images.
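The supervised/unsupervised distinction can be made concrete with a toy sketch (not from the talk; the data and the NumPy-based approach are illustrative assumptions). With labels, a classifier can be fitted directly; without labels, an algorithm such as k-means must discover the same group structure from the data alone, much as GANs and VAEs learn structure from unlabelled data at far larger scale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: two well-separated 2-D Gaussian blobs of 100 points each.
a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
b = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(100, 2))
X = np.vstack([a, b])
y = np.array([0] * 100 + [1] * 100)  # labels, available only in the supervised case


def predict(points, centroids):
    """Assign each point to its nearest centroid."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


# Supervised: with labels we can fit a nearest-centroid classifier directly.
centroids_sup = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
accuracy = (predict(X, centroids_sup) == y).mean()

# Unsupervised: k-means sees only X and must discover the two groups itself.
# Initialise with one point and the point farthest from it, then iterate.
c0 = X[0]
c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
centroids = np.array([c0, c1])
for _ in range(10):
    assignments = predict(X, centroids)
    centroids = np.array([X[assignments == k].mean(axis=0) for k in (0, 1)])

print(f"supervised accuracy: {accuracy:.2f}")  # 1.00 on this easy, separated data
print("k-means centroids:", np.round(centroids, 1))  # close to (0,0) and (4,4)
```

On well-separated data both routes recover the same structure; the practical difference is that the unsupervised route needed no labels, which is exactly why it matters for domains where labelling is expensive.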
The question of strategies for collating the colossal amounts of data needed for AI was picked up by Professor John Overington, from the Medicines Discovery Catapult (MDC), a national facility connecting the UK community to accelerate innovative drug discovery. He used his experience with ChEMBL, the world’s largest public primary database of medicinal chemistry data, to delve into the challenges and opportunities of using free, large-scale datasets for AI training and application data, touching on the ‘reproducibility problem’ along the way.
He was frank about the level of errors in the public datasets he has been involved with – ChEMBL (2.3m compounds, with an open-data API available), SureChEMBL (the public chemical patent resource, with 18m structures generated via name recognition and available as a client feed) and UniChem (a single chemical integration source). For instance, errors run to 5% of structures, 2-3% of targets and 1% of activity values. There is also variability in the data (the underlying ‘reproducibility problem’) – about the same 10-fold difference for different orthologues as for different labs, a five- to 10-fold variance in cell line data, and about 1-2% of compounds from suppliers may not be correct. However, he was clear that, nine years ago when ChEMBL was set up, it was a good design decision to focus on collecting all the data in case it would have a use in the future and could be cleaned up accordingly.
In describing Crispy, a live knowledge graph of UK drug discovery assets and capability, Professor Overington emphasised the need for collaborative and competitive intelligence, the current difficulty of discovering the best collaborators and the need to join up people with assets and skills to work together. This renewed imperative for collaboration for successful AI discoveries is also an area Professor Hunter touched on in her presentation, highlighting the way that AI is changing ways of working, creating more integrated environments and cross-functional teams.
Creating an environment for collaboration and bringing people together – from both diverse organisations and disciplines – is one of the key aims of the Network, with funding calls focusing on interdisciplinary applications between AI and chemistry. This launch meeting is just the beginning, urged the network’s principal investigator Professor Jeremy Frey and coordinator Dr Samantha Kanza. The first funding call will be early next year, and future events, which will all be listed on the website, include conferences, workshops and hackathons.
To join the mailing list send an email to firstname.lastname@example.org with the following details:
- Subject: Subscribe, Message: SUBSCRIBE AI3SD Firstname Lastname.
- Or go to: www.jiscmail.ac.uk/AI3SD
The AI3SD Network+ has been funded by the Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/S000356/1.
Presenting at the launch event were:
- Professor Jeremy Frey – Welcome & Introduction to AI3SD
- Professor Jackie Hunter – AI in Healthcare and Drug Discovery
- Professor Adam Prugel-Bennett – Advances in Machine Learning
- Professor Gábor Csányi – Machine Learning for Molecular Dynamics
- Professor John Overington – Public Data is the worst form of Data, except for all those other forms that have been tried from time to time: Strategies and realities for collating data for AI
- Dr Nathan Brown – Optimising Molecular Design [Presentation cannot be shared]
- Professor Michela Massimi – AI for 21st-Century Scientific Discoveries: A Philosophical Perspective