This was the penultimate event of the AI3SD Autumn Seminar Series, which ran from October 2021 to December 2021. The seminar was hosted online via a Zoom webinar; its theme was Large Spaces, and it consisted of two talks on the subject. Below are the videos of the talks and the speaker biographies. The full playlist for this seminar can be found here.
Audacity of huge: Machine Learning for the discovery of transition metal catalysts and materials – Professor Heather Kulik
Heather J. Kulik is an Associate Professor in Chemical Engineering at MIT. She received her B.E. in Chemical Engineering from Cooper Union in 2004 and her Ph.D. in Materials Science and Engineering from MIT in 2009. She completed postdocs at Lawrence Livermore (2010) and Stanford (2010−2013), prior to returning to MIT as a faculty member in 2013 and receiving tenure in 2021. Her work has been recognized by a Burroughs Wellcome Fund Career Award at the Scientific Interface (2012-2017), Office of Naval Research Young Investigator Award (2018), DARPA Young Faculty Award (2018), AAAS Marion Milligan Mason Award (2019-2020), NSF CAREER Award (2019), the Industrial & Engineering Chemistry Research “Class of Influential Researchers”, the ACS COMP Division OpenEye Award for Outstanding Junior Faculty in Computational Chemistry, the JPCB Lectureship (ACS PHYS), the DARPA Director’s Fellowship (2020), MSDE Outstanding Early-Career Paper Award (2021), and a Sloan Fellowship (2021).
Q & A
Q1: Is the machine learning model you used with DFT suitable for other quantum chemistry methods like MP2 and CCSD(T)? Please explain why or why not.
So in a different part of my research program, we spend a lot of time asking ourselves where errors come from when our materials are strongly correlated, and whether we should have been using DFT models at all. The reason everything I show is with DFT is that the computational cost of generating even a few thousand points with coupled cluster is so high, and in transition metal chemistry there’s a less obvious hierarchy of correlated wavefunction methods sitting above DFT. That said, the approach absolutely is amenable to training on those datasets. One thing we’ve done is predict strong correlation with multireference diagnostics over these spaces: we’ve trained models and then looked at hotspots of strong correlation, because that lets you identify where you definitely need to go beyond DFT. Implicit in the double hybrids is MP2 correlation, so obviously MP2 is feasible, but it tends not to perform better on its own for transition metal chemistry. We’ve also used the same approach to learn the difference between the DFT answer and the coupled cluster answer and to identify when that difference is going to be large. That’s a paper coming out shortly; in that work we focused on identifying places where there’s an imbalance in the degree of strong correlation between the points being compared, which is when going beyond DFT is most essential. And naturally, we’ve also shown some machine learning of experimental data, in the context of spin crossover and predicting properties that are less sensitive to the DFT functional. So there’s no inherent limitation, except that we have to have a lot of high-quality data that we really trust, and historically that hasn’t been something we’ve been able to get for transition metal complexes of 50 atoms or more.
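The delta-learning idea described in this answer, training a model on the difference between a cheap method and an expensive one rather than on the expensive result directly, can be sketched as follows. Everything here is a toy stand-in: the descriptors, the synthetic "DFT" and "coupled-cluster" energies, and the ridge regressor are illustrative assumptions, not the group's actual features or model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 200 complexes, 5 hypothetical descriptor features each.
X = rng.normal(size=(200, 5))
true_dft_weights = np.array([1.0, -0.5, 0.3, 0.0, 0.2])
e_dft = X @ true_dft_weights + 0.1 * rng.normal(size=200)

# Pretend the "coupled-cluster" energy differs from DFT by a smooth
# function of the same descriptors plus a little noise.
delta_true = 0.4 * X[:, 0] - 0.2 * X[:, 3]
e_cc = e_dft + delta_true + 0.01 * rng.normal(size=200)

# Delta learning: fit the *difference* e_cc - e_dft, not e_cc itself.
y = e_cc - e_dft
lam = 1e-3  # small ridge regularisation
w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Correct a new DFT value with the learned delta.
x_new = rng.normal(size=5)
e_dft_new = x_new @ true_dft_weights
e_corrected = e_dft_new + x_new @ w  # approximate coupled-cluster estimate
```

The point of fitting the difference is that the delta is usually a smoother, smaller-magnitude target than the total energy, so far fewer expensive reference calculations are needed.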
Q2: How many electrons are in the DFT calculations you are doing and how much time do the DFT calculations take?
I’m not going to be able to convert it to electrons off the top of my head. The biggest systems we were studying were 200 atoms, at least 100 or so of which were heavy atoms. The smallest complexes we ever study are about 25 atoms, so they’re all bigger than the small-molecule sets people typically study. In terms of valence electrons, you multiply the number of heavy organic atoms by at least six and then, with an ECP, add about 24 for the metal centre, so a decent number of electrons. I should mention we use GPU-accelerated DFT with the code TeraChem, which we develop and which helps accelerate these larger DFT calculations, certainly at the cost of some shortcuts on the basis set. In general these are big calculations: we’re able to complete most of them in five days or less, but most of them take considerably more than a couple of hours, even with a fast DFT code.
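The back-of-the-envelope electron count in the answer above can be written out explicitly. The function name and the example system size are illustrative, not from the talk:

```python
def estimate_valence_electrons(n_heavy_organic: int, n_metals: int = 1) -> int:
    """Rough valence-electron count for a transition metal complex:
    ~6 valence electrons per heavy organic atom, plus ~24 electrons
    for each metal centre treated with an ECP, as described above."""
    return 6 * n_heavy_organic + 24 * n_metals

# A 200-atom complex with ~100 heavy atoms, as in the answer above:
print(estimate_valence_electrons(100))  # → 624
```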
Q3: What hardware and software are you using for the DFT calculations?
There is actually one subtlety here. For the first part I showed, we run everything with a modest double-zeta basis set and we use TeraChem, a GPU-accelerated quantum chemistry code developed at Stanford, which we drive in an automated fashion with our code molSimplify. We write our own codes to automate that process, and that allows us to get a large number of calculations to converge. When I was showing the 23 different density functionals, what we actually do is take that calculation out of TeraChem and pass it to Psi4, mainly because we could write an interface to Psi4, but we’ve also done this with ORCA and other codes. The main idea is that we keep the wavefunction as close to the original result as possible, and then we can apply other density functionals that maybe aren’t implemented in TeraChem; for instance, the Minnesota functionals and some of the double hybrids are not in TeraChem. So implicitly our approach is agnostic to the code we use: we try to use whatever code has the methodology we want, and our molSimplify automation makes it pretty seamless to switch between these codes, but TeraChem is chosen for speed to generate the initial geometry and so on. We use a range of graphics cards, so it is GPU-accelerated, and while you can get better performance out of high-end GPU cards, we tend to use the low-cost gamer cards, which do just as well. The benefit increases slightly with the higher-end cards, but the cost increases dramatically, so we tend to run on one or two GPUs (maybe two for the bigger molecules) with pretty low-cost standard cards, both locally on our own cluster and on supercomputing resources.
Q4: What type of water or solvent model do you use with DFT?
We use implicit solvent models for everything I talked about, including things like logP. We use a polarisable continuum model, again for speed. For certain properties, if you care about explicit hydrogen-bonding interactions, you might want to do more of a QM/MM approach; other parts of my group worry about that. But here, even the logP I was showing for the RFBs was the difference in the solvation free energy, in a polarisable continuum model, between water and octanol.
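The logP quantity mentioned above follows from the standard relation between the partition coefficient and the difference in solvation free energies in the two continuum solvents. The free-energy values in the example are purely illustrative:

```python
import math

R = 1.9872e-3   # gas constant, kcal/(mol K)
T = 298.15      # temperature, K

def log_p(dg_water: float, dg_octanol: float) -> float:
    """Octanol/water partition coefficient from solvation free
    energies (kcal/mol) in the two solvents:
    logP = (dG_water - dG_octanol) / (RT ln 10)."""
    return (dg_water - dg_octanol) / (R * T * math.log(10))

# Illustrative values: solvation 2 kcal/mol more favourable in octanol,
# so the molecule prefers the octanol phase (positive logP).
print(round(log_p(-5.0, -7.0), 2))
```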
Q5: Do you favour the consensus vs delta learning approach for DFT data?
I think they both have promise. In delta learning specifically, we’ve had trouble figuring out what to put at the top as the ground truth. We have some small transition metal complex data, which I mentioned we’re working on putting together; it’s relatively small, and it looks promising, but we have to treat coupled cluster, or DLPNO coupled cluster, as the ground truth, and if you read the literature, that’s not necessarily something people consistently believe is going to be more robust than the consensus approach. So I think the consensus approach adds value, and it’s also cheaper.
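The consensus approach contrasted here amounts to pooling predictions from many density functionals and using their agreement as a confidence signal. A minimal sketch of that idea, with hypothetical functional names and values:

```python
import numpy as np

# Hypothetical spin-splitting predictions (kcal/mol) for one complex
# from several density functionals (names and values illustrative).
predictions = {
    "B3LYP": 8.1, "PBE": 12.4, "M06": 9.0,
    "SCAN": 10.2, "TPSSh": 8.8,
}

values = np.array(list(predictions.values()))
consensus = values.mean()     # consensus estimate across functionals
spread = values.std(ddof=1)   # disagreement as an uncertainty proxy

# A large spread flags complexes where functionals disagree and a
# higher-level method (or a delta correction) may be warranted.
print(f"consensus = {consensus:.1f} +/- {spread:.1f} kcal/mol")
```

Because it only reuses DFT results that are already in hand, this costs far less than generating new coupled-cluster reference data.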
Q6: I thought the low-cost gamer GPUs do not have FP64. Are you doing your computations in FP64?
We use the mixed precision that’s built into TeraChem, which takes advantage of the fact that most pieces of a quantum chemistry calculation do not require double precision all the time. As newer cards have come out, the speed differential between double and single precision has become smaller and smaller, so it has been less of a problem, but it was originally the issue you would think about with these cards. It’s not something that comes up that often in quantum chemistry, and it comes up even less in molecular dynamics, so a lot of things are good enough in single precision.
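TeraChem's actual mixed-precision scheme is more involved than this, but a minimal NumPy illustration shows why some steps tolerate single precision while others do not: a float32 value near 1e4 has a spacing of about 1e-3 between representable numbers, so a much smaller contribution is lost entirely.

```python
import numpy as np

big, small = 10000.0, 1e-4

# Single precision: the spacing between float32 values near 1e4
# (~9.8e-4) exceeds the term being added, so it is dropped.
single = np.float32(big) + np.float32(small)
# Double precision retains the small contribution.
double = np.float64(big) + np.float64(small)

print(single == np.float32(big))  # → True  (addition had no effect)
print(double == big)              # → False (contribution survives)
```

This is the motivation for mixed-precision schemes: do the bulk of the arithmetic in fast single precision, but accumulate sensitive quantities in double.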
Artificial Intelligence for Safer Urban Space – Professor Zoheir Sabeur
Zoheir Sabeur is Professor of Data Science and Artificial Intelligence at Bournemouth University (2019-present). He is also Visiting Professor of Data Science at the Colorado School of Mines, Golden, Colorado, USA (2017-present). Zoheir was Science Director at the IT Innovation Centre, School of Electronics and Computer Science, University of Southampton (2009-2019), where he led data science research teams in more than 30 large projects as Principal Investigator (PI). The research was mainly supported by research grants (totalling £8.0M) awarded by the European Commission (under FP5, FP6, FP7 and H2020), Innovate UK, DSTL, NERC and industry. Prior to Southampton, Zoheir worked as Director and Head of Research at BMT Group Limited (1996-2009), where he led teams developing advanced environmental information systems, in particular the PROTEUS system for the UK O&G industries and UK Government. Before that, Zoheir held several academic appointments in computing, as Senior Research Fellow at Oxford Brookes University (1993-1995) and SERC Research Fellow at the University of Leeds (1991-1993) and the University of Strathclyde (1990-1991). He also worked as a Research Scientist in the Intensive Computing Lattice QCD Group at the University of Wuppertal, Germany (1987). Zoheir graduated with a PhD and MSc in Particle Physics from the University of Glasgow (1985-1990); his PhD was on “Lattice QCD at High Density with Dynamical Fermions”. This was in fact his earliest involvement in “Data Science”, using vector machines for intensive computing and understanding hadron matter thermodynamics under the UK Lattice QCD Grand Challenge. In the last decade or so, Zoheir’s research has focussed more on the fundamentals of Artificial Intelligence and knowledge extraction for understanding the behaviour of human, natural, and industrial processes. These are being investigated in the context of cyber-physical security, healthcare, industrial and environmental systems, and more.
Zoheir has published over 130 papers in scientific journals, conference proceedings and books. He is a peer reviewer and a member of international scientific committees and editorial boards for various science and engineering conferences and journals. Zoheir chairs the OGC Digital Global Grid System Specification and Domain Working Groups, and co-chairs the AI and Data Science Task Group at the BDVA. He is a Fellow of the British Computer Society, a Member of the Institute of Physics, and a Fellow of IMarEST.
Q & A
Q1: What software platform did you use for your classifiers (SciKitLearn, H2O, Mathematica, …)?
It’s all coded in Python by Alessandro, using YOLO v5. This is early investigation work; we’re going to push towards a GPU-based data science platform, which we are purchasing in the New Year, to deal with the scalability of the work.
Q2: What does a “crowd simulation” do and how does it work?
This is an activity done by our colleagues in the Czech Republic and in Spain, with an academic partner from the University of Cantabria near Bilbao, and also Crowd Dynamics, a company in Manchester. Basically, the simulation of crowds is based on first principles: there are models of social interactions that have been understood over the years and implemented from such theories. Models from these theories are not fully expert in crowd behaviour, but the simulations they produce are actually very realistic. They take into account the boundary conditions of the space and how the crowd can be evacuated safely. This additional piece of information goes to level 4, high-level fusion, for further reasoning, in addition to what we produce in terms of detections of behaviour. So our part of the work deals with data-driven machine learning for behaviour understanding, whereas the crowd modelling simulations are based on first-principles theories.
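The speakers do not name the specific model their partners use, but a common first-principles approach to crowd simulation of this kind is a social-force model, in which each pedestrian is driven toward a goal while being repelled by neighbours. A minimal sketch, with all parameters illustrative:

```python
import numpy as np

def social_force_step(pos, vel, goal, dt=0.1, tau=0.5, v0=1.3,
                      A=2.0, B=0.3):
    """One integration step of a minimal social-force crowd model.
    pos, vel: (N, 2) positions/velocities; goal: (2,) target point.
    tau: relaxation time (s); v0: desired walking speed (m/s);
    A, B: strength and range of pedestrian-pedestrian repulsion."""
    # Driving force: relax velocity toward the desired velocity.
    to_goal = goal - pos
    desired = v0 * to_goal / np.linalg.norm(to_goal, axis=1, keepdims=True)
    force = (desired - vel) / tau

    # Pairwise exponential repulsion between pedestrians.
    diff = pos[:, None, :] - pos[None, :, :]   # (N, N, 2) separations
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)             # ignore self-interaction
    push = A * np.exp(-dist / B) / dist        # magnitude per unit vector
    force += (push[:, :, None] * diff).sum(axis=1)

    vel = vel + dt * force
    return pos + dt * vel, vel

# Two pedestrians heading for the same exit at x = 10.
pos = np.array([[0.0, 0.0], [0.5, 0.0]])
vel = np.zeros_like(pos)
goal = np.array([10.0, 0.0])
for _ in range(50):
    pos, vel = social_force_step(pos, vel, goal)
```

Boundary conditions such as walls and exits, mentioned in the answer above, would enter as additional repulsive terms of the same form.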
Q3: How is it possible to make a digital twin of humans since human beings cannot be simulated?
I think we’re going towards that. The reason I say that is that in my group, beyond the activities I’ve mentioned, everything falls into understanding processes and behaviours, and those processes can be natural or industrial in nature, so we work on all of these. One example is that we look at humans and try to understand their behaviour in terms of health. I don’t know if I’m fully answering the question here, but it is possible in principle to develop a digital twin of a human, should we have, for example, measurements of that human in terms of their phenotypic data and related parameters. That way we can understand their health, and whether they are under stress or in need of particular help. This is being developed in collaboration with partners in the health sector: we look, for instance, at patients with respiratory problems and monitor them in their homes, instead of leaving them alone until they reach a very difficult situation and have to go to hospital, to A&E for instance, which would increase the patient workload in hospitals. Instead, AI comes in, under their agreement; you have to have a legal framework to develop a digital twin of a human. We’re not at the stage of fully developing it, but for me it is possible, provided you get full observations of that human: embedded sensing in place, but also non-intrusive sensing outside the body, such as cameras, sound sensing and smell sensing. All of these can create the digital twin of the human inside a virtual or smart space, so it’s not impossible.