Chapter 20: A Few Guiding Principles for Practical Applications of Machine Learning to Chemistry and Materials
Published: 15 Jul 2020
We describe five guiding principles for applying machine learning (ML) to problems in chemistry and materials science, using data from both experiments and simulations. The principles are: (1) use ML for interpolation, but take care with extrapolation; (2) ensure consistency between the sources of data and the targeted application; (3) correlation is not causation; (4) optimize information extraction when using ML; and (5) combine different methods, including experiments, theory, and computing, to provide a larger window of applications. These principles grew out of applications in which the authors have been actively involved, in both industrial and academic settings. Each principle is illustrated with specific examples from biology, chemistry, physics, engineering, or materials science that bring out its basic premise. The examples include Mendeleev's periodic table, estimation of interface adhesion in semiconductor materials, measurements in chemical analysis for cancer chemistry, singularities in evolutionary biology, and the development of faster quantum chemistry methods. We believe these perspectives highlight the potential fallacies of applying ML techniques broadly to all problems in the natural sciences and engineering without appropriately bounding accuracy and precision, especially in areas related to the chemical and materials sciences.