Chapter 4: Data Quality Assessment for In Silico Methods: A Survey of Approaches and Needs
Published: 28 Oct 2010
M. Nendza, T. Aldenberg, E. Benfenati, R. Benigni, M. Cronin, S. Escher, ... T. Vermeire, in In Silico Toxicology, ed. M. Cronin and J. Madden, The Royal Society of Chemistry, 2010, ch. 4, pp. 59-117.
As indicated in Chapter 3, a large number of potential sources of data are now available for modelling purposes. These range from historical literature references for a few compounds to highly curated databases of hundreds of thousands of compounds, available via the internet. Before including any data in an in silico model, the question of data quality must be addressed. Although it is difficult to define the quality of data in absolute terms, it is possible to assess the suitability of data for a given purpose. There are many reasons for variability within data, and the degree of error that is acceptable for one model may not be acceptable for another. For example, generating a global model intended to pre-screen large numbers of compounds does not require the same degree of accuracy as performing an individual risk assessment for a chemical of interest. In this chapter, sources of data variability and error will be discussed, and formal methods to score data quality, such as the Klimisch criteria, will be described. Examples of data quality issues will be given for specific endpoints relating to both environmental and human health effects. Mathematical approaches (Dempster-Shafer theory and Bayesian networks) that demonstrate how information on confidence in the data can be incorporated into in silico models will also be discussed.