Chapter 15: Application of Omics in Agricultural Sciences: Wheat Genome Wide Association Studies Causality Analysis
Published: 23 Mar 2021
This chapter reviews applications of data science methodologies to the analysis of omics data in agricultural sciences. Because omics data are largely observational Big Data, the chapter emphasizes the validation of model-based inference and of causality. It also focuses on random forest (decision tree ensemble) modelling combined with the propensity score method, which adjusts samples to reduce bias due to confounding covariates. Special attention is paid to wheat genome-wide association study (GWAS) data sets obtained with Diversity Arrays Technology (DArT): 30 000 markers for 2000 landraces, with eight traits recorded for landraces cultivated in four environments. Random forest models are applied for trait prediction and for propensity score classification. Bootstrap simulation of the data sets gave an estimated relative root mean square error (RMSE) of 13% for predictions on data sets not used for training. The chapter also evaluates the causal effects of single nucleotide polymorphism (SNP) markers on grain yield as a function of environment, estimating the causality of each SNP as an average treatment effect (ATE). Upregulating and downregulating SNPs are identified by positive and negative ATE values, respectively, which lie in the range [−0.2, 0.2] depending on environmental conditions. The strength of each SNP's effect on a trait is estimated as an odds ratio (OR), computed from the relative odds in the subsets of samples with and without the corresponding SNP. The largest OR values lie in the range [3, 5] for upregulation and [0.2, 0.4] for downregulation. SNP causal effects depend strongly on the environment.
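The quantities named above (a propensity-score ATE, an OR from SNP presence/absence subsets, and a bootstrap relative RMSE) can be illustrated with a small standard-library Python sketch. It is not the chapter's exact procedure. It assumes the propensity scores have already been estimated (for example by a random forest classifier predicting SNP presence from covariates) and uses one common propensity-score estimator, normalized inverse probability weighting, for the ATE. It also assumes the OR "event" is a trait value above the median and defines relative RMSE as out-of-bag RMSE divided by the mean observed value. The function names `above_median`, `ipw_ate`, `odds_ratio`, `fit_mean` and `bootstrap_relative_rmse` are hypothetical, and `fit_mean` is only a placeholder where a random forest regressor would be plugged in.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]


def above_median(values: Sequence[float]) -> List[int]:
    """Binarize a trait: 1 if strictly above the median (an assumed cut-off)."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]


def ipw_ate(treated: Sequence[int], outcome: Sequence[float],
            propensity: Sequence[float], clip: float = 0.01) -> float:
    """Normalized (Hajek) inverse-probability-weighted ATE of a binary exposure.

    treated[i] is 1 if sample i carries the SNP; propensity[i] is the estimated
    probability that sample i carries it, given its covariates.
    """
    num1 = den1 = num0 = den0 = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        e = min(max(e, clip), 1.0 - clip)  # trim extreme scores to bound the weights
        if t == 1:
            num1 += y / e
            den1 += 1.0 / e
        else:
            num0 += y / (1.0 - e)
            den0 += 1.0 / (1.0 - e)
    # Weighted mean outcome with the SNP minus weighted mean outcome without it
    return num1 / den1 - num0 / den0


def odds_ratio(snp: Sequence[int], event: Sequence[int],
               correction: float = 0.5) -> float:
    """Odds of the event when the SNP is present divided by odds when absent."""
    a = sum(1 for s, h in zip(snp, event) if s == 1 and h == 1)
    b = sum(1 for s, h in zip(snp, event) if s == 1 and h == 0)
    c = sum(1 for s, h in zip(snp, event) if s == 0 and h == 1)
    d = sum(1 for s, h in zip(snp, event) if s == 0 and h == 0)
    if 0 in (a, b, c, d):  # Haldane-Anscombe correction for empty cells
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Placeholder model predicting the training mean (stands in for a random forest)."""
    m = sum(ys) / len(ys)
    return lambda x: m


def bootstrap_relative_rmse(xs: Sequence[float], ys: Sequence[float],
                            fit: Callable[[Sequence[float], Sequence[float]], Predictor],
                            n_boot: int = 200, seed: int = 0) -> float:
    """Mean over bootstrap rounds of out-of-bag RMSE / mean observed value."""
    rng = random.Random(seed)
    n = len(ys)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        oob = sorted(set(range(n)) - set(idx))       # samples unseen in training
        if not oob:
            continue
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        sq = [(model(xs[i]) - ys[i]) ** 2 for i in oob]
        rmse = math.sqrt(sum(sq) / len(sq))
        scores.append(rmse / abs(statistics.fmean([ys[i] for i in oob])))
    if not scores:
        raise ValueError("no out-of-bag samples were drawn")
    return statistics.fmean(scores)
```

Under these assumptions, an OR above 1 (for example `odds_ratio(snp, above_median(grain_yield))`) would correspond to upregulation and an OR below 1 to downregulation, in line with the ranges reported above. Normalized weights are used in `ipw_ate` because they tend to be more stable than raw Horvitz-Thompson weights when some propensity scores are close to 0 or 1.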