Comparison of missing data infilling mechanisms for recovering a real-world single station streamflow observation

Baddoo, T. D.; Li, Z.; Odai, S. N.; Boni, K. R. C.; Nooni, I. K.; Andam-Akorful, S. A.

dc.contributor.author	Baddoo, T. D.
dc.contributor.author	Li, Z.
dc.contributor.author	Odai, S. N.
dc.contributor.author	Boni, K. R. C.
dc.contributor.author	Nooni, I. K.
dc.contributor.author	Andam-Akorful, S. A.
dc.date.accessioned	2023-01-16T16:01:32Z
dc.date.available	2023-01-16T16:01:32Z
dc.date.issued	2021
dc.identifier.other	10.3390
dc.identifier.uri	https://pubmed.ncbi.nlm.nih.gov/34444127/
dc.identifier.uri	http://atuspace.atu.edu.gh:8080/handle/123456789/2353
dc.description.abstract	Reconstructing missing streamflow data can be challenging when additional data are not available, and studies that impute real-world datasets to investigate how to ascertain the accuracy of imputation algorithms for such datasets are lacking. This study investigated how complex a missing data reconstruction scheme needs to be to obtain relevant results for a real-world single station streamflow observation and so facilitate its further use. The investigation applied different missing data mechanisms, ranging from univariate algorithms to multiple imputation methods designed for multivariate data, with time taken as an explicit variable. The accuracy of these schemes was assessed using the total error measurement (TEM) and a localized error measurement (LEM) recommended in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but those that provide the best results are usually time- and computationally intensive. Also, multiple imputation algorithms that consider the surrounding observed values and/or can capture the characteristics of the data provide results similar to those of the univariate missing data algorithms and, in some cases, perform better without the added time and computational costs when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of 'missingness' occur. Finally, proper handling of missing values in real-world hydroclimatic datasets depends on extensive study of the particular dataset to be imputed.	en_US
dc.language.iso	en_US	en_US
dc.publisher	International Journal of Environmental Research and Public Health	en_US
dc.relation.ispartofseries	vol.;18
dc.subject	missing data	en_US
dc.subject	univariate imputation	en_US
dc.subject	multiple imputation	en_US
dc.subject	SPSS	en_US
dc.subject	R	en_US
dc.subject	China	en_US
dc.title	Comparison of missing data infilling mechanisms for recovering a real-world single station streamflow observation	en_US
dc.type	Article	en_US
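
The abstract contrasts an error measured over all imputed values with one measured over a specific gap. The sketch below illustrates that general idea only. It is not the paper's method: the record does not define TEM or LEM, so root-mean-square error is used as a stand-in metric, linear interpolation stands in for a univariate imputation algorithm, and the flow values, gap positions, and all function and variable names are hypothetical.

```python
import math


def linear_interpolate(series):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; gaps at either end take the nearest observed value."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def rmse(truth, estimate, indices):
    """Root-mean-square error restricted to the given positions (assumed metric)."""
    return math.sqrt(
        sum((truth[i] - estimate[i]) ** 2 for i in indices) / len(indices)
    )


# Hypothetical complete daily flows; values are removed artificially so the
# imputed values can be compared against the known ones.
truth = [10.0, 12.0, 15.0, 30.0, 22.0, 18.0, 16.0, 15.0, 14.0, 13.0]
gap_a = [2, 3, 4]  # a multi-day gap spanning the flow peak
gap_b = [7]        # an isolated missing day on the recession limb

masked = [None if i in gap_a + gap_b else v for i, v in enumerate(truth)]
filled = linear_interpolate(masked)

total_error = rmse(truth, filled, gap_a + gap_b)  # over every imputed value
local_error_a = rmse(truth, filled, gap_a)        # over gap A only
local_error_b = rmse(truth, filled, gap_b)        # over gap B only

print(f"total: {total_error:.2f}, gap A: {local_error_a:.2f}, gap B: {local_error_b:.2f}")
```

In this toy series the error over the peak gap is larger than the error over all imputed values, because the well-recovered isolated day dilutes the total. That is the kind of effect a gap-specific measure is meant to expose.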