Using Bayesian Networks for Cleansing Trauma Data


Medical data is distinguished by its large volume, heterogeneity, and complexity, so cleansing it requires the costly, active participation of medical domain experts. In this paper we present a new data cleansing approach that uses Bayesian networks to correct errant attribute values. Bayesian networks capture both expert domain knowledge and the uncertainty inherent in the cleansing process, neither of which existing cleansing tools model. The approach improves correction accuracy by using contextual information when correcting an errant value. It operates in conjunction with models of the possible error types that we identified during our cleansing activities, and we have applied it to correct instances of these error types. We evaluate the approach and present the resulting error corrections.
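
To illustrate the general idea of correcting an errant value with a Bayesian network, the following is a minimal sketch and not the model used in this work. It assumes a three-node network: a hidden true value, a recorded value linked to it by a simple error model, and one contextual attribute. The attribute names (injury type, mechanism), the probabilities, and the error rate are all hypothetical and chosen only for illustration.

```python
# Minimal sketch: posterior over the true value of an attribute, given the
# recorded (possibly errant) value and one contextual attribute.
# All names and numbers below are hypothetical, not taken from real trauma data.

# Assumed prior over the true injury type.
PRIOR = {"blunt": 0.8, "penetrating": 0.2}

# Assumed context model: P(mechanism | true injury type).
CONTEXT = {
    "blunt": {"fall": 0.6, "vehicle": 0.39, "gunshot": 0.01},
    "penetrating": {"fall": 0.05, "vehicle": 0.05, "gunshot": 0.9},
}

# Assumed probability that a recorded value differs from the true value.
ERROR_RATE = 0.1


def error_model(recorded: str, true: str) -> float:
    """P(recorded | true): correct with prob 1 - ERROR_RATE, else uniform error."""
    if recorded == true:
        return 1.0 - ERROR_RATE
    return ERROR_RATE / (len(PRIOR) - 1)


def posterior_true_value(recorded: str, mechanism: str) -> dict:
    """P(true | recorded, mechanism) by enumeration over the true value."""
    scores = {
        t: PRIOR[t] * CONTEXT[t][mechanism] * error_model(recorded, t)
        for t in PRIOR
    }
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


def suggest_correction(recorded: str, mechanism: str) -> str:
    """Return the most probable true value; equals `recorded` if no change is suggested."""
    post = posterior_true_value(recorded, mechanism)
    return max(post, key=post.get)
```

Under these made-up numbers, a record with injury type "blunt" and mechanism "gunshot" yields "penetrating" as the suggested value, while the same recorded value with mechanism "fall" is left unchanged. The posterior also quantifies how uncertain each suggestion is, which a rule-based correction would not provide.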