Two papers on truth discovery from Web data accepted at CIKM 2015

Two research papers by CDIT researchers have recently been accepted at the 24th ACM Conference on Information and Knowledge Management (CIKM 2015), to be held in Melbourne, Australia, in October. Both papers address the challenging problem of discovering the truth from conflicting, noisy, and massive Web data.

Approximate Truth Discovery via Problem Scale Reduction. Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Xue Li, Xiaofei Xu, and Lina Yao

Many real-world applications rely on multiple data sources to provide information about items of interest. Because of noise and uncertainty in the data, the information that different sources provide about a given item may conflict. To make better decisions based on these data, it is important to identify the trustworthy information by resolving these conflicts, i.e., the truth discovery problem. Current solutions to this problem predict the veracity of each value jointly with the reliability of each data source for each data item. As a result, the efficiency of truth discovery is strictly constrained by the problem scale, which in turn limits the applicability of truth discovery algorithms to large-scale problems. To address this challenge, we propose an approximate truth discovery approach, which divides sources and values into groups according to user-specified approximation criteria to reduce the problem scale. The groups are then used for efficient inter-value influence computation to improve accuracy. Our approach is applicable to most existing truth discovery algorithms. Experiments on real-world datasets show that the approach reduces the computational time of existing algorithms while achieving similar or even better accuracy, and experiments on large synthetic datasets demonstrate its scalability.
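To make the scale-reduction idea a little more concrete, here is a minimal Python sketch, not the authors' algorithm: a generic iterative weighted-voting truth discovery loop that alternately scores candidate values by the reliability of the sources supporting them and re-estimates each source's reliability from its agreement with the current truths. An optional step collapses near-identical values into one group before the loop runs. The rounding rule in the hypothetical `group_value` helper is an assumed stand-in for a user-specified approximation criterion; the sketch groups only values (not sources) and does not compute inter-value influence as the paper does.

```python
from collections import defaultdict


def group_value(value, precision=0):
    """Hypothetical approximation criterion (an assumption, not from the paper):
    numeric values that round to the same number become one candidate."""
    return round(value, precision)


def truth_discovery(claims, iterations=10, precision=None):
    """Generic iterative weighted voting (illustrative sketch only).

    claims: list of (source, item, value) triples.
    precision: if given, values are grouped with group_value before the loop,
    which shrinks the number of distinct candidates per item.
    Returns (truths, reliability): item -> chosen value, source -> reliability.
    """
    if precision is not None:
        # Scale reduction: near-identical values now share (and pool) support.
        claims = [(s, i, group_value(v, precision)) for s, i, v in claims]
    sources = {s for s, _, _ in claims}
    reliability = {s: 0.8 for s in sources}  # assumed uniform starting reliability
    truths = {}
    for _ in range(iterations):
        # Step 1: score each candidate value by the summed reliability of its supporters.
        scores = defaultdict(lambda: defaultdict(float))
        for s, i, v in claims:
            scores[i][v] += reliability[s]
        # Pick the highest-scoring value per item (ties go to the first value seen).
        truths = {i: max(vals, key=vals.get) for i, vals in scores.items()}
        # Step 2: reliability = fraction of a source's claims matching current truths.
        hits, totals = defaultdict(int), defaultdict(int)
        for s, i, v in claims:
            totals[s] += 1
            hits[s] += truths[i] == v
        reliability = {s: hits[s] / totals[s] for s in sources}
    return truths, reliability


if __name__ == "__main__":
    claims = [
        ("A", "pop_X", 100.2), ("B", "pop_X", 99.8), ("C", "pop_X", 250.0),
        ("A", "pop_Y", 10.0), ("C", "pop_Y", 30.0), ("B", "pop_Y", 10.0),
    ]
    print(truth_discovery(claims))                # no grouping
    print(truth_discovery(claims, precision=-1))  # group values to the nearest 10
```

With `precision=-1`, the claims 100.2 and 99.8 both round to 100.0 and reinforce each other, so the grouped run picks 100.0 for `pop_X` and assigns source C a reliability of 0.0. Without grouping, the two nearby values compete as separate candidates, and source B is penalized for disagreeing with A by only 0.4.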

An Integrated Bayesian Approach for Effective Multi-Truth Discovery. Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Lina Yao, Xiaofei Xu, and Xue Li

Truth-finding is a fundamental technique for corroborating reports from multiple sources in both data integration and collective intelligence applications. Traditional truth-finding methods assume a single true value for each data item and therefore cannot deal with multiple true values (i.e., the multi-truth-finding problem). So far, existing approaches handle the multi-truth-finding problem in the same way as single-truth-finding problems. However, the multi-truth-finding problem has unique features, such as the involvement of sets of values in claims, different implications of inter-value mutual exclusion, and larger source profiles. Considering these features could provide new opportunities for obtaining more accurate truth-finding results. Based on this insight, we propose an integrated Bayesian approach to the multi-truth-finding problem that takes these features into account. To improve truth-finding efficiency, we reformulate the multi-truth-finding problem model based on the mappings between sources and (sets of) values. New mutual exclusion relations are defined to reflect the possible co-existence of multiple true values. A finer-grained copy detection method is also proposed to deal with sources with large profiles. Experimental results on three real-world datasets show the effectiveness of our approach.
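As a rough illustration of why multi-truth discovery differs from the single-truth case, the following Python sketch scores each candidate value of an item independently with a naive Bayes posterior. It is a simplified assumption-based example, not the paper's integrated Bayesian model. A source that covers an item but omits a value counts as evidence against that value, yet several values of the same item can all end up likely true. The per-source `sensitivity` and `false_pos` parameters are hypothetical and fixed here rather than learned, and the sketch models neither mutual exclusion relations nor copy detection.

```python
import math


def multi_truth_posteriors(claims, sensitivity, false_pos, prior=0.5):
    """Naive Bayes scoring of candidate values (illustrative sketch only).

    claims: source -> item -> set of values the source claims for that item.
    sensitivity[s]: assumed P(s claims v | v is true), strictly in (0, 1).
    false_pos[s]: assumed P(s claims v | v is false), strictly in (0, 1).
    Returns (item, value) -> posterior probability that the value is true.
    """
    posteriors = {}
    items = {i for per_src in claims.values() for i in per_src}
    for item in items:
        # Only sources that report on this item contribute evidence about it.
        covering = {s: vals[item] for s, vals in claims.items() if item in vals}
        candidates = set().union(*covering.values())
        for v in candidates:
            log_true = math.log(prior)
            log_false = math.log(1 - prior)
            for s, vals in covering.items():
                if v in vals:  # positive evidence: the source lists v
                    log_true += math.log(sensitivity[s])
                    log_false += math.log(false_pos[s])
                else:  # negative evidence: the source covers the item but omits v
                    log_true += math.log(1 - sensitivity[s])
                    log_false += math.log(1 - false_pos[s])
            # Each value is a separate true/false hypothesis, so several values
            # of the same item can all receive a high posterior.
            posteriors[(item, v)] = 1 / (1 + math.exp(log_false - log_true))
    return posteriors


if __name__ == "__main__":
    claims = {
        "S1": {"book1": {"Alice", "Bob"}},
        "S2": {"book1": {"Alice", "Bob"}},
        "S3": {"book1": {"Alice", "Carol"}},
    }
    sens = {s: 0.9 for s in claims}
    fp = {s: 0.1 for s in claims}
    post = multi_truth_posteriors(claims, sens, fp)
    print(sorted(k for k, p in post.items() if p > 0.5))
```

In this toy example, Alice (claimed by all three sources) gets a posterior above 0.99, Bob (two of three) gets 0.9, and Carol (one of three) gets 0.1, so thresholding at 0.5 keeps both Alice and Bob as co-existing true values. A single-truth method would instead be forced to pick just one author.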

This entry was posted in Publications, Research, Web Technologies.