CDIT researchers have had another successful year at the ACM CIKM conference (http://cikm2016.cs.iupui.edu/), to be held in Indianapolis, USA in October.
With the acceptance rate for full papers as low as 17%, CDIT secured 3 full research papers and 2 short papers. Well done and congratulations!
The three full research papers are:
1. Efficient Orthogonal Non-negative Matrix Factorization over Stiefel Manifold (Wei Emma Zhang, Mingkui Tan, Quan Z. Sheng, and Qinfeng Shi)
Orthogonal Non-negative Matrix Factorization (ONMF) approximates the data matrix X by the product of two lower-dimensional factor matrices, X ≈ UV^T, with one of them constrained to be orthogonal. ONMF works well for clustering, but existing methods do not preserve orthogonality well and do not converge quickly. In this paper, we propose to preserve the orthogonality of U in the setting of the Stiefel manifold and develop a nonlinear Riemannian Conjugate Gradient (NRCG) method to search on the Stiefel manifold with a Barzilai-Borwein (BB) step size. We update V using a closed-form solution with a non-negativity constraint. Our approach allows mixed signs on the orthogonal factor matrix U, which is a variant of Semi-NMF being preferable for clustering. Extensive experiments on both synthetic and real-world datasets show consistent superiority of our method over other approaches in terms of orthogonality preservation, convergence speed and clustering performance.
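As a rough illustration of the setting described in this abstract, the sketch below keeps U on the Stiefel manifold and updates V in closed form. It is not the authors' NRCG method: it uses plain Riemannian gradient descent with a fixed step size and a QR retraction instead of conjugate gradients with a BB step size. The function names, the step size and the iteration count are all assumptions made for the example. The closed form for V relies on U having orthonormal columns, in which case the non-negative least-squares solution is max(0, X^T U).

```python
import numpy as np

def stiefel_retract(Y):
    """Map a full-rank matrix back onto the Stiefel manifold via QR."""
    Q, R = np.linalg.qr(Y)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0          # avoid zeroing a column
    return Q * signs                  # fix column signs so the map is well defined

def onmf_sketch(X, k, steps=50, lr=1e-3, seed=0):
    """Hypothetical simplified solver for X ~ U V^T with U^T U = I and V >= 0."""
    rng = np.random.default_rng(seed)
    U = stiefel_retract(rng.standard_normal((X.shape[0], k)))
    V = np.maximum(0.0, X.T @ U)      # closed form, valid because U^T U = I
    for _ in range(steps):
        G = (U @ V.T - X) @ V         # Euclidean gradient of 0.5 * ||X - U V^T||_F^2
        UtG = U.T @ G
        rgrad = G - U @ (UtG + UtG.T) / 2   # projection onto the tangent space at U
        U = stiefel_retract(U - lr * rgrad)  # step, then retract to the manifold
        V = np.maximum(0.0, X.T @ U)
    return U, V

X = np.abs(np.random.default_rng(1).standard_normal((20, 30)))
U, V = onmf_sketch(X, k=3)
```

Note that U may contain entries of either sign, while V stays non-negative, which matches the Semi-NMF flavour mentioned in the abstract.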
2. Truth Discovery via Exploiting Implications from Multi-Source Data (Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, and Xiaofei Xu)
Data veracity is a grand challenge for various tasks on the Web. Since Web data sources are inherently unreliable and may provide conflicting information about the same real-world entities, truth discovery is emerging as a counter-measure for resolving the conflicts by discovering the truth, which conforms to reality, from the multi-source data. A major challenge related to truth discovery is that different data items may have varying numbers of true values (or multi-truth), which runs counter to the assumption of existing truth discovery methods that each data item should have exactly one true value. In this paper, we address this challenge by exploiting the implications from multi-source data. In particular, we exploit three types of implications, namely the implicit negative claims, the distribution of positive/negative claims, and the co-occurrence of values in sources’ claims, to facilitate multi-truth discovery. We propose a probabilistic approach with improvement measures that incorporate the three implications in all stages of the truth discovery process. In particular, incorporating the negative claims enables multi-truth discovery, considering the distribution of positive/negative claims relieves truth discovery from the impact of sources’ behavioral features in the specific datasets, and considering values’ co-occurrence relationship compensates for the information lost from evaluating each value in the same claims individually. Experimental results on three real-world datasets demonstrate the effectiveness of our approach.
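The idea of implicit negative claims can be illustrated with a small sketch. This is not the probabilistic approach proposed in the paper; it is a simplified weighted vote in which a source that covers a data item but omits a candidate value is treated as implicitly denying that value. The function name, the data layout, the initial trust of 0.8 and the acceptance rule are all assumptions made for the example.

```python
from collections import defaultdict

def multi_truth_vote(claims, rounds=10):
    """claims: {source: {item: set of values the source asserts}} (hypothetical layout)."""
    trust = {s: 0.8 for s in claims}            # assumed initial reliability
    candidates = defaultdict(set)               # every value claimed for each item
    for items in claims.values():
        for item, vals in items.items():
            candidates[item] |= vals
    truths = {}
    for _ in range(rounds):
        truths = {}
        for item, vals in candidates.items():
            covering = [s for s in claims if item in claims[s]]
            accepted = set()
            for v in vals:
                pos = sum(trust[s] for s in covering if v in claims[s][item])
                neg = sum(trust[s] for s in covering if v not in claims[s][item])
                if pos > neg:                   # positive votes outweigh implicit denials
                    accepted.add(v)
            truths[item] = accepted
        for s, items in claims.items():         # re-estimate each source's reliability
            agree = total = 0
            for item, vals in items.items():
                for v in candidates[item]:
                    total += 1
                    agree += (v in vals) == (v in truths[item])
            if total:
                trust[s] = agree / total
    return truths, trust

claims = {
    "A": {"book": {"x", "y"}},
    "B": {"book": {"x", "y"}},
    "C": {"book": {"x", "z"}},
}
truths, trust = multi_truth_vote(claims)
```

Because each value is accepted or rejected on its own, an item can end up with any number of true values, which is the multi-truth setting the abstract describes.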
3. Empowering Truth Discovery with Multi-Truth Prediction (Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, and Xiaofei Xu)
Truth discovery is the problem of detecting true values from the conflicting data provided by multiple sources on the same data items. Since sources’ reliability is unknown a priori, a truth discovery method usually estimates sources’ reliability along with the truth discovery process. A major limitation of existing truth discovery methods is that they commonly assume exactly one true value on each data item and therefore cannot deal with the more general case that a data item may have multiple true values (or multi-truth). Since the number of true values may vary from data item to data item, this requires truth discovery methods to be able to detect varying numbers of true values from the multi-source data. In this paper, we propose a ranking-based approach for multi-truth discovery, which addresses the above challenges by providing a general framework for enhancing existing truth discovery methods. In particular, we redeem the numbers of true values as an important clue for facilitating multi-truth discovery. We present the procedure and components of our approach, and propose three models, namely the byproduct model, the joint model, and the synthesis model, to implement our approach. We further propose two extensions to enhance our approach, by leveraging the implications of similar numerical values and values’ co-occurrence information in sources’ claims to improve the truth discovery accuracy. Experimental studies on real-world datasets demonstrate the effectiveness of our approach.
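One way to picture a ranking-based step that uses the number of true values as a clue is sketched below. It does not reproduce the byproduct, joint or synthesis models from the paper, whose details are not given here. It simply assumes that some existing method has already scored each candidate value, predicts the number of truths as a trust-weighted average of how many values each source claims, and keeps that many top-ranked values. All names and the prediction rule are hypothetical.

```python
def top_k_truths(value_scores, source_counts, source_trust):
    """Keep the k best-scored values, with k predicted from the sources' claim sizes.

    value_scores:  {value: score from any existing truth discovery method}
    source_counts: {source: number of values that source claims for this item}
    source_trust:  {source: reliability weight}
    """
    total = sum(source_trust[s] for s in source_counts)
    if total == 0:
        k = 1                                   # fall back to the single-truth assumption
    else:
        weighted = sum(source_trust[s] * n for s, n in source_counts.items())
        k = max(1, round(weighted / total))     # predicted number of true values
    ranked = sorted(value_scores, key=value_scores.get, reverse=True)
    return ranked[:k]

scores = {"x": 0.9, "y": 0.7, "z": 0.2}
counts = {"A": 2, "B": 2, "C": 3}
weights = {"A": 1.0, "B": 1.0, "C": 0.5}
selected = top_k_truths(scores, counts, weights)
```

Separating the scoring from the choice of k is what lets such a step sit on top of an existing single-truth method, in the spirit of the general framework the abstract mentions.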