Monday, November 20, 2017

Cross-Checking Multiple Data Sources Using Multiway Join in MapReduce

As data sources accumulate information and their size grows, it becomes increasingly difficult to maintain the correctness and validity of the datasets, so tools are needed to support this task. Fact checking usually involves many data sources that describe the same subject, but we do not know in advance which one holds the correct information, or which holds any information at all about the query of interest. A join among all or some of the data sources can guide the fact-checking process. However, when the join runs in a distributed computational environment such as MapReduce, it is not obvious how to distribute the records of the data sources efficiently to the reduce tasks so that any subset of the sources can be joined in a single MapReduce job. To this end, we propose an efficient approach that uses the multiway join to cross-check these data sources in a single round.
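To make the idea of a single-round cross-check concrete, here is a minimal in-memory sketch in Python. It is not the authors' record-distribution scheme, which the abstract does not describe. It only simulates the simplest case, an equi-join of several sources on one shared key, with one map, one shuffle, and one reduce step. The function names, the source names, and the sample records are all hypothetical, and the sketch assumes each source holds at most one record per key.

```python
from collections import defaultdict
import zlib


def map_phase(sources):
    """Tag every record with its source name and emit (key, (source, value))."""
    for name, records in sources.items():
        for key, value in records:
            yield key, (name, value)


def shuffle(pairs, num_reducers):
    """Route each pair to a reducer using a deterministic hash of the key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, tagged in pairs:
        idx = zlib.crc32(str(key).encode("utf-8")) % num_reducers
        buckets[idx][key].append(tagged)
    return buckets


def reduce_phase(buckets, all_sources):
    """For each key, report which sources have it and whether they agree."""
    report = {}
    for bucket in buckets:
        for key, tagged in bucket.items():
            # Assumption: at most one record per key per source.
            by_source = dict(tagged)
            report[key] = {
                "values": by_source,
                "missing": sorted(set(all_sources) - set(by_source)),
                "agree": len(set(by_source.values())) == 1,
            }
    return report


def cross_check(sources, num_reducers=4):
    """Run the simulated map, shuffle, and reduce steps once."""
    pairs = map_phase(sources)
    buckets = shuffle(pairs, num_reducers)
    return reduce_phase(buckets, sources.keys())


# Hypothetical sample data: three sources describing the same items.
sources = {
    "A": [("aspirin", "500mg"), ("ibuprofen", "200mg")],
    "B": [("aspirin", "500mg"), ("ibuprofen", "400mg")],
    "C": [("aspirin", "500mg")],
}
report = cross_check(sources)
for key in sorted(report):
    print(key, report[key])
```

Because every record with the same key reaches the same simulated reducer, each reducer can see at once which sources disagree and which have no record at all. Joins on several different attributes need a more careful distribution of records than this single-key hash, which is the problem the abstract refers to.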

from # All Medicine by Alexandros G. Sfakianakis via alkiviadis.1961 on Inoreader http://ift.tt/2zRNj4s
