Intelligent Wikipedia
Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can best be solved by bootstrapping: creating enough structured data to motivate the development of applications. However, automatic information extraction systems produce errors that users do not tolerate, whereas user contributions require incentives and management to control vandalism. We therefore propose systems that tightly integrate human and machine feedback: information extraction techniques generate candidate facts, and users correct errors, improving training data and enabling a virtuous cycle.
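As a rough illustration of this feedback loop, the following is a minimal toy sketch and not the extraction systems described in the publications below. It assumes a hypothetical `PatternExtractor` that matches cue phrases, a simulated reviewer standing in for a user, and simple vote counting in place of real retraining; all names and the sample corpus are invented for the example.

```python
from collections import defaultdict


class PatternExtractor:
    """Toy extractor (hypothetical): a cue phrase in a sentence signals a relation."""

    def __init__(self, patterns):
        self.patterns = dict(patterns)  # cue phrase -> relation name
        # Per-pattern feedback counts [confirmed, rejected]; each pattern
        # starts with one confirmation so its confidence begins at 1.0.
        self.votes = defaultdict(lambda: [1, 0])

    def confidence(self, pattern):
        confirmed, rejected = self.votes[pattern]
        return confirmed / (confirmed + rejected)

    def candidates(self, subject, sentence, threshold=0.5):
        """Return (pattern, fact) pairs from patterns at or above threshold."""
        found = []
        for pattern, relation in self.patterns.items():
            if pattern in sentence and self.confidence(pattern) >= threshold:
                value = sentence.split(pattern, 1)[1].strip(" .")
                found.append((pattern, (subject, relation, value)))
        return found

    def record(self, pattern, is_correct):
        """Fold one user judgment back into the pattern's feedback counts."""
        self.votes[pattern][0 if is_correct else 1] += 1


def feedback_round(extractor, corpus, reviewer):
    """Extract candidates, ask the reviewer about each, keep confirmed facts."""
    accepted = []
    for subject, sentence in corpus:
        for pattern, fact in extractor.candidates(subject, sentence):
            is_correct = reviewer(fact)
            extractor.record(pattern, is_correct)
            if is_correct:
                accepted.append(fact)
    return accepted


def year_reviewer(fact):
    """Simulated user (an assumption): accepts only four-digit values."""
    _, _, value = fact
    return value.isdigit() and len(value) == 4


# Invented sample sentences; "was born in" is deliberately ambiguous.
corpus = [
    ("Town A", "Town A was founded in 1900."),
    ("Person B", "Person B was born in Town A."),
    ("Person C", "Person C was born in Town A."),
    ("Person D", "Person D was born in 1950."),
]
extractor = PatternExtractor(
    {"was founded in": "founded", "was born in": "birth_year"}
)
accepted = feedback_round(extractor, corpus, year_reviewer)
```

In this run the reviewer rejects the ambiguous "was born in" pattern twice, so its confidence falls below the threshold and it stops proposing candidates, while the confirmed "was founded in" pattern keeps its standing. A real system would feed the corrections into retraining the extractor rather than adjusting a counter.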
Publications
Temporal Information Extraction. Xiao Ling and Daniel S. Weld. AAAI Conference on Artificial Intelligence, 2010. Full Paper (PDF) (Dataset)
Learning 5000 Relational Extractors. Raphael Hoffmann and Daniel S. Weld. Annual Meeting of the Association for Computational Linguistics, 2010. Full Paper (PDF)
Open Information Extraction using Wikipedia. Fei Wu and Daniel S. Weld. Annual Meeting of the Association for Computational Linguistics, 2010. Full Paper (PDF) (Program Output) (Dataset) (Dataset) (Code)
Machine Reading at the University of Washington. Hoifung Poon, Janara Christensen, Pedro Domingos, Oren Etzioni, Raphael Hoffmann, Chloé Kiddon, Thomas Lin, Xiao Ling, Mausam, Alan Ritter, Stefan Schoenmackers, Stephen Soderland, Daniel S. Weld and Fei Wu. Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010. Workshop Paper (PDF)
Amplifying Community Content Creation with Mixed Initiative Information Extraction. Raphael Hoffmann, Saleema Amershi, Kayur Patel, Fei Wu, James Fogarty and Daniel S. Weld. Conference on Human Factors in Computing Systems, 2009. Full Paper (PDF)
Information Extraction from Wikipedia: Moving Down the Long Tail. Fei Wu, Raphael Hoffmann and Daniel S. Weld. Knowledge Discovery and Data Mining, 2008. Full Paper (PDF)
Automatically Refining the Wikipedia Infobox Ontology. Fei Wu and Daniel S. Weld. International World Wide Web Conference, 2008. Full Paper (PDF)
Autonomously Semantifying Wikipedia. Fei Wu and Daniel S. Weld. ACM Conference on Information and Knowledge Management, 2007. Full Paper (PDF)
Downloads
Code: WOE: Open Information Extraction using Wikipedia (requires adaptation to integrate with the latest Stanford CoreNLP)
Data:
Silver data for TAC KBP
Dataset:
Schema-mapping-testset-KOG
Pseudo-manually created benchmark for subsumption detection in KOG
The benchmark for WOE. It contains three datasets, each created by randomly selecting 300 sentences from one of the following corpora: Wikipedia, WSJ, and the Web.
50 sampled sentences from Wikipedia, which describe 5 target relations
500 sentences with labeled triples
Program Output:
Extraction patterns created by WOE
Wikipedia Infobox Ontology created by KOG
