Open Information Extraction

How can a computer accumulate a massive body of knowledge? What will Web search engines look like in ten years?


To address the questions above, the Open IE project has been developing a Web-scale information extraction system that reads arbitrary text from any domain on the Web, extracts meaningful information, and stores it in a unified knowledge base for efficient querying. In contrast to traditional information extraction, the Open Information Extraction paradigm attempts to overcome the knowledge acquisition bottleneck by extracting a large number of relations at once.

Demo: TextRunner extracted over 500 million assertions from 100 million Web pages.
Software: ReVerb Open Information Extraction Software and additional information.
Data: Horn-clause inference rules learned by the Sherlock system.
Demo: Selectional Preferences from Web Text compute admissible argument values for a relation.
Data: 10,000 Functional Relations learned from Web Text predict the functionality of a phrase.

Publications

Open Domain Event Extraction from Twitter
Alan Ritter, Mausam, Oren Etzioni and Sam Clark
Knowledge Discovery and Data Mining, 2012. Full Paper (PDF)
No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
Thomas Lin, Mausam and Oren Etzioni
Conference on Empirical Methods in Natural Language Processing, 2012. Full Paper (PDF)
Open Language Learning for Information Extraction
Mausam, Michael D Schmitz, Robert E. Bart, Stephen Soderland and Oren Etzioni
Conference on Empirical Methods in Natural Language Processing, 2012. Full Paper (PDF)
Entity Linking at Web Scale
Thomas Lin, Mausam and Oren Etzioni
Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, 2012. Workshop Paper (PDF)
Rel-grams: A Probabilistic Model of Relations in Text
Niranjan Balasubramanian, Stephen Soderland and Mausam
Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, 2012. Workshop Paper (PDF)
Open Information Extraction: the Second Generation
Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland and Mausam
International Joint Conference on Artificial Intelligence, 2011. Full Paper (PDF)
Identifying Relations for Open Information Extraction
Anthony Fader, Stephen Soderland and Oren Etzioni
Conference on Empirical Methods in Natural Language Processing, 2011. Full Paper (PDF) (Data)
Named Entity Recognition in Tweets: An Experimental Study
Alan Ritter, Sam Clark, Mausam and Oren Etzioni
Conference on Empirical Methods in Natural Language Processing, 2011. Full Paper (PDF)
Commonsense from the Web: Relation Properties
Thomas Lin, Mausam and Oren Etzioni
AAAI Fall Symposium Series, 2010. AAAI Fall Symposia Paper (PDF)
Identifying Functional Relations in Web Text
Thomas Lin, Mausam and Oren Etzioni
Conference on Empirical Methods in Natural Language Processing, 2010. Full Paper (PDF)
Learning First-Order Horn Clauses from Web Text
Stefan Schoenmackers, Jesse Davis, Oren Etzioni and Daniel S. Weld
Conference on Empirical Methods in Natural Language Processing, 2010. Full Paper (PDF)
A Latent Dirichlet Allocation method for Selectional Preferences
Alan Ritter, Mausam and Oren Etzioni
Annual Meeting of the Association for Computational Linguistics, 2010. Full Paper (PDF)
Extracting Sequences from the Web
Anthony Fader, Stephen Soderland and Oren Etzioni
Annual Meeting of the Association for Computational Linguistics, 2010. Short Paper (PDF)
Open Information Extraction using Wikipedia
Fei Wu and Daniel S. Weld
Annual Meeting of the Association for Computational Linguistics, 2010. Full Paper (PDF) (Program Output) (Dataset) (Dataset) (Code)
Unsupervised Ontology Induction from Text
Hoifung Poon and Pedro Domingos
Annual Meeting of the Association for Computational Linguistics, 2010. Full Paper (PDF)
Machine Reading at the University of Washington
Hoifung Poon, Janara Christensen, Pedro Domingos, Oren Etzioni, Raphael Hoffmann, Chloé Kiddon, Thomas Lin, Xiao Ling, Mausam, Alan Ritter, Stefan Schoenmackers, Stephen Soderland, Daniel S. Weld and Fei Wu
Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010. Workshop Paper (PDF)
Semantic Role Labeling for Open Information Extraction
Janara Christensen, Mausam, Stephen Soderland and Oren Etzioni
Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010. Workshop Paper (PDF)
Analysis of a Probabilistic Model of Redundancy in Unsupervised Information Extraction
Doug Downey, Oren Etzioni and Stephen Soderland
Artificial Intelligence, 2010. Journal Article (PDF)
Identifying Interesting Assertions from the Web
Thomas Lin, Oren Etzioni and James Fogarty
ACM Conference on Information and Knowledge Management, 2009. Full Paper (PDF)
Unsupervised Semantic Parsing
Hoifung Poon and Pedro Domingos
Conference on Empirical Methods in Natural Language Processing, 2009. Full Paper (PDF)
    Best Paper Award
What Is This, Anyway: Automatic Hypernym Discovery
Alan Ritter, Stephen Soderland and Oren Etzioni
AAAI Spring Symposium Series, 2009. Symposia Paper (PDF)
Unsupervised Methods for Determining Object and Relation Synonyms on the Web
Alexander Yates and Oren Etzioni
Journal of Artificial Intelligence Research, 2009. Journal Article (PDF)
Open Information Extraction from the Web
Oren Etzioni, Michele Banko, Stephen Soderland and Daniel S. Weld
Communications of the ACM, 2008. Journal Article (PDF)
It's a Contradiction -- No, It's Not: A Case Study using Functional Relations
Alan Ritter, Doug Downey, Stephen Soderland and Oren Etzioni
Conference on Empirical Methods in Natural Language Processing, 2008. Full Paper (PDF)
Scaling Textual Inference to the Web
Stefan Schoenmackers, Oren Etzioni and Daniel S. Weld
Conference on Empirical Methods in Natural Language Processing, 2008. Full Paper (PDF)
Information Extraction from Wikipedia: Moving Down the Long Tail
Fei Wu, Raphael Hoffmann and Daniel S. Weld
Knowledge Discovery and Data Mining, 2008. Full Paper (PDF)
The Tradeoffs Between Open and Traditional Relation Extraction
Michele Banko and Oren Etzioni
Annual Meeting of the Association for Computational Linguistics, 2008. Full Paper (PDF)
Strategies for Lifelong Knowledge Extraction from the Web
Michele Banko and Oren Etzioni
International Conference on Knowledge Capture, 2007. Full Paper (PDF)
    Best Student Paper Award
Sparse Information Extraction: Unsupervised Language Models to the Rescue
Doug Downey, Stefan Schoenmackers and Oren Etzioni
Annual Meeting of the Association for Computational Linguistics, 2007. Full Paper (PDF)
Unsupervised Resolution of Objects and Relations on the Web
Alexander Yates and Oren Etzioni
Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2007. Full Paper (PDF)
Machine Reading
Oren Etzioni, Michele Banko and Michael J Cafarella
AAAI Spring Symposium Series, 2007. Symposia Paper (PDF)
Open Information Extraction from the Web
Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni
International Joint Conference on Artificial Intelligence, 2007. Full Paper (PDF)
Machine Reading
Oren Etzioni, Michele Banko and Michael J Cafarella
AAAI Conference on Artificial Intelligence, 2006. Full Paper (PDF)
A Probabilistic Model of Redundancy in Information Extraction
Doug Downey, Oren Etzioni and Stephen Soderland
International Joint Conference on Artificial Intelligence, 2005. Full Paper (PDF)
    Distinguished Paper Award