java.lang.Object
edu.washington.cs.knowitall.extractor.Extractor<java.lang.String,java.lang.String>
edu.washington.cs.knowitall.extractor.SentenceExtractor
edu.washington.cs.knowitall.extractor.HtmlSentenceExtractor
public class HtmlSentenceExtractor extends SentenceExtractor
An Extractor class for extracting NpChunkedSentence objects from a String containing HTML. It is backed by an OpenNLP SentenceDetector object and uses the code in HtmlUtils to extract plain text from HTML.
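The two stages described above (HTML to plain text, then sentence detection) can be illustrated with a minimal, library-free sketch. It is not the class's implementation: the regex-based `htmlToText` is a crude stand-in for HtmlUtils, the JDK's `java.text.BreakIterator` stands in for the OpenNLP SentenceDetector, and the class name `HtmlSentenceSketch` and its helper names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HtmlSentenceSketch {

    // Assumption: a crude stand-in for HtmlUtils. Replaces every tag with a
    // space and collapses whitespace; it does not decode HTML entities.
    static String htmlToText(String html) {
        return html.replaceAll("<[^>]*>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    // Assumption: the JDK BreakIterator stands in for the OpenNLP SentenceDetector.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Mirrors the shape of extractCandidates(String htmlBlock): HTML in, sentences out.
    static List<String> extractCandidates(String htmlBlock) {
        return splitSentences(htmlToText(htmlBlock));
    }

    public static void main(String[] args) {
        String html = "<p>Seattle is a city. It rains <b>often</b> there.</p>";
        for (String sentence : extractCandidates(html)) {
            System.out.println(sentence);
        }
    }
}
```

With the real class, the sentence boundaries come from the OpenNLP model, so results on difficult input (abbreviations, quotations) will likely differ from this stand-in.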
Constructor Summary | |
---|---|
HtmlSentenceExtractor() | Constructs a new HtmlSentenceExtractor object using the default OpenNLP SentenceDetector object, as returned by DefaultObjects.getDefaultSentenceDetector(). |
HtmlSentenceExtractor(opennlp.tools.sentdetect.SentenceDetector detector) | Constructs a new SentenceExtractor object using the given OpenNLP SentenceDetector object. |
Method Summary | |
---|---|
protected java.lang.Iterable<java.lang.String> | extractCandidates(java.lang.String htmlBlock): Runs the OpenNLP SentenceDetector object on the given String source, and returns an Iterable object over the detected sentences. |
static void | main(java.lang.String[] args): Extracts sentences from HTML passed via standard input, or through a file given as an argument to the program. |
Methods inherited from class edu.washington.cs.knowitall.extractor.SentenceExtractor |
---|
getSentenceDetector |
Methods inherited from class edu.washington.cs.knowitall.extractor.Extractor |
---|
addMapper, compose, extract, extract, getMappers |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public HtmlSentenceExtractor(opennlp.tools.sentdetect.SentenceDetector detector)
Constructs a new SentenceExtractor object using the given OpenNLP SentenceDetector object.
Parameters:
detector -

public HtmlSentenceExtractor() throws java.io.IOException
Constructs a new HtmlSentenceExtractor object using the default OpenNLP SentenceDetector object, as returned by DefaultObjects.getDefaultSentenceDetector().
Throws:
java.io.IOException
Method Detail |
---|
protected java.lang.Iterable<java.lang.String> extractCandidates(java.lang.String htmlBlock)
Description copied from class: SentenceExtractor
Runs the OpenNLP SentenceDetector object on the given String source, and returns an Iterable object over the detected sentences.
Overrides:
extractCandidates in class SentenceExtractor
Parameters:
htmlBlock - The source to extract from.
Returns:
an Iterable object over the candidate extractions.

public static void main(java.lang.String[] args) throws java.lang.Exception
Extracts sentences from HTML passed via standard input, or through a file given as an argument to the program. Applies the BracketsRemover mapper class, and filters sentences using the SentenceEndFilter, SentenceStartFilter, and SentenceLengthFilter mapper classes. Prints the resulting sentences to standard output, one sentence per line.
Parameters:
args -
Throws:
java.lang.Exception
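
The map-then-filter flow that main describes can be sketched without the library. Everything below is illustrative: the class name `SentencePipelineSketch`, the bracket-stripping regex, and the start, end, and length criteria are assumptions standing in for BracketsRemover, SentenceStartFilter, SentenceEndFilter, and SentenceLengthFilter, whose actual rules are not given on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class SentencePipelineSketch {

    // Assumed mapper: drops (..), [..] and {..} spans, then tidies whitespace.
    static final UnaryOperator<String> REMOVE_BRACKETS = s ->
        s.replaceAll("\\[[^\\]]*\\]|\\([^)]*\\)|\\{[^}]*\\}", " ")
         .replaceAll("\\s+", " ")
         .trim();

    // Assumed filters: the real classes may use different criteria.
    static final Predicate<String> STARTS_WITH_CAPITAL =
        s -> !s.isEmpty() && Character.isUpperCase(s.charAt(0));
    static final Predicate<String> ENDS_WITH_PUNCTUATION =
        s -> s.endsWith(".") || s.endsWith("?") || s.endsWith("!");
    static final Predicate<String> REASONABLE_LENGTH =
        s -> s.length() >= 5 && s.length() <= 500;

    // Applies the mapper to each sentence, then keeps only sentences
    // that pass all three filters.
    static List<String> run(List<String> sentences) {
        Predicate<String> keep =
            STARTS_WITH_CAPITAL.and(ENDS_WITH_PUNCTUATION).and(REASONABLE_LENGTH);
        List<String> out = new ArrayList<>();
        for (String raw : sentences) {
            String mapped = REMOVE_BRACKETS.apply(raw);
            if (keep.test(mapped)) {
                out.add(mapped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "The cat (a tabby) sat down.",
            "click here to subscribe",
            "Hi.");
        // One sentence per line, as main is documented to print.
        for (String sentence : run(input)) {
            System.out.println(sentence);
        }
    }
}
```

In the library itself, the inherited addMapper method is presumably how such mappers and filters are attached to an extractor; the sketch uses plain functions only to keep the example self-contained.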