Class MultiWordChunker2
java.lang.Object
org.languagetool.tagging.disambiguation.AbstractDisambiguator
org.languagetool.tagging.disambiguation.MultiWordChunker2
- All Implemented Interfaces:
Disambiguator
Multiword tagger-chunker.
Note: currently does not support:
- overlapping tagging (first matching multiword entry wins)
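The "first matching multiword entry wins" rule can be illustrated with a small standalone sketch. This is not the LanguageTool implementation: the class `FirstMatchWinsSketch`, its methods, the space-separated phrase keys, and the sample tags are all hypothetical, chosen only to show why an overlapping entry is never tagged once an earlier entry has claimed the tokens.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in, not part of LanguageTool. */
final class FirstMatchWinsSketch {

  /** True if all tokens of the phrase appear in order, starting at index start. */
  static boolean matchesAt(String[] input, int start, String[] phrase) {
    if (start + phrase.length > input.length) {
      return false;
    }
    for (int k = 0; k < phrase.length; k++) {
      if (!input[start + k].equals(phrase[k])) {
        return false;
      }
    }
    return true;
  }

  /**
   * Scans left to right. At each position the first entry (in map order) that
   * matches claims its tokens, and scanning resumes after the claimed span,
   * so entries that overlap an already claimed span are never considered.
   * Returns one tag per input token, or null where nothing matched.
   */
  static String[] chunk(String[] input, Map<String, String> entries) {
    String[] tags = new String[input.length];
    int i = 0;
    while (i < input.length) {
      int consumed = 0;
      for (Map.Entry<String, String> e : entries.entrySet()) {
        String[] phrase = e.getKey().split(" ");
        if (matchesAt(input, i, phrase)) {
          Arrays.fill(tags, i, i + phrase.length, e.getValue());
          consumed = phrase.length;
          break; // first matching entry wins
        }
      }
      i += Math.max(consumed, 1);
    }
    return tags;
  }

  public static void main(String[] args) {
    Map<String, String> entries = new LinkedHashMap<>();
    entries.put("New York", "CITY");   // illustrative entries only
    entries.put("York City", "PLACE"); // overlaps with the entry above
    String[] input = {"in", "New", "York", "City"};
    System.out.println(Arrays.toString(chunk(input, entries)));
  }
}
```

In this sketch "New York" is matched first, so the overlapping "York City" entry never gets a chance to tag "York" or "City".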
-
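Nested Class Summary
Nested Classes: MultiWordChunker2.MultiWordEntry
-
Field Summary
Fields: WRAP_TAG, filename, allowFirstCapitalized, removeOtherReadings, tagFormat, tokenToPosTagMap
-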
Constructor Summary
Constructors
MultiWordChunker2(String filename)
MultiWordChunker2(String filename, boolean allowFirstCapitalized)
-
Method Summary
Methods
disambiguate(AnalyzedSentence input)
- Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
private MultiWordChunker2.MultiWordEntry findMultiwordEntry(AnalyzedTokenReadings[] inputTokens, int startingPosition, List<MultiWordChunker2.MultiWordEntry> multiwordItems)
protected String formatPosTag(String posTag, int position, int multiwordLength)
- Override this method if you want to format the POS tag differently.
private boolean isMatching(AnalyzedTokenReadings[] inputTokens, int startingPosition, MultiWordChunker2.MultiWordEntry multiWordEntry)
private void lazyInit()
loadWords(InputStream stream)
protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)
protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)
private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading)
void setRemoveOtherReadings(boolean removeOtherReadings)
void setWrapTag(boolean wrapTag)
Methods inherited from class org.languagetool.tagging.disambiguation.AbstractDisambiguator
preDisambiguate
-
Field Details
-
WRAP_TAG
-
filename
-
allowFirstCapitalized
private final boolean allowFirstCapitalized -
removeOtherReadings
private boolean removeOtherReadings -
tagFormat
-
tokenToPosTagMap
-
-
Constructor Details
-
MultiWordChunker2
- Parameters:
filename
- text file with multiwords and tags
-
MultiWordChunker2
- Parameters:
filename
- text file with multiwords and tags
allowFirstCapitalized
- if set to true, the first word of the multiword can be capitalized
-
-
Method Details
-
setRemoveOtherReadings
public void setRemoveOtherReadings(boolean removeOtherReadings)
- Parameters:
removeOtherReadings
- If true and a multiword matches, other readings will be removed
-
setWrapTag
public void setWrapTag(boolean wrapTag)
- Parameters:
wrapTag
- If true, the tag will be wrapped with < and >
-
formatPosTag
Override this method if you want to format the POS tag differently.
- Parameters:
posTag
- POS tag for the multiword
position
- Position of the token in the multiword
- Returns:
formatted POS tag for the multiword
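As a rough illustration of the override hook, the sketch below uses a hypothetical stand-in class rather than the real MultiWordChunker2, so that it compiles without LanguageTool on the classpath. Only the `formatPosTag(String, int, int)` signature is taken from this page. The stand-in's default behaviour (returning the tag unchanged), the zero-based `position`, the helper `tagsFor`, and the "B-"/"I-" prefixes are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the chunker; not the LanguageTool class. */
class ChunkerStandIn {

  /** Assumed default for this sketch only: return the tag unchanged. */
  protected String formatPosTag(String posTag, int position, int multiwordLength) {
    return posTag;
  }

  /** Helper for the sketch: formats the tag for each token of one multiword. */
  final List<String> tagsFor(String posTag, int multiwordLength) {
    List<String> out = new ArrayList<>();
    for (int position = 0; position < multiwordLength; position++) {
      out.add(formatPosTag(posTag, position, multiwordLength));
    }
    return out;
  }
}

/** Subclass that formats tags differently: "B-" on the first token, "I-" on the rest. */
class BioStyleChunker extends ChunkerStandIn {
  @Override
  protected String formatPosTag(String posTag, int position, int multiwordLength) {
    // position is treated as zero-based here (an assumption of this sketch)
    return (position == 0 ? "B-" : "I-") + posTag;
  }
}
```

With the real class, the same idea would be a subclass of MultiWordChunker2 that overrides `formatPosTag`; how the library numbers `position` should be checked against its source before relying on it.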
-
lazyInit
private void lazyInit() -
disambiguate
Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
- Parameters:
input
- The tokens to be chunked.
- Returns:
AnalyzedSentence with additional markers.
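The start and end markers named in the description can be pictured with the following standalone sketch. It relies only on the wording above (an opening tag on the first token, a closing tag on the last) and does not reproduce the library's code: the class `StartEndMarkerSketch`, the method `mark`, and the choice to leave middle tokens unmarked are assumptions.

```java
/** Hypothetical illustration of start/end markers; not the LanguageTool implementation. */
final class StartEndMarkerSketch {

  /**
   * Returns one marker slot per token. The first token of the multiword gets
   * "<TAG>", the last gets "</TAG>", and every other slot stays null.
   */
  static String[] mark(int tokenCount, int start, int length, String tag) {
    if (length < 2 || start < 0 || start + length > tokenCount) {
      throw new IllegalArgumentException("multiword span must cover at least 2 tokens inside the sentence");
    }
    String[] markers = new String[tokenCount];
    markers[start] = "<" + tag + ">";
    markers[start + length - 1] = "</" + tag + ">";
    return markers;
  }

  public static void main(String[] args) {
    // Tokens: "Wait", ".", ".", ".", "now" with the three dots forming the multiword.
    String[] markers = mark(5, 1, 3, "ELLIPSIS");
    System.out.println(java.util.Arrays.toString(markers));
  }
}
```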
-
findMultiwordEntry
private MultiWordChunker2.MultiWordEntry findMultiwordEntry(AnalyzedTokenReadings[] inputTokens, int startingPosition, List<MultiWordChunker2.MultiWordEntry> multiwordItems) -
isMatching
private boolean isMatching(AnalyzedTokenReadings[] inputTokens, int startingPosition, MultiWordChunker2.MultiWordEntry multiWordEntry) -
matches
-
prepareNewReading
protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag) -
setAndAnnotate
private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading) -
loadWords
-