Package org.languagetool.tools
Class SynthDictionaryBuilder
java.lang.Object
org.languagetool.tools.DictionaryBuilder
org.languagetool.tools.SynthDictionaryBuilder
Create a Morfologik binary synthesizer dictionary from plain text data.
-
Field Summary
Fields
Modifier and Type: private static final String
Field: POLISH_IGNORE_REGEX
Description: It makes sense to remove all forms from the synthesizer dictionary where POS tags indicate "unknown form", "foreign word" etc., as they only take space.
-
Constructor Summary
Constructors
SynthDictionaryBuilder(File infoFile)
-
Method Summary
Modifier and Type  Method  Description
(package private) File
collectTags(File plainTextDictFile)
getIgnoreItems(File file)
private @Nullable Pattern getPosTagIgnoreRegex(File infoFile)
private File getTagFile(File tempFile)
static void
private File reverseLineContent(File plainTextDictFile, Set<String> itemsToBeIgnored, Pattern ignorePosRegex)
private void writePosTagsToFile(File plainTextDictFile, File tagFile)
Methods inherited from class org.languagetool.tools.DictionaryBuilder
addFreqData, buildDict, buildFSA, convertTabToSeparator, getOption, getOutputFilename, hasOption, isOptionTrue, readFreqList, setOutputFilename
-
Field Details
-
POLISH_IGNORE_REGEX
It makes sense to remove all forms from the synthesizer dictionary where the POS tags indicate "unknown form", "foreign word", etc., as they only take up space and will probably never be used.
-
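The filtering idea described for POLISH_IGNORE_REGEX can be sketched roughly as follows. This is only an illustration, not the class's actual code: the three-column, tab-separated line layout (form, lemma, POS tag), the class name PosTagIgnoreSketch, the method dropIgnoredForms and the example pattern are all assumptions made for this sketch, and the pattern is not the real value of the constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PosTagIgnoreSketch {

  // Hypothetical pattern for illustration only; NOT the value of POLISH_IGNORE_REGEX.
  static final Pattern IGNORE = Pattern.compile("unknown|foreign");

  /**
   * Returns the lines whose POS tag does not match the ignore pattern.
   * Assumes each line is "form<TAB>lemma<TAB>tag"; a null pattern disables filtering.
   */
  static List<String> dropIgnoredForms(List<String> lines, Pattern ignorePosRegex) {
    List<String> kept = new ArrayList<>();
    for (String line : lines) {
      String[] parts = line.split("\t");
      if (parts.length != 3) {
        continue; // skip lines that do not have the assumed three columns
      }
      String posTag = parts[2];
      if (ignorePosRegex != null && ignorePosRegex.matcher(posTag).find()) {
        continue; // tag marks a form that is not worth storing
      }
      kept.add(line);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "houses\thouse\tNNS",
        "xyz\txyz\tunknown",
        "bonjour\tbonjour\tforeign:word");
    // Only the first line survives with the example pattern.
    for (String line : dropIgnoredForms(lines, IGNORE)) {
      System.out.println(line);
    }
  }
}
```

Using find() rather than matches() means the pattern only has to occur somewhere inside the tag, which suits compound tags; whether the real builder matches this way is not stated on this page.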
Constructor Details
-
SynthDictionaryBuilder
SynthDictionaryBuilder(File infoFile) throws IOException
Throws:
IOException
-
Method Details
-
main
Throws:
Exception
-
build
Throws:
Exception
-
getIgnoreItems
Throws:
FileNotFoundException
-
getPosTagIgnoreRegex
-
reverseLineContent
private File reverseLineContent(File plainTextDictFile, Set<String> itemsToBeIgnored, Pattern ignorePosRegex) throws IOException
Throws:
IOException
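This page gives no description for reverseLineContent, so the following is only a guess at the general shape suggested by its signature. It assumes input lines of the form "form<TAB>lemma<TAB>tag", assumes that itemsToBeIgnored holds POS tags, and assumes a reversed output layout of "lemma|tag<TAB>form"; none of this is confirmed here. The class name ReverseLineContentSketch and the method reverseLines are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.regex.Pattern;

public class ReverseLineContentSketch {

  /**
   * Writes a temporary file in which each kept input line "form<TAB>lemma<TAB>tag"
   * becomes "lemma|tag<TAB>form" (assumed layout). Lines are skipped when the tag
   * is in itemsToBeIgnored or matches ignorePosRegex (which may be null).
   */
  static File reverseLines(File plainTextDictFile, Set<String> itemsToBeIgnored,
                           Pattern ignorePosRegex) throws IOException {
    File reversed = File.createTempFile("reversed-sketch", ".txt");
    reversed.deleteOnExit();
    try (BufferedWriter out = Files.newBufferedWriter(reversed.toPath(), StandardCharsets.UTF_8)) {
      for (String line : Files.readAllLines(plainTextDictFile.toPath(), StandardCharsets.UTF_8)) {
        String[] parts = line.split("\t");
        if (parts.length != 3) {
          continue; // not in the assumed three-column format
        }
        String form = parts[0];
        String lemma = parts[1];
        String posTag = parts[2];
        if (itemsToBeIgnored.contains(posTag)) {
          continue;
        }
        if (ignorePosRegex != null && ignorePosRegex.matcher(posTag).find()) {
          continue;
        }
        out.write(lemma + "|" + posTag + "\t" + form);
        out.newLine();
      }
    }
    return reversed;
  }

  public static void main(String[] args) throws IOException {
    File input = File.createTempFile("dict-sketch", ".txt");
    input.deleteOnExit();
    Files.write(input.toPath(),
        Arrays.asList("houses\thouse\tNNS", "xyz\txyz\tUNK"),
        StandardCharsets.UTF_8);
    File reversed = reverseLines(input, Collections.singleton("UNK"), null);
    for (String line : Files.readAllLines(reversed.toPath(), StandardCharsets.UTF_8)) {
      System.out.println(line);
    }
  }
}
```

Keying each line on lemma and tag reflects what a synthesizer has to do, namely look up an inflected form from a base form plus a tag, which is the motivation for reversing the columns in this sketch.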
-
getTagFile
-
writePosTagsToFile
Throws:
IOException
-
collectTags
Throws:
IOException
-