Package morfologik.stemming
Class WordData
java.lang.Object
morfologik.stemming.WordData
All Implemented Interfaces:
Cloneable
Stem and tag data associated with a given word.
Instances of this class are reused and mutable: the values returned from getStem(), getWord() and other related methods change on subsequent calls to the DictionaryLookup object that returned a given instance of WordData.
If you need a copy of the stem or tag data for a given word, you have three options: create a custom buffer yourself and copy the associated data into it, call clone(), or create strings (which are immutable) by calling getStem() and then CharSequence.toString().
For these reasons it makes no sense to use instances of this class in associative containers or lists. In fact, both equals(Object) and hashCode() are overridden and throw exceptions to prevent accidental damage.
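The reuse hazard described above can be illustrated without the library itself. The following is a minimal sketch in which a shared StringBuilder stands in for a reused WordData instance; the class ReusedBufferDemo, the lookupStem helper and its toy "stemming" rule are all hypothetical and are not part of Morfologik.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedBufferDemo {
    // Hypothetical stand-in for a lookup object that reuses one mutable buffer.
    private static final StringBuilder SHARED = new StringBuilder();

    // Returns a mutable view whose content changes on the next call.
    static CharSequence lookupStem(String word) {
        SHARED.setLength(0);
        // Toy rule for illustration only: strip a trailing 's'.
        SHARED.append(word.endsWith("s") ? word.substring(0, word.length() - 1) : word);
        return SHARED;
    }

    // Safe pattern: snapshot each result with toString() before the next lookup.
    static List<String> stemsOf(String... words) {
        List<String> stems = new ArrayList<>();
        for (String w : words) {
            stems.add(lookupStem(w).toString());
        }
        return stems;
    }

    public static void main(String[] args) {
        CharSequence first = lookupStem("cats");
        lookupStem("dogs");
        System.out.println(first);                   // prints "dog": the view was overwritten
        System.out.println(stemsOf("cats", "dogs")); // prints "[cat, dog]"
    }
}
```

The same pattern applies to any CharSequence obtained from a reused object: convert it to a String before triggering another lookup.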
Field Summary
Fields (modifier and type, field, description):
private static final String COLLECTIONS_ERROR_MESSAGE: Error information if somebody puts us in a Java collection.
private final CharsetDecoder decoder: Character encoding in internal buffers.
(package private) ByteBuffer stemBuffer: Byte buffer holding stem data.
private CharBuffer stemCharSequence: Character sequence after converting stemBuffer using decoder.
(package private) ByteBuffer tagBuffer: Byte buffer holding tag data.
private CharBuffer tagCharSequence
(package private) ByteBuffer wordBuffer: Byte buffer holding the inflected word form data.
private CharSequence wordCharSequence: Inflected word form data.
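Constructor Summary
Constructors:
WordData(CharsetDecoder decoder): Package scope constructor.
WordData: A constructor for tests only.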
Method Summary
Methods (modifier and type, method, description):
WordData clone(): Covariant override of Object.clone() that returns a deep copy of this object.
private CharSequence cloneCharSequence: Clone char sequences only if not immutable.
boolean equals(Object)
CharSequence getStem()
ByteBuffer getStemBytes(ByteBuffer target): Copy the stem's binary data (no charset decoding) to a custom byte buffer.
CharSequence getTag()
ByteBuffer getTagBytes(ByteBuffer target): Copy the tag's binary data (no charset decoding) to a custom byte buffer.
CharSequence getWord()
ByteBuffer getWordBytes(ByteBuffer target): Copy the inflected word's binary data (no charset decoding) to a custom byte buffer.
int hashCode()
String toString()
(package private) void update(ByteBuffer wordBuffer, CharSequence word)
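Field Details
COLLECTIONS_ERROR_MESSAGE
private static final String COLLECTIONS_ERROR_MESSAGE
Error information if somebody puts us in a Java collection.
decoder
private final CharsetDecoder decoder
Character encoding in internal buffers.
wordCharSequence
private CharSequence wordCharSequence
Inflected word form data.
stemCharSequence
private CharBuffer stemCharSequence
Character sequence after converting stemBuffer using decoder.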
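The relationship between a byte buffer, a CharsetDecoder and a reusable CharBuffer can be sketched with java.nio alone. This is an illustration under stated assumptions, not the library's actual decoding code: the class DecodeSketch and the helper decode are hypothetical names, and UTF-8 is chosen only for the example.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
    // Decodes the remaining bytes into chars, reusing the given CharBuffer when it is large enough.
    static CharBuffer decode(CharsetDecoder decoder, ByteBuffer bytes, CharBuffer chars) {
        ByteBuffer in = bytes.duplicate(); // leave the caller's position and limit untouched
        int needed = (int) Math.ceil(in.remaining() * (double) decoder.maxCharsPerByte());
        if (chars == null || chars.capacity() < needed) {
            chars = CharBuffer.allocate(needed);
        }
        chars.clear();
        decoder.reset();
        try {
            CoderResult result = decoder.decode(in, chars, true);
            if (!result.isUnderflow()) result.throwException();
            result = decoder.flush(chars);
            if (!result.isUnderflow()) result.throwException();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
        chars.flip(); // make the decoded characters readable
        return chars;
    }

    public static void main(String[] args) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer stemBytes = ByteBuffer.wrap("walk".getBytes(StandardCharsets.UTF_8));
        CharBuffer reusable = CharBuffer.allocate(16);
        System.out.println(decode(utf8, stemBytes, reusable)); // prints "walk"
    }
}
```

Because the returned CharBuffer may be the same instance on every call, its content is only valid until the next decode, which mirrors the mutability warning in the class description.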
tagCharSequence
private CharBuffer tagCharSequence
wordBuffer
ByteBuffer wordBuffer
Byte buffer holding the inflected word form data.
stemBuffer
ByteBuffer stemBuffer
Byte buffer holding stem data.
tagBuffer
ByteBuffer tagBuffer
Byte buffer holding tag data.
Constructor Details
WordData
WordData(CharsetDecoder decoder)
Package scope constructor.
WordData
A constructor for tests only.
Method Details
getStemBytes
public ByteBuffer getStemBytes(ByteBuffer target)
Copy the stem's binary data (no charset decoding) to a custom byte buffer. The buffer is cleared prior to copying and flipped for reading upon returning from this method. If the buffer is null or not large enough to hold the result, a new buffer is allocated.
Parameters:
target - Target byte buffer to copy the stem buffer to, or null if a new buffer should be allocated.
Returns:
target, or the newly allocated buffer.
getTagBytes
public ByteBuffer getTagBytes(ByteBuffer target)
Copy the tag's binary data (no charset decoding) to a custom byte buffer. The buffer is cleared prior to copying and flipped for reading upon returning from this method. If the buffer is null or not large enough to hold the result, a new buffer is allocated.
Parameters:
target - Target byte buffer to copy the tag buffer to, or null if a new buffer should be allocated.
Returns:
target, or the newly allocated buffer.
getWordBytes
public ByteBuffer getWordBytes(ByteBuffer target)
Copy the inflected word's binary data (no charset decoding) to a custom byte buffer. The buffer is cleared prior to copying and flipped for reading upon returning from this method. If the buffer is null or not large enough to hold the result, a new buffer is allocated.
Parameters:
target - Target byte buffer to copy the word buffer to, or null if a new buffer should be allocated.
Returns:
target, or the newly allocated buffer.
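The contract shared by the three byte-copying methods (clear the target, copy, flip, and allocate a new buffer when the target is null or too small) can be sketched with java.nio.ByteBuffer alone. The class CopyBytesSketch and the helper copyInto are hypothetical names used for illustration; this is one way to satisfy the documented behavior, not necessarily how the library implements it.

```java
import java.nio.ByteBuffer;

public class CopyBytesSketch {
    // Copies the remaining bytes of source into target, following the documented contract.
    static ByteBuffer copyInto(ByteBuffer source, ByteBuffer target) {
        ByteBuffer src = source.duplicate(); // do not disturb the source's position
        if (target == null || target.capacity() < src.remaining()) {
            target = ByteBuffer.allocate(src.remaining()); // null or too small: allocate
        }
        target.clear();  // cleared prior to copying
        target.put(src);
        target.flip();   // flipped for reading on return
        return target;
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.wrap(new byte[] {10, 20, 30});
        ByteBuffer reusable = ByteBuffer.allocate(64);
        ByteBuffer result = copyInto(source, reusable);
        System.out.println(result == reusable);  // prints "true": the target was large enough
        System.out.println(result.remaining());  // prints "3"
    }
}
```

Callers should always continue with the returned buffer rather than the one they passed in, since the two differ whenever a reallocation took place.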
getTag
public CharSequence getTag()
Returns:
Tag data decoded to a character sequence, or null if no associated tag data exists.
getStem
public CharSequence getStem()
Returns:
Stem data decoded to a character sequence, or null if no associated stem data exists.
getWord
public CharSequence getWord()
Returns:
Inflected word form data. Usually the parameter passed to DictionaryLookup.lookup(CharSequence).
equals
hashCode
public int hashCode()
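The class description states that equals(Object) and hashCode() throw exceptions so that instances cannot be placed in collections by accident. A minimal sketch of that guard follows. The class ReusableRecord and its message text are hypothetical, and the use of UnsupportedOperationException is an assumption, since this page does not name the exception type.

```java
import java.util.HashSet;
import java.util.Set;

public class NoCollectionsSketch {
    // Hypothetical mutable, reused value type that refuses hash-based or equality-based use.
    public static final class ReusableRecord {
        private static final String MESSAGE = "Not suitable for use in Java collections.";

        @Override
        public boolean equals(Object obj) {
            throw new UnsupportedOperationException(MESSAGE); // assumed exception type
        }

        @Override
        public int hashCode() {
            throw new UnsupportedOperationException(MESSAGE); // assumed exception type
        }
    }

    // Returns true if adding the record to a HashSet is rejected.
    static boolean rejectedByHashSet() {
        Set<Object> set = new HashSet<>();
        try {
            set.add(new ReusableRecord()); // HashSet calls hashCode(), which throws
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectedByHashSet()); // prints "true"
    }
}
```

Failing fast this way turns a subtle data-corruption bug (keys that mutate after insertion) into an immediate, visible error.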
toString
public String toString()
clone
public WordData clone()
Covariant override of Object.clone() that returns a deep copy of this object. The content of all internal buffers is copied.
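A covariant clone() that deep-copies an internal buffer might look like the following sketch. The class Holder and its single data field are hypothetical and much simpler than WordData; the point is only to show the covariant return type and that the copy does not share buffer storage with the original.

```java
import java.nio.ByteBuffer;

public class DeepCloneSketch {
    // Hypothetical holder with one internal buffer.
    public static final class Holder implements Cloneable {
        public ByteBuffer data = ByteBuffer.allocate(0);

        // Covariant override: returns Holder rather than Object.
        @Override
        public Holder clone() {
            try {
                Holder copy = (Holder) super.clone(); // shallow copy first
                ByteBuffer src = data.duplicate();
                ByteBuffer dst = ByteBuffer.allocate(src.remaining());
                dst.put(src);                         // copy the bytes, not the reference
                dst.flip();
                copy.data = dst;
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);          // cannot happen: Holder is Cloneable
            }
        }
    }

    public static void main(String[] args) {
        Holder original = new Holder();
        original.data = ByteBuffer.wrap(new byte[] {1, 2, 3});
        Holder copy = original.clone();
        original.data.put(0, (byte) 9);               // mutate the original afterwards
        System.out.println(copy.data.get(0));         // prints "1": the copy is independent
    }
}
```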
cloneCharSequence
Clone char sequences only if not immutable.
update
void update(ByteBuffer wordBuffer, CharSequence word)