Class TrimSuffixEncoder

java.lang.Object
morfologik.stemming.TrimSuffixEncoder
All Implemented Interfaces:
ISequenceEncoder

public class TrimSuffixEncoder extends Object implements ISequenceEncoder
Encodes dst relative to src by trimming the suffix of src that does not match dst and then appending the remaining bytes of dst. The output code is a byte sequence of the form:
 {K}{suffix}
 
where K is a single byte: (K - 'A') bytes are trimmed from the end of src, and suffix is then appended to the resulting byte sequence.

Examples:

 src: foo
 dst: foobar
 encoded: Abar
 
 src: foo
 dst: bar
 encoded: Dbar
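
The two examples above can be reproduced with a minimal standalone sketch of the scheme. This is not the library's implementation: the class name TrimSuffixSketch and its byte-array signature are hypothetical, and the sketch assumes the trim count is small enough to fit in the single code byte (any special handling of large counts is not reproduced here).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the {K}{suffix} scheme; not part of morfologik.
final class TrimSuffixSketch {
  /** Returns {K}{suffix}, where K = 'A' + number of bytes to trim from src. */
  static byte[] encode(byte[] src, byte[] dst) {
    // Length of the prefix shared by src and dst.
    int shared = 0;
    int max = Math.min(src.length, dst.length);
    while (shared < max && src[shared] == dst[shared]) {
      shared++;
    }

    // Everything in src after the shared prefix must be trimmed.
    // Assumption: trim is small enough for 'A' + trim to fit in one byte.
    int trim = src.length - shared;

    byte[] out = new byte[1 + dst.length - shared];
    out[0] = (byte) ('A' + trim);
    System.arraycopy(dst, shared, out, 1, dst.length - shared);
    return out;
  }

  private static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Prints "Abar": nothing is trimmed from "foo", "bar" is appended.
    System.out.println(new String(encode(bytes("foo"), bytes("foobar")), StandardCharsets.UTF_8));
    // Prints "Dbar": all three bytes of "foo" are trimmed, "bar" is appended.
    System.out.println(new String(encode(bytes("foo"), bytes("bar")), StandardCharsets.UTF_8));
  }
}
```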
 
  • Field Details

    • REMOVE_EVERYTHING

      private static final int REMOVE_EVERYTHING
      Maximum encodable single-byte code.
  • Constructor Details

    • TrimSuffixEncoder

      public TrimSuffixEncoder()
  • Method Details

    • encode

      public ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
      Description copied from interface: ISequenceEncoder
      Encodes target relative to source, optionally reusing the provided ByteBuffer.
      Specified by:
      encode in interface ISequenceEncoder
      Parameters:
      reuse - A ByteBuffer to reuse for the output; a new one is allocated if it does not have enough remaining space.
      source - The source byte sequence.
      target - The target byte sequence to encode relative to source.
      Returns:
      The ByteBuffer containing the encoded target.
    • prefixBytes

      public int prefixBytes()
      Description copied from interface: ISequenceEncoder
      The number of bytes at the start of the encoded form that should be ignored during separator lookup. This is an ugly workaround for GH-85; the proper fix is to know in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
      Specified by:
      prefixBytes in interface ISequenceEncoder
      See Also:
      • "https://github.com/morfologik/morfologik-stemming/issues/85"
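
One way to read the description above: a left-to-right separator scan starts after the ignored prefix bytes, so that a code byte which happens to equal the separator is not mistaken for it. The sketch below illustrates that reading only. The class SeparatorScanSketch, the method indexOfSeparator, and the use of '+' as separator are all hypothetical and not taken from the library.

```java
// Hypothetical illustration of skipping prefix bytes during separator lookup.
final class SeparatorScanSketch {
  /**
   * Returns the index of the first separator byte at or after prefixBytes,
   * or -1 if there is none.
   */
  static int indexOfSeparator(byte[] encoded, int prefixBytes, byte separator) {
    for (int i = prefixBytes; i < encoded.length; i++) {
      if (encoded[i] == separator) {
        return i;
      }
    }
    return -1;
  }
}
```

With the bytes of "+bar+tag" and separator '+', skipping one prefix byte yields index 4, whereas skipping none yields index 0, the position of the leading byte itself.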
    • decode

      public ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
      Description copied from interface: ISequenceEncoder
      Decodes encoded relative to source, optionally reusing the provided ByteBuffer.
      Specified by:
      decode in interface ISequenceEncoder
      Parameters:
      reuse - A ByteBuffer to reuse for the output; a new one is allocated if it does not have enough remaining space.
      source - The source byte sequence.
      encoded - The previously encoded byte sequence.
      Returns:
      The ByteBuffer containing the decoded target.
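
The decoding direction of the {K}{suffix} scheme can be sketched over ByteBuffers as follows. This is a standalone illustration under stated assumptions, not the library's code: the class name TrimSuffixDecodeSketch is hypothetical, the input is assumed to be well-formed (the trim count does not exceed the length of source), the reuse buffer is judged by its capacity, and any special handling of the maximum code value is not reproduced.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of decoding {K}{suffix}; not part of morfologik.
final class TrimSuffixDecodeSketch {
  static ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) {
    // The first byte is K; (K - 'A') bytes are trimmed from the end of source.
    // Assumption: the encoded input is well-formed, so 0 <= trim <= source.remaining().
    int trim = (encoded.get(encoded.position()) & 0xFF) - 'A';
    int kept = source.remaining() - trim;
    int suffixLength = encoded.remaining() - 1;
    int needed = kept + suffixLength;

    // Reuse the provided buffer when it is large enough, otherwise allocate.
    ByteBuffer out = (reuse != null && reuse.capacity() >= needed)
        ? reuse
        : ByteBuffer.allocate(needed);
    out.clear();

    // Copy the kept head of source without disturbing the caller's position.
    ByteBuffer head = source.duplicate();
    head.limit(head.position() + kept);
    out.put(head);

    // Append the suffix, which is everything after the code byte.
    ByteBuffer tail = encoded.duplicate();
    tail.position(tail.position() + 1);
    out.put(tail);

    out.flip();
    return out;
  }
}
```

Decoding "Dbar" against "foo" with this sketch gives "bar", and decoding "Abar" against "foo" gives "foobar", matching the class-level examples.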
    • toString

      public String toString()
      Overrides:
      toString in class Object