Class DeltaIndex

java.lang.Object
org.eclipse.jgit.internal.storage.pack.DeltaIndex

public class DeltaIndex extends Object
Index of blocks in a source file.

The index can be passed a result buffer, and output an instruction sequence that transforms the source buffer used by the index into the result buffer. The instruction sequence can be executed by BinaryDelta to recreate the result buffer.

An index stores the entire contents of the source buffer, but also a table of block identities mapped to locations where the block appears in the source buffer. The mapping table uses 12 bytes for every 16 bytes of source buffer, and is therefore ~75% of the source buffer size. The overall index is ~1.75x the size of the source buffer. This relationship holds for any JVM, as only a constant number of objects are allocated per index. Callers can use the method getIndexSize() to obtain a reasonably accurate estimate of the complete heap space used by this index.

A DeltaIndex is thread-safe. Concurrent threads can use the same index to encode delta instructions for different result buffers.

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    (package private) static final int
    Number of bytes in a block.
    private final long[]
    Pairs of block hash value and src offsets.
    private static final int
    Maximum number of positions to consider for a given content hash.
    private final byte[]
    Original source file that we indexed.
    private static final int[]
     
    private final int[]
    Pointers into the entries table, indexed by block hash.
    private final int
    Mask to make block hashes into an array index for table.
    private static final int[]
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    DeltaIndex(byte[] sourceBuffer)
    Construct an index from the source file.
  • Method Summary

    Modifier and Type
    Method
    Description
    private void
     
    private int
     
    void
    encode(OutputStream out, byte[] res)
    Generate a delta sequence to recreate the result buffer.
    boolean
    encode(OutputStream out, byte[] res, int deltaSizeLimit)
    Generate a delta sequence to recreate the result buffer.
    static long
    estimateIndexSize(int sourceLength)
    Estimate the size of an index for a given source.
    private static int
    fwdmatch(byte[] res, int resPtr, byte[] src, int srcPtr)
     
    long
    Get an estimate of the memory required by this index.
    long
    Get size of the source buffer this index has scanned.
    (package private) static int
    hashBlock(byte[] raw, int ptr)
     
    private static int
    keyOf(long ent)
     
    private static int
    negmatch(byte[] res, int resPtr, byte[] src, int srcPtr, int limit)
     
    private DeltaEncoder
    newEncoder(OutputStream out, long resSize, int limit)
     
    private static long
    sizeOf(byte[] b)
     
    private static long
    sizeOf(int[] b)
     
    private static long
    sizeOf(long[] b)
     
    private static int
    sizeOfArray(int entSize, int len)
     
    private static int
    step(int hash, byte toRemove, byte toAdd)
     
    private static int
    valOf(long ent)
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • BLKSZ

      static final int BLKSZ
      Number of bytes in a block.
      See Also:
    • MAX_CHAIN_LENGTH

      private static final int MAX_CHAIN_LENGTH
      Maximum number of positions to consider for a given content hash.

      All positions with the same content hash are stored into a single chain. The chain size is capped to ensure delta encoding stays linear time at O(len_src + len_dst) rather than quadratic at O(len_src * len_dst).

      See Also:
    • src

      private final byte[] src
      Original source file that we indexed.
    • table

      private final int[] table
      Pointers into the entries table, indexed by block hash.

      A block hash is masked with tableMask to become the array index of this table. The value stored here is the first index within entries that starts the consecutive list of blocks with that same masked hash. If there are no matching blocks, 0 is stored instead.

      Note that this table is always a power of 2 in size, to support fast normalization of a block hash into an array index.

    • entries

      private final long[] entries
      Pairs of block hash value and src offsets.

      The very first entry in this table at index 0 is always empty, this is to allow fast evaluation when table has no values under any given slot. Remaining entries are pairs of integers, with the upper 32 bits holding the block hash and the lower 32 bits holding the source offset.

    • tableMask

      private final int tableMask
      Mask to make block hashes into an array index for table.
    • T

      private static final int[] T
    • U

      private static final int[] U
  • Constructor Details

    • DeltaIndex

      public DeltaIndex(byte[] sourceBuffer)
      Construct an index from the source file.
      Parameters:
      sourceBuffer - the source file's raw contents. The buffer will be held by the index instance to facilitate matching, and therefore must not be modified by the caller.
  • Method Details

    • estimateIndexSize

      public static long estimateIndexSize(int sourceLength)
      Estimate the size of an index for a given source.

      This is roughly a worst-case estimate. The actual index may be smaller.

      Parameters:
      sourceLength - length of the source, in bytes.
      Returns:
      estimated size. Approximately 1.75 * sourceLength.
    • countEntries

      private int countEntries(DeltaIndexScanner scan)
    • copyEntries

      private void copyEntries(DeltaIndexScanner scan)
    • getSourceSize

      public long getSourceSize()
      Get size of the source buffer this index has scanned.
      Returns:
      size of the source buffer this index has scanned.
    • getIndexSize

      public long getIndexSize()
      Get an estimate of the memory required by this index.
      Returns:
      an approximation of the number of bytes used by this index in memory. The size includes the cached source buffer size from getSourceSize(), as well as a rough approximation of JVM object overheads.
    • sizeOf

      private static long sizeOf(byte[] b)
    • sizeOf

      private static long sizeOf(int[] b)
    • sizeOf

      private static long sizeOf(long[] b)
    • sizeOfArray

      private static int sizeOfArray(int entSize, int len)
    • encode

      public void encode(OutputStream out, byte[] res) throws IOException
      Generate a delta sequence to recreate the result buffer.

      There is no limit on the size of the delta sequence created. This is the same as encode(out, res, 0).

      Parameters:
      out - stream to receive the delta instructions that can transform this index's source buffer into res. This stream should be buffered, as instructions are written directly to it in small bursts.
      res - the desired result buffer. The generated instructions will recreate this buffer when applied to the source buffer stored within this index.
      Throws:
      IOException - the output stream refused to write the instructions.
    • encode

      public boolean encode(OutputStream out, byte[] res, int deltaSizeLimit) throws IOException
      Generate a delta sequence to recreate the result buffer.
      Parameters:
      out - stream to receive the delta instructions that can transform this index's source buffer into res. This stream should be buffered, as instructions are written directly to it in small bursts. If the caller might need to discard the instructions (such as when deltaSizeLimit would be exceeded) the caller is responsible for discarding or rewinding the stream when this method returns false.
      res - the desired result buffer. The generated instructions will recreate this buffer when applied to the source buffer stored within this index.
      deltaSizeLimit - maximum number of bytes that the delta instructions can occupy. If the generated instructions would be longer than this amount, this method returns false. If 0, there is no limit on the length of delta created.
      Returns:
      true if the delta is smaller than deltaSizeLimit; false if the encoder aborted because the encoded delta instructions would be longer than deltaSizeLimit bytes.
      Throws:
      IOException - the output stream refused to write the instructions.
    • newEncoder

      private DeltaEncoder newEncoder(OutputStream out, long resSize, int limit) throws IOException
      Throws:
      IOException
    • fwdmatch

      private static int fwdmatch(byte[] res, int resPtr, byte[] src, int srcPtr)
    • negmatch

      private static int negmatch(byte[] res, int resPtr, byte[] src, int srcPtr, int limit)
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • hashBlock

      static int hashBlock(byte[] raw, int ptr)
    • step

      private static int step(int hash, byte toRemove, byte toAdd)
    • keyOf

      private static int keyOf(long ent)
    • valOf

      private static int valOf(long ent)