Class DeltaIndex
The index can be passed a result buffer, and output an instruction sequence
that transforms the source buffer used by the index into the result buffer.
The instruction sequence can be executed by
BinaryDelta
to recreate the
result buffer.
An index stores the entire contents of the source buffer, but also a table of
block identities mapped to locations where the block appears in the source
buffer. The mapping table uses 12 bytes for every 16 bytes of source buffer,
and is therefore ~75% of the source buffer size. The overall index is ~1.75x
the size of the source buffer. This relationship holds for any JVM, as only a
constant number of objects are allocated per index. Callers can use the
method getIndexSize()
to obtain a reasonably accurate estimate of
the complete heap space used by this index.
A DeltaIndex
is thread-safe. Concurrent threads can use the same
index to encode delta instructions for different result buffers.
-
Field Summary
FieldsModifier and TypeFieldDescription(package private) static final int
Number of bytes in a block.private final long[]
Pairs of block hash value andsrc
offsets.private static final int
Maximum number of positions to consider for a given content hash.private final byte[]
Original source file that we indexed.private static final int[]
private final int[]
Pointers into theentries
table, indexed by block hash.private final int
Mask to make block hashes into an array index fortable
.private static final int[]
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprivate void
copyEntries
(DeltaIndexScanner scan) private int
void
encode
(OutputStream out, byte[] res) Generate a delta sequence to recreate the result buffer.boolean
encode
(OutputStream out, byte[] res, int deltaSizeLimit) Generate a delta sequence to recreate the result buffer.static long
estimateIndexSize
(int sourceLength) Estimate the size of an index for a given source.private static int
fwdmatch
(byte[] res, int resPtr, byte[] src, int srcPtr) long
Get an estimate of the memory required by this index.long
Get size of the source buffer this index has scanned.(package private) static int
hashBlock
(byte[] raw, int ptr) private static int
keyOf
(long ent) private static int
negmatch
(byte[] res, int resPtr, byte[] src, int srcPtr, int limit) private DeltaEncoder
newEncoder
(OutputStream out, long resSize, int limit) private static long
sizeOf
(byte[] b) private static long
sizeOf
(int[] b) private static long
sizeOf
(long[] b) private static int
sizeOfArray
(int entSize, int len) private static int
step
(int hash, byte toRemove, byte toAdd) toString()
private static int
valOf
(long ent)
-
Field Details
-
BLKSZ
static final int BLKSZNumber of bytes in a block.- See Also:
-
MAX_CHAIN_LENGTH
private static final int MAX_CHAIN_LENGTHMaximum number of positions to consider for a given content hash.All positions with the same content hash are stored into a single chain. The chain size is capped to ensure delta encoding stays linear time at O(len_src + len_dst) rather than quadratic at O(len_src * len_dst).
- See Also:
-
src
private final byte[] srcOriginal source file that we indexed. -
table
private final int[] tablePointers into theentries
table, indexed by block hash.A block hash is masked with
tableMask
to become the array index of this table. The value stored here is the first index withinentries
that starts the consecutive list of blocks with that same masked hash. If there are no matching blocks, 0 is stored instead.Note that this table is always a power of 2 in size, to support fast normalization of a block hash into an array index.
-
entries
private final long[] entriesPairs of block hash value andsrc
offsets.The very first entry in this table at index 0 is always empty, this is to allow fast evaluation when
table
has no values under any given slot. Remaining entries are pairs of integers, with the upper 32 bits holding the block hash and the lower 32 bits holding the source offset. -
tableMask
private final int tableMaskMask to make block hashes into an array index fortable
. -
T
private static final int[] T -
U
private static final int[] U
-
-
Constructor Details
-
DeltaIndex
public DeltaIndex(byte[] sourceBuffer) Construct an index from the source file.- Parameters:
sourceBuffer
- the source file's raw contents. The buffer will be held by the index instance to facilitate matching, and therefore must not be modified by the caller.
-
-
Method Details
-
estimateIndexSize
public static long estimateIndexSize(int sourceLength) Estimate the size of an index for a given source.This is roughly a worst-case estimate. The actual index may be smaller.
- Parameters:
sourceLength
- length of the source, in bytes.- Returns:
- estimated size. Approximately
1.75 * sourceLength
.
-
countEntries
-
copyEntries
-
getSourceSize
public long getSourceSize()Get size of the source buffer this index has scanned.- Returns:
- size of the source buffer this index has scanned.
-
getIndexSize
public long getIndexSize()Get an estimate of the memory required by this index.- Returns:
- an approximation of the number of bytes used by this index in
memory. The size includes the cached source buffer size from
getSourceSize()
, as well as a rough approximation of JVM object overheads.
-
sizeOf
private static long sizeOf(byte[] b) -
sizeOf
private static long sizeOf(int[] b) -
sizeOf
private static long sizeOf(long[] b) -
sizeOfArray
private static int sizeOfArray(int entSize, int len) -
encode
Generate a delta sequence to recreate the result buffer.There is no limit on the size of the delta sequence created. This is the same as
encode(out, res, 0)
.- Parameters:
out
- stream to receive the delta instructions that can transform this index's source buffer intores
. This stream should be buffered, as instructions are written directly to it in small bursts.res
- the desired result buffer. The generated instructions will recreate this buffer when applied to the source buffer stored within this index.- Throws:
IOException
- the output stream refused to write the instructions.
-
encode
Generate a delta sequence to recreate the result buffer.- Parameters:
out
- stream to receive the delta instructions that can transform this index's source buffer intores
. This stream should be buffered, as instructions are written directly to it in small bursts. If the caller might need to discard the instructions (such as when deltaSizeLimit would be exceeded) the caller is responsible for discarding or rewinding the stream when this method returns false.res
- the desired result buffer. The generated instructions will recreate this buffer when applied to the source buffer stored within this index.deltaSizeLimit
- maximum number of bytes that the delta instructions can occupy. If the generated instructions would be longer than this amount, this method returns false. If 0, there is no limit on the length of delta created.- Returns:
- true if the delta is smaller than deltaSizeLimit; false if the encoder aborted because the encoded delta instructions would be longer than deltaSizeLimit bytes.
- Throws:
IOException
- the output stream refused to write the instructions.
-
newEncoder
- Throws:
IOException
-
fwdmatch
private static int fwdmatch(byte[] res, int resPtr, byte[] src, int srcPtr) -
negmatch
private static int negmatch(byte[] res, int resPtr, byte[] src, int srcPtr, int limit) -
toString
-
hashBlock
static int hashBlock(byte[] raw, int ptr) -
step
private static int step(int hash, byte toRemove, byte toAdd) -
keyOf
private static int keyOf(long ent) -
valOf
private static int valOf(long ent)
-