Class PackParser
- Direct Known Subclasses:
DfsPackParser, FsckPackParser, ObjectDirectoryPackParser
Parses a pack stream and imports it for an ObjectInserter.
Applications can acquire an instance of a parser from ObjectInserter's ObjectInserter.newPackParser(InputStream) method.
Implementations of ObjectInserter should subclass this type and provide their own logic for the various on*() event methods declared to be abstract.
-
Nested Class Summary
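Nested Classes
Modifier and Type - Class - Description
private static class PackParser.DeltaChain
private static class PackParser.DeltaVisit
private class PackParser.InflaterStream
static class PackParser.ObjectTypeAndSize - Type and size information about an object in the database buffer.
static enum PackParser.Source - Location data is being obtained from.
static class PackParser.UnresolvedDelta - Information about an unresolved delta in this pack stream.
-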
Field Summary
Fields
Modifier and Type - Field - Description
private boolean allowThin
private ObjectIdOwnerMap<PackParser.DeltaChain> baseById
private LongMap<PackParser.UnresolvedDelta> baseByPos
private ObjectIdSubclassMap<ObjectId> baseObjectIds - Objects referenced by their name from deltas, that aren't in this pack.
(package private) int bAvail
private long bBase - Position in the input stream of buf[0].
private int bOffset
(package private) byte[] buf
private static final int BUFFER_SIZE - Size of the internal stream buffer.
private boolean checkEofAfterPackFooter
private boolean checkObjectCollisions
private BlockList<PackedObjectInfo> collisionCheckObjs - Objects need to be double-checked for collision after indexing.
private int deltaCount
private PackedObjectInfo[] entries
private int entryCount
private boolean expectDataAfterPackFooter
private long expectedObjectCount
private byte[] hdrBuf
private InputStream in
private PackParser.InflaterStream inflater
private String lockMessage - Message to protect the pack data from garbage collection.
private long maxObjectSizeLimit - Git object size limit
private boolean needBaseObjectIds
private ObjectIdSubclassMap<ObjectId> newObjectIds - Every object contained within the incoming pack.
private ObjectChecker objCheck
private final ObjectDatabase objectDatabase - Object database used for loading existing objects.
private final SHA1 objectHasher
private MessageDigest packDigest
private ObjectReader readCurs
private final ReceivedPackStatistics.Builder stats
private byte[] tempBuffer
private final MutableObjectId tempObjectId
-
Constructor Summary
Constructors
Modifier - Constructor - Description
protected PackParser(ObjectDatabase odb, InputStream src) - Initialize a pack parser.
-
Method Summary
Modifier and Type - Method - Description
private void addObjectAndTrack
protected byte[] buffer() - Get a temporary byte array for use by the caller.
protected abstract boolean checkCRC(int oldCRC) - Check the current CRC matches the expected value.
private final void checkIfTooLarge(int typeCode, long size)
private void checkObjectCollision
private void checkObjectCollision(AnyObjectId obj, int type, byte[] data, long sizeBeforeInflating)
private void checkObjectCollision
private void endInput()
(package private) int fill(PackParser.Source src, int need)
private PackParser.UnresolvedDelta firstChildOf
getBaseObjectIds - Get set of objects the incoming pack assumed for delta purposes
getLockMessage - Get the message to record with the pack lock.
getNewObjectIds - Get the new objects that were sent by the user
getObject(int nth) - Get the information about the requested object.
int getObjectCount() - Get the number of objects in the stream.
long getPackSize() - Get the size of the newly created pack.
getReceivedPackStatistics - Returns the statistics of the parsed pack.
getSortedObjectList - Get all of the objects, sorted by their name.
private void growEntries(int extraObjects)
private void indexOneObject
private InputStream inflate(PackParser.Source src, long inflatedSize)
private byte[] inflateAndReturn(PackParser.Source src, long inflatedSize)
private void inflateAndSkip(PackParser.Source src, long inflatedSize)
boolean isAllowThin() - Whether a thin pack (missing base objects) is permitted.
boolean isCheckEofAfterPackFooter() - Whether the EOF should be read from the input after the footer.
protected boolean isCheckObjectCollisions() - Whether received objects are verified to prevent collisions.
boolean isExpectDataAfterPackFooter() - Whether there is data expected after the pack footer.
private boolean needNewObjectIds()
protected PackedObjectInfo newInfo(AnyObjectId id, PackParser.UnresolvedDelta delta, ObjectId deltaBase) - Construct a PackedObjectInfo instance for this parser.
protected abstract boolean onAppendBase(int typeCode, byte[] data, PackedObjectInfo info) - Provide the implementation with a base that was outside of the pack.
protected abstract void onBeginOfsDelta(long deltaStreamPosition, long baseStreamPosition, long inflatedSize) - Event notifying start of a delta referencing its base by offset.
protected abstract void onBeginRefDelta(long deltaStreamPosition, AnyObjectId baseId, long inflatedSize) - Event notifying start of a delta referencing its base by ObjectId.
protected abstract void onBeginWholeObject(long streamPosition, int type, long inflatedSize) - Event notifying the start of an object stored whole (not as a delta).
protected PackParser.UnresolvedDelta onEndDelta - Event notifying the current object.
protected abstract void onEndThinPack - Event indicating a thin pack has been completely processed.
protected abstract void onEndWholeObject - Event notifying the current object.
protected abstract void onInflatedObjectData(PackedObjectInfo obj, int typeCode, byte[] data) - Invoked for commits, trees, tags, and small blobs.
protected abstract void onObjectData(PackParser.Source src, byte[] raw, int pos, int len) - Store (and/or checksum) a portion of an object's data.
protected abstract void onObjectHeader(PackParser.Source src, byte[] raw, int pos, int len) - Store (and/or checksum) an object header.
protected abstract void onPackFooter(byte[] hash) - Provide the implementation with the original stream's pack footer.
protected abstract void onPackHeader(long objCnt) - Provide the implementation with the original stream's pack header.
protected abstract void onStoreStream(byte[] raw, int pos, int len) - Store bytes received from the raw stream.
private PackParser.ObjectTypeAndSize openDatabase(PackedObjectInfo obj, PackParser.ObjectTypeAndSize info)
private PackParser.ObjectTypeAndSize openDatabase(PackParser.UnresolvedDelta delta, PackParser.ObjectTypeAndSize info)
final PackLock parse(ProgressMonitor progress) - Parse the pack stream.
parse(ProgressMonitor receiving, ProgressMonitor resolving) - Parse the pack stream.
private void processDeltas(ProgressMonitor resolving)
protected abstract int readDatabase(byte[] dst, int pos, int cnt) - Read from the database's current position into the buffer.
private int readFrom
protected PackParser.ObjectTypeAndSize readObjectHeader(PackParser.ObjectTypeAndSize info) - Read the header of the current object.
private void readPackFooter
private void readPackHeader
private PackParser.UnresolvedDelta removeBaseById
private void resolveDeltas(ProgressMonitor progress)
private void resolveDeltas(PackedObjectInfo oe, ProgressMonitor progress)
private void resolveDeltas(PackParser.DeltaVisit visit, int type, PackParser.ObjectTypeAndSize info, ProgressMonitor progress)
private void resolveDeltasWithExternalBases
private static PackParser.UnresolvedDelta reverse
protected abstract PackParser.ObjectTypeAndSize seekDatabase(PackedObjectInfo obj, PackParser.ObjectTypeAndSize info) - Reposition the database to re-read a previously stored object.
protected abstract PackParser.ObjectTypeAndSize seekDatabase(PackParser.UnresolvedDelta delta, PackParser.ObjectTypeAndSize info) - Reposition the database to re-read a previously stored object.
void setAllowThin(boolean allow) - Configure this index pack instance to allow a thin pack.
void setCheckEofAfterPackFooter(boolean b) - Ensure EOF is read from the input stream after the footer.
protected void setCheckObjectCollisions(boolean check) - Enable checking for collisions with existing objects.
void setExpectDataAfterPackFooter(boolean e) - Set if there is additional data in InputStream after pack.
protected void setExpectedObjectCount(long expectedObjectCount) - Set the expected number of objects in the pack stream.
void setLockMessage(String msg) - Set the lock message for the incoming pack data.
void setMaxObjectSizeLimit(long limit) - Set the maximum allowed Git object size.
void setNeedBaseObjectIds(boolean b) - Configure this index pack instance to keep track of the objects assumed for delta bases.
void setNeedNewObjectIds(boolean b) - Configure this index pack instance to keep track of new objects.
void setObjectChecker - Configure the checker used to validate received objects.
void setObjectChecking(boolean on) - Configure the checker used to validate received objects.
private long streamPosition()
private void sync()
(package private) void use(int cnt)
protected void verifySafeObject(AnyObjectId id, int type, byte[] data) - Verify the integrity of the object.
private void whole(long pos, int type, long sz)
-
Field Details
-
BUFFER_SIZE
private static final int BUFFER_SIZE
Size of the internal stream buffer.
-
objectDatabase
Object database used for loading existing objects. -
inflater
-
tempBuffer
private byte[] tempBuffer -
hdrBuf
private byte[] hdrBuf -
objectHasher
-
tempObjectId
-
in
-
buf
byte[] buf -
bBase
private long bBase
Position in the input stream of buf[0].
-
bOffset
private int bOffset -
bAvail
int bAvail -
objCheck
-
allowThin
private boolean allowThin -
checkObjectCollisions
private boolean checkObjectCollisions -
needBaseObjectIds
private boolean needBaseObjectIds -
expectedObjectCount
private long expectedObjectCount -
entries
-
newObjectIds
Every object contained within the incoming pack.
This is a subset of entries, as thin packs can add additional objects to entries by copying already existing objects from the repository onto the end of the thin pack to make it self-contained.
-
deltaCount
private int deltaCount -
entryCount
private int entryCount -
baseById
-
baseObjectIds
Objects referenced by their name from deltas, that aren't in this pack.
This is the set of objects that were copied onto the end of this pack to make it complete. These objects were not transmitted by the remote peer, but instead were assumed to already exist in the local repository.
-
baseByPos
-
collisionCheckObjs
Objects need to be double-checked for collision after indexing. -
packDigest
-
readCurs
-
lockMessage
Message to protect the pack data from garbage collection. -
maxObjectSizeLimit
private long maxObjectSizeLimit
Git object size limit.
-
stats
-
-
Constructor Details
-
PackParser
Initialize a pack parser.
- Parameters:
odb - database the parser will write its objects into.
src - the stream the parser will read.
-
-
Method Details
-
isAllowThin
public boolean isAllowThin()
Whether a thin pack (missing base objects) is permitted.
- Returns:
- true if a thin pack (missing base objects) is permitted.
-
setAllowThin
public void setAllowThin(boolean allow)
Configure this index pack instance to allow a thin pack.
Thin packs are sometimes used during network transfers to allow a delta to be sent without a base object. Such packs are not permitted on disk.
- Parameters:
allow - true to enable a thin pack.
-
isCheckObjectCollisions
protected boolean isCheckObjectCollisions()
Whether received objects are verified to prevent collisions.
- Returns:
- true if received objects are verified to prevent collisions.
- Since:
- 4.1
-
setCheckObjectCollisions
protected void setCheckObjectCollisions(boolean check)
Enable checking for collisions with existing objects.
By default PackParser looks for each received object in the repository. If the object already exists, the existing object is compared byte-for-byte with the newly received copy to ensure they are identical. The receive is aborted with an exception if any byte differs. This check is necessary to prevent an evil attacker from supplying a replacement object into this repository in the event that a discovery enabling SHA-1 collisions is made.
This check may be very costly to perform, and some repositories may have other ways to segregate newly received object data. The check is enabled by default, but can be explicitly disabled if the implementation can provide the same guarantee, or is willing to accept the risks associated with bypassing the check.
- Parameters:
check - true to enable collision checking (strongly encouraged).
- Since:
- 4.1
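The byte-for-byte comparison described for setCheckObjectCollisions can be illustrated with the standard library alone. This is a minimal sketch and not JGit's implementation: the class CollisionCheckSketch and the method isSafeToStore are hypothetical names, and plain byte arrays stand in for object data that the parser would load from the repository.

```java
import java.util.Arrays;

final class CollisionCheckSketch {
    /**
     * Compares a received object with the copy already stored under the
     * same name. A null existing copy means the object is new.
     */
    static boolean isSafeToStore(byte[] existing, byte[] received) {
        if (existing == null) {
            return true; // nothing stored under this name yet
        }
        // Any differing byte (or a differing length) signals a possible collision.
        return Arrays.equals(existing, received);
    }
}
```

A caller following the documented behaviour would abort the receive with an exception whenever this check returns false.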
-
setNeedNewObjectIds
public void setNeedNewObjectIds(boolean b)
Configure this index pack instance to keep track of new objects.
By default an index pack doesn't save the new objects that were created when it was instantiated. Setting this flag to true allows the caller to use getNewObjectIds() to retrieve that list.
- Parameters:
b - true to enable keeping track of new objects.
-
needNewObjectIds
private boolean needNewObjectIds() -
setNeedBaseObjectIds
public void setNeedBaseObjectIds(boolean b)
Configure this index pack instance to keep track of the objects assumed for delta bases.
By default an index pack doesn't save the objects that were used as delta bases. Setting this flag to true will allow the caller to use getBaseObjectIds() to retrieve that list.
- Parameters:
b - true to enable keeping track of delta bases.
-
getNewObjectIds
Get the new objects that were sent by the user.
- Returns:
- the new objects that were sent by the user
-
getBaseObjectIds
Get the set of objects the incoming pack assumed for delta purposes.
- Returns:
- set of objects the incoming pack assumed for delta purposes
-
setObjectChecker
Configure the checker used to validate received objects.
Usually object checking isn't necessary, as Git implementations only create valid objects in pack files. However, additional checking may be useful if processing data from an untrusted source.
- Parameters:
oc - the checker instance; null to disable object checking.
-
setObjectChecking
public void setObjectChecking(boolean on)
Configure the checker used to validate received objects.
Usually object checking isn't necessary, as Git implementations only create valid objects in pack files. However, additional checking may be useful if processing data from an untrusted source.
This is shorthand for:
setObjectChecker(on ? new ObjectChecker() : null);
- Parameters:
on - true to enable the default checker; false to disable it.
-
getLockMessage
Get the message to record with the pack lock.
- Returns:
- the message to record with the pack lock.
-
setLockMessage
Set the lock message for the incoming pack data.
- Parameters:
msg - if not null, the message to associate with the incoming data while it is locked to prevent garbage collection.
-
setMaxObjectSizeLimit
public void setMaxObjectSizeLimit(long limit)
Set the maximum allowed Git object size.
If an object is larger than the given size, pack parsing throws an exception and aborts.
- Parameters:
limit - the Git object size limit. If zero then there is no limit.
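The rule that a zero limit means "no limit" can be written as a one-line predicate. This is a sketch under that stated rule only; ObjectSizeLimitSketch and exceedsLimit are hypothetical names, and the actual parser reports the violation by throwing an exception rather than returning a flag.

```java
final class ObjectSizeLimitSketch {
    /**
     * Returns true when an object of the given size should be rejected.
     * A limit of zero disables the check entirely.
     */
    static boolean exceedsLimit(long objectSize, long limit) {
        return limit > 0 && objectSize > limit;
    }
}
```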
-
getObjectCount
public int getObjectCount()
Get the number of objects in the stream.
The object count is only available after parse(ProgressMonitor) has returned. The count may have been increased if the stream was a thin pack, and missing base objects were appended onto it by the subclass.
- Returns:
- number of objects parsed out of the stream.
-
getObject
Get the information about the requested object.
The object information is only available after parse(ProgressMonitor) has returned.
- Parameters:
nth - index of the object in the stream. Must be between 0 and getObjectCount()-1.
- Returns:
- the object information.
-
getSortedObjectList
Get all of the objects, sorted by their name.
The object information is only available after parse(ProgressMonitor) has returned.
To maintain lower memory usage and good runtime performance, this method sorts the objects in-place and therefore impacts the ordering presented by getObject(int).
- Parameters:
cmp - comparison function; if null, objects are sorted by ObjectId.
- Returns:
- sorted list of objects in this pack stream.
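The side effect of in-place sorting on indexed access can be shown with a small stand-in. This is a sketch only: SortedViewSketch is a hypothetical class, and plain strings take the place of PackedObjectInfo entries. It relies on Arrays.sort falling back to natural ordering when the comparator is null.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SortedViewSketch {
    private final String[] entries;

    SortedViewSketch(String... names) {
        this.entries = names.clone();
    }

    /** Indexed access, analogous to getObject(int). */
    String get(int nth) {
        return entries[nth];
    }

    /** Sorts the backing array itself and returns a list view of it. */
    List<String> sorted(Comparator<String> cmp) {
        Arrays.sort(entries, cmp); // a null comparator means natural ordering
        return Arrays.asList(entries);
    }
}
```

After sorted(...) has been called, get(0) returns the first entry in sorted order rather than the first entry in stream order, which is the behaviour the description warns about.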
-
getPackSize
public long getPackSize()
Get the size of the newly created pack.
This will also include the pack index size if an index was created. This method should only be called after pack parsing is finished.
- Returns:
- the pack size (including the index size) or -1 if the size cannot be determined
- Since:
- 3.3
-
getReceivedPackStatistics
Returns the statistics of the parsed pack.
This should only be called after pack parsing is finished.
- Returns:
- ReceivedPackStatistics
- Since:
- 4.6
-
parse
Parse the pack stream.
- Parameters:
progress - callback to provide progress feedback during parsing. If null, NullProgressMonitor will be used.
- Returns:
- the pack lock, if one was requested by setting setLockMessage(String).
- Throws:
IOException - the stream is malformed, or contains corrupt objects.
- Since:
- 3.0
-
parse
Parse the pack stream.
- Parameters:
receiving - receives progress feedback during the initial receiving objects phase. If null, NullProgressMonitor will be used.
resolving - receives progress feedback during the resolving objects phase.
- Returns:
- the pack lock, if one was requested by setting setLockMessage(String).
- Throws:
IOException - the stream is malformed, or contains corrupt objects.
- Since:
- 3.0
-
processDeltas
- Throws:
IOException
-
resolveDeltas
- Throws:
IOException
-
resolveDeltas
- Throws:
IOException
-
resolveDeltas
private void resolveDeltas(PackParser.DeltaVisit visit, int type, PackParser.ObjectTypeAndSize info, ProgressMonitor progress) throws IOException
- Throws:
IOException
-
checkIfTooLarge
- Throws:
IOException
-
readObjectHeader
protected PackParser.ObjectTypeAndSize readObjectHeader(PackParser.ObjectTypeAndSize info) throws IOException
Read the header of the current object.
After the header has been parsed, this method automatically invokes onObjectHeader(Source, byte[], int, int) to allow the implementation to update its internal checksums for the bytes read.
When this method returns the database will be positioned on the first byte of the deflated data stream.
- Parameters:
info - the info object to populate.
- Returns:
- info, after populating.
- Throws:
IOException - the size cannot be read.
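For orientation, the per-object header that precedes the deflated data in Git's pack format packs the type and inflated size into a variable-length byte sequence: the first byte carries a continuation bit, a 3-bit type, and the low 4 bits of the size; each following byte carries a continuation bit and 7 more size bits. The decoder below is a sketch of that encoding, not the code of readObjectHeader; ObjectHeaderSketch and decode are hypothetical names.

```java
final class ObjectHeaderSketch {
    /**
     * Decodes a pack entry header starting at raw[pos].
     * Returns { type code, inflated size, header length in bytes }.
     */
    static long[] decode(byte[] raw, int pos) {
        int p = pos;
        int c = raw[p++] & 0xff;
        int type = (c >> 4) & 7;      // bits 6..4 of the first byte
        long size = c & 0x0f;         // low 4 bits of the size
        int shift = 4;
        while ((c & 0x80) != 0) {     // high bit set: another size byte follows
            c = raw[p++] & 0xff;
            size |= ((long) (c & 0x7f)) << shift;
            shift += 7;
        }
        return new long[] { type, size, p - pos };
    }
}
```

The header length returned here corresponds to the number of bytes an implementation would see in a single onObjectHeader invocation.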
-
removeBaseById
-
reverse
-
firstChildOf
-
resolveDeltasWithExternalBases
- Throws:
IOException
-
growEntries
private void growEntries(int extraObjects) -
readPackHeader
- Throws:
IOException
-
endInput
private void endInput() -
indexOneObject
- Throws:
IOException
-
whole
- Throws:
IOException
-
verifySafeObject
protected void verifySafeObject(AnyObjectId id, int type, byte[] data) throws CorruptObjectException
Verify the integrity of the object.
- Parameters:
id - identity of the object to be checked.
type - the type of the object.
data - raw content of the object.
- Throws:
CorruptObjectException
- Since:
- 4.9
-
checkObjectCollision
- Throws:
IOException
-
checkObjectCollision
- Throws:
IOException
-
checkObjectCollision
private void checkObjectCollision(AnyObjectId obj, int type, byte[] data, long sizeBeforeInflating) throws IOException
- Throws:
IOException
-
streamPosition
private long streamPosition()
- Returns:
- current position of the input stream being parsed.
-
openDatabase
private PackParser.ObjectTypeAndSize openDatabase(PackedObjectInfo obj, PackParser.ObjectTypeAndSize info) throws IOException
- Throws:
IOException
-
openDatabase
private PackParser.ObjectTypeAndSize openDatabase(PackParser.UnresolvedDelta delta, PackParser.ObjectTypeAndSize info) throws IOException
- Throws:
IOException
-
readFrom
- Throws:
IOException
-
use
void use(int cnt) -
fill
- Throws:
IOException
-
sync
- Throws:
IOException
-
buffer
protected byte[] buffer()
Get a temporary byte array for use by the caller.
- Returns:
- a temporary byte array for use by the caller.
-
newInfo
protected PackedObjectInfo newInfo(AnyObjectId id, PackParser.UnresolvedDelta delta, ObjectId deltaBase)
Construct a PackedObjectInfo instance for this parser.
- Parameters:
id - identity of the object to be tracked.
delta - if the object was previously an unresolved delta, this is the delta object that was tracking it. Otherwise null.
deltaBase - if the object was previously an unresolved delta, this is the ObjectId of the base of the delta. The base may be outside of the pack stream if the stream was a thin-pack.
- Returns:
- info object containing this object's data.
-
setExpectedObjectCount
protected void setExpectedObjectCount(long expectedObjectCount)
Set the expected number of objects in the pack stream.
The object count in the pack header is not always correct for some Dfs pack files. For example, an INSERT pack always assumes 1 object in the header, since the actual object count is unknown when the pack is written.
If an external implementation wants to override the expectedObjectCount, it should call this method during onPackHeader(long).
- Parameters:
expectedObjectCount - a long.
- Since:
- 4.9
-
onStoreStream
Store bytes received from the raw stream.
This method is invoked during parse(ProgressMonitor) as data is consumed from the incoming stream. Implementors may use this event to archive the raw incoming stream to the destination repository in large chunks, without paying attention to object boundaries.
The only component of the pack not supplied to this method is the last 20 bytes of the pack that comprise the trailing SHA-1 checksum. Those are passed to onPackFooter(byte[]).
- Parameters:
raw - buffer to copy data out of.
pos - first offset within the buffer that is valid.
len - number of bytes in the buffer that are valid.
- Throws:
IOException - the stream cannot be archived.
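The split between the bytes handed to onStoreStream and the 20-byte trailer handed to onPackFooter can be sketched as follows. The sketch assumes, as Git's pack format describes, that the trailer is the SHA-1 of every byte before it. PackFooterSketch, appendFooter, and footerMatches are hypothetical names, and a whole pack held in one array stands in for the streamed input.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class PackFooterSketch {
    static final int FOOTER_LENGTH = 20; // size of a SHA-1 digest

    private static MessageDigest sha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Returns body followed by the SHA-1 of body (test helper). */
    static byte[] appendFooter(byte[] body) {
        byte[] hash = sha1().digest(body);
        byte[] pack = Arrays.copyOf(body, body.length + FOOTER_LENGTH);
        System.arraycopy(hash, 0, pack, body.length, FOOTER_LENGTH);
        return pack;
    }

    /** Checks that the last 20 bytes equal the SHA-1 of everything before them. */
    static boolean footerMatches(byte[] pack) {
        if (pack.length < FOOTER_LENGTH) {
            return false;
        }
        int bodyLen = pack.length - FOOTER_LENGTH;
        MessageDigest md = sha1();
        md.update(pack, 0, bodyLen); // the part a subclass would see via onStoreStream
        byte[] footer = Arrays.copyOfRange(pack, bodyLen, pack.length);
        return Arrays.equals(md.digest(), footer);
    }
}
```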
-
onObjectHeader
protected abstract void onObjectHeader(PackParser.Source src, byte[] raw, int pos, int len) throws IOException
Store (and/or checksum) an object header.
Invoked after any of the onBegin() events. The entire header is supplied in a single invocation, before any object data is supplied.
- Parameters:
src - where the data came from.
raw - buffer to read data from.
pos - first offset within buffer that is valid.
len - number of bytes in buffer that are valid.
- Throws:
IOException - the stream cannot be archived.
-
onObjectData
protected abstract void onObjectData(PackParser.Source src, byte[] raw, int pos, int len) throws IOException
Store (and/or checksum) a portion of an object's data.
This method may be invoked multiple times per object, depending on the size of the object, the size of the parser's internal read buffer, and the alignment of the object relative to the read buffer.
Invoked after onObjectHeader(Source, byte[], int, int).
- Parameters:
src - where the data came from.
raw - buffer to read data from.
pos - first offset within buffer that is valid.
len - number of bytes in buffer that are valid.
- Throws:
IOException - the stream cannot be archived.
-
onInflatedObjectData
protected abstract void onInflatedObjectData(PackedObjectInfo obj, int typeCode, byte[] data) throws IOException
Invoked for commits, trees, tags, and small blobs.
- Parameters:
obj - the object info, populated.
typeCode - the type of the object.
data - inflated data for the object.
- Throws:
IOException - the object cannot be archived.
-
onPackHeader
Provide the implementation with the original stream's pack header.
- Parameters:
objCnt - number of objects expected in the stream.
- Throws:
IOException - the implementation refuses to work with this many objects.
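As background for the objCnt parameter: in Git's pack format the stream begins with a 12-byte header holding the signature "PACK", a 4-byte version (2 or 3), and a 4-byte object count, all in network byte order. The sketch below reads that count as an unsigned 32-bit value, which is one reason a long is a natural type for it. It is an illustration only; PackHeaderSketch and readObjectCount are hypothetical names, not part of PackParser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class PackHeaderSketch {
    /** Parses a 12-byte pack header and returns the unsigned object count. */
    static long readObjectCount(byte[] header) {
        if (header.length < 12) {
            throw new IllegalArgumentException("pack header is 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default
        byte[] sig = new byte[4];
        buf.get(sig);
        if (!"PACK".equals(new String(sig, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("not a pack stream");
        }
        int version = buf.getInt();
        if (version != 2 && version != 3) {
            throw new IllegalArgumentException("unsupported pack version " + version);
        }
        return buf.getInt() & 0xffffffffL; // widen to an unsigned 32-bit count
    }
}
```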
-
onAppendBase
protected abstract boolean onAppendBase(int typeCode, byte[] data, PackedObjectInfo info) throws IOException
Provide the implementation with a base that was outside of the pack.
This event only occurs on a thin pack for base objects that were outside of the pack and came from the local repository. Usually an implementation uses this event to compress the base and append it onto the end of the pack, so the pack stays self-contained.
- Parameters:
typeCode - type of the base object.
data - complete content of the base object.
info - packed object information for this base. Implementors must populate the CRC and offset members if returning true.
- Returns:
- true if the info should be included in the object list returned by getSortedObjectList(Comparator), false if it should not be included.
- Throws:
IOException - the base could not be included into the pack.
-
onEndThinPack
Event indicating a thin pack has been completely processed.
This event is invoked only if a thin pack has delta references to objects external to the pack. The event is called after all of those deltas have been resolved.
- Throws:
IOException - the pack cannot be archived.
-
seekDatabase
protected abstract PackParser.ObjectTypeAndSize seekDatabase(PackedObjectInfo obj, PackParser.ObjectTypeAndSize info) throws IOException
Reposition the database to re-read a previously stored object.
If the database is computing CRC-32 checksums for object data, it should reset its internal CRC instance during this method call.
- Parameters:
obj - the object position to begin reading from. This is from newInfo(AnyObjectId, UnresolvedDelta, ObjectId).
info - object to populate with type and size.
- Returns:
- the info object.
- Throws:
IOException - the database cannot reposition to this location.
-
seekDatabase
protected abstract PackParser.ObjectTypeAndSize seekDatabase(PackParser.UnresolvedDelta delta, PackParser.ObjectTypeAndSize info) throws IOException
Reposition the database to re-read a previously stored object.
If the database is computing CRC-32 checksums for object data, it should reset its internal CRC instance during this method call.
- Parameters:
delta - the object position to begin reading from. This is an instance previously returned by onEndDelta().
info - object to populate with type and size.
- Returns:
- the info object.
- Throws:
IOException - the database cannot reposition to this location.
-
readDatabase
Read from the database's current position into the buffer.
- Parameters:
dst - the buffer to copy read data into.
pos - position within dst to start copying data into.
cnt - ideal target number of bytes to read. Actual read length may be shorter.
- Returns:
- number of bytes stored.
- Throws:
IOException - the database cannot be accessed.
-
checkCRC
protected abstract boolean checkCRC(int oldCRC)
Check the current CRC matches the expected value.
This method is invoked when an object is read back in from the database and its data is used during delta resolution. The CRC is validated after the object has been fully read, allowing the parser to verify there was no silent data corruption.
Implementations are free to ignore this check by always returning true if they are performing other data integrity validations at a lower level.
- Parameters:
oldCRC - the prior CRC that was recorded during the first scan of the object from the pack stream.
- Returns:
- true if the CRC matches; false if it does not.
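One way a subclass might satisfy this contract is to keep a java.util.zip.CRC32 instance that is reset when the database is repositioned, updated as bytes are read back, and compared against the recorded value at the end. The class below is a minimal sketch of that pattern under those assumptions; CrcRecheckSketch and its method names (other than checkCRC, which mirrors the documented signature) are hypothetical and it does not extend PackParser.

```java
import java.util.zip.CRC32;

final class CrcRecheckSketch {
    private final CRC32 crc = new CRC32();

    /** Called when repositioning to re-read an object. */
    void reset() {
        crc.reset();
    }

    /** Called for every chunk of header or data bytes read. */
    void update(byte[] raw, int pos, int len) {
        crc.update(raw, pos, len);
    }

    /** The CRC-32 of the bytes seen since the last reset, as a 32-bit int. */
    int value() {
        return (int) crc.getValue();
    }

    /** Mirrors the documented contract: true if the re-read bytes match the first scan. */
    boolean checkCRC(int oldCRC) {
        return oldCRC == value();
    }
}
```

An implementation that validates integrity at a lower layer could instead return true unconditionally, as the description allows.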
-
onBeginWholeObject
protected abstract void onBeginWholeObject(long streamPosition, int type, long inflatedSize) throws IOException
Event notifying the start of an object stored whole (not as a delta).
- Parameters:
streamPosition - position of this object in the incoming stream.
type - type of the object; one of Constants.OBJ_COMMIT, Constants.OBJ_TREE, Constants.OBJ_BLOB, or Constants.OBJ_TAG.
inflatedSize - size of the object when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
- Throws:
IOException - the object cannot be recorded.
-
onEndWholeObject
Event notifying the current object.
- Parameters:
info - object information.
- Throws:
IOException - the object cannot be recorded.
-
onBeginOfsDelta
protected abstract void onBeginOfsDelta(long deltaStreamPosition, long baseStreamPosition, long inflatedSize) throws IOException
Event notifying start of a delta referencing its base by offset.
- Parameters:
deltaStreamPosition - position of this object in the incoming stream.
baseStreamPosition - position of the base object in the incoming stream. The base must be before the delta, therefore baseStreamPosition < deltaStreamPosition. This is not the position returned by a prior end object event.
inflatedSize - size of the delta when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
- Throws:
IOException - the object cannot be recorded.
-
onBeginRefDelta
protected abstract void onBeginRefDelta(long deltaStreamPosition, AnyObjectId baseId, long inflatedSize) throws IOException
Event notifying start of a delta referencing its base by ObjectId.
- Parameters:
deltaStreamPosition - position of this object in the incoming stream.
baseId - name of the base object. This object may be later in the stream, or might not appear at all in the stream (in the case of a thin-pack).
inflatedSize - size of the delta when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
- Throws:
IOException - the object cannot be recorded.
-
onEndDelta
Event notifying the current object.
- Returns:
- object information that must be populated with at least the offset.
- Throws:
IOException - the object cannot be recorded.
-
inflateAndSkip
- Throws:
IOException
-
inflateAndReturn
- Throws:
IOException
-
inflate
- Throws:
IOException
-
addObjectAndTrack
-