Class SimilarityRenameDetector
Field Summary
Fields
Modifier and Type | Field | Description
private static final int | BITS_PER_INDEX | Number of bits we need to express an index into the src or dst list.
private List<DiffEntry> | dsts | All destinations to consider when looking for a rename.
private static final int | INDEX_MASK |
private long[] | matrix | Matrix of all examined file pairs, and their scores.
private List<DiffEntry> | out |
private ContentSource.Pair | reader |
private int | renameScore | Score a pair must exceed to be considered a rename.
private static final int | SCORE_SHIFT |
private List<DiffEntry> | srcs | All sources to consider for copies or renames.
private boolean | tableOverflow | Set if any SimilarityIndex.TableFullException occurs.
Constructor Summary
Constructors
SimilarityRenameDetector(ContentSource.Pair reader, List<DiffEntry> srcs, List<DiffEntry> dsts)
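Method Summary
Modifier and Type | Method
private int | buildMatrix(ProgressMonitor pm)
private static List<DiffEntry> | compactDstList(List<DiffEntry> in)
private static List<DiffEntry> | compactSrcList(List<DiffEntry> in)
(package private) void | compute(ProgressMonitor pm)
private static int | decodeFile(int v)
(package private) static int | dstFile(long value)
(package private) static long | encode(int score, int srcIdx, int dstIdx)
private static long | encodeFile(int idx)
(package private) List<DiffEntry> | getLeftOverDestinations()
(package private) List<DiffEntry> | getLeftOverSources()
(package private) List<DiffEntry> | getMatches()
private SimilarityIndex | hash(DiffEntry.Side side, DiffEntry ent)
private static boolean | isFile(FileMode mode)
(package private) boolean | isTableOverflow()
(package private) static int | nameScore(String a, String b)
private static int | score(long value)
(package private) void | setRenameScore(int score)
private long | size(DiffEntry.Side side, DiffEntry ent)
(package private) static int | srcFile(long value)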
Field Details
BITS_PER_INDEX
private static final int BITS_PER_INDEX
Number of bits we need to express an index into the src or dst list.
This must be 28, giving a limit of 2^28 (268,435,456) entries in either list, or 536,870,912 file names across both lists, an absurdly high limit for a single rename pass. Two 28-bit indices occupy 56 of the 64 bits in a long; the remaining 8 bits store the score, which must stay at or below 127 so that the sign bit is never set and the long never goes negative.
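The bit budget above can be checked with a few lines of Java. This is a minimal sketch: BitBudget and SCORE_BITS are hypothetical names, and the INDEX_MASK and SCORE_SHIFT values are assumptions derived from the description (a 28-bit mask and a shift of two index widths), not copied from the class.

```java
// Hypothetical sketch: verifies the bit budget described for BITS_PER_INDEX.
// INDEX_MASK and SCORE_SHIFT values are assumptions derived from the prose.
final class BitBudget {
    static final int BITS_PER_INDEX = 28;
    static final int INDEX_MASK = (1 << BITS_PER_INDEX) - 1; // low 28 bits set
    static final int SCORE_SHIFT = 2 * BITS_PER_INDEX;        // score starts at bit 56
    static final int SCORE_BITS = Long.SIZE - SCORE_SHIFT;    // 64 - 56 = 8

    public static void main(String[] args) {
        long perList = 1L << BITS_PER_INDEX;
        System.out.println("entries per list: " + perList);     // 268435456
        System.out.println("both lists: " + 2 * perList);       // 536870912
        System.out.println("index mask: " + INDEX_MASK);        // 268435455
        System.out.println("score bits: " + SCORE_BITS);        // 8

        // A score of 127 leaves the sign bit clear; 128 would set it.
        long max = 127L << SCORE_SHIFT;
        long tooBig = 128L << SCORE_SHIFT;
        System.out.println("127 non-negative: " + (max >= 0));    // true
        System.out.println("128 non-negative: " + (tooBig >= 0)); // false
    }
}
```

Shifting 128 into the score position yields Long.MIN_VALUE, which is why the score must stay below 128.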
INDEX_MASK
private static final int INDEX_MASK
SCORE_SHIFT
private static final int SCORE_SHIFT
reader
private ContentSource.Pair reader
srcs
All sources to consider for copies or renames.
A source is typically a DiffEntry.ChangeType.DELETE change, but could be another type when copy detection is performed concurrently with rename detection.
dsts
All destinations to consider when looking for a rename.
A destination is typically a DiffEntry.ChangeType.ADD change, as the name has just come into existence and we want to discover where its initial content came from.
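As a rough illustration of how srcs and dsts might be populated, the sketch below splits a list of changes by type. ChangeType, Change, CandidateSplit and the detectCopies switch are hypothetical simplifications, not JGit's DiffEntry API, and treating MODIFY entries as extra sources during copy detection is an assumption used only to show what "another type" could mean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splitting changes into rename/copy candidates.
// ChangeType and Change are simplified stand-ins, not JGit's DiffEntry.
final class CandidateSplit {
    enum ChangeType { ADD, DELETE, MODIFY }

    static final class Change {
        final ChangeType type;
        final String path;

        Change(ChangeType type, String path) {
            this.type = type;
            this.path = path;
        }

        @Override
        public String toString() {
            return type + " " + path;
        }
    }

    final List<Change> srcs = new ArrayList<>();
    final List<Change> dsts = new ArrayList<>();

    CandidateSplit(List<Change> changes, boolean detectCopies) {
        for (Change c : changes) {
            switch (c.type) {
                case DELETE:
                    srcs.add(c); // content disappeared: it may have moved
                    break;
                case ADD:
                    dsts.add(c); // new name: where did its content come from?
                    break;
                case MODIFY:
                    if (detectCopies) {
                        srcs.add(c); // assumption: a surviving file may have been copied
                    }
                    break;
            }
        }
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
                new Change(ChangeType.DELETE, "old/A.java"),
                new Change(ChangeType.ADD, "new/A.java"),
                new Change(ChangeType.MODIFY, "B.java"));
        CandidateSplit split = new CandidateSplit(changes, true);
        System.out.println("srcs=" + split.srcs);
        System.out.println("dsts=" + split.dsts);
    }
}
```

With detectCopies set to false, only the DELETE entry becomes a source, which matches the pure rename case described above.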
matrix
private long[] matrix
Matrix of all examined file pairs, and their scores.
The upper 8 bits of each long store the score, but the score is bounded to be below 128 so that the highest bit is never set, and all entries are therefore non-negative.
List indexes to an element of srcs and dsts are encoded as the lower two groups of 28 bits, respectively (the srcs index in bits 28 to 55, the dsts index in bits 0 to 27), but the encoding is inverted, so that 0 is expressed as (1 << 28) - 1. Because the matrix is sorted in ascending order and consumed from its highest entry downward, this places lower list indices later in the matrix, where they are reached first among pairs with equal scores, giving precedence to files whose names sort earlier in the tree.
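The following sketch packs and unpacks matrix entries as described above and demonstrates the resulting order. It is an illustration under assumptions, not the class's code: the method names mirror the documented encode, encodeFile, decodeFile, score, srcFile and dstFile, but their bodies, the range checks, and the consumption pattern (an ascending Arrays.sort followed by a scan from the highest entry down) are reconstructed from the field descriptions.

```java
import java.util.Arrays;

// Hypothetical sketch of a packed matrix entry:
// [ score: 8 bits | inverted src index: 28 bits | inverted dst index: 28 bits ]
final class MatrixEncoding {
    static final int BITS_PER_INDEX = 28;
    static final int INDEX_MASK = (1 << BITS_PER_INDEX) - 1;
    static final int SCORE_SHIFT = 2 * BITS_PER_INDEX;

    static long encode(int score, int srcIdx, int dstIdx) {
        if (score < 0 || score > 127) {
            throw new IllegalArgumentException("score out of range: " + score);
        }
        return ((long) score << SCORE_SHIFT)
                | (encodeFile(srcIdx) << BITS_PER_INDEX)
                | encodeFile(dstIdx);
    }

    // Inverted encoding: index 0 becomes INDEX_MASK, the largest value.
    private static long encodeFile(int idx) {
        if (idx < 0 || idx > INDEX_MASK) {
            throw new IllegalArgumentException("index out of range: " + idx);
        }
        return INDEX_MASK - idx;
    }

    private static int decodeFile(int v) {
        return INDEX_MASK - v;
    }

    static int score(long value) {
        return (int) (value >>> SCORE_SHIFT);
    }

    static int srcFile(long value) {
        return decodeFile((int) (value >>> BITS_PER_INDEX) & INDEX_MASK);
    }

    static int dstFile(long value) {
        return decodeFile((int) value & INDEX_MASK);
    }

    public static void main(String[] args) {
        long[] matrix = {
            encode(60, 2, 0),
            encode(90, 1, 1),
            encode(60, 0, 2),
        };
        // Ascending sort, then walk from the end: highest score first,
        // and among equal scores the lower src index first.
        Arrays.sort(matrix);
        for (int i = matrix.length - 1; i >= 0; i--) {
            long v = matrix[i];
            System.out.println("score=" + score(v)
                    + " src=" + srcFile(v) + " dst=" + dstFile(v));
        }
    }
}
```

With these three sample pairs the scan visits the score-90 pair first, then the two score-60 pairs, with src index 0 ahead of src index 2 because its inverted encoding is larger. Inverting the indices lets a plain numeric sort of long values carry the tie-break, so no comparator is needed.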
renameScore
private int renameScore
Score a pair must exceed to be considered a rename.
tableOverflow
private boolean tableOverflow
Set if any SimilarityIndex.TableFullException occurs.
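The tableOverflow field records a failure as a flag instead of propagating it. A minimal sketch of that pattern follows, assuming (the description does not confirm it) that a table-full failure only skips the affected entry and the pass continues; OverflowFlag, TableFullException, index and the 1000-byte cutoff are hypothetical stand-ins for SimilarityIndex.TableFullException and the hash step.

```java
import java.util.List;

// Hypothetical sketch of an overflow flag: a failure to index one entry
// is remembered, but does not abort the whole pass.
final class OverflowFlag {
    static final class TableFullException extends Exception {}

    private boolean tableOverflow;
    private int scoredCount;

    // Stand-in for hashing: pretend entries over 1000 bytes overflow the table.
    private static int index(int sizeInBytes) throws TableFullException {
        if (sizeInBytes > 1000) {
            throw new TableFullException();
        }
        return sizeInBytes;
    }

    void scoreAll(List<Integer> sizes) {
        for (int size : sizes) {
            try {
                index(size);
                scoredCount++;
            } catch (TableFullException e) {
                tableOverflow = true; // record the overflow, keep going
            }
        }
    }

    boolean isTableOverflow() {
        return tableOverflow;
    }

    int scored() {
        return scoredCount;
    }

    public static void main(String[] args) {
        OverflowFlag f = new OverflowFlag();
        f.scoreAll(List.of(10, 5000, 20));
        System.out.println("scored=" + f.scored() + " overflow=" + f.isTableOverflow());
    }
}
```

Here two of the three entries are scored and isTableOverflow() then returns true, so a caller can decide afterwards whether the partial result is acceptable.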
out
Constructor Details
SimilarityRenameDetector
SimilarityRenameDetector(ContentSource.Pair reader, List<DiffEntry> srcs, List<DiffEntry> dsts)
Method Details

setRenameScore
void setRenameScore(int score)

compute
Throws:
IOException
CancelledException

getMatches

getLeftOverSources

getLeftOverDestinations

isTableOverflow
boolean isTableOverflow()

compactSrcList

compactDstList

buildMatrix
Throws:
IOException
CancelledException

nameScore

hash
private SimilarityIndex hash(DiffEntry.Side side, DiffEntry ent) throws IOException, SimilarityIndex.TableFullException

size
private long size(DiffEntry.Side side, DiffEntry ent)
Throws:
IOException

score
private static int score(long value)

srcFile
static int srcFile(long value)

dstFile
static int dstFile(long value)

encode
static long encode(int score, int srcIdx, int dstIdx)

encodeFile
private static long encodeFile(int idx)

decodeFile
private static int decodeFile(int v)

isFile