Class RawParseUtils

java.lang.Object
org.eclipse.jgit.util.RawParseUtils

public final class RawParseUtils extends Object
Handy utility functions to parse raw object contents.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final byte[]
    base10byte
     
    private static final byte[]
    digits10
     
    private static final byte[]
    digits16
     
    private static final Map<String,Charset>
    encodingAliases
     
    private static final byte[]
    footerLineKeyChars
     
    static final Charset
    UTF8_CHARSET
    Deprecated.
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
    RawParseUtils()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static final int
    author(byte[] b, int ptr)
    Locate the "author " header line data.
    private static Charset
    charsetForAlias(String name)
     
    static final int
    commitMessage(byte[] b, int ptr)
    Locate the position of the commit message body.
    static final int
    committer(byte[] b, int ptr)
    Locate the "committer " header line data.
    static String
    decode(byte[] buffer)
    Decode a buffer under UTF-8, if possible.
    static String
    decode(byte[] buffer, int start, int end)
    Decode a buffer under UTF-8, if possible.
    private static String
    decode(ByteBuffer b, Charset charset)
     
    static String
    decode(Charset cs, byte[] buffer)
    Decode a buffer under the specified character set if possible.
    static String
    decode(Charset cs, byte[] buffer, int start, int end)
    Decode a region of the buffer under the specified character set if possible.
    static String
    decodeNoFallback(Charset cs, byte[] buffer, int start, int end)
    Decode a region of the buffer under the specified character set if possible.
    static final int
    encoding(byte[] b, int ptr)
    Locate the "encoding " header line.
    static int
    endOfFooterLineKey(byte[] raw, int ptr)
    Locate the end of a footer line key string.
    static final int
    endOfParagraph(byte[] b, int start)
    Locate the end of a paragraph.
    static String
    extractBinaryString(byte[] buffer, int start, int end)
    Decode a region of the buffer under the ISO-8859-1 encoding.
    static int
    formatBase10(byte[] b, int o, int value)
    Format a base 10 numeric into a temporary buffer.
    static final int
    headerEnd(byte[] b, int ptr)
    Locate the end of the header.
    static final int
    headerStart(byte[] headerName, byte[] b, int ptr)
    Find the start of the contents of a given header.
    static int
    lastIndexOfTrim(byte[] raw, char ch, int pos)
    Get last index of ch in raw, trimming spaces.
    static final IntList
    lineMap(byte[] buf, int ptr, int end)
    Index the region between [ptr, end) to find line starts.
    static final IntList
    lineMapOrBinary(byte[] buf, int ptr, int end)
    Like lineMap(byte[], int, int) but throw BinaryBlobException if a NUL byte is encountered.
    private static IntList
    lineMapOrNull(byte[] buf, int ptr, int end)
     
    static final int
    match(byte[] b, int ptr, byte[] src)
    Determine if b[ptr] matches src.
    static final int
    next(byte[] b, int ptr, char chrA)
    Locate the first position after a given character.
    static final int
    nextLF(byte[] b, int ptr)
    Locate the first position after the next LF.
    static final int
    nextLF(byte[] b, int ptr, char chrA)
    Locate the first position after either the given character or LF.
    static final int
    parseBase10(byte[] b, int ptr, MutableInteger ptrResult)
    Parse a base 10 numeric from a sequence of ASCII digits into an int.
    static Charset
    parseEncoding(byte[] b)
    Parse the "encoding " header into a character set reference.
    static String
    parseEncodingName(byte[] b)
    Parse the "encoding " header as a string.
    static final int
    parseHexInt16(byte[] bs, int p)
    Parse 4 character base 16 (hex) formatted string to unsigned integer.
    static final int
    parseHexInt32(byte[] bs, int p)
    Parse 8 character base 16 (hex) formatted string to unsigned integer.
    static final int
    parseHexInt4(byte digit)
    Parse a single hex digit to its numeric value (0-15).
    static final long
    parseHexInt64(byte[] bs, int p)
    Parse 16 character base 16 (hex) formatted string to unsigned long.
    static final long
    parseLongBase10(byte[] b, int ptr, MutableInteger ptrResult)
    Parse a base 10 numeric from a sequence of ASCII digits into a long.
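    static PersonIdent
    parsePersonIdent(byte[] raw, int nameB)
    Parse a name line (e.g. author, committer, tagger) into a PersonIdent.
    static PersonIdent
    parsePersonIdent(String in)
    Parse a name string (e.g. author, committer, tagger) into a PersonIdent.
    static PersonIdent
    parsePersonIdentOnly(byte[] raw, int nameB)
    Parse name data (e.g. as within a reflog) into a PersonIdent.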
    static final int
    parseTimeZoneOffset(byte[] b, int ptr)
    Parse a Git style timezone string.
    static final int
    parseTimeZoneOffset(byte[] b, int ptr, MutableInteger ptrResult)
    Parse a Git style timezone string.
    static final int
    prev(byte[] b, int ptr, char chrA)
    Locate the first position before a given character.
    static final int
    prevLF(byte[] b, int ptr)
    Locate the first position before the previous LF.
    static final int
    prevLF(byte[] b, int ptr, char chrA)
    Locate the previous position before either the given character or LF.
    static final int
    tagger(byte[] b, int ptr)
    Locate the "tagger " header line data.
    static final int
    tagMessage(byte[] b, int ptr)
    Locate the position of the tag message body.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • UTF8_CHARSET

      @Deprecated public static final Charset UTF8_CHARSET
      Deprecated.
      UTF-8 charset constant.
      Since:
      2.2
    • digits10

      private static final byte[] digits10
    • digits16

      private static final byte[] digits16
    • footerLineKeyChars

      private static final byte[] footerLineKeyChars
    • encodingAliases

      private static final Map<String,Charset> encodingAliases
    • base10byte

      private static final byte[] base10byte
  • Constructor Details

    • RawParseUtils

      private RawParseUtils()
  • Method Details

    • match

      public static final int match(byte[] b, int ptr, byte[] src)
      Determine if b[ptr] matches src.
      Parameters:
      b - the buffer to scan.
      ptr - first position within b, this should match src[0].
      src - the buffer to test for equality with b.
      Returns:
      ptr + src.length if b[ptr..ptr + src.length) == src; else -1.
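The return convention can be illustrated with a standalone sketch. The class `MatchSketch` below is hypothetical and re-implements the documented behaviour in plain Java; it is not the JGit source, and the explicit bounds check is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;

final class MatchSketch {
    // Returns ptr + src.length when the bytes starting at b[ptr] equal src, else -1.
    static int match(byte[] b, int ptr, byte[] src) {
        if (ptr < 0 || ptr + src.length > b.length) {
            return -1; // assumption: too few bytes left counts as "no match"
        }
        for (int i = 0; i < src.length; i++) {
            if (b[ptr + i] != src[i]) {
                return -1;
            }
        }
        return ptr + src.length;
    }

    public static void main(String[] args) {
        byte[] line = "author A. U. Thor".getBytes(StandardCharsets.US_ASCII);
        byte[] key = "author ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(match(line, 0, key)); // position just after the matched key
        System.out.println(match(line, 1, key)); // no match at this offset
    }
}
```

Because a successful call returns the position just past the matched bytes, the result can be fed directly into the next parsing step.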
    • formatBase10

      public static int formatBase10(byte[] b, int o, int value)
      Format a base 10 numeric into a temporary buffer.

      Formatting is performed backwards. The method starts at offset o-1 and ends at o-1-digits, where digits is the number of positions necessary to store the base 10 value.

      The argument and return values from this method make it easy to chain writing, for example:

       final byte[] tmp = new byte[64];
       int ptr = tmp.length;
       tmp[--ptr] = '\n';
       ptr = RawParseUtils.formatBase10(tmp, ptr, 32);
       tmp[--ptr] = ' ';
       ptr = RawParseUtils.formatBase10(tmp, ptr, 18);
       tmp[--ptr] = 0;
       final String str = new String(tmp, ptr, tmp.length - ptr);
       
      Parameters:
      b - buffer to write into.
      o - one offset past the location where writing will begin; writing proceeds towards lower index values.
      value - the value to store.
      Returns:
      the new offset value o. This is the position of the last byte written. Additional writing should start at one position earlier.
    • parseBase10

      public static final int parseBase10(byte[] b, int ptr, MutableInteger ptrResult)
      Parse a base 10 numeric from a sequence of ASCII digits into an int.

      Digit sequences can begin with an optional run of spaces before the sequence, and may start with a '+' or a '-' to indicate sign position. Any other characters will cause the method to stop and return the current result to the caller.

      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start parsing digits at.
      ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.
      Returns:
      the value at this location; 0 if the location is not a valid numeric.
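The scanning rules above (leading spaces, optional sign, stop at the first non-digit) can be sketched in plain Java. This is an illustrative re-implementation, not the JGit code: the class name `ParseBase10Sketch` is hypothetical, and a one-element `int[]` stands in for `MutableInteger`.

```java
import java.nio.charset.StandardCharsets;

final class ParseBase10Sketch {
    // endOut may be null; otherwise endOut[0] receives the position where parsing stopped.
    static int parseBase10(byte[] b, int ptr, int[] endOut) {
        int n = b.length;
        while (ptr < n && b[ptr] == ' ') {
            ptr++; // skip the optional run of leading spaces
        }
        boolean negative = false;
        if (ptr < n && (b[ptr] == '-' || b[ptr] == '+')) {
            negative = b[ptr] == '-';
            ptr++;
        }
        int r = 0;
        while (ptr < n && b[ptr] >= '0' && b[ptr] <= '9') {
            r = r * 10 + (b[ptr] - '0');
            ptr++;
        }
        if (endOut != null) {
            endOut[0] = ptr;
        }
        return negative ? -r : r;
    }

    public static void main(String[] args) {
        int[] end = new int[1];
        byte[] raw = " -42 rest".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseBase10(raw, 0, end) + " stopped at " + end[0]);
    }
}
```

Input that contains no digits at the starting position yields 0, which matches the documented return value for a location that is not a valid numeric.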
    • parseLongBase10

      public static final long parseLongBase10(byte[] b, int ptr, MutableInteger ptrResult)
      Parse a base 10 numeric from a sequence of ASCII digits into a long.

      Digit sequences can begin with an optional run of spaces before the sequence, and may start with a '+' or a '-' to indicate sign position. Any other characters will cause the method to stop and return the current result to the caller.

      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start parsing digits at.
      ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.
      Returns:
      the value at this location; 0 if the location is not a valid numeric.
    • parseHexInt16

      public static final int parseHexInt16(byte[] bs, int p)
      Parse 4 character base 16 (hex) formatted string to unsigned integer.

      The number is read in network byte order, that is, most significant nibble first.

      Parameters:
      bs - buffer to parse digits from; positions [p, p+4) will be parsed.
      p - first position within the buffer to parse.
      Returns:
      the integer value.
      Throws:
      ArrayIndexOutOfBoundsException - if the string is not hex formatted.
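Reading the most significant nibble first means each new digit shifts the accumulated value left by four bits. The sketch below shows that idea in plain Java. `HexParseSketch` is a hypothetical class, not the JGit implementation, and it throws `IllegalArgumentException` on bad input, whereas the documented method throws `ArrayIndexOutOfBoundsException`.

```java
import java.nio.charset.StandardCharsets;

final class HexParseSketch {
    // Convert one ASCII hex digit to its value 0-15.
    static int parseHexInt4(byte digit) {
        if (digit >= '0' && digit <= '9') return digit - '0';
        if (digit >= 'a' && digit <= 'f') return digit - 'a' + 10;
        if (digit >= 'A' && digit <= 'F') return digit - 'A' + 10;
        throw new IllegalArgumentException("not a hex digit: " + digit);
    }

    // Parse the four hex digits at positions [p, p+4), most significant nibble first.
    static int parseHexInt16(byte[] bs, int p) {
        int r = 0;
        for (int i = 0; i < 4; i++) {
            r = (r << 4) | parseHexInt4(bs[p + i]);
        }
        return r;
    }

    public static void main(String[] args) {
        byte[] bs = "00ff".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseHexInt16(bs, 0));
    }
}
```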
    • parseHexInt32

      public static final int parseHexInt32(byte[] bs, int p)
      Parse 8 character base 16 (hex) formatted string to unsigned integer.

      The number is read in network byte order, that is, most significant nibble first.

      Parameters:
      bs - buffer to parse digits from; positions [p, p+8) will be parsed.
      p - first position within the buffer to parse.
      Returns:
      the integer value.
      Throws:
      ArrayIndexOutOfBoundsException - if the string is not hex formatted.
    • parseHexInt64

      public static final long parseHexInt64(byte[] bs, int p)
      Parse 16 character base 16 (hex) formatted string to unsigned long.

      The number is read in network byte order, that is, most significant nibble first.

      Parameters:
      bs - buffer to parse digits from; positions [p, p+16) will be parsed.
      p - first position within the buffer to parse.
      Returns:
      the integer value.
      Throws:
      ArrayIndexOutOfBoundsException - if the string is not hex formatted.
      Since:
      4.3
    • parseHexInt4

      public static final int parseHexInt4(byte digit)
      Parse a single hex digit to its numeric value (0-15).
      Parameters:
      digit - hex character to parse.
      Returns:
      numeric value, in the range 0-15.
      Throws:
      ArrayIndexOutOfBoundsException - if the input digit is not a valid hex digit.
    • parseTimeZoneOffset

      public static final int parseTimeZoneOffset(byte[] b, int ptr)
      Parse a Git style timezone string.

      The sequence "-0315" will be parsed as the numeric value -195, as the lower two positions count minutes, not 100ths of an hour.

      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start parsing digits at.
      Returns:
      the timezone at this location, expressed in minutes.
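The "-0315" example works out as 3 hours × 60 + 15 minutes = 195 minutes, with the sign applied afterwards. A minimal sketch of that arithmetic, assuming the zone is already available as a `String` (the real method reads from a `byte[]`), and using the hypothetical class name `TimeZoneOffsetSketch`:

```java
final class TimeZoneOffsetSketch {
    // Convert a signed "hhmm" number such as -315 into minutes such as -195.
    static int toMinutes(int hhmm) {
        int abs = Math.abs(hhmm);
        int minutes = (abs / 100) * 60 + (abs % 100);
        return hhmm < 0 ? -minutes : minutes;
    }

    // Parse a Git style zone such as "-0315" or "+0530".
    static int parseTimeZoneOffset(String tz) {
        return toMinutes(Integer.parseInt(tz.trim()));
    }

    public static void main(String[] args) {
        System.out.println(parseTimeZoneOffset("-0315")); // prints -195
        System.out.println(parseTimeZoneOffset("+0530")); // prints 330
    }
}
```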
    • parseTimeZoneOffset

      public static final int parseTimeZoneOffset(byte[] b, int ptr, MutableInteger ptrResult)
      Parse a Git style timezone string.

      The sequence "-0315" will be parsed as the numeric value -195, as the lower two positions count minutes, not 100ths of an hour.

      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start parsing digits at.
      ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.
      Returns:
      the timezone at this location, expressed in minutes.
      Since:
      4.1
    • next

      public static final int next(byte[] b, int ptr, char chrA)
      Locate the first position after a given character.
      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start looking for chrA at.
      chrA - character to find.
      Returns:
      new position just after chrA.
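A forward scan of this kind can be sketched as follows. `NextSketch` is a hypothetical standalone class; returning `b.length` when the character is never found is an assumption of this sketch, since the description above only covers the case where it is found.

```java
import java.nio.charset.StandardCharsets;

final class NextSketch {
    // Return the position just after the first chrA at or after ptr.
    static int next(byte[] b, int ptr, char chrA) {
        while (ptr < b.length) {
            if (b[ptr++] == chrA) {
                return ptr;
            }
        }
        return ptr; // assumption: end of buffer when chrA is absent
    }

    public static void main(String[] args) {
        byte[] b = "a b c".getBytes(StandardCharsets.US_ASCII);
        int afterFirstSpace = next(b, 0, ' ');
        int afterSecondSpace = next(b, afterFirstSpace, ' ');
        System.out.println(afterFirstSpace + " " + afterSecondSpace);
    }
}
```

Since the result points just past the match, repeated calls walk through a buffer token by token without extra index arithmetic.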
    • nextLF

      public static final int nextLF(byte[] b, int ptr)
      Locate the first position after the next LF.

      This method stops on the first '\n' it finds.

      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start looking for LF at.
      Returns:
      new position just after the first LF found.
    • nextLF

      public static final int nextLF(byte[] b, int ptr, char chrA)
      Locate the first position after either the given character or LF.

      This method stops on the first match it finds from either chrA or '\n'.

      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start looking for chrA or LF at.
      chrA - character to find.
      Returns:
      new position just after the first chrA or LF to be found.
    • headerEnd

      public static final int headerEnd(byte[] b, int ptr)
      Locate the end of the header. Note that headers may be more than one line long.
      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start looking for the end-of-header.
      Returns:
      new position just after the header. This is either b.length, or the index of the header's terminating newline.
      Since:
      5.1
    • headerStart

      public static final int headerStart(byte[] headerName, byte[] b, int ptr)
      Find the start of the contents of a given header.
      Parameters:
      headerName - header to search for
      b - buffer to scan.
      ptr - position within buffer to start looking for header at.
      Returns:
      new position at the start of the header's contents, -1 for not found
      Since:
      5.1
    • prev

      public static final int prev(byte[] b, int ptr, char chrA)
      Locate the first position before a given character.
      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start looking for chrA at.
      chrA - character to find.
      Returns:
      new position just before chrA, -1 for not found
    • prevLF

      public static final int prevLF(byte[] b, int ptr)
      Locate the first position before the previous LF.

      This method stops on the first '\n' it finds.

      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start looking for LF at.
      Returns:
      new position just before the first LF found, -1 for not found
    • prevLF

      public static final int prevLF(byte[] b, int ptr, char chrA)
      Locate the previous position before either the given character or LF.

      This method stops on the first match it finds from either chrA or '\n'.

      Parameters:
      b - buffer to scan.
      ptr - position within buffer to start looking for chrA or LF at.
      chrA - character to find.
      Returns:
      new position just before the first chrA or LF to be found, -1 for not found
    • lineMap

      public static final IntList lineMap(byte[] buf, int ptr, int end)
      Index the region between [ptr, end) to find line starts.

      The returned list is 1 indexed. Index 0 contains Integer.MIN_VALUE to pad the list out.

      Using a 1 indexed list means that line numbers can be directly accessed from the list, so list.get(1) (aka get line 1) returns ptr.

      The last element (index map.size()-1) always contains end.

      Parameters:
      buf - buffer to scan.
      ptr - position within the buffer corresponding to the first byte of line 1.
      end - 1 past the end of the content within buf.
      Returns:
      a line map indicating the starting position of each line.
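The 1-indexed layout is easiest to see on a small buffer. The sketch below builds the same shape of map with a `java.util.List<Integer>` in place of JGit's `IntList`. The class `LineMapSketch` is hypothetical and only illustrates the documented layout (padding at index 0, line starts from index 1, `end` as the last element).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LineMapSketch {
    static List<Integer> lineMap(byte[] buf, int ptr, int end) {
        List<Integer> map = new ArrayList<>();
        map.add(Integer.MIN_VALUE); // index 0 is padding so that line 1 sits at index 1
        boolean atLineStart = true;
        for (; ptr < end; ptr++) {
            if (atLineStart) {
                map.add(ptr); // record where this line begins
            }
            atLineStart = buf[ptr] == '\n';
        }
        map.add(end); // the last element always holds end
        return map;
    }

    public static void main(String[] args) {
        byte[] buf = "a\nbc\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(lineMap(buf, 0, buf.length)); // prints [-2147483648, 0, 2, 5]
    }
}
```

With this layout, line n spans the bytes from `map.get(n)` up to, but not including, `map.get(n + 1)`.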
    • lineMapOrBinary

      public static final IntList lineMapOrBinary(byte[] buf, int ptr, int end) throws BinaryBlobException
      Like lineMap(byte[], int, int) but throw BinaryBlobException if a NUL byte is encountered.
      Parameters:
      buf - buffer to scan.
      ptr - position within the buffer corresponding to the first byte of line 1.
      end - 1 past the end of the content within buf.
      Returns:
      a line map indicating the starting position of each line.
      Throws:
      BinaryBlobException - if a NUL byte is found.
      Since:
      5.0
    • lineMapOrNull

      @Nullable private static IntList lineMapOrNull(byte[] buf, int ptr, int end)
    • author

      public static final int author(byte[] b, int ptr)
      Locate the "author " header line data.
      Parameters:
      b - buffer to scan.
      ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at message body.
      Returns:
      position just after the space in "author ", so the first character of the author's name. If no author header can be located -1 is returned.
    • committer

      public static final int committer(byte[] b, int ptr)
      Locate the "committer " header line data.
      Parameters:
      b - buffer to scan.
      ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at message body.
      Returns:
      position just after the space in "committer ", so the first character of the committer's name. If no committer header can be located -1 is returned.
    • tagger

      public static final int tagger(byte[] b, int ptr)
      Locate the "tagger " header line data.
      Parameters:
      b - buffer to scan.
      ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer and does not accidentally look at message body.
      Returns:
      position just after the space in "tagger ", so the first character of the tagger's name. If no tagger header can be located -1 is returned.
    • encoding

      public static final int encoding(byte[] b, int ptr)
      Locate the "encoding " header line.
      Parameters:
      b - buffer to scan.
      ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the buffer and does not accidentally look at the message body.
      Returns:
      position just after the space in "encoding ", so the first character of the encoding's name. If no encoding header can be located -1 is returned (and UTF-8 should be assumed).
    • parseEncodingName

      @Nullable public static String parseEncodingName(byte[] b)
      Parse the "encoding " header as a string.

      Locates the "encoding " header (if present) and returns its value.

      Parameters:
      b - buffer to scan.
      Returns:
      the encoding header as specified in the commit; null if the header was not present, in which case UTF-8 should be assumed.
      Since:
      4.2
    • parseEncoding

      public static Charset parseEncoding(byte[] b)
      Parse the "encoding " header into a character set reference.

      Locates the "encoding " header (if present) by first calling encoding(byte[], int) and then returns the proper character set to apply to this buffer to evaluate its contents as character data.

      If no encoding header is present UTF-8 is assumed.

      Parameters:
      b - buffer to scan.
      Returns:
      the Java character set representation. Never null.
      Throws:
      IllegalCharsetNameException - if the character set requested by the encoding header is malformed and unsupportable.
      UnsupportedCharsetException - if the JRE does not support the character set requested by the encoding header.
    • parsePersonIdent

      public static PersonIdent parsePersonIdent(String in)
      Parse a name string (e.g. author, committer, tagger) into a PersonIdent.

      Leading spaces won't be trimmed from the string, i.e. will show up in the parsed name afterwards.

      Parameters:
      in - the string to parse a name from.
      Returns:
      the parsed identity or null in case the identity could not be parsed.
    • parsePersonIdent

      public static PersonIdent parsePersonIdent(byte[] raw, int nameB)
      Parse a name line (e.g. author, committer, tagger) into a PersonIdent.

      When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.

      Parameters:
      raw - the buffer to parse character data from.
      nameB - first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.
      Returns:
      the parsed identity or null in case the identity could not be parsed.
    • parsePersonIdentOnly

      public static PersonIdent parsePersonIdentOnly(byte[] raw, int nameB)
      Parse name data (e.g. as within a reflog) into a PersonIdent.

      When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.

      Parameters:
      raw - the buffer to parse character data from.
      nameB - first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.
      Returns:
      the parsed identity. Never null.
    • endOfFooterLineKey

      public static int endOfFooterLineKey(byte[] raw, int ptr)
      Locate the end of a footer line key string.

      If the region at raw[ptr] matches ^[A-Za-z0-9-]+: (e.g. "Signed-off-by: A. U. Thor\n") then this method returns the position of the first ':'.

      If the region at raw[ptr] does not match ^[A-Za-z0-9-]+: then this method returns -1.

      Parameters:
      raw - buffer to scan.
      ptr - first position within raw to consider as a footer line key.
      Returns:
      position of the ':' which terminates the footer line key if this is otherwise a valid footer line key; otherwise -1.
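The pattern check can be sketched with `java.util.regex`. This is an illustration that works on a `String` instead of a `byte[]`; the class name `FooterKeySketch` is hypothetical and the code is not taken from JGit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FooterKeySketch {
    private static final Pattern KEY = Pattern.compile("[A-Za-z0-9-]+:");

    // Return the index of the ':' ending a footer key that starts at ptr, else -1.
    static int endOfFooterLineKey(String raw, int ptr) {
        Matcher m = KEY.matcher(raw);
        m.region(ptr, raw.length());
        // lookingAt() anchors the match at the start of the region, like '^'.
        return m.lookingAt() ? m.end() - 1 : -1;
    }

    public static void main(String[] args) {
        String footer = "Signed-off-by: A. U. Thor\n";
        System.out.println(endOfFooterLineKey(footer, 0)); // prints 13
        System.out.println(endOfFooterLineKey("not a footer\n", 0)); // prints -1
    }
}
```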
    • decode

      public static String decode(byte[] buffer)
      Decode a buffer under UTF-8, if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
      Parameters:
      buffer - buffer to pull raw bytes from.
      Returns:
      a string representation of the entire buffer, after decoding it as UTF-8 or, failing that, through one of the fallback character sets.
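The fallback order described above (UTF-8, then the platform default, then ISO-8859-1) can be sketched with strict `CharsetDecoder` instances that report errors instead of substituting replacement characters. `DecodeFallbackSketch` is a hypothetical class written for illustration; it is not the JGit implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeFallbackSketch {
    // Decode strictly: any malformed or unmappable input raises an exception.
    static String strictDecode(byte[] buffer, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(buffer))
                .toString();
    }

    static String decode(byte[] buffer) {
        Charset[] attempts = { StandardCharsets.UTF_8, Charset.defaultCharset() };
        for (Charset cs : attempts) {
            try {
                return strictDecode(buffer, cs);
            } catch (CharacterCodingException e) {
                // fall through to the next candidate
            }
        }
        // ISO-8859-1 maps every byte to a character, so this step cannot fail.
        return new String(buffer, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8).length());
    }
}
```

ISO-8859-1 works as the final step because every possible byte value corresponds to exactly one character in that encoding.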
    • decode

      public static String decode(byte[] buffer, int start, int end)
      Decode a buffer under UTF-8, if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
      Parameters:
      buffer - buffer to pull raw bytes from.
      start - start position in buffer
      end - one position past the last location within the buffer to take data from.
      Returns:
      a string representation of the range [start,end), after decoding the region as UTF-8 or, failing that, through one of the fallback character sets.
    • decode

      public static String decode(Charset cs, byte[] buffer)
      Decode a buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
      Parameters:
      cs - character set to use when decoding the buffer.
      buffer - buffer to pull raw bytes from.
      Returns:
      a string representation of the entire buffer, after decoding it through the specified character set.
    • decode

      public static String decode(Charset cs, byte[] buffer, int start, int end)
      Decode a region of the buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
      Parameters:
      cs - character set to use when decoding the buffer.
      buffer - buffer to pull raw bytes from.
      start - first position within the buffer to take data from.
      end - one position past the last location within the buffer to take data from.
      Returns:
      a string representation of the range [start,end), after decoding the region through the specified character set.
    • decodeNoFallback

      public static String decodeNoFallback(Charset cs, byte[] buffer, int start, int end) throws CharacterCodingException
      Decode a region of the buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, an exception is thrown.
      Parameters:
      cs - character set to use when decoding the buffer.
      buffer - buffer to pull raw bytes from.
      start - first position within the buffer to take data from.
      end - one position past the last location within the buffer to take data from.
      Returns:
      a string representation of the range [start,end), after decoding the region through the specified character set.
      Throws:
      CharacterCodingException - the input is not in any of the tested character sets.
    • extractBinaryString

      public static String extractBinaryString(byte[] buffer, int start, int end)
      Decode a region of the buffer under the ISO-8859-1 encoding. Each byte is treated as a single character in the 8859-1 character encoding, performing a raw binary->char conversion.
      Parameters:
      buffer - buffer to pull raw bytes from.
      start - first position within the buffer to take data from.
      end - one position past the last location within the buffer to take data from.
      Returns:
      a string representation of the range [start,end).
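The raw byte-to-char conversion amounts to widening each byte to its unsigned value. A minimal sketch, using the hypothetical class name `BinaryStringSketch` and not taken from the JGit source:

```java
import java.nio.charset.StandardCharsets;

final class BinaryStringSketch {
    static String extractBinaryString(byte[] buffer, int start, int end) {
        StringBuilder r = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            // Mask to 0..255 so that bytes above 0x7f do not become negative chars.
            r.append((char) (buffer[i] & 0xff));
        }
        return r.toString();
    }

    public static void main(String[] args) {
        byte[] raw = { 0x41, (byte) 0xFF, 0x00 };
        String s = extractBinaryString(raw, 0, raw.length);
        // The same result is produced by the JDK's ISO-8859-1 decoder.
        System.out.println(s.equals(new String(raw, StandardCharsets.ISO_8859_1)));
    }
}
```

The conversion is lossless: casting each char of the result back to a byte recovers the original data.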
    • decode

      private static String decode(ByteBuffer b, Charset charset) throws CharacterCodingException
      Throws:
      CharacterCodingException
    • commitMessage

      public static final int commitMessage(byte[] b, int ptr)
      Locate the position of the commit message body.
      Parameters:
      b - buffer to scan.
      ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer.
      Returns:
      position of the user's message buffer.
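In a commit object the message body follows the first empty line after the headers. The sketch below locates that position by walking header lines until it reaches an empty one. It is a simplified illustration: `MessageStartSketch` is a hypothetical class, returning -1 when no empty line exists is an assumption of this sketch, and the real method may skip known headers differently.

```java
import java.nio.charset.StandardCharsets;

final class MessageStartSketch {
    // Return the position of the first byte after the blank line that ends the headers.
    static int messageStart(byte[] b, int ptr) {
        while (ptr < b.length && b[ptr] != '\n') {
            // Skip the rest of this header line.
            while (ptr < b.length && b[ptr] != '\n') {
                ptr++;
            }
            ptr++; // step over the LF that ends the line
        }
        return ptr < b.length ? ptr + 1 : -1; // assumption: -1 when no body separator exists
    }

    public static void main(String[] args) {
        String commit = "tree x\nauthor a\n\nSubject\n";
        byte[] raw = commit.getBytes(StandardCharsets.US_ASCII);
        int pos = messageStart(raw, 0);
        System.out.println(commit.substring(pos));
    }
}
```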
    • tagMessage

      public static final int tagMessage(byte[] b, int ptr)
      Locate the position of the tag message body.
      Parameters:
      b - buffer to scan.
      ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer.
      Returns:
      position of the user's message buffer.
    • endOfParagraph

      public static final int endOfParagraph(byte[] b, int start)
      Locate the end of a paragraph.

      A paragraph is ended by two consecutive LF bytes or CRLF pairs.

      Parameters:
      b - buffer to scan.
      start - position in buffer to start the scan at. Most callers will want to pass the first position of the commit message (as found by commitMessage(byte[], int)).
      Returns:
      position of the LF at the end of the paragraph; b.length if no paragraph end could be located.
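A reduced version of the paragraph scan, handling only the LF case, might look like the sketch below. `ParagraphEndSketch` is a hypothetical class; it ignores CRLF pairs for brevity and follows the return convention stated above, so it should be read as an illustration and not as the JGit implementation.

```java
import java.nio.charset.StandardCharsets;

final class ParagraphEndSketch {
    // Return the index of the first LF in the first "\n\n" pair at or after start,
    // or b.length when the buffer contains no such pair.
    static int endOfParagraph(byte[] b, int start) {
        for (int i = start; i + 1 < b.length; i++) {
            if (b[i] == '\n' && b[i + 1] == '\n') {
                return i;
            }
        }
        return b.length;
    }

    public static void main(String[] args) {
        String msg = "Subject\n\nBody";
        byte[] raw = msg.getBytes(StandardCharsets.US_ASCII);
        System.out.println(msg.substring(0, endOfParagraph(raw, 0))); // prints Subject
    }
}
```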
    • lastIndexOfTrim

      public static int lastIndexOfTrim(byte[] raw, char ch, int pos)
      Get last index of ch in raw, trimming spaces.
      Parameters:
      raw - buffer to scan.
      ch - character to find.
      pos - starting position.
      Returns:
      last index of ch in raw, trimming spaces.
      Since:
      4.1
    • charsetForAlias

      private static Charset charsetForAlias(String name)