Class Lexer

java.lang.Object
org.apache.commons.csv.Lexer
All Implemented Interfaces:
Closeable, AutoCloseable

final class Lexer extends Object implements Closeable
Lexical analyzer.
  • Field Details

    • CR_STRING

      private static final String CR_STRING
    • LF_STRING

      private static final String LF_STRING
    • delimiter

      private final char[] delimiter
    • delimiterBuf

      private final char[] delimiterBuf
    • escapeDelimiterBuf

      private final char[] escapeDelimiterBuf
    • escape

      private final int escape
    • quoteChar

      private final int quoteChar
    • commentStart

      private final int commentStart
    • ignoreSurroundingSpaces

      private final boolean ignoreSurroundingSpaces
    • ignoreEmptyLines

      private final boolean ignoreEmptyLines
    • lenientEof

      private final boolean lenientEof
    • trailingData

      private final boolean trailingData
    • reader

      private final ExtendedBufferedReader reader
      The buffered reader.
    • firstEol

      private String firstEol
    • isLastTokenDelimiter

      private boolean isLastTokenDelimiter
  • Constructor Details

  • Method Details

    • appendNextEscapedCharacterToToken

      private void appendNextEscapedCharacterToToken(Token token) throws IOException
      Appends the next escaped character to the token's content.
      Parameters:
      token - the current token
      Throws:
      IOException - on stream access error
      CSVException - Thrown on invalid input.
    • close

      public void close() throws IOException
      Closes resources.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - If an I/O error occurs
    • getBytesRead

      long getBytesRead()
      Gets the number of bytes read.
      Returns:
      the number of bytes read
    • getCharacterPosition

      long getCharacterPosition()
      Returns the current character position.
      Returns:
      the current character position
    • getCurrentLineNumber

      long getCurrentLineNumber()
      Returns the current line number.
      Returns:
      the current line number
    • getFirstEol

      String getFirstEol()
    • isClosed

      boolean isClosed()
    • isCommentStart

      boolean isCommentStart(int ch)
    • isDelimiter

      boolean isDelimiter(int ch) throws IOException
      Determines whether the next characters constitute a delimiter through UnsynchronizedBufferedReader.peek(char[]).
      Parameters:
      ch - the current character.
      Returns:
      true if the next characters constitute a delimiter.
      Throws:
      IOException - If an I/O error occurs.
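      A minimal sketch of the lookahead idea behind multi-character delimiters, assuming the input is an in-memory String instead of a buffered reader. The class DelimiterMatcher and its method isDelimiterAt are hypothetical illustrations, not part of Commons CSV: the first delimiter character is compared directly (it plays the role of the already-read current character) and the remaining ones are only inspected, not consumed.

```java
/** Hypothetical sketch: multi-character delimiter detection by lookahead. */
final class DelimiterMatcher {
    private final char[] delimiter;

    DelimiterMatcher(String delimiter) {
        this.delimiter = delimiter.toCharArray();
    }

    /** Returns true if the full delimiter starts at index pos of input. */
    boolean isDelimiterAt(String input, int pos) {
        if (pos >= input.length() || input.charAt(pos) != delimiter[0]) {
            return false; // current character does not even start the delimiter
        }
        if (delimiter.length == 1) {
            return true; // single-character delimiter: no lookahead needed
        }
        if (pos + delimiter.length > input.length()) {
            return false; // not enough characters left to hold the whole delimiter
        }
        // Look at (but do not consume) the remaining delimiter characters.
        for (int i = 1; i < delimiter.length; i++) {
            if (input.charAt(pos + i) != delimiter[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DelimiterMatcher m = new DelimiterMatcher("[|]");
        System.out.println(m.isDelimiterAt("a[|]b", 1)); // true
        System.out.println(m.isDelimiterAt("a[|b", 1));  // false
    }
}
```

      A caller that gets true would then skip the remaining delimiter.length - 1 characters before starting the next field.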
    • isEndOfFile

      boolean isEndOfFile(int ch)
      Tests if the given character indicates the end of the file.
      Returns:
      true if the given character indicates the end of the file.
    • isEscape

      boolean isEscape(int ch)
      Tests if the given character is the escape character.
      Returns:
      true if the given character is the escape character.
    • isEscapeDelimiter

      boolean isEscapeDelimiter() throws IOException
      Tests if the next characters constitute an escape delimiter through UnsynchronizedBufferedReader.peek(char[]). For example, for delimiter "[|]" and escape '!', it returns true if the next characters constitute "![!|!]".
      Returns:
      true if the next characters constitute an escape delimiter.
      Throws:
      IOException - If an I/O error occurs.
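      A sketch of how the escaped form in the example above ("![!|!]" for delimiter "[|]" and escape '!') can be built and matched, again over an in-memory String. EscapeDelimiterMatcher is a hypothetical name; the documented method works on the reader's lookahead buffer instead.

```java
/** Hypothetical sketch: matching an escaped multi-character delimiter. */
final class EscapeDelimiterMatcher {
    private final String escapedDelimiter;

    EscapeDelimiterMatcher(String delimiter, char escape) {
        StringBuilder sb = new StringBuilder(delimiter.length() * 2);
        for (int i = 0; i < delimiter.length(); i++) {
            // The escape character precedes every single delimiter character.
            sb.append(escape).append(delimiter.charAt(i));
        }
        this.escapedDelimiter = sb.toString();
    }

    String escapedForm() {
        return escapedDelimiter;
    }

    /** Returns true if the escaped delimiter starts at index pos of input. */
    boolean isEscapeDelimiterAt(String input, int pos) {
        return input.startsWith(escapedDelimiter, pos);
    }

    public static void main(String[] args) {
        EscapeDelimiterMatcher m = new EscapeDelimiterMatcher("[|]", '!');
        System.out.println(m.escapedForm());                      // ![!|!]
        System.out.println(m.isEscapeDelimiterAt("a![!|!]b", 1)); // true
    }
}
```

      On a match, a lexer would append the plain delimiter "[|]" to the field content and skip the whole escaped sequence.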
    • isMetaChar

      private boolean isMetaChar(int ch)
    • isQuoteChar

      boolean isQuoteChar(int ch)
    • isStartOfLine

      boolean isStartOfLine(int ch)
      Tests whether the given character marks the start of a line: it is a CR or an LF, or the position is at the start of the file.
      Parameters:
      ch - the character to check
      Returns:
      true if the character is at the start of a line.
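      A small sketch of this test over int-valued characters. The sentinel UNDEFINED = -2 for "nothing has been read yet" is an assumption of this sketch; a value other than -1 is used so that it cannot collide with the end-of-stream value returned by Reader.read().

```java
/** Hypothetical sketch of a start-of-line test over int-valued characters. */
final class LineStart {
    static final int CR = '\r';
    static final int LF = '\n';
    /** Assumed sentinel meaning "no character has been read yet" (start of file). */
    static final int UNDEFINED = -2;

    static boolean isStartOfLine(int ch) {
        return ch == LF || ch == CR || ch == UNDEFINED;
    }

    public static void main(String[] args) {
        System.out.println(isStartOfLine('\n'));      // true
        System.out.println(isStartOfLine(UNDEFINED)); // true
        System.out.println(isStartOfLine('a'));       // false
    }
}
```

      A lexer can pass the previously read character here, for example to decide whether a comment marker begins a line.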
    • nextToken

      Token nextToken(Token token) throws IOException
      Returns the next token.

      A token corresponds to a term, a record change or an end-of-file indicator.

      Parameters:
      token - an existing Token object to reuse. The caller is responsible for initializing the Token.
      Returns:
      the next token found.
      Throws:
      IOException - on stream access error.
      CSVException - Thrown on invalid input.
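      The reuse contract can be sketched with a deliberately small lexer that handles only unquoted fields and a single-character delimiter. MiniLexer, its Token class, and parse are hypothetical; only the three outcomes (TOKEN for a field followed by a delimiter, EORECORD for the last field of a line, EOF at end of input) follow the description above. The caller resets one Token before each call instead of allocating a new one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a reusable-token lexer loop (unquoted fields only). */
final class MiniLexer {
    enum Type { TOKEN, EORECORD, EOF }

    /** Mutable token that the caller resets and reuses between calls. */
    static final class Token {
        Type type;
        final StringBuilder content = new StringBuilder();

        void reset() {
            type = null;
            content.setLength(0);
        }
    }

    private final String input;
    private final char delimiter;
    private int pos;

    MiniLexer(String input, char delimiter) {
        this.input = input;
        this.delimiter = delimiter;
    }

    /** Fills the given (already reset) token with the next field and returns it. */
    Token nextToken(Token token) {
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c == delimiter) {
                token.type = Type.TOKEN; // field ended, record continues
                return token;
            }
            if (c == '\r' || c == '\n') {
                if (c == '\r' && pos < input.length() && input.charAt(pos) == '\n') {
                    pos++; // treat CR LF as one line ending
                }
                token.type = Type.EORECORD; // last field of the record
                return token;
            }
            token.content.append(c);
        }
        token.type = Type.EOF; // content may be non-empty if there is no trailing newline
        return token;
    }

    /** Splits input into records by calling nextToken with one reused Token. */
    static List<List<String>> parse(String input, char delimiter) {
        MiniLexer lexer = new MiniLexer(input, delimiter);
        Token token = new Token();
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        while (true) {
            token.reset(); // the caller initializes the token before each call
            lexer.nextToken(token);
            if (token.type == Type.EOF) {
                if (token.content.length() > 0 || !record.isEmpty()) {
                    record.add(token.content.toString()); // pending final field
                    records.add(record);
                }
                return records;
            }
            record.add(token.content.toString());
            if (token.type == Type.EORECORD) {
                records.add(record);
                record = new ArrayList<>();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("a,b\nc,d", ',')); // [[a, b], [c, d]]
    }
}
```

      A field that ends at end of input (no trailing newline) arrives as an EOF token that still carries content, so the caller must add it to the current record before stopping.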
    • nullToDisabled

      private int nullToDisabled(Character c)
    • parseEncapsulatedToken

      private Token parseEncapsulatedToken(Token token) throws IOException
      Parses an encapsulated token.

      Encapsulated tokens are surrounded by the given encapsulating string. The encapsulator itself might be included in the token using a doubling syntax (as in "", '') or using escaping (as in \", \'). Whitespace before and after an encapsulated token is ignored. The token is finished when one of the following conditions becomes true:

      • An unescaped encapsulator has been reached and is followed by optional whitespace then:
        • delimiter (TOKEN)
        • end of line (EORECORD)
      • The end of stream has been reached (EOF)
      Parameters:
      token - the current token
      Returns:
      a valid token object
      Throws:
      IOException - Thrown when in an invalid state: EOF before closing encapsulator or invalid character before delimiter or EOL.
      CSVException - Thrown on invalid input.
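      A sketch of the doubling rule and the terminator check, assuming the caller has already skipped leading whitespace and that start points at the opening quote. QuotedFieldParser is hypothetical and handles only the doubled-quote syntax, not backslash escaping. It throws IOException both for EOF before the closing quote and for a non-whitespace character between the closing quote and the delimiter or line end.

```java
import java.io.IOException;

/** Hypothetical sketch: parsing a quoted (encapsulated) field from a String. */
final class QuotedFieldParser {
    enum Type { TOKEN, EORECORD, EOF }

    /** Field content, how the field ended, and the index where parsing stopped. */
    static final class Result {
        final String content;
        final Type type;
        final int next;

        Result(String content, Type type, int next) {
            this.content = content;
            this.type = type;
            this.next = next;
        }
    }

    /** Parses a field whose opening quote is at index start; "" stands for one quote. */
    static Result parse(String in, int start, char quote, char delimiter) throws IOException {
        StringBuilder sb = new StringBuilder();
        int pos = start + 1; // skip the opening quote
        while (true) {
            if (pos >= in.length()) {
                throw new IOException("EOF reached before closing quote");
            }
            char c = in.charAt(pos++);
            if (c != quote) {
                sb.append(c); // delimiters and line breaks are plain content here
                continue;
            }
            if (pos < in.length() && in.charAt(pos) == quote) {
                sb.append(quote); // doubled quote: keep one, skip the second
                pos++;
                continue;
            }
            // Closing quote: skip optional spaces/tabs, then require a terminator.
            while (pos < in.length() && (in.charAt(pos) == ' ' || in.charAt(pos) == '\t')) {
                pos++;
            }
            if (pos >= in.length()) {
                return new Result(sb.toString(), Type.EOF, pos);
            }
            char t = in.charAt(pos++);
            if (t == delimiter) {
                return new Result(sb.toString(), Type.TOKEN, pos);
            }
            if (t == '\r' || t == '\n') {
                if (t == '\r' && pos < in.length() && in.charAt(pos) == '\n') {
                    pos++; // treat CR LF as one line ending
                }
                return new Result(sb.toString(), Type.EORECORD, pos);
            }
            throw new IOException("Invalid character '" + t + "' after closing quote");
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = parse("\"a,\"\"b\"\"\" ,x", 0, '"', ',');
        System.out.println(r.content + " " + r.type); // a,"b" TOKEN
    }
}
```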
    • parseSimpleToken

      private Token parseSimpleToken(Token token, int ch) throws IOException
      Parses a simple token.

      Simple tokens are tokens that are not surrounded by encapsulators. A simple token might contain escaped delimiters (as in \, or \;). The token is finished when one of the following conditions becomes true:

      • The end of line has been reached (EORECORD)
      • The end of stream has been reached (EOF)
      • An unescaped delimiter has been reached (TOKEN)
      Parameters:
      token - the current token
      ch - the current character
      Returns:
      the filled token
      Throws:
      IOException - on stream access error
      CSVException - Thrown on invalid input.
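      A sketch of simple-token parsing over a String, assuming a single-character delimiter and escape character. SimpleFieldParser is hypothetical and simplifies escape handling: whatever follows the escape character is appended literally, so "\," contributes a comma to the field instead of ending it. An escape character at the very end of the input is rejected with an IOException.

```java
import java.io.IOException;

/** Hypothetical sketch: reading an unquoted (simple) field from a String. */
final class SimpleFieldParser {
    enum Type { TOKEN, EORECORD, EOF }

    /** Field content, how the field ended, and the index where parsing stopped. */
    static final class Result {
        final String content;
        final Type type;
        final int next;

        Result(String content, Type type, int next) {
            this.content = content;
            this.type = type;
            this.next = next;
        }
    }

    /** Reads from index start until an unescaped delimiter, a line break, or end of input. */
    static Result parse(String in, int start, char delimiter, char escape) throws IOException {
        StringBuilder sb = new StringBuilder();
        int pos = start;
        while (pos < in.length()) {
            char c = in.charAt(pos++);
            if (c == escape) {
                if (pos >= in.length()) {
                    throw new IOException("Escape character at end of input");
                }
                sb.append(in.charAt(pos++)); // simplification: keep the escaped char literally
            } else if (c == delimiter) {
                return new Result(sb.toString(), Type.TOKEN, pos);
            } else if (c == '\r' || c == '\n') {
                if (c == '\r' && pos < in.length() && in.charAt(pos) == '\n') {
                    pos++; // treat CR LF as one line ending
                }
                return new Result(sb.toString(), Type.EORECORD, pos);
            } else {
                sb.append(c);
            }
        }
        return new Result(sb.toString(), Type.EOF, pos);
    }

    public static void main(String[] args) throws IOException {
        Result r = parse("a\\,b,c", 0, ',', '\\');
        System.out.println(r.content + " " + r.type); // a,b TOKEN
    }
}
```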
    • readEndOfLine

      boolean readEndOfLine(int ch) throws IOException
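      Greedily accepts \n, \r and \r\n. When \r is immediately followed by \n, this checker silently consumes the \n so that the pair counts as a single line terminator.
      Parameters:
      ch - the current character
      Returns:
      true if the given or next character is a line terminator
      Throws:
      IOException - If an I/O error occurs.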
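      A sketch of the greedy CR LF check, using java.io.PushbackReader to peek at one character. The method name mirrors the one documented here, but the class EolReader and the use of PushbackReader are assumptions of this sketch, not how the lexer's own reader works.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/** Hypothetical sketch: greedy end-of-line detection over a PushbackReader. */
final class EolReader {
    /** Returns true if ch (already read) starts a line ending; swallows the LF of CR LF. */
    static boolean readEndOfLine(int ch, PushbackReader reader) throws IOException {
        if (ch == '\r') {
            int next = reader.read();
            if (next == '\n') {
                return true; // CR LF: the LF is consumed silently
            }
            if (next != -1) {
                reader.unread(next); // not part of the line ending: put it back
            }
            return true; // lone CR
        }
        return ch == '\n'; // lone LF; anything else is not a line ending
    }

    public static void main(String[] args) throws IOException {
        PushbackReader reader = new PushbackReader(new StringReader("a\r\nb\rc\nd"));
        int lines = 0;
        int ch;
        while ((ch = reader.read()) != -1) {
            if (readEndOfLine(ch, reader)) {
                lines++;
            }
        }
        System.out.println(lines); // 3
    }
}
```

      Because CR LF is folded into one terminator, the three different line endings in the usage example each count once.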
    • readEscape

      int readEscape() throws IOException
      Handles an escape sequence. The current character must be the escape character. On return, the next character is available by calling ExtendedBufferedReader.getLastChar() on the input stream.
      Returns:
      the unescaped character (as an int), or IOUtils.EOF if the character following the escape is invalid.
      Throws:
      IOException - if there is a problem reading the stream or the end of stream is detected: the escape character is not allowed at end of stream
      CSVException - Thrown on invalid input.
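      A sketch of the decoding step under stated assumptions: the conventional backslash-style mapping (r, n, t, b, f to the corresponding control characters), raw control characters passing through unchanged, and an escaped meta character (such as the delimiter, quote, or escape itself) yielding that character literally. EscapeDecoder and its metaChars parameter are hypothetical; -1 stands in for IOUtils.EOF as the invalid-escape marker, and end of input raises an IOException as described above.

```java
import java.io.IOException;

/** Hypothetical sketch of escape-sequence decoding. */
final class EscapeDecoder {
    static final int EOF = -1; // assumed invalid-escape marker, same value as IOUtils.EOF

    private final String metaChars; // e.g. delimiter, quote, escape, comment start

    EscapeDecoder(String metaChars) {
        this.metaChars = metaChars;
    }

    /** Decodes the character after an escape; next is -1 if the input ended. */
    int decode(int next) throws IOException {
        switch (next) {
            case 'r': return '\r';
            case 'n': return '\n';
            case 't': return '\t';
            case 'b': return '\b';
            case 'f': return '\f';
            case '\r': case '\n': case '\t': case '\b': case '\f':
                return next; // raw control characters pass through unchanged
            case -1:
                throw new IOException("EOF while processing escape sequence");
            default:
                // Escaping a meta character yields it literally; anything else is invalid.
                return metaChars.indexOf(next) >= 0 ? next : EOF;
        }
    }

    public static void main(String[] args) throws IOException {
        EscapeDecoder d = new EscapeDecoder(",\"\\");
        System.out.println(d.decode(',') == ','); // true: escaped delimiter
        System.out.println(d.decode('x'));        // -1: invalid escape
    }
}
```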
    • trimTrailingSpaces

      void trimTrailingSpaces(StringBuilder buffer)