Class QueryParser

All Implemented Interfaces:
Map<String,String>, Query
Direct Known Subclasses:
QueryCombiner

public class QueryParser extends MapParser<String> implements Query
The QueryParser is used to parse data encoded in the application/x-www-form-urlencoded MIME type. It is also used to parse a query string from an HTTP URL; see RFC 2616. The parsed parameters are available through the various methods of the org.simpleframework.http.Query interface. The syntax of the parsed parameters is described below in BNF.

    params  = *(pair [ "&" params])
    pair    = name "=" value
    name    = *(text | escaped)
    value   = *(text | escaped)
    escaped = "%" HEX HEX

 
All data found in a name or value is consumed, and any "+" character is replaced with a space character. Only "=", "&", and "%" have special meaning. The "=" character delimits the name from the value, and the "&" character delimits one name value pair from the next. The "%" character starts an escaped sequence, which consists of two hex digits. All escaped sequences are converted to their character values.
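As a rough illustration of the grammar above, the sketch below splits the text on "&" and on the first "=" in each pair, then decodes "+" and "%" HEX HEX escapes. It is not QueryParser's own implementation: it swaps in the JDK's java.net.URLDecoder (Java 10 or later) for the hand-written decoder, and the class name FormDecodeSketch is hypothetical. Treating a pair with no "=" as a name with an empty value is an assumption, since the BNF requires the "=". Note that URLDecoder throws IllegalArgumentException on a malformed escape such as "%zz".

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormDecodeSketch {

    // Splits "name=value" pairs on "&" and the first "=", then decodes
    // "+" as a space and "%" HEX HEX sequences as UTF-8 octets.
    public static Map<String, String> parse(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : text.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "&&" or a leading "&"
            }
            int eq = pair.indexOf('=');
            // Assumption: a pair without "=" is a name with an empty value.
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // A later pair with the same name overwrites the earlier value.
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```

For example, parse("name=John+Smith&city=S%C3%A3o+Paulo") maps name to "John Smith" and city to "São Paulo".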
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    private class QueryParser.Token
    This is used to mark regions within the buffer that represent a valid token for either the name of a parameter or its value.

    Nested classes/interfaces inherited from interface java.util.Map

    Map.Entry<K,V>
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private QueryParser.Token
    name
    Used to accumulate the characters for the parameter name.
    private QueryParser.Token
    value
    Used to accumulate the characters for the parameter value.

    Fields inherited from class org.simpleframework.common.parse.MapParser

    all, map

    Fields inherited from class org.simpleframework.common.parse.Parser

    buf, count, off
  • Constructor Summary

    Constructors
    Constructor
    Description
    QueryParser()
    Constructor for the QueryParser.
    QueryParser(String text)
    Constructor for the QueryParser.
  • Method Summary

    Modifier and Type
    Method
    Description
    private boolean
    binary(int peek)
    This method determines, using a peek character, whether the sequence of escaped characters within the URI is binary data.
    private char
    bits(int data)
    Defines behaviour for UCS-2 versus UCS-4 conversion from four octets.
    private int
    convert(char high, char low)
    This will convert the two hexadecimal characters to a real integer value, which is returned.
    private String
    encode(String text)
    This encode method will escape the text that is provided.
    private String
    encode(String name, String value)
    This encode method will escape the name=value pair provided using the UTF-8 character set.
    private void
    escape()
    This converts an encountered escaped sequence, that is, all embedded hexadecimal characters, into a native UCS character value.
    boolean
    getBoolean(Object name)
    This extracts a boolean parameter for the named value.
    float
    getFloat(Object name)
    This extracts a float parameter for the named value.
    int
    getInteger(Object name)
    This extracts an integer parameter for the named value.
    private boolean
    hex(char ch)
    This is used to determine whether a char is a hexadecimal char or not.
    protected void
    init()
    This initializes the parser so that it can be used several times.
    private void
    insert()
    This method adds the name and value to a map so that the next name and value can be collected.
    private void
    insert(QueryParser.Token name, QueryParser.Token value)
    This will add the given name and value to the parameters map.
    private void
    name()
    This extracts the name of the parameter from the character buffer.
    private void
    param()
    This is an expression defined by RFC 2396, where it is used in the definition of a segment expression.
    protected void
    parse()
    This performs the actual parsing of the parameter text.
    private int
    peek(int pos)
    This will return the escape expression specified from the URI as an integer value of the hexadecimal sequence.
    String
    toString()
    This toString method is used to compose a string in the application/x-www-form-urlencoded MIME type.
    String
    toString(Set set)
    This toString method is used to compose a string in the application/x-www-form-urlencoded MIME type.
    private boolean
    unicode(int peek)
    This method determines, using a peek character, whether the sequence of escaped characters within the URI is in UTF-8.
    private boolean
    unicode(int peek, int more)
    This method will decode the specified amount of escaped characters from the URI and convert them into a single Java UCS-2 character.
    private boolean
    unicode(int peek, int more, int pos)
    This will decode the specified amount of trailing UTF-8 bits from the URI.
    private void
    value()
    This extracts a parameter value from a path segment.

    Methods inherited from class org.simpleframework.common.parse.MapParser

    clear, containsKey, containsValue, entrySet, get, getAll, isEmpty, keySet, put, putAll, remove, size, values

    Methods inherited from class org.simpleframework.common.parse.Parser

    digit, ensureCapacity, parse, skip, space, toLower

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

    Methods inherited from interface org.simpleframework.http.Query

    getAll
  • Field Details

    • name

      private QueryParser.Token name
      Used to accumulate the characters for the parameter name.
    • value

      private QueryParser.Token value
      Used to accumulate the characters for the parameter value.
  • Constructor Details

    • QueryParser

      public QueryParser()
      Constructor for the QueryParser. This creates an instance that can be used to parse HTML form data and URL query strings encoded as application/x-www-form-urlencoded. The parsed parameters are made available through the interface org.simpleframework.http.Query.
    • QueryParser

      public QueryParser(String text)
      Constructor for the QueryParser. This creates an instance that can be used to parse HTML form data and URL query strings encoded as application/x-www-form-urlencoded. The parsed parameters are made available through the interface org.simpleframework.http.Query.
      Parameters:
      text - this is the text to parse for the parameters
  • Method Details

    • getInteger

      public int getInteger(Object name)
      This extracts an integer parameter for the named value. If the named parameter does not exist this will return a zero value. If however the parameter exists but is not in the format of a decimal integer value then this will throw an exception.
      Specified by:
      getInteger in interface Query
      Parameters:
      name - the name of the parameter value to retrieve
      Returns:
      this returns the named parameter value as an integer
    • getFloat

      public float getFloat(Object name)
      This extracts a float parameter for the named value. If the named parameter does not exist this will return a zero value. If however the parameter exists but is not in the format of a floating point number then this will throw an exception.
      Specified by:
      getFloat in interface Query
      Parameters:
      name - the name of the parameter value to retrieve
      Returns:
      this returns the named parameter value as a float
    • getBoolean

      public boolean getBoolean(Object name)
      This extracts a boolean parameter for the named value. If the named parameter does not exist this will return false otherwise the value is evaluated. If it is either true or false then those boolean values are returned.
      Specified by:
      getBoolean in interface Query
      Parameters:
      name - the name of the parameter value to retrieve
      Returns:
      this returns the named parameter value as a boolean
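      The default values described for getInteger, getFloat and getBoolean can be sketched over a plain map as follows. This is a hypothetical stand-in (TypedQuerySketch is not part of the library). It assumes the exception thrown for a malformed number is Java's NumberFormatException, and that getBoolean treats any value other than "true" (ignoring case) as false, as Boolean.parseBoolean does, since the text above only specifies the "true" and "false" cases.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedQuerySketch {
    private final Map<String, String> params = new HashMap<>();

    public TypedQuerySketch put(String name, String value) {
        params.put(name, value);
        return this;
    }

    // Missing parameter yields 0; a non-integer value throws NumberFormatException.
    public int getInteger(Object name) {
        String value = params.get(name);
        return value == null ? 0 : Integer.parseInt(value);
    }

    // Missing parameter yields 0.0f; a malformed value throws NumberFormatException.
    public float getFloat(Object name) {
        String value = params.get(name);
        return value == null ? 0.0f : Float.parseFloat(value);
    }

    // Missing parameter yields false. Assumption: anything but "true"
    // (case-insensitive) is also false.
    public boolean getBoolean(Object name) {
        String value = params.get(name);
        return value != null && Boolean.parseBoolean(value);
    }
}
```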
    • init

      protected void init()
      This initializes the parser so that it can be used several times. This clears any previously extracted parameters, which ensures that the Query is empty when parse(String) is next invoked.
      Specified by:
      init in class Parser
    • parse

      protected void parse()
      This performs the actual parsing of the parameter text. The parameters parsed from this are taken as "name=value" pairs. Multiple pairs within the text are separated by an "&". This will parse and insert all parameters into a hashtable.
      Specified by:
      parse in class Parser
    • insert

      private void insert()
      This method adds the name and value to a map so that the next name and value can be collected. The name and value are added to the map as string objects. Once added to the map, the Token objects are set to have zero length so they can be reused to collect further values. The values are also added to the map as an array of type string, so that multiple values for the same name can be stored.
    • insert

      private void insert(QueryParser.Token name, QueryParser.Token value)
      This will add the given name and value to the parameters map. If a previous value of the given name has been inserted into the map, then this will overwrite that value. This is used to ensure that the string value is inserted into the map.
      Parameters:
      name - this is the name of the value to be inserted
      value - this is the value that is to be inserted
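      The two insert methods, together with the inherited all and map fields, suggest that every value for a name is kept while a plain get sees only the most recent one. The sketch below models that reading with two maps. The class and method names are hypothetical, and the assumption that get returns the latest value is taken from the overwrite behaviour described above rather than confirmed by it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValueSketch {
    private final Map<String, String> map = new HashMap<>();        // latest value per name
    private final Map<String, List<String>> all = new HashMap<>();  // every value per name

    public void insert(String name, String value) {
        all.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        map.put(name, value); // overwrites any earlier value for this name
    }

    public String get(String name) {
        return map.get(name);
    }

    public List<String> getAll(String name) {
        return all.getOrDefault(name, Collections.emptyList());
    }
}
```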
    • param

      private void param()
      This is an expression defined by RFC 2396, where it is used in the definition of a segment expression. It is basically a list of chars with escaped sequences.

      This method has to ensure that no escaped chars go unchecked, so that the read offset never goes out of bounds and throws an out of bounds exception.

    • name

      private void name()
      This extracts the name of the parameter from the character buffer. The name of a parameter is defined as a set of chars including escape sequences. This will extract the parameter name and buffer the chars. The name ends when an equals character, "=", is encountered.
    • value

      private void value()
      This extracts a parameter value from a path segment. The parameter value consists of a sequence of chars and some escape sequences. The parameter value is buffered so that the name and values can be paired. The end of the value is determined as the end of the buffer or an ampersand.
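      The token boundaries described for name() and value() can be illustrated with a small scanner over a char array: the name runs up to "=", the value runs up to "&" or the end of the buffer, and every read is bounds-checked, in the spirit of param(). This is a hypothetical sketch that returns raw, still-escaped tokens; ending a name at "&" when no "=" is present is an assumption not stated above.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenScanSketch {

    // Returns {name, value} pairs as raw (still-escaped) substrings.
    public static List<String[]> scan(String text) {
        List<String[]> pairs = new ArrayList<>();
        char[] buf = text.toCharArray();
        int off = 0;
        while (off < buf.length) {
            int start = off;
            while (off < buf.length && buf[off] != '=' && buf[off] != '&') {
                off++; // the name ends at "=" (or, by assumption, at "&")
            }
            String name = new String(buf, start, off - start);
            String value = "";
            if (off < buf.length && buf[off] == '=') {
                start = ++off;
                while (off < buf.length && buf[off] != '&') {
                    off++; // the value ends at "&" or at the end of the buffer
                }
                value = new String(buf, start, off - start);
            }
            off++; // step over the "&" delimiter, or past the end
            if (!name.isEmpty()) {
                pairs.add(new String[] {name, value});
            }
        }
        return pairs;
    }
}
```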
    • escape

      private void escape()
      This converts an encountered escaped sequence, that is, all embedded hexadecimal characters, into a native UCS character value. This does not take any characters from the stream; it just prepares the buffer with the correct byte. The escaped sequence within the URI will be interpreted as UTF-8.

      If there is a fully valid escaped sequence, that is "%" HEX HEX, this will leave the next character to read from the buffer as the character encoded from the URI. The sequence is decoded using UTF-8, and all encoded sequences should be in UCS-2 to fit in a Java char.

    • binary

      private boolean binary(int peek)
      This method determines, using a peek character, whether the sequence of escaped characters within the URI is binary data. If the data within the escaped sequence is binary, then this will ensure that the next character read from the URI is the binary octet. This is used strictly for backward compatible parsing of URI strings; binary data should never appear.
      Parameters:
      peek - this is the first escaped character from the URI
      Returns:
      currently this implementation always returns true
    • unicode

      private boolean unicode(int peek)
      This method determines, using a peek character, whether the sequence of escaped characters within the URI is in UTF-8. If a UTF-8 character can be successfully decoded from the URI it will be the next character read from the buffer. This can check for both UCS-2 and UCS-4 characters. However, because the Java char can only hold UCS-2, the UCS-4 characters will have only the low order octets stored.

      The WWW Consortium provides a reference implementation of UTF-8 decoding for Java, in which the low order octets in the UCS-4 sequence are used for the character. So, in the absence of a defined behaviour, the W3C behaviour is assumed.

      Parameters:
      peek - this is the first escaped character from the URI
      Returns:
      this returns true if a UTF-8 character is decoded
    • unicode

      private boolean unicode(int peek, int more)
      This method will decode the specified amount of escaped characters from the URI and convert them into a single Java UCS-2 character. If there are not enough characters within the URI then this will return false and leave the URI alone.

      The number of characters left is determined from the first UTF-8 octet, as specified in RFC 2279, and because this is a URI there must be that number of "%" HEX HEX sequences left. If successful, the next character read is the UTF-8 sequence decoded into a native UCS-2 character.

      Parameters:
      peek - contains the bits read from the first UTF octet
      more - this specifies the number of UTF octets left
      Returns:
      this returns true if a UTF-8 character is decoded
    • unicode

      private boolean unicode(int peek, int more, int pos)
      This will decode the specified amount of trailing UTF-8 bits from the URI. The trailing bits are those following the first UTF-8 octet, which specifies the length, in octets, of the sequence. The trailing octets are of the form 10xxxxxx; for each of these octets only the last six bits are valid UCS bits, so a conversion is basically an accumulation of these.

      If at any point during the accumulation of the UTF-8 bits there is a parsing error, then parsing is aborted and false is returned; as a result, the URI is left unchanged.

      Parameters:
      peek - bytes that have been accumulated from the URI
      more - this specifies the number of UTF octets left
      pos - this specifies the position the parsing begins
      Returns:
      this returns true if a UTF-8 character is decoded
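      The accumulation described for the unicode methods can be sketched on octets that have already been read from consecutive "%" HEX HEX escapes. The first octet gives the number of trailing octets, each trailing octet must match 10xxxxxx, and its low six bits are shifted into the result; any failure returns -1 so the caller can leave the URI unchanged. This is an illustrative sketch, not the library's code: Utf8OctetSketch is a hypothetical name, it accepts sequences of at most four octets (the RFC 3629 limit, whereas RFC 2279 also allowed five and six), and it does not reject overlong encodings.

```java
public class Utf8OctetSketch {

    // Decodes one character from octets (0-255) already read from consecutive
    // "%" HEX HEX escapes. Returns the code point, or -1 on any error so that
    // the caller can leave the URI unchanged.
    public static int decode(int[] octets) {
        if (octets.length == 0) {
            return -1;
        }
        int first = octets[0];
        int more; // number of trailing octets announced by the first octet
        int bits; // payload bits accumulated so far
        if ((first & 0x80) == 0x00) {        // 0xxxxxxx: single octet
            more = 0;
            bits = first;
        } else if ((first & 0xE0) == 0xC0) { // 110xxxxx: one trailing octet
            more = 1;
            bits = first & 0x1F;
        } else if ((first & 0xF0) == 0xE0) { // 1110xxxx: two trailing octets
            more = 2;
            bits = first & 0x0F;
        } else if ((first & 0xF8) == 0xF0) { // 11110xxx: three trailing octets
            more = 3;
            bits = first & 0x07;
        } else {
            return -1; // a 10xxxxxx octet, or a lead octet longer than four
        }
        if (octets.length < 1 + more) {
            return -1; // not enough escapes left in the URI
        }
        for (int i = 1; i <= more; i++) {
            int next = octets[i];
            if ((next & 0xC0) != 0x80) {
                return -1; // trailing octets must look like 10xxxxxx
            }
            bits = (bits << 6) | (next & 0x3F); // keep only the low six bits
        }
        return bits;
    }
}
```

      For example, the octets of %E2%82%AC decode to 0x20AC, the euro sign, while a lone %C3 yields -1.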
    • bits

      private char bits(int data)
      Defines behaviour for UCS-2 versus UCS-4 conversion from four octets. The UTF-8 encoding scheme enables UCS-4 characters to be encoded and decoded. However, Java supports the 16-bit UCS-2 character set, so the 32-bit UCS-4 character set is not compatible. This method decides what to do with UCS-4 characters.
      Parameters:
      data - up to four octets to be converted to UCS-2 format
      Returns:
      this returns a native UCS-2 character from the int
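      Two ways of handling a code point above 0xFFFF are sketched below. lowOrderBits keeps only the low-order 16 bits, which is the W3C behaviour described for unicode(int). surrogatePair uses Character.toChars, which since Java 5 represents such a code point losslessly as a UTF-16 surrogate pair. Neither method name comes from the library; both are hypothetical.

```java
public class Ucs4Sketch {

    // The behaviour described above: keep only the low-order 16 bits.
    public static char lowOrderBits(int codePoint) {
        return (char) (codePoint & 0xFFFF);
    }

    // Lossless alternative: encode the code point as one or two UTF-16 chars.
    public static String surrogatePair(int codePoint) {
        return new String(Character.toChars(codePoint));
    }
}
```

      For U+1F600, lowOrderBits yields the char 0xF600, while surrogatePair yields a two-char String whose code point is still 0x1F600.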
    • peek

      private int peek(int pos)
      This will return the escape expression specified from the URI as an integer value of the hexadecimal sequence. This does not make any changes to the buffer; it simply checks whether the characters at the specified position are an escaped set of characters of the form "%" HEX HEX. If so, it converts that hexadecimal string into an integer value; it returns -1 if the expression is not hexadecimal.
      Parameters:
      pos - this is the position the expression starts from
      Returns:
      the integer value of the hexadecimal expression
    • convert

      private int convert(char high, char low)
      This will convert the two hexadecimal characters to a real integer value, which is returned. This requires characters within the range of 'A' to 'F' and 'a' to 'f', and also the digits '0' to '9'. The characters are encoded using the ISO-8859-1 encoding scheme; if the characters are not within the range specified, then this returns -1.
      Parameters:
      high - this is the high four bits within the integer
      low - this is the low four bits within the integer
      Returns:
      this returns the integer value of the conversion
    • hex

      private boolean hex(char ch)
      This is used to determine whether a char is a hexadecimal char or not. A hexadecimal character is considered to be a character within the range of 0 - 9 and between a - f and A - F. This will return true if the character is in this range.
      Parameters:
      ch - this is the character which is to be determined here
      Returns:
      true if the character given has a hexadecimal value
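      The checks described for peek, convert and hex can be sketched as below. Restricting hex digits to ASCII is deliberate: Character.digit would also accept non-ASCII digits such as U+FF10 FULLWIDTH DIGIT ZERO. The class HexEscapeSketch is hypothetical, and like the methods above it returns -1 rather than throwing when the input is not a valid escape.

```java
public class HexEscapeSketch {

    // Value of an ASCII hex digit, or -1 if ch is not 0-9, a-f or A-F.
    static int hex(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    // Combines two hex digits into one octet, or -1 if either is invalid.
    public static int convert(char high, char low) {
        int h = hex(high);
        int l = hex(low);
        return (h < 0 || l < 0) ? -1 : (h << 4) | l;
    }

    // Reads "%" HEX HEX at pos without consuming anything; -1 if absent.
    public static int peek(String text, int pos) {
        if (pos < 0 || pos + 2 >= text.length() || text.charAt(pos) != '%') {
            return -1;
        }
        return convert(text.charAt(pos + 1), text.charAt(pos + 2));
    }
}
```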
    • encode

      private String encode(String text)
      This encode method will escape the text that is provided. This is used so that the parameter pairs can be encoded in such a way that they can be transferred over HTTP/1.1 using the ISO-8859-1 character set.
      Parameters:
      text - this is the text that is to be escaped
      Returns:
      the text with % HEX HEX UTF-8 escape sequences
    • encode

      private String encode(String name, String value)
      This encode method will escape the name=value pair provided using the UTF-8 character set. This method will ensure that the parameters are encoded in such a way that they can be transferred via HTTP in ISO-8859-1.
      Parameters:
      name - this is the name that is to be escaped
      value - this is the value that is to be escaped
      Returns:
      the pair with % HEX HEX UTF-8 escape sequences
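      A minimal sketch of the two encode methods, assuming the JDK's java.net.URLEncoder (Java 10 or later) is an acceptable substitute for the library's own escaping. URLEncoder writes UTF-8 octets as "%" HEX HEX with uppercase hex digits and turns a space into "+", so the result is plain ASCII and therefore safe to send as ISO-8859-1. EncodeSketch is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeSketch {

    // Escapes text as UTF-8 "%" HEX HEX sequences; a space becomes "+".
    public static String encode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // Escapes both halves of a name=value pair, leaving the "=" literal.
    public static String encode(String name, String value) {
        return encode(name) + "=" + encode(value);
    }
}
```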
    • toString

      public String toString(Set set)
      This toString method is used to compose a string in the application/x-www-form-urlencoded MIME type. This will encode the tokens specified in the Set. Each name=value pair acquired is converted into a UTF-8 escape sequence so that the parameters can be sent in the ISO-8859-1 format required by the HTTP/1.1 specification, RFC 2616.
      Parameters:
      set - this is the set of parameters to be encoded
      Returns:
      returns an HTTP parameter encoding for the pairs
    • toString

      public String toString()
      This toString method is used to compose a string in the application/x-www-form-urlencoded MIME type. This will iterate over all tokens that have been added to this object, either during parsing or during use of the instance. Each name=value pair acquired is converted into a UTF-8 escape sequence so that the parameters can be sent in the ISO-8859-1 format required by the HTTP/1.1 specification, RFC 2616.
      Specified by:
      toString in interface Query
      Overrides:
      toString in class Object
      Returns:
      returns an HTTP parameter encoding for the pairs
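      The following sketch composes such a string from a name-to-values map, emitting one name=value pair per value and joining the pairs with "&". It assumes that a name with several values is written once per value, which the text above does not state, and it uses java.net.URLEncoder (Java 10 or later) in place of the private encode methods. ToStringSketch is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class ToStringSketch {

    // Composes an application/x-www-form-urlencoded string from every name
    // and each of its values, in the map's iteration order.
    public static String compose(Map<String, List<String>> all) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, List<String>> entry : all.entrySet()) {
            String name = URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8);
            for (String value : entry.getValue()) {
                // Assumption: a repeated name is emitted once per value.
                query.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
            }
        }
        return query.toString();
    }
}
```

      With a LinkedHashMap holding tag mapped to [a, b c] and q mapped to [€], compose returns tag=a&tag=b+c&q=%E2%82%AC.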