Class StringEncoder

java.lang.Object
org.dellroad.stuff.string.StringEncoder

public final class StringEncoder extends Object
Encodes/decodes Java strings, escaping control and XML-invalid characters.
  • Field Details

    • ENCODE_PATTERN

      public static final Pattern ENCODE_PATTERN
      A regular expression that matches exactly the set of valid encode()'d strings.
    • ENQUOTE_PATTERN

      public static final Pattern ENQUOTE_PATTERN
      A regular expression that matches exactly the set of valid enquote()'d strings.
  • Method Details

    • encode

      public static String encode(String value, boolean escapeTABNLCR)
      Encode a string, escaping control and XML-invalid characters. Whether tab, newline, and carriage return are escaped is optional; these are the three control characters that are valid inside XML documents.

      Characters are escaped using \uNNNN notation, in the style of Java Unicode escapes; e.g., the character 0x001f appears in the encoded string as \u001f. Normal Java backslash escapes are used for tab, newline, carriage return, backspace, and formfeed. Backslash characters are themselves encoded as a double backslash.

      Parameters:
      value - string to encode (possibly null)
      escapeTABNLCR - escape tab, newline, and carriage return characters as well
      Returns:
      the encoded version of value, or null if value was null
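
      The escaping rules described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class name EncodeSketch is hypothetical, lowercase hex digits are an assumption, and "control character" is assumed to mean a character below 0x0020.

```java
// Hypothetical sketch of the documented escaping rules; not the library's code.
final class EncodeSketch {

    // XML 1.0 "Char" production: tab, newline, carriage return, plus three ranges.
    static boolean isValidXMLChar(int cp) {
        return cp == '\t' || cp == '\n' || cp == '\r'
            || (cp >= 0x0020 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String encode(String value, boolean escapeTABNLCR) {
        if (value == null)
            return null;
        StringBuilder buf = new StringBuilder(value.length() + 8);
        int i = 0;
        while (i < value.length()) {
            // Iterate by code point so valid surrogate pairs pass through intact
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\\')
                buf.append("\\\\");                 // backslash is doubled
            else if (cp == '\b')
                buf.append("\\b");
            else if (cp == '\f')
                buf.append("\\f");
            else if (escapeTABNLCR && cp == '\t')
                buf.append("\\t");
            else if (escapeTABNLCR && cp == '\n')
                buf.append("\\n");
            else if (escapeTABNLCR && cp == '\r')
                buf.append("\\r");
            else if (isValidXMLChar(cp))
                buf.appendCodePoint(cp);            // valid XML character, copied as-is
            else
                // Backslash, 'u', four hex digits (every XML-invalid code point fits in 16 bits)
                buf.append(String.format("\\u%04x", cp));
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("tab\there, backslash \\ here", true));
    }
}
```

      With escapeTABNLCR set to false, tab, newline, and carriage return are copied through unchanged, since they are valid XML characters.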
    • decode

      public static String decode(String text)
      Decode a string encoded by encode(java.lang.String, boolean).

      The parsing is strict; any ill-formed backslash escape sequence (i.e., not of the form \uNNNN, \b, \t, \n, \f, \r or \\) will cause an exception to be thrown.

      Parameters:
      text - string to decode (possibly null)
      Returns:
      the decoded version of text, or null if text was null
      Throws:
      IllegalArgumentException - if text contains an invalid escape sequence
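
      The strict parsing described above can be sketched as follows. This is an illustration only, not the library's implementation: the class name DecodeSketch is hypothetical, and accepting both uppercase and lowercase hex digits, as well as passing unescaped characters through untouched, are assumptions.

```java
// Hypothetical sketch of a strict decoder for the documented escape forms.
final class DecodeSketch {

    // Returns the value of an ASCII hex digit, or -1 if the character is not one.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static String decode(String text) {
        if (text == null)
            return null;
        StringBuilder buf = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch != '\\') {
                buf.append(ch);                     // ordinary character
                continue;
            }
            if (++i >= text.length())
                throw new IllegalArgumentException("truncated escape at end of input");
            switch (text.charAt(i)) {
            case 'b':  buf.append('\b'); break;
            case 't':  buf.append('\t'); break;
            case 'n':  buf.append('\n'); break;
            case 'f':  buf.append('\f'); break;
            case 'r':  buf.append('\r'); break;
            case '\\': buf.append('\\'); break;
            case 'u':
                // Exactly four hex digits must follow the 'u'
                if (i + 4 >= text.length())
                    throw new IllegalArgumentException("truncated hex escape");
                int value = 0;
                for (int j = 1; j <= 4; j++) {
                    int digit = hex(text.charAt(i + j));
                    if (digit < 0)
                        throw new IllegalArgumentException("invalid hex digit in escape");
                    value = (value << 4) | digit;
                }
                buf.append((char) value);
                i += 4;
                break;
            default:
                throw new IllegalArgumentException("invalid escape sequence at index " + (i - 1));
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("first\\tsecond"));
    }
}
```

      Rejecting unknown escapes outright, rather than copying them through, is what makes the parse strict: a caller learns immediately that the input was not produced by the encoder.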
    • enquote

      public static String enquote(String string)
      Enquote a string. Functions like encode(string, true) but in addition the resulting string is surrounded by double quotes and double quotes in the string are backslash-escaped.

      Note: the strings returned by this method are not suitable for decoding by decode(java.lang.String). Use dequote(java.lang.String) instead.

      Parameters:
      string - string to enquote
      Returns:
      enquoted string, or the string null (not quoted) if string is null
      Throws:
      IllegalArgumentException - if string is null
    • enquote

      public static String enquote(byte[] data, int off, int len)
      Enquote bytes, treating them as an ASCII string.
      Parameters:
      data - ASCII character buffer
      off - starting offset in data
      len - number of characters in data to enquote, starting at off
      Returns:
      enquoted string
      Throws:
      IllegalArgumentException - if data is null
      IndexOutOfBoundsException - if off and/or len is invalid
    • dequote

      public static String dequote(String quotedString)
      Dequote a string previously enquoted by enquote(java.lang.String).
      Parameters:
      quotedString - a string returned by enquote(java.lang.String)
      Returns:
      original unquoted string
      Throws:
      IllegalArgumentException - if quotedString has an invalid format (i.e., it could not have ever been returned by enquote(java.lang.String))
    • enquotedLength

      public static int enquotedLength(String string)
      Determine the length of a string previously enquoted by enquote(java.lang.String) when it appears as the prefix of a longer string. This method assumes that the prefix is a valid quoted string; use dequote(java.lang.String) to verify.
      Parameters:
      string - a string containing a prefix returned by enquote(java.lang.String)
      Returns:
      length of the enquoted string that forms the prefix of string
      Throws:
      IllegalArgumentException - if a starting or terminating quote character is not found
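
      One way to locate the end of an enquoted prefix is to scan for the first double quote that is not preceded by a backslash escape. The sketch below illustrates that idea; it is not the library's implementation. The class name EnquotedLengthSketch is hypothetical, and counting both quote characters in the returned length is an assumption.

```java
// Hypothetical sketch: find the length of a quoted prefix by skipping escapes.
final class EnquotedLengthSketch {

    static int enquotedLength(String string) {
        if (string == null || string.isEmpty() || string.charAt(0) != '"')
            throw new IllegalArgumentException("no starting quote character");
        for (int i = 1; i < string.length(); i++) {
            char ch = string.charAt(i);
            if (ch == '\\') {
                i++;                // skip the escaped character (e.g. an escaped quote)
                continue;
            }
            if (ch == '"')
                return i + 1;       // assumed to include both quote characters
        }
        throw new IllegalArgumentException("no terminating quote character");
    }

    public static void main(String[] args) {
        String input = "\"key\" = value";
        int len = enquotedLength(input);
        System.out.println(input.substring(0, len));
    }
}
```

      Like the method documented above, this scan does not validate the escapes it skips, so the extracted prefix should still be passed to a dequoting step to confirm it is well-formed.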
    • isValidXMLChar

      public static boolean isValidXMLChar(int codepoint)
      Determine if the given character is a valid XML character according to the XML 1.0 specification.

      Valid characters are tab, newline, carriage return, and characters in the ranges 0x0020 - 0xd7ff, 0xe000 - 0xfffd, and 0x10000 - 0x10ffff.

      Parameters:
      codepoint - character codepoint
      Returns:
      true if codepoint is a valid XML character
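
      The check can be written directly from the Char production of the XML 1.0 specification, whose second range ends at 0xFFFD. The sketch below is an illustration under that reading, not the library's implementation; the class name XmlCharSketch is hypothetical.

```java
// Hypothetical sketch of the XML 1.0 "Char" production as a range check.
final class XmlCharSketch {

    static boolean isValidXMLChar(int codepoint) {
        return codepoint == '\t' || codepoint == '\n' || codepoint == '\r'
            || (codepoint >= 0x0020 && codepoint <= 0xD7FF)      // excludes other C0 controls
            || (codepoint >= 0xE000 && codepoint <= 0xFFFD)      // excludes surrogates, 0xFFFE, 0xFFFF
            || (codepoint >= 0x10000 && codepoint <= 0x10FFFF);  // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(isValidXMLChar('A'));
        System.out.println(isValidXMLChar(0x0000));
    }
}
```

      Taking an int codepoint rather than a char lets the check cover supplementary characters, which a Java String stores as surrogate pairs.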