What is a string?

Strings are the way computers represent textual information.

Classes of problems on strings

Most problems on strings involve searching for a string, or a list of strings/patterns, inside another string.
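To make the two search variants concrete, here is a minimal sketch in Python. The helper names find_all and find_any are hypothetical, chosen for illustration; the sketch relies on the built-in str.find, assumes non-empty patterns, and is not an efficient multi-pattern algorithm.

```python
def find_all(text: str, pattern: str) -> list:
    """Return the start index of every occurrence of pattern in text.

    Overlapping occurrences are reported too. Assumes pattern is non-empty.
    """
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are not skipped.
        start = text.find(pattern, start + 1)
    return positions


def find_any(text: str, patterns: list) -> dict:
    """Search for several patterns by running the single-pattern search once per pattern."""
    return {p: find_all(text, p) for p in patterns}


single = find_all("abracadabra", "abra")
several = find_any("abracadabra", ["cad", "xyz"])
```

Running one search per pattern is the simplest approach; it rescans the text for every pattern, which is the cost that dedicated multi-pattern algorithms aim to avoid.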

String representation

Encodings

In computer memory, strings are nothing more than binary data.

Data in computer memory

To interpret what the value 0x48 (the first 8 bits) means in terms of characters, we need an encoding.
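As a rough illustration, the Python sketch below reads the same bytes under different interpretations. The first byte, 0x48, comes from the text; the second byte, 0x69, is an assumption added only to make the example slightly longer. The codec name "cp037" is one EBCDIC variant shipped with Python's standard codecs.

```python
# Two raw bytes as they might sit in memory.
# 0x48 is the value from the text; 0x69 is an assumed second byte.
raw = bytes([0x48, 0x69])

# Interpreted as ASCII, the bytes spell "Hi".
as_ascii = raw.decode("ascii")

# Interpreted as EBCDIC (code page 037), the same bytes are different characters.
as_ebcdic = raw.decode("cp037")

# Interpreted as a big-endian integer, the same bytes are just a number.
as_number = int.from_bytes(raw, "big")

# Going the other way: the letter "H" is stored as a different byte in EBCDIC.
h_in_ebcdic = "H".encode("cp037")

print(as_ascii, repr(as_ebcdic), as_number, h_in_ebcdic.hex())
```

The bytes never change; only the encoding used to read them does.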

The problem of encoding predates computers. For example, Morse code encodes the letters of the Latin script as sequences of short and long signals.

Typical String Encodings

  • 1-byte encodings: EBCDIC and ASCII
  • 2-byte encodings: CJK encodings (for logographic languages), e.g. Shift-JIS, which uses 1 or 2 bytes per character
  • multibyte encodings: Unicode UTF-8, which uses 1 to 4 bytes per character
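The difference between these families shows up in how many bytes one character takes. The sketch below is a hedged illustration: the helper encoded_length is hypothetical, the sample characters are arbitrary choices, and the codec names ("ascii", "cp037", "shift_jis", "utf-8") are the ones registered in Python's standard codecs.

```python
def encoded_length(ch: str, encoding: str):
    """Return how many bytes ch occupies in the given encoding.

    Returns None when the encoding has no representation for ch.
    """
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


# Arbitrary sample characters: a Latin letter, an accented letter, a kanji.
samples = ["A", "é", "日"]
encodings = ["ascii", "cp037", "shift_jis", "utf-8"]

# Build a table: character -> {encoding -> byte count or None}.
table = {ch: {enc: encoded_length(ch, enc) for enc in encodings} for ch in samples}

for ch, row in table.items():
    print(ch, row)
```

A None entry means the character simply does not exist in that encoding, which is one reason single-byte encodings cannot cover logographic scripts.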

ASCII

ASCII was originally a 7-bit encoding.
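A 7-bit encoding has 128 code values (0 to 127), so every ASCII character fits in a byte whose top bit is zero. The following Python sketch checks that property; the function name is_seven_bit is hypothetical and used only for illustration.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits, i.e. has a value below 0x80."""
    return all(b < 0x80 for b in data)


# The letter "H" is 0x48, which needs only 7 bits.
h_code = ord("H")
h_bits = format(h_code, "07b")

plain = is_seven_bit(b"Hello")                  # only ASCII letters
accented = is_seven_bit("é".encode("utf-8"))    # UTF-8 bytes with the top bit set

print(hex(h_code), h_bits, plain, accented)
```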