Guanlun Zhao
Web developer and learner who occasionally writes.
Find me on Github: github.com/guanlun
TIL: Unicode and UTF-8
  • Unicode is a character set. It specifies how code points (numbers) map to characters, but not how those numbers are stored as bits.
  • UTF-8 is an encoding. It specifies how code points are stored as bits. A UTF-8 encoded character is 1 to 4 bytes long.
  • UTF-16 is also a variable-length encoding, but its smallest unit is 2 bytes: each code point takes either 2 or 4 bytes.
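To make the difference between a code point and its encoded bytes concrete, here is a small Python sketch. The helper name `describe` and the sample characters are my own choices for illustration. It uses the `utf-16-le` codec so that Python does not prepend a 2-byte byte order mark, which would otherwise inflate the UTF-16 counts.

```python
# Sample characters chosen to cover each UTF-8 length (1 to 4 bytes).
samples = ["A", "é", "€", "😀"]


def describe(ch):
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for one character."""
    code_point = ord(ch)                      # the Unicode number for the character
    utf8_len = len(ch.encode("utf-8"))        # bytes needed in UTF-8
    utf16_len = len(ch.encode("utf-16-le"))   # bytes needed in UTF-16, no BOM
    return code_point, utf8_len, utf16_len


for ch in samples:
    cp, n8, n16 = describe(ch)
    print(f"U+{cp:04X} {ch!r}: UTF-8 = {n8} bytes, UTF-16 = {n16} bytes")
```

The code point stays the same whichever encoding is used; only the byte count changes. For these four characters UTF-8 needs 1, 2, 3 and 4 bytes, while UTF-16 needs 2, 2, 2 and 4.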