
ASCII, Unicode and UTF-8

2022-08-10 22:31:00 TABE_

Standard ASCII

Standard ASCII, also known as basic ASCII, uses 7 binary digits to represent 128 characters (codes 0 to 127). When a character is stored in a byte, the remaining eighth bit is 0. The set covers all uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English.

ASCII uses only 7 bits, so when a character is stored in a single byte, its highest bit is always 0. One byte is enough for English text, but representing every character in the world's writing systems requires multiple bytes.
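The claim that every ASCII character fits in one byte whose top bit is 0 can be checked with a short Python sketch. `is_ascii_byte` is a hypothetical helper written for this illustration. The rest uses only the built-in `chr`, `ord` and `str.encode`.

```python
# Hypothetical helper: an ASCII byte has its most significant bit (0x80) cleared.
def is_ascii_byte(b: int) -> bool:
    return (b & 0x80) == 0

# Every code 0-127 encodes to exactly one byte with the top bit 0.
for code in range(128):
    encoded = chr(code).encode("ascii")
    assert len(encoded) == 1 and is_ascii_byte(encoded[0])

print(format(ord("A"), "08b"))  # 01000001: the leading bit is 0

# A character outside English has no ASCII code at all.
try:
    "\u4e2d".encode("ascii")  # the Chinese character U+4E2D
except UnicodeEncodeError:
    print("U+4E2D cannot be represented in ASCII")
```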

Unicode

Unicode aims to let computers represent all of the world's text. It assigns a unified, unique number (a code point) to every character in every language, which meets the requirements of cross-language and cross-platform text conversion and processing. Note that Unicode is only a character set. It specifies the code point of each symbol, but not how that number should be stored in bytes.
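The split between "which number" and "how it is stored" shows up directly in Python. The Chinese character 中 always has code point U+4E2D, but different encodings store that number as different byte sequences. The three encodings below are ordinary Python codec names, chosen only as examples.

```python
# The code point is fixed by Unicode.
ch = "\u4e2d"  # the character 中
print(f"U+{ord(ch):04X}")  # U+4E2D

# The stored bytes depend on the encoding chosen.
for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
    print(encoding, ch.encode(encoding).hex())
# utf-8 e4b8ad
# utf-16-be 4e2d
# utf-32-be 00004e2d
```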

UTF-8

UTF-8 is the most widely used Unicode encoding on the Internet. It is a variable-length encoding that uses 1 to 4 bytes per symbol, and the number of bytes depends on the symbol.
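Which symbols take how many bytes is determined by code point ranges: up to U+007F uses 1 byte, up to U+07FF uses 2, up to U+FFFF uses 3, and up to U+10FFFF uses 4. The sketch below encodes those ranges in a hypothetical helper `utf8_length` and checks it against Python's built-in UTF-8 codec for a few sample characters.

```python
def utf8_length(code_point: int) -> int:
    """Hypothetical helper: bytes UTF-8 needs for a code point, by range."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    if code_point <= 0x10FFFF:
        return 4
    raise ValueError("not a Unicode code point")

# Latin letter, accented letter, Chinese character, emoji.
for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
    data = ch.encode("utf-8")
    assert len(data) == utf8_length(ord(ch))
    print(f"U+{ord(ch):04X}: {len(data)} byte(s), {data.hex()}")
# U+0041: 1 byte(s), 41
# U+00E9: 2 byte(s), c3a9
# U+4E2D: 3 byte(s), e4b8ad
# U+1F600: 4 byte(s), f09f9880
```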

UTF-8 encoding rules:

  1. For a single-byte character, the first bit is set to 0 and the remaining 7 bits hold the character's Unicode code point. For the English characters 0 to 127, UTF-8 is therefore identical to ASCII, so documents from the ASCII era open correctly as UTF-8.
  2. For a character that needs N bytes (N > 1), the first N bits of the first byte are set to 1 and bit N + 1 is set to 0. The first two bits of each of the remaining N - 1 bytes are set to 10. All remaining bits are filled with the character's Unicode code point.
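The two rules can be turned into a hand-written encoder, shown here as a minimal sketch. `encode_utf8` is a hypothetical name, and the result is compared against Python's built-in `str.encode("utf-8")`. Two details go beyond the rules as stated. N is chosen from the code point range (U+0080 to U+07FF gives 2 bytes, up to U+FFFF gives 3, otherwise 4). The surrogate range U+D800 to U+DFFF is rejected, because valid UTF-8 excludes it.

```python
def encode_utf8(code_point: int) -> bytes:
    """Hypothetical hand-written encoder for one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"cannot encode U+{code_point:X} in UTF-8")
    # Rule 1: a single byte 0xxxxxxx, identical to ASCII.
    if code_point < 0x80:
        return bytes([code_point])
    # Rule 2: pick N from the code point's range.
    if code_point < 0x800:
        n = 2
    elif code_point < 0x10000:
        n = 3
    else:
        n = 4
    # The N - 1 continuation bytes are 10xxxxxx, each taking the next
    # 6 low-order bits of the code point.
    trailing = []
    for _ in range(n - 1):
        trailing.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # Leading byte: N ones, a zero, then the remaining high-order bits.
    lead_prefix = (0xFF << (8 - n)) & 0xFF  # n=3 gives 0b11100000
    return bytes([lead_prefix | code_point] + trailing[::-1])

for byte in encode_utf8(0x4E2D):  # the character 中
    print(format(byte, "08b"))
# 11100100
# 10111000
# 10101101
```

In the output, the first byte starts with `1110` (N = 3 ones followed by a 0), and both continuation bytes start with `10`. This matches rule 2.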

Copyright notice
This article was written by [TABE_]. Please include a link to the original when reposting. Thanks!
https://yzsam.com/2022/222/202208102148161510.html