Unicode

by moodyharsh
  1. A character is what one would normally write on a piece of paper: a letter, a digit, a punctuation mark or some other symbol.

  2. A character set is a group of characters linked by some common property.

  3. An encoding is the binary representation, that is, the rule that turns each character's number into bytes.

  4. A glyph is the actual shape we see on the screen; variations in shape are left to the font.

  5. A font is what actually supplies the glyphs at the machine level.

  6. A code point is the number assigned to each character in a character set.

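  7. Unicode is a character set whose code space runs from U+0000 to U+10FFFF: 17 planes of 256x256 points each, 1,114,112 code points in all, which are assigned to characters. A code point fits in 21 bits, although UTF-32 stores it in 32.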

  8. As of Unicode 15.1, about 150,000 code points have been assigned to characters. Scripts in common modern use are covered; what is still waiting for assignments is mostly rare or historic scripts and symbols.

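  9. In the original 32-bit (UCS-4) layout, each byte represents, from lowest to highest -> column (cell) x row x plane x group. Unicode now uses only 17 planes. ASCII fills the first 128 code points of the
    first plane (plane 0, the Basic Multilingual Plane), and this first plane also hosts nearly all commercially important languages.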

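  10. There are UTF-32 and UTF-16. UTF-32 stores every code point in a single 32-bit unit. UTF-16 uses one 16-bit unit for the first plane and, with the help of surrogate pairs (two 16-bit units), covers the entire Unicode space.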

  11. UTF-8 is arguably the best encoding available: 7-bit ASCII is kept intact as single bytes, so any ASCII text is already valid UTF-8.

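  12. All code points greater than ASCII are written starting from the last byte: the bits of the code point are filled in from right to left, six bits per trailing byte, and every trailing byte starts with the bits 10.
    If any vacancies remain in the leftmost byte, they are set to 0. The number of ones followed by a zero at the start of the first byte indicates the number of bytes (110 for two, 1110 for three, 11110 for four).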
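
To make the last few points concrete, here is a minimal Python sketch of the UTF-8 bit layout and of UTF-16 surrogate pairs. The function names utf8_encode and utf16_surrogates are made up for this note and are not part of any library; in real code one would simply call str.encode("utf-8") or str.encode("utf-16").

```python
def utf8_encode(cp):
    """Encode one code point by hand (hypothetical helper, for illustration)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp < 0x80:
        return bytes([cp])        # 7-bit ASCII stays a single byte
    if cp < 0x800:
        n, lead = 2, 0xC0         # first byte starts with 110
    elif cp < 0x10000:
        n, lead = 3, 0xE0         # first byte starts with 1110
    else:
        n, lead = 4, 0xF0         # first byte starts with 11110
    tail = []
    for _ in range(n - 1):
        # Fill from the right: the low six bits go into a 10xxxxxx byte.
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever bits are left go into the first byte; unused slots stay 0.
    return bytes([lead | cp] + tail[::-1])


def utf16_surrogates(cp):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    v = cp - 0x10000              # 20 bits remain after the offset
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print("U+%04X" % cp, utf8_encode(cp).hex(" "))
    print([hex(u) for u in utf16_surrogates(0x1F600)])
```

For example, the euro sign U+20AC comes out as the three bytes e2 82 ac, and U+1F600 splits into the surrogate pair 0xd83d, 0xde00. The result can be checked against Python's own encoder with chr(cp).encode("utf-8").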