Unicode Character Database

The Unicode Standard Character Database


The Unicode Standard (commonly known simply as Unicode) is a universal character encoding standard for written characters and text. It defines a consistent way of encoding multilingual text, so that software can process and display text in classical and modern languages as well as the notation of many technical disciplines. As the default encoding of HTML and XML, the Unicode Standard provides the pillar for the World Wide Web and the global business ecosystem of the current age. Required in new Internet protocols and implemented in all modern operating systems and programming languages, Unicode is the basis of software that must function all around the world. With Unicode, the technology industry has replaced proliferating character sets with a single, stable, and universal character repertoire that allows for global interoperability and reliable cross-language data interchange.

From a software developer's point of view, the Unicode Standard and its associated specifications provide programmers with a unified universal character encoding, extensive descriptions, and vast amounts of data about how characters in the Unicode repertoire function. The specifications describe how to form words and break lines; how to sort text in different languages; how to format numbers, dates, and times appropriately for a given language; how to display text that flows from right to left, as in the Arabic, Hebrew, and Thaana scripts; and how to display text whose written form splits, combines, and reorders, as in the languages of South Asia. Without the character properties and algorithms in the Unicode Standard and its associated core specifications, interoperability between different implementations would be impossible, and much of the vast breadth of the world’s languages would lie outside the reach of modern computer software.
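As a rough illustration of the kind of data these specifications rely on, the sketch below uses Python's standard `unicodedata` module to look up the bidirectional class of a few characters and to list the stored code points of a Devanagari syllable whose vowel sign is drawn to the left of its consonant. The sample characters and the helper names `bidi_classes` and `code_point_names` are chosen here for illustration and are not part of the Unicode Standard.

```python
import unicodedata

# Sample characters picked for illustration; the labels are ours.
SAMPLES = {
    "Latin A": "A",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "Thaana haa": "\u0780",
}


def bidi_classes(samples):
    """Map each label to the Bidi_Class value of its character."""
    return {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}


def code_point_names(text):
    """Return the character names in stored (logical) order."""
    return [unicodedata.name(ch) for ch in text]


# "L" is left-to-right, "R" is right-to-left, "AL" is right-to-left Arabic.
print(bidi_classes(SAMPLES))

# The vowel sign is stored after the consonant even though it is
# rendered before it; reordering is left to the display layer.
print(code_point_names("\u0915\u093f"))
```

The lookups only expose the raw property values. Actually laying out mixed-direction text requires a full implementation of the Bidirectional Algorithm, which this sketch does not attempt.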

The Unicode Standard associates a rich set of semantics with each encoded character: properties that are required for interoperability and correct behavior in implementations, as well as for Unicode conformance. These semantics are cataloged in the Unicode Character Database, a collection of data files that list the Unicode code points and character names. The data files also define character properties and mappings between Unicode characters (such as case mappings). As an integral part of the Unicode Standard, the Unicode Character Database contains the normative property and mapping information required to implement Unicode Standard algorithms such as the Bidirectional, Line Breaking, Normalization, Word Boundary Determination, and Case Folding algorithms. The data files contain additional informative and provisional character property information as well.
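Many programming languages ship a compiled copy of this database. As a minimal sketch, assuming Python and its standard `unicodedata` module, the following reads a handful of properties for one character and exercises two of the mappings mentioned above (normalization and case folding). The helper name `describe` and the dictionary keys are hypothetical, chosen only for this example.

```python
import unicodedata


def describe(ch):
    """Collect a few Unicode Character Database properties for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
    }


# Version of the Unicode Character Database bundled with this interpreter.
print(unicodedata.unidata_version)

print(describe("A"))

# Normalization: "e" followed by a combining acute accent composes to "é" under NFC.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")

# Case folding: the sharp s folds to "ss" for caseless comparison.
print("Straße".casefold() == "strasse")
```

The reported database version depends on the Python release, so results for recently added characters can differ between interpreters.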


  1. Swmn
    Aug 23, 2023 04:40 GMT

    Hello, I am myself

  2. ᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼
    Jul 28, 2023 10:12 GMT


  3. ‮
    Jun 12, 2023 10:42 GMT

    Reply to my comment and see the magic in the top of reply box...

    1. ᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼
      Jul 24, 2023 23:02 GMT

      Ok. ᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼ ᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼᲼

    Mar 30, 2023 14:21 GMT

    wow i really like it can you make more pls

  5. redacted
    Apr 24, 2022 09:45 GMT

    I was fed binary in my dream and I was told to type it into my phone in my dream, when I woke up it was in my phone and directly translated to ༡ and after investigating on my phone it has now stopped working completely.

    1. wow
      May 10, 2022 23:58 GMT

      bro ngl this scares me

    Feb 15, 2022 18:58 GMT


  7. grp
    Sep 17, 2020 17:19 GMT

    supposedly it is "hoax" for wikidiots, this REAL yot letter...

  8. Jeremy
    Aug 18, 2020 15:56 GMT

    How about a javascript array?
    length 1,111,998
    one index for each character
    0=not assigned;

    could fit on one page even
    very useful chunk of code too.

    1. Michael Kwayisi
      Aug 18, 2020 19:40 GMT

      Says the guy who has JavaScript disabled :)

  9. ̀̀̀̀
    May 15, 2020 16:19 GMT

    Ⲟⲩⲟⲓ ⲉ̀ⲣⲱⲧⲉⲛ

    1. ⁙⁘⁖:··⁙⁙⁘⁖:··⁙
      May 15, 2020 16:20 GMT


  10. ????
    Apr 3, 2020 18:17 GMT


    1. Michael Kwayisi
      Apr 3, 2020 21:28 GMT

      Clever U+E01F0. Got me scratching my head for a while :)
