Unicode Character Database

The Unicode Standard Characters Repertoire


The Unicode Standard specifies a numeric value (known as a code point) and a name for each of its characters. In this respect, it is similar to other character encoding standards from ASCII onward. In addition to character codes and names, other information is crucial to ensure legible text: a character’s case, directionality, and alphabetic properties must also be well defined. The Unicode Standard defines these and other semantic values, and it includes application data such as case mapping tables and character property tables as part of the Unicode Character Database. Character properties define a character’s identity and behavior; they ensure consistency in the processing and interchange of Unicode data. (See the section Unicode Character Properties.)
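
As a rough illustration of how such properties can be queried, the sketch below uses Python's standard `unicodedata` module, which bundles a subset of the Unicode Character Database (the exact UCD version depends on the Python release). The helper name `describe` is hypothetical; it looks up a character's code point, name, general category, and bidirectional class, plus its case mappings.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode Character Database properties for a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",           # numeric value in U+ notation
        "name": unicodedata.name(ch, "<no name>"),  # character name (default if unnamed)
        "category": unicodedata.category(ch),       # General_Category, e.g. 'Lu' = uppercase letter
        "bidi": unicodedata.bidirectional(ch),      # Bidi_Class, e.g. 'L' = left-to-right
        "lower": ch.lower(),                        # lowercase mapping via Python's str.lower
        "upper": ch.upper(),                        # uppercase mapping via Python's str.upper
    }


if __name__ == "__main__":
    # Two characters taken from the listing on this page; only ASCII fields are printed
    for ch in ("\u0100", "\u01C5"):
        d = describe(ch)
        print(d["code_point"], d["name"], d["category"], d["bidi"])
```

For U+01C5 (Dž), the category is `Lt` (titlecase letter), and its lowercase and uppercase mappings are U+01C6 and U+01C4, the kind of case information the paragraph above refers to.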

The Unicode codespace contains 1,114,112 code points, most of which are available for encoding characters. The majority of the common characters used in the major languages of the world are encoded in the first 65,536 code points, known as the Basic Multilingual Plane (BMP). This capacity of a little over a million characters is more than sufficient for all currently known character encoding requirements, including full coverage of all minority and historic scripts of the world. Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). The 8-bit, byte-oriented form, UTF-8, is designed for ease of use with existing ASCII-based systems. The Unicode Standard is code-for-code identical with International Standard ISO/IEC 10646, so any implementation that conforms to Unicode also conforms to ISO/IEC 10646.
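
The difference between the three encoding forms shows up in how many code units a single character needs. This minimal Python sketch (the helper `encoded_units` is hypothetical) encodes a character with the standard codecs and reports the number of code units and the bytes for each form; the big-endian codec variants are assumed so that no byte order mark is emitted.

```python
def encoded_units(ch: str) -> dict:
    """Report code-unit count and bytes (hex) for one character in each encoding form."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, so no byte order mark is added
    utf32 = ch.encode("utf-32-be")  # big-endian, so no byte order mark is added
    return {
        "UTF-8": (len(utf8), utf8.hex(" ")),          # 8-bit code units
        "UTF-16": (len(utf16) // 2, utf16.hex(" ")),  # 16-bit code units
        "UTF-32": (len(utf32) // 4, utf32.hex(" ")),  # 32-bit code units
    }


if __name__ == "__main__":
    for ch in ("A", "\u0100", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encoded_units(ch))
```

For example, `A` takes a single byte in UTF-8, identical to its ASCII encoding, while a character outside the BMP such as U+1F600 needs four UTF-8 bytes and a surrogate pair (two code units) in UTF-16. `bytes.hex` with a separator argument requires Python 3.8 or later.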

As of Version 12.1, the Unicode Standard contains a total of 137,929 characters from the world’s scripts. These characters are sufficient for communication in all modern languages and can also represent the classical forms of many languages. The Standard encompasses the European alphabetic scripts, Middle Eastern right-to-left scripts, and other regional scripts such as those of Asia and Africa. Many archaic and historic scripts are encoded as well. The Han script includes 87,887 unified ideographic characters defined by national, international, and industry standards of China, Japan, Korea, Taiwan, Vietnam, and Singapore. Additionally, the Standard contains many important symbol sets, including currency symbols, punctuation marks, mathematical symbols, technical symbols, geometric shapes, dingbats, and emoji.

List of Unicode characters

In Unicode, the range of integers used to code characters is called the codespace. A particular integer in this set is called a code point. When an abstract character is assigned to a given code point, it is referred to as an encoded character. The Unicode codespace consists of the integers from 0 to 10FFFF (hexadecimal), giving 1,114,112 code points available for mapping to abstract characters. The table below presents an ordered list of all the code points defined in the current repertoire of the Unicode Standard.
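
The codespace arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the constant and helper names are hypothetical, and code points are rendered in U+ notation with at least four hexadecimal digits, as in the listing on this page. The surrogate range is included because those 2,048 code points are reserved for UTF-16 and are never assigned to characters, which is why only "most" of the codespace is available.

```python
MAX_CODE_POINT = 0x10FFFF               # last integer in the codespace
PLANE_SIZE = 0x10000                    # 65,536 code points per plane
SURROGATES = range(0xD800, 0xE000)      # reserved for UTF-16 surrogate pairs, never assigned

CODESPACE_SIZE = MAX_CODE_POINT + 1          # 1,114,112 code points in total
PLANE_COUNT = CODESPACE_SIZE // PLANE_SIZE   # 17 planes; plane 0 is the BMP


def format_code_point(cp: int) -> str:
    """Check that cp lies in the codespace and render it as U+XXXX (at least 4 hex digits)."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"{cp:#x} is outside the Unicode codespace")
    return f"U+{cp:04X}"


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) of a valid code point."""
    format_code_point(cp)  # reuse the range check; raises ValueError if out of range
    return cp // PLANE_SIZE


if __name__ == "__main__":
    print(CODESPACE_SIZE, PLANE_COUNT, len(SURROGATES))  # 1114112 17 2048
    print(format_code_point(0x01FF), plane_of(0x01FF))   # U+01FF 0
```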

Unicode Characters (page 2 of 4352: U+0100 to U+01FF)
Ā
U+0100
ā
U+0101
Ă
U+0102
ă
U+0103
Ą
U+0104
ą
U+0105
Ć
U+0106
ć
U+0107
Ĉ
U+0108
ĉ
U+0109
Ċ
U+010A
ċ
U+010B
Č
U+010C
č
U+010D
Ď
U+010E
ď
U+010F
Đ
U+0110
đ
U+0111
Ē
U+0112
ē
U+0113
Ĕ
U+0114
ĕ
U+0115
Ė
U+0116
ė
U+0117
Ę
U+0118
ę
U+0119
Ě
U+011A
ě
U+011B
Ĝ
U+011C
ĝ
U+011D
Ğ
U+011E
ğ
U+011F
Ġ
U+0120
ġ
U+0121
Ģ
U+0122
ģ
U+0123
Ĥ
U+0124
ĥ
U+0125
Ħ
U+0126
ħ
U+0127
Ĩ
U+0128
ĩ
U+0129
Ī
U+012A
ī
U+012B
Ĭ
U+012C
ĭ
U+012D
Į
U+012E
į
U+012F
İ
U+0130
ı
U+0131
IJ
U+0132
ij
U+0133
Ĵ
U+0134
ĵ
U+0135
Ķ
U+0136
ķ
U+0137
ĸ
U+0138
Ĺ
U+0139
ĺ
U+013A
Ļ
U+013B
ļ
U+013C
Ľ
U+013D
ľ
U+013E
Ŀ
U+013F
ŀ
U+0140
Ł
U+0141
ł
U+0142
Ń
U+0143
ń
U+0144
Ņ
U+0145
ņ
U+0146
Ň
U+0147
ň
U+0148
ʼn
U+0149
Ŋ
U+014A
ŋ
U+014B
Ō
U+014C
ō
U+014D
Ŏ
U+014E
ŏ
U+014F
Ő
U+0150
ő
U+0151
Œ
U+0152
œ
U+0153
Ŕ
U+0154
ŕ
U+0155
Ŗ
U+0156
ŗ
U+0157
Ř
U+0158
ř
U+0159
Ś
U+015A
ś
U+015B
Ŝ
U+015C
ŝ
U+015D
Ş
U+015E
ş
U+015F
Š
U+0160
š
U+0161
Ţ
U+0162
ţ
U+0163
Ť
U+0164
ť
U+0165
Ŧ
U+0166
ŧ
U+0167
Ũ
U+0168
ũ
U+0169
Ū
U+016A
ū
U+016B
Ŭ
U+016C
ŭ
U+016D
Ů
U+016E
ů
U+016F
Ű
U+0170
ű
U+0171
Ų
U+0172
ų
U+0173
Ŵ
U+0174
ŵ
U+0175
Ŷ
U+0176
ŷ
U+0177
Ÿ
U+0178
Ź
U+0179
ź
U+017A
Ż
U+017B
ż
U+017C
Ž
U+017D
ž
U+017E
ſ
U+017F
ƀ
U+0180
Ɓ
U+0181
Ƃ
U+0182
ƃ
U+0183
Ƅ
U+0184
ƅ
U+0185
Ɔ
U+0186
Ƈ
U+0187
ƈ
U+0188
Ɖ
U+0189
Ɗ
U+018A
Ƌ
U+018B
ƌ
U+018C
ƍ
U+018D
Ǝ
U+018E
Ə
U+018F
Ɛ
U+0190
Ƒ
U+0191
ƒ
U+0192
Ɠ
U+0193
Ɣ
U+0194
ƕ
U+0195
Ɩ
U+0196
Ɨ
U+0197
Ƙ
U+0198
ƙ
U+0199
ƚ
U+019A
ƛ
U+019B
Ɯ
U+019C
Ɲ
U+019D
ƞ
U+019E
Ɵ
U+019F
Ơ
U+01A0
ơ
U+01A1
Ƣ
U+01A2
ƣ
U+01A3
Ƥ
U+01A4
ƥ
U+01A5
Ʀ
U+01A6
Ƨ
U+01A7
ƨ
U+01A8
Ʃ
U+01A9
ƪ
U+01AA
ƫ
U+01AB
Ƭ
U+01AC
ƭ
U+01AD
Ʈ
U+01AE
Ư
U+01AF
ư
U+01B0
Ʊ
U+01B1
Ʋ
U+01B2
Ƴ
U+01B3
ƴ
U+01B4
Ƶ
U+01B5
ƶ
U+01B6
Ʒ
U+01B7
Ƹ
U+01B8
ƹ
U+01B9
ƺ
U+01BA
ƻ
U+01BB
Ƽ
U+01BC
ƽ
U+01BD
ƾ
U+01BE
ƿ
U+01BF
ǀ
U+01C0
ǁ
U+01C1
ǂ
U+01C2
ǃ
U+01C3
DŽ
U+01C4
Dž
U+01C5
dž
U+01C6
LJ
U+01C7
Lj
U+01C8
lj
U+01C9
NJ
U+01CA
Nj
U+01CB
nj
U+01CC
Ǎ
U+01CD
ǎ
U+01CE
Ǐ
U+01CF
ǐ
U+01D0
Ǒ
U+01D1
ǒ
U+01D2
Ǔ
U+01D3
ǔ
U+01D4
Ǖ
U+01D5
ǖ
U+01D6
Ǘ
U+01D7
ǘ
U+01D8
Ǚ
U+01D9
ǚ
U+01DA
Ǜ
U+01DB
ǜ
U+01DC
ǝ
U+01DD
Ǟ
U+01DE
ǟ
U+01DF
Ǡ
U+01E0
ǡ
U+01E1
Ǣ
U+01E2
ǣ
U+01E3
Ǥ
U+01E4
ǥ
U+01E5
Ǧ
U+01E6
ǧ
U+01E7
Ǩ
U+01E8
ǩ
U+01E9
Ǫ
U+01EA
ǫ
U+01EB
Ǭ
U+01EC
ǭ
U+01ED
Ǯ
U+01EE
ǯ
U+01EF
ǰ
U+01F0
DZ
U+01F1
Dz
U+01F2
dz
U+01F3
Ǵ
U+01F4
ǵ
U+01F5
Ƕ
U+01F6
Ƿ
U+01F7
Ǹ
U+01F8
ǹ
U+01F9
Ǻ
U+01FA
ǻ
U+01FB
Ǽ
U+01FC
ǽ
U+01FD
Ǿ
U+01FE
ǿ
U+01FF
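
A listing like the one above can be regenerated from Python's bundled character data. The sketch assumes, by inference from the page count, that each page holds 256 consecutive code points (1,114,112 / 4,352 = 256), so page 2 covers U+0100 to U+01FF. The helper names and the choice to skip unassigned and surrogate code points are assumptions, not a description of how this site builds its pages.

```python
import unicodedata

PAGE_SIZE = 256  # assumption: 1,114,112 code points / 4,352 pages = 256 per page


def page_range(page: int) -> range:
    """Code points on a 1-based page, assuming 256 consecutive code points per page."""
    start = (page - 1) * PAGE_SIZE
    return range(start, start + PAGE_SIZE)


def page_listing(page: int) -> list:
    """(character, 'U+XXXX') pairs for one page, skipping unassigned and surrogate code points."""
    rows = []
    for cp in page_range(page):
        ch = chr(cp)
        # 'Cn' = unassigned, 'Cs' = surrogate, per the UCD version bundled with Python
        if unicodedata.category(ch) in ("Cn", "Cs"):
            continue
        rows.append((ch, f"U+{cp:04X}"))
    return rows


if __name__ == "__main__":
    # Print each code point label with its character name (ASCII-only output)
    for ch, label in page_listing(2):
        print(label, unicodedata.name(ch))
```

Every code point from U+0100 to U+01FF is assigned, so `page_listing(2)` returns all 256 rows shown above, while a page made up entirely of surrogates returns an empty list.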

NOTE: The Unicode Standard does not encode idiosyncratic, novel, or private-use characters, nor does it encode logos or graphics. Graphologies unrelated to text, such as dance notations, are likewise outside the scope of Unicode. Font variants are explicitly not encoded. The Standard reserves 6,400 code points in the BMP for private use, which may be used to assign codes to characters not included in the Unicode repertoire. Another 131,068 private-use code points are available outside the BMP, should 6,400 prove insufficient for particular applications.
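
The private-use figures in the note follow from the ranges the Standard sets aside: U+E000 to U+F8FF in the BMP, and the Supplementary Private Use Areas in planes 15 and 16, each of which stops two code points short of the plane's end because U+xFFFE and U+xFFFF are noncharacters. The Python sketch below (helper names hypothetical) recomputes the counts and cross-checks the ranges against the `Co` (Private Use) general category reported by the standard `unicodedata` module.

```python
import unicodedata

# Private Use Areas as inclusive (first, last) code point pairs
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(cp: int) -> bool:
    """True if cp falls in one of the private-use ranges above."""
    return any(first <= cp <= last for first, last in PRIVATE_USE_RANGES)


def matches_ucd(cp: int) -> bool:
    """Cross-check the hard-coded ranges against General_Category 'Co' in Python's bundled UCD."""
    return is_private_use(cp) == (unicodedata.category(chr(cp)) == "Co")


def range_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


bmp_private = range_size(*PRIVATE_USE_RANGES[0])                             # 6,400
supplementary_private = sum(range_size(*r) for r in PRIVATE_USE_RANGES[1:])  # 131,068

if __name__ == "__main__":
    print(bmp_private, supplementary_private)  # 6400 131068
    print(all(matches_ucd(cp) for cp in (0xE000, 0xF8FF, 0xFFFFD, 0xFFFFE, 0x10FFFD)))
```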
