Unicode Character Database

The Unicode Standard Character Properties


A character property is a named attribute of an entity in the Unicode Standard, associated with a defined set of values. The Standard specifies many types of character properties. Some, such as the case of a character, can be interpreted independently of context; others, such as directionality, apply to a character sequence as a whole rather than to the individual characters that compose it. Properties also differ in the kind of entity they describe. A code point property refers to the inherent attributes of code points, irrespective of any particular encoded character. An abstract character property, on the other hand, refers to attributes of abstract characters per se, based on their independent existence as elements of writing systems or other notational systems, irrespective of their encoding in the Unicode Standard.

For each encoded character property, there is a mapping from every code point to some value in the set of values associated with that property. Encoded character properties are defined this way to facilitate the implementation of character property APIs based on the Unicode Character Database. Typically, an API takes a property and a code point as input and returns a value for that property as output, interpreting it as the “character property” of the “character” encoded at that code point. In some cases, an encoded character property is exactly equivalent to a code point property. In others, it reflects an abstract character property but extends the scope of that property to all code points, including unassigned ones. In many instances, however, it is semantically complex and may telescope together values associated with a number of abstract character properties and/or code point properties.
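The API shape described above (a property and a code point in, a property value out) can be sketched in Python. This is a minimal illustration, not a complete property API: `get_property` and the property labels in `_LOOKUPS` are hypothetical names chosen for the sketch, and only the handful of properties exposed by Python's standard `unicodedata` module are covered. The results reflect whichever version of the Unicode Character Database ships with the Python build in use.

```python
import unicodedata

# Hypothetical property labels mapped to real unicodedata lookups.
# The labels are chosen for this sketch; they are not an official API.
_LOOKUPS = {
    "General Category": unicodedata.category,
    "Bidi Class": unicodedata.bidirectional,
    "Canonical Combining Class": unicodedata.combining,
    "East Asian Width": unicodedata.east_asian_width,
    "Bidi Mirrored": lambda ch: bool(unicodedata.mirrored(ch)),
}


def get_property(prop: str, code_point: int):
    """Return the value of `prop` for `code_point` (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return _LOOKUPS[prop](chr(code_point))


if __name__ == "__main__":
    # A letter, a right-to-left letter, a combining mark, a bracket,
    # and a noncharacter code point.
    for cp in (0x0041, 0x05D0, 0x0301, 0x0028, 0xFFFF):
        values = {prop: get_property(prop, cp) for prop in _LOOKUPS}
        print(f"U+{cp:04X}", values)
```

The lookup accepts any code point in the range U+0000 to U+10FFFF, not only assigned characters. For example, the noncharacter U+FFFF still yields a General Category value (`Cn`), which illustrates how a property's scope is extended to every code point.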

List of Unicode character properties

In the Unicode Standard, the terms “Unicode character property,” “character property,” and “property,” when used without a qualifier, refer to an encoded character property unless otherwise indicated. The table below lists the encoded character properties formally considered part of the latest version of the Unicode Standard. For each property whose type is given as “Enum,” the list of associated values can be found on that property's linked page.

No.  Property                            Type
3    ASCII Hex Digit                     Enum
4    Bidi Class                          Enum
5    Bidi Control                        Enum
6    Bidi Mirrored                       Enum
7    Bidi Mirroring Glyph                Scalar
8    Bidi Paired Bracket                 Scalar
9    Bidi Paired Bracket Type            Enum
11   Canonical Combining Class           Enum
12   Case Folding                        Scalar
13   Case Ignorable                      Enum
15   Changes When Casefolded             Enum
16   Changes When Casemapped             Enum
17   Changes When Lowercased             Enum
18   Changes When NFKC Casefolded        Enum
19   Changes When Titlecased             Enum
20   Changes When Uppercased             Enum
21   CJK Accounting Numeric              Scalar
22   CJK Compatibility Variant           Scalar
23   CJK II Core                         Scalar
24   CJK IRG G-Source                    Scalar
25   CJK IRG H-Source                    Scalar
26   CJK IRG J-Source                    Scalar
27   CJK IRG K-Source                    Scalar
28   CJK IRG KP-Source                   Scalar
29   CJK IRG M-Source                    Scalar
30   CJK IRG T-Source                    Scalar
31   CJK IRG U-Source                    Scalar
32   CJK IRG V-Source                    Scalar
33   CJK Other Numeric                   Scalar
34   CJK Primary Numeric                 Scalar
35   Composition Exclusion               Enum
37   Decomposition Mapping               Scalar
38   Decomposition Type                  Enum
39   Default Ignorable Code Point        Enum
42   East Asian Width                    Enum
43   Equivalent Unified Ideograph        Scalar
44   Expands on NFC                      Enum
45   Expands on NFD                      Enum
46   Expands on NFKC                     Enum
47   Expands on NFKD                     Enum
49   FC NFKC Closure                     Scalar
50   Full Composition Exclusion          Enum
51   General Category                    Enum
52   Grapheme Base                       Enum
53   Grapheme Cluster Break              Enum
54   Grapheme Extend                     Enum
55   Grapheme Link                       Enum
56   Hangul Syllable Type                Enum
57   Hex Digit                           Enum
59   ID Continue                         Enum
60   ID Start                            Enum
62   IDS Binary Operator                 Enum
63   IDS Trinary Operator                Enum
64   Indic Positional Category           Enum
65   Indic Syllabic Category             Enum
66   ISO Comment                         Scalar
67   Jamo Short Name                     Enum
68   Join Control                        Enum
69   Joining Group                       Enum
70   Joining Type                        Enum
71   Line Break                          Enum
72   Logical Order Exception             Enum
74   Lowercase Mapping                   Scalar
77   Name Alias                          Scalar
78   NFC Quick Check                     Enum
79   NFD Quick Check                     Enum
80   NFKC Casefold                       Scalar
81   NFKC Quick Check                    Enum
82   NFKD Quick Check                    Enum
83   Noncharacter Code Point             Enum
84   Numeric Type                        Enum
85   Numeric Value                       Scalar
86   Other Alphabetic                    Enum
87   Other Default Ignorable Code Point  Enum
88   Other Grapheme Extend               Enum
89   Other ID Continue                   Enum
90   Other ID Start                      Enum
91   Other Lowercase                     Enum
92   Other Math                          Enum
93   Other Uppercase                     Enum
94   Pattern Syntax                      Enum
95   Pattern White Space                 Enum
96   Prepended Concatenation Mark        Enum
97   Quotation Mark                      Enum
99   Regional Indicator                  Enum
101  Script Extensions                   Scalar
102  Sentence Break                      Enum
103  Sentence Terminal                   Enum
104  Simple Case Folding                 Scalar
105  Simple Lowercase Mapping            Scalar
106  Simple Titlecase Mapping            Scalar
107  Simple Uppercase Mapping            Scalar
108  Soft Dotted                         Enum
109  Terminal Punctuation                Enum
110  Titlecase Mapping                   Scalar
111  Unicode 1 Name                      Scalar
112  Unicode Radical Stroke              Scalar
113  Unified Ideograph                   Enum
115  Uppercase Mapping                   Scalar
116  Variation Selector                  Enum
117  Vertical Orientation                Enum
118  White Space                         Enum
119  Word Break                          Enum
120  XID Continue                        Enum
121  XID Start                           Enum
Showing 1 - 121 of 121 properties

NOTE: Numeric properties (whose values are numbers that can take any integer or real value) and string-valued properties (whose values are strings) are marked “Scalar” in the table above. All other property value types designated in the Unicode Standard, including enumerated, closed enumeration, boolean, and catalog properties, are marked “Enum” to make it easier to browse the characters that share each value.
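The difference between the two type labels can be seen with a few properties that Python's standard `unicodedata` module and built-in string methods expose. The sketch below is only illustrative: `describe` is a hypothetical helper, the grouping follows the “Type” column of the table, and `str.upper()` and `str.casefold()` are used here as stand-ins for the Uppercase Mapping and Case Folding properties on the assumption that they follow Python's built-in full case mappings.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Group a few property values of `ch` by the table's "Type" column.

    Hypothetical helper for illustration; it covers only properties that
    the Python standard library happens to expose.
    """
    return {
        # "Enum": the value comes from a fixed set of symbolic values
        # (including the two-valued boolean case).
        "Enum": {
            "General Category": unicodedata.category(ch),
            "Bidi Class": unicodedata.bidirectional(ch),
            "Bidi Mirrored": bool(unicodedata.mirrored(ch)),
        },
        # "Scalar": the value is a number or a string.
        "Scalar": {
            # None when the character has no numeric value.
            "Numeric Value": unicodedata.numeric(ch, None),
            # The mapped string may be longer than one character.
            "Uppercase Mapping": ch.upper(),
            "Case Folding": ch.casefold(),
        },
    }


if __name__ == "__main__":
    # VULGAR FRACTION ONE HALF and LATIN SMALL LETTER SHARP S
    for ch in ("\u00bd", "\u00df"):
        print(f"U+{ord(ch):04X}", describe(ch))
```

For U+00BD the Numeric Value is the real number 0.5, while its General Category is one of a closed set of two-letter codes. For U+00DF the uppercase mapping is the two-character string `SS`, which shows why string-valued properties are not enumerable in the same way.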
