Forms of ⟨g⟩ rendered with and without a loop tail are allographs of each other

In graphemics and typography, the term allograph refers to a glyph that is a design variant of a grapheme, such as a letter, a number, an ideograph, a punctuation mark or other typographic symbol.

Graphemics


In graphemics, an obvious example in the Latin alphabet (and many other writing systems) is the distinction between uppercase and lowercase letters. Allographs can vary greatly, without affecting the underlying identity of the grapheme. Even if the word "cat" is rendered as "cAt", it remains recognizable as the sequence of the three graphemes ⟨c⟩, ⟨a⟩, ⟨t⟩.[1]

Letters and other graphemes can also have significant variations that many readers may not notice. The letter g, for example, has two common forms in different typefaces and varies widely in handwriting. A positional example of allography is the long s ⟨ſ⟩, a symbol that was once widely used as a non-final allograph of the lowercase letter s. The Arabic script has particularly strong positional allography: Arabic letters have two to four allographs depending on their position in the word.[2]
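Unicode happens to encode a few positional allographs as separate compatibility characters, which makes the relationship easy to see in code. The sketch below uses only Python's standard `unicodedata` module to fold the long s (U+017F) and the four Arabic presentation forms of beh (U+FE8F to U+FE92) back to their base letters with NFKC normalization. Treating NFKC as an "allograph folding" step is an illustrative assumption; in ordinary Arabic text the positional shapes are normally selected by the font's shaping engine, and the stored text contains only the base letter U+0628.

```python
import unicodedata

# Positional allographs that Unicode encodes as compatibility characters.
samples = {
    "\u017F": "long s",
    "\uFE8F": "beh, isolated form",
    "\uFE90": "beh, final form",
    "\uFE91": "beh, initial form",
    "\uFE92": "beh, medial form",
}


def fold_allograph(ch: str) -> str:
    """Map a compatibility allograph to its base letter via NFKC normalization."""
    return unicodedata.normalize("NFKC", ch)


for ch, label in samples.items():
    base = fold_allograph(ch)
    print(f"U+{ord(ch):04X} ({label}) -> U+{ord(base):04X} {unicodedata.name(base)}")
```

The long s folds to U+0073 LATIN SMALL LETTER S, and all four beh forms fold to U+0628 ARABIC LETTER BEH.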

Allographs can cause difficulty for character recognition, both by humans and by computers. Children learning to read do not immediately realize that allographs represent the same character; this skill develops over the first years of reading instruction. Mismatches between the allographs used in reading and writing (e.g., reading manuscript/block letters but writing cursive) may inhibit students' ability to recognize and name letters.[3] Computerized optical character recognition (OCR) systems run into difficulties with allographs similar to those humans face.[4] Many character recognition algorithms have been developed to alleviate the allograph problem for different input methods, languages and users.[5]
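One general way to cope with allographs in recognition is to model each grapheme as several shape clusters rather than a single template, and to map whichever allograph matches back to its grapheme. The Python sketch below illustrates that idea as a nearest-prototype classifier. The two-dimensional feature vectors, the prototype values and the `PROTOTYPES` table are entirely hypothetical; real systems, such as the Devanagari stroke-clustering work cited above, extract far richer features from pen strokes or pixels.

```python
import math

# Hypothetical 2-D feature vectors; a real recognizer would extract far richer
# features. Each grapheme keeps one prototype per allograph.
PROTOTYPES: dict[str, list[tuple[float, float]]] = {
    "g": [(2.0, 0.9), (1.0, 0.8)],  # e.g. loop-tail g, open-tail g
    "a": [(1.0, 0.0), (1.2, 0.1)],  # e.g. double-storey a, single-storey a
    "s": [(0.0, 0.0), (0.0, 0.7)],  # e.g. round s, long s
}


def classify(features: tuple[float, float]) -> str:
    """Return the grapheme whose nearest allograph prototype is closest to `features`."""
    best_letter, best_dist = "", math.inf
    for letter, allographs in PROTOTYPES.items():
        for proto in allographs:
            dist = math.dist(features, proto)  # Euclidean distance (Python 3.8+)
            if dist < best_dist:
                best_letter, best_dist = letter, dist
    return best_letter


# Two quite different shapes are both reported as the same grapheme.
print(classify((1.9, 0.85)))   # g
print(classify((1.05, 0.75)))  # g
```

Keeping several prototypes per grapheme means a single class boundary never has to stretch across two visually unrelated shapes.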

A further complication of allographs is that a grapheme variant can acquire a separate meaning in a specialized writing system. Two symbols that are allographic in one setting may represent different meanings in another. For example, in the International Phonetic Alphabet used in linguistics, a and ɑ represent different sounds, even though they are allographs of a lower-case a in normal English usage.[6]

Such variants have distinct code points in Unicode and thus are not allographs for some applications. Because they have separate code points, even allographs like upper- versus lower-case letters may be treated as different characters by some computer applications (e.g., case-dependent passwords).[7]
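The two effects described above can be checked directly with Python's standard `unicodedata` module and built-in string methods. In this sketch, `describe` is a hypothetical helper. It shows that IPA's a and ɑ are separate code points, that a plain case-sensitive comparison (as used for passwords) treats case allographs as different characters, and that case folding, one way an application can opt into case-insensitive matching, unifies case variants but not a and ɑ.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as its code point and Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# IPA distinguishes these letters, so Unicode gives them separate code points.
print(describe("a"))       # U+0061 LATIN SMALL LETTER A
print(describe("\u0251"))  # U+0251 LATIN SMALL LETTER ALPHA

# A case-sensitive comparison treats upper- and lowercase allographs as different.
print("cAt" == "cat")                        # False
# Case folding makes the comparison case-insensitive.
print("cAt".casefold() == "cat".casefold())  # True
# Case folding does not merge a and ɑ: they are distinct letters, not case variants.
print("\u0251".casefold() == "a")            # False
```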

Typography

Official dimensions of the euro sign
Allographs of the euro sign in a selection of typefaces

In typography, the term 'allograph' is used more specifically to describe the different representations of the same grapheme or character in different typefaces.[8] The resulting glyphs may look quite different in shape and style from the reference character or each other, but nevertheless their meaning remains the same.[9]

In Unicode, each character is allocated a code point: all allographs of that character share that code point, so the essential meaning is retained whatever font is chosen for printing or display. For example, U+0067 g LATIN SMALL LETTER G is typically given a loop tail in serif typefaces but not in sans-serif faces (e.g., Times New Roman: g, Helvetica: g), yet its code point is constant and its meaning persists irrespective of typeface.[a]
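Because glyph shape belongs to the font rather than to the text, a program reading the text sees only the code point. The short Python check below, using the standard `unicodedata` module, illustrates this: whichever typeface renders it, U+0067 stays U+0067, while the IPA script g mentioned in the note is a different character and is not merged with it even by NFKC normalization.

```python
import unicodedata

text = "g"  # the stored character says nothing about loop tail vs open tail
print(hex(ord(text)), unicodedata.name(text))  # 0x67 LATIN SMALL LETTER G

script_g = "\u0261"  # separately encoded IPA letter
print(hex(ord(script_g)), unicodedata.name(script_g))  # 0x261 LATIN SMALL LETTER SCRIPT G

print(text == script_g)                                # False
print(unicodedata.normalize("NFKC", script_g) == text)  # False: no decomposition
```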

Typography of Han characters


In the Han script, there exist several graphemes that have more than one written representation. Han typefaces often contain many variants of some graphemes. Different regional standards have adopted certain character variants. For instance:

Standard Allograph Dictionary definition
Mainland China
Japan
Taiwan

Homoglyph


The concept of the allograph may be compared and contrasted with that of the homoglyph: glyphs of different meaning that are visually similar. For example, the letter O and the figure 0 have similar shapes but different meanings, and the three letters A, Α and А look identical but belong to three different scripts (Latin, Greek and Cyrillic). Homoglyphs can be exploited in IDN homograph attacks, in which an attacker uses homoglyphs to create a URL that looks identical to a legitimate one but leads the user to a different location.[10]
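One common defence against such attacks is to flag domain labels that mix scripts and to show the user the ASCII (Punycode) form instead. The Python sketch below is a simplified illustration, not a production check: the hypothetical helper `scripts_of` uses the first word of each letter's Unicode name as a crude stand-in for the Unicode Script property (which the standard `unicodedata` module does not expose), and Python's built-in `idna` codec produces the ASCII form.

```python
import unicodedata


def scripts_of(label: str) -> set[str]:
    """Rough script guess from the first word of each letter's Unicode name (heuristic)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


# Three look-alike capitals from three different scripts.
for ch in ("A", "\u0391", "\u0410"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

spoof = "\u0430pple.com"  # Cyrillic small a followed by Latin "pple.com"
print(spoof == "apple.com")        # False
print(sorted(scripts_of(spoof)))   # ['CYRILLIC', 'LATIN']: a mixed-script label
print(spoof.encode("idna"))        # ASCII form of the label begins with b'xn--'
```

A mixed-script result is a warning sign rather than proof of an attack, since some legitimate names combine scripts.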


Notes

  1. ^ The code point U+0261 ɡ LATIN SMALL LETTER SCRIPT G in the IPA Extensions block is specified for use with the International Phonetic Alphabet and so is incidental to this discussion.

References

  1. ^ "allograph". The Cambridge Encyclopedia of Language (second ed.). Cambridge University Press. 1997. p. 196.
  2. ^ Milo, Thomas; González Martínez, Alicia (2019). "A New Strategy for Arabic OCR: Archigraphemes, Letter Blocks, Script Grammar, and shape synthesis". Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage. pp. 93–96. doi:10.1145/3322905.3322928. ISBN 978-1-4503-7194-0.
  3. ^ Bara, Florence; Morin, Marie-France; Alamargot, Denis; Bosse, Marie-Line (January 2016). "Learning different allographs through handwriting: The impact on letter knowledge and reading acquisition". Learning and Individual Differences. 45: 88–94. doi:10.1016/j.lindif.2015.11.020.
  4. ^ Parizeau, M.; Plamondon, R. (1994). "Machine vs humans in a cursive script reading experiment without linguistic knowledge". Proceedings of the 12th IAPR International Conference on Pattern Recognition (Cat. No.94CH3440-5). Vol. 2. pp. 93–98. doi:10.1109/ICPR.1994.576882. ISBN 0-8186-6270-0.
  5. ^ Bharath; Madhvanath, Sriganesh (3 October 2014). "Allograph modeling for online handwritten characters in Devanagari using constrained stroke clustering". ACM Transactions on Asian Language Information Processing. 13 (3): 1–21. doi:10.1145/2629622.
  6. ^ "The International Phonetic Alphabet and the IPA Chart". International Phonetic Association. Retrieved 7 November 2025.
  7. ^ Kumar, Sanjeev (15 October 2012). "A Comparative Study of UTF-8, UTF-16, and UTF-32 of Unicode Code Point". The IUP Journal of Telecommunications. IV (2): 50–59. SSRN 2161812.
  8. ^ Milo, Thomas (2012). "Arabic Script Tutorial". nuqta.com. Retrieved 24 November 2019. In Arabic the abstract, nominal graphemes are represented by context-dependent allographs. Simplified support for Arabic handles contextual allographs according to two patterns, discontinuous and continuous assimilation. (Allographs and Ligatures)
  9. ^ Rothlein, David; Rapp, Brenda (3 April 2017). "The role of allograph representations in font-invariant letter identification". Journal of Experimental Psychology: Human Perception and Performance. 43 (7): 1411–1429. doi:10.1037/xhp0000384. PMC 5481478. PMID 28368166.
  10. ^ Umawing, Jovi (5 October 2017). "Out of character: Homograph attacks explained". Malwarebytes Labs. Retrieved 7 November 2025.
