Digraph (orthography)
A digraph or bigraph is a pair of letters used to write one sound. This is often, but not necessarily, a sound (or more precisely a phoneme) which cannot be expressed using a single letter in the alphabet used for writing.
Sometimes, when digraphs do not represent a new phoneme, they are a relic from an earlier period in the language's history when they did (or remain phonemic only in certain dialects, e.g. wh in English).
Some schemes of transliteration into the Roman alphabet make extensive use of digraphs (e.g. Cyrillic to Roman for English readers), while others rely solely on diacritics (e.g. Cyrillic to the modified Roman used for Turkish). To avoid ambiguity, transliteration based on diacritics is generally preferred in academic circles. Many writing systems, like Cyrillic and Devanagari, have no digraphs, and so transliterations into languages using them also cannot use digraphs.
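The ambiguity mentioned above can be sketched in a few lines of Python. The two tables below are toy assumptions covering only three Cyrillic letters; they are not a published transliteration standard, and the function name `romanize` is purely illustrative.

```python
# Toy romanization tables (illustrative assumptions, not a real standard).
# Keys are Cyrillic letters; values are Latin-script replacements.
DIGRAPH_SCHEME = {"ш": "sh", "с": "s", "х": "h"}
DIACRITIC_SCHEME = {"ш": "š", "с": "s", "х": "h"}

def romanize(text, scheme):
    """Replace each Cyrillic letter found in the scheme; leave other characters unchanged."""
    return "".join(scheme.get(ch, ch) for ch in text)

# Under the digraph scheme, the single letter "ш" and the sequence "сх"
# produce the same output, so the romanized text cannot be reversed reliably.
print(romanize("ш", DIGRAPH_SCHEME), romanize("сх", DIGRAPH_SCHEME))
# Under the diacritic scheme the two inputs stay distinct.
print(romanize("ш", DIACRITIC_SCHEME), romanize("сх", DIACRITIC_SCHEME))
```

In this sketch the digraph scheme loses information that the diacritic scheme keeps, which is one plausible reason for the academic preference described above.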
There are three kinds of digraphs: sequences, reversals (really a special kind of sequence) and doubled letters.
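As a rough illustration of this three-way classification, the following Python sketch labels a digraph as a doubled letter, a reversal or a plain sequence. The function name and the small inventory are assumptions made for the example; a reversal is detected simply by checking whether the opposite letter order is also in the inventory.

```python
def classify_digraph(digraph, inventory):
    """Return 'doubled', 'reversal' or 'sequence' for a two-letter digraph."""
    if len(digraph) != 2:
        raise ValueError("a digraph has exactly two letters")
    first, second = digraph
    if first == second:
        return "doubled"
    # A reversal is a sequence whose opposite order is also a digraph.
    if second + first in inventory:
        return "reversal"
    return "sequence"

# Hypothetical inventory, chosen only to exercise all three cases.
inventory = {"ch", "sh", "re", "er", "ll", "ee"}
for d in sorted(inventory):
    print(d, classify_digraph(d, inventory))
```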
Sequences
A sequence is a pair of two different letters.
Examples from languages include:
- Basque
- tx, corresponds to /tʃ/ (voiceless postalveolar affricate)
- Czech
- ch, corresponds to /x/ (voiceless velar fricative)
- Dutch
- ch, corresponds to /x/ (voiceless velar fricative)
- eu, corresponds to /ø/ (close-mid front rounded vowel)
- ie, corresponds to /i/ (close front unrounded vowel)
- ng, corresponds to /ŋ/ (velar nasal)
- oe, corresponds to /u/ (close back rounded vowel)
- sj, corresponds to /ʃ/ (voiceless postalveolar fricative)
- English
- ch, usually corresponds to /tʃ/ (voiceless postalveolar affricate) or /ʃ/ (voiceless postalveolar fricative)
- th, usually corresponds to /θ/ (voiceless interdental fricative) or /ð/ (voiced interdental fricative)
- sh, corresponds to /ʃ/ (voiceless postalveolar fricative)
- ng, corresponds to /ŋ/ (velar nasal)
- kn, corresponds to /n/ (alveolar nasal)
- ph, corresponds to /f/ (voiceless labiodental fricative)
- gh, corresponds to /f/ (voiceless labiodental fricative) or is silent
- ck, corresponds to /k/ (voiceless velar plosive)
- ea, ie, ei correspond mostly to /iː/ (close front unrounded vowel)
- ai, ay correspond mostly to /eɪ/ (diphthong: close-mid front unrounded vowel followed by close front unrounded vowel)
- ue corresponds to /uː/ (close back rounded vowel)
- French
- ai, equivalent to è, corresponds to /ɛ/ (open-mid front unrounded vowel)
- au, corresponds to /o/ (close-mid back rounded vowel)
- ch, corresponds to /ʃ/ (voiceless postalveolar fricative)
- ou, corresponds to /u/ (close back rounded vowel) or /w/ (labio-velar approximant)
- gn, corresponds to /ɲ/ (palatal nasal)
- qu, corresponds to /k/ (voiceless velar stop), typically before historic front vowels
See also French phonology and orthography.
- Italian
- gl before -i (with some exceptions), similar to ll in Spanish
- gn, similar to ñ in Spanish, like ny in English canyon, corresponds to /ɲ/ (palatal nasal)
- sc before -i and -e, like sh in English, corresponds to /ʃ/ (voiceless postalveolar fricative)
- ch corresponds to /k/ (only used before i, e)
- gh corresponds to /g/ (only used before i, e)
- Modern Greek
- αι (ai), corresponds to /e/
- ει (ei), corresponds to /i/
- οι (oi), corresponds to /i/
- ου (ou), corresponds to /u/
- γκ (gk), corresponds to /g/
- μπ (mp), corresponds to /b/
- ντ (nt), corresponds to /d/
Some of the above depend on context — see Greek alphabet.
- Polish
- dz
- dzi
- dź
- dż
- ch
- rz
- sz, as sh in English
- Portuguese
- ch, like sh in English
- lh, similar to ll in Spanish, like lli in English million
- nh, similar to ñ in Spanish, like ny in English canyon
- qu, as k in English
- sc
- xc
- ss, provides for a sibilant s between two vowels, where a single s is pronounced like English z
- rr, throaty r sound in middle of words
- Spanish
- ch, corresponds to (voiceless postalveolar affricate)
- gu, as g in English before e or i. Pronounced /gw/ before a, o and u.
- gü, corresponds to [gw]. Used only before the letters e and i.
- ll
- qu, as k in English. Used only before the letters e and i.
- rr
- Welsh
- ch, corresponds to /χ/ (voiceless uvular fricative), similar to French "r"
- ng, corresponds to /ŋ/ (velar nasal), the same sound as in English
- ph, corresponds to /f/ (voiceless labiodental fricative)
- rh, corresponds to a voiceless R, pronounced roughly like the English combination HR
- th, corresponds to /θ/ (voiceless interdental fricative)
Reversals
Reversals are sequences in which both possible orders of letters are common enough to be digraphs.
- English
- re corresponds to
- le corresponds to
Doubled letters
These have both letters the same. In some languages these indicate length, a stressed syllable or a new sound, and in some cases they are just part of the spelling convention. Ll is the most common in English, though it represents no new sound, but that is not the case in other languages; Welsh's ll is a voiceless lateral, and in Spanish it is a palatalized l (Castilian only) or else a palatal fricative. Ee and oo are common examples from English. Rr in Spanish and Portuguese indicates a trill, and forms a minimal pair with the single r. Italian's zz represents an affricate, /ts/ or /dz/.
- English
- ll corresponds to /l/ (voiced alveolar lateral approximant)
- ee corresponds to /iː/ (close front unrounded vowel)
- oo corresponds to /uː/ (close back rounded vowel)
- Dutch
- aa corresponds to /aː/ (open front unrounded vowel)
- ee corresponds to /eː/ (close-mid front unrounded vowel)
- oo corresponds to /oː/ (close-mid back rounded vowel)
- uu corresponds to /y/ (close front rounded vowel)
- cc (any consonant) corresponds to c and changes the preceding vowel to its "short" variant
- Welsh
- dd, a voiced dental fricative, like English then
- ff, the voiceless labiodental fricative, (like English f, as Welsh F is pronounced like English V)
- ll, a voiceless alveolar lateral fricative, (see Welsh pronunciation guide for more details)
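Because a digraph counts as one unit of the spelling, splitting a word into its written units requires looking at two letters at a time. The sketch below does this for the Welsh digraphs listed in this article, using a simple greedy left-to-right match. The function name is an assumption, and the greedy rule is a simplification: it will wrongly merge letters in words where, for instance, n and g are separate letters that happen to be adjacent.

```python
# Welsh digraphs taken from the lists in this article.
WELSH_DIGRAPHS = {"ch", "dd", "ff", "ll", "ng", "ph", "rh", "th"}

def split_graphemes(word, digraphs=WELSH_DIGRAPHS):
    """Split a lowercase word into written units, treating listed digraphs as single units."""
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in digraphs:
            units.append(pair)   # consume both letters of the digraph
            i += 2
        else:
            units.append(word[i])  # ordinary single letter
            i += 1
    return units

print(split_graphemes("ffordd"))
print(split_graphemes("llan"))
```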
See also
- orthography
- trigraph
- diphthong
- ligature (typography)
Phoneme
In human language, a phoneme is a set of phones (speech sounds or sign elements) that are cognitively equivalent. It is the basic unit that distinguishes words and morphemes. That is, changing an element of a word from one phoneme to another produces either a different word or obvious nonsense; whereas changing an element from one phone to another, when both belong to the same phoneme, produces the same word with an odd or incomprehensible pronunciation.
Phonemes are not the physical segments themselves, but mental abstractions of them. A phoneme is a family of phones, called allophones, that the speakers of a language think of, and hear or see, as being the same.
In sign languages, the phoneme was formerly called a chereme (or cheireme), but usage changed to phoneme when it was recognized that the mental abstractions involved are essentially the same as in oral languages.
A "perfect" alphabet is one that has a single symbol for each phoneme.
Phonemics, a branch of phonology, is the study of the systems of phonemes of languages.
Although it is fundamental to most phonological theories, some linguists reject the theoretical validity of the phoneme. Some think that phonemes are more a product of literacy (i.e., the need to categorize the phonetics of a language in order to write it down systematically with a minimum number of letters). Other critics charge that the mind processes sub-phonemic elements of speech (e.g., features) in meaningful ways.
A common test to determine whether two phones are allophones or separate phonemes relies on finding so-called minimal pairs: words that differ only in the phones in question.
Background and related ideas
The term phonème was reportedly first used by Dufriche-Desgenettes in 1873, but it referred only to a sound of speech. The term phoneme as an abstraction was developed by the Polish linguist Jan Niecisław Baudouin de Courtenay and his student Mikołaj Kruszewski during 1875-1895. The term used by these two was fonema, the basic unit of what they called psychophonetics. The concept of the phoneme was elaborated in the works of Nikolai Trubetzkoy and others of the Prague School (during the years 1926-1935), as well as in that of structuralists like Ferdinand de Saussure, Edward Sapir, and Leonard Bloomfield. Later, it was also used in generative linguistics, most famously by Noam Chomsky and Morris Halle, and remains central in accounts of the development of virtually all modern schools of phonology.
The phoneme can be defined as "the smallest meaningful psychological unit of sound." The phoneme has mental, physiological, and physical substance: our brains process the sounds; the sounds are produced by the human speech organs; and the sounds are physical entities that can be recorded and measured.
For an example of phonemes, consider the English words pat and sat, which differ only in their initial consonants. This difference, known as contrastiveness or opposition, is sufficient to distinguish these words, and therefore the P and S sounds are said to be different phonemes. A pair of words that are identical except for such a sound are known as a minimal pair; this is the most frequent demonstration that two sounds are separate phonemes.
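The minimal-pair test described above is easy to mechanize once words are stored as sequences of phoneme symbols. The following Python sketch is one way to do it; the tiny lexicon and its transcriptions are assumptions made for illustration (only pat and sat come from the text), and the function names are hypothetical.

```python
from itertools import combinations

def is_minimal_pair(a, b):
    """True if two transcriptions have the same length and differ in exactly one segment."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def minimal_pairs(lexicon):
    """Return all pairs of words in the lexicon whose transcriptions form a minimal pair."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(sorted(lexicon.items()), 2):
        if is_minimal_pair(t1, t2):
            pairs.append((w1, w2))
    return pairs

# Toy lexicon: each word maps to a tuple of phoneme symbols (assumed transcriptions).
lexicon = {
    "pat": ("p", "æ", "t"),
    "sat": ("s", "æ", "t"),
    "sad": ("s", "æ", "d"),
    "pit": ("p", "ɪ", "t"),
}
print(minimal_pairs(lexicon))
```

Each pair returned is evidence that the two differing segments are separate phonemes; pat and sat, for example, support the contrast between the P and S sounds.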
If no minimal pair can be found to demonstrate that two sounds are distinct, it may be that they are allophones. Allophones are variant phones (i.e., sounds) that are not recognized as distinct by a speaker, are not meaningfully different in the language, and are therefore perceived as "the same". This is especially likely if they consistently occur in different environments. For example, the "dark" L sound at the end of the English word "wool" is quite different from the "light" L sound at the beginning of the word "leaf", but this difference is meaningless in English, and is determined by whether the sound is at the beginning or end of a word. A native English speaker might have a hard time hearing the difference at first, but in Turkish the difference between "light" and "dark" L is sufficient to distinguish words. That is, they are two separate phonemes in Turkish, but allophones of a single phoneme in English.
The phonemic relationship of two sounds may not be obvious to a non-native speaker, which is why minimal pairs and an understanding of phonetic environments are important. For example, in Korean, there is a phoneme /r/ that is a flapped r between vowels, and is an l-sound next to other consonants. These sound very different to an English speaker, who is attuned to hearing them because the differences are meaningful in English. However, the native speaker has learned from an early age to filter out the difference, as it is not meaningful in their language. In Korean, for instance, it is impossible to distinguish the two words "ram" and "lam", despite the fact that both R and L sounds occur in the language.
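The idea that allophones "consistently occur in different environments" can be checked mechanically: two phones whose sets of environments never overlap are in complementary distribution. The sketch below assumes observations are recorded as (phone, environment) pairs; the data are toy examples rather than real corpus counts, and the second data set is an invented stand-in for a language in which both sounds occur in the same position.

```python
from collections import defaultdict

def environments(observations):
    """Group observed environments by phone."""
    envs = defaultdict(set)
    for phone, env in observations:
        envs[phone].add(env)
    return envs

def in_complementary_distribution(phone_a, phone_b, observations):
    """True if the two phones never occur in the same environment."""
    envs = environments(observations)
    return envs[phone_a].isdisjoint(envs[phone_b])

# Toy English-like data: light l word-initially, dark ɫ word-finally.
english_like = [("l", "word-initial"), ("ɫ", "word-final"), ("l", "word-initial")]
# Toy data for a language where both sounds occur in the same position.
contrastive = [("l", "word-final"), ("ɫ", "word-final")]

print(in_complementary_distribution("l", "ɫ", english_like))
print(in_complementary_distribution("l", "ɫ", contrastive))
```

Complementary distribution is suggestive rather than conclusive: it indicates that two phones may be allophones, but phonetic similarity and speaker judgments are normally considered as well.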
The exact number of phonemes in English depends on the speaker and the method of determining phoneme vs. allophone, but estimates typically range from 40 to 45, which is above average across all languages. Pirahã has only 10, while !Xóõ has 141.
Depending on the language and the alphabet used, a phoneme may be written consistently with one letter; however there are many exceptions to this rule — see Writing systems below.
Some languages make use of pitch for precisely the same purpose, that is, to distinguish words. In this case, the tones used are called tonemes. Some languages distinguish words made up of the same phonemes (and tonemes) by using different durations of some elements, which are called chronemes. However, the chroneme is not employed by the majority of scholars working on languages with distinctive duration, and the term itself may not even be recognized by most linguists. Usually, long vowels and consonants are represented either by a length indicator or doubling of the sound in question.
In sign languages, phonemes may be classified as Tab (elements of location, from Latin tabula), Dez (the hand shape, from designator), Sig (the motion, from signation), and with some researchers, Ori (orientation). Facial expressions and mouthing are also phonemic.
Notation
A transcription that only indicates the different phonemes of a language is said to be phonemic. Such transcriptions are enclosed within virgules (slashes), / /; these show that each enclosed symbol is claimed to be phonemically meaningful. On the other hand, a transcription that indicates finer detail, including allophonic variation like the two English L's, is said to be phonetic, and is enclosed in square brackets, [ ].
The common notation used in linguistics employs virgules (slashes) (/ /) around the symbol that stands for the phoneme. For example, the phoneme for the initial consonant sound in the word "phoneme" would be written as /f/. In other words, the graphemes are <ph>, but this digraph represents one sound, /f/. Allophones, real speech variants of a phoneme, are often denoted in linguistics by the use of diacritical or other marks added to the phoneme symbols and then placed in square brackets ([ ]) to differentiate them from the phoneme in slant brackets (/ /). The conventions of orthography are then kept separate from both phonemes and allophones by the use of the markers < > to enclose the spelling.
The symbols of the International Phonetic Alphabet (IPA) and extended sets adapted to a particular language are often used by linguists to write phonemes of oral languages, with the principle being one symbol equals one categorical sound. Due to problems displaying some symbols in the early days of the Internet, systems such as X-SAMPA and Kirshenbaum were developed to represent IPA symbols in plain text. As of 2004, any modern web browser can display IPA symbols (as long as the operating system provides the appropriate fonts), and we use this system in this article.
The only published set of phonemic symbols for a sign language is the Stokoe notation developed for American Sign Language, which has since been applied to British Sign Language by Kyle and Woll, and to Australian Aboriginal sign languages by Adam Kendon. However, there are several phonetic systems, such as SignWriting.
Examples
Examples of phonemes in the English language would include sounds from the set of English consonants, like and . These two are most often written consistently with one letter for each sound. However, phonemes might not be so apparent in written English, such as when they are typically represented with combined letters, called digraphs, like <sh> (pronounced ) or <ch> (pronounced ).
To see a list of the phonemes in the English language, see IPA for English.
Two sounds that may be allophones (sound variants belonging to the same phoneme) in one language may belong to separate phonemes in another language or dialect. In English, for example, has aspirated and non-aspirated allophones:aspirated as in , and non-aspirated as in . However, in many languages (e. g. Chinese), aspirated is a phoneme distinct from unaspirated . As another example, there is no distinction between and in Japanese, there is only one phoneme in Japanese, although the Japanese has allophones that make it sound more like an , , or to English speakers. The sounds and are distinct phonemes in English, but allophones in Spanish. (as in run) and (as in rung) are phonemes in English, but allophones in Italian and Spanish.
An important phoneme is the chroneme, a phonemically relevant extension of the duration of a consonant or vowel. Some languages or dialects such as Finnish or Japanese allow chronemes after both consonants and vowels. Others, like Italian or Australian English, use it after only one (in the case of Italian, consonants; in the case of Australian, vowels).
Arguments against the phoneme
Rather than a basic mental unit of language, some think that the phoneme may well be a perceptual artifact of alphabetic literacy (see the terms Phonemic awareness and Phonological awareness). If not that, it may be an epiphenomenal aspect to listening removed from face-to-face encounters, that is, text-like listening (qv phone and feature). It could be said that the unit of the phoneme is a necessary construct if we wish to set a dynamic, complex spoken language into static, written form expressed at a sub-syllabic level, though the model is a simplification and nowhere near phonologically or phonetically complete. The phoneme has a theoretical weakness from the perspective of phonology, in that it uses, in part, lexical criteria to determine something that is supposed to be phonological (i.e., minimal pairs of words to point out phonological categories).
Much of phonology, while accepting the phoneme as a possible model or unit of language for description, has largely moved past the segmental phoneme as a basic unit of speech, of speech processing or of language acquisition. This is because the concept of the 'feature' is viewed as beneath the level of the phoneme while also spanning across segments. Meanwhile, attempts at capturing a phonological picture of the psychological control and structure underlying real speech flounder on the inadequacies of the phoneme for such purposes (that is, the phoneme cannot account for co-articulation or assimilation of controlled speech, among other phenomena). However, the term, though variably defined and delimited, remains a widely and uncritically accepted concept in foreign language teaching and native literacy (especially for alphabetic languages, such as English).
Restricted phonemes
A restricted phoneme is a phoneme that can only occur in a certain environment: There are restrictions as to where it can occur. English has several restricted phonemes:
- /ŋ/, as in sing, occurs only at the end of a syllable, never at the beginning. (In many other languages, such as Swahili, /ŋ/ can start a word.)
- /h/ occurs only at the beginning of a syllable, never at the end. (A few languages such as Arabic allow /h/ at the ends of words.)
- In many American dialects with the cot-caught merger, /ɔ/ occurs only before /r/, /l/, and in the diphthong /ɔɪ/.
- In non-rhotic dialects, /r/ can only occur before a vowel, never at the end of a word or before a consonant.
- Under most interpretations, /j/ and /w/ occur only before a vowel, never at the end of a syllable. However, many phonologists interpret a word like boy as either /bɔɪ/ or /bɔj/.
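Restrictions of this kind can be expressed as simple positional checks on a syllable. The sketch below covers only two of the English restrictions listed above (the velar nasal of sing, written /ŋ/, and /h/); the function name and the representation of a syllable as a tuple of phoneme symbols are assumptions made for the example.

```python
def restriction_violations(syllable):
    """Return a list of restricted-phoneme violations found in one syllable.

    The syllable is a sequence of phoneme symbols, e.g. ("s", "ɪ", "ŋ").
    Only two English restrictions are modelled in this sketch.
    """
    problems = []
    if syllable and syllable[0] == "ŋ":
        problems.append("/ŋ/ at the beginning of a syllable")
    if syllable and syllable[-1] == "h":
        problems.append("/h/ at the end of a syllable")
    return problems

print(restriction_violations(("s", "ɪ", "ŋ")))  # sing: no violations expected
print(restriction_violations(("ŋ", "a")))       # syllable-initial velar nasal
print(restriction_violations(("a", "h")))       # syllable-final h
```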
Neutralization, archiphoneme, underspecification
Phonemes that are contrastive in certain environments may not be contrastive in all environments. In the environments where they don't contrast, the contrast is said to be neutralized. In English there are three nasal phonemes, /m, n, ŋ/, as shown by minimal triplets such as sum, sun, sung.
However, these sounds are not contrastive before plosives such as /p, t, k/. Although all three phones appear before plosives, for example in limp, lint, link, only one of these may appear before each of the plosives. That is, the /m, n, ŋ/ distinction is neutralized before each of the plosives /p, t, k/:
- Only /m/ occurs before /p/,
- only /n/ before /t/, and
- only /ŋ/ before /k/.
Thus these phonemes are not contrastive in these environments, and according to some theorists, there is no evidence as to what the underlying representation might be. If we hypothesize that we are dealing with only a single underlying nasal, there is no reason to pick one of the three phonemes over the other two.
(In some languages there is only one phonemic nasal anywhere, and due to obligatory assimilation, it surfaces as [m, n, ŋ] in just these environments, so this idea is not as far-fetched as it might seem at first glance.)
In certain schools of phonology, such a neutralized distinction is known as an archiphoneme (Nikolai Trubetzkoy of the Prague school is often associated with this analysis). Archiphonemes are often notated with a capital letter. Following this convention, the neutralization of /m, n, ŋ/ before /p, t, k/ could be notated as |N|, and limp, lint, link would be represented as |lɪNp, lɪNt, lɪNk|. (The |pipes| indicate underlying representation.) Other notations for this archiphoneme are also possible, such as |m-n-ŋ|.
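One way to picture the archiphoneme analysis is as a rule that fills in the place of articulation of |N| from the following plosive. The Python sketch below does this for the limp, lint, link examples; the function name and the decision to raise an error when |N| is not followed by a plosive are assumptions of the sketch, not part of any particular theory.

```python
# Nasal that surfaces before each plosive (from the limp, lint, link examples).
PLACE_ASSIMILATION = {"p": "m", "t": "n", "k": "ŋ"}

def realize(underlying):
    """Replace each archiphoneme N with the nasal matching the following plosive."""
    segments = list(underlying)
    surface = []
    for i, seg in enumerate(segments):
        if seg == "N":
            following = segments[i + 1] if i + 1 < len(segments) else None
            if following not in PLACE_ASSIMILATION:
                # This sketch only defines N before p, t or k.
                raise ValueError("N must be followed by p, t or k")
            surface.append(PLACE_ASSIMILATION[following])
        else:
            surface.append(seg)
    return "".join(surface)

for form in ("lɪNp", "lɪNt", "lɪNk"):
    print(form, "->", realize(form))
```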
Another example from English is the neutralization of the plosives /k, g/ following /s/. Phonetically, the unaspirated tenuis plosive in sky is closer to English /g/, which is partially voiceless in initial position, than to aspirated /k/. This can be heard by comparing the sky with this guy; also, young children who are not yet able to produce consonant clusters often pronounce sky as what sounds like guy to adult ears. That is, /k/ and /g/ are contrastive word-initially, but not after an /s/.
Thus one cannot say whether the underlying representation of the plosive in sky is /skai/ without aspiration, or /sgai/ without voicing. This neutralization can instead be represented as an archiphoneme |G|, in which case the underlying representation of sky would be |sGai|.
Another way to talk about archiphonemes involves the concept of underspecification. Phonemes can be considered fully specified segments while archiphonemes are underspecified segments. In Tuvan, phonemic vowels are specified with the features of tongue height, backness, and lip rounding. The archiphoneme |U| is an underspecified high vowel where only the tongue height is specified.
Whether |U| is pronounced as front or back and whether rounded or unrounded depends on vowel harmony. If |U| occurs following a front unrounded vowel, it will be pronounced as the phoneme /i/; if following a back unrounded vowel, it will be an /ɯ/; and if following a back rounded vowel, it will be an /u/. This can be seen in the following words:
It should be noted that not all phonologists accept the concept of archiphonemes. Many doubt that it reflects how people process language.
Non-phonemes
Prothesis, epenthesis and paragoge due to phonotactics add sounds into words without adding meaning. Nevertheless, the sound is added, and thus the phoneme status may be ambiguous. For example, Spanish prothetic e- must be added before consonant clusters, e.g. estrés.
Phonological extremes
Of all the sounds that a human vocal tract can create, different languages vary considerably in the number of these sounds that are considered to be distinctive phonemes in the speech of that language. Ubyx and some dialects of Abkhaz have only two phonemic vowels, and many Native American languages have three. At the other extreme, the Bantu language Ngwe has fourteen vowel qualities, twelve of which may occur long or short, for twenty-six oral vowels, plus six nasalized vowels, long and short, for thirty-eight vowels; while !Xóõ achieves thirty-one pure vowels (not counting vowel length, which it also has) by varying the phonation. Rotokas has only six consonants, while !Xóõ has somewhere in the neighborhood of seventy-seven, and Ubyx eighty-one. French has no phonemic tone or stress, while several of the Kam-Sui languages have nine tones, and one of the Kru languages, Wobe, has been claimed to have fourteen, though this is disputed. The total number of phonemes in languages varies from as few as eleven in Rotokas to as many as 112 in !Xóõ (including four tones). These may range from familiar sounds to very unusual ones produced in extraordinary ways (see: Click consonant, phonation, airstream mechanism). The English language itself uses a rather large set of thirteen to twenty-two vowels, including diphthongs, though its twenty-two to twenty-six consonants are close to average. (There are twenty-one consonant and five vowel letters in the English alphabet, but this does not correspond to the number of consonant and vowel sounds.)
The most common vowel system consists of the five vowels /i/, /e/, /a/, /o/, /u/. The most common consonants are /p/, /t/, /k/, /m/, /n/. A very few languages lack one of these: standard Hawaiian lacks /t/, Mohawk lacks /p/ and /m/, Hupa lacks both /p/ and a simple /k/, colloquial Samoan lacks /t/ and /n/, while Rotokas and Quileute lack /m/ and /n/. While most of these languages have very small inventories, Quileute and Hupa have quite complex consonant systems.
The ways that sounds are pronounced can vary slightly from language to language even if the same IPA symbol is used. For example, the Finnish word maat ("countries") sounds different from the British English (Received Pronunciation) word mart even though both are transcribed as IPA [http://www.helsinki.fi/hum/hyfl/projektit/vokaalikartat_eng.html#sweswedish_vowels]; the Spanish word sin ("without") has a somewhat different vowel from the American English seen though both are transcribed as .
Writing systems
In a phonemic writing system, a given symbol represents a single phoneme and each phoneme is represented by a single symbol. This may differ from a phonetic orthography, which only requires that the spelling be unambiguously determined by the pronunciation, and the pronunciation unambiguously indicated by the spelling. English spelling is the classic example of a nonphonemic, and indeed unphonetic, spelling system. Welsh and Irish are, by contrast, among the more predictable orthographies of languages using the Latin alphabet. In French, rules to predict pronunciation from spelling are quite simple and have few exceptions, as long as there are some clues such as context or part of speech, but guessing spelling from pronunciation is quite difficult, especially because of the many silent letters. Italian, Spanish and especially Finnish have a very close letter-to-phoneme correspondence. Karelian has a perfectly phonemic spelling system: it has no standard language, but it does have a complete spelling system.
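The definition in the first sentence above amounts to a one-to-one correspondence between symbols and phonemes, which can be tested directly. The following sketch assumes the orthography is given as a list of (symbol, phoneme) pairs; both example inventories are toy data invented for the illustration, with the second loosely modelled on the way English c, k and s overlap.

```python
def is_phonemic(correspondences):
    """True if every symbol spells exactly one phoneme and every phoneme has exactly one symbol."""
    symbol_to_phonemes = {}
    phoneme_to_symbols = {}
    for symbol, phoneme in correspondences:
        symbol_to_phonemes.setdefault(symbol, set()).add(phoneme)
        phoneme_to_symbols.setdefault(phoneme, set()).add(symbol)
    return (all(len(p) == 1 for p in symbol_to_phonemes.values())
            and all(len(s) == 1 for s in phoneme_to_symbols.values()))

# Toy orthography with a strict one-to-one mapping.
ideal = [("p", "p"), ("a", "a"), ("t", "t")]
# Toy English-like fragment: c spells two phonemes, and /k/ and /s/ each have two spellings.
english_like = [("c", "k"), ("k", "k"), ("c", "s"), ("s", "s")]

print(is_phonemic(ideal))
print(is_phonemic(english_like))
```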
However, the split between phonemic and nonphonemic orthographies is exaggerated. All languages are written with conventions that represent both meaning and pronunciation. This is true at both ends of the scale: Chinese characters are first and foremost symbols of words, but they have some phonetic information as well. At the other extreme, there are a few orthographies which are perfect phonemic representations of an artificial national standard, but since they make no effort to represent variation in pronunciation within the language, they too are conventional.
Other languages fall somewhere in between. Although English is often given as an example of an unphonetic orthography, its system is nowhere near as purely conventional as Chinese writing. English spelling conveys etymological information, but also vast amounts of phonetic information. Spanish is often given as an example of a phonetic orthography, but it has numerous imperfections including silent letters. It is, at least, possible to tell the correct pronunciation of any written Spanish word. Another phonemic orthography is Serbian. Its phonemicity was established by the Serbian "Webster", Vuk Stefanović Karadžić. He followed a strict phonemic principle, which is best told in his own words: "Write as you speak and read as it is written." Hindi, a descendant of Sanskrit, is an example of a language with a phonetic orthography written in a non-Roman script.
See also
- Minimal pair
- Phone
- Phonology
- Emic and etic
- Tone (linguistics)
- Morphophonology
- List of phonetics topics
- Initial-stress-derived noun
External links
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAPhoneme.htm What is a phoneme? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAnAllophone.htm What is an allophone? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAPhone.htm What is a phone? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAPhoneticallySimilarSegm.htm What is a phonetically similar segment? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAMinimalPair.htm What is a minimal pair? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsComplementaryDistributio.htm What is complementary distribution? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAnEnvironment.htm What is an environment? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsContrastInIdenticalEnvir.htm What is an contrast in identical environments? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsContrastInAnalogousEnvir.htm What is an contrast in analogous environments? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/ComparisonOfMorphemeMorphAllom.htm Comparison of morpheme-morph-allomorph & phoneme-phone-allophone? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/Phonology.htm What is phonology? (SIL)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=phoneme Phoneme (Lexicon of Linguistics)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=allophony Allophony (Lexicon of Linguistics)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=transcription Transcription (Lexicon of Linguistics)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=Grapheme-phoneme+conversion Grapheme-Phoneme Conversion (Lexicon of Linguistics)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=Phoneme+restoration Phoneme Restoration (Lexicon of Linguistics)]
- [http://moodle.ed.uiuc.edu/wiked/index.php/Phonemic_awareness phonemic awareness]
Alphabet
An alphabet is a complete standardized set of letters — basic written symbols — each of which roughly represents a phoneme of a spoken language, either as it exists now or as it may have been in the past. There are other systems of writing such as logograms, in which each symbol represents a morpheme, or word, and syllabaries, in which each symbol represents a syllable.
The word "alphabet" itself comes from alpha and beta, the first two symbols of the Greek alphabet. There are dozens of alphabets in use today. Most of them are 'linear', which means that they are made up of lines. Notable exceptions are the Braille alphabet, Morse code and the cuneiform alphabet of the ancient city of Ugarit.
Types
Among segmental scripts (that is, scripts that use a separate glyph for each phoneme, commonly called "alphabets"), one may distinguish abjads, which only record consonants and were first developed by the Egyptians as part of their hieroglyphic script; true alphabets which record consonants and vowels separately, first developed by the Greeks; and abugidas, in which the vowels are indicated by diacritical marks or systematic modification of the form of the consonants, first developed by the Indians. Examples of present-day abjads are the Arabic and Hebrew scripts; true alphabets include Latin, Cyrillic, and Korean Hangul; and abugidas are used to write Amharic, Hindi, and Thai. The Canadian Aboriginal Syllabics are also an abugida rather than a syllabary, as a glyph stands for a consonant and is rotated to represent the vowel, rather than each consonant-vowel combination being represented by a separate glyph, as in a true syllabary.
The boundaries between these three types are not always clear-cut. For example, Iraqi Kurdish is written in the Arabic script, which is normally an abjad. However, in Kurdish, writing the vowels is mandatory, and full letters are used, so the script is a true alphabet. Other languages may use a Semitic abjad with mandatory vowel diacritics, effectively making them abugidas. On the other hand, the Phagspa script of the Mongol Empire was based closely on the Tibetan abugida, but all vowel marks were written after the preceding consonant rather than as diacritic marks. Although short a was not written, as in the abugidas, one could argue that the linear arrangement made this a true alphabet. Conversely, the vowel marks of the Ge'ez abugida have been so completely assimilated into their consonants that the system is learned as a syllabary rather than as a segmental script. Even more extreme, the Pahlavi abjad became logographic. (See below.)
Thus the primary classification of alphabets reflects how they treat vowels. For tonal languages, further classification can be based on the treatment of tone, though there are as yet no names to distinguish the various types. Some alphabets disregard tone entirely, especially when it does not carry a heavy functional load, as in Somali and many other languages of Africa and the Americas. Such scripts are to tone what abjads are to vowels. Most commonly, tones are indicated with diacritics, the way vowels are treated in abugidas. This is the case for Vietnamese (a true alphabet) and Thai (an abugida). In Thai, tone is determined primarily by the choice of consonant, with diacritics for disambiguation. In the Pollard script (an abugida), vowels are indicated by diacritics, but the placement of the vowel relative to the consonant indicates the tone. More rarely, a script has separate letters for the tones, as is the case for Hmong and Zhuang. For many of these languages, regardless of whether letters or diacritics are used, the most common tone is not marked, just as the most common vowel is not marked in Indic abugidas.
Alphabets can be quite small. The Book Pahlavi script, an abjad, had only twelve letters at one point, and may have had even fewer later on. Today the Rotokas alphabet has only twelve letters. (The Hawaiian alphabet is sometimes claimed to be as small, but it actually consists of 18 letters, including the ʻokina and five long vowels.) While Rotokas has a small alphabet because it has few phonemes to represent (just eleven), Book Pahlavi was small because many letters had been conflated, that is, the graphic distinctions had been lost over time, and diacritics were not developed to compensate for this as they were in Arabic, another script that lost many of its distinct letter shapes. For example, a comma-shaped letter represented g, d, y, k, and j. However, such simplifications can perversely make a script more complicated. In later Pahlavi papyri, up to half of the remaining graphic distinctions were lost, and the script could no longer be read as a sequence of letters at all, but had to be learned as word symbols – that is, as logograms like Egyptian Demotic.
The largest segmental script is probably an abugida, Devanagari. When written in Devanagari, Vedic Sanskrit has an alphabet of 53 letters, including the visarga mark for final aspiration and special letters for kš and jñ, though one of the letters is theoretical and not actually used. The Hindi alphabet must represent both Sanskrit and modern vocabulary, and so has been expanded to 58 with the khutma letters (letters with a dot added to represent sounds from Persian and English).
The largest known abjad is Sindhi, with 51 letters. The largest true alphabets include Kabardian and Abkhaz (for Cyrillic), with 58 and 56 letters, respectively, and Slovak (for the Latin alphabet), with 46. However, these scripts either include di- and tri-graphs, similar to Spanish ch, or diacritics, like Slovak č. The largest true alphabet where each letter is graphically independent is probably Georgian, with 41 letters.
Syllabaries typically include 50 to 400 glyphs (though the Múra-Pirahã language of Brazil would require only 24 if tone were not indicated, and Rotokas 30), and the glyphs of logographic systems number from the hundreds to the thousands. Thus a simple count of the number of distinct symbols is an important clue to the nature of an unknown script.
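As a rough illustration of how a symbol count narrows down the nature of an unknown script, the sketch below encodes the ranges quoted in this article. The function name and the exact cut-offs are assumptions made for the example (in particular, "hundreds" is read as 100 or more). Because the ranges overlap, the function returns every type compatible with the count rather than a single answer.

```python
def candidate_script_types(symbol_count):
    """Return the script types compatible with a count of distinct symbols.

    The cut-offs are illustrative assumptions based on figures in the text,
    not established thresholds.
    """
    candidates = []
    # Segmental scripts: from about a dozen letters (Rotokas) up to 58
    # (Hindi in Devanagari, Kabardian in Cyrillic).
    if 1 <= symbol_count <= 58:
        candidates.append("segmental")
    # Syllabaries: typically 50 to 400 glyphs, exceptionally as few as 24.
    if 24 <= symbol_count <= 400:
        candidates.append("syllabary")
    # Logographic systems: hundreds to thousands of glyphs.
    if symbol_count >= 100:
        candidates.append("logographic")
    return candidates


print(candidate_script_types(26))
print(candidate_script_types(300))
print(candidate_script_types(2000))
```

A count alone is only a clue: 26 symbols is compatible with both a segmental script and a very small syllabary, so the result still has to be weighed against other evidence.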
It is not always clear what constitutes a distinct alphabet. French uses the same basic alphabet as English, but many of the letters can carry diacritic and other marks (for example, é, à or ô). In French, these marks are not considered to create additional letters. However, in Icelandic, the accented letters (such as á, í and ö) are considered distinct letters of the alphabet. Some adaptations of the Latin alphabet are augmented with ligatures, such as æ in Old English and Ȣ in Algonquian; by borrowings from other alphabets, such as the thorn þ in Old English and Icelandic, which came from the Futhark runes; and by modifying existing letters, such as the eth ð of Old English and Icelandic, which came from d. Other alphabets use only a subset of the Latin alphabet, such as Hawaiian, or Italian, which uses the letters j, k, x, y and w only for foreign words.
Spelling
Each language may establish certain general rules that govern the association between letters and phonemes, but, depending on the language, these rules may or may not be consistently followed. In a perfectly phonological alphabet, the phonemes and letters would correspond perfectly in two directions: a writer could predict the spelling of a word given its pronunciation, and a speaker could predict the pronunciation of a word given its spelling. However, languages often evolve independently of their writing systems, and writing systems have been borrowed for languages they were not designed for, so the degree to which letters of an alphabet correspond to phonemes of a language varies greatly from one language to another and even within a single language.
Languages may fail to achieve a one-to-one correspondence between letters and sounds in any of several ways:
- A language may represent a given phoneme with a combination of letters rather than just a single letter. Two-letter combinations are called digraphs and three-letter groups are called trigraphs. Kabardian uses a tetragraph (four letters) for one of its phonemes.
- A language may represent the same phoneme with two different letters or combinations of letters.
- A language may spell some words with unpronounced letters that exist for historical or other reasons.
- Pronunciation of individual words may change according to the presence of surrounding words in a sentence.
- Different dialects of a language may use different phonemes for the same word.
- A language may use different sets of symbols or different rules for distinct sets of vocabulary items (such as the Japanese hiragana and katakana syllabaries, or the various rules in English for spelling words from Latin and Greek, or the original Germanic vocabulary).
National languages generally elect to address the problem of dialects by simply associating the alphabet with the national standard. However, with an international language with wide variations in its dialects, such as English, it would be impossible to represent the language in all its variations with a single phonetic alphabet.
Some national languages like Finnish have a very regular spelling system with a nearly one-to-one correspondence between letters and phonemes. The Italian verb corresponding to 'spell', compitare, is unknown to many Italians because the act of spelling itself is almost never needed: each phoneme of Standard Italian is represented in only one way. However, pronunciation cannot always be predicted from spelling because certain letters are pronounced in more than one way. In standard Spanish, it is possible to tell the pronunciation of a word from its spelling, but not vice versa; this is because certain phonemes can be represented in more than one way, but a given letter is consistently pronounced. French, with its silent letters and its heavy use of nasal vowels and elision, may seem to lack much correspondence between spelling and pronunciation, but its rules on pronunciation are actually consistent and predictable with a fair degree of accuracy. At the other extreme, however, are languages such as English and Irish, where the spelling of many words simply has to be memorized as they do not correspond to sounds in a consistent way. For English, this is because the Great Vowel Shift occurred after the orthography was established, and because English has acquired a large number of loanwords at different times retaining their original spelling at varying levels. However, even English has general rules that predict pronunciation from spelling, and these rules are successful most of the time.
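The two directions of predictability described above can be checked mechanically. The following is a minimal sketch, assuming a language is summarised as a list of (spelling, phoneme) pairs; the function name and the toy data are hypothetical and cover only a few letters, not a full orthography.

```python
from collections import defaultdict


def correspondence(pairs):
    """Report whether (spelling, phoneme) pairs are predictable in each direction."""
    sounds_of = defaultdict(set)     # spelling -> phonemes it can represent
    spellings_of = defaultdict(set)  # phoneme -> spellings that can represent it
    for spelling, phoneme in pairs:
        sounds_of[spelling].add(phoneme)
        spellings_of[phoneme].add(spelling)
    return {
        # True if every spelling has exactly one pronunciation.
        "spelling_to_sound": all(len(s) == 1 for s in sounds_of.values()),
        # True if every phoneme has exactly one spelling.
        "sound_to_spelling": all(len(s) == 1 for s in spellings_of.values()),
    }


# Toy fragment in the Spanish pattern: b and v both write /b/.
spanish_like = [("b", "b"), ("v", "b"), ("m", "m")]
# Toy fragment in the English pattern: c is /k/ in "cat" but /s/ in "cell",
# and /k/ can also be written k.
english_like = [("c", "k"), ("c", "s"), ("k", "k"), ("s", "s")]

print(correspondence(spanish_like))
print(correspondence(english_like))
```

On these fragments the Spanish-like data is predictable from spelling to sound only, while the English-like data is predictable in neither direction, matching the contrast drawn in the text.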
The sounds of speech of all languages of the world can be written by a rather small universal phonetic alphabet. A standard for this is the International Phonetic Alphabet.
Collation
An alphabet also serves to establish an order among letters that can be used for sorting entries in lists, called collating. Note that the order does not have to be constant among different languages using this alphabet; for examples see Latin alphabet: Collating in other languages.
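To make the point about language-specific order concrete, here is a small sketch that sorts the same words under two conventions. The first uses the Swedish alphabet, in which å, ä and ö follow z; the second folds ä into a, as some German dictionary orderings do. The helper name and the word list are assumptions for the example, and real collation (for instance the Unicode Collation Algorithm) handles many more cases.

```python
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"


def alphabet_key(word, alphabet):
    """Map each letter to its position in the given alphabet.

    Raises ValueError for a character that is not in the alphabet.
    """
    return [alphabet.index(ch) for ch in word.lower()]


words = ["äpple", "zebra", "apa", "banan"]

# Convention 1: ä is a separate letter near the end of the alphabet.
swedish_order = sorted(words, key=lambda w: alphabet_key(w, SWEDISH))

# Convention 2: ä, ö and ü are filed as if they were a, o and u.
FOLD = str.maketrans("äöü", "aou")
folded_order = sorted(words, key=lambda w: w.lower().translate(FOLD))

print(swedish_order)
print(folded_order)
```

Under the first convention "äpple" sorts last; under the second it sorts directly after "apa".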
In recent years the Unicode initiative has attempted to collate most of the world's known writing systems into a single character encoding. As well as its primary purpose of standardising computer processing of non-Roman scripts, the Unicode project has provided a focus for script-related scholarship.
The Alphabet Effect
Some communication theorists (notably those associated with the so-called "Toronto school of communications", such as Marshall McLuhan, Harold Innis and more recently Robert K. Logan) have advanced hypotheses to the effect that alphabetic scripts in particular have served to promote and encourage the skills of analysis, coding, decoding, and classification. This set of hypotheses may be known as "the Alphabet effect", after the title of Logan's 1986 work.
The theory claims that a greater level of abstraction is required due to the greater economy of symbols in alphabetic systems, and that the abstraction needed to interpret phonemic symbols has in turn contributed in some way to the development of the societies which use them. Proponents of this theory hold that the development of alphabetic (as distinct from other types of) writing systems has made a significant impact on "Western" thinking and development because it introduced a new level of abstraction, analysis, and classification. McLuhan and Logan (1977) postulate that, as a result of these skills, the use of the alphabet created an environment conducive to the development of codified law, monotheism, abstract science, deductive logic, objective history, and individualism. According to Logan, "All of these innovations, including the alphabet, arose within the very narrow geographic zone between the Tigris-Euphrates river system and the Aegean Sea, and within the very narrow time frame between 2000 B.C. and 500 B.C." (Logan 2004).
However, many of these abstractions first occurred in societies which did not use an alphabet, such as the codified law of Hammurabi in Babylonia, which predated similar codes in societies with the alphabet. Since the alphabet quickly spread to become nearly ubiquitous, it is difficult to trace cause and effect in this matter.
See also
- Abecedarium
- Abjad
- Abugida
- Alphabetical order
- Alphabets derived from the Latin
- Artificial scripts
- Character set
- Lipogram
- List of alphabets
- Syllabary
- Transliteration
- Unicode
References
- McLuhan, Marshall; Logan, Robert K. (1977). Alphabet, Mother of Invention. Etcetera. Vol. 34, pp. 373-383.
External links
- [http://omniglot.com/writing/alphabetic.htm Alphabetic Writing Systems]
- Michael Everson's [http://www.evertype.com/alphabets/index.html Alphabets of Europe]
- The [http://www.unicode.org/cldr/data/diff/by_type/characters.html Unicode Consortium]
- [http://www.wam.umd.edu/~rfradkin/alphapage.html Evolution of alphabets] animation by Prof. Robert Fradkin at the University of Maryland
- [http://www.ancientscripts.com/alphabet.html History of alphabet]
- [http://hebrew4christians.com/Grammar/Unit_One/Aleph-Bet/aleph-bet.html The Hebrew Alphabet]
Transliteration
Transliteration is a mapping from one system of writing into another. Transliteration attempts to be lossless, so that an informed reader should be able to reconstruct the original spelling of unknown transliterated words. To achieve this objective, transliteration may define complex conventions for dealing with letters in a source script which do not correspond with letters in a goal script. Romaji is an example of a transliterating method.
This is opposed to transcription, which maps the sounds of one language to the script of another language. Still, most transliterations map the letters of the source script to letters pronounced similarly in the goal script, for some specific pair of source and goal language.
One instance of transliteration is the use of an English computer keyboard to type in a language that uses a different alphabet, such as Russian. While the first usage of the word implies seeking the best way to render foreign words into a particular language, typing transliteration is a purely pragmatic process of inputting text in a particular language. Transliteration from English letters is particularly important for users who are only familiar with the English keyboard layout, and hence could not type quickly in a different alphabet even if their software supported a keyboard layout for another language. Some programs, such as the Russian-language word processor Hieroglyph, provide typing by transliteration as an important feature. The rest of the article concerns itself with the first meaning of the word, that is, rendering foreign words into a different alphabet.
If the relations between letters and sounds are similar in both languages, a transliteration may be (almost) the same as a transcription. In practice, there are also some mixed transliteration/transcription systems that transliterate a part of the original script and transcribe the rest. Greeklish is an example of such a mixture.
In a broader sense, the word transliteration is used to include both transliteration in the narrow sense and transcription.
Anglicizing is a transcription method.
Romanization encompasses several transliteration and transcription methods.
Example to illustrate the difference between transliteration and transcription
In Modern Greek, the letters <η> <ι> <υ> and the letter combinations <ει> <οι> <υι> are all pronounced [i] (in IPA notation). A transcription consequently renders them all as <i>, but a transliteration still distinguishes them, for example by transliterating to <ē> <i> <y> and <ei> <oi> <yi>. (As the old Greek pronunciation of <η> was [ɛː], this proposal uses the character appropriate for an Old Greek transliteration or transcription, <ē>, an <e> with a macron.)
On the other hand, <ευ> is sometimes pronounced [ev] and sometimes [ef], depending on the following sound. A transcription distinguishes them, but this is not a requirement for a transliteration.
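The difference can be sketched in a few lines of code. The two tables below cover only the six Greek spellings discussed in this example, using the transliteration values proposed above; the function name and the longest-match strategy are assumptions for illustration, not part of any published standard.

```python
# Partial tables for the six spellings in the example only.
TRANSLITERATION = {"ει": "ei", "οι": "oi", "υι": "yi", "η": "ē", "ι": "i", "υ": "y"}
TRANSCRIPTION = {"ει": "i", "οι": "i", "υι": "i", "η": "i", "ι": "i", "υ": "i"}

SAMPLE = "η ι υ ει οι υι"


def convert(text, table):
    """Convert text with a table, trying two-letter combinations first."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in table:
            out.append(table[pair])
            i += 2
        elif text[i] in table:
            out.append(table[text[i]])
            i += 1
        else:
            # Characters outside the table (here, spaces) pass through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert(SAMPLE, TRANSLITERATION))  # six distinct outputs
print(convert(SAMPLE, TRANSCRIPTION))    # the same output six times
```

The transliteration keeps all six spellings apart, so the Greek original can be recovered; the transcription collapses them into one symbol and records only the sound.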
Uses of transliteration
Transliterations in the narrow sense are used in situations where the original script is not available to write down a word in that script, but high precision is still required. Examples include traditional or cheap typesetting with a small character set; editions of old texts in scripts no longer in use (such as Linear B); and some library catalogues (see [http://www.ifla.org/VII/s13/pubs/isbdg0.htm#0.6 www.ifla.org/VII/s13/pubs/isbdg0.htm]).
For example, the Greek language is written in the 24-letter Greek alphabet, which overlaps with, but differs from, the 26-letter version of the Roman alphabet in which English is written. Etymologies in English dictionaries often identify Greek words as ancestors of words used in English. Consequently, most such dictionaries transliterate the Greek words into Roman letters.
Transliteration in the broader sense is a necessary process when using words or concepts expressed in a language with a script other than one's own.
The idea of transliteration is complicated by the genuine use in multiple languages of different common nouns for the same person, place or thing. Thus, "Muhammad" is in common use now in English and "Mohammed" is less popular, though there are excellent reasons for each transcription (and similarly for "Muslim" and "Moslem"). "Muslim" and "Mohammedan" are not interchangeable, as "Mohammedan" has come to be viewed as a religious slur, and the typical French usage "Musulman" is considered offensively colonialist in English language contexts. However, "Musulmaan" is the way to say "Muslim" in other languages, such as Urdu, Hindi and Russian.
Transliteration is also used for simple encryption.
Issues in transliterating particular languages
Some languages and scripts present particular difficulties to transcribers. These are discussed on separate pages.
- Ancient Near East
- Transliterating cuneiform languages
- Transliteration of ancient Egyptian (see also Egyptian hieroglyphs)
- hieroglyphic Luwian
- Avestan
- Brahmic family
- Devanagari: see IAST, Harvard-Kyoto, ITRANS
- Pali
- Tocharian
- Chinese language
- Pinyin
- Wade-Giles
- Bopomofo
- Greek language
- Transliteration of Greek to the Latin Alphabet
- Greek alphabet
- List of Greek words with English derivatives
- Linear B
- Greeklish
- Japanese language
- Romaji Transliterating Japanese to Latin script
- Transcribing English to Japanese
- Cyrillization of Japanese
- Korean Language
- McCune-Reischauer
- Semitic languages
- Ugaritic alphabet
- Hebrew alphabet
- Romanization of Hebrew
- Arabic alphabet
- Arabic transliteration
- Arabic Chat Alphabet
- Slavic languages written in the Cyrillic or Glagolitic alphabets
- Transliteration of Russian into English
- Volapuk encoding
- Romanization of Ukrainian
- Thai language
- Royal Thai General System of Transcription
See also
- Romanization
Transliteration sites
- [http://www.latkey.com/translit Transliteration .NET service] - a free online translit service for MS Internet Explorer and MS Office for Russian, Arabic, Hebrew, Greek, Hindi, and other languages.
- [http://transliteration.eki.ee Eesti Keele Instituut] - Collection of Transliteration Tables for many Non-Roman Scripts.
- [http://www.eki.ee/wgrs/ United Nations Group of Experts on Geographical Names (UNGEGN)] - Working Group on Romanization Systems.
- [http://www.sil.org/ SIL International] - Provides free fonts for transliteration and IPA
- [http://www.mashke.org/Conv/ Automatic Cyrillic Converter]
- [http://www.lcweb.loc.gov/catdir/cpso/roman.html Library of Congress: Romanization]
- [http://www.library.arizona.edu/users/brewerm/sil/lib/transhist.html Transliteration history] - history of the transliteration of Slavic languages into Latin alphabets.
- [http://homepage.ntlworld.com/stone-catend/trind.htm Transliteration of Indic Scripts] - How to use ISO 15919
- [http://girish.co.in/projects/dev/trans4.html Online Devanagari Transliteration] - Transliteration service for transliterating from Devanagari to 8 Indian Scripts.
- [http://lost1.net/?page=hebrew Al's Hebrew Transliterator] - converts phonetic Hebrew (using Latin alphabet) into Hebrew & HTML unicode.
- [http://icu.sourceforge.net/userguide/Transform.html ICU User Guide: Transforms] - Transliteration services in International Components for Unicode
- [http://www.genomantra.biz/unitrans/ Online Devanagari Transliteration, transcodes ITrans to Unicode] - Online demo and Open Source code available for download. Uses a simple table based algorithm.
Cyrillic
The Cyrillic alphabet (or azbuka, from the old name of the first letters) is an alphabet used to write six natural Slavic languages (Belarusian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian) and many other languages of the former Soviet Union, Asia and Eastern Europe.
Origins
The plan of the alphabet is derived from the early Cyrillic alphabet, itself a derivative of the Glagolitic alphabet, a ninth century uncial cursive usually credited to two brothers from Thessaloniki, Saint Cyril and Saint Methodius. The glyphs in the Cyrillic alphabet are, however, mainly Byzantine Greek letters. Some of them, especially those representing sounds that did not exist in medieval Greek, retain their Glagolitic forms.
Whereas it is widely accepted that the Glagolitic alphabet was invented by Saints Cyril and Methodius, the origins of the early Cyrillic alphabet are still a source of much controversy. Though it is usually attributed to Saint Clement of Ohrid, a Bulgarian scholar and disciple of Saint Cyril and Saint Methodius, the alphabet is more likely to have developed at the Preslav Literary School in north-eastern Bulgaria, where the oldest Cyrillic inscriptions have been found, dating back to the 940s. The theory is supported by the fact that the Cyrillic alphabet almost completely replaced the Glagolitic in northeastern Bulgaria as early as the end of the tenth century, whereas the Ohrid Literary School—where Saint Clement worked—continued to use the Glagolitic until the twelfth century.
Among the reasons for the replacement of the Glagolitic with the Cyrillic alphabet are the greater simplicity and ease of use of the latter and its closeness to the Greek alphabet, which had been well known in the First Bulgarian Empire.
There are also other theories regarding the origins of the Cyrillic alphabet, namely that the alphabet was created by Saint Cyril and Saint Methodius themselves, or that it preceded the Glagolitic alphabet, representing a "transitional" stage between Greek and Glagolitic cursive, but these have been widely disproved. Although Cyril is almost certainly not the author of the Cyrillic alphabet, his contributions to the Glagolitic and hence to the Cyrillic alphabet are still recognised, as the latter is named after him.
The alphabet was disseminated along with the Old Church Slavonic liturgical language, and the alphabet used for modern Church Slavonic language in Eastern Orthodox rites still resembles early Cyrillic. However, over the following ten centuries, the Cyrillic alphabet adapted to changes in spoken language, developed regional variations to suit the features of national languages, and was subjected to academic reforms and political decrees. Today, dozens of languages in Eastern Europe and Asia are written in the Cyrillic alphabet.
Letter-forms and typography
The development of Cyrillic typography passed directly from the medieval stage to the late Baroque, without a Renaissance phase as in Western Europe. Late Medieval Cyrillic letters (still found on many icon inscriptions even today) show a marked tendency to be very tall and narrow; strokes are often shared between adjacent letters.
Peter the Great, tsar of Russia, mandated the use of westernized letter forms in the early eighteenth century; over time, these were largely adopted in the other languages that use the alphabet. Thus, unlike modern Greek fonts that retained their own set of design principles (such as the placement of serifs, the shapes of stroke ends, and stroke-thickness rules), modern Cyrillic fonts are much the same as modern Latin fonts of the same font family. The development of some Cyrillic computer typefaces from Latin ones has also contributed to the visual Latinization of Cyrillic type.
Cyrillic uppercase and lowercase letter-forms are not as differentiated as in Latin typography. Upright Cyrillic lowercase letters are essentially small capitals (with the exception of a few forms such as "а" and "е" which adopted western lowercase shapes), although a good-quality Cyrillic typeface will still include separate small caps glyphs.
In the absence of Roman and Italic traditions, Cyrillic type fonts are properly classified as upright (Russian: pryamoi shrift) and cursive (kursivnyi). Cursive or hand-written shapes of many letters, especially the lowercase letters, are entirely different from the upright shapes. As in Latin typography, a sans-serif face may have a mechanically-sloped oblique font (naklonnyi).
In Bulgarian, Macedonian, and Serbian, some cursive letters are different from those used in other languages. These cursive letter shapes are often used in upright fonts as well, especially for road signs, inscriptions, posters and the like, less so in newspapers or books. External link: [http://jankojs.tripod.com/SerbianCyr.htm Serbian Cyrillic Letters BE, GHE, DE, PE, TE].
The following table shows the differences between the upright and cursive Cyrillic letters as used in Russian. Cursive glyphs that are bound to confuse beginners (either because of an entirely different look, or because of being a false friend with an entirely different Latin character) are highlighted.
Reference: Bringhurst, Robert (2002). The Elements of Typographic Style (version 2.5), pp. 262–264. Vancouver, Hartley & Marks. ISBN 0-88179-133-4.
Romanization
There are various systems for Romanization of Cyrillic text, including transliteration to convey Cyrillic spelling in Latin characters, and transcription to convey pronunciation.
Standard Cyrillic-to-Latin transliteration systems include:
- Scientific transliteration, used in linguistics, is based on the Latin Croatian alphabet.
- The [http://www.eki.ee/wgrs/ Working Group on Romanization Systems] of the United Nations recommends different systems for specific languages. These are the most commonly used around the world.
- ISO 9:1995, from the International Organization for Standardization.
- American Library Association & Library of Congress (ALA-LC) [http://www.loc.gov/catdir/cpso/roman.html Romanization tables for Slavic alphabets], used in North American libraries.
- BGN/PCGN 1947 transliteration system (United States Board on Geographic Names & Permanent Committee on Geographical Names for British Official Use).
- GOST 16876-71 (1983), from the Main Administration of Geodesy and Cartography of the former Soviet Union. Russian abbreviation of GOsudarstvenny STandart, "the State Standard". GOST has limited support for non-Russian alphabets.
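The systems listed above differ mainly in whether they use diacritics or digraphs, and that choice affects whether the Cyrillic spelling can be recovered. The sketch below uses two deliberately partial tables: the diacritic values follow scientific transliteration (ц as c, ш as š, ж as ž, ч as č), and the digraph values follow the familiar English-oriented convention (ts, sh, zh, ch). The function name is hypothetical, and the tables cover only the letters needed for the demonstration.

```python
# Partial tables: only the letters needed for the demonstration.
DIACRITIC = {"т": "t", "с": "s", "ц": "c", "ш": "š", "ж": "ž", "ч": "č"}
DIGRAPH = {"т": "t", "с": "s", "ц": "ts", "ш": "sh", "ж": "zh", "ч": "ch"}


def romanize(text, table):
    """Replace each Cyrillic letter found in the table; leave others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


single_letter = "ц"   # one letter
two_letters = "тс"    # the letters т and с in sequence

# With digraphs, both spellings come out the same, so the original is ambiguous.
print(romanize(single_letter, DIGRAPH), romanize(two_letters, DIGRAPH))
# With diacritics, the two spellings stay distinct.
print(romanize(single_letter, DIACRITIC), romanize(two_letters, DIACRITIC))
```

This is one reason a reader cannot always reconstruct the Cyrillic original from a digraph-based romanization, whereas a one-letter-to-one-letter scheme avoids the collision.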
Serbian is written in both Cyrillic and Latin alphabets. There is also a Latin alphabet for Belarusian, and some non-Slavic languages, such as Azerbaijani, Uzbek or Moldavian, have confronted permanent Romanization after the disintegration of the Soviet Union. In Serbian there is a one-to-one correspondence between Vuk Karadžić's Serbian Cyrillic and Ljudevit Gaj's Croatian Gajica (derived from the Czech alphabet; see Serbo-Croatian language#Writing systems). The Belarusian Latin alphabet is traditionally based on Polish and is called Łacinka, but, because of the political realities in the former USSR, Belarusian is usually Romanized by analogy to Russian.
See also:
- Romanization.
- Transliteration of Russian into English.
- Romanization of Ukrainian.
- Transliteration of Bulgarian into English.
External links:
- [http://transliteration.eki.ee/ Transliteration of Non-Roman Scripts], a collection of writing systems and transliteration tables, by Thomas T. Pederson. Includes PDF reference charts for many languages' transliteration systems.
As used in various languages
Sounds are indicated using IPA.
These are only approximate indicators.
While these languages by and large have phonemic orthographies, there are occasional exceptions—for example, Russian его (meaning him/his), which is pronounced instead of .
Note that spellings of names may vary, especially Y/J/I, but also GH/G/H and ZH/J.
Slavic languages
Old Church Slavonic
Main article: early Cyrillic alphabet
Old Church Slavonic is the first literary and liturgical Slavic language developed from the native language of the 9th century missionaries, Saints Cyril and Methodius. It is not the same as the modern Church Slavonic language, which is still used in some Eastern Orthodox and Eastern Catholic church services.
As the Cyrillic alphabet spread throughout the Slavic world, it was adopted for writing local languages, such as Old Ruthenian. Its adaptation to the characteristics of local languages led to the development of its many modern variants, below.
Yeri (ЪІ) was originally a ligature of Yer and I. Ya (Я) was written in an archaic form called A iotified. Capital and lowercase letters were not distinguished in old manuscripts.
The early Cyrillic alphabet is difficult to represent on computers. Many of the letterforms differed from modern Cyrillic and varied a great deal in manuscripts, and changed over time. Few fonts include adequate glyphs to reproduce the alphabet. Some characters are missing from the current Unicode standard altogether, including Cyrillic dotless I, iotified Yat, abbreviated Yer ("Yerok"), and many ligatures.
See also: Glagolitic alphabet.
Russian
Main article: Russian alphabet
Notes:
# In the pre-reform Russian orthography, in Old Russian and in Old Church Slavonic the letter Ъ is called yer. Historically, the "hard sign" takes the place of a now-absent vowel, still preserved in Bulgarian. See the notes for Bulgarian.
# When an iotated vowel (a vowel whose sound begins with [j]) follows a consonant, the consonant becomes palatalised (the [j] sound merges with the consonant), and the vowel's [j] sound is not heard independently. The Hard Sign indicates that this does not happen, and the [j] sound appears only in front of the vowel. The Soft Sign indicates that the consonant is palatalised, but the vowel's [j] sound does not merge with the palatalisation of the consonant. The Soft Sign also indicates that a consonant before another consonant or at the end of a word is palatalised. Examples: та [ta]; тя [tʲa]; тья [tʲja]; тъя [tja]; т [t]; ть [tʲ].
Historical letters: before 1918, there were four extra letters in use: Іі (replaced by Ии), Ѳѳ (Фита "Fita", replaced by Фф), Ѣѣ (Ять "Yat", replaced by Ее), and Ѵѵ (ижица "Izhitsa", replaced by Ии); these were eliminated by reforms of Russian orthography.
Ukrainian
Main article: Ukrainian alphabet.
Ukrainian differs from Russian in the following ways:
- He (Г, г) is a voiced fricative consonant, pronounced .
- Ge (Ґ, ґ) appears after He, pronounced , i.e., like a Russian Г. It looks like He with an "upturn" pointing up from the right side of the top bar. (This letter was not officially used in the Soviet Union after 1933, so it is missing from older Cyrillic fonts.)
- E (Е, е) is pronounced .
- Ye (Є, є) appears after E, pronounced . It looks like a mirrored Russian letter Э.
- Y (И, и) is pronounced (similar to Russian Yery).
- I (І, і) appears after Y, pronounced . It looks like the Latin letter I.
- Yi (Ї, ї) appears after I, pronounced . It looks like I with a diaeresis above it (the same two dots that appear over the Russian letter Yo).
- Yot (Й, й) is the equivalent of Russian Short I.
- Shcha (Щ, щ) is pronounced .
- An apostrophe (’) serves the purpose of the Russian Hard Sign.
- Yo does not appear.
Belarusian
Belarusian is also written in a Belarusian Latin alphabet (Łacinka). Historically, Belarusian Tatars have written the language in the Arabic alphabet (Arabica), and Belarusian Jews in the Hebrew alphabet.
NB: Before 1933, Ґ () was also present. Some linguists call for restoring the letter.
Belarusian differs from Russian in the following ways:
- I looks like the Latin letter I (І, і). (But non-syllable short I looks the same as in Russian.)
- Between U and Ef is the letter U short (Ў, ў), which looks like U (У) with a breve and is pronounced [w], like the u part of the diphthongs in now, low.
- Shcha (Щщ) does not appear. A combination of sh and ch (ШЧ/шч) is typically used instead.
- The Hard Sign is not used. Its purpose (removing of palatalisation) is served by an apostrophe.
- The letter combinations Дж дж and Дз дз appear after Д д in the Belarusian alphabet in some publications. These digraphs each represent one sound: Дж , Дз .
- Г represents a voiced fricative consonant.
External links
- [http://www.pravapis.org/art_belarusian_alphabet.asp Introduction to Belarusian Alphabet]
- [http://www.pravapis.org/art_lac1.asp Introduction to Belarusian Latin Script]
- [http://www.pravapis.org/art_kitab1_en.asp Belarusian language using Arabic script]
- [http://www.pravapis.org/art_letter_frequency.asp Letter Frequency in Belarusian and Russian]
- [http://www.pravapis.org/translator.asp Converter from Latin "Translit" into Cyrillics]
Bulgarian
See Bulgarian language#Alphabet. Bulgarian differs from Russian in the following ways:
- Ye (Е) is pronounced and is called "E".
- Yo (Ё) does not appear.
- The Russian letter Э does not appear.
- Shcha (Щ) is pronounced and is called "Shta".
- The Hard Sign (Ъ) is used for a vowel, (Schwa).
- Yery (Ы) does not appear.
Modern Serbian since the 19th century
Serbian can also be written with the Latin alphabet. See Serbo-Croatian language.
Serbian differs from Russian in the following ways:
- Ye is pronounced . Yo does not appear. The Russian letter Э does not appear.
- Between D and E is the letter Djə (Ђ, ђ), which is pronounced , and looks like Tjə, except that the loop of the H curls farther and dips downwards.
- Short I does not appear. Between I and K is the letter Jə (Ј, ј), pronounced , which looks like the Latin letter J.
- Between L and M is the letter Ljə (Љ, љ), pronounced , which looks like L and the Soft Sign smashed together.
- Between N and O is the letter Njə (Њ, њ), pronounced , which looks like N and the Soft Sign smashed together.
- Between T and U is the letter Tjə (Ћ, ћ), which is pronounced and looks like a lowercase Latin letter h with a bar. On the uppercase letter, the bar appears at the top; on the lowercase letter, the bar crosses the top half of the vertical line.
- Between Ch and Sh is the letter Dzhə (Џ, џ), pronounced , which looks like Ts but with the downturn moved from the right side of the bottom bar to the middle of the bottom bar.
- Sh is the last letter; the Russian letters that follow it do not appear.
Macedonian
Macedonian differs from Serbian in the following ways:
- Between Ze and I is the letter Dze (Ѕ, ѕ), pronounced , which looks like the Latin letter S.
- Djerv is replaced by Gje (Ѓ, ѓ), pronounced , which looks like Ghe with an acute accent (´).
- Tjerv is replaced by Kje (Ќ, ќ), pronounced , which looks like Ka with an acute accent (´).
Non-Slavic languages
These alphabets are generally modelled after Russian, but often bear striking differences, particularly when adapted for Caucasian languages. The first few of them were devised by Orthodox missionaries for the Finnic and Turkic peoples of Idel-Ural (Mari, Udmurt, Mordva, Chuvash, Kerashen Tatars) in the 1870s. Later, such alphabets were created for some of the Siberian and Caucasus peoples who had recently converted to Christianity. In the 1930s, some of those alphabets were switched to the Uniform Turkic Alphabet. All of the peoples of the former Soviet Union who had been using an Arabic or other Asian script (Mongolian script, etc.) also adopted Cyrillic alphabets, and during the Great Purge in the late 1930s, all of the Roman-based alphabets of the peoples of the then Soviet Union were switched over to Cyrillic as well. The Abkhazian alphabet was switched to Georgian script, but after the death of Stalin, Abkhaz also adopted Cyrillic. The last language to adopt Cyrillic was Gagauz, which had previously used Greek script.
In Uzbekistan, Azerbaijan and Turkmenistan, the use of Cyrillic to represent local languages has often been a politically controversial issue since the collapse of the Soviet Union, as it evokes the era of Soviet rule (see Russification). Speakers of some of Russia's languages have also tried to drop Cyrillic, but the move was halted under Russian law (see Tatar alphabet). A number of languages have switched from Cyrillic to other orthographies, either Roman-based or returning to a former script.
Unlike the Roman alphabet, which is usually adapted to different languages by using additions to existing letters such as accents, umlauts, tildes and cedillas, the Cyrillic alphabet is usually adapted by the creation of entirely new letter shapes. In some alphabets invented in the 19th century, such as Mari, Udmurt and Chuvash, umlauts and breves were also used.
Abkhaz
Abkhaz is a Caucasian language, spoken in the Autonomous Republic of Abkhazia, Georgia. See Abkhaz alphabet.
Turkic languages
Chuvash
The Cyrillic alphabet has been used for the Chuvash language since the late 19th century, with some changes in 1938.
Kazakh
Kazakh is also written with the Latin alphabet (in Turkey and now in Kazakhstan as well) and with a modified Arabic alphabet (in China, Iran and Afghanistan).
- Ә ә =
- Ғ ғ = (uvular fricative)
- Қ қ = (uvular plosive)
- Ң ң =
- Ө ө =
- У у = , ,
- Ұ ұ =
- Ү ү =
- Һ һ =
- І і =
The Cyrillic letters Вв, Ёё, Цц, Чч, Щщ, Ъъ, Ьь and Ээ are not used in native Kazakh words, but only for Russian loans.
Kyrgyz
Kyrgyz has also been written in Latin and in Arabic.
- Ң ң =
- Ү ү =
- Ө ө =
Moldovan
The Moldovan language used the Cyrillic alphabet between 1946 and 1989. Nowadays, this alphabet is still official in the breakaway republic of Transnistria.
Mongolian
The Mongolic languages include Khalkha (in Mongolia), Buryat (around Lake Baikal) and Kalmyk (northwest of the Caspian Sea). Khalkha Mongolian is also written with the Mongol vertical alphabet, which is being slowly reintroduced in Mongolia.
Khalkha
- В в =
- Е е = ,
- Ё ё =
- Ж ж =
- З з =
- Н н = ,
- Ө ө =
- Ү ү =
- Ы ы = (after a hard consonant)
- Ь ь = (extra short)
- Ю ю = ,
The Cyrillic letters Кк, Фф and Щщ are not used in native Mongolian words, but only for Russian loans.
Buryat
The Buryat (буряад) Cyrillic alphabet is similar to the Khalkha above, but Ьь indicates palatalization as in Russian. Buryat does not use Вв, Кк, Фф, Цц, Чч, Щщ or Ъъ in its native words.
- Е е = ,
- Ё ё =
- Ж ж =
- Н н = ,
- Ө ө =
- Ү ү =
- Һ һ =
- Ы ы = ,
- Ю ю = ,
Kalmyk
The Kalmyk (хальмг) Cyrillic alphabet is similar to the Khalkha, but the letters Ээ, Юю and Яя appear only word-initially. In Kalmyk, long vowels are written double in the first syllable (нөөрин), but single in syllables after the first. Short vowels are omitted altogether in syllables after the first syllable (хальмг = xaʎmag).
- Ә ә =
- В в =
- Һ һ =
- Е е = ,
- Җ җ =
- Ң ң =
- Ө ө =
- Ү ү =
Cyrillic in Unicode
Main article: Cyrillic characters in Unicode.
In Unicode, the Cyrillic block extends from U+0400 to U+052F. The characters in the range U+0400 to U+045F are basically the characters from ISO 8859-5 moved upward by 864 positions. The characters in the range U+0460 to U+0489 are historic letters that are no longer in use. The characters in the range U+048A to U+052F are additional letters for various languages that are written with Cyrillic script.
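The 864-position offset can be checked with a short script. This is only an illustrative sketch: it assumes Python's standard `iso8859_5` codec, and the names `OFFSET`, `EXCEPTIONS` and `shifted` are made up for the example. It also assumes that the four symbol positions 0xA0, 0xAD, 0xF0 and 0xFD are the exceptions that the word "basically" allows for.

```python
OFFSET = 0x0400 - 0x00A0  # 864: distance between the two layouts

# Positions in ISO 8859-5 that hold symbols rather than letters
# (no-break space, soft hyphen, numero sign, section sign); these
# do not follow the offset rule.
EXCEPTIONS = {0xA0, 0xAD, 0xF0, 0xFD}


def shifted(byte_value: int) -> str:
    """Map an ISO 8859-5 byte to Unicode by adding the fixed offset."""
    return chr(byte_value + OFFSET)


# Compare the offset rule against Python's own ISO 8859-5 decoder
# for every byte in the upper half of the code page.
mismatches = [
    b for b in range(0xA0, 0x100)
    if b not in EXCEPTIONS
    and bytes([b]).decode("iso8859_5") != shifted(b)
]
print(mismatches)
```

An empty list means every letter position agrees with the offset rule; a real converter would still need explicit handling for the four excluded positions.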
Unicode does not include accented Cyrillic letters, but they can be formed by adding U+0301 ("combining acute accent") after the accented vowel (e.g., ы́ э́ ю́ я́). Some languages (e.g., modern Church Slavonic) are still not fully supported.
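As a rough illustration of the combining approach, the following sketch builds stressed vowels by appending U+0301 and then applies NFC normalisation using Python's standard `unicodedata` module. The helper name `stress` is hypothetical. Because no precomposed form exists for these vowels, normalisation is expected to leave each one as two code points.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def stress(vowel: str) -> str:
    """Return the vowel followed by a combining acute accent."""
    return vowel + ACUTE


stressed = [stress(v) for v in "ыэюя"]

# NFC normalisation would merge each pair into one code point if a
# precomposed character existed; here each string keeps both.
lengths = [len(unicodedata.normalize("NFC", s)) for s in stressed]
print(stressed, lengths)
```

Software that counts characters or compares strings therefore has to treat such a base letter plus accent as a single user-perceived character.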
See also
- Bosnian Cyrillic
- Cyrillization
- Iotation
- Palochka
- Languages using Cyrillic
- Volapuk encoding
- Slavic numerals
- Russian Manual Alphabet (the fingerspelled Cyrillic alphabet)
- KOI8-R (8-bit native Russian character encoding)
- KOI8-U (8-bit Ukrainian character encoding)
- ISO/IEC 8859-5 (8-bit Cyrillic character encoding established by the International Organization for Standardization)
- CP866 (8-bit Cyrillic character encoding established by Microsoft for use in MS-DOS)
- Windows-1251 (8-bit Cyrillic character encoding established by Microsoft for use in Microsoft Windows)
External links
- [http://toma.dnsalias.net/phonetic Bulgarian Online Transliterator]
- [http://www.omniglot.com/writing/cyrillic.htm Cyrillic alphabet at omniglot.com]
- [http://www.terena.nl/library/multiling/euroml/mlcs5-cyr.txt A Survey of The Use of Modern Cyrillic Script], including the complete required repertoire of graphic characters, by J. W. van Wingen.
- [http://www.peoples.org.ru/eng_index.html Minority Languages of Russia on the Net], a list of resources.
- [http://www.easybulgarian.com/members/u0a_sample.html Bulgarian Cyrillic Alphabet audio]
- [http://www.jewishgen.org/jri-pl/translit.htm Information on Cyrillic transliteration] and the handwritten script form of Cyrillic.
- [http://www.unicode.org/charts/PDF/U0400.pdf Unicode Code Charts "Cyrillic"] (PDF)
- [http://www.unicode.org/charts/PDF/U0500.pdf Unicode Code Charts "Cyrillic Supplement"] (PDF)
- [http://czyborra.com/charsets/cyrillic.html The Cyrillic Charset Soup], Roman Czyborra's overview and history of Cyrillic charsets.
- [https://addons.mozilla.org/extensions/moreinfo.php?id=561 The Russ Key Mozilla Firefox extension], this extension allows typing in Russian and other languages and transliterating HTML text into Cyrillic.
English language
English is a West Germanic language that is spoken in the United Kingdom, United States, Canada, Australia, New Zealand, Ireland, South Africa, and many other countries. English is now the third-most spoken native language worldwide (after Chinese and Hindi), with some 380 million speakers. It has lingua franca status in many parts of the world, due to the military, economic, scientific, political and cultural influence of the British Empire in the 18th and 19th centuries and that of the United States from the 20th century to the present. Through the global influence of native English speakers in cinema, airlines, broadcasting, science, and the Internet in recent decades, English is now the most widely learned second language in the world. Many students worldwide are required to learn some English, and a working knowledge of English is required in many fields and occupations.
History
English is a West Germanic language that originated from the Anglo-Frisian dialects brought to Britain by Germanic settlers from various parts of northwest Germany. The original Old English language was subsequently influenced by two successive waves of invasion. The first was by speakers of languages in the Scandinavian branch of the Germanic family, who colonised parts of Britain in the 8th and 9th centuries. The second wave was of the Normans in the 11th century, who spoke a variety of French. These two invasions caused English to become "creolised" to some degree (though it was never a full creole in the linguistic sense of the word); creolisation arises from the cohabitation of speakers of different languages, who develop a hybrid tongue for basic communication. Cohabitation with the Scandinavians resulted in a significant grammatical simplification and lexical enrichment of the Anglo-Frisian core of English; the later Norman occupation led to the grafting onto that Germanic core of a more elaborate layer of words from the Romance branch of European languages; this new layer entered English through use in the courts and government. Thus, English developed into a "borrowing" language of considerable suppleness and huge vocabulary.
According to the Anglo-Saxon Chronicle, around the year 449, Vortigern, king of the Britons, invited the "Angle kin" (Angles led by Hengest and Horsa) to help him against the Picts. In return, the Angles were granted lands in the south-east. Further aid was sought, and in response "came men of Ald Seaxum of Anglum of Iotum" (Saxons, Angles, and Jutes). The Chronicle talks of a subsequent influx of settlers who eventually established seven kingdoms, known as the heptarchy. Modern scholarship considers most of this story to be legendary and politically motivated.
These Germanic invaders dominated the original Celtic-speaking inhabitants, whose languages survived largely in Scotland, Wales, Cornwall, and Ireland. The dialects spoken by the invaders formed what would be called Old English, which resembled some coastal dialects in what are now the Netherlands and north-west Germany. Later, it was strongly influenced by the North Germanic language Norse, spoken by the Vikings who settled mainly in the north-east (see Jorvik). The new and the earlier settlers spoke languages from different branches of the Germanic family; many of their lexical roots were the same or similar, although their grammars were more distant, including the prefixes, suffixes and inflections of many of their words. The Germanic language of these Old English inhabitants of Britain would be partly creolised by the contact with Norse invaders. This resulted in a stripping away of much of the grammar of Old English, including gender and case, with the notable exception of the pronouns; thus, the language became simpler and plainer. The most famous work from the Old English period is the epic poem "Beowulf", by an unknown poet.
For the 300 years following the Norman Conquest in 1066, the Norman kings and the high nobility spoke only a variety of French. A large number of Norman words were assimilated into Old English, with some forming doublets with existing Old English words (for instance, ox/beef, sheep/mutton). The Norman influence reinforced the continual evolution of the language over the following centuries, resulting in what is now referred to as Middle English. Among the changes was a broadening in the use of a unique aspect of English grammar, the "continuous" tenses, with the suffix "-ing". During the 15th century, Middle English was transformed by the Great Vowel Shift, the spread of a standardised London-based dialect in government and administration, and the standardising effect of printing. Modern English can be traced back to around the time of William Shakespeare. The best-known work from the Middle English period is Geoffrey Chaucer's The Canterbury Tales.
Classification and related languages
The English language belongs to the western subbranch of the Germanic branch of the Indo-European family of languages. The closest living relative of English is Scots (Lallans), a West Germanic language spoken mostly in Scotland and parts of Northern Ireland. Like English, Scots is a direct descendant of Old English, also known as Anglo-Saxon.
After Scots, the next closest relative is Frisian—spoken in the Netherlands and Germany. Other less closely related living languages include Dutch, Afrikaans, German, Plattdüütsch and the Scandinavian languages. Many French words are also intelligible to an English speaker (pronunciations are not always identical, of course), because English absorbed a tremendous amount of vocabulary from French, via the Norman language after the Norman conquest and directly from French in further centuries; as a result, a substantial share of English vocabulary is quite close to the French, with some minor spelling differences (word endings, use of old French spellings, etc.), as well as occasional differences in meaning.
Geographic distribution
English is the second or third most widely spoken language in the world today; a total of 600–700 million people use English regularly. About 377 million people use E