Wikipedia:Language recognition chart

وپ:ن ل ش

The list below is a language recognition chart. It describes simple clues that can be used to identify the language of a document.

Letters or symbols

The language of a foreign text can usually be identified by the letters characteristic of that language.

* ABCDEFGHIJKLMNOPQRSTUVWXYZ (Latin alphabet)
** and no other – English, Hawaiian, Indonesian, Latin, Malay, Swahili, Zulu
** àéëïĳ – Dutch (Except for the ligature ĳ, these letters are very rare in Dutch. Even fairly long Dutch texts often have no diacritics.)
** êéë – Afrikaans
** êôúû – West Frisian
** ÆØÅæøå – Danish, Norwegian
** Single diacritics, mostly umlauts
*** ÄÖäö – Finnish (BCDFGQWXZÅbcfgqwxzå only found in names and loanwords, occasionally also ŠšŽž)
*** ÅÄÖåäö – Swedish (occasionally é)
*** ÄÖÕÜäöõü – Estonian
*** ÄÖÜäöüß – German
** Circumflexes
*** ÇÊÎŞÛçêîşû – Kurdish
*** ĂÎÂŞŢăîâşţ – Romanian
*** ÂÊÎÔÛŴŶâêîôûŵŷáéíï – Welsh
*** ĈĜĤĴŜŬĉĝĥĵŝŭ – Esperanto
** Three or more types of diacritics
*** ÇĞİÖŞÜğçıöşü – Turkish
*** ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö – Icelandic
*** ÁÉÍÓÖŐÚÜŰáéíóöőúüű – Hungarian
*** ÀÇÉÈÍÓÒÚÜÏàçéèíóòúüï· – Catalan
*** ÀÂÇÉÈÊÎÏÔŒÙÛàâçéèêîïôœùû – French; diacritics on uppercase characters are often optional
*** ÁÀÇÉÈÍÓÒÚËÜÏáàçéèíóòúëüï (· only in the Gascon dialect) – Occitan
*** ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü in Brazilian Portuguese; k, w and y not in native words) – Portuguese
** áéíñÑóúü ¡¿ – Spanish
** àéèìòù – Italian
** çkñ (c not in native words) – Basque
** ÁÉÍÓÚÝÃẼĨÕŨỸÑG̃áéíóúýãẽĩõũỹñg̃ – Guarani (the only language to use g̃)
** ÁĄĄ́ÉĘĘ́ÍĮĮ́ŁŃ áąą́éęę́íįį́łń (FQRVfqrv not in native words) – Southern Athabaskan languages
*** ’ÓǪǪ́ āą̄ēę̄īį̄óōǫǫ́ǭúū – Western Apache
*** 'ÓǪǪ́ óǫǫ́ – Navajo
*** ’ÚŲŲ́ úųų́ – Chiricahua/Mescalero
** ąłńóż – Lechitic languages
*** ćęśź – Polish
*** ćśůź – Silesian
*** ãéëòôù – Kashubian
** A, Ą, Ã, B, C, D, E, É, Ë, F, G, H, I, J, K, L, Ł, M, N, Ń, O, Ò, Ó, Ô, P, R, S, T, U, Ù, W, Y, Z, Ż – Kashubian
** ČŠŽ
*** and no other – Slovene
*** ĆĐ – Bosnian, Croatian, Serbian Latin
*** ÁĎÉĚŇÓŘŤÚŮÝáďéěňóřťúůý – Czech
*** ÁÄĎÉÍĽĹŇÓÔŔŤÚÝáäďéíľĺňóôŕťúý – Slovak
*** ĀĒĢĪĶĻŅŌŖŪāēģīķļņōŗū – Latvian
*** ĄĘĖĮŲŪąęėįųū – Lithuanian
** ĐÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬÈẺẼÉẸÊỀỂỄẾỆÌỈĨÍỊÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢÙỦŨÚỤƯỪỬỮỨỰỲỶỸÝỴ đàảãáạăằẳẵắặâầẩẫấậèẻẽéẹêềểễếệìỉĩíịòỏõóọôồổỗốộơờởỡớợùủũúụưừửữứựỳỷỹýỵ – Vietnamese
*** ꞗĕŏŭo᷄ơ᷄u᷄ – Middle Vietnamese
** ā ē ī ō ū – may be seen in some Japanese texts in rōmaji or transcriptions (see below), or in Hawaiian and Māori texts
** é – Sundanese
* ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي (Arabic alphabet)
** Arabic, Malay (Jawi), Kurdish (Soranî), Punjabi, Pashto, Sindhi, Urdu, others
** پ چ ژ گ – Persian
* Brahmic family of scripts
** Bengali script
*** অ আ কা কি কী উ কু ঊ কূ ঋ কৃ এ কে ঐ কৈ ও কো ঔ কৌ ক্ কত্‍ কং কঃ কঁ ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ৰ ল ৱ শ ষ স হ য় ড় ঢ় ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
** Devanagari
*** अ प आ पा इ पि ई पी उ पु ऊ पू ऋ पृ ॠ पॄ ऌ पॢ ॡ पॣ ऍ पॅ ऎ पॆ ए पे ऐ पै ऑ पॉ ऒ पॊ ओ पो औ पौ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल ळ व श ष स ह ० १ २ ३ ४ ५ ६ ७ ८ ९ प् पँ पं पः प़ पऽ
*** used to write, either along with other scripts or exclusively, several Indian languages including Sanskrit, Hindi, Marathi, Kashmiri, Sindhi, Bihari, Bhili, Konkani, Bhojpuri and Nepali from Nepal
** Gurmukhi
*** ਅਆਇਈਉਊਏਐਓਔਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਲ਼ਵਸ਼ਸਹ
*** primarily used to write Punjabi, as well as Braj Bhasha, Khariboli (and other Hindustani dialects), Sanskrit and Sindhi
** Gujarati script
*** અ આ ઇ ઈ ઉ ઊ ઋ ઌ ઍ એ ઐ ઑ ઓ ઔ ક ખ ગ ઘ ઙ ચ છ જ ઝ ઞ ટ ઠ ડ ઢ ણ ત થ દ ધ ન પ ફ બ ભ મ ય ર લ ળ વ શ ષ સ હ ૠ ૡૢૣ
*** used to write Gujarati and Kutchi
** Tibetan script
*** ཀ ཁ ག ང ཅ ཆ ཇ ཉ ཏ ཐ ད ན པ ཕ བ མ ཙ ཚ ཛ ཝ ཞ ཟ འ ཡ ར ལ ཤ ས ཧ ཨ
*** used to write Standard Tibetan, Dzongkha (Bhutanese) and Sikkimese
* БДЖИЛПУЦЧШ (Cyrillic script)
** ЙЩЬЮЯ
*** ҐЄІЇ – Ukrainian
*** Ъ – Bulgarian
**** ЁЭЫ – Russian
***** Ў, І instead of И – Belarusian
** ЉЊЏ, Ј instead of Й (Vuk Karadžić's reform)
*** ЋЂ – Serbian
*** ЃЌЅ – Macedonian
** ЅЋѸѲѠЩЪЬҌЮЯѦѪѮѰѴ – Old Church Slavonic
** In Transnistria, Romanian is written in Cyrillic characters
* ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρςστυφχψω (Greek alphabet) – Greek
* אבגדהוזחטיכלמנסעפצקרשת (Hebrew alphabet)
** and maybe some odd dots and lines above, below, or inside characters – Hebrew
** פֿ; dots/lines below letters appearing only with א, י, and ו – Yiddish
** no dots or lines around the letters, and more than a few words end with א (i.e., they have it at the leftmost position) – Aramaic
** Ladino
* 日本語勉強 – East Asian languages
** and no other – Chinese
** with あいうえお Hiragana and/or アイウエオ Katakana – Japanese
** with 위키백과에 (note commonplace ellipses and circles) – Korean
** Vietnamese uses the Latin alphabet – see above
* ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ etc. -- ㄓㄨㄧㄋㄈㄨㄏㄠ (Zhuyin)
** ㄪㄫㄬ -- not Mandarin Chinese
* Khmer alphabet

** កខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមសហយរលឡអវអ្កអ្ខអ្គអ្ឃអ្ងអ្ចអ្ឆអ្ឈអ្ញអ្ឌអ្ឋអ្ឌអ្ឃអ្ណអ្តអ្ថអ្ទអ្ធអ្នអ្បអ្ផអ្ពអ្ភអ្មអ្សអ្ហអ្យអ្រអ្យអ្លអ្អអ្វ អក្សរខ្មែរ (Khmer alphabet) – Khmer
* Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ (Armenian alphabet) – Armenian
* ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ (Georgian alphabet) – Georgian
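The first decision in the chart above is which script a text is written in. As a rough illustration, that step can be sketched in Python by reading the script off the Unicode name of each letter. This is a minimal sketch, not a complete detector: `dominant_script` is a hypothetical helper name, and the approach assumes that the first word of a character's Unicode name identifies its script.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most frequent script label among the letters of `text`."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and spaces
        # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["Привет мир", "Ελληνικά", "שלום", "language"]:
        print(sample, "->", dominant_script(sample))
```

Once the script is known, the finer distinctions in the chart (for example Ukrainian versus Russian) still have to be made from individual letters.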

Latin alphabet (possibly extended)

Romance languages

Lots of Latin roots.

====French (le français)====
* Common words: de, la, le, du, des, il, et
* Words ending in -ux, especially -aux or -eux
* Letter w is rare and used only in loanwords (e.g. whisky)
* Many apostrophised contractions, i.e. words beginning with l' or d', less often c', j', m', n', s', t'; these occur only before vowels and h
* Accented letters: â ç è é ê î ô û, rarely ë ï; ù only in the word où; à only in the word à and at the end of words; never á í ì ó ò ú
* Accents are rarely used on capital letters
* Angle quotation marks: « » (though "curly-Q" quotation marks are also used); dialogue traditionally indicated by means of dashes

====Jersey Norman / Jèrriais (Jèrriais)====
* Common words: lé, dé, tchi, ès, i', ch'
* Tch, dg, th and în are common character combinations. ou is frequently followed by another vowel.
* Many apostrophised short forms, e.g. words beginning with l', d' or r'. é frequently alternates with an apostrophe, e.g. c'mîn/quémîn.

====Spanish (Español)====
* Characters: ¿ ¡ (inverted question and exclamation marks), ñ
* All vowels (á, é, í, ó, ú) may take an acute accent
* Some frequently used words: de, el, los, la(s), uno(s), una(s), y
* No apostrophised contractions
* Word beginnings: ll- (check not Welsh)
* Word endings: -o, -a, -ción, -miento, -dad
* Angle quotation marks: « » (though "curly-Q" quotation marks are also used); dialogue often indicated by means of dashes

Italian (Italiano)

* Almost every word ends in a vowel. Exceptions include non, il, per, con, del.
* Common one-letter word: è.
* Common word: perché.
* Letter sequences: gli, gn, sci.
* Letters j, k, w, x and y are rare and used only in loanwords (e.g. whisky).
* Word endings: -o, -a, -zione, -mento, -tà, -aggio.
* Grave accent (e.g. on à) almost always occurs on the last letter of a word.
* Geminate consonants (tt, zz, cc, ss, bb, pp, ll, etc.) are frequent.

====Catalan (Català)====
* Character combinations l·l and tz
* Letter sequences: tx (check not Basque) and tg
* Letters k and w are rare and used only in loanwords (e.g. walkman)
* Word endings: -o, -a, -es, -ció, -tat
* Word beginning: ll-

====Romanian (Română)====
* Characters: ă â î ș ț
* Common words: și, de, la, a, ai, ale, alor, cu
* Word endings: -a, -ă, -u, -ul, -ului, -ție (or -țiune), -ment, -tate; names ending in -escu
* Double and triple i: copii, copiii
* Note that Romanian is sometimes written online with no diacritics, making it harder to identify. A cedilla is sometimes used on S (ş) and on T (ţ) instead of the correct diacritic, the comma below (ș, ț).

====Portuguese (Português)====
* Characters: ã, õ, â, ê, ô, á, é, í, ó, ú, ü, à
* Common one-letter words: a, à, e, é, o
* Common two-letter words: ao, as, às, da, de, do, em, os, ou, um
* Common three-letter words: aos, com, das, dos, ele, ela, mas, não, por, que, são, uma
* Common endings: -ção, -dade, -ismo, -mente
* Common digraphs: ch, nh, lh; examples: chave, galinha, baralho.
* The letters k, w and y are rare. They are found mostly in loanwords, e.g. keynesianismo, walkie-talkie, nylon.
* Most singular words end in a vowel, l, m, r, or z.
* Plural words end in -s.
* European Portuguese often uses c before ç and t: acção, acto, etc.

====Walloon (Walon)====
* Characters: å, é, è, ê, î, ô, û
* Common digraphs and trigraphs: ai, ae, én, -jh-, tch, oe, -nn-, -nnm-, xh, ou
* Common one-letter words: a, å, e, i, t', l', s', k'
* Common two-letter words: al, ås, li, el, vs, ki, si, pô, pa, po, ni, èn, dj'
* Common three-letter words: dji, nén, rén, bén, pol, mel
* Common endings: -aedje, -mint, -xhmint, -ès, -ou, -owe, -yî, -åcion
* Apostrophes are followed by a space (preferably a non-breaking one), e.g. l' ome instead of l'ome.

====Galician (Galego)====
* Similar to Portuguese.
* Articles o or ó (masc. sing.), os (masc. plural), a (fem. sing.), as (fem. plural)
* Common digraphs: nh (ningunha)
* The letters j, k, w and y are not in the alphabet and appear only in loanwords

Germanic languages

====English====
* words: a, an, and, in, of, on, the, that, to, is, I (always capitalised)
* letter sequences: th, ch, sh, ough, augh
* word endings: -ing, -tion, -ed, -age, -s, -’s, -’ve, -n’t, -’d
* diacritics or accents only in loanwords (piñata)

====Dutch (Nederlands)====
* letter sequences: ij (capitalized as IJ, and also found as a ligature, Ĳ or ĳ), ei, doubled vowels (but not ii), kw, sch, oei, ooi, and uw (especially eeuw, ieuw, auw, and ouw)
* words: het, op, en, een, voor (and compounds of voor)
* word endings: -tje, -sje, -ing, -en, -lijk
* at the start of words: z-, v-, ge-
* t/m occasionally occurs between two points in time or between numbers (e.g. house numbers)

====West Frisian (Frysk)====
* letter sequences: ij, ei, oa
* words: yn

====Afrikaans (Afrikaans)====
* Words: 'n, as, vir, nie.
* Similar to Dutch, but:
** the common Dutch letters c and z are rare and used only in loanwords (e.g. chalet);
** the common Dutch vowel ij is not used; instead, i and y are used (e.g. -lik, sy);
** the common Dutch word ending -en is rare, being replaced by -e.

====German (Deutsch)====
* umlauts (ä, ö, ü), eszett (ß)
* letter sequences: ch, sch, tsch, tz, ss
* common words: der, die, das, den, dem, des, er, sie, es, ist, ich, du, aber
* common endings: -en, -er, -ern, -st, -ung, -chen, -tät
* rare letters: x, y (except in loanwords)
* letter c rarely used except in the sequences listed above and in loanwords
* long compound words
* many capitalised words in the middle of sentences

====Swedish (Svenska)====
* letters å, ä, ö
* common words: och, i, att, det, en, som, är, av, den, på
* long compound words
* letter sequences: stj, sj, skj, tj, ck, än
* no use of characters w, z except for foreign proper nouns and some loanwords

====Danish (Dansk)====
* letters æ, ø, å
* common words: af, og, til, er, på, med, det, den
* common endings: -tion, -ing, -else, -hed
* long compound words
* no use of characters c, w, z and x except for foreign proper nouns and some loanwords (in most, c is replaced with k)

====Norwegian (Norsk)====
* letters æ, ø, å
* common words: av, ble, er, og, en, et, men, i, å, for, eller
* common endings: -sjon, -ing, -else, -het
* long compound words
* no use of characters c, w, z and x except for foreign proper nouns and some loanwords (in most, c is replaced with k or s)

====Icelandic (Íslenska)====
* letters á, ð, é, í, ó, ú, ý, þ, æ, ö
* common beginnings: fj-, gj-, hj-, hl-, hr-, hv-, kj-, and sj-
* common endings: -ar (especially -nar), -ir (especially -nir), -ur, -nn (especially -inn)
* no use of characters c, q, w, or z except for foreign proper nouns, some loanwords, and, in the case of z, older texts
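The "common words" clues listed for the Germanic languages lend themselves to a simple count. The sketch below is illustrative only: the word lists are copied from the English, Dutch, German and Swedish entries above, while the names `COMMON_WORDS`, `common_word_scores` and `best_guess` are hypothetical. Very short texts will often be misjudged, since words such as en and den are shared between languages.

```python
import re

# Word lists taken from the entries above; deliberately incomplete.
COMMON_WORDS = {
    "English": {"a", "an", "and", "in", "of", "on", "the", "that", "to", "is"},
    "Dutch": {"het", "op", "en", "een", "voor"},
    "German": {"der", "die", "das", "den", "dem", "des", "er", "sie",
               "es", "ist", "ich", "du", "aber"},
    "Swedish": {"och", "i", "att", "det", "en", "som", "är", "av", "den", "på"},
}


def common_word_scores(text: str) -> dict:
    """Count, per language, how many words of `text` are on its list."""
    words = re.findall(r"\w+", text.lower())
    return {
        language: sum(word in vocabulary for word in words)
        for language, vocabulary in COMMON_WORDS.items()
    }


def best_guess(text: str) -> str:
    """Return the language with the highest count, or "undetermined"."""
    scores = common_word_scores(text)
    language, score = max(scores.items(), key=lambda item: item[1])
    return language if score > 0 else "undetermined"


if __name__ == "__main__":
    print(best_guess("Het boek ligt op een tafel voor het raam"))
```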

Baltic languages

====Latvian (Latviešu)====
* uses diacritics: ā, č, ē, ģ, ī, ķ, ļ, ņ, ō, ŗ, š, ū, ž
* does not have letters: Q, W, X, Y
* extremely rare doubling of vowels
* rare doubling of consonants
* a period (.) after ordinal numbers, e.g. 2005. gads
* common words: ir, bija, tika, es, viņš

====Lithuanian (Lietuvių)====
* visual abundance of letters ą, č, ę, ė, į, š, ų, ū, ž
* does not have letters q, w, x
* extremely rare doubling of vowels and consonants
* many varying forms (usually endings) of the same word, e.g. namas, namo, namus, namams, etc.
* generally long words (absence of articles and fewer prepositions in comparison to Germanic languages)
* common words: ir, yra, kad, bet

Slavic languages

====Polish (Polski)====
* consonant clusters rz, sz, cz, prz, trz
* includes: ą, ę, ć, ś, ł, ó, ż, ź
* words w, we, i, na (prepositions and the conjunction i)
* words jest, się
* words beginning with był, będ, jest (forms of the copula być, "to be")

====Czech (Čeština)====
* visual abundance of letters ž š ů ě ř
* words je, v
* to distinguish from Slovak: does not use ä, ľ, ĺ, ŕ or ô

====Slovak (Slovenčina)====
* visual abundance of letters ž š č
* uses: ä, ľ, ĺ, ŕ and ô
* typical suffixes: -cia, -ť
* to distinguish from Czech: does not use ě, ř or ů
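The Czech and Slovak entries distinguish the two languages by letters that only one of them uses. A minimal Python sketch of that rule follows; the letter sets come from the two entries, and the function name `czech_or_slovak` is hypothetical. It assumes the text really is one of these two languages and says nothing otherwise.

```python
import unicodedata

CZECH_ONLY = set("ěřů")      # not used in Slovak
SLOVAK_ONLY = set("äľĺŕô")   # not used in Czech


def czech_or_slovak(text: str) -> str:
    """Compare counts of the letters that only one of the two languages uses."""
    # Normalise to precomposed characters so that "r" + combining caron matches "ř".
    lowered = unicodedata.normalize("NFC", text).lower()
    czech_hits = sum(ch in CZECH_ONLY for ch in lowered)
    slovak_hits = sum(ch in SLOVAK_ONLY for ch in lowered)
    if czech_hits > slovak_hits:
        return "Czech"
    if slovak_hits > czech_hits:
        return "Slovak"
    return "undetermined"


if __name__ == "__main__":
    print(czech_or_slovak("Příliš žluťoučký kůň úpěl"))
    print(czech_or_slovak("Kôň stojí na ľade"))
```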

====Croatian (Hrvatski)====
* similar to Serbian
* letters-digraphs dž, lj, nj
* does not have q, w, x, y
* typical suffixes: -ti, -ći
* special letters: č, ć, š, ž, đ
* common words: a, i, u, je
* to distinguish from Serbian: infixes -ije- and -je- are common, verbs ending in -irati, -iran

Serbian (Srpski/Српски)

Serbian Latin

* similar to Croatian
* letters-digraphs dž, lj, nj (lj and nj are somewhat more common than dž, although not by much)
* no q, w, x, y
* typical verb suffixes -ti, -ći (the infinitive is much less used than in Croatian)
* foreign words might end in -tija, -ovan, -ovati, -uje
* special letters: đ (rare), č, š (common), ć, ž (less common)
* common words: a, i, u, je, jeste
* future tense suffix -iće, -ićeš, -ićemo, -ićete (not found in Croatian)
* infix -ije- virtually nonexistent; infix -je- extremely rarely appears before a consonant (in contrast with Croatian)

Serbian Cyrillic

* uses Џ, Љ, Њ, Ђ, Ћ
* does not use Щ, Ъ, Ы, Ь, Э, Ю, Я, Ё, Є, Ґ, Ї, І, Ў
* distinguishing from Macedonian: does not use Ѕ, Ѓ, Ќ
* distinguishing from any other Cyrillic language: does not use Й (й); uses Ј (ј) instead

Celtic languages

====Welsh (Cymraeg)====
* letters Ŵ, ŵ used in Welsh
* words y, yr, yn, a, ac, i, o
* letter sequences wy, ch, dd, ff, ll, mh, ngh, nh, ph, rh, th, si
* letters not used: k, q, v, x, z
* letter used only rarely, in loanwords: j
* commonly accented letters: â, ê, î, ô, û, ŵ, ŷ
* word endings: -ion, -au, -wr, -wyr
* y is the most common letter in the language
* w between consonants (w is in fact a vowel in Welsh)
* the circumflex accent (^) is by far the commonest diacritical mark, although diacritics are often omitted altogether

====Irish (Gaeilge)====
* vowels with acute accents: á é í ó ú
* words beginning with letter sequences bp dt gc bhf
* letter sequences sc cht
* there may be words or names with the second letter capitalized instead of the first

====Scottish Gaelic (Gàidhlig)====
* vowels with grave accents: à è ì ò ù
* letter sequences sg chd

Iranian languages

====Kurdish (Kurdî / كوردی)====
* The word xwe (oneself, myself, yourself etc.) is highly specific (xw combination) and frequent.

Finno-Ugric languages

====Finnish (Suomi)====
* distinct letters ä and ö; but never õ or ü
* b, f, z, š and ž appear in loanwords and proper names only; the last two are substituted with sh or zh in some texts
* c, q, w, x appear in (typically foreign) proper names only
* outside of loanwords, d appears only between vowels or in hd
* outside of loanwords, g only appears in ng
* outside of loanwords, words do not begin with two consonants; this is reflected in the general syllable structure, where consonant clusters only occur across syllable boundaries, except in some loanwords
* common words: sinä, on
* common endings: -nen, -ka/-kä, -in
* common vowel combinations: ai, uo, ei, ie, oi, yö, äi
* unusually high degree of letter duplication; both vowels and consonants are geminated, for example aa, ee, ii, kk, ll, ss
* frequent long words

====Estonian (Eesti)====
* distinct letters: ä, ö, õ and ü; but never ß or å
* similar to Finnish, except:
** letter y is not used, except in loanwords
** letters b and g (without preceding n) are found outside of loanwords
** letter õ is unique to Estonian
** words end in consonants more frequently than in Finnish, word-final b, d, v being particularly typical
** letter d is much more common in Estonian than in Finnish, and in Estonian it is often the last letter of the word, which it never is in Finnish
* common words: ja, on, ei, ta, see

====Hungarian (Magyar)====
* letters Ő, Ű, ő and ű (double acute accent) unique to Hungarian
* accented letters á and é frequent
* letter combinations: sz, gy, ny, cs, leg-, -obb (note: sz is also common in Polish)
* common words: a, az, ez, egy, és, van, hogy
* letter k very frequent

Eskimo–Aleut languages

====Greenlandic====
* long polysynthetic words
* relatively abundant n, q, u
* ubiquitous double consonants and vowels (aa, ii, uu, more rarely ee, oo)
* vowels a, i, u conspicuously more frequent than e, o (which are only found before q and r)
* no diphthongs except occasional word-final ai; the only consonant combinations besides double consonants and (n)ng consist of r + consonant

Southern Athabaskan languages

* vowels with acute accent, ogonek (nasal hook), or both: á, ą, ą́
* doubled vowels: aa, áá, ąą, ą́ą́
* slashed l: ł (check not Polish!)
* n with acute accent: ń
* quotation mark: ' or ’
* sequences: dl, tł, tł’, dz, ts’, ií, áa, aá
* may have rather long words

Western Apache (Nnee biyáti’/Ndee biyáti’)

In addition to the above:
* may use: u or ú
* may use vowels with macron: ā ą̄
* does not use ų

Navajo (Diné bizaad)

In addition to the above:
* does not use u, ú, or ų

Mescalero / Chiricahua (Mashgaléń / Chidikáágo)

In addition to the above:
* uses: u, ú, ų
* does not use o, ó, or ǫ

===Guarani===
* lots of tildes over vowels and n
* tilde over g: g̃; Guarani is the only language in the world to use it. Example words: hagũa and g̃uahẽ.
* b, d, and g usually do not occur without m or n before them (mb, nd, ng) unless they are in Spanish loanwords
* f, l, q, w, x, z extremely rare outside loanwords
* does not use c without h: ch

===Basque (Euskara)===
* word ending: -ak, -ek
* letter sequence: tx
* Cc, Qq, Vv, Ww, Yy only in loanwords
* z is relatively common

===Japanese in rōmaji (Nihongo/日本語)===
* words: desu, aru, suru, esp. at the end of sentences
* word endings: -masu, -masen, -shita
* letters: nearly 50% vowels (a e i o u)
* letters: no consonant clusters, except n and h at the end of syllables
* a macron or circumflex may be used to indicate doubled vowels, e.g. Tōkyō
* common words: no, o, wa, de, ni
* Japanese uses four scripts: rōmaji (romanized letters), hiragana (used for native words), katakana (used for foreign words) and kanji (originating from Chinese). Note: rōmaji is not often used in ordinary Japanese writing; it is most often used for foreigners learning the pronunciation of the language.

===Hmong (Hmoob) written in Romanized Popular Alphabet===
* Almost all written words are quite short (one syllable).
* Syllables (unless they are pronounced with mid tone) end in a tone letter: one of b s j v m g d, leading to apparent "consonant clusters" such as -wj
* w can be the main vowel of a syllable (e.g. tswv)
* Syllables can begin with sequences such as hm-, ntxh-, nq-.
* Syllables ending in double vowels (especially -oo, -ee), possibly followed by a tone letter (as in Hmoob "Hmong").

===Vietnamese (tiếng Việt)===
* Roman characters with more than one diacritical mark on the same vowel. See above.
* Almost all written words are quite short (one syllable).
* Words beginning with ng
* common words: cái, không, có, ở

====Vietnamese Quoted-Readable (VIQR)====
* The following characters (often in combination) after vowels: ^ ( + ' ` ? ~ .
* DD, Dd, or dd
* The following character before punctuation: \

====Vietnamese VNI encoding====
* The digits 1-8 after vowels
* The digit 9 after a D or d
* The following character before numbers: \
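The two main VNI clues above (a digit 1-8 after a vowel, a 9 after D or d) can be counted with a regular expression. The sketch below is only an illustration of that rule; `VNI_MARKER` and `count_vni_markers` are hypothetical names, and the backslash escape is not handled. A single hit proves little (ordinary text such as "video2" would match once), so the count is best read relative to the length of the text.

```python
import re

# A vowel followed by a digit 1-8, or d/D followed by 9.
VNI_MARKER = re.compile(r"[aeiouy][1-8]|d9", re.IGNORECASE)


def count_vni_markers(text: str) -> int:
    """Count the letter-plus-digit pairs that are typical of VNI-encoded Vietnamese."""
    return len(VNI_MARKER.findall(text))


if __name__ == "__main__":
    print(count_vni_markers("Tie61ng Vie65t"))  # "tiếng Việt" typed in VNI
```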

====Vietnamese Telex====
* The following characters after vowels: s f r x j
* The following vowels, doubled up: a e o
* The letter w after the following characters: a o u
* DD, Dd, or dd

Chinese, Romanized

====Standard Chinese (現代標準漢語)====
* In general, Mandarin syllables end only in vowels or n, ng, r; never in p, t, k, m
=====Pinyin=====
* Words beginning with x, q, zh
* Tone marks on vowels, such as ā, á, ǎ, à
** For convenience while using a computer, these are sometimes substituted with numbers, e.g. a1, a2, a3, a4
=====Wade–Giles=====
* Words do not begin with b, d, g
* Words beginning with hs
* Many hyphenated words
* Apostrophes after initial letters or digraphs, e.g. t'a, ch'i

=====Gwoyeu Romatzyh=====
* Many unusual vowel combinations such as ae, eei, ii, iee, oou, yy, etc.
* Insertion of r, e.g. arn, erng, etc.
* Words ending in nn, nq

====Standard Cantonese (粵語)====
* In general, Cantonese syllables can end in p, t, k, m, n, ng; never r

====Min Nan (Bân-lâm-gí/Bân-lâm-gú) in Pe̍h-ōe-jī====
* Many hyphenated words.
* Words can end in p, t, k, m, n, ng, h; never r
* Roman characters with many diacritical marks on vowels. Unlike Vietnamese, each character has at most one such mark.
* Unusual combining characters, namely · (middle dot, always after o) and | (vertical bar). ¯ (macron) is also common.

Austronesian languages

Malay (bahasa Melayu) and Indonesian (bahasa Indonesia)

May contain the following:
Prefixes: me-, mem-, memper-, pe-, per-, di-, ke-
Suffixes: -kan, -an, -i
Others (these almost always written in lower case): yang, dan, di, ke

Malay and Indonesian are mutually intelligible to proficient speakers, although translators and interpreters will generally be specialists in one or the other language.

Frequent use of the letter 'a' (comparable to the frequency of the English 'e').

Turkic languages

Note that some Turkic languages, such as Azerbaijani and Turkmen, use a similar Latin alphabet (often Jaŋalif) and similar words, and might be confused with Turkish. Azerbaijani has the letters Əə, Xx and Qq, which are not present in the Turkish alphabet, and Turkmen has Ää, Žž, Ňň, Ýý and Ww. Latin characters uniquely (or nearly uniquely) used for Turkic languages: Əə, Ŋŋ, Ɵɵ, Ьь, Ƣƣ, Ğğ, İ, and ı. All Turkic languages can form long words by adding multiple suffixes.

Turkish (Türkçe/Türkiye Türkçesi)

Turkish Alphabet

Lowercase: a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z

Uppercase: A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z

=====Common words=====
* bir: one, a
* bu: this
* fakat: but
* oldu: was
* şu: that

=====Misc.=====
* Look for word endings. Tense changes in Turkish verbs are created by adding suffixes to the end of the verb. Plurals are formed by adding -lar or -ler.
** Common tense changes: -yor -mış -muş -sun
** Possession/person: -im -un -ın -in -iz -dur -tır
** Example: Yapmıştır, "[He] did it"; Yap is the verb stem meaning "to do", -mış indicates the perfect tense, -tır indicates the third person (he/she/it).
** Example: Adalar, "Islands"; Ada is a noun meaning "island", -lar makes it plural.
** Example: Evimiz, "Our house"; Ev is a noun meaning "house", -im indicates the first-person possessor, which -iz then makes plural.

Azerbaijani (Azərbaycanca)

Azeri can be easily recognized by the frequent use of ə. This letter is not used in any other officially recognized modern Latin alphabet. In addition, it uses the letters x and q, which are not used in Turkish.

* Common words: və, ki, ilə, bu, o, isə, görə, da, də
* Frequent use of diacritics: ç, ə, ğ, ı, İ, ö, ş, ü
* Words ending in -lar, -lər, -ın, -in, -da, -də, -dan, -dən
* Words never beginning with ğ or ı
* Words rarely beginning with two or more consonants
* Transliteration of foreign words and names, e.g. Audrey Hepburn = Odri Hepbern

Chinese (中文)

* No spaces, except between punctuation marks and (sometimes) foreign words
* Arabic numerals (0-9) sometimes used
* Punctuation:
** Period 。(not .)
** Serial comma 、(distinguished from the regular comma ，)
** Ellipsis …… (six dots)
* No hiragana, katakana, or Hangul
* May be written vertically

Simplified Chinese characters (简体) vs Traditional Chinese characters (繁體)

Note: Many characters were not simplified. As a result, it is common for a short word or phrase to be identical between Simplified and Traditional, but it is rare for an entire sentence to be identical as well.

Common radicals different between Traditional and Simplified:
* Simplified: 讠钅饣纟门 (e.g. 语银饭纪问)
* Traditional: 訁釒飠糹門 (e.g. 語銀飯紀問)

Common characters different between Traditional and Simplified:
* Simplified: 国会这来对开关门时个书长万边东车爱儿
* Traditional: 國會這來對開關門時個書長萬邊東車愛兒
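The character pairs listed above are enough for a rough automatic check. The following Python sketch simply counts how many of the listed Simplified and Traditional forms occur in a text; `chinese_variant` is a hypothetical name, and a real check would need a far larger character table than these 18 pairs.

```python
# Character lists copied from the comparison above.
SIMPLIFIED = set("国会这来对开关门时个书长万边东车爱儿")
TRADITIONAL = set("國會這來對開關門時個書長萬邊東車愛兒")


def chinese_variant(text: str) -> str:
    """Guess the character set from the listed forms that appear in `text`."""
    simplified_hits = sum(ch in SIMPLIFIED for ch in text)
    traditional_hits = sum(ch in TRADITIONAL for ch in text)
    if simplified_hits > traditional_hits:
        return "Simplified"
    if traditional_hits > simplified_hits:
        return "Traditional"
    # Short phrases are often identical in both, as noted above.
    return "undetermined"


if __name__ == "__main__":
    print(chinese_variant("这本书"), chinese_variant("這本書"))
```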

Japanese (日本語)

* Katakana (カタカナ) and hiragana (ひらがな) characters mixed with kanji (漢字)
* Few or no spaces
* Arabic numerals (0-9) sometimes used
* Punctuation:
** Period 。
** Comma 、(， also used)
** Quotation marks 「」
* Occasional small characters beside large ones, e.g. しゃ　りゅ　しょ　って　シャ　リュ　ショ　ッテ
* Double tick marks (dakuten) appearing at the upper right of characters, e.g. で　が　ず　デ　ガ　ズ
* Empty circles (handakuten, or maru) appearing at the upper right of characters, e.g. ぱ　ぴ　パ　ピ
* Frequent characters: の　を　は　が
* May be written vertically

Standard written Chinese (based on Mandarin) vs written Vernacular Cantonese

Note: Cantonese-speakers live in Mainland China, Hong Kong and Macau, so written Cantonese can be written in either Simplified or Traditional characters.

Common characters in Vernacular Cantonese that do not occur in Mandarin (only characters that are the same between Traditional and Simplified are chosen here):

嘅咗咁嚟啲唔佢乜嘢

Some of the above characters are not supported in all character encodings, so sometimes the 口 radical on the left is substituted with a 0 or o, e.g.

o既 0既
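The marker characters above, together with the o既 / 0既 substitution, can be turned into a simple count. This is a minimal sketch under the assumption that only the substitution for 嘅 shown above needs handling; `count_cantonese_markers` is a hypothetical name.

```python
# Characters listed above as typical of written Vernacular Cantonese.
CANTONESE_MARKERS = set("嘅咗咁嚟啲唔佢乜嘢")


def count_cantonese_markers(text: str) -> int:
    """Count Cantonese-specific characters, undoing the o既 / 0既 workaround first."""
    normalised = text.replace("o既", "嘅").replace("0既", "嘅")
    return sum(ch in CANTONESE_MARKERS for ch in normalised)


if __name__ == "__main__":
    print(count_cantonese_markers("佢唔喺度"))
```

A count of zero does not rule out Cantonese, since formal writing by Cantonese speakers usually follows Standard written Chinese.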

Korean (한국어/조선말)

* Western-style punctuation marks
* Western-style spacing
* Hangul letters, e.g. ㅎ h, ㅇ ng, ㅂ b, etc.
* Hangul letters are used to form syllable blocks, e.g. ㅅ s + ㅓ eo + ㅇ ng = 성 seong
* Circles and ellipses are commonplace in Hangul; they are exceedingly rare in Chinese.
* General appearance has relatively uniform complexity, as contrasted with Chinese or Japanese.
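The rules given for Chinese, Japanese and Korean (Hangul means Korean; kana means Japanese; Han characters with neither suggest Chinese) map directly onto Unicode ranges. The Python sketch below illustrates this with hard-coded ranges for the main blocks only; `guess_east_asian_language` is a hypothetical name, and rarer blocks such as CJK extensions or half-width katakana are ignored.

```python
def guess_east_asian_language(text: str) -> str:
    """Apply the Hangul / kana / Han-only rule to a piece of text."""
    # Hangul syllables and Hangul compatibility jamo.
    has_hangul = any(
        "\uac00" <= ch <= "\ud7a3" or "\u3130" <= ch <= "\u318f" for ch in text
    )
    # Hiragana and katakana blocks.
    has_kana = any("\u3040" <= ch <= "\u30ff" for ch in text)
    # Main block of CJK unified ideographs.
    has_han = any("\u4e00" <= ch <= "\u9fff" for ch in text)

    if has_hangul:
        return "Korean"
    if has_kana:
        return "Japanese"
    if has_han:
        return "Chinese"
    return "undetermined"


if __name__ == "__main__":
    for sample in ["위키백과에", "日本語を勉強", "中文维基百科"]:
        print(sample, "->", guess_east_asian_language(sample))
```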

==Khmer (ភាសារខ្មែរ)==
* no spaces between words
* uses Khmer numerals in writing: ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩
* has two sounds, "r" and "or"
* uses double quotes (") to change from the "r" sound to the "or" sound
* uses a single quote (') to make a word end in a short sound
* has 24 vowels: ា ិ ី ឹ ឺ ុ ូ ួ ើ ឿ ៀ េ ែ ៃ េា ៅ ុំ ំ ាំ ះ ុះ េះ ោះ
* uses " ។ " as a full stop
* uses a space where other languages use a comma

Greek (Ελληνικά)

Modern Greek is written with the Greek alphabet in monotonic, polytonic or atonic form, according to either Demotic (Triantafylidis) grammar or Katharevousa grammar. Some people write in Greeklish (Greek in Latin script), which may be visual-based, orthographic, phonetic, or simply mixed. The only official forms of written Greek are monotonic and polytonic.
===Normal Modern Greek (Greek Monotonic)===
* words και, είναι
* Each multi-syllable word has one accent/tone mark (oxia): ά έ ή ί ό ύ ώ
* The only other diacritic ever used is the tréma: ϊ/ΐ, ϋ/ΰ, etc.

Pre-1980s Greek (Greek Polytonic)

Katharevousa, Dimotiki (Triantafylidis' grammar)

Diacritics: ά, ᾶ, ἀ, ἁ, and combinations, also with other vowels.
Some texts, especially in Katharevousa, also have ὰ, ᾳ, in combination with other diacritics.

===Ancient Greek===
* Diacritics: ά, ὰ, ᾶ, ἀ, ἁ, ᾳ, and combinations, also with other vowels; ῥ

===Greek Atonic===
* Was common in some Greek media (television)
* You will see Greek characters without accents/tones
* words: και, ειναι, αυτο
===Greek in Greeklish===
* Automated conversion software for Greeklish->Greek conversion exists. If you notice a Greeklish text, it may be useful for the Greek el.wikipedia (after conversion).
* Keep in mind: in Greeklish more than one character may be used for one letter (for example, th for Θ (theta)).
====Orthographic Greeklish====
* words kai, einai
====Phonetic Greeklish====
* words ke, ine
* omega appears as o
* ei, oi appear as i
* ai appears as e
====Visual-based Greeklish====
* omega (Ω or ω) may appear as W or w
* epsilon (E) may appear as 3
* alpha (A) may appear as 4
* theta (Θ) may appear as 8
* upsilon (Y) may appear as \|/
* gamma (γ) may appear as y
* More than one character may be used for one letter.

====Messed-up (Mixed) Greeklish====
* words kai, eine
* combines principles of phonetic, visual-based and orthographic Greeklish according to the writer's idiosyncrasy
* The most commonly used form of Greeklish.

Armenian (Հայերեն)

Armenian can be recognised by its unique 38-letter alphabet:

Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ

Georgian (ქართული)

Georgian can be recognised by its unique alphabet:

ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ

Slavic languages using the Cyrillic script

Bolding denotes letters unique to the language

Belarusian (беларуская)

* uses: ё, і, й, ў, ы, э, ’
* features: шч used instead of щ

===Bulgarian (български)===
* uses: ъ, щ, я, ю, й
* words: със, в
* features: ъ is used as a vowel; many words end in the definite article –ът, –ят, –та, –то, –те

===Macedonian (македонски)===
* uses: ј, љ, њ, џ, ѓ, ќ, ѕ
* words: во, со
* features: р is usually found between consonants, for example првин
* article suffixes similar to Bulgarian, but does not use ъ, я

Russian (русский)

* uses: ё, й, ъ, ы, э, щ

===Serbian (српски)===
* uses: ј, љ, њ, џ, ђ, ћ
* does not use: ъ, щ, я, ю, й
* words: је, у
* features: large consonant clusters, for example српски

Ukrainian (українська)

* uses: й, і, ї, ґ, є, щ, ’
* words: і, є
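The letter inventories above suggest a simple decision list for the Slavic languages written in Cyrillic. The Python sketch below checks the most distinctive letters first; the function name `guess_cyrillic_language` and the ordering of the checks are assumptions made for illustration. It will misjudge short texts that happen to lack the telltale letters, and it ignores non-Slavic languages written in Cyrillic.

```python
def guess_cyrillic_language(text: str) -> str:
    """Apply the distinguishing letters from the entries above, most specific first."""
    letters = set(text.lower())
    if letters & set("ѓќѕ"):
        return "Macedonian"
    if letters & set("ђћ"):
        return "Serbian"
    if letters & set("їєґ"):
        return "Ukrainian"
    if "ў" in letters:
        return "Belarusian"
    # ы, э and ё are also used in Belarusian, which is why ў is checked first.
    if letters & set("ыэё"):
        return "Russian"
    if letters & set("ъщ"):
        return "Bulgarian"
    return "undetermined"


if __name__ == "__main__":
    for sample in ["Українська мова", "български език", "српски језик ће бити"]:
        print(sample, "->", guess_cyrillic_language(sample))
```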

Arabic alphabet

All languages using the Arabic alphabet are written right-to-left.
A number of other languages have been written in the Arabic alphabet in the past, but are now more commonly written in Latin characters; examples include Turkish, Somali and Swahili.

===Arabic (العربية)===
* short vowels are not written, so many words are written with no vowel at all
* common prefix: -ال
* common suffix: ة-
* words: إلى، من، على

===Persian (فارسی)===
* uses: پ، چ، ژ، گ
* words: که، به

===Urdu (اردو)===
* uses: ٹ، ڈ، ڑ، ں، ے
* many words ending in ے
* words: اور، ہے
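The extra letters listed for Persian and Urdu can be checked mechanically. The sketch below is a rough illustration: it tests for the Urdu letters first and then for the Persian ones, an ordering chosen here on the assumption that Urdu text may also contain the Persian letters. `guess_arabic_script_language` is a hypothetical name, and the final "Arabic" answer really means "no Urdu or Persian letters seen", so other Arabic-script languages are not ruled out.

```python
URDU_LETTERS = set("ٹڈڑںے")     # listed under Urdu above
PERSIAN_LETTERS = set("پچژگ")   # listed under Persian above


def guess_arabic_script_language(text: str) -> str:
    """Guess between Urdu, Persian and Arabic from the extra letters in use."""
    letters = set(text)
    if letters & URDU_LETTERS:
        return "Urdu"
    if letters & PERSIAN_LETTERS:
        return "Persian"
    # Any character from the basic Arabic block counts as Arabic script.
    if any("\u0600" <= ch <= "\u06ff" for ch in letters):
        return "Arabic"
    return "undetermined"


if __name__ == "__main__":
    for sample in ["اور ہے", "زبان پارسی", "إلى من على"]:
        print(guess_arabic_script_language(sample))
```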

==Dravidian languages==
* All Dravidian languages are written from left to right.
* Each Dravidian language has its own script, but similarities can be found in their orthography.

===Tamil===
* Tamil is written using its own script.
* common word endings: ள்ளது, கிறது, கின்றன, ம்
* common words: தமிழ், அவர், உள்ள, சில
* Tamil has 30 basic letters. With the help of diacritics, as many as 247 letters can be written. அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
* Unicode Tamil text can be displayed with the Latha and Vijaya fonts.

North American syllabics

Blackfoot

Cherokee

Cree language

Inuktitut

Constructed languages

===Esperanto (Esperanto)===
* words: de, la, al, kaj
* Six accented letters: ĉ Ĉ ĝ Ĝ ĥ Ĥ ĵ Ĵ ŝ Ŝ ŭ Ŭ, or their corresponding X-system representation cx Cx gx Gx hx Hx jx Jx sx Sx ux Ux
* words ending in o, a, oj, aj, on, an, ojn, ajn, as, os, is, us, u, i, aŭ
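Because Esperanto text may use either the accented letters or the X-system, it can help to convert one to the other before looking for the accented letters. The Python sketch below does this with a small mapping taken from the list above; `from_x_system` is a hypothetical name, and the conversion is naive in that it would also rewrite foreign words that happen to contain these letter pairs.

```python
import re

# X-system digraphs and the accented letters they stand for.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}


def from_x_system(text: str) -> str:
    """Replace X-system digraphs with accented letters, keeping capitalisation."""
    def replace(match):
        digraph = match.group(0)
        letter = X_SYSTEM[digraph.lower()]
        return letter.upper() if digraph[0].isupper() else letter

    return re.sub(r"[cghjsu]x", replace, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(from_x_system("ehxosxangxo cxiujxauxde"))
```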

===Klingon (tlhIngan Hol)===
* When written in the Latin alphabet, Klingon has the unusual property of a distinction in case: q and Q are different letters, and other letters are either always (e.g. D, I, S) or never (e.g. ch, tlh, v) written in upper case. This produces many words that look quite strange to people who are not used to it, for example yIDoghQo', tlhIngan Hol (with mixed case).
* The apostrophe is fairly frequent, especially at the end of a word or syllable.
* Common suffixes: -be', -'a'
* Common words: 'oH, Qapla'
* May use one or more apostrophes in the middle of a word: SuvwI''a'

===Lojban (lojban.)===
* starts with ni'o or .i (or i)
* has many words like ko'a, pi'o, etc.
* almost all lowercase
* usually no punctuation except for dots
* may use commas in the middle of words (typically proper nouns)

== External links ==
* Language Identification Web Service, language detection API, 100+ languages supported
* Translated, an online language identifier, 102 languages supported
* Language Detector, online language identification from text or URLs
* Google Translate, Google's translation service
* Xerox, an online language identifier, 47 languages supported
* Language Guesser, a statistical language identifier, 74 languages recognized
* NTextCat, a free language identification API for .NET (C#): 280+ languages available out of the box. Recognizes the language and encoding (UTF-8, Windows-1252, Big5, etc.) of text. Mono compatible.