Javanese script

The language system has taken a huge part in the history of Indonesia. Growing up on Java island made me appreciate its rich culture and social system. I remember taking the Javanese class and learning the language and literature. It sounds modest and traditional. Trust me, the system is more complicated than you think.

Background

For a period from the 15th century onwards, Javanese was written with a version of the Arabic alphabet, called pegon.

By the 17th Century, the Javanese alphabet had developed into its current form. During the Japanese occupation of Indonesia between 1942 and 1945, the alphabet was prohibited.

Since the Dutch introduced the Latin alphabet to Indonesia in the 19th Century, the Javanese alphabet has gradually been supplanted. Today it is used almost exclusively by scholars and for decoration. Those who can read and write it are held in high esteem.

Javanese Language Classification

Javanese language classification

Javanese is spoken differently depending on the social context. There are three distinct styles or registers. Each employs its own vocabulary, grammatical rules, and even prosody. In Javanese these styles are called:

  1. Ngoko, informal speech, used between friends and close relatives. It is also used by persons of higher status (such as elders, or bosses) addressing those of lower status (young people, or subordinates in the workplace).
  2. Krama (Madya), intermediate between ngoko and krama. Strangers on the street would use it, where status differences may be unknown and one wants to be neither too formal nor too informal. The term is from Sanskrit madhya (middle).
  3. Krama Inggil (Krama Alus), polite and formal style. It is used between those of the same status when they do not wish to be informal. It is used by persons of lower status to persons of higher status, such as young people to their elders, or subordinates to bosses; and it is the official style for public speeches, announcements, etc. The term is from Sanskrit krama (in order).

Javanese Letters : Aksara Jawa (Hanacaraka)

Javanese letters

The Javanese script is an abugida writing system which consists of 20 to 33 basic letters, depending on the language being written. Like other Brahmic scripts, each letter (called an aksara) represents a syllable with the inherent vowel /a/ or /ɔ/ which can be changed with the placement of diacritics around the letter. Each letter has a conjunct form called pasangan, which nullifies the inherent vowel of the previous letter. Traditionally, the script is written without space between words (scriptio continua) but is interspersed with a group of decorative punctuation.

Javanese features and fun facts

  • Direction of writing: left to right in horizontal lines
  • The Javanese alphabet consists of akṣara (letters), saṇḍangan (diacritics), wilangan (numerals), and pada (punctuation).
  • The akṣara (letters) consist of akṣara wyanyjana (consonants) and akṣara swara (vowels); the saṇḍangan (diacritics) consist of saṇḍangan swara (vowel diacritics), saṇḍangan panyigeging wanda (sound killers), and saṇḍangan wyanyjana (semivowel diacritics).
  • Each consonants has two forms: the akṣara form is used at the beginning of a syllable, while the pasangan form is used for the second consonant of a consonant cluster and mutes the vowel of the akṣara.
  • There are a number of consonants letters called akṣara murda or akṣara gêḍe (great or important letters) which are used for honorific purposes, such as to write the names of respected people. There are also a number of additional consonant letters to represent foreign sounds called akṣara rekan.
  • The order of the carakan consonants makes the following saying, “Hana caraka, data sawala, paḍa jayanya, maga baṭanga,” which means, “There were (two) emissaries, they began to fight, their valor was equal, they both fell dead.”
  • Dutch loanwords usually have the same form and meaning as in Indonesian. The word sepur also exists in Indonesian, but there it has preserved the literal Dutch meaning of “railway tracks”, while the Javanese word follows Dutch figurative use, and “spoor” (lit. “rail”) is used as metonymy for “trein” (lit. “train”). (Compare a similar metonymic use in English: “to travel by rail” may be used for “to travel by train”.)
  • There are far fewer Arabic loanwords in Javanese than in Malay, and they are usually concerned with Islamic religion. Nevertheless, some words have entered the basic vocabulary, such as pikir (“to think”, from the Arabic fikr), badan (“body”), mripat (“eye”, thought to be derived from the Arabic ma’rifah, meaning “knowledge” or “vision”). However, these Arabic words typically have native Austronesian or Sanskrit alternatives: pikir = galih, idhep (Austronesian) and manah, cipta, or cita (from Sanskrit); badan = awak (Austronesian) and slira, sarira, or angga (from Sanskrit); and mripat = mata (Austronesian) and soca or nétra (from Sanskrit).
  • You can translate from Latin to Javanese script here or here
  • It’s hard to print the Javanese letter online, here is the reference for printing purpose

Research on Javanese

I found some interesting projects and research using the Javanese language use case. Kudos to the people who make Javanese subject great again. I really appreciate the way they conserve the root, identity, and culture.

Natural Language Processing

>>> nltk.corpus.udhr.words('Javanese-Latin1')[11:]
[u'Saben', u'umat', u'manungsa', u'lair', u'kanthi', ...]

Pattern recognition

Speech recognition

Challenges

  • Syntax and spelling difference
  • Phonetic difference (geographic factor)
  • Vocabulary difference (classification)
  • Limited resource, corpus, and dictionary
  • Lack of Javanese letter project (pattern recognition)
  • Lack of Javanese speech project (speech recognition)
  • Lack of Javanese literature research (NLP on Javanese poems and songs)

References

Text Mining | Data Warehouse | NLP