Decrypting Javanese

Oswin Rahadiyan Hartono
6 min read · May 24, 2020

(Image: Javanese script)

Language has played a significant role in Indonesia’s history. Having grown up on the island of Java, I developed a deep appreciation for its rich culture and social structures. I recall taking Javanese language and literature classes, where I came to appreciate the language’s modest yet traditional sound. Delving deeper, however, I discovered that the language and its writing system are far more complex than they first appear.

Background

Since the 15th century, Javanese has also been written in pegon, a version of the Arabic alphabet.

By the 17th century, the Javanese script had evolved into its present form. However, during the Japanese occupation of Indonesia from 1942 to 1945, the use of the Javanese script was prohibited.

The Dutch introduced the Latin alphabet to Indonesia in the 19th century, and it has gradually replaced the Javanese script since then. Today, the script is used primarily by scholars and for ornamental purposes. Those who can read and write the Javanese script are highly esteemed in society.

Javanese Language Classification

(Image: Javanese language classification)

Javanese is spoken differently depending on the social context, with three distinct styles or registers. Each style employs its own vocabulary, grammatical rules, and prosody. In Javanese, these styles are known as:

  1. Ngoko, the informal style. It is commonly used among friends and close relatives. It is also used by people of higher status, such as elders or bosses, when addressing those of lower status, such as younger people or subordinates in the workplace.
  2. Madya (Krama Madya), the intermediate style between ngoko and krama. Strangers on the street would use it, where status differences may be unknown and one wants to be neither too formal nor too informal. The term is from Sanskrit madhya (middle).
  3. Krama (Krama Inggil or Krama Alus), the polite and formal style. It is used between people of the same status when they do not wish to be informal, and by people of lower status when addressing those of higher status, such as young people to their elders or subordinates to bosses. It is also the official style for public speeches, announcements, etc. The term is from Sanskrit krama (in order).

Javanese Letters: Aksara Jawa (Hanacaraka)

(Image: Javanese letters)

The Javanese script is an abugida, a writing system that consists of 20 to 33 basic letters, depending on the language being written. As in other Brahmic scripts, each letter (called an aksara) represents a syllable with the inherent vowel /a/ or /ɔ/, which can be changed by placing diacritics around the letter. Each letter also has a conjunct form called pasangan, which nullifies the inherent vowel of the previous letter. Traditionally, the script is written without spaces between words (scriptio continua) but is interspersed with a set of decorative punctuation marks.
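
To make the abugida idea concrete, here is a minimal Python sketch built on the Unicode encoding of the script (the Javanese block, U+A980 to U+A9DF). It relies on Unicode conventions rather than anything specific to this article: a bare letter carries the inherent vowel, a vowel sign overrides it, and a pasangan is encoded as the pangkon sign followed by the next letter. The helper names (jv, syllable, cluster) are made up for illustration. Code points are printed instead of glyphs because many terminals lack a Javanese font.

```python
import unicodedata


def jv(name):
    """Look up a character of the Javanese Unicode block by its official name."""
    return unicodedata.lookup("JAVANESE " + name)


KA = jv("LETTER KA")
NA = jv("LETTER NA")
WULU = jv("VOWEL SIGN WULU")  # diacritic that changes the vowel to /i/
SUKU = jv("VOWEL SIGN SUKU")  # diacritic that changes the vowel to /u/
PANGKON = jv("PANGKON")       # sign that suppresses the inherent vowel


def syllable(consonant, vowel_sign=""):
    """A bare letter reads with the inherent vowel; a vowel sign overrides it."""
    return consonant + vowel_sign


def cluster(first, second, vowel_sign=""):
    """Encode a consonant cluster: pangkon + letter is rendered as a pasangan."""
    return first + PANGKON + second + vowel_sign


examples = {
    "ka": syllable(KA),
    "ki": syllable(KA, WULU),
    "ku": syllable(KA, SUKU),
    "kna": cluster(KA, NA),
}

for latin, javanese in examples.items():
    codepoints = " ".join("U+%04X" % ord(ch) for ch in javanese)
    print(latin.ljust(4), codepoints)
```

Whether the pangkon sequence actually displays as a stacked pasangan depends on the font and text shaper, so treat the output strings as encoded text, not as a guarantee of how they will look on screen.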

Javanese features and fun facts

  • Direction of writing: left to right in horizontal lines
  • The Javanese alphabet consists of akṣara (letters), saṇḍangan (diacritics), wilangan (numerals), and pada (punctuation).
  • The akṣara (letters) consist of akṣara wyanjana (consonants) and akṣara swara (vowels); the saṇḍangan (diacritics) consist of saṇḍangan swara (vowel diacritics), saṇḍangan panyigeging wanda (sound killers), and saṇḍangan wyanjana (semivowel diacritics).
  • Each consonant has two forms: the akṣara form is used at the beginning of a syllable, while the pasangan form is used for the second consonant of a consonant cluster and mutes the vowel of the preceding akṣara.
  • There are a number of consonant letters called akṣara murda or akṣara gêḍe (great or important letters), which are used for honorific purposes, such as writing the names of respected people. There are also a number of additional consonant letters, called akṣara rekan, that represent foreign sounds.
  • The order of the carakan consonants makes the following saying, “Hana caraka, data sawala, paḍa jayanya, maga baṭanga,” which means, “There were (two) emissaries, they began to fight, their valor was equal, they both fell dead.”
  • Dutch loanwords usually have the same form and meaning as in Indonesian, with a few exceptions. One is sepur (“train”, from the Dutch spoor): the word also exists in Indonesian, but there it has preserved the literal Dutch meaning of “railway tracks”, while the Javanese word follows the Dutch figurative use, in which “spoor” (lit. “rail”) is a metonym for “trein” (lit. “train”). (Compare a similar metonymic use in English: “to travel by rail” may be used for “to travel by train”.)
  • There are far fewer Arabic loanwords in Javanese than in Malay, and they are usually concerned with Islamic religion. Nevertheless, some words have entered the basic vocabulary, such as pikir (“to think”, from the Arabic fikr), badan (“body”), mripat (“eye”, thought to be derived from the Arabic ma’rifah, meaning “knowledge” or “vision”). However, these Arabic words typically have native Austronesian or Sanskrit alternatives: pikir = galih, idhep (Austronesian) and manah, cipta, or cita (from Sanskrit); badan = awak (Austronesian) and slira, sarira, or angga (from Sanskrit); and mripat = mata (Austronesian) and soca or nétra (from Sanskrit).
  • You can translate from Latin to Javanese script here or here
  • It is hard to render Javanese letters online; here is a reference for printing purposes
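
The hanacaraka saying and the four components of the script can also be explored programmatically. The sketch below is an illustration, not part of any project mentioned in this article. It assumes that the modern Latin spellings "dha" and "tha" correspond to the letters Unicode names DDA and TTA, and that Unicode general categories map roughly onto akṣara, saṇḍangan, wilangan, and pada. The names CARAKAN, carakan_letters, and component are hypothetical.

```python
import unicodedata

# Syllables of the hanacaraka saying, paired with Unicode letter names.
# Assumption: "dha" and "tha" are the letters Unicode calls DDA and TTA.
CARAKAN = [
    ("ha", "HA"), ("na", "NA"), ("ca", "CA"), ("ra", "RA"), ("ka", "KA"),
    ("da", "DA"), ("ta", "TA"), ("sa", "SA"), ("wa", "WA"), ("la", "LA"),
    ("pa", "PA"), ("dha", "DDA"), ("ja", "JA"), ("ya", "YA"), ("nya", "NYA"),
    ("ma", "MA"), ("ga", "GA"), ("ba", "BA"), ("tha", "TTA"), ("nga", "NGA"),
]


def carakan_letters():
    """Return the 20 basic letters in the traditional hanacaraka order."""
    return [unicodedata.lookup("JAVANESE LETTER " + name) for _, name in CARAKAN]


def component(ch):
    """Rough mapping from Unicode general category to a script component."""
    category = unicodedata.category(ch)
    if category == "Lo":
        return "aksara (letter)"
    if category in ("Mn", "Mc"):
        return "sandhangan (diacritic)"
    if category == "Nd":
        return "wilangan (numeral)"
    if category == "Po":
        return "pada (punctuation)"
    return "other"


letters = carakan_letters()
for (latin, _), ch in zip(CARAKAN, letters):
    print(latin.ljust(4), "U+%04X" % ord(ch), component(ch))
```

Sorting the same letters by code point gives a different sequence, because the Unicode block is not laid out in hanacaraka order. Any tool that needs the traditional order therefore has to carry an explicit table like the one above.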

Research on Javanese

I’ve come across some fascinating projects and research focused on the Javanese language. Kudos to the individuals behind these efforts for revitalizing the importance of the Javanese subject. I truly appreciate their dedication to preserving the language’s roots, identity, and culture.

Natural Language Processing

>>> import nltk  # needs the UDHR corpus, e.g. via nltk.download('udhr')
>>> nltk.corpus.udhr.words('Javanese-Latin1')[11:]
['Saben', 'umat', 'manungsa', 'lair', 'kanthi', ...]

Pattern recognition

Speech recognition

Challenges in Javanese Language Research

As researchers delve into the complexities of the Javanese language, they encounter a multitude of challenges:

  • Syntax and Spelling Differences: Javanese exhibits unique syntax and spelling variations, posing challenges for standardization and comprehension.
  • Phonetic Differences: Geographic factors contribute to phonetic variations across different regions where Javanese is spoken, complicating efforts in pronunciation standardization.
  • Vocabulary Differences: Javanese encompasses various registers and dialects, each with its own vocabulary, making it challenging to establish a unified lexicon.
  • Limited Resources: The availability of linguistic resources, such as corpora and dictionaries, is limited, hindering comprehensive linguistic analysis and research.
  • Lack of Javanese Letter Projects: Initiatives focusing on pattern recognition and character modeling for Javanese script are lacking, impeding advancements in text processing and recognition.
  • Lack of Javanese Speech Projects: Similarly, projects dedicated to Javanese speech recognition are scarce, limiting the development of voice-enabled applications and technologies.
  • Lack of Javanese Literature Research: Despite its rich literary tradition, research on Javanese literature, particularly in the realm of natural language processing (NLP) applied to poems and songs, remains insufficient, overlooking valuable cultural and linguistic insights.

Navigating these challenges requires concerted efforts from researchers, linguists, and technology enthusiasts to unlock the full potential of the Javanese language in the digital age.
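
The spelling challenge can be made concrete with a small example. Romanized Javanese appears both in scholarly transliteration (paḍa, baṭanga, saṇḍangan) and in everyday Latin spelling (padha, bathanga, sandhangan), so the same word may not match across sources. The Python sketch below normalizes the former to the latter. The mapping is an assumption made for illustration, not an established standard, and the function name normalize_spelling is hypothetical.

```python
import unicodedata

# Assumed mapping: retroflex consonants written with a dot below in
# transliteration become digraphs in everyday Latin spelling.
DIGRAPHS = {
    "\u1e0d": "dh",  # d with dot below
    "\u1e6d": "th",  # t with dot below
}


def normalize_spelling(word):
    """Map a transliterated word to a plain, diacritic-free Latin spelling."""
    # Compose first so that "d" + combining dot below matches the table.
    word = unicodedata.normalize("NFC", word.lower())
    for marked, plain in DIGRAPHS.items():
        word = word.replace(marked, plain)
    # Decompose and drop the remaining combining marks (assumed droppable),
    # e.g. n with dot below -> n, e with circumflex or acute -> e.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for sample in ["pa\u1e0da", "ba\u1e6danga", "sa\u1e47\u1e0dangan", "n\u00e9tra"]:
    print(normalize_spelling(sample))
```

A step like this loses information (for instance, the distinction between vowels marked ê and é), so it suits search and matching better than archival storage.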
