Digitizing Việt Nam marks a digital leap forward in Vietnam Studies through a Columbia-Fulbright collaboration, formalized through a 2022 memorandum of understanding between the Weatherhead East Asian Institute and the Vietnam Studies Center. The Digitizing Việt Nam platform began with the generous donation of the complete archive by the Vietnamese Nôm Preservation Foundation to Columbia University in 2018.




Delve into Vietnam's history, culture, and society through cutting-edge tools and curated resources tailored for scholars, students, and educators.
Explore our digital archive dedicated to preserving and studying Vietnam's historical, cultural, and intellectual heritage.
Engage creatively with Vietnam Studies: use Digitizing Việt Nam's specialized tools to approach the field with fresh perspectives and critical insight.
Discover and teach Vietnam Studies with impact — Explore curated syllabi, lesson plans, and multimedia resources designed to support innovative and inclusive learning experiences.
Latest news and discoveries from the digital front of Vietnamese heritage.

Vietnam and China, with their long history of contact, have formed a deep relationship of cultural exchange in which language is one of the most visibly affected areas. For more than two millennia, Chinese has not only served as a vehicle for transmitting culture and thought, but has also left an indelible mark on the structure and vocabulary of Vietnamese. Sino-origin words have penetrated deeply into Vietnamese life and thought, becoming an essential part of the language and enriching and diversifying Vietnamese expression.
The study Hanyu Yuenanyu Guanxi Yusu Lishi Cengci Fenxi (《汉语越南语关系语素历史层次分析》 - “Historical Stratification of Sino-Origin Elements in Vietnamese”) by Assoc. Prof. Dr. Xian Manxue is an in-depth work that applies historical phonological comparison to identify and classify Sino-origin words and to date their entry into Vietnamese. The research does not stop at the familiar layer of so-called Sino-Vietnamese vocabulary, but also uncovers older layers of borrowings, thereby sketching a vivid and panoramic picture of the history of Han–Viet linguistic contact. This article introduces the main ideas and key findings of that work.
As we know, for many centuries of Vietnamese history, Classical Chinese was used as the official written language. During this period, a large number of Sino-origin words entered Vietnamese, became part of everyday usage, and today are inseparable from modern Vietnamese.
We are familiar with “Sino-Vietnamese words,” that is, words read according to “Sino-Vietnamese readings,” a system of pronouncing Chinese characters based on the phonological system of Middle Chinese in the Tang dynasty, specifically reflecting Chinese of around the 8th century. Examples include: học ‘study’, dân ‘people’, sách ‘book’, văn chương ‘literature’, giáo dục ‘education’, xã hội ‘society’, văn hóa ‘culture’, đạo đức ‘morality’, tình cảm ‘emotion’, gia đình ‘family’, phụ huynh ‘parent’.
However, not all Sino-origin words in Vietnamese belong to this Sino-Vietnamese layer. There are also words like bay ‘fly’, buồn ‘sad’, búa ‘axe’, gương ‘mirror’, ghế ‘chair’, gươm ‘sword’, giếng ‘well’, dao ‘knife’, mùa ‘season’… which at first sound “purely Vietnamese,” but in fact also trace back to Chinese. Their modern forms differ from standard Sino-Vietnamese because they were borrowed very early and became assimilated into Vietnamese; or because they underwent long-term “Vietnamization” after borrowing; or because they entered through spoken contact rather than through the written/literary channel. These words are crucial for tracing the oldest layers of Han–Viet contact.
These items preserve information about historical phonology and carry traces of Sino–Vietnamese cultural exchange across different periods. Analyzing the relationship between such Vietnamese words and earlier stages of Chinese helps us reconstruct both the linguistic contact history between Chinese and Vietnamese and the cultural exchange between the two societies.
To cover this full range, the study adopts a broader concept: “Sino-origin elements in Vietnamese.” This term refers to all monosyllabic Vietnamese words that show systematic phonological and semantic correspondence to a Chinese form, regardless of which layer they belong to or when they were borrowed. This approach allows for a more comprehensive and systematic view of Chinese influence on Vietnamese.
THE METHOD OF HISTORICAL PHONOLOGICAL COMPARISON: Reconstructing history through sound
Each Sino-origin element—Sino-Vietnamese or non–Sino-Vietnamese—functions like a “fossil pebble,” preserving the phonetic features of the period in which it entered Vietnamese. To decode that historical information, the key method is historical phonological comparison.
We use historical phonological comparison based on the principle of “full correspondence” across three components: initial consonant, rhyme (vowel + coda), and tone, in order to identify Sino-origin elements in Vietnamese, especially those outside the standard Sino-Vietnamese system. In other words, a word is only considered Sino-origin if it shows regular patterned correspondences in all three aspects. This principle prevents arbitrary or accidental comparisons and ensures scientific rigor and reliability.
Then, based on known historical sound changes in both Chinese and Vietnamese, we “stratify” these Sino-origin elements by period—that is, we determine when they were borrowed into Vietnamese.
Some illustrations:
– Initial consonants: The character 斧 ‘axe’ has the Sino-Vietnamese reading phủ, but Vietnamese also has búa ‘axe’. The initial b- in búa reflects an earlier stage before the emergence of an [f]-like initial (ph-) in Chinese, i.e. before the 8th century. Therefore búa is an archaic borrowing. Other items of this type include: buồm (帆 SV: phàm, ‘sail’), buồn (烦 SV: phiền, ‘sad’), buông (放 SV: phóng, ‘let go’), buồng (房 SV: phòng, ‘room’), bùa (符 SV: phù, ‘talisman’), bay (飞 SV: phi, ‘fly’).
– Rhymes: The character 惜 ‘to regret’ has the Sino-Vietnamese reading tích, but Vietnamese also has tiếc ‘to regret’. The rhyme correspondence -ích ~ -iếc reflects an older stage. Other pairs include: biếc (碧 SV: bích, ‘azure’), tiệc (席 SV: tịch, ‘banquet’), chiếc (只 SV: chích, ‘classifier for single items’), thiếc (锡 SV: tích, ‘tin’), việc (役 SV: dịch, ‘work, task’), giêng (正 SV: chinh, ‘first month’).
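The two illustrations above can be checked mechanically. The following is a minimal sketch, not the study's actual procedure: it uses only the word pairs quoted in this article and tests one component at a time (initial b- ~ ph-, rhyme -iếc ~ -ích), whereas the "full correspondence" principle requires initial, rhyme, and tone to match together. The helper names (strip_tone, initial_pattern_holds, rhyme_pattern_holds) are hypothetical, and matching on spelling is an assumption that stands in for real phonological analysis.

```python
import unicodedata

# Combining tone marks of Vietnamese orthography:
# grave, acute, tilde, hook above, dot below.
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}


def strip_tone(syllable: str) -> str:
    """Remove tone diacritics but keep vowel-quality marks (e.g. the circumflex of ê)."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


# (early borrowing, Sino-Vietnamese reading) pairs quoted in the article.
INITIAL_PAIRS = [
    ("búa", "phủ"), ("buồm", "phàm"), ("buồn", "phiền"), ("buông", "phóng"),
    ("buồng", "phòng"), ("bùa", "phù"), ("bay", "phi"),
]
RHYME_PAIRS = [
    ("tiếc", "tích"), ("biếc", "bích"), ("tiệc", "tịch"),
    ("chiếc", "chích"), ("thiếc", "tích"), ("việc", "dịch"),
]

EARLY_RHYME = "i\u00eac"  # "iêc", written with an escape to pin the composed form
SV_RHYME = "ich"


def initial_pattern_holds(early: str, sv: str) -> bool:
    """Early form keeps b- where the Sino-Vietnamese reading has ph-."""
    return early.startswith("b") and sv.startswith("ph")


def rhyme_pattern_holds(early: str, sv: str) -> bool:
    """Early form ends in -iêc where the Sino-Vietnamese reading ends in -ich (tone ignored)."""
    return strip_tone(early).endswith(EARLY_RHYME) and strip_tone(sv).endswith(SV_RHYME)


initial_ok = all(initial_pattern_holds(e, s) for e, s in INITIAL_PAIRS)
rhyme_ok = all(rhyme_pattern_holds(e, s) for e, s in RHYME_PAIRS)
print(initial_ok, rhyme_ok)
```

Stripping only the tone marks, and not the vowel-quality marks, is what lets tiếc and tiệc count as the same rhyme while still keeping ê distinct from e.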
Using this procedure, the study identifies 3,938 Sino-origin elements, of which 1,012 lie outside the standard Sino-Vietnamese reading system—that is, very early borrowings or forms that have been strongly reshaped. According to historical sound change patterns, these are assigned to four main strata:
– Archaic (Qin–Han period);
– Early Medieval (late Eastern Han to early Tang);
– Late Medieval (mid-Tang to Five Dynasties);
– Later / Early Modern and after (from the 10th century onward). (These labels follow traditional periodization in Chinese historical linguistics.)
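For orientation, the two totals reported above can be turned into a count and a share. This is a small arithmetic sketch only; it assumes that every element not counted as lying outside the standard Sino-Vietnamese reading system belongs to that system, which the article implies but does not state outright. The variable names are illustrative.

```python
total_elements = 3938   # Sino-origin elements identified by the study
outside_sv = 1012       # elements outside the standard Sino-Vietnamese readings

# Assumption: the remainder all follow the standard Sino-Vietnamese reading system.
within_sv = total_elements - outside_sv
share_outside = outside_sv / total_elements

print(within_sv)                # 2926
print(f"{share_outside:.1%}")   # 25.7%
```

On these figures, roughly one Sino-origin element in four falls outside the familiar Sino-Vietnamese layer.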
Analyzing how these elements are distributed across time helps us sketch an overall history of Sino–Vietnamese language contact.
3.1 Qin–Han period: First contact
Han–Viet linguistic contact begins here. 386 items belong to this earliest layer. Especially in the mid-to-late Eastern Han, many scholars from the Central Plains migrated south to escape turmoil. The historical figure Sĩ Nhiếp (Sĩ Vương) is often credited with promoting Han culture in the region. Historical sources note that he opened schools, used canonical texts from the north, glossed and explained them, and taught local people—evidence that aligns with what we see linguistically.
3.2 Early Medieval period (late Eastern Han to early Tang)
By this stage, there are already semantic clusters, suggesting contact on a meaningful scale.
Color terms: Many Vietnamese color words derive from Chinese: hồng (红 ‘red’), vàng (黄 ‘yellow’), xanh (青 ‘blue/green’), tía (紫 ‘purple’), cam (柑 ‘orange’), biếc (碧 ‘bright blue/green’). Notably, vàng, tía, biếc belong to this Early Medieval layer. The word xanh in Vietnamese can mean both ‘green’ and ‘blue’, mirroring the older semantic range of 青 qīng in Classical Chinese. The well-known saying 青出于蓝而胜于蓝 (“Blue/green comes from indigo but surpasses indigo”) reflects this older linkage between shades of green and blue within 青. Vietnamese xanh preserves that ancient semantic field.
3.3 Late Medieval period
This is the peak of large-scale, structured Han–Viet contact and the period that gave rise to the Sino-Vietnamese reading system. Most of the 3,938 Sino-origin elements belong to this layer. As shown by Nguyễn Tài Cẩn, borrowings from the mid-Tang era provided the foundation for the Sino-Vietnamese readings that later became standard.
From the 10th century onward, after Vietnam entered an era of political independence, linguistic self-consciousness encouraged the development of a distinct national language. The Sino-Vietnamese reading system was gradually absorbed into the internal sound structure of Vietnamese and continued to evolve according to Vietnamese phonological rules. Sino-Vietnamese words became organically embedded in Vietnamese and to this day remain productive for forming new terms to express modern concepts, e.g. nhiệt kế ‘thermometer’, khí quyển ‘atmosphere’, vi mô ‘micro-’, vĩ mô ‘macro-’, thị trường ‘market’, công ty ‘company’, công nghệ cao ‘high technology’, toàn cầu hóa ‘globalization’.
3.4 From the 10th century onward
Even after independence, contact did not end. Literary Chinese continued to feed new borrowings. At the same time, continuing waves of migration from southern Chinese regions (Cantonese, Teochew, Hokkien, etc.) introduced new Sino-origin words via spoken channels, especially in southern Vietnam. Words such as xá xíu (barbecued pork), hủ tiếu (rice-noodle soup), lẩu (hotpot), quẩy (fried cruller), xì dầu (soy sauce), sa tế (satay-style chili paste), bò bía (spring rolls) are vivid examples. These forms are colloquial, culinary, and everyday in flavor—quite different from the elevated, classical tone of formal Sino-Vietnamese vocabulary.
In short, studying how these Sino-origin layers are distributed over time gives us a powerful way to reconstruct the linguistic history of Han–Viet contact.
NEW INSIGHTS FROM THE FINDINGS
4.1 “Sino-Vietnamese readings”: a historically layered system
One key finding is that the Sino-Vietnamese reading system is not a single homogeneous block created all at once. Rather, it is historically layered. Sino-Vietnamese forms appear across all four strata: 37 items in the Archaic layer, 256 in the Early Medieval layer, most in the Late Medieval layer, and 10 in the Later/Early Modern layer. This shows that some very old borrowings were never replaced by later “standard” Tang-era readings; they persisted into what we now consider Sino-Vietnamese. Examples include: nghĩa (义 ‘meaning’), địa (地 ‘earth’), thìa (匙 ‘spoon’), thuộc (属 ‘to belong’), tý (子, calendrical cycle), bính (丙, calendrical cycle), đóa (朵 ‘classifier for flowers’).
4.2 “Vietnamization” does not always mean “modified Sino-Vietnamese”
Traditionally, many forms that differ from standard Sino-Vietnamese readings have been explained as “Sino-Vietnamese that later got localized.” However, stratification shows a different reality: many such words—gương ‘mirror’ (镜), gươm ‘sword’ (剑), ghế ‘chair’ (几), ghi ‘to record’ (记), gần ‘near’ (近), dừng ‘to stop’ (停), giếng ‘well’ (井)—were in fact borrowed very early, before the Sino-Vietnamese system crystallized. They are not “later distortions” of Sino-Vietnamese; they are older borrowings that preserve pre–Sino-Vietnamese phonology. Calling them “Vietnamized Sino-Vietnamese” obscures their true historical depth.
SIGNIFICANCE OF THE STUDY
This study is not only for linguists; it speaks to anyone interested in the origins of Vietnamese and the broader history of Vietnamese culture.
– Clarifying the history of Vietnamese: By extracting and decoding the historical information embedded in Sino-origin vocabulary, we gain a more concrete and nuanced understanding of how Vietnamese has received, filtered, and naturalized external elements across different eras.
– Supporting teaching and learning: For learners of both Vietnamese and Chinese, understanding historical sources, etymological connections, and regular sound correspondences helps organize vocabulary in a systematic, intellectually coherent way.
– Contributing to interdisciplinary research: Language is a faithful record of history and culture. By dating borrowings, we obtain valuable evidence to verify, supplement, and illuminate historical events and the long, layered process of Sino–Vietnamese cultural exchange spanning more than two thousand years.
CONCLUSION
Sino-origin vocabulary is a structural component of Vietnamese, contributing to its richness, nuance, and expressive flexibility. Tracing the origins and development of these elements does more than clarify the linguistic past; it gives us a clearer view of the diversity and vitality of Vietnamese today.
Through the lens of historical phonological comparison, the work of Assoc. Prof. Dr. Xian Manxue lifts the veil of time and reveals the living record of deep cultural contact between Vietnam and China—a relationship inscribed, with great subtlety and persistence, into the languages of both peoples. Language here is not only a tool of communication, but a historical witness, a cultural bridge, and a shared heritage that deserves ongoing study and care.

A Chance Discovery at Dâu Pagoda
In the late 1980s and early 1990s, a group of researchers from the Institute of Hán-Nôm Studies paid a visit to Dâu Pagoda (also known by its Sino-Vietnamese names Diên Ứng Tự, Pháp Vân Tự, or Cổ Châu Tự) in Thuận Thành, Hà Bắc (today part of Bắc Ninh Province). The trip was part of a field investigation to collect textual materials in Hán and Nôm scripts from this famous ancient pagoda of the Kinh Bắc region.
At that time, in a side chamber to the right of the ancestral hall stood a storeroom packed with all kinds of household and farming tools—winnowing baskets, trays, bamboo sieves, water scoops, and so on. Quite by chance, the researchers noticed among the clutter several engraved woodblocks belonging to the work Cổ Châu Pháp Vân Phật bản hạnh ngữ lục 古珠法雲佛本行語錄. Remarkably, the entire set of woodblocks remained complete.
This work was already known to scholars, as a printed copy had been preserved at the Institute of Hán-Nôm Studies under the reference number A.818, collected earlier by the École française d’Extrême-Orient (EFEO) in Hanoi. Yet, what made the discovery particularly exciting was that alongside this known text, the researchers also found two other related woodblock sets: a vernacular Nôm verse work titled Cổ Châu Phật bản hạnh 古珠佛本行 and a Hán prose ritual text titled Hiến Cổ Châu Phật tổ nghi 献古珠佛祖儀. Notably, printed copies of these two works were absent from EFEO’s collection and had never been recorded in the holdings of the Institute of Hán-Nôm Studies at that time.
Three Works, Three Genres, One Thematic Thread
The three works are closely connected in content. Both Cổ Châu Pháp Vân Phật bản hạnh ngữ lục and Cổ Châu Phật bản hạnh recount the legend of Lady A Man and the deities of the Tứ Pháp (Four Dharma Goddesses), praising their virtues and miracles. The Hiến Cổ Châu Phật tổ nghi, meanwhile, contains ritual texts—prayers and offerings—expressing reverence toward the Buddhas.
While the first two share the same narrative basis, they differ in form. The Cổ Châu Pháp Vân Phật bản hạnh ngữ lục is a bilingual Hán-Nôm prose text: the original Hán passages are followed by vernacular Nôm translations, engraved in smaller script. In contrast, Cổ Châu Phật bản hạnh is entirely written in Nôm using the popular lục bát (6–8 syllable) verse form, typical of oral Vietnamese tradition. The third, Hiến Cổ Châu Phật tổ nghi, is composed entirely in Classical Chinese prose. Within the pagoda, these were familiarly referred to as Cổ Châu lục, Cổ Châu hạnh, and Cổ Châu nghi, respectively.
Cổ Châu lục: A Unique Bilingual Text
The Hán text of Cổ Châu lục contains nearly 2,100 characters and is described as an ancient transmission of unknown authorship. Each phrase or sentence in Chinese is immediately followed by its Nôm translation, engraved at half-size—a “line-by-line translation” format characteristic of Lê–Trịnh period Hán-Nôm works. According to the colophon engraved on the blocks, the carving was completed in the autumn of the 13th year of the Cảnh Hưng era (1752, Lê dynasty). Based on textual and linguistic evidence, scholars infer that the Hán prototype may date back to the mid–late 14th century (late Trần dynasty).
The Nôm portion, totaling 2,360 characters, was rendered by a figure named Viên Thái (identity unknown). The script displays both phonetic loan characters and phono-semantic compounds, though pure phonetic loans dominate. Many words appear in archaic phonetic forms that were later replaced by phono-semantic ones—for example: ba (巴), tên (先), nay (尼).
The Nôm translation also preserves numerous archaic expressions and lexical items, such as “Hay chưng thầy thửa ở” (“to know where the master dwells”) and “Thực thời lỗi ấy chưng ai” (“Truly, whose fault was it?”), as well as ancient words like 谷 cóc ‘to know’, 沃 óc ‘to call’, 合 hợp ‘should’, and 羅𥒥 la-đá ‘stone’. Such features typify Nôm prose of the 16th–17th centuries. The word la-đá 羅𥒥, for instance, is only attested in texts predating the late 17th century, appearing no later than Alexandre de Rhodes’s Dictionarium Annamiticum Lusitanum et Latinum (1651). By Pigneau de Béhaine’s Latin–Vietnamese Dictionary (1773), only the monosyllabic đá remained.
These linguistic characteristics, together with the “line-by-line” translation format, suggest that the Nôm version of Cổ Châu lục likely dates from the late 16th to early 17th century.
Cổ Châu hạnh: A Lục-bát Nôm Poem
The Cổ Châu hạnh consists of 246 lục-bát verse pairs (about 3,450 characters). The author is unknown, but internal evidence offers clues to its dating. It mentions the Hồng Đức reign (1470–1497) of King Lê Thánh Tông and refers to China as “Đại Minh,” the name in use during the Ming dynasty (1368–1644). Together these clues place the composition between 1470 and 1644, a broad range. Given linguistic parallels with Cổ Châu lục, it is plausible that Cổ Châu hạnh was composed at the same time or slightly later, that is, around the late 16th to early 17th century.
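The dating argument above amounts to intersecting two constraints. The sketch below illustrates that reasoning and is not a method taken from the source: the interval representation, the intersect helper, and the open upper bound of 9999 are all assumptions made for the example.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (earliest year, latest year), inclusive


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two year ranges, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# The poem mentions the Hồng Đức reign, so it cannot predate the reign's start.
# The upper bound is an open placeholder, not a date from the source.
after_hong_duc: Interval = (1470, 9999)

# The poem calls China "Đại Minh", the name in use under the Ming (1368-1644).
ming_era: Interval = (1368, 1644)

window = intersect(after_hong_duc, ming_era)
print(window)  # (1470, 1644)
```

The narrower estimate of the late 16th to early 17th century rests on the linguistic parallels with Cổ Châu lục, which this interval arithmetic does not capture.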
Cổ Châu nghi: Ritual Texts for the Buddha
The Cổ Châu nghi comprises over 1,500 Chinese characters written in Classical prose. An inscription on its opening block states that the text was an old composition of uncertain date, newly engraved at Dâu Pagoda in the year Nhâm Tý (1792), the fifth year of Emperor Quang Trung’s reign under the Tây Sơn dynasty.
A Return from Oblivion
Although scholars had long been aware of the existence of old woodblocks at Dâu Pagoda, they remained unstudied and untouched until the Institute’s researchers rediscovered them in the late 1980s–early 1990s.
In 1995, the Cổ Châu woodblock trilogy and related texts were finally published in the volume Di văn chùa Dâu (Hán-Nôm Inscriptions of Dâu Pagoda), edited by Prof. Nguyễn Quang Hồng—marking the long-awaited return of these cultural treasures from centuries of obscurity.
(Adapted from “The Cổ Châu Woodblock Trilogy and the Hán-Nôm Heritage of Dâu Pagoda,” in Prof. Nguyễn Quang Hồng’s Ngôn ngữ. Văn tự. Ngữ văn [Language, Script, Literature].)

Hồ Xuân Hương’s title as “The Queen of Nôm Poetry” (a phrase coined by poet Xuân Diệu) perhaps best reflects the immense influence and rebellious force of her verse. Born around 1772, at the end of the Lê dynasty (which lasted from 1592 to 1788), she is thought to have been the daughter of Hồ Sĩ Danh (1706–1783) or Hồ Phi Diễn (1703–1786), and the child of a concubine. Because of her social position, she likely did not receive the same formal education as men or royal women. Yet she composed poetry in both classical Chinese and Nôm (Vietnamese demotic script), with around 150 poems recorded, and became especially renowned for her Nôm works.
Instead of adhering to rigid conventions, allusions, and classical references typical of Sino-cultural poetry, Hồ Xuân Hương’s choice to write in Nôm—the vernacular language of everyday life—freed her from Confucian literary constraints, allowing her to speak with a distinct personal voice. Her inspiration sprang from the most ordinary experiences: “a betel quid,” “a floating cake,” “a jackfruit on a tree,” “a snail’s lot,” “a patch of foul grass,” and even more abstract expressions like “the beauty” (in Self-Lament). One of her special talents lies in expressing delicate or elusive ideas through earthy, colloquial imagery.
As noted in The Annotated Nôm Dictionary (edited by Prof. Nguyễn Quang Hồng), the word cái (丐) has multiple meanings: it can designate an object, mark femininity (in contrast to masculinity), mean “mother,” or refer to something greater (as in “trống cái” – the main drum, “đường cái” – main road). Meanwhile, hồng nhan (紅顔, literally “rosy face”) evokes a more elusive sense—perhaps a beautiful young woman, perhaps destiny or romance (since hồng can mean both “red” and “fate”). When combined as cái hồng nhan, femininity is embodied yet impossible to define precisely—a poetic tension between presence and abstraction.
Although her poem titles often suggest still scenes or quiet objects—“The Jackfruit,” “Autumn Scene,” “The Snail,” “The Floating Cake,” “The Crab”—each line is full of motion. Hồ Xuân Hương’s poetry does not merely describe things as static objects but speaks through them, claiming agency even in seemingly modest tones:
“My body is white, my fate is round” (The Floating Cake)
“My body is like a jackfruit on the tree” (The Jackfruit)
“I wear a green tunic, a yellow bodice / Three soldiers carry my palanquin, high and proud” (The Crab)
Her verbs dominate the verse, imbuing it with a sense of vitality and rhythm. Sometimes action even precedes the subject, creating a world brimming with energy and embodiment:
“Green wraps the tree trunk with a rounded crown,
White spreads across the calm, silent stream.” (Autumn Scene)
The movement in Hồ Xuân Hương’s poetry also resides in her layered wordplay. She was a master of “đố tục giảng thanh” (using the vulgar to express the pure) and “đố thanh giảng tục” (using the pure to express the vulgar)—fusing the sacred and profane through clever double meanings. It was this semantic dynamism that gave her poetry such vitality. Drawing from familiar natural and domestic imagery, she hinted at erotic and reproductive themes:
“The white bridge, two planks joined as one,
Clear water flows straight below!
Wild grass curls along the edges,
Tiny fish dart through the stream.”
In a Confucian society where female chastity defined virtue and sexuality was taboo, such multi-layered verses became subtle acts of resistance against moral repression. Yet Hồ Xuân Hương’s poetry transcends the simple dichotomy of sacred and profane. Its multiple layers often reflect broader social, religious, and even national questions—as seen in the word nước non (“water and mountains”) in The Floating Cake, linking personal fate to the destiny of the homeland and its folk roots.
Image source: Works by Nguyen Quoc Thang and Nghiem Nhan, published on VOV.