A digital hub to study pre-modern and modern Vietnam
A wooden scrolled placard at the Thang Nghiem Temple, Han-Nom Collection
About Us

Digitizing Việt Nam marks a digital leap forward in Vietnam Studies through a Columbia-Fulbright collaboration, formalized through a 2022 memorandum of understanding between the Weatherhead East Asian Institute and the Vietnam Studies Center. The Digitizing Việt Nam platform began with the Vietnamese Nôm Preservation Foundation's generous donation of its complete archive to Columbia University in 2018.

Columbia University Weatherhead East Asian Institute | Fulbright University Vietnam - Vietnam Studies Center | Henry Luce Foundation
Study Vietnam through the Digital Lens

Delve into Vietnam's history, culture, and society through cutting-edge tools and curated resources tailored for scholars, students, and educators.

Explore our digital archive dedicated to preserving and academically exploring Vietnam's historical, cultural & intellectual heritage.

Engage creatively with Vietnam Studies: use Digitizing Việt Nam's specialized tools to approach the field with fresh perspectives and critical insight.

Discover and teach Vietnam Studies with impact: explore curated syllabi, lesson plans, and multimedia resources designed to support innovative and inclusive learning experiences.

What's New

Latest news and discoveries from the digital front of Vietnamese heritage.

AI-ready Archives: Preparing Archival Data for the AI Era
February 12, 2026

As Artificial Intelligence (AI) is increasingly discussed in the archival field, the question is no longer “Should we use AI?” but rather “How should we prepare archival data so that AI does not undermine the foundational principles of archival practice?”

 

In February 2026, Prof. Giovanni Colavizza and Prof. Lise Jaillant published the AI Preparedness Guidelines for Archivists, issued by the Archives & Records Association (UK & Ireland) under a CC BY license. In this document, the authors emphasize a central point: AI is only truly useful when archival collections are carefully prepared in terms of data, metadata, structure, and evaluation mechanisms.

 

Below are the key elements of the guidelines that may be of particular interest to the archival community in Vietnam.

 

What Does “AI-Ready” Mean?

The guidelines distinguish between two main types of AI models commonly applied in archives:

1. Task-Specific AI

These models are trained to perform clearly defined tasks, such as:

  • Classifying types of records
  • Extracting names, places, and dates
  • Detecting or flagging sensitive content

2. Generative AI

These models generate language and can:

  • Summarize records
  • Suggest descriptions or keywords
  • Answer user questions based on archival data

An important approach highlighted in the guidelines is Retrieval-Augmented Generation (RAG). Here the AI system first retrieves relevant material from a well-prepared collection, and only then generates content based on what it retrieved. This helps reduce “hallucinations” (AI generating information not present in the source material) and improves accuracy.
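The retrieve-then-generate order can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three records are invented, retrieval is a simple count of shared words, and `generate` is a stand-in for a language model that is only allowed to quote what was retrieved. Production systems typically use vector search and a real model instead.

```python
import re
from collections import Counter

# Hypothetical mini-collection; identifiers and texts are invented for illustration.
RECORDS = {
    "rec-001": "Land register for a village in Nam Dinh province, dated 1805.",
    "rec-002": "Letter from a provincial official about flood repairs in 1893.",
    "rec-003": "Woodblock print of a Buddhist sutra with Nom annotations.",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def retrieve(query, records, k=2):
    """Step 1: rank records by shared query terms; keep at most k with a nonzero score."""
    query_terms = set(tokenize(query))
    scores = Counter()
    for record_id, text in records.items():
        overlap = query_terms & set(tokenize(text))
        if overlap:
            scores[record_id] = len(overlap)
    return [record_id for record_id, _ in scores.most_common(k)]

def generate(record_ids, records):
    """Step 2: stand-in for a language model that answers only from retrieved text."""
    if not record_ids:
        return "No relevant records found."
    cited = "; ".join(f"[{rid}] {records[rid]}" for rid in record_ids)
    return f"Based on the retrieved records: {cited}"

hits = retrieve("flood repairs letter", RECORDS)
answer = generate(hits, RECORDS)
print(hits)
print(answer)
```

Because the generation step sees only the retrieved records and cites their identifiers, a reviewer can trace every statement in the answer back to a source, and a query with no matching records yields an explicit "not found" rather than an invented reply.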

 

Four Pillars for AI-Ready Archives

 

1. Completeness and Excluded Data

It is not necessary to digitize 100% of a collection in order to apply AI. However, it is essential to:

  • Clearly state whether the dataset is complete or partial
  • Explain reasons for gaps (not yet digitized, legal restrictions, physical loss, etc.)
  • Document known biases (e.g., overrepresentation of certain social groups or historical periods)

     

This is especially important for Generative AI, since AI can only reflect what is present in the data.
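One way to make such a statement machine-readable is sketched below in Python. The class name, field names, and figures are all hypothetical; the guidelines call for documenting completeness, gaps, and biases but are not quoted here as prescribing this structure.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetStatement:
    """Hypothetical record of what a dataset covers and what it leaves out."""
    title: str
    total_items: int       # items known to exist in the collection
    digitized_items: int   # items actually present in the dataset
    gap_reasons: dict = field(default_factory=dict)    # reason -> number of items
    known_biases: list = field(default_factory=list)   # free-text notes

    @property
    def is_complete(self):
        return self.digitized_items == self.total_items

    @property
    def coverage(self):
        """Share of the collection present in the dataset, from 0.0 to 1.0."""
        return self.digitized_items / self.total_items if self.total_items else 0.0

    def unexplained_gaps(self):
        """Number of missing items that no documented reason accounts for."""
        missing = self.total_items - self.digitized_items
        return missing - sum(self.gap_reasons.values())

# Invented figures, for illustration only.
statement = DatasetStatement(
    title="Example manuscript collection",
    total_items=1000,
    digitized_items=800,
    gap_reasons={"not yet digitized": 150, "legal restrictions": 30, "physical loss": 20},
    known_biases=["court documents overrepresented relative to village records"],
)
print(statement.is_complete, statement.coverage, statement.unexplained_gaps())
```

A nonzero result from `unexplained_gaps()` signals that part of the gap is still undocumented, which is worth resolving before the dataset is passed to any AI system.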

 

2. Metadata and Access Conditions

AI cannot function effectively if metadata is incomplete or fragmented.

It is necessary to ensure:

  • At least minimal metadata at the item level
  • Clear preservation and representation of provenance and series structure
  • Explicit documentation of access conditions (open, restricted, closed)
  • Identification of the language(s) of both the materials and the metadata

     

The guidelines particularly emphasize the value of narrative metadata, such as curatorial notes, historical context, and critical analysis. These elements help AI systems better understand cultural depth, power dynamics, and layered meanings within archival materials.
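The minimum requirements listed above can be checked automatically before a collection is handed to an AI pipeline. The Python sketch below is illustrative only: the field names are hypothetical and should be mapped to whatever schema an institution already uses, and the three access values simply mirror the "open, restricted, closed" wording above.

```python
# Field names are hypothetical; adapt them to your own metadata schema.
REQUIRED_FIELDS = (
    "identifier", "title", "provenance", "series",
    "access", "material_language", "metadata_language",
)
ACCESS_VALUES = {"open", "restricted", "closed"}

def metadata_problems(item):
    """Return a list of human-readable problems found in one item-level record."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not item.get(name)]
    access = item.get("access")
    if access and access not in ACCESS_VALUES:
        problems.append(f"unknown access condition: {access}")
    return problems

# Invented example records.
complete_item = {
    "identifier": "coll-01/series-02/item-0003",
    "title": "Village land register",
    "provenance": "Donated by a private collector",
    "series": "Land registers",
    "access": "open",
    "material_language": "vi",
    "metadata_language": "en",
}
incomplete_item = {
    "identifier": "coll-01/series-02/item-0004",
    "title": "Untitled letter",
    "access": "public",
}

print(metadata_problems(complete_item))
print(metadata_problems(incomplete_item))
```

A check like this covers only the minimal, structured fields. Narrative metadata such as curatorial notes still needs human authorship and review.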

 

3. Data Formats and File Structure

Preparing data for AI does not mean “cleaning” it in ways that disrupt the original archival structure.

Instead, it is important to:

  • Preserve original files and folder structures
  • Create standardized derivative copies for AI processing
  • Standardize formats (e.g., UTF-8 text or XML for documents; TIFF/JPEG for images)
  • Use clear, structured file naming conventions so that files can be accessed programmatically via APIs

     

This is particularly relevant for systems using IIIF, OCR pipelines, or databases integrated with vector search technologies.
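As a rough illustration of "preserve the originals, derive standardized copies," the Python sketch below writes a UTF-8 copy of a text file into a separate derivatives tree that mirrors the original folder structure. The directory names, the `.utf8.txt` suffix, and the function name are assumptions made for this example, and the source encoding has to be known or detected beforehand.

```python
import tempfile
from pathlib import Path

def make_text_derivative(original, originals_root, derivatives_root, source_encoding):
    """Write a UTF-8 copy under derivatives_root, mirroring the original folder path.

    The original file is only read, never modified or moved.
    """
    relative = original.relative_to(originals_root)
    target = derivatives_root / relative.with_name(relative.stem + ".utf8.txt")
    target.parent.mkdir(parents=True, exist_ok=True)
    text = original.read_bytes().decode(source_encoding)
    target.write_text(text, encoding="utf-8")
    return target

# Demonstration in a temporary directory with an invented Latin-1 file.
with tempfile.TemporaryDirectory() as tmp:
    originals = Path(tmp) / "originals"
    derivatives = Path(tmp) / "derivatives"
    source = originals / "series-01" / "letter-0001.txt"
    source.parent.mkdir(parents=True)
    source.write_bytes("Rapport du Résident supérieur".encode("latin-1"))
    original_bytes = source.read_bytes()

    copy = make_text_derivative(source, originals, derivatives, "latin-1")
    derivative_name = copy.relative_to(derivatives).as_posix()
    derivative_text = copy.read_text(encoding="utf-8")
    original_untouched = source.read_bytes() == original_bytes

print(derivative_name, original_untouched)
```

Keeping derivatives in a parallel tree means the series and folder context survives in every derived path, while the original bytes stay exactly as they were accessioned.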

 

4. Application-Specific Evaluation

Each AI application requires its own set of evaluation metrics, rather than relying on generic criteria. For example:

  • The percentage of AI-generated descriptions accepted with minor edits
  • Time saved per record
  • The rate of false positives when detecting sensitive content
  • User satisfaction with a RAG-based access system

     

Defining evaluation methods from the outset helps ensure that AI delivers practical value rather than functioning merely as a technological experiment.
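Three of the example metrics above can be computed from simple review logs, as in the Python sketch below. The log formats and outcome labels are hypothetical. The false positive rate is defined here as wrongly flagged records divided by all records that are not actually sensitive, which is one common definition and an assumption on our part, since the wording above does not specify one.

```python
def acceptance_rate(reviews):
    """Share of AI-generated descriptions accepted as-is or with minor edits."""
    accepted = sum(1 for outcome in reviews if outcome in ("accepted", "minor_edits"))
    return accepted / len(reviews) if reviews else 0.0

def mean_time_saved(timings):
    """timings: list of (manual_minutes, assisted_minutes) pairs, one per record."""
    if not timings:
        return 0.0
    return sum(manual - assisted for manual, assisted in timings) / len(timings)

def false_positive_rate(flags):
    """flags: list of (flagged_by_model, actually_sensitive) pairs, one per record."""
    non_sensitive = [flagged for flagged, sensitive in flags if not sensitive]
    return sum(non_sensitive) / len(non_sensitive) if non_sensitive else 0.0

# Invented review data, for illustration only.
reviews = ["accepted", "minor_edits", "rejected", "minor_edits"]
timings = [(12.0, 7.0), (10.0, 6.0)]
flags = [(True, True), (True, False), (False, False), (False, False), (False, True)]

print(acceptance_rate(reviews))
print(mean_time_saved(timings))
print(false_positive_rate(flags))
```

Each function needs a human judgment as input (a reviewer's verdict, a timed task, a confirmed sensitivity label), which is why the evaluation design has to be agreed before the AI tool is deployed rather than after.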

 

Checklist Before Implementing AI

Before launching an AI project, you should be able to answer “yes” to most of the following questions:

  • Is there a clearly defined use case?
  • Do you understand the completeness (or partial nature) of the dataset?
  • Is there minimal metadata and documented provenance?
  • Are standardized derivative files available for AI processing?
  • Are there clear evaluation criteria?
  • Is there a human review mechanism in place (AI supports, but does not replace professionals)?

 

The most important step is not deploying another tool, but investing in AI data preparedness.

 

Preparing data for AI is essentially an extension of the core principles of archival practice: thorough documentation, preservation of context, structural transparency, and professional accountability. When this foundation is strong, AI can become a supportive tool rather than a force that compromises the value and integrity of archival collections.

Open by Default in the AI Era: How to Protect Donor Materials
February 12, 2026

In her article “The Cost of Open by Default in the AI Era: Can We Protect Donor Materials from Generative AI?” (30 January 2026), Rosalyn Metz, Chief Technology Officer for Libraries and Museums at Emory University, raises a foundational question for archives and cultural heritage institutions:

 

When generative AI can collect, synthesize, and commercialize data at unprecedented scale, is the “open by default” model still sustainable?

 

Metz’s article is not merely a reflection on technology; it is a warning about a fundamental shift in the digital knowledge ecosystem.

 

Four Pressures Undermining “Open by Default”

 

According to Metz, archival institutions are currently facing four major pressures:

 

1. Harvesting Knowledge Frameworks

AI companies are not only collecting content; they are also extracting the classification systems, annotations, and knowledge structures built over decades. As Metz writes, they are not merely “scraping content” but exploiting, often for profit, “the frameworks we have built to describe, relate, and explain the content.”

 

2. The “Infrastructure Tax”

Data-harvesting bots can overload systems and disrupt access for legitimate users. Metz describes this as a form of attack, where bots consume so many system resources that human users are slowed down or locked out entirely—a “denial-of-service attack against our human users.”

Libraries are effectively paying an “infrastructure tax” to maintain open access in the face of such activity.

 

3. Harvesting Physical Collections

The revival of large-scale digitization initiatives such as Google Books raises another concern: companies do not use data just once. They return whenever they build new models. Meanwhile, libraries typically receive only a one-time payment.

 

4. Erosion of Trust

Perhaps Metz’s most urgent concern is the erosion of trust between donors and archival institutions.

When donors give their complete works or collections, they expect protection. Yet today, as Metz acknowledges, institutions cannot provide absolute guarantees that materials will not be scraped, ingested, and commercialized by AI systems.

 

Contractual Barriers and Technical Challenges

 

Metz analyzes new contractual clauses requiring institutions to prevent:

  1. The use of materials to train generative AI models
  2. The imitation of an author’s voice or style
  3. The creation of substantially similar derivative works
     

While contracts may permit internal AI uses (such as OCR or metadata generation), the larger question remains: How can these terms be enforced when the internet remains an “open buffet” of content? Metz argues that strict compliance leaves only two options:

  1. Remove content from the public web; or
  2. Require users to sign explicit legal agreements.

Both solutions run counter to the open-access ethos heritage institutions have championed for decades.

 

A Middle Path?

 

Metz points to emerging standards such as:

  1. Really Simple Licensing — a framework that embeds machine-readable licensing metadata directly into content;
     
  2. Web Bot Auth — a mechanism requiring bots to authenticate before accessing content.
     

These aim to create machine-enforceable mechanisms to restrict or monetize bot access. However, until major legal cases—such as The New York Times Company v. Microsoft Corporation—reach final rulings, the legal boundaries governing AI training on public content remain unsettled.
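For contrast, the Python sketch below shows the older robots.txt convention, using the standard library's `urllib.robotparser`. It is not an implementation of Really Simple Licensing or Web Bot Auth. The crawler name `ExampleAIBot` and the URL are hypothetical. The point of the example is that robots.txt is machine-readable but purely advisory: the check runs on the crawler's side, so a bot that ignores the file faces no technical barrier, and that is the gap machine-enforceable mechanisms aim to close.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks one AI crawler to stay out
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

item_url = "https://archive.example.org/items/1"  # placeholder URL
ai_bot_allowed = parser.can_fetch("ExampleAIBot", item_url)
reader_allowed = parser.can_fetch("Mozilla/5.0", item_url)

# A well-behaved crawler would consult these answers before requesting the page.
print(ai_bot_allowed, reader_allowed)
```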

 

A Perspective from Digitizing Việt Nam

 

For Digitizing Việt Nam, the questions Metz raises are especially urgent. The project is building digital infrastructure to expand access to materials related to Vietnam. Yet alongside open access comes an ethical responsibility to donors, authors, and communities.

“Open” cannot mean “implicitly available for unlimited extraction.”

Rosalyn Metz’s article reminds us that:

  1. Digital infrastructure is not only a technical issue, but also a legal and ethical one.
  2. Donor trust is the foundation of any digitization initiative.
  3. We need models of openness that are intentional and controlled — “open with intention.”
     

In the AI era, the question is no longer whether we should open access, but: How do we open responsibly while still protecting knowledge and the people who entrusted it to us?

Digitizing Việt Nam at AAS 2026
February 11, 2026

Session 319: Digital Horizons of Vietnam Studies 

Friday, March 13, 2026 | 11:00 AM – 12:30 PM PDT
 Vancouver Convention Centre (VCC), Room 115

 

At the 2026 Association for Asian Studies Annual Conference in Vancouver, Digitizing Việt Nam will be at the center of a major roundtable on digital scholarship in the field, Session 319, “Digital Horizons of Vietnam Studies: Collections, Collaborations, Creative Applications.”

 

Organized and chaired by Cindy Nguyen (UCLA), the roundtable brings together leading scholars working across institutions and continents to reflect on how Vietnam Studies is being reshaped through digital collections, transnational collaboration, and experimental methodologies.

 

Over the past decades, Vietnam Studies has undergone profound transformation: it has become more interdisciplinary, incorporated multilingual scholarship, and deepened its engagement with Vietnam and its global diaspora. As the field marks fifty years since the end of the Vietnam War/Second Indochina War/American War, it is also entering a new phase defined by digital infrastructures and public-facing scholarship.

 

This roundtable will focus on specific case studies and live demonstrations, emphasizing collaboration, methodological innovation, and training the next generation of scholars.

 

Featured Projects

 

  • George Dutton (UCLA) will discuss the digitization of the Đại Nam Nhất Thống Chí gazetteer, addressing key challenges in OCR for historical Vietnamese and Sino-Nôm materials, textual annotation strategies, and ensuring cross-disciplinary usability of large textual corpora.

     
  • Nguyễn Tô Lan (Vietnam Academy of Social Sciences) will present the Vietnam Buddhist Resource Digital Repository, a scholar-led initiative dedicated to collecting, digitizing, and providing free global access to resources on Vietnamese Buddhism, with particular emphasis on Sino-Nôm texts. The project models sustainable, open-access infrastructure rooted in scholarly collaboration.

     
  • Vy Cao (Luxembourg Centre for Contemporary and Digital History) will introduce the DISTAM-funded “RAG’it” project, applying Retrieval-Augmented Generation techniques to trace the development of neologisms across early twentieth-century Vietnamese periodicals such as Nam Phong and Tân Việt. The project demonstrates how AI tools can illuminate intellectual change within print culture.

     
  • Cindy Nguyen (UCLA) will present her “Social Worlds” project, drawing from a colonial multilingual Vietnamese encyclopedia to show how combining close reading with computational methods—such as vector space modeling—allows researchers and students to rethink interpretation as a core scholarly act.

     
  • John Phan (Columbia University) will present Digitizing Việt Nam, a collaboration between Columbia University and Fulbright University Vietnam. The initiative is developing a digital hub for Vietnam Studies that integrates digitized collections, bibliographic tools, and pedagogical materials, expanding global access to Vietnamese historical resources.

     

Rather than traditional formal papers, the session is structured around short presentations and demonstrations, prioritizing engaged discussion, collective brainstorming, and the formation of new collaborations.

 

Additional Highlights in Vietnam Studies at AAS 2026

 

  • Roundtable Tribute: The Life and Work of Gerard Sasges
     Scheduled for Friday morning at 9:00 AM, this roundtable will honor the esteemed historian’s intellectual legacy. Appreciation is extended to Peter Zinoman for organizing this tribute at the eleventh hour.
     
  • VSG-Sponsored Panel (Session 1110):
     Constructing Socialism: State-Building and Governance in the Democratic Republic of Vietnam, 1954–1975
     Taking place Sunday morning, this panel will revisit foundational questions about governance and socialist state formation in the DRV. Thanks are due to the Vietnam Studies Group (VSG) selection committee and organizers for advancing this important conversation.

     

  • Session 621: Knowledge at the Margins
     Scheduled for Saturday morning, this panel—composed primarily of scholars from Southwest Jiaotong University (PRC)—will examine reading communities and the social life of texts in nineteenth-century Vietnam. It may represent one of the first Vietnam-focused AAS panels centered on scholars from a single Chinese institution, marking an important moment in the internationalization of Vietnam Studies. Another panel with participants largely from France further reflects the growing global scope of the field.

     

  • Session 114: Archives and Data in Dialogue (Thursday evening), organized by French early-career scholars, will explore multilingual archives, postcolonial methodology, and digital research practices in Vietnam Studies.

 

As AAS 2026 approaches, Vietnam Studies will demonstrate both intellectual continuity and methodological transformation. For Digitizing Việt Nam, Session 319 will serve not only as a presentation platform but as a space to advance collaborative digital infrastructure—shaping how Vietnamese history, texts, and ideas will be accessed, studied, and taught in the years to come.

 

👉 Find the complete list of sessions and abstracts here.