Open by Default in the AI Era

Lê Nguyễn Tường Vân · February 12, 2026

In her article “The Cost of Open by Default in the AI Era: Can We Protect Donor Materials from Generative AI?” (30 January 2026), Rosalyn Metz, Chief Technology Officer for Libraries and Museums at Emory University, raises a foundational question for archives and cultural heritage institutions:


When generative AI can collect, synthesize, and commercialize data at unprecedented scale, is the “open by default” model still sustainable?


Metz’s article is not merely a reflection on technology; it is a warning about a fundamental shift in the digital knowledge ecosystem.


Four Pressures Undermining “Open by Default”


According to Metz, archival institutions are currently facing four major pressures:

1. Harvesting Knowledge Frameworks

AI companies are not only collecting content; they are also extracting the classification systems, annotations, and knowledge structures built over decades. As Metz writes, they are not merely “scraping content” but exploiting, often for profit, “the frameworks we have built to describe, relate, and explain the content.”


2. The “Infrastructure Tax”

Data-harvesting bots can overload systems and disrupt access for legitimate users. Metz describes this as a form of attack, where bots consume so many system resources that human users are slowed down or locked out entirely—a “denial-of-service attack against our human users.”

Libraries are effectively paying an “infrastructure tax” to maintain open access in the face of such activity.


3. Harvesting Physical Collections

The revival of large-scale digitization initiatives such as Google Books raises another concern: companies do not use the data just once; they return to it each time they build a new model. Meanwhile, libraries typically receive only a one-time payment.


4. Erosion of Trust

Perhaps Metz’s most urgent concern is the erosion of trust between donors and archival institutions.

When donors give their complete works or collections, they expect protection. Yet today, as Metz acknowledges, institutions cannot provide absolute guarantees that materials will not be scraped, ingested, and commercialized by AI systems.


Contractual Barriers and Technical Challenges


Metz analyzes new contractual clauses requiring institutions to prevent:

  1. The use of materials to train generative AI models
  2. The imitation of an author’s voice or style
  3. The creation of substantially similar derivative works

While contracts may permit internal AI uses (such as OCR or metadata generation), the larger question persists: how can these terms be enforced when the internet remains an “open buffet” of content? Metz argues that strict compliance leaves only two options:

  1. Remove content from the public web; or
  2. Require users to sign explicit legal agreements.

Both solutions run counter to the open-access ethos heritage institutions have championed for decades.


A Middle Path?


Metz points to emerging standards such as:

  • Really Simple Licensing — a framework that embeds machine-readable licensing metadata directly into content;
  • Web Bot Auth — a mechanism requiring bots to authenticate before accessing content.

These aim to create machine-enforceable mechanisms to restrict or monetize bot access. However, until major legal cases—such as The New York Times Company v. Microsoft Corporation—reach final rulings, the legal boundaries governing AI training on public content remain unsettled.
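
Neither mechanism is specified in detail in the article, but their general shape is easy to sketch. The example below, written in Python with Flask (my choice, not anything named by Metz), shows one hypothetical way a collections site might combine the two ideas: advertise licensing terms in a machine-readable response header and refuse self-declared bots that do not present a verifiable signature. The header names, the bot registry, and the license URL are illustrative placeholders rather than the actual RSL or Web Bot Auth specifications.

```python
# Hypothetical sketch: header names, the bot registry, and the license URL are
# placeholders, not the real RSL or Web Bot Auth specifications.
from flask import Flask, abort, request

app = Flask(__name__)

# Assumed registry of crawlers that have shared a verification key with the site.
KNOWN_BOT_AGENTS = {"example-crawler.ai"}


def bot_signature_is_valid(req) -> bool:
    """Stand-in for Web Bot Auth-style verification of a signed bot request."""
    agent = req.headers.get("Signature-Agent", "")
    return agent in KNOWN_BOT_AGENTS and "Signature" in req.headers


@app.before_request
def gate_unauthenticated_bots():
    user_agent = request.headers.get("User-Agent", "").lower()
    looks_automated = any(token in user_agent for token in ("bot", "crawler", "spider"))
    if looks_automated and not bot_signature_is_valid(request):
        abort(403)  # refuse unverified bots instead of silently serving them


@app.after_request
def advertise_license_terms(response):
    # Point machine clients at the usage terms; rel="license" is a registered
    # link relation, but the URL here is purely illustrative.
    response.headers["Link"] = '<https://example.org/licenses/collections.xml>; rel="license"'
    return response


@app.route("/collections/<item_id>")
def finding_aid(item_id: str):
    return f"Finding aid for {item_id}"


if __name__ == "__main__":
    app.run()
```

A real deployment would verify an actual cryptographic signature over the request rather than merely checking for a header, but even this skeleton shows where a “machine-enforceable” policy can live: in front of the content, rather than inside it.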


A Perspective from Digitizing Việt Nam


For Digitizing Việt Nam, the questions Metz raises are especially urgent. The project is building digital infrastructure to expand access to materials related to Vietnam. Yet alongside open access comes an ethical responsibility to donors, authors, and communities.

“Open” cannot mean “implicitly available for unlimited extraction.”

Rosalyn Metz’s article reminds us that:

  1. Digital infrastructure is not only a technical issue, but also a legal and ethical one.
  2. Donor trust is the foundation of any digitization initiative.
  3. We need models of openness that are intentional and controlled — “open with intention.”

In the AI era, the question is no longer whether we should open access, but how we open it responsibly while still protecting knowledge and the people who entrusted it to us.