AI-ready Archives: Preparing Archival Data for the AI Era

As Artificial Intelligence (AI) is increasingly discussed in the archival field, the question is no longer “Should we use AI?” but rather “How should we prepare archival data so that AI does not undermine the foundational principles of archival practice?”
In February 2026, Prof. Giovanni Colavizza and Prof. Lise Jaillant published the AI Preparedness Guidelines for Archivists, issued by the Archives & Records Association (UK & Ireland) under a CC BY license. In this document, the authors emphasize a central point: AI is only truly useful when archival collections are carefully prepared in terms of data, metadata, structure, and evaluation mechanisms.
Below are the key elements of the guidelines that may be of particular interest to the archival community in Vietnam.
The guidelines distinguish between two main types of AI models commonly applied in archives:
1. Task-Specific AI
These models are trained to perform clearly defined tasks, such as optical character recognition (OCR), document classification, or the extraction of names, places, and dates.
2. Generative AI
These models generate language and can, for example, summarize documents, answer questions about a collection, or draft descriptive text.
An important approach highlighted in the guidelines is RAG (Retrieval-Augmented Generation). In this model, the AI system first retrieves relevant material from a well-prepared collection, and only then generates content based on the retrieved data. This approach helps reduce “hallucinations” (AI generating information not present in the source material) and improves accuracy.
1. Completeness and Excluded Data
It is not necessary to digitize 100% of a collection in order to apply AI. However, it is essential to document clearly what has been included, what has been excluded, and why.
This is especially important for Generative AI, since AI can only reflect what is present in the data.
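One lightweight way to document excluded material, in the spirit of this point, is a machine-readable exclusion log. The fields and identifiers below are illustrative assumptions, not a format prescribed by the guidelines.

```python
import csv
import io

# Hypothetical exclusion log: records what was left out of the
# AI-ready dataset and why, so downstream users know the gaps.
exclusions = [
    {"item_id": "MS-0042", "reason": "copyright restriction", "excluded_on": "2026-03-01"},
    {"item_id": "MS-0107", "reason": "fragile, not yet digitized", "excluded_on": "2026-03-01"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["item_id", "reason", "excluded_on"])
writer.writeheader()
writer.writerows(exclusions)
print(buf.getvalue())
```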
2. Metadata and Access Conditions
AI cannot function effectively if metadata is incomplete or fragmented.
It is necessary to ensure that metadata is consistent, machine-readable, and complete enough to place each item in its collection context.
The guidelines particularly emphasize the value of narrative metadata, such as curatorial notes, historical context, and critical analysis. These elements help AI systems better understand cultural depth, power dynamics, and layered meanings within archival materials.
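To make the idea of narrative metadata concrete, the record below pairs standard descriptive fields with free-text curatorial and historical notes. The field names and example values are my own illustration, not drawn from any particular metadata standard.

```python
from dataclasses import asdict, dataclass

@dataclass
class ArchivalRecord:
    identifier: str
    title: str
    date_range: str
    # Narrative metadata: free-text context that helps AI systems
    # interpret the material beyond the bare descriptive fields.
    curatorial_note: str = ""
    historical_context: str = ""

record = ArchivalRecord(
    identifier="COLL-12/3",
    title="Letters from the resettlement office",
    date_range="1955-1957",
    curatorial_note="Collected during a 1998 survey; ordering reflects original bundles.",
    historical_context="Written amid post-war administrative reorganization.",
)
print(asdict(record)["curatorial_note"])
```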
3. Data Formats and File Structure
Preparing data for AI does not mean “cleaning” it in ways that disrupt the original archival structure.
Instead, it is important to preserve the original archival structure while supplying stable, machine-readable formats and a clear, consistent file organization.
This is particularly relevant for systems using IIIF, OCR pipelines, or databases integrated with vector search technologies.
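A minimal sketch of the "do not disturb the original structure" principle: AI-ready derivatives (here, OCR text) are written as sidecar files that mirror the original directory tree instead of reorganizing it. The paths and naming convention are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

def sidecar_path(original: Path, derivatives_root: Path, originals_root: Path) -> Path:
    """Map an original file to an OCR text sidecar mirroring its relative path."""
    rel = original.relative_to(originals_root)
    return (derivatives_root / rel).with_suffix(".txt")

with tempfile.TemporaryDirectory() as tmp:
    originals = Path(tmp) / "originals"
    derivatives = Path(tmp) / "ocr"
    scan = originals / "box01" / "folder03" / "page_001.tiff"
    scan.parent.mkdir(parents=True)
    scan.touch()  # stand-in for a digitized page image

    target = sidecar_path(scan, derivatives, originals)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("(OCR output would go here)")
    print(target.relative_to(Path(tmp)).as_posix())  # ocr/box01/folder03/page_001.txt
```

The originals are never moved or renamed; everything the AI pipeline needs lives in a parallel tree.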
4. Application-Specific Evaluation
Each AI application requires its own set of evaluation metrics, rather than relying on generic criteria. For example, an OCR pipeline can be judged by its character error rate, while a search or question-answering system should be judged by the relevance and accuracy of what it returns.
Defining evaluation methods from the outset helps ensure that AI delivers practical value rather than functioning merely as a technological experiment.
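As one concrete example of an application-specific metric (my own illustration, not taken from the guidelines), precision@k measures what fraction of the top-k retrieved documents are actually relevant:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are in the relevant set."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)

retrieved = ["doc2", "doc7", "doc1"]  # system output, best first
relevant = {"doc2", "doc1"}           # judged relevant by an archivist
print(precision_at_k(retrieved, relevant, 2))  # 0.5
```

Agreeing on such a metric, and on who judges relevance, before the project starts is what turns "the AI seems helpful" into a measurable claim.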
Before launching an AI project, you should be able to answer “yes” to most of the following questions: Is it documented what has been excluded from the collection and why? Is the metadata complete and consistent? Is the original structure preserved in stable, machine-readable formats? Have application-specific evaluation criteria been defined?
The most important step is not deploying another tool, but investing in AI data preparedness.
Preparing data for AI is essentially an extension of the core principles of archival practice: thorough documentation, preservation of context, structural transparency, and professional accountability. When this foundation is strong, AI can become a supportive tool rather than a force that compromises the value and integrity of archival collections.