Application of AI/ML in Large-scale Digital Archive Collections

Opening date: 1st of Feb 2025
Closing date: 31st of May 2025

Introduction 

The digitization of historical records (also called “digital archives”) has opened a variety of large-scale digital collections to the world. The scale and complexity of these digital archives pose enormous challenges for both researchers and memory institutions. One challenge widely acknowledged by the archival community is how to discover, use, and analyze digital archives for public service. Archivists or historians alone cannot find a magical solution that will instantly make digital records more accessible and useful (Carter et al., 2022; Hawkins, 2022; Jaillant et al., 2022; Jaillant and Caputo, 2022). Recently, computational archival science (CAS) has been proposed and identified as a novel and effective approach to these challenges. CAS brings computational methods and tools such as artificial intelligence/machine learning (AI/ML) into the archival field to address the processing, analysis, storage, and access of large-scale digital records and archives (Hedges et al., 2022).

To promote the application of AI/ML in digital archives, funding agencies in the United States, such as the Institute of Museum and Library Services (IMLS), the National Historical Publications & Records Commission (NHPRC), and the National Endowment for the Humanities (NEH), have been awarding a growing number of grants in this area. Scholars from different communities, such as information science and computer science, have explored applications of natural language processing (NLP), semantic analysis (SA), and computer vision (CV) to different archival collections, such as oral histories, cultural heritage, and historical newspapers (Ali et al., 2024; Chen et al., 2024; Wang et al., 2021; Wang et al., 2024). Even so, archival professionals and researchers still face several obstacles in putting computational methods, especially advanced techniques, into practice:

  1. A lack of high-quality, large-scale, open-source corpora for developing effective machine learning/deep learning (ML/DL) models.
  2. A lack of AI/ML tools developed for processing, annotating, analyzing, and visualizing large-scale, multivariate, heterogeneous archival data.
  3. A lack of case studies and real-world applications that can help professionals and researchers understand how to use AI/ML tools to handle large-scale archival collections in different scenarios.

With the development of generative AI such as ChatGPT (Haleem et al., 2022; Spennemann, 2023; Zhang et al., 2023), it has become even more beneficial and urgent to use AI/ML to enhance the understanding of large-scale digital archive collections.

Therefore, we propose this Special Issue to gather researchers, professionals, and end users on a collaborative platform for exchanging ideas, sharing pilot studies, and scoping future directions in this cutting-edge area.
This Special Issue aims to bring together researchers and practitioners in archival science, library science, computer science, data science, and related disciplines to understand the challenges, investigate the problems, propose methods, exchange ideas, share resources, and identify new research directions in applying AI/ML to large-scale digital archive collections.

This Special Issue focuses on three areas. The first is developing new tasks, datasets, theories, techniques, and applications for using AI/ML in large-scale digital archive collections. The second is conducting user studies of AI-driven archival systems. The third is investigating the role of AI/ML techniques in enhancing public engagement with large-scale digital archive collections. The Special Issue welcomes research and practice on:

  • new tasks, datasets, and frameworks for using AI/ML in large-scale digital archive collections in different contexts;
  • the application of machine learning and deep learning techniques to information extraction, organization, retrieval, recommendation, and visualization based on large-scale digital archive collections; and
  • user studies that examine the impact of AI/ML use in large-scale digital archive collections.

List of Topic Areas

  • User information needs regarding AI/ML for archival collections.
  • High-quality, large-scale datasets for building AI/ML applications on archival collections.
  • Novel frameworks and methods for using AI/ML on large-scale archival collections.
  • Information extraction from large-scale archival collections using ML/DL, NLP, CV, and other techniques.
  • Automated cataloging, indexing, and query expansion for semantic retrieval on large-scale archival collections.
  • Topic modeling, thematic analysis, and sentiment analysis based on large-scale archival collections.
  • Knowledge graph construction and application for large-scale archival collections.
  • AI/ML tools for automated translation, transcription, and metadata generation.
  • AI/ML tools for semantic search and tagging, interactive data visualization, personalized recommendations, and other tasks.
  • Applications of large language models, vision models, multimodal models, and generative AI for large-scale archival collections.
  • AI/ML for large-scale, multimodal archival collections.
  • Ethical considerations in AI/ML for archives.
  • User-centered design in AI-driven archival systems.
  • Challenges and future directions in AI/ML for archives.

Guest Editors

Haihua Chen, University of North Texas, USA

Xiaoguang Wang, Wuhan University, China

Le Yang, University of Oregon, USA

Wayne de Fremery, Dominican University of California, USA

Submissions Information

Submissions are made using ScholarOne Manuscripts.

Author guidelines must be strictly followed.

Authors should select (from the drop-down menu) the special issue title at the appropriate step in the submission process, i.e. in response to “Please select the issue you are submitting to”.

Submitted articles must not have been previously published, nor should they be under consideration for publication anywhere else, while under review for this journal.

Key Deadlines

Opening date: 1st of Feb 2025
Closing date: 31st of May 2025  

References:

Ali, D., Milleville, K., Verstockt, S., Van de Weghe, N., Chambers, S. and Birkholz, J.M. (2024), “Computer vision and machine learning approaches for metadata enrichment to improve searchability of historical newspaper collections”, Journal of Documentation, Vol. 80 No. 5, pp. 1031-1056. https://doi.org/10.1108/JD-01-2022-0029
Carter, K.S., Gondek, A., Underwood, W., Randby, T. and Marciano, R. (2022), “Using AI and ML to optimize information discovery in under-utilized, Holocaust-related records”, AI & Society, Vol. 37 No. 3, pp. 837-858.
Chen, H., Kim, J.A., Chen, J. and Sakata, A. (2024), “Demystifying oral history with natural language processing and data analytics: a case study of the Densho digital collection”, The Electronic Library, Vol. 42 No. 4, pp. 643-663.
Haleem, A., Javaid, M. and Singh, R.P. (2022), “An era of ChatGPT as a significant futuristic support tool: A study on features, abilities, and challenges”, BenchCouncil Transactions on Benchmarks, Standards and Evaluations, Vol. 2 No. 4, p. 100089.
Hawkins, A. (2022), “Archives, linked data and the digital humanities: increasing access to digitised and born-digital archives via the semantic web”, Archival Science, Vol. 22 No. 3, pp. 319-344.
Hedges, M., Marciano, R. and Goudarouli, E. (2022), “Introduction to the special issue on computational archival science”, ACM Journal on Computing and Cultural Heritage (JOCCH), Vol. 15 No. 1, pp. 1-2.
Jaillant, L., Aske, K., Goudarouli, E. and Kitcher, N. (2022), “Introduction: Challenges and prospects of born-digital and digitized archives in the digital humanities”, Archival Science, Vol. 22 No. 3, pp. 285-291.
Jaillant, L. and Caputo, A. (2022), “Unlocking digital archives: Cross-disciplinary perspectives on AI and born-digital data”, AI & Society, Vol. 37 No. 3, pp. 823-835.
Spennemann, D.H. (2023), “ChatGPT and the generation of digitally born ‘knowledge’: How does a generative AI language model interpret cultural heritage values?”, Knowledge, Vol. 3 No. 3, pp. 480-512.
Wang, X., Song, N., Liu, X. and Xu, L. (2021), “Data modeling and evaluation of deep semantic annotation for cultural heritage images”, Journal of Documentation, Vol. 77 No. 4, pp. 906-925.
Wang, X., Zhao, K., Zhang, Q. and Liu, C. (2024), “Digital deduction theatre: An experimental methodological framework for the digital intelligence revitalisation of cultural heritage”, Intelligent Computing for Cultural Heritage, Routledge, pp. 203-220.
Zhang, S., Hou, J., Peng, S., Li, Z., Hu, Q. and Wang, P. (2023), “ArcGPT: A Large Language Model Tailored for Real-world Archival Applications”, arXiv preprint arXiv:2307.14852.