Researchers explore machine learning to automate early modern text transcription ethically

Over the past two decades, mass digitization has dramatically changed the landscape of scholarly research. The ability to search digital transcriptions of sources for specific keywords saves valuable time, and researchers are no longer limited to archives and libraries if they wish to comb through a text.

However, the proliferation of digital transcriptions has raised new concerns about the labor required to make such accessibility possible. A new article in The Sixteenth Century Journal suggests methods by which researchers can obtain transcriptions of digitized early modern sources while avoiding unethical labor practices.

“Unlocking the Digitized Archive of Early Modern Print: Automated Transcription of Early Modern Printed Books,” by authors Serena Strecker and Kimberly Lifton, begins with a brief history of the two types of software used to produce transcriptions. Optical character recognition (OCR) software has proven well suited to transcribing works from the late 19th and 20th centuries, but the irregularities common in early modern print make OCR inadequate for reliable transcription of these sources.

Instead, early modern scholars have turned to handwritten text recognition (HTR) technology. Transkribus, the leading HTR software, allows users to use publicly available transcription models or train their own. In their comparison of various HTR models tested on a selection of pages from four sample collections from the 16th century, Strecker and Lifton highlight the ability of Transkribus to facilitate, in five steps, the creation of transcription models tailored to the specifications of a scholar's desired source.

Using Transkribus public models, researchers can generate the training data needed to train their own highly specific models. This process, according to the authors, makes it “no longer necessary or desirable” to rely on outsourced labor, such as the work of graduate students or workers around the world.

“With accurate, automated transcription of early modern print no longer a goal but a reality, the field of early modern studies must consider what combination of human labor and machine learning technology will be accepted, supported, and ultimately shape the future of research,” the authors conclude.

“It is only by emphasizing ethical work practices that researchers can avoid exacerbating inequalities within the academic hierarchy or perpetuating the lasting inequalities of colonialism.”

More information:
Serena Strecker et al, Unlocking the Digitized Archive of Early Modern Print: Automated Transcription of Early Modern Printed Books, The Sixteenth Century Journal (2025). DOI: 10.1086/735052

Provided by the University of Chicago

Citation: Researchers explore machine learning to automate early modern text transcription ethically (2025, July 18) retrieved 18 July 2025 from https://phys.org/news/2025-07-explore-machine-automate-early-modern.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.