Transcribing Multilingual Documents in the Digital Age

Last month, I visited the University of Houston as part of the Recovering the US Hispanic Literary Heritage’s new US Latina/o Digital Humanities (#usLdh) Incubator. The scholars in this community have been working at the forefront of scholarly editing for decades by editing, publishing, and digitizing documents that are as multilingual and transnational as the United States.

In doing this work, these scholars have been forced to confront a challenge that is often sidelined in the discussion of digitization. Digitization has brought renewed attention to the difficulties of transcribing and encoding historical texts. But this work has been largely driven by Anglophone, monolingual projects.

As a result, Anglophone, monolingual textuality is hardwired into the systems and processes that we use to transcribe the historical record.

Elvia Arroyo-Ramírez has spoken brilliantly about this in the context of archival processing. We face similar problems when transcribing and encoding texts.

What does it mean for us as digital practitioners to use tools that were designed for a different cultural context? At the #usLdh Incubator, I started to answer this question by tracing a brief history of multilingual transcription. I also spoke about digital transcription options, and some of the challenges of using these tools on multilingual texts.

I’ll summarize some of the key points here, and my slides are online.

Copying Past

Transcription is always transformative, imposing new contexts onto old documents, but transformation is not necessarily deformation or defamation. Transcription can also revive or reveal a text.

The Codex Mendoza

Take the Codex Mendoza, a document believed to have been inscribed by Francisco Gualpuyogualcal and Juan González in New Spain in the 1540s. The first part of the Codex Mendoza is likely a transcription of a Pre-Columbian text, now lost. That text was then transcribed using alphabetic annotations written in Nahuatl, and translated into Spanish. [1]

Page from the Codex Mendoza shows a young man getting his education (top) and a wedding (bottom), with alphabetic Nahuatl transcription. The edge of the Spanish transcription on the opposite page is also visible.

Page from the Codex Mendoza. Image from the Bodleian Library.

Both the pictographic and alphabetic inscriptions are transcriptive. The former moves the text from one page to another; the latter, from one form to another. Both have the potential to remain faithful to an earlier copy.

But only the pictographic document can be a facsimile transcription, representing not just the text's meaning but also its form on the page. And only the alphabetic Nahuatl would have looked like language to the royal Spanish recipients of the copied text.

The original document is lost, but we know that both Gualpuyogualcal and González changed the text as they transcribed it. The pictographic text was produced using European paper and ink, and uses a style of shading introduced by the Spanish. And the Nahuatl transcriptions are sparse and sometimes inaccurate representations of the pictographic text.

Is the alphabetic text sparse because González couldn’t fluently read the pictographic copy? Were the pen and ink chosen to impress the Spanish royalty? In transcription, we see both strategic transformation and misrepresentation.

El Título de Santa María Ixhuatán

Take another example, an alphabetic Nahuatl manuscript known as El título de Santa María Ixhuatán. The historian Margarita Cossich Vielman and linguist Sergio Romero believe that this is partially an alphabetic transcription of a lienzo, a much older document written in pictographic script. [2]

Photograph of a page from the título de Santa María Ixhuatán.

El Título de Santa María Ixhuatán. Image from the Nahuatl/Nawat Project.

But Cossich and Romero think it was written by someone who was not fully literate in pictographic Nahuatl. Where the earlier text used logograms (signs representing a word or phrase) and syllabograms (signs representing syllables), the scribe read them as literal representations.

So in the Título, he writes of a place called Teohuanhuaco, “drawing of a tree, drawing of a priest.” But Cossich and Romero argue that it was more likely to be the more figurative Teowakwawnawako, “place beside the divine tree,” or Teokwawko, “place of the divine tree.”

Here, we see a scribe using transcription to revive an otherwise lost text. But we also see how his imperfect copying had an impact on our ability to associate this text with geographic locations or to compare it to other histories.

Copying Present

Literacy and legibility informed the ways that colonial texts were copied in the past. These factors continue to impact our ability to consume the texts of the past, and they can have real implications when we begin to treat transcribed texts as historical fact.

While not all transcriptions are political, in the cases that were described here, shifts in literacy and legibility are directly tied to colonial systems designed to shift power away from indigenous communities.

When we transcribe digitized texts, we enter a similarly interpretive field, and it is worth being attentive to the ways that our transcription processes engage with structures of power.

Many digital project managers use automatic transcription tools to make their scanned documents more searchable. But as I have written elsewhere, automatic transcription tools are not always ‘literate’ in historical orthographies or languages other than English, which can impact our ability to treat these texts as reliable objects of historical study. The effect is especially pronounced when we use transcriptions for corpus analytics.

The same is true in the case of crowd-sourced transcription. The manual transcription of documents in languages other than English can be more costly, because it requires more specialized knowledge. There is also an extra emotional cost to working with colonial documents that inscribe everyday processes of power and violence.

And what about the tools that we use to inscribe these texts? Most programming languages, command-line operations, and web interfaces for transcription are written in English, and many are not configured to handle diacritics or special characters. Again, this means there is an additional barrier to entry to editing multilingual documents that is an unnatural result of systematic bias. As Brook Lillehaugen has explained in the case of the Ticha project, it is unreasonable to ask Zapotec-speaking collaborators to learn English so they can work in TEI.

Knowledge of English should not be a prerequisite for the digital editing of historical texts.
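
To make the diacritics problem concrete, here is a minimal Python sketch. It is an illustration only, not drawn from any of the projects mentioned here: the function name `normalize` and the sample strings are assumptions chosen for the example. It shows how the same word, typed two different ways, can fail to match in a search, and how an ASCII-only default silently drops an accented letter.

```python
import unicodedata


def normalize(text: str) -> str:
    """Return text in Unicode NFC form so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)


# The same word encoded two ways: a single precomposed "á" (U+00E1)
# versus a plain "a" followed by a combining acute accent (U+0301).
precomposed = "Ixhuat\u00e1n"
decomposed = "Ixhuata\u0301n"

# A naive comparison treats the two spellings as different words.
naive_match = precomposed == decomposed

# Normalizing both sides first makes them comparable.
normalized_match = normalize(precomposed) == normalize(decomposed)

# An ASCII-only filter (hypothetical here, but a common default in older
# tools) discards the accented letter instead of preserving it.
ascii_only = precomposed.encode("ascii", errors="ignore").decode("ascii")

print(naive_match, normalized_match, ascii_only)
```

A transcription pipeline that normalizes text on the way in, and never falls back to ASCII, avoids both failures. Which normalization form is appropriate depends on the language and the project, so this should be read as a starting point rather than a recommendation.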


Copying Future

What can we do to address these transcription challenges? One thing is to ensure that scholars working with non-English texts (including those working outside the Latin alphabet) are at the table during the development of both automatic and crowdsourcing transcription tools.

This has been prioritized by teams like From the Page, a service for designing crowdsourced transcription projects. It is also the focus of groups like the Historical and Multilingual OCR project at Northeastern University, which is trying to set the agenda for automatic transcription of multilingual texts.

Another priority must be to think seriously about labor and citation practices. In the case of colonial transcription, historians have been working to recover the names of the indigenous scribes who wrote the documents, recovering their intellectual contribution to the historical record.

We have the same responsibility in the case of digitization. Digitization is intellectual labor which is often done by students and staff in contingent positions. Highlighting this contribution helps ensure that workers know their labor is valued, and better reflects the reality of collaborative digitization work.

It also helps our institutions understand this work, so we can make a case for better financial compensation.

It is only by consistently and publicly acknowledging the value of this work that we will be able to ensure that it is taken seriously by funding institutions and hiring committees, and in cases of tenure and promotion.

And that is a necessary step in ensuring that the digitization of the historical record remains, at its foundation, intellectually sound and culturally sensitive.


Hannah Alpert-Abrams is a CLIR Postdoctoral Fellow in Data Curation and Latin American Studies at the University of Texas at Austin. Find her online at

[1] See Frances Berdan and Patricia Rieff Anawalt, The Codex Mendoza. University of California Press (1992). There is plenty to say about the Spanish translations of this text, too, but I don’t address it here.

[2] Margarita Cossich Vielman and Sergio Romero. “Lienzos prehispánicos y el título de Santa María Ixhuatán, Guatemala.” Asociación para el Fomento de los Estudios Históricos en Centroamérica, 2016. Online.












Incubator: Decolonizing the Digital Humanities

This past week, I had the opportunity to give a talk as part of Recovering the US Hispanic Literary Heritage’s new US Latina/o Digital Humanities (#usLdh) Incubator series. If you missed it, you can access our group notes on Google Drive or Storify. I’ve also started a Zotero Group with a growing bibliography related to US Latina/o Digital Humanities (which includes sources on DH, decolonial theory, postcolonial theory, and more). Feel free to join and contribute to the growing bibliography!

My talk and this blog post are not meant as an in-depth analysis of decolonizing DH. Instead, my goal is to provide a brief overview of the relationship between coloniality and the archive, as well as a discussion of decoloniality not just as a theory but also as a methodology. This is meant to serve as a springboard for further discussion on decolonial DH methodology.

Colonialism, History, and Archives

In order to begin a discussion on decolonizing the digital humanities, I think it’s important to first acknowledge the role of colonialism in creating or shaping the historical record. Archives help structure knowledge and history. In terms of the nation-state, national archives help to create an authoritative national narrative. The International Council on Archives, for example, describes archives on their webpage as follows: “Archives constitute the memory of nations and societies, shape their identity, and are a cornerstone of the information society” (International Council on Archives n.p.). Yet, the shadow of colonialism more often than not penetrates this archivization process, determining whose stories belong in the archive and how to frame the national historical narrative. As Frantz Fanon wrote in The Wretched of the Earth (1963):

[C]olonialism is not simply content to impose its rule upon the present and the future of a dominated country. Colonialism is not satisfied merely with holding a people in its grip and emptying the native’s brain of all form and content. By a kind of perverse logic, it turns to the past of the oppressed people, and distorts it, disfigures it and destroys it. (Fanon 210)

Examples of distortion of history include the erasure of indigenous histories and languages, the recasting of winners/losers, the erasure of people of Mexican descent from the Texas side of the Texas Revolution, and, as recent political controversies have highlighted, the romanticization of the US Confederacy and the US Antebellum South. Colonization, complete with its indigenous genocide and African slavery, robbed people of their lives, their freedom, their religion, their names, their culture, their lands, their language, and more. Marginalized archives, which contain the forgotten history of oppressed peoples (what Rodrigo Lazo calls “migrant archives”), “reside in obscurity and are always at the edge of annihilation. They are the texts of the past that have not been written into the official spaces of archivization” (Lazo 37). While not all DH projects are digital archives, DH projects do create a type of archive as they structure knowledge/history, tell historical stories, preserve parts of the archive, prioritize certain people/languages/epistemologies, and more.

Postcolonialism vs. Decolonialism

It is, of course, difficult to boil down two large theoretical fields, but for the purposes of this discussion, I wanted to provide simplified versions. Postcolonial theory critiques the formal colonial matrix of power. You can think of it as a “macro” critique. It looks at the big picture of colonialism. Decoloniality, on the other hand, tends to focus on the details, an awareness of how our quotidian experience is coded through coloniality. It attempts to delink our history (and our present) from colonial legacy by parsing out how coloniality is at work in our lives. The decolonize your diet movement, for example, challenges people to become aware of the origins of dishes and to move away from overly-processed foods. An example of a decolonial DH project is The African Origins project, which reinserts the human into the history of colonial transatlantic slave voyages. This project is an effort to identify the names and origins of Africans who were forcibly transported across the Atlantic on slave ships.

Decolonial theory is influenced by Latin American Marxist Dependency theory, Négritude African diaspora intellectuals (such as Aimé Césaire), and women of color (WOC) feminists. Decolonial theory is rooted in postcolonial theory, but challenges postcolonial studies for using European points of reference.

(Digital) Methodology of the Oppressed

Decoloniality lends itself to pedagogy and methodology in the way that it seeks to question history and hegemonic structures. M. Jacqui Alexander and Chandra Talpade Mohanty (1997) comment: “Decolonization has a fundamentally pedagogical dimension—an imperative to understand, to reflect on, and to transform relations of objectification and dehumanization, and to pass this knowledge along to future generations” (xxviii-xxix). One of the ways that decolonial theory approaches such a dimension is through what Emma Pérez (1999, 2003) calls the “decolonial imaginary.” It is through the decolonial imaginary that we can push back against colonial legacies that structure our lives. The decolonial imaginary, writes Pérez (2003),

… can help us rethink history in a way that makes agency for those on the margins transformative….The colonial mindset believes in a normative language, race, culture, gender, class, and sexuality….I propose a decolonial imaginary as a rupturing space, the alternative to that which is written in history….How do we contest the past to revise it in a manner that tells more of our stories? In other words, how do we decolonize our history? To decolonize our history and our historical imaginations, we must uncover the voices from the past that honor multiple experiences, instead of falling prey to that which is easy—allowing the white colonial heteronormative gaze to reconstruct and interpret our past. (123, emphasis mine)

What, then, do coloniality, postcoloniality, and decoloniality have to do with DH?

Koh, Adeline. “Why the World Needs #DHPoco, Part 2.” #DHPoco: Postcolonial Digital Humanities Tumblr. no. 32. 5 Dec. 2013.

Coloniality insists on the preference of Western ontologies and epistemologies and attempts to erase all non-Western forms of existing and knowing. It delegitimizes non-standard and non-Western languages and tries to put people and histories into strict categories related to language, nationality, gender, religion, etc. To approach DH from a decolonial methodology is to question whether your project reinforces coloniality/colonial thinking and to challenge yourself to delink your project from colonial structures.

Here are a few questions to ask yourself:

  • Who?
    • Who is being represented? Who is speaking? Whose history is it? Who is making the choices? Who is working on the archive? Who can access it? Who owns the items? Who houses them? Who is given credit for the work?
  • What?
    • What sort of items are being included? What types of knowledge are considered archivable? In what format/language are they? What is the medium/tool used to present/preserve/disseminate? What are possible ethical concerns?
  • How?
    • How are these choices being made? How are these items being categorized/tagged/labeled? How are they displayed? (Does it make sense to display them this way?)

Another great resource for thinking through decolonial and postcolonial DH is the Social Justice and the Digital Humanities site, which emerged from a 2015 Humanities Intensive Learning and Teaching (HILT) course taught by Roopika Risam and micha cárdenas.

You can enter into conversations about DH, decolonial theory, archives, and social justice on social media using related hashtags such as: #usLdh (US Latina/o Digital Humanities), #transformDH, and #DHpoco (postcolonial digital humanities).

Works cited

Alexander, M. Jacqui, and Chandra Talpade Mohanty. Feminist Genealogies, Colonial Legacies, Democratic Futures. Routledge, 1997.
Fanon, Frantz. The Wretched of the Earth. Translated by Richard Philcox, Grove Press, 2004.
Lazo, Rodrigo. “Migrant Archives.” States of Emergency: The Object of American Studies, edited by Russ Castronovo and Susan Kay Gillman, 2009, pp. 38–72.
Pérez, Emma. “Queering the Borderlands: The Challenges of Excavating the Invisible and Unheard.” Frontiers: A Journal of Women Studies, vol. 24, no. 2/3, 2003, pp. 122–31.
—. The Decolonial Imaginary: Writing Chicanas into History. Indiana University Press, 1999.
Social Justice and the Digital Humanities. Accessed 17 Nov. 2017.

Further Reading

Césaire, Aimé. Discourse on Colonialism. Monthly Review Press, 1972.
Gaertner, David. “Why We Need to Talk About Indigenous Literature in the Digital Humanities.” Novel Alliances, 26 Jan. 2017.
Gil, Alex. The (Digital) Library of Babel. Digital Humanities Summer Institute, Victoria, B.C.
Joseph, Etienne, et al. “Decolonising the Archive (DTA).” Decolonising the Archive (DTA). Accessed 8 Nov. 2017.
Kreitz, Kelley. “Toward a Latinx Digital Humanities Pedagogy: Remixing, Reassembling, and Reimagining the Archive.” Educational Media International, vol. 0, no. 0, Oct. 2017, pp. 1–13. Taylor and Francis, doi:10.1080/09523987.2017.1391524.
Postcolonial Digital Humanities | Global Explorations of Race, Class, Gender, Sexuality and Disability within Cultures of Technology. Accessed 4 May 2013.
Risam, Roopika. “Beyond the Margins: Intersectionality and the Digital Humanities.” Digital Humanities Quarterly, vol. 9, no. 2, 2015.
—. “Revising History and Re-Authoring the Left in the Postcolonial Digital Archive.” Left History, vol. 18, no. 2, 2015, pp. 35–46.
Sandoval, Chela. Methodology of the Oppressed. University of Minnesota Press, 2000.

Storify PDF: Incubator: Decolonizing Digital Humanities

Lorena Gauthereau is a CLIR Postdoctoral Fellow at Recovering the US Hispanic Literary Heritage at the University of Houston. Find her online at