Do you know what a CORPUS is? Have you ever used a corpus before in your translation career? Have you ever felt stuck with a simple phrase in your source language, unable to find an equivalent that reads as smoothly as the original? If you answered no to the first two questions but yes to the third, read on!
When your dictionaries, glossaries, Google searches, and other terminological tools can’t help you produce a seamless translation, and your brain is fogged up, you may end up with an accurate but clunky phrase that does a disservice to your client. Whether you’re localizing a product or translating anything from a report to a slogan, these clunky phrases can, in the best-case scenario, make for an awkward reading experience for your target audience; in the worst case, they can alienate the end user or reader and turn them away. The results for your client can be catastrophic. As many localization, translation, and engineering experts know, the devil is in the details.
A corpus can help in these cases.
Let’s dive in.
A corpus is a large collection of text, written, spoken or both, stored in a database. It can contain many millions or even billions of words, coming from books, newspapers, magazines, journals or works of literature that have been scanned or downloaded electronically. Some corpuses may also contain spoken language coming from transcripts of ordinary conversations, like phone calls, business meetings, conferences, parliamentary meetings, or even radio broadcasts and TV shows.
Corpuses show how language is used in society, in real life. When you translate, you’re creating a real-life text that will live in a new language, in a new society. You need to reflect how that society speaks, to speak in their language and reach out to them, for your client. If you don’t, your business may suffer.
With a corpus, we no longer have to rely so heavily on intuition to know whether a particular adjective goes well (or, to use the technical term, collocates) with a certain noun, or whether a word is usually used in a particular context. Instead, we can see what hundreds of different speakers and writers have actually said or written before.
Let’s look at a sentence that I struggled with a few months ago when translating a social justice article from English into Spanish for the Hispanic community in the US:
“The ability of qualified election officials to conduct legitimate audits of their own has gained urgency with partisan actors fueled by the Big Lie conducting partisan reviews that spread false information about elections and undermine confidence in our democracy.”
This sentence is structurally quite complex in English, and you’ll need a fairly good knowledge of syntax to break it into its parts and piece them back together in your target language. I’m not going into any of that now, but you can see this other blog post about challenging syntax. I just want to concentrate on the phrase “has gained urgency.” This is a good collocation in English: the verb “gain” collocates with the noun “urgency.” If we read “has taken urgency” instead, it would probably make us pause and hesitate. If this were a slogan, it would be a big flaw. The same happens in your target language. Not just any verb will collocate with the translation of the noun “urgency.” In my target language, Spanish, things get a bit more complicated because we’re so used to reading false friends, calques, and literal translations that it’s sometimes hard to separate the wheat from the chaff. There are a million examples of literal translations; a heavily broadcast recent one is Will Smith’s slapping remark calqued into Spanish (some of the examples are here, here, and here).
So in this example, if we go the literal route, we could say “ganar urgencia,” but does it sound natural? No. So let’s do what translators do a lot of: find synonyms.
Ganar is synonymous with:
lograr
adquirir
adueñarse
triunfar
vencer
aventajar
exceder
sobrepujar
superar
dominar
conquistar
tomar
cobrar
alcanzar
llegar
captar
granjear
atraer
prosperar
mejorar
At this point, we’re relying on our intuitive knowledge of our target language to decide which one goes well, or collocates, with urgencia. But what if we could confirm our hunches with a corpus? Of all these synonyms, I’ve narrowed down my options to “ganar urgencia,” “adquirir urgencia,” “tomar urgencia,” and “cobrar urgencia.” But which one should we choose? Corpuses can help us in a way no other tool can.
This is one example of the many corpuses out there. See the reference list at the end for more examples.
So let’s try our narrowed-down options: “ganar urgencia,” “adquirir urgencia,” “tomar urgencia,” and “cobrar urgencia.” Watch the search live in the video below. I invite you to spend six minutes learning a free new tool that will help you translate and localize better.
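For readers who like to script, the comparison among these four candidates can also be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the sample sentences are invented placeholders (not data from any real corpus), and the names SAMPLE_CORPUS, CANDIDATE_STEMS, and count_collocations are hypothetical. The stem patterns are deliberately crude; a real corpus interface does much more, such as matching every form of a verb properly and filtering by country or genre.

```python
import re
from collections import Counter

# ASSUMPTION: invented sentences standing in for a corpus, for illustration only.
# They are NOT taken from any real corpus and prove nothing about Spanish usage.
SAMPLE_CORPUS = [
    "El debate ha cobrado urgencia en los últimos meses.",
    "La reforma cobra urgencia ante la crisis.",
    "El tema adquiere urgencia con cada informe.",
    "La medida tomó por sorpresa a todos.",
]

# ASSUMPTION: crude stem patterns so that conjugated forms
# ("cobra", "cobrado", "adquiere", "tomó") are caught too.
CANDIDATE_STEMS = {
    "ganar": r"gan\w*",
    "adquirir": r"adqui\w*",
    "tomar": r"tom\w*",
    "cobrar": r"cobr\w*",
}


def count_collocations(sentences, stems, noun="urgencia"):
    """Count how often each candidate verb is immediately followed by the noun."""
    counts = Counter({verb: 0 for verb in stems})
    for verb, stem in stems.items():
        # A verb form, then whitespace, then the noun as a whole word.
        pattern = re.compile(rf"\b{stem}\s+{noun}\b", re.IGNORECASE)
        for sentence in sentences:
            counts[verb] += len(pattern.findall(sentence))
    return counts


if __name__ == "__main__":
    results = count_collocations(SAMPLE_CORPUS, CANDIDATE_STEMS)
    for verb, hits in results.most_common():
        print(f"{verb} + urgencia: {hits}")
```

To try this on real data, replace SAMPLE_CORPUS with text you are allowed to use. The counts only become meaningful with a corpus of realistic size, which is exactly what the online tools in the reference list provide.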
If you’re interested in learning more, you can watch my 1-hour webinar by clicking the button below.
REFERENCES
Bilingual/Multilingual: Open-Source Parallel Corpus, OPUS: https://opus.nlpl.eu/index.php
Spanish: CORDE, diachronic, from the beginning to 1974: https://corpus.rae.es/cordenet.html
CREA, contemporary, collecting spoken and written texts from 1974 to 2004: https://corpus.rae.es/creanet.html
CORPES (beta version), written texts from 2001 to 2020: https://apps2.rae.es/CORPES
English: the Corpus of Contemporary American English (COCA), the Corpus of Historical American English (COHA), the News on the Web (NOW) Corpus, and the TV Corpus, among others, all collected at https://www.english-corpora.org/ (free, but registration required).