Choosing Machine Translation: The Trade-Off Triangle

October 18, 2021

Decision-making in language services and e-discovery is governed by a trade-off triangle of three basic business constraints: 1) quality, 2) speed, and 3) cost.

Are you looking for a low-cost solution? A fast turnaround? The highest possible quality? Well, choose two.

With any solution, these three variables sit in natural tension with each other. This blog series will navigate this trade-off triangle, explore those three variables, and take a deep dive into the consultative process wrapped around selecting the right solution and achieving maximum value from it. Our first topic: machine translation (MT).

Why Use Machine Translation

While MT might be a catch-all phrase for software-driven translation, the options it covers vary widely in cost, speed, and quality. The “right” option in one scenario will almost certainly be much less appropriate in another.

Take for example the following request: “I have 550 Russian documents that I need to be machine translated.”

The fact that there are 550 Russian documents is useful information, but it’s only marginally helpful in indicating the best solution among the many existing MT options. The better indicators lie a little deeper. Only once the 550-count document set is paired with its underlying circumstance and overall objective does the ideal solution come into focus.

Let’s observe these three scenarios:

  1. 550 Russian first-level relevant emails pertaining to a tertiary case issue
  2. 550 Russian relevant Excels and PowerPoints needed in 48 hours
  3. 550 Russian highly nuanced key case documents sent by the General Counsel

Knowing “the why” beyond a file count really helps eliminate options that won’t ultimately meet the need of the request. While we have several broad categories of MT (text, formatted, post-edited), there are many options, enhancements, and steps that can be intertwined to meet a particular budget, time, or quality requirement.

Beyond helping narrow the options for a particular MT solution, it’s truly TLS’s language consulting that allows us to actively adjust and add options within the overarching solution to better meet the translation objective.

Mapping Out the MT Solution: Quality vs. Speed vs. Cost

Quality.

When the aim is simply to gain a broad understanding of the topics that exist within a set of emails, straight machine translation of text files is a quick, cost-effective way of getting the gist. With a bit more time (and spend), that straight text MT output can be improved by building glossaries specific to the case.

However, understanding the information is not just about the words. In fact, for certain file types, a significant portion of the content’s meaning is derived from the file’s structure, so any solution must take that into account. For files such as financial data and presentations, the only recommended approach is to translate from the native file format. But we can go further.

By default, we do not include hidden rows, columns, or sheets when delivering formatted native machine translations. However, if the files are being produced or disclosed to opposing counsel in native format, we’d highly recommend that hidden content be translated as well. This takes longer, but it greatly increases the quality and thoroughness of review before disclosure.

Then, there are those instances where the highest quality translation is required but perhaps with a budget that cannot support a full human translation workflow. Some scenarios might include: the multi-language documents are key, and the translations are for expert review; or key documents are non-English, but the monetary value in dispute wouldn't be proportional to the cost of full human translation.

Details matter most when we are talking about the key documents in a case. Post-edited MT enables the reviewer to spot details in the evidence that a lower quality translation might otherwise obscure. There are options for light or heavy post-editing. Light post-editing aims to make the MT output understandable and factually correct; heavy post-editing aims for both of those things while also going much further on grammar, tone, style, and consistency. MT that is heavily post-edited will not be far off from a full human translation. Among all of the MT options, post-editing is the slowest and most expensive, but it produces the highest quality results.

Speed.

Text MT is fastest, followed by formatted natives. Formatted natives with hidden content translated are slower still, as there are additional steps required for harvesting, translating, and reinserting the hidden material. As mentioned previously, post-edited MT is slowest, as it involves the full MT process plus human review.

Knowing machine translation is needed quickly allows us to adjust the process to meet deadlines. Conversely, knowing a job isn’t particularly time sensitive allows us to make non-standard suggestions that increase quality. There are also steps in each workflow we can bypass, but doing so comes with a trade-off that lessens the quality.

Cost.

The cost of each MT option roughly tracks its turnaround time: text MT is the fastest and least expensive option, and post-edited MT is the slowest and most expensive.

Note: costs don't accumulate in machine translation alone. For example, when a firm-side lawyer handles review (as opposed to a less costly contract reviewer), there is a strong argument for creating all translations as formatted natives. Compared to pure text MT, formatted natives foster a more efficient review due to their legibility. Here is a hypothetical cost breakdown:

100 hours of lawyer review x $500 per hour = $50,000

If we shave 15 hours off review time because formatted natives are much easier to read…

85 hours of lawyer review x $500 per hour = $42,500

If the formatted natives add only $1,000 to the MT spend, $7,500 comes off the review tab, for a net saving of $6,500.
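
For readers who want to rerun this arithmetic with their own figures, here is a minimal Python sketch. The function and parameter names are hypothetical, and the inputs are simply the illustrative numbers from the breakdown above (with $500 treated as an hourly rate), not real pricing.

```python
def review_savings(baseline_hours, hours_saved, hourly_rate, extra_mt_spend):
    """Return (review cost before, review cost after, net saving) in dollars.

    All inputs are hypothetical planning figures, not actual rates.
    """
    cost_before = baseline_hours * hourly_rate
    cost_after = (baseline_hours - hours_saved) * hourly_rate
    # Net saving = reduction in review cost minus the extra MT spend.
    net_saving = (cost_before - cost_after) - extra_mt_spend
    return cost_before, cost_after, net_saving


def break_even_hours(hourly_rate, extra_mt_spend):
    """Review hours that must be saved before the extra MT spend pays for itself."""
    return extra_mt_spend / hourly_rate


before, after, net = review_savings(100, 15, 500, 1000)
print(f"Review before: ${before:,}  after: ${after:,}  net saving: ${net:,}")
print(f"Break-even: {break_even_hours(500, 1000)} review hours saved")
```

With these figures, the extra MT spend pays for itself once formatted natives save two hours of lawyer review; anything beyond that is net saving.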

Going back to our three scenarios…

  1. 550 Russian first-level relevant emails pertaining to a tertiary case issue: Text MT
  2. 550 Russian relevant Excels and PowerPoints needed in 48 hours: Formatted natives (possibly with hidden content MT’ed if time allows)
  3. 550 Russian highly nuanced key case documents sent by the General Counsel: Heavily post-edited MT

Slightly alter any of the variables and the whole solution changes.

  1. 550 Russian first-level relevant emails pertaining to a tertiary case issue being reviewed by a senior associate: Formatted natives
  2. 550 Russian relevant Excels and PowerPoints needed in 48 hours at the highest quality: Lightly post-edited MT
  3. 550 Russian highly nuanced key case documents sent by the General Counsel, who is not sure there is a case and wants the cheapest possible option: Formatted natives
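
To make the effect of changing one variable concrete, the six scenario outcomes above can be encoded as a small rule set. This is only an illustrative Python sketch: the `Request` fields, the `suggest_mt` function, and the rule order are assumptions inferred from these six examples, not TLS's actual consulting process, which weighs far more context than a handful of flags.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical description of an MT request; field names are illustrative."""
    structured_files: bool = False  # Excels, PowerPoints, other layout-heavy natives
    key_documents: bool = False     # highly nuanced, case-critical material
    urgent: bool = False            # e.g. needed in 48 hours
    highest_quality: bool = False   # requester asks for the best achievable output
    cheapest: bool = False          # requester asks for the lowest possible cost
    senior_reviewer: bool = False   # review is done by a costly firm-side lawyer


def suggest_mt(req: Request) -> str:
    """Suggest an MT option using rules inferred from the six example scenarios."""
    if req.key_documents and not req.cheapest:
        return "Heavily post-edited MT"
    if req.highest_quality:
        # Under a tight deadline, light post-editing is the quality step assumed feasible.
        return "Lightly post-edited MT" if req.urgent else "Heavily post-edited MT"
    if req.structured_files or req.senior_reviewer or req.key_documents:
        return "Formatted natives"
    return "Text MT"


# First-level emails on a tertiary issue, before and after adding a senior reviewer.
print(suggest_mt(Request()))
print(suggest_mt(Request(senior_reviewer=True)))
```

Flipping a single flag, such as `senior_reviewer` or `cheapest`, is enough to change the suggestion, which mirrors the point that altering any one variable changes the whole solution.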

Most cases don’t consist of documents that divide neatly between straight text MT and formatted native MT. A good workflow is one where documents are analysed by a language consultant, segmented, and routed into the MT engines and workflows that finely balance the trade-off triangle’s variables with the wider objective.

 

To learn more about TLS’s machine translation options, visit our website or get in touch with one of our expert language consultants.

Blog Info
Robert Wagner, Global Director of Multilingual E-Discovery and Alys Collins, Director of Business Development | TLS London