MT training golden datasets

Translate and validate the naturalness, accuracy, and fluency of translations

Challenge

Overcome systemic, industry-wide issues that cast doubt on translation quality.

Inability to monitor translation supply chain processes to ensure exclusive use of human translation across the full project scope of work

Inherent subjectivity of translation quality standards

Our approach

Assign native translators with expert-level qualifications and high human translation quality scores from previous projects in similar domains.

Deploy a proprietary MT Detection tool to detect and eliminate strings that bear traces of machine translation or post-editing.

Use Translation Edit Rate (TER) to measure the distance between suppliers' strings and known machine-translated strings.

Flag suspicious strings and reassign them for retranslation by a different supplier.
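
The TER comparison and flagging steps above can be sketched roughly as follows. This is an illustrative sketch only, not the proprietary MT Detection tool: it computes a simplified TER (word-level insertions, deletions, and substitutions divided by reference length) and omits the block-shift operation that full TER includes. The function names, the string IDs, and the 0.2 threshold are hypothetical assumptions chosen for the example.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def simplified_ter(candidate, reference):
    """Simplified TER: edits / reference length (no block shifts)."""
    hyp = candidate.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


def flag_suspicious(supplier_strings, mt_strings, threshold=0.2):
    """Return (string_id, score) pairs whose score is at or below threshold.

    A low score means the supplier string is very close to the known
    machine-translated string. The threshold is a hypothetical value.
    """
    flagged = []
    for string_id, text in supplier_strings.items():
        mt_text = mt_strings.get(string_id)
        if mt_text is None:
            continue  # no known MT output to compare against
        score = simplified_ter(text, mt_text)
        if score <= threshold:
            flagged.append((string_id, score))
    return flagged
```

Here the known machine-translated string plays the role of the TER reference, so a low score signals a suspiciously close match. Flagged IDs would then be candidates for reassignment to a different supplier.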

Results

Golden datasets for six language pairs covered in the pilot, sourced from 30 expert translators

Mid-project change of one language pair per client reprioritization

MT Detection reports for each dataset validating that the entire set of strings bears no trace of machine translation or post-editing

On-time delivery within 30 days of project kickoff

Project details

Customer profile

Global retail and cloud computing giant

Data Type

Text Translation

Text Transcription & Annotation

Scope of Work

5-10 translators and 2-3 reviewers

35,000-75,000 words per language pair

Local language to English

Geographical Coverage

Local language to English, and to four other languages

Collection Methodology

Remote, within a particular country

Duration of Engagement

One-month engagement with a one-month turnaround