MT training golden datasets
Translate and validate the naturalness, accuracy, and fluency of translations
Challenge
Overcome systemic, industry-wide issues that cast doubt on translation quality.
Inability to monitor translation supply chain processes to ensure exclusive use of human translation across the full project scope of work
Inherent subjectivity of translation quality standards
Our approach
Assign native translators with expert-level qualifications and high human translation quality scores from previous projects in similar domains.
Deploy a proprietary MT Detection tool to detect and eliminate strings that bear traces of machine translation or post-editing.
Use Translation Edit Rate (TER) to measure the distance between suppliers' strings and known machine-translated strings.
Flag suspicious strings and reassign them for retranslation by a different supplier.
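The TER comparison described above can be illustrated with a minimal sketch. It is not the proprietary MT Detection tool: it assumes a simplified TER (word-level edit distance divided by the length of the machine-translated string, without the block-shift operation that full TER includes), and the function names and the 0.2 threshold are hypothetical choices made for illustration only.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a word from hyp
                            curr[j - 1] + 1,      # insert a word into hyp
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def simplified_ter(supplier, mt_output):
    """Edits needed to turn the supplier string into the MT string,
    normalized by the MT string's word count (no block shifts)."""
    hyp = supplier.lower().split()
    ref = mt_output.lower().split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


def flag_suspicious(pairs, threshold=0.2):
    """Return indices of (supplier, mt_output) pairs whose simplified TER
    is at or below the threshold, i.e. suspiciously close to MT.
    The threshold value is an assumption, not a documented setting."""
    return [i for i, (s, m) in enumerate(pairs)
            if simplified_ter(s, m) <= threshold]


pairs = [
    ("the cat sat on a mat", "the cat sat on the mat"),
    ("a feline rested upon the rug", "the cat sat on the mat"),
]
flagged = flag_suspicious(pairs)
```

A low score means the supplier's string is nearly identical to a known machine-translated string, which is why strings at or below the threshold are the ones flagged. In practice a cutoff would be tuned per language pair and domain, since short strings can match MT output by coincidence.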
Results
Golden datasets for six language pairs covered in the pilot, sourced from 30 expert translators
Mid-project change of one language pair per client reprioritization
MT Detection reports for each dataset validating that the entire set of strings bears no trace of machine translation or post-editing
On-time delivery within 30 days of project kickoff
Project details
Customer profile
Global retail and cloud computing giant
Data Type
Text Translation
Text Transcription & Annotation
Scope of Work
5-10 translators and 2-3 reviewers
35,000 to 75,000 words per language pair
Local language to English
Geographical Coverage
Loc>English
to four other languages
Collection Methodology
Remote, within a particular country
Duration of Engagement
One-month engagement with a one-month turnaround