1
eDMV WebMT Project:
Spanish Machine Translation for the NYSDMV Web Site
  • Presentation to the
  •  NYSFIRM State Webmasters Guild
  • Richard Vang, Internet Language Services
  • Dept. of Information Technology, NYS DMV
  • December 5, 2003
2
The eDMV WebMT Project:
Presentation Agenda
  • Why Spanish Web Content? (customer needs, precedents, goals)


  • Translation Methods Overview (human, machine)


  • Controlled Content & Technical Dictionaries


  • Project Processes, Tools & Resources


  • Project Issues & Concerns (technology, processes)


  • Current Status & Future Endeavors


  • Project Metrics (feedback, statistics)
3
DMV Foreign Language Services:
Assessing Resources, Setting Goals
4
Foreign Language Services:
DMV Language Precedents (in Dec. 2001)
5
Initial Target Customers:
Spanish-speaking Population
  • We knew (as of Jan. 02):
  • Total population of NYS (per Census 2000): 18.9 million
    • Total speaking 2nd language @ home: 5.18 million (27.44%)
  • Total Hispanic/Latino population: 2.86 million (15.1%)
    • (3% increase since 1990)
  • Total speaking Spanish @ home: 2.6 million (13.77%)
    • Total that speak English “less than well”: 1.17 million (6.23%)*
  • >50% of Hispanic adults on-line (increase of 20% over past two years)
  • 0.72% of monthly calls to DMV Call Centers are in Spanish**
  • Average number of “unique visitors” to DMV web site per month (as of 1/02): 119,399


  • Therefore:
  • Potential customer translations per month: 860** to 7,438*
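The range above appears to come from applying the two starred percentages to the monthly unique-visitor count. A minimal sketch of that arithmetic follows, assuming that is how the estimate was derived; the variable and function names are illustrative and not from the project.

```python
# Figures are taken from the slide; names are illustrative assumptions.
MONTHLY_UNIQUE_VISITORS = 119_399  # DMV web site, as of 1/02
SPANISH_CALL_SHARE = 0.0072        # share of Call Center calls in Spanish (**)
LIMITED_ENGLISH_SHARE = 0.0623     # share speaking English "less than well" (*)


def estimate_monthly_translations(visitors: int, share: float) -> float:
    """Scale the visitor count by the share of visitors likely to want Spanish."""
    return visitors * share


low = estimate_monthly_translations(MONTHLY_UNIQUE_VISITORS, SPANISH_CALL_SHARE)
high = estimate_monthly_translations(MONTHLY_UNIQUE_VISITORS, LIMITED_ENGLISH_SHARE)
print(f"low estimate: {low:.1f}, high estimate: {high:.1f}")
```

The low figure rounds to 860. The high figure comes to roughly 7,438.6, which the slide reports as 7,438.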
6
Finding a Solution:
Investigation of Translation Methods
  • Important terms & concepts


  • Translation accuracy


  • Two primary methods for translating content:


    • Human Translation Process
    • Machine Translation Process
7
Investigation of Translation Methods:
Important Translation Terms & Concepts
  • Source/Target Language - the original (source) and resulting (target) languages of a translation process


  • Language Pair - the two interacting languages, e.g. English AND Spanish (English – source; Spanish – target)


  • Language Direction - translating from one language to another, e.g. English => Spanish or Spanish => English


  • Translation Gap - length of time it takes to produce a translated product from the original source content


  • Literary Quality – a translation produced as if composed by a native of the target language


  • Gisting – translating just accurately enough to be understood at the first reading
8
Translation Accuracy:
A Moving Target
  • What is “accurate” translation?


  • Who judges the accuracy?


  • Nuances of language dialects


  • “It’s about the information, stupid”
9
Human Translation (HT) Process:
Current Method for Published Content
10
Human Translation (HT):
Advantages & Disadvantages
  • Advantages
  • Literary quality translation
  • Direct, specific control of which content & terms get translated
  • Disadvantages
  • One language pair per translator (unless multi-lingual)
  • Doubles work load of Webmaster by creating another version of web site (in a language they don’t know)
  • Expensive ($8900 for DMV DM @ 104 pages)
  • Long translation gap (3 months for DM)
  • Updates can be problematic (process, cost, obsolete info)
  • Different translators create different translations due to dialect and inconsistent terminology
11
Machine Translation (MT) Overview:
What is “Machine Translation”?
  • One of computing industry’s earliest endeavors
  • Now coming into its own via global market
  • Basically a word for word substitution (gisting)
  • Accuracy enhanced by:
    • Building customized dictionary for specific domains
    • Applying semantic and linguistic rules to dictionaries
  • Available “off the shelf” as a translation tool for single user (CAT – personal solution)
  • Available as real-time, “on the fly” web site translation (WebMT – enterprise solution)
  • Not a substitute for human translation, but a substitute for NO translation
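As a rough illustration of "word for word substitution" and of how a customized domain dictionary improves it, here is a toy sketch. It is not how any commercial MT engine works; the dictionary entries and the `gist` function are hypothetical and chosen only to show a custom entry overriding a core entry.

```python
import re

# Hypothetical entries for illustration; not from any real DMV or vendor dictionary.
CORE_DICTIONARY = {
    "renew": "renovar",
    "the": "la",
    "license": "licencia",
    "plate": "plato",  # general sense (a dish), wrong for the DMV domain
}
CUSTOM_DICTIONARY = {
    "license plate": "placa",  # domain-specific phrase
}


def gist(text: str, core: dict, custom: dict) -> str:
    """Substitute known words and phrases; custom entries override core entries."""
    lexicon = {**core, **custom}
    # Try longer phrases first so "license plate" wins over "license" + "plate".
    phrases = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], text)


print(gist("renew the license plate", CORE_DICTIONARY, {}))
print(gist("renew the license plate", CORE_DICTIONARY, CUSTOM_DICTIONARY))
```

With only the core dictionary the phrase is translated piece by piece. Adding the custom entry lets the whole phrase map to a single domain term.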
12
The Machine Translation (MT) Process:
How MT Works
13
Machine Translation (MT):
Advantages & Disadvantages
  • Advantages
  • Decreases HT costs
  • Real time; no translation gap
  • Multiple language pairs leverage same core dictionary
  • Second version of site not created
  • Can leverage already existing translations and dictionaries for use in other DMV projects (i.e., printed materials)
  • Disadvantages
  • Less accurate than HT (gisting)
  • High potential for nonsense translation (but controllable)
  • Less specific control over what gets translated
  • Does not translate graphics
14
SDL International
Enterprise Translation Server
15
Project Implementation Phase I:
Optimize DMV Site Content for Web MT
  • Review of existing content is recommended by all MT vendors & advocates to achieve best MT results:
  • Use simple grammatical structures
  • Use short, concise sentences
  • Use active verbs (vs. passive)
  • Avoid abbreviations when possible (COD)
  • Use terminology & acronyms consistently
  • Avoid slang
  • Use proper punctuation and accurate spelling
  • Use definite articles where possible
  • Avoid personal and gender-specific pronouns
  • Use a simple format for text layout
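A few of the guidelines above lend themselves to simple automated checks. The sketch below is a rough heuristic, not the MAXit checker or any vendor tool. The 20-word limit, the regular expressions, and the function name are assumptions made for illustration.

```python
import re

MAX_WORDS = 20  # assumed threshold; the slide gives no number

# Crude heuristics: a form of "to be" followed by an "-ed" word, and all-caps tokens.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)
ABBREVIATION = re.compile(r"\b[A-Z]{2,}\b")


def check_sentence(sentence: str) -> list:
    """Return a list of warnings for one sentence (empty if none apply)."""
    warnings = []
    if len(sentence.split()) > MAX_WORDS:
        warnings.append("long sentence")
    if PASSIVE.search(sentence):
        warnings.append("possible passive verb")
    if ABBREVIATION.search(sentence):
        warnings.append("abbreviation")
    return warnings


print(check_sentence("The fee is required for COD orders."))
print(check_sentence("Pay the fee."))
```

Heuristics like these miss irregular passives ("is given") and flag legitimate acronyms, so a human reviewer still decides what to change.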
16
Project Implementation Phase I:
Why Optimize Site Content?
  • Increases accessibility, decreases reading level
    • 19% of NYS pop. >25 has NO high school diploma
  • Brown U. eGov reports
    • ½ of Americans read at 8th grade level or lower
    • most government sites at 11th or 12th grade level (NY: 10.7)
  • Creates a clear, concise source language (simple, not “dumbed-down”)
  • “Translation” of government-speak for less literate or less educated (understanding reduces customer inquiries)
  • Results in cleaner, more accurate MT
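The slide does not say which readability formula lies behind the grade levels quoted above. One widely used measure is the Flesch-Kincaid grade level, sketched below as an assumption. The syllable counter is a crude vowel-group heuristic, so scores are approximate and mainly useful for comparing two versions of the same text.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (at least one per word)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59


# Hypothetical before/after wording, for illustration only.
original = ("Applicants are required to submit documentation establishing "
            "identity prior to registration.")
revised = "Pay the fee. Bring your form."

print(round(flesch_kincaid_grade(original), 1))
print(round(flesch_kincaid_grade(revised), 1))
```

Shorter sentences and shorter words both lower the score, which matches the optimization guidelines for MT-ready content.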
17
Project Implementation Phase II:
Create Lexicon for the DMV Domain
  • Identify and add DMV-specific terms and phrases to DMV Technical Dictionary
    • Core dictionary & custom/technical dictionary
    • Develop in accordance and simultaneously with optimization (Plain English) process
    • Leverage existing dictionaries of DMV stakeholders and knowledgeable staff
18
Project Implementation Phase II:
Why Create a Technical Lexicon?
  • Has far-reaching and positive consequences which affect many stakeholders & publications
    • Creates an ontology or taxonomy for DMV knowledge management (what is there?)
    • Forces content creators to really think about what they’re saying (Ex: license plates)
    • Creates a consistent language across web site and other publications (Ex: “license” terms)
    • Helps to define how information is organized and presented (content management)
19
Optimization Tool:
Smart Communications, Inc.
  • MAXit Controlled English Checker
    • Only software solution that integrated a dictionary building tool, a controlled English writing tool, and training for writers.
    • Created for international airline manufacturing industry
    • Based on “one word, one meaning”
    • Optimizes source content according to MT recommendations
    • Intended to remove ambiguity, create consistent style at easy reading level
    • Identifies common problems in source text before applying MT
    • Maintains a consistent style of writing within a workgroup
    • Forces the writer to use standard terminology and spelling
  • Text Miner: analyzes words for frequency, usage and importance in context; saved us time in developing a technical lexicon
  • Lexicon Manager: Helps create and manage custom dictionaries
    • Export tool to SDL Dictionary Manager
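To give a sense of what frequency-based term mining involves, here is a minimal sketch that counts single words and two-word phrases across pages and keeps those that recur. It is not the Text Miner tool; the stopword list, the `min_count` threshold, and the sample sentences are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "for", "and", "or", "your", "you", "is", "in"}


def candidate_terms(pages, min_count=2):
    """Return (term, count) pairs for words and two-word phrases seen at least min_count times."""
    counts = Counter()
    for page in pages:
        words = re.findall(r"[a-z]+", page.lower())
        counts.update(w for w in words if w not in STOPWORDS)
        counts.update(
            f"{a} {b}"
            for a, b in zip(words, words[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        )
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Hypothetical page text.
pages = [
    "Renew your driver license online.",
    "A driver license is required to register.",
]
print(candidate_terms(pages))
```

Recurring phrases such as "driver license" are the natural candidates for a technical lexicon entry, since they need one consistent translation.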
20
Live Demonstrations:
Plain English & Dictionary Tools

  • MAXit Controlled English Checker


  • SMART Lexicon Manager


  • SDL International Dictionary Manager


21
Example Handouts: PE-revised Text &
Improving MT Accuracy with PE
22
eDMV WebMT Project:
Project Processes, Tools and Resources
  • Brief Timeline
  • Project Process (content phases)
  • Project Personnel (resources)
  • Project Management Tools (databases)
23
eDMV WebMT Project:
Brief Project Timeline I
  • 12/01 – Initial Research
  • 1/02 – Presentation to DMV Executive Guidance Committee
  • 3/02 – Contracts signed with vendors via ASAP; training provided by vendors
  • 4/02 – Pilot Phase begins with initial content “cluster”
24
Plain English Review Process:
Project Phases and Content Clusters
  • Pilot Phase – Driver License cluster
  • Phase 2 – Home Page, various content menu pages, second-level menu pages
  • Phase 3 – RightNow Web FAQs & interfaces
  • Phase 4 – Registration & Title cluster
  • Phase 5 – remaining translatable content
  • Remaining Content – transactions, forms, etc.
25
eDMV WebMT Project:
Project Personnel
  • Project Manager
  • Web Site Content Manager
  • Plain English Review Team
  • DMV Webmasters
  • Internet Services Manager
  • Spanish Translation Review Team
26
eDMV WebMT Project:
Project Management Tools
  • Project web page on intranet
  • Project Database
    • Custom built in MS Access
    • Provides a variety of tools
    • Accessible by all project personnel via LAN
    • Built into process to provide accurate PM tracking data
27
Spanish Review Process:
Improving MT Accuracy
  • Team of volunteers (downstate CS reps)
  • Bought into PE and MT process as a way of possibly lightening their Spanish interactions
  • Two levels of Spanish MT review for approval
    • Project Mgr. initial review (obvious problems, PE)
    • Native speaker expert review (fine tuning, terms)
    • Web page revision, dictionary update after each
  • Review process (no Internet access @ work)
    • PE version on DMV intranet mirror site
    • MT version from Internet saved as HTML on LAN
    • Changes in Word, or hardcopy mark-up
28
eDMV WebMT Project:
Brief Project Timeline II
  • 8/02 – Pilot Phase Goes live
    •  Site visitor access – home page
    •  Welcome page / Disclaimer
    •  “Hardcoded menu” of selected content cluster
  • 12/31/02 – “FollowLink” feature brings full site content on-line with Spanish MT
    •  Site visitor access – home page
    •  Welcome page / Disclaimer
    •  FollowLink Feature
    •  Content not translated


29
eDMV WebMT Project:
Live Spanish MT Demonstration


  • www.nysdmv.com in Spanish
30
eDMV WebMT Project:
Technological Issues & Concerns
  • MAXit Controlled English Checker
    •  government vs. technical language
    •  program fixes
    • Word HTML code stripping (CSS)
    •  Temporary Dictionary feature
  • Translation of graphic banners
    •  not possible without dynamic content
    •  suggested <IMG ALT> solution
  • FollowLink Feature
    •  not really a site navigation feature
    •  bookmarks a problem
    •  blocking content from translation
    •  visitor trapped in Spanish, can’t get out
  • Translation Issues
31
eDMV WebMT Project:
Process Issues & Concerns
  • Required constant vigilance on site updates
  • Fluctuating personnel resources caused delays
  • Budgetary process caused delays
  • Translation accuracy is slave to available resources, not the fault of the technology
32
eDMV WebMT Project:
Ongoing Processes (as of 12/03)
  • Still in PE Review Process for Phases 5 and 3 (RNW)
  • Bring RightNow Web on-line with MT (1st)
  • Still evaluating Forms pages (menu, titles)
  • Still evaluating Transaction pages (PE)
  • Still requires dictionary clean-up & more lexicon building (acronyms)
33
Future Endeavors:
Are we “There” yet?
  • Sell rest of agency on benefits of Plain English and controlled vocabulary
  • Get more content creators to use MAXit
  • Coordinate translation services to utilize existing MT translations as drafts for publications
  • Install ETS plug-ins for email and other Office applications


34
eDMV WebMT Project:
Project Metrics
  • Plain English review process
    •  94% of targeted web pages completed process (RNW: 36%)
  • Pages being translated
    •  81 web pages currently using MT (some 20+ printed pages)
  • Spanish review process
    •  16% of targeted web pages completed process
  • Dictionary terms
    •  PE dictionaries: 6,669 terms; Spanish dictionaries: 2,126 terms
  • User Stats
    • 7,213 translations/month avg. (in prediction range)
    • 50K translations by 7/03
35
eDMV WebMT Project:
Customer & Industry Feedback
  • Well received in MT and CE industries
    •  example of PE to soften government-speak (EU)
    •  example for international companies (Daimler-Chrysler)
    •  example of 1st time ever that PE and MT applications were used together to create site content (SDL PE)
  • Featured in Information Week (4/9/03) article on MT
  • Relative to the number of users, virtually NO feedback from customers
36
eDMV WebMT Project:
Contact Information
  • Project Manager
    • Richard Vang (474-2570)
    • rvang@dmv.state.ny.us


  • DMV Web Site Content Manager
    • George Filieau (486-6596)
    • gfili@dmv.state.ny.us


  • DMV Webmaster
    • Holly New (474-2644)
    • hnew@dmv.state.ny.us