|
1
|
- Presentation to the NYSFIRM State Webmasters Guild
- Richard Vang, Internet Language Services
- Dept. of Information Technology, NYS DMV
- December 5, 2003
|
|
2
|
- Why Spanish Web Content? (customer needs, precedents, goals)
- Translation Methods Overview (human, machine)
- Controlled Content & Technical Dictionaries
- Project Processes, Tools & Resources
- Project Issues & Concerns (technology, processes)
- Current Status & Future Endeavors
- Project Metrics (feedback, statistics)
|
|
3
|
|
|
4
|
|
|
5
|
- We knew (as of Jan. 02):
- Total population of NYS (per Census 2000): 18.9 million
- Total speaking 2nd language @ home: 5.18 million (27.44%)
- Total Hispanic/Latino population: 2.86 million (15.1%)
- Total speaking Spanish @ home: 2.6 million (13.77%)
- Total that speak English “less than well”: 1.17 million (6.23%)*
- >50% of Hispanic adults on-line (increases of 20% past two years)
- 0.72% of monthly calls to DMV Call Centers are in Spanish**
- Average number of “unique visitors” to DMV web site per month (as of
1/02): 119,399
- Therefore:
- Potential customer translations per month: 860** to 7,438*
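
The range on the last line follows from multiplying the monthly unique-visitor count by the two percentages marked * and **. A minimal Python sketch of that arithmetic, with variable names chosen here only for illustration. Note that the slide's upper figure of 7,438 matches the product truncated rather than rounded:

```python
from decimal import Decimal

# Figures taken from the slide above.
unique_visitors = Decimal(119399)          # monthly unique visitors, as of 1/02
spanish_call_share = Decimal("0.0072")     # ** share of call-center calls in Spanish
limited_english_share = Decimal("0.0623")  # * share speaking English "less than well"

low_estimate = unique_visitors * spanish_call_share
high_estimate = unique_visitors * limited_english_share

print(round(low_estimate))  # 860
print(int(high_estimate))   # 7438 (truncated; rounding would give 7439)
```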
|
|
6
|
- Important terms & concepts
- Translation accuracy
- Two primary methods for translating content:
- Human Translation Process
- Machine Translation Process
|
|
7
|
- Source/Target Language - the original (source) and resulting (target)
languages of a translation process
- Language Pair - the two interacting languages, e.g. English AND Spanish
(English – source; Spanish – target)
- Language Direction - translating from one language to another, e.g.
English => Spanish or Spanish => English
- Translation Gap - length of time it takes to produce a translated
product from the original source content
- Literary Quality – a translation produced as if composed by a native of
the target language
- Gisting – translating just accurately enough to be understood at the
first reading
|
|
8
|
- What is “accurate” translation?
- Who judges the accuracy?
- Nuances of language dialects
- “It’s about the information, stupid”
|
|
9
|
|
|
10
|
- Advantages
- Literary quality translation
- Direct, specific control of which content & terms get translated
- Disadvantages
- One language pair per translator (unless multi-lingual)
- Doubles work load of Webmaster by creating another version of web site
(in a language they don’t know)
- Expensive ($8,900 for DMV DM @ 104 pages)
- Long translation gap (3 months for DM)
- Updates can be problematic (process, cost, obsolete info)
- Different translators create different translations due to dialect and
inconsistent terminology
|
|
11
|
- One of the computing industry’s earliest endeavors
- Now coming into its own via global market
- Basically a word for word substitution (gisting)
- Accuracy enhanced by:
- Building customized dictionary for specific domains
- Applying semantic and linguistic rules to dictionaries
- Available “off the shelf” as a translation tool for single user (CAT –
personal solution)
- Available as real-time, “on the fly” web site translation (WebMT –
enterprise solution)
- Not a substitute for human translation, but a substitute for NO
translation
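
To illustrate why a customized dictionary improves word-for-word gisting, here is a toy Python sketch. It is not how any commercial MT engine works; the dictionaries, the function name `gist`, and the sample entries are all assumptions made for this example. It simply prefers the longest matching phrase and lets custom entries override core ones:

```python
# Hypothetical sample dictionaries, for illustration only.
CORE = {"renew": "renovar", "your": "su", "license": "licencia"}
CUSTOM = {"driver license": "licencia de conducir"}

def gist(text, core, custom=None):
    """Word-for-word substitution that prefers the longest known phrase."""
    merged = {**core, **(custom or {})}  # custom entries override core entries
    words = text.lower().split()
    max_len = max(len(key.split()) for key in merged)
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in merged:
                out.append(merged[phrase])
                i += n
                break
        else:
            out.append(words[i])  # unknown word passes through untranslated
            i += 1
    return " ".join(out)

print(gist("Renew your driver license", CORE))          # renovar su driver licencia
print(gist("Renew your driver license", CORE, CUSTOM))  # renovar su licencia de conducir
```

With only the core dictionary, the domain phrase is broken apart and half of it is left in English; adding the phrase to the custom dictionary fixes the output without touching the core entries.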
|
|
12
|
|
|
13
|
- Advantages
- Decreases HT costs
- Real time; no translation gap
- Multiple language pairs leverage same core dictionary
- Second version of site not created
- Can leverage already existing translations and dictionaries for use in
other DMV projects (i.e., printed materials)
- Disadvantages
- Less accurate than HT (gisting)
- High potential for nonsense translation (but controllable)
- Less specific control over what gets translated
- Does not translate graphics
|
|
14
|
|
|
15
|
- Review of existing content is recommended by all MT vendors &
advocates to achieve best MT results:
- Use simple grammatical structures
- Use short, concise sentences
- Use active verbs (vs. passive)
- Avoid abbreviations when possible (COD)
- Use terminology & acronyms consistently
- Avoid slang
- Use proper punctuation and accurate spelling
- Use definite articles where possible
- Avoid personal and gender-specific pronouns
- Use a simple format for text layout
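
A few of the guidelines above can be approximated with simple pattern checks. The Python sketch below is a rough heuristic written for this example, not the behavior of any controlled-English product; the function name `check_sentence`, the 20-word limit, and the passive-voice pattern are all assumptions:

```python
import re

def check_sentence(sentence, max_words=20):
    """Return a list of guideline warnings for one sentence (heuristic only)."""
    issues = []
    if len(sentence.split()) > max_words:
        issues.append("long sentence")
    # Two or more consecutive capital letters are treated as an abbreviation.
    if re.search(r"\b[A-Z]{2,}\b", sentence):
        issues.append("abbreviation")
    # A form of "to be" followed by a word ending in "-ed" suggests passive voice.
    if re.search(r"\b(is|are|was|were|be|been|being)\s+\w+ed\b", sentence, re.IGNORECASE):
        issues.append("possible passive")
    return issues

print(check_sentence("The form must be signed by the owner."))  # ['possible passive']
print(check_sentence("Pay by COD."))                            # ['abbreviation']
print(check_sentence("Sign the form."))                         # []
```

A check like this only flags candidates for a human editor; irregular participles and legitimate acronyms will be missed or over-reported.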
|
|
16
|
- Increases accessibility, decreases reading level
- 19% of NYS pop. >25 has NO high school diploma
- Brown U. eGov reports
- ½ of Americans read at 8th grade level or lower
- most government sites at 11th or 12th grade level
(NY: 10.7)
- Creates a clear, concise source language (simple, not “dumbed-down”)
- “Translation” of government-speak for less literate or less educated
(understanding reduces customer inquiries)
- Results in cleaner, more accurate MT
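
The slides do not say which measure produced the grade levels quoted above. One widely used measure is the Flesch-Kincaid grade level, sketched below in Python as an assumption rather than as the method behind those figures. The syllable counter is a crude vowel-group heuristic, so results are approximate:

```python
import re

def count_syllables(word):
    # Rough heuristic: each run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

plain = "Sign the form. Pay the fee."
dense = "Registration documentation must be submitted to the department immediately."
print(flesch_kincaid_grade(plain) < flesch_kincaid_grade(dense))  # True
```

Shorter sentences and shorter words both lower the score, which is the effect the Plain English review is aiming for.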
|
|
17
|
- Identify and add DMV-specific terms and phrases to DMV Technical
Dictionary
- Core dictionary & custom/technical dictionary
- Develop in accordance and simultaneously with optimization (Plain
English) process
- Leverage existing dictionaries of DMV stakeholders and knowledgeable
staff
|
|
18
|
- Has far-reaching and positive consequences which affect many
stakeholders & publications
- Creates an ontology or taxonomy for DMV knowledge management (what is
there?)
- Forces content creators to really think about what they’re saying (Ex:
license plates)
- Creates a consistent language across web site and other publications
(Ex: “license” terms)
- Helps to define how information is organized and presented (content
management)
|
|
19
|
- MAXit Controlled English Checker
- Only software solution that integrated a dictionary building tool, a
controlled English writing tool, and training for writers.
- Created for international airline manufacturing industry
- Based on “one word, one meaning”
- Optimizes source content according to MT recommendations
- Intended to remove ambiguity, create consistent style at easy reading
level
- Identifies common problems in source text before applying MT
- Maintains a consistent style of writing within a workgroup
- Forces the writer to use standard terminology and spelling
- Text Miner: analyzes words for frequency, usage and importance in
context; saved us time in developing a technical lexicon
- Lexicon Manager: Helps create and manage custom dictionaries
- Export tool to SDL Dictionary Manager
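
The kind of frequency analysis described for the Text Miner can be pictured with a small Python sketch. This is not the product's algorithm; the stopword list, the function name `candidate_terms`, and the sample pages are assumptions for illustration. It counts single words and two-word phrases so that recurring domain terms surface as lexicon candidates:

```python
import re
from collections import Counter

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "your", "you", "for", "in", "is", "must"}

def candidate_terms(pages):
    """Count non-stopword words and two-word phrases across all pages."""
    counts = Counter()
    for page in pages:
        words = re.findall(r"[a-z]+", page.lower())
        for word in words:
            if word not in STOPWORDS:
                counts[word] += 1
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return counts

pages = [
    "Renew your driver license.",
    "A driver license is required.",
    "Replace a lost license.",
]
counts = candidate_terms(pages)
print(counts["license"])         # 3
print(counts["driver license"])  # 2
```

Terms that recur across pages, such as "driver license" here, are the ones worth defining once in a technical dictionary.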
|
|
20
|
- MAXit Controlled English Checker
- SMART Lexicon Manager
- SDL International Dictionary Manager
|
|
21
|
|
|
22
|
- Brief Timeline
- Project Process (content phases)
- Project Personnel (resources)
- Project Management Tools (databases)
|
|
23
|
- 12/01 – Initial Research
- 1/02 – Presentation to DMV Executive Guidance Committee
- 3/02 – Contracts signed with vendors via ASAP; training provided by
vendors
- 4/02 – Pilot Phase begins with initial content “cluster”
|
|
24
|
- Pilot Phase – Driver License cluster
- Phase 2 – Home Page, various content menu pages, second-level menu pages
- Phase 3 – RightNow Web FAQs & interfaces
- Phase 4 – Registration & Title cluster
- Phase 5 – remaining translatable content
- Remaining Content – transactions, forms, etc.
|
|
25
|
- Project Manager
- Web Site Content Manager
- Plain English Review Team
- DMV Webmasters
- Internet Services Manager
- Spanish Translation Review Team
|
|
26
|
- Project web page on intranet
- Project Database
- Custom built in MS Access
- Provides a variety of tools
- Accessible by all project personnel via LAN
- Built into process to provide accurate PM tracking data
|
|
27
|
- Team of volunteers (downstate CS reps)
- Bought into PE and MT process as a way of possibly lightening their
Spanish interactions
- Two levels of Spanish MT review for approval
- Project Mgr. initial review (obvious problems, PE)
- Native speaker expert review (fine tuning, terms)
- Web page revision, dictionary update after each
- Review process (no Internet access @ work)
- PE version on DMV intranet mirror site
- MT version from Internet saved as HTML on LAN
- Changes in Word, or hardcopy mark-up
|
|
28
|
- 8/02 – Pilot Phase goes live
- Site visitor access – home page
- Welcome page / Disclaimer
- “Hardcoded menu” of selected content cluster
- 12/31/02 – “FollowLink” feature brings full site content on-line with
Spanish MT
- Site visitor access – home page
- Welcome page / Disclaimer
- FollowLink Feature
- Content not translated
|
|
29
|
- www.nysdmv.com in Spanish
|
|
30
|
- MAXit Controlled English Checker
- government vs. technical language
- program fixes
- Word HTML code stripping (CSS)
- Temporary Dictionary feature
- Translation of graphic banners
- not possible without dynamic content
- suggested <IMG ALT> solution
- FollowLink Feature
- not really a site navigation feature
- bookmarks a problem
- blocking content from translation
- visitor trapped in Spanish, can’t get out
- Translation Issues
|
|
31
|
- Required constant vigilance on site updates
- Fluctuating personnel resources caused delays
- Budgetary process caused delays
- Translation accuracy is slave to available resources, not the fault of
the technology
|
|
32
|
- Still in PE Review Process for Phases 5 and 3 (RNW)
- Bring RightNow Web on-line with MT (1st)
- Still evaluating Forms pages (menu, titles)
- Still evaluating Transaction pages (PE)
- Still requires dictionary clean-up & more lexicon building
(acronyms)
|
|
33
|
- Sell rest of agency on benefits of Plain English and controlled
vocabulary
- Get more content creators to use MAXit
- Coordinate translation services to utilize existing MT translations as
drafts for publications
- Install ETS plug-ins for email and other Office applications
|
|
34
|
- Plain English review process
- 94% of targeted web pages completed process (RNW: 36%)
- Pages being translated
- 81 web pages currently using MT (some 20+ printed pages)
- Spanish review process
- 16% of targeted web pages completed process
- Dictionary terms
- PE dictionaries: 6,669 terms; Spanish dictionaries: 2,126 terms
- User Stats
- 7,213 translations/month avg. (in prediction range)
- 50K translations by 7/03
|
|
35
|
- Well received in MT and CE industries
- example of PE to soften government-speak (EU)
- example for international companies (Daimler-Chrysler)
- example of 1st time ever that PE and MT applications were used
together to create site content (SDL PE)
- Featured in Information Week (4/9/03) article on MT
- Compared to users, virtually NO feedback from customers
|
|
36
|
- Project Manager
- Richard Vang (474-2570)
- rvang@dmv.state.ny.us
- DMV Web Site Content Manager
- George Filieau (486-6596)
- gfili@dmv.state.ny.us
- DMV Webmaster
- Holly New (474-2644)
- hnew@dmv.state.ny.us
|