Machine translation: your experience with the various MT programmes? ("state of play")
Thread poster: Barnaby Capel-Dunn
Susan Welsh
Susan Welsh  Identity Verified
United States
Local time: 19:30
Russian to English
+ ...
@Jeff re Promt / Creole / training Feb 27, 2010

Hi Jeff,
I replied to you off-list, but didn't hear back--maybe you don't check that e-mail often.

I might be interested, depending on what's involved and how much time. I've got only 18 days left on my Promt Professional trial version.

I have continued to try to learn it on my own, as time allows, and find it difficult and time-consuming. As with most documentation that comes with translation-related software (all software?), the instructions leave a lot to be desired, to the extent of sometimes being incomprehensible to the newbie.

In your post, you described your workflow for creating a document that was, from what you say, very rough--barely readable, not a polished translation such as translators need to produce. From where I sit, it looks as though this sort of system might be useful for someone who is crunching through a large volume of fairly standardized, similar texts. But for a freelance translator, I don't see it as useful at all. You say the issue is not editing, but choosing the words that should be put into the dictionary. I don't get it. If you don't edit it heavily, you're going to end up with something that is raw, barely comprehensible--in other words, a mess.

Perhaps I'm missing something.

Susan


 
Jeff Allen
Jeff Allen  Identity Verified
France
Local time: 01:30
Multiple languages
+ ...
Very swamped to reply: here is more info Feb 27, 2010

Hi Susan,
I check all of my emails dozens of times a day.
I'm simply swamped with tasks to organize a lot of things that are happening at various levels to get language content and technologies into place for Haiti Disaster Relief. Writing to important government officials to release language content to train technologies.
Discussing with executive directors and managers in various organizations to keep bureaucracy from quenching the initiatives.
Making deals with various technology developers.
Putting different players in contact with one another, to make sure the right people are talking, so as not to waste time and to avoid gatekeepers blocking things.

I've had several high-priority tasks sitting on my desk for the past 10 days, and they are in competition with many newer high-priority tasks coming in.

All in my free time, and of course dependent on that available time. I've started to at least sleep a few full nights this past week, and the fact that I'm sleeping more now means that I'm much less reactive to emails than a few weeks ago, when I was hardly sleeping at all.

Trying to identify people to volunteer, and to delegate tasks to them. A couple of attempts didn't work well and I had to do clean-up work afterward. So I need to choose the participants more carefully and make it clear that following the documented task exactly as specified avoids a lot of headaches later. Every 5-minute block of my free time right now is very precious. Can't waste much of it.

Don't worry about the number of days left in your trial period; I can arrange anything that is needed for those who want to participate.
But I now need a couple of people to volunteer to do project management of a couple of critical projects that not only affect this Haiti project, but are also now going to be instrumental for the Chile earthquake needs.

Volunteers needed:


Project 1:

* one or more people with project management experience in time-coded audio/video file subtitling/captioning, for a single-speaker presentation (no multiple-speaker issues and no extremely time-constrained sequences to handle; those who know the field know what I'm referring to). Preferably with experience in online subtitling platforms and in how to migrate transcriptions/translations done offline in text editors (like MS Word) to online captions. If not audio/video captioning, at least website/software L10n experience.

* transcript reviewers (FR speakers)

* Haitian Creole transcriber(s) for a 10 min video clip. Also need a couple of Haitian Creole reviewers to check the transcription work.

* FR>EN reviewers of MT output (of highly customized MT, not MT postediting of very poorly implemented MT projects)

* EN > all other target languages (all possible languages)
- MT dictionary creators (I'm creating the EN source dictionary)
- MT posteditors and reviewers
- translators for those languages where MT is not available
- source text is 6200 words of FR source + 10 min of Haitian Creole still to be transcribed, which will be translated into FR as source and then also made into EN source as pivot to all other languages (for example, I've already got some volunteers for EN > PL/RU/DA/BG; a first draft of the FR>ES transcript is already finished)

Project 2:

* EN source documents: 2 documents (interviews) of 650 words and 2200 words. Need to have translated into as many languages as possible. I'm currently negotiating the copyright issues.
* Same possibility to implement MT customized dictionaries, but for these small projects it might be just better to do the translation without MT with anyone who wants to participate.
* Timing is key. Crowdsourcing approach experience is best.

This is not an amateur project. It's not an example of the many junk projects I've heard recounted here in these forums, where freelance translators receive completely unprepared MT projects from someone who just clicked on a button and handed over the output text without putting any thought into it.
It's bringing a lot of expertise to the table, but I need people who are willing to use the skills mentioned above and to participate very actively and creatively in what else can be learned. And those who participate need to be willing to follow the instructions exactly as stated, so as not to waste time or create situations where tasks need to be redone.

But I need a few project managers to come in and help. I can't do it alone any longer. I need to focus my efforts and energy on unlocking access to Haitian Creole language text files. I've already documented a lot in detail, so it's not a typical "customer throws you something with no explanation" project. Once I see there are some volunteers who really want to help, I'll take the extra 30 minutes to add more comments to the existing specs.

As for the document workflow explanation, I'm maintaining all saved versions of each set of changes. Can't explain it all here. I'm writing up the high-level and detailed-level task list and specs of the projects.

It's not a barely readable document. It is a transcript made from 1.5 hours of extemporaneous speech, by non-professional transcribers. When I relisten and read the transcript, I catch things and have to modify them (I was the one who gave the video seminar).

All of this is volunteer work. I've already devoted 100+ hours over the past 5 weeks (and now all of my free time is managed in a well-documented task list, including estimated time effort per task and how much time was spent). I've already gone through over 100 urgent/critical tasks since the beginning, with 5-10 tasks being added per day.
I now seriously need several people with the skill sets above to donate some of their time (2 hours, 5 hours, 10 hours) over the next 1-2 weeks to make all of this happen.

I realize now that maybe this needs to be sent to the jobs area of these forums. No time to even do that.

Anybody willing to help?
All languages open to participating.

Jeff



 
Susan Welsh
Susan Welsh  Identity Verified
United States
Local time: 19:30
Russian to English
+ ...
Link posted to thread on pro bono work Feb 27, 2010

Re Jeff's requests: I posted a link to this thread, here:

http://www.proz.com/forum/getting_established/4369-good_sources_of_volunteer_pro_bono_translation_work.html#1337522

My language pairs are not what you want, Jeff.


 
Jeff Allen
Jeff Allen  Identity Verified
France
Local time: 01:30
Multiple languages
+ ...
many thanks Susan Feb 27, 2010

Thanks very much, Susan, for your help. Networking is very valuable, and just that gesture of putting it in an appropriate forum could be a key thing.

My evenings, nights and weekends are extremely packed with this Haiti project stuff. Hoping to see a breakthrough on a couple of fronts in coming days.

Jeff


 
Vladimir Shelukhin
Vladimir Shelukhin  Identity Verified
Local time: 02:30
English to Russian
+ ...
In memoriam
Receive ready! :-) Feb 27, 2010

Jeff Allen wrote:
Anybody willing to help?
All languages open to participating.
I volunteer to donate 5-10 hours of my time next week. I'd be happy to take the opportunity to gain my first hands-on experience in English-Russian MT system training and/or machine output post-editing.


 
Vladimir Shelukhin
Vladimir Shelukhin  Identity Verified
Local time: 02:30
English to Russian
+ ...
In memoriam
Thanks Feb 27, 2010

Susan Welsh wrote:
Re Jeff's requests: I posted a link to this thread…
It was a good idea, thank you for re-posting.


 
Susan Welsh
Susan Welsh  Identity Verified
United States
Local time: 19:30
Russian to English
+ ...
@Jeff on quality of MT product Mar 2, 2010

Jeff, when I referred to your description of your MT product as indicating something "very rough--barely readable, not a polished translation such as translators need to produce," I was referring to this:

Jeff Allen wrote:

2 English native speakers have read draft and stated in my Facebook entry:

"I have skimmed through the English text and while it is clunky and reads very much like French in terms of syntax, it is understandable."

and

"That English translation isn't pretty, Jeff, but someone with exposure to the Romance languages (word order, styles of idioms) should be able to follow it."



For your present purposes, that's probably fine; but not for the daily work of a freelance translator, where every job is different and the output has to be excellent.

I don't mean to nitpick, I'm just clarifying what I meant earlier in this thread. Forge ahead! You clearly have a lot of energy, but I think you've probably bitten off more than you can chew (training new people at the same time as getting huge volumes of work done).

Susan


 
Jeff Allen
Jeff Allen  Identity Verified
France
Local time: 01:30
Multiple languages
+ ...
this MT project is being carefully planned Mar 3, 2010

Susan Welsh wrote:

Jeff, when I referred to your description of your MT product as indicating something "very rough--barely readable, not a polished translation such as translators need to produce," I was referring to this:

Jeff Allen wrote:

2 English native speakers have read draft and stated in my Facebook entry:

"I have skimmed through the English text and while it is clunky and reads very much like French in terms of syntax, it is understandable."

and

"That English translation isn't pretty, Jeff, but someone with exposure to the Romance languages (word order, styles of idioms) should be able to follow it."



For your present purposes, that's probably fine; but not for the daily work of a freelance translator, where every job is different and the output has to be excellent.

I don't mean to nitpick, I'm just clarifying what I meant earlier in this thread. Forge ahead! You clearly have a lot of energy, but I think you've probably bitten off more than you can chew (training new people at the same time as getting huge volumes of work done).


Susan,

Those comments are due to the very specific nature of the transcribed text that is being translated.

Videos 1 and 2 are of me giving a seminar to Haitians (in Creole and in French) on the topic of text data and speech data collection, as well as Automatic Speech Recognition (ASR), Text to Speech (TTS) and Machine Translation (MT). I was literally inventing terminology in Haitian Creole (HC) nearly on the fly during the 10 min I spoke in Creole. It was verified by my 2 Haitian colleagues just before I walked in to give the seminar. However, keep in mind that the vocabulary/terminology on this topic was simply unknown in HC, because I was creating it in the language. We were playing the role of the "Haitian Creole Language Academy" there in that schoolroom.

This is not a prepared speech (except for the 10 min during which I spoke in HC). The rest of the 1hr20 min video is my presentation without any notes. I gave the entire seminar in French as extemporaneous speech, based on solid experience in the subject matter. This is a spoken genre of text which is very difficult for MT, and I don't know of any MT tool vendor that would touch it.

Very interactive, lots of questions/answers. It is truly an "ASR, TTS and MT for Beginners" session.

Mixture of languages: first 3 minutes in French, then HC for 10 min, then French again for 1 1/2 hours, with lots of examples in HC during the seminar in FR. It is very much like my doctoral thesis, which was in 3 languages (written in FR, with hundreds of example sentences/utterances in Creole, phonetic descriptions, and transliterated gloss forms used by field linguists). No MT software today can handle (out of the box) multiple languages in the same document. I know this very well and have been involved in multi-language text analysis for such systems.

This is not a perfect transcription. It is the best the transcribers could produce with the factors below that hindered the process.

- The Part 1 video was my colleague's first try at video recording something of this type. At the beginning of the session, he was at the back of the room (about 20+ feet from me) with no external microphone. So it is hard to hear, especially considering the following points:
- open-window building. It is possible to hear barking, and sometimes cars honking outside
- Also can hear background discussions of people near the cameraman at the beginning of the seminar. Their attention span got much better during the 1 1/2 hour session.
- sound quality improved for my speaking parts (nearly all of the content) over the two videos. The cameraman got closer to me at about the 15-minute mark for 1hr20 min of video
- the cameraman made spoken overlay comments from time to time (which need to be disregarded)
- questions from the audience are difficult to hear, but my answers are clearly understandable.
- the transcribers are not subject matter experts. However, as the reviewer doing the clean-up work, I am certainly the expert.


I knew that the first draft of the transcription in FR was not perfect, and I intentionally didn't review it or clean it up. I just told the transcribers to do their best, and then built a minimal FR > EN MT dictionary based not on analyzing/reviewing the entire document (because I didn't have time), but on some rapid analysis techniques.
And I specifically planned to do the analysis and clean-up work AFTER the first pass through MT with a minimal 55-entry custom dictionary. This is intentional, and planned, because it is a "troubleshooting step" which I have carefully described in several of my published MT dictionary building and MT postediting articles. This technique in phases has already been used for successful MT projects.

So this is completely different from nearly all of the other MT projects I have done, which are more along the lines of 2000 words in 47 minutes of a very carefully written enterprise-level software sales document, or 8000 words in 7.5 hours of a well-written descriptive text of a technical software testing document, or technical manuals, or marketing materials, or legal statutes of associations, or legal adoption papers, or human resources job postings, etc. All of those are examples of well-written documents because they are promoting the image and reputation of a company, an association or a cause.

But this MT project is completely outside those categories. In the past, I already dared to prove it was possible to successfully implement MT and dictionary customization in record time on press releases / marketing texts (which most MT tool vendors won't touch).

And now I'm daring again to take on another project, with fully logged time constraints and factors: 6000 words of extemporaneous speech. No one in the world would want to try and touch that, without being the subject matter expert on the topic.

And so I am, and I know exactly what I'm doing. I never announce such projects that I know won't be successful. I usually do them first and then write about them. But that's when there is time to plan it. This is the most wonderful example of a real project that just comes as-is, and what can be done in record time to provide the translation via MT. And on a genre of text that no one would normally want to touch with a 50-foot pole.

I've already planned out and arranged with specific MT vendors the dictionary building cycle based on the source dictionary that I'm creating.

And it will be possible to measure the improvements in output quality between phase 1, phase 2, phase 3, etc. Yet those phases are not weeks or months apart, but rather just a few hours apart. All of the documents are carefully version-controlled to be able to trace every single change made.
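
One way such a phase-to-phase comparison could be put into numbers is sketched below. This is only an illustration and not necessarily the measurement described in this post: it assumes a post-edited reference translation is available and counts word-level edits (insertions, deletions, substitutions) between each phase's raw MT output and that reference. The function names and the sample sentences are hypothetical, invented for the example.

```python
def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Levenshtein distance computed over whitespace-separated words."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] = edits needed to turn the hypothesis words seen so far into ref[:j]
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(hypothesis: str, reference: str) -> float:
    """Word edits per reference word; lower means closer to the reference."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis, reference) / ref_len


# Hypothetical data: one post-edited reference sentence and the raw MT
# output of two dictionary-building phases for that same sentence.
reference = "the speech recognition system needs recorded speech data"
phases = {
    "phase 1": "the system of recognition of word needs data of word recorded",
    "phase 2": "the speech recognition system needs data of speech recorded",
}

rates = {name: edit_rate(output, reference) for name, output in phases.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} edits per reference word")
```

Run over every version-controlled output file, a figure like this would give one number per phase to compare, although it says nothing about whether a given edit was actually necessary.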

And now, by implementing such a project for 10, 20, 50, 100 different languages, it will be possible with the same source document to demonstrate the viability of MT + dictionary customization on many different languages. The results will not be a set of ad-hoc, undefined comments about MT without any knowledge of the context involved. The entire project is baselined, with benchmarking milestones per phase. And on a project that is the same word count size as many jobs that freelancers receive.

And another cherry on top of the cake is that through this, I can demonstrate the impact of source language clean-up and optimization, because I am the author of the recorded video. So I can take the liberty of modifying the source language transcription, because I know what message I want to convey. And then it will be possible to measure the quality improvements attained by incremental volumes of 10 min, 20 min, 30 min of invested time.

The total amount of invested time on the production aspects of the project is actually quite small, and carefully logged. What takes all the time is this effort to get participants (what is known as vendor selection in the translation industry) and to coordinate the people (project management). And all of this is being done in my free time, with this sub-project being reprioritized every several hours due to other major sub-projects of the overall program.
However, all of this effort is being documented to show what is needed behind the scenes to have a successful project. It's not just a question of translating 6200 words of text.

And that's what all uneducated end customers think it is, just a bunch of words that can be typed in. Well, this project will be done, and then a project report will be written up, which will become a key baseline document for others to conduct their projects in the future.

But I need experienced project managers to participate, knowing very well that PM is one of the keys to success.

Jeff


 
Kirti Vashee
Kirti Vashee  Identity Verified
United States
Local time: 16:30
Another series of news articles comparing free online engines Apr 17, 2010

Recently several news organizations, including the NY Times and the LA Times, did comparisons of various MT systems.

Professionals should understand that these casual comparisons can be quite misleading, even though they may reflect the real experience of a casual Internet user. It is becoming clear that the statistical systems used by Google and Microsoft are doing much better than the old rule-based systems, but there are still many instances where good RBMT systems can outperform SMT systems.

I have written a blog entry about comparing free online MT systems that some may find interesting: http://kv-emptypages.blogspot.com/2010/03/ongoing-quest-for-best-mt-translation.html


 