It's a recurring element of near-future worlds: the phone (or video screen or implanted chip) that automagically translates spoken phrases to or from your own language, allowing you to converse freely with anyone. No longer shackled to a tour group or three months of an intensive language class, one could travel the world with ease, confident of the ability to communicate with the locals. I suspect this fantasy of a translation device is more prevalent among those of us in cultures less prone to learning multiple languages, but even multilingual global nomads would find occasional technical assistance useful. NEC's pocket spoken-word translation device, as discussed in a recent New Scientist, seems to be a step in this direction. Translating between Japanese and English, it's aimed at Japanese tourists.
I have my doubts about how well it will work. Machine language translation is a hard problem; if you want to get a sense of the difficulty, just try entering a phrase into Google's "Language Tools" page, translating it to another language, then translating it again back into English. While there are certainly machine translation systems better than that on a freely available webpage, I haven't seen evidence of any that are good enough to be of use beyond emergency circumstances. Translation of spoken language is even more difficult, in part because conversational speech recognition technology isn't quite ready yet, and in part because so much of human communication goes beyond words. Tone, body position, gestures, and especially the context of the world around you all play into the meaning of the words chosen.
Still, even if it doesn't work all that well, it's a harbinger of an inevitable development. At some point in the next couple of years, we'll have very rough spoken translation tools available to us; shortly thereafter, rough spoken translation tools will be embedded into mobile phones. After that, it's just a case of improvements in the hardware allowing for faster processing and improvements in the algorithms allowing for less-error-prone translation. I would expect that, by 2015 or so, it will be as hard to buy a mobile phone without a translation chip as it is now to buy one without a camera.
Eventually those two technologies could converge, or at least intertwine. I would expect a "what does that say?" cameraphone application to show up any month now. And it's possible that a camera, allowing the translation system to recognize body position, gesture and perhaps even context as well as the tone and the words, may be the necessary step to making machine translation really work.
Eek, one of my pet peeves:
"...try entering a phrase into Google's "Language Tools" page, translating it to another language, then translating it again back into English..."
If this really gave a good sense of the problem, then we would evaluate human translators in the same way: "Here, translate this English into Spanish, then translate the Spanish back into English, and we'll see if it's exactly the same as the original."
Second-generation translation is something we don't expect humans to do -- why should we evaluate machine translation on that basis? Most translators translate primarily *into* their native language.
The reason that I go to the trouble of pointing this out is that it seems to be the most popular metric for measuring how useful MT is, and I don't think it's a very realistic one.
A more useful comparison would be to translate a short, simple sentence and a long, literary sentence, and to see which translation is more comprehensible: flowery in, garbage out. :)
So maybe one of the skills we will have to learn in order to take advantage of MT in the near term is to *write* simply in the first place.
Pat, I understand what you're saying, and I think your ultimate point is a good one, but I disagree on a couple of things.
If you were to give the translate & translate back problem to a couple of human translators, you probably wouldn't get the exact phrase back in the end -- but what you would get would still very likely bear a close semantic relationship to the original. The meaning would remain intact, even if the phrasing didn't. This is not the case with the MT tools I've played with, even with a "short, simple sentence."
For example (English-German-English):
"How do I get to the embassy?" becomes
"Wie gelange ich an die Botschaft?" becomes
"How do I arrive at the message?"
(This is certainly anecdotal, and I'm sure you could find examples of more accurate translations, but it happens to be the phrase I tried yesterday when I originally wrote this post, imagining it as a common use of a tourist translator.)
I'm not saying that the problem is impossible, only that it's harder than some may expect (or want) it to be.
I do think that, as you suggest, functional use of spoken translator tools will rely on speaking in clear, short phrases. Whether that's a good thing for language -- it makes us speak with clarity -- or a bad thing -- it's the spoken version of IM speak -- remains to be seen.
Once we crack the natural language problem, a much tougher frontier would be translating between different points of view.
If done for marketing or political gain, "SpinSoft" would just be another weapon in the arsenal of misdirection. But if done to increase shared understanding, it could do an end run around our individual biases to find a style with which we agree - just as a good teacher uses many modes and metaphors to engage different students.