What is Statistical Machine Translation?

Over the past few years, machine translation tools have been transformed as rule-based systems have given way to statistical language analysis techniques, which use known translations (e.g., United Nations archives and other open content) to derive nuances and meanings that rule-based systems do not easily capture. Tools like Google Translate have used statistical methods to bring machine translation to the point where it is a viable, low-cost, and easy option for rapid, automated translation of web content. While the translation tools are not yet perfect, they are fairly accurate in most cases and are well suited to credible on-the-fly translation.

The ability to embed translation tools quickly and easily into websites, so that viewers can choose a preferred language, removes the need to prepare separate copies of online material in each language. This simplifies upkeep and makes it easier to deploy new content quickly. Statistical machine translation is an increasingly robust, low-cost option that has developed to the point where it is a viable solution for museums looking to make general information available in multiple languages.


(1) How might this technology be relevant to the educational sector you know best?

  • Translation of "foreign language" academic literature into your own language. - tony.hirst May 14, 2011
  • Statistical machine translation maps real-time text or speech to large data sets of known translations and returns the translation with the highest probability of a match. The first such set used by Google was the public transcripts of sessions over the 60+ year history of the United Nations, which spanned 50+ languages and hundreds of thousands of pages. I am intrigued not only by the technology, but by the very idea of using the records of human discourse in this way. The field is about language, but also computer science, math, and of course statistical probability. Add in the rich historical underpinnings in the data, and how could this not be relevant to edu? - Larry May 16, 2011
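
To make the "highest probability of a match" idea in the responses above concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the aligned phrase pairs are invented, the names `ALIGNED_PAIRS`, `build_phrase_table`, and `translate` are hypothetical, and real systems (Google Translate included) learn alignments from very large corpora and combine them with further models rather than looking up whole phrases like this.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made aligned phrase pairs standing in for a large
# parallel corpus of known translations. Invented for illustration only.
ALIGNED_PAIRS = [
    ("bonjour", "hello"),
    ("bonjour", "hello"),
    ("bonjour", "good morning"),
    ("merci", "thank you"),
    ("merci", "thanks"),
    ("merci", "thank you"),
    ("merci", "thank you"),
]


def build_phrase_table(pairs):
    """Estimate P(target | source) as the relative frequency of each pair."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1

    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: n / total for t, n in targets.items()}
    return table


def translate(phrase, table):
    """Return the most probable known translation and its probability.

    Returns (None, 0.0) when the phrase never appeared in the data.
    """
    candidates = table.get(phrase)
    if not candidates:
        return None, 0.0
    best = max(candidates, key=candidates.get)
    return best, candidates[best]


PHRASE_TABLE = build_phrase_table(ALIGNED_PAIRS)
```

Calling `translate("merci", PHRASE_TABLE)` picks "thank you" because it is the most frequent pairing in the toy data. The sketch also shows why coverage matters: a phrase that never appears in the known translations gets no candidate at all.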

(2) What themes are missing from the above description that you think are important?

  • There may be an issue here, though, in that statistical machine translation methods may not work very well when translating formal academic language. - tony.hirst May 14, 2011
  • Speech recognition: Android phones and the Chrome browser, at least, now support voice input, again based on statistical models. In the US, I think you can "sign in" to Google voice input and so benefit from personally training the borg with a view to improving recognition of your voice. Voice input plays naturally with translation and text-to-speech services, as Google Translate demonstrates: http://googlesystem.blogspot.com/2011/04/google-translate-now-with-voice-input.html - tony.hirst May 14, 2011
  • I think it is worth noting that most schools (that I know of) explicitly steer children away from the use of these tools currently - partly, I suspect, because it is seen as 'cheating' and partly because the results aren't good enough for language learning. - andy.powell May 19, 2011

(3) What do you see as the potential impact of this technology on teaching, learning, research or information management within the next five years?

(4) Do you have or know of a project working in this area?

Please share information about related projects in our Horizon Project sharing form.