Connection between human translation and computerized translation programs

Date:
July 17, 2014
Source:
University of Haifa
Summary:
New research reveals a number of discoveries about the unique linguistic features of text that has been translated by a person, discoveries that can significantly improve the capabilities of computerized translation programs.

Have you ever input text in some language into "Google Translate" and received a translation that seemed too superficial? A new study conducted at the Department of Computer Science at the University of Haifa presents a number of new findings about the unique linguistic features of text that has been translated by a person, findings that can significantly improve the capabilities of computerized translation programs.

"There are significant statistical differences between text that was originally written in a certain language, and text that was translated into that language by a person, no matter how talented the translator. The human reader may not be able to detect these differences but a computer can identify them with perfect accuracy," says Professor Shuli Wintner, Head of the Department of Computer Science, who is heading this project.

Automatic translation software programs such as Google Translate have become useful tools for almost every home, and they yield translations ranging from reasonable to very good in a wide range of languages. However, there are quite a number of errors and inaccuracies even when translating between languages that are close to each other, especially in long sentences. Attempts to develop translation software date back to the 1950s, when the predominant method was based on a large bilingual dictionary and a great number of grammar rules that characterize correlations between different languages.

This approach, however, failed to provide good results until the early 1990s when researchers at IBM suggested changing the method's paradigm. Translation systems began to be based on two main statistical models that estimate two things: the probability of sequences of words in the target language -- the language we wish to translate into ("language model") -- and the probability that a particular sequence of words in the source language will be translated into a particular sequence in the target language ("translation model").

A statistical translation program needs to scan a vast number of texts in order to obtain good estimates: the language model is based on a large collection of texts in the target language, whereas the translation model is compiled from "parallel texts." Parallel texts are texts that were translated (by professional translators) from the source language into the target language, and from which the model learns to match sequences of words in both languages. Translation programs combine these two models in order to determine which translation is the best for a given sentence: the translation model ensures a translation that is true to the source, and the language model ensures fluency in the target language.
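
For readers curious how these two models might be combined in practice, here is a minimal Python sketch in the spirit of the statistical approach described above. It is only illustrative: the source sentence, candidate translations, and probability values are invented, and real systems learn these scores from very large corpora.

```python
import math

# Toy "language model": probability of a target-language word sequence.
# In a real system these scores come from statistics over a large
# target-language corpus; the numbers below are invented for illustration.
language_model = {
    ("the", "house", "is", "big"): 0.020,
    ("the", "house", "is", "large"): 0.012,
    ("big", "is", "the", "house"): 0.001,
}

# Toy "translation model": probability that the source sentence maps to a
# given target sequence (again, invented numbers standing in for statistics
# learned from parallel texts).
translation_model = {
    ("das Haus ist gross", ("the", "house", "is", "big")): 0.30,
    ("das Haus ist gross", ("the", "house", "is", "large")): 0.25,
    ("das Haus ist gross", ("big", "is", "the", "house")): 0.35,
}

def best_translation(source, candidates):
    """Pick the candidate maximising log P(target) + log P(source | target),
    i.e. fluency (language model) plus faithfulness (translation model)."""
    def score(cand):
        lm = language_model.get(cand, 1e-9)
        tm = translation_model.get((source, cand), 1e-9)
        return math.log(lm) + math.log(tm)
    return max(candidates, key=score)

candidates = [c for (s, c) in translation_model if s == "das Haus ist gross"]
print(best_translation("das Haus ist gross", candidates))
# -> ('the', 'house', 'is', 'big'): the fluent candidate wins even though
#    the translation model alone slightly prefers a less fluent word order.
```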

However, findings in translation studies indicate considerable differences between texts that were originally written in a given target language and texts that were translated into that language from another. This study, conducted at the University of Haifa, found that these differences affect how accurately the translation program translates. "No matter how good and successful the human translator is, the language in which a given text is written -- the source language -- leaves 'fingerprints' on the resulting translation. There also seems to be a cognitive load during the translation process that leads to a final product that is significantly different from texts that were originally written in the same language. The human reader may not be able to tell the difference between a document originally written in Hebrew and one that was translated from English into Hebrew -- but the computer can distinguish between them," Prof. Wintner explained.

In earlier studies conducted as part of the project, Prof. Wintner and his research partners, Dr. Noam Ordan and research student Vered Valensky, identified the key linguistic features that distinguish source texts from translated texts. It turns out that the differences are not the result of language richness or of sentence length; they stem from unexpected features such as punctuation. "We discovered that text in English that was translated from German had five times more exclamation marks than source text in English," he explained. "However, the most significant characteristics of translated text are its different syntactic structures."
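
As a rough illustration of how such surface features could feed a classifier that separates original from translated text, here is a hedged sketch. The feature set, the tiny labeled corpus, and the example sentence are invented for illustration; the actual study relied on much richer features, chiefly syntactic ones.

```python
# Sketch of a "translationese" detector: the features (punctuation rates,
# mean word length) and the tiny training set are invented for illustration.
from sklearn.linear_model import LogisticRegression

def features(text):
    words = text.split()
    n = max(len(words), 1)
    return [
        text.count("!") / n,             # exclamation-mark rate
        text.count(",") / n,             # comma rate
        sum(len(w) for w in words) / n,  # mean word length
    ]

# label 0 = originally written in the language, 1 = translated into it
corpus = [
    ("The committee met on Tuesday and approved the budget.", 0),
    ("Results were mixed, and the board deferred its decision.", 0),
    ("This is, however, a truly remarkable result!", 1),
    ("One must, of course, consider the context very carefully!", 1),
]

X = [features(text) for text, _ in corpus]
y = [label for _, label in corpus]
clf = LogisticRegression().fit(X, y)

# Predict the label of a new, unseen sentence.
print(clf.predict([features("A surprisingly strong effect was observed!")]))
```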

New research results were obtained by Dr. Gennadi Lembersky in his doctoral dissertation, written under the direction of Prof. Wintner together with Dr. Ordan. The study found that for a program to be more precise, the direction of translation of the parallel texts from which the translation model is compiled needs to match the direction in which we wish to translate. In other words, when we want to translate text from English into Hebrew, we need to compile the translation model from texts that were translated from English into Hebrew, not from texts that were translated from Hebrew into English. While this seems obvious, the second finding is more surprising: statistical translation programs are much more accurate when their language model is based on texts that have been translated into the target language -- that is, translation from English into Hebrew by a program whose language model was compiled from Hebrew texts that had been translated from English was better and more accurate than that of a program whose language model was based on texts written originally in Hebrew. For these findings, the doctoral thesis received the Best Thesis Award for 2013 from the European Association for Machine Translation (EAMT).
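
One simple way to see why the provenance of the language-model corpus matters is to compare how well two language models -- one estimated from translated text, one from original text -- predict a held-out translated sentence; lower perplexity means a better fit. The sketch below uses a tiny add-one-smoothed unigram model and invented toy corpora, so it illustrates only the methodology, not the study's actual experiments.

```python
import math
from collections import Counter

def unigram_lm(corpus_tokens, vocab):
    """Add-one-smoothed unigram model over a fixed vocabulary."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def perplexity(model, tokens):
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))

# Invented toy corpora: one of text translated into the target language,
# one of text originally written in it. Real experiments use millions of words.
translated = "the house is very big and the garden is very big".split()
original   = "the huge house overlooks a sprawling garden".split()
test       = "the house is big".split()   # a held-out translated sentence

vocab = set(translated) | set(original) | set(test)
ppl_translated = perplexity(unigram_lm(translated, vocab), test)
ppl_original   = perplexity(unigram_lm(original, vocab), test)
print(ppl_translated, ppl_original)
# On this toy data the model built from translated text assigns the test
# sentence a lower perplexity, i.e. it fits the translated text better.
```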

Prof. Wintner says he believes that within ten years computerized translation programs will be so accurate for a number of language pairs that it will not be possible to distinguish computer-generated translations from human ones. "Over the past twenty years, computerized processing of languages has shifted to using statistical models alone rather than explicit linguistic knowledge. We have shown that awareness of the linguistic features of text -- in our case, the linguistic features of human translation -- can also significantly benefit applications that are essentially statistical. In the future, we will need to move towards an approach that combines both," he concluded.


Story Source:

The above story is based on materials provided by University of Haifa. Note: Materials may be edited for content and length.


Cite This Page:

University of Haifa. "Connection between human translation and computerized translation programs." ScienceDaily. ScienceDaily, 17 July 2014. <www.sciencedaily.com/releases/2014/07/140717094605.htm>.
University of Haifa. (2014, July 17). Connection between human translation and computerized translation programs. ScienceDaily. Retrieved September 16, 2014 from www.sciencedaily.com/releases/2014/07/140717094605.htm
University of Haifa. "Connection between human translation and computerized translation programs." ScienceDaily. www.sciencedaily.com/releases/2014/07/140717094605.htm (accessed September 16, 2014).
