Informatica 37 (2013) 455–456

A Fast Implementation of Rules Based Machine Translation Systems for Similar Natural Languages

Jernej Vičič

Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska

E-mail: jernej.vicic@upr.si

http://www.jt.upr.si/doktoratjernej/thesis/final/

Thesis Summary

Keywords: machine translation, machine translation of related languages, shallow transfer RBMT, RBMT

Received: March 25, 2013

This paper is an extended abstract of the doctoral thesis [1]. It presents an overview of systems and methods for machine translation of natural languages. It focuses primarily on systems and methods for shallow-transfer rule-based machine translation, which are better suited to the translation of related languages. The major problem of rule-based translation systems is the costly manual production of dictionaries and translation rules when such systems are built with the classical approach. The work provides an overview of a collection of selected and new methods designed for the automatic production of the materials needed to set up systems based on translation rules.

Povzetek (Summary): This work is an extended summary of the doctoral dissertation [1]. It presents an overview of machine translation of natural languages, focusing mainly on systems and methods for shallow-transfer rule-based translation, which are best suited to related natural languages. The biggest problem of rule-based systems is the time-consuming and expensive manual production of dictionaries and translation rules under the classical approach to building rule-based translation systems. The work offers an overview of a collection of selected and newly designed methods for the automatic production of materials for setting up rule-based translation systems.

1 Introduction and problem statement

The paper presents an attempt to automate all data creation processes of a rule-based shallow-transfer machine translation system, together with the background of that work. Several methods that automate parts of the construction of a shallow-transfer Rule Based Machine Translation (RBMT) system have been presented, and some are used in construction toolkits such as Apertium [2], a widely used open source toolkit for creating machine translation systems between related languages.

Parts of the creation process have been addressed by several authors; some of these technologies were used in our experiments along with newly developed methods. All methods and materials discussed in this paper were tested on a fully functional machine translation system based on Apertium. The system uses an architecture similar to the one presented in Figure 1.

Statistical Machine Translation (SMT) might seem a perfect choice, as some of the best performing machine translation systems are based on SMT technologies. However, the stochastic approach has a couple of drawbacks that cannot be ignored; in particular, SMT systems require huge amounts of parallel text to be successful.

Another reason for choosing the RBMT approach is the nature of the languages involved in our experiments (Slovenian paired with Serbian, Czech, English and Estonian). These are languages with rich inflectional morphology, and as such they present a big problem for SMT.

The last, but not least, reason for using an RBMT system is that it gives linguistic experts the chance to further refine the automatically produced data and thus to improve the system in a controlled way.

2 Methodology

The modules presented in Figure 1 and numbered 1 through 5 require linguistic data (monolingual dictionaries, bilingual dictionaries, translation rules, etc.).

Each module was examined and a method for linguistic data creation was designed.

The following types of data are needed for all modules of the system: the monolingual source dictionary with morphological information for source language parsing, the monolingual target dictionary with morphological information for target language generation, the bilingual translation dictionary, finite-state rules for shallow transfer and local agreement, a statistical target language model, and modeled source language tags.

Figure 1: The modules of a typical shallow transfer translation system. The system [2] follows this design. An addition to the original architecture is the local agreement module, tagged as number 6.
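How these resources interact can be illustrated with a toy sketch. It is not the thesis implementation and not Apertium's API: the `Token` type, the three small dictionaries, their entries, and the single adjective-noun agreement rule are all hypothetical, and the statistical target language model and source language tagging are left out. The sketch only shows why a local agreement step is useful when a noun changes gender in translation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    """A lemma with a (hypothetical) minimal set of morphological tags."""
    lemma: str
    pos: str
    gender: str
    number: str
    case: str


# Monolingual source dictionary: surface form -> analysis (toy entries).
SOURCE_DICT = {
    "lepa": Token("lep", "adj", "f", "sg", "nom"),
    "miza": Token("miza", "n", "f", "sg", "nom"),
}

# Bilingual dictionary: source lemma -> (target lemma, tag overrides).
BILINGUAL_DICT = {
    "lep": ("lep", {}),
    "miza": ("sto", {"gender": "m"}),  # the noun changes gender
}

# Monolingual target dictionary: analysis -> surface form (toy entries).
TARGET_DICT = {
    Token("lep", "adj", "m", "sg", "nom"): "lep",
    Token("lep", "adj", "f", "sg", "nom"): "lepa",
    Token("sto", "n", "m", "sg", "nom"): "sto",
}


def analyse(sentence):
    """Source language parsing: look up each word in the source dictionary."""
    return [SOURCE_DICT[word] for word in sentence.split()]


def lexical_transfer(tokens):
    """Replace each lemma via the bilingual dictionary, applying tag overrides."""
    result = []
    for tok in tokens:
        lemma, overrides = BILINGUAL_DICT[tok.lemma]
        result.append(replace(tok, lemma=lemma, **overrides))
    return result


def local_agreement(tokens):
    """Copy gender, number and case from a noun to the adjective before it."""
    result = list(tokens)
    for i in range(len(result) - 1):
        if result[i].pos == "adj" and result[i + 1].pos == "n":
            noun = result[i + 1]
            result[i] = replace(result[i], gender=noun.gender,
                                number=noun.number, case=noun.case)
    return result


def generate(tokens):
    """Target language generation: look up each analysis in the target dictionary."""
    return " ".join(TARGET_DICT[tok] for tok in tokens)


def translate(sentence):
    return generate(local_agreement(lexical_transfer(analyse(sentence))))
```

In this toy setup, skipping `local_agreement` leaves the adjective in the feminine form next to a masculine noun, while the full `translate` chain repairs the mismatch.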

3 Evaluation methodology and results

The evaluation focused only on translation quality; the translation speed and responsiveness of the system, user-friendliness, and other features of the translation systems are not presented. The following methods were used: automatic objective evaluation using the METEOR [3] metric, non-automatic evaluation using weighted Levenshtein edit distance [4] on human-corrected output of the translation system, and non-automatic subjective evaluation following the guidelines in [5]. The translation system was constructed according to the methodology presented in Section 2 using the selected training set. The evaluated values in each fold and the average final values are presented.
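As an illustration of the second method, a weighted Levenshtein distance between a system output and its human-corrected version can be sketched as follows. The summary does not state the weights or the unit of comparison, so the word-level comparison, the function name `weighted_levenshtein`, and the default costs of 1.0 are assumptions made only for this example.

```python
def weighted_levenshtein(hyp, ref, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Minimum total cost of edits turning the sequence hyp into ref.

    hyp and ref are sequences of tokens (here: words). The three costs are
    hypothetical defaults, not the weights used in the thesis.
    """
    m = len(ref)
    # prev[j] = cost of turning an empty prefix of hyp into ref[:j]
    prev = [j * ins_cost for j in range(m + 1)]
    for i in range(1, len(hyp) + 1):
        # curr[0] = cost of deleting the first i tokens of hyp
        curr = [i * del_cost] + [0.0] * m
        for j in range(1, m + 1):
            same = hyp[i - 1] == ref[j - 1]
            curr[j] = min(
                prev[j] + del_cost,                        # delete hyp[i-1]
                curr[j - 1] + ins_cost,                    # insert ref[j-1]
                prev[j - 1] + (0.0 if same else sub_cost)  # match or substitute
            )
        prev = curr
    return prev[m]
```

Lower values mean that the post-editor had to change less. With different weights, for example a cheaper substitution cost, corrections that only change a word form can be penalised less than inserted or deleted words.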

4 Discussion and further work

The agreement among all three evaluation methods is quite high, which indicates that the results of the evaluation process are valid. The translation quality of the Slovenian-Serbian translation system is higher than that of the systems for the more distant language pairs. This can be attributed to the greater similarity of the first language pair.

The automatically generated linguistic data is far from perfect, and additional manual work will be needed to obtain better translation quality.

References

[1] J. Vičič, “Hitra postavitev prevajalnih sistemov na osnovi pravil za sorodne naravne jezike,” Ph.D. dissertation, Univerza v Ljubljani, 2012. [Online]. Available: http://eprints.fri.uni-lj.si/1778/

[2] S. A. M. Corbi-Bellot, M. L. Forcada, Ortiz-Rojas, “An open-source shallow-transfer machine translation engine for the Romance languages of Spain,” in EAMT, 2005, pp. 79–86.

[3] A. Lavie and M. J. Denkowski, “The Meteor metric for automatic evaluation of machine translation,” Machine Translation, vol. 23, no. 2-3, pp. 105–115, Sep. 2009.

[4] K. S. Fu, Syntactic Pattern Recognition and Applications. Prentice-Hall, Englewood Cliffs, NJ, 1982.

[5] LDC, “Linguistic data annotation specification: Assessment of fluency and adequacy in translations,” LDC, Tech. Rep., 2005.
