The Role of Homograms in Machine Translation

Home > Archive > 2018 > Volume 8 Number 2 (Apr. 2018) >

IJMLC 2018 Vol.8(2): 90-97 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2018.8.2.669

Lucia Nacinovic Prskalo and Marija Brkic Bakaric

Abstract—Croatian is a pitch-accent language in which the tone contour realized on the stressed syllable carries lexical information. In some cases, therefore, a different lexical accent gives a word a different meaning. Because accents are not usually marked in written texts, such words are ambiguous in writing, and the ambiguity can be resolved by determining the appropriate accent. There are also cases in which various basic and derived forms of words have different meanings, different morphosyntactic descriptions (MSDs), and possibly different accents. Words that have the same written form but different meanings are called homograms. To resolve the ambiguity of homograms, we created a lexicon of homograms that comprises all Croatian nouns of different gender that have the same written forms (if accents are not marked) but different meanings, MSDs, and possibly different accents. The lexicon consists of 19,366 entries and 3,460 unique homograms. Each entry comprises the homogram (unaccented word), the accented word, the corresponding MSD, and the accented lemma. The lexicon enables us to identify and disambiguate homograms within a corpus efficiently and accurately. We also evaluated and analyzed the performance of machine translation (MT) systems for the Croatian–English language pair, with special emphasis on homogram translation. We confirmed that disambiguating homograms can improve the performance of MT systems by avoiding major translation mistakes caused by assigning the wrong meaning to a homogram.
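
The four-field entry structure described in the abstract could be represented roughly as follows. This is a minimal sketch and not the authors' implementation: the names `LexiconEntry`, `build_index`, and `find_homograms` are hypothetical, the lookup by lowercased token is an assumption, and the sample values are placeholders rather than entries from the actual lexicon.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One lexicon row with the four fields named in the abstract."""
    homogram: str        # unaccented written form
    accented_word: str   # the same form with the accent marked
    msd: str             # morphosyntactic description tag
    accented_lemma: str  # lemma with the accent marked


def build_index(entries):
    """Group lexicon entries by their unaccented written form."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.homogram].append(entry)
    return dict(index)


def find_homograms(tokens, index):
    """Return (position, token, candidates) for tokens with several readings."""
    hits = []
    for position, token in enumerate(tokens):
        # Assumption: lexicon keys are lowercase unaccented forms.
        candidates = index.get(token.lower(), [])
        if len(candidates) > 1:
            hits.append((position, token, candidates))
    return hits


# Placeholder data only; these are NOT entries from the actual lexicon.
SAMPLE_ENTRIES = [
    LexiconEntry("form", "form_accent_a", "MSD_A", "lemma_accent_a"),
    LexiconEntry("form", "form_accent_b", "MSD_B", "lemma_accent_b"),
    LexiconEntry("other", "other_accent", "MSD_C", "other_lemma"),
]
```

Calling `find_homograms(tokens, build_index(SAMPLE_ENTRIES))` on a tokenized sentence only flags the ambiguous tokens and lists their candidate readings. Choosing among the candidates (the disambiguation step itself) is not shown here, since the abstract does not describe how it is done.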

Index Terms—Disambiguation of homograms, lexicon of homograms, pitch-accent language, word sense disambiguation.

The authors are with the Department of Informatics, University of Rijeka, Croatia (e-mail: lnacinovic@inf.uniri.hr, mbrkic@inf.uniri.hr).


Cite: Lucia Nacinovic Prskalo and Marija Brkic Bakaric, "The Role of Homograms in Machine Translation," International Journal of Machine Learning and Computing, vol. 8, no. 2, pp. 90-97, 2018.


General Information

E-ISSN: 2972-368X
Abbreviated Title: Int. J. Mach. Learn.
Frequency: Quarterly
DOI: 10.18178/IJML
Editor-in-Chief: Dr. Lin Huang
Executive Editor: Ms. Cherry L. Chen
Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI.
E-mail: ijml@ejournal.net
