IJMLC 2012 Vol.2(5): 614-617 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2012.V2.200

The Review of Fields Similarity Estimation Methods

Mahsa Sabbagh Nobarian and Mohammad Reza Feizi Derakhshi

Abstract—Accuracy and consistency are the most important factors in any database, but the increasing size of data has become a great challenge in this area. Detecting duplicate records is an important and very difficult process in huge databases containing millions of records. Field matching is a major step in duplicate record detection. This paper provides a brief survey of field matching techniques and their efficiency.

Index Terms—Duplicate detection, character based similarity metrics, edit distance, Jaro distance, Q-Grams.

Mohammad Reza Feizi Derakhshi is with the Department of Computer, University of Tabriz, Tabriz, Iran (e-mail: mfeizi@tabrizu.ac.ir).
Mahsa Sabbagh Nobarian is with the Department of Computer, Islamic Azad University, Shabestar Branch, Shabestar, Iran (e-mail: msn.sabbagh@yahoo.com).

Cite: Mahsa Sabbagh Nobarian and Mohammad Reza Feizi Derakhshi, "The Review of Fields Similarity Estimation Methods," International Journal of Machine Learning and Computing, vol. 2, no. 5, pp. 614-617, 2012.

General Information

  • E-ISSN: 2972-368X
  • Abbreviated Title: Int. J. Mach. Learn.
  • Frequency: Quarterly
  • DOI: 10.18178/IJML
  • Editor-in-Chief: Dr. Lin Huang
  • Executive Editor: Ms. Cherry L. Chen
  • Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI.
  • E-mail: ijml@ejournal.net

