Abstract—Sequence alignment is a widely used method for determining string similarity. It computes similarity by aligning the component characters of two strings and summing the similarity scores of the matched character pairs. Choosing an appropriate character similarity measure is important for alignment-based similarity calculation, since the string similarity depends heavily on the character similarity. In this paper, we focus on learning character similarity for string classification, particularly when a set of strings belonging to the same class is given. Our method calculates the character similarity from matching frequencies. The performance of the method is demonstrated by experimental evaluation.
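The abstract does not give the exact formula for turning matching frequencies into character scores, so the following Python code is only a minimal sketch of the general idea under stated assumptions. Strings are compared with a standard global (Needleman-Wunsch) alignment using a linear gap penalty. Each pair of same-class training strings is first aligned with a simple identity matrix, and the resulting matched character pairs are counted and converted into BLOSUM-style log-odds scores. The gap penalty `GAP`, the helpers `identity`, `align`, and `learn_matrix`, and the `unseen` default are hypothetical. The log-odds conversion is a swapped-in technique, not necessarily the authors' formula.

```python
import math
from collections import Counter
from itertools import combinations

GAP = -1.0  # hypothetical linear gap penalty (assumption, not from the paper)


def identity(x, y):
    """Hypothetical initial character similarity: +1 for a match, -1 otherwise."""
    return 1.0 if x == y else -1.0


def align(a, b, score):
    """Global (Needleman-Wunsch) alignment.

    Returns the total alignment score and the list of aligned character pairs.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return dp[n][m], pairs[::-1]


def learn_matrix(strings, base=identity, unseen=-2.0):
    """Learn a symmetric character similarity from matching frequencies.

    Every pair of same-class strings is aligned with `base`; the aligned
    character pairs are counted and converted into BLOSUM-style log-odds
    scores (an assumed conversion). Pairs never observed score `unseen`.
    """
    pair_counts = Counter()
    for s, t in combinations(strings, 2):
        _, pairs = align(s, t, base)
        for x, y in pairs:
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    # Background frequency of each character over all aligned positions.
    char_counts = Counter()
    for (x, y), c in pair_counts.items():
        char_counts[x] += c
        char_counts[y] += c
    n_chars = 2 * total
    matrix = {}
    for (x, y), c in pair_counts.items():
        q_x, q_y = char_counts[x] / n_chars, char_counts[y] / n_chars
        # Expected frequency if x and y were paired at random.
        expected = q_x * q_y if x == y else 2 * q_x * q_y
        matrix[(x, y)] = math.log2((c / total) / expected)

    def score(x, y):
        return matrix.get(tuple(sorted((x, y))), unseen)

    return score
```

For instance, training on `["C0DE", "CODE", "C0DE"]` aligns `0` with `O` in two of the three string pairs, so the learned score for that pair is log2 6 (about 2.58), whereas the identity matrix scores it -1. Counting over aligned pairs of same-class strings is what lets characters that are interchangeable within a class end up with a positive similarity.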
Index Terms—Character similarity, string classification, sequence alignment, scoring matrix.
Sung-Hwan Kim, Chang-Seok Ock, and Hwan-Gue Cho are with the Department of Computer Engineering, Pusan National University, Busan, South Korea.
Cite: Sung-Hwan Kim, Chang-Seok Ock, Jong Kyu Seo, and Hwan-Gue Cho, "Construction of Adaptive Scoring Matrix Using Similar Strings as a Training Set," International Journal of Machine Learning and Computing, vol. 3, no. 1, pp. 112-116, 2013.