Software Development for Identifying Persian Text Similarity

Elham Mahdipour; Rahele Shojaeian Razavi; Zahra Gheibi

doi:10.11648/j.ijiis.s.2014030601.21

Peer-Reviewed

Published in International Journal of Intelligent Information Systems (Volume 3, Issue 6-1)

Received: 21 October 2014 Accepted: 23 October 2014 Published: 29 October 2014

Abstract

The breadth of Persian vocabulary (nouns, verbs and other words), together with the availability of information in every field on paper, in books and on the internet, creates the need for a system that compares texts and evaluates their similarity. This paper presents a system for comparing Persian (Farsi) texts and determining their degree of similarity. The system uses the TF-IDF method to weight sentences. In addition, nouns are reduced to their roots, and synonyms and words of the same family are given identical scores. The implementation results indicate that the proposed system achieves the desired efficiency in comparing short texts.
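The pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes term-level TF-IDF weights with a smoothed IDF and cosine similarity between the two weighted vectors. The `CANONICAL` table is a hypothetical toy stand-in for the stemming and synonym resources; it maps an inflected form or a synonym to one shared key so that such words receive the same score.

```python
import math
from collections import Counter

# Hypothetical toy table (an assumption, not from the paper): maps inflected
# forms and synonyms to a single canonical key.
CANONICAL = {
    "کتابها": "کتاب",        # plural form -> root
    "دانشجویان": "دانشجو",   # plural form -> root
    "محصل": "دانشجو",        # illustrative synonym -> shared key
}


def normalize(text):
    """Split on whitespace and replace each token by its canonical key."""
    return [CANONICAL.get(tok, tok) for tok in text.split()]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [normalize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(text_a, text_b, background=()):
    """Score two texts in [0, 1]; `background` texts only inform the IDF."""
    vectors = tfidf_vectors([text_a, text_b, *background])
    return cosine(vectors[0], vectors[1])
```

Under these assumptions, two texts that differ only by an inflected form or a listed synonym normalize to the same tokens and score as identical, while texts with no shared canonical terms score zero.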

Special Issue	Research and Practices in Information Systems and Technologies in Developing Countries
DOI	10.11648/j.ijiis.s.2014030601.21
Page(s)	61-66
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2014. Published by Science Publishing Group

Keywords

Text Similarity, TF-IDF, Semantic Similarity, Stemming

Cite This Article

APA Style

Elham Mahdipour, Rahele Shojaeian Razavi, Zahra Gheibi. (2014). Software Development for Identifying Persian Text Similarity. International Journal of Intelligent Information Systems, 3(6-1), 61-66. https://doi.org/10.11648/j.ijiis.s.2014030601.21

ACS Style

Elham Mahdipour; Rahele Shojaeian Razavi; Zahra Gheibi. Software Development for Identifying Persian Text Similarity. Int. J. Intell. Inf. Syst. 2014, 3(6-1), 61-66. doi: 10.11648/j.ijiis.s.2014030601.21

AMA Style

Elham Mahdipour, Rahele Shojaeian Razavi, Zahra Gheibi. Software Development for Identifying Persian Text Similarity. Int J Intell Inf Syst. 2014;3(6-1):61-66. doi: 10.11648/j.ijiis.s.2014030601.21

@article{10.11648/j.ijiis.s.2014030601.21,
  author = {Elham Mahdipour and Rahele Shojaeian Razavi and Zahra Gheibi},
  title = {Software Development for Identifying Persian Text Similarity},
  journal = {International Journal of Intelligent Information Systems},
  volume = {3},
  number = {6-1},
  pages = {61-66},
  doi = {10.11648/j.ijiis.s.2014030601.21},
  url = {https://doi.org/10.11648/j.ijiis.s.2014030601.21},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijiis.s.2014030601.21},
  abstract = {The breadth of Persian vocabulary (nouns, verbs and other words), together with the availability of information in every field on paper, in books and on the internet, creates the need for a system that compares texts and evaluates their similarity. This paper presents a system for comparing Persian (Farsi) texts and determining their degree of similarity. The system uses the TF-IDF method to weight sentences. In addition, nouns are reduced to their roots, and synonyms and words of the same family are given identical scores. The implementation results indicate that the proposed system achieves the desired efficiency in comparing short texts.},
  year = {2014}
}

TY  - JOUR
T1  - Software Development for Identifying Persian Text Similarity
AU  - Elham Mahdipour
AU  - Rahele Shojaeian Razavi
AU  - Zahra Gheibi
Y1  - 2014/10/29
PY  - 2014
N1  - https://doi.org/10.11648/j.ijiis.s.2014030601.21
DO  - 10.11648/j.ijiis.s.2014030601.21
T2  - International Journal of Intelligent Information Systems
JF  - International Journal of Intelligent Information Systems
JO  - International Journal of Intelligent Information Systems
SP  - 61
EP  - 66
PB  - Science Publishing Group
SN  - 2328-7683
UR  - https://doi.org/10.11648/j.ijiis.s.2014030601.21
AB  - The breadth of Persian vocabulary (nouns, verbs and other words), together with the availability of information in every field on paper, in books and on the internet, creates the need for a system that compares texts and evaluates their similarity. This paper presents a system for comparing Persian (Farsi) texts and determining their degree of similarity. The system uses the TF-IDF method to weight sentences. In addition, nouns are reduced to their roots, and synonyms and words of the same family are given identical scores. The implementation results indicate that the proposed system achieves the desired efficiency in comparing short texts.
VL  - 3
IS  - 6-1
ER  -

Author Information

Elham Mahdipour
Computer Engineering Department, Khavaran Institute of Higher Education, Mashhad, Iran

Rahele Shojaeian Razavi
Computer Engineering Department, Khavaran Institute of Higher Education, Mashhad, Iran

Zahra Gheibi
Computer Engineering Department, Khavaran Institute of Higher Education, Mashhad, Iran
