Research Article
A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL

Year 2010, Volume: 11 Issue: 2, 163 - 172, 29.11.2010

Abstract

We used the Lemur Toolkit, an open-source toolkit designed for information retrieval research, for automated indexing and retrieval experiments on a TREC-like test collection for the Turkish language. We investigated the effectiveness of three retrieval models that Lemur supports, in particular the language modeling approach to information retrieval, combined with language-specific preprocessing techniques. Our experiments show that language-specific preprocessing significantly improves retrieval performance for all retrieval models. The language modeling approach is also the best-performing retrieval model when language-specific preprocessing is applied.

References

  • Altingovde, I.S., Ozcan, R., Ocalan, H.C., Can, F. and Ulusoy, O. (2007). Large-scale cluster-based retrieval experiments on Turkish texts. Paper presented at the Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval.
  • Ogilvie, P. and Callan, J. (2002). Experiments Using the Lemur Toolkit. Paper presented at the Proceedings of the Tenth Text Retrieval Conference (TREC-10).
  • Arslan, A. and Yilmazel, O. (2008, 19-22 Oct.). A comparison of relational databases and information retrieval libraries on Turkish text retrieval. Paper presented at the International Conference on Natural Language Processing and Knowledge Engineering, 2008.
  • Buckley, C. and Voorhees, E.M. (2004). Retrieval evaluation with incomplete information. Paper presented at the Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval.
  • Can, F., Kocberber, S., Balcik, E., Kaynak, C., Ocalan, H.C. and Vursavas, O.M. (2008). Information retrieval on Turkish texts. J. Am. Soc. Inf. Sci. Technol. 59(3), 407-421.
  • Cover, T.M. and Thomas, J.A. (1991). Elements of information theory. Wiley-Interscience.
  • Eryigit, G. and Adali, E. (2004). An Affix Stripping Morphological Analyzer For Turkish. Paper presented at the IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria.
  • Harman, D. (1993). Overview of the first TREC conference. Paper presented at the Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval.
  • Jones, K.S. (1988). A statistical interpretation of term specificity and its application in retrieval. In Document retrieval systems (pp. 132-142). Taylor Graham Publishing.
  • Manning, C., Raghavan, P. and Schutze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
  • Ponte, J.M. and Croft, W.B. (1998). A language modeling approach to information retrieval. Paper presented at the Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
  • Robertson, S., Walker, S., Hancock-Beaulieu, M., Gull, A. and Lau, M. (1992). Okapi at TREC-3. Paper presented at the Text REtrieval Conference.
  • Salton, G., Wong, A. and Yang, C.S. (1975). A vector space model for automatic indexing. Commun. ACM 18(11), 613-620.
  • Walker, S., Robertson, S.E., Boughanem, M., Jones, G.J.F. and Jones, S. (1998). Okapi at TREC-6 - Automatic ad hoc, VLC, routing, filtering and QSDR.
  • Zhai, C. Notes on the Lemur TFIDF model, from 1.0/tfidf.ps
  • Zhai, C. and Lafferty, J. (2001). A study of smoothing methods for language models applied to Ad Hoc information retrieval. Paper presented at the Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.

TÜRKÇE METİN GERİ GETİRİMİNDE DİL MODELLEME YAKLAŞIMI (A Language Modeling Approach to Turkish Text Retrieval)

Abstract (translated from Turkish)

In this study, automated indexing and retrieval experiments were carried out on a TREC-like collection prepared for the Turkish language, using Lemur, an open-source tool designed for information retrieval research. Three retrieval models supported by Lemur, chief among them the language modeling approach to information retrieval, were investigated together with language-specific preprocessing techniques. Our experiments showed that language-specific preprocessing techniques improve retrieval performance for all retrieval models. In addition, the best performance for the Turkish language was obtained with the language modeling approach.

There are 16 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Ozgur Yilmazel

Publication Date November 29, 2010
Published in Issue Year 2010 Volume: 11 Issue: 2

Cite

APA Yilmazel, O. (2010). A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering, 11(2), 163-172.
AMA Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUJST-A. December 2010;11(2):163-172.
Chicago Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11, no. 2 (December 2010): 163-72.
EndNote Yilmazel O (December 1, 2010) A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11 2 163–172.
IEEE O. Yilmazel, “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”, AUJST-A, vol. 11, no. 2, pp. 163–172, 2010.
ISNAD Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11/2 (December 2010), 163-172.
JAMA Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUJST-A. 2010;11:163–172.
MLA Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering, vol. 11, no. 2, 2010, pp. 163-72.
Vancouver Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUJST-A. 2010;11(2):163-72.