“NIBA – TAG” - A Tool For Analyzing And Preparing German Texts

G Fliedl; G Weber

doi:10.2495/DATA020331

WIT Press

Price

Free (open access)

Transaction

WIT Transactions on Information and Communication Technologies

Published

2002

Size

339 kb

Abstract

G. Fliedl, G. Weber, University of Klagenfurt, Department for Business Informatics and Application Systems (IWAS), Austria

NIBA-TAG is a multilevel natural language tagger with rich functionality. It works as a word stemmer, a morphological parser and a conventional POS tagger that uses syntactic and semantic features for context-sensitive word tagging. Each rule is tied to a ranking mechanism that currently distinguishes the levels “fact”, “proposal” and “guess”. One of the postprocessing units analyzes the ranking structure and can change a “proposal” to a “fact” if enough rules made an identical proposal for a word. The default output is XML, and the level of precision can be specified. One could, for example, generate an XML file that includes only the guesses, or a file with all attributes relevant for the status of a proposal.

1 Introduction

The automated analysis of language is important for many tasks in computer science, because a large amount of information exists only in unstructured texts. We developed our tagging tool to analyze all kinds of unstructured text. It is a very efficient instrument for filtering out the structure of content. In the field of textual analysis no satisfactory results are known up to now, because approaches have relied either on mainly statistical methods (Kupiec [1]) or on reduced linguistic functionality. We therefore concentrated on developing a general tool for multilevel tagging. The system was implemented in Perl and Prolog. As a performance indication, a test corpus of 15423 sentences (164380 words) was tagged in 1450 seconds, roughly 113 words per second. Linguistic analysis is done according to the NTMS model, which stands for
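The ranking mechanism described in the abstract (levels “fact”, “proposal” and “guess”, promotion of a proposal to a fact when enough rules agree, and XML output filtered by level) can be illustrated with a minimal Python sketch. This is not the authors' Perl/Prolog implementation. The class `Tagging`, the functions `promote` and `to_xml`, the threshold `min_votes`, the tag labels and the XML element and attribute names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Tagging:
    """One rule's verdict for one word (hypothetical structure)."""
    index: int   # position of the word in the sentence
    word: str
    tag: str     # illustrative POS label, not the NIBA-TAG tag set
    level: str   # "fact", "proposal" or "guess"


def promote(taggings, min_votes=2):
    """Turn proposals into facts when at least min_votes rules
    made the identical proposal (same word position, same tag)."""
    votes = Counter(
        (t.index, t.tag) for t in taggings if t.level == "proposal"
    )
    result, seen = [], set()
    for t in taggings:
        if t.level == "proposal" and votes[(t.index, t.tag)] >= min_votes:
            t = Tagging(t.index, t.word, t.tag, "fact")
        if t not in seen:  # collapse identical verdicts into one entry
            seen.add(t)
            result.append(t)
    return result


def to_xml(taggings, levels=("fact", "proposal", "guess")):
    """Serialize only the taggings whose level is in `levels`.
    Element and attribute names are assumptions, not the real schema."""
    root = ET.Element("text")
    for t in taggings:
        if t.level in levels:
            w = ET.SubElement(
                root, "w", index=str(t.index), tag=t.tag, level=t.level
            )
            w.text = t.word
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    verdicts = [
        Tagging(0, "Haus", "N", "proposal"),   # two rules agree on "Haus"
        Tagging(0, "Haus", "N", "proposal"),
        Tagging(1, "grün", "ADJ", "proposal"),  # only one rule proposes this
        Tagging(2, "xyz", "N", "guess"),
    ]
    ranked = promote(verdicts, min_votes=2)
    print(to_xml(ranked, levels=("guess",)))  # file containing only guesses
    print(to_xml(ranked))                     # file containing all levels
```

The threshold of two agreeing rules is arbitrary here; the abstract only says that “enough” rules must agree, so a real configuration would have to be taken from the full paper.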
