WIT Press

Finding The Real Subject: The Application Of Categorisation Methods To Forum Messages


Free (open access)





Page Range

115 - 123







G. La Rocca


The growth of interactive communities and multimedia networks seems uncontrollable, to the extent that both human and global communication are now impossible without the Internet. Computer Mediated Communication has created and developed on-line communities that dwell in digitalized territories. This new segment of social science is mostly built up of “words”, and the necessary strategies and tools should aim at enabling the scanning, attentive reading, understanding and explanation of this segment. On-line communities are ideal places to study linguistic behaviours and mass interactions. The virtual spaces in which people can speak are chat lines, mailing lists, forums, etc. In a forum, participants can send one or more messages to take part in the discussion of a specific topic, in this case wine. The forum studied here gathers specialists, technicians and ordinary people who end up becoming a real community, in which information and news about wine are frequently exchanged. The study of a forum is a method normally used to verify customer satisfaction; in our case we want to find out whether the subject titles correspond to the content of the messages. Because users frequently reply to messages without changing their title, monitoring the titles alone is not enough for marketing or communication specialists who want an exhaustive description of customer communication. The purpose of this work is to obtain a new list of subjects for each message. For this reason we use Text Mining techniques, which allow us to look for sets of words inside the texts. The TaltaC² “Entity Search” utility was used to search for distinctive word sequences inside text fragments by means of complex queries with regular expressions.

Keywords: text mining, automatic categorization, entity search, computer mediated communication, forum.
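The idea of assigning a subject to each message from its content, instead of relying on a title inherited from an earlier post, can be illustrated with a minimal sketch. It does not reproduce TaltaC² or its “Entity Search” utility; it uses Python's standard `re` module, and the category labels, patterns and sample messages are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical subject categories and regular expressions.
# These are illustrative assumptions, not the queries used in the paper.
SUBJECT_PATTERNS = {
    "food pairing": re.compile(
        r"\b(?:pair(?:s|ed|ing)?|serve[sd]?)\s+(?:\w+\s+){0,3}?with\b",
        re.IGNORECASE,
    ),
    "price": re.compile(
        r"\b(?:price[sd]?|costs?|euros?|cheap|expensive)\b",
        re.IGNORECASE,
    ),
    "storage": re.compile(
        r"\b(?:cellar(?:s|ing)?|stor(?:e|ed|ing|age)|cork(?:s|ed)?)\b",
        re.IGNORECASE,
    ),
}


def assign_subjects(message):
    """Return every category whose pattern occurs in the message body."""
    found = [
        label
        for label, pattern in SUBJECT_PATTERNS.items()
        if pattern.search(message)
    ]
    # Messages that match no pattern are flagged rather than dropped.
    return found or ["uncategorised"]


# Invented sample messages that all share the same inherited title.
messages = [
    {"title": "Re: Best Chianti?",
     "body": "Which cheese would you pair with a young Barbera?"},
    {"title": "Re: Best Chianti?",
     "body": "I keep mine in the cellar, but the cork dried out."},
    {"title": "Re: Best Chianti?",
     "body": "I agree, it is lovely."},
]

for msg in messages:
    print(msg["title"], "->", assign_subjects(msg["body"]))
```

In this toy example the three replies carry an identical title but receive different content-based subjects, which is the kind of mismatch between titles and message content the study sets out to detect.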

