WIT Press


A Multilanguage Platform For Open Source Intelligence

Price

Free (open access)

Paper DOI

10.2495/DATA070321

Volume

38

Pages

10

Published

2007

Size

1,125 kb

Author(s)

N. Baldini, F. Neri & M. Pettoni

Abstract

Open Source Intelligence (OSINT) is an intelligence gathering discipline that involves collecting information from open sources and analyzing it to produce usable intelligence. The revolution in information technology is making open sources more accessible, ubiquitous, and valuable, so that open intelligence can be produced at less cost than ever before. The explosion in OSINT is transforming the intelligence world with the emergence of open versions of the specialist arts of human intelligence (HUMINT), overhead imagery intelligence (IMINT), and signals intelligence (SIGINT). In recent years, the international intelligence communities have seen open sources become increasingly easy and cheap to acquire. However, up to 80% of electronic data is textual, and the most valuable information is often hidden and encoded in pages that are neither structured nor classified. The process of accessing all these raw data, heterogeneous in both source and language, and transforming them into information is therefore closely tied to automatic textual analysis and synthesis, which in turn depend on the ability to master the problems of multilinguality. This paper describes a multilingual indexing, searching and clustering system, designed to manage huge sets of data collected from different and geographically distributed information sources, which provides language-independent search and dynamic classification features. The Joint Intelligence and EW Training Centre (CIFIGE) is a military institute that has adopted this system to train the military and civilian personnel of Defence in the OSINT discipline.

Keywords

open source intelligence, focused crawling, natural language processing, morphological analysis, syntactic analysis, functional analysis, unsupervised clustering.