PŘIBIL, Jiří, KINCL, Tomáš, BÍNA, Vladislav, NOVÁK, Michal
In: Proceedings of the World Congress on Engineering and Computer Science 2011. San Francisco, 19.10.2011 – 21.10.2011. San Francisco : International Association of Engineers, 2011, p. 51–55. ISBN 978-988-18210-9-6
Publication year: 2011

Today everyone, and managers and business people in particular, is exposed to ever-present information pollution. Business intelligence tools are therefore of great importance; nevertheless, current methods can hardly cope with large, unstructured text sources such as the World Wide Web, which is steadily growing in importance. To handle such sources, we have to find and verify sufficiently reliable methods for automatically extracting the main context of a document, i.e., a multidimensional structured characterization representing the document's main topic. To cope with multilingual sources, we have to develop approaches that do not depend on the language of the source and need no additional language-dependent tools (such as thesauri). In our conception, the context is dynamic: the classification of a document depends not only on the document itself but also on the corpus, so expanding the corpus can change a document's classification.
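
The corpus dependence described in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes a plain TF-IDF weighting with whitespace tokenization (so no language-specific tools are involved), and the function names `tokenize` and `top_term`, as well as the toy documents, are hypothetical. It only shows how the most characteristic term of an unchanged document may shift once the corpus grows.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting only, with no stemmer, stop list, or thesaurus.
    return text.lower().split()


def top_term(doc, corpus):
    """Return the term of `doc` with the highest TF-IDF weight relative to `corpus`."""
    doc_sets = [set(tokenize(d)) for d in corpus]
    n = len(doc_sets)
    tf = Counter(tokenize(doc))

    def weight(term):
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for s in doc_sets if term in s)
        # Smoothed inverse document frequency (an illustrative choice).
        return tf[term] * math.log((1 + n) / (1 + df))

    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(tf), key=weight)


doc = "market bank bank loan"

small_corpus = [doc, "market price market trend", "market share growth"]
large_corpus = small_corpus + ["bank rate", "bank fee", "bank branch", "bank account"]

print(top_term(doc, small_corpus))  # bank
print(top_term(doc, large_corpus))  # loan
```

In the small corpus, "bank" is rare and therefore the most distinctive term of the document; after four more documents mentioning "bank" are added, "loan" becomes the most distinctive one, although the document itself has not changed.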