KINCL, Tomáš, NOVÁK, Michal, PŘIBIL, Jiří
In: Proceedings of the 9th European Conference on Management Leadership and Governance. Klagenfurt, 14.11.2013 – 15.11.2013. Klagenfurt : ACPI, 2013, p. 122–129. ISBN 978-1-909507-88-3
Publication year: 2013

Sentiment analysis and opinion mining are perceived as one of the major trends of the near future. The topic follows from the spontaneous and massive expansion of new media (especially social networks). The amount of user-generated content published on social networks increases significantly every day and is becoming an important source of information for potential customers. More than 75 % of users confirm that customer reviews have a significant influence on their purchases, and they are willing to pay more for a product with better customer reviews. Furthermore, one third of users have posted an online review or rating of a product or service and have thus become influencers themselves. Using sentiment analysis, a company can gain insight from (social) media, assess the reputation of the company or its products, or develop a marketing strategy that responds to negative sentiment and positively influences consumers' perception. Moreover, top influencers and opinion makers can be identified for further cooperation. Even though social media monitoring is commonly carried out automatically (by tracking selected channels or by crawling the web and searching for given keywords), the analysis and interpretation of the retrieved data is still often performed manually. Such an unsystematic approach is prone to subjective error and depends on the experience and skills of the person performing the analysis. There is thus a strong call for automated, computer-based methods that can classify the expressed sentiment. Good results can be obtained with supervised learning models (e.g. support vector machines). However, good performance requires a good training set. Such approaches also often rely on lexical databases (e.g. WordNet) or sentiment vocabularies (lists of polarity keywords whose sentiment is clearly distinguished, e.g. “horrible”, “bad”, “worst”).
These models do not work very well when the training set comes from a different domain than the test data, and few studies have addressed sentiment analysis for morphologically rich languages such as Arabic, Hebrew, Turkish or Czech. This experiment develops and evaluates a sentiment analysis model for Czech (a morphologically rich language) that does not depend on any prior information (lexical databases or sentiment vocabularies, which are not available for Czech) and works well across domains. Data from the Czech-Slovak Film Database were used as the training set. The support vector machine based classification model was then tested on a different domain (data from an e-shop selling a wide range of products, from electronics to clothing and drugstore goods). Given the good results (accuracy around 80 %), the model was also tested on data in other languages, including Amazon customer reviews in English (Amazon.com, Amazon.co.uk), German (Amazon.de), Italian (Amazon.it) and French (Amazon.fr). Even on other languages, the model still performed well, with accuracy ranging from 70 to 80 %. This may not sound impressive, but there are studies reporting that human raters typically agree about 80 % of the time. Thus, even if an automated system were absolutely correct in its sentiment classification, humans would still disagree with its results about 20 % of the time (since they disagree at this level about any answer).
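
The cross-domain setup described above (train on film reviews, test on product reviews, no lexical database or sentiment vocabulary) can be illustrated with a minimal sketch. The abstract does not state which features or which SVM implementation the authors used, so everything here is an assumption made for illustration: plain bag-of-words counts as features, a Pegasos-style subgradient trainer standing in for the SVM solver, and a handful of invented English toy reviews in place of the Czech-Slovak Film Database and e-shop data.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words counts from whitespace tokens; no lexicon, no stemming."""
    return Counter(text.lower().split())


def score(w, x):
    """Linear decision value w . x for a sparse feature Counter x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(docs, labels, epochs=20, lam=0.01):
    """Pegasos-style subgradient descent on the regularised hinge loss.

    docs are feature Counters, labels are +1 / -1.
    Returns a sparse weight dict. Illustrative only, not the authors' solver.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y * score(w, x)       # computed before the update
            shrink = 1.0 - eta * lam       # L2 regularisation step
            for f in w:
                w[f] *= shrink
            if margin < 1:                 # hinge loss is active
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (positive) or -1 (negative) for a raw text."""
    return 1 if score(w, featurize(text)) > 0 else -1


def accuracy(w, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    hits = sum(predict(w, t) == y for t, y in zip(texts, labels))
    return hits / len(texts)


# Invented toy data: "film review" domain for training.
train_texts = [
    "a wonderful film with great acting",
    "a boring film with terrible acting",
    "great story and a wonderful ending",
    "terrible plot and a boring ending",
]
train_labels = [1, -1, 1, -1]

# Invented toy data: "product review" domain for testing.
test_texts = [
    "great phone wonderful battery",
    "terrible shoes boring colours",
    "wonderful headphones great sound",
    "boring design terrible zipper",
]
test_labels = [1, -1, 1, -1]

w = train_linear_svm([featurize(t) for t in train_texts], train_labels)
acc = accuracy(w, test_texts, test_labels)
print(f"cross-domain accuracy on toy data: {acc:.2f}")
```

The toy test reviews share only generic evaluative words with the training reviews, which is the situation in which a lexicon-free model can transfer between domains; domain-specific words such as "phone" or "zipper" are unseen and contribute nothing to the score. Real data would need far larger training sets and, for a morphologically rich language such as Czech, probably a more robust feature representation than whitespace tokens.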