Sentiment analysis in social media contents using natural language processing techniques

Álvarez López, Tamara

Supervised by:

Enrique Costa Montenegro (Director)
Milagros Fernández Gavilanes (Director)

University of defense: Universidade de Vigo

Date of defense: 29 November 2019

Examination committee:

Elena Lloret Pastor (Chair)
Juan Carlos Burguillo Rial (Secretary)
Benoît Sagot (Member)

Department:

Enxeñaría telemática

Type: Thesis

Teseo: 599110

Abstract

Sentiment Analysis (SA) is the field of study that aims to automatically extract the opinion expressed in a piece of text about a certain subject. The task has gained importance in recent decades with the growth of the Internet and Web 2.0. Numerous platforms have since arisen, such as social networks, online shops, blogs and forums, where users share information and interact with each other, creating new Web content every second. This information became very valuable for companies that wanted to know what people thought about their products, as well as for researchers and for other users looking for recommendations on a variety of topics. However, analyzing all this information turned out to be very hard, and extracting useful information automatically became a necessity.

Throughout this thesis, SA is studied at different levels of granularity, ranging from the sentiment expressed in a whole sentence to the sentiment expressed towards specific aspects of a product or service. The latter is known as Aspect-Based Sentiment Analysis (ABSA). State-of-the-art systems are developed for both tasks. Sentiment is classified as positive, negative or neutral. To determine it, an unsupervised approach is built that adapts to each specific context with minimal manual intervention, yet remains competitive even when compared with supervised approaches. It relies on syntactic dependencies between words and on propagation rules based on grammatical constructions, as well as on polarity lexicons that are automatically adapted to each particular domain.

In the part of the study dedicated to ABSA, several subtasks are addressed: aspect extraction, category detection, and extraction of the sentiment expressed towards each aspect and category. Both Machine Learning techniques based on classifiers and unsupervised techniques based on syntactic and lexical analysis were applied in this part. Several domains and languages were evaluated in each of the studies, including fashion, politics, restaurants and movies, for both English and Spanish texts, with some additional experiments on a French dataset.

Moreover, a new domain in the field of ABSA is studied here: book reviews. As no resources were available, a new dataset is presented, manually annotated with aspect and sentiment information and publicly available online. Different kinds of book reviews are analyzed, and this domain is compared with others widely studied in the ABSA literature. The domain poses new challenges because of the way opinions about books are expressed and the kind of vocabulary used. However, this analysis can be very useful for later improving other applications such as book recommendation, digital libraries or electronic bookshops.
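
The general idea of an unsupervised approach that combines a polarity lexicon with propagation rules over syntactic dependencies can be illustrated with a minimal Python sketch. This is not the thesis implementation. The lexicon entries, the negation and intensifier rules, the `Node` class and the hand-built dependency tree are all assumptions made purely for illustration. A real system would obtain the tree from a dependency parser and the lexicon from a domain-adaptation step.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical toy resources (assumptions, not the thesis lexicons).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "boring": -1.5}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never"}


@dataclass
class Node:
    """A word in a dependency tree, with its syntactic dependents."""
    word: str
    children: List["Node"] = field(default_factory=list)


def propagate(node: Node) -> float:
    """Return the polarity score of the subtree rooted at `node`.

    Illustrative rules: dependents add their own polarity to the head,
    an intensifier dependent scales the head's score, and a negator
    dependent flips its sign.
    """
    score = LEXICON.get(node.word, 0.0)
    scale = 1.0
    negated = False
    for child in node.children:
        if child.word in NEGATORS:
            negated = True
        elif child.word in INTENSIFIERS:
            scale *= INTENSIFIERS[child.word]
        else:
            score += propagate(child)
    score *= scale
    return -score if negated else score


def classify(score: float) -> str:
    """Map a numeric score to one of the three polarity classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hand-built tree for "The plot is not very good" (head: "good").
tree = Node("good", [
    Node("plot", [Node("the")]),
    Node("is"),
    Node("not"),
    Node("very"),
])

score = propagate(tree)
print(score, classify(score))  # -1.5 negative
```

In this sketch the word "good" starts with a positive score, the intensifier "very" scales it, and the negator "not" flips the sign, so the sentence is labeled negative. Adapting the lexicon to a domain would amount to changing the entries in `LEXICON`, which is hard-coded here only for brevity.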