Interdisciplinary optimism? Sentiment analysis of Twitter data

Interdisciplinary research has faced many challenges, including institutional, cultural, and practical ones, and has even been described as a ‘career risk’ or ‘career suicide’ for researchers pursuing such an education and approach. Yet, the constant emphasis on challenges and risks can easily lead to a feeling of anxiety and disempowerment in researchers, which we think is counterproductive to improving interdisciplinarity in practice. Therefore, in search of ‘bright spots’, that is, examples of cases in which people have had positive experiences with interdisciplinarity, this study assesses researchers’ perceptions of interdisciplinarity on the social media platform Twitter. The results of this study show researchers’ many positive experiences with and successes of interdisciplinarity and, as such, document examples of bright spots. These bright spots can give reason for optimistic thinking, which can potentially have many benefits for researchers’ well-being, creativity, and innovation, and may also inspire and empower researchers to strive for and pursue interdisciplinarity in the future.

Hashtags Hashtags are an important element of Twitter and can be used to facilitate a search while simultaneously conveying opinions or sentiments. For example, the hashtag #love reveals a positive sentiment or feeling, and tweets using the hashtag are all indexed under #love. Twitter allows users to create their own hashtags and poses no restrictions on appending the hashtag symbol (i.e., #) in front of any given text. Following the example of the #love hashtag, we preprocessed hashtags by removing the hash sign, essentially making #love equal to the word love.
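The hashtag step above can be sketched in a few lines of Python; the regex shown is a minimal illustration of the idea, not necessarily the study's exact implementation:

```python
import re

def strip_hashtags(tweet: str) -> str:
    """Remove the leading '#' from hashtags so that, e.g.,
    '#love' is treated as the ordinary word 'love'."""
    return re.sub(r"#(\w+)", r"\1", tweet)

print(strip_hashtags("I #love interdisciplinary work #science"))
# -> I love interdisciplinary work science
```

Because only the hash sign is removed, the hashtag text itself remains searchable and contributes to the sentiment features like any other word.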
Contractions and repeating characters Contractions, such as don't and can't, are a common phenomenon in spoken English and are generally less common in formal written text. In tweets, contractions can be found in abundance and are an accepted means of communication. Contractions were preprocessed by splitting them into their full two-word expressions, such as do not and can not. In doing so, we normalized contractions to their "decontracted" counterparts. Another phenomenon occurring in tweets is the use of repeating characters, such as I loveeeee it, often used for added emphasis. Words with repeated characters were limited to a maximum of two consecutive identical characters. For example, the words loveee and loveeee are both normalized to lovee. In doing so, we maintained some degree of emphasis.
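Both normalizations can be sketched as follows; the contraction lookup table is illustrative (the study's full list is not given), and a real pipeline would cover many more forms:

```python
import re

# Illustrative subset of contractions; our assumption, not the study's full list.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "can not",
    "won't": "will not",
    "isn't": "is not",
}

def decontract(text: str) -> str:
    """Replace contractions with their two-word expansions."""
    for contraction, expansion in CONTRACTIONS.items():
        text = re.sub(re.escape(contraction), expansion, text, flags=re.IGNORECASE)
    return text

def squeeze_repeats(text: str) -> str:
    """Limit any run of the same character to at most two,
    e.g. 'loveeee' -> 'lovee', preserving some emphasis."""
    return re.sub(r"(.)\1{2,}", r"\1\1", text)

print(squeeze_repeats(decontract("I can't believe it, I loveeee it")))
# -> I can not believe it, I lovee it
```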
Lemmatization and uppercase words For grammatical reasons, different word forms or derivationally related words can have a similar meaning and, ideally, we would want such terms to be grouped together. For example, the words like, likes, and liked all have a similar semantic meaning and should, ideally, be normalized. Stemming and lemmatization are two NLP techniques for reducing inflectional and derivational forms of words to a common base form. Stemming heuristically cuts off derivational affixes to achieve some kind of normalization, albeit crude in most cases. We applied lemmatization, a more sophisticated normalization method that uses a vocabulary and morphological analysis to reduce words to their base form, called the lemma. It is best described by its most basic example, normalizing the verbs am, are, and is to be, although such terms are not important for the purpose of sentiment analysis. Additionally, uppercase and lowercase words were grouped together as well.
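The idea can be sketched with a small lookup table; this is only an illustration of the mapping, as a real pipeline would use a full morphological lemmatizer (e.g., one backed by a vocabulary such as WordNet) rather than the hand-written table assumed below:

```python
# Illustrative lemma table (our assumption); a morphological lemmatizer
# would derive these mappings from a vocabulary instead.
LEMMAS = {"likes": "like", "liked": "like", "am": "be", "are": "be", "is": "be"}

def lemmatize(tokens):
    """Lowercase each token (grouping uppercase and lowercase forms)
    and map known inflected forms to their lemma."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]

print(lemmatize(["She", "LIKES", "it"]))  # -> ['she', 'like', 'it']
```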
Emoticons and Emojis Emoticons are textual portrayals of a writer's mood or facial expressions, such as :-) and :-D (i.e., a smiley face). For sentiment analysis, they are crucial in determining the sentiment of a tweet and should be retained within the analysis. Emoticons that convey a positive sentiment, such as :-), :-], or ;), were replaced with the positive placeholder word EM POS; in essence, grouping variations of positive emoticons under a common word. Emoticons conveying a negative sentiment, such as :-(, :c, or :-c, were replaced by the negative placeholder word EM NEG. A total of 47 different variations of positive and negative emoticons were replaced. A similar approach was applied to emojis that resemble a facial expression and convey a positive or negative sentiment. Emojis are graphical symbols that can represent an idea, concept, or mood expression, such as the graphical icon of a happy face. A total of 40 emojis with positive and negative facial expressions were replaced by the placeholder words EM POS and EM NEG, respectively. Replacing and grouping the positive and negative emoticons and emojis allows the sentiment classification algorithm to learn an appropriate weight for each corresponding sentiment class. For example, tweets that have been labeled as conveying a negative sentiment (by a human annotator, for instance) and that predominantly contain negative emoticons (e.g., :-() can lead the classification algorithm to assign a higher probability or weight to the negative sentiment class for such emoticons. Note that this only holds when the neutral and positively labeled tweets do not also predominantly contain negative emoticons; otherwise, there is no discriminatory power behind them.
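The replacement step can be sketched as below; the emoticon and emoji lists are small illustrative subsets of the 47 and 40 variants used in the study, and the underscore form of the placeholders (EM_POS/EM_NEG) is our assumption to keep them single tokens:

```python
# Illustrative subsets (assumption); the study replaced 47 emoticon
# and 40 emoji variants in total.
POSITIVE = [":-)", ":-]", ";)", ":)", "\U0001F600"]  # includes the grinning-face emoji
NEGATIVE = [":-(", ":c", ":-c", ":(", "\U0001F61E"]  # includes the disappointed-face emoji

def replace_emoticons(text: str) -> str:
    """Map positive emoticons/emojis to 'EM_POS' and negative ones to
    'EM_NEG' so the classifier learns one weight per placeholder."""
    for symbol in POSITIVE:
        text = text.replace(symbol, "EM_POS")
    for symbol in NEGATIVE:
        text = text.replace(symbol, "EM_NEG")
    return text

print(replace_emoticons("great game :-)"))  # -> great game EM_POS
```

Longer emoticons are listed before their shorter prefixes (e.g., :-) before :)) so a partial match never splits a symbol.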
Numbers, punctuation, and slang Numbers and punctuation symbols were removed, as they typically convey no specific sentiment. Numbers that were used to replace characters or syllables of words were retained, such as in the case of see you l8er. We chose not to convert slang and abbreviations to their full word expressions, such as brb for be right back or ICYMI for in case you missed it. The machine learning model, described later, correctly handles most common uses of slang, provided they are part of the training data. As a result, slang that is indicative of a specific sentiment class (e.g., positive or negative) is assigned appropriate weights or probabilities during model creation.
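One simple way to drop standalone numbers and punctuation while keeping words like l8er is to retain only tokens that still contain at least one letter; this heuristic is our assumed approximation of the step, not the study's exact rule:

```python
import string

def clean_tokens(tweet: str):
    """Drop standalone numbers and punctuation, but keep tokens such as
    'l8er' in which digits replace letters or syllables."""
    kept = []
    for token in tweet.split():
        stripped = token.strip(string.punctuation)
        # Keep only tokens that still contain at least one letter.
        if any(ch.isalpha() for ch in stripped):
            kept.append(stripped)
    return kept

print(clean_tokens("see you l8er !!! at 10"))  # -> ['see', 'you', 'l8er', 'at']
```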
Input features Each tweet was tokenized, the process of obtaining individual words from sentences. Furthermore, we represented tweets as count vectors with and without inverse document frequency (IDF) weighting [1]. Different variations of tokenization were explored, such as 1-word (unigram), 2-word (bigram), 3-word (trigram), and 4-word (4-gram) combinations. Bigrams are especially important for capturing negated word combinations, such as not good or not great, which would not be captured when using 1-word (unigram) features alone.
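The feature construction can be sketched in pure Python as below; in practice a library vectorizer (e.g., scikit-learn's CountVectorizer/TfidfVectorizer with an ngram_range parameter) would likely be used, and the simple log(N/df) IDF formula here is one common variant, assumed for illustration:

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all 1..n_max word combinations (unigrams, bigrams, ...)."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def count_vectors(tweets, n_max=2):
    """Represent each tweet as a count vector over its 1..n_max-grams."""
    return [Counter(ngrams(t.split(), n_max)) for t in tweets]

def idf_weighted(vectors):
    """Reweight counts by inverse document frequency: log(N / df)."""
    n_docs = len(vectors)
    df = Counter(g for v in vectors for g in v)  # document frequency per n-gram
    return [{g: c * math.log(n_docs / df[g]) for g, c in v.items()} for v in vectors]

vecs = count_vectors(["not good at all", "very good indeed"])
print(vecs[0]["not good"])  # bigram feature capturing the negation -> 1
```

Note how the bigram "not good" appears as its own feature, whereas unigram counts alone would treat "not" and "good" independently; IDF then downweights n-grams (like "good" here) that occur in every document.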
Text S2 Descriptions of the seven datasets used to train the sentiment classifier

Sanders
The Sanders dataset consists of 5,513 hand-classified tweets related to the topics Apple (@Apple), Google (#Google), Microsoft (#Microsoft), and Twitter (#Twitter). Tweets were classified as positive, neutral, negative, or irrelevant; the latter referring to non-English tweets, which we discarded. The Sanders dataset has been used for boosting Twitter sentiment classification using different sentiment dimensions [2], combining automatically and hand-labeled Twitter sentiment labels [3], and combining community detection and sentiment analysis [4]. The dataset is available from http://www.sananalytics.com/lab/.

OMD
The OMD dataset was rated by a minimum of three independent annotators, who labeled each tweet as positive, negative, mixed, or other [5,6]. Mixed tweets captured both negative and positive components; other tweets contained non-evaluative statements or questions. We only included the positive and negative tweets with at least two-thirds agreement between annotator ratings; mixed and other tweets were discarded. The OMD dataset has been used for sentiment classification by social relations [7], polarity classification [8], and sentiment classification utilizing semantic concept features [9]. The dataset is available from https://bitbucket.org/speriosu/updown.

Stanford Test
The Stanford Test dataset contains 182 positive, 139 neutral, and 177 negative annotated tweets [10]. The tweets were labeled by a human annotator and were retrieved by querying the Twitter search API with randomly chosen queries related to consumer products, company names, and people. The Stanford Training dataset, in contrast to the Stanford Test dataset, contains 1.6 million labeled tweets. However, the 1.6 million tweets were labeled automatically, without a human annotator, by looking at the presence of emoticons. For example, tweets that contained the positive emoticon :-) were assigned a positive label, regardless of the remaining content of the tweet. Similarly, tweets that contained the negative emoticon :-( were assigned a negative label. Such an approach is highly biased [11], and we chose not to include this dataset for the purpose of creating a sentiment classifier from labeled tweets. The Stanford Test dataset, although relatively small, has been used to analyze and represent the semantic content of a sentence for purposes of classification or generation [12], semantic smoothing to alleviate the data sparseness problem for sentiment analysis [13], and sentiment detection of biased and noisy tweets [14]. The dataset is available from http://www.sentiment140.com/.

Health Care Reform (HCR)
The Health Care Reform (HCR) dataset was created in 2010, around the time the health care bill was signed in the United States, by extracting tweets with the hashtag #hcr [8]. The tweets were manually annotated by the authors with the labels positive, negative, neutral, unsure, or irrelevant. The dataset was split into training, development, and test data. We combined the three datasets, which contained a total of 537 positive, 337 neutral, and 886 negative tweets. Tweets labeled as irrelevant or unsure were not included. The HCR dataset was used to improve sentiment analysis by adding semantic features to tweets [9].
The dataset is available from https://bitbucket.org/speriosu/updown.

SemEval-2016
The Semantic Analysis in Twitter Task 2016 dataset, also known as SemEval-2016 Task 4, was created for various sentiment classification tasks. The tasks can be seen as challenges in which teams compete on a number of sub-tasks, such as classifying tweets into positive, negative, and neutral sentiment, or estimating distributions of sentiment classes. Typically, teams with better classification accuracy or another performance measure rank higher. The dataset consists of training, development, and development-test data that combined contain 3,918 positive, 2,736 neutral, and 1,208 negative tweets. The original dataset contained a total of 10,000 tweets (100 tweets from each of 100 topics). Each tweet was labeled by 5 human annotators, and only tweets for which 3 out of 5 annotators agreed on the sentiment label were considered. For a full description of the dataset and annotation process, see [15]. The dataset is available from http://alt.qcri.org/semeval2016/task4/.
Sentiment Strength (SS)
The Sentiment Strength (SS) dataset was used to detect the strength of sentiments expressed in social web texts, such as tweets, for the sentiment strength detection program SentiStrength [16]. The dataset was labeled by human annotators, and each tweet was rated on a scale from 1 to 5 for both positive and negative sentiment, i.e., a dual positive-negative scale. For the purpose of this paper, we re-labeled the tweets into positive, negative, and neutral tweets as follows. Tweets were considered positive if the positive score was at least 1.5 times larger than the negative score; for example, a positive score of 4 and a negative score of 1 would result in a positive label. Tweets whose negative score was at least 1.5 times larger than the positive score were considered negative. Similar scores on the positive and negative scales resulted in a neutral label, such as when the positive score is 2 and the negative score is 2. A similar re-labeling process was performed by [11]. A total of 1,252 positive, 1,952 neutral, and 861 negative tweets were used. SentiStrength has been used to quantify and statistically validate trading assets from social media data [17], and to analyze emotional expressions and social norms in online chat communities [18]. The dataset is available from http://sentistrength.wlv.ac.uk/documentation/.

CLARIN 13-Languages
The CLARIN 13-languages dataset contains a total of 1.6 million labeled tweets in 13 different languages, the largest sentiment corpus made publicly available [19]. We used the English subset of the dataset, since we restricted our analysis to English tweets. Tweets were collected in September 2013 using the Twitter Streaming API to obtain a random sample of 1% of all publicly available tweets. The tweets were manually annotated with a positive, neutral, or negative label by a total of 9 annotators; some tweets were labeled by more than 1 annotator or twice by the same annotator. For tweets with multiple annotations, only those with two-thirds agreement were kept.
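The 1.5-times re-labeling rule used for the SS dataset can be sketched as follows; treating a score exactly 1.5 times the other as sufficient ("at least") is our reading of the rule, since the boundary case is not spelled out:

```python
def relabel(pos_score: int, neg_score: int) -> str:
    """Collapse a dual-scale SentiStrength rating (1-5 positive,
    1-5 negative) into a single sentiment class via the 1.5x rule."""
    if pos_score >= 1.5 * neg_score:
        return "positive"
    if neg_score >= 1.5 * pos_score:
        return "negative"
    return "neutral"  # similar positive and negative scores

print(relabel(4, 1))  # -> positive
print(relabel(2, 2))  # -> neutral
```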