Discuss Natural Language Processing for the Social Sciences.

1. A positive and negative word list (Hu and Liu 2004) exists on CourseWorks (Files → hw → hw2 → negative-words.txt and positive-words.txt). Please create a function called gen_senti that tokenizes arbitrary text, compares each token with the positive and negative lexicons, and outputs the sentiment score, S. Each matched positive word, pw, counts as a score of 1 and each matched negative word, nw, counts as a score of -1. The total counts for pw and nw are pc and nc, respectively. Each message sentiment, S, is normalized between -1 and 1 by dividing the summed score by the number of matched words, so S = (pc - nc) / (pc + nc). Any text that contains neither positive nor negative words should be ignored and not scored.

For example, suppose the following sentence is passed to the function: “The darkest hour is among us in this time of gloom, however, we will prevail!”. Suppose the negative words are darkest and gloom and the only positive word is prevail. Then:
S = (-1 + -1 + 1) / 3 = -1/3 = -0.3333
2. Using the dataframe from lecture, the_data, apply this function to each corpus in column body and store the result in a new column called “simple_senti”. (15 points)
3. Using vaderSentiment, compute the “compound” sentiment value for each corpus in column body and store it in a new column of the_data called “vader”. (15 points)
4. Compute the mean, median, and standard deviation of both sentiment measures, “simple_senti” and “vader”. (10 points)
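
The scoring rule in item 1 and the summary statistics in item 4 can be sketched as follows. This is a minimal illustration, not a reference solution. Several details are assumptions: the regex tokenizer, the extra pos_words and neg_words parameters (the assignment only names gen_senti), the helper names load_lexicon and summarize, the latin-1 encoding, and the skipping of lines starting with ";" in the lexicon files. Unscored text is represented here as None.

```python
import re
import statistics


def load_lexicon(path):
    """Read one word per line into a set (hypothetical helper).

    Assumption: header/comment lines start with ';' and the file is
    latin-1 encoded. Adjust if your copy of the lexicon differs.
    """
    with open(path, encoding="latin-1") as fh:
        return {
            line.strip().lower()
            for line in fh
            if line.strip() and not line.startswith(";")
        }


def gen_senti(text, pos_words, neg_words):
    """Return S = (pc - nc) / (pc + nc), or None if no lexicon word matches."""
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    pc = sum(tok in pos_words for tok in tokens)  # positive matches
    nc = sum(tok in neg_words for tok in tokens)  # negative matches
    if pc + nc == 0:
        return None  # no positive or negative words: ignore, do not score
    return (pc - nc) / (pc + nc)


def summarize(scores):
    """Mean, median, and sample standard deviation of the scored texts.

    None entries (unscored texts) are dropped first. statistics.stdev
    needs at least two values.
    """
    valid = [s for s in scores if s is not None]
    return {
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        "std": statistics.stdev(valid),
    }


if __name__ == "__main__":
    # Toy lexicons from the worked example, not the real word lists.
    pos = {"prevail"}
    neg = {"darkest", "gloom"}
    sentence = ("The darkest hour is among us in this time of gloom, "
                "however, we will prevail!")
    print(gen_senti(sentence, pos, neg))  # roughly -0.3333
```

Assuming the_data is a pandas DataFrame, the same function could be applied row by row with the_data["body"].apply(...) to fill “simple_senti”. For the “vader” column, vaderSentiment's SentimentIntensityAnalyzer().polarity_scores(text)["compound"] returns the compound value for one text. Both columns can then be summarized with the DataFrame's mean(), median(), and std() methods, which skip missing values by default.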

DATA: https://drive.google.com/file/d/1CzGp8nNJsgl1BLSbYFQB8GRG3r_KgQTb/view?usp=sharing
