A Beginner’s Guide to Opinion Mining
Today, any newcomer to the world of data analysis will come across the term “sentiment analysis”.
Question: What is Sentiment Analysis?
Definition: Sentiment analysis (also called opinion mining) is the analysis of customer or user feedback about a particular situation or product to decide whether the opinion expressed is positive or negative.
There are projects and papers that tackle this task with advanced natural language processing (NLP) techniques.
I chose to do it with a very simple method instead: I basically wanted a semi-supervised, search-based, supervised-updating algorithm.
The pseudocode of the algorithm:
1. Initialise:
negative[n] = [bad, worse, worst, fucked, shit]
positive[n] = [good, awesome, better, best, cool]
2. Input Reviews.txt
3. Read Reviews.txt
4. Clean Reviews.txt (remove special characters and punctuation)
5. Search Reviews.txt for the words in negative[n] and positive[n]
6. For each hit of word from negative[n], we add a -1 to the score.
7. For each hit of word from positive[n], we add a +1 to the score.
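Step 4 (cleaning) can be sketched as follows. This is a minimal sketch, assuming that "special characters and punctuation" means anything other than letters, digits and apostrophes. The function name clean_text is hypothetical and is not part of the original code.

```python
import re


def clean_text(raw):
    """Lower-case the text and return a list of word tokens."""
    # Assumption: every run of characters other than letters, digits and
    # apostrophes is treated as a separator and replaced by a single space.
    cleaned = re.sub(r"[^a-z0-9']+", " ", raw.lower())
    return cleaned.split()


print(clean_text("Great phone!!! Bad battery, though... #disappointed"))
# ['great', 'phone', 'bad', 'battery', 'though', 'disappointed']
```

Cleaning before searching matters because a token such as "good!" would otherwise never match the list entry "good".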
The code for this is:

# Lists of words that the analysis searches for.
Good = ['nice', 'great', 'good', 'awesome', 'growth', 'bought', 'buy']
Bad = ['jerk', 'hate', 'change', 'privacy', 'problem', 'apple']

# Open the text file containing the tweets and read it in one go.
with open("Reviews.txt", "r") as f:
    raw = f.read()

# Lower-case every word and strip surrounding commas and full stops.
words = [word.strip(",.") for word in raw.lower().split()]
print(words)

positivity = 0
negativity = 0
no_significance = 0

for word in words:
    if word in Good:
        print("found " + word)
        positivity = positivity + 1
        print("++")
    elif word in Bad:
        print("found " + word)
        negativity = negativity - 1
        print("--")
    else:
        # The word is in neither list, so it does not affect the score.
        no_significance = no_significance + 1

print("\nthe input text has a positivity rating of : " + str(positivity))
print("\nthe input text has a negativity rating of : " + str(negativity))
print("\nUseless words: " + str(no_significance))

# negativity is zero or below, so the sum is the net score.
total = positivity + negativity
print("\nTotal Score: " + str(total))

# Recommend only when positive hits outnumber negative hits.
if total > 0:
    print("\nRecommended Product")
else:
    print("\nNot recommended")
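To try the counting on a single sentence without a file, the same idea can be wrapped in a function. This is only a sketch: the function name count_hits and the sample sentence are hypothetical, the word lists are copied from the code above, and Python 3 is assumed.

```python
# Word lists copied from the code above, stored as sets for fast lookup.
GOOD = {'nice', 'great', 'good', 'awesome', 'growth', 'bought', 'buy'}
BAD = {'jerk', 'hate', 'change', 'privacy', 'problem', 'apple'}


def count_hits(text):
    """Return (positive hits, negative hits, words found in neither list)."""
    # Lower-case, split on whitespace and strip common trailing punctuation.
    words = [w.strip(",.!?") for w in text.lower().split()]
    positive = sum(1 for w in words if w in GOOD)
    negative = sum(1 for w in words if w in BAD)
    neutral = len(words) - positive - negative
    return positive, negative, neutral


pos, neg, neutral = count_hits("Great camera, but I hate the privacy policy.")
print(pos, neg, neutral)  # 1 2 5
```

Here the net score is 1 - 2 = -1, so this sentence would count as "Not recommended".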
That is it; that is all I did. The algorithm in itself is not that cool.
And, as you all know, I always want the cool :D
To make it a little more awesome, I decided to use word-severity-based scoring.
So, in this case, instead of scoring all the words from negative[n] and positive[n] as -1 and +1 respectively, we give varied scores on the basis of the severity of the word. Writing the words of negative[n] in order of increasing severity, we get: bad, worse, worst, shit, fucked. The scores will be:
bad =
worse =
worst =
shit =
fucked =
Now, we do the same for the words in positive[n]. The scores are:
good =
awesome =
better =
best =
cool =
The implementation of this part is still under development.
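One way severity-based scoring could look is sketched below. The weights are assumptions, not values from this post: each word is simply weighted by its position in the severity order (1 to 5), with a minus sign for negative words. The positive words are taken in the order they are listed above, and the names WEIGHTS and severity_score are hypothetical.

```python
# Words in the order given in the post (negative list: increasing severity).
NEGATIVE = ['bad', 'worse', 'worst', 'shit', 'fucked']
POSITIVE = ['good', 'awesome', 'better', 'best', 'cool']

# ASSUMPTION: weight = position in the list (1..5). These are illustrative
# values only; the post does not state the actual scores.
WEIGHTS = {}
for rank, word in enumerate(NEGATIVE, start=1):
    WEIGHTS[word] = -rank
for rank, word in enumerate(POSITIVE, start=1):
    WEIGHTS[word] = rank


def severity_score(text):
    """Sum the severity weights of all listed words found in the text."""
    words = [w.strip(",.!?") for w in text.lower().split()]
    # Words that are in neither list contribute 0.
    return sum(WEIGHTS.get(w, 0) for w in words)


print(severity_score("The screen is good but the battery is the worst."))  # -2
```

With these assumed weights, "good" adds 1 and "worst" subtracts 3, so a strongly negative word outweighs a mildly positive one, which the flat +1/-1 scheme cannot express.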
Cheers!