Sunday 5 February 2017

Novice Algorithm design for Opinion Mining.

A Beginner’s guide to Opinion Mining.


Today, any newcomer to the world of Data Analysis will come across the term “Sentiment Analysis”.
Question: What is Sentiment Analysis?
Definition: Analysis of the customer/user feedback about a particular situation/product is called sentiment analysis.
Now, there are projects and papers that do this task with many advanced techniques, such as NLP.
I chose to do it with a very simple method: a semi-supervised, search-based, supervised-updating algorithm.
The pseudocode of the algorithm is:
1. Initialise:
    negative[n] = [bad, worse, worst, fucked, shit]
    positive[n] = [good, awesome, better, best, cool]
2. Input Reviews.txt
3. Read Reviews.txt
4. Clean Reviews.txt (remove special characters and punctuation)
5. Search Reviews.txt
6. For each hit of word from negative[n], we add a -1 to the score. 
7. For each hit of word from positive[n], we add a +1 to the score.
The code for this is:
import string

# Word lists that the analysis searches for.
Good = ['nice', 'great', 'good', 'awesome', 'growth', 'bought', 'buy']
Bad = ['jerk', 'hate', 'change', 'privacy', 'problem', 'apple']

# Open the text file containing the reviews and read all of it.
with open("Reviews.txt", "r") as review_file:
    raw_text = review_file.read()

print(raw_text)

# Clean the text: lowercase it, split on whitespace, strip punctuation.
words = [word.strip(string.punctuation) for word in raw_text.lower().split()]

positivity = 0
negativity = 0
no_significance = 0

for word in words:
    if word in Good:
        print("found " + word)
        positivity = positivity + 1
        print("++")
    elif word in Bad:
        print("found " + word)
        negativity = negativity - 1
        print("--")
    else:
        # Words in neither list do not change the score; they are only counted.
        no_significance = no_significance + 1

print("\nthe input text has a positivity rating of : " + str(positivity))
print("\nthe input text has a negativity rating of : " + str(negativity))
print("\nUseless words: " + str(no_significance))

# negativity is already zero or below, so the total is a plain sum.
total = positivity + negativity

print("\nTotal Score: " + str(total))

# Recommend only when the positive hits outweigh the negative ones.
if total > 0:
    print("\nRecommended Product")
else:
    print("\nNot recommended")

That is it, that is what I did.
The algorithm in itself is not that cool.
And you all know that I always want the cool :D
To make it a little more awesome, I used word-severity-based scoring.
So, in this case, instead of scoring all the words from negative[n] and positive[n] as -1 and +1 respectively, we give varied scores on the basis of the severity of the word.
Writing the words of negative[n] in the order of increasing severity, we get,
bad worse worst shit fucked,
the scores will be,
bad =
worse =
worst =
shit = and
fucked =
Now, we do the same for the words in positive[n], the scores are,
good =
awesome =
better =
best =
cool =
The implementation of this part is still under development.
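Until then, here is a minimal sketch of how the severity-based scoring could look. It is only an illustration under loud assumptions: the weights (1 to 5 by position in the list) are placeholders I made up for the example, the positive words are simply taken in the order listed above, and the helper name severity_score is hypothetical.

```python
import string

# Words in the order they are listed in the post; treated here as increasing severity.
NEGATIVE_ORDER = ["bad", "worse", "worst", "shit", "fucked"]
POSITIVE_ORDER = ["good", "awesome", "better", "best", "cool"]

# ASSUMPTION: weights 1..5 by list position are placeholders, not real scores.
NEGATIVE_SCORES = {word: -(rank + 1) for rank, word in enumerate(NEGATIVE_ORDER)}
POSITIVE_SCORES = {word: rank + 1 for rank, word in enumerate(POSITIVE_ORDER)}


def severity_score(text):
    """Sum the severity weights of all listed words in text (hypothetical helper)."""
    total = 0
    for token in text.lower().split():
        # Strip surrounding punctuation so "good," still matches "good".
        word = token.strip(string.punctuation)
        total += POSITIVE_SCORES.get(word, 0) + NEGATIVE_SCORES.get(word, 0)
    return total


print(severity_score("The camera is good, but the battery is the worst."))
```

With these placeholder weights, one mild positive word does not cancel out one strong negative word, which is the whole point of scoring by severity instead of a flat +1 and -1.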

Cheers!
