Friday 5 October 2018

Logistic regression using Tensorflow

Logistic regression using Tensorflow


This blog will be a more code driven post rather than the usual blogs of mine where the content is more of complex mathematical symbols and ideas.

The baseline of the code is taken from Kaggle - Logistic Regression with Tensorflow. I have merely made some basic changes and made the code more understandable.

Today we will discuss how to perform Logistic regression using Tensorflow.


Before we move forward, [1]Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

Linear and logistic regression are easy to confuse, and that’s because they are very similar. Linear regression outputs a continuous value eg: 5.2386, 108934.5 and so on, so it is used to predict things like the price of a house, or life expectancy (this is a REGRESSION algorithm). Logistic regression on the other hand is used to predict which category something is in eg: 1, 0, 1, 1 (where 1 and 0 can represent Iris-setosa and Iris-versicolor) (the actual output is a continuous value between 1 and 0, but it chooses which one it is closest to). Contrary to the name logistic regression, this is a CLASSIFICATION algorithm, so it classifies which category the item is in. On a more technical note, the difference in execution between the algorithms is the addition of the activation function to linear regression to turn it into logistic regression. This activation function can be ReLU, sigmoid, tanh and so on (this example uses sigmoid). The activation function basically takes the regression output of linear regression and turns it into a value between 1 and 0, and 1 or 0 are chosen based on which the output is closer to.


I will add some input with each line of the code fore your better understanding.

This will make your plot outputs appear and be stored within the notebook.

%matplotlib inline
Importing all the necessary modules,
import numpy as np # linear algebra
import seaborn as sns
sns.set(style='whitegrid')
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import tensorflow as tf

Importing the data (Can be downloaded from here)

iris = pd.read_csv('../iris.csv')

Viewing and analyzing the dataset,

iris.shape

(150, 6)

Because I want to do a binary classification, I am choosing the first 100 rows. Why? Have a look at the dataset and find out.

iris = iris[:100]
iris.shape
iris.head()
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa

Replace ‘Iris-setosa’ as 0 and replace ‘Iris-versicolor’ as 1

iris.Species = iris.Species.replace(to_replace=['Iris-setosa', 'Iris-versicolor'], value=[0, 1])
plt.scatter(iris[:50].SepalLengthCm, iris[:50].SepalWidthCm, label='Iris-setosa')
plt.scatter(iris[51:].SepalLengthCm, iris[51:].SepalWidthCm, label='Iris-versicolo')
plt.xlabel('SepalLength')
plt.ylabel('SepalWidth')
plt.legend(loc='best')
<matplotlib.legend.Legend at 0x1988f3afe10>

png

X = iris.drop(labels=['Id', 'Species'], axis=1).values
y = iris.Species.values
seed = 5
np.random.seed(seed)
tf.set_random_seed(seed)

Split data trainset: 80% and testset: 20%

train_index = np.random.choice(len(X), round(len(X) * 0.8), replace=False)
test_index = np.array(list(set(range(len(X))) - set(train_index)))
train_X = X[train_index]
train_y = y[train_index]
test_X = X[test_index]
test_y = y[test_index]
def min_max_normalized(data):
    col_max = np.max(data, axis=0)
    col_min = np.min(data, axis=0)
    return np.divide(data - col_min, col_max - col_min)
# Normalized processing, must be placed after the data set segmentation, 
# otherwise the test set will be affected by the training set
train_X = min_max_normalized(train_X)
test_X = min_max_normalized(test_X)
# Begin building the model framework
# Declare the variables that need to be learned and initialization
# There are 4 features here, A's dimension is (4, 1)
A = tf.Variable(tf.random_normal(shape=[4, 1]))
b = tf.Variable(tf.random_normal(shape=[1, 1]))
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
data = tf.placeholder(dtype=tf.float32, shape=[None, 4])
target = tf.placeholder(dtype=tf.float32, shape=[None, 1])
# Declare the model you need to learn
mod = tf.matmul(data, A) + b
# Declare loss function
# Use the sigmoid cross-entropy loss function,
# first doing a sigmoid on the model result and then using the cross-entropy loss function
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=mod, labels=target))
# Define the learning rate, batch_size etc.
learning_rate = 0.003
batch_size = 30
iter_num = 1500
# Define the optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate)
# Define the goal
goal = opt.minimize(loss)
# Define the accuracy
# The default threshold is 0.5, rounded off directly
prediction = tf.round(tf.sigmoid(mod))
# Bool into float32 type
correct = tf.cast(tf.equal(prediction, target), dtype=tf.float32)
# Average
accuracy = tf.reduce_mean(correct)
# End of the definition of the model framework
# Start training model
# Define the variable that stores the result
loss_trace = []
train_acc = []
test_acc = []
for epoch in range(iter_num):
    # Generate random batch index
    batch_index = np.random.choice(len(train_X), size=batch_size)
    batch_train_X = train_X[batch_index]
    batch_train_y = np.matrix(train_y[batch_index]).T
    sess.run(goal, feed_dict={data: batch_train_X, target: batch_train_y})
    temp_loss = sess.run(loss, feed_dict={data: batch_train_X, target: batch_train_y})
    # convert into a matrix, and the shape of the placeholder to correspond
    temp_train_acc = sess.run(accuracy, feed_dict={data: train_X, target: np.matrix(train_y).T})
    temp_test_acc = sess.run(accuracy, feed_dict={data: test_X, target: np.matrix(test_y).T})
    # recode the result
    loss_trace.append(temp_loss)
    train_acc.append(temp_train_acc)
    test_acc.append(temp_test_acc)
    # output
    if (epoch + 1) % 300 == 0:
        print('epoch: {:4d} loss: {:5f} train_acc: {:5f} test_acc: {:5f}'.format(epoch + 1, temp_loss,
                                                                          temp_train_acc, temp_test_acc))
epoch:  300 loss: 0.646475 train_acc: 0.462500 test_acc: 0.650000
epoch:  600 loss: 0.545493 train_acc: 0.462500 test_acc: 0.650000
epoch:  900 loss: 0.446472 train_acc: 0.775000 test_acc: 0.950000
epoch: 1200 loss: 0.477945 train_acc: 0.975000 test_acc: 1.000000
epoch: 1500 loss: 0.406994 train_acc: 1.000000 test_acc: 1.000000

Read about loss functions here

# Visualization of the results
# loss function
plt.plot(loss_trace)
plt.title('Cross Entropy Loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()

png

# accuracy
plt.plot(train_acc, 'b-', label='train accuracy')
plt.plot(test_acc, 'k-', label='test accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.title('Train and Test Accuracy')
plt.legend(loc='best')
plt.show()

png

Cheers!

Sunday 9 September 2018

Catalan's Conjecture - A learning exercise for the bored mind - Contd.

Before we move to Mihailescu’s proof, let us cover some more mathematical concepts (so that we feel superior to the people around us, just kidding :smile:),

There is an interesting theorem that Mihailescu uses in his proof, called the Stickelberger’s theorem,


Stickelberger’s theorem:

This is a result of algebraic number theory, which gives more information about Galois Module structure of class groups of Cyclotomic Fields.
This theorem consists of Stickelberger’s element and Stickelberger’s ideal.
I will now state the complete definition of the theorem and then visit its corners as we move along,

Let denote the -th cyclotomic field . It is a Galois extension of with Galois Group isomorphic to the multiplicative group of integers modulo .

The Stickelberger element (of level or of ) is an element in thr group ring and the Stickelberger ideal is an ideal in the group ring . (Note: ).

The definition of both the Stickelberger element and ideal are, let denote a primitive -th root of unity . The isomorphism from to is given by sending to by the relation
The Stickelberger element of level is given by,

The Stickelberger ideal of level is given by,


Inkeri, used the concept of Wiefrich pair (explain in the previous blogpost of this series) in the context of Catalan’s equation as follows:

A Wieferich pair is a pair of primes such that and

He showed that if the Catalan’s equation holds, then either is a Wieferich pair, or divides , the class number of cyclotomic field , or divides , the class number of cyclotomic field , there were other developments in this direction too.

Bugeaud and Hanrot proved a class number criterion concerning Catalan’s equation, which implies that the Catalan’s Equation ( ) has no solutions in non-zero integers and if and are primes such that one of them is smaller than 43. This was a huge achievement , I recommend you to have a look at the paper at [5].

Mihailescu proved that the Catalan equation has no solutions if and are odd and does not divide . By this result the Catalan conjecture became a theorem. And later Mihailescu succeeded in finding a more elegant proof of Catalan’s conjecture in the case where does divide . Thus, Catalan’s conjecture is a theorem with an algebraic proof in which no computer calculations
are needed.



In this section e wll discuss some breakthrough results by Mihailescu. The most important one is that divides .

The following lemma will be used for that, an element of a ring is called nilpotent if and integer such that .

Lemma 1: The ring does not contain nilpotent elements, if , satisfy the congruence

Theorem 1: For , the element is a in . We also have that divides and divides .

The proofs of the above Lemma and Theorem are beyond the scope of this blog; regardless to say that the pre-requisites are already covered in detail. For the interested, you can refer Catalan’s Conjecture - A cyclotomic field.

Cheers!

Tuesday 10 April 2018

Catalan's Conjecture - A learning exercise for the bored mind.

I was going through a video by Numberphile where they were talking about the Catalan’s conjecture.
This is another conjecture which is simple to state; however extremely difficult to prove; just like Collatz conjecture, about which I already have a blog in place.
At the very outset of the blog; let me bring it to your knowledge that this blog is just for learning and understanding and contains “very little”-to-“null” original work on the subject. However, this blog will be exhaustive and will contain a lot of references that will help a newbie (like myself) get into the depths of the conjecture and fully understand concepts used in its proof.
Let us now define the conjecture;
This statement was conjectured by Eugene Catalan (1814–1894) and was sent to the editor of Journal fur die Reine und Angewandte Mathematik.
The conjecture is as follows,
and are two powers of natural numbers whose values are consecutive (i.e., 8 and 9); the conjecture is for a mathematical statement such as,

The only solution of is for , and , is , , , .
Catalan’s conjecture was proven true by Preda Mihăilescu in 2002; the proof involves the theory of cyclotomic fields and Galois Modules.
So, as you see now, the breadth of the subject blew up! On second were talking about squares and cubes and now we are talking of Cyclotomic fields and Galois Modules!
Nevertheless, I will try and cover each topic in brief and quench our mathematical thirst!
Let us consider (with and ; for the sake of eliminating any confusion between and an english “a”), unless otherwise stated, and can be negative integers as well. Now, we re-write as follows,

The GCD of the two factors on the left hand side of the equation (after considering ) is either or (How did this happen?).

Some concepts before we move ahead:
The Wieferich pairs
In mathematics,
a Wieferich pair is a pair of prime number and that satisfy,

Let us write as, for,
This suggests a traditional approach of factorizing the left hand side
in , the ring of integers in the th cyclotomic
field

Ring:
A ring in the mathematical sense is a set together with two binary
operators and (Addition and multiplication), satisfying the
following conditions:
1. Additive associativity: For all , ,
2. Additive commutativity: For all , ,
3. Additive identity: There exists an element in such that for all a in , .
4. Additive inverse: For every a in there exists such that ,
5. Left and right distributivity: For all , and ,
6. Multiplicative associativity: For all , ( ring satisfying this property is sometimes
explicitly termed an associative ring). Conditions 1-5 are always
required. Though non-associative rings exist, virtually all texts also
require condition 6.
7. Multiplicative commutativity: For all , ( ring satisfying this property is termed a commutative ring),
8. Multiplicative identity: There exists an element such that for all , (a ring satisfying this
property is termed a unit ring, or sometimes a “ring with identity”),
9. Multiplicative inverse: For each , there exists an element such that , where is the identity element.

A brief history of the past developments on this conjecture is a must to be read and understood; the significance comes due to the period of 150 years for which it remained an open problem,
  • Only after six years after Catalan formally defined the conjecture, a result was proposed by French mathematician, Victor Lebesgue. He stated that, for the equation, ; where is a prime; has no solutions for positive values of and . A proof of the same will be discussed in brief in the later part of the blog.
  • After Lebesgue’s work, all development solely consisted of small exponents, and then Naggel showed in 1921 that the difference between a third power and an other perfect power never is equal to 1.
  • In 1932, Selberg proved that, has no solution in positive integers when . A stronger result to this was proved by Ko Cho in 1965, that stated that the equation has no solutions for positive integers when .
  • Cassels made some observations for where and are odd-primes. He proved that is this equality holds for positive integers and , then divides and divides . For the case , this had already been shown by Naggel
  • Inkeri defined the concept of a Wieferich pair [the definition and explanation of the same is given above] in the concept of Catalan equation as follows:
    If the Catalan’s equation holds, then either is a Wieferich Pair, or divides , the class number of the cyclotomic field , or divides , the class number of the cyclotomic field

Cyclotomic Field:
In number theory, a cyclotomic field is a number field obtained by adjoining a complex primitive root of unity to , the field of rational numbers. The -th cyclotomic field (where ) is obtained by adjoining a primitive -th root of
Primitive root of unity
In mathematics, a root of unity, occasionally called a de Moivre number, is any complex number that gives when raised to some positive integer power .

  • Some time later Mihailescu proved that the Catalan equation has no solutions if and are odd and does not divide . By this result the Catalan conjecture became a theorem


More on Cyclotomic Fields:
Let be an odd-prime number. Let be the -th cyclotomic polynomial in i.e., . Consider the field extension of , where denotes a primitive -th root of unity. This is a field extension of degree and it is reducible in . We denote by from now on.
This field extension is Galois with Galois Group,

Since the map,


is an isomorphism
The automorphism acts in all embeddings as complex conjugation. Therefore, we call complex conjugation.
The fixed field of complex conjugation is , which is called the maximal real subfield of . We denote by . The field extension of has degree and it is Galois with Galois theory.
Another important concept that is

Mihailescu’s proof

To be contd…in the next blogpost!