Introduction
This article uses tensorflow to train a logistic regression model and compares the result with scikit-learn. The dataset comes from Andrew Ng's online open course Deep Learning.
Code

#!/usr/bin/env python
# -*- coding=utf-8 -*-
# @author: 陳水平
# @date: 2017-01-04
# @description: compare the logistic regression of tensorflow with sklearn based on the exercise of the deep learning course of Andrew Ng
# @ref: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex4/ex4.html

import tensorflow as tf
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

# Read x and y
x_data = np.loadtxt("ex4x.dat").astype(np.float32)
y_data = np.loadtxt("ex4y.dat").astype(np.float32)

# Standardize the features so that gradient descent converges quickly
scaler = preprocessing.StandardScaler().fit(x_data)
x_data_standard = scaler.transform(x_data)

# We evaluate the x and y by sklearn to get a sense of the coefficients.
reg = LogisticRegression(C=999999999, solver="newton-cg")  # Set C to a large positive number to minimize the regularization effect
reg.fit(x_data, y_data)
print "Coefficients of sklearn: K=%s, b=%f" % (reg.coef_, reg.intercept_)

# Now we use tensorflow to get similar results.
W = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(tf.zeros([1, 1]))
y = 1 / (1 + tf.exp(-(tf.matmul(x_data_standard, W) + b)))  # sigmoid of the linear predictor
loss = tf.reduce_mean(- y_data.reshape(-1, 1) * tf.log(y) - (1 - y_data.reshape(-1, 1)) * tf.log(1 - y))

optimizer = tf.train.GradientDescentOptimizer(1.3)
train = optimizer.minimize(loss)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for step in range(100):
    sess.run(train)
    if step % 10 == 0:
        print step, sess.run(W).flatten(), sess.run(b).flatten()

print "Coefficients of tensorflow (input should be standardized): K=%s, b=%s" % (sess.run(W).flatten(), sess.run(b).flatten())
print "Coefficients of tensorflow (raw input): K=%s, b=%s" % (sess.run(W).flatten() / scaler.scale_, sess.run(b).flatten() - np.dot(scaler.mean_ / scaler.scale_, sess.run(W)))

# Problem solved and we are happy. But...
# I'd like to implement the logistic regression from a multi-class viewpoint instead of binary.
# In the machine learning domain, it is called softmax regression.
# In the economics and statistics domain, it is called the multinomial logit (MNL) model,
# proposed by Daniel McFadden, who shared the 2000 Nobel Memorial Prize in Economic Sciences.

print "------------------------------------------------"
print "We solve this binary classification problem again from the viewpoint of multinomial classification"
print "------------------------------------------------"

# As a tradition, sklearn first
reg = LogisticRegression(C=9999999999, solver="newton-cg", multi_class="multinomial")
reg.fit(x_data, y_data)
print "Coefficients of sklearn: K=%s, b=%f" % (reg.coef_, reg.intercept_)
print "A little bit of difference at first glance. What about multiplying them by 2?"
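# Before moving on, a sanity check of the "raw input" conversion above (it is
# used again below). This is plain algebra, not extra computation: substituting
# the standardization x_standard = (x - scaler.mean_) / scaler.scale_ into the
# linear predictor gives
#     W . x_standard + b = (W / scale_) . x + (b - (mean_ / scale_) . W)
# so the raw-scale slope is W / scale_ and the raw-scale intercept is
# b - dot(mean_ / scale_, W), which is exactly what the print statements compute.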
# Then try tensorflow
W = tf.Variable(tf.zeros([2, 2]))  # first 2 is the feature number, second 2 is the class number
b = tf.Variable(tf.zeros([1, 2]))
V = tf.matmul(x_data_standard, W) + b
# tensorflow provides a utility function to calculate the probability of observer n choosing alternative i;
# you can replace it with `y = tf.exp(V) / tf.reduce_sum(tf.exp(V), keep_dims=True, reduction_indices=[1])`
y = tf.nn.softmax(V)

# Encode the y label in a one-hot manner
lb = preprocessing.LabelBinarizer()
lb.fit(y_data)
y_data_trans = lb.transform(y_data)
y_data_trans = np.concatenate((1 - y_data_trans, y_data_trans), axis=1)  # Only necessary for the binary case

loss = tf.reduce_mean(-tf.reduce_sum(y_data_trans * tf.log(y), reduction_indices=[1]))
optimizer = tf.train.GradientDescentOptimizer(1.3)
train = optimizer.minimize(loss)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for step in range(100):
    sess.run(train)
    if step % 10 == 0:
        print step, sess.run(W).flatten(), sess.run(b).flatten()

print "Coefficients of tensorflow (input should be standardized): K=%s, b=%s" % (sess.run(W).flatten(), sess.run(b).flatten())
# Note: scale_ has one entry per feature, so it must divide the rows of W (the feature axis), hence the reshape.
print "Coefficients of tensorflow (raw input): K=%s, b=%s" % ((sess.run(W) / scaler.scale_.reshape(-1, 1)).flatten(), sess.run(b).flatten() - np.dot(scaler.mean_ / scaler.scale_, sess.run(W)))
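For reference, the loss line above is the average cross-entropy between the one-hot labels and the softmax probabilities. With the score row $v_n = W^\top x_n + b$ for observation $n$:

$$ p_{nk} = \frac{e^{v_{nk}}}{\sum_j e^{v_{nj}}}, \qquad \text{loss} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k} y_{nk} \log p_{nk} $$

With two classes and $y_{n1} = 1 - y_{n0}$, this reduces to the binary log-loss of the first block.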
The output is as follows:
Coefficients of sklearn: K=[[ 0.14834077 0.15890845]], b=-16.378743
0 [ 0.33699557 0.34786162] [ -4.84287721e-09]
10 [ 1.15830743 1.22841871] [ 0.02142336]
20 [ 1.3378191 1.42655993] [ 0.03946959]
30 [ 1.40735555 1.50197577] [ 0.04853692]
40 [ 1.43754184 1.53418231] [ 0.05283691]
50 [ 1.45117068 1.54856908] [ 0.05484771]
60 [ 1.45742035 1.55512536] [ 0.05578374]
70 [ 1.46030474 1.55814099] [ 0.05621871]
80 [ 1.46163988 1.55953443] [ 0.05642065]
90 [ 1.46225858 1.56017959] [ 0.0565144]
Coefficients of tensorflow (input should be standardized): K=[ 1.46252561 1.56045783], b=[ 0.05655487]
Coefficients of tensorflow (raw input): K=[ 0.14831361 0.15888004], b=[-16.26265144]
------------------------------------------------
We solve this binary classification problem again from the viewpoint of multinomial classification
------------------------------------------------
Coefficients of sklearn: K=[[ 0.07417039 0.07945423]], b=-8.189372
A little bit of difference at first glance. What about multiplying them by 2?
0 [-0.33699557 0.33699557 -0.34786162 0.34786162] [ 6.05359674e-09 -6.05359674e-09]
10 [-0.68416572 0.68416572 -0.72988117 0.72988123] [ 0.02157043 -0.02157041]
20 [-0.72234094 0.72234106 -0.77087188 0.77087194] [ 0.02693938 -0.02693932]
30 [-0.72958517 0.72958535 -0.7784785 0.77847856] [ 0.02802362 -0.02802352]
40 [-0.73103166 0.73103184 -0.77998811 0.77998811] [ 0.02824244 -0.02824241]
50 [-0.73132294 0.73132324 -0.78029168 0.78029174] [ 0.02828659 -0.02828649]
60 [-0.73138171 0.73138207 -0.78035289 0.78035301] [ 0.02829553 -0.02829544]
70 [-0.73139352 0.73139393 -0.78036523 0.78036535] [ 0.02829732 -0.0282972 ]
80 [-0.73139596 0.73139632 -0.78036767 0.78036791] [ 0.02829764 -0.02829755]
90 [-0.73139644 0.73139679 -0.78036815 0.78036839] [ 0.02829781 -0.02829765]
Coefficients of tensorflow (input should be standardized): K=[-0.7313965 0.73139679 -0.78036827 0.78036839], b=[ 0.02829777 -0.02829769]
Coefficients of tensorflow (raw input): K=[-0.07417037 0.07446811 -0.07913655 0.07945422], b=[ 8.1893692 -8.18937111]

Thoughts
For logistic regression, the loss function is somewhat more complex than that of the linear regression model. First, the sigmoid function maps the output of the linear model to a probability between 0 and 1. We then write down the probability of each sample occurring (its likelihood); the probability of all samples occurring together is the product of the individual probabilities. To make differentiation easier, we take the logarithm of this product, which preserves monotonicity while turning the product into a sum (the differentiation rule for a sum is much simpler than the one for a product). The objective of maximum log-likelihood estimation is to maximize the probability of all the samples; machine learning convention calls the objective function a loss, so we define the loss as the negative of the log-likelihood, turning the problem into one of minimization.
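Written out for $m$ samples with labels $y_i \in \{0, 1\}$ and $\sigma(z) = 1/(1 + e^{-z})$, this is exactly the quantity the first tensorflow block minimizes:

$$ L(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y_i \log \sigma(W^\top x_i + b) + (1 - y_i) \log \bigl( 1 - \sigma(W^\top x_i + b) \bigr) \Bigr] $$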
When we speak of logistic regression, we usually mean a binary classification problem; the same idea, however, extends quite naturally to multiple classes, in which case it is generally called the softmax regression model in machine learning. The author comes from a statistics and econometrics background, and therefore usually calls it the MNL model.
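The "multiply them by 2" observation from the output has a clean explanation: the softmax model is over-parameterized. Its choice probability is

$$ P(y_n = k \mid x_n) = \frac{e^{w_k^\top x_n + b_k}}{\sum_j e^{w_j^\top x_n + b_j}} $$

and adding the same vector to every $w_k$ leaves these probabilities unchanged, so only the differences $w_k - w_j$ are identified. A symmetric two-class solution has $w_1 = -w_0$ (the tensorflow run above converges to exactly this, and sklearn's multinomial coefficients are likewise half the binary ones), so the identified binary coefficient vector $w_1 - w_0$ equals $2 w_1$: doubling the multinomial coefficients recovers the binary model.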