Late-night rush: a CNN for color image recognition, built to benchmark a sky-high-priced "nuke"

kelvinlee, published 2019-07-30 14:32 / 2701 views

Heading further and further down the image-recognition road.

1. Some explanation

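It is late and my head is not very clear, so most of the code is adapted from GitHub...
This CNN image-recognition network will later be used to benchmark an NVIDIA DGX server, so it is set up to take as long as possible to train.
The server carries 8 NVIDIA Tesla V100 cards, currently the top deep-learning compute card. A single card sells for 1.02 million RMB and the whole machine for close to 10 million: a sky-high-priced nuke. Must be nice to have money. According to information online, this server can finish in 8 hours what takes a Titan X 8 days, or a top consumer CPU several months.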

The network is based on an image-recognition project on GitHub. It uses the DenseNet model and adds ImageDataGenerator to augment the dataset. The plan is to run it on each platform later by changing the value of the constant epochs.

Because it is late and I am in a hurry, the GPU is not configured yet. So I set epochs to 1 and try a run on the CPU first, then estimate from experience how long it would take on a GTX 1080.

2. About the dataset
Training uses the CIFAR-10 dataset, which contains 60000 color images of 32x32 pixels spread across 10 classes, as shown in the figure:

For details, see the University of Toronto site: http://www.cs.toronto.edu/~kr...

The goal of the network is to assign each image to its own class as accurately as possible.

After downloading the dataset, rename it and put it straight into the .keras/datasets folder under your user directory:

After unpacking, you can see that the dataset is split into 6 batches: 5 for training and 1 for testing:
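If you want to look inside one of those batch files directly, here is a minimal sketch. It assumes the Python version of the archive, where each batch is a pickled dict with a `data` array of 3072-value rows and a `labels` list. The helper name `load_cifar_batch` is made up for illustration; Keras does its own loading in `cifar10.load_data()`.

```python
import pickle
import numpy as np

def load_cifar_batch(path):
    """Read one CIFAR-10 (Python version) batch file into images and labels."""
    with open(path, "rb") as f:
        # The batches were pickled under Python 2, so keys come back as bytes.
        batch = pickle.load(f, encoding="bytes")
    data = np.asarray(batch[b"data"], dtype=np.uint8)      # shape (N, 3072)
    labels = np.asarray(batch[b"labels"], dtype=np.int64)  # shape (N,)
    # Each row holds 1024 red, then 1024 green, then 1024 blue values,
    # so reshape to (N, 3, 32, 32) and move channels last: (N, 32, 32, 3).
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return images, labels
```

Pointing it at one training batch from the unpacked folder should give 10000 images of shape (32, 32, 3) and 10000 integer labels.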

3. It is late and I am in a hurry, so straight to the code:

Import the libraries (numpy/keras/math):

import numpy as np
import keras
import math
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers import Conv2D, Dense, Input, add, Activation, AveragePooling2D, GlobalAveragePooling2D
from keras.layers import Lambda, concatenate
from keras.initializers import he_normal
from keras.layers.merge import Concatenate
from keras.callbacks import LearningRateScheduler, TensorBoard, ModelCheckpoint
from keras.models import Model
from keras import optimizers
from keras import regularizers
from keras.utils.vis_utils import plot_model as plot

Set the constants:

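growth_rate        = 12               # feature maps added by each dense layer
depth              = 100              # total network depth
compression        = 0.5              # channel reduction factor in transition layers

img_rows, img_cols = 32, 32           # image size
img_channels       = 3                # color channels (RGB)
num_classes        = 10               # number of classes in the dataset
batch_size         = 64               # examples per training batch; 64 or 32 only (update iterations if changed)
epochs             = 1                # passes over the full dataset; here a single run on the CPU
                                      # change the epoch count to suit the GPU under test and your needs
                                      # 250 epochs gives good accuracy, but accuracy is not the point here

iterations         = 782              # steps per epoch: ceil(50000 / 64)
weight_decay       = 0.0001           # L2 regularization strength

mean = [125.307, 122.95, 113.865]     # per-channel (RGB) mean used for normalization
std  = [62.9932, 62.0887, 66.7048]    # per-channel (RGB) standard deviation used for normalization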
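The value 782 for iterations comes from the batch size. Here is a small sketch of that arithmetic, assuming the standard CIFAR-10 split of 50000 training images. The helper name `steps_per_epoch` is only for illustration.

```python
import math

def steps_per_epoch(num_examples, batch_size):
    # The last batch may be smaller, so round up to cover every example.
    return math.ceil(num_examples / batch_size)

num_train_images = 50000  # assumption: the standard CIFAR-10 training split

print(steps_per_epoch(num_train_images, 64))  # 782
print(steps_per_epoch(num_train_images, 32))  # 1563
```

So if batch_size is switched to 32, iterations should become 1563, otherwise each epoch would cover only about half of the training set.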

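The scheduler changes the learning rate according to the epoch number. The later the epoch, the smaller the value, so the random fluctuations during training shrink step by step: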

def scheduler(epoch):
    if epoch <= 100:
        return 0.1
    if epoch <= 180:
        return 0.01
    return 0.0005
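To see where the learning rate changes, here is a quick sketch that evaluates the same schedule at a few epoch indices. Keras passes a zero-based epoch index to the function, so `epoch <= 100` covers the first 101 epochs. The sample epochs are picked only for illustration.

```python
def scheduler(epoch):
    if epoch <= 100:
        return 0.1
    if epoch <= 180:
        return 0.01
    return 0.0005

# Print the learning rate on both sides of each boundary.
for epoch in (0, 100, 101, 180, 181, 249):
    print(epoch, scheduler(epoch))
```

With epochs set to 1, as in this CPU test run, only the first value (0.1) is ever used.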

Define a DenseNet model (GitHub porter reporting for duty!):

def densenet(img_input,classes_num):

    def bn_relu(x):
        x = BatchNormalization()(x)
        x = Activation("relu")(x)
        return x

    def bottleneck(x):
        channels = growth_rate * 4
        x = bn_relu(x)
        x = Conv2D(channels,kernel_size=(1,1),strides=(1,1),padding="same",kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(x)
        x = bn_relu(x)
        x = Conv2D(growth_rate,kernel_size=(3,3),strides=(1,1),padding="same",kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(x)
        return x

    def single(x):
        x = bn_relu(x)
        x = Conv2D(growth_rate,kernel_size=(3,3),strides=(1,1),padding="same",kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(x)
        return x

    def transition(x, inchannels):
        # Compress the channel count with a 1x1 convolution, then halve the resolution.
        outchannels = int(inchannels * compression)
        x = bn_relu(x)
        x = Conv2D(outchannels,kernel_size=(1,1),strides=(1,1),padding="same",kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(x)
        x = AveragePooling2D((2,2), strides=(2, 2))(x)
        return x, outchannels

    def dense_block(x,blocks,nchannels):
        # Each bottleneck sees the concatenation of every earlier output in the block.
        concat = x
        for i in range(blocks):
            x = bottleneck(concat)
            concat = concatenate([x,concat], axis=-1)
            nchannels += growth_rate
        return concat, nchannels

    def dense_layer(x):
        return Dense(classes_num,activation="softmax",kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay))(x)


    # nblocks = (depth - 4) // 3   # when stacking single layers instead of bottlenecks
    nblocks = (depth - 4) // 6
    nchannels = growth_rate * 2

    x = Conv2D(nchannels,kernel_size=(3,3),strides=(1,1),padding="same",kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(img_input)

    x, nchannels = dense_block(x,nblocks,nchannels)
    # transition returns the compressed channel count so nchannels stays accurate
    x, nchannels = transition(x,nchannels)
    x, nchannels = dense_block(x,nblocks,nchannels)
    x, nchannels = transition(x,nchannels)
    x, nchannels = dense_block(x,nblocks,nchannels)
    x = bn_relu(x)
    x = GlobalAveragePooling2D()(x)
    x = dense_layer(x)
    return x
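To get a feel for how wide the network becomes, here is a small sketch of the channel bookkeeping with no Keras involved. It assumes each transition layer compresses the actual channel count by the compression factor, as a DenseNet with bottlenecks and compression is meant to. The function name `densenet_bc_channels` is made up for illustration.

```python
def densenet_bc_channels(depth=100, growth_rate=12, compression=0.5):
    blocks = (depth - 4) // 6       # bottleneck layers per dense block
    channels = growth_rate * 2      # output of the first 3x3 convolution
    trace = [channels]
    for stage in range(3):
        # Every bottleneck layer concatenates growth_rate new feature maps.
        channels += blocks * growth_rate
        trace.append(channels)
        if stage < 2:               # no transition after the last dense block
            channels = int(channels * compression)
            trace.append(channels)
    return blocks, trace

blocks, trace = densenet_bc_channels()
print(blocks)  # 16
print(trace)   # [24, 216, 108, 300, 150, 342]
```

The division by 6 follows from the depth count: 1 initial convolution, 3 blocks of 16 bottlenecks with 2 convolutions each, 2 transition convolutions and 1 dense layer add up to 100.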

Load the dataset, one-hot encode the labels, and convert the image data to float32:

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test  = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.astype("float32")
x_test  = x_test.astype("float32")
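For anyone unsure what to_categorical does to the labels, here is a minimal NumPy sketch of the same one-hot idea. The helper `one_hot` and the three sample labels are made up for illustration; the real code keeps using `keras.utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Flatten (N, 1) labels to (N,), then set a single 1 per row.
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y = np.array([[3], [0], [9]])  # CIFAR-10 labels arrive with shape (N, 1)
print(one_hot(y, 10))
```

Each row ends up with a single 1 at the index of its class, which is the format categorical cross-entropy expects.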

Normalize the dataset channel by channel to make training easier:

for i in range(3):
    x_train[:,:,:,i] = (x_train[:,:,:,i] - mean[i]) / std[i]
    x_test[:,:,:,i] = (x_test[:,:,:,i] - mean[i]) / std[i]
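The per-channel loop can also be written with NumPy broadcasting. This is a sketch on random stand-in data (not the real dataset) that checks both forms agree; the function name `normalize` is only for illustration.

```python
import numpy as np

mean = np.array([125.307, 122.95, 113.865], dtype="float32")
std = np.array([62.9932, 62.0887, 66.7048], dtype="float32")

def normalize(images):
    # Broadcasting applies mean[c] and std[c] along the last (channel) axis.
    return (images.astype("float32") - mean) / std

# Stand-in batch of 4 random "images" with pixel values in 0..255.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 32, 32, 3)).astype("float32")

# Same computation written as the explicit per-channel loop.
looped = batch.copy()
for i in range(3):
    looped[:, :, :, i] = (looped[:, :, :, i] - mean[i]) / std[i]

print(np.allclose(normalize(batch), looped))  # True
```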

Define the model and print its summary. The summary printed in the shell is far too long to paste here; if you want to see it, just print the summary in the shell:

img_input = Input(shape=(img_rows,img_cols,img_channels))
output    = densenet(img_input,num_classes)
model     = Model(img_input, output)
# model.load_weights("ckpt.h5")
model.summary()  # summary() prints by itself and returns None
plot(model, to_file="cnn_model.png",show_shapes=True)

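The figure below shows the model's parameter counts. That is the annoying part of image recognition: far too many parameters, and a huge pile of derivatives to compute. No wonder the sky-high-priced nuke costs so much and still sells so well: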

At heart this is still a classification problem, so cross-entropy is used as the loss function to measure how good the output is:

sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
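To make "how good the output is" concrete, here is a small NumPy sketch of categorical cross-entropy for a single example. The three-class vectors are made-up numbers, and the clipping constant `eps` is an assumption to avoid taking log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so that log(0) never occurs.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0.0, 1.0, 0.0]])          # the true class is index 1
confident = np.array([[0.05, 0.90, 0.05]])    # puts 0.90 on the true class
unsure = np.array([[0.40, 0.30, 0.30]])       # puts only 0.30 on the true class

print(categorical_crossentropy(y_true, confident))  # about 0.105
print(categorical_crossentropy(y_true, unsure))     # about 1.204
```

The loss is just the negative log of the probability given to the correct class, so a confident correct prediction scores close to 0.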

Set up the callbacks:

tb_cb     = TensorBoard(log_dir="./densenet/", histogram_freq=0)
change_lr = LearningRateScheduler(scheduler)
ckpt      = ModelCheckpoint("./ckpt.h5", save_best_only=False, mode="auto", period=10)
cbks      = [change_lr,tb_cb,ckpt]

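Add data augmentation, which applies random transformations to the images. Here that means horizontal flips plus horizontal and vertical shifts of up to 12.5% of the image size, with the exposed pixels filled with 0: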

print("Using real-time data augmentation.")
datagen   = ImageDataGenerator(horizontal_flip=True,width_shift_range=0.125,height_shift_range=0.125,fill_mode="constant",cval=0.)

datagen.fit(x_train)
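For a rough idea of what the generator does to each image, here is a simplified NumPy sketch. It is an assumption-laden stand-in, not Keras's implementation: shifts are whole pixels only (0.125 of 32 pixels gives at most 4), and the helper names `shift` and `augment` are made up.

```python
import numpy as np

def shift(image, dy, dx):
    """Shift an HxWxC image by (dy, dx) pixels, filling exposed pixels with 0."""
    h, w, _ = image.shape
    out = np.zeros_like(image)
    # Copy only the part of the image that stays in frame after the shift.
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x, :] = image[src_y, src_x, :]
    return out

def augment(image, rng, max_shift=4):
    # Flip left-right half of the time, then apply a random shift.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return shift(image, dy, dx)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype("float32")
print(augment(image, rng).shape)  # (32, 32, 3)
```

Because a new random variant is produced every time an image is drawn, the network rarely sees exactly the same picture twice.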

Train the model:

model.fit_generator(datagen.flow(x_train, y_train,batch_size=batch_size), steps_per_epoch=iterations, epochs=epochs, callbacks=cbks,validation_data=(x_test, y_test))
model.save("densenet.h5")

During training the CPU (i7-7820HK) runs at full load:

One epoch of training on the CPU takes nearly 10000 seconds:

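Going by earlier experience with a handwritten-digit recognition model (12 seconds on the CPU against only 0.47 seconds on the GTX 1080, which puts the GPU at roughly 25.72 times the CPU's performance), setting this program's epochs to 2500 would take about 270 hours on the GTX 1080.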
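The estimate can be reproduced in a few lines. This sketch simply assumes the 25.72x speedup from the digit-recognition model carries over to this network, which is a guess. The quoted timings are rounded (12 / 0.47 is closer to 25.5), so treat the result as a rough figure.

```python
cpu_seconds_per_epoch = 10000  # measured above on the i7-7820HK (approximate)
gpu_speedup = 25.72            # assumption: same speedup as the digit-recognition model
epochs = 2500

gpu_seconds = cpu_seconds_per_epoch / gpu_speedup * epochs
gpu_hours = gpu_seconds / 3600

print(round(gpu_hours))        # 270
```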

So how will it fare on the sky-high-priced V100 nuke? I will go and find out tomorrow!