Pokémon Image Classification with Deep Learning in TensorFlow

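I. Dataset Overview

II. Data Preprocessing

III. Building the Convolutional Neural Network

IV. Model Training

V. Prediction

VI. Analysis and Optimization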

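I. Dataset Overview

The Pokémon dataset contains 1,168 images in five classes: bulbasaur (234), charmander (238), mewtwo (239), pikachu (234), and squirtle (223).

II. Data Preprocessing

pokemon.py collects the image paths in bulk, derives each image's label from the name of its class folder, and writes the shuffled path/label pairs to a CSV file. The list is then split 60/20/20 into training, validation, and test sets.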

import csv
import glob
import os
import random

import tensorflow as tf


def load_csv(root, filename, name2label):
    # root: dataset root directory
    # filename: name of the CSV file
    # name2label: mapping from class name to integer label
    if not os.path.exists(os.path.join(root, filename)):
        # First run: collect all image paths and write them to the CSV file.
        images = []
        for name in name2label.keys():
            images += glob.glob(os.path.join(root, name, '*.png'))
            images += glob.glob(os.path.join(root, name, '*.jpg'))
            images += glob.glob(os.path.join(root, name, '*.jpeg'))
        print(len(images), images)

        random.shuffle(images)
        with open(os.path.join(root, filename), mode='w', newline='') as f:
            writer = csv.writer(f)
            for img in images:
                # The parent folder name is the class name.
                name = img.split(os.sep)[-2]
                label = name2label[name]
                writer.writerow([img, label])
        print('written into csv file:', filename)

    # Read the (path, label) pairs back from the CSV file.
    images, labels = [], []
    with open(os.path.join(root, filename)) as f:
        reader = csv.reader(f)
        for row in reader:
            img, label = row
            label = int(label)
            images.append(img)
            labels.append(label)
    assert len(images) == len(labels)
    return images, labels


def load_pokemon(root, mode='train'):
    # Build the class-name-to-number table.
    name2label = {}  # "sq...":0
    for name in sorted(os.listdir(os.path.join(root))):
        if not os.path.isdir(os.path.join(root, name)):
            continue
        # Assign the next free integer to each class.
        name2label[name] = len(name2label.keys())

    # Read the label information.
    # [file1,file2,], [3,1]
    images, labels = load_csv(root, 'images.csv', name2label)

    if mode == 'train':  # first 60%
        images = images[:int(0.6 * len(images))]
        labels = labels[:int(0.6 * len(labels))]
    elif mode == 'val':  # 20% = 60%->80%
        images = images[int(0.6 * len(images)):int(0.8 * len(images))]
        labels = labels[int(0.6 * len(labels)):int(0.8 * len(labels))]
    else:  # 20% = 80%->100%
        images = images[int(0.8 * len(images)):]
        labels = labels[int(0.8 * len(labels)):]
    return images, labels, name2label


# Per-channel mean and standard deviation used for normalization.
img_mean = tf.constant([0.485, 0.456, 0.406])
img_std = tf.constant([0.229, 0.224, 0.225])


def normalize(x, mean=img_mean, std=img_std):
    x = (x - mean) / std
    return x


def denormalize(x, mean=img_mean, std=img_std):
    x = x * std + mean
    return x


def main():
    images, labels, table = load_pokemon('pokemon', 'train')
    print('images', len(images), images)
    print('labels', len(labels), labels)
    print(table)


if __name__ == '__main__':
    main()
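To make the 60/20/20 split concrete, the sketch below reproduces the slicing arithmetic of load_pokemon for the 1,168 images of this dataset. The helper names split_bounds and steps_per_epoch are illustrative and are not part of the original script; the batch sizes 256 and 100 are the ones used later in the training section.

```python
import math


def split_bounds(n_samples, train_end_frac=0.6, val_end_frac=0.8):
    """Return (start, end) index pairs for train, val, and test.

    Mirrors the int(0.6 * n) / int(0.8 * n) slicing in load_pokemon.
    """
    train_end = int(train_end_frac * n_samples)
    val_end = int(val_end_frac * n_samples)
    return (0, train_end), (train_end, val_end), (val_end, n_samples)


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to cover n_samples (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


bounds = split_bounds(1168)
sizes = [end - start for start, end in bounds]
print(sizes)                           # samples in train, val, test
print(steps_per_epoch(sizes[0], 256))  # training batches per epoch
print(steps_per_epoch(sizes[2], 100))  # test batches
```

With 700 training images and a batch size of 256, one epoch takes 3 steps, which matches the "3/3" progress counter in the training log further down.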

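III. Building the Convolutional Neural Network

A simple convolutional neural network is built with keras.Sequential.

from tensorflow import keras
from tensorflow.keras import layers

network = keras.Sequential([
    layers.Conv2D(16, 5, 3),   # 16 filters, 5x5 kernel, stride 3
    layers.MaxPool2D(3, 3),    # 3x3 pooling window, stride 3
    layers.ReLU(),
    layers.Conv2D(64, 5, 3),   # 64 filters, 5x5 kernel, stride 3
    layers.MaxPool2D(2, 2),    # 2x2 pooling window, stride 2
    layers.ReLU(),
    layers.Flatten(),
    layers.Dense(64),
    layers.ReLU(),
    layers.Dense(5),           # one logit per class
])

IV. Model Training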

1. Load the training data. Choose the batch size according to the available RAM or GPU memory.

batchsz = 256
images, labels, table = load_pokemon('pokemon', mode='train')
db_train = tf.data.Dataset.from_tensor_slices((images, labels))
# preprocess is defined in step 4 below.
db_train = db_train.shuffle(1000).map(preprocess).batch(batchsz)

2. Load the validation data

images2, labels2, table = load_pokemon('pokemon', mode='val')
db_val = tf.data.Dataset.from_tensor_slices((images2, labels2))
db_val = db_val.map(preprocess).batch(batchsz)

3. Load the test data

images3, labels3, table = load_pokemon('pokemon', mode='test')
db_test = tf.data.Dataset.from_tensor_slices((images3, labels3))
db_test = db_test.map(preprocess).batch(100)

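4. Data preprocessing

def preprocess(x, y):
    # x: path of the image, y: integer label of the image
    x = tf.io.read_file(x)
    x = tf.image.decode_jpeg(x, channels=3)
    # Resize slightly larger than the target, then augment with a
    # random horizontal flip and a random 224x224 crop.
    x = tf.image.resize(x, [244, 244])
    x = tf.image.random_flip_left_right(x)
    x = tf.image.random_crop(x, [224, 224, 3])
    # Scale to [0, 1], then normalize each channel.
    x = tf.cast(x, dtype=tf.float32) / 255.
    x = normalize(x)
    # One-hot encode the label for the five classes.
    y = tf.convert_to_tensor(y)
    y = tf.one_hot(y, depth=5)
    return x, y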
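The normalization step in preprocess subtracts a per-channel mean and divides by a per-channel standard deviation; the constants are the widely used ImageNet channel statistics. The following plain-Python sketch shows the arithmetic on a single pixel and checks that denormalize undoes normalize. The function names normalize_pixel and denormalize_pixel and the sample pixel are illustrative only.

```python
IMG_MEAN = (0.485, 0.456, 0.406)  # per-channel mean (R, G, B)
IMG_STD = (0.229, 0.224, 0.225)   # per-channel standard deviation (R, G, B)


def normalize_pixel(rgb, mean=IMG_MEAN, std=IMG_STD):
    """Apply (x - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean=IMG_MEAN, std=IMG_STD):
    """Invert normalize_pixel: x * std + mean per channel."""
    return tuple(c * s + m for c, m, s in zip(rgb, mean, std))


pixel = (1.0, 0.5, 0.0)  # hypothetical pixel, already scaled to [0, 1]
normalized = normalize_pixel(pixel)
restored = denormalize_pixel(normalized)
print(normalized)
print(restored)
```

Values above the channel mean become positive and values below it become negative, so the network sees inputs roughly centered on zero.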

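5. Train the model with a cross-entropy loss, and use early stopping to limit overfitting.

from tensorflow.keras import losses, optimizers
from tensorflow.keras.callbacks import EarlyStopping

network.build(input_shape=(4, 224, 224, 3))
network.summary()

# Stop when val_accuracy has not improved by at least 0.001 for 5 epochs.
early_stopping = EarlyStopping(
    monitor='val_accuracy',
    min_delta=0.001,
    patience=5
)

# The last Dense layer outputs raw logits, hence from_logits=True.
network.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
                loss=losses.CategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
network.fit(db_train, validation_data=db_val, validation_freq=1,
            epochs=100, callbacks=[early_stopping])
network.evaluate(db_test)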
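The early-stopping rule configured above (monitor val_accuracy, min_delta=0.001, patience=5) can be illustrated with a small pure-Python sketch. This is a simplified model of the idea, not the Keras implementation, which has further options such as baseline and restore_best_weights. The function name early_stopping_epoch and the accuracy history are made up for illustration.

```python
def early_stopping_epoch(val_accuracies, min_delta=0.001, patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if it beats the best value
    seen so far by more than min_delta. Training stops once `patience`
    consecutive epochs pass without such an improvement.
    """
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc - min_delta > best:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation accuracies: progress stalls after epoch 5.
history = [0.60, 0.70, 0.80, 0.85, 0.88, 0.88, 0.879, 0.8805, 0.87, 0.88]
print(early_stopping_epoch(history))
```

Note that early stopping only ends training once validation accuracy stops improving; it does not by itself close the gap between training and validation accuracy.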

Model structure:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              multiple                  1216
_________________________________________________________________
max_pooling2d (MaxPooling2D) multiple                  0
_________________________________________________________________
re_lu (ReLU)                 multiple                  0
_________________________________________________________________
conv2d_1 (Conv2D)            multiple                  25664
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 multiple                  0
_________________________________________________________________
re_lu_1 (ReLU)               multiple                  0
_________________________________________________________________
flatten (Flatten)            multiple                  0
_________________________________________________________________
dense (Dense)                multiple                  36928
_________________________________________________________________
re_lu_2 (ReLU)               multiple                  0
_________________________________________________________________
dense_1 (Dense)              multiple                  325
=================================================================
Total params: 64,133
Trainable params: 64,133
Non-trainable params: 0
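The parameter counts in the summary can be reproduced by hand. The sketch below traces the spatial size of a 224x224 input through the network and computes the weights of each trainable layer, assuming the Keras default of 'valid' (no) padding for Conv2D and MaxPool2D. The helper names are illustrative.

```python
def out_size(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units


size = 224
size = out_size(size, 5, 3)  # Conv2D(16, 5, 3)
size = out_size(size, 3, 3)  # MaxPool2D(3, 3)
size = out_size(size, 5, 3)  # Conv2D(64, 5, 3)
size = out_size(size, 2, 2)  # MaxPool2D(2, 2)
flat = size * size * 64      # Flatten: height * width * channels

params = [
    conv_params(5, 3, 16),   # conv2d
    conv_params(5, 16, 64),  # conv2d_1
    dense_params(flat, 64),  # dense
    dense_params(64, 5),     # dense_1
]
print(size, flat)
print(params, sum(params))
```

The feature map shrinks from 224 to 74, 24, 7, and finally 3 pixels per side, so Flatten yields 3 * 3 * 64 = 576 values and the first Dense layer holds more than half of the 64,133 parameters.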

Training results (epoch 16, followed by the output of network.evaluate(db_test)):

Epoch 16/100
1/3 [=========>....................] - ETA: 6s - loss: 0.1232 - accuracy: 0.9805
2/3 [===================>..........] - ETA: 3s - loss: 0.1455 - accuracy: 0.9785
3/3 [==============================] - 11s 4s/step - loss: 0.1241 - accuracy: 0.9793 - val_loss: 0.3912 - val_accuracy: 0.8798

1/3 [=========>....................] - ETA: 2s - loss: 0.4005 - accuracy: 0.8700
2/3 [===================>..........] - ETA: 1s - loss: 0.4779 - accuracy: 0.8450
3/3 [==============================] - 3s 899ms/step - loss: 0.4673 - accuracy: 0.8504

6. Save the model

network.save('model.h5')

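V. Prediction

1. Read and preprocess the image

import tensorflow as tf


def preprocess(img):
    img = tf.io.read_file(img)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [244, 244])
    # Same random flip and random crop as in training, so repeated
    # runs on the same image can give slightly different results.
    img = tf.image.random_flip_left_right(img)
    img = tf.image.random_crop(img, [224, 224, 3])
    # Note: unlike the training-time preprocess, this version does not
    # call normalize().
    img = tf.cast(img, dtype=tf.float32) / 255.
    return img


img = '3.jpg'
x = preprocess(img)
# Add the batch dimension expected by the model.
x = tf.reshape(x, [1, 224, 224, 3])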

2. Load the trained model

network = tf.keras.models.load_model('model.h5')

3. Predict the class and its probability. Softmax converts the output logits into one probability per class.

import numpy as np

logits = network.predict(x)
prob = tf.nn.softmax(logits, axis=1)
print(prob)

# Index and probability of the most likely class.
max_index = np.argmax(logits, axis=-1)[0]
max_prob = prob.numpy()[0][max_index]

# Class names in label order (sorted folder names).
name = ['bulbasaur', 'charmander', 'mewtwo', 'pikachu', 'squirtle']
print(max_prob)
print(name[max_index])

Test image:

Prediction output:

tf.Tensor([[0.02942971 0.29606345 0.02201815 0.57856214 0.07392654]], shape=(1, 5), dtype=float32)
0.57856214
pikachu
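The conversion from logits to probabilities in step 3 can be sketched without TensorFlow. The example below implements a numerically stable softmax and picks the most likely class; the logit values are hypothetical and are not taken from the trained model.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1.

    Subtracting the maximum first avoids overflow in exp() without
    changing the result.
    """
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Class names in label order (sorted folder names).
names = ['bulbasaur', 'charmander', 'mewtwo', 'pikachu', 'squirtle']

logits = [0.1, 2.4, -0.2, 3.1, 1.0]  # hypothetical model output
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(names[best], round(probs[best], 4))
```

Because softmax is monotonic, the index of the largest logit and the index of the largest probability are always the same, which is why the prediction code can take the argmax of either.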

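VI. Analysis and Optimization

Training accuracy reaches about 98%, but accuracy on the validation and test sets stays in the 80s (val_accuracy 0.8798 and test accuracy 0.8504 in the run above), so the model overfits noticeably even with early stopping. In the prediction, an unmistakable Pikachu image receives a probability of only 0.578: the class is correct, but the model is not yet well fitted.

1. Dataset and model structure

To keep training fast, a fairly shallow convolutional network is used here, and with so little training data (just over a thousand images in total) it is hard to reach a good fit. Accuracy can be improved by enlarging the dataset or by training a deeper network.

2. Training parameters

Training can also be tuned by changing the parameters of each layer, adjusting the learning rate, or switching to a different optimizer.

3. Transfer learning

Transfer learning is a good option when training samples are scarce. Take one of TensorFlow's built-in models together with the weights it learned on a public dataset, remove its final fully connected classification layer, freeze the pretrained layers, and train a custom output layer on your own samples. This usually fits better than training a small network from scratch.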

Dataset and full code: