Pieces of TensorFlow 2.9 (1)
2022-08-10 08:12:00 【A cloud in the sky】
Contents
What does mnist.load_data() read?
How the function works
Converting the loaded MNIST data to float and normalizing it
Environment: Python 3.8.
The code was debugged in PyCharm.
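What is MNIST?
MNIST is a dataset of handwritten digits made up of 60,000 training images and 10,000 test images. Each image is 28*28 pixels and grayscale: as loaded, every pixel is an integer from 0 to 255, and after normalization it becomes a float between 0 and 1 (the darker the ink, the closer the value is to 1). The images are the digits 0 to 9 as handwritten by different people.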
tf.keras.datasets.mnist
For tf.keras.datasets.mnist, CTRL + left-clicking datasets or mnist does not jump to the definition.
Stepping through with breakpoints shows that tf.keras.datasets.mnist actually resolves to the keras.api._v2.keras.datasets.mnist module.
The tf.keras source code confirms this:
_keras_module = "keras.api._v2.keras"
keras = _LazyLoader("keras", globals(), _keras_module)
_module_dir = _module_util.get_parent_dir_for_name(_keras_module)
if _module_dir:
    _current_module.__path__ = [_module_dir] + _current_module.__path__
setattr(_current_module, "keras", keras)
The keras.api._v2.keras.datasets.mnist module is still not the final node, though. Its __init__.py contains:
from keras.datasets.mnist import load_data
keras.datasets.mnist is the module that actually implements the MNIST dataset handling, which mainly means reading the dataset through the load_data function.
This layering exists mainly because TensorFlow and Keras have split and merged several times over their history, which left a number of legacy issues and some confusing code. The code above, for example, could just as well use from keras.datasets import mnist.
Still, we follow the recommendation on the official TensorFlow site and write tf.keras.datasets.mnist.
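The lazy-loading indirection described above can be sketched in plain Python. This is only an illustration of the idea, not TensorFlow's actual _LazyLoader code: the class name LazyModule is hypothetical, and the standard-library json module stands in for "keras.api._v2.keras".

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical stand-in for a lazy loader: imports on first attribute access."""

    def __init__(self, local_name, real_name):
        super().__init__(local_name)
        self._real_name = real_name
        self._module = None  # nothing imported yet

    def _load(self):
        # Import the real module only once, the first time it is needed.
        if self._module is None:
            self._module = importlib.import_module(self._real_name)
        return self._module

    def __getattr__(self, item):
        # Called only for attributes not found on the placeholder itself,
        # so every "real" lookup is forwarded to the imported module.
        return getattr(self._load(), item)


# "json" plays the role of the real target module in this sketch.
lazy_json = LazyModule("lazy_json", "json")
was_loaded_before = lazy_json._module is not None

encoded = lazy_json.dumps({"digit": 7})  # first access triggers the import
print(was_loaded_before, encoded)
```

Because the placeholder only forwards attribute lookups at run time, a static "go to definition" has nothing concrete to follow, which is consistent with the navigation problem noted earlier.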
What does mnist.load_data() read?
We now know that MNIST is implemented in keras.datasets.mnist, chiefly by the load_data() function, which reads the MNIST dataset. Its source code is as follows:
@keras_export('keras.datasets.mnist.load_data')
def load_data(path='mnist.npz'):
    """Loads the MNIST dataset.

    This is a dataset of 60,000 28x28 grayscale images of the 10 digits,
    along with a test set of 10,000 images.
    More info can be found at the
    [MNIST homepage](http://yann.lecun.com/exdb/mnist/).

    Args:
      path: path where to cache the dataset locally
        (relative to `~/.keras/datasets`).

    Returns:
      Tuple of NumPy arrays: `(x_train, y_train), (x_test, y_test)`.

    **x_train**: uint8 NumPy array of grayscale image data with shapes
      `(60000, 28, 28)`, containing the training data. Pixel values range
      from 0 to 255.

    **y_train**: uint8 NumPy array of digit labels (integers in range 0-9)
      with shape `(60000,)` for the training data.

    **x_test**: uint8 NumPy array of grayscale image data with shapes
      (10000, 28, 28), containing the test data. Pixel values range
      from 0 to 255.

    **y_test**: uint8 NumPy array of digit labels (integers in range 0-9)
      with shape `(10000,)` for the test data.

    Example:

    ```python
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    assert x_train.shape == (60000, 28, 28)
    assert x_test.shape == (10000, 28, 28)
    assert y_train.shape == (60000,)
    assert y_test.shape == (10000,)
    ```

    License:
      Yann LeCun and Corinna Cortes hold the copyright of MNIST dataset,
      which is a derivative work from original NIST datasets.
      MNIST dataset is made available under the terms of the
      [Creative Commons Attribution-Share Alike 3.0 license.](
      https://creativecommons.org/licenses/by-sa/3.0/)
    """
    origin_folder = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
    path = get_file(
        path,
        origin=origin_folder + 'mnist.npz',
        file_hash=
        '731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1')
    with np.load(path, allow_pickle=True) as f:  # pylint: disable=unexpected-keyword-arg
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']

        return (x_train, y_train), (x_test, y_test)
The code comments in TensorFlow and Keras are very good; reading them tells you what a function does and how to use it.
How the function works:
mnist.load_data() accesses the URL above, downloads the MNIST dataset file, and saves it as mnist.npz at xxx\.keras\datasets\mnist.npz.
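The source above also passes a file_hash to get_file, so the cached file can be checked against a known digest. The following is a minimal sketch of that kind of check using only the standard library. It assumes the 64-hex-digit hash is SHA-256 and does not reproduce Keras' own implementation; the helper names sha256_of_file and is_cached_file_valid are hypothetical, and a throwaway file replaces the real mnist.npz.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_cached_file_valid(path, expected_hash):
    """True if the file exists and its digest equals `expected_hash`."""
    return os.path.exists(path) and sha256_of_file(path) == expected_hash


# Demo with a small temporary file instead of the real dataset.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
payload = b"stand-in for mnist.npz"
with open(demo_path, "wb") as handle:
    handle.write(payload)

expected = hashlib.sha256(payload).hexdigest()
print(is_cached_file_valid(demo_path, expected))  # True
print(is_cached_file_valid(demo_path, "0" * 64))  # False
```

A check like this lets a loader skip the download when a valid copy is already cached and fetch the file again when the digest does not match.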
npy
The .npy format is the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. The format stores all of the shape and dtype information necessary to reconstruct the array correctly even on another machine with a different architecture. The format is designed to be as simple as possible while achieving its limited goals.
In other words, a single NumPy array is saved to disk as binary data.
npz
The .npz format is the standard format for persisting multiple NumPy arrays on disk. A .npz file is a zip file containing multiple .npy files, one for each array.
In other words, multiple arrays are saved into a single file, again in binary format.
One .npz file can contain multiple .npy files.
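The relationship between the two formats can be checked directly. The sketch below writes to in-memory buffers rather than to disk; the array contents are made up for illustration.

```python
import io
import zipfile

import numpy as np

images = np.zeros((2, 28, 28), dtype=np.uint8)
labels = np.array([3, 7], dtype=np.uint8)

# One array -> .npy data (an in-memory buffer stands in for a file).
npy_buffer = io.BytesIO()
np.save(npy_buffer, images)
npy_magic = npy_buffer.getvalue()[:6]  # every .npy starts with b'\x93NUMPY'

# Several arrays -> .npz data, which is a zip archive of .npy members.
npz_buffer = io.BytesIO()
np.savez(npz_buffer, x_train=images, y_train=labels)

npz_buffer.seek(0)
with zipfile.ZipFile(npz_buffer) as archive:
    member_names = sorted(archive.namelist())

print(member_names)  # ['x_train.npy', 'y_train.npy']
```

Each keyword passed to np.savez becomes one .npy member of the archive, which is why the keys read back later match those names.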
The file is read with np.load (np is numpy). It contains 4 arrays:
x_train, y_train, x_test, y_test
After reading these 4 arrays, the function returns (x_train, y_train), (x_test, y_test).
That is why code that loads MNIST is always written as (x_train, y_train), (x_test, y_test) = mnist.load_data()
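The read-and-return step can be reproduced without downloading anything. The sketch below builds a tiny synthetic archive with the same four keys and reads it back the way load_data does. The function name load_npz_dataset, the file name tiny_mnist.npz, and the random data are all made up for illustration.

```python
import os
import tempfile

import numpy as np


def load_npz_dataset(path):
    """Read the four arrays from an .npz file and return two (x, y) tuples."""
    with np.load(path) as f:
        x_train, y_train = f["x_train"], f["y_train"]
        x_test, y_test = f["x_test"], f["y_test"]
    return (x_train, y_train), (x_test, y_test)


# Build a tiny synthetic archive with MNIST-like dtypes and image size.
rng = np.random.default_rng(0)
tiny_path = os.path.join(tempfile.mkdtemp(), "tiny_mnist.npz")
np.savez(
    tiny_path,
    x_train=rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8),
    y_train=rng.integers(0, 10, size=6, dtype=np.uint8),
    x_test=rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8),
    y_test=rng.integers(0, 10, size=2, dtype=np.uint8),
)

# Same unpacking pattern as with mnist.load_data().
(x_train, y_train), (x_test, y_test) = load_npz_dataset(tiny_path)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```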
x_train data at a glance
y_train data at a glance
x_test data at a glance
y_test data at a glance
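An overview of this kind can also be produced in code. The sketch below uses synthetic stand-in arrays (x_sample and y_sample are hypothetical names, filled with random values), so the printed numbers are not the real MNIST statistics; applying the same summarize helper to the arrays returned by mnist.load_data() would show the real ones.

```python
import numpy as np

# Synthetic stand-ins with MNIST-like dtype and per-image shape.
rng = np.random.default_rng(42)
x_sample = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
y_sample = rng.integers(0, 10, size=100, dtype=np.uint8)


def summarize(name, array):
    """Collect the shape, dtype and value range of an array."""
    return {
        "name": name,
        "shape": array.shape,
        "dtype": str(array.dtype),
        "min": int(array.min()),
        "max": int(array.max()),
    }


for name, array in [("x_sample", x_sample), ("y_sample", y_sample)]:
    print(summarize(name, array))

# Number of samples carrying each label 0..9.
label_counts = np.bincount(y_sample, minlength=10)
print(label_counts)
```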
Converting the loaded MNIST data to float and normalizing it
The values in the loaded MNIST data range from 0 to 255.
The purpose of normalization is to constrain the preprocessed data to a fixed range (for example [0,1] or [-1,1]) and so remove the adverse effects of singular (outlier) samples. Singular samples increase training time and may even prevent convergence, so when they are present the data should be normalized before training; when they are absent, normalization can be skipped.
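Mapping data into [0,1] or [-1,1] as described above is commonly done with min-max scaling. The helper below is a minimal sketch of that technique; the name min_max_scale and its parameters are illustrative and not part of NumPy or Keras.

```python
import numpy as np


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map `values` from [min, max] onto [low, high]."""
    values = np.asarray(values, dtype=np.float64)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        # Constant input has no spread to rescale; map everything to `low`.
        return np.full_like(values, low)
    return low + (values - v_min) * (high - low) / (v_max - v_min)


pixels = np.array([0, 51, 255], dtype=np.uint8)
unit_range = min_max_scale(pixels)             # target range [0, 1]
signed_range = min_max_scale(pixels, -1.0, 1.0)  # target range [-1, 1]
print(unit_range, signed_range)
```

For MNIST the pixel range is known in advance to be 0 to 255, so dividing by 255.0 gives the same [0,1] result without computing the minimum and maximum.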
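x_train and x_test are both NumPy arrays of integer type (uint8), so applying "/=" to them directly raises an error: "numpy.core._exceptions.UFuncTypeError: Cannot cast ufunc 'divide' output from dtype('float64') to dtype('uint8') with casting rule 'same_kind'"
This comes from NumPy's casting rules: in-place division produces float64 values, which cannot be written back into a uint8 array.
x_train /= 255.0
x_test /= 255.0
Correct version (np.float64 is used here because the np.float alias was removed in NumPy 1.24)
x_train = x_train.astype(np.float64)
x_train /= 255.0
x_test = x_test.astype(np.float64)
x_test /= 255.0
Or
x_train, x_test = x_train / 255.0, x_test / 255.0
Only the images are scaled. y_train and y_test hold the digit labels 0-9 and should be left unchanged.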
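The difference between the failing and the working forms can be reproduced on a three-pixel array. This is a small demonstration with made-up values; np.float32 is used for the converted copy only as an example of an explicit float dtype.

```python
import numpy as np

pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# In-place division would have to write float results into a uint8 buffer,
# which NumPy refuses under the default 'same_kind' casting rule.
in_place_failed = False
try:
    pixels /= 255.0
except TypeError:  # UFuncTypeError is a subclass of TypeError
    in_place_failed = True

# Out-of-place division allocates a new float array, so it succeeds.
scaled = pixels / 255.0

# After an explicit conversion, in-place division is allowed as well.
as_float = pixels.astype(np.float32)
as_float /= 255.0

print(in_place_failed, scaled.dtype, as_float.dtype)
```

The original uint8 array is left untouched in all three cases, because the failed in-place operation is rejected before any value is written.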
After this step, the pixel values are floats in the range [0, 1].