
Pieces of TensorFlow 2.9 (1)

2022-08-10 08:12:00 【A cloud in the sky】

What is MNIST?

tf.keras.datasets.mnist

What does mnist.load_data() read?

How the function works

Converting the MNIST data to float and normalizing it

Python version: 3.8.

Debugging was done in PyCharm.

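What is MNIST?

MNIST is a dataset of handwritten digits consisting of 60,000 training images and 10,000 test images. Each image is 28×28 pixels (as shown in the figure below) and grayscale. Here the darkness of a pixel is expressed as a float between 0 and 1, and the darker the pixel, the closer the value is to 1 (the raw files store the same information as integers from 0 to 255). The images are the digits 0 to 9, handwritten by many different people.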
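To make the "darker means closer to 1" convention concrete, here is a small sketch that scales a 28×28 uint8 image to [0, 1] and prints it as text. In the Keras copy of MNIST the background pixels are 0 and the pen strokes have high values. The image below is synthetic (a hypothetical vertical bar standing in for a handwritten "1"), so nothing is downloaded, and the character ramp is an arbitrary choice.

```python
import numpy as np

# Characters from lightest to darkest; an arbitrary ramp chosen for illustration.
RAMP = " .:-=+*#%@"


def render_ascii(image: np.ndarray) -> str:
    """Render a 28x28 grayscale image (uint8, 0-255) as text.

    Values are first scaled to floats in [0, 1], so a darker stroke
    (closer to 255 in the raw data) maps to a value closer to 1.
    """
    if image.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {image.shape}")
    darkness = image.astype(np.float32) / 255.0  # floats in [0, 1]
    # Map each value in [0, 1] to an index into RAMP.
    idx = np.round(darkness * (len(RAMP) - 1)).astype(int)
    return "\n".join("".join(RAMP[i] for i in row) for row in idx)


# Synthetic stand-in for a handwritten "1": a vertical bar of full-intensity pixels.
fake_digit = np.zeros((28, 28), dtype=np.uint8)
fake_digit[4:24, 13:15] = 255

art = render_ascii(fake_digit)
print(art)
```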

tf.keras.datasets.mnist

With tf.keras.datasets.mnist, Ctrl+left-clicking datasets or mnist in PyCharm does not jump to the definition.

Stepping through with breakpoints shows that the module tf.keras.datasets.mnist actually resolves to is keras.api._v2.keras.datasets.mnist.

The tf.keras source code confirms this:

_keras_module = "keras.api._v2.keras"
keras = _LazyLoader("keras", globals(), _keras_module)
_module_dir = _module_util.get_parent_dir_for_name(_keras_module)
if _module_dir:
  _current_module.__path__ = [_module_dir] + _current_module.__path__
setattr(_current_module, "keras", keras)
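The _LazyLoader in the snippet above defers the real import until an attribute is first accessed. The class below is a simplified sketch of that idea using only the standard library; it is not TensorFlow's actual implementation, and the name LazyLoader plus the statistics module used in the demo are chosen only for illustration.

```python
import importlib
import types


class LazyLoader(types.ModuleType):
    """Module proxy that imports the real module on first attribute access.

    A simplified sketch of the pattern, not TensorFlow's own class.
    """

    def __init__(self, local_name, parent_module_globals, name):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        super().__init__(name)

    def _load(self):
        # Import the target module (in tf.keras this would be "keras.api._v2.keras").
        module = importlib.import_module(self.__name__)
        # Replace the proxy in the parent namespace so later lookups hit the real module.
        self._parent_module_globals[self._local_name] = module
        # Copy attributes so the proxy itself behaves like the module afterwards.
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item):
        # Only called when normal lookup fails, i.e. before the module is loaded.
        module = self._load()
        return getattr(module, item)


# Demo: "stats" is a lazy alias for the standard-library statistics module.
namespace = {}
stats = LazyLoader("stats", namespace, "statistics")
namespace["stats"] = stats
still_proxy = namespace["stats"] is stats      # nothing imported yet
mean = stats.mean([1, 2, 3])                   # first attribute access triggers the import
swapped = namespace["stats"] is importlib.import_module("statistics")
print(still_proxy, mean, swapped)
```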

keras.api._v2.keras.datasets.mnist is not the final node either; its __init__.py contains:

from keras.datasets.mnist import load_data

keras.datasets.mnist is the module that actually implements the MNIST dataset operations. Its main job is reading the dataset, which is done by the load_data function.

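The main reason is that TensorFlow and Keras have repeatedly merged and split over the course of their development, which has left many historical quirks and made the code confusing in places. In the case above, for instance, one could just as well write from keras.datasets import mnist.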

Still, we follow the recommendation on the official TensorFlow website and use tf.keras.datasets.mnist.
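When Ctrl+click cannot follow an alias chain like this one, Python itself can report where an object is defined. The helper below is a small sketch (the name where_defined is hypothetical) built on the __module__ attribute and inspect.getsourcefile. The demo uses os.path.join because os.path is likewise an alias for another module (posixpath or ntpath, depending on the operating system).

```python
import inspect
import os


def where_defined(obj):
    """Return (defining module name, source file path) for a Python function or class."""
    module_name = obj.__module__               # module in which obj was defined
    source_file = inspect.getsourcefile(obj)   # raises TypeError for built-ins; None if no source file
    return module_name, source_file


# os.path is an alias: join actually lives in posixpath (POSIX) or ntpath (Windows).
module_name, source_file = where_defined(os.path.join)
print(module_name, source_file)
```

With TensorFlow 2.9 installed, where_defined(tf.keras.datasets.mnist.load_data) should, if the analysis above holds, report keras.datasets.mnist and a file inside the keras package.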

What does mnist.load_data() read?

We now know that MNIST support is implemented in keras.datasets.mnist, mainly by the load_data() function, which reads the MNIST dataset. Its source code is as follows:

@keras_export('keras.datasets.mnist.load_data')
def load_data(path='mnist.npz'):
  """Loads the MNIST dataset.

  This is a dataset of 60,000 28x28 grayscale images of the 10 digits,
  along with a test set of 10,000 images.
  More info can be found at the
  [MNIST homepage](http://yann.lecun.com/exdb/mnist/).

  Args:
    path: path where to cache the dataset locally
      (relative to `~/.keras/datasets`).

  Returns:
    Tuple of NumPy arrays: `(x_train, y_train), (x_test, y_test)`.

  **x_train**: uint8 NumPy array of grayscale image data with shapes
    `(60000, 28, 28)`, containing the training data. Pixel values range
    from 0 to 255.

  **y_train**: uint8 NumPy array of digit labels (integers in range 0-9)
    with shape `(60000,)` for the training data.

  **x_test**: uint8 NumPy array of grayscale image data with shapes
    (10000, 28, 28), containing the test data. Pixel values range
    from 0 to 255.

  **y_test**: uint8 NumPy array of digit labels (integers in range 0-9)
    with shape `(10000,)` for the test data.

  Example:

  ```python
  (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
  assert x_train.shape == (60000, 28, 28)
  assert x_test.shape == (10000, 28, 28)
  assert y_train.shape == (60000,)
  assert y_test.shape == (10000,)
  ```

  License:
    Yann LeCun and Corinna Cortes hold the copyright of MNIST dataset,
    which is a derivative work from original NIST datasets.
    MNIST dataset is made available under the terms of the
    [Creative Commons Attribution-Share Alike 3.0 license.](
    https://creativecommons.org/licenses/by-sa/3.0/)
  """
  origin_folder = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
  path = get_file(
      path,
      origin=origin_folder + 'mnist.npz',
      file_hash=
      '731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1')
  with np.load(path, allow_pickle=True) as f:  # pylint: disable=unexpected-keyword-arg
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

    return (x_train, y_train), (x_test, y_test)

The code comments in TensorFlow and Keras are actually very good; reading them tells you what a function does and how to use it.

How the function works

mnist.load_data() uses get_file to download the MNIST dataset file from the URL above and saves it as mnist.npz, at the path xxx\.keras\datasets\mnist.npz.
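The file_hash argument in the source is a 64-character hex string, the length of a SHA-256 digest, and get_file uses it to check the file. The sketch below illustrates that kind of check with hashlib; it is not Keras code, the helper names are hypothetical, and the default location assumes Keras's default cache directory, a .keras/datasets folder under the user's home directory.

```python
import hashlib
import os

# Hash copied from the load_data source above.
MNIST_SHA256 = "731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1"


def default_mnist_path():
    """Assumed default cache location: <home>/.keras/datasets/mnist.npz."""
    return os.path.join(os.path.expanduser("~"), ".keras", "datasets", "mnist.npz")


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid_mnist_file(path, expected=MNIST_SHA256):
    """True if the file exists and its SHA-256 matches the expected hash."""
    return os.path.exists(path) and sha256_of(path) == expected


path = default_mnist_path()
print(path, is_valid_mnist_file(path))
```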

npy

The .npy format is the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. The format stores all of the shape and dtype information necessary to reconstruct the array correctly even on another machine with a different architecture. The format is designed to be as simple as possible while achieving its limited goals.

In other words, a single NumPy array is saved to disk as binary data.
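A minimal round trip showing that .npy keeps shape and dtype: a uint8 array shaped like one MNIST image (synthetic values) is written with np.save into an in-memory buffer, used here instead of a real file, and read back with np.load.

```python
import io

import numpy as np

# A uint8 array with the same shape and dtype as one MNIST image (synthetic values).
original = (np.arange(28 * 28) % 256).astype(np.uint8).reshape(28, 28)

buffer = io.BytesIO()           # in-memory stand-in for a .npy file on disk
np.save(buffer, original)       # writes a header (magic string, dtype, shape) then the raw bytes
magic = buffer.getvalue()[:6]   # every .npy file starts with b"\x93NUMPY"

buffer.seek(0)
restored = np.load(buffer)      # shape and dtype are rebuilt from the header
print(magic, restored.shape, restored.dtype)
```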

npz

The .npz format is the standard format for persisting multiple NumPy arrays on disk. A .npz file is a zip file containing multiple .npy files, one for each array.

That is, several arrays are saved into one file, again in binary format.

So one .npz file can hold several .npy files.
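Because an .npz file is just a zip archive of .npy files, the standard zipfile module can list its members. The sketch below builds a tiny stand-in file with np.savez using the same four key names as mnist.npz; the arrays are small synthetic placeholders, not MNIST data.

```python
import os
import tempfile
import zipfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "mini_mnist.npz")

# Tiny synthetic placeholders stored under the same key names as mnist.npz.
np.savez(
    path,
    x_train=np.zeros((6, 28, 28), dtype=np.uint8),
    y_train=np.arange(6, dtype=np.uint8),
    x_test=np.zeros((2, 28, 28), dtype=np.uint8),
    y_test=np.arange(2, dtype=np.uint8),
)

# An .npz file is a zip archive with one .npy member per array.
with zipfile.ZipFile(path) as zf:
    members = sorted(zf.namelist())
print(members)
```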

load_data() reads this file with np.load (np being numpy). The file contains four arrays:

x_train, y_train, x_test, y_test

After reading these four arrays, it returns (x_train, y_train), (x_test, y_test).

That is why code that uses MNIST always contains (x_train, y_train), (x_test, y_test) = mnist.load_data()
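The reading half of load_data can be reproduced offline. The function below (load_local_mnist is a hypothetical name) mirrors the np.load block from the source and returns the same nested tuple, which is what makes the two-level unpacking work. To stay runnable without a download, the demo first writes a small synthetic file with the same keys.

```python
import os
import tempfile

import numpy as np


def load_local_mnist(path):
    """Read an mnist.npz-style file and return ((x_train, y_train), (x_test, y_test))."""
    # The Keras source also passes allow_pickle=True; plain numeric arrays do not need it.
    with np.load(path) as f:
        x_train, y_train = f["x_train"], f["y_train"]  # indexing loads full arrays into memory
        x_test, y_test = f["x_test"], f["y_test"]
    return (x_train, y_train), (x_test, y_test)


# Synthetic stand-in file with the same keys (not real MNIST data).
path = os.path.join(tempfile.mkdtemp(), "mini_mnist.npz")
np.savez(path,
         x_train=np.zeros((6, 28, 28), dtype=np.uint8),
         y_train=np.arange(6, dtype=np.uint8),
         x_test=np.zeros((2, 28, 28), dtype=np.uint8),
         y_test=np.arange(2, dtype=np.uint8))

# The nested tuple unpacks the same way as mnist.load_data().
(x_train, y_train), (x_test, y_test) = load_local_mnist(path)
print(x_train.shape, y_train, x_test.shape, y_test)
```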

x_train at a glance

y_train at a glance

x_test at a glance

y_test at a glance
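A text alternative to viewing these arrays in a debugger is to print each one's shape, dtype and value range. summarize is a hypothetical helper, and the arrays below are synthetic placeholders (100 samples rather than 60,000); according to the docstring, the real x arrays are uint8 in 0-255 and the labels uint8 in 0-9.

```python
import numpy as np


def summarize(name, arr):
    """One-line description: shape, dtype and value range of an array."""
    return f"{name}: shape={arr.shape}, dtype={arr.dtype}, min={arr.min()}, max={arr.max()}"


# Synthetic placeholders, not MNIST data.
x_train = np.zeros((100, 28, 28), dtype=np.uint8)
x_train[:, 10:18, 10:18] = 255                     # a bright square in every "image"
y_train = (np.arange(100) % 10).astype(np.uint8)   # labels cycling through 0-9

for name, arr in (("x_train", x_train), ("y_train", y_train)):
    print(summarize(name, arr))
```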

Converting the MNIST data to float and normalizing it

The pixel values in the MNIST image arrays range from 0 to 255 (the labels are integers from 0 to 9).

The purpose of normalization is to confine the preprocessed data to a fixed range (such as [0, 1] or [-1, 1]) and thereby eliminate the adverse effects of singular samples, i.e. values far larger or smaller than the rest. Singular samples can lengthen training and may even keep the model from converging. So when singular samples are present, the data should be normalized before training; when there are none, normalization can be skipped.
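For pixel data the range is known in advance (0 to 255), so scaling to [0, 1] is x / 255 and scaling to [-1, 1] is x / 127.5 - 1. The sketch below writes both as general min-max scaling; the function name and the choice to pass the range explicitly (rather than computing it from the data) are assumptions for illustration.

```python
import numpy as np


def min_max_scale(x, lo, hi, new_lo=0.0, new_hi=1.0):
    """Linearly map values from [lo, hi] to [new_lo, new_hi], returning float32."""
    x = np.asarray(x, dtype=np.float32)
    return (x - lo) / (hi - lo) * (new_hi - new_lo) + new_lo


pixels = np.array([0, 51, 255], dtype=np.uint8)
to_unit = min_max_scale(pixels, 0, 255)                   # approximately [0.0, 0.2, 1.0]
to_symmetric = min_max_scale(pixels, 0, 255, -1.0, 1.0)   # approximately [-1.0, -0.6, 1.0]
print(to_unit, to_symmetric)
```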

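x_train and x_test (the images) are NumPy arrays of integer type (uint8), so dividing them in place with "/=" raises an error: "numpy.core._exceptions.UFuncTypeError: Cannot cast ufunc 'divide' output from dtype('float64') to dtype('uint8') with casting rule 'same_kind'". Only the images need scaling; the labels y_train and y_test are class indices from 0 to 9 and should not be divided by 255.

This comes from NumPy's casting rules: an in-place operation must store its result back into the existing uint8 array, and the 'same_kind' rule does not allow the float64 quotient to be cast down to an integer type.

x_train /= 255.0  # raises UFuncTypeError
x_test /= 255.0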
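The failure is easy to reproduce on a tiny uint8 array. The sketch below catches it as TypeError (NumPy's UFuncTypeError is a subclass of TypeError), then shows that out-of-place division works and returns a new float64 array.

```python
import numpy as np

x = np.array([0, 128, 255], dtype=np.uint8)

try:
    x /= 255.0  # in place: the float64 result cannot be written back into uint8
    in_place_failed = False
except TypeError as err:  # NumPy's UFuncTypeError subclasses TypeError
    in_place_failed = True
    print(type(err).__name__, err)

y = x / 255.0  # out of place: NumPy allocates a new float64 array
print(y.dtype, y)
```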

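The correct way:

import numpy as np

x_train = x_train.astype(np.float32)  # convert to float first (np.float was removed in NumPy 1.24)
x_train /= 255.0
x_test = x_test.astype(np.float32)
x_test /= 255.0

Or:

x_train, x_test = x_train / 255.0, x_test / 255.0  # out-of-place division returns new float64 arrays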
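Putting the corrected steps together: a small helper (the name preprocess is hypothetical) that scales only the image arrays, leaves the integer labels alone, and produces float32, which is Keras's default float type. Synthetic arrays stand in for the real data so the sketch runs without a download.

```python
import numpy as np


def preprocess(x_train, x_test):
    """Scale uint8 pixel arrays (0-255) to float32 values in [0, 1].

    Labels (y_train, y_test) are class indices 0-9 and are deliberately not scaled.
    """
    x_train = x_train.astype(np.float32) / 255.0
    x_test = x_test.astype(np.float32) / 255.0
    return x_train, x_test


# Synthetic stand-ins shaped like small slices of MNIST.
x_train = np.full((4, 28, 28), 255, dtype=np.uint8)
x_test = np.zeros((2, 28, 28), dtype=np.uint8)
y_train = np.array([0, 1, 2, 3], dtype=np.uint8)

x_train, x_test = preprocess(x_train, x_test)
print(x_train.dtype, x_train.max(), x_test.max(), y_train)
```

With real data, preprocess would be called right after (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data().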

The result is shown below:


Copyright notice
This article was written by [A cloud in the sky]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/222/202208100759489594.html
