Getting Started with PyTorch (1): Data Loading
2022-08-08 18:59:00 【Here_SDUT】
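Preface: these are Jupyter notes taken while following the video course "PyTorch深度学习快速入门教程(绝对通俗易懂!)" by 小土堆. Some screenshots come from the slides shown in the videos.
Data: an unorganized pile of raw data, much like a garbage heap.
Dataset: provides a way to fetch each data sample together with its label, in other words a way to hunt for treasure in the garbage heap. It defines how to get each sample and its label, and it reports how many samples there are in total.
DataLoader: packages the samples from a Dataset into the form the network consumes, for example shuffled mini-batches.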
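To make the division of labor concrete, here is a minimal sketch of a Dataset feeding a DataLoader. The tensors are made-up stand-in data, and `TensorDataset` is used only because it is a ready-made Dataset shipped with `torch.utils.data`; it is not the dataset used in the rest of these notes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Five made-up samples with two features each, plus one integer label per sample.
features = torch.arange(10, dtype=torch.float32).reshape(5, 2)
labels = torch.tensor([0, 1, 0, 1, 0])

# Dataset: can be indexed and knows how many samples it holds.
dataset = TensorDataset(features, labels)

# DataLoader: groups the samples into batches of (at most) two.
loader = DataLoader(dataset, batch_size=2, shuffle=False)

batch_shapes = [tuple(x.shape) for x, y in loader]
print(len(dataset), batch_shapes)  # 5 [(2, 2), (2, 2), (1, 2)]
```

With five samples and `batch_size=2`, the last batch holds the single leftover sample, because `drop_last` defaults to `False`.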
from torch.utils.data import Dataset
help(Dataset)
Help on class Dataset in module torch.utils.data.dataset:
class Dataset(typing.Generic)
| An abstract class representing a :class:`Dataset`.
|
| All datasets that represent a map from keys to data samples should subclass
| it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
| data sample for a given key. Subclasses could also optionally overwrite
| :meth:`__len__`, which is expected to return the size of the dataset by many
| :class:`~torch.utils.data.Sampler` implementations and the default options
| of :class:`~torch.utils.data.DataLoader`.
|
| .. note::
| :class:`~torch.utils.data.DataLoader` by default constructs a index
| sampler that yields integral indices. To make it work with a map-style
| dataset with non-integral indices/keys, a custom sampler must be provided.
|
| Method resolution order:
| Dataset
| typing.Generic
| builtins.object
|
| Methods defined here:
|
| __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]'
|
| __getitem__(self, index) -> +T_co
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __orig_bases__ = (typing.Generic[+T_co],)
|
| __parameters__ = (+T_co,)
|
| ----------------------------------------------------------------------
| Class methods inherited from typing.Generic:
|
| __class_getitem__(params) from builtins.type
|
| __init_subclass__(*args, **kwargs) from builtins.type
| This method is called when a class is subclassed.
|
| The default implementation does nothing. It may be
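| overridden to extend subclasses.
As the help text above shows, Dataset is an abstract class. Any class that inherits from Dataset must override the __getitem__ method; other methods, such as __len__, can optionally be overridden as well.
The __getitem__ method returns one data sample given its index (idx).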
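The contract described in the help text can be sketched with a toy subclass. `SquaresDataset` is a hypothetical class invented for this illustration: each sample is just an integer, and its "label" is the square of that integer.

```python
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample idx is paired with the label idx * idx."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, idx):
        # Return one sample and its label for the given index.
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        return idx, idx * idx

    def __len__(self):
        # Report how many samples the dataset holds.
        return self.size


dataset = SquaresDataset(5)
sample, label = dataset[3]
print(len(dataset), sample, label)  # 5 3 9
```

Defining `__getitem__` is what makes `dataset[3]` work, and defining `__len__` is what makes `len(dataset)` work.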
The example here uses the ants and bees dataset. Download link: https://pan.baidu.com/s/1jZoTmoFzaTLWh4lKBHVbEA password: 5suq
# Load an image
from PIL import Image
img_path = "D:/work/StudyCode/jupyter/dataset_for_pytorch_dataloading/train/ants/0013035.jpg"
img = Image.open(img_path)
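img.show()
Two functions from the os library do most of the work:
os.listdir(filepath): lists the entries under filepath and returns their names (not their full paths) as a list.
os.path.join(a, b): joins the paths a and b. Its advantage is that it uses the correct path separator for the current operating system.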
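A small sketch of these two functions working together follows. It builds a throwaway folder with `tempfile`, so the folder layout and the file names `a.jpg` and `b.jpg` are made up for illustration and are not part of the real dataset.

```python
import os
import tempfile

# Build a temporary directory that mimics a train/ants style layout.
with tempfile.TemporaryDirectory() as root_dir:
    label_dir = "ants"
    os.makedirs(os.path.join(root_dir, label_dir))
    for name in ("a.jpg", "b.jpg"):
        # Empty placeholder files are enough to demonstrate the listing.
        open(os.path.join(root_dir, label_dir, name), "wb").close()

    # os.listdir returns bare names in arbitrary order, so sort for stable output.
    file_names = sorted(os.listdir(os.path.join(root_dir, label_dir)))

    # os.path.join rebuilds full paths with the platform's separator.
    full_paths = [os.path.join(root_dir, label_dir, n) for n in file_names]

print(file_names)  # ['a.jpg', 'b.jpg']
```

Because `os.listdir` returns only names, each name has to be joined with its folder again before the file can be opened.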
from torch.utils.data import Dataset
from PIL import Image
import os


class MyData(Dataset):
    def __init__(self, root_dir, label_dir):
        self.root_dir = root_dir
        self.label_dir = label_dir
        # Folder that holds every image of one class, e.g. train/ants
        self.path = os.path.join(self.root_dir, self.label_dir)
        # File names (not full paths) of the images in that folder
        self.img_path = os.listdir(self.path)

    def __getitem__(self, idx):
        img_name = self.img_path[idx]
        img_item_path = os.path.join(self.root_dir, self.label_dir, img_name)
        img = Image.open(img_item_path)
        # The folder name doubles as the label
        label = self.label_dir
        return img, label

    def __len__(self):
        return len(self.img_path)
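One way to try the class out without downloading the dataset is sketched below. It repeats the class in compact form so the block runs on its own, and it generates tiny placeholder PNG images in a temporary folder; the folder contents, image sizes and the `sorted()` call are assumptions made for this illustration. Combining two datasets with `+` relies on the `__add__` method listed in the help output, which returns a `ConcatDataset`.

```python
import os
import tempfile

from PIL import Image
from torch.utils.data import Dataset


class MyData(Dataset):
    """Compact copy of the class above so that this block runs on its own."""

    def __init__(self, root_dir, label_dir):
        self.root_dir = root_dir
        self.label_dir = label_dir
        self.path = os.path.join(root_dir, label_dir)
        # sorted() is an addition here; it only makes the order reproducible.
        self.img_path = sorted(os.listdir(self.path))

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.path, self.img_path[idx]))
        return img, self.label_dir

    def __len__(self):
        return len(self.img_path)


with tempfile.TemporaryDirectory() as root_dir:
    # Stand-in data: two "ants" images and one "bees" image, 4x4 pixels each.
    for label_dir, count in (("ants", 2), ("bees", 1)):
        os.makedirs(os.path.join(root_dir, label_dir))
        for i in range(count):
            Image.new("RGB", (4, 4)).save(os.path.join(root_dir, label_dir, f"{i}.png"))

    ants_dataset = MyData(root_dir, "ants")
    bees_dataset = MyData(root_dir, "bees")

    # Indexing calls __getitem__ and yields an (image, label) pair.
    img, label = ants_dataset[0]
    first_size, first_label = img.size, label
    img.close()

    # Dataset.__add__ returns a ConcatDataset covering both folders.
    train_dataset = ants_dataset + bees_dataset
    train_len = len(train_dataset)
    last_img, last_label = train_dataset[train_len - 1]
    last_img.close()

print(first_size, first_label, train_len, last_label)  # (4, 4) ants 3 bees
```

With the real dataset, `root_dir` would point at the `train` folder and `label_dir` would be `ants` or `bees`.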









