Getting Started with PyTorch (1): Data Loading
2022-08-08 18:59:00 【Here_SDUT】
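Preface: these are the Jupyter notes I took while following the video course "PyTorch深度学习快速入门教程(绝对通俗易懂!)【小土堆】" (a quick-start PyTorch deep learning tutorial by 小土堆). Some of the screenshots come from the course slides.
Data: a pile of unorganized data, like a garbage heap.
Dataset: provides a way to obtain the data and its labels, i.e. to hunt for treasure in the garbage heap. It defines how to fetch each sample together with its label, and tells us how many samples there are in total.
DataLoader: feeds the data to the network in different forms, for example grouped into batches.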
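To make that division of labour concrete, here is a minimal pure-Python sketch of the batching idea behind a DataLoader. It is not PyTorch's implementation: iter_batches, its parameters, and the toy samples list are hypothetical names used only for illustration.

```python
import random


def iter_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of samples, batch_size at a time (hypothetical helper)."""
    indices = list(range(len(dataset)))
    if shuffle:
        # Visit the samples in a random order instead of 0, 1, 2, ...
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # Each sample is fetched by index, exactly as a Dataset would be.
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Toy "dataset": any object with __len__ and integer indexing works here.
samples = [(f"img_{i}.jpg", "ants" if i % 2 == 0 else "bees") for i in range(5)]

batches = list(iter_batches(samples, batch_size=2))
for batch in batches:
    print(batch)
```

The last batch holds a single sample because 5 is not divisible by 2. torch.utils.data.DataLoader exposes the same ideas through its batch_size and shuffle arguments.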
from torch.utils.data import Dataset
help(Dataset)
Help on class Dataset in module torch.utils.data.dataset:
class Dataset(typing.Generic)
| An abstract class representing a :class:`Dataset`.
|
| All datasets that represent a map from keys to data samples should subclass
| it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
| data sample for a given key. Subclasses could also optionally overwrite
| :meth:`__len__`, which is expected to return the size of the dataset by many
| :class:`~torch.utils.data.Sampler` implementations and the default options
| of :class:`~torch.utils.data.DataLoader`.
|
| .. note::
| :class:`~torch.utils.data.DataLoader` by default constructs a index
| sampler that yields integral indices. To make it work with a map-style
| dataset with non-integral indices/keys, a custom sampler must be provided.
|
| Method resolution order:
| Dataset
| typing.Generic
| builtins.object
|
| Methods defined here:
|
| __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]'
|
| __getitem__(self, index) -> +T_co
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __orig_bases__ = (typing.Generic[+T_co],)
|
| __parameters__ = (+T_co,)
|
| ----------------------------------------------------------------------
| Class methods inherited from typing.Generic:
|
| __class_getitem__(params) from builtins.type
|
| __init_subclass__(*args, **kwargs) from builtins.type
| This method is called when a class is subclassed.
|
| The default implementation does nothing. It may be
| overridden to extend subclasses.
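As the help text above shows, Dataset is an abstract class. A class that inherits from Dataset must override the __getitem__ method; __len__ is among the methods that may optionally be overridden.
The __getitem__ method returns one sample for a given index (idx).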
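The contract described above is just Python's indexing protocol. As a minimal sketch, the class below implements __getitem__ and __len__ on top of in-memory lists. It deliberately does not subclass torch.utils.data.Dataset so that it runs without PyTorch installed, and ToyDataset and its contents are made-up names.

```python
class ToyDataset:
    """Hypothetical map-style dataset backed by two parallel lists."""

    def __init__(self, samples, labels):
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.samples = samples
        self.labels = labels

    def __getitem__(self, idx):
        # Called for ds[idx]; returns one (sample, label) pair.
        return self.samples[idx], self.labels[idx]

    def __len__(self):
        # Called for len(ds); reports the number of samples.
        return len(self.samples)


ds = ToyDataset(["a.jpg", "b.jpg", "c.jpg"], ["ants", "bees", "ants"])
first = ds[0]
size = len(ds)
print(first, size)
```

A real dataset would keep the same two methods and only change where the samples come from, for example image files on disk.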
The example here uses the ants-and-bees dataset. Download link: https://pan.baidu.com/s/1jZoTmoFzaTLWh4lKBHVbEA password: 5suq
# Load an image
from PIL import Image
img_path = "D:/work/StudyCode/jupyter/dataset_for_pytorch_dataloading/train/ants/0013035.jpg"
img = Image.open(img_path)
img.show()
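Two functions from the os library do most of the work:
os.listdir(filepath): lists the entries directly under filepath and returns their names as a list.
os.path.join(a, b): joins the paths a and b. Its advantage is that it inserts the path separator appropriate for the current operating system.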
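A short sketch of the two calls, run against a throwaway directory so that it does not depend on the dataset being downloaded. The train/ants folder and the file names are created by the example itself and are purely illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build <root>/train/ants with two empty placeholder files.
    ants_dir = os.path.join(root, "train", "ants")
    os.makedirs(ants_dir)
    for name in ("0001.jpg", "0002.jpg"):
        open(os.path.join(ants_dir, name), "w").close()

    # os.listdir returns names only, in arbitrary order, so sort for display.
    names = sorted(os.listdir(ants_dir))
    # Join the folder and a file name to get a path that can be opened.
    first_path = os.path.join(ants_dir, names[0])
    first_exists = os.path.isfile(first_path)

print(names)
print(first_exists)
```

Because os.listdir returns bare file names, each name has to be joined back onto its folder before the file can be opened.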
from torch.utils.data import Dataset
from PIL import Image
import os


class MyData(Dataset):
    def __init__(self, root_dir, label_dir):
        self.root_dir = root_dir
        self.label_dir = label_dir
        # Folder holding the images of one class, e.g. <root_dir>/ants
        self.path = os.path.join(self.root_dir, self.label_dir)
        # Names of all entries in that folder
        self.img_path = os.listdir(self.path)

    def __getitem__(self, idx):
        img_name = self.img_path[idx]
        img_item_path = os.path.join(self.root_dir, self.label_dir, img_name)
        img = Image.open(img_item_path)
        # The folder name doubles as the label
        label = self.label_dir
        return img, label

    def __len__(self):
        return len(self.img_path)