Deep learning: parameter tuning tips
2022-04-23 15:27:00 【moletop】
How to tune the parameters:
- Choose a suitable batch size.
- Choose a suitable number of epochs: watch how training converges and stop before the model overfits.
- Decide whether to add batch normalization.
- Add dropout if needed.
- Activation function choice: except in places such as gates, where the output must be limited to the range 0 to 1, try not to use sigmoid; use an activation function such as tanh or ReLU instead. There are two reasons. 1. The sigmoid function has a relatively large gradient only on the interval from -4 to 4. Outside that range the gradient is close to 0, which easily causes vanishing gradients. 2. Even when the input has zero mean, the output of the sigmoid function does not have zero mean.
- Loss function: train one round with a regularization term and one round without it.
- Optimizer choice: in experiments on small datasets, Adam, Adadelta and similar optimizers did not perform as well as SGD. SGD converges more slowly, but the result it finally converges to is generally better. If you use SGD, you can start with a learning rate of 1.0 or 0.1, check the validation set after a while, and halve the learning rate if the cost has stopped decreasing. Many papers do this, and the experimental results are very good. You can also start with an optimizer from the Ada family and switch to SGD toward the end of training, which also brings improvements. Adadelta is said to generally work better on classification problems, and Adam on generation problems.
- Ensemble the models, in any of the following ways:
- Same parameters, different initialization methods.
- Different parameters: use cross-validation to choose the best few groups.
A detailed explanation of k-fold cross-validation: https://www.cnblogs.com/henuliulei/p/13686046.html
- Same parameters, different stages of model training, that is, models saved at different iteration counts.
- Different models combined by linear fusion, for example an RNN and a traditional model.
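The linear fusion of different models mentioned in the list can be sketched as a weighted average of per-model predictions. This is only a minimal illustration: the function name `linear_fusion`, the example probabilities, and the weights 0.6 and 0.4 are assumptions made up for the sketch, not values from the original post. In practice the weights would usually be chosen on a validation set.

```python
from typing import List, Sequence


def linear_fusion(predictions: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Combine per-model predictions with a weighted average.

    predictions[m][i] is the score model m gives to output i.
    The weights are normalized by their sum, so they need not add up to 1.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_outputs = len(predictions[0])
    if any(len(p) != n_outputs for p in predictions):
        raise ValueError("all models must predict the same number of outputs")
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_outputs)
    ]


# Hypothetical class probabilities from two models for a single sample.
rnn_probs = [0.7, 0.2, 0.1]
traditional_probs = [0.5, 0.4, 0.1]

fused = linear_fusion([rnn_probs, traditional_probs], weights=[0.6, 0.4])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

The same function also covers the other ensemble variants in the list: pass in the predictions of models trained from different initializations, or of checkpoints saved at different iterations, with equal weights.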
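The SGD advice in the list (start from a learning rate such as 0.1, check the validation set periodically, and halve the rate when the cost stops decreasing) could look roughly like the sketch below. The function name `halve_on_plateau`, the `min_delta` parameter, and the validation costs are hypothetical and exist only for illustration; a real training loop would compute the cost from a held-out set at each check.

```python
from typing import List


def halve_on_plateau(val_costs: List[float],
                     initial_lr: float = 0.1,
                     min_delta: float = 0.0) -> List[float]:
    """Return the learning rate in effect after each validation check.

    The rate is halved whenever the validation cost fails to improve on
    the best cost seen so far by more than min_delta.
    """
    lr = initial_lr
    best = float("inf")
    schedule = []
    for cost in val_costs:
        if cost < best - min_delta:
            best = cost      # cost improved: keep the current rate
        else:
            lr /= 2.0        # no improvement: cut the rate in half
        schedule.append(lr)
    return schedule


# Made-up validation costs, one per periodic check.
costs = [0.90, 0.70, 0.72, 0.65, 0.66, 0.66]
schedule = halve_on_plateau(costs, initial_lr=0.1)
print(schedule)
```

With these made-up costs the rate stays at 0.1 while the cost improves and is halved at the third, fifth, and sixth checks. Deep learning frameworks generally ship a scheduler of this kind, often with a patience setting that waits several checks before reducing the rate.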
Copyright notice
This article was written by [moletop]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204231523160668.html