Deep reinforcement learning library built on top of Neural Network Libraries

Overview

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

NNablaRL requires Python >= 3.6 and NNabla >= 1.17.

Enabling GPU acceleration (Optional)

NNablaRL algorithms run on CPU by default. To run an algorithm on a GPU, first install nnabla-ext-cuda as follows, replacing [cuda-version] with the CUDA version installed on your machine.

$ pip install nnabla-ext-cuda[cuda-version]
# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110

After installing nnabla-ext-cuda, set the GPU id through the algorithm's configuration to run the algorithm on that device.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...

Features

Friendly API

NNablaRL has a friendly Python API which enables you to start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

To get more details about NNablaRL, see the documentation and examples.

Many built-in algorithms

Most famous/SOTA deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are implemented in NNablaRL. Implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations.

For the list of implemented algorithms, see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get exactly the same results when running the reproduction code on your computer. The results may vary slightly depending on your machine, the nnabla/nnabla-rl package versions, etc.
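
For example, switching to another built-in algorithm such as SAC keeps the same high-level API. Below is a minimal sketch (the choice of Pendulum-v0 as the environment is an illustrative assumption):

import gym

import nnabla_rl.algorithms as A

env = gym.make("Pendulum-v0")  # any continuous action space env works with SAC
sac = A.SAC(env)  # constructed just like DQN
sac.train_online(env, total_iterations=100000)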

Seamless switching of online and offline training

In reinforcement learning, there are two main training procedures, online and offline, for training an agent. Online training alternates between data collection and network updates. In contrast, offline training updates the network using only existing data. With NNablaRL, you can switch between these two training procedures seamlessly. For example, as shown below, you can easily train a robot's controller online using a simulated environment and fine-tune it offline with a real robot dataset.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example, assuming that a simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example, assuming that you have real robot data
# fine-tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)

Getting started

Try the interactive demos below to get started.
You can run them directly on Colab from the links in the table below.

| Title | Notebook | Target RL task |
| --- | --- | --- |
| Simple reinforcement learning training to get started | Open In Colab | Pendulum |
| Learn how to use training algorithms | Open In Colab | Pendulum |
| Learn how to use customized network model for training | Open In Colab | Mountain car |
| Learn how to use different network solver for training | Open In Colab | Pendulum |
| Learn how to use different replay buffer for training | Open In Colab | Pendulum |
| Learn how to use your own environment for training | Open In Colab | Customized environment |
| Atari game training example | Open In Colab | Atari games |

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License, Version 2.0.

Comments
  • Update cem function interface

    Updated the interface of the cross entropy method (CEM) functions. The argument pop_size has been renamed to sample_size. In addition, the objective function given to the CEM function is now called with a variable x of shape (batch_size, sample_size, x_dim), which differs from the previous interface. For details, please see the function docs.
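
    Under the new interface, the objective function must handle the extra sample dimension. A minimal sketch of such an objective (plain numpy and the function name are illustrative assumptions, not the library API):

    import numpy as np

    def objective(x: np.ndarray) -> np.ndarray:
        # x has shape (batch_size, sample_size, x_dim) under the new interface.
        # Return one score per candidate sample, reduced over the last axis.
        return -np.sum(x ** 2, axis=-1)  # shape: (batch_size, sample_size)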

    opened by sbsekiguchi 1
  • Add implementation for RNN support and DRQN algorithm

    Add RNN model support and DRQN algorithm.

    The following trainers will support RNN models.

    • Q value-based trainers
    • Deterministic gradient and Soft policy trainers

    Other trainers may support RNN models in the future, but this is not implemented in the initial release.

    See this paper for the details of the DRQN algorithm.
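
    As a usage sketch, assuming the new algorithm is exposed as A.DRQN and follows the same construction pattern as the other algorithms in this README (the environment name and iteration count are illustrative):

    import nnabla_rl.algorithms as A
    from nnabla_rl.utils.reproductions import build_atari_env

    env = build_atari_env("BreakoutNoFrameskip-v4")
    drqn = A.DRQN(env)  # assumed entry point; DRQN trains a recurrent Q-function
    drqn.train_online(env, total_iterations=50000)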

    opened by ishihara-y 1
  • Implement SACD

    This PR implements the SAC-D algorithm: https://arxiv.org/abs/2206.13901

    These changes have been made:

    • New environments with factored reward functions have been added
      • FactoredLunarLanderContinuousV2NNablaRL-v1
      • FactoredAntV4NNablaRL-v1
      • FactoredHopperV4NNablaRL-v1
      • FactoredHalfCheetahV4NNablaRL-v1
      • FactoredWalker2dV4NNablaRL-v1
      • FactoredHumanoidV4NNablaRL-v1
    • SACD algorithm has been added
    • SoftQDTrainer has been added
    • _InfluenceMetricsEvaluator has been added
    • Reproduction script has been added (not benchmarked yet)

    Visualizing influence metrics:

    import gym
    
    import numpy as np
    import matplotlib.pyplot as plt
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    
    evaluation_hook = H.EvaluationHook(
        eval_env,
        EpisodicEvaluator(run_per_evaluation=10),
        timing=5000,
        writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
    )
    iteration_num_hook = H.IterationNumHook(timing=100)
    
    config = A.SACDConfig(gpu_id=0, reward_dimension=9)
    sacd = A.SACD(env, config=config)
    sacd.set_hooks([iteration_num_hook, evaluation_hook])
    sacd.train_online(env, total_iterations=100000)
    
    influence_history = []
    
    state = env.reset()
    while True:
        action = sacd.compute_eval_action(state)
        influence = sacd.compute_influence_metrics(state, action)
        influence_history.append(influence)
        state, _, done, _ = env.step(action)
        if done:
            break
    
    influence_history = np.array(influence_history)
    for i, label in enumerate(["position", "velocity", "angle", "left_leg", "right_leg", "main_engine", "side_engine", "failure", "success"]):
        plt.plot(influence_history[:, i], label=label)
    plt.xlabel("step")
    plt.ylabel("influence metrics")
    plt.legend()
    plt.show()
    


    [sample animation]

    opened by ishihara-y 0
  • Add gmm and Update gaussian

    Added GMM and Gaussian numpy models. In addition, updated the Gaussian distribution's API.

    The API change is as follows:

    Previous:

    import numpy as np

    import nnabla as nn
    import nnabla_rl.distributions as D

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    distribution = D.Gaussian(mean, ln_var)
    # return nn.Variable
    assert isinstance(distribution.sample(), nn.Variable)
    

    Updated:

    import numpy as np

    import nnabla as nn
    import nnabla_rl.distributions as D

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    # You have to pass nn.Variable instances if you want all class methods to return nn.Variable.
    distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
    assert isinstance(distribution.sample(), nn.Variable)
    
    # If you pass np.ndarray, then all class methods return np.ndarray
    # Currently, only inputs without a batch dimension are supported (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
    distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
    assert isinstance(distribution.sample(), np.ndarray)
    
    opened by sbsekiguchi 0
  • Support nnabla-browser

    • [x] add MonitorWriter
    • [x] save computational graph as nntxt

    Example:

    import gym
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    # save training computational graph
    training_graph_hook = H.TrainingGraphHook(outdir="test")
    
    # evaluation hook with nnabla's Monitor
    eval_env = gym.make("Pendulum-v0")
    evaluator = EpisodicEvaluator(run_per_evaluation=10)
    evaluation_hook = H.EvaluationHook(
        eval_env,
        evaluator,
        timing=10,
        writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
    )
    
    env = gym.make("Pendulum-v0")
    sac = A.SAC(env)
    sac.set_hooks([training_graph_hook, evaluation_hook])
    
    sac.train_online(env, total_iterations=100)
    


    opened by ishihara-y 0
  • Add iLQR and LQR

    Implementation of Linear Quadratic Regulator (LQR) and iterative LQR algorithms.
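
    As background, a discrete-time finite-horizon LQR computes its feedback gains with a backward Riccati recursion. Below is a minimal numpy sketch of that recursion (illustrative background only, not the API added in this PR):

    import numpy as np

    def lqr_gains(A, B, Q, R, horizon):
        # Backward Riccati recursion for dynamics x' = A x + B u
        # with stage cost x^T Q x + u^T R u.
        P = Q
        gains = []
        for _ in range(horizon):
            # Optimal gain: K = (R + B^T P B)^{-1} B^T P A
            K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            # Cost-to-go update
            P = Q + A.T @ P @ (A - B @ K)
            gains.append(K)
        gains.reverse()
        return gains  # control law: u_t = -gains[t] @ x_t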

    Co-authored-by: Yu Ishihara [email protected]
    Co-authored-by: Shunichi Sekiguchi [email protected]

    opened by ishihara-y 0
  • Check np_random instance and use correct randint alternative

    I am not sure when this change was made, but in some environments, env.unwrapped.np_random returns a Generator instead of a RandomState.

    # in case of RandomState, this line works
    env.unwrapped.np_random.randint(...)
    # in case of Generator, randint does not exist
    # and we must use integers as an alternative
    env.unwrapped.np_random.integers(...)
    

    This PR fixes this issue by choosing the correct function for sampling integers.
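
    A minimal sketch of such a check (assuming env, low, and high are defined; the actual implementation in the PR may differ):

    import numpy as np

    rng = env.unwrapped.np_random
    if isinstance(rng, np.random.Generator):
        # newer gym versions return a numpy Generator
        value = rng.integers(low, high)
    else:
        # older gym versions return a legacy RandomState
        value = rng.randint(low, high)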

    opened by ishihara-y 0
  • Add icra2018 qtopt

    opened by sbsekiguchi 0
Releases
v0.12.0

Owner
Sony Group Corporation