Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7-minute video version: DQN for flappy bird

Overview

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that the same learning algorithm can be generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in emulator and observe r_t and s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j for terminal s_(j+1)
            r_j + γ * max_a' Q(s_(j+1), a'; θ) for non-terminal s_(j+1)
        Perform a gradient step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
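
To make the update concrete, here is a minimal NumPy sketch of the target computation y_j inside the loop above. The q_values callable stands in for the network Q(s, a; θ) and is hypothetical, not a function from this repo; the discount factor is an assumed value.

import numpy as np

GAMMA = 0.99  # assumed discount factor; the repo's value may differ

def td_targets(q_values, rewards, next_states, terminals):
    """Compute y_j for a sampled minibatch, following the pseudo-code above."""
    next_q = q_values(next_states)             # shape: (batch, num_actions)
    max_next_q = next_q.max(axis=1)            # max_a' Q(s_(j+1), a'; θ)
    # y_j = r_j for terminal transitions, r_j + γ * max_a' Q otherwise
    not_terminal = 1.0 - terminals.astype(np.float32)
    return rewards + not_terminal * GAMMA * max_next_q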

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process can be visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a short sketch follows the list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack the last 4 frames to produce an 80x80x4 input array for the network
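
A minimal sketch of these preprocessing steps using OpenCV and NumPy. The function and variable names are illustrative only; the actual preprocessing code in deep_q_network.py may be organized differently.

import cv2
import numpy as np
from collections import deque

def preprocess_frame(frame):
    # Steps 1 and 2: grayscale and resize a raw RGB game frame to 80x80.
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (80, 80))

# Step 3: keep the last 4 processed frames and stack them into the 80x80x4 input.
frame_history = deque(maxlen=4)

def make_state(frame):
    frame_history.append(preprocess_frame(frame))
    while len(frame_history) < 4:                 # pad at the start of an episode
        frame_history.append(frame_history[-1])
    return np.stack(list(frame_history), axis=2)  # shape: (80, 80, 4)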

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.
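
Below is a rough sketch of this architecture written against the TensorFlow 1.x graph API (biases omitted for brevity). The repo targets a much older TensorFlow release, so the actual code in deep_q_network.py is organized differently; only the layer shapes follow the description above.

import numpy as np
import tensorflow as tf  # requires the 1.x-style API (or tf.compat.v1)

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.01))

def conv(x, kernel, stride):
    return tf.nn.relu(tf.nn.conv2d(x, weight(kernel),
                                   strides=[1, stride, stride, 1], padding="SAME"))

def pool(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

def build_q_network(num_actions):
    s = tf.placeholder(tf.float32, [None, 80, 80, 4])   # stacked 80x80x4 input frames
    h = pool(conv(s, [8, 8, 4, 32], 4))                  # 8x8x4x32 kernel, stride 4, 2x2 pool
    h = pool(conv(h, [4, 4, 32, 64], 2))                 # 4x4x32x64 kernel, stride 2, 2x2 pool
    h = pool(conv(h, [3, 3, 64, 64], 1))                 # 3x3x64x64 kernel, stride 1, 2x2 pool
    flat_dim = int(np.prod(h.get_shape().as_list()[1:]))
    h = tf.nn.relu(tf.matmul(tf.reshape(h, [-1, flat_dim]), weight([flat_dim, 256])))
    q = tf.matmul(h, weight([256, num_actions]))         # one Q-value per valid action
    return s, q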

The final output layer has the same dimensionality as the number of valid actions which can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function given the input state for each valid action. At each time step, the network performs whichever action corresponds to the highest Q value using an ϵ-greedy policy.
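
A minimal sketch of ϵ-greedy action selection over this output; select_action and its arguments are illustrative. Flappy Bird has two valid actions (index 0: do nothing, index 1: flap).

import random
import numpy as np

def select_action(q_values, epsilon, num_actions=2):
    # With probability ϵ explore with a random action, otherwise exploit the highest Q-value.
    if random.random() <= epsilon:
        return random.randrange(num_actions)
    return int(np.argmax(q_values))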

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.
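
A minimal sketch of this setup, assuming a deque-based replay memory (the actual data structure in deep_q_network.py may differ):

from collections import deque

REPLAY_MEMORY_SIZE = 50000                        # maximum number of stored experiences

# Old experiences are discarded automatically once the deque is full.
replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)

def store_transition(state, action, reward, next_state, terminal):
    replay_memory.append((state, action, reward, next_state, terminal))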

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the next 3,000,000 frames. The reason I set it this way is that the agent can choose an action every 0.03 s (FPS = 30) in our game, so a high ϵ makes it flap too much, keeping it at the top of the screen until it eventually bumps into a pipe in a clumsy way. This makes the Q function converge relatively slowly, since it only starts to explore other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.
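
The schedule itself is a simple linear interpolation. The constant names below mirror the parameters listed under "How to reproduce?"; the helper function is illustrative, not code from the repo.

INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001
EXPLORE = 3000000        # number of frames over which ϵ is annealed

def annealed_epsilon(frame):
    if frame >= EXPLORE:
        return FINAL_EPSILON
    return INITIAL_EPSILON - (INITIAL_EPSILON - FINAL_EPSILON) * frame / EXPLORE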

During training time, at each time step, the network samples minibatches of size 32 from the replay memory to train on, and performs a gradient step on the loss function described above using the Adam optimization algorithm with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely, with ϵ fixed at 0.0001.
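
A sketch of the corresponding loss and Adam update, again in the TensorFlow 1.x style; build_train_op, its arguments, and the one-hot action encoding are illustrative rather than lifted from deep_q_network.py.

import tensorflow as tf  # requires the 1.x-style API (or tf.compat.v1)

BATCH_SIZE = 32           # minibatch size sampled uniformly from the replay memory
LEARNING_RATE = 0.000001  # Adam learning rate described above

def build_train_op(q_output, num_actions):
    # q_output is the network's per-action Q-value tensor, e.g. from build_q_network above.
    a = tf.placeholder(tf.float32, [None, num_actions])  # chosen actions, one-hot encoded
    y = tf.placeholder(tf.float32, [None])                # targets y_j from the pseudo-code
    q_chosen = tf.reduce_sum(q_output * a, axis=1)        # Q(s_j, a_j; θ) for the taken actions
    loss = tf.reduce_mean(tf.square(y - q_chosen))
    train_step = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss)
    return a, y, train_step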

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines (the ones that restore the pretrained network)

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518:529–533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result

Disclaimer

This work is based heavily on the following repos:

  1. sourabhv/FlapPyBird (https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames