An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
Out-of-the-box support register, sign in, email verification and password recovery workflows for websites based on Django and MongoDB

Using djmongoauth What is it? djmongoauth provides out-of-the-box support for basic user management and additional operations including user registrat

hao 3 Oct 21, 2021
Authentication for Django Rest Framework

Dj-Rest-Auth Drop-in API endpoints for handling authentication securely in Django Rest Framework. Works especially well with SPAs (e.g React, Vue, Ang

Michael 1.1k Jan 03, 2023
Django-registration (redux) provides user registration functionality for Django websites.

Description: Django-registration provides user registration functionality for Django websites. maintainers: Macropin, DiCato, and joshblum contributor

Andrew Cutler 920 Jan 08, 2023
JSON Web Token Authentication support for Django REST Framework

REST framework JWT Auth Notice This project is currently unmaintained. Check #484 for more details and suggested alternatives. JSON Web Token Authenti

José Padilla 3.2k Dec 31, 2022
Authentication Module for django rest auth

django-rest-knox Authentication Module for django rest auth Knox provides easy to use authentication for Django REST Framework The aim is to allow for

James McMahon 878 Jan 04, 2023
API-key based security utilities for FastAPI, focused on simplicity of use

FastAPI simple security API key based security package for FastAPI, focused on simplicity of use: Full functionality out of the box, no configuration

Tolki 154 Jan 03, 2023
Cack facebook tidak login

Cack facebook tidak login

Angga Kurniawan 5 Dec 12, 2021
Get inside your stronghold and make all your Django views default login_required

Stronghold Get inside your stronghold and make all your Django views default login_required Stronghold is a very small and easy to use django app that

Mike Grouchy 384 Nov 23, 2022
Authware API wrapper for Python 3.5+

AuthwarePy Asynchronous wrapper for Authware in Python 3.5+ View our documentation 📲 Installation Run this to install the library via pip: pip instal

Authware 3 Feb 09, 2022
OpenStack Keystone auth plugin for HTTPie

httpie-keystone-auth OpenStack Keystone auth plugin for HTTPie. Installation $ pip install --upgrade httpie-keystone-auth You should now see keystone

Pavlo Shchelokovskyy 1 Oct 20, 2021
Implementation of Supervised Contrastive Learning with AMP, EMA, SWA, and many other tricks

SupCon-Framework The repo is an implementation of Supervised Contrastive Learning. It's based on another implementation, but with several differencies

Ivan Panshin 132 Dec 14, 2022
Graphical Password Authentication System.

Graphical Password Authentication System. This is used to increase the protection/security of a website. Our system is divided into further 4 layers of protection. Each layer is totally different and

Hassan Shahzad 12 Dec 16, 2022
Awesome Django authorization, without the database

rules rules is a tiny but powerful app providing object-level permissions to Django, without requiring a database. At its core, it is a generic framew

1.6k Dec 30, 2022
Automatizando a criação de DAGs usando Jinja e YAML

Automatizando a criação de DAGs no Airflow usando Jinja e YAML Arquitetura do Repo: Pastas por contexto de negócio (ex: Marketing, Analytics, HR, etc)

Arthur Henrique Dell' Antonia 5 Oct 19, 2021
A JOSE implementation in Python

python-jose A JOSE implementation in Python Docs are available on ReadTheDocs. The JavaScript Object Signing and Encryption (JOSE) technologies - JSON

Michael Davis 1.2k Dec 28, 2022
An open source Flask extension that provides JWT support (with batteries included)!

Flask-JWT-Extended Features Flask-JWT-Extended not only adds support for using JSON Web Tokens (JWT) to Flask for protecting views, but also many help

Landon Gilbert-Bland 1.4k Jan 04, 2023
Django Admin Two-Factor Authentication, allows you to login django admin with google authenticator.

Django Admin Two-Factor Authentication Django Admin Two-Factor Authentication, allows you to login django admin with google authenticator. Why Django

Iman Karimi 9 Dec 07, 2022
examify-io is an online examination system that offers automatic grading , exam statistics , proctoring and programming tests , multiple user roles

examify-io is an online examination system that offers automatic grading , exam statistics , proctoring and programming tests , multiple user roles ( Examiner , Supervisor , Student )

Ameer Nasser 4 Oct 28, 2021
A Python inplementation for OAuth2

OAuth2-Python Discord Inplementation for OAuth2 login systems. This is a simple Python 'app' made to inplement in your programs that require (shitty)

Prifixy 0 Jan 06, 2022
A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

Aman Raj 5 May 10, 2022