An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
A flask extension for managing permissions and scopes

Flask-Pundit A simple flask extension to organize resource authorization and scoping. This extension is heavily inspired by the ruby Pundit library. I

Anurag Chaudhury 49 Dec 23, 2022
A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

Aman Raj 5 May 10, 2022
Todo app with authentication system.

todo list web app with authentication system. User can register, login, logout. User can login and create, delete, update task Home Page here you will

Anurag verma 3 Aug 18, 2022
A simple model based API maker written in Python and based on Django and Django REST Framework

Fast DRF Fast DRF is a small library for making API faster with Django and Django REST Framework. It's easy and configurable. Full Documentation here

Mohammad Ashraful Islam 18 Oct 05, 2022
API with high performance to create a simple blog and Auth using OAuth2 ⛏

DogeAPI API with high performance built with FastAPI & SQLAlchemy, help to improve connection with your Backend Side to create a simple blog and Cruds

Yasser Tahiri 111 Jan 05, 2023
Object Moderation Layer

django-oml Welcome to the documentation for django-oml! OML means Object Moderation Layer, the idea is to have a mixin model that allows you to modera

Angel Velásquez 12 Aug 22, 2019
Strong, Simple, and Precise security for Flask APIs (using jwt)

flask-praetorian Strong, Simple, and Precise security for Flask APIs API security should be strong, simple, and precise like a Roman Legionary. This p

Tucker Beck 321 Dec 18, 2022
Ready to use and customizable Authentications and Authorisation management for FastAPI ⚡

AuthenticationX 💫 Ready-to-use and customizable Authentications and Oauth2 management for FastAPI ⚡

Yasser Tahiri 408 Jan 05, 2023
JWT authentication for Pyramid

JWT authentication for Pyramid This package implements an authentication policy for Pyramid that using JSON Web Tokens. This standard (RFC 7519) is of

Wichert Akkerman 73 Dec 03, 2021
Multi-user accounts for Django projects

django-organizations Summary Groups and multi-user account management Author Ben Lopatin (http://benlopatin.com) Status Separate individual user ident

Ben Lopatin 1.1k Jan 02, 2023
Django Admin Two-Factor Authentication, allows you to login django admin with google authenticator.

Django Admin Two-Factor Authentication Django Admin Two-Factor Authentication, allows you to login django admin with google authenticator. Why Django

Iman Karimi 9 Dec 07, 2022
Luca Security Concept

Luca Security Concept This is the document source of luca's security concept. Please go here for the HTML version: https://luca-app.de/securityconcept

luca 43 Oct 22, 2022
A recipe sharing API built using Django rest framework.

Recipe Sharing API This is the backend API for the recipe sharing platform at https://mesob-recipe.netlify.app/ This API allows users to share recipes

Hannah 21 Dec 30, 2022
FastAPI extension that provides JWT Auth support (secure, easy to use, and lightweight)

FastAPI JWT Auth Documentation: https://indominusbyte.github.io/fastapi-jwt-auth Source Code: https://github.com/IndominusByte/fastapi-jwt-auth Featur

Nyoman Pradipta Dewantara 468 Jan 01, 2023
Simple yet powerful authorization / authentication client library for Python web applications.

Authomatic Authomatic is a framework agnostic library for Python web applications with a minimalistic but powerful interface which simplifies authenti

1k Dec 28, 2022
Authware API wrapper for Python 3.5+

AuthwarePy Asynchronous wrapper for Authware in Python 3.5+ View our documentation 📲 Installation Run this to install the library via pip: pip instal

Authware 3 Feb 09, 2022
Authentication with fastapi and jwt cd realistic

Authentication with fastapi and jwt cd realistic Dependencies bcrypt==3.1.7 data

Fredh Macau 1 Jan 04, 2022
Django Authetication with Twitch.

Django Twitch Auth Dependencies Install requests if not installed pip install requests Installation Install using pip pip install django_twitch_auth A

Leandro Lopes Bueno 1 Jan 02, 2022
Local server that gives you your OAuth 2.0 tokens needed to interact with the Conta Azul's API

What's this? This is a django project meant to be run locally that gives you your OAuth 2.0 tokens needed to interact with Conta Azul's API Prerequisi

Fábio David Freitas 3 Apr 13, 2022
This script helps you log in to your LMS account and enter the currently running session

This script helps you log in to your LMS account and enter the currently running session, all in a second

Ali Ebrahimi 5 Sep 01, 2022