An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
A JSON Web Token authentication plugin for the Django REST Framework.

Simple JWT Abstract Simple JWT is a JSON Web Token authentication plugin for the Django REST Framework. For full documentation, visit django-rest-fram

Simple JWT 3.3k Jan 01, 2023
Customizable User Authorization & User Management: Register, Confirm, Login, Change username/password, Forgot password and more.

Flask-User v1.0 Attention: Flask-User v1.0 is a Production/Stable version. The previous version is Flask-User v0.6. User Authentication and Management

Ling Thio 997 Jan 06, 2023
Some scripts to utilise device code authorization for phishing.

OAuth Device Code Authorization Phishing Some scripts to utilise device code authorization for phishing. High level overview as per the instructions a

Daniel Underhay 6 Oct 03, 2022
Generate payloads that force authentication against an attacker machine

Hashgrab Generates scf, url & lnk payloads to put onto a smb share. These force authentication to an attacker machine in order to grab hashes (for exa

xct 35 Dec 20, 2022
RSA Cryptography Authentication Proof-of-Concept

RSA Cryptography Authentication Proof-of-Concept This project was a request by Structured Programming lectures in Computer Science college. It runs wi

Dennys Marcos 1 Jan 22, 2022
Kube OpenID Connect is an application that can be used to easily enable authentication flows via OIDC for a kubernetes cluster

Kube OpenID Connect is an application that can be used to easily enable authentication flows via OIDC for a kubernetes cluster. Kubernetes supports OpenID Connect Tokens as a way to identify users wh

7 Nov 20, 2022
Graphical Password Authentication System.

Graphical Password Authentication System. This is used to increase the protection/security of a website. Our system is divided into further 4 layers of protection. Each layer is totally different and

Hassan Shahzad 12 Dec 16, 2022
Out-of-the-box support register, sign in, email verification and password recovery workflows for websites based on Django and MongoDB

Using djmongoauth What is it? djmongoauth provides out-of-the-box support for basic user management and additional operations including user registrat

hao 3 Oct 21, 2021
python implementation of JSON Web Signatures

python-jws 🚨 This is Unmaintained 🚨 This library is unmaintained and you should probably use For histo

Brian J Brennan 57 Apr 18, 2022
Local server that gives you your OAuth 2.0 tokens needed to interact with the Conta Azul's API

What's this? This is a django project meant to be run locally that gives you your OAuth 2.0 tokens needed to interact with Conta Azul's API Prerequisi

Fábio David Freitas 3 Apr 13, 2022
Basic auth for Django.

easy-basicauth WARNING! THIS LIBRARY IS IN PROGRESS! ANYTHING CAN CHANGE AT ANY MOMENT WITHOUT ANY NOTICE! Installation pip install easy-basicauth Usa

bichanna 2 Mar 25, 2022
Complete Two-Factor Authentication for Django providing the easiest integration into most Django projects.

Django Two-Factor Authentication Complete Two-Factor Authentication for Django. Built on top of the one-time password framework django-otp and Django'

Bouke Haarsma 1.3k Jan 04, 2023
This is a Python library for accessing resources protected by OAuth 2.0.

This is a client library for accessing resources protected by OAuth 2.0. Note: oauth2client is now deprecated. No more features will be added to the l

Google APIs 787 Dec 13, 2022
JWT Key Confusion PoC (CVE-2015-9235) Written for the Hack the Box challenge - Under Construction

JWT Key Confusion PoC (CVE-2015-9235) Written for the Hack the Box challenge - Under Construction This script performs a Java Web Token Key Confusion

Alex Fronteddu 1 Jan 13, 2022
Djagno grpc authentication service with jwt auth

Django gRPC authentication service STEP 1: Install packages pip install -r requirements.txt STEP 2: Make migrations and migrate python manage.py makem

Saeed Hassani Borzadaran 3 May 16, 2022
Automatic login utility of free Wi-Fi captive portals

wicafe Automatic login utility of free Wi-Fi captive portals Disclaimer: read and grant the Terms of Service of Wi-Fi services before using it! This u

Takumi Sueda 8 May 31, 2022
Simple extension that provides Basic, Digest and Token HTTP authentication for Flask routes

Flask-HTTPAuth Simple extension that provides Basic and Digest HTTP authentication for Flask routes. Installation The easiest way to install this is t

Miguel Grinberg 1.1k Jan 05, 2023
A Python package, that allows you to acquire your RecNet authorization bearer token with your account credentials!

RecNet-Login This is a Python package, that allows you to acquire your RecNet bearer token with your account credentials! Installation Done via git: p

Jesse 6 Aug 18, 2022
Login qr line & qr image

login-qr-line-qr-image login qr line & qr image python3 & linux ubuntu api source: https://github.com/hert0t/BEAPI-BETA import httpx import qrcode fro

Alif Budiman 1 Dec 27, 2021
Authentication, JWT, and permission scoping for Sanic

Sanic JWT Sanic JWT adds authentication protection and endpoints to Sanic. It is both easy to get up and running, and extensible for the developer. It

Adam Hopkins 229 Jan 05, 2023