Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models

License

NotificationsYou must be signed in to change notification settings

anassinator/ilqr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

https://travis-ci.org/anassinator/ilqr.svg?branch=master

This is an implementation of the Iterative Linear Quadratic Regulator (iLQR)for non-linear trajectory optimization based on Yuval Tassa'spaper.

It is compatible with both Python 2 and 3 and has built-in support forauto-differentiating both the dynamics model and the cost function usingTheano.

Install

To install, clone and run:

python setup.py install

You may also install the dependencies with pipenv as follows:

pipenv install

Usage

After installing,import as follows:

fromilqrimportiLQR

You can see theexamples directory forJupyter notebooks to see how common control problemscan be solved through iLQR.

Dynamics model

You can set up your own dynamics model by either extending theDynamicsclass and hard-coding it and its partial derivatives. Alternatively, you canwrite it up as a Theano expression and use theAutoDiffDynamics classfor it to be auto-differentiated. Finally, if all you have is a function, youcan use theFiniteDiffDynamics class to approximate the derivativeswith finite difference approximation.

This section demonstrates how to implement the following dynamics model:

m \dot{v} = F - \alpha v

wherem is the object's mass inkg,alpha is thefriction coefficient,v is the object's velocity inm/s,\dot{v} is the object's acceleration inm/s^2, andF isthe control (or force) you're applying to the object inN.

Automatic differentiation

importtheano.tensorasTfromilqr.dynamicsimportAutoDiffDynamicsx=T.dscalar("x")# Position.x_dot=T.dscalar("x_dot")# Velocity.F=T.dscalar("F")# Force.dt=0.01# Discrete time-step in seconds.m=1.0# Mass in kg.alpha=0.1# Friction coefficient.# Acceleration.x_dot_dot=x_dot* (1-alpha*dt/m)+F*dt/m# Discrete dynamics model definition.f=T.stack([x+x_dot*dt,x_dot+x_dot_dot*dt,])x_inputs= [x,x_dot]# State vector.u_inputs= [F]# Control vector.# Compile the dynamics.# NOTE: This can be slow as it's computing and compiling the derivatives.# But that's okay since it's only a one-time cost on startup.dynamics=AutoDiffDynamics(f,x_inputs,u_inputs)

Note: If you want to be able to use the Hessians (f_xx,f_ux,andf_uu), you need to pass thehessians=True argument to theconstructor. This will increase compilation time. Note thatiLQR doesnot require second-order derivatives to function.

Batch automatic differentiation

importtheano.tensorasTfromilqr.dynamicsimportBatchAutoDiffDynamicsstate_size=2# [position, velocity]action_size=1# [force]dt=0.01# Discrete time-step in seconds.m=1.0# Mass in kg.alpha=0.1# Friction coefficient.deff(x,u,i):"""Batched implementation of the dynamics model.    Args:        x: State vector [*, state_size].        u: Control vector [*, action_size].        i: Current time step [*, 1].    Returns:        Next state vector [*, state_size].    """x_=x[...,0]x_dot=x[...,1]F=u[...,0]# Acceleration.x_dot_dot=x_dot* (1-alpha*dt/m)+F*dt/m# Discrete dynamics model definition.returnT.stack([x_+x_dot*dt,x_dot+x_dot_dot*dt,    ]).T# Compile the dynamics.# NOTE: This can be slow as it's computing and compiling the derivatives.# But that's okay since it's only a one-time cost on startup.dynamics=BatchAutoDiffDynamics(f,state_size,action_size)

Note: This is a faster version ofAutoDiffDynamics that doesn'tsupport Hessians.

Finite difference approximation

fromilqr.dynamicsimportFiniteDiffDynamicsstate_size=2# [position, velocity]action_size=1# [force]dt=0.01# Discrete time-step in seconds.m=1.0# Mass in kg.alpha=0.1# Friction coefficient.deff(x,u,i):"""Dynamics model function.    Args:        x: State vector [state_size].        u: Control vector [action_size].        i: Current time step.    Returns:        Next state vector [state_size].    """    [x,x_dot]=x    [F]=u# Acceleration.x_dot_dot=x_dot* (1-alpha*dt/m)+F*dt/mreturnnp.array([x+x_dot*dt,x_dot+x_dot_dot*dt,    ])# NOTE: Unlike with AutoDiffDynamics, this is instantaneous, but will not be# as accurate.dynamics=FiniteDiffDynamics(f,state_size,action_size)

Note: It is possible you might need to play with the epsilon values(x_eps andu_eps) used when computing the approximation if yourun into numerical instability issues.

Usage

Regardless of the method used for constructing your dynamics model, you can usethem as follows:

curr_x=np.array([1.0,2.0])curr_u=np.array([0.0])i=0# This dynamics model is not time-varying, so this doesn't matter.>>>dynamics.f(curr_x,curr_u,i)...array([1.02   ,2.01998])>>>dynamics.f_x(curr_x,curr_u,i)...array([[1.     ,0.01   ],           [0.     ,1.00999]])>>>dynamics.f_u(curr_x,curr_u,i)...array([[0.    ],           [0.0001]])

Comparing the output of theAutoDiffDynamics and theFiniteDiffDynamics models should generally yield consistent results,but the auto-differentiated method will always be more accurate. Generally, thefinite difference approximation will be faster unless you're also computing theHessians: in which case, Theano's compiled derivatives are more optimized.

Cost function

Similarly, you can set up your own cost function by either extending theCost class and hard-coding it and its partial derivatives.Alternatively, you can write it up as a Theano expression and use theAutoDiffCost class for it to be auto-differentiated. Finally, if allyou have are a loss functions, you can use theFiniteDiffCost class toapproximate the derivatives with finite difference approximation.

The most common cost function is the quadratic format used by Linear QuadraticRegulators:

(x - x_{goal})^T Q (x - x_{goal}) + (u - u_{goal})^T R (u - u_{goal})

whereQ andR are matrices defining your quadratic state errorand quadratic control errors andx_{goal} is your target state. Forconvenience, an implementation of this cost function is made available as theQRCost class.

QRCost class

importnumpyasnpfromilqr.costimportQRCoststate_size=2# [position, velocity]action_size=1# [force]# The coefficients weigh how much your state error is worth to you vs# the size of your controls. You can favor a solution that uses smaller# controls by increasing R's coefficient.Q=100*np.eye(state_size)R=0.01*np.eye(action_size)# This is optional if you want your cost to be computed differently at a# terminal state.Q_terminal=np.array([[100.0,0.0], [0.0,0.1]])# State goal is set to a position of 1 m with no velocity.x_goal=np.array([1.0,0.0])# NOTE: This is instantaneous and completely accurate.cost=QRCost(Q,R,Q_terminal=Q_terminal,x_goal=x_goal)

Automatic differentiation

importtheano.tensorasTfromilqr.costimportAutoDiffCostx_inputs= [T.dscalar("x"),T.dscalar("x_dot")]u_inputs= [T.dscalar("F")]x=T.stack(x_inputs)u=T.stack(u_inputs)x_diff=x-x_goall=x_diff.T.dot(Q).dot(x_diff)+u.T.dot(R).dot(u)l_terminal=x_diff.T.dot(Q_terminal).dot(x_diff)# Compile the cost.# NOTE: This can be slow as it's computing and compiling the derivatives.# But that's okay since it's only a one-time cost on startup.cost=AutoDiffCost(l,l_terminal,x_inputs,u_inputs)

Batch automatic differentiation

importtheano.tensorasTfromilqr.costimportBatchAutoDiffCostdefcost_function(x,u,i,terminal):"""Batched implementation of the quadratic cost function.    Args:        x: State vector [*, state_size].        u: Control vector [*, action_size].        i: Current time step [*, 1].        terminal: Whether to compute the terminal cost.    Returns:        Instantaneous cost [*].    """Q_=Q_terminalifterminalelseQl=x.dot(Q_).dot(x.T)ifl.ndim==2:l=T.diag(l)ifnotterminal:l_u=u.dot(R).dot(u.T)ifl_u.ndim==2:l_u=T.diag(l_u)l+=l_ureturnl# Compile the cost.# NOTE: This can be slow as it's computing and compiling the derivatives.# But that's okay since it's only a one-time cost on startup.cost=BatchAutoDiffCost(cost_function,state_size,action_size)

Finite difference approximation

fromilqr.costimportFiniteDiffCostdefl(x,u,i):"""Instantaneous cost function.    Args:        x: State vector [state_size].        u: Control vector [action_size].        i: Current time step.    Returns:        Instantaneous cost [scalar].    """x_diff=x-x_goalreturnx_diff.T.dot(Q).dot(x_diff)+u.T.dot(R).dot(u)defl_terminal(x,i):"""Terminal cost function.    Args:        x: State vector [state_size].        i: Current time step.    Returns:        Terminal cost [scalar].    """x_diff=x-x_goalreturnx_diff.T.dot(Q_terminal).dot(x_diff)# NOTE: Unlike with AutoDiffCost, this is instantaneous, but will not be as# accurate.cost=FiniteDiffCost(l,l_terminal,state_size,action_size)

Note: It is possible you might need to play with the epsilon values(x_eps andu_eps) used when computing the approximation if yourun into numerical instability issues.

Usage

Regardless of the method used for constructing your cost function, you can usethem as follows:

>>>cost.l(curr_x,curr_u,i)...400.0>>>cost.l_x(curr_x,curr_u,i)...array([0.,400.])>>>cost.l_u(curr_x,curr_u,i)...array([0.])>>>cost.l_xx(curr_x,curr_u,i)...array([[200.,0.],           [0.,200.]])>>>cost.l_ux(curr_x,curr_u,i)...array([[0.,0.]])>>>cost.l_uu(curr_x,curr_u,i)...array([[0.02]])

Putting it all together

N=1000# Number of time-steps in trajectory.x0=np.array([0.0,-0.1])# Initial state.us_init=np.random.uniform(-1,1, (N,1))# Random initial action path.ilqr=iLQR(dynamics,cost,N)xs,us=ilqr.fit(x0,us_init)

xs andus now hold the optimal state and control trajectorythat reaches the desired goal state with minimum cost.

Finally, aRecedingHorizonController is also bundled with this packageto use theiLQR controller in Model Predictive Control.

Important notes

To quote from Tassa's paper: "Two important parameters which have a directimpact on performance are the simulation time-stepdt and the horizonlengthN. Since speed is of the essence, the goal is to choose thosevalues which minimize the number of steps in the trajectory, i.e. the largestpossible time-step and the shortest possible horizon. The size ofdtis limited by our use of Euler integration; beyond some value the simulationbecomes unstable. The minimum length of the horizonN is aproblem-dependent quantity which must be found by trial-and-error."

Contributing

Contributions are welcome. Simply open an issue or pull request on the matter.

Linting

We useYAPF for all Python formattingneeds. You can auto-format your changes with the following command:

yapf --recursive --in-place --parallel.

You may install the linter as follows:

pipenv install --dev

License

SeeLICENSE.

Credits

This implementation was partially based on Yuval Tassa'sMATLABimplementation,andnavigator8972'simplementation.


[8]ページ先頭

©2009-2025 Movatter.jp