Tile Coding with PIQLE

Francesco De Comité

This document is also available as a pdf file.

1  Introduction

Tile coding, also known as CMAC, offers the possibility to treat problems with large state spaces within the frame of reinforcement learning, avoiding the curse of memory overflow. Together with adapted algorithms, it enables users to approximate state or (state,action) values for otherwise intractable problems.

For more details on tile coding, and related algorithms, see [SB98], chapter 8. At this time, we implemented rectangular tile coding, but very few work will have to be done to add implementation of irregular tiling, log stripes, diagonal stripes, as illustrated in [SB98] figure 8.6 or rectangular rotated tiling, for example.

Later in this document we will show where to create the Java classes for those implementations, without modifying the rest of the library structure.

Note that this implementation of tile coding is suitable only for problems where states are expressed in term of continuous components, and actions are discrete.

Concerning the learning algorithms which can use this notion of tile coding, we implemented the linear, gradient-descent version of Watkin's Q(l), as describe in [SB98], page 213 (more precisely, the corrected version which can be found on the book's website : http://www.cs.ualberta.ca/~sutton/book/QFA.pdf.

The rest of this document is organized as follows : section 2 will describe the hierarchy of classes and the link between them and some Java tricks and considerations, section 3 will explain how those classes are used for two problems : the classical mountain car and acrobot, and show some results obtained by using tile coding.

2  Implementation

Two aspects of the implementation have to be explained :

2.1  Java and PIQLE implementation

2.1.1  Individual tiles

The basic brick of all this construction is a Tile. A Tile is some portion of space, belonging to a unique Tiling, which is a set of Tiles. A Tile only has to remember the Tiling it belongs to, and its own q coefficient. We also need a method enabling the program to read and update q (methods setTheta and getTheta).

Others methods are left for the sake of monitoring and debugging.

Note that due to the way Java treats equality among objects, we do not need to redefine hashcode nor equals for Tiles : we will just have to ensure that a given Tile will not be created twice : this will be done in the Tiling class.

2.1.2  Tiling

Tiling is an abstract class, whose only method returns the Tile associated with a certain (state,action) pair. By extending this class, one can define different partitions of the space covered by the Tiling. At this time, we only implemented a generalization of hypercubic tiling of space (HyperRectangularTiling) and a more general version, allowing to ignore certain dimensions :
HyperRectangularSparseTiling.0.8cm

Implemented tilings
RectangularTiling
: This Tiling instantiation is left as an example, and could be used with two-dimensional state definitions (like the Mountain Car problem). In addition to the basic features of the Tile class, one can access to the coordinates of the tile inside the associated Tiling. Within the RectangularTiling class, method computeTile(double x,double y) is the place where the program associates a Tile to a point (x,y) of the states space. Note that the end of this method ensures that no tile will be generated twice.
HyperRectangularTiling
: This is a straightforward generalization of the class RectangularTiling to any number of dimensions.
HyperRectangularSparseTiling
: this extension of the class
HyperRectangularTiling enables the user to ignore certain dimensions when defining a Tiling. This permits to reproduce quite exactly the standard settings of Sutton's experiments with Mountain Car and Acrobot problems. The environments MountainCarTilingH and AcrobotCLS2Tiling illustrate how one can define those original settings (see below).
0.8cm

How to define new tilings
A Tiling is just a partition of a n-dimensional space, eventually permitting to ignore some dimensions. In order to define a new Tiling, one just has to clearly define this partition, write a method computeTile(State s,Action a), and ensure that no Tile while be created twice. The best place to understand how to do that might be the classes RectangularTile and RectangularTiling.

2.1.3  Some auxiliary classes

Inside the tiling package, three auxiliary classes are written, to allow clear and easy manipulation of Tiles and Tilings. As those classes are intended to help the manipulation of Tiles and Tilings, there might be no need for the user to modify, extends of look inside them.
ListOfTiles
: this class collects all the Tiles, from different Tilings, associated with a particular (state,action) couple. It also contains a method to compute the sum of qs.
SetOfTilings
: in the CMAC scheme, each action is associated with a set of several Tilings. This class gathers all the Tilings corresponding to a specific action. When defining an environment designed to be solved by tiling, one will have to create and define as many SetOfTilings as the number of possible actions. See Montain Car and Acrobot examples.
EligibleTiles
: the learning algorithms using tilings are maintaining a list of Tiles met during the exploration of an episode. This class is meant to memorize those Tiles, allowing to manipulate them ; mainly modify their q, or erase the complete list of Tiles

2.1.4  Algorithms using tilings

Sutton and Barto describe two algorithms using tilings : a linear gradient-descent Sarsa(l), and a linear, gradient-descent Watkins's Q(l). At this time (january 2006), we only implemented the second one, from the pseudo-code available at http://www.cs.ualberta.ca/~sutton/book/QFA.pdf The corresponding class, located in package algorithmes, is called SelectionneurTDFA.

2.1.5  Connecting tilings and problems

Tilings are strongly dependent of the problem to solve, so, it seems quite difficult to define an independent tiling scheme. The solution we decided to use is the following : each problem we want to solve using tilings must be defined by an environment implementing the TileAbleContraintes interface, which forces an environment to have a method which can return a list of tiles associated with a (state,action) couple. In addition this environment must define precisely the tilings. Have a look at MountainCarTiling and AcrobotCLS2Tiling : their constructors instantiate the tilings, and the method getTiles, from interface TileAbleContraintes, is defined.

2.1.6  Where to tune ?

There are several places where parameters can be modified, and there are also several ways to do that.
Size of Tiles
: the number of Tile following each Tiling dimension must be filled into the forth parameter of the tiling constructor (at least for HyperRecténgularSparseTiling class), named nbDiv in the environment class constructor.
Ignored dimensions
: this is defined by the pvalid boolean array in the environment class constructor.
Number and Structure of Tilings
: again, define them into the environment class constructor. Have a look at AcrobotCLS2Tiling and
MountainCarTilingH !
Algorithm parameters
: the class SelectionneurTDFA provides all the method to set, read, modify the algorithm parameters a,g,l,e

2.2  Implemented problems

Two problems are implemented :
Mountain Car
: the program MountainTiling illustrates the implementation of tiling for this problem, using the MountainCarTilingH environment definition. Figure 1 compares the behavior of SelectionneurTDFA, SelectionneurQL and SelectionneurWatkins, in terms of average reward, on this problem. The only parameter that had to be changed beetween each experiment is a.



Figure 1: Comparing tiling, QL, and Watkins on Mountain Car




Acrobot
: TestAcrobotCLS2Tiling illustrates the seek of a solution using the tiling paradigm. Tiling is the only algorithm which learns how to move the acrobot, and the solution seems to be very sensitive to parameters settings. Figure 2 shows an example of learning curve, in term of episode length before to reach the goal.



Figure 2: Acrobot learning curve


3  Remarks

3.1  RLFramework

To define a tiling scheme, one needs to know the upper and lower bounds of values in each dimension, and those informations are provided by the RLFramework defined by the people at the University of Alberta http://rlai.cs.ualberta.ca/RLBB/RL-Framework-concepts.html.

Defining a tiling for problem expressed in this syntax might be straightforward.

3.2  Tuning

Finding the good learning parameters seems to be very difficult, as the algorithm looks very sensible to parameter variations.

References

[SB98]
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998. A Bradford Book.

This document was translated from LATEX by HEVEA.