
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs
We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of ϵ-optimal goal-conditioned policies attaining all states that are incrementally reachable within L steps (in expectation) from a reference state s_0. In this paper, we introduce a novel model-based approach that interleaves discovering new states from s_0 and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as Õ(L^5 S_{L+ϵ} Γ_{L+ϵ} A / ϵ^2), where A is the number of actions, S_{L+ϵ} is the number of states that are incrementally reachable from s_0 in L+ϵ steps, and Γ_{L+ϵ} is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both ϵ and L at the cost of an extra Γ_{L+ϵ} factor, which is small in most environments of interest. Furthermore, DisCo is the first algorithm that can return an ϵ/c_min-optimal policy for any cost-sensitive shortest-path problem defined on the L-reachable states with minimum cost c_min. Finally, we report preliminary empirical results confirming our theoretical findings.
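The core loop described above, interleaving discovery of new states from s_0 with model refinement used for goal-conditioned planning, can be illustrated with a minimal sketch. This is not the paper's DisCo algorithm: the toy chain environment, the single-sample model estimate, and the use of BFS over the estimated model in place of goal-conditioned policy computation are all simplifying assumptions made for illustration.

```python
from collections import defaultdict, deque

def toy_env_step(state, action):
    # Deterministic toy chain on states 0..9: action 1 moves right,
    # action 0 moves left. Stands in for the unknown environment.
    return max(0, min(9, state + (1 if action == 1 else -1)))

def incremental_exploration(s0=0, L=4):
    """Sketch of incremental autonomous exploration: interleave
    (a) discovering states reachable from s0 and (b) building a model
    estimate, keeping only states within L steps of s0. BFS over the
    estimated model plays the role of goal-conditioned planning; a
    stochastic environment would require many samples per (s, a) pair
    to control the model error, as DisCo's analysis does."""
    model = defaultdict(dict)   # model[s][a] -> estimated next state
    dist = {s0: 0}              # shortest known distance from s0
    queue = deque([s0])
    while queue:
        s = queue.popleft()
        if dist[s] == L:        # states beyond L steps are not expanded
            continue
        for a in (0, 1):
            nxt = toy_env_step(s, a)   # one sample (deterministic toy)
            model[s][a] = nxt
            if nxt not in dist:        # newly discovered reachable state
                dist[nxt] = dist[s] + 1
                queue.append(nxt)
    return dist, model
```

With s_0 = 0 and L = 4 on this chain, the procedure discovers exactly states 0 through 4, mirroring how the incrementally L-reachable set S_L is built up state by state.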