Prioritized Experience Replay and Importance Sampling

While learning about Deep Q-Learning (DQN), I kept stumbling on one thought: with experience replay, we sample transitions from the buffer uniformly. But because we append every transition, the buffer is likely dominated by 'bad' experiences, especially early on. In the cliff walking problem, for example, early in training the agent just wanders around randomly and falls off the cliff a lot, so the buffer fills up mostly with bad outcomes. Even when the agent finally reaches the goal and stores a good transition, it's drowned out by the huge mass of bad transitions already there. As a result, most training updates sample 'bad' experiences, and that doesn't feel very efficient.

So I started thinking... is there a way to sample more of the 'good' stuff? That's when I found Prioritized Experience Replay (PER), proposed by DeepMind.
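
To make the idea concrete, here is a minimal sketch of proportional PER with importance-sampling weights. The class name and the O(n) sampling loop are my own simplifications (the original paper uses a sum-tree for efficiency), and the `alpha`/`beta` defaults are just illustrative values:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional PER sketch (no sum-tree; O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling (0 = uniform)
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling: w_i = (N * P(i))^(-beta), normalized by the max.
        weights = (len(self.buffer) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        samples = [self.buffer[i] for i in idxs]
        return samples, idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error,
        # with a small epsilon so no transition gets zero probability.
        self.priorities[idxs] = np.abs(td_errors) + eps
```

In a training loop, you would scale each sampled transition's loss by its weight and call `update_priorities` with the new TD errors after the gradient step.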

Read more…

Understanding Shannon Information and Entropy

Many materials on this topic start with Claude Shannon’s concept of information. So let’s start with that.
Information, in Shannon's theory, is defined in the context of transferring a message from a source (transmitter) to a receiver over a channel. Imagine tossing a coin. In this scenario, the coin toss outcome acts as the source (transmitter), and you, observing the outcome, are the receiver.
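
To anchor the coin toss example, here is a small sketch computing Shannon self-information and entropy in bits; the function names are my own, assuming the standard definitions I(x) = -log2 p(x) and H(X) = E[I(X)]:

```python
import math

def self_information(p):
    """Shannon self-information of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(probs):
    """Shannon entropy of a distribution: the expected self-information."""
    return sum(p * self_information(p) for p in probs if p > 0)

# A fair coin: each outcome carries 1 bit, so a toss conveys 1 bit on average.
print(self_information(0.5))   # 1.0
print(entropy([0.5, 0.5]))     # 1.0

# A biased coin is more predictable, so a toss conveys less information.
print(entropy([0.9, 0.1]))     # ~0.469
```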

Read more…

Hi, I'm Ngo Truong. This is where I document my learning notes.

Twitter: no

Using a theme by Brian Keng