<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>nzxhuong'log</title><link>https://nzxhuong.github.io/</link><description>Just a bunch of learning notes.</description><atom:link href="https://nzxhuong.github.io/rss.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Sat, 12 Apr 2025 19:08:45 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Variational Autoencoders</title><link>https://nzxhuong.github.io/posts/variational-autoencoders/</link><dc:creator>Ngo Truong</dc:creator><description>&lt;div&gt;&lt;p&gt;This is my learning note from &lt;a class="reference external" href="https://www.youtube.com/watch?v=FMuvUZXMzKM"&gt;L4 Latent Variable Models&lt;/a&gt; by Pieter Abbeel. The idea of being able to generate (potentially new) images, songs, or any data type you want with generative models always amazes me. And I just want to share my thoughts on it (mostly Latent Variable Models, LVMs).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://nzxhuong.github.io/posts/variational-autoencoders/"&gt;Read more…&lt;/a&gt; (11 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>generative-models</category><category>mathjax</category><category>variational-autoencoders</category><guid>https://nzxhuong.github.io/posts/variational-autoencoders/</guid><pubDate>Sat, 12 Apr 2025 18:35:36 GMT</pubDate></item><item><title>Prioritized Experience Replay and Importance Sampling</title><link>https://nzxhuong.github.io/posts/prioritized-experience-replay-and-importance-sampling/</link><dc:creator>Ngo Truong</dc:creator><description>&lt;div&gt;&lt;p&gt;While learning about Deep Q-Learning (DQN), I kept stumbling on one thought: with experience replay, we sample transitions from the buffer uniformly. But the way we append &lt;em&gt;every&lt;/em&gt; transition means it's likely that most of the transitions are 'bad' experiences, especially early on. Like in the cliff walking problem, early in training we're just running around randomly and falling off the cliff a lot. This leads to the buffer being mostly filled with bad outcomes. Even when we &lt;em&gt;do&lt;/em&gt; finally reach the goal and get a good transition, it's drowned out by the huge amount of bad stuff we already stored. This means most of the time we're sampling 'bad' experiences during training updates, and that doesn't feel very efficient.&lt;/p&gt;
&lt;p&gt;So I started thinking… is there a way to sample more of the 'good' stuff? That's when I found Prioritized Experience Replay (PER), proposed by DeepMind.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://nzxhuong.github.io/posts/prioritized-experience-replay-and-importance-sampling/"&gt;Read more…&lt;/a&gt; (5 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>importance sampling</category><category>mathjax</category><category>prioritized experience replay</category><category>reinforcement learning</category><guid>https://nzxhuong.github.io/posts/prioritized-experience-replay-and-importance-sampling/</guid><pubDate>Wed, 02 Apr 2025 13:31:56 GMT</pubDate></item><item><title>Understanding Shannon Information and Entropy</title><link>https://nzxhuong.github.io/posts/understanding-shannon-information-and-entropy/</link><dc:creator>Ngo Truong</dc:creator><description>&lt;div&gt;&lt;p&gt;Many materials on this topic start with Claude Shannon’s concept of information. So let’s start with that. &lt;br&gt;
Information, in Shannon's theory, is defined in the context of transferring a message from a source (transmitter) to a receiver over a channel. Imagine tossing a coin. In this scenario, the coin toss outcome acts as the source (transmitter), and you, observing the outcome, are the receiver.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://nzxhuong.github.io/posts/understanding-shannon-information-and-entropy/"&gt;Read more…&lt;/a&gt; (1 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>entropy</category><category>information theory</category><category>mathjax</category><category>probability</category><guid>https://nzxhuong.github.io/posts/understanding-shannon-information-and-entropy/</guid><pubDate>Tue, 01 Apr 2025 17:11:30 GMT</pubDate></item></channel></rss>