<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>nzxhuong'log (Posts about importance sampling)</title><link>https://nzxhuong.github.io/</link><description></description><atom:link href="https://nzxhuong.github.io/categories/importance-sampling.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Mon, 07 Apr 2025 17:36:37 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Prioritized Experience Replay and Importance Sampling</title><link>https://nzxhuong.github.io/posts/prioritized-experience-replay-and-importance-sampling/</link><dc:creator>Ngo Truong</dc:creator><description>&lt;div&gt;&lt;p&gt;While I was learning about Deep Q-Learning (DQN), I kept stumbling on one thought: with experience replay, we sample transitions from the buffer uniformly. But since we append &lt;em&gt;every&lt;/em&gt; transition, most of the buffer is likely to hold 'bad' experiences, especially early on. In the cliff-walking problem, for example, early in training we're just running around randomly and falling off the cliff a lot, so the buffer ends up mostly filled with bad outcomes. Even when we &lt;em&gt;do&lt;/em&gt; finally reach the goal and store a good transition, it's drowned out by the mass of bad ones already there. That means most training updates sample 'bad' experiences, which doesn't feel very efficient.&lt;/p&gt;
&lt;p&gt;So I started thinking... is there a way to sample more of the 'good' stuff? That's when I found Prioritized Experience Replay (PER), proposed by DeepMind.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://nzxhuong.github.io/posts/prioritized-experience-replay-and-importance-sampling/"&gt;Read more…&lt;/a&gt; (5 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>importance sampling</category><category>mathjax</category><category>prioritized experience replay</category><category>reinforcement learning</category><guid>https://nzxhuong.github.io/posts/prioritized-experience-replay-and-importance-sampling/</guid><pubDate>Wed, 02 Apr 2025 13:31:56 GMT</pubDate></item></channel></rss>