diff --git a/block-bias.md b/block-bias.md
new file mode 100644
index 0000000..c1a59c0
--- /dev/null
+++ b/block-bias.md
@@ -0,0 +1,64 @@
+⬅ [Block Profiling in Go](./block.md)
+
+⚠️This document describes a sampling bias issue I discovered while researching the block profiler for Go. I have since [landed a fix for it](https://go-review.googlesource.com/c/go/+/299991) that should appear in Go 1.17.
+
+# Block Profiler Sampling Bias
+
+**tl;dr:** Setting your sampling `rate` too high will bias your results towards infrequent long events over frequent short events.
+
+As described in the [Usage](#usage) section, the block profiler will sample as follows:
+
+- Events with `duration >= rate` will be sampled 100%
+- Events with `duration < rate` have a `duration / rate` chance of getting sampled.
+
+The [implementation](https://github.com/golang/go/blob/go1.15.7/src/runtime/mprof.go#L408) for this looks like that:
+
+```go
+func blocksampled(cycles int64) bool {
+ rate := int64(atomic.Load64(&blockprofilerate))
+ if rate <= 0 || (rate > cycles && int64(fastrand())%rate > cycles) {
+ return false
+ }
+ return true
+}
+```
+
+This means that if you set your profiling `rate` low enough, you'll get very accurate results. However, if your `rate` is higher than the `duration` of some of the events you are sampling, the sampling process will exhibit a bias favoring infrequent events of higher `duration` over frequent events with lower `duration` even so they may contribute to the same amount of overall block duration in your program.
+
+## Simple Example
+
+Let's say your `blockprofilerate` is `100ns` and your application produces the following events:
+
+- `A`: `1` event with a duration of `100ns`.
+- `B`: `10` events with a duration of `10ns` each.
+
+Given this scenario, the `blockprofiler` is guaranteed to catch and accurately report event `A` as `100ns` in the profile. For event `B` the most likely outcome is that the profiler will capture only a single event (10% of 10 events) and report `B` as `10ns` in the profile. So you might find yourself in a situation where you think event `A` is causing 10x more blocking than event `B`, which is not true.
+
+## Simulation & Proposal for Improvement
+
+For an even better intuition about this, consider the [simulated example](./sim/block_sampling.ipynb) below. Here we have a histogram of all durations collected from 3 types of blocking events. As you can see, they all have different mean durations (`1000ns`, `2000ns`, `3000ns`) and they are occurring at different frequencies, with `count(a) > count(b) > count(c)`. What's more difficult to see, is that the cumulative durations of these events are the same, i.e. `sum(a) = sum(b) = sum(c)`, but you can trust me on that : ).
+
+
+
+So given that your application might produce events like this, how will they show up in your block profile as you try out different `blockprofilerate` values? As you can see below, all is well and fine until a `blockprofilerate` of `1000ns`. Each event shows up with the same total duration in the profile (the red and green dots are hidden below the blue ones). However starting at `1000ns` you see that event `a` starts to fade from our profile and at `2000ns` you'd already think that events `b` and `c` are causing twice as much blocking time as event `a`.
+
+
+
+So what can we do? Do we always need to live in fear of bias when working with block profiles? No! If the [Overhead](#overhead) for your workload allows it, the simplest solution is to use a low enough `blockprofilerate` in order to capture most blocking events.
+
+But perhaps there is an even better way. I'm thinking we could correct for the current bias by keeping the same logic of sampling `duration / rate` fraction of events when `duration < rate`. However, when this happens we could simply multiply the sampled duration by `rate/duration` like this:
+
+```
+duration = duration * (rate/duration)
+# note: the expression above can be simplified to just `duration = rate`
+```
+
+Doing so could be done with a [trivial patch](https://github.com/felixge/go/compare/master...debias-blockprofile-rate) to the go runtime and the picture below shows the results from simulating it.
+
+
+
+## Disclaimers
+
+I'm [felixge](https://github.com/felixge) and work at [Datadog](https://www.datadoghq.com/) on [Continuous Profiling](https://www.datadoghq.com/product/code-profiling/) for Go. You should check it out. We're also [hiring](https://www.datadoghq.com/jobs-engineering/#all&all_locations) : ).
+
+The information on this page is believed to be correct, but no warranty is provided. Feedback is welcome!
\ No newline at end of file
diff --git a/block.md b/block.md
index 6f12707..8ea7e04 100644
--- a/block.md
+++ b/block.md
@@ -1,31 +1,11 @@
+⬅ [Index of all go-profiler-notes](./README.md)
+
+[Description](#description) - [Usage](#usage) - [Overhead](#overhead) - [Accuracy](#accuracy) - [Relationship with Mutex Profiling](#relationship-with-mutex-profiling) - [Profiler Labels](#profiler-labels) - [History](#history)
+
This document was last updated for `go1.15.7` but probably still applies to older/newer versions for the most parts.
# Block Profiling in Go
-
-
-- [Description](#description)
-
-* [Usage](#usage)
-* [Overhead](#overhead)
- + [Implementation Details](#implementation-details)
- + [Benchmarks](#benchmarks)
- + [Memory Usage](#memory-usage)
- + [Initialization Time](#initialization-time)
-* [Accuracy](#accuracy)
- + [Sampling Bias](#sampling-bias)
- - [Simple Example](#simple-example)
- - [Simulation & Proposal for Improvement](#simulation---proposal-for-improvement)
- + [Time Stamp Counter](#time-stamp-counter)
- + [Stack Depth](#stack-depth)
- + [Spin Locks](#spin-locks)
-* [Relationship with Wall Clock Time](#relationship-with-wall-clock-time)
-* [Relationship with Mutex Profiling](#relationship-with-mutex-profiling)
-* [Profiler Labels](#profiler-labels)
-* [pprof Output](#pprof-output)
-* [History](#history)
-* [Disclaimers](#disclaimers)
-
## Description
The block profile in Go lets you analyze how much time your program spends waiting on the blocking operations listed below:
@@ -145,64 +125,7 @@ The first call to `runtime.SetBlockProfileRate()` takes `100ms` because it tries
### Sampling Bias
-**tl;dr:** Setting your sampling `rate` too high will bias your results towards infrequent long events over frequent short events.
-
-As described in the [Usage](#usage) section, the block profiler will sample as follows:
-
-- Events with `duration >= rate` will be sampled 100%
-- Events with `duration < rate` have a `duration / rate` chance of getting sampled.
-
-The [implementation](https://github.com/golang/go/blob/go1.15.7/src/runtime/mprof.go#L408) for this looks like that:
-
-```go
-func blocksampled(cycles int64) bool {
- rate := int64(atomic.Load64(&blockprofilerate))
- if rate <= 0 || (rate > cycles && int64(fastrand())%rate > cycles) {
- return false
- }
- return true
-}
-```
-
-This means that if you set your profiling `rate` low enough, you'll get very accurate results. However, if your `rate` is higher than the `duration` of some of the events you are sampling, the sampling process will exhibit a bias favoring infrequent events of higher `duration` over frequent events with lower `duration` even so they may contribute to the same amount of overall block duration in your program.
-
-#### Simple Example
-
-Let's say your `blockprofilerate` is `100ns` and your application produces the following events:
-
-- `A`: `1` event with a duration of `100ns`.
-- `B`: `10` events with a duration of `10ns` each.
-
-Given this scenario, the `blockprofiler` is guaranteed to catch and accurately report event `A` as `100ns` in the profile. For event `B` the most likely outcome is that the profiler will capture only a single event (10% of 10 events) and report `B` as `10ns` in the profile. So you might find yourself in a situation where you think event `A` is causing 10x more blocking than event `B`, which is not true.
-
-#### Simulation & Proposal for Improvement
-
-For an even better intuition about this, consider the [simulated example](./sim/block_sampling.ipynb) below. Here we have a histogram of all durations collected from 3 types of blocking events. As you can see, they all have different mean durations (`1000ns`, `2000ns`, `3000ns`) and they are occurring at different frequencies, with `count(a) > count(b) > count(c)`. What's more difficult to see, is that the cumulative durations of these events are the same, i.e. `sum(a) = sum(b) = sum(c)`, but you can trust me on that : ).
-
-
-
-So given that your application might produce events like this, how will they show up in your block profile as you try out different `blockprofilerate` values? As you can see below, all is well and fine until a `blockprofilerate` of `1000ns`. Each event shows up with the same total duration in the profile (the red and green dots are hidden below the blue ones). However starting at `1000ns` you see that event `a` starts to fade from our profile and at `2000ns` you'd already think that events `b` and `c` are causing twice as much blocking time as event `a`.
-
-
-
-So what can we do? Do we always need to live in fear of bias when working with block profiles? No! If the [Overhead](#overhead) for your workload allows it, the simplest solution is to use a low enough `blockprofilerate` in order to capture most blocking events.
-
-But perhaps there is an even better way. I'm thinking we could correct for the current bias by keeping the same logic of sampling `duration / rate` fraction of events when `duration < rate`. However, when this happens we could simply multiply the sampled duration by `rate/duration` like this:
-
-```
-duration = duration * (rate/duration)
-# note: the expression above can be simplified to just `duration = rate`
-```
-
-Doing so could be done with a [trivial patch](https://github.com/felixge/go/compare/master...debias-blockprofile-rate) to the go runtime and the picture below shows the results from simulating it.
-
-
-
-So from my point of view it should be possible to eliminate this bias from future versions of Go and I'm planning to [work with the Go project](https://github.com/golang/go/issues/44192) on that.
-
-That being said, I'm not trained in statistics, so my analysis here might be hilariously misguided 🙃.
-
-🚧 It might also be possible to reduce the bias in block profiles after recording them. I've got a [proof of concept](https://github.com/felixge/go-debias-blockprofile) for this, but it's not clear yet if this will work well in practice.
+Up until Go 1.17 the block profiler was biased towards favoring infrequent long events over frequent short events. A [detailed analysis](./block-bias.md) explains the problem.
### Time Stamp Counter
diff --git a/goroutine.md b/goroutine.md
index 1ab23d4..db06827 100644
--- a/goroutine.md
+++ b/goroutine.md
@@ -1,3 +1,5 @@
+⬅ [Index of all go-profiler-notes](./README.md)
+
This document was last updated for `go1.15.6` but probably still applies to older/newer versions for the most parts.
# Goroutine Profiling in Go
diff --git a/pprof.md b/pprof.md
index 0658956..9726d21 100644
--- a/pprof.md
+++ b/pprof.md
@@ -1,3 +1,5 @@
+⬅ [Index of all go-profiler-notes](./README.md)
+
# Go's pprof tool & format
The various profilers built into Go are designed to work with the pprof visualization tool. [pprof](https://github.com/google/pprof) itself is an inofficial Google project that is designed to analyze profiling data from C++, Java and Go programs. The project defines a protocol buffer format that is used by all Go profilers and described in this document.