diff --git a/block-bias.md b/block-bias.md new file mode 100644 index 0000000..c1a59c0 --- /dev/null +++ b/block-bias.md @@ -0,0 +1,64 @@ +⬅ [Block Profiling in Go](./block.md) + +⚠️This document describes a sampling bias issue I discovered while researching the block profiler for Go. I have since [landed a fix for it](https://go-review.googlesource.com/c/go/+/299991) that should appear in Go 1.17. + +# Block Profiler Sampling Bias + +**tl;dr:** Setting your sampling `rate` too high will bias your results towards infrequent long events over frequent short events. + +As described in the [Usage](#usage) section, the block profiler will sample as follows: + +- Events with `duration >= rate` will be sampled 100% +- Events with `duration < rate` have a `duration / rate` chance of getting sampled. + +The [implementation](https://github.com/golang/go/blob/go1.15.7/src/runtime/mprof.go#L408) for this looks like that: + +```go +func blocksampled(cycles int64) bool { + rate := int64(atomic.Load64(&blockprofilerate)) + if rate <= 0 || (rate > cycles && int64(fastrand())%rate > cycles) { + return false + } + return true +} +``` + +This means that if you set your profiling `rate` low enough, you'll get very accurate results. However, if your `rate` is higher than the `duration` of some of the events you are sampling, the sampling process will exhibit a bias favoring infrequent events of higher `duration` over frequent events with lower `duration` even so they may contribute to the same amount of overall block duration in your program. + +## Simple Example + +Let's say your `blockprofilerate` is `100ns` and your application produces the following events: + +- `A`: `1` event with a duration of `100ns`. +- `B`: `10` events with a duration of `10ns` each. + +Given this scenario, the `blockprofiler` is guaranteed to catch and accurately report event `A` as `100ns` in the profile. For event `B` the most likely outcome is that the profiler will capture only a single event (10% of 10 events) and report `B` as `10ns` in the profile. So you might find yourself in a situation where you think event `A` is causing 10x more blocking than event `B`, which is not true. + +## Simulation & Proposal for Improvement + +For an even better intuition about this, consider the [simulated example](./sim/block_sampling.ipynb) below. Here we have a histogram of all durations collected from 3 types of blocking events. As you can see, they all have different mean durations (`1000ns`, `2000ns`, `3000ns`) and they are occurring at different frequencies, with `count(a) > count(b) > count(c)`. What's more difficult to see, is that the cumulative durations of these events are the same, i.e. `sum(a) = sum(b) = sum(c)`, but you can trust me on that : ). + + + +So given that your application might produce events like this, how will they show up in your block profile as you try out different `blockprofilerate` values? As you can see below, all is well and fine until a `blockprofilerate` of `1000ns`. Each event shows up with the same total duration in the profile (the red and green dots are hidden below the blue ones). However starting at `1000ns` you see that event `a` starts to fade from our profile and at `2000ns` you'd already think that events `b` and `c` are causing twice as much blocking time as event `a`. + + + +So what can we do? Do we always need to live in fear of bias when working with block profiles? No! If the [Overhead](#overhead) for your workload allows it, the simplest solution is to use a low enough `blockprofilerate` in order to capture most blocking events. + +But perhaps there is an even better way. I'm thinking we could correct for the current bias by keeping the same logic of sampling `duration / rate` fraction of events when `duration < rate`. However, when this happens we could simply multiply the sampled duration by `rate/duration` like this: + +``` +duration = duration * (rate/duration) +# note: the expression above can be simplified to just `duration = rate` +``` + +Doing so could be done with a [trivial patch](https://github.com/felixge/go/compare/master...debias-blockprofile-rate) to the go runtime and the picture below shows the results from simulating it. + + + +## Disclaimers + +I'm [felixge](https://github.com/felixge) and work at [Datadog](https://www.datadoghq.com/) on [Continuous Profiling](https://www.datadoghq.com/product/code-profiling/) for Go. You should check it out. We're also [hiring](https://www.datadoghq.com/jobs-engineering/#all&all_locations) : ). + +The information on this page is believed to be correct, but no warranty is provided. Feedback is welcome! \ No newline at end of file diff --git a/block.md b/block.md index 6f12707..8ea7e04 100644 --- a/block.md +++ b/block.md @@ -1,31 +1,11 @@ +⬅ [Index of all go-profiler-notes](./README.md) + +[Description](#description) - [Usage](#usage) - [Overhead](#overhead) - [Accuracy](#accuracy) - [Relationship with Mutex Profiling](#relationship-with-mutex-profiling) - [Profiler Labels](#profiler-labels) - [History](#history) + This document was last updated for `go1.15.7` but probably still applies to older/newer versions for the most parts. # Block Profiling in Go - - -- [Description](#description) - -* [Usage](#usage) -* [Overhead](#overhead) - + [Implementation Details](#implementation-details) - + [Benchmarks](#benchmarks) - + [Memory Usage](#memory-usage) - + [Initialization Time](#initialization-time) -* [Accuracy](#accuracy) - + [Sampling Bias](#sampling-bias) - - [Simple Example](#simple-example) - - [Simulation & Proposal for Improvement](#simulation---proposal-for-improvement) - + [Time Stamp Counter](#time-stamp-counter) - + [Stack Depth](#stack-depth) - + [Spin Locks](#spin-locks) -* [Relationship with Wall Clock Time](#relationship-with-wall-clock-time) -* [Relationship with Mutex Profiling](#relationship-with-mutex-profiling) -* [Profiler Labels](#profiler-labels) -* [pprof Output](#pprof-output) -* [History](#history) -* [Disclaimers](#disclaimers) - ## Description The block profile in Go lets you analyze how much time your program spends waiting on the blocking operations listed below: @@ -145,64 +125,7 @@ The first call to `runtime.SetBlockProfileRate()` takes `100ms` because it tries ### Sampling Bias -**tl;dr:** Setting your sampling `rate` too high will bias your results towards infrequent long events over frequent short events. - -As described in the [Usage](#usage) section, the block profiler will sample as follows: - -- Events with `duration >= rate` will be sampled 100% -- Events with `duration < rate` have a `duration / rate` chance of getting sampled. - -The [implementation](https://github.com/golang/go/blob/go1.15.7/src/runtime/mprof.go#L408) for this looks like that: - -```go -func blocksampled(cycles int64) bool { - rate := int64(atomic.Load64(&blockprofilerate)) - if rate <= 0 || (rate > cycles && int64(fastrand())%rate > cycles) { - return false - } - return true -} -``` - -This means that if you set your profiling `rate` low enough, you'll get very accurate results. However, if your `rate` is higher than the `duration` of some of the events you are sampling, the sampling process will exhibit a bias favoring infrequent events of higher `duration` over frequent events with lower `duration` even so they may contribute to the same amount of overall block duration in your program. - -#### Simple Example - -Let's say your `blockprofilerate` is `100ns` and your application produces the following events: - -- `A`: `1` event with a duration of `100ns`. -- `B`: `10` events with a duration of `10ns` each. - -Given this scenario, the `blockprofiler` is guaranteed to catch and accurately report event `A` as `100ns` in the profile. For event `B` the most likely outcome is that the profiler will capture only a single event (10% of 10 events) and report `B` as `10ns` in the profile. So you might find yourself in a situation where you think event `A` is causing 10x more blocking than event `B`, which is not true. - -#### Simulation & Proposal for Improvement - -For an even better intuition about this, consider the [simulated example](./sim/block_sampling.ipynb) below. Here we have a histogram of all durations collected from 3 types of blocking events. As you can see, they all have different mean durations (`1000ns`, `2000ns`, `3000ns`) and they are occurring at different frequencies, with `count(a) > count(b) > count(c)`. What's more difficult to see, is that the cumulative durations of these events are the same, i.e. `sum(a) = sum(b) = sum(c)`, but you can trust me on that : ). - - - -So given that your application might produce events like this, how will they show up in your block profile as you try out different `blockprofilerate` values? As you can see below, all is well and fine until a `blockprofilerate` of `1000ns`. Each event shows up with the same total duration in the profile (the red and green dots are hidden below the blue ones). However starting at `1000ns` you see that event `a` starts to fade from our profile and at `2000ns` you'd already think that events `b` and `c` are causing twice as much blocking time as event `a`. - - - -So what can we do? Do we always need to live in fear of bias when working with block profiles? No! If the [Overhead](#overhead) for your workload allows it, the simplest solution is to use a low enough `blockprofilerate` in order to capture most blocking events. - -But perhaps there is an even better way. I'm thinking we could correct for the current bias by keeping the same logic of sampling `duration / rate` fraction of events when `duration < rate`. However, when this happens we could simply multiply the sampled duration by `rate/duration` like this: - -``` -duration = duration * (rate/duration) -# note: the expression above can be simplified to just `duration = rate` -``` - -Doing so could be done with a [trivial patch](https://github.com/felixge/go/compare/master...debias-blockprofile-rate) to the go runtime and the picture below shows the results from simulating it. - - - -So from my point of view it should be possible to eliminate this bias from future versions of Go and I'm planning to [work with the Go project](https://github.com/golang/go/issues/44192) on that. - -That being said, I'm not trained in statistics, so my analysis here might be hilariously misguided 🙃. - -🚧 It might also be possible to reduce the bias in block profiles after recording them. I've got a [proof of concept](https://github.com/felixge/go-debias-blockprofile) for this, but it's not clear yet if this will work well in practice. +Up until Go 1.17 the block profiler was biased towards favoring infrequent long events over frequent short events. A [detailed analysis](./block-bias.md) explains the problem. ### Time Stamp Counter diff --git a/goroutine.md b/goroutine.md index 1ab23d4..db06827 100644 --- a/goroutine.md +++ b/goroutine.md @@ -1,3 +1,5 @@ +⬅ [Index of all go-profiler-notes](./README.md) + This document was last updated for `go1.15.6` but probably still applies to older/newer versions for the most parts. # Goroutine Profiling in Go diff --git a/pprof.md b/pprof.md index 0658956..9726d21 100644 --- a/pprof.md +++ b/pprof.md @@ -1,3 +1,5 @@ +⬅ [Index of all go-profiler-notes](./README.md) + # Go's pprof tool & format The various profilers built into Go are designed to work with the pprof visualization tool. [pprof](https://github.com/google/pprof) itself is an inofficial Google project that is designed to analyze profiling data from C++, Java and Go programs. The project defines a protocol buffer format that is used by all Go profilers and described in this document.