mirror of
https://github.com/DataDog/go-profiler-notes.git
synced 2026-06-20 16:45:26 +08:00
Move block bias page
This commit is contained in:
parent
6ead318698
commit
64737d5072
4 changed files with 73 additions and 82 deletions
64
block-bias.md
Normal file
64
block-bias.md
Normal file
|
|
@ -0,0 +1,64 @@
|
|||
⬅ [Block Profiling in Go](./block.md)
|
||||
|
||||
⚠️This document describes a sampling bias issue I discovered while researching the block profiler for Go. I have since [landed a fix for it](https://go-review.googlesource.com/c/go/+/299991) that should appear in Go 1.17.
|
||||
|
||||
# Block Profiler Sampling Bias
|
||||
|
||||
**tl;dr:** Setting your sampling `rate` too high will bias your results towards infrequent long events over frequent short events.
|
||||
|
||||
As described in the [Usage](#usage) section, the block profiler will sample as follows:
|
||||
|
||||
- Events with `duration >= rate` will be sampled 100%
|
||||
- Events with `duration < rate` have a `duration / rate` chance of getting sampled.
|
||||
|
||||
The [implementation](https://github.com/golang/go/blob/go1.15.7/src/runtime/mprof.go#L408) for this looks like that:
|
||||
|
||||
```go
|
||||
func blocksampled(cycles int64) bool {
|
||||
rate := int64(atomic.Load64(&blockprofilerate))
|
||||
if rate <= 0 || (rate > cycles && int64(fastrand())%rate > cycles) {
|
||||
return false
|
||||
}
|
||||
return true
|
||||
}
|
||||
```
|
||||
|
||||
This means that if you set your profiling `rate` low enough, you'll get very accurate results. However, if your `rate` is higher than the `duration` of some of the events you are sampling, the sampling process will exhibit a bias favoring infrequent events of higher `duration` over frequent events with lower `duration` even so they may contribute to the same amount of overall block duration in your program.
|
||||
|
||||
## Simple Example
|
||||
|
||||
Let's say your `blockprofilerate` is `100ns` and your application produces the following events:
|
||||
|
||||
- `A`: `1` event with a duration of `100ns`.
|
||||
- `B`: `10` events with a duration of `10ns` each.
|
||||
|
||||
Given this scenario, the `blockprofiler` is guaranteed to catch and accurately report event `A` as `100ns` in the profile. For event `B` the most likely outcome is that the profiler will capture only a single event (10% of 10 events) and report `B` as `10ns` in the profile. So you might find yourself in a situation where you think event `A` is causing 10x more blocking than event `B`, which is not true.
|
||||
|
||||
## Simulation & Proposal for Improvement
|
||||
|
||||
For an even better intuition about this, consider the [simulated example](./sim/block_sampling.ipynb) below. Here we have a histogram of all durations collected from 3 types of blocking events. As you can see, they all have different mean durations (`1000ns`, `2000ns`, `3000ns`) and they are occurring at different frequencies, with `count(a) > count(b) > count(c)`. What's more difficult to see, is that the cumulative durations of these events are the same, i.e. `sum(a) = sum(b) = sum(c)`, but you can trust me on that : ).
|
||||
|
||||
<img src="/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/sim/block_sampling_hist.png" style="zoom: 80%;" />
|
||||
|
||||
So given that your application might produce events like this, how will they show up in your block profile as you try out different `blockprofilerate` values? As you can see below, all is well and fine until a `blockprofilerate` of `1000ns`. Each event shows up with the same total duration in the profile (the red and green dots are hidden below the blue ones). However starting at `1000ns` you see that event `a` starts to fade from our profile and at `2000ns` you'd already think that events `b` and `c` are causing twice as much blocking time as event `a`.
|
||||
|
||||
<img src="/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/sim/block_sampling_biased.png" style="zoom:80%;" />
|
||||
|
||||
So what can we do? Do we always need to live in fear of bias when working with block profiles? No! If the [Overhead](#overhead) for your workload allows it, the simplest solution is to use a low enough `blockprofilerate` in order to capture most blocking events.
|
||||
|
||||
But perhaps there is an even better way. I'm thinking we could correct for the current bias by keeping the same logic of sampling `duration / rate` fraction of events when `duration < rate`. However, when this happens we could simply multiply the sampled duration by `rate/duration` like this:
|
||||
|
||||
```
|
||||
duration = duration * (rate/duration)
|
||||
# note: the expression above can be simplified to just `duration = rate`
|
||||
```
|
||||
|
||||
Doing so could be done with a [trivial patch](https://github.com/felixge/go/compare/master...debias-blockprofile-rate) to the go runtime and the picture below shows the results from simulating it.
|
||||
|
||||
<img src="/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/sim/block_sampling_debiased.png" alt="" style="zoom: 80%;" />
|
||||
|
||||
## Disclaimers
|
||||
|
||||
I'm [felixge](https://github.com/felixge) and work at [Datadog](https://www.datadoghq.com/) on [Continuous Profiling](https://www.datadoghq.com/product/code-profiling/) for Go. You should check it out. We're also [hiring](https://www.datadoghq.com/jobs-engineering/#all&all_locations) : ).
|
||||
|
||||
The information on this page is believed to be correct, but no warranty is provided. Feedback is welcome!
|
||||
87
block.md
87
block.md
|
|
@ -1,31 +1,11 @@
|
|||
⬅ [Index of all go-profiler-notes](./README.md)
|
||||
|
||||
[Description](#description) - [Usage](#usage) - [Overhead](#overhead) - [Accuracy](#accuracy) - [Relationship with Mutex Profiling](#relationship-with-mutex-profiling) - [Profiler Labels](#profiler-labels) - [History](#history)
|
||||
|
||||
This document was last updated for `go1.15.7` but probably still applies to older/newer versions for the most parts.
|
||||
|
||||
# Block Profiling in Go
|
||||
|
||||
<!-- https://ecotrust-canada.github.io/markdown-toc/ -->
|
||||
|
||||
- [Description](#description)
|
||||
|
||||
* [Usage](#usage)
|
||||
* [Overhead](#overhead)
|
||||
+ [Implementation Details](#implementation-details)
|
||||
+ [Benchmarks](#benchmarks)
|
||||
+ [Memory Usage](#memory-usage)
|
||||
+ [Initialization Time](#initialization-time)
|
||||
* [Accuracy](#accuracy)
|
||||
+ [Sampling Bias](#sampling-bias)
|
||||
- [Simple Example](#simple-example)
|
||||
- [Simulation & Proposal for Improvement](#simulation---proposal-for-improvement)
|
||||
+ [Time Stamp Counter](#time-stamp-counter)
|
||||
+ [Stack Depth](#stack-depth)
|
||||
+ [Spin Locks](#spin-locks)
|
||||
* [Relationship with Wall Clock Time](#relationship-with-wall-clock-time)
|
||||
* [Relationship with Mutex Profiling](#relationship-with-mutex-profiling)
|
||||
* [Profiler Labels](#profiler-labels)
|
||||
* [pprof Output](#pprof-output)
|
||||
* [History](#history)
|
||||
* [Disclaimers](#disclaimers)
|
||||
|
||||
## Description
|
||||
|
||||
The block profile in Go lets you analyze how much time your program spends waiting on the blocking operations listed below:
|
||||
|
|
@ -145,64 +125,7 @@ The first call to `runtime.SetBlockProfileRate()` takes `100ms` because it tries
|
|||
|
||||
### Sampling Bias
|
||||
|
||||
**tl;dr:** Setting your sampling `rate` too high will bias your results towards infrequent long events over frequent short events.
|
||||
|
||||
As described in the [Usage](#usage) section, the block profiler will sample as follows:
|
||||
|
||||
- Events with `duration >= rate` will be sampled 100%
|
||||
- Events with `duration < rate` have a `duration / rate` chance of getting sampled.
|
||||
|
||||
The [implementation](https://github.com/golang/go/blob/go1.15.7/src/runtime/mprof.go#L408) for this looks like that:
|
||||
|
||||
```go
|
||||
func blocksampled(cycles int64) bool {
|
||||
rate := int64(atomic.Load64(&blockprofilerate))
|
||||
if rate <= 0 || (rate > cycles && int64(fastrand())%rate > cycles) {
|
||||
return false
|
||||
}
|
||||
return true
|
||||
}
|
||||
```
|
||||
|
||||
This means that if you set your profiling `rate` low enough, you'll get very accurate results. However, if your `rate` is higher than the `duration` of some of the events you are sampling, the sampling process will exhibit a bias favoring infrequent events of higher `duration` over frequent events with lower `duration` even so they may contribute to the same amount of overall block duration in your program.
|
||||
|
||||
#### Simple Example
|
||||
|
||||
Let's say your `blockprofilerate` is `100ns` and your application produces the following events:
|
||||
|
||||
- `A`: `1` event with a duration of `100ns`.
|
||||
- `B`: `10` events with a duration of `10ns` each.
|
||||
|
||||
Given this scenario, the `blockprofiler` is guaranteed to catch and accurately report event `A` as `100ns` in the profile. For event `B` the most likely outcome is that the profiler will capture only a single event (10% of 10 events) and report `B` as `10ns` in the profile. So you might find yourself in a situation where you think event `A` is causing 10x more blocking than event `B`, which is not true.
|
||||
|
||||
#### Simulation & Proposal for Improvement
|
||||
|
||||
For an even better intuition about this, consider the [simulated example](./sim/block_sampling.ipynb) below. Here we have a histogram of all durations collected from 3 types of blocking events. As you can see, they all have different mean durations (`1000ns`, `2000ns`, `3000ns`) and they are occurring at different frequencies, with `count(a) > count(b) > count(c)`. What's more difficult to see, is that the cumulative durations of these events are the same, i.e. `sum(a) = sum(b) = sum(c)`, but you can trust me on that : ).
|
||||
|
||||
<img src="./sim/block_sampling_hist.png" style="zoom: 80%;" />
|
||||
|
||||
So given that your application might produce events like this, how will they show up in your block profile as you try out different `blockprofilerate` values? As you can see below, all is well and fine until a `blockprofilerate` of `1000ns`. Each event shows up with the same total duration in the profile (the red and green dots are hidden below the blue ones). However starting at `1000ns` you see that event `a` starts to fade from our profile and at `2000ns` you'd already think that events `b` and `c` are causing twice as much blocking time as event `a`.
|
||||
|
||||
<img src="./sim/block_sampling_biased.png" style="zoom:80%;" />
|
||||
|
||||
So what can we do? Do we always need to live in fear of bias when working with block profiles? No! If the [Overhead](#overhead) for your workload allows it, the simplest solution is to use a low enough `blockprofilerate` in order to capture most blocking events.
|
||||
|
||||
But perhaps there is an even better way. I'm thinking we could correct for the current bias by keeping the same logic of sampling `duration / rate` fraction of events when `duration < rate`. However, when this happens we could simply multiply the sampled duration by `rate/duration` like this:
|
||||
|
||||
```
|
||||
duration = duration * (rate/duration)
|
||||
# note: the expression above can be simplified to just `duration = rate`
|
||||
```
|
||||
|
||||
Doing so could be done with a [trivial patch](https://github.com/felixge/go/compare/master...debias-blockprofile-rate) to the go runtime and the picture below shows the results from simulating it.
|
||||
|
||||
<img src="./sim/block_sampling_debiased.png" alt="" style="zoom: 80%;" />
|
||||
|
||||
So from my point of view it should be possible to eliminate this bias from future versions of Go and I'm planning to [work with the Go project](https://github.com/golang/go/issues/44192) on that.
|
||||
|
||||
That being said, I'm not trained in statistics, so my analysis here might be hilariously misguided 🙃.
|
||||
|
||||
🚧 It might also be possible to reduce the bias in block profiles after recording them. I've got a [proof of concept](https://github.com/felixge/go-debias-blockprofile) for this, but it's not clear yet if this will work well in practice.
|
||||
Up until Go 1.17 the block profiler was biased towards favoring infrequent long events over frequent short events. A [detailed analysis](./block-bias.md) explains the problem.
|
||||
|
||||
### Time Stamp Counter
|
||||
|
||||
|
|
|
|||
|
|
@ -1,3 +1,5 @@
|
|||
⬅ [Index of all go-profiler-notes](./README.md)
|
||||
|
||||
This document was last updated for `go1.15.6` but probably still applies to older/newer versions for the most parts.
|
||||
|
||||
# Goroutine Profiling in Go
|
||||
|
|
|
|||
2
pprof.md
2
pprof.md
|
|
@ -1,3 +1,5 @@
|
|||
⬅ [Index of all go-profiler-notes](./README.md)
|
||||
|
||||
# Go's pprof tool & format
|
||||
|
||||
The various profilers built into Go are designed to work with the pprof visualization tool. [pprof](https://github.com/google/pprof) itself is an inofficial Google project that is designed to analyze profiling data from C++, Java and Go programs. The project defines a protocol buffer format that is used by all Go profilers and described in this document.
|
||||
|
|
|
|||
Loading…
Reference in a new issue