This commit is contained in:
Felix Geisendörfer 2022-12-29 14:02:00 +01:00
parent 18729a8f7c
commit 6342fc3259
29 changed files with 81 additions and 17 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 245 KiB

BIN
docs/images/heap-gc.1.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 564 KiB

BIN
docs/images/heap-gc.2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 486 KiB

BIN
docs/images/heap-gc.gif Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 624 KiB

BIN
docs/images/heap-simple.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 334 KiB

BIN
docs/images/heap.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 247 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 663 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 366 KiB

BIN
docs/images/scheduler.1.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 143 KiB

BIN
docs/images/scheduler.2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 144 KiB

BIN
docs/images/scheduler.3.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 144 KiB

BIN
docs/images/scheduler.4.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 143 KiB

BIN
docs/images/scheduler.gif Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 154 KiB

BIN
docs/images/stack.gif Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 230 KiB

BIN
docs/images/stack1.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 130 KiB

BIN
docs/images/stack2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 155 KiB

BIN
docs/images/stack3.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 162 KiB

BIN
docs/images/time.1.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 219 KiB

BIN
docs/images/time.2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 319 KiB

BIN
docs/images/time.3.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.2 MiB

BIN
docs/images/timeline.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 206 KiB

View file

@ -1,15 +1,21 @@
go-profiler-notes
=================
Formerly known as "go-profiler-notes", this guide contains in-depth information
about Go profiling. In the future, additional topics such as runtime tracing,
metrics, heap debugging as well as userland metrics and distributed tracing will
be covered as well.
This project was started by `Felix Geisendörfer <https://felixge.de/>`_ after joining Datadog's `Continuous Profiler <https://docs.datadoghq.com/profiler/>`_ team in early 2021. Initially it was just a loose collection of markdown files with notes on Go profiling. But over time and thanks to a few `contributors <https://github.com/DataDog/go-profiler-notes/graphs/contributors>`_, it has developed into one of the most in-depth resources on Go profiling.
In the future, additional topics such as runtime tracing, metrics, heap debugging as well as userland metrics and distributed tracing will be covered as well.
Support this project by giving it a |:star:| on GitHub |ico1|
.. |ico1| image:: https://img.shields.io/github/stars/DataDog/go-profiler-notes?style=social
:alt: Github Stars
:target: https://github.com/DataDog/go-profiler-notes
.. toctree::
:hidden:
profiling/index
mental-model-for-go/index
.. toctree::
:maxdepth: 1

View file

@ -1,2 +1,52 @@
Goroutine Scheduler
===================
Lets talk about the scheduler first using the example below:
.. code:: go
func main() {
res, err := http.Get("https://example.org/")
if err != nil {
panic(err)
}
fmt.Printf("%d\n", res.StatusCode)
}
Here we have a single goroutine, lets call it ``G1``, that runs the
``main`` function. The picture below shows a simplified timeline of how
this goroutine might execute on a single CPU. Initially ``G1`` is
running on the CPU to prepare the http request. Then the CPU becomes
idle as the goroutine has to wait for the network. And finally it gets
scheduled onto the CPU again to print out the status code.
From the schedulers perspective, the program above executes like shown
below. At first ``G1`` is ``Executing`` on ``CPU 1``. Then the goroutine
is taken off the CPU while ``Waiting`` for the network. Once the
scheduler notices that the network has replied (using non-blocking I/O,
similar to Node.js), it marks the goroutine as ``Runnable``. And as soon
as a CPU core becomes available, the goroutine starts ``Executing``
again. In our case all cores are available, so ``G1`` can go back to
``Executing`` the ``fmt.Printf()`` function on one of the CPUs
immediately without spending any time in the ``Runnable`` state.
Most of the time, Go programs are running multiple goroutines, so you
will have a few goroutines ``Executing`` on some of the CPU cores, a
large number of goroutines ``Waiting`` for various reasons, and ideally
no goroutines ``Runnable`` unless your program exhibits very high CPU
load. An example of this can be seen below.
Of course the model above glosses over many details. In reality its
turtles all the way down, and the Go scheduler works on top of threads
managed by the operating system, and even CPUs themselves are capable of
hyper-threading which can be seen as a form of scheduling. So if youre
interested, feel free to continue down this rabbit hole via Ardan labs
series on `Scheduling in
Go <https://www.ardanlabs.com/blog/2018/08/scheduling-in-go-part1.html>`__
or similar material.
However, the model above should be sufficient to understand the
remainder of this guide. In particular it should become clear that the
time measured by the various Go profilers is essentially the time your
goroutines are spending in the ``Executing`` and ``Waiting`` states as
illustrated by the diagram below.

View file

@ -3,7 +3,6 @@ Mental Model for Go
.. toctree::
:maxdepth: 2
:caption: Contents:
goroutine-scheduler
garbage-collector

View file

@ -4,7 +4,7 @@ CPU Profiler
Go's CPU profiler can help you identify which parts of your code base consume a lot of CPU time.
.. note::
CPU time is different from the real time experienced by your users (aka latency). For example a typical http request might take ``100ms`` to complete, spending ``5ms`` of time On-CPU while waiting for ``95ms`` on a database. Its also possible for a request to take ``100ms``, but spend ``200ms`` of CPU if two goroutines are performing CPU intensive work in parallel. If this is confusing to you, please refer to the `Goroutine Scheduler <#goroutine-scheduler>`__ section.
CPU time is different from the real time experienced by your users (aka latency). For example a typical http request might take ``100ms`` to complete, spending ``5ms`` of time On-CPU while waiting for ``95ms`` on a database. Its also possible for a request to take ``100ms``, but spend ``200ms`` of CPU if two goroutines are performing CPU intensive work in parallel. If this is confusing to you, please read about the :doc:`/mental-model-for-go/goroutine-scheduler`.
API
---
@ -13,20 +13,20 @@ You can control the CPU profiler via various APIs:
#. Using the pprof `StartCPUProfile <https://pkg.go.dev/runtime/pprof#StartCPUProfile>`_ and `StopCPUProfile <https://pkg.go.dev/runtime/pprof#StopCPUProfile>`_ API.
.. literalinclude:: ./cpu-profiler-start-stop.go
.. literalinclude:: /code/cpu-profiler-start-stop.go
:language: go
:lines: 11-13
:dedent: 1
#. Passing the ``-cpuprofile`` option to ``go test``.
.. code:: bash
.. code-block:: bash
go test -cpuprofile cpu.pprof
#. Via the ``net/http/pprof`` package.
.. literalinclude:: ./cpu-profiler-net-http-pprof.go
.. literalinclude:: /code/cpu-profiler-net-http-pprof.go
:language: go
:lines: 3-
@ -38,7 +38,7 @@ You can control the CPU profiler via various APIs:
#. Control the sampling rate using `SetCPUProfileRate <https://pkg.go.dev/runtime#SetCPUProfileRate>`_.
.. code:: go
.. code-block:: go
runtime.SetCPUProfileRate(200) // 200 Hz, default is 100
@ -46,7 +46,7 @@ You can control the CPU profiler via various APIs:
#. Use `SetCgoTraceback <https://pkg.go.dev/runtime#SetCgoTraceback>`_ to get stack traces for cgo code.
.. code:: go
.. code-block:: go
import _ github.com/benesch/cgosymbolizer
@ -54,14 +54,14 @@ You can control the CPU profiler via various APIs:
#. Record the individual CPU profile samples into a `runtime/trace <https://pkg.go.dev/runtime/trace>`_ (since Go 1.19).
.. literalinclude:: ./cpu-profiler-trace.go
.. literalinclude:: /code/cpu-profiler-trace.go
:language: go
:lines: 15-20
:dedent: 1
See `CL 400795 <https://go-review.googlesource.com/c/go/+/400795>`_ for more information.
#. (Commercial) Use a 3rd party library such as `dd-trace-go <https://docs.datadoghq.com/profiler/enabling/go/>`__ for continuous profiling in production.
#. **Commercial:** Use a 3rd party library such as `dd-trace-go <https://docs.datadoghq.com/profiler/enabling/go/>`__ for continuous profiling in production.
Data
----
@ -103,16 +103,25 @@ The resulting profile will include a new label column and might look something l
========================= ========== ============= ===============
stack trace label samples/count cpu/nanoseconds
========================= ========== ============= ===============
main.backgroundWork user:bob 5 50000000
main.backgroundWork user:bob 4 50000000
main.backgroundWork user:alice 2 20000000
main.work;main.directWork user:bob 4 40000000
main.work;main.directWork user:alice 3 30000000
main.work;main.directWork user:alice 5 30000000
========================= ========== ============= ===============
Viewing the same profile with pprofs Graph view will also include the labels:
Viewing the same profile with pprofs Graph view will also include the labels as shown in :numref:`cpu-profiler-labels`.
.. figure:: /images/cpu-profiler-labels.png
:name: cpu-profiler-labels
:width: 400
:align: center
pprof labels shown in the Graph view.
How you use these labels is up to you. You might include things such as ``user ids``, ``request ids``, ``http endpoints``, ``subscription plan`` or other data that can allow you to get a better understanding of what types of requests are causing high CPU utilization, even when they are being processed by the same code paths. That being said, using labels will increase the size of your pprof files. So you should probably start with low cardinality labels such as endpoints before moving on to high cardinality labels once you feel confident that they dont impact the performance of your application.
**Commercial**: Datadog supports `Connecting Go Profiling With Tracing <https://felixge.de/2022/02/11/connecting-go-profiling-with-tracing/>`_ via pprof labels.
.. warning::
Go 1.17 and below contained several bugs that could cause some profiler labels to be missing from CPU profiles, see Limitations_ for more information.
@ -153,7 +162,7 @@ you might want to be aware of:
- `CL 369741 <https://go-review.googlesource.com/c/go/+/369741>`__: The first batch of samples in a CPU profile had an off-by-one error causing a misattribution of labels.
- `CL 369983 <https://go-review.googlesource.com/c/go/+/369983>`__: System goroutines created on behalf of user goroutines (e.g. for garbage collection) incorrectly inherited their parents labels.
- |:warning:| You can call SetCPUProfileRate_ to adjust the CPU profiler rate before calling StartCPUProfile_. This will print a warning saying ``runtime: cannot set cpu profile rate until previous profile has finished``. However, it still works. This issue was `initially raised here <https://github.com/golang/go/issues/40094>`__, and there is an `accepted proposal for improving the API <https://github.com/golang/go/issues/42502>`__.
- |:warning:| The maximum number of nested function calls that can be captured in stack traces by the CPU profiler is currently ```64`` <https://sourcegraph.com/search?q=context:global+repo:github.com/golang/go+file:src/*+maxCPUProfStack+%3D&patternType=literal>`__. If your program is using a lot of recursion or other patterns that lead to deep stack depths, your CPU profile will include stack traces that are truncated. This means you will miss parts of the call chain that led to the function that was active at the time the sample was taken.
- |:warning:| The maximum number of nested function calls that can be captured in stack traces by the CPU profiler is currently `64 <https://sourcegraph.com/search?q=context:global+repo:github.com/golang/go+file:src/*+maxCPUProfStack+%3D&patternType=literal>`__. If your program is using a lot of recursion or other patterns that lead to deep stack depths, your CPU profile will include stack traces that are truncated. This means you will miss parts of the call chain that led to the function that was active at the time the sample was taken.