updates
BIN
docs/images/cpu-profiler-labels.png
Normal file
|
After Width: | Height: | Size: 245 KiB |
BIN
docs/images/heap-gc.1.png
Normal file
|
After Width: | Height: | Size: 564 KiB |
BIN
docs/images/heap-gc.2.png
Normal file
|
After Width: | Height: | Size: 486 KiB |
BIN
docs/images/heap-gc.gif
Normal file
|
After Width: | Height: | Size: 624 KiB |
BIN
docs/images/heap-simple.png
Normal file
|
After Width: | Height: | Size: 334 KiB |
BIN
docs/images/heap.png
Normal file
|
After Width: | Height: | Size: 247 KiB |
BIN
docs/images/profiler-venn.png
Normal file
|
After Width: | Height: | Size: 663 KiB |
BIN
docs/images/scheduler-complete.png
Normal file
|
After Width: | Height: | Size: 366 KiB |
BIN
docs/images/scheduler.1.png
Normal file
|
After Width: | Height: | Size: 143 KiB |
BIN
docs/images/scheduler.2.png
Normal file
|
After Width: | Height: | Size: 144 KiB |
BIN
docs/images/scheduler.3.png
Normal file
|
After Width: | Height: | Size: 144 KiB |
BIN
docs/images/scheduler.4.png
Normal file
|
After Width: | Height: | Size: 143 KiB |
BIN
docs/images/scheduler.gif
Normal file
|
After Width: | Height: | Size: 154 KiB |
BIN
docs/images/stack.gif
Normal file
|
After Width: | Height: | Size: 230 KiB |
BIN
docs/images/stack1.png
Normal file
|
After Width: | Height: | Size: 130 KiB |
BIN
docs/images/stack2.png
Normal file
|
After Width: | Height: | Size: 155 KiB |
BIN
docs/images/stack3.png
Normal file
|
After Width: | Height: | Size: 162 KiB |
BIN
docs/images/time.1.png
Normal file
|
After Width: | Height: | Size: 219 KiB |
BIN
docs/images/time.2.png
Normal file
|
After Width: | Height: | Size: 319 KiB |
BIN
docs/images/time.3.png
Normal file
|
After Width: | Height: | Size: 1.2 MiB |
BIN
docs/images/timeline.png
Normal file
|
After Width: | Height: | Size: 206 KiB |
|
|
@ -1,15 +1,21 @@
|
|||
go-profiler-notes
|
||||
=================
|
||||
|
||||
Formerly known as "go-profiler-notes", this guide contains in-depth information
|
||||
about Go profiling. In the future, additional topics such as runtime tracing,
|
||||
metrics, heap debugging as well as userland metrics and distributed tracing will
|
||||
be covered as well.
|
||||
This project was started by `Felix Geisendörfer <https://felixge.de/>`_ after joining Datadog's `Continuous Profiler <https://docs.datadoghq.com/profiler/>`_ team in early 2021. Initially it was just a loose collection of markdown files with notes on Go profiling. But over time and thanks to a few `contributors <https://github.com/DataDog/go-profiler-notes/graphs/contributors>`_, it has developed into one of the most in-depth resources on Go profiling.
|
||||
|
||||
In the future, additional topics such as runtime tracing, metrics, heap debugging as well as userland metrics and distributed tracing will be covered as well.
|
||||
|
||||
Support this project by giving it a |:star:| on GitHub |ico1|
|
||||
|
||||
.. |ico1| image:: https://img.shields.io/github/stars/DataDog/go-profiler-notes?style=social
|
||||
:alt: Github Stars
|
||||
:target: https://github.com/DataDog/go-profiler-notes
|
||||
|
||||
.. toctree::
|
||||
:hidden:
|
||||
|
||||
profiling/index
|
||||
mental-model-for-go/index
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
|
|
|||
|
|
@ -1,2 +1,52 @@
|
|||
Goroutine Scheduler
|
||||
===================
|
||||
|
||||
Let’s talk about the scheduler first using the example below:
|
||||
|
||||
.. code:: go
|
||||
|
||||
func main() {
|
||||
res, err := http.Get("https://example.org/")
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
fmt.Printf("%d\n", res.StatusCode)
|
||||
}
|
||||
|
||||
Here we have a single goroutine, let’s call it ``G1``, that runs the
|
||||
``main`` function. The picture below shows a simplified timeline of how
|
||||
this goroutine might execute on a single CPU. Initially ``G1`` is
|
||||
running on the CPU to prepare the http request. Then the CPU becomes
|
||||
idle as the goroutine has to wait for the network. And finally it gets
|
||||
scheduled onto the CPU again to print out the status code.
|
||||
|
||||
From the scheduler’s perspective, the program above executes like shown
|
||||
below. At first ``G1`` is ``Executing`` on ``CPU 1``. Then the goroutine
|
||||
is taken off the CPU while ``Waiting`` for the network. Once the
|
||||
scheduler notices that the network has replied (using non-blocking I/O,
|
||||
similar to Node.js), it marks the goroutine as ``Runnable``. And as soon
|
||||
as a CPU core becomes available, the goroutine starts ``Executing``
|
||||
again. In our case all cores are available, so ``G1`` can go back to
|
||||
``Executing`` the ``fmt.Printf()`` function on one of the CPUs
|
||||
immediately without spending any time in the ``Runnable`` state.
|
||||
|
||||
Most of the time, Go programs are running multiple goroutines, so you
|
||||
will have a few goroutines ``Executing`` on some of the CPU cores, a
|
||||
large number of goroutines ``Waiting`` for various reasons, and ideally
|
||||
no goroutines ``Runnable`` unless your program exhibits very high CPU
|
||||
load. An example of this can be seen below.
|
||||
|
||||
Of course the model above glosses over many details. In reality it’s
|
||||
turtles all the way down, and the Go scheduler works on top of threads
|
||||
managed by the operating system, and even CPUs themselves are capable of
|
||||
hyper-threading which can be seen as a form of scheduling. So if you’re
|
||||
interested, feel free to continue down this rabbit hole via Ardan labs
|
||||
series on `Scheduling in
|
||||
Go <https://www.ardanlabs.com/blog/2018/08/scheduling-in-go-part1.html>`__
|
||||
or similar material.
|
||||
|
||||
However, the model above should be sufficient to understand the
|
||||
remainder of this guide. In particular it should become clear that the
|
||||
time measured by the various Go profilers is essentially the time your
|
||||
goroutines are spending in the ``Executing`` and ``Waiting`` states as
|
||||
illustrated by the diagram below.
|
||||
|
|
|
|||
|
|
@ -3,7 +3,6 @@ Mental Model for Go
|
|||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Contents:
|
||||
|
||||
goroutine-scheduler
|
||||
garbage-collector
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ CPU Profiler
|
|||
Go's CPU profiler can help you identify which parts of your code base consume a lot of CPU time.
|
||||
|
||||
.. note::
|
||||
CPU time is different from the real time experienced by your users (aka latency). For example a typical http request might take ``100ms`` to complete, spending ``5ms`` of time On-CPU while waiting for ``95ms`` on a database. It’s also possible for a request to take ``100ms``, but spend ``200ms`` of CPU if two goroutines are performing CPU intensive work in parallel. If this is confusing to you, please refer to the `Goroutine Scheduler <#goroutine-scheduler>`__ section.
|
||||
CPU time is different from the real time experienced by your users (aka latency). For example a typical http request might take ``100ms`` to complete, spending ``5ms`` of time On-CPU while waiting for ``95ms`` on a database. It’s also possible for a request to take ``100ms``, but spend ``200ms`` of CPU if two goroutines are performing CPU intensive work in parallel. If this is confusing to you, please read about the :doc:`/mental-model-for-go/goroutine-scheduler`.
|
||||
|
||||
API
|
||||
---
|
||||
|
|
@ -13,20 +13,20 @@ You can control the CPU profiler via various APIs:
|
|||
|
||||
#. Using the pprof `StartCPUProfile <https://pkg.go.dev/runtime/pprof#StartCPUProfile>`_ and `StopCPUProfile <https://pkg.go.dev/runtime/pprof#StopCPUProfile>`_ API.
|
||||
|
||||
.. literalinclude:: ./cpu-profiler-start-stop.go
|
||||
.. literalinclude:: /code/cpu-profiler-start-stop.go
|
||||
:language: go
|
||||
:lines: 11-13
|
||||
:dedent: 1
|
||||
|
||||
#. Passing the ``-cpuprofile`` option to ``go test``.
|
||||
|
||||
.. code:: bash
|
||||
.. code-block:: bash
|
||||
|
||||
go test -cpuprofile cpu.pprof
|
||||
|
||||
#. Via the ``net/http/pprof`` package.
|
||||
|
||||
.. literalinclude:: ./cpu-profiler-net-http-pprof.go
|
||||
.. literalinclude:: /code/cpu-profiler-net-http-pprof.go
|
||||
:language: go
|
||||
:lines: 3-
|
||||
|
||||
|
|
@ -38,7 +38,7 @@ You can control the CPU profiler via various APIs:
|
|||
|
||||
#. Control the sampling rate using `SetCPUProfileRate <https://pkg.go.dev/runtime#SetCPUProfileRate>`_.
|
||||
|
||||
.. code:: go
|
||||
.. code-block:: go
|
||||
|
||||
runtime.SetCPUProfileRate(200) // 200 Hz, default is 100
|
||||
|
||||
|
|
@ -46,7 +46,7 @@ You can control the CPU profiler via various APIs:
|
|||
|
||||
#. Use `SetCgoTraceback <https://pkg.go.dev/runtime#SetCgoTraceback>`_ to get stack traces for cgo code.
|
||||
|
||||
.. code:: go
|
||||
.. code-block:: go
|
||||
|
||||
import _ github.com/benesch/cgosymbolizer
|
||||
|
||||
|
|
@ -54,14 +54,14 @@ You can control the CPU profiler via various APIs:
|
|||
|
||||
#. Record the individual CPU profile samples into a `runtime/trace <https://pkg.go.dev/runtime/trace>`_ (since Go 1.19).
|
||||
|
||||
.. literalinclude:: ./cpu-profiler-trace.go
|
||||
.. literalinclude:: /code/cpu-profiler-trace.go
|
||||
:language: go
|
||||
:lines: 15-20
|
||||
:dedent: 1
|
||||
|
||||
See `CL 400795 <https://go-review.googlesource.com/c/go/+/400795>`_ for more information.
|
||||
|
||||
#. (Commercial) Use a 3rd party library such as `dd-trace-go <https://docs.datadoghq.com/profiler/enabling/go/>`__ for continuous profiling in production.
|
||||
#. **Commercial:** Use a 3rd party library such as `dd-trace-go <https://docs.datadoghq.com/profiler/enabling/go/>`__ for continuous profiling in production.
|
||||
|
||||
Data
|
||||
----
|
||||
|
|
@ -103,16 +103,25 @@ The resulting profile will include a new label column and might look something l
|
|||
========================= ========== ============= ===============
|
||||
stack trace label samples/count cpu/nanoseconds
|
||||
========================= ========== ============= ===============
|
||||
main.backgroundWork user:bob 5 50000000
|
||||
main.backgroundWork user:bob 4 50000000
|
||||
main.backgroundWork user:alice 2 20000000
|
||||
main.work;main.directWork user:bob 4 40000000
|
||||
main.work;main.directWork user:alice 3 30000000
|
||||
main.work;main.directWork user:alice 5 30000000
|
||||
========================= ========== ============= ===============
|
||||
|
||||
Viewing the same profile with pprof’s Graph view will also include the labels:
|
||||
Viewing the same profile with pprof’s Graph view will also include the labels as shown in :numref:`cpu-profiler-labels`.
|
||||
|
||||
.. figure:: /images/cpu-profiler-labels.png
|
||||
:name: cpu-profiler-labels
|
||||
:width: 400
|
||||
:align: center
|
||||
|
||||
pprof labels shown in the Graph view.
|
||||
|
||||
How you use these labels is up to you. You might include things such as ``user ids``, ``request ids``, ``http endpoints``, ``subscription plan`` or other data that can allow you to get a better understanding of what types of requests are causing high CPU utilization, even when they are being processed by the same code paths. That being said, using labels will increase the size of your pprof files. So you should probably start with low cardinality labels such as endpoints before moving on to high cardinality labels once you feel confident that they don’t impact the performance of your application.
|
||||
|
||||
**Commercial**: Datadog supports `Connecting Go Profiling With Tracing <https://felixge.de/2022/02/11/connecting-go-profiling-with-tracing/>`_ via pprof labels.
|
||||
|
||||
.. warning::
|
||||
Go 1.17 and below contained several bugs that could cause some profiler labels to be missing from CPU profiles, see Limitations_ for more information.
|
||||
|
||||
|
|
@ -153,7 +162,7 @@ you might want to be aware of:
|
|||
- `CL 369741 <https://go-review.googlesource.com/c/go/+/369741>`__: The first batch of samples in a CPU profile had an off-by-one error causing a misattribution of labels.
|
||||
- `CL 369983 <https://go-review.googlesource.com/c/go/+/369983>`__: System goroutines created on behalf of user goroutines (e.g. for garbage collection) incorrectly inherited their parents labels.
|
||||
- |:warning:| You can call SetCPUProfileRate_ to adjust the CPU profiler rate before calling StartCPUProfile_. This will print a warning saying ``runtime: cannot set cpu profile rate until previous profile has finished``. However, it still works. This issue was `initially raised here <https://github.com/golang/go/issues/40094>`__, and there is an `accepted proposal for improving the API <https://github.com/golang/go/issues/42502>`__.
|
||||
- |:warning:| The maximum number of nested function calls that can be captured in stack traces by the CPU profiler is currently ```64`` <https://sourcegraph.com/search?q=context:global+repo:github.com/golang/go+file:src/*+maxCPUProfStack+%3D&patternType=literal>`__. If your program is using a lot of recursion or other patterns that lead to deep stack depths, your CPU profile will include stack traces that are truncated. This means you will miss parts of the call chain that led to the function that was active at the time the sample was taken.
|
||||
- |:warning:| The maximum number of nested function calls that can be captured in stack traces by the CPU profiler is currently `64 <https://sourcegraph.com/search?q=context:global+repo:github.com/golang/go+file:src/*+maxCPUProfStack+%3D&patternType=literal>`__. If your program is using a lot of recursion or other patterns that lead to deep stack depths, your CPU profile will include stack traces that are truncated. This means you will miss parts of the call chain that led to the function that was active at the time the sample was taken.
|
||||
|
||||
|
||||
|
||||
|
|
|
|||