diff --git a/docs/profiling/cpu-profiler-net-http-pprof.go b/docs/code/cpu-profiler-net-http-pprof.go similarity index 100% rename from docs/profiling/cpu-profiler-net-http-pprof.go rename to docs/code/cpu-profiler-net-http-pprof.go diff --git a/docs/profiling/cpu-profiler-start-stop.go b/docs/code/cpu-profiler-start-stop.go similarity index 100% rename from docs/profiling/cpu-profiler-start-stop.go rename to docs/code/cpu-profiler-start-stop.go diff --git a/docs/profiling/cpu-profiler-trace.go b/docs/code/cpu-profiler-trace.go similarity index 100% rename from docs/profiling/cpu-profiler-trace.go rename to docs/code/cpu-profiler-trace.go diff --git a/docs/profiling/memory-profiler-gc.go b/docs/code/memory-profiler-gc.go similarity index 100% rename from docs/profiling/memory-profiler-gc.go rename to docs/code/memory-profiler-gc.go diff --git a/docs/images/cpu-profiler-labels.png b/docs/images/cpu-profiler-labels.png new file mode 100644 index 0000000..afea348 Binary files /dev/null and b/docs/images/cpu-profiler-labels.png differ diff --git a/docs/images/heap-gc.1.png b/docs/images/heap-gc.1.png new file mode 100644 index 0000000..5f48a05 Binary files /dev/null and b/docs/images/heap-gc.1.png differ diff --git a/docs/images/heap-gc.2.png b/docs/images/heap-gc.2.png new file mode 100644 index 0000000..6d08163 Binary files /dev/null and b/docs/images/heap-gc.2.png differ diff --git a/docs/images/heap-gc.gif b/docs/images/heap-gc.gif new file mode 100644 index 0000000..369601d Binary files /dev/null and b/docs/images/heap-gc.gif differ diff --git a/docs/images/heap-simple.png b/docs/images/heap-simple.png new file mode 100644 index 0000000..ea39359 Binary files /dev/null and b/docs/images/heap-simple.png differ diff --git a/docs/images/heap.png b/docs/images/heap.png new file mode 100644 index 0000000..38c62b5 Binary files /dev/null and b/docs/images/heap.png differ diff --git a/docs/images/profiler-venn.png b/docs/images/profiler-venn.png new file mode 100644 index 0000000..e1c9de7 Binary files /dev/null and b/docs/images/profiler-venn.png differ diff --git a/docs/images/scheduler-complete.png b/docs/images/scheduler-complete.png new file mode 100644 index 0000000..0eaaa4e Binary files /dev/null and b/docs/images/scheduler-complete.png differ diff --git a/docs/images/scheduler.1.png b/docs/images/scheduler.1.png new file mode 100644 index 0000000..8ee465c Binary files /dev/null and b/docs/images/scheduler.1.png differ diff --git a/docs/images/scheduler.2.png b/docs/images/scheduler.2.png new file mode 100644 index 0000000..d7af8c4 Binary files /dev/null and b/docs/images/scheduler.2.png differ diff --git a/docs/images/scheduler.3.png b/docs/images/scheduler.3.png new file mode 100644 index 0000000..496012e Binary files /dev/null and b/docs/images/scheduler.3.png differ diff --git a/docs/images/scheduler.4.png b/docs/images/scheduler.4.png new file mode 100644 index 0000000..1abfe0f Binary files /dev/null and b/docs/images/scheduler.4.png differ diff --git a/docs/images/scheduler.gif b/docs/images/scheduler.gif new file mode 100644 index 0000000..ca569ad Binary files /dev/null and b/docs/images/scheduler.gif differ diff --git a/docs/images/stack.gif b/docs/images/stack.gif new file mode 100644 index 0000000..4268352 Binary files /dev/null and b/docs/images/stack.gif differ diff --git a/docs/images/stack1.png b/docs/images/stack1.png new file mode 100644 index 0000000..fc735e7 Binary files /dev/null and b/docs/images/stack1.png differ diff --git a/docs/images/stack2.png b/docs/images/stack2.png new file mode 100644 index 0000000..c7f1bac Binary files /dev/null and b/docs/images/stack2.png differ diff --git a/docs/images/stack3.png b/docs/images/stack3.png new file mode 100644 index 0000000..2a02097 Binary files /dev/null and b/docs/images/stack3.png differ diff --git a/docs/images/time.1.png b/docs/images/time.1.png new file mode 100644 index 0000000..6a2e8f4 Binary files /dev/null and b/docs/images/time.1.png differ diff --git a/docs/images/time.2.png b/docs/images/time.2.png new file mode 100644 index 0000000..f8aae39 Binary files /dev/null and b/docs/images/time.2.png differ diff --git a/docs/images/time.3.png b/docs/images/time.3.png new file mode 100644 index 0000000..e175da7 Binary files /dev/null and b/docs/images/time.3.png differ diff --git a/docs/images/timeline.png b/docs/images/timeline.png new file mode 100644 index 0000000..43b15cd Binary files /dev/null and b/docs/images/timeline.png differ diff --git a/docs/index.rst b/docs/index.rst index 97d8928..1b5a430 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,15 +1,21 @@ go-profiler-notes ================= -Formerly known as "go-profiler-notes", this guide contains in-depth information -about Go profiling. In the future, additional topics such as runtime tracing, -metrics, heap debugging as well as userland metrics and distributed tracing will -be covered as well. +This project was started by `Felix Geisendörfer `_ after joining Datadog's `Continuous Profiler `_ team in early 2021. Initially it was just a loose collection of markdown files with notes on Go profiling. But over time and thanks to a few `contributors `_, it has developed into one of the most in-depth resources on Go profiling. + +In the future, additional topics such as runtime tracing, metrics, heap debugging as well as userland metrics and distributed tracing will be covered as well. + +Support this project by giving it a |:star:| on GitHub |ico1| + +.. |ico1| image:: https://img.shields.io/github/stars/DataDog/go-profiler-notes?style=social + :alt: Github Stars + :target: https://github.com/DataDog/go-profiler-notes .. toctree:: :hidden: profiling/index + mental-model-for-go/index .. toctree:: :maxdepth: 1 diff --git a/docs/mental-model-for-go/goroutine-scheduler.rst b/docs/mental-model-for-go/goroutine-scheduler.rst index 757f7fa..4fcb86c 100644 --- a/docs/mental-model-for-go/goroutine-scheduler.rst +++ b/docs/mental-model-for-go/goroutine-scheduler.rst @@ -1,2 +1,52 @@ Goroutine Scheduler =================== + +Let’s talk about the scheduler first using the example below: + +.. code:: go + + func main() { + res, err := http.Get("https://example.org/") + if err != nil { + panic(err) + } + fmt.Printf("%d\n", res.StatusCode) + } + +Here we have a single goroutine, let’s call it ``G1``, that runs the +``main`` function. The picture below shows a simplified timeline of how +this goroutine might execute on a single CPU. Initially ``G1`` is +running on the CPU to prepare the http request. Then the CPU becomes +idle as the goroutine has to wait for the network. And finally it gets +scheduled onto the CPU again to print out the status code. + +From the scheduler’s perspective, the program above executes like shown +below. At first ``G1`` is ``Executing`` on ``CPU 1``. Then the goroutine +is taken off the CPU while ``Waiting`` for the network. Once the +scheduler notices that the network has replied (using non-blocking I/O, +similar to Node.js), it marks the goroutine as ``Runnable``. And as soon +as a CPU core becomes available, the goroutine starts ``Executing`` +again. In our case all cores are available, so ``G1`` can go back to +``Executing`` the ``fmt.Printf()`` function on one of the CPUs +immediately without spending any time in the ``Runnable`` state. + +Most of the time, Go programs are running multiple goroutines, so you +will have a few goroutines ``Executing`` on some of the CPU cores, a +large number of goroutines ``Waiting`` for various reasons, and ideally +no goroutines ``Runnable`` unless your program exhibits very high CPU +load. An example of this can be seen below. + +Of course the model above glosses over many details. In reality it’s +turtles all the way down, and the Go scheduler works on top of threads +managed by the operating system, and even CPUs themselves are capable of +hyper-threading which can be seen as a form of scheduling. So if you’re +interested, feel free to continue down this rabbit hole via Ardan labs +series on `Scheduling in +Go `__ +or similar material. + +However, the model above should be sufficient to understand the +remainder of this guide. In particular it should become clear that the +time measured by the various Go profilers is essentially the time your +goroutines are spending in the ``Executing`` and ``Waiting`` states as +illustrated by the diagram below. diff --git a/docs/mental-model-for-go/index.rst b/docs/mental-model-for-go/index.rst index ab1d2d7..5454686 100644 --- a/docs/mental-model-for-go/index.rst +++ b/docs/mental-model-for-go/index.rst @@ -3,7 +3,6 @@ Mental Model for Go .. toctree:: :maxdepth: 2 - :caption: Contents: goroutine-scheduler garbage-collector diff --git a/docs/profiling/cpu-profiler.rst b/docs/profiling/cpu-profiler.rst index 02ff55e..0f462c8 100644 --- a/docs/profiling/cpu-profiler.rst +++ b/docs/profiling/cpu-profiler.rst @@ -4,7 +4,7 @@ CPU Profiler Go's CPU profiler can help you identify which parts of your code base consume a lot of CPU time. .. note:: - CPU time is different from the real time experienced by your users (aka latency). For example a typical http request might take ``100ms`` to complete, spending ``5ms`` of time On-CPU while waiting for ``95ms`` on a database. It’s also possible for a request to take ``100ms``, but spend ``200ms`` of CPU if two goroutines are performing CPU intensive work in parallel. If this is confusing to you, please refer to the `Goroutine Scheduler <#goroutine-scheduler>`__ section. + CPU time is different from the real time experienced by your users (aka latency). For example a typical http request might take ``100ms`` to complete, spending ``5ms`` of time On-CPU while waiting for ``95ms`` on a database. It’s also possible for a request to take ``100ms``, but spend ``200ms`` of CPU if two goroutines are performing CPU intensive work in parallel. If this is confusing to you, please read about the :doc:`/mental-model-for-go/goroutine-scheduler`. API --- @@ -13,20 +13,20 @@ You can control the CPU profiler via various APIs: #. Using the pprof `StartCPUProfile `_ and `StopCPUProfile `_ API. - .. literalinclude:: ./cpu-profiler-start-stop.go + .. literalinclude:: /code/cpu-profiler-start-stop.go :language: go :lines: 11-13 :dedent: 1 #. Passing the ``-cpuprofile`` option to ``go test``. - .. code:: bash + .. code-block:: bash go test -cpuprofile cpu.pprof #. Via the ``net/http/pprof`` package. - .. literalinclude:: ./cpu-profiler-net-http-pprof.go + .. literalinclude:: /code/cpu-profiler-net-http-pprof.go :language: go :lines: 3- @@ -38,7 +38,7 @@ You can control the CPU profiler via various APIs: #. Control the sampling rate using `SetCPUProfileRate `_. - .. code:: go + .. code-block:: go runtime.SetCPUProfileRate(200) // 200 Hz, default is 100 @@ -46,7 +46,7 @@ You can control the CPU profiler via various APIs: #. Use `SetCgoTraceback `_ to get stack traces for cgo code. - .. code:: go + .. code-block:: go import _ github.com/benesch/cgosymbolizer @@ -54,14 +54,14 @@ You can control the CPU profiler via various APIs: #. Record the individual CPU profile samples into a `runtime/trace `_ (since Go 1.19). - .. literalinclude:: ./cpu-profiler-trace.go + .. literalinclude:: /code/cpu-profiler-trace.go :language: go :lines: 15-20 :dedent: 1 See `CL 400795 `_ for more information. -#. (Commercial) Use a 3rd party library such as `dd-trace-go `__ for continuous profiling in production. +#. **Commercial:** Use a 3rd party library such as `dd-trace-go `__ for continuous profiling in production. Data ---- @@ -103,16 +103,25 @@ The resulting profile will include a new label column and might look something l ========================= ========== ============= =============== stack trace label samples/count cpu/nanoseconds ========================= ========== ============= =============== -main.backgroundWork user:bob 5 50000000 +main.backgroundWork user:bob 4 50000000 main.backgroundWork user:alice 2 20000000 main.work;main.directWork user:bob 4 40000000 -main.work;main.directWork user:alice 3 30000000 +main.work;main.directWork user:alice 5 30000000 ========================= ========== ============= =============== -Viewing the same profile with pprof’s Graph view will also include the labels: +Viewing the same profile with pprof’s Graph view will also include the labels as shown in :numref:`cpu-profiler-labels`. + +.. figure:: /images/cpu-profiler-labels.png + :name: cpu-profiler-labels + :width: 400 + :align: center + + pprof labels shown in the Graph view. How you use these labels is up to you. You might include things such as ``user ids``, ``request ids``, ``http endpoints``, ``subscription plan`` or other data that can allow you to get a better understanding of what types of requests are causing high CPU utilization, even when they are being processed by the same code paths. That being said, using labels will increase the size of your pprof files. So you should probably start with low cardinality labels such as endpoints before moving on to high cardinality labels once you feel confident that they don’t impact the performance of your application. +**Commercial**: Datadog supports `Connecting Go Profiling With Tracing `_ via pprof labels. + .. warning:: Go 1.17 and below contained several bugs that could cause some profiler labels to be missing from CPU profiles, see Limitations_ for more information. @@ -153,7 +162,7 @@ you might want to be aware of: - `CL 369741 `__: The first batch of samples in a CPU profile had an off-by-one error causing a misattribution of labels. - `CL 369983 `__: System goroutines created on behalf of user goroutines (e.g. for garbage collection) incorrectly inherited their parents labels. - |:warning:| You can call SetCPUProfileRate_ to adjust the CPU profiler rate before calling StartCPUProfile_. This will print a warning saying ``runtime: cannot set cpu profile rate until previous profile has finished``. However, it still works. This issue was `initially raised here `__, and there is an `accepted proposal for improving the API `__. -- |:warning:| The maximum number of nested function calls that can be captured in stack traces by the CPU profiler is currently ```64`` `__. If your program is using a lot of recursion or other patterns that lead to deep stack depths, your CPU profile will include stack traces that are truncated. This means you will miss parts of the call chain that led to the function that was active at the time the sample was taken. +- |:warning:| The maximum number of nested function calls that can be captured in stack traces by the CPU profiler is currently `64 `__. If your program is using a lot of recursion or other patterns that lead to deep stack depths, your CPU profile will include stack traces that are truncated. This means you will miss parts of the call chain that led to the function that was active at the time the sample was taken.