diff --git a/content/en/ch1.md b/content/en/ch1.md index 66ec388..638839f 100644 --- a/content/en/ch1.md +++ b/content/en/ch1.md @@ -9,7 +9,7 @@ breadcrumbs: false > > [Thomas Sowell](https://www.youtube.com/watch?v=2YUtKr8-_Fg), Interview with Fred Barnes (2005) -> [!TIP] A Note for Early Release Readers +> [!TIP] A NOTE FOR EARLY RELEASE READERS > With Early Release ebooks, you get books in their earliest form—the author’s raw and unedited content as they write—so you can take advantage of these technologies long before the official release of these titles. > > This will be the 1st chapter of the final book. The GitHub repo for this book is https://github.com/ept/ddia2-feedback. @@ -84,7 +84,7 @@ concepts, and explores their trade-offs: Moreover, this chapter will provide you with terminology that we will need for the rest of the book. -> [!TIP] Terminology: Frontends and Backends +> [!TIP] TERMINOLOGY: FRONTENDS AND BACKENDS Much of what we will discuss in this book relates to *backend development*. To explain that term: for web applications, the client-side code (which runs in a web browser) is called the *frontend*, diff --git a/content/en/ch10.md b/content/en/ch10.md index 1e9a2af..1bf67bd 100644 --- a/content/en/ch10.md +++ b/content/en/ch10.md @@ -1465,7 +1465,7 @@ consensus. -------- -> [!TIP] Managing configuration with coordination services +> [!TIP] MANAGING CONFIGURATION WITH COORDINATION SERVICES Applications and infrastructure often have configuration parameters such as timeouts, thread pool sizes, and so on. Coordination services are sometimes used to store such configuration data, @@ -1605,8 +1605,7 @@ Consensus algorithms are carefully designed to ensure that no committed writes a failover, and that the system cannot get into a split brain state in which multiple nodes are accepting writes. This requires that every write, and every linearizable read, is confirmed by a quorum (typically a majority) of nodes. This can be expensive, especially across geographic regions, -but it is unavoidable if you want the strong consistency and fault tolerance that consensus -provides. +but it is unavoidable if you want the strong consistency and fault tolerance that consensus provides. Coordination services like ZooKeeper and etcd are also built on top of consensus algorithms. They provide locks, leases, failure detection, and change notification features that are useful for diff --git a/content/en/ch2.md b/content/en/ch2.md index fdbb4b0..eb4f9db 100644 --- a/content/en/ch2.md +++ b/content/en/ch2.md @@ -173,7 +173,7 @@ handle, queueing delays increase sharply. -------- -> [!TIP] When an overloaded system won’t recover +> [!TIP] WHEN AN OVERLOADED SYSTEM WON'T RECOVER If a system is close to overload, with throughput pushed close to the limit, it can sometimes enter a vicious cycle where it becomes less efficient and hence even more overloaded. For example, if there @@ -282,7 +282,7 @@ control, and the benefits are diminishing. -------- -> [!TIP] The user impact of response times +> [!TIP] THE USER IMPACT OF RESPONSE TIMES It seems intuitively obvious that a fast service is better for users than a slow service [^20]. However, it is surprisingly difficult to get hold of reliable data to quantify the effect that @@ -331,7 +331,7 @@ practice, defining good availability metrics for SLOs and SLAs is not straightfo -------- -> [!TIP] Computing percentiles +> [!TIP] COMPUTING PERCENTILES If you want to add response time percentiles to the monitoring dashboards for your services, you need to efficiently calculate them on an ongoing basis. For example, you may want to keep a rolling @@ -557,7 +557,7 @@ work with it every day, and take steps to improve it based on this feedback [^71 -------- -> [!TIP] How Important Is Reliability? +> [!TIP] HOW IMPORTANT IS RELIABILITY? Reliability is not just for nuclear power stations and air traffic control—more mundane applications are also expected to work reliably. Bugs in business applications cause lost productivity (and legal diff --git a/content/en/ch3.md b/content/en/ch3.md index e0f4b7a..ac89e5b 100644 --- a/content/en/ch3.md +++ b/content/en/ch3.md @@ -42,7 +42,7 @@ you to work with these models. This comparison will help you decide when to use -------- -> [!TIP] Terminology: Declarative Query Languages +> [!TIP] TERMINOLOGY: DECLARATIVE QUERY LANGUAGES Many of the query languages in this chapter (such as SQL, Cypher, SPARQL, or Datalog) are *declarative*, which means that you specify the pattern of the data you want—what conditions the @@ -171,7 +171,7 @@ Another way of representing the same information, which is perhaps more natural closely to an object structure in application code, is as a JSON document as shown in [Example 3-1](/en/ch3#fig_obama_json). -{{< figure id="fig_obama_json" caption="Example 3-1. Representing a LinkedIn profile as a JSON document" class="w-full my-4" >}} +{{< figure id="fig_obama_json" title="Example 3-1. Representing a LinkedIn profile as a JSON document" class="w-full my-4" >}} ```json { @@ -398,7 +398,7 @@ possible representation is given in [Example 3-2](/en/ch3#fig_datamodels_m2m_js document, but the links to organizations and schools are best represented as references to other documents. -{{< figure id="fig_datamodels_m2m_json" caption="Example 3-2. A résumé that references organizations by ID." class="w-full my-4" >}} +{{< figure id="fig_datamodels_m2m_json" title="Example 3-2. A résumé that references organizations by ID." class="w-full my-4" >}} ```json { @@ -785,7 +785,7 @@ store the properties of each vertex or edge). The head and tail vertex are store you want the set of incoming or outgoing edges for a vertex, you can query the `edges` table by `head_vertex` or `tail_vertex`, respectively. -{{< figure id="fig_graph_sql_schema" caption="Example 3-3. Representing a property graph using a relational schema" class="w-full my-4" >}} +{{< figure id="fig_graph_sql_schema" title="Example 3-3. Representing a property graph using a relational schema" class="w-full my-4" >}} ```sql CREATE TABLE vertices ( @@ -864,7 +864,7 @@ only used internally within the query to create edges between the vertices, usin `(idaho) -[:WITHIN]-> (usa)` creates an edge labeled `WITHIN`, with `idaho` as the tail node and `usa` as the head node. -{{< figure id="fig_cypher_create" caption="Example 3-4. A subset of the data in [Figure 3-6](/en/ch3#fig_datamodels_graph), represented as a Cypher query" class="w-full my-4" >}} +{{< figure id="fig_cypher_create" title="Example 3-4. A subset of the data in [Figure 3-6](/en/ch3#fig_datamodels_graph), represented as a Cypher query" class="w-full my-4" >}} ``` CREATE @@ -887,7 +887,7 @@ property of each of those vertices. that are related by an edge labeled `BORN_IN`. The tail vertex of that edge is bound to the variable `person`, and the head vertex is left unnamed. -{{< figure id="fig_cypher_query" caption="Example 3-5. Cypher query to find people who emigrated from the US to Europe" class="w-full my-4" >}} +{{< figure id="fig_cypher_query" title="Example 3-5. Cypher query to find people who emigrated from the US to Europe" class="w-full my-4" >}} ``` MATCH @@ -945,7 +945,7 @@ something called *recursive common table expressions* (the `WITH RECURSIVE` synt to Europe—expressed in SQL using this technique. However, the syntax is very clumsy in comparison to Cypher. -{{< figure id="fig_graph_sql_query" caption="Example 3-6. The same query as [Example 3-5](/en/ch3#fig_cypher_query), written in SQL using recursive common table expressions" class="w-full my-4" >}} +{{< figure id="fig_graph_sql_query" title="Example 3-6. The same query as [Example 3-5](/en/ch3#fig_cypher_query), written in SQL using recursive common table expressions" class="w-full my-4" >}} ```sql WITH RECURSIVE @@ -1052,7 +1052,7 @@ The subject of a triple is equivalent to a vertex in a graph. The object is one [Example 3-7](/en/ch3#fig_graph_n3_triples) shows the same data as in [Example 3-4](/en/ch3#fig_cypher_create), written as triples in a format called *Turtle*, a subset of *Notation3* (*N3*) [^48]. -{{< figure id="fig_graph_n3_triples" caption="Example 3-7. A subset of the data in [Figure 3-6](/en/ch3#fig_datamodels_graph), represented as Turtle triples" class="w-full my-4" >}} +{{< figure id="fig_graph_n3_triples" title="Example 3-7. A subset of the data in [Figure 3-6](/en/ch3#fig_datamodels_graph), represented as Turtle triples" class="w-full my-4" >}} ``` @prefix : . @@ -1081,7 +1081,7 @@ It’s quite repetitive to repeat the same subject over and over again, but fort semicolons to say multiple things about the same subject. This makes the Turtle format quite readable: see [Example 3-8](/en/ch3#fig_graph_n3_shorthand). -{{< figure id="fig_graph_n3_shorthand" caption="Example 3-8. A more concise way of writing the data in [Example 3-7](/en/ch3#fig_graph_n3_triples)" class="w-full my-4" >}} +{{< figure id="fig_graph_n3_shorthand" title="Example 3-8. A more concise way of writing the data in [Example 3-7](/en/ch3#fig_graph_n3_triples)" class="w-full my-4" >}} ``` @prefix : . @@ -1093,7 +1093,7 @@ _:namerica a :Location; :name "North America"; :type "continent". -------- -> [!TIP] The Semantic Web +> [!TIP] THE SEMANTIC WEB Some of the research and development effort on triple stores was motivated by the *Semantic Web*, an early-2000s effort to facilitate internet-wide data exchange by publishing data not only as @@ -1116,7 +1116,7 @@ a data model that was designed for the Semantic Web. RDF data can also be encode example (more verbosely) in XML, as shown in [Example 3-9](/en/ch3#fig_graph_rdf_xml). Tools like Apache Jena can automatically convert between different RDF encodings. -{{< figure id="fig_graph_rdf_xml" caption="Example 3-9. The data of [Example 3-8](/en/ch3#fig_graph_n3_shorthand), expressed using RDF/XML syntax" class="w-full my-4" >}} +{{< figure id="fig_graph_rdf_xml" title="Example 3-9. The data of [Example 3-8](/en/ch3#fig_graph_n3_shorthand), expressed using RDF/XML syntax" class="w-full my-4" >}} ```xml }} +{{< figure id="fig_sparql_query" title="Example 3-10. The same query as [Example 3-5](/en/ch3#fig_cypher_query), expressed in SPARQL" class="w-full my-4" >}} ``` PREFIX : @@ -1227,7 +1227,7 @@ the second column contains `val2`, and so on. are represented as two-column join tables. For example, Lucy has the ID 100 and Idaho has the ID 3, so the relationship “Lucy was born in Idaho” is represented as `born_in(100, 3)`. -{{< figure id="fig_datalog_triples" caption="Example 3-11. A subset of the data in [Figure 3-6](/en/ch3#fig_datamodels_graph), represented as Datalog facts" class="w-full my-4" >}} +{{< figure id="fig_datalog_triples" title="Example 3-11. A subset of the data in [Figure 3-6](/en/ch3#fig_datamodels_graph), represented as Datalog facts" class="w-full my-4" >}} ``` location(1, "North America", "continent"). diff --git a/content/en/ch4.md b/content/en/ch4.md index 20091d8..f14503a 100644 --- a/content/en/ch4.md +++ b/content/en/ch4.md @@ -331,7 +331,7 @@ characteristics in more detail in [“Comparing B-Trees and LSM-Trees”](/en/ch -------- -> [!TIP] Embedded storage engines +> [!TIP] EMBEDDED STORAGE ENGINES Many databases run as a service that accepts queries over a network, but there are also *embedded* databases that don’t expose a network API. Instead, they are libraries that run in the same process @@ -518,7 +518,7 @@ state drives (SSDs) that most databases use today, the difference is smaller, bu -------- -> [!TIP] Sequential vs. Random Writes on SSDs +> [!TIP] SEQUENTIAL VS. RANDOM WRITES ON SSDS On spinning-disk hard drives (HDDs), sequential writes are much faster than random writes: a random write has to mechanically move the disk head to a new position and wait for the right part of the @@ -794,7 +794,7 @@ buying fruit or candy during the 2024 calendar year), but it only needs to acces the `fact_sales` table: `date_key`, `product_sk`, and `quantity`. The query ignores all other columns. -{{< figure id="fig_storage_analytics_query" caption="Example 4-1. Analyzing whether people are more inclined to buy fresh fruit or candy, depending on the day of the week" class="w-full my-4" >}} +{{< figure id="fig_storage_analytics_query" title="Example 4-1. Analyzing whether people are more inclined to buy fresh fruit or candy, depending on the day of the week" class="w-full my-4" >}} ```sql SELECT diff --git a/content/en/ch5.md b/content/en/ch5.md index 1019f0d..9842c05 100644 --- a/content/en/ch5.md +++ b/content/en/ch5.md @@ -239,7 +239,7 @@ particular, since they don’t prescribe a schema, they need to include all the the encoded data. That is, in a binary encoding of the JSON document in [Example 5-2](/en/ch5#fig_encoding_json), they will need to include the strings `userName`, `favoriteNumber`, and `interests` somewhere. -{{< figure id="fig_encoding_json" caption="Example 5-2. Example record which we will encode in several binary formats in this chapter" class="w-full my-4" >}} +{{< figure id="fig_encoding_json" title="Example 5-2. Example record which we will encode in several binary formats in this chapter" class="w-full my-4" >}} ```json { @@ -731,7 +731,7 @@ Developers typically write OpenAPI service definitions in JSON or YAML; see [Exa The service definition allows developers to define service endpoints, documentation, versions, data models, and much more. gRPC definitions look similar, but are defined using Protocol Buffers service definitions. -{{< figure id="fig_open_api_def" caption="Example 5-3. Example OpenAPI service definition in YAML" class="w-full my-4" >}} +{{< figure id="fig_open_api_def" title="Example 5-3. Example OpenAPI service definition in YAML" class="w-full my-4" >}} ```yaml openapi: 3.0.0 @@ -942,7 +942,7 @@ Business Process Execution Language (BPEL) [^44]. -------- -> [!TIP] Tasks, Activities, and Functions +> [!TIP] TASKS, ACTIVITIES, AND FUNCTIONS Different workflow engines use different names for tasks. Temporal, for example, uses the term *activity*. Others refer to tasks as *durable functions*. Though the names differ, the concepts are the same. diff --git a/content/en/ch7.md b/content/en/ch7.md index 2b437ba..555f589 100644 --- a/content/en/ch7.md +++ b/content/en/ch7.md @@ -378,7 +378,7 @@ have the same partition key, they will be in the same shard. -------- -> [!TIPS] PARTITIONING AND RANGE QUERIES IN DATA WAREHOUSES +> [!TIP] PARTITIONING AND RANGE QUERIES IN DATA WAREHOUSES Data warehouses such as BigQuery, Snowflake, and Delta Lake support a similar indexing approach, though the terminology differs. In BigQuery, for example, the partition key determines which diff --git a/content/en/ch8.md b/content/en/ch8.md index 28272ef..c0ae1a1 100644 --- a/content/en/ch8.md +++ b/content/en/ch8.md @@ -875,7 +875,7 @@ needs to ensure that a player’s move abides by the rules of the game, which in you cannot sensibly implement as a database query. Instead, you may use a lock to prevent two players from concurrently moving the same piece, as illustrated in [Example 8-1](/en/ch8#fig_transactions_select_for_update). -{{< figure id="fig_transactions_select_for_update" caption="Example 8-1. Explicitly locking rows to prevent lost updates" class="w-full my-4" >}} +{{< figure id="fig_transactions_select_for_update" title="Example 8-1. Explicitly locking rows to prevent lost updates" class="w-full my-4" >}} ```sql BEGIN TRANSACTION;