mirror of
https://github.com/Vonng/ddia.git
synced 2026-06-22 09:27:04 +08:00
fix note format
parent
d216e35c8e
commit
752c2f58c7
8 changed files with 146 additions and 175 deletions
|
|
@ -151,10 +151,9 @@ employee’s salary, etc. As databases expanded into areas that didn’t involve
|
||||||
the term *transaction* nevertheless stuck, referring to a group of reads and writes that form a
|
the term *transaction* nevertheless stuck, referring to a group of reads and writes that form a
|
||||||
logical unit.
|
logical unit.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> [Chapter 8](/en/ch8#ch_transactions) explores in detail what we mean by a transaction. This chapter uses the term
|
||||||
[Chapter 8](/en/ch8#ch_transactions) explores in detail what we mean by a transaction. This chapter uses the term
|
> loosely to refer to low-latency reads and writes.
|
||||||
loosely to refer to low-latency reads and writes.
|
|
||||||
|
|
||||||
Even though databases started being used for many different kinds of data—posts on social media,
|
Even though databases started being used for many different kinds of data—posts on social media,
|
||||||
moves in a game, contacts in an address book, and many others—the basic access pattern
|
moves in a game, contacts in an address book, and many others—the basic access pattern
|
||||||
|
|
@ -192,11 +191,10 @@ Table 1-1. Comparing characteristics of operational and analytic systems
|
||||||
| Data represents | Latest state of data (current point in time) | History of events that happened over time |
|
| Data represents | Latest state of data (current point in time) | History of events that happened over time |
|
||||||
| Dataset size | Gigabytes to terabytes | Terabytes to petabytes |
|
| Dataset size | Gigabytes to terabytes | Terabytes to petabytes |
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> The meaning of *online* in *OLAP* is unclear; it probably refers to the fact that queries are not
|
||||||
The meaning of *online* in *OLAP* is unclear; it probably refers to the fact that queries are not
|
> just for predefined reports, but that analysts use the OLAP system interactively for explorative
|
||||||
just for predefined reports, but that analysts use the OLAP system interactively for explorative
|
> queries.
|
||||||
queries.
|
|
||||||
|
|
||||||
With operational systems, users are generally not allowed to construct custom SQL queries and run
|
With operational systems, users are generally not allowed to construct custom SQL queries and run
|
||||||
them on the database, since that would potentially allow them to read or modify data that they do
|
them on the database, since that would potentially allow them to read or modify data that they do
|
||||||
|
|
|
||||||
|
|
@ -299,12 +299,10 @@ election correctly (see for example the fencing issue in [“Distributed Locks a
|
||||||
libraries like Apache Curator help by providing higher-level recipes on top of ZooKeeper. However, a
|
libraries like Apache Curator help by providing higher-level recipes on top of ZooKeeper. However, a
|
||||||
linearizable storage service is the basic foundation for these coordination tasks.
|
linearizable storage service is the basic foundation for these coordination tasks.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]> Strictly speaking, ZooKeeper provides linearizable writes, but reads may be stale, since there is no
|
||||||
|
> guarantee that they are served from the current leader
|
||||||
Strictly speaking, ZooKeeper provides linearizable writes, but reads may be stale, since there is no
|
> [^18].
|
||||||
guarantee that they are served from the current leader
|
> etcd since version 3 provides linearizable reads by default.
|
||||||
[^18].
|
|
||||||
etcd since version 3 provides linearizable reads by default.
|
|
||||||
|
|
||||||
Distributed locking is also used at a much more granular level in some distributed databases, such as
|
Distributed locking is also used at a much more granular level in some distributed databases, such as
|
||||||
Oracle Real Application Clusters (RAC)
|
Oracle Real Application Clusters (RAC)
|
||||||
|
|
@ -1198,14 +1196,13 @@ Validity
|
||||||
: If a node reads a log entry containing some value, then some node previously requested for that
|
: If a node reads a log entry containing some value, then some node previously requested for that
|
||||||
value to be added to the log.
|
value to be added to the log.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> A shared log is formally known as a *total order broadcast*, *atomic broadcast*, or *total order
|
||||||
A shared log is formally known as a *total order broadcast*, *atomic broadcast*, or *total order
|
> multicast* protocol [[26](/en/ch10#Cachin2011),
|
||||||
multicast* protocol [[26](/en/ch10#Cachin2011),
|
> [76](/en/ch10#Defago2004),
|
||||||
[76](/en/ch10#Defago2004),
|
> [77](/en/ch10#Attiya2004)].
|
||||||
[77](/en/ch10#Attiya2004)].
|
> It’s the same thing described in different words: requesting a value to be added to the log is then
|
||||||
It’s the same thing described in different words: requesting a value to be added to the log is then
|
> called “broadcasting” it, and reading a log entry is called “delivering” it.
|
||||||
called “broadcasting” it, and reading a log entry is called “delivering” it.
|
|
||||||
|
|
||||||
If you have an implementation of a shared log, it is easy to solve the consensus problem: every node
|
If you have an implementation of a shared log, it is easy to solve the consensus problem: every node
|
||||||
that wants to propose a value requests for it to be added to the log, and whichever value is read
|
that wants to propose a value requests for it to be added to the log, and whichever value is read
|
||||||
|
|
@ -1356,12 +1353,11 @@ then the transactions will be serializable
|
||||||
[[81](/en/ch10#Thomson2012),
|
[[81](/en/ch10#Thomson2012),
|
||||||
[82](/en/ch10#Balakrishnan2013)].
|
[82](/en/ch10#Balakrishnan2013)].
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> Sharded databases with a strong consistency model often maintain a separate log per shard, which
|
||||||
Sharded databases with a strong consistency model often maintain a separate log per shard, which
|
> improves scalability, but limits the consistency guarantees (e.g., consistent snapshots, foreign key
|
||||||
improves scalability, but limits the consistency guarantees (e.g., consistent snapshots, foreign key
|
> references) they can offer across shards. Serializable transactions across shards are possible, but
|
||||||
references) they can offer across shards. Serializable transactions across shards are possible, but
|
> require additional coordination [^83].
|
||||||
require additional coordination [^83].
|
|
||||||
|
|
||||||
A shared log is also powerful because it can easily be adapted to other forms of consensus:
|
A shared log is also powerful because it can easily be adapted to other forms of consensus:
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -110,13 +110,12 @@ translation layer is required between the objects in the application code and th
|
||||||
tables, rows, and columns. The disconnect between the models is sometimes called an *impedance
|
tables, rows, and columns. The disconnect between the models is sometimes called an *impedance
|
||||||
mismatch*.
|
mismatch*.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> The term *impedance mismatch* is borrowed from electronics. Every electric circuit has a certain
|
||||||
The term *impedance mismatch* is borrowed from electronics. Every electric circuit has a certain
|
> impedance (resistance to alternating current) on its inputs and outputs. When you connect one
|
||||||
impedance (resistance to alternating current) on its inputs and outputs. When you connect one
|
> circuit’s output to another one’s input, the power transfer across the connection is maximized if
|
||||||
circuit’s output to another one’s input, the power transfer across the connection is maximized if
|
> the output and input impedances of the two circuits match. An impedance mismatch can lead to signal
|
||||||
the output and input impedances of the two circuits match. An impedance mismatch can lead to signal
|
> reflections and other troubles.
|
||||||
reflections and other troubles.
|
|
||||||
|
|
||||||
### Object-relational mapping (ORM)
|
### Object-relational mapping (ORM)
|
||||||
|
|
||||||
|
|
@ -225,15 +224,14 @@ structure explicit (see [Figure 3-2](/en/ch3#fig_json_tree)).
|
||||||
|
|
||||||
###### Figure 3-2. One-to-many relationships forming a tree structure.
|
###### Figure 3-2. One-to-many relationships forming a tree structure.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> This type of relationship is sometimes called *one-to-few* rather than *one-to-many*, since a résumé
|
||||||
This type of relationship is sometimes called *one-to-few* rather than *one-to-many*, since a résumé
|
> typically has a small number of positions
|
||||||
typically has a small number of positions
|
> [[9](/en/ch3#Zola2014),
|
||||||
[[9](/en/ch3#Zola2014),
|
> [10](/en/ch3#Andrews2023)].
|
||||||
[10](/en/ch3#Andrews2023)].
|
> In situations where there may be a genuinely large number of related items—say, comments on a
|
||||||
In situations where there may be a genuinely large number of related items—say, comments on a
|
> celebrity’s social media post, of which there could be many thousands—embedding them all in the same
|
||||||
celebrity’s social media post, of which there could be many thousands—embedding them all in the same
|
> document may be too unwieldy, so the relational approach in [Figure 3-1](/en/ch3#fig_obama_relational) is preferable.
|
||||||
document may be too unwieldy, so the relational approach in [Figure 3-1](/en/ch3#fig_obama_relational) is preferable.
|
|
||||||
|
|
||||||
## Normalization, Denormalization, and Joins
|
## Normalization, Denormalization, and Joins
|
||||||
|
|
||||||
|
|
@ -727,14 +725,13 @@ databases need relational-style references to other documents, and many relation
|
||||||
sections where schema flexibility is beneficial. Relational-document hybrids are a powerful
|
sections where schema flexibility is beneficial. Relational-document hybrids are a powerful
|
||||||
combination.
|
combination.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> Codd’s original description of the relational model
|
||||||
Codd’s original description of the relational model
|
> [^3] actually allowed something similar to JSON
|
||||||
[^3] actually allowed something similar to JSON
|
> within a relational schema. He called it *nonsimple domains*. The idea was that a value in a row
|
||||||
within a relational schema. He called it *nonsimple domains*. The idea was that a value in a row
|
> doesn’t have to just be a primitive datatype like a number or a string, but it could also be a
|
||||||
doesn’t have to just be a primitive datatype like a number or a string, but it could also be a
|
> nested relation (table)—so you can have an arbitrarily nested tree structure as a value, much like
|
||||||
nested relation (table)—so you can have an arbitrarily nested tree structure as a value, much like
|
> the JSON or XML support that was added to SQL over 30 years later.
|
||||||
the JSON or XML support that was added to SQL over 30 years later.
|
|
||||||
|
|
||||||
# Graph-Like Data Models
|
# Graph-Like Data Models
|
||||||
|
|
||||||
|
|
@ -874,13 +871,12 @@ The edges table is like the many-to-many associative table/join table we saw in
|
||||||
stored in the same table. There may also be indexes on the labels and the properties, allowing
|
stored in the same table. There may also be indexes on the labels and the properties, allowing
|
||||||
vertices or edges with certain properties to be found efficiently.
|
vertices or edges with certain properties to be found efficiently.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> A limitation of graph models is that an edge can only associate two vertices with each other,
|
||||||
A limitation of graph models is that an edge can only associate two vertices with each other,
|
> whereas a relational join table can represent three-way or even higher-degree relationships by
|
||||||
whereas a relational join table can represent three-way or even higher-degree relationships by
|
> having multiple foreign key references on a single row. Such relationships can be represented in a
|
||||||
having multiple foreign key references on a single row. Such relationships can be represented in a
|
> graph by creating an additional vertex corresponding to each row of the join table, and edges
|
||||||
graph by creating an additional vertex corresponding to each row of the join table, and edges
|
> to/from that vertex, or by using a *hypergraph*.
|
||||||
to/from that vertex, or by using a *hypergraph*.
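The workaround described in this note, turning each row of a join table into its own vertex, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `vertices` and `edges` structures, the helper names, and the supplier/part/project data are all hypothetical and do not come from the book.

```python
# Hypothetical in-memory property graph: vertices keyed by ID,
# edges stored as (tail_vertex, label, head_vertex) tuples.
vertices = {}
edges = []

def add_vertex(vertex_id, **properties):
    vertices[vertex_id] = properties

def add_edge(tail, label, head):
    # An edge connects exactly two vertices.
    edges.append((tail, label, head))

# Three entities that take part in one three-way relationship.
add_vertex("supplier:1", name="Acme")
add_vertex("part:7", name="Bolt")
add_vertex("project:3", name="Bridge")

# The relationship itself becomes an extra vertex (one per join-table row),
# with one edge to each participant.
add_vertex("supply:1", quantity=100)
add_edge("supply:1", "supplier", "supplier:1")
add_edge("supply:1", "part", "part:7")
add_edge("supply:1", "project", "project:3")

def participants(row_vertex):
    """Return the vertices linked from a relationship vertex, keyed by edge label."""
    return {label: head for tail, label, head in edges if tail == row_vertex}

linked = participants("supply:1")
```

Properties of the relationship (here `quantity`) live on the extra vertex, much as they would live in additional columns of the join-table row.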
|
|
||||||
|
|
||||||
Those features give graphs a great deal of flexibility for data modeling, as illustrated in
|
Those features give graphs a great deal of flexibility for data modeling, as illustrated in
|
||||||
[Figure 3-6](/en/ch3#fig_datamodels_graph). The figure shows a few things that would be difficult to express in a
|
[Figure 3-6](/en/ch3#fig_datamodels_graph). The figure shows a few things that would be difficult to express in a
|
||||||
|
|
@ -1103,15 +1099,14 @@ The subject of a triple is equivalent to a vertex in a graph. The object is one
|
||||||
(*lucy*, *marriedTo*, *alain*) the subject and object *lucy* and *alain* are both vertices, and
|
(*lucy*, *marriedTo*, *alain*) the subject and object *lucy* and *alain* are both vertices, and
|
||||||
the predicate *marriedTo* is the label of the edge that connects them.
|
the predicate *marriedTo* is the label of the edge that connects them.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> To be precise, databases that offer a triple-like data model often need to store some additional
|
||||||
To be precise, databases that offer a triple-like data model often need to store some additional
|
> metadata on each tuple. For example, AWS Neptune uses quads (4-tuples) by adding a graph ID to each
|
||||||
metadata on each tuple. For example, AWS Neptune uses quads (4-tuples) by adding a graph ID to each
|
> triple [^46];
|
||||||
triple [^46];
|
> Datomic uses 5-tuples, extending each triple with a transaction ID and a boolean to indicate
|
||||||
Datomic uses 5-tuples, extending each triple with a transaction ID and a boolean to indicate
|
> deletion [^47].
|
||||||
deletion [^47].
|
> Since these databases retain the basic *subject-predicate-object* structure explained above, this
|
||||||
Since these databases retain the basic *subject-predicate-object* structure explained above, this
|
> book nevertheless calls them triple-stores.
|
||||||
book nevertheless calls them triple-stores.
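As a rough illustration of the tuple shapes this note mentions, the sketch below models a plain triple, a quad with a graph ID, and a 5-tuple with a transaction ID and a deletion flag. The class and field names are hypothetical and chosen for readability; they are not the actual storage formats or APIs of AWS Neptune or Datomic.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str

class Quad(NamedTuple):
    # A triple plus the ID of the graph it belongs to.
    subject: str
    predicate: str
    object: str
    graph_id: str

class FiveTuple(NamedTuple):
    # A triple plus a transaction ID and a flag marking deletion.
    subject: str
    predicate: str
    object: str
    tx_id: int
    deleted: bool

def core_triple(record):
    """Every variant keeps the subject-predicate-object structure in front."""
    return Triple(*record[:3])

married = Triple("lucy", "marriedTo", "alain")
as_quad = Quad("lucy", "marriedTo", "alain", graph_id="g1")
as_five = FiveTuple("lucy", "marriedTo", "alain", tx_id=1001, deleted=False)
```

Stripping the extra fields from either extended form yields the same triple, which is why such databases can still be treated as triple-stores.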
|
|
||||||
|
|
||||||
[Example 3-7](/en/ch3#fig_graph_n3_triples) shows the same data as in [Example 3-4](/en/ch3#fig_cypher_create), written as
|
[Example 3-7](/en/ch3#fig_graph_n3_triples) shows the same data as in [Example 3-4](/en/ch3#fig_cypher_create), written as
|
||||||
triples in a format called *Turtle*, a subset of *Notation3* (*N3*)
|
triples in a format called *Turtle*, a subset of *Notation3* (*N3*)
|
||||||
|
|
|
||||||
|
|
@ -97,12 +97,11 @@ forever, and handling partially written records when recovering from a crash), b
|
||||||
principle is the same. Logs are incredibly useful, and we will encounter them several times in this
|
principle is the same. Logs are incredibly useful, and we will encounter them several times in this
|
||||||
book.
|
book.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> The word *log* is often used to refer to application logs, where an application outputs text that
|
||||||
The word *log* is often used to refer to application logs, where an application outputs text that
|
> describes what’s happening. In this book, *log* is used in the more general sense: an append-only
|
||||||
describes what’s happening. In this book, *log* is used in the more general sense: an append-only
|
> sequence of records on disk. It doesn’t have to be human-readable; it might be binary and intended
|
||||||
sequence of records on disk. It doesn’t have to be human-readable; it might be binary and intended
|
> only for internal use by the database system.
|
||||||
only for internal use by the database system.
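To make the append-only log idea concrete, here is a minimal Python sketch. It is an assumed analogue of the `db_set` and `db_get` functions that the surrounding text refers to, not a copy of them: the file layout (one `key,value` record per line) and the function signatures are assumptions made for this illustration.

```python
import os
import tempfile

def db_set(path, key, value):
    # Writes never modify existing data: each call appends one record.
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{key},{value}\n")

def db_get(path, key):
    # Reads scan the whole file; the last record for a key wins.
    result = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition(",")
            if k == key:
                result = v
    return result

path = os.path.join(tempfile.mkdtemp(), "database.log")
db_set(path, "42", "San Francisco")
db_set(path, "42", "Berkeley")  # the old record stays in the log
latest = db_get(path, "42")
```

In this sketch keys must not contain commas or newlines, and every lookup costs time proportional to the size of the log, which is the performance problem the following paragraph describes.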
|
|
||||||
|
|
||||||
On the other hand, the `db_get` function has terrible performance if you have a large number of
|
On the other hand, the `db_get` function has terrible performance if you have a large number of
|
||||||
records in your database. Every time you want to look up a key, `db_get` has to scan the entire
|
records in your database. Every time you want to look up a key, `db_get` has to scan the entire
|
||||||
|
|
@ -889,15 +888,14 @@ If each column is stored separately, a query only needs to read and parse those
|
||||||
used in that query, which can save a lot of work. [Figure 4-7](/en/ch4#fig_column_store) shows this principle using
|
used in that query, which can save a lot of work. [Figure 4-7](/en/ch4#fig_column_store) shows this principle using
|
||||||
an expanded version of the fact table from [Figure 3-5](/en/ch3#fig_dwh_schema).
|
an expanded version of the fact table from [Figure 3-5](/en/ch3#fig_dwh_schema).
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> Column storage is easiest to understand in a relational data model, but it applies equally to
|
||||||
Column storage is easiest to understand in a relational data model, but it applies equally to
|
> nonrelational data. For example, Parquet
|
||||||
nonrelational data. For example, Parquet
|
> [^57]
|
||||||
[^57]
|
> is a columnar storage format that supports a document data model, based on Google’s Dremel
|
||||||
is a columnar storage format that supports a document data model, based on Google’s Dremel
|
> [^58],
|
||||||
[^58],
|
> using a technique known as *shredding* or *striping*
|
||||||
using a technique known as *shredding* or *striping*
|
> [^59].
|
||||||
[^59].
|
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
|
|
@ -985,13 +983,12 @@ are followed by user *X* and who also follow user *Y*
|
||||||
There are also various other compression schemes for columnar databases, which you can find in the
|
There are also various other compression schemes for columnar databases, which you can find in the
|
||||||
references [^75].
|
references [^75].
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> Don’t confuse column-oriented databases with the *wide-column* (also known as *column-family*) data
|
||||||
Don’t confuse column-oriented databases with the *wide-column* (also known as *column-family*) data
|
> model, in which a row can have thousands of columns, and there is no need for all the rows to have
|
||||||
model, in which a row can have thousands of columns, and there is no need for all the rows to have
|
> the same columns [^9]. Despite the similarity
|
||||||
the same columns [^9]. Despite the similarity
|
> in name, wide-column databases are row-oriented, since they store all values from a row together.
|
||||||
in name, wide-column databases are row-oriented, since they store all values from a row together.
|
> Google’s Bigtable, Apache Accumulo, and HBase are examples of the wide-column model.
|
||||||
Google’s Bigtable, Apache Accumulo, and HBase are examples of the wide-column model.
|
|
||||||
|
|
||||||
### Sort Order in Column Storage
|
### Sort Order in Column Storage
|
||||||
|
|
||||||
|
|
@ -1293,12 +1290,11 @@ location along one dimension’s axis. Embedding models generate vector embeddin
|
||||||
each other (in this multi-dimensional space) when the embedding’s input documents are semantically
|
each other (in this multi-dimensional space) when the embedding’s input documents are semantically
|
||||||
similar.
|
similar.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> We saw the term *vectorized processing* in [“Query Execution: Compilation and Vectorization”](/en/ch4#sec_storage_vectorized).
|
||||||
We saw the term *vectorized processing* in [“Query Execution: Compilation and Vectorization”](/en/ch4#sec_storage_vectorized).
|
> Vectors in semantic search have a different meaning. In vectorized processing, the vector refers to
|
||||||
Vectors in semantic search have a different meaning. In vectorized processing, the vector refers to
|
> a batch of bits that can be processed with specially optimized code. In embedding models, vectors are lists of
|
||||||
a batch of bits that can be processed with specially optimized code. In embedding models, vectors are lists of
|
> floating point numbers that represent a location in multi-dimensional space.
|
||||||
floating point numbers that represent a location in multi-dimensional space.
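The sense in which embedding vectors are "near" each other can be sketched with a plain distance calculation. Only the agriculture vector `[0.1, 0.22, 0.11]` appears in the surrounding text; the other two vectors below are hypothetical values invented for this example, and Euclidean distance is just one of several measures a real system might use.

```python
import math

# From the text: a three-dimensional embedding for a page about agriculture.
agriculture = [0.1, 0.22, 0.11]

# Hypothetical embeddings, chosen only to illustrate near versus far.
vegetables = [0.13, 0.19, 0.10]
unrelated = [0.90, 0.05, 0.70]

def distance(a, b):
    """Euclidean distance between two points in the embedding space."""
    return math.dist(a, b)

near = distance(agriculture, vegetables)
far = distance(agriculture, unrelated)
closest = min([vegetables, unrelated], key=lambda v: distance(agriculture, v))
```

Semantic search over embeddings amounts to finding the stored vectors with the smallest distance to the query vector, which is what `closest` does here by brute force.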
|
|
||||||
|
|
||||||
For example, a three-dimensional vector embedding for a Wikipedia page about agriculture might be
|
For example, a three-dimensional vector embedding for a Wikipedia page about agriculture might be
|
||||||
[0.1, 0.22, 0.11]. A Wikipedia page about vegetables would be quite near, perhaps with an embedding
|
[0.1, 0.22, 0.11]. A Wikipedia page about vegetables would be quite near, perhaps with an embedding
|
||||||
|
|
|
||||||
|
|
@ -1086,10 +1086,9 @@ you have to remember to use them. In some cases, such as with Temporal’s workf
|
||||||
frameworks provide static analysis tools to determine if nondeterministic behavior has been
|
frameworks provide static analysis tools to determine if nondeterministic behavior has been
|
||||||
introduced.
|
introduced.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> Making code deterministic is a powerful idea, but tricky to do robustly. In
|
||||||
Making code deterministic is a powerful idea, but tricky to do robustly. In
|
> [“The Power of Determinism”](/en/ch9#sidebar_distributed_determinism) we will return to this topic.
|
||||||
[“The Power of Determinism”](/en/ch9#sidebar_distributed_determinism) we will return to this topic.
|
|
||||||
|
|
||||||
## Event-Driven Architectures
|
## Event-Driven Architectures
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -108,11 +108,10 @@ etcd, and RabbitMQ quorum queues (among others), are also based on a single lead
|
||||||
automatically elect a new leader if the old one fails (we will discuss consensus in more detail in
|
automatically elect a new leader if the old one fails (we will discuss consensus in more detail in
|
||||||
[Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency)).
|
[Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency)).
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> In older documents you may see the term *master–slave replication*. It means the same as
|
||||||
In older documents you may see the term *master–slave replication*. It means the same as
|
> leader-based replication, but the term should be avoided as it is widely considered offensive
|
||||||
leader-based replication, but the term should be avoided as it is widely considered offensive
|
> [^8].
|
||||||
[^8].
|
|
||||||
|
|
||||||
## Synchronous Versus Asynchronous Replication
|
## Synchronous Versus Asynchronous Replication
|
||||||
|
|
||||||
|
|
@ -354,11 +353,10 @@ Failover is fraught with things that can go wrong:
|
||||||
is already struggling with high load or network problems, an unnecessary failover is likely to
|
is already struggling with high load or network problems, an unnecessary failover is likely to
|
||||||
make the situation worse, not better.
|
make the situation worse, not better.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> Guarding against split brain by limiting or shutting down old leaders is known as *fencing* or, more
|
||||||
Guarding against split brain by limiting or shutting down old leaders is known as *fencing* or, more
|
> emphatically, *Shoot The Other Node In The Head* (STONITH). We will discuss fencing in more detail
|
||||||
emphatically, *Shoot The Other Node In The Head* (STONITH). We will discuss fencing in more detail
|
> in [“Distributed Locks and Leases”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#sec_distributed_lock_fencing).
|
||||||
in [“Distributed Locks and Leases”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#sec_distributed_lock_fencing).
|
|
||||||
|
|
||||||
There are no easy solutions to these problems. For this reason, some operations teams prefer to
|
There are no easy solutions to these problems. For this reason, some operations teams prefer to
|
||||||
perform failovers manually, even if the software supports automatic failover.
|
perform failovers manually, even if the software supports automatic failover.
|
||||||
|
|
@ -502,15 +500,14 @@ just a temporary state—if you stop writing to the database and wait a while, t
|
||||||
eventually catch up and become consistent with the leader. For that reason, this effect is known
|
eventually catch up and become consistent with the leader. For that reason, this effect is known
|
||||||
as *eventual consistency* [^22].
|
as *eventual consistency* [^22].
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> The term *eventual consistency* was coined by Douglas Terry et al.
|
||||||
The term *eventual consistency* was coined by Douglas Terry et al.
|
> [^23],
|
||||||
[^23],
|
> popularized by Werner Vogels
|
||||||
popularized by Werner Vogels
|
> [^24],
|
||||||
[^24],
|
> and became the battle cry of many NoSQL projects. However, not only NoSQL databases are eventually
|
||||||
and became the battle cry of many NoSQL projects. However, not only NoSQL databases are eventually
|
> consistent: followers in an asynchronously replicated relational database have the same
|
||||||
consistent: followers in an asynchronously replicated relational database have the same
|
> characteristics.
|
||||||
characteristics.
|
|
||||||
|
|
||||||
The term “eventually” is deliberately vague: in general, there is no limit to how far a replica can
|
The term “eventually” is deliberately vague: in general, there is no limit to how far a replica can
|
||||||
fall behind. In normal operation, the delay between a write happening on the leader and being
|
fall behind. In normal operation, the delay between a write happening on the leader and being
|
||||||
|
|
@ -846,10 +843,9 @@ and forwards those writes (plus any writes of its own) to one other node. Anothe
|
||||||
has the shape of a *star*: one designated root node forwards writes to all of the other nodes. The
|
has the shape of a *star*: one designated root node forwards writes to all of the other nodes. The
|
||||||
star topology can be generalized to a tree.
|
star topology can be generalized to a tree.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> Don’t confuse a star-shaped network topology with a *star schema* (see
|
||||||
Don’t confuse a star-shaped network topology with a *star schema* (see
|
> [“Stars and Snowflakes: Schemas for Analytics”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch03.html#sec_datamodels_analytics)), which describes the structure of a data model.
|
||||||
[“Stars and Snowflakes: Schemas for Analytics”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch03.html#sec_datamodels_analytics)), which describes the structure of a data model.
|
|
||||||
|
|
||||||
In circular and star topologies, a write may need to pass through several nodes before it reaches
|
In circular and star topologies, a write may need to pass through several nodes before it reaches
|
||||||
all replicas. Therefore, nodes need to forward data changes they receive from other nodes. To
|
all replicas. Therefore, nodes need to forward data changes they receive from other nodes. To
|
||||||
|
|
@ -1020,13 +1016,12 @@ This problem does not occur in a single-leader database.
|
||||||
|
|
||||||
###### Figure 6-9. A write conflict caused by two leaders concurrently updating the same record.
|
###### Figure 6-9. A write conflict caused by two leaders concurrently updating the same record.
|
||||||
|
|
||||||
###### Note
|
> [!NOTE]
|
||||||
|
> We say that the two writes in [Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict) are *concurrent* because neither
|
||||||
We say that the two writes in [Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict) are *concurrent* because neither
|
> was “aware” of the other at the time the write was originally made. It doesn’t matter whether the
|
||||||
was “aware” of the other at the time the write was originally made. It doesn’t matter whether the
|
> writes literally happened at the same time; indeed, if the writes were made while offline, they
|
||||||
writes literally happened at the same time; indeed, if the writes were made while offline, they
|
> might have actually happened some time apart. What matters is whether one write occurred in a state
|
||||||
might have actually happened some time apart. What matters is whether one write occurred in a state
|
> where the other write has already taken effect.
|
||||||
where the other write has already taken effect.
|
|
||||||
|
|
||||||
In [“Detecting Concurrent Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_concurrent) we will tackle the question of how a database can determine
|
In [“Detecting Concurrent Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_concurrent) we will tackle the question of how a database can determine
|
||||||
whether two writes are concurrent. For now we will assume that we can detect conflicts, and we want
|
whether two writes are concurrent. For now we will assume that we can detect conflicts, and we want
|
||||||
|
|
@@ -1258,13 +1253,12 @@ a fashionable architecture for databases after Amazon used it for its in-house *
 Riak, Cassandra, and ScyllaDB are open source datastores with leaderless replication models inspired
 by Dynamo, so this kind of database is also known as *Dynamo-style*.
 
-###### Note
-
-The original *Dynamo* system was only described in a paper
-[^45], but never released outside of
-Amazon. The similarly-named *DynamoDB* is a more recent cloud database from AWS, but it has a
-completely different architecture: it uses single-leader replication based on the Multi-Paxos
-consensus algorithm [^5].
+> [!NOTE]
+> The original *Dynamo* system was only described in a paper
+> [^45], but never released outside of
+> Amazon. The similarly-named *DynamoDB* is a more recent cloud database from AWS, but it has a
+> completely different architecture: it uses single-leader replication based on the Multi-Paxos
+> consensus algorithm [^5].
 
 In some leaderless implementations, the client directly sends its writes to several replicas, while
 in others, a coordinator node does this on behalf of the client. However, unlike a leader database,
@@ -1357,11 +1351,10 @@ For example, a workload with few writes and many reads may benefit from setting
 *r* = 1. This makes reads faster, but has the disadvantage that just one failed node causes all
 database writes to fail.
 
-###### Note
-
-There may be more than *n* nodes in the cluster, but any given value is stored only on *n*
-nodes. This allows the dataset to be sharded, supporting datasets that are larger than you can fit
-on one node. We will return to sharding in [Chapter 7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#ch_sharding).
+> [!NOTE]
+> There may be more than *n* nodes in the cluster, but any given value is stored only on *n*
+> nodes. This allows the dataset to be sharded, supporting datasets that are larger than you can fit
+> on one node. We will return to sharding in [Chapter 7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#ch_sharding).
 
 The quorum condition, *w* + *r* > *n*, allows the system to tolerate unavailable nodes
 as follows:
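As a rough illustration of the quorum condition quoted in the hunk above, the following sketch checks *w* + *r* > *n* and reports how many unavailable nodes writes and reads can tolerate. All function names are hypothetical and not part of any database API. It assumes a write needs *w* acknowledgements and a read needs *r* responses out of *n* replicas.

```python
from itertools import combinations


def quorum_holds(n, w, r):
    """Return True if the quorum condition w + r > n is satisfied."""
    return w + r > n


def tolerated_unavailable(n, w, r):
    """Number of replicas that may be down while writes/reads still succeed."""
    return {"writes": n - w, "reads": n - r}


def every_read_overlaps_write(n, w, r):
    """Brute-force check: does every set of r replicas share at least one
    replica with every set of w replicas? (Only practical for small n.)"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Assumed example values: n = 3 replicas.
balanced = tolerated_unavailable(3, 2, 2)    # one node may be down for either operation
read_heavy = tolerated_unavailable(3, 3, 1)  # w = n: a single failed node blocks writes
```

With *w* = *n* = 3 and *r* = 1, the sketch reports zero tolerated failures for writes, which matches the disadvantage described in the context lines of the hunk.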
@@ -384,11 +384,10 @@ Similarly popular is a *conditional write* operation, which allows a write to ha
 has not been concurrently changed by someone else (see [“Conditional writes (compare-and-set)”](/en/ch8#sec_transactions_compare_and_set)),
 similarly to a compare-and-set or compare-and-swap (CAS) operation in shared-memory concurrency.
 
-###### Note
-
-Strictly speaking, the term *atomic increment* uses the word *atomic* in the sense of multi-threaded
-programming. In the context of ACID, it should actually be called an *isolated* or *serializable*
-increment, but that’s not the usual term.
+> [!NOTE]
+> Strictly speaking, the term *atomic increment* uses the word *atomic* in the sense of multi-threaded
+> programming. In the context of ACID, it should actually be called an *isolated* or *serializable*
+> increment, but that’s not the usual term.
 
 These single-object operations are useful, as they can prevent lost updates when several clients try
 to write to the same object concurrently (see [“Preventing Lost Updates”](/en/ch8#sec_transactions_lost_update)). However, they are
@@ -510,12 +509,11 @@ financial data!”—but that misses the point. Even many popular relational dat
 are usually considered “ACID”) use weak isolation, so they wouldn’t necessarily have prevented these
 bugs from occurring.
 
-###### Note
-
-Incidentally, much of the banking system relies on text files that are exchanged via secure FTP
-[^35].
-In this context, having an audit trail and some human-level fraud prevention measures is actually
-more important than ACID properties.
+> [!NOTE]
+> Incidentally, much of the banking system relies on text files that are exchanged via secure FTP
+> [^35].
+> In this context, having an audit trail and some human-level fraud prevention measures is actually
+> more important than ACID properties.
 
 Those examples also highlight an important point: even if concurrency issues are rare in normal
 operation, you have to consider the possibility that an attacker deliberately sends a burst of
@@ -676,10 +674,9 @@ account 1 again at the end of the transaction, she would see a different value (
 in her previous query. Read skew is considered acceptable under read committed isolation: the
 account balances that Aaliyah saw were indeed committed at the time when she read them.
 
-###### Note
-
-The term *skew* is unfortunately overloaded: we previously used it in the sense of an *unbalanced
-workload with hot spots* (see [“Skewed Workloads and Relieving Hot Spots”](/en/ch7#sec_sharding_skew)), whereas here it means *timing anomaly*.
+> [!NOTE]
+> The term *skew* is unfortunately overloaded: we previously used it in the sense of an *unbalanced
+> workload with hot spots* (see [“Skewed Workloads and Relieving Hot Spots”](/en/ch7#sec_sharding_skew)), whereas here it means *timing anomaly*.
 
 In Aaliyah’s case, this is not a lasting problem, because she will most likely see consistent account
 balances if she reloads the online banking website a few seconds later. However, some situations
@@ -139,11 +139,10 @@ messages (requests, responses) that are too big to fit in one packet. These appl
 use TCP, the Transmission Control Protocol, to establish a *connection* that breaks up large data
 streams into individual packets, and puts them back together again on the receiving side.
 
-###### Note
-
-Most of what we say about TCP applies also to its more recent alternative QUIC, as well as the
-Stream Control Transmission Protocol (SCTP) used in WebRTC, the BitTorrent uTP protocol, and
-other transport protocols. For a comparison to UDP, see [“TCP Versus UDP”](/en/ch9#sidebar_distributed_tcp_udp).
+> [!NOTE]
+> Most of what we say about TCP applies also to its more recent alternative QUIC, as well as the
+> Stream Control Transmission Protocol (SCTP) used in WebRTC, the BitTorrent uTP protocol, and
+> other transport protocols. For a comparison to UDP, see [“TCP Versus UDP”](/en/ch9#sidebar_distributed_tcp_udp).
 
 TCP is often described as providing “reliable” delivery, in the sense that it detects and
 retransmits dropped packets, it detects reordered packets and puts them back in the correct order,
@@ -1018,12 +1017,11 @@ must respond quickly and predictably to their sensor inputs. In these systems, t
 *deadline* by which the software must respond; if it doesn’t meet the deadline, that may cause a
 failure of the entire system. These are so-called *hard real-time* systems.
 
-###### Note
-
-In embedded systems, *real-time* means that a system is carefully designed and tested to meet
-specified timing guarantees in all circumstances. This meaning is in contrast to the more vague use of the
-term *real-time* on the web, where it describes servers pushing data to clients and stream
-processing without hard response time constraints (see [Link to Come]).
+> [!NOTE]
+> In embedded systems, *real-time* means that a system is carefully designed and tested to meet
+> specified timing guarantees in all circumstances. This meaning is in contrast to the more vague use of the
+> term *real-time* on the web, where it describes servers pushing data to clients and stream
+> processing without hard response time constraints (see [Link to Come]).
 
 For example, if your car’s onboard sensors detect that you are currently experiencing a crash, you
 wouldn’t want the release of the airbag to be delayed due to an inopportune GC pause in the airbag
@@ -1242,12 +1240,11 @@ token*, which is a number that increases every time a lock is granted (e.g., inc
 service). We can then require that every time a client sends a write request to the storage service,
 it must include its current fencing token.
 
-###### Note
-
-There are several alternative names for fencing tokens. In Chubby, Google’s lock service, they are
-called *sequencers* [^88], and in Kafka they are called *epoch numbers*.
-In consensus algorithms, which we will discuss in [Chapter 10](/en/ch10#ch_consistency), the *ballot number* (Paxos) or
-*term number* (Raft) serves a similar purpose.
+> [!NOTE]
+> There are several alternative names for fencing tokens. In Chubby, Google’s lock service, they are
+> called *sequencers* [^88], and in Kafka they are called *epoch numbers*.
+> In consensus algorithms, which we will discuss in [Chapter 10](/en/ch10#ch_consistency), the *ballot number* (Paxos) or
+> *term number* (Raft) serves a similar purpose.
 
 In [Figure 9-6](/en/ch9#fig_distributed_fencing), client 1 acquires the lease with a token of 33, but then
 it goes into a long pause and the lease expires. Client 2 acquires the lease with a token of 34 (the
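The fencing-token scenario in the last hunk (client 1 holds token 33, pauses, and client 2 then obtains token 34) can be sketched as follows. This is a minimal illustration, not the book's code: the `FencedStorage` class and its `write` method are hypothetical, and it assumes the storage service remembers the highest token it has processed and rejects any write carrying a lower one.

```python
class FencedStorage:
    """Hypothetical storage service that checks fencing tokens on every write."""

    def __init__(self):
        self.max_token_seen = -1  # highest fencing token processed so far
        self.data = {}

    def write(self, key, value, token):
        # Assumption: a token lower than one already seen belongs to a client
        # whose lease has expired, so its write is rejected.
        if token < self.max_token_seen:
            return False
        self.max_token_seen = token
        self.data[key] = value
        return True


storage = FencedStorage()

# Client 2 acquired the lease with token 34 and writes first.
accepted_client2 = storage.write("file", "written by client 2", token=34)

# Client 1 wakes up from its long pause and still believes it holds the lease.
accepted_client1 = storage.write("file", "written by client 1", token=33)
```

In this sketch the delayed write from client 1 is refused, so the value written by client 2 is not overwritten.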