fix reference summary

2026-06-25 02:46:51 +08:00 · 2025-08-09 16:09:53 +08:00 · 2025-08-09 16:09:53 +08:00 · 4ec385f161
commit 4ec385f161
parent 752c2f58c7
14 changed files with 2811 additions and 3255 deletions
--- a/content/en/ch1.md
+++ b/content/en/ch1.md
@ -252,9 +252,7 @@ the data warehouse. This process of getting data into the data warehouse is know
 *transform* and *load* steps is swapped (i.e., the transformation is done in the data warehouse,
 after loading), resulting in *ELT*.
-![ddia 0101](/fig/ddia_0101.png)
-###### Figure 1-1. Simplified outline of ETL into a data warehouse.
+{{< figure src="/fig/ddia_0101.png" id="fig_dwh_etl" title="Figure 1-1. Simplified outline of ETL into a data warehouse." class="w-full my-4" >}}
 In some cases the data sources of the ETL processes are external SaaS products such as customer
 relationship management (CRM), email marketing, or credit card processing systems. In those cases,
@ -428,9 +426,10 @@ the other extreme are widely-used cloud services or Software as a Service (SaaS)
 implemented and operated by an external vendor, and which you only access through a web interface or
 API.
 ![ddia 0102](/fig/ddia_0102.png)
-###### Figure 1-2. A spectrum of types of software and its operations.
+{{< figure src="/fig/ddia_0102.png" id="fig_cloud_spectrum" title="Figure 1-2. A spectrum of types of software and its operations." class="w-full my-4" >}}
 The middle ground is off-the-shelf software (open source or commercial) that you *self-host*, i.e.,
 deploy yourself—for example, if you download MySQL and install it on a server you control. This
@ -962,7 +961,7 @@ whose data you are collecting and processing. There is much more to this topic;
 will go deeper into the topics of ethics and legal compliance, including the problems of bias and
 discrimination.
-# Summary
+## Summary
 The theme of this chapter has been to understand trade-offs: that is, to recognize that for many
 questions there is not one right answer, but several different approaches that each have various
@ -994,9 +993,7 @@ data is being processed—an aspect that many engineers are prone to ignoring. H
 requirements into technical implementations is not yet well understood, but it’s important to keep
 this question in mind as we move through the rest of this book.
-## Footnotes
-## References
+### References
 [^1]: Richard T. Kouzes, Gordon A. Anderson, Stephen T. Elbert, Ian Gorton, and Deborah K. Gracio. [The Changing Paradigm of Data-Intensive Computing](http://www2.ic.uff.br/~boeres/slides_AP/papers/TheChanginParadigmDataIntensiveComputing_2009.pdf). *IEEE Computer*, volume 42, issue 1, January 2009. [doi:10.1109/MC.2009.26](https://doi.org/10.1109/MC.2009.26)
 [^2]: Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan. [Local-first software: you own your data, in spite of the cloud](https://www.inkandswitch.com/local-first/). At *2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software* (Onward!), October 2019. [doi:10.1145/3359591.3359737](https://doi.org/10.1145/3359591.3359737)
--- a/content/en/ch10.md
+++ b/content/en/ch10.md
@ -90,8 +90,7 @@ guarantee*. To clarify this idea, let’s look at an example of a system that is
 ###### Figure 10-1. This system is not linearizable, causing sports fans to be confused.
-[Figure 10-1](/en/ch10#fig_consistency_linearizability_0) shows an example of a nonlinearizable sports website
-[^4].
+[Figure 10-1](/en/ch10#fig_consistency_linearizability_0) shows an example of a nonlinearizable sports website [^4].
 Aaliyah and Bryce are sitting in the same room, both checking their phones to see the outcome of a
 game their favorite team is playing. Just after the final score is announced, Aaliyah refreshes the
 page, sees the winner announced, and excitedly tells Bryce about it. Bryce incredulously hits
@ -216,17 +215,14 @@ There are a few interesting details to point out in [Figure 10-4](/en/ch10#fig_
 so B is not allowed to read an older value than A. Again, it’s the same situation as with Aaliyah
 and Bryce in [Figure 10-1](/en/ch10#fig_consistency_linearizability_0).
-That is the intuition behind linearizability; the formal definition
-[^1] describes it more precisely. It is
+That is the intuition behind linearizability; the formal definition [^1] describes it more precisely. It is
 possible (though computationally expensive) to test whether a system’s behavior is linearizable by
 recording the timings of all requests and responses, and checking whether they can be arranged into
-a valid sequential order [[6](/en/ch10#Kingsbury2014knossos),
-[7](/en/ch10#Kingsbury2020elle)].
+a valid sequential order [[^6], [^7]].
 Just as there are various weak isolation levels for transactions besides serializability (see
 [“Weak Isolation Levels”](/en/ch8#sec_transactions_isolation_levels)), there are also various weaker consistency models for
-replicated systems besides linearizability
-[^8].
+replicated systems besides linearizability [^8].
 In fact, the *read-after-write*, *monotonic reads*, and *consistent prefix reads* properties we saw
 in [“Problems with Replication Lag”](/en/ch6#sec_replication_lag) are examples of such weaker consistency models. Linearizability
 guarantees all these weaker properties, and more. In this chapter we will focus on linearizability,
@ -255,24 +251,20 @@ Linearizability
 Serializability does not have that requirement: for example, stale reads are allowed by
 serializability [^10].
-(*Sequential consistency* is something else again
-[^8], but we won’t discuss it here.)
+(*Sequential consistency* is something else again [^8], but we won’t discuss it here.)
 A database may provide both serializability and linearizability, and this combination is known as
 *strict serializability* or *strong one-copy serializability* (*strong-1SR*)
-[[11](/en/ch10#Bailis2014virtues_ch10),
-[12](/en/ch10#Bernstein1987_ch10)].
+[[^11], [^12]].
 Single-node databases are typically linearizable. With distributed databases using optimistic
 methods like serializable snapshot isolation (see [“Serializable Snapshot Isolation (SSI)”](/en/ch8#sec_transactions_ssi)) the situation is more
 complicated: for example, CockroachDB provides serializability, and some recency guarantees on
 reads, but not strict serializability [^13]
-because this would require expensive coordination between transactions
-[^14].
+because this would require expensive coordination between transactions [^14].
 It is also possible to combine a weaker isolation level with linearizability, or a weaker
 consistency model with serializability; in fact, consistency model and isolation level can be chosen
-largely independently from each other [[15](/en/ch10#Darnell2022),
-[16](/en/ch10#Abadi2019consistency)].
+largely independently from each other [[^15], [^16]].
 ## Relying on Linearizability
@ -285,13 +277,11 @@ requirement for making a system work correctly.
 A system that uses single-leader replication needs to ensure that there is indeed only one leader,
 not several (split brain). One way of electing a leader is to use a lease: every node that starts up
-tries to acquire the lease, and the one that succeeds becomes the leader
-[^17].
+tries to acquire the lease, and the one that succeeds becomes the leader [^17].
 No matter how this mechanism is implemented, it must be linearizable: it should not be possible for
 two different nodes to acquire the lease at the same time.
-Coordination services like Apache ZooKeeper
-[^18]
+Coordination services like Apache ZooKeeper [^18]
 and etcd are often used to implement distributed leases and leader election. They use consensus
 algorithms to implement linearizable operations in a fault-tolerant way (we discuss such algorithms
 later in this chapter). There are still many subtle details to implementing leases and leader
@ -305,8 +295,7 @@ linearizable storage service is the basic foundation for these coordination task
 > etcd since version 3 provides linearizable reads by default.
 Distributed locking is also used at a much more granular level in some distributed databases, such as
-Oracle Real Application Clusters (RAC)
-[^19].
+Oracle Real Application Clusters (RAC) [^19].
 RAC uses a lock per disk page, with multiple nodes sharing access
 to the same disk storage system. Since these linearizable locks are on the critical path of
 transaction execution, RAC deployments usually have a dedicated cluster interconnect network for
@ -338,8 +327,7 @@ loosely interpreted constraints in [Link to Come].
 However, a hard uniqueness constraint, such as the one you typically find in relational databases,
 requires linearizability. Other kinds of constraints, such as foreign key or attribute constraints,
-can be implemented without linearizability
-[^20].
+can be implemented without linearizability [^20].
 ### Cross-channel timing dependencies
@ -469,20 +457,16 @@ returns the new value. (It’s once again the Aaliyah and Bryce situation from
 It is possible to make Dynamo-style quorums linearizable at the cost of reduced
 performance: a reader must perform read repair (see [“Catching up on missed writes”](/en/ch6#sec_replication_read_repair)) synchronously,
-before returning results to the application
-[^24].
+before returning results to the application [^24].
 Moreover, before writing, a writer must read the latest state of a quorum of nodes to fetch the
 latest timestamp of any prior write, and ensure that the new write has a greater timestamp
-[[25](/en/ch10#Lynch1997),
-[26](/en/ch10#Cachin2011)].
+[[^25], [^26]].
 However, Riak does not perform synchronous read repair due to the performance penalty.
-Cassandra does wait for read repair to complete on quorum reads
-[^27],
+Cassandra does wait for read repair to complete on quorum reads [^27],
 but it loses linearizability due to its use of time-of-day clocks for timestamps.
 Moreover, only linearizable read and write operations can be implemented in this way; a
-linearizable compare-and-set operation cannot, because it requires a consensus algorithm
-[^28].
+linearizable compare-and-set operation cannot, because it requires a consensus algorithm [^28].
 In summary, it is safest to assume that a leaderless system with Dynamo-style replication does not
 provide linearizability, even with quorum reads and writes.
@ -545,31 +529,23 @@ The trade-off is as follows:
 Thus, applications that don’t require linearizability can be more tolerant of network problems. This
 insight is popularly known as the *CAP theorem*
-[[29](/en/ch10#Fox1999),
-[30](/en/ch10#Gilbert2002),
-[31](/en/ch10#Gilbert2012),
-[32](/en/ch10#Brewer2012rules)],
+[[^29], [^30], [^31], [^32]],
 named by Eric Brewer in 2000, although the trade-off had been known to designers of
 distributed databases since the 1970s
-[[33](/en/ch10#Davidson1985),
-[34](/en/ch10#Johnson1975),
-[35](/en/ch10#Fischer1982)].
+[[^33], [^34], [^35]].
 CAP was originally proposed as a rule of thumb, without precise definitions, with the goal of
 starting a discussion about trade-offs in databases. At the time, many distributed databases
-focused on providing linearizable semantics on a cluster of machines with shared storage
-[^19], and CAP encouraged database engineers
+focused on providing linearizable semantics on a cluster of machines with shared storage [^19], and CAP encouraged database engineers
 to explore a wider design space of distributed shared-nothing systems, which were more suitable for
-implementing large-scale web services
-[^36].
+implementing large-scale web services [^36].
 CAP deserves credit for this culture shift—it helped trigger the NoSQL movement, a burst of new
 database technologies around the mid-2000s.
 # The Unhelpful CAP Theorem
 CAP is sometimes presented as *Consistency, Availability, Partition tolerance: pick 2 out of 3*.
-Unfortunately, putting it this way is misleading
-[^32] because network partitions are a kind of
+Unfortunately, putting it this way is misleading [^32] because network partitions are a kind of
 fault, so they aren’t something about which you have a choice: they will happen whether you like it
 or not.
@ -581,16 +557,13 @@ either linearizability or total availability. Thus, a better way of phrasing CAP
 A more reliable network needs to make this choice less often, but at some point the choice is
 inevitable.
-The CP/AP classification scheme has several further flaws
-[^4]. *Consistency* is formalized as
+The CP/AP classification scheme has several further flaws [^4]. *Consistency* is formalized as
 linearizability (the theorem doesn’t say anything about weaker consistency models), and the
 formalization of *availability* [^30] does not
-match the usual meaning of the term
-[^38]. Many highly available (fault-tolerant) systems actually do not meet CAP’s
+match the usual meaning of the term [^38]. Many highly available (fault-tolerant) systems actually do not meet CAP’s
 idiosyncratic definition of availability. Moreover, some system designers choose (with good reason)
 to provide neither linearizability nor the form of availability that the CAP theorem assumes, so
-those systems are neither CP nor AP [[39](/en/ch10#Abadi2010),
-[40](/en/ch10#Abadi2017)].
+those systems are neither CP nor AP [[^39], [^40]].
 All in all, there is a lot of misunderstanding and confusion around CAP, and it does not help us
 understand systems better, so CAP is best avoided.
@ -601,31 +574,25 @@ fault (network partitions, which according to data from Google are the cause of
 incidents [^41]).
 It doesn’t say anything about network delays, dead nodes, or other trade-offs. Thus, although CAP
 has been historically influential, it has little practical value for designing systems
-[[4](/en/ch10#Kleppmann2015stop),
-[38](/en/ch10#Kleppmann2015critique)].
+[[^4], [^38]].
 There have been efforts to generalize CAP. For example, the *PACELC principle* observes that system
 designers might also choose to weaken consistency at times when the network is working fine in order
-to reduce latency [[39](/en/ch10#Abadi2010),
-[40](/en/ch10#Abadi2017),
-[42](/en/ch10#Abadi2012)].
+to reduce latency [[^39], [^40], [^42]].
 Thus, during a network partition (P), we need to choose between availability (A) and consistency
 (C); else (E), when there is no partition, we may choose between low latency (L) and
 consistency (C). However, this definition inherits several problems with CAP, such as the
 counterintuitive definitions of consistency and availability.
-There are many more interesting impossibility results in distributed systems
-[^43],
+There are many more interesting impossibility results in distributed systems [^43],
 and CAP has now been superseded by more precise results
-[[44](/en/ch10#Mahajan2011),
-[45](/en/ch10#Attiya2015)],
+[[^44], [^45]],
 so it is of mostly historical interest today.
 ### Linearizability and network delays
 Although linearizability is a useful guarantee, surprisingly few systems are actually linearizable
-in practice. For example, even RAM on a modern multi-core CPU is not linearizable
-[^46]:
+in practice. For example, even RAM on a modern multi-core CPU is not linearizable [^46]:
 if a thread running on one CPU core writes to a memory address, and a thread on another CPU core
 reads the same address shortly afterward, it is not guaranteed to read the value written by the
 first thread (unless a *memory barrier* or *fence*
@ -633,8 +600,7 @@ first thread (unless a *memory barrier* or *fence*
 The reason for this behavior is that every CPU core has its own memory cache and store buffer.
 Memory access first goes to the cache by default, and any changes are asynchronously written out to
-main memory. Since accessing data in the cache is much faster than going to main memory
-[^48], this feature is essential for
+main memory. Since accessing data in the cache is much faster than going to main memory [^48], this feature is essential for
 good performance on modern CPUs. However, there are now several copies of the data (one in main
 memory, and perhaps several more in various caches), and these copies are asynchronously updated, so
 linearizability is lost.
@ -642,12 +608,10 @@ linearizability is lost.
 Why make this trade-off? It makes no sense to use the CAP theorem to justify the multi-core memory
 consistency model: within one computer we usually assume reliable communication, and we don’t expect
 one CPU core to be able to continue operating normally if it is disconnected from the rest of the
-computer. The reason for dropping linearizability is *performance*, not fault tolerance
-[^39].
+computer. The reason for dropping linearizability is *performance*, not fault tolerance [^39].
 The same is true of many distributed databases that choose not to provide linearizable guarantees:
-they do so primarily to increase performance, not so much for fault tolerance
-[^42].
+they do so primarily to increase performance, not so much for fault tolerance [^42].
 Linearizability is slow—and this is true all the time, not only during a network fault.
 Can’t we maybe find a more efficient implementation of linearizable storage? It seems the answer is
@ -826,8 +790,7 @@ limitations:
 different nodes have wildly different counter values.
 A *hybrid logical clock* combines the advantages of physical time-of-day clocks with the ordering
-guarantees of Lamport clocks
-[^55].
+guarantees of Lamport clocks [^55].
 Like a physical clock, it counts seconds or microseconds. Like a Lamport clock, when one node sees a
 timestamp from another node that is greater than its local clock value, it moves its own local value
 forward to match the other node’s timestamp. As a result, if one node’s clock is running fast, the
@ -850,8 +813,7 @@ In [“Multi-version concurrency control (MVCC)”](/en/ch8#sec_transactions_sna
 essentially, by giving each transaction a transaction ID, and allowing each transaction to see
 writes made by transactions with a lower ID, but to make writes by transactions with higher IDs
 invisible. Lamport clocks and hybrid logical clocks are a good way of generating these transaction
-IDs, because they ensure that the snapshot is consistent with causality
-[^56].
+IDs, because they ensure that the snapshot is consistent with causality [^56].
 When multiple timestamps are generated concurrently, these algorithms order them arbitrarily. This
 means that when you look at two timestamps, you generally can’t tell whether they were generated
@ -983,28 +945,18 @@ node, but which get a lot harder if you want fault tolerance:
 It turns out that all of these are instances of the same fundamental distributed systems problem:
 *consensus*. Consensus is one of the most important and fundamental problems in distributed
 computing; it is also infamously difficult to get right
-[[58](/en/ch10#Chandra2007),
-[59](/en/ch10#Portnoy2012)],
+[[^58], [^59]],
 and many systems have got it wrong in the past. Now that we have discussed replication
 ([Chapter 6](/en/ch6#ch_replication)), transactions ([Chapter 8](/en/ch8#ch_transactions)), system models ([Chapter 9](/en/ch9#ch_distributed)), and
 linearizability (this chapter), we are finally ready to tackle the consensus problem.
 The best-known consensus algorithms are Viewstamped Replication
-[[60](/en/ch10#Oki1988),
-[61](/en/ch10#Liskov2012)],
-Paxos [[58](/en/ch10#Chandra2007),
-[62](/en/ch10#Lamport1998),
-[63](/en/ch10#Lamport2001),
-[64](/en/ch10#vanRenesse2011)],
-Raft [[23](/en/ch10#Ongaro2014atc),
-[65](/en/ch10#Ongaro2014thesis),
-[66](/en/ch10#Howard2015refloated)],
-and Zab [[18](/en/ch10#Junqueira2013_ch10),
-[22](/en/ch10#Junqueira2011),
-[67](/en/ch10#Medeiros2012)].
+[[^60], [^61]],
+Paxos [[^58], [^62], [^63], [^64]],
+Raft [[^23], [^65], [^66]],
+and Zab [[^18], [^22], [^67]].
 There are quite a few similarities between these algorithms, but they are not the same
-[[68](/en/ch10#vanRenesse2014),
-[69](/en/ch10#Howard2020)].
+[[^68], [^69]].
 These algorithms work in a non-Byzantine system model: that is, network communication may be
 arbitrarily delayed or dropped, and nodes may crash, restart, and become disconnected, but the
 algorithms assume that nodes otherwise follow the protocol correctly and do not behave maliciously.
@ -1012,17 +964,14 @@ algorithms assume that nodes otherwise follow the protocol correctly and do not
 There are also consensus algorithms that can tolerate some Byzantine nodes, i.e., nodes that don’t
 correctly follow the protocol (for example, by sending contradictory messages to other nodes). A
 common assumption is that fewer than one-third of the nodes are Byzantine-faulty
-[[26](/en/ch10#Cachin2011),
-[70](/en/ch10#Castro2002)].
-Such *Byzantine fault tolerant* (BFT) consensus algorithms are used in blockchains
-[^71].
+[[^26], [^70]].
+Such *Byzantine fault tolerant* (BFT) consensus algorithms are used in blockchains [^71].
 However, as explained in [“Byzantine Faults”](/en/ch9#sec_distributed_byzantine), BFT algorithms are beyond the scope of this
 book.
 # The Impossibility of Consensus
-You may have heard about the FLP result
-[^72]—named after the
+You may have heard about the FLP result [^72]—named after the
 authors Fischer, Lynch, and Paterson—which proves that there is no algorithm that is always able to
 reach consensus if there is a risk that a node may crash. In a distributed system, we must assume
 that nodes may crash, so reliable consensus is impossible. Yet, here we are, discussing algorithms
@ -1118,15 +1067,13 @@ and is never going to come back online.)
 Of course, if *all* nodes crash and none of them are running, then it is not possible for any
 algorithm to decide anything. There is a limit to the number of failures that an algorithm can
 tolerate: in fact, it can be proved that any consensus algorithm requires at least a majority of
-nodes to be functioning correctly in order to assure termination
-[^73]. That majority can safely form a quorum
+nodes to be functioning correctly in order to assure termination [^73]. That majority can safely form a quorum
 (see [“Quorums for reading and writing”](/en/ch6#sec_replication_quorum_condition)).
 Thus, the termination property is subject to the assumption that fewer than half of the nodes are
 crashed or unreachable. However, most consensus algorithms ensure that the safety
 properties—agreement, integrity, and validity—are always met, even if a majority of nodes fail or
-there is a severe network problem
-[^75].
+there is a severe network problem [^75].
 Thus, a large-scale outage can stop the system from being able to process requests, but it cannot
 corrupt the consensus system by causing it to make inconsistent decisions.
@ -1148,8 +1095,7 @@ consensus. Any CAS invocations whose new value was not decided return an error.
 different expected values use separate runs of the consensus protocol.
 This shows that CAS and consensus are equivalent to each other
-[[28](/en/ch10#Herlihy1991),
-[73](/en/ch10#Chandra1996)].
+[[^28], [^73]].
 Again, both are straightforward on a single node, but challenging to make fault-tolerant. As an
 example of CAS in a distributed setting, we saw conditional write operations for object stores in
 [“Databases backed by object storage”](/en/ch6#sec_replication_object_storage), which allow a write to happen only if an object with the same
@ -1159,8 +1105,7 @@ However, a linearizable read-write register is not sufficient to solve consensus
 tells us that consensus cannot be solved by a deterministic algorithm in the asynchronous crash-stop
 model [^72], but we saw in
 [“Linearizability and quorums”](/en/ch10#sec_consistency_quorum_linearizable) that a linearizable register can be implemented using quorum
-reads/writes in this model [[24](/en/ch10#Attiya1995),
-[25](/en/ch10#Lynch1997), [26](/en/ch10#Cachin2011)].
+reads/writes in this model [[^24], [^25], [^26]].
 From this it follows that a linearizable register cannot solve consensus.
 ### Shared logs as consensus
@ -1198,21 +1143,19 @@ Validity
 > [!NOTE]
 > A shared log is formally known as a *total order broadcast*, *atomic broadcast*, or *total order
-> multicast* protocol [[26](/en/ch10#Cachin2011),
-> [76](/en/ch10#Defago2004),
-> [77](/en/ch10#Attiya2004)].
+> multicast* protocol [[^26],
+> [^76],
+> [^77]].
 > It’s the same thing described in different words: requesting a value to be added to the log is then
 > called “broadcasting” it, and reading a log entry is called “delivering” it.
 If you have an implementation of a shared log, it is easy to solve the consensus problem: every node
 that wants to propose a value requests for it to be added to the log, and whichever value is read
 back in the first log entry is the value that is decided. Since all nodes read log entries in the
-same order, they are guaranteed to agree on which value is delivered first
-[^28].
+same order, they are guaranteed to agree on which value is delivered first [^28].
 Conversely, if you have a solution for consensus, you can implement a shared log. The details are a
-bit more complicated, but the basic idea is this
-[^73]:
+bit more complicated, but the basic idea is this [^73]:
 1. You have a slot in the log for every future log entry, and you run a separate instance of the
 consensus algorithm for every such slot to decide what value should go in that entry.
@ -1260,8 +1203,7 @@ An exception is if we know for sure that no more than two nodes will propose a v
 the nodes can send each other the values they want to propose, and then each perform the
 fetch-and-add operation. The node that reads zero decides its own value, and the node that reads one
 decides the other node’s value. This solves the consensus problem among two nodes, which is why we
-can say that fetch-and-add has a *consensus number* of two
-[^28].
+can say that fetch-and-add has a *consensus number* of two [^28].
 In contrast, CAS and shared logs solve consensus for any number of nodes that may propose values, so
 they have a consensus number of ∞ (infinity).
@ -1276,8 +1218,7 @@ What is the relationship between consensus and atomic commitment? At first glanc
 similar—both require nodes to come to some form of agreement. However, there is one important
 difference: with consensus it’s okay to decide any value that proposed, whereas with atomic
 commitment the algorithm *must* abort if *any* of the participants voted to abort. More precisely,
-atomic commitment requires the following properties
-[^78]:
+atomic commitment requires the following properties [^78]:
 Uniform agreement
 : No two nodes decide on different outcomes.
@ -1302,8 +1243,7 @@ any of the communication among the nodes times out). The other three properties
 same as for consensus.
 If you have a solution for consensus, there are multiple ways you could solve atomic commitment
-[[78](/en/ch10#Guerraoui1995),
-[79](/en/ch10#Gray2006)].
+[[^78], [^79]].
 One works like this: when you want to commit the transaction, every node sends its vote to commit or
 abort to every other node. Nodes that receive a vote to commit from itself and every other node
 propose “commit” using the consensus algorithm; nodes that receive a vote to abort, or which
@ -1350,8 +1290,7 @@ Similarly, a shared log can be used to implement serializable transactions: as d
 [“Actual Serial Execution”](/en/ch8#sec_transactions_serial), if every log entry represents a deterministic transaction to be
 executed as a stored procedure, and if every node executes those transactions in the same order,
 then the transactions will be serializable
-[[81](/en/ch10#Thomson2012),
-[82](/en/ch10#Balakrishnan2013)].
+[[^81], [^82]].
 > [!NOTE]
 > Sharded databases with a strong consistency model often maintain a separate log per shard, which
@ -1411,12 +1350,10 @@ A node votes yes only if it is not aware of any other leader with a higher epoch
 Thus, we have two rounds of voting: once to choose a leader, and a second time to vote on a leader’s
 proposal for the next entry to append to the log. The quorums for those two votes must overlap: if
 a vote on a proposal succeeds, at least one of the nodes that voted for it must have also
-participated in the most recent successful leader election
-[^85]. Thus, if the vote on a proposal
+participated in the most recent successful leader election [^85]. Thus, if the vote on a proposal
 passes without revealing any higher-numbered epoch, the current leader can conclude that no leader
 with a higher epoch number has been elected, and therefore it can safely append the proposed entry
-to the log [[26](/en/ch10#Cachin2011),
-[86](/en/ch10#Kleppmann2024distsys)].
+to the log [[^26], [^86]].
 These two rounds of voting look superficially similar to two-phase commit, but they are very
 different protocols. In consensus algorithms, any node can start an election and it requires only a
@ -1427,8 +1364,7 @@ vote from *every* participant before it can commit.
 This basic structure is common to all of Raft, Multi-Paxos, Zab, and Viewstamped Replication: a vote
 by a quorum of nodes elects a leader, and then another quorum vote is required for every entry that
-the leader wants to append to the log [[68](/en/ch10#vanRenesse2014),
-[69](/en/ch10#Howard2020)]. Every new log entry is synchronously replicated
+the leader wants to append to the log [[^68], [^69]]. Every new log entry is synchronously replicated
 to a quorum of nodes before it is confirmed to the client that requested the write. This ensures
 that the log entry won’t be lost if the current leader fails.
@ -1436,8 +1372,7 @@ However, the devil is in the details, and that’s also where these algorithms t
 approaches. For example, when the old leader fails and a new one is elected, the algorithm needs to
 ensure that the new leader honors any log entries that had already been appended by the old leader
 before it failed. Raft does this by only allowing a node to become the new leader if its log is at
-least as up-to-date as a majority of its followers
-[^69].
+least as up-to-date as a majority of its followers [^69].
 In contrast, Paxos allows any node to become the new leader, but requires it to bring its log
 up-to-date with other nodes before it can start appending new entries of its own.
@ -1463,9 +1398,7 @@ easily cause a lot of data loss or corruption.
 Another subtlety is in how the algorithms deal with log entries that had been proposed by the old
 leader before it failed, but for which the vote on appending to the log had not yet completed. You
 can find discussions of these details in the references for this chapter
-[[23](/en/ch10#Ongaro2014atc),
-[69](/en/ch10#Howard2020),
-[86](/en/ch10#Kleppmann2024distsys)].
+[[^23], [^69], [^86]].
 For databases that use a consensus algorithm for replication, not only do writes need to be turned
 into log entries and replicated to a quorum. If you want to guarantee linearizable reads, they also
@ -1508,8 +1441,7 @@ work.
 Sometimes, consensus algorithms are particularly sensitive to network problems. For example, Raft
 has been shown to have unpleasant edge cases
-[[88](/en/ch10#Howard2015coracle),
-[89](/en/ch10#Lianza2020_ch10)]:
+[[^88], [^89]]:
 if the entire network is working correctly except for one particular network link that is
 consistently unreliable, Raft can get into situations where leadership continually bounces between
 two nodes, or the current leader is continually forced to resign, so the system effectively never
@ -1536,8 +1468,7 @@ entirely in memory (although they still write to disk for durability), which is
 multiple nodes using a fault-tolerant consensus algorithm.
 Coordination services are modeled after Google’s Chubby lock service
-[[17](/en/ch10#Burrows2006_ch10),
-[58](/en/ch10#Chandra2007)].
+[[^17], [^58]].
 They combine a consensus algorithm with several other features that turn out to be particularly
 useful when building distributed systems:
@ -1614,8 +1545,7 @@ information like “the node running on IP address 10.1.1.23 is the leader for s
 assignments usually change on a timescale of minutes or hours. Coordination services are not
 intended for storing data that may change thousands of times per second. For that, it is better to
 use a conventional database; alternatively, tools like Apache BookKeeper
-[[90](/en/ch10#Kelly2014),
+[[^90], [^91]]
-[91](/en/ch10#Vanlightly2021)]
 can be used to replicate fast-changing internal state of a service.
 ### Service discovery
@ -1645,7 +1575,7 @@ algorithm’s voting process. Reads from an observer are not linearizable as the
 they remain available even if the network is interrupted, and they increase the read throughput that
 the system can support by caching.
-# Summary
+## Summary
 In this chapter we examined the topic of strong consistency in fault-tolerant systems: what it is,
 and how to achieve it. We looked in depth at linearizability, a popular formalization of strong
@ -1731,8 +1661,6 @@ availability and better performance. In these cases, it is common to use leaderl
 replication, which we previously discussed in [Chapter 6](/en/ch6#ch_replication). The logical clocks that we
 discussed in this chapter are helpful in that context.
 ### Footnotes
 ### References
 [^1]: Maurice P. Herlihy and Jeannette M. Wing. [Linearizability: A Correctness Condition for Concurrent Objects](https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf). *ACM Transactions on Programming Languages and Systems* (TOPLAS), volume 12, issue 3, pages 463–492, July 1990. [doi:10.1145/78969.78972](https://doi.org/10.1145/78969.78972) 
--- a/content/en/ch11.md
+++ b/content/en/ch11.md
@ -35,7 +35,7 @@ Stream processing is somewhere between online and offline/batch processing (so i
 As we shall see in this chapter, batch processing is an important building block in our quest to build reliable, scalable, and maintainable applications. For example, MapReduce, a batch processing algorithm published in 2004 [1], was (perhaps over-enthusiastically) called “the algorithm that makes Google so massively scalable” [2]. It was subsequently implemented in various open source data systems, including Hadoop, CouchDB, and MongoDB.
-MapReduce is a fairly low-level programming model compared to the parallel processing systems that were developed for data warehouses many years previously [3, 4], but it was a major step forward in terms of the scale of processing that could be achieved on commodity hardware. Although the importance of MapReduce is now declining [5], it is still worth understanding, because it provides a clear picture of why and how batch processing is useful.
+MapReduce is a fairly low-level programming model compared to the parallel processing systems that were developed for data warehouses many years previously [^3] [^4], but it was a major step forward in terms of the scale of processing that could be achieved on commodity hardware. Although the importance of MapReduce is now declining [5], it is still worth understanding, because it provides a clear picture of why and how batch processing is useful.
 In fact, batch processing is a very old form of computing. Long before programmable digital computers were invented, punch card tabulating machines—such as the Hollerith machines used in the 1890 US Census [6]—implemented a semi-mechanized form of batch processing to compute aggregate statistics from large inputs. And MapReduce bears an uncanny resemblance to the electromechanical IBM card-sorting machines that were widely used for business data processing in the 1940s and 1950s [7]. As usual, history has a tendency of repeating itself.
@ -94,7 +94,7 @@ In the next chapter, we will turn to stream processing, in which the input is *u
-## References
+### References
 1. Jeffrey Dean and Sanjay Ghemawat: “[MapReduce: Simplified Data Processing on Large Clusters](https://research.google/pubs/pub62/),” at *6th USENIX Symposium on Operating System Design and Implementation* (OSDI), December 2004.
 1. Joel Spolsky: “[The Perils of JavaSchools](https://www.joelonsoftware.com/2005/12/29/the-perils-of-javaschools-2/),” *joelonsoftware.com*, December 29, 2005.
--- a/content/en/ch12.md
+++ b/content/en/ch12.md
@ -75,7 +75,7 @@ Finally, we discussed techniques for achieving fault tolerance and exactly-once
-## References
+### References
 1. Tyler Akidau, Robert Bradshaw, Craig Chambers, et al.: “[The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing](http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf),” *Proceedings of the VLDB Endowment*, volume 8, number 12, pages 1792–1803, August 2015. [doi:10.14778/2824032.2824076](http://dx.doi.org/10.14778/2824032.2824076)
 1. Harold Abelson, Gerald Jay Sussman, and Julie Sussman: [*Structure and Interpretation of Computer Programs*](https://web.archive.org/web/20220807043536/https://mitpress.mit.edu/sites/default/files/sicp/index.html), 2nd edition. MIT Press, 1996. ISBN: 978-0-262-51087-5, available online at *mitpress.mit.edu*
--- a/content/en/ch13.md
+++ b/content/en/ch13.md
@ -48,7 +48,7 @@ Finally, we took a step back and examined some ethical aspects of building data-
 As software and data are having such a large impact on the world, we engineers must remember that we carry a responsibility to work toward the kind of world that we want to live in: a world that treats people with humanity and respect. I hope that we can work together toward that goal.
-## References
+### References
 1. Rachid Belaid: “[Postgres Full-Text Search is Good Enough!](http://rachbelaid.com/postgres-full-text-search-is-good-enough/),” *rachbelaid.com*, July 13, 2015.
 1. Philippe Ajoux, Nathan Bronson, Sanjeev Kumar, et al.: “[Challenges to Adopting Stronger Consistency at Scale](https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-ajoux.pdf),” at *15th USENIX Workshop on Hot Topics in Operating Systems* (HotOS), May 2015.
--- a/content/en/ch2.md
+++ b/content/en/ch2.md
@ -187,24 +187,19 @@ time out and resend their request. This causes the rate of requests to increase
 the problem worse—a *retry storm*. Even when the load is reduced again, such a system may remain in
 an overloaded state until it is rebooted or otherwise reset. This phenomenon is called a *metastable
 failure*, and it can cause serious outages in production systems
-[[7](/en/ch2#Bronson2021),
+[[^7], [^8]].
-[8](/en/ch2#Brooker2021)].
 To avoid retries overloading a service, you can increase and randomize the time between successive
 retries on the client side (*exponential backoff*
-[[9](/en/ch2#Brooker2015),
+[[^9], [^10]]),
-[10](/en/ch2#Brooker2022backoff)]),
 and temporarily stop sending requests to a service that has returned errors or timed out recently
-(using a *circuit breaker* [[11](/en/ch2#Nygard2018),
+(using a *circuit breaker* [[^11], [^12]]
-[12](/en/ch2#Chen2022)]
 or *token bucket* algorithm [^13]).
 The server can also detect when it is approaching overload and start proactively rejecting requests
 (*load shedding* [^14]), and send back
 responses asking clients to slow down (*backpressure*
-[[1](/en/ch2#Cvet2016),
+[[^1], [^15]]).
-[15](/en/ch2#Sackman2016_ch2)]).
+The choice of queueing and load-balancing algorithms can also make a difference [^16].
-The choice of queueing and load-balancing algorithms can also make a difference
-[^16].
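The client-side retry policy described above can be sketched as follows. This is a minimal illustration, not code from the book or from any particular library: the names `backoff_delays` and `call_with_retries`, the 0.1 s base delay, and the 10 s cap are assumptions chosen for the example. The randomization shown (a uniform delay below an upper bound that doubles on each retry) is only one of several possible jitter strategies.

```python
import random
import time


def backoff_delays(base=0.1, cap=10.0, attempts=5, rng=random.random):
    """Yield randomized delays in seconds; the upper bound doubles per retry."""
    for attempt in range(attempts):
        upper = min(cap, base * 2 ** attempt)
        # Scale the bound by a random factor so that clients which failed
        # at the same moment do not all retry at the same moment.
        yield rng() * upper


def call_with_retries(request, attempts=5, sleep=time.sleep):
    """Call request() up to `attempts` times, backing off between failures."""
    delays = list(backoff_delays(attempts=attempts - 1))
    for attempt in range(attempts):
        try:
            return request()
        except Exception:  # a real client would catch narrower error types
            if attempt == attempts - 1:
                raise  # retry budget exhausted: surface the last error
            sleep(delays[attempt])
```

Passing `sleep` and `rng` as parameters is a design choice made here so that the behavior can be tested without real waiting; a production client would more likely combine this with a circuit breaker or token bucket, as the text notes.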
 In terms of performance metrics, the response time is usually what users care about the most,
 whereas the throughput determines the required computing resources (e.g., how many servers you need),
@ -242,8 +237,7 @@ to another. You will encounter this style of diagram frequently over the course
 The response time can vary significantly from one request to the next, even if you keep making the
 same request over and over again. Many factors can add random delays: for example, a context switch
 to a background process, the loss of a network packet and TCP retransmission, a garbage collection
-pause, a page fault forcing a read from disk, mechanical vibrations in the server rack
+pause, a page fault forcing a read from disk, mechanical vibrations in the server rack [^17],
-[^17],
 or many other causes. We will discuss this topic in more detail in [“Timeouts and Unbounded Delays”](/en/ch9#sec_distributed_queueing).
 Queueing delays often account for a large part of the variability in response times. As a server
@ -291,8 +285,7 @@ directly affect users’ experience of the service. For example, Amazon describe
 requirements for internal services in terms of the 99.9th percentile, even though it only affects 1
 in 1,000 requests. This is because the customers with the slowest requests are often those who have
 the most data on their accounts because they have made many purchases—that is, they’re the most
-valuable customers
+valuable customers [^19].
-[^19].
 It’s important to keep those customers happy by ensuring the website is fast for them.
 On the other hand, optimizing the 99.99th percentile (the slowest 1 in 10,000 requests) was deemed
@ -302,23 +295,19 @@ control, and the benefits are diminishing.
 # The user impact of response times
-It seems intuitively obvious that a fast service is better for users than a slow service
+It seems intuitively obvious that a fast service is better for users than a slow service [^20].
-[^20].
 However, it is surprisingly difficult to get hold of reliable data to quantify the effect that
 latency has on user behavior.
 Some often-cited statistics are unreliable. In 2006 Google reported that a slowdown in search
-results from 400 ms to 900 ms was associated with a 20% drop in traffic and revenue
+results from 400 ms to 900 ms was associated with a 20% drop in traffic and revenue [^21].
-[^21].
 However, another Google study from 2009 reported that a 400 ms increase in latency resulted in
-only 0.6% fewer searches per day
+only 0.6% fewer searches per day [^22],
-[^22],
 and in the same year Bing found that a two-second increase in load time reduced ad revenue by 4.3%
 [^23].
 Newer data from these companies appears not to be publicly available.
-A more recent Akamai study
+A more recent Akamai study [^24]
-[^24]
 claims that a 100 ms increase in response time reduced the conversion rate of e-commerce sites
 by up to 7%; however, on closer inspection, the same study reveals that very *fast* page load times
 are also correlated with lower conversion rates! This seemingly paradoxical result is explained by
@ -326,8 +315,7 @@ the fact that the pages that load fastest are often those that have no useful co
 error pages). However, since the study makes no effort to separate the effects of page content from
 the effects of load time, its results are probably not meaningful.
-A study by Yahoo
+A study by Yahoo [^25]
-[^25]
 compares click-through rates on fast-loading versus slow-loading search results, controlling for
 quality of search results. It finds 20–30% more clicks on fast searches when the difference between
 fast and slow responses is 1.25 seconds or more.
@ -348,15 +336,13 @@ end-user requests end up being slow (an effect known as *tail latency amplificat
 ###### Figure 2-6. When several backend calls are needed to serve a request, it takes just a single slow backend request to slow down the entire end-user request.
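The effect in the figure can be made concrete with a small calculation. The sketch below assumes that each backend call is slow independently with the same probability, which is a simplification (real backends are often correlated); the function name `prob_slow_request` and the example numbers are illustrative, not taken from the text.

```python
def prob_slow_request(p_slow_backend, backend_calls):
    """Probability that at least one of the backend calls is slow,
    assuming calls are slow independently with equal probability."""
    return 1 - (1 - p_slow_backend) ** backend_calls


# If 1% of backend calls are slow and one user request needs 100 of them,
# most user requests end up waiting for at least one slow call.
print(round(prob_slow_request(0.01, 100), 3))  # prints 0.634
```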
 Percentiles are often used in *service level objectives* (SLOs) and *service level agreements*
-(SLAs) as ways of defining the expected performance and availability of a service
+(SLAs) as ways of defining the expected performance and availability of a service [^27].
-[^27].
 For example, an SLO may set a target for a service to have a median response time of less than
 200 ms and a 99th percentile under 1 s, and a target that at least 99.9% of valid requests
 result in non-error responses. An SLA is a contract that specifies what happens if the SLO is not
 met (for example, customers may be entitled to a refund). That is the basic idea, at least; in
 practice, defining good availability metrics for SLOs and SLAs is not straightforward
-[[28](/en/ch2#Mogul2019),
+[[^28], [^29]].
-[29](/en/ch2#Hauer2020)].
 # Computing percentiles
@ -369,10 +355,8 @@ The simplest implementation is to keep a list of response times for all requests
 window and to sort that list every minute. If that is too inefficient for you, there are algorithms
 that can calculate a good approximation of percentiles at minimal CPU and memory cost.
 Open source percentile estimation libraries include HdrHistogram,
-t-digest [[30](/en/ch2#Dunning2021),
+t-digest [[^30], [^31]],
-[31](/en/ch2#Kohn2021)],
+OpenHistogram [^32], and DDSketch [^33].
-OpenHistogram [^32], and DDSketch
-[^33].
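The "keep a list and sort it" approach can be sketched in a few lines. This is a minimal example using the nearest-rank definition of a percentile, which is one of several common definitions; the `percentile` helper and the sample response times are hypothetical. It also shows why averaging per-machine percentiles gives a different number from the percentile of the pooled samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds from two machines.
machine_a = [10, 12, 11, 13, 12, 11, 10, 12, 13, 500]
machine_b = [20, 22, 21, 23, 22, 21, 20, 22, 23, 24]

avg_of_p90 = (percentile(machine_a, 90) + percentile(machine_b, 90)) / 2
pooled_p90 = percentile(machine_a + machine_b, 90)
print(avg_of_p90, pooled_p90)  # prints 18.0 23
```

Sorting every window costs O(n log n) time and keeps every sample in memory, which is the inefficiency that the approximation libraries listed above are designed to avoid.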
 Beware that averaging percentiles, e.g., to reduce the time resolution or to combine data from
 several machines, is mathematically meaningless—the right way of aggregating response time data
@ -391,9 +375,7 @@ software, typical expectations include:
 If all those things together mean “working correctly,” then we can understand *reliability* as
 meaning, roughly, “continuing to work correctly, even when things go wrong.” To be more precise
 about things going wrong, we will distinguish between *faults* and *failures*
-[[35](/en/ch2#Heimerdinger1992),
+[[^35], [^36], [^37]]:
-[36](/en/ch2#Gaertner1999),
-[37](/en/ch2#Avizienis2004)]:
 Fault
 : A fault is when a particular *part* of a system stops working correctly: for example, if a
@ -438,8 +420,7 @@ handling [^38]; by deliberately inducing faults, you ensure
 that the fault-tolerance machinery is continually exercised and tested, which can increase your
 confidence that faults will be handled correctly when they occur naturally. *Chaos engineering* is
 a discipline that aims to improve confidence in fault-tolerance mechanisms through experiments such
-as deliberately injecting faults
+as deliberately injecting faults [^39].
-[^39].
 Although we generally prefer tolerating faults over preventing faults, there are cases where
 prevention is better than cure (e.g., because no cure exists). This is the case with security
@ -452,8 +433,8 @@ cured, as described in the following sections.
 When we think of causes of system failure, hardware faults quickly come to mind:
 * Approximately 2–5% of magnetic hard drives fail per year
-  [[40](/en/ch2#Pinheiro2007),
+ [[^40],
-  [41](/en/ch2#Schroeder2007)];
+ [^41]];
 in a storage cluster with 10,000 disks, we should therefore expect on average one disk failure per day.
 Recent data suggests that disks are getting more reliable, but failure rates remain significant
 [^42].
@ -464,36 +445,22 @@ When we think of causes of system failure, hardware faults quickly come to mind:
 but uncorrectable errors occur approximately once per year per drive, even in drives that are
 fairly new (i.e., that have experienced little wear); this error rate is higher than that of
 magnetic hard drives
-  [[45](/en/ch2#Schroeder2016_ch2),
+ [[^45],
-  [46](/en/ch2#Alter2019)].
+ [^46]].
 * Other hardware components such as power supplies, RAID controllers, and memory modules also fail,
-  although less frequently than hard drives
+ although less frequently than hard drives [^47] [^48].
-  [[47](/en/ch2#Ford2010),
-  [48](/en/ch2#Vishwanath2010)].
 * Approximately one in 1,000 machines has a CPU core that occasionally computes the wrong result,
-  likely due to manufacturing defects
+ likely due to manufacturing defects [^49] [^50] [^51]. In some cases, an erroneous computation leads to a crash, but in other cases it leads to a program simply returning the wrong result.
-  [[49](/en/ch2#Hochschild2021),
-  [50](/en/ch2#Dixit2021),
-  [51](/en/ch2#Behrens2015)].
-  In some cases, an erroneous computation leads to a crash, but in other cases it leads to a program
-  simply returning the wrong result.
 * Data in RAM can also be corrupted, either due to random events such as cosmic rays, or due to
 permanent physical defects. Even when memory with error-correcting codes (ECC) is used, more than
 1% of machines encounter an uncorrectable error in a given year, which typically leads to a crash
-  of the machine and the affected memory module needing to be replaced
+ of the machine and the affected memory module needing to be replaced [^52].
-  [^52].
+ Moreover, certain pathological memory access patterns can flip bits with high probability [^53].
-  Moreover, certain pathological memory access patterns can flip bits with high probability
-  [^53].
 * An entire datacenter might become unavailable (for example, due to power outage or network
-  misconfiguration) or even be permanently destroyed (for example by fire, flood, or earthquake
+ misconfiguration) or even be permanently destroyed (for example by fire, flood, or earthquake [^54]).
-  [^54]).
 A solar storm, which induces large electrical currents in long-distance wires when the sun ejects
-  a large mass of charged particles, could damage power grids and undersea network cables
+ a large mass of charged particles, could damage power grids and undersea network cables [^55].
-  [^55].
+ Although such large-scale failures are rare, their impact can be catastrophic if a service cannot tolerate the loss of a datacenter [^56].
-  Although such large-scale failures are rare, their impact can be catastrophic if a service cannot
-  tolerate the loss of a datacenter
-  [^56].
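The "one disk failure per day" estimate in the first bullet follows from simple arithmetic. The sketch below assumes failures are spread evenly over a 365-day year; the function name `expected_daily_failures` is illustrative.

```python
def expected_daily_failures(disks, annual_failure_rate):
    """Average number of disk failures per day in a cluster, assuming
    failures are spread evenly over a 365-day year."""
    return disks * annual_failure_rate / 365


low = expected_daily_failures(10_000, 0.02)   # about 0.55 per day
high = expected_daily_failures(10_000, 0.05)  # about 1.37 per day
print(f"{low:.2f} to {high:.2f} failures per day")
```

At the quoted 2–5% annual failure rate, a 10,000-disk cluster therefore sees roughly one failure every day or two.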
 These events are rare enough that you often don’t need to worry about them when working on a small
 system, as long as you can easily replace hardware that becomes faulty. However, in a large-scale
@ -510,10 +477,7 @@ running uninterrupted for years.
 Redundancy is most effective when component faults are independent, that is, the occurrence of one
 fault does not change how likely it is that another fault will occur. However, experience has shown
-that there are often significant correlations between component failures
+that there are often significant correlations between component failures [^41] [^57] [^58];
-[[41](/en/ch2#Schroeder2007),
-[57](/en/ch2#Han2021),
-[58](/en/ch2#Nightingale2011)];
 unavailability of an entire server rack or an entire datacenter still happens more often than we
 would like.
@ -543,23 +507,17 @@ upgrade*, and we will discuss it further in [Chapter 5](/en/ch5#ch_encoding).
 Although hardware failures can be weakly correlated, they are still mostly independent: for
 example, if one disk fails, it’s likely that other disks in the same machine will be fine for
 another while. On the other hand, software faults are often very highly correlated, because it is
-common for many nodes to run the same software and thus have the same bugs
+common for many nodes to run the same software and thus have the same bugs [^59] [^60].
-[[59](/en/ch2#Gunawi2014),
-[60](/en/ch2#Kreps2012_ch1)].
 Such faults are harder to anticipate, and they tend to cause many more system failures than
 uncorrelated hardware faults [^47]. For example:
 * A software bug that causes every node to fail at the same time in particular circumstances. For
 example, on June 30, 2012, a leap second caused many Java applications to hang simultaneously due
-  to a bug in the Linux kernel, bringing down many Internet services
+ to a bug in the Linux kernel, bringing down many Internet services [^61].
-  [^61].
 Due to a firmware bug, all SSDs of certain models suddenly fail after precisely 32,768 hours of
-  operation (less than 4 years), rendering the data on them unrecoverable
+ operation (less than 4 years), rendering the data on them unrecoverable [^62].
-  [^62].
 * A runaway process that uses up some shared, limited resource, such as CPU time, memory, disk
-  space, network bandwidth, or threads
+ space, network bandwidth, or threads [^63]. For example, a process that consumes too much memory while processing a large request may be
-  [^63].
-  For example, a process that consumes too much memory while processing a large request may be
 killed by the operating system. A bug in a client library could cause a much higher request
 volume than anticipated [^64].
 * A service that the system depends on slows down, becomes unresponsive, or starts returning
@ -567,16 +525,12 @@ uncorrelated hardware faults [^47]. For example:
 * An interaction between different systems results in emergent behavior that does not occur when
 each system was tested in isolation [^65].
 * Cascading failures, where a problem in one component causes another component to become overloaded
-  and slow down, which in turn brings down another component
+ and slow down, which in turn brings down another component [^66] [^67].
-  [[66](/en/ch2#Ulrich2016),
-  [67](/en/ch2#Fassbender2022)].
 The bugs that cause these kinds of software faults often lie dormant for a long time until they are
 triggered by an unusual set of circumstances. In those circumstances, it is revealed that the
 software is making some kind of assumption about its environment—and while that assumption is
-usually true, it eventually stops being true for some reason
+usually true, it eventually stops being true for some reason [^68] [^69].
-[[68](/en/ch2#Cook2000),
-[69](/en/ch2#Woods2017)].
 There is no quick solution to the problem of systematic faults in software. Lots of small things can
 help: carefully thinking about assumptions and interactions in the system; thorough testing; process
@ -590,8 +544,7 @@ human. Unlike machines, humans don’t just follow rules; their strength is bein
 adaptive in getting their job done. However, this characteristic also leads to unpredictability, and
 sometimes mistakes that can lead to failures, despite best intentions. For example, one study of
 large internet services found that configuration changes by operators were the leading cause of
-outages, whereas hardware faults (servers or network) played a role in only 10–25% of outages
+outages, whereas hardware faults (servers or network) played a role in only 10–25% of outages [^70].
-[^70].
 It is tempting to label such problems as “human error” and to wish that they could be solved by
 better controlling human behavior through tighter procedures and compliance with rules. However,
@ -602,8 +555,7 @@ Often complex systems have emergent behavior, in which unexpected interactions b
 may also lead to failures [^72].
 Various technical measures can help minimize the impact of human mistakes, including thorough
-testing (both hand-written tests and *property testing* on lots of random inputs)
+testing (both hand-written tests and *property testing* on lots of random inputs) [^38], rollback mechanisms for quickly
-[^38], rollback mechanisms for quickly
 reverting configuration changes, gradual roll-outs of new code, detailed and clear monitoring,
 observability tools for diagnosing production issues (see [“Problems with Distributed Systems”](/en/ch1#sec_introduction_dist_sys_problems)),
 and well-designed interfaces that encourage “the right thing” and discourage “the wrong thing”.
@ -627,8 +579,7 @@ As a general principle, when investigating an incident, you should be suspicious
 answers. “Bob should have been more careful when deploying that change” is not productive, but
 neither is “We must rewrite the backend in Haskell.” Instead, management should take the opportunity
 to learn the details of how the sociotechnical system works from the point of view of the people who
-work with it every day, and take steps to improve it based on this feedback
+work with it every day, and take steps to improve it based on this feedback [^71].
-[^71].
 # How Important Is Reliability?
@ -637,11 +588,9 @@ are also expected to work reliably. Bugs in business applications cause lost pro
 risks if figures are reported incorrectly), and outages of e-commerce sites can have huge costs in
 terms of lost revenue and damage to reputation.
-In many applications, a temporary outage of a few minutes or even a few hours is tolerable
+In many applications, a temporary outage of a few minutes or even a few hours is tolerable [^74],
-[^74],
 but permanent data loss or corruption would be catastrophic. Consider a parent who stores all their
-pictures and videos of their children in your photo application
+pictures and videos of their children in your photo application [^75]. How would they
-[^75]. How would they
 feel if that database was suddenly corrupted? Would they know how to restore it from a backup?
 As another example of how unreliable software can harm people, consider the Post Office Horizon
@ -651,8 +600,7 @@ Eventually it became clear that many of these shortfalls were due to bugs in the
 convictions have since been overturned [^76].
 What led to this, probably the largest miscarriage of justice in British history, is the fact that
 English law assumes that computers operate correctly (and hence, evidence produced by computers is
-reliable) unless there is evidence to the contrary
+reliable) unless there is evidence to the contrary [^77].
-[^77].
 Software engineers may laugh at the idea that software could ever be bug-free, but this is little
 solace to the people who were wrongfully imprisoned, declared bankrupt, or even committed suicide as
 a result of a wrongful conviction due to an unreliable computer system.
@ -728,8 +676,7 @@ If you can double the resources in order to handle twice the load, while keeping
 same, we say that you have *linear scalability*, and this is considered a good thing. Occasionally
 it is possible to handle twice the load with less than double the resources, due to economies of
 scale or a better distribution of peak load
-[[79](/en/ch2#Warfield2023_ch2),
+[[^79], [^80]].
-[80](/en/ch2#Brooker2023multitenancy)].
 Much more likely is that the cost grows faster than linearly, and there may be many reasons for the
 inefficiency. For example, if you have a lot of data, then processing a single write request may
 involve more work than if you have a small amount of data, even if the size of the request is the
@ -753,8 +700,7 @@ Another approach is the *shared-disk architecture*, which uses several machines
 CPUs and RAM, but which stores data on an array of disks that is shared between the machines, which
 are connected via a fast network: *Network-Attached Storage* (NAS) or *Storage Area Network* (SAN).
 This architecture has traditionally been used for on-premises data warehousing workloads, but
-contention and the overhead of locking limit the scalability of the shared-disk approach
+contention and the overhead of locking limit the scalability of the shared-disk approach [^81].
-[^81].
 By contrast, the *shared-nothing architecture*
 [^82]
@ -796,8 +742,7 @@ operate largely independently from each other. This is the underlying principle
 (see [“Microservices and Serverless”](/en/ch1#sec_introduction_microservices)), sharding ([Chapter 7](/en/ch7#ch_sharding)), stream processing
 ([Link to Come]), and shared-nothing architectures. However, the challenge is in knowing where to
 draw the line between things that should be together, and things that should be apart. Design
-guidelines for microservices can be found in other books
+guidelines for microservices can be found in other books [^84],
-[^84],
 and we discuss sharding of shared-nothing systems in [Chapter 7](/en/ch7#ch_sharding).
 Another good principle is not to make things more complicated than necessary. If a single-machine
@ -817,8 +762,7 @@ bugs that need fixing.
 It is widely recognized that the majority of the cost of software is not in its initial development,
 but in its ongoing maintenance—fixing bugs, keeping its systems operational, investigating failures,
 adapting it to new platforms, modifying it for new use cases, repaying technical debt, and adding
-new features [[85](/en/ch2#Ensmenger2016),
+new features [[^85], [^86]].
-[86](/en/ch2#Glass2002)].
 However, maintenance is also difficult. If a system has been successfully running for a long time,
 it may well use outdated technologies that not many engineers understand today (such as mainframes
@ -857,8 +801,7 @@ In large-scale systems consisting of many thousands of machines, manual maintena
 unreasonably expensive, and automation is essential. However, automation can be a two-edged sword:
 there will always be edge cases (such as rare failure scenarios) that require manual intervention
 from the operations team. Since the cases that cannot be handled automatically are the most complex
-issues, greater automation requires a *more* skilled operations team that can resolve those issues
+issues, greater automation requires a *more* skilled operations team that can resolve those issues [^88].
-[^88].
 Moreover, if an automated system goes wrong, it is often harder to troubleshoot than a system that
 relies on an operator to perform some actions manually. For that reason, it is not the case that
@ -866,8 +809,7 @@ more automation is always better for operability. However, some amount of automa
 and the sweet spot will depend on the specifics of your particular application and organization.
 Good operability means making routine tasks easy, allowing the operations team to focus their efforts
-on high-value activities. Data systems can do various things to make routine tasks easy, including
+on high-value activities. Data systems can do various things to make routine tasks easy, including [^89]:
-[^89]:
 * Allowing monitoring tools to check the system’s key metrics, and supporting observability tools
 (see [“Problems with Distributed Systems”](/en/ch1#sec_introduction_dist_sys_problems)) to give insights into the system’s runtime behavior.
@ -891,15 +833,13 @@ project mired in complexity is sometimes described as a *big ball of mud*
 When complexity makes maintenance hard, budgets and schedules are often overrun. In complex
 software, there is also a greater risk of introducing bugs when making a change: when the system is
 harder for developers to understand and reason about, hidden assumptions, unintended consequences,
-and unexpected interactions are more easily overlooked
+and unexpected interactions are more easily overlooked [^69].
-[^69].
 Conversely, reducing complexity greatly improves the maintainability of software, and thus
 simplicity should be a key goal for the systems we build.
 Simple systems are easier to understand, and therefore we should try to solve a given problem in the
 simplest way possible. Unfortunately, this is easier said than done. Whether something is simple or
-not is often a subjective matter of taste, as there is no objective standard of simplicity
+not is often a subjective matter of taste, as there is no objective standard of simplicity [^92].
-[^92].
 For example, one system may hide a complex implementation behind a simple interface, whereas another
 may have a simple implementation that exposes more internal detail to its users—which one is
 simpler?
@ -952,13 +892,12 @@ different word to refer to agility on a data system level: *evolvability*
 [^97].
 One major factor that makes change difficult in large systems is when some action is irreversible,
-and therefore that action needs to be taken very carefully
+and therefore that action needs to be taken very carefully [^98].
-[^98].
 For example, say you are migrating from one database to another: if you cannot switch back to the
 old system in case of problems with the new one, the stakes are much higher than if you can easily go
 back. Minimizing irreversibility improves flexibility.
-# Summary
+## Summary
 In this chapter we examined several examples of nonfunctional requirements: performance,
 reliability, scalability, and maintainability. Through these topics we have also encountered
@ -986,8 +925,7 @@ There are no easy answers on how to achieve these things, but one thing that can
 applications using well-understood building blocks that provide useful abstractions. The rest of
 this book will cover a selection of building blocks that have proved to be valuable in practice.
-##### References
+### References
 [^1]: Mike Cvet. [How We Learned to Stop Worrying and Love Fan-In at Twitter](https://www.youtube.com/watch?v=WEgCjwyXvwc). At *QCon San Francisco*, December 2016. 
 [^2]: Raffi Krikorian. [Timelines at Scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability/). At *QCon San Francisco*, November 2012. Archived at [perma.cc/V9G5-KLYK](https://perma.cc/V9G5-KLYK) 
--- a/content/en/ch3.md
+++ b/content/en/ch3.md
@ -54,12 +54,10 @@ In contrast, with most programming languages you would have to write an *algorit
 the computer which operations to perform in which order. A declarative query language is attractive
 because it is typically more concise and easier to write than an explicit algorithm. But more
 importantly, it also hides implementation details of the query engine, which makes it possible for
-the database system to introduce performance improvements without requiring any changes to queries.
+the database system to introduce performance improvements without requiring any changes to queries [^1].
 [^1].
 For example, a database might be able to execute a declarative query in parallel across multiple CPU
-cores and machines, without you having to worry about how to implement that parallelism
+cores and machines, without you having to worry about how to implement that parallelism [^2].
 [^2].
 In a hand-coded algorithm it would be a lot of work to implement such parallel execution yourself.
 # Relational Model versus Document Model
@ -79,11 +77,9 @@ Over the years, there have been many competing approaches to data storage and qu
 and early 1980s, the *network model* and the *hierarchical model* were the main alternatives, but
 the relational model came to dominate them. Object databases came and went again in the late 1980s
 and early 1990s. XML databases appeared in the early 2000s, but have only seen niche adoption. Each
-competitor to the relational model generated a lot of hype in its time, but it never lasted
+competitor to the relational model generated a lot of hype in its time, but it never lasted [^4].
 [^4].
 Instead, SQL has grown to incorporate other data types besides its relational core—for example,
-adding support for XML, JSON, and graph data
+adding support for XML, JSON, and graph data [^5].
 [^5].
 In the 2010s, *NoSQL* was the latest buzzword that tried to overthrow the dominance of relational
 databases. NoSQL refers not to a single technology, but a loose set of ideas around new data models,
@ -120,8 +116,7 @@ mismatch*.
 ### Object-relational mapping (ORM)
 Object-relational mapping (ORM) frameworks like ActiveRecord and Hibernate reduce the amount of
-boilerplate code required for this translation layer, but they are often criticized
+boilerplate code required for this translation layer, but they are often criticized [^6].
 [^6].
 Some commonly cited problems are:
 * ORMs are complex and can’t completely hide the differences between the two models, so developers
@ -211,8 +206,7 @@ this in [“Schema flexibility in the document model”](/en/ch3#sec_datamodels_
 The JSON representation has better *locality* than the multi-table schema in
 [Figure 3-1](/en/ch3#fig_obama_relational) (see [“Data locality for reads and writes”](/en/ch3#sec_datamodels_document_locality)). If you want to fetch a profile
 in the relational example, you need to either perform multiple queries (query each table by
-`user_id`) or perform a messy multi-way join between the `users` table and its subordinate tables
+`user_id`) or perform a messy multi-way join between the `users` table and its subordinate tables [^8].
 [^8].
 In the JSON representation, all the relevant information is in one place, making the query both
 faster and simpler.
@ -227,8 +221,8 @@ structure explicit (see [Figure 3-2](/en/ch3#fig_json_tree)).
 > [!NOTE]
 > This type of relationship is sometimes called *one-to-few* rather than *one-to-many*, since a résumé
 > typically has a small number of positions
-> [[9](/en/ch3#Zola2014),
+> [[^9],
-> [10](/en/ch3#Andrews2023)].
+> [^10]].
 > In situations where there may be a genuinely large number of related items—say, comments on a
 > celebrity’s social media post, of which there could be many thousands—embedding them all in the same
 > document may be too unwieldy, so the relational approach in [Figure 3-1](/en/ch3#fig_obama_relational) is preferable.
@ -347,8 +341,7 @@ denormalized representation consistent.
 However, the implementation of materialized timelines at X (formerly Twitter) does not store the
 actual text of each post: each entry actually only stores the post ID, the ID of the user who posted
-it, and a little bit of extra information to identify reposts and replies
+it, and a little bit of extra information to identify reposts and replies [^11].
 [^11].
 In other words, it is a precomputed result of (approximately) the following query:
 ```
@ -363,8 +356,7 @@ This means that whenever the timeline is read, the service still needs to perfor
 the post ID to fetch the actual post content (as well as statistics such as the number of likes
 and replies), and look up the sender’s profile by ID (to get their username, profile picture, and
 other details). This process of looking up the human-readable information by ID is called
-*hydrating* the IDs, and it is essentially a join performed in application code
+*hydrating* the IDs, and it is essentially a join performed in application code [^11].
 [^11].
 The reason for storing only IDs in the precomputed timeline is that the data they refer to is
 fast-changing: the number of likes and replies may change multiple times per second on a popular
@ -495,8 +487,7 @@ down into subdimensions. For example, there could be separate tables for brands
 product categories, and each row in the `dim_product` table could reference the brand and category
 as foreign keys, rather than storing them as strings in the `dim_product` table. Snowflake schemas
 are more normalized than star schemas, but star schemas are often preferred because
-they are simpler for analysts to work with
+they are simpler for analysts to work with [^12].
 [^12].
 In a typical data warehouse, tables are often quite wide: fact tables often have over 100 columns,
 sometimes several hundred. Dimension tables can also be wide, as they include all the metadata that
@ -549,9 +540,7 @@ such applications well, because the items (or their IDs) can simply be stored in
 determine their order. In relational databases there isn’t a standard way of representing such
 reorderable lists, and various tricks are used: sorting by an integer column (requiring renumbering
 when you insert into the middle), a linked list of IDs, or fractional indexing
-[[14](/en/ch3#Nelson2018),
+[[^14], [^15], [^16]].
 [15](/en/ch3#Wallace2017),
 [16](/en/ch3#Greenspan2020)].
 ### Schema flexibility in the document model
@ -570,15 +559,13 @@ when the data is written) [^18].
 Schema-on-read is similar to dynamic (runtime) type checking in programming languages, whereas
 schema-on-write is similar to static (compile-time) type checking. Just as the advocates of static
-and dynamic type checking have big debates about their relative merits
+and dynamic type checking have big debates about their relative merits [^19],
 [^19],
 enforcement of schemas in databases is a contentious topic, and in general there’s no right or wrong
 answer.
 The difference between the approaches is particularly noticeable in situations where an application
 wants to change the format of its data. For example, say you are currently storing each user’s full
-name in one field, and you instead want to store the first name and last name separately
+name in one field, and you instead want to store the first name and last name separately [^20].
 [^20].
 In a document database, you would just start writing new documents with the new fields and have
 code in the application that handles the case when old documents are read. For example:
@ -606,10 +593,7 @@ since every row needs to be rewritten, and other schema operations (such as chan
 of a column) also typically require the entire table to be copied.
 Various tools exist to allow this type of schema change to be performed in the background without downtime
-[[21](/en/ch3#Percona2023),
+[[^21], [^22], [^23], [^24]],
 [22](/en/ch3#Noach2016),
 [23](/en/ch3#Mukherjee2022),
 [24](/en/ch3#PerezAradros2023)],
 but performing such migrations on large databases remains operationally challenging. Complicated
 migrations can be avoided by only adding the `first_name` column with a default value of `NULL`
 (which is fast), and filling it in at read time, like you would with a document database.
@ -644,13 +628,11 @@ and avoid frequent small updates to a document.
 However, the idea of storing related data together for locality is not limited to the document
 model. For example, Google’s Spanner database offers the same locality properties in a relational
 data model, by allowing the schema to declare that a table’s rows should be interleaved (nested)
-within a parent table
+within a parent table [^25].
 [^25].
 Oracle allows the same, using a feature called *multi-table index cluster tables*
 [^26].
 The *wide-column* data model popularized by Google’s Bigtable, and used e.g. in HBase and Accumulo,
-has a concept of *column families*, which have a similar purpose of managing locality
+has a concept of *column families*, which have a similar purpose of managing locality [^27].
 [^27].
 ### Query languages for documents
@ -660,10 +642,7 @@ varied. Some allow only key-value access by primary key, while others also offer
 to query for values inside documents, and some provide rich query languages.
 XML databases are often queried using XQuery and XPath, which are designed to allow complex queries,
-including joins across multiple documents, and also format their results as XML
+including joins across multiple documents, and also format their results as XML [^28]. JSON Pointer [^29] and JSONPath [^30] provide an equivalent to XPath for JSON.
 [^28]. JSON Pointer
 [^29] and JSONPath
 [^30] provide an equivalent to XPath for JSON.
 MongoDB’s aggregation pipeline, whose `$lookup` operator for joins we saw in
 [“Normalization, Denormalization, and Joins”](/en/ch3#sec_datamodels_normalization), is an example of a query language for collections of JSON
@ -713,8 +692,7 @@ matter of taste.
 ### Convergence of document and relational databases
 Document databases and relational databases started out as very different approaches to data
-management, but they have grown more similar over time
+management, but they have grown more similar over time [^31].
 [^31].
 Relational databases added support for JSON types and query operators, and the ability to index
 properties inside documents. Some document databases (such as MongoDB, Couchbase, and RethinkDB)
 added support for joins, secondary indexes, and declarative query languages.
@ -759,8 +737,7 @@ Road or rail networks
 Well-known algorithms can operate on these graphs: for example, map navigation apps search for
 the shortest path between two points in a road network, and
 PageRank can be used on the web graph to determine the
-popularity of a web page and thus its ranking in search results
+popularity of a web page and thus its ranking in search results [^32].
 [^32].
 Graphs can be represented in several different ways. In the *adjacency list* model, each vertex
 stores the IDs of its neighbor vertices that are one edge away. Alternatively, you can use an
@ -786,16 +763,14 @@ types of objects in a single database. For example:
 as Wikidata, also publish graph data in a structured form.
 There are several different, but related, ways of structuring and querying data in graphs. In this
-section we will discuss the *property graph* model (implemented by Neo4j, Memgraph, KùzuDB
+section we will discuss the *property graph* model (implemented by Neo4j, Memgraph, KùzuDB [^35],
 [^35],
 and others [^36])
 and the *triple-store* model (implemented by Datomic, AllegroGraph, Blazegraph, and others). These
 models are fairly similar in what they can express, and some graph databases (such as Amazon
 Neptune) support both models.
 We will also look at four query languages for graphs (Cypher, SPARQL, Datalog, and GraphQL), as well
-as SQL support for querying graphs. Other graph query languages exist, such as Gremlin
+as SQL support for querying graphs. Other graph query languages exist, such as Gremlin [^37],
 [^37],
 but these will give us a representative overview.
 To illustrate these different languages and models, this section uses the graph shown in
@ -899,11 +874,9 @@ extended to accommodate changes in your application’s data structures.
 *Cypher* is a query language for property graphs, originally created for the Neo4j graph database,
 and later developed into an open standard as *openCypher*
 [^38].
-Besides Neo4j, Cypher is supported by Memgraph, KùzuDB
+Besides Neo4j, Cypher is supported by Memgraph, KùzuDB [^35],
 [^35],
 Amazon Neptune, Apache AGE (with storage in PostgreSQL), and others. It is named after a character
-in the movie *The Matrix* and is not related to ciphers in cryptography
+in the movie *The Matrix* and is not related to ciphers in cryptography [^39].
 [^39].
 [Example 3-4](/en/ch3#fig_cypher_create) shows the Cypher query to insert the lefthand portion of
 [Figure 3-6](/en/ch3#fig_datamodels_graph) into a graph database. The rest of the graph can be added similarly. Each
@ -1071,11 +1044,8 @@ Oracle has a different SQL extension for recursive queries, which it calls *hier
 [^41].
 However, the situation may be improving: at the time of writing, there are plans to add a graph
-query language called GQL to the SQL standard [[42](/en/ch3#Deutsch2022),
+query language called GQL to the SQL standard [[^42], [^43]],
-[43](/en/ch3#Green2019)],
+which will provide a syntax inspired by Cypher, GSQL [^44], and PGQL [^45].
 which will provide a syntax inspired by Cypher, GSQL
 [^44], and PGQL
 [^45].
 ## Triple-Stores and SPARQL
@ -1109,8 +1079,7 @@ The subject of a triple is equivalent to a vertex in a graph. The object is one
 > book nevertheless calls them triple-stores.
 [Example 3-7](/en/ch3#fig_graph_n3_triples) shows the same data as in [Example 3-4](/en/ch3#fig_cypher_create), written as
-triples in a format called *Turtle*, a subset of *Notation3* (*N3*)
+triples in a format called *Turtle*, a subset of *Notation3* (*N3*) [^48].
 [^48].
 ##### Example 3-7. A subset of the data in [Figure 3-6](/en/ch3#fig_datamodels_graph), represented as Turtle triples
@ -1158,16 +1127,12 @@ Some of the research and development effort on triple stores was motivated by th
 early-2000s effort to facilitate internet-wide data exchange by publishing data not only as
 human-readable web pages, but also in a standardized, machine-readable format. Although the Semantic
 Web as originally envisioned did not succeed
-[[49](/en/ch3#Target2018),
+[[^49], [^50]],
 [50](/en/ch3#MendelGleason2022)],
 the legacy of the Semantic Web project lives on in a couple of specific technologies: *linked data*
 standards such as JSON-LD [^51],
-*ontologies* used in biomedical science
+*ontologies* used in biomedical science [^52],
-[^52],
+Facebook’s Open Graph protocol [^53]
-Facebook’s Open Graph protocol
+(which is used for link unfurling [^54]),
 [^53]
 (which is used for link unfurling
 [^54]),
 knowledge graphs such as Wikidata, and standardized vocabularies for structured data maintained by
 [`schema.org`](https://schema.org/).
@ -1178,8 +1143,7 @@ for applications.
 ### The RDF data model
 The Turtle language we used in [Example 3-8](/en/ch3#fig_graph_n3_shorthand) is actually a way of encoding data in the
-*Resource Description Framework* (RDF)
+*Resource Description Framework* (RDF) [^55],
 [^55],
 a data model that was designed for the Semantic Web. RDF data can also be encoded in other ways, for
 example (more verbosely) in XML, as shown in [Example 3-9](/en/ch3#fig_graph_rdf_xml). Tools like Apache Jena can
 automatically convert between different RDF encodings.
@ -1229,8 +1193,7 @@ just specify this prefix once at the top of the file, and then forget about it.
 ### The SPARQL query language
-*SPARQL* is a query language for triple-stores using the RDF data model
+*SPARQL* is a query language for triple-stores using the RDF data model [^56].
 [^56].
 (It is an acronym for *SPARQL Protocol and RDF Query Language*, pronounced “sparkle.”)
 It predates Cypher, and since Cypher’s pattern matching is borrowed from SPARQL, they look quite
 similar.
@ -1275,9 +1238,7 @@ various other triple stores [^36].
 ## Datalog: Recursive Relational Queries
 Datalog is a much older language than SPARQL or Cypher: it arose from academic research in the 1980s
-[[57](/en/ch3#Green2013),
+[[^57], [^58], [^59]].
 [58](/en/ch3#Ceri1989),
 [59](/en/ch3#Abiteboul1995)].
 It is less well known among software engineers and not widely supported in mainstream databases, but
 it ought to be better-known since it is a very expressive language that is particularly powerful for
 complex queries. Several niche databases, including Datomic, LogicBlox, CozoDB, and LinkedIn’s
@ -1397,8 +1358,7 @@ APIs.
 GraphQL’s flexibility comes at a cost. Organizations that adopt GraphQL often need tooling to
 convert GraphQL queries into requests to internal services, which often use REST or gRPC (see
-[Chapter 5](/en/ch5#ch_encoding)). Authorization, rate limiting, and performance challenges are additional concerns
+[Chapter 5](/en/ch5#ch_encoding)). Authorization, rate limiting, and performance challenges are additional concerns [^61].
 [^61].
 GraphQL’s query language is also limited since GraphQL queries come from an untrusted source. The language
 does not allow anything that could be expensive to execute, since otherwise users could perform
 denial-of-service attacks on a server by running lots of expensive queries. In particular, GraphQL
@ -1538,8 +1498,7 @@ the status of each booking, another that computes charts for the conference orga
 and a third that generates files for the printer that produces the attendees’ badges.
 The idea of using events as the source of truth, and expressing every state change as an event, is
-known as *event sourcing* [[62](/en/ch3#Betts2012),
+known as *event sourcing* [[^62], [^63]].
 [63](/en/ch3#Young2014)].
 The principle of maintaining separate read-optimized representations and deriving them from the
 write-optimized representation is called *command query responsibility segregation (CQRS)*
 [^64].
@ -1692,17 +1651,15 @@ like. Dataframes are flexible enough to allow data to be gradually evolved from
 into a matrix representation, while giving the data scientist control over the representation that
 is most suitable for achieving the goals of the data analysis or model training process.
-There are also databases such as TileDB
+There are also databases such as TileDB [^66]
 [^66]
 that specialize in storing large multidimensional arrays of numbers; they are called *array
 databases* and are most commonly used for scientific datasets such as geospatial measurements
 (raster data on a regularly spaced grid), medical imaging, or observations from astronomical
 telescopes [^67].
 Dataframes are also used in the financial industry for representing *time series data*, such as the
-prices of assets and trades over time
+prices of assets and trades over time [^68].
 [^68].
-# Summary
+## Summary
 Data models are a huge subject, and in this chapter we have taken a quick look at a broad variety of
 different models. We didn’t have space to go into all the details of each model, but hopefully the
@ -1764,10 +1721,11 @@ a few brief examples:
 We have to leave it there for now. In the next chapter we will discuss some of the trade-offs that
 come into play when *implementing* the data models described in this chapter.
 ##### Footnotes
-##### References
+
 ### Summary
--- a/content/en/ch4.md
+++ b/content/en/ch4.md
@ -123,8 +123,7 @@ possible write operation. Any kind of index usually slows down writes, because t
 to be updated every time data is written.
 This is an important trade-off in storage systems: well-chosen indexes speed up read queries, but
-every index consumes additional disk space and slows down writes, sometimes substantially
+every index consumes additional disk space and slows down writes, sometimes substantially [^1].
 [^1].
 For this reason, databases don’t usually index everything by default, but require you—the person
 writing the application or administering the database—to choose indexes manually, using your
 knowledge of the application’s typical query patterns. You can then choose the indexes that give
@ -177,8 +176,7 @@ Now you do not need to keep all the keys in memory: you can group the key-value
 SSTable into *blocks* of a few kilobytes, and then store the first key of each block in the index.
 This kind of index, which stores only some of the keys, is called *sparse*. This index is stored in
 a separate part of the SSTable, for example using an immutable B-tree, a trie, or another data
-structure that allows queries to quickly look up a particular key
+structure that allows queries to quickly look up a particular key [^4].
 [^4].
 For example, in [Figure 4-2](/en/ch4#fig_storage_sstable_index), the first key of one block is `handbag`, and the
 first key of the next block is `handsome`. Now say you’re looking for the key `handiwork`, which
@ -219,8 +217,7 @@ log and a sorted file:
 4. From time to time, run a merging and compaction process in the background to combine segment files
 and to discard overwritten or deleted values.
-Merging segments works similarly to the *mergesort* algorithm
+Merging segments works similarly to the *mergesort* algorithm [^5]. The process is illustrated in
 [^5]. The process is illustrated in
 [Figure 4-3](/en/ch4#fig_storage_sstable_merging): start reading the input files side by side, look at the first key
 in each file, copy the lowest key (according to the sort order) to the output file, and repeat. If
 the same key appears in more than one input file, keep only the more recent value. This produces a
@ -242,18 +239,14 @@ called a *tombstone* to the data file. When log segments are merged, the tombsto
 process to discard any previous values for the deleted key. Once the tombstone is merged into the
 oldest segment, it can be dropped.
-The algorithm described here is essentially what is used in RocksDB
+The algorithm described here is essentially what is used in RocksDB [^7],
-[^7],
+Cassandra, Scylla, and HBase [^8],
-Cassandra, Scylla, and HBase
+all of which were inspired by Google’s Bigtable paper [^9]
 [^8],
 all of which were inspired by Google’s Bigtable paper
 [^9]
 (which introduced the terms *SSTable* and *memtable*).
 The algorithm was originally published in 1996 under the name *Log-Structured Merge-Tree* or *LSM-Tree*
 [^10],
-building on earlier work on log-structured filesystems
+building on earlier work on log-structured filesystems [^11].
 [^11].
 For this reason, storage engines that are based on the principle of merging and compacting sorted
 files are often called *LSM storage engines*.
@ -265,8 +258,7 @@ requests to using the new merged segment instead of the old segments, and then t
 can be deleted.
 The segment files don’t necessarily have to be stored on local disk: they are also well suited for
-writing to object storage. SlateDB and Delta Lake
+writing to object storage. SlateDB and Delta Lake [^12]
 [^12].
 take this approach, for example.
 Having immutable segment files also simplifies crash recovery: if a crash happens while writing out
@ -287,8 +279,7 @@ appears in a particular SSTable.
 [Figure 4-4](/en/ch4#fig_storage_bloom) shows an example of a Bloom filter containing two keys and 16 bits (in
 reality, it would contain more keys and more bits). For every key in the SSTable we compute a hash
-function, producing a set of numbers that are then interpreted as indexes into the array of bits
+function, producing a set of numbers that are then interpreted as indexes into the array of bits [^14].
 [^14].
 We set the bits corresponding to those indexes to 1, and leave the rest as 0. For example, the key
 `handbag` hashes to the numbers (2, 9, 4), so we set the 2nd, 9th, and 4th bits to 1. The bitmap
 is then stored as part of the SSTable, along with the sparse index of keys. This takes a bit of
@ -311,8 +302,7 @@ as if a key is present, even though it isn’t, is called a *false positive*.
 The probability of false positives depends on the number of keys, the number of bits set per key,
 and the total number of bits in the Bloom filter. You can use an online calculator tool to work out
-the right parameters for your application
+the right parameters for your application [^15].
 [^15].
 As a rule of thumb, you need to allocate 10 bits of Bloom filter space for every key in the SSTable
 to get a false positive probability of 1%, and the probability is reduced tenfold for every 5
 additional bits you allocate per key.
@ -331,8 +321,7 @@ In the context of an LSM storage engines, false positives are no problem:
 An important detail is how the LSM storage chooses when to perform compaction, and which SSTables to
 include in a compaction. Many LSM-based storage systems allow you to configure which compaction
 strategy to use, and some of the common choices are
-[[16](/en/ch4#Luo2019),
+[[^16], [^17]]:
 [17](/en/ch4#Sarkar2022)]:
 Size-tiered compaction
 : Newer and smaller SSTables are successively merged into older and larger SSTables. The SSTables
@ -360,16 +349,14 @@ Many databases run as a service that accepts queries over a network, but there a
 databases that don’t expose a network API. Instead, they are libraries that run in the same process
 as your application code, typically reading and writing files on the local disk, and you interact
 with them through normal function calls. Examples of embedded storage engines include RocksDB,
-SQLite, LMDB, DuckDB, and KùzuDB
+SQLite, LMDB, DuckDB, and KùzuDB [^19].
 [^19].
 Embedded databases are very commonly used in mobile apps to store the local user’s data. On the
 backend, they can be an appropriate choice if the data is small enough to fit on a single machine,
 and if there are not many concurrent transactions. For example, in a multitenant system in which
 each tenant is small enough and completely separate from others (i.e., you do not need to run
 queries that combine data from multiple tenants), you can potentially use a separate embedded
-database instance per tenant
+database instance per tenant [^20].
 [^20].
 The storage and retrieval methods we discuss in this chapter are used in both embedded and in
 client-server databases. In [Chapter 6](/en/ch6#ch_replication) and [Chapter 7](/en/ch7#ch_sharding) we will discuss techniques
@ -381,8 +368,7 @@ The log-structured approach is popular, but it is not the only form of key-value
 widely used structure for reading and writing database records by key is the *B-tree*.
 Introduced in 1970 [^21]
-and called “ubiquitous” less than 10 years later
+and called “ubiquitous” less than 10 years later [^22],
 [^22],
 B-trees have stood the test of time very well. They remain the standard index implementation in
 almost all relational databases, and many nonrelational databases use them too.
@ -441,8 +427,7 @@ the new key), and a page for 337–344. We also have to update the parent page t
 both children, with a boundary value of 337 between them. If the parent page doesn’t have enough
 space for the new reference, it may also need to be split, and the splits can continue all the way
 to the root of the tree. When the root is split, we make a new root above it. Deleting keys (which
-may require nodes to be merged) is more complex
+may require nodes to be merged) is more complex [^5].
 [^5].
 This algorithm ensures that the tree remains *balanced*: a B-tree with *n* keys always has a depth
 of *O*(log *n*). Most databases can fit into a B-tree that is three or four levels deep, so
@ -467,8 +452,7 @@ In order to make the database resilient to crashes, it is common for B-tree impl
 include an additional data structure on disk: a *write-ahead log* (WAL). This is an append-only file
 to which every B-tree modification must be written before it can be applied to the pages of the tree
 itself. When the database comes back up after a crash, this log is used to restore the B-tree back
-to a consistent state [[2](/en/ch4#Graefe2011),
+to a consistent state [[^2], [^24]].
 [24](/en/ch4#Mohan1992)].
 In filesystems, the equivalent mechanism is known as *journaling*.
 To improve performance, B-tree implementations typically don’t immediately write every modified page
@ -501,8 +485,7 @@ mention just a few:
 ## Comparing B-Trees and LSM-Trees
 As a rule of thumb, LSM-trees are better suited for write-heavy applications, whereas B-trees are faster for reads
-[[27](/en/ch4#Athanassoulis2016),
+[[^27], [^28]].
 [28](/en/ch4#Stopford2015)].
 However, benchmarks are often sensitive to details of the workload. You need to test systems with
 your particular workload in order to make a valid comparison. Moreover, it’s not a strict either/or
 choice between LSM and B-trees: storage engines sometimes blend characteristics of both approaches,
@ -522,21 +505,18 @@ Range queries are simple and fast on B-trees, as they can use the sorted structu
 LSM storage, range queries can also take advantage of the SSTable sorting, but they need to scan all
 the segments in parallel and combine the results. Bloom filters don’t help for range queries (since
 you would need to compute the hash of every possible key within the range, which is impractical),
-making range queries more expensive than point queries in the LSM approach
+making range queries more expensive than point queries in the LSM approach [^29].
 [^29].
 High write throughput can cause latency spikes in a log-structured storage engine if the
 memtable fills up. This happens if data can’t be written out to disk fast enough, perhaps because
 the compaction process cannot keep up with incoming writes. Many storage engines, including RocksDB,
 perform *backpressure* in this situation: they suspend all reads and writes until the memtable has
 been written out to disk
-[[30](/en/ch4#Balmau2019),
+[[^30], [^31]].
 [31](/en/ch4#RocksDBTuning)].
 Regarding read throughput, modern SSDs (and especially NVMe) can perform many independent read
 requests in parallel. Both LSM-trees and B-trees are able to provide high read throughput, but
-storage engines need to be carefully designed to take advantage of this parallelism
+storage engines need to be carefully designed to take advantage of this parallelism [^32].
 [^32].
 ### Sequential vs. random writes
@ -568,17 +548,14 @@ The reason is that flash memory can be read or written one page (typically 4 Ki
 but it can only be erased one block (typically 512 KiB) at a time. Some of the pages in a block
 may contain valid data, whereas others may contain data that is no longer needed. Before erasing a
 block, the controller must first move pages containing valid data into other blocks; this process is
-called *garbage collection* (GC)
+called *garbage collection* (GC) [^33].
 [^33].
 A sequential write workload writes larger chunks of data at a time, so it is likely that a whole
 512 KiB block belongs to a single file; when that file is later deleted again, the whole block
 can be erased without having to perform any GC. On the other hand, with a random write workload, it
 is more likely that a block contains a mixture of pages with valid and invalid data, so the GC has
 to perform more work before a block can be erased
-[[34](/en/ch4#Vanlightly2023nvme),
+[[^34], [^35], [^36]].
 [35](/en/ch4#Alibaba2019_ch4),
 [36](/en/ch4#Hu2010)].
 The write bandwidth consumed by GC is then not available for the application. Moreover, the
 additional writes performed by GC contribute to wear on the flash memory; therefore, random writes
@ -591,14 +568,12 @@ operations on the underlying disk. With LSM-trees, a value is first written to t
 durability, then again when the memtable is written to disk, and again every time the key-value pair
 is part of a compaction. (If the values are significantly larger than the keys, this overhead can be
 reduced by storing values separately from keys, and performing compaction only on SSTables
-containing keys and references to values
+containing keys and references to values [^37].)
 [^37].)
 A B-tree index must write every piece of data at least twice: once to the write-ahead log, and once
 to the tree page itself. In addition, it sometimes needs to write out an entire page, even if only
 a few bytes in that page changed, to ensure the B-tree can be correctly recovered after a crash or
-power failure [[38](/en/ch4#Zaitsev2006),
+power failure [[^38], [^39]].
 [39](/en/ch4#Vondra2016)].
 If you take the total number of bytes written to disk in some workload, and divide by the number of
 bytes you would have to write if you simply wrote an append-only log with no index, you get the
@ -610,8 +585,7 @@ handle within the available disk bandwidth.
 Write amplification is a problem in both LSM-trees and B-trees. Which one is better depends on
 various factors, such as the length of your keys and values, and how often you overwrite existing
 keys versus insert new ones. For typical workloads, LSM-trees tend to have lower write amplification
-because they don’t have to write entire pages and they can compress chunks of the SSTable
+because they don’t have to write entire pages and they can compress chunks of the SSTable [^40].
 [^40].
 This is another factor that makes LSM storage engines well suited for write-heavy workloads.
 Besides affecting throughput, write amplification is also relevant for the wear on SSDs: a storage
@ -636,8 +610,7 @@ the data files anyway, and SSTables don’t have pages with unused space. Moreov
 key-value pairs can better be compressed in SSTables, and thus often produce smaller files on disk
 than B-trees. Keys and values that have been overwritten continue to consume space until they are
 removed by a compaction, but this overhead is quite low when using leveled compaction
-[[40](/en/ch4#Callaghan2015),
+[[^40], [^41]].
 [41](/en/ch4#Callaghan2016rocksdb)].
 Size-tiered compaction (see [“Compaction strategies”](/en/ch4#sec_storage_lsm_compaction)) uses more disk space, especially
 temporarily during compaction.
@ -737,11 +710,9 @@ easily be backed up, inspected, and analyzed by external utilities.
 Products such as VoltDB, SingleStore, and Oracle TimesTen are in-memory databases with a relational model,
 and the vendors claim that they can offer big performance improvements by removing all the overheads
 associated with managing on-disk data structures
-[[46](/en/ch4#Stonebraker2007),
+[[^46], [^47]].
 [47](/en/ch4#VoltDB2014uj)].
 RAMCloud is an open source, in-memory key-value store with durability (using a log-structured
-approach for the data in memory as well as the data on disk)
+approach for the data in memory as well as the data on disk) [^48].
 [^48].
 Redis and Couchbase provide weak durability by writing to disk asynchronously.
@ -749,8 +720,7 @@ Counterintuitively, the performance advantage of in-memory databases is not due
 they don’t need to read from disk. Even a disk-based storage engine may never need to read from disk
 if you have enough memory, because the operating system caches recently used disk blocks in memory
 anyway. Rather, they can be faster because they can avoid the overheads of encoding in-memory data
-structures in a form that can be written to disk
+structures in a form that can be written to disk [^49].
 [^49].
 Besides performance, another interesting area for in-memory databases is providing data models that
 are difficult to implement with disk-based indexes. For example, Redis offers a database-like
@ -774,10 +744,7 @@ transaction processing and data warehousing in the same product. However, these
 and analytical processing (HTAP) databases (introduced in [“Data Warehousing”](/en/ch1#sec_introduction_dwh)) are increasingly
 becoming two separate storage and query engines, which happen to be accessible through a common SQL
 interface
-[[50](/en/ch4#Larson2013),
+[[^50], [^51], [^52], [^53]].
 [51](/en/ch4#Farber2012),
 [52](/en/ch4#Stonebraker2013),
 [53](/en/ch4#Prout2022_ch4)].
 ## Cloud Data Warehouses
@ -790,16 +757,14 @@ of scalable cloud infrastructure like object storage and serverless computation
 Cloud data warehouses tend to integrate better with other cloud services and to be more elastic.
 For example, many cloud warehouses support automatic log ingestion, and offer easy integration with
 data processing frameworks such as Google Cloud’s Dataflow or Amazon Web Services’ Kinesis. These
-warehouses are also more elastic because they decouple query computation from the storage layer
+warehouses are also more elastic because they decouple query computation from the storage layer [^54].
 [^54].
 Data is persisted on object storage rather than local disks, which makes it easy to adjust storage
 capacity and compute resources for queries independently, as we previously saw in
 [“Cloud-Native System Architecture”](/en/ch1#sec_introduction_cloud_native).
 Open source data warehouses such as Apache Hive, Trino, and Apache Spark have also evolved with the
 cloud. As data storage for analytics has moved to data lakes on object storage, open source warehouses
-have begun to break apart
+have begun to break apart [^55]. The following
 [^55]. The following
 components, which were previously integrated in a single system such as Apache Hive, are now often
 implemented as separate components:
@ -844,8 +809,7 @@ efficiently becomes a challenging problem. Dimension tables are usually much sma
 rows), so in this section we will focus on storage of facts.
 Although fact tables are often over 100 columns wide, a typical data warehouse query only accesses 4
-or 5 of them at one time (`"SELECT *"` queries are rarely needed for analytics)
+or 5 of them at one time (`"SELECT *"` queries are rarely needed for analytics) [^52]. Take the query in
 [^52]. Take the query in
 [Example 4-1](/en/ch4#fig_storage_analytics_query): it accesses a large number of rows (every occurrence of someone
 buying fruit or candy during the 2024 calendar year), but it only needs to access three columns of
 the `fact_sales` table: `date_key`, `product_sk`,
@ -882,8 +846,7 @@ memory, parse them, and filter out those that don’t meet the required conditio
 long time.
 The idea behind *column-oriented* (or *columnar*) storage is simple: don’t store all the values from
-one row together, but store all the values from each *column* together instead
+one row together, but store all the values from each *column* together instead [^56].
 [^56].
 If each column is stored separately, a query only needs to read and parse those columns that are
 used in that query, which can save a lot of work. [Figure 4-7](/en/ch4#fig_column_store) shows this principle using
 an expanded version of the fact table from [Figure 3-5](/en/ch3#fig_dwh_schema).
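As a rough illustration of the column-oriented layout described above, the following Python sketch pivots a few rows into per-column lists and answers a query by reading only two of them. The sample rows and the helper names `to_columns`, `total_quantity`, and `reassemble_row` are assumptions made for this example; they are not taken from any real storage engine, and the column names merely echo the `fact_sales` table.

```python
# Hypothetical sample of fact rows; real fact tables are far wider.
rows = [
    {"date_key": 20240102, "product_sk": 31, "store_sk": 4, "quantity": 1},
    {"date_key": 20240102, "product_sk": 69, "store_sk": 5, "quantity": 3},
    {"date_key": 20240103, "product_sk": 31, "store_sk": 4, "quantity": 2},
]


def to_columns(rows):
    """Pivot row dicts into one list per column, preserving row order.

    Assumes every row has the same set of keys.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_quantity(columns, product_sk):
    """Sum quantity for one product, touching only two of the columns."""
    return sum(
        quantity
        for product, quantity in zip(columns["product_sk"], columns["quantity"])
        if product == product_sk
    )


def reassemble_row(columns, k):
    """Rebuild the k-th row by taking the k-th entry of every column."""
    return {name: values[k] for name, values in columns.items()}


columns = to_columns(rows)
```

Here `total_quantity(columns, 31)` never looks at `date_key` or `store_sk`, which is the saving that columnar storage aims for. `reassemble_row` works only because every column keeps its values in the same row order.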
@ -907,33 +870,24 @@ individual columns and put them together to form the 23rd row of the table.
 In fact, columnar storage engines don’t actually store an entire column (containing perhaps
 trillions of rows) in one go. Instead, they break the table into blocks of thousands or millions of
-rows, and within each block they store the values from each column separately
+rows, and within each block they store the values from each column separately [^60].
 [^60].
 Since many queries are restricted to a particular date range, it is common to make each block
 contain the rows for a particular timestamp range. A query then only needs to load the columns it
 needs in those blocks that overlap with the required date range.
-Columnar storage is used in almost all analytic databases nowadays
+Columnar storage is used in almost all analytic databases nowadays [^60],
-[^60],
+ranging from large-scale cloud data warehouses such as Snowflake [^61]
-ranging from large-scale cloud data warehouses such as Snowflake
+to single-node embedded databases such as DuckDB [^62],
-[^61]
+and product analytics systems such as Pinot [^63]
 to single-node embedded databases such as DuckDB
 [^62],
 and product analytics systems such as Pinot
 [^63]
 and Druid [^64].
 It is used in storage formats such as Parquet, ORC
-[[65](/en/ch4#Liu2023),
+[[^65], [^66]],
 [66](/en/ch4#Zeng2023)],
 Lance [^67],
 and Nimble [^68],
 and in-memory analytics formats like Apache Arrow
-[[65](/en/ch4#Liu2023),
+[[^65], [^69]]
 [69](/en/ch4#McKinney2021)]
 and Pandas/NumPy [^70].
-Some time-series databases, such as InfluxDB IOx
+Some time-series databases, such as InfluxDB IOx [^71] and TimescaleDB [^72],
 [^71] and TimescaleDB
 [^72],
 are also based on column-oriented storage.
 ### Column Compression
@ -961,8 +915,7 @@ One option is to store those bitmaps using one bit per row. However, these bitma
 a lot of zeros (we say that they are *sparse*). In that case, the bitmaps can additionally be
 run-length encoded: counting the number of consecutive zeros or ones and storing that number, as
 shown at the bottom of [Figure 4-8](/en/ch4#fig_bitmap_index). Techniques such as *roaring bitmaps* switch between the
-two bitmap representations, using whichever is the most compact
+two bitmap representations, using whichever is the most compact [^73].
 [^73].
 This can make the encoding of a column remarkably efficient.
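The run-length encoding of a sparse bitmap can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the encoded form is a list of run lengths that alternates between zeros and ones and always starts with a (possibly empty) run of zeros. Real column stores and roaring bitmaps use more compact binary layouts, and the function names here are hypothetical.

```python
from itertools import groupby


def run_length_encode(bits):
    """Encode a list of 0/1 values as alternating run lengths.

    The first number counts leading zeros, the next counts ones, and so on.
    A bitmap that starts with a 1 therefore begins with a run of length 0.
    """
    runs = [len(list(group)) for _, group in groupby(bits)]
    if bits and bits[0] == 1:
        runs.insert(0, 0)
    return runs


def run_length_decode(runs):
    """Expand alternating run lengths back into a list of 0/1 values."""
    bits = []
    value = 0  # by convention the first run is a run of zeros
    for length in runs:
        bits.extend([value] * length)
        value = 1 - value
    return bits


bitmap = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1]
encoded = run_length_encode(bitmap)
```

For this twelve-bit bitmap the encoded form holds four run lengths. The saving grows with the sparsity of the bitmap, since a long stretch of zeros collapses into a single number.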
 Bitmap indexes such as these are very well suited for the kinds of queries that are common in a data
@ -1046,9 +999,7 @@ Queries need to examine both the column data on disk and the recent writes in me
 the two. The query execution engine hides this distinction from the user. From an analyst’s point
 of view, data that has been modified with inserts, updates, or deletes is immediately reflected in
 subsequent queries. Snowflake, Vertica, Apache Pinot, Apache Druid, and many others do this
-[[61](/en/ch4#Dageville2016), [63](/en/ch4#Im2018),
+[[^61], [^63], [^64], [^76]].
 [64](/en/ch4#Yang2014),
 [76](/en/ch4#Lamb2012)].
 ## Query Execution: Compilation and Vectorization
@ -1068,8 +1019,7 @@ the amount of data they need to read off disk, but also the CPU time required to
 operators. The simplest kind of operator is like an interpreter for a programming language: while
 iterating over each row, it checks a data structure representing the query to find out which
 comparisons or calculations it needs to perform on which columns. Unfortunately, this is too slow
-for many analytics purposes. Two alternative approaches for efficient query execution have emerged
+for many analytics purposes. Two alternative approaches for efficient query execution have emerged [^77]:
 [^77]:
 Query compilation
 : The query engine takes the SQL query and generates code for executing it. The code iterates over
@ -1084,7 +1034,7 @@ Vectorized processing
 : The query is interpreted, not compiled, but it is made fast by processing many values from a
 column in a batch, instead of iterating over rows one by one. A fixed set of predefined operators
 are built into the database; we can pass arguments to them and get back a batch of results
-    [[50](/en/ch4#Larson2013), [75](/en/ch4#Abadi2013)].
+ [[^50], [^75]].
 For example, we could pass the `product_sk` column and the ID of “bananas” to an equality operator,
 and get back a bitmap (one bit per value in the input column, which is 1 if it’s a banana); we could
@ -1107,8 +1057,8 @@ performance by taking advantage of the characteristics of modern CPUs:
 function calls) to keep the CPU instruction processing pipeline busy and avoid branch
 mispredictions,
 * making use of parallelism such as multiple threads and single-instruction-multi-data (SIMD)
-  instructions [[79](/en/ch4#Boncz2005),
+ instructions [[^79],
-  [80](/en/ch4#Zhou2002)], and
+ [^80]], and
 * operating directly on compressed data without decoding it into a separate in-memory
 representation, which saves memory allocation and copying costs.
@ -1123,8 +1073,7 @@ expanded query.
 When the underlying data changes, a materialized view needs to be updated accordingly. Some
 databases can do that automatically, and there are also systems such as Materialize that specialize
-in materialized view maintenance
+in materialized view maintenance [^81].
 [^81].
 Performing such updates means more work on writes, but materialized views can improve read
 performance in workloads that repeatedly need to perform the same queries.
@ -1133,8 +1082,7 @@ discussed earlier, data warehouse queries often involve an aggregate function, s
 `AVG`, `MIN`, or `MAX` in SQL. If the same aggregates are used by many different queries, it can be
 wasteful to crunch through the raw data every time. Why not cache some of the counts or sums that
 queries use most often? A *data cube* or *OLAP cube* does this by creating a grid of aggregates
-grouped by different dimensions
+grouped by different dimensions [^82].
 [^82].
 [Figure 4-10](/en/ch4#fig_data_cube) shows an example.
 ![ddia 0410](/fig/ddia_0410.png)
@ -1197,16 +1145,12 @@ longitude), or all the restaurants in a range of longitudes (but anywhere betwee
 South poles), but not both simultaneously.
 One option is to translate a two-dimensional location into a single number using a space-filling
-curve, and then to use a regular B-tree index
+curve, and then to use a regular B-tree index [^83].
-[^83].
+More commonly, specialized spatial indexes such as R-trees or Bkd-trees [^84]
 More commonly, specialized spatial indexes such as R-trees or Bkd-trees
 [^84]
 are used; they divide up the space so that nearby data points tend to be grouped in the same
 subtree. For example, PostGIS implements geospatial indexes as R-trees using PostgreSQL’s
-Generalized Search Tree indexing facility
+Generalized Search Tree indexing facility [^85].
-[^85].
+It is also possible to use regularly spaced grids of triangles, squares, or hexagons [^86].
 It is also possible to use regularly spaced grids of triangles, squares, or hexagons
 [^86].
 Multi-dimensional indexes are not just for geographic locations. For example, on an ecommerce
 website you could use a three-dimensional index on the dimensions (*red*, *green*, *blue*) to search
@ -1215,14 +1159,12 @@ two-dimensional index on (*date*, *temperature*) in order to efficiently search
 observations during the year 2013 where the temperature was between 25 and 30℃. With a
 one-dimensional index, you would have to either scan over all the records from 2013 (regardless of
 temperature) and then filter them by temperature, or vice versa. A 2D index could narrow down by
-timestamp and temperature simultaneously
+timestamp and temperature simultaneously [^87].
 [^87].
 ## Full-Text Search
 Full-text search allows you to search a collection of text documents (web pages, product
-descriptions, etc.) by keywords that might appear anywhere in the text
+descriptions, etc.) by keywords that might appear anywhere in the text [^88].
 [^88].
 Information retrieval is a big, specialist topic that often involves language-specific processing:
 for example, several Asian languages are written without spaces or punctuation between words, and
 therefore splitting text into words requires a model that indicates which character sequences
@ -1249,26 +1191,21 @@ warehouse query that searches for rows matching two conditions ([Figure 4-9](/e
 bitmaps for terms *x* and *y* and compute their bitwise AND. Even if the bitmaps are run-length
 encoded, this can be done very efficiently.
-For example, Lucene, the full-text indexing engine used by Elasticsearch and Solr, works like this
+For example, Lucene, the full-text indexing engine used by Elasticsearch and Solr, works like this [^90].
 [^90].
 It stores the mapping from term to postings list in SSTable-like sorted files, which are merged in
-the background using the same log-structured approach we saw earlier in this chapter
+the background using the same log-structured approach we saw earlier in this chapter [^91].
 [^91].
 PostgreSQL’s GIN index type also uses postings lists to support full-text search and indexing inside
 JSON documents
-[[92](/en/ch4#Fittl2021),
+[[^92], [^93]].
 [93](/en/ch4#Angelakos2020)].
 Instead of breaking text into words, an alternative is to find all the substrings of length *n*,
 which are called *n*-grams. For example, the trigrams (*n* = 3) of the string
 `"hello"` are `"hel"`, `"ell"`, and `"llo"`. If we build an inverted index of all trigrams, we can
 search the documents for arbitrary substrings that are at least three characters long. Trigram
-indexes even allows regular expressions in search queries; the downside is that they are quite large
+indexes even allow regular expressions in search queries; the downside is that they are quite large [^94].
 [^94].
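A trigram index of the kind described above can be sketched as follows. This is a minimal in-memory illustration, not how any particular search engine implements it; the names `ngrams`, `build_trigram_index`, and `search_substring`, and the sample documents, are assumptions made for the example.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return all substrings of length n, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_trigram_index(docs):
    """Map each trigram to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text):
            index[gram].add(doc_id)
    return index


def search_substring(index, docs, query):
    """Find documents containing `query` (at least three characters long)."""
    grams = ngrams(query)
    if not grams:
        raise ValueError("query must be at least three characters long")
    # Intersect postings lists: a match must contain every trigram of the query.
    candidates = set.intersection(*(index.get(gram, set()) for gram in grams))
    # Containing all trigrams does not guarantee they are adjacent, so verify.
    return sorted(doc_id for doc_id in candidates if query in docs[doc_id])


docs = {1: "hello world", 2: "yellow", 3: "help"}
index = build_trigram_index(docs)
```

The final verification step matters: the index only narrows the search to candidate documents, and a document can contain every trigram of the query without containing the query itself. The size cost mentioned above is also visible here, since a text of length *m* contributes up to *m* − 2 index entries.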
 To cope with typos in documents or queries, Lucene is able to search text for words within a certain
-edit distance (an edit distance of 1 means that one letter has been added, removed, or replaced)
+edit distance (an edit distance of 1 means that one letter has been added, removed, or replaced) [^95].
 [^95].
 It does this by storing the set of terms as a finite state automaton over the characters in the
 keys, similar to a *trie*
 [^96],
@ -1309,12 +1246,9 @@ measure the distance between vectors. Cosine similarity measures the cosine of t
 vectors to determine how close they are, while Euclidean distance measures the straight-line
 distance between two points in space.
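The two distance measures can be written out directly. The sketch below uses plain Python lists as vectors and hypothetical function names; production systems would typically use an optimized numerical library, and the sketch assumes both vectors are non-zero and of equal length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


same_direction = ([1.0, 0.0], [2.0, 0.0])
orthogonal = ([1.0, 0.0], [0.0, 3.0])
```

The pair in `same_direction` shows how the measures differ: the vectors point the same way, so their cosine similarity is 1, yet their Euclidean distance is also 1 because they differ in length. Cosine similarity ignores magnitude, whereas Euclidean distance does not.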
-Many early embedding models such as Word2Vec
+Many early embedding models such as Word2Vec [^98],
-[^98],
+BERT [^99],
-BERT
+and GPT [^100]
 [^99],
 and GPT
 [^100]
 worked with text data. Such models are usually implemented as neural networks. Researchers went on to
 create embedding models for video, audio, and images as well. More recently, model
 architecture has become *multimodal*: a single model can generate vector embeddings for multiple
@ -1357,16 +1291,13 @@ Hierarchical Navigable Small World (HNSW)
 ###### Figure 4-11. Searching for the database entry that is closest to a given query vector in a HNSW index.
 Many popular vector databases implement IVF and HNSW indexes. Facebook’s Faiss library has many
-variations of each
+variations of each [^101],
-[^101],
+and PostgreSQL’s pgvector supports both as well [^102].
 and PostgreSQL’s pgvector supports both as well
 [^102].
 The full details of the IVF and HNSW algorithms are beyond the scope of this book, but their papers
 are an excellent resource
-[[103](/en/ch4#Baranchuk2018),
+[[^103], [^104]].
 [104](/en/ch4#Malkov2020)].
-# Summary
+## Summary
 In this chapter we tried to get to the bottom of how databases perform storage and retrieval. What
 happens when you store data in a database, and what does the database do when you query for the
@ -1413,10 +1344,11 @@ Although this chapter couldn’t make you an expert in tuning any one particular
 has hopefully equipped you with enough vocabulary and ideas that you can make sense of the
 documentation for the database of your choice.
 ##### Footnotes
-##### References
+
 ### Summary
--- a/content/en/ch5.md
+++ b/content/en/ch5.md
@ -118,12 +118,10 @@ restored with minimal additional code. However, they also have a number of deep
 yourself to your current programming language for potentially a very long time, and precluding
 integrating your systems with those of other organizations (which may use different languages).
 * In order to restore data in the same object types, the decoding process needs to be able to
-  instantiate arbitrary classes. This is frequently a source of security problems
+ instantiate arbitrary classes. This is frequently a source of security problems [^1]:
  [^1]:
 if an attacker can get your application to decode an arbitrary byte sequence, they can instantiate
 arbitrary classes, which in turn often allows them to do terrible things such as remotely
-  executing arbitrary code [[2](/en/ch5#Breen2015),
+ executing arbitrary code [[^2], [^3]].
  [3](/en/ch5#McKenzie2013)].
 * Versioning data is often an afterthought in these libraries: as they are intended for quick and
 easy encoding of data, they often neglect the inconvenient problems of forward and backward
 compatibility [^4].
@ -138,8 +136,7 @@ other than very transient purposes.
 When moving to standardized encodings that can be written and read by many programming languages, JSON
 and XML are the obvious contenders. They are widely known, widely supported, and almost as widely
-disliked. XML is often criticized for being too verbose and unnecessarily complicated
+disliked. XML is often criticized for being too verbose and unnecessarily complicated [^6].
 [^6].
 JSON’s popularity is mainly due to its built-in support in web browsers and simplicity relative to
 XML. CSV is another popular language-independent format, but it only supports tabular data without
 nesting.
@ -155,8 +152,7 @@ problems:
 This is a problem when dealing with large numbers; for example, integers greater than 2^53 cannot
 be exactly represented in an IEEE 754 double-precision floating-point number, so such numbers become
-  inaccurate when parsed in a language that uses floating-point numbers, such as JavaScript
+ inaccurate when parsed in a language that uses floating-point numbers, such as JavaScript [^7].
  [^7].
 An example of numbers larger than 2^53 occurs on X (formerly Twitter), which uses a 64-bit number to
 identify each post. The JSON returned by the API includes post IDs twice, once as a JSON number and
 once as a decimal string, to work around the fact that the numbers are not correctly parsed by
@ -173,8 +169,7 @@ problems:
 * CSV does not have any schema, so it is up to the application to define the meaning of each row and
 column. If an application change adds a new row or column, you have to handle that change manually.
 CSV is also a quite vague format (what happens if a value contains a comma or a newline character?).
-  Although its escaping rules have been formally specified
+ Although its escaping rules have been formally specified [^9],
  [^9],
 not all parsers implement them correctly.
 Despite these flaws, JSON, XML, and CSV are good enough for many purposes. It’s likely that they will
@ -211,7 +206,7 @@ JSON Schema so that keys may only contain digits, and values can only be strings
 ##### Example 5-1. Example JSON Schema with integer keys and string values. Integer keys are represented as strings containing only integers since JSON Schema requires all keys to be strings.
-```
+```json
 {
 "$schema": "http://json-schema.org/draft-07/schema#",
 "type": "object",
@ -229,8 +224,7 @@ if/else schema logic, named types, references to remote schemas, and much more.
 for a very powerful schema language. Such features also make for unwieldy definitions. It can be
 challenging to resolve remote schemas, reason about conditional rules, or evolve schemas in a
 forwards or backwards compatible way [^10].
-Similar concerns apply to XML Schema
+Similar concerns apply to XML Schema [^11].
 [^11].
 ### Binary encoding
@ -286,8 +280,7 @@ In the following sections we will see how we can do much better, and encode the
 ## Protocol Buffers
 Protocol Buffers (protobuf) is a binary encoding library developed at Google.
-It is similar to Apache Thrift, which was originally developed by Facebook
+It is similar to Apache Thrift, which was originally developed by Facebook [^13];
 [^13];
 most of what this section says about Protocol Buffers applies also to Thrift.
 Protocol Buffers requires a schema for any data that is encoded. To encode the data
@ -381,8 +374,7 @@ value won’t fit in 32 bits, it will be truncated.
 Apache Avro is another binary encoding format that is interestingly different from Protocol Buffers.
 It was started in 2009 as a subproject of Hadoop, as a result of Protocol Buffers not being a good
-fit for Hadoop’s use cases
+fit for Hadoop’s use cases [^15].
 [^15].
 Avro also uses a schema to specify the structure of the data being encoded. It has two schema
 languages: one (Avro IDL) intended for human editing, and one (based on JSON) that is more easily
@ -455,8 +447,7 @@ application code is expecting, and their types.
 If the reader’s and writer’s schema are the same, decoding is easy. If they are different, Avro
 resolves the differences by looking at the writer’s schema and the reader’s schema side by side and
 translating the data from the writer’s schema into the reader’s schema. The Avro specification
-[[16](/en/ch5#AvroSpec),
+[[^16], [^17]]
 [17](/en/ch5#AvroParsing)]
 defines exactly how this resolution works, and it is illustrated in
 [Figure 5-6](/en/ch5#fig_encoding_avro_resolution).
@ -536,8 +527,7 @@ Sending records over a network connection
 connection. The Avro RPC protocol (see [“Dataflow Through Services: REST and RPC”](/en/ch5#sec_encoding_dataflow_rpc)) works like this.
 A database of schema versions is a useful thing to have in any case, since it acts as documentation
-and gives you a chance to check schema compatibility
+and gives you a chance to check schema compatibility [^21].
 [^21].
 As the version number, you could use a simple incrementing integer, or you could use a hash of the
 schema.
@ -581,13 +571,10 @@ languages.
 The ideas on which these encodings are based are by no means new. For example, they have a lot in
 common with ASN.1, a schema definition language that was first standardized in 1984
-[[23](/en/ch5#Larmouth1999),
+[[^23], [^24]].
 [24](/en/ch5#Kaliski1993)].
 It was used to define various network protocols, and its binary encoding (DER) is still used to encode
-SSL certificates (X.509), for example
+SSL certificates (X.509), for example [^25].
-[^25].
+ASN.1 supports schema evolution using tag numbers, similar to Protocol Buffers [^26].
 ASN.1 supports schema evolution using tag numbers, similar to Protocol Buffers
 [^26].
 However, it’s also very complex and badly documented, so ASN.1
 is probably not a good choice for new applications.
@ -681,8 +668,7 @@ versions of the schema.
 More complex schema changes—for example, changing a single-valued attribute to be multi-valued, or
 moving some data into a separate table—still require data to be rewritten, often at the application
 level [^27].
-Maintaining forward and backward compatibility across such migrations is still a research problem
+Maintaining forward and backward compatibility across such migrations is still a research problem [^28].
 [^28].
 ### Archival storage
@ -722,8 +708,7 @@ application-specific, and the client and server need to agree on the details of
 In some ways, services are similar to databases: they typically allow clients to submit and query
 data. However, while databases allow arbitrary queries using the query languages we discussed in
 [Chapter 3](/en/ch3#ch_datamodels), services expose an application-specific API that only allows inputs and outputs
-that are predetermined by the business logic (application code) of the service
+that are predetermined by the business logic (application code) of the service [^29]. This restriction provides a degree of encapsulation: services can impose
 [^29]. This restriction provides a degree of encapsulation: services can impose
 fine-grained restrictions on what clients can and cannot do.
 A key design goal of a service-oriented/microservices architecture is to make the application easier
@ -752,8 +737,7 @@ different contexts. For example:
 systems, or OAuth for shared access to user data.
 The most popular service design philosophy is REST, which builds upon the principles of HTTP
-[[30](/en/ch5#Fielding2000),
+[[^30], [^31]].
 [31](/en/ch5#Fielding2008)].
 It emphasizes simple data formats, using URLs for identifying resources and using HTTP features for
 cache control, authentication, and content type negotiation. An API designed according to the
 principles of REST is called *RESTful*.
@ -763,8 +747,7 @@ format to send and expect in response. Even if a service adopts RESTful design p
 need to somehow find out these details. Service developers often use an interface definition
 language (IDL) to define and document their service’s API endpoints and data models, and to evolve
 them over time. Other developers can then use the service definition to determine how to query the
-service. The two most popular service IDLs are OpenAPI (also known as Swagger
+service. The two most popular service IDLs are OpenAPI (also known as Swagger [^32])
 [^32])
 and gRPC. OpenAPI is used for web services that send and receive JSON data, while gRPC services send
 and receive Protocol Buffers.
@ -841,17 +824,14 @@ Architecture (CORBA) is excessively complex, and does not provide backward or fo
 compatibility [^33].
 SOAP and the WS-\* web services framework aim to provide interoperability across vendors, but are
 also plagued by complexity and compatibility problems
-[[34](/en/ch5#Lacey2006),
+[[^34], [^35], [^36]].
 [35](/en/ch5#Tilkov2006),
 [36](/en/ch5#Bray2004)].
 All of these are based on the idea of a *remote procedure call* (RPC), which has been around since
 the 1970s [^37].
 The RPC model tries to make a request to a remote network service look the same as calling a function or
 method in your programming language, within the same process (this abstraction is called *location
 transparency*). Although RPC seems convenient at first, the approach is fundamentally flawed
-[[38](/en/ch5#Waldo1994),
+[[^38], [^39]].
 [39](/en/ch5#Vinoski2008)].
 A network request is very different from a local function call:
 * A local function call is predictable and either succeeds or fails, depending only on parameters
@ -978,8 +958,7 @@ version of the API it wants to use [^42]).
 For RESTful APIs, common approaches are to use a version
 number in the URL or in the HTTP `Accept` header. For services that use API keys to identify a
 particular client, another option is to store a client’s requested API version on the server and to
-allow this version selection to be updated through a separate administrative interface
+allow this version selection to be updated through a separate administrative interface [^43].
 [^43].
 ## Durable Execution and Workflows
@ -994,8 +973,7 @@ the credit card, and call the banking service to deposit debited funds, as shown
 [Figure 5-7](/en/ch5#fig_encoding_workflow). We call this sequence of steps a *workflow*, and each step a *task*.
 Workflows are typically defined as a graph of tasks. Workflow definitions may be written in a
 general-purpose programming language, a domain specific language (DSL), or a markup language such as
-Business Process Execution Language (BPEL)
+Business Process Execution Language (BPEL) [^44].
 [^44].
 # Tasks, Activities, and Functions
@ -1038,8 +1016,7 @@ task fails, the framework will re-execute the task, but will skip any RPC calls
 that the task made successfully before failing. Instead, the framework will pretend to make the
 call, but will instead return the results from the previous call. This is possible because durable
 execution frameworks log all RPCs and state changes to durable storage like a write-ahead log
-[[45](/en/ch5#TemporalService),
+[[^45], [^46]].
 [46](/en/ch5#Ewen2023)].
 [Example 5-5](/en/ch5#fig_temporal_workflow) shows an example of a workflow definition that supports durable execution
 using Temporal.
@ -1067,16 +1044,13 @@ class PaymentWorkflow:
 Frameworks like Temporal are not without their challenges. External services, such as the
 third-party payment gateway in our example, must still provide an idempotent API. Developers must
-remember to use unique IDs for these APIs to prevent duplicate execution
+remember to use unique IDs for these APIs to prevent duplicate execution [^47].
 [^47].
 And because durable execution frameworks log each RPC call in order, they expect a subsequent
 execution to make the same RPC calls in the same order. This makes code changes brittle: you
-might introduce undefined behavior simply by re-ordering function calls
+might introduce undefined behavior simply by re-ordering function calls [^48].
 [^48].
 Instead of modifying the code of an existing workflow, it is safer to deploy a new version of the
 code separately, so that re-executions of existing workflow invocations continue to use the old
-version, and only new invocations use the new code
+version, and only new invocations use the new code [^49].
 [^49].
 Similarly, because durable execution frameworks expect to replay all code deterministically (the
 same inputs produce the same outputs), nondeterministic code such as random number generators or
@ -1097,8 +1071,7 @@ how encoded data can flow from one process to another. A request is called an *e
 unlike RPC, the sender usually does not wait for the recipient to process the event. Moreover,
 events are typically not sent to the recipient via a direct network connection, but go via an
 intermediary called a *message broker* (also called an *event broker*, *message queue*, or
-*message-oriented middleware*), which stores the message temporarily.
+*message-oriented middleware*), which stores the message temporarily [^50].
 [^50].
 Using a message broker has several advantages compared to direct RPC:
@ -1136,7 +1109,7 @@ Message brokers typically don’t enforce any particular data model—a message
 bytes with some metadata, so you can use any encoding format. A common approach is to use Protocol
 Buffers, Avro, or JSON, and to deploy a schema registry alongside the message broker to store all
 the valid schema versions and check their compatibility
-[[19](/en/ch5#ConfluentSchemaReg), [21](/en/ch5#Kreps2015)].
+[[^19], [^21]].
 AsyncAPI, a messaging-based equivalent of OpenAPI, can also be used to specify the schema of
 messages.
@ -1160,8 +1133,7 @@ sending and receiving asynchronous messages. Message delivery is not guaranteed:
 scenarios, messages will be lost. Since each actor processes only one message at a time, it doesn’t
 need to worry about threads, and each actor can be scheduled independently by the framework.
-In *distributed actor frameworks* such as Akka, Orleans
+In *distributed actor frameworks* such as Akka, Orleans [^51],
 [^51],
 and Erlang/OTP, this programming model is used to scale an application across
 multiple nodes. The same message-passing mechanism is used, no matter whether the sender and recipient
 are on the same node or different nodes. If they are on different nodes, the message is
@ -1178,7 +1150,7 @@ application, you still have to worry about forward and backward compatibility, a
 sent from a node running the new version to a node running the old version, and vice versa. This can
 be achieved by using one of the encodings discussed in this chapter.
-# Summary
+## Summary
 In this chapter we looked at several ways of turning data structures into bytes on the network or
 bytes on disk. We saw how the details of these encodings affect not only their efficiency, but more
@ -1222,10 +1194,11 @@ encodings are important:
 We can conclude that with a bit of care, backward/forward compatibility and rolling upgrades are
 quite achievable. May your application’s evolution be rapid and your deployments be frequent.
 ##### Footnotes
-##### References
+
 ### Summary
 [^1]: [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html). Common Weakness Enumeration, *cwe.mitre.org*, July 2006. Archived at [perma.cc/26EU-UK9Y](https://perma.cc/26EU-UK9Y) 
--- a/content/en/ch6.md
+++ b/content/en/ch6.md
@ -11,7 +11,7 @@ breadcrumbs: false
 > Douglas Adams, *Mostly Harmless* (1992)
 *Replication* means keeping a copy of the same data on multiple machines that are connected via a
-network. As discussed in [“Distributed versus Single-Node Systems”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch01.html#sec_introduction_distributed), there are several reasons
+network. As discussed in [“Distributed versus Single-Node Systems”](/ch01.html#sec_introduction_distributed), there are several reasons
 why you might want to replicate data:
 * To keep data geographically close to your users (and thus reduce access latency)
@ -19,7 +19,7 @@ why you might want to replicate data:
 * To scale out the number of machines that can serve read queries (and thus increase read throughput)
 In this chapter we will assume that your dataset is small enough that each machine can hold a copy of
-the entire dataset. In [Chapter 7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#ch_sharding) we will relax that assumption and discuss *sharding*
+the entire dataset. In [Chapter 7](/ch07.html#ch_sharding) we will relax that assumption and discuss *sharding*
 (*partitioning*) of datasets that are too big for a single machine. In later chapters we will discuss
 various kinds of faults that can occur in a replicated data system, and how to deal with them.
@ -36,10 +36,8 @@ in databases, and although the details vary by database, the general principles
 many different implementations. We will discuss the consequences of such choices in this chapter.
 Replication of databases is an old topic—the principles haven’t changed much since they were
-studied in the 1970s
+studied in the 1970s [^1], because the fundamental constraints of networks have remained the same. Despite being so old,
-[^1],
+concepts such as *eventual consistency* still cause confusion. In [“Problems with Replication Lag”](/ch06.html#sec_replication_lag) we will
 because the fundamental constraints of networks have remained the same. Despite being so old,
 concepts such as *eventual consistency* still cause confusion. In [“Problems with Replication Lag”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_lag) we will
 get more precise about eventual consistency and discuss things like the *read-your-writes* and
 *monotonic reads* guarantees.
@ -52,7 +50,7 @@ delete some data, replication doesn’t help since the deletion will have also b
 replicas, so you need a backup if you want to restore the deleted data.
 In fact, replication and backups are often complementary to each other. Backups are sometimes part
-of the process of setting up replication, as we shall see in [“Setting Up New Followers”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_new_replica).
+of the process of setting up replication, as we shall see in [“Setting Up New Followers”](/ch06.html#sec_replication_new_replica).
 Conversely, archiving replication logs can be part of a backup process.
 Some databases internally maintain immutable snapshots of past states, which serve as a kind of
@ -69,7 +67,7 @@ question inevitably arises: how do we ensure that all the data ends up on all th
 Every write to the database needs to be processed by every replica; otherwise, the replicas would no
 longer contain the same data. The most common solution is called *leader-based replication*,
 *primary-backup*, or *active/passive*. It works as follows (see
-[Figure 6-1](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_leader_follower)):
+[Figure 6-1](/ch06.html#fig_replication_leader_follower)):
 1. One of the replicas is designated the *leader* (also known as *primary* or *source*
   [^2]).
@ -88,9 +86,9 @@ longer contain the same data. The most common solution is called *leader-based r
 ###### Figure 6-1. Single-leader replication directs all writes to a designated leader, which sends a stream of changes to the follower replicas.
-If the database is sharded (see [Chapter 7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#ch_sharding)), each shard has one leader. Different shards may
+If the database is sharded (see [Chapter 7](/ch07.html#ch_sharding)), each shard has one leader. Different shards may
 have their leaders on different nodes, but each shard must nevertheless have one leader node. In
-[“Multi-Leader Replication”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_multi_leader) we will discuss an alternative model in which a system may have
+[“Multi-Leader Replication”](/ch06.html#sec_replication_multi_leader) we will discuss an alternative model in which a system may have
 multiple leaders for the same shard at the same time.
 Single-leader replication is very widely used. It’s a built-in feature of many relational databases,
@ -106,7 +104,7 @@ Many consensus algorithms such as Raft, which is used for replication in Cockroa
 TiDB [^7],
 etcd, and RabbitMQ quorum queues (among others), are also based on a single leader, and
 automatically elect a new leader if the old one fails (we will discuss consensus in more detail in
-[Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency)).
+[Chapter 10](/ch10.html#ch_consistency)).
 > [!NOTE]
 > In older documents you may see the term *master–slave replication*. It means the same as
@ -119,17 +117,17 @@ An important detail of a replicated system is whether the replication happens *s
 *asynchronously*. (In relational databases, this is often a configurable option; other systems are
 often hardcoded to be either one or the other.)
-Think about what happens in [Figure 6-1](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_leader_follower), where the user of a website updates
+Think about what happens in [Figure 6-1](/ch06.html#fig_replication_leader_follower), where the user of a website updates
 their profile image. At some point in time, the client sends the update request to the leader;
 shortly afterward, it is received by the leader. At some point, the leader forwards the data change
 to the followers. Eventually, the leader notifies the client that the update was successful.
-[Figure 6-2](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_sync_replication) shows one possible way how the timings could work out.
+[Figure 6-2](/ch06.html#fig_replication_sync_replication) shows one possible way how the timings could work out.
 ![ddia 0602](/fig/ddia_0602.png)
 ###### Figure 6-2. Leader-based replication with one synchronous and one asynchronous follower.
-In the example of [Figure 6-2](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_sync_replication), the replication to follower 1 is
+In the example of [Figure 6-2](/ch06.html#fig_replication_sync_replication), the replication to follower 1 is
 *synchronous*: the leader waits until follower 1 has confirmed that it received the write before
 reporting success to the user, and before making the write visible to other clients. The replication
 to follower 2 is *asynchronous*: the leader sends the message, but doesn’t wait for a response from
@ -159,9 +157,9 @@ called *semi-synchronous*.
 In some systems, a *majority* (e.g., 3 out of 5 replicas, including the leader) of replicas is
 updated synchronously, and the remaining minority is asynchronous. This is an example of a *quorum*,
-which we will discuss further in [“Quorums for reading and writing”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_quorum_condition). Majority quorums are often
+which we will discuss further in [“Quorums for reading and writing”](/ch06.html#sec_replication_quorum_condition). Majority quorums are often
 used in systems that use a consensus protocol for automatic leader election, which we will return to
-in [Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency).
+in [Chapter 10](/ch10.html#ch_consistency).
 Sometimes, leader-based replication is configured to be completely asynchronous. In this case, if the
 leader fails and is not recoverable, any writes that have not yet been replicated to followers are
@ -172,7 +170,7 @@ processing writes, even if all of its followers have fallen behind.
 Weakening durability may sound like a bad trade-off, but asynchronous replication is nevertheless
 widely used, especially if there are many followers or if they are geographically distributed
 [^9].
-We will return to this issue in [“Problems with Replication Lag”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_lag).
+We will return to this issue in [“Problems with Replication Lag”](/ch06.html#sec_replication_lag).
 ## Setting Up New Followers
@ -224,8 +222,8 @@ for live queries. Storing database data in object storage has many benefits:
  durability guarantees. This also allows databases to bypass inter-zone network fees.
 * Databases can use an object store’s *conditional write* feature—essentially, a *compare-and-set*
  (CAS) operation—to implement transactions and leadership election
-  [[10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Morling2024_ch6),
+  [[10](/ch06.html#Morling2024_ch6),
-  [11](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Chandramohan2024)]).
+  [11](/ch06.html#Chandramohan2024)]).
 * Storing data from multiple databases in the same object store can simplify data integration,
  particularly when open formats such as Apache Parquet and Apache Iceberg are used.
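
As a rough illustration of the conditional-write bullet above, the following sketch shows how a compare-and-set operation can decide a leadership election. The `InMemoryObjectStore` class and its `put_if_version` method are hypothetical and do not correspond to any real object store's API; real stores expose conditional writes through their own mechanisms.

```python
from typing import Optional


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store with conditional writes."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[int, str]] = {}  # key -> (version, value)

    def get(self, key: str) -> Optional[tuple[int, str]]:
        return self._objects.get(key)

    def put_if_version(self, key: str, value: str, expected_version: Optional[int]) -> bool:
        """Write only if the current version equals expected_version (None = key absent)."""
        current = self._objects.get(key)
        current_version = current[0] if current is not None else None
        if current_version != expected_version:
            return False  # another writer got there first; the compare-and-set fails
        new_version = 1 if current_version is None else current_version + 1
        self._objects[key] = (new_version, value)
        return True


def try_become_leader(store: InMemoryObjectStore, node_id: str) -> bool:
    """Claim leadership by creating the leader object only if it does not exist yet."""
    return store.put_if_version("leader", node_id, expected_version=None)


store = InMemoryObjectStore()
node_a_won = try_become_leader(store, "node-a")
node_b_won = try_become_leader(store, "node-b")
```

Only the first node's write succeeds, so at most one node considers itself leader. A real system would also need some way to expire or hand over leadership, which this sketch omits.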
@ -312,10 +310,10 @@ consists of the following steps:
   [^13].
   The best candidate for leadership is usually the replica with the most up-to-date data changes
   from the old leader (to minimize any data loss). Getting all the nodes to agree on a new leader
-   is a consensus problem, discussed in detail in [Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency).
+   is a consensus problem, discussed in detail in [Chapter 10](/ch10.html#ch_consistency).
 3. *Reconfiguring the system to use the new leader.* Clients now need to send
   their write requests to the new leader (we discuss this
-   in [“Request Routing”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#sec_sharding_routing)). If the old leader comes back, it might still believe that it is
+   in [“Request Routing”](/ch07.html#sec_sharding_routing)). If the old leader comes back, it might still believe that it is
   the leader, not realizing that the other replicas have
   forced it to step down. The system needs to ensure that the old leader becomes a follower and
   recognizes the new leader.
@ -337,10 +335,10 @@ Failover is fraught with things that can go wrong:
  primary keys that were previously assigned by the old leader. These primary keys were also used in
  a Redis store, so the reuse of primary keys resulted in inconsistency between MySQL and Redis,
  which caused some private data to be disclosed to the wrong users.
-* In certain fault scenarios (see [Chapter 9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#ch_distributed)), it could happen that two nodes both believe
+* In certain fault scenarios (see [Chapter 9](/ch09.html#ch_distributed)), it could happen that two nodes both believe
  that they are the leader. This situation is called *split brain*, and it is dangerous: if both
  leaders accept writes, and there is no process for resolving conflicts (see
-  [“Multi-Leader Replication”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_multi_leader)), data is likely to be lost or corrupted. As a safety catch, some
+  [“Multi-Leader Replication”](/ch06.html#sec_replication_multi_leader)), data is likely to be lost or corrupted. As a safety catch, some
  systems have a mechanism to shut down one node if two leaders are detected. However, if this
  mechanism is not carefully designed, you can end up with both nodes being shut down
  [^15].
@ -356,7 +354,7 @@ Failover is fraught with things that can go wrong:
 > [!NOTE]
 > Guarding against split brain by limiting or shutting down old leaders is known as *fencing* or, more
 > emphatically, *Shoot The Other Node In The Head* (STONITH). We will discuss fencing in more detail
-> in [“Distributed Locks and Leases”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#sec_distributed_lock_fencing).
+> in [“Distributed Locks and Leases”](/ch09.html#sec_distributed_lock_fencing).
 There are no easy solutions to these problems. For this reason, some operations teams prefer to
 perform failovers manually, even if the software supports automatic failover.
@ -370,7 +368,7 @@ behind by several days could be catastrophic.
 These issues—node failures; unreliable networks; and trade-offs around replica consistency,
 durability, availability, and latency—are in fact fundamental problems in distributed systems.
-In [Chapter 9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#ch_distributed) and [Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency) we will discuss them in greater depth.
+In [Chapter 9](/ch09.html#ch_distributed) and [Chapter 10](/ch10.html#ch_consistency) we will discuss them in greater depth.
 ## Implementation of Replication Logs
@ -401,9 +399,9 @@ break down:
 It is possible to work around those issues—for example, the leader can replace any nondeterministic
 function calls with a fixed return value when the statement is logged so that the followers all get
 the same value. The idea of executing deterministic statements in a fixed order is similar to the
-event sourcing model that we previously discussed in [“Event Sourcing and CQRS”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch03.html#sec_datamodels_events). This approach is
+event sourcing model that we previously discussed in [“Event Sourcing and CQRS”](/ch03.html#sec_datamodels_events). This approach is
 also known as *state machine replication*, and we will discuss the theory behind it in
-[“Using shared logs”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#sec_consistency_smr).
+[“Using shared logs”](/ch10.html#sec_consistency_smr).
 Statement-based replication was used in MySQL before version 5.1. It is still sometimes used today,
 as it is quite compact, but by default MySQL now switches to row-based replication (discussed shortly) if
@ -415,7 +413,7 @@ replication methods.
 ### Write-ahead log (WAL) shipping
-In [Chapter 4](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch04.html#ch_storage) we saw that a write-ahead log is needed to make B-tree storage engines robust:
+In [Chapter 4](/ch04.html#ch_storage) we saw that a write-ahead log is needed to make B-tree storage engines robust:
 every modification is first written to the WAL so that the tree can be restored to a consistent
 state after a crash. Since the WAL contains all the information necessary to restore the indexes and
 heap into a consistent state, we can use the exact same log to build a replica on another node:
@ -423,8 +421,8 @@ besides writing the log to disk, the leader also sends it across the network to
 the follower processes this log, it builds a copy of the exact same files as found on the leader.
 This method of replication is used in PostgreSQL and Oracle, among others
-[[17](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Suzuki2017_ch6),
+[[17](/ch06.html#Suzuki2017_ch6),
-[18](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kapila2012)].
+[18](/ch06.html#Kapila2012)].
 The main disadvantage is that the log describes the data on a very low level: a WAL contains details
 of which bytes were changed in which disk blocks. This makes replication tightly coupled to the
 storage engine. If the database changes its storage format from one version to another, it is
@ -476,7 +474,7 @@ This technique is called *change data capture*, and we will return to it in [Lin
 # Problems with Replication Lag
 Being able to tolerate node failures is just one reason for wanting replication. As mentioned
-in [“Distributed versus Single-Node Systems”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch01.html#sec_introduction_distributed), other reasons are scalability (processing more
+in [“Distributed versus Single-Node Systems”](/ch01.html#sec_introduction_distributed), other reasons are scalability (processing more
 requests than a single machine can handle) and latency (placing replicas geographically closer to
 users).
@ -528,7 +526,7 @@ be read from a follower. This is especially appropriate if data is frequently vi
 occasionally written.
 With asynchronous replication, there is a problem, illustrated in
-[Figure 6-3](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_read_your_writes): if the user views the data shortly after making a write, the
+[Figure 6-3](/ch06.html#fig_replication_read_your_writes): if the user views the data shortly after making a write, the
 new data may not yet have reached the replica. To the user, it looks as though the data they
 submitted was lost, so they will be understandably unhappy.
@ -568,7 +566,7 @@ are various possible techniques. To mention a few:
  [^26].
  The timestamp could be a *logical timestamp* (something that indicates ordering of writes, such as
  the log sequence number) or the actual system clock (in which case clock synchronization becomes
-  critical; see [“Unreliable Clocks”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#sec_distributed_clocks)).
+  critical; see [“Unreliable Clocks”](/ch09.html#sec_distributed_clocks)).
 * If your replicas are distributed across regions (for geographical proximity to users or for
  availability), there is additional complexity. Any request that needs to be served by the leader
  must be routed to the region that contains the leader.
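
The logical-timestamp technique from the list above can be sketched as follows. The `Replica` class and `choose_replica` function are hypothetical names for illustration, and falling back to the leader when no follower has caught up is one possible policy assumed here.

```python
class Replica:
    """Hypothetical replica that knows the log sequence number (LSN) it has applied."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.applied_lsn = 0
        self.data: dict[str, str] = {}

    def apply(self, lsn: int, key: str, value: str) -> None:
        self.data[key] = value
        self.applied_lsn = lsn


def choose_replica(last_write_lsn: int, leader: Replica, followers: list[Replica]) -> Replica:
    """Pick a follower that has caught up to the client's last write, else the leader."""
    for follower in followers:
        if follower.applied_lsn >= last_write_lsn:
            return follower
    return leader


leader = Replica("leader")
lagging = Replica("follower-1")
caught_up = Replica("follower-2")

# The client writes at LSN 1 and remembers that number.
leader.apply(1, "profile", "new photo")
client_last_write_lsn = 1

# No follower has applied LSN 1 yet, so the read goes to the leader.
first_choice = choose_replica(client_last_write_lsn, leader, [lagging, caught_up])

# Once a follower catches up, it can serve the read.
caught_up.apply(1, "profile", "new photo")
second_choice = choose_replica(client_last_write_lsn, leader, [lagging, caught_up])
```

The client only has to carry one number, the LSN of its most recent write, for the routing layer to avoid replicas that would show stale data.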
@ -604,7 +602,7 @@ zonal outages where one zone goes offline, but they do not protect against regio
 all zones in a region are unavailable. To survive a regional outage, a distributed system must be
 deployed across multiple regions, which can result in higher latencies, lower throughput, and
 increased cloud networking bills. We will discuss these trade-offs more in
-[“Multi-leader replication topologies”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_topologies). For now, just know that when we say region, we mean a collection of
+[“Multi-leader replication topologies”](/ch06.html#sec_replication_topologies). For now, just know that when we say region, we mean a collection of
 zones/datacenters in a single geographic location.
 ## Monotonic Reads
@ -613,7 +611,7 @@ Our second example of an anomaly that can occur when reading from asynchronous f
 possible for a user to see things *moving backward in time*.
 This can happen if a user makes several reads from different replicas. For example,
-[Figure 6-4](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_monotonic_reads) shows user 2345 making the same query twice, first to a follower
+[Figure 6-4](/ch06.html#fig_replication_monotonic_reads) shows user 2345 making the same query twice, first to a follower
 with little lag, then to a follower with greater lag. (This scenario is quite likely if the user
 refreshes a web page, and each request is routed to a random server.) The first query returns a
 comment that was recently added by user 1234, but the second query doesn’t return anything because
@ -654,7 +652,7 @@ answered it.
 Now, imagine a third person is listening to this conversation through followers. The things said by
 Mrs. Cake go through a follower with little lag, but the things said by Mr. Poons have a longer
-replication lag (see [Figure 6-5](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_consistent_prefix)). This observer would hear the following:
+replication lag (see [Figure 6-5](/ch06.html#fig_replication_consistent_prefix)). This observer would hear the following:
 Mrs. Cake
 :   About ten seconds usually, Mr. Poons.
@ -676,7 +674,7 @@ writes happens in a certain order, then anyone reading those writes will see the
 order.
 This is a particular problem in sharded (partitioned) databases, which we will discuss in
-[Chapter 7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#ch_sharding). If the database always applies writes in the same order, reads always see a
+[Chapter 7](/ch07.html#ch_sharding). If the database always applies writes in the same order, reads always see a
 consistent prefix, so this anomaly cannot happen. However, in many distributed databases, different
 shards operate independently, so there is no global ordering of writes: when a user reads from the
 database, they may see some parts of the database in an older state and some in a newer state.
@ -684,7 +682,7 @@ database, they may see some parts of the database in an older state and some in
 One solution is to make sure that any writes that are causally related to each other are written to
 the same shard—but in some applications that cannot be done efficiently. There are also algorithms
 that explicitly keep track of causal dependencies, a topic that we will return to in
-[“The “happens-before” relation and concurrency”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_happens_before).
+[“The “happens-before” relation and concurrency”](/ch06.html#sec_replication_happens_before).
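
One simple way to keep causally related writes on the same shard is to route them by a shared key. The sketch below assumes that related writes (such as the question and answer in the conversation example) share a conversation ID; `shard_for` and the ID used are hypothetical.

```python
import hashlib


def shard_for(conversation_id: str, num_shards: int) -> int:
    """Map every write in a conversation to one shard via a stable hash."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4
question_shard = shard_for("cake-poons-chat", NUM_SHARDS)
answer_shard = shard_for("cake-poons-chat", NUM_SHARDS)
```

Because both writes hash to the same shard, that shard's own ordering preserves the question-then-answer sequence. This only helps when the application can identify such a shared key up front.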
 ## Solutions for Replication Lag
@ -700,15 +698,15 @@ synchronously updated follower. However, dealing with these issues in applicatio
 and easy to get wrong.
 The simplest programming model for application developers is to choose a database that provides a
-strong consistency guarantee for replicas such as linearizability (see [Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency)), and ACID
+strong consistency guarantee for replicas such as linearizability (see [Chapter 10](/ch10.html#ch_consistency)), and ACID
-transactions (see [Chapter 8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch08.html#ch_transactions)). This allows you to mostly ignore the challenges that arise
+transactions (see [Chapter 8](/ch08.html#ch_transactions)). This allows you to mostly ignore the challenges that arise
 from replication, and treat the database as if it had just a single node. In the early 2010s the
 *NoSQL* movement promoted the view that these features limited scalability, and that large-scale
 systems would have to embrace eventual consistency.
 However, since then, a number of databases started providing strong consistency and transactions
 while also offering the fault tolerance, high availability, and scalability advantages of a
-distributed database. As mentioned in [“Relational Model versus Document Model”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch03.html#sec_datamodels_history), this trend is known as *NewSQL* to
+distributed database. As mentioned in [“Relational Model versus Document Model”](/ch03.html#sec_datamodels_history), this trend is known as *NewSQL* to
 contrast with NoSQL (although it’s less about SQL specifically, and more about new approaches to
 scalable transaction management).
@ -758,7 +756,7 @@ single-leader replication, the leader has to be in *one* of the regions, and all
 through that region.
 In a multi-leader configuration, you can have a leader in *each* region.
-[Figure 6-6](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_multi_dc) shows what this architecture might look like. Within each region,
+[Figure 6-6](/ch06.html#fig_replication_multi_dc) shows what this architecture might look like. Within each region,
 regular leader–follower replication is used (with followers maybe in a different availability zone
 from the leader); between regions, each region’s leader replicates its changes to the leaders in
 other regions.
@ -798,7 +796,7 @@ Tolerance of network problems
 Consistency
 :   A single-leader system can provide strong consistency guarantees, such as serializable
-    transactions, which we will discuss in [Chapter 8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch08.html#ch_transactions). The biggest downside of multi-leader
+    transactions, which we will discuss in [Chapter 8](/ch08.html#ch_transactions). The biggest downside of multi-leader
    systems is that the consistency they can achieve is much weaker. For example, you can’t guarantee
    that a bank account won’t go negative or that a username is unique: it’s always possible for
    different leaders to process writes that are individually fine (paying out some of the money in an
@ -808,7 +806,7 @@ Consistency
    This is simply a fundamental limitation of distributed systems
    [^28].
    If you need to enforce such constraints, you’re therefore better off with a single-leader system.
-    However, as we will see in [“Dealing with Conflicting Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_write_conflicts), multi-leader systems can still
+    However, as we will see in [“Dealing with Conflicting Writes”](/ch06.html#sec_replication_write_conflicts), multi-leader systems can still
    achieve consistency properties that are useful in a wide range of apps that don’t need such
    constraints.
@ -826,17 +824,17 @@ multi-leader replication is often considered dangerous territory that should be
 ### Multi-leader replication topologies
 A *replication topology* describes the communication paths along which writes are propagated from
-one node to another. If you have two leaders, like in [Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict), there is
+one node to another. If you have two leaders, like in [Figure 6-9](/ch06.html#fig_replication_write_conflict), there is
 only one plausible topology: leader 1 must send all of its writes to leader 2, and vice versa. With
 more than two leaders, various different topologies are possible. Some examples are illustrated in
-[Figure 6-7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_topologies).
+[Figure 6-7](/ch06.html#fig_replication_topologies).
 ![ddia 0607](/fig/ddia_0607.png)
 ###### Figure 6-7. Three example topologies in which multi-leader replication can be set up.
 The most general topology is *all-to-all*, shown in
-[Figure 6-7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_topologies)(c),
+[Figure 6-7](/ch06.html#fig_replication_topologies)(c),
 in which every leader sends its writes to every other leader. However, more restricted topologies
 are also used: for example, a *circular topology* in which each node receives writes from one node
 and forwards those writes (plus any writes of its own) to one other node. Another popular topology
@ -845,7 +843,7 @@ star topology can be generalized to a tree.
 > [!NOTE]
 > Don’t confuse a star-shaped network topology with a *star schema* (see
-> [“Stars and Snowflakes: Schemas for Analytics”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch03.html#sec_datamodels_analytics)), which describes the structure of a data model.
+> [“Stars and Snowflakes: Schemas for Analytics”](/ch03.html#sec_datamodels_analytics)), which describes the structure of a data model.
 In circular and star topologies, a write may need to pass through several nodes before it reaches
 all replicas. Therefore, nodes need to forward data changes they receive from other nodes. To
@ -866,28 +864,28 @@ along different paths, avoiding a single point of failure.
 On the other hand, all-to-all topologies can have issues too. In particular, some network links may
 be faster than others (e.g., due to network congestion), with the result that some replication
-messages may “overtake” others, as illustrated in [Figure 6-8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality).
+messages may “overtake” others, as illustrated in [Figure 6-8](/ch06.html#fig_replication_causality).
 ![ddia 0608](/fig/ddia_0608.png)
 ###### Figure 6-8. With multi-leader replication, writes may arrive in the wrong order at some replicas.
-In [Figure 6-8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality), client A inserts a row into a table on leader 1, and client B
+In [Figure 6-8](/ch06.html#fig_replication_causality), client A inserts a row into a table on leader 1, and client B
 updates that row on leader 3. However, leader 2 may receive the writes in a different order: it may
 first receive the update (which, from its point of view, is an update to a row that does not exist
 in the database) and only later receive the corresponding insert (which should have preceded the
 update).
-This is a problem of causality, similar to the one we saw in [“Consistent Prefix Reads”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_consistent_prefix):
+This is a problem of causality, similar to the one we saw in [“Consistent Prefix Reads”](/ch06.html#sec_replication_consistent_prefix):
 the update depends on the prior insert, so we need to make sure that all nodes process the insert
 first, and then the update. Simply attaching a timestamp to every write is not sufficient, because
 clocks cannot be trusted to be sufficiently in sync to correctly order these events at leader 2 (see
-[Chapter 9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#ch_distributed)).
+[Chapter 9](/ch09.html#ch_distributed)).
 To order these events correctly, a technique called *version vectors* can be used, which we will
-discuss later in this chapter (see [“Detecting Concurrent Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_concurrent)). However, many multi-leader
+discuss later in this chapter (see [“Detecting Concurrent Writes”](/ch06.html#sec_replication_concurrent)). However, many multi-leader
 replication systems don’t use good techniques for ordering updates, leaving them vulnerable to
-issues like the one in [Figure 6-8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality). If you are using multi-leader replication, it
+issues like the one in [Figure 6-8](/ch06.html#fig_replication_causality). If you are using multi-leader replication, it
 is worth being aware of these issues, carefully reading the documentation, and thoroughly testing
 your database to ensure that it really does provide the guarantees you believe it to have.
@ -918,9 +916,9 @@ Sheets for text documents and spreadsheets, Figma for graphics, and Linear for p
 What makes these apps so responsive is that user input is immediately reflected in the user
 interface, without waiting for a network round-trip to the server, and edits by one user are shown
 to their collaborators with low latency
-[[32](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#DayRichter2010),
+[[32](/ch06.html#DayRichter2010),
-[33](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Wallace2019),
+[33](/ch06.html#Wallace2019),
-[34](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Artman2023)].
+[34](/ch06.html#Artman2023)].
 This again results in a multi-leader architecture: each web browser tab that has opened the shared
 file is a replica, and any updates that you make to the file are asynchronously replicated to the
@ -938,9 +936,9 @@ those changes.
 A software library that supports this process is called a *sync engine*. Although the idea has
 existed for a long time, the term has recently gained attention
-[[35](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Saafan2024),
+[[35](/ch06.html#Saafan2024),
-[36](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hagoel2024),
+[36](/ch06.html#Hagoel2024),
-[37](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Jayakar2024)].
+[37](/ch06.html#Jayakar2024)].
 An application that allows a user to continue editing a file while offline (which may be implemented
 using a sync engine) is called *offline-first*
 [^38].
@ -970,7 +968,7 @@ approach has a number of advantages:
  offline is the same as having very large network delay.
 * A sync engine simplifies the programming model for frontend apps, compared to performing explicit
  service calls in application code. Every service call requires error handling, as discussed in
-  [“The problems with remote procedure calls (RPCs)”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch05.html#sec_problems_with_rpc): for example, if a request to update data on a server fails, the user
+  [“The problems with remote procedure calls (RPCs)”](/ch05.html#sec_problems_with_rpc): for example, if a request to update data on a server fails, the user
  interface needs to somehow reflect that error. A sync engine allows the app to perform reads and
  writes on local data, which almost never fails, leading to a more declarative programming style
  [^41].
@ -1007,7 +1005,7 @@ a local-first sync engine on end user devices—is that concurrent writes on dif
 lead to conflicts that need to be resolved.
 For example, consider a wiki page that is simultaneously being edited by two users, as shown in
-[Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict). User 1 changes the title of the page from A to B, and user 2
+[Figure 6-9](/ch06.html#fig_replication_write_conflict). User 1 changes the title of the page from A to B, and user 2
 independently changes the title from A to C. Each user’s change is successfully applied to their
 local leader. However, when the changes are asynchronously replicated, a conflict is detected.
 This problem does not occur in a single-leader database.
@ -1017,13 +1015,13 @@ This problem does not occur in a single-leader database.
 ###### Figure 6-9. A write conflict caused by two leaders concurrently updating the same record.
 > [!NOTE]
-> We say that the two writes in [Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict) are *concurrent* because neither
+> We say that the two writes in [Figure 6-9](/ch06.html#fig_replication_write_conflict) are *concurrent* because neither
 > was “aware” of the other at the time the write was originally made. It doesn’t matter whether the
 > writes literally happened at the same time; indeed, if the writes were made while offline, they
 > might have actually happened some time apart. What matters is whether one write occurred in a state
 > where the other write has already taken effect.
-In [“Detecting Concurrent Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_concurrent) we will tackle the question of how a database can determine
+In [“Detecting Concurrent Writes”](/ch06.html#sec_replication_concurrent) we will tackle the question of how a database can determine
 whether two writes are concurrent. For now we will assume that we can detect conflicts, and we want
 to figure out the best way of resolving them.
@ -1052,13 +1050,13 @@ Another example of conflict avoidance: imagine you want to insert new records an
 IDs for them based on an auto-incrementing counter. If you have two leaders, you could set them up
 so that one leader only generates odd numbers and the other only generates even numbers. That way
 you can be sure that the two leaders won’t concurrently assign the same ID to different records.
-We will discuss other ID assignment schemes in [“ID Generators and Logical Clocks”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#sec_consistency_logical).
+We will discuss other ID assignment schemes in [“ID Generators and Logical Clocks”](/ch10.html#sec_consistency_logical).
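The odd/even scheme described above can be sketched in a few lines. This is only an illustration: the helper name `id_generator` and its parameters are hypothetical, and it generalizes the two-leader case by giving each leader a distinct starting offset and a step equal to the number of leaders.

```python
from itertools import count, islice


def id_generator(leader_index: int, num_leaders: int):
    """Yield IDs starting at leader_index + 1, stepping by num_leaders.

    Hypothetical helper: with two leaders, leader 0 yields odd numbers
    and leader 1 yields even numbers, so the sequences never collide.
    """
    return count(start=leader_index + 1, step=num_leaders)


leader_a = id_generator(0, 2)  # odd IDs
leader_b = id_generator(1, 2)  # even IDs

ids_a = list(islice(leader_a, 5))
ids_b = list(islice(leader_b, 5))
print(ids_a, ids_b)
```

The two leaders never need to coordinate, but the number of leaders is fixed in the step size, so adding a leader later would require a different scheme.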
 ### Last write wins (discarding concurrent writes)
 If conflicts can’t be avoided, the simplest way of resolving them is to attach a timestamp to each
 write, and to always use the value with the greatest timestamp. For example, in
-[Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict), let’s say that the timestamp of user 1’s write is greater than
+[Figure 6-9](/ch06.html#fig_replication_write_conflict), let’s say that the timestamp of user 1’s write is greater than
 the timestamp of user 2’s write. In that case, both leaders will determine that the new title of the
 page should be B, and they discard the write that sets it to C. If the writes coincidentally have
 the same timestamp, the winner can be chosen by comparing the values (e.g., in the case of strings,
@ -1066,7 +1064,7 @@ taking the one that’s earlier in the alphabet).
 This approach is called *last write wins* (LWW) because the write with the greatest timestamp can be
 considered the “last” one. The term is misleading though, because when two writes are concurrent
-like in [Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict), which one is older and which is later is undefined, and
+like in [Figure 6-9](/ch06.html#fig_replication_write_conflict), which one is older and which is later is undefined, and
 so the timestamp order of concurrent writes is essentially random.
 Therefore the real meaning of LWW is: when the same record is concurrently written on different
@ -1084,7 +1082,7 @@ Another problem with LWW is that if a real-time clock (e.g. a Unix timestamp) is
 for the writes, the system becomes very sensitive to clock synchronization. If one node has a clock
 that is ahead of the others, and you try to overwrite a value written by that node, your write may
 be ignored as it may have a lower timestamp, even though it clearly occurred later. This problem can
-be solved by using a *logical clock*, which we will discuss in [“ID Generators and Logical Clocks”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#sec_consistency_logical).
+be solved by using a *logical clock*, which we will discuss in [“ID Generators and Logical Clocks”](/ch10.html#sec_consistency_logical).
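As a minimal sketch of the LWW rule described in this section, the merge function below picks the write with the greater timestamp and breaks ties by comparing values. The `Write` class, the function name, and the integer timestamps are assumptions made for illustration, not the API of any particular database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    timestamp: int  # illustrative; could be a wall-clock or logical timestamp
    value: str


def lww_merge(a: Write, b: Write) -> Write:
    """Return the winning write; the other one is discarded."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Equal timestamps: pick the value that is earlier in the alphabet,
    # so that every node chooses the same winner.
    return a if a.value <= b.value else b


user1 = Write(timestamp=105, value="B")
user2 = Write(timestamp=100, value="C")
winner = lww_merge(user1, user2)

# A node whose clock runs ahead wins even against a write made later.
from_fast_clock = Write(timestamp=200, value="old")
made_later = Write(timestamp=150, value="new")
skewed_winner = lww_merge(from_fast_clock, made_later)
```

Because the function gives the same result regardless of argument order, all leaders converge on the same value. The second example shows the clock-skew problem: the write that actually happened later is silently dropped.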
 ### Manual conflict resolution
@ -1096,7 +1094,7 @@ merge is complete.
 In a database, it would be impractical for a conflict to stop the entire replication process until a
 human has resolved it. Instead, databases typically store all the concurrently written values for a
-given record—for example, both B and C in [Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict). These values are
+given record—for example, both B and C in [Figure 6-9](/ch06.html#fig_replication_write_conflict). These values are
 sometimes called *siblings*. The next time you query that record, the database returns *all* those
 values, rather than just the latest one. You can then resolve those values in whatever way you want,
 either automatically in application code (for example, you could concatenate B and C into “B/C”), or
@ -1120,7 +1118,7 @@ suffers from a number of problems:
  sibling, but another sibling still contained that old item, the removed item would unexpectedly
  reappear in the customer’s cart
  [^45].
-  [Figure 6-10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_amazon_anomaly) shows an example where Device 1 removes Book from the shopping
+  [Figure 6-10](/ch06.html#fig_replication_amazon_anomaly) shows an example where Device 1 removes Book from the shopping
  cart and concurrently Device 2 removes DVD, but after merging the conflict both items reappear.
 * If multiple nodes observe the conflict and concurrently resolve it, the conflict resolution
  process can itself introduce a new conflict. Those resolutions could even be inconsistent: for
@ -1149,7 +1147,7 @@ updates as much as possible, and hence avoiding data loss:
  same position, it can be ordered deterministically so that all nodes get the same merged outcome.
 * If the data is a collection of items (ordered like a to-do list, or unordered like a shopping
  cart), we can merge it similarly to text by tracking insertions and deletions. To avoid the
-  shopping cart issue in [Figure 6-10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_amazon_anomaly), the algorithms track the fact that Book
+  shopping cart issue in [Figure 6-10](/ch06.html#fig_replication_amazon_anomaly), the algorithms track the fact that Book
  and DVD were deleted, so the merged result is Cart = {Soap}.
 * If the data is an integer representing a counter that can be incremented or decremented (e.g., the
  number of likes on a social media post), the merge algorithm can tell how many increments and
@ -1175,7 +1173,7 @@ Two families of algorithms are commonly used to implement automatic conflict res
 They have different design philosophies and performance characteristics, but both are able to
 perform automatic merges for all the aforementioned types of data.
-[Figure 6-11](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_ot_crdt) shows an example of how OT and a CRDT merge concurrent updates to a
+[Figure 6-11](/ch06.html#fig_replication_ot_crdt) shows an example of how OT and a CRDT merge concurrent updates to a
 text. Assume you have two replicas that both start off with the text “ice”. One replica prepends the
 letter “n” to make “nice”, while concurrently the other replica appends an exclamation mark to make
 “ice!”.
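For the "ice" scenario, the index-shifting idea behind OT can be sketched as follows. This is a deliberately simplified illustration that handles only single-character insertions; the function names are hypothetical, and real OT algorithms also cover deletions and need a deterministic tie-break when two insertions target the same index.

```python
def apply_insert(text: str, index: int, char: str) -> str:
    """Insert char into text at the given index."""
    return text[:index] + char + text[index:]


def transform_index(index: int, applied_index: int) -> int:
    """Adjust an incoming insertion index for a concurrent insertion
    that has already been applied locally at applied_index.

    Simplification: equal indexes are not handled deterministically here.
    """
    return index + 1 if applied_index <= index else index


# Replica 1 prepends "n" at index 0, then receives the insertion of "!" at 3.
replica1 = apply_insert("ice", 0, "n")
replica1 = apply_insert(replica1, transform_index(3, 0), "!")

# Replica 2 appends "!" at index 3, then receives the insertion of "n" at 0.
replica2 = apply_insert("ice", 3, "!")
replica2 = apply_insert(replica2, transform_index(0, 3), "n")

print(replica1, replica2)
```

Without the transformation, replica 1 would insert "!" at index 3 of "nice", which would put it in the wrong place.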
@ -1196,7 +1194,7 @@ OT
 CRDT
 :   Most CRDTs give each character a unique, immutable ID and use those to determine the positions of
-    insertions/deletions, instead of indexes. For example, in [Figure 6-11](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_ot_crdt) we assign
+    insertions/deletions, instead of indexes. For example, in [Figure 6-11](/ch06.html#fig_replication_ot_crdt) we assign
    the ID 1A to “i”, the ID 2A to “c”, etc. When inserting the exclamation mark, we generate an
    operation containing the ID of the new character (4B) and the ID of the existing character after
    which we want to insert (3A). To insert at the beginning of the string we give “nil” as the
@ -1218,7 +1216,7 @@ Sync engines for JSON data can be implemented both with CRDTs (e.g., Automerge o
 ### What is a conflict?
-Some kinds of conflict are obvious. In the example in [Figure 6-9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_write_conflict), two writes
+Some kinds of conflict are obvious. In the example in [Figure 6-9](/ch06.html#fig_replication_write_conflict), two writes
 concurrently modified the same field in the same record, setting it to two different values. There
 is little doubt that this is a conflict.
@ -1232,7 +1230,7 @@ are made on two different leaders.
 There isn’t a quick ready-made answer, but in the following chapters we will trace a path toward a
 good understanding of this problem. We will see some more examples of conflicts in
-[Chapter 8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch08.html#ch_transactions), and in [Link to Come] we will discuss scalable approaches for detecting and
+[Chapter 8](/ch08.html#ch_transactions), and in [Link to Come] we will discuss scalable approaches for detecting and
 resolving conflicts in a replicated system.
 # Leaderless Replication
@ -1245,8 +1243,8 @@ writes in the same order.
 Some data storage systems take a different approach, abandoning the concept of a leader and
 allowing any replica to directly accept writes from clients. Some of the earliest replicated data
-systems were leaderless [[1](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Lindsay1979_ch6),
+systems were leaderless [[1](/ch06.html#Lindsay1979_ch6),
-[50](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Gifford1979)], but the
+[50](/ch06.html#Gifford1979)], but the
 idea was mostly forgotten during the era of dominance of relational databases. It once again became
 a fashionable architecture for databases after Amazon used it for its in-house *Dynamo* system in
 2007 [^45].
@ -1270,10 +1268,10 @@ profound consequences for the way the database is used.
 Imagine you have a database with three replicas, and one of the replicas is currently
 unavailable—perhaps it is being rebooted to install a system update. In a single-leader
 configuration, if you want to continue processing writes, you may need to perform a failover (see
-[“Handling Node Outages”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_failover)).
+[“Handling Node Outages”](/ch06.html#sec_replication_failover)).
 On the other hand, in a leaderless configuration, failover does not exist.
-[Figure 6-12](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_quorum_node_outage) shows what happens: the client (user 1234) sends the write to
+[Figure 6-12](/ch06.html#fig_replication_quorum_node_outage) shows what happens: the client (user 1234) sends the write to
 all three replicas in parallel, and the two available replicas accept the write but the unavailable
 replica misses it. Let’s say that it’s sufficient for two out of three replicas to
 acknowledge the write: after user 1234 has received two *ok* responses, we consider the write to be
@ -1294,9 +1292,9 @@ stale value from another.
 In order to tell which responses are up-to-date and which are outdated, every value that is written
 needs to be tagged with a version number or timestamp, similarly to what we saw in
-[“Last write wins (discarding concurrent writes)”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_lww). When a client receives multiple values in response to a read, it uses the
+[“Last write wins (discarding concurrent writes)”](/ch06.html#sec_replication_lww). When a client receives multiple values in response to a read, it uses the
 one with the greatest timestamp (even if that value was only returned by one replica, and several
-other replicas returned older values). See [“Detecting Concurrent Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_concurrent) for more details.
+other replicas returned older values). See [“Detecting Concurrent Writes”](/ch06.html#sec_replication_concurrent) for more details.
 ### Catching up on missed writes
@ -1306,7 +1304,7 @@ mechanisms are used in Dynamo-style datastores:
 Read repair
 :   When a client makes a read from several nodes in parallel, it can detect any stale responses.
-    For example, in [Figure 6-12](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_quorum_node_outage), user 2345 gets a version 6 value from
+    For example, in [Figure 6-12](/ch06.html#fig_replication_quorum_node_outage), user 2345 gets a version 6 value from
    replica 3 and a version 7 value from replicas 1 and 2. The client sees that replica 3 has a stale
    value and writes the newer value back to that replica. This approach works well for values that are
    frequently read.
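Read repair can be sketched roughly as below. The in-memory dictionaries stand in for replicas, and the replica names, the key, and the stored values are made up for illustration. A real client would issue network requests in parallel and handle timeouts.

```python
Versioned = tuple[int, str]  # (version number, value)


def read_with_repair(replicas: dict[str, dict[str, Versioned]], key: str) -> Versioned:
    """Read key from every replica, return the newest value,
    and write it back to any replica that returned a stale version."""
    responses = {name: store[key] for name, store in replicas.items()}
    newest = max(responses.values(), key=lambda response: response[0])
    for name, response in responses.items():
        if response[0] < newest[0]:
            replicas[name][key] = newest  # repair the stale replica
    return newest


# Replica 3 missed the latest write and still holds version 6.
replicas = {
    "replica1": {"user:1234": (7, "new.jpg")},
    "replica2": {"user:1234": (7, "new.jpg")},
    "replica3": {"user:1234": (6, "old.jpg")},
}
result = read_with_repair(replicas, "user:1234")
```

After the call, all three replicas hold version 7. A value that is never read is never repaired this way, which is why a background process is also needed.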
@ -1326,7 +1324,7 @@ Anti-entropy
 ### Quorums for reading and writing
-In the example of [Figure 6-12](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_quorum_node_outage), we considered the write to be successful
+In the example of [Figure 6-12](/ch06.html#fig_replication_quorum_node_outage), we considered the write to be successful
 even though it was only processed on two out of three replicas. What if only one out of three
 replicas accepted the write? How far can we push this?
@ -1354,7 +1352,7 @@ database writes to fail.
 > [!NOTE]
 > There may be more than *n* nodes in the cluster, but any given value is stored only on *n*
 > nodes. This allows the dataset to be sharded, supporting datasets that are larger than you can fit
-> on one node. We will return to sharding in [Chapter 7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#ch_sharding).
+> on one node. We will return to sharding in [Chapter 7](/ch07.html#ch_sharding).
 The quorum condition, *w* + *r* > *n*, allows the system to tolerate unavailable nodes
 as follows:
@ -1362,9 +1360,9 @@ as follows:
 * If *w* < *n*, we can still process writes if a node is unavailable.
 * If *r* < *n*, we can still process reads if a node is unavailable.
 * With *n* = 3, *w* = 2, *r* = 2 we can tolerate one unavailable
-  node, like in [Figure 6-12](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_quorum_node_outage).
+  node, like in [Figure 6-12](/ch06.html#fig_replication_quorum_node_outage).
 * With *n* = 5, *w* = 3, *r* = 3 we can tolerate two unavailable nodes.
-  This case is illustrated in [Figure 6-13](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_quorum_overlap).
+  This case is illustrated in [Figure 6-13](/ch06.html#fig_replication_quorum_overlap).
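The arithmetic in the list above can be checked with a short sketch. The function names are hypothetical; the brute-force check simply enumerates every possible write set and read set to confirm that they share at least one node whenever *w* + *r* > *n*.

```python
from itertools import combinations


def tolerated_failures(n: int, w: int, r: int) -> int:
    """Number of unavailable nodes with which both reads and writes still succeed."""
    return min(n - w, n - r)


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Brute-force check that every w-node write set meets every r-node read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


for n, w, r in [(3, 2, 2), (5, 3, 3), (3, 1, 2)]:
    print(n, w, r, quorums_always_overlap(n, w, r), tolerated_failures(n, w, r))
```

The last configuration, with *w* + *r* = *n*, shows what goes wrong when the condition is not met: a write to one node and a read from the other two can miss each other entirely.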
 Normally, reads and writes are sent to all *n* replicas in parallel. The parameters *w* and
 *r* determine how many nodes we wait for—i.e., how many of the *n* nodes need to report success
@ -1386,7 +1384,7 @@ If you have *n* replicas, and you choose *w* and *r* such that *w* + *r* > *n*
 generally expect every read to return the most recent value written for a key. This is the case because the
 set of nodes to which you’ve written and the set of nodes from which you’ve read must overlap. That
 is, among the nodes you read there must be at least one node with the latest value (illustrated in
-[Figure 6-13](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_quorum_overlap)).
+[Figure 6-13](/ch06.html#fig_replication_quorum_overlap)).
 Often, *r* and *w* are chosen to be a majority (more than *n*/2) of nodes, because that ensures
 *w* + *r* > *n* while still tolerating fewer than *n*/2 node failures. But quorums are
@ -1413,12 +1411,12 @@ properties can be confusing. Some scenarios include:
  value, the number of replicas storing the new value may fall below *w*, breaking the quorum
  condition.
 * While a rebalancing is in progress, where some data is moved from one node to another (see
-  [Chapter 7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#ch_sharding)), nodes may have inconsistent views of which nodes should be holding the *n*
+  [Chapter 7](/ch07.html#ch_sharding)), nodes may have inconsistent views of which nodes should be holding the *n*
  replicas for a particular value. This can result in the read and write quorums no longer
  overlapping.
 * If a read is concurrent with a write operation, the read may or may not see the concurrently
  written value. In particular, it’s possible for one read to see the new value, and a subsequent
-  read to see the old value, as we shall see in [“Linearizability and quorums”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#sec_consistency_quorum_linearizable).
+  read to see the old value, as we shall see in [“Linearizability and quorums”](/ch10.html#sec_consistency_quorum_linearizable).
 * If a write succeeded on some replicas but failed on others (for example because the disks on some
  nodes are full), and overall succeeded on fewer than *w* replicas, it is not rolled back on the
  replicas where it succeeded. This means that if a write was reported as failed, subsequent reads
@ -1426,12 +1424,12 @@ properties can be confusing. Some scenarios include:
  [^52].
 * If the database uses timestamps from a real-time clock to determine which write is newer (as
  Cassandra and ScyllaDB do, for example), writes might be silently dropped if another node with a
-  faster clock has written to the same key—an issue we previously saw in [“Last write wins (discarding concurrent writes)”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_lww).
+  faster clock has written to the same key—an issue we previously saw in [“Last write wins (discarding concurrent writes)”](/ch06.html#sec_replication_lww).
-  We will discuss this in more detail in [“Relying on Synchronized Clocks”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#sec_distributed_clocks_relying).
+  We will discuss this in more detail in [“Relying on Synchronized Clocks”](/ch09.html#sec_distributed_clocks_relying).
 * If two writes occur concurrently, one of them might be processed first on one replica, and the
  other might be processed first on another replica. This leads to a conflict, similarly to what we
-  saw for multi-leader replication (see [“Dealing with Conflicting Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_write_conflicts)). We will return to this
+  saw for multi-leader replication (see [“Dealing with Conflicting Writes”](/ch06.html#sec_replication_write_conflicts)). We will return to this
-  topic in [“Detecting Concurrent Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_concurrent).
+  topic in [“Detecting Concurrent Writes”](/ch06.html#sec_replication_concurrent).
 Thus, although quorums appear to guarantee that a read returns the latest written value, in practice
 it is not so simple. Dynamo-style databases are generally optimized for use cases that can tolerate
@ -1463,7 +1461,7 @@ able to quantify “eventual.”
 A replication system based on a single leader can provide strong consistency guarantees that are
 difficult or impossible to achieve in a leaderless system. However, as we have seen in
-[“Problems with Replication Lag”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_lag), reads in a leader-based replicated system can also return stale values if
+[“Problems with Replication Lag”](/ch06.html#sec_replication_lag), reads in a leader-based replicated system can also return stale values if
 you make them on an asynchronously updated follower.
 Reading from the leader ensures up-to-date responses, but it suffers from performance problems:
@ -1507,7 +1505,7 @@ That said, leaderless systems can have performance problems as well:
  to wait for before a request can complete. Even if you wait only for the fastest *r* or *w*
  replicas to respond, and even if you make the requests in parallel, a bigger *r* or *w* increases
  the chance that you hit a slow replica, increasing the overall response time (see
-  [“Use of Response Time Metrics”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch02.html#sec_introduction_slo_sla)).
+  [“Use of Response Time Metrics”](/ch02.html#sec_introduction_slo_sla)).
 * A large-scale network interruption that disconnects a client from a large number of replicas can
  make it impossible to form a quorum. Some leaderless databases offer a configuration option that
  allows any reachable replica to accept writes, even if it’s not one of the usual replicas for that
@ -1526,7 +1524,7 @@ fault tolerance while also having a high likelihood of reading up-to-date data.
 ### Multi-region operation
 We previously discussed cross-region replication as a use case for multi-leader replication (see
-[“Multi-Leader Replication”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_multi_leader)). Leaderless replication is also suitable for
+[“Multi-Leader Replication”](/ch06.html#sec_replication_multi_leader)). Leaderless replication is also suitable for
 multi-region operation, since it is designed to tolerate conflicting concurrent writes, network
 interruptions, and latency spikes.
@ -1549,7 +1547,7 @@ resulting in conflicts that need to be resolved. Such conflicts may occur as the
 not always: they could also be detected later during read repair, hinted handoff, or anti-entropy.
 The problem is that events may arrive in a different order at different nodes, due to variable
-network delays and partial failures. For example, [Figure 6-14](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_concurrency) shows two clients,
+network delays and partial failures. For example, [Figure 6-14](/ch06.html#fig_replication_concurrency) shows two clients,
 A and B, simultaneously writing to a key *X* in a three-node datastore:
 * Node 1 receives the write from A, but never receives the write from B due to a transient
@ -1563,13 +1561,13 @@ A and B, simultaneously writing to a key *X* in a three-node datastore:
 If each node simply overwrote the value for a key whenever it received a write request from a
 client, the nodes would become permanently inconsistent, as shown by the final *get* request in
-[Figure 6-14](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_concurrency): node 2 thinks that the final value of *X* is B, whereas the other
+[Figure 6-14](/ch06.html#fig_replication_concurrency): node 2 thinks that the final value of *X* is B, whereas the other
 nodes think that the value is A.
 In order to become eventually consistent, the replicas should converge toward the same value. For
 this, we can use any of the conflict resolution mechanisms we previously discussed in
-[“Dealing with Conflicting Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_write_conflicts), such as last-write-wins (used by Cassandra and ScyllaDB),
+[“Dealing with Conflicting Writes”](/ch06.html#sec_replication_write_conflicts), such as last-write-wins (used by Cassandra and ScyllaDB),
-manual resolution, or CRDTs (described in [“CRDTs and Operational Transformation”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_crdts), and used by Riak).
+manual resolution, or CRDTs (described in [“CRDTs and Operational Transformation”](/ch06.html#sec_replication_crdts), and used by Riak).
 Last-write-wins is easy to implement: each write is tagged with a timestamp, and a value with a
 higher timestamp always overwrites a value with a lower timestamp. However, a timestamp doesn’t tell
@ -1582,11 +1580,11 @@ take more care to detect concurrent writes.
 How do we decide whether two operations are concurrent or not? To develop an intuition, let’s look
 at some examples:
-* In [Figure 6-8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality), the two writes are not concurrent: A’s insert *happens before*
+* In [Figure 6-8](/ch06.html#fig_replication_causality), the two writes are not concurrent: A’s insert *happens before*
  B’s increment, because the value incremented by B is the value inserted by A. In other words, B’s
  operation builds upon A’s operation, so B’s operation must have happened later.
  We also say that B is *causally dependent* on A.
-* On the other hand, the two writes in [Figure 6-14](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_concurrency) are concurrent: when each
+* On the other hand, the two writes in [Figure 6-14](/ch06.html#fig_replication_concurrency) are concurrent: when each
  client starts the operation, it does not know that another client is also performing an operation
  on the same key. Thus, there is no causal dependency between the operations.
@ -1607,7 +1605,7 @@ conflict that needs to be resolved.
 It may seem that two operations should be called concurrent if they occur “at the same time”—but
 in fact, it is not important whether they literally overlap in time. Because of problems with clocks
 in distributed systems, it is actually quite difficult to tell whether two things happened
-at exactly the same time—an issue we will discuss in more detail in [Chapter 9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#ch_distributed).
+at exactly the same time—an issue we will discuss in more detail in [Chapter 9](/ch09.html#ch_distributed).
 For defining concurrency, exact time doesn’t matter: we simply call two operations concurrent if
 they are both unaware of each other, regardless of the physical time at which they occurred. People
@ -1629,7 +1627,7 @@ happened before another. To keep things simple, let’s start with a database th
 replica. Once we have worked out how to do this on a single replica, we can generalize the approach
 to a leaderless database with multiple replicas.
-[Figure 6-15](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality_single) shows two clients concurrently adding items to the same
+[Figure 6-15](/ch06.html#fig_replication_causality_single) shows two clients concurrently adding items to the same
 shopping cart. (If that example strikes you as too inane, imagine instead two air traffic
 controllers concurrently adding aircraft to the sector they are tracking.) Initially, the cart is
 empty. Between them, the clients make five writes to the database:
@ -1664,8 +1662,8 @@ empty. Between them, the clients make five writes to the database:
 ###### Figure 6-15. Capturing causal dependencies between two clients concurrently editing a shopping cart.
-The dataflow between the operations in [Figure 6-15](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality_single) is illustrated
+The dataflow between the operations in [Figure 6-15](/ch06.html#fig_replication_causality_single) is illustrated
-graphically in [Figure 6-16](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causal_dependencies). The arrows indicate which operation
+graphically in [Figure 6-16](/ch06.html#fig_replication_causal_dependencies). The arrows indicate which operation
 *happened before* which other operation, in the sense that the later operation *knew about* or
 *depended on* the earlier one. In this example, the clients are never fully up to date with the data
 on the server, since there is always another operation going on concurrently. But old versions of
@ -1673,7 +1671,7 @@ the value do get overwritten eventually, and no writes are lost.
 ![ddia 0616](/fig/ddia_0616.png)
-###### Figure 6-16. Graph of causal dependencies in [Figure 6-15](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality_single).
+###### Figure 6-16. Graph of causal dependencies in [Figure 6-15](/ch06.html#fig_replication_causality_single).
 Note that the server can determine whether two operations are concurrent by looking at the version
 numbers—it does not need to interpret the value itself (so the value could be any data
@ -1699,10 +1697,10 @@ on subsequent reads.
 ### Version vectors
-The example in [Figure 6-15](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality_single) used only a single replica. How does the
+The example in [Figure 6-15](/ch06.html#fig_replication_causality_single) used only a single replica. How does the
 algorithm change when there are multiple replicas, but no leader?
-[Figure 6-15](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality_single) uses a single version number to capture dependencies between
+[Figure 6-15](/ch06.html#fig_replication_causality_single) uses a single version number to capture dependencies between
 operations, but that is not sufficient when there are multiple replicas accepting writes
 concurrently. Instead, we need to use a version number *per replica* as well as per key. Each
 replica increments its own version number when processing a write, and also keeps track of the
@ -1713,14 +1711,14 @@ The collection of version numbers from all the replicas is called a *version vec
 [^58].
 A few variants of this idea are in use, but the most interesting is probably the *dotted version
 vector*
-[[59](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Preguica2010),
+[[59](/ch06.html#Preguica2010),
-[60](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Manepalli2022)],
+[60](/ch06.html#Manepalli2022)],
 which is used in Riak 2.0
-[[61](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Cribbs2014),
+[[61](/ch06.html#Cribbs2014),
-[62](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Brown2015)].
+[62](/ch06.html#Brown2015)].
 We won’t go into the details, but the way it works is quite similar to what we saw in our cart example.
-Like the version numbers in [Figure 6-15](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_causality_single), version vectors are sent from the
+Like the version numbers in [Figure 6-15](/ch06.html#fig_replication_causality_single), version vectors are sent from the
 database replicas to clients when values are read, and need to be sent back to the database when a
 value is subsequently written. (Riak encodes the version vector as a string that it calls *causal
 context*.) The version vector allows the database to distinguish between overwrites and concurrent
@ -1734,12 +1732,12 @@ siblings are merged correctly.
 A *version vector* is sometimes also called a *vector clock*, even though they are not quite the
 same. The difference is subtle—please see the references for details
-[[60](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Manepalli2022),
+[[60](/ch06.html#Manepalli2022),
-[63](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Baquero2011),
+[63](/ch06.html#Baquero2011),
-[64](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Schwarz1994)]. In brief, when
+[64](/ch06.html#Schwarz1994)]. In brief, when
 comparing the state of replicas, version vectors are the right data structure to use.
-# Summary
+## Summary
 In this chapter we looked at the issue of replication. Replication can serve several purposes:
@ -1816,10 +1814,10 @@ This chapter has assumed that every replica stores a full copy of the whole data
 unrealistic for large datasets. In the next chapter we will look at *sharding*, which allows each
 machine to store only a subset of the data.
 ##### Footnotes
-##### References
+
 ### Summary
 [^1]: B. G. Lindsay, P. G. Selinger, C. Galtieri, J. N. Gray, R. A. Lorie, T. G. Price, F. Putzolu, I. L. Traiger, and B. W. Wade. [Notes on Distributed Databases](https://dominoweb.draco.res.ibm.com/reports/RJ2571.pdf). IBM Research, Research Report RJ2571(33471), July 1979. Archived at [perma.cc/EPZ3-MHDD](https://perma.cc/EPZ3-MHDD)
--- a/content/en/ch7.md
+++ b/content/en/ch7.md
@ -51,14 +51,12 @@ Some databases treat partitions and shards as two distinct concepts. For example
 partitioning is a way of splitting a large table into several files that are stored on the same
 machine (which has several advantages, such as making it very fast to delete an entire partition),
 whereas sharding splits a dataset across multiple machines
-[[1](/en/ch7#Giordano2023),
+[[^1], [^2]].
 [2](/en/ch7#Leach2022)].
 In many other systems, partitioning is just another word for sharding.
 While *partitioning* is quite descriptive, the term *sharding* is perhaps surprising. According to
 one theory, the term arose from the online role-play game *Ultima Online*, in which a magic crystal
-was shattered into pieces, and each of those shards refracted a copy of the game world
+was shattered into pieces, and each of those shards refracted a copy of the game world [^3].
 [^3].
 The term *shard* thus came to mean one of a set of parallel game servers, and later was carried over
 to databases. Another theory is that *shard* was originally an acronym of *System for Highly
 Available Replicated Data*—reportedly a 1980s database, details of which are lost to history.
@ -87,8 +85,7 @@ single-shard database.
 The reason for this recommendation is that sharding often adds complexity: you typically have to
 decide which records to put in which shard by choosing a *partition key*; all records with the
-same partition key are placed in the same shard
+same partition key are placed in the same shard [^4].
 [^4].
 This choice matters because accessing a record is fast if you know which shard it’s in, but if you
 don’t know the shard you have to do an inefficient search across all shards, and the sharding scheme
 is difficult to change.
@ -107,11 +104,9 @@ some systems don’t support them at all.
 Some systems use sharding even on a single machine, typically running one single-threaded process
 per CPU core to make use of the parallelism in the CPU, or to take advantage of a *nonuniform memory
-access* (NUMA) architecture in which some banks of memory are closer to one CPU than to others
+access* (NUMA) architecture in which some banks of memory are closer to one CPU than to others [^5].
 [^5].
 For example, Redis, VoltDB, and FoundationDB use one process per core, and rely on sharding to
-spread load across CPU cores in the same machine
+spread load across CPU cores in the same machine [^6].
 [^6].
 ## Sharding for Multitenancy
@ -124,8 +119,7 @@ signups, delivery data etc. are separate from those of other businesses.
 Sometimes sharding is used to implement multitenant systems: either each tenant is given a separate
 shard, or multiple small tenants may be grouped together into a larger shard. These shards might be
 physically separate databases (which we previously touched on in [“Embedded storage engines”](/en/ch4#sidebar_embedded)), or
-separately manageable portions of a larger logical database
+separately manageable portions of a larger logical database [^7].
 [^7].
 Using sharding for multitenancy has several advantages:
 Resource isolation
@ -226,8 +220,7 @@ to distribute the data evenly, the shard boundaries need to adapt to the data.
 The shard boundaries might be chosen manually by an administrator, or the database can choose them
 automatically. Manual key-range sharding is used by Vitess (a sharding layer for MySQL), for
 example; the automatic variant is used by Bigtable, its open source equivalent HBase, the
-range-based sharding option in MongoDB, CockroachDB, RethinkDB, and FoundationDB
+range-based sharding option in MongoDB, CockroachDB, RethinkDB, and FoundationDB [^6]. YugabyteDB offers both manual and automatic
 [^6]. YugabyteDB offers both manual and automatic
 tablet splitting.
 Within each shard, keys are stored in sorted order (e.g., in a B-tree or SSTables, as discussed in
@ -241,8 +234,7 @@ A downside of key range sharding is that you can easily get a hot shard if there
 lot of writes to nearby keys. For example, if the key is a timestamp, then the shards correspond to
 ranges of time—e.g., one shard per month. Unfortunately, if you write data from the sensors to the
 database as the measurements happen, all the writes end up going to the same shard (the one for
-this month), so that shard can be overloaded with writes while others sit idle
+this month), so that shard can be overloaded with writes while others sit idle [^13].
 [^13].
 To avoid this problem in the sensor database, you need to use something other than the timestamp as
 the first element of the key. For example, you could prefix each timestamp with the sensor ID so
@ -256,8 +248,7 @@ need to perform a separate range query for each sensor.
 When you first set up your database, there are no key ranges to split into shards. Some databases,
 such as HBase and MongoDB, allow you to configure an initial set of shards on an empty database,
 which is called *pre-splitting*. This requires that you already have some idea of what the key
-distribution is going to look like, so that you can choose appropriate key range boundaries
+distribution is going to look like, so that you can choose appropriate key range boundaries [^14].
 [^14].
 Later on, as your data volume and write throughput grow, a system with key-range sharding grows by
 splitting an existing shard into two or more smaller shards, each of which holds a contiguous
@ -300,8 +291,7 @@ For sharding purposes, the hash function need not be cryptographically strong: f
 uses MD5, whereas Cassandra and ScyllaDB use Murmur3. Many programming languages have simple hash
 functions built in (as they are used for hash tables), but they may not be suitable for sharding:
 for example, in Java’s `Object.hashCode()` and Ruby’s `Object#hash`, the same key may have a
-different hash value in different processes, making them unsuitable for sharding
+different hash value in different processes, making them unsuitable for sharding [^16].
 [^16].
 ### Hash modulo number of nodes
@ -411,16 +401,14 @@ cluster keys for a table. Delta Lake supports both manual and automatic partitio
 supports cluster keys. Clustering data not only improves range scan performance, but can
 improve compression and filtering performance as well.
-Hash-range sharding is used in YugabyteDB and DynamoDB
+Hash-range sharding is used in YugabyteDB and DynamoDB [^17], and is an option in MongoDB.
 [^17], and is an option in MongoDB.
 Cassandra and ScyllaDB use a variant of this approach that is illustrated in
 [Figure 7-6](/en/ch7#fig_sharding_cassandra): the space of hash values is split into a number of ranges proportional
 to the number of nodes (3 ranges per node in [Figure 7-6](/en/ch7#fig_sharding_cassandra), but actual numbers are 8
 per node in Cassandra by default, and 256 per node in ScyllaDB), with random boundaries between
 those ranges. This means some ranges are bigger than others, but by having multiple ranges per node
 those imbalances tend to even out
-[[15](/en/ch7#Evans2013),
+[[^15], [^18]].
 [18](/en/ch7#Williams2012)].
 ![ddia 0706](/fig/ddia_0706.png)
@ -446,10 +434,8 @@ ACID consistency (see [Chapter 8](/en/ch8#ch_transactions)), but rather describ
 the same shard as much as possible.
 The sharding algorithm used by Cassandra and ScyllaDB is similar to the original definition of
-consistent hashing
+consistent hashing [^20],
-[^20],
+but several other consistent hashing algorithms have also been proposed [^21],
 but several other consistent hashing algorithms have also been proposed
 [^21],
 such as *highest random weight*, also known as *rendezvous hashing*
 [^22],
 and *jump consistent hash*
@ -473,11 +459,9 @@ This event can result in a large volume of reads and writes to the same key (whe
 is perhaps the user ID of the celebrity, or the ID of the action that people are commenting on).
 In such situations, a more flexible sharding policy is required
-[[25](/en/ch7#Guo2020),
+[[^25], [^26]].
 [26](/en/ch7#Lee2021)].
 A system that defines shards based on ranges of keys (or ranges of hashes) makes it possible to put
-an individual hot key in a shard by its own, and perhaps even assigning it a dedicated machine
+an individual hot key in a shard by its own, and perhaps even assigning it a dedicated machine [^27].
 [^27].
 It’s also possible to compensate for skew at the application level. For example, if one key is known
 to be very hot, a simple technique is to add a random number to the beginning or end of the key.
@ -518,16 +502,14 @@ Fully automated rebalancing can be convenient, because there is less operational
 normal maintenance, and such systems can even auto-scale to adapt to changes in workload. Cloud
 databases such as DynamoDB are promoted as being able to automatically add and remove shards to
 adapt to big increases or decreases of load within a matter of minutes
-[[17](/en/ch7#Elhemali2022_ch7),
+[[^17], [^29]].
 [29](/en/ch7#Houlihan2017)].
 However, automatic shard management can also be unpredictable. Rebalancing is an expensive
 operation, because it requires rerouting requests and moving a large amount of data from one node to
 another. If it is not done carefully, this process can overload the network or the nodes, and it
 might harm the performance of other requests. The system must continue processing writes while the
 rebalancing is in progress; if a system is near its maximum write throughput, the shard-splitting
-process might not even be able to keep up with the rate of incoming writes
+process might not even be able to keep up with the rate of incoming writes [^29].
 [^29].
 Such automation can be dangerous in combination with automatic failure detection. For example, say
 one node is overloaded and is temporarily slow to respond to requests. The other nodes conclude that
@ -684,8 +666,7 @@ expensive. Even if you query the shards in parallel, it is prone to tail latency
 shards lets you store more data, but it doesn’t increase your query throughput if every shard has to
 process every query anyway.
-Nevertheless, local secondary indexes are widely used
+Nevertheless, local secondary indexes are widely used [^31]:
 [^31]:
 for example, MongoDB, Riak, Cassandra [^32],
 Elasticsearch [^33], SolrCloud,
 and VoltDB [^34]
@ -742,7 +723,7 @@ indexes, so reads from a global index may be stale (similarly to replication lag
 Nevertheless, global indexes are useful if read throughput is higher than write throughput, and if
 the postings lists are not too long.
-# Summary
+## Summary
 In this chapter we explored different ways of sharding a large dataset into smaller subsets.
 Sharding is necessary when you have so much data that storing and processing it on a single machine
@ -795,10 +776,10 @@ to multiple machines. However, operations that need to write to several shards c
 for example, what happens if the write to one shard succeeds, but another fails? We will address
 that question in the following chapters.
 ##### Footnotes
-##### References
+
 ### Summary
 [^1]: Claire Giordano. [Understanding partitioning and sharding in Postgres and Citus](https://www.citusdata.com/blog/2023/08/04/understanding-partitioning-and-sharding-in-postgres-and-citus/). *citusdata.com*, August 2023. Archived at [perma.cc/8BTK-8959](https://perma.cc/8BTK-8959) 
--- a/content/en/ch8.md
+++ b/content/en/ch8.md
@ -46,8 +46,7 @@ transactional guarantees or abandoning them entirely (for example, to achieve hi
 higher availability). Some safety properties can be achieved without transactions. On the other
 hand, transactions can prevent a lot of grief: for example, the technical cause behind the Post
 Office Horizon scandal (see [“How Important Is Reliability?”](/en/ch2#sidebar_reliability_importance)) was probably a lack of ACID
-transactions in the underlying accounting system
+transactions in the underlying accounting system [^1].
 [^1].
 How do you figure out whether you need transactions? In order to answer that question, we first need
 to understand exactly what safety guarantees transactions can provide, and what costs are associated
@ -68,9 +67,7 @@ the challenge of achieving atomicity in a distributed transaction.
 Almost all relational databases today, and some nonrelational databases, support transactions. Most
 of them follow the style that was introduced in 1975 by IBM System R, the first SQL database
-[[2](/en/ch8#Chamberlin1981),
+[[^2], [^3], [^4]].
 [3](/en/ch8#Gray1976),
 [4](/en/ch8#Eswaran1976)].
 Although some implementation details have changed, the general idea has remained virtually the same
 for 50 years: the transaction support in MySQL, PostgreSQL, Oracle, SQL Server, etc., is uncannily
 similar to that of System R.
@ -85,8 +82,7 @@ much weaker set of guarantees than had previously been understood.
 The hype around NoSQL distributed databases led to a popular belief that transactions were
 fundamentally unscalable, and that any large-scale system would have to abandon transactions in
 order to maintain good performance and high availability. More recently, that belief has turned out
-to be wrong. So-called “NewSQL” databases such as CockroachDB
+to be wrong. So-called “NewSQL” databases such as CockroachDB [^5],
 [^5],
 TiDB [^6],
 Spanner [^7],
 FoundationDB [^8],
@ -103,8 +99,7 @@ operation and in various extreme (but realistic) circumstances.
 The safety guarantees provided by transactions are often described by the well-known acronym *ACID*,
 which stands for *Atomicity*, *Consistency*, *Isolation*, and *Durability*. It was coined in 1983 by
-Theo Härder and Andreas Reuter
+Theo Härder and Andreas Reuter [^9]
 [^9]
 in an effort to establish precise terminology for fault-tolerance mechanisms in databases.
 However, in practice, one database’s implementation of ACID does not equal another’s implementation.
@ -213,15 +208,13 @@ each other: they cannot step on each other’s toes. The classic database textbo
 isolation as *serializability*, which means that each transaction can pretend that it is the only
 transaction running on the entire database. The database ensures that when the transactions have
 committed, the result is the same as if they had run *serially* (one after another), even though in
-reality they may have run concurrently
+reality they may have run concurrently [^13].
 [^13].
 However, serializability has a performance cost. In practice, many databases use forms of isolation
 that are weaker than serializability: that is, they allow concurrent transactions to interfere with
 each other in limited ways. Some popular databases, such as Oracle, don’t even implement it (Oracle
 has an isolation level called “serializable,” but it actually implements *snapshot isolation*, which
-is a weaker guarantee than serializability [[10](/en/ch8#Bailis2013HAT),
+is a weaker guarantee than serializability [[^10], [^14]]).
 [14](/en/ch8#Fekete2005)]).
 This means that some kinds of race conditions can still occur. We will explore snapshot isolation
 and other forms of isolation in [“Weak Isolation Levels”](/en/ch8#sec_transactions_isolation_levels).
@ -264,18 +257,18 @@ The truth is, nothing is perfect:
 guarantees they are supposed to provide: even `fsync` isn’t guaranteed to work correctly
 [^15].
 Disk firmware can have bugs, just like any other kind of software
-  [[16](/en/ch8#Denness2015),
+ [[^16],
-  [17](/en/ch8#Surak2015)],
+ [^17]],
 e.g. causing drives to fail after exactly 32,768 hours of operation
 [^18].
 And `fsync` is hard to use; even PostgreSQL used it incorrectly for over 20 years
-  [[19](/en/ch8#Ringer2018),
+ [[^19],
-  [20](/en/ch8#Rebello2020),
+ [^20],
-  [21](/en/ch8#Pillai2015)].
+ [^21]].
 * Subtle interactions between the storage engine and the filesystem implementation can lead to bugs
 that are hard to track down, and may cause files on disk to be corrupted after a crash
-  [[22](/en/ch8#Pillai2014),
+ [[^22],
-  [23](/en/ch8#Siebenmann2016)].
+ [^23]].
 Filesystem errors on one replica can sometimes spread to other replicas as well
 [^24].
 * Data on disk can gradually become corrupted without this being detected
@ -489,20 +482,15 @@ guarantees that transactions have the same effect as if they ran *serially* (i.e
 without any concurrency).
 In practice, isolation is unfortunately not that simple. Serializable isolation has a performance
-cost, and many databases don’t want to pay that price
+cost, and many databases don’t want to pay that price [^10]. It’s therefore common for systems to use
 [^10]. It’s therefore common for systems to use
 weaker levels of isolation, which protect against *some* concurrency issues, but not all. Those
 levels of isolation are much harder to understand, and they can lead to subtle bugs, but they are
-nevertheless used in practice
+nevertheless used in practice [^29].
 [^29].
 Concurrency bugs caused by weak transaction isolation are not just a theoretical problem. They have
 caused substantial loss of money
-[[30](/en/ch8#Warszawski2017),
+[[^30], [^31], [^32]],
-[31](/en/ch8#DAgosta2014),
+led to investigation by financial auditors [^33],
 [32](/en/ch8#bitcointhief2014)],
 led to investigation by financial auditors
 [^33],
 and caused customer data to be corrupted [^34].
 A popular comment on revelations of such problems is “Use an ACID database if you’re handling
 financial data!”—but that misses the point. Even many popular relational database systems (which
@ -517,8 +505,7 @@ bugs from occurring.
 Those examples also highlight an important point: even if concurrency issues are rare in normal
 operation, you have to consider the possibility that an attacker deliberately sends a burst of
-highly concurrent requests to your API in an attempt to deliberately exploit concurrency bugs
+highly concurrent requests to your API in an attempt to deliberately exploit concurrency bugs [^30]. Therefore, in order to build
 [^30]. Therefore, in order to build
 applications that are reliable and secure, you have to ensure that such bugs are systematically
 prevented.
@ -528,10 +515,7 @@ decide what level is appropriate to your application. Once we’ve done that, we
 serializability in detail (see [“Serializability”](/en/ch8#sec_transactions_serializability)). Our discussion of isolation
 levels will be informal, using examples. If you want rigorous definitions and analyses of their
 properties, you can find them in the academic literature
-[[36](/en/ch8#Berenson1995),
+[[^36], [^37], [^38], [^39]].
 [37](/en/ch8#Adya1999),
 [38](/en/ch8#Bailis2014virtues_ch8),
 [39](/en/ch8#Crooks2017)].
 ## Read Committed
@ -608,8 +592,7 @@ By preventing dirty writes, this isolation level avoids some kinds of concurrenc
 ### Implementing read committed
 Read committed is a very popular isolation level. It is the default setting in Oracle Database,
-PostgreSQL, SQL Server, and many other databases
+PostgreSQL, SQL Server, and many other databases [^10].
 [^10].
 Most commonly, databases prevent dirty writes by using row-level locks: when a transaction wants to
 modify a particular row (or document or some other object), it must first acquire a lock on that
@ -633,8 +616,7 @@ operability: a slowdown in one part of an application can have a knock-on effect
 different part of the application, due to waiting for locks.
 Nevertheless, locks are used to prevent dirty reads in some databases, such as IBM
-Db2 and Microsoft SQL Server in the `read_committed_snapshot=off` setting
+Db2 and Microsoft SQL Server in the `read_committed_snapshot=off` setting [^29].
 [^29].
 A more commonly used approach to preventing dirty reads is the one illustrated in
 [Figure 8-4](/en/ch8#fig_transactions_read_committed): for every
@ -708,9 +690,7 @@ database, frozen at a particular point in time, it is much easier to understand.
 Snapshot isolation is a popular feature: variants of it are supported by PostgreSQL, MySQL with the
 InnoDB storage engine, Oracle, SQL Server, and others, although the detailed behavior varies from
-one system to the next [[29](/en/ch8#Kleppmann2014),
+one system to the next [[^29], [^40], [^41]].
 [40](/en/ch8#Momjian2014),
 [41](/en/ch8#Alvaro2023)].
 Some databases, such as Oracle, TiDB, and Aurora DSQL, even choose snapshot isolation as their
 highest isolation level.
@ -733,9 +713,7 @@ maintains several versions of a row side by side, this technique is known as *mu
 concurrency control* (MVCC).
 [Figure 8-7](/en/ch8#fig_transactions_mvcc) illustrates how MVCC-based snapshot isolation is implemented in PostgreSQL
-[[40](/en/ch8#Momjian2014),
+[[^40], [^42], [^43]] (other implementations are similar).
 [42](/en/ch8#Rogov2023),
 [43](/en/ch8#Suzuki2017_ch8)] (other implementations are similar).
 When a transaction is started, it is given a unique, always-increasing transaction ID (`txid`).
 Whenever a transaction writes anything to the database, the data it writes is tagged with the
 transaction ID of the writer. (To be precise, transaction IDs in PostgreSQL are 32-bit integers, so
@ -754,8 +732,7 @@ At some later time, when it is certain that no transaction can any longer access
 garbage collection process in the database removes any rows marked for deletion and frees their
 space.
-An update is internally translated into a delete and a insert
+An update is internally translated into a delete and a insert [^44].
 [^44].
 For example, in [Figure 8-7](/en/ch8#fig_transactions_mvcc), transaction 13 deducts $100 from account 2, changing the
 balance from $500 to $400. The `accounts` table now actually contains two rows for account 2: a row
 with a balance of $500 which was marked as deleted by transaction 13, and a row with a balance of
@ -765,15 +742,13 @@ All of the versions of a row are stored within the same database heap (see
 [“Storing values within the index”](/en/ch4#sec_storage_index_heap)), regardless of whether the transactions that wrote them have committed
 or not. The versions of the same row form a linked list, going either from newest version to oldest
 version or the other way round, so that queries can internally iterate over all versions of a row
-[[45](/en/ch8#Pavlo2023),
+[[^45], [^46]].
-[46](/en/ch8#Wu2017)].
 ### Visibility rules for observing a consistent snapshot
 When a transaction reads from the database, transaction IDs are used to decide which row versions it
 can see and which are invisible. By carefully defining visibility rules, the database can present a
-consistent snapshot of the database to the application. This works roughly as follows
+consistent snapshot of the database to the application. This works roughly as follows [^43]:
-[^43]:
 1. At the start of each transaction, the database makes a list of all the other transactions that
 are in progress (not yet committed or aborted) at that time. Any writes that those
@ -815,7 +790,7 @@ value matches what the query is looking for. When garbage collection removes old
 are no longer visible to any transaction, the corresponding index entries can also be removed.
 Many implementation details affect the performance of multi-version concurrency control
-[[45](/en/ch8#Pavlo2023), [46](/en/ch8#Wu2017)].
+[[^45], [^46]].
 For example, PostgreSQL has optimizations for avoiding index updates if different versions of the
 same row can fit on the same page [^40].
 Some other databases avoid storing full copies of modified rows, and only store differences between
@ -845,22 +820,17 @@ snapshot isolation, in MySQL it means an implementation of MVCC with weaker cons
 snapshot isolation [^41].
 The reason for this naming confusion is that the SQL standard doesn’t have the concept of snapshot
-isolation, because the standard is based on System R’s 1975 definition of isolation levels
+isolation, because the standard is based on System R’s 1975 definition of isolation levels [^3] and snapshot isolation hadn’t yet been
-[^3] and snapshot isolation hadn’t yet been
 invented then. Instead, it defines repeatable read, which looks superficially similar to snapshot
 isolation. PostgreSQL calls its snapshot isolation level “repeatable read” because it meets the
 requirements of the standard, and so they can claim standards compliance.
 Unfortunately, the SQL standard’s definition of isolation levels is flawed—it is ambiguous,
-imprecise, and not as implementation-independent as a standard should be
+imprecise, and not as implementation-independent as a standard should be [^36]. Even though several databases
-[^36]. Even though several databases
 implement repeatable read, there are big differences in the guarantees they actually provide,
-despite being ostensibly standardized
+despite being ostensibly standardized [^29]. There has been a formal definition of
-[^29]. There has been a formal definition of
+repeatable read in the research literature [[^37], [^38]], but most implementations don’t satisfy that
-repeatable read in the research literature [[37](/en/ch8#Adya1999),
+formal definition. And to top it off, IBM Db2 uses “repeatable read” to refer to serializability [^10].
-[38](/en/ch8#Bailis2014virtues_ch8)], but most implementations don’t satisfy that
-formal definition. And to top it off, IBM Db2 uses “repeatable read” to refer to serializability
-[^10].
 As a result, nobody really knows what repeatable read means.
@ -888,8 +858,7 @@ pattern occurs in various different scenarios:
 * Two users editing a wiki page at the same time, where each user saves their changes by sending the
 entire page contents to the server, overwriting whatever is currently in the database
-Because this is such a common problem, a variety of solutions have been developed
+Because this is such a common problem, a variety of solutions have been developed [^48].
-[^48].
 ### Atomic write operations
@ -915,9 +884,7 @@ Another option is to simply force all atomic operations to be executed on a sing
 Unfortunately, object-relational mapping (ORM) frameworks make it easy to accidentally write code
 that performs unsafe read-modify-write cycles instead of using atomic operations provided by the
-database [[49](/en/ch8#Wiger2010),
+database [[^49], [^50], [^51]].
-[50](/en/ch8#Coglan2020),
-[51](/en/ch8#Bailis2015_ch8)].
 This can be a source of subtle bugs that are difficult to find by testing.
 ### Explicit locking
@ -973,10 +940,8 @@ An advantage of this approach is that databases can perform this check efficient
 with snapshot isolation. Indeed, PostgreSQL’s repeatable read, Oracle’s serializable, and SQL
 Server’s snapshot isolation levels automatically detect when a lost update has occurred and abort
 the offending transaction. However, MySQL/InnoDB’s repeatable read does not detect lost updates
-[[29](/en/ch8#Kleppmann2014),
+[[^29], [^41]].
-[41](/en/ch8#Alvaro2023)].
+Some authors [[^36], [^38]] argue that a database must prevent lost
-Some authors [[36](/en/ch8#Berenson1995),
-[38](/en/ch8#Bailis2014virtues_ch8)] argue that a database must prevent lost
 updates in order to qualify as providing snapshot isolation, so MySQL does not provide snapshot
 isolation under this definition.
@ -1058,8 +1023,7 @@ To begin, imagine this example: you are writing an application for doctors to ma
 shifts at a hospital. The hospital usually tries to have several doctors on call at any one time,
 but it absolutely must have at least one doctor on call. Doctors can give up their shifts (e.g., if
 they are sick themselves), provided that at least one colleague remains on call in that shift
-[[53](/en/ch8#Cahill2008),
+[[^53], [^54]].
-[54](/en/ch8#Ports2012)].
 Now imagine that Aaliyah and Bryce are the two on-call doctors for a particular shift. Both are
 feeling unwell, so they both decide to request leave. Unfortunately, they happen to click the button
@ -1220,8 +1184,7 @@ transaction, is called a *phantom* [^4].
 Snapshot isolation avoids phantoms in read-only queries, but in read-write transactions like the
 examples we discussed, phantoms can lead to particularly tricky cases of write skew. The SQL
 generated by ORMs is also prone to write skew
-[[50](/en/ch8#Coglan2020),
+[[^50], [^51]].
-[51](/en/ch8#Bailis2015_ch8)].
 ### Materializing conflicts
@ -1240,8 +1203,7 @@ isn’t used to store information about the booking—it’s purely a collection
 to prevent bookings on the same room and time range from being modified concurrently.
 This approach is called *materializing conflicts*, because it takes a phantom and turns it into a
-lock conflict on a concrete set of rows that exist in the database
+lock conflict on a concrete set of rows that exist in the database [^14]. Unfortunately, it can be hard and
-[^14]. Unfortunately, it can be hard and
 error-prone to figure out how to materialize conflicts, and it’s ugly to let a concurrency control
 mechanism leak into the application data model. For those reasons, materializing conflicts should be
 considered a last resort if no alternative is possible. A serializable isolation level is much
@ -1293,8 +1255,7 @@ sidestep the problem of detecting and preventing conflicts between transactions:
 isolation is by definition serializable.
 Even though this seems like an obvious idea, it was only in the 2000s that database designers
-decided that a single-threaded loop for executing transactions was feasible
+decided that a single-threaded loop for executing transactions was feasible [^57].
-[^57].
 If multi-threaded concurrency was considered essential for getting good performance during the
 previous 30 years, what changed to make single-threaded execution possible?
@ -1310,9 +1271,7 @@ Two developments caused this rethink:
 outside of the serial execution loop.
 The approach of executing transactions serially is implemented in VoltDB/H-Store, Redis, and Datomic,
-for example [[58](/en/ch8#Hugg2014streaming),
+for example [[^58], [^59], [^60]].
-[59](/en/ch8#Kallman2008),
-[60](/en/ch8#Hickey2012)].
 A system designed for single-threaded execution can sometimes perform better than a system that
 supports concurrency, because it can avoid the coordination overhead of locking. However, its
 throughput is limited to that of a single CPU core. In order to make the most of that single thread,
@ -1425,8 +1384,7 @@ Since cross-shard transactions have additional coordination overhead, they are v
 single-shard transactions. VoltDB reports a throughput of about 1,000 cross-shard writes per second,
 which is orders of magnitude below its single-shard throughput and cannot be increased by adding
 more machines [^61]. More recent research
-has explored ways of making multi-shard transactions more scalable
+has explored ways of making multi-shard transactions more scalable [^63].
-[^63].
 Whether transactions can be single-shard depends very much on the structure of the data used by the
 application. Simple key-value data can often be sharded very easily, but data with multiple
@ -1485,8 +1443,7 @@ it protects against all the race conditions discussed earlier, including lost up
 ### Implementation of two-phase locking
 2PL is used by the serializable isolation level in MySQL (InnoDB) and SQL Server, and the
-repeatable read isolation level in Db2
+repeatable read isolation level in Db2 [^29].
-[^29].
 The blocking of readers and writers is implemented by having a lock on each object in the
 database. The lock can either be in *shared mode* or in *exclusive mode* (also known as a
@ -1584,8 +1541,7 @@ becomes serializable.
 Unfortunately, predicate locks do not perform well: if there are many locks by active transactions,
 checking for matching locks becomes time-consuming. For that reason, most databases with 2PL
 actually implement *index-range locking* (also known as *next-key locking*), which is a simplified
-approximation of predicate locking [[54](/en/ch8#Ports2012),
+approximation of predicate locking [[^54], [^64]].
-[64](/en/ch8#Hellerstein2007_ch8)].
 It’s safe to simplify a predicate by making it match a greater set of objects. For example, if you
 have a predicate lock for bookings of room 123 between noon and 1 p.m., you can approximate it by
@ -1629,13 +1585,11 @@ serializable isolation and good performance fundamentally at odds with each othe
 It seems not: an algorithm called *serializable snapshot isolation* (SSI) provides full
 serializability with only a small performance penalty compared to snapshot isolation. SSI is
 comparatively new: it was first described in 2008
-[[53](/en/ch8#Cahill2008),
+[[^53], [^65]].
-[65](/en/ch8#Cahill2009)].
 Today SSI and similar algorithms are used in single-node databases (the serializable isolation level
 in PostgreSQL [^54], SQL Server’s In-Memory
-OLTP/Hekaton [^66], and HyPer
+OLTP/Hekaton [^66], and HyPer [^67]),
-[^67]),
 distributed databases (CockroachDB [^5] and
 FoundationDB [^8]), and embedded storage
 engines such as BadgerDB.
@ -1659,10 +1613,8 @@ transaction wants to commit, the database checks whether anything bad happened (
 isolation was violated); if so, the transaction is aborted and has to be retried. Only transactions
 that executed serializably are allowed to commit.
-Optimistic concurrency control is an old idea
+Optimistic concurrency control is an old idea [^68],
-[^68],
+and its advantages and disadvantages have been debated for a long time [^69].
-and its advantages and disadvantages have been debated for a long time
-[^69].
 It performs badly if there is high contention (many transactions trying to access the same objects),
 as this leads to a high proportion of transactions needing to abort. If the system is already close
 to its maximum throughput, the additional transaction load from retried transactions can make
@ -1781,8 +1733,7 @@ tracking is faster, but may lead to more transactions being aborted than strictl
 In some cases, it’s okay for a transaction to read information that was overwritten by another
 transaction: depending on what else happened, it’s sometimes possible to prove that the result of
 the execution is nevertheless serializable. PostgreSQL uses this theory to reduce the number of
-unnecessary aborts [[14](/en/ch8#Fekete2005),
+unnecessary aborts [[^14], [^54]].
-[54](/en/ch8#Ports2012)].
 Compared to two-phase locking, the big advantage of serializable snapshot isolation is that one
 transaction doesn’t need to block waiting for locks held by another transaction. Like under snapshot
@ -1798,8 +1749,7 @@ serializable isolation.
 Compared to non-serializable snapshot isolation, the need to check for serializability violations
 introduces some performance overheads. How significant these overheads are is a matter of debate:
-some believe that serializability checking is not worth it
+some believe that serializability checking is not worth it [^70],
-[^70],
 while others believe that the performance of serializability is now so good that there is no need to
 use the weaker snapshot isolation any more [^67].
@ -1815,8 +1765,7 @@ The last few sections have focused on concurrency control for isolation, the I i
 algorithms we have seen apply to both single-node and distributed databases: although there are
 challenges in making concurrency control algorithms scalable (for example, performing distributed
 serializability checking for SSI), the high-level ideas for distributed concurrency control are
-similar to single-node concurrency control
+similar to single-node concurrency control [^8].
-[^8].
 Consistency and durability also don’t change much when we move to distributed transactions. However,
 atomicity requires more care.
@ -1830,8 +1779,7 @@ successfully written to disk before the crash, the transaction is considered com
 writes from that transaction are rolled back.
 Thus, on a single node, transaction commitment crucially depends on the *order* in which data is
-durably written to disk: first the data, then the commit record
+durably written to disk: first the data, then the commit record [^22].
-[^22].
 The key deciding moment for whether the transaction commits or aborts is the moment at which the
 disk finishes writing the commit record: before that moment, it is still possible to abort (due to a
 crash), but after that moment, the transaction is committed (even if the database crashes). Thus, it
@ -1876,15 +1824,12 @@ problem.
 Two-phase commit is an algorithm for achieving atomic transaction commit across multiple nodes. It
 is a classic algorithm in distributed databases
-[[13](/en/ch8#Bernstein1987_ch8),
+[[^13], [^71], [^72]]. 2PC is used
-[71](/en/ch8#Lindsay1979_ch8),
-[72](/en/ch8#Mohan1986)]. 2PC is used
 internally in some databases and also made available to applications in the form of *XA transactions*
 [^73]
 (which are supported by the Java Transaction API, for example) or via WS-AtomicTransaction for SOAP
 web services
-[[74](/en/ch8#Neto2008),
+[[^74], [^75]].
-[75](/en/ch8#Johnson2004)].
 The basic flow of 2PC is illustrated in [Figure 8-13](/en/ch8#fig_transactions_two_phase_commit). Instead of a single
 commit request, as with a single-node transaction, the commit/abort process in 2PC is split into two
@ -1916,8 +1861,7 @@ This process is somewhat like the traditional marriage ceremony in Western cultu
 asks the bride and groom individually whether each wants to marry the other, and typically receives
 the answer “I do” from both. After receiving both acknowledgments, the minister pronounces the
 couple husband and wife: the transaction is committed, and the happy fact is broadcast to all
-attendees. If either bride or groom does not say “yes,” the ceremony is aborted
+attendees. If either bride or groom does not say “yes,” the ceremony is aborted [^76].
-[^76].
 ### A system of promises
@ -2014,8 +1958,7 @@ stuck waiting for the coordinator to recover. It is possible to make an atomic c
 is not so straightforward.
 As an alternative to 2PC, an algorithm called *three-phase commit* (3PC) has been proposed
-[[13](/en/ch8#Bernstein1987_ch8),
+[[^13], [^77]].
-[77](/en/ch8#Skeen1981)].
 However, 3PC assumes a network with bounded delay and nodes with bounded response times; in most
 practical systems with unbounded network delay and process pauses (see [Chapter 9](/en/ch9#ch_distributed)), it
 cannot guarantee atomicity.
@ -2028,10 +1971,7 @@ consensus protocol. We will see how to do this in [Chapter 10](/en/ch10#ch_cons
 Distributed transactions and two-phase commit have a mixed reputation. On the one hand, they are
 seen as providing an important safety guarantee that would be hard to achieve otherwise; on the
 other hand, they are criticized for causing operational problems, killing performance, and promising
-more than they can deliver [[78](/en/ch8#Hohpe2005),
+more than they can deliver [[^78], [^79], [^80], [^81]].
-[79](/en/ch8#Helland2007_ch8),
-[80](/en/ch8#Oliver2011),
-[81](/en/ch8#Rahien2014)].
 Many cloud services choose not to implement distributed transactions due to the operational
 problems they engender [^82].
@ -2149,8 +2089,7 @@ transaction is resolved.
 In theory, if the coordinator crashes and is restarted, it should cleanly recover its state from the
 log and resolve any in-doubt transactions. However, in practice, *orphaned* in-doubt transactions do
-occur [[83](/en/ch8#Dhariwal2008),
+occur [[^83], [^84]]—that is,
-[84](/en/ch8#Randal2013)]—that is,
 transactions for which the coordinator cannot decide the outcome for whatever reason (e.g., because
 the transaction log has been lost or corrupted due to a software bug). These transactions cannot be
 resolved automatically, so they sit forever in the database, holding locks and blocking other
@ -2215,8 +2154,7 @@ CockroachDB [^5],
 TiDB [^6],
 Spanner [^7],
 FoundationDB [^8], and YugabyteDB, for
-example. Some message brokers such as Kafka also support internal distributed transactions
+example. Some message brokers such as Kafka also support internal distributed transactions [^85].
-[^85].
 Many of these systems use 2-phase commit to ensure atomicity of transactions that write to multiple
 shards, and yet they don’t suffer the same problems as XA transactions. The reason is that because
@ -2292,7 +2230,7 @@ of patterns such as these: for example, they would allow the message IDs to be s
 and the main data updated by the message processing to be stored on other shards, and to ensure
 atomicity of the transaction commit across those shards.
-# Summary
+## Summary
 Transactions are an abstraction layer that allows an application to pretend that certain concurrency
 problems and certain kinds of hardware and software faults don’t exist. A large class of errors is
@ -2385,10 +2323,11 @@ The examples in this chapter used a relational data model. However, as discussed
 [“The need for multi-object transactions”](/en/ch8#sec_transactions_need), transactions are a valuable database feature, no matter which data model
 is used.
 ##### Footnotes
-##### References
+
 ### Summary
--- a/content/en/ch9.md
+++ b/content/en/ch9.md
@ -22,8 +22,7 @@ anything that *can* go wrong *will* go wrong.
 Moreover, working with distributed systems is fundamentally different from writing software on a
 single computer—and the main difference is that there are lots of new and exciting ways for things
-to go wrong [[1](/en/ch9#Cavage2013),
+to go wrong [[^1], [^2]].
-[2](/en/ch9#Kreps2012_ch9)].
 In this chapter, you will get a taste of the problems that arise in practice, and an understanding
 of the things you can and cannot rely on.
@ -157,8 +156,7 @@ algorithm decides that it has capacity to send a packet, it takes the next packe
 that buffer and passes it to the network interface. The packet passes through several switches and
 routers, and eventually the receiving node’s operating system places the packet’s data in a receive
 buffer and sends an acknowledgment packet back to the sender. Only then does the receiving operating
-system notify the application that some more data has arrived
+system notify the application that some more data has arrived [^6].
-[^6].
 So, if TCP provides “reliability”, does that mean we no longer need to worry about networks being
 unreliable? Unfortunately not. It decides that a packet must have been lost if no acknowledgment
@ -173,8 +171,7 @@ actually processed by the remote node [^6].
 Even if TCP acknowledged that a packet was delivered, this only means that the operating system
 kernel on the remote node received it, but the application may have crashed before it handled that
 data. If you want to be sure that a request was successful, you need a positive response from the
-application itself
+application itself [^7].
-[^7].
 Nevertheless, TCP is very useful, because it provides a convenient way of sending and receiving
 messages that are too big to fit in one packet. Once a TCP connection is established, you can also
@ -187,47 +184,32 @@ many RPC protocols (see [“Dataflow Through Services: REST and RPC”](/en/ch5#
 We have been building computer networks for decades—one might hope that by now we would have figured
 out how to make them reliable. Unfortunately, we have not yet succeeded. There are some systematic
 studies, and plenty of anecdotal evidence, showing that network problems can be surprisingly common,
-even in controlled environments like a datacenter operated by one company
+even in controlled environments like a datacenter operated by one company [^8]:
-[^8]:
 * One study in a medium-sized datacenter found about 12 network faults per month, of which half
-  disconnected a single machine, and half disconnected an entire rack
+ disconnected a single machine, and half disconnected an entire rack [^9].
-  [^9].
 * Another study measured the failure rates of components like top-of-rack switches, aggregation
-  switches, and load balancers
+ switches, and load balancers [^10].
-  [^10].
 It found that adding redundant networking gear doesn’t reduce faults as much as you might hope,
 since it doesn’t guard against human error (e.g., misconfigured switches), which is a major cause
 of outages.
-* Interruptions of wide-area fiber links have been blamed on cows
+* Interruptions of wide-area fiber links have been blamed on cows [^11], beavers [^12], and sharks [^13]
-  [^11],
+ (though shark bites have become rarer due to better shielding of submarine cables [^14]).
-  beavers [^12],
+ Humans are also at fault, be it due to accidental misconfiguration [^15], scavenging [^16], or sabotage [^17].
-  and sharks [^13]
-  (though shark bites have become rarer due to better shielding of submarine cables
-  [^14]).
-  Humans are also at fault, be it due to accidental misconfiguration
-  [^15],
-  scavenging [^16],
-  or sabotage
-  [^17].
 * Across different cloud regions, round-trip times of up to several *minutes* have been observed at
-  high percentiles [[18](/en/ch9#Liu2016), Table 3].
+ high percentiles [[^18], Table 3].
 Even within a single datacenter, packet delay of more than a minute can occur during a network
 topology reconfiguration, triggered by a problem during a software upgrade for a switch
 [^19].
 Thus, we have to assume that messages might be delayed arbitrarily.
 * Sometimes communications are partially interrupted, depending on who you’re talking to: for
-  example, A and B can communicate, B and C can communicate, but A and C cannot
+ example, A and B can communicate, B and C can communicate, but A and C cannot [^20] [^21].
-  [[20](/en/ch9#Lianza2020_ch9),
-  [21](/en/ch9#Alfatafta2020)].
 Other surprising faults include a network interface that sometimes drops all inbound packets but
 sends outbound packets successfully [^22]:
 just because a network link works in one direction doesn’t guarantee it’s also working in the
 opposite direction.
 * Even a brief network interruption can have repercussions that last for much longer than the
-  original issue [[8](/en/ch9#Bailis2014reliable),
+ original issue [^8] [^20] [^23].
-  [20](/en/ch9#Lianza2020_ch9),
-  [23](/en/ch9#Toman2020)].
 # Network partitions
@ -243,8 +225,7 @@ may fail—there is no way around it.
 If the error handling of network faults is not defined and tested, arbitrarily bad things could
 happen: for example, the cluster could become deadlocked and permanently unable to serve requests,
 even when the network recovers [^24],
-or it could even delete all of your data
+or it could even delete all of your data [^25].
-[^25].
 If software is put in an unanticipated situation, it may do arbitrary unexpected things.
 Handling network faults doesn’t necessarily mean *tolerating* them: if your network is normally
@ -302,7 +283,7 @@ Prematurely declaring a node dead is problematic: if the node is actually alive
 performing some action (for example, sending an email), and another node takes over, the action may
 end up being performed twice. We will discuss this issue in more detail in
 [“Knowledge, Truth, and Lies”](/en/ch9#sec_distributed_truth), and in
-Chapters [10](/en/ch10#ch_consistency)
+Chapters [^10]
 and [Link to Come].
 When a node is declared dead, its responsibilities need to be transferred to other nodes, which
@ -331,8 +312,7 @@ times to throw the system off-balance.
 ### Network congestion and queueing
 When driving a car, travel times on road networks often vary most due to traffic congestion.
-Similarly, the variability of packet delays on computer networks is most often due to queueing
+Similarly, the variability of packet delays on computer networks is most often due to queueing [^27]:
-[^27]:
 * If several different nodes simultaneously try to send packets to the same destination, the network
 switch must queue them up and feed them into the destination network link one by one (as illustrated
@ -384,8 +364,7 @@ network links and switches, and even each machine’s network interface and CPUs
 virtual machines), are shared. Processing large amounts of data can use the entire capacity of
 network links (*saturate* them). As you have no control over or insight into other customers’ usage of the shared
 resources, network delays can be highly variable if someone near you (a *noisy neighbor*) is
-using a lot of resources [[30](/en/ch9#Philips2014),
+using a lot of resources [[^30], [^31]].
-[31](/en/ch9#Newman2012)].
 In such environments, you can only choose timeouts experimentally: measure the distribution of
 network round-trip times over an extended period, and over many machines, to determine the expected
@ -394,12 +373,9 @@ determine an appropriate trade-off between failure detection delay and risk of p
 Even better, rather than using configured constant timeouts, systems can continually measure
 response times and their variability (*jitter*), and automatically adjust timeouts according to the
-observed response time distribution. The Phi Accrual failure detector
+observed response time distribution. The Phi Accrual failure detector [^32],
-[^32],
+which is used for example in Akka and Cassandra [^33]
-which is used for example in Akka and Cassandra
+is one way of doing this. TCP retransmission timeouts also work similarly [^5].
-[^33]
-is one way of doing this. TCP retransmission timeouts also work similarly
-[^5].
 ## Synchronous Versus Asynchronous Networks
@ -415,13 +391,11 @@ similar reliability and predictability in computer networks?
 When you make a call over the telephone network, it establishes a *circuit*: a fixed, guaranteed
 amount of bandwidth is allocated for the call, along the entire route between the two callers. This
-circuit remains in place until the call ends
+circuit remains in place until the call ends [^34].
-[^34].
 For example, an ISDN network runs at a fixed rate of 4,000 frames per second. When a call is
 established, it is allocated 16 bits of space within each frame (in each direction). Thus, for the
 duration of the call, each side is guaranteed to be able to send exactly 16 bits of audio data every
-250 microseconds
+250 microseconds [^35].
-[^35].
 This kind of network is *synchronous*: even as data passes through several routers, it does not
 suffer from queueing, because the 16 bits of space for the call have already been reserved in the
@ -457,15 +431,12 @@ the rate of data transfer to the available network capacity.
 There have been some attempts to build hybrid networks that support both circuit switching and
 packet switching. *Asynchronous Transfer Mode* (ATM) was a competitor to Ethernet in the 1980s, but
-it didn’t gain much adoption outside of telephone network core switches. InfiniBand has some similarities
+it didn’t gain much adoption outside of telephone network core switches. InfiniBand has some similarities [^36]:
-[^36]:
 it implements end-to-end flow control at the link layer, which reduces the need for queueing in the
-network, although it can still suffer from delays due to link congestion
+network, although it can still suffer from delays due to link congestion [^37].
-[^37].
 With careful use of *quality of service* (QoS, prioritization and scheduling of packets) and *admission
 control* (rate-limiting senders), it is possible to emulate circuit switching on packet networks, or
-provide statistically bounded delay [[27](/en/ch9#Grosvenor2015),
+provide statistically bounded delay [^27] [^34]. New network algorithms like Low Latency, Low
-[34](/en/ch9#Keshav1997)]. New network algorithms like Low Latency, Low
 Loss, and Scalable Throughput (L4S) attempt to mitigate some of the queuing and congestion control
 problems both at the client and router level. Linux’s traffic controller (TC) also allows
 applications to reprioritize packets for QoS purposes.
@ -489,8 +460,7 @@ fixed cost, so if you utilize it better, each byte you send over the wire is che
 A similar situation arises with CPUs: if you share each CPU core dynamically between several
 threads, one thread sometimes has to wait in the operating system’s run queue while another thread
-is running, so a thread can be paused for varying lengths of time
+is running, so a thread can be paused for varying lengths of time [^38].
-[^38].
 However, this utilizes the hardware better than if you allocated a static number of CPU cycles to
 each thread (see [“Response time guarantees”](/en/ch9#sec_distributed_clocks_realtime)). Better hardware utilization is also why cloud
 platforms run several virtual machines from different customers on the same physical machine.
@ -544,8 +514,7 @@ Moreover, each machine on the network has its own clock, which is an actual hard
 a quartz crystal oscillator. These devices are not perfectly accurate, so each machine has its own
 notion of time, which may be slightly faster or slower than on other machines. It is possible to
 synchronize clocks to some degree: the most commonly used mechanism is the Network Time Protocol (NTP), which
-allows the computer clock to be adjusted according to the time reported by a group of servers
+allows the computer clock to be adjusted according to the time reported by a group of servers [^39].
-[^39].
 The servers in turn get their time from a more accurate time source, such as a GPS receiver.
 ## Monotonic Versus Time-of-Day Clocks
@ -570,14 +539,12 @@ Time-of-day clocks are usually synchronized with NTP, which means that a timesta
 various oddities, as described in the next section. In particular, if the local clock is too far
 ahead of the NTP server, it may be forcibly reset and appear to jump back to a previous point in
 time. These jumps, as well as similar jumps caused by leap seconds, make time-of-day clocks
-unsuitable for measuring elapsed time
+unsuitable for measuring elapsed time [^40].
-[^40].
 Time-of-day clocks can experience jumps due to the start and end of Daylight Saving Time (DST);
 these can be avoided by always using UTC as time zone, which does not have DST.
 Time-of-day clocks have also historically had quite a coarse-grained resolution, e.g., moving forward
-in steps of 10 ms on older Windows systems
+in steps of 10 ms on older Windows systems [^41].
-[^41].
 On recent systems, this is less of a problem.
 ### Monotonic clocks
@ -596,12 +563,10 @@ booted up, or something similarly arbitrary. In particular, it makes no sense to
 clock values from two different computers, because they don’t mean the same thing.
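As a minimal sketch of measuring elapsed time with a monotonic clock (Python is our choice here, and `measure_elapsed` is a hypothetical helper, not something from the chapter), the duration below is taken as the difference between two `time.monotonic()` readings on the same machine. The absolute values are meaningless; only the difference is used.

```python
import time


def measure_elapsed(work):
    """Run work() and return (result, elapsed_seconds).

    Uses the monotonic clock, so the measurement is not affected if the
    time-of-day clock is stepped backwards or forwards in the meantime.
    """
    start = time.monotonic()
    result = work()
    elapsed = time.monotonic() - start
    return result, elapsed


result, elapsed = measure_elapsed(lambda: sum(range(100_000)))
```

Using `time.time()` here instead would give a time-of-day reading, which may jump during the measurement.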
 On a server with multiple CPU sockets, there may be a separate timer per CPU, which is not
-necessarily synchronized with other CPUs
+necessarily synchronized with other CPUs [^43].
-[^43].
 Operating systems compensate for any discrepancy and try
 to present a monotonic view of the clock to application threads, even as they are scheduled across
-different CPUs. However, it is wise to take this guarantee of monotonicity with a pinch of salt
+different CPUs. However, it is wise to take this guarantee of monotonicity with a pinch of salt [^44].
-[^44].
 NTP may adjust the frequency at which the monotonic clock moves forward (this is known as *slewing*
 the clock) if it detects that the computer’s local quartz is moving faster or slower than the NTP
@ -642,24 +607,17 @@ hope—hardware clocks and NTP can be fickle beasts. To give just a few examples
 though occasional spikes in network delay lead to errors of around a second. Depending on the
 configuration, large network delays can cause the NTP client to give up entirely.
 * Some NTP servers are wrong or misconfigured, reporting time that is off by hours
-  [[47](/en/ch9#Minar1999),
+ [^47] [^48].
-  [48](/en/ch9#Holub2014)].
 NTP clients mitigate such errors by querying several servers and ignoring outliers.
 Nevertheless, it’s somewhat worrying to bet the correctness of your systems on the time that you
 were told by a stranger on the internet.
 * Leap seconds result in a minute that is 59 seconds or 61 seconds long, which messes up timing
-  assumptions in systems that are not designed with leap seconds in mind
+ assumptions in systems that are not designed with leap seconds in mind [^49].
-  [^49].
+ The fact that leap seconds have crashed many large systems [^40] [^50]
-  The fact that leap seconds have crashed many large systems
-  [[40](/en/ch9#GrahamCumming2017),
-  [50](/en/ch9#Minar2012_ch9)]
 shows how easy it is for incorrect assumptions about clocks to sneak into a system. The best
 way of handling leap seconds may be to make NTP servers “lie,” by performing the leap second
-  adjustment gradually over the course of a day (this is known as *smearing*)
+ adjustment gradually over the course of a day (this is known as *smearing*) [^51] [^52],
-  [[51](/en/ch9#Pascoe2011),
+ although actual NTP server behavior varies in practice [^53].
-  [52](/en/ch9#Zhao2015)],
-  although actual NTP server behavior varies in practice
-  [^53].
 Leap seconds will no longer be used from 2035 onwards, so this problem will fortunately go away.
 * In virtual machines, the hardware clock is virtualized, which raises additional challenges for
 applications that need accurate timekeeping
@ -668,31 +626,24 @@ hope—hardware clocks and NTP can be fickle beasts. To give just a few examples
 while another VM is running. From an application’s point of view, this pause manifests itself as
 the clock suddenly jumping forward [^29].
 If a VM pauses for several seconds, the clock may then be several seconds behind the actual time,
-  but NTP may continue to report that the clock is almost perfectly in sync
+ but NTP may continue to report that the clock is almost perfectly in sync [^55].
-  [^55].
 * If you run software on devices that you don’t fully control (e.g., mobile or embedded devices), you
 probably cannot trust the device’s hardware clock at all. Some users deliberately set their
-  hardware clock to an incorrect date and time, for example to cheat in games
+ hardware clock to an incorrect date and time, for example to cheat in games [^56].
-  [^56].
 As a result, the clock might be set to a time wildly in the past or the future.
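The leap second smearing mentioned in the list above can be sketched as a simple linear ramp. This is an illustration only: the function name, the 24-hour window, and the linear shape are assumptions made for the example, and as the text notes, actual NTP server behavior varies in practice.

```python
def smeared_offset(seconds_into_window, window=86_400.0, leap=1.0):
    """Return how much of the leap second has been applied so far.

    Assumes a linear smear: the adjustment grows evenly from 0 at the start
    of the window to `leap` seconds at the end, instead of a sudden jump.
    """
    fraction = seconds_into_window / window
    fraction = min(max(fraction, 0.0), 1.0)  # clamp outside the window
    return leap * fraction


start_offset = smeared_offset(0)
half_offset = smeared_offset(43_200)   # halfway through the day
end_offset = smeared_offset(86_400)
```

Clients following such a server never see a 61-second minute; their clocks simply run slightly slow for the duration of the window.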
 It is possible to achieve very good clock accuracy if you care about it sufficiently to invest
 significant resources. For example, the MiFID II European regulation for financial
 institutions requires all high-frequency trading funds to synchronize their clocks to within 100
 microseconds of UTC, in order to help debug market anomalies such as “flash crashes” and to help
-detect market manipulation
+detect market manipulation [^57].
-[^57].
 Such accuracy can be achieved with some special hardware (GPS receivers and/or atomic clocks), the
-Precision Time Protocol (PTP) and careful deployment and monitoring
+Precision Time Protocol (PTP) and careful deployment and monitoring [^58] [^59].
-[[58](/en/ch9#Bigum2015),
-[59](/en/ch9#Obleukhov2022)].
 Relying on GPS alone can be risky because GPS signals can easily be jammed. In some locations this
-happens frequently, e.g. close to military facilities
+happens frequently, e.g. close to military facilities [^60].
-[^60].
 Some cloud providers have begun offering high-accuracy clock synchronization for their virtual
-machines
+machines [^61].
-[^61].
 However, clock synchronization still requires a lot of care. If your NTP daemon is misconfigured, or
 a firewall is blocking NTP traffic, the clock error due to drift can quickly become large.
@ -714,8 +665,7 @@ fixed. On the other hand, if its quartz clock is defective or its NTP client is
 things will seem to work fine, even though its clock gradually drifts further and further away from
 reality. If some piece of software is relying on an accurately synchronized clock, the result is
 more likely to be silent and subtle data loss than a dramatic crash
-[[62](/en/ch9#Kingsbury2013cassandra),
+[[^62], [^63]].
-[63](/en/ch9#Daily2013_ch9)].
 Thus, if you use software that requires synchronized clocks, it is essential that you also carefully
 monitor the clock offsets between all the machines. Any node whose clock drifts too far from the
@ -725,8 +675,7 @@ the broken clocks before they can cause too much damage.
 ### Timestamps for ordering events
 Let’s consider one particular situation in which it is tempting, but dangerous, to rely on clocks:
-ordering of events across multiple nodes
+ordering of events across multiple nodes [^64].
-[^64].
 For example, if two clients write to a distributed database, who got there first? Which write is the
 more recent one?
@ -766,8 +715,8 @@ serious problems:
 * Database writes can mysteriously disappear: a node with a lagging clock is unable to overwrite
 values previously written by a node with a fast clock until the clock skew between the nodes has
-  elapsed [[63](/en/ch9#Daily2013_ch9),
+ elapsed [[^63],
-  [65](/en/ch9#Kingsbury2013timestamps)].
+ [^65]].
 This scenario can cause arbitrary amounts of data to be silently dropped without any error being
 reported to the application.
 * LWW cannot distinguish between writes that occurred sequentially in quick succession (in
@ -830,8 +779,7 @@ Unfortunately, most systems don’t expose this uncertainty: for example, when y
 `clock_gettime()`, the return value doesn’t tell you the expected error of the timestamp, so you
 don’t know if its confidence interval is five milliseconds or five years.
-There are exceptions: the *TrueTime* API in Google’s Spanner
+There are exceptions: the *TrueTime* API in Google’s Spanner [^45] and Amazon’s ClockBound explicitly report the
-[^45] and Amazon’s ClockBound explicitly report the
 confidence interval on the local clock. When you ask it for the current time, you get back two
 values: `[earliest, latest]`, which are the *earliest possible* and the *latest possible*
 timestamp. Based on its uncertainty calculations, the clock knows that the actual current time is
@ -864,8 +812,7 @@ the synchronization good enough, they would have the right properties: later tra
 higher timestamp. The problem, of course, is the uncertainty about clock accuracy.
 Spanner implements snapshot isolation across datacenters in this way
-[[68](/en/ch9#Demirbas2013),
+[[^68], [^69]].
-[69](/en/ch9#Malkhi2013)].
 It uses the clock’s confidence interval as reported by the TrueTime API, and is based on the
 following observation: if you have two confidence intervals, each consisting of an earliest and
 latest possible timestamp (*A* = [*Aearliest*, *Alatest*] and
@ -884,10 +831,7 @@ receiver or atomic clock in each datacenter, allowing clocks to be synchronized
 The atomic clocks and GPS receivers are not strictly necessary in Spanner: the important thing is to
 have a confidence interval, and the accurate clock sources only help keep that interval small. Other
 systems are beginning to adopt similar approaches: for example, YugabyteDB can leverage ClockBound
-when running on AWS [^70],
+when running on AWS [^70], and several other systems now also rely on clock synchronization to various degrees [^71] [^72].
-and several other systems now also rely on clock synchronization to various degrees
-[[71](/en/ch9#Kimball2022),
-[72](/en/ch9#Demirbas2025)].
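To make the confidence-interval idea concrete, here is a minimal sketch of comparing two `[earliest, latest]` readings. It assumes the usual reading of the observation above: two events can be ordered with certainty only if their intervals do not overlap. The `Interval` type and the function names are hypothetical and are not the TrueTime or ClockBound API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A clock reading with uncertainty: the true time lies in [earliest, latest]."""
    earliest: float
    latest: float


def definitely_before(a, b):
    """True only if a's interval ends before b's interval begins."""
    return a.latest < b.earliest


def order(a, b):
    """Classify two readings; overlapping intervals cannot be ordered."""
    if definitely_before(a, b):
        return "a before b"
    if definitely_before(b, a):
        return "b before a"
    return "uncertain"


a = Interval(earliest=10.000, latest=10.007)
b = Interval(earliest=10.010, latest=10.015)
c = Interval(earliest=10.005, latest=10.012)  # overlaps both a and b
```

Here `order(a, b)` yields a definite answer, while any comparison involving `c` is uncertain. The smaller the intervals, the less often the uncertain case arises, which is why accurate clock sources help.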
 ## Process Pauses
@ -905,7 +849,7 @@ lease, so another node can take over when it expires.
 You can imagine the request-handling loop looking something like this:
-```
+```js
 while (true) {
 request = getIncomingRequest();
@ -1048,8 +992,7 @@ operating in a non-real-time environment.
 ### Limiting the impact of garbage collection
-Garbage collection used to be one of the biggest reasons for process pauses
+Garbage collection used to be one of the biggest reasons for process pauses [^79],
-[^79],
 but fortunately GC algorithms have improved a lot: a properly tuned collector will now usually pause
 for no more than a few milliseconds. The Java runtime offers collectors such as concurrent mark
 sweep (CMS), garbage-first (G1), the Z garbage collector (ZGC), Epsilon, and Shenandoah. Each of
@ -1068,13 +1011,11 @@ handle requests from clients while one node is collecting its garbage. If the ru
 application that a node soon requires a GC pause, the application can stop sending new requests to
 that node, wait for it to finish processing outstanding requests, and then perform the GC while no
 requests are in progress. This trick hides GC pauses from clients and reduces the high percentiles
-of the response time [[80](/en/ch9#Terei2015),
+of the response time [[^80], [^81]].
-[81](/en/ch9#Maas2015)].
 A variant of this idea is to use the garbage collector only for short-lived objects (which are fast
 to collect) and to restart processes periodically, before they accumulate enough long-lived objects
-to require a full GC of long-lived objects [[79](/en/ch9#Thompson2013),
+to require a full GC of long-lived objects [[^79], [^82]].
-[82](/en/ch9#Fowler2011_ch9)].
 One node can be restarted at a time, and traffic can be shifted away from the node before the
 planned restart, like in a rolling upgrade (see [Chapter 5](/en/ch5#ch_encoding)).
@ -1116,8 +1057,7 @@ assumptions.
 ## The Majority Rules
 Imagine a network with an asymmetric fault: a node is able to receive all messages sent to it, but
-any outgoing messages from that node are dropped or delayed
+any outgoing messages from that node are dropped or delayed [^22]. Even though that node is working
-[^22]. Even though that node is working
 perfectly well, and is receiving requests from other nodes, the other nodes cannot hear its
 responses. After some timeout, the other nodes declare it dead, because they haven’t heard from the
 node. The situation unfolds like a nightmare: the semi-disconnected node is dragged to the
@ -1158,8 +1098,7 @@ the use of quorums in more detail when we get to *consensus algorithms* in [Chap
 ## Distributed Locks and Leases
-Locks and leases in distributed application are prone to be misused, and a common source of bugs
+Locks and leases in distributed application are prone to be misused, and a common source of bugs [^84].
-[^84].
 Let’s look at one particular case of how they can go wrong.
 In [“Process Pauses”](/en/ch9#sec_distributed_clocks_pauses) we saw that a lease is a kind of lock that times out and can be
@ -1181,8 +1120,7 @@ could be lost or corrupted data, which is much more serious.
 For example, [Figure 9-4](/en/ch9#fig_distributed_lease_pause) shows a data corruption bug due to an incorrect
 implementation of locking. (The bug is not theoretical: HBase used to have this problem
-[[85](/en/ch9#Junqueira2013_ch9),
+[[^85], [^86]].)
-[86](/en/ch9#Soztutar2013hdfs)].)
 Say you want to ensure that a file in a storage service can only be
 accessed by one client at a time, because if multiple clients tried to write to it, the file would
 become corrupted. You try to implement this by requiring a client to obtain a lease from a lock
@ -1220,12 +1158,10 @@ split brain. This is called *fencing off* the zombie.
 Some systems attempt to fence off zombies by shutting them down, for example by disconnecting them
 from the network [^9], shutting down the VM via
-the cloud provider’s management interface, or even physically powering down the machine
+the cloud provider’s management interface, or even physically powering down the machine [^87].
-[^87].
 This approach is known as *Shoot The Other Node In The Head* or STONITH. Unfortunately, it suffers
 from some problems: it does not protect against large network delays like in
-[Figure 9-5](/en/ch9#fig_distributed_lease_delay); it can happen that all of the nodes shut each other down
+[Figure 9-5](/en/ch9#fig_distributed_lease_delay); it can happen that all of the nodes shut each other down [^19]; and by the time the zombie has been
-[^19]; and by the time the zombie has been
 detected and shut down, it may already be too late and data may already have been corrupted.
 A more robust fencing solution, which protects against both zombies and delayed requests, is
@ -1257,10 +1193,8 @@ write has completed, any zombies are fenced off.
 If ZooKeeper is your lock service, you can use the transaction ID `zxid` or the node version
 `cversion` as fencing token [^85].
-With etcd, the revision number along with the lease ID serves a similar purpose
+With etcd, the revision number along with the lease ID serves a similar purpose [^89].
-[^89].
+The FencedLock API in Hazelcast explicitly generates a fencing token [^90].
-The FencedLock API in Hazelcast explicitly generates a fencing token
-[^90].
 This mechanism requires that the storage service has some way of checking whether a write is based
 on an outdated token. Alternatively, it’s sufficient for the service to support a write that
@ -1273,10 +1207,8 @@ services support such a check: Amazon S3 calls it *conditional writes*, Azure Bl
 If your clients need to write only to one storage service that supports such conditional writes, the
 lock service is somewhat redundant
-[[91](/en/ch9#Kleppmann2016),
+[[^91], [^92]],
-[92](/en/ch9#Sanfilippo2016)],
+since the lease assignment could have been implemented directly based on that storage service [^93].
-since the lease assignment could have been implemented directly based on that storage service
-[^93].
 However, once you have a fencing token you can also use it with multiple services or replicas, and
 ensure that the old leaseholder is fenced off on all of those services.
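The token check on the storage side can be sketched as follows. This is a toy in-memory illustration under stated assumptions: `FencedStore` and `StaleTokenError` are hypothetical names, the service tracks a single highest token for all keys, and a repeated token is accepted so that the current leaseholder can perform several writes.

```python
class StaleTokenError(Exception):
    """Raised when a write carries a fencing token older than one already seen."""


class FencedStore:
    """Toy storage service that rejects writes based on an outdated token."""

    def __init__(self):
        self._highest_token = -1  # highest fencing token accepted so far
        self._data = {}

    def write(self, key, value, token):
        # A lower token means the writer's lease has been superseded.
        if token < self._highest_token:
            raise StaleTokenError(f"token {token} < {self._highest_token}")
        self._highest_token = token
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


store = FencedStore()
store.write("file", "from client 1", token=33)
store.write("file", "from client 2", token=34)
try:
    # Client 1 wakes up after a long pause and still believes it holds the lease.
    store.write("file", "late write from client 1", token=33)
    rejected = False
except StaleTokenError:
    rejected = True
```

The late write with token 33 is refused because token 34 has already been seen, so the zombie cannot overwrite the newer leaseholder's data.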
@ -1344,8 +1276,7 @@ prone to intrigue and conspiracy than those elsewhere. Rather, the name is deriv
 in the sense of *excessively complicated, bureaucratic, devious*, which was used in politics long
 before computers [^96].
 Lamport wanted to choose a nationality that would not offend any readers, and he was advised that
-calling it *The Albanian Generals Problem* was not such a good idea
+calling it *The Albanian Generals Problem* was not such a good idea [^97].
-[^97].
 A system is *Byzantine fault-tolerant* if it continues to operate correctly even if some of the
 nodes are malfunctioning and not obeying the protocol, or if malicious attackers are interfering
@ -1355,8 +1286,8 @@ with the network. This concern is relevant in certain specific circumstances. Fo
 by radiation, leading it to respond to other nodes in arbitrarily unpredictable ways. Since a
 system failure would be very expensive (e.g., an aircraft crashing and killing everyone on board,
 or a rocket colliding with the International Space Station), flight control systems must tolerate
-  Byzantine faults [[98](/en/ch9#Rushby2001),
+ Byzantine faults [[^98],
-  [99](/en/ch9#Edge2013)].
+ [^99]].
 * In a system with multiple participating parties, some participants may attempt to cheat or
 defraud others. In such circumstances, it is not safe for a node to simply trust another node’s
 messages, since they may be sent with malicious intent. For example, cryptocurrencies like
@ -1367,14 +1298,11 @@ with the network. This concern is relevant in certain specific circumstances. Fo
 However, in the kinds of systems we discuss in this book, we can usually safely assume that there
 are no Byzantine faults. In a datacenter, all the nodes are controlled by your organization (so
 they can hopefully be trusted) and radiation levels are low enough that memory corruption is not a
-major problem (although datacenters in orbit are being considered
+major problem (although datacenters in orbit are being considered [^101]).
-[^101]).
 Multitenant systems have mutually untrusting tenants, but they are isolated from each
 other using firewalls, virtualization, and access control policies, not using Byzantine fault
-tolerance. Protocols for making systems Byzantine fault-tolerant are quite expensive
+tolerance. Protocols for making systems Byzantine fault-tolerant are quite expensive [^102],
-[^102],
+and fault-tolerant embedded systems rely on support from the hardware level [^98]. In most server-side data systems, the
-and fault-tolerant embedded systems rely on support from the hardware level
-[^98]. In most server-side data systems, the
 cost of deploying Byzantine fault-tolerant solutions makes them impracticable.
 Web applications do need to expect arbitrary and malicious behavior of clients that are under
@ -1383,8 +1311,7 @@ escaping are so important: to prevent SQL injection and cross-site scripting, fo
 we typically don’t use Byzantine fault-tolerant protocols here, but simply make the server the
 authority on deciding what client behavior is and isn’t allowed. In peer-to-peer networks, where
 there is no such central authority, Byzantine fault tolerance is more relevant
-[[103](/en/ch9#Kleppmann2020),
+[[^103], [^104]].
-[104](/en/ch9#Kleppmann2022)].
 A bug in the software could be regarded as a Byzantine fault, but if you deploy the same software to
 all nodes, then a Byzantine fault-tolerant algorithm cannot save you. Most Byzantine fault-tolerant
@ -1409,9 +1336,9 @@ pragmatic steps toward better reliability. For example:
 * Network packets do sometimes get corrupted due to hardware issues or bugs in operating systems,
 drivers, routers, etc. Usually, corrupted packets are caught by the checksums built into TCP and
-  UDP, but sometimes they evade detection [[105](/en/ch9#Gilman2015),
+ UDP, but sometimes they evade detection [[^105],
-  [106](/en/ch9#Stone2000),
+ [^106],
-  [107](/en/ch9#Jones2015)].
+ [^107]].
 Simple measures are usually sufficient protection against such corruption, such as checksums in
 the application-level protocol. TLS-encrypted connections also offer protection against
 corruption.
@ -1543,8 +1470,7 @@ liveness property [^115].)
 Safety is often informally defined as *nothing bad happens*, and liveness as *something good
 eventually happens*. However, it’s best to not read too much into those informal definitions,
 because “good” and “bad” are value judgements that don’t apply well to algorithms. The actual
-definitions of safety and liveness are more precise
+definitions of safety and liveness are more precise [^116]:
-[^116]:
 * If a safety property is violated, we can point at a particular point in time at which it was
 broken (for example, if the uniqueness property was violated, we can identify the particular
@ -1556,8 +1482,7 @@ definitions of safety and liveness are more precise
 An advantage of distinguishing between safety and liveness properties is that it helps us deal with
 difficult system models. For distributed algorithms, it is common to require that safety properties
-*always* hold, in all possible situations of a system model
+*always* hold, in all possible situations of a system model [^108]. That is, even if all nodes crash, or
-[^108]. That is, even if all nodes crash, or
 the entire network fails, the algorithm must nevertheless ensure that it does not return a wrong
 result (i.e., that the safety properties remain satisfied).
@ -1576,11 +1501,9 @@ abstraction of reality.
 For example, algorithms in the crash-recovery model generally assume that data in stable storage
 survives crashes. However, what happens if the data on disk is corrupted, or the data is wiped out
-due to hardware error or misconfiguration
+due to hardware error or misconfiguration [^117]?
-[^117]?
 What happens if a server has a firmware bug and fails to recognize
-its hard drives on reboot, even though the drives are correctly attached to the server
+its hard drives on reboot, even though the drives are correctly attached to the server [^118]?
-[^118]?
 Quorum algorithms (see [“Quorums for reading and writing”](/en/ch6#sec_replication_quorum_condition)) rely on a node remembering the data
 that it claims to have stored. If a node may suffer from amnesia and forget previously stored data,
@ -1592,8 +1515,7 @@ The theoretical description of an algorithm can declare that certain things are
 to happen—and in non-Byzantine systems, we do have to make some assumptions about faults that can
 and cannot happen. However, a real implementation may still have to include code to handle the
 case where something happens that was assumed to be impossible, even if that handling boils down to
-`printf("Sucks to be you")` and `exit(666)`—i.e., letting a human operator clean up the mess
+`printf("Sucks to be you")` and `exit(666)`—i.e., letting a human operator clean up the mess [^119].
-[^119].
 (This is one difference between computer science and software engineering.)
 That is not to say that theoretical, abstract system models are worthless—quite the opposite.
@ -1620,8 +1542,7 @@ It is prudent to combine theoretical analysis with empirical testing to verify t
 behave as expected. Techniques such as property-based testing, fuzzing, and deterministic simulation
 testing (DST) use randomization to test a system in a wide range of situations. Companies such as
 Amazon Web Services have successfully used a combination of these techniques on many of their
-products [[120](/en/ch9#Brooker2024correctness),
+products [[^120], [^121]].
-[121](/en/ch9#SatarinTesting)].
 ### Model checking and specification languages
@ -1642,20 +1563,16 @@ longer executions would then not be found.
 Still, model checkers strike a nice balance between ease of use and the ability to find non-obvious
 bugs. CockroachDB, TiDB, Kafka, and many other distributed systems use model specifications to find
 and fix bugs
-[[122](/en/ch9#Vanlightly2024),
+[[^122], [^123], [^124]]. For example,
-[123](/en/ch9#Tang2018),
-[124](/en/ch9#VanBenschoten2019)]. For example,
 using TLA+, researchers were able to demonstrate the potential for data loss in viewstamped
-replication (VR) caused by ambiguity in the prose description of the algorithm
+replication (VR) caused by ambiguity in the prose description of the algorithm [^125].
-[^125].
 By design, model checkers don’t run your actual code, but rather a simplified model that specifies
 only the core ideas of your protocol. This makes it more tractable to systematically explore the
 state space, but it risks that your specification and your implementation go out of sync with each
 other [^126].
 It is possible to check whether the model and the real implementation have equivalent behavior, but
-this requires instrumentation in the real implementation
+this requires instrumentation in the real implementation [^127].
-[^127].
 ### Fault injection
@ -1667,8 +1584,7 @@ processes—anything you can imagine going wrong with a computer.
 Fault injection tests are typically run in an environment that closely resembles the production
 environment where the system will run. Some even inject faults directly into their production
-environment. Netflix popularized this approach with their Chaos Monkey tool
+environment. Netflix popularized this approach with their Chaos Monkey tool [^128]. Production fault
-[^128]. Production fault
 injection is often referred to as *chaos engineering*, which we discussed in
 [“Reliability and Fault Tolerance”](/en/ch2#sec_introduction_reliability).
@ -1683,11 +1599,9 @@ during and after faults are injected to make sure things work as expected.
 The myriad of tools required to trigger failures make fault injection tests cumbersome to write.
 It’s common to adopt a fault injection framework like Jepsen to run fault injection tests to
 simplify the process. Such frameworks come with integrations for various operating systems and many
-pre-built fault injectors
+pre-built fault injectors [^129].
-[^129].
 Jepsen has been remarkably effective at finding critical bugs in many widely-used systems
-[[130](/en/ch9#Kingsbury2024),
+[[^130], [^131]].
-[131](/en/ch9#Majumdar2017)].
 ### Deterministic simulation testing
@ -1772,7 +1686,7 @@ simulations, elements of nondeterminism may remain. For example, in some program
 order in which you iterate over the elements of a hash table may be nondeterministic. Whether you
 run into a resource limit (memory allocation failure, stack overflow) is also nondeterministic.
-# Summary
+## Summary
 In this chapter we have discussed a wide range of problems that can occur in distributed systems,
 including:
@ -1810,8 +1724,7 @@ other nodes and try to get a quorum to agree.
 If you’re used to writing software in the idealized mathematical perfection of a single computer,
 where the same operation always deterministically returns the same result, then moving to the messy
 physical reality of distributed systems can be a bit of a shock. Conversely, distributed systems
-engineers will often regard a problem as trivial if it can be solved on a single computer
+engineers will often regard a problem as trivial if it can be solved on a single computer [^4],
-[^4],
 and indeed a single computer can do a lot nowadays. If you can avoid opening Pandora’s box and
 simply keep things on a single machine, for example by using an embedded storage engine (see
 [“Embedded storage engines”](/en/ch4#sidebar_embedded)), it is generally worth doing so.
@ -1834,11 +1747,10 @@ This chapter has been all about problems, and has given us a bleak outlook. In t
 will move on to solutions, and discuss some algorithms that have been designed to cope with the
 problems in distributed systems.
 ##### Footnotes
 ##### References
 ### Summary
 [^1]: Mark Cavage. [There’s Just No Getting Around It: You’re Building a Distributed System](https://queue.acm.org/detail.cfm?id=2482856). *ACM Queue*, volume 11, issue 4, pages 80-89, April 2013. [doi:10.1145/2466486.2482856](https://doi.org/10.1145/2466486.2482856) 
 [^2]: Jay Kreps. [Getting Real About Distributed System Reliability](https://blog.empathybox.com/post/19574936361/getting-real-about-distributed-system-reliability). *blog.empathybox.com*, March 2012. Archived at [perma.cc/9B5Q-AEBW](https://perma.cc/9B5Q-AEBW) 
--- a/content/en/part-ii.md
+++ b/content/en/part-ii.md
@ -105,7 +105,7 @@ Later, in Part III of this book, we will discuss how you can take several (poten
 - [9. The Trouble with Distributed Systems](/en/ch9)
 - [10. Consistency and Consensus](/en/ch10)
-## References
+### References
 1. Ulrich Drepper: “[What Every Programmer Should Know About Memory](https://people.freebsd.org/~lstewart/articles/cpumemory.pdf),” akka‐dia.org, November 21, 2007.
 1. Ben Stopford: “[Shared Nothing vs. Shared Disk Architectures: An Independent View](http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/),” benstopford.com, November 24, 2009.