fix ref links

2026-06-21 00:47:05 +08:00 · 2025-08-09 16:22:19 +08:00 · 2025-08-09 16:22:19 +08:00 · 515bb3a093
commit 515bb3a093
parent 4ec385f161
9 changed files with 150 additions and 223 deletions
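The commit message does not say how the rewrite was done. Below is a minimal Python sketch of an equivalent transformation. It assumes that every old-style citation group has the form `[[^N], [^M], ...]`, and that the references inside a group are separated by a comma, optional whitespace or line breaks, and, inside `> [!NOTE]` blocks, an optional `>` marker. The helper names `flatten_ref_groups` and `leftover_groups` are hypothetical and are not part of the repository.

```python
import re

# One footnote reference, e.g. "[^42]".
REF = r"\[\^\d+\]"

# Separator between references inside an old-style group: a comma, optional
# whitespace (including line breaks), and an optional blockquote marker "> "
# for groups that were wrapped inside a "> [!NOTE]" block.
SEP = r",\s*(?:>\s*)?"

# An old-style group: "[" + two or more references joined by SEP + "]".
GROUP = re.compile(r"\[(" + REF + r"(?:" + SEP + REF + r")+)\]")

# Leftovers after conversion: a "[[^" opener, or two references still
# separated by a comma.
LEFTOVER = re.compile(r"\[\[\^\d+\]|" + REF + r",\s*" + REF)


def flatten_ref_groups(text: str) -> str:
    """Rewrite "[[^6], [^7]]" as "[^6] [^7]" (hypothetical helper)."""
    def join_refs(match: re.Match) -> str:
        # Keep only the references, separated by single spaces.
        return " ".join(re.findall(REF, match.group(1)))
    return GROUP.sub(join_refs, text)


def leftover_groups(text: str) -> list[str]:
    """Return fragments that still look like old-style groups."""
    return [m.group(0) for m in LEFTOVER.finditer(text)]


if __name__ == "__main__":
    sample = "a valid sequential order [[^6], [^7]]."
    print(flatten_ref_groups(sample))  # prints: a valid sequential order [^6] [^7].
```

A group that spans several lines is collapsed onto one line, as the commit does for the blockquote note in ch10. A single reference such as `[^27]` is left unchanged. The commit also re-wraps some surrounding prose onto single lines, which this regex pass does not attempt. `leftover_groups` is a lint check for groups that were only partly converted, such as a comma left between two references.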
--- a/content/en/ch10.md
+++ b/content/en/ch10.md
@@ -218,7 +218,7 @@ There are a few interesting details to point out in [Figure 10-4](/en/ch10#fig_
 That is the intuition behind linearizability; the formal definition [^1] describes it more precisely. It is
 possible (though computationally expensive) to test whether a system’s behavior is linearizable by
 recording the timings of all requests and responses, and checking whether they can be arranged into
-a valid sequential order [[^6], [^7]].
+a valid sequential order [^6] [^7].

 Just as there are various weak isolation levels for transactions besides serializability (see
 [“Weak Isolation Levels”](/en/ch8#sec_transactions_isolation_levels)), there are also various weaker consistency models for
@@ -255,7 +255,7 @@ Linearizability

 A database may provide both serializability and linearizability, and this combination is known as
 *strict serializability* or *strong one-copy serializability* (*strong-1SR*)
-[[^11], [^12]].
+[^11] [^12].
 Single-node databases are typically linearizable. With distributed databases using optimistic
 methods like serializable snapshot isolation (see [“Serializable Snapshot Isolation (SSI)”](/en/ch8#sec_transactions_ssi)) the situation is more
 complicated: for example, CockroachDB provides serializability, and some recency guarantees on
@@ -264,7 +264,7 @@ because this would require expensive coordination between transactions [^14].

 It is also possible to combine a weaker isolation level with linearizability, or a weaker
 consistency model with serializability; in fact, consistency model and isolation level can be chosen
-largely independently from each other [[^15], [^16]].
+largely independently from each other [^15] [^16].

 ## Relying on Linearizability

@@ -460,7 +460,7 @@ performance: a reader must perform read repair (see [“Catching up on missed wr
 before returning results to the application [^24].
 Moreover, before writing, a writer must read the latest state of a quorum of nodes to fetch the
 latest timestamp of any prior write, and ensure that the new write has a greater timestamp
-[[^25], [^26]].
+[^25] [^26].
 However, Riak does not perform synchronous read repair due to the performance penalty.
 Cassandra does wait for read repair to complete on quorum reads [^27],
 but it loses linearizability due to its use of time-of-day clocks for timestamps.
@@ -529,10 +529,10 @@ The trade-off is as follows:

 Thus, applications that don’t require linearizability can be more tolerant of network problems. This
 insight is popularly known as the *CAP theorem*
-[[^29], [^30], [^31], [^32]],
+[^29] [^30] [^31] [^32],
 named by Eric Brewer in 2000, although the trade-off had been known to designers of
 distributed databases since the 1970s
-[[^33], [^34], [^35]].
+[^33] [^34] [^35].

 CAP was originally proposed as a rule of thumb, without precise definitions, with the goal of
 starting a discussion about trade-offs in databases. At the time, many distributed databases
@@ -563,7 +563,7 @@ formalization of *availability* [^30] does not
 match the usual meaning of the term [^38]. Many highly available (fault-tolerant) systems actually do not meet CAP’s
 idiosyncratic definition of availability. Moreover, some system designers choose (with good reason)
 to provide neither linearizability nor the form of availability that the CAP theorem assumes, so
-those systems are neither CP nor AP [[^39], [^40]].
+those systems are neither CP nor AP [^39] [^40].

 All in all, there is a lot of misunderstanding and confusion around CAP, and it does not help us
 understand systems better, so CAP is best avoided.
@@ -574,11 +574,11 @@ fault (network partitions, which according to data from Google are the cause of
 incidents [^41]).
 It doesn’t say anything about network delays, dead nodes, or other trade-offs. Thus, although CAP
 has been historically influential, it has little practical value for designing systems
-[[^4], [^38]].
+[^4] [^38].

 There have been efforts to generalize CAP. For example, the *PACELC principle* observes that system
 designers might also choose to weaken consistency at times when the network is working fine in order
-to reduce latency [[^39], [^40], [^42]].
+to reduce latency [^39] [^40] [^42].
 Thus, during a network partition (P), we need to choose between availability (A) and consistency
 (C); else (E), when there is no partition, we may choose between low latency (L) and
 consistency (C). However, this definition inherits several problems with CAP, such as the
@@ -586,7 +586,7 @@ counterintuitive definitions of consistency and availability.

 There are many more interesting impossibility results in distributed systems [^43],
 and CAP has now been superseded by more precise results
-[[^44], [^45]],
+[^44] [^45],
 so it is of mostly historical interest today.

 ### Linearizability and network delays
@@ -945,18 +945,18 @@ node, but which get a lot harder if you want fault tolerance:
 It turns out that all of these are instances of the same fundamental distributed systems problem:
 *consensus*. Consensus is one of the most important and fundamental problems in distributed
 computing; it is also infamously difficult to get right
-[[^58], [^59]],
+[^58] [^59],
 and many systems have got it wrong in the past. Now that we have discussed replication
 ([Chapter 6](/en/ch6#ch_replication)), transactions ([Chapter 8](/en/ch8#ch_transactions)), system models ([Chapter 9](/en/ch9#ch_distributed)), and
 linearizability (this chapter), we are finally ready to tackle the consensus problem.

 The best-known consensus algorithms are Viewstamped Replication
-[[^60], [^61]],
-Paxos [[^58], [^62], [^63], [^64]],
-Raft [[^23], [^65], [^66]],
-and Zab [[^18], [^22], [^67]].
+[^60] [^61],
+Paxos [^58] [^62] [^63] [^64],
+Raft [^23] [^65] [^66],
+and Zab [^18] [^22] [^67].
 There are quite a few similarities between these algorithms, but they are not the same
-[[^68], [^69]].
+[^68] [^69].
 These algorithms work in a non-Byzantine system model: that is, network communication may be
 arbitrarily delayed or dropped, and nodes may crash, restart, and become disconnected, but the
 algorithms assume that nodes otherwise follow the protocol correctly and do not behave maliciously.
@@ -964,7 +964,7 @@ algorithms assume that nodes otherwise follow the protocol correctly and do not
 There are also consensus algorithms that can tolerate some Byzantine nodes, i.e., nodes that don’t
 correctly follow the protocol (for example, by sending contradictory messages to other nodes). A
 common assumption is that fewer than one-third of the nodes are Byzantine-faulty
-[[^26], [^70]].
+[^26] [^70].
 Such *Byzantine fault tolerant* (BFT) consensus algorithms are used in blockchains [^71].
 However, as explained in [“Byzantine Faults”](/en/ch9#sec_distributed_byzantine), BFT algorithms are beyond the scope of this
 book.
@@ -1095,7 +1095,7 @@ consensus. Any CAS invocations whose new value was not decided return an error.
 different expected values use separate runs of the consensus protocol.

 This shows that CAS and consensus are equivalent to each other
-[[^28], [^73]].
+[^28] [^73].
 Again, both are straightforward on a single node, but challenging to make fault-tolerant. As an
 example of CAS in a distributed setting, we saw conditional write operations for object stores in
 [“Databases backed by object storage”](/en/ch6#sec_replication_object_storage), which allow a write to happen only if an object with the same
@@ -1105,7 +1105,7 @@ However, a linearizable read-write register is not sufficient to solve consensus
 tells us that consensus cannot be solved by a deterministic algorithm in the asynchronous crash-stop
 model [^72], but we saw in
 [“Linearizability and quorums”](/en/ch10#sec_consistency_quorum_linearizable) that a linearizable register can be implemented using quorum
-reads/writes in this model [[^24], [^25], [^26]].
+reads/writes in this model [^24] [^25] [^26].
 From this it follows that a linearizable register cannot solve consensus.

 ### Shared logs as consensus
@@ -1142,12 +1142,8 @@ Validity
 value to be added to the log.

 > [!NOTE]
-> A shared log is formally known as a *total order broadcast*, *atomic broadcast*, or *total order
-> multicast* protocol [[^26],
-> [^76],
-> [^77]].
-> It’s the same thing described in different words: requesting a value to be added to the log is then
-> called “broadcasting” it, and reading a log entry is called “delivering” it.
+> A shared log is formally known as a *total order broadcast*, *atomic broadcast*, or *total order multicast* protocol [^26] [^76] [^77].
+> It’s the same thing described in different words: requesting a value to be added to the log is then called “broadcasting” it, and reading a log entry is called “delivering” it.

 If you have an implementation of a shared log, it is easy to solve the consensus problem: every node
 that wants to propose a value requests for it to be added to the log, and whichever value is read
@@ -1243,7 +1239,7 @@ any of the communication among the nodes times out). The other three properties
 same as for consensus.

 If you have a solution for consensus, there are multiple ways you could solve atomic commitment
-[[^78], [^79]].
+[^78] [^79].
 One works like this: when you want to commit the transaction, every node sends its vote to commit or
 abort to every other node. Nodes that receive a vote to commit from itself and every other node
 propose “commit” using the consensus algorithm; nodes that receive a vote to abort, or which
@@ -1290,7 +1286,7 @@ Similarly, a shared log can be used to implement serializable transactions: as d
 [“Actual Serial Execution”](/en/ch8#sec_transactions_serial), if every log entry represents a deterministic transaction to be
 executed as a stored procedure, and if every node executes those transactions in the same order,
 then the transactions will be serializable
-[[^81], [^82]].
+[^81] [^82].

 > [!NOTE]
 > Sharded databases with a strong consistency model often maintain a separate log per shard, which
@@ -1353,7 +1349,7 @@ a vote on a proposal succeeds, at least one of the nodes that voted for it must
 participated in the most recent successful leader election [^85]. Thus, if the vote on a proposal
 passes without revealing any higher-numbered epoch, the current leader can conclude that no leader
 with a higher epoch number has been elected, and therefore it can safely append the proposed entry
-to the log [[^26], [^86]].
+to the log [^26] [^86].

 These two rounds of voting look superficially similar to two-phase commit, but they are very
 different protocols. In consensus algorithms, any node can start an election and it requires only a
@@ -1364,7 +1360,7 @@ vote from *every* participant before it can commit.

 This basic structure is common to all of Raft, Multi-Paxos, Zab, and Viewstamped Replication: a vote
 by a quorum of nodes elects a leader, and then another quorum vote is required for every entry that
-the leader wants to append to the log [[^68], [^69]]. Every new log entry is synchronously replicated
+the leader wants to append to the log [^68] [^69]. Every new log entry is synchronously replicated
 to a quorum of nodes before it is confirmed to the client that requested the write. This ensures
 that the log entry won’t be lost if the current leader fails.

@@ -1398,7 +1394,7 @@ easily cause a lot of data loss or corruption.
 Another subtlety is in how the algorithms deal with log entries that had been proposed by the old
 leader before it failed, but for which the vote on appending to the log had not yet completed. You
 can find discussions of these details in the references for this chapter
-[[^23], [^69], [^86]].
+[^23] [^69] [^86].

 For databases that use a consensus algorithm for replication, not only do writes need to be turned
 into log entries and replicated to a quorum. If you want to guarantee linearizable reads, they also
@@ -1441,7 +1437,7 @@ work.

 Sometimes, consensus algorithms are particularly sensitive to network problems. For example, Raft
 has been shown to have unpleasant edge cases
-[[^88], [^89]]:
+[^88] [^89]:
 if the entire network is working correctly except for one particular network link that is
 consistently unreliable, Raft can get into situations where leadership continually bounces between
 two nodes, or the current leader is continually forced to resign, so the system effectively never
@@ -1468,7 +1464,7 @@ entirely in memory (although they still write to disk for durability), which is
 multiple nodes using a fault-tolerant consensus algorithm.

 Coordination services are modeled after Google’s Chubby lock service
-[[^17], [^58]].
+[^17] [^58].
 They combine a consensus algorithm with several other features that turn out to be particularly
 useful when building distributed systems:

@@ -1545,7 +1541,7 @@ information like “the node running on IP address 10.1.1.23 is the leader for s
 assignments usually change on a timescale of minutes or hours. Coordination services are not
 intended for storing data that may change thousands of times per second. For that, it is better to
 use a conventional database; alternatively, tools like Apache BookKeeper
-[[^90], [^91]]
+[^90] [^91]
 can be used to replicate fast-changing internal state of a service.

 ### Service discovery
--- a/content/en/ch2.md
+++ b/content/en/ch2.md
@@ -186,19 +186,13 @@ is a long queue of requests waiting to be handled, response times may increase s
 time out and resend their request. This causes the rate of requests to increase even further, making
 the problem worse—a *retry storm*. Even when the load is reduced again, such a system may remain in
 an overloaded state until it is rebooted or otherwise reset. This phenomenon is called a *metastable
-failure*, and it can cause serious outages in production systems
-[[^7], [^8]].
+failure*, and it can cause serious outages in production systems [^7] [^8].

 To avoid retries overloading a service, you can increase and randomize the time between successive
-retries on the client side (*exponential backoff*
-[[^9], [^10]]),
-and temporarily stop sending requests to a service that has returned errors or timed out recently
-(using a *circuit breaker* [[^11], [^12]]
-or *token bucket* algorithm [^13]).
+retries on the client side (*exponential backoff* [^9] [^10]), and temporarily stop sending requests to a service that has returned errors or timed out recently
+(using a *circuit breaker* [^11] [^12] or *token bucket* algorithm [^13]).
 The server can also detect when it is approaching overload and start proactively rejecting requests
-(*load shedding* [^14]), and send back
-responses asking clients to slow down (*backpressure*
-[[^1], [^15]]).
+(*load shedding* [^14]), and send back responses asking clients to slow down (*backpressure* [^1] [^15]).
 The choice of queueing and load-balancing algorithms can also make a difference [^16].

 In terms of performance metrics, the response time is usually what users care about the most,
@@ -342,7 +336,7 @@ For example, an SLO may set a target for a service to have a median response tim
 result in non-error responses. An SLA is a contract that specifies what happens if the SLO is not
 met (for example, customers may be entitled to a refund). That is the basic idea, at least; in
 practice, defining good availability metrics for SLOs and SLAs is not straightforward
-[[^28], [^29]].
+[^28] [^29].

 # Computing percentiles

@@ -355,7 +349,7 @@ The simplest implementation is to keep a list of response times for all requests
 window and to sort that list every minute. If that is too inefficient for you, there are algorithms
 that can calculate a good approximation of percentiles at minimal CPU and memory cost.
 Open source percentile estimation libraries include HdrHistogram,
-t-digest [[^30], [^31]],
+t-digest [^30] [^31],
 OpenHistogram [^32], and DDSketch [^33].

 Beware that averaging percentiles, e.g., to reduce the time resolution or to combine data from
@@ -375,7 +369,7 @@ software, typical expectations include:
 If all those things together mean “working correctly,” then we can understand *reliability* as
 meaning, roughly, “continuing to work correctly, even when things go wrong.” To be more precise
 about things going wrong, we will distinguish between *faults* and *failures*
-[[^35], [^36], [^37]]:
+[^35] [^36] [^37]:

 Fault
 : A fault is when a particular *part* of a system stops working correctly: for example, if a
@@ -432,21 +426,14 @@ cured, as described in the following sections.

 When we think of causes of system failure, hardware faults quickly come to mind:

-* Approximately 2–5% of magnetic hard drives fail per year
- [[^40],
- [^41]];
+* Approximately 2–5% of magnetic hard drives fail per year [^40] [^41];
 in a storage cluster with 10,000 disks, we should therefore expect on average one disk failure per day.
- Recent data suggests that disks are getting more reliable, but failure rates remain significant
- [^42].
-* Approximately 0.5–1% of solid state drives (SSDs) fail per year
- [^43].
- Small numbers of bit errors are corrected automatically
- [^44],
+ Recent data suggests that disks are getting more reliable, but failure rates remain significant [^42].
+* Approximately 0.5–1% of solid state drives (SSDs) fail per year [^43].
+ Small numbers of bit errors are corrected automatically [^44],
 but uncorrectable errors occur approximately once per year per drive, even in drives that are
 fairly new (i.e., that have experienced little wear); this error rate is higher than that of
- magnetic hard drives
- [[^45],
- [^46]].
+ magnetic hard drives [^45] [^46].
 * Other hardware components such as power supplies, RAID controllers, and memory modules also fail,
 although less frequently than hard drives [^47] [^48].
 * Approximately one in 1,000 machines has a CPU core that occasionally computes the wrong result,
@@ -676,7 +663,7 @@ If you can double the resources in order to handle twice the load, while keeping
 same, we say that you have *linear scalability*, and this is considered a good thing. Occasionally
 it is possible to handle twice the load with less than double the resources, due to economies of
 scale or a better distribution of peak load
-[[^79], [^80]].
+[^79] [^80].
 Much more likely is that the cost grows faster than linearly, and there may be many reasons for the
 inefficiency. For example, if you have a lot of data, then processing a single write request may
 involve more work than if you have a small amount of data, even if the size of the request is the
@@ -762,7 +749,7 @@ bugs that need fixing.
 It is widely recognized that the majority of the cost of software is not in its initial development,
 but in its ongoing maintenance—fixing bugs, keeping its systems operational, investigating failures,
 adapting it to new platforms, modifying it for new use cases, repaying technical debt, and adding
-new features [[^85], [^86]].
+new features [^85] [^86].

 However, maintenance is also difficult. If a system has been successfully running for a long time,
 it may well use outdated technologies that not many engineers understand today (such as mainframes
@@ -925,7 +912,7 @@ There are no easy answers on how to achieve these things, but one thing that can
 applications using well-understood building blocks that provide useful abstractions. The rest of
 this book will cover a selection of building blocks that have proved to be valuable in practice.

-### Summary
+### References

 [^1]: Mike Cvet. [How We Learned to Stop Worrying and Love Fan-In at Twitter](https://www.youtube.com/watch?v=WEgCjwyXvwc). At *QCon San Francisco*, December 2016. 
 [^2]: Raffi Krikorian. [Timelines at Scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability/). At *QCon San Francisco*, November 2012. Archived at [perma.cc/V9G5-KLYK](https://perma.cc/V9G5-KLYK) 
--- a/content/en/ch3.md
+++ b/content/en/ch3.md
@@ -219,10 +219,7 @@ structure explicit (see [Figure 3-2](/en/ch3#fig_json_tree)).
 ###### Figure 3-2. One-to-many relationships forming a tree structure.

 > [!NOTE]
-> This type of relationship is sometimes called *one-to-few* rather than *one-to-many*, since a résumé
-> typically has a small number of positions
-> [[^9],
-> [^10]].
+> This type of relationship is sometimes called *one-to-few* rather than *one-to-many*, since a résumé typically has a small number of positions [^9] [^10].
 > In situations where there may be a genuinely large number of related items—say, comments on a
 > celebrity’s social media post, of which there could be many thousands—embedding them all in the same
 > document may be too unwieldy, so the relational approach in [Figure 3-1](/en/ch3#fig_obama_relational) is preferable.
@@ -540,7 +537,7 @@ such applications well, because the items (or their IDs) can simply be stored in
 determine their order. In relational databases there isn’t a standard way of representing such
 reorderable lists, and various tricks are used: sorting by an integer column (requiring renumbering
 when you insert into the middle), a linked list of IDs, or fractional indexing
-[[^14], [^15], [^16]].
+[^14] [^15] [^16].

 ### Schema flexibility in the document model

@@ -593,7 +590,7 @@ since every row needs to be rewritten, and other schema operations (such as chan
 of a column) also typically require the entire table to be copied.

 Various tools exist to allow this type of schema changes to be performed in the background without downtime
-[[^21], [^22], [^23], [^24]],
+[^21] [^22] [^23] [^24],
 but performing such migrations on large databases remains operationally challenging. Complicated
 migrations can be avoided by only adding the `first_name` column with a default value of `NULL`
 (which is fast), and filling it in at read time, like you would with a document database.
@@ -1044,7 +1041,7 @@ Oracle has a different SQL extension for recursive queries, which it calls *hier
 [^41].

 However, the situation may be improving: at the time of writing, there are plans to add a graph
-query language called GQL to the SQL standard [[^42], [^43]],
+query language called GQL to the SQL standard [^42] [^43],
 which will provide a syntax inspired by Cypher, GSQL [^44], and PGQL [^45].

 ## Triple-Stores and SPARQL
@@ -1127,7 +1124,7 @@ Some of the research and development effort on triple stores was motivated by th
 early-2000s effort to facilitate internet-wide data exchange by publishing data not only as
 human-readable web pages, but also in a standardized, machine-readable format. Although the Semantic
 Web as originally envisioned did not succeed
-[[^49], [^50]],
+[^49] [^50],
 the legacy of the Semantic Web project lives on in a couple of specific technologies: *linked data*
 standards such as JSON-LD [^51],
 *ontologies* used in biomedical science [^52],
@@ -1238,7 +1235,7 @@ various other triple stores [^36].
 ## Datalog: Recursive Relational Queries

 Datalog is a much older language than SPARQL or Cypher: it arose from academic research in the 1980s
-[[^57], [^58], [^59]].
+[^57] [^58] [^59].
 It is less well known among software engineers and not widely supported in mainstream databases, but
 it ought to be better-known since it is a very expressive language that is particularly powerful for
 complex queries. Several niche databases, including Datomic, LogicBlox, CozoDB, and LinkedIn’s
@@ -1498,7 +1495,7 @@ the status of each booking, another that computes charts for the conference orga
 and a third that generates files for the printer that produces the attendees’ badges.

 The idea of using events as the source of truth, and expressing every state change as an event, is
-known as *event sourcing* [[^62], [^63]].
+known as *event sourcing* [^62] [^63].
 The principle of maintaining separate read-optimized representations and deriving them from the
 write-optimized representation is called *command query responsibility segregation (CQRS)*
 [^64].
@@ -1724,11 +1721,7 @@ come into play when *implementing* the data models described in this chapter.



-### Summary
-
-
-
-
+### References

 [^1]: Jamie Brandon. [Unexplanations: query optimization works because sql is declarative](https://www.scattered-thoughts.net/writing/unexplanations-sql-declarative/). *scattered-thoughts.net*, February 2024. Archived at [perma.cc/P6W2-WMFZ](https://perma.cc/P6W2-WMFZ) 
 [^2]: Joseph M. Hellerstein. [The Declarative Imperative: Experiences and Conjectures in Distributed Logic](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-90.pdf). Tech report UCB/EECS-2010-90, Electrical Engineering and Computer Sciences, University of California at Berkeley, June 2010. Archived at [perma.cc/K56R-VVQM](https://perma.cc/K56R-VVQM) 
--- a/content/en/ch4.md
+++ b/content/en/ch4.md
@@ -320,8 +320,7 @@ In the context of an LSM storage engines, false positives are no problem:

 An important detail is how the LSM storage chooses when to perform compaction, and which SSTables to
 include in a compaction. Many LSM-based storage systems allow you to configure which compaction
-strategy to use, and some of the common choices are
-[[^16], [^17]]:
+strategy to use, and some of the common choices are [^16] [^17]:

 Size-tiered compaction
 : Newer and smaller SSTables are successively merged into older and larger SSTables. The SSTables
@@ -452,7 +451,7 @@ In order to make the database resilient to crashes, it is common for B-tree impl
 include an additional data structure on disk: a *write-ahead log* (WAL). This is an append-only file
 to which every B-tree modification must be written before it can be applied to the pages of the tree
 itself. When the database comes back up after a crash, this log is used to restore the B-tree back
-to a consistent state [[^2], [^24]].
+to a consistent state [^2] [^24].
 In filesystems, the equivalent mechanism is known as *journaling*.

 To improve performance, B-tree implementations typically don’t immediately write every modified page
@@ -484,8 +483,7 @@ mention just a few:

 ## Comparing B-Trees and LSM-Trees

-As a rule of thumb, LSM-trees are better suited for write-heavy applications, whereas B-trees are faster for reads
-[[^27], [^28]].
+As a rule of thumb, LSM-trees are better suited for write-heavy applications, whereas B-trees are faster for reads [^27] [^28].
 However, benchmarks are often sensitive to details of the workload. You need to test systems with
 your particular workload in order to make a valid comparison. Moreover, it’s not a strict either/or
 choice between LSM and B-trees: storage engines sometimes blend characteristics of both approaches,
@@ -512,7 +510,7 @@ memtable fills up. This happens if data can’t be written out to disk fast enou
 the compaction process cannot keep up with incoming writes. Many storage engines, including RocksDB,
 perform *backpressure* in this situation: they suspend all reads and writes until the memtable has
 been written out to disk
-[[^30], [^31]].
+[^30] [^31].

 Regarding read throughput, modern SSDs (and especially NVMe) can perform many independent read
 requests in parallel. Both LSM-trees and B-trees are able to provide high read throughput, but
@@ -555,7 +553,7 @@ A sequential write workload writes larger chunks of data at a time, so it is lik
 can be erased without having to perform any GC. On the other hand, with a random write workload, it
 is more likely that a block contains a mixture of pages with valid and invalid data, so the GC has
 to perform more work before a block can be erased
-[[^34], [^35], [^36]].
+[^34] [^35] [^36].

 The write bandwidth consumed by GC is then not available for the application. Moreover, the
 additional writes performed by GC contribute to wear on the flash memory; therefore, random writes
@@ -573,7 +571,7 @@ containing keys and references to values [^37].)
 A B-tree index must write every piece of data at least twice: once to the write-ahead log, and once
 to the tree page itself. In addition, they sometimes need to write out an entire page, even if only
 a few bytes in that page changed, to ensure the B-tree can be correctly recovered after a crash or
-power failure [[^38], [^39]].
+power failure [^38] [^39].

 If you take the total number of bytes written to disk in some workload, and divide by the number of
 bytes you would have to write if you simply wrote an append-only log with no index, you get the
@@ -610,7 +608,7 @@ the data files anyway, and SSTables don’t have pages with unused space. Moreov
 key-value pairs can better be compressed in SSTables, and thus often produce smaller files on disk
 than B-trees. Keys and values that have been overwritten continue to consume space until they are
 removed by a compaction, but this overhead is quite low when using leveled compaction
-[[^40], [^41]].
+[^40] [^41].
 Size-tiered compaction (see [“Compaction strategies”](/en/ch4#sec_storage_lsm_compaction)) uses more disk space, especially
 temporarily during compaction.

@@ -710,7 +708,7 @@ easily be backed up, inspected, and analyzed by external utilities.
 Products such as VoltDB, SingleStore, and Oracle TimesTen are in-memory databases with a relational model,
 and the vendors claim that they can offer big performance improvements by removing all the overheads
 associated with managing on-disk data structures
-[[^46], [^47]].
+[^46] [^47].
 RAMCloud is an open source, in-memory key-value store with durability (using a log-structured
 approach for the data in memory as well as the data on disk) [^48].

@@ -744,7 +742,7 @@ transaction processing and data warehousing in the same product. However, these
 and analytical processing (HTAP) databases (introduced in [“Data Warehousing”](/en/ch1#sec_introduction_dwh)) are increasingly
 becoming two separate storage and query engines, which happen to be accessible through a common SQL
 interface
-[[^50], [^51], [^52], [^53]].
+[^50] [^51] [^52] [^53].

 ## Cloud Data Warehouses

@@ -881,11 +879,11 @@ to single-node embedded databases such as DuckDB [^62],
 and product analytics systems such as Pinot [^63]
 and Druid [^64].
 It is used in storage formats such as Parquet, ORC
-[[^65], [^66]],
+[^65] [^66],
 Lance [^67],
 and Nimble [^68],
 and in-memory analytics formats like Apache Arrow
-[[^65], [^69]]
+[^65] [^69]
 and Pandas/NumPy [^70].
 Some time-series databases, such as InfluxDB IOx [^71] and TimescaleDB [^72],
 are also based on column-oriented storage.
@@ -999,7 +997,7 @@ Queries need to examine both the column data on disk and the recent writes in me
 the two. The query execution engine hides this distinction from the user. From an analyst’s point
 of view, data that has been modified with inserts, updates, or deletes is immediately reflected in
 subsequent queries. Snowflake, Vertica, Apache Pinot, Apache Druid, and many others do this
-[[^61], [^63], [^64], [^76]].
+[^61] [^63] [^64] [^76].

 ## Query Execution: Compilation and Vectorization

@@ -1034,7 +1032,7 @@ Vectorized processing
 : The query is interpreted, not compiled, but it is made fast by processing many values from a
 column in a batch, instead of iterating over rows one by one. A fixed set of predefined operators
 are built into the database; we can pass arguments to them and get back a batch of results
- [[^50], [^75]].
+ [^50] [^75].

 For example, we could pass the `product_sk` column and the ID of “bananas” to an equality operator,
 and get back a bitmap (one bit per value in the input column, which is 1 if it’s a banana); we could
@@ -1056,9 +1054,7 @@ performance by taking advantages of the characteristics of modern CPUs:
 * doing most of the work in tight inner loops (that is, with a small number of instructions and no
 function calls) to keep the CPU instruction processing pipeline busy and avoid branch
 mispredictions,
-* making use of parallelism such as multiple threads and single-instruction-multi-data (SIMD)
- instructions [[^79],
- [^80]], and
+* making use of parallelism such as multiple threads and single-instruction-multiple-data (SIMD) instructions [^79] [^80], and
 * operating directly on compressed data without decoding it into a separate in-memory
 representation, which saves memory allocation and copying costs.

@ -1196,7 +1192,7 @@ It stores the mapping from term to postings list in SSTable-like sorted files, w
 the background using the same log-structured approach we saw earlier in this chapter [^91].
 PostgreSQL’s GIN index type also uses postings lists to support full-text search and indexing inside
 JSON documents
-[[^92], [^93]].
+[^92] [^93].

 Instead of breaking text into words, an alternative is to find all the substrings of length *n*,
 which are called *n*-grams. For example, the trigrams (*n* = 3) of the string
@ -1295,7 +1291,7 @@ variations of each [^101],
 and PostgreSQL’s pgvector supports both as well [^102].
 The full details of the IVF and HNSW algorithms are beyond the scope of this book, but their papers
 are an excellent resource
-[[^103], [^104]].
+[^103] [^104].

 ## Summary

@ -1347,7 +1343,7 @@ documentation for the database of your choice.



-### Summary
+### References



--- a/content/en/ch5.md
+++ b/content/en/ch5.md
@ -447,7 +447,7 @@ application code is expecting, and their types.
 If the reader’s and writer’s schema are the same, decoding is easy. If they are different, Avro
 resolves the differences by looking at the writer’s schema and the reader’s schema side by side and
 translating the data from the writer’s schema into the reader’s schema. The Avro specification
-[[^16], [^17]]
+[^16] [^17]
 defines exactly how this resolution works, and it is illustrated in
 [Figure 5-6](/en/ch5#fig_encoding_avro_resolution).

@ -571,7 +571,7 @@ languages.

 The ideas on which these encodings are based are by no means new. For example, they have a lot in
 common with ASN.1, a schema definition language that was first standardized in 1984
-[[^23], [^24]].
+[^23] [^24].
 It was used to define various network protocols, and its binary encoding (DER) is still used to encode
 SSL certificates (X.509), for example [^25].
 ASN.1 supports schema evolution using tag numbers, similar to Protocol Buffers [^26].
@ -737,7 +737,7 @@ different contexts. For example:
 systems, or OAuth for shared access to user data.

 The most popular service design philosophy is REST, which builds upon the principles of HTTP
-[[^30], [^31]].
+[^30] [^31].
 It emphasizes simple data formats, using URLs for identifying resources and using HTTP features for
 cache control, authentication, and content type negotiation. An API designed according to the
 principles of REST is called *RESTful*.
@ -824,14 +824,14 @@ Architecture (CORBA) is excessively complex, and does not provide backward or fo
 compatibility [^33].
 SOAP and the WS-\* web services framework aim to provide interoperability across vendors, but are
 also plagued by complexity and compatibility problems
-[[^34], [^35], [^36]].
+[^34] [^35] [^36].

 All of these are based on the idea of a *remote procedure call* (RPC), which has been around since
 the 1970s [^37].
 The RPC model tries to make a request to a remote network service look the same as calling a function or
 method in your programming language, within the same process (this abstraction is called *location
 transparency*). Although RPC seems convenient at first, the approach is fundamentally flawed
-[[^38], [^39]].
+[^38] [^39].
 A network request is very different from a local function call:

 * A local function call is predictable and either succeeds or fails, depending only on parameters
@ -1016,7 +1016,7 @@ task fails, the framework will re-execute the task, but will skip any RPC calls
 that the task made successfully before failing. Instead, the framework will pretend to make the
 call, but will instead return the results from the previous call. This is possible because durable
 execution frameworks log all RPCs and state changes to durable storage like a write-ahead log
-[[^45], [^46]].
+[^45] [^46].
 [Example 5-5](/en/ch5#fig_temporal_workflow) shows an example of a workflow definition that supports durable execution
 using Temporal.

@ -1109,7 +1109,7 @@ Message brokers typically don’t enforce any particular data model—a message
 bytes with some metadata, so you can use any encoding format. A common approach is to use Protocol
 Buffers, Avro, or JSON, and to deploy a schema registry alongside the message broker to store all
 the valid schema versions and check their compatibility
-[[^19], [^21]].
+[^19] [^21].
 AsyncAPI, a messaging-based equivalent of OpenAPI, can also be used to specify the schema of
 messages.

@ -1197,7 +1197,7 @@ quite achievable. May your application’s evolution be rapid and your deploymen



-### Summary
+### References



--- a/content/en/ch6.md
+++ b/content/en/ch6.md
@ -221,9 +221,7 @@ for live queries. Storing database data in object storage has many benefits:
 * Object stores also provide multi-zone, dual-region, or multi-region replication with very high
  durability guarantees. This also allows databases to bypass inter-zone network fees.
 * Databases can use an object store’s *conditional write* feature—essentially, a *compare-and-set*
-  (CAS) operation—to implement transactions and leadership election
-  [[10](/ch06.html#Morling2024_ch6),
-  [11](/ch06.html#Chandramohan2024)]).
+  (CAS) operation—to implement transactions and leadership election [^10] [^11].
 * Storing data from multiple databases in the same object store can simplify data integration,
  particularly when open formats such as Apache Parquet and Apache Iceberg are used.

@ -420,9 +418,7 @@ heap into a consistent state, we can use the exact same log to build a replica o
 besides writing the log to disk, the leader also sends it across the network to its followers. When
 the follower processes this log, it builds a copy of the exact same files as found on the leader.

-This method of replication is used in PostgreSQL and Oracle, among others
-[[17](/ch06.html#Suzuki2017_ch6),
-[18](/ch06.html#Kapila2012)].
+This method of replication is used in PostgreSQL and Oracle, among others [^17] [^18].
 The main disadvantage is that the log describes the data on a very low level: a WAL contains details
 of which bytes were changed in which disk blocks. This makes replication tightly coupled to the
 storage engine. If the database changes its storage format from one version to another, it is
@ -915,10 +911,7 @@ Moreover, many modern web apps offer *real-time collaboration* features, such as
 Sheets for text documents and spreadsheets, Figma for graphics, and Linear for project management.
 What makes these apps so responsive is that user input is immediately reflected in the user
 interface, without waiting for a network round-trip to the server, and edits by one user are shown
-to their collaborators with low latency
-[[32](/ch06.html#DayRichter2010),
-[33](/ch06.html#Wallace2019),
-[34](/ch06.html#Artman2023)].
+to their collaborators with low latency [^32] [^33] [^34].

 This again results in a multi-leader architecture: each web browser tab that has opened the shared
 file is a replica, and any updates that you make to the file are asynchronously replicated to the
@ -935,19 +928,14 @@ multiple users have changed the file concurrently, conflict resolution logic may
 those changes.

 A software library that supports this process is called a *sync engine*. Although the idea has
-existed for a long time, the term has recently gained attention
-[[35](/ch06.html#Saafan2024),
-[36](/ch06.html#Hagoel2024),
-[37](/ch06.html#Jayakar2024)].
+existed for a long time, the term has recently gained attention [^35] [^36] [^37].
 An application that allows a user to continue editing a file while offline (which may be implemented
-using a sync engine) is called *offline-first*
-[^38].
+using a sync engine) is called *offline-first* [^38].
 The term *local-first software* refers to collaborative apps that are not only offline-first, but
 are also designed to continue working even if the developer who made the software shuts down all of
 their online services [^39].
 This can be achieved by using a sync engine with an open standard sync protocol for which multiple
-service providers are available
-[^40].
+service providers are available [^40].
 For example, Git is a local-first collaboration system (albeit one that doesn’t support real-time
 collaboration) since you can sync via GitHub, GitLab, or any other repository hosting service.

@ -1243,20 +1231,16 @@ writes in the same order.

 Some data storage systems take a different approach, abandoning the concept of a leader and
 allowing any replica to directly accept writes from clients. Some of the earliest replicated data
-systems were leaderless [[1](/ch06.html#Lindsay1979_ch6),
-[50](/ch06.html#Gifford1979)], but the
-idea was mostly forgotten during the era of dominance of relational databases. It once again became
+systems were leaderless [^1] [^50], but the idea was mostly forgotten during the era of dominance of relational databases. It once again became
 a fashionable architecture for databases after Amazon used it for its in-house *Dynamo* system in
 2007 [^45].
 Riak, Cassandra, and ScyllaDB are open source datastores with leaderless replication models inspired
 by Dynamo, so this kind of database is also known as *Dynamo-style*.

 > [!NOTE]
-> The original *Dynamo* system was only described in a paper
-> [^45], but never released outside of
-> Amazon. The similarly-named *DynamoDB* is a more recent cloud database from AWS, but it has a
-> completely different architecture: it uses single-leader replication based on the Multi-Paxos
-> consensus algorithm [^5].
+> The original *Dynamo* system was only described in a paper [^45], but never released outside of Amazon.
+> The similarly named *DynamoDB* is a more recent cloud database from AWS, but it has a completely different architecture:
+> it uses single-leader replication based on the Multi-Paxos consensus algorithm [^5].

 In some leaderless implementations, the client directly sends its writes to several replicas, while
 in others, a coordinator node does this on behalf of the client. However, unlike a leader database,
@ -1707,15 +1691,9 @@ replica increments its own version number when processing a write, and also keep
 version numbers it has seen from each of the other replicas. This information indicates which values
 to overwrite and which values to keep as siblings.

-The collection of version numbers from all the replicas is called a *version vector*
-[^58].
-A few variants of this idea are in use, but the most interesting is probably the *dotted version
-vector*
-[[59](/ch06.html#Preguica2010),
-[60](/ch06.html#Manepalli2022)],
-which is used in Riak 2.0
-[[61](/ch06.html#Cribbs2014),
-[62](/ch06.html#Brown2015)].
+The collection of version numbers from all the replicas is called a *version vector* [^58].
+A few variants of this idea are in use, but the most interesting is probably the *dotted version vector* [^59] [^60],
+which is used in Riak 2.0 [^61] [^62].
 We won’t go into the details, but the way it works is quite similar to what we saw in our cart example.

 Like the version numbers in [Figure 6-15](/ch06.html#fig_replication_causality_single), version vectors are sent from the
@ -1731,10 +1709,7 @@ siblings are merged correctly.
 # Version vectors and vector clocks

 A *version vector* is sometimes also called a *vector clock*, even though they are not quite the
-same. The difference is subtle—please see the references for details
-[[60](/ch06.html#Manepalli2022),
-[63](/ch06.html#Baquero2011),
-[64](/ch06.html#Schwarz1994)]. In brief, when
+same. The difference is subtle—please see the references for details [^60] [^63] [^64]. In brief, when
 comparing the state of replicas, version vectors are the right data structure to use.

 ## Summary
@ -1760,8 +1735,7 @@ Despite being a simple goal—keeping a copy of the same data on several machine
 to be a remarkably tricky problem. It requires carefully thinking about concurrency and about all
 the things that can go wrong, and dealing with the consequences of those faults. At a minimum, we
 need to deal with unavailable nodes and network interruptions (and that’s not even considering the
-more insidious kinds of fault, such as silent data corruption due to software bugs or hardware
-errors).
+more insidious kinds of fault, such as silent data corruption due to software bugs or hardware errors).

 We discussed three main approaches to replication:

@ -1817,7 +1791,7 @@ machine to store only a subset of the data.



-### Summary
+### References


 [^1]: B. G. Lindsay, P. G. Selinger, C. Galtieri, J. N. Gray, R. A. Lorie, T. G. Price, F. Putzolu, I. L. Traiger, and B. W. Wade. [Notes on Distributed Databases](https://dominoweb.draco.res.ibm.com/reports/RJ2571.pdf). IBM Research, Research Report RJ2571(33471), July 1979. Archived at [perma.cc/EPZ3-MHDD](https://perma.cc/EPZ3-MHDD)
--- a/content/en/ch7.md
+++ b/content/en/ch7.md
@ -51,7 +51,7 @@ Some databases treat partitions and shards as two distinct concepts. For example
 partitioning is a way of splitting a large table into several files that are stored on the same
 machine (which has several advantages, such as making it very fast to delete an entire partition),
 whereas sharding splits a dataset across multiple machines
-[[^1], [^2]].
+[^1] [^2].
 In many other systems, partitioning is just another word for sharding.

 While *partitioning* is quite descriptive, the term *sharding* is perhaps surprising. According to
@ -408,7 +408,7 @@ to the number of nodes (3 ranges per node in [Figure 7-6](/en/ch7#fig_sharding_
 per node in Cassandra by default, and 256 per node in ScyllaDB), with random boundaries between
 those ranges. This means some ranges are bigger than others, but by having multiple ranges per node
 those imbalances tend to even out
-[[^15], [^18]].
+[^15] [^18].

 ![ddia 0706](/fig/ddia_0706.png)

@ -459,7 +459,7 @@ This event can result in a large volume of reads and writes to the same key (whe
 is perhaps the user ID of the celebrity, or the ID of the action that people are commenting on).

 In such situations, a more flexible sharding policy is required
-[[^25], [^26]].
+[^25] [^26].
 A system that defines shards based on ranges of keys (or ranges of hashes) makes it possible to put
 an individual hot key in a shard by its own, and perhaps even assigning it a dedicated machine [^27].

@ -502,7 +502,7 @@ Fully automated rebalancing can be convenient, because there is less operational
 normal maintenance, and such systems can even auto-scale to adapt to changes in workload. Cloud
 databases such as DynamoDB are promoted as being able to automatically add and remove shards to
 adapt to big increases or decreases of load within a matter of minutes
-[[^17], [^29]].
+[^17] [^29].

 However, automatic shard management can also be unpredictable. Rebalancing is an expensive
 operation, because it requires rerouting requests and moving a large amount of data from one node to
@ -779,7 +779,7 @@ that question in the following chapters.



-### Summary
+### References


 [^1]: Claire Giordano. [Understanding partitioning and sharding in Postgres and Citus](https://www.citusdata.com/blog/2023/08/04/understanding-partitioning-and-sharding-in-postgres-and-citus/). *citusdata.com*, August 2023. Archived at [perma.cc/8BTK-8959](https://perma.cc/8BTK-8959) 
--- a/content/en/ch8.md
+++ b/content/en/ch8.md
@ -67,7 +67,7 @@ the challenge of achieving atomicity in a distributed transaction.

 Almost all relational databases today, and some nonrelational databases, support transactions. Most
 of them follow the style that was introduced in 1975 by IBM System R, the first SQL database
-[[^2], [^3], [^4]].
+[^2] [^3] [^4].
 Although some implementation details have changed, the general idea has remained virtually the same
 for 50 years: the transaction support in MySQL, PostgreSQL, Oracle, SQL Server, etc., is uncannily
 similar to that of System R.
@ -214,7 +214,7 @@ However, serializability has a performance cost. In practice, many databases use
 that are weaker than serializability: that is, they allow concurrent transactions to interfere with
 each other in limited ways. Some popular databases, such as Oracle, don’t even implement it (Oracle
 has an isolation level called “serializable,” but it actually implements *snapshot isolation*, which
-is a weaker guarantee than serializability [[^10], [^14]]).
+is a weaker guarantee than serializability [^10] [^14]).
 This means that some kinds of race conditions can still occur. We will explore snapshot isolation
 and other forms of isolation in [“Weak Isolation Levels”](/en/ch8#sec_transactions_isolation_levels).

@ -254,37 +254,22 @@ The truth is, nothing is perfect:
 * In an asynchronously replicated system, recent writes may be lost when the leader becomes
 unavailable (see [“Handling Node Outages”](/en/ch6#sec_replication_failover)).
 * When the power is suddenly cut, SSDs in particular have been shown to sometimes violate the
- guarantees they are supposed to provide: even `fsync` isn’t guaranteed to work correctly
- [^15].
- Disk firmware can have bugs, just like any other kind of software
- [[^16],
- [^17]],
- e.g. causing drives to fail after exactly 32,768 hours of operation
- [^18].
- And `fsync` is hard to use; even PostgreSQL used it incorrectly for over 20 years
- [[^19],
- [^20],
- [^21]].
+ guarantees they are supposed to provide: even `fsync` isn’t guaranteed to work correctly [^15].
+ Disk firmware can have bugs, just like any other kind of software [^16] [^17],
+ e.g. causing drives to fail after exactly 32,768 hours of operation [^18].
+ And `fsync` is hard to use; even PostgreSQL used it incorrectly for over 20 years [^19] [^20] [^21].
 * Subtle interactions between the storage engine and the filesystem implementation can lead to bugs
- that are hard to track down, and may cause files on disk to be corrupted after a crash
- [[^22],
- [^23]].
- Filesystem errors on one replica can sometimes spread to other replicas as well
- [^24].
-* Data on disk can gradually become corrupted without this being detected
- [^25].
+ that are hard to track down, and may cause files on disk to be corrupted after a crash [^22] [^23].
+ Filesystem errors on one replica can sometimes spread to other replicas as well [^24].
+* Data on disk can gradually become corrupted without this being detected [^25].
 If data has been corrupted for some time, replicas and recent backups may also be corrupted. In
 this case, you will need to try to restore the data from a historical backup.
 * One study of SSDs found that between 30% and 80% of drives develop at least one bad block during
- the first four years of operation, and only some of these can be corrected by the firmware
- [^26].
- Magnetic hard drives have a lower rate of bad sectors, but a higher rate of complete failure than
- SSDs.
+ the first four years of operation, and only some of these can be corrected by the firmware [^26].
+ Magnetic hard drives have a lower rate of bad sectors, but a higher rate of complete failure than SSDs.
 * When a worn-out SSD (that has gone through many write/erase cycles) is disconnected from power,
- it can start losing data within a timescale of weeks to months, depending on the temperature
- [^27].
- This is less of a problem for drives with lower wear levels
- [^28].
+ it can start losing data within a timescale of weeks to months, depending on the temperature [^27].
+ This is less of a problem for drives with lower wear levels [^28].

 In practice, there is no one technique that can provide absolute guarantees. There are only various
 risk-reduction techniques, including writing to disk, replicating to remote machines, and
@ -489,7 +474,7 @@ nevertheless used in practice [^29].

 Concurrency bugs caused by weak transaction isolation are not just a theoretical problem. They have
 caused substantial loss of money
-[[^30], [^31], [^32]],
+[^30] [^31] [^32],
 led to investigation by financial auditors [^33],
 and caused customer data to be corrupted [^34].
 A popular comment on revelations of such problems is “Use an ACID database if you’re handling
@ -515,7 +500,7 @@ decide what level is appropriate to your application. Once we’ve done that, we
 serializability in detail (see [“Serializability”](/en/ch8#sec_transactions_serializability)). Our discussion of isolation
 levels will be informal, using examples. If you want rigorous definitions and analyses of their
 properties, you can find them in the academic literature
-[[^36], [^37], [^38], [^39]].
+[^36] [^37] [^38] [^39].

 ## Read Committed

@ -690,7 +675,7 @@ database, frozen at a particular point in time, it is much easier to understand.

 Snapshot isolation is a popular feature: variants of it are supported by PostgreSQL, MySQL with the
 InnoDB storage engine, Oracle, SQL Server, and others, although the detailed behavior varies from
-one system to the next [[^29], [^40], [^41]].
+one system to the next [^29] [^40] [^41].
 Some databases, such as Oracle, TiDB, and Aurora DSQL, even choose snapshot isolation as their
 highest isolation level.

@ -713,7 +698,7 @@ maintains several versions of a row side by side, this technique is known as *mu
 concurrency control* (MVCC).

 [Figure 8-7](/en/ch8#fig_transactions_mvcc) illustrates how MVCC-based snapshot isolation is implemented in PostgreSQL
-[[^40], [^42], [^43]] (other implementations are similar).
+[^40] [^42] [^43] (other implementations are similar).
 When a transaction is started, it is given a unique, always-increasing transaction ID (`txid`).
 Whenever a transaction writes anything to the database, the data it writes is tagged with the
 transaction ID of the writer. (To be precise, transaction IDs in PostgreSQL are 32-bit integers, so
@ -742,7 +727,7 @@ All of the versions of a row are stored within the same database heap (see
 [“Storing values within the index”](/en/ch4#sec_storage_index_heap)), regardless of whether the transactions that wrote them have committed
 or not. The versions of the same row form a linked list, going either from newest version to oldest
 version or the other way round, so that queries can internally iterate over all versions of a row
-[[^45], [^46]].
+[^45] [^46].

 ### Visibility rules for observing a consistent snapshot

@ -790,7 +775,7 @@ value matches what the query is looking for. When garbage collection removes old
 are no longer visible to any transaction, the corresponding index entries can also be removed.

 Many implementation details affect the performance of multi-version concurrency control
-[[^45], [^46]].
+[^45] [^46].
 For example, PostgreSQL has optimizations for avoiding index updates if different versions of the
 same row can fit on the same page [^40].
 Some other databases avoid storing full copies of modified rows, and only store differences between
@ -829,7 +814,7 @@ Unfortunately, the SQL standard’s definition of isolation levels is flawed—i
 imprecise, and not as implementation-independent as a standard should be [^36]. Even though several databases
 implement repeatable read, there are big differences in the guarantees they actually provide,
 despite being ostensibly standardized [^29]. There has been a formal definition of
-repeatable read in the research literature [[^37], [^38]], but most implementations don’t satisfy that
+repeatable read in the research literature [^37] [^38], but most implementations don’t satisfy that
 formal definition. And to top it off, IBM Db2 uses “repeatable read” to refer to serializability [^10].

 As a result, nobody really knows what repeatable read means.
@ -884,7 +869,7 @@ Another option is to simply force all atomic operations to be executed on a sing

 Unfortunately, object-relational mapping (ORM) frameworks make it easy to accidentally write code
 that performs unsafe read-modify-write cycles instead of using atomic operations provided by the
-database [[^49], [^50], [^51]].
+database [^49] [^50] [^51].
 This can be a source of subtle bugs that are difficult to find by testing.

 ### Explicit locking
@ -940,8 +925,8 @@ An advantage of this approach is that databases can perform this check efficient
 with snapshot isolation. Indeed, PostgreSQL’s repeatable read, Oracle’s serializable, and SQL
 Server’s snapshot isolation levels automatically detect when a lost update has occurred and abort
 the offending transaction. However, MySQL/InnoDB’s repeatable read does not detect lost updates
-[[^29], [^41]].
-Some authors [[^36], [^38]] argue that a database must prevent lost
+[^29] [^41].
+Some authors [^36] [^38] argue that a database must prevent lost
 updates in order to qualify as providing snapshot isolation, so MySQL does not provide snapshot
 isolation under this definition.

@ -1023,7 +1008,7 @@ To begin, imagine this example: you are writing an application for doctors to ma
 shifts at a hospital. The hospital usually tries to have several doctors on call at any one time,
 but it absolutely must have at least one doctor on call. Doctors can give up their shifts (e.g., if
 they are sick themselves), provided that at least one colleague remains on call in that shift
-[[^53], [^54]].
+[^53] [^54].

 Now imagine that Aaliyah and Bryce are the two on-call doctors for a particular shift. Both are
 feeling unwell, so they both decide to request leave. Unfortunately, they happen to click the button
@ -1184,7 +1169,7 @@ transaction, is called a *phantom* [^4].
 Snapshot isolation avoids phantoms in read-only queries, but in read-write transactions like the
 examples we discussed, phantoms can lead to particularly tricky cases of write skew. The SQL
 generated by ORMs is also prone to write skew
-[[^50], [^51]].
+[^50] [^51].

 ### Materializing conflicts

@ -1271,7 +1256,7 @@ Two developments caused this rethink:
 outside of the serial execution loop.

 The approach of executing transactions serially is implemented in VoltDB/H-Store, Redis, and Datomic,
-for example [[^58], [^59], [^60]].
+for example [^58] [^59] [^60].
 A system designed for single-threaded execution can sometimes perform better than a system that
 supports concurrency, because it can avoid the coordination overhead of locking. However, its
 throughput is limited to that of a single CPU core. In order to make the most of that single thread,
@ -1541,7 +1526,7 @@ becomes serializable.
 Unfortunately, predicate locks do not perform well: if there are many locks by active transactions,
 checking for matching locks becomes time-consuming. For that reason, most databases with 2PL
 actually implement *index-range locking* (also known as *next-key locking*), which is a simplified
-approximation of predicate locking [[^54], [^64]].
+approximation of predicate locking [^54] [^64].

 It’s safe to simplify a predicate by making it match a greater set of objects. For example, if you
 have a predicate lock for bookings of room 123 between noon and 1 p.m., you can approximate it by
@ -1585,7 +1570,7 @@ serializable isolation and good performance fundamentally at odds with each othe
 It seems not: an algorithm called *serializable snapshot isolation* (SSI) provides full
 serializability with only a small performance penalty compared to snapshot isolation. SSI is
 comparatively new: it was first described in 2008
-[[^53], [^65]].
+[^53] [^65].

 Today SSI and similar algorithms are used in single-node databases (the serializable isolation level
 in PostgreSQL [^54], SQL Server’s In-Memory
@ -1733,7 +1718,7 @@ tracking is faster, but may lead to more transactions being aborted than strictl
 In some cases, it’s okay for a transaction to read information that was overwritten by another
 transaction: depending on what else happened, it’s sometimes possible to prove that the result of
 the execution is nevertheless serializable. PostgreSQL uses this theory to reduce the number of
-unnecessary aborts [[^14], [^54]].
+unnecessary aborts [^14] [^54].

 Compared to two-phase locking, the big advantage of serializable snapshot isolation is that one
 transaction doesn’t need to block waiting for locks held by another transaction. Like under snapshot
@ -1824,12 +1809,12 @@ problem.

 Two-phase commit is an algorithm for achieving atomic transaction commit across multiple nodes. It
 is a classic algorithm in distributed databases
-[[^13], [^71], [^72]]. 2PC is used
+[^13] [^71] [^72]. 2PC is used
 internally in some databases and also made available to applications in the form of *XA transactions*
 [^73]
 (which are supported by the Java Transaction API, for example) or via WS-AtomicTransaction for SOAP
 web services
-[[^74], [^75]].
+[^74] [^75].

 The basic flow of 2PC is illustrated in [Figure 8-13](/en/ch8#fig_transactions_two_phase_commit). Instead of a single
 commit request, as with a single-node transaction, the commit/abort process in 2PC is split into two
@ -1958,7 +1943,7 @@ stuck waiting for the coordinator to recover. It is possible to make an atomic c
 is not so straightforward.

 As an alternative to 2PC, an algorithm called *three-phase commit* (3PC) has been proposed
-[[^13], [^77]].
+[^13] [^77].
 However, 3PC assumes a network with bounded delay and nodes with bounded response times; in most
 practical systems with unbounded network delay and process pauses (see [Chapter 9](/en/ch9#ch_distributed)), it
 cannot guarantee atomicity.
@ -1971,7 +1956,7 @@ consensus protocol. We will see how to do this in [Chapter 10](/en/ch10#ch_cons
 Distributed transactions and two-phase commit have a mixed reputation. On the one hand, they are
 seen as providing an important safety guarantee that would be hard to achieve otherwise; on the
 other hand, they are criticized for causing operational problems, killing performance, and promising
-more than they can deliver [[^78], [^79], [^80], [^81]].
+more than they can deliver [^78] [^79] [^80] [^81].
 Many cloud services choose not to implement distributed transactions due to the operational
 problems they engender [^82].

@ -2089,7 +2074,7 @@ transaction is resolved.

 In theory, if the coordinator crashes and is restarted, it should cleanly recover its state from the
 log and resolve any in-doubt transactions. However, in practice, *orphaned* in-doubt transactions do
-occur [[^83], [^84]]—that is,
+occur [^83] [^84]—that is,
 transactions for which the coordinator cannot decide the outcome for whatever reason (e.g., because
 the transaction log has been lost or corrupted due to a software bug). These transactions cannot be
 resolved automatically, so they sit forever in the database, holding locks and blocking other
@ -2326,7 +2311,7 @@ is used.



-### Summary
+### References



--- a/content/en/ch9.md
+++ b/content/en/ch9.md
@ -22,7 +22,7 @@ anything that *can* go wrong *will* go wrong.

 Moreover, working with distributed systems is fundamentally different from writing software on a
 single computer—and the main difference is that there are lots of new and exciting ways for things
-to go wrong [[^1], [^2]].
+to go wrong [^1] [^2].
 In this chapter, you will get a taste of the problems that arise in practice, and an understanding
 of the things you can and cannot rely on.

@ -197,7 +197,7 @@ even in controlled environments like a datacenter operated by one company [^8]:
 (though shark bites have become rarer due to better shielding of submarine cables [^14]).
 Humans are also at fault, be it due to accidental misconfiguration [^15], scavenging [^16], or sabotage [^17].
 * Across different cloud regions, round-trip times of up to several *minutes* have been observed at
- high percentiles [[^18], Table 3].
+ high percentiles ([^18], Table 3).
 Even within a single datacenter, packet delay of more than a minute can occur during a network
 topology reconfiguration, triggered by a problem during a software upgrade for a switch
 [^19].
@ -364,7 +364,7 @@ network links and switches, and even each machine’s network interface and CPUs
 virtual machines), are shared. Processing large amounts of data can use the entire capacity of
 network links (*saturate* them). As you have no control over or insight into other customers’ usage of the shared
 resources, network delays can be highly variable if someone near you (a *noisy neighbor*) is
-using a lot of resources [[^30], [^31]].
+using a lot of resources [^30] [^31].

 In such environments, you can only choose timeouts experimentally: measure the distribution of
 network round-trip times over an extended period, and over many machines, to determine the expected
@ -665,7 +665,7 @@ fixed. On the other hand, if its quartz clock is defective or its NTP client is
 things will seem to work fine, even though its clock gradually drifts further and further away from
 reality. If some piece of software is relying on an accurately synchronized clock, the result is
 more likely to be silent and subtle data loss than a dramatic crash
-[[^62], [^63]].
+[^62] [^63].

 Thus, if you use software that requires synchronized clocks, it is essential that you also carefully
 monitor the clock offsets between all the machines. Any node whose clock drifts too far from the
@ -715,8 +715,7 @@ serious problems:

 * Database writes can mysteriously disappear: a node with a lagging clock is unable to overwrite
 values previously written by a node with a fast clock until the clock skew between the nodes has
- elapsed [[^63],
- [^65]].
+ elapsed [^63] [^65].
 This scenario can cause arbitrary amounts of data to be silently dropped without any error being
 reported to the application.
 * LWW cannot distinguish between writes that occurred sequentially in quick succession (in
@ -812,7 +811,7 @@ the synchronization good enough, they would have the right properties: later tra
 higher timestamp. The problem, of course, is the uncertainty about clock accuracy.

 Spanner implements snapshot isolation across datacenters in this way
-[[^68], [^69]].
+[^68] [^69].
 It uses the clock’s confidence interval as reported by the TrueTime API, and is based on the
 following observation: if you have two confidence intervals, each consisting of an earliest and
 latest possible timestamp (*A* = [*Aearliest*, *Alatest*] and
@ -1011,11 +1010,11 @@ handle requests from clients while one node is collecting its garbage. If the ru
 application that a node soon requires a GC pause, the application can stop sending new requests to
 that node, wait for it to finish processing outstanding requests, and then perform the GC while no
 requests are in progress. This trick hides GC pauses from clients and reduces the high percentiles
-of the response time [[^80], [^81]].
+of the response time [^80] [^81].

 A variant of this idea is to use the garbage collector only for short-lived objects (which are fast
 to collect) and to restart processes periodically, before they accumulate enough long-lived objects
-to require a full GC of long-lived objects [[^79], [^82]].
+to require a full GC of long-lived objects [^79] [^82].
 One node can be restarted at a time, and traffic can be shifted away from the node before the
 planned restart, like in a rolling upgrade (see [Chapter 5](/en/ch5#ch_encoding)).

@ -1120,7 +1119,7 @@ could be lost or corrupted data, which is much more serious.

 For example, [Figure 9-4](/en/ch9#fig_distributed_lease_pause) shows a data corruption bug due to an incorrect
 implementation of locking. (The bug is not theoretical: HBase used to have this problem
-[[^85], [^86]].)
+[^85] [^86].)
 Say you want to ensure that a file in a storage service can only be
 accessed by one client at a time, because if multiple clients tried to write to it, the file would
 become corrupted. You try to implement this by requiring a client to obtain a lease from a lock
@ -1207,7 +1206,7 @@ services support such a check: Amazon S3 calls it *conditional writes*, Azure Bl

 If your clients need to write only to one storage service that supports such conditional writes, the
 lock service is somewhat redundant
-[[^91], [^92]],
+[^91] [^92],
 since the lease assignment could have been implemented directly based on that storage service [^93].
 However, once you have a fencing token you can also use it with multiple services or replicas, and
 ensure that the old leaseholder is fenced off on all of those services.
@ -1286,8 +1285,7 @@ with the network. This concern is relevant in certain specific circumstances. Fo
 by radiation, leading it to respond to other nodes in arbitrarily unpredictable ways. Since a
 system failure would be very expensive (e.g., an aircraft crashing and killing everyone on board,
 or a rocket colliding with the International Space Station), flight control systems must tolerate
- Byzantine faults [[^98],
- [^99]].
+ Byzantine faults [^98] [^99].
 * In a system with multiple participating parties, some participants may attempt to cheat or
 defraud others. In such circumstances, it is not safe for a node to simply trust another node’s
 messages, since they may be sent with malicious intent. For example, cryptocurrencies like
@ -1311,7 +1309,7 @@ escaping are so important: to prevent SQL injection and cross-site scripting, fo
 we typically don’t use Byzantine fault-tolerant protocols here, but simply make the server the
 authority on deciding what client behavior is and isn’t allowed. In peer-to-peer networks, where
 there is no such central authority, Byzantine fault tolerance is more relevant
-[[^103], [^104]].
+[^103] [^104].

 A bug in the software could be regarded as a Byzantine fault, but if you deploy the same software to
 all nodes, then a Byzantine fault-tolerant algorithm cannot save you. Most Byzantine fault-tolerant
@ -1336,9 +1334,7 @@ pragmatic steps toward better reliability. For example:

 * Network packets do sometimes get corrupted due to hardware issues or bugs in operating systems,
 drivers, routers, etc. Usually, corrupted packets are caught by the checksums built into TCP and
- UDP, but sometimes they evade detection [[^105],
- [^106],
- [^107]].
+ UDP, but sometimes they evade detection [^105] [^106] [^107].
 Simple measures are usually sufficient protection against such corruption, such as checksums in
 the application-level protocol. TLS-encrypted connections also offer protection against
 corruption.
@ -1542,7 +1538,7 @@ It is prudent to combine theoretical analysis with empirical testing to verify t
 behave as expected. Techniques such as property-based testing, fuzzing, and deterministic simulation
 testing (DST) use randomization to test a system in a wide range of situations. Companies such as
 Amazon Web Services have successfully used a combination of these techniques on many of their
-products [[^120], [^121]].
+products [^120] [^121].

 ### Model checking and specification languages

@ -1563,7 +1559,7 @@ longer executions would then not be found.
 Still, model checkers strike a nice balance between ease of use and the ability to find non-obvious
 bugs. CockroachDB, TiDB, Kafka, and many other distributed systems use model specifications to find
 and fix bugs
-[[^122], [^123], [^124]]. For example,
+[^122] [^123] [^124]. For example,
 using TLA+, researchers were able to demonstrate the potential for data loss in viewstamped
 replication (VR) caused by ambiguity in the prose description of the algorithm [^125].

@ -1601,7 +1597,7 @@ It’s common to adopt a fault injection framework like Jepsen to run fault inje
 simplify the process. Such frameworks come with integrations for various operating systems and many
 pre-built fault injectors [^129].
 Jepsen has been remarkably effective at finding critical bugs in many widely-used systems
-[[^130], [^131]].
+[^130] [^131].

 ### Deterministic simulation testing

@ -1750,7 +1746,7 @@ problems in distributed systems.



-### Summary
+### References

 [^1]: Mark Cavage. [There’s Just No Getting Around It: You’re Building a Distributed System](https://queue.acm.org/detail.cfm?id=2482856). *ACM Queue*, volume 11, issue 4, pages 80-89, April 2013. [doi:10.1145/2466486.2482856](https://doi.org/10.1145/2466486.2482856) 
 [^2]: Jay Kreps. [Getting Real About Distributed System Reliability](https://blog.empathybox.com/post/19574936361/getting-real-about-distributed-system-reliability). *blog.empathybox.com*, March 2012. Archived at [perma.cc/9B5Q-AEBW](https://perma.cc/9B5Q-AEBW)