mirror of https://github.com/Vonng/ddia.git
synced 2026-06-21 00:47:05 +08:00

fix reference link

parent 0c9db16820
commit d216e35c8e
11 changed files with 1494 additions and 4958 deletions
@@ -23,7 +23,7 @@ more complex, it is no longer sufficient to store everything in one system, but
necessary to combine multiple storage or processing systems that provide different capabilities.

We call an application *data-intensive* if data management is one of the primary challenges in
developing the application [[1](/en/ch1#Kouzes2009)].
developing the application [^1].
While in *compute-intensive* systems the challenge is parallelizing some very large computation, in
data-intensive applications we usually worry more about things like storing and processing large
data volumes, managing changes to data, ensuring consistency in the face of failures and
@@ -86,7 +86,7 @@ for web applications, the client-side code (which runs in a web browser) is call
and the server-side code that handles user requests is known as the *backend*. Mobile apps are
similar to frontends in that they provide user interfaces, which often communicate over the Internet
with a server-side backend. Frontends sometimes manage data locally on the user’s device
[[2](/en/ch1#Kleppmann2019_ch1)],
[^2],
but the greatest data infrastructure challenges often lie in the backend: a frontend only needs to
handle one user’s data, whereas the backend manages data on behalf of *all* of the users.

@@ -132,10 +132,10 @@ As we shall see in the next section, operational and analytical systems are ofte
good reasons. As these systems have matured, two new specialized roles have emerged: *data
engineers* and *analytics engineers*. Data engineers are the people who know how to integrate the
operational and the analytical systems, and who take responsibility for the organization’s data
infrastructure more widely [[3](/en/ch1#Reis2022)].
infrastructure more widely [^3].
Analytics engineers model and transform data to make it more useful for the business analysts and
data scientists in an organization
[[4](/en/ch1#Machado2023)].
[^4].

Many engineers specialize on either the operational or the analytical side. However, this book
covers both operational and analytical data systems, since both play an important role in the
@@ -176,7 +176,7 @@ answer analytic queries such as:
The reports that result from these types of queries are important for business intelligence, helping
the management decide what to do next. In order to differentiate this pattern of using databases
from transaction processing, it has been called *online analytic processing* (OLAP)
[[5](/en/ch1#Codd1993)].
[^5].
The difference between OLTP and analytics is not always clear-cut, but some typical characteristics
are listed in [Table 1-1](/en/ch1#tab_oltp_vs_olap).

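The distinction drawn in the hunk above, between transaction processing and analytic queries, can be illustrated with a small sketch. It is not part of the commit or the book text: the `sales` table, its columns, the sample rows, and the function names are all hypothetical. The only library used is Python's built-in `sqlite3` module.

```python
import sqlite3


def build_sales_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, store TEXT, product TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (id, store, product, amount) VALUES (?, ?, ?, ?)",
        [
            (1, "north", "apples", 3.0),
            (2, "north", "bananas", 1.5),
            (3, "south", "apples", 4.0),
            (4, "south", "apples", 2.5),
        ],
    )
    return conn


def oltp_lookup(conn: sqlite3.Connection, sale_id: int):
    """OLTP-style access: fetch a single record by its key."""
    return conn.execute(
        "SELECT store, product, amount FROM sales WHERE id = ?", (sale_id,)
    ).fetchone()


def olap_revenue_by_store(conn: sqlite3.Connection) -> dict:
    """OLAP-style access: scan many rows and aggregate them."""
    rows = conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()
    return dict(rows)
```

The point lookup touches one row through the primary key, whereas the aggregate has to read every row of the table. On a table this small the difference is invisible, but it is the access pattern, not the SQL dialect, that separates the two kinds of workload.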
@@ -211,7 +211,7 @@ There is also a type of systems that is designed for analytical workloads (queri
over many records) but that are embedded into user-facing products. This category is known as
*product analytics* or *real-time analytics*, and systems designed for this type of use include
Pinot, Druid, and ClickHouse
[[6](/en/ch1#Soman2023)].
[^6].

## Data Warehousing

@@ -242,7 +242,7 @@ systems, for several reasons:

A *data warehouse*, by contrast, is a separate database that analysts can query to their hearts’
content, without affecting OLTP operations
[[7](/en/ch1#Chaudhuri1997)].
[^7].
As we shall see in [Chapter 4](/en/ch4#ch_storage), data warehouses often store data in a way that is very different
from OLTP databases, in order to optimize for the types of queries that are common in analytics.

@@ -267,8 +267,7 @@ specialist data connector services such as Fivetran, Singer, or AirByte.

Some database systems offer *hybrid transactional/analytic processing* (HTAP), which aims to enable
OLTP and analytics in a single system without requiring ETL from one system into another
[[8](/en/ch1#Ozcan2017),
[9](/en/ch1#Prout2022_ch1)].
[^8] [^9].
However, many HTAP systems internally consist of an OLTP system coupled with a separate analytical
system, hidden behind a common interface—so the distinction between the two remains important for
understanding how these systems work.
@@ -283,13 +282,13 @@ data from several operational systems in a single query.
HTAP therefore does not replace data warehouses. Rather, it is useful in scenarios where the same
application needs to both perform analytics queries that scan a large number of rows, and also
read and update individual records with low latency. Fraud detection can involve such workloads, for
example [[10](/en/ch1#Zhang2024)].
example [^10].

The separation between operational and analytical systems is part of a wider trend: as workloads
have become more demanding, systems have become more specialized and optimized for particular
workloads. General-purpose systems can handle small data volumes comfortably, but the greater the
scale, the more specialized systems tend to become
[[11](/en/ch1#Stonebraker2005fitsall)].
[^11].

### From data warehouse to data lake

@@ -308,14 +307,11 @@ needs of data scientists, who might need to perform tasks such as:
they mention). Similarly, they might need to extract structured information from photos using
computer vision techniques.

Although there have been efforts to add machine learning operators to a SQL data model
[[12](/en/ch1#Cohen2009)]
and to build efficient machine learning systems on top of a relational foundation
[[13](/en/ch1#Olteanu2020)],
Although there have been efforts to add machine learning operators to a SQL data model [^12]
and to build efficient machine learning systems on top of a relational foundation [^13],
many data scientists prefer not to work in a relational database such as a data warehouse. Instead,
many prefer to use Python data analysis libraries such as pandas and scikit-learn, statistical
analysis languages such as R, and distributed analytics frameworks such as Spark
[[14](/en/ch1#Bornstein2020)].
analysis languages such as R, and distributed analytics frameworks such as Spark [^14].
We discuss these further in [“Dataframes, Matrices, and Arrays”](/en/ch3#sec_datamodels_dataframes).

Consequently, organizations face a need to make data available in a form that is suitable for use by
@@ -325,7 +321,7 @@ difference from a data warehouse is that a data lake simply contains files, with
particular file format or data model. Files in a data lake might be collections of database records,
encoded using a file format such as Avro or Parquet (see [Chapter 5](/en/ch5#ch_encoding)), but they can equally well
contain text, images, videos, sensor readings, sparse matrices, feature vectors, genome sequences,
or any other kind of data [[15](/en/ch1#Fowler2015)].
or any other kind of data [^15].
Besides being more flexible, this is also often cheaper than relational data storage, since the data
lake can use commoditized file storage such as object stores (see [“Cloud-Native System Architecture”](/en/ch1#sec_introduction_cloud_native)).

@@ -334,14 +330,13 @@ an intermediate stop on the path from the operational systems to the data wareho
contains data in a “raw” form produced by the operational systems, without the transformation into a
relational data warehouse schema. This approach has the advantage that each consumer of the data can
transform the raw data into a form that best suits their needs. It has been dubbed the *sushi
principle*: “raw data is better” [[16](/en/ch1#Johnson2015)].
principle*: “raw data is better” [^16].

Besides loading data from a data lake into a separate data warehouse, it is also possible to run
typical data warehousing workloads (SQL queries and business analytics) directly on the files in the
data lake, alongside data science/machine learning workloads. This architecture is known as a *data
lakehouse*, and it requires a query execution engine and a metadata (e.g., schema management) layer
that extend the data lake’s file storage
[[17](/en/ch1#Armbrust2021)].
that extend the data lake’s file storage [^17].

Apache Hive, Spark SQL, Presto, and Trino are examples of this approach.

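The "raw data is better" idea in the hunk above can be sketched roughly as follows: the raw records are kept untouched, and each consumer derives the shape it needs. This is an illustration only and not part of the commit. The event fields (`user`, `action`, `ms`) and all function names are hypothetical, and a real data lake would hold files in an object store rather than a Python list.

```python
import json
from collections import Counter

# Hypothetical raw records, as an operational system might emit them
# (newline-delimited JSON, one event per line).
RAW_EVENTS = [
    '{"user": "alice", "action": "view", "ms": 120}',
    '{"user": "bob", "action": "purchase", "ms": 340}',
    '{"user": "alice", "action": "purchase", "ms": 200}',
]


def load_raw(lines):
    """Parse the raw records without imposing any further schema."""
    return [json.loads(line) for line in lines]


def events_per_user(events):
    """One consumer's view: a per-user count, as a business report might need."""
    return dict(Counter(event["user"] for event in events))


def purchase_latencies_ms(events):
    """Another consumer's view: latencies of purchase events only."""
    return [event["ms"] for event in events if event["action"] == "purchase"]
```

Both views are computed from the same untransformed input, so neither consumer is constrained by a schema that was designed for the other.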
@@ -349,7 +344,7 @@ Apache Hive, Spark SQL, Presto, and Trino are examples of this approach.

As analytics practices have matured, organizations have been increasingly paying attention to the
management and operations of analytics systems and data pipelines, as captured for example in the
DataOps manifesto [[18](/en/ch1#DataOps)].
DataOps manifesto [^18].
Part of this are issues of governance, privacy, and compliance with regulation such as GDPR and
CCPA, which we discuss in [“Data Systems, Law, and Society”](/en/ch1#sec_introduction_compliance) and [Link to Come].

@@ -361,11 +356,9 @@ application and how time-sensitive it is, a stream processing approach can be va
to identify and block potentially fraudulent or abusive activity.

In some cases the outputs of analytics systems are made available to operational systems (a process
sometimes known as *reverse ETL* [[19](/en/ch1#Manohar2021)]). For example, a
machine-learning model that was trained on data in an analytics system may be deployed to
sometimes known as *reverse ETL* [^19]). For example, a machine-learning model that was trained on data in an analytics system may be deployed to
production, so that it can generate recommendations for end-users, such as “people who bought X also
bought Y”. Such deployed outputs of analytics systems are also known as *data products*
[[20](/en/ch1#ORegan2018)].
bought Y”. Such deployed outputs of analytics systems are also known as *data products* [^20].
Machine learning models can be deployed to operational systems using specialized tools such as
TFX, Kubeflow, or MLflow.

@@ -425,7 +418,7 @@ in-house, or should it be outsourced? Should you build or should you buy?
Ultimately, this is a question about business priorities. The received management wisdom is that
things that are a core competency or a competitive advantage of your organization should be done
in-house, whereas things that are non-core, routine, or commonplace should be left to a vendor
[[21](/en/ch1#Fournier2021)].
[^21].
To give an extreme example, most companies do not generate their own electricity (unless they are an
energy company, and leaving aside emergency backup power), since it is cheaper to buy electricity
from the grid.
@@ -464,8 +457,7 @@ Whether a cloud service is actually cheaper and easier than self-hosting depends
skills and the workload on your systems. If you already have experience setting up and operating the
systems you need, and if your load is quite predictable (i.e., the number of machines you need does
not fluctuate wildly), then it’s often cheaper to buy your own machines and run the software on them
yourself [[22](/en/ch1#HeinemeierHansson2022),
[23](/en/ch1#Badizadegan2022)].
yourself [^22] [^23].

On the other hand, if you need a system that you don’t already know how to deploy and operate, then
adopting a cloud service is often easier and quicker than learning to manage the system yourself. If
@@ -508,7 +500,7 @@ The biggest downside of a cloud service is that you have no control over it:
* Moreover, if the service shuts down or becomes unacceptably expensive, or if the vendor decides to
change their product in a way you don’t like, you are at their mercy—continuing to run an old
version of the software is usually not an option, so you will be forced to migrate to an
alternative service [[24](/en/ch1#Yegge2020)].
alternative service [^24].
This risk is mitigated if there are alternative services that expose a compatible API, but for
many cloud services there are no standard APIs, which raises the cost of switching, making vendor
lock-in a problem.
@@ -535,17 +527,15 @@ and indeed such managed services are now available for many popular data systems
that have been designed from the ground up to be cloud-native have been shown to have several
advantages: better performance on the same hardware, faster recovery from failures, being able to
quickly scale computing resources to match the load, and supporting larger datasets
[[25](/en/ch1#Verbitski2017),
[26](/en/ch1#Antonopoulos2019_ch1),
[27](/en/ch1#Vuppalapati2020)].
[^25] [^26] [^27].
[Table 1-2](/en/ch1#tab_cloud_native_dbs) lists some examples of both types of systems.

Table 1-2. Examples of self-hosted and cloud-native database systems

| Category | Self-hosted systems | Cloud-native systems |
| --- | --- | --- |
| Operational/OLTP | MySQL, PostgreSQL, MongoDB | AWS Aurora [[25](/en/ch1#Verbitski2017)], Azure SQL DB Hyperscale [[26](/en/ch1#Antonopoulos2019_ch1)], Google Cloud Spanner |
| Analytical/OLAP | Teradata, ClickHouse, Spark | Snowflake [[27](/en/ch1#Vuppalapati2020)], Google BigQuery, Azure Synapse Analytics |
| Category | Self-hosted systems | Cloud-native systems |
|------------------|-----------------------------|-----------------------------------------------------------------------|
| Operational/OLTP | MySQL, PostgreSQL, MongoDB | AWS Aurora [^25], Azure SQL DB Hyperscale [^26], Google Cloud Spanner |
| Analytical/OLAP | Teradata, ClickHouse, Spark | Snowflake [^27], Google BigQuery, Azure Synapse Analytics |

### Layering of cloud services

@@ -574,7 +564,7 @@ higher-level services. For example:
lost.
* Many other services are in turn built upon object storage and other cloud services: for example,
Snowflake is a cloud-based analytic database (data warehouse) that relies on S3 for data storage
[[27](/en/ch1#Vuppalapati2020)], and some other services in turn
[^27], and some other services in turn
build upon Snowflake.

As always with abstractions in computing, there is no one right answer to what you should use. As a
@@ -605,9 +595,9 @@ cloud service provided by a separate set of machines, which emulates the behavio
*block device*, where each block is typically 4 KiB in size). This technology makes it
possible to run traditional disk-based software in the cloud, but the block device emulation
introduces overheads that can be avoided in systems that are designed from the ground up for the
cloud [[25](/en/ch1#Verbitski2017)]. It also makes the application
cloud [^25]. It also makes the application
very sensitive to network glitches, since every I/O on the virtual block device is actually a
network call [[28](/en/ch1#NickVanWiggeren2025)].
network call [^28].

To address this problem, cloud-native services generally avoid using virtual disks, and instead
build on dedicated storage services that are optimized for particular workloads. Object storage
@@ -615,28 +605,23 @@ services such as S3 are designed for long-term storage of fairly large files, ra
of kilobytes to several gigabytes in size. The individual rows or values stored in a database are
typically much smaller than this; cloud databases therefore typically manage smaller values in a
separate service, and store larger data blocks (containing many individual values) in an object
store [[26](/en/ch1#Antonopoulos2019_ch1),
[29](/en/ch1#Breck2024)].
store [^26] [^29].
We will see ways of doing this in [Chapter 4](/en/ch4#ch_storage).

In a traditional systems architecture, the same computer is responsible for both storage (disk) and
computation (CPU and RAM), but in cloud-native systems, these two responsibilities have become
somewhat separated or *disaggregated* [[9](/en/ch1#Prout2022_ch1),
[27](/en/ch1#Vuppalapati2020),
[30](/en/ch1#Shapira2023separation),
[31](/en/ch1#Murthy2022)]:
somewhat separated or *disaggregated* [^9] [^27] [^30] [^31]:
for example, S3 only stores files, and if you want to analyze that data, you will have to run the
analysis code somewhere outside of S3. This implies transferring the data over the network, which we
will discuss further in [“Distributed versus Single-Node Systems”](/en/ch1#sec_introduction_distributed).

Moreover, cloud-native systems are often *multitenant*, which means that rather than having a
separate machine for each customer, data and computation from several different customers are
handled on the same shared hardware by the same service
[[32](/en/ch1#Vanlightly2023serverless)].
handled on the same shared hardware by the same service [^32].

Multitenancy can enable better hardware utilization, easier scalability, and easier management by
the cloud provider, but it also requires careful engineering to ensure that one customer’s activity
does not affect the performance or security of the system for other customers
[[33](/en/ch1#Jonas2019)].
does not affect the performance or security of the system for other customers [^33].

## Operations in the Cloud Era

@@ -645,7 +630,7 @@ Traditionally, the people managing an organization’s server-side data infrastr
organizations have tried to integrate the roles of software development and operations into teams
with a shared responsibility for both backend services and data infrastructure; the *DevOps*
philosophy has guided this trend. *Site Reliability Engineers* (SREs) are Google’s implementation of
this idea [[34](/en/ch1#Beyer2016)].
this idea [^34].

The role of operations is to ensure services are reliably delivered to users (including configuring
infrastructure and deploying applications), and to ensure a stable production environment (including
@@ -669,31 +654,28 @@ processes and tools have evolved. The DevOps/SRE philosophy places greater empha
* preferring ephemeral virtual machines and services over long running servers,
* enabling frequent application updates,
* learning from incidents, and
* preserving the organization’s knowledge about the system, even as individual people come and go
[[35](/en/ch1#Limoncelli2020)].
* preserving the organization’s knowledge about the system, even as individual people come and go [^35].

With the rise of cloud services, there has been a bifurcation of roles: operations teams at
infrastructure companies specialize in the details of providing a reliable service to a large number
of customers, while the customers of the service spend as little time and effort as possible on
infrastructure [[36](/en/ch1#Majors2020)].
infrastructure [^36].

Customers of cloud services still require operations, but they focus on different aspects, such as
choosing the most appropriate service for a given task, integrating different services with each
other, and migrating from one service to another. Even though metered billing removes the need for
capacity planning in the traditional sense, it’s still important to know what resources you are
using for which purpose, so that you don’t waste money on cloud resources that are not needed:
capacity planning becomes financial planning, and performance optimization becomes cost optimization
[[37](/en/ch1#Cherkasky2021)].
capacity planning becomes financial planning, and performance optimization becomes cost optimization [^37].

Moreover, cloud services do have resource limits or *quotas* (such as the maximum number of
processes you can run concurrently), which you need to know about and plan for before you run into
them [[38](/en/ch1#Kushchi2023)].
processes you can run concurrently), which you need to know about and plan for before you run into them [^38].

Adopting a cloud service can be easier and quicker than running your own infrastructure, although
even here there is a cost in learning how to use it, and perhaps working around its limitations.
Integration between different services becomes a particular challenge as a growing number of vendors
offers an ever broader range of cloud services targeting different use cases
[[39](/en/ch1#Bernhardsson2021),
[40](/en/ch1#Stancil2021)].
offers an ever broader range of cloud services targeting different use cases [^39][^40].

ETL (see [“Data Warehousing”](/en/ch1#sec_introduction_dwh)) is only part of the story; operational cloud services also need
to be integrated with each other. At present, there is a lack of standards that would facilitate
this sort of integration, so it often involves significant manual effort.
@@ -751,8 +733,7 @@ Using specialized hardware

Legal compliance
: Some countries have data residency laws that require data about people in their jurisdiction to be
stored and processed geographically within that country
[[41](/en/ch1#Korolov2022)].
stored and processed geographically within that country [^41].
The scope of these rules varies—for example, in some cases it applies only to medical or financial
data, while other cases are broader. A service with users in several such jurisdictions will
therefore have to distribute their data across servers in several locations.
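One simple way to picture the data-residency constraint described in the hunk above is a lookup from a user's jurisdiction to the storage location that must hold their data. The sketch below is only an illustration and not part of the commit: the jurisdiction codes `AA` and `BB`, the region names, and the function name are placeholders, not real legal requirements or real cloud regions.

```python
# Hypothetical mapping from jurisdiction code to the region that must store
# that jurisdiction's user data. "AA" and "BB" are placeholders only.
RESIDENCY_REGIONS = {
    "AA": "region-aa-1",
    "BB": "region-bb-1",
}

# Region used when no residency rule applies (also a placeholder).
DEFAULT_REGION = "region-default"


def storage_region(jurisdiction_code: str) -> str:
    """Return the region where a user's data should be stored."""
    return RESIDENCY_REGIONS.get(jurisdiction_code.upper(), DEFAULT_REGION)
```

In practice the rules may depend on the kind of data as well as the jurisdiction, as the text notes, so a real system would likely need a richer key than a single code.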
@@ -761,9 +742,7 @@ Sustainability
: If you have flexibility on where and when to run your jobs, you might be able to run them in a
time and place where plenty of renewable electricity is available, and avoid running them when the
power grid is under strain. This can reduce your carbon emissions and allow you to take advantage
of cheap power when it is available
[[42](/en/ch1#Borenstein2025),
[43](/en/ch1#Acun2023)].
of cheap power when it is available [^42][^43].

These reasons apply both to services that you write yourself (application code) and services
consisting of off-the-shelf software (such as databases).
@@ -777,39 +756,32 @@ case, we don’t know whether the service received the request, and simply retry
safe. We will discuss these problems in detail in [Chapter 9](/en/ch9#ch_distributed).

Although datacenter networks are fast, making a call to another service is still vastly slower than
calling a function in the same process
[[44](/en/ch1#Nath2019)].
calling a function in the same process [^44].

When operating on large volumes of data, rather than transferring the data from storage to a
separate machine that processes it, it can be faster to bring the computation to the machine that
already has the data
[[45](/en/ch1#Hellerstein2019)].
already has the data [^45].

More nodes are not always faster: in some cases, a simple single-threaded program on one computer
can perform significantly better than a cluster with over 100 CPU cores
[[46](/en/ch1#McSherry2015_ch1)].
can perform significantly better than a cluster with over 100 CPU cores [^46].

Troubleshooting a distributed system is often difficult: if the system is slow to respond, how do
you figure out where the problem lies? Techniques for diagnosing problems in distributed systems are
developed under the heading of *observability* [[47](/en/ch1#Sridharan2018),
[48](/en/ch1#Majors2019)],
developed under the heading of *observability* [^47] [^48],
which involves collecting data about the execution of a system, and allowing it to be queried in
ways that allows both high-level metrics and individual events to be analyzed. *Tracing* tools such
as OpenTelemetry, Zipkin, and Jaeger allow you to track which client called which server for which
operation, and how long each call took
[[49](/en/ch1#Sigelman2010)].
operation, and how long each call took [^49].

Databases provide various mechanisms for ensuring data consistency, as we shall see in
[Chapter 6](/en/ch6#ch_replication) and [Chapter 8](/en/ch8#ch_transactions). However, when each service has its own database,
maintaining consistency of data across those different services becomes the application’s problem.
Distributed transactions, which we explore in [Chapter 8](/en/ch8#ch_transactions), are a possible technique for
ensuring consistency, but they are rarely used in a microservices context because they run counter
to the goal of making services independent from each other, and many databases don’t support them
[[50](/en/ch1#Laigner2021)].
to the goal of making services independent from each other, and many databases don’t support them [^50].

For all these reasons, if you can do something on a single machine, this is often much simpler and
cheaper compared to setting up a distributed system
[[23](/en/ch1#Badizadegan2022),
[46](/en/ch1#McSherry2015_ch1),
[51](/en/ch1#Tigani2023)].
cheaper compared to setting up a distributed system [^23] [^46] [^51].
CPUs, memory, and disks have grown larger, faster, and more reliable. When combined with single-node
databases such as DuckDB, SQLite, and KùzuDB, many workloads can now run on a single node. We will
explore more on this topic in [Chapter 4](/en/ch4#ch_storage).
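The tracing idea mentioned in the hunk above (recording which client called which server for which operation, and how long each call took) can be sketched in a few lines. This is a toy illustration and not part of the commit, and it is not the API of OpenTelemetry, Zipkin, or Jaeger. The `Tracer` class, its `call` method, and `slowest_span` are hypothetical names, and spans are kept in memory rather than exported to a tracing backend.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Hypothetical in-memory tracer; real tools export spans to a backend."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def call(self, client, server, operation):
        """Record one client-to-server call and how long it took."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # The span is recorded even if the wrapped call raises.
            self.spans.append({
                "client": client,
                "server": server,
                "operation": operation,
                "duration_s": time.perf_counter() - start,
            })


def slowest_span(tracer):
    """Return the recorded call with the longest duration, or None."""
    return max(tracer.spans, key=lambda span: span["duration_s"], default=None)
```

A caller would wrap each outbound request, for example `with tracer.call("frontend", "orders", "get_order"): ...`, and later query the collected spans to see where time was spent.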
@@ -823,8 +795,7 @@ server (handling incoming requests) and a client (making outbound requests to ot

This way of building applications has traditionally been called a *service-oriented architecture*
(SOA); more recently the idea has been refined into a *microservices* architecture
[[52](/en/ch1#Newman2021_ch1),
[53](/en/ch1#Richardson2014)].
[^52] [^53].
In this architecture, a service has one well-defined purpose (for example, in the case of S3, this
would be file storage); each service exposes an API that can be called by clients via the network,
and each service has one team that is responsible for its maintenance. A complex application can
@@ -857,16 +828,14 @@ client and server APIs; we discuss these further in [Chapter 5](/en/ch5#ch_enco
Microservices are primarily a technical solution to a people problem: allowing different teams to
make progress independently without having to coordinate with each other. This is valuable in a large
company, but in a small company where there are not many teams, using microservices is likely to be
unnecessary overhead, and it is preferable to implement the application in the simplest way possible
[[52](/en/ch1#Newman2021_ch1)].
unnecessary overhead, and it is preferable to implement the application in the simplest way possible [^52].

*Serverless*, or *function-as-a-service* (FaaS), is another approach to deploying services, in which
the management of the infrastructure is outsourced to a cloud vendor
[[33](/en/ch1#Jonas2019)].
the management of the infrastructure is outsourced to a cloud vendor [^33].
When using virtual machines, you have to explicitly choose when to start up or shut down an
instance; in contrast, with the serverless model, the cloud provider automatically allocates and
frees hardware resources as needed, based on the incoming requests to your service
[[54](/en/ch1#Shahrad2020)]. Serverless deployment
[^54]. Serverless deployment
shifts more of the operational burden to cloud providers and enables flexible billing by usage
rather than machine instances. To offer such benefits, many serverless infrastructure providers
impose a time limit on function execution, limit runtime environments, and might suffer from slow
@@ -896,22 +865,20 @@ enterprise datacenter systems. Some of those differences are:
* A supercomputer typically runs large batch jobs that checkpoint the state of their computation to
disk from time to time. If a node fails, a common solution is to simply stop the entire cluster
workload, repair the faulty node, and then restart the computation from the last checkpoint
[[55](/en/ch1#Barroso2018),
[56](/en/ch1#Fiala2012)].
[^55] [^56].
With cloud services, it is usually not desirable to stop the entire cluster, since the services
need to continually serve users with minimal interruptions.
* Supercomputer nodes typically communicate through shared memory and remote direct memory access
(RDMA), which support high bandwidth and low latency, but assume a high level of trust among the
users of the system [[57](/en/ch1#KornfeldSimpson2020)].
users of the system [^57].
In cloud computing, the network and the machines are often shared by mutually untrusting
organizations, requiring stronger security mechanisms such as resource isolation (e.g., virtual
machines), encryption and authentication.
* Cloud datacenter networks are often based on IP and Ethernet, arranged in Clos topologies to
provide high bisection bandwidth—a commonly used measure of a network’s overall performance
[[55](/en/ch1#Barroso2018),
[58](/en/ch1#Singh2015)].
[^55] [^58].
Supercomputers often use specialized network topologies, such as multi-dimensional meshes and toruses
[[59](/en/ch1#Lockwood2014)],
[^59],
which yield better performance for HPC workloads with known communication patterns.
* Cloud computing allows nodes to be distributed across multiple geographic regions, whereas
supercomputers generally assume that all of their nodes are close together.
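The checkpoint-and-restart pattern described in the first bullet of the hunk above can be sketched as follows. This is a single-process toy and not part of the commit: the function name, the JSON checkpoint format, and the `fail_at` parameter that simulates a node failure are all assumptions made for illustration. For simplicity it checkpoints after every step, whereas a real batch job would checkpoint far less often.

```python
import json
import os


def run_with_checkpoints(total_steps, checkpoint_path, fail_at=None):
    """Sum the integers 0..total_steps-1, resuming from a checkpoint if present.

    `fail_at` (hypothetical) raises an error at that step to simulate a failure.
    """
    state = {"next_step": 0, "acc": 0}
    if os.path.exists(checkpoint_path):
        # Restart: pick up from the last state that was saved to disk.
        with open(checkpoint_path) as f:
            state = json.load(f)

    for step in range(state["next_step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += step
        state["next_step"] = step + 1
        # Write to a temporary file, then rename, so that a crash during the
        # write cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["acc"]
```

After a simulated failure, calling the function again with the same checkpoint path continues from the last completed step instead of starting over, which is the behavior the bullet describes for batch jobs.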
@@ -940,16 +907,14 @@ of the effects that computer systems have on people and society. Social media ha
individuals consume news, which influences their political opinions and hence may affect the outcome
of elections. Automated systems increasingly make decisions that have profound consequences for
individuals, such as deciding who should be given a loan or insurance coverage, who should be
invited to a job interview, or who should be suspected of a crime
[[60](/en/ch1#ONeil2016_ch1)].
invited to a job interview, or who should be suspected of a crime [^60].

Everyone who works on such systems shares a responsibility for considering the ethical impact and
ensuring that they comply with relevant law. It is not necessary for everybody to become an expert
in law and ethics, but a basic awareness of legal and ethical principles is just as important as,
say, some foundational knowledge in distributed systems.

Legal considerations are influencing the very foundations of how data systems are being designed
[[61](/en/ch1#Shastri2020)].
Legal considerations are influencing the very foundations of how data systems are being designed [^61].
For example, the GDPR grants individuals the right to have their data erased on request (sometimes
known as the *right to be forgotten*). However, as we shall see in this book, many data systems rely
on immutable constructs such as append-only logs as part of their design; how can we ensure deletion
@ -970,7 +935,7 @@ However, it is worth remembering that the costs of storage are not just the bill
|
|||
S3 or another service: the cost-benefit calculation should also take into account the risks of
|
||||
liability and reputational damage if the data were to be leaked or compromised by adversaries, and
|
||||
the risk of legal costs and fines if the storage and processing of the data is found not to be
|
||||
compliant with the law [[51](/en/ch1#Tigani2023)].
|
||||
compliant with the law [^51].
|
||||
|
||||
Governments or police forces might also compel companies to hand over data. When there is a risk
|
||||
that the data may reveal criminalized behaviors (for example, homosexuality in several Middle
|
||||
|
|
@ -982,12 +947,10 @@ indicate approximate location).
|
|||
Once all the risks are taken into account, it might be reasonable to decide that some data is simply
|
||||
not worth storing, and that it should therefore be deleted. This principle of *data minimization*
|
||||
(sometimes known by the German term *Datensparsamkeit*) runs counter to the “big data” philosophy of
|
||||
storing lots of data speculatively in case it turns out to be useful in the future
|
||||
[[62](/en/ch1#Datensparsamkeit)].
|
||||
storing lots of data speculatively in case it turns out to be useful in the future [^62].
|
||||
But it fits with the GDPR, which mandates that personal data may only be collected for a specified,
|
||||
explicit purpose, that this data may not later be used for any other purpose, and that the data must
|
||||
not be kept for longer than necessary for the purposes for which it was collected
|
||||
[[63](/en/ch1#GDPR)].
|
||||
not be kept for longer than necessary for the purposes for which it was collected [^63].
|
||||
|
||||
Businesses have also taken notice of privacy and safety concerns. Credit card companies require
|
||||
payment processing businesses to adhere to strict payment card industry (PCI) standards. Processors
|
||||
|
|
@ -1033,346 +996,71 @@ data is being processed—an aspect that many engineers are prone to ignoring. H
|
|||
requirements into technical implementations is not yet well understood, but it’s important to keep
|
||||
this question in mind as we move through the rest of this book.
|
||||
|
||||
##### Footnotes
|
||||
|
||||
##### References
|
||||
|
||||
[[1](/en/ch1#Kouzes2009-marker)] Richard T. Kouzes,
|
||||
Gordon A. Anderson, Stephen T. Elbert, Ian Gorton, and Deborah K. Gracio.
|
||||
[The
|
||||
Changing Paradigm of Data-Intensive Computing](http://www2.ic.uff.br/~boeres/slides_AP/papers/TheChanginParadigmDataIntensiveComputing_2009.pdf). *IEEE Computer*, volume 42, issue 1,
|
||||
January 2009. [doi:10.1109/MC.2009.26](https://doi.org/10.1109/MC.2009.26)
|
||||
|
||||
[[2](/en/ch1#Kleppmann2019_ch1-marker)] Martin Kleppmann, Adam Wiggins, Peter van
|
||||
Hardenberg, and Mark McGranaghan. [Local-first
|
||||
software: you own your data, in spite of the cloud](https://www.inkandswitch.com/local-first/). At *2019 ACM SIGPLAN International
|
||||
Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software* (Onward!),
|
||||
October 2019. [doi:10.1145/3359591.3359737](https://doi.org/10.1145/3359591.3359737)
|
||||
|
||||
[[3](/en/ch1#Reis2022-marker)] Joe Reis and Matt Housley.
|
||||
[*Fundamentals
|
||||
of Data Engineering*](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/). O’Reilly Media, 2022. ISBN: 9781098108304
|
||||
|
||||
[[4](/en/ch1#Machado2023-marker)] Rui Pedro Machado and Helder Russa.
|
||||
[*Analytics
|
||||
Engineering with SQL and dbt*](https://www.oreilly.com/library/view/analytics-engineering-with/9781098142377/). O’Reilly Media, 2023. ISBN: 9781098142384
|
||||
|
||||
[[5](/en/ch1#Codd1993-marker)] Edgar F. Codd, S. B. Codd, and C. T. Salley.
|
||||
[Providing
|
||||
OLAP to User-Analysts: An IT Mandate](https://www.estgv.ipv.pt/PaginasPessoais/jloureiro/ESI_AID2007_2008/fichas/codd.pdf). E. F. Codd Associates, 1993.
|
||||
Archived at [perma.cc/RKX8-2GEE](https://perma.cc/RKX8-2GEE)
|
||||
|
||||
[[6](/en/ch1#Soman2023-marker)] Chinmay Soman and Neha Pawar.
|
||||
[Comparing Three
|
||||
Real-Time OLAP Databases: Apache Pinot, Apache Druid, and ClickHouse](https://startree.ai/blog/a-tale-of-three-real-time-olap-databases). *startree.ai*,
|
||||
April 2023. Archived at [perma.cc/8BZP-VWPA](https://perma.cc/8BZP-VWPA)
|
||||
|
||||
[[7](/en/ch1#Chaudhuri1997-marker)] Surajit Chaudhuri and Umeshwar Dayal.
|
||||
[An Overview of Data
|
||||
Warehousing and OLAP Technology](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/sigrecord.pdf). *ACM SIGMOD Record*, volume 26, issue 1, pages 65–74,
|
||||
March 1997. [doi:10.1145/248603.248616](https://doi.org/10.1145/248603.248616)
|
||||
|
||||
[[8](/en/ch1#Ozcan2017-marker)] Fatma Özcan, Yuanyuan Tian, and Pinar Tözün.
|
||||
[Hybrid Transactional/Analytical
|
||||
Processing: A Survey](https://humming80.github.io/papers/sigmod-htaptut.pdf). At *ACM International Conference on Management of Data* (SIGMOD), May 2017.
|
||||
[doi:10.1145/3035918.3054784](https://doi.org/10.1145/3035918.3054784)
|
||||
|
||||
[[9](/en/ch1#Prout2022_ch1-marker)] Adam Prout, Szu-Po Wang, Joseph Victor, Zhou Sun, Yongzhu
|
||||
Li, Jack Chen, Evan Bergeron, Eric Hanson, Robert Walzer, Rodrigo Gomes, and Nikita Shamgunov.
|
||||
[Cloud-Native Transactions and Analytics
|
||||
in SingleStore](https://dl.acm.org/doi/abs/10.1145/3514221.3526055). At *International Conference on Management of Data* (SIGMOD), June 2022.
|
||||
[doi:10.1145/3514221.3526055](https://doi.org/10.1145/3514221.3526055)
|
||||
|
||||
[[10](/en/ch1#Zhang2024-marker)] Chao Zhang, Guoliang Li, Jintao Zhang,
|
||||
Xinning Zhang, and Jianhua Feng.
|
||||
[HTAP Databases: A Survey](https://arxiv.org/pdf/2404.15670).
|
||||
*IEEE Transactions on Knowledge and Data Engineering*, April 2024.
|
||||
[doi:10.1109/TKDE.2024.3389693](https://doi.org/10.1109/TKDE.2024.3389693)
|
||||
|
||||
[[11](/en/ch1#Stonebraker2005fitsall-marker)] Michael Stonebraker and Uğur Çetintemel.
|
||||
[‘One Size Fits All’: An
|
||||
Idea Whose Time Has Come and Gone](https://pages.cs.wisc.edu/~shivaram/cs744-readings/fits_all.pdf). At *21st International Conference on Data Engineering*
|
||||
(ICDE), April 2005. [doi:10.1109/ICDE.2005.1](https://doi.org/10.1109/ICDE.2005.1)
|
||||
|
||||
[[12](/en/ch1#Cohen2009-marker)] Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph M.
|
||||
Hellerstein, and Caleb Welton. [MAD Skills:
|
||||
New Analysis Practices for Big Data](https://www.vldb.org/pvldb/vol2/vldb09-219.pdf). *Proceedings of the VLDB Endowment*, volume 2,
|
||||
issue 2, pages 1481–1492, August 2009.
|
||||
[doi:10.14778/1687553.1687576](https://doi.org/10.14778/1687553.1687576)
|
||||
|
||||
[[13](/en/ch1#Olteanu2020-marker)] Dan Olteanu.
|
||||
[The Relational Data Borg is Learning](https://www.vldb.org/pvldb/vol13/p3502-olteanu.pdf).
|
||||
*Proceedings of the VLDB Endowment*, volume 13, issue 12, August 2020.
|
||||
[doi:10.14778/3415478.3415572](https://doi.org/10.14778/3415478.3415572)
|
||||
|
||||
[[14](/en/ch1#Bornstein2020-marker)] Matt Bornstein, Martin Casado, and Jennifer Li.
|
||||
[Emerging
|
||||
Architectures for Modern Data Infrastructure: 2020](https://future.a16z.com/emerging-architectures-for-modern-data-infrastructure-2020/). *future.a16z.com*, October 2020.
|
||||
Archived at [perma.cc/LF8W-KDCC](https://perma.cc/LF8W-KDCC)
|
||||
|
||||
[[15](/en/ch1#Fowler2015-marker)] Martin Fowler.
|
||||
[DataLake](https://www.martinfowler.com/bliki/DataLake.html).
|
||||
*martinfowler.com*, February 2015.
|
||||
Archived at [perma.cc/4WKN-CZUK](https://perma.cc/4WKN-CZUK)
|
||||
|
||||
[[16](/en/ch1#Johnson2015-marker)] Bobby Johnson and Joseph Adler.
|
||||
[The
|
||||
Sushi Principle: Raw Data Is Better](https://learning.oreilly.com/videos/strata-hadoop/9781491924143/9781491924143-video210840/). At *Strata+Hadoop World*, February 2015.
|
||||
|
||||
[[17](/en/ch1#Armbrust2021-marker)] Michael Armbrust, Ali Ghodsi, Reynold Xin, and Matei Zaharia.
|
||||
[Lakehouse: A New Generation of
|
||||
Open Platforms that Unify Data Warehousing and Advanced Analytics](https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf). At *11th Annual Conference
|
||||
on Innovative Data Systems Research* (CIDR), January 2021.
|
||||
|
||||
[[18](/en/ch1#DataOps-marker)] DataKitchen, Inc.
|
||||
[The DataOps Manifesto](https://dataopsmanifesto.org/en/). *dataopsmanifesto.org*, 2017.
|
||||
Archived at [perma.cc/3F5N-FUQ4](https://perma.cc/3F5N-FUQ4)
|
||||
|
||||
[[19](/en/ch1#Manohar2021-marker)] Tejas Manohar.
|
||||
[What is Reverse ETL: A Definition & Why It’s
|
||||
Taking Off](https://hightouch.io/blog/reverse-etl/). *hightouch.io*, November 2021.
|
||||
Archived at [perma.cc/A7TN-GLYJ](https://perma.cc/A7TN-GLYJ)
|
||||
|
||||
[[20](/en/ch1#ORegan2018-marker)] Simon O’Regan.
|
||||
[Designing Data
|
||||
Products](https://towardsdatascience.com/designing-data-products-b6b93edf3d23). *towardsdatascience.com*, August 2018.
|
||||
Archived at [perma.cc/HU67-3RV8](https://perma.cc/HU67-3RV8)
|
||||
|
||||
[[21](/en/ch1#Fournier2021-marker)] Camille Fournier.
|
||||
[Why is it so
|
||||
hard to decide to buy?](https://skamille.medium.com/why-is-it-so-hard-to-decide-to-buy-d86fee98e88e) *skamille.medium.com*, July 2021.
|
||||
Archived at [perma.cc/6VSG-HQ5X](https://perma.cc/6VSG-HQ5X)
|
||||
|
||||
[[22](/en/ch1#HeinemeierHansson2022-marker)] David Heinemeier Hansson.
|
||||
[Why we’re leaving the cloud](https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47e0).
|
||||
*world.hey.com*, October 2022.
|
||||
Archived at [perma.cc/82E6-UJ65](https://perma.cc/82E6-UJ65)
|
||||
|
||||
[[23](/en/ch1#Badizadegan2022-marker)] Nima Badizadegan.
|
||||
[Use One Big Server](https://specbranch.com/posts/one-big-server/).
|
||||
*specbranch.com*, August 2022.
|
||||
Archived at [perma.cc/M8NB-95UK](https://perma.cc/M8NB-95UK)
|
||||
|
||||
[[24](/en/ch1#Yegge2020-marker)] Steve Yegge.
|
||||
[Dear
|
||||
Google Cloud: Your Deprecation Policy is Killing You](https://steve-yegge.medium.com/dear-google-cloud-your-deprecation-policy-is-killing-you-ee7525dc05dc). *steve-yegge.medium.com*, August 2020.
|
||||
Archived at [perma.cc/KQP9-SPGU](https://perma.cc/KQP9-SPGU)
|
||||
|
||||
[[25](/en/ch1#Verbitski2017-marker)] Alexandre Verbitski, Anurag Gupta, Debanjan
|
||||
Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz
|
||||
Kharatishvili, and Xiaofeng Bao.
|
||||
[Amazon
|
||||
Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases](https://media.amazonwebservices.com/blog/2017/aurora-design-considerations-paper.pdf).
|
||||
At *ACM International Conference on Management of Data* (SIGMOD), pages 1041–1052, May 2017.
|
||||
[doi:10.1145/3035918.3056101](https://doi.org/10.1145/3035918.3056101)
|
||||
|
||||
[[26](/en/ch1#Antonopoulos2019_ch1-marker)] Panagiotis Antonopoulos, Alex Budovski, Cristian
|
||||
Diaconu, Alejandro Hernandez Saenz, Jack Hu, Hanuma Kodavalla, Donald Kossmann, Sandeep Lingam, Umar
|
||||
Farooq Minhas, Naveen Prakash, Vijendra Purohit, Hugh Qu, Chaitanya Sreenivas Ravella, Krystyna
|
||||
Reisteter, Sheetal Shrotri, Dixin Tang, and Vikram Wakade.
|
||||
[Socrates: The
|
||||
New SQL Server in the Cloud](https://www.microsoft.com/en-us/research/uploads/prod/2019/05/socrates.pdf). At *ACM International Conference on Management of Data*
|
||||
(SIGMOD), pages 1743–1756, June 2019.
|
||||
[doi:10.1145/3299869.3314047](https://doi.org/10.1145/3299869.3314047)
|
||||
|
||||
[[27](/en/ch1#Vuppalapati2020-marker)] Midhul Vuppalapati, Justin Miron, Rachit Agarwal,
|
||||
Dan Truong, Ashish Motivala, and Thierry Cruanes.
|
||||
[Building An Elastic Query
|
||||
Engine on Disaggregated Storage](https://www.usenix.org/system/files/nsdi20-paper-vuppalapati.pdf). At *17th USENIX Symposium on Networked Systems Design and
|
||||
Implementation* (NSDI), February 2020.
|
||||
|
||||
[[28](/en/ch1#NickVanWiggeren2025-marker)] Nick Van Wiggeren.
|
||||
[The Real Failure Rate of EBS](https://planetscale.com/blog/the-real-fail-rate-of-ebs).
|
||||
*planetscale.com*, March 2025.
|
||||
Archived at [perma.cc/43CR-SAH5](https://perma.cc/43CR-SAH5)
|
||||
|
||||
[[29](/en/ch1#Breck2024-marker)] Colin Breck.
|
||||
[Predicting the
|
||||
Future of Distributed Systems](https://blog.colinbreck.com/predicting-the-future-of-distributed-systems/). *blog.colinbreck.com*, August 2024.
|
||||
Archived at [perma.cc/K5FC-4XX2](https://perma.cc/K5FC-4XX2)
|
||||
|
||||
[[30](/en/ch1#Shapira2023separation-marker)] Gwen Shapira.
|
||||
[Compute-Storage Separation Explained](https://www.thenile.dev/blog/storage-compute).
|
||||
*thenile.dev*, January 2023. Archived at
|
||||
[perma.cc/QCV3-XJNZ](https://perma.cc/QCV3-XJNZ)
|
||||
|
||||
[[31](/en/ch1#Murthy2022-marker)] Ravi Murthy and Gurmeet Goindi.
|
||||
[AlloyDB
|
||||
for PostgreSQL under the hood: Intelligent, database-aware storage](https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-intelligent-scalable-storage). *cloud.google.com*,
|
||||
May 2022. Archived at
|
||||
[archive.org](https://web.archive.org/web/20220514021120/https%3A//cloud.google.com/blog/products/databases/alloydb-for-postgresql-intelligent-scalable-storage)
|
||||
|
||||
[[32](/en/ch1#Vanlightly2023serverless-marker)] Jack Vanlightly.
|
||||
[The
|
||||
Architecture of Serverless Data Systems](https://jack-vanlightly.com/blog/2023/11/14/the-architecture-of-serverless-data-systems). *jack-vanlightly.com*, November 2023.
|
||||
Archived at [perma.cc/UDV4-TNJ5](https://perma.cc/UDV4-TNJ5)
|
||||
|
||||
[[33](/en/ch1#Jonas2019-marker)] Eric Jonas, Johann Schleier-Smith, Vikram
|
||||
Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth,
|
||||
Neeraja Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, and David A. Patterson.
|
||||
[Cloud Programming Simplified: A Berkeley View on
|
||||
Serverless Computing](https://arxiv.org/abs/1902.03383). *arxiv.org*, February 2019.
|
||||
|
||||
[[34](/en/ch1#Beyer2016-marker)] Betsy Beyer, Jennifer Petoff, Chris
|
||||
Jones, and Niall Richard Murphy.
|
||||
[*Site
|
||||
Reliability Engineering: How Google Runs Production Systems*](https://www.oreilly.com/library/view/site-reliability-engineering/9781491929117/).
|
||||
O’Reilly Media, 2016. ISBN: 9781491929124
|
||||
|
||||
[[35](/en/ch1#Limoncelli2020-marker)] Thomas Limoncelli.
|
||||
[The Time I Stole $10,000 from Bell Labs](https://queue.acm.org/detail.cfm?id=3434773).
|
||||
*ACM Queue*, volume 18, issue 5, November 2020.
|
||||
[doi:10.1145/3434571.3434773](https://doi.org/10.1145/3434571.3434773)
|
||||
|
||||
[[36](/en/ch1#Majors2020-marker)] Charity Majors.
|
||||
[The Future of Ops Jobs](https://acloudguru.com/blog/engineering/the-future-of-ops-jobs).
|
||||
*acloudguru.com*, August 2020.
|
||||
Archived at [perma.cc/GRU2-CZG3](https://perma.cc/GRU2-CZG3)
|
||||
|
||||
[[37](/en/ch1#Cherkasky2021-marker)] Boris Cherkasky.
|
||||
[(Over)Pay
|
||||
As You Go for Your Datastore](https://medium.com/riskified-technology/over-pay-as-you-go-for-your-datastore-11a29ae49a8b). *medium.com*, September 2021.
|
||||
Archived at [perma.cc/Q8TV-2AM2](https://perma.cc/Q8TV-2AM2)
|
||||
|
||||
[[38](/en/ch1#Kushchi2023-marker)] Shlomi Kushchi.
|
||||
[Serverless Doesn’t Mean
|
||||
DevOpsLess or NoOps](https://thenewstack.io/serverless-doesnt-mean-devopsless-or-noops/). *thenewstack.io*, February 2023.
|
||||
Archived at [perma.cc/3NJR-AYYU](https://perma.cc/3NJR-AYYU)
|
||||
|
||||
[[39](/en/ch1#Bernhardsson2021-marker)] Erik Bernhardsson.
|
||||
[Storm
|
||||
in the stratosphere: how the cloud will be reshuffled](https://erikbern.com/2021/11/30/storm-in-the-stratosphere-how-the-cloud-will-be-reshuffled.html). *erikbern.com*, November 2021.
|
||||
Archived at [perma.cc/SYB2-99P3](https://perma.cc/SYB2-99P3)
|
||||
|
||||
[[40](/en/ch1#Stancil2021-marker)] Benn Stancil.
|
||||
[The data OS](https://benn.substack.com/p/the-data-os). *benn.substack.com*,
|
||||
September 2021. Archived at [perma.cc/WQ43-FHS6](https://perma.cc/WQ43-FHS6)
|
||||
|
||||
[[41](/en/ch1#Korolov2022-marker)] Maria Korolov.
|
||||
[Data
|
||||
residency laws pushing companies toward residency as a service](https://www.csoonline.com/article/3647761/data-residency-laws-pushing-companies-toward-residency-as-a-service.html). *csoonline.com*,
|
||||
January 2022. Archived at [perma.cc/CHE4-XZZ2](https://perma.cc/CHE4-XZZ2)
|
||||
|
||||
[[42](/en/ch1#Borenstein2025-marker)] Severin Borenstein.
|
||||
[Can
|
||||
Data Centers Flex Their Power Demand?](https://energyathaas.wordpress.com/2025/04/14/can-data-centers-flex-their-power-demand/) *energyathaas.wordpress.com*, April 2025.
|
||||
Archived at [perma.cc/MUD3-A6FF](https://perma.cc/MUD3-A6FF)
|
||||
|
||||
[[43](/en/ch1#Acun2023-marker)] Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Aditya
|
||||
Sundarrajan, Kiwan Maeng, Manoj Chakkaravarthy, David Brooks, and Carole-Jean Wu.
|
||||
[Carbon Dependencies in
|
||||
Datacenter Design and Management](https://hotcarbon.org/assets/2022/pdf/hotcarbon22-acun.pdf).
|
||||
*ACM SIGENERGY Energy Informatics Review*, volume 3, issue 3, pages 21–26.
|
||||
[doi:10.1145/3630614.3630619](https://doi.org/10.1145/3630614.3630619)
|
||||
|
||||
[[44](/en/ch1#Nath2019-marker)] Kousik Nath.
|
||||
[These are
|
||||
the numbers every computer engineer should know](https://www.freecodecamp.org/news/must-know-numbers-for-every-computer-engineer/). *freecodecamp.org*, September 2019.
|
||||
Archived at [perma.cc/RW73-36RL](https://perma.cc/RW73-36RL)
|
||||
|
||||
[[45](/en/ch1#Hellerstein2019-marker)] Joseph M. Hellerstein, Jose Faleiro, Joseph E.
|
||||
Gonzalez, Johann Schleier-Smith, Vikram Sreekanti, Alexey Tumanov, and Chenggang Wu.
|
||||
[Serverless Computing: One Step Forward, Two Steps Back](https://arxiv.org/abs/1812.03651).
|
||||
At *Conference on Innovative Data Systems Research* (CIDR), January 2019.
|
||||
|
||||
[[46](/en/ch1#McSherry2015_ch1-marker)] Frank McSherry, Michael Isard, and Derek G. Murray.
|
||||
[Scalability!
|
||||
But at What COST?](https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf) At *15th USENIX Workshop on Hot Topics in Operating Systems* (HotOS),
|
||||
May 2015.
|
||||
|
||||
[[47](/en/ch1#Sridharan2018-marker)] Cindy Sridharan.
|
||||
*[Distributed
|
||||
Systems Observability: A Guide to Building Robust Systems](https://unlimited.humio.com/rs/756-LMY-106/images/Distributed-Systems-Observability-eBook.pdf)*. Report, O’Reilly Media, May 2018.
|
||||
Archived at [perma.cc/M6JL-XKCM](https://perma.cc/M6JL-XKCM)
|
||||
|
||||
[[48](/en/ch1#Majors2019-marker)] Charity Majors.
|
||||
[Observability — A 3-Year
|
||||
Retrospective](https://thenewstack.io/observability-a-3-year-retrospective/). *thenewstack.io*, August 2019.
|
||||
Archived at [perma.cc/CG62-TJWL](https://perma.cc/CG62-TJWL)
|
||||
|
||||
[[49](/en/ch1#Sigelman2010-marker)] Benjamin H. Sigelman, Luiz André Barroso, Mike
|
||||
Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag.
|
||||
[Dapper, a Large-Scale Distributed Systems Tracing
|
||||
Infrastructure](https://research.google/pubs/pub36356/). Google Technical Report dapper-2010-1, April 2010.
|
||||
Archived at [perma.cc/K7KU-2TMH](https://perma.cc/K7KU-2TMH)
|
||||
|
||||
[[50](/en/ch1#Laigner2021-marker)] Rodrigo Laigner, Yongluan Zhou, Marcos Antonio
|
||||
Vaz Salles, Yijian Liu, and Marcos Kalinowski.
|
||||
[Data management in microservices: State
|
||||
of the practice, challenges, and research directions](https://www.vldb.org/pvldb/vol14/p3348-laigner.pdf). *Proceedings of the VLDB Endowment*,
|
||||
volume 14, issue 13, pages 3348–3361, September 2021.
|
||||
[doi:10.14778/3484224.3484232](https://doi.org/10.14778/3484224.3484232)
|
||||
|
||||
[[51](/en/ch1#Tigani2023-marker)] Jordan Tigani.
|
||||
[Big Data is Dead](https://motherduck.com/blog/big-data-is-dead/).
|
||||
*motherduck.com*, February 2023.
|
||||
Archived at [perma.cc/HT4Q-K77U](https://perma.cc/HT4Q-K77U)
|
||||
|
||||
[[52](/en/ch1#Newman2021_ch1-marker)] Sam Newman.
|
||||
[*Building
|
||||
Microservices*, second edition](https://www.oreilly.com/library/view/building-microservices-2nd/9781492034018/). O’Reilly Media, 2021. ISBN: 9781492034025
|
||||
|
||||
[[53](/en/ch1#Richardson2014-marker)] Chris Richardson.
|
||||
[Microservices: Decomposing
|
||||
Applications for Deployability and Scalability](https://www.infoq.com/articles/microservices-intro/). *infoq.com*, May 2014.
|
||||
Archived at [perma.cc/CKN4-YEQ2](https://perma.cc/CKN4-YEQ2)
|
||||
|
||||
[[54](/en/ch1#Shahrad2020-marker)] Mohammad Shahrad, Rodrigo Fonseca, Íñigo Goiri,
|
||||
Gohar Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and Ricardo Bianchini.
|
||||
[Serverless in the Wild:
|
||||
Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider](https://www.usenix.org/system/files/atc20-shahrad.pdf).
|
||||
At *USENIX Annual Technical Conference* (ATC), July 2020.
|
||||
|
||||
[[55](/en/ch1#Barroso2018-marker)] Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan.
|
||||
[The Datacenter as a
|
||||
Computer: Designing Warehouse-Scale Machines](https://www.morganclaypool.com/doi/10.2200/S00874ED3V01Y201809CAC046), third edition.
|
||||
Morgan & Claypool Synthesis Lectures on Computer Architecture, October 2018.
|
||||
[doi:10.2200/S00874ED3V01Y201809CAC046](https://doi.org/10.2200/S00874ED3V01Y201809CAC046)
|
||||
|
||||
[[56](/en/ch1#Fiala2012-marker)] David Fiala, Frank Mueller, Christian Engelmann, Rolf
|
||||
Riesen, Kurt Ferreira, and Ron Brightwell.
|
||||
[Detection and
|
||||
Correction of Silent Data Corruption for Large-Scale High-Performance Computing](https://arcb.csc.ncsu.edu/~mueller/ftp/pub/mueller/papers/sc12.pdf). At
|
||||
*International Conference for High Performance Computing, Networking, Storage and
|
||||
Analysis* (SC), November 2012.
|
||||
[doi:10.1109/SC.2012.49](https://doi.org/10.1109/SC.2012.49)
|
||||
|
||||
[[57](/en/ch1#KornfeldSimpson2020-marker)] Anna Kornfeld
|
||||
Simpson, Adriana Szekeres, Jacob Nelson, and Irene Zhang.
|
||||
[Securing RDMA
|
||||
for High-Performance Datacenter Storage Systems](https://www.usenix.org/conference/hotcloud20/presentation/kornfeld-simpson). At *12th USENIX Workshop on Hot Topics in
|
||||
Cloud Computing* (HotCloud), July 2020.
|
||||
|
||||
[[58](/en/ch1#Singh2015-marker)] Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson,
|
||||
Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala,
|
||||
Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat.
|
||||
[Jupiter Rising: A
|
||||
Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network](https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf). At
|
||||
*Annual Conference of the ACM Special Interest Group on Data Communication* (SIGCOMM), August 2015.
|
||||
[doi:10.1145/2785956.2787508](https://doi.org/10.1145/2785956.2787508)
|
||||
|
||||
[[59](/en/ch1#Lockwood2014-marker)] Glenn K. Lockwood.
|
||||
[Hadoop’s
|
||||
Uncomfortable Fit in HPC](https://blog.glennklockwood.com/2014/05/hadoops-uncomfortable-fit-in-hpc.html). *glennklockwood.blogspot.co.uk*, May 2014.
|
||||
Archived at [perma.cc/S8XX-Y67B](https://perma.cc/S8XX-Y67B)
|
||||
|
||||
[[60](/en/ch1#ONeil2016_ch1-marker)] Cathy O’Neil. *Weapons of Math Destruction:
|
||||
How Big Data Increases Inequality and Threatens Democracy*. Crown Publishing, 2016.
|
||||
ISBN: 9780553418811
|
||||
|
||||
[[61](/en/ch1#Shastri2020-marker)] Supreeth Shastri, Vinay Banakar, Melissa
|
||||
Wasserman, Arun Kumar, and Vijay Chidambaram.
|
||||
[Understanding and Benchmarking the
|
||||
Impact of GDPR on Database Systems](https://www.vldb.org/pvldb/vol13/p1064-shastri.pdf). *Proceedings of the VLDB Endowment*, volume 13, issue
|
||||
7, pages 1064–1077, March 2020.
|
||||
[doi:10.14778/3384345.3384354](https://doi.org/10.14778/3384345.3384354)
|
||||
|
||||
[[62](/en/ch1#Datensparsamkeit-marker)] Martin Fowler.
|
||||
[Datensparsamkeit](https://www.martinfowler.com/bliki/Datensparsamkeit.html).
|
||||
*martinfowler.com*, December 2013.
|
||||
Archived at [perma.cc/R9QX-CME6](https://perma.cc/R9QX-CME6)
|
||||
|
||||
[[63](/en/ch1#GDPR-marker)] [Regulation
|
||||
(EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data
|
||||
Protection Regulation)](https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN). *Official Journal of the European Union* L 119/1, May 2016.
|
||||
## Footnotes
|
||||
|
||||
## References
|
||||
|
||||
[^1]: Richard T. Kouzes, Gordon A. Anderson, Stephen T. Elbert, Ian Gorton, and Deborah K. Gracio. [The Changing Paradigm of Data-Intensive Computing](http://www2.ic.uff.br/~boeres/slides_AP/papers/TheChanginParadigmDataIntensiveComputing_2009.pdf). *IEEE Computer*, volume 42, issue 1, January 2009. [doi:10.1109/MC.2009.26](https://doi.org/10.1109/MC.2009.26)
|
||||
[^2]: Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan. [Local-first software: you own your data, in spite of the cloud](https://www.inkandswitch.com/local-first/). At *2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software* (Onward!), October 2019. [doi:10.1145/3359591.3359737](https://doi.org/10.1145/3359591.3359737)
|
||||
[^3]: Joe Reis and Matt Housley. [*Fundamentals of Data Engineering*](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/). O’Reilly Media, 2022. ISBN: 9781098108304
|
||||
[^4]: Rui Pedro Machado and Helder Russa. [*Analytics Engineering with SQL and dbt*](https://www.oreilly.com/library/view/analytics-engineering-with/9781098142377/). O’Reilly Media, 2023. ISBN: 9781098142384
|
||||
[^5]: Edgar F. Codd, S. B. Codd, and C. T. Salley. [Providing OLAP to User-Analysts: An IT Mandate](https://www.estgv.ipv.pt/PaginasPessoais/jloureiro/ESI_AID2007_2008/fichas/codd.pdf). E. F. Codd Associates, 1993. Archived at [perma.cc/RKX8-2GEE](https://perma.cc/RKX8-2GEE)
|
||||
[^6]: Chinmay Soman and Neha Pawar. [Comparing Three Real-Time OLAP Databases: Apache Pinot, Apache Druid, and ClickHouse](https://startree.ai/blog/a-tale-of-three-real-time-olap-databases). *startree.ai*, April 2023. Archived at [perma.cc/8BZP-VWPA](https://perma.cc/8BZP-VWPA)
|
||||
[^7]: Surajit Chaudhuri and Umeshwar Dayal. [An Overview of Data Warehousing and OLAP Technology](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/sigrecord.pdf). *ACM SIGMOD Record*, volume 26, issue 1, pages 65–74, March 1997. [doi:10.1145/248603.248616](https://doi.org/10.1145/248603.248616)
|
||||
[^8]: Fatma Özcan, Yuanyuan Tian, and Pinar Tözün. [Hybrid Transactional/Analytical Processing: A Survey](https://humming80.github.io/papers/sigmod-htaptut.pdf). At *ACM International Conference on Management of Data* (SIGMOD), May 2017. [doi:10.1145/3035918.3054784](https://doi.org/10.1145/3035918.3054784)
|
||||
[^9]: Adam Prout, Szu-Po Wang, Joseph Victor, Zhou Sun, Yongzhu Li, Jack Chen, Evan Bergeron, Eric Hanson, Robert Walzer, Rodrigo Gomes, and Nikita Shamgunov. [Cloud-Native Transactions and Analytics in SingleStore](https://dl.acm.org/doi/abs/10.1145/3514221.3526055). At *International Conference on Management of Data* (SIGMOD), June 2022. [doi:10.1145/3514221.3526055](https://doi.org/10.1145/3514221.3526055)
|
||||
[^10]: Chao Zhang, Guoliang Li, Jintao Zhang, Xinning Zhang, and Jianhua Feng. [HTAP Databases: A Survey](https://arxiv.org/pdf/2404.15670). *IEEE Transactions on Knowledge and Data Engineering*, April 2024. [doi:10.1109/TKDE.2024.3389693](https://doi.org/10.1109/TKDE.2024.3389693)
|
||||
[^11]: Michael Stonebraker and Uğur Çetintemel. [‘One Size Fits All’: An Idea Whose Time Has Come and Gone](https://pages.cs.wisc.edu/~shivaram/cs744-readings/fits_all.pdf). At *21st International Conference on Data Engineering* (ICDE), April 2005. [doi:10.1109/ICDE.2005.1](https://doi.org/10.1109/ICDE.2005.1)
|
||||
[^12]: Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph M. Hellerstein, and Caleb Welton. [MAD Skills: New Analysis Practices for Big Data](https://www.vldb.org/pvldb/vol2/vldb09-219.pdf). *Proceedings of the VLDB Endowment*, volume 2, issue 2, pages 1481–1492, August 2009. [doi:10.14778/1687553.1687576](https://doi.org/10.14778/1687553.1687576)
|
||||
[^13]: Dan Olteanu. [The Relational Data Borg is Learning](https://www.vldb.org/pvldb/vol13/p3502-olteanu.pdf). *Proceedings of the VLDB Endowment*, volume 13, issue 12, August 2020. [doi:10.14778/3415478.3415572](https://doi.org/10.14778/3415478.3415572)
|
||||
[^14]: Matt Bornstein, Martin Casado, and Jennifer Li. [Emerging Architectures for Modern Data Infrastructure: 2020](https://future.a16z.com/emerging-architectures-for-modern-data-infrastructure-2020/). *future.a16z.com*, October 2020. Archived at [perma.cc/LF8W-KDCC](https://perma.cc/LF8W-KDCC)
|
||||
[^15]: Martin Fowler. [DataLake](https://www.martinfowler.com/bliki/DataLake.html). *martinfowler.com*, February 2015. Archived at [perma.cc/4WKN-CZUK](https://perma.cc/4WKN-CZUK)
|
||||
[^16]: Bobby Johnson and Joseph Adler. [The Sushi Principle: Raw Data Is Better](https://learning.oreilly.com/videos/strata-hadoop/9781491924143/9781491924143-video210840/). At *Strata+Hadoop World*, February 2015.
|
||||
[^17]: Michael Armbrust, Ali Ghodsi, Reynold Xin, and Matei Zaharia. [Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics](https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf). At *11th Annual Conference on Innovative Data Systems Research* (CIDR), January 2021.
|
||||
[^18]: DataKitchen, Inc. [The DataOps Manifesto](https://dataopsmanifesto.org/en/). *dataopsmanifesto.org*, 2017. Archived at [perma.cc/3F5N-FUQ4](https://perma.cc/3F5N-FUQ4)
|
||||
[^19]: Tejas Manohar. [What is Reverse ETL: A Definition & Why It’s Taking Off](https://hightouch.io/blog/reverse-etl/). *hightouch.io*, November 2021. Archived at [perma.cc/A7TN-GLYJ](https://perma.cc/A7TN-GLYJ)
|
||||
[^20]: Simon O’Regan. [Designing Data Products](https://towardsdatascience.com/designing-data-products-b6b93edf3d23). *towardsdatascience.com*, August 2018. Archived at [perma.cc/HU67-3RV8](https://perma.cc/HU67-3RV8)
|
||||
[^21]: Camille Fournier. [Why is it so hard to decide to buy?](https://skamille.medium.com/why-is-it-so-hard-to-decide-to-buy-d86fee98e88e) *skamille.medium.com*, July 2021. Archived at [perma.cc/6VSG-HQ5X](https://perma.cc/6VSG-HQ5X)
|
||||
[^22]: David Heinemeier Hansson. [Why we’re leaving the cloud](https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47e0). *world.hey.com*, October 2022. Archived at [perma.cc/82E6-UJ65](https://perma.cc/82E6-UJ65)
|
||||
[^23]: Nima Badizadegan. [Use One Big Server](https://specbranch.com/posts/one-big-server/). *specbranch.com*, August 2022. Archived at [perma.cc/M8NB-95UK](https://perma.cc/M8NB-95UK)
|
||||
[^24]: Steve Yegge. [Dear Google Cloud: Your Deprecation Policy is Killing You](https://steve-yegge.medium.com/dear-google-cloud-your-deprecation-policy-is-killing-you-ee7525dc05dc). *steve-yegge.medium.com*, August 2020. Archived at [perma.cc/KQP9-SPGU](https://perma.cc/KQP9-SPGU)
|
||||
[^25]: Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, and Xiaofeng Bao. [Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases](https://media.amazonwebservices.com/blog/2017/aurora-design-considerations-paper.pdf). At *ACM International Conference on Management of Data* (SIGMOD), pages 1041–1052, May 2017. [doi:10.1145/3035918.3056101](https://doi.org/10.1145/3035918.3056101)
|
||||
[^26]: Panagiotis Antonopoulos, Alex Budovski, Cristian Diaconu, Alejandro Hernandez Saenz, Jack Hu, Hanuma Kodavalla, Donald Kossmann, Sandeep Lingam, Umar Farooq Minhas, Naveen Prakash, Vijendra Purohit, Hugh Qu, Chaitanya Sreenivas Ravella, Krystyna Reisteter, Sheetal Shrotri, Dixin Tang, and Vikram Wakade. [Socrates: The New SQL Server in the Cloud](https://www.microsoft.com/en-us/research/uploads/prod/2019/05/socrates.pdf). At *ACM International Conference on Management of Data* (SIGMOD), pages 1743–1756, June 2019. [doi:10.1145/3299869.3314047](https://doi.org/10.1145/3299869.3314047)
|
||||
[^27]: Midhul Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish Motivala, and Thierry Cruanes. [Building An Elastic Query Engine on Disaggregated Storage](https://www.usenix.org/system/files/nsdi20-paper-vuppalapati.pdf). At *17th USENIX Symposium on Networked Systems Design and Implementation* (NSDI), February 2020.
|
||||
[^28]: Nick Van Wiggeren. [The Real Failure Rate of EBS](https://planetscale.com/blog/the-real-fail-rate-of-ebs). *planetscale.com*, March 2025. Archived at [perma.cc/43CR-SAH5](https://perma.cc/43CR-SAH5)
|
||||
[^29]: Colin Breck. [Predicting the Future of Distributed Systems](https://blog.colinbreck.com/predicting-the-future-of-distributed-systems/). *blog.colinbreck.com*, August 2024. Archived at [perma.cc/K5FC-4XX2](https://perma.cc/K5FC-4XX2)
|
||||
[^30]: Gwen Shapira. [Compute-Storage Separation Explained](https://www.thenile.dev/blog/storage-compute). *thenile.dev*, January 2023. Archived at [perma.cc/QCV3-XJNZ](https://perma.cc/QCV3-XJNZ)
|
||||
[^31]: Ravi Murthy and Gurmeet Goindi. [AlloyDB for PostgreSQL under the hood: Intelligent, database-aware storage](https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-intelligent-scalable-storage). *cloud.google.com*, May 2022. Archived at [archive.org](https://web.archive.org/web/20220514021120/https%3A//cloud.google.com/blog/products/databases/alloydb-for-postgresql-intelligent-scalable-storage)
|
||||
[^32]: Jack Vanlightly. [The Architecture of Serverless Data Systems](https://jack-vanlightly.com/blog/2023/11/14/the-architecture-of-serverless-data-systems). *jack-vanlightly.com*, November 2023. Archived at [perma.cc/UDV4-TNJ5](https://perma.cc/UDV4-TNJ5)
|
||||
[^33]: Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, and David A. Patterson. [Cloud Programming Simplified: A Berkeley View on Serverless Computing](https://arxiv.org/abs/1902.03383). *arxiv.org*, February 2019.
|
||||
[^34]: Betsy Beyer, Jennifer Petoff, Chris Jones, and Niall Richard Murphy. [*Site Reliability Engineering: How Google Runs Production Systems*](https://www.oreilly.com/library/view/site-reliability-engineering/9781491929117/). O’Reilly Media, 2016. ISBN: 9781491929124
|
||||
[^35]: Thomas Limoncelli. [The Time I Stole $10,000 from Bell Labs](https://queue.acm.org/detail.cfm?id=3434773). *ACM Queue*, volume 18, issue 5, November 2020. [doi:10.1145/3434571.3434773](https://doi.org/10.1145/3434571.3434773)
|
||||
[^36]: Charity Majors. [The Future of Ops Jobs](https://acloudguru.com/blog/engineering/the-future-of-ops-jobs). *acloudguru.com*, August 2020. Archived at [perma.cc/GRU2-CZG3](https://perma.cc/GRU2-CZG3)
|
||||
[^37]: Boris Cherkasky. [(Over)Pay As You Go for Your Datastore](https://medium.com/riskified-technology/over-pay-as-you-go-for-your-datastore-11a29ae49a8b). *medium.com*, September 2021. Archived at [perma.cc/Q8TV-2AM2](https://perma.cc/Q8TV-2AM2)
|
||||
[^38]: Shlomi Kushchi. [Serverless Doesn’t Mean DevOpsLess or NoOps](https://thenewstack.io/serverless-doesnt-mean-devopsless-or-noops/). *thenewstack.io*, February 2023. Archived at [perma.cc/3NJR-AYYU](https://perma.cc/3NJR-AYYU)
|
||||
[^39]: Erik Bernhardsson. [Storm in the stratosphere: how the cloud will be reshuffled](https://erikbern.com/2021/11/30/storm-in-the-stratosphere-how-the-cloud-will-be-reshuffled.html). *erikbern.com*, November 2021. Archived at [perma.cc/SYB2-99P3](https://perma.cc/SYB2-99P3)
|
||||
[^40]: Benn Stancil. [The data OS](https://benn.substack.com/p/the-data-os). *benn.substack.com*, September 2021. Archived at [perma.cc/WQ43-FHS6](https://perma.cc/WQ43-FHS6)
|
||||
[^41]: Maria Korolov. [Data residency laws pushing companies toward residency as a service](https://www.csoonline.com/article/3647761/data-residency-laws-pushing-companies-toward-residency-as-a-service.html). *csoonline.com*, January 2022. Archived at [perma.cc/CHE4-XZZ2](https://perma.cc/CHE4-XZZ2)
|
||||
[^42]: Severin Borenstein. [Can Data Centers Flex Their Power Demand?](https://energyathaas.wordpress.com/2025/04/14/can-data-centers-flex-their-power-demand/) *energyathaas.wordpress.com*, April 2025. Archived at [perma.cc/MUD3-A6FF](https://perma.cc/MUD3-A6FF)
|
||||
[^43]: Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Aditya Sundarrajan, Kiwan Maeng, Manoj Chakkaravarthy, David Brooks, and Carole-Jean Wu. [Carbon Dependencies in Datacenter Design and Management](https://hotcarbon.org/assets/2022/pdf/hotcarbon22-acun.pdf). *ACM SIGENERGY Energy Informatics Review*, volume 3, issue 3, pages 21–26. [doi:10.1145/3630614.3630619](https://doi.org/10.1145/3630614.3630619)
|
||||
[^44]: Kousik Nath. [These are the numbers every computer engineer should know](https://www.freecodecamp.org/news/must-know-numbers-for-every-computer-engineer/). *freecodecamp.org*, September 2019. Archived at [perma.cc/RW73-36RL](https://perma.cc/RW73-36RL)
|
||||
[^45]: Joseph M. Hellerstein, Jose Faleiro, Joseph E. Gonzalez, Johann Schleier-Smith, Vikram Sreekanti, Alexey Tumanov, and Chenggang Wu. [Serverless Computing: One Step Forward, Two Steps Back](https://arxiv.org/abs/1812.03651). At *Conference on Innovative Data Systems Research* (CIDR), January 2019.
|
||||
[^46]: Frank McSherry, Michael Isard, and Derek G. Murray. [Scalability! But at What COST?](https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf) At *15th USENIX Workshop on Hot Topics in Operating Systems* (HotOS), May 2015.
|
||||
[^47]: Cindy Sridharan. [*Distributed Systems Observability: A Guide to Building Robust Systems*](https://unlimited.humio.com/rs/756-LMY-106/images/Distributed-Systems-Observability-eBook.pdf). Report, O’Reilly Media, May 2018. Archived at [perma.cc/M6JL-XKCM](https://perma.cc/M6JL-XKCM)
|
||||
[^48]: Charity Majors. [Observability — A 3-Year Retrospective](https://thenewstack.io/observability-a-3-year-retrospective/). *thenewstack.io*, August 2019. Archived at [perma.cc/CG62-TJWL](https://perma.cc/CG62-TJWL)
|
||||
[^49]: Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag. [Dapper, a Large-Scale Distributed Systems Tracing Infrastructure](https://research.google/pubs/pub36356/). Google Technical Report dapper-2010-1, April 2010. Archived at [perma.cc/K7KU-2TMH](https://perma.cc/K7KU-2TMH)
|
||||
[^50]: Rodrigo Laigner, Yongluan Zhou, Marcos Antonio Vaz Salles, Yijian Liu, and Marcos Kalinowski. [Data management in microservices: State of the practice, challenges, and research directions](https://www.vldb.org/pvldb/vol14/p3348-laigner.pdf). *Proceedings of the VLDB Endowment*, volume 14, issue 13, pages 3348–3361, September 2021. [doi:10.14778/3484224.3484232](https://doi.org/10.14778/3484224.3484232)
|
||||
[^51]: Jordan Tigani. [Big Data is Dead](https://motherduck.com/blog/big-data-is-dead/). *motherduck.com*, February 2023. Archived at [perma.cc/HT4Q-K77U](https://perma.cc/HT4Q-K77U)
|
||||
[^52]: Sam Newman. [*Building Microservices*, second edition](https://www.oreilly.com/library/view/building-microservices-2nd/9781492034018/). O’Reilly Media, 2021. ISBN: 9781492034025
|
||||
[^53]: Chris Richardson. [Microservices: Decomposing Applications for Deployability and Scalability](https://www.infoq.com/articles/microservices-intro/). *infoq.com*, May 2014. Archived at [perma.cc/CKN4-YEQ2](https://perma.cc/CKN4-YEQ2)
|
||||
[^54]: Mohammad Shahrad, Rodrigo Fonseca, Íñigo Goiri, Gohar Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and Ricardo Bianchini. [Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider](https://www.usenix.org/system/files/atc20-shahrad.pdf). At *USENIX Annual Technical Conference* (ATC), July 2020.
|
||||
[^55]: Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan. [The Datacenter as a Computer: Designing Warehouse-Scale Machines](https://www.morganclaypool.com/doi/10.2200/S00874ED3V01Y201809CAC046), third edition. Morgan & Claypool Synthesis Lectures on Computer Architecture, October 2018. [doi:10.2200/S00874ED3V01Y201809CAC046](https://doi.org/10.2200/S00874ED3V01Y201809CAC046)
|
||||
[^56]: David Fiala, Frank Mueller, Christian Engelmann, Rolf Riesen, Kurt Ferreira, and Ron Brightwell. [Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing](https://arcb.csc.ncsu.edu/~mueller/ftp/pub/mueller/papers/sc12.pdf). At *International Conference for High Performance Computing, Networking, Storage and Analysis* (SC), November 2012. [doi:10.1109/SC.2012.49](https://doi.org/10.1109/SC.2012.49)
|
||||
[^57]: Anna Kornfeld Simpson, Adriana Szekeres, Jacob Nelson, and Irene Zhang. [Securing RDMA for High-Performance Datacenter Storage Systems](https://www.usenix.org/conference/hotcloud20/presentation/kornfeld-simpson). At *12th USENIX Workshop on Hot Topics in Cloud Computing* (HotCloud), July 2020.
|
||||
[^58]: Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat. [Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network](https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf). At *Annual Conference of the ACM Special Interest Group on Data Communication* (SIGCOMM), August 2015. [doi:10.1145/2785956.2787508](https://doi.org/10.1145/2785956.2787508)
|
||||
[^59]: Glenn K. Lockwood. [Hadoop’s Uncomfortable Fit in HPC](https://blog.glennklockwood.com/2014/05/hadoops-uncomfortable-fit-in-hpc.html). *glennklockwood.blogspot.co.uk*, May 2014. Archived at [perma.cc/S8XX-Y67B](https://perma.cc/S8XX-Y67B)
|
||||
[^60]: Cathy O’Neil. *Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy*. Crown Publishing, 2016. ISBN: 9780553418811
|
||||
[^61]: Supreeth Shastri, Vinay Banakar, Melissa Wasserman, Arun Kumar, and Vijay Chidambaram. [Understanding and Benchmarking the Impact of GDPR on Database Systems](https://www.vldb.org/pvldb/vol13/p1064-shastri.pdf). *Proceedings of the VLDB Endowment*, volume 13, issue 7, pages 1064–1077, March 2020. [doi:10.14778/3384345.3384354](https://doi.org/10.14778/3384345.3384354)
|
||||
[^62]: Martin Fowler. [Datensparsamkeit](https://www.martinfowler.com/bliki/Datensparsamkeit.html). *martinfowler.com*, December 2013. Archived at [perma.cc/R9QX-CME6](https://perma.cc/R9QX-CME6)
|
||||
[^63]: [Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data Protection Regulation)](https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN). *Official Journal of the European Union* L 119/1, May 2016.
|
||||
|
||||
|
|
|
|||
File diff suppressed because it is too large
|
|
@ -55,17 +55,17 @@ the computer which operations to perform in which order. A declarative query lan
|
|||
because it is typically more concise and easier to write than an explicit algorithm. But more
|
||||
importantly, it also hides implementation details of the query engine, which makes it possible for
|
||||
the database system to introduce performance improvements without requiring any changes to queries.
|
||||
[[1](/en/ch3#Brandon2024)].
|
||||
[^1].
|
||||
|
||||
For example, a database might be able to execute a declarative query in parallel across multiple CPU
|
||||
cores and machines, without you having to worry about how to implement that parallelism
|
||||
[[2](/en/ch3#Hellerstein2010)].
|
||||
[^2].
|
||||
In a hand-coded algorithm it would be a lot of work to implement such parallel execution yourself.
|
||||
|
||||
# Relational Model versus Document Model
|
||||
|
||||
The best-known data model today is probably that of SQL, based on the relational model proposed by
|
||||
Edgar Codd in 1970 [[3](/en/ch3#Codd1970)]:
|
||||
Edgar Codd in 1970 [^3]:
|
||||
data is organized into *relations* (called *tables* in SQL), where each relation is an unordered collection
|
||||
of *tuples* (*rows* in SQL).
|
||||
|
||||
|
|
@ -80,10 +80,10 @@ and early 1980s, the *network model* and the *hierarchical model* were the main
|
|||
the relational model came to dominate them. Object databases came and went again in the late 1980s
|
||||
and early 1990s. XML databases appeared in the early 2000s, but have only seen niche adoption. Each
|
||||
competitor to the relational model generated a lot of hype in its time, but it never lasted
|
||||
[[4](/en/ch3#Stonebraker2005around)].
|
||||
[^4].
|
||||
Instead, SQL has grown to incorporate other data types besides its relational core—for example,
|
||||
adding support for XML, JSON, and graph data
|
||||
[[5](/en/ch3#Winand2015)].
|
||||
[^5].
|
||||
|
||||
In the 2010s, *NoSQL* was the latest buzzword that tried to overthrow the dominance of relational
|
||||
databases. NoSQL refers not to a single technology, but a loose set of ideas around new data models,
|
||||
|
|
@ -122,7 +122,7 @@ reflections and other troubles.
|
|||
|
||||
Object-relational mapping (ORM) frameworks like ActiveRecord and Hibernate reduce the amount of
|
||||
boilerplate code required for this translation layer, but they are often criticized
|
||||
[[6](/en/ch3#Fowler2012)].
|
||||
[^6].
|
||||
Some commonly cited problems are:
|
||||
|
||||
* ORMs are complex and can’t completely hide the differences between the two models, so developers
|
||||
|
|
@ -137,7 +137,7 @@ Some commonly cited problems are:
|
|||
database. Customizing the ORM’s schema and query generation can be complex and negate the benefit
|
||||
of using the ORM in the first place.
|
||||
* ORMs make it easy to accidentally write inefficient queries, such as the *N+1 query problem*
|
||||
[[7](/en/ch3#Mihalcea2023)].
|
||||
[^7].
|
||||
For example, say you want to display a list of user comments on a page, so you perform one query
|
||||
that returns *N* comments, each containing the ID of its author. To show the name of the comment
|
||||
author you need to look up the ID in the users table. In hand-written SQL you would probably
|
||||
|
|
@ -213,7 +213,7 @@ The JSON representation has better *locality* than the multi-table schema in
|
|||
[Figure 3-1](/en/ch3#fig_obama_relational) (see [“Data locality for reads and writes”](/en/ch3#sec_datamodels_document_locality)). If you want to fetch a profile
|
||||
in the relational example, you need to either perform multiple queries (query each table by
|
||||
`user_id`) or perform a messy multi-way join between the `users` table and its subordinate tables
|
||||
[[8](/en/ch3#Schauder2023)].
|
||||
[^8].
|
||||
In the JSON representation, all the relevant information is in one place, making the query both
|
||||
faster and simpler.
|
||||
|
||||
|
|
@ -314,7 +314,7 @@ name:
|
|||
* In a denormalized representation, we would include the image URL of the logo on every individual
|
||||
person’s profile; this makes the JSON document self-contained, but it creates a headache if we
|
||||
ever need to change the logo, because we now need to find all of the occurrences of the old URL
|
||||
and update them [[9](/en/ch3#Zola2014)].
|
||||
and update them [^9].
|
||||
* In a normalized representation, we would create an entity representing an organization or school,
|
||||
and store its name, logo URL, and perhaps other attributes (description, news feed, etc.) once on
|
||||
that entity. Every résumé that mentions the organization would then simply reference its ID, and
|
||||
|
|
@ -350,7 +350,7 @@ denormalized representation consistent.
|
|||
However, the implementation of materialized timelines at X (formerly Twitter) does not store the
|
||||
actual text of each post: each entry actually only stores the post ID, the ID of the user who posted
|
||||
it, and a little bit of extra information to identify reposts and replies
|
||||
[[11](/en/ch3#Krikorian2012_ch3)].
|
||||
[^11].
|
||||
In other words, it is a precomputed result of (approximately) the following query:
|
||||
|
||||
```
|
||||
|
|
@ -366,7 +366,7 @@ the post ID to fetch the actual post content (as well as statistics such as the
|
|||
and replies), and look up the sender’s profile by ID (to get their username, profile picture, and
|
||||
other details). This process of looking up the human-readable information by ID is called
|
||||
*hydrating* the IDs, and it is essentially a join performed in application code
|
||||
[[11](/en/ch3#Krikorian2012_ch3)].
|
||||
[^11].
|
||||
|
||||
The reason for storing only IDs in the precomputed timeline is that the data they refer to is
|
||||
fast-changing: the number of likes and replies may change multiple times per second on a popular
|
||||
|
|
@ -453,7 +453,7 @@ support are able to create such indexes on values inside a document.
|
|||
Data warehouses (see [“Data Warehousing”](/en/ch1#sec_introduction_dwh)) are usually relational, and there are a few
|
||||
widely-used conventions for the structure of tables in a data warehouse: a *star schema*,
|
||||
*snowflake schema*, *dimensional modeling*
|
||||
[[12](/en/ch3#Kimball2013_ch3)],
|
||||
[^12],
|
||||
and *one big table* (OBT). These structures are optimized for the needs of business analysts. ETL
|
||||
processes translate data from operational systems into this schema.
|
||||
|
||||
|
|
@ -498,7 +498,7 @@ product categories, and each row in the `dim_product` table could reference the
|
|||
as foreign keys, rather than storing them as strings in the `dim_product` table. Snowflake schemas
|
||||
are more normalized than star schemas, but star schemas are often preferred because
|
||||
they are simpler for analysts to work with
|
||||
[[12](/en/ch3#Kimball2013_ch3)].
|
||||
[^12].
|
||||
|
||||
In a typical data warehouse, tables are often quite wide: fact tables often have over 100 columns,
|
||||
sometimes several hundred. Dimension tables can also be wide, as they include all the metadata that
|
||||
|
|
@ -519,7 +519,7 @@ Some data warehouse schemas take denormalization even further and leave out the
|
|||
entirely, folding the information in the dimensions into denormalized columns on the fact table
|
||||
instead (essentially, precomputing the join between the fact table and the dimension tables). This
|
||||
approach is known as *one big table* (OBT), and while it requires more storage space, it sometimes
|
||||
enables faster queries [[13](/en/ch3#Kaminsky2022)].
|
||||
enables faster queries [^13].
|
||||
|
||||
In the context of analytics, such denormalization is unproblematic, since the data typically
|
||||
represents a log of historical data that is not going to change (except maybe for occasionally
|
||||
|
|
@ -564,23 +564,23 @@ reading, clients have no guarantees as to what fields the documents may contain.
|
|||
|
||||
Document databases are sometimes called *schemaless*, but that’s misleading, as the code that reads
|
||||
the data usually assumes some kind of structure—i.e., there is an implicit schema, but it is not
|
||||
enforced by the database [[17](/en/ch3#Schemaless)].
|
||||
enforced by the database [^17].
|
||||
A more accurate term is *schema-on-read* (the structure of the data is implicit, and only
|
||||
interpreted when the data is read), in contrast with *schema-on-write* (the traditional approach of
|
||||
relational databases, where the schema is explicit and the database ensures all data conforms to it
|
||||
when the data is written) [[18](/en/ch3#Awadallah2009)].
|
||||
when the data is written) [^18].
|
||||
|
||||
Schema-on-read is similar to dynamic (runtime) type checking in programming languages, whereas
|
||||
schema-on-write is similar to static (compile-time) type checking. Just as the advocates of static
|
||||
and dynamic type checking have big debates about their relative merits
|
||||
[[19](/en/ch3#Odersky2013)],
|
||||
[^19],
|
||||
enforcement of schemas in databases is a contentious topic, and in general there’s no right or wrong
|
||||
answer.
|
||||
|
||||
The difference between the approaches is particularly noticeable in situations where an application
|
||||
wants to change the format of its data. For example, say you are currently storing each user’s full
|
||||
name in one field, and you instead want to store the first name and last name separately
|
||||
[[20](/en/ch3#Irwin2013)].
|
||||
[^20].
|
||||
In a document database, you would just start writing new documents with the new fields and have
|
||||
code in the application that handles the case when old documents are read. For example:
|
||||
|
||||
|
|
@ -647,12 +647,12 @@ However, the idea of storing related data together for locality is not limited t
|
|||
model. For example, Google’s Spanner database offers the same locality properties in a relational
|
||||
data model, by allowing the schema to declare that a table’s rows should be interleaved (nested)
|
||||
within a parent table
|
||||
[[25](/en/ch3#Corbett2012_ch2)].
|
||||
[^25].
|
||||
Oracle allows the same, using a feature called *multi-table index cluster tables*
|
||||
[[26](/en/ch3#BurlesonCluster)].
|
||||
[^26].
|
||||
The *wide-column* data model popularized by Google’s Bigtable, and used e.g. in HBase and Accumulo,
|
||||
has a concept of *column families*, which have a similar purpose of managing locality
|
||||
[[27](/en/ch3#Chang2006_ch3)].
|
||||
[^27].
|
||||
|
||||
### Query languages for documents
|
||||
|
||||
|
|
@ -663,9 +663,9 @@ to query for values inside documents, and some provide rich query languages.
|
|||
|
||||
XML databases are often queried using XQuery and XPath, which are designed to allow complex queries,
|
||||
including joins across multiple documents, and also format their results as XML
|
||||
[[28](/en/ch3#Walmsley2015)]. JSON Pointer
|
||||
[[29](/en/ch3#Bryan2013)] and JSONPath
|
||||
[[30](/en/ch3#Goessner2024)] provide an equivalent to XPath for JSON.
|
||||
[^28]. JSON Pointer
|
||||
[^29] and JSONPath
|
||||
[^30] provide an equivalent to XPath for JSON.
|
||||
|
||||
MongoDB’s aggregation pipeline, whose `$lookup` operator for joins we saw in
|
||||
[“Normalization, Denormalization, and Joins”](/en/ch3#sec_datamodels_normalization), is an example of a query language for collections of JSON
|
||||
|
|
@ -716,7 +716,7 @@ matter of taste.
|
|||
|
||||
Document databases and relational databases started out as very different approaches to data
|
||||
management, but they have grown more similar over time
|
||||
[[31](/en/ch3#Stonebraker2024)].
|
||||
[^31].
|
||||
Relational databases added support for JSON types and query operators, and the ability to index
|
||||
properties inside documents. Some document databases (such as MongoDB, Couchbase, and RethinkDB)
|
||||
added support for joins, secondary indexes, and declarative query languages.
|
||||
|
|
@ -730,7 +730,7 @@ combination.
|
|||
###### Note
|
||||
|
||||
Codd’s original description of the relational model
|
||||
[[3](/en/ch3#Codd1970)] actually allowed something similar to JSON
|
||||
[^3] actually allowed something similar to JSON
|
||||
within a relational schema. He called it *nonsimple domains*. The idea was that a value in a row
|
||||
doesn’t have to just be a primitive datatype like a number or a string, but it could also be a
|
||||
nested relation (table)—so you can have an arbitrarily nested tree structure as a value, much like
|
||||
|
|
@ -763,7 +763,7 @@ Well-known algorithms can operate on these graphs: for example, map navigation a
|
|||
the shortest path between two points in a road network, and
|
||||
PageRank can be used on the web graph to determine the
|
||||
popularity of a web page and thus its ranking in search results
|
||||
[[32](/en/ch3#Page1999)].
|
||||
[^32].
|
||||
|
||||
Graphs can be represented in several different ways. In the *adjacency list* model, each vertex
|
||||
stores the IDs of its neighbor vertices that are one edge away. Alternatively, you can use an
|
||||
|
|
@ -781,24 +781,24 @@ types of objects in a single database. For example:
|
|||
represent people, locations, events, checkins, and comments made by users; edges indicate which
|
||||
people are friends with each other, which checkin happened in which location, who commented on
|
||||
which post, who attended which event, and so on
|
||||
[[33](/en/ch3#Bronson2013)].
|
||||
[^33].
|
||||
* Knowledge graphs are used by search engines to record facts about entities that often occur in
|
||||
search queries, such as organizations, people, and places
|
||||
[[34](/en/ch3#Noy2019)].
|
||||
[^34].
|
||||
This information is obtained by crawling and analyzing the text on websites; some websites, such
|
||||
as Wikidata, also publish graph data in a structured form.
|
||||
|
||||
There are several different, but related, ways of structuring and querying data in graphs. In this
|
||||
section we will discuss the *property graph* model (implemented by Neo4j, Memgraph, KùzuDB
|
||||
[[35](/en/ch3#Feng2023)],
|
||||
and others [[36](/en/ch3#Besta2019)])
|
||||
[^35],
|
||||
and others [^36])
|
||||
and the *triple-store* model (implemented by Datomic, AllegroGraph, Blazegraph, and others). These
|
||||
models are fairly similar in what they can express, and some graph databases (such as Amazon
|
||||
Neptune) support both models.
|
||||
|
||||
We will also look at four query languages for graphs (Cypher, SPARQL, Datalog, and GraphQL), as well
|
||||
as SQL support for querying graphs. Other graph query languages exist, such as Gremlin
|
||||
[[37](/en/ch3#TinkerPop2023)],
|
||||
[^37],
|
||||
but these will give us a representative overview.
|
||||
|
||||
To illustrate these different languages and models, this section uses the graph shown in
|
||||
|
|
@ -902,12 +902,12 @@ extended to accommodate changes in your application’s data structures.
|
|||
|
||||
*Cypher* is a query language for property graphs, originally created for the Neo4j graph database,
|
||||
and later developed into an open standard as *openCypher*
|
||||
[[38](/en/ch3#Francis2018)].
|
||||
[^38].
|
||||
Besides Neo4j, Cypher is supported by Memgraph, KùzuDB
|
||||
[[35](/en/ch3#Feng2023)],
|
||||
[^35],
|
||||
Amazon Neptune, Apache AGE (with storage in PostgreSQL), and others. It is named after a character
|
||||
in the movie *The Matrix* and is not related to ciphers in cryptography
|
||||
[[39](/en/ch3#EifremTweet)].
|
||||
[^39].
|
||||
|
||||
[Example 3-4](/en/ch3#fig_cypher_create) shows the Cypher query to insert the lefthand portion of
|
||||
[Figure 3-6](/en/ch3#fig_datamodels_graph) into a graph database. The rest of the graph can be added similarly. Each
|
||||
|
|
@ -1069,17 +1069,17 @@ JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;
|
|||
The fact that a 4-line Cypher query requires 31 lines in SQL shows how much of a difference the
|
||||
right choice of data model and query language can make. And this is just the beginning; there are
|
||||
more details to consider, e.g., around handling cycles, and choosing between breadth-first or
|
||||
depth-first traversal [[40](/en/ch3#Tisiot2021)].
|
||||
depth-first traversal [^40].
|
||||
|
||||
Oracle has a different SQL extension for recursive queries, which it calls *hierarchical*
|
||||
[[41](/en/ch3#Goel2020)].
|
||||
[^41].
|
||||
|
||||
However, the situation may be improving: at the time of writing, there are plans to add a graph
|
||||
query language called GQL to the SQL standard [[42](/en/ch3#Deutsch2022),
|
||||
[43](/en/ch3#Green2019)],
|
||||
which will provide a syntax inspired by Cypher, GSQL
|
||||
[[44](/en/ch3#Deutsch2018)], and PGQL
|
||||
[[45](/en/ch3#vanRest2016)].
|
||||
[^44], and PGQL
|
||||
[^45].
|
||||
|
||||
## Triple-Stores and SPARQL
|
||||
|
||||
|
|
@ -1107,15 +1107,15 @@ The subject of a triple is equivalent to a vertex in a graph. The object is one
|
|||
|
||||
To be precise, databases that offer a triple-like data model often need to store some additional
|
||||
metadata on each tuple. For example, AWS Neptune uses quads (4-tuples) by adding a graph ID to each
|
||||
triple [[46](/en/ch3#NeptuneDataModel)];
|
||||
triple [^46];
|
||||
Datomic uses 5-tuples, extending each triple with a transaction ID and a boolean to indicate
|
||||
deletion [[47](/en/ch3#DatomicDataModel)].
|
||||
deletion [^47].
|
||||
Since these databases retain the basic *subject-predicate-object* structure explained above, this
|
||||
book nevertheless calls them triple-stores.
|
||||
|
||||
[Example 3-7](/en/ch3#fig_graph_n3_triples) shows the same data as in [Example 3-4](/en/ch3#fig_cypher_create), written as
|
||||
triples in a format called *Turtle*, a subset of *Notation3* (*N3*)
|
||||
[[48](/en/ch3#Beckett2011)].
|
||||
[^48].
|
||||
|
||||
##### Example 3-7. A subset of the data in [Figure 3-6](/en/ch3#fig_datamodels_graph), represented as Turtle triples
|
||||
|
||||
|
|
@ -1166,13 +1166,13 @@ Web as originally envisioned did not succeed
|
|||
[[49](/en/ch3#Target2018),
|
||||
[50](/en/ch3#MendelGleason2022)],
|
||||
the legacy of the Semantic Web project lives on in a couple of specific technologies: *linked data*
|
||||
standards such as JSON-LD [[51](/en/ch3#Sporny2014)],
|
||||
standards such as JSON-LD [^51],
|
||||
*ontologies* used in biomedical science
|
||||
[[52](/en/ch3#MichiganOntologies)],
|
||||
[^52],
|
||||
Facebook’s Open Graph protocol
|
||||
[[53](/en/ch3#OpenGraph)]
|
||||
[^53]
|
||||
(which is used for link unfurling
|
||||
[[54](/en/ch3#Haughey2015)]),
|
||||
[^54]),
|
||||
knowledge graphs such as Wikidata, and standardized vocabularies for structured data maintained by
|
||||
[`schema.org`](https://schema.org/).
|
||||
|
||||
|
|
@ -1184,7 +1184,7 @@ for applications.
|
|||
|
||||
The Turtle language we used in [Example 3-8](/en/ch3#fig_graph_n3_shorthand) is actually a way of encoding data in the
|
||||
*Resource Description Framework* (RDF)
|
||||
[[55](/en/ch3#W3CRDF)],
|
||||
[^55],
|
||||
a data model that was designed for the Semantic Web. RDF data can also be encoded in other ways, for
|
||||
example (more verbosely) in XML, as shown in [Example 3-9](/en/ch3#fig_graph_rdf_xml). Tools like Apache Jena can
|
||||
automatically convert between different RDF encodings.
|
||||
|
|
@ -1235,7 +1235,7 @@ just specify this prefix once at the top of the file, and then forget about it.
|
|||
### The SPARQL query language
|
||||
|
||||
*SPARQL* is a query language for triple-stores using the RDF data model
|
||||
[[56](/en/ch3#Harris2013)].
|
||||
[^56].
|
||||
(It is an acronym for *SPARQL Protocol and RDF Query Language*, pronounced “sparkle.”)
|
||||
It predates Cypher, and since Cypher’s pattern matching is borrowed from SPARQL, they look quite
|
||||
similar.
|
||||
|
|
@ -1275,7 +1275,7 @@ bound to any vertex that has a `name` property whose value is the string `"Unite
|
|||
```
|
||||
|
||||
SPARQL is supported by Amazon Neptune, AllegroGraph, Blazegraph, OpenLink Virtuoso, Apache Jena, and
|
||||
various other triple stores [[36](/en/ch3#Besta2019)].
|
||||
various other triple stores [^36].
|
||||
|
||||
## Datalog: Recursive Relational Queries
|
||||
|
||||
|
|
@ -1286,7 +1286,7 @@ Datalog is a much older language than SPARQL or Cypher: it arose from academic r
|
|||
It is less well known among software engineers and not widely supported in mainstream databases, but
|
||||
it ought to be better known, since it is a very expressive language that is particularly powerful for
|
||||
complex queries. Several niche databases, including Datomic, LogicBlox, CozoDB, and LinkedIn’s
|
||||
LIquid [[60](/en/ch3#Meyer2020)] use Datalog as
|
||||
LIquid [^60] use Datalog as
|
||||
their query language.
|
||||
|
||||
Datalog is actually based on a relational data model, not a graph, but it appears in the graph
|
||||
|
|
@ -1403,7 +1403,7 @@ APIs.
|
|||
GraphQL’s flexibility comes at a cost. Organizations that adopt GraphQL often need tooling to
|
||||
convert GraphQL queries into requests to internal services, which often use REST or gRPC (see
|
||||
[Chapter 5](/en/ch5#ch_encoding)). Authorization, rate limiting, and performance challenges are additional concerns
|
||||
[[61](/en/ch3#Bessey2024)].
|
||||
[^61].
|
||||
GraphQL’s query language is also limited since GraphQL queries come from an untrusted source. The language
|
||||
does not allow anything that could be expensive to execute, since otherwise users could perform
|
||||
denial-of-service attacks on a server by running lots of expensive queries. In particular, GraphQL
|
||||
|
|
@ -1547,7 +1547,7 @@ known as *event sourcing* [[62](/en/ch3#Betts2012),
|
|||
[63](/en/ch3#Young2014)].
|
||||
The principle of maintaining separate read-optimized representations and deriving them from the
|
||||
write-optimized representation is called *command query responsibility segregation (CQRS)*
|
||||
[[64](/en/ch3#Young2010)].
|
||||
[^64].
|
||||
These terms originated in the domain-driven design (DDD) community, although similar ideas have been
|
||||
around for a long time, for example in *state machine replication* (see [“Using shared logs”](/en/ch10#sec_consistency_smr)).
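A minimal sketch of this idea, assuming a hypothetical event log for conference seat bookings: writes only append events, and a read-optimized view is derived by replaying the log. The event names and fields are invented for illustration.

```python
from collections import defaultdict

# Write-optimized representation: an append-only log of events.
# Event types and fields are hypothetical.
event_log = []


def record(event_type, **data):
    """Append an immutable event to the log (the write path)."""
    event_log.append({"type": event_type, **data})


def build_seat_view(events):
    """Derive a read-optimized view (booked seats per conference) by replaying events."""
    booked = defaultdict(int)
    for e in events:
        if e["type"] == "SeatsBooked":
            booked[e["conference"]] += e["seats"]
        elif e["type"] == "BookingCancelled":
            booked[e["conference"]] -= e["seats"]
    return dict(booked)


record("SeatsBooked", conference="conf-a", seats=3)
record("SeatsBooked", conference="conf-b", seats=2)
record("BookingCancelled", conference="conf-a", seats=1)

view = build_seat_view(event_log)
```

Because the view is derived purely from the log, it can be discarded and rebuilt, or a second view with a different shape can be added later by replaying the same events.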
|
||||
|
||||
|
|
@ -1661,7 +1661,7 @@ users.
|
|||
|
||||
Dataframe APIs also offer a wide variety of operations that go far beyond what relational databases
|
||||
offer, and the data model is often used in ways that are very different from typical relational data
|
||||
modelling [[65](/en/ch3#Petersohn2020)].
|
||||
modelling [^65].
|
||||
For example, a common use of dataframes is to transform data from a relational-like representation
|
||||
into a matrix or multidimensional array representation, which is the form that many machine learning
|
||||
algorithms expect of their input.
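The transformation can be sketched without any dataframe library. The example below assumes a hypothetical list of (user, movie, rating) rows and pivots it into a dense matrix with one row per user and one column per movie. The `to_matrix` helper is illustrative only; a dataframe API would normally provide this operation directly.

```python
# Relational-like representation: one row per (user, movie, rating).
# The data is illustrative only.
ratings = [
    ("alice", "m1", 5),
    ("alice", "m2", 3),
    ("bob", "m2", 4),
    ("carol", "m1", 2),
    ("carol", "m3", 1),
]


def to_matrix(rows, missing=None):
    """Pivot (user, movie, rating) rows into a users-by-movies matrix."""
    users = sorted({u for u, _, _ in rows})
    movies = sorted({m for _, m, _ in rows})
    row_index = {u: i for i, u in enumerate(users)}
    col_index = {m: j for j, m in enumerate(movies)}
    # Cells with no rating keep the `missing` placeholder.
    matrix = [[missing] * len(movies) for _ in users]
    for u, m, r in rows:
        matrix[row_index[u]][col_index[m]] = r
    return users, movies, matrix


users, movies, matrix = to_matrix(ratings)
```

How to fill the missing cells (here `None`) is a modelling decision that depends on the algorithm consuming the matrix.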
|
||||
|
|
@ -1698,14 +1698,14 @@ into a matrix representation, while giving the data scientist control over the r
|
|||
is most suitable for achieving the goals of the data analysis or model training process.
|
||||
|
||||
There are also databases such as TileDB
|
||||
[[66](/en/ch3#Papadopoulos2016)]
|
||||
[^66]
|
||||
that specialize in storing large multidimensional arrays of numbers; they are called *array
|
||||
databases* and are most commonly used for scientific datasets such as geospatial measurements
|
||||
(raster data on a regularly spaced grid), medical imaging, or observations from astronomical
|
||||
telescopes [[67](/en/ch3#Rusu2022)].
|
||||
telescopes [^67].
|
||||
Dataframes are also used in the financial industry for representing *time series data*, such as the
|
||||
prices of assets and trades over time
|
||||
[[68](/en/ch3#Targett2023)].
|
||||
[^68].
|
||||
|
||||
# Summary
|
||||
|
||||
|
|
@ -1757,7 +1757,7 @@ a few brief examples:
|
|||
means taking one very long string (representing a DNA molecule) and matching it against a large
|
||||
database of strings that are similar, but not identical. None of the databases described here can
|
||||
handle this kind of usage, which is why researchers have written specialized genome database
|
||||
software like GenBank [[69](/en/ch3#Benson2007)].
|
||||
software like GenBank [^69].
|
||||
* Many financial systems use *ledgers* with double-entry accounting as their data model. This type
|
||||
of data can be represented in relational databases, but there are also databases such as
|
||||
TigerBeetle that specialize in this data model. Cryptocurrencies and blockchains are typically
|
||||
|
|
@ -1771,361 +1771,78 @@ come into play when *implementing* the data models described in this chapter.
|
|||
|
||||
##### Footnotes
|
||||
|
||||
|
||||
##### References
|
||||
|
||||
[[1](/en/ch3#Brandon2024-marker)] Jamie Brandon.
|
||||
[Unexplanations:
|
||||
query optimization works because sql is declarative](https://www.scattered-thoughts.net/writing/unexplanations-sql-declarative/). *scattered-thoughts.net*, February 2024.
|
||||
Archived at [perma.cc/P6W2-WMFZ](https://perma.cc/P6W2-WMFZ)
|
||||
|
||||
[[2](/en/ch3#Hellerstein2010-marker)] Joseph M. Hellerstein.
|
||||
[The Declarative
|
||||
Imperative: Experiences and Conjectures in Distributed Logic](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-90.pdf). Tech report UCB/EECS-2010-90,
|
||||
Electrical Engineering and Computer Sciences, University of California at Berkeley, June 2010.
|
||||
Archived at [perma.cc/K56R-VVQM](https://perma.cc/K56R-VVQM)
|
||||
|
||||
[[3](/en/ch3#Codd1970-marker)] Edgar F. Codd.
|
||||
[A Relational Model of Data for Large
|
||||
Shared Data Banks](https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf). *Communications of the ACM*, volume 13, issue 6, pages 377–387, June 1970.
|
||||
[doi:10.1145/362384.362685](https://doi.org/10.1145/362384.362685)
|
||||
|
||||
[[4](/en/ch3#Stonebraker2005around-marker)] Michael Stonebraker and Joseph M. Hellerstein.
|
||||
[What Goes Around Comes Around](http://mitpress2.mit.edu/books/chapters/0262693143chapm1.pdf).
|
||||
In *Readings in Database Systems*, 4th edition, MIT Press, pages 2–41, 2005.
|
||||
ISBN: 9780262693141
|
||||
|
||||
[[5](/en/ch3#Winand2015-marker)] Markus Winand.
|
||||
[Modern SQL: Beyond Relational](https://modern-sql.com/). *modern-sql.com*, 2015.
|
||||
Archived at [perma.cc/D63V-WAPN](https://perma.cc/D63V-WAPN)
|
||||
|
||||
[[6](/en/ch3#Fowler2012-marker)] Martin Fowler.
|
||||
[OrmHate](https://martinfowler.com/bliki/OrmHate.html). *martinfowler.com*, May
|
||||
2012. Archived at [perma.cc/VCM8-PKNG](https://perma.cc/VCM8-PKNG)
|
||||
|
||||
[[7](/en/ch3#Mihalcea2023-marker)] Vlad Mihalcea.
|
||||
[N+1 query problem with JPA and Hibernate](https://vladmihalcea.com/n-plus-1-query-problem/).
|
||||
*vladmihalcea.com*, January 2023.
|
||||
Archived at [perma.cc/79EV-TZKB](https://perma.cc/79EV-TZKB)
|
||||
|
||||
[[8](/en/ch3#Schauder2023-marker)] Jens Schauder.
|
||||
[This
|
||||
is the Beginning of the End of the N+1 Problem: Introducing Single Query Loading](https://spring.io/blog/2023/08/31/this-is-the-beginning-of-the-end-of-the-n-1-problem-introducing-single-query). *spring.io*, August 2023.
|
||||
Archived at [perma.cc/6V96-R333](https://perma.cc/6V96-R333)
|
||||
|
||||
[[9](/en/ch3#Zola2014-marker)] William Zola.
|
||||
[6 Rules of
|
||||
Thumb for MongoDB Schema Design](https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design). *mongodb.com*, June 2014.
|
||||
Archived at [perma.cc/T2BZ-PPJB](https://perma.cc/T2BZ-PPJB)
|
||||
|
||||
[[10](/en/ch3#Andrews2023-marker)] Sidney Andrews and Christopher McClister.
|
||||
[Data modeling in
|
||||
Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/modeling-data). *learn.microsoft.com*, February 2023. Archived at
|
||||
[archive.org](https://web.archive.org/web/20230207193233/https%3A//learn.microsoft.com/en-us/azure/cosmos-db/nosql/modeling-data)
|
||||
|
||||
[[11](/en/ch3#Krikorian2012_ch3-marker)] Raffi Krikorian.
|
||||
[Timelines at Scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability/).
|
||||
At *QCon San Francisco*, November 2012.
|
||||
Archived at [perma.cc/V9G5-KLYK](https://perma.cc/V9G5-KLYK)
|
||||
|
||||
[[12](/en/ch3#Kimball2013_ch3-marker)] Ralph Kimball and Margy Ross.
|
||||
[*The Data
|
||||
Warehouse Toolkit: The Definitive Guide to Dimensional Modeling*](https://learning.oreilly.com/library/view/the-data-warehouse/9781118530801/),
|
||||
3rd edition. John Wiley & Sons, July 2013. ISBN: 9781118530801
|
||||
|
||||
[[13](/en/ch3#Kaminsky2022-marker)] Michael Kaminsky.
|
||||
[Data warehouse modeling: Star schema vs.
|
||||
OBT](https://www.fivetran.com/blog/star-schema-vs-obt). *fivetran.com*, August 2022.
|
||||
Archived at [perma.cc/2PZK-BFFP](https://perma.cc/2PZK-BFFP)
|
||||
|
||||
[[14](/en/ch3#Nelson2018-marker)] Joe Nelson.
|
||||
[User-defined Order in
|
||||
SQL](https://begriffs.com/posts/2018-03-20-user-defined-order.html). *begriffs.com*, March 2018.
|
||||
Archived at [perma.cc/GS3W-F7AD](https://perma.cc/GS3W-F7AD)
|
||||
|
||||
[[15](/en/ch3#Wallace2017-marker)] Evan Wallace.
|
||||
[Realtime Editing of
|
||||
Ordered Sequences](https://www.figma.com/blog/realtime-editing-of-ordered-sequences/). *figma.com*, March 2017.
|
||||
Archived at [perma.cc/K6ER-CQZW](https://perma.cc/K6ER-CQZW)
|
||||
|
||||
[[16](/en/ch3#Greenspan2020-marker)] David Greenspan.
|
||||
[Implementing
|
||||
Fractional Indexing](https://observablehq.com/%40dgreensp/implementing-fractional-indexing). *observablehq.com*, October 2020.
|
||||
Archived at [perma.cc/5N4R-MREN](https://perma.cc/5N4R-MREN)
|
||||
|
||||
[[17](/en/ch3#Schemaless-marker)] Martin Fowler.
|
||||
[Schemaless Data Structures](https://martinfowler.com/articles/schemaless/).
|
||||
*martinfowler.com*, January 2013.
|
||||
|
||||
[[18](/en/ch3#Awadallah2009-marker)] Amr Awadallah.
|
||||
[Schema-on-Read vs.
|
||||
Schema-on-Write](https://www.slideshare.net/awadallah/schemaonread-vs-schemaonwrite). At *Berkeley EECS RAD Lab Retreat*, Santa Cruz, CA, May 2009.
|
||||
Archived at [perma.cc/DTB2-JCFR](https://perma.cc/DTB2-JCFR)
|
||||
|
||||
[[19](/en/ch3#Odersky2013-marker)] Martin Odersky.
|
||||
[The Trouble with Types](https://www.infoq.com/presentations/data-types-issues/).
|
||||
At *Strange Loop*, September 2013.
|
||||
Archived at [perma.cc/85QE-PVEP](https://perma.cc/85QE-PVEP)
|
||||
|
||||
[[20](/en/ch3#Irwin2013-marker)] Conrad Irwin.
|
||||
[MongoDB—Confessions
|
||||
of a PostgreSQL Lover](https://speakerdeck.com/conradirwin/mongodb-confessions-of-a-postgresql-lover). At *HTML5DevConf*, October 2013.
|
||||
Archived at [perma.cc/C2J6-3AL5](https://perma.cc/C2J6-3AL5)
|
||||
|
||||
[[21](/en/ch3#Percona2023-marker)] [Percona
|
||||
Toolkit Documentation: pt-online-schema-change](https://docs.percona.com/percona-toolkit/pt-online-schema-change.html). *docs.percona.com*, 2023.
|
||||
Archived at [perma.cc/9K8R-E5UH](https://perma.cc/9K8R-E5UH)
|
||||
|
||||
[[22](/en/ch3#Noach2016-marker)] Shlomi Noach.
|
||||
[gh-ost:
|
||||
GitHub’s Online Schema Migration Tool for MySQL](https://github.blog/2016-08-01-gh-ost-github-s-online-migration-tool-for-mysql/). *github.blog*, August 2016.
|
||||
Archived at [perma.cc/7XAG-XB72](https://perma.cc/7XAG-XB72)
|
||||
|
||||
[[23](/en/ch3#Mukherjee2022-marker)] Shayon Mukherjee.
|
||||
[pg-osc:
|
||||
Zero downtime schema changes in PostgreSQL](https://www.shayon.dev/post/2022/47/pg-osc-zero-downtime-schema-changes-in-postgresql/). *shayon.dev*, February 2022.
|
||||
Archived at [perma.cc/35WN-7WMY](https://perma.cc/35WN-7WMY)
|
||||
|
||||
[[24](/en/ch3#PerezAradros2023-marker)] Carlos Pérez-Aradros Herce.
|
||||
[Introducing pgroll: zero-downtime,
|
||||
reversible, schema migrations for Postgres](https://xata.io/blog/pgroll-schema-migrations-postgres). *xata.io*, October 2023. Archived at
|
||||
[archive.org](https://web.archive.org/web/20231008161750/https%3A//xata.io/blog/pgroll-schema-migrations-postgres)
|
||||
|
||||
[[25](/en/ch3#Corbett2012_ch2-marker)] James C. Corbett, Jeffrey Dean, Michael
|
||||
Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher
|
||||
Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd,
|
||||
Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Dale Woodford,
|
||||
Yasushi Saito, Christopher Taylor, Michal Szymaniak, and Ruth Wang.
|
||||
[Spanner: Google’s Globally-Distributed Database](https://research.google/pubs/pub39966/).
|
||||
At *10th USENIX Symposium on Operating System Design and Implementation* (OSDI),
|
||||
October 2012.
|
||||
|
||||
[[26](/en/ch3#BurlesonCluster-marker)] Donald K. Burleson.
|
||||
[Reduce I/O with Oracle
|
||||
Cluster Tables](http://www.dba-oracle.com/oracle_tip_hash_index_cluster_table.htm). *dba-oracle.com*.
|
||||
Archived at [perma.cc/7LBJ-9X2C](https://perma.cc/7LBJ-9X2C)
|
||||
|
||||
[[27](/en/ch3#Chang2006_ch3-marker)] Fay Chang, Jeffrey Dean, Sanjay Ghemawat,
|
||||
Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber.
|
||||
[Bigtable: A Distributed Storage System for
|
||||
Structured Data](https://research.google/pubs/pub27898/). At *7th USENIX Symposium on Operating System Design and Implementation*
|
||||
(OSDI), November 2006.
|
||||
|
||||
[[28](/en/ch3#Walmsley2015-marker)] Priscilla Walmsley.
|
||||
[*XQuery,
|
||||
2nd Edition*](https://learning.oreilly.com/library/view/xquery-2nd-edition/9781491915080/). O’Reilly Media, December 2015. ISBN: 9781491915080
|
||||
|
||||
[[29](/en/ch3#Bryan2013-marker)] Paul C. Bryan, Kris Zyp, and Mark Nottingham.
|
||||
[JavaScript Object Notation (JSON) Pointer](https://www.rfc-editor.org/rfc/rfc6901).
|
||||
RFC 6901, IETF, April 2013.
|
||||
|
||||
[[30](/en/ch3#Goessner2024-marker)] Stefan Gössner, Glyn Normington, and Carsten Bormann.
|
||||
[JSONPath: Query Expressions for JSON](https://www.rfc-editor.org/rfc/rfc9535.html).
|
||||
RFC 9535, IETF, February 2024.
|
||||
|
||||
[[31](/en/ch3#Stonebraker2024-marker)] Michael Stonebraker and Andrew Pavlo.
|
||||
[What Goes Around Comes
|
||||
Around… And Around…](https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec2024.pdf). *ACM SIGMOD Record*, volume 53, issue 2, pages 21–37.
|
||||
[doi:10.1145/3685980.3685984](https://doi.org/10.1145/3685980.3685984)
|
||||
|
||||
[[32](/en/ch3#Page1999-marker)] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd.
|
||||
[The PageRank Citation Ranking: Bringing Order to the Web](http://ilpubs.stanford.edu:8090/422/).
|
||||
Technical Report 1999-66, Stanford University InfoLab, November 1999.
|
||||
Archived at [perma.cc/UML9-UZHW](https://perma.cc/UML9-UZHW)
|
||||
|
||||
[[33](/en/ch3#Bronson2013-marker)] Nathan Bronson, Zach Amsden, George Cabrera,
|
||||
Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li,
|
||||
Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, and Venkat Venkataramani.
|
||||
[TAO:
|
||||
Facebook’s Distributed Data Store for the Social Graph](https://www.usenix.org/conference/atc13/technical-sessions/presentation/bronson). At *USENIX Annual Technical
|
||||
Conference* (ATC), June 2013.
|
||||
|
||||
[[34](/en/ch3#Noy2019-marker)] Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan,
|
||||
Alan Patterson, and Jamie Taylor.
|
||||
[Industry-Scale
|
||||
Knowledge Graphs: Lessons and Challenges](https://cacm.acm.org/magazines/2019/8/238342-industry-scale-knowledge-graphs/fulltext). *Communications of the ACM*, volume 62, issue
|
||||
8, pages 36–43, August 2019.
|
||||
[doi:10.1145/3331166](https://doi.org/10.1145/3331166)
|
||||
|
||||
[[35](/en/ch3#Feng2023-marker)] Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu.
|
||||
[KÙZU Graph Database Management System](https://www.cidrdb.org/cidr2023/papers/p48-jin.pdf).
|
||||
At *13th Annual Conference on Innovative Data Systems Research* (CIDR 2023), January 2023.
|
||||
|
||||
[[36](/en/ch3#Besta2019-marker)] Maciej Besta, Emanuel Peter, Robert
|
||||
Gerstenberger, Marc Fischer, Michał Podstawski, Claude Barthels, Gustavo Alonso, and Torsten Hoefler.
|
||||
[Demystifying Graph Databases: Analysis and Taxonomy
|
||||
of Data Organization, System Designs, and Graph Queries](https://arxiv.org/pdf/1910.09017.pdf). *arxiv.org*, October 2019.
|
||||
|
||||
[[37](/en/ch3#TinkerPop2023-marker)] [Apache
|
||||
TinkerPop 3.6.3 Documentation](https://tinkerpop.apache.org/docs/3.6.3/reference/). *tinkerpop.apache.org*, May 2023.
|
||||
Archived at [perma.cc/KM7W-7PAT](https://perma.cc/KM7W-7PAT)
|
||||
|
||||
[[38](/en/ch3#Francis2018-marker)] Nadime Francis, Alastair Green, Paolo Guagliardo,
|
||||
Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and
|
||||
Andrés Taylor. [Cypher: An Evolving Query
|
||||
Language for Property Graphs](https://core.ac.uk/download/pdf/158372754.pdf). At *International Conference on Management of Data*
|
||||
(SIGMOD), pages 1433–1445, May 2018.
|
||||
[doi:10.1145/3183713.3190657](https://doi.org/10.1145/3183713.3190657)
|
||||
|
||||
[[39](/en/ch3#EifremTweet-marker)] Emil Eifrem.
|
||||
[Twitter correspondence](https://twitter.com/emileifrem/status/419107961512804352),
|
||||
January 2014. Archived at [perma.cc/WM4S-BW64](https://perma.cc/WM4S-BW64)
|
||||
|
||||
[[40](/en/ch3#Tisiot2021-marker)] Francesco Tisiot.
|
||||
[Explore
|
||||
the new SEARCH and CYCLE features in PostgreSQL® 14](https://aiven.io/blog/explore-the-new-search-and-cycle-features-in-postgresql-14). *aiven.io*, December 2021.
|
||||
Archived at [perma.cc/J6BT-83UZ](https://perma.cc/J6BT-83UZ)
|
||||
|
||||
[[41](/en/ch3#Goel2020-marker)] Gaurav Goel.
|
||||
[Understanding
|
||||
Hierarchies in Oracle](https://towardsdatascience.com/understanding-hierarchies-in-oracle-43f85561f3d9). *towardsdatascience.com*, May 2020.
|
||||
Archived at [perma.cc/5ZLR-Q7EW](https://perma.cc/5ZLR-Q7EW)
|
||||
|
||||
[[42](/en/ch3#Deutsch2022-marker)] Alin
|
||||
Deutsch, Nadime Francis, Alastair Green, Keith Hare, Bei Li, Leonid Libkin, Tobias Lindaaker, Victor
|
||||
Marsault, Wim Martens, Jan Michels, Filip Murlak, Stefan Plantikow, Petra Selmer, Oskar van Rest,
|
||||
Hannes Voigt, Domagoj Vrgoč, Mingxi Wu, and Fred Zemke.
|
||||
[Graph Pattern Matching in GQL and SQL/PGQ](https://arxiv.org/abs/2112.06217).
|
||||
At *International Conference on Management of Data* (SIGMOD), pages 2246–2258, June 2022.
|
||||
[doi:10.1145/3514221.3526057](https://doi.org/10.1145/3514221.3526057)
|
||||
|
||||
[[43](/en/ch3#Green2019-marker)] Alastair Green.
|
||||
[SQL... and now GQL](https://opencypher.org/articles/2019/09/12/SQL-and-now-GQL/).
|
||||
*opencypher.org*, September 2019.
|
||||
Archived at [perma.cc/AFB2-3SY7](https://perma.cc/AFB2-3SY7)
|
||||
|
||||
[[44](/en/ch3#Deutsch2018-marker)] Alin Deutsch, Yu Xu, and Mingxi Wu.
|
||||
[Seamless
|
||||
Syntactic and Semantic Integration of Query Primitives over Relational and Graph Data in GSQL](https://cdn2.hubspot.net/hubfs/4114546/IntegrationQuery%20PrimitivesGSQL.pdf).
|
||||
*tigergraph.com*, November 2018.
|
||||
Archived at [perma.cc/JG7J-Y35X](https://perma.cc/JG7J-Y35X)
|
||||
|
||||
[[45](/en/ch3#vanRest2016-marker)] Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming
|
||||
Meng, and Hassan Chafi. [PGQL: a property
|
||||
graph query language](https://event.cwi.nl/grades/2016/07-VanRest.pdf). At *4th International Workshop on Graph Data Management Experiences and
|
||||
Systems* (GRADES), June 2016.
|
||||
[doi:10.1145/2960414.2960421](https://doi.org/10.1145/2960414.2960421)
|
||||
|
||||
[[46](/en/ch3#NeptuneDataModel-marker)] Amazon Web Services.
|
||||
[Neptune
|
||||
Graph Data Model](https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html). Amazon Neptune User Guide, *docs.aws.amazon.com*.
|
||||
Archived at [perma.cc/CX3T-EZU9](https://perma.cc/CX3T-EZU9)
|
||||
|
||||
[[47](/en/ch3#DatomicDataModel-marker)] Cognitect.
|
||||
[Datomic Data Model](https://docs.datomic.com/cloud/whatis/data-model.html).
|
||||
Datomic Cloud Documentation, *docs.datomic.com*.
|
||||
Archived at [perma.cc/LGM9-LEUT](https://perma.cc/LGM9-LEUT)
|
||||
|
||||
[[48](/en/ch3#Beckett2011-marker)] David Beckett and Tim Berners-Lee.
|
||||
[Turtle – Terse RDF Triple Language](https://www.w3.org/TeamSubmission/turtle/).
|
||||
W3C Team Submission, March 2011.
|
||||
|
||||
[[49](/en/ch3#Target2018-marker)] Sinclair Target.
|
||||
[Whatever Happened to the Semantic
|
||||
Web?](https://twobithistory.org/2018/05/27/semantic-web.html) *twobithistory.org*, May 2018.
|
||||
Archived at [perma.cc/M8GL-9KHS](https://perma.cc/M8GL-9KHS)
|
||||
|
||||
[[50](/en/ch3#MendelGleason2022-marker)] Gavin Mendel-Gleason.
|
||||
[The Semantic Web is Dead – Long Live
|
||||
the Semantic Web!](https://terminusdb.com/blog/the-semantic-web-is-dead/) *terminusdb.com*, August 2022.
|
||||
Archived at [perma.cc/G2MZ-DSS3](https://perma.cc/G2MZ-DSS3)
|
||||
|
||||
[[51](/en/ch3#Sporny2014-marker)] Manu Sporny.
|
||||
[JSON-LD and Why I Hate the Semantic Web](http://manu.sporny.org/2014/json-ld-origins-2/).
|
||||
*manu.sporny.org*, January 2014.
|
||||
Archived at [perma.cc/7PT4-PJKF](https://perma.cc/7PT4-PJKF)
|
||||
|
||||
[[52](/en/ch3#MichiganOntologies-marker)] University of Michigan Library.
|
||||
[Biomedical Ontologies and Controlled Vocabularies](https://guides.lib.umich.edu/ontology),
|
||||
*guides.lib.umich.edu/ontology*.
|
||||
Archived at [perma.cc/Q5GA-F2N8](https://perma.cc/Q5GA-F2N8)
|
||||
|
||||
[[53](/en/ch3#OpenGraph-marker)] Facebook.
|
||||
[The Open Graph protocol](https://ogp.me/), *ogp.me*.
|
||||
Archived at [perma.cc/C49A-GUSY](https://perma.cc/C49A-GUSY)
|
||||
|
||||
[[54](/en/ch3#Haughey2015-marker)] Matt Haughey.
|
||||
[Everything
|
||||
you ever wanted to know about unfurling but were afraid to ask /or/ How to make your site previews
|
||||
look amazing in Slack](https://medium.com/slack-developer-blog/everything-you-ever-wanted-to-know-about-unfurling-but-were-afraid-to-ask-or-how-to-make-your-e64b4bb9254). *medium.com*, November 2015.
|
||||
Archived at [perma.cc/C7S8-4PZN](https://perma.cc/C7S8-4PZN)
|
||||
|
||||
[[55](/en/ch3#W3CRDF-marker)] W3C RDF Working Group.
|
||||
[Resource Description Framework (RDF)](https://www.w3.org/RDF/).
|
||||
*w3.org*, February 2004.
|
||||
|
||||
[[56](/en/ch3#Harris2013-marker)] Steve Harris, Andy Seaborne, and Eric
|
||||
Prud’hommeaux. [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/).
|
||||
W3C Recommendation, March 2013.
|
||||
|
||||
[[57](/en/ch3#Green2013-marker)] Todd J. Green, Shan Shan Huang, Boon Thau Loo, and Wenchao Zhou.
|
||||
[Datalog and Recursive
|
||||
Query Processing](http://blogs.evergreen.edu/sosw/files/2014/04/Green-Vol5-DBS-017.pdf). *Foundations and Trends in Databases*, volume 5, issue 2, pages 105–195,
|
||||
November 2013. [doi:10.1561/1900000017](https://doi.org/10.1561/1900000017)
|
||||
|
||||
[[58](/en/ch3#Ceri1989-marker)] Stefano Ceri, Georg Gottlob, and Letizia Tanca.
|
||||
[What
|
||||
You Always Wanted to Know About Datalog (And Never Dared to Ask)](https://www.researchgate.net/profile/Letizia_Tanca/publication/3296132_What_you_always_wanted_to_know_about_Datalog_and_never_dared_to_ask/links/0fcfd50ca2d20473ca000000.pdf). *IEEE Transactions on
|
||||
Knowledge and Data Engineering*, volume 1, issue 1, pages 146–166, March 1989.
|
||||
[doi:10.1109/69.43410](https://doi.org/10.1109/69.43410)
|
||||
|
||||
[[59](/en/ch3#Abiteboul1995-marker)] Serge Abiteboul, Richard Hull, and Victor Vianu.
|
||||
[*Foundations of Databases*](http://webdam.inria.fr/Alice/). Addison-Wesley, 1995.
|
||||
ISBN: 9780201537710, available online at
|
||||
[*webdam.inria.fr/Alice*](http://webdam.inria.fr/Alice/)
|
||||
|
||||
[[60](/en/ch3#Meyer2020-marker)] Scott Meyer, Andrew Carter, and Andrew Rodriguez.
|
||||
[LIquid:
|
||||
The soul of a new graph database, Part 2](https://engineering.linkedin.com/blog/2020/liquid--the-soul-of-a-new-graph-database--part-2). *engineering.linkedin.com*, September 2020.
|
||||
Archived at [perma.cc/K9M4-PD6Q](https://perma.cc/K9M4-PD6Q)
|
||||
|
||||
[[61](/en/ch3#Bessey2024-marker)] Matt Bessey.
|
||||
[Why, after 6 years, I’m over
|
||||
GraphQL](https://bessey.dev/blog/2024/05/24/why-im-over-graphql/). *bessey.dev*, May 2024. Archived at
|
||||
[perma.cc/2PAU-JYRA](https://perma.cc/2PAU-JYRA)
|
||||
|
||||
[[62](/en/ch3#Betts2012-marker)] Dominic Betts, Julián
|
||||
Domínguez, Grigori Melnik, Fernando Simonazzi, and Mani Subramanian.
|
||||
[*Exploring
|
||||
CQRS and Event Sourcing*](https://learn.microsoft.com/en-us/previous-versions/msp-n-p/jj554200%28v%3Dpandp.10%29). Microsoft Patterns & Practices, July 2012.
|
||||
ISBN: 1621140164, archived at [perma.cc/7A39-3NM8](https://perma.cc/7A39-3NM8)
|
||||
|
||||
[[63](/en/ch3#Young2014-marker)] Greg Young.
|
||||
[CQRS and Event Sourcing](https://www.youtube.com/watch?v=JHGkaShoyNs). At *Code on
|
||||
the Beach*, August 2014.
|
||||
|
||||
[[64](/en/ch3#Young2010-marker)] Greg Young.
|
||||
[CQRS Documents](https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf).
|
||||
*cqrs.wordpress.com*, November 2010.
|
||||
Archived at [perma.cc/X5R6-R47F](https://perma.cc/X5R6-R47F)
|
||||
|
||||
[[65](/en/ch3#Petersohn2020-marker)] Devin Petersohn, Stephen Macke, Doris
|
||||
Xin, William Ma, Doris Lee, Xiangxi Mo, Joseph E. Gonzalez, Joseph M. Hellerstein, Anthony D.
|
||||
Joseph, and Aditya Parameswaran.
|
||||
[Towards Scalable Dataframe Systems](https://www.vldb.org/pvldb/vol13/p2033-petersohn.pdf).
|
||||
*Proceedings of the VLDB Endowment*, volume 13, issue 11, pages 2033–2046.
|
||||
[doi:10.14778/3407790.3407807](https://doi.org/10.14778/3407790.3407807)
|
||||
|
||||
[[66](/en/ch3#Papadopoulos2016-marker)] Stavros Papadopoulos, Kushal Datta, Samuel
|
||||
Madden, and Timothy Mattson.
|
||||
[The TileDB Array Data Storage Manager](https://www.vldb.org/pvldb/vol10/p349-papadopoulos.pdf).
|
||||
*Proceedings of the VLDB Endowment*, volume 10, issue 4, pages 349–360, November 2016.
|
||||
[doi:10.14778/3025111.3025117](https://doi.org/10.14778/3025111.3025117)
|
||||
|
||||
[[67](/en/ch3#Rusu2022-marker)] Florin Rusu.
|
||||
[Multidimensional
|
||||
Array Data Management](https://faculty.ucmerced.edu/frusu/Papers/Report/2022-09-fntdb-arrays.pdf). *Foundations and Trends in Databases*, volume 12, numbers 2–3,
|
||||
pages 69–220, February 2023.
|
||||
[doi:10.1561/1900000069](https://doi.org/10.1561/1900000069)
|
||||
|
||||
[[68](/en/ch3#Targett2023-marker)] Ed Targett.
|
||||
[Bloomberg,
|
||||
Man Group team up to develop open source “ArcticDB” database](https://www.thestack.technology/bloomberg-man-group-arcticdb-database-dataframe/). *thestack.technology*,
|
||||
March 2023. Archived at [perma.cc/M5YD-QQYV](https://perma.cc/M5YD-QQYV)
|
||||
|
||||
[[69](/en/ch3#Benson2007-marker)] Dennis A. Benson, Ilene
|
||||
Karsch-Mizrachi, David J. Lipman, James Ostell, and David L. Wheeler.
|
||||
[GenBank](https://academic.oup.com/nar/article/36/suppl_1/D25/2507746).
|
||||
*Nucleic Acids Research*, volume 36, database issue, pages D25–D30, December 2007.
|
||||
[doi:10.1093/nar/gkm929](https://doi.org/10.1093/nar/gkm929)
|
||||
|
||||
|
||||
[^1]: Jamie Brandon. [Unexplanations: query optimization works because sql is declarative](https://www.scattered-thoughts.net/writing/unexplanations-sql-declarative/). *scattered-thoughts.net*, February 2024. Archived at [perma.cc/P6W2-WMFZ](https://perma.cc/P6W2-WMFZ)
|
||||
[^2]: Joseph M. Hellerstein. [The Declarative Imperative: Experiences and Conjectures in Distributed Logic](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-90.pdf). Tech report UCB/EECS-2010-90, Electrical Engineering and Computer Sciences, University of California at Berkeley, June 2010. Archived at [perma.cc/K56R-VVQM](https://perma.cc/K56R-VVQM)
|
||||
[^3]: Edgar F. Codd. [A Relational Model of Data for Large Shared Data Banks](https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf). *Communications of the ACM*, volume 13, issue 6, pages 377–387, June 1970. [doi:10.1145/362384.362685](https://doi.org/10.1145/362384.362685)
|
||||
[^4]: Michael Stonebraker and Joseph M. Hellerstein. [What Goes Around Comes Around](http://mitpress2.mit.edu/books/chapters/0262693143chapm1.pdf). In *Readings in Database Systems*, 4th edition, MIT Press, pages 2–41, 2005. ISBN: 9780262693141
|
||||
[^5]: Markus Winand. [Modern SQL: Beyond Relational](https://modern-sql.com/). *modern-sql.com*, 2015. Archived at [perma.cc/D63V-WAPN](https://perma.cc/D63V-WAPN)
|
||||
[^6]: Martin Fowler. [OrmHate](https://martinfowler.com/bliki/OrmHate.html). *martinfowler.com*, May 2012. Archived at [perma.cc/VCM8-PKNG](https://perma.cc/VCM8-PKNG)
|
||||
[^7]: Vlad Mihalcea. [N+1 query problem with JPA and Hibernate](https://vladmihalcea.com/n-plus-1-query-problem/). *vladmihalcea.com*, January 2023. Archived at [perma.cc/79EV-TZKB](https://perma.cc/79EV-TZKB)
|
||||
[^8]: Jens Schauder. [This is the Beginning of the End of the N+1 Problem: Introducing Single Query Loading](https://spring.io/blog/2023/08/31/this-is-the-beginning-of-the-end-of-the-n-1-problem-introducing-single-query). *spring.io*, August 2023. Archived at [perma.cc/6V96-R333](https://perma.cc/6V96-R333)
|
||||
[^9]: William Zola. [6 Rules of Thumb for MongoDB Schema Design](https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design). *mongodb.com*, June 2014. Archived at [perma.cc/T2BZ-PPJB](https://perma.cc/T2BZ-PPJB)
|
||||
[^10]: Sidney Andrews and Christopher McClister. [Data modeling in Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/modeling-data). *learn.microsoft.com*, February 2023. Archived at [archive.org](https://web.archive.org/web/20230207193233/https%3A//learn.microsoft.com/en-us/azure/cosmos-db/nosql/modeling-data)
|
||||
[^11]: Raffi Krikorian. [Timelines at Scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability/). At *QCon San Francisco*, November 2012. Archived at [perma.cc/V9G5-KLYK](https://perma.cc/V9G5-KLYK)
|
||||
[^12]: Ralph Kimball and Margy Ross. [*The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling*](https://learning.oreilly.com/library/view/the-data-warehouse/9781118530801/), 3rd edition. John Wiley & Sons, July 2013. ISBN: 9781118530801
|
||||
[^13]: Michael Kaminsky. [Data warehouse modeling: Star schema vs. OBT](https://www.fivetran.com/blog/star-schema-vs-obt). *fivetran.com*, August 2022. Archived at [perma.cc/2PZK-BFFP](https://perma.cc/2PZK-BFFP)
|
||||
[^14]: Joe Nelson. [User-defined Order in SQL](https://begriffs.com/posts/2018-03-20-user-defined-order.html). *begriffs.com*, March 2018. Archived at [perma.cc/GS3W-F7AD](https://perma.cc/GS3W-F7AD)
|
||||
[^15]: Evan Wallace. [Realtime Editing of Ordered Sequences](https://www.figma.com/blog/realtime-editing-of-ordered-sequences/). *figma.com*, March 2017. Archived at [perma.cc/K6ER-CQZW](https://perma.cc/K6ER-CQZW)
|
||||
[^16]: David Greenspan. [Implementing Fractional Indexing](https://observablehq.com/%40dgreensp/implementing-fractional-indexing). *observablehq.com*, October 2020. Archived at [perma.cc/5N4R-MREN](https://perma.cc/5N4R-MREN)
|
||||
[^17]: Martin Fowler. [Schemaless Data Structures](https://martinfowler.com/articles/schemaless/). *martinfowler.com*, January 2013.
|
||||
[^18]: Amr Awadallah. [Schema-on-Read vs. Schema-on-Write](https://www.slideshare.net/awadallah/schemaonread-vs-schemaonwrite). At *Berkeley EECS RAD Lab Retreat*, Santa Cruz, CA, May 2009. Archived at [perma.cc/DTB2-JCFR](https://perma.cc/DTB2-JCFR)
|
||||
[^19]: Martin Odersky. [The Trouble with Types](https://www.infoq.com/presentations/data-types-issues/). At *Strange Loop*, September 2013. Archived at [perma.cc/85QE-PVEP](https://perma.cc/85QE-PVEP)
|
||||
[^20]: Conrad Irwin. [MongoDB—Confessions of a PostgreSQL Lover](https://speakerdeck.com/conradirwin/mongodb-confessions-of-a-postgresql-lover). At *HTML5DevConf*, October 2013. Archived at [perma.cc/C2J6-3AL5](https://perma.cc/C2J6-3AL5)
|
||||
[^21]: [Percona Toolkit Documentation: pt-online-schema-change](https://docs.percona.com/percona-toolkit/pt-online-schema-change.html). *docs.percona.com*, 2023. Archived at [perma.cc/9K8R-E5UH](https://perma.cc/9K8R-E5UH)
|
||||
[^22]: Shlomi Noach. [gh-ost: GitHub’s Online Schema Migration Tool for MySQL](https://github.blog/2016-08-01-gh-ost-github-s-online-migration-tool-for-mysql/). *github.blog*, August 2016. Archived at [perma.cc/7XAG-XB72](https://perma.cc/7XAG-XB72)
|
||||
[^23]: Shayon Mukherjee. [pg-osc: Zero downtime schema changes in PostgreSQL](https://www.shayon.dev/post/2022/47/pg-osc-zero-downtime-schema-changes-in-postgresql/). *shayon.dev*, February 2022. Archived at [perma.cc/35WN-7WMY](https://perma.cc/35WN-7WMY)
|
||||
[^24]: Carlos Pérez-Aradros Herce. [Introducing pgroll: zero-downtime, reversible, schema migrations for Postgres](https://xata.io/blog/pgroll-schema-migrations-postgres). *xata.io*, October 2023. Archived at [archive.org](https://web.archive.org/web/20231008161750/https%3A//xata.io/blog/pgroll-schema-migrations-postgres)
|
||||
[^25]: James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Dale Woodford, Yasushi Saito, Christopher Taylor, Michal Szymaniak, and Ruth Wang. [Spanner: Google’s Globally-Distributed Database](https://research.google/pubs/pub39966/). At *10th USENIX Symposium on Operating System Design and Implementation* (OSDI), October 2012.
|
||||
[^26]: Donald K. Burleson. [Reduce I/O with Oracle Cluster Tables](http://www.dba-oracle.com/oracle_tip_hash_index_cluster_table.htm). *dba-oracle.com*. Archived at [perma.cc/7LBJ-9X2C](https://perma.cc/7LBJ-9X2C)
|
||||
[^27]: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. [Bigtable: A Distributed Storage System for Structured Data](https://research.google/pubs/pub27898/). At *7th USENIX Symposium on Operating System Design and Implementation* (OSDI), November 2006.
|
||||
[^28]: Priscilla Walmsley. [*XQuery, 2nd Edition*](https://learning.oreilly.com/library/view/xquery-2nd-edition/9781491915080/). O’Reilly Media, December 2015. ISBN: 9781491915080
|
||||
[^29]: Paul C. Bryan, Kris Zyp, and Mark Nottingham. [JavaScript Object Notation (JSON) Pointer](https://www.rfc-editor.org/rfc/rfc6901). RFC 6901, IETF, April 2013.
|
||||
[^30]: Stefan Gössner, Glyn Normington, and Carsten Bormann. [JSONPath: Query Expressions for JSON](https://www.rfc-editor.org/rfc/rfc9535.html). RFC 9535, IETF, February 2024.
|
||||
[^31]: Michael Stonebraker and Andrew Pavlo. [What Goes Around Comes Around… And Around…](https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec2024.pdf). *ACM SIGMOD Record*, volume 53, issue 2, pages 21–37. [doi:10.1145/3685980.3685984](https://doi.org/10.1145/3685980.3685984)
|
||||
[^32]: Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. [The PageRank Citation Ranking: Bringing Order to the Web](http://ilpubs.stanford.edu:8090/422/). Technical Report 1999-66, Stanford University InfoLab, November 1999. Archived at [perma.cc/UML9-UZHW](https://perma.cc/UML9-UZHW)
|
||||
[^33]: Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, and Venkat Venkataramani. [TAO: Facebook’s Distributed Data Store for the Social Graph](https://www.usenix.org/conference/atc13/technical-sessions/presentation/bronson). At *USENIX Annual Technical Conference* (ATC), June 2013.
|
||||
[^34]: Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, and Jamie Taylor. [Industry-Scale Knowledge Graphs: Lessons and Challenges](https://cacm.acm.org/magazines/2019/8/238342-industry-scale-knowledge-graphs/fulltext). *Communications of the ACM*, volume 62, issue 8, pages 36–43, August 2019. [doi:10.1145/3331166](https://doi.org/10.1145/3331166)
|
||||
[^35]: Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu. [KÙZU Graph Database Management System](https://www.cidrdb.org/cidr2023/papers/p48-jin.pdf). At *13th Annual Conference on Innovative Data Systems Research* (CIDR 2023), January 2023.
|
||||
[^36]: Maciej Besta, Emanuel Peter, Robert Gerstenberger, Marc Fischer, Michał Podstawski, Claude Barthels, Gustavo Alonso, and Torsten Hoefler. [Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries](https://arxiv.org/pdf/1910.09017.pdf). *arxiv.org*, October 2019.
|
||||
[^37]: [Apache TinkerPop 3.6.3 Documentation](https://tinkerpop.apache.org/docs/3.6.3/reference/). *tinkerpop.apache.org*, May 2023. Archived at [perma.cc/KM7W-7PAT](https://perma.cc/KM7W-7PAT)
|
||||
[^38]: Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor. [Cypher: An Evolving Query Language for Property Graphs](https://core.ac.uk/download/pdf/158372754.pdf). At *International Conference on Management of Data* (SIGMOD), pages 1433–1445, May 2018. [doi:10.1145/3183713.3190657](https://doi.org/10.1145/3183713.3190657)
|
||||
[^39]: Emil Eifrem. [Twitter correspondence](https://twitter.com/emileifrem/status/419107961512804352), January 2014. Archived at [perma.cc/WM4S-BW64](https://perma.cc/WM4S-BW64)
|
||||
[^40]: Francesco Tisiot. [Explore the new SEARCH and CYCLE features in PostgreSQL® 14](https://aiven.io/blog/explore-the-new-search-and-cycle-features-in-postgresql-14). *aiven.io*, December 2021. Archived at [perma.cc/J6BT-83UZ](https://perma.cc/J6BT-83UZ)
|
||||
[^41]: Gaurav Goel. [Understanding Hierarchies in Oracle](https://towardsdatascience.com/understanding-hierarchies-in-oracle-43f85561f3d9). *towardsdatascience.com*, May 2020. Archived at [perma.cc/5ZLR-Q7EW](https://perma.cc/5ZLR-Q7EW)
|
||||
[^42]: Alin Deutsch, Nadime Francis, Alastair Green, Keith Hare, Bei Li, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Wim Martens, Jan Michels, Filip Murlak, Stefan Plantikow, Petra Selmer, Oskar van Rest, Hannes Voigt, Domagoj Vrgoč, Mingxi Wu, and Fred Zemke. [Graph Pattern Matching in GQL and SQL/PGQ](https://arxiv.org/abs/2112.06217). At *International Conference on Management of Data* (SIGMOD), pages 2246–2258, June 2022. [doi:10.1145/3514221.3526057](https://doi.org/10.1145/3514221.3526057)
|
||||
[^43]: Alastair Green. [SQL... and now GQL](https://opencypher.org/articles/2019/09/12/SQL-and-now-GQL/). *opencypher.org*, September 2019. Archived at [perma.cc/AFB2-3SY7](https://perma.cc/AFB2-3SY7)
|
||||
[^44]: Alin Deutsch, Yu Xu, and Mingxi Wu. [Seamless Syntactic and Semantic Integration of Query Primitives over Relational and Graph Data in GSQL](https://cdn2.hubspot.net/hubfs/4114546/IntegrationQuery%20PrimitivesGSQL.pdf). *tigergraph.com*, November 2018. Archived at [perma.cc/JG7J-Y35X](https://perma.cc/JG7J-Y35X)
|
||||
[^45]: Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming Meng, and Hassan Chafi. [PGQL: a property graph query language](https://event.cwi.nl/grades/2016/07-VanRest.pdf). At *4th International Workshop on Graph Data Management Experiences and Systems* (GRADES), June 2016. [doi:10.1145/2960414.2960421](https://doi.org/10.1145/2960414.2960421)
|
||||
[^46]: Amazon Web Services. [Neptune Graph Data Model](https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html). Amazon Neptune User Guide, *docs.aws.amazon.com*. Archived at [perma.cc/CX3T-EZU9](https://perma.cc/CX3T-EZU9)
|
||||
[^47]: Cognitect. [Datomic Data Model](https://docs.datomic.com/cloud/whatis/data-model.html). Datomic Cloud Documentation, *docs.datomic.com*. Archived at [perma.cc/LGM9-LEUT](https://perma.cc/LGM9-LEUT)
|
||||
[^48]: David Beckett and Tim Berners-Lee. [Turtle – Terse RDF Triple Language](https://www.w3.org/TeamSubmission/turtle/). W3C Team Submission, March 2011.
|
||||
[^49]: Sinclair Target. [Whatever Happened to the Semantic Web?](https://twobithistory.org/2018/05/27/semantic-web.html) *twobithistory.org*, May 2018. Archived at [perma.cc/M8GL-9KHS](https://perma.cc/M8GL-9KHS)
|
||||
[^50]: Gavin Mendel-Gleason. [The Semantic Web is Dead – Long Live the Semantic Web!](https://terminusdb.com/blog/the-semantic-web-is-dead/) *terminusdb.com*, August 2022. Archived at [perma.cc/G2MZ-DSS3](https://perma.cc/G2MZ-DSS3)
|
||||
[^51]: Manu Sporny. [JSON-LD and Why I Hate the Semantic Web](http://manu.sporny.org/2014/json-ld-origins-2/). *manu.sporny.org*, January 2014. Archived at [perma.cc/7PT4-PJKF](https://perma.cc/7PT4-PJKF)
|
||||
[^52]: University of Michigan Library. [Biomedical Ontologies and Controlled Vocabularies](https://guides.lib.umich.edu/ontology), *guides.lib.umich.edu/ontology*. Archived at [perma.cc/Q5GA-F2N8](https://perma.cc/Q5GA-F2N8)
|
||||
[^53]: Facebook. [The Open Graph protocol](https://ogp.me/), *ogp.me*. Archived at [perma.cc/C49A-GUSY](https://perma.cc/C49A-GUSY)
|
||||
[^54]: Matt Haughey. [Everything you ever wanted to know about unfurling but were afraid to ask /or/ How to make your site previews look amazing in Slack](https://medium.com/slack-developer-blog/everything-you-ever-wanted-to-know-about-unfurling-but-were-afraid-to-ask-or-how-to-make-your-e64b4bb9254). *medium.com*, November 2015. Archived at [perma.cc/C7S8-4PZN](https://perma.cc/C7S8-4PZN)
|
||||
[^55]: W3C RDF Working Group. [Resource Description Framework (RDF)](https://www.w3.org/RDF/). *w3.org*, February 2004.
|
||||
[^56]: Steve Harris, Andy Seaborne, and Eric Prud’hommeaux. [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/). W3C Recommendation, March 2013.
|
||||
[^57]: Todd J. Green, Shan Shan Huang, Boon Thau Loo, and Wenchao Zhou. [Datalog and Recursive Query Processing](http://blogs.evergreen.edu/sosw/files/2014/04/Green-Vol5-DBS-017.pdf). *Foundations and Trends in Databases*, volume 5, issue 2, pages 105–195, November 2013. [doi:10.1561/1900000017](https://doi.org/10.1561/1900000017)
|
||||
[^58]: Stefano Ceri, Georg Gottlob, and Letizia Tanca. [What You Always Wanted to Know About Datalog (And Never Dared to Ask)](https://www.researchgate.net/profile/Letizia_Tanca/publication/3296132_What_you_always_wanted_to_know_about_Datalog_and_never_dared_to_ask/links/0fcfd50ca2d20473ca000000.pdf). *IEEE Transactions on Knowledge and Data Engineering*, volume 1, issue 1, pages 146–166, March 1989. [doi:10.1109/69.43410](https://doi.org/10.1109/69.43410)
|
||||
[^59]: Serge Abiteboul, Richard Hull, and Victor Vianu. [*Foundations of Databases*](http://webdam.inria.fr/Alice/). Addison-Wesley, 1995. ISBN: 9780201537710, available online at [*webdam.inria.fr/Alice*](http://webdam.inria.fr/Alice/)
|
||||
[^60]: Scott Meyer, Andrew Carter, and Andrew Rodriguez. [LIquid: The soul of a new graph database, Part 2](https://engineering.linkedin.com/blog/2020/liquid--the-soul-of-a-new-graph-database--part-2). *engineering.linkedin.com*, September 2020. Archived at [perma.cc/K9M4-PD6Q](https://perma.cc/K9M4-PD6Q)
|
||||
[^61]: Matt Bessey. [Why, after 6 years, I’m over GraphQL](https://bessey.dev/blog/2024/05/24/why-im-over-graphql/). *bessey.dev*, May 2024. Archived at [perma.cc/2PAU-JYRA](https://perma.cc/2PAU-JYRA)
|
||||
[^62]: Dominic Betts, Julián Domínguez, Grigori Melnik, Fernando Simonazzi, and Mani Subramanian. [*Exploring CQRS and Event Sourcing*](https://learn.microsoft.com/en-us/previous-versions/msp-n-p/jj554200%28v%3Dpandp.10%29). Microsoft Patterns & Practices, July 2012. ISBN: 1621140164, archived at [perma.cc/7A39-3NM8](https://perma.cc/7A39-3NM8)
|
||||
[^63]: Greg Young. [CQRS and Event Sourcing](https://www.youtube.com/watch?v=JHGkaShoyNs). At *Code on the Beach*, August 2014.
|
||||
[^64]: Greg Young. [CQRS Documents](https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf). *cqrs.wordpress.com*, November 2010. Archived at [perma.cc/X5R6-R47F](https://perma.cc/X5R6-R47F)
|
||||
[^65]: Devin Petersohn, Stephen Macke, Doris Xin, William Ma, Doris Lee, Xiangxi Mo, Joseph E. Gonzalez, Joseph M. Hellerstein, Anthony D. Joseph, and Aditya Parameswaran. [Towards Scalable Dataframe Systems](https://www.vldb.org/pvldb/vol13/p2033-petersohn.pdf). *Proceedings of the VLDB Endowment*, volume 13, issue 11, pages 2033–2046. [doi:10.14778/3407790.3407807](https://doi.org/10.14778/3407790.3407807)
|
||||
[^66]: Stavros Papadopoulos, Kushal Datta, Samuel Madden, and Timothy Mattson. [The TileDB Array Data Storage Manager](https://www.vldb.org/pvldb/vol10/p349-papadopoulos.pdf). *Proceedings of the VLDB Endowment*, volume 10, issue 4, pages 349–360, November 2016. [doi:10.14778/3025111.3025117](https://doi.org/10.14778/3025111.3025117)
|
||||
[^67]: Florin Rusu. [Multidimensional Array Data Management](https://faculty.ucmerced.edu/frusu/Papers/Report/2022-09-fntdb-arrays.pdf). *Foundations and Trends in Databases*, volume 12, numbers 2–3, pages 69–220, February 2023. [doi:10.1561/1900000069](https://doi.org/10.1561/1900000069)
|
||||
[^68]: Ed Targett. [Bloomberg, Man Group team up to develop open source “ArcticDB” database](https://www.thestack.technology/bloomberg-man-group-arcticdb-database-dataframe/). *thestack.technology*, March 2023. Archived at [perma.cc/M5YD-QQYV](https://perma.cc/M5YD-QQYV)
|
||||
[^69]: Dennis A. Benson, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell, and David L. Wheeler. [GenBank](https://academic.oup.com/nar/article/36/suppl_1/D25/2507746). *Nucleic Acids Research*, volume 36, database issue, pages D25–D30, December 2007. [doi:10.1093/nar/gkm929](https://doi.org/10.1093/nar/gkm929)
|
||||
File diff suppressed because it is too large
|
|
@ -119,17 +119,17 @@ restored with minimal additional code. However, they also have a number of deep
|
|||
integrating your systems with those of other organizations (which may use different languages).
|
||||
* In order to restore data in the same object types, the decoding process needs to be able to
|
||||
instantiate arbitrary classes. This is frequently a source of security problems
|
||||
[[1](/en/ch5#CWE502)]:
|
||||
[^1]:
|
||||
if an attacker can get your application to decode an arbitrary byte sequence, they can instantiate
|
||||
arbitrary classes, which in turn often allows them to do terrible things such as remotely
|
||||
executing arbitrary code [[2](/en/ch5#Breen2015),
|
||||
[3](/en/ch5#McKenzie2013)].
|
||||
* Versioning data is often an afterthought in these libraries: as they are intended for quick and
|
||||
easy encoding of data, they often neglect the inconvenient problems of forward and backward
|
||||
compatibility [[4](/en/ch5#Goetz2019)].
|
||||
compatibility [^4].
|
||||
* Efficiency (CPU time taken to encode or decode, and the size of the encoded structure) is also
|
||||
often an afterthought. For example, Java’s built-in serialization is notorious for its bad
|
||||
performance and bloated encoding [[5](/en/ch5#JvmSerializers)].
|
||||
performance and bloated encoding [^5].
|
||||
|
||||
For these reasons it’s generally a bad idea to use your language’s built-in encoding for anything
|
||||
other than very transient purposes.
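
As a minimal sketch of the security bullet above, the following uses Python's built-in `pickle` module: decoding a byte sequence calls whatever callable that sequence names. The `Sneaky` class and the harmless `eval("6 * 7")` payload are made up for illustration; a real attacker would name something far more damaging.

```python
import json
import pickle


class Sneaky:
    """Hypothetical class whose pickled form names a callable to run on decode."""

    def __reduce__(self):
        # pickle stores (callable, args) and calls callable(*args) when loading.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())

# Decoding does not return a Sneaky instance; it runs eval("6 * 7").
decoded = pickle.loads(payload)

# A data-only format such as JSON can only ever produce plain values.
plain = json.loads('{"answer": 42}')
```

Here `decoded` is the result of the evaluated expression rather than an object of the original class, which is why decoding untrusted bytes with such a library is risky.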
|
||||
|
|
@ -139,7 +139,7 @@ other than very transient purposes.
|
|||
When moving to standardized encodings that can be written and read by many programming languages, JSON
|
||||
and XML are the obvious contenders. They are widely known, widely supported, and almost as widely
|
||||
disliked. XML is often criticized for being too verbose and unnecessarily complicated
|
||||
[[6](/en/ch5#XMLSExp)].
|
||||
[^6].
|
||||
JSON’s popularity is mainly due to its built-in support in web browsers and simplicity relative to
|
||||
XML. CSV is another popular language-independent format, but it only supports tabular data without
|
||||
nesting.
|
||||
|
|
@ -156,11 +156,11 @@ problems:
|
|||
This is a problem when dealing with large numbers; for example, integers greater than 2^53 cannot
|
||||
be exactly represented in an IEEE 754 double-precision floating-point number, so such numbers become
|
||||
inaccurate when parsed in a language that uses floating-point numbers, such as JavaScript
|
||||
[[7](/en/ch5#Evans2023)].
|
||||
[^7].
|
||||
An example of numbers larger than 2^53 occurs on X (formerly Twitter), which uses a 64-bit number to
|
||||
identify each post. The JSON returned by the API includes post IDs twice, once as a JSON number and
|
||||
once as a decimal string, to work around the fact that the numbers are not correctly parsed by
|
||||
JavaScript applications [[8](/en/ch5#Harris2010)].
|
||||
JavaScript applications [^8].
|
||||
* JSON and XML have good support for Unicode character strings (i.e., human-readable text), but they
|
||||
don’t support binary strings (sequences of bytes without a character encoding). Binary strings are a
|
||||
useful feature, so people get around this limitation by encoding the binary data as text using
|
||||
|
|
@ -174,7 +174,7 @@ problems:
|
|||
column. If an application change adds a new row or column, you have to handle that change manually.
|
||||
CSV is also a quite vague format (what happens if a value contains a comma or a newline character?).
|
||||
Although its escaping rules have been formally specified
|
||||
[[9](/en/ch5#Shafranovich2005)],
|
||||
[^9],
|
||||
not all parsers implement them correctly.
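
To make the comma-and-newline question concrete, here is a small sketch using Python's standard `csv` module, which quotes such values when writing and unquotes them when reading. The sample rows are invented for illustration, and the naive `split(",")` stands in for a parser that ignores the escaping rules.

```python
import csv
import io

rows = [
    ["id", "comment"],
    ["1", 'said "hi", then left'],   # contains a comma and double quotes
    ["2", "line one\nline two"],     # contains a newline
]

# Write with the csv module: problematic values are wrapped in quotes.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# A parser that follows the quoting rules recovers the original rows.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))

# A naive parser splits the second line into too many fields.
naive_fields = encoded.split("\r\n")[1].split(",")
```

The round trip through `csv.reader` preserves both values, whereas the naive split breaks the quoted comment into two pieces.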
|
||||
|
||||
Despite these flaws, JSON, XML, and CSV are good enough for many purposes. It’s likely that they will
|
||||
|
|
@ -228,9 +228,9 @@ In addition to open and closed content models and validators, JSON Schema suppor
|
|||
if/else schema logic, named types, references to remote schemas, and much more. All of this makes
|
||||
for a very powerful schema language. Such features also make for unwieldy definitions. It can be
|
||||
challenging to resolve remote schemas, reason about conditional rules, or evolve schemas in a
|
||||
forwards or backwards compatible way [[10](/en/ch5#Coates2024)].
|
||||
forwards or backwards compatible way [^10].
|
||||
Similar concerns apply to XML Schema
|
||||
[[11](/en/ch5#Geneves2008)].
|
||||
[^11].
|
||||
|
||||
### Binary encoding
|
||||
|
||||
|
|
@ -239,7 +239,7 @@ observation led to the development of a profusion of binary encodings for JSON (
|
|||
BSON, BJSON, UBJSON, BISON, Hessian, and Smile, to name a few) and for XML (WBXML and Fast Infoset,
|
||||
for example). These formats have been adopted in various niches, as they are more compact and
|
||||
sometimes faster to parse, but none of them are as widely adopted as the textual versions of JSON
|
||||
and XML [[12](/en/ch5#Bray2019)].
|
||||
and XML [^12].
|
||||
|
||||
Some of these formats extend the set of datatypes (e.g., distinguishing integers and floating-point numbers,
|
||||
or adding support for binary strings), but otherwise they keep the JSON/XML data model unchanged. In
|
||||
|
|
@ -287,7 +287,7 @@ In the following sections we will see how we can do much better, and encode the
|
|||
|
||||
Protocol Buffers (protobuf) is a binary encoding library developed at Google.
|
||||
It is similar to Apache Thrift, which was originally developed by Facebook
|
||||
[[13](/en/ch5#Slee2007)];
|
||||
[^13];
|
||||
most of what this section says about Protocol Buffers applies also to Thrift.
|
||||
|
||||
Protocol Buffers requires a schema for any data that is encoded. To encode the data
|
||||
|
|
@ -311,7 +311,7 @@ language is very simple compared to JSON Schema: it only defines the fields of r
|
|||
types, but it does not support other restrictions on the possible values of fields.
|
||||
|
||||
Encoding [Example 5-2](/en/ch5#fig_encoding_json) using a Protocol Buffers encoder requires 33 bytes, as shown in
|
||||
[Figure 5-3](/en/ch5#fig_encoding_protobuf) [[14](/en/ch5#Kleppmann2012evolution)].
|
||||
[Figure 5-3](/en/ch5#fig_encoding_protobuf) [^14].
|
||||
|
||||

|
||||
|
||||
|
|
@ -382,7 +382,7 @@ value won’t fit in 32 bits, it will be truncated.
|
|||
Apache Avro is another binary encoding format that is interestingly different from Protocol Buffers.
|
||||
It was started in 2009 as a subproject of Hadoop, as a result of Protocol Buffers not being a good
|
||||
fit for Hadoop’s use cases
|
||||
[[15](/en/ch5#Cutting2009)].
|
||||
[^15].
|
||||
|
||||
Avro also uses a schema to specify the structure of the data being encoded. It has two schema
|
||||
languages: one (Avro IDL) intended for human editing, and one (based on JSON) that is more easily
|
||||
|
|
@ -493,7 +493,7 @@ case in Avro: if you want to allow a field to be null, you have to use a *union
|
|||
`union { null, long, string } field;` indicates that `field` can be a number, or a string, or null.
|
||||
You can only use `null` as a default value if it is the first branch of the union. This is a little
|
||||
more verbose than having everything nullable by default, but it helps prevent bugs by being explicit
|
||||
about what can and cannot be null [[18](/en/ch5#Hoare2009)].
|
||||
about what can and cannot be null [^18].
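
The rule about `null` defaults can be sketched as a small check over the JSON form of an Avro field, in which a union is written as an array of types. The helper `null_default_allowed` is hypothetical and only encodes the rule as stated above; it is not part of any Avro library.

```python
def null_default_allowed(field: dict) -> bool:
    """Return True if `null` may be used as this field's default value."""
    field_type = field["type"]
    if isinstance(field_type, list):
        # A JSON array denotes a union; null must be its first branch.
        return len(field_type) > 0 and field_type[0] == "null"
    # A non-union field can only default to null if its type is null itself.
    return field_type == "null"


# Corresponds to: union { null, long, string } field;
ok_field = {"name": "field", "type": ["null", "long", "string"], "default": None}

# Same branches, but null is not first, so a null default is not allowed.
bad_field = {"name": "field", "type": ["long", "null", "string"], "default": None}

# A plain string field cannot be null at all.
not_nullable = {"name": "userName", "type": "string"}
```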
|
||||
|
||||
Changing the datatype of a field is possible, provided that Avro can convert the type. Changing the
|
||||
name of a field is possible but a little tricky: the reader’s schema can contain aliases for field
|
||||
|
|
@ -525,9 +525,9 @@ Database with individually written records
|
|||
schema, it can decode the rest of the record.
|
||||
|
||||
Confluent’s schema registry for Apache Kafka
|
||||
[[19](/en/ch5#ConfluentSchemaReg)]
|
||||
[^19]
|
||||
and LinkedIn’s Espresso
|
||||
[[20](/en/ch5#Auradkar2015)]
|
||||
[^20]
|
||||
work this way, for example.
|
||||
|
||||
Sending records over a network connection
|
||||
|
|
@ -537,7 +537,7 @@ Sending records over a network connection
|
|||
|
||||
A database of schema versions is a useful thing to have in any case, since it acts as documentation
|
||||
and gives you a chance to check schema compatibility
|
||||
[[21](/en/ch5#Kreps2015)].
|
||||
[^21].
|
||||
As the version number, you could use a simple incrementing integer, or you could use a hash of the
|
||||
schema.
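
As a rough illustration of using a hash as the version identifier, the sketch below fingerprints a schema by hashing a canonical JSON rendering of it. The canonicalization (sorted keys, no whitespace), the choice of SHA-256, and the 16-character truncation are assumptions made for this example; Avro defines its own canonical form and fingerprint algorithms, which a real system would more likely use.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Hash a canonical JSON rendering so that equal schemas get equal IDs."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


v1 = {
    "type": "record",
    "name": "Person",
    "fields": [{"name": "userName", "type": "string"}],
}

# The same schema with its keys written in a different order.
v1_reordered = {
    "name": "Person",
    "fields": [{"type": "string", "name": "userName"}],
    "type": "record",
}

# A later version that adds a nullable field.
v2 = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "userName", "type": "string"},
        {"name": "favoriteNumber", "type": ["null", "long"], "default": None},
    ],
}
```

Unlike an incrementing integer, a hash needs no central counter, but it does not tell you which of two versions is newer.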
|
||||
|
||||
|
|
@ -552,7 +552,7 @@ you have a relational database whose contents you want to dump to a file, and yo
|
|||
binary format to avoid the aforementioned problems with textual formats (JSON, CSV, XML). If you use
|
||||
Avro, you can fairly easily generate an Avro schema (in the JSON representation we saw earlier) from the
|
||||
relational schema and encode the database contents using that schema, dumping it all to an Avro
|
||||
object container file [[22](/en/ch5#Shapira2014)].
|
||||
object container file [^22].
|
||||
You can generate a record schema for each database table, and each column becomes a field in that
|
||||
record. The column name in the database maps to the field name in Avro.
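
The table-to-record mapping described here might look roughly like the following sketch. The `SQL_TO_AVRO` type table, the `(name, sql_type, nullable)` column tuples, and the `table_to_avro_schema` helper are all hypothetical; a real tool would read the column metadata from the database catalog and handle many more types.

```python
# Assumed mapping from a few SQL column types to Avro primitive types.
SQL_TO_AVRO = {
    "integer": "int",
    "bigint": "long",
    "text": "string",
    "boolean": "boolean",
    "double precision": "double",
}


def table_to_avro_schema(table_name, columns):
    """Build an Avro record schema (JSON form) with one field per column."""
    fields = []
    for name, sql_type, nullable in columns:
        avro_type = SQL_TO_AVRO[sql_type]
        if nullable:
            # Nullable columns become a union with null as the first branch.
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    return {"type": "record", "name": table_name, "fields": fields}


users_schema = table_to_avro_schema(
    "users",
    [("id", "bigint", False), ("email", "text", False), ("nickname", "text", True)],
)
```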
|
||||
|
||||
|
|
@ -585,9 +585,9 @@ common with ASN.1, a schema definition language that was first standardized in 1
|
|||
[24](/en/ch5#Kaliski1993)].
|
||||
It was used to define various network protocols, and its binary encoding (DER) is still used to encode
|
||||
SSL certificates (X.509), for example
|
||||
[[25](/en/ch5#HoffmanAndrews2020)].
|
||||
[^25].
|
||||
ASN.1 supports schema evolution using tag numbers, similar to Protocol Buffers
|
||||
[[26](/en/ch5#Walkin2010)].
|
||||
[^26].
|
||||
However, it’s also very complex and badly documented, so ASN.1
|
||||
is probably not a good choice for new applications.
|
||||
|
||||
|
|
@ -680,9 +680,9 @@ versions of the schema.
|
|||
|
||||
More complex schema changes—for example, changing a single-valued attribute to be multi-valued, or
|
||||
moving some data into a separate table—still require data to be rewritten, often at the application
|
||||
level [[27](/en/ch5#Xu2017)].
|
||||
level [^27].
|
||||
Maintaining forward and backward compatibility across such migrations is still a research problem
|
||||
[[28](/en/ch5#Litt2020)].
|
||||
[^28].
|
||||
|
||||
### Archival storage
|
||||
|
||||
|
|
@ -723,7 +723,7 @@ In some ways, services are similar to databases: they typically allow clients to
|
|||
data. However, while databases allow arbitrary queries using the query languages we discussed in
|
||||
[Chapter 3](/en/ch3#ch_datamodels), services expose an application-specific API that only allows inputs and outputs
|
||||
that are predetermined by the business logic (application code) of the service
|
||||
[[29](/en/ch5#Helland2005_ch5)]. This restriction provides a degree of encapsulation: services can impose
|
||||
[^29]. This restriction provides a degree of encapsulation: services can impose
|
||||
fine-grained restrictions on what clients can and cannot do.
|
||||
|
||||
A key design goal of a service-oriented/microservices architecture is to make the application easier
|
||||
|
|
@ -764,7 +764,7 @@ need to somehow find out these details. Service developers often use an interfac
|
|||
language (IDL) to define and document their service’s API endpoints and data models, and to evolve
|
||||
them over time. Other developers can then use the service definition to determine how to query the
|
||||
service. The two most popular service IDLs are OpenAPI (also known as Swagger
|
||||
[[32](/en/ch5#Swagger2014)])
|
||||
[^32])
|
||||
and gRPC. OpenAPI is used for web services that send and receive JSON data, while gRPC services send
|
||||
and receive Protocol Buffers.
|
||||
|
||||
|
|
@ -838,7 +838,7 @@ requests over a network, many of which received a lot of hype but have serious p
|
|||
JavaBeans (EJB) and Java’s Remote Method Invocation (RMI) are limited to Java. The Distributed
|
||||
Component Object Model (DCOM) is limited to Microsoft platforms. The Common Object Request Broker
|
||||
Architecture (CORBA) is excessively complex, and does not provide backward or forward
|
||||
compatibility [[33](/en/ch5#Henning2006)].
|
||||
compatibility [^33].
|
||||
SOAP and the WS-\* web services framework aim to provide interoperability across vendors, but are
|
||||
also plagued by complexity and compatibility problems
|
||||
[[34](/en/ch5#Lacey2006),
|
||||
|
|
@ -846,7 +846,7 @@ also plagued by complexity and compatibility problems
|
|||
[36](/en/ch5#Bray2004)].
|
||||
|
||||
All of these are based on the idea of a *remote procedure call* (RPC), which has been around since
|
||||
the 1970s [[37](/en/ch5#Birrell1984)].
|
||||
the 1970s [^37].
|
||||
The RPC model tries to make a request to a remote network service look the same as calling a function or
|
||||
method in your programming language, within the same process (this abstraction is called *location
|
||||
transparency*). Although RPC seems convenient at first, the approach is fundamentally flawed
|
||||
|
|
@ -868,7 +868,7 @@ A network request is very different from a local function call:
|
|||
through, and only the response was lost.
|
||||
In that case, retrying will cause the action to
|
||||
be performed multiple times, unless you build a mechanism for deduplication (*idempotence*) into
|
||||
the protocol [[40](/en/ch5#Leach2017idemptence)].
|
||||
the protocol [^40].
|
||||
Local function calls don’t have this problem. (We discuss idempotence in more detail
|
||||
in [Link to Come].)
|
||||
* Every time you call a local function, it normally takes about the same time to execute. A network
|
||||
|
|
@ -902,7 +902,7 @@ overloaded, the client has to be manually reconfigured.
|
|||
To provide higher availability and scalability, there are usually multiple instances of a service
|
||||
running on different machines, any of which can handle an incoming request. Spreading requests
|
||||
across these instances is called *load balancing*
|
||||
[[41](/en/ch5#Rose2023)].
|
||||
[^41].
|
||||
There are many load balancing and service discovery solutions available:
|
||||
|
||||
* *Hardware load balancers* are specialized pieces of equipment that are installed in data centers.
|
||||
|
|
@ -974,12 +974,12 @@ indefinitely. If a compatibility-breaking change is required, the service provid
|
|||
maintaining multiple versions of the service API side by side.
|
||||
|
||||
There is no agreement on how API versioning should work (i.e., how a client can indicate which
|
||||
version of the API it wants to use [[42](/en/ch5#Hunt2014wn)]).
|
||||
version of the API it wants to use [^42]).
|
||||
For RESTful APIs, common approaches are to use a version
|
||||
number in the URL or in the HTTP `Accept` header. For services that use API keys to identify a
|
||||
particular client, another option is to store a client’s requested API version on the server and to
|
||||
allow this version selection to be updated through a separate administrative interface
|
||||
[[43](/en/ch5#Leach2017versioning)].
|
||||
[^43].
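
The two RESTful approaches mentioned above, a version number in the URL or in the `Accept` header, could be handled on the server along these lines. This is only a sketch: the `/v2/...` path layout, the `application/vnd.example.v2+json` media type, and the `requested_version` function are assumptions for illustration, not an established convention.

```python
import re

# Hypothetical conventions: /v2/users or Accept: application/vnd.example.v2+json
URL_VERSION = re.compile(r"^/v(\d+)/")
ACCEPT_VERSION = re.compile(r"application/vnd\.example\.v(\d+)\+json")


def requested_version(path: str, accept: str = "", default: int = 1) -> int:
    """Work out which API version a request is asking for."""
    match = URL_VERSION.match(path)
    if match:
        return int(match.group(1))
    match = ACCEPT_VERSION.search(accept)
    if match:
        return int(match.group(1))
    # No explicit version: fall back to a default chosen by the provider.
    return default


from_url = requested_version("/v2/users")
from_header = requested_version("/users", "application/vnd.example.v3+json")
fallback = requested_version("/users", "application/json")
```

The third approach in the text, storing the version per API key, would replace the `default` argument with a lookup keyed by the client's identity.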
|
||||
|
||||
## Durable Execution and Workflows
|
||||
|
||||
|
|
@ -995,7 +995,7 @@ the credit card, and call the banking service to deposit debited funds, as shown
|
|||
Workflows are typically defined as a graph of tasks. Workflow definitions may be written in a
|
||||
general-purpose programming language, a domain specific language (DSL), or a markup language such as
|
||||
Business Process Execution Language (BPEL)
|
||||
[[44](/en/ch5#BPEL2007)].
|
||||
[^44].
|
||||
|
||||
# Tasks, Activities, and Functions
|
||||
|
||||
|
|
@ -1068,19 +1068,19 @@ class PaymentWorkflow:
|
|||
Frameworks like Temporal are not without their challenges. External services, such as the
|
||||
third-party payment gateway in our example, must still provide an idempotent API. Developers must
|
||||
remember to use unique IDs for these APIs to prevent duplicate execution
|
||||
[[47](/en/ch5#Tenzer2024)].
|
||||
[^47].
|
||||
And because durable execution frameworks log each RPC call in order, they expect a subsequent
|
||||
execution to make the same RPC calls in the same order. This makes code changes brittle: you
|
||||
might introduce undefined behavior simply by re-ordering function calls
|
||||
[[48](/en/ch5#TemporalWorkflow)].
|
||||
[^48].
|
||||
Instead of modifying the code of an existing workflow, it is safer to deploy a new version of the
|
||||
code separately, so that re-executions of existing workflow invocations continue to use the old
|
||||
version, and only new invocations use the new code
|
||||
[[49](/en/ch5#Kleeman2024)].
|
||||
[^49].
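
The ordering constraint can be illustrated with a toy replay log. This is not Temporal's API: `DurableContext`, `ReplayMismatch`, and the two workflow functions are invented for the sketch, and the toy raises an error on a mismatch, whereas a real framework's behavior in that situation may differ.

```python
class ReplayMismatch(Exception):
    """Raised when replayed code issues a different call than the log recorded."""


class DurableContext:
    """Toy call log: records results on first run, reuses them on replay."""

    def __init__(self, history=None):
        self.history = list(history or [])  # [(call name, result), ...]
        self.position = 0

    def call(self, name, func, *args):
        if self.position < len(self.history):
            recorded_name, recorded_result = self.history[self.position]
            if recorded_name != name:
                raise ReplayMismatch(f"expected {recorded_name!r}, got {name!r}")
            self.position += 1
            return recorded_result  # replay: reuse the logged result
        result = func(*args)  # first execution: really make the call
        self.history.append((name, result))
        self.position += 1
        return result


charges = []  # stands in for the external side effect


def check_fraud(amount):
    return amount < 1000


def debit_card(amount):
    charges.append(amount)
    return "debited"


def workflow_v1(ctx, amount):
    ok = ctx.call("check_fraud", check_fraud, amount)
    return ctx.call("debit_card", debit_card, amount) if ok else "rejected"


def workflow_v2(ctx, amount):
    # Same steps, re-ordered: incompatible with logs written by workflow_v1.
    status = ctx.call("debit_card", debit_card, amount)
    ctx.call("check_fraud", check_fraud, amount)
    return status


first_run = DurableContext()
first_result = workflow_v1(first_run, 50)

# Replaying the same code against the log does not charge the card again.
replay = DurableContext(first_run.history)
replay_result = workflow_v1(replay, 50)
```

Replaying `workflow_v2` against the log written by `workflow_v1` fails at the first call, because the log says `check_fraud` came first. That is the reason for deploying changed workflow code as a separate version.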
|
||||
|
||||
Similarly, because durable execution frameworks expect to replay all code deterministically (the
|
||||
same inputs produce the same outputs), nondeterministic code such as random number generators or
|
||||
system clocks is problematic [[48](/en/ch5#TemporalWorkflow)].
|
||||
system clocks is problematic [^48].
|
||||
Frameworks often provide their own, deterministic implementations of such library functions, but
|
||||
you have to remember to use them. In some cases, such as with Temporal’s workflowcheck tool,
|
||||
frameworks provide static analysis tools to determine if nondeterministic behavior has been
|
||||
|
|
@ -1099,7 +1099,7 @@ unlike RPC, the sender usually does not wait for the recipient to process the ev
|
|||
events are typically not sent to the recipient via a direct network connection, but go via an
|
||||
intermediary called a *message broker* (also called an *event broker*, *message queue*, or
|
||||
*message-oriented middleware*), which stores the message temporarily
|
||||
[[50](/en/ch5#Perera2023)].
|
||||
[^50].
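
As a rough illustration of the idea, the following sketch models a broker as an in-memory buffer per topic. `InMemoryBroker`, its `send` and `poll` methods, and the `orders` topic are hypothetical names chosen for this example; real brokers add persistence, acknowledgements, and network transport that are omitted here.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy broker: buffers messages per topic until a consumer polls them."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, topic, message):
        # Returns immediately; the sender does not wait for any recipient.
        self.queues[topic].append(message)

    def poll(self, topic):
        # Hands over the oldest buffered message, or None if nothing is waiting.
        queue = self.queues[topic]
        return queue.popleft() if queue else None


broker = InMemoryBroker()
broker.send("orders", {"order_id": 1})
broker.send("orders", {"order_id": 2})

first = broker.poll("orders")   # oldest message is delivered first
second = broker.poll("orders")
empty = broker.poll("orders")   # nothing left in the buffer
```

The sender and the recipient never interact directly here: the sender only appends to the buffer, and the recipient picks messages up whenever it is ready.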
|
||||
|
||||
Using a message broker has several advantages compared to direct RPC:
|
||||
|
||||
|
|
@ -1162,7 +1162,7 @@ scenarios, messages will be lost. Since each actor processes only one message at
|
|||
need to worry about threads, and each actor can be scheduled independently by the framework.
|
||||
|
||||
In *distributed actor frameworks* such as Akka, Orleans
|
||||
[[51](/en/ch5#Bernstein2014)],
|
||||
[^51],
|
||||
and Erlang/OTP, this programming model is used to scale an application across
|
||||
multiple nodes. The same message-passing mechanism is used, no matter whether the sender and recipient
|
||||
are on the same node or different nodes. If they are on different nodes, the message is
|
||||
|
|
@ -1225,257 +1225,58 @@ quite achievable. May your application’s evolution be rapid and your deploymen
|
|||
|
||||
##### Footnotes
|
||||
|
||||
|
||||
##### References
|
||||
|
||||
[[1](/en/ch5#CWE502-marker)] [CWE-502:
|
||||
Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html). Common Weakness Enumeration, *cwe.mitre.org*,
|
||||
July 2006. Archived at [perma.cc/26EU-UK9Y](https://perma.cc/26EU-UK9Y)
|
||||
|
||||
[[2](/en/ch5#Breen2015-marker)] Steve Breen.
|
||||
[What
|
||||
Do WebLogic, WebSphere, JBoss, Jenkins, OpenNMS, and Your Application Have in Common? This
|
||||
Vulnerability](https://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/). *foxglovesecurity.com*, November 2015.
|
||||
Archived at [perma.cc/9U97-UVVD](https://perma.cc/9U97-UVVD)
|
||||
|
||||
[[3](/en/ch5#McKenzie2013-marker)] Patrick McKenzie.
|
||||
[What
|
||||
the Rails Security Issue Means for Your Startup](https://www.kalzumeus.com/2013/01/31/what-the-rails-security-issue-means-for-your-startup/). *kalzumeus.com*, January 2013.
|
||||
Archived at [perma.cc/2MBJ-7PZ6](https://perma.cc/2MBJ-7PZ6)
|
||||
|
||||
[[4](/en/ch5#Goetz2019-marker)] Brian Goetz.
|
||||
[Towards
|
||||
Better Serialization](https://openjdk.org/projects/amber/design-notes/towards-better-serialization). *openjdk.org*, June 2019.
|
||||
Archived at [perma.cc/UK6U-GQDE](https://perma.cc/UK6U-GQDE)
|
||||
|
||||
[[5](/en/ch5#JvmSerializers-marker)] Eishay Smith.
|
||||
[jvm-serializers wiki](https://github.com/eishay/jvm-serializers/wiki).
|
||||
*github.com*, October 2023.
|
||||
Archived at [perma.cc/PJP7-WCNG](https://perma.cc/PJP7-WCNG)
|
||||
|
||||
[[6](/en/ch5#XMLSExp-marker)] [XML
|
||||
Is a Poor Copy of S-Expressions](https://wiki.c2.com/?XmlIsaPoorCopyOfEssExpressions). *wiki.c2.com*, May 2013.
|
||||
Archived at [perma.cc/7FAN-YBKL](https://perma.cc/7FAN-YBKL)
|
||||
|
||||
[[7](/en/ch5#Evans2023-marker)] Julia Evans.
|
||||
[Examples of floating
|
||||
point problems](https://jvns.ca/blog/2023/01/13/examples-of-floating-point-problems/). *jvns.ca*, January 2023.
|
||||
Archived at [perma.cc/M57L-QKKW](https://perma.cc/M57L-QKKW)
|
||||
|
||||
[[8](/en/ch5#Harris2010-marker)] Matt Harris.
|
||||
[Snowflake:
|
||||
An Update and Some Very Important Information](https://groups.google.com/g/twitter-development-talk/c/ahbvo3VTIYI). Email to *Twitter Development
|
||||
Talk* mailing list, October 2010.
|
||||
Archived at [perma.cc/8UBV-MZ3D](https://perma.cc/8UBV-MZ3D)
|
||||
|
||||
[[9](/en/ch5#Shafranovich2005-marker)] Yakov Shafranovich.
|
||||
[RFC 4180: Common Format and MIME Type for
|
||||
Comma-Separated Values (CSV) Files](https://tools.ietf.org/html/rfc4180). IETF, October 2005.
|
||||
|
||||
[[10](/en/ch5#Coates2024-marker)] Andy Coates.
|
||||
[Evolving JSON Schemas - Part I](https://www.creekservice.org/articles/2024/01/08/json-schema-evolution-part-1.html) and
|
||||
[Part II](https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html).
|
||||
*creekservice.org*, January 2024. Archived at
|
||||
[perma.cc/MZW3-UA54](https://perma.cc/MZW3-UA54) and
|
||||
[perma.cc/GT5H-WKZ5](https://perma.cc/GT5H-WKZ5)
|
||||
|
||||
[[11](/en/ch5#Geneves2008-marker)] Pierre Genevès, Nabil Layaïda, and Vincent Quint.
|
||||
[Ensuring Query Compatibility with Evolving XML Schemas](https://arxiv.org/abs/0811.4324).
|
||||
INRIA Technical Report 6711, November 2008.
|
||||
|
||||
[[12](/en/ch5#Bray2019-marker)] Tim Bray.
|
||||
[Bits On the Wire](https://www.tbray.org/ongoing/When/201x/2019/11/17/Bits-On-the-Wire).
|
||||
*tbray.org*, November 2019.
|
||||
Archived at [perma.cc/3BT3-BQU3](https://perma.cc/3BT3-BQU3)
|
||||
|
||||
[[13](/en/ch5#Slee2007-marker)] Mark Slee, Aditya Agarwal, and Marc Kwiatkowski.
|
||||
[Thrift: Scalable
|
||||
Cross-Language Services Implementation](https://thrift.apache.org/static/files/thrift-20070401.pdf). Facebook technical report, April 2007.
|
||||
Archived at [perma.cc/22BS-TUFB](https://perma.cc/22BS-TUFB)
|
||||
|
||||
[[14](/en/ch5#Kleppmann2012evolution-marker)] Martin Kleppmann.
|
||||
[Schema
|
||||
Evolution in Avro, Protocol Buffers and Thrift](https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html). *martin.kleppmann.com*, December 2012.
|
||||
Archived at [perma.cc/E4R2-9RJT](https://perma.cc/E4R2-9RJT)
|
||||
|
||||
[[15](/en/ch5#Cutting2009-marker)] Doug Cutting, Chad Walters, Jim Kellerman, et al.
|
||||
[[PROPOSAL]
|
||||
New Subproject: Avro](https://lists.apache.org/thread/z571w0r5jmfsjvnl0fq4fgg0vh28d3bk). Email thread on *hadoop-general* mailing list,
|
||||
*lists.apache.org*, April 2009.
|
||||
Archived at [perma.cc/4A79-BMEB](https://perma.cc/4A79-BMEB)
|
||||
|
||||
[[16](/en/ch5#AvroSpec-marker)] Apache Software Foundation.
|
||||
[Apache Avro 1.12.0 Specification](https://avro.apache.org/docs/1.12.0/specification/).
|
||||
*avro.apache.org*, August 2024.
|
||||
Archived at [perma.cc/C36P-5EBQ](https://perma.cc/C36P-5EBQ)
|
||||
|
||||
[[17](/en/ch5#AvroParsing-marker)] Apache Software Foundation.
|
||||
[Avro
|
||||
schemas as LL(1) CFG definitions](https://avro.apache.org/docs/1.12.0/api/java/org/apache/avro/io/parsing/doc-files/parsing.html). *avro.apache.org*, August 2024.
|
||||
Archived at [perma.cc/JB44-EM9Q](https://perma.cc/JB44-EM9Q)
|
||||
|
||||
[[18](/en/ch5#Hoare2009-marker)] Tony Hoare.
|
||||
[Null
|
||||
References: The Billion Dollar Mistake](https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/). Talk at *QCon London*, March 2009.
|
||||
|
||||
[[19](/en/ch5#ConfluentSchemaReg-marker)] Confluent, Inc.
|
||||
[Schema Registry
|
||||
Overview](https://docs.confluent.io/platform/current/schema-registry/index.html). *docs.confluent.io*, 2024.
|
||||
Archived at [perma.cc/92C3-A9JA](https://perma.cc/92C3-A9JA)
|
||||
|
||||
[[20](/en/ch5#Auradkar2015-marker)] Aditya Auradkar and Tom Quiggle.
|
||||
[Introducing
|
||||
Espresso—LinkedIn’s Hot New Distributed Document Store](https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store). *engineering.linkedin.com*, January 2015.
|
||||
Archived at [perma.cc/FX4P-VW9T](https://perma.cc/FX4P-VW9T)
|
||||
|
||||
[[21](/en/ch5#Kreps2015-marker)] Jay Kreps.
|
||||
[Putting Apache Kafka to
|
||||
Use: A Practical Guide to Building a Stream Data Platform (Part 2)](https://www.confluent.io/blog/event-streaming-platform-2/). *confluent.io*,
|
||||
February 2015. Archived at [perma.cc/8UA4-ZS5S](https://perma.cc/8UA4-ZS5S)
|
||||
|
||||
[[22](/en/ch5#Shapira2014-marker)] Gwen Shapira.
|
||||
[The Problem of Managing
|
||||
Schemas](https://www.oreilly.com/content/the-problem-of-managing-schemas/). *oreilly.com*, November 2014.
|
||||
Archived at [perma.cc/BY8Q-RYV3](https://perma.cc/BY8Q-RYV3)
|
||||
|
||||
[[23](/en/ch5#Larmouth1999-marker)] John Larmouth.
|
||||
[*ASN.1
|
||||
Complete*](https://www.oss.com/asn1/resources/books-whitepapers-pubs/larmouth-asn1-book.pdf). Morgan Kaufmann, 1999. ISBN: 978-0-122-33435-1.
|
||||
Archived at [perma.cc/GB7Y-XSXQ](https://perma.cc/GB7Y-XSXQ)
|
||||
|
||||
[[24](/en/ch5#Kaliski1993-marker)] Burton S. Kaliski Jr.
|
||||
[A Layman’s Guide to a Subset of ASN.1,
|
||||
BER, and DER](https://luca.ntop.org/Teaching/Appunti/asn1.html). Technical Note, RSA Data Security, Inc., November 1993.
|
||||
Archived at [perma.cc/2LMN-W9U8](https://perma.cc/2LMN-W9U8)
|
||||
|
||||
[[25](/en/ch5#HoffmanAndrews2020-marker)] Jacob Hoffman-Andrews.
|
||||
[A Warm Welcome to ASN.1 and DER](https://letsencrypt.org/docs/a-warm-welcome-to-asn1-and-der/).
|
||||
*letsencrypt.org*, April 2020.
|
||||
Archived at [perma.cc/CYT2-GPQ8](https://perma.cc/CYT2-GPQ8)
|
||||
|
||||
[[26](/en/ch5#Walkin2010-marker)] Lev Walkin.
|
||||
[Question:
|
||||
Extensibility and Dropping Fields](https://lionet.info/asn1c/blog/2010/09/21/question-extensibility-removing-fields/). *lionet.info*, September 2010.
|
||||
Archived at [perma.cc/VX8E-NLH3](https://perma.cc/VX8E-NLH3)
|
||||
|
||||
[[27](/en/ch5#Xu2017-marker)] Jacqueline Xu.
|
||||
[Online migrations at scale](https://stripe.com/blog/online-migrations).
|
||||
*stripe.com*, February 2017.
|
||||
Archived at [perma.cc/X59W-DK7Y](https://perma.cc/X59W-DK7Y)
|
||||
|
||||
[[28](/en/ch5#Litt2020-marker)] Geoffrey Litt, Peter van Hardenberg, and Orion Henry.
|
||||
[Project Cambria: Translate your data with lenses](https://www.inkandswitch.com/cambria/).
|
||||
Technical Report, *Ink & Switch*, October 2020.
|
||||
Archived at [perma.cc/WA4V-VKDB](https://perma.cc/WA4V-VKDB)
|
||||
|
||||
[[29](/en/ch5#Helland2005_ch5-marker)] Pat Helland.
|
||||
[Data on the Outside Versus Data on the
|
||||
Inside](https://www.cidrdb.org/cidr2005/papers/P12.pdf). At *2nd Biennial Conference on Innovative Data Systems Research* (CIDR),
|
||||
January 2005.
|
||||
|
||||
[[30](/en/ch5#Fielding2000-marker)] Roy Thomas Fielding.
|
||||
[Architectural
|
||||
Styles and the Design of Network-Based Software Architectures](https://ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf). PhD Thesis, University of
|
||||
California, Irvine, 2000. Archived at [perma.cc/LWY9-7BPE](https://perma.cc/LWY9-7BPE)
|
||||
|
||||
[[31](/en/ch5#Fielding2008-marker)] Roy Thomas Fielding.
|
||||
[REST APIs must
|
||||
be hypertext-driven](https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven). *roy.gbiv.com*, October 2008.
|
||||
Archived at [perma.cc/M2ZW-8ATG](https://perma.cc/M2ZW-8ATG)
|
||||
|
||||
[[32](/en/ch5#Swagger2014-marker)] [OpenAPI
|
||||
Specification Version 3.1.0](https://swagger.io/specification/). *swagger.io*, February 2021.
|
||||
Archived at [perma.cc/3S6S-K5M4](https://perma.cc/3S6S-K5M4)
|
||||
|
||||
[[33](/en/ch5#Henning2006-marker)] Michi Henning.
|
||||
[The Rise and Fall of CORBA](https://cacm.acm.org/practice/the-rise-and-fall-of-corba/).
|
||||
*Communications of the ACM*, volume 51, issue 8, pages 52–57, August 2008.
|
||||
[doi:10.1145/1378704.1378718](https://doi.org/10.1145/1378704.1378718)
|
||||
|
||||
[[34](/en/ch5#Lacey2006-marker)] Pete Lacey.
|
||||
[The S Stands for Simple](https://harmful.cat-v.org/software/xml/soap/simple).
|
||||
*harmful.cat-v.org*, November 2006.
|
||||
Archived at [perma.cc/4PMK-Z9X7](https://perma.cc/4PMK-Z9X7)
|
||||
|
||||
[[35](/en/ch5#Tilkov2006-marker)] Stefan Tilkov.
|
||||
[Interview: Pete Lacey Criticizes
|
||||
Web Services](https://www.infoq.com/articles/pete-lacey-ws-criticism/). *infoq.com*, December 2006.
|
||||
Archived at [perma.cc/JWF4-XY3P](https://perma.cc/JWF4-XY3P)
|
||||
|
||||
[[36](/en/ch5#Bray2004-marker)] Tim Bray.
|
||||
[The Loyal WS-Opposition](https://www.tbray.org/ongoing/When/200x/2004/09/18/WS-Oppo).
|
||||
*tbray.org*, September 2004.
|
||||
Archived at [perma.cc/J5Q8-69Q2](https://perma.cc/J5Q8-69Q2)
|
||||
|
||||
[[37](/en/ch5#Birrell1984-marker)] Andrew D. Birrell and Bruce Jay Nelson.
|
||||
[Implementing
|
||||
Remote Procedure Calls](https://www.cs.princeton.edu/courses/archive/fall03/cs518/papers/rpc.pdf). *ACM Transactions on Computer Systems* (TOCS),
|
||||
volume 2, issue 1, pages 39–59, February 1984.
|
||||
[doi:10.1145/2080.357392](https://doi.org/10.1145/2080.357392)
|
||||
|
||||
[[38](/en/ch5#Waldo1994-marker)] Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall.
|
||||
[A Note on Distributed Computing](https://m.mirror.facebook.net/kde/devel/smli_tr-94-29.pdf).
|
||||
Sun Microsystems Laboratories, Inc., Technical Report TR-94-29, November 1994.
|
||||
Archived at [perma.cc/8LRZ-BSZR](https://perma.cc/8LRZ-BSZR)
|
||||
|
||||
[[39](/en/ch5#Vinoski2008-marker)] Steve Vinoski.
|
||||
[Convenience over
|
||||
Correctness](https://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf). *IEEE Internet Computing*, volume 12, issue 4, pages 89–92, July 2008.
|
||||
[doi:10.1109/MIC.2008.75](https://doi.org/10.1109/MIC.2008.75)
|
||||
|
||||
[[40](/en/ch5#Leach2017idemptence-marker)] Brandur Leach.
|
||||
[Designing robust and predictable APIs with
|
||||
idempotency](https://stripe.com/blog/idempotency). *stripe.com*, February 2017.
|
||||
Archived at [perma.cc/JD22-XZQT](https://perma.cc/JD22-XZQT)
|
||||
|
||||
[[41](/en/ch5#Rose2023-marker)] Sam Rose.
|
||||
[Load Balancing](https://samwho.dev/load-balancing/). *samwho.dev*, April 2023.
|
||||
Archived at [perma.cc/Q7BA-9AE2](https://perma.cc/Q7BA-9AE2)
|
||||
|
||||
[[42](/en/ch5#Hunt2014wn-marker)] Troy Hunt.
|
||||
[Your API versioning is
|
||||
wrong, which is why I decided to do it 3 different wrong ways](https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/). *troyhunt.com*,
|
||||
February 2014. Archived at [perma.cc/9DSW-DGR5](https://perma.cc/9DSW-DGR5)
|
||||
|
||||
[[43](/en/ch5#Leach2017versioning-marker)] Brandur Leach.
|
||||
[APIs as infrastructure: future-proofing Stripe with
|
||||
versioning](https://stripe.com/blog/api-versioning). *stripe.com*, August 2017.
|
||||
Archived at [perma.cc/L63K-USFW](https://perma.cc/L63K-USFW)
|
||||
|
||||
[[44](/en/ch5#BPEL2007-marker)] Alexandre Alves, Assaf Arkin, Sid Askary, et al.
|
||||
[Web Services Business Process
|
||||
Execution Language Version 2.0](https://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.html). *docs.oasis-open.org*, April 2007.
|
||||
|
||||
[[45](/en/ch5#TemporalService-marker)] [What
|
||||
is a Temporal Service?](https://docs.temporal.io/clusters) *docs.temporal.io*, 2024.
|
||||
Archived at [perma.cc/32P3-CJ9V](https://perma.cc/32P3-CJ9V)
|
||||
|
||||
[[46](/en/ch5#Ewen2023-marker)] Stephan Ewen.
|
||||
[Why we built Restate](https://restate.dev/blog/why-we-built-restate/). *restate.dev*,
|
||||
August 2023. Archived at [perma.cc/BJJ2-X75K](https://perma.cc/BJJ2-X75K)
|
||||
|
||||
[[47](/en/ch5#Tenzer2024-marker)] Keith Tenzer and Joshua Smith.
|
||||
[Idempotency and Durable
|
||||
Execution](https://temporal.io/blog/idempotency-and-durable-execution). *temporal.io*, February 2024.
|
||||
Archived at [perma.cc/9LGW-PCLU](https://perma.cc/9LGW-PCLU)
|
||||
|
||||
[[48](/en/ch5#TemporalWorkflow-marker)] [What
|
||||
is a Temporal Workflow?](https://docs.temporal.io/workflows) *docs.temporal.io*, 2024.
|
||||
Archived at [perma.cc/B5C5-Y396](https://perma.cc/B5C5-Y396)
|
||||
|
||||
[[49](/en/ch5#Kleeman2024-marker)] Jack Kleeman.
|
||||
[Solving durable
|
||||
execution’s immutability problem](https://restate.dev/blog/solving-durable-executions-immutability-problem/). *restate.dev*, February 2024.
|
||||
Archived at [perma.cc/G55L-EYH5](https://perma.cc/G55L-EYH5)
|
||||
|
||||
[[50](/en/ch5#Perera2023-marker)] Srinath Perera.
|
||||
[Exploring
|
||||
Event-Driven Architecture: A Beginner’s Guide for Cloud Native Developers](https://wso2.com/blogs/thesource/exploring-event-driven-architecture-a-beginners-guide-for-cloud-native-developers/). *wso2.com*,
|
||||
August 2023. Archived at
|
||||
[archive.org](https://web.archive.org/web/20240716204613/https%3A//wso2.com/blogs/thesource/exploring-event-driven-architecture-a-beginners-guide-for-cloud-native-developers/)
|
||||
|
||||
[[51](/en/ch5#Bernstein2014-marker)] Philip A. Bernstein, Sergey Bykov, Alan
|
||||
Geller, Gabriel Kliot, and Jorgen Thelin.
|
||||
[Orleans:
|
||||
Distributed Virtual Actors for Programmability and Scalability](https://www.microsoft.com/en-us/research/publication/orleans-distributed-virtual-actors-for-programmability-and-scalability/). Microsoft Research Technical
|
||||
Report MSR-TR-2014-41, March 2014.
|
||||
Archived at [perma.cc/PD3U-WDMF](https://perma.cc/PD3U-WDMF)
|
||||
[^1]: [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html). Common Weakness Enumeration, *cwe.mitre.org*, July 2006. Archived at [perma.cc/26EU-UK9Y](https://perma.cc/26EU-UK9Y)
|
||||
[^2]: Steve Breen. [What Do WebLogic, WebSphere, JBoss, Jenkins, OpenNMS, and Your Application Have in Common? This Vulnerability](https://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/). *foxglovesecurity.com*, November 2015. Archived at [perma.cc/9U97-UVVD](https://perma.cc/9U97-UVVD)
|
||||
[^3]: Patrick McKenzie. [What the Rails Security Issue Means for Your Startup](https://www.kalzumeus.com/2013/01/31/what-the-rails-security-issue-means-for-your-startup/). *kalzumeus.com*, January 2013. Archived at [perma.cc/2MBJ-7PZ6](https://perma.cc/2MBJ-7PZ6)
|
||||
[^4]: Brian Goetz. [Towards Better Serialization](https://openjdk.org/projects/amber/design-notes/towards-better-serialization). *openjdk.org*, June 2019. Archived at [perma.cc/UK6U-GQDE](https://perma.cc/UK6U-GQDE)
|
||||
[^5]: Eishay Smith. [jvm-serializers wiki](https://github.com/eishay/jvm-serializers/wiki). *github.com*, October 2023. Archived at [perma.cc/PJP7-WCNG](https://perma.cc/PJP7-WCNG)
|
||||
[^6]: [XML Is a Poor Copy of S-Expressions](https://wiki.c2.com/?XmlIsaPoorCopyOfEssExpressions). *wiki.c2.com*, May 2013. Archived at [perma.cc/7FAN-YBKL](https://perma.cc/7FAN-YBKL)
|
||||
[^7]: Julia Evans. [Examples of floating point problems](https://jvns.ca/blog/2023/01/13/examples-of-floating-point-problems/). *jvns.ca*, January 2023. Archived at [perma.cc/M57L-QKKW](https://perma.cc/M57L-QKKW)
|
||||
[^8]: Matt Harris. [Snowflake: An Update and Some Very Important Information](https://groups.google.com/g/twitter-development-talk/c/ahbvo3VTIYI). Email to *Twitter Development Talk* mailing list, October 2010. Archived at [perma.cc/8UBV-MZ3D](https://perma.cc/8UBV-MZ3D)
|
||||
[^9]: Yakov Shafranovich. [RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files](https://tools.ietf.org/html/rfc4180). IETF, October 2005.
|
||||
[^10]: Andy Coates. [Evolving JSON Schemas - Part I](https://www.creekservice.org/articles/2024/01/08/json-schema-evolution-part-1.html) and [Part II](https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html). *creekservice.org*, January 2024. Archived at [perma.cc/MZW3-UA54](https://perma.cc/MZW3-UA54) and [perma.cc/GT5H-WKZ5](https://perma.cc/GT5H-WKZ5)
|
||||
[^11]: Pierre Genevès, Nabil Layaïda, and Vincent Quint. [Ensuring Query Compatibility with Evolving XML Schemas](https://arxiv.org/abs/0811.4324). INRIA Technical Report 6711, November 2008.
|
||||
[^12]: Tim Bray. [Bits On the Wire](https://www.tbray.org/ongoing/When/201x/2019/11/17/Bits-On-the-Wire). *tbray.org*, November 2019. Archived at [perma.cc/3BT3-BQU3](https://perma.cc/3BT3-BQU3)
|
||||
[^13]: Mark Slee, Aditya Agarwal, and Marc Kwiatkowski. [Thrift: Scalable Cross-Language Services Implementation](https://thrift.apache.org/static/files/thrift-20070401.pdf). Facebook technical report, April 2007. Archived at [perma.cc/22BS-TUFB](https://perma.cc/22BS-TUFB)
|
||||
[^14]: Martin Kleppmann. [Schema Evolution in Avro, Protocol Buffers and Thrift](https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html). *martin.kleppmann.com*, December 2012. Archived at [perma.cc/E4R2-9RJT](https://perma.cc/E4R2-9RJT)
|
||||
[^15]: Doug Cutting, Chad Walters, Jim Kellerman, et al. [[PROPOSAL] New Subproject: Avro](https://lists.apache.org/thread/z571w0r5jmfsjvnl0fq4fgg0vh28d3bk). Email thread on *hadoop-general* mailing list, *lists.apache.org*, April 2009. Archived at [perma.cc/4A79-BMEB](https://perma.cc/4A79-BMEB)
|
||||
[^16]: Apache Software Foundation. [Apache Avro 1.12.0 Specification](https://avro.apache.org/docs/1.12.0/specification/). *avro.apache.org*, August 2024. Archived at [perma.cc/C36P-5EBQ](https://perma.cc/C36P-5EBQ)
|
||||
[^17]: Apache Software Foundation. [Avro schemas as LL(1) CFG definitions](https://avro.apache.org/docs/1.12.0/api/java/org/apache/avro/io/parsing/doc-files/parsing.html). *avro.apache.org*, August 2024. Archived at [perma.cc/JB44-EM9Q](https://perma.cc/JB44-EM9Q)
|
||||
[^18]: Tony Hoare. [Null References: The Billion Dollar Mistake](https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/). Talk at *QCon London*, March 2009.
|
||||
[^19]: Confluent, Inc. [Schema Registry Overview](https://docs.confluent.io/platform/current/schema-registry/index.html). *docs.confluent.io*, 2024. Archived at [perma.cc/92C3-A9JA](https://perma.cc/92C3-A9JA)
|
||||
[^20]: Aditya Auradkar and Tom Quiggle. [Introducing Espresso—LinkedIn’s Hot New Distributed Document Store](https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store). *engineering.linkedin.com*, January 2015. Archived at [perma.cc/FX4P-VW9T](https://perma.cc/FX4P-VW9T)
|
||||
[^21]: Jay Kreps. [Putting Apache Kafka to Use: A Practical Guide to Building a Stream Data Platform (Part 2)](https://www.confluent.io/blog/event-streaming-platform-2/). *confluent.io*, February 2015. Archived at [perma.cc/8UA4-ZS5S](https://perma.cc/8UA4-ZS5S)
|
||||
[^22]: Gwen Shapira. [The Problem of Managing Schemas](https://www.oreilly.com/content/the-problem-of-managing-schemas/). *oreilly.com*, November 2014. Archived at [perma.cc/BY8Q-RYV3](https://perma.cc/BY8Q-RYV3)
|
||||
[^23]: John Larmouth. [*ASN.1 Complete*](https://www.oss.com/asn1/resources/books-whitepapers-pubs/larmouth-asn1-book.pdf). Morgan Kaufmann, 1999. ISBN: 978-0-122-33435-1. Archived at [perma.cc/GB7Y-XSXQ](https://perma.cc/GB7Y-XSXQ)
|
||||
[^24]: Burton S. Kaliski Jr. [A Layman’s Guide to a Subset of ASN.1, BER, and DER](https://luca.ntop.org/Teaching/Appunti/asn1.html). Technical Note, RSA Data Security, Inc., November 1993. Archived at [perma.cc/2LMN-W9U8](https://perma.cc/2LMN-W9U8)
|
||||
[^25]: Jacob Hoffman-Andrews. [A Warm Welcome to ASN.1 and DER](https://letsencrypt.org/docs/a-warm-welcome-to-asn1-and-der/). *letsencrypt.org*, April 2020. Archived at [perma.cc/CYT2-GPQ8](https://perma.cc/CYT2-GPQ8)
|
||||
[^26]: Lev Walkin. [Question: Extensibility and Dropping Fields](https://lionet.info/asn1c/blog/2010/09/21/question-extensibility-removing-fields/). *lionet.info*, September 2010. Archived at [perma.cc/VX8E-NLH3](https://perma.cc/VX8E-NLH3)
|
||||
[^27]: Jacqueline Xu. [Online migrations at scale](https://stripe.com/blog/online-migrations). *stripe.com*, February 2017. Archived at [perma.cc/X59W-DK7Y](https://perma.cc/X59W-DK7Y)
|
||||
[^28]: Geoffrey Litt, Peter van Hardenberg, and Orion Henry. [Project Cambria: Translate your data with lenses](https://www.inkandswitch.com/cambria/). Technical Report, *Ink & Switch*, October 2020. Archived at [perma.cc/WA4V-VKDB](https://perma.cc/WA4V-VKDB)
|
||||
[^29]: Pat Helland. [Data on the Outside Versus Data on the Inside](https://www.cidrdb.org/cidr2005/papers/P12.pdf). At *2nd Biennial Conference on Innovative Data Systems Research* (CIDR), January 2005.
|
||||
[^30]: Roy Thomas Fielding. [Architectural Styles and the Design of Network-Based Software Architectures](https://ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf). PhD Thesis, University of California, Irvine, 2000. Archived at [perma.cc/LWY9-7BPE](https://perma.cc/LWY9-7BPE)
|
||||
[^31]: Roy Thomas Fielding. [REST APIs must be hypertext-driven](https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven). *roy.gbiv.com*, October 2008. Archived at [perma.cc/M2ZW-8ATG](https://perma.cc/M2ZW-8ATG)
|
||||
[^32]: [OpenAPI Specification Version 3.1.0](https://swagger.io/specification/). *swagger.io*, February 2021. Archived at [perma.cc/3S6S-K5M4](https://perma.cc/3S6S-K5M4)
|
||||
[^33]: Michi Henning. [The Rise and Fall of CORBA](https://cacm.acm.org/practice/the-rise-and-fall-of-corba/). *Communications of the ACM*, volume 51, issue 8, pages 52–57, August 2008. [doi:10.1145/1378704.1378718](https://doi.org/10.1145/1378704.1378718)
|
||||
[^34]: Pete Lacey. [The S Stands for Simple](https://harmful.cat-v.org/software/xml/soap/simple). *harmful.cat-v.org*, November 2006. Archived at [perma.cc/4PMK-Z9X7](https://perma.cc/4PMK-Z9X7)
|
||||
[^35]: Stefan Tilkov. [Interview: Pete Lacey Criticizes Web Services](https://www.infoq.com/articles/pete-lacey-ws-criticism/). *infoq.com*, December 2006. Archived at [perma.cc/JWF4-XY3P](https://perma.cc/JWF4-XY3P)
|
||||
[^36]: Tim Bray. [The Loyal WS-Opposition](https://www.tbray.org/ongoing/When/200x/2004/09/18/WS-Oppo). *tbray.org*, September 2004. Archived at [perma.cc/J5Q8-69Q2](https://perma.cc/J5Q8-69Q2)
|
||||
[^37]: Andrew D. Birrell and Bruce Jay Nelson. [Implementing Remote Procedure Calls](https://www.cs.princeton.edu/courses/archive/fall03/cs518/papers/rpc.pdf). *ACM Transactions on Computer Systems* (TOCS), volume 2, issue 1, pages 39–59, February 1984. [doi:10.1145/2080.357392](https://doi.org/10.1145/2080.357392)
|
||||
[^38]: Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. [A Note on Distributed Computing](https://m.mirror.facebook.net/kde/devel/smli_tr-94-29.pdf). Sun Microsystems Laboratories, Inc., Technical Report TR-94-29, November 1994. Archived at [perma.cc/8LRZ-BSZR](https://perma.cc/8LRZ-BSZR)
|
||||
[^39]: Steve Vinoski. [Convenience over Correctness](https://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf). *IEEE Internet Computing*, volume 12, issue 4, pages 89–92, July 2008. [doi:10.1109/MIC.2008.75](https://doi.org/10.1109/MIC.2008.75)
|
||||
[^40]: Brandur Leach. [Designing robust and predictable APIs with idempotency](https://stripe.com/blog/idempotency). *stripe.com*, February 2017. Archived at [perma.cc/JD22-XZQT](https://perma.cc/JD22-XZQT)
|
||||
[^41]: Sam Rose. [Load Balancing](https://samwho.dev/load-balancing/). *samwho.dev*, April 2023. Archived at [perma.cc/Q7BA-9AE2](https://perma.cc/Q7BA-9AE2)
|
||||
[^42]: Troy Hunt. [Your API versioning is wrong, which is why I decided to do it 3 different wrong ways](https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/). *troyhunt.com*, February 2014. Archived at [perma.cc/9DSW-DGR5](https://perma.cc/9DSW-DGR5)
|
||||
[^43]: Brandur Leach. [APIs as infrastructure: future-proofing Stripe with versioning](https://stripe.com/blog/api-versioning). *stripe.com*, August 2017. Archived at [perma.cc/L63K-USFW](https://perma.cc/L63K-USFW)
|
||||
[^44]: Alexandre Alves, Assaf Arkin, Sid Askary, et al. [Web Services Business Process Execution Language Version 2.0](https://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.html). *docs.oasis-open.org*, April 2007.
|
||||
[^45]: [What is a Temporal Service?](https://docs.temporal.io/clusters) *docs.temporal.io*, 2024. Archived at [perma.cc/32P3-CJ9V](https://perma.cc/32P3-CJ9V)
|
||||
[^46]: Stephan Ewen. [Why we built Restate](https://restate.dev/blog/why-we-built-restate/). *restate.dev*, August 2023. Archived at [perma.cc/BJJ2-X75K](https://perma.cc/BJJ2-X75K)
|
||||
[^47]: Keith Tenzer and Joshua Smith. [Idempotency and Durable Execution](https://temporal.io/blog/idempotency-and-durable-execution). *temporal.io*, February 2024. Archived at [perma.cc/9LGW-PCLU](https://perma.cc/9LGW-PCLU)
|
||||
[^48]: [What is a Temporal Workflow?](https://docs.temporal.io/workflows) *docs.temporal.io*, 2024. Archived at [perma.cc/B5C5-Y396](https://perma.cc/B5C5-Y396)
|
||||
[^49]: Jack Kleeman. [Solving durable execution’s immutability problem](https://restate.dev/blog/solving-durable-executions-immutability-problem/). *restate.dev*, February 2024. Archived at [perma.cc/G55L-EYH5](https://perma.cc/G55L-EYH5)
|
||||
[^50]: Srinath Perera. [Exploring Event-Driven Architecture: A Beginner’s Guide for Cloud Native Developers](https://wso2.com/blogs/thesource/exploring-event-driven-architecture-a-beginners-guide-for-cloud-native-developers/). *wso2.com*, August 2023. Archived at [archive.org](https://web.archive.org/web/20240716204613/https%3A//wso2.com/blogs/thesource/exploring-event-driven-architecture-a-beginners-guide-for-cloud-native-developers/)
|
||||
[^51]: Philip A. Bernstein, Sergey Bykov, Alan Geller, Gabriel Kliot, and Jorgen Thelin. [Orleans: Distributed Virtual Actors for Programmability and Scalability](https://www.microsoft.com/en-us/research/publication/orleans-distributed-virtual-actors-for-programmability-and-scalability/). Microsoft Research Technical Report MSR-TR-2014-41, March 2014. Archived at [perma.cc/PD3U-WDMF](https://perma.cc/PD3U-WDMF)
|
||||
|
|
@ -15,10 +15,8 @@ network. As discussed in [“Distributed versus Single-Node Systems”](https://
|
|||
why you might want to replicate data:
|
||||
|
||||
* To keep data geographically close to your users (and thus reduce access latency)
|
||||
* To allow the system to continue working even if some of its parts have failed (and thus
|
||||
increase availability)
|
||||
* To scale out the number of machines that can serve read queries (and thus increase read
|
||||
throughput)
|
||||
* To allow the system to continue working even if some of its parts have failed (and thus increase availability)
|
||||
* To scale out the number of machines that can serve read queries (and thus increase read throughput)
|
||||
|
||||
In this chapter we will assume that your dataset is small enough that each machine can hold a copy of
|
||||
the entire dataset. In [Chapter 7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch07.html#ch_sharding) we will relax that assumption and discuss *sharding*
|
||||
|
|
@ -39,7 +37,7 @@ many different implementations. We will discuss the consequences of such choices
|
|||
|
||||
Replication of databases is an old topic—the principles haven’t changed much since they were
|
||||
studied in the 1970s
|
||||
[[1](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Lindsay1979_ch6)],
|
||||
[^1],
|
||||
because the fundamental constraints of networks have remained the same. Despite being so old,
|
||||
concepts such as *eventual consistency* still cause confusion. In [“Problems with Replication Lag”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_lag) we will
|
||||
get more precise about eventual consistency and discuss things like the *read-your-writes* and
|
||||
|
|
@ -74,7 +72,7 @@ longer contain the same data. The most common solution is called *leader-based r
|
|||
[Figure 6-1](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_leader_follower)):
|
||||
|
||||
1. One of the replicas is designated the *leader* (also known as *primary* or *source*
|
||||
[[2](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Gryp2020)]).
|
||||
[^2]).
|
||||
When clients want to write to the database, they must send their requests to the leader, which
|
||||
first writes the new data to its local storage.
|
||||
2. The other replicas are known as *followers* (*read replicas*, *secondaries*, or *hot standbys*).
|
||||
|
|
@ -97,15 +95,15 @@ multiple leaders for the same shard at the same time.
|
|||
|
||||
Single-leader replication is very widely used. It’s a built-in feature of many relational databases,
|
||||
such as PostgreSQL, MySQL, Oracle Data Guard
|
||||
[[3](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Oracle2019)],
|
||||
[^3],
|
||||
and SQL Server’s Always On Availability Groups
|
||||
[[4](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#AlwaysOn2012)].
|
||||
[^4].
|
||||
It is also used in some document databases such as MongoDB and DynamoDB
|
||||
[[5](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Elhemali2022_ch6)],
|
||||
[^5],
|
||||
message brokers such as Kafka, replicated block devices such as DRBD, and some network filesystems.
|
||||
Many consensus algorithms such as Raft, which is used for replication in CockroachDB
|
||||
[[6](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Taft2020_ch6)],
|
||||
TiDB [[7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Huang2020_ch6)],
|
||||
[^6],
|
||||
TiDB [^7],
|
||||
etcd, and RabbitMQ quorum queues (among others), are also based on a single leader, and
|
||||
automatically elect a new leader if the old one fails (we will discuss consensus in more detail in
|
||||
[Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency)).
|
||||
|
|
@ -114,7 +112,7 @@ automatically elect a new leader if the old one fails (we will discuss consensus
|
|||
|
||||
In older documents you may see the term *master–slave replication*. It means the same as
|
||||
leader-based replication, but the term should be avoided as it is widely considered offensive
|
||||
[[8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Knodel2023)].
|
||||
[^8].
|
||||
|
||||
## Synchronous Versus Asynchronous Replication
|
||||
|
||||
|
|
@ -174,7 +172,7 @@ processing writes, even if all of its followers have fallen behind.
|
|||
|
||||
Weakening durability may sound like a bad trade-off, but asynchronous replication is nevertheless
|
||||
widely used, especially if there are many followers or if they are geographically distributed
|
||||
[[9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hodges2018)].
|
||||
[^9].
|
||||
We will return to this issue in [“Problems with Replication Lag”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_lag).
|
||||
|
||||
## Setting Up New Followers
|
||||
|
|
@ -250,7 +248,7 @@ architecture that places less frequently accessed data on object storage while n
|
|||
accessed data is kept on faster storage devices such as SSDs, NVMe, or even in memory. Other systems
|
||||
use object storage as their primary storage tier, but use a separate low-latency storage system such
|
||||
as Amazon’s EBS or Neon’s Safekeepers
|
||||
[[12](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kelvich2022)])
|
||||
[^12]
|
||||
to store their WAL. Recently, some systems have gone even farther by adopting a
|
||||
*zero-disk architecture* (ZDA). ZDA-based systems persist all data to object storage and use disks
|
||||
and memory strictly for caching. This allows nodes to have no persistent state, which dramatically
|
||||
|
|
@ -312,7 +310,7 @@ consists of the following steps:
|
|||
2. *Choosing a new leader.* This could be done through an election process (where the leader is chosen by
|
||||
a majority of the remaining replicas), or a new leader could be appointed by a previously
|
||||
established *controller node*
|
||||
[[13](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Fontaine2021)].
|
||||
[^13].
|
||||
The best candidate for leadership is usually the replica with the most up-to-date data changes
|
||||
from the old leader (to minimize any data loss). Getting all the nodes to agree on a new leader
|
||||
is a consensus problem, discussed in detail in [Chapter 10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch10.html#ch_consistency).
|
||||
|
|
@ -333,7 +331,7 @@ Failover is fraught with things that can go wrong:
|
|||
* Discarding writes is especially dangerous if other storage systems outside of the database need to
|
||||
be coordinated with the database contents.
|
||||
For example, in one incident at GitHub
|
||||
[[14](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Newland2012)],
|
||||
[^14],
|
||||
an out-of-date MySQL follower
|
||||
was promoted to leader. The database used an autoincrementing counter to assign primary keys to
|
||||
new rows, but because the new leader’s counter lagged behind the old leader’s, it reused some
|
||||
|
|
@ -346,7 +344,7 @@ Failover is fraught with things that can go wrong:
|
|||
[“Multi-Leader Replication”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_multi_leader)), data is likely to be lost or corrupted. As a safety catch, some
|
||||
systems have a mechanism to shut down one node if two leaders are detected. However, if this
|
||||
mechanism is not carefully designed, you can end up with both nodes being shut down
|
||||
[[15](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Imbriaco2012_ch6)].
|
||||
[^15].
|
||||
Moreover, there is a risk that by the time the split brain is detected and the old node is shut
|
||||
down, it is already too late and data has already been corrupted.
|
||||
* What is the right timeout before the leader is declared dead? A longer timeout means a longer
|
||||
|
|
@ -413,7 +411,7 @@ Statement-based replication was used in MySQL before version 5.1. It is still so
|
|||
as it is quite compact, but by default MySQL now switches to row-based replication (discussed shortly) if
|
||||
there is any nondeterminism in a statement. VoltDB uses statement-based replication, and makes it
|
||||
safe by requiring transactions to be deterministic
|
||||
[[16](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hugg2015)].
|
||||
[^16].
|
||||
However, determinism can be hard to guarantee in practice, so many databases prefer other
|
||||
replication methods.
|
||||
|
||||
|
|
@ -464,17 +462,17 @@ indicating that the transaction was committed. MySQL keeps a separate logical re
|
|||
called the *binlog*, in addition to the WAL (when configured to use row-based replication).
|
||||
PostgreSQL implements logical replication by decoding the physical WAL into row
|
||||
insertion/update/delete events
|
||||
[[19](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kapila2023)].
|
||||
[^19].
|
||||
|
||||
Since a logical log is decoupled from the storage engine internals, it can more easily be kept
|
||||
backward compatible, allowing the leader and the follower to run different versions of the database
|
||||
software. This in turn enables upgrading to a new version with minimal downtime
|
||||
[[20](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Petchimuthu2021)].
|
||||
[^20].
|
||||
|
||||
A logical log format is also easier for external applications to parse. This aspect is useful if you want
|
||||
to send the contents of a database to an external system, such as a data warehouse for offline
|
||||
analysis, or for building custom indexes and caches
|
||||
[[21](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Sharma2015te_ch6)].
|
||||
[^21].
|
||||
This technique is called *change data capture*, and we will return to it in [Link to Come].
|
||||
|
||||
# Problems with Replication Lag
|
||||
|
|
@ -502,14 +500,14 @@ database: if you run the same query on the leader and a follower at the same tim
|
|||
different results, because not all writes have been reflected in the follower. This inconsistency is
|
||||
just a temporary state—if you stop writing to the database and wait a while, the followers will
|
||||
eventually catch up and become consistent with the leader. For that reason, this effect is known
|
||||
as *eventual consistency* [[22](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Terry2011)].
|
||||
as *eventual consistency* [^22].
|
||||
|
||||
###### Note
|
||||
|
||||
The term *eventual consistency* was coined by Douglas Terry et al.
|
||||
[[23](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Terry1994)],
|
||||
[^23],
|
||||
popularized by Werner Vogels
|
||||
[[24](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Vogels2008)],
|
||||
[^24],
|
||||
and became the battle cry of many NoSQL projects. However, not only NoSQL databases are eventually
|
||||
consistent: followers in an asynchronously replicated relational database have the same
|
||||
characteristics.
|
||||
|
|
@ -542,7 +540,7 @@ submitted was lost, so they will be understandably unhappy.
|
|||
###### Figure 6-3. A user makes a write, followed by a read from a stale replica. To prevent this anomaly, we need read-after-write consistency.
|
||||
|
||||
In this situation, we need *read-after-write consistency*, also known as *read-your-writes consistency*
|
||||
[[23](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Terry1994)].
|
||||
[^23].
|
||||
This is a guarantee that if the user reloads the page, they will always see any updates they
|
||||
submitted themselves. It makes no promises about other users: other users’ updates may not be
|
||||
visible until some later time. However, it reassures the user that their own input has been saved
|
||||
|
|
@ -563,14 +561,14 @@ are various possible techniques. To mention a few:
|
|||
scaling). In that case, other criteria may be used to decide whether to read from the leader. For
|
||||
example, you could track the time of the last update and, for one minute after the last update, make all
|
||||
reads from the leader
|
||||
[[25](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Willison2022)].
|
||||
[^25].
|
||||
You could also monitor the replication lag on followers and prevent queries on any follower that
|
||||
is more than one minute behind the leader.
|
||||
* The client can remember the timestamp of its most recent write—then the system can ensure that the
|
||||
replica serving any reads for that user reflects updates at least until that timestamp. If a
|
||||
replica is not sufficiently up to date, either the read can be handled by another replica or the
|
||||
query can wait until the replica has caught up
|
||||
[[26](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Tharakan2020)].
|
||||
[^26].
|
||||
The timestamp could be a *logical timestamp* (something that indicates ordering of writes, such as
|
||||
the log sequence number) or the actual system clock (in which case clock synchronization becomes
|
||||
critical; see [“Unreliable Clocks”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch09.html#sec_distributed_clocks)).
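The timestamp-based technique can be sketched as follows. This is a minimal illustration, assuming a logical timestamp in the form of a log sequence number (LSN); the `Replica` class, the `choose_replica` function, and the fallback to the leader are hypothetical choices made for the example, not the API of any particular database.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    applied_lsn: int  # highest log sequence number this replica has applied


def choose_replica(followers: List[Replica], leader: Replica,
                   last_write_lsn: int) -> Replica:
    """Pick a replica that already reflects the client's most recent write.

    last_write_lsn is the logical timestamp the client remembered from
    its latest write (assumed to be returned by the write path).
    """
    for follower in followers:
        if follower.applied_lsn >= last_write_lsn:
            return follower  # caught up far enough to serve this client
    # No follower is sufficiently up to date: read from the leader instead.
    # (Waiting for a follower to catch up would be the other option.)
    return leader


leader = Replica("leader", applied_lsn=120)
followers = [Replica("f1", applied_lsn=100), Replica("f2", applied_lsn=115)]
```

With these values, a client whose last write had LSN 110 would be served by `f2`, while a client whose last write had LSN 120 would be routed to the leader.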
|
||||
|
|
@ -632,7 +630,7 @@ and then see it disappear again.
|
|||
|
||||
###### Figure 6-4. A user first reads from a fresh replica, then from a stale replica. Time appears to go backward. To prevent this anomaly, we need monotonic reads.
|
||||
|
||||
*Monotonic reads* [[22](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Terry2011)] is a guarantee that this
|
||||
*Monotonic reads* [^22] is a guarantee that this
|
||||
kind of anomaly does not happen. It’s a lesser guarantee than strong consistency, but a stronger
|
||||
guarantee than eventual consistency. When you read data, you may see an old value; monotonic reads
|
||||
only means that if one user makes several reads in sequence, they will not see time go
|
||||
|
|
@ -669,14 +667,14 @@ Mr. Poons
|
|||
|
||||
To the observer it looks as though Mrs. Cake is answering the question before Mr. Poons has even asked
|
||||
it. Such psychic powers are impressive, but very confusing
|
||||
[[27](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Pratchett1991)].
|
||||
[^27].
|
||||
|
||||

|
||||
|
||||
###### Figure 6-5. If some shards are replicated slower than others, an observer may see the answer before they see the question.
|
||||
|
||||
Preventing this kind of anomaly requires another type of guarantee: *consistent prefix reads*
|
||||
[[22](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Terry2011)]. This guarantee says that if a sequence of
|
||||
[^22]. This guarantee says that if a sequence of
|
||||
writes happens in a certain order, then anyone reading those writes will see them appear in the same
|
||||
order.
|
||||
|
||||
|
|
@ -811,7 +809,7 @@ Consistency
|
|||
with another write on another leader.
|
||||
|
||||
This is simply a fundamental limitation of distributed systems
|
||||
[[28](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Bailis2014coord_ch6)].
|
||||
[^28].
|
||||
If you need to enforce such constraints, you’re therefore better off with a single-leader system.
|
||||
However, as we will see in [“Dealing with Conflicting Writes”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_write_conflicts), multi-leader systems can still
|
||||
achieve consistency properties that are useful in a wide range of apps that don’t need such
|
||||
|
|
@ -820,13 +818,13 @@ Consistency
|
|||
Multi-leader replication is less common than single-leader replication, but it is still supported by
|
||||
many databases, including MySQL, Oracle, SQL Server, and YugabyteDB. In some cases it is an external
|
||||
add-on feature, for example in Redis Enterprise, EDB Postgres Distributed, and pglogical
|
||||
[[29](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Raja2022)].
|
||||
[^29].
|
||||
|
||||
As multi-leader replication is a somewhat retrofitted feature in many databases, there are often
|
||||
subtle configuration pitfalls and surprising interactions with other database features. For example,
|
||||
autoincrementing keys, triggers, and integrity constraints can be problematic. For this reason,
|
||||
multi-leader replication is often considered dangerous territory that should be avoided if possible
|
||||
[[30](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hodges2012)].
|
||||
[^30].
|
||||
|
||||
### Multi-leader replication topologies
|
||||
|
||||
|
|
@ -857,7 +855,7 @@ In circular and star topologies, a write may need to pass through several nodes
|
|||
all replicas. Therefore, nodes need to forward data changes they receive from other nodes. To
|
||||
prevent infinite replication loops, each node is given a unique identifier, and in the replication
|
||||
log, each write is tagged with the identifiers of all the nodes it has passed through
|
||||
[[31](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#HBase7709)].
|
||||
[^31].
|
||||
When a node receives a data change that is tagged with its own identifier, that data change is
|
||||
ignored, because the node knows that it has already been processed.
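The loop-prevention rule can be illustrated with a small simulation of a circular topology. This is a sketch under simplifying assumptions (synchronous in-process forwarding, one outgoing link per node); the `Change` and `Node` classes and their field names are hypothetical and do not correspond to any real replication log format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Change:
    key: str
    value: str
    seen_by: Tuple[str, ...] = ()  # identifiers of nodes this change passed through


@dataclass
class Node:
    node_id: str
    data: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # number of changes this node has applied
    next_node: Optional["Node"] = field(default=None, repr=False, compare=False)

    def local_write(self, key: str, value: str) -> None:
        self.receive(Change(key, value))

    def receive(self, change: Change) -> None:
        if self.node_id in change.seen_by:
            return  # tagged with our own identifier: already processed, so ignore
        self.data[change.key] = change.value
        self.applied += 1
        if self.next_node is not None:
            # Tag the change with our identifier before forwarding it.
            tagged = Change(change.key, change.value,
                            change.seen_by + (self.node_id,))
            self.next_node.receive(tagged)


a, b, c = Node("A"), Node("B"), Node("C")
a.next_node, b.next_node, c.next_node = b, c, a  # circular topology A -> B -> C -> A
a.local_write("x", "1")
```

The write travels once around the ring: each node applies it exactly once, and when it arrives back at `A` carrying `A`'s identifier it is dropped instead of circulating forever.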
|
||||
|
||||
|
|
@ -949,13 +947,13 @@ existed for a long time, the term has recently gained attention
|
|||
[37](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Jayakar2024)].
|
||||
An application that allows a user to continue editing a file while offline (which may be implemented
|
||||
using a sync engine) is called *offline-first*
|
||||
[[38](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Feyerke2013)].
|
||||
[^38].
|
||||
The term *local-first software* refers to collaborative apps that are not only offline-first, but
|
||||
are also designed to continue working even if the developer who made the software shuts down all of
|
||||
their online services [[39](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kleppmann2019_ch6)].
|
||||
their online services [^39].
|
||||
This can be achieved by using a sync engine with an open standard sync protocol for which multiple
|
||||
service providers are available
|
||||
[[40](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kleppmann2024lofi)].
|
||||
[^40].
|
||||
For example, Git is a local-first collaboration system (albeit one that doesn’t support real-time
|
||||
collaboration) since you can sync via GitHub, GitLab, or any other repository hosting service.
|
||||
|
||||
|
|
@ -979,11 +977,11 @@ approach has a number of advantages:
|
|||
[“The problems with remote procedure calls (RPCs)”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch05.html#sec_problems_with_rpc): for example, if a request to update data on a server fails, the user
|
||||
interface needs to somehow reflect that error. A sync engine allows the app to perform reads and
|
||||
writes on local data, which almost never fail, leading to a more declarative programming style
|
||||
[[41](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hofmeyr2024)].
|
||||
[^41].
|
||||
* In order to display edits from other users in real-time, you need to receive notifications of
|
||||
those edits and efficiently update the user interface accordingly. A sync engine combined with a
|
||||
*reactive programming* model is a good way of implementing this
|
||||
[[42](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#vanHardenberg2020)].
|
||||
[^42].
|
||||
|
||||
Sync engines work best when all the data that the user may need is downloaded in advance and stored
|
||||
persistently on the client. This means that the data is available for offline access when needed,
|
||||
|
|
@ -993,7 +991,7 @@ of data. For example, downloading all the files that the user themselves created
|
|||
e-commerce website probably doesn’t make sense.
|
||||
|
||||
The sync engine was pioneered by Lotus Notes in the 1980s
|
||||
[[43](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kawell1988)]
|
||||
[^43]
|
||||
(without using that term), and sync for specific apps such as calendars has also existed for a long
|
||||
time. Today there are a number of general-purpose sync engines, some of which use a proprietary
|
||||
backend service (e.g., Google Firestore, Realm, or Ditto), and some have an open source backend,
|
||||
|
|
@ -1003,7 +1001,7 @@ Multiplayer video games have a similar need to respond immediately to the user
|
|||
reconcile them with other players’ actions received asynchronously over the network. In game
|
||||
development jargon the equivalent of a sync engine is called *netcode*. The techniques used in
|
||||
netcode are quite specific to the requirements of games
|
||||
[[44](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Pusch2019)], and don’t directly
|
||||
[^44], and don’t directly
|
||||
carry over to other types of software, so we won’t consider them further in this book.
|
||||
|
||||
## Dealing with Conflicting Writes
|
||||
|
|
@ -1040,7 +1038,7 @@ One strategy for conflicts is to avoid them occurring in the first place. For ex
|
|||
application can ensure that all writes for a particular record go through the same leader, then
|
||||
conflicts cannot occur, even if the database as a whole is multi-leader. This approach is not
|
||||
possible in the case of a sync engine client being updated offline, but it is sometimes possible in
|
||||
geo-replicated server systems [[30](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hodges2012)].
|
||||
geo-replicated server systems [^30].
|
||||
|
||||
For example, in an application where a user can only edit their own data, you can ensure that
|
||||
requests from a particular user are always routed to the same region and use the leader in that
|
||||
|
|
@ -1126,7 +1124,7 @@ suffers from a number of problems:
|
|||
union of the carts). This meant that if the customer had removed an item from their cart in one
|
||||
sibling, but another sibling still contained that old item, the removed item would unexpectedly
|
||||
reappear in the customer’s cart
|
||||
[[45](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#DeCandia2007_ch6)].
|
||||
[^45].
|
||||
[Figure 6-10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#fig_replication_amazon_anomaly) shows an example where Device 1 removes Book from the shopping
|
||||
cart and concurrently Device 2 removes DVD, but after merging the conflict both items reappear.
|
||||
* If multiple nodes observe the conflict and concurrently resolve it, the conflict resolution
|
||||
|
|
@ -1177,8 +1175,8 @@ then conflict resolution is inevitable, and automating it is often the best appr
|
|||
|
||||
Two families of algorithms are commonly used to implement automatic conflict resolution:
|
||||
*Conflict-free replicated datatypes* (CRDTs)
|
||||
[[46](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Shapiro2011)] and *Operational Transformation* (OT)
|
||||
[[47](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Sun1998)].
|
||||
[^46] and *Operational Transformation* (OT)
|
||||
[^47].
|
||||
They have different design philosophies and performance characteristics, but both are able to
|
||||
perform automatic merges for all the aforementioned types of data.
|
||||
|
||||
|
|
@ -1214,12 +1212,12 @@ There are many algorithms based on variations of these ideas. Lists/arrays can b
|
|||
similarly, using list elements instead of characters, and other datatypes such as key-value maps can
|
||||
be added quite easily. There are some performance and functionality trade-offs between OT and CRDTs,
|
||||
but it’s possible to combine the advantages of CRDTs and OT in one algorithm
|
||||
[[48](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Gentle2025)].
|
||||
[^48].
|
||||
|
||||
OT is most often used for real-time collaborative editing of text, e.g. in Google Docs
|
||||
[[32](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#DayRichter2010)], whereas CRDTs can be found in
|
||||
[^32], whereas CRDTs can be found in
|
||||
distributed databases such as Redis Enterprise, Riak, and Azure Cosmos DB
|
||||
[[49](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Shukla2018)].
|
||||
[^49].
|
||||
Sync engines for JSON data can be implemented both with CRDTs (e.g., Automerge or Yjs) and with OT
|
||||
(e.g., ShareDB).
|
||||
|
||||
|
|
@ -1256,17 +1254,17 @@ systems were leaderless [[1](https://learning.oreilly.com/library/view/designing
|
|||
[50](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Gifford1979)], but the
|
||||
idea was mostly forgotten during the era of dominance of relational databases. It once again became
|
||||
a fashionable architecture for databases after Amazon used it for its in-house *Dynamo* system in
|
||||
2007 [[45](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#DeCandia2007_ch6)].
|
||||
2007 [^45].
|
||||
Riak, Cassandra, and ScyllaDB are open source datastores with leaderless replication models inspired
|
||||
by Dynamo, so this kind of database is also known as *Dynamo-style*.
|
||||
|
||||
###### Note
|
||||
|
||||
The original *Dynamo* system was only described in a paper
|
||||
[[45](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#DeCandia2007_ch6)], but never released outside of
|
||||
[^45], but never released outside of
|
||||
Amazon. The similarly named *DynamoDB* is a more recent cloud database from AWS, but it has a
|
||||
completely different architecture: it uses single-leader replication based on the Multi-Paxos
|
||||
consensus algorithm [[5](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Elhemali2022_ch6)].
|
||||
consensus algorithm [^5].
|
||||
|
||||
In some leaderless implementations, the client directly sends its writes to several replicas, while
|
||||
in others, a coordinator node does this on behalf of the client. However, unlike a leader database,
|
||||
|
|
@ -1348,7 +1346,7 @@ considered successful, and we must query at least *r* nodes for each read. (In o
|
|||
*n* = 3, *w* = 2, *r* = 2.) As long as *w* + *r* >
|
||||
*n*, we expect to get an up-to-date value when reading, because at least one of the *r* nodes we’re
|
||||
reading from must be up to date. Reads and writes that obey these *r* and *w* values are called
|
||||
*quorum* reads and writes [[50](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Gifford1979)].
|
||||
*quorum* reads and writes [^50].
|
||||
You can think of *r* and *w* as the minimum number of votes required for the read or write to be
|
||||
valid.
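The quorum condition is simple arithmetic, and the overlap argument can be checked by brute force for small clusters. The following is a minimal sketch; the function names `is_quorum`, `min_overlap`, and `always_overlap` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_quorum(n: int, w: int, r: int) -> bool:
    """True if every set of r nodes must share a node with every set of w nodes."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


def min_overlap(n: int, w: int, r: int) -> int:
    """Smallest possible number of nodes common to a write set and a read set."""
    return max(0, w + r - n)


def always_overlap(n: int, w: int, r: int) -> bool:
    """Brute-force check: enumerate all write sets and read sets and intersect them."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


print(is_quorum(3, 2, 2))    # the n = 3, w = 2, r = 2 configuration
print(min_overlap(3, 2, 2))  # how many nodes are guaranteed to be in both sets
print(is_quorum(3, 1, 1))    # w + r <= n, so the condition does not hold
```

For *n* = 3, *w* = 2, *r* = 2 the read and write sets always share at least one node. Note that this only establishes that the sets overlap; it says nothing about the edge cases that can still lead to stale reads in a real system.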
|
||||
|
||||
|
|
@ -1402,7 +1400,7 @@ Often, *r* and *w* are chosen to be a majority (more than *n*/2) of nodes, becau
|
|||
not necessarily majorities—it only matters that the sets of nodes used by the read and write
|
||||
operations overlap in at least one node. Other quorum assignments are possible, which allows some
|
||||
flexibility in the design of distributed algorithms
|
||||
[[51](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Howard2016_ch6)].
|
||||
[^51].
|
||||
|
||||
You may also set *w* and *r* to smaller numbers, so that *w* + *r* ≤ *n* (i.e.,
|
||||
the quorum condition is not satisfied). In this case, reads and writes will still be sent to *n*
|
||||
|
|
@ -1432,7 +1430,7 @@ properties can be confusing. Some scenarios include:
|
|||
nodes are full), and overall succeeded on fewer than *w* replicas, it is not rolled back on the
|
||||
replicas where it succeeded. This means that if a write was reported as failed, subsequent reads
|
||||
may or may not return the value from that write
|
||||
[[52](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Blomstedt2012ricon)].
|
||||
[^52].
|
||||
* If the database uses timestamps from a real-time clock to determine which write is newer (as
|
||||
Cassandra and ScyllaDB do, for example), writes might be silently dropped if another node with a
|
||||
faster clock has written to the same key—an issue we previously saw in [“Last write wins (discarding concurrent writes)”](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#sec_replication_lww).
|
||||
|
|
@ -1445,7 +1443,7 @@ properties can be confusing. Some scenarios include:
|
|||
Thus, although quorums appear to guarantee that a read returns the latest written value, in practice
|
||||
it is not so simple. Dynamo-style databases are generally optimized for use cases that can tolerate
|
||||
eventual consistency. The parameters *w* and *r* allow you to adjust the probability of stale values
|
||||
being read [[53](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Bailis2014pbs)],
|
||||
being read [^53],
|
||||
but it’s wise to not take them as absolute guarantees.
|
||||
|
||||
### Monitoring staleness
|
||||
|
|
@ -1464,7 +1462,7 @@ current position, you can measure the amount of replication lag.
|
|||
However, in systems with leaderless replication, there is no fixed order in which writes are
|
||||
applied, which makes monitoring more difficult. The number of hints that a replica stores for
|
||||
handoff can be one measure of system health, but it’s difficult to interpret usefully
|
||||
[[54](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Breck2019)].
|
||||
[^54].
|
||||
Eventual consistency is a deliberately vague guarantee, but for operability it’s important to be
|
||||
able to quantify “eventual.”
|
||||
|
||||
|
|
@ -1493,13 +1491,13 @@ Because there is no failover, and requests go to multiple replicas in parallel a
|
|||
becoming slow or unavailable has very little impact on response times: the client simply uses the
|
||||
responses from the other replicas that are faster to respond. Using the fastest responses is called
|
||||
*request hedging*, and it can significantly reduce tail latency
|
||||
[[55](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Dean2013_ch6)]).
|
||||
[^55].
|
||||
|
||||
At its core, the resilience of a leaderless system comes from the fact that it doesn’t distinguish
|
||||
between the normal case and the failure case. This is especially helpful when handling so-called
|
||||
*gray failures*, in which a node isn’t completely down, but running in a degraded state where it is
|
||||
unusually slow to handle requests
|
||||
[[56](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Huang2017_ch6)],
|
||||
[^56],
|
||||
or when a node is simply overloaded (for example, if a node has been offline for a while, recovery
|
||||
via hinted handoff can cause a lot of additional load). A leader-based system has to decide whether
|
||||
the situation is bad enough to warrant a failover (which can itself cause further disruption),
|
||||
|
|
@ -1511,7 +1509,7 @@ That said, leaderless systems can have performance problems as well:
|
|||
another replica is unavailable so that it can store hints about writes that the unavailable
|
||||
replica missed. When the unavailable replica comes back, the handoff process needs to send it
|
||||
those hints. This puts additional load on the replicas at a time when the system is already under
|
||||
strain [[54](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Breck2019)].
|
||||
strain [^54].
|
||||
* The more replicas you have, the bigger the size of your quorums, and the more responses you have
|
||||
to wait for before a request can complete. Even if you wait only for the fastest *r* or *w*
|
||||
replicas to respond, and even if you make the requests in parallel, a bigger *r* or *w* increases
|
||||
|
|
@ -1521,7 +1519,7 @@ That said, leaderless systems can have performance problems as well:
make it impossible to form a quorum. Some leaderless databases offer a configuration option that
allows any reachable replica to accept writes, even if it’s not one of the usual replicas for that
key (Riak and Dynamo call this a *sloppy quorum*
[[45](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#DeCandia2007_ch6)];
[^45];
Cassandra and ScyllaDB call it *consistency level ANY*). There is no guarantee that subsequent
reads will see the written value, but depending on the application it may still be better than
having the write fail.
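The difference between a strict and a sloppy quorum can be sketched as a choice of write targets. The following is a simplified illustration, not the algorithm of Riak, Dynamo, Cassandra, or ScyllaDB: the function name `choose_write_targets`, the node names, and the rule of picking one stand-in per unreachable home replica are all assumptions made for the example.

```python
def choose_write_targets(home_replicas, all_nodes, reachable, w, sloppy):
    """Pick the nodes that will accept a write for one key.

    Returns (targets, hints). hints maps each stand-in node to the
    unreachable home replica whose write it is holding (hinted handoff).
    """
    targets = [node for node in home_replicas if node in reachable]
    hints = {}
    if len(targets) >= w:
        return targets, hints  # a strict quorum is available
    if not sloppy:
        raise RuntimeError("cannot reach a quorum of the usual replicas")

    # Sloppy quorum: let other reachable nodes stand in for the missing ones.
    missing = [node for node in home_replicas if node not in reachable]
    spares = [
        node for node in all_nodes
        if node in reachable and node not in home_replicas
    ]
    for stand_in, absent in zip(spares, missing):
        targets.append(stand_in)
        hints[stand_in] = absent
    if len(targets) < w:
        raise RuntimeError("too few reachable nodes even for a sloppy quorum")
    return targets, hints
```

In this sketch a later read that contacts only the usual replicas may miss a value that was accepted by a stand-in node, which is why a subsequent read is not guaranteed to see the write.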
@ -1603,7 +1601,7 @@ An operation A *happens before* another operation B if B knows about A, or depen
upon A in some way. Whether one operation happens before another operation is the key to defining
what concurrency means. In fact, we can simply say that two operations are *concurrent* if neither
happens before the other (i.e., neither knows about the other)
[[57](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Lamport1978_ch6)].
[^57].

Thus, whenever you have two operations A and B, there are three possibilities: either A happened
before B, or B happened before A, or A and B are concurrent. What we need is an algorithm to tell us
@ -1621,7 +1619,7 @@ at exactly the same time—an issue we will discuss in more detail in [Chapter
For defining concurrency, exact time doesn’t matter: we simply call two operations concurrent if
they are both unaware of each other, regardless of the physical time at which they occurred. People
sometimes make a connection between this principle and the special theory of relativity in physics
[[57](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Lamport1978_ch6)], which introduced the idea that
[^57], which introduced the idea that
information cannot travel faster than the speed of light. Consequently, two events that occur some
distance apart cannot possibly affect each other if the time between the events is shorter than the
time it takes light to travel the distance between them.
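The definition of concurrency used here depends only on what each operation knew about, not on timestamps. A minimal sketch of that idea follows, assuming each operation explicitly records the IDs of the operations it was aware of; the `Operation` class and the `relation` function are hypothetical names chosen for illustration, and real systems track this information far more compactly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    op_id: str
    # IDs of the operations this one knew about when it was issued
    seen: frozenset = field(default_factory=frozenset)


def relation(a, b):
    """Classify two operations using only what each one knew about."""
    if a.op_id in b.seen:
        return "a happened before b"
    if b.op_id in a.seen:
        return "b happened before a"
    return "concurrent"  # neither knew about the other
```

For example, if `b` was issued after reading the result of `a`, then `relation(a, b)` reports that `a` happened before `b`, while two operations with empty `seen` sets are reported as concurrent no matter when they physically occurred.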
@ -1719,7 +1717,7 @@ version numbers it has seen from each of the other replicas. This information in
to overwrite and which values to keep as siblings.

The collection of version numbers from all the replicas is called a *version vector*
[[58](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#ParkerJr1983)].
[^58].
A few variants of this idea are in use, but the most interesting is probably the *dotted version
vector*
[[59](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Preguica2010),
@ -1827,350 +1825,71 @@ machine to store only a subset of the data.
##### Footnotes

##### References

[[1](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Lindsay1979_ch6-marker)] B. G. Lindsay, P. G. Selinger, C. Galtieri, J. N.
Gray, R. A. Lorie, T. G. Price, F. Putzolu, I. L. Traiger, and B. W. Wade.
[Notes on Distributed Databases](https://dominoweb.draco.res.ibm.com/reports/RJ2571.pdf).
IBM Research, Research Report RJ2571(33471), July 1979.
Archived at [perma.cc/EPZ3-MHDD](https://perma.cc/EPZ3-MHDD)

[[2](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Gryp2020-marker)] Kenny Gryp.
[MySQL Terminology
Updates](https://dev.mysql.com/blog-archive/mysql-terminology-updates/). *dev.mysql.com*, July 2020.
Archived at [perma.cc/S62G-6RJ2](https://perma.cc/S62G-6RJ2)

[[3](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Oracle2019-marker)] Oracle Corporation.
[Oracle
(Active) Data Guard 19c: Real-Time Data Protection and Availability](https://www.oracle.com/technetwork/database/availability/dg-adg-technical-overview-wp-5347548.pdf). White Paper, *oracle.com*, March 2019.
Archived at [perma.cc/P5ST-RPKE](https://perma.cc/P5ST-RPKE)

[[4](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#AlwaysOn2012-marker)] Microsoft.
[What
is an Always On availability group?](https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server) *learn.microsoft.com*, September 2024.
Archived at [perma.cc/ABH6-3MXF](https://perma.cc/ABH6-3MXF)

[[5](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Elhemali2022_ch6-marker)] Mostafa Elhemali, Niall Gallagher, Nicholas
Gordon, Joseph Idziorek, Richard Krog, Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somu
Perianayagam, Tim Rath, Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul,
Doug Terry, and Akshat Vig.
[Amazon DynamoDB: A Scalable,
Predictably Performant, and Fully Managed NoSQL Database Service](https://www.usenix.org/conference/atc22/presentation/elhemali). At *USENIX Annual Technical
Conference* (ATC), July 2022.

[[6](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Taft2020_ch6-marker)] Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan
VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, Paul
Bardea, Amruta Ranade, Ben Darnell, Bram Gruneir, Justin Jaffray, Lucy Zhang, and Peter Mattis.
[CockroachDB: The Resilient
Geo-Distributed SQL Database](https://dl.acm.org/doi/abs/10.1145/3318464.3386134). At *ACM SIGMOD International Conference on Management of
Data* (SIGMOD), pages 1493–1509, June 2020.
[doi:10.1145/3318464.3386134](https://doi.org/10.1145/3318464.3386134)

[[7](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Huang2020_ch6-marker)] Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang,
Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, Wan Wei, Cong Liu, Jian Zhang,
Jianjun Li, Xuelian Wu, Lingyu Song, Ruoxi Sun, Shuaipeng Yu, Lei Zhao, Nicholas Cameron, Liquan
Pei, and Xin Tang.
[TiDB: a Raft-based HTAP database](https://www.vldb.org/pvldb/vol13/p3072-huang.pdf).
*Proceedings of the VLDB Endowment*, volume 13, issue 12, pages 3072–3084.
[doi:10.14778/3415478.3415535](https://doi.org/10.14778/3415478.3415535)

[[8](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Knodel2023-marker)] Mallory Knodel and Niels ten Oever.
[Terminology, Power, and
Inclusive Language in Internet-Drafts and RFCs](https://www.ietf.org/archive/id/draft-knodel-terminology-14.html). *IETF Internet-Draft*, August 2023.
Archived at [perma.cc/5ZY9-725E](https://perma.cc/5ZY9-725E)

[[9](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hodges2018-marker)] Buck Hodges.
[Postmortem: VSTS 4 September 2018](https://devblogs.microsoft.com/devopsservice/?p=17485).
*devblogs.microsoft.com*, September 2018.
Archived at [perma.cc/ZF5R-DYZS](https://perma.cc/ZF5R-DYZS)

[[10](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Morling2024_ch6-marker)] Gunnar Morling.
[Leader
Election With S3 Conditional Writes](https://www.morling.dev/blog/leader-election-with-s3-conditional-writes/). *www.morling.dev*, August 2024.
Archived at [perma.cc/7V2N-J78Y](https://perma.cc/7V2N-J78Y)

[[11](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Chandramohan2024-marker)] Vignesh Chandramohan, Rohan Desai, and Chris Riccomini.
[SlateDB Manifest
Design](https://github.com/slatedb/slatedb/blob/main/rfcs/0001-manifest.md). *github.com*, May 2024.
Archived at [perma.cc/8EUY-P32Z](https://perma.cc/8EUY-P32Z)

[[12](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kelvich2022-marker)] Stas Kelvich.
[Why does Neon use Paxos instead of Raft, and what’s the
difference?](https://neon.tech/blog/paxos) *neon.tech*, August 2022.
Archived at [perma.cc/SEZ4-2GXU](https://perma.cc/SEZ4-2GXU)

[[13](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Fontaine2021-marker)] Dimitri Fontaine.
[An
introduction to the pg\_auto\_failover project](https://tapoueh.org/blog/2021/11/an-introduction-to-the-pg_auto_failover-project/). *tapoueh.org*, November 2021.
Archived at [perma.cc/3WH5-6BAF](https://perma.cc/3WH5-6BAF)

[[14](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Newland2012-marker)] Jesse Newland.
[GitHub
availability this week](https://github.blog/news-insights/the-library/github-availability-this-week/). *github.blog*, September 2012.
Archived at [perma.cc/3YRF-FTFJ](https://perma.cc/3YRF-FTFJ)

[[15](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Imbriaco2012_ch6-marker)] Mark Imbriaco.
[Downtime last Saturday](https://github.blog/news-insights/the-library/downtime-last-saturday/).
*github.blog*, December 2012.
Archived at [perma.cc/M7X5-E8SQ](https://perma.cc/M7X5-E8SQ)

[[16](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hugg2015-marker)] John Hugg.
[‘All In’ with Determinism for Performance and
Testing in Distributed Systems](https://www.youtube.com/watch?v=gJRj3vJL4wE). At *Strange Loop*, September 2015.

[[17](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Suzuki2017_ch6-marker)] Hironobu Suzuki.
[The Internals of PostgreSQL](https://www.interdb.jp/pg/). *interdb.jp*, 2017.

[[18](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kapila2012-marker)] Amit Kapila.
[WAL
Internals of PostgreSQL](https://www.pgcon.org/2012/schedule/attachments/258_212_Internals%20Of%20PostgreSQL%20Wal.pdf). At *PostgreSQL Conference* (PGCon), May 2012.
Archived at [perma.cc/6225-3SUX](https://perma.cc/6225-3SUX)

[[19](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kapila2023-marker)] Amit Kapila.
[Evolution
of Logical Replication](https://amitkapila16.blogspot.com/2023/09/evolution-of-logical-replication.html). *amitkapila16.blogspot.com*, September 2023.
Archived at [perma.cc/F9VX-JLER](https://perma.cc/F9VX-JLER)

[[20](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Petchimuthu2021-marker)] Aru Petchimuthu.
[Upgrade
your Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL database, Part 2: Using the pglogical
extension](https://aws.amazon.com/blogs/database/part-2-upgrade-your-amazon-rds-for-postgresql-database-using-the-pglogical-extension/). *aws.amazon.com*, August 2021.
Archived at [perma.cc/RXT8-FS2T](https://perma.cc/RXT8-FS2T)

[[21](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Sharma2015te_ch6-marker)] Yogeshwer Sharma, Philippe Ajoux, Petchean
Ang, David Callies, Abhishek Choudhary, Laurent Demailly, Thomas Fersch, Liat Atsmon Guz, Andrzej
Kotulski, Sachin Kulkarni, Sanjeev Kumar, Harry Li, Jun Li, Evgeniy Makeev, Kowshik Prakasam,
Robbert van Renesse, Sabyasachi Roy, Pratyush Seth, Yee Jiun Song, Benjamin Wester, Kaushik
Veeraraghavan, and Peter Xie.
[Wormhole:
Reliable Pub-Sub to Support Geo-Replicated Internet Services](https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf). At *12th USENIX
Symposium on Networked Systems Design and Implementation* (NSDI), May 2015.

[[22](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Terry2011-marker)] Douglas B. Terry.
[Replicated
Data Consistency Explained Through Baseball](https://www.microsoft.com/en-us/research/publication/replicated-data-consistency-explained-through-baseball/). Microsoft Research, Technical Report
MSR-TR-2011-137, October 2011.
Archived at [perma.cc/F4KZ-AR38](https://perma.cc/F4KZ-AR38)

[[23](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Terry1994-marker)] Douglas B. Terry, Alan J. Demers, Karin Petersen,
Mike J. Spreitzer, Marvin M. Theimer, and Brent B. Welch.
[Session Guarantees
for Weakly Consistent Replicated Data](https://csis.pace.edu/~marchese/CS865/Papers/SessionGuaranteesPDIS.pdf). At *3rd International Conference on Parallel and
Distributed Information Systems* (PDIS), September 1994.
[doi:10.1109/PDIS.1994.331722](https://doi.org/10.1109/PDIS.1994.331722)

[[24](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Vogels2008-marker)] Werner Vogels.
[Eventually Consistent](https://queue.acm.org/detail.cfm?id=1466448).
*ACM Queue*, volume 6, issue 6, pages 14–19, October 2008.
[doi:10.1145/1466443.1466448](https://doi.org/10.1145/1466443.1466448)

[[25](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Willison2022-marker)] Simon Willison.
[Reply to: “My thoughts about Fly.io (so
far) and other newish technology I’m getting into”](https://news.ycombinator.com/item?id=31434055). *news.ycombinator.com*, May 2022.
Archived at [perma.cc/ZRV4-WWV8](https://perma.cc/ZRV4-WWV8)

[[26](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Tharakan2020-marker)] Nithin Tharakan.
[Scaling Bitbucket’s
Database](https://www.atlassian.com/blog/bitbucket/scaling-bitbuckets-database). *atlassian.com*, October 2020.
Archived at [perma.cc/JAB7-9FGX](https://perma.cc/JAB7-9FGX)

[[27](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Pratchett1991-marker)] Terry Pratchett. *Reaper Man: A Discworld
Novel*. Victor Gollancz, 1991. ISBN: 978-0-575-04979-6

[[28](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Bailis2014coord_ch6-marker)] Peter Bailis, Alan Fekete, Michael J.
Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica.
[Coordination Avoidance in Database Systems](https://arxiv.org/abs/1402.2237).
*Proceedings of the VLDB Endowment*, volume 8, issue 3, pages 185–196, November 2014.
[doi:10.14778/2735508.2735509](https://doi.org/10.14778/2735508.2735509)

[[29](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Raja2022-marker)] Yaser Raja and Peter Celentano.
[PostgreSQL
bi-directional replication using pglogical](https://aws.amazon.com/blogs/database/postgresql-bi-directional-replication-using-pglogical/). *aws.amazon.com*, January 2022.
Archived at [perma.cc/BUQ2-5QWN](https://perma.cc/BUQ2-5QWN)

[[30](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hodges2012-marker)] Robert Hodges.
[If
You \*Must\* Deploy Multi-Master Replication, Read This First](https://scale-out-blog.blogspot.com/2012/04/if-you-must-deploy-multi-master.html). *scale-out-blog.blogspot.com*,
April 2012. Archived at [perma.cc/C2JN-F6Y8](https://perma.cc/C2JN-F6Y8)

[[31](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#HBase7709-marker)] Lars Hofhansl.
[HBASE-7709: Infinite Loop Possible in
Master/Master Replication](https://issues.apache.org/jira/browse/HBASE-7709). *issues.apache.org*, January 2013.
Archived at [perma.cc/24G2-8NLC](https://perma.cc/24G2-8NLC)

[[32](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#DayRichter2010-marker)] John Day-Richter.
[What’s
Different About the New Google Docs: Making Collaboration Fast](https://drive.googleblog.com/2010/09/whats-different-about-new-google-docs.html). *drive.googleblog.com*,
September 2010. Archived at [perma.cc/5TL8-TSJ2](https://perma.cc/5TL8-TSJ2)

[[33](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Wallace2019-marker)] Evan Wallace.
[How Figma’s
multiplayer technology works](https://www.figma.com/blog/how-figmas-multiplayer-technology-works/). *figma.com*, October 2019.
Archived at [perma.cc/L49H-LY4D](https://perma.cc/L49H-LY4D)

[[34](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Artman2023-marker)] Tuomas Artman.
[Scaling the Linear Sync Engine](https://linear.app/blog/scaling-the-linear-sync-engine).
*linear.app*, June 2023.

[[35](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Saafan2024-marker)] Amr Saafan.
[Why Sync
Engines Might Be the Future of Web Applications](https://www.nilebits.com/blog/2024/09/sync-engines-future-web-applications/). *nilebits.com*, September 2024.
Archived at [perma.cc/5N73-5M3V](https://perma.cc/5N73-5M3V)

[[36](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hagoel2024-marker)] Isaac Hagoel.
[Are Sync
Engines The Future of Web Applications?](https://dev.to/isaachagoel/are-sync-engines-the-future-of-web-applications-1bbi) *dev.to*, July 2024.
Archived at [perma.cc/R9HF-BKKL](https://perma.cc/R9HF-BKKL)

[[37](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Jayakar2024-marker)] Sujay Jayakar.
[A Map of Sync](https://stack.convex.dev/a-map-of-sync). *stack.convex.dev*,
October 2024. Archived at [perma.cc/82R3-H42A](https://perma.cc/82R3-H42A)

[[38](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Feyerke2013-marker)] Alex Feyerke.
[Designing Offline-First Web Apps](https://alistapart.com/article/offline-first/).
*alistapart.com*, December 2013.
Archived at [perma.cc/WH7R-S2DS](https://perma.cc/WH7R-S2DS)

[[39](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kleppmann2019_ch6-marker)] Martin Kleppmann,
Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan.
[Local-first software: You own your data, in
spite of the cloud](https://www.inkandswitch.com/local-first/). At *ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and
Reflections on Programming and Software* (Onward!), October 2019, pages 154–178.
[doi:10.1145/3359591.3359737](https://doi.org/10.1145/3359591.3359737)

[[40](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kleppmann2024lofi-marker)] Martin Kleppmann.
[The past, present, and
future of local-first](https://martin.kleppmann.com/2024/05/30/local-first-conference.html). At *Local-First Conference*, May 2024.

[[41](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Hofmeyr2024-marker)] Conrad Hofmeyr.
[API
Calling is to Sync Engines as jQuery is to React](https://www.powersync.com/blog/api-calling-is-to-sync-engines-as-jquery-is-to-react). *powersync.com*, November 2024.
Archived at [perma.cc/2FP9-7WJJ](https://perma.cc/2FP9-7WJJ)

[[42](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#vanHardenberg2020-marker)] Peter van Hardenberg and Martin Kleppmann.
[PushPin: Towards
Production-Quality Peer-to-Peer Collaboration](https://martin.kleppmann.com/papers/pushpin-papoc20.pdf). At *7th Workshop on Principles and Practice
of Consistency for Distributed Data* (PaPoC), April 2020.
[doi:10.1145/3380787.3393683](https://doi.org/10.1145/3380787.3393683)

[[43](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Kawell1988-marker)] Leonard Kawell, Jr., Steven Beckhardt, Timothy
Halvorsen, Raymond Ozzie, and Irene Greif.
[Replicated document management in a group
communication system](https://dl.acm.org/doi/pdf/10.1145/62266.1024798). At *ACM Conference on Computer-Supported Cooperative Work* (CSCW),
September 1988.
[doi:10.1145/62266.1024798](https://doi.org/10.1145/62266.1024798)

[[44](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Pusch2019-marker)] Ricky Pusch.
[Explaining how fighting games use delay-based and
rollback netcode](https://words.infil.net/w02-netcode.html). *words.infil.net* and *arstechnica.com*, October 2019.
Archived at [perma.cc/DE7W-RDJ8](https://perma.cc/DE7W-RDJ8)

[[45](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#DeCandia2007_ch6-marker)] Giuseppe DeCandia, Deniz Hastorun, Madan
Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian,
Peter Vosshall, and Werner Vogels.
[Dynamo: Amazon’s
Highly Available Key-Value Store](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf). At *21st ACM Symposium on Operating Systems Principles*
(SOSP), October 2007.
[doi:10.1145/1323293.1294281](https://doi.org/10.1145/1323293.1294281)

[[46](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Shapiro2011-marker)] Marc Shapiro, Nuno Preguiça, Carlos Baquero, and
Marek Zawirski. [A Comprehensive Study
of Convergent and Commutative Replicated Data Types](https://inria.hal.science/inria-00555588v1/document). INRIA Research Report no. 7506, January
2011.

[[47](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Sun1998-marker)] Chengzheng Sun and Clarence Ellis.
[Operational
Transformation in Real-Time Group Editors: Issues, Algorithms, and Achievements](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=aef660812c5a9c4d3f06775f9455eeb090a4ff0f). At
*ACM Conference on Computer Supported Cooperative Work* (CSCW), November 1998.
[doi:10.1145/289444.289469](https://doi.org/10.1145/289444.289469)

[[48](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Gentle2025-marker)] Joseph Gentle and Martin Kleppmann.
[Collaborative Text Editing with Eg-walker: Better,
Faster, Smaller](https://arxiv.org/abs/2409.14252). At *20th European Conference on Computer Systems* (EuroSys), March 2025.
[doi:10.1145/3689031.3696076](https://doi.org/10.1145/3689031.3696076)

[[49](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Shukla2018-marker)] Dharma Shukla.
[Azure
Cosmos DB: Pushing the frontier of globally distributed databases](https://azure.microsoft.com/en-us/blog/azure-cosmos-db-pushing-the-frontier-of-globally-distributed-databases/). *azure.microsoft.com*, September 2018.
Archived at [perma.cc/UT3B-HH6R](https://perma.cc/UT3B-HH6R)

[[50](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Gifford1979-marker)] David K. Gifford.
[Weighted Voting for
Replicated Data](https://www.cs.cmu.edu/~15-749/READINGS/required/availability/gifford79.pdf). At *7th ACM Symposium on Operating Systems Principles* (SOSP), December 1979.
[doi:10.1145/800215.806583](https://doi.org/10.1145/800215.806583)

[[51](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Howard2016_ch6-marker)] Heidi Howard, Dahlia Malkhi, and Alexander Spiegelman.
[Flexible Paxos:
Quorum Intersection Revisited](https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2016.25). At *20th International Conference on Principles of Distributed
Systems* (OPODIS), December 2016.
[doi:10.4230/LIPIcs.OPODIS.2016.25](https://doi.org/10.4230/LIPIcs.OPODIS.2016.25)

[[52](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Blomstedt2012ricon-marker)] Joseph Blomstedt.
[Bringing Consistency to Riak](https://vimeo.com/51973001). At *RICON West*,
October 2012.

[[53](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Bailis2014pbs-marker)] Peter Bailis, Shivaram Venkataraman,
Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica.
[Quantifying eventual consistency with
PBS](http://www.bailis.org/papers/pbs-vldbj2014.pdf). *The VLDB Journal*, volume 23, pages 279–302, April 2014.
[doi:10.1007/s00778-013-0330-1](https://doi.org/10.1007/s00778-013-0330-1)

[[54](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Breck2019-marker)] Colin Breck.
|
||||
[Shared-Nothing
|
||||
Architectures for Server Replication and Synchronization](https://blog.colinbreck.com/shared-nothing-architectures-for-server-replication-and-synchronization/). *blog.colinbreck.com*, December 2019.
|
||||
Archived at [perma.cc/48P3-J6CJ](https://perma.cc/48P3-J6CJ)
|
||||
|
||||
[[55](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Dean2013_ch6-marker)] Jeffrey Dean and Luiz André Barroso.
|
||||
[The Tail at Scale](https://cacm.acm.org/research/the-tail-at-scale/).
|
||||
*Communications of the ACM*, volume 56, issue 2, pages 74–80, February 2013.
|
||||
[doi:10.1145/2408776.2408794](https://doi.org/10.1145/2408776.2408794)
|
||||
|
||||
[[56](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Huang2017_ch6-marker)] Peng Huang, Chuanxiong Guo, Lidong Zhou, Jacob R.
|
||||
Lorch, Yingnong Dang, Murali Chintalapati, and Randolph Yao.
|
||||
[Gray
|
||||
Failure: The Achilles’ Heel of Cloud-Scale Systems](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/06/paper-1.pdf). At *16th Workshop on Hot Topics in
|
||||
Operating Systems* (HotOS), May 2017.
|
||||
[doi:10.1145/3102980.3103005](https://doi.org/10.1145/3102980.3103005)
|
||||
|
||||
[[57](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Lamport1978_ch6-marker)] Leslie Lamport.
|
||||
[Time,
|
||||
Clocks, and the Ordering of Events in a Distributed System](https://www.microsoft.com/en-us/research/publication/time-clocks-ordering-events-distributed-system/). *Communications of the ACM*,
|
||||
volume 21, issue 7, pages 558–565, July 1978.
|
||||
[doi:10.1145/359545.359563](https://doi.org/10.1145/359545.359563)
|
||||
|
||||
[[58](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#ParkerJr1983-marker)] D. Stott Parker Jr., Gerald J. Popek, Gerard
|
||||
Rudisin, Allen Stoughton, Bruce J. Walker, Evelyn Walton, Johanna M. Chow, David Edwards, Stephen
|
||||
Kiser, and Charles Kline.
|
||||
[Detection of
|
||||
Mutual Inconsistency in Distributed Systems](https://pages.cs.wisc.edu/~remzi/Classes/739/Papers/parker83detection.pdf). *IEEE Transactions on Software Engineering*,
|
||||
volume SE-9, issue 3, pages 240–247, May 1983.
|
||||
[doi:10.1109/TSE.1983.236733](https://doi.org/10.1109/TSE.1983.236733)
|
||||
|
||||
[[59](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Preguica2010-marker)] Nuno Preguiça, Carlos Baquero, Paulo Sérgio
|
||||
Almeida, Victor Fonte, and Ricardo Gonçalves. [Dotted
|
||||
Version Vectors: Logical Clocks for Optimistic Replication](https://arxiv.org/abs/1011.5808). arXiv:1011.5808, November 2010.
|
||||
|
||||
[[60](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Manepalli2022-marker)] Giridhar Manepalli.
|
||||
[Clocks and Causality - Ordering Events
|
||||
in Distributed Systems](https://www.exhypothesi.com/clocks-and-causality/). *exhypothesi.com*, November 2022.
|
||||
Archived at [perma.cc/8REU-KVLQ](https://perma.cc/8REU-KVLQ)
|
||||
|
||||
[[61](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Cribbs2014-marker)] Sean Cribbs.
|
||||
[A Brief History of Time in Riak](https://speakerdeck.com/seancribbs/a-brief-history-of-time-in-riak).
|
||||
At *RICON*, October 2014. Archived at [perma.cc/7U9P-6JFX](https://perma.cc/7U9P-6JFX)
|
||||
|
||||
[[62](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Brown2015-marker)] Russell Brown.
|
||||
[Vector
|
||||
Clocks Revisited Part 2: Dotted Version Vectors](https://riak.com/posts/technical/vector-clocks-revisited-part-2-dotted-version-vectors/). *riak.com*, November 2015.
|
||||
Archived at [perma.cc/96QP-W98R](https://perma.cc/96QP-W98R)
|
||||
|
||||
[[63](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Baquero2011-marker)] Carlos Baquero.
|
||||
[Version
|
||||
Vectors Are Not Vector Clocks](https://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/). *haslab.wordpress.com*, July 2011.
|
||||
Archived at [perma.cc/7PNU-4AMG](https://perma.cc/7PNU-4AMG)
|
||||
|
||||
[[64](https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch06.html#Schwarz1994-marker)] Reinhard Schwarz and Friedemann Mattern.
|
||||
[Detecting Causal
|
||||
Relationships in Distributed Computations: In Search of the Holy Grail](https://disco.ethz.ch/courses/hs08/seminar/papers/mattern4.pdf). *Distributed
|
||||
Computing*, volume 7, issue 3, pages 149–174, March 1994.
|
||||
[doi:10.1007/BF02277859](https://doi.org/10.1007/BF02277859)
|
||||
[^1]: B. G. Lindsay, P. G. Selinger, C. Galtieri, J. N. Gray, R. A. Lorie, T. G. Price, F. Putzolu, I. L. Traiger, and B. W. Wade. [Notes on Distributed Databases](https://dominoweb.draco.res.ibm.com/reports/RJ2571.pdf). IBM Research, Research Report RJ2571(33471), July 1979. Archived at [perma.cc/EPZ3-MHDD](https://perma.cc/EPZ3-MHDD)
[^2]: Kenny Gryp. [MySQL Terminology Updates](https://dev.mysql.com/blog-archive/mysql-terminology-updates/). *dev.mysql.com*, July 2020. Archived at [perma.cc/S62G-6RJ2](https://perma.cc/S62G-6RJ2)
[^3]: Oracle Corporation. [Oracle (Active) Data Guard 19c: Real-Time Data Protection and Availability](https://www.oracle.com/technetwork/database/availability/dg-adg-technical-overview-wp-5347548.pdf). White Paper, *oracle.com*, March 2019. Archived at [perma.cc/P5ST-RPKE](https://perma.cc/P5ST-RPKE)
[^4]: Microsoft. [What is an Always On availability group?](https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server) *learn.microsoft.com*, September 2024. Archived at [perma.cc/ABH6-3MXF](https://perma.cc/ABH6-3MXF)
[^5]: Mostafa Elhemali, Niall Gallagher, Nicholas Gordon, Joseph Idziorek, Richard Krog, Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somu Perianayagam, Tim Rath, Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul, Doug Terry, and Akshat Vig. [Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service](https://www.usenix.org/conference/atc22/presentation/elhemali). At *USENIX Annual Technical Conference* (ATC), July 2022.
[^6]: Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, Paul Bardea, Amruta Ranade, Ben Darnell, Bram Gruneir, Justin Jaffray, Lucy Zhang, and Peter Mattis. [CockroachDB: The Resilient Geo-Distributed SQL Database](https://dl.acm.org/doi/abs/10.1145/3318464.3386134). At *ACM SIGMOD International Conference on Management of Data* (SIGMOD), pages 1493–1509, June 2020. [doi:10.1145/3318464.3386134](https://doi.org/10.1145/3318464.3386134)
[^7]: Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, Wan Wei, Cong Liu, Jian Zhang, Jianjun Li, Xuelian Wu, Lingyu Song, Ruoxi Sun, Shuaipeng Yu, Lei Zhao, Nicholas Cameron, Liquan Pei, and Xin Tang. [TiDB: a Raft-based HTAP database](https://www.vldb.org/pvldb/vol13/p3072-huang.pdf). *Proceedings of the VLDB Endowment*, volume 13, issue 12, pages 3072–3084. [doi:10.14778/3415478.3415535](https://doi.org/10.14778/3415478.3415535)
[^8]: Mallory Knodel and Niels ten Oever. [Terminology, Power, and Inclusive Language in Internet-Drafts and RFCs](https://www.ietf.org/archive/id/draft-knodel-terminology-14.html). *IETF Internet-Draft*, August 2023. Archived at [perma.cc/5ZY9-725E](https://perma.cc/5ZY9-725E)
[^9]: Buck Hodges. [Postmortem: VSTS 4 September 2018](https://devblogs.microsoft.com/devopsservice/?p=17485). *devblogs.microsoft.com*, September 2018. Archived at [perma.cc/ZF5R-DYZS](https://perma.cc/ZF5R-DYZS)
[^10]: Gunnar Morling. [Leader Election With S3 Conditional Writes](https://www.morling.dev/blog/leader-election-with-s3-conditional-writes/). *www.morling.dev*, August 2024. Archived at [perma.cc/7V2N-J78Y](https://perma.cc/7V2N-J78Y)
[^11]: Vignesh Chandramohan, Rohan Desai, and Chris Riccomini. [SlateDB Manifest Design](https://github.com/slatedb/slatedb/blob/main/rfcs/0001-manifest.md). *github.com*, May 2024. Archived at [perma.cc/8EUY-P32Z](https://perma.cc/8EUY-P32Z)
[^12]: Stas Kelvich. [Why does Neon use Paxos instead of Raft, and what’s the difference?](https://neon.tech/blog/paxos) *neon.tech*, August 2022. Archived at [perma.cc/SEZ4-2GXU](https://perma.cc/SEZ4-2GXU)
[^13]: Dimitri Fontaine. [An introduction to the pg\_auto\_failover project](https://tapoueh.org/blog/2021/11/an-introduction-to-the-pg_auto_failover-project/). *tapoueh.org*, November 2021. Archived at [perma.cc/3WH5-6BAF](https://perma.cc/3WH5-6BAF)
[^14]: Jesse Newland. [GitHub availability this week](https://github.blog/news-insights/the-library/github-availability-this-week/). *github.blog*, September 2012. Archived at [perma.cc/3YRF-FTFJ](https://perma.cc/3YRF-FTFJ)
[^15]: Mark Imbriaco. [Downtime last Saturday](https://github.blog/news-insights/the-library/downtime-last-saturday/). *github.blog*, December 2012. Archived at [perma.cc/M7X5-E8SQ](https://perma.cc/M7X5-E8SQ)
[^16]: John Hugg. [‘All In’ with Determinism for Performance and Testing in Distributed Systems](https://www.youtube.com/watch?v=gJRj3vJL4wE). At *Strange Loop*, September 2015.
[^17]: Hironobu Suzuki. [The Internals of PostgreSQL](https://www.interdb.jp/pg/). *interdb.jp*, 2017.
[^18]: Amit Kapila. [WAL Internals of PostgreSQL](https://www.pgcon.org/2012/schedule/attachments/258_212_Internals%20Of%20PostgreSQL%20Wal.pdf). At *PostgreSQL Conference* (PGCon), May 2012. Archived at [perma.cc/6225-3SUX](https://perma.cc/6225-3SUX)
[^19]: Amit Kapila. [Evolution of Logical Replication](https://amitkapila16.blogspot.com/2023/09/evolution-of-logical-replication.html). *amitkapila16.blogspot.com*, September 2023. Archived at [perma.cc/F9VX-JLER](https://perma.cc/F9VX-JLER)
[^20]: Aru Petchimuthu. [Upgrade your Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL database, Part 2: Using the pglogical extension](https://aws.amazon.com/blogs/database/part-2-upgrade-your-amazon-rds-for-postgresql-database-using-the-pglogical-extension/). *aws.amazon.com*, August 2021. Archived at [perma.cc/RXT8-FS2T](https://perma.cc/RXT8-FS2T)
[^21]: Yogeshwer Sharma, Philippe Ajoux, Petchean Ang, David Callies, Abhishek Choudhary, Laurent Demailly, Thomas Fersch, Liat Atsmon Guz, Andrzej Kotulski, Sachin Kulkarni, Sanjeev Kumar, Harry Li, Jun Li, Evgeniy Makeev, Kowshik Prakasam, Robbert van Renesse, Sabyasachi Roy, Pratyush Seth, Yee Jiun Song, Benjamin Wester, Kaushik Veeraraghavan, and Peter Xie. [Wormhole: Reliable Pub-Sub to Support Geo-Replicated Internet Services](https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf). At *12th USENIX Symposium on Networked Systems Design and Implementation* (NSDI), May 2015.
[^22]: Douglas B. Terry. [Replicated Data Consistency Explained Through Baseball](https://www.microsoft.com/en-us/research/publication/replicated-data-consistency-explained-through-baseball/). Microsoft Research, Technical Report MSR-TR-2011-137, October 2011. Archived at [perma.cc/F4KZ-AR38](https://perma.cc/F4KZ-AR38)
[^23]: Douglas B. Terry, Alan J. Demers, Karin Petersen, Mike J. Spreitzer, Marvin M. Theimer, and Brent B. Welch. [Session Guarantees for Weakly Consistent Replicated Data](https://csis.pace.edu/~marchese/CS865/Papers/SessionGuaranteesPDIS.pdf). At *3rd International Conference on Parallel and Distributed Information Systems* (PDIS), September 1994. [doi:10.1109/PDIS.1994.331722](https://doi.org/10.1109/PDIS.1994.331722)
[^24]: Werner Vogels. [Eventually Consistent](https://queue.acm.org/detail.cfm?id=1466448). *ACM Queue*, volume 6, issue 6, pages 14–19, October 2008. [doi:10.1145/1466443.1466448](https://doi.org/10.1145/1466443.1466448)
[^25]: Simon Willison. [Reply to: “My thoughts about Fly.io (so far) and other newish technology I’m getting into”](https://news.ycombinator.com/item?id=31434055). *news.ycombinator.com*, May 2022. Archived at [perma.cc/ZRV4-WWV8](https://perma.cc/ZRV4-WWV8)
[^26]: Nithin Tharakan. [Scaling Bitbucket’s Database](https://www.atlassian.com/blog/bitbucket/scaling-bitbuckets-database). *atlassian.com*, October 2020. Archived at [perma.cc/JAB7-9FGX](https://perma.cc/JAB7-9FGX)
[^27]: Terry Pratchett. *Reaper Man: A Discworld Novel*. Victor Gollancz, 1991. ISBN: 978-0-575-04979-6
[^28]: Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. [Coordination Avoidance in Database Systems](https://arxiv.org/abs/1402.2237). *Proceedings of the VLDB Endowment*, volume 8, issue 3, pages 185–196, November 2014. [doi:10.14778/2735508.2735509](https://doi.org/10.14778/2735508.2735509)
[^29]: Yaser Raja and Peter Celentano. [PostgreSQL bi-directional replication using pglogical](https://aws.amazon.com/blogs/database/postgresql-bi-directional-replication-using-pglogical/). *aws.amazon.com*, January 2022. Archived at [perma.cc/BUQ2-5QWN](https://perma.cc/BUQ2-5QWN)
[^30]: Robert Hodges. [If You \*Must\* Deploy Multi-Master Replication, Read This First](https://scale-out-blog.blogspot.com/2012/04/if-you-must-deploy-multi-master.html). *scale-out-blog.blogspot.com*, April 2012. Archived at [perma.cc/C2JN-F6Y8](https://perma.cc/C2JN-F6Y8)
[^31]: Lars Hofhansl. [HBASE-7709: Infinite Loop Possible in Master/Master Replication](https://issues.apache.org/jira/browse/HBASE-7709). *issues.apache.org*, January 2013. Archived at [perma.cc/24G2-8NLC](https://perma.cc/24G2-8NLC)
[^32]: John Day-Richter. [What’s Different About the New Google Docs: Making Collaboration Fast](https://drive.googleblog.com/2010/09/whats-different-about-new-google-docs.html). *drive.googleblog.com*, September 2010. Archived at [perma.cc/5TL8-TSJ2](https://perma.cc/5TL8-TSJ2)
[^33]: Evan Wallace. [How Figma’s multiplayer technology works](https://www.figma.com/blog/how-figmas-multiplayer-technology-works/). *figma.com*, October 2019. Archived at [perma.cc/L49H-LY4D](https://perma.cc/L49H-LY4D)
[^34]: Tuomas Artman. [Scaling the Linear Sync Engine](https://linear.app/blog/scaling-the-linear-sync-engine). *linear.app*, June 2023.
[^35]: Amr Saafan. [Why Sync Engines Might Be the Future of Web Applications](https://www.nilebits.com/blog/2024/09/sync-engines-future-web-applications/). *nilebits.com*, September 2024. Archived at [perma.cc/5N73-5M3V](https://perma.cc/5N73-5M3V)
[^36]: Isaac Hagoel. [Are Sync Engines The Future of Web Applications?](https://dev.to/isaachagoel/are-sync-engines-the-future-of-web-applications-1bbi) *dev.to*, July 2024. Archived at [perma.cc/R9HF-BKKL](https://perma.cc/R9HF-BKKL)
[^37]: Sujay Jayakar. [A Map of Sync](https://stack.convex.dev/a-map-of-sync). *stack.convex.dev*, October 2024. Archived at [perma.cc/82R3-H42A](https://perma.cc/82R3-H42A)
[^38]: Alex Feyerke. [Designing Offline-First Web Apps](https://alistapart.com/article/offline-first/). *alistapart.com*, December 2013. Archived at [perma.cc/WH7R-S2DS](https://perma.cc/WH7R-S2DS)
[^39]: Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan. [Local-first software: You own your data, in spite of the cloud](https://www.inkandswitch.com/local-first/). At *ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software* (Onward!), October 2019, pages 154–178. [doi:10.1145/3359591.3359737](https://doi.org/10.1145/3359591.3359737)
[^40]: Martin Kleppmann. [The past, present, and future of local-first](https://martin.kleppmann.com/2024/05/30/local-first-conference.html). At *Local-First Conference*, May 2024.
[^41]: Conrad Hofmeyr. [API Calling is to Sync Engines as jQuery is to React](https://www.powersync.com/blog/api-calling-is-to-sync-engines-as-jquery-is-to-react). *powersync.com*, November 2024. Archived at [perma.cc/2FP9-7WJJ](https://perma.cc/2FP9-7WJJ)
[^42]: Peter van Hardenberg and Martin Kleppmann. [PushPin: Towards Production-Quality Peer-to-Peer Collaboration](https://martin.kleppmann.com/papers/pushpin-papoc20.pdf). At *7th Workshop on Principles and Practice of Consistency for Distributed Data* (PaPoC), April 2020. [doi:10.1145/3380787.3393683](https://doi.org/10.1145/3380787.3393683)
[^43]: Leonard Kawell, Jr., Steven Beckhardt, Timothy Halvorsen, Raymond Ozzie, and Irene Greif. [Replicated document management in a group communication system](https://dl.acm.org/doi/pdf/10.1145/62266.1024798). At *ACM Conference on Computer-Supported Cooperative Work* (CSCW), September 1988. [doi:10.1145/62266.1024798](https://doi.org/10.1145/62266.1024798)
[^44]: Ricky Pusch. [Explaining how fighting games use delay-based and rollback netcode](https://words.infil.net/w02-netcode.html). *words.infil.net* and *arstechnica.com*, October 2019. Archived at [perma.cc/DE7W-RDJ8](https://perma.cc/DE7W-RDJ8)
[^45]: Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. [Dynamo: Amazon’s Highly Available Key-Value Store](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf). At *21st ACM Symposium on Operating Systems Principles* (SOSP), October 2007. [doi:10.1145/1323293.1294281](https://doi.org/10.1145/1323293.1294281)
[^46]: Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. [A Comprehensive Study of Convergent and Commutative Replicated Data Types](https://inria.hal.science/inria-00555588v1/document). INRIA Research Report no. 7506, January 2011.
[^47]: Chengzheng Sun and Clarence Ellis. [Operational Transformation in Real-Time Group Editors: Issues, Algorithms, and Achievements](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=aef660812c5a9c4d3f06775f9455eeb090a4ff0f). At *ACM Conference on Computer Supported Cooperative Work* (CSCW), November 1998. [doi:10.1145/289444.289469](https://doi.org/10.1145/289444.289469)
[^48]: Joseph Gentle and Martin Kleppmann. [Collaborative Text Editing with Eg-walker: Better, Faster, Smaller](https://arxiv.org/abs/2409.14252). At *20th European Conference on Computer Systems* (EuroSys), March 2025. [doi:10.1145/3689031.3696076](https://doi.org/10.1145/3689031.3696076)
[^49]: Dharma Shukla. [Azure Cosmos DB: Pushing the frontier of globally distributed databases](https://azure.microsoft.com/en-us/blog/azure-cosmos-db-pushing-the-frontier-of-globally-distributed-databases/). *azure.microsoft.com*, September 2018. Archived at [perma.cc/UT3B-HH6R](https://perma.cc/UT3B-HH6R)
[^50]: David K. Gifford. [Weighted Voting for Replicated Data](https://www.cs.cmu.edu/~15-749/READINGS/required/availability/gifford79.pdf). At *7th ACM Symposium on Operating Systems Principles* (SOSP), December 1979. [doi:10.1145/800215.806583](https://doi.org/10.1145/800215.806583)
[^51]: Heidi Howard, Dahlia Malkhi, and Alexander Spiegelman. [Flexible Paxos: Quorum Intersection Revisited](https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2016.25). At *20th International Conference on Principles of Distributed Systems* (OPODIS), December 2016. [doi:10.4230/LIPIcs.OPODIS.2016.25](https://doi.org/10.4230/LIPIcs.OPODIS.2016.25)
[^52]: Joseph Blomstedt. [Bringing Consistency to Riak](https://vimeo.com/51973001). At *RICON West*, October 2012.
[^53]: Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. [Quantifying eventual consistency with PBS](http://www.bailis.org/papers/pbs-vldbj2014.pdf). *The VLDB Journal*, volume 23, pages 279–302, April 2014. [doi:10.1007/s00778-013-0330-1](https://doi.org/10.1007/s00778-013-0330-1)
[^54]: Colin Breck. [Shared-Nothing Architectures for Server Replication and Synchronization](https://blog.colinbreck.com/shared-nothing-architectures-for-server-replication-and-synchronization/). *blog.colinbreck.com*, December 2019. Archived at [perma.cc/48P3-J6CJ](https://perma.cc/48P3-J6CJ)
[^55]: Jeffrey Dean and Luiz André Barroso. [The Tail at Scale](https://cacm.acm.org/research/the-tail-at-scale/). *Communications of the ACM*, volume 56, issue 2, pages 74–80, February 2013. [doi:10.1145/2408776.2408794](https://doi.org/10.1145/2408776.2408794)
[^56]: Peng Huang, Chuanxiong Guo, Lidong Zhou, Jacob R. Lorch, Yingnong Dang, Murali Chintalapati, and Randolph Yao. [Gray Failure: The Achilles’ Heel of Cloud-Scale Systems](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/06/paper-1.pdf). At *16th Workshop on Hot Topics in Operating Systems* (HotOS), May 2017. [doi:10.1145/3102980.3103005](https://doi.org/10.1145/3102980.3103005)
[^57]: Leslie Lamport. [Time, Clocks, and the Ordering of Events in a Distributed System](https://www.microsoft.com/en-us/research/publication/time-clocks-ordering-events-distributed-system/). *Communications of the ACM*, volume 21, issue 7, pages 558–565, July 1978. [doi:10.1145/359545.359563](https://doi.org/10.1145/359545.359563)
[^58]: D. Stott Parker Jr., Gerald J. Popek, Gerard Rudisin, Allen Stoughton, Bruce J. Walker, Evelyn Walton, Johanna M. Chow, David Edwards, Stephen Kiser, and Charles Kline. [Detection of Mutual Inconsistency in Distributed Systems](https://pages.cs.wisc.edu/~remzi/Classes/739/Papers/parker83detection.pdf). *IEEE Transactions on Software Engineering*, volume SE-9, issue 3, pages 240–247, May 1983. [doi:10.1109/TSE.1983.236733](https://doi.org/10.1109/TSE.1983.236733)
[^59]: Nuno Preguiça, Carlos Baquero, Paulo Sérgio Almeida, Victor Fonte, and Ricardo Gonçalves. [Dotted Version Vectors: Logical Clocks for Optimistic Replication](https://arxiv.org/abs/1011.5808). arXiv:1011.5808, November 2010.
[^60]: Giridhar Manepalli. [Clocks and Causality - Ordering Events in Distributed Systems](https://www.exhypothesi.com/clocks-and-causality/). *exhypothesi.com*, November 2022. Archived at [perma.cc/8REU-KVLQ](https://perma.cc/8REU-KVLQ)
[^61]: Sean Cribbs. [A Brief History of Time in Riak](https://speakerdeck.com/seancribbs/a-brief-history-of-time-in-riak). At *RICON*, October 2014. Archived at [perma.cc/7U9P-6JFX](https://perma.cc/7U9P-6JFX)
[^62]: Russell Brown. [Vector Clocks Revisited Part 2: Dotted Version Vectors](https://riak.com/posts/technical/vector-clocks-revisited-part-2-dotted-version-vectors/). *riak.com*, November 2015. Archived at [perma.cc/96QP-W98R](https://perma.cc/96QP-W98R)
[^63]: Carlos Baquero. [Version Vectors Are Not Vector Clocks](https://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/). *haslab.wordpress.com*, July 2011. Archived at [perma.cc/7PNU-4AMG](https://perma.cc/7PNU-4AMG)
[^64]: Reinhard Schwarz and Friedemann Mattern. [Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail](https://disco.ethz.ch/courses/hs08/seminar/papers/mattern4.pdf). *Distributed Computing*, volume 7, issue 3, pages 149–174, March 1994. [doi:10.1007/BF02277859](https://doi.org/10.1007/BF02277859)
|
|
@ -58,7 +58,7 @@ In many other systems, partitioning is just another word for sharding.
|
|||
While *partitioning* is quite descriptive, the term *sharding* is perhaps surprising. According to
one theory, the term arose from the online role-play game *Ultima Online*, in which a magic crystal
was shattered into pieces, and each of those shards refracted a copy of the game world
[[3](/en/ch7#Koster2009)].
[^3].
The term *shard* thus came to mean one of a set of parallel game servers, and later was carried over
to databases. Another theory is that *shard* was originally an acronym of *System for Highly
Available Replicated Data*—reportedly a 1980s database, details of which are lost to history.
|
|
@ -88,7 +88,7 @@ single-shard database.
|
|||
The reason for this recommendation is that sharding often adds complexity: you typically have to
decide which records to put in which shard by choosing a *partition key*; all records with the
same partition key are placed in the same shard
[[4](/en/ch7#Fidalgo2021)].
[^4].
This choice matters because accessing a record is fast if you know which shard it’s in, but if you
don’t know the shard you have to do an inefficient search across all shards, and the sharding scheme
is difficult to change.
|
|
@ -108,10 +108,10 @@ some systems don’t support them at all.
|
|||
Some systems use sharding even on a single machine, typically running one single-threaded process
per CPU core to make use of the parallelism in the CPU, or to take advantage of a *nonuniform memory
access* (NUMA) architecture in which some banks of memory are closer to one CPU than to others
[[5](/en/ch7#Drepper2007)].
[^5].
For example, Redis, VoltDB, and FoundationDB use one process per core, and rely on sharding to
spread load across CPU cores in the same machine
[[6](/en/ch7#Zhou2021_ch7)].
[^6].
## Sharding for Multitenancy
|
|
@ -125,7 +125,7 @@ Sometimes sharding is used to implement multitenant systems: either each tenant
|
|||
shard, or multiple small tenants may be grouped together into a larger shard. These shards might be
physically separate databases (which we previously touched on in [“Embedded storage engines”](/en/ch4#sidebar_embedded)), or
separately manageable portions of a larger logical database
[[7](/en/ch7#Slot2023)].
[^7].
Using sharding for multitenancy has several advantages:
Resource isolation
|
|
@ -143,19 +143,19 @@ Cell-based architecture
|
|||
tenants are grouped into a self-contained *cell*, and different cells are set up such that they
can run largely independently from each other. This approach provides *fault isolation*: that is,
a fault in one cell remains limited to that cell, and tenants in other cells are not affected
[[8](/en/ch7#Oliveira2023)].
[^8].
Per-tenant backup and restore
: Backing up each tenant’s shard separately makes it possible to restore a tenant’s state from a
backup without affecting other tenants, which can be useful in case the tenant accidentally
deletes or overwrites important data
[[9](/en/ch7#Shapira2023dont)].
[^9].
Regulatory compliance
: Data privacy regulation such as the GDPR gives individuals the right to access and delete all data
stored about them. If each person’s data is stored in a separate shard, this translates into
simple data export and deletion operations on their shard
[[10](/en/ch7#Schwarzkopf2019)].
[^10].
Data residence
: If a particular tenant’s data needs to be stored in a particular jurisdiction in order to comply
|
|
@ -166,14 +166,14 @@ Gradual schema rollout
|
|||
: Schema migrations (previously discussed in [“Schema flexibility in the document model”](/en/ch3#sec_datamodels_schema_flexibility)) can be rolled
out gradually, one tenant at a time. This reduces risk, as you can detect problems before they
affect all tenants, but it can be difficult to do transactionally
[[11](/en/ch7#Shapira2024)].
[^11].
The main challenges around using sharding for multitenancy are:
* It assumes that each individual tenant is small enough to fit on a single node. If that is not the
case, and you have a single tenant that’s too big for one machine, you would need to additionally
perform sharding within a single tenant, which brings us back to the topic of sharding for
scalability [[12](/en/ch7#Ganguli2020)].
scalability [^12].
* If you have many small tenants, then creating a separate shard for each one may incur too much
overhead. You could group several small tenants together into a bigger shard, but then you have
the problem of how you move tenants from one shard to another as they grow.
|
|
@ -227,7 +227,7 @@ The shard boundaries might be chosen manually by an administrator, or the databa
|
|||
automatically. Manual key-range sharding is used by Vitess (a sharding layer for MySQL), for
example; the automatic variant is used by Bigtable, its open source equivalent HBase, the
range-based sharding option in MongoDB, CockroachDB, RethinkDB, and FoundationDB
[[6](/en/ch7#Zhou2021_ch7)]. YugabyteDB offers both manual and automatic
[^6]. YugabyteDB offers both manual and automatic
tablet splitting.
Within each shard, keys are stored in sorted order (e.g., in a B-tree or SSTables, as discussed in
|
|
@ -242,7 +242,7 @@ lot of writes to nearby keys. For example, if the key is a timestamp, then the s
|
|||
ranges of time—e.g., one shard per month. Unfortunately, if you write data from the sensors to the
database as the measurements happen, all the writes end up going to the same shard (the one for
this month), so that shard can be overloaded with writes while others sit idle
[[13](/en/ch7#Lan2011)].
[^13].
To avoid this problem in the sensor database, you need to use something other than the timestamp as
the first element of the key. For example, you could prefix each timestamp with the sensor ID so
|
|
@ -257,7 +257,7 @@ When you first set up your database, there are no key ranges to split into shard
|
|||
such as HBase and MongoDB, allow you to configure an initial set of shards on an empty database,
which is called *pre-splitting*. This requires that you already have some idea of what the key
distribution is going to look like, so that you can choose appropriate key range boundaries
[[14](/en/ch7#Soztutar2013split)].
[^14].
Later on, as your data volume and write throughput grow, a system with key-range sharding grows by
splitting an existing shard into two or more smaller shards, each of which holds a contiguous
|
|
@ -276,7 +276,7 @@ With databases that manage shard boundaries automatically, a shard split is typi
|
|||
An advantage of key-range sharding is that the number of shards adapts to the data volume. If there
is only a small amount of data, a small number of shards is sufficient, so overheads are small; if
there is a huge amount of data, the size of each individual shard is limited to a configurable
maximum [[15](/en/ch7#Evans2013)].
maximum [^15].
A downside of this approach is that splitting a shard is an expensive operation, since it requires
all of its data to be rewritten into new files, similarly to a compaction in a log-structured
|
|
@ -301,7 +301,7 @@ uses MD5, whereas Cassandra and ScyllaDB use Murmur3. Many programming languages
|
|||
functions built in (as they are used for hash tables), but they may not be suitable for sharding:
for example, in Java’s `Object.hashCode()` and Ruby’s `Object#hash`, the same key may have a
different hash value in different processes, making them unsuitable for sharding
[[16](/en/ch7#Kleppmann2012hash)].
[^16].
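
Python has the same pitfall as the Java and Ruby examples: the built-in `hash()` of a string is salted per process unless `PYTHONHASHSEED` is fixed. The following is a minimal sketch of a process-independent alternative. The function names and the shard count are illustrative assumptions, and MD5 is used only because it ships with the standard library, not because any particular database is implemented this way.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Return a 64-bit hash of key that is the same in every process."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big")


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number in the range [0, num_shards)."""
    return stable_hash(key) % num_shards


if __name__ == "__main__":
    # Unlike hash("user:42"), this prints the same value on every run.
    print(shard_for("user:42", num_shards=16))
```

Because the digest depends only on the bytes of the key, any client or server that runs this code routes a given key to the same shard.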
### Hash modulo number of nodes
|
|
@ -350,7 +350,7 @@ used for any reads and writes that happen while the transfer is in progress.
|
|||
It’s common to choose the number of shards to be a number that is divisible by many factors, so that
the dataset can be evenly split across various different numbers of nodes—not requiring the number
of nodes to be a power of 2, for example [[4](/en/ch7#Fidalgo2021)].
of nodes to be a power of 2, for example [^4].
You can even account for mismatched hardware in your cluster: by assigning more shards to nodes that
are more powerful, you can make those nodes take a greater share of the load.
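
The weighting idea can be sketched as follows, assuming a fixed number of shards and a hypothetical integer capacity weight per node (the function name, node names, and weights are illustrative and do not come from any real system). The sketch only decides how many shards each node holds; a real rebalancer would also try to move as few existing shards as possible.

```python
def assign_shards(num_shards: int, node_weights: dict[str, int]) -> dict[str, list[int]]:
    """Distribute shard IDs 0..num_shards-1 over nodes in proportion to their weights."""
    total_weight = sum(node_weights.values())
    # Integer share per node, rounded down.
    counts = {node: num_shards * w // total_weight for node, w in node_weights.items()}
    # Hand out the shards lost to rounding, largest remainder first.
    leftover = num_shards - sum(counts.values())
    by_remainder = sorted(
        node_weights,
        key=lambda node: (num_shards * node_weights[node]) % total_weight,
        reverse=True,
    )
    for node in by_remainder[:leftover]:
        counts[node] += 1

    # Give each node a contiguous block of shard IDs of the computed size.
    assignment: dict[str, list[int]] = {}
    next_shard = 0
    for node, count in counts.items():
        assignment[node] = list(range(next_shard, next_shard + count))
        next_shard += count
    return assignment
```

With 720 shards, `assign_shards(720, {"small-1": 1, "small-2": 1, "big-1": 2})` gives the node with weight 2 half of the shards (360) and each of the others 180. A shard count with many divisors keeps such splits even for a wide range of cluster sizes.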
|
|
@ -412,7 +412,7 @@ supports cluster keys. Clustering data not only improves range scan performance,
|
|||
improve compression and filtering performance as well.
Hash-range sharding is used in YugabyteDB and DynamoDB
[[17](/en/ch7#Elhemali2022_ch7)], and is an option in MongoDB.
[^17], and is an option in MongoDB.
Cassandra and ScyllaDB use a variant of this approach that is illustrated in
[Figure 7-6](/en/ch7#fig_sharding_cassandra): the space of hash values is split into a number of ranges proportional
to the number of nodes (3 ranges per node in [Figure 7-6](/en/ch7#fig_sharding_cassandra), but actual numbers are 8
|
|
@ -427,7 +427,7 @@ those imbalances tend to even out
|
|||
###### Figure 7-6. Cassandra and ScyllaDB split the range of possible hash values (here 0–1023) into contiguous ranges with random boundaries, and assign several ranges to each node.
When nodes are added or removed, range boundaries are added and removed, and shards are split or
merged accordingly [[19](/en/ch7#Lambov2016)].
merged accordingly [^19].
In the example of [Figure 7-6](/en/ch7#fig_sharding_cassandra), when node 3 is added, node 1
transfers parts of two of its ranges to node 3, and node 2 transfers part of one of its ranges to
node 3. This has the effect of giving the new node an approximately fair share of the dataset,
|
|
@ -447,13 +447,13 @@ the same shard as much as possible.
|
|||
The sharding algorithm used by Cassandra and ScyllaDB is similar to the original definition of
consistent hashing
[[20](/en/ch7#Karger1997)],
[^20],
but several other consistent hashing algorithms have also been proposed
[[21](/en/ch7#Gryski2018)],
[^21],
such as *highest random weight*, also known as *rendezvous hashing*
[[22](/en/ch7#Thaler1998)],
[^22],
and *jump consistent hash*
[[23](/en/ch7#Lamping2014)].
[^23].
With Cassandra’s algorithm, if one node is added, a small number of existing shards are split into
sub-ranges; on the other hand, with rendezvous and jump consistent hashes, the new node is assigned
individual keys that were previously scattered across all of the other nodes. Which one is
|
|
@ -468,7 +468,7 @@ some keys is much higher than to others—you can still end up with some servers
|
|||
while others sit almost idle.
For example, on a social media site, a celebrity user with millions of followers may cause a storm
of activity when they do something [[24](/en/ch7#Axon2010_ch7)].
of activity when they do something [^24].
This event can result in a large volume of reads and writes to the same key (where the partition key
is perhaps the user ID of the celebrity, or the ID of the action that people are commenting on).
|
|
@ -477,7 +477,7 @@ In such situations, a more flexible sharding policy is required
|
|||
[26](/en/ch7#Lee2021)].
A system that defines shards based on ranges of keys (or ranges of hashes) makes it possible to put
an individual hot key in a shard of its own, and perhaps even to assign it a dedicated machine
[[27](/en/ch7#Fritchie2018)].
[^27].
It’s also possible to compensate for skew at the application level. For example, if one key is known
to be very hot, a simple technique is to add a random number to the beginning or end of the key.
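
As a rough sketch of this technique, assume a two-digit random suffix and a `#` separator (both choices, like the function names, are illustrative). Writes to a hot key are then spread over 100 distinct keys, which can land in different shards. The cost is that a reader has to fetch all 100 variants and combine them.

```python
import random

NUM_SALTS = 100  # assumed number of key variants per hot key


def salted_write_key(hot_key: str) -> str:
    """Pick one of NUM_SALTS variants of the key at random for a write."""
    return f"{hot_key}#{random.randrange(NUM_SALTS):02d}"


def all_read_keys(hot_key: str) -> list[str]:
    """List every variant of the key; a read must query all of them."""
    return [f"{hot_key}#{salt:02d}" for salt in range(NUM_SALTS)]
```

This only pays off for the small number of keys that are actually hot, since every read of a salted key becomes 100 lookups.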
|
|
@ -499,8 +499,8 @@ necessitating different strategies for handling them.
|
|||
Some systems (especially cloud services designed for large scale) have automated approaches for
dealing with hot shards; for example, Amazon calls it *heat management*
[[28](/en/ch7#Warfield2023_ch7)]
or *adaptive capacity* [[17](/en/ch7#Elhemali2022_ch7)].
[^28]
or *adaptive capacity* [^17].
The details of how these systems work go beyond the scope of this book.
## Operations: Automatic or Manual Rebalancing
|
|
@ -527,7 +527,7 @@ another. If it is not done carefully, this process can overload the network or t
|
|||
might harm the performance of other requests. The system must continue processing writes while the
rebalancing is in progress; if a system is near its maximum write throughput, the shard-splitting
process might not even be able to keep up with the rate of incoming writes
[[29](/en/ch7#Houlihan2017)].
[^29].
Such automation can be dangerous in combination with automatic failure detection. For example, say
|
||||
one node is overloaded and is temporarily slow to respond to requests. The other nodes conclude that
|
||||
|
|
@ -667,7 +667,7 @@ shards. Whenever you write to the database—to add, remove, or update a record
|
|||
deal with the shard that contains the record that you are writing. For that reason, this type of
|
||||
secondary index is known as a *local index*. In an information retrieval context it is also known as
|
||||
a *document-partitioned index*
|
||||
[[30](/en/ch7#Manning2008_ch7)].
|
||||
[^30].
|
||||
|
||||
When reading from a local secondary index, if you already know the partition key of the record
|
||||
you’re looking for, you can just perform the search on the appropriate shard. Moreover, if you only
|
||||
|
|
@ -685,10 +685,10 @@ shards lets you store more data, but it doesn’t increase your query throughput
|
|||
process every query anyway.
|
||||
|
||||
Nevertheless, local secondary indexes are widely used
|
||||
[[31](/en/ch7#Busch2012)]:
|
||||
for example, MongoDB, Riak, Cassandra [[32](/en/ch7#HarEl2017)],
|
||||
Elasticsearch [[33](/en/ch7#Tong2013)], SolrCloud,
|
||||
and VoltDB [[34](/en/ch7#Pavlo2013)]
|
||||
[^31]:
|
||||
for example, MongoDB, Riak, Cassandra [^32],
|
||||
Elasticsearch [^33], SolrCloud,
|
||||
and VoltDB [^34]
|
||||
all use local secondary indexes.
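A local secondary index can be sketched roughly as below. This is an illustrative toy under stated assumptions, not how any of the systems named above implement it: the `Shard` class, the modulo-based `shard_for` routing, and the integer record IDs are all hypothetical. Each shard indexes only its own records, so a write touches one shard, while a query on an indexed field has to ask every shard and merge the answers.

```python
from collections import defaultdict


class Shard:
    """One shard holding its records plus a local index over those records only."""

    def __init__(self):
        self.records = {}               # record ID -> record (dict of fields)
        self.index = defaultdict(set)   # (field, value) -> IDs stored on this shard

    def put(self, record_id, record):
        """Insert or update a record, keeping the local index in sync."""
        old = self.records.get(record_id)
        if old is not None:
            for field, value in old.items():
                self.index[(field, value)].discard(record_id)
        self.records[record_id] = record
        for field, value in record.items():
            self.index[(field, value)].add(record_id)

    def find(self, field, value):
        """Return IDs of matching records stored on this shard."""
        return set(self.index.get((field, value), ()))


def shard_for(record_id, shards):
    """Hypothetical routing by partition key: integer ID modulo shard count."""
    return shards[record_id % len(shards)]


def query_all_shards(shards, field, value):
    """Secondary-index read: ask every shard, then merge the results."""
    result = set()
    for shard in shards:
        result |= shard.find(field, value)
    return result
```

The loop in `query_all_shards` is where the read cost comes from: adding shards adds more places to ask, which is why every shard ends up processing every such query.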
|
||||
|
||||
## Global Secondary Indexes
|
||||
|
|
@ -709,7 +709,7 @@ The index on the make of car is partitioned similarly (with the shard boundary b
|
|||
###### Figure 7-10. A global secondary index reflects data from all shards, and is itself sharded by the indexed value.
|
||||
|
||||
This kind of index is also called *term-partitioned*
|
||||
[[30](/en/ch7#Manning2008_ch7)]:
|
||||
[^30]:
|
||||
recall from [“Full-Text Search”](/en/ch4#sec_storage_full_text) that in full-text search, a *term* is a keyword in a text that
|
||||
you can search for. Here we generalise it to mean any value that you can search for in the secondary
|
||||
index.
|
||||
|
|
@ -728,7 +728,7 @@ certain make, or searching for multiple words occurring in the same text), it’
|
|||
terms will be assigned to different shards. To compute the logical AND of the two conditions, the
|
||||
system needs to find all the IDs that occur in both of the postings lists. That’s no problem if the
|
||||
postings lists are short, but if they are long, it can be slow to send them over the network to
|
||||
compute their intersection [[30](/en/ch7#Manning2008_ch7)].
|
||||
compute their intersection [^30].
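The intersection step can be sketched as a merge over two postings lists. The sketch assumes each list is already sorted by ascending record ID, which is a common layout but an assumption here; the function name is hypothetical.

```python
def intersect_postings(a: list[int], b: list[int]) -> list[int]:
    """Return the IDs present in both postings lists (each sorted ascending)."""
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot occur in b at or after position j
        else:
            j += 1  # b[j] cannot occur in a at or after position i
    return matches
```

The merge itself is linear in the combined length of the lists. With a term-partitioned index the two lists may live on different shards, so at least one of them has to cross the network before this function can run, and that transfer is the expensive part when the lists are long.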
|
||||
|
||||
Another challenge with global secondary indexes is that writes are more complicated than with local
|
||||
indexes, because writing a single record might affect multiple shards of the index (every term in
|
||||
|
|
@ -797,191 +797,41 @@ that question in the following chapters.
|
|||
|
||||
##### Footnotes
|
||||
|
||||
|
||||
##### References
|
||||
|
||||
[[1](/en/ch7#Giordano2023-marker)] Claire Giordano.
|
||||
[Understanding
|
||||
partitioning and sharding in Postgres and Citus](https://www.citusdata.com/blog/2023/08/04/understanding-partitioning-and-sharding-in-postgres-and-citus/). *citusdata.com*, August 2023.
|
||||
Archived at [perma.cc/8BTK-8959](https://perma.cc/8BTK-8959)
|
||||
|
||||
[[2](/en/ch7#Leach2022-marker)] Brandur Leach.
|
||||
[Partitioning in Postgres, 2022
|
||||
edition](https://brandur.org/fragments/postgres-partitioning-2022). *brandur.org*, October 2022.
|
||||
Archived at [perma.cc/Z5LE-6AKX](https://perma.cc/Z5LE-6AKX)
|
||||
|
||||
[[3](/en/ch7#Koster2009-marker)] Raph Koster.
|
||||
[Database “sharding”
|
||||
came from UO?](https://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/) *raphkoster.com*, January 2009.
|
||||
Archived at [perma.cc/4N9U-5KYF](https://perma.cc/4N9U-5KYF)
|
||||
|
||||
[[4](/en/ch7#Fidalgo2021-marker)] Garrett Fidalgo.
|
||||
[Herding elephants: Lessons learned
|
||||
from sharding Postgres at Notion](https://www.notion.com/blog/sharding-postgres-at-notion). *notion.com*, October 2021.
|
||||
Archived at [perma.cc/5J5V-W2VX](https://perma.cc/5J5V-W2VX)
|
||||
|
||||
[[5](/en/ch7#Drepper2007-marker)] Ulrich Drepper.
|
||||
[What Every Programmer Should Know About Memory](https://www.akkadia.org/drepper/cpumemory.pdf).
|
||||
*akkadia.org*, November 2007. Archived at
|
||||
[perma.cc/NU6Q-DRXZ](https://perma.cc/NU6Q-DRXZ)
|
||||
|
||||
[[6](/en/ch7#Zhou2021_ch7-marker)] Jingyu Zhou, Meng Xu, Alexander Shraer, Bala
|
||||
Namasivayam, Alex Miller, Evan Tschannen, Steve Atherton, Andrew J. Beamon, Rusty Sears, John Leach,
|
||||
Dave Rosenthal, Xin Dong, Will Wilson, Ben Collins, David Scherer, Alec Grieser, Young Liu, Alvin
|
||||
Moore, Bhaskar Muppana, Xiaoge Su, and Vishesh Yadav.
|
||||
[FoundationDB: A Distributed Unbundled
|
||||
Transactional Key Value Store](https://www.foundationdb.org/files/fdb-paper.pdf). At *ACM International Conference on Management of Data*
|
||||
(SIGMOD), June 2021.
|
||||
[doi:10.1145/3448016.3457559](https://doi.org/10.1145/3448016.3457559)
|
||||
|
||||
[[7](/en/ch7#Slot2023-marker)] Marco Slot.
|
||||
[Citus 12:
|
||||
Schema-based sharding for PostgreSQL](https://www.citusdata.com/blog/2023/07/18/citus-12-schema-based-sharding-for-postgres/). *citusdata.com*, July 2023.
|
||||
Archived at [perma.cc/R874-EC9W](https://perma.cc/R874-EC9W)
|
||||
|
||||
[[8](/en/ch7#Oliveira2023-marker)] Robisson Oliveira.
|
||||
[Reducing
|
||||
the Scope of Impact with Cell-Based Architecture](https://docs.aws.amazon.com/pdfs/wellarchitected/latest/reducing-scope-of-impact-with-cell-based-architecture/reducing-scope-of-impact-with-cell-based-architecture.pdf). AWS Well-Architected white paper, Amazon Web
|
||||
Services, September 2023.
|
||||
Archived at [perma.cc/4KWW-47NR](https://perma.cc/4KWW-47NR)
|
||||
|
||||
[[9](/en/ch7#Shapira2023dont-marker)] Gwen Shapira.
|
||||
[Things DBs Don’t Do - But Should](https://www.thenile.dev/blog/things-dbs-dont-do).
|
||||
*thenile.dev*, February 2023.
|
||||
Archived at [perma.cc/C3J4-JSFW](https://perma.cc/C3J4-JSFW)
|
||||
|
||||
[[10](/en/ch7#Schwarzkopf2019-marker)] Malte Schwarzkopf, Eddie Kohler, M. Frans
|
||||
Kaashoek, and Robert Morris.
|
||||
[Position: GDPR
|
||||
Compliance by Construction](https://cs.brown.edu/people/malte/pub/papers/2019-poly-gdpr.pdf). At *Towards Polystores that manage multiple Databases, Privacy,
|
||||
Security and/or Policy Issues for Heterogenous Data* (Poly), August 2019.
|
||||
[doi:10.1007/978-3-030-33752-0\_3](https://doi.org/10.1007/978-3-030-33752-0_3)
|
||||
|
||||
[[11](/en/ch7#Shapira2024-marker)] Gwen Shapira.
|
||||
[Introducing pg\_karnak: Transactional schema
|
||||
migration across tenant databases](https://www.thenile.dev/blog/distributed-ddl). *thenile.dev*, November 2024.
|
||||
Archived at [perma.cc/R5RD-8HR9](https://perma.cc/R5RD-8HR9)
|
||||
|
||||
[[12](/en/ch7#Ganguli2020-marker)] Arka Ganguli, Guido Iaquinti,
|
||||
Maggie Zhou, and Rafael Chacón.
|
||||
[Scaling Datastores at
|
||||
Slack with Vitess](https://slack.engineering/scaling-datastores-at-slack-with-vitess/). *slack.engineering*, December 2020.
|
||||
Archived at [perma.cc/UW8F-ALJK](https://perma.cc/UW8F-ALJK)
|
||||
|
||||
[[13](/en/ch7#Lan2011-marker)] Ikai Lan.
|
||||
[App
|
||||
Engine Datastore Tip: Monotonically Increasing Values Are Bad](https://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/). *ikaisays.com*,
|
||||
January 2011. Archived at [perma.cc/BPX8-RPJB](https://perma.cc/BPX8-RPJB)
|
||||
|
||||
[[14](/en/ch7#Soztutar2013split-marker)] Enis Soztutar.
|
||||
[Apache
|
||||
HBase Region Splitting and Merging](https://www.cloudera.com/blog/technical/apache-hbase-region-splitting-and-merging.html). *cloudera.com*, February 2013.
|
||||
Archived at [perma.cc/S9HS-2X2C](https://perma.cc/S9HS-2X2C)
|
||||
|
||||
[[15](/en/ch7#Evans2013-marker)] Eric Evans.
|
||||
[Rethinking Topology in Cassandra](https://www.youtube.com/watch?v=Qz6ElTdYjjU). At
|
||||
*Cassandra Summit*, June 2013.
|
||||
Archived at [perma.cc/2DKM-F438](https://perma.cc/2DKM-F438)
|
||||
|
||||
[[16](/en/ch7#Kleppmann2012hash-marker)] Martin Kleppmann.
|
||||
[Java’s
|
||||
hashCode Is Not Safe for Distributed Systems](https://martin.kleppmann.com/2012/06/18/java-hashcode-unsafe-for-distributed-systems.html). *martin.kleppmann.com*, June 2012.
|
||||
Archived at [perma.cc/LK5U-VZSN](https://perma.cc/LK5U-VZSN)
|
||||
|
||||
[[17](/en/ch7#Elhemali2022_ch7-marker)] Mostafa Elhemali, Niall Gallagher, Nicholas
|
||||
Gordon, Joseph Idziorek, Richard Krog, Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somu
|
||||
Perianayagam, Tim Rath, Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul,
|
||||
Doug Terry, and Akshat Vig.
|
||||
[Amazon DynamoDB: A Scalable,
|
||||
Predictably Performant, and Fully Managed NoSQL Database Service](https://www.usenix.org/conference/atc22/presentation/elhemali). At *USENIX Annual Technical
|
||||
Conference* (ATC), July 2022.
|
||||
|
||||
[[18](/en/ch7#Williams2012-marker)] Brandon Williams.
|
||||
[Virtual Nodes in Cassandra
|
||||
1.2](https://www.datastax.com/blog/virtual-nodes-cassandra-12). *datastax.com*, December 2012.
|
||||
Archived at [perma.cc/N385-EQXV](https://perma.cc/N385-EQXV)
|
||||
|
||||
[[19](/en/ch7#Lambov2016-marker)] Branimir Lambov.
|
||||
[New Token
|
||||
Allocation Algorithm in Cassandra 3.0](https://www.datastax.com/blog/new-token-allocation-algorithm-cassandra-30). *datastax.com*, January 2016.
|
||||
Archived at [perma.cc/2BG7-LDWY](https://perma.cc/2BG7-LDWY)
|
||||
|
||||
[[20](/en/ch7#Karger1997-marker)] David Karger, Eric Lehman, Tom Leighton, Rina
|
||||
Panigrahy, Matthew Levine, and Daniel Lewin.
|
||||
[Consistent Hashing and Random Trees:
|
||||
Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web](https://people.csail.mit.edu/karger/Papers/web.pdf).
|
||||
At *29th Annual ACM Symposium on Theory of Computing* (STOC), May 1997.
|
||||
[doi:10.1145/258533.258660](https://doi.org/10.1145/258533.258660)
|
||||
|
||||
[[21](/en/ch7#Gryski2018-marker)] Damian Gryski.
|
||||
[Consistent
|
||||
Hashing: Algorithmic Tradeoffs](https://dgryski.medium.com/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8). *dgryski.medium.com*, April 2018.
|
||||
Archived at [perma.cc/B2WF-TYQ8](https://perma.cc/B2WF-TYQ8)
|
||||
|
||||
[[22](/en/ch7#Thaler1998-marker)] David G. Thaler and Chinya V. Ravishankar.
|
||||
[Using name-based mappings to increase
|
||||
hit rates](https://www.cs.kent.edu/~javed/DL/web/p1-thaler.pdf). *IEEE/ACM Transactions on Networking*, volume 6, issue 1, pages 1–14, February 1998.
|
||||
[doi:10.1109/90.663936](https://doi.org/10.1109/90.663936)
|
||||
|
||||
[[23](/en/ch7#Lamping2014-marker)] John Lamping and Eric Veach.
|
||||
[A Fast, Minimal Memory, Consistent Hash
|
||||
Algorithm](https://arxiv.org/abs/1406.2294). *arxiv.org*, June 2014.
|
||||
|
||||
[[24](/en/ch7#Axon2010_ch7-marker)] Samuel Axon.
|
||||
[3% of Twitter’s Servers
|
||||
Dedicated to Justin Bieber](https://mashable.com/archive/justin-bieber-twitter). *mashable.com*, September 2010.
|
||||
Archived at [perma.cc/F35N-CGVX](https://perma.cc/F35N-CGVX)
|
||||
|
||||
[[25](/en/ch7#Guo2020-marker)] Gerald Guo and Thawan Kooburat.
|
||||
[Scaling
|
||||
services with Shard Manager](https://engineering.fb.com/2020/08/24/production-engineering/scaling-services-with-shard-manager/). *engineering.fb.com*, August 2020.
|
||||
Archived at [perma.cc/EFS3-XQYT](https://perma.cc/EFS3-XQYT)
|
||||
|
||||
[[26](/en/ch7#Lee2021-marker)] Sangmin Lee, Zhenhua Guo, Omer Sunercan, Jun Ying, Thawan
|
||||
Kooburat, Suryadeep Biswal, Jun Chen, Kun Huang, Yatpang Cheung, Yiding Zhou, Kaushik Veeraraghavan,
|
||||
Biren Damani, Pol Mauri Ruiz, Vikas Mehta, and Chunqiang Tang.
|
||||
[Shard Manager: A Generic Shard
|
||||
Management Framework for Geo-distributed Applications](https://dl.acm.org/doi/pdf/10.1145/3477132.3483546). *28th ACM SIGOPS Symposium on
|
||||
Operating Systems Principles* (SOSP), pages 553–569, October 2021.
|
||||
[doi:10.1145/3477132.3483546](https://doi.org/10.1145/3477132.3483546)
|
||||
|
||||
[[27](/en/ch7#Fritchie2018-marker)] Scott Lystig Fritchie.
|
||||
[A Critique of Resizable Hash
|
||||
Tables: Riak Core & Random Slicing](https://www.infoq.com/articles/dynamo-riak-random-slicing/). *infoq.com*, August 2018.
|
||||
Archived at [perma.cc/RPX7-7BLN](https://perma.cc/RPX7-7BLN)
|
||||
|
||||
[[28](/en/ch7#Warfield2023_ch7-marker)] Andy Warfield.
|
||||
[Building
|
||||
and operating a pretty big storage system called S3](https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html). *allthingsdistributed.com*, July 2023.
|
||||
Archived at [perma.cc/6S7P-GLM4](https://perma.cc/6S7P-GLM4)
|
||||
|
||||
[[29](/en/ch7#Houlihan2017-marker)] Rich Houlihan.
|
||||
[DynamoDB adaptive capacity: smooth performance
|
||||
for chaotic workloads (DAT327)](https://www.youtube.com/watch?v=kMY0_m29YzU). At *AWS re:Invent*, November 2017.
|
||||
|
||||
[[30](/en/ch7#Manning2008_ch7-marker)] Christopher D. Manning, Prabhakar Raghavan,
|
||||
and Hinrich Schütze.
|
||||
[*Introduction to Information Retrieval*](https://nlp.stanford.edu/IR-book/).
|
||||
Cambridge University Press, 2008. ISBN: 978-0-521-86571-5, available online at
|
||||
[nlp.stanford.edu/IR-book](https://nlp.stanford.edu/IR-book/)
|
||||
|
||||
[[31](/en/ch7#Busch2012-marker)] Michael Busch, Krishna Gade, Brian Larson, Patrick
|
||||
Lok, Samuel Luckenbill, and Jimmy Lin.
|
||||
[Earlybird:
|
||||
Real-Time Search at Twitter](https://cs.uwaterloo.ca/~jimmylin/publications/Busch_etal_ICDE2012.pdf). At *28th IEEE International Conference on Data Engineering*
|
||||
(ICDE), April 2012.
|
||||
[doi:10.1109/ICDE.2012.149](https://doi.org/10.1109/ICDE.2012.149)
|
||||
|
||||
[[32](/en/ch7#HarEl2017-marker)] Nadav Har’El.
|
||||
[Indexing in Cassandra 3](https://github.com/scylladb/scylladb/wiki/Indexing-in-Cassandra-3).
|
||||
*github.com*, April 2017.
|
||||
Archived at [perma.cc/3ENV-8T9P](https://perma.cc/3ENV-8T9P)
|
||||
|
||||
[[33](/en/ch7#Tong2013-marker)] Zachary Tong.
|
||||
[Customizing Your
|
||||
Document Routing](https://www.elastic.co/blog/customizing-your-document-routing/). *elastic.co*, June 2013.
|
||||
Archived at [perma.cc/97VM-MREN](https://perma.cc/97VM-MREN)
|
||||
|
||||
[[34](/en/ch7#Pavlo2013-marker)] Andrew Pavlo.
|
||||
[H-Store Frequently Asked Questions](https://hstore.cs.brown.edu/documentation/faq/).
|
||||
*hstore.cs.brown.edu*, October 2013.
|
||||
Archived at [perma.cc/X3ZA-DW6Z](https://perma.cc/X3ZA-DW6Z)
|
||||
[^1]: Claire Giordano. [Understanding partitioning and sharding in Postgres and Citus](https://www.citusdata.com/blog/2023/08/04/understanding-partitioning-and-sharding-in-postgres-and-citus/). *citusdata.com*, August 2023. Archived at [perma.cc/8BTK-8959](https://perma.cc/8BTK-8959)
|
||||
[^2]: Brandur Leach. [Partitioning in Postgres, 2022 edition](https://brandur.org/fragments/postgres-partitioning-2022). *brandur.org*, October 2022. Archived at [perma.cc/Z5LE-6AKX](https://perma.cc/Z5LE-6AKX)
|
||||
[^3]: Raph Koster. [Database “sharding” came from UO?](https://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/) *raphkoster.com*, January 2009. Archived at [perma.cc/4N9U-5KYF](https://perma.cc/4N9U-5KYF)
|
||||
[^4]: Garrett Fidalgo. [Herding elephants: Lessons learned from sharding Postgres at Notion](https://www.notion.com/blog/sharding-postgres-at-notion). *notion.com*, October 2021. Archived at [perma.cc/5J5V-W2VX](https://perma.cc/5J5V-W2VX)
|
||||
[^5]: Ulrich Drepper. [What Every Programmer Should Know About Memory](https://www.akkadia.org/drepper/cpumemory.pdf). *akkadia.org*, November 2007. Archived at [perma.cc/NU6Q-DRXZ](https://perma.cc/NU6Q-DRXZ)
|
||||
[^6]: Jingyu Zhou, Meng Xu, Alexander Shraer, Bala Namasivayam, Alex Miller, Evan Tschannen, Steve Atherton, Andrew J. Beamon, Rusty Sears, John Leach, Dave Rosenthal, Xin Dong, Will Wilson, Ben Collins, David Scherer, Alec Grieser, Young Liu, Alvin Moore, Bhaskar Muppana, Xiaoge Su, and Vishesh Yadav. [FoundationDB: A Distributed Unbundled Transactional Key Value Store](https://www.foundationdb.org/files/fdb-paper.pdf). At *ACM International Conference on Management of Data* (SIGMOD), June 2021. [doi:10.1145/3448016.3457559](https://doi.org/10.1145/3448016.3457559)
|
||||
[^7]: Marco Slot. [Citus 12: Schema-based sharding for PostgreSQL](https://www.citusdata.com/blog/2023/07/18/citus-12-schema-based-sharding-for-postgres/). *citusdata.com*, July 2023. Archived at [perma.cc/R874-EC9W](https://perma.cc/R874-EC9W)
|
||||
[^8]: Robisson Oliveira. [Reducing the Scope of Impact with Cell-Based Architecture](https://docs.aws.amazon.com/pdfs/wellarchitected/latest/reducing-scope-of-impact-with-cell-based-architecture/reducing-scope-of-impact-with-cell-based-architecture.pdf). AWS Well-Architected white paper, Amazon Web Services, September 2023. Archived at [perma.cc/4KWW-47NR](https://perma.cc/4KWW-47NR)
|
||||
[^9]: Gwen Shapira. [Things DBs Don’t Do - But Should](https://www.thenile.dev/blog/things-dbs-dont-do). *thenile.dev*, February 2023. Archived at [perma.cc/C3J4-JSFW](https://perma.cc/C3J4-JSFW)
|
||||
[^10]: Malte Schwarzkopf, Eddie Kohler, M. Frans Kaashoek, and Robert Morris. [Position: GDPR Compliance by Construction](https://cs.brown.edu/people/malte/pub/papers/2019-poly-gdpr.pdf). At *Towards Polystores that manage multiple Databases, Privacy, Security and/or Policy Issues for Heterogenous Data* (Poly), August 2019. [doi:10.1007/978-3-030-33752-0\_3](https://doi.org/10.1007/978-3-030-33752-0_3)
|
||||
[^11]: Gwen Shapira. [Introducing pg\_karnak: Transactional schema migration across tenant databases](https://www.thenile.dev/blog/distributed-ddl). *thenile.dev*, November 2024. Archived at [perma.cc/R5RD-8HR9](https://perma.cc/R5RD-8HR9)
|
||||
[^12]: Arka Ganguli, Guido Iaquinti, Maggie Zhou, and Rafael Chacón. [Scaling Datastores at Slack with Vitess](https://slack.engineering/scaling-datastores-at-slack-with-vitess/). *slack.engineering*, December 2020. Archived at [perma.cc/UW8F-ALJK](https://perma.cc/UW8F-ALJK)
|
||||
[^13]: Ikai Lan. [App Engine Datastore Tip: Monotonically Increasing Values Are Bad](https://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/). *ikaisays.com*, January 2011. Archived at [perma.cc/BPX8-RPJB](https://perma.cc/BPX8-RPJB)
|
||||
[^14]: Enis Soztutar. [Apache HBase Region Splitting and Merging](https://www.cloudera.com/blog/technical/apache-hbase-region-splitting-and-merging.html). *cloudera.com*, February 2013. Archived at [perma.cc/S9HS-2X2C](https://perma.cc/S9HS-2X2C)
|
||||
[^15]: Eric Evans. [Rethinking Topology in Cassandra](https://www.youtube.com/watch?v=Qz6ElTdYjjU). At *Cassandra Summit*, June 2013. Archived at [perma.cc/2DKM-F438](https://perma.cc/2DKM-F438)
|
||||
[^16]: Martin Kleppmann. [Java’s hashCode Is Not Safe for Distributed Systems](https://martin.kleppmann.com/2012/06/18/java-hashcode-unsafe-for-distributed-systems.html). *martin.kleppmann.com*, June 2012. Archived at [perma.cc/LK5U-VZSN](https://perma.cc/LK5U-VZSN)
|
||||
[^17]: Mostafa Elhemali, Niall Gallagher, Nicholas Gordon, Joseph Idziorek, Richard Krog, Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somu Perianayagam, Tim Rath, Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul, Doug Terry, and Akshat Vig. [Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service](https://www.usenix.org/conference/atc22/presentation/elhemali). At *USENIX Annual Technical Conference* (ATC), July 2022.
|
||||
[^18]: Brandon Williams. [Virtual Nodes in Cassandra 1.2](https://www.datastax.com/blog/virtual-nodes-cassandra-12). *datastax.com*, December 2012. Archived at [perma.cc/N385-EQXV](https://perma.cc/N385-EQXV)
|
||||
[^19]: Branimir Lambov. [New Token Allocation Algorithm in Cassandra 3.0](https://www.datastax.com/blog/new-token-allocation-algorithm-cassandra-30). *datastax.com*, January 2016. Archived at [perma.cc/2BG7-LDWY](https://perma.cc/2BG7-LDWY)
|
||||
[^20]: David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. [Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web](https://people.csail.mit.edu/karger/Papers/web.pdf). At *29th Annual ACM Symposium on Theory of Computing* (STOC), May 1997. [doi:10.1145/258533.258660](https://doi.org/10.1145/258533.258660)
|
||||
[^21]: Damian Gryski. [Consistent Hashing: Algorithmic Tradeoffs](https://dgryski.medium.com/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8). *dgryski.medium.com*, April 2018. Archived at [perma.cc/B2WF-TYQ8](https://perma.cc/B2WF-TYQ8)
|
||||
[^22]: David G. Thaler and Chinya V. Ravishankar. [Using name-based mappings to increase hit rates](https://www.cs.kent.edu/~javed/DL/web/p1-thaler.pdf). *IEEE/ACM Transactions on Networking*, volume 6, issue 1, pages 1–14, February 1998. [doi:10.1109/90.663936](https://doi.org/10.1109/90.663936)
|
||||
[^23]: John Lamping and Eric Veach. [A Fast, Minimal Memory, Consistent Hash Algorithm](https://arxiv.org/abs/1406.2294). *arxiv.org*, June 2014.
|
||||
[^24]: Samuel Axon. [3% of Twitter’s Servers Dedicated to Justin Bieber](https://mashable.com/archive/justin-bieber-twitter). *mashable.com*, September 2010. Archived at [perma.cc/F35N-CGVX](https://perma.cc/F35N-CGVX)
|
||||
[^25]: Gerald Guo and Thawan Kooburat. [Scaling services with Shard Manager](https://engineering.fb.com/2020/08/24/production-engineering/scaling-services-with-shard-manager/). *engineering.fb.com*, August 2020. Archived at [perma.cc/EFS3-XQYT](https://perma.cc/EFS3-XQYT)
|
||||
[^26]: Sangmin Lee, Zhenhua Guo, Omer Sunercan, Jun Ying, Thawan Kooburat, Suryadeep Biswal, Jun Chen, Kun Huang, Yatpang Cheung, Yiding Zhou, Kaushik Veeraraghavan, Biren Damani, Pol Mauri Ruiz, Vikas Mehta, and Chunqiang Tang. [Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications](https://dl.acm.org/doi/pdf/10.1145/3477132.3483546). *28th ACM SIGOPS Symposium on Operating Systems Principles* (SOSP), pages 553–569, October 2021. [doi:10.1145/3477132.3483546](https://doi.org/10.1145/3477132.3483546)
|
||||
[^27]: Scott Lystig Fritchie. [A Critique of Resizable Hash Tables: Riak Core & Random Slicing](https://www.infoq.com/articles/dynamo-riak-random-slicing/). *infoq.com*, August 2018. Archived at [perma.cc/RPX7-7BLN](https://perma.cc/RPX7-7BLN)
|
||||
[^28]: Andy Warfield. [Building and operating a pretty big storage system called S3](https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html). *allthingsdistributed.com*, July 2023. Archived at [perma.cc/6S7P-GLM4](https://perma.cc/6S7P-GLM4)
|
||||
[^29]: Rich Houlihan. [DynamoDB adaptive capacity: smooth performance for chaotic workloads (DAT327)](https://www.youtube.com/watch?v=kMY0_m29YzU). At *AWS re:Invent*, November 2017.
|
||||
[^30]: Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. [*Introduction to Information Retrieval*](https://nlp.stanford.edu/IR-book/). Cambridge University Press, 2008. ISBN: 978-0-521-86571-5, available online at [nlp.stanford.edu/IR-book](https://nlp.stanford.edu/IR-book/)
|
||||
[^31]: Michael Busch, Krishna Gade, Brian Larson, Patrick Lok, Samuel Luckenbill, and Jimmy Lin. [Earlybird: Real-Time Search at Twitter](https://cs.uwaterloo.ca/~jimmylin/publications/Busch_etal_ICDE2012.pdf). At *28th IEEE International Conference on Data Engineering* (ICDE), April 2012. [doi:10.1109/ICDE.2012.149](https://doi.org/10.1109/ICDE.2012.149)
|
||||
[^32]: Nadav Har’El. [Indexing in Cassandra 3](https://github.com/scylladb/scylladb/wiki/Indexing-in-Cassandra-3). *github.com*, April 2017. Archived at [perma.cc/3ENV-8T9P](https://perma.cc/3ENV-8T9P)
|
||||
[^33]: Zachary Tong. [Customizing Your Document Routing](https://www.elastic.co/blog/customizing-your-document-routing/). *elastic.co*, June 2013. Archived at [perma.cc/97VM-MREN](https://perma.cc/97VM-MREN)
|
||||
[^34]: Andrew Pavlo. [H-Store Frequently Asked Questions](https://hstore.cs.brown.edu/documentation/faq/). *hstore.cs.brown.edu*, October 2013. Archived at [perma.cc/X3ZA-DW6Z](https://perma.cc/X3ZA-DW6Z)
|
||||
File diff suppressed because it is too large
1062
content/en/ch9.md
File diff suppressed because it is too large
30
hugo.yaml
|
|
@ -1,6 +1,6 @@
|
|||
baseURL: 'https://ddia.vonng.com/'
|
||||
languageCode: 'zh-CN'
|
||||
title: '设计数据密集型应用'
|
||||
title: '设计数据密集型应用第二版'
|
||||
|
||||
enableRobotsTXT: true
|
||||
# Parse Git commit
|
||||
|
|
@ -28,7 +28,7 @@ languages:
|
|||
languageCode: zh
|
||||
contentDir: content/zh
|
||||
weight: 1
|
||||
title: 设计数据密集型应用
|
||||
title: 设计数据密集型应用(第二版)
|
||||
v2:
|
||||
languageName: 第二版
|
||||
languageCode: v2
|
||||
|
|
@ -40,27 +40,29 @@ languages:
|
|||
languageCode: tw
|
||||
contentDir: content/tw
|
||||
weight: 3
|
||||
title: 設計資料密集型應用
|
||||
title: 設計資料密集型應用(第二版)
|
||||
en:
|
||||
languageName: English
|
||||
languageCode: en
|
||||
contentDir: content/en
|
||||
weight: 4
|
||||
title: Designing Data-Intensive Applications
|
||||
|
||||
title: Designing Data-Intensive Applications 2nd Edition
|
||||
|
||||
markup:
|
||||
highlight:
|
||||
noClasses: false
|
||||
goldmark:
|
||||
renderer:
|
||||
unsafe: true
|
||||
extensions:
|
||||
passthrough:
|
||||
delimiters:
|
||||
block: [['\[', '\]'], ['$$', '$$']]
|
||||
inline: [['\(', '\)']]
|
||||
enable: true
|
||||
footnote: true # enable footnote syntax: [^id] / [^id]: text
|
||||
linkify: true # automatically turn URL text into links
|
||||
table: true # enable Markdown tables
|
||||
taskList: true # enable task lists [ ] / [x]
|
||||
typographer: true # smart typography (quotes, dashes, etc.)
|
||||
parser:
|
||||
attribute: true # allow {#id .class key=val} after a heading, for explicit anchors
|
||||
autoHeadingID: true # auto-generate heading IDs (a handwritten {#id} overrides the generated one)
|
||||
autoHeadingIDType: github # auto ID rule: github / blackfriday / none
|
||||
tableOfContents:
|
||||
startLevel: 2 # ToC starts at h2
|
||||
endLevel: 4 # ToC ends at h4
|
||||
|
||||
menu:
|
||||
main:
|
||||
|
|
|
|||