---
title: Indexes
weight: 550
breadcrumbs: false
---

### Symbols

- 3FS (distributed filesystem), [Distributed Filesystems](/en/ch11#sec_batch_dfs)

### A

- aborts (transactions), [Transactions](/en/ch8#ch_transactions), [Atomicity](/en/ch8#sec_transactions_acid_atomicity)
  - cascading, [No dirty reads](/en/ch8#no-dirty-reads)
  - in two-phase commit, [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
  - performance of optimistic concurrency control, [Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
  - retrying aborted transactions, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- abstraction, [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Simplicity: Managing Complexity](/en/ch2#id38), [Data Models and Query Languages](/en/ch3#ch_datamodels), [Transactions](/en/ch8#ch_transactions), [Summary](/en/ch8#summary)
- accidental complexity, [Simplicity: Managing Complexity](/en/ch2#id38)
- accountability, [Responsibility and Accountability](/en/ch14#id371)
- accounting (financial data), [Summary](/en/ch3#summary), [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
- Accumulo (database)
  - wide-column data model, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Column Compression](/en/ch4#sec_storage_column_compression)
- ACID properties (transactions), [The Meaning of ACID](/en/ch8#sec_transactions_acid)
  - atomicity, [Atomicity](/en/ch8#sec_transactions_acid_atomicity), [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
  - consistency, [Consistency](/en/ch8#sec_transactions_acid_consistency), [Maintaining integrity in the face of software bugs](/en/ch13#id455)
  - durability, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Durability](/en/ch8#durability)
  - isolation, [Isolation](/en/ch8#sec_transactions_acid_isolation), [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
- acknowledgements (messaging), [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- active/active replication (see multi-leader replication)
- active/passive replication (see leader-based replication)
- ActiveMQ (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297)
  - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
- ActiveRecord (object-relational mapper), [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- activity (workflows) (see workflow engines)
- actor model, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
  - (see also event-driven architecture)
  - comparison to stream processing, [Event-Driven Architectures and RPC](/en/ch12#sec_stream_actors_drpc)
- adaptive capacity, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- Advanced Message Queuing Protocol (see AMQP)
- aerospace systems, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
- Aerospike (database)
  - strong consistency mode, [Single-object writes](/en/ch8#sec_transactions_single_object)
- AGE (graph database), [The Cypher Query Language](/en/ch3#id57)
- aggregation
  - data cubes and materialized views, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
  - in batch processes, [Sorting Versus In-memory Aggregation](/en/ch11#id275)
  - in stream processes, [Stream analytics](/en/ch12#id318)
- aggregation pipeline (MongoDB), [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization), [Query languages for documents](/en/ch3#query-languages-for-documents)
- Agile, [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability)
  - minimizing irreversibility, [Batch Processing](/en/ch11#ch_batch), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
  - moving faster with confidence, [The end-to-end argument again](/en/ch13#id456)
- agreement, [Single-value consensus](/en/ch10#single-value-consensus), [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
  - (see also consensus)
- AI (artificial intelligence) (see machine learning)
- AI Act (European Union), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- AirByte, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- Airflow (workflow scheduler), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows), [Batch Processing](/en/ch11#ch_batch), [Scheduling Workflows](/en/ch11#sec_batch_workflows)
  - cloud data warehouse integration, [Query languages](/en/ch11#sec_batch_query_lanauges)
  - use for ETL, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- Akamai
  - response time study, [Average, Median, and Percentiles](/en/ch2#id24)
- algorithms
  - algorithm correctness, [Defining the correctness of an algorithm](/en/ch9#defining-the-correctness-of-an-algorithm)
  - B-trees, [B-Trees](/en/ch4#sec_storage_b_trees)-[B-tree variants](/en/ch4#b-tree-variants)
  - for distributed systems, [System Model and Reality](/en/ch9#sec_distributed_system_model)
  - mergesort, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Shuffling Data](/en/ch11#sec_shuffle)
  - scheduling, [Resource Allocation](/en/ch11#id279)
  - SSTables and LSM-trees, [The SSTable file format](/en/ch4#the-sstable-file-format)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- all-to-all replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)
- AllegroGraph (database), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
  - SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- ALTER TABLE statement (SQL), [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Encoding and Evolution](/en/ch5#ch_encoding)
- Amazon
  - Dynamo (see Dynamo (database))
  - response time study, [Average, Median, and Percentiles](/en/ch2#id24)
- Amazon Web Services (AWS)
  - Aurora (see Aurora (cloud database))
  - ClockBound (see ClockBound (time sync))
  - correctness testing, [Formal Methods and Randomized Testing](/en/ch9#sec_distributed_formal)
  - DynamoDB (see DynamoDB (database))
  - EBS (see EBS (virtual block device))
  - Kinesis (see Kinesis (messaging))
  - Neptune (see Neptune (graph database))
  - network reliability, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
  - S3 (see S3 (object storage))
- amplification
  - of bias, [Bias and Discrimination](/en/ch14#id370)
  - of failures, [Maintaining derived state](/en/ch13#id446)
  - of tail latency, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla), [Local Secondary Indexes](/en/ch7#id166)
  - write amplification, [Write amplification](/en/ch4#write-amplification)
- AMQP (Advanced Message Queuing Protocol), [Message brokers compared to databases](/en/ch12#id297)
  - (see also messaging systems)
  - comparison to log-based messaging, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay)
  - message ordering, [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- analytical systems, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
  - as derived data systems, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
  - ETL from operational systems, [Data Warehousing](/en/ch1#sec_introduction_dwh)
  - governance, [Beyond the data lake](/en/ch1#beyond-the-data-lake)
- analytics, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)-[Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
  - comparison to transaction processing, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
  - data normalization, [Trade-offs of normalization](/en/ch3#trade-offs-of-normalization)
  - data warehousing (see data warehousing)
  - predictive (see predictive analytics)
  - relation to batch processing, [Analytics](/en/ch11#sec_batch_olap)-[Analytics](/en/ch11#sec_batch_olap)
  - schemas for, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)-[Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
  - snapshot isolation for queries, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
  - stream analytics, [Stream analytics](/en/ch12#id318)
- analytics engineering, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
- anti-entropy, [Catching up on missed writes](/en/ch6#sec_replication_read_repair)
- Antithesis (deterministic simulation testing), [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- Apache Accumulo (see Accumulo)
- Apache ActiveMQ (see ActiveMQ)
- Apache AGE (see AGE)
- Apache Arrow (see Arrow (data format))
- Apache Avro (see Avro)
- Apache Beam (see Beam)
- Apache BookKeeper (see BookKeeper)
- Apache Cassandra (see Cassandra)
- Apache Curator (see Curator)
- Apache DataFusion (see DataFusion (query engine))
- Apache Druid (see Druid (database))
- Apache Flink (see Flink (processing framework))
- Apache HBase (see HBase)
- Apache Iceberg (see Iceberg (table format))
- Apache Jena (see Jena)
- Apache Kafka (see Kafka)
- Apache Lucene (see Lucene)
- Apache Oozie (see Oozie (workflow scheduler))
- Apache ORC (see ORC (data format))
- Apache Parquet (see Parquet (data format))
- Apache Pig (query language), [Query languages](/en/ch11#sec_batch_query_lanauges)
- Apache Pinot (see Pinot (database))
- Apache Pulsar (see Pulsar)
- Apache Qpid (see Qpid)
- Apache Samza (see Samza)
- Apache Solr (see Solr)
- Apache Spark (see Spark (processing framework))
- Apache Storm (see Storm)
- Apache Superset (see Superset (data visualization software))
- Apache Thrift (see Thrift)
- Apache ZooKeeper (see ZooKeeper)
- Apama (stream analytics), [Complex event processing](/en/ch12#id317)
- append-only files (see logs)
- Application Programming Interfaces (APIs), [Data Models and Query Languages](/en/ch3#ch_datamodels)
  - for change streams, [API support for change streams](/en/ch12#sec_stream_change_api)
  - for distributed transactions, [XA transactions](/en/ch8#xa-transactions)
  - for services, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
    - (see also services)
    - evolvability, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
    - RESTful, [Web services](/en/ch5#sec_web_services)
- application state (see state)
- approximate search (see similarity search)
- archival storage, data from databases, [Archival storage](/en/ch5#archival-storage)
- arcs (see edges)
- ArcticDB (database), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- arithmetic mean, [Average, Median, and Percentiles](/en/ch2#id24)
- arrays
  - array databases, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
  - multidimensional, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- Arrow (data format), [Column-Oriented Storage](/en/ch4#sec_storage_column), [DataFrames](/en/ch11#id287)
- artificial intelligence (see machine learning)
- ASCII text, [Protocol Buffers](/en/ch5#sec_encoding_protobuf)
- ASN.1 (schema language), [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- associative table, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many), [Property Graphs](/en/ch3#id56)
- asynchronous networks, [Unreliable Networks](/en/ch9#sec_distributed_networks), [Glossary](/en/glossary)
  - comparison to synchronous networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
  - system model, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- asynchronous replication, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async), [Glossary](/en/glossary)
  - data loss on failover, [Leader failure: Failover](/en/ch6#leader-failure-failover)
  - reads from asynchronous follower, [Problems with Replication Lag](/en/ch6#sec_replication_lag)
  - with multiple leaders, [Multi-Leader Replication](/en/ch6#sec_replication_multi_leader)
- Asynchronous Transfer Mode (ATM), [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- atomic broadcast, [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs)
- atomic clocks, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
  - (see also clocks)
- atomicity (concurrency), [Glossary](/en/glossary)
  - atomic increment, [Single-object writes](/en/ch8#sec_transactions_single_object)
  - compare-and-set (CAS), [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set), [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
    - (see also compare-and-set (CAS))
  - denormalized data, [Trade-offs of normalization](/en/ch3#trade-offs-of-normalization)
  - fetch-and-add/increment, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical), [Consensus](/en/ch10#sec_consistency_consensus), [Fetch-and-add as consensus](/en/ch10#fetch-and-add-as-consensus)
  - write operations, [Atomic write operations](/en/ch8#atomic-write-operations)
- atomicity (transactions), [Atomicity](/en/ch8#sec_transactions_acid_atomicity), [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object), [Glossary](/en/glossary)
  - atomic commit
    - avoiding, [Multi-shard request processing](/en/ch13#id360), [Coordination-avoiding data systems](/en/ch13#id454)
    - blocking and nonblocking, [Three-phase commit](/en/ch8#three-phase-commit)
    - in stream processing, [Exactly-once message processing](/en/ch8#sec_transactions_exactly_once), [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited), [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)
    - maintaining derived data, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
  - distributed transactions, [Distributed Transactions](/en/ch8#sec_transactions_distributed)-[Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
  - for multi-object transactions, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
  - for single-object writes, [Single-object writes](/en/ch8#sec_transactions_single_object)
  - relation to consensus, [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
- auditability, [Trust, but Verify](/en/ch13#sec_future_verification)-[Tools for auditable data systems](/en/ch13#id366)
  - designing for, [Designing for auditability](/en/ch13#id365)
  - self-auditing systems, [Don't just blindly trust what they promise](/en/ch13#id364)
  - through immutability, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
  - tools for auditable data systems, [Tools for auditable data systems](/en/ch13#id366)
- Aurora (cloud database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- Aurora DSQL (database)
  - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- auto-scaling, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- Automerge (CRDT library), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- availability, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)
  - (see also fault tolerance)
  - in CAP theorem, [The CAP theorem](/en/ch10#the-cap-theorem)
  - in leader election, [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
  - in service level agreements (SLAs), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- availability zones, [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy), [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- Avro (data format), [Avro](/en/ch5#sec_encoding_avro)-[Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
  - dynamically generated schemas, [Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
  - object container files, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema), [Archival storage](/en/ch5#archival-storage)
  - reader determining writer's schema, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
  - schema evolution, [The writer's schema and the reader's schema](/en/ch5#the-writers-schema-and-the-readers-schema)
  - use in batch processing, [MapReduce](/en/ch11#sec_batch_mapreduce)
- awk (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis), [Distributed Job Orchestration](/en/ch11#id278)
- Axon Framework, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- Azkaban (workflow scheduler), [Batch Processing](/en/ch11#ch_batch)
- Azure Blob Storage (object storage), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
  - conditional headers, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- Azure managed disks, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- Azure SQL DB (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- Azure Storage, [Object Stores](/en/ch11#id277)
- Azure Synapse Analytics (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- Azure Virtual Machines
  - spot virtual machines, [Handling Faults](/en/ch11#id281)

### B

- B-trees (indexes), [B-Trees](/en/ch4#sec_storage_b_trees)-[B-tree variants](/en/ch4#b-tree-variants)
  - B+ trees, [B-tree variants](/en/ch4#b-tree-variants)
  - branching factor, [B-Trees](/en/ch4#sec_storage_b_trees)
  - comparison to LSM-trees, [Comparing B-Trees and LSM-Trees](/en/ch4#sec_storage_btree_lsm_comparison)-[Disk space usage](/en/ch4#disk-space-usage)
  - crash recovery, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
  - growing by splitting a page, [B-Trees](/en/ch4#sec_storage_b_trees)
  - immutable variants, [B-tree variants](/en/ch4#b-tree-variants), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
  - similarity to shard splitting, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)
  - variants, [B-tree variants](/en/ch4#b-tree-variants)
- B2 (object storage), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- Backblaze B2 (see B2 (object storage))
- backend, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- backoff, exponential, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- backpressure, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Read performance](/en/ch4#read-performance), [Messaging Systems](/en/ch12#sec_stream_messaging), [Glossary](/en/glossary)
  - in batch processing, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
  - in TCP, [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- backups
  - database snapshot for replication, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
  - in multitenant systems, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
  - integrity of, [Don't just blindly trust what they promise](/en/ch13#id364)
  - snapshot isolation for, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
  - using object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
  - versus replication, [Replication](/en/ch6#ch_replication)
- backward compatibility, [Encoding and Evolution](/en/ch5#ch_encoding)
- BadgerDB (database)
  - serializable transactions, [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)
- BASE, contrast to ACID, [The Meaning of ACID](/en/ch8#sec_transactions_acid)
- bash shell (Unix), [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp)
- batch processing, [Batch Processing](/en/ch11#ch_batch)-[Summary](/en/ch11#id292), [Glossary](/en/glossary)
  - and functional programming, [MapReduce](/en/ch11#sec_batch_mapreduce)
  - benefits of, [Batch Processing](/en/ch11#ch_batch)
  - combining with stream processing, [Unifying batch and stream processing](/en/ch13#id338)
  - comparison to stream processing, [Processing Streams](/en/ch12#sec_stream_processing)
  - dataflow engines, [Dataflow Engines](/en/ch11#sec_batch_dataflow)-[Dataflow Engines](/en/ch11#sec_batch_dataflow)
  - fault tolerance, [Handling Faults](/en/ch11#id281), [Messaging Systems](/en/ch12#sec_stream_messaging)
  - for data integration, [Batch and Stream Processing](/en/ch13#sec_future_batch_streaming)-[Unifying batch and stream processing](/en/ch13#id338)
  - graphs and iterative processing, [Machine Learning](/en/ch11#id290)
  - high-level APIs and languages, [Query languages](/en/ch11#sec_batch_query_lanauges)-[Query languages](/en/ch11#sec_batch_query_lanauges)
  - in cloud data warehouses, [Query languages](/en/ch11#sec_batch_query_lanauges)
  - in distributed systems, [Batch Processing in Distributed Systems](/en/ch11#sec_batch_distributed)
  - join and group by, [JOIN and GROUP BY](/en/ch11#sec_batch_join)-[JOIN and GROUP BY](/en/ch11#sec_batch_join)
  - limitations, [Batch Processing](/en/ch11#ch_batch)
  - log-based messaging and, [Replaying old messages](/en/ch12#sec_stream_replay)
  - maintaining derived state, [Maintaining derived state](/en/ch13#id446)
  - measuring performance, [Batch Processing](/en/ch11#ch_batch)
  - models of, [Batch Processing Models](/en/ch11#id431)
  - resource allocation, [Resource Allocation](/en/ch11#id279)-[Resource Allocation](/en/ch11#id279)
  - resource managers, [Distributed Job Orchestration](/en/ch11#id278)
  - schedulers, [Distributed Job Orchestration](/en/ch11#id278)
  - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)-[Serving Derived Data](/en/ch11#sec_batch_serving_derived)
  - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)-[Shuffling Data](/en/ch11#sec_shuffle)
  - task execution, [Distributed Job Orchestration](/en/ch11#id278)
  - use cases, [Batch Use Cases](/en/ch11#sec_batch_output)-[Serving Derived Data](/en/ch11#sec_batch_serving_derived)
  - using Unix tools (example), [Batch Processing with Unix Tools](/en/ch11#sec_batch_unix)-[Sorting Versus In-memory Aggregation](/en/ch11#id275)
- batch processing frameworks
  - comparison to operating systems, [Batch Processing in Distributed Systems](/en/ch11#sec_batch_distributed)
- Beam (dataflow library), [Unifying batch and stream processing](/en/ch13#id338)
- BERT (language model), [Vector Embeddings](/en/ch4#id92)
- bias, [Bias and Discrimination](/en/ch14#id370)
- bidirectional replication (see multi-leader replication)
- big ball of mud, [Simplicity: Managing Complexity](/en/ch2#id38)
- big data
  - versus data minimization, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- BigQuery (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Batch Processing](/en/ch11#ch_batch)
  - DataFrames, [Query languages](/en/ch11#sec_batch_query_lanauges)
  - sharding and clustering, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
  - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)
  - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- Bigtable (database)
  - sharding scheme, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
  - storage layout, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
  - tablets (sharding), [Sharding](/en/ch7#ch_sharding)
  - wide-column data model, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Column Compression](/en/ch4#sec_storage_column_compression)
- binary data encodings, [Binary encoding](/en/ch5#binary-encoding)-[The Merits of Schemas](/en/ch5#sec_encoding_schemas)
  - Avro, [Avro](/en/ch5#sec_encoding_avro)-[Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
  - MessagePack, [Binary encoding](/en/ch5#binary-encoding)-[Binary encoding](/en/ch5#binary-encoding)
  - Protocol Buffers, [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- binary encoding
  - based on schemas, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
  - by network drivers, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- binary strings, lack of support in JSON and XML, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
- Bitcoin (cryptocurrency), [Tools for auditable data systems](/en/ch13#id366)
  - Byzantine fault tolerance, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
  - concurrency bugs in exchanges, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)
- bitmap indexes, [Column Compression](/en/ch4#sec_storage_column_compression)
- BitTorrent uTP protocol, [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- Bkd-trees (indexes), [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- blameless postmortems, [Humans and Reliability](/en/ch2#id31)
- Blazegraph (database), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
  - SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- blob storage (see object storage)
- block (file system), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- block device (disk), [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- blockchains, [Summary](/en/ch3#summary)
  - Byzantine fault tolerance, [Byzantine Faults](/en/ch9#sec_distributed_byzantine), [Consensus](/en/ch10#sec_consistency_consensus), [Tools for auditable data systems](/en/ch13#id366)
- blocking atomic commit, [Three-phase commit](/en/ch8#three-phase-commit)
- Bloom filter (algorithm), [Bloom filters](/en/ch4#bloom-filters), [Read performance](/en/ch4#read-performance), [Stream analytics](/en/ch12#id318)
- BookKeeper (replicated log), [Allocating work to nodes](/en/ch10#allocating-work-to-nodes)
- bounded datasets, [Stream Processing](/en/ch12#ch_stream), [Glossary](/en/glossary)
  - (see also batch processing)
- bounded delays, [Glossary](/en/glossary)
  - in networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
  - process pauses, [Response time guarantees](/en/ch9#sec_distributed_clocks_realtime)
- broadcast
  - total order broadcast (see shared logs)
- brokerless messaging, [Direct messaging from producers to consumers](/en/ch12#id296)
- Brubeck (metrics aggregator), [Direct messaging from producers to consumers](/en/ch12#id296)
- BTM (transaction coordinator), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
- Buf
  - Bufstream (messaging), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- Bufstream (messaging), [Disk space usage](/en/ch12#sec_stream_disk_usage)
- build or buy, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)
- bursty network traffic patterns, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- business analyst, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- business data processing, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
- business intelligence, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)-[Data Warehousing](/en/ch1#sec_introduction_dwh)
- Business Process Execution Language (BPEL), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- Business Process Model and Notation (BPMN), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
  - example, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- byte sequence, encoding data in, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- Byzantine faults, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)-[Weak forms of lying](/en/ch9#weak-forms-of-lying), [System Model and Reality](/en/ch9#sec_distributed_system_model), [Glossary](/en/glossary)
  - Byzantine fault-tolerant systems, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
  - Byzantine Generals Problem, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
  - consensus algorithms and, [Consensus](/en/ch10#sec_consistency_consensus), [Tools for auditable data systems](/en/ch13#id366)

### C

- caches, [Keeping everything in memory](/en/ch4#sec_storage_inmemory), [Glossary](/en/glossary)
  - and materialized views, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
  - as derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
  - in CPUs, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized), [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
  - invalidation and maintenance, [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
  - linearizability, [Linearizability](/en/ch10#sec_consistency_linearizability)
  - local disks in the cloud, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- calendar sync, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- California Consumer Privacy Act (CCPA), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- Camunda (workflow engine), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- canonical version (of data), [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
- CAP theorem, [The CAP theorem](/en/ch10#the-cap-theorem)-[The CAP theorem](/en/ch10#the-cap-theorem), [Glossary](/en/glossary)
- capacity planning, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- Cap'n Proto (data format), [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- carbon emissions, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- cascading aborts, [No dirty reads](/en/ch8#no-dirty-reads)
- cascading failures, [Software faults](/en/ch2#software-faults), [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations), [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
- Cassandra (database)
  - change data capture, [Implementing change data capture](/en/ch12#id307), [API support for change streams](/en/ch12#sec_stream_change_api)
  - compaction strategy, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
  - consistency level ANY, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
  - hash-range sharding, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash), [Sharding by hash range](/en/ch7#sharding-by-hash-range)
  - last-write-wins conflict resolution, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)
  - leaderless replication, [Leaderless Replication](/en/ch6#sec_replication_leaderless)
  - lightweight transactions, [Single-object writes](/en/ch8#sec_transactions_single_object)
  - linearizability, lack of, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
  - log-structured storage, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
  - multi-region support, [Multi-region operation](/en/ch6#multi-region-operation)
  - secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
  - use of clocks, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations), [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
  - vnodes (sharding), [Sharding](/en/ch7#ch_sharding)
- cat (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
- catalog, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- causal context, [Version vectors](/en/ch6#version-vectors)
  - (see also causal dependencies)
- causal dependencies, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)-[Version vectors](/en/ch6#version-vectors)
  - capturing, [Version vectors](/en/ch6#version-vectors), [Ordering events to capture causality](/en/ch13#sec_future_capture_causality), [Reads are events too](/en/ch13#sec_future_read_events)
    - by total ordering, [The limits of total ordering](/en/ch13#id335)
    - in transactions, [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)
    - sending message to friends (example), [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
- causality, [Glossary](/en/glossary)
  - causal ordering
    - total order consistent with, [Logical Clocks](/en/ch10#sec_consistency_timestamps)
  - consistency with, [Logical Clocks](/en/ch10#sec_consistency_timestamps)-[Enforcing constraints using logical clocks](/en/ch10#enforcing-constraints-using-logical-clocks)
  - happens-before relation, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)
  - in serializable transactions, [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)-[Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
  - mismatch with clocks, [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
  - ordering events to capture, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
  - violations of, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix), [Problems with different topologies](/en/ch6#problems-with-different-topologies), [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
  - with synchronized clocks, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- cell-based architecture, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- CEP (see complex event processing)
- CephFS (distributed filesystem), [Batch Processing](/en/ch11#ch_batch), [Object Stores](/en/ch11#id277)
- certificate transparency, [Tools for auditable data systems](/en/ch13#id366)
- cgroups, [Distributed Job Orchestration](/en/ch11#id278)
- change data capture, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication), [Change Data Capture](/en/ch12#sec_stream_cdc)
  - API support for change streams, [API support for change streams](/en/ch12#sec_stream_change_api)
  - comparison to event sourcing, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
  - implementing, [Implementing change data capture](/en/ch12#id307)
  - initial snapshot, [Initial snapshot](/en/ch12#sec_stream_cdc_snapshot)
  - log compaction, [Log compaction](/en/ch12#sec_stream_log_compaction)
- changelogs, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)
  - change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)
  - for operator state, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
  - in stream joins, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
  - log compaction, [Log compaction](/en/ch12#sec_stream_log_compaction)
  - maintaining derived state, [Databases and Streams](/en/ch12#sec_stream_databases)
- chaos engineering, [Fault Tolerance](/en/ch2#id27), [Fault injection](/en/ch9#sec_fault_injection)
- checkpointing
  - in high-performance computing, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
  - in stream processors, [Microbatching and checkpointing](/en/ch12#id329)
- circuit breaker (limiting retries), [Describing Performance](/en/ch2#sec_introduction_percentiles)
- circuit-switched networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
- circular buffers, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- circular replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)
- Citus (database)
  - hash sharding, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
- ClickHouse (database), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
  - incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- clickstream data, analysis of, [JOIN and GROUP BY](/en/ch11#sec_batch_join)
- clients
  - calling services, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)
  - offline-capable, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients), [Stateful, offline-capable clients](/en/ch13#id347)
  - pushing state changes to, [Pushing state changes to clients](/en/ch13#id348)
  - request routing, [Request Routing](/en/ch7#sec_sharding_routing)
- ClockBound (time sync), [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval)
  - use in YugabyteDB, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- clocks, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)-[Limiting the impact of garbage
collection](/en/ch9#sec_distributed_gc_impact) - atomic clocks, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - confidence interval, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval)-[Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - for global snapshots, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - hybrid logical clocks, [Hybrid logical clocks](/en/ch10#hybrid-logical-clocks) - logical (see logical clocks) - skew, [Last write wins (discarding concurrent writes)](/en/ch6#sec_replication_lww), [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations), [Relying on Synchronized Clocks](/en/ch9#sec_distributed_clocks_relying)-[Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable) - slewing, [Monotonic clocks](/en/ch9#monotonic-clocks) - synchronization and accuracy, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)-[Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy) - synchronization using GPS, [Unreliable Clocks](/en/ch9#sec_distributed_clocks), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy), [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - time-of-day versus monotonic clocks, [Monotonic Versus Time-of-Day Clocks](/en/ch9#sec_distributed_monotonic_timeofday) - timestamping events, [Whose clock are you using, anyway?](/en/ch12#id438) - cloud services, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)-[Cloud Computing Versus Supercomputing](/en/ch1#id17) - availability zones, 
[Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy), [Reading Your Own Writes](/en/ch6#sec_replication_ryw) - data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - need for service discovery, [Service discovery](/en/ch10#service-discovery) - network glitches, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults) - pros and cons, [Pros and Cons of Cloud Services](/en/ch1#sec_introduction_cloud_tradeoffs)-[Pros and Cons of Cloud Services](/en/ch1#sec_introduction_cloud_tradeoffs) - quotas, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations) - regions (see regions (geographic distribution)) - serverless, [Microservices and Serverless](/en/ch1#sec_introduction_microservices) - shared resources, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - versus supercomputing, [Cloud Computing Versus Supercomputing](/en/ch1#id17) - cloud-native, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)-[Operations in the Cloud Era](/en/ch1#sec_introduction_operations) - Cloudflare - R2 (see R2 (object storage)) - clustered indexes, [Storing values within the index](/en/ch4#sec_storage_index_heap) - clustering (record ordering), [Sharding by hash range](/en/ch7#sharding-by-hash-range) - CockroachDB (database) - consensus-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) - consistency model, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition) - key-range sharding, [Sharding](/en/ch7#ch_sharding), [Sharding by Key Range](/en/ch7#sec_sharding_key_range) - serializable transactions, [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi) - sharded secondary indexes, [Global Secondary Indexes](/en/ch7#id167) - transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal) - 
use of model-checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages) - code generation - for query execution, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - with Protocol Buffers, [Protocol Buffers](/en/ch5#sec_encoding_protobuf) - collaborative editing, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps) - column families (Bigtable), [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Column Compression](/en/ch4#sec_storage_column_compression) - column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)-[Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - column compression, [Column Compression](/en/ch4#sec_storage_column_compression) - Parquet, [Column-Oriented Storage](/en/ch4#sec_storage_column), [Archival storage](/en/ch5#archival-storage) - sort order in, [Sort Order in Column Storage](/en/ch4#sort-order-in-column-storage)-[Sort Order in Column Storage](/en/ch4#sort-order-in-column-storage) - vectorized processing, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - versus wide-column model, [Column Compression](/en/ch4#sec_storage_column_compression) - writing to, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage) - comma-separated values (see CSV) - command query responsibility segregation (CQRS), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)-[Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views) - commands (event sourcing), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - commits (transactions), [Transactions](/en/ch8#ch_transactions) - atomic commit, [Distributed Transactions](/en/ch8#sec_transactions_distributed)-[Exactly-once message processing 
revisited](/en/ch8#exactly-once-message-processing-revisited) - (see also atomicity; transactions) - read committed isolation, [Read Committed](/en/ch8#sec_transactions_read_committed) - three-phase commit (3PC), [Three-phase commit](/en/ch8#three-phase-commit) - two-phase commit (2PC), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)-[Coordinator failure](/en/ch8#coordinator-failure) - commutative operations, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication) - compaction - of changelogs, [Log compaction](/en/ch12#sec_stream_log_compaction) - (see also log compaction) - for stream operator state, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance) - of log-structured storage, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - issues with, [Read performance](/en/ch4#read-performance) - size-tiered and leveled approaches, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [Disk space usage](/en/ch4#disk-space-usage) - compare-and-set (CAS), [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set), [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition) - implementing locks, [Coordination Services](/en/ch10#sec_consistency_coordination) - implementing uniqueness constraints, [Constraints and uniqueness guarantees](/en/ch10#sec_consistency_uniqueness) - on object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - relation to consensus, [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable), [Consensus](/en/ch10#sec_consistency_consensus), [Compare-and-set as consensus](/en/ch10#compare-and-set-as-consensus) - relation to fencing tokens, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens) - relation to transactions, [Single-object writes](/en/ch8#sec_transactions_single_object) - compatibility, [Encoding and Evolution](/en/ch5#ch_encoding), [Modes 
of Dataflow](/en/ch5#sec_encoding_dataflow) - calling services, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc) - properties of encoding formats, [Summary](/en/ch5#summary) - using databases, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)-[Archival storage](/en/ch5#archival-storage) - compensating transactions, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros), [Loosely interpreted constraints](/en/ch13#id362) - compilation, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - complex event processing (CEP), [Complex event processing](/en/ch12#id317) - complexity - distilling in theoretical models, [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world) - essential and accidental, [Simplicity: Managing Complexity](/en/ch2#id38) - hiding using abstraction, [Data Models and Query Languages](/en/ch3#ch_datamodels) - managing, [Simplicity: Managing Complexity](/en/ch2#id38) - composing data systems (see unbundling databases) - compression - in SSTables, [The SSTable file format](/en/ch4#the-sstable-file-format) - compute-intensive applications, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs) - computer games, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - concatenated indexes, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional) - in hash-sharded systems, [Sharding by hash range](/en/ch7#sharding-by-hash-range) - concurrency - actor programming model, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks), [Event-Driven Architectures and RPC](/en/ch12#sec_stream_actors_drpc) - (see also event-driven architecture) - bugs from weak transaction isolation, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels) - conflict resolution, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts)-[Types of 
conflict](/en/ch6#sec_replication_write_conflicts) - definition, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts) - detecting concurrent writes, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)-[Version vectors](/en/ch6#version-vectors) - dual writes, problems with, [Keeping Systems in Sync](/en/ch12#sec_stream_sync) - happens-before relation, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before) - in replicated systems, [Problems with Replication Lag](/en/ch6#sec_replication_lag)-[Version vectors](/en/ch6#version-vectors), [Linearizability](/en/ch10#sec_consistency_linearizability)-[Linearizability and network delays](/en/ch10#linearizability-and-network-delays) - lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update) - multi-version concurrency control (MVCC), [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control) - ordering of operations, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition) - reducing, through event logs, [Concurrency control](/en/ch12#sec_stream_concurrency), [Dataflow: Interplay between state changes and application code](/en/ch13#id450) - time and relativity, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before) - transaction isolation, [Isolation](/en/ch8#sec_transactions_acid_isolation) - write skew (transaction isolation), [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts) - conditional write, [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set) - in transactions, [Single-object writes](/en/ch8#sec_transactions_single_object) - on object 
storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - conference management system (example), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - conflict-free replicated datatypes (CRDTs), [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts) - for leaderless replication, [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship) - preventing lost updates, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication) - conflicts - avoidance, [Conflict avoidance](/en/ch6#conflict-avoidance) - causal dependencies, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before) - conflict detection - in distributed transactions, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions) - in log-based systems, [Uniqueness constraints require consensus](/en/ch13#id452) - in serializable snapshot isolation (SSI), [Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads) - in two-phase commit, [A system of promises](/en/ch8#a-system-of-promises) - conflict resolution - by aborting transactions, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control) - by apologizing, [Loosely interpreted constraints](/en/ch13#id362) - last write wins (LWW), [Timestamps for ordering events](/en/ch9#sec_distributed_lww) - using atomic operations, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication) - determining what is a conflict, [Types of conflict](/en/ch6#sec_replication_write_conflicts), [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log) - in leaderless replication, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent) - lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication) - materializing, 
[Materializing conflicts](/en/ch8#materializing-conflicts) - resolution, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts)-[Types of conflict](/en/ch6#sec_replication_write_conflicts) - automatic, [Automatic conflict resolution](/en/ch6#automatic-conflict-resolution) - in leaderless systems, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent) - last write wins (LWW), [Last write wins (discarding concurrent writes)](/en/ch6#sec_replication_lww) - using custom logic, [Manual conflict resolution](/en/ch6#manual-conflict-resolution), [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship) - siblings, [Manual conflict resolution](/en/ch6#manual-conflict-resolution), [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship) - merging, [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship) - write skew (transaction isolation), [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts) - Confluent - Freight (messaging), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Disk space usage](/en/ch12#sec_stream_disk_usage) - schema registry, [JSON Schema](/en/ch5#json-schema), [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema) - congestion (networks) - avoidance, [The Limitations of TCP](/en/ch9#sec_distributed_tcp) - limiting accuracy of clocks, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval) - queueing delays, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - consensus, [Consensus](/en/ch10#sec_consistency_consensus)-[Summary](/en/ch10#summary), [Glossary](/en/glossary) - algorithms, [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order) - consensus numbers, [Fetch-and-add as 
consensus](/en/ch10#fetch-and-add-as-consensus) - coordination services, [Coordination Services](/en/ch10#sec_consistency_coordination)-[Service discovery](/en/ch10#service-discovery) - cost of, [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus) - impossibility of, [Consensus](/en/ch10#sec_consistency_consensus) - preventing split brain, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus) - reconfiguration, [Subtleties of consensus](/en/ch10#subtleties-of-consensus) - relation to atomic commitment, [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus) - relation to compare-and-set (CAS), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable), [Compare-and-set as consensus](/en/ch10#compare-and-set-as-consensus) - relation to fetch-and-add, [Fetch-and-add as consensus](/en/ch10#fetch-and-add-as-consensus) - relation to replication, [Using shared logs](/en/ch10#sec_consistency_smr) - relation to shared logs, [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs) - relation to uniqueness constraints, [Uniqueness constraints require consensus](/en/ch13#id452) - safety and liveness properties, [Single-value consensus](/en/ch10#single-value-consensus) - single-value consensus, [Single-value consensus](/en/ch10#single-value-consensus) - consent (GDPR), [Consent and Freedom of Choice](/en/ch14#id375) - consistency, [Consistency](/en/ch8#sec_transactions_acid_consistency), [Timeliness and Integrity](/en/ch13#sec_future_integrity) - across different databases, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views), [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions) - causal, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix), [Problems with different 
topologies](/en/ch6#problems-with-different-topologies), [Ordering events to capture causality](/en/ch13#sec_future_capture_causality) - consistent prefix reads, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix)-[Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix) - consistent snapshots, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)-[Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner), [Initial snapshot](/en/ch12#sec_stream_cdc_snapshot), [Creating an index](/en/ch13#id340) - (see also snapshots) - crash recovery, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal) - enforcing constraints (see constraints) - eventual, [Problems with Replication Lag](/en/ch6#sec_replication_lag) - (see also eventual consistency) - in ACID transactions, [Consistency](/en/ch8#sec_transactions_acid_consistency), [Maintaining integrity in the face of software bugs](/en/ch13#id455) - in CAP theorem, [The CAP theorem](/en/ch10#the-cap-theorem) - in leader election, [Subtleties of consensus](/en/ch10#subtleties-of-consensus) - in microservices, [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems) - linearizability, [Solutions for Replication Lag](/en/ch6#id131), [Linearizability](/en/ch10#sec_consistency_linearizability)-[Linearizability and network delays](/en/ch10#linearizability-and-network-delays) - meanings of, [Consistency](/en/ch8#sec_transactions_acid_consistency) - monotonic reads, [Monotonic Reads](/en/ch6#sec_replication_monotonic_reads)-[Monotonic Reads](/en/ch6#sec_replication_monotonic_reads) - of secondary indexes, [The need for multi-object transactions](/en/ch8#sec_transactions_need), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation), 
[Reasoning about dataflows](/en/ch13#id443), [Creating an index](/en/ch13#id340) - read-after-write, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)-[Reading Your Own Writes](/en/ch6#sec_replication_ryw) - in derived data systems, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions) - strong (see linearizability) - timeliness and integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity) - using quorums, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable) - consistent hashing, [Consistent hashing](/en/ch7#sec_sharding_consistent_hashing) - consistent prefix reads, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix) - constraints (databases), [Consistency](/en/ch8#sec_transactions_acid_consistency), [Characterizing write skew](/en/ch8#characterizing-write-skew) - asynchronously checked, [Loosely interpreted constraints](/en/ch13#id362) - coordination avoidance, [Coordination-avoiding data systems](/en/ch13#id454) - ensuring idempotence, [Uniquely identifying requests](/en/ch13#id355) - in log-based systems, [Enforcing Constraints](/en/ch13#sec_future_constraints)-[Multi-shard request processing](/en/ch13#id360) - across multiple shards, [Multi-shard request processing](/en/ch13#id360) - in two-phase commit, [Distributed Transactions](/en/ch8#sec_transactions_distributed), [A system of promises](/en/ch8#a-system-of-promises) - relation to consensus, [Uniqueness constraints require consensus](/en/ch13#id452) - requiring linearizability, [Constraints and uniqueness guarantees](/en/ch10#sec_consistency_uniqueness) - Consul (coordination service), [Coordination Services](/en/ch10#sec_consistency_coordination) - use for service discovery, [Service discovery](/en/ch10#service-discovery) - consumers (message streams), [Message brokers](/en/ch5#message-brokers), [Transmitting Event 
Streams](/en/ch12#sec_stream_transmit) - backpressure, [Messaging Systems](/en/ch12#sec_stream_messaging) - consumer groups, [Multiple consumers](/en/ch12#id298) - consumer offsets in logs, [Consumer offsets](/en/ch12#sec_stream_log_offsets) - failures, [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering), [Consumer offsets](/en/ch12#sec_stream_log_offsets) - fan-out, [Materializing and Updating Timelines](/en/ch2#sec_introduction_materializing), [Multiple consumers](/en/ch12#id298), [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging) - load balancing, [Multiple consumers](/en/ch12#id298), [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging) - not keeping up with producers, [Messaging Systems](/en/ch12#sec_stream_messaging), [Disk space usage](/en/ch12#sec_stream_disk_usage), [Making unbundling work](/en/ch13#sec_future_unbundling_favor) - content models (JSON Schema), [JSON Schema](/en/ch5#json-schema) - contention - between transactions, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts) - blocking threads, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses) - performance of optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control) - under two-phase locking, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking) - context switches, [Latency and Response Time](/en/ch2#id23), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses) - convergence (conflict resolution), [Automatic conflict resolution](/en/ch6#automatic-conflict-resolution)-[CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts) - coordination - avoidance, [Coordination-avoiding data systems](/en/ch13#id454) - cross-datacenter, [The limits of total ordering](/en/ch13#id335) - cross-region, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - cross-shard ordering, 
[Sharding](/en/ch8#sharding), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner), [Using shared logs](/en/ch10#sec_consistency_smr), [Multi-shard request processing](/en/ch13#id360) - routing requests to shards, [Request Routing](/en/ch7#sec_sharding_routing) - services, [Locking and leader election](/en/ch10#locking-and-leader-election), [Coordination Services](/en/ch10#sec_consistency_coordination)-[Service discovery](/en/ch10#service-discovery) - coordinator (in 2PC), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc) - failure, [Coordinator failure](/en/ch8#coordinator-failure) - in XA transactions, [XA transactions](/en/ch8#xa-transactions)-[Problems with XA transactions](/en/ch8#problems-with-xa-transactions) - recovery, [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure) - copy-on-write (B-trees), [B-tree variants](/en/ch4#b-tree-variants), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation) - CORBA (Common Object Request Broker Architecture), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc) - coronal mass ejection (see solar storm) - correctness - auditability, [Trust, but Verify](/en/ch13#sec_future_verification)-[Tools for auditable data systems](/en/ch13#id366) - Byzantine fault tolerance, [Byzantine Faults](/en/ch9#sec_distributed_byzantine) - dealing with partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure) - in log-based systems, [Enforcing Constraints](/en/ch13#sec_future_constraints)-[Multi-shard request processing](/en/ch13#id360) - of algorithm within system model, [Defining the correctness of an algorithm](/en/ch9#defining-the-correctness-of-an-algorithm) - of derived data, [Designing for auditability](/en/ch13#id365) - of immutable data, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros) - of personal data, [Responsibility and Accountability](/en/ch14#id371), [Privacy and Use of 
Data](/en/ch14#id457) - of time, [Problems with different topologies](/en/ch6#problems-with-different-topologies), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)-[Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - of transactions, [Consistency](/en/ch8#sec_transactions_acid_consistency), [Aiming for Correctness](/en/ch13#sec_future_correctness), [Maintaining integrity in the face of software bugs](/en/ch13#id455) - timeliness and integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)-[Coordination-avoiding data systems](/en/ch13#id454) - corruption of data - detecting, [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [Don't just blindly trust what they promise](/en/ch13#id364)-[Tools for auditable data systems](/en/ch13#id366) - due to pathological memory access, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults) - due to radiation, [Byzantine Faults](/en/ch9#sec_distributed_byzantine) - due to split brain, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing) - due to weak transaction isolation, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels) - integrity as absence of, [Timeliness and Integrity](/en/ch13#sec_future_integrity) - network packets, [Weak forms of lying](/en/ch9#weak-forms-of-lying) - on disks, [Durability](/en/ch8#durability) - preventing using write-ahead logs, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal) - recovering from, [Batch Processing](/en/ch11#ch_batch), [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros) - cosine similarity (semantic search), [Vector Embeddings](/en/ch4#id92) - Couchbase (database) - document data model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history) - durability, [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - hash sharding, [Fixed number of 
shards](/en/ch7#fixed-number-of-shards) - join support, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases) - rebalancing, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations) - vBuckets (sharding), [Sharding](/en/ch7#ch_sharding) - CouchDB (database) - as sync engine, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - B-tree storage, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation) - conflict resolution, [Manual conflict resolution](/en/ch6#manual-conflict-resolution) - coupling (loose and tight), [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability) - covering indexes, [Storing values within the index](/en/ch4#sec_storage_index_heap) - CozoDB (database), [Datalog: Recursive Relational Queries](/en/ch3#id62) - CPUs - cache coherence and memory barriers, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays) - caching and pipelining, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - computing the wrong result, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults) - SIMD instructions, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - crash-stop and crash-recovery faults, [System Model and Reality](/en/ch9#sec_distributed_system_model) - CRDTs (see conflict-free replicated datatypes) - CREATE INDEX statement (SQL), [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn), [Creating an index](/en/ch13#id340) - credit rating agencies, [Responsibility and Accountability](/en/ch14#id371) - crypto-shredding, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations) - cryptocurrencies, [Summary](/en/ch3#summary) - cryptography - defense against attackers, [Byzantine Faults](/en/ch9#sec_distributed_byzantine) - end-to-end 
    encryption and authentication, [The end-to-end argument](/en/ch13#sec_future_e2e_argument)
- CSV (comma-separated values), [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp), [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
- Curator (ZooKeeper recipes), [Locking and leader election](/en/ch10#locking-and-leader-election), [Allocating work to nodes](/en/ch10#allocating-work-to-nodes)
- Cypher (query language), [The Cypher Query Language](/en/ch3#id57)
  - comparison to SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)

### D

- Daft (processing framework)
  - DataFrames, [DataFrames](/en/ch11#id287)
  - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)
- Dagster (workflow scheduler), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows), [Batch Processing](/en/ch11#ch_batch), [Scheduling Workflows](/en/ch11#sec_batch_workflows)
  - cloud data warehouse integration, [Query languages](/en/ch11#sec_batch_query_lanauges)
- dashboard (business intelligence), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
- Dask (processing framework), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- data catalog, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- data connectors, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- data contracts, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
  - change data capture, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
- data corruption (see corruption of data)
- data cubes, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
- data engineering, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
- data fabric, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- data formats (see encoding)
- data infrastructure, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- data integration, [Data
  Integration](/en/ch13#sec_future_integration)-[Unifying batch and stream processing](/en/ch13#id338), [Summary](/en/ch13#id367)
  - batch and stream processing, [Batch and Stream Processing](/en/ch13#sec_future_batch_streaming)-[Unifying batch and stream processing](/en/ch13#id338)
    - maintaining derived state, [Maintaining derived state](/en/ch13#id446)
    - reprocessing data, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
    - unifying, [Unifying batch and stream processing](/en/ch13#id338)
  - by unbundling databases, [Unbundling Databases](/en/ch13#sec_future_unbundling)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
    - comparison to federated databases, [The meta-database of everything](/en/ch13#id341)
  - combining tools by deriving data, [Combining Specialized Tools by Deriving Data](/en/ch13#id442)-[Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
    - derived data versus distributed transactions, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions)
    - limits of total ordering, [The limits of total ordering](/en/ch13#id335)
    - ordering events to capture causality, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
    - reasoning about dataflows, [Reasoning about dataflows](/en/ch13#id443)
  - need for, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
  - using batch processing, [Batch Processing](/en/ch11#ch_batch), [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- data lake, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- data lakehouse, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Analytics](/en/ch11#sec_batch_olap)
- data locality (see locality)
- data mesh, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- data minimization, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Legislation and
  Self-Regulation](/en/ch14#sec_future_legislation)
- data models, [Data Models and Query Languages](/en/ch3#ch_datamodels)-[Summary](/en/ch3#summary)
  - DataFrames and arrays, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
  - graph-like models, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)-[GraphQL](/en/ch3#id63)
    - Datalog language, [Datalog: Recursive Relational Queries](/en/ch3#id62)-[Datalog: Recursive Relational Queries](/en/ch3#id62)
    - property graphs, [Property Graphs](/en/ch3#id56)
    - RDF and triple-stores, [Triple-Stores and SPARQL](/en/ch3#id59)-[The SPARQL query language](/en/ch3#the-sparql-query-language)
  - relational model versus document model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
  - supporting multiple, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- data pipelines, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- data products, [Beyond the data lake](/en/ch1#beyond-the-data-lake)
- data protection regulations (see GDPR)
- data residence laws, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- data science, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- data silo, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- data systems
  - correctness, constraints, and integrity, [Aiming for Correctness](/en/ch13#sec_future_correctness)-[Tools for auditable data systems](/en/ch13#id366)
  - data integration, [Data Integration](/en/ch13#sec_future_integration)-[Unifying batch and stream processing](/en/ch13#id338)
  - goals
    for using, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
  - heterogeneous, keeping in sync, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
  - maintainability, [Maintainability](/en/ch2#sec_introduction_maintainability)-[Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability)
  - possible faults in, [Transactions](/en/ch8#ch_transactions)
  - reliability, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)-[Humans and Reliability](/en/ch2#id31)
    - hardware faults, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
    - human errors, [Humans and Reliability](/en/ch2#id31)
    - importance of, [Humans and Reliability](/en/ch2#id31)
    - software faults, [Software faults](/en/ch2#software-faults)
  - scalability, [Scalability](/en/ch2#sec_introduction_scalability)-[Principles for Scalability](/en/ch2#id35)
  - unbundling databases, [Unbundling Databases](/en/ch13#sec_future_unbundling)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
  - unreliable clocks, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
- data warehousing, [Data Warehousing](/en/ch1#sec_introduction_dwh), [Glossary](/en/glossary)
  - cloud-based solutions, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
  - ETL (extract-transform-load), [Data Warehousing](/en/ch1#sec_introduction_dwh), [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
  - for batch processing, [Batch Processing](/en/ch11#ch_batch)
  - keeping data systems in sync, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
  - schema design, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
  - sharding and clustering, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
  - slowly changing dimension (SCD), [Time-dependence of joins](/en/ch12#sec_stream_join_time)
- data-intensive applications, [Trade-offs in Data Systems
  Architecture](/en/ch1#ch_tradeoffs)
- database administrator, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- database-internal distributed transactions, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal), [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)
- databases
  - archival storage, [Archival storage](/en/ch5#archival-storage)
  - comparison of message brokers to, [Message brokers compared to databases](/en/ch12#id297)
  - dataflow through, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)
  - end-to-end argument for, [The end-to-end argument](/en/ch13#sec_future_e2e_argument)-[Applying end-to-end thinking in data systems](/en/ch13#id357)
    - checking integrity, [The end-to-end argument again](/en/ch13#id456)
  - relation to event streams, [Databases and Streams](/en/ch12#sec_stream_databases)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
    - (see also changelogs)
    - API support for change streams, [API support for change streams](/en/ch12#sec_stream_change_api), [Separation of application code and state](/en/ch13#id344)
    - change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)-[API support for change streams](/en/ch12#sec_stream_change_api)
    - event sourcing, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
    - keeping systems in sync, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)-[Keeping Systems in Sync](/en/ch12#sec_stream_sync)
    - philosophy of immutable events, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
  - unbundling, [Unbundling Databases](/en/ch13#sec_future_unbundling)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
    - composing data storage technologies, [Composing Data Storage
      Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
    - designing applications around dataflow, [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)-[Stream processors and services](/en/ch13#id345)
    - observing derived state, [Observing Derived State](/en/ch13#sec_future_observing)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
- datacenters
  - failures of, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
  - geographically distributed (see regions (geographic distribution))
  - multitenancy and shared resources, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
  - network architecture, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
  - network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
- dataflow, [Modes of Dataflow](/en/ch5#sec_encoding_dataflow)-[Distributed actor frameworks](/en/ch5#distributed-actor-frameworks), [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)-[Stream processors and services](/en/ch13#id345)
  - correctness of dataflow systems, [Correctness of dataflow systems](/en/ch13#id453)
  - dataflow engines, [Dataflow Engines](/en/ch11#sec_batch_dataflow)
    - comparison to stream processing, [Processing Streams](/en/ch12#sec_stream_processing)
    - DataFrames, [DataFrames](/en/ch11#id287)
    - support in batch processing frameworks, [Batch Processing](/en/ch11#ch_batch)
  - event-driven, [Event-Driven Architectures](/en/ch5#sec_encoding_dataflow_msg)-[Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
  - reasoning about, [Reasoning about dataflows](/en/ch13#id443)
  - through databases, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)
  - through services, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
  - workflow engines (see workflow engines)
- DataFrames,
  [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
  - implementation, [DataFrames](/en/ch11#id287)
  - in batch processing, [DataFrames](/en/ch11#id287)
  - in notebooks, [Machine Learning](/en/ch11#id290)
  - support in batch processing frameworks, [Batch Processing](/en/ch11#ch_batch)
- DataFusion (query engine), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- Datalog (query language), [Datalog: Recursive Relational Queries](/en/ch3#id62)-[Datalog: Recursive Relational Queries](/en/ch3#id62)
- Datastream (change data capture), [API support for change streams](/en/ch12#sec_stream_change_api)
- datatypes
  - binary strings in XML and JSON, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
  - conflict-free, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
  - in Avro encodings, [Avro](/en/ch5#sec_encoding_avro)
  - in Protocol Buffers, [Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
  - numbers in XML and JSON, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
- Datensparsamkeit, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- Datomic (database)
  - B-tree storage, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
  - data model, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph), [Triple-Stores and SPARQL](/en/ch3#id59)
  - Datalog query language, [Datalog: Recursive Relational Queries](/en/ch3#id62)
  - excision (deleting data), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
  - languages for transactions, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
  - serial execution of transactions, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
- Daylight Saving Time (DST), [Time-of-day clocks](/en/ch9#time-of-day-clocks)
- Db2 (database)
  - change data capture, [Implementing change data capture](/en/ch12#id307)
- DBA (database administrator), [Operations in the Cloud
  Era](/en/ch1#sec_introduction_operations)
- deadlocks, [Explicit locking](/en/ch8#explicit-locking)
  - detection, in distributed transaction, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
  - in two-phase locking (2PL), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- Debezium (change data capture), [Implementing change data capture](/en/ch12#id307)
  - Cassandra, [API support for change streams](/en/ch12#sec_stream_change_api)
  - for data integration, [Unbundled versus integrated systems](/en/ch13#id448)
- declarative languages, [Data Models and Query Languages](/en/ch3#ch_datamodels), [Glossary](/en/glossary)
  - and sync engines, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
  - Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62)
  - in document databases, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
  - recursive SQL queries, [Graph Queries in SQL](/en/ch3#id58)
  - SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- DeepSeek
  - 3FS (see 3FS)
- delays
  - bounded network delays, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
  - bounded process pauses, [Response time guarantees](/en/ch9#sec_distributed_clocks_realtime)
  - unbounded network delays, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
  - unbounded process pauses, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- deleting data, [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
  - in LSM storage, [Disk space usage](/en/ch4#disk-space-usage)
  - legal basis, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- Delta Lake (table format), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
  - sharding and clustering, [Sharding by hash
    range](/en/ch7#sharding-by-hash-range)
- demilitarized zone (networking), [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- denormalization (data representation), [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)-[Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many), [Glossary](/en/glossary)
  - in derived data systems, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
  - in event sourcing/CQRS, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
  - in social network case study, [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
  - materialized views, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
  - updating derived data, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object), [The need for multi-object transactions](/en/ch8#sec_transactions_need), [Combining Specialized Tools by Deriving Data](/en/ch13#id442)
  - versus normalization, [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
- derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Stream Processing](/en/ch12#ch_stream), [Glossary](/en/glossary)
  - batch processing, [Batch Processing](/en/ch11#ch_batch)
  - event sourcing and CQRS, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
  - from change data capture, [Implementing change data capture](/en/ch12#id307)
  - maintaining derived state through logs, [Databases and Streams](/en/ch12#sec_stream_databases)-[API support for change streams](/en/ch12#sec_stream_change_api), [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)-[Concurrency control](/en/ch12#sec_stream_concurrency)
  - observing, by subscribing to streams, [End-to-end event streams](/en/ch13#id349)
  - outputs of batch and stream processing, [Batch and Stream Processing](/en/ch13#sec_future_batch_streaming)
  - through application code, [Application code as a derivation function](/en/ch13#sec_future_dataflow_derivation)
  - versus distributed transactions, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions)
- design patterns, [Simplicity: Managing Complexity](/en/ch2#id38)
- deterministic operations, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs), [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure), [Glossary](/en/glossary)
  - and idempotence, [Idempotence](/en/ch12#sec_stream_idempotence), [Reasoning about dataflows](/en/ch13#id443)
  - computing derived data, [Maintaining derived state](/en/ch13#id446), [Correctness of dataflow systems](/en/ch13#id453), [Designing for auditability](/en/ch13#id365)
  - in event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
  - in state machine replication, [Using shared logs](/en/ch10#sec_consistency_smr), [Databases and Streams](/en/ch12#sec_stream_databases)
  - in statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication)
  - in testing, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
  - joins, [Time-dependence of joins](/en/ch12#sec_stream_join_time)
  - making code deterministic, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
  - overview, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- deterministic simulation testing (DST), [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- DevOps, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- dimension tables, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
- dimensional modeling (see star schemas)
- directed acyclic graphs (DAG)
  - workflows, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
    - (see also workflow engines)
- dirty reads (transaction isolation), [No dirty
  reads](/en/ch8#no-dirty-reads)
- dirty writes (transaction isolation), [No dirty writes](/en/ch8#sec_transactions_dirty_write)
- disaggregation
  - of storage and compute, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- Discord (group chat)
  - GraphQL example, [GraphQL](/en/ch3#id63)
- discrimination, [Bias and Discrimination](/en/ch14#id370)
- disks (see hard disks)
- distributed actor frameworks, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
- distributed filesystems, [Distributed Filesystems](/en/ch11#sec_batch_dfs)-[Distributed Filesystems](/en/ch11#sec_batch_dfs)
  - comparison to object storage, [Object Stores](/en/ch11#id277)
  - use by Flink, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- distributed ledgers, [Summary](/en/ch3#summary)
- distributed systems, [The Trouble with Distributed Systems](/en/ch9#ch_distributed)-[Summary](/en/ch9#summary), [Glossary](/en/glossary)
  - Byzantine faults, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)-[Weak forms of lying](/en/ch9#weak-forms-of-lying)
  - detecting network faults, [Detecting Faults](/en/ch9#id307)
  - faults and partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
  - formalization of consensus, [Single-value consensus](/en/ch10#single-value-consensus)
  - impossibility results, [The CAP theorem](/en/ch10#the-cap-theorem), [Consensus](/en/ch10#sec_consistency_consensus)
  - issues with failover, [Leader failure: Failover](/en/ch6#leader-failure-failover)
  - multi-region (see regions (geographic distribution))
  - network problems, [Unreliable Networks](/en/ch9#sec_distributed_networks)-[Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
  - problems with, [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems)
  - quorums, relying on, [The Majority Rules](/en/ch9#sec_distributed_majority)
  - reasons for using,
    [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Replication](/en/ch6#ch_replication)
  - synchronized clocks, relying on, [Relying on Synchronized Clocks](/en/ch9#sec_distributed_clocks_relying)-[Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
  - system models, [System Model and Reality](/en/ch9#sec_distributed_system_model)-[Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
  - use of clocks and time, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)
- distributed transactions (see transactions)
- Django (web framework), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- DMZ (demilitarized zone), [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- DNS (Domain Name System), [Request Routing](/en/ch7#sec_sharding_routing), [Service discovery](/en/ch10#service-discovery)
  - for load balancing, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
- Docker (container manager), [Separation of application code and state](/en/ch13#id344)
- document data model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
  - comparison to relational model, [When to Use Which Model](/en/ch3#sec_datamodels_document_summary)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
  - multi-object transactions, need for, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
  - sharded secondary indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)
  - versus relational model
    - convergence of models, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
    - data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
- document-partitioned indexes (see local secondary indexes)
- domain-driven design (DDD), [Simplicity: Managing Complexity](/en/ch2#id38), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- dotted version vectors, [Version vectors](/en/ch6#version-vectors)
- double-entry bookkeeping, [Summary](/en/ch3#summary)
- DRBD (Distributed Replicated Block Device), [Single-Leader Replication](/en/ch6#sec_replication_leader)
- drift (clocks), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- Druid (database), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Column-Oriented Storage](/en/ch4#sec_storage_column), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
  - handling writes, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
  - pre-aggregation, [Analytics](/en/ch11#sec_batch_olap)
  - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- Dryad (dataflow engine), [Dataflow Engines](/en/ch11#sec_batch_dataflow)
- dual writes, problems with, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
- DuckDB (database), [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems), [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
  - column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)
  - use for ETL, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- duplicates, suppression of, [Duplicate suppression](/en/ch13#id354)
  - (see also idempotence)
  - using a unique ID, [Uniquely identifying requests](/en/ch13#id355), [Multi-shard request processing](/en/ch13#id360)
- durability (transactions), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Durability](/en/ch8#durability), [Glossary](/en/glossary)
- durable execution, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
  - reliance on determinism, [Deterministic simulation
    testing](/en/ch9#deterministic-simulation-testing)
  - Restate (see Restate (workflow engine))
  - Temporal (see Temporal (workflow engine))
- durable functions (see workflow engines)
- duration (time), [Unreliable Clocks](/en/ch9#sec_distributed_clocks)
  - measurement with monotonic clocks, [Monotonic clocks](/en/ch9#monotonic-clocks)
- dynamically typed languages
  - analogy to schema-on-read, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- Dynamo (database), [Leaderless Replication](/en/ch6#sec_replication_leaderless)
- Dynamo-style databases (see leaderless replication)
- DynamoDB (database)
  - auto-scaling, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
  - hash-range sharding, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
  - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
  - sharded secondary indexes, [Global Secondary Indexes](/en/ch7#id167)

### E

- EBS (virtual block device), [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
  - compared to object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- ECC (see error-correcting codes)
- EDB Postgres Distributed (database), [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- edges (in graphs), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
  - property graph model, [Property Graphs](/en/ch3#id56)
- edit distance (full-text search), [Full-Text Search](/en/ch4#sec_storage_full_text)
- effectively-once semantics, [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance), [Exactly-once execution of an operation](/en/ch13#id353)
  - (see also exactly-once semantics)
  - preservation of integrity, [Correctness of dataflow systems](/en/ch13#id453)
- Elastic Compute Cloud (EC2)
  - spot instances, [Handling Faults](/en/ch11#id281)
- elasticity, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
  - cloud data
    warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Query languages](/en/ch11#sec_batch_query_lanauges)
- Elasticsearch (search server)
  - local secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
  - percolator (stream search), [Search on streams](/en/ch12#id320)
  - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
  - shard rebalancing, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
  - use of Lucene, [Full-Text Search](/en/ch4#sec_storage_full_text)
- Elm (programming language), [End-to-end event streams](/en/ch13#id349)
- ELT (extract-load-transform), [Data Warehousing](/en/ch1#sec_introduction_dwh)
  - relation to batch processing, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- embarrassingly parallel (algorithms)
  - ETL (see ETL (extract-transform-load))
  - MapReduce, [MapReduce](/en/ch11#sec_batch_mapreduce)
    - (see also MapReduce)
- embedded storage engines, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- embedding (vector), [Vector Embeddings](/en/ch4#id92)
- encodings (data formats), [Encoding and Evolution](/en/ch5#ch_encoding)-[The Merits of Schemas](/en/ch5#sec_encoding_schemas)
  - Avro, [Avro](/en/ch5#sec_encoding_avro)-[Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
  - binary variants of JSON and XML, [Binary encoding](/en/ch5#binary-encoding)
  - compatibility, [Encoding and Evolution](/en/ch5#ch_encoding)
    - calling services, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
    - using databases, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)-[Archival storage](/en/ch5#archival-storage)
  - defined, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
  - JSON, XML, and CSV, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
  - language-specific formats, [Language-Specific Formats](/en/ch5#id96)
  - merits of schemas, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
  - Protocol Buffers,
    [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
  - representations of data, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- end-to-end argument, [The end-to-end argument](/en/ch13#sec_future_e2e_argument)-[Applying end-to-end thinking in data systems](/en/ch13#id357)
  - checking integrity, [The end-to-end argument again](/en/ch13#id456)
  - publish/subscribe streams, [End-to-end event streams](/en/ch13#id349)
- enrichment (stream), [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
- Enterprise JavaBeans (EJB), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- enterprise software, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- entities (see vertices)
- ephemeral storage, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- epoch (consensus algorithms), [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- epoch (Unix timestamps), [Time-of-day clocks](/en/ch9#time-of-day-clocks)
- erasure coding (error correction), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- error handling
  - for network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
  - in transactions, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- error-correcting codes, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- Esper (CEP engine), [Complex event processing](/en/ch12#id317)
- essential complexity, [Simplicity: Managing Complexity](/en/ch2#id38)
- etcd (coordination service), [Coordination Services](/en/ch10#sec_consistency_coordination)-[Service discovery](/en/ch10#service-discovery)
  - generating fencing tokens, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens), [Coordination
    Services](/en/ch10#sec_consistency_coordination)
  - linearizable operations, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable), [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
  - locks and leader election, [Locking and leader election](/en/ch10#locking-and-leader-election)
  - use for service discovery, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery), [Service discovery](/en/ch10#service-discovery)
  - use for shard assignment, [Request Routing](/en/ch7#sec_sharding_routing)
  - use of Raft algorithm, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- Ethereum (blockchain), [Tools for auditable data systems](/en/ch13#id366)
- Ethernet (networks), [Cloud Computing Versus Supercomputing](/en/ch1#id17), [Unreliable Networks](/en/ch9#sec_distributed_networks), [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
  - packet checksums, [Weak forms of lying](/en/ch9#weak-forms-of-lying), [The end-to-end argument](/en/ch13#sec_future_e2e_argument)
- ethics, [Doing the Right Thing](/en/ch14)-[Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
  - code of ethics and professional practice, [Doing the Right Thing](/en/ch14)
  - legislation and self-regulation, [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
  - predictive analytics, [Predictive Analytics](/en/ch14#id369)-[Feedback Loops](/en/ch14#id372)
    - amplifying bias, [Bias and Discrimination](/en/ch14#id370)
    - feedback loops, [Feedback Loops](/en/ch14#id372)
  - privacy and tracking, [Privacy and Tracking](/en/ch14#id373)-[Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
    - consent and freedom of choice, [Consent and Freedom of Choice](/en/ch14#id375)
    - data as assets and power, [Data as Assets and Power](/en/ch14#id376)
    - meaning of privacy, [Privacy and Use of Data](/en/ch14#id457)
    - surveillance,
[Surveillance](/en/ch14#id374) - respect, dignity, and agency, [Legislation and Self-Regulation](/en/ch14#sec_future_legislation) - unintended consequences, [Doing the Right Thing](/en/ch14), [Feedback Loops](/en/ch14#id372) - ETL (extract-transform-load), [Data Warehousing](/en/ch1#sec_introduction_dwh), [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Glossary](/en/glossary) - relation to batch processing, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)-[Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage) - using batch processing, [Batch Processing](/en/ch11#ch_batch) - Euclidean distance (semantic search), [Vector Embeddings](/en/ch4#id92) - European Union - AI Act (see AI Act) - GDPR (see GDPR) - event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)-[Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - and change data capture, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing) - comparison to change data capture, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing) - immutability and auditability, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability), [Designing for auditability](/en/ch13#id365) - large, reliable data systems, [Uniquely identifying requests](/en/ch13#id355), [Correctness of dataflow systems](/en/ch13#id453) - reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - event streams (see streams) - event-driven architecture, [Event-Driven Architectures](/en/ch5#sec_encoding_dataflow_msg)-[Distributed actor frameworks](/en/ch5#distributed-actor-frameworks) - distributed actor frameworks, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks) - events, [Transmitting Event Streams](/en/ch12#sec_stream_transmit) - deciding on total order of, [The limits of total ordering](/en/ch13#id335) - deriving views from event log, [Deriving several views from the same event 
log](/en/ch12#sec_stream_deriving_views) - event time versus processing time, [Event time versus processing time](/en/ch12#id322), [Microbatching and checkpointing](/en/ch12#id329), [Unifying batch and stream processing](/en/ch13#id338) - immutable, advantages of, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros), [Designing for auditability](/en/ch13#id365) - ordering to capture causality, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality) - reads as, [Reads are events too](/en/ch13#sec_future_read_events) - stragglers, [Handling straggler events](/en/ch12#id323) - timestamp of, in stream processing, [Whose clock are you using, anyway?](/en/ch12#id438) - EventSource (browser API), [Pushing state changes to clients](/en/ch13#id348) - EventStoreDB (database), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - eventual consistency, [Replication](/en/ch6#ch_replication), [Problems with Replication Lag](/en/ch6#sec_replication_lag), [Safety and liveness](/en/ch9#sec_distributed_safety_liveness) - (see also conflicts) - and perpetual inconsistency, [Timeliness and Integrity](/en/ch13#sec_future_integrity) - strong eventual consistency, [Automatic conflict resolution](/en/ch6#automatic-conflict-resolution) - evidence - data used as, [Humans and Reliability](/en/ch2#id31) - evolvability, [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability), [Encoding and Evolution](/en/ch5#ch_encoding) - calling services, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc) - event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - graph-structured data, [Property Graphs](/en/ch3#id56) - of databases, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)-[Archival storage](/en/ch5#archival-storage), [Deriving several views from the same event 
log](/en/ch12#sec_stream_deriving_views), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing) - reprocessing data, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing), [Unifying batch and stream processing](/en/ch13#id338) - schema evolution in Avro, [The writer's schema and the reader's schema](/en/ch5#the-writers-schema-and-the-readers-schema) - schema evolution in Protocol Buffers, [Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution) - schema-on-read, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Encoding and Evolution](/en/ch5#ch_encoding), [The Merits of Schemas](/en/ch5#sec_encoding_schemas) - exactly-once semantics, [Exactly-once message processing](/en/ch8#sec_transactions_exactly_once), [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited), [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance), [Exactly-once execution of an operation](/en/ch13#id353) - parity with batch processors, [Unifying batch and stream processing](/en/ch13#id338) - preservation of integrity, [Correctness of dataflow systems](/en/ch13#id453) - using durable execution, [Durable execution](/en/ch5#durable-execution) - exclusive mode (locks), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking) - exponential backoff, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts) - ext4 (file system), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - eXtended Architecture transactions (see XA transactions) - extract-transform-load (see ETL) ### F - Facebook - Faiss (vector index), [Vector Embeddings](/en/ch4#id92) - React (user interface library), [End-to-end event streams](/en/ch13#id349) - social graphs, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph) - facts - fact table (star schema), [Stars and Snowflakes: Schemas for 
Analytics](/en/ch3#sec_datamodels_analytics) - in Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62) - in event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - fail-slow faults, [System Model and Reality](/en/ch9#sec_distributed_system_model) - fail-stop model, [System Model and Reality](/en/ch9#sec_distributed_system_model) - failover, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Glossary](/en/glossary) - (see also leader-based replication) - in leaderless replication, absence of, [Writing to the Database When a Node Is Down](/en/ch6#id287) - leader election, [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing), [Consensus](/en/ch10#sec_consistency_consensus), [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus) - potential problems, [Leader failure: Failover](/en/ch6#leader-failure-failover) - failures - amplification by distributed transactions, [Maintaining derived state](/en/ch13#id446) - failure detection, [Detecting Faults](/en/ch9#id307) - automatic rebalancing causing cascading failures, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations) - timeouts and unbounded delays, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing), [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - using a coordination service, [Coordination Services](/en/ch10#sec_consistency_coordination) - faults versus, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability) - partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure), [Summary](/en/ch9#summary) - Faiss (vector index), [Vector Embeddings](/en/ch4#id92) - false positive (Bloom filters), [Bloom filters](/en/ch4#bloom-filters) - fan-out (messaging systems), [Materializing and Updating Timelines](/en/ch2#sec_introduction_materializing), [Multiple consumers](/en/ch12#id298) - fault injection, [Fault 
Tolerance](/en/ch2#id27), [Network Faults in Practice](/en/ch9#sec_distributed_network_faults), [Fault injection](/en/ch9#sec_fault_injection) - fault isolation, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy) - fault tolerance, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)-[Humans and Reliability](/en/ch2#id31), [Glossary](/en/glossary) - formalization in consensus, [Single-value consensus](/en/ch10#single-value-consensus) - human fault tolerance, [Batch Processing](/en/ch11#ch_batch) - in batch processing, [Handling Faults](/en/ch11#id281) - in log-based systems, [Applying end-to-end thinking in data systems](/en/ch13#id357), [Timeliness and Integrity](/en/ch13#sec_future_integrity)-[Correctness of dataflow systems](/en/ch13#id453) - in stream processing, [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance)-[Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance) - atomic commit, [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit) - idempotence, [Idempotence](/en/ch12#sec_stream_idempotence) - maintaining derived state, [Maintaining derived state](/en/ch13#id446) - microbatching and checkpointing, [Microbatching and checkpointing](/en/ch12#id329) - rebuilding state after a failure, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance) - of distributed transactions, [XA transactions](/en/ch8#xa-transactions)-[Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited) - of leader-based and leaderless replication, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf) - transaction atomicity, [Atomicity](/en/ch8#sec_transactions_acid_atomicity), [Distributed Transactions](/en/ch8#sec_transactions_distributed)-[Exactly-once message processing](/en/ch8#sec_transactions_exactly_once) - faults - Byzantine faults, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)-[Weak forms of 
lying](/en/ch9#weak-forms-of-lying) - failures versus, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability) - handled by transactions, [Transactions](/en/ch8#ch_transactions) - handling in supercomputers and cloud computing, [Cloud Computing Versus Supercomputing](/en/ch1#id17) - hardware, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults) - in distributed systems, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure) - introducing deliberately (see fault injection) - network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)-[Detecting Faults](/en/ch9#id307) - asymmetric faults, [The Majority Rules](/en/ch9#sec_distributed_majority) - detecting, [Detecting Faults](/en/ch9#id307) - tolerance of, in multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - software faults, [Software faults](/en/ch2#software-faults) - tolerating (see fault tolerance) - feature engineering (machine learning), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake) - federated databases, [The meta-database of everything](/en/ch13#id341) - Feldera (database) - incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view) - fence (CPU instruction), [Linearizability and network delays](/en/ch10#linearizability-and-network-delays) - fencing (preventing split brain), [Leader failure: Failover](/en/ch6#leader-failure-failover), [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)-[Fencing with multiple replicas](/en/ch9#fencing-with-multiple-replicas) - generating fencing tokens, [Using shared logs](/en/ch10#sec_consistency_smr), [Coordination Services](/en/ch10#sec_consistency_coordination) - properties of fencing tokens, [Defining the correctness of an algorithm](/en/ch9#defining-the-correctness-of-an-algorithm) - stream processors writing to databases, 
[Idempotence](/en/ch12#sec_stream_idempotence), [Exactly-once execution of an operation](/en/ch13#id353) - fetch-and-add - relation to consensus, [Fetch-and-add as consensus](/en/ch10#fetch-and-add-as-consensus) - Fibre Channel (networks), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - field tags (Protocol Buffers), [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution) - Figma (graphics software), [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps) - filesystem in userspace (FUSE), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - on object storage, [Object Stores](/en/ch11#id277) - financial data - accounting ledgers, [Summary](/en/ch3#summary) - immutability, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros) - time series data, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - Fivetran, [Data Warehousing](/en/ch1#sec_introduction_dwh) - FizzBee (specification language), [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages) - flat index (vector index), [Vector Embeddings](/en/ch4#id92) - FlatBuffers (data format), [Formats for Encoding Data](/en/ch5#sec_encoding_formats) - Flink (processing framework), [Batch Processing](/en/ch11#ch_batch), [Dataflow Engines](/en/ch11#sec_batch_dataflow) - cost efficiency, [Query languages](/en/ch11#sec_batch_query_lanauges) - DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [DataFrames](/en/ch11#id287) - fault tolerance, [Handling Faults](/en/ch11#id281), [Microbatching and checkpointing](/en/ch12#id329), [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance) - FlinkML, [Machine Learning](/en/ch11#id290) - for data warehouses, [Cloud Data 
Warehouses](/en/ch4#sec_cloud_data_warehouses) - high availability using ZooKeeper, [Coordination Services](/en/ch10#sec_consistency_coordination) - integration of batch and stream processing, [Unifying batch and stream processing](/en/ch13#id338) - query optimizer, [Query languages](/en/ch11#sec_batch_query_lanauges) - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle) - stream processing, [Stream analytics](/en/ch12#id318) - streaming SQL support, [Complex event processing](/en/ch12#id317) - flow control, [The Limitations of TCP](/en/ch9#sec_distributed_tcp), [Messaging Systems](/en/ch12#sec_stream_messaging), [Glossary](/en/glossary) - FLP result (on consensus), [Consensus](/en/ch10#sec_consistency_consensus) - Flyte (workflow scheduler), [Machine Learning](/en/ch11#id290) - followers, [Single-Leader Replication](/en/ch6#sec_replication_leader), [Glossary](/en/glossary) - (see also leader-based replication) - formal methods, [Formal Methods and Randomized Testing](/en/ch9#sec_distributed_formal)-[Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - forward compatibility, [Encoding and Evolution](/en/ch5#ch_encoding) - forward decay (algorithm), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla) - Fossil (version control system), [Concurrency control](/en/ch12#sec_stream_concurrency) - shunning (deleting data), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations) - FoundationDB (database) - consistency model, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition) - deterministic simulation testing, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range) - process-per-core model, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons) - serializable transactions, [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi), [Performance of serializable snapshot 
isolation](/en/ch8#performance-of-serializable-snapshot-isolation) - transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal) - fractional indexing, [When to Use Which Model](/en/ch3#sec_datamodels_document_summary) - fragmentation (of B-trees), [Disk space usage](/en/ch4#disk-space-usage) - frame (computer graphics), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - frontend (web development), [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs) - FrostDB (database) - deterministic simulation testing (DST), [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - fsync (system call), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Durability](/en/ch8#durability) - full-text search, [Full-Text Search](/en/ch4#sec_storage_full_text), [Glossary](/en/glossary) - and fuzzy indexes, [Full-Text Search](/en/ch4#sec_storage_full_text) - Lucene storage engine, [Full-Text Search](/en/ch4#sec_storage_full_text) - sharded indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes) - Function as a Service (FaaS), [Microservices and Serverless](/en/ch1#sec_introduction_microservices) - functional programming - inspiration for MapReduce, [MapReduce](/en/ch11#sec_batch_mapreduce) - functional requirements, [Defining Nonfunctional Requirements](/en/ch2#ch_nonfunctional) - FUSE (see filesystem in userspace (FUSE)) - fuzzing, [Formal Methods and Randomized Testing](/en/ch9#sec_distributed_formal) - fuzzy search (see similarity search) ### G - Gallina (specification language), [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages) - game development, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - garbage collection - immutability and, [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations) - process pauses 
for, [Latency and Response Time](/en/ch2#id23), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact), [The Majority Rules](/en/ch9#sec_distributed_majority) - (see also process pauses) - gas stations algorithmic pricing, [Feedback Loops](/en/ch14#id372) - GDPR (regulation), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations) - consent, [Consent and Freedom of Choice](/en/ch14#id375) - data minimization, [Legislation and Self-Regulation](/en/ch14#sec_future_legislation) - legitimate interest, [Consent and Freedom of Choice](/en/ch14#id375) - right of access, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy) - right to erasure, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Disk space usage](/en/ch4#disk-space-usage), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy) - GenBank (genome database), [Summary](/en/ch3#summary) - General Data Protection Regulation (see GDPR (regulation)) - genome analysis, [Summary](/en/ch3#summary) - geographic distribution (see regions (geographic distribution)) - geospatial indexes, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional) - Git (version control system), [Concurrency control](/en/ch12#sec_stream_concurrency) - local-first software, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps) - merge conflicts, [Manual conflict resolution](/en/ch6#manual-conflict-resolution) - GitHub, postmortems, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Leader failure: Failover](/en/ch6#leader-failure-failover), [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world) - global secondary indexes, [Global Secondary Indexes](/en/ch7#id167), [Summary](/en/ch7#summary) - 
globally unique identifiers (see UUIDs) - GlusterFS (distributed filesystem), [Batch Processing](/en/ch11#ch_batch), [Distributed Filesystems](/en/ch11#sec_batch_dfs), [Object Stores](/en/ch11#id277) - GNU Coreutils (Linux), [Sorting Versus In-memory Aggregation](/en/ch11#id275) - Go (programming language) - garbage collection, [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact) - GoldenGate (change data capture), [Implementing change data capture](/en/ch12#id307) - (see also Oracle) - Google - BigQuery (see BigQuery (database)) - Bigtable (see Bigtable (database)) - Chubby (lock service), [Coordination Services](/en/ch10#sec_consistency_coordination) - Cloud Storage (object storage), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Object Stores](/en/ch11#id277) - request preconditions, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens) - Compute Engine - preemptible instances, [Handling Faults](/en/ch11#id281) - Dataflow (stream processing) - data warehouse integration, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle) - Dataflow (stream processor), [Stream analytics](/en/ch12#id318), [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit), [Unifying batch and stream processing](/en/ch13#id338) - (see also Beam) - Datastream (change data capture), [API support for change streams](/en/ch12#sec_stream_change_api) - Docs (collaborative editor), [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps), [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts) - operational transformation, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts) - Dremel (query engine), [Column-Oriented Storage](/en/ch4#sec_storage_column) - Firestore (database), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - 
MapReduce (batch processing), [Batch Processing](/en/ch11#ch_batch) - (see also MapReduce) - Percolator (transaction system), [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator) - persistent disks (cloud service), [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute) - Pub/Sub (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297), [Using logs for message storage](/en/ch12#id300) - response time study, [Average, Median, and Percentiles](/en/ch2#id24) - Sheets (collaborative spreadsheet), [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps), [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts) - Spanner (see Spanner (database)) - TrueTime (clock API), [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval) - gossip protocol, [Request Routing](/en/ch7#sec_sharding_routing) - governance, [Beyond the data lake](/en/ch1#beyond-the-data-lake) - government use of data, [Data as Assets and Power](/en/ch14#id376) - GPS (Global Positioning System) - use for clock synchronization, [Unreliable Clocks](/en/ch9#sec_distributed_clocks), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy), [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - GPT (language model), [Vector Embeddings](/en/ch4#id92) - GPU (graphics processing unit), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed) - gradual rollout (see rolling upgrades) - GraphQL (query language), [GraphQL](/en/ch3#id63) - validation, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - graphs, 
[Glossary](/en/glossary) - as data models, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)-[GraphQL](/en/ch3#id63) - property graphs, [Property Graphs](/en/ch3#id56) - RDF and triple-stores, [Triple-Stores and SPARQL](/en/ch3#id59)-[The SPARQL query language](/en/ch3#the-sparql-query-language) - DAGs (see directed acyclic graphs) - processing and analysis, [Machine Learning](/en/ch11#id290) - query languages - Cypher, [The Cypher Query Language](/en/ch3#id57) - Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62)-[Datalog: Recursive Relational Queries](/en/ch3#id62) - GraphQL, [GraphQL](/en/ch3#id63) - Gremlin, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph) - recursive SQL queries, [Graph Queries in SQL](/en/ch3#id58) - SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)-[The SPARQL query language](/en/ch3#the-sparql-query-language) - traversal, [Property Graphs](/en/ch3#id56) - gray failures, [System Model and Reality](/en/ch9#sec_distributed_system_model) - in leaderless replication, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf) - Gremlin (graph query language), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph) - grep (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis) - gRPC (service calls), [Microservices and Serverless](/en/ch1#sec_introduction_microservices), [Web services](/en/ch5#sec_web_services) - forward and backward compatibility, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc) - GUIDs (see UUIDs) ### H - Hadoop (data infrastructure) - comparison to distributed databases, [Batch Processing](/en/ch11#ch_batch) - MapReduce (see MapReduce) - NodeManager, [Distributed Job Orchestration](/en/ch11#id278) - YARN (see YARN (job scheduler)) - HANA (see SAP HANA (database)) - happens-before relation, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before) - hard disks - access 
patterns, [Sequential versus random writes](/en/ch4#sidebar_sequential) - detecting corruption, [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [Don't just blindly trust what they promise](/en/ch13#id364) - faults in, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults), [Durability](/en/ch8#durability) - sequential vs. random writes, [Sequential versus random writes](/en/ch4#sidebar_sequential) - sequential write throughput, [Disk space usage](/en/ch12#sec_stream_disk_usage) - hardware faults, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults) - hash function - in Bloom filters, [Bloom filters](/en/ch4#bloom-filters) - hash join - in stream processing, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins) - hash sharding, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash)-[Consistent hashing](/en/ch7#sec_sharding_consistent_hashing), [Summary](/en/ch7#summary) - consistent hashing, [Consistent hashing](/en/ch7#sec_sharding_consistent_hashing) - problems with hash mod N, [Hash modulo number of nodes](/en/ch7#hash-modulo-number-of-nodes) - range queries, [Sharding by hash range](/en/ch7#sharding-by-hash-range) - suitable hash functions, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash) - with fixed number of shards, [Fixed number of shards](/en/ch7#fixed-number-of-shards) - hash tables, [Log-Structured Storage](/en/ch4#sec_storage_log_structured) - Hazelcast (in-memory data grid) - FencedLock, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens) - Flake ID Generator, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical) - HBase (database) - bug due to lack of fencing, [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing) - key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range) - log-structured storage, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - regions (sharding), 
[Sharding](/en/ch7#ch_sharding) - request routing, [Request Routing](/en/ch7#sec_sharding_routing) - size-tiered compaction, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction) - wide-column data model, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Column Compression](/en/ch4#sec_storage_column_compression) - HDFS (Hadoop Distributed File System), [Batch Processing](/en/ch11#ch_batch), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - (see also distributed filesystems) - checking data integrity, [Don't just blindly trust what they promise](/en/ch13#id364) - DataNode, [Distributed Filesystems](/en/ch11#sec_batch_dfs) - NameNode, [Distributed Filesystems](/en/ch11#sec_batch_dfs) - use in MapReduce, [MapReduce](/en/ch11#sec_batch_mapreduce) - workflow example, [Scheduling Workflows](/en/ch11#sec_batch_workflows) - HdrHistogram (numerical library), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla) - head (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis), [Distributed Job Orchestration](/en/ch11#id278) - head vertex (property graphs), [Property Graphs](/en/ch3#id56) - head-of-line blocking, [Latency and Response Time](/en/ch2#id23) - heap files (databases), [Storing values within the index](/en/ch4#sec_storage_index_heap) - in multiversion concurrency control, [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl) - heat management, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew) - hedged requests, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf) - heterogeneous distributed transactions, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa), [Problems with XA transactions](/en/ch8#problems-with-xa-transactions) - heuristic decisions (in 2PC), [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure) - Hex (notebook), [Machine 
Learning](/en/ch11#id290)
- hexagons
  - for geospatial indexing, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- Hibernate (object-relational mapper), [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm)
- hierarchical model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
- hierarchical navigable small world (vector index), [Vector Embeddings](/en/ch4#id92)
- hierarchical queries (see recursive common table expressions)
- high availability (see fault tolerance)
- high-frequency trading, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- high-performance computing (HPC), [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- hinted handoff (leaderless replication), [Catching up on missed writes](/en/ch6#sec_replication_read_repair)
- histograms, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- Hive (data warehouse), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
  - query optimizer, [Query languages](/en/ch11#sec_batch_query_lanauges)
- HNSW (vector index), [Vector Embeddings](/en/ch4#id92)
- hopping windows (stream processing), [Types of windows](/en/ch12#id324)
  - (see also windows)
- Hoptimator (query engine), [The meta-database of everything](/en/ch13#id341)
- Horizon scandal, [Humans and Reliability](/en/ch2#id31)
  - lack of transactions, [Transactions](/en/ch8#ch_transactions)
- horizontal scaling (see scaling out)
  - by sharding, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- HornetQ (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297)
  - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
- hot keys, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
- hot spots, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
  - due to celebrities, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
  - for time-series data, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
  - relieving, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- hot standbys (see leader-based replication)
- HTAP (see hybrid transactional/analytic processing)
- HTTP, use in APIs (see services)
- human errors, [Humans and Reliability](/en/ch2#id31), [Network Faults in Practice](/en/ch9#sec_distributed_network_faults), [Batch Processing](/en/ch11#ch_batch)
- hybrid logical clocks, [Hybrid logical clocks](/en/ch10#hybrid-logical-clocks)
- hybrid transactional/analytic processing, [Data Warehousing](/en/ch1#sec_introduction_dwh), [Data Storage for Analytics](/en/ch4#sec_storage_analytics)
- hydrating IDs (join), [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
- hypergraph, [Property Graphs](/en/ch3#id56)
- HyperLogLog (algorithm), [Stream analytics](/en/ch12#id318)

### I

- I/O operations, waiting for, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- IaaS (see infrastructure as a service (IaaS))
- IBM
  - Db2 (database)
    - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
    - serializable isolation, [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
  - MQ (messaging), [Message brokers compared to databases](/en/ch12#id297)
    - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
  - System R (database), [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview)
  - WebSphere (messaging), [Message brokers](/en/ch5#message-brokers)
- Iceberg (table format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
  - databases on object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
  - log-based message broker storage, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- idempotence, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc), [Idempotence](/en/ch12#sec_stream_idempotence), [Glossary](/en/glossary)
  - by giving operations unique IDs, [Multi-shard request processing](/en/ch13#id360)
  - by giving requests unique IDs, [Uniquely identifying requests](/en/ch13#id355)
  - for exactly-once semantics, [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
  - idempotent operations, [Exactly-once execution of an operation](/en/ch13#id353)
  - in workflow engines, [Durable execution](/en/ch5#durable-execution)
- immutability
  - advantages of, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros), [Designing for auditability](/en/ch13#id365)
  - and right to erasure, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Disk space usage](/en/ch4#disk-space-usage)
  - crypto-shredding for deletion, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
  - deriving state from event log, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
  - for crash recovery, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
  - in B-trees, [B-tree variants](/en/ch4#b-tree-variants), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
  - in event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
  - limitations of, [Concurrency control](/en/ch12#sec_stream_concurrency)
- impedance mismatch, [The Object-Relational Mismatch](/en/ch3#sec_datamodels_document)
- in doubt (transaction status), [Coordinator failure](/en/ch8#coordinator-failure)
  - holding locks, [Holding locks while in doubt](/en/ch8#holding-locks-while-in-doubt)
  - orphaned transactions, [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure)
- in-memory databases, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
  - durability, [Durability](/en/ch8#durability)
  - serial transaction execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
- incidents
  - accounting software bugs leading to wrongful convictions, [Humans and Reliability](/en/ch2#id31)
  - blameless postmortems, [Humans and Reliability](/en/ch2#id31)
  - crashes due to leap seconds, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
  - data corruption and financial losses due to concurrency bugs, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)
  - data corruption on hard disks, [Durability](/en/ch8#durability)
  - data loss due to last-write-wins, [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
  - data on disks unreadable, [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world)
  - disclosure of sensitive data due to primary key reuse, [Leader failure: Failover](/en/ch6#leader-failure-failover)
  - errors in transaction serializability, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
  - gigabit network interface with 1 Kb/s throughput, [System Model and Reality](/en/ch9#sec_distributed_system_model)
  - leap second crash, [Software faults](/en/ch2#software-faults)
  - network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
  - network interface dropping only inbound packets, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
  - network partitions and whole-datacenter failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
  - poor handling of network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
  - sending message to ex-partner, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
  - sharks biting undersea cables, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
  - split brain due to 1-minute packet delay, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
  - SSD failure after 32,768 hours, [Software faults](/en/ch2#software-faults)
  - thread contention bringing down a service, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
  - vibrations in server rack, [Latency and Response Time](/en/ch2#id23)
  - violation of uniqueness constraint, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
- incremental view maintenance (IVM), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
  - for data integration, [Unbundled versus integrated systems](/en/ch13#id448)
- indexes, [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp), [Glossary](/en/glossary)
  - and snapshot isolation, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
  - as derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
  - B-trees, [B-Trees](/en/ch4#sec_storage_b_trees)-[B-tree variants](/en/ch4#b-tree-variants)
  - clustered, [Storing values within the index](/en/ch4#sec_storage_index_heap)
  - comparison of B-trees and LSM-trees, [Comparing B-Trees and LSM-Trees](/en/ch4#sec_storage_btree_lsm_comparison)-[Disk space usage](/en/ch4#disk-space-usage)
  - covering (with included columns), [Storing values within the index](/en/ch4#sec_storage_index_heap)
  - creating, [Creating an index](/en/ch13#id340)
  - full-text search, [Full-Text Search](/en/ch4#sec_storage_full_text)
  - geospatial, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
  - index-range locking, [Index-range locks](/en/ch8#sec_transactions_2pl_range)
  - multi-column (concatenated), [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
  - secondary, [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn)
    - (see also secondary indexes)
    - problems with dual writes, [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Reasoning about dataflows](/en/ch13#id443)
  - sharding and secondary indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)-[Global Secondary Indexes](/en/ch7#id167), [Summary](/en/ch7#summary)
  - sparse, [The SSTable file format](/en/ch4#the-sstable-file-format)
  - SSTables and LSM-trees, [The SSTable file format](/en/ch4#the-sstable-file-format)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
  - updating when data changes, [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- Industrial Revolution, [Remembering the Industrial Revolution](/en/ch14#id377)
- InfiniBand (networks), [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- InfluxDB IOx (storage engine), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- information retrieval (see full-text search)
- infrastructure as a service (IaaS), [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud), [Layering of cloud services](/en/ch1#layering-of-cloud-services)
- InnoDB (storage engine)
  - clustered index on primary key, [Storing values within the index](/en/ch4#sec_storage_index_heap)
  - not preventing lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates)
  - preventing write skew, [Characterizing write skew](/en/ch8#characterizing-write-skew), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
  - serializable isolation, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
  - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- instance (cloud computing), [Layering of cloud services](/en/ch1#layering-of-cloud-services)
- integrating different data systems (see data integration)
- integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
  - coordination-avoiding data systems, [Coordination-avoiding data systems](/en/ch13#id454)
  - correctness of dataflow systems, [Correctness of dataflow systems](/en/ch13#id453)
  - in consensus formalization, [Single-value consensus](/en/ch10#single-value-consensus), [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
  - integrity checks, [Don't just blindly trust what they promise](/en/ch13#id364)
    - (see also auditing)
    - end-to-end, [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [The end-to-end argument again](/en/ch13#id456)
    - use of snapshot isolation, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
  - maintaining despite software bugs, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
- Interface Definition Language (IDL), [Protocol Buffers](/en/ch5#sec_encoding_protobuf), [Avro](/en/ch5#sec_encoding_avro), [Web services](/en/ch5#sec_web_services)
- invariants, [Consistency](/en/ch8#sec_transactions_acid_consistency)
  - (see also constraints)
- inverted file index (vector index), [Vector Embeddings](/en/ch4#id92)
- inverted index, [Full-Text Search](/en/ch4#sec_storage_full_text)
- irreversibility, minimizing, [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Batch Processing](/en/ch11#ch_batch)
- ISDN (Integrated Services Digital Network), [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
- isolation (in operating systems)
  - cgroups (see cgroups)
- isolation (in transactions), [Isolation](/en/ch8#sec_transactions_acid_isolation), [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object), [Glossary](/en/glossary)
  - correctness and, [Aiming for Correctness](/en/ch13#sec_future_correctness)
  - for single-object writes, [Single-object writes](/en/ch8#sec_transactions_single_object)
  - serializability, [Serializability](/en/ch8#sec_transactions_serializability)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
    - actual serial execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)-[Summary of serial execution](/en/ch8#summary-of-serial-execution)
    - serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
    - two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
  - violating, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
  - weak isolation levels, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)-[Materializing conflicts](/en/ch8#materializing-conflicts)
    - preventing lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
    - read committed, [Read Committed](/en/ch8#sec_transactions_read_committed)-[Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
    - snapshot isolation, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)-[Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- IVF (vector index), [Vector Embeddings](/en/ch4#id92)

### J

- Java Database Connectivity (JDBC)
  - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
  - network drivers, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- Java Enterprise Edition (EE), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc), [XA transactions](/en/ch8#xa-transactions)
- Java Message Service (JMS), [Message brokers compared to databases](/en/ch12#id297)
  - (see also messaging systems)
  - comparison to log-based messaging, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay)
  - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
  - message ordering, [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- Java Transaction API (JTA), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc), [XA transactions](/en/ch8#xa-transactions)
- Java Virtual Machine (JVM)
  - garbage collection, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses), [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
  - JIT compilation, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
  - process reuse in batch processors, [Dataflow Engines](/en/ch11#sec_batch_dataflow)
- Jena (RDF framework), [The RDF data model](/en/ch3#the-rdf-data-model)
  - SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- Jepsen (fault tolerance testing), [Fault injection](/en/ch9#sec_fault_injection), [Aiming for Correctness](/en/ch13#sec_future_correctness)
- jitter (network delay), [Average, Median, and Percentiles](/en/ch2#id24), [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- JMESPath (query language), [Query languages](/en/ch11#sec_batch_query_lanauges)
- join table, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many), [Property Graphs](/en/ch3#id56)
- joins, [Glossary](/en/glossary)
  - expressing as relational operators, [Query languages](/en/ch11#sec_batch_query_lanauges)
  - handling GraphQL query, [GraphQL](/en/ch3#id63)
  - in application code, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization), [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
  - in DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
  - in relational and document databases, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)
  - secondary indexes and, [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn)
  - sort-merge joins, [JOIN and GROUP BY](/en/ch11#sec_batch_join)
  - stream joins, [Stream Joins](/en/ch12#sec_stream_joins)-[Time-dependence of joins](/en/ch12#sec_stream_join_time)
    - stream-stream join, [Stream-stream join (window join)](/en/ch12#id440)
    - stream-table join, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
    - table-table join, [Table-table join (materialized view maintenance)](/en/ch12#id326)
    - time-dependence of, [Time-dependence of joins](/en/ch12#sec_stream_join_time)
  - support in document databases, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
- JOTM (transaction coordinator), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
- journaling (filesystems), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
- JSON
  - aggregation pipeline (query language), [Query languages for documents](/en/ch3#query-languages-for-documents)
  - Avro schema representation, [Avro](/en/ch5#sec_encoding_avro)
  - binary variants, [Binary encoding](/en/ch5#binary-encoding)
  - data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
  - document data model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
  - for application data, issues with, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
  - GraphQL response, [GraphQL](/en/ch3#id63)
  - in relational databases, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
  - representing a résumé (example), [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships)
  - Schema, [JSON Schema](/en/ch5#json-schema)
- JSON-LD, [Triple-Stores and SPARQL](/en/ch3#id59)
- JsonPath (query language), [Query languages](/en/ch11#sec_batch_query_lanauges)
- JuiceFS (distributed filesystem), [Distributed Filesystems](/en/ch11#sec_batch_dfs), [Object Stores](/en/ch11#id277)
- Jupyter (notebook), [Machine Learning](/en/ch11#id290)
- just-in-time (JIT) compilation, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)

### K

- Kafka (messaging), [Message brokers](/en/ch5#message-brokers), [Using logs for message storage](/en/ch12#id300)
  - consumer groups, [Multiple consumers](/en/ch12#id298)
  - for data integration, [Unbundled versus integrated systems](/en/ch13#id448)
  - for event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
  - Kafka Connect (database integration), [Implementing change data capture](/en/ch12#id307), [API support for change streams](/en/ch12#sec_stream_change_api), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
  - Kafka Streams (stream processor), [Stream analytics](/en/ch12#id318), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
    - exactly-once semantics, [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
    - fault tolerance, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
  - ksqlDB (stream database), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
  - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
  - log compaction, [Log compaction](/en/ch12#sec_stream_log_compaction), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
  - message offsets, [Using logs for message storage](/en/ch12#id300), [Idempotence](/en/ch12#sec_stream_idempotence)
  - partitions (sharding), [Sharding](/en/ch7#ch_sharding)
  - request routing, [Request Routing](/en/ch7#sec_sharding_routing)
  - schema registry, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
  - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
  - tiered storage, [Disk space usage](/en/ch12#sec_stream_disk_usage)
  - transactions, [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal), [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)
  - unclean leader election, [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
  - use of model-checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- kappa architecture, [Unifying batch and stream processing](/en/ch13#id338)
- key-value stores, [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp)
  - comparison to object stores, [Object Stores](/en/ch11#id277)
  - in-memory, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
  - LSM storage, [Log-Structured Storage](/en/ch4#sec_storage_log_structured)-[Disk space usage](/en/ch4#disk-space-usage)
  - sharding, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)-[Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
    - by hash of key, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash), [Summary](/en/ch7#summary)
    - by key range, [Sharding by Key Range](/en/ch7#sec_sharding_key_range), [Summary](/en/ch7#summary)
    - skew and hot spots, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- Kinesis (messaging), [Message brokers](/en/ch5#message-brokers), [Using logs for message storage](/en/ch12#id300)
  - data warehouse integration, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- Kryo (Java), [Language-Specific Formats](/en/ch5#id96)
- ksqlDB (stream database), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- Kubernetes (cluster manager), [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud), [Microservices and Serverless](/en/ch1#sec_introduction_microservices), [Distributed Job Orchestration](/en/ch11#id278), [Separation of application code and state](/en/ch13#id344)
  - Kubeflow, [Machine Learning](/en/ch11#id290)
  - kubelet, [Distributed Job Orchestration](/en/ch11#id278)
  - operators, [Distributed Job Orchestration](/en/ch11#id278)
  - use of etcd, [Request Routing](/en/ch7#sec_sharding_routing), [Coordination Services](/en/ch10#sec_consistency_coordination)
- KùzuDB (database), [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
  - as embedded storage engine, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
  - Cypher query language, [The Cypher Query Language](/en/ch3#id57)

### L

- labeled property graphs (see property graphs)
- lambda architecture, [Unifying batch and stream processing](/en/ch13#id338)
- Lamport timestamps, [Lamport timestamps](/en/ch10#lamport-timestamps)
- Lance (data format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column)
  - (see also column-oriented storage)
- large language models (LLMs)
  - pre-processing training data, [Machine Learning](/en/ch11#id290)
- last write wins (LWW), [Last write wins (discarding concurrent writes)](/en/ch6#sec_replication_lww), [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent), [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
  - problems with, [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
  - prone to lost updates, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- latency, [Latency and Response Time](/en/ch2#id23)
  - (see also response time)
  - across regions, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
  - instability under two-phase locking, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
  - network latency and resource utilization, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
  - reducing by request hedging, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
  - response time versus, [Latency and Response Time](/en/ch2#id23)
  - tail latency, [Average, Median, and Percentiles](/en/ch2#id24), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla), [Local Secondary Indexes](/en/ch7#id166)
- law (see legal matters)
- layering (of cloud services), [Layering of cloud services](/en/ch1#layering-of-cloud-services)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)-[Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
  - (see also replication)
  - failover, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing)
  - handling node outages, [Handling Node Outages](/en/ch6#sec_replication_failover)
  - implementation of replication logs
    - change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)-[API support for change streams](/en/ch12#sec_stream_change_api)
      - (see also changelogs)
    - statement-based, [Statement-based replication](/en/ch6#statement-based-replication)
    - write-ahead log (WAL) shipping, [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping)
  - linearizability of operations, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
  - locking and leader election, [Locking and leader election](/en/ch10#locking-and-leader-election)
  - log sequence number, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Consumer offsets](/en/ch12#sec_stream_log_offsets)
  - read-scaling architecture, [Problems with Replication Lag](/en/ch6#sec_replication_lag), [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
  - relation to consensus, [Consensus](/en/ch10#sec_consistency_consensus), [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus), [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
  - setting up new followers, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
  - synchronous versus asynchronous, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async)
- leaderless replication, [Leaderless Replication](/en/ch6#sec_replication_leaderless)-[Version vectors](/en/ch6#version-vectors)
  - (see also replication)
  - catching up on missed writes, [Catching up on missed writes](/en/ch6#sec_replication_read_repair)
  - detecting concurrent writes, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)-[Version vectors](/en/ch6#version-vectors)
    - version vectors, [Version vectors](/en/ch6#version-vectors)
  - multi-region, [Multi-region operation](/en/ch6#multi-region-operation)
  - quorums, [Quorums for reading and writing](/en/ch6#sec_replication_quorum_condition)-[Multi-region operation](/en/ch6#multi-region-operation)
    - consistency limitations, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations)-[Monitoring staleness](/en/ch6#monitoring-staleness), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- leap seconds, [Software faults](/en/ch2#software-faults), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
  - in time-of-day clocks, [Time-of-day clocks](/en/ch9#time-of-day-clocks)
- leases, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
  - implementation with coordination service, [Coordination Services](/en/ch10#sec_consistency_coordination)
  - need for fencing, [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing)
  - relation to consensus, [Single-value consensus](/en/ch10#single-value-consensus)
- ledgers (accounting), [Summary](/en/ch3#summary)
  - immutability, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
- legacy systems, maintenance of, [Maintainability](/en/ch2#sec_introduction_maintainability)
- legal matters, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
  - data deletion, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Disk space usage](/en/ch4#disk-space-usage)
  - data residence, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
  - privacy regulation, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- legitimate interest (GDPR), [Consent and Freedom of Choice](/en/ch14#id375)
- leveled compaction, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [Disk space usage](/en/ch4#disk-space-usage)
- Levenshtein automata, [Full-Text Search](/en/ch4#sec_storage_full_text)
- limping (partial failure), [System Model and Reality](/en/ch9#sec_distributed_system_model)
- Linear (project management software), [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- linear algebra, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- linear scalability, [Describing Load](/en/ch2#id33)
- linearizability, [Solutions for Replication Lag](/en/ch6#id131), [Linearizability](/en/ch10#sec_consistency_linearizability)-[Linearizability and network delays](/en/ch10#linearizability-and-network-delays), [Glossary](/en/glossary)
  - and consensus, [Consensus](/en/ch10#sec_consistency_consensus)
  - cost of, [The Cost of Linearizability](/en/ch10#sec_linearizability_cost)-[Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
    - CAP theorem, [The CAP theorem](/en/ch10#the-cap-theorem)
    - memory on multi-core CPUs, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
  - definition, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
  - ID generation, [Linearizable ID Generators](/en/ch10#sec_consistency_linearizable_id)
  - in coordination services, [Coordination Services](/en/ch10#sec_consistency_coordination)
  - of derived data systems
    - avoiding coordination, [Coordination-avoiding data systems](/en/ch13#id454)
  - of different replication methods, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)-[Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
    - using quorums, [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
  - reads in consensus systems, [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
  - relying on, [Relying on Linearizability](/en/ch10#sec_consistency_linearizability_usage)-[Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
    - constraints and uniqueness, [Constraints and uniqueness guarantees](/en/ch10#sec_consistency_uniqueness)
    - cross-channel timing dependencies, [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
    - locking and leader election, [Locking and leader election](/en/ch10#locking-and-leader-election)
  - versus serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- linked data, [Triple-Stores and SPARQL](/en/ch3#id59)
- LinkedIn
  - Espresso (database), [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
  - LIquid (database), [Datalog: Recursive Relational Queries](/en/ch3#id62)
  - profile (example), [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships)
- Linux, leap second bug, [Software faults](/en/ch2#software-faults), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- Litestream (backup tool), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- liveness properties, [Safety and liveness](/en/ch9#sec_distributed_safety_liveness)
- LLVM (compiler), [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- LMDB (storage engine), [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [B-tree variants](/en/ch4#b-tree-variants), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
- load
  - coping with, [Principles for Scalability](/en/ch2#id35)
  - describing, [Describing Load](/en/ch2#id33)
- load balancing, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
  - in hardware, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
  - in software, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
  - using message brokers, [Multiple consumers](/en/ch12#id298)
- load shedding, [Describing Performance](/en/ch2#sec_introduction_percentiles)
- local secondary indexes, [Local Secondary Indexes](/en/ch7#id166), [Summary](/en/ch7#summary)
- local-first software, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- locality (data access), [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships), [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Glossary](/en/glossary)
  - in batch processing, [Dataflow Engines](/en/ch11#sec_batch_dataflow)
  - in stateful clients, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients), [Stateful, offline-capable clients](/en/ch13#id347)
  - in stream processing, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins), [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance), [Stream processors and services](/en/ch13#id345), [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- location transparency, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
  - in the actor model, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
- lock-in, [Pros and Cons of Cloud Services](/en/ch1#sec_introduction_cloud_tradeoffs)
- locks, [Glossary](/en/glossary)
  - deadlock, [Explicit locking](/en/ch8#explicit-locking), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
  - distributed locking, [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing)-[Fencing with multiple replicas](/en/ch9#fencing-with-multiple-replicas), [Locking and leader election](/en/ch10#locking-and-leader-election)
    - fencing tokens, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
    - implementation with coordination service, [Coordination Services](/en/ch10#sec_consistency_coordination)
    - relation to consensus, [Single-value consensus](/en/ch10#single-value-consensus)
  - for transaction isolation
    - in snapshot isolation, [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl)
    - in two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
    - making operations atomic, [Atomic write operations](/en/ch8#atomic-write-operations)
    - performance, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
    - preventing dirty writes, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
    - preventing phantoms with index-range locks, [Index-range locks](/en/ch8#sec_transactions_2pl_range), [Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
    - read locks (shared mode), [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
    - shared mode and exclusive mode, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
  - in distributed transactions
    - deadlock detection, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
    - in-doubt transactions holding locks, [Holding locks while in doubt](/en/ch8#holding-locks-while-in-doubt)
  - materializing conflicts with, [Materializing conflicts](/en/ch8#materializing-conflicts)
  - preventing lost updates by explicit locking, [Explicit locking](/en/ch8#explicit-locking)
- log sequence number, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Consumer offsets](/en/ch12#sec_stream_log_offsets)
- logical clocks, [Timestamps for ordering events](/en/ch9#sec_distributed_lww), [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)-[Enforcing constraints using logical clocks](/en/ch10#enforcing-constraints-using-logical-clocks), [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
  - for last-write-wins, [Last write wins (discarding concurrent writes)](/en/ch6#sec_replication_lww)
  - for read-after-write consistency, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
  - hybrid logical clocks, [Hybrid logical clocks](/en/ch10#hybrid-logical-clocks)
  - insufficiency for enforcing constraints, [Enforcing constraints using logical clocks](/en/ch10#enforcing-constraints-using-logical-clocks)
  - Lamport timestamps, [Lamport timestamps](/en/ch10#lamport-timestamps)
- logical replication, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
  - for change data capture, [Implementing change data capture](/en/ch12#id307)
- LogicBlox (database), [Datalog: Recursive Relational Queries](/en/ch3#id62)
- logs (data structure), [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp), [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs), [Glossary](/en/glossary)
  - (see also shared logs)
  - advantages of immutability, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
  - and right to erasure, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Disk space usage](/en/ch4#disk-space-usage)
  - compaction, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [Log compaction](/en/ch12#sec_stream_log_compaction), [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)
  - for stream operator state, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
  - implementing uniqueness constraints, [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
  - log-based messaging, [Log-based Message Brokers](/en/ch12#sec_stream_log)-[Replaying old messages](/en/ch12#sec_stream_replay)
    - comparison to traditional messaging, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay)
    - consumer offsets, [Consumer offsets](/en/ch12#sec_stream_log_offsets)
    - disk space usage, [Disk space usage](/en/ch12#sec_stream_disk_usage)
    - replaying old messages, [Replaying old messages](/en/ch12#sec_stream_replay), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing), [Unifying batch and stream processing](/en/ch13#id338)
    - slow consumers, [When consumers cannot keep up with producers](/en/ch12#id459)
    - using logs for message storage, [Using logs for message storage](/en/ch12#id300)
  - log-structured storage, [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
    - log-structured merge tree (see LSM-trees)
  - relation to consensus, [Shared logs as
consensus](/en/ch10#sec_consistency_shared_logs) - replication, [Single-Leader Replication](/en/ch6#sec_replication_leader), [Implementation of Replication Logs](/en/ch6#sec_replication_implementation)-[Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication) - change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)-[API support for change streams](/en/ch12#sec_stream_change_api) - (see also changelogs) - coordination with snapshot, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - logical (row-based) replication, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication) - statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication) - write-ahead log (WAL) shipping, [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping) - scalability limits, [The limits of total ordering](/en/ch13#id335) - Looker (business intelligence software), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Analytics](/en/ch11#sec_batch_olap) - loose coupling, [Making unbundling work](/en/ch13#sec_future_unbundling_favor) - lost updates (see updates) - Lotus Notes (sync engine), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - LSM-trees (indexes), [The SSTable file format](/en/ch4#the-sstable-file-format)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction) - comparison to B-trees, [Comparing B-Trees and LSM-Trees](/en/ch4#sec_storage_btree_lsm_comparison)-[Disk space usage](/en/ch4#disk-space-usage) - Lucene (storage engine), [Full-Text Search](/en/ch4#sec_storage_full_text) - similarity search, [Full-Text Search](/en/ch4#sec_storage_full_text) - LWW (see last write wins) ### M - machine learning - batch inference, [Machine Learning](/en/ch11#id290) - data preparation with DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - deleting training data, [Data Systems, Law, and 
Society](/en/ch1#sec_introduction_compliance) - deploying data products, [Beyond the data lake](/en/ch1#beyond-the-data-lake) - ethical considerations, [Predictive Analytics](/en/ch14#id369) - (see also ethics) - feature engineering, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [Machine Learning](/en/ch11#id290) - in analytics systems, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics) - iterative processing, [Machine Learning](/en/ch11#id290) - LLMs (see large language models (LLMs)) - models derived from training data, [Application code as a derivation function](/en/ch13#sec_future_dataflow_derivation) - relation to batch processing, [Machine Learning](/en/ch11#id290)-[Machine Learning](/en/ch11#id290) - using a data lake, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake) - using GPUs, [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed) - using matrices, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - madsim (deterministic simulation testing), [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - magic scaling sauce, [Principles for Scalability](/en/ch2#id35) - maintainability, [Maintainability](/en/ch2#sec_introduction_maintainability)-[Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability), [A Philosophy of Streaming Systems](/en/ch13#ch_philosophy) - evolvability (see evolvability) - operability, [Operability: Making Life Easy for Operations](/en/ch2#id37) - simplicity and managing complexity, [Simplicity: Managing Complexity](/en/ch2#id38) - many-to-many relationships, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many) - modeling as graphs, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph) - many-to-one relationships, [Many-to-One and Many-to-Many 
Relationships](/en/ch3#sec_datamodels_many_to_many) - in star schema, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics) - MapReduce (batch processing), [Batch Processing](/en/ch11#ch_batch), [MapReduce](/en/ch11#sec_batch_mapreduce)-[MapReduce](/en/ch11#sec_batch_mapreduce) - analysis of user activity events (example), [JOIN and GROUP BY](/en/ch11#sec_batch_join) - comparison to stream processing, [Processing Streams](/en/ch12#sec_stream_processing) - disadvantages and limitations of, [MapReduce](/en/ch11#sec_batch_mapreduce) - fault tolerance, [Handling Faults](/en/ch11#id281) - higher-level tools, [Query languages](/en/ch11#sec_batch_query_lanauges) - mapper and reducer functions, [MapReduce](/en/ch11#sec_batch_mapreduce) - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle) - sort-merge joins, [JOIN and GROUP BY](/en/ch11#sec_batch_join) - workflows, [Scheduling Workflows](/en/ch11#sec_batch_workflows) - (see also workflow engines) - marshalling (see encoding) - MartenDB (database), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - master-slave replication (obsolete term), [Single-Leader Replication](/en/ch6#sec_replication_leader) - materialization, [Glossary](/en/glossary) - aggregate values, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views) - conflicts, [Materializing conflicts](/en/ch8#materializing-conflicts) - materialized views, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views) - as derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448) - in event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view) - (see also incremental view maintenance (IVM)) - maintaining, using stream processing, [Maintaining materialized 
views](/en/ch12#sec_stream_mat_view), [Table-table join (materialized view maintenance)](/en/ch12#id326) - social network timeline example, [Materializing and Updating Timelines](/en/ch2#sec_introduction_materializing) - Materialize (database), [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views) - incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view) - matrices, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - sparse, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - Maxwell (change data capture), [Implementing change data capture](/en/ch12#id307) - mean, [Average, Median, and Percentiles](/en/ch2#id24) - media monitoring, [Search on streams](/en/ch12#id320) - median, [Average, Median, and Percentiles](/en/ch2#id24) - meeting room booking (example), [More examples of write skew](/en/ch8#more-examples-of-write-skew), [Predicate locks](/en/ch8#predicate-locks), [Enforcing Constraints](/en/ch13#sec_future_constraints) - Memcached (caching server), [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - Memgraph (database), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph) - Cypher query language, [The Cypher Query Language](/en/ch3#id57) - memory - barrier (CPU instruction), [Linearizability and network delays](/en/ch10#linearizability-and-network-delays) - corruption, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults) - in-memory databases, [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - durability, [Durability](/en/ch8#durability) - serial transaction execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial) - in-memory representation of data, [Formats for Encoding Data](/en/ch5#sec_encoding_formats) - memtable (in LSM-trees), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - random bit-flips in, [Trust, but Verify](/en/ch13#sec_future_verification) - use by indexes, 
[Log-Structured Storage](/en/ch4#sec_storage_log_structured) - memtable (in LSM-trees), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - Mercurial (version control system), [Concurrency control](/en/ch12#sec_stream_concurrency) - merge (DataFrame operator), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - merging sorted files, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Shuffling Data](/en/ch11#sec_shuffle) - Merkle trees, [Tools for auditable data systems](/en/ch13#id366) - Mesos (cluster manager), [Separation of application code and state](/en/ch13#id344) - message brokers (see messaging systems) - message-passing (see event-driven architecture) - MessagePack (encoding format), [Binary encoding](/en/ch5#binary-encoding) - messaging systems, [Stream Processing](/en/ch12#ch_stream)-[Replaying old messages](/en/ch12#sec_stream_replay) - (see also streams) - backpressure, buffering, or dropping messages, [Messaging Systems](/en/ch12#sec_stream_messaging) - brokerless messaging, [Direct messaging from producers to consumers](/en/ch12#id296) - event logs, [Log-based Message Brokers](/en/ch12#sec_stream_log)-[Replaying old messages](/en/ch12#sec_stream_replay) - as data model, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - comparison to traditional messaging, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay) - consumer offsets, [Consumer offsets](/en/ch12#sec_stream_log_offsets) - replaying old messages, [Replaying old messages](/en/ch12#sec_stream_replay), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing), [Unifying batch and stream processing](/en/ch13#id338) - slow consumers, [When consumers cannot keep up with producers](/en/ch12#id459) - exactly-once semantics, [Exactly-once message processing](/en/ch8#sec_transactions_exactly_once), [Exactly-once 
message processing revisited](/en/ch8#exactly-once-message-processing-revisited), [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance) - message brokers, [Message brokers](/en/ch12#id433)-[Acknowledgments and redelivery](/en/ch12#sec_stream_reordering) - acknowledgements and redelivery, [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering) - comparison to event logs, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay) - multiple consumers of same topic, [Multiple consumers](/en/ch12#id298) - versus RPC, [Event-Driven Architectures](/en/ch5#sec_encoding_dataflow_msg) - message loss, [Messaging Systems](/en/ch12#sec_stream_messaging) - reliability, [Messaging Systems](/en/ch12#sec_stream_messaging) - uniqueness in log-based messaging, [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log) - metastable failure, [Describing Performance](/en/ch2#sec_introduction_percentiles) - metered billing - serverless, [Microservices and Serverless](/en/ch1#sec_introduction_microservices) - storage, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations) - microbatching, [Microbatching and checkpointing](/en/ch12#id329) - microservices, [Microservices and Serverless](/en/ch1#sec_introduction_microservices) - (see also services) - causal dependencies across services, [The limits of total ordering](/en/ch13#id335) - loose coupling, [Making unbundling work](/en/ch13#sec_future_unbundling_favor) - relation to batch/stream processors, [Batch Processing](/en/ch11#ch_batch), [Stream processors and services](/en/ch13#id345) - Microsoft - Azure Blob Storage (see Azure Blob Storage) - Azure managed disks, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute) - Azure Service Bus (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297) - Azure SQL DB (database), [Cloud-Native System 
Architecture](/en/ch1#sec_introduction_cloud_native) - Azure Storage, [Object Stores](/en/ch11#id277) - Azure Stream Analytics, [Stream analytics](/en/ch12#id318) - Azure Synapse Analytics (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native) - DCOM (Distributed Component Object Model), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc) - MSDTC (transaction coordinator), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc) - SQL Server (see SQL Server) - Microsoft Power BI (see Power BI (business intelligence software)) - migrating (rewriting) data, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Different values written at different times](/en/ch5#different-values-written-at-different-times), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing) - MinIO (object storage), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - mobile apps, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs) - embedded databases, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction) - model checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages) - modulus operator (%), [Hash modulo number of nodes](/en/ch7#hash-modulo-number-of-nodes) - Mojo (programming language) - memory management, [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact) - MongoDB (database) - aggregation pipeline, [Query languages for documents](/en/ch3#query-languages-for-documents) - atomic operations, [Atomic write operations](/en/ch8#atomic-write-operations) - BSON, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality) - document data model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history) - hash-range sharding, [Sharding by Hash of 
Key](/en/ch7#sec_sharding_hash), [Sharding by hash range](/en/ch7#sharding-by-hash-range) - in the cloud, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native) - join support, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases) - joins (\$lookup operator), [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization) - JSON Schema validation, [JSON Schema](/en/ch5#json-schema) - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) - ObjectIds, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical) - range-based sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range) - request routing, [Request Routing](/en/ch7#sec_sharding_routing) - secondary indexes, [Local Secondary Indexes](/en/ch7#id166) - shard splitting, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data) - stored procedures, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - monitoring, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations), [Humans and Reliability](/en/ch2#id31), [Operability: Making Life Easy for Operations](/en/ch2#id37) - monotonic clocks, [Monotonic clocks](/en/ch9#monotonic-clocks) - monotonic reads, [Monotonic Reads](/en/ch6#sec_replication_monotonic_reads) - Morel (query language), [Query languages](/en/ch11#sec_batch_query_lanauges) - MSMQ (messaging), [XA transactions](/en/ch8#xa-transactions) - multi-column indexes, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional) - multi-leader replication, [Multi-Leader Replication](/en/ch6#sec_replication_multi_leader)-[Types of conflict](/en/ch6#sec_replication_write_conflicts) - (see also replication) - collaborative editing, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps) - conflict detection, [Types of 
conflict](/en/ch6#sec_replication_write_conflicts) - conflict resolution, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts) - for multi-region replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc), [The Cost of Linearizability](/en/ch10#sec_linearizability_cost) - linearizability, lack of, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable) - offline-capable clients, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients) - replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)-[Problems with different topologies](/en/ch6#problems-with-different-topologies) - multi-object transactions, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object) - need for, [The need for multi-object transactions](/en/ch8#sec_transactions_need) - Multi-Paxos (consensus algorithm), [Consensus in Practice](/en/ch10#sec_consistency_total_order) - multi-reader single-writer lock, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking) - multi-table index cluster tables (Oracle), [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality) - multi-version concurrency control (MVCC), [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl), [Summary](/en/ch8#summary) - detecting stale MVCC reads, [Detecting stale MVCC reads](/en/ch8#detecting-stale-mvcc-reads) - indexes and snapshot isolation, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation) - using synchronized clocks, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - multidimensional arrays, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - multitenancy, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute), [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - by 
sharding, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy) - using embedded databases, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction) - versus Byzantine fault tolerance, [Byzantine Faults](/en/ch9#sec_distributed_byzantine) - mutual exclusion, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control) - (see also locks) - MySQL (database) - archiving WAL to object stores, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - binlog coordinates, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - change data capture, [Implementing change data capture](/en/ch12#id307), [API support for change streams](/en/ch12#sec_stream_change_api) - circular replication topology, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies) - consistent snapshots, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - distributed transaction support, [XA transactions](/en/ch8#xa-transactions) - global transaction identifiers (GTIDs), [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - in the cloud, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native) - InnoDB storage engine (see InnoDB) - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) - multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - row-based replication, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication) - sharding (see Vitess (database)) - snapshot isolation support, [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion) - (see also InnoDB) - statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication) ### N - N+1 query problem, [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm) - nanomsg (messaging library), [Direct 
messaging from producers to consumers](/en/ch12#id296) - Narayana (transaction coordinator), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc) - NATS (messaging), [Message brokers](/en/ch5#message-brokers) - natural language processing, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake) - Neo4j (database) - Cypher query language, [The Cypher Query Language](/en/ch3#id57) - graph data model, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph) - Neon (database), [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - Nephele (dataflow engine), [Dataflow Engines](/en/ch11#sec_batch_dataflow) - Neptune (graph database), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph) - Cypher query language, [The Cypher Query Language](/en/ch3#id57) - SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language) - netcode (game development), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - Network Attached Storage (NAS), [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - network model (data representation), [Relational Model versus Document Model](/en/ch3#sec_datamodels_history) - Network Time Protocol (see NTP) - networks - congestion and queueing, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - datacenter network topologies, [Cloud Computing Versus Supercomputing](/en/ch1#id17) - faults (see faults) - linearizability and network delays, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays) - network partitions, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults) - in CAP theorem, [The Cost of Linearizability](/en/ch10#sec_linearizability_cost) - timeouts and unbounded delays, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing) - NewSQL, [Relational Model versus Document 
Model](/en/ch3#sec_datamodels_history), [Solutions for Replication Lag](/en/ch6#id131) - transactions and, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal) - next-key locking, [Index-range locks](/en/ch8#sec_transactions_2pl_range) - NFS (network file system), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - on object storage, [Object Stores](/en/ch11#id277) - Nimble (data format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column) - (see also column-oriented storage) - node (in graphs) (see vertices) - nodes (processes), [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Glossary](/en/glossary) - handling outages in leader-based replication, [Handling Node Outages](/en/ch6#sec_replication_failover) - system models for failure, [System Model and Reality](/en/ch9#sec_distributed_system_model) - noisy neighbors, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - nonblocking atomic commit, [Three-phase commit](/en/ch8#three-phase-commit) - nondeterministic operations, [Statement-based replication](/en/ch6#statement-based-replication) - (see also deterministic operations) - in distributed systems, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - in workflow engines, [Durable execution](/en/ch5#durable-execution) - partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure) - sources of nondeterminism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - nonfunctional requirements, [Defining Nonfunctional Requirements](/en/ch2#ch_nonfunctional), [Summary](/en/ch2#summary) - nonrepeatable reads, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation) - (see also read skew) - normalization (data representation), [Normalization, 
Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)-[Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many), [Glossary](/en/glossary) - foreign key references, [The need for multi-object transactions](/en/ch8#sec_transactions_need) - in social network case study, [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study) - in systems of record, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived) - versus denormalization, [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views) - NoSQL, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history), [Solutions for Replication Lag](/en/ch6#id131), [Unbundling Databases](/en/ch13#sec_future_unbundling) - transactions and, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview) - Notation3 (N3), [Triple-Stores and SPARQL](/en/ch3#id59) - NTP (Network Time Protocol), [Unreliable Clocks](/en/ch9#sec_distributed_clocks) - accuracy, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy), [Timestamps for ordering events](/en/ch9#sec_distributed_lww) - adjustments to monotonic clocks, [Monotonic clocks](/en/ch9#monotonic-clocks) - multiple server addresses, [Weak forms of lying](/en/ch9#weak-forms-of-lying) - numbers, in XML and JSON encodings, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json) - NumPy (Python library), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [Column-Oriented Storage](/en/ch4#sec_storage_column) - NVMe (Non-Volatile Memory Express) (see solid state drives (SSDs)) ### O - object databases, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history) - object storage, [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Object Stores](/en/ch11#id277)-[Object Stores](/en/ch11#id277) - Azure Blob Storage (see Azure Blob Storage) - comparison to distributed 
filesystems, [Object Stores](/en/ch11#id277) - comparison to key-value stores, [Object Stores](/en/ch11#id277) - databases backed by, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - for backups, [Replication](/en/ch6#ch_replication) - for cloud data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage) - for database replication, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - Google Cloud Storage (see Google Cloud Storage) - object size, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute) - S3 (see S3 (object storage)) - storing LSM segment files, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - support for fencing, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens) - use in data lakes, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake) - object-relational mapping (ORM) frameworks, [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm) - error handling and aborted transactions, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts) - unsafe read-modify-write cycle code, [Atomic write operations](/en/ch8#atomic-write-operations) - object-relational mismatch, [The Object-Relational Mismatch](/en/ch3#sec_datamodels_document) - observability, [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems), [Humans and Reliability](/en/ch2#id31), [Operability: Making Life Easy for Operations](/en/ch2#id37) - observer pattern, [Separation of application code and state](/en/ch13#id344) - OBT (one big table), [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics), [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics) - offline systems, [Batch Processing](/en/ch11#ch_batch) - (see also batch processing) - offline-first applications, 
[Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps), [Stateful, offline-capable clients](/en/ch13#id347) - offsets - consumer offsets in sharded logs, [Consumer offsets](/en/ch12#sec_stream_log_offsets) - messages in sharded logs, [Using logs for message storage](/en/ch12#id300) - OLAP (online analytic processing), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Glossary](/en/glossary) - data cubes, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views) - OLTP (online transaction processing), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Glossary](/en/glossary) - analytics queries versus, [Analytics](/en/ch11#sec_batch_olap) - data normalization, [Trade-offs of normalization](/en/ch3#trade-offs-of-normalization) - workload characteristics, [Actual Serial Execution](/en/ch8#sec_transactions_serial) - on-premises deployment, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud) - data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - one big table (data warehouse schema), [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics), [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics) - one-hot encoding, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - one-to-few relationships, [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships) - one-to-many relationships, [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships) - JSON representation, [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships) - online systems, [Batch Processing](/en/ch11#ch_batch) - (see also services) - versus scientific computing, [Cloud 
Computing Versus Supercomputing](/en/ch1#id17) - ontologies, [Triple-Stores and SPARQL](/en/ch3#id59) - Oozie (workflow scheduler), [Batch Processing](/en/ch11#ch_batch) - OpenAPI (service definition format), [Microservices and Serverless](/en/ch1#sec_introduction_microservices), [Web services](/en/ch5#sec_web_services) - use of JSON Schema, [JSON Schema](/en/ch5#json-schema) - openCypher (see Cypher (query language)) - OpenLink Virtuoso (see Virtuoso (database)) - OpenStack - Swift (object storage), [Object Stores](/en/ch11#id277) - operability, [Operability: Making Life Easy for Operations](/en/ch2#id37) - operating systems versus databases, [Unbundling Databases](/en/ch13#sec_future_unbundling) - operational systems, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics) - (see also OLTP) - as systems of record, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived) - ETL into analytical systems, [Data Warehousing](/en/ch1#sec_introduction_dwh) - operational transformation, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts) - operations teams, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations) - operators (query execution), [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - in stream processing, [Processing Streams](/en/ch12#sec_stream_processing) - optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control) - optimistic locking, [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set) - Oracle (database) - distributed transaction support, [XA transactions](/en/ch8#xa-transactions) - GoldenGate (change data capture), [Implementing change data capture](/en/ch12#id307) - hierarchical queries, [Graph Queries in SQL](/en/ch3#id58) - lack of serializability,
[Isolation](/en/ch8#sec_transactions_acid_isolation) - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) - multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - multi-table index cluster tables, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality) - not preventing write skew, [Characterizing write skew](/en/ch8#characterizing-write-skew) - PL/SQL language, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - preventing lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates) - read committed isolation, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl) - Real Application Clusters (RAC), [Locking and leader election](/en/ch10#locking-and-leader-election) - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation), [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion) - TimesTen (in-memory database), [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - WAL-based replication, [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping) - ORC (data format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column) - (see also column-oriented storage) - orchestration (service deployment), [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud), [Microservices and Serverless](/en/ch1#sec_introduction_microservices) - batch job execution, [Distributed Job Orchestration](/en/ch11#id278) - workflow engines, [Batch Processing](/en/ch11#ch_batch) - ordering - event logs, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - limits of total ordering, [The limits of total ordering](/en/ch13#id335) - logical
timestamps, [Logical Clocks](/en/ch10#sec_consistency_timestamps) - of auto-incrementing IDs, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical) - shared logs, [Consensus in Practice](/en/ch10#sec_consistency_total_order)-[Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus) - Orkes (workflow engine), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows) - orphan pages (B-trees), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal) - outbox pattern, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing) - outliers (response time), [Average, Median, and Percentiles](/en/ch2#id24) - outsourcing, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud) - overload, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)

### P

- PACELC principle, [The CAP theorem](/en/ch10#the-cap-theorem) - package managers, [Separation of application code and state](/en/ch13#id344) - packet switching, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable) - packets - corruption of, [Weak forms of lying](/en/ch9#weak-forms-of-lying) - sending via UDP, [Direct messaging from producers to consumers](/en/ch12#id296) - PageRank (algorithm), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph), [Query languages](/en/ch11#sec_batch_query_lanauges), [Machine Learning](/en/ch11#id290) - paging (see virtual memory) - pandas (Python library), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [Column-Oriented Storage](/en/ch4#sec_storage_column), [DataFrames](/en/ch11#id287) - Parquet (data format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column), [Archival storage](/en/ch5#archival-storage), [Query
languages](/en/ch11#sec_batch_query_lanauges) - (see also column-oriented storage) - databases on object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - document data model, [Column-Oriented Storage](/en/ch4#sec_storage_column) - use in batch processing, [MapReduce](/en/ch11#sec_batch_mapreduce) - partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure), [Summary](/en/ch9#summary) - limping, [System Model and Reality](/en/ch9#sec_distributed_system_model) - partial synchrony (system model), [System Model and Reality](/en/ch9#sec_distributed_system_model) - partition key, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons), [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value) - partitioning (see sharding) - Paxos (consensus algorithm), [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order) - ballot number, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus) - Multi-Paxos, [Consensus in Practice](/en/ch10#sec_consistency_total_order) - payment card industry (PCI), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance) - PCI (payment card industry) compliance, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance) - percentiles, [Average, Median, and Percentiles](/en/ch2#id24), [Glossary](/en/glossary) - calculating efficiently, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla) - importance of high percentiles, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla) - use in service level agreements (SLAs), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla) - Percolator (Google), [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator) - Percona XtraBackup (MySQL tool), [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - performance - degradation as fault, [System Model and 
Reality](/en/ch9#sec_distributed_system_model) - describing, [Describing Performance](/en/ch2#sec_introduction_percentiles) - of distributed transactions, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa) - of in-memory databases, [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - of linearizability, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays) - of multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - permission isolation, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy) - perpetual inconsistency, [Timeliness and Integrity](/en/ch13#sec_future_integrity) - pessimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control) - pglogical (PostgreSQL extension), [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - pgvector (vector index), [Vector Embeddings](/en/ch4#id92) - phantoms (transaction isolation), [Phantoms causing write skew](/en/ch8#sec_transactions_phantom) - materializing conflicts, [Materializing conflicts](/en/ch8#materializing-conflicts) - preventing, in serializability, [Predicate locks](/en/ch8#predicate-locks) - physical clocks (see clocks) - pickle (Python), [Language-Specific Formats](/en/ch5#id96) - Pinot (database), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Column-Oriented Storage](/en/ch4#sec_storage_column) - handling writes, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage) - pre-aggregation, [Analytics](/en/ch11#sec_batch_olap) - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived) - pipelined execution - in data warehouse queries, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - pivot table, [DataFrames,
Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - point in time, [Unreliable Clocks](/en/ch9#sec_distributed_clocks) - point query, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp) - Polaris (data catalog), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - polling, [Representing Users, Posts, and Follows](/en/ch2#id20) - polystores, [The meta-database of everything](/en/ch13#id341) - POSIX (portable operating system interface) - compliant filesystems, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Distributed Filesystems](/en/ch11#sec_batch_dfs), [Object Stores](/en/ch11#id277) - Post Office Horizon scandal, [Humans and Reliability](/en/ch2#id31) - lack of transactions, [Transactions](/en/ch8#ch_transactions) - PostgreSQL (database) - archiving WAL to object stores, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - change data capture, [Implementing change data capture](/en/ch12#id307), [API support for change streams](/en/ch12#sec_stream_change_api) - distributed transaction support, [XA transactions](/en/ch8#xa-transactions) - foreign data wrappers, [The meta-database of everything](/en/ch13#id341) - full text search support, [Combining Specialized Tools by Deriving Data](/en/ch13#id442) - in the cloud, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native) - JSON Schema validation, [JSON Schema](/en/ch5#json-schema) - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) - log sequence number, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - logical decoding, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication) - materialized view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view) - multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - MVCC implementation, [Multi-version concurrency control 
(MVCC)](/en/ch8#sec_transactions_snapshot_impl), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation) - partitioning vs. sharding, [Sharding](/en/ch7#ch_sharding) - pgvector (vector index), [Vector Embeddings](/en/ch4#id92) - PL/pgSQL language, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - PostGIS geospatial indexes, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional) - preventing lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates) - preventing write skew, [Characterizing write skew](/en/ch8#characterizing-write-skew), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi) - read committed isolation, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl) - representing graphs, [Property Graphs](/en/ch3#id56) - serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi) - sharding (see Citus (database)) - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation), [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion) - WAL-based replication, [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping) - postings list, [Full-Text Search](/en/ch4#sec_storage_full_text) - in sharded indexes, [Local Secondary Indexes](/en/ch7#id166) - postmortems, blameless, [Humans and Reliability](/en/ch2#id31) - PouchDB (database), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - Power BI (business intelligence software), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Analytics](/en/ch11#sec_batch_olap) - pre-aggregation, [Analytics](/en/ch11#sec_batch_olap) - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived) - pre-splitting, [Rebalancing 
key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data) - Precision Time Protocol (PTP), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy) - predicate locks, [Predicate locks](/en/ch8#predicate-locks) - predictive analytics, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics), [Predictive Analytics](/en/ch14#id369)-[Feedback Loops](/en/ch14#id372) - amplifying bias, [Bias and Discrimination](/en/ch14#id370) - ethics of (see ethics) - feedback loops, [Feedback Loops](/en/ch14#id372) - preemption, [Resource Allocation](/en/ch11#id279) - in distributed schedulers, [Handling Faults](/en/ch11#id281) - of threads, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses) - Prefect (workflow scheduler), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows), [Batch Processing](/en/ch11#ch_batch), [Scheduling Workflows](/en/ch11#sec_batch_workflows) - cloud data warehouse integration, [Query languages](/en/ch11#sec_batch_query_lanauges) - Presto (query engine), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - primary keys, [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn), [Glossary](/en/glossary) - auto-incrementing, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical) - versus partition key, [Sharding by hash range](/en/ch7#sharding-by-hash-range) - primary-backup replication (see leader-based replication) - privacy, [Privacy and Tracking](/en/ch14#id373)-[Legislation and Self-Regulation](/en/ch14#sec_future_legislation) - consent and freedom of choice, [Consent and Freedom of Choice](/en/ch14#id375) - data as assets and power, [Data as Assets and Power](/en/ch14#id376) - deleting data, [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations) - ethical considerations (see ethics) - legislation and self-regulation, [Legislation and Self-Regulation](/en/ch14#sec_future_legislation) - meaning of, [Privacy and Use of 
Data](/en/ch14#id457) - regulation, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance) - surveillance, [Surveillance](/en/ch14#id374) - tracking behavioral data, [Privacy and Tracking](/en/ch14#id373) - probabilistic algorithms, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla), [Stream analytics](/en/ch12#id318) - process pauses, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact) - processing time (of events), [Reasoning About Time](/en/ch12#sec_stream_time) - producers (message streams), [Transmitting Event Streams](/en/ch12#sec_stream_transmit) - product analytics, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp) - column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column) - programming languages - for stored procedures, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - projections (event sourcing), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - Prolog (language), [Datalog: Recursive Relational Queries](/en/ch3#id62) - (see also Datalog) - property graphs, [Property Graphs](/en/ch3#id56) - Cypher query language, [The Cypher Query Language](/en/ch3#id57) - Property Graph Query Language (PGQL), [Graph Queries in SQL](/en/ch3#id58) - property-based testing, [Humans and Reliability](/en/ch2#id31), [Formal Methods and Randomized Testing](/en/ch9#sec_distributed_formal) - Protocol Buffers (data format), [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution) - field tags and schema evolution, [Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution) - provenance of data, [Designing for auditability](/en/ch13#id365) - publish/subscribe model, [Messaging Systems](/en/ch12#sec_stream_messaging) - publishers (message
streams), [Transmitting Event Streams](/en/ch12#sec_stream_transmit) - Pulsar (streaming platform), [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering) - PyTorch (machine learning library), [Machine Learning](/en/ch11#id290)

### Q

- Qpid (messaging), [Message brokers compared to databases](/en/ch12#id297) - quality of service (QoS), [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable) - Quantcast File System (distributed filesystem), [Object Stores](/en/ch11#id277) - query engines - compilation and vectorization, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - in cloud data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - operators, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - optimizing declarative queries, [Data Models and Query Languages](/en/ch3#ch_datamodels) - query languages - Cypher, [The Cypher Query Language](/en/ch3#id57) - Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62) - GraphQL, [GraphQL](/en/ch3#id63) - MongoDB aggregation pipeline, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization), [Query languages for documents](/en/ch3#query-languages-for-documents) - recursive SQL queries, [Graph Queries in SQL](/en/ch3#id58) - SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language) - SQL, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization) - query optimizers, [Query languages](/en/ch11#sec_batch_query_lanauges) - query plans, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - queueing delays, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - head-of-line blocking, [Latency and Response Time](/en/ch2#id23) - latency and response time, [Latency and Response Time](/en/ch2#id23) - queues (messaging), [Message brokers](/en/ch5#message-brokers) - QUIC
(protocol), [The Limitations of TCP](/en/ch9#sec_distributed_tcp) - quorums, [Quorums for reading and writing](/en/ch6#sec_replication_quorum_condition)-[Multi-region operation](/en/ch6#multi-region-operation), [Glossary](/en/glossary) - for leaderless replication, [Quorums for reading and writing](/en/ch6#sec_replication_quorum_condition) - in consensus algorithms, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus) - limitations of consistency, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations)-[Monitoring staleness](/en/ch6#monitoring-staleness), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable) - making decisions in distributed systems, [The Majority Rules](/en/ch9#sec_distributed_majority) - monitoring staleness, [Monitoring staleness](/en/ch6#monitoring-staleness) - multi-region replication, [Multi-region operation](/en/ch6#multi-region-operation) - relying on durability, [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world) - quotas, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)

### R

- R (language), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [DataFrames](/en/ch11#id287) - R-trees (indexes), [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional) - R2 (object storage), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - RabbitMQ (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297) - quorum queues (replication), [Single-Leader Replication](/en/ch6#sec_replication_leader) - race conditions, [Isolation](/en/ch8#sec_transactions_acid_isolation) - (see also concurrency) - avoiding with linearizability, [Cross-channel timing
dependencies](/en/ch10#cross-channel-timing-dependencies) - caused by dual writes, [Keeping Systems in Sync](/en/ch12#sec_stream_sync) - causing loss of money, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels) - dirty writes, [No dirty writes](/en/ch8#sec_transactions_dirty_write) - in counter increments, [No dirty writes](/en/ch8#sec_transactions_dirty_write) - lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication) - preventing with event logs, [Concurrency control](/en/ch12#sec_stream_concurrency), [Dataflow: Interplay between state changes and application code](/en/ch13#id450) - preventing with serializable isolation, [Serializability](/en/ch8#sec_transactions_serializability) - weak transaction isolation, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels) - write skew, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts) - Raft (consensus algorithm), [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order) - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) - sensitivity to network problems, [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus) - term number, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus) - use in etcd, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable) - RAID (Redundant Array of Independent Disks), [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute), [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - railways, schema migration on, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing) - RAM (see 
memory) - RAMCloud (in-memory storage), [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - random writes (access pattern), [Sequential versus random writes](/en/ch4#sidebar_sequential) - range queries - in B-trees, [B-Trees](/en/ch4#sec_storage_b_trees), [Read performance](/en/ch4#read-performance) - in LSM-trees, [Read performance](/en/ch4#read-performance) - not efficient in hash maps, [Log-Structured Storage](/en/ch4#sec_storage_log_structured) - with hash sharding, [Sharding by hash range](/en/ch7#sharding-by-hash-range) - ranking algorithms, [Machine Learning](/en/ch11#id290) - Ray (workflow scheduler), [Machine Learning](/en/ch11#id290) - RDF (Resource Description Framework), [The RDF data model](/en/ch3#the-rdf-data-model) - querying with SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language) - RDMA (Remote Direct Memory Access), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Cloud Computing Versus Supercomputing](/en/ch1#id17) - React (user interface library), [End-to-end event streams](/en/ch13#id349) - reactive programming, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - read committed isolation level, [Read Committed](/en/ch8#sec_transactions_read_committed)-[Implementing read committed](/en/ch8#sec_transactions_read_committed_impl) - implementing, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl) - multi-version concurrency control (MVCC), [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl) - no dirty reads, [No dirty reads](/en/ch8#no-dirty-reads) - no dirty writes, [No dirty writes](/en/ch8#sec_transactions_dirty_write) - read models (event sourcing), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - read path (derived data), [Observing Derived State](/en/ch13#sec_future_observing) - read repair (leaderless replication), [Catching up on missed writes](/en/ch6#sec_replication_read_repair) - for linearizability, 
[Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable) - read replicas (see leader-based replication) - read skew (transaction isolation), [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation), [Summary](/en/ch8#summary) - read uncommitted isolation level, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl) - read-after-write consistency, [Reading Your Own Writes](/en/ch6#sec_replication_ryw), [Timeliness and Integrity](/en/ch13#sec_future_integrity) - cross-device, [Reading Your Own Writes](/en/ch6#sec_replication_ryw) - in derived data systems, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions) - read-modify-write cycle, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update) - read-scaling architecture, [Problems with Replication Lag](/en/ch6#sec_replication_lag), [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf) - versus sharding, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons) - reads as events, [Reads are events too](/en/ch13#sec_future_read_events) - real-time - analytics (see product analytics) - collaborative editing, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps) - publish/subscribe dataflow, [End-to-end event streams](/en/ch13#id349) - response time guarantees, [Response time guarantees](/en/ch9#sec_distributed_clocks_realtime) - time-of-day clocks, [Time-of-day clocks](/en/ch9#time-of-day-clocks) - Realm (database), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - rebalancing shards, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)-[Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations), [Glossary](/en/glossary) - (see also sharding) - automatic or manual rebalancing, [Operations: Automatic or Manual 
Rebalancing](/en/ch7#sec_sharding_operations) - fixed number of shards, [Fixed number of shards](/en/ch7#fixed-number-of-shards) - fixed number of shards per node, [Sharding by hash range](/en/ch7#sharding-by-hash-range) - problems with hash mod N, [Hash modulo number of nodes](/en/ch7#hash-modulo-number-of-nodes) - recency guarantee, [Linearizability](/en/ch10#sec_consistency_linearizability) - recommendation engines, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics) - building using DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - iterative processing, [Machine Learning](/en/ch11#id290) - reconfiguration (consensus), [Subtleties of consensus](/en/ch10#subtleties-of-consensus) - records, [MapReduce](/en/ch11#sec_batch_mapreduce) - events in stream processing, [Transmitting Event Streams](/en/ch12#sec_stream_transmit) - recursive queries - in Cypher, [The Cypher Query Language](/en/ch3#id57) - in Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62) - in SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language) - lack of, in GraphQL, [GraphQL](/en/ch3#id63) - SQL common table expressions, [Graph Queries in SQL](/en/ch3#id58) - Red Hat - Apicurio Registry, [JSON Schema](/en/ch5#json-schema) - red-black tree, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - redelivery (messaging), [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering) - Redis (database) - atomic operations, [Atomic write operations](/en/ch8#atomic-write-operations) - CRDT support, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts) - durability, [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - Lua scripting, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - process-per-core model, [Pros and Cons of 
Sharding](/en/ch7#sec_sharding_reasons) - single-threaded execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial) - redo log (see write-ahead log) - Redpanda (messaging), [Message brokers](/en/ch5#message-brokers), [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - tiered storage, [Disk space usage](/en/ch12#sec_stream_disk_usage) - Redshift (database), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - redundancy - hardware components, [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy) - of derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived) - (see also derived data) - Reed–Solomon codes (error correction), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - refactoring, [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability) - (see also evolvability) - regions (geographic distribution), [Reading Your Own Writes](/en/ch6#sec_replication_ryw) - (see also datacenters) - consensus across, [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus) - definition, [Reading Your Own Writes](/en/ch6#sec_replication_ryw) - latency, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed) - linearizable ID generation, [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator) - replication across, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)-[Problems with different topologies](/en/ch6#problems-with-different-topologies), [The Cost of Linearizability](/en/ch10#sec_linearizability_cost), [The limits of total ordering](/en/ch13#id335) - leaderless, [Multi-region operation](/en/ch6#multi-region-operation) - multi-leader, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - regions (sharding), [Sharding](/en/ch7#ch_sharding) - register (data structure), [What Makes a System
Linearizable?](/en/ch10#sec_consistency_lin_definition) - regulation (see legal matters) - relational data model, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases) - comparison to document model, [When to Use Which Model](/en/ch3#sec_datamodels_document_summary)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases) - graph queries in SQL, [Graph Queries in SQL](/en/ch3#id58) - in-memory databases with, [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - many-to-one and many-to-many relationships, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many) - multi-object transactions, need for, [The need for multi-object transactions](/en/ch8#sec_transactions_need) - object-relational mismatch, [The Object-Relational Mismatch](/en/ch3#sec_datamodels_document) - representing a reorderable list, [When to Use Which Model](/en/ch3#sec_datamodels_document_summary) - versus document model - convergence of models, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases) - data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality) - relational databases - eventual consistency, [Problems with Replication Lag](/en/ch6#sec_replication_lag) - history, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history) - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) - logical logs, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication) - philosophy compared to Unix, [Unbundling Databases](/en/ch13#sec_future_unbundling), [The meta-database of everything](/en/ch13#id341) - schema changes, [Schema flexibility in the document 
model](/en/ch3#sec_datamodels_schema_flexibility), [Encoding and Evolution](/en/ch5#ch_encoding), [Different values written at different times](/en/ch5#different-values-written-at-different-times)
  - sharded secondary indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)
  - statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication)
  - use of B-tree indexes, [B-Trees](/en/ch4#sec_storage_b_trees)
- relationships (see edges)
- reliability, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)-[Humans and Reliability](/en/ch2#id31), [A Philosophy of Streaming Systems](/en/ch13#ch_philosophy)
  - building a reliable system from unreliable components, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
  - hardware faults, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
  - human errors, [Humans and Reliability](/en/ch2#id31)
  - importance of, [Humans and Reliability](/en/ch2#id31)
  - of messaging systems, [Messaging Systems](/en/ch12#sec_stream_messaging)
  - software faults, [Software faults](/en/ch2#software-faults)
- Remote Method Invocation (Java RMI), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- remote procedure calls (RPCs), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
  - (see also services)
  - data encoding and evolution, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
  - issues with, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
  - using Avro, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
  - versus message brokers, [Event-Driven Architectures](/en/ch5#sec_encoding_dataflow_msg)
- renewable energy, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- repeatable reads
(transaction isolation), [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- replicas, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- replication, [Replication](/en/ch6#ch_replication)-[Summary](/en/ch6#summary), [Glossary](/en/glossary)
  - and durability, [Durability](/en/ch8#durability)
  - conflict resolution and, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
  - consistency properties, [Problems with Replication Lag](/en/ch6#sec_replication_lag)-[Solutions for Replication Lag](/en/ch6#id131)
    - consistent prefix reads, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix)
    - monotonic reads, [Monotonic Reads](/en/ch6#sec_replication_monotonic_reads)
    - reading your own writes, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
  - in distributed filesystems, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
  - leaderless, [Leaderless Replication](/en/ch6#sec_replication_leaderless)-[Version vectors](/en/ch6#version-vectors)
    - detecting concurrent writes, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)-[Version vectors](/en/ch6#version-vectors)
    - limitations of quorum consistency, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations)-[Monitoring staleness](/en/ch6#monitoring-staleness), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
    - monitoring staleness, [Monitoring staleness](/en/ch6#monitoring-staleness)
  - multi-leader, [Multi-Leader Replication](/en/ch6#sec_replication_multi_leader)-[Types of conflict](/en/ch6#sec_replication_write_conflicts)
    - across multiple regions, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc), [The Cost of Linearizability](/en/ch10#sec_linearizability_cost)
    - conflict resolution, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts)-[Types of
conflict](/en/ch6#sec_replication_write_conflicts)
    - replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)-[Problems with different topologies](/en/ch6#problems-with-different-topologies)
  - reasons for using, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Replication](/en/ch6#ch_replication)
  - sharding and, [Sharding](/en/ch7#ch_sharding)
  - single-leader, [Single-Leader Replication](/en/ch6#sec_replication_leader)-[Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
    - failover, [Leader failure: Failover](/en/ch6#leader-failure-failover)
    - implementation of replication logs, [Implementation of Replication Logs](/en/ch6#sec_replication_implementation)-[Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
    - relation to consensus, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus), [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
    - setting up new followers, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
    - synchronous versus asynchronous, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async)-[Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async)
  - state machine replication, [Statement-based replication](/en/ch6#statement-based-replication), [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs), [Using shared logs](/en/ch10#sec_consistency_smr), [Databases and Streams](/en/ch12#sec_stream_databases)
    - event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
    - reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
    - using consensus, [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
  - using erasure coding, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
  - using object storage, [Setting Up New
Followers](/en/ch6#sec_replication_new_replica)
  - versus backups, [Replication](/en/ch6#ch_replication)
  - with heterogeneous data systems, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
- replication logs (see logs)
- representations of data (see data models)
- reprocessing data, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing), [Unifying batch and stream processing](/en/ch13#id338)
  - (see also evolvability)
  - from log-based messaging, [Replaying old messages](/en/ch12#sec_stream_replay)
- request hedging, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- request identifiers, [Uniquely identifying requests](/en/ch13#id355), [Multi-shard request processing](/en/ch13#id360)
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)-[Request Routing](/en/ch7#sec_sharding_routing)
  - approaches to, [Request Routing](/en/ch7#sec_sharding_routing)
- residence laws for data, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- resilient systems, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)
  - (see also fault tolerance)
- resource isolation, [Cloud Computing Versus Supercomputing](/en/ch1#id17), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- resource limits, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- response time
  - as performance metric, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Batch Processing](/en/ch11#ch_batch)
  - guarantees on, [Response time guarantees](/en/ch9#sec_distributed_clocks_realtime)
  - impact on users, [Average, Median, and Percentiles](/en/ch2#id24)
  - in replicated systems, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
  - latency versus, [Latency and Response Time](/en/ch2#id23)
  - mean and percentiles, [Average, Median, and
Percentiles](/en/ch2#id24)
  - user experience, [Average, Median, and Percentiles](/en/ch2#id24)
- responsibility and accountability, [Responsibility and Accountability](/en/ch14#id371)
- REST (Representational State Transfer), [Web services](/en/ch5#sec_web_services)
  - (see also services)
- Restate (workflow engine), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- RethinkDB (database)
  - join support, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
  - key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- retry storm, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Software faults](/en/ch2#software-faults)
- reverse ETL, [Beyond the data lake](/en/ch1#beyond-the-data-lake)
- Riak (database)
  - CRDT support, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts), [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)
  - dotted version vectors, [Version vectors](/en/ch6#version-vectors)
  - gossip protocol, [Request Routing](/en/ch7#sec_sharding_routing)
  - hash sharding, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
  - leaderless replication, [Leaderless Replication](/en/ch6#sec_replication_leaderless)
  - linearizability, lack of, [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
  - multi-region support, [Multi-region operation](/en/ch6#multi-region-operation)
  - rebalancing, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
  - secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
  - sloppy quorums, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
  - vnodes (sharding), [Sharding](/en/ch7#ch_sharding)
- ring buffers, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- RisingWave (database)
  - incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- rockets, [Byzantine
Faults](/en/ch9#sec_distributed_byzantine)
- RocksDB (storage engine), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
  - as embedded storage engine, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
  - leveled compaction, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
  - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- rollbacks (transactions), [Transactions](/en/ch8#ch_transactions)
- rolling upgrades, [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy), [Encoding and Evolution](/en/ch5#ch_encoding), [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
  - in a multitenant system, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- routing (see request routing)
- row-based replication, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- row-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- rowhammer (memory corruption), [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
- RPCs (see remote procedure calls)
- rules (Datalog), [Datalog: Recursive Relational Queries](/en/ch3#id62)
- Rust (programming language)
  - memory management, [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)

### S

- S3 (object storage), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Batch Processing](/en/ch11#ch_batch), [Distributed Filesystems](/en/ch11#sec_batch_dfs), [Object Stores](/en/ch11#id277)
  - checking data integrity, [Don't just blindly trust what they promise](/en/ch13#id364)
  - conditional writes, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
  - object size, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
  - S3 Express One Zone, [Object Stores](/en/ch11#id277), [Object
Stores](/en/ch11#id277)
  - use in MapReduce, [MapReduce](/en/ch11#sec_batch_mapreduce)
  - workflow example, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- SaaS (see software as a service (SaaS))
- safety and liveness properties, [Safety and liveness](/en/ch9#sec_distributed_safety_liveness)
  - in consensus algorithms, [Single-value consensus](/en/ch10#single-value-consensus)
  - in transactions, [Transactions](/en/ch8#ch_transactions)
- sagas (see compensating transactions)
- Samza (stream processor), [Stream analytics](/en/ch12#id318)
- SAP HANA (database), [Data Storage for Analytics](/en/ch4#sec_storage_analytics)
- scalability, [Scalability](/en/ch2#sec_introduction_scalability)-[Principles for Scalability](/en/ch2#id35), [A Philosophy of Streaming Systems](/en/ch13#ch_philosophy)
  - auto-scaling, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
  - by sharding, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
  - describing load, [Describing Load](/en/ch2#id33)
  - describing performance, [Describing Performance](/en/ch2#sec_introduction_percentiles)
  - linear, [Describing Load](/en/ch2#id33)
  - principles for, [Principles for Scalability](/en/ch2#id35)
  - replication and, [Problems with Replication Lag](/en/ch6#sec_replication_lag)
  - scaling up versus scaling out, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing)
- scaling out, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing)
  - (see also shared-nothing architecture)
  - by sharding, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- scaling up, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing)
- SCD (slowly changing dimension), [Time-dependence of joins](/en/ch12#sec_stream_join_time)
- scheduling
  - algorithms, [Resource Allocation](/en/ch11#id279)
  - batch jobs, [Distributed Job
Orchestration](/en/ch11#id278)-[Scheduling Workflows](/en/ch11#sec_batch_workflows)
  - gang scheduling, [Resource Allocation](/en/ch11#id279)
- schema-on-read, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
  - comparison to evolvable schema, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- schema-on-write, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- schemaless databases (see schema-on-read)
- schemas, [Glossary](/en/glossary)
  - Avro, [Avro](/en/ch5#sec_encoding_avro)-[Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
    - reader determining writer's schema, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
    - schema evolution, [The writer's schema and the reader's schema](/en/ch5#the-writers-schema-and-the-readers-schema)
  - dynamically generated, [Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
  - evolution of, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
    - affecting application code, [Encoding and Evolution](/en/ch5#ch_encoding)
    - compatibility checking, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
    - in databases, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)-[Archival storage](/en/ch5#archival-storage)
    - in service calls, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
  - flexibility in document model, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
  - for analytics, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)-[Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
  - for JSON and XML, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json), [JSON Schema](/en/ch5#json-schema)
  - generation and migration using ORMs, [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm)
  - merits of, [The
Merits of Schemas](/en/ch5#sec_encoding_schemas)
  - migration, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
  - Protocol Buffers, [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
    - schema evolution, [Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
  - schema migration on railways, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
  - traditional approach to design, fallacy in, [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
- scientific computing, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- scikit-learn (Python library), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- ScyllaDB (database)
  - cluster metadata, [Request Routing](/en/ch7#sec_sharding_routing)
  - consistency level ANY, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
  - hash-range sharding, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash), [Sharding by hash range](/en/ch7#sharding-by-hash-range)
  - last-write-wins conflict resolution, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)
  - leaderless replication, [Leaderless Replication](/en/ch6#sec_replication_leaderless)
  - lightweight transactions, [Single-object writes](/en/ch8#sec_transactions_single_object)
  - linearizability, lack of, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
  - log-structured storage, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
  - multi-region support, [Multi-region operation](/en/ch6#multi-region-operation)
  - use of clocks, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations), [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
  - vnodes (sharding), [Sharding](/en/ch7#ch_sharding)
- search engines (see full-text
search)
- searching on streams, [Search on streams](/en/ch12#id320)
- secondaries (see leader-based replication)
- secondary indexes, [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn), [Glossary](/en/glossary)
  - for many-to-many relationships, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many)
  - problems with dual writes, [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Reasoning about dataflows](/en/ch13#id443)
  - sharding, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)-[Global Secondary Indexes](/en/ch7#id167), [Summary](/en/ch7#summary)
    - global, [Global Secondary Indexes](/en/ch7#id167)
    - index maintenance, [Maintaining derived state](/en/ch13#id446)
    - local, [Local Secondary Indexes](/en/ch7#id166)
  - updating, transaction isolation and, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
- secondary sort (MapReduce), [JOIN and GROUP BY](/en/ch11#sec_batch_join)
- sed (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
- self-hosting, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)
  - data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- self-joins, [Summary](/en/ch12#id332)
- self-validating systems, [Don't just blindly trust what they promise](/en/ch13#id364)
- semantic search, [Vector Embeddings](/en/ch4#id92)
- semantic similarity, [Vector Embeddings](/en/ch4#id92)
- semantic web, [Triple-Stores and SPARQL](/en/ch3#id59)
- semi-synchronous replication, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async)
- sequential writes (access pattern), [Sequential versus random writes](/en/ch4#sidebar_sequential)
- serializability, [Isolation](/en/ch8#sec_transactions_acid_isolation), [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels), [Serializability](/en/ch8#sec_transactions_serializability)-[Performance of serializable snapshot
isolation](/en/ch8#performance-of-serializable-snapshot-isolation), [Glossary](/en/glossary)
  - linearizability versus, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
  - pessimistic versus optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
  - serial execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)-[Summary of serial execution](/en/ch8#summary-of-serial-execution)
    - sharding, [Sharding](/en/ch8#sharding)
    - using stored procedures, [Encapsulating transactions in stored procedures](/en/ch8#encapsulating-transactions-in-stored-procedures), [Using shared logs](/en/ch10#sec_consistency_smr)
  - serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
    - detecting stale MVCC reads, [Detecting stale MVCC reads](/en/ch8#detecting-stale-mvcc-reads)
    - detecting writes that affect prior reads, [Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
    - distributed execution, [Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
    - performance of SSI, [Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
    - preventing write skew, [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)-[Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
  - strict serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
  - timeliness vs.
integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
  - two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
    - index-range locks, [Index-range locks](/en/ch8#sec_transactions_2pl_range)
    - performance, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
- Serializable (Java), [Language-Specific Formats](/en/ch5#id96)
- serialization, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
  - (see also encoding)
- serverless, [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- service discovery, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery), [Request Routing](/en/ch7#sec_sharding_routing), [Service discovery](/en/ch10#service-discovery)
  - registration, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
  - using DNS, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery), [Request Routing](/en/ch7#sec_sharding_routing), [Service discovery](/en/ch10#service-discovery)
- service level agreements (SLAs), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla), [Describing Load](/en/ch2#id33)
- service mesh, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
- Service Organization Control (SOC), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- service time, [Latency and Response Time](/en/ch2#id23)
- service-oriented architecture (SOA), [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
  - (see also services)
- services, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
  - microservices, [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
    - causal dependencies across services,
[The limits of total ordering](/en/ch13#id335)
    - loose coupling, [Making unbundling work](/en/ch13#sec_future_unbundling_favor)
    - relation to batch/stream processors, [Batch Processing](/en/ch11#ch_batch), [Stream processors and services](/en/ch13#id345)
  - remote procedure calls (RPCs), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
    - issues with, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
  - similarity to databases, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)
  - web services, [Web services](/en/ch5#sec_web_services)
- session windows (stream processing), [Types of windows](/en/ch12#id324)
  - (see also windows)
- sharding, [Sharding](/en/ch7#ch_sharding)-[Summary](/en/ch7#summary), [Glossary](/en/glossary)
  - and consensus, [Using shared logs](/en/ch10#sec_consistency_smr)
  - and replication, [Sharding](/en/ch7#ch_sharding)
  - distributed transactions across shards, [Distributed Transactions](/en/ch8#sec_transactions_distributed)
  - hot shards, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
  - in batch processing, [Batch Processing](/en/ch11#ch_batch)
  - key-range splitting, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)
  - multi-shard operations, [Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
    - enforcing constraints, [Multi-shard request processing](/en/ch13#id360)
    - secondary index maintenance, [Maintaining derived state](/en/ch13#id446)
  - of key-value data, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)-[Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
    - by key range, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
    - skew and hot spots, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
  - origin of the term, [Sharding](/en/ch7#ch_sharding)
  - partition
key, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons), [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
  - rebalancing
    - of key-range sharded data, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)
  - rebalancing shards, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)-[Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
    - automatic or manual rebalancing, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
    - problems with hash mod N, [Hash modulo number of nodes](/en/ch7#hash-modulo-number-of-nodes)
    - using fixed number of shards, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
    - using N shards per node, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
  - request routing, [Request Routing](/en/ch7#sec_sharding_routing)-[Request Routing](/en/ch7#sec_sharding_routing)
  - secondary indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)-[Global Secondary Indexes](/en/ch7#id167)
    - global, [Global Secondary Indexes](/en/ch7#id167)
    - local, [Local Secondary Indexes](/en/ch7#id166)
  - serial execution of transactions and, [Sharding](/en/ch8#sharding)
  - sorting sharded data, [Shuffling Data](/en/ch11#sec_shuffle)
- shared logs, [Consensus in Practice](/en/ch10#sec_consistency_total_order)-[Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus), [The limits of total ordering](/en/ch13#id335), [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
  - algorithms, [Consensus in Practice](/en/ch10#sec_consistency_total_order)
  - for event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
  - for messaging, [Log-based Message Brokers](/en/ch12#sec_stream_log)-[Replaying old messages](/en/ch12#sec_stream_replay)
  - relation to consensus, [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs)
  - using, [Using shared logs](/en/ch10#sec_consistency_smr)
- shared
mode (locks), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- shared-disk architecture, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- shared-memory architecture, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing)
- shared-nothing architecture, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing), [Glossary](/en/glossary)
  - distributed filesystems, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
    - (see also distributed filesystems)
  - use of network, [Unreliable Networks](/en/ch9#sec_distributed_networks)
- sharks
  - biting undersea cables, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
  - counting (example), [Query languages for documents](/en/ch3#query-languages-for-documents)
- shredding (deletion) (see crypto-shredding)
- shredding (in columnar encoding), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- shredding (in relational model), [When to Use Which Model](/en/ch3#sec_datamodels_document_summary)
- shuffle (batch processing), [Shuffling Data](/en/ch11#sec_shuffle)-[Shuffling Data](/en/ch11#sec_shuffle)
- siblings (concurrent values), [Manual conflict resolution](/en/ch6#manual-conflict-resolution), [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship), [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
  - (see also conflicts)
- silo, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- similarity search
  - edit distance, [Full-Text Search](/en/ch4#sec_storage_full_text)
  - genome data, [Summary](/en/ch3#summary)
- simplicity, [Simplicity: Managing Complexity](/en/ch2#id38)
- Singer, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- single-instruction-multi-data (SIMD) instructions, [Query Execution: Compilation and
Vectorization](/en/ch4#sec_storage_vectorized)
- single-leader replication (see leader-based replication)
- single-threaded execution, [Atomic write operations](/en/ch8#atomic-write-operations), [Actual Serial Execution](/en/ch8#sec_transactions_serial)
  - in stream processing, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Concurrency control](/en/ch12#sec_stream_concurrency), [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- SingleStore (database)
  - in-memory storage, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- site reliability engineer, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- size-tiered compaction, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [Disk space usage](/en/ch4#disk-space-usage)
- skew, [Glossary](/en/glossary)
  - clock skew, [Relying on Synchronized Clocks](/en/ch9#sec_distributed_clocks_relying)-[Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
  - in transaction isolation
    - read skew, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation), [Summary](/en/ch8#summary)
    - write skew, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts), [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)-[Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
      - (see also write skew)
  - meanings of, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
  - unbalanced workload, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
    - compensating for, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
    - due to celebrities, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
    - for
time-series data, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- skip list, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- SLA (see service level agreements)
- Slack (group chat)
  - GraphQL example, [GraphQL](/en/ch3#id63)
- SlateDB (database), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- sliding windows (stream processing), [Types of windows](/en/ch12#id324)
  - (see also windows)
- sloppy quorums, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- slowly changing dimension (data warehouses), [Time-dependence of joins](/en/ch12#sec_stream_join_time)
- smearing (leap second adjustments), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- snapshots (databases)
  - as backups, [Replication](/en/ch6#ch_replication)
  - computing derived data, [Creating an index](/en/ch13#id340)
  - in change data capture, [Initial snapshot](/en/ch12#sec_stream_cdc_snapshot)
  - serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
  - setting up a new replica, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
  - snapshot isolation and repeatable read, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)-[Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
    - implementing with MVCC, [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl)
    - indexes and MVCC, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
    - visibility rules, [Visibility rules for observing a consistent snapshot](/en/ch8#sec_transactions_mvcc_visibility)
  - synchronized clocks for global
snapshots, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - Snowflake (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Batch Processing](/en/ch11#ch_batch) - column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column) - handling writes, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage) - sharding and clustering, [Sharding by hash range](/en/ch7#sharding-by-hash-range) - Snowpark, [Query languages](/en/ch11#sec_batch_query_lanauges) - Snowflake (ID generator), [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical) - snowflake schemas, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics) - SOAP (web services), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc) - SOC2 (see Service Organization Control (SOC)) - social graph, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph) - society - responsibility towards, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Legislation and Self-Regulation](/en/ch14#sec_future_legislation) - sociotechnical systems, [Humans and Reliability](/en/ch2#id31) - software as a service (SaaS), [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs), [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud) - ETL from, [Data Warehousing](/en/ch1#sec_introduction_dwh) - multitenancy, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy) - software bugs, [Software faults](/en/ch2#software-faults) - maintaining integrity, [Maintaining integrity in the face of software bugs](/en/ch13#id455) - solar storm, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults) - solid state drives (SSDs) - access patterns, [Sequential versus random writes](/en/ch4#sidebar_sequential) - compared 
to object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - detecting corruption, [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [Don't just blindly trust what they promise](/en/ch13#id364) - failure rate, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults) - faults in, [Durability](/en/ch8#durability) - firmware bugs, [Software faults](/en/ch2#software-faults) - read throughput, [Read performance](/en/ch4#read-performance) - sequential vs. random writes, [Sequential versus random writes](/en/ch4#sidebar_sequential) - Solr (search server) - local secondary indexes, [Local Secondary Indexes](/en/ch7#id166) - request routing, [Request Routing](/en/ch7#sec_sharding_routing) - use of Lucene, [Full-Text Search](/en/ch4#sec_storage_full_text) - sort (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis), [Sorting Versus In-memory Aggregation](/en/ch11#id275), [Distributed Job Orchestration](/en/ch11#id278) - sort-merge joins (MapReduce), [JOIN and GROUP BY](/en/ch11#sec_batch_join) - Sorted String Tables (see SSTables) - sorting - sort order in column storage, [Sort Order in Column Storage](/en/ch4#sort-order-in-column-storage) - source of truth (see systems of record) - Spanner (database) - consistency model, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition) - data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality) - in the cloud, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native) - snapshot isolation using clocks, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner) - transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal) - TrueTime API, [Clock readings with a confidence
interval](/en/ch9#clock-readings-with-a-confidence-interval) - Spark (processing framework), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native), [Batch Processing](/en/ch11#ch_batch), [Dataflow Engines](/en/ch11#sec_batch_dataflow) - cost efficiency, [Query languages](/en/ch11#sec_batch_query_lanauges) - DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [DataFrames](/en/ch11#id287) - fault tolerance, [Handling Faults](/en/ch11#id281) - for data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - high availability using ZooKeeper, [Coordination Services](/en/ch10#sec_consistency_coordination) - MLlib, [Machine Learning](/en/ch11#id290) - query optimizer, [Query languages](/en/ch11#sec_batch_query_lanauges) - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle) - Spark Streaming, [Stream analytics](/en/ch12#id318) - microbatching, [Microbatching and checkpointing](/en/ch12#id329) - streaming SQL support, [Complex event processing](/en/ch12#id317) - use for ETL, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage) - SPARQL (query language), [The SPARQL query language](/en/ch3#the-sparql-query-language) - sparse index, [The SSTable file format](/en/ch4#the-sstable-file-format) - sparse matrices, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - split brain, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Request Routing](/en/ch7#sec_sharding_routing), [Glossary](/en/glossary) - enforcing constraints, [Uniqueness constraints require consensus](/en/ch13#id452) - in consensus algorithms, [Consensus](/en/ch10#sec_consistency_consensus), [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus) - preventing, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable) - using fencing tokens to avoid, 
[Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)-[Fencing with multiple replicas](/en/ch9#fencing-with-multiple-replicas) - spot instances, [Handling Faults](/en/ch11#id281) - spreadsheets, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - dataflow programming, [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow) - pivot table, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - SQL (Structured Query Language), [Simplicity: Managing Complexity](/en/ch2#id38), [Relational Model versus Document Model](/en/ch3#sec_datamodels_history), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - for analytics, [Data Warehousing](/en/ch1#sec_introduction_dwh), [Column-Oriented Storage](/en/ch4#sec_storage_column) - graph queries in, [Graph Queries in SQL](/en/ch3#id58) - isolation levels standard, issues with, [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion) - joins, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization) - résumé (example), [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships) - social network home timelines (example), [Representing Users, Posts, and Follows](/en/ch2#id20) - SQL injection vulnerability, [Byzantine Faults](/en/ch9#sec_distributed_byzantine) - statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication) - stored procedures, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - support in batch processing frameworks, [Batch Processing](/en/ch11#ch_batch) - views, [Datalog: Recursive Relational Queries](/en/ch3#id62) - SQL Server (database) - archiving WAL to object stores, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - change data 
capture, [Implementing change data capture](/en/ch12#id307) - data warehousing support, [Data Storage for Analytics](/en/ch4#sec_storage_analytics) - distributed transaction support, [XA transactions](/en/ch8#xa-transactions) - leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) - multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc) - preventing lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates) - preventing write skew, [Characterizing write skew](/en/ch8#characterizing-write-skew), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking) - read committed isolation, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl) - serializable isolation, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking) - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation) - T-SQL language, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - SQLite (database), [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems), [Compaction strategies](/en/ch4#sec_storage_lsm_compaction) - archiving WAL to object stores, [Setting Up New Followers](/en/ch6#sec_replication_new_replica) - SRE (site reliability engineer), [Operations in the Cloud Era](/en/ch1#sec_introduction_operations) - SSDs (see solid state drives) - SSTables (storage format), [The SSTable file format](/en/ch4#the-sstable-file-format)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction) - constructing and maintaining, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - making LSM-trees from, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - staged rollout (see rolling upgrades) - staleness (old data), [Reading Your Own
Writes](/en/ch6#sec_replication_ryw) - cross-channel timing dependencies, [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies) - in leaderless databases, [Writing to the Database When a Node Is Down](/en/ch6#id287) - in multi-version concurrency control, [Detecting stale MVCC reads](/en/ch8#detecting-stale-mvcc-reads) - monitoring for, [Monitoring staleness](/en/ch6#monitoring-staleness) - of client state, [Pushing state changes to clients](/en/ch13#id348) - versus linearizability, [Linearizability](/en/ch10#sec_consistency_linearizability) - versus timeliness, [Timeliness and Integrity](/en/ch13#sec_future_integrity) - standbys (see leader-based replication) - star replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies) - star schemas, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics) - Star Wars analogy (event time versus processing time), [Event time versus processing time](/en/ch12#id322) - starvation (scheduling), [Resource Allocation](/en/ch11#id279) - state - derived from log of immutable events, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability) - interplay between state changes and application code, [Dataflow: Interplay between state changes and application code](/en/ch13#id450) - maintaining derived state, [Maintaining derived state](/en/ch13#id446) - maintenance by stream processor in stream-stream joins, [Stream-stream join (window join)](/en/ch12#id440) - observing derived state, [Observing Derived State](/en/ch13#sec_future_observing)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard) - rebuilding after stream processor failure, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance) - separation of application code and, [Separation of application code and state](/en/ch13#id344) - state machine replication,
[Statement-based replication](/en/ch6#statement-based-replication), [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs), [Using shared logs](/en/ch10#sec_consistency_smr), [Databases and Streams](/en/ch12#sec_stream_databases) - event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - stateless systems, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs) - statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication) - reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - statically typed languages - analogy to schema-on-write, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility) - statistical and numerical algorithms, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - StatsD (metrics aggregator), [Direct messaging from producers to consumers](/en/ch12#id296) - stock market feeds, [Direct messaging from producers to consumers](/en/ch12#id296) - STONITH (Shoot The Other Node In The Head), [Leader failure: Failover](/en/ch6#leader-failure-failover) - problems with, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens) - stop-the-world (see garbage collection) - storage - composing data storage technologies, [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448) - Storage Area Network (SAN), [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - storage engines, [Storage and Retrieval](/en/ch4#ch_storage)-[Summary](/en/ch4#summary) - column-oriented, [Column-Oriented Storage](/en/ch4#sec_storage_column)-[Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized) - 
column compression, [Column Compression](/en/ch4#sec_storage_column_compression) - defined, [Column-Oriented Storage](/en/ch4#sec_storage_column) - Parquet, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column), [Archival storage](/en/ch5#archival-storage) - sort order in, [Sort Order in Column Storage](/en/ch4#sort-order-in-column-storage) - versus wide-column model, [Column Compression](/en/ch4#sec_storage_column_compression) - writing to, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage) - in-memory storage, [Keeping everything in memory](/en/ch4#sec_storage_inmemory) - durability, [Durability](/en/ch8#durability) - row-oriented, [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp)-[Keeping everything in memory](/en/ch4#sec_storage_inmemory) - B-trees, [B-Trees](/en/ch4#sec_storage_b_trees)-[B-tree variants](/en/ch4#b-tree-variants) - comparing B-trees and LSM-trees, [Comparing B-Trees and LSM-Trees](/en/ch4#sec_storage_btree_lsm_comparison)-[Disk space usage](/en/ch4#disk-space-usage) - defined, [Column-Oriented Storage](/en/ch4#sec_storage_column) - log-structured, [Log-Structured Storage](/en/ch4#sec_storage_log_structured)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction) - stored procedures, [Encapsulating transactions in stored procedures](/en/ch8#encapsulating-transactions-in-stored-procedures)-[Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs), [Glossary](/en/glossary) - and shared logs, [Using shared logs](/en/ch10#sec_consistency_smr) - pros and cons of, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs) - similarity to stream processors, [Application code as a derivation function](/en/ch13#sec_future_dataflow_derivation) - Storm (stream processor),
[Stream analytics](/en/ch12#id318) - distributed RPC, [Event-Driven Architectures and RPC](/en/ch12#sec_stream_actors_drpc), [Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard) - Trident state handling, [Idempotence](/en/ch12#sec_stream_idempotence) - straggler events, [Handling straggler events](/en/ch12#id323) - Stream Control Transmission Protocol (SCTP), [The Limitations of TCP](/en/ch9#sec_distributed_tcp) - stream processing, [Processing Streams](/en/ch12#sec_stream_processing)-[Summary](/en/ch12#id332), [Glossary](/en/glossary) - accessing external services within job, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins), [Microbatching and checkpointing](/en/ch12#id329), [Idempotence](/en/ch12#sec_stream_idempotence), [Exactly-once execution of an operation](/en/ch13#id353) - combining with batch processing, [Unifying batch and stream processing](/en/ch13#id338) - comparison to batch processing, [Processing Streams](/en/ch12#sec_stream_processing) - complex event processing (CEP), [Complex event processing](/en/ch12#id317) - fault tolerance, [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance)-[Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance) - atomic commit, [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit) - idempotence, [Idempotence](/en/ch12#sec_stream_idempotence) - microbatching and checkpointing, [Microbatching and checkpointing](/en/ch12#id329) - rebuilding state after a failure, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance) - for data integration, [Batch and Stream Processing](/en/ch13#sec_future_batch_streaming)-[Unifying batch and stream processing](/en/ch13#id338) - for event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events) - maintaining derived state, [Maintaining derived state](/en/ch13#id446) - maintenance of materialized views, [Maintaining materialized views](/en/ch12#sec_stream_mat_view) - messaging systems 
(see messaging systems) - reasoning about time, [Reasoning About Time](/en/ch12#sec_stream_time)-[Types of windows](/en/ch12#id324) - event time versus processing time, [Event time versus processing time](/en/ch12#id322), [Microbatching and checkpointing](/en/ch12#id329), [Unifying batch and stream processing](/en/ch13#id338) - knowing when window is ready, [Handling straggler events](/en/ch12#id323) - types of windows, [Types of windows](/en/ch12#id324) - relation to databases (see streams) - relation to services, [Stream processors and services](/en/ch13#id345) - relationship to batch processing, [Batch Processing](/en/ch11#ch_batch) - search on streams, [Search on streams](/en/ch12#id320) - single-threaded execution, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Concurrency control](/en/ch12#sec_stream_concurrency) - stream analytics, [Stream analytics](/en/ch12#id318) - stream joins, [Stream Joins](/en/ch12#sec_stream_joins)-[Time-dependence of joins](/en/ch12#sec_stream_join_time) - stream-stream join, [Stream-stream join (window join)](/en/ch12#id440) - stream-table join, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins) - table-table join, [Table-table join (materialized view maintenance)](/en/ch12#id326) - time-dependence of, [Time-dependence of joins](/en/ch12#sec_stream_join_time) - streams, [Stream Processing](/en/ch12#ch_stream)-[Replaying old messages](/en/ch12#sec_stream_replay) - end-to-end, pushing events to clients, [End-to-end event streams](/en/ch13#id349) - messaging systems (see messaging systems) - processing (see stream processing) - relation to databases, [Databases and Streams](/en/ch12#sec_stream_databases)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations) - (see also changelogs) - API support for change streams, [API support for change streams](/en/ch12#sec_stream_change_api) - change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)-[API 
support for change streams](/en/ch12#sec_stream_change_api) - derivative of state by time, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability) - event sourcing, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing) - keeping systems in sync, [Keeping Systems in Sync](/en/ch12#sec_stream_sync) - philosophy of immutable events, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations) - topics, [Transmitting Event Streams](/en/ch12#sec_stream_transmit) - strict serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition) - timeliness vs. integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity) - striping (in columnar encoding), [Column-Oriented Storage](/en/ch4#sec_storage_column) - strong consistency (see linearizability) - strong eventual consistency, [Automatic conflict resolution](/en/ch6#automatic-conflict-resolution) - strong one-copy serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition) - subjects, predicates, and objects (in triple-stores), [Triple-Stores and SPARQL](/en/ch3#id59) - subscribers (message streams), [Transmitting Event Streams](/en/ch12#sec_stream_transmit) - (see also consumers) - supercomputers, [Cloud Computing Versus Supercomputing](/en/ch1#id17) - Superset (data visualization software), [Analytics](/en/ch11#sec_batch_olap) - surveillance, [Surveillance](/en/ch14#id374) - (see also privacy) - sushi principle, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake) - sustainability, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed) - Swagger (service definition format), [Web services](/en/ch5#sec_web_services) - swapping to disk (see virtual memory) - Swift (programming language) - memory management, [Limiting the impact of garbage
collection](/en/ch9#sec_distributed_gc_impact) - sync engines, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients)-[Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - examples of, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines) - for local-first software, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps) - synchronous networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks), [Glossary](/en/glossary) - comparison to asynchronous networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks) - system model, [System Model and Reality](/en/ch9#sec_distributed_system_model) - synchronous replication, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async), [Glossary](/en/glossary) - with multiple leaders, [Multi-Leader Replication](/en/ch6#sec_replication_multi_leader) - system administrator, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations) - system models, [Knowledge, Truth, and Lies](/en/ch9#sec_distributed_truth), [System Model and Reality](/en/ch9#sec_distributed_system_model)-[Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - assumptions in, [Trust, but Verify](/en/ch13#sec_future_verification) - correctness of algorithms, [Defining the correctness of an algorithm](/en/ch9#defining-the-correctness-of-an-algorithm) - mapping to the real world, [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world) - safety and liveness, [Safety and liveness](/en/ch9#sec_distributed_safety_liveness) - systems of record, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Glossary](/en/glossary) - change data capture, [Implementing change data capture](/en/ch12#id307), [Reasoning about dataflows](/en/ch13#id443) - event logs, [Event Sourcing 
and CQRS](/en/ch3#sec_datamodels_events) - treating event log as, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability) - systems thinking, [Feedback Loops](/en/ch14#id372)

### T

- t-digest (algorithm), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla) - table-table joins, [Table-table join (materialized view maintenance)](/en/ch12#id326) - Tableau (data visualization software), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Analytics](/en/ch11#sec_batch_olap) - tail (Unix tool), [Using logs for message storage](/en/ch12#id300) - tail latency (see latency) - tail vertex (property graphs), [Property Graphs](/en/ch3#id56) - task (workflows) (see workflow engines) - TCP (Transmission Control Protocol), [The Limitations of TCP](/en/ch9#sec_distributed_tcp) - comparison to circuit switching, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable) - comparison to UDP, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - connection failures, [Detecting Faults](/en/ch9#id307) - flow control, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing), [Messaging Systems](/en/ch12#sec_stream_messaging) - packet checksums, [Weak forms of lying](/en/ch9#weak-forms-of-lying), [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [Trust, but Verify](/en/ch13#sec_future_verification) - reliability and duplicate suppression, [Duplicate suppression](/en/ch13#id354) - retransmission timeouts, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing) - use for transaction sessions, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object) - Temporal (workflow engine), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows) - TensorFlow (machine learning library), [Machine Learning](/en/ch11#id290) - Teradata (database), [Cloud-Native System
Architecture](/en/ch1#sec_introduction_cloud_native), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses) - term-partitioned indexes (see global secondary indexes) - termination (consensus), [Single-value consensus](/en/ch10#single-value-consensus), [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus) - testing, [Humans and Reliability](/en/ch2#id31) - thrashing (out of memory), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses) - threads (concurrency) - actor model, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks), [Event-Driven Architectures and RPC](/en/ch12#sec_stream_actors_drpc) - (see also event-driven architecture) - atomic operations, [Atomicity](/en/ch8#sec_transactions_acid_atomicity) - background threads, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables) - execution pauses, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses) - memory barriers, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays) - preemption, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses) - single (see single-threaded execution) - three-phase commit, [Three-phase commit](/en/ch8#three-phase-commit) - three-way relationships, [Property Graphs](/en/ch3#id56) - Thrift (data format), [Protocol Buffers](/en/ch5#sec_encoding_protobuf) - throughput, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Describing Load](/en/ch2#id33), [Batch Processing](/en/ch11#ch_batch) - TIBCO, [Message brokers](/en/ch5#message-brokers) - Enterprise Message Service, [Message brokers compared to databases](/en/ch12#id297) - StreamBase (stream analytics), [Complex event processing](/en/ch12#id317) - TiDB (database) - consensus-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader) -
regions (sharding), [Sharding](/en/ch7#ch_sharding) - request routing, [Request Routing](/en/ch7#sec_sharding_routing) - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived) - sharded secondary indexes, [Global Secondary Indexes](/en/ch7#id167) - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation) - timestamp oracle, [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator) - transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal) - use of model-checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages) - tiered storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Disk space usage](/en/ch12#sec_stream_disk_usage) - TigerBeetle (database), [Summary](/en/ch3#summary) - deterministic simulation testing, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing) - TigerGraph (database) - GSQL language, [Graph Queries in SQL](/en/ch3#id58) - Tigris (object storage), [Distributed Filesystems](/en/ch11#sec_batch_dfs) - TileDB (database), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes) - time - concurrency and, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before) - cross-channel timing dependencies, [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies) - in distributed systems, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact) - (see also clocks) - clock synchronization and accuracy, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy) - relying on synchronized clocks, [Relying on Synchronized Clocks](/en/ch9#sec_distributed_clocks_relying)-[Synchronized clocks for global 
snapshots](/en/ch9#sec_distributed_spanner)
    - process pauses, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
  - reasoning about, in stream processors, [Reasoning About Time](/en/ch12#sec_stream_time)-[Types of windows](/en/ch12#id324)
    - event time versus processing time, [Event time versus processing time](/en/ch12#id322), [Microbatching and checkpointing](/en/ch12#id329), [Unifying batch and stream processing](/en/ch13#id338)
    - knowing when window is ready, [Handling straggler events](/en/ch12#id323)
    - timestamp of events, [Whose clock are you using, anyway?](/en/ch12#id438)
    - types of windows, [Types of windows](/en/ch12#id324)
  - system models for distributed systems, [System Model and Reality](/en/ch9#sec_distributed_system_model)
  - time-dependence in stream joins, [Time-dependence of joins](/en/ch12#sec_stream_join_time)
- time series data
  - as DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
  - column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- time-of-day clocks, [Time-of-day clocks](/en/ch9#time-of-day-clocks)
  - hybrid logical clocks, [Hybrid logical clocks](/en/ch10#hybrid-logical-clocks)
- timeliness, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
  - coordination-avoiding data systems, [Coordination-avoiding data systems](/en/ch13#id454)
  - correctness of dataflow systems, [Correctness of dataflow systems](/en/ch13#id453)
- timeouts, [Unreliable Networks](/en/ch9#sec_distributed_networks), [Glossary](/en/glossary)
  - dynamic configuration of, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
  - for failover, [Leader failure: Failover](/en/ch6#leader-failure-failover)
  - length of, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
- TimescaleDB (database), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- timestamps, [Logical Clocks](/en/ch10#sec_consistency_timestamps)
  - assigning to events in stream processing, [Whose clock are you using, anyway?](/en/ch12#id438)
  - for read-after-write consistency, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
  - for transaction ordering, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
  - insufficiency for enforcing constraints, [Enforcing constraints using logical clocks](/en/ch10#enforcing-constraints-using-logical-clocks)
  - key range sharding by, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
  - Lamport, [Lamport timestamps](/en/ch10#lamport-timestamps)
  - logical, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
  - ordering events, [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
  - timestamp oracle, [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator)
- TLA+ (specification language), [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- token bucket (limiting retries), [Describing Performance](/en/ch2#sec_introduction_percentiles)
- tombstones, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Disk space usage](/en/ch4#disk-space-usage), [Log compaction](/en/ch12#sec_stream_log_compaction)
- topics (messaging), [Message brokers](/en/ch5#message-brokers), [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- torn pages (B-trees), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
- total order, [Glossary](/en/glossary)
  - broadcast (see shared logs)
  - limits of, [The limits of total ordering](/en/ch13#id335)
  - on logical timestamps, [Logical Clocks](/en/ch10#sec_consistency_timestamps)
- tracing, [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems)
- tracking behavioral data, [Privacy and Tracking](/en/ch14#id373)
  - (see also privacy)
- trade-offs, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)-[Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- transaction coordinator (see coordinator)
- transaction manager (see coordinator)
- transaction processing, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
  - comparison to analytics, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
  - comparison to data warehousing, [Data Storage for Analytics](/en/ch4#sec_storage_analytics)
- transactions, [Transactions](/en/ch8#ch_transactions)-[Summary](/en/ch8#summary), [Glossary](/en/glossary)
  - ACID properties of, [The Meaning of ACID](/en/ch8#sec_transactions_acid)
    - atomicity, [Atomicity](/en/ch8#sec_transactions_acid_atomicity)
    - consistency, [Consistency](/en/ch8#sec_transactions_acid_consistency)
    - durability, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Durability](/en/ch8#durability)
    - isolation, [Isolation](/en/ch8#sec_transactions_acid_isolation)
  - and derived data integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
  - and replication, [Solutions for Replication Lag](/en/ch6#id131)
  - compensating (see compensating transactions)
  - concept of, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview)
  - distributed transactions, [Distributed Transactions](/en/ch8#sec_transactions_distributed)-[Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
    - avoiding, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions), [Making unbundling work](/en/ch13#sec_future_unbundling_favor), [Enforcing Constraints](/en/ch13#sec_future_constraints)-[Coordination-avoiding data systems](/en/ch13#id454)
    - failure amplification, [Maintaining derived state](/en/ch13#id446)
    - for sharded systems, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
    - in doubt/uncertain status, [Coordinator failure](/en/ch8#coordinator-failure), [Holding locks while in doubt](/en/ch8#holding-locks-while-in-doubt)
    - two-phase commit, [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)-[Three-phase commit](/en/ch8#three-phase-commit)
    - use of, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa)-[Exactly-once message processing](/en/ch8#sec_transactions_exactly_once)
    - XA transactions, [XA transactions](/en/ch8#xa-transactions)-[Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
  - OLTP versus analytics queries, [Analytics](/en/ch11#sec_batch_olap)
  - purpose of, [Transactions](/en/ch8#ch_transactions)
  - serializability, [Serializability](/en/ch8#sec_transactions_serializability)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
    - actual serial execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)-[Summary of serial execution](/en/ch8#summary-of-serial-execution)
    - pessimistic versus optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
    - serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
    - two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
  - single-object and multi-object, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)-[Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
    - handling errors and aborts, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
    - need for multi-object transactions, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
    - single-object writes, [Single-object writes](/en/ch8#sec_transactions_single_object)
  - snapshot isolation (see snapshots)
  - strict serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
  - weak isolation levels, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)-[Materializing conflicts](/en/ch8#materializing-conflicts)
    - preventing lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
    - read committed, [Read Committed](/en/ch8#sec_transactions_read_committed)-[Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- traversal (graphs), [Property Graphs](/en/ch3#id56)
- trie (data structure), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Full-Text Search](/en/ch4#sec_storage_full_text)
  - as SSTable index, [The SSTable file format](/en/ch4#the-sstable-file-format)
- triggers (databases), [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- Trino (data warehouse), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
  - federated databases, [The meta-database of everything](/en/ch13#id341)
  - query optimizer, [Query languages](/en/ch11#sec_batch_query_lanauges)
  - use for ETL, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
  - workflow example, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- triple-stores, [Triple-Stores and SPARQL](/en/ch3#id59)-[The SPARQL query language](/en/ch3#the-sparql-query-language)
  - SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- tumbling windows (stream processing), [Types of windows](/en/ch12#id324)
  - (see also windows)
  - in microbatching, [Microbatching and checkpointing](/en/ch12#id329)
- Turbopuffer (vector search), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- Turtle (RDF data format), [Triple-Stores and SPARQL](/en/ch3#id59)
- Twitter (see X (social network))
- two-phase commit (2PC), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)-[Coordinator failure](/en/ch8#coordinator-failure), [Glossary](/en/glossary)
  - confusion with two-phase locking, [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)
  - coordinator failure, [Coordinator failure](/en/ch8#coordinator-failure)
  - coordinator recovery, [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure)
  - how it works, [A system of promises](/en/ch8#a-system-of-promises)
  - performance cost, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa)
  - problems with XA transactions, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
  - transactions holding locks, [Holding locks while in doubt](/en/ch8#holding-locks-while-in-doubt)
- two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range), [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition), [Glossary](/en/glossary)
  - confusion with two-phase commit, [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)
  - growing and shrinking phases, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
  - index-range locks, [Index-range locks](/en/ch8#sec_transactions_2pl_range)
  - performance of, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
- type checking, dynamic versus static, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)

### U

- UDP (User Datagram Protocol)
  - comparison to TCP, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
  - multicast, [Direct messaging from producers to consumers](/en/ch12#id296)
- Ultima Online (game), [Sharding](/en/ch7#ch_sharding)
- unbounded datasets, [Stream Processing](/en/ch12#ch_stream), [Glossary](/en/glossary)
  - (see also streams)
- unbounded delays, [Glossary](/en/glossary)
  - in networks, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
  - process pauses, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- unbundling databases, [Unbundling Databases](/en/ch13#sec_future_unbundling)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
  - composing data storage technologies, [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
    - federation versus unbundling, [The meta-database of everything](/en/ch13#id341)
  - designing applications around dataflow, [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)-[Stream processors and services](/en/ch13#id345)
  - observing derived state, [Observing Derived State](/en/ch13#sec_future_observing)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
    - materialized views and caching, [Materialized views and caching](/en/ch13#id451)
    - multi-shard data processing, [Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
    - pushing state changes to clients, [Pushing state changes to clients](/en/ch13#id348)
- uncertain (transaction status) (see in doubt)
- union type (in Avro), [Schema evolution rules](/en/ch5#schema-evolution-rules)
- uniq (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis), [Distributed Job Orchestration](/en/ch11#id278)
- uniqueness constraints
  - asynchronously checked, [Loosely interpreted constraints](/en/ch13#id362)
  - requiring consensus, [Uniqueness constraints require consensus](/en/ch13#id452)
  - requiring linearizability, [Constraints and uniqueness guarantees](/en/ch10#sec_consistency_uniqueness)
  - uniqueness in log-based messaging, [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- Unity (data catalog), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- universally unique identifiers (see UUIDs)
- Unix philosophy
  - comparison to relational databases, [Unbundling Databases](/en/ch13#sec_future_unbundling), [The meta-database of everything](/en/ch13#id341)
  - comparison to stream processing, [Processing Streams](/en/ch12#sec_stream_processing)
- Unix pipes, [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
  - compared to distributed batch processing, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- UPDATE statement (SQL), [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- updates
  - preventing lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
    - atomic write operations, [Atomic write operations](/en/ch8#atomic-write-operations)
    - automatically detecting lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates)
    - compare-and-set (CAS), [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set)
    - conflict resolution and replication, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
    - using explicit locking, [Explicit locking](/en/ch8#explicit-locking)
  - preventing write skew, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts)
- utilization
  - batch process scheduling, [Resource Allocation](/en/ch11#id279)
  - increasing through preemption, [Handling Faults](/en/ch11#id281)
  - trade-off with latency, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- uTP protocol (BitTorrent), [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- UUIDs, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)

### V

- validity (consensus), [Single-value consensus](/en/ch10#single-value-consensus), [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
- vBuckets (sharding), [Sharding](/en/ch7#ch_sharding)
- vector clocks, [Version vectors](/en/ch6#version-vectors)
  - (see also version vectors)
  - and Lamport/hybrid logical clocks, [Lamport/hybrid logical clocks versus vector clocks](/en/ch10#lamporthybrid-logical-clocks-vs-vector-clocks)
  - and version vectors, [Version vectors](/en/ch6#version-vectors)
- vector embedding, [Vector Embeddings](/en/ch4#id92)
- vectorized processing, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- vendor lock-in, [Pros and Cons of Cloud Services](/en/ch1#sec_introduction_cloud_tradeoffs)
- Venice (database), [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- verification, [Trust, but Verify](/en/ch13#sec_future_verification)-[Tools for auditable data systems](/en/ch13#id366)
  - avoiding blind trust, [Don't just blindly trust what they promise](/en/ch13#id364)
  - designing for auditability, [Designing for auditability](/en/ch13#id365)
  - end-to-end integrity checks, [The end-to-end argument again](/en/ch13#id456)
  - tools for auditable data systems, [Tools for auditable data systems](/en/ch13#id366)
- version control systems
  - merge conflicts, [Manual conflict resolution](/en/ch6#manual-conflict-resolution)
  - reliance on immutable data, [Concurrency control](/en/ch12#sec_stream_concurrency)
- version vectors, [Problems with different topologies](/en/ch6#problems-with-different-topologies), [Version vectors](/en/ch6#version-vectors)
  - dotted, [Version vectors](/en/ch6#version-vectors)
  - versus vector clocks, [Version vectors](/en/ch6#version-vectors)
- Vertica (database), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
  - handling writes, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
- vertical scaling (see scaling up)
- vertices (in graphs), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
  - property graph model, [Property Graphs](/en/ch3#id56)
- video games, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- video transcoding (example), [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
- views (SQL queries), [Datalog: Recursive Relational Queries](/en/ch3#id62)
  - materialized views (see materialization)
- Viewstamped Replication (consensus algorithm), [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order)
  - use of model-checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
  - view number, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- virtual block device, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- virtual file system, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
  - comparison to distributed filesystems, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- virtual machines, [Layering of cloud services](/en/ch1#layering-of-cloud-services)
  - context switches, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
  - network performance, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
  - noisy neighbors, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
  - virtualized clocks in, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- virtual memory
  - process pauses due to page faults, [Latency and Response Time](/en/ch2#id23), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- Virtuoso (database), [The SPARQL query language](/en/ch3#the-sparql-query-language)
- VisiCalc (spreadsheets), [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)
- Vitess (database)
  - key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- vnodes (sharding), [Sharding](/en/ch7#ch_sharding)
- vocabularies, [Triple-Stores and SPARQL](/en/ch3#id59)
- Voice over IP (VoIP), [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- VoltDB (database)
  - cross-shard serializability, [Sharding](/en/ch8#sharding)
  - deterministic stored procedures, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
  - in-memory storage, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
  - process-per-core model, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
  - secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
  - serial execution of transactions, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
  - statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication), [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
  - transactions in stream processing, [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)

### W

- WAL (write-ahead log), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
- WAL-G (backup tool), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- WarpStream (messaging), [Disk space usage](/en/ch12#sec_stream_disk_usage)
- web services (see services)
- webhooks, [Direct messaging from producers to consumers](/en/ch12#id296)
- webMethods (messaging), [Message brokers](/en/ch5#message-brokers)
- WebSocket (protocol), [Pushing state changes to clients](/en/ch13#id348)
- wide-column data model, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
  - versus column-oriented storage, [Column Compression](/en/ch4#sec_storage_column_compression)
- windows (stream processing), [Stream analytics](/en/ch12#id318), [Reasoning About Time](/en/ch12#sec_stream_time)-[Types of windows](/en/ch12#id324)
  - infinite windows for changelogs, [Maintaining materialized views](/en/ch12#sec_stream_mat_view), [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
  - knowing when all events have arrived, [Handling straggler events](/en/ch12#id323)
  - stream joins within a window, [Stream-stream join (window join)](/en/ch12#id440)
  - types of windows, [Types of windows](/en/ch12#id324)
- WITH RECURSIVE syntax (SQL), [Graph Queries in SQL](/en/ch3#id58)
- Word2Vec (language model), [Vector Embeddings](/en/ch4#id92)
- workflow engines, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
  - Airflow (see Airflow (workflow scheduler))
  - batch processing, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
  - Camunda (see Camunda (workflow engine))
  - Dagster (see Dagster (workflow scheduler))
  - durable execution, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
  - ETL (see ETL (extract-transform-load))
  - executor, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
  - orchestrators, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows), [Batch Processing](/en/ch11#ch_batch)
  - Orkes (see Orkes (workflow engine))
  - Prefect (see Prefect (workflow scheduler))
  - reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
  - Restate (see Restate (workflow engine))
  - Temporal (see Temporal (workflow engine))
- working set, [Sorting Versus In-memory Aggregation](/en/ch11#id275)
- write amplification, [Write amplification](/en/ch4#write-amplification)
- write path (derived data), [Observing Derived State](/en/ch13#sec_future_observing)
- write skew (transaction isolation), [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts)
  - characterizing, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Phantoms causing write skew](/en/ch8#sec_transactions_phantom), [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)
  - examples of, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew), [More examples of write skew](/en/ch8#more-examples-of-write-skew)
  - materializing conflicts, [Materializing conflicts](/en/ch8#materializing-conflicts)
  - occurrence in practice, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
  - phantoms, [Phantoms causing write skew](/en/ch8#sec_transactions_phantom)
  - preventing
    - in snapshot isolation, [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)-[Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
    - in two-phase locking, [Predicate locks](/en/ch8#predicate-locks)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
    - options for, [Characterizing write skew](/en/ch8#characterizing-write-skew)
- write-ahead log (WAL), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping)
  - in durable execution, [Durable execution](/en/ch5#durable-execution)
- writes (database)
  - atomic write operations, [Atomic write operations](/en/ch8#atomic-write-operations)
  - detecting writes affecting prior reads, [Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
  - preventing dirty writes with read committed, [No dirty writes](/en/ch8#sec_transactions_dirty_write)
- WS-\* framework, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- WS-AtomicTransaction (2PC), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)

### X

- X (social network)
  - constructing home timelines (example), [Case Study: Social Network Home Timelines](/en/ch2#sec_introduction_twitter), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views), [Table-table join (materialized view maintenance)](/en/ch12#id326), [Materialized views and caching](/en/ch13#id451)
    - cost of joins, [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
    - describing load, [Describing Load](/en/ch2#id33)
    - fault tolerance, [Fault Tolerance](/en/ch2#id27)
    - performance metrics, [Describing Performance](/en/ch2#sec_introduction_percentiles)
  - DistributedLog (event log), [Using logs for message storage](/en/ch12#id300)
  - Snowflake (ID generator), [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)
- XA transactions, [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc), [XA transactions](/en/ch8#xa-transactions)-[Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
  - heuristic decisions, [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure)
  - problems with, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
- xargs (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
- XFS (file system), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- XGBoost (machine learning library), [Machine Learning](/en/ch11#id290)
- XML
  - binary variants, [Binary encoding](/en/ch5#binary-encoding)
  - data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
  - encoding RDF data, [The RDF data model](/en/ch3#the-rdf-data-model)
  - for application data, issues with, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
  - in relational databases, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- XML databases, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history), [Query languages for documents](/en/ch3#query-languages-for-documents)
- Xorq (query engine), [The meta-database of everything](/en/ch13#id341)
- XPath, [Query languages for documents](/en/ch3#query-languages-for-documents)
- XQuery, [Query languages for documents](/en/ch3#query-languages-for-documents)

### Y

- Yahoo
  - response time study, [Average, Median, and Percentiles](/en/ch2#id24)
- YARN (job scheduler), [Distributed Job Orchestration](/en/ch11#id278), [Separation of application code and state](/en/ch13#id344)
  - ApplicationMaster, [Distributed Job Orchestration](/en/ch11#id278)
- Yjs (CRDT library), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- YugabyteDB (database)
  - hash-range sharding, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
  - key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
  - multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
  - request routing, [Request Routing](/en/ch7#sec_sharding_routing)
  - sharded secondary indexes, [Global Secondary Indexes](/en/ch7#id167)
  - tablets (sharding), [Sharding](/en/ch7#ch_sharding)
  - transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
  - use of clock synchronization, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)

### Z

- Zab (consensus algorithm), [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order)
  - use in ZooKeeper, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- zero-copy, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- zero-disk architecture (ZDA), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- ZeroMQ (messaging library), [Direct messaging from producers to consumers](/en/ch12#id296)
- zombies (split brain), [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- zones (cloud computing) (see availability zones)
- ZooKeeper (coordination service), [Coordination Services](/en/ch10#sec_consistency_coordination)-[Service discovery](/en/ch10#service-discovery)
  - generating fencing tokens, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens), [Using shared logs](/en/ch10#sec_consistency_smr), [Coordination Services](/en/ch10#sec_consistency_coordination)
  - linearizable operations, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
  - locks and leader election, [Locking and leader election](/en/ch10#locking-and-leader-election)
  - observers, [Service discovery](/en/ch10#service-discovery)
  - use for service discovery, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery), [Service discovery](/en/ch10#service-discovery)
  - use for shard assignment, [Request Routing](/en/ch7#sec_sharding_routing)
  - use of Zab algorithm, [Consensus](/en/ch10#sec_consistency_consensus)