mirror of
https://github.com/Vonng/ddia.git
synced 2026-06-21 17:07:12 +08:00
3542 lines
328 KiB
Markdown
---
title: Indexes
weight: 550
breadcrumbs: false
---

### Symbols

- 3FS (distributed filesystem), [Distributed Filesystems](/en/ch11#sec_batch_dfs)

### A

- aborts (transactions), [Transactions](/en/ch8#ch_transactions), [Atomicity](/en/ch8#sec_transactions_acid_atomicity)
    - cascading, [No dirty reads](/en/ch8#no-dirty-reads)
    - in two-phase commit, [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
    - performance of optimistic concurrency control, [Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
    - retrying aborted transactions, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- abstraction, [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Simplicity: Managing Complexity](/en/ch2#id38), [Data Models and Query Languages](/en/ch3#ch_datamodels), [Transactions](/en/ch8#ch_transactions), [Summary](/en/ch8#summary)
- accidental complexity, [Simplicity: Managing Complexity](/en/ch2#id38)
- accountability, [Responsibility and Accountability](/en/ch14#id371)
- accounting (financial data), [Summary](/en/ch3#summary), [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
- Accumulo (database)
    - wide-column data model, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Column Compression](/en/ch4#sec_storage_column_compression)
- ACID properties (transactions), [The Meaning of ACID](/en/ch8#sec_transactions_acid)
    - atomicity, [Atomicity](/en/ch8#sec_transactions_acid_atomicity), [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
    - consistency, [Consistency](/en/ch8#sec_transactions_acid_consistency), [Maintaining integrity in the face of software bugs](/en/ch13#id455)
    - durability, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Durability](/en/ch8#durability)
    - isolation, [Isolation](/en/ch8#sec_transactions_acid_isolation), [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
- acknowledgements (messaging), [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- active/active replication (see multi-leader replication)
- active/passive replication (see leader-based replication)
- ActiveMQ (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297)
    - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
- ActiveRecord (object-relational mapper), [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- activity (workflows) (see workflow engines)
- actor model, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
    - (see also event-driven architecture)
    - comparison to stream processing, [Event-Driven Architectures and RPC](/en/ch12#sec_stream_actors_drpc)
- adaptive capacity, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- Advanced Message Queuing Protocol (see AMQP)
- aerospace systems, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
- Aerospike (database)
    - strong consistency mode, [Single-object writes](/en/ch8#sec_transactions_single_object)
- AGE (graph database), [The Cypher Query Language](/en/ch3#id57)
- aggregation
    - data cubes and materialized views, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
    - in batch processes, [Sorting Versus In-memory Aggregation](/en/ch11#id275)
    - in stream processes, [Stream analytics](/en/ch12#id318)
- aggregation pipeline (MongoDB), [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization), [Query languages for documents](/en/ch3#query-languages-for-documents)
- Agile, [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability)
    - minimizing irreversibility, [Batch Processing](/en/ch11#ch_batch), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
    - moving faster with confidence, [The end-to-end argument again](/en/ch13#id456)
- agreement, [Single-value consensus](/en/ch10#single-value-consensus), [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
    - (see also consensus)
- AI (artificial intelligence) (see machine learning)
- AI Act (European Union), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- AirByte, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- Airflow (workflow scheduler), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows), [Batch Processing](/en/ch11#ch_batch), [Scheduling Workflows](/en/ch11#sec_batch_workflows)
    - cloud data warehouse integration, [Query languages](/en/ch11#sec_batch_query_lanauges)
    - use for ETL, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- Akamai
    - response time study, [Average, Median, and Percentiles](/en/ch2#id24)
- algorithms
    - algorithm correctness, [Defining the correctness of an algorithm](/en/ch9#defining-the-correctness-of-an-algorithm)
    - B-trees, [B-Trees](/en/ch4#sec_storage_b_trees)-[B-tree variants](/en/ch4#b-tree-variants)
    - for distributed systems, [System Model and Reality](/en/ch9#sec_distributed_system_model)
    - mergesort, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Shuffling Data](/en/ch11#sec_shuffle)
    - scheduling, [Resource Allocation](/en/ch11#id279)
    - SSTables and LSM-trees, [The SSTable file format](/en/ch4#the-sstable-file-format)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- all-to-all replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)
- AllegroGraph (database), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
    - SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- ALTER TABLE statement (SQL), [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Encoding and Evolution](/en/ch5#ch_encoding)
- Amazon
    - Dynamo (see Dynamo (database))
    - response time study, [Average, Median, and Percentiles](/en/ch2#id24)
- Amazon Web Services (AWS)
    - Aurora (see Aurora (cloud database))
    - ClockBound (see ClockBound (time sync))
    - correctness testing, [Formal Methods and Randomized Testing](/en/ch9#sec_distributed_formal)
    - DynamoDB (see DynamoDB (database))
    - EBS (see EBS (virtual block device))
    - Kinesis (see Kinesis (messaging))
    - Neptune (see Neptune (graph database))
    - network reliability, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
    - S3 (see S3 (object storage))
- amplification
    - of bias, [Bias and Discrimination](/en/ch14#id370)
    - of failures, [Maintaining derived state](/en/ch13#id446)
    - of tail latency, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla), [Local Secondary Indexes](/en/ch7#id166)
    - write amplification, [Write amplification](/en/ch4#write-amplification)
- AMQP (Advanced Message Queuing Protocol), [Message brokers compared to databases](/en/ch12#id297)
    - (see also messaging systems)
    - comparison to log-based messaging, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay)
    - message ordering, [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- analytical systems, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
    - as derived data systems, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
    - ETL from operational systems, [Data Warehousing](/en/ch1#sec_introduction_dwh)
    - governance, [Beyond the data lake](/en/ch1#beyond-the-data-lake)
- analytics, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)-[Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
    - comparison to transaction processing, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
    - data normalization, [Trade-offs of normalization](/en/ch3#trade-offs-of-normalization)
    - data warehousing (see data warehousing)
    - predictive (see predictive analytics)
    - relation to batch processing, [Analytics](/en/ch11#sec_batch_olap)-[Analytics](/en/ch11#sec_batch_olap)
    - schemas for, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)-[Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
    - snapshot isolation for queries, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
    - stream analytics, [Stream analytics](/en/ch12#id318)
- analytics engineering, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
- anti-entropy, [Catching up on missed writes](/en/ch6#sec_replication_read_repair)
- Antithesis (deterministic simulation testing), [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- Apache Accumulo (see Accumulo)
- Apache ActiveMQ (see ActiveMQ)
- Apache AGE (see AGE)
- Apache Arrow (see Arrow (data format))
- Apache Avro (see Avro)
- Apache Beam (see Beam)
- Apache BookKeeper (see BookKeeper)
- Apache Cassandra (see Cassandra)
- Apache Curator (see Curator)
- Apache DataFusion (see DataFusion (query engine))
- Apache Druid (see Druid (database))
- Apache Flink (see Flink (processing framework))
- Apache HBase (see HBase)
- Apache Iceberg (see Iceberg (table format))
- Apache Jena (see Jena)
- Apache Kafka (see Kafka)
- Apache Lucene (see Lucene)
- Apache Oozie (see Oozie (workflow scheduler))
- Apache ORC (see ORC (data format))
- Apache Parquet (see Parquet (data format))
- Apache Pig (query language), [Query languages](/en/ch11#sec_batch_query_lanauges)
- Apache Pinot (see Pinot (database))
- Apache Pulsar (see Pulsar)
- Apache Qpid (see Qpid)
- Apache Samza (see Samza)
- Apache Solr (see Solr)
- Apache Spark (see Spark (processing framework))
- Apache Storm (see Storm)
- Apache Superset (see Superset (data visualization software))
- Apache Thrift (see Thrift)
- Apache ZooKeeper (see ZooKeeper)
- Apama (stream analytics), [Complex event processing](/en/ch12#id317)
- append-only files (see logs)
- Application Programming Interfaces (APIs), [Data Models and Query Languages](/en/ch3#ch_datamodels)
    - for change streams, [API support for change streams](/en/ch12#sec_stream_change_api)
    - for distributed transactions, [XA transactions](/en/ch8#xa-transactions)
    - for services, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
        - (see also services)
        - evolvability, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
        - RESTful, [Web services](/en/ch5#sec_web_services)
- application state (see state)
- approximate search (see similarity search)
- archival storage, data from databases, [Archival storage](/en/ch5#archival-storage)
- arcs (see edges)
- ArcticDB (database), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- arithmetic mean, [Average, Median, and Percentiles](/en/ch2#id24)
- arrays
    - array databases, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
    - multidimensional, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- Arrow (data format), [Column-Oriented Storage](/en/ch4#sec_storage_column), [DataFrames](/en/ch11#id287)
- artificial intelligence (see machine learning)
- ASCII text, [Protocol Buffers](/en/ch5#sec_encoding_protobuf)
- ASN.1 (schema language), [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- associative table, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many), [Property Graphs](/en/ch3#id56)
- asynchronous networks, [Unreliable Networks](/en/ch9#sec_distributed_networks), [Glossary](/en/glossary)
    - comparison to synchronous networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
    - system model, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- asynchronous replication, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async), [Glossary](/en/glossary)
    - data loss on failover, [Leader failure: Failover](/en/ch6#leader-failure-failover)
    - reads from asynchronous follower, [Problems with Replication Lag](/en/ch6#sec_replication_lag)
    - with multiple leaders, [Multi-Leader Replication](/en/ch6#sec_replication_multi_leader)
- Asynchronous Transfer Mode (ATM), [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- atomic broadcast, [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs)
- atomic clocks, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
    - (see also clocks)
- atomicity (concurrency), [Glossary](/en/glossary)
    - atomic increment, [Single-object writes](/en/ch8#sec_transactions_single_object)
    - compare-and-set (CAS), [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set), [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
        - (see also compare-and-set (CAS))
    - denormalized data, [Trade-offs of normalization](/en/ch3#trade-offs-of-normalization)
    - fetch-and-add/increment, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical), [Consensus](/en/ch10#sec_consistency_consensus), [Fetch-and-add as consensus](/en/ch10#fetch-and-add-as-consensus)
    - write operations, [Atomic write operations](/en/ch8#atomic-write-operations)
- atomicity (transactions), [Atomicity](/en/ch8#sec_transactions_acid_atomicity), [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object), [Glossary](/en/glossary)
    - atomic commit
        - avoiding, [Multi-shard request processing](/en/ch13#id360), [Coordination-avoiding data systems](/en/ch13#id454)
        - blocking and nonblocking, [Three-phase commit](/en/ch8#three-phase-commit)
        - in stream processing, [Exactly-once message processing](/en/ch8#sec_transactions_exactly_once), [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited), [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)
        - maintaining derived data, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
    - distributed transactions, [Distributed Transactions](/en/ch8#sec_transactions_distributed)-[Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
    - for multi-object transactions, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
    - for single-object writes, [Single-object writes](/en/ch8#sec_transactions_single_object)
    - relation to consensus, [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
- auditability, [Trust, but Verify](/en/ch13#sec_future_verification)-[Tools for auditable data systems](/en/ch13#id366)
    - designing for, [Designing for auditability](/en/ch13#id365)
    - self-auditing systems, [Don't just blindly trust what they promise](/en/ch13#id364)
    - through immutability, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
    - tools for auditable data systems, [Tools for auditable data systems](/en/ch13#id366)
- Aurora (cloud database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- Aurora DSQL (database)
    - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- auto-scaling, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- Automerge (CRDT library), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- availability, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)
    - (see also fault tolerance)
    - in CAP theorem, [The CAP theorem](/en/ch10#the-cap-theorem)
    - in leader election, [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
    - in service level agreements (SLAs), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- availability zones, [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy), [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- Avro (data format), [Avro](/en/ch5#sec_encoding_avro)-[Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
    - dynamically generated schemas, [Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
    - object container files, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema), [Archival storage](/en/ch5#archival-storage)
    - reader determining writer's schema, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
    - schema evolution, [The writer's schema and the reader's schema](/en/ch5#the-writers-schema-and-the-readers-schema)
    - use in batch processing, [MapReduce](/en/ch11#sec_batch_mapreduce)
- awk (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis), [Distributed Job Orchestration](/en/ch11#id278)
- Axon Framework, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- Azkaban (workflow scheduler), [Batch Processing](/en/ch11#ch_batch)
- Azure Blob Storage (object storage), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
    - conditional headers, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- Azure managed disks, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- Azure SQL DB (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- Azure Storage, [Object Stores](/en/ch11#id277)
- Azure Synapse Analytics (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- Azure Virtual Machines
    - spot virtual machines, [Handling Faults](/en/ch11#id281)

### B

- B-trees (indexes), [B-Trees](/en/ch4#sec_storage_b_trees)-[B-tree variants](/en/ch4#b-tree-variants)
    - B+ trees, [B-tree variants](/en/ch4#b-tree-variants)
    - branching factor, [B-Trees](/en/ch4#sec_storage_b_trees)
    - comparison to LSM-trees, [Comparing B-Trees and LSM-Trees](/en/ch4#sec_storage_btree_lsm_comparison)-[Disk space usage](/en/ch4#disk-space-usage)
    - crash recovery, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
    - growing by splitting a page, [B-Trees](/en/ch4#sec_storage_b_trees)
    - immutable variants, [B-tree variants](/en/ch4#b-tree-variants), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
    - similarity to shard splitting, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)
    - variants, [B-tree variants](/en/ch4#b-tree-variants)
- B2 (object storage), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- Backblaze B2 (see B2 (object storage))
- backend, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- backoff, exponential, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- backpressure, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Read performance](/en/ch4#read-performance), [Messaging Systems](/en/ch12#sec_stream_messaging), [Glossary](/en/glossary)
    - in batch processing, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
    - in TCP, [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- backups
    - database snapshot for replication, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
    - in multitenant systems, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
    - integrity of, [Don't just blindly trust what they promise](/en/ch13#id364)
    - snapshot isolation for, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
    - using object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
    - versus replication, [Replication](/en/ch6#ch_replication)
- backward compatibility, [Encoding and Evolution](/en/ch5#ch_encoding)
- BadgerDB (database)
    - serializable transactions, [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)
- BASE, contrast to ACID, [The Meaning of ACID](/en/ch8#sec_transactions_acid)
- bash shell (Unix), [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp)
- batch processing, [Batch Processing](/en/ch11#ch_batch)-[Summary](/en/ch11#id292), [Glossary](/en/glossary)
    - and functional programming, [MapReduce](/en/ch11#sec_batch_mapreduce)
    - benefits of, [Batch Processing](/en/ch11#ch_batch)
    - combining with stream processing, [Unifying batch and stream processing](/en/ch13#id338)
    - comparison to stream processing, [Processing Streams](/en/ch12#sec_stream_processing)
    - dataflow engines, [Dataflow Engines](/en/ch11#sec_batch_dataflow)-[Dataflow Engines](/en/ch11#sec_batch_dataflow)
    - fault tolerance, [Handling Faults](/en/ch11#id281), [Messaging Systems](/en/ch12#sec_stream_messaging)
    - for data integration, [Batch and Stream Processing](/en/ch13#sec_future_batch_streaming)-[Unifying batch and stream processing](/en/ch13#id338)
    - graphs and iterative processing, [Machine Learning](/en/ch11#id290)
    - high-level APIs and languages, [Query languages](/en/ch11#sec_batch_query_lanauges)-[Query languages](/en/ch11#sec_batch_query_lanauges)
    - in cloud data warehouses, [Query languages](/en/ch11#sec_batch_query_lanauges)
    - in distributed systems, [Batch Processing in Distributed Systems](/en/ch11#sec_batch_distributed)
    - join and group by, [JOIN and GROUP BY](/en/ch11#sec_batch_join)-[JOIN and GROUP BY](/en/ch11#sec_batch_join)
    - limitations, [Batch Processing](/en/ch11#ch_batch)
    - log-based messaging and, [Replaying old messages](/en/ch12#sec_stream_replay)
    - maintaining derived state, [Maintaining derived state](/en/ch13#id446)
    - measuring performance, [Batch Processing](/en/ch11#ch_batch)
    - models of, [Batch Processing Models](/en/ch11#id431)
    - resource allocation, [Resource Allocation](/en/ch11#id279)-[Resource Allocation](/en/ch11#id279)
    - resource managers, [Distributed Job Orchestration](/en/ch11#id278)
    - schedulers, [Distributed Job Orchestration](/en/ch11#id278)
    - serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)-[Serving Derived Data](/en/ch11#sec_batch_serving_derived)
    - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)-[Shuffling Data](/en/ch11#sec_shuffle)
    - task execution, [Distributed Job Orchestration](/en/ch11#id278)
    - use cases, [Batch Use Cases](/en/ch11#sec_batch_output)-[Serving Derived Data](/en/ch11#sec_batch_serving_derived)
    - using Unix tools (example), [Batch Processing with Unix Tools](/en/ch11#sec_batch_unix)-[Sorting Versus In-memory Aggregation](/en/ch11#id275)
- batch processing frameworks
    - comparison to operating systems, [Batch Processing in Distributed Systems](/en/ch11#sec_batch_distributed)
- Beam (dataflow library), [Unifying batch and stream processing](/en/ch13#id338)
- BERT (language model), [Vector Embeddings](/en/ch4#id92)
- bias, [Bias and Discrimination](/en/ch14#id370)
- bidirectional replication (see multi-leader replication)
- big ball of mud, [Simplicity: Managing Complexity](/en/ch2#id38)
- big data
    - versus data minimization, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- BigQuery (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Batch Processing](/en/ch11#ch_batch)
    - DataFrames, [Query languages](/en/ch11#sec_batch_query_lanauges)
    - sharding and clustering, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
    - shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)
    - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- Bigtable (database)
    - sharding scheme, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
    - storage layout, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
    - tablets (sharding), [Sharding](/en/ch7#ch_sharding)
    - wide-column data model, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Column Compression](/en/ch4#sec_storage_column_compression)
- binary data encodings, [Binary encoding](/en/ch5#binary-encoding)-[The Merits of Schemas](/en/ch5#sec_encoding_schemas)
    - Avro, [Avro](/en/ch5#sec_encoding_avro)-[Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
    - MessagePack, [Binary encoding](/en/ch5#binary-encoding)-[Binary encoding](/en/ch5#binary-encoding)
    - Protocol Buffers, [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- binary encoding
    - based on schemas, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
    - by network drivers, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- binary strings, lack of support in JSON and XML, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
- Bitcoin (cryptocurrency), [Tools for auditable data systems](/en/ch13#id366)
    - Byzantine fault tolerance, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
    - concurrency bugs in exchanges, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)
- bitmap indexes, [Column Compression](/en/ch4#sec_storage_column_compression)
- BitTorrent uTP protocol, [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- Bkd-trees (indexes), [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- blameless postmortems, [Humans and Reliability](/en/ch2#id31)
- Blazegraph (database), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
    - SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- blob storage (see object storage)
- block (file system), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- block device (disk), [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- blockchains, [Summary](/en/ch3#summary)
    - Byzantine fault tolerance, [Byzantine Faults](/en/ch9#sec_distributed_byzantine), [Consensus](/en/ch10#sec_consistency_consensus), [Tools for auditable data systems](/en/ch13#id366)
- blocking atomic commit, [Three-phase commit](/en/ch8#three-phase-commit)
- Bloom filter (algorithm), [Bloom filters](/en/ch4#bloom-filters), [Read performance](/en/ch4#read-performance), [Stream analytics](/en/ch12#id318)
- BookKeeper (replicated log), [Allocating work to nodes](/en/ch10#allocating-work-to-nodes)
- bounded datasets, [Stream Processing](/en/ch12#ch_stream), [Glossary](/en/glossary)
    - (see also batch processing)
- bounded delays, [Glossary](/en/glossary)
    - in networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
    - process pauses, [Response time guarantees](/en/ch9#sec_distributed_clocks_realtime)
- broadcast
    - total order broadcast (see shared logs)
- brokerless messaging, [Direct messaging from producers to consumers](/en/ch12#id296)
- Brubeck (metrics aggregator), [Direct messaging from producers to consumers](/en/ch12#id296)
- BTM (transaction coordinator), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
- Buf
    - Bufstream (messaging), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Disk space usage](/en/ch12#sec_stream_disk_usage)
- build or buy, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)
- bursty network traffic patterns, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- business analyst, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- business data processing, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
- business intelligence, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)-[Data Warehousing](/en/ch1#sec_introduction_dwh)
- Business Process Execution Language (BPEL), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- Business Process Model and Notation (BPMN), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
    - example, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- byte sequence, encoding data in, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- Byzantine faults, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)-[Weak forms of lying](/en/ch9#weak-forms-of-lying), [System Model and Reality](/en/ch9#sec_distributed_system_model), [Glossary](/en/glossary)
    - Byzantine fault-tolerant systems, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
    - Byzantine Generals Problem, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
    - consensus algorithms and, [Consensus](/en/ch10#sec_consistency_consensus), [Tools for auditable data systems](/en/ch13#id366)

### C
|
|
|
|
- caches, [Keeping everything in memory](/en/ch4#sec_storage_inmemory), [Glossary](/en/glossary)
|
|
- and materialized views, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
|
|
- as derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
|
|
- in CPUs, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized), [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
|
|
- invalidation and maintenance, [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
|
|
- linearizability, [Linearizability](/en/ch10#sec_consistency_linearizability)
|
|
- local disks in the cloud, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
|
|
- calendar sync, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
|
|
- California Consumer Privacy Act (CCPA), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
|
|
- Camunda (workflow engine), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
|
|
- canonical version (of data), [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
- CAP theorem, [The CAP theorem](/en/ch10#the-cap-theorem)-[The CAP theorem](/en/ch10#the-cap-theorem), [Glossary](/en/glossary)
- capacity planning, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- Cap'n Proto (data format), [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- carbon emissions, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- cascading aborts, [No dirty reads](/en/ch8#no-dirty-reads)
- cascading failures, [Software faults](/en/ch2#software-faults), [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations), [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
- Cassandra (database)
- change data capture, [Implementing change data capture](/en/ch12#id307), [API support for change streams](/en/ch12#sec_stream_change_api)
- compaction strategy, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- consistency level ANY, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- hash-range sharding, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash), [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- last-write-wins conflict resolution, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)
- leaderless replication, [Leaderless Replication](/en/ch6#sec_replication_leaderless)
- lightweight transactions, [Single-object writes](/en/ch8#sec_transactions_single_object)
- linearizability, lack of, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- log-structured storage, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- multi-region support, [Multi-region operation](/en/ch6#multi-region-operation)
- secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
- use of clocks, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations), [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
- vnodes (sharding), [Sharding](/en/ch7#ch_sharding)
- cat (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
- catalog, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- causal context, [Version vectors](/en/ch6#version-vectors)
- (see also causal dependencies)
- causal dependencies, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)-[Version vectors](/en/ch6#version-vectors)
- capturing, [Version vectors](/en/ch6#version-vectors), [Ordering events to capture causality](/en/ch13#sec_future_capture_causality), [Reads are events too](/en/ch13#sec_future_read_events)
- by total ordering, [The limits of total ordering](/en/ch13#id335)
- in transactions, [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)
- sending message to friends (example), [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
- causality, [Glossary](/en/glossary)
- causal ordering
- total order consistent with, [Logical Clocks](/en/ch10#sec_consistency_timestamps)
- consistency with, [Logical Clocks](/en/ch10#sec_consistency_timestamps)-[Enforcing constraints using logical clocks](/en/ch10#enforcing-constraints-using-logical-clocks)
- happens-before relation, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)
- in serializable transactions, [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)-[Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
- mismatch with clocks, [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
- ordering events to capture, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
- violations of, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix), [Problems with different topologies](/en/ch6#problems-with-different-topologies), [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
- with synchronized clocks, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- cell-based architecture, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- CEP (see complex event processing)
- CephFS (distributed filesystem), [Batch Processing](/en/ch11#ch_batch), [Object Stores](/en/ch11#id277)
- certificate transparency, [Tools for auditable data systems](/en/ch13#id366)
- cgroups, [Distributed Job Orchestration](/en/ch11#id278)
- change data capture, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication), [Change Data Capture](/en/ch12#sec_stream_cdc)
- API support for change streams, [API support for change streams](/en/ch12#sec_stream_change_api)
- comparison to event sourcing, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
- implementing, [Implementing change data capture](/en/ch12#id307)
- initial snapshot, [Initial snapshot](/en/ch12#sec_stream_cdc_snapshot)
- log compaction, [Log compaction](/en/ch12#sec_stream_log_compaction)
- changelogs, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)
- change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)
- for operator state, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- in stream joins, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
- log compaction, [Log compaction](/en/ch12#sec_stream_log_compaction)
- maintaining derived state, [Databases and Streams](/en/ch12#sec_stream_databases)
- chaos engineering, [Fault Tolerance](/en/ch2#id27), [Fault injection](/en/ch9#sec_fault_injection)
- checkpointing
- in high-performance computing, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- in stream processors, [Microbatching and checkpointing](/en/ch12#id329)
- circuit breaker (limiting retries), [Describing Performance](/en/ch2#sec_introduction_percentiles)
- circuit-switched networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
- circular buffers, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- circular replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)
- Citus (database)
- hash sharding, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
- ClickHouse (database), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- clickstream data, analysis of, [JOIN and GROUP BY](/en/ch11#sec_batch_join)
- clients
- calling services, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)
- offline-capable, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients), [Stateful, offline-capable clients](/en/ch13#id347)
- pushing state changes to, [Pushing state changes to clients](/en/ch13#id348)
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)
- ClockBound (time sync), [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval)
- use in YugabyteDB, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- clocks, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
- atomic clocks, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- confidence interval, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval)-[Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- for global snapshots, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- hybrid logical clocks, [Hybrid logical clocks](/en/ch10#hybrid-logical-clocks)
- logical (see logical clocks)
- skew, [Last write wins (discarding concurrent writes)](/en/ch6#sec_replication_lww), [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations), [Relying on Synchronized Clocks](/en/ch9#sec_distributed_clocks_relying)-[Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- slewing, [Monotonic clocks](/en/ch9#monotonic-clocks)
- synchronization and accuracy, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)-[Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- synchronization using GPS, [Unreliable Clocks](/en/ch9#sec_distributed_clocks), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy), [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- time-of-day versus monotonic clocks, [Monotonic Versus Time-of-Day Clocks](/en/ch9#sec_distributed_monotonic_timeofday)
- timestamping events, [Whose clock are you using, anyway?](/en/ch12#id438)
- cloud services, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)-[Cloud Computing Versus Supercomputing](/en/ch1#id17)
- availability zones, [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy), [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- need for service discovery, [Service discovery](/en/ch10#service-discovery)
- network glitches, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
- pros and cons, [Pros and Cons of Cloud Services](/en/ch1#sec_introduction_cloud_tradeoffs)-[Pros and Cons of Cloud Services](/en/ch1#sec_introduction_cloud_tradeoffs)
- quotas, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- regions (see regions (geographic distribution))
- serverless, [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- shared resources, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- versus supercomputing, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- cloud-native, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)-[Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- Cloudflare
- R2 (see R2 (object storage))
- clustered indexes, [Storing values within the index](/en/ch4#sec_storage_index_heap)
- clustering (record ordering), [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- CockroachDB (database)
- consensus-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- consistency model, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- key-range sharding, [Sharding](/en/ch7#ch_sharding), [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- serializable transactions, [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)
- sharded secondary indexes, [Global Secondary Indexes](/en/ch7#id167)
- transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
- use of model-checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- code generation
- for query execution, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- with Protocol Buffers, [Protocol Buffers](/en/ch5#sec_encoding_protobuf)
- collaborative editing, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- column families (Bigtable), [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Column Compression](/en/ch4#sec_storage_column_compression)
- column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)-[Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- column compression, [Column Compression](/en/ch4#sec_storage_column_compression)
- Parquet, [Column-Oriented Storage](/en/ch4#sec_storage_column), [Archival storage](/en/ch5#archival-storage)
- sort order in, [Sort Order in Column Storage](/en/ch4#sort-order-in-column-storage)-[Sort Order in Column Storage](/en/ch4#sort-order-in-column-storage)
- vectorized processing, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- versus wide-column model, [Column Compression](/en/ch4#sec_storage_column_compression)
- writing to, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
- comma-separated values (see CSV)
- command query responsibility segregation (CQRS), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)-[Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
- commands (event sourcing), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- commits (transactions), [Transactions](/en/ch8#ch_transactions)
- atomic commit, [Distributed Transactions](/en/ch8#sec_transactions_distributed)-[Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
- (see also atomicity; transactions)
- read committed isolation, [Read Committed](/en/ch8#sec_transactions_read_committed)
- three-phase commit (3PC), [Three-phase commit](/en/ch8#three-phase-commit)
- two-phase commit (2PC), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)-[Coordinator failure](/en/ch8#coordinator-failure)
- commutative operations, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- compaction
- of changelogs, [Log compaction](/en/ch12#sec_stream_log_compaction)
- (see also log compaction)
- for stream operator state, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- of log-structured storage, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- issues with, [Read performance](/en/ch4#read-performance)
- size-tiered and leveled approaches, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [Disk space usage](/en/ch4#disk-space-usage)
- compare-and-set (CAS), [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set), [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- implementing locks, [Coordination Services](/en/ch10#sec_consistency_coordination)
- implementing uniqueness constraints, [Constraints and uniqueness guarantees](/en/ch10#sec_consistency_uniqueness)
- on object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- relation to consensus, [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable), [Consensus](/en/ch10#sec_consistency_consensus), [Compare-and-set as consensus](/en/ch10#compare-and-set-as-consensus)
- relation to fencing tokens, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- relation to transactions, [Single-object writes](/en/ch8#sec_transactions_single_object)
- compatibility, [Encoding and Evolution](/en/ch5#ch_encoding), [Modes of Dataflow](/en/ch5#sec_encoding_dataflow)
- calling services, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- properties of encoding formats, [Summary](/en/ch5#summary)
- using databases, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)-[Archival storage](/en/ch5#archival-storage)
- compensating transactions, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros), [Loosely interpreted constraints](/en/ch13#id362)
- compilation, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- complex event processing (CEP), [Complex event processing](/en/ch12#id317)
- complexity
- distilling in theoretical models, [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world)
- essential and accidental, [Simplicity: Managing Complexity](/en/ch2#id38)
- hiding using abstraction, [Data Models and Query Languages](/en/ch3#ch_datamodels)
- managing, [Simplicity: Managing Complexity](/en/ch2#id38)
- composing data systems (see unbundling databases)
- compression
- in SSTables, [The SSTable file format](/en/ch4#the-sstable-file-format)
- compute-intensive applications, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- computer games, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- concatenated indexes, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- in hash-sharded systems, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- concurrency
- actor programming model, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks), [Event-Driven Architectures and RPC](/en/ch12#sec_stream_actors_drpc)
- (see also event-driven architecture)
- bugs from weak transaction isolation, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)
- conflict resolution, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts)-[Types of conflict](/en/ch6#sec_replication_write_conflicts)
- definition, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts)
- detecting concurrent writes, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)-[Version vectors](/en/ch6#version-vectors)
- dual writes, problems with, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
- happens-before relation, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)
- in replicated systems, [Problems with Replication Lag](/en/ch6#sec_replication_lag)-[Version vectors](/en/ch6#version-vectors), [Linearizability](/en/ch10#sec_consistency_linearizability)-[Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)
- multi-version concurrency control (MVCC), [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
- ordering of operations, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- reducing, through event logs, [Concurrency control](/en/ch12#sec_stream_concurrency), [Dataflow: Interplay between state changes and application code](/en/ch13#id450)
- time and relativity, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)
- transaction isolation, [Isolation](/en/ch8#sec_transactions_acid_isolation)
- write skew (transaction isolation), [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts)
- conditional write, [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set)
- in transactions, [Single-object writes](/en/ch8#sec_transactions_single_object)
- on object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- conference management system (example), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- conflict-free replicated datatypes (CRDTs), [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
- for leaderless replication, [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship)
- preventing lost updates, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- conflicts
- avoidance, [Conflict avoidance](/en/ch6#conflict-avoidance)
- causal dependencies, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)
- conflict detection
- in distributed transactions, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
- in log-based systems, [Uniqueness constraints require consensus](/en/ch13#id452)
- in serializable snapshot isolation (SSI), [Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
- in two-phase commit, [A system of promises](/en/ch8#a-system-of-promises)
- conflict resolution
- by aborting transactions, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
- by apologizing, [Loosely interpreted constraints](/en/ch13#id362)
- last write wins (LWW), [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
- using atomic operations, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- determining what is a conflict, [Types of conflict](/en/ch6#sec_replication_write_conflicts), [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- in leaderless replication, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)
- lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- materializing, [Materializing conflicts](/en/ch8#materializing-conflicts)
- resolution, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts)-[Types of conflict](/en/ch6#sec_replication_write_conflicts)
- automatic, [Automatic conflict resolution](/en/ch6#automatic-conflict-resolution)
- in leaderless systems, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)
- last write wins (LWW), [Last write wins (discarding concurrent writes)](/en/ch6#sec_replication_lww)
- using custom logic, [Manual conflict resolution](/en/ch6#manual-conflict-resolution), [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship)
- siblings, [Manual conflict resolution](/en/ch6#manual-conflict-resolution), [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship)
- merging, [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship)
- write skew (transaction isolation), [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts)
- Confluent
- Freight (messaging), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Disk space usage](/en/ch12#sec_stream_disk_usage)
- schema registry, [JSON Schema](/en/ch5#json-schema), [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
- congestion (networks)
- avoidance, [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- limiting accuracy of clocks, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval)
- queueing delays, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- consensus, [Consensus](/en/ch10#sec_consistency_consensus)-[Summary](/en/ch10#summary), [Glossary](/en/glossary)
- algorithms, [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order)
- consensus numbers, [Fetch-and-add as consensus](/en/ch10#fetch-and-add-as-consensus)
- coordination services, [Coordination Services](/en/ch10#sec_consistency_coordination)-[Service discovery](/en/ch10#service-discovery)
- cost of, [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
- impossibility of, [Consensus](/en/ch10#sec_consistency_consensus)
- preventing split brain, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- reconfiguration, [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
- relation to atomic commitment, [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
- relation to compare-and-set (CAS), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable), [Compare-and-set as consensus](/en/ch10#compare-and-set-as-consensus)
- relation to fetch-and-add, [Fetch-and-add as consensus](/en/ch10#fetch-and-add-as-consensus)
- relation to replication, [Using shared logs](/en/ch10#sec_consistency_smr)
- relation to shared logs, [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs)
- relation to uniqueness constraints, [Uniqueness constraints require consensus](/en/ch13#id452)
- safety and liveness properties, [Single-value consensus](/en/ch10#single-value-consensus)
- single-value consensus, [Single-value consensus](/en/ch10#single-value-consensus)
- consent (GDPR), [Consent and Freedom of Choice](/en/ch14#id375)
- consistency, [Consistency](/en/ch8#sec_transactions_acid_consistency), [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- across different databases, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views), [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions)
- causal, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix), [Problems with different topologies](/en/ch6#problems-with-different-topologies), [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
- consistent prefix reads, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix)-[Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix)
- consistent snapshots, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)-[Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner), [Initial snapshot](/en/ch12#sec_stream_cdc_snapshot), [Creating an index](/en/ch13#id340)
- (see also snapshots)
- crash recovery, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
- enforcing constraints (see constraints)
- eventual, [Problems with Replication Lag](/en/ch6#sec_replication_lag)
- (see also eventual consistency)
- in ACID transactions, [Consistency](/en/ch8#sec_transactions_acid_consistency), [Maintaining integrity in the face of software bugs](/en/ch13#id455)
- in CAP theorem, [The CAP theorem](/en/ch10#the-cap-theorem)
- in leader election, [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
- in microservices, [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems)
- linearizability, [Solutions for Replication Lag](/en/ch6#id131), [Linearizability](/en/ch10#sec_consistency_linearizability)-[Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- meanings of, [Consistency](/en/ch8#sec_transactions_acid_consistency)
- monotonic reads, [Monotonic Reads](/en/ch6#sec_replication_monotonic_reads)-[Monotonic Reads](/en/ch6#sec_replication_monotonic_reads)
- of secondary indexes, [The need for multi-object transactions](/en/ch8#sec_transactions_need), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation), [Reasoning about dataflows](/en/ch13#id443), [Creating an index](/en/ch13#id340)
- read-after-write, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)-[Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- in derived data systems, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions)
- strong (see linearizability)
- timeliness and integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- using quorums, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- consistent hashing, [Consistent hashing](/en/ch7#sec_sharding_consistent_hashing)
- consistent prefix reads, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix)
- constraints (databases), [Consistency](/en/ch8#sec_transactions_acid_consistency), [Characterizing write skew](/en/ch8#characterizing-write-skew)
- asynchronously checked, [Loosely interpreted constraints](/en/ch13#id362)
- coordination avoidance, [Coordination-avoiding data systems](/en/ch13#id454)
- ensuring idempotence, [Uniquely identifying requests](/en/ch13#id355)
- in log-based systems, [Enforcing Constraints](/en/ch13#sec_future_constraints)-[Multi-shard request processing](/en/ch13#id360)
- across multiple shards, [Multi-shard request processing](/en/ch13#id360)
- in two-phase commit, [Distributed Transactions](/en/ch8#sec_transactions_distributed), [A system of promises](/en/ch8#a-system-of-promises)
- relation to consensus, [Uniqueness constraints require consensus](/en/ch13#id452)
- requiring linearizability, [Constraints and uniqueness guarantees](/en/ch10#sec_consistency_uniqueness)
- Consul (coordination service), [Coordination Services](/en/ch10#sec_consistency_coordination)
- use for service discovery, [Service discovery](/en/ch10#service-discovery)
- consumers (message streams), [Message brokers](/en/ch5#message-brokers), [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- backpressure, [Messaging Systems](/en/ch12#sec_stream_messaging)
- consumer groups, [Multiple consumers](/en/ch12#id298)
- consumer offsets in logs, [Consumer offsets](/en/ch12#sec_stream_log_offsets)
- failures, [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering), [Consumer offsets](/en/ch12#sec_stream_log_offsets)
- fan-out, [Materializing and Updating Timelines](/en/ch2#sec_introduction_materializing), [Multiple consumers](/en/ch12#id298), [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging)
- load balancing, [Multiple consumers](/en/ch12#id298), [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging)
- not keeping up with producers, [Messaging Systems](/en/ch12#sec_stream_messaging), [Disk space usage](/en/ch12#sec_stream_disk_usage), [Making unbundling work](/en/ch13#sec_future_unbundling_favor)
- content models (JSON Schema), [JSON Schema](/en/ch5#json-schema)
- contention
- between transactions, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- blocking threads, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- performance of optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
- under two-phase locking, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
- context switches, [Latency and Response Time](/en/ch2#id23), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- convergence (conflict resolution), [Automatic conflict resolution](/en/ch6#automatic-conflict-resolution)-[CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
|
|
- coordination
|
|
- avoidance, [Coordination-avoiding data systems](/en/ch13#id454)
|
|
- cross-datacenter, [The limits of total ordering](/en/ch13#id335)
|
|
- cross-region, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
|
|
- cross-shard ordering, [Sharding](/en/ch8#sharding), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner), [Using shared logs](/en/ch10#sec_consistency_smr), [Multi-shard request processing](/en/ch13#id360)
|
|
- routing requests to shards, [Request Routing](/en/ch7#sec_sharding_routing)
|
|
- services, [Locking and leader election](/en/ch10#locking-and-leader-election), [Coordination Services](/en/ch10#sec_consistency_coordination)-[Service discovery](/en/ch10#service-discovery)
|
|
- coordinator (in 2PC), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
|
|
- failure, [Coordinator failure](/en/ch8#coordinator-failure)
|
|
- in XA transactions, [XA transactions](/en/ch8#xa-transactions)-[Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
|
|
- recovery, [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure)
|
|
- copy-on-write (B-trees), [B-tree variants](/en/ch4#b-tree-variants), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
|
|
- CORBA (Common Object Request Broker Architecture), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
|
|
- coronal mass ejection (see solar storm)
|
|
- correctness
|
|
- auditability, [Trust, but Verify](/en/ch13#sec_future_verification)-[Tools for auditable data systems](/en/ch13#id366)
|
|
- Byzantine fault tolerance, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
|
|
- dealing with partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
|
|
- in log-based systems, [Enforcing Constraints](/en/ch13#sec_future_constraints)-[Multi-shard request processing](/en/ch13#id360)
|
|
- of algorithm within system model, [Defining the correctness of an algorithm](/en/ch9#defining-the-correctness-of-an-algorithm)
|
|
- of derived data, [Designing for auditability](/en/ch13#id365)
|
|
- of immutable data, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
|
|
- of personal data, [Responsibility and Accountability](/en/ch14#id371), [Privacy and Use of Data](/en/ch14#id457)
|
|
- of time, [Problems with different topologies](/en/ch6#problems-with-different-topologies), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)-[Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
|
|
- of transactions, [Consistency](/en/ch8#sec_transactions_acid_consistency), [Aiming for Correctness](/en/ch13#sec_future_correctness), [Maintaining integrity in the face of software bugs](/en/ch13#id455)
|
|
- timeliness and integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)-[Coordination-avoiding data systems](/en/ch13#id454)
|
|
- corruption of data
|
|
- detecting, [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [Don't just blindly trust what they promise](/en/ch13#id364)-[Tools for auditable data systems](/en/ch13#id366)
|
|
- due to pathological memory access, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
|
|
- due to radiation, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
|
|
- due to split brain, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing)
|
|
- due to weak transaction isolation, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)
|
|
- integrity as absence of, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
|
|
- network packets, [Weak forms of lying](/en/ch9#weak-forms-of-lying)
|
|
- on disks, [Durability](/en/ch8#durability)
|
|
- preventing using write-ahead logs, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
|
|
- recovering from, [Batch Processing](/en/ch11#ch_batch), [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
|
|
- cosine similarity (semantic search), [Vector Embeddings](/en/ch4#id92)
|
|
- Couchbase (database)
|
|
- document data model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
|
|
- durability, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
|
|
- hash sharding, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
|
|
- join support, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
|
|
- rebalancing, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
|
|
- vBuckets (sharding), [Sharding](/en/ch7#ch_sharding)
|
|
- CouchDB (database)
|
|
- as sync engine, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
|
|
- B-tree storage, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
|
|
- conflict resolution, [Manual conflict resolution](/en/ch6#manual-conflict-resolution)
|
|
- coupling (loose and tight), [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability)
|
|
- covering indexes, [Storing values within the index](/en/ch4#sec_storage_index_heap)
|
|
- CozoDB (database), [Datalog: Recursive Relational Queries](/en/ch3#id62)
|
|
- CPUs
|
|
- cache coherence and memory barriers, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
|
|
- caching and pipelining, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
|
|
- computing the wrong result, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
|
|
- SIMD instructions, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
|
|
- crash-stop and crash-recovery faults, [System Model and Reality](/en/ch9#sec_distributed_system_model)
|
|
- CRDTs (see conflict-free replicated datatypes)
|
|
- CREATE INDEX statement (SQL), [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn), [Creating an index](/en/ch13#id340)
|
|
- credit rating agencies, [Responsibility and Accountability](/en/ch14#id371)
|
|
- crypto-shredding, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
|
|
- cryptocurrencies, [Summary](/en/ch3#summary)
|
|
- cryptography
|
|
- defense against attackers, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
|
|
- end-to-end encryption and authentication, [The end-to-end argument](/en/ch13#sec_future_e2e_argument)
|
|
- CSV (comma-separated values), [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp), [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
|
|
- Curator (ZooKeeper recipes), [Locking and leader election](/en/ch10#locking-and-leader-election), [Allocating work to nodes](/en/ch10#allocating-work-to-nodes)
|
|
- Cypher (query language), [The Cypher Query Language](/en/ch3#id57)
|
|
- comparison to SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)
|
|
|
|
### D
|
|
|
|
- Daft (processing framework)
|
|
- DataFrames, [DataFrames](/en/ch11#id287)
|
|
- shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)
|
|
- Dagster (workflow scheduler), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows), [Batch Processing](/en/ch11#ch_batch), [Scheduling Workflows](/en/ch11#sec_batch_workflows)
|
|
- cloud data warehouse integration, [Query languages](/en/ch11#sec_batch_query_lanauges)
|
|
- dashboard (business intelligence), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
|
|
- Dask (processing framework), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
|
|
- data catalog, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
|
|
- data connectors, [Data Warehousing](/en/ch1#sec_introduction_dwh)
|
|
- data contracts, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
|
|
- change data capture, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
|
|
- data corruption (see corruption of data)
|
|
- data cubes, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
|
|
- data engineering, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
|
|
- data fabric, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
|
|
- data formats (see encoding)
|
|
- data infrastructure, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
|
|
- data integration, [Data Integration](/en/ch13#sec_future_integration)-[Unifying batch and stream processing](/en/ch13#id338), [Summary](/en/ch13#id367)
|
|
- batch and stream processing, [Batch and Stream Processing](/en/ch13#sec_future_batch_streaming)-[Unifying batch and stream processing](/en/ch13#id338)
|
|
- maintaining derived state, [Maintaining derived state](/en/ch13#id446)
|
|
- reprocessing data, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
|
|
- unifying, [Unifying batch and stream processing](/en/ch13#id338)
|
|
- by unbundling databases, [Unbundling Databases](/en/ch13#sec_future_unbundling)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
|
|
- comparison to federated databases, [The meta-database of everything](/en/ch13#id341)
|
|
- combining tools by deriving data, [Combining Specialized Tools by Deriving Data](/en/ch13#id442)-[Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
|
|
- derived data versus distributed transactions, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions)
|
|
- limits of total ordering, [The limits of total ordering](/en/ch13#id335)
|
|
- ordering events to capture causality, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
|
|
- reasoning about dataflows, [Reasoning about dataflows](/en/ch13#id443)
|
|
- need for, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
|
|
- using batch processing, [Batch Processing](/en/ch11#ch_batch), [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
|
|
- data lake, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
|
|
- data lakehouse, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Analytics](/en/ch11#sec_batch_olap)
|
|
- data locality (see locality)
|
|
- data mesh, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
|
|
- data minimization, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
|
|
- data models, [Data Models and Query Languages](/en/ch3#ch_datamodels)-[Summary](/en/ch3#summary)
|
|
- DataFrames and arrays, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
|
|
- graph-like models, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)-[GraphQL](/en/ch3#id63)
|
|
- Datalog language, [Datalog: Recursive Relational Queries](/en/ch3#id62)
|
|
- property graphs, [Property Graphs](/en/ch3#id56)
|
|
- RDF and triple-stores, [Triple-Stores and SPARQL](/en/ch3#id59)-[The SPARQL query language](/en/ch3#the-sparql-query-language)
|
|
- relational model versus document model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
|
|
- supporting multiple, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
|
|
- data pipelines, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
|
|
- data products, [Beyond the data lake](/en/ch1#beyond-the-data-lake)
|
|
- data protection regulations (see GDPR)
|
|
- data residence laws, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
|
|
- data science, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
|
|
- data silo, [Data Warehousing](/en/ch1#sec_introduction_dwh)
|
|
- data systems
|
|
- correctness, constraints, and integrity, [Aiming for Correctness](/en/ch13#sec_future_correctness)-[Tools for auditable data systems](/en/ch13#id366)
|
|
- data integration, [Data Integration](/en/ch13#sec_future_integration)-[Unifying batch and stream processing](/en/ch13#id338)
|
|
- goals for using, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
|
|
- heterogeneous, keeping in sync, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
|
|
- maintainability, [Maintainability](/en/ch2#sec_introduction_maintainability)-[Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability)
|
|
- possible faults in, [Transactions](/en/ch8#ch_transactions)
|
|
- reliability, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)-[Humans and Reliability](/en/ch2#id31)
|
|
- hardware faults, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
|
|
- human errors, [Humans and Reliability](/en/ch2#id31)
|
|
- importance of, [Humans and Reliability](/en/ch2#id31)
|
|
- software faults, [Software faults](/en/ch2#software-faults)
|
|
- scalability, [Scalability](/en/ch2#sec_introduction_scalability)-[Principles for Scalability](/en/ch2#id35)
|
|
- unbundling databases, [Unbundling Databases](/en/ch13#sec_future_unbundling)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
|
|
- unreliable clocks, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
|
|
- data warehousing, [Data Warehousing](/en/ch1#sec_introduction_dwh), [Glossary](/en/glossary)
|
|
- cloud-based solutions, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
|
|
- ETL (extract-transform-load), [Data Warehousing](/en/ch1#sec_introduction_dwh), [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
|
|
- for batch processing, [Batch Processing](/en/ch11#ch_batch)
|
|
- keeping data systems in sync, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
|
|
- schema design, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
|
|
- sharding and clustering, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
|
|
- slowly changing dimension (SCD), [Time-dependence of joins](/en/ch12#sec_stream_join_time)
|
|
- data-intensive applications, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
|
|
- database administrator, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
|
|
- database-internal distributed transactions, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal), [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)
|
|
- databases
|
|
- archival storage, [Archival storage](/en/ch5#archival-storage)
|
|
- comparison of message brokers to, [Message brokers compared to databases](/en/ch12#id297)
|
|
- dataflow through, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)
|
|
- end-to-end argument for, [The end-to-end argument](/en/ch13#sec_future_e2e_argument)-[Applying end-to-end thinking in data systems](/en/ch13#id357)
|
|
- checking integrity, [The end-to-end argument again](/en/ch13#id456)
|
|
- relation to event streams, [Databases and Streams](/en/ch12#sec_stream_databases)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
|
|
- (see also changelogs)
|
|
- API support for change streams, [API support for change streams](/en/ch12#sec_stream_change_api), [Separation of application code and state](/en/ch13#id344)
|
|
- change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)-[API support for change streams](/en/ch12#sec_stream_change_api)
|
|
- event sourcing, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
|
|
- keeping systems in sync, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
|
|
- philosophy of immutable events, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
|
|
- unbundling, [Unbundling Databases](/en/ch13#sec_future_unbundling)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
|
|
- composing data storage technologies, [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
|
|
- designing applications around dataflow, [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)-[Stream processors and services](/en/ch13#id345)
|
|
- observing derived state, [Observing Derived State](/en/ch13#sec_future_observing)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
|
|
- datacenters
|
|
- failures of, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
|
|
- geographically distributed (see regions (geographic distribution))
|
|
- multitenancy and shared resources, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
|
|
- network architecture, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
|
|
- network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
|
|
- dataflow, [Modes of Dataflow](/en/ch5#sec_encoding_dataflow)-[Distributed actor frameworks](/en/ch5#distributed-actor-frameworks), [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)-[Stream processors and services](/en/ch13#id345)
|
|
- correctness of dataflow systems, [Correctness of dataflow systems](/en/ch13#id453)
|
|
- dataflow engines, [Dataflow Engines](/en/ch11#sec_batch_dataflow)
|
|
- comparison to stream processing, [Processing Streams](/en/ch12#sec_stream_processing)
|
|
- DataFrames, [DataFrames](/en/ch11#id287)
|
|
- support in batch processing frameworks, [Batch Processing](/en/ch11#ch_batch)
|
|
- event-driven, [Event-Driven Architectures](/en/ch5#sec_encoding_dataflow_msg)-[Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
|
|
- reasoning about, [Reasoning about dataflows](/en/ch13#id443)
|
|
- through databases, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)
|
|
- through services, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
|
|
- workflow engines (see workflow engines)
|
|
- DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
|
|
- implementation, [DataFrames](/en/ch11#id287)
|
|
- in batch processing, [DataFrames](/en/ch11#id287)
|
|
- in notebooks, [Machine Learning](/en/ch11#id290)
|
|
- support in batch processing frameworks, [Batch Processing](/en/ch11#ch_batch)
|
|
- DataFusion (query engine), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
|
|
- Datalog (query language), [Datalog: Recursive Relational Queries](/en/ch3#id62)
|
|
- Datastream (change data capture), [API support for change streams](/en/ch12#sec_stream_change_api)
|
|
- datatypes
|
|
- binary strings in XML and JSON, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
|
|
- conflict-free, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
|
|
- in Avro encodings, [Avro](/en/ch5#sec_encoding_avro)
|
|
- in Protocol Buffers, [Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
|
|
- numbers in XML and JSON, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
|
|
- Datensparsamkeit, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
|
|
- Datomic (database)
|
|
- B-tree storage, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
|
|
- data model, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph), [Triple-Stores and SPARQL](/en/ch3#id59)
|
|
- Datalog query language, [Datalog: Recursive Relational Queries](/en/ch3#id62)
|
|
- excision (deleting data), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
|
|
- languages for transactions, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
|
|
- serial execution of transactions, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
|
|
- Daylight Saving Time (DST), [Time-of-day clocks](/en/ch9#time-of-day-clocks)
|
|
- Db2 (database)
|
|
- change data capture, [Implementing change data capture](/en/ch12#id307)
|
|
- DBA (database administrator), [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
|
|
- deadlocks, [Explicit locking](/en/ch8#explicit-locking)
|
|
- detection, in distributed transactions, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
|
|
- in two-phase locking (2PL), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
|
|
- Debezium (change data capture), [Implementing change data capture](/en/ch12#id307)
|
|
- Cassandra, [API support for change streams](/en/ch12#sec_stream_change_api)
|
|
- for data integration, [Unbundled versus integrated systems](/en/ch13#id448)
|
|
- declarative languages, [Data Models and Query Languages](/en/ch3#ch_datamodels), [Glossary](/en/glossary)
|
|
- and sync engines, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
|
|
- Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62)
|
|
- in document databases, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
|
|
- recursive SQL queries, [Graph Queries in SQL](/en/ch3#id58)
|
|
- SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)
|
|
- DeepSeek
|
|
- 3FS (see 3FS)
|
|
- delays
|
|
- bounded network delays, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
|
|
- bounded process pauses, [Response time guarantees](/en/ch9#sec_distributed_clocks_realtime)
|
|
- unbounded network delays, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
|
|
- unbounded process pauses, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
|
|
- deleting data, [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
|
|
- in LSM storage, [Disk space usage](/en/ch4#disk-space-usage)
|
|
- legal basis, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
|
|
- Delta Lake (table format), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
|
|
- sharding and clustering, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
|
|
- demilitarized zone (networking), [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
|
|
- denormalization (data representation), [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)-[Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many), [Glossary](/en/glossary)
|
|
- in derived data systems, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
|
|
- in event sourcing/CQRS, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
|
|
- in social network case study, [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
|
|
- materialized views, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
|
|
- updating derived data, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object), [The need for multi-object transactions](/en/ch8#sec_transactions_need), [Combining Specialized Tools by Deriving Data](/en/ch13#id442)
|
|
- versus normalization, [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
|
|
- derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Stream Processing](/en/ch12#ch_stream), [Glossary](/en/glossary)
|
|
- batch processing, [Batch Processing](/en/ch11#ch_batch)
|
|
- event sourcing and CQRS, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
|
|
- from change data capture, [Implementing change data capture](/en/ch12#id307)
|
|
- maintaining derived state through logs, [Databases and Streams](/en/ch12#sec_stream_databases)-[API support for change streams](/en/ch12#sec_stream_change_api), [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)-[Concurrency control](/en/ch12#sec_stream_concurrency)
|
|
- observing, by subscribing to streams, [End-to-end event streams](/en/ch13#id349)
|
|
- outputs of batch and stream processing, [Batch and Stream Processing](/en/ch13#sec_future_batch_streaming)
|
|
- through application code, [Application code as a derivation function](/en/ch13#sec_future_dataflow_derivation)
|
|
- versus distributed transactions, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions)
|
|
- design patterns, [Simplicity: Managing Complexity](/en/ch2#id38)
|
|
- deterministic operations, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs), [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure), [Glossary](/en/glossary)
|
|
- and idempotence, [Idempotence](/en/ch12#sec_stream_idempotence), [Reasoning about dataflows](/en/ch13#id443)
|
|
- computing derived data, [Maintaining derived state](/en/ch13#id446), [Correctness of dataflow systems](/en/ch13#id453), [Designing for auditability](/en/ch13#id365)
|
|
- in event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
|
|
- in state machine replication, [Using shared logs](/en/ch10#sec_consistency_smr), [Databases and Streams](/en/ch12#sec_stream_databases)
|
|
- in statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication)
|
|
- in testing, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
|
|
- joins, [Time-dependence of joins](/en/ch12#sec_stream_join_time)
|
|
- making code deterministic, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
|
|
- overview, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
|
|
- deterministic simulation testing (DST), [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
|
|
- DevOps, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
|
|
- dimension tables, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
|
|
- dimensional modeling (see star schemas)
|
|
- directed acyclic graphs (DAG)
|
|
- workflows, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
|
|
- (see also workflow engines)
|
|
- dirty reads (transaction isolation), [No dirty reads](/en/ch8#no-dirty-reads)
|
|
- dirty writes (transaction isolation), [No dirty writes](/en/ch8#sec_transactions_dirty_write)
|
|
- disaggregation
|
|
- of storage and compute, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
|
|
- Discord (group chat)
|
|
- GraphQL example, [GraphQL](/en/ch3#id63)
|
|
- discrimination, [Bias and Discrimination](/en/ch14#id370)
|
|
- disks (see hard disks)
|
|
- distributed actor frameworks, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
|
|
- distributed filesystems, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
|
|
- comparison to object storage, [Object Stores](/en/ch11#id277)
|
|
- use by Flink, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
|
|
- distributed ledgers, [Summary](/en/ch3#summary)
|
|
- distributed systems, [The Trouble with Distributed Systems](/en/ch9#ch_distributed)-[Summary](/en/ch9#summary), [Glossary](/en/glossary)
|
|
- Byzantine faults, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)-[Weak forms of lying](/en/ch9#weak-forms-of-lying)
|
|
- detecting network faults, [Detecting Faults](/en/ch9#id307)
|
|
- faults and partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
|
|
- formalization of consensus, [Single-value consensus](/en/ch10#single-value-consensus)
|
|
- impossibility results, [The CAP theorem](/en/ch10#the-cap-theorem), [Consensus](/en/ch10#sec_consistency_consensus)
|
|
- issues with failover, [Leader failure: Failover](/en/ch6#leader-failure-failover)
|
|
- multi-region (see regions (geographic distribution))
|
|
- network problems, [Unreliable Networks](/en/ch9#sec_distributed_networks)-[Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
|
|
- problems with, [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems)
|
|
- quorums, relying on, [The Majority Rules](/en/ch9#sec_distributed_majority)
|
|
- reasons for using, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Replication](/en/ch6#ch_replication)
|
|
- synchronized clocks, relying on, [Relying on Synchronized Clocks](/en/ch9#sec_distributed_clocks_relying)-[Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
|
|
- system models, [System Model and Reality](/en/ch9#sec_distributed_system_model)-[Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
|
|
- use of clocks and time, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)
|
|
- distributed transactions (see transactions)
|
|
- Django (web framework), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
|
|
- DMZ (demilitarized zone), [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
|
|
- DNS (Domain Name System), [Request Routing](/en/ch7#sec_sharding_routing), [Service discovery](/en/ch10#service-discovery)
|
|
- for load balancing, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
|
|
- Docker (container manager), [Separation of application code and state](/en/ch13#id344)
|
|
- document data model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
|
|
- comparison to relational model, [When to Use Which Model](/en/ch3#sec_datamodels_document_summary)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
|
|
- multi-object transactions, need for, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
|
|
- sharded secondary indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)
|
|
- versus relational model
|
|
- convergence of models, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
|
|
- data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
|
|
- document-partitioned indexes (see local secondary indexes)
|
|
- domain-driven design (DDD), [Simplicity: Managing Complexity](/en/ch2#id38), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
|
|
- dotted version vectors, [Version vectors](/en/ch6#version-vectors)
|
|
- double-entry bookkeeping, [Summary](/en/ch3#summary)
|
|
- DRBD (Distributed Replicated Block Device), [Single-Leader Replication](/en/ch6#sec_replication_leader)
|
|
- drift (clocks), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
|
|
- Druid (database), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Column-Oriented Storage](/en/ch4#sec_storage_column), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
- handling writes, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
- pre-aggregation, [Analytics](/en/ch11#sec_batch_olap)
- serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- Dryad (dataflow engine), [Dataflow Engines](/en/ch11#sec_batch_dataflow)
- dual writes, problems with, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
- DuckDB (database), [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems), [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- use for ETL, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- duplicates, suppression of, [Duplicate suppression](/en/ch13#id354)
- (see also idempotence)
- using a unique ID, [Uniquely identifying requests](/en/ch13#id355), [Multi-shard request processing](/en/ch13#id360)
- durability (transactions), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Durability](/en/ch8#durability), [Glossary](/en/glossary)
- durable execution, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- Restate (see Restate (workflow engine))
- Temporal (see Temporal (workflow engine))
- durable functions (see workflow engines)
- duration (time), [Unreliable Clocks](/en/ch9#sec_distributed_clocks)
- measurement with monotonic clocks, [Monotonic clocks](/en/ch9#monotonic-clocks)
- dynamically typed languages
- analogy to schema-on-read, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- Dynamo (database), [Leaderless Replication](/en/ch6#sec_replication_leaderless)
- Dynamo-style databases (see leaderless replication)
- DynamoDB (database)
- auto-scaling, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- hash-range sharding, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- sharded secondary indexes, [Global Secondary Indexes](/en/ch7#id167)
### E
- EBS (virtual block device), [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- compared to object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- ECC (see error-correcting codes)
- EDB Postgres Distributed (database), [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- edges (in graphs), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- property graph model, [Property Graphs](/en/ch3#id56)
- edit distance (full-text search), [Full-Text Search](/en/ch4#sec_storage_full_text)
- effectively-once semantics, [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance), [Exactly-once execution of an operation](/en/ch13#id353)
- (see also exactly-once semantics)
- preservation of integrity, [Correctness of dataflow systems](/en/ch13#id453)
- Elastic Compute Cloud (EC2)
- spot instances, [Handling Faults](/en/ch11#id281)
- elasticity, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- cloud data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Query languages](/en/ch11#sec_batch_query_lanauges)
- Elasticsearch (search server)
- local secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
- percolator (stream search), [Search on streams](/en/ch12#id320)
- serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- shard rebalancing, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
- use of Lucene, [Full-Text Search](/en/ch4#sec_storage_full_text)
- Elm (programming language), [End-to-end event streams](/en/ch13#id349)
- ELT (extract-load-transform), [Data Warehousing](/en/ch1#sec_introduction_dwh)
- relation to batch processing, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- embarrassingly parallel (algorithms)
- ETL (see ETL (extract-transform-load))
- MapReduce, [MapReduce](/en/ch11#sec_batch_mapreduce)
- (see also MapReduce)
- embedded storage engines, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- embedding (vector), [Vector Embeddings](/en/ch4#id92)
- encodings (data formats), [Encoding and Evolution](/en/ch5#ch_encoding)-[The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- Avro, [Avro](/en/ch5#sec_encoding_avro)-[Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
- binary variants of JSON and XML, [Binary encoding](/en/ch5#binary-encoding)
- compatibility, [Encoding and Evolution](/en/ch5#ch_encoding)
- calling services, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- using databases, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)-[Archival storage](/en/ch5#archival-storage)
- defined, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- JSON, XML, and CSV, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
- language-specific formats, [Language-Specific Formats](/en/ch5#id96)
- merits of schemas, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- Protocol Buffers, [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- representations of data, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- end-to-end argument, [The end-to-end argument](/en/ch13#sec_future_e2e_argument)-[Applying end-to-end thinking in data systems](/en/ch13#id357)
- checking integrity, [The end-to-end argument again](/en/ch13#id456)
- publish/subscribe streams, [End-to-end event streams](/en/ch13#id349)
- enrichment (stream), [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
- Enterprise JavaBeans (EJB), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- enterprise software, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- entities (see vertices)
- ephemeral storage, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- epoch (consensus algorithms), [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- epoch (Unix timestamps), [Time-of-day clocks](/en/ch9#time-of-day-clocks)
- erasure coding (error correction), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- error handling
- for network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
- in transactions, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- error-correcting codes, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- Esper (CEP engine), [Complex event processing](/en/ch12#id317)
- essential complexity, [Simplicity: Managing Complexity](/en/ch2#id38)
- etcd (coordination service), [Coordination Services](/en/ch10#sec_consistency_coordination)-[Service discovery](/en/ch10#service-discovery)
- generating fencing tokens, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens), [Coordination Services](/en/ch10#sec_consistency_coordination)
- linearizable operations, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable), [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
- locks and leader election, [Locking and leader election](/en/ch10#locking-and-leader-election)
- use for service discovery, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery), [Service discovery](/en/ch10#service-discovery)
- use for shard assignment, [Request Routing](/en/ch7#sec_sharding_routing)
- use of Raft algorithm, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- Ethereum (blockchain), [Tools for auditable data systems](/en/ch13#id366)
- Ethernet (networks), [Cloud Computing Versus Supercomputing](/en/ch1#id17), [Unreliable Networks](/en/ch9#sec_distributed_networks), [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- packet checksums, [Weak forms of lying](/en/ch9#weak-forms-of-lying), [The end-to-end argument](/en/ch13#sec_future_e2e_argument)
- ethics, [Doing the Right Thing](/en/ch14)-[Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- code of ethics and professional practice, [Doing the Right Thing](/en/ch14)
- legislation and self-regulation, [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- predictive analytics, [Predictive Analytics](/en/ch14#id369)-[Feedback Loops](/en/ch14#id372)
- amplifying bias, [Bias and Discrimination](/en/ch14#id370)
- feedback loops, [Feedback Loops](/en/ch14#id372)
- privacy and tracking, [Privacy and Tracking](/en/ch14#id373)-[Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- consent and freedom of choice, [Consent and Freedom of Choice](/en/ch14#id375)
- data as assets and power, [Data as Assets and Power](/en/ch14#id376)
- meaning of privacy, [Privacy and Use of Data](/en/ch14#id457)
- surveillance, [Surveillance](/en/ch14#id374)
- respect, dignity, and agency, [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- unintended consequences, [Doing the Right Thing](/en/ch14), [Feedback Loops](/en/ch14#id372)
- ETL (extract-transform-load), [Data Warehousing](/en/ch1#sec_introduction_dwh), [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Glossary](/en/glossary)
- relation to batch processing, [Extract--Transform--Load (ETL)](/en/ch11#sec_batch_etl_usage)
- using batch processing, [Batch Processing](/en/ch11#ch_batch)
- Euclidean distance (semantic search), [Vector Embeddings](/en/ch4#id92)
- European Union
- AI Act (see AI Act)
- GDPR (see GDPR)
- event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- and change data capture, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
- comparison to change data capture, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
- immutability and auditability, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability), [Designing for auditability](/en/ch13#id365)
- large, reliable data systems, [Uniquely identifying requests](/en/ch13#id355), [Correctness of dataflow systems](/en/ch13#id453)
- reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- event streams (see streams)
- event-driven architecture, [Event-Driven Architectures](/en/ch5#sec_encoding_dataflow_msg)-[Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
- distributed actor frameworks, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
- events, [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- deciding on total order of, [The limits of total ordering](/en/ch13#id335)
- deriving views from event log, [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
- event time versus processing time, [Event time versus processing time](/en/ch12#id322), [Microbatching and checkpointing](/en/ch12#id329), [Unifying batch and stream processing](/en/ch13#id338)
- immutable, advantages of, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros), [Designing for auditability](/en/ch13#id365)
- ordering to capture causality, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
- reads as, [Reads are events too](/en/ch13#sec_future_read_events)
- stragglers, [Handling straggler events](/en/ch12#id323)
- timestamp of, in stream processing, [Whose clock are you using, anyway?](/en/ch12#id438)
- EventSource (browser API), [Pushing state changes to clients](/en/ch13#id348)
- EventStoreDB (database), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- eventual consistency, [Replication](/en/ch6#ch_replication), [Problems with Replication Lag](/en/ch6#sec_replication_lag), [Safety and liveness](/en/ch9#sec_distributed_safety_liveness)
- (see also conflicts)
- and perpetual inconsistency, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- strong eventual consistency, [Automatic conflict resolution](/en/ch6#automatic-conflict-resolution)
- evidence
- data used as, [Humans and Reliability](/en/ch2#id31)
- evolvability, [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability), [Encoding and Evolution](/en/ch5#ch_encoding)
- calling services, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- graph-structured data, [Property Graphs](/en/ch3#id56)
- of databases, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)-[Archival storage](/en/ch5#archival-storage), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
- reprocessing data, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing), [Unifying batch and stream processing](/en/ch13#id338)
- schema evolution in Avro, [The writer's schema and the reader's schema](/en/ch5#the-writers-schema-and-the-readers-schema)
- schema evolution in Protocol Buffers, [Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- schema-on-read, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Encoding and Evolution](/en/ch5#ch_encoding), [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- exactly-once semantics, [Exactly-once message processing](/en/ch8#sec_transactions_exactly_once), [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited), [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance), [Exactly-once execution of an operation](/en/ch13#id353)
- parity with batch processors, [Unifying batch and stream processing](/en/ch13#id338)
- preservation of integrity, [Correctness of dataflow systems](/en/ch13#id453)
- using durable execution, [Durable execution](/en/ch5#durable-execution)
- exclusive mode (locks), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- exponential backoff, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- ext4 (filesystem), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- eXtended Architecture transactions (see XA transactions)
- extract-transform-load (see ETL)
### F
- Facebook
- Faiss (vector index), [Vector Embeddings](/en/ch4#id92)
- React (user interface library), [End-to-end event streams](/en/ch13#id349)
- social graphs, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- facts
- fact table (star schema), [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
- in Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62)
- in event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- fail-slow faults, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- fail-stop model, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- failover, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Glossary](/en/glossary)
- (see also leader-based replication)
- in leaderless replication, absence of, [Writing to the Database When a Node Is Down](/en/ch6#id287)
- leader election, [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing), [Consensus](/en/ch10#sec_consistency_consensus), [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- potential problems, [Leader failure: Failover](/en/ch6#leader-failure-failover)
- failures
- amplification by distributed transactions, [Maintaining derived state](/en/ch13#id446)
- failure detection, [Detecting Faults](/en/ch9#id307)
- automatic rebalancing causing cascading failures, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- timeouts and unbounded delays, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing), [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- using a coordination service, [Coordination Services](/en/ch10#sec_consistency_coordination)
- faults versus, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)
- partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure), [Summary](/en/ch9#summary)
- Faiss (vector index), [Vector Embeddings](/en/ch4#id92)
- false positive (Bloom filters), [Bloom filters](/en/ch4#bloom-filters)
- fan-out (messaging systems), [Materializing and Updating Timelines](/en/ch2#sec_introduction_materializing), [Multiple consumers](/en/ch12#id298)
- fault injection, [Fault Tolerance](/en/ch2#id27), [Network Faults in Practice](/en/ch9#sec_distributed_network_faults), [Fault injection](/en/ch9#sec_fault_injection)
- fault isolation, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- fault tolerance, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)-[Humans and Reliability](/en/ch2#id31), [Glossary](/en/glossary)
- formalization in consensus, [Single-value consensus](/en/ch10#single-value-consensus)
- human fault tolerance, [Batch Processing](/en/ch11#ch_batch)
- in batch processing, [Handling Faults](/en/ch11#id281)
- in log-based systems, [Applying end-to-end thinking in data systems](/en/ch13#id357), [Timeliness and Integrity](/en/ch13#sec_future_integrity)-[Correctness of dataflow systems](/en/ch13#id453)
- in stream processing, [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance)-[Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- atomic commit, [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)
- idempotence, [Idempotence](/en/ch12#sec_stream_idempotence)
- maintaining derived state, [Maintaining derived state](/en/ch13#id446)
- microbatching and checkpointing, [Microbatching and checkpointing](/en/ch12#id329)
- rebuilding state after a failure, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- of distributed transactions, [XA transactions](/en/ch8#xa-transactions)-[Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
- of leader-based and leaderless replication, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- transaction atomicity, [Atomicity](/en/ch8#sec_transactions_acid_atomicity), [Distributed Transactions](/en/ch8#sec_transactions_distributed)-[Exactly-once message processing](/en/ch8#sec_transactions_exactly_once)
- faults
- Byzantine faults, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)-[Weak forms of lying](/en/ch9#weak-forms-of-lying)
- failures versus, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)
- handled by transactions, [Transactions](/en/ch8#ch_transactions)
- handling in supercomputers and cloud computing, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- hardware, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
- in distributed systems, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
- introducing deliberately (see fault injection)
- network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)-[Detecting Faults](/en/ch9#id307)
- asymmetric faults, [The Majority Rules](/en/ch9#sec_distributed_majority)
- detecting, [Detecting Faults](/en/ch9#id307)
- tolerance of, in multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- software faults, [Software faults](/en/ch2#software-faults)
- tolerating (see fault tolerance)
- feature engineering (machine learning), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- federated databases, [The meta-database of everything](/en/ch13#id341)
- Feldera (database)
- incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- fence (CPU instruction), [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- fencing (preventing split brain), [Leader failure: Failover](/en/ch6#leader-failure-failover), [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)-[Fencing with multiple replicas](/en/ch9#fencing-with-multiple-replicas)
- generating fencing tokens, [Using shared logs](/en/ch10#sec_consistency_smr), [Coordination Services](/en/ch10#sec_consistency_coordination)
- properties of fencing tokens, [Defining the correctness of an algorithm](/en/ch9#defining-the-correctness-of-an-algorithm)
- stream processors writing to databases, [Idempotence](/en/ch12#sec_stream_idempotence), [Exactly-once execution of an operation](/en/ch13#id353)
- fetch-and-add
- relation to consensus, [Fetch-and-add as consensus](/en/ch10#fetch-and-add-as-consensus)
- Fibre Channel (networks), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- field tags (Protocol Buffers), [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- Figma (graphics software), [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- filesystem in userspace (FUSE), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- on object storage, [Object Stores](/en/ch11#id277)
- financial data
- accounting ledgers, [Summary](/en/ch3#summary)
- immutability, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
- time series data, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- Fivetran, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- FizzBee (specification language), [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- flat index (vector index), [Vector Embeddings](/en/ch4#id92)
- FlatBuffers (data format), [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- Flink (processing framework), [Batch Processing](/en/ch11#ch_batch), [Dataflow Engines](/en/ch11#sec_batch_dataflow)
- cost efficiency, [Query languages](/en/ch11#sec_batch_query_lanauges)
- DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [DataFrames](/en/ch11#id287)
- fault tolerance, [Handling Faults](/en/ch11#id281), [Microbatching and checkpointing](/en/ch12#id329), [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- FlinkML, [Machine Learning](/en/ch11#id290)
- for data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- high availability using ZooKeeper, [Coordination Services](/en/ch10#sec_consistency_coordination)
- integration of batch and stream processing, [Unifying batch and stream processing](/en/ch13#id338)
- query optimizer, [Query languages](/en/ch11#sec_batch_query_lanauges)
- shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)
- stream processing, [Stream analytics](/en/ch12#id318)
- streaming SQL support, [Complex event processing](/en/ch12#id317)
- flow control, [The Limitations of TCP](/en/ch9#sec_distributed_tcp), [Messaging Systems](/en/ch12#sec_stream_messaging), [Glossary](/en/glossary)
- FLP result (on consensus), [Consensus](/en/ch10#sec_consistency_consensus)
- Flyte (workflow scheduler), [Machine Learning](/en/ch11#id290)
- followers, [Single-Leader Replication](/en/ch6#sec_replication_leader), [Glossary](/en/glossary)
- (see also leader-based replication)
- formal methods, [Formal Methods and Randomized Testing](/en/ch9#sec_distributed_formal)-[Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- forward compatibility, [Encoding and Evolution](/en/ch5#ch_encoding)
- forward decay (algorithm), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- Fossil (version control system), [Concurrency control](/en/ch12#sec_stream_concurrency)
- shunning (deleting data), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
- FoundationDB (database)
- consistency model, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- deterministic simulation testing, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- process-per-core model, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- serializable transactions, [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi), [Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
- transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
- fractional indexing, [When to Use Which Model](/en/ch3#sec_datamodels_document_summary)
- fragmentation (of B-trees), [Disk space usage](/en/ch4#disk-space-usage)
- frame (computer graphics), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- frontend (web development), [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- FrostDB (database)
- deterministic simulation testing (DST), [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- fsync (system call), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Durability](/en/ch8#durability)
- full-text search, [Full-Text Search](/en/ch4#sec_storage_full_text), [Glossary](/en/glossary)
- and fuzzy indexes, [Full-Text Search](/en/ch4#sec_storage_full_text)
- Lucene storage engine, [Full-Text Search](/en/ch4#sec_storage_full_text)
- sharded indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)
- Function as a Service (FaaS), [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- functional programming
- inspiration for MapReduce, [MapReduce](/en/ch11#sec_batch_mapreduce)
- functional requirements, [Defining Nonfunctional Requirements](/en/ch2#ch_nonfunctional)
- FUSE (see filesystem in userspace (FUSE))
- fuzzing, [Formal Methods and Randomized Testing](/en/ch9#sec_distributed_formal)
- fuzzy search (see similarity search)
### G
- Gallina (specification language), [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- game development, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- garbage collection
- immutability and, [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
- process pauses for, [Latency and Response Time](/en/ch2#id23), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact), [The Majority Rules](/en/ch9#sec_distributed_majority)
- (see also process pauses)
- gas stations, algorithmic pricing, [Feedback Loops](/en/ch14#id372)
- GDPR (regulation), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
- consent, [Consent and Freedom of Choice](/en/ch14#id375)
- data minimization, [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- legitimate interest, [Consent and Freedom of Choice](/en/ch14#id375)
- right of access, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- right to erasure, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Disk space usage](/en/ch4#disk-space-usage), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- GenBank (genome database), [Summary](/en/ch3#summary)
- General Data Protection Regulation (see GDPR (regulation))
- genome analysis, [Summary](/en/ch3#summary)
- geographic distribution (see regions (geographic distribution))
- geospatial indexes, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- Git (version control system), [Concurrency control](/en/ch12#sec_stream_concurrency)
- local-first software, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- merge conflicts, [Manual conflict resolution](/en/ch6#manual-conflict-resolution)
- GitHub, postmortems, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world)
- global secondary indexes, [Global Secondary Indexes](/en/ch7#id167), [Summary](/en/ch7#summary)
- globally unique identifiers (see UUIDs)
- GlusterFS (distributed filesystem), [Batch Processing](/en/ch11#ch_batch), [Distributed Filesystems](/en/ch11#sec_batch_dfs), [Object Stores](/en/ch11#id277)
- GNU Coreutils (Linux), [Sorting Versus In-memory Aggregation](/en/ch11#id275)
- Go (programming language)
- garbage collection, [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
- GoldenGate (change data capture), [Implementing change data capture](/en/ch12#id307)
- (see also Oracle)
- Google
- BigQuery (see BigQuery (database))
- Bigtable (see Bigtable (database))
- Chubby (lock service), [Coordination Services](/en/ch10#sec_consistency_coordination)
- Cloud Storage (object storage), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Object Stores](/en/ch11#id277)
- request preconditions, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- Compute Engine
- preemptible instances, [Handling Faults](/en/ch11#id281)
- Dataflow (stream processing)
- data warehouse integration, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)
- Dataflow (stream processor), [Stream analytics](/en/ch12#id318), [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit), [Unifying batch and stream processing](/en/ch13#id338)
- (see also Beam)
- Datastream (change data capture), [API support for change streams](/en/ch12#sec_stream_change_api)
- Docs (collaborative editor), [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps), [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
|
|
- operational transformation, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
|
|
- Dremel (query engine), [Column-Oriented Storage](/en/ch4#sec_storage_column)
|
|
- Firestore (database), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
|
|
- MapReduce (batch processing), [Batch Processing](/en/ch11#ch_batch)
|
|
- (see also MapReduce)
|
|
- Percolator (transaction system), [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator)
|
|
- persistent disks (cloud service), [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
|
|
- Pub/Sub (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297), [Using logs for message storage](/en/ch12#id300)
|
|
- response time study, [Average, Median, and Percentiles](/en/ch2#id24)
|
|
- Sheets (collaborative spreadsheet), [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps), [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
|
|
- Spanner (see Spanner (database))
|
|
- TrueTime (clock API), [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval)
|
|
- gossip protocol, [Request Routing](/en/ch7#sec_sharding_routing)
- governance, [Beyond the data lake](/en/ch1#beyond-the-data-lake)
- government use of data, [Data as Assets and Power](/en/ch14#id376)
- GPS (Global Positioning System)
    - use for clock synchronization, [Unreliable Clocks](/en/ch9#sec_distributed_clocks), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy), [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- GPT (language model), [Vector Embeddings](/en/ch4#id92)
- GPU (graphics processing unit), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- gradual rollout (see rolling upgrades)
- GraphQL (query language), [GraphQL](/en/ch3#id63)
    - validation, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- graphs, [Glossary](/en/glossary)
    - as data models, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)-[GraphQL](/en/ch3#id63)
        - property graphs, [Property Graphs](/en/ch3#id56)
        - RDF and triple-stores, [Triple-Stores and SPARQL](/en/ch3#id59)-[The SPARQL query language](/en/ch3#the-sparql-query-language)
    - DAGs (see directed acyclic graphs)
    - processing and analysis, [Machine Learning](/en/ch11#id290)
    - query languages
        - Cypher, [The Cypher Query Language](/en/ch3#id57)
        - Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62)
        - GraphQL, [GraphQL](/en/ch3#id63)
        - Gremlin, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
        - recursive SQL queries, [Graph Queries in SQL](/en/ch3#id58)
        - SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)
    - traversal, [Property Graphs](/en/ch3#id56)
- gray failures, [System Model and Reality](/en/ch9#sec_distributed_system_model)
    - in leaderless replication, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- Gremlin (graph query language), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- grep (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
- gRPC (service calls), [Microservices and Serverless](/en/ch1#sec_introduction_microservices), [Web services](/en/ch5#sec_web_services)
    - forward and backward compatibility, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- GUIDs (see UUIDs)

### H

- Hadoop (data infrastructure)
    - comparison to distributed databases, [Batch Processing](/en/ch11#ch_batch)
    - MapReduce (see MapReduce)
    - NodeManager, [Distributed Job Orchestration](/en/ch11#id278)
    - YARN (see YARN (job scheduler))
- HANA (see SAP HANA (database))
- happens-before relation, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)
- hard disks
    - access patterns, [Sequential versus random writes](/en/ch4#sidebar_sequential)
    - detecting corruption, [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [Don't just blindly trust what they promise](/en/ch13#id364)
    - faults in, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults), [Durability](/en/ch8#durability)
    - sequential vs. random writes, [Sequential versus random writes](/en/ch4#sidebar_sequential)
    - sequential write throughput, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- hardware faults, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
- hash function
    - in Bloom filters, [Bloom filters](/en/ch4#bloom-filters)
- hash join
    - in stream processing, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
- hash sharding, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash)-[Consistent hashing](/en/ch7#sec_sharding_consistent_hashing), [Summary](/en/ch7#summary)
    - consistent hashing, [Consistent hashing](/en/ch7#sec_sharding_consistent_hashing)
    - problems with hash mod N, [Hash modulo number of nodes](/en/ch7#hash-modulo-number-of-nodes)
    - range queries, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
    - suitable hash functions, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash)
    - with fixed number of shards, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
- hash tables, [Log-Structured Storage](/en/ch4#sec_storage_log_structured)
- Hazelcast (in-memory data grid)
    - FencedLock, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
    - Flake ID Generator, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)
- HBase (database)
    - bug due to lack of fencing, [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing)
    - key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
    - log-structured storage, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
    - regions (sharding), [Sharding](/en/ch7#ch_sharding)
    - request routing, [Request Routing](/en/ch7#sec_sharding_routing)
    - size-tiered compaction, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
    - wide-column data model, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Column Compression](/en/ch4#sec_storage_column_compression)
- HDFS (Hadoop Distributed File System), [Batch Processing](/en/ch11#ch_batch), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
    - (see also distributed filesystems)
    - checking data integrity, [Don't just blindly trust what they promise](/en/ch13#id364)
    - DataNode, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
    - NameNode, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
    - use in MapReduce, [MapReduce](/en/ch11#sec_batch_mapreduce)
    - workflow example, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- HdrHistogram (numerical library), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- head (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis), [Distributed Job Orchestration](/en/ch11#id278)
- head vertex (property graphs), [Property Graphs](/en/ch3#id56)
- head-of-line blocking, [Latency and Response Time](/en/ch2#id23)
- heap files (databases), [Storing values within the index](/en/ch4#sec_storage_index_heap)
    - in multiversion concurrency control, [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl)
- heat management, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- hedged requests, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- heterogeneous distributed transactions, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa), [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
- heuristic decisions (in 2PC), [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure)
- Hex (notebook), [Machine Learning](/en/ch11#id290)
- hexagons
    - for geospatial indexing, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- Hibernate (object-relational mapper), [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm)
- hierarchical model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
- hierarchical navigable small world (vector index), [Vector Embeddings](/en/ch4#id92)
- hierarchical queries (see recursive common table expressions)
- high availability (see fault tolerance)
- high-frequency trading, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- high-performance computing (HPC), [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- hinted handoff (leaderless replication), [Catching up on missed writes](/en/ch6#sec_replication_read_repair)
- histograms, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- Hive (data warehouse), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
    - query optimizer, [Query languages](/en/ch11#sec_batch_query_lanauges)
- HNSW (vector index), [Vector Embeddings](/en/ch4#id92)
- hopping windows (stream processing), [Types of windows](/en/ch12#id324)
    - (see also windows)
- Hoptimator (query engine), [The meta-database of everything](/en/ch13#id341)
- Horizon scandal, [Humans and Reliability](/en/ch2#id31)
    - lack of transactions, [Transactions](/en/ch8#ch_transactions)
- horizontal scaling (see scaling out)
    - by sharding, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- HornetQ (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297)
    - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
- hot keys, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
- hot spots, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
    - due to celebrities, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
    - for time-series data, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
    - relieving, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- hot standbys (see leader-based replication)
- HTAP (see hybrid transactional/analytic processing)
- HTTP, use in APIs (see services)
- human errors, [Humans and Reliability](/en/ch2#id31), [Network Faults in Practice](/en/ch9#sec_distributed_network_faults), [Batch Processing](/en/ch11#ch_batch)
- hybrid logical clocks, [Hybrid logical clocks](/en/ch10#hybrid-logical-clocks)
- hybrid transactional/analytic processing, [Data Warehousing](/en/ch1#sec_introduction_dwh), [Data Storage for Analytics](/en/ch4#sec_storage_analytics)
- hydrating IDs (join), [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
- hypergraph, [Property Graphs](/en/ch3#id56)
- HyperLogLog (algorithm), [Stream analytics](/en/ch12#id318)

### I

- I/O operations, waiting for, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- IaaS (see infrastructure as a service (IaaS))
- IBM
    - Db2 (database)
        - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
        - serializable isolation, [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
    - MQ (messaging), [Message brokers compared to databases](/en/ch12#id297)
        - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
    - System R (database), [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview)
    - WebSphere (messaging), [Message brokers](/en/ch5#message-brokers)
- Iceberg (table format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
    - databases on object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
    - log-based message broker storage, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- idempotence, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc), [Idempotence](/en/ch12#sec_stream_idempotence), [Glossary](/en/glossary)
    - by giving operations unique IDs, [Multi-shard request processing](/en/ch13#id360)
    - by giving requests unique IDs, [Uniquely identifying requests](/en/ch13#id355)
    - for exactly-once semantics, [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
    - idempotent operations, [Exactly-once execution of an operation](/en/ch13#id353)
    - in workflow engines, [Durable execution](/en/ch5#durable-execution)
- immutability
    - advantages of, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros), [Designing for auditability](/en/ch13#id365)
    - and right to erasure, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Disk space usage](/en/ch4#disk-space-usage)
    - crypto-shredding for deletion, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
    - deriving state from event log, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
    - for crash recovery, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
    - in B-trees, [B-tree variants](/en/ch4#b-tree-variants), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
    - in event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
    - limitations of, [Concurrency control](/en/ch12#sec_stream_concurrency)
- impedance mismatch, [The Object-Relational Mismatch](/en/ch3#sec_datamodels_document)
- in doubt (transaction status), [Coordinator failure](/en/ch8#coordinator-failure)
    - holding locks, [Holding locks while in doubt](/en/ch8#holding-locks-while-in-doubt)
    - orphaned transactions, [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure)
- in-memory databases, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
    - durability, [Durability](/en/ch8#durability)
    - serial transaction execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
- incidents
    - accounting software bugs leading to wrongful convictions, [Humans and Reliability](/en/ch2#id31)
    - blameless postmortems, [Humans and Reliability](/en/ch2#id31)
    - crashes due to leap seconds, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
    - data corruption and financial losses due to concurrency bugs, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)
    - data corruption on hard disks, [Durability](/en/ch8#durability)
    - data loss due to last-write-wins, [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
    - data on disks unreadable, [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world)
    - disclosure of sensitive data due to primary key reuse, [Leader failure: Failover](/en/ch6#leader-failure-failover)
    - errors in transaction serializability, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
    - gigabit network interface with 1 Kb/s throughput, [System Model and Reality](/en/ch9#sec_distributed_system_model)
    - leap second crash, [Software faults](/en/ch2#software-faults)
    - network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
    - network interface dropping only inbound packets, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
    - network partitions and whole-datacenter failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
    - poor handling of network faults, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
    - sending message to ex-partner, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
    - sharks biting undersea cables, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
    - split brain due to 1-minute packet delay, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
    - SSD failure after 32,768 hours, [Software faults](/en/ch2#software-faults)
    - thread contention bringing down a service, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
    - vibrations in server rack, [Latency and Response Time](/en/ch2#id23)
    - violation of uniqueness constraint, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
- incremental view maintenance (IVM), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
    - for data integration, [Unbundled versus integrated systems](/en/ch13#id448)
- indexes, [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp), [Glossary](/en/glossary)
    - and snapshot isolation, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
    - as derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
    - B-trees, [B-Trees](/en/ch4#sec_storage_b_trees)-[B-tree variants](/en/ch4#b-tree-variants)
    - clustered, [Storing values within the index](/en/ch4#sec_storage_index_heap)
    - comparison of B-trees and LSM-trees, [Comparing B-Trees and LSM-Trees](/en/ch4#sec_storage_btree_lsm_comparison)-[Disk space usage](/en/ch4#disk-space-usage)
    - covering (with included columns), [Storing values within the index](/en/ch4#sec_storage_index_heap)
    - creating, [Creating an index](/en/ch13#id340)
    - full-text search, [Full-Text Search](/en/ch4#sec_storage_full_text)
    - geospatial, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
    - index-range locking, [Index-range locks](/en/ch8#sec_transactions_2pl_range)
    - multi-column (concatenated), [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
    - secondary, [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn)
        - (see also secondary indexes)
        - problems with dual writes, [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Reasoning about dataflows](/en/ch13#id443)
    - sharding and secondary indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)-[Global Secondary Indexes](/en/ch7#id167), [Summary](/en/ch7#summary)
    - sparse, [The SSTable file format](/en/ch4#the-sstable-file-format)
    - SSTables and LSM-trees, [The SSTable file format](/en/ch4#the-sstable-file-format)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
    - updating when data changes, [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- Industrial Revolution, [Remembering the Industrial Revolution](/en/ch14#id377)
- InfiniBand (networks), [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- InfluxDB IOx (storage engine), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- information retrieval (see full-text search)
- infrastructure as a service (IaaS), [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud), [Layering of cloud services](/en/ch1#layering-of-cloud-services)
- InnoDB (storage engine)
    - clustered index on primary key, [Storing values within the index](/en/ch4#sec_storage_index_heap)
    - not preventing lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates)
    - preventing write skew, [Characterizing write skew](/en/ch8#characterizing-write-skew), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
    - serializable isolation, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
    - snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- instance (cloud computing), [Layering of cloud services](/en/ch1#layering-of-cloud-services)
- integrating different data systems (see data integration)
- integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
    - coordination-avoiding data systems, [Coordination-avoiding data systems](/en/ch13#id454)
    - correctness of dataflow systems, [Correctness of dataflow systems](/en/ch13#id453)
    - in consensus formalization, [Single-value consensus](/en/ch10#single-value-consensus), [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
    - integrity checks, [Don't just blindly trust what they promise](/en/ch13#id364)
        - (see also auditing)
        - end-to-end, [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [The end-to-end argument again](/en/ch13#id456)
        - use of snapshot isolation, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
    - maintaining despite software bugs, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
- Interface Definition Language (IDL), [Protocol Buffers](/en/ch5#sec_encoding_protobuf), [Avro](/en/ch5#sec_encoding_avro), [Web services](/en/ch5#sec_web_services)
- invariants, [Consistency](/en/ch8#sec_transactions_acid_consistency)
    - (see also constraints)
- inverted file index (vector index), [Vector Embeddings](/en/ch4#id92)
- inverted index, [Full-Text Search](/en/ch4#sec_storage_full_text)
- irreversibility, minimizing, [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events), [Batch Processing](/en/ch11#ch_batch)
- ISDN (Integrated Services Digital Network), [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
- isolation (in operating systems)
    - cgroups (see cgroups)
- isolation (in transactions), [Isolation](/en/ch8#sec_transactions_acid_isolation), [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object), [Glossary](/en/glossary)
    - correctness and, [Aiming for Correctness](/en/ch13#sec_future_correctness)
    - for single-object writes, [Single-object writes](/en/ch8#sec_transactions_single_object)
    - serializability, [Serializability](/en/ch8#sec_transactions_serializability)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
        - actual serial execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)-[Summary of serial execution](/en/ch8#summary-of-serial-execution)
        - serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
        - two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
    - violating, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
    - weak isolation levels, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)-[Materializing conflicts](/en/ch8#materializing-conflicts)
        - preventing lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
        - read committed, [Read Committed](/en/ch8#sec_transactions_read_committed)-[Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
        - snapshot isolation, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)-[Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- IVF (vector index), [Vector Embeddings](/en/ch4#id92)

### J

- Java Database Connectivity (JDBC)
    - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
    - network drivers, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- Java Enterprise Edition (EE), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc), [XA transactions](/en/ch8#xa-transactions)
- Java Message Service (JMS), [Message brokers compared to databases](/en/ch12#id297)
    - (see also messaging systems)
    - comparison to log-based messaging, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay)
    - distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
    - message ordering, [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- Java Transaction API (JTA), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc), [XA transactions](/en/ch8#xa-transactions)
- Java Virtual Machine (JVM)
    - garbage collection, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses), [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
    - JIT compilation, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
    - process reuse in batch processors, [Dataflow Engines](/en/ch11#sec_batch_dataflow)
- Jena (RDF framework), [The RDF data model](/en/ch3#the-rdf-data-model)
    - SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- Jepsen (fault tolerance testing), [Fault injection](/en/ch9#sec_fault_injection), [Aiming for Correctness](/en/ch13#sec_future_correctness)
- jitter (network delay), [Average, Median, and Percentiles](/en/ch2#id24), [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- JMESPath (query language), [Query languages](/en/ch11#sec_batch_query_lanauges)
- join table, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many), [Property Graphs](/en/ch3#id56)
- joins, [Glossary](/en/glossary)
    - expressing as relational operators, [Query languages](/en/ch11#sec_batch_query_lanauges)
    - handling GraphQL query, [GraphQL](/en/ch3#id63)
    - in application code, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization), [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
    - in DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
    - in relational and document databases, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)
    - secondary indexes and, [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn)
    - sort-merge joins, [JOIN and GROUP BY](/en/ch11#sec_batch_join)
    - stream joins, [Stream Joins](/en/ch12#sec_stream_joins)-[Time-dependence of joins](/en/ch12#sec_stream_join_time)
        - stream-stream join, [Stream-stream join (window join)](/en/ch12#id440)
        - stream-table join, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
        - table-table join, [Table-table join (materialized view maintenance)](/en/ch12#id326)
        - time-dependence of, [Time-dependence of joins](/en/ch12#sec_stream_join_time)
    - support in document databases, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
- JOTM (transaction coordinator), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
- journaling (filesystems), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
- JSON
    - aggregation pipeline (query language), [Query languages for documents](/en/ch3#query-languages-for-documents)
    - Avro schema representation, [Avro](/en/ch5#sec_encoding_avro)
    - binary variants, [Binary encoding](/en/ch5#binary-encoding)
    - data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
    - document data model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
    - for application data, issues with, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
    - GraphQL response, [GraphQL](/en/ch3#id63)
    - in relational databases, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
    - representing a résumé (example), [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships)
    - Schema, [JSON Schema](/en/ch5#json-schema)
- JSON-LD, [Triple-Stores and SPARQL](/en/ch3#id59)
- JsonPath (query language), [Query languages](/en/ch11#sec_batch_query_lanauges)
|
|
- JuiceFS (distributed filesystem), [Distributed Filesystems](/en/ch11#sec_batch_dfs), [Object Stores](/en/ch11#id277)
|
|
- Jupyter (notebook), [Machine Learning](/en/ch11#id290)
|
|
- just-in-time (JIT) compilation, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
|
|
|
|
### K
|
|
|
|
- Kafka (messaging), [Message brokers](/en/ch5#message-brokers), [Using logs for message storage](/en/ch12#id300)
|
|
- consumer groups, [Multiple consumers](/en/ch12#id298)
|
|
- for data integration, [Unbundled versus integrated systems](/en/ch13#id448)
|
|
- for event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
|
|
- Kafka Connect (database integration), [Implementing change data capture](/en/ch12#id307), [API support for change streams](/en/ch12#sec_stream_change_api), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
|
|
- Kafka Streams (stream processor), [Stream analytics](/en/ch12#id318), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
|
|
- exactly-once semantics, [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
|
|
- fault tolerance, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
|
|
- ksqlDB (stream database), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
|
|
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
|
|
- log compaction, [Log compaction](/en/ch12#sec_stream_log_compaction), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
|
|
- message offsets, [Using logs for message storage](/en/ch12#id300), [Idempotence](/en/ch12#sec_stream_idempotence)
|
|
- partitions (sharding), [Sharding](/en/ch7#ch_sharding)
|
|
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)
|
|
- schema registry, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
|
|
- serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
|
|
- tiered storage, [Disk space usage](/en/ch12#sec_stream_disk_usage)
|
|
- transactions, [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal), [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)
|
|
- unclean leader election, [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
- use of model-checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- kappa architecture, [Unifying batch and stream processing](/en/ch13#id338)
- key-value stores, [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp)
- comparison to object stores, [Object Stores](/en/ch11#id277)
- in-memory, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- LSM storage, [Log-Structured Storage](/en/ch4#sec_storage_log_structured)-[Disk space usage](/en/ch4#disk-space-usage)
- sharding, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)-[Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- by hash of key, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash), [Summary](/en/ch7#summary)
- by key range, [Sharding by Key Range](/en/ch7#sec_sharding_key_range), [Summary](/en/ch7#summary)
- skew and hot spots, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- Kinesis (messaging), [Message brokers](/en/ch5#message-brokers), [Using logs for message storage](/en/ch12#id300)
- data warehouse integration, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- Kryo (Java), [Language-Specific Formats](/en/ch5#id96)
- ksqlDB (stream database), [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- Kubernetes (cluster manager), [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud), [Microservices and Serverless](/en/ch1#sec_introduction_microservices), [Distributed Job Orchestration](/en/ch11#id278), [Separation of application code and state](/en/ch13#id344)
- Kubeflow, [Machine Learning](/en/ch11#id290)
- kubelet, [Distributed Job Orchestration](/en/ch11#id278)
- operators, [Distributed Job Orchestration](/en/ch11#id278)
- use of etcd, [Request Routing](/en/ch7#sec_sharding_routing), [Coordination Services](/en/ch10#sec_consistency_coordination)
- KùzuDB (database), [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- as embedded storage engine, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- Cypher query language, [The Cypher Query Language](/en/ch3#id57)

### L

- labeled property graphs (see property graphs)
- lambda architecture, [Unifying batch and stream processing](/en/ch13#id338)
- Lamport timestamps, [Lamport timestamps](/en/ch10#lamport-timestamps)
- Lance (data format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- (see also column-oriented storage)
- large language models (LLMs)
- pre-processing training data, [Machine Learning](/en/ch11#id290)
- last write wins (LWW), [Last write wins (discarding concurrent writes)](/en/ch6#sec_replication_lww), [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent), [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- problems with, [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
- prone to lost updates, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- latency, [Latency and Response Time](/en/ch2#id23)
- (see also response time)
- across regions, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- instability under two-phase locking, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
- network latency and resource utilization, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- reducing by request hedging, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- response time versus, [Latency and Response Time](/en/ch2#id23)
- tail latency, [Average, Median, and Percentiles](/en/ch2#id24), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla), [Local Secondary Indexes](/en/ch7#id166)
- law (see legal matters)
- layering (of cloud services), [Layering of cloud services](/en/ch1#layering-of-cloud-services)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)-[Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- (see also replication)
- failover, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing)
- handling node outages, [Handling Node Outages](/en/ch6#sec_replication_failover)
- implementation of replication logs
- change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)-[API support for change streams](/en/ch12#sec_stream_change_api)
- (see also changelogs)
- statement-based, [Statement-based replication](/en/ch6#statement-based-replication)
- write-ahead log (WAL) shipping, [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping)
- linearizability of operations, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- locking and leader election, [Locking and leader election](/en/ch10#locking-and-leader-election)
- log sequence number, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Consumer offsets](/en/ch12#sec_stream_log_offsets)
- read-scaling architecture, [Problems with Replication Lag](/en/ch6#sec_replication_lag), [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- relation to consensus, [Consensus](/en/ch10#sec_consistency_consensus), [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus), [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
- setting up new followers, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- synchronous versus asynchronous, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async)
- leaderless replication, [Leaderless Replication](/en/ch6#sec_replication_leaderless)-[Version vectors](/en/ch6#version-vectors)
- (see also replication)
- catching up on missed writes, [Catching up on missed writes](/en/ch6#sec_replication_read_repair)
- detecting concurrent writes, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)-[Version vectors](/en/ch6#version-vectors)
- version vectors, [Version vectors](/en/ch6#version-vectors)
- multi-region, [Multi-region operation](/en/ch6#multi-region-operation)
- quorums, [Quorums for reading and writing](/en/ch6#sec_replication_quorum_condition)-[Multi-region operation](/en/ch6#multi-region-operation)
- consistency limitations, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations)-[Monitoring staleness](/en/ch6#monitoring-staleness), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- leap seconds, [Software faults](/en/ch2#software-faults), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- in time-of-day clocks, [Time-of-day clocks](/en/ch9#time-of-day-clocks)
- leases, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- implementation with coordination service, [Coordination Services](/en/ch10#sec_consistency_coordination)
- need for fencing, [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing)
- relation to consensus, [Single-value consensus](/en/ch10#single-value-consensus)
- ledgers (accounting), [Summary](/en/ch3#summary)
- immutability, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
- legacy systems, maintenance of, [Maintainability](/en/ch2#sec_introduction_maintainability)
- legal matters, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- data deletion, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Disk space usage](/en/ch4#disk-space-usage)
- data residence, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- privacy regulation, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- legitimate interest (GDPR), [Consent and Freedom of Choice](/en/ch14#id375)
- leveled compaction, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [Disk space usage](/en/ch4#disk-space-usage)
- Levenshtein automata, [Full-Text Search](/en/ch4#sec_storage_full_text)
- limping (partial failure), [System Model and Reality](/en/ch9#sec_distributed_system_model)
- Linear (project management software), [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- linear algebra, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- linear scalability, [Describing Load](/en/ch2#id33)
- linearizability, [Solutions for Replication Lag](/en/ch6#id131), [Linearizability](/en/ch10#sec_consistency_linearizability)-[Linearizability and network delays](/en/ch10#linearizability-and-network-delays), [Glossary](/en/glossary)
- and consensus, [Consensus](/en/ch10#sec_consistency_consensus)
- cost of, [The Cost of Linearizability](/en/ch10#sec_linearizability_cost)-[Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- CAP theorem, [The CAP theorem](/en/ch10#the-cap-theorem)
- memory on multi-core CPUs, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- definition, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- ID generation, [Linearizable ID Generators](/en/ch10#sec_consistency_linearizable_id)
- in coordination services, [Coordination Services](/en/ch10#sec_consistency_coordination)
- of derived data systems
- avoiding coordination, [Coordination-avoiding data systems](/en/ch13#id454)
- of different replication methods, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)-[Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- using quorums, [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- reads in consensus systems, [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
- relying on, [Relying on Linearizability](/en/ch10#sec_consistency_linearizability_usage)-[Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
- constraints and uniqueness, [Constraints and uniqueness guarantees](/en/ch10#sec_consistency_uniqueness)
- cross-channel timing dependencies, [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
- locking and leader election, [Locking and leader election](/en/ch10#locking-and-leader-election)
- versus serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- linked data, [Triple-Stores and SPARQL](/en/ch3#id59)
- LinkedIn
- Espresso (database), [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
- LIquid (database), [Datalog: Recursive Relational Queries](/en/ch3#id62)
- profile (example), [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships)
- Linux, leap second bug, [Software faults](/en/ch2#software-faults), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- Litestream (backup tool), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- liveness properties, [Safety and liveness](/en/ch9#sec_distributed_safety_liveness)
- LLVM (compiler), [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- LMDB (storage engine), [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [B-tree variants](/en/ch4#b-tree-variants), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
- load
- coping with, [Principles for Scalability](/en/ch2#id35)
- describing, [Describing Load](/en/ch2#id33)
- load balancing, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
- in hardware, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
- in software, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
- using message brokers, [Multiple consumers](/en/ch12#id298)
- load shedding, [Describing Performance](/en/ch2#sec_introduction_percentiles)
- local secondary indexes, [Local Secondary Indexes](/en/ch7#id166), [Summary](/en/ch7#summary)
- local-first software, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- locality (data access), [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships), [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality), [Glossary](/en/glossary)
- in batch processing, [Dataflow Engines](/en/ch11#sec_batch_dataflow)
- in stateful clients, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients), [Stateful, offline-capable clients](/en/ch13#id347)
- in stream processing, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins), [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance), [Stream processors and services](/en/ch13#id345), [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- location transparency, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- in the actor model, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks)
- lock-in, [Pros and Cons of Cloud Services](/en/ch1#sec_introduction_cloud_tradeoffs)
- locks, [Glossary](/en/glossary)
- deadlock, [Explicit locking](/en/ch8#explicit-locking), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- distributed locking, [Distributed Locks and Leases](/en/ch9#sec_distributed_lock_fencing)-[Fencing with multiple replicas](/en/ch9#fencing-with-multiple-replicas), [Locking and leader election](/en/ch10#locking-and-leader-election)
- fencing tokens, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- implementation with coordination service, [Coordination Services](/en/ch10#sec_consistency_coordination)
- relation to consensus, [Single-value consensus](/en/ch10#single-value-consensus)
- for transaction isolation
- in snapshot isolation, [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl)
- in two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
- making operations atomic, [Atomic write operations](/en/ch8#atomic-write-operations)
- performance, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
- preventing dirty writes, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
- preventing phantoms with index-range locks, [Index-range locks](/en/ch8#sec_transactions_2pl_range), [Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
- read locks (shared mode), [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- shared mode and exclusive mode, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- in distributed transactions
- deadlock detection, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
- in-doubt transactions holding locks, [Holding locks while in doubt](/en/ch8#holding-locks-while-in-doubt)
- materializing conflicts with, [Materializing conflicts](/en/ch8#materializing-conflicts)
- preventing lost updates by explicit locking, [Explicit locking](/en/ch8#explicit-locking)
- log sequence number, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Consumer offsets](/en/ch12#sec_stream_log_offsets)
- logical clocks, [Timestamps for ordering events](/en/ch9#sec_distributed_lww), [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)-[Enforcing constraints using logical clocks](/en/ch10#enforcing-constraints-using-logical-clocks), [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
- for last-write-wins, [Last write wins (discarding concurrent writes)](/en/ch6#sec_replication_lww)
- for read-after-write consistency, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- hybrid logical clocks, [Hybrid logical clocks](/en/ch10#hybrid-logical-clocks)
- insufficiency for enforcing constraints, [Enforcing constraints using logical clocks](/en/ch10#enforcing-constraints-using-logical-clocks)
- Lamport timestamps, [Lamport timestamps](/en/ch10#lamport-timestamps)
- logical replication, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- for change data capture, [Implementing change data capture](/en/ch12#id307)
- LogicBlox (database), [Datalog: Recursive Relational Queries](/en/ch3#id62)
- logs (data structure), [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp), [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs), [Glossary](/en/glossary)
- (see also shared logs)
- advantages of immutability, [Advantages of immutable events](/en/ch12#sec_stream_immutability_pros)
- and right to erasure, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Disk space usage](/en/ch4#disk-space-usage)
- compaction, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [Log compaction](/en/ch12#sec_stream_log_compaction), [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)
- for stream operator state, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- implementing uniqueness constraints, [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- log-based messaging, [Log-based Message Brokers](/en/ch12#sec_stream_log)-[Replaying old messages](/en/ch12#sec_stream_replay)
- comparison to traditional messaging, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay)
- consumer offsets, [Consumer offsets](/en/ch12#sec_stream_log_offsets)
- disk space usage, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- replaying old messages, [Replaying old messages](/en/ch12#sec_stream_replay), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing), [Unifying batch and stream processing](/en/ch13#id338)
- slow consumers, [When consumers cannot keep up with producers](/en/ch12#id459)
- using logs for message storage, [Using logs for message storage](/en/ch12#id300)
- log-structured storage, [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- log-structured merge tree (see LSM-trees)
- relation to consensus, [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs)
- replication, [Single-Leader Replication](/en/ch6#sec_replication_leader), [Implementation of Replication Logs](/en/ch6#sec_replication_implementation)-[Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)-[API support for change streams](/en/ch12#sec_stream_change_api)
- (see also changelogs)
- coordination with snapshot, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- logical (row-based) replication, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication)
- write-ahead log (WAL) shipping, [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping)
- scalability limits, [The limits of total ordering](/en/ch13#id335)
- Looker (business intelligence software), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Analytics](/en/ch11#sec_batch_olap)
- loose coupling, [Making unbundling work](/en/ch13#sec_future_unbundling_favor)
- lost updates (see updates)
- Lotus Notes (sync engine), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- LSM-trees (indexes), [The SSTable file format](/en/ch4#the-sstable-file-format)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- comparison to B-trees, [Comparing B-Trees and LSM-Trees](/en/ch4#sec_storage_btree_lsm_comparison)-[Disk space usage](/en/ch4#disk-space-usage)
- Lucene (storage engine), [Full-Text Search](/en/ch4#sec_storage_full_text)
- similarity search, [Full-Text Search](/en/ch4#sec_storage_full_text)
- LWW (see last write wins)

### M

- machine learning
- batch inference, [Machine Learning](/en/ch11#id290)
- data preparation with DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- deleting training data, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- deploying data products, [Beyond the data lake](/en/ch1#beyond-the-data-lake)
- ethical considerations, [Predictive Analytics](/en/ch14#id369)
- (see also ethics)
- feature engineering, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [Machine Learning](/en/ch11#id290)
- in analytics systems, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
- iterative processing, [Machine Learning](/en/ch11#id290)
- LLMs (see large language models (LLMs))
- models derived from training data, [Application code as a derivation function](/en/ch13#sec_future_dataflow_derivation)
- relation to batch processing, [Machine Learning](/en/ch11#id290)
- using a data lake, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- using GPUs, [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- using matrices, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- madsim (deterministic simulation testing), [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- magic scaling sauce, [Principles for Scalability](/en/ch2#id35)
- maintainability, [Maintainability](/en/ch2#sec_introduction_maintainability)-[Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability), [A Philosophy of Streaming Systems](/en/ch13#ch_philosophy)
- evolvability (see evolvability)
- operability, [Operability: Making Life Easy for Operations](/en/ch2#id37)
- simplicity and managing complexity, [Simplicity: Managing Complexity](/en/ch2#id38)
- many-to-many relationships, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many)
- modeling as graphs, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- many-to-one relationships, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many)
- in star schema, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
- MapReduce (batch processing), [Batch Processing](/en/ch11#ch_batch), [MapReduce](/en/ch11#sec_batch_mapreduce)
- analysis of user activity events (example), [JOIN and GROUP BY](/en/ch11#sec_batch_join)
- comparison to stream processing, [Processing Streams](/en/ch12#sec_stream_processing)
- disadvantages and limitations of, [MapReduce](/en/ch11#sec_batch_mapreduce)
- fault tolerance, [Handling Faults](/en/ch11#id281)
- higher-level tools, [Query languages](/en/ch11#sec_batch_query_lanauges)
- mapper and reducer functions, [MapReduce](/en/ch11#sec_batch_mapreduce)
- shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)
- sort-merge joins, [JOIN and GROUP BY](/en/ch11#sec_batch_join)
- workflows, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- (see also workflow engines)
- marshalling (see encoding)
- MartenDB (database), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- master-slave replication (obsolete term), [Single-Leader Replication](/en/ch6#sec_replication_leader)
- materialization, [Glossary](/en/glossary)
- aggregate values, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
- conflicts, [Materializing conflicts](/en/ch8#materializing-conflicts)
- materialized views, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
- as derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
- in event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- (see also incremental view maintenance (IVM))
- maintaining, using stream processing, [Maintaining materialized views](/en/ch12#sec_stream_mat_view), [Table-table join (materialized view maintenance)](/en/ch12#id326)
- social network timeline example, [Materializing and Updating Timelines](/en/ch2#sec_introduction_materializing)
- Materialize (database), [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
- incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- matrices, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- sparse, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- Maxwell (change data capture), [Implementing change data capture](/en/ch12#id307)
- mean, [Average, Median, and Percentiles](/en/ch2#id24)
- media monitoring, [Search on streams](/en/ch12#id320)
- median, [Average, Median, and Percentiles](/en/ch2#id24)
- meeting room booking (example), [More examples of write skew](/en/ch8#more-examples-of-write-skew), [Predicate locks](/en/ch8#predicate-locks), [Enforcing Constraints](/en/ch13#sec_future_constraints)
- Memcached (caching server), [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- Memgraph (database), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- Cypher query language, [The Cypher Query Language](/en/ch3#id57)
- memory
- barrier (CPU instruction), [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- corruption, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
- in-memory databases, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- durability, [Durability](/en/ch8#durability)
- serial transaction execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
- in-memory representation of data, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- memtable (in LSM-trees), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- random bit-flips in, [Trust, but Verify](/en/ch13#sec_future_verification)
- use by indexes, [Log-Structured Storage](/en/ch4#sec_storage_log_structured)
- memtable (in LSM-trees), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- Mercurial (version control system), [Concurrency control](/en/ch12#sec_stream_concurrency)
- merge (DataFrame operator), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- merging sorted files, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Shuffling Data](/en/ch11#sec_shuffle)
- Merkle trees, [Tools for auditable data systems](/en/ch13#id366)
- Mesos (cluster manager), [Separation of application code and state](/en/ch13#id344)
- message brokers (see messaging systems)
- message-passing (see event-driven architecture)
- MessagePack (encoding format), [Binary encoding](/en/ch5#binary-encoding)
- messaging systems, [Stream Processing](/en/ch12#ch_stream)-[Replaying old messages](/en/ch12#sec_stream_replay)
- (see also streams)
- backpressure, buffering, or dropping messages, [Messaging Systems](/en/ch12#sec_stream_messaging)
- brokerless messaging, [Direct messaging from producers to consumers](/en/ch12#id296)
- event logs, [Log-based Message Brokers](/en/ch12#sec_stream_log)-[Replaying old messages](/en/ch12#sec_stream_replay)
- as data model, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- comparison to traditional messaging, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay)
- consumer offsets, [Consumer offsets](/en/ch12#sec_stream_log_offsets)
- replaying old messages, [Replaying old messages](/en/ch12#sec_stream_replay), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing), [Unifying batch and stream processing](/en/ch13#id338)
- slow consumers, [When consumers cannot keep up with producers](/en/ch12#id459)
- exactly-once semantics, [Exactly-once message processing](/en/ch8#sec_transactions_exactly_once), [Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited), [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance)
- message brokers, [Message brokers](/en/ch12#id433)-[Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- acknowledgments and redelivery, [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- comparison to event logs, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Replaying old messages](/en/ch12#sec_stream_replay)
- multiple consumers of same topic, [Multiple consumers](/en/ch12#id298)
- versus RPC, [Event-Driven Architectures](/en/ch5#sec_encoding_dataflow_msg)
- message loss, [Messaging Systems](/en/ch12#sec_stream_messaging)
- reliability, [Messaging Systems](/en/ch12#sec_stream_messaging)
- uniqueness in log-based messaging, [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- metastable failure, [Describing Performance](/en/ch2#sec_introduction_percentiles)
- metered billing
- serverless, [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- storage, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- microbatching, [Microbatching and checkpointing](/en/ch12#id329)
- microservices, [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- (see also services)
- causal dependencies across services, [The limits of total ordering](/en/ch13#id335)
- loose coupling, [Making unbundling work](/en/ch13#sec_future_unbundling_favor)
- relation to batch/stream processors, [Batch Processing](/en/ch11#ch_batch), [Stream processors and services](/en/ch13#id345)
- Microsoft
- Azure Blob Storage (see Azure Blob Storage)
- Azure managed disks, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- Azure Service Bus (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297)
- Azure SQL DB (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- Azure Storage, [Object Stores](/en/ch11#id277)
- Azure Stream Analytics, [Stream analytics](/en/ch12#id318)
- Azure Synapse Analytics (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- DCOM (Distributed Component Object Model), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- MSDTC (transaction coordinator), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
- SQL Server (see SQL Server)
- Microsoft Power BI (see Power BI (business intelligence software))
- migrating (rewriting) data, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Different values written at different times](/en/ch5#different-values-written-at-different-times), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views), [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
- MinIO (object storage), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- mobile apps, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- embedded databases, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- model checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- modulus operator (%), [Hash modulo number of nodes](/en/ch7#hash-modulo-number-of-nodes)
- Mojo (programming language)
- memory management, [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
- MongoDB (database)
- aggregation pipeline, [Query languages for documents](/en/ch3#query-languages-for-documents)
- atomic operations, [Atomic write operations](/en/ch8#atomic-write-operations)
- BSON, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
- document data model, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
- hash-range sharding, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash), [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- in the cloud, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- join support, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
- joins (\$lookup operator), [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)
- JSON Schema validation, [JSON Schema](/en/ch5#json-schema)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- ObjectIds, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)
- range-based sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)
- secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
- shard splitting, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)
- stored procedures, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- monitoring, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations), [Humans and Reliability](/en/ch2#id31), [Operability: Making Life Easy for Operations](/en/ch2#id37)
- monotonic clocks, [Monotonic clocks](/en/ch9#monotonic-clocks)
- monotonic reads, [Monotonic Reads](/en/ch6#sec_replication_monotonic_reads)
- Morel (query language), [Query languages](/en/ch11#sec_batch_query_lanauges)
- MSMQ (messaging), [XA transactions](/en/ch8#xa-transactions)
- multi-column indexes, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- multi-leader replication, [Multi-Leader Replication](/en/ch6#sec_replication_multi_leader)-[Types of conflict](/en/ch6#sec_replication_write_conflicts)
- (see also replication)
- collaborative editing, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- conflict detection, [Types of conflict](/en/ch6#sec_replication_write_conflicts)
- conflict resolution, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts)
- for multi-region replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc), [The Cost of Linearizability](/en/ch10#sec_linearizability_cost)
- linearizability, lack of, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- offline-capable clients, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients)
- replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)-[Problems with different topologies](/en/ch6#problems-with-different-topologies)
- multi-object transactions, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
- need for, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
- Multi-Paxos (consensus algorithm), [Consensus in Practice](/en/ch10#sec_consistency_total_order)
- multi-reader single-writer lock, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- multi-table index cluster tables (Oracle), [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
- multi-version concurrency control (MVCC), [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl), [Summary](/en/ch8#summary)
- detecting stale MVCC reads, [Detecting stale MVCC reads](/en/ch8#detecting-stale-mvcc-reads)
- indexes and snapshot isolation, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
- using synchronized clocks, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- multidimensional arrays, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- multitenancy, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute), [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- by sharding, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- using embedded databases, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- versus Byzantine fault tolerance, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
- mutual exclusion, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
- (see also locks)
- MySQL (database)
- archiving WAL to object stores, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- binlog coordinates, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- change data capture, [Implementing change data capture](/en/ch12#id307), [API support for change streams](/en/ch12#sec_stream_change_api)
- circular replication topology, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)
- consistent snapshots, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
- global transaction identifiers (GTIDs), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- in the cloud, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- InnoDB storage engine (see InnoDB)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- row-based replication, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- sharding (see Vitess (database))
- snapshot isolation support, [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- (see also InnoDB)
- statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication)

### N

- N+1 query problem, [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm)
- nanomsg (messaging library), [Direct messaging from producers to consumers](/en/ch12#id296)
- Narayana (transaction coordinator), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)
- NATS (messaging), [Message brokers](/en/ch5#message-brokers)
- natural language processing, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- Neo4j (database)
- Cypher query language, [The Cypher Query Language](/en/ch3#id57)
- graph data model, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- Neon (database), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- Nephele (dataflow engine), [Dataflow Engines](/en/ch11#sec_batch_dataflow)
- Neptune (graph database), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- Cypher query language, [The Cypher Query Language](/en/ch3#id57)
- SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- netcode (game development), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- Network Attached Storage (NAS), [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- network model (data representation), [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
- Network Time Protocol (see NTP)
- networks
- congestion and queueing, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- datacenter network topologies, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- faults (see faults)
- linearizability and network delays, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- network partitions, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
- in CAP theorem, [The Cost of Linearizability](/en/ch10#sec_linearizability_cost)
- timeouts and unbounded delays, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
- NewSQL, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history), [Solutions for Replication Lag](/en/ch6#id131)
- transactions and, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
- next-key locking, [Index-range locks](/en/ch8#sec_transactions_2pl_range)
- NFS (network file system), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- on object storage, [Object Stores](/en/ch11#id277)
- Nimble (data format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- (see also column-oriented storage)
- node (in graphs) (see vertices)
- nodes (processes), [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Glossary](/en/glossary)
- handling outages in leader-based replication, [Handling Node Outages](/en/ch6#sec_replication_failover)
- system models for failure, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- noisy neighbors, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- nonblocking atomic commit, [Three-phase commit](/en/ch8#three-phase-commit)
- nondeterministic operations, [Statement-based replication](/en/ch6#statement-based-replication)
- (see also deterministic operations)
- in distributed systems, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- in workflow engines, [Durable execution](/en/ch5#durable-execution)
- partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
- sources of nondeterminism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- nonfunctional requirements, [Defining Nonfunctional Requirements](/en/ch2#ch_nonfunctional), [Summary](/en/ch2#summary)
- nonrepeatable reads, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- (see also read skew)
- normalization (data representation), [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)-[Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many), [Glossary](/en/glossary)
- foreign key references, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
- in social network case study, [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
- in systems of record, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
- versus denormalization, [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
- NoSQL, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history), [Solutions for Replication Lag](/en/ch6#id131), [Unbundling Databases](/en/ch13#sec_future_unbundling)
- transactions and, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview)
- Notation3 (N3), [Triple-Stores and SPARQL](/en/ch3#id59)
- NTP (Network Time Protocol), [Unreliable Clocks](/en/ch9#sec_distributed_clocks)
- accuracy, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy), [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
- adjustments to monotonic clocks, [Monotonic clocks](/en/ch9#monotonic-clocks)
- multiple server addresses, [Weak forms of lying](/en/ch9#weak-forms-of-lying)
- numbers, in XML and JSON encodings, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
- NumPy (Python library), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- NVMe (Non-Volatile Memory Express) (see solid state drives (SSDs))

### O

- object databases, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
- object storage, [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Object Stores](/en/ch11#id277)
- Azure Blob Storage (see Azure Blob Storage)
- comparison to distributed filesystems, [Object Stores](/en/ch11#id277)
- comparison to key-value stores, [Object Stores](/en/ch11#id277)
- databases backed by, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- for backups, [Replication](/en/ch6#ch_replication)
- for cloud data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
- for database replication, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- Google Cloud Storage (see Google Cloud Storage)
- object size, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- S3 (see S3 (object storage))
- storing LSM segment files, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- support for fencing, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- use in data lakes, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- object-relational mapping (ORM) frameworks, [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm)
- error handling and aborted transactions, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- unsafe read-modify-write cycle code, [Atomic write operations](/en/ch8#atomic-write-operations)
- object-relational mismatch, [The Object-Relational Mismatch](/en/ch3#sec_datamodels_document)
- observability, [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems), [Humans and Reliability](/en/ch2#id31), [Operability: Making Life Easy for Operations](/en/ch2#id37)
- observer pattern, [Separation of application code and state](/en/ch13#id344)
- OBT (one big table), [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
- offline systems, [Batch Processing](/en/ch11#ch_batch)
- (see also batch processing)
- offline-first applications, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps), [Stateful, offline-capable clients](/en/ch13#id347)
- offsets
- consumer offsets in sharded logs, [Consumer offsets](/en/ch12#sec_stream_log_offsets)
- messages in sharded logs, [Using logs for message storage](/en/ch12#id300)
- OLAP (online analytic processing), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Glossary](/en/glossary)
- data cubes, [Materialized Views and Data Cubes](/en/ch4#sec_storage_materialized_views)
- OLTP (online transaction processing), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Glossary](/en/glossary)
- analytics queries versus, [Analytics](/en/ch11#sec_batch_olap)
- data normalization, [Trade-offs of normalization](/en/ch3#trade-offs-of-normalization)
- workload characteristics, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
- on-premises deployment, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)
- data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- one big table (data warehouse schema), [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
- one-hot encoding, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- one-to-few relationships, [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships)
- one-to-many relationships, [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships)
- JSON representation, [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships)
- online systems, [Batch Processing](/en/ch11#ch_batch)
- (see also services)
- versus scientific computing, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- ontologies, [Triple-Stores and SPARQL](/en/ch3#id59)
- Oozie (workflow scheduler), [Batch Processing](/en/ch11#ch_batch)
- OpenAPI (service definition format), [Microservices and Serverless](/en/ch1#sec_introduction_microservices), [Web services](/en/ch5#sec_web_services)
- use of JSON Schema, [JSON Schema](/en/ch5#json-schema)
- openCypher (see Cypher (query language))
- OpenLink Virtuoso (see Virtuoso (database))
- OpenStack
- Swift (object storage), [Object Stores](/en/ch11#id277)
- operability, [Operability: Making Life Easy for Operations](/en/ch2#id37)
- operating systems versus databases, [Unbundling Databases](/en/ch13#sec_future_unbundling)
- operational systems, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
- (see also OLTP)
- as systems of record, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
- ETL into analytical systems, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- operational transformation, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
- operations teams, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- operators (query execution), [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- in stream processing, [Processing Streams](/en/ch12#sec_stream_processing)
- optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
- optimistic locking, [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set)
- Oracle (database)
- distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
- GoldenGate (change data capture), [Implementing change data capture](/en/ch12#id307)
- hierarchical queries, [Graph Queries in SQL](/en/ch3#id58)
- lack of serializability, [Isolation](/en/ch8#sec_transactions_acid_isolation)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- multi-table index cluster tables, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
- not preventing write skew, [Characterizing write skew](/en/ch8#characterizing-write-skew)
- PL/SQL language, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- preventing lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates)
- read committed isolation, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
- Real Application Clusters (RAC), [Locking and leader election](/en/ch10#locking-and-leader-election)
- snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation), [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- TimesTen (in-memory database), [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- WAL-based replication, [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping)
- ORC (data format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- (see also column-oriented storage)
- orchestration (service deployment), [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud), [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- batch job execution, [Distributed Job Orchestration](/en/ch11#id278)
- workflow engines, [Batch Processing](/en/ch11#ch_batch)
- ordering
- event logs, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- limits of total ordering, [The limits of total ordering](/en/ch13#id335)
- logical timestamps, [Logical Clocks](/en/ch10#sec_consistency_timestamps)
- of auto-incrementing IDs, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)
- shared logs, [Consensus in Practice](/en/ch10#sec_consistency_total_order)-[Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
- Orkes (workflow engine), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- orphan pages (B-trees), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
- outbox pattern, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
- outliers (response time), [Average, Median, and Percentiles](/en/ch2#id24)
- outsourcing, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)
- overload, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)

### P

- PACELC principle, [The CAP theorem](/en/ch10#the-cap-theorem)
- package managers, [Separation of application code and state](/en/ch13#id344)
- packet switching, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- packets
- corruption of, [Weak forms of lying](/en/ch9#weak-forms-of-lying)
- sending via UDP, [Direct messaging from producers to consumers](/en/ch12#id296)
- PageRank (algorithm), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph), [Query languages](/en/ch11#sec_batch_query_lanauges), [Machine Learning](/en/ch11#id290)
- paging (see virtual memory)
- pandas (Python library), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [Column-Oriented Storage](/en/ch4#sec_storage_column), [DataFrames](/en/ch11#id287)
- Parquet (data format), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column), [Archival storage](/en/ch5#archival-storage), [Query languages](/en/ch11#sec_batch_query_lanauges)
- (see also column-oriented storage)
- databases on object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- document data model, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- use in batch processing, [MapReduce](/en/ch11#sec_batch_mapreduce)
- partial failures, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure), [Summary](/en/ch9#summary)
- limping, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- partial synchrony (system model), [System Model and Reality](/en/ch9#sec_distributed_system_model)
- partition key, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons), [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
- partitioning (see sharding)
- Paxos (consensus algorithm), [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order)
- ballot number, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- Multi-Paxos, [Consensus in Practice](/en/ch10#sec_consistency_total_order)
- payment card industry (PCI), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- PCI (payment card industry) compliance, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- percentiles, [Average, Median, and Percentiles](/en/ch2#id24), [Glossary](/en/glossary)
- calculating efficiently, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- importance of high percentiles, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- use in service level agreements (SLAs), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- Percolator (Google), [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator)
- Percona XtraBackup (MySQL tool), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- performance
- degradation as fault, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- describing, [Describing Performance](/en/ch2#sec_introduction_percentiles)
- of distributed transactions, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa)
- of in-memory databases, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- of linearizability, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- of multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- permission isolation, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- perpetual inconsistency, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- pessimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
- pglogical (PostgreSQL extension), [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- pgvector (vector index), [Vector Embeddings](/en/ch4#id92)
- phantoms (transaction isolation), [Phantoms causing write skew](/en/ch8#sec_transactions_phantom)
- materializing conflicts, [Materializing conflicts](/en/ch8#materializing-conflicts)
- preventing, in serializability, [Predicate locks](/en/ch8#predicate-locks)
- physical clocks (see clocks)
- pickle (Python), [Language-Specific Formats](/en/ch5#id96)
- Pinot (database), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- handling writes, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
- pre-aggregation, [Analytics](/en/ch11#sec_batch_olap)
- serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- pipelined execution
- in data warehouse queries, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- pivot table, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- point in time, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)
- point query, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
- Polaris (data catalog), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- polling, [Representing Users, Posts, and Follows](/en/ch2#id20)
- polystores, [The meta-database of everything](/en/ch13#id341)
- POSIX (portable operating system interface)
- compliant filesystems, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Distributed Filesystems](/en/ch11#sec_batch_dfs), [Object Stores](/en/ch11#id277)
- Post Office Horizon scandal, [Humans and Reliability](/en/ch2#id31)
- lack of transactions, [Transactions](/en/ch8#ch_transactions)
- PostgreSQL (database)
- archiving WAL to object stores, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- change data capture, [Implementing change data capture](/en/ch12#id307), [API support for change streams](/en/ch12#sec_stream_change_api)
- distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
- foreign data wrappers, [The meta-database of everything](/en/ch13#id341)
- full text search support, [Combining Specialized Tools by Deriving Data](/en/ch13#id442)
- in the cloud, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- JSON Schema validation, [JSON Schema](/en/ch5#json-schema)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- log sequence number, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- logical decoding, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- materialized view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- MVCC implementation, [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl), [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
- partitioning vs. sharding, [Sharding](/en/ch7#ch_sharding)
- pgvector (vector index), [Vector Embeddings](/en/ch4#id92)
- PL/pgSQL language, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- PostGIS geospatial indexes, [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- preventing lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates)
- preventing write skew, [Characterizing write skew](/en/ch8#characterizing-write-skew), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)
- read committed isolation, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
- representing graphs, [Property Graphs](/en/ch3#id56)
- serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)
- sharding (see Citus (database))
- snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation), [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- WAL-based replication, [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping)
- postings list, [Full-Text Search](/en/ch4#sec_storage_full_text)
- in sharded indexes, [Local Secondary Indexes](/en/ch7#id166)
- postmortems, blameless, [Humans and Reliability](/en/ch2#id31)
- PouchDB (database), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- Power BI (business intelligence software), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Analytics](/en/ch11#sec_batch_olap)
- pre-aggregation, [Analytics](/en/ch11#sec_batch_olap)
- serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- pre-splitting, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)
- Precision Time Protocol (PTP), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- predicate locks, [Predicate locks](/en/ch8#predicate-locks)
- predictive analytics, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics), [Predictive Analytics](/en/ch14#id369)-[Feedback Loops](/en/ch14#id372)
- amplifying bias, [Bias and Discrimination](/en/ch14#id370)
- ethics of (see ethics)
- feedback loops, [Feedback Loops](/en/ch14#id372)
- preemption, [Resource Allocation](/en/ch11#id279)
- in distributed schedulers, [Handling Faults](/en/ch11#id281)
- of threads, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- Prefect (workflow scheduler), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows), [Batch Processing](/en/ch11#ch_batch), [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- cloud data warehouse integration, [Query languages](/en/ch11#sec_batch_query_lanauges)
- Presto (query engine), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- primary keys, [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn), [Glossary](/en/glossary)
- auto-incrementing, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)
- versus partition key, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- primary-backup replication (see leader-based replication)
- privacy, [Privacy and Tracking](/en/ch14#id373)-[Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- consent and freedom of choice, [Consent and Freedom of Choice](/en/ch14#id375)
- data as assets and power, [Data as Assets and Power](/en/ch14#id376)
- deleting data, [Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
- ethical considerations (see ethics)
- legislation and self-regulation, [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- meaning of, [Privacy and Use of Data](/en/ch14#id457)
- regulation, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- surveillance, [Surveillance](/en/ch14#id374)
- tracking behavioral data, [Privacy and Tracking](/en/ch14#id373)
- probabilistic algorithms, [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla), [Stream analytics](/en/ch12#id318)
- process pauses, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
- processing time (of events), [Reasoning About Time](/en/ch12#sec_stream_time)
- producers (message streams), [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- product analytics, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
- column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- programming languages
- for stored procedures, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- projections (event sourcing), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- Prolog (language), [Datalog: Recursive Relational Queries](/en/ch3#id62)
- (see also Datalog)
- property graphs, [Property Graphs](/en/ch3#id56)
- Cypher query language, [The Cypher Query Language](/en/ch3#id57)
- Property Graph Query Language (PGQL), [Graph Queries in SQL](/en/ch3#id58)
- property-based testing, [Humans and Reliability](/en/ch2#id31), [Formal Methods and Randomized Testing](/en/ch9#sec_distributed_formal)
- Protocol Buffers (data format), [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- field tags and schema evolution, [Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- provenance of data, [Designing for auditability](/en/ch13#id365)
- publish/subscribe model, [Messaging Systems](/en/ch12#sec_stream_messaging)
- publishers (message streams), [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- Pulsar (streaming platform), [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- PyTorch (machine learning library), [Machine Learning](/en/ch11#id290)

### Q

- Qpid (messaging), [Message brokers compared to databases](/en/ch12#id297)
- quality of service (QoS), [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- Quantcast File System (distributed filesystem), [Object Stores](/en/ch11#id277)
- query engines
- compilation and vectorization, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- in cloud data warehouse, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- operators, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- optimizing declarative queries, [Data Models and Query Languages](/en/ch3#ch_datamodels)
- query languages
- Cypher, [The Cypher Query Language](/en/ch3#id57)
- Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62)
- GraphQL, [GraphQL](/en/ch3#id63)
- MongoDB aggregation pipeline, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization), [Query languages for documents](/en/ch3#query-languages-for-documents)
- recursive SQL queries, [Graph Queries in SQL](/en/ch3#id58)
- SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- SQL, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)
- query optimizers, [Query languages](/en/ch11#sec_batch_query_lanauges)
- query plans, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- queueing delays, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- head-of-line blocking, [Latency and Response Time](/en/ch2#id23)
- latency and response time, [Latency and Response Time](/en/ch2#id23)
- queues (messaging), [Message brokers](/en/ch5#message-brokers)
- QUIC (protocol), [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- quorums, [Quorums for reading and writing](/en/ch6#sec_replication_quorum_condition)-[Multi-region operation](/en/ch6#multi-region-operation), [Glossary](/en/glossary)
- for leaderless replication, [Quorums for reading and writing](/en/ch6#sec_replication_quorum_condition)
- in consensus algorithms, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- limitations of consistency, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations)-[Monitoring staleness](/en/ch6#monitoring-staleness), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- making decisions in distributed systems, [The Majority Rules](/en/ch9#sec_distributed_majority)
- monitoring staleness, [Monitoring staleness](/en/ch6#monitoring-staleness)
- multi-region replication, [Multi-region operation](/en/ch6#multi-region-operation)
- relying on durability, [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world)
- quotas, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)

### R

- R (language), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [DataFrames](/en/ch11#id287)
- R-trees (indexes), [Multidimensional and Full-Text Indexes](/en/ch4#sec_storage_multidimensional)
- R2 (object storage), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- RabbitMQ (messaging), [Message brokers](/en/ch5#message-brokers), [Message brokers compared to databases](/en/ch12#id297)
- quorum queues (replication), [Single-Leader Replication](/en/ch6#sec_replication_leader)
- race conditions, [Isolation](/en/ch8#sec_transactions_acid_isolation)
- (see also concurrency)
- avoiding with linearizability, [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
- caused by dual writes, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
- causing loss of money, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)
- dirty writes, [No dirty writes](/en/ch8#sec_transactions_dirty_write)
- in counter increments, [No dirty writes](/en/ch8#sec_transactions_dirty_write)
- lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- preventing with event logs, [Concurrency control](/en/ch12#sec_stream_concurrency), [Dataflow: Interplay between state changes and application code](/en/ch13#id450)
- preventing with serializable isolation, [Serializability](/en/ch8#sec_transactions_serializability)
- weak transaction isolation, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)
- write skew, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts)
- Raft (consensus algorithm), [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- sensitivity to network problems, [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
- term number, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- use in etcd, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- RAID (Redundant Array of Independent Disks), [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute), [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- railways, schema migration on, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
- RAM (see memory)
- RAMCloud (in-memory storage), [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- random writes (access pattern), [Sequential versus random writes](/en/ch4#sidebar_sequential)
- range queries
- in B-trees, [B-Trees](/en/ch4#sec_storage_b_trees), [Read performance](/en/ch4#read-performance)
- in LSM-trees, [Read performance](/en/ch4#read-performance)
- not efficient in hash maps, [Log-Structured Storage](/en/ch4#sec_storage_log_structured)
- with hash sharding, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- ranking algorithms, [Machine Learning](/en/ch11#id290)
- Ray (workflow scheduler), [Machine Learning](/en/ch11#id290)
- RDF (Resource Description Framework), [The RDF data model](/en/ch3#the-rdf-data-model)
- querying with SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- RDMA (Remote Direct Memory Access), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- React (user interface library), [End-to-end event streams](/en/ch13#id349)
- reactive programming, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- read committed isolation level, [Read Committed](/en/ch8#sec_transactions_read_committed)-[Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
- implementing, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
- multi-version concurrency control (MVCC), [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl)
- no dirty reads, [No dirty reads](/en/ch8#no-dirty-reads)
- no dirty writes, [No dirty writes](/en/ch8#sec_transactions_dirty_write)
- read models (event sourcing), [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- read path (derived data), [Observing Derived State](/en/ch13#sec_future_observing)
- read repair (leaderless replication), [Catching up on missed writes](/en/ch6#sec_replication_read_repair)
- for linearizability, [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- read replicas (see leader-based replication)
- read skew (transaction isolation), [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation), [Summary](/en/ch8#summary)
- read uncommitted isolation level, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
- read-after-write consistency, [Reading Your Own Writes](/en/ch6#sec_replication_ryw), [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- cross-device, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- in derived data systems, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions)
- read-modify-write cycle, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)
- read-scaling architecture, [Problems with Replication Lag](/en/ch6#sec_replication_lag), [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- versus sharding, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- reads as events, [Reads are events too](/en/ch13#sec_future_read_events)
- real-time
- analytics (see product analytics)
- collaborative editing, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- publish/subscribe dataflow, [End-to-end event streams](/en/ch13#id349)
- response time guarantees, [Response time guarantees](/en/ch9#sec_distributed_clocks_realtime)
- time-of-day clocks, [Time-of-day clocks](/en/ch9#time-of-day-clocks)
- Realm (database), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- rebalancing shards, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)-[Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations), [Glossary](/en/glossary)
- (see also sharding)
- automatic or manual rebalancing, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- fixed number of shards, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
- fixed number of shards per node, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- problems with hash mod N, [Hash modulo number of nodes](/en/ch7#hash-modulo-number-of-nodes)
- recency guarantee, [Linearizability](/en/ch10#sec_consistency_linearizability)
- recommendation engines, [Operational Versus Analytical Systems](/en/ch1#sec_introduction_analytics)
- building using DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- iterative processing, [Machine Learning](/en/ch11#id290)
- reconfiguration (consensus), [Subtleties of consensus](/en/ch10#subtleties-of-consensus)
- records, [MapReduce](/en/ch11#sec_batch_mapreduce)
- events in stream processing, [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- recursive queries
- in Cypher, [The Cypher Query Language](/en/ch3#id57)
- in Datalog, [Datalog: Recursive Relational Queries](/en/ch3#id62)
- in SPARQL, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- lack of, in GraphQL, [GraphQL](/en/ch3#id63)
- SQL common table expressions, [Graph Queries in SQL](/en/ch3#id58)
- Red Hat
- Apicurio Registry, [JSON Schema](/en/ch5#json-schema)
- red-black tree, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- redelivery (messaging), [Acknowledgments and redelivery](/en/ch12#sec_stream_reordering)
- Redis (database)
- atomic operations, [Atomic write operations](/en/ch8#atomic-write-operations)
- CRDT support, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts)
- durability, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- Lua scripting, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- process-per-core model, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- single-threaded execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
- redo log (see write-ahead log)
- Redpanda (messaging), [Message brokers](/en/ch5#message-brokers), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- tiered storage, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- Redshift (database), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- redundancy
- hardware components, [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy)
- of derived data, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived)
- (see also derived data)
- Reed–Solomon codes (error correction), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- refactoring, [Evolvability: Making Change Easy](/en/ch2#sec_introduction_evolvability)
- (see also evolvability)
- regions (geographic distribution), [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- (see also datacenters)
- consensus across, [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
- definition, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- latency, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- linearizable ID generation, [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator)
- replication across, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)-[Problems with different topologies](/en/ch6#problems-with-different-topologies), [The Cost of Linearizability](/en/ch10#sec_linearizability_cost), [The limits of total ordering](/en/ch13#id335)
- leaderless, [Multi-region operation](/en/ch6#multi-region-operation)
- multi-leader, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- regions (sharding), [Sharding](/en/ch7#ch_sharding)
- register (data structure), [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- regulation (see legal matters)
- relational data model, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
- comparison to document model, [When to Use Which Model](/en/ch3#sec_datamodels_document_summary)-[Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
- graph queries in SQL, [Graph Queries in SQL](/en/ch3#id58)
- in-memory databases with, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- many-to-one and many-to-many relationships, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many)
- multi-object transactions, need for, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
- object-relational mismatch, [The Object-Relational Mismatch](/en/ch3#sec_datamodels_document)
- representing a reorderable list, [When to Use Which Model](/en/ch3#sec_datamodels_document_summary)
- versus document model
- convergence of models, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
- data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
- relational databases
- eventual consistency, [Problems with Replication Lag](/en/ch6#sec_replication_lag)
- history, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- logical logs, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- philosophy compared to Unix, [Unbundling Databases](/en/ch13#sec_future_unbundling), [The meta-database of everything](/en/ch13#id341)
- schema changes, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility), [Encoding and Evolution](/en/ch5#ch_encoding), [Different values written at different times](/en/ch5#different-values-written-at-different-times)
- sharded secondary indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)
- statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication)
- use of B-tree indexes, [B-Trees](/en/ch4#sec_storage_b_trees)
- relationships (see edges)
- reliability, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)-[Humans and Reliability](/en/ch2#id31), [A Philosophy of Streaming Systems](/en/ch13#ch_philosophy)
- building a reliable system from unreliable components, [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
- hardware faults, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
- human errors, [Humans and Reliability](/en/ch2#id31)
- importance of, [Humans and Reliability](/en/ch2#id31)
- of messaging systems, [Messaging Systems](/en/ch12#sec_stream_messaging)
- software faults, [Software faults](/en/ch2#software-faults)
- Remote Method Invocation (Java RMI), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- remote procedure calls (RPCs), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- (see also services)
- data encoding and evolution, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- issues with, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- using Avro, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
- versus message brokers, [Event-Driven Architectures](/en/ch5#sec_encoding_dataflow_msg)
- renewable energy, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- repeatable reads (transaction isolation), [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- replicas, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- replication, [Replication](/en/ch6#ch_replication)-[Summary](/en/ch6#summary), [Glossary](/en/glossary)
- and durability, [Durability](/en/ch8#durability)
- conflict resolution and, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- consistency properties, [Problems with Replication Lag](/en/ch6#sec_replication_lag)-[Solutions for Replication Lag](/en/ch6#id131)
- consistent prefix reads, [Consistent Prefix Reads](/en/ch6#sec_replication_consistent_prefix)
- monotonic reads, [Monotonic Reads](/en/ch6#sec_replication_monotonic_reads)
- reading your own writes, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- in distributed filesystems, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- leaderless, [Leaderless Replication](/en/ch6#sec_replication_leaderless)-[Version vectors](/en/ch6#version-vectors)
- detecting concurrent writes, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)-[Version vectors](/en/ch6#version-vectors)
- limitations of quorum consistency, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations)-[Monitoring staleness](/en/ch6#monitoring-staleness), [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- monitoring staleness, [Monitoring staleness](/en/ch6#monitoring-staleness)
- multi-leader, [Multi-Leader Replication](/en/ch6#sec_replication_multi_leader)-[Types of conflict](/en/ch6#sec_replication_write_conflicts)
- across multiple regions, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc), [The Cost of Linearizability](/en/ch10#sec_linearizability_cost)
- conflict resolution, [Dealing with Conflicting Writes](/en/ch6#sec_replication_write_conflicts)-[Types of conflict](/en/ch6#sec_replication_write_conflicts)
- replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)-[Problems with different topologies](/en/ch6#problems-with-different-topologies)
- reasons for using, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Replication](/en/ch6#ch_replication)
- sharding and, [Sharding](/en/ch7#ch_sharding)
- single-leader, [Single-Leader Replication](/en/ch6#sec_replication_leader)-[Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- failover, [Leader failure: Failover](/en/ch6#leader-failure-failover)
- implementation of replication logs, [Implementation of Replication Logs](/en/ch6#sec_replication_implementation)-[Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- relation to consensus, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus), [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
- setting up new followers, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- synchronous versus asynchronous, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async)
- state machine replication, [Statement-based replication](/en/ch6#statement-based-replication), [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs), [Using shared logs](/en/ch10#sec_consistency_smr), [Databases and Streams](/en/ch12#sec_stream_databases)
- event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- using consensus, [Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus)
- using erasure coding, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- using object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- versus backups, [Replication](/en/ch6#ch_replication)
- with heterogeneous data systems, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
- replication logs (see logs)
- representations of data (see data models)
- reprocessing data, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing), [Unifying batch and stream processing](/en/ch13#id338)
- (see also evolvability)
- from log-based messaging, [Replaying old messages](/en/ch12#sec_stream_replay)
- request hedging, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- request identifiers, [Uniquely identifying requests](/en/ch13#id355), [Multi-shard request processing](/en/ch13#id360)
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)
- approaches to, [Request Routing](/en/ch7#sec_sharding_routing)
- residence laws for data, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- resilient systems, [Reliability and Fault Tolerance](/en/ch2#sec_introduction_reliability)
- (see also fault tolerance)
- resource isolation, [Cloud Computing Versus Supercomputing](/en/ch1#id17), [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- resource limits, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- response time
- as performance metric, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Batch Processing](/en/ch11#ch_batch)
- guarantees on, [Response time guarantees](/en/ch9#sec_distributed_clocks_realtime)
- impact on users, [Average, Median, and Percentiles](/en/ch2#id24)
- in replicated systems, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- latency versus, [Latency and Response Time](/en/ch2#id23)
- mean and percentiles, [Average, Median, and Percentiles](/en/ch2#id24)
- user experience, [Average, Median, and Percentiles](/en/ch2#id24)
- responsibility and accountability, [Responsibility and Accountability](/en/ch14#id371)
- REST (Representational State Transfer), [Web services](/en/ch5#sec_web_services)
- (see also services)
- Restate (workflow engine), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- RethinkDB (database)
- join support, [Convergence of document and relational databases](/en/ch3#convergence-of-document-and-relational-databases)
- key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- retry storm, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Software faults](/en/ch2#software-faults)
- reverse ETL, [Beyond the data lake](/en/ch1#beyond-the-data-lake)
- Riak (database)
- CRDT support, [CRDTs and Operational Transformation](/en/ch6#sec_replication_crdts), [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)
- dotted version vectors, [Version vectors](/en/ch6#version-vectors)
- gossip protocol, [Request Routing](/en/ch7#sec_sharding_routing)
- hash sharding, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
- leaderless replication, [Leaderless Replication](/en/ch6#sec_replication_leaderless)
- linearizability, lack of, [Linearizability and quorums](/en/ch10#sec_consistency_quorum_linearizable)
- multi-region support, [Multi-region operation](/en/ch6#multi-region-operation)
- rebalancing, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
- sloppy quorums, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- vnodes (sharding), [Sharding](/en/ch7#ch_sharding)
- ring buffers, [Disk space usage](/en/ch12#sec_stream_disk_usage)
- RisingWave (database)
- incremental view maintenance, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- rockets, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
- RocksDB (storage engine), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- as embedded storage engine, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- leveled compaction, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- rollbacks (transactions), [Transactions](/en/ch8#ch_transactions)
- rolling upgrades, [Tolerating hardware faults through redundancy](/en/ch2#tolerating-hardware-faults-through-redundancy), [Encoding and Evolution](/en/ch5#ch_encoding), [Faults and Partial Failures](/en/ch9#sec_distributed_partial_failure)
- in a multitenant system, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- routing (see request routing)
- row-based replication, [Logical (row-based) log replication](/en/ch6#logical-row-based-log-replication)
- row-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- rowhammer (memory corruption), [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
- RPCs (see remote procedure calls)
- rules (Datalog), [Datalog: Recursive Relational Queries](/en/ch3#id62)
- Rust (programming language)
- memory management, [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)

### S

- S3 (object storage), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Batch Processing](/en/ch11#ch_batch), [Distributed Filesystems](/en/ch11#sec_batch_dfs), [Object Stores](/en/ch11#id277)
- checking data integrity, [Don't just blindly trust what they promise](/en/ch13#id364)
- conditional writes, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- object size, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- S3 Express One Zone, [Object Stores](/en/ch11#id277)
- use in MapReduce, [MapReduce](/en/ch11#sec_batch_mapreduce)
- workflow example, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- SaaS (see software as a service (SaaS))
- safety and liveness properties, [Safety and liveness](/en/ch9#sec_distributed_safety_liveness)
- in consensus algorithms, [Single-value consensus](/en/ch10#single-value-consensus)
- in transactions, [Transactions](/en/ch8#ch_transactions)
- sagas (see compensating transactions)
- Samza (stream processor), [Stream analytics](/en/ch12#id318)
- SAP HANA (database), [Data Storage for Analytics](/en/ch4#sec_storage_analytics)
- scalability, [Scalability](/en/ch2#sec_introduction_scalability)-[Principles for Scalability](/en/ch2#id35), [A Philosophy of Streaming Systems](/en/ch13#ch_philosophy)
- auto-scaling, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- by sharding, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- describing load, [Describing Load](/en/ch2#id33)
- describing performance, [Describing Performance](/en/ch2#sec_introduction_percentiles)
- linear, [Describing Load](/en/ch2#id33)
- principles for, [Principles for Scalability](/en/ch2#id35)
- replication and, [Problems with Replication Lag](/en/ch6#sec_replication_lag)
- scaling up versus scaling out, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing)
- scaling out, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing)
- (see also shared-nothing architecture)
- by sharding, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- scaling up, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing)
- SCD (slowly changing dimension), [Time-dependence of joins](/en/ch12#sec_stream_join_time)
- scheduling
- algorithms, [Resource Allocation](/en/ch11#id279)
- batch jobs, [Distributed Job Orchestration](/en/ch11#id278)-[Scheduling Workflows](/en/ch11#sec_batch_workflows)
- gang scheduling, [Resource Allocation](/en/ch11#id279)
- schema-on-read, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- comparison to evolvable schema, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- schema-on-write, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- schemaless databases (see schema-on-read)
- schemas, [Glossary](/en/glossary)
- Avro, [Avro](/en/ch5#sec_encoding_avro)-[Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
- reader determining writer's schema, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
- schema evolution, [The writer's schema and the reader's schema](/en/ch5#the-writers-schema-and-the-readers-schema)
- dynamically generated, [Dynamically generated schemas](/en/ch5#dynamically-generated-schemas)
- evolution of, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
- affecting application code, [Encoding and Evolution](/en/ch5#ch_encoding)
- compatibility checking, [But what is the writer's schema?](/en/ch5#but-what-is-the-writers-schema)
- in databases, [Dataflow Through Databases](/en/ch5#sec_encoding_dataflow_db)-[Archival storage](/en/ch5#archival-storage)
- in service calls, [Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- flexibility in document model, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- for analytics, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
- for JSON and XML, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json), [JSON Schema](/en/ch5#json-schema)
- generation and migration using ORMs, [Object-relational mapping (ORM)](/en/ch3#object-relational-mapping-orm)
- merits of, [The Merits of Schemas](/en/ch5#sec_encoding_schemas)
- migration, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- Protocol Buffers, [Protocol Buffers](/en/ch5#sec_encoding_protobuf)-[Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- schema evolution, [Field tags and schema evolution](/en/ch5#field-tags-and-schema-evolution)
- schema migration on railways, [Reprocessing data for application evolution](/en/ch13#sec_future_reprocessing)
- traditional approach to design, fallacy in, [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views)
- scientific computing, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- scikit-learn (Python library), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- ScyllaDB (database)
- cluster metadata, [Request Routing](/en/ch7#sec_sharding_routing)
- consistency level ANY, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- hash-range sharding, [Sharding by Hash of Key](/en/ch7#sec_sharding_hash), [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- last-write-wins conflict resolution, [Detecting Concurrent Writes](/en/ch6#sec_replication_concurrent)
- leaderless replication, [Leaderless Replication](/en/ch6#sec_replication_leaderless)
- lightweight transactions, [Single-object writes](/en/ch8#sec_transactions_single_object)
- linearizability, lack of, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- log-structured storage, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- multi-region support, [Multi-region operation](/en/ch6#multi-region-operation)
- use of clocks, [Limitations of Quorum Consistency](/en/ch6#sec_replication_quorum_limitations), [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
- vnodes (sharding), [Sharding](/en/ch7#ch_sharding)
- search engines (see full-text search)
- searching on streams, [Search on streams](/en/ch12#id320)
- secondaries (see leader-based replication)
- secondary indexes, [Multi-Column and Secondary Indexes](/en/ch4#sec_storage_index_multicolumn), [Glossary](/en/glossary)
- for many-to-many relationships, [Many-to-One and Many-to-Many Relationships](/en/ch3#sec_datamodels_many_to_many)
- problems with dual writes, [Keeping Systems in Sync](/en/ch12#sec_stream_sync), [Reasoning about dataflows](/en/ch13#id443)
- sharding, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)-[Global Secondary Indexes](/en/ch7#id167), [Summary](/en/ch7#summary)
- global, [Global Secondary Indexes](/en/ch7#id167)
- index maintenance, [Maintaining derived state](/en/ch13#id446)
- local, [Local Secondary Indexes](/en/ch7#id166)
- updating, transaction isolation and, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
- secondary sort (MapReduce), [JOIN and GROUP BY](/en/ch11#sec_batch_join)
- sed (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
- self-hosting, [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)
- data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- self-joins, [Summary](/en/ch12#id332)
- self-validating systems, [Don't just blindly trust what they promise](/en/ch13#id364)
- semantic search, [Vector Embeddings](/en/ch4#id92)
- semantic similarity, [Vector Embeddings](/en/ch4#id92)
- semantic web, [Triple-Stores and SPARQL](/en/ch3#id59)
- semi-synchronous replication, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async)
- sequential writes (access pattern), [Sequential versus random writes](/en/ch4#sidebar_sequential)
- serializability, [Isolation](/en/ch8#sec_transactions_acid_isolation), [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels), [Serializability](/en/ch8#sec_transactions_serializability)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation), [Glossary](/en/glossary)
- linearizability versus, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- pessimistic versus optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
- serial execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)-[Summary of serial execution](/en/ch8#summary-of-serial-execution)
- sharding, [Sharding](/en/ch8#sharding)
- using stored procedures, [Encapsulating transactions in stored procedures](/en/ch8#encapsulating-transactions-in-stored-procedures), [Using shared logs](/en/ch10#sec_consistency_smr)
- serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
- detecting stale MVCC reads, [Detecting stale MVCC reads](/en/ch8#detecting-stale-mvcc-reads)
- detecting writes that affect prior reads, [Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
- distributed execution, [Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
- performance of SSI, [Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
- preventing write skew, [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)-[Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
- strict serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- timeliness vs. integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
- index-range locks, [Index-range locks](/en/ch8#sec_transactions_2pl_range)
- performance, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
- Serializable (Java), [Language-Specific Formats](/en/ch5#id96)
- serialization, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- (see also encoding)
- serverless, [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- service discovery, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery), [Request Routing](/en/ch7#sec_sharding_routing), [Service discovery](/en/ch10#service-discovery)
- registration, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
- using DNS, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery), [Request Routing](/en/ch7#sec_sharding_routing), [Service discovery](/en/ch10#service-discovery)
- service level agreements (SLAs), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla), [Describing Load](/en/ch2#id33)
- service mesh, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery)
- Service Organization Control (SOC), [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- service time, [Latency and Response Time](/en/ch2#id23)
- service-oriented architecture (SOA), [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- (see also services)
- services, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- microservices, [Microservices and Serverless](/en/ch1#sec_introduction_microservices)
- causal dependencies across services, [The limits of total ordering](/en/ch13#id335)
- loose coupling, [Making unbundling work](/en/ch13#sec_future_unbundling_favor)
- relation to batch/stream processors, [Batch Processing](/en/ch11#ch_batch), [Stream processors and services](/en/ch13#id345)
- remote procedure calls (RPCs), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)-[Data encoding and evolution for RPC](/en/ch5#data-encoding-and-evolution-for-rpc)
- issues with, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- similarity to databases, [Dataflow Through Services: REST and RPC](/en/ch5#sec_encoding_dataflow_rpc)
- web services, [Web services](/en/ch5#sec_web_services)
- session windows (stream processing), [Types of windows](/en/ch12#id324)
- (see also windows)
- sharding, [Sharding](/en/ch7#ch_sharding)-[Summary](/en/ch7#summary), [Glossary](/en/glossary)
- and consensus, [Using shared logs](/en/ch10#sec_consistency_smr)
- and replication, [Sharding](/en/ch7#ch_sharding)
- distributed transactions across shards, [Distributed Transactions](/en/ch8#sec_transactions_distributed)
- hot shards, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
- in batch processing, [Batch Processing](/en/ch11#ch_batch)
- key-range splitting, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)
- multi-shard operations, [Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
- enforcing constraints, [Multi-shard request processing](/en/ch13#id360)
- secondary index maintenance, [Maintaining derived state](/en/ch13#id446)
- of key-value data, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)-[Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- by key range, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- skew and hot spots, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- origin of the term, [Sharding](/en/ch7#ch_sharding)
- partition key, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons), [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
- rebalancing
- of key-range sharded data, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)
- rebalancing shards, [Rebalancing key-range sharded data](/en/ch7#rebalancing-key-range-sharded-data)-[Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- automatic or manual rebalancing, [Operations: Automatic or Manual Rebalancing](/en/ch7#sec_sharding_operations)
- problems with hash mod N, [Hash modulo number of nodes](/en/ch7#hash-modulo-number-of-nodes)
- using fixed number of shards, [Fixed number of shards](/en/ch7#fixed-number-of-shards)
- using N shards per node, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)
- secondary indexes, [Sharding and Secondary Indexes](/en/ch7#sec_sharding_secondary_indexes)-[Global Secondary Indexes](/en/ch7#id167)
- global, [Global Secondary Indexes](/en/ch7#id167)
- local, [Local Secondary Indexes](/en/ch7#id166)
- serial execution of transactions and, [Sharding](/en/ch8#sharding)
- sorting sharded data, [Shuffling Data](/en/ch11#sec_shuffle)
- shared logs, [Consensus in Practice](/en/ch10#sec_consistency_total_order)-[Pros and cons of consensus](/en/ch10#pros-and-cons-of-consensus), [The limits of total ordering](/en/ch13#id335), [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- algorithms, [Consensus in Practice](/en/ch10#sec_consistency_total_order)
- for event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- for messaging, [Log-based Message Brokers](/en/ch12#sec_stream_log)-[Replaying old messages](/en/ch12#sec_stream_replay)
- relation to consensus, [Shared logs as consensus](/en/ch10#sec_consistency_shared_logs)
- using, [Using shared logs](/en/ch10#sec_consistency_smr)
- shared mode (locks), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- shared-disk architecture, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- shared-memory architecture, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing)
- shared-nothing architecture, [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing), [Glossary](/en/glossary)
- distributed filesystems, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- (see also distributed filesystems)
- use of network, [Unreliable Networks](/en/ch9#sec_distributed_networks)
- sharks
- biting undersea cables, [Network Faults in Practice](/en/ch9#sec_distributed_network_faults)
- counting (example), [Query languages for documents](/en/ch3#query-languages-for-documents)
- shredding (deletion) (see crypto-shredding)
- shredding (in columnar encoding), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- shredding (in relational model), [When to Use Which Model](/en/ch3#sec_datamodels_document_summary)
- shuffle (batch processing), [Shuffling Data](/en/ch11#sec_shuffle)
- siblings (concurrent values), [Manual conflict resolution](/en/ch6#manual-conflict-resolution), [Capturing the happens-before relationship](/en/ch6#capturing-the-happens-before-relationship), [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- (see also conflicts)
- silo, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- similarity search
- edit distance, [Full-Text Search](/en/ch4#sec_storage_full_text)
- genome data, [Summary](/en/ch3#summary)
- simplicity, [Simplicity: Managing Complexity](/en/ch2#id38)
- Singer, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- single-instruction-multi-data (SIMD) instructions, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- single-leader replication (see leader-based replication)
- single-threaded execution, [Atomic write operations](/en/ch8#atomic-write-operations), [Actual Serial Execution](/en/ch8#sec_transactions_serial)
- in stream processing, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Concurrency control](/en/ch12#sec_stream_concurrency), [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- SingleStore (database)
- in-memory storage, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- site reliability engineer, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- size-tiered compaction, [Compaction strategies](/en/ch4#sec_storage_lsm_compaction), [Disk space usage](/en/ch4#disk-space-usage)
- skew, [Glossary](/en/glossary)
- clock skew, [Relying on Synchronized Clocks](/en/ch9#sec_distributed_clocks_relying)-[Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval), [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- in transaction isolation
- read skew, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation), [Summary](/en/ch8#summary)
- write skew, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts), [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)-[Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
- (see also write skew)
- meanings of, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- unbalanced workload, [Sharding of Key-Value Data](/en/ch7#sec_sharding_key_value)
- compensating for, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- due to celebrities, [Skewed Workloads and Relieving Hot Spots](/en/ch7#sec_sharding_skew)
- for time-series data, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- skip list, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- SLA (see service level agreements)
- Slack (group chat)
- GraphQL example, [GraphQL](/en/ch3#id63)
- SlateDB (database), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- sliding windows (stream processing), [Types of windows](/en/ch12#id324)
- (see also windows)
- sloppy quorums, [Single-Leader Versus Leaderless Replication Performance](/en/ch6#sec_replication_leaderless_perf)
- slowly changing dimension (data warehouses), [Time-dependence of joins](/en/ch12#sec_stream_join_time)
- smearing (leap second adjustments), [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- snapshots (databases)
- as backups, [Replication](/en/ch6#ch_replication)
- computing derived data, [Creating an index](/en/ch13#id340)
- in change data capture, [Initial snapshot](/en/ch12#sec_stream_cdc_snapshot)
- serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
- setting up a new replica, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- snapshot isolation and repeatable read, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)-[Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- implementing with MVCC, [Multi-version concurrency control (MVCC)](/en/ch8#sec_transactions_snapshot_impl)
- indexes and MVCC, [Indexes and snapshot isolation](/en/ch8#indexes-and-snapshot-isolation)
- visibility rules, [Visibility rules for observing a consistent snapshot](/en/ch8#sec_transactions_mvcc_visibility)
- synchronized clocks for global snapshots, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- Snowflake (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native), [Layering of cloud services](/en/ch1#layering-of-cloud-services), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Batch Processing](/en/ch11#ch_batch)
- column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- handling writes, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
- sharding and clustering, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- Snowpark, [Query languages](/en/ch11#sec_batch_query_lanauges)
- Snowflake (ID generator), [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)
- snowflake schemas, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
- SOAP (web services), [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- SOC2 (see Service Organization Control (SOC))
- social graph, [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- society
- responsibility towards, [Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance), [Legislation and Self-Regulation](/en/ch14#sec_future_legislation)
- sociotechnical systems, [Humans and Reliability](/en/ch2#id31)
- software as a service (SaaS), [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs), [Cloud Versus Self-Hosting](/en/ch1#sec_introduction_cloud)
- ETL from, [Data Warehousing](/en/ch1#sec_introduction_dwh)
- multitenancy, [Sharding for Multitenancy](/en/ch7#sec_sharding_multitenancy)
- software bugs, [Software faults](/en/ch2#software-faults)
- maintaining integrity, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
- solar storm, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
- solid state drives (SSDs)
- access patterns, [Sequential versus random writes](/en/ch4#sidebar_sequential)
- compared to object storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- detecting corruption, [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [Don't just blindly trust what they promise](/en/ch13#id364)
- failure rate, [Hardware and Software Faults](/en/ch2#sec_introduction_hardware_faults)
- faults in, [Durability](/en/ch8#durability)
- firmware bugs, [Software faults](/en/ch2#software-faults)
- read throughput, [Read performance](/en/ch4#read-performance)
- sequential vs. random writes, [Sequential versus random writes](/en/ch4#sidebar_sequential)
- Solr (search server)
- local secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)
- use of Lucene, [Full-Text Search](/en/ch4#sec_storage_full_text)
- sort (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis), [Sorting Versus In-memory Aggregation](/en/ch11#id275), [Distributed Job Orchestration](/en/ch11#id278)
- sort-merge joins (MapReduce), [JOIN and GROUP BY](/en/ch11#sec_batch_join)
- Sorted String Tables (see SSTables)
- sorting
- sort order in column storage, [Sort Order in Column Storage](/en/ch4#sort-order-in-column-storage)
- source of truth (see systems of record)
- Spanner (database)
- consistency model, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
- in the cloud, [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native)
- snapshot isolation using clocks, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
- TrueTime API, [Clock readings with a confidence interval](/en/ch9#clock-readings-with-a-confidence-interval)
- Spark (processing framework), [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native), [Batch Processing](/en/ch11#ch_batch), [Dataflow Engines](/en/ch11#sec_batch_dataflow)
- cost efficiency, [Query languages](/en/ch11#sec_batch_query_lanauges)
- DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes), [DataFrames](/en/ch11#id287)
- fault tolerance, [Handling Faults](/en/ch11#id281)
- for data warehouses, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- high availability using ZooKeeper, [Coordination Services](/en/ch10#sec_consistency_coordination)
- MLlib, [Machine Learning](/en/ch11#id290)
- query optimizer, [Query languages](/en/ch11#sec_batch_query_lanauges)
- shuffling data, [Shuffling Data](/en/ch11#sec_shuffle)
- Spark Streaming, [Stream analytics](/en/ch12#id318)
- microbatching, [Microbatching and checkpointing](/en/ch12#id329)
- streaming SQL support, [Complex event processing](/en/ch12#id317)
- use for ETL, [Extract-Transform-Load (ETL)](/en/ch11#sec_batch_etl_usage)
- SPARQL (query language), [The SPARQL query language](/en/ch3#the-sparql-query-language)
- sparse index, [The SSTable file format](/en/ch4#the-sstable-file-format)
- sparse matrices, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- split brain, [Leader failure: Failover](/en/ch6#leader-failure-failover), [Request Routing](/en/ch7#sec_sharding_routing), [Glossary](/en/glossary)
- enforcing constraints, [Uniqueness constraints require consensus](/en/ch13#id452)
- in consensus algorithms, [Consensus](/en/ch10#sec_consistency_consensus), [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- preventing, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- using fencing tokens to avoid, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)-[Fencing with multiple replicas](/en/ch9#fencing-with-multiple-replicas)
- spot instances, [Handling Faults](/en/ch11#id281)
- spreadsheets, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- dataflow programming, [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)
- pivot table, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- SQL (Structured Query Language), [Simplicity: Managing Complexity](/en/ch2#id38), [Relational Model versus Document Model](/en/ch3#sec_datamodels_history), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- for analytics, [Data Warehousing](/en/ch1#sec_introduction_dwh), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- graph queries in, [Graph Queries in SQL](/en/ch3#id58)
- isolation levels standard, issues with, [Snapshot isolation, repeatable read, and naming confusion](/en/ch8#snapshot-isolation-repeatable-read-and-naming-confusion)
- joins, [Normalization, Denormalization, and Joins](/en/ch3#sec_datamodels_normalization)
- résumé (example), [The document data model for one-to-many relationships](/en/ch3#the-document-data-model-for-one-to-many-relationships)
- social network home timelines (example), [Representing Users, Posts, and Follows](/en/ch2#id20)
- SQL injection vulnerability, [Byzantine Faults](/en/ch9#sec_distributed_byzantine)
- statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication)
- stored procedures, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- support in batch processing frameworks, [Batch Processing](/en/ch11#ch_batch)
- views, [Datalog: Recursive Relational Queries](/en/ch3#id62)
- SQL Server (database)
- archiving WAL to object stores, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- change data capture, [Implementing change data capture](/en/ch12#id307)
- data warehousing support, [Data Storage for Analytics](/en/ch4#sec_storage_analytics)
- distributed transaction support, [XA transactions](/en/ch8#xa-transactions)
- leader-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- preventing lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates)
- preventing write skew, [Characterizing write skew](/en/ch8#characterizing-write-skew), [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- read committed isolation, [Implementing read committed](/en/ch8#sec_transactions_read_committed_impl)
- serializable isolation, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- T-SQL language, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- SQLite (database), [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems), [Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- archiving WAL to object stores, [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- SRE (site reliability engineer), [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- SSDs (see solid state drives)
- SSTables (storage format), [The SSTable file format](/en/ch4#the-sstable-file-format)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- constructing and maintaining, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- making LSM-trees from, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- staged rollout (see rolling upgrades)
- staleness (old data), [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- cross-channel timing dependencies, [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
- in leaderless databases, [Writing to the Database When a Node Is Down](/en/ch6#id287)
- in multi-version concurrency control, [Detecting stale MVCC reads](/en/ch8#detecting-stale-mvcc-reads)
- monitoring for, [Monitoring staleness](/en/ch6#monitoring-staleness)
- of client state, [Pushing state changes to clients](/en/ch13#id348)
- versus linearizability, [Linearizability](/en/ch10#sec_consistency_linearizability)
- versus timeliness, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- standbys (see leader-based replication)
- star replication topologies, [Multi-leader replication topologies](/en/ch6#sec_replication_topologies)
- star schemas, [Stars and Snowflakes: Schemas for Analytics](/en/ch3#sec_datamodels_analytics)
- Star Wars analogy (event time versus processing time), [Event time versus processing time](/en/ch12#id322)
- starvation (scheduling), [Resource Allocation](/en/ch11#id279)
- state
- derived from log of immutable events, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)
- interplay between state changes and application code, [Dataflow: Interplay between state changes and application code](/en/ch13#id450)
- maintaining derived state, [Maintaining derived state](/en/ch13#id446)
- maintenance by stream processor in stream-stream joins, [Stream-stream join (window join)](/en/ch12#id440)
- observing derived state, [Observing Derived State](/en/ch13#sec_future_observing)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
- rebuilding after stream processor failure, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- separation of application code and, [Separation of application code and state](/en/ch13#id344)
- state machine replication, [Statement-based replication](/en/ch6#statement-based-replication), [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs), [Using shared logs](/en/ch10#sec_consistency_smr), [Databases and Streams](/en/ch12#sec_stream_databases)
- event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- stateless systems, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)
- statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication)
- reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- statically typed languages
- analogy to schema-on-write, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- statistical and numerical algorithms, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- StatsD (metrics aggregator), [Direct messaging from producers to consumers](/en/ch12#id296)
- stock market feeds, [Direct messaging from producers to consumers](/en/ch12#id296)
- STONITH (Shoot The Other Node In The Head), [Leader failure: Failover](/en/ch6#leader-failure-failover)
- problems with, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- stop-the-world (see garbage collection)
- storage
- composing data storage technologies, [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
- Storage Area Network (SAN), [Shared-Memory, Shared-Disk, and Shared-Nothing Architecture](/en/ch2#sec_introduction_shared_nothing), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- storage engines, [Storage and Retrieval](/en/ch4#ch_storage)-[Summary](/en/ch4#summary)
- column-oriented, [Column-Oriented Storage](/en/ch4#sec_storage_column)-[Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- column compression, [Column Compression](/en/ch4#sec_storage_column_compression)
- defined, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- Parquet, [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses), [Column-Oriented Storage](/en/ch4#sec_storage_column), [Archival storage](/en/ch5#archival-storage)
- sort order in, [Sort Order in Column Storage](/en/ch4#sort-order-in-column-storage)
- versus wide-column model, [Column Compression](/en/ch4#sec_storage_column_compression)
- writing to, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
- in-memory storage, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- durability, [Durability](/en/ch8#durability)
- row-oriented, [Storage and Indexing for OLTP](/en/ch4#sec_storage_oltp)-[Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- B-trees, [B-Trees](/en/ch4#sec_storage_b_trees)-[B-tree variants](/en/ch4#b-tree-variants)
- comparing B-trees and LSM-trees, [Comparing B-Trees and LSM-Trees](/en/ch4#sec_storage_btree_lsm_comparison)-[Disk space usage](/en/ch4#disk-space-usage)
- defined, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- log-structured, [Log-Structured Storage](/en/ch4#sec_storage_log_structured)-[Compaction strategies](/en/ch4#sec_storage_lsm_compaction)
- stored procedures, [Encapsulating transactions in stored procedures](/en/ch8#encapsulating-transactions-in-stored-procedures)-[Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs), [Glossary](/en/glossary)
- and shared logs, [Using shared logs](/en/ch10#sec_consistency_smr)
- pros and cons of, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- similarity to stream processors, [Application code as a derivation function](/en/ch13#sec_future_dataflow_derivation)
- Storm (stream processor), [Stream analytics](/en/ch12#id318)
- distributed RPC, [Event-Driven Architectures and RPC](/en/ch12#sec_stream_actors_drpc), [Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
- Trident state handling, [Idempotence](/en/ch12#sec_stream_idempotence)
- straggler events, [Handling straggler events](/en/ch12#id323)
- Stream Control Transmission Protocol (SCTP), [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- stream processing, [Processing Streams](/en/ch12#sec_stream_processing)-[Summary](/en/ch12#id332), [Glossary](/en/glossary)
- accessing external services within job, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins), [Microbatching and checkpointing](/en/ch12#id329), [Idempotence](/en/ch12#sec_stream_idempotence), [Exactly-once execution of an operation](/en/ch13#id353)
- combining with batch processing, [Unifying batch and stream processing](/en/ch13#id338)
- comparison to batch processing, [Processing Streams](/en/ch12#sec_stream_processing)
- complex event processing (CEP), [Complex event processing](/en/ch12#id317)
- fault tolerance, [Fault Tolerance](/en/ch12#sec_stream_fault_tolerance)-[Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- atomic commit, [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)
- idempotence, [Idempotence](/en/ch12#sec_stream_idempotence)
- microbatching and checkpointing, [Microbatching and checkpointing](/en/ch12#id329)
- rebuilding state after a failure, [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- for data integration, [Batch and Stream Processing](/en/ch13#sec_future_batch_streaming)-[Unifying batch and stream processing](/en/ch13#id338)
- for event sourcing, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- maintaining derived state, [Maintaining derived state](/en/ch13#id446)
- maintenance of materialized views, [Maintaining materialized views](/en/ch12#sec_stream_mat_view)
- messaging systems (see messaging systems)
- reasoning about time, [Reasoning About Time](/en/ch12#sec_stream_time)-[Types of windows](/en/ch12#id324)
- event time versus processing time, [Event time versus processing time](/en/ch12#id322), [Microbatching and checkpointing](/en/ch12#id329), [Unifying batch and stream processing](/en/ch13#id338)
- knowing when window is ready, [Handling straggler events](/en/ch12#id323)
- types of windows, [Types of windows](/en/ch12#id324)
- relation to databases (see streams)
- relation to services, [Stream processors and services](/en/ch13#id345)
- relationship to batch processing, [Batch Processing](/en/ch11#ch_batch)
- search on streams, [Search on streams](/en/ch12#id320)
- single-threaded execution, [Logs compared to traditional messaging](/en/ch12#sec_stream_logs_vs_messaging), [Concurrency control](/en/ch12#sec_stream_concurrency)
- stream analytics, [Stream analytics](/en/ch12#id318)
- stream joins, [Stream Joins](/en/ch12#sec_stream_joins)-[Time-dependence of joins](/en/ch12#sec_stream_join_time)
- stream-stream join, [Stream-stream join (window join)](/en/ch12#id440)
- stream-table join, [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
- table-table join, [Table-table join (materialized view maintenance)](/en/ch12#id326)
- time-dependence of, [Time-dependence of joins](/en/ch12#sec_stream_join_time)
- streams, [Stream Processing](/en/ch12#ch_stream)-[Replaying old messages](/en/ch12#sec_stream_replay)
- end-to-end, pushing events to clients, [End-to-end event streams](/en/ch13#id349)
- messaging systems (see messaging systems)
- processing (see stream processing)
- relation to databases, [Databases and Streams](/en/ch12#sec_stream_databases)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
- (see also changelogs)
- API support for change streams, [API support for change streams](/en/ch12#sec_stream_change_api)
- change data capture, [Change Data Capture](/en/ch12#sec_stream_cdc)-[API support for change streams](/en/ch12#sec_stream_change_api)
- derivative of state by time, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)
- event sourcing, [Change data capture versus event sourcing](/en/ch12#sec_stream_event_sourcing)
- keeping systems in sync, [Keeping Systems in Sync](/en/ch12#sec_stream_sync)
- philosophy of immutable events, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)-[Limitations of immutability](/en/ch12#sec_stream_immutability_limitations)
- topics, [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- strict serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- timeliness vs. integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- striping (in columnar encoding), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- strong consistency (see linearizability)
- strong eventual consistency, [Automatic conflict resolution](/en/ch6#automatic-conflict-resolution)
- strong one-copy serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- subjects, predicates, and objects (in triple-stores), [Triple-Stores and SPARQL](/en/ch3#id59)
- subscribers (message streams), [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- (see also consumers)
- supercomputers, [Cloud Computing Versus Supercomputing](/en/ch1#id17)
- Superset (data visualization software), [Analytics](/en/ch11#sec_batch_olap)
- surveillance, [Surveillance](/en/ch14#id374)
- (see also privacy)
- sushi principle, [From data warehouse to data lake](/en/ch1#from-data-warehouse-to-data-lake)
- sustainability, [Distributed Versus Single-Node Systems](/en/ch1#sec_introduction_distributed)
- Swagger (service definition format), [Web services](/en/ch5#sec_web_services)
- swapping to disk (see virtual memory)
- Swift (programming language)
- memory management, [Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
- sync engines, [Sync Engines and Local-First Software](/en/ch6#sec_replication_offline_clients)-[Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- examples of, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- for local-first software, [Real-time collaboration, offline-first, and local-first apps](/en/ch6#real-time-collaboration-offline-first-and-local-first-apps)
- synchronous networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks), [Glossary](/en/glossary)
- comparison to asynchronous networks, [Synchronous Versus Asynchronous Networks](/en/ch9#sec_distributed_sync_networks)
- system model, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- synchronous replication, [Synchronous Versus Asynchronous Replication](/en/ch6#sec_replication_sync_async), [Glossary](/en/glossary)
- with multiple leaders, [Multi-Leader Replication](/en/ch6#sec_replication_multi_leader)
- system administrator, [Operations in the Cloud Era](/en/ch1#sec_introduction_operations)
- system models, [Knowledge, Truth, and Lies](/en/ch9#sec_distributed_truth), [System Model and Reality](/en/ch9#sec_distributed_system_model)-[Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- assumptions in, [Trust, but Verify](/en/ch13#sec_future_verification)
- correctness of algorithms, [Defining the correctness of an algorithm](/en/ch9#defining-the-correctness-of-an-algorithm)
- mapping to the real world, [Mapping system models to the real world](/en/ch9#mapping-system-models-to-the-real-world)
- safety and liveness, [Safety and liveness](/en/ch9#sec_distributed_safety_liveness)
- systems of record, [Systems of Record and Derived Data](/en/ch1#sec_introduction_derived), [Glossary](/en/glossary)
- change data capture, [Implementing change data capture](/en/ch12#id307), [Reasoning about dataflows](/en/ch13#id443)
- event logs, [Event Sourcing and CQRS](/en/ch3#sec_datamodels_events)
- treating event log as, [State, Streams, and Immutability](/en/ch12#sec_stream_immutability)
- systems thinking, [Feedback Loops](/en/ch14#id372)
### T
- t-digest (algorithm), [Use of Response Time Metrics](/en/ch2#sec_introduction_slo_sla)
- table-table joins, [Table-table join (materialized view maintenance)](/en/ch12#id326)
- Tableau (data visualization software), [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp), [Analytics](/en/ch11#sec_batch_olap)
- tail (Unix tool), [Using logs for message storage](/en/ch12#id300)
- tail latency (see latency)
- tail vertex (property graphs), [Property Graphs](/en/ch3#id56)
- task (workflows) (see workflow engines)
- TCP (Transmission Control Protocol), [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- comparison to circuit switching, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- comparison to UDP, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- connection failures, [Detecting Faults](/en/ch9#id307)
- flow control, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing), [Messaging Systems](/en/ch12#sec_stream_messaging)
- packet checksums, [Weak forms of lying](/en/ch9#weak-forms-of-lying), [The end-to-end argument](/en/ch13#sec_future_e2e_argument), [Trust, but Verify](/en/ch13#sec_future_verification)
- reliability and duplicate suppression, [Duplicate suppression](/en/ch13#id354)
- retransmission timeouts, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- use for transaction sessions, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)
- Temporal (workflow engine), [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- TensorFlow (machine learning library), [Machine Learning](/en/ch11#id290)
- Teradata (database), [Cloud-Native System Architecture](/en/ch1#sec_introduction_cloud_native), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- term-partitioned indexes (see global secondary indexes)
- termination (consensus), [Single-value consensus](/en/ch10#single-value-consensus), [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
- testing, [Humans and Reliability](/en/ch2#id31)
- thrashing (out of memory), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- threads (concurrency)
- actor model, [Distributed actor frameworks](/en/ch5#distributed-actor-frameworks), [Event-Driven Architectures and RPC](/en/ch12#sec_stream_actors_drpc)
- (see also event-driven architecture)
- atomic operations, [Atomicity](/en/ch8#sec_transactions_acid_atomicity)
- background threads, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables)
- execution pauses, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- memory barriers, [Linearizability and network delays](/en/ch10#linearizability-and-network-delays)
- preemption, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- single (see single-threaded execution)
- three-phase commit, [Three-phase commit](/en/ch8#three-phase-commit)
- three-way relationships, [Property Graphs](/en/ch3#id56)
- Thrift (data format), [Protocol Buffers](/en/ch5#sec_encoding_protobuf)
- throughput, [Describing Performance](/en/ch2#sec_introduction_percentiles), [Describing Load](/en/ch2#id33), [Batch Processing](/en/ch11#ch_batch)
- TIBCO, [Message brokers](/en/ch5#message-brokers)
- Enterprise Message Service, [Message brokers compared to databases](/en/ch12#id297)
- StreamBase (stream analytics), [Complex event processing](/en/ch12#id317)
- TiDB (database)
- consensus-based replication, [Single-Leader Replication](/en/ch6#sec_replication_leader)
- regions (sharding), [Sharding](/en/ch7#ch_sharding)
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)
- serving derived data, [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- sharded secondary indexes, [Global Secondary Indexes](/en/ch7#id167)
- snapshot isolation support, [Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- timestamp oracle, [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator)
- transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
- use of model-checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- tiered storage, [Setting Up New Followers](/en/ch6#sec_replication_new_replica), [Disk space usage](/en/ch12#sec_stream_disk_usage)
- TigerBeetle (database), [Summary](/en/ch3#summary)
- deterministic simulation testing, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- TigerGraph (database)
- GSQL language, [Graph Queries in SQL](/en/ch3#id58)
- Tigris (object storage), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- TileDB (database), [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- time
- concurrency and, [The "happens-before" relation and concurrency](/en/ch6#sec_replication_happens_before)
- cross-channel timing dependencies, [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
- in distributed systems, [Unreliable Clocks](/en/ch9#sec_distributed_clocks)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
- (see also clocks)
- clock synchronization and accuracy, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- relying on synchronized clocks, [Relying on Synchronized Clocks](/en/ch9#sec_distributed_clocks_relying)-[Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- process pauses, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)-[Limiting the impact of garbage collection](/en/ch9#sec_distributed_gc_impact)
- reasoning about, in stream processors, [Reasoning About Time](/en/ch12#sec_stream_time)-[Types of windows](/en/ch12#id324)
- event time versus processing time, [Event time versus processing time](/en/ch12#id322), [Microbatching and checkpointing](/en/ch12#id329), [Unifying batch and stream processing](/en/ch13#id338)
- knowing when window is ready, [Handling straggler events](/en/ch12#id323)
- timestamp of events, [Whose clock are you using, anyway?](/en/ch12#id438)
- types of windows, [Types of windows](/en/ch12#id324)
- system models for distributed systems, [System Model and Reality](/en/ch9#sec_distributed_system_model)
- time-dependence in stream joins, [Time-dependence of joins](/en/ch12#sec_stream_join_time)
- time series data
- as DataFrames, [DataFrames, Matrices, and Arrays](/en/ch3#sec_datamodels_dataframes)
- column-oriented storage, [Column-Oriented Storage](/en/ch4#sec_storage_column)
- time-of-day clocks, [Time-of-day clocks](/en/ch9#time-of-day-clocks)
- hybrid logical clocks, [Hybrid logical clocks](/en/ch10#hybrid-logical-clocks)
- timeliness, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- coordination-avoiding data systems, [Coordination-avoiding data systems](/en/ch13#id454)
- correctness of dataflow systems, [Correctness of dataflow systems](/en/ch13#id453)
- timeouts, [Unreliable Networks](/en/ch9#sec_distributed_networks), [Glossary](/en/glossary)
- dynamic configuration of, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- for failover, [Leader failure: Failover](/en/ch6#leader-failure-failover)
- length of, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
- TimescaleDB (database), [Column-Oriented Storage](/en/ch4#sec_storage_column)
- timestamps, [Logical Clocks](/en/ch10#sec_consistency_timestamps)
- assigning to events in stream processing, [Whose clock are you using, anyway?](/en/ch12#id438)
- for read-after-write consistency, [Reading Your Own Writes](/en/ch6#sec_replication_ryw)
- for transaction ordering, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)
- insufficiency for enforcing constraints, [Enforcing constraints using logical clocks](/en/ch10#enforcing-constraints-using-logical-clocks)
- key range sharding by, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- Lamport, [Lamport timestamps](/en/ch10#lamport-timestamps)
- logical, [Ordering events to capture causality](/en/ch13#sec_future_capture_causality)
- ordering events, [Timestamps for ordering events](/en/ch9#sec_distributed_lww)
- timestamp oracle, [Implementing a linearizable ID generator](/en/ch10#implementing-a-linearizable-id-generator)
- TLA+ (specification language), [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- token bucket (limiting retries), [Describing Performance](/en/ch2#sec_introduction_percentiles)
- tombstones, [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Disk space usage](/en/ch4#disk-space-usage), [Log compaction](/en/ch12#sec_stream_log_compaction)
- topics (messaging), [Message brokers](/en/ch5#message-brokers), [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- torn pages (B-trees), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
- total order, [Glossary](/en/glossary)
- broadcast (see shared logs)
- limits of, [The limits of total ordering](/en/ch13#id335)
- on logical timestamps, [Logical Clocks](/en/ch10#sec_consistency_timestamps)
- tracing, [Problems with Distributed Systems](/en/ch1#sec_introduction_dist_sys_problems)
- tracking behavioral data, [Privacy and Tracking](/en/ch14#id373)
- (see also privacy)
- trade-offs, [Trade-offs in Data Systems Architecture](/en/ch1#ch_tradeoffs)-[Data Systems, Law, and Society](/en/ch1#sec_introduction_compliance)
- transaction coordinator (see coordinator)
- transaction manager (see coordinator)
- transaction processing, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
- comparison to analytics, [Characterizing Transaction Processing and Analytics](/en/ch1#sec_introduction_oltp)
- comparison to data warehousing, [Data Storage for Analytics](/en/ch4#sec_storage_analytics)
- transactions, [Transactions](/en/ch8#ch_transactions)-[Summary](/en/ch8#summary), [Glossary](/en/glossary)
- ACID properties of, [The Meaning of ACID](/en/ch8#sec_transactions_acid)
- atomicity, [Atomicity](/en/ch8#sec_transactions_acid_atomicity)
- consistency, [Consistency](/en/ch8#sec_transactions_acid_consistency)
- durability, [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Durability](/en/ch8#durability)
- isolation, [Isolation](/en/ch8#sec_transactions_acid_isolation)
- and derived data integrity, [Timeliness and Integrity](/en/ch13#sec_future_integrity)
- and replication, [Solutions for Replication Lag](/en/ch6#id131)
- compensating (see compensating transactions)
- concept of, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview)
- distributed transactions, [Distributed Transactions](/en/ch8#sec_transactions_distributed)-[Exactly-once message processing revisited](/en/ch8#exactly-once-message-processing-revisited)
- avoiding, [Derived data versus distributed transactions](/en/ch13#sec_future_derived_vs_transactions), [Making unbundling work](/en/ch13#sec_future_unbundling_favor), [Enforcing Constraints](/en/ch13#sec_future_constraints)-[Coordination-avoiding data systems](/en/ch13#id454)
- failure amplification, [Maintaining derived state](/en/ch13#id446)
- for sharded systems, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- in doubt/uncertain status, [Coordinator failure](/en/ch8#coordinator-failure), [Holding locks while in doubt](/en/ch8#holding-locks-while-in-doubt)
- two-phase commit, [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)-[Three-phase commit](/en/ch8#three-phase-commit)
- use of, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa)-[Exactly-once message processing](/en/ch8#sec_transactions_exactly_once)
- XA transactions, [XA transactions](/en/ch8#xa-transactions)-[Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
- OLTP versus analytics queries, [Analytics](/en/ch11#sec_batch_olap)
- purpose of, [Transactions](/en/ch8#ch_transactions)
- serializability, [Serializability](/en/ch8#sec_transactions_serializability)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
- actual serial execution, [Actual Serial Execution](/en/ch8#sec_transactions_serial)-[Summary of serial execution](/en/ch8#summary-of-serial-execution)
- pessimistic versus optimistic concurrency control, [Pessimistic versus optimistic concurrency control](/en/ch8#pessimistic-versus-optimistic-concurrency-control)
- serializable snapshot isolation (SSI), [Serializable Snapshot Isolation (SSI)](/en/ch8#sec_transactions_ssi)-[Performance of serializable snapshot isolation](/en/ch8#performance-of-serializable-snapshot-isolation)
- two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
- single-object and multi-object, [Single-Object and Multi-Object Operations](/en/ch8#sec_transactions_multi_object)-[Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- handling errors and aborts, [Handling errors and aborts](/en/ch8#handling-errors-and-aborts)
- need for multi-object transactions, [The need for multi-object transactions](/en/ch8#sec_transactions_need)
- single-object writes, [Single-object writes](/en/ch8#sec_transactions_single_object)
- snapshot isolation (see snapshots)
- strict serializability, [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition)
- weak isolation levels, [Weak Isolation Levels](/en/ch8#sec_transactions_isolation_levels)-[Materializing conflicts](/en/ch8#materializing-conflicts)
- preventing lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- read committed, [Read Committed](/en/ch8#sec_transactions_read_committed)-[Snapshot Isolation and Repeatable Read](/en/ch8#sec_transactions_snapshot_isolation)
- traversal (graphs), [Property Graphs](/en/ch3#id56)
- trie (data structure), [Constructing and merging SSTables](/en/ch4#constructing-and-merging-sstables), [Full-Text Search](/en/ch4#sec_storage_full_text)
- as SSTable index, [The SSTable file format](/en/ch4#the-sstable-file-format)
- triggers (databases), [Transmitting Event Streams](/en/ch12#sec_stream_transmit)
- Trino (data warehouse), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- federated databases, [The meta-database of everything](/en/ch13#id341)
- query optimizer, [Query languages](/en/ch11#sec_batch_query_lanauges)
- use for ETL, [Extract-Transform-Load (ETL)](/en/ch11#sec_batch_etl_usage)
- workflow example, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- triple-stores, [Triple-Stores and SPARQL](/en/ch3#id59)-[The SPARQL query language](/en/ch3#the-sparql-query-language)
- SPARQL query language, [The SPARQL query language](/en/ch3#the-sparql-query-language)
- tumbling windows (stream processing), [Types of windows](/en/ch12#id324)
- (see also windows)
- in microbatching, [Microbatching and checkpointing](/en/ch12#id329)
- Turbopuffer (vector search), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- Turtle (RDF data format), [Triple-Stores and SPARQL](/en/ch3#id59)
- Twitter (see X (social network))
- two-phase commit (2PC), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)-[Coordinator failure](/en/ch8#coordinator-failure), [Glossary](/en/glossary)
- confusion with two-phase locking, [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)
- coordinator failure, [Coordinator failure](/en/ch8#coordinator-failure)
- coordinator recovery, [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure)
- how it works, [A system of promises](/en/ch8#a-system-of-promises)
- performance cost, [Distributed Transactions Across Different Systems](/en/ch8#sec_transactions_xa)
- problems with XA transactions, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
- transactions holding locks, [Holding locks while in doubt](/en/ch8#holding-locks-while-in-doubt)
- two-phase locking (2PL), [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)-[Index-range locks](/en/ch8#sec_transactions_2pl_range), [What Makes a System Linearizable?](/en/ch10#sec_consistency_lin_definition), [Glossary](/en/glossary)
- confusion with two-phase commit, [Two-Phase Locking (2PL)](/en/ch8#sec_transactions_2pl)
- growing and shrinking phases, [Implementation of two-phase locking](/en/ch8#implementation-of-two-phase-locking)
- index-range locks, [Index-range locks](/en/ch8#sec_transactions_2pl_range)
- performance of, [Performance of two-phase locking](/en/ch8#performance-of-two-phase-locking)
- type checking, dynamic versus static, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)

### U

- UDP (User Datagram Protocol)
- comparison to TCP, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- multicast, [Direct messaging from producers to consumers](/en/ch12#id296)
- Ultima Online (game), [Sharding](/en/ch7#ch_sharding)
- unbounded datasets, [Stream Processing](/en/ch12#ch_stream), [Glossary](/en/glossary)
- (see also streams)
- unbounded delays, [Glossary](/en/glossary)
- in networks, [Timeouts and Unbounded Delays](/en/ch9#sec_distributed_queueing)
- process pauses, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- unbundling databases, [Unbundling Databases](/en/ch13#sec_future_unbundling)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
- composing data storage technologies, [Composing Data Storage Technologies](/en/ch13#id447)-[Unbundled versus integrated systems](/en/ch13#id448)
- federation versus unbundling, [The meta-database of everything](/en/ch13#id341)
- designing applications around dataflow, [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)-[Stream processors and services](/en/ch13#id345)
- observing derived state, [Observing Derived State](/en/ch13#sec_future_observing)-[Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
- materialized views and caching, [Materialized views and caching](/en/ch13#id451)
- multi-shard data processing, [Multi-shard data processing](/en/ch13#sec_future_unbundled_multi_shard)
- pushing state changes to clients, [Pushing state changes to clients](/en/ch13#id348)
- uncertain (transaction status) (see in doubt)
- union type (in Avro), [Schema evolution rules](/en/ch5#schema-evolution-rules)
- uniq (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis), [Distributed Job Orchestration](/en/ch11#id278)
- uniqueness constraints
- asynchronously checked, [Loosely interpreted constraints](/en/ch13#id362)
- requiring consensus, [Uniqueness constraints require consensus](/en/ch13#id452)
- requiring linearizability, [Constraints and uniqueness guarantees](/en/ch10#sec_consistency_uniqueness)
- uniqueness in log-based messaging, [Uniqueness in log-based messaging](/en/ch13#sec_future_uniqueness_log)
- Unity (data catalog), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- universally unique identifiers (see UUIDs)
- Unix philosophy
- comparison to relational databases, [Unbundling Databases](/en/ch13#sec_future_unbundling), [The meta-database of everything](/en/ch13#id341)
- comparison to stream processing, [Processing Streams](/en/ch12#sec_stream_processing)
- Unix pipes, [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
- compared to distributed batch processing, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- UPDATE statement (SQL), [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- updates
- preventing lost updates, [Preventing Lost Updates](/en/ch8#sec_transactions_lost_update)-[Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- atomic write operations, [Atomic write operations](/en/ch8#atomic-write-operations)
- automatically detecting lost updates, [Automatically detecting lost updates](/en/ch8#automatically-detecting-lost-updates)
- compare-and-set (CAS), [Conditional writes (compare-and-set)](/en/ch8#sec_transactions_compare_and_set)
- conflict resolution and replication, [Conflict resolution and replication](/en/ch8#conflict-resolution-and-replication)
- using explicit locking, [Explicit locking](/en/ch8#explicit-locking)
- preventing write skew, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts)
- utilization
- batch process scheduling, [Resource Allocation](/en/ch11#id279)
- increasing through preemption, [Handling Faults](/en/ch11#id281)
- trade-off with latency, [Can we not simply make network delays predictable?](/en/ch9#can-we-not-simply-make-network-delays-predictable)
- uTP protocol (BitTorrent), [The Limitations of TCP](/en/ch9#sec_distributed_tcp)
- UUIDs, [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)

### V

- validity (consensus), [Single-value consensus](/en/ch10#single-value-consensus), [Atomic commitment as consensus](/en/ch10#atomic-commitment-as-consensus)
- vBuckets (sharding), [Sharding](/en/ch7#ch_sharding)
- vector clocks, [Version vectors](/en/ch6#version-vectors)
- (see also version vectors)
- and Lamport/hybrid logical clocks, [Lamport/hybrid logical clocks versus vector clocks](/en/ch10#lamporthybrid-logical-clocks-vs-vector-clocks)
- and version vectors, [Version vectors](/en/ch6#version-vectors)
- vector embedding, [Vector Embeddings](/en/ch4#id92)
- vectorized processing, [Query Execution: Compilation and Vectorization](/en/ch4#sec_storage_vectorized)
- vendor lock-in, [Pros and Cons of Cloud Services](/en/ch1#sec_introduction_cloud_tradeoffs)
- Venice (database), [Serving Derived Data](/en/ch11#sec_batch_serving_derived)
- verification, [Trust, but Verify](/en/ch13#sec_future_verification)-[Tools for auditable data systems](/en/ch13#id366)
- avoiding blind trust, [Don't just blindly trust what they promise](/en/ch13#id364)
- designing for auditability, [Designing for auditability](/en/ch13#id365)
- end-to-end integrity checks, [The end-to-end argument again](/en/ch13#id456)
- tools for auditable data systems, [Tools for auditable data systems](/en/ch13#id366)
- version control systems
- merge conflicts, [Manual conflict resolution](/en/ch6#manual-conflict-resolution)
- reliance on immutable data, [Concurrency control](/en/ch12#sec_stream_concurrency)
- version vectors, [Problems with different topologies](/en/ch6#problems-with-different-topologies), [Version vectors](/en/ch6#version-vectors)
- dotted, [Version vectors](/en/ch6#version-vectors)
- versus vector clocks, [Version vectors](/en/ch6#version-vectors)
- Vertica (database), [Cloud Data Warehouses](/en/ch4#sec_cloud_data_warehouses)
- handling writes, [Writing to Column-Oriented Storage](/en/ch4#writing-to-column-oriented-storage)
- vertical scaling (see scaling up)
- vertices (in graphs), [Graph-Like Data Models](/en/ch3#sec_datamodels_graph)
- property graph model, [Property Graphs](/en/ch3#id56)
- video games, [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- video transcoding (example), [Cross-channel timing dependencies](/en/ch10#cross-channel-timing-dependencies)
- views (SQL queries), [Datalog: Recursive Relational Queries](/en/ch3#id62)
- materialized views (see materialization)
- Viewstamped Replication (consensus algorithm), [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order)
- use of model-checking, [Model checking and specification languages](/en/ch9#model-checking-and-specification-languages)
- view number, [From single-leader replication to consensus](/en/ch10#from-single-leader-replication-to-consensus)
- virtual block device, [Separation of storage and compute](/en/ch1#sec_introduction_storage_compute)
- virtual file system, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- comparison to distributed filesystems, [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- virtual machines, [Layering of cloud services](/en/ch1#layering-of-cloud-services)
- context switches, [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- network performance, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- noisy neighbors, [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- virtualized clocks in, [Clock Synchronization and Accuracy](/en/ch9#sec_distributed_clock_accuracy)
- virtual memory
- process pauses due to page faults, [Latency and Response Time](/en/ch2#id23), [Process Pauses](/en/ch9#sec_distributed_clocks_pauses)
- Virtuoso (database), [The SPARQL query language](/en/ch3#the-sparql-query-language)
- VisiCalc (spreadsheets), [Designing Applications Around Dataflow](/en/ch13#sec_future_dataflow)
- Vitess (database)
- key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- vnodes (sharding), [Sharding](/en/ch7#ch_sharding)
- vocabularies, [Triple-Stores and SPARQL](/en/ch3#id59)
- Voice over IP (VoIP), [Network congestion and queueing](/en/ch9#network-congestion-and-queueing)
- VoltDB (database)
- cross-shard serializability, [Sharding](/en/ch8#sharding)
- deterministic stored procedures, [Pros and cons of stored procedures](/en/ch8#sec_transactions_stored_proc_tradeoffs)
- in-memory storage, [Keeping everything in memory](/en/ch4#sec_storage_inmemory)
- process-per-core model, [Pros and Cons of Sharding](/en/ch7#sec_sharding_reasons)
- secondary indexes, [Local Secondary Indexes](/en/ch7#id166)
- serial execution of transactions, [Actual Serial Execution](/en/ch8#sec_transactions_serial)
- statement-based replication, [Statement-based replication](/en/ch6#statement-based-replication), [Rebuilding state after a failure](/en/ch12#sec_stream_state_fault_tolerance)
- transactions in stream processing, [Atomic commit revisited](/en/ch12#sec_stream_atomic_commit)

### W

- WAL (write-ahead log), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal)
- WAL-G (backup tool), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- WarpStream (messaging), [Disk space usage](/en/ch12#sec_stream_disk_usage)
- web services (see services)
- webhooks, [Direct messaging from producers to consumers](/en/ch12#id296)
- webMethods (messaging), [Message brokers](/en/ch5#message-brokers)
- WebSocket (protocol), [Pushing state changes to clients](/en/ch13#id348)
- wide-column data model, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
- versus column-oriented storage, [Column Compression](/en/ch4#sec_storage_column_compression)
- windows (stream processing), [Stream analytics](/en/ch12#id318), [Reasoning About Time](/en/ch12#sec_stream_time)-[Types of windows](/en/ch12#id324)
- infinite windows for changelogs, [Maintaining materialized views](/en/ch12#sec_stream_mat_view), [Stream-table join (stream enrichment)](/en/ch12#sec_stream_table_joins)
- knowing when all events have arrived, [Handling straggler events](/en/ch12#id323)
- stream joins within a window, [Stream-stream join (window join)](/en/ch12#id440)
- types of windows, [Types of windows](/en/ch12#id324)
- WITH RECURSIVE syntax (SQL), [Graph Queries in SQL](/en/ch3#id58)
- Word2Vec (language model), [Vector Embeddings](/en/ch4#id92)
- workflow engines, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- Airflow (see Airflow (workflow scheduler))
- batch processing, [Scheduling Workflows](/en/ch11#sec_batch_workflows)
- Camunda (see Camunda (workflow engine))
- Dagster (see Dagster (workflow scheduler))
- durable execution, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- ETL (see ETL (extract-transform-load))
- executor, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows)
- orchestrators, [Durable Execution and Workflows](/en/ch5#sec_encoding_dataflow_workflows), [Batch Processing](/en/ch11#ch_batch)
- Orkes (see Orkes (workflow engine))
- Prefect (see Prefect (workflow scheduler))
- reliance on determinism, [Deterministic simulation testing](/en/ch9#deterministic-simulation-testing)
- Restate (see Restate (workflow engine))
- Temporal (see Temporal (workflow engine))
- working set, [Sorting Versus In-memory Aggregation](/en/ch11#id275)
- write amplification, [Write amplification](/en/ch4#write-amplification)
- write path (derived data), [Observing Derived State](/en/ch13#sec_future_observing)
- write skew (transaction isolation), [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Materializing conflicts](/en/ch8#materializing-conflicts)
- characterizing, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew)-[Phantoms causing write skew](/en/ch8#sec_transactions_phantom), [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)
- examples of, [Write Skew and Phantoms](/en/ch8#sec_transactions_write_skew), [More examples of write skew](/en/ch8#more-examples-of-write-skew)
- materializing conflicts, [Materializing conflicts](/en/ch8#materializing-conflicts)
- occurrence in practice, [Maintaining integrity in the face of software bugs](/en/ch13#id455)
- phantoms, [Phantoms causing write skew](/en/ch8#sec_transactions_phantom)
- preventing
- in snapshot isolation, [Decisions based on an outdated premise](/en/ch8#decisions-based-on-an-outdated-premise)-[Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
- in two-phase locking, [Predicate locks](/en/ch8#predicate-locks)-[Index-range locks](/en/ch8#sec_transactions_2pl_range)
- options for, [Characterizing write skew](/en/ch8#characterizing-write-skew)
- write-ahead log (WAL), [Making B-trees reliable](/en/ch4#sec_storage_btree_wal), [Write-ahead log (WAL) shipping](/en/ch6#write-ahead-log-wal-shipping)
- in durable execution, [Durable execution](/en/ch5#durable-execution)
- writes (database)
- atomic write operations, [Atomic write operations](/en/ch8#atomic-write-operations)
- detecting writes affecting prior reads, [Detecting writes that affect prior reads](/en/ch8#sec_detecting_writes_affect_reads)
- preventing dirty writes with read committed, [No dirty writes](/en/ch8#sec_transactions_dirty_write)
- WS-\* framework, [The problems with remote procedure calls (RPCs)](/en/ch5#sec_problems_with_rpc)
- WS-AtomicTransaction (2PC), [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc)

### X

- X (social network)
- constructing home timelines (example), [Case Study: Social Network Home Timelines](/en/ch2#sec_introduction_twitter), [Deriving several views from the same event log](/en/ch12#sec_stream_deriving_views), [Table-table join (materialized view maintenance)](/en/ch12#id326), [Materialized views and caching](/en/ch13#id451)
- cost of joins, [Denormalization in the social networking case study](/en/ch3#denormalization-in-the-social-networking-case-study)
- describing load, [Describing Load](/en/ch2#id33)
- fault tolerance, [Fault Tolerance](/en/ch2#id27)
- performance metrics, [Describing Performance](/en/ch2#sec_introduction_percentiles)
- DistributedLog (event log), [Using logs for message storage](/en/ch12#id300)
- Snowflake (ID generator), [ID Generators and Logical Clocks](/en/ch10#sec_consistency_logical)
- XA transactions, [Two-Phase Commit (2PC)](/en/ch8#sec_transactions_2pc), [XA transactions](/en/ch8#xa-transactions)-[Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
- heuristic decisions, [Recovering from coordinator failure](/en/ch8#recovering-from-coordinator-failure)
- problems with, [Problems with XA transactions](/en/ch8#problems-with-xa-transactions)
- xargs (Unix tool), [Simple Log Analysis](/en/ch11#sec_batch_log_analysis)
- XFS (file system), [Distributed Filesystems](/en/ch11#sec_batch_dfs)
- XGBoost (machine learning library), [Machine Learning](/en/ch11#id290)
- XML
- binary variants, [Binary encoding](/en/ch5#binary-encoding)
- data locality, [Data locality for reads and writes](/en/ch3#sec_datamodels_document_locality)
- encoding RDF data, [The RDF data model](/en/ch3#the-rdf-data-model)
- for application data, issues with, [JSON, XML, and Binary Variants](/en/ch5#sec_encoding_json)
- in relational databases, [Schema flexibility in the document model](/en/ch3#sec_datamodels_schema_flexibility)
- XML databases, [Relational Model versus Document Model](/en/ch3#sec_datamodels_history), [Query languages for documents](/en/ch3#query-languages-for-documents)
- Xorq (query engine), [The meta-database of everything](/en/ch13#id341)
- XPath, [Query languages for documents](/en/ch3#query-languages-for-documents)
- XQuery, [Query languages for documents](/en/ch3#query-languages-for-documents)

### Y

- Yahoo
- response time study, [Average, Median, and Percentiles](/en/ch2#id24)
- YARN (job scheduler), [Distributed Job Orchestration](/en/ch11#id278), [Separation of application code and state](/en/ch13#id344)
- ApplicationMaster, [Distributed Job Orchestration](/en/ch11#id278)
- Yjs (CRDT library), [Pros and cons of sync engines](/en/ch6#pros-and-cons-of-sync-engines)
- YugabyteDB (database)
- hash-range sharding, [Sharding by hash range](/en/ch7#sharding-by-hash-range)
- key-range sharding, [Sharding by Key Range](/en/ch7#sec_sharding_key_range)
- multi-leader replication, [Geographically Distributed Operation](/en/ch6#sec_replication_multi_dc)
- request routing, [Request Routing](/en/ch7#sec_sharding_routing)
- sharded secondary indexes, [Global Secondary Indexes](/en/ch7#id167)
- tablets (sharding), [Sharding](/en/ch7#ch_sharding)
- transactions, [What Exactly Is a Transaction?](/en/ch8#sec_transactions_overview), [Database-internal Distributed Transactions](/en/ch8#sec_transactions_internal)
- use of clock synchronization, [Synchronized clocks for global snapshots](/en/ch9#sec_distributed_spanner)

### Z

- Zab (consensus algorithm), [Consensus](/en/ch10#sec_consistency_consensus), [Consensus in Practice](/en/ch10#sec_consistency_total_order)
- use in ZooKeeper, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- zero-copy, [Formats for Encoding Data](/en/ch5#sec_encoding_formats)
- zero-disk architecture (ZDA), [Setting Up New Followers](/en/ch6#sec_replication_new_replica)
- ZeroMQ (messaging library), [Direct messaging from producers to consumers](/en/ch12#id296)
- zombies (split brain), [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens)
- zones (cloud computing) (see availability zones)
- ZooKeeper (coordination service), [Coordination Services](/en/ch10#sec_consistency_coordination)-[Service discovery](/en/ch10#service-discovery)
- generating fencing tokens, [Fencing off zombies and delayed requests](/en/ch9#sec_distributed_fencing_tokens), [Using shared logs](/en/ch10#sec_consistency_smr), [Coordination Services](/en/ch10#sec_consistency_coordination)
- linearizable operations, [Implementing Linearizable Systems](/en/ch10#sec_consistency_implementing_linearizable)
- locks and leader election, [Locking and leader election](/en/ch10#locking-and-leader-election)
- observers, [Service discovery](/en/ch10#service-discovery)
- use for service discovery, [Load balancers, service discovery, and service meshes](/en/ch5#sec_encoding_service_discovery), [Service discovery](/en/ch10#service-discovery)
- use for shard assignment, [Request Routing](/en/ch7#sec_sharding_routing)
- use of Zab algorithm, [Consensus](/en/ch10#sec_consistency_consensus)