NoSQL Databases
Slides taken from J. Pokorný, KSI MFF UK
DATAKON 2011

Outline
- Cloud computing, cloud databases
- Relaxing ACID properties
- Eventual Consistency
- CAP Theorem
- NoSQL databases
- Categories of NoSQL databases
- Key-Value Data Stores
- Column-oriented
- A table representation of a row in BigTable
- Document-based
- Typical NoSQL API
- Representatives of NoSQL databases: key-valued, column-oriented, document-based

Cloud computing, cloud databases
Cloud computing
- data-intensive applications on hundreds of thousands of commodity servers and storage devices
- basic features: elasticity, fault tolerance, automatic provisioning
Cloud databases: traditional scaling up (adding new, expensive, big servers)
- is not possible
- requires a higher level of skills
- is not reliable in some cases
Architectural principle: scaling out (or horizontal scaling) based on data partitioning, i.e. dividing the database across many (inexpensive) machines

Cloud computing, cloud databases
Technique: data sharding, i.e. horizontal partitioning of data (e.g. hash or range partitioning)
Consequences:
- parallel access must be managed in the application
- scales well for both reads and writes
- not transparent: the application needs to be partition-aware

Relaxing ACID properties
Cloud computing: ACID is hard to achieve and, moreover, is not always required, e.g. for blogs, status updates, product listings, etc.
Availability
- traditionally thought of as the server/process being available 99.999 % of the time
- in a system with a large number of nodes, there is a high probability that a node is down or that the network is partitioned
Partition tolerance
- ensures that write and read operations are redirected to available replicas when segments of the network become disconnected
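
The hash and range partitioning named on the sharding slide can be illustrated with a small sketch. Everything here is an assumption made for illustration and does not come from the slides: the function names, the choice of SHA-256 as the stable hash, and the in-memory `ShardedStore` class that stands in for a partition-aware application.

```python
import bisect
import hashlib


def hash_shard(key, num_shards):
    """Hash partitioning: a stable hash of the key selects the shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key, split_points):
    """Range partitioning over sorted split points.

    Shard 0 holds keys below split_points[0]; shard i holds keys from
    split_points[i - 1] (inclusive) up to split_points[i] (exclusive);
    the last shard holds everything from the last split point upwards.
    """
    return bisect.bisect_right(split_points, key)


class ShardedStore:
    """Hypothetical partition-aware client: the application itself picks the shard."""

    def __init__(self, num_shards):
        # Each dict stands in for one inexpensive machine.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[hash_shard(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[hash_shard(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    store.put("user:42", "status update")
    print(store.get("user:42"))
    print(range_shard("kiwi", ["h", "p"]))
```

The routing logic lives in the client, which is the sense in which sharding is "not transparent": changing the number of shards or the split points changes where existing keys are looked up.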

Eventual Consistency
Eventual consistency
- when no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
- for a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
BASE (Basically Available, Soft state, Eventual consistency) properties, as opposed to ACID
- Basically available: faults are possible, but not a fault of the whole system
- Soft state: copies of a data item may be inconsistent
- Eventually consistent: copies become consistent at some later time if there are no more updates to that data item

CAP Theorem
Suppose three properties of a system:
- Consistency (all copies have the same value)
- Availability (the system can run even if parts have failed)
- Partitions (the network can break into two or more parts, each with active systems that cannot influence the other parts)
Brewer’s CAP “Theorem”: for any system sharing data, it is impossible to guarantee all three of these properties simultaneously
Very large systems will partition at some point
- it is necessary to decide between C and A
- traditional DBMS prefer C over A and P
- most Web applications choose A (except in specific applications such as order processing)

https://foundationdb.com/white-papers/the-cap-theorem
Brewer originally described this impossibility result as forcing a choice of "two out of the three" CAP properties, leaving three viable design options: CP, AP, and CA. However, further consideration shows that CA is not really a coherent option because a system that is not Partition-tolerant will, by definition, be forced to give up Consistency or Availability during a partition. A more modern interpretation of the theorem is: during a network partition, a distributed system must choose either Consistency or Availability.
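
The eventual-consistency and BASE properties above can be made concrete with a small sketch. It assumes a last-writer-wins rule keyed on a (counter, replica name) version and a naive all-pairs synchronisation round; both are illustrative choices rather than anything prescribed by the slides, and the `Replica` and `anti_entropy` names are hypothetical.

```python
class Replica:
    """One copy of the data; copies may disagree until they synchronise (soft state)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value), version = (counter, replica name)

    def write(self, key, value, counter):
        # Accept the write locally without contacting other replicas.
        self.data[key] = ((counter, self.name), value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        # Last-writer-wins (an assumed rule): keep the entry with the larger version.
        for key, (version, value) in other.data.items():
            if key not in self.data or self.data[key][0] < version:
                self.data[key] = (version, value)


def anti_entropy(replicas):
    """One synchronisation round in which every replica merges from every other."""
    for target in replicas:
        for source in replicas:
            if target is not source:
                target.merge_from(source)


if __name__ == "__main__":
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    a.write("status", "online", counter=1)
    b.write("status", "away", counter=2)
    print([r.read("status") for r in (a, b, c)])  # copies disagree before the round
    anti_entropy([a, b, c])
    print([r.read("status") for r in (a, b, c)])  # copies agree once updates stop
```

Each replica stays available for reads and writes on its own, which corresponds to choosing A over C during a partition; consistency is restored only after updates stop and a synchronisation round completes.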

CAP Theorem
Drop A or C of ACID
- relaxing C makes replication easy and facilitates fault tolerance
- relaxing A reduces (or eliminates) the need for distributed concurrency control

NoSQL databases
The name stands for Not Only SQL
Common features:
- non-relational
- usually do not require a fixed table schema
- horizontally scalable
- mostly open source
More characteristics:
- relax one or more of the ACID properties (see CAP theorem)
- replication support
- easy API (if SQL, then only a very restricted variant of it)
Do not fully support relational features:
- no join operations (except within partitions)
- no referential integrity constraints across partitions

Categories of NoSQL databases
- key-value stores
- column NoSQL databases
- document-based
- XML databases (myXMLDB, Tamino, Sedna)
- graph databases (neo4j, InfoGrid)

Key-Value Data Stores
Example: SimpleDB
- based on Amazon’s Simple Storage Service (S3)
- items (representing objects) have one or more (name, value) pairs, where the name denotes an attribute
- an attribute can have multiple values
- items are combined into domains

Column-oriented*
- store data in column order
- allow key-value pairs to be stored (and retrieved by key) in a massively parallel system
- data model: families of attributes defined in a schema; new attributes can be added
- storing principle: big hashed distributed tables
- properties: partitioning (horizontal and/or vertical), high availability, etc., completely transparent to the application
* Better: extendible records

Column-oriented
Example: BigTable
- indexed by row key, column key and timestamp, i.e. (row: string, column: string, time: int64) → string
- rows are ordered lexicographically by row key
- the row range for a table is dynamically partitioned; each row range is called a tablet
- columns: syntax is family:qualifier
[Figure: example row with key “mff.ksi.www”; columns “contents:”, “anchor:cnnsi.com” and “anchor:my.look.ca”; cell values <html>, “MFF” and “MFF.cz”; timestamps t3, t5, t6, t8, t9; label: column family]

A table representation of a row in BigTable
Table columns: Row key, Time stamp, Column name
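
The BigTable data model described above, a map from (row, column, timestamp) to a string with rows kept in lexicographic order, can be sketched as follows. This is an in-memory illustration only: the `TinyTable` class, its method names, and the integer timestamps are assumptions and are not BigTable's actual API. The row and column names are borrowed from the slide's figure, while the cell values and their timestamps are made up.

```python
class TinyTable:
    """Toy model of a (row, column, timestamp) -> string map."""

    def __init__(self):
        self.cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        # Column names follow the family:qualifier syntax; the qualifier may be empty.
        family, sep, _qualifier = column.partition(":")
        if not sep or not family:
            raise ValueError("column must have the form family:qualifier")
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest value at or before `timestamp`."""
        versions = self.cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan_rows(self, start_row, end_row):
        """Row keys in lexicographic order with start_row <= key < end_row."""
        return sorted({row for (row, _column) in self.cells if start_row <= row < end_row})


if __name__ == "__main__":
    table = TinyTable()
    table.put("mff.ksi.www", "contents:", 3, "<html>old</html>")
    table.put("mff.ksi.www", "contents:", 5, "<html>new</html>")
    table.put("mff.ksi.www", "anchor:cnnsi.com", 9, "MFF")
    print(table.get("mff.ksi.www", "contents:"))
    print(table.get("mff.ksi.www", "contents:", timestamp=4))
    print(table.scan_rows("mff", "mfg"))
```

A contiguous range of row keys, such as the one returned by `scan_rows`, is the unit that the slide calls a tablet; this sketch does not model splitting the table into tablets or distributing them.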