Basically databases are made of btrees. The differenece between mongodb and sql is schema
enforcement.
ACID
- Atomic: Either it happens or doesn’t happen
- Durability: A transaction is permanant and is stored in the non-volatile memory
- Isolation: Transactions appear to run sequentially.
- Consistency: constraints, cascades, triggers are all followed and atomic.
Concurrency
MVCC algorithm - Takes a snaphot to allow reads while writes happen separately - View Serialisability
2 Phase locking - Use locks for allowing consistent reads as well - Conflict serialisability
CAP Theorem
Availability is achieved by replicating the data across different machines
Let’s say there is a network problem.
Consistency: A read is guaranteed to return the most recent write for a given client (consistency)
Availability: Any node can service requests
BASE - Basic Availability Soft State Eventual Consistency - Serve stale data and sync later
ACID - We can wait for the state to sync state and throw an error (ACID)
Consistency: The data is the same everywhere. If you do a write and then a read that read will contain that write.
Availability: It stays up
Partition Tolerance: If one instance can�t talk to another the cluster can survive
Practical Errors
- Network
- Disastor
- Operating System crash
- Hardware crashed
- Power cuts
Indexes
Nonclustered indexa - Only index has sorted rows. Actual rows are stored in an arbitrary order.
Clustered Index - Store rows
Compound index - Sorted excel
InnoDB uses a Btree to store values at the leaf nodes.
Covering Index already has the values needed.
Data Warehouse
OLAP -> Analytics Data
OLTP -> Transactional Data
Sharding
- Split tables to databases
- Split users table based on alphabet.
- Convert hash to a number and hash the id
- Lookup table
- Geolocation
Data Retrieval
- Get Request
- Get Shrad
- Aggregate
- Session Affinity
Operations that should be automated
Split Shrad
Move Shrad
To do the above online you need to lock tables while using replication.
Modelling
datum = unit of information
information = unit of the world = fact
data is a plural form
information can be a set of combination of datum
attribue can be atomic, composite, multiple-valued, derived over a domain and super
database ~ databank
analogy to a bank from which application of constraints is clear
dbms is a set of utilities used to build and maintain db’s
data model is a structural and a calculus logic plan by which data are understood and manipulated
the accepted conceptual model consists of entitity types and relationships
entity = self-contained information
weak entity = context-dependent information
entity set = set of entities
relationship set = mapping of affairs among entities of different entity sets
ER schema = outline = set of combination of attributes which give out entity sets and relation sets = can be a diagram (see cheatsheet)
the accepted logical model consists of tables
table = relationship set | entity set = relation = set of tuples
tuple = entity or a mapping
relation schema = set of attributes
database = set of relation schema
ddl = data definition language
create, alter, drop
insert, delete, update
dml = data manipulation language
select+where,project, rename, union, intersection, difference, cross, division, join+where
*set operations work wrt entire table
join is implemented using cross and select
intersection is implemented using union and difference
division = column matching
1-1 does not require a separate table but separation can help speed up
1-M requires the M table to hold a foregn key
M-M requires a association table
join and nested queries are used in the 1-M and M-M case
equi join = union of tuples with same foreign key
inner join = natural join = equi join + unique columns
outer join = place nulls (U) = union of tuples wrt foreign key while preserving the entire table
cross join removes the common part
decomposition = cohesion or else annomalies
entity constraint = primary key = super attribute = should be unique and not null
referenctial integrity constraint = foreign key = foreign tuple must exist
sematinc constraints = triggers
strict functional dependency contraint makes common attributes of any two tuples determine anyother attributes semanticlly impossible without a relationship
update anamoly – update at multiple tuples for single change
insertion anamoly – can not insert some information
deletion anamoly – more information is lost than required
0nf = data should have a primary key
1nf = atomic attribute values. the redundant values are repeated within another table.
2nf = primary key must the the sole primary key (a primary attribute alone should not be a key)
3nf = 2nf + no non-prime attribute is transitively dependant on the prime key
(association table)
bcnf = 2nf + no non-prime attribute is transitively dependant on any prime key
stored procedures - checking, looping, variables, network transperancy
backup and replication - dump, M-M, M-S(-S-S ..)
security - authentication
optimization - indexing
views
Distributed
couchbase
cassandra
voldemort
mongodb
immudb
dgraph
Memcache
get
set
delete
incr
decr
multiget
multiset
read
- read from cache
- else query
write
- write to db
- refresh key
Doesn’t do any clustering.
Just add servers manually.
Uses consistent hashing.
Expiration + LRU
Memory for an item is not actively reclaimed. If you store an item and
it expires, it sits in the LRU cache at its position until it falls to
the end and is reused. However, if you fetch an expired item, memcached
will find the item, notice that it’s expired, and free its memory. This
gives you the common case of normal cache churn reusing its own memory.
Items can also be evicted to make way for new items that need to be stored, or expired items are discovered and their memory reused.
Redis is better at reclaiming unused memory.
BAS
Parse
Firebase
Couchdb
p2pDb
https://remotestorage.io
https://unhosted.org/adventures/16/Our-plan-to-save-the-web.html
https://5apps.com/deploy/home
Couchbase
Hasura
Pouchdb
Nimbus
http://deployd.com
https://kuzzle.io
https://getmateria.com/#addons
https://appwrite.io
https://sapphire-db.com/start/main
https://api-platform.com
http://nobackend.org
https://bubble.io/plugins
https://strapi.io
Featherjs
https://glitch.com/@glitch
Pandas
http://ehneilsen.net/notebook/pandasExamples/pandas_examples.html
http://pbpython.com/pandas-pivot-table-explained.html
Explanation of pandas pivot_table function.
relational algebra <-> sql
relation <-> table <-> entity set
derived relation <-> view
tuple <-> row <-> entity && entity type
attribute <-> column
domain constraints
boolean constraints
primary key <-> entity
foreign key <-> weak entity
1st form -> no lists within an attribute if so split
2nd form -> no functional dependencies in a concatenated key if so split and recursively apply
3rd form -> no functional dependencies at all if so separate and use foreign key
File system
folder/file
basic attributes, arbitrary attributes
fileops new save delete move copy link attribute_shit
sync or async io
deafault caching(os) concurrency(locking) network(nfs) support
security provided by jails unions tickets
serialisation persistence simple
transaction needs a library suuport
revision control easy
multiple formats – hash, tree, record, xml, document
query by grep && find or more sophisticated functions
folder == table
file == row
link == foreign link
however scripting languages are a must for using filesystem as database
EAV
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
https://www.youtube.com/watch?v=ZoLoIFW1H6g
https://en.wikipedia.org/wiki/Consistent_hashing
https://www.linuxjournal.com/article/7451
Pintrest
Flickr