整理学习到的databse知识。

### Storage

###### Hierachy (speed fast to slow):

CPU Register -> Cache -> Main Memory -> Disk -> Tapes

###### Disk Capacity

Disk Capacity = #tracks per surface x (2 x #platter) x #sector per track x byte per sector.

**Access time = seek time + rotational delay + transfer time + other delay**- Seek time: time to move the arm to position disk head on the right track.
- Rotational delay: time to wait for sector to rotate under the disk head.
- Average rotational delay: time for 1/2 revolution.
- transfer time: block size / transfer rate.

###### Buffer Management

Pages in memory is like blocks in hard disk.

###### Example 1

1 | Given a total revolution of 7200 RPM. |

###### Example 2

1 | 3600 RPM = 60 revolutions/s -> 1 rev = 16.66 ms |

### Record Layout

#### Conventional Indexes

###### Types:

- Dense index: index record appears for every search-key value
- Sparse index: contains index records for only some search-key value
- Multilevel index: if primary index does not fit in memory, access becomes expensive.
- Outer index: a sparse index of primary index.
- Inner index: the primary index file

- Duplicate keys:
- Dense index: only keep record in index for first data record with each search key value.
- Sparse index: has key-pointer pairs corresponding to the first search key on each block of the data file.

#### B+ Tree

###### Characteristics:

full(packed) and balanced(almost uniform height).

1 | Consider a DBMS has following characteristics: Pointers:4 bytes. Keys: 4 bytes. Blocks:100 bytes. |

#### B Tree

#### Hashing

###### Characteristics:

$$

Avg\ occupancy = \frac{r}{n\times\gamma}

$$

n: The number of buckets that is currently in use.

r: current number of search keys stored in the buckets.

$\gamma$: block size(# search keys that can be stored in 1 block).

### Relational Algebra

#### Algebra expressions / Translate SQL into relational algebra / Equivlaence.

###### Query Process

- parsing: translating the query into a parsing tree.
- query rewrite: the parse tree is transformed into an expression tree of relational algebra (logical query plan)
- physical plan generation: translate the logical plan into a physical plan
- select algorithms to implement each operator
- choose order of operations.

###### Physical Query Plan

- Sequential-Scan
- Read tuples from the relation R by reading data blocks - one block at a time from disk to memory.

- Index-Scan
- The relation R must have an index
- The index is scanned to find blocks that contain the desired tuples.
- All the blocks containing desired tuples are then read into the memory - one block at a time.

###### Clustered and unclustered files/relations/indexes

- Clustered file: A file that stores records of one relation.
- Unclustered file: A file that stores records of multiple relations.
- Clustered index: Index that allows tuples to be read in an order that corresponds to physical order.
- Relation R is clustered.

Operator | I/O cost | Explanation |
---|---|---|

Sequential-Scan | B(R) | Read all blocks, and there are B(R) blocks. |

Index-Scan | ≤B(R) | Depends on number of values in the index is scanned. |

- Relation R is unclustered.

Operator | I/O cost | Explanation |
---|---|---|

Sequential-Scan | ~T(R) | Assuming next tuple is not found in the current block. We will read 1 block per tuple |

Index-Scan | ≤T(R) | Depends on number of values in the index is scanned. |

###### SQL grammer

1 | <Query> ::= SELECT <SelList> |

###### Example of parse trees

1 | SELECT movieTitle |

###### Example 2 of parse trees

1 | SELECT movieTitle |

###### Virtual Relation Pre-Processing

1 | CREATE VIEW Paramount_Movies AS |

###### Consider the following query on the virtual table Paramount_Movies:

1 | SELECT title |

###### Then, the virtual table is replaced by the corresponding logical query plan

#### Convert Parse Tree into initial L.Q.P

###### Set vs Bag semantics

- Set semantics: No duplicate entries.
- Bag semantics: Allow duplicate entries.

###### Operators

- Selection
- Syntax: σ c (R)
- Set: { t | t ∈ R AND t fulfills c}
- Bag: { $t^n$ | $t^n$ ∈ R AND t fulfills c}

- Renaming
- Syntax: ρ A (R)
- Set: { t.A | t∈R}
- Bag: { $(t.A)^n$ | $t^n$ ∈R}
- Example: $ρ_c$←a(R)

- Projection
- Syntax: π A (R)
- Set: { t.A | t∈R}
- Bag: { $(t.A)^n$ | $t^n$ ∈R}

- Joins
- Theta, natural, cross-product, outer, anti
- Cross Product
- Syntax: R × S
- Set: { (t,s) | t∈R AND s∈S}
- Bag: { (t,s)^n×m | t^n ∈R AND s^m ∈S}

- Join
- Syntax: $R\bowtie_{c}S$
- Set: { (t,s) | t∈R AND s∈S AND (t,s) matches c}
- Bag: { $(t,s)^{n×m}$ |$t^n$ ∈R AND $s^m$ ∈S AND (t,s) matches c}

- Natural Join
- Syntax: $R\bowtie S$
- All combinations of tuples from R and S that match on common attributes
- A = common attributes of R and S
- C = exclusive attributes of S
- Set: {(t,s.C) | t∈R AND s∈S AND t.A=s.A}
- Bag: {$(t,s.C)^{n×m}$ | $t^n$ ∈R AND $s^m$ ∈S AND t.A=s.A}

- Left-outer Join
- Syntax:$R=\bowtie{c} S$
- Contains all tuples in R join S
- Also includes every tuple in R that is not joined with a tuple in S, after padding a NULL
- $R=\bowtie _{a=d}S$

- Right-outer Join
- Syntax:$R\bowtie =S$
- s∈S without match, ﬁll R attributes with NULLrigh
- $R\bowtie =_{a=d}S$

- Full-outer Join
- Syntax: $R=\bowtie =_{c}S$
- $R=\bowtie=_{a=d}S$

- Aggregation
- All operators treat a relation as a bag of tuples.
- SUM: computes the sum of a column with numerical values.
- AVG: computes the average of a column with numerical values.
- MIN and MAX: Natural Order
- COUNT: computes the number of non-NULL tuples in a column.
- Grouping and aggregation generally need to be implemented and optimized together
- Syntax: $G\gamma A(R)$
- A is list of aggregation functions
- G is list of group by attributes

- Build groups of tuples according G and compute the aggregation functions from each group
- $b\gamma sum(a)(R)$

- Duplicate removal
- Syntax: $\delta(R)$: is the relation containing exactly one copy of each tuple in R.
- $\delta(R)$

- Set operations.
- Union, Intersection and Difference
- Union
- Syntax: $R\cup S$

- Intersection
- Syntax: $R\cap S$

- Set Difference
- Syntax: $R-S$

- Union

#### Canonical Translation to Relational Algebra

**FROM**clause into joins and crossproducts**WHERE**clause into selection**SELECT**clause into projection and renaming- If it has aggregation functions use aggregation
**DISTINCT**into duplicate removal

**GROUP BY**clause into aggregation**HAVING**clause into selection**ORDER BY**- no counter-part

#### Set Operations

**UNION ALL**into union**UNION**duplicate removal over union**INTERSECT ALL**into intersection**INTERSECT**add duplicate removal**EXCEPT ALL**into set diﬀerence**EXCEPT**apply duplicate removal to inputs and then apply set diﬀerence

#### Conversion into Relational Algebra

##### Example 1

1 | SELECT name FROM MovieStar WHERE birthdate LIKE '%1960'; |

##### Example 2

1 | SELECT movieTitle |

##### Example 3

1 | SELECT sum(a) |

##### Example 4

1 | SELECT dep, headcnt |

##### Example 5

1 | SELECT title FROM StarsIn WHERE starName IN (<Query>); |

### Query Processing and Optimization

#### Query Optimization

##### Relational Algebra Optimization

###### Algebraic Laws:

- Joins and Products
- Natural joins and product are both associative and commutative
- theta-join is commutative but not always associative.

- Union and Intersect
- Union and intersection are both associative and commutative

- Laws for Bags and Sets can differ
- Holds for sets, but guarentee for bags

- Select
- reduces the number of tuples (size of relation)
- an important general rule in optimization is to push selects as far down the tree as possible
- commutative
- must be pushed through both arguments for union:
- Union: σ p (R ∪ S) = σ p (R) ∪ σ p (S)

- must be pushed through ﬁrst arguments, optionally second for diﬀerence:
- Diﬀerence: σ p (R - S) = σ p (R) - S

- For the other operators it is only required that the selection be pushed to one argument.
- σ p (R × S) = σ p (R) × S, if p contains only attributes from R

- Derivation
- Amendment to the simple push down the select
- Use this algebraic law in the reverse order: $\sigma _p(R\bowtie S) = \sigma _p(R)\bowtie S$ to push the σ year=1996 operation up the tree
- Then use the algebraic law in the forward order: $\sigma _p(R\bowtie S) = \sigma _p(R)\bowtie \sigma _p(S)$ push the $\sigma _{year=1996}$ operation down the tree.

- Projection
- Example
- Summary
- Selections
- push down tree as far as possible
- if condition is an AND, split and push separately
- sometimes need to push up before pushing down

- Projections
- can be pushed down
- new ones can be added (but be careful)

- duplicate eliminations can sometimes be removed (e.g., if one key)
- Selection/product combinations
- can sometimes be replaced with join

- Many transformations lead to “promising” plans

- Selections