0%

### Storage

###### Hierachy (speed fast to slow):

​ CPU Register -> Cache -> Main Memory -> Disk -> Tapes

###### Disk Capacity

​ Disk Capacity = #tracks per surface x (2 x #platter) x #sector per track x byte per sector.

• Access time = seek time + rotational delay + transfer time + other delay

• Seek time: time to move the arm to position disk head on the right track.
• Rotational delay: time to wait for sector to rotate under the disk head.
• Average rotational delay: time for 1/2 revolution.
• transfer time: block size / transfer rate.
###### Buffer Management

​ Pages in memory is like blocks in hard disk.

### Record Layout

#### Conventional Indexes

###### Types:
1. Dense index: index record appears for every search-key value
2. Sparse index: contains index records for only some search-key value
3. Multilevel index: if primary index does not fit in memory, access becomes expensive.
1. Outer index: a sparse index of primary index.
2. Inner index: the primary index file
4. Duplicate keys:
1. Dense index: only keep record in index for first data record with each search key value.
2. Sparse index: has key-pointer pairs corresponding to the first search key on each block of the data file.

#### B+ Tree

###### Characteristics:

​ full(packed) and balanced(almost uniform height).

#### Hashing

###### Characteristics:

$$Avg\ occupancy = \frac{r}{n\times\gamma}$$

n: The number of buckets that is currently in use.

r: current number of search keys stored in the buckets.

$\gamma$: block size(# search keys that can be stored in 1 block).

### Relational Algebra

#### Algebra expressions / Translate SQL into relational algebra / Equivlaence.

###### Query Process
• parsing: translating the query into a parsing tree.
• query rewrite: the parse tree is transformed into an expression tree of relational algebra (logical query plan)
• physical plan generation: translate the logical plan into a physical plan
• select algorithms to implement each operator
• choose order of operations.
###### Physical Query Plan
• Sequential-Scan
• Read tuples from the relation R by reading data blocks - one block at a time from disk to memory.
• Index-Scan
• The relation R must have an index
• The index is scanned to find blocks that contain the desired tuples.
• All the blocks containing desired tuples are then read into the memory - one block at a time.
###### Clustered and unclustered files/relations/indexes
• Clustered file: A file that stores records of one relation.
• Unclustered file: A file that stores records of multiple relations.
• Clustered index: Index that allows tuples to be read in an order that corresponds to physical order.
• Relation R is clustered.
Operator I/O cost Explanation
Sequential-Scan B(R) Read all blocks, and there are B(R) blocks.
Index-Scan ≤B(R) Depends on number of values in the index is scanned.
• Relation R is unclustered.
Operator I/O cost Explanation
Sequential-Scan ~T(R) Assuming next tuple is not found in the current block. We will read 1 block per tuple
Index-Scan ≤T(R) Depends on number of values in the index is scanned.

#### Convert Parse Tree into initial L.Q.P

###### Set vs Bag semantics
• Set semantics: No duplicate entries.
• Bag semantics: Allow duplicate entries.
###### Operators
• Selection
• Syntax: σ c (R)
• Set: { t | t ∈ R AND t fulfills c}
• Bag: { $t^n$ | $t^n$ ∈ R AND t fulfills c}
• Renaming
• Syntax: ρ A (R)
• Set: { t.A | t∈R}
• Bag: { $(t.A)^n$ | $t^n$ ∈R}
• Example: $ρ_c$←a(R)
• Projection
• Syntax: π A (R)
• Set: { t.A | t∈R}
• Bag: { $(t.A)^n$ | $t^n$ ∈R}
• Joins
• Theta, natural, cross-product, outer, anti
• Cross Product
• Syntax: R × S
• Set: { (t,s) | t∈R AND s∈S}
• Bag: { (t,s)^n×m | t^n ∈R AND s^m ∈S}
• Join
• Syntax: $R\bowtie_{c}S$
• Set: { (t,s) | t∈R AND s∈S AND (t,s) matches c}
• Bag: { $(t,s)^{n×m}$ |$t^n$ ∈R AND $s^m$ ∈S AND (t,s) matches c}
• Natural Join
• Syntax: $R\bowtie S$
• All combinations of tuples from R and S that match on common attributes
• A = common attributes of R and S
• C = exclusive attributes of S
• Set: {(t,s.C) | t∈R AND s∈S AND t.A=s.A}
• Bag: {$(t,s.C)^{n×m}$ | $t^n$ ∈R AND $s^m$ ∈S AND t.A=s.A}
• Left-outer Join
• Syntax:$R=\bowtie{c} S$
• Contains all tuples in R join S
• Also includes every tuple in R that is not joined with a tuple in S, after padding a NULL
• $R=\bowtie _{a=d}S$
• Right-outer Join
• Syntax:$R\bowtie =S$
• s∈S without match, ﬁll R attributes with NULLrigh
• $R\bowtie =_{a=d}S$
• Full-outer Join
• Syntax: $R=\bowtie =_{c}S$
• $R=\bowtie=_{a=d}S$
• Aggregation
• All operators treat a relation as a bag of tuples.
• SUM: computes the sum of a column with numerical values.
• AVG: computes the average of a column with numerical values.
• MIN and MAX: Natural Order
• COUNT: computes the number of non-NULL tuples in a column.
• Grouping and aggregation generally need to be implemented and optimized together
• Syntax: $G\gamma A(R)$
• A is list of aggregation functions
• G is list of group by attributes
• Build groups of tuples according G and compute the aggregation functions from each group
• $b\gamma sum(a)(R)$
• Duplicate removal
• Syntax: $\delta(R)$: is the relation containing exactly one copy of each tuple in R.
• $\delta(R)$
• Set operations.
• Union, Intersection and Difference
• Union
• Syntax: $R\cup S$
• Intersection
• Syntax: $R\cap S$
• Set Difference
• Syntax: $R-S$

#### Canonical Translation to Relational Algebra

• FROM clause into joins and crossproducts
• WHERE clause into selection
• SELECT clause into projection and renaming
• If it has aggregation functions use aggregation
• DISTINCT into duplicate removal
• GROUP BY clause into aggregation
• HAVING clause into selection
• ORDER BY - no counter-part

#### Set Operations

• UNION ALL into union
• UNION duplicate removal over union
• INTERSECT ALL into intersection
• INTERSECT add duplicate removal
• EXCEPT ALL into set diﬀerence
• EXCEPT apply duplicate removal to inputs and then apply set diﬀerence

### Query Processing and Optimization

#### Query Optimization

##### Relational Algebra Optimization
###### Algebraic Laws:
• Joins and Products
• Natural joins and product are both associative and commutative
• theta-join is commutative but not always associative.
• Union and Intersect
• Union and intersection are both associative and commutative
• Laws for Bags and Sets can differ
• Holds for sets, but guarentee for bags
• Select
• reduces the number of tuples (size of relation)
• an important general rule in optimization is to push selects as far down the tree as possible
• commutative
• must be pushed through both arguments for union:
• Union: σ p (R ∪ S) = σ p (R) ∪ σ p (S)
• must be pushed through ﬁrst arguments, optionally second for diﬀerence:
• Diﬀerence: σ p (R - S) = σ p (R) - S
• For the other operators it is only required that the selection be pushed to one argument.
• σ p (R × S) = σ p (R) × S, if p contains only attributes from R
• Derivation
• Amendment to the simple push down the select
• Use this algebraic law in the reverse order: $\sigma _p(R\bowtie S) = \sigma _p(R)\bowtie S$ to push the σ year=1996 operation up the tree
• Then use the algebraic law in the forward order: $\sigma _p(R\bowtie S) = \sigma _p(R)\bowtie \sigma _p(S)$ push the $\sigma _{year=1996}$ operation down the tree.
• Projection
• Example
• Summary
• Selections
• push down tree as far as possible
• if condition is an AND, split and push separately
• sometimes need to push up before pushing down
• Projections
• can be pushed down
• new ones can be added (but be careful)
• duplicate eliminations can sometimes be removed (e.g., if one key)
• Selection/product combinations
• can sometimes be replaced with join
• Many transformations lead to “promising” plans