
OS Notes

Chapter 1: Operating System

  • CPU is the brain of the Computer
  • Memory stores the running programs
  • System bus: the wires that connect the computer's parts
  • Instructions must be loaded into memory before they can be executed
  • CPU includes:
    • PC: program counter
    • IR: instruction register
    • MAR: memory address register
    • MBR: memory buffer register
    • I/O AR: I/O address register
    • I/O BR: I/O buffer register
  • Memory hierarchy
    • Primary:
      • Register
      • Cache
      • Main memory
    • Secondary:
      • Nonvolatile memory
      • Hard-disk memory
    • Tertiary:
      • Optical disk
      • Magnetic tapes
  • Computer Architecture: single/multiprocessors, clusters...
  • A core is the component that executes instructions; it has registers for storing data locally
  • When the program is running: Fetch → Decode (control signal) → Execute (run) → Fetch → ....
  • What is the OS?
    • User view: making sure the system operates correctly and efficiently in an easy-to-use manner
    • System view:
      • Resource manager/allocator
      • Control program
    • OS as a kernel (the one program running at all times on the computer)

Chapter 2: Processes

Chapter 3: Scheduling - Multi-level Feedback Queue

Scheduling metrics:

T_{turnaround} = T_{completion} - T_{arrival}


A new metric: Response Time

T_{response} = T_{firstrun} - T_{arrival}
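
A hypothetical worked example (numbers assumed for illustration): a job arrives at T_{arrival} = 0, is first scheduled at T_{firstrun} = 10, and finishes at T_{completion} = 30:

T_{turnaround} = 30 - 0 = 30

T_{response} = 10 - 0 = 10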


Two types of schedulers

  1. SJF and STCF to optimize turnaround time
  2. RR to optimize response time

If there are I/O operations



If there is no more oracle

💡
SJF + preemption = STCF

How to optimize both (turnaround and response time):

💡
"Learn from the past, predict the future"

Two kinds of jobs:

  1. Interactive: short-running, with frequent I/O operations
  2. CPU-intensive: long-running, with little I/O

MLFQ - Multi-Level Feedback Queue



⇒ Problems:

  1. Starvation: too many interactive jobs consume all the CPU time, so long-running jobs never receive any
  2. A program may change its behavior over time and not be treated like other interactive jobs
  3. Gaming the scheduler: a program runs for 99% of a time slice before relinquishing the CPU, so it can monopolize the CPU


Problem: how to choose the time period S (the interval of the periodic priority boost)
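
For context, the boost that S controls could look like the minimal C sketch below; the struct job fields and NUM_QUEUES are assumptions for illustration, not the scheduler's real data structures.

#define NUM_QUEUES 3          // assumed number of priority levels

struct job {                  // hypothetical per-job bookkeeping
	int queue;                // current queue (higher = higher priority)
	int time_used;            // CPU time consumed at this level
};

// Called every S milliseconds: move all jobs to the topmost queue so that
// long-running jobs cannot starve, and reset the usage accounting so a job
// that "games" the scheduler loses its accumulated advantage.
void priority_boost(struct job *jobs, int njobs) {
	for (int i = 0; i < njobs; i++) {
		jobs[i].queue = NUM_QUEUES - 1;
		jobs[i].time_used = 0;
	}
}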




Summary



Chapter 4: Dynamic relocation & Segmentation

💡
Every address generated by a user program is a virtual address

Requirements for today's computers:

  1. Multiprogramming
  2. Time-sharing
  3. Interactivity


Solution: Virtual Memory - The address space

  • The OS gives the process an illusion/abstraction that it has the full memory, including all memory state: code, stack, heap, BSS...
  • Virtual memory makes programs think:
    • It is loaded into memory at a particular address
    • It has a potentially very large address space (32 → 64 bits)
  • The OS + hardware support ⇒ translates virtual address into physical address

The Goals of Virtualizing Memory

💡
Overview of address translation
  • Hardware-based address translation: virtual address → physical address
  • Manage memory: keep track of which parts are free/in use and how memory is used
  • Illusion → each program has its own private memory

Assumptions

  1. The user's address space must be placed contiguously in physical memory
  2. The size of the address space is not too big: it is less than the size of physical memory
  3. Each address space is exactly the same size
💡
Physical address = base + offset (the virtual address serves as the offset)

Two questions

  1. How to relocate the process in a way that is transparent to the process?
  2. How to provide an illusion of a virtual address space starting at 0?

Dynamic Relocation



Some definitions:

💡
Hardware support is needed for dynamic relocation
💡
Fault: an out-of-bounds access (beyond the bounds register) raises an exception
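
A minimal sketch of what the hardware does on each memory reference under base and bounds; the register names and raise_segfault() are illustrative assumptions:

#include <stdint.h>

uint32_t base_reg;                  // physical address where the process is loaded
uint32_t bounds_reg;                // size of the process's address space

extern void raise_segfault(void);   // hypothetical trap into the OS

uint32_t translate(uint32_t virtual_addr) {
	if (virtual_addr >= bounds_reg)
		raise_segfault();            // out of bounds → fault, OS terminates the process
	return base_reg + virtual_addr;  // physical = base + virtual address (offset)
}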



Problem with Dynamic Relocation - Internal Fragmentation

💡
One pair of base and bound registers per process
💡
Sparse address space: much of the address space (e.g., between stack and heap) is unused, yet it still takes up physical memory

Segmentation

💡
Now, the base and bound pair per logical segment
  • Logical segments: code, stack, heap
  • Allows the OS to place each of those segments in a different part of physical memory → avoids wasting memory on unused address space
  • If a program tries to access an illegal address:
    • The OS terminates the process
    • A segmentation fault is raised

Paging

Identify segment

  1. Explicitly: the top bits of the virtual address select the segment (see the sketch after this list)
  2. Implicitly: the hardware infers the segment from how the address was formed
    1. The address was generated from the PC → code segment
    2. The address is based on the stack or base pointer → stack segment
    3. Any other address → heap segment
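
A sketch of the explicit approach for a 14-bit virtual address whose top two bits select the segment; the masks, array names, and raise_segfault() are assumptions:

#include <stdint.h>

#define SEG_MASK    0x3000   // top 2 bits of the 14-bit virtual address
#define SEG_SHIFT   12
#define OFFSET_MASK 0x0FFF   // remaining 12 bits are the offset

uint32_t base[4], bounds[4];        // one base/bounds pair per segment
extern void raise_segfault(void);   // hypothetical trap into the OS

uint32_t translate(uint32_t va) {
	uint32_t segment = (va & SEG_MASK) >> SEG_SHIFT;  // e.g. 00 code, 01 heap, 10 stack
	uint32_t offset  = va & OFFSET_MASK;
	if (offset >= bounds[segment])
		raise_segfault();
	return base[segment] + offset;
}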

Attempt to deal with External Fragmentation

  1. Compact physical memory by rearranging the existing segments
  2. Use a free-list management algorithm

    ⇒ But external fragmentation still exists either way; it can only be minimized

VPN - Virtual Page Number

Problem with Paging:

Chapter 5: Paging - Fast Translation (Translation-Lookaside Buffer)

Overview of the TLB

💡
One page table per process

TLB algorithm

  1. Calculate VPN = (VA & VPN_MASK) >> SHIFT
  2. Look up the VPN in the TLB
    1. If HIT → get the PFN → compute the PA → access the PA
    2. If MISS (a TLB miss) → walk the page table in memory
      1. Get the PTE - the row mapping this VPN → PFN
        1. If valid → the PTE is put into the TLB cache and the instruction is retried
        2. If invalid OR the protection check fails → raise an exception
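
The hit path of the same flow as a runnable C sketch; the page size, TLB size, and structures are assumptions, and the miss path (page-table walk, TLB insert, retry) is only noted in a comment:

#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT  12       // assumed 4 KB pages
#define OFFSET_MASK 0xFFF
#define TLB_SIZE    16

struct tlb_entry { uint32_t vpn, pfn; bool valid; };
struct tlb_entry tlb[TLB_SIZE];

// Returns true on a TLB hit and fills in the physical address; on a miss,
// the page table in memory must be walked, the PTE inserted into the TLB,
// and the instruction retried.
bool tlb_translate(uint32_t va, uint32_t *pa) {
	uint32_t vpn = va >> PAGE_SHIFT;
	for (int i = 0; i < TLB_SIZE; i++) {
		if (tlb[i].valid && tlb[i].vpn == vpn) {
			*pa = (tlb[i].pfn << PAGE_SHIFT) | (va & OFFSET_MASK);
			return true;
		}
	}
	return false;  // TLB miss
}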

Example

💡
One address per byte

Caching

Handling TLB miss

Content of the TLB

💡
One TLB shared by all processes

⇒ Each process has one address space, and each address space has one ID (an ASID stored in the TLB entries)

The problem with the TLB: it can become full

⇒ Replacement policy: least-recently-used (LRU) or random eviction

Paging: Small Tables

Example of Linear Page Table

💡
Linear page tables get pretty BIG

\dfrac{2^{32}}{2^{12}} = 2^{20} virtual pages (32-bit address space, 4 KB pages)

⇒ Page table size = 2^{20} \times 4 bytes = 2^{22} bytes = 4 MB

Solution 1: Bigger pages

\dfrac{2^{32}}{2^{14}} = 2^{18} virtual pages (with bigger 16 KB pages)

⇒ Page table size = 2^{18} \times 4 bytes = 2^{20} bytes = 1 MB

💡
Another problem: Big pages cause Internal Fragmentation

Commonly used page sizes are 4 KB or 8 KB

💡
2^n virtual pages ⇒ n-bit VPN

A page size of 2^n bytes ⇒ n-bit offset

Solution 2: Paging & Segments

💡
Instead of having one page table for the entire address space of the process, why don't we have one per logical segment?

Example for Solution 2: Hybrid

Compute the Address of PTE:

💡
Hybrid = 1 process with many page tables
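
A sketch of how the PTE address could be computed in the hybrid scheme: the segment bits of the virtual address pick a per-segment page-table base register. The bit layout and names (SEG_MASK, seg_base, ...) are assumptions:

#include <stdint.h>

#define SEG_MASK   0xC0000000   // assumed: top 2 bits of a 32-bit VA select the segment
#define SEG_SHIFT  30
#define VPN_MASK   0x3FFFF000   // middle bits: virtual page number (4 KB pages)
#define VPN_SHIFT  12
#define PTE_SIZE   4            // bytes per page-table entry

uint32_t seg_base[4];            // one page-table base register per segment

uint32_t pte_address(uint32_t va) {
	uint32_t sn  = (va & SEG_MASK) >> SEG_SHIFT;   // which segment?
	uint32_t vpn = (va & VPN_MASK) >> VPN_SHIFT;   // which page within it?
	return seg_base[sn] + vpn * PTE_SIZE;          // address of the PTE in memory
}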

Problem with Hybrid solution

Solution for Hybrid problems: Multi-Level Page Tables

💡
How to get rid of all invalid regions in the page tables?

Page Directory

Pros and Cons of PD

PD with more than TWO levels

💡
To make each piece of the page table fit within a single page
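
A simplified sketch of the basic two-level lookup that a page directory enables, for a 32-bit address with 4 KB pages (10-bit directory index, 10-bit table index); the flat arrays stand in for memory and valid-bit checks are omitted, so this is only an illustrative model:

#include <stdint.h>

#define PD_SHIFT    22      // top 10 bits: page-directory index
#define PT_SHIFT    12      // next 10 bits: page-table index
#define IDX_MASK    0x3FF
#define OFFSET_MASK 0xFFF

uint32_t page_directory[1024];      // each PDE holds the number of a page-table page
uint32_t page_tables[1024][1024];   // each PTE holds just a PFN (simplified)

uint32_t translate(uint32_t va) {
	uint32_t pd_index = (va >> PD_SHIFT) & IDX_MASK;
	uint32_t pt_index = (va >> PT_SHIFT) & IDX_MASK;
	uint32_t pde = page_directory[pd_index];     // find the page of the page table
	uint32_t pfn = page_tables[pde][pt_index];   // find the physical frame number
	return (pfn << PT_SHIFT) | (va & OFFSET_MASK);
}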

Solution 4: Inverted Page Table

💡
Now, keep one entry per physical page (frame) of memory instead of one per virtual page
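
A sketch of the idea: one entry per physical frame records which (process, virtual page) occupies it, and translation becomes a search (real systems speed this up with a hash table). The sizes and names are assumptions:

#include <stdint.h>

#define NUM_FRAMES 1024

struct ipt_entry { int pid; uint32_t vpn; };
struct ipt_entry ipt[NUM_FRAMES];   // one entry per physical frame

// Linear search for illustration only; returns the frame number or -1 (page fault).
int lookup_pfn(int pid, uint32_t vpn) {
	for (int pfn = 0; pfn < NUM_FRAMES; pfn++)
		if (ipt[pfn].pid == pid && ipt[pfn].vpn == vpn)
			return pfn;
	return -1;
}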

Solution 5: Swapping the Page Table → Disk

💡
What if page tables are too big to fit into memory all at once?

⇒ Place them in kernel virtual memory; swap them out to disk when memory is tight, and back in when the pressure eases

Chapter 6: Beyond Physical Memory - Swapping

💡
Storing pages on disk introduces a new problem → timing (disk access is far slower than memory access)

Challenges

💡
Swap space allows the OS to support the illusion of a large Virtual Memory for multiple concurrently-running processes

Swap Space

The present bit

💡
When hardware looks in the PTE, how can it determine if a page is in physical memory?

When a page fault occurs, the page-fault handler:

What if memory is full

When the replacements really occur

Beyond Physical Memory Policies

Cache management

Average Memory Access Time - AMAT
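
For reference, OSTEP defines AMAT from the memory access cost T_M, the disk access cost T_D, and the probability of a miss (page fault) P_{Miss}:

AMAT = T_M + (P_{Miss} \times T_D)

With assumed numbers T_M = 100 ns, T_D = 10 ms, and P_{Miss} = 0.001, AMAT = 100 ns + 10 µs ≈ 10.1 µs; even a tiny miss rate is dominated by the disk cost.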

The Optimal Replacement Policy

💡
Replace the page that will be accessed furthest in the future

⇒ ONE BIG Problem: we don't know about the FUTURE

FIFO Policy

RANDOM Policy

LRU Policy

  • Least-recently used: evict the page that has not been used for the longest time (use the past to predict the future)
  • With a workload that has no locality, all of these policies perform approximately the same
  • Memory should hold more than roughly 20% and fewer than roughly 80% of the total pages for the LRU policy to show its benefit
  • For a looping workload, RANDOM can be better because it does not always remove the oldest pages; it just evicts pages at random

How to implement HISTORICAL algorithms - LRU

Solution: Approximating LRU

💡
Clock algorithm
  • The clock hand points to some particular page to begin with
  • Go around the clock, searching for the next candidate to evict: a page whose use bit = 0
  • Set the use bit to 0 when the clock hand visits a page whose use bit was 1
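
A minimal C sketch of this clock sweep, assuming a use_bit[] array that the hardware sets on each access (the names and sizes are hypothetical):

#include <stdbool.h>

#define NUM_FRAMES 256

bool use_bit[NUM_FRAMES];   // set to true by hardware whenever the page is accessed
static int hand = 0;        // the clock hand

// Skip (and clear) pages whose use bit is set, giving them a "second chance";
// evict the first page found with use bit == 0.
int clock_evict(void) {
	while (use_bit[hand]) {
		use_bit[hand] = false;
		hand = (hand + 1) % NUM_FRAMES;
	}
	int victim = hand;
	hand = (hand + 1) % NUM_FRAMES;   // advance past the victim
	return victim;
}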

Considering Dirty Pages

  • If a page has been modified, it is dirty → evicting it means writing the modified data back to disk → more expensive
  • The system prefers to evict clean pages
  • The clock algorithm evicts pages with use bit = 0 and modified (dirty) bit = 0 first

Other VM Policies


  1. Page Selection Policy: when to bring a page into memory
    1. Demand paging: bring a page in only when it is needed
    2. Prefetching: bring pages in ahead of time; requires knowing which pages will be needed
  2. Clustering/Grouping of writes: collect pending writes and write them to disk in one larger operation

Thrashing: what if the demand exceeds the available physical memory (PM)?

💡
Thrashing: the system spends its time constantly paging instead of making progress

Solution:

  • Reduce the number of concurrent processes → admission control
  • OR out-of-memory killer daemon (Linux)

Chapter 7: Concurrency Introduction

Threads

Problem with Threads: INTERRUPT - Uncontrolled Scheduling

Locks

💡
One thread in the critical section at a time

Basic Idea

lock_t mutex; // mutual exclusion lock
...
lock(&mutex);   // acquire the lock before entering the critical section
n = n + 1;      // critical section: update the shared counter
unlock(&mutex); // release the lock

Pthread locks

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&lock);
n = n + 1;
pthread_mutex_unlock(&lock);

Evaluating locks

  1. Correctness: it should work - can it prevent multiple threads from entering the critical section at once?
  2. Fairness: does each thread contending for the lock get a fair shot at acquiring it?
  3. Starvation: one or many threads never obtain the lock
  4. Performance: time overheads → it should not waste CPU cycles

Solution 1: Controlling Interrupts

void lock(){
	DisableInterrupts(); // disable interrupt before entering CS
}
void unlock(){
	EnableInterrupts();
}
  • Problems:
    1. A greedy program could call lock() and then monopolize the processor
    2. Does not work on multiprocessors
    3. Interrupts can be lost
    4. Inefficient

Solution 2: A failed attempt - Just using loads/stores

typedef struct __lock_t { int flag; } lock_t;
void init(lock_t *mutex) {
	mutex -> flag = 0; // 0: free; 1: busy
}
void lock(lock_t *mutex) {
	while (mutex -> flag == 1) {
		// do nothing
	}
	mutex -> flag = 1;
}
void unlock(lock_t *mutex) {
	mutex -> flag = 0;
}

Problems:

  1. The flag is a shared variable with no atomicity → two threads can both see flag == 0 and both set it → incorrect
  2. Spin and wait → wasting CPU cycles
💡
If only use the software solutions → there are always other problems

Solution 3: Building working spinlocks with test-and-set

  • Hardware support → test-and-set (atomic exchange) instruction
  • Require a preemptive scheduler
// performed atomically by the hardware
int TestAndSet(int *old_ptr, int new) {
	int old = *old_ptr; // fetch the old value at old_ptr
	*old_ptr = new;     // store 'new' into old_ptr
	return old;         // return the old value
}

typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *lock) {lock -> flag = 0;}

void lock(lock_t *lock) {
	while(TestAndSet(&lock->flag, 1) == 1) {
		// do nothing
	}
}
void unlock(lock_t *lock) { lock->flag = 0; }

Solution 4: Compare-and-swap

// also atomic in hardware: writes 'new' into *ptr only if *ptr still equals 'expected'
int CompareAndSwap(int *ptr, int expected, int new) {
	int original = *ptr;
	if (original == expected) {
		*ptr = new;
	}
	return original; 
}

void lock(lock_t *lock) {
	while (CompareAndSwap(&lock->flag, 0, 1) == 1) {
		// do nothing
	}
}

Solution 5: Load-linked and store-conditional

int LoadLinked(int *ptr) {
	return *ptr;
}

int StoreConditional(int *ptr, int value) {
	if (no update to *ptr since LoadLinked to this address) {
		*ptr = value;
		return 1;
	} else {
		return 0;
	}
}
void lock(lock_t *lock) {
	while(1) {
		while (LoadLinked(&lock->flag) == 1) {
			// do nothing: spin until the flag is 0
		}
		if (StoreConditional(&lock->flag, 1) == 1) return; 
		// only update when there is no update between load&store
		// if 1: all done, else: try again
	}
}

Solution 6: Fetch-and-add

int FetchAndAdd(int *ptr) {
	int old = *ptr;
	*ptr = old + 1;
	return old;
}
typedef struct __lock_t { int ticket, turn; } lock_t;
void lock_init(lock_t *lock) {
	lock -> ticket = 0; lock -> turn = 0;
}
void lock(lock_t *lock) {
	int myturn = FetchAndAdd(&lock->ticket); // increase the ticket, return previous
	while (lock->turn != myturn) {
		// do nothing: spin until it is my turn
	}
}
void unlock(lock_t *lock) {
	lock -> turn = lock -> turn + 1;
}
  • Solves the starvation problem (each thread acquires the lock in ticket order)
💡
BUT, too much SPINNING

Needs OS support

Solution 7: yield()

void lock() {
	while(TestAndSet(&flag, 1) == 1) {
		yield(); // give up the CPU
	}
}

Solution 8: using queues → sleeping instead of spinning

typedef struct __lock_t {int flag, guard; queue_t *q;} lock_t;

void init(lock_t *m) {
	m -> flag = 0;
	m -> guard = 0; 
	queue_init(m -> q);
}
void lock(lock_t *m) {
	while(TestAndSet(&m->guard, 1) == 1) {
		// acquire guard lock by spinning
	}
	if (m -> flag == 0) { m -> flag = 1; m -> guard = 0; } // lock acquired
	else { 
		queue_add(m -> q, gettid()); // put pid -> queue and wait
	  m -> guard = 0; 
		park();  // put thread to sleep
	}
}
void unlock(lock_t *m) {
	while(TestAndSet(&m->guard, 1) == 1) {
		// acquire guard lock by spinning
	}
	if (queue_empty(m->q)) {
		m -> flag = 0; // let go of the lock; no other thread wants it
	} else {
		unpark(queue_remove(m->q)); // hold the lock for the next thread and wake it up
	}
	m -> guard = 0;
}