CSE 320 : System Fundamentals 2
Chapter 1 : A Tour of Computer Systems
Chapter 2 : Representing and Manipulating Information
Chapter 3 : Machine-Level Representation of Programs
Chapter 5 : Optimizing Program Performance
Chapter 6 : The Memory Hierarchy
Chapter 6.3 : The Memory Hierarchy
- Exhibit Good Locality
- L0-L6
-
Chapter 6.3.1 : Caching in the Memory Hierarchy
- L0 is a cache for L1
- Block sizes are fixed
- Cache Hit = Not Miss = Data is in that Cache
- Restrictive Placement Policies cause Conflict Miss
- Capacity miss = working set exceeds cache size
- Cache management may be software or hardware depending on Lk
-
Chapter 6.3.2 : Summary of Memory Hierarchy Concepts
- Page = Block between MainMemory & Disk
Chapter 6.4 : Cache Memories
- Originally, only CPU + Mem + Disk
- Now L1-L3
-
Chapter 6.4.1 : Generic Cache Memory Organization
- (Capacity) C = (#Sets) S * (Lines/Set) E * (Bytes/Line) B
- Line = Valid | Tag | Block
- Address = tbits | sbits | bbits
-
Cache | m | C | B | E | S | t | s | b |
1. | 32 | 1,024 | 4 | 1 | 2^8 | 22 | 8 | 2 |
2. | 32 | 1,024 | 8 | 4 | 2^5 | 24 | 5 | 3 |
3. | 32 | 1,024 | 32 | 32 | 2^0 | 27 | 0 | 5 |
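The field widths in the table can be checked mechanically: s = log2 S, b = log2 B, t = m - s - b, and each address splits as tag|set|offset. A minimal sketch (function names are mine, not the book's):

```c
#include <stdint.h>

/* Number of bits needed to index x entries (x a power of two). */
int log2u(uint32_t x) { int n = 0; while (x >>= 1) n++; return n; }

/* Field widths (t, s, b) for m-bit addresses, S sets, B-byte blocks. */
void cache_fields(int m, uint32_t S, uint32_t B, int *t, int *s, int *b) {
    *s = log2u(S);
    *b = log2u(B);
    *t = m - *s - *b;
}

/* Extract tag, set index, and block offset from an address. */
uint32_t addr_tag(uint32_t a, int s, int b) { return a >> (s + b); }
uint32_t addr_set(uint32_t a, int s, int b) { return (a >> b) & ((1u << s) - 1); }
uint32_t addr_off(uint32_t a, int b)        { return a & ((1u << b) - 1); }
```

Row 1 of the table (S = 2^8, B = 4) gives (t, s, b) = (22, 8, 2); the same split applied with (t, s, b) = (8, 3, 2) reproduces the Practice Problem 6.13 decomposition below.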
-
Chapter 6.4.2 : Direct-Mapped Caches
- Get word w = (1) Set Selection (2) Line Matching & (3) Word Extraction
- Set Selection in Direct-Mapped Caches : Use sbits to get Set
- Line Matching in Direct-Mapped Caches : Check Valid, Check tbits=Tag
- Word Selection in Direct-Mapped Caches : Use bbits to get block byte offset
- Block Number = tbits | sbits
- Cold Cache has Valid = 0
- Valid | Tag | Block[0] | Block[1]
- Conflict Misses in Direct-Mapped Caches
- Middle indexing works better
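The three steps above can be sketched as a tiny direct-mapped simulator (parameters and names are mine; only valid bits and tags are modeled, no data):

```c
#include <stdint.h>
#include <string.h>

#define DM_SETS  4   /* S = 4 sets, one line per set (E = 1), so s = 2 */
#define DM_BBITS 2   /* B = 4-byte blocks */

typedef struct {
    int valid[DM_SETS];
    uint32_t tag[DM_SETS];
} dm_cache;

void dm_init(dm_cache *c) { memset(c, 0, sizeof *c); } /* cold cache: valid = 0 */

/* Returns 1 on hit, 0 on miss; a miss installs the block's tag. */
int dm_access(dm_cache *c, uint32_t addr) {
    uint32_t set = (addr >> DM_BBITS) & (DM_SETS - 1); /* (1) set selection */
    uint32_t tag = addr >> (DM_BBITS + 2);             /* s = 2 set bits */
    if (c->valid[set] && c->tag[set] == tag)           /* (2) line matching */
        return 1;           /* (3) word extraction would use the b bits */
    c->valid[set] = 1;      /* miss: replace the line in this set */
    c->tag[set] = tag;
    return 0;
}
```

Two addresses with the same set index but different tags evict each other, which is exactly the conflict-miss thrashing the bullet above refers to.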
-
Chapter 6.4.3 : Set Associative Caches
- E-way set associative cache: 1 < E < C/B
- Associative in the sense of associative array (i.e. key space is sparse)
- LRU = Least Recently Used: a line replacement policy
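One way to realize LRU in an E-way set is a use-ordered list of line indices per set, most recent first; this sketch (names mine) is one of several possible implementations:

```c
/* Update an E-entry LRU order after an access to line `way`.
 * order[0] is most recently used; order[E-1] is the next victim. */
void lru_touch(int *order, int E, int way) {
    int i = 0;
    while (i < E - 1 && order[i] != way)
        i++;                        /* position of `way` (or last slot) */
    for (; i > 0; i--)
        order[i] = order[i - 1];    /* shift more-recent entries down */
    order[0] = way;                 /* `way` is now most recent */
}

/* The line to replace on a miss is simply the last entry. */
int lru_victim(const int *order, int E) { return order[E - 1]; }
```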
-
Chapter 6.4.4 : Fully Associative Caches
- S = 1 (i.e. E = C/B)
- Set 0 is always selected
- Practice Problem 6.12 : t=8|s=3|b=2
- Practice Problem 6.13 : 0D53 = 01101010|100|11 , CO=3 , CI=4 , CT=6A , Miss , -
- Practice Problem 6.14 : 0CB4 = 01100101|101|00 , CO=0 , CI=5 , CT=65 , Miss , -
- Practice Problem 6.15 : 0A31 = 01010001|100|01 , CO=1 , CI=4 , CT=51 , Miss , -
- Practice Problem 6.16 : 064C = 00110010|011|00 , 064D = 00110010|011|01 , 064E = 00110010|011|10 , 064F = 00110010|011|11
-
Chapter 6.4.5 : Issues with Writes
- Write-Through vs Write-back
-
Chapter 6.4.6 : Anatomy of a Real Cache Hierarchy
- Intel Core i7 has separate i-cache & d-cache (instruction & data)
-
Chapter 6.4.7 : Performance Impact of Cache Parameters
- Higher Associativity (Big E) reduces conflict misses
Chapter 6.5 : Writing Cache-Friendly Code
- Miss Rate
- Block ⊆ Line ⊆ Set
- Spatial & Temporal Locality
- Exploit Locality
- L1 Cache
- Hit Rate
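The classic illustration of these bullets: C arrays are row-major, so a[i][j] and a[i][j+1] are adjacent in memory, and a stride-1 inner loop exhausts each cache block before moving on:

```c
#define N 64

/* Good spatial locality: inner loop walks memory with stride 1. */
long sum_rowwise(int a[N][N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Poor spatial locality: inner loop strides N ints per access. */
long sum_colwise(int a[N][N]) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}
```

Both return the same sum; only the miss rate (and therefore the running time on large arrays) differs.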
Chapter 6.6 : Putting It Together: The Impact of Caches on Program Performance
- Chapter 6.6.1 : The Memory Mountain
- Chapter 6.6.2 : Rearranging Loops to Increase Spatial Locality
- Chapter 6.6.3 : Exploiting Locality in Your Programs
Chapter 6 : Problems
-
6.25
Cache | m | C | B | E | S | t | s | b |
1. | 32 | 1,024 | 4 | 4 | 2^6 | 24 | 6 | 2 |
2. | 32 | 1,024 | 4 | 256 | 2^0 | 30 | 0 | 2 |
3. | 32 | 1,024 | 8 | 1 | 2^7 | 22 | 7 | 3 |
4. | 32 | 1,024 | 8 | 128 | 2^0 | 29 | 0 | 3 |
5. | 32 | 1,024 | 32 | 1 | 2^5 | 22 | 5 | 5 |
6. | 32 | 1,024 | 32 | 4 | 2^3 | 24 | 3 | 5 |
-
6.26
Cache | m | C | B | E | S | t | s | b |
1. | 32 | 2^11 | 8 | 1 | 2^8 | 21 | 8 | 3 |
2. | 32 | 2,048 | 4 | 4 | 128 | 23 | 7 | 2 |
3. | 32 | 1,024 | 2 | 8 | 64 | 25 | 6 | 1 |
4. | 32 | 1,024 | 32 | 2 | 16 | 23 | 4 | 5 |
-
6.27
a) Hit Set 1 : 45|001|XX = 01000101|001|XX = 08A[4-7] , 38|001|XX = 00111000|001|XX = 070[4-7]
b) Hit Set 6 : 91|110|XX = 10010001|110|XX = 123[8-B]
-
6.28
a) Hit Set 2 : None (set 2 has no valid lines)
b) Hit Set 4 : C7|100|XX = 11000111|100|XX = 18F[0-3] , 05|100|XX = 00000101|100|XX = 00B[0-3]
c) Hit Set 5 : 71|101|XX = 01110001|101|XX = 0E3[4-7]
d) Hit Set 7 : DE|111|XX = 11011110|111|XX = 1BD[C-F]
-
6.29
a) t=8|s=2|b=2
b) 834 = 1000 0011 |01|00 = Miss , 836 = 1000 0011 |01|10 = Miss , FFD = 1111 1111 |11|01 = Hit(C0)
-
6.30
a) C = S * E * B = 8 * 4 * 4 = 2^7
b) t=8|s=3|b=2
-
6.31
a) t=8|s=3|b=2
b) 71A = 0 0111 000|1 10|10 = Hit(EB)
-
6.32
a) t=8|s=3|b=2
b) 16E8 = 1 0110 111|0 10|00 = Miss(-)
-
6.33
BC|010|XX = 10111100|010|XX = 178[8-B] , B6|010|XX = 10110110|010|XX = 16C[8-B]
Chapter 8 : Exceptional Control Flow
- ECF = Exceptional Control Flow
- ECF will help you understand concurrency
Chapter 8.1 : Exceptions
- 8.1.1 Exception Handling
- Involves close cooperation between hardware and software
- Jump Table = List of Addresses
- Hardware triggers an exception, then software (i.e. the Exception Handler) runs.
- 8.1.2 Classes of Exceptions
- Interrupts : Interrupts are handled by Interrupt Handlers , Seamless = Return to Next Instruction
- Traps and System Calls : System Calls are implemented by Traps , Exception Handlers run in Kernel Mode
- Faults : Either Abort or Return to Current Instruction
- Aborts : Abort handler Ends Program w/out Returning
- 8.1.3 Exceptions in Linux/x86-64 Systems
- Linux/x86-64 Faults and Aborts
- Linux/x86-64 System Calls
- OS
- System Call 1 = write
- "Exceptions" include exceptions (synchronous) & interrupts (asynchronous)
Chapter 8.2 : Processes
- Process = an instance of a program in execution
-
8.2.1 : Logical Control Flow
- Logical Control Flow = What stepping debugger sees
- Physical Control Flow = What the processor sees
-
8.2.2 : Concurrent Flows
- Two flows are concurrent when one starts between the start and end of the other.
- Parallel flows = Concurrent Flows on separate cores
-
8.2.3 : Private Address Space
- Each Private Address Space comes with its own User Portion and Kernel Portion
- Each Portion has Code, Data, Heap, and Stack Segments
-
8.2.4 : User and Kernel Modes
- Restricts Instructions
- Processors usually have a Mode Bit
- /proc gives info kernel has about processes
-
8.2.5 : Context Switches
- Context Switches are built on top of 8.1 Exceptions
- Kernel Code performs Context Switches
Chapter 8.3 : System Call Error Handling
- On error, system-level functions return -1 and set errno
- Error-Handling Wrappers print the error and exit(0)
Chapter 8.4 : Process Control
- Unix System Calls can Control Processes
-
8.4.1 Obtaining Process IDs
-
8.4.2 Creating and Terminating Processes
- Process States: Running, Stopped, or Terminated
- fork() duplicates this process
- Duplicate but separate address spaces
- Any topological sort is possible
- Practice Problem 8.2 : a) nothing OR "p1: a=9" OR "p1: a=9 p2: a=8" b) "p2: a=9"
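The fork() bullets in a few lines: the child gets a duplicate but separate address space, and the parent observes the child only through its exit status. A minimal sketch (POSIX, function name mine):

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the status the parent
 * reaps, or -1 if fork/waitpid fails or the child exited abnormally. */
int fork_and_reap(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        exit(code);     /* child: its variables are copies, not shared */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```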
-
8.4.3 Reaping Child Processes
- Zombie = Terminated & Unreaped
- waitpid waits for a child to terminate
- waitpid's statusp arg is child's exit status
- Practice Problem 8.3 : 936036 930636 903636 093636
- wait(&status) = waitpid(-1,&status,0)
- waitpid's arg1 is childpid or -1 for any
- Practice Problem 8.4 : a) 6 b) Start 0 PID Child Stop Stop
- WIFEXITED(status) macro masks status
-
8.4.4 Putting Processes to Sleep
- sleep(9) sleeps for 9 seconds
- pause = sleep(∞) , returns when a signal is caught
-
8.4.5 Loading and Running Programs
- execve runs a program
- getenv(key) returns value
- setenv(key,value)
-
8.4.6 Using fork() and execve() to Run Programs
- execve() overwrites this process
- Shell has eval(), parseline(), & builtin_command()
- The eval() function calls fork() & execve()
- parseline() parses the commandline and builds argv
Chapter 8.5 : Signals
- High-Level Exceptional Control Flow : Linux Signal
- Signal = message to process
- SIGINT = interrupt from keyboard
- SIGKILL = kill program
-
8.5.1 Signal Terminology
- Kernel sends signal, process receives
-
8.5.2 Sending Signals
- Process Group
- Sending Signal from the Keyboard
- Sending Signals with the kill Function
-
8.5.3 Receiving Signals
- signal(i,f) makes f catch signal i
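The signal(i,f) bullet can be exercised deterministically with raise(); per the safe-handling rules in 8.5.5, the handler only writes a volatile sig_atomic_t flag. (Plain signal() is shown to match the bullet; the book's Signal wrapper uses sigaction for portability.)

```c
#include <signal.h>

/* Async-signal-safe: the handler only sets a flag. */
volatile sig_atomic_t got_usr1 = 0;

void usr1_handler(int sig) {
    (void)sig;
    got_usr1 = 1;
}

/* Install the handler, send ourselves SIGUSR1, report whether it ran.
 * raise() delivers the signal before returning in a single-threaded
 * process, so the result is deterministic. */
int catch_usr1(void) {
    got_usr1 = 0;
    if (signal(SIGUSR1, usr1_handler) == SIG_ERR)
        return -1;
    raise(SIGUSR1);
    return got_usr1;
}
```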
-
8.5.4 Blocking and Unblocking Signals
- sigprocmask changes the set of currently blocked signals
-
8.5.5 Writing Signal Handlers
-
Safe Signal Handling
- Async-signal-safe
- Save and restore errno
- Protect access to shared data by blocking all signals
-
Correct Error Handling
- Signals are not queued
- Signals can't be used to count events
- while(waitpid(-1,NULL,0)>0) { }
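Because signals are not queued, one SIGCHLD may stand for several dead children, which is why the loop above reaps until waitpid fails. The reaping logic can be tested directly (function name mine; in a real handler, save and restore errno around the loop):

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every terminated child; returns the number reaped.
 * waitpid returns -1 (ECHILD) once no children remain. */
int reap_all(void) {
    int n = 0;
    while (waitpid(-1, NULL, 0) > 0)
        n++;
    return n;
}
```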
-
Portable Signal Handling
-
8.5.6 Synchronizing Flows to Avoid Nasty Concurrency Bugs
- Race conditions may delete jobs before adding
-
8.5.7 Explicitly Waiting for Signals
- Use sigprocmask to synchronize signals via blocking
- Spin loops are correct but wasteful; sigsuspend avoids the spin
Chapter 8.6 : Nonlocal Jumps
- C has nonlocal jumps
- Sigemptyset(p) initializes set p to empty
- setjmp(buf) saves the current calling environment
- longjmp(buf,ret) jumps back to setjmp and returns ret
- sigsetjmp is a version of setjmp used by signal handlers
- setjmp/longjmp ~ try/throw
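The try/throw analogy in a few lines (names mine): setjmp returns 0 on the initial call and the longjmp code when control returns nonlocally.

```c
#include <setjmp.h>

static jmp_buf env;

/* "Throw": jump back to the setjmp site with a nonzero code. */
void deep_error(int code) {
    longjmp(env, code);     /* never returns */
}

/* "Try": run work that may fail several calls down and recover here. */
int run_with_recovery(void) {
    int code = setjmp(env);
    if (code != 0)
        return code;        /* re-entered via longjmp */
    deep_error(7);          /* simulate a failure deep in the call tree */
    return 0;               /* unreachable */
}
```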
Chapter 8.7 : Tools for Manipulating Processes
Chapter 8.8 : Summary
-
Intel | Abort | Fault | Interrupt | Trap |
OS | - | - | - | System Call |
Chapter 9.2 : Address Spaces
- N=2^n Virtual Addresses
- M=2^m Physical Addresses
Chapter 9.3 : VM as a tool for caching
- Virtual Pages are P=2^p bytes in size
-
Practice Problem 9.2
n | P=2^p | Number of PTEs |
12 | 1K | 2^12/2^10=2^2 |
16 | 16K | 2^16/2^14=2^2 |
24 | 2M | 2^24/2^21=2^3 |
36 | 1G | 2^36/2^30=2^6 |
Chapter 9.6 : Address Translation
- Hardware supports Virtual Memory
- Virtual Address Space is mapped to Physical Address Space
-
Practice Problem 9.3
P | Number of VPN bits | Number of VPO bits | Number of PPN bits | Number of PPO bits |
1KB | 54 | 10 | 22 | 10 |
2KB | 53 | 11 | 21 | 11 |
4KB | 52 | 12 | 20 | 12 |
16KB | 50 | 14 | 18 | 14 |
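The table's entries follow from splitting an n-bit virtual address over pages of P = 2^p bytes: VPO = p bits, VPN = n - p bits, and likewise PPO = p, PPN = m - p. A sketch (the problem uses n = 64, m = 32):

```c
#include <stdint.h>

/* Split a virtual address over pages of P = 2^p bytes. */
uint64_t vpn_of(uint64_t va, int p) { return va >> p; }
uint64_t vpo_of(uint64_t va, int p) { return va & ((1ULL << p) - 1); }

/* Field widths for n-bit virtual and m-bit physical addresses. */
int vpn_bits(int n, int p) { return n - p; }
int ppn_bits(int m, int p) { return m - p; }
```

For P = 4KB (p = 12) with n = 64 and m = 32: VPN = 52 bits, PPN = 20 bits, matching the third row above.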
Chapter 9.9 : Dynamic Memory Allocation
- C programs allocate a block by calling malloc
- Free a block by calling the free function
Chapter 9.9.1 : The malloc and free Functions
- stdlib.h
- Applications that want initialized dynamic memory can use calloc
- Blocks aligned to double words.
- It is the responsibility of the application not to use freed pointers
Chapter 9.9.2 : Why Dynamic Memory Allocation?
- Programs do not know the sizes of certain data structures until run time
- (int *)malloc(n * sizeof(int))
Chapter 9.9.3 : Allocator Requirements and Goals
- Aligning blocks (alignment requirement): Blocks must be able to hold any type.
- Goals: maximize Throughput & Utilization (they conflict: improving one can hurt the other)
Chapter 9.9.4 : Fragmentation
- Fragmentation decreases Utilization.
- Fragmentation: Internal & External.
Chapter 9.9.5 : Implementation Issues
- Free block organization
- Placement
- Splitting
- Coalescing
Chapter 9.9.6 : Implicit Free Lists
- Block = Header + Payload + Optional Padding
- Implicit free list = Header has size only
- O(#Blocks)
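A header word packs the size and allocated bit together: since blocks are double-word aligned, sizes are multiples of 8 and the low 3 bits are free for flags. These are the book's macros from 9.9.12 (the book uses unsigned int; uint32_t here for explicitness):

```c
#include <stdint.h>

/* Pack a block size and allocated bit into one header/footer word. */
#define PACK(size, alloc)  ((size) | (alloc))

/* Read/write a word at address p. */
#define GET(p)       (*(uint32_t *)(p))
#define PUT(p, val)  (*(uint32_t *)(p) = (val))

/* Extract the size and allocated fields from the word at p. */
#define GET_SIZE(p)   (GET(p) & ~0x7u)
#define GET_ALLOC(p)  (GET(p) & 0x1u)
```

PACK(24, 1) encodes a 24-byte allocated block; GET_SIZE recovers 24 and GET_ALLOC recovers 1.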
Chapter 9.9.7 : Placing Allocated Blocks
- Placement Policy: First Fit, Next Fit, Best Fit
- First Fit tends to leave "splinters"
Chapter 9.9.8 : Splitting Free Blocks
- Free Block -> Allocated + Free
Chapter 9.9.9 : Getting Additional Heap Memory
- sbrk() Kernel returns more memory.
Chapter 9.9.10 : Coalescing Free Blocks
- Coalescing fixes False Fragmentation
- Immediate vs. Deferred Coalescing
Chapter 9.9.11 : Coalescing with Boundary Tags
- Footer (Boundary Tag) allows for O(1) coalescing
- Footer is a copy of header
- Coalescing adds sizes
Chapter 9.9.12 : Putting It Together: Implementing a Simple Allocator
- General Allocator Design
- sbrk() increases heap size
- Basic Constants and Macros for Manipulating the Free List
- Creating the Initial Free List
- mem_sbrk() calls sbrk()
- Freeing and Coalescing Blocks
- Allocating Blocks
- FindFit()
- Place()
Chapter 9.9.13 : Explicit Free Lists
- Payload stores next and prev Free Block
- Free-list has LIFO or Address order
Chapter 9.9.14 : Segregated Free Lists
- Maintain multiple Free Lists segregated by size
- Simple Segregated Storage
- Segregated Fits
- Buddy Systems
Chapter 9.10 : Garbage Collection
- Chapter 9.10.1 : Garbage Collector Basics
- Chapter 9.10.2 : Mark&Sweep Garbage Collectors
- Chapter 9.10.3 : Conservative Mark&Sweep for C Programs
Chapter 9.11 : Common Memory-Related Bugs in C Programs
Chapter 10 : Unix I/O
- Sometimes you have no choice but to use Unix I/O
Chapter 10.1 : Unix I/O
- Unix I/O is an API
- open returns a descriptor
Chapter 10.2 : Files
- Regular & Directory
- The root directory is named / (slash)
- Absolute paths start with / (slash)
Chapter 10.3 : Opening & Closing Files
- open() converts a filename to a file descriptor.
- close() closes a file.
Chapter 10.4 : Reading and Writing Files
- write ( fd, char* , n)
- read ( fd, char* , n)
- Short Count : < n bytes returned
Chapter 10.6 : Reading File Metadata
- stat ( char* , struct stat* )
- mystat.st_mode tells if file is directory
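The st_mode check in the bullet above, via the POSIX S_ISDIR macro (function name mine):

```c
#include <sys/stat.h>

/* Return 1 if path is a directory, 0 if not, -1 if stat fails. */
int is_directory(const char *path) {
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```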
Chapter 10.7 : Reading Directory Contents
- opendir() returns DIR*
- readdir() returns dirent*
Chapter 10.8 : Sharing Files
- Descriptor Table: Maps FDs to row in File Table.
- File Table: Maps Files to row in v-node Table.
- fork() copies Descriptor Table.
Chapter 10.9 : I/O Redirection
- Redirect Operators: < (input) , > (output)
- dup2(4,1) closes 1, then makes 1 a copy of 4
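dup2's effect is observable with two pipes (demo name mine): after dup2(a[1], b[1]), writes through b[1] land in pipe A, because b[1]'s old object was closed and replaced.

```c
#include <unistd.h>

/* After dup2(a[1], b[1]), descriptor b[1] refers to pipe A's write end:
 * dup2 closes b[1]'s old object first, then makes it a copy of a[1]. */
int dup2_demo(void) {
    int a[2], b[2];
    char ch = 0;
    if (pipe(a) < 0 || pipe(b) < 0)
        return -1;
    if (dup2(a[1], b[1]) < 0)       /* b[1] now aliases a[1] */
        return -1;
    if (write(b[1], "x", 1) != 1)   /* lands in pipe A, not pipe B */
        return -1;
    if (read(a[0], &ch, 1) != 1)
        return -1;
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return ch == 'x';
}
```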
Chapter 10.10 : Standard I/O
- stdio: High-level alternative to Unix I/O
Chapter 10.11 : Putting It Together: Which I/O Functions Should I Use?
- Standard I/O: Written on top of Unix I/O
- Use the Standard I/O functions whenever possible
- Don't use the Standard I/O functions on sockets
Chapter 10.12 : Summary
- Standard I/O: Written on top of Unix I/O
- Unix I/O should be used for network applications
- Buffered I/O handles Short Counts
Chapter 12.1 : Concurrent Programming with Processes
- Practice Problem 12.2 : Descriptors close when processes die.
- write(accept(open_listenfd("8080"),0,0), "Hi", 2)
Chapter 12.3 : Concurrent Programming with Threads
- Threads (like Processes) are scheduled by the kernel, but (like Multiplexing) run in a shared context.
-
12.3.1 Thread Execution Model
- There's one main thread, and a pool of peer threads.
-
12.3.2 Posix Threads
- exit() ends process (all threads)
-
12.3.3 Creating Threads
- int pthread_create(pthread_t *tid, pthread_attr_t *attr, func *f, void *arg);
-
12.3.4 Terminating Threads
- pthread_exit() ends this thread, and if caller is main waits for all peers
- pthread_cancel(tid) ends thread tid
-
12.3.5 Reaping Terminated Threads
- pthread_join(tid,return) waits for tid
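Create/join in miniature, using the bare Pthreads calls rather than the book's capitalized wrappers (function names mine):

```c
#include <pthread.h>

/* Thread routine: squares the int pointed to by vargp, in place. */
void *square(void *vargp) {
    int *p = vargp;
    *p = *p * *p;
    return vargp;
}

/* Run `square` in a peer thread and reap it with pthread_join. */
int run_square(int x) {
    pthread_t tid;
    int val = x;
    if (pthread_create(&tid, NULL, square, &val) != 0)
        return -1;
    pthread_join(tid, NULL);    /* wait for the peer to terminate */
    return val;
}
```

Note that &val is safe to pass here only because run_square joins before returning; passing a pointer to a stack variable to a detached thread would be a race (see 12.7.4).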
-
12.3.6 Detaching Threads
- Threads are Joinable or Detached
- pthread_detach(tid) detaches tid
-
12.3.7 Initializing Threads
- Dynamically initialize global variables shared by multiple threads
-
12.3.8 A Concurrent Server Based on Threads
- Like a Concurrent Server Based on Processes
- But passing connfd to thread is tediously manual
Chapter 12.4 : Shared Variables in Threaded Programs
- Multiple threads can share the same program variables
-
12.4.1 Threads Memory Model
- Each thread has its own thread context: stack + registers
- code + data + heap are shared
-
12.4.2 Mapping Variables to Memory
- Global variables can be referenced by any thread
-
12.4.3 Shared Variables
- Shared = referenced by multiple threads
-
Variable Instance | Referenced by main thread? | Referenced by peer thread 0? | Referenced by peer thread 1? |
ptr | Yes | Yes | Yes |
cnt | No | Yes | Yes |
i.m | Yes | No | No |
msgs.m | Yes | Yes | Yes |
myid.p0 | No | Yes | No |
myid.p1 | No | No | Yes |
- ptr, cnt, and msgs are shared
Chapter 12.5 : Synchronizing Threads with Semaphores
- Pthread_create(&tid, NULL, thread, threadArgs)
- Shared data may have interleaved calls
- Progress Graph
-
12.5.1 Progress Graphs
- n-dimensional
- Points in the graph may be safe or unsafe
-
12.5.2 Semaphores
- s is a global variable
- semaphore.h
-
12.5.3 Using Semaphores for Mutual Exclusion
- mutex = binary semaphore
- P(&mutex) locks mutex
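P and V around a shared counter, using POSIX sem_wait/sem_post directly instead of the book's P/V wrappers (this is the cnt example in miniature; without the mutex, the two threads' load-increment-store sequences could interleave and lose updates):

```c
#include <pthread.h>
#include <semaphore.h>

#define NITERS 100000

sem_t mutex;                    /* binary semaphore guarding cnt */
volatile long cnt = 0;

void *incr(void *vargp) {
    (void)vargp;
    for (int i = 0; i < NITERS; i++) {
        sem_wait(&mutex);       /* P: lock */
        cnt++;
        sem_post(&mutex);       /* V: unlock */
    }
    return NULL;
}

/* Two peers increment cnt NITERS times each; with the mutex the
 * final value is exactly 2 * NITERS. */
long run_counter(void) {
    pthread_t t1, t2;
    cnt = 0;
    sem_init(&mutex, 0, 1);     /* initial value 1 = unlocked */
    pthread_create(&t1, NULL, incr, NULL);
    pthread_create(&t2, NULL, incr, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&mutex);
    return cnt;
}
```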
-
12.5.4 Using Semaphores to Schedule Shared Resources
- Producer-Consumer Problem
- Readers-Writers Problem
- V(&mutex) unlocks mutex
- first readers-writers problem favors readers
-
12.5.5 Putting it Together: A Concurrent Server Based on Prethreading
- Prethreading uses the Producer-Consumer model
- sbuf: a Synchronized Buffer
- A Prethreaded Concurrent Server uses a Synchronized Buffer
- Map Reduce
- Parallel Programs are special types of Concurrent Programs
Chapter 12.6 : Using Threads for Parallelism
- Parallel Program = Concurrent Program running on Multiple Cores
- Synchronization is Expensive
- Mutex is Slow
- Separate Globals is Fast
- Thread operating on Local Variable is another option
-
Characterizing the Performance of Parallel Programs
- Absolute Speedup = Time of Sequential program / Time of Parallel program
- Efficiency = Absolute Speedup / #cores
Chapter 12.7 : Other Concurrency Issues
- Synchronization
-
12.7.1 Thread Safety
- Functions should protect shared variables
- Functions should avoid manipulating static variables
- Functions should avoid calling thread-unsafe functions
-
12.7.2 Reentrancy
- Reentrant Functions are thread-safe functions that don't reference shared variables
- Don't pass shared data to implicitly reentrant functions
- Practice Problem 12.12 : rand_r() is implicitly reentrant: reentrant as long as it is not passed a pointer to shared data
-
12.7.3 Using Existing Library Functions in Threaded Programs
- Most C & Linux functions are thread-safe
- _r = reentrant version of Linux function
-
12.7.4 Races
- Pthread_create(out &id, NULL, in &f, inout arg)
- Passing shared data to an implicitly reentrant function makes the call not reentrant
- Practice Problem 12.13 : Threads should free(vargp) because only they know when it is used
- Practice Problem 12.14 : We could use static allocation, which is simple but uses more memory
- Passing unshared data to an implicitly reentrant function keeps it reentrant
-
12.7.5 Deadlocks
- Semaphores may Deadlock
- Deadlock region has forbidden future
- Mutex lock ordering rule: every thread acquires its mutexes in the same order ("Ladies First")
-
V(t) | _ | _ | _ | _ | _ | _ | _ | _ |
. | _ | _ | _ | _ | _ | _ | _ | _ |
P(t) | X | X | X | X | X | X | _ | _ |
. | _ | _ | _ | _ | _ | X | _ | _ |
V(s) | _ | X | X | X | _ | X | _ | _ |
. | _ | X | X | X | _ | X | _ | _ |
P(s) | _ | X | X | X | _ | X | _ | _ |
. | _ | _ | _ | _ | _ | X | _ | _ |
. | . | P(s) | _ | V(s) | _ | P(t) | _ | V(t) |
- Always Deadlocks
- t = 1
-
. | _ | _ | _ | _ | _ | _ | _ | _ | _ |
V(t) | _ | _ | _ | _ | _ | X | X | X | _ |
. | _ | _ | _ | _ | _ | X | X | X | _ |
P(t) | _ | _ | _ | _ | _ | X | X | X | _ |
. | _ | _ | _ | _ | _ | _ | _ | _ | _ |
V(s) | _ | X | X | X | _ | _ | _ | _ | _ |
. | _ | X | X | X | _ | _ | _ | _ | _ |
P(s) | _ | X | X | X | _ | _ | _ | _ | _ |
. | _ | _ | _ | _ | _ | _ | _ | _ | _ |
. | . | P(s) | _ | V(s) | _ | P(t) | _ | V(t) | _ |
Chapter 12.8 : Summary
- Concurrent Programs: Processes, Multiplexing, & Threads