CSE 320 : System Fundamentals 2
Chapter 1 : A Tour of Computer Systems
Chapter 2 : Representing and Manipulating Information
Chapter 3 : Machine-Level Representation of Programs
Chapter 5 : Optimizing Program Performance
Chapter 6 : The Memory Hierarchy
Chapter 6.3 : The Memory Hierarchy
- Exhibit Good Locality
- L0-L6
-
Chapter 6.3.1 : Caching in the Memory Hierarchy
- L0 is a cache for L1
- Block sizes are fixed
- Cache Hit = Not Miss = Data is in that Cache
- Restrictive Placement Policies cause Conflict Miss
- Capacity miss = working set exceeds cache size
- Cache management may be software or hardware depending on Lk
-
Chapter 6.3.2 : Summary of Memory Hierarchy Concepts
- Page = Block between MainMemory & Disk
Chapter 6.4 : Cache Memories
- Originally, only CPU + Mem + Disk
- Now L1-L3
-
Chapter 6.4.1 : Generic Cache Memory Organization
- (Capacity) C = (#Sets) S * (Lines/Set) E * (Bytes/Line) B
- Line = Valid | Tag | Block
- Address = tbits | sbits | bbits
-
Cache | m | C | B | E | S | t | s | b |
1. | 32 | 1,024 | 4 | 1 | 2^8 | 22 | 8 | 2 |
2. | 32 | 1,024 | 8 | 4 | 2^5 | 24 | 5 | 3 |
3. | 32 | 1,024 | 32 | 32 | 2^0 | 27 | 0 | 5 |
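The field widths in the table can be checked mechanically: s = log2 S, b = log2 B, t = m - s - b, and each address splits as tag|set|offset. A minimal sketch (function names are mine, not the book's):

```c
#include <stdint.h>

/* Number of bits needed to index x entries (x a power of two). */
int log2u(uint32_t x) { int n = 0; while (x >>= 1) n++; return n; }

/* Field widths (t, s, b) for m-bit addresses, S sets, B-byte blocks. */
void cache_fields(int m, uint32_t S, uint32_t B, int *t, int *s, int *b) {
    *s = log2u(S);
    *b = log2u(B);
    *t = m - *s - *b;
}

/* Extract tag, set index, and block offset from an address. */
uint32_t addr_tag(uint32_t a, int s, int b) { return a >> (s + b); }
uint32_t addr_set(uint32_t a, int s, int b) { return (a >> b) & ((1u << s) - 1); }
uint32_t addr_off(uint32_t a, int b)        { return a & ((1u << b) - 1); }
```

Row 1 of the table (S = 2^8, B = 4) gives (t, s, b) = (22, 8, 2); the same split applied with (t, s, b) = (8, 3, 2) reproduces the Practice Problem 6.13 decomposition below.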
-
Chapter 6.4.2 : Direct-Mapped Caches
- Get word w = (1) Set Selection (2) Line Matching & (3) Word Extraction
- Set Selection in Direct-Mapped Caches : Use sbits to get Set
- Line Matching in Direct-Mapped Caches : Check Valid, Check tbits=Tag
- Word Selection in Direct-Mapped Caches : Use bbits to get block byte offset
- Block Number = tbits | sbits
- Cold Cache has Valid = 0
- Valid | Tag | Block[0] | Block[1]
- Conflict Misses in Direct-Mapped Caches
- Middle indexing works better
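The three steps above can be sketched as a tiny direct-mapped simulator (parameters and names are mine; only valid bits and tags are modeled, no data):

```c
#include <stdint.h>
#include <string.h>

#define DM_SETS  4   /* S = 4 sets, one line per set (E = 1), so s = 2 */
#define DM_BBITS 2   /* B = 4-byte blocks */

typedef struct {
    int valid[DM_SETS];
    uint32_t tag[DM_SETS];
} dm_cache;

void dm_init(dm_cache *c) { memset(c, 0, sizeof *c); } /* cold cache: valid = 0 */

/* Returns 1 on hit, 0 on miss; a miss installs the block's tag. */
int dm_access(dm_cache *c, uint32_t addr) {
    uint32_t set = (addr >> DM_BBITS) & (DM_SETS - 1); /* (1) set selection */
    uint32_t tag = addr >> (DM_BBITS + 2);             /* s = 2 set bits */
    if (c->valid[set] && c->tag[set] == tag)           /* (2) line matching */
        return 1;           /* (3) word extraction would use the b bits */
    c->valid[set] = 1;      /* miss: replace the line in this set */
    c->tag[set] = tag;
    return 0;
}
```

Two addresses with the same set index but different tags evict each other, which is exactly the conflict-miss thrashing the bullet above refers to.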
-
Chapter 6.4.3 : Set Associative Caches
- E-way set associative cache: 1 < E < C/B
- Associative in the sense of associative array (i.e. key space is sparse)
- LRU = Least Recently Used: a line replacement policy
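One way to realize LRU in an E-way set is a use-ordered list of line indices per set, most recent first; this sketch (names mine) is one of several possible implementations:

```c
/* Update an E-entry LRU order after an access to line `way`.
 * order[0] is most recently used; order[E-1] is the next victim. */
void lru_touch(int *order, int E, int way) {
    int i = 0;
    while (i < E - 1 && order[i] != way)
        i++;                        /* position of `way` (or last slot) */
    for (; i > 0; i--)
        order[i] = order[i - 1];    /* shift more-recent entries down */
    order[0] = way;                 /* `way` is now most recent */
}

/* The line to replace on a miss is simply the last entry. */
int lru_victim(const int *order, int E) { return order[E - 1]; }
```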
-
Chapter 6.4.4 : Fully Associative Caches
- S = 1 (i.e. E = C/B)
- Set 0 is always selected
- Practice Problem 6.12 : t=8|s=3|b=2
- Practice Problem 6.13 : 0D53 = 01101010|100|11 , CO=3 , CI=4 , CT=6A , Miss , -
- Practice Problem 6.14 : 0CB4 = 01100101|101|00 , CO=0 , CI=5 , CT=65 , Miss , -
- Practice Problem 6.15 : 0A31 = 01010001|100|01 , CO=1 , CI=4 , CT=51 , Miss , -
- Practice Problem 6.16 : 064C = 00110010|011|00 , 064D = 00110010|011|01 , 064E = 00110010|011|10 , 064F = 00110010|011|11
-
Chapter 6.4.5 : Issues with Writes
- Write-Through vs Write-back
-
Chapter 6.4.6 : Anatomy of a Real Cache Hierarchy
- Intel Core i7 has separate i-cache & d-cache (instruction & data)
-
Chapter 6.4.7 : Performance Impact of Cache Parameters
- Higher Associativity (Big E) reduces conflict misses
Chapter 6.5 : Writing Cache-Friendly Code
- Miss Rate
- Block ⊆ Line ⊆ Set
- Spatial & Temporal Locality
- Exploit Locality
- L1 Cache
- Hit Rate
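The classic illustration of these bullets: C arrays are row-major, so a[i][j] and a[i][j+1] are adjacent in memory, and a stride-1 inner loop exhausts each cache block before moving on:

```c
#define N 64

/* Good spatial locality: inner loop walks memory with stride 1. */
long sum_rowwise(int a[N][N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Poor spatial locality: inner loop strides N ints per access. */
long sum_colwise(int a[N][N]) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}
```

Both return the same sum; only the miss rate (and therefore the running time on large arrays) differs.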
Chapter 6.6 : Putting It Together: The Impact of Caches on Program Performance
- Chapter 6.6.1 : The Memory Mountain
- Chapter 6.6.2 : Rearranging Loops to Increase Spatial Locality
- Chapter 6.6.3 : Exploiting Locality in Your Programs
Chapter 6 : Problems
-
6.25
Cache | m | C | B | E | S | t | s | b |
1. | 32 | 1,024 | 4 | 4 | 2^6 | 24 | 6 | 2 |
2. | 32 | 1,024 | 4 | 256 | 2^0 | 30 | 0 | 2 |
3. | 32 | 1,024 | 8 | 1 | 2^7 | 22 | 7 | 3 |
4. | 32 | 1,024 | 8 | 128 | 2^0 | 29 | 0 | 3 |
5. | 32 | 1,024 | 32 | 1 | 2^5 | 22 | 5 | 5 |
6. | 32 | 1,024 | 32 | 4 | 2^3 | 24 | 3 | 5 |
-
6.26
Cache | m | C | B | E | S | t | s | b |
1. | 32 | 2^11 | 8 | 1 | 2^8 | 21 | 8 | 3 |
2. | 32 | 2,048 | 4 | 4 | 128 | 23 | 7 | 2 |
3. | 32 | 1,024 | 2 | 8 | 64 | 25 | 6 | 1 |
4. | 32 | 1,024 | 32 | 2 | 16 | 23 | 4 | 5 |
-
6.27
a) Hit Set 1 : 45|001|XX = 01000101|001|XX = 08A[4-7] , 38|001|XX = 00111000|001|XX = 070[4-7]
b) Hit Set 6 : 91|110|XX = 10010001|110|XX = 123[8-B]
-
6.28
a) Hit Set 2 : None (set 2 has no valid lines)
b) Hit Set 4 : C7|100|XX = 11000111|100|XX = 18F[0-3] , 05|100|XX = 00000101|100|XX = 00B[0-3]
c) Hit Set 5 : 71|101|XX = 01110001|101|XX = 0E3[4-7]
d) Hit Set 7 : DE|111|XX = 11011110|111|XX = 1BD[C-F]
-
6.29
a) t=8|s=2|b=2
b) 834 = 1000 0011 |01|00 = Miss , 836 = 1000 0011 |01|10 = Miss , FFD = 1111 1111 |11|01 = Hit(C0)
-
6.30
a) C = S * E * B = 8 * 4 * 4 = 2^7
b) t=8|s=3|b=2
-
6.31
a) t=8|s=3|b=2
b) 71A = 0 0111 000|1 10|10 = Hit(EB)
-
6.32
a) t=8|s=3|b=2
b) 16E8 = 1 0110 111|0 10|00 = Miss(-)
-
6.33
BC|010|XX = 10111100|010|XX = 178[8-B] , B6|010|XX = 10110110|010|XX = 16C[8-B]
Chapter 8 : Exceptional Control Flow
- ECF = Exceptional Control Flow
- ECF will help you understand concurrency
Chapter 8.1 : Exceptions
- 8.1.1 Exception Handling
- Involves close cooperation between hardware and software
- Jump Table = List of Addresses
- Hardware triggers an exception, then software (i.e. the Exception Handler) runs.
- 8.1.2 Classes of Exceptions
- Interrupts : Interrupts are handled by Interrupt Handlers , Seamless = Return to Next Instruction
- Traps and System Calls : System Calls are implemented by Traps , Exception Handlers run in Kernel Mode
- Faults : Either Abort or Return to Current Instruction
- Aborts : Abort handler Ends Program w/out Returning
- 8.1.3 Exceptions in Linux/x86-64 Systems
- Linux/x86-64 Faults and Aborts
- Linux/x86-64 System Calls
- OS
- System Call 1 = write
- "Exceptions" include exceptions (synchronous) & interrupts (asynchronous)
Chapter 8.2 : Processes
- Process = an instance of a program in execution
-
8.2.1 : Logical Control Flow
- Logical Control Flow = What stepping debugger sees
- Physical Control Flow = What the processor sees
-
8.2.2 : Concurrent Flows
- Two flows are concurrent when one starts between the start and end of the other.
- Parallel flows = Concurrent Flows on separate cores
-
8.2.3 : Private Address Space
- Each Private Address Space comes with its own User Portion and Kernel Portion
- Each Portion has Code, Data, Heap, and Stack Segments
-
8.2.4 : User and Kernel Modes
- Restricts Instructions
- Processors usually have a Mode Bit
- /proc gives info kernel has about processes
-
8.2.5 : Context Switches
- Context Switches are built on top of 8.1 Exceptions
- Kernel Code performs Context Switches
Chapter 8.3 : System Call Error Handling
- On error, system-level functions return -1 and set errno
- Error-Handling Wrappers print the error and exit(0)
Chapter 8.4 : Process Control
- Unix System Calls can Control Processes
-
8.4.1 Obtaining Process IDs
-
8.4.2 Creating and Terminating Processes
- Process States: Running, Stopped, or Terminated
- fork() duplicates this process
- Duplicate but separate address spaces
- Any topological sort is possible
- Practice Problem 8.2 : a) nothing OR "p1: a=9" OR "p1: a=9 p2: a=8" b) "p2: a=9"
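The fork() bullets in a few lines: the child gets a duplicate but separate address space, and the parent observes the child only through its exit status. A minimal sketch (POSIX, function name mine):

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the status the parent
 * reaps, or -1 if fork/waitpid fails or the child exited abnormally. */
int fork_and_reap(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        exit(code);     /* child: its variables are copies, not shared */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```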
-
8.4.3 Reaping Child Processes
- Zombie = Terminated & Unreaped
- waitpid waits for a child to terminate
- waitpid's statusp arg is child's exit status
- Practice Problem 8.3 : 936036 930636 903636 093636
- wait(&status) = waitpid(-1,&status,0)
- waitpid's arg1 is childpid or -1 for any
- Practice Problem 8.4 : a) 6 b) Start 0 PID Child Stop Stop
- WIFEXITED(status) macro masks status
-
8.4.4 Putting Processes to Sleep
- sleep(9) sleeps for 9 seconds
- pause = sleep(∞) , returns when a signal is caught
-
8.4.5 Loading and Running Programs
- execve runs a program
- getenv(key) returns value
- setenv(key,value)
-
8.4.6 Using fork() and execve() to Run Programs
- execve() overwrites this process
- Shell has eval(), parseline(), & builtin_command()
- The eval() function calls fork() & execve()
- parseline() parses the commandline and builds argv
Chapter 8.5 : Signals
- High-Level Exceptional Control Flow : Linux Signal
- Signal = message to process
- SIGINT = interrupt from keyboard
- SIGKILL = kill program
-
8.5.1 Signal Terminology
- Kernel sends signal, process receives
-
8.5.2 Sending Signals
- Process Group
- Sending Signal from the Keyboard
- Sending Signals with the kill Function
-
8.5.3 Receiving Signals
- signal(i,f) makes f catch signal i
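The signal(i,f) bullet can be exercised deterministically with raise(); per the safe-handling rules in 8.5.5, the handler only writes a volatile sig_atomic_t flag. (Plain signal() is shown to match the bullet; the book's Signal wrapper uses sigaction for portability.)

```c
#include <signal.h>

/* Async-signal-safe: the handler only sets a flag. */
volatile sig_atomic_t got_usr1 = 0;

void usr1_handler(int sig) {
    (void)sig;
    got_usr1 = 1;
}

/* Install the handler, send ourselves SIGUSR1, report whether it ran.
 * raise() delivers the signal before returning in a single-threaded
 * process, so the result is deterministic. */
int catch_usr1(void) {
    got_usr1 = 0;
    if (signal(SIGUSR1, usr1_handler) == SIG_ERR)
        return -1;
    raise(SIGUSR1);
    return got_usr1;
}
```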
-
8.5.4 Blocking and Unblocking Signals
- sigprocmask changes the set of currently blocked signals
-
8.5.5 Writing Signal Handlers
-
Safe Signal Handling
- Async-signal-safe
- Save and restore errno
- Protect access to shared data by blocking all signals
-
Correct Error Handling
- Signals are not queued
- Signals can't be used to count events
- while(waitpid(-1,NULL,0)>0) { }
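Because signals are not queued, one SIGCHLD may stand for several dead children, which is why the loop above reaps until waitpid fails. The reaping logic can be tested directly (function name mine; in a real handler, save and restore errno around the loop):

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every terminated child; returns the number reaped.
 * waitpid returns -1 (ECHILD) once no children remain. */
int reap_all(void) {
    int n = 0;
    while (waitpid(-1, NULL, 0) > 0)
        n++;
    return n;
}
```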
-
Portable Signal Handling
-
8.5.6 Synchronizing Flows to Avoid Nasty Concurrency Bugs
- Race conditions may delete jobs before adding
-
8.5.7 Explicitly Waiting for Signals
- Use sigprocmask to synchronize signals via blocking
- Spin loops are correct but wasteful; sigsuspend avoids the spin
Chapter 8.6 : Nonlocal Jumps
- C has nonlocal jumps
- Sigemptyset(p) initializes set p to empty
- setjmp(buf) saves the current calling environment
- longjmp(buf,ret) jumps back to setjmp and returns ret
- sigsetjmp is a version of setjmp used by signal handlers
- setjmp/longjmp ~ try/throw
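The try/throw analogy in a few lines (names mine): setjmp returns 0 on the initial call and the longjmp code when control returns nonlocally.

```c
#include <setjmp.h>

static jmp_buf env;

/* "Throw": jump back to the setjmp site with a nonzero code. */
void deep_error(int code) {
    longjmp(env, code);     /* never returns */
}

/* "Try": run work that may fail several calls down and recover here. */
int run_with_recovery(void) {
    int code = setjmp(env);
    if (code != 0)
        return code;        /* re-entered via longjmp */
    deep_error(7);          /* simulate a failure deep in the call tree */
    return 0;               /* unreachable */
}
```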
Chapter 8.7 : Tools for Manipulating Processes
Chapter 8.8 : Summary
-
Intel | Abort | Fault | Interrupt | Trap |
OS | - | - | - | System Call |
Chapter 9.2 : Address Spaces
- N=2^n Virtual Addresses
- M=2^m Physical Addresses
Chapter 9.3 : VM as a tool for caching
- Virtual Pages are P=2^p bytes in size
-
Practice Problem 9.2
n | P=2^p | Number of PTEs |
12 | 1K | 2^12/2^10=2^2 |
16 | 16K | 2^16/2^14=2^2 |
24 | 2M | 2^24/2^21=2^3 |
36 | 1G | 2^36/2^30=2^6 |
Chapter 9.6 : Address Translation
- Hardware supports Virtual Memory
- Virtual Address Space is mapped to Physical Address Space
-
Practice Problem 9.3
P | Number of VPN bits | Number of VPO bits | Number of PPN bits | Number of PPO bits |
1KB | 54 | 10 | 22 | 10 |
2KB | 53 | 11 | 21 | 11 |
4KB | 52 | 12 | 20 | 12 |
16KB | 50 | 14 | 18 | 14 |
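The table's entries follow from splitting an n-bit virtual address over pages of P = 2^p bytes: VPO = p bits, VPN = n - p bits, and likewise PPO = p, PPN = m - p. A sketch (the problem uses n = 64, m = 32):

```c
#include <stdint.h>

/* Split a virtual address over pages of P = 2^p bytes. */
uint64_t vpn_of(uint64_t va, int p) { return va >> p; }
uint64_t vpo_of(uint64_t va, int p) { return va & ((1ULL << p) - 1); }

/* Field widths for n-bit virtual and m-bit physical addresses. */
int vpn_bits(int n, int p) { return n - p; }
int ppn_bits(int m, int p) { return m - p; }
```

For P = 4KB (p = 12) with n = 64 and m = 32: VPN = 52 bits, PPN = 20 bits, matching the third row above.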
Chapter 9.9 : Dynamic Memory Allocation
- C programs allocate a block by calling malloc
- Free a block by calling the free function
Chapter 9.9.1 : The malloc and free Functions
- stdlib.h
- Applications that want initialized dynamic memory can use calloc
- Blocks aligned to double words.
- It is the responsibility of the application not to use freed pointers
Chapter 9.9.2 : Why Dynamic Memory Allocation?
- Programs do not know the sizes of certain data structures until run time
- (int *)malloc(n * sizeof(int))
Chapter 9.9.3 : Allocator Requirements and Goals
- Aligning blocks (alignment requirement): Blocks must be able to hold any type.
- Goals: maximize Throughput & Utilization (they conflict: improving one can hurt the other)
Chapter 9.9.4 : Fragmentation
- Fragmentation decreases Utilization.
- Fragmentation: Internal & External.
Chapter 9.9.5 : Implementation Issues
- Free block organization
- Placement
- Splitting
- Coalescing
Chapter 9.9.6 : Implicit Free Lists
- Block = Header + Payload + Optional Padding
- Implicit free list = Header has size only
- O(#Blocks)
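A header word packs the size and allocated bit together: since blocks are double-word aligned, sizes are multiples of 8 and the low 3 bits are free for flags. These are the book's macros from 9.9.12 (the book uses unsigned int; uint32_t here for explicitness):

```c
#include <stdint.h>

/* Pack a block size and allocated bit into one header/footer word. */
#define PACK(size, alloc)  ((size) | (alloc))

/* Read/write a word at address p. */
#define GET(p)       (*(uint32_t *)(p))
#define PUT(p, val)  (*(uint32_t *)(p) = (val))

/* Extract the size and allocated fields from the word at p. */
#define GET_SIZE(p)   (GET(p) & ~0x7u)
#define GET_ALLOC(p)  (GET(p) & 0x1u)
```

PACK(24, 1) encodes a 24-byte allocated block; GET_SIZE recovers 24 and GET_ALLOC recovers 1.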
Chapter 9.9.7 : Placing Allocated Blocks
- Placement Policy: First Fit, Next Fit, Best Fit
- First Fit tends to leave "splinters"
Chapter 9.9.8 : Splitting Free Blocks
- Free Block -> Allocated + Free
Chapter 9.9.9 : Getting Additional Heap Memory
- sbrk() Kernel returns more memory.
Chapter 9.9.10 : Coalescing Free Blocks
- Coalescing fixes False Fragmentation
- Immediate vs. Deferred Coalescing
Chapter 9.9.11 : Coalescing with Boundary Tags
- Footer (Boundary Tag) allows for O(1) coalescing
- Footer is a copy of header
- Coalescing adds sizes
Chapter 9.9.12 : Putting It Together: Implementing a Simple Allocator
- General Allocator Design
- sbrk() increases heap size
- Basic Constants and Macros for Manipulating the Free List
- Creating the Initial Free List
- mem_sbrk() calls sbrk()
- Freeing and Coalescing Blocks
- Allocating Blocks
- FindFit()
- Place()
Chapter 9.9.13 : Explicit Free Lists
- Payload stores next and prev Free Block
- Free-list has LIFO or Address order
Chapter 9.9.14 : Segregated Free Lists
- Maintain multiple Free Lists segregated by size
- Simple Segregated Storage
- Segregated Fits
- Buddy Systems
Chapter 9.10 : Garbage Collection
- Chapter 9.10.1 : Garbage Collector Basics
- Chapter 9.10.2 : Mark&Sweep Garbage Collectors
- Chapter 9.10.3 : Conservative Mark&Sweep for C Programs
Chapter 9.11 : Common Memory-Related Bugs in C Programs
Chapter 10 : Unix I/O
- Sometimes you have no choice but to use Unix I/O
Chapter 10.1 : Unix I/O
- Unix I/O is an API
- open returns a descriptor
Chapter 10.2 : Files
- Regular & Directory
- The root directory is named / (slash)
- Absolute paths start with / (slash)
Chapter 10.3 : Opening & Closing Files
- open() converts a filename to a file descriptor.
- close() closes a file.
Chapter 10.4 : Reading and Writing Files
- write ( fd, char* , n)
- read ( fd, char* , n)
- Short Count : < n bytes returned
Chapter 10.6 : Reading File Metadata
- stat ( char* , struct stat* )
- mystat.st_mode tells if file is directory
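The st_mode check in the bullet above, via the POSIX S_ISDIR macro (function name mine):

```c
#include <sys/stat.h>

/* Return 1 if path is a directory, 0 if not, -1 if stat fails. */
int is_directory(const char *path) {
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```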
Chapter 10.7 : Reading Directory Contents
- opendir() returns DIR*
- readdir() returns dirent*
Chapter 10.8 : Sharing Files
- Descriptor Table: Maps FDs to row in File Table.
- File Table: Maps Files to row in v-node Table.
- fork() copies Descriptor Table.
Chapter 10.9 : I/O Redirection
- Redirect Operators: < (input) , > (output)
- dup2(4,1) closes 1, then makes 1 a copy of 4
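dup2's effect is observable with two pipes (demo name mine): after dup2(a[1], b[1]), writes through b[1] land in pipe A, because b[1]'s old object was closed and replaced.

```c
#include <unistd.h>

/* After dup2(a[1], b[1]), descriptor b[1] refers to pipe A's write end:
 * dup2 closes b[1]'s old object first, then makes it a copy of a[1]. */
int dup2_demo(void) {
    int a[2], b[2];
    char ch = 0;
    if (pipe(a) < 0 || pipe(b) < 0)
        return -1;
    if (dup2(a[1], b[1]) < 0)       /* b[1] now aliases a[1] */
        return -1;
    if (write(b[1], "x", 1) != 1)   /* lands in pipe A, not pipe B */
        return -1;
    if (read(a[0], &ch, 1) != 1)
        return -1;
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return ch == 'x';
}
```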
Chapter 10.10 : Standard I/O
- stdio: High-level alternative to Unix I/O
Chapter 10.11 : Putting It Together: Which I/O Functions Should I Use?
- Standard I/O: Written on top of Unix I/O
- Use the Standard I/O functions whenever possible
- Don't use the Standard I/O functions on sockets
Chapter 10.12 : Summary
- Standard I/O: Written on top of Unix I/O
- Unix I/O should be used for network applications
- Buffered I/O handles Short Counts
Chapter 12.1 : Concurrent Programming with Processes
- Practice Problem 12.2 : Descriptors close when processes die.
- write(accept(open_listenfd("8080"),0,0), "Hi", 2)
Chapter 12.3 : Concurrent Programming with Threads
- Threads (like Processes) are scheduled by the kernel, but (like Multiplexing) run in a shared context.
-
12.3.1 Thread Execution Model
- There's one main thread, and a pool of peer threads.
-
12.3.2 Posix Threads
- exit() ends process (all threads)
-
12.3.3 Creating Threads
- int pthread_create(pthread_t *tid, pthread_attr_t *attr, func *f, void *arg);
-
12.3.4 Terminating Threads
- pthread_exit() ends this thread, and if caller is main waits for all peers
- pthread_cancel(tid) ends thread tid
-
12.3.5 Reaping Terminated Threads
- pthread_join(tid,return) waits for tid
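Create/join in miniature, using the bare Pthreads calls rather than the book's capitalized wrappers (function names mine):

```c
#include <pthread.h>

/* Thread routine: squares the int pointed to by vargp, in place. */
void *square(void *vargp) {
    int *p = vargp;
    *p = *p * *p;
    return vargp;
}

/* Run `square` in a peer thread and reap it with pthread_join. */
int run_square(int x) {
    pthread_t tid;
    int val = x;
    if (pthread_create(&tid, NULL, square, &val) != 0)
        return -1;
    pthread_join(tid, NULL);    /* wait for the peer to terminate */
    return val;
}
```

Note that &val is safe to pass here only because run_square joins before returning; passing a pointer to a stack variable to a detached thread would be a race (see 12.7.4).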
-
12.3.6 Detaching Threads
- Threads are Joinable or Detached
- pthread_detach(tid) detaches tid
-
12.3.7 Initializing Threads
- Dynamically initialize global variables shared by multiple threads
-
12.3.8 A Concurrent Server Based on Threads
- Like a Concurrent Server Based on Processes
- But passing connfd to thread is tediously manual
Chapter 12.4 : Shared Variables in Threaded Programs
- Multiple threads can share the same program variables
-
12.4.1 Threads Memory Model
- Each thread has its own thread context: stack + registers
- code + data + heap are shared
-
12.4.2 Mapping Variables to Memory
- Global variables can be referenced by any thread
-
12.4.3 Shared Variables
- Shared = referenced by multiple threads
-
Variable Instance | Referenced by main thread? | Referenced by peer thread 0? | Referenced by peer thread 1? |
ptr | Yes | Yes | Yes |
cnt | No | Yes | Yes |
i.m | Yes | No | No |
msgs.m | Yes | Yes | Yes |
myid.p0 | No | Yes | No |
myid.p1 | No | No | Yes |
- ptr, cnt, and msgs are shared
Chapter 12.5 : Synchronizing Threads with Semaphores
- Pthread_create(&tid, NULL, thread, threadArgs)
- Shared data may have interleaved calls
- Progress Graph
-
12.5.1 Progress Graphs
- n-dimensional
- Points in the graph may be safe or unsafe
-
12.5.2 Semaphores
- s is a global variable
- semaphore.h
-
12.5.3 Using Semaphores for Mutual Exclusion
- mutex = binary semaphore
- P(&mutex) locks mutex
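P and V around a shared counter, using POSIX sem_wait/sem_post directly instead of the book's P/V wrappers (this is the cnt example in miniature; without the mutex, the two threads' load-increment-store sequences could interleave and lose updates):

```c
#include <pthread.h>
#include <semaphore.h>

#define NITERS 100000

sem_t mutex;                    /* binary semaphore guarding cnt */
volatile long cnt = 0;

void *incr(void *vargp) {
    (void)vargp;
    for (int i = 0; i < NITERS; i++) {
        sem_wait(&mutex);       /* P: lock */
        cnt++;
        sem_post(&mutex);       /* V: unlock */
    }
    return NULL;
}

/* Two peers increment cnt NITERS times each; with the mutex the
 * final value is exactly 2 * NITERS. */
long run_counter(void) {
    pthread_t t1, t2;
    cnt = 0;
    sem_init(&mutex, 0, 1);     /* initial value 1 = unlocked */
    pthread_create(&t1, NULL, incr, NULL);
    pthread_create(&t2, NULL, incr, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&mutex);
    return cnt;
}
```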
-
12.5.4 Using Semaphores to Schedule Shared Resources
- Producer-Consumer Problem
- Readers-Writers Problem
- V(&mutex) unlocks mutex
- first readers-writers problem favors readers
-
12.5.5 Putting it Together: A Concurrent Server Based on Prethreading
- Prethreading uses the Producer-Consumer model
- sbuf: a Synchronized Buffer
- A Prethreaded Concurrent Server uses a Synchronized Buffer
- Map Reduce
- Parallel Programs are special types of Concurrent Programs
Chapter 12.6 : Using Threads for Parallelism
- Parallel Program = Concurrent Program running on Multiple Cores
- Synchronization is Expensive
- Mutex is Slow
- Separate Globals is Fast
- Thread operating on Local Variable is another option
-
Characterizing the Performance of Parallel Programs
- Absolute Speedup = Time of Sequential program / Time of Parallel program
- Efficiency = Absolute Speedup / #cores
Chapter 12.7 : Other Concurrency Issues
- Synchronization
-
12.7.1 Thread Safety
- Functions should protect shared variables
- Functions should avoid manipulating static variables
- Functions should avoid calling thread-unsafe functions
-
12.7.2 Reentrancy
- Reentrant Functions are thread-safe functions that don't reference shared variables
- Don't pass shared data to implicitly reentrant functions
- Practice Problem 12.12 : rand_r() is implicitly reentrant: reentrant as long as it is not passed a pointer to shared data
-
12.7.3 Using Existing Library Functions in Threaded Programs
- Most C & Linux functions are thread-safe
- _r = reentrant version of Linux function
-
12.7.4 Races
- Pthread_create(out &id, NULL, in &f, inout arg)
- Passing shared data to an implicitly reentrant function makes the call not reentrant
- Practice Problem 12.13 : Threads should free(vargp) because only they know when it is used
- Practice Problem 12.14 : We could use static allocation, which is simple but uses more memory
- Passing unshared data to an implicitly reentrant function keeps it reentrant
-
12.7.5 Deadlocks
- Semaphores may Deadlock
- Deadlock region has forbidden future
- Mutex lock ordering rule: every thread acquires its mutexes in the same order ("Ladies First")
-
V(t) | _ | _ | _ | _ | _ | _ | _ | _ |
. | _ | _ | _ | _ | _ | _ | _ | _ |
P(t) | X | X | X | X | X | X | _ | _ |
. | _ | _ | _ | _ | _ | X | _ | _ |
V(s) | _ | X | X | X | _ | X | _ | _ |
. | _ | X | X | X | _ | X | _ | _ |
P(s) | _ | X | X | X | _ | X | _ | _ |
. | _ | _ | _ | _ | _ | X | _ | _ |
. | . | P(s) | _ | V(s) | _ | P(t) | _ | V(t) |
- Always Deadlocks
- t = 1
-
. | _ | _ | _ | _ | _ | _ | _ | _ | _ |
V(t) | _ | _ | _ | _ | _ | X | X | X | _ |
. | _ | _ | _ | _ | _ | X | X | X | _ |
P(t) | _ | _ | _ | _ | _ | X | X | X | _ |
. | _ | _ | _ | _ | _ | _ | _ | _ | _ |
V(s) | _ | X | X | X | _ | _ | _ | _ | _ |
. | _ | X | X | X | _ | _ | _ | _ | _ |
P(s) | _ | X | X | X | _ | _ | _ | _ | _ |
. | _ | _ | _ | _ | _ | _ | _ | _ | _ |
. | . | P(s) | _ | V(s) | _ | P(t) | _ | V(t) | _ |
Chapter 12.8 : Summary
- Concurrent Programs: Processes, Multiplexing, & Threads