W4118 Operating Systems
Instructor: Junfeng Yang

Outline
  • x86 segmentation and paging hardware
  • Linux address space translation
  • Copy-on-write
  • Linux page replacement algorithm
  • Linux dynamic memory allocation

x86 segmentation and paging
  • Using Pentium as example
  • CPU generates virtual address (seg, offset)
      • Given to segmentation unit, which produces linear addresses
  • Linear address given to paging unit
      • Which generates physical address in main memory
      • Paging units form the equivalent of the MMU

[Figure: x86 segmentation hardware]

Specifying segment selector
  • Virtual address: segment selector + offset
  • Segment selector stored in segment registers (16-bit)
      • cs: code segment selector
      • ss: stack segment selector
      • ds: data segment selector
      • es, fs, gs
  • Segment register can be implicitly or explicitly specified
      • Implicit by type of memory reference
          • jmp 0x8049780 // implicitly use cs
          • mov 0x8049780, %eax // implicitly use ds
      • Explicit through special registers (cs, ss, es, ds, fs, gs on x86)
          • mov %ss:0x8049780, %eax // explicitly use ss

[Figure: x86 paging hardware]

Linux address translation
  • Linux uses paging to translate virtual addresses to physical addresses
  • Linux does not use segmentation
  • Advantages
      • More portable, since some RISC architectures don't support segmentation
      • Hierarchical paging is flexible enough

Linux segmentation
  • Since x86 segmentation hardware cannot be disabled, Linux just uses NULL mappings
  • Linux defines four segments
      • Set segment base to 0x00000000, limit to 0xffffffff
      • Segment offset == linear address
      • User code (segment selector: __USER_CS)
      • User data (segment selector: __USER_DS)
      • Kernel code (segment selector: __KERNEL_CS)
      • Kernel data (segment selector: __KERNEL_DS)
      • arch/i386/kernel/head.S

Segment protection
  • Current Privilege Level (CPL) specifies privileged mode or user mode
      • Stored in the low two bits of the cs register (the current code segment selector)
      • User code segment: CPL = 3
      • Kernel code segment: CPL = 0
  • Descriptor Privilege Level (DPL) specifies protection
      • Only accessible if CPL <= DPL
  • Switch between user mode and kernel mode (e.g. system call and return)
      • Hardware loads the corresponding segment selector (__USER_CS or __KERNEL_CS) into register cs

Paging
  • Linux uses up to 4-level hierarchical paging
  • A linear address is split into five parts, to seamlessly handle a range of different addressing modes
      • Page Global Dir
      • Page Upper Dir
      • Page Middle Dir
      • Page Table
      • Page Offset
  • Example: 32-bit address space, 4 KB pages, without physical address extension (PAE, a hardware mechanism to extend the address range of physical memory)
      • Page Global Dir: 10 bits
      • Page Upper Dir and Page Middle Dir are not used
      • Page Table: 10 bits
      • Page Offset: 12 bits

Paging in 64-bit Linux

  Platform   Page Size   Address Bits Used   Paging Levels   Address Splitting
  Alpha      8 KB        43                  3               10+10+10+13
  IA64       4 KB        39                  3               9+9+9+12
  PPC64      4 KB        41                  3               10+10+9+12
  sh64       4 KB        41                  3               10+10+9+12
  X86_64     4 KB        48                  4               9+9+9+9+12

Page table operations
  • Linux provides data structures and operations to create, delete, read and write page directories
      • include/asm-i386/pgtable.h
      • arch/i386/mm/hugetlbpage.c
  • Naming convention
      • pgd: Page Global Directory
      • pmd: Page Middle Directory
      • pud: Page Upper Directory
      • pte: Page Table Entry
      • Example: mk_pte(p, prot)

TLB operations
  • x86 uses hardware TLB
      • OS does not manage TLB
  • Only operation: flush TLB entries
      • include/asm-i386/tlbflush.h
      • movl %0, %%cr3: flush all TLB entries
      • invlpg addr: flush a single TLB entry
          • More efficient than flushing all TLB entries

A cool trick: copy-on-write
  • In fork(), parent and child often share a significant amount of memory
      • Expensive to copy all pages
  • COW idea: exploit VA to PA indirection
      • Instead of copying all pages, share them
      • If either process writes to a shared page, only then is the page copied
      • How to detect a page write?
          • Mark pages as read-only in both parent and child address spaces
          • On write, a page fault occurs

Share pages
  copy_process() in kernel/fork.c
    copy_mm()
      dup_mmap() // copy page tables
        copy_page_range() in mm/memory.c
          copy_pud_range()
            copy_pmd_range()
              copy_pte_range()
                copy_one_pte() // mark readonly

Copy page on page fault
  set_intr_gate(14, &page_fault) in arch/i386/kernel/traps.c
  ENTRY(page_fault) in arch/i386/kernel/entry.S calls do_page_fault
  do_page_fault in arch/i386/mm/fault.c
    cr2 stores faulting virtual address
    handle_mm_fault in mm/memory.c
      handle_pte_fault in mm/memory.c
        if (write_access) do_wp_page()

Linux page replacement algorithm
  • Two lists in struct zone
      • active_list: hot pages
      • inactive_list: cold pages
  • Two bits in struct page
      • PG_active: is page on active list?
      • PG_referenced: has page been referenced recently?
  • Approximate LRU algorithm
      • Replace a page in the inactive list
      • Move pages from active to inactive under memory pressure
      • Need two accesses to go from inactive to active

Functions for page replacement
  • lru_cache_add*(): add to inactive or active list
  • mark_page_accessed(): called twice to move a page from inactive to active
  • page_referenced(): test if a page is referenced
  • refill_inactive_zone(): move pages from active to inactive

How to swap out page
  free_more_memory() in fs/buffer.c calls
  try_to_free_pages in mm/vmscan.c
    shrink_caches
      shrink_zone
        refill_inactive_zone
        shrink_cache
          shrink_list
            if (PageDirty(page)) pageout()

How to load page
  On page fault, cr2 stores faulting virtual address
  handle_mm_fault() in mm/memory.c
    handle_pte_fault()
      if (!pte_present(entry))
        do_no_page()   // anonymous page
        do_file_page() // file mapped page
        do_swap_page() // swapped out page

Dynamic memory allocation
  • How to allocate pages?
      • Data structures for page
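
Supplementary sketch for the Paging slide: the 32-bit, 4 KB page, non-PAE example splits a linear address into 10 + 10 + 12 bits. The C code below shows one way that split could be computed. The names linear_parts, split_linear, join_linear and print_split are hypothetical and are not kernel APIs; the kernel's own macros live in include/asm-i386/pgtable.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Field widths from the slide's example: 10 (Page Global Dir)
 * + 10 (Page Table) + 12 (Page Offset) = 32 bits. */
#define OFFSET_BITS 12u
#define TABLE_BITS  10u

/* Hypothetical container for the three parts of a linear address. */
struct linear_parts {
    uint32_t pgd_index; /* top 10 bits: index into the Page Global Directory */
    uint32_t pte_index; /* middle 10 bits: index into the Page Table */
    uint32_t offset;    /* low 12 bits: byte offset inside the 4 KB page */
};

/* Split a 32-bit linear address into its three fields. */
struct linear_parts split_linear(uint32_t addr)
{
    struct linear_parts p;
    p.pgd_index = addr >> (OFFSET_BITS + TABLE_BITS);
    p.pte_index = (addr >> OFFSET_BITS) & ((1u << TABLE_BITS) - 1u);
    p.offset    = addr & ((1u << OFFSET_BITS) - 1u);
    return p;
}

/* Recombine the fields; the inverse of split_linear(). */
uint32_t join_linear(struct linear_parts p)
{
    return (p.pgd_index << (OFFSET_BITS + TABLE_BITS))
         | (p.pte_index << OFFSET_BITS)
         | p.offset;
}

/* Print the split and check that no bits were lost. */
void print_split(uint32_t addr)
{
    struct linear_parts p = split_linear(addr);
    assert(join_linear(p) == addr);
    printf("0x%08x -> pgd %u, pte %u, offset 0x%03x\n",
           (unsigned)addr, (unsigned)p.pgd_index,
           (unsigned)p.pte_index, (unsigned)p.offset);
}
```

For the address 0x08049780, split_linear() yields Page Global Dir index 32, Page Table index 73, and offset 0x780. Page Upper Dir and Page Middle Dir contribute no bits in this configuration, which is why only three fields appear.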
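
Supplementary sketch for the page replacement slides: the "two accesses to go from inactive to active" rule can be modelled as a small state machine over the two flag bits. This is a toy model under simplifying assumptions, not the kernel's code; toy_page, toy_mark_accessed and toy_deactivate are hypothetical names, and real pages also carry list linkage and other state that is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two flag bits in struct page (not the kernel layout). */
struct toy_page {
    bool active;     /* models PG_active: page is on the active list */
    bool referenced; /* models PG_referenced: page was referenced recently */
};

/* Models the two-step promotion described on the slide:
 * the first access only sets the referenced bit; a second access
 * moves the page to the active list and clears the bit again. */
void toy_mark_accessed(struct toy_page *pg)
{
    assert(pg != NULL);
    if (!pg->active && pg->referenced) {
        pg->active = true;
        pg->referenced = false;
    } else if (!pg->referenced) {
        pg->referenced = true;
    }
}

/* Models demotion under memory pressure: active -> inactive.
 * Clearing the referenced bit here is a simplification of this sketch. */
void toy_deactivate(struct toy_page *pg)
{
    assert(pg != NULL);
    pg->active = false;
    pg->referenced = false;
}
```

Starting from an inactive, unreferenced page, one call to toy_mark_accessed() leaves it inactive but referenced, and a second call promotes it to active. A page touched only once therefore stays a candidate for replacement.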