Context Switch Overheads for Linux on ARM Platforms

Francis M. David (fdavid@uiuc.edu), Jeffrey C. Carlyle (jcarlyle@uiuc.edu), Roy H. Campbell (rhc@uiuc.edu)
Department of Computer Science, University of Illinois at Urbana-Champaign
201 N Goodwin Ave, Urbana, IL 61801-2302

ABSTRACT

Context switching imposes a performance penalty on threads in a multitasking environment. The source of this penalty is both direct overhead due to running the context switch code and indirect overhead due to perturbation of caches. We calculate indirect overhead by measuring the running time of tasks that use context switching and subtracting the direct overhead. We also measure the indirect overhead impact on the running time of tasks due to processor interrupt servicing. Experiment results are presented for the Linux kernel running on an ARM processor based mobile device platform.

Categories and Subject Descriptors
D.4.8 [Operating Systems]: Performance, Measurements

General Terms
Experimentation, Measurement, Performance

Keywords
operating system, context switch overhead

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ExpCS, 13-14 June 2007, San Diego, CA. Copyright 2007 ACM 978-1-59593-751-3/07/06 ...$5.00.

1. INTRODUCTION

Context switching is the fundamental mechanism that is used to share a processor across multiple threads of execution. Each thread is associated with a processor state such as the program counter, general purpose registers, status registers and so on. A context switch is the act of saving the processor state of a thread and loading the saved state of another thread. If the threads are associated with different virtual address spaces, a context switch also involves switching the address translation maps used by the processor. In Linux, this happens when the threads belong to different user processes. Switching address spaces requires that relevant entries in the processor's address translation cache (TLB) are invalidated. If the instruction or data caches are tagged using virtual addresses, they would have to be emptied as well.

Context switching imposes a small performance penalty on threads in a multitasking environment. In addition to the direct overhead associated with the actual context switching code, there are several other factors that contribute to this penalty. The perturbation of processor caches, such as the instruction, data, address translation and branch target buffers, results in an additional indirect overhead. Yet another possible source of indirect overhead is operating system memory paging. A context switch can result in an in-use memory page being moved to disk if there is no free memory, thus hurting performance. In this paper, we do not consider overheads due to paging and assume that sufficient main memory is present to avoid thrashing.

We have described a context switch as a mechanism used to switch between two threads of execution. We do not consider a system call a context switch. A system call is like a simple function call and only involves switching the processor from unprivileged user mode to a privileged kernel mode. Memory maps are not switched. The transition back to userspace from the kernel during the return of the system call is similar to a function call return.

A processor interrupt causes the state of the currently executing task to be saved while an interrupt service routine is executed. When the interrupt service routine completes, the saved state is restored. While memory maps are not switched during interrupt servicing, it does perturb cache state and might also contribute some indirect overhead.

In this paper, we measure the indirect overhead of context switches inside the Linux kernel using pairs of tasks that perform cooperative multitasking. In a separate set of experiments, we also measure the indirect overhead introduced due to processor interrupt servicing. We do not explore userspace implementations of threads and userspace context switching. The latest versions of the Linux kernel support the Native POSIX Threading Library (NPTL), which implements user threads as kernel threads, so context switching happens inside the kernel.

This study targets mobile device architectures, and the hardware platform we use in our experiments is the OMAP1610 H2 Software Development Platform [8], a cellular phone reference design from Texas Instruments. The OMAP1610 is powered by an ARM processor core.

The rest of this paper is organized as follows. Section 2 presents a quick introduction to the hardware platform that we use in our experiments. We discuss the experiment setup and results for context switch overhead measurements in section 3. The experiment setup and results for interrupt servicing overhead measurements are presented in section 4. After exploring some related work in section 5, we conclude in section 6.

2. EXPERIMENTATION PLATFORM

ARM is a 32-bit RISC architecture. ARM processors are widely used in mobile devices because of their low power consumption. In this section, we briefly describe some features of the ARM architecture that are relevant to this research. Our implementations and experiments have been carried out on a processor core which belongs to the ARMv5 architecture generation. The ARM926EJ-S processor core that we use is part of the OMAP1610 chip from Texas Instruments.

[Figure 1: Context Switch Overhead Experiment Measurements. Timeline diagrams marking the begin and end points of Task 1 and Task 2, the context switch (CS) time, and the total running time R_total.]

3. CONTEXT SWITCHING OVERHEAD

3.1 Experiment Setup

We added code into the Linux kernel to measure the running time of tasks performing deterministic computation with a controlled number of context switches and without external interference such as interrupts. In order to measure running time and the effects of a context switch accurately, the task code is built into the kernel and system calls are not used. We, however, configure the task with a unique mmu struct to ensure that the page table mappings are reset during a context switch. This setup allows us to explore the impact of cache flushes and TLB invalidation during a context switch.

All measurements are performed starting with cold data and
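
As a rough illustration of the calculation summarized in the abstract (measured running time minus direct overhead), the following Python sketch may help. It is not the authors' measurement code. It assumes that a baseline run with fewer context switches is available for comparison, and the function name, parameter names and all numbers are hypothetical.

```python
def indirect_overhead_us(r_total_us, r_baseline_us, extra_switches, direct_cost_us):
    """Estimate indirect context switch overhead, in microseconds.

    Assumptions (not taken from the paper):
      r_total_us     -- running time of the task pair with extra switches
      r_baseline_us  -- running time of the same work in a baseline run
      extra_switches -- context switches added relative to the baseline
      direct_cost_us -- direct cost of running the switch code once
    """
    if extra_switches <= 0:
        raise ValueError("extra_switches must be positive")
    # Time added by the extra switches, direct and indirect combined.
    added_us = r_total_us - r_baseline_us
    # Subtract the direct cost of every extra switch to isolate the indirect part.
    total_indirect_us = added_us - extra_switches * direct_cost_us
    return total_indirect_us, total_indirect_us / extra_switches


# Hypothetical numbers, chosen only to show the arithmetic.
total, per_switch = indirect_overhead_us(10500, 10000, 2, 50)
print(total, per_switch)
```

With these made-up inputs, 500 microseconds are added by the two extra switches, 100 of which are direct cost, leaving 400 microseconds of indirect overhead, or 200 per switch.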