This appendix describes various features and restrictions related to the Neutrino implementation on ARM/Xscale processors.
For an overview of how Neutrino manages memory, see the introduction to the Finding Memory Errors chapter of the IDE User's Guide.
This section describes the major restrictions and issues raised by the Neutrino implementation on ARM/Xscale.
Device drivers in Neutrino use ThreadCtl() with the _NTO_TCTL_IO flag to obtain I/O privileges. This gives the calling thread direct access to I/O ports and lets it control processor interrupt masking.
On ARM platforms, all I/O access is memory-mapped, so this flag is used primarily to allow manipulation of the processor interrupt mask.
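A minimal sketch of how a driver might request this privilege at startup. It assumes a Neutrino build, detected through the `__QNXNTO__` macro that the QNX compiler predefines, and on any other platform it simply reports ENOSYS so the code still builds. The helper name `acquire_io_privilege()` is hypothetical.

```c
#include <errno.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h> /* ThreadCtl(), _NTO_TCTL_IO */
#endif

/* Hypothetical helper: request I/O privileges for the calling thread.
 * Returns 0 on success, or -1 with errno set. */
int acquire_io_privilege(void)
{
#ifdef __QNXNTO__
    /* On ARM, this makes the calling thread run in System mode, which is
     * what lets later changes to the CPSR interrupt mask take effect. */
    if (ThreadCtl(_NTO_TCTL_IO, 0) == -1) {
        return -1; /* errno was set by ThreadCtl() */
    }
    return 0;
#else
    /* Not built for Neutrino: there is nothing equivalent to request. */
    errno = ENOSYS;
    return -1;
#endif
}
```

A driver would call this once, before any code that masks interrupts; the rest of this section explains why the privilege it grants is broader than that.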
Normal user processes execute in the processor's User mode, where the processor silently ignores any attempt to change the interrupt mask in the CPSR: such an attempt doesn't cause a protection violation; it simply has no effect on the mask.
The _NTO_TCTL_IO flag makes the calling thread execute in the processor's System mode. This is a privileged mode that differs from Supervisor mode only in its use of banked registers.
This means that such privileged user processes execute with all the access permissions of kernel code. The major consequence is that a buggy program using _NTO_TCTL_IO can corrupt kernel memory.
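To make the User/System mode difference concrete, here is a small model (not Neutrino code) of an `MSR CPSR_fc` write on pre-ARMv6 cores. It uses the standard ARM encodings: mode field in bits 4-0 (User 0x10, System 0x1F), FIQ-disable bit 6, IRQ-disable bit 7, and condition flags in bits 31-24. The simplification that an unprivileged write can change only the flags field, while the control field (mask and mode bits) is left untouched, is how the model captures the "silently ignored" behavior described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_MODE_MASK  0x1Fu
#define CPSR_MODE_USR   0x10u
#define CPSR_MODE_SYS   0x1Fu
#define CPSR_F_BIT      (1u << 6)    /* 1 = FIQs masked */
#define CPSR_I_BIT      (1u << 7)    /* 1 = IRQs masked */
#define CPSR_FLAGS_MASK 0xFF000000u  /* "f" field: writable in any mode */
#define CPSR_CTRL_MASK  0x000000FFu  /* "c" field: I, F, T and mode bits */

/* Any mode other than User is privileged (System mode included). */
static bool cpsr_is_privileged(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) != CPSR_MODE_USR;
}

/* Model of "MSR CPSR_fc, value": returns the CPSR after the write.
 * In User mode the control field is silently left unchanged. */
uint32_t cpsr_msr_fc(uint32_t cpsr, uint32_t value)
{
    uint32_t writable = CPSR_FLAGS_MASK;

    if (cpsr_is_privileged(cpsr)) {
        writable |= CPSR_CTRL_MASK;
    }
    return (cpsr & ~writable) | (value & writable);
}
```

For example, `cpsr_msr_fc(CPSR_MODE_USR, CPSR_MODE_USR | CPSR_I_BIT)` leaves IRQs enabled, while the same request from System mode sets the I bit, which is exactly the capability _NTO_TCTL_IO confers.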
All currently supported ARM/Xscale processors implement a virtually indexed cache. This has a number of software-visible consequences:
The Neutrino implementation does flush the cache when memory is unmapped, but it avoids the context-switch penalty by using the "Fast Context Switch Extension" (FCSE) implemented by some ARM MMUs, described below.
An alternative to making DMA-accessible memory uncached is to modify all drivers that perform DMA to explicitly synchronize the cache with memory when necessary.
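Explicit synchronization has two parts: cache maintenance works on whole cache lines, so the buffer range must be widened to line boundaries, and the operation depends on the transfer direction. Dirty lines are written back (cleaned) before a device reads memory, and lines are invalidated before the CPU reads data a device has written. The sketch below does only that bookkeeping; the names `plan_dma_sync()` and `struct dma_sync_plan` are hypothetical, and the actual clean/invalidate calls are left to whatever cache-control interface the platform provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Direction of a DMA transfer, from the device's point of view. */
enum dma_dir {
    DMA_TO_DEVICE,   /* device reads memory: clean dirty lines first */
    DMA_FROM_DEVICE  /* device writes memory: invalidate before CPU reads */
};

enum cache_op { CACHE_OP_CLEAN, CACHE_OP_INVALIDATE };

struct dma_sync_plan {
    uintptr_t start;  /* first byte of the first affected cache line */
    uintptr_t end;    /* one past the last byte of the last affected line */
    enum cache_op op; /* maintenance operation to apply to [start, end) */
};

/* Hypothetical helper: widen [addr, addr + len) to whole cache lines and
 * choose the maintenance operation. line_size must be a power of two. */
struct dma_sync_plan plan_dma_sync(uintptr_t addr, size_t len,
                                   size_t line_size, enum dma_dir dir)
{
    uintptr_t mask = (uintptr_t)line_size - 1;
    struct dma_sync_plan p;

    p.start = addr & ~mask;
    /* An empty buffer touches no lines. */
    p.end = (len == 0) ? p.start : ((addr + len + mask) & ~mask);
    p.op = (dir == DMA_TO_DEVICE) ? CACHE_OP_CLEAN : CACHE_OP_INVALIDATE;
    return p;
}
```

With 32-byte lines, a 0x40-byte receive buffer at 0x1004 becomes an invalidate of 0x1000 to 0x1060. Because the widened range can include bytes outside the buffer, DMA buffers are best allocated on cache-line boundaries so an invalidate never discards unrelated data.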
As mentioned, Neutrino uses the MMU Fast Context Switch Extension (FCSE) to avoid cache-flushing during context switches. This is crucial to a microkernel system like Neutrino: the flush can be costly (potentially many thousands of cycles), and context switches are much more frequent than in a monolithic (e.g. UNIX-like) OS.
The FCSE implementation works by splitting the 4 GB virtual address space into 128 slots of 32 MB each. Each address space appears to have a virtual address range of 0 - 32 MB, but the MMU transparently remaps this to a "real" virtual address by putting the slot index into the top 7 bits of the virtual address.
For example, consider two processes: process 1 has slot index 1; process 2 has slot index 2. Each process appears to have an address space of 0 - 32 MB, and its code uses those addresses for execution, loads, and stores.
In reality, the virtual addresses seen by the MMU (cache and TLB) are:
Process 1: 0x02000000 - 0x03FFFFFF (32 - 64 MB)
Process 2: 0x04000000 - 0x05FFFFFF (64 - 96 MB)
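The remapping is simple enough to write down. This sketch follows the ARM FCSE rule as described above: an address in the 0 - 32 MB range (top 7 bits zero) gets the slot index placed in bits 31-25, and any other address passes through unchanged. The function name is illustrative only.

```c
#include <stdint.h>

#define FCSE_SLOT_SHIFT 25u                    /* 32 MB == 1 << 25 */
#define FCSE_SLOT_SIZE  (1u << FCSE_SLOT_SHIFT)

/* Translate a process virtual address into the "real" (modified) virtual
 * address that the cache and TLB see, given the process's slot index. */
uint32_t fcse_modified_va(uint32_t va, uint32_t slot)
{
    if (va < FCSE_SLOT_SIZE) {
        /* Top 7 bits are zero: insert the 7-bit slot index there. */
        return va | ((slot & 0x7Fu) << FCSE_SLOT_SHIFT);
    }
    /* Addresses outside 0 - 32 MB are not relocated. */
    return va;
}
```

Because the same process address, say 0x1000, becomes 0x02001000 in slot 1 and 0x04001000 in slot 2, the virtually indexed cache never confuses the two processes' data, so switching between them needs no flush.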
This mechanism imposes a number of restrictions. First, each process's address space is limited to 32 MB, laid out as follows:
|Address range||Used for|
|0 - 1 MB||Initial thread stack|
|1 - 16 MB||Program text, data, and BSS|
|16 - 24 MB||Shared libraries|
|24 - 32 MB||MAP_SHARED mappings|
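The layout in the table can be expressed as a lookup, which also makes the 15 MB figure that follows easy to check: the largest single region, 1 - 16 MB, is 15 MB. This is an illustrative sketch; `region_of()` and `largest_region()` are not part of any Neutrino API.

```c
#include <stddef.h>
#include <stdint.h>

#define MB (1024u * 1024u)

struct region {
    uint32_t start;   /* inclusive */
    uint32_t end;     /* exclusive */
    const char *use;
};

/* Per-process 32 MB layout, as given in the table. */
static const struct region layout[] = {
    {  0 * MB,  1 * MB, "Initial thread stack" },
    {  1 * MB, 16 * MB, "Program text, data, and BSS" },
    { 16 * MB, 24 * MB, "Shared libraries" },
    { 24 * MB, 32 * MB, "MAP_SHARED mappings" },
};

/* Return the use of the region containing va, or NULL if va lies outside
 * the 32 MB process address space. */
const char *region_of(uint32_t va)
{
    for (size_t i = 0; i < sizeof layout / sizeof layout[0]; i++) {
        if (va >= layout[i].start && va < layout[i].end) {
            return layout[i].use;
        }
    }
    return NULL;
}

/* Size of the largest single region in the layout. */
uint32_t largest_region(void)
{
    uint32_t best = 0;

    for (size_t i = 0; i < sizeof layout / sizeof layout[0]; i++) {
        uint32_t size = layout[i].end - layout[i].start;
        if (size > best) {
            best = size;
        }
    }
    return best;
}
```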
When a program is loaded, the loader populates the stack, text, data, BSS, and shared library areas.
If you allocate memory, malloc() tries to find a free virtual address range for the requested size. If you try to allocate more than 15 MB, the allocation will likely fail because of this layout. The free areas are typically:
The current limit is 63 slots:
Since each process typically has its own address space, this imposes a hard limit of at most 63 different processes.
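The slot arithmetic behind this limit can be checked directly. The last step is an assumption, not something stated here: it supposes that addresses at and above 0x80000000 are kept for the global address space described under shm_ctl() later in this appendix, and that slot 0, whose addresses coincide with the untranslated 0 - 32 MB range, isn't given to a process.

```latex
\begin{align*}
\frac{4\,\mathrm{GB}}{32\,\mathrm{MB}} &= \frac{2^{32}}{2^{25}} = 2^{7} = 128 \text{ slots in total} \\
\frac{\mathtt{0x80000000}}{32\,\mathrm{MB}} &= \frac{2^{31}}{2^{25}} = 2^{6} = 64 \text{ slots (0 to 63) below } \mathtt{0x80000000} \\
64 - 1 &= 63 \text{ slots if slot 0 is excluded} \\
\text{slot } 63 &= [\,63 \cdot 2^{25},\ 64 \cdot 2^{25}\,) = [\,\mathtt{0x7E000000},\ \mathtt{0x80000000}\,)
\end{align*}
```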
Strictly speaking, making shared memory object mappings uncached is required only if at least one writable mapping exists, but the current VM implementation doesn't track this, and unconditionally makes all such mappings uncached.
As a result, the performance of memory accesses to shared memory object mappings is bound by the uncached memory performance of the system.
This section describes the ARM-specific behavior of certain operations that are provided via a processor-independent interface:
The Neutrino implementation on ARM uses various shm_ctl() flags to work around the restrictions imposed by the MMU FCSE implementation. These flags provide a "global" address space above 0x80000000 that lets processes map objects that wouldn't otherwise fit into the (private) 32 MB process address space.
The following flags supplied to shm_ctl() create a shared memory object that you can subsequently mmap() with special properties:
Since all mappings of these objects share the same virtual address, there are a number of artifacts caused by mmap():
Specifying this flag allows any process in the system to access the object, because the virtual address is visible to all processes.
To create these special mappings, first create and initialize the object:
fd = shm_open(name, ...);
shm_ctl(fd, ...);
Note that you must be root to use shm_ctl().
Then map the object:
fd = shm_open(name, ...);
mmap(..., fd, ...);
Any process that can use shm_open() on the object can map it, not just the process that created the object.
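The two steps can be combined into one function for illustration. This is a sketch under stated assumptions: the shm_ctl() call is compiled only for Neutrino (guarded by `__QNXNTO__`), while other platforms fall back to plain POSIX ftruncate() so the code can be exercised off-target. The flag choice (SHMCTL_ANON | SHMCTL_GLOBAL), the 0600 permissions, and the name `map_shared_object()` are assumptions, and as noted above, the shm_ctl() step requires root.

```c
#ifndef __QNXNTO__
#define _POSIX_C_SOURCE 200809L /* expose shm_open()/ftruncate() off-target */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (if needed), initialize, and map a named
 * shared memory object. Returns MAP_FAILED on error. */
void *map_shared_object(const char *name, size_t size)
{
    int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd == -1) {
        return MAP_FAILED;
    }

#ifdef __QNXNTO__
    /* Initialize the object with special properties. Assumed flags:
     * anonymous memory in the global address space. Requires root. */
    if (shm_ctl(fd, SHMCTL_ANON | SHMCTL_GLOBAL, 0, size) == -1) {
        close(fd);
        return MAP_FAILED;
    }
#else
    /* Off-target fallback: only size the object. */
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return MAP_FAILED;
    }
#endif

    /* Map it; the mapping stays valid after the descriptor is closed. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p;
}
```

On Neutrino, a global mapping created without SHMCTL_LOWERPROT is a privileged mapping, so the thread that touches it also needs _NTO_TCTL_IO, as described after the table.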
The following table summarizes the effect of the various combinations of flags passed to shm_ctl():
|Flags||Object type||Effect of mmap()|
|SHMCTL_ANON||Anonymous memory (not contiguous)||Mapped into normal process address space. PROT_NOCACHE is forced.|
|SHMCTL_ANON | SHMCTL_PHYS||Anonymous memory (physically contiguous)||Mapped into normal process address space. PROT_NOCACHE is forced.|
|SHMCTL_ANON | SHMCTL_GLOBAL||Anonymous memory (not contiguous)||Mapped into global address space. PROT_NOCACHE isn't forced. All processes receive the same mapping.|
|SHMCTL_ANON | SHMCTL_GLOBAL | SHMCTL_PHYS||Anonymous memory (physically contiguous)||Mapped into global address space. PROT_NOCACHE isn't forced. All processes receive the same mapping.|
|SHMCTL_PHYS||Physical memory range||Mapped into global address space. PROT_NOCACHE is forced. Processes receive unique mappings.|
|SHMCTL_PHYS | SHMCTL_GLOBAL||Physical memory range||Mapped into global address space. PROT_NOCACHE isn't forced. All processes receive the same mapping.|
Note that by default, mmap() creates privileged access mappings, so the caller must have _NTO_TCTL_IO privilege to access them.
The flags passed to shm_ctl() can also include SHMCTL_LOWERPROT to create user-accessible mappings. However, this allows any process to access these mappings if they're in the global address space.