Filesystems

Introduction

QNX Neutrino provides a rich variety of filesystems. Like most service-providing processes in the OS, these filesystems execute outside the kernel; applications use them by communicating via messages generated by the shared-library implementation of the POSIX API.

Most of these filesystems are resource managers as described in this book. Each filesystem adopts a portion of the pathname space (called a mountpoint) and provides filesystem services through the standard POSIX API (open(), close(), read(), write(), lseek(), etc.). Filesystem resource managers take over a mountpoint and manage the directory structure below it. They also check the individual pathname components for permissions and for access authorizations.

This implementation means that:

Filesystems and pathname resolution

You can seamlessly locate and connect to any service or filesystem that's been registered with the process manager. When a filesystem resource manager registers a mountpoint, the process manager creates an entry in the internal mount table for that mountpoint and its corresponding server ID (i.e. the nd, pid, chid identifiers).

This table effectively joins multiple filesystem directories into what users perceive as a single directory. The process manager handles the mountpoint portion of the pathname; the individual filesystem resource managers take care of the remaining parts of the pathname. Filesystems can be registered (i.e. mounted) in any order.

When a pathname is resolved, the process manager contacts all the filesystem resource managers that can handle some component of that path. The result is a collection of file descriptors that can resolve the pathname.

If the pathname represents a directory, the process manager asks all the filesystems that can resolve the pathname for a listing of files in that directory when readdir() is called. If the pathname isn't a directory, then the first filesystem that resolves the pathname is accessed.

For more information on pathname resolution, see the section "Pathname management" in the chapter on the Process Manager in this guide.

Filesystem classes

The many filesystems available can be categorized into the following classes:

Image
A special filesystem that presents the modules in the image and is always present. Note that the procnto process automatically provides an image filesystem and a RAM filesystem.
Block
Traditional filesystems that operate on block devices like hard disks and CD-ROM drives. This includes the QNX 4, DOS, and CD-ROM filesystems.
Flash
Nonblock-oriented filesystems designed explicitly for the characteristics of flash memory devices. For NOR devices, use the FFS3 filesystem; for NAND, use ETFS.
Network
Filesystems that provide network file access to the filesystems on remote host computers. This includes the NFS and CIFS (SMB) filesystems.
Virtual
QNX Neutrino provides the following virtual filesystems:

Filesystems as shared libraries

Since it's common to run many filesystems under QNX Neutrino, they have been designed as a family of drivers and shared libraries to maximize code reuse. This means the cost of adding an additional filesystem is typically smaller than might otherwise be expected.

Once an initial filesystem is running, the incremental memory cost for additional filesystems is minimal, since only the code to implement the new filesystem protocol would be added to the system.

The various filesystems are layered as follows:


Neutrino filesystem diagram


QNX Neutrino filesystem layering.

As shown in this diagram, the filesystems and io-blk are implemented as shared libraries (essentially passive blocks of code resident in memory), while the devb-* driver is the executing process that calls into the libraries. In operation, the driver process starts first and invokes the block-level shared library (io-blk.so). The filesystem shared libraries may be dynamically loaded later to provide filesystem interfaces and services.

A "filesystem" shared library implements a filesystem protocol or "personality" on a set of blocks on a physical disk device. The filesystems aren't built into the OS kernel; rather, they're dynamic entities that can be loaded or unloaded on demand.

For example, a removable storage device (PCCard flash card, floppy disk, removable cartridge disk, etc.) may be inserted at any time, with any of a number of filesystems stored on it. While the hardware the driver interfaces to is unlikely to change dynamically, the on-disk data structure could vary widely. The dynamic nature of the filesystem copes with this very naturally.

io-blk

Most of the filesystem shared libraries ride on top of the Block I/O module (io-blk.so). This module also acts as a resource manager and exports a block-special file for each physical device. For a system with two hard disks the default files would be:

/dev/hd0
First hard disk
/dev/hd1
Second hard disk

These files represent each raw disk and may be accessed using all the normal POSIX file primitives (open(), close(), read(), write(), lseek(), etc.). Although the io-blk module can support a 64-bit offset on seek, the driver interface is 32-bit, allowing access to 2-terabyte disks.

Builtin RAM disk

The io-blk module supports an internal RAM-disk device that can be created via a command-line option (blk ramdisk=size). Since this RAM disk is internal to the io-blk module (rather than created and maintained by an additional device driver such as devb-ram), performance is significantly better than that of a dedicated RAM-disk driver.

By incorporating the RAM-disk device directly at the io-blk layer, the device's data memory parallels the main cache, so I/O operations to this device can bypass the buffer cache, eliminating a memory copy yet still retaining coherency. Contrast this with a driver-level implementation (e.g. devb-ram) where transparently presenting the RAM as a block device involves additional memory copies and duplicates data in the buffer cache. Inter-DLL callouts are also eliminated. In addition, there are benefits in terms of installation footprint for systems that have a hard disk and also want a RAM disk -- only the single driver is needed.

Partitions

QNX Neutrino complies with the de facto industry standard for partitioning a disk. This allows a number of filesystems to share the same physical disk. Each partition is also represented as a block-special file, with the partition type appended to the filename of the disk it's located on. In the above "two-disk" example, if the first disk had a QNX partition and a DOS partition, while the second disk had only a QNX partition, then the default files would be:

/dev/hd0
First hard disk
/dev/hd0t6
DOS partition on first hard disk
/dev/hd0t79
QNX partition on first hard disk
/dev/hd1
Second hard disk
/dev/hd1t79
QNX partition on second hard disk

The following list shows some typical assigned partition types:

Type Filesystem
1 DOS (12-bit FAT)
4 DOS (16-bit FAT; partitions <32M)
5 DOS Extended Partition (enumerated but not presented)
6 DOS 4.0 (16-bit FAT; partitions >=32M)
7 OS/2 HPFS
7 Previous QNX version 2 (pre-1988)
8 QNX 1.x and 2.x ("qny")
9 QNX 1.x and 2.x ("qnz")
11 DOS 32-bit FAT; partitions up to 2047G
12 Same as Type 11, but uses Logical Block Address Int 13h extensions
14 Same as Type 6, but uses Logical Block Address Int 13h extensions
15 Same as Type 5, but uses Logical Block Address Int 13h extensions
77 QNX POSIX partition (secondary)
78 QNX POSIX partition (secondary)
79 QNX POSIX partition
99 UNIX
131 Linux (Ext2)

Buffer cache

The io-blk shared library implements a buffer cache that all filesystems inherit. The buffer cache attempts to store frequently accessed filesystem blocks in order to minimize the number of times a system has to perform a physical I/O to the disk.

Read operations are synchronous; write operations are usually asynchronous. When an application writes to a file, the data enters the cache, and the filesystem manager immediately replies to the client process to indicate that the data has been written. The data is then written to the disk.

Critical filesystem blocks such as bitmap blocks, directory blocks, extent blocks, and inode blocks are written immediately and synchronously to disk.

Applications can modify write behavior on a file-by-file basis. For example, a database application can cause all writes for a given file to be performed synchronously. This would ensure a high level of file integrity in the face of potential hardware or power problems that might otherwise leave a database in an inconsistent state.

Filesystem limitations

POSIX defines the set of services a filesystem must provide. However, not all filesystems are capable of delivering all those services.

The following chart highlights some of the POSIX-defined capabilities and indicates if the filesystems support them:

Capability Image RAM ETFS QNX 4 DOS CD-ROM FFS3 NFS CIFS Ext2
Access date no yes yes yes yesa yesb no yes no yes
Modification date no yes yes yes yes yesb yes yes yes yes
Status change date no yes yes yes no yesb yes yes no yes
Filename length 255 255 91 48c 8.3d 32e 255 --f --f 255
User permissions yes yes yes yes no yesb yes yesf yesf yes
Group permissions yes yes yes yes no yesb yes yesf yesf yes
Other permissions yes yes yes yes no yesb yes yesf yesf yes
Directories no no yes yes yes yes yes yes yes yes
Hard links no no no yes no no no yesf no yes
Soft links no no yes yes no yesb yes yesf no yes
Decompression on read no no no no no no yes no no no

a VFAT or FAT32 (e.g. Windows 95).

b With Rock Ridge extensions.

c 505 if .longfilenames is enabled; otherwise, 48.

d 255-character filename lengths used by VFAT or FAT32 (e.g. Windows 95).

e 128 with Joliet extensions; 255 with Rock Ridge extensions.

f Limited by the remote filesystem.

Image filesystem

Every QNX Neutrino system image provides a simple read-only filesystem that presents the set of files built into the OS image.

Since this image may include both executables and data files, this filesystem is sufficient for many embedded systems. If additional filesystems are required, they would be placed as modules within the image where they can be started as needed.

RAM "filesystem"

Every QNX system also provides a simple RAM-based "filesystem" that allows read/write files to be placed under /dev/shmem.

This RAM filesystem finds the most use in tiny embedded systems where persistent storage across reboots isn't required, yet where a small, fast, temporary-storage filesystem with limited features is called for.

The filesystem comes for free with procnto and doesn't require any setup. You can simply create files under /dev/shmem and grow them to any size (depending on RAM resources).

Although the RAM filesystem itself doesn't support hard or soft links or directories, you can create a link to it by using process-manager links. For example, you could create a link to a RAM-based /tmp directory:

ln -sP /dev/shmem /tmp

This tells procnto to create a process manager link to /dev/shmem known as "/tmp." Application programs can then open files under /tmp as if it were a normal filesystem.


Note: In order to minimize the size of the RAM filesystem code inside the process manager, this filesystem specifically doesn't include "big filesystem" features such as file locking and directory creation.

Embedded transaction filesystem (ETFS)

ETFS implements a high-reliability filesystem for use with embedded solid-state memory devices, particularly NAND flash memory. The filesystem supports a fully hierarchical directory structure with POSIX semantics as shown in the table above.

ETFS is a filesystem composed entirely of transactions. Every write operation, whether of user data or filesystem metadata, consists of a transaction. A transaction either succeeds or is treated as if it never occurred.

Transactions never overwrite live data. A write in the middle of a file or a directory update always writes to a new unused area. In this way, if the operation fails part way through (due to a crash or power failure), the old data is still intact.

Some log-based filesystems also operate under the principle that live data is never overwritten. But ETFS takes this to the extreme by turning everything into a log of transactions. The filesystem hierarchy is built on the fly by processing the log of transactions in the device. This scan occurs at startup, but is designed such that only a small subset of the data is read and CRC-checked, resulting in faster startup times without sacrificing reliability.

Transactions are position-independent in the device and may occur in any order. You could read the transactions from one device and write them in a different order to another device. This is important because it allows bulk programming of devices containing bad blocks that may be at arbitrary locations.

This design is well-suited for NAND flash memory. NAND flash is shipped with factory-marked bad blocks that may occur in any location.


Embedded Transaction filesystem diagram


ETFS is a filesystem composed entirely of transactions.

Inside a transaction

Each transaction consists of a header followed by data. The header contains the following:

FID
A unique file ID that identifies which file the transaction belongs to.
Offset
The offset of the data portion within the file.
Size
The size of the data portion.
Sequence
A monotonically increasing number (to enable time ordering).
CRCs
Data integrity checks (for NAND, NOR, SRAM).
ECCs
Error correction (for NAND).
Other
Reserved for future expansion.

Types of storage media

Although best for NAND devices, ETFS also supports other types of embedded storage media through the use of driver classes as follows:

Class CRC ECC Wear-leveling
erase
Wear-leveling
read
Cluster size
NAND 512+16 Yes Yes Yes Yes 1K
NAND 2048+64 Yes Yes Yes Yes 2K
RAM No No No No 1K
SRAM Yes No No No 1K
NOR Yes No Yes No 1K

Note: Although ETFS can support NOR flash, we recommend instead the FFS3 filesystem (devf-*), which is designed explicitly for NOR flash devices.

Reliability features

ETFS is designed to survive across a power failure, even during an active flash write or block erase. The following features contribute to its reliability:

Dynamic wear-leveling

Flash memory allows a limited number of erase cycles on a flash block before the block will fail. This number can be as low as 100,000. ETFS tracks the number of erases on each block. When selecting a block to use, ETFS attempts to spread the erase cycles evenly over the device, dramatically increasing its life. The difference can be extreme: from usage scenarios of failure within a few days without wear-leveling to over 40 years with wear-leveling.

Static wear-leveling

Filesystems often consist of a large number of static files that are read but not written. These files will occupy flash blocks that have no reason to be erased. If the majority of the files in flash are static, this will cause the remaining blocks containing dynamic data to wear at a dramatically increased rate.

ETFS notices these under-worked static blocks and forces them into service by copying their data to an over-worked block. This solves two problems: It gives the over-worked block a rest, since it now contains static data, and it forces the under-worked static block into the dynamic pool of blocks.

CRC error detection

Each transaction is protected by a cyclic redundancy check (CRC). This ensures quick detection of corrupted data, and forms the basis for the rollback operation of damaged or incomplete transactions at startup. The CRC can detect multiple bit errors that may occur during a power failure.

ECC error correction

On a CRC error, ETFS can apply error correction coding (ECC) to attempt to recover the data. This is suitable for NAND flash memory, in which single-bit errors may occur during normal usage. An ECC error is a warning signal that the flash block the error occurred in may be getting weak, i.e. loosing charge.

ETFS will mark the weak block for a refresh operation, which copies the data to a new flash block and erases the weak block. The erase recharges the flash block.

Read degradation monitoring with automatic refresh

Each read operation within a NAND flash block weakens the charge maintaining the data bits. Most devices support about 100,000 reads before there's danger of loosing a bit. The ECC will recover a single-bit error, but may not be able to recover multi-bit errors.

ETFS solves this by tracking reads and marking blocks for refresh before the 100,000 read limit is reached.

Transaction rollback

When ETFS starts, it processes all transactions and rolls back (discards) the last partial or damaged transaction. The rollback code is designed to handle a power failure during a rollback operation, thus allowing the system to recover from multiple nested faults. The validity of a transaction is protected by CRC codes on each transaction.

Atomic file operations

ETFS implements a very simple directory structure on the device, allowing significant modifications with a single flash write. For example, the move of a file or directory to another directory is often a multistage operation in most filesystems. In ETFS, a move is accomplished with a single flash write.

Automatic file defragmentation

Log-based filesystems often suffer from fragmentation, since each update or write to an existing file causes a new transaction to be created. ETFS uses write-buffering to combine small writes into larger write transactions in an attempt to minimize fragmentation caused by lots of very small transactions. ETFS also monitors the fragmentation level of each file and will do a background defrag operation on files that do become badly fragmented. Note that this background activity will always be preempted by a user data request in order to ensure immediate access to the file being defragmented.

QNX 4 filesystem

The QNX 4 filesystem (fs-qnx4.so) is a high-performance filesystem that shares the same on-disk structure as QNX 4.

The QNX 4 filesystem implements an extremely robust design, utilizing an extent-based, bitmap allocation scheme with fingerprint control structures to safeguard against data loss and to provide easy recovery. Features include:


Note: Since the release of 6.2.1, the 48-character filename limit has increased to 505 characters via a backwards-compatible extension. The same on-disk format is retained, but new systems will see the longer name, old ones will see a truncated 48-character name.

For more information, see "QNX 4 filesystem" in the Working With Filesystems chapter of the QNX Neutrino User's Guide.

DOS Filesystem

The DOS Filesystem, fs-dos.so, provides transparent access to DOS disks, so you can treat DOS filesystems as though they were POSIX filesystems. This transparency allows processes to operate on DOS files without any special knowledge or work on their part.

The structure of the DOS filesystem on disk is old and inefficient, and lacks many desirable features. Its only major virtue is its portability to DOS and Windows environments. You should choose this filesystem only if you need to transport DOS files to other machines that require it. Consider using the QNX filesystem alone if DOS file portability isn't an issue or in conjunction with the DOS filesystem if it is.

If there's no DOS equivalent to a POSIX feature, fs-dos.so will either return an error or a reasonable default. For example, an attempt to create a link() will result in the appropriate errno being returned. On the other hand, if there's an attempt to read the POSIX times on a file, fs-dos.so will treat any of the unsupported times the same as the last write time.

DOS version support

The fs-dos.so program supports both floppies and hard disk partitions from DOS version 2.1 to Windows 98 with long filenames.

DOS text files

DOS terminates each line in a text file with two characters (CR/LF), while POSIX (and most other) systems terminate each line with a single character (LF). Note that fs-dos.so makes no attempt to translate text files being read. Most utilities and programs won't be affected by this difference.

Note also that some very old DOS programs may use a Ctrl-Z (^Z) as a file terminator. This character is also passed through without modification.

QNX-to-DOS filename mapping

In DOS, a filename cannot contain any of the following characters:

/ \ [ ] : * | + = ; , ?

An attempt to create a file that contains one of these invalid characters will return an error. DOS (8.3 format) also expects all alphabetical characters to be uppercase, so fs-dos.so maps these characters to uppercase when creating a filename on disk. But it maps a filename to lowercase by default when returning a filename to a QNX Neutrino application, so that QNX Neutrino users and programs can always see and type lowercase (via the sfn=sfn_mode option).

Handling filenames

You can specify how you want fs-dos.so to handle long filenames (via the lfn=lfn_mode option):

If you use the ignore option, you can specify whether or not to silently truncate filename characters beyond the 8.3 limit.

International filenames

The DOS filesystem supports DOS "code pages" (international character sets) for locale filenames. Short 8.3 names are stored using a particular character set (typically the most common extended characters for a locale are encoded in the 8th-bit character range). All the common American as well as Western and Eastern European code pages (437, 850, 852, 866, 1250, 1251, 1252) are supported. If you produce software that must access a variety of DOS/Windows hard disks, or operate in non-US-English countries, this feature offers important portability -- filenames will be created with both a Unicode and locale name and are accessible via either name.


Note: The DOS filesystem supports international text in filenames only. No attempt is made to be aware of data contents, with the sole exception of Windows "shortcut" (.LNK) files, which will be parsed and translated into symbolic links if you've specified that option (lnk=lnk_mode).

DOS volume labels

DOS uses the concept of a volume label, which is an actual directory entry in the root of the DOS filesystem. To distinguish between the volume label and an actual DOS directory, fs-dos.so reports the volume label according to the way you specify its vollabel option. You can choose to:

DOS-QNX permission mapping

DOS doesn't support all the permission bits specified by POSIX. It has a READ_ONLY bit in place of separate READ and WRITE bits; it doesn't have an EXECUTE bit. When a DOS file is created, the DOS READ_ONLY bit will be set if all the POSIX WRITE bits are off. When a DOS file is accessed, the POSIX READ bit is always assumed to be set for user, group, and other.

Since you can't execute a file that doesn't have EXECUTE permission, fs-dos.so has an option (exe=exec_mode) that lets you specify how to handle the POSIX EXECUTE bit for executables.

File ownership

Although the DOS file structure doesn't support user IDs and group IDs, fs-dos.so (by default) won't return an error code if an attempt is made to change them. An error isn't returned because a number of utilities attempt to do this and failure would result in unexpected errors. The approach taken is "you can change anything to anything since it isn't written to disk anyway."

The posix= options let you set stricter POSIX checks and enable POSIX emulation. For example, in POSIX mode, an error of EINVAL is flagged for attempts to do any of the following:

If you set the posix option to emulate (the default) or strict, you get the following benefits:

CD-ROM filesystem

The CD-ROM filesystem provides transparent access to CD-ROM media, so you can treat CD-ROM filesystems as though they were POSIX filesystems. This transparency allows processes to operate on CD-ROM files without any special knowledge or work on their part.

The fs-cd.so manager implements the ISO 9660 standard as well as a number of extensions, including Rock Ridge (RRIP), Joliet (Microsoft), and multisession (Kodak Photo CD, enhanced audio).

FFS3 filesystem

The FFS3 filesystem drivers implement a POSIX-like filesystem on NOR flash memory devices. The drivers are standalone executables that contain both the flash filesystem code and the flash device code. There are versions of the FFS3 filesystem driver for different embedded systems hardware as well as PCMCIA memory cards.

The naming convention for the drivers is devf-system, where system describes the embedded system. For example, the devf-800fads driver is for the 800FADS PowerPC evaluation board.

To find out what flash devices we currently support, please refer to the following sources:

Customization

Along with the prebuilt flash filesystem drivers, including the "generic" driver (devf-generic), we provide the libraries and source code that you'll need to build custom flash filesystem drivers for different embedded systems. For information on how to do this, see the Customizing the Flash Filesystem chapter in the Building Embedded Systems book.

Organization

The FFS3 filesystem drivers support one or more logical flash drives. Each logical drive is called a socket, which consists of a contiguous and homogeneous region of flash memory. For example, in a system containing two different types of flash device at different addresses, where one flash device is used for the boot image and the other for the flash filesystem, each flash device would appear in a different socket.

Each socket may be divided into one or more partitions. Two types of partitions are supported: raw partitions and flash filesystem partitions.

Raw partitions

A raw partition in the socket is any partition that doesn't contain a flash filesystem. The driver doesn't recognize any filesystem types other than the flash filesystem. A raw partition may contain an image filesystem or some application-specific data.

The filesystem will make accessible through a raw mountpoint (see below) any partitions on the flash that aren't flash filesystem partitions. Note that the flash filesystem partitions are available as raw partitions as well.

Filesystem partitions

A flash filesystem partition contains the POSIX-like flash filesystem, which uses a QNX proprietary format to store the filesystem data on the flash devices. This format isn't compatible with either the Microsoft FFS2 or PCMCIA FTL specification.

The filesystem allows files and directories to be freely created and deleted. It recovers space from deleted files using a reclaim mechanism similar to garbage collection.

Mountpoints

When you start the flash filesystem driver, it will by default mount any partitions it finds in the socket. Note that you can specify the mountpoint using mkefs or flashctl (e.g. /flash).

Mountpoint Description
/dev/fsX raw mountpoint socket X
/dev/fsXpY raw mountpoint socket X partition Y
/fsXpY filesystem mountpoint socket X partition Y
/fsXpY/.cmp filesystem compressed mountpoint socket X partition Y

Features

The FFS3 filesystem supports many advanced features, such as POSIX compatibility, multiple threads, background reclaim, fault recovery, transparent decompression, endian-awareness, wear-leveling, and error-handling.

POSIX

The filesystem supports the standard POSIX functionality (including long filenames, access privileges, random writes, truncation, and symbolic links) with the following exceptions:

These design compromises allow this filesystem to remain small and simple, yet include most features normally found with block device filesystems.

Background reclaim

The FFS3 filesystem stores files and directories as a linked list of extents, which are marked for deletion as they're deleted or updated. Blocks to be reclaimed are chosen using a simple algorithm that finds the block with the most space to be reclaimed while keeping level the amount of wear of each individual block. This wear-leveling increases the MTBF (mean time between failures) of the flash devices, thus increasing their longevity.

The background reclaim process is performed when there isn't enough free space. The reclaim process first copies the contents of the reclaim block to an empty spare block, which then replaces the reclaim block. The reclaim block is then erased. Unlike rotating media with a mechanical head, proximity of data isn't a factor with a flash filesystem, so data can be scattered on the media without loss of performance.

Fault recovery

The filesystem has been designed to minimize corruption due to accidental loss-of-power faults. Updates to extent headers and erase block headers are always executed in carefully scheduled sequences. These sequences allow the recovery of the filesystem's integrity in the case of data corruption.

Note that properly designed flash hardware is essential for effective fault-recovery systems. In particular, special reset circuitry must be in place to hold the system in "reset" before power levels drop below critical. Otherwise, spurious or random bus activity can form write/erase commands and corrupt the flash beyond recovery.

Rename operations are guaranteed atomic, even through loss-of-power faults. This means, for example, that if you lost power while giving an image or executable a new name, you would still be able to access the file via its old name upon recovery.

When the FFS3 filesystem driver is started, it scans the state of every extent header on the media (in order to validate its integrity) and takes appropriate action, ranging from a simple block reclamation to the erasure of dangling extent links. This process is merged with the filesystem's normal mount procedure in order to achieve optimal bootstrap timings.

Compression/decompression

For fast and efficient compression/decompression, you can use the deflate and inflator utilities, which rely on popular deflate/inflate algorithms.

The deflate algorithm combines two algorithms. The first takes care of removing data duplication in files; the second algorithm handles data sequences that appear the most often by giving them shorter symbols. Those two algorithms provide excellent lossless compression of data and executable files. The inflate algorithm simply reverses what the deflate algorithm does.

The deflate utility is intended for use with the filter attribute for mkefs. You can also use it to precompress files intended for a flash filesystem.

The inflator resource manager sits in front of the other filesystems that were previously compressed using the deflate utility. It can almost double the effective size of the flash memory.

Compressed files can be manipulated with standard utilities such as cp or ftp -- they can display their compressed and uncompressed size with the ls utility if used with the proper mountpoint. These features make the management of a compressed flash filesystem seamless to a systems designer.

Flash errors

As flash hardware wears out, its write state-machine may find that it can't write or erase a particular bit cell. When this happens, the error status is propagated to the flash driver so it can take proper action (i.e. mark the bad area and try to write/erase in another place).

This error-handling mechanism is transparent. Note that after several flash errors, all writes and erases that fail will eventually render the flash read-only. Fortunately, this situation shouldn't happen before several years of flash operation. Check your flash specification and analyze your application's data flow to flash in order to calculate its potential longevity or MTBF.

Endian awareness

The FFS3 filesystem is endian-aware, making it portable across different platforms. The optimal approach is to use the mkefs utility to select the target's endian-ness.

Utilities

The filesystem supports all the standard POSIX utilities such as ls, mkdir, rm, ln, mv, and cp. There are also some QNX Neutrino utilities for managing the flash:

flashctl
Erase, format, and mount flash partitions.
deflate
Compress files for flash filesystems.
mkefs
Create flash filesystem image files.

System calls

The filesystem supports all the standard POSIX I/O functions such as open(), close(), read(), and write(). Special functions such as erasing are supported using the devctl() function.

NFS filesystem

The Network File System (NFS) allows a client workstation to perform transparent file access over a network. It allows a client workstation to operate on files that reside on a server across a variety of operating systems. Client file access calls are converted to NFS protocol requests, and are sent to the server over the network. The server receives the request, performs the actual filesystem operation, and sends a response back to the client.

The Network File System operates in a stateless fashion by using remote procedure calls (RPC) and TCP/IP for its transport. Therefore, to use fs-nfs2 or fs-nfs3, you'll also need to run the TCP/IP client for Neutrino.

Any POSIX limitations in the remote server filesystem will be passed through to the client. For example, the length of filenames may vary across servers from different operating systems. NFS (versions 2 and 3) limits filenames to 255 characters; mountd (versions 1 and 3) limits pathnames to 1024 characters.


Note: Although NFS (version 2) is older than POSIX, it was designed to emulate UNIX filesystem semantics and happens to be relatively close to POSIX.

CIFS filesystem

Formerly known as SMB, the Common Internet File System (CIFS) allows a client workstation to perform transparent file access over a network to a Windows 98 or NT system, or a UNIX system running an SMB server. Client file access calls are converted to CIFS protocol requests and are sent to the server over the network. The server receives the request, performs the actual filesystem operation, and sends a response back to the client.


Note: The CIFS protocol makes no attempt to conform to POSIX.

The fs-cifs manager uses TCP/IP for its transport. Therefore, to use fs-cifs (SMBfsys in QNX 4), you'll also need to run the TCP/IP client for Neutrino.

Linux Ext2 filesystem

The Ext2 filesystem (fs-ext2.so) provides transparent access to Linux disk partitions. This implementation supports the standard set of features found in Ext2 versions 0 and 1.

Sparse file support is included in order to be compatible with existing Linux partitions. Other filesystems can only be "stacked" read-only on top of sparse files. There are no such restrictions on normal files.

If an Ext2 filesystem isn't unmounted properly, a filesystem checker is usually responsible for cleaning up the next time the filesystem is mounted. Although the fs-ext2.so module is equipped to perform a quick test, it automatically mounts the filesystem as read-only if it detects any significant problems (which should be fixed using a filesystem checker).

Virtual filesystems

QNX Neutrino provides two types of virtual filesystems:

Package filesystem

Our package filesystem component is a virtual filesystem manager that presents a customized view of a set of files and directories to a client. The actual files are present on some medium (e.g. a server's hard disk) -- the package filesystem presents a "virtual" view of selected files to the client.

This can be useful in a networked environment, where a centralized server maintains a separate package of files for each client machine. Each client may want to see their own custom versions of certain files or unique combinations of files in a package.

Traditionally, you would achieve this either by copying all the files for all the packages into directories, with one directory per machine, or by creating various symbolic links.

By capitalizing on QNX Neutrino's pathname space and resource manager concepts, the package filesystem manager alleviates these problems. You can easily install a set of directories and files, and just as easily uninstall them, all without having to copy the files themselves.

What's in a package?

A package consists of a number of files and directories that are related to each other by virtue of being part of a product or of a particular release. For example, the QNX Neutrino OS, including its binary files and online documentation, is considered a package. A set of updates (called a patch) to the OS might contain only selected files and documentation (i.e. the files that have changed); this would be a different package.

The purpose of the package filesystem is to manage these packages in such a way that various nodes can pick and choose which particular packages they want to use. For example, one node might be an x86 box running, say, Patch B of a certain release, while another node might be a PowerPC box running Patch C.

Since these two machines are different CPU types, we need to give each of them not only different types of executables in their /bin directories (one set of x86 executables and one set of PowerPC executables), but also different contents based on the desired version of the software and patch levels.

Single definition file

In a nutshell, the package filesystem presents, on each node, a virtual filesystem that contains only selected files from the master package database. The selection is controlled by a definition file that indicates:

The advantage of a single definition file is clear: When the time comes to change the package lineup on a particular node, this can be accomplished by a simple (and fast!) change to the definition file.

Inflator

Inflator is a resource manager that sits in front of other filesystems and inflates files that were previously deflated (using the deflate utility).

The inflator utility is typically used when the underlying filesystem is a flash filesystem. Using it can almost double the effective size of the flash memory.

If a file is being opened for a read, inflator attempts to open the file itself on an underlying filesystem. It reads the first 16 bytes and checks for the signature of a deflated file. If the file was deflated, inflator places itself between the application and the underlying filesystem. All reads return the original file data before it was deflated.

From the application's point of view, the file appears to be uncompressed. Random seeks are also supported. If the application does a stat() on the file, the size of the inflated file (the original size before it was deflated) is returned.