Writing a Resource Manager

What is a resource manager?


Note:

This chapter assumes that you're familiar with message passing. If you're not, see the Neutrino Microkernel chapter in the System Architecture book as well as the MsgSend(), MsgReceivev(), and MsgReply() series of calls in the Library Reference.

The code samples used in this chapter are not always POSIX-compliant.


A resource manager is a user-level server program that accepts messages from other programs and, optionally, communicates with hardware. It's a process that registers a pathname prefix in the pathname space (e.g. /dev/ser1). Once the prefix is registered, other processes can open that name using the standard C library open() function, and then read() from, and write() to, the resulting file descriptor. When this happens, the resource manager receives an open request, followed by read and write requests.

A resource manager isn't restricted to handling just open(), read(), and write() calls -- it can support any functions that are based on a file descriptor or file pointer, as well as other forms of IPC.

In Neutrino, resource managers are responsible for presenting an interface to various types of devices. In other operating systems, the managing of actual hardware devices (e.g. serial ports, parallel ports, network cards, and disk drives) or virtual devices (e.g. /dev/null, a network filesystem, and pseudo-ttys), is associated with device drivers. But unlike device drivers, the Neutrino resource managers execute as processes separate from the kernel.


Note: A resource manager looks just like any other user-level program.

Adding resource managers in Neutrino won't affect any other part of the OS -- the drivers are developed and debugged like any other application. And since the resource managers are in their own protected address space, a bug in a device driver won't cause the entire OS to shut down.

If you've written device drivers in most UNIX variants, you're used to being restricted in what you can do within a device driver; but since a device driver in Neutrino is just a regular process, you aren't restricted in what you can do (except for the restrictions that exist inside an ISR).


Note: In order to register a prefix in the pathname space, a resource manager must be run as root.

A few examples...

A serial port may be managed by a resource manager called devc-ser8250, although the actual resource may be called /dev/ser1 in the pathname space. When a process requests serial port services, it does so by opening a serial port (in this case /dev/ser1).

fd = open("/dev/ser1", O_RDWR);
for (packet = 0; packet < npackets; packet++)
    write(fd, packets[packet], PACKET_SIZE);
close(fd);
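
The fragment above omits error handling. A slightly more defensive client might look like the following sketch. It uses only the POSIX open(), write(), and close() calls; send_packets() is a hypothetical helper introduced here for illustration, not a library function.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write npackets fixed-size packets to the device
 * registered at path. Returns the number of packets written in full,
 * or -1 if the device can't be opened.
 */
int send_packets(const char *path, const void *packets,
                 size_t npackets, size_t packet_size)
{
    const unsigned char *next = packets;
    size_t sent;
    int fd;

    assert(path != NULL && packets != NULL);

    /* open() resolves the pathname and connects to the server */
    fd = open(path, O_RDWR);
    if (fd == -1) {
        return -1;
    }

    /* each write() is sent to whichever server owns the descriptor */
    for (sent = 0; sent < npackets; sent++) {
        ssize_t n = write(fd, next + sent * packet_size, packet_size);
        if (n != (ssize_t)packet_size) {
            break;      /* error or short write: stop and report */
        }
    }

    close(fd);
    return (int)sent;
}
```

A caller would pass the device name, for example send_packets("/dev/ser1", packets, npackets, PACKET_SIZE), and compare the result against npackets.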

Because resource managers execute as processes, their use isn't restricted to device drivers -- any server can be written as a resource manager. For example, a server that's given DVD files to display in a GUI wouldn't be classified as a driver, yet it could be written as a resource manager. It can register the name /dev/dvd and, as a result, clients can do the following:

fd = open("/dev/dvd", O_WRONLY);
while ((data = get_dvd_data(handle, &nbytes)) != NULL) {
    bytes_written = write(fd, data, nbytes);
    if (bytes_written != nbytes) {
        perror ("Error writing the DVD data");
    }
}
close(fd);
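
The loop above treats a short write as an error. Because a server may accept only part of the data in a single request, a client can instead retry with the remainder. The following is a minimal sketch of that idea using only POSIX calls; write_all() is a hypothetical helper, not part of the C library.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write() until all nbytes have been
 * accepted. Returns 0 on success, or -1 with errno set on failure.
 */
int write_all(int fd, const void *data, size_t nbytes)
{
    const unsigned char *p = data;

    assert(data != NULL || nbytes == 0);

    while (nbytes > 0) {
        ssize_t n = write(fd, p, nbytes);

        if (n == -1) {
            if (errno == EINTR) {
                continue;       /* interrupted by a signal: retry */
            }
            return -1;          /* real error; errno is already set */
        }
        if (n == 0) {
            errno = EIO;        /* no progress: give up rather than spin */
            return -1;
        }
        p += n;                 /* skip what the server accepted */
        nbytes -= (size_t)n;
    }
    return 0;
}
```

In the DVD example, the body of the loop could then call write_all(fd, data, nbytes) and report an error only when it returns -1.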

Why write a resource manager?

Here are a few reasons why you'd want to write a resource manager:

Under the covers

Although you'll be using a resource manager API that hides many details from you, it's still important to understand what's going on under the covers. For example, your resource manager is a server that contains a MsgReceive() loop, and clients send you messages using MsgSend*(). This means that you must either reply to your clients in a timely fashion, or leave your clients blocked but save the rcvid for use in a later reply.

To help you understand, we'll discuss the events that occur under the covers for both the client and the resource manager.

Under the client's covers

When a client calls a function that requires pathname resolution (e.g. open(), rename(), stat(), or unlink()), the function sends messages to both the process manager and the resource manager to obtain a file descriptor. Once the file descriptor is obtained, the client can use it to send messages directly to the device associated with the pathname.

In the following, the file descriptor is obtained and then the client writes directly to the device:

/*
 * In this stage, the client talks 
 * to the process manager and the resource manager.
 */
fd = open("/dev/ser1", O_RDWR);

/*
 * In this stage, the client talks directly to the
 * resource manager.
 */
for (packet = 0; packet < npackets; packet++)
    write(fd, packets[packet], PACKET_SIZE);
close(fd);

For the above example, here's the description of what happened behind the scenes. We'll assume that a serial port is managed by a resource manager called devc-ser8250, that's been registered with the pathname prefix /dev/ser1:


Figure: Under-the-cover communication between the client, the process manager, and the resource manager.

  1. The client's library sends a "query" message. The open() in the client's library sends a message to the process manager asking it to look up a name (e.g. /dev/ser1).
  2. The process manager indicates who's responsible and it returns the nd, pid, chid, and handle that are associated with the pathname prefix.

    Here's what went on behind the scenes...
    When the devc-ser8250 resource manager registered its name (/dev/ser1) in the namespace, it called the process manager. The process manager is responsible for maintaining information about pathname prefixes. During registration, it adds an entry to its table that looks similar to this:

    0, 47167, 1, 0, 0, /dev/ser1
          

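    The table entries represent, in order: the node descriptor (nd), the process ID (pid), the channel ID (chid), the handle, the open type, and the pathname prefix.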

    A resource manager is uniquely identified by a node descriptor, process ID, and a channel ID. The process manager's table entry associates the resource manager with a name, a handle (to distinguish multiple names when a resource manager registers more than one name), and an open type.

    When the client's library issued the query call in step 1, the process manager looked through all of its tables for any registered pathname prefixes that match the name. If another resource manager had previously registered the name /, more than one match would be found: in this case, both / and /dev/ser1 match. The process manager replies to the open() with the list of matched servers or resource managers. The servers are then queried in turn about their handling of the path, with the longest match being asked first.

  3. The client's library sends a "connect" message to the resource manager. To do so, it must create a connection to the resource manager's channel:
    fd = ConnectAttach(nd, pid, chid, 0, 0);
          

    The file descriptor that's returned by ConnectAttach() is also a connection ID and is used for sending messages directly to the resource manager. In this case, it's used to send a connect message (_IO_CONNECT defined in <sys/iomsg.h>) containing the handle to the resource manager requesting that it open /dev/ser1.


    Note: Typically, only functions such as open() call ConnectAttach() with an index argument of 0. Most of the time, you should OR _NTO_SIDE_CHANNEL into this argument, so that the connection is made via a side channel, resulting in a connection ID that's greater than any valid file descriptor.

    When the resource manager gets the connect message, it performs validation using the access modes specified in the open() call (e.g. are you trying to write to a read-only device?).

  4. The resource manager generally responds with a pass (and open() returns with the file descriptor) or fail (the next server is queried).
  5. When the file descriptor is obtained, the client can use it to send messages directly to the device associated with the pathname.

    In the sample code, it looks as if the client opens and writes directly to the device. In fact, the write() call sends an _IO_WRITE message to the resource manager requesting that the given data be written, and the resource manager responds that it either wrote some or all of the data, or that the write failed.

Eventually, the client calls close(), which sends an _IO_CLOSE_DUP message to the resource manager. The resource manager handles this by doing some cleanup.
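
The prefix lookup described in step 2 can be pictured with a small, portable simulation. The process manager's real data structures aren't shown in this chapter, so struct prefix_entry and resolve_path() below are hypothetical names, and the whole-component matching rule is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: the fields named in step 2 plus the prefix. */
struct prefix_entry {
    int nd, pid, chid, handle, ftype;
    const char *prefix;
};

/* A prefix matches only on whole path components (assumption). */
static int prefix_matches(const char *prefix, const char *path)
{
    size_t len = strlen(prefix);

    if (len == 0 || strncmp(path, prefix, len) != 0) {
        return 0;
    }
    return prefix[len - 1] == '/' || path[len] == '\0' || path[len] == '/';
}

/*
 * Collect up to max_out matching entries in out[], longest prefix first,
 * which is the order in which the servers would be asked.
 * Returns the number of matches stored.
 */
size_t resolve_path(const struct prefix_entry *table, size_t n,
                    const char *path,
                    const struct prefix_entry **out, size_t max_out)
{
    size_t count = 0;

    assert(table != NULL && path != NULL && out != NULL);

    for (size_t i = 0; i < n && count < max_out; i++) {
        if (!prefix_matches(table[i].prefix, path)) {
            continue;
        }
        /* insertion sort: keep out[] ordered by descending prefix length */
        size_t j = count++;
        while (j > 0 &&
               strlen(out[j - 1]->prefix) < strlen(table[i].prefix)) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = &table[i];
    }
    return count;
}
```

With entries for / and /dev/ser1 in the table, resolving /dev/ser1 yields two matches, with the /dev/ser1 entry first.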

Under the resource manager's covers

The resource manager is a server that uses the Neutrino send/receive/reply messaging protocol to receive and reply to messages. The following is pseudo-code for a resource manager:

initialize the resource manager
register the name with the process manager
DO forever
    receive a message
    SWITCH on the type of message
        CASE _IO_CONNECT:
            call io_open handler
            ENDCASE
        CASE _IO_READ:
            call io_read handler
            ENDCASE
        CASE _IO_WRITE:
            call io_write handler
            ENDCASE
        .   /* etc. handle all other messages */
        .   /* that may occur, performing     */
        .   /* processing as appropriate      */
    ENDSWITCH
ENDDO

Many of the details in the above pseudo-code are hidden from you by a resource manager library that you'll use. For example, you won't actually call a MsgReceive*() function -- you'll call a library function, such as resmgr_block() or dispatch_block(), that does it for you. If you're writing a single-threaded resource manager, you might provide a message handling loop, but if you're writing a multi-threaded resource manager, the loop is hidden from you.

You don't need to know the format of all the possible messages, and you don't have to handle them all. Instead, you register "handler functions," and when a message of the appropriate type arrives, the library calls your handler. For example, suppose you want a client to get data from you using read() -- you'll write a handler that's called whenever an _IO_READ message is received. Since your handler handles _IO_READ messages, we'll call it an "io_read handler."

The resource manager library:

  1. Receives the message.
  2. Examines the message to verify that it's an _IO_READ message.
  3. Calls your io_read handler.

However, it's still your responsibility to reply to the _IO_READ message. You can do that from within your io_read handler, or later on when data arrives (possibly as the result of an interrupt from some data-generating hardware).

The library does default handling for any messages that you don't want to handle. After all, most resource managers don't care about presenting proper POSIX filesystems to the clients. When writing them, you want to concentrate on the code for talking to the device you're controlling. You don't want to spend a lot of time worrying about the code for presenting a proper POSIX filesystem to the client.
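
The handler-table idea can be sketched in portable C. This is only a simulation of the concept, not the resource manager library: the message types, struct handler_table, and handle_message() are hypothetical, and the default handler here simply fails with ENOSYS, whereas the real library's defaults do useful work.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical message types standing in for the real _IO_* values. */
enum msg_type { MSG_CONNECT, MSG_READ, MSG_WRITE, MSG_NTYPES };

struct message {
    enum msg_type type;
    size_t        nbytes;   /* payload size, used by the write handler */
};

typedef int (*handler_fn)(const struct message *msg, void *state);

struct handler_table {
    handler_fn fn[MSG_NTYPES];
};

/* Used for every message type the server doesn't override. */
static int default_handler(const struct message *msg, void *state)
{
    (void)msg;
    (void)state;
    return ENOSYS;
}

/* Fill every slot with the default, as an init function would. */
void handler_table_init(struct handler_table *t)
{
    assert(t != NULL);
    for (int i = 0; i < MSG_NTYPES; i++) {
        t->fn[i] = default_handler;
    }
}

/* The "SWITCH on the type of message" step, done by table lookup. */
int handle_message(const struct handler_table *t,
                   const struct message *msg, void *state)
{
    assert(t != NULL && msg != NULL);
    if ((int)msg->type < 0 || (int)msg->type >= MSG_NTYPES) {
        return ENOSYS;
    }
    return t->fn[msg->type](msg, state);
}

/* Example override: an io_write-style handler that counts bytes. */
int count_bytes_written(const struct message *msg, void *state)
{
    *(size_t *)state += msg->nbytes;
    return 0;
}
```

A server built this way calls handler_table_init(), replaces only the slots it cares about (for example the MSG_WRITE slot), and leaves the rest to the defaults.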

The types of resource managers

In considering how much work you want to do yourself in order to present a proper POSIX filesystem to the client, you can break resource managers into two types:

Device resource managers

Device resource managers create only single-file entries in the filesystem, each of which is registered with the process manager. Each name usually represents a single device. These resource managers typically rely on the resource-manager library to do most of the work in presenting a POSIX device to the user.

For example, a serial port driver registers names such as /dev/ser1 and /dev/ser2. When the user does ls -l /dev, the library does the necessary handling to respond to the resulting _IO_STAT messages with the proper information. The person who writes the serial port driver is able to concentrate instead on the details of managing the serial port hardware.

Filesystem resource managers

Filesystem resource managers register a mountpoint with the process manager. A mountpoint is the portion of the path that's registered with the process manager. The remaining parts of the path are managed by the filesystem resource manager. For example, when a filesystem resource manager attaches a mountpoint at /mount, and the path /mount/home/thomasf is examined:

/mount/
Identifies the mountpoint that's managed by the process manager.
home/thomasf
Identifies the remaining part that's to be managed by the filesystem resource manager.
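
As a rough illustration of that split, the following portable sketch separates a path into its mountpoint and the remainder that a filesystem resource manager would have to interpret. The function path_remainder() is hypothetical and isn't part of the resource manager library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if path lies under mountpoint, return a pointer to
 * the remaining part of the path (an empty string for the mountpoint
 * itself); otherwise return NULL.
 */
const char *path_remainder(const char *mountpoint, const char *path)
{
    size_t len;

    assert(mountpoint != NULL && path != NULL);

    len = strlen(mountpoint);
    /* ignore one trailing slash so "/mount" and "/mount/" behave alike */
    if (len > 0 && mountpoint[len - 1] == '/') {
        len--;
    }
    if (strncmp(path, mountpoint, len) != 0) {
        return NULL;
    }
    if (path[len] == '\0') {
        return path + len;          /* the mountpoint itself */
    }
    if (path[len] == '/') {
        return path + len + 1;      /* skip the separating slash */
    }
    return NULL;                    /* e.g. "/mountain" isn't under "/mount" */
}
```

For the example above, path_remainder("/mount", "/mount/home/thomasf") returns a pointer to "home/thomasf".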

Examples of using filesystem resource managers are:

Components of a resource manager

A resource manager is composed of some of the following layers:

iofunc layer

This top layer consists of a set of functions that take care of most of the POSIX filesystem details for you -- they provide a POSIX-personality. If you're writing a device resource manager, you'll want to use this layer so that you don't have to worry too much about the details involved in presenting a POSIX filesystem to the world.

This layer consists of default handlers that the resource manager library uses if you don't provide a handler. For example, if you don't provide an io_open handler, iofunc_open_default() is called.

It also contains helper functions that the default handlers call. If you override the default handlers with your own, you can still call these helper functions. For example, if you provide your own io_read handler, you can call iofunc_read_verify() at the start of it to make sure that the client has access to the resource.

The names of the functions and structures for this layer have the form iofunc_*. The header file is <sys/iofunc.h>. For more information, see the Library Reference.

resmgr layer

This layer manages most of the resource manager library details. It:

If you don't use this layer, then you'll have to parse the messages yourself. Most resource managers use this layer.

The names of the functions and structures for this layer have the form resmgr_*. The header file is <sys/resmgr.h>. For more information, see the Library Reference.


Figure: Resource manager layer. You can use the resmgr layer to handle _IO_* messages.

dispatch layer

This layer acts as a single blocking point for a number of different types of things. With this layer, you can handle:

_IO_* messages
It uses the resmgr layer for this.
select
Processes that do TCP/IP often call select() to block while waiting for packets to arrive, or for there to be room for writing more data. With the dispatch layer, you register a handler function that's called when a packet arrives. The functions for this are the select_*() functions.
pulses
As with the other layers, you register a handler function that's called when a specific pulse arrives. The functions for this are the pulse_*() functions.
other messages
You can give the dispatch layer a range of message types that you make up, and a handler. So if a message arrives and the first few bytes of the message contain a type in the given range, the dispatch layer calls your handler. The functions for this are the message_*() functions.

Figure: Dispatch layer. You can use the dispatch layer to handle _IO_* messages, select, pulses, and other messages.

The following describes the manner in which messages are handled via the dispatch layer (or more precisely, through dispatch_handler()). Depending on the blocking type, the handler may call the message_*() subsystem. A search is made, based on the message type or pulse code, for a matching function that was attached using message_attach() or pulse_attach(). If a match is found, the attached function is called.

If the message type is in the range handled by the resource manager (I/O messages) and pathnames were attached using resmgr_attach(), the resource manager subsystem is called and handles the resource manager message.

If a pulse is received, it may be dispatched to the resource manager subsystem if it's one of the codes handled by a resource manager (UNBLOCK and DISCONNECT pulses). If a select_attach() is done and the pulse matches the one used by select, then the select subsystem is called and dispatches that event.

If a message is received and no matching handler is found for that message type, MsgError(ENOSYS) is returned to unblock the sender.
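
The range-based lookup can be modelled in a few lines of portable C. This is a sketch of the idea only: struct range_entry and dispatch_by_type() are hypothetical, the type values in the usage note are arbitrary, and the real dispatch layer also routes pulses and resource manager messages as described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*range_handler_fn)(int type, void *arg);

/* Hypothetical record of one attached handler and its type range. */
struct range_entry {
    int              lo;    /* lowest message type handled  */
    int              hi;    /* highest message type handled */
    range_handler_fn fn;
    void            *arg;   /* passed through to the handler */
};

/*
 * Call the first handler whose range covers type. If none matches,
 * return ENOSYS, mirroring the error used to unblock the sender.
 */
int dispatch_by_type(const struct range_entry *table, size_t n, int type)
{
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (type >= table[i].lo && type <= table[i].hi) {
            return table[i].fn(type, table[i].arg);
        }
    }
    return ENOSYS;
}

/* Example handler: remember the last type seen. */
int record_type(int type, void *arg)
{
    *(int *)arg = type;
    return 0;
}
```

For instance, with one entry covering 0x5000 through 0x5fff, a message of type 0x5001 reaches record_type(), while type 0x6000 falls through to ENOSYS.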

thread pool layer

This layer allows you to have a single- or multi-threaded resource manager. This means that one thread can be handling a write() while another thread handles a read().

You provide the blocking function for the threads to use as well as the handler function that's to be called when the blocking function returns. Most often, you give it the dispatch layer's functions. However, you can also give it the resmgr layer's functions or your own.

You can use this layer independently of the resource manager layer.

Simple examples of device resource managers

The following are two complete but simple examples of a device resource manager:


Note: As you read through this chapter, you'll encounter many code snippets. Most of these code snippets have been written so that they can be combined with either of these simple resource managers.

Both of these simple device resource managers model their functionality after that provided by /dev/null:

Single-threaded device resource manager example

Here's the complete code for a simple single-threaded device resource manager:

#include <errno.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>     /* for memset() */
#include <unistd.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>

static resmgr_connect_funcs_t    connect_funcs;
static resmgr_io_funcs_t         io_funcs;
static iofunc_attr_t             attr;

int main(int argc, char **argv)
{
    /* declare variables we'll be using */
    resmgr_attr_t        resmgr_attr;
    dispatch_t           *dpp;
    dispatch_context_t   *ctp;
    int                  id;

    /* initialize dispatch interface */
    if((dpp = dispatch_create()) == NULL) {
        fprintf(stderr,
                "%s: Unable to allocate dispatch handle.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    /* initialize resource manager attributes */
    memset(&resmgr_attr, 0, sizeof resmgr_attr);
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    /* initialize functions for handling messages */
    iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs, 
                     _RESMGR_IO_NFUNCS, &io_funcs);

    /* initialize attribute structure used by the device */
    iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);

    /* attach our device name */
    id = resmgr_attach(
            dpp,            /* dispatch handle        */
            &resmgr_attr,   /* resource manager attrs */
            "/dev/sample",  /* device name            */
            _FTYPE_ANY,     /* open type              */
            0,              /* flags                  */
            &connect_funcs, /* connect routines       */
            &io_funcs,      /* I/O routines           */
            &attr);         /* handle                 */
    if(id == -1) {
        fprintf(stderr, "%s: Unable to attach name.\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* allocate a context structure */
    ctp = dispatch_context_alloc(dpp);

    /* start the resource manager message loop */
    while(1) {
        if((ctp = dispatch_block(ctp)) == NULL) {
            fprintf(stderr, "block error\n");
            return EXIT_FAILURE;
        }
        dispatch_handler(ctp);
    }
}

Note: Include <sys/dispatch.h> after <sys/iofunc.h> to avoid warnings about redefining the members of some functions.

Let's examine the sample code step-by-step.

Here's an outline of the steps we followed:

Initialize the dispatch interface

/* initialize dispatch interface */
if((dpp = dispatch_create()) == NULL) {
    fprintf(stderr, "%s: Unable to allocate dispatch handle.\n",
            argv[0]);
    return EXIT_FAILURE;
}

We need to set up a mechanism so that clients can send messages to the resource manager. This is done via the dispatch_create() function which creates and returns the dispatch structure. This structure contains the channel ID. Note that the channel ID isn't actually created until you attach something, as in resmgr_attach(), message_attach(), and pulse_attach().


Note: The dispatch structure (of type dispatch_t) is opaque; you can't access its contents directly. Use message_connect() to create a connection using this hidden channel ID.

Initialize the resource manager attributes

/* initialize resource manager attributes */
memset(&resmgr_attr, 0, sizeof resmgr_attr);
resmgr_attr.nparts_max = 1;
resmgr_attr.msg_max_size = 2048;

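The resource manager attribute structure is used to configure how many IOV structures are available for server replies (nparts_max) and the minimum receive buffer size (msg_max_size).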

For more information, see resmgr_attach() in the Library Reference.

Initialize functions used to handle messages

/* initialize functions for handling messages */
iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs, 
                 _RESMGR_IO_NFUNCS, &io_funcs);

Here we supply two tables that specify which function to call when a particular message arrives: the connect functions table (connect_funcs) and the I/O functions table (io_funcs).

Instead of filling in these tables manually, we call iofunc_func_init() to place the iofunc_*_default() handler functions into the appropriate spots.

Initialize the attribute structure used by the device

/* initialize attribute structure used by the device */
iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);

The attribute structure contains information about our particular device associated with the name /dev/sample. It contains at least the following information:

Effectively, this is a per-name data structure. Later on, we'll see how you could extend the structure to include your own per-device information.

Put a name into the namespace

/* attach our device name */
id = resmgr_attach(dpp,            /* dispatch handle        */
                   &resmgr_attr,   /* resource manager attrs */
                   "/dev/sample",  /* device name            */
                   _FTYPE_ANY,     /* open type              */
                   0,              /* flags                  */
                   &connect_funcs, /* connect routines       */
                   &io_funcs,      /* I/O routines           */
                   &attr);         /* handle                 */
if(id == -1) {
    fprintf(stderr, "%s: Unable to attach name.\n", argv[0]);
    return EXIT_FAILURE;
}

Before a resource manager can receive messages from other programs, it needs to inform the other programs (via the process manager) that it's the one responsible for a particular pathname prefix. This is done via pathname registration. When registered, other processes can find and connect to this process using the registered name.

In this example, a serial port may be managed by a resource manager called devc-xxx, but the actual resource is registered as /dev/sample in the pathname space. Therefore, when a program requests serial port services, it opens the /dev/sample serial port.

We'll look at the parameters in turn, skipping the ones we've already discussed.

device name
Name associated with our device (i.e. /dev/sample).
open type
Specifies the constant value of _FTYPE_ANY. This tells the process manager that our resource manager will accept any type of open request -- we're not limiting the kinds of connections we're going to be handling.

Some resource managers legitimately limit the types of open requests they handle. For instance, the POSIX message queue resource manager accepts only open messages of type _FTYPE_MQUEUE.

flags
Controls the process manager's pathname resolution behavior. By specifying a value of zero, we'll only accept requests for the name "/dev/sample".

Allocate the context structure

/* allocate a context structure */
ctp = dispatch_context_alloc(dpp);

The context structure contains a buffer where messages will be received. The size of the buffer was set when we initialized the resource manager attribute structure. The context structure also contains a buffer of IOVs that the library can use for replying to messages. The number of IOVs was set when we initialized the resource manager attribute structure.

For more information, see dispatch_context_alloc() in the Library Reference.
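
To make the relationship between the two attribute values and the context concrete, here's a portable sketch of a context-like structure. It isn't the library's dispatch_context_t, whose layout isn't shown here; struct sample_context and its functions are hypothetical, and only struct iovec from <sys/uio.h> is a real type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical stand-in for a context: one receive buffer plus reply IOVs. */
struct sample_context {
    size_t         msg_max_size;  /* size of the receive buffer      */
    size_t         nparts_max;    /* number of IOVs for replies      */
    unsigned char *msg;           /* where incoming messages land    */
    struct iovec  *iov;           /* scatter/gather parts of a reply */
};

void sample_context_free(struct sample_context *ctx)
{
    if (ctx != NULL) {
        free(ctx->msg);
        free(ctx->iov);
        free(ctx);
    }
}

/* Allocate a context sized from the two resource manager attributes. */
struct sample_context *sample_context_alloc(size_t nparts_max,
                                            size_t msg_max_size)
{
    struct sample_context *ctx;

    assert(nparts_max > 0 && msg_max_size > 0);

    ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL) {
        return NULL;
    }
    ctx->msg = calloc(1, msg_max_size);
    ctx->iov = calloc(nparts_max, sizeof *ctx->iov);
    if (ctx->msg == NULL || ctx->iov == NULL) {
        sample_context_free(ctx);
        return NULL;
    }
    ctx->msg_max_size = msg_max_size;
    ctx->nparts_max = nparts_max;
    return ctx;
}
```

With the values used in the sample (nparts_max of 1 and msg_max_size of 2048), this would yield a 2048-byte receive buffer and a single reply IOV.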

Start the resource manager message loop

/* start the resource manager message loop */
while(1) {
    if((ctp = dispatch_block(ctp)) == NULL) {
        fprintf(stderr, "block error\n");
        return EXIT_FAILURE;
    }
    dispatch_handler(ctp);
}

Once the resource manager establishes its name, it receives messages when any client program tries to perform an operation (e.g. open(), read(), write()) on that name. In our example, once /dev/sample is registered, and a client program executes:

fd = open ("/dev/sample", O_RDONLY);

the client's C library constructs an _IO_CONNECT message which it sends to our resource manager. Our resource manager receives the message within the dispatch_block() function. We then call dispatch_handler() which decodes the message and calls the appropriate handler function based on the connect and I/O function tables that we passed in previously. After dispatch_handler() returns, we go back to the dispatch_block() function to wait for another message.

At some later time, when the client program executes:

read (fd, buf, BUFSIZ);

the client's C library constructs an _IO_READ message, which is then sent directly to our resource manager, and the decoding cycle repeats.
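
From the client's point of view, the whole exchange is ordinary POSIX I/O. The following sketch wraps the open(), read(), and close() sequence with error checks; read_device() is a hypothetical helper. Against a device that behaves like /dev/null, the read is expected to return 0 (end of file).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the device at path, read up to size bytes
 * into buf, and close it again. Returns the number of bytes read
 * (0 means end of file), or -1 if open() or read() fails.
 */
long read_device(const char *path, void *buf, size_t size)
{
    ssize_t n;
    int fd;

    assert(path != NULL && buf != NULL);

    /* the connect message is sent here */
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }

    /* the read request is sent here */
    n = read(fd, buf, size);

    /* the close request is sent here */
    close(fd);
    return (long)n;
}
```

A client of the sample resource manager would call read_device("/dev/sample", buf, sizeof buf).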

Multi-threaded device resource manager example

Here's the complete code for a simple multi-threaded device resource manager:

#include <errno.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>     /* for memset() */
#include <unistd.h>

/*
 * define THREAD_POOL_PARAM_T such that we can avoid a compiler
 * warning when we use the dispatch_*() functions below
 */
#define THREAD_POOL_PARAM_T dispatch_context_t

#include <sys/iofunc.h>
#include <sys/dispatch.h>

static resmgr_connect_funcs_t    connect_funcs;
static resmgr_io_funcs_t         io_funcs;
static iofunc_attr_t             attr;

int main(int argc, char **argv)
{
    /* declare variables we'll be using */
    thread_pool_attr_t   pool_attr;
    resmgr_attr_t        resmgr_attr;
    dispatch_t           *dpp;
    thread_pool_t        *tpp;
    dispatch_context_t   *ctp;
    int                  id;

    /* initialize dispatch interface */
    if((dpp = dispatch_create()) == NULL) {
        fprintf(stderr, "%s: Unable to allocate dispatch handle.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    /* initialize resource manager attributes */
    memset(&resmgr_attr, 0, sizeof resmgr_attr);
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    /* initialize functions for handling messages */
    iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs, 
                     _RESMGR_IO_NFUNCS, &io_funcs);

    /* initialize attribute structure used by the device */
    iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);

    /* attach our device name */
    id = resmgr_attach(dpp,            /* dispatch handle        */
                       &resmgr_attr,   /* resource manager attrs */
                       "/dev/sample",  /* device name            */
                       _FTYPE_ANY,     /* open type              */
                       0,              /* flags                  */
                       &connect_funcs, /* connect routines       */
                       &io_funcs,      /* I/O routines           */
                       &attr);         /* handle                 */
    if(id == -1) {
        fprintf(stderr, "%s: Unable to attach name.\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* initialize thread pool attributes */
    memset(&pool_attr, 0, sizeof pool_attr);
    pool_attr.handle = dpp;
    pool_attr.context_alloc = dispatch_context_alloc;
    pool_attr.block_func = dispatch_block; 
    pool_attr.unblock_func = dispatch_unblock;
    pool_attr.handler_func = dispatch_handler;
    pool_attr.context_free = dispatch_context_free;
    pool_attr.lo_water = 2;
    pool_attr.hi_water = 4;
    pool_attr.increment = 1;
    pool_attr.maximum = 50;

    /* allocate a thread pool handle */
    if((tpp = thread_pool_create(&pool_attr, 
                                 POOL_FLAG_EXIT_SELF)) == NULL) {
        fprintf(stderr, "%s: Unable to initialize thread pool.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    /* start the threads, will not return */
    thread_pool_start(tpp);
}

Most of the code is the same as in the single-threaded example, so we'll cover only those parts that aren't described above. Also, we'll go into more detail on multi-threaded resource managers later in this chapter, so we'll keep the details here to a minimum.

Here's an outline of the steps we'll cover:

For this code sample, the threads are using the dispatch_*() functions (i.e. the dispatch layer) for their blocking loops.

Define THREAD_POOL_PARAM_T

/*
 * define THREAD_POOL_PARAM_T such that we can avoid a compiler
 * warning when we use the dispatch_*() functions below
 */
#define THREAD_POOL_PARAM_T dispatch_context_t

#include <sys/iofunc.h>
#include <sys/dispatch.h>

The THREAD_POOL_PARAM_T manifest tells the compiler what type of parameter is passed between the various blocking/handling functions that the threads will be using. This parameter should be the context structure used for passing context information between the functions. By default it is defined as a resmgr_context_t but since this sample is using the dispatch layer, we need it to be a dispatch_context_t. We define it prior to doing the includes above since the header files refer to it.

Initialize thread pool attributes

/* initialize thread pool attributes */
memset(&pool_attr, 0, sizeof pool_attr);
pool_attr.handle = dpp;
pool_attr.context_alloc = dispatch_context_alloc;
pool_attr.block_func = dispatch_block;
pool_attr.unblock_func = dispatch_unblock;
pool_attr.handler_func = dispatch_handler;
pool_attr.context_free = dispatch_context_free;
pool_attr.lo_water = 2;
pool_attr.hi_water = 4;
pool_attr.increment = 1;
pool_attr.maximum = 50;

The thread pool attributes tell the threads which functions to use for their blocking loop and control how many threads should be in existence at any time. We'll cover these attributes in more detail when we discuss multi-threaded resource managers later in this chapter.

Allocate a thread pool handle

/* allocate a thread pool handle */
if((tpp = thread_pool_create(&pool_attr, 
                             POOL_FLAG_EXIT_SELF)) == NULL) {
    fprintf(stderr, "%s: Unable to initialize thread pool.\n",
            argv[0]);
    return EXIT_FAILURE;
}

The thread pool handle is used to control the thread pool. Amongst other things, it contains the given attributes and flags. The thread_pool_create() function allocates and fills in this handle.

Start the threads

/* start the threads, will not return */
thread_pool_start(tpp);

The thread_pool_start() function starts up the thread pool. Each newly created thread allocates a context structure of the type defined by THREAD_POOL_PARAM_T using the context_alloc function we gave above in the attribute structure. Each thread then blocks on the block_func and, when the block_func returns, calls the handler_func, both of which were also given through the attributes structure. Each thread essentially does the same thing that the single-threaded resource manager above does for its message loop.

From this point on, your resource manager is ready to handle messages. Since we gave the POOL_FLAG_EXIT_SELF flag to thread_pool_create(), once the threads have been started up, pthread_exit() will be called and this calling thread will exit.
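
The per-thread loop can be sketched without any threads at all. The following portable simulation runs one worker's loop using function pointers named after the attribute fields; struct pool_funcs, worker_loop(), and the demo_*() stand-ins are hypothetical. Unlike a real pool thread, this loop stops when the blocking function returns NULL, an assumption made so the sketch terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the thread pool attributes. */
struct pool_funcs {
    void  *handle;
    void *(*context_alloc)(void *handle);
    void *(*block_func)(void *ctx);
    int   (*handler_func)(void *ctx);
    void  (*context_free)(void *ctx);
};

/* One worker: allocate a context, then block and handle until told to stop.
 * Returns the number of messages handled, or -1 if allocation fails. */
int worker_loop(const struct pool_funcs *f)
{
    void *ctx;
    int handled = 0;

    assert(f != NULL && f->context_alloc != NULL && f->block_func != NULL &&
           f->handler_func != NULL && f->context_free != NULL);

    ctx = f->context_alloc(f->handle);
    if (ctx == NULL) {
        return -1;
    }
    for (;;) {
        void *next = f->block_func(ctx);    /* wait for a message */
        if (next == NULL) {
            break;                          /* sketch only: stop here */
        }
        ctx = next;
        f->handler_func(ctx);               /* process the message */
        handled++;
    }
    f->context_free(ctx);
    return handled;
}

/* Demo stand-ins: the "context" is just a countdown of pending messages. */
struct demo_context { int pending; int handled; };

void *demo_alloc(void *handle) { return handle; }

void *demo_block(void *ctx)
{
    struct demo_context *c = ctx;
    return c->pending > 0 ? ctx : NULL;
}

int demo_handler(void *ctx)
{
    struct demo_context *c = ctx;
    c->pending--;
    c->handled++;
    return 0;
}

void demo_free(void *ctx) { (void)ctx; }
```

In the sample resource manager, the corresponding slots are filled with dispatch_context_alloc, dispatch_block, dispatch_handler, and dispatch_context_free.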

Data carrying structures

The resource manager library defines several key structures for carrying data:

This picture may help explain their interrelationships:


Figure: Multiple clients with multiple OCBs, all linked to one mount structure.

The Open Control Block (OCB) structure

The Open Control Block (OCB) maintains the state information about a particular session involving a client and a resource manager. It's created during open handling and exists until a close is performed.

This structure is used by the iofunc layer helper functions. (Later on, we'll show you how to extend this to include your own data.)

The OCB structure contains at least the following:

typedef struct _iofunc_ocb {
    IOFUNC_ATTR_T   *attr;
    int32_t         ioflag;
    off_t           offset;
    uint16_t        sflag;
    uint16_t        flags;
} iofunc_ocb_t;

where the values represent:

attr
A pointer to the attribute structure (see below).
ioflag
Contains the mode (e.g. reading, writing, blocking) that the resource was opened with. This information is inherited from the io_connect_t structure that's available in the message passed to the open handler.
offset
User-modifiable. Defines the read/write offset into the resource (e.g. our current lseek() position within a file).
sflag
Defines the sharing mode. This information is inherited from the io_connect_t structure that's available in the message passed to the open handler.
flags
User-modifiable. When the IOFUNC_OCB_PRIVILEGED bit is set, a privileged process (i.e. root) performed the open(). Additionally, you can use flags in the range IOFUNC_OCB_FLAGS_PRIVATE (see <sys/iofunc.h>) for your own purposes.

The attribute structure

The iofunc_attr_t structure defines the characteristics of the device that you're supplying the resource manager for. This is used in conjunction with the OCB structure.

The attribute structure contains at least the following:

typedef struct _iofunc_attr {
    IOFUNC_MOUNT_T            *mount;
    uint32_t                  flags;
    int32_t                   lock_tid;
    uint16_t                  lock_count;
    uint16_t                  count;
    uint16_t                  rcount;
    uint16_t                  wcount;
    uint16_t                  rlocks;
    uint16_t                  wlocks;
    struct _iofunc_mmap_list  *mmap_list;
    struct _iofunc_lock_list  *lock_list;
    void                      *list;
    uint32_t                  list_size;
    off_t                     nbytes;
    ino_t                     inode;
    uid_t                     uid;
    gid_t                     gid;
    time_t                    mtime;
    time_t                    atime;
    time_t                    ctime;
    mode_t                    mode;
    nlink_t                   nlink;
    dev_t                     rdev;
} iofunc_attr_t;

where the values represent:

mount
A pointer to the mount structure.
flags
The bit-mapped flags member contains the following flags:
IOFUNC_ATTR_ATIME
The access time is no longer valid. Typically set on a read from the resource.
IOFUNC_ATTR_CTIME
The change of status time is no longer valid. Typically set on a file info change.
IOFUNC_ATTR_DIRTY_NLINK
The number of links has changed.
IOFUNC_ATTR_DIRTY_MODE
The mode has changed.
IOFUNC_ATTR_DIRTY_OWNER
The uid or the gid has changed.
IOFUNC_ATTR_DIRTY_RDEV
The rdev member has changed, e.g. mknod().
IOFUNC_ATTR_DIRTY_SIZE
The size has changed.
IOFUNC_ATTR_DIRTY_TIME
One or more of mtime, atime, or ctime has changed.
IOFUNC_ATTR_MTIME
The modification time is no longer valid. Typically set on a write to the resource.

Since your resource manager uses these flags, you can tell right away which fields of the attribute structure have been modified by the various iofunc-layer helper routines. That way, if you need to write the entries to some medium, you can write just those that have changed. The user-defined area for flags is IOFUNC_ATTR_PRIVATE (see <sys/iofunc.h>).
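As a rough illustration of writing back only what has changed, here's a minimal portable sketch. The DEMO_DIRTY_* constants, demo_dirty_attr_t, and demo_flush() are hypothetical names invented for this example; a real resource manager would test the IOFUNC_ATTR_DIRTY_* bits in the flags member of its iofunc_attr_t instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for some of the IOFUNC_ATTR_DIRTY_* bits. */
#define DEMO_DIRTY_SIZE   0x01u
#define DEMO_DIRTY_OWNER  0x02u
#define DEMO_DIRTY_MODE   0x04u
#define DEMO_DIRTY_ALL    (DEMO_DIRTY_SIZE | DEMO_DIRTY_OWNER | DEMO_DIRTY_MODE)

/* Hypothetical, cut-down attribute structure. */
typedef struct {
    uint32_t flags;     /* bit-mapped dirty flags */
    long     nbytes;
    int      uid;
    int      gid;
    unsigned mode;
} demo_dirty_attr_t;

/*
 *  Write back only the groups of fields whose dirty bit is set,
 *  then clear those bits. Returns how many groups were written.
 */
int demo_flush(demo_dirty_attr_t *attr)
{
    int written = 0;

    assert(attr != NULL);

    if (attr->flags & DEMO_DIRTY_SIZE) {
        /* store attr->nbytes on the medium here */
        written++;
    }
    if (attr->flags & DEMO_DIRTY_OWNER) {
        /* store attr->uid and attr->gid on the medium here */
        written++;
    }
    if (attr->flags & DEMO_DIRTY_MODE) {
        /* store attr->mode on the medium here */
        written++;
    }

    /* everything we track is now in sync with the medium */
    attr->flags &= ~DEMO_DIRTY_ALL;
    return written;
}
```

Calling demo_flush() a second time without any intervening change writes nothing, which is the point of keeping the flags.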

For details on updating your attribute structure, see the section on "Updating the time for reads and writes" below.

lock_tid and lock_count
To support multiple threads in your resource manager, you'll need to lock the attribute structure so that only one thread at a time is allowed to change it. The resource manager layer automatically locks the attribute (using iofunc_attr_lock()) for you when certain handler functions are called (i.e. IO_*). The lock_tid member holds the thread ID; the lock_count member holds the number of times the thread has locked the attribute structure. (For more information, see the iofunc_attr_lock() and iofunc_attr_unlock() functions in the Library Reference.)
count, rcount, wcount, rlocks and wlocks
Several counters are stored in the attribute structure and are incremented/decremented by some of the iofunc layer helper functions. Both the functionality and the actual contents of the message received from the client determine which specific members are affected.
Each counter tracks the number of:
count
OCBs using this attribute in any manner. When this count goes to zero, it means that no one is using this attribute.
rcount
OCBs using this attribute for reading.
wcount
OCBs using this attribute for writing.
rlocks
Read locks currently registered on the attribute.
wlocks
Write locks currently registered on the attribute.

These counts aren't exclusive. For example, if an OCB has specified that the resource is opened for reading and writing, then count, rcount, and wcount will all be incremented. (See the iofunc_attr_init(), iofunc_lock_default(), iofunc_lock(), iofunc_ocb_attach(), and iofunc_ocb_detach() functions.)
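The non-exclusive nature of these counters can be sketched in a few lines of portable C. Everything here (DEMO_READ, DEMO_WRITE, demo_count_attr_t, demo_attach(), and demo_detach()) is a hypothetical stand-in; in a real resource manager the library's attach and detach helpers do this bookkeeping for you based on the OCB's ioflag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits; a real OCB carries the open mode in ioflag. */
#define DEMO_READ   0x1u
#define DEMO_WRITE  0x2u

/* Hypothetical, cut-down attribute structure with three of the counters. */
typedef struct {
    uint16_t count;     /* OCBs using the attribute in any manner */
    uint16_t rcount;    /* OCBs open for reading */
    uint16_t wcount;    /* OCBs open for writing */
} demo_count_attr_t;

/* Account for a new OCB opened with the given access bits. */
void demo_attach(demo_count_attr_t *attr, unsigned ioflag)
{
    assert(attr != NULL);
    attr->count++;
    if (ioflag & DEMO_READ)
        attr->rcount++;
    if (ioflag & DEMO_WRITE)
        attr->wcount++;
}

/* Undo the accounting when that OCB goes away. */
void demo_detach(demo_count_attr_t *attr, unsigned ioflag)
{
    assert(attr != NULL && attr->count > 0);
    attr->count--;
    if (ioflag & DEMO_READ)
        attr->rcount--;
    if (ioflag & DEMO_WRITE)
        attr->wcount--;
}
```

An OCB opened for both reading and writing bumps all three counters at once, so rcount plus wcount can exceed count.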

mmap_list and lock_list
To manage their particular functionality on the resource, the mmap_list member is used by the iofunc_mmap() and iofunc_mmap_default() functions; the lock_list member is used by the iofunc_lock_default() function. Generally, you shouldn't need to modify or examine these members.
list
Reserved for future use.
list_size
Size of reserved area; reserved for future use.
nbytes
User-modifiable. The number of bytes in the resource. For a file, this would contain the file's size. For special devices (e.g. /dev/null) that don't support lseek() or have a radically different interpretation for lseek(), this field isn't used (because you wouldn't use any of the helper functions, but would supply your own instead.) In these cases, we recommend that you set this field to zero, unless there's a meaningful interpretation that you care to put to it.
inode
This is a mountpoint-specific inode that must be unique per mountpoint. You can specify your own value, or 0 to have the process manager fill it in for you. For filesystem type of applications, this may correspond to some on-disk structure. In any case, the interpretation of this field is up to you.
uid and gid
The user ID and group ID of the owner of this resource. These fields are updated automatically by the chown() helper functions (e.g. iofunc_chown_default()) and are referenced in conjunction with the mode member for access-granting purposes by the open() helper functions (e.g. iofunc_open_default()).
mtime, atime, and ctime
The three POSIX time members: the modification time (mtime), the access time (atime), and the change of status time (ctime).

Note: One or more of the three time members may be invalidated as a result of calling an iofunc-layer function. This is to avoid having each and every I/O message handler go to the kernel and request the current time of day, just to fill in the attribute structure's time member(s).

POSIX states that these times must be valid when the fstat() is performed, but they don't have to reflect the actual time that the associated change occurred. Also, the times must change between fstat() invocations if the associated change occurred between fstat() invocations. If the associated change never occurred between fstat() invocations, then the time returned should be the same as returned last time. Furthermore, if the associated change occurred multiple times between fstat() invocations, then the time need only be different from the previously returned time.

There's a helper function that fills the members with the correct time; you may wish to call it in the appropriate handlers to keep the time up-to-date on the device -- see the iofunc_time_update() function.
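The idea of invalidating a time and filling it in later can be sketched as follows. This is only an illustration of the technique, not the library's implementation: the DEMO_* bits, demo_time_attr_t, and demo_time_update() are hypothetical, and the current time is passed in as a parameter (a caller would typically supply time(NULL)) so that the behavior is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical "this time is no longer valid" bits. */
#define DEMO_ATIME  0x1u
#define DEMO_MTIME  0x2u
#define DEMO_CTIME  0x4u

/* Hypothetical, cut-down attribute structure. */
typedef struct {
    unsigned flags;
    time_t   mtime;
    time_t   atime;
    time_t   ctime;
} demo_time_attr_t;

/*
 *  Fill in every invalidated time with "now" and mark it valid
 *  again. Returns nonzero if at least one member was updated.
 */
int demo_time_update(demo_time_attr_t *attr, time_t now)
{
    unsigned stale;

    assert(attr != NULL);

    stale = attr->flags & (DEMO_ATIME | DEMO_MTIME | DEMO_CTIME);
    if (stale & DEMO_ATIME)
        attr->atime = now;
    if (stale & DEMO_MTIME)
        attr->mtime = now;
    if (stale & DEMO_CTIME)
        attr->ctime = now;

    attr->flags &= ~stale;
    return stale != 0;
}
```

A write handler would only set the DEMO_MTIME and DEMO_CTIME bits, which is cheap; the clock is consulted once, when something such as a stat request actually needs the times.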

mode
Contains the resource's mode (e.g. type, permissions). Valid modes may be selected from the S_* series of constants in <sys/stat.h>.
nlink
User-modifiable. Number of links to this particular name. For names that represent a directory, this value must be at least 2.
rdev
Contains the device number for a character special device and the rdev number for a named special device.

The mount structure

The members of the mount structure, specifically the conf and flags members, modify the behavior of some of the iofunc layer functions. This optional structure contains at least the following:

typedef struct _iofunc_mount {
    uint32_t            flags;
    uint32_t            conf;
    dev_t               dev;
    int32_t             blocksize;
    iofunc_funcs_t      *funcs;
} iofunc_mount_t;

The variables are:

flags
Contains one relevant bit (manifest constant IOFUNC_MOUNT_32BIT), which indicates that the offsets used by this resource manager are 32-bit (as opposed to the extended 64-bit offsets). The user-modifiable mount flags are defined as IOFUNC_MOUNT_FLAGS_PRIVATE (see <sys/iofunc.h>).
conf
Contains several bits:
IOFUNC_PC_CHOWN_RESTRICTED
Causes the default handler for the _IO_CHOWN message to behave in a manner defined by POSIX as "chown-restricted".
IOFUNC_PC_NO_TRUNC
Has no effect on the iofunc layer libraries, but is returned by the iofunc layer's default _IO_PATHCONF handler.
IOFUNC_PC_SYNC_IO
If not set, causes the default iofunc layer _IO_OPEN handler to fail if the client specified any one of O_DSYNC, O_RSYNC, or O_SYNC.
IOFUNC_PC_LINK_DIR
Controls whether or not root is allowed to link and unlink directories.

Note that the options mentioned above for the conf member are returned by the iofunc layer _IO_PATHCONF default handler.

dev
Contains the device number for the filesystem. This number is returned to the client's stat() function in the struct stat st_dev member.
blocksize
Contains the block size of the device. On filesystem types of resource managers, this indicates the native blocksize of the disk, e.g. 512 bytes.
funcs
Contains the following structure:
struct _iofunc_funcs {
   unsigned     nfuncs;
   IOFUNC_OCB_T *(*ocb_calloc) (resmgr_context_t *ctp,
                                IOFUNC_ATTR_T *attr);
   void         (*ocb_free) (IOFUNC_OCB_T *ocb);
};
       

where:

nfuncs
Indicates the number of functions present in the structure; it should be filled with the manifest constant _IOFUNC_NFUNCS.
ocb_calloc() and ocb_free()
Allows you to override the OCBs on a per-mountpoint basis. (See the section titled "Extending the OCB and attribute structures.") If these members are NULL, then the default library versions are used. You must specify either both or neither of these functions -- they operate as a matched pair.
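To show why the two functions form a matched pair, here's a small portable sketch. All of the names (demo_ocb_t, demo_ext_ocb_t, demo_funcs_t, ext_calloc(), ext_free(), and demo_resolve()) are hypothetical, and rejecting a half-specified pair is a choice made for this sketch; we aren't claiming the library checks it that way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the base OCB. */
typedef struct {
    long offset;
} demo_ocb_t;

/* Extended OCB: the base OCB comes first so the pointers convert. */
typedef struct {
    demo_ocb_t base;
    int        my_counter;   /* our own per-open data */
} demo_ext_ocb_t;

/* Hypothetical allocator pair, in the spirit of the funcs member. */
typedef struct {
    demo_ocb_t *(*ocb_calloc)(void);
    void        (*ocb_free)(demo_ocb_t *ocb);
} demo_funcs_t;

static demo_ocb_t *default_calloc(void)         { return calloc(1, sizeof(demo_ocb_t)); }
static void        default_free(demo_ocb_t *o)  { free(o); }

/* Our override allocates the larger, extended structure... */
demo_ocb_t *ext_calloc(void)
{
    demo_ext_ocb_t *ext = calloc(1, sizeof *ext);
    return ext ? &ext->base : NULL;
}

/* ...so the matching free must release that same allocation. */
void ext_free(demo_ocb_t *ocb)
{
    free(ocb);   /* base is the first member, so the address is the same */
}

/*
 *  Pick the pair to use: both NULL means the defaults; exactly one
 *  NULL is treated as an error (-1) in this sketch.
 */
int demo_resolve(const demo_funcs_t *in, demo_funcs_t *out)
{
    assert(out != NULL);

    if (in == NULL || (in->ocb_calloc == NULL && in->ocb_free == NULL)) {
        out->ocb_calloc = default_calloc;
        out->ocb_free   = default_free;
        return 0;
    }
    if (in->ocb_calloc == NULL || in->ocb_free == NULL)
        return -1;

    *out = *in;
    return 0;
}
```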

Handling the _IO_READ message

The io_read handler is responsible for returning data bytes to the client after receiving an _IO_READ message. Examples of functions that send this message are read(), readdir(), fread(), and fgetc(). Let's start by looking at the format of the message itself:

struct _io_read {
    uint16_t            type;
    uint16_t            combine_len;
    int32_t             nbytes;
    uint32_t            xtype;
};

typedef union {
    struct _io_read     i;
    /* unsigned char    data[nbytes];    */
    /* nbytes is returned with MsgReply  */
} io_read_t;

As with all resource manager messages, we've defined a union that contains the input (coming into the resource manager) structure and a reply or output (going back to the client) structure. The io_read() function is prototyped with an argument of io_read_t *msg -- that's the pointer to the union containing the message.

Since this is a read(), the type member has the value _IO_READ. The items of interest in the input structure are:

combine_len
This field has meaning for a combine message -- see the "Combine messages" section in this chapter.
nbytes
How many bytes the client is expecting.
xtype
A per-message override, if your resource manager supports it. Even if your resource manager doesn't support it, you should still examine this member. More on the xtype later (see the section "xtype").

We'll create an io_read() function that will serve as our handler that actually returns some data (the fixed string "Hello world\n"). We'll use the OCB to keep track of our position within the buffer that we're returning to the client.

When we get the _IO_READ message, the nbytes member tells us exactly how many bytes the client wants to read. Suppose that the client issues:

read (fd, buf, 4096);

In this case, it's a simple matter to return our entire "Hello world\n" string in the output buffer and tell the client that we're returning 13 bytes, i.e. the size of the string including its terminating null character (the sample below sets the size of the resource to strlen(buffer)+1).

However, consider the case where the client is performing the following:

while (read (fd, &character, 1) == 1) {
    printf ("Got a character \"%c\"\n", character);
}

Granted, this isn't a terribly efficient way for the client to perform reads! In this case, we would get msg->i.nbytes set to 1 (the size of the buffer that the client wants to get). We can't simply return the entire string all at once to the client -- we have to hand it out one character at a time. This is where the OCB's offset member comes into play.

Sample code for handling _IO_READ messages

Here's a complete io_read() function that correctly handles these cases:

#include <errno.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>
int io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb);

static char                     *buffer = "Hello world\n";

static resmgr_connect_funcs_t   connect_funcs;
static resmgr_io_funcs_t        io_funcs;
static iofunc_attr_t            attr;

int main(int argc, char **argv)
{
    /* declare variables we'll be using */
    resmgr_attr_t        resmgr_attr;
    dispatch_t           *dpp;
    dispatch_context_t   *ctp;
    int                  id;

    /* initialize dispatch interface */
    if((dpp = dispatch_create()) == NULL) {
        fprintf(stderr, "%s: Unable to allocate dispatch handle.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    /* initialize resource manager attributes */
    memset(&resmgr_attr, 0, sizeof resmgr_attr);
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    /* initialize functions for handling messages */
    iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs,
                     _RESMGR_IO_NFUNCS, &io_funcs);
    io_funcs.read = io_read;

    /* initialize attribute structure used by the device */
    iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);
    attr.nbytes = strlen(buffer)+1;
    
    /* attach our device name */
    if((id = resmgr_attach(dpp, &resmgr_attr, "/dev/sample", _FTYPE_ANY, 0,
                 &connect_funcs, &io_funcs, &attr)) == -1) {
        fprintf(stderr, "%s: Unable to attach name.\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* allocate a context structure */
    ctp = dispatch_context_alloc(dpp);

    /* start the resource manager message loop */
    while(1) {
        if((ctp = dispatch_block(ctp)) == NULL) {
            fprintf(stderr, "block error\n");
            return EXIT_FAILURE;
        }
        dispatch_handler(ctp);
    }
}

int
io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb)
{
    int         nleft;
    int         nbytes;
    int         nparts;
    int         status;

    if ((status = iofunc_read_verify (ctp, msg, ocb, NULL)) != EOK)
        return (status);
        
    if ((msg->i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE)
        return (ENOSYS);

    /*
     *  On all reads (first and subsequent), calculate
     *  how many bytes we can return to the client,
     *  based upon the number of bytes available (nleft)
     *  and the client's buffer size
     */

    nleft = ocb->attr->nbytes - ocb->offset;
    nbytes = min (msg->i.nbytes, nleft);

    if (nbytes > 0) {
        /* set up the return data IOV */
        SETIOV (ctp->iov, buffer + ocb->offset, nbytes);

        /* set up the number of bytes (returned by client's read()) */
        _IO_SET_READ_NBYTES (ctp, nbytes);

        /*
         * advance the offset by the number of bytes
         * returned to the client.
         */

        ocb->offset += nbytes;
        
        nparts = 1;
    } else {
        /*
         * they've asked for zero bytes or they've already previously
         * read everything
         */
        
        _IO_SET_READ_NBYTES (ctp, 0);
        
        nparts = 0;
    }

    /* mark the access time as invalid (we just accessed it) */

    if (msg->i.nbytes > 0)
        ocb->attr->flags |= IOFUNC_ATTR_ATIME;

    return (_RESMGR_NPARTS (nparts));
}

The ocb maintains our context for us by storing the offset field, which gives us the position within the buffer, and by having a pointer to the attribute structure attr, which tells us how big the buffer actually is via its nbytes member.
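The offset arithmetic can be tried out without a resource manager at all. The following portable sketch mimics it with hypothetical names (demo_buffer, demo_read_state_t, and demo_read()); unlike the sample above, it sets the size to strlen() of the string, so only the 12 visible characters are handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char demo_buffer[] = "Hello world\n";

/* Hypothetical stand-in for the state kept in the attribute and the OCB. */
typedef struct {
    long nbytes;    /* size of the resource (attribute structure) */
    long offset;    /* current position (OCB) */
} demo_read_state_t;

/*
 *  Copy up to "want" bytes into dst, advance the offset, and
 *  return the number of bytes delivered (0 means end of data).
 */
long demo_read(demo_read_state_t *st, char *dst, long want)
{
    long nleft;
    long n;

    assert(st != NULL && dst != NULL && want >= 0);

    nleft = st->nbytes - st->offset;        /* bytes still available */
    n = (want < nleft) ? want : nleft;      /* min (want, nleft) */
    if (n <= 0)
        return 0;

    memcpy(dst, demo_buffer + st->offset, (size_t) n);
    st->offset += n;
    return n;
}
```

With nbytes set to strlen(demo_buffer), twelve one-byte calls each return 1 and the thirteenth returns 0, while a single call with a large buffer returns all 12 bytes at once.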

Of course, we had to give the resource manager library the address of our io_read() handler function so that it knew to call it. So the code in main() where we had called iofunc_func_init() became:

/* initialize functions for handling messages */
iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs,
                 _RESMGR_IO_NFUNCS, &io_funcs);
io_funcs.read = io_read;

We also needed to add the following to the area above main():

#include <errno.h>
#include <unistd.h>

int io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb);

static char *buffer = "Hello world\n";

Where did the attribute structure's nbytes member get filled in? In main(), just after we did the iofunc_attr_init(). We modified main() slightly:

After this line:

iofunc_attr_init (&attr, S_IFNAM | 0666, 0, 0);

We added this one:

attr.nbytes = strlen (buffer)+1;

At this point, if you were to run the resource manager (our simple resource manager used the name /dev/sample), you could do:

# cat /dev/sample
Hello world

The return line (_RESMGR_NPARTS(nparts)) tells the resource manager library to reply to the client for us, and to reply with nparts IOVs.

Where does it get the IOV array? It's using ctp->iov. That's why we first used the SETIOV() macro to make ctp->iov point to the data to reply with.

If we had no data, as would be the case of a read of zero bytes, then we'd do a return (_RESMGR_NPARTS(0)). But read() returns with the number of bytes successfully read. Where did we give it this information? That's what the _IO_SET_READ_NBYTES() macro was for. It takes the nbytes that we give it and stores it in the context structure (ctp). Then when we return to the library, the library takes this nbytes and passes it as the second parameter to the MsgReplyv(). The second parameter tells the kernel what the MsgSend() should return. And since the read() function is calling MsgSend(), that's where it finds out how many bytes were read.

We also update the access time for this device in the read handler. For details on updating the access time, see the section on "Updating the time for reads and writes" below.

Ways of adding functionality to the resource manager

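You can add functionality to the resource manager you're writing in three fundamental ways: use the default functions encapsulated within your own, use the helper functions within your own, or write the entire function yourself.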

The first two are almost identical, because the default functions really don't do that much by themselves -- they rely on the POSIX helper functions. The third approach has advantages and disadvantages.

Using the default functions

Since the default functions (e.g. iofunc_open_default()) can be installed in the jump table directly, there's no reason you couldn't embed them within your own functions.

Here's an example of how you would do that with your own io_open() handler:

main (int argc, char **argv)
{
    ...

    /* install all of the default functions */
    iofunc_func_init (_RESMGR_CONNECT_NFUNCS, &connect_funcs,
                      _RESMGR_IO_NFUNCS, &io_funcs);

    /* take over the open function */
    connect_funcs.open = io_open;
    ...
}

int
io_open (resmgr_context_t *ctp, io_open_t *msg, 
         RESMGR_HANDLE_T *handle, void *extra)
{
    return (iofunc_open_default (ctp, msg, handle, extra));
}

Obviously, this is just an incremental step that lets you gain control in your io_open() when the message arrives from the client. You may wish to do something before or after the default function does its thing:

/* example of doing something before */

extern int accepting_opens_now;

int
io_open (resmgr_context_t *ctp, io_open_t *msg,
         RESMGR_HANDLE_T *handle, void *extra)
{
    if (!accepting_opens_now) {
        return (EBUSY);
    }

    /* 
     *  at this point, we're okay to let the open happen,
     *  so let the default function do the "work".
     */

    return (iofunc_open_default (ctp, msg, handle, extra));
}

Or:

/* example of doing something after */

int
io_open (resmgr_context_t *ctp, io_open_t *msg,
         RESMGR_HANDLE_T *handle, void *extra)
{
    int     sts;

    /* 
     * have the default function do the checking 
     * and the work for us
     */

    sts = iofunc_open_default (ctp, msg, handle, extra);

    /* 
     *  if the default function says it's okay to let the open
     *  happen, we want to log the request
     */

    if (sts == EOK) {
        log_open_request (ctp, msg);
    }
    return (sts);
}

It goes without saying that you can do something before and after the standard default POSIX handler.

The principal advantage of this approach is that you can add to the functionality of the standard default POSIX handlers with very little effort.
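The same wrapping pattern can be exercised outside of a resource manager. The sketch below is a portable imitation in which every name (demo_connect_funcs_t, demo_func_init(), demo_open_default(), demo_io_open(), and the two static variables) is hypothetical; it combines the "before" and "after" variants in one handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical one-entry jump table. */
typedef int (*demo_open_fn)(const char *path);

typedef struct {
    demo_open_fn open;
} demo_connect_funcs_t;

static int demo_accepting_opens = 1;    /* checked before the default */
static int demo_open_log_count  = 0;    /* updated after the default */

/* Stand-in for a default handler: accepts any non-empty path. */
int demo_open_default(const char *path)
{
    return (path != NULL && path[0] != '\0') ? 0 : ENOENT;
}

/* Our handler: do something before, delegate, do something after. */
int demo_io_open(const char *path)
{
    int sts;

    if (!demo_accepting_opens)
        return EBUSY;

    sts = demo_open_default(path);

    if (sts == 0)
        demo_open_log_count++;      /* "log" the successful open */
    return sts;
}

/* Install the defaults, as iofunc_func_init() does for the real tables. */
void demo_func_init(demo_connect_funcs_t *funcs)
{
    assert(funcs != NULL);
    funcs->open = demo_open_default;
}
```

Taking over the entry is then a single assignment (funcs.open = demo_io_open;) made after demo_func_init() has run.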

Using the helper functions

The default functions make use of helper functions -- these functions can't be placed directly into the connect or I/O jump tables, but they do perform the bulk of the work.

Here's the source for the two functions iofunc_chmod_default() and iofunc_stat_default():

int
iofunc_chmod_default (resmgr_context_t *ctp, io_chmod_t *msg,
                      iofunc_ocb_t *ocb)
{
    return (iofunc_chmod (ctp, msg, ocb, ocb -> attr));
}

int
iofunc_stat_default (resmgr_context_t *ctp, io_stat_t *msg,
                     iofunc_ocb_t *ocb)
{
    iofunc_time_update (ocb -> attr);
    iofunc_stat (ocb -> attr, &msg -> o);
    return (_RESMGR_PTR (ctp, &msg -> o,
                         sizeof (msg -> o)));
}

Notice how the iofunc_chmod() helper performs all the work for the iofunc_chmod_default() default handler. This is typical for the simple functions.

The more interesting case is the iofunc_stat_default() default handler, which calls two helper routines. First it calls iofunc_time_update() to ensure that all of the time fields (atime, ctime and mtime) are up to date. Then it calls iofunc_stat(), which builds the reply. Finally, the default function builds a pointer in the ctp structure and returns it.

The most complicated handling is done by the iofunc_open_default() handler:

int
iofunc_open_default (resmgr_context_t *ctp, io_open_t *msg,
                     iofunc_attr_t *attr, void *extra)
{
    int     status;

    iofunc_attr_lock (attr);

    if ((status = iofunc_open (ctp, msg, attr, 0, 0)) != EOK) {
        iofunc_attr_unlock (attr);
        return (status);
    }

    if ((status = iofunc_ocb_attach (ctp, msg, 0, attr, 0)) 
        != EOK) {
        iofunc_attr_unlock (attr);
        return (status);
    }

    iofunc_attr_unlock (attr);
    return (EOK);
}

This handler calls four helper functions:

  1. It calls iofunc_attr_lock() to lock the attribute structure so that it has exclusive access to it (it's going to be updating things like the counters, so we need to make sure no one else is doing that at the same time).
  2. It then calls the helper function iofunc_open(), which does the actual verification of the permissions.
  3. Next it calls iofunc_ocb_attach() to bind an OCB to this request, so that it will get automatically passed to all of the I/O functions later.
  4. Finally, it calls iofunc_attr_unlock() to release the lock on the attribute structure.
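The four steps above boil down to "lock, check, attach, unlock", with the unlock happening on every path. Here's a portable sketch of that shape written with a single exit point. All names are hypothetical, and the lock is only a counter standing in for a real lock, so this shows the control flow rather than actual thread safety.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down attribute structure. */
typedef struct {
    int      lock_count;    /* stand-in for a real lock */
    int      count;         /* number of attached OCBs */
    unsigned mode;          /* permission bits */
} demo_open_attr_t;

static void demo_lock(demo_open_attr_t *attr)
{
    attr->lock_count++;
}

static void demo_unlock(demo_open_attr_t *attr)
{
    assert(attr->lock_count > 0);
    attr->lock_count--;
}

/* Step 2: verify the permissions (here: any write bit allows writing). */
static int demo_check_access(const demo_open_attr_t *attr, int want_write)
{
    if (want_write && (attr->mode & 0222u) == 0)
        return EACCES;
    return 0;
}

/* Step 3: bind a (notional) OCB to the attribute. */
static int demo_attach(demo_open_attr_t *attr)
{
    attr->count++;
    return 0;
}

int demo_open(demo_open_attr_t *attr, int want_write)
{
    int status;

    assert(attr != NULL);

    demo_lock(attr);                                /* step 1 */
    status = demo_check_access(attr, want_write);   /* step 2 */
    if (status == 0)
        status = demo_attach(attr);                 /* step 3 */
    demo_unlock(attr);                              /* step 4, on every path */

    return status;
}
```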

Writing the entire function yourself

Sometimes a default function will be of no help for your particular resource manager. For example, the iofunc_read_default() and iofunc_write_default() functions implement /dev/null -- they do all the work of returning 0 bytes (EOF) or swallowing all the message bytes (respectively).

You'll want to do something in those handlers (unless your resource manager doesn't support the _IO_READ or _IO_WRITE messages).

Note that even in such cases, there are still helper functions you can use: iofunc_read_verify() and iofunc_write_verify().

Handling the _IO_WRITE message

The io_write handler is responsible for writing data bytes to the media after receiving a client's _IO_WRITE message. Examples of functions that send this message are write() and fflush(). Here's the message:

struct _io_write {
    uint16_t            type;
    uint16_t            combine_len;
    int32_t             nbytes;
    uint32_t            xtype;
    /* unsigned char    data[nbytes]; */
};

typedef union {
    struct _io_write    i;
    /*  nbytes is returned with MsgReply  */
} io_write_t;

As with the io_read_t, we have a union of an input and an output message, with the output message being empty (the number of bytes actually written is returned by the resource manager library directly to the client's MsgSend()).

The data being written by the client almost always follows the header message stored in struct _io_write. The exception is if the write was done using pwrite() or pwrite64(). More on this when we discuss the xtype member.

To access the data, we recommend that you reread it into your own buffer. Let's say you had a buffer called inbuf that was "big enough" to hold all the data you expected to read from the client (if it isn't big enough, you'll have to read the data piecemeal).

Sample code for handling _IO_WRITE messages

The following is a code snippet that can be added to one of the simple resource manager examples. It prints out whatever it's given (making the assumption that it's given only character text):

int
io_write (resmgr_context_t *ctp, io_write_t *msg, RESMGR_OCB_T *ocb)
{
    int     status;
    char    *buf;

    if ((status = iofunc_write_verify(ctp, msg, ocb, NULL)) != EOK)
        return (status);

    if ((msg->i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE)
        return(ENOSYS);

    /* set up the number of bytes (returned by client's write()) */

    _IO_SET_WRITE_NBYTES (ctp, msg->i.nbytes);

    buf = (char *) malloc(msg->i.nbytes + 1);
    if (buf == NULL)
        return(ENOMEM);

    /*
     *  Reread the data from the sender's message buffer.
     *  We're not assuming that all of the data fit into the
     *  resource manager library's receive buffer.
     */

    resmgr_msgread(ctp, buf, msg->i.nbytes, sizeof(msg->i));
    buf [msg->i.nbytes] = '\0'; /* just in case the text is not NULL terminated */
    printf ("Received %d bytes = '%s'\n", msg -> i.nbytes, buf);
    free(buf);

    if (msg->i.nbytes > 0)
        ocb->attr->flags |= IOFUNC_ATTR_MTIME | IOFUNC_ATTR_CTIME;

    return (_RESMGR_NPARTS (0));
}

Of course, we'll have to give the resource manager library the address of our io_write handler so that it'll know to call it. In the code for main() where we called iofunc_func_init(), we'll add a line to register our io_write handler:

/* initialize functions for handling messages */
iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs,
                 _RESMGR_IO_NFUNCS, &io_funcs);                                               
io_funcs.write = io_write;                                                 

You may also need to add the following prototype:

int io_write (resmgr_context_t *ctp, io_write_t *msg,
              RESMGR_OCB_T *ocb);  

At this point, if you were to run the resource manager (our simple resource manager used the name /dev/sample), you could write to it by doing echo Hello > /dev/sample as follows:

# echo Hello > /dev/sample
Received 6 bytes = 'Hello
'

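Notice how we passed the last argument to resmgr_msgread() (the offset argument) as the size of the input message header, sizeof(msg->i). This effectively skips over the header and gets to the data component. Note also that echo appends a newline to its output, which is why 6 bytes were received and why the closing quote was printed on a line of its own.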

If the buffer you supplied wasn't big enough to contain the entire message from the client (e.g. you had a 4 KB buffer and the client wanted to write 1 megabyte), you'd have to read the buffer in stages, using a for loop, advancing the offset passed to resmgr_msgread() by the amount read each time.
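A staged read of that kind might look like the following portable sketch. Here demo_client_msg_t and demo_msgread() are hypothetical stand-ins for the client's message and for resmgr_msgread(), the 4-byte chunk is deliberately tiny, and summing the bytes stands in for whatever processing the real handler would do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the client's message: header, then data. */
typedef struct {
    const unsigned char *bytes;
    size_t               len;
} demo_client_msg_t;

/*
 *  Stand-in for resmgr_msgread(): copy up to n bytes, starting
 *  "offset" bytes into the message. Returns the number copied.
 */
size_t demo_msgread(const demo_client_msg_t *m, void *dst,
                    size_t n, size_t offset)
{
    size_t avail;

    assert(m != NULL && dst != NULL);

    if (offset >= m->len)
        return 0;
    avail = m->len - offset;
    if (n > avail)
        n = avail;
    memcpy(dst, m->bytes + offset, n);
    return n;
}

/*
 *  Consume nbytes of data that follow a header of hdr_len bytes,
 *  one small chunk at a time. Returns the number of bytes consumed
 *  and stores a simple byte sum in *sum.
 */
size_t demo_consume(const demo_client_msg_t *m, size_t hdr_len,
                    size_t nbytes, unsigned long *sum)
{
    unsigned char chunk[4];     /* deliberately small buffer */
    size_t        done;
    size_t        got;
    size_t        i;

    assert(sum != NULL);
    *sum = 0;

    for (done = 0; done < nbytes; done += got) {
        size_t want = nbytes - done;

        if (want > sizeof chunk)
            want = sizeof chunk;

        /* skip the header plus whatever we've already consumed */
        got = demo_msgread(m, chunk, want, hdr_len + done);
        if (got == 0)
            break;              /* the client sent less than it claimed */

        for (i = 0; i < got; i++)
            *sum += chunk[i];
    }
    return done;
}
```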

Unlike the io_read handler sample, this time we didn't do anything with ocb->offset. In this case there's no reason to. The ocb->offset would make more sense if we were managing things that had advancing positions such as a file position.

The reply is simpler than with the io_read handler, since a write() call doesn't expect any data back. Instead, it just wants to know if the write succeeded and if so, how many bytes were written. To tell it how many bytes were written we used the _IO_SET_WRITE_NBYTES() macro. It takes the nbytes that we give it and stores it in the context structure (ctp). Then when we return to the library, the library takes this nbytes and passes it as the second parameter to the MsgReplyv(). The second parameter tells the kernel what the MsgSend() should return. And since the write() function is calling MsgSend(), that's where it finds out how many bytes were written.

Since we're writing to the device, we should also update the modification time and, potentially, the change of file status time. For details on updating the modification and change of file status times, see the section on "Updating the time for reads and writes" below.

Methods of returning and replying

You can return to the resource manager library from your handler functions in various ways. This is complicated by the fact that the resource manager library can reply for you if you want it to, but you must tell it to do so and put the information that it'll use in all the right places.

The following sections discuss the different ways of returning to the resource manager library.

Returning with an error

To reply to the client such that the function the client is calling (e.g. read()) will return with an error, you simply return with an appropriate errno value (from <errno.h>).

return (ENOMEM);

In the case of a read(), this causes the read to return -1 with errno set to ENOMEM.

Returning using an IOV array that points to your data

Sometimes you'll want to reply with a header followed by one of N buffers, where the buffer used will differ each time you reply. To do this, you can set up an IOV array whose elements point to the header and to a buffer.

The context structure already has an IOV array. If you want the resource manager library to do your reply for you, then you must use this array. But the array must contain enough elements for your needs. To ensure that this is the case, you'd set the nparts_max member of the resmgr_attr_t structure that you passed to resmgr_attach() when you registered your name in the pathname space.

The following example assumes that the variable i contains the offset into the array of buffers of the desired buffer to reply with. The 2 in _RESMGR_NPARTS(2) tells the library how many elements in ctp->iov to reply with.

my_header_t     header;
a_buffer_t      buffers[N];

...

SETIOV(&ctp->iov[0], &header, sizeof(header));
SETIOV(&ctp->iov[1], &buffers[i], sizeof(buffers[i]));
return (_RESMGR_NPARTS(2));
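To see what a two-part reply amounts to, here's a portable imitation of the gather step. The names demo_iov_t, DEMO_SETIOV(), and demo_reply() are hypothetical; in a real resource manager the library and the kernel perform the gathering when the reply is sent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical IOV: one address and length per part. */
typedef struct {
    const void *base;
    size_t      len;
} demo_iov_t;

#define DEMO_SETIOV(iov, addr, n) \
    ((iov)->base = (addr), (iov)->len = (n))

/*
 *  Gather the first nparts elements of iov into out (at most cap
 *  bytes), the way a multipart reply lands contiguously in the
 *  client's buffer. Returns the number of bytes copied.
 */
size_t demo_reply(const demo_iov_t *iov, int nparts, void *out, size_t cap)
{
    size_t used = 0;
    int    i;

    assert(iov != NULL && out != NULL);

    for (i = 0; i < nparts; i++) {
        size_t n = iov[i].len;

        if (n > cap - used)
            n = cap - used;     /* truncate to the space that's left */
        memcpy((char *) out + used, iov[i].base, n);
        used += n;
    }
    return used;
}
```

The header and the selected buffer stay where they are; only the IOV entries change from one reply to the next.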

Returning with a single buffer containing data

An example of this would be replying to a read() where all the data existed in a single buffer. You'll typically see this done in two ways:

return (_RESMGR_PTR(ctp, buffer, nbytes));

And:

SETIOV (ctp->iov, buffer, nbytes);
return (_RESMGR_NPARTS(1));

The first method, using the _RESMGR_PTR() macro, is just a convenience for the second method where a single IOV is returned.

Returning success but with no data

This can be done in a few ways. The simplest would be:

return (EOK);

But you'll often see:

return (_RESMGR_NPARTS(0));

Note that in neither case are you causing the MsgSend() to return with a 0. The value that the MsgSend() returns is the value passed to the _IO_SET_READ_NBYTES(), _IO_SET_WRITE_NBYTES(), and other similar macros. These two were used in the read and write samples above.

Getting the resource manager library to do the reply

In this case, you give the client the data and get the resource manager library to do the reply for you. However, the reply data won't be valid by that time. For example, if the reply data was in a buffer that you wanted to free before returning, you could use the following:

resmgr_msgwrite (ctp, buffer, nbytes, 0);
free (buffer);
return (EOK);

The resmgr_msgwrite() copies the contents of buffer into the client's reply buffer immediately. Note that a reply is still required in order to unblock the client so it can examine the data. Next we free the buffer. Finally, we return to the resource manager library such that it does a reply with zero-length data. Since the reply is of zero length, it doesn't overwrite the data already written into the client's reply buffer. When the client returns from its send call, the data is there waiting for it.

Performing the reply in the server

In all of the previous examples, it's the resource manager library that calls MsgReply*() or MsgError() to unblock the client. In some cases, you may not want the library to reply for you. For instance, you might have already done the reply yourself, or you'll reply later. In either case, you'd return as follows:

return (_RESMGR_NOREPLY);

Leaving the client blocked, replying later

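An example of a resource manager that would reply to clients later is a pipe resource manager. If the client is doing a read of your pipe but you have no data for the client, then you have a choice: reply immediately with an error (EAGAIN), or leave the client blocked and reply later, once data is available.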

Another example might be if the client wants you to write out to some device but doesn't want to get a reply until the data has been fully written out. Here is the sequence of events that might follow:

  1. Your resource manager does some I/O out to the hardware to tell it that data is available.
  2. The hardware generates an interrupt when it's ready for a packet of data.
  3. You handle the interrupt by writing data out to the hardware.
  4. Many interrupts may occur before all the data is written -- only then would you reply to the client.

The first issue, though, is whether the client wants to be left blocked. If the client doesn't want to be left blocked, then it opens with the O_NONBLOCK flag:

fd = open("/dev/sample", O_RDWR | O_NONBLOCK);

The default is to allow you to block it.

One of the first things done in the read and write samples above was to call some POSIX verification functions: iofunc_read_verify() and iofunc_write_verify(). If we pass the address of an int as the last parameter, then on return the functions will stuff that int with nonzero if the client doesn't want to be blocked (O_NONBLOCK flag was set) or with zero if the client wants to be blocked.

int    nonblock;

if ((status = iofunc_read_verify (ctp, msg, ocb,
                                  &nonblock)) != EOK)
    return (status);

...

int    nonblock;

if ((status = iofunc_write_verify (ctp, msg, ocb,
                                   &nonblock)) != EOK)
    return (status);

When it then comes time to decide if we should reply with an error or reply later, we do:

if (nonblock) {
    /* client doesn't want to be blocked */
    return (EAGAIN);
} else {
    /*
     *  The client is willing to be blocked.
     *  Save at least the ctp->rcvid so that you can
     *  reply to it later.
     */
    ...
    return (_RESMGR_NOREPLY);
}

The question remains: How do you do the reply yourself? The only detail to be aware of is that the rcvid to reply to is ctp->rcvid. If you're replying later, then you'd save ctp->rcvid and use the saved value in your reply.

MsgReply(saved_rcvid, 0, buffer, nbytes);

Or:

iov_t    iov[2];

SETIOV(&iov[0], &header, sizeof(header));
SETIOV(&iov[1], &buffers[i], sizeof(buffers[i]));
MsgReplyv(saved_rcvid, 0, iov, 2);

Note that you can fill up the client's reply buffer as data becomes available by using resmgr_msgwrite() and resmgr_msgwritev(). Just remember to do the MsgReply*() at some time to unblock the client.
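
One way to keep track of clients that you've left blocked is a small FIFO of saved receive IDs. The sketch below is plain C with no QNX calls; demo_pending_t, demo_queue_t, and the DEMO_MAX_PENDING limit are hypothetical names chosen for illustration, and a multithreaded resource manager would also need to protect the queue with a lock.

```c
#include <stddef.h>

#define DEMO_MAX_PENDING 8   /* arbitrary limit for this sketch */

/* What we remember about a client that we left blocked. */
typedef struct {
    int    rcvid;    /* copy of ctp->rcvid */
    size_t nbytes;   /* how much the client asked for */
} demo_pending_t;

typedef struct {
    demo_pending_t slot[DEMO_MAX_PENDING];
    int            head;    /* index of the oldest entry */
    int            count;   /* number of entries in use  */
} demo_queue_t;

/* Remember a blocked client. Returns 0 on success, -1 if full. */
int demo_queue_put(demo_queue_t *q, int rcvid, size_t nbytes)
{
    int tail;

    if (q->count == DEMO_MAX_PENDING)
        return -1;
    tail = (q->head + q->count) % DEMO_MAX_PENDING;
    q->slot[tail].rcvid = rcvid;
    q->slot[tail].nbytes = nbytes;
    q->count++;
    return 0;
}

/* Fetch the oldest blocked client. Returns 0 on success, -1 if empty. */
int demo_queue_get(demo_queue_t *q, demo_pending_t *out)
{
    if (q->count == 0)
        return -1;
    *out = q->slot[q->head];
    q->head = (q->head + 1) % DEMO_MAX_PENDING;
    q->count--;
    return 0;
}
```

In a read handler, you might call demo_queue_put() with ctp->rcvid just before returning _RESMGR_NOREPLY. Later, when data arrives, you'd call demo_queue_get() and pass the saved rcvid to MsgReply(). If the queue is full, returning an error such as EAGAIN is one reasonable fallback.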


Note: If you're replying to an _IO_READ or _IO_WRITE message, the status argument for MsgReply*() must be the number of bytes read or written.

Returning and telling the library to do the default action

The default action in most cases is for the library to cause the client's function to fail with ENOSYS:

return (_RESMGR_DEFAULT);

Handling other read/write details

Topics in this section include:

Handling the xtype member

The io_read, io_write, and io_openfd message structures contain a member called xtype. From struct _io_read:

struct _io_read {
    ...
    uint32_t            xtype;
    ...
};

Basically, the xtype contains extended type information that can be used to adjust the behavior of a standard I/O function. Most resource managers care about only a few values:

_IO_XTYPE_NONE
No extended type information is being provided.
_IO_XTYPE_OFFSET
If clients are calling pread(), pread64(), pwrite(), or pwrite64(), then they don't want you to use the offset in the OCB. Instead, they're providing a one-shot offset. That offset follows the struct _io_read or struct _io_write headers that reside at the beginning of the message buffers.

For example:

struct myread_offset {
    struct _io_read        read;
    struct _xtype_offset   offset;
};

Some resource managers can be sure that their clients will never call pread*() or pwrite*(). (For example, a resource manager that's controlling a robot arm probably wouldn't care.) In this case, you can treat this type of message as an error.

_IO_XTYPE_READCOND
If a client is calling readcond(), they want to impose timing and return buffer size constraints on the read. Those constraints follow the struct _io_read or struct _io_write headers at the beginning of the message buffers. For example:

struct myreadcond {
    struct _io_read        read;
    struct _xtype_readcond cond;
};

As with _IO_XTYPE_OFFSET, if your resource manager isn't prepared to handle readcond(), you can treat this type of message as an error.

If you aren't expecting extended types (xtype)

The following code sample demonstrates how to handle the case where you're not expecting any extended types. In this case, if you get a message that contains an xtype, you should reply with ENOSYS. The example can be used in either an io_read or io_write handler.

int
io_read (resmgr_context_t *ctp, io_read_t *msg,
         RESMGR_OCB_T *ocb)
{
    int    status;

    if ((status = iofunc_read_verify(ctp, msg, ocb, NULL))
         != EOK) {
        return (status);
    }

    /* No special xtypes */
    if ((msg->i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE)
        return (ENOSYS);

    ...
}

Handling pread*() and pwrite*()

Here are code examples that demonstrate how to handle an _IO_READ or _IO_WRITE message when a client calls:

Sample code for handling _IO_READ messages in pread*()

The following sample code demonstrates how to handle _IO_READ for the case where the client calls one of the pread*() functions.

/* we are defining io_pread_t here to make the code below
   simple */
typedef struct {
    struct _io_read         read;
    struct _xtype_offset    offset;
} io_pread_t;

int
io_read (resmgr_context_t *ctp, io_read_t *msg,
         RESMGR_OCB_T *ocb)
{
    off64_t offset; /* where to read from */
    int     status;

    if ((status = iofunc_read_verify(ctp, msg, ocb, NULL))
         != EOK) {
        return(status);
    }
    
    switch(msg->i.xtype & _IO_XTYPE_MASK) {
    case _IO_XTYPE_NONE:
        offset = ocb->offset;
        break;
    case _IO_XTYPE_OFFSET:
        /*
         *  io_pread_t is defined above.
         *  Client is doing a one-shot read to this offset by
         *  calling one of the pread*() functions
         */
        offset = ((io_pread_t *) msg)->offset.offset;
        break;
    default:
        return(ENOSYS);
    }

    ...
}

Sample code for handling _IO_WRITE messages in pwrite*()

The following sample code demonstrates how to handle _IO_WRITE for the case where the client calls one of the pwrite*() functions. Keep in mind that the struct _xtype_offset information follows the struct _io_write in the sender's message buffer. This means that the data to be written follows the struct _xtype_offset information (instead of the normal case where it follows the struct _io_write). So, you must take this into account when doing the resmgr_msgread() call in order to get the data from the sender's message buffer.

/* we are defining io_pwrite_t here to make the code below
   simple */
typedef struct {
    struct _io_write        write;
    struct _xtype_offset    offset;
} io_pwrite_t;

int
io_write (resmgr_context_t *ctp, io_write_t *msg,
          RESMGR_OCB_T *ocb)
{
    off64_t offset; /* where to write */
    int     status;
    size_t  skip;   /* offset into msg to where the data
                       resides */

    if ((status = iofunc_write_verify(ctp, msg, ocb, NULL))
         != EOK) {
        return(status);
    }
    
    switch(msg->i.xtype & _IO_XTYPE_MASK) {
    case _IO_XTYPE_NONE:
        offset = ocb->offset;
        skip = sizeof(io_write_t);
        break;
    case _IO_XTYPE_OFFSET:
        /* 
         *  io_pwrite_t is defined above
         *  client is doing a one-shot write to this offset by
         *  calling one of the pwrite*() functions
         */
        offset = ((io_pwrite_t *) msg)->offset.offset;
        skip = sizeof(io_pwrite_t);
        break;
    default:
        return(ENOSYS);
    }

    ...
    
    /* 
     *  get the data from the sender's message buffer, 
     *  skipping all possible header information
     */
    resmgr_msgreadv(ctp, iovs, niovs, skip);
    
    ...
}
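
The effect of the skip value can be tried out on a flat byte buffer without any QNX headers. In the sketch below, the DEMO_* constants and the demo_write_hdr_t and demo_xoffset_t layouts are simplified stand-ins, not the real definitions of _IO_XTYPE_*, struct _io_write, or struct _xtype_offset; only the idea (the payload starts after every header that precedes it) carries over.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in values and layouts, NOT the real QNX definitions. */
#define DEMO_XTYPE_MASK    0xffu
#define DEMO_XTYPE_NONE    0u
#define DEMO_XTYPE_OFFSET  5u

typedef struct { uint32_t xtype; uint32_t nbytes; } demo_write_hdr_t;
typedef struct { int64_t offset; }                  demo_xoffset_t;

/*
 * Work out where the payload starts in a received write message and
 * which file offset to use. Returns 0 on success, or -1 for an xtype
 * that this sketch doesn't handle.
 */
int demo_locate(const unsigned char *msg, int64_t ocb_offset,
                size_t *skip, int64_t *offset)
{
    demo_write_hdr_t hdr;
    demo_xoffset_t   xo;

    memcpy(&hdr, msg, sizeof(hdr));
    switch (hdr.xtype & DEMO_XTYPE_MASK) {
    case DEMO_XTYPE_NONE:
        *offset = ocb_offset;              /* use the OCB's position */
        *skip = sizeof(hdr);
        return 0;
    case DEMO_XTYPE_OFFSET:
        memcpy(&xo, msg + sizeof(hdr), sizeof(xo));
        *offset = xo.offset;               /* one-shot offset */
        *skip = sizeof(hdr) + sizeof(xo);
        return 0;
    default:
        return -1;
    }
}
```

After a successful call, msg + skip points at the first byte of the data to be written, whichever of the two message layouts was received.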

Handling readcond()

The same type of operation that was done to handle the pread()/_IO_XTYPE_OFFSET case can be used for handling the client's readcond() call:

typedef struct {
    struct _io_read        read;
    struct _xtype_readcond cond;
} io_readcond_t;

Then:

struct _xtype_readcond *cond;
...
    case _IO_XTYPE_READCOND:
        cond = &((io_readcond_t *)msg)->cond;
        break;
}

Then your manager has to properly interpret and deal with the arguments to readcond(). For more information, see the Library Reference.

Attribute handling

Updating the time for reads and writes

In the read sample above we did:

if (msg->i.nbytes > 0)
    ocb->attr->flags |= IOFUNC_ATTR_ATIME;

According to POSIX, if the read succeeds and the reader had asked for more than zero bytes, then the access time must be marked for update. But POSIX doesn't say that it must be updated right away. If you're doing many reads, you may not want to read the time from the kernel for every read. In the code above, we mark the time only as needing to be updated. When the next _IO_STAT or _IO_CLOSE_OCB message is processed, the resource manager library will see that the time needs to be updated and will get it from the kernel then. This of course has the disadvantage that the time is not the time of the read.

Similarly for the write sample above, we did:

if (msg->i.nbytes > 0)
    ocb->attr->flags |= IOFUNC_ATTR_MTIME | IOFUNC_ATTR_CTIME;

so the same thing will happen.

If you do want to have the times represent the read or write times, then after setting the flags you need only call the iofunc_time_update() helper function. So the read lines become:

if (msg->i.nbytes > 0) {
    ocb->attr->flags |= IOFUNC_ATTR_ATIME;
    iofunc_time_update(ocb->attr);
}

and the write lines become:

if (msg->i.nbytes > 0) {
    ocb->attr->flags |= IOFUNC_ATTR_MTIME | IOFUNC_ATTR_CTIME;
    iofunc_time_update(ocb->attr);
}

You should call iofunc_time_update() before you flush out any cached attributes. As a result of changing the time fields, the attribute structure will have the IOFUNC_ATTR_DIRTY_TIME bit set in the flags field, indicating that this field of the attribute must be updated when the attribute is flushed from the cache.
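
The difference between marking a time and updating it can be modeled in a few lines of portable C. This is a toy model only: the DEMO_* flag values and demo_attr_t are hypothetical, and the assumption that the pending marks are cleared once the times have been applied is ours, not something taken from the library source.

```c
#include <time.h>

/* Stand-in flag bits, NOT the real IOFUNC_ATTR_* values. */
#define DEMO_ATIME       0x01u
#define DEMO_MTIME       0x02u
#define DEMO_CTIME       0x04u
#define DEMO_DIRTY_TIME  0x08u

typedef struct {
    unsigned flags;
    time_t   t_access;   /* last access */
    time_t   t_modify;   /* last modification */
    time_t   t_change;   /* last change of file status */
} demo_attr_t;

/*
 * Apply any pending time marks using the time "now", clear them
 * (an assumption of this model), and note that the times still
 * have to be flushed.
 */
void demo_time_update(demo_attr_t *attr, time_t now)
{
    unsigned pending = attr->flags & (DEMO_ATIME | DEMO_MTIME | DEMO_CTIME);

    if (pending == 0)
        return;
    if (pending & DEMO_ATIME) attr->t_access = now;
    if (pending & DEMO_MTIME) attr->t_modify = now;
    if (pending & DEMO_CTIME) attr->t_change = now;
    attr->flags &= ~pending;
    attr->flags |= DEMO_DIRTY_TIME;
}
```

Setting only the flags is cheap and defers the cost of reading the clock; calling the update function right away records the time of the operation itself, at the price of one clock read per request.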

Combine messages

In this section:

Where combine messages are used

In order to conserve network bandwidth and to provide support for atomic operations, combine messages are supported. A combine message is constructed by the client's C library and consists of a number of I/O and/or connect messages packaged together into one. Let's see how they're used.

Atomic operations

Consider a case where two threads are executing the following code, trying to read from the same file descriptor:

a_thread ()
{
    char buf [BUFSIZ];

    lseek (fd, position, SEEK_SET);
    read (fd, buf, BUFSIZ);
    ...
}

The first thread performs the lseek() and then gets preempted by the second thread. When the first thread resumes executing, its offset into the file will be at the end of where the second thread read from, not the position that it had lseek()'d to.

This can be solved in one of three ways:

Let's look at these three methods.

Using a mutex

In the first approach, if the two threads use a mutex between themselves, the following issue arises: every read(), lseek(), and write() operation must use the mutex.

If this practice isn't enforced, then you still have the exact same problem. For example, suppose one thread that's obeying the convention locks the mutex and does the lseek(), thinking that it's protected. However, another thread (that's not obeying the convention) can preempt it and move the offset to somewhere else. When the first thread resumes, we again encounter the problem where the offset is at a different (unexpected) location. Generally, using a mutex will be successful only in very tightly managed projects, where a code review will ensure that each and every thread's file functions obey the convention.

Per-thread files

The second approach -- of using different file descriptors -- is a good general-purpose solution, unless you explicitly wanted the file descriptor to be shared.

The readblock() function

In order for the readblock() function to be able to effect an atomic seek/read operation, it must ensure that the requests it sends to the resource manager will all be processed at the same time. This is done by combining the _IO_LSEEK and _IO_READ messages into one message. Thus, when the base layer performs the MsgReceive(), it will receive the entire readblock() request in one atomic message.

Bandwidth considerations

Another place where combine messages are useful is in the stat() function, which can be implemented by calling open(), fstat(), and close() in sequence.

Rather than generate three separate messages (one for each of the functions), the C library combines them into one contiguous message. This boosts performance, especially over a networked connection, and also simplifies the resource manager, because it's not forced to have a connect function to handle stat().

The library's combine-message handling

The resource manager library handles combine messages by presenting each component of the message to the appropriate handler routines. For example, if we get a combine message that has an _IO_LSEEK and _IO_READ in it (e.g. readblock()), the library will call our io_lseek() and io_read() functions for us in turn.

But let's see what happens in the resource manager when it's handling these messages. With multiple threads, both of the client's threads may very well have sent in their "atomic" combine messages. Two threads in the resource manager will now attempt to service those two messages. We again run into the same synchronization problem as we originally had on the client end -- one thread can be part way through processing the message and can then be preempted by the other thread.

The solution? The resource manager library provides callouts to lock the OCB while processing any message (except _IO_CLOSE and _IO_UNBLOCK -- we'll return to these). As an example, when processing the readblock() combine message, the resource manager library performs callouts in this order:

  1. lock_ocb handler
  2. _IO_LSEEK message handler
  3. _IO_READ message handler
  4. unlock_ocb handler

Therefore, in our scenario, the two threads within the resource manager would be mutually exclusive to each other by virtue of the lock -- the first thread to acquire the lock would completely process the combine message, unlock the lock, and then the second thread would perform its processing.

Let's examine several of the issues that are associated with handling combine messages:

Component responses

As we've seen, a combine message really consists of a number of "regular" resource manager messages combined into one large contiguous message. The resource manager library handles each component in the combine message separately by extracting the individual components and then calling out to the handlers you've specified in the connect and I/O function tables, as appropriate, for each component.

This generally doesn't present any new wrinkles for the message handlers themselves, except in one case. Consider the readblock() combine message:

Client call:
readblock()
Message(s):
_IO_LSEEK, _IO_READ
Callouts:
io_lock_ocb()
io_lseek()
io_read()
io_unlock_ocb()

Ordinarily, after processing the _IO_LSEEK message, your handler would return the current position within the file. However, the next message (the _IO_READ) also returns data. By convention, only the last data-returning message within a combine message will actually return data. The intermediate messages are allowed to return only a pass/fail indication.

The impact of this is that the _IO_LSEEK message handler has to be aware of whether or not it's being invoked as part of combine message handling. If it is, it should only return either an EOK (indicating that the lseek() operation succeeded) or an error indication to indicate some form of failure.

But if the _IO_LSEEK handler isn't being invoked as part of combine message handling, it should return the EOK and the new offset (or, in case of error, an error indication only).

Here's a sample of the code for the default iofunc-layer lseek() handler:

int
iofunc_lseek_default (resmgr_context_t *ctp,
                      io_lseek_t *msg,
                      iofunc_ocb_t *ocb)
{
    /* 
     *  performs the lseek processing here
     *  may "early-out" on error conditions
     */
     . . .

    /* decision re: combine messages done here */
    if (msg -> i.combine_len & _IO_COMBINE_FLAG) {
        return (EOK);
    }

    msg -> o = offset;
    return (_RESMGR_PTR (ctp, &msg -> o, sizeof (msg -> o)));
}

The relevant decision is made in this statement:

if (msg -> i.combine_len & _IO_COMBINE_FLAG)

If the _IO_COMBINE_FLAG bit is set in the combine_len member, this indicates that the message is being processed as part of a combine message.

When the resource manager library is processing the individual components of the combine message, it looks at the error return from the individual message handlers. If a handler returns anything other than EOK, then processing of further combine message components is aborted. The error that was returned from the failing component's handler is returned to the client.
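
The control flow described here (lock, run each component's handler in turn, stop at the first failure, unlock) can be sketched in portable C. This is a simplified model and not the library's implementation; demo_ocb_t, demo_handler_t, and DEMO_EOK are hypothetical names, and real handlers can also return values such as _RESMGR_NPARTS() that this model ignores.

```c
#include <stddef.h>

#define DEMO_EOK 0   /* stand-in for EOK */

typedef struct demo_ocb {
    int locked;      /* 1 while a combine message is being processed */
    int calls;       /* how many component handlers have run */
} demo_ocb_t;

/* A component handler returns DEMO_EOK or an errno-style value. */
typedef int (*demo_handler_t)(demo_ocb_t *ocb);

/*
 * Run the handler for each component, in order, under one lock.
 * Stops at the first failure and returns that handler's status.
 */
int demo_dispatch_combine(demo_ocb_t *ocb,
                          const demo_handler_t *handlers, size_t n)
{
    int    status = DEMO_EOK;
    size_t i;

    ocb->locked = 1;                       /* lock_ocb callout   */
    for (i = 0; i < n && status == DEMO_EOK; i++)
        status = handlers[i](ocb);
    ocb->locked = 0;                       /* unlock_ocb callout */
    return status;
}

/* Two sample handlers for the sketch. */
int demo_ok(demo_ocb_t *ocb)   { ocb->calls++; return DEMO_EOK; }
int demo_fail(demo_ocb_t *ocb) { ocb->calls++; return 5; /* arbitrary error */ }
```

If the second of three handlers fails, the third is never called, and the value returned by the second is what would be reported to the client.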

Component data access

The second issue associated with handling combine messages is how to access the data area for subsequent message components.

For example, the writeblock() combine message format has an lseek() message first, followed by the write() message. This means that the data associated with the write() request is further in the received message buffer than would be the case for just a simple _IO_WRITE message:

Client call:
writeblock()
Message(s):
_IO_LSEEK, _IO_WRITE, data
Callouts:
io_lock_ocb()
io_lseek()
io_write()
io_unlock_ocb()

This issue is easy to work around. There's a resource manager library function called resmgr_msgread() that knows how to get the data corresponding to the correct message component. Therefore, in the io_write handler, if you used resmgr_msgread() instead of MsgRead(), this would be transparent to you.


Note: Resource managers should always use resmgr_msg*() cover functions.

For reference, here's the source for resmgr_msgread():

int resmgr_msgread( resmgr_context_t *ctp,
                    void *msg,
                    int nbytes,
                    int offset)
{
    return MsgRead(ctp->rcvid, msg, nbytes, ctp->offset + offset);
}

As you can see, resmgr_msgread() simply calls MsgRead() with the offset of the component message from the beginning of the combine message buffer. For completeness, there's also a resmgr_msgwrite() that works in an identical manner to MsgWrite(), except that it dereferences the passed ctp to obtain the rcvid.
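
The offset arithmetic is easy to check against an in-memory buffer. The following sketch is plain C: demo_ctx_t is a hypothetical stand-in for the few context fields involved, and demo_msgread() copies from a local array rather than from a client's address space.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of the context used here. */
typedef struct {
    const unsigned char *rcvbuf;   /* the whole combine message */
    size_t               rcvlen;   /* its total length */
    size_t               offset;   /* start of the current component */
} demo_ctx_t;

/*
 * Copy up to nbytes, starting "offset" bytes into the current
 * component. Returns the number of bytes actually copied.
 */
size_t demo_msgread(const demo_ctx_t *ctp, void *dst,
                    size_t nbytes, size_t offset)
{
    size_t start = ctp->offset + offset;
    size_t avail;

    if (start >= ctp->rcvlen)
        return 0;
    avail = ctp->rcvlen - start;
    if (nbytes > avail)
        nbytes = avail;
    memcpy(dst, ctp->rcvbuf + start, nbytes);
    return nbytes;
}
```

A write handler that asks for the bytes just past its own header gets the same data whether the write arrived alone (component offset 0) or behind an lseek component (nonzero component offset); the handler's own code doesn't change.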

Locking and unlocking the attribute structure

As mentioned above, another facet of the operation of the readblock() function from the client's perspective is that it's atomic. In order to process the requests for a particular OCB in an atomic manner, we must lock and unlock the attribute structure pointed to by the OCB, thus ensuring that only one resource manager thread has access to the OCB at a time.

The resource manager library provides two callouts for doing this: lock_ocb and unlock_ocb.

These are members of the I/O functions structure. The handlers that you provide for those callouts should lock and unlock the attribute structure pointed to by the OCB by calling iofunc_attr_lock() and iofunc_attr_unlock(). Therefore, if you're locking the attribute structure, there's a possibility that the lock_ocb callout will block for a period of time. This is normal and expected behavior. Note also that the attributes structure is automatically locked for you when your I/O function is called.

Connect message types

Let's take a look at the general case for the io_open handler -- it doesn't always correspond to the client's open() call!

For example, consider the stat() and access() client function calls.

_IO_CONNECT_COMBINE_CLOSE

For a stat() client call, we essentially perform the sequence open()/fstat()/close(). Note that if we actually did that, three messages would be required. For performance reasons, we implement the stat() function as one single combine message:

Client call:
stat()
Message(s):
_IO_CONNECT_COMBINE_CLOSE, _IO_STAT
Callouts:
io_open()
io_lock_ocb()
io_stat()
io_unlock_ocb()
io_close()

The _IO_CONNECT_COMBINE_CLOSE message causes the io_open handler to be called. It then implicitly (at the end of processing for the combine message) causes the io_close_ocb handler to be called.

_IO_CONNECT_COMBINE

For the access() function, the client's C library will open a connection to the resource manager and perform a stat() call. Then, based on the results of the stat() call, the client's C library access() may perform an optional devctl() to get more information. In any event, because access() opened the device, it must also call close() to close it:

Client call:
access()
Message(s):
_IO_CONNECT_COMBINE, _IO_STAT
_IO_DEVCTL (optional)
_IO_CLOSE
Callouts:
io_open()
io_lock_ocb()
io_stat()
io_unlock_ocb()
io_lock_ocb() (optional)
io_devctl() (optional)
io_unlock_ocb() (optional)
io_close()

Notice how the access() function opened the pathname/device -- it sent it an _IO_CONNECT_COMBINE message along with the _IO_STAT message. This creates an OCB (when the io_open handler is called), locks the associated attribute structure (via io_lock_ocb()), performs the stat (io_stat()), and then unlocks the attributes structure (io_unlock_ocb()). Note that we don't implicitly close the OCB -- this is left for a later, explicit, message. Contrast this handling with that of the plain stat() above.

Extending Data Control Structures (DCS)

This section contains:

Extending the OCB and attribute structures

In our /dev/sample example, we had a static buffer associated with the entire resource. Sometimes you may want to keep a pointer to a buffer associated with the resource, rather than in a global area. To maintain the pointer with the resource, we would have to store it in the attribute structure. Since the attribute structure doesn't have any spare fields, we would have to extend it to contain that pointer.

Sometimes you may want to add extra entries to the standard iofunc_*() OCB (iofunc_ocb_t).

Let's see how we can extend both of these structures. The basic strategy used is to encapsulate the existing attributes and OCB structures within a newly defined superstructure that also contains our extensions. Here's the code (see the text following the listing for comments):

/* Define our overrides before including <sys/iofunc.h>  */
struct device;
#define IOFUNC_ATTR_T       struct device  /* see note 1 */
struct ocb;
#define IOFUNC_OCB_T        struct ocb     /* see note 1 */

#include <sys/iofunc.h>
#include <sys/dispatch.h>

struct ocb {                               /* see note 2 */
    iofunc_ocb_t            hdr;           /* see note 4; must always be first */
    struct ocb              *next;
    struct ocb              **prev;        /* see note 3 */
};

struct device {                            /* see note 2 */
    iofunc_attr_t           attr;          /* must always be first */
    struct ocb              *list;         /* waiting for write */
};

/* Prototypes, needed since we refer to them a few lines down */

struct ocb *ocb_calloc (resmgr_context_t *ctp, struct device *device);
void ocb_free (struct ocb *ocb);

iofunc_funcs_t ocb_funcs = { /* our ocb allocating & freeing functions */
    _IOFUNC_NFUNCS,
    ocb_calloc,
    ocb_free
};

/* The mount structure.  We have only one, so we statically declare it */

iofunc_mount_t          mountpoint = { 0, 0, 0, 0, &ocb_funcs };

/* One struct device per attached name (there's only one name in this
   example) */

struct device           deviceattr;

main()
{
    ...

    /* 
     *  deviceattr will indirectly contain the addresses 
     *  of the OCB allocating and freeing functions
     */

    deviceattr.attr.mount = &mountpoint;
    resmgr_attach (..., &deviceattr);

    ...
}

/*
 * ocb_calloc
 *
 *  The purpose of this is to give us a place to allocate our own OCB.
 *  It is called as a result of the open being done
 *  (e.g. iofunc_open_default causes it to be called). We
 *  registered it through the mount structure.
 */
IOFUNC_OCB_T *
ocb_calloc (resmgr_context_t *ctp, IOFUNC_ATTR_T *device)
{
    struct ocb *ocb;

    if (!(ocb = calloc (1, sizeof (*ocb)))) {
        return 0;
    }

    /* see note 3 */
    ocb -> prev = &device -> list;
    if (ocb -> next = device -> list) {
        device -> list -> prev = &ocb -> next;
    }
    device -> list = ocb;
    
    return (ocb);
}

/*
 * ocb_free
 *
 * The purpose of this is to give us a place to free our OCB.
 * It is called as a result of the close being done
 * (e.g. iofunc_close_ocb_default causes it to be called). We
 * registered it through the mount structure.
 */
void
ocb_free (IOFUNC_OCB_T *ocb)
{
    /* see note 3 */
    if (*ocb -> prev = ocb -> next) {
        ocb -> next -> prev = ocb -> prev;
    }
    free (ocb);
}

Here are the notes for the above code:

  1. We place the definitions for our enhanced structures before including the standard I/O functions header file. Because the standard I/O functions header file checks to see if the two manifest constants are already defined, this allows a convenient way for us to semantically override the structures.
  2. Define our new enhanced data structures, being sure to place the encapsulated members first.
  3. The ocb_calloc() and ocb_free() sample functions shown here cause the newly allocated OCBs to be maintained in a linked list. Note the use of dual indirection on the struct ocb **prev; member.
  4. You must always place the iofunc structure that you're overriding as the first member of the new extended structure. This lets the common library work properly in the default cases.
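
The dual-indirection technique from note 3 can be exercised on its own, outside a resource manager. In this sketch, struct demo_node and the two functions are hypothetical names; the point is that prev holds the address of whichever pointer currently points at the node (either the list head or the previous node's next member), so removal needs neither the head pointer nor a special case for the first node.

```c
#include <stddef.h>

struct demo_node {
    int                id;
    struct demo_node  *next;
    struct demo_node **prev;  /* address of the pointer that points at us */
};

/* Insert node at the front of the list whose head pointer is *head. */
void demo_insert(struct demo_node **head, struct demo_node *node)
{
    node->prev = head;
    node->next = *head;
    if (node->next != NULL)
        node->next->prev = &node->next;
    *head = node;
}

/* Unlink node from whatever list it's on. */
void demo_remove(struct demo_node *node)
{
    *node->prev = node->next;
    if (node->next != NULL)
        node->next->prev = node->prev;
}
```

Removing the first node simply writes through prev into the head pointer itself, which is why ocb_free() above can unlink an OCB without being told which device it belongs to.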

Extending the mount structure

You can also extend the iofunc_mount_t structure in the same manner as the attribute and OCB structures. In this case, you'd define:

#define IOFUNC_MOUNT_T       struct newmount  

then declare the new structure:

struct newmount {
    iofunc_mount_t          mount;
    int                   ourflag;
};

Handling devctl() messages

The devctl() function is a general-purpose mechanism for communicating with a resource manager. Clients can send data to, receive data from, or both send and receive data from a resource manager. The format of the client devctl() call is:

devctl( int fd,
        int dcmd, 
        void * data, 
        size_t nbytes, 
        int * return_info);

The following values (described in detail in the devctl() documentation in the Library Reference) map directly to the _IO_DEVCTL message itself:

struct _io_devctl {
        uint16_t                  type;
        uint16_t                  combine_len;
        int32_t                   dcmd;
        int32_t                   nbytes;
        int32_t                   zero;
/*      char                      data[nbytes]; */
};

struct _io_devctl_reply {
        uint32_t                  zero;
        int32_t                   ret_val;
        int32_t                   nbytes;
        int32_t                   zero2;
/*      char                      data[nbytes]; */
};

typedef union {
        struct _io_devctl         i;
        struct _io_devctl_reply   o;
} io_devctl_t;

As with most resource manager messages, we've defined a union that contains the input structure (coming into the resource manager), and a reply or output structure (going back to the client). The io_devctl resource manager handler is prototyped with the argument:

io_devctl_t *msg

which is the pointer to the union containing the message.

The type member has the value _IO_DEVCTL.

The combine_len field has meaning for a combine message; see the "Combine messages" section in this chapter.

The nbytes value is the nbytes that's passed to the devctl() function. The value contains the size of the data to be sent to the device driver, or the maximum size of the data to be received from the device driver.

The most interesting item of the input structure is the dcmd that's passed to the devctl() function. This command is formed using the macros defined in <devctl.h>:

#define _POSIX_DEVDIR_NONE        0
#define _POSIX_DEVDIR_TO          0x80000000
#define _POSIX_DEVDIR_FROM        0x40000000
#define _POSIX_DEVDIR_TOFROM      (_POSIX_DEVDIR_TO | _POSIX_DEVDIR_FROM)
#define __DIOF(class, cmd, data)  ((sizeof(data)<<16) + ((class)<<8) + (cmd) + _POSIX_DEVDIR_FROM)
#define __DIOT(class, cmd, data)  ((sizeof(data)<<16) + ((class)<<8) + (cmd) + _POSIX_DEVDIR_TO)
#define __DIOTF(class, cmd, data) ((sizeof(data)<<16) + ((class)<<8) + (cmd) + _POSIX_DEVDIR_TOFROM)
#define __DION(class, cmd)        (((class)<<8) + (cmd) + _POSIX_DEVDIR_NONE)

It's important to understand how these macros pack data to create a command. An 8-bit class (defined in <devctl.h>) is combined with an 8-bit subtype that's manager-specific, and put together in the lower 16 bits of the integer.

The upper 16 bits contain the direction (TO, FROM) as well as a hint about the size of the data structure being passed. This size is only a hint put in to uniquely identify messages that may use the same class and code but pass different data structures.
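The packing described above can be sketched in portable C. The names below (SK_DIR_TO, sk_pack(), and so on) are hypothetical stand-ins that only mirror the layout in the text; real code should use the __DIO* macros from <devctl.h>. The class value 0x05 used in the usage note is arbitrary.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical direction bits that mirror the two top bits of a dcmd. */
#define SK_DIR_TO    0x80000000u   /* client sends data to the server      */
#define SK_DIR_FROM  0x40000000u   /* client receives data from the server */

/* Pack direction, size hint, class, and command code into one 32-bit value. */
static uint32_t sk_pack(uint32_t dir, size_t size, unsigned cls, unsigned code)
{
    return dir | ((uint32_t)size << 16) | ((cls & 0xffu) << 8) | (code & 0xffu);
}

/* Unpack the individual fields again. */
static unsigned sk_code(uint32_t dcmd)    { return dcmd & 0xffu; }
static unsigned sk_class(uint32_t dcmd)   { return (dcmd >> 8) & 0xffu; }
static unsigned sk_size(uint32_t dcmd)    { return (dcmd >> 16) & 0x3fffu; }
static int      sk_is_to(uint32_t dcmd)   { return (dcmd & SK_DIR_TO) != 0; }
static int      sk_is_from(uint32_t dcmd) { return (dcmd & SK_DIR_FROM) != 0; }
```

For example, sk_pack(SK_DIR_TO, 8, 0x05, 0x54) yields 0x80080554: the size hint occupies the 14 bits below the two direction bits, which is why it can't grow without colliding with them.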

In the following example, a cmd is generated to indicate that the client is sending data to the server (TO), but not receiving anything in return. The only bits that the library or the resource manager layer look at are the TO and FROM bits to determine which arguments are to be passed to MsgSend().

struct _my_devctl_msg {
    ...
};

#define MYDCMD  __DIOT(_DCMD_MISC, 0x54, struct _my_devctl_msg) 

Note: The size of the structure that's passed as the last argument to the __DIO* macros must be less than 2^14 bytes (16 KB). Anything larger than this interferes with the upper two directional bits.

The data directly follows this message structure, as indicated by the /* char data[nbytes] */ comment in the _io_devctl structure.
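As a rough illustration of "the data directly follows the header", the sketch below lays out a message in a plain byte buffer. The structure sk_devctl_hdr and the helper names are hypothetical; they only copy the field layout of _io_devctl shown above, and in a real handler you'd reach the payload through _DEVCTL_DATA() instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the 16-byte input header shown above. */
struct sk_devctl_hdr {
    uint16_t type;
    uint16_t combine_len;
    int32_t  dcmd;
    int32_t  nbytes;
    int32_t  zero;
};

/* Write a header followed immediately by an int payload into buf.
 * Returns the total number of bytes used. */
static size_t sk_build(void *buf, int32_t dcmd, int value)
{
    struct sk_devctl_hdr hdr;

    memset(&hdr, 0, sizeof(hdr));
    hdr.dcmd = dcmd;
    hdr.nbytes = (int32_t)sizeof(value);
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy((char *)buf + sizeof(hdr), &value, sizeof(value));
    return sizeof(hdr) + sizeof(value);
}

/* Read the int payload back from the first byte after the header. */
static int sk_read_int(const void *buf)
{
    int value;

    memcpy(&value, (const char *)buf + sizeof(struct sk_devctl_hdr),
           sizeof(value));
    return value;
}
```

Because the reply header in the listing above is the same size as the input header, clearing only the header on reply leaves a payload stored at this offset untouched.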

Sample code for handling _IO_DEVCTL messages

You can add the following code samples to either of the examples provided in the "Simple device resource manager examples" section. Both of those code samples provided the name /dev/sample. With the changes indicated below, the client can use devctl() to set and retrieve a global value (an integer in this case) that's maintained in the resource manager.

The first addition defines what the devctl() commands are going to be. This is generally put in a common or shared header file:

typedef union _my_devctl_msg {
        int tx;             //Filled by client on send
        int rx;             //Filled by server on reply
} data_t;

#define MY_CMD_CODE      1
#define MY_DEVCTL_GETVAL __DIOF(_DCMD_MISC,  MY_CMD_CODE + 0, int)
#define MY_DEVCTL_SETVAL __DIOT(_DCMD_MISC,  MY_CMD_CODE + 1, int)
#define MY_DEVCTL_SETGET __DIOTF(_DCMD_MISC, MY_CMD_CODE + 2, union _my_devctl_msg)

In the above code, we defined three commands that the client can use:

MY_DEVCTL_SETVAL
Sets the server global to the integer the client provides.
MY_DEVCTL_GETVAL
Gets the server global and puts that value into the client's buffer.
MY_DEVCTL_SETGET
Sets the server global to the integer the client provides and returns the previous value of the server global in the client's buffer.

Add this code to the main() function:

io_funcs.devctl = io_devctl;  /* For handling _IO_DEVCTL, sent by devctl() */

And the following code gets added before the main() function:

int io_devctl(resmgr_context_t *ctp, io_devctl_t *msg, RESMGR_OCB_T *ocb);

int global_integer = 0;

Now, you need to include the new handler function to handle the _IO_DEVCTL message:

int io_devctl(resmgr_context_t *ctp, io_devctl_t *msg, RESMGR_OCB_T *ocb) {
    int     nbytes, status, previous;
    union {
        data_t  data;
        int     data32;
        // ... other devctl types you can receive
    } *rx_data;
    

    /*
     Let common code handle DCMD_ALL_* cases.
     You can do this before or after you intercept devctl's depending
     on your intentions.  Here we aren't using any pre-defined values
     so let the system ones be handled first.
    */
    if ((status = iofunc_devctl_default(ctp, msg, ocb)) != _RESMGR_DEFAULT) {
        return(status);
    }
    status = nbytes = 0;

    /*
     Note this assumes that you can fit the entire data portion of
     the devctl into one message.  In reality you should probably
     perform a MsgReadv() once you know the type of message you
     have received to suck all of the data in rather than assuming
     it all fits in the message.  We have set in our main routine
     that we'll accept a total message size of up to 2k so we
     don't worry about it in this example where we deal with ints.
    */
    rx_data = _DEVCTL_DATA(msg->i);

    /*
     Three examples of devctl operations.
     SET: Setting a value (int) in the server
     GET: Getting a value (int) from the server
     SETGET: Setting a new value and returning with the previous value
    */
    switch (msg->i.dcmd) {
    case MY_DEVCTL_SETVAL: 
        global_integer = rx_data->data32;
        nbytes = 0;
        break;

    case MY_DEVCTL_GETVAL: 
        rx_data->data32 = global_integer;
        nbytes = sizeof(rx_data->data32);
        break;
        
    case MY_DEVCTL_SETGET: 
        previous = global_integer; 
        global_integer = rx_data->data.tx;
        rx_data->data.rx = previous;        //Overwrites tx data
        nbytes = sizeof(rx_data->data.rx);
        break;

    default:
        return(ENOSYS); 
    }

    /* Clear the return message ... note we saved our data _after_ this */
    memset(&msg->o, 0, sizeof(msg->o));

    /*
     If you wanted to pass something different to the return
     field of the devctl() you could do it through this member.
    */
    msg->o.ret_val = status;

    /* Indicate the number of bytes and return the message */
    msg->o.nbytes = nbytes;
    return(_RESMGR_PTR(ctp, &msg->o, sizeof(msg->o) + nbytes));
}

When working with devctl() handler code, you should be familiar with the following:

Once you've added the handler code above, a client such as the following should be able to open /dev/sample and subsequently set and retrieve the global integer value:

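#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <devctl.h>
/* plus the shared header that defines data_t and the MY_DEVCTL_* commands */

int main(int argc, char **argv) {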
    int     fd, ret, val;
    data_t  data;

    if ((fd = open("/dev/sample", O_RDONLY)) == -1) {
            return(1);
    }

    /* Find out what the value is set to initially */
    val = -1;
    ret = devctl(fd, MY_DEVCTL_GETVAL, &val, sizeof(val), NULL);
    printf("GET returned %d w/ server value %d \n", ret, val);

    /* Set the value to something else */
    val = 25;
    ret = devctl(fd, MY_DEVCTL_SETVAL, &val, sizeof(val), NULL);
    printf("SET returned %d \n", ret);

    /* Verify we actually did set the value */
    val = -1;
    ret = devctl(fd, MY_DEVCTL_GETVAL, &val, sizeof(val), NULL);
    printf("GET returned %d w/ server value %d == 25? \n", ret, val);

    /* Now do a set/get combination */
    memset(&data, 0, sizeof(data));
    data.tx = 50;
    ret = devctl(fd, MY_DEVCTL_SETGET, &data, sizeof(data), NULL);
    printf("SETGET returned with %d w/ server value %d == 25?\n", ret, data.rx);

    /* Check set/get worked */
    val = -1;
    ret = devctl(fd, MY_DEVCTL_GETVAL, &val, sizeof(val), NULL);
    printf("GET returned %d w/ server value %d == 50? \n", ret, val);

    return(0);
}
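To see which values the client above should print, the three commands can be modelled without any message passing. This is only a sketch: sk_devctl(), the sk_cmd identifiers, and sk_global are hypothetical stand-ins for the handler's switch statement and global_integer, and the buffer is modified in place the way the handler modifies the message data.

```c
/* Same shape as the data_t union in the shared header above. */
typedef union {
    int tx;   /* filled by client on send  */
    int rx;   /* filled by server on reply */
} sk_data_t;

enum sk_cmd { SK_GETVAL, SK_SETVAL, SK_SETGET };  /* hypothetical ids */

static int sk_global = 0;   /* stands in for global_integer */

/* Apply one command to buf in place; return the reply payload size,
 * or -1 for an unknown command. */
static int sk_devctl(enum sk_cmd cmd, void *buf)
{
    sk_data_t *d = buf;
    int previous;

    switch (cmd) {
    case SK_SETVAL:
        sk_global = *(int *)buf;
        return 0;                    /* nothing goes back to the client */
    case SK_GETVAL:
        *(int *)buf = sk_global;
        return (int)sizeof(int);
    case SK_SETGET:
        previous = sk_global;
        sk_global = d->tx;
        d->rx = previous;            /* overwrites tx: shared storage */
        return (int)sizeof(d->rx);
    }
    return -1;
}
```

Replaying the client's sequence against this model gives 0 for the first GET, 25 after the SET, 25 as the previous value returned by SETGET, and 50 for the final GET.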

Handling ionotify() and select()

A client uses ionotify() and select() to ask a resource manager about the status of certain conditions (e.g. whether input data is available). The conditions may or may not have been met. The resource manager can be asked to:

The select() function differs from ionotify() in that most of the work is done in the library. For example, the client code would be unaware that any event is involved, nor would it be aware of the blocking function that waits for the event. This is all hidden in the library code for select().

However, from a resource manager's point of view, there's no difference between ionotify() and select(); they're handled with the same code.

For more information on the ionotify() and select() functions, see the Library Reference.


Note: If multiple threads in the same client perform simultaneous operations with select() and ionotify(), notification races may occur.

Since ionotify() and select() require the resource manager to do the same work, they both send the _IO_NOTIFY message to the resource manager. The io_notify handler is responsible for handling this message. Let's start by looking at the format of the message itself:

struct _io_notify {
    uint16_t                    type;
    uint16_t                    combine_len;
    int32_t                     action;
    int32_t                     flags;
    struct sigevent             event;
};

struct _io_notify_reply {
    uint32_t                    flags;
};

typedef union {
    struct _io_notify           i;
    struct _io_notify_reply     o;
} io_notify_t;

As with all resource manager messages, we've defined a union that contains the input structure (coming into the resource manager), and a reply or output structure (going back to the client). The io_notify handler is prototyped with the argument:

io_notify_t *msg

which is the pointer to the union containing the message. The items in the input structure are:

The type member has the value _IO_NOTIFY.

The combine_len field has meaning for a combine message; see the "Combine messages" section in this chapter.

The action member is used by the iofunc_notify() helper function to tell it whether it should:

Since iofunc_notify() looks at this, you don't have to worry about it.

The flags member contains the conditions that the client is interested in and can be any mixture of the following:

_NOTIFY_COND_INPUT
This condition is met when there are one or more units of input data available (i.e. clients can now issue reads). The number of units defaults to 1, but you can change it. The definition of a unit is up to you: for a character device such as a serial port, it would be a character; for a POSIX message queue, it would be a message. Each resource manager selects an appropriate object.
_NOTIFY_COND_OUTPUT
This condition is met when there's room in the output buffer for one or more units of data (i.e. clients can now issue writes). The number of units defaults to 1, but you can change it. The definition of a unit is up to you -- some resource managers may default to an empty output buffer while others may choose some percentage of the buffer empty.
_NOTIFY_COND_OBAND
The condition is met when one or more units of out-of-band data are available. The number of units defaults to 1, but you can change it. The definition of out-of-band data is specific to the resource manager.

The event member is what the resource manager delivers once a condition is met.

A resource manager needs to keep a list of clients that want to be notified as conditions are met, along with the events to use to do the notifying. When a condition is met, the resource manager must traverse the list to look for clients that are interested in that condition, and then deliver the appropriate event. As well, if a client closes its file descriptor, then any notification entries for that client must be removed from the list.

To make all this easier, the following structure and helper functions are provided for you to use in a resource manager:

iofunc_notify_t structure
Contains the three notification lists, one for each possible condition. Each is a list of the clients to be notified for that condition.
iofunc_notify()
Adds or removes notification entries; also polls for conditions. Call this function inside of your io_notify handler function.
iofunc_notify_trigger()
Sends notifications to queued clients. Call this function when one or more conditions have been met.
iofunc_notify_remove()
Removes notification entries from the list. Call this function when the client closes its file descriptor.
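The bookkeeping that these helpers take over can be sketched with an ordinary linked list. Everything below is hypothetical (sk_waiter_t, sk_arm(), sk_trigger(), sk_remove(), and the condition bits) and simplified to a single list; it also assumes that a notification is one-shot, so an entry is dropped once it has fired. It isn't the library's implementation.

```c
#include <stdlib.h>

#define SK_COND_INPUT  0x1u   /* hypothetical condition bits */
#define SK_COND_OUTPUT 0x2u

typedef struct sk_waiter {
    struct sk_waiter *next;
    int               client;  /* identifies the client             */
    unsigned          conds;   /* conditions this client waits for  */
} sk_waiter_t;

/* Arm: remember that 'client' wants to hear about 'conds'. */
static int sk_arm(sk_waiter_t **list, int client, unsigned conds)
{
    sk_waiter_t *w = malloc(sizeof(*w));

    if (w == NULL)
        return -1;
    w->client = client;
    w->conds = conds;
    w->next = *list;
    *list = w;
    return 0;
}

/* Trigger: notify and drop every waiter interested in 'met'.
 * Returns the number of waiters notified. */
static int sk_trigger(sk_waiter_t **list, unsigned met)
{
    int notified = 0;

    while (*list != NULL) {
        sk_waiter_t *w = *list;
        if (w->conds & met) {
            /* a real manager would deliver the client's event here */
            *list = w->next;
            free(w);
            notified++;
        } else {
            list = &w->next;
        }
    }
    return notified;
}

/* Remove: forget every entry that belongs to 'client' (e.g. on close). */
static int sk_remove(sk_waiter_t **list, int client)
{
    int removed = 0;

    while (*list != NULL) {
        sk_waiter_t *w = *list;
        if (w->client == client) {
            *list = w->next;
            free(w);
            removed++;
        } else {
            list = &w->next;
        }
    }
    return removed;
}
```

The three functions correspond loosely to the roles of iofunc_notify(), iofunc_notify_trigger(), and iofunc_notify_remove() described above.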

Sample code for handling _IO_NOTIFY messages

You can add the following code samples to either of the examples provided in the "Simple device resource manager examples" section. Both of those code samples provided the name /dev/sample. With the changes indicated below, clients can use writes to send it data, which it'll store as discrete messages. Other clients can use either ionotify() or select() to request notification when that data arrives. When clients receive notification, they can issue reads to get the data.

You'll need to replace this code that's located above the main() function:

#include <sys/iofunc.h>
#include <sys/dispatch.h>

static resmgr_connect_funcs_t    connect_funcs;
static resmgr_io_funcs_t         io_funcs;
static iofunc_attr_t             attr;

with the following:

struct device_attr_s;
#define IOFUNC_ATTR_T   struct device_attr_s

#include <sys/iofunc.h>
#include <sys/dispatch.h>

/*
 * define structure and variables for storing the data that is received.
 * When clients write data to us, we store it here.  When clients do
 * reads, we get the data from here.  Result ... a simple message queue.
*/
typedef struct item_s {
    struct item_s   *next;
    char            *data;
} item_t;

/* the extended attributes structure */
typedef struct device_attr_s {
    iofunc_attr_t   attr;
    iofunc_notify_t notify[3];  /* notification list used by iofunc_notify*() */
    item_t          *firstitem; /* the queue of items */
    int             nitems;     /* number of items in the queue */
} device_attr_t;

/* We only have one device; device_attr is its attribute structure */

static device_attr_t    device_attr;

int io_read(resmgr_context_t *ctp, io_read_t  *msg, RESMGR_OCB_T *ocb);
int io_write(resmgr_context_t *ctp, io_write_t *msg, RESMGR_OCB_T *ocb);
int io_notify(resmgr_context_t *ctp, io_notify_t *msg, RESMGR_OCB_T *ocb);
int io_close_ocb(resmgr_context_t *ctp, void *reserved, RESMGR_OCB_T *ocb);

static resmgr_connect_funcs_t  connect_funcs;
static resmgr_io_funcs_t       io_funcs;

We need a place to keep data that's specific to our device. A good place for this is in an attribute structure that we can associate with the name we registered: /dev/sample. So, in the code above, we defined device_attr_t and IOFUNC_ATTR_T for this purpose. We talk more about this type of device-specific attribute structure in the section, "Extending Data Control Structures (DCS)."

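We need two types of device-specific data: an array of three notification lists, one for each condition that a client can ask to be notified about (notify in device_attr_t); and a queue that holds the data written to us until a client reads it (item_t entries, tracked by firstitem and nitems in device_attr_t).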

Note that we removed the definition of attr, since we use device_attr instead.

Of course, we have to give the resource manager library the address of our handlers so that it'll know to call them. In the code for main() where we called iofunc_func_init(), we'll add the following code to register our handlers:

/* initialize functions for handling messages */
iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs,
                 _RESMGR_IO_NFUNCS, &io_funcs);
io_funcs.notify = io_notify; /* for handling _IO_NOTIFY, sent as
                                a result of client calls to ionotify()
                                and select() */
io_funcs.write = io_write;
io_funcs.read = io_read;
io_funcs.close_ocb = io_close_ocb;

And, since we're using device_attr in place of attr, we need to change the code wherever we use it in main(). So, you'll need to replace this code:

/* initialize attribute structure used by the device */
iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);

/* attach our device name */
id = resmgr_attach(dpp,            /* dispatch handle        */
                   &resmgr_attr,   /* resource manager attrs */
                   "/dev/sample",  /* device name            */
                   _FTYPE_ANY,     /* open type              */
                   0,              /* flags                  */
                   &connect_funcs, /* connect routines       */
                   &io_funcs,      /* I/O routines           */
                   &attr);         /* handle                 */

with the following:

/* initialize attribute structure used by the device */
iofunc_attr_init(&device_attr.attr, S_IFNAM | 0666, 0, 0);
IOFUNC_NOTIFY_INIT(device_attr.notify);
device_attr.firstitem = NULL;
device_attr.nitems = 0;

/* attach our device name */
id = resmgr_attach(dpp,            /* dispatch handle        */
                   &resmgr_attr,   /* resource manager attrs */
                   "/dev/sample",  /* device name            */
                   _FTYPE_ANY,     /* open type              */
                   0,              /* flags                  */
                   &connect_funcs, /* connect routines       */
                   &io_funcs,      /* I/O routines           */
                   &device_attr);  /* handle                 */

Note that we set up our device-specific data in device_attr. And, in the call to resmgr_attach(), we passed &device_attr (instead of &attr) for the handle parameter.

Now, you need to include the new handler function to handle the _IO_NOTIFY message:

int
io_notify(resmgr_context_t *ctp, io_notify_t *msg, RESMGR_OCB_T *ocb)
{
    device_attr_t   *dattr = (device_attr_t *) ocb->attr;
    int             trig;
    
    /* 
     * 'trig' will tell iofunc_notify() which conditions are currently
     * satisfied.  'dattr->nitems' is the number of messages in our list of
     * stored messages.
    */

    trig = _NOTIFY_COND_OUTPUT;         /* clients can always give us data */
    if (dattr->nitems > 0)
        trig |= _NOTIFY_COND_INPUT;      /* we have some data available */
    
    /*
     * iofunc_notify() will do any necessary handling, including adding
     * the client to the notification list if need be.
    */

    return (iofunc_notify(ctp, msg, dattr->notify, trig, NULL, NULL));
}

As stated above, our io_notify handler will be called when a client calls ionotify() or select(). In our handler, we're expected to remember who those clients are, and what conditions they want to be notified about. We should also be able to respond immediately with conditions that are already true. The iofunc_notify() helper function makes this easy.

The first thing we do is to figure out which of the conditions we handle have currently been met. In this example, we're always able to accept writes, so in the code above we set the _NOTIFY_COND_OUTPUT bit in trig. We also check nitems to see if we have data and set the _NOTIFY_COND_INPUT bit if we do.

We then call iofunc_notify(), passing it the message that was received (msg), the notification lists (notify), and which conditions have been met (trig). If one of the conditions that the client is asking about has been met, and the client wants us to poll for the condition before arming, then iofunc_notify() will return with a value that indicates what condition has been met and the condition will not be armed. Otherwise, the condition will be armed. In either case, we'll return from the handler with the return value from iofunc_notify().
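The poll-or-arm decision described above can be written down as a small pure function. This is a sketch of the rule as stated in the text only; sk_notify(), its poll_first parameter, and the result structure are hypothetical and don't reflect how iofunc_notify() encodes the client's request or its return value.

```c
/* Hypothetical outcome of one notification request. */
typedef struct {
    unsigned met_now;  /* requested conditions that are already true */
    unsigned armed;    /* requested conditions left armed for later  */
} sk_notify_result_t;

/* wanted:     conditions the client asked about
 * trig:       conditions that are currently satisfied
 * poll_first: nonzero if the client asked to poll before arming */
static sk_notify_result_t sk_notify(unsigned wanted, unsigned trig,
                                    int poll_first)
{
    sk_notify_result_t r = { 0u, 0u };

    if (poll_first && (wanted & trig) != 0u) {
        r.met_now = wanted & trig;  /* report immediately; nothing armed */
    } else {
        r.armed = wanted;           /* deliver the event later */
    }
    return r;
}
```

With wanted = 0x1 and trig = 0x3, a polling request is answered at once with met_now = 0x1; with trig = 0x2, the same request is armed instead.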

Earlier, when we talked about the three possible conditions, we mentioned that if you specify _NOTIFY_COND_INPUT, the client is notified when there are one or more units of input data available and that the number of units is up to you. We said a similar thing about _NOTIFY_COND_OUTPUT and _NOTIFY_COND_OBAND. In the code above, we let the number of units for all these default to 1. If you want to use something different, then you must declare an array such as:

int notifycounts[3] =  { 10, 2, 1 };

This sets the units for: _NOTIFY_COND_INPUT to 10; _NOTIFY_COND_OUTPUT to 2; and _NOTIFY_COND_OBAND to 1. We would pass notifycounts to iofunc_notify() as the second to last parameter.
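The effect of such a counts array can be illustrated with a simple threshold test. The function sk_threshold_met() and the index names are hypothetical; the sketch assumes only what the text states, namely that each condition has a unit count that defaults to 1 and that the array is ordered input, output, out-of-band.

```c
#include <stddef.h>

/* Hypothetical indices, in the order the counts are listed above. */
enum { SK_INPUT = 0, SK_OUTPUT = 1, SK_OBAND = 2 };

/* Returns nonzero if 'available' units reach the threshold for the
 * condition 'which'.  A NULL counts array means "default of 1". */
static int sk_threshold_met(const int *counts, int which, int available)
{
    int needed = (counts != NULL) ? counts[which] : 1;

    return available >= needed;
}
```

With counts of { 10, 2, 1 }, nine queued input units aren't enough to notify a client waiting for input, but ten are; with no array at all, a single unit is enough.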

Then, as data arrives, we notify whichever clients have asked for notification. In this sample, data arrives through clients sending us _IO_WRITE messages and we handle it using an io_write handler.

int
io_write(resmgr_context_t *ctp, io_write_t *msg,
         RESMGR_OCB_T *ocb)
{
    device_attr_t   *dattr = (device_attr_t *) ocb->attr;
    int             i;
    char            *p;
    int             status;
    char            *buf;
    item_t          *newitem;

    if ((status = iofunc_write_verify(ctp, msg, ocb, NULL))
         != EOK)
        return (status);

    if ((msg->i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE)
        return (ENOSYS);

    if (msg->i.nbytes > 0) {
        
        /* Get and store the data */
        
        if ((newitem = malloc(sizeof(item_t))) == NULL)
            return (errno);
        if ((newitem->data = malloc(msg->i.nbytes+1)) ==
            NULL) {
            free(newitem);
            return (errno);
        }
        /* reread the data from the sender's message buffer */
        resmgr_msgread(ctp, newitem->data, msg->i.nbytes,
                       sizeof(msg->i));
        newitem->data[msg->i.nbytes] = '\0';

        if (dattr->firstitem)
            newitem->next = dattr->firstitem;
        else
            newitem->next = NULL;
        dattr->firstitem = newitem;
        dattr->nitems++;

        /*
         * notify clients who may have asked to be notified
         * when there is data
        */
    
        if (IOFUNC_NOTIFY_INPUT_CHECK(dattr->notify,
            dattr->nitems, 0))
            iofunc_notify_trigger(dattr->notify, dattr->nitems,
                                  IOFUNC_NOTIFY_INPUT);
    }
   
    /* set up the number of bytes (returned by client's
       write()) */
 
    _IO_SET_WRITE_NBYTES(ctp, msg->i.nbytes);

    if (msg->i.nbytes > 0)
        ocb->attr->attr.flags |= IOFUNC_ATTR_MTIME |
                                 IOFUNC_ATTR_CTIME;

    return (_RESMGR_NPARTS(0));
}

The important part of the above io_write() handler is the code within the following section:

if (msg->i.nbytes > 0) {
    ....
}

Here we first allocate space for the incoming data, and then use resmgr_msgread() to copy the data from the client's send buffer into the allocated space. Then, we add the data to our queue.

Next, we pass the number of input units that are available to IOFUNC_NOTIFY_INPUT_CHECK() to see if there are enough units to notify clients about. This is checked against the notifycounts that we mentioned above when talking about the io_notify handler. If there are enough units available then we call iofunc_notify_trigger() telling it that nitems of data are available (IOFUNC_NOTIFY_INPUT means input is available). The iofunc_notify_trigger() function checks the lists of clients asking for notification (notify) and notifies any that asked about data being available.

Any client that gets notified will then perform a read to get the data. In our sample, we handle this with the following io_read handler:

int
io_read(resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb)
{
    device_attr_t   *dattr = (device_attr_t *) ocb->attr;
    int             status;
    
    if ((status = iofunc_read_verify(ctp, msg, ocb, NULL)) != EOK)
        return (status);

    if ((msg->i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE)
        return (ENOSYS);

    if (dattr->firstitem) {
        int     nbytes;
        item_t  *item, *prev;
        
        /* get last item */
        item = dattr->firstitem;
        prev = NULL;
        while (item->next != NULL) {
            prev = item;
            item = item->next;
        }

        /* 
         * figure out number of bytes to give, write the data to the 
         * client's reply buffer, even if we have more bytes than they
         * are asking for, we remove the item from our list
        */
        nbytes = min (strlen (item->data), msg->i.nbytes);

        /* set up the number of bytes (returned by client's read()) */
        _IO_SET_READ_NBYTES (ctp, nbytes);

        /* 
         * write the bytes to the client's reply buffer now since we
         * are about to free the data
        */
        resmgr_msgwrite (ctp, item->data, nbytes, 0);

        /* remove the data from the queue */
        if (prev)
            prev->next = item->next;
        else
            dattr->firstitem = NULL;
        free(item->data);
        free(item);
        dattr->nitems--;
    } else {
        /* the read() will return with 0 bytes */
        _IO_SET_READ_NBYTES (ctp, 0);
    }   

    /* mark the access time as invalid (we just accessed it) */

    if (msg->i.nbytes > 0)
        ocb->attr->attr.flags |= IOFUNC_ATTR_ATIME;

    return (EOK);
}

The important part of the above io_read handler is the code within this section:

if (dattr->firstitem) {
    ....
}

We first walk through the queue looking for the oldest item. Then we use resmgr_msgwrite() to write the data to the client's reply buffer. We do this now because the next step is to free the memory that we're using to store that data. We also remove the item from our queue.
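Taken together, io_write (insert at the head) and io_read (remove from the tail) give first-in, first-out behaviour. The sketch below isolates just that list handling in portable C; sk_item_t, sk_push(), and sk_pop_oldest() are hypothetical names, and the message-passing calls are left out.

```c
#include <stdlib.h>
#include <string.h>

typedef struct sk_item {
    struct sk_item *next;
    char           *data;
} sk_item_t;

/* Store a copy of 'text' at the head of the list, as io_write does. */
static int sk_push(sk_item_t **first, const char *text)
{
    sk_item_t *item = malloc(sizeof(*item));

    if (item == NULL)
        return -1;
    item->data = malloc(strlen(text) + 1);
    if (item->data == NULL) {
        free(item);
        return -1;
    }
    strcpy(item->data, text);
    item->next = *first;
    *first = item;
    return 0;
}

/* Unlink the last (oldest) item, as io_read does, and return its data.
 * The caller frees the returned string.  Returns NULL if the list is empty. */
static char *sk_pop_oldest(sk_item_t **first)
{
    sk_item_t **link = first;
    sk_item_t  *item;
    char       *data;

    if (*first == NULL)
        return NULL;
    while ((*link)->next != NULL)
        link = &(*link)->next;
    item = *link;
    *link = NULL;           /* the previous item becomes the new tail */
    data = item->data;
    free(item);
    return data;
}
```

Pushing "one" and then "two" and popping twice returns "one" first. Walking to the tail on every read is linear in the queue length, which is acceptable for a sample but is a design choice you might revisit for long queues.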

Lastly, if a client closes their file descriptor, we must remove them from our list of clients. This is done using an io_close_ocb handler:

int
io_close_ocb(resmgr_context_t *ctp, void *reserved, RESMGR_OCB_T *ocb)
{
    device_attr_t   *dattr = (device_attr_t *) ocb->attr;

    /*
     * a client has closed their file descriptor or has terminated.
     * Remove them from the notification list.
    */
    
    iofunc_notify_remove(ctp, dattr->notify);

    return (iofunc_close_ocb_default(ctp, reserved, ocb));
}

In the io_close_ocb handler, we called iofunc_notify_remove() and passed it ctp (contains the information that identifies the client) and notify (contains the list of clients) to remove the client from the lists.

Handling private messages and pulses

A resource manager may need to receive and handle pulses, perhaps because an interrupt handler has returned a pulse or some other thread or process has sent a pulse.

The main issue with pulses is that they have to be received as a message -- this means that a thread has to explicitly perform a MsgReceive() in order to get the pulse. But unless this pulse is sent to a different channel than the one that the resource manager is using for its main messaging interface, it will be received by the library. Therefore, we need to see how a resource manager can associate a pulse code with a handler routine and communicate that information to the library.

The pulse_attach() function can be used to associate a pulse code with a handler function. Therefore, when the dispatch layer receives a pulse, it will look up the pulse code and see which associated handler to call to handle the pulse message.

You may also want to define your own private message range to communicate with your resource manager. Note that the range 0x0 to 0x1FF is reserved for the OS. To attach a range, you use the message_attach() function.
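Conceptually, attaching a handler to a pulse code or to a message range amounts to a table lookup at dispatch time. The sketch below shows one way such a table could work for message ranges. All names (sk_attach(), sk_dispatch(), sk_echo_handler()) are hypothetical, and refusing ranges that start at or below 0x1FF is an assumption of this sketch, not documented behaviour of message_attach().

```c
#include <stddef.h>

typedef int (*sk_handler_t)(int type);

typedef struct {
    int          lo, hi;    /* inclusive range of message types */
    sk_handler_t handler;
} sk_range_t;

#define SK_MAX_RANGES   8
#define SK_RESERVED_MAX 0x1FF   /* 0x0 to 0x1FF is reserved for the OS */

static sk_range_t sk_ranges[SK_MAX_RANGES];
static size_t     sk_nranges;

/* Example handler: simply reports which type it was called for. */
static int sk_echo_handler(int type)
{
    return type;
}

/* Register 'handler' for message types lo..hi.  Returns 0 or -1. */
static int sk_attach(int lo, int hi, sk_handler_t handler)
{
    if (lo > hi || lo <= SK_RESERVED_MAX || sk_nranges == SK_MAX_RANGES)
        return -1;
    sk_ranges[sk_nranges].lo = lo;
    sk_ranges[sk_nranges].hi = hi;
    sk_ranges[sk_nranges].handler = handler;
    sk_nranges++;
    return 0;
}

/* Find the handler whose range contains 'type' and call it.
 * Returns -1 if no handler is attached for that type. */
static int sk_dispatch(int type)
{
    size_t i;

    for (i = 0; i < sk_nranges; i++) {
        if (type >= sk_ranges[i].lo && type <= sk_ranges[i].hi)
            return sk_ranges[i].handler(type);
    }
    return -1;
}
```

After sk_attach(0x5000, 0x5fff, sk_echo_handler), a message of type 0x5001 reaches the handler, while type 0x4fff falls through to the "no handler" result.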

In this example, we create the same resource manager, but this time we also attach to a private message range and attach a pulse, which is then used as a timer event:

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define THREAD_POOL_PARAM_T     dispatch_context_t
#include <sys/iofunc.h>
#include <sys/dispatch.h>

static resmgr_connect_funcs_t   connect_func;
static resmgr_io_funcs_t        io_func;
static iofunc_attr_t            attr;

int
timer_tick(message_context_t *ctp, int code, unsigned flags, void *handle) {

    union sigval             value = ctp->msg->pulse.value;
    /*
     *  Do some useful work on every timer firing
     *  ....
     */
    printf("received timer event, value %d\n", value.sival_int);
    return 0;
}

int
message_handler(message_context_t *ctp, int code, unsigned flags, void *handle) {
    printf("received private message, type %d\n", code);
    return 0;
}

int
main(int argc, char **argv) {
    thread_pool_attr_t    pool_attr;
    resmgr_attr_t         resmgr_attr;
    struct sigevent       event;
    struct _itimer        itime;
    dispatch_t            *dpp;
    thread_pool_t         *tpp;
    resmgr_context_t      *ctp;
    int                   timer_id;
    int                   id;


    if((dpp = dispatch_create()) == NULL) {
        fprintf(stderr,
                "%s: Unable to allocate dispatch handle.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    memset(&pool_attr, 0, sizeof pool_attr);
    pool_attr.handle = dpp;
    /*  We are doing resmgr and pulse-type attaches.
     *
     *  If you're going to use custom messages or pulses with 
     *  the message_attach() or pulse_attach() functions,
     *  then you MUST use the dispatch functions 
     *  (i.e. dispatch_block(),  dispatch_handler(), ...),
     *  NOT the resmgr functions (resmgr_block(), resmgr_handler()).
     */
    pool_attr.context_alloc = dispatch_context_alloc;
    pool_attr.block_func = dispatch_block; 
    pool_attr.unblock_func = dispatch_unblock;
    pool_attr.handler_func = dispatch_handler;
    pool_attr.context_free = dispatch_context_free;
    pool_attr.lo_water = 2;
    pool_attr.hi_water = 4;
    pool_attr.increment = 1;
    pool_attr.maximum = 50;

    if((tpp = thread_pool_create(&pool_attr, POOL_FLAG_EXIT_SELF)) == NULL) {
        fprintf(stderr, "%s: Unable to initialize thread pool.\n",argv[0]);
        return EXIT_FAILURE;
    }

    iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_func, _RESMGR_IO_NFUNCS,
                     &io_func);
    iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);
        
    memset(&resmgr_attr, 0, sizeof resmgr_attr);
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    if((id = resmgr_attach(dpp, &resmgr_attr, "/dev/sample", _FTYPE_ANY, 0,
                 &connect_func, &io_func, &attr)) == -1) {
        fprintf(stderr, "%s: Unable to attach name.\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* We want to handle our own private messages, of type 0x5000 to 0x5fff */
    if(message_attach(dpp, NULL, 0x5000, 0x5fff, &message_handler, NULL) == -1) {
        fprintf(stderr, "Unable to attach to private message range.\n");
        return EXIT_FAILURE;
    }

    /* Initialize an event structure, and attach a pulse to it */
    if((event.sigev_code = pulse_attach(dpp, MSG_FLAG_ALLOC_PULSE, 0, &timer_tick,
                                        NULL)) == -1) {
        fprintf(stderr, "Unable to attach timer pulse.\n");
        return EXIT_FAILURE;
    }

    /* Connect to our channel */
    if((event.sigev_coid = message_connect(dpp, MSG_FLAG_SIDE_CHANNEL)) == -1) {
        fprintf(stderr, "Unable to attach to channel.\n");
        return EXIT_FAILURE;
    }

    event.sigev_notify = SIGEV_PULSE;
    event.sigev_priority = -1;
    /* We could create several timers and use different sigev values for each */
    event.sigev_value.sival_int = 0;

    if((timer_id = TimerCreate(CLOCK_REALTIME, &event)) == -1) {
        fprintf(stderr, "Unable to create timer.\n");
        return EXIT_FAILURE;
    }

    /* And now set up our timer to fire every second */
    itime.nsec = 1000000000;
    itime.interval_nsec = 1000000000;
    TimerSettime(timer_id, 0, &itime, NULL);

    /* Never returns */
    thread_pool_start(tpp);
}

We can either define our own pulse code (e.g. #define OurPulseCode 57), or we can ask the pulse_attach() function to dynamically generate one for us (and return the pulse code value as the return code from pulse_attach()) by specifying MSG_FLAG_ALLOC_PULSE in the flags argument, as the example above does.

See the pulse_attach(), MsgSendPulse(), MsgDeliverEvent(), and MsgReceive() functions in the Library Reference for more information on receiving and generating pulses.

Handling open(), dup(), and close() messages

The resource manager library provides another convenient service for us: it knows how to handle dup() messages.

Suppose that the client executed code that eventually ended up performing:

fd = open ("/dev/sample", O_RDONLY);
...
fd2 = dup (fd);
...
fd3 = dup (fd);
...
close (fd3);
...
close (fd2);
...
close (fd);

Our resource manager would get an _IO_CONNECT message for the first open(), followed by two _IO_DUP messages for the two dup() calls. Then, when the client executed the close() calls, we would get three _IO_CLOSE messages.

Since the dup() functions generate duplicates of the file descriptors, we don't want to allocate new OCBs for each one. And since we're not allocating new OCBs for each dup(), we don't want to release the memory in each _IO_CLOSE message when the _IO_CLOSE messages arrive! If we did that, the first close would wipe out the OCB.

The resource manager library knows how to manage this for us; it keeps count of the number of _IO_DUP and _IO_CLOSE messages sent by the client. Only on the last _IO_CLOSE message will the library synthesize a call to our _IO_CLOSE_OCB handler.
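The counting that the library does on our behalf can be modelled with a simple reference count. The type sk_ocb_t and the functions below are hypothetical; the sketch shows only the idea that the release step runs once, on the last close.

```c
typedef struct {
    int links;     /* number of descriptors that refer to this OCB */
    int released;  /* set once the close_ocb step has run          */
} sk_ocb_t;

/* open(): one descriptor refers to a freshly set up OCB. */
static void sk_open(sk_ocb_t *ocb)
{
    ocb->links = 1;
    ocb->released = 0;
}

/* dup(): another descriptor shares the same OCB. */
static void sk_dup(sk_ocb_t *ocb)
{
    ocb->links++;
}

/* close(): returns 1 only when the last descriptor goes away. */
static int sk_close(sk_ocb_t *ocb)
{
    if (--ocb->links > 0)
        return 0;
    ocb->released = 1;  /* stands in for calling the close_ocb handler */
    return 1;
}
```

For the open(), dup(), dup() sequence shown earlier, the first two close() calls leave the OCB alone and only the third one releases it.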


Note: Most users of the library will want to have the default functions manage the _IO_DUP and _IO_CLOSE messages; you'll most likely never override the default actions.

Handling client unblocking due to signals or timeouts

Another convenient service that the resource manager library provides for us is unblocking.

When a client issues a request (e.g. read()), this translates (via the client's C library) into a MsgSend() to our resource manager. The MsgSend() is a blocking call. If the client receives a signal during the time that the MsgSend() is outstanding, our resource manager needs to have some indication of this so that it can abort the request.

Because the library set the _NTO_CHF_UNBLOCK flag when it called ChannelCreate(), we'll receive a pulse whenever the client tries to unblock from a MsgSend() that we have MsgReceive()'d.

As an aside, recall that in the Neutrino messaging model the client can be in one of two states as a result of calling MsgSend(). If the server hasn't yet received the message (via the server's MsgReceive()), the client is in a SEND-blocked state -- the client is waiting for the server to receive the message. When the server has actually received the message, the client transits to a REPLY-blocked state -- the client is now waiting for the server to reply to the message (via MsgReply()).

When this happens and the pulse is generated, the resource manager library handles the pulse message and synthesizes an _IO_UNBLOCK message.

Looking through the resmgr_io_funcs_t and the resmgr_connect_funcs_t structures (see the Library Reference), you'll notice that there are actually two unblock message handlers: one in the I/O functions structure and one in the connect functions structure.

Why two? Because we may get an abort in one of two places. We can get the abort pulse right after the client has sent the _IO_CONNECT message (but before we've replied to it), or we can get the abort during an I/O message.

Once we've performed the handling of the _IO_CONNECT message, the I/O functions' unblock member will be used to service an unblock pulse. Therefore, if you're supplying your own io_open handler, be sure to set up all relevant fields in the OCB before you call resmgr_open_bind(); otherwise, your I/O functions' version of the unblock handler may get called with invalid data in the OCB. (Note that this issue of abort pulses "during" message processing arises only if there are multiple threads running in your resource manager. If there's only one thread, then the messages will be serialized by the library's MsgReceive() function.)

The effect of this is that if the client is SEND-blocked, the server doesn't need to know that the client is aborting the request, because the server hasn't yet received it.

Only in the case where the server has received the request and is performing processing on that request does the server need to know that the client now wishes to abort.

For more information on these states and their interactions, see the MsgSend(), MsgReceive(), MsgReply(), and ChannelCreate() functions in the Library Reference; see also the chapter on Interprocess Communication in the System Architecture book.

If you're overriding the default unblock handler, you should always call the default handler to process any generic unblocking cases first. For example:

if((status = iofunc_unblock_default(...)) != _RESMGR_DEFAULT) {
    return status;
}

/* Do your own thing to look for a client to unblock */

This ensures that any client waiting on a resource manager list (such as an advisory lock list) will be unblocked if possible.

Handling interrupts

Resource managers that manage an actual hardware resource will likely need to handle interrupts generated by the hardware. For a detailed discussion on strategies for interrupt handlers, see the chapter on Writing an Interrupt Handler in this book.

How do interrupt handlers relate to resource managers? When a significant event happens within the interrupt handler, the handler needs to inform a thread in the resource manager. This is usually done via a pulse (discussed in the "Handling private messages and pulses" section), but it can also be done with the SIGEV_INTR event notification type. Let's look at this in more detail.

When the resource manager starts up, it transfers control to thread_pool_start(). This function may or may not return, depending on the flags passed to thread_pool_create() (if you don't pass any flags, the function returns after the thread pool is created). This means that if you're going to set up an interrupt handler, you should do so before starting the thread pool, or use one of the strategies we discussed above (such as starting a thread for your entire resource manager).

However, if you're going to use the SIGEV_INTR event notification type, there's a catch -- the thread that attaches the interrupt (via InterruptAttach() or InterruptAttachEvent()) must be the same thread that calls InterruptWait().

Sample code for handling interrupts

Here's an example that includes relevant portions of the interrupt service routine and the handling thread:

#define INTNUM 0
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>     /* memset() */
#include <pthread.h>    /* pthread_create() */
#include <sys/iofunc.h>
#include <sys/dispatch.h>
#include <sys/neutrino.h>

static resmgr_connect_funcs_t   connect_funcs;
static resmgr_io_funcs_t        io_funcs;
static iofunc_attr_t            attr;

void *
interrupt_thread (void * data)
{
    struct sigevent event;
    int             id;

    /* fill in "event" structure */
    memset(&event, 0, sizeof(event));
    event.sigev_notify = SIGEV_INTR;

    /* Obtain I/O privileges */
    ThreadCtl( _NTO_TCTL_IO, 0 );

    /* INTNUM is the desired interrupt level */
    id = InterruptAttachEvent (INTNUM, &event, 0);

    /*... insert your code here ... */

    while (1) {
        InterruptWait (0, NULL);
        /*  do something about the interrupt,
         *  perhaps updating some shared
         *  structures in the resource manager 
         *
         *  unmask the interrupt when done
         */
        InterruptUnmask(INTNUM, id);
    }
}

int
main(int argc, char **argv) {
    thread_pool_attr_t    pool_attr;
    resmgr_attr_t         resmgr_attr;
    dispatch_t            *dpp;
    thread_pool_t         *tpp;
    int                   id;


    if((dpp = dispatch_create()) == NULL) {
        fprintf(stderr,
                "%s: Unable to allocate dispatch handle.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    memset(&pool_attr, 0, sizeof pool_attr);
    pool_attr.handle = dpp; 
    pool_attr.context_alloc = dispatch_context_alloc; 
    pool_attr.block_func = dispatch_block;  
    pool_attr.unblock_func = dispatch_unblock; 
    pool_attr.handler_func = dispatch_handler; 
    pool_attr.context_free = dispatch_context_free;
    pool_attr.lo_water = 2;
    pool_attr.hi_water = 4;
    pool_attr.increment = 1;
    pool_attr.maximum = 50;

    if((tpp = thread_pool_create(&pool_attr, 
                                 POOL_FLAG_EXIT_SELF)) == NULL) {
        fprintf(stderr, "%s: Unable to initialize thread pool.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs,
                     _RESMGR_IO_NFUNCS, &io_funcs);
    iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);
        
    memset(&resmgr_attr, 0, sizeof resmgr_attr);
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    if((id = resmgr_attach(dpp, &resmgr_attr, "/dev/sample", 
                           _FTYPE_ANY, 0,
                 &connect_funcs, &io_funcs, &attr)) == -1) {
        fprintf(stderr, "%s: Unable to attach name.\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* Start the thread that will handle interrupt events. */
    pthread_create (NULL, NULL, interrupt_thread, NULL);

    /* Never returns */
    thread_pool_start(tpp);
}

Here the interrupt_thread() function uses InterruptAttachEvent() to bind the interrupt source (INTNUM) to the event (passed in event), and then waits for the event to occur.

This approach has a major advantage over using a pulse. A pulse is delivered as a message to the resource manager, which means that if the resource manager's message-handling threads are busy processing requests, the pulse will be queued until a thread does a MsgReceive().

With the InterruptWait() approach, if the thread that's executing the InterruptWait() is of sufficient priority, it unblocks and runs immediately after the SIGEV_INTR is generated.

Multi-threaded resource managers

Multi-threaded resource manager example

Let's look at our multi-threaded resource manager example in more detail:

#include <errno.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>     /* memset() */
#include <unistd.h>

/*
 *  define THREAD_POOL_PARAM_T such that we can avoid a compiler
 *  warning when we use the dispatch_*() functions below
 */
#define THREAD_POOL_PARAM_T dispatch_context_t

#include <sys/iofunc.h>
#include <sys/dispatch.h>

static resmgr_connect_funcs_t    connect_funcs;
static resmgr_io_funcs_t         io_funcs;
static iofunc_attr_t             attr;

int
main(int argc, char **argv)
{
    /* declare variables we'll be using */
    thread_pool_attr_t   pool_attr;
    resmgr_attr_t        resmgr_attr;
    dispatch_t           *dpp;
    thread_pool_t        *tpp;
    dispatch_context_t   *ctp;
    int                  id;

    /* initialize dispatch interface */
    if((dpp = dispatch_create()) == NULL) {
        fprintf(stderr,
                "%s: Unable to allocate dispatch handle.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    /* initialize resource manager attributes */
    memset(&resmgr_attr, 0, sizeof resmgr_attr);
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    /* initialize functions for handling messages */
    iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs, 
                     _RESMGR_IO_NFUNCS, &io_funcs);

    /* initialize attribute structure used by the device */
    iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);

    /* attach our device name */
    id = resmgr_attach(
            dpp,            /* dispatch handle        */
            &resmgr_attr,   /* resource manager attrs */
            "/dev/sample",  /* device name            */
            _FTYPE_ANY,     /* open type              */
            0,              /* flags                  */
            &connect_funcs, /* connect routines       */
            &io_funcs,      /* I/O routines           */
            &attr);         /* handle                 */
    if(id == -1) {
        fprintf(stderr, "%s: Unable to attach name.\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* initialize thread pool attributes */
    memset(&pool_attr, 0, sizeof pool_attr);
    pool_attr.handle = dpp;
    pool_attr.context_alloc = dispatch_context_alloc;
    pool_attr.block_func = dispatch_block;
    pool_attr.unblock_func = dispatch_unblock;
    pool_attr.handler_func = dispatch_handler;
    pool_attr.context_free = dispatch_context_free;
    pool_attr.lo_water = 2;
    pool_attr.hi_water = 4;
    pool_attr.increment = 1;
    pool_attr.maximum = 50;

    /* allocate a thread pool handle */
    if((tpp = thread_pool_create(&pool_attr, 
                                 POOL_FLAG_EXIT_SELF)) == NULL) {
        fprintf(stderr, "%s: Unable to initialize thread pool.\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    /* start the threads, will not return */
    thread_pool_start(tpp);
}

The thread pool attribute (pool_attr) controls various aspects of the thread pool, such as which functions get called when a new thread is started or dies, the total number of worker threads, the minimum number, and so on.

Thread pool attributes

Here's the _thread_pool_attr structure:

typedef struct _thread_pool_attr {
  THREAD_POOL_HANDLE_T  *handle;
  THREAD_POOL_PARAM_T   *(*block_func)(THREAD_POOL_PARAM_T *ctp);
  void                  (*unblock_func)(THREAD_POOL_PARAM_T *ctp);
  int                   (*handler_func)(THREAD_POOL_PARAM_T *ctp);
  THREAD_POOL_PARAM_T   *(*context_alloc)(
                            THREAD_POOL_HANDLE_T *handle);
  void                  (*context_free)(THREAD_POOL_PARAM_T *ctp);
  pthread_attr_t        *attr;
  unsigned short        lo_water;
  unsigned short        increment;
  unsigned short        hi_water;
  unsigned short        maximum;
  unsigned              reserved[8];
} thread_pool_attr_t;

The functions that you fill into the above structure can be taken from the dispatch layer (dispatch_block(), ...), the resmgr layer (resmgr_block(), ...) or they can be of your own making. If you're not using the resmgr layer functions, then you'll have to define THREAD_POOL_PARAM_T to some sort of context structure for the library to pass between the various functions. By default, it's defined as a resmgr_context_t but since this sample is using the dispatch layer, we needed it to be a dispatch_context_t. We defined THREAD_POOL_PARAM_T prior to doing the includes above since the header files refer to it.

Part of the above structure contains information telling the resource manager library how you want it to handle multiple threads (if at all). During development, you should design your resource manager with multiple threads in mind. But during testing, you'll most likely have only one thread running (to simplify debugging). Later, after you've ensured that the base functionality of your resource manager is stable, you may wish to "turn on" multiple threads and revisit the debug cycle.

The following members control the number of threads that are running:

lo_water
Minimum number of blocked threads.
increment
Number of threads to create at a time to achieve lo_water.
hi_water
Maximum number of blocked threads.
maximum
Maximum total number of threads that can exist at any time.

The important parameters specify the maximum thread count and the increment. The value for maximum should ensure that there's always a thread in a RECEIVE-blocked state. If you're at the number of maximum threads, then your clients will block until a free thread is ready to receive data. The value you specify for increment will cut down on the number of times your driver needs to create threads. It's probably wise to err on the side of creating more threads and leaving them around rather than have them being created/destroyed all the time.

You determine the number of threads you want to be RECEIVE-blocked on the MsgReceive() at any time by filling in the lo_water parameter.

If you ever have fewer than lo_water threads RECEIVE-blocked, the increment parameter specifies how many threads should be created at once, so that at least lo_water number of threads are once again RECEIVE-blocked.

Once the threads are done with their processing, they will return to the block function. The hi_water variable specifies an upper limit to the number of threads that are RECEIVE-blocked. Once this limit is reached, the threads will destroy themselves to ensure that no more than hi_water number of threads are RECEIVE-blocked.

To prevent the number of threads from increasing without bounds, the maximum parameter limits the absolute maximum number of threads that will ever run simultaneously.

When threads are created by the resource manager library, they'll have a stack size as specified by the thread_stack_size parameter. If you want to specify stack size or priority, fill in pool_attr.attr with a proper pthread_attr_t pointer.

The thread_pool_attr_t structure contains pointers to several functions:

block_func()
Called by the worker thread when it needs to block waiting for some message.
handler_func()
Called by the thread when it has unblocked because it received a message. This function processes the message.
context_alloc()
Called when a new thread is created. Returns a context that this thread uses to do its work.
context_free()
Called to free the context when the worker thread exits.
unblock_func()
Called by the library to shut down the thread pool or change the number of running threads.

Thread pool functions

The library provides the following thread pool functions:

thread_pool_create()
Initializes the pool context. Returns a thread pool handle (tpp) that's used to start the thread pool.
thread_pool_start()
Starts the thread pool. This function may or may not return, depending on the flags passed to thread_pool_create().
thread_pool_destroy()
Destroys a thread pool.
thread_pool_control()
Controls the number of threads.

Note:

In the example provided in the multi-threaded resource managers section, thread_pool_start(tpp) never returns because we set the POOL_FLAG_EXIT_SELF bit. With the POOL_FLAG_USE_SELF flag, thread_pool_start() also never returns, but the current thread becomes part of the thread pool.


If no flags are passed (i.e. 0 instead of any flags), the function returns after the thread pool is created.

Filesystem resource managers

Considerations for filesystem resource managers

Since a filesystem resource manager may potentially receive long pathnames, it must be able to parse and handle each component of the path properly.

Let's say that a resource manager registers the mountpoint /mount/, and a user types:

ls -l /mount/home

where /mount/home is a directory on the device.

ls does the following:

d = opendir("/mount/home");
while (...) {
    dirent = readdir(d);
    ...
}

Taking over more than one device

If we wanted our resource manager to handle multiple devices, the change is really quite simple. We would call resmgr_attach() for each device name we wanted to register. We would also pass in an attributes structure that was unique to each registered device, so that functions like chmod() would be able to modify the attributes associated with the correct resource.

Here are the modifications necessary to handle both /dev/sample1 and /dev/sample2:

/* 
 *  MOD [1]:  allocate multiple attribute structures,
 *            and fill in a names array (convenience)
 */

#define NumDevices  2
iofunc_attr_t     sample_attrs [NumDevices];
char              *names [NumDevices] =
{
    "/dev/sample1",
    "/dev/sample2"
};

main ()
{
    ...
    /*
     *  MOD [2]:  fill in the attribute structure for each device 
     *           and call resmgr_attach for each device           
     */
    for (i = 0; i < NumDevices; i++) {
        iofunc_attr_init (&sample_attrs [i],
                          S_IFCHR | 0666, NULL, NULL);
        pathID = resmgr_attach (dpp, &resmgr_attr, names[i],
                                 _FTYPE_ANY, 0,
                                 &my_connect_funcs,
                                 &my_io_funcs,
                                 &sample_attrs [i]);
    }
    ...
}                                    

The first modification simply declares an array of attributes, so that each device has its own attributes structure. As a convenience, we've also declared an array of names to simplify passing the name of the device in the for loop. Some resource managers (such as devc-ser8250) construct the device names on the fly or fetch them from the command line.

The second modification initializes the array of attribute structures and then calls resmgr_attach() multiple times, once for each device, passing in a unique name and a unique attribute structure.

Those are all the changes required. Nothing in our io_read() or io_write() functions has to change -- the iofunc-layer default functions will gracefully handle the multiple devices.

Handling directories

Up until this point, our discussion has focused on resource managers that associate each device name via discrete calls to resmgr_attach(). We've shown how to "take over" a single pathname. (Our examples have used pathnames under /dev, but there's no reason you couldn't take over any other pathnames, e.g. /MyDevice.)

A typical resource manager can take over any number of pathnames. A practical limit, however, is on the order of a hundred -- the real limit is a function of memory size and lookup speed in the process manager.

What if you wanted to take over thousands or even millions of pathnames?

The most straightforward method of doing this is to take over a pathname prefix and manage a directory structure below that prefix (or mountpoint).

Here are some examples of resource managers that may wish to do this:

And those are just the most obvious ones. The reasons (and possibilities) are almost endless.

The common characteristic of these resource managers is that they all implement filesystems. A filesystem resource manager differs from the "device" resource managers (that we have shown so far) in the following key areas:

  1. The _RESMGR_FLAG_DIR flag in resmgr_attach() informs the library that the resource manager will accept matches at or below the defined mountpoint.
  2. The _IO_CONNECT logic has to check the individual pathname components against permissions and access authorizations. It must also ensure that the proper attribute is bound when a particular filename is accessed.
  3. The _IO_READ logic has to return the data for either the "file" or "directory" specified by the pathname.

Let's look at these points in turn.

Matching at or below a mountpoint

When we specified the flags argument to resmgr_attach() for our sample resource manager, we specified a 0, implying that the library should "use the defaults."

If we specified the value _RESMGR_FLAG_DIR instead of 0, the library would allow the resolution of pathnames at or below the specified mountpoint.

The _IO_OPEN message for filesystems

Once we've specified a mountpoint, it would then be up to the resource manager to determine a suitable response to an open request. Let's assume that we've defined a mountpoint of /sample_fsys for our resource manager:

pathID = resmgr_attach
             (dpp,
             &resmgr_attr,
             "/sample_fsys",     /* mountpoint */
             _FTYPE_ANY,
             _RESMGR_FLAG_DIR,   /* it's a directory */
             &connect_funcs,
             &io_funcs,
             &attr);

Now when the client performs a call like this:

fopen ("/sample_fsys/spud", "r");

we receive an _IO_CONNECT message, and our io_open handler will be called. Since we haven't yet looked at the _IO_CONNECT message in depth, let's take a look now:

struct _io_connect {
    unsigned short  type;
    unsigned short  subtype;     /* _IO_CONNECT_*              */
    unsigned long   file_type;   /* _FTYPE_* in sys/ftype.h    */
    unsigned short  reply_max;
    unsigned short  entry_max;
    unsigned long   key;
    unsigned long   handle;
    unsigned long   ioflag;      /* O_* in fcntl.h, _IO_FLAG_* */
    unsigned long   mode;        /* S_IF* in sys/stat.h        */
    unsigned short  sflag;       /* SH_* in share.h            */
    unsigned short  access;      /* S_I in sys/stat.h          */
    unsigned short  zero;
    unsigned short  path_len;
    unsigned char   eflag;       /* _IO_CONNECT_EFLAG_*        */
    unsigned char   extra_type;  /* _IO_EXTRA_*                */
    unsigned short  extra_len;
    unsigned char   path[1];     /* path_len, null, extra_len  */
};

Looking at the relevant fields, we see ioflag, mode, sflag, and access, which tell us how the resource was opened.

The path_len parameter tells us how many bytes the pathname takes; the actual pathname appears in the path parameter. Note that the pathname that appears is not /sample_fsys/spud, as you might expect, but instead is just spud -- the message contains only the pathname relative to the resource manager's mountpoint. This simplifies coding because you don't have to skip past the mountpoint name each time, the code doesn't have to know what the mountpoint is, and the messages will be a little bit shorter.

Note also that the pathname will never have relative (. and ..) path components, nor redundant slashes (e.g. spud//stuff) in it -- these are all resolved and removed by the time the message is sent to the resource manager.

When writing filesystem resource managers, we encounter additional complexity when dealing with the pathnames. For verification of access, we need to break apart the passed pathname and check each component. You can use strtok() and friends to break apart the string, and then there's iofunc_check_access(), a convenient iofunc-layer call that performs the access verification of pathname components leading up to the target. (See the Library Reference page for iofunc_open() for information detailing the steps needed for this level of checking.)
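
Breaking the pathname apart might look something like the following sketch. It uses strcspn() rather than strtok() so that the path isn't modified and no static state is shared between threads. The sketch_check_fn callback type, sketch_walk_path(), and sketch_count_and_deny() are hypothetical names; the callback merely stands in for whatever per-component access check your resource manager performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical per-component check. Receives a component (not
 * NUL-terminated), its length, and whether it's the final component.
 * Returns 0 to continue or an errno value to stop the walk.
 */
typedef int (*sketch_check_fn)(const char *name, size_t len,
                               int is_last, void *arg);

/* Walk a mountpoint-relative path such as "home/user/file". */
int sketch_walk_path(const char *path, sketch_check_fn check, void *arg)
{
    assert(path != NULL && check != NULL);

    const char *p = path;
    while (*p != '\0') {
        size_t len = strcspn(p, "/");   /* length of this component */
        const char *next = p + len;
        if (*next == '/') {
            next++;                     /* step over the separator */
        }
        if (len > 0) {
            int err = check(p, len, *next == '\0', arg);
            if (err != 0) {
                return err;             /* first failure ends the walk */
            }
        }
        p = next;
    }
    return 0;
}

/* Example callback: counts components and refuses one named "private". */
int sketch_count_and_deny(const char *name, size_t len,
                          int is_last, void *arg)
{
    (void)is_last;
    (*(int *)arg)++;
    if (len == 7 && memcmp(name, "private", 7) == 0) {
        return EACCES;
    }
    return 0;
}
```

For the path home/private/file, the walk stops at the second component and returns EACCES, so the target itself is never examined.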


Note: The binding that takes place after the name is validated requires that every path that's handled has its own attribute structure passed to iofunc_open_default(). Unexpected behavior will result if the wrong attribute is bound to the pathname that's provided.

Returning directory entries from _IO_READ

When the _IO_READ handler is called, it may need to return data for either a file (if S_ISDIR (ocb->attr->mode) is false) or a directory (if S_ISDIR (ocb->attr->mode) is true). We've seen the algorithm for returning data, especially the method for matching the returned data's size to the smaller of the data available or the client's buffer size.

A similar constraint is in effect for returning directory data to a client, except we have the added issue of returning block-integral data. What this means is that instead of returning a stream of bytes, where we can arbitrarily package the data, we're actually returning a number of struct dirent structures. (In other words, we can't return 1.5 of those structures; we always have to return an integral number.)

A struct dirent looks like this:

struct dirent {
    ino_t           d_ino;
    off_t           d_offset;
    unsigned short  d_reclen;
    unsigned short  d_namelen;
    char            d_name [NAME_MAX + 1];
};

The d_ino member contains a mountpoint-unique file serial number. This serial number is often used in various disk-checking utilities for such operations as determining infinite-loop directory links. (Note that the inode value cannot be zero, which would indicate that the inode represents an unused entry.)

The d_offset member is typically used to identify the directory entry itself. For a disk-based filesystem, this value might be the actual offset into the on-disk directory structure.

Other implementations may assign a directory entry index number (0 for the first directory entry in that directory, 1 for the next, and so on). The only constraint is that the numbering scheme used must be consistent between the _IO_LSEEK message handler and the _IO_READ message handler.

For example, if you've chosen to have d_offset represent a directory entry index number, this means that if an _IO_LSEEK message causes the current offset to be changed to 7, and then an _IO_READ request arrives, you must return directory information starting at directory entry number 7.
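
Assuming the entry-index scheme, the following sketch shows how a read handler might decide which entries to return: it starts at the current offset and packs only whole entries into the client's buffer. The function sketch_entries_that_fit() and its parameters are hypothetical; a real handler would then build the reply from the selected entries and advance the offset by the returned count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * reclen[i] is the record length of directory entry i. Starting at
 * entry index "start" (the current offset), count how many complete
 * entries fit in a client buffer of "nbytes" bytes. The number of
 * bytes those entries occupy is stored in *bytes_out.
 */
size_t sketch_entries_that_fit(const size_t *reclen, size_t nentries,
                               size_t start, size_t nbytes,
                               size_t *bytes_out)
{
    assert(reclen != NULL && bytes_out != NULL);

    size_t used = 0;
    size_t count = 0;
    for (size_t i = start; i < nentries; i++) {
        if (used + reclen[i] > nbytes) {
            break;              /* never return a partial entry */
        }
        used += reclen[i];
        count++;
    }
    *bytes_out = used;
    return count;
}
```

A return value of 0 with start equal to nentries means the end of the directory has been reached.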

The d_reclen member contains the size of this directory entry and any other associated information (such as an optional struct stat structure appended to the struct dirent entry; see below).

The d_namelen member indicates the size of the d_name member, which holds the actual name of that directory entry. (Since the size is calculated using strlen(), the \0 string terminator, which must be present, is not counted.)

So in our io_read handler, we need to generate a number of struct dirent entries and return them to the client. If we have a cache of directory entries that we maintain in our resource manager, it's a simple matter to construct a set of IOVs to point to those entries. If we don't have a cache, then we must manually assemble the directory entries into a buffer and then return an IOV that points to that.

Returning information associated with a directory structure

Instead of returning just the struct dirent in the _IO_READ message, you can also return a struct stat. Although this will improve efficiency, returning the struct stat is entirely optional. If you don't return one, the users of your device will then have to call the stat() function to get that information. (This is basically a usage question. If your device is typically used in such a way that readdir() is called, and then stat() is called, it will be more efficient to return both. See the documentation for readdir() in the Library Reference for more information.)

The extra struct stat information is returned after each directory entry:


Figure: Directory structure info. Returning the optional struct stat along with the struct dirent entry can improve efficiency.


Note: The struct stat must be aligned on an 8-byte boundary. The d_reclen member of the struct dirent must contain the size of both structures, including any filler necessary for alignment.
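
The d_reclen arithmetic can be sketched as follows. To keep the numbers predictable, the example uses a hypothetical fixed-width header (sketch_dirent_hdr_t) that only mirrors the order of the members shown above; the real struct dirent layout is whatever your system headers define. It also assumes that each record starts on an 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dirent with fixed-width members. */
typedef struct sketch_dirent_hdr {
    uint64_t d_ino;
    int64_t  d_offset;
    uint16_t d_reclen;
    uint16_t d_namelen;
    char     d_name[1];   /* name bytes follow, NUL-terminated */
} sketch_dirent_hdr_t;

/* Round n up to the next multiple of 8. */
size_t sketch_align8(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/*
 * Record length for a name of "namelen" characters, followed by an
 * optional stat block of "stat_size" bytes (pass 0 for none). The
 * dirent part is padded so that the stat block starts on an 8-byte
 * boundary, and the stat part is padded so the next record does too.
 */
size_t sketch_reclen(size_t namelen, size_t stat_size)
{
    assert(namelen > 0);
    size_t dirent_part =
        offsetof(sketch_dirent_hdr_t, d_name) + namelen + 1;
    return sketch_align8(dirent_part) + sketch_align8(stat_size);
}
```

In this layout the name starts at byte 20, so the four-character name spud needs 25 bytes, which is padded to 32 before any stat information is appended.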

Message types

Generally, a resource manager receives two types of messages: connect messages and I/O messages.

Connect messages

A connect message is issued by the client to perform an operation based on a pathname. This may be a message that establishes a longer term relationship between the client and the resource manager (e.g. open()), or it may be a message that is a "one-shot" event (e.g. rename()).

The library looks at the connect_funcs parameter (of type resmgr_connect_funcs_t -- see the Library Reference) and calls out to the appropriate function.

If the message is the _IO_CONNECT message (and variants) corresponding to the open() outcall, then a context needs to be established for further I/O messages that will be processed later. This context is referred to as an OCB (Open Control Block) -- it holds any information required between the connect message and subsequent I/O messages.

Basically, the OCB is a good place to keep information that needs to be stored on a per-open basis. An example of this would be the current position within a file. Each open file descriptor would have its own file position. The OCB is allocated on a per-open basis. During the open handling, you'd initialize the file position; during read and write handling, you'd advance the file position. For more information, see the section "The open control block (OCB) structure."
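
As a minimal sketch of the file-position example, the following code keeps an offset in a per-open structure and advances it on each read. The names sketch_pos_ocb_t and sketch_read_advance() are hypothetical, and the structure is far simpler than the real OCB described in that section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-open state: just the current file position. */
typedef struct sketch_pos_ocb {
    size_t offset;
} sketch_pos_ocb_t;

/*
 * Work out how many bytes a read of "want" bytes can return from a
 * resource of "resource_size" bytes, then advance the position kept
 * in the OCB. Returns the number of bytes to hand to the client,
 * which is 0 at end-of-file.
 */
size_t sketch_read_advance(sketch_pos_ocb_t *ocb, size_t resource_size,
                           size_t want)
{
    assert(ocb != NULL);

    size_t left = (ocb->offset < resource_size)
                      ? resource_size - ocb->offset
                      : 0;
    size_t n = (want < left) ? want : left;
    ocb->offset += n;
    return n;
}
```

Because each open() gets its own structure, two clients reading the same resource don't disturb each other's position.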

I/O messages

An I/O message is one that relies on an existing binding (e.g. OCB) between the client and the resource manager.

As an example, an _IO_READ (from the client's read() function) message depends on the client's having previously established an association (or context) with the resource manager by issuing an open() and getting back a file descriptor. This context, created by the open() call, is then used to process the subsequent I/O messages, like the _IO_READ.

There are good reasons for this design. It would be inefficient to pass the full pathname for each and every read() request, for example. The open() handler can also perform tasks that we want done only once (e.g. permission checks), rather than with each I/O message. Also, when the read() has read 4096 bytes from a disk file, there may be another 20 megabytes still waiting to be read. Therefore, the read() function would need to have some context information telling it the position within the file it's reading from, how much has been read, and so on.

The resmgr_io_funcs_t structure is filled in a manner similar to the connect functions structure resmgr_connect_funcs_t.

Notice that the I/O functions all have a common parameter list. The first entry is a resource manager context structure, the second is a message (the type of which matches the message being handled and contains parameters sent from the client), and the last is an OCB (containing what we bound when we handled the client's open() function).

Resource manager data structures

_resmgr_attr_t control structure

The resmgr_attr_t control structure (struct _resmgr_attr) contains at least the following:

typedef struct _resmgr_attr {
    unsigned            flags;
    unsigned            nparts_max;
    unsigned            msg_max_size;
    int                 (*other_func)(resmgr_context_t *,
                                      void *msg);
    unsigned            reserved[4];    
} resmgr_attr_t;
nparts_max
The number of components that should be allocated to the IOV array.
msg_max_size
The size of the message buffer.

These members will be important when you start writing your own handler functions.

If you specify a value of zero for nparts_max, the resource manager library will bump the value to the minimum usable by the library itself. Why would you want to set the size of the IOV array? As we've seen in the "Getting the resource manager library to do the reply" section, you can tell the resource manager library to do the replying for you. You may want to give it an IOV array that points to N buffers containing the reply data. But since you're asking the library to do the reply, you need to use its IOV array, which must be big enough to point to your N buffers.
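To see why the array size matters, consider this simplified model of a context that owns an IOV array. Everything here is hypothetical (demo_iov_t, demo_ctx_t, and the demo_*() functions are not library names), and the minimum of one entry used when nparts_max is zero is an assumption made for the sketch, not the library's documented value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one IOV entry: a buffer and its length. */
typedef struct {
    void  *base;
    size_t len;
} demo_iov_t;

/* Hypothetical context that owns the IOV array used for replies. */
typedef struct {
    demo_iov_t *iov;
    unsigned    nparts_max;
} demo_ctx_t;

/* Allocate the IOV array. Zero is bumped to 1 (assumed minimum). */
static int demo_ctx_init(demo_ctx_t *ctp, unsigned nparts_max)
{
    assert(ctp != NULL);
    if (nparts_max == 0)
        nparts_max = 1;
    ctp->iov = calloc(nparts_max, sizeof *ctp->iov);
    if (ctp->iov == NULL)
        return -1;
    ctp->nparts_max = nparts_max;
    return 0;
}

/* A handler that wants a two-part reply (header plus data) points two
 * IOV entries at its buffers and reports how many parts it filled.
 * It fails if the array is too small to hold both parts. */
static int demo_fill_reply(demo_ctx_t *ctp, void *hdr, size_t hlen,
                           void *data, size_t dlen)
{
    if (ctp->nparts_max < 2)
        return -1;
    ctp->iov[0].base = hdr;  ctp->iov[0].len = hlen;
    ctp->iov[1].base = data; ctp->iov[1].len = dlen;
    return 2;
}

/* Total number of bytes a reply built from nparts entries would carry. */
static size_t demo_reply_size(const demo_ctx_t *ctp, int nparts)
{
    size_t total = 0;
    for (int i = 0; i < nparts; i++)
        total += ctp->iov[i].len;
    return total;
}
```

With the default-sized array the two-part reply can't be described, which is the situation a larger nparts_max avoids.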

flags
Lets you change the behavior of the resource manager interface.
other_func
Lets you specify a routine to call in cases where the resource manager gets an I/O message that it doesn't understand. (In general, we don't recommend that you use this member. For more information, see the following section.) To attach an other_func, you must set the RESMGR_FLAG_ATTACH_OTHERFUNC flag.

If the resource manager library gets an I/O message that it doesn't know how to handle, it'll call the routine specified by the other_func member, if non-NULL. (If it's NULL, the resource manager library will return an ENOSYS to the client, effectively stating that it doesn't know what this message means.)

You might specify a non-NULL value for other_func if you've defined some form of custom messaging between clients and your resource manager. The recommended approach for custom messaging, however, is either the devctl() function call (client) with the _IO_DEVCTL message handler (server), or a MsgSend*() function call (client) with the _IO_MSG message handler (server).
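The fallback behavior described above can be modeled in a few lines. This is a sketch of the decision only, not the library's code: demo_msg_t, demo_dispatch(), demo_custom(), and the DEMO_MSG_* values are hypothetical, and the handler signature is simplified.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical message and handler types for the sketch. */
typedef struct { int type; } demo_msg_t;
typedef int (*demo_other_fn)(demo_msg_t *msg);

/* Hypothetical message types; the first two are "understood". */
enum { DEMO_MSG_READ = 1, DEMO_MSG_WRITE = 2, DEMO_MSG_CUSTOM = 99 };

/* Known types are handled directly. Anything else goes to other_func
 * if one was supplied; otherwise the client gets ENOSYS. */
static int demo_dispatch(demo_other_fn other_func, demo_msg_t *msg)
{
    assert(msg != NULL);
    switch (msg->type) {
    case DEMO_MSG_READ:
    case DEMO_MSG_WRITE:
        return 0;                    /* handled by the normal path */
    default:
        if (other_func != NULL)
            return other_func(msg);  /* give the fallback a chance */
        return ENOSYS;               /* nobody understands this message */
    }
}

/* Example fallback that accepts exactly one custom message type. */
static int demo_custom(demo_msg_t *msg)
{
    return msg->type == DEMO_MSG_CUSTOM ? 0 : ENOSYS;
}
```

Passing NULL for the fallback reproduces the default case, where an unrecognized message is answered with ENOSYS.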

For non-I/O message types, you should use the message_attach() function, which attaches a message range for the dispatch handle. When a message with a type in that range is received, the dispatch_block() function calls a user-supplied function that's responsible for doing any specific work, such as replying to the client.
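The idea of attaching a handler to a range of message types can be sketched as a small lookup table. This models the concept only: demo_dispatch_t, demo_attach(), demo_handle(), and demo_count_hits() are hypothetical and don't reproduce the signature of message_attach() or the internals of the dispatch layer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RANGES 4

/* Hypothetical handler: receives the message type and a user handle. */
typedef int (*demo_msg_fn)(int type, void *handle);

/* One attached range: inclusive bounds plus the function to call. */
typedef struct {
    int         low;
    int         high;
    demo_msg_fn func;
    void       *handle;
} demo_range_t;

/* Hypothetical dispatch handle holding the attached ranges. */
typedef struct {
    demo_range_t ranges[DEMO_MAX_RANGES];
    size_t       count;
} demo_dispatch_t;

/* Attach func to the inclusive range [low, high]. */
static int demo_attach(demo_dispatch_t *dpp, int low, int high,
                       demo_msg_fn func, void *handle)
{
    assert(dpp != NULL);
    if (dpp->count == DEMO_MAX_RANGES || low > high || func == NULL)
        return -1;
    dpp->ranges[dpp->count].low    = low;
    dpp->ranges[dpp->count].high   = high;
    dpp->ranges[dpp->count].func   = func;
    dpp->ranges[dpp->count].handle = handle;
    dpp->count++;
    return 0;
}

/* Call the handler whose range contains type, or return -1 if none does. */
static int demo_handle(const demo_dispatch_t *dpp, int type)
{
    for (size_t i = 0; i < dpp->count; i++) {
        const demo_range_t *r = &dpp->ranges[i];
        if (type >= r->low && type <= r->high)
            return r->func(type, r->handle);
    }
    return -1;
}

/* Example handler: counts how many messages it has seen. */
static int demo_count_hits(int type, void *handle)
{
    (void)type;
    ++*(int *)handle;
    return 0;
}
```

In this model the handler is responsible for all the work for its range, which mirrors the text's point that the user-supplied function does things like replying to the client.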