Appendix: Advanced Qnet Topics

This appendix includes:

Low-level discussion on Qnet principles

The Qnet protocol extends interprocess communication (IPC) transparently over a network of microkernels. It does this by taking advantage of Neutrino's message-passing paradigm. Message passing is the central theme of Neutrino: the kernel manages a group of cooperating processes by routing messages among them. This enhances the efficiency of transactions among processes throughout the system.

As we found out in the "How does it work?" section of the Transparent Distributed Processing Using Qnet chapter, many POSIX and other function calls are built on this message passing. For example, the write() function is built on the MsgSendv() function. In this section, you'll find out how Qnet works at the message-passing level, how node names are resolved to node numbers, and how that number is used to create a connection to a remote node.

To understand how message passing works, consider two processes that wish to communicate with each other: a client process and a server process. First, consider the single-node case, where both client and server reside on the same machine. In this case, the client simply creates a connection (via ConnectAttach()) to the server, and then sends a message (perhaps via MsgSend()).

The Qnet protocol extends this message passing over a network. For example, consider the case of a simple network with two machines: one contains the client process, the other contains the server process. The code required for client-server communication is identical (it uses the same API) to the code in the single-node case. The client creates a connection to the server and sends the server a message. The only difference in the network case is that the client specifies a different node descriptor for the ConnectAttach() function call in order to indicate the server's node. See the diagram below to understand how message passing works.

Message passing

Note: Each node in the network is assigned a unique name that becomes its identifier. This is what we call a node descriptor. This name is the only visible means to determine whether the OS is running as a network or as a standalone operating system.

Details of Qnet data communication

As mentioned before, Qnet relies on the message-passing paradigm of Neutrino. Before any message is passed, however, the application (e.g. the client) must establish a connection to the server using the low-level ConnectAttach() function call:

ConnectAttach(nd, pid, chid, index, flags);

In the above call, nd is the node descriptor that identifies each node uniquely. The node descriptor is the only visible means to determine whether Neutrino is running as a network or as a standalone operating system. If nd is zero, you're specifying a local server process, and you'll get local message passing from the client to the server, carried out by the local kernel as shown below:

Message passing in the same machine

When you specify a nonzero value for nd, the application transparently connects to a server on another machine and passes messages to it. This way, Qnet not only builds a network of trusted machines, it lets all these machines share their resources with little overhead.

Message passing in two different machines
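
To make the local and remote cases concrete, here is a minimal sketch in C. It assumes a server whose process ID and channel ID are already known. The send_request() helper is hypothetical, and the #else branch holds stand-ins (not the real kernel calls) so that the sketch also compiles on a non-Neutrino host. Passing _NTO_SIDE_CHANNEL as the index and 0 as the flags is a common choice that is assumed here, not something stated above.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h>   /* ConnectAttach(), MsgSend(), ConnectDetach() */
#include <sys/netmgr.h>     /* ND_LOCAL_NODE */
#else
/* Off-target stand-ins (NOT the real kernel calls) so that the sketch
 * compiles on a non-Neutrino host; they only record the node descriptor. */
#define ND_LOCAL_NODE     0
#define _NTO_SIDE_CHANNEL 0x40000000
static uint32_t stub_last_nd;
static int ConnectAttach(uint32_t nd, pid_t pid, int chid,
                         unsigned index, int flags)
{
    (void)pid; (void)chid; (void)index; (void)flags;
    stub_last_nd = nd;
    return 3;                           /* pretend connection ID */
}
static int MsgSend(int coid, const void *smsg, int sbytes,
                   void *rmsg, int rbytes)
{
    (void)coid; (void)smsg; (void)sbytes; (void)rmsg; (void)rbytes;
    return 0;
}
static int ConnectDetach(int coid) { (void)coid; return 0; }
#endif

/* Hypothetical helper: send one request to the server identified by
 * (nd, pid, chid) and wait for its reply.
 * Returns 0 on success, or -1 if the connection or the send failed. */
int send_request(uint32_t nd, pid_t pid, int chid,
                 const void *req, int req_len,
                 void *reply, int reply_len)
{
    /* nd == ND_LOCAL_NODE selects the local kernel; a nonzero nd
     * names a remote node, and Qnet carries the message. */
    int coid = ConnectAttach(nd, pid, chid, _NTO_SIDE_CHANNEL, 0);
    if (coid == -1) {
        perror("ConnectAttach");
        return -1;
    }

    long status = MsgSend(coid, req, req_len, reply, reply_len);
    if (status == -1) {
        perror("MsgSend");
    }

    ConnectDetach(coid);
    return status == -1 ? -1 : 0;
}
```

Calling send_request() with ND_LOCAL_NODE gives local message passing; calling it with a nonzero node descriptor runs exactly the same code against a server on another machine.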

The advantage of this approach lies in using the same API. The key design features are:

These features maximize performance for large payloads and minimize turnaround time for small packets.

Node descriptors

The <sys/netmgr.h> header file

The <sys/netmgr.h> header defines the ND_LOCAL_NODE macro as zero. You can use it any time that you're dealing with node descriptors to make it obvious that you're talking about the local node.

As discussed, node descriptors represent machines, but they also include Quality of Service information. If you want to see if two node descriptors refer to the same machine, you can't just arithmetically compare the descriptors for equality; use the ND_NODE_CMP() macro instead:

This is similar to the way that strcmp() and memcmp() work. It's done this way in case you want to do any sorting that's based on node descriptors.
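
As a rough illustration of comparing and sorting node descriptors, the following sketch wraps ND_NODE_CMP() in two hypothetical helpers, same_node() and sort_node_descriptors(). The macro in the #else branch is only an assumed stand-in so that the sketch compiles off-target; on Neutrino, the real definition comes from <sys/netmgr.h> and should be used unchanged.

```c
#include <stdlib.h>

#ifdef __QNXNTO__
#include <sys/netmgr.h>     /* the real ND_NODE_CMP() */
#else
/* Off-target stand-in, ASSUMED for illustration only: treat the low
 * 16 bits as the node and the remaining bits as Quality of Service. */
#define ND_NODE_CMP(a, b) (((a) & 0xffff) - ((b) & 0xffff))
#endif

/* Nonzero if both descriptors refer to the same machine,
 * regardless of any Quality of Service information they carry. */
int same_node(int nd1, int nd2)
{
    return ND_NODE_CMP(nd1, nd2) == 0;
}

/* qsort() comparator: relies on ND_NODE_CMP() returning a negative,
 * zero, or positive value, in the style of strcmp(). */
static int compare_nd(const void *a, const void *b)
{
    return ND_NODE_CMP(*(const int *)a, *(const int *)b);
}

/* Sort an array of node descriptors by the machine they refer to. */
void sort_node_descriptors(int *nds, size_t count)
{
    qsort(nds, count, sizeof nds[0], compare_nd);
}
```

A plain nd1 == nd2 test would treat two descriptors for the same machine as different whenever their Quality of Service parts differ, which is why the comparison goes through the macro.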

The <sys/netmgr.h> header file also defines the following networking functions:


int netmgr_strtond(const char *nodename, char **endstr);

This function converts the string pointed at by nodename into a node descriptor, which it returns. If there's an error, netmgr_strtond() returns -1 and sets errno. If the endstr parameter is non-NULL, netmgr_strtond() sets *endstr to point at the first character beyond the end of the node name. This function accepts all three forms of node name -- simple, directory, and FQNN (Fully Qualified NodeName). An FQNN identifies a Neutrino node using a name that's unique on the network; it consists of the node name and the node domain.


int netmgr_ndtostr(unsigned flags, 
                   int nd, 
                   char *buf, 
                   size_t maxbuf);

This function converts the given node descriptor into a string and stores it in the memory pointed to by buf. The size of the buffer is given by maxbuf. The function returns the actual length of the node name (even if the function had to truncate the name to get it to fit into the space specified by maxbuf), or -1 if an error occurs (errno is set).

The flags parameter controls the conversion process, indicating which pieces of the string are to be output. The following bits are defined:

Show or hide the network directory portion of the string. If you don't set either of these bits, the string includes the network directory portion if the node isn't in the default network directory.
Show or hide the quality of service portion of the string. If you don't specify either of these bits, the string includes the quality of service portion if it isn't the default QoS for the node.
Show or hide the node name portion of the string. If you don't specify either of these bits, the string includes the name if the node descriptor doesn't represent the local node.
Show or hide the node domain portion of the string. If you don't specify either of these bits, and a network directory portion is included in the string, the node domain is included if it isn't the default for the output network directory. If you don't specify either of these bits, and the network directory portion isn't included in the string, the node domain is included if the domain isn't in the default network directory.

By combining these bits, you can extract several useful pieces of information, for example:

A name that's useful for display purposes.
A name that you can pass to another node and know that it's referring to the same machine (i.e. the FQNN).
The default network directory.
The default Quality of Service for the node.
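
Because netmgr_ndtostr() reports the full length of the name even when it has to truncate, you can size a buffer in two passes. The sketch below is one possible way to do that. The names nd_name_alloc() and fake_ndtostr() are hypothetical, and the converter is passed in as a function pointer purely so that the helper can be tried off-target; on Neutrino you'd pass netmgr_ndtostr itself. The sketch also assumes that the reported length might not count the terminating NUL, so it allocates one extra byte.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as netmgr_ndtostr(), so the real function can be passed in. */
typedef int (*nd_to_str_fn)(unsigned flags, int nd, char *buf, size_t maxbuf);

#ifndef __QNXNTO__
/* Off-target stand-in that mimics the documented behavior: copy as much
 * of the name as fits, but always report the full length. */
static int fake_ndtostr(unsigned flags, int nd, char *buf, size_t maxbuf)
{
    static const char full[] = "a-rather-long-node-name.example";
    size_t full_len = sizeof full - 1;

    (void)flags; (void)nd;
    if (maxbuf > 0) {
        size_t n = full_len < maxbuf - 1 ? full_len : maxbuf - 1;
        memcpy(buf, full, n);
        buf[n] = '\0';
    }
    return (int)full_len;
}
#endif

/* Hypothetical helper: return the node name in a malloc()ed buffer that
 * is large enough to hold it untruncated, or NULL on error.
 * The caller frees the result. */
char *nd_name_alloc(nd_to_str_fn convert, unsigned flags, int nd)
{
    char probe[16];

    /* First pass: a deliberately small buffer, just to learn the length. */
    int len = convert(flags, nd, probe, sizeof probe);
    if (len < 0) {
        return NULL;
    }

    /* +1 in case the reported length excludes the terminating NUL. */
    size_t size = (size_t)len + 1;
    char *name = malloc(size);
    if (name == NULL) {
        return NULL;
    }

    /* Second pass: the buffer is now big enough for the whole name. */
    if (convert(flags, nd, name, size) < 0) {
        free(name);
        return NULL;
    }
    return name;
}
```

If you already know a generous upper bound for your node names, a single call with a fixed buffer and a check of the return value against maxbuf is simpler.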


int netmgr_remote_nd(int remote_nd, int local_nd);

This function takes the local_nd node descriptor (which is relative to this node) and returns a new node descriptor that refers to the same machine, but is valid only for the node identified by remote_nd. The function can return -1 in some cases (e.g. if the remote_nd machine can't talk to the local_nd machine).

Booting over the network


Unleash the power of Qnet to boot your computer (i.e. the client) over the network! You can do this when your machine doesn't have a local disk or large flash. To do so, you first need the GRUB executable. GRUB is a generic boot loader that runs at computer startup and is responsible for loading the OS into memory and starting its execution.

During booting, you need to load the GRUB executable into the memory of your machine, by using:

Neutrino doesn't ship GRUB. To get GRUB:

  1. Go to website.
  2. Download the GRUB executable.
  3. Create a floppy or CD with GRUB on it, or put the GRUB binary on the server for downloading by a network boot ROM.

Here's what the PXE boot ROM does to download the OS image:

Here's an example to show the different steps to boot your client using PXE boot ROM:

Creating directory and setting up configuration files

Create a new directory on your DHCP server machine called /tftpboot and run make install. Copy the pxegrub executable image from /opt/share/grub/i386-pc to the /tftpboot directory.

Modify the /etc/dhcpd.conf file to allow the network machine to download the pxegrub image and configuration menu, as follows:

# dhcpd.conf
# Sample configuration file for PXE dhcpd

subnet netmask {
  option broadcast-address;
  option domain-name-servers;
}

# Hosts which require special configuration options can be listed in
# host statements.   If no address is specified, the address will be
# allocated dynamically (if possible), but the host-specific information
# will still come from the host declaration.

host testpxe {
  hardware ethernet 00:E0:29:88:0D:D3;         # MAC address of system to boot
  fixed-address;                   # This line is optional
  option option-150 "(nd)/tftpboot/menu.1st";  # Tell grub to use Menu file
  filename "/tftpboot/pxegrub";                # Location of PXE grub image
}
# End dhcpd.conf

Note: If you're using an ISC 3 DHCP server, you may have to add a definition of code 150 at the top of the dhcpd.conf file as follows:
option pxe-menu code 150 = text;

Then instead of using option option-150, use:

option pxe-menu "(nd)/tftpboot/menu.1st";

Here's an example of the menu.1st file:

# menu.1st start

default 0                             # default OS image to load
timeout 3                             # seconds to pause before loading default image
title Neutrino Bios image             # text displayed in menu
kernel (nd)/tftpboot/bios.ifs         # OS image
title Neutrino ftp image              # text for second OS image
kernel (nd)/tftpboot/ftp.ifs          # 2nd OS image (optional)

# menu.1st end

Building an OS image

In this section, there is a functional buildfile that you can use to create an OS image that can be loaded by GRUB without a hard disk or any local storage.

Create the image by typing the following:

$ mkifs -vvv build.txt build.img
$ cp build.img /tftpboot

Here is the buildfile:

[virtual=x86,elf +compress] boot = {
    PATH=/proc/boot:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin LD_LIBRARY_PATH=/proc/boot:/lib:/usr/lib:/lib/dll  procnto
}

[+script] startup-script = {
    procmgr_symlink ../../proc/boot/ /usr/lib/

    # do magic required to set up PnP and pci bios on x86
    display_msg Do the BIOS magic ...
    waitfor /dev/pci

    # A really good idea is to set hostname and domain
    # before qnet is started
    setconf _CS_HOSTNAME aboyd
    setconf _CS_DOMAIN

    # If you do not set the hostname to something
    # unique before qnet is started, qnet will try
    # to create and set the hostname to a hopefully
    # unique string constructed from the ethernet
    # address, which will look like EAc07f5e
    # which will probably work, but is pretty ugly.

    # start io-net, network driver and qnet
    # NB to help debugging, add verbose=1 after -pqnet below
    display_msg Starting io-net and speedo driver and qnet ...
    io-net -dspeedo -pqnet

    display_msg Waiting for ethernet driver to initialize ...
    waitfor /dev/io-net/en0 60

    display_msg Waiting for Qnet to initialize ...
    waitfor /net 60

    # Now that we can fetch executables from the remote server
    # we can run devc-con and ksh, which we do not include in
    # the image, to keep the size down
    # In our example, the server we are booting from
    # has the hostname qpkg and the SAME domain:
    # We clean out any old bogus connections to the qpkg server
    # if we have recently rebooted quickly, by fetching a trivial
    # executable which works nicely as a sacrificial lamb
    # now print out some interesting techie-type information
    display_msg hostname:
    getconf _CS_HOSTNAME
    display_msg domain:
    getconf _CS_DOMAIN
    display_msg uname -a:
    uname -a

    # create some text consoles
    display_msg .
    display_msg Starting 3 text consoles which you can flip
    display_msg between by holding ctrl alt + OR ctrl alt -
    display_msg .
    devc-con -n3
    waitfor /dev/con1

    # start up some command line shells on the text consoles
    reopen /dev/con1
    [+session] TERM=qansi HOME=/ PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin:/proc/boot ksh &

    reopen /dev/con2
    [+session] TERM=qansi HOME=/ PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin:/proc/boot ksh &

    reopen /dev/con3
    [+session] TERM=qansi HOME=/ PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin:/proc/boot ksh &

    # startup script ends here
}

# Let's create some links in the virtual file system so that
# applications are fooled into thinking there's a local hard disk

# Make /tmp point to the shared memory area
[type=link] /tmp=/dev/shmem

# Redirect console (error) messages to con1
[type=link] /dev/console=/dev/con1

# Now for the diskless qnet magic.  In this example, we are booting
# using a server which has the hostname qpkg.  Since we do not have
# a hard disk, we will create links to point to the server's disk
[type=link] /bin=/net/qpkg/bin
[type=link] /boot=/net/qpkg/boot
[type=link] /etc=/net/qpkg/etc
[type=link] /home=/net/qpkg/home
[type=link] /lib=/net/qpkg/lib
[type=link] /opt=/net/qpkg/opt
[type=link] /pkgs=/net/qpkg/pkgs
[type=link] /root=/net/qpkg/root
[type=link] /sbin=/net/qpkg/sbin
[type=link] /usr=/net/qpkg/usr
[type=link] /var=/net/qpkg/var
[type=link] /x86=/

# these are essential shared libraries which must be in the
# image for us to start io-net, the ethernet driver and qnet

# copy code and data for all following executables
# which will be located in /proc/boot in the image


# uncomment this for debugging
# getconf

Booting the client

With your DHCP server running, boot the client machine using the PXE ROM. The client machine attempts to obtain an IP address from the DHCP server and load pxegrub. If successful, it should display a menu of available images to load. Select your option for the OS image. If you don't select any available option, the BIOS image is loaded after 3 seconds. You can also use the arrow keys to select the downloaded OS image.

If all goes well, you should now be running your OS image.


If the boot is unsuccessful, troubleshoot as follows:

Make sure your:

What doesn't work ...