ZFS root (builtin)

From SlackWiki
Revision as of 22:31, 12 September 2012 by Foobarz (talk | contribs)

The first "ZFS root" wiki explains the details of running ZFS as your root filesystem by using a fully modular generic kernel approach. Now, I can share how it can be done with the SPL and ZFS modules built into the kernel. This procedure is just an example and can use some fine tuning, but here goes:

The steps below create a kernel with the SPL and ZFS modules built in. This kernel is installed as an alternative kernel to boot in lilo, with an installation separate from the fully modular, working system kernel. That allows testing the builtin kernel while remaining able to boot back into the working modular ZFS system. We start this procedure assuming you are on a working fully modular ZFS install as in the "ZFS root" wiki.

Use mountpoint=legacy so that zfs will not expect zfs mount, but instead expect standard mount for mounting zfs-root:

zfs set mountpoint=legacy zfs-root

Edit /etc/rc.d/rc.S and /etc/rc.d/rc.6 to use regular mount commands instead of zfs commands, and remove or comment out the zfs export command in rc.6. These edits undo changes made in the "ZFS root" wiki, which shows that there are two different ways of mounting a zfs filesystem.

You might think to set bootfs property like this:

zpool set bootfs=zfs-root zfs-root

However, this will result in ZFS placing many restrictions on zfs-root as per how that option is used on Solaris systems:

  1. zfs-root must be VTOC (SMI) label, not GPT (EFI) label
  2. zfs-root can only contain 1 vdev: either a plain disk, or a mirror

So, do NOT set that property! If bootfs is set, you would find that you cannot add vdevs to your zfs-root. If it is set already, then unset it like this:

zpool set bootfs= zfs-root
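
Before going further, it can help to confirm that the property really is unset. The following is only a minimal sketch and not part of the original procedure: bootfs_is_set and bootfs_value are hypothetical helper names, and the sketch assumes that "zpool get bootfs <pool>" prints a header line followed by one line whose third column is the value, with "-" meaning unset.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a bootfs value means "set".
# Assumption: an empty value or "-" means the property is unset.
bootfs_is_set() {
  case "$1" in
    ""|-) return 1 ;;
    *)    return 0 ;;
  esac
}

# Hypothetical helper: read `zpool get bootfs <pool>` output on stdin and
# print the VALUE column of the first data line (the line after the header).
bootfs_value() {
  awk 'NR == 2 { print $3 }'
}

# Example use on a live system (needs the zpool command, so it is left commented):
#   value=$(zpool get bootfs zfs-root | bootfs_value)
#   if bootfs_is_set "$value"; then zpool set bootfs= zfs-root; fi
```

The parsing is kept separate from the decision so that each part can be checked without a pool present.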

Make a directory to store files to be inside the initramfs:

mkdir /boot/initramfs-source

This directory will hold some files for the initial rootfs inside the kernel. Your /boot should be on a small ext4 partition or on a mounted USB stick that you use to boot.

Extract the slackware kernel-source package to obtain a copy of the kernel source, and make an initial kernel config for it:

cd ~
mkdir src
cd src
tar xvJf /mnt/cdrom/slackware64/k/kernel-source-*.txz   # .txz is xz-compressed, so use J rather than z
mv usr/src/linux-3.2.27 /usr/src/linux-3.2.27b
rm -r install
cd /usr/src/linux-3.2.27b
make menuconfig
  General setup->Local version - append to kernel release = b
  General setup->Default hostname                         = slackzfs
  General setup->Initramfs source file(s)                 = /boot/initramfs-source
  Device Drivers->SCSI device support->SCSI low-level drivers-> <*> SYM53C8XX Version 2 SCSI support
  File systems -> <*> The Extended 4 (ext4) filesystem
  # build in any hard drive controllers and filesystems that you need for booting
  # then save the config
make prepare scripts

The kernel source has been set to have a "Local version" of b. As a result, modules are installed to /lib/modules/3.2.27b, which keeps this kernel's source and modules separate from slackware's kernel-generic package installation. The make prepare scripts command readies the kernel source as needed before we can install a copy of the spl and zfs modules into the kernel source tree.

Now proceed to configure and copy spl and zfs into the kernel source tree:

cd ~/src
tar xvzf ~/spl-0.6.0-rc10.tar.gz
mkdir install
cd spl-0.6.0-rc10
./configure \
    --prefix=/ \
    --libdir=/lib64 \
    --includedir=/usr/include \
    --datarootdir=/usr/share \
    --enable-linux-builtin=yes \
    --with-linux=/usr/src/linux-3.2.27b \
    --with-linux-obj=/usr/src/linux-3.2.27b
wget https://raw.github.com/zfsonlinux/spl/master/copy-builtin
chmod +x copy-builtin
./copy-builtin /usr/src/linux-3.2.27b
make
make install DESTDIR=~/src/install
cd ~/src/install
makepkg ../spl-0.6.0rc10_3.2.27b-x86_64-1root.txz
cd ..
rm -r install

tar xvzf ~/zfs-0.6.0-rc10.tar.gz
mkdir install
cd zfs-0.6.0-rc10
./configure \
    --prefix=/ \
    --libdir=/lib64 \
    --includedir=/usr/include \
    --datarootdir=/usr/share \
    --enable-linux-builtin=yes \
    --with-linux=/usr/src/linux-3.2.27b \
    --with-linux-obj=/usr/src/linux-3.2.27b \
    --with-spl=/root/src/spl-0.6.0-rc10
wget https://raw.github.com/zfsonlinux/zfs/master/copy-builtin
chmod +x copy-builtin
./copy-builtin /usr/src/linux-3.2.27b
make
make install DESTDIR=~/src/install
cd ~/src/install
makepkg ../zfs-0.6.0rc10_3.2.27b-x86_64-1root.txz
cd ..
rm -r install

At this point, the copy-builtin scripts have installed the configured spl and zfs modules source into the kernel source. The regular make menuconfig in the kernel source should now show options for spl and zfs in the kernel config.

As of this writing, the copy-builtin scripts place the spl and zfs modules at locations in the kernel source where they may not initialize in the best order, which is after all lower-level storage drivers have initialized. If spl and zfs initialize too soon, there is a chance they may not be able to see all of the storage devices. Modules that are built into the kernel initialize (as seen in dmesg) in the order in which they are linked into the kernel during compilation, and the link order depends on where the module source files are located in the kernel source tree. To ensure that spl and zfs initialize after all hardware drivers, we move them into the drivers tree, after all other drivers:

cd /usr/src/linux-3.2.27b
mkdir drivers/zfsonlinux
mv spl drivers/zfsonlinux
vi Kconfig
  # remove references to spl
vi Makefile
  # remove references to spl

cd /usr/src/linux-3.2.27b/fs
mv zfs ../drivers/zfsonlinux
vi Kconfig
  # remove references to zfs
vi Makefile
  # remove references to zfs

cd /usr/src/linux-3.2.27b/drivers
vi Kconfig
  # add line at end of menu, before "endmenu":
  source "drivers/zfsonlinux/Kconfig"
  endmenu

cd /usr/src/linux-3.2.27b/drivers
vi Makefile
  # add line at very end of file:
  obj-$(CONFIG_ZFSONLINUX) += zfsonlinux/

cd /usr/src/linux-3.2.27b/drivers/zfsonlinux
cat > Kconfig <<"EOF"
menuconfig ZFSONLINUX
	tristate "ZFSonLinux support"

if ZFSONLINUX

source "drivers/zfsonlinux/spl/Kconfig"

source "drivers/zfsonlinux/zfs/Kconfig"

endif
EOF

cd /usr/src/linux-3.2.27b/drivers/zfsonlinux
cat > Makefile <<"EOF"
obj-$(CONFIG_SPL) += spl/
obj-$(CONFIG_ZFS) += zfs/
EOF

This completes the move of the spl and zfs modules in the kernel source tree to be under drivers/zfsonlinux.
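
Since the move is done by hand, a quick sanity check of the edited files may save a failed kernel build. The sketch below is not from the original procedure: last_obj_line, kconfig_sources and check_zfsonlinux_tree are hypothetical helper names, and the check only looks at the files edited above (drivers/Makefile, drivers/Kconfig and drivers/zfsonlinux/Kconfig).

```shell
#!/bin/sh
# Hypothetical helper: print the last "obj-..." line of a kbuild Makefile.
last_obj_line() {
  grep '^obj-' "$1" | tail -n 1
}

# Hypothetical helper: succeed if Kconfig file $1 has a line: source "$2"
kconfig_sources() {
  grep -q "^[[:space:]]*source[[:space:]]*\"$2\"" "$1"
}

# Hypothetical check: report anything that looks wrong under kernel tree $1.
check_zfsonlinux_tree() {
  tree=$1
  status=0
  # zfsonlinux/ should be the last obj- entry so that it links after other drivers
  case "$(last_obj_line "$tree/drivers/Makefile")" in
    *zfsonlinux/*) ;;
    *) echo "drivers/Makefile: zfsonlinux/ is not the last obj- entry"; status=1 ;;
  esac
  kconfig_sources "$tree/drivers/Kconfig" "drivers/zfsonlinux/Kconfig" ||
    { echo "drivers/Kconfig: missing source line for zfsonlinux"; status=1; }
  for sub in spl zfs; do
    kconfig_sources "$tree/drivers/zfsonlinux/Kconfig" "drivers/zfsonlinux/$sub/Kconfig" ||
      { echo "drivers/zfsonlinux/Kconfig: missing source line for $sub"; status=1; }
  done
  return $status
}

# Example: check_zfsonlinux_tree /usr/src/linux-3.2.27b
```

The function prints one line per problem and returns non-zero if anything was reported, so it can be run repeatedly while editing.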

Now, in kernel configuration, you can build in the modules as follows:

cd /usr/src/linux-3.2.27b
make menuconfig
   Device Drivers ->
    <*> ZFSonLinux support ->
      <*> Solaris Porting Layer (SPL)
      <*>     ZFS

In the kernel config, we set /boot/initramfs-source as source location of the contents to be linked into kernel's builtin initramfs (rootfs). Now, we create some files that are needed to be in the initramfs:

cd /boot/initramfs-source

# make standard directories
mkdir -p proc dev sys mnt bin sbin etc/zfs

# zfs seems to want mtab present, even if empty
touch etc/mtab

# if zpool.cache file can be read by zfs at module init, it imports the pools in the cache
cp /etc/zfs/zpool.cache-initrd etc/zfs/zpool.cache

# make initial console device node; otherwise, there is a problem at startup:
# "Warning: unable to open an initial console" and you'd have problems with console and tty login
# making these nodes from within /init is not early enough to avoid problem
mknod dev/console c 5 1   # system console

# make memory device kmsg to "printk" kernel messages
# we can write to this file to send out messages
mknod dev/kmsg    c 1 11  # lines printed to kmsg enter kernel messages buffer

# recommended loop0 device to mount fs images
mknod dev/loop0   b 7 0

# make initial virtual terminal devices
mknod dev/tty     c 5 0   # current tty
mknod dev/tty0    c 4 0   # current virtual term
mknod dev/tty1    c 4 1   # login virtual term 1 (F1)

# make alternative console=ttyS0 standard 8250/16550 UART serial port devices
#   useful with kernel parameter console=ttyS0
#          with qemu -nographic -option-rom sgabios.bin,bootindex=0
mknod dev/ttyS0   c 4 64  # COM1
mknod dev/ttyS1   c 4 65  # COM2
mknod dev/ttyS2   c 4 66  # COM3
mknod dev/ttyS3   c 4 67  # COM4

This should be enough device nodes initially in the initramfs. Once devtmpfs mounts over /dev inside initramfs, a lot more devices are available in it by default.
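
The same device nodes can also be kept as a small table, which makes the major and minor numbers easier to review before anything is created as root. This is only a sketch: DEV_TABLE and make_nodes are hypothetical names, the rows are copied from the mknod commands above, and by default the function prints the commands instead of running them.

```shell
#!/bin/sh
# Device table copied from the mknod commands above: name type major minor
DEV_TABLE='console c 5 1
kmsg c 1 11
loop0 b 7 0
tty c 5 0
tty0 c 4 0
tty1 c 4 1
ttyS0 c 4 64
ttyS1 c 4 65
ttyS2 c 4 66
ttyS3 c 4 67'

# Hypothetical helper: print (default) or execute one mknod per table row.
# Usage: make_nodes <devdir> [run]
make_nodes() {
  devdir=$1
  mode=${2:-dry}
  printf '%s\n' "$DEV_TABLE" | while read -r name type major minor; do
    if [ "$mode" = run ]; then
      # skip nodes that already exist; mknod itself needs root
      if [ ! -e "$devdir/$name" ]; then
        mknod "$devdir/$name" "$type" "$major" "$minor"
      fi
    else
      echo "mknod $devdir/$name $type $major $minor"
    fi
  done
}

# Example: cd /boot/initramfs-source && make_nodes dev        # review only
#          cd /boot/initramfs-source && make_nodes dev run    # create the nodes
```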

The initramfs needs an /init program that will be run automatically by the kernel as process id 1, the first program to run on the system. This program's task is to mount a root filesystem, switch it to be the new root (/), and then exec the init (/sbin/init) program that is on the new root so that the new root system is booted.

The common way to make /init is using busybox with files like these in /boot/initrd-tree or /boot/initramfs-source:

bin/busybox
bin/ash -> busybox
bin/sh -> busybox
# run busybox to see what other "applets" can be symlinked to it
# when busybox is run as a different symlinked applet name, it runs as the applet
init
# init can be an ash script, see slackware's /boot/initrd-tree/init for example

For example, as an ash script, slackware's init does the following:

  1. parse kernel parameters passed into it and set variables based on them
  2. start udevd and trigger rules for block devices to setup initial /dev devices
  3. load kernel modules and keyboard map
  4. run mdadm, cryptsetup, lvm, zpool/zfs to setup more devices (any udev rules for them should do more /dev setup in the background)
  5. mount the root filesystem read-only at /mnt
  6. stat /mnt/sbin/init, and if not executable run a rescue shell /bin/sh
  7. shutdown udevd so it can be restarted by root system when booted
  8. mount --move {proc,sys,run,dev} to under /mnt
  9. run: exec switch_root /mnt /sbin/init $RUNLEVEL
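
Step 1 of that list, parsing the kernel parameters, can be sketched in a few lines of shell. This is an illustration only and is not taken from slackware's init: get_param is a hypothetical helper name, and the sketch assumes parameters are plain name=value words without quotes or spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the value of name=value from a kernel command line.
# Usage: get_param <name> "<cmdline>" [default]
get_param() {
  name=$1
  cmdline=$2
  default=${3-}
  # rely on word splitting; parameters are assumed not to contain spaces or globs
  for word in $cmdline; do
    case "$word" in
      "$name"=*)
        printf '%s\n' "${word#*=}"   # strip everything up to the first "="
        return 0
        ;;
    esac
  done
  printf '%s\n' "$default"
}

# In an initramfs script the command line would come from /proc, for example:
#   cmdline=$(cat /proc/cmdline)
#   ROOT=$(get_param root "$cmdline")
#   RUNLEVEL=$(get_param runlevel "$cmdline" 3)
```

Matching whole words means that root= does not accidentally match rootfstype= or a longer parameter name that merely ends in root=.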

The slackware mkinitrd package is a fully configured BusyBox installation that makes an external initramfs image file, initrd.gz, that is passed to the kernel by the bootloader (lilo). It could be changed to be internal to the kernel by just using initrd-tree as the CONFIG_INITRAMFS_SOURCE in the kernel config and removing initrd in the lilo.conf.

The main purpose of most initramfs/initrd images is to load and configure the kernel modules and devices needed to mount a root filesystem. Because the spl/zfs modules are builtin, there isn't really a lot that needs to be done by an initramfs. What remains to be done is to import the zfs-root pool, mount it, and exec the system /sbin/init. The kernel's "legacy" parameters root= and rootfstype=, which are able to boot a root disk directly, do not work with ZFS at this time because ZFS pools always need to be imported before it is possible to mount and boot from them. The automatic import that ZFS can do by using a cachefile (/etc/zfs/zpool.cache) stored in the initramfs still does not work with the legacy boot parameters because, at an earlier stage of the boot process, the kernel ignores a rootfs that does not contain an /init program, and/or the device type of an imported pool is not one of the device types recognized by the legacy boot as a valid root device.
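
The three remaining tasks can be written out as the commands a shell-based /init would run. This is a sketch under assumptions, not a tested init: boot_plan is a hypothetical helper that only prints the commands, it assumes zpool import accepts -N (import without mounting) in your ZFS version, that zfs-root has mountpoint=legacy so a plain mount works, and that a switch_root program such as the busybox applet is available in the initramfs.

```shell
#!/bin/sh
# Hypothetical helper: print, in order, the commands a minimal /init would run.
# Usage: boot_plan <pool> <rootfs> <init> <runlevel>
boot_plan() {
  pool=$1
  rootfs=$2
  init=$3
  runlevel=$4
  echo "zpool import -N $pool"                  # 1. import the pool (not needed if zpool.cache did it)
  echo "mount -t zfs -o ro $rootfs /mnt"        # 2. mount the root filesystem read-only
  echo "exec switch_root /mnt $init $runlevel"  # 3. switch root and exec the system init
}

boot_plan zfs-root zfs-root /sbin/init 3
```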

You could just continue to use slackware's mkinitrd package as in the "ZFS root" wiki and it works the same whether the modules are builtin or not. But, by building the spl/zfs modules in, aren't we really wanting to get rid of using the mkinitrd/initrd.gz? We'd like to have a kernel with everything in it to boot, and use a simple lilo image entry that has no initrd.

We can do this without the mkinitrd package, but we still need to provide our own initramfs /init program to import, mount, switch root, and boot. Save the following C program as "foobarz-init.c":

/* software product name: foobarz-init.c
 * suggested binary name: /init (in initramfs-source, rootfs)
 * license              : BSD
 * license text:
Copyright (c) 2012, foobarz
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of the <organization> nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

/* Your initramfs-source should contain the following
 *
 * cd /boot/initramfs-source
 * mkdir -p proc dev sys mnt bin sbin etc/zfs
 * touch etc/mtab
 * cp /etc/zfs/zpool.cache-initrd etc/zfs/zpool.cache
 * # zpool.cache is optional - zpool_import_ kernel params can be used instead
 * #  see below for details on using zpool_import
 * mknod dev/console c 5 1   # system console
 * mknod dev/kmsg    c 1 11  # lines printed to kmsg enter kernel messages buffer
 * mknod dev/loop0   b 7 0
 * mknod dev/tty     c 5 0   # current tty
 * mknod dev/tty0    c 4 0   # current virtual term
 * mknod dev/tty1    c 4 1   # login virtual term 1 (F1)
 * mknod dev/ttyS0   c 4 64  # COM1
 * mknod dev/ttyS1   c 4 65  # COM2
 * mknod dev/ttyS2   c 4 66  # COM3
 * mknod dev/ttyS3   c 4 67  # COM4
 *
 * and this program compiled to /boot/initramfs-source/init
 *
 * Set kernel config option
 *  CONFIG_INITRAMFS_SOURCE=/boot/initramfs-source
 * to build the initramfs into your kernel image
 *  that also has builtin drivers (spl and zfs, etc).
 */

#define FOOBARZ_INIT_VERSION "1.1.1"
#define _BSD_SOURCE
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mount.h>
#include <sysexits.h>
#include <errno.h>
#include <string.h>
/* support for zpool import
 *
 * If -DINCLUDE_ZPOOL_IMPORT, then support to import a zpool is
 * enabled in the program. Enabling it will add many dependencies to the
 * compile and link; for example:
 *
 * gcc -static -DINCLUDE_ZPOOL_IMPORT \
 *    foobarz-init.c -include /usr/src/zfs-0.6.0-rc10/3.2.28/zfs_config.h \
 *   -o init \
 *   -I /usr/include/libspl -I /usr/include/libzfs \
 *   -lzfs -lnvpair -lzpool -luutil -luuid -lrt -lz -lm -lpthread \
 *   -I /usr/include/tirpc \
 *   -ltirpc
 *
 * Note that libtirpc is a drop-in replacement for the SunRPC functions that
 * used to be in glibc. No additional includes are needed, just the gcc -I and -l
 * options for tirpc.
 *
 * Otherwise, with -UINCLUDE_ZPOOL_IMPORT, the compile is just:
 * gcc -static foobarz-init.c -o init
 */
#if defined(INCLUDE_ZPOOL_IMPORT)
#include <libzfs.h>
#else
#include <sys/types.h>
#include <sys/stat.h>
#endif

#define PARAM_REQ_NO 0
#define PARAM_REQ_YES 1
#define PARAM_SRC_DEFAULT 0
#define PARAM_SRC_CMDLINE 1

void printk(char *fmt, ...) {
  FILE* f;
  va_list args;

  f = fopen("/dev/kmsg", "w");
  if( f == NULL ) return; /* no kmsg device; the message cannot be reported */
  va_start(args, fmt);
  vfprintf(f, fmt, args);
  va_end(args);
  fflush(f);
  fclose(f);
  /* avoid flooding kmsg and having msgs suppressed; 20msgs/sec */
  usleep(50000);
}

int main(int argc, char* argv[]) {
 /*** variables */

 int i;
 int   fd = 0; /* file descriptor */
 unsigned long mountflags;

 /* kernel command line */
 off_t cmdline_size;
 char* cmdline; /* to be malloc 4096B */
 char* cmdline_end;
 char* temp_end;
 char* src_msg; /* default or cmdline */
 int flag_param_missing = 0;

 /* use to hold contents of a misc /proc/<file> */
 char* miscproc_buff; /* to be malloc 4096B */
 off_t miscproc_size;

 /* note about environ, argv, and kernel cmdline for init:
  *   environ is not defined for init
  *   only argv[0] is set for init
  *   kernel command line parameters are accessed
  *   at /proc/cmdline
  */

 /* kernel parameters expected to be name=value */
 /* do not use quotes or spaces in parameters   */
 /* you can add more params somewhere after root= */
 struct nv { char* n; char* v; char* v_end; int req; int src; };
 struct nv param[] = {
   { "root=",       NULL, NULL, PARAM_REQ_YES, PARAM_SRC_DEFAULT },
   { "rootfstype=", NULL, NULL, PARAM_REQ_YES, PARAM_SRC_DEFAULT },
   { "mountopt=",   NULL, NULL, PARAM_REQ_NO , PARAM_SRC_DEFAULT },
   { "init=",       NULL, NULL, PARAM_REQ_NO , PARAM_SRC_DEFAULT },
   { "runlevel=",   NULL, NULL, PARAM_REQ_NO , PARAM_SRC_DEFAULT },
   { "console=",    NULL, NULL, PARAM_REQ_NO , PARAM_SRC_DEFAULT }
#if defined(INCLUDE_ZPOOL_IMPORT)
   ,
   { "zpool_import_name=",    NULL, NULL, PARAM_REQ_NO , PARAM_SRC_DEFAULT },
   { "zpool_import_guid=",    NULL, NULL, PARAM_REQ_NO , PARAM_SRC_DEFAULT },
   { "zpool_import_newname=", NULL, NULL, PARAM_REQ_NO , PARAM_SRC_DEFAULT },
   { "zpool_import_force=",   NULL, NULL, PARAM_REQ_NO , PARAM_SRC_DEFAULT }
#endif
 };
 enum {
	 iroot,
	 irootfstype,
	 imountopt,
	 iinit,
	 irunlevel,
	 iconsole,
#if defined(INCLUDE_ZPOOL_IMPORT)
	 izpool_import_name,
	 izpool_import_guid,
	 izpool_import_newname,
	 izpool_import_force,
#endif
	 ilastparam
 };

#if defined(INCLUDE_ZPOOL_IMPORT)
 libzfs_handle_t* libzfs = NULL;
 importargs_t iargs = { 0 };
 nvlist_t* pools = NULL;
 nvpair_t* pool = NULL;
 nvlist_t* config = NULL;
#endif

 /*** program */

 printk("foobarz-init, version %s: booting initramfs.\n", FOOBARZ_INIT_VERSION);

 cmdline       = (char*) malloc(4096);
 miscproc_buff = (char*) malloc(4096);
 if( (cmdline == NULL) || (miscproc_buff == NULL) ) {
   printk("Unable to allocate buffer memory: malloc: %s\n", strerror(errno));
   return EX_UNAVAILABLE;
 }

 /* mount proc /proc
  *  note: some /dev devices symlink into /proc
  *  proc contains info about processes, including cmdline etc. */
 printk("Attempting cmd: mount proc /proc\n");
 if( mount("proc", "/proc", "proc", 0, NULL) != 0 ) {
   printk("time to panic: mount: %s\n", strerror(errno));
   return EX_UNAVAILABLE;
 } else {
   printk("Mount proc successful.\n");
 }

 /* mount devtmpfs /dev
  *  note: This simple init program works if your root device is made from devices
  *  that are available by default in devtmpfs, such as /dev/sd*
  *
  *  For zfs, your root zfs pool should be created with default device nodes and
  *  then it should be mountable by this simple init program.
  *
  *  udev may be needed to configure device nodes and symlinks required
  *  to access a root device configuration made with such nodes and symlinks.
  *  If you need udevd, you can include it into your initramfs-source and
  *  modify this program to run it before attempting to mount your root device.
  *  However, if udevd is needed, a significant number of userspace programs may also be
  *  required by rules in /lib/udev/. You could install busybox + udev (about 5MB) or
  *  coreutils + util-linux + bash + udev (about 25MB) into initramfs-source. But, at that
  *  point you'd have ash or bash and many tools that are easier to use than this
  *  simple init program; it would then be easy to have /init as #!/bin/<b>ash script. */
 printk("Attempting cmd: mount devtmpfs /dev\n");
 if( mount("devtmpfs", "/dev", "devtmpfs", 0, NULL) != 0 ) {
   printk("time to panic: mount: %s\n", strerror(errno));
   return EX_UNAVAILABLE;
 } else {
   printk("Mount devtmpfs successful.\n");
 }

 /* mount sysfs /sys
  *  note: some kernel modules try to access /sys with userspace helpers to echo values into /sys variables;
  *  such modules expect a minimal userspace that contains coreutils or busybox */
 printk("Attempting cmd: mount sysfs /sys\n");
 if( mount("sysfs", "/sys", "sysfs", 0, NULL) != 0 ) {
   printk("time to panic: mount: %s\n", strerror(errno));
   return EX_UNAVAILABLE;
 } else {
   printk("Mount sysfs successful.\n");
 }

 /* process kernel command line */
 fd = open("/proc/cmdline", O_RDONLY);
 if( fd == -1 ) {
   printk("Cannot open /proc/cmdline: %s\n", strerror(errno));
   return EX_UNAVAILABLE;
 }
 /* note, on /proc fs:
  *       lseek likely always returns error
  *       stat likely always returns st_size = 0
  *   so determining size of /proc file means just reading it;
  *   you have to read /proc files according to their documented
  *   maximum sizes; this is probably for performance reasons */
 cmdline_size = read(fd, cmdline, 4095);
 if( cmdline_size == -1 ) {
   printk("Failed to read /proc/cmdline: %s\n", strerror(errno));
   return EX_UNAVAILABLE;
 }
 close(fd);

 /* cmdline may be newline + null terminated, but make it null + null */
 cmdline[cmdline_size] = '\0';
 if( (cmdline_size > 0) && (cmdline[cmdline_size-1] == '\n') ) {
   cmdline[cmdline_size-1] = '\0';
   cmdline_size--;
 }
 /* always set cmdline_end; the value scan below stops at this address */
 cmdline_end = cmdline + cmdline_size;
 printk("Kernel cmdline size: %i\n", (int) cmdline_size);
 printk("Kernel cmdline: \"%s\"\n", cmdline);

 /* find v and v_end of values in cmdline */
 for( i=iroot; i<ilastparam; i++ ) {
   param[i].v = strstr(cmdline, param[i].n);
   if( param[i].v != NULL ) {
     param[i].src = PARAM_SRC_CMDLINE;
     while( *(param[i].v) != '=' ) param[i].v++;
     param[i].v++;
     temp_end = param[i].v;
     while( !( (*temp_end == ' ') ||
               (*temp_end == '\n') ||
               (temp_end == cmdline_end)
	     ) ) temp_end++;
     if( temp_end == param[i].v ) {
       printk("Kernel parameter %s: value missing.\n", param[i].n);
       param[i].v = NULL;
     } else param[i].v_end = temp_end;
   }
 }

 for( i=iroot; i<ilastparam; i++ ) {
   /* terminate value strings */
   if( param[i].v_end != NULL ) *(param[i].v_end) = '\0';
   /* set defaults if no value on cmdline */
   if( param[i].v == NULL ) {
     param[i].src = PARAM_SRC_DEFAULT;
     if( param[i].req == PARAM_REQ_YES ) flag_param_missing = 1;
     switch(i) {
       case iroot      : param[i].v = "<missing required param>" ; break;
       case irootfstype: param[i].v = "<missing required param>" ; break;
       case imountopt  : param[i].v = "ro"        ; break;
       case iinit      : param[i].v = "/sbin/init"; break;
       case irunlevel  : param[i].v = "3"         ; break;
       case iconsole   : param[i].v = "console"   ; break;
       default         : param[i].v = NULL;
     }
   }
   if(param[i].src == PARAM_SRC_DEFAULT) src_msg = "default";
   else src_msg = "cmdline";
   printk("Using %s \"%s\" (source: %s)\n", param[i].n, param[i].v, src_msg);
 }

 if( flag_param_missing ) {
   printk("Aborting boot process: missing required kernel parameter(s).\n");
   return EX_USAGE;
 }

 /* generic nv pair kernel cmdline processing finished
  *  now, examine specific params for defaults and correctness */

 /* param[irootfstype]: can be checked against /proc/filesystems: */
 fd = open("/proc/filesystems", O_RDONLY);
 if( fd == -1 ) {
   printk("Cannot open /proc/filesystems: %s\n", strerror(errno));
   return EX_UNAVAILABLE;
 }
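 miscproc_size = read(fd, miscproc_buff, 4095);
 if( miscproc_size == -1 ) {
   printk("Failed to read /proc/filesystems: %s\n", strerror(errno));
   return EX_UNAVAILABLE;
 }
 close(fd);
 /* terminate the buffer so strstr() below stops at the end of the data read */
 miscproc_buff[miscproc_size] = '\0';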
 if( strstr(miscproc_buff, param[irootfstype].v) == NULL ) {
   printk("%s \"%s\": filesystem type not available.\n", param[irootfstype].n, param[irootfstype].v);
   return EX_UNAVAILABLE;
 }

 /* zfs-specific */
 if( strcmp(param[irootfstype].v, "zfs") == 0 ) {
   if( access("/etc/zfs/zpool.cache", F_OK) == 0 )
     printk("rootfstype=%s: /etc/zfs/zpool.cache is present in initramfs.\n", param[irootfstype].v);
   else
     printk("rootfstype=%s: /etc/zfs/zpool.cache not present in initramfs.\n", param[irootfstype].v);

   if( access("/etc/hostid", F_OK) == 0 )
     printk("rootfstype=%s: /etc/hostid is present in initramfs.\n", param[irootfstype].v);
   else
     printk("rootfstype=%s: /etc/hostid not present in initramfs.\n", param[irootfstype].v);

#if defined(INCLUDE_ZPOOL_IMPORT)
   /* zpool import */
   if( (param[izpool_import_name].v != NULL) || (param[izpool_import_guid].v != NULL) ) {
	printk("zpool_import: import requested.\n");
	if( (param[izpool_import_name].v != NULL) && (param[izpool_import_guid].v != NULL) ) {
		printk("zpool_import: given both pool name and guid; using guid.\n");
		param[izpool_import_name].v = NULL;
	}
	if( param[izpool_import_name].v != NULL )
		printk("zpool_import: pool name: %s\n", param[izpool_import_name].v );
	else
		printk("zpool_import: pool guid: %s\n", param[izpool_import_guid].v );

	iargs.path = NULL;
	iargs.paths = 0;
	iargs.poolname = param[izpool_import_name].v;
	if( param[izpool_import_guid].v != NULL )
		iargs.guid = strtoull(param[izpool_import_guid].v, NULL, 10);
	else
		iargs.guid = 0;
	iargs.cachefile = NULL;
	if( (param[izpool_import_force].v != NULL) && (strcmp(param[izpool_import_force].v, "1") == 0) ) {
		iargs.can_be_active = 1;
		printk("zpool_import: import forced.\n");
	} else {
		iargs.can_be_active = 0;
		printk("zpool_import: import not forced.\n");
	}
	iargs.unique = 1;
	iargs.exists = 1;

	printk("zpool_import: init libzfs.\n");
	libzfs = libzfs_init();
	printk("zpool_import: searching for pool.\n");
	pools = zpool_search_import(libzfs, &iargs);
	if( (pools == NULL) || nvlist_empty(pools) )
		printk("zpool_import: pool not available for import, or already imported by cachefile.\n");
	else {
		printk("zpool_import: getting pool information.\n");
		pool = nvlist_next_nvpair(pools, pool);
		printk("zpool_import: getting pool configuration.\n");
		nvpair_value_nvlist(pool, &config);
		printk("zpool_import: attempting pool import.\n");
		if( zpool_import(libzfs, config, param[izpool_import_newname].v, NULL) != 0 ) {
			printk("zpool_import: import failed.\n");
			printk("zpool_import: error description: %s\n", libzfs_error_description(libzfs) );
			printk("zpool_import: error action: %s\n", libzfs_error_action(libzfs) );
		} else  printk("zpool_import: import successful.\n");
	}
	printk("zpool_import: fini libzfs.\n");
	libzfs_fini(libzfs);
   }
#endif /* zpool_import */
 }

 if(      strcmp(param[imountopt].v, "ro") == 0 ) mountflags = MS_RDONLY;
 else if( strcmp(param[imountopt].v, "rw") == 0 ) mountflags = 0;
 else {
   printk("%s \"%s\": invalid parameter value; defaulting to \"ro\".\n", param[imountopt].n, param[imountopt].v);
   mountflags = MS_RDONLY;
 }

 /* param[iroot]: nothing to check; if user gives bad root=device then mount fails */

 /* try to mount root=device at /mnt
  *
  * note: for zfs, if a copy of /etc/zfs/zpool.cache (when pool is imported) is put in initramfs-source, then
  * the zfs module can read it and automatically import the pools described in the cache file; the imported
  * pools can be available to mount here if they were created using standard device names, otherwise
  * udevd may be required to run before mounting the pool */
 printk("Attempting cmd: mount -t %s -o %s %s /mnt.\n", param[irootfstype].v, param[imountopt].v, param[iroot].v);
 if( mount(param[iroot].v, "/mnt", param[irootfstype].v, mountflags, NULL) != 0 ) {
  printk("time to panic: mount: %s\n", strerror(errno));
  return EX_UNAVAILABLE;
 }
 printk("%s mounted successfully.\n", param[iroot].v);

 /* check to see if the mounted root filesystem has an executable init program */
 chdir("/mnt");
 if( access(param[iinit].v+1, X_OK) != 0 ) {
   chdir("/");
   printk("access X_OK: %s\n", strerror(errno));
   printk("The init program /mnt/%s is not present or not executable.\n", param[iinit].v+1);
   printk("Aborting boot process: no init program.\n");
   printk("Unmounting %s.\n", param[iroot].v);
   if( umount("/mnt") == -1 ) {
     printk("umount: %s\n", strerror(errno));
     printk("Failed to umount %s.\n", param[iroot].v);
   } else printk("Successfully unmounted %s.\n", param[iroot].v);
   return EX_UNAVAILABLE;
 }
 chdir("/");
 printk("Init program /mnt/%s is present and executable.\n", param[iinit].v+1);

 /* switch the root / from initramfs to the mounted new root device at /mnt.
  *
  * note: after this switch, it is not possible to access the initramfs files anymore,
  * yet they consume ram memory unless they are deleted here before switching.
  * Any programs that are run after clearing the initramfs and switching root must exist on the new root.
  * This program may safely delete itself (/init) since it is already in ram and executing.
  * If you have installed additional files and programs in initramfs that consume significant ram,
  * then you need to insert additional code here to delete those files (carefully). */

 /* delete files off of initramfs to free ram memory */
 printk("Freeing memory from initramfs...\n");
 if( unlink(argv[0]) != 0 ) printk("unlink %s: %s\n", argv[0], strerror(errno));
 else printk("%s %s", argv[0], "deleted from initramfs.\n");

 /* switch root */
 printk("Beginning switch root procedure.\n");

 printk("(1) Attempting cmd: mount --move /dev /mnt/dev \n");
 if( mount("/dev", "/mnt/dev", NULL, MS_MOVE, NULL) != 0 ) {
  printk("time to panic: mount: %s\n", strerror(errno));
  return EX_UNAVAILABLE;
 }

 printk("(2) Attempting cmd: mount --move /proc /mnt/proc \n");
 if( mount("/proc", "/mnt/proc", NULL, MS_MOVE, NULL) != 0 ) {
  printk("time to panic: mount: %s\n", strerror(errno));
  return EX_UNAVAILABLE;
 }

 printk("(3) Attempting cmd: mount --move /sys /mnt/sys \n");
 if( mount("/sys", "/mnt/sys", NULL, MS_MOVE, NULL) != 0 ) {
  printk("time to panic: mount: %s\n", strerror(errno));
  return EX_UNAVAILABLE;
 }

 printk("(4) Attempting cmd: chdir /mnt \n");
 if( chdir("/mnt") != 0 ) {
  printk("time to panic: chdir: %s\n", strerror(errno));
  return EX_UNAVAILABLE;
 }

 printk("(5) Attempting cmd: mount --move . / \n");
 if( mount(".", "/", NULL, MS_MOVE, NULL) != 0 ) {
  printk("time to panic: mount: %s\n", strerror(errno));
  return EX_UNAVAILABLE;
 }

 printk("(6) Attempting cmd: chroot . \n");
 if( chroot(".") != 0 ) {
  printk("time to panic: chroot: %s\n", strerror(errno));
  return EX_UNAVAILABLE;
 }

 printk("(7) Attempting cmd: chdir / \n");
 if( chdir("/") != 0 ) {
  printk("time to panic: chdir: %s\n", strerror(errno));
  return EX_UNAVAILABLE;
 }
 printk("Completed switch root procedure.\n");

 /* check for "console=" kernel parameter and switch
  *  stdin, stdout, and stderr to named console device
  */
 if( param[iconsole].src == PARAM_SRC_CMDLINE ) {
   printk("Console redirection to device %s requested.\n", param[iconsole].v);
   /* expect only basename of console device (e.g., ttyS0), so chdir /dev */
   chdir("/dev");
   if( access(param[iconsole].v, F_OK ) == 0 ) {
     printk("Opening stdin, stdout, and stderr on %s.\n", param[iconsole].v);
     close(0);
     open(param[iconsole].v, O_RDWR);
     dup2(0, 1);
     dup2(0, 2);
   } else {
     printk("access F_OK: %s\n", strerror(errno));
     printk("Could not access device: %s!\n", param[iconsole].v);
     printk("Console redirection to device %s aborted!\n", param[iconsole].v);
   }
   chdir("/");
 }

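 printk("Execing: \"%s %s\" to boot mounted root system.\n", param[iinit].v, param[irunlevel].v);

 /* free resources held to this point;
  * cmdline is not freed because param[].v values may still point into it */
 free(miscproc_buff);

 /* argv[0] of the new program is the init path; the runlevel is argv[1] */
 execl(param[iinit].v, param[iinit].v, param[irunlevel].v, (char *) NULL);

 /* execl only returns on failure */
 printk("time to panic: execl: %s\n", strerror(errno));
 return EX_UNAVAILABLE;
}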

If foobarz-init is updated, you can find the latest version on GitHub.

The foobarz-init program can be compiled two ways as of v1.1.1. First, the simpler compile that does not support zpool_import but does support importing pools using a /etc/zfs/zpool.cache file in the initramfs:

gcc --static init.c -o init
strip init

The second way to compile it adds support for zpool_import so that you do not have to include a /etc/zfs/zpool.cache file in the initramfs:

gcc -static -DINCLUDE_ZPOOL_IMPORT \
     foobarz-init.c -include /usr/src/zfs-0.6.0-rc10/3.2.28/zfs_config.h \
    -o init \
    -I /usr/include/libspl -I /usr/include/libzfs \
    -lzfs -lnvpair -lzpool -luutil -luuid -lrt -lz -lm -lpthread \
    -I /usr/include/tirpc \
    -ltirpc
strip init

Dependencies required before you can compile this second way:

  • Install libtirpc. The glibc package no longer provides the RPC-related functions in the static libc; libtirpc is the separate library that replaces them. You can make your own libtirpc package and install it to resolve these RPC-related functions. Without it, the build fails because many "xdr" functions cannot be found.
  • Reinstall the util-linux package with static libraries enabled. Edit util-linux.SlackBuild and remove the line --disable-static, then build it and use upgradepkg to install the new package. This makes the static libuuid available for linking.
  • You may need to install the spl and zfs packages configured with --enable-linux-builtin=yes before compiling and linking against libzfs, so that you use the libzfs intended for builtin modules in case the two configurations differ. Currently there seems to be no difference between the libzfs for modular and for builtin modules, but later there could be at least a small difference in how it detects whether modules must be loaded or are already builtin. You may see a message that loading the ZFS modules stack failed; it can be ignored, since it is just libzfs trying to load modules that are already builtin.

With these dependencies, you can probably compile static with the zpool_import support. The exact location of "zfs_config.h" may vary depending on your zfs and kernel versions, so adjust accordingly.
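Before attempting the static link, it can help to confirm that the static archives are actually installed. The sketch below is only a hedged example: the library names are taken from the -l flags of the second gcc command above, while the search directories (/usr/lib64 and /usr/lib) and the helper name missing_static_libs are assumptions, so adjust them to your system.

```shell
# Hypothetical helper: print each library that has no static archive
# (lib<name>.a) in any of the given directories.
# Usage: missing_static_libs "<dir> <dir> ..." <name> <name> ...
missing_static_libs() {
    dirs=$1
    shift
    for lib in "$@"; do
        found=no
        for dir in $dirs; do
            if [ -f "$dir/lib$lib.a" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || echo "$lib"
    done
}

# Library names come from the -l flags of the second gcc command;
# the directories are an assumption and may differ on your system.
missing_static_libs "/usr/lib64 /usr/lib" zfs nvpair zpool uutil uuid tirpc
```

Any name it prints is a library you still need to install or rebuild with static support. No output means every archive was found.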

Read the foobarz-init source code to see how it uses kernel parameters. The params root and rootfstype are always required to tell init what root pool name to mount and the filesystem type (e.g. zfs).

For the zpool_import support: zpool_import_name=poolname or zpool_import_guid=<guid of a pool> tells init which pool to import. You can also import the pool under a different name by adding zpool_import_newname=newpoolname. If the import is not working, zpool_import_force=1 can be used to force the import even if ZFS thinks the pool may be in use by another system.

However you import the pool, using either zpool.cache or the zpool_import support, the kernel params root=<your poolname or newpoolname> and rootfstype=zfs must also be given. The zpool_import params only import the pool; they do not say what to mount. What to mount is still specified with root= and rootfstype=.
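To check which of these parameters a given kernel command line carries, a small shell helper can pick out the name=value words. This is only an illustrative sketch and not how foobarz-init itself parses its parameters; the helper name get_param and the sample command line are made up for the example.

```shell
# Hypothetical helper: print the value of name=value from a command line
# string. If the name appears more than once, the last occurrence wins.
# Usage: get_param <name> "<command line>"
get_param() {
    name=$1
    value=
    for word in $2; do
        case $word in
            "$name"=*) value=${word#*=} ;;
        esac
    done
    printf '%s\n' "$value"
}

# Sample command line (an assumption for illustration); on a running
# system you could pass "$(cat /proc/cmdline)" instead.
cmdline="zpool_import_name=zfs-root root=zfs-root rootfstype=zfs mountopt=rw"

echo "pool to import: $(get_param zpool_import_name "$cmdline")"
echo "root to mount:  $(get_param root "$cmdline")"
echo "filesystem:     $(get_param rootfstype "$cmdline")"
```

Note that root= is matched as a whole name, so rootfstype= is not mistaken for it. An empty result means the parameter is missing from the command line.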

Now copy the compiled static program into the initramfs:

cp init /boot/initramfs-source/init

Build and install kernel:

cd /usr/src/linux-3.2.27b
make -j8
make -j8 modules_install
cp arch/x86/boot/bzImage /boot/vm3.2.27b
cp System.map /boot/System.map-vm3.2.27b
# add a new lilo menu entry for the vm3.2.27b kernel;
# the three indented lines below go into /etc/lilo.conf
vi /etc/lilo.conf
  image = /boot/vm3.2.27b
   label = vm3.2.27b
   addappend = " spl.spl_hostid=0x007f0100 zfs.spa_config_path=/etc/zfs/zpool.cache root=zfs-root rootfstype=zfs"
lilo
reboot

Congrats! If this works, you can select the "vm3.2.27b" lilo entry to boot. If you are using the zpool_import support in the init program, you need to add at least the zpool_import_name=zfs-root param to the addappend line in lilo.conf.

Notes on kernel params (lilo append and addappend lines): root= and rootfstype= are used by both foobarz-init and mkinitrd. The "ro" param is not used by foobarz-init, so use mountopt=ro|rw instead. Run "modinfo spl" and "modinfo zfs" to see all of the options those modules can take on the kernel command line. Note the spl.spl_hostid=0x007f0100 param, which works around the hostid mismatch problem when importing a pool; this param, together with a hostname/IP set to the local loopback 127.0.0.1 in etc/hosts and etc/HOSTNAME in the initramfs, will work most of the time, and if it does not, the pool import needs to be forced.

Notes on the spl and zfs packages to install: If you have errors using the zfs and zpool commands on the booted system with builtin modules, upgrade or switch to the 0.6.0rc10_3.2.27b builds, or make a custom package with the binaries renamed (zpoolb, zfsb, etc.) for builtin.
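The value 0x007f0100 is not arbitrary. As far as I know, when no /etc/hostid file exists, glibc's gethostid() looks up the host's IPv4 address and swaps its two 16-bit halves, so on a little-endian machine such as x86 the loopback address 127.0.0.1 comes out as 0x007f0100. The sketch below reproduces that arithmetic in shell. The function name ip_to_hostid is made up for the example, and the glibc behaviour is stated as an assumption, not something taken from the spl documentation.

```shell
# Hypothetical helper: hostid that little-endian glibc would derive from
# the dotted-quad IPv4 address a.b.c.d (assumption: no /etc/hostid file).
ip_to_hostid() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split a.b.c.d into $1 $2 $3 $4
    IFS=$old_ifs
    # In memory the address bytes are a b c d; read as a little-endian
    # 32-bit integer that is d<<24 | c<<16 | b<<8 | a.
    s=$(( ($4 << 24) | ($3 << 16) | ($2 << 8) | $1 ))
    # Swap the upper and lower 16-bit halves and keep only 32 bits.
    h=$(( ((s << 16) | (s >> 16)) & 0xffffffff ))
    printf '0x%08x\n' "$h"
}

ip_to_hostid 127.0.0.1
```

For 127.0.0.1 this prints 0x007f0100, the same value given to spl.spl_hostid on the addappend line.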


Using qemu's -nographic serial console (/usr/share/qemu/sgabios.bin): If you have a boot problem, you might want to run in qemu's -nographic console mode:

First, edit /etc/inittab and uncomment the local serial line with id "s1" (ttyS0) to enable serial console login. This is done in the guest (the host does not need it):

# Local serial lines:
s1:12345:respawn:/sbin/agetty -L ttyS0 38400 linux
#s1:12345:respawn:/sbin/agetty -L ttyS0 9600 vt100
#s2:12345:respawn:/sbin/agetty -L ttyS1 9600 vt100

Second, edit /etc/securetty, and uncomment ttyS0 line:

ttyS0
#ttyS1
# ...

You cannot log in on ttyS0 without this change, even though you get the login prompt!

Then, run qemu such as:

qemu-kvm <all regular options> -nographic -option-rom sgabios.bin,bootindex=0

Wait for the lilo prompt; it may take several seconds to show up in this mode.

In the lilo menu, select your kernel image with builtin spl/zfs, and use kernel params such as:

boot: vm3.2.27b console=ttyS0

At the lilo prompt, use the kernel parameter console=ttyS0 (do NOT say /dev/ttyS0). Passing runlevel=s1 does not appear to be necessary.

When you are in qemu's serial mode, the interface is a little different from the normal graphical mode. Use these keypresses:

ctrl-a h    # for help
ctrl-a c    # toggle access to (QEMU) console / back to emulation

If you cannot get a login: prompt, try ctrl-a c, then do (QEMU) sendkey ctrl-alt-delete. This will reboot cleanly back to lilo, then you can do (QEMU) quit.

Once you log in to the guest, the terminal size needs adjustment. On your host, in another xterm sized the same as your guest console terminal, get the size of the terminal:

echo $LINES
echo $COLUMNS

Now, back in the guest terminal of same size:

export LINES=<lines>
export COLUMNS=<columns>
stty cols $COLUMNS rows $LINES
exec bash
reset

So, you set the environment variables and also set them with stty. Then, restart bash (exec, or else it is a child shell process).
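To avoid retyping the numbers, the host can print the exact commands to paste into the guest. This is a minimal sketch with a made-up helper name, guest_resize_cmds; it takes the rows and columns as arguments, which you can get from stty size (it prints rows first, then columns).

```shell
# Hypothetical helper: print the commands to paste into the guest
# for a terminal of the given size.
# Usage: guest_resize_cmds <rows> <columns>
guest_resize_cmds() {
    printf 'export LINES=%s COLUMNS=%s\n' "$1" "$2"
    printf 'stty rows %s cols %s\n' "$1" "$2"
}

# On the host, in an xterm of the same size, you could feed it from stty:
#   guest_resize_cmds $(stty size)
guest_resize_cmds 24 80
```

Paste the two printed lines into the guest, then run exec bash and reset as described above.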

BE CAREFUL not to start qemu twice on the same ZFS guest! Two qemu instances running simultaneously on the same ZFS pool will corrupt it and it will NOT recover - you lose your whole installation! Before running qemu, check ps -A and look for an already running qemu process in case it somehow ended up in the background!
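The ps check can be wrapped in a few lines of shell so it is easy to run every time before starting the guest. This is a hedged sketch: the helper name qemu_in_list is made up, and it deliberately flags any process whose name starts with "qemu", not only the one using your ZFS guest, so that it errs on the safe side.

```shell
# Hypothetical helper: succeed if any process name read from stdin
# starts with "qemu" (e.g. qemu-kvm, qemu-system-x86_64).
qemu_in_list() {
    grep -q '^qemu'
}

# ps -A -o comm= prints one bare command name per line, without a header.
if ps -A -o comm= | qemu_in_list; then
    echo "A qemu process is already running; do not start another one."
else
    echo "No qemu process found."
fi
```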

BE CAREFUL when using -nographic ttyS0 that you are not running commands on the HOST! Only use -nographic to see kernel problems. Use the regular console if booting is working normally.

To use the qemu -drive if=scsi option, the linux module sym53c8xx should be loaded in the guest. You may need to use the qemu option:

-option-rom 8xx_64.rom,bootindex=1

The file 8xx_64.rom used to be inside qemu, but in new versions it is missing. You can download it:

wget http://www.lsi.com/downloads/Public/Host%20Bus%20Adapters/Host%20Bus%20Adapters%20Common%20Files/lsi_bios.zip

Extract the zip and place 8xx_64.rom in /usr/share/qemu/.

Good luck!