This is an interesting project in itself. It involves compiling your own kernel as well as building a root file system. Fortunately, the Bootdisk-HOWTO gives you a head start. However, it is only a starting point. You will learn a lot while trying to get your system to behave just the way you want. For example, you will learn that unless you have /dev/tty, you will not be able to ssh out of your machine. Similarly, unless you have /dev/pts and /dev/ptmx, you will not be able to ssh into your machine.
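A quick check for the device nodes mentioned above can save a debugging session. The following is only a sketch: the function name and the idea of running it against a freshly built root file system are assumptions, not part of the original setup.

```python
import os


def missing_nodes(required=("/dev/tty", "/dev/ptmx", "/dev/pts"), exists=os.path.exists):
    """Return the required device paths that are not present."""
    # "exists" can be swapped out, for example to test a root file system
    # that is mounted somewhere other than "/".
    return [path for path in required if not exists(path)]


if __name__ == "__main__":
    for path in missing_nodes():
        print("missing:", path)
```

An empty result means all three paths exist; it does not prove that ssh will work, only that this particular cause is ruled out.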
Since Linux (like all modern OSes) requires parts of the root file system to be writable, booting Linux off a CD is a non-trivial task. Basically, we store the compressed kernel image and a compressed file system in two files on the CD. Then we ask ISOLINUX (through isolinux/isolinux.cfg) to boot the kernel image using the compressed file system as the initial RAM disk. For this to work, the kernel should have RAM disk support as well as initrd (initial RAM disk) support. What happens is this: ISOLINUX loads the kernel and the compressed file system into memory; the kernel code, when executed, uncompresses itself as well as the RAM disk image, and proceeds as usual. Since the file system is in memory, it is writable.
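To make the ISOLINUX step concrete, here is a small sketch that renders one boot entry for isolinux/isolinux.cfg. LABEL, KERNEL and APPEND are standard ISOLINUX keywords. The label "prep", root=/dev/ram0 and rootfs.img are taken from the boot arguments listed later in this document, while the kernel file name "vmlinuz" and the helper itself are assumptions made for illustration.

```python
def isolinux_entry(label, kernel, initrd, extra_args=()):
    """Render one ISOLINUX boot entry as text for isolinux/isolinux.cfg."""
    # root=/dev/ram0 makes the unpacked initial RAM disk the root file system.
    append = ["root=/dev/ram0", f"initrd={initrd}", *extra_args]
    return "\n".join([
        f"LABEL {label}",
        f"  KERNEL {kernel}",
        f"  APPEND {' '.join(append)}",
    ])


# "vmlinuz" is a placeholder kernel file name, not taken from the CD.
print(isolinux_entry("prep", "vmlinuz", "rootfs.img", ["image=fdisk", "mountcd=no"]))
```

Generating the entries from one function keeps every label pointing at the same kernel and RAM disk, with only the extra arguments changing.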
The Bootdisk-HOWTO describes the process of creating the root file system as well as compiling the kernel. Since the kernel and the compressed root file system are stored on the CD (and not inside a floppy image), we don't have to worry much about space constraints. A compressed kernel of about 900 KB and a root file system of 16 MB are enough for our purposes. When compiling the kernel, you might want to build as modules the things you don't need at boot time. For example, if you want MS-DOS file system support, you can compile that as a module. On the other hand, ext2 file system support has to be compiled into the kernel, as the root file system is ext2 (you can also choose minix).
Even though many CD-based Linux systems use BusyBox init, we use sysvinit. There is no specific reason; since we use sysvinit on regular Linux installs, I am more comfortable with it. One feature we rely on (I don't know whether BusyBox init behaves the same way) is that any name=value arguments passed to the kernel which neither the kernel nor init understands are passed through as environment variables. Strictly speaking, it is the kernel that hands these arguments to init, and sysvinit passes them on to the scripts it starts. So passing "network=dhcp" to the kernel as an additional parameter results in the environment variable $network having the value dhcp. This functionality is exploited to its fullest on this CD.
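The effect of this pass-through can be sketched by splitting a kernel command line by hand. This is an illustration rather than the kernel's actual parser: the function name is made up, and the "known" tuple is a stand-in for the much longer list of parameters the kernel consumes itself.

```python
def unclaimed_args(cmdline, known=("root", "initrd")):
    """Return the name=value boot arguments that would reach init's environment."""
    env = {}
    for word in cmdline.split():
        name, sep, value = word.partition("=")
        # Words without "=" become arguments to init rather than environment
        # variables, and names containing "." are treated as module parameters.
        if not sep or "." in name or name in known:
            continue
        env[name] = value
    return env


print(unclaimed_args("root=/dev/ram0 initrd=rootfs.img network=dhcp mountcd=no"))
```

An init script then only has to look at its environment (for example $network in a shell script) to decide what to do.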
The root file system should have the basic utilities, and the CD-ROM can hold some of the more esoteric ones. Some of the big utilities available on the CD-ROM are awk, python and rdistd. We use a Python script to send email from this client, so even if we are unable to mount any network file systems, we can still email a text file. We use rdistd to deploy a Linux image on this client.
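The mail script itself is not reproduced here. As a rough idea of what such a script can look like with the Python standard library, consider the sketch below; the function names, addresses and SMTP host are all placeholders, and it is not the script shipped on the CD.

```python
import smtplib
from email.message import EmailMessage


def build_report(path, sender, recipient, subject):
    """Wrap the contents of a text file in an email message."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        body = handle.read()
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_report(message, smtp_host, port=25):
    """Hand the message to an SMTP server that is reachable over the network."""
    with smtplib.SMTP(smtp_host, port) as server:
        server.send_message(message)
```

Building and sending are kept separate so that the message can be inspected, or written to a file, when no mail server can be reached.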
Another feature of this CD is that it is not mounted unless it needs to be. Suppose you need to install Linux on 20 machines using this CD. Since the CD is not mounted, you can eject it once the OS boots up and use the same CD on another machine. So you don't need 20 CDs to install Linux on 20 machines. On the other hand, if you need to troubleshoot a system, we mount the CD so that we have access to all the utilities available on it.
Once you have a network-aware Linux, backing up a hard disk is trivial. You just need to dd the hard disk device to a remote file. Restoring is just as trivial: dd from the remote file to the hard disk device. Of course, to save space, you can compress the image. If you are more space conscious, you can use partimage, a utility which does just that but understands file systems and hence does not back up unused sectors.
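The dd-plus-compression idea above can be sketched in a few lines. This is an illustration of the technique, not a replacement for dd: the function names are made up, and it assumes the remote file is reachable as an ordinary path (for example on a mounted network file system).

```python
import gzip
import shutil

CHUNK = 1024 * 1024  # copy in 1 MiB pieces so memory use stays bounded


def save_image(device, image_path):
    """Copy a block device (or any file) into a gzip-compressed image."""
    with open(device, "rb") as src, gzip.open(image_path, "wb") as dst:
        shutil.copyfileobj(src, dst, CHUNK)


def restore_image(image_path, device):
    """Write a gzip-compressed image back onto a block device (or file)."""
    with gzip.open(image_path, "rb") as src, open(device, "wb") as dst:
        shutil.copyfileobj(src, dst, CHUNK)
```

Like dd, this copies every sector, used or not, so the image compresses well only if the unused space happens to contain repetitive data such as zeros.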
Let's see what happens when a Linux image on this CD is chosen. First, each image results in booting into the same kernel/root file system combination, but with different boot-time arguments. For example, the prep image resolves to the arguments
root=/dev/ram0 initrd=rootfs.img image=fdisk mountcd=no network=dhcp
logfile=/dev/tty4 utils=utils baseurl=http://server/dir/name
This basically instructs the init scripts not to mount the CD-ROM and to initialise the network card using DHCP. Once all the init scripts have completed, the system downloads and executes the script $baseurl/$utils (which downloads additional binaries if need be), and then downloads and executes the script $baseurl/$image. If the network is not available, it executes /cdrom/scripts/$image, failing which it tries /scripts/$image. If all else fails, it drops you into a shell. All the scripts echo useful debugging information to $logfile, so logfile=/dev/tty4 allows one to see this information on the fourth virtual console. Similarly, all kernel log messages are sent to /dev/tty5 as well as /var/log/syslog.log.
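The lookup order for $image described above (network first, then the CD, then the root file system) can be sketched as follows. The real init scripts are not shown in this document, so the function names and the injectable fetch parameter are assumptions made to keep the example self-contained.

```python
import os
import urllib.request


def fetch(url, timeout=10):
    """Return the body of url, or None if the network or server is unavailable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError):
        return None


def find_script(name, baseurl, fetch=fetch, local_dirs=("/cdrom/scripts", "/scripts")):
    """Locate a script: try the network, then the CD, then the root file system."""
    if baseurl:
        body = fetch(f"{baseurl}/{name}")
        if body is not None:
            return ("network", body)
    for directory in local_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                return (directory, handle.read())
    return None  # nothing found: the caller drops the user into a shell
```

Returning where the script came from, along with its contents, makes it easy to write that information to $logfile.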
The fdisk script downloads the main script (in this case fdisk.sh) as well as all the other scripts which fdisk.sh depends on, and then runs fdisk.sh. This script asks the user some questions about how the partitioning should happen, then partitions the hard disk and formats the partitions. It then identifies the hardware and creates a couple of scripts to help with the installation of the OSes.
The linstall image resolves to the same arguments as prep, except that the value of image is linstall. Again, linstall downloads the actual script and its dependencies and runs the main script. The main script identifies the different partitions, picks up the information which fdisk left for it to find, and starts creating the system files (e.g. lilo.conf, fstab, XF86Config, ...) based on the information gathered and the template system files which it downloaded. Once this is done, the user connects to the rdist server and initiates an rdist to this new machine. Once that is done, the main script ensures that the files needed to boot are present (kernel, etc/fstab, lilo.conf, ...) and then runs lilo to install the master boot record.
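Filling in a template system file from the gathered information might look like the sketch below. The template contents, the swap entry and the device names are invented for illustration; the actual templates downloaded by linstall are not shown in this document.

```python
from string import Template

# Hypothetical fstab template; $root_dev and $swap_dev are filled in
# from whatever the partitioning step recorded.
FSTAB_TEMPLATE = Template(
    "$root_dev  /     ext2  defaults  1 1\n"
    "$swap_dev  swap  swap  defaults  0 0\n"
)


def render_fstab(root_dev, swap_dev):
    """Return fstab text for the given root and swap partitions."""
    return FSTAB_TEMPLATE.substitute(root_dev=root_dev, swap_dev=swap_dev)


print(render_fstab("/dev/hda1", "/dev/hda2"), end="")
```

Template.substitute raises KeyError when a placeholder has no value, which is useful here: a half-filled fstab is worse than a failed install step.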