May 2006


A fundamental aspect of UNIX is the idea that programs should be well defined and do one thing really well. Beyond this core concept is the idea of modular programs. Each module is well defined, has a defined API (application programming interface) and can integrate smoothly with other modules.

Well defined modules with well defined interfaces give developers an easy way to quickly get up to speed on a given aspect of a complex application, and to extend its functionality without inadvertently impacting other parts of the system.

Today a FOSS system is almost exclusively modular. For example, my FreeBSD desktop runs on the FreeBSD kernel, which uses extensive device and service modules that compile into the kernel to provide functionality (interfacing with hardware, pseudo devices, firewalls, etc.). On top of the kernel are many of the traditional UNIX programs, each of which does one thing really well. That lets me chain programs together to quickly build complex, ad hoc tool chains.
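To make the "chain small tools together" idea a bit more concrete, here is a minimal sketch in Python. It is only an analogy for a shell pipeline, not how the UNIX tools themselves work: the function names are borrowed loosely from grep, cut and sort/uniq, and the log lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only the lines that contain `needle`."""
    return (line for line in lines if needle in line)


def cut(lines: Iterable[str], field: int, sep: str = " ") -> Iterator[str]:
    """Keep one separator-delimited field from each line."""
    return (line.split(sep)[field] for line in lines)


def count_sorted(items: Iterable[str]) -> List[Tuple[str, int]]:
    """Count duplicates and list the most frequent first."""
    return sorted(Counter(items).items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up log lines, purely for illustration.
LOG = [
    "alice login ok",
    "bob login failed",
    "alice login failed",
    "bob login failed",
    "carol logout ok",
]


def failed_logins(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Chain the three small stages into one ad hoc report."""
    return count_sorted(cut(grep(lines, "failed"), 0))
```

Each stage knows nothing about the others. It just takes lines in and hands lines out, so any stage can be swapped or reused in a different chain.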

In addition to this there is the Perl scripting language, which provides a HUGE library of pluggable classes and source code from a central repository called CPAN. It lets me drop in well defined libraries to quickly build customized applications (generating reports, batch processing, system administration tasks and so forth).

My web server is built on Apache, which uses extensive modules to add functionality. This includes the PHP scripting language, which uses PEAR (similar to CPAN) to provide a rich library of pluggable source code tools for my own scripts. Some PHP applications, such as the Typo3 Content Management System, are exclusively modular, so any aspect can be pulled and replaced by another module, and new modules can be easily integrated into the system.

My mail server is built around Postfix, a modular design made up of many distinct programs that handle all aspects of the mail delivery process. It taps into a chain of programs that each perform a well defined, specific task (spam filtering, virus filtering, etc.).

My graphical interface is built on the modular X.org X11R7. Having a modular base allows me to plug in various modules to customize its capabilities for my system. On top of the X Window System is KDE, a modular desktop environment that uses a huge array of modular technologies such as KParts and KIO Slaves, which allow graphical application developers to pull functionality from other applications and merge it together, quickly and effortlessly. Kontact, Konqueror, Quanta, and Kate are all examples of applications that extensively use this modular approach.

So what's the big deal? For one thing, with a modular system it is possible to pull out different elements and replace them with other modules without impacting the whole system. There are many examples of this happening.

Open source operating systems are regularly ported to new hardware platforms. When 64-bit AMD chips originally came out, Linux and BSD, due to their modular architecture, were early adopters of the technology. Sun Microsystems recently released the UltraSPARC T1 processor, which provides exceptional speed and performance in server tasks. Linux and BSD were very early adopters of this technology too, and were quickly able to have extensive functionality recompiled and optimized for the new platform. The well defined nature of each layer of software allows a developer to focus exclusively on the layer that needs to be adjusted.

Modularity also works to scale a particular system up or down. Linux and BSD are both used in low-power, embedded applications such as cell phones, wireless routers, hardware firewalls, etc. Given the modular approach, a person can easily remove unneeded functionality without worrying about adversely impacting other parts of the system.

Modularity also makes it possible to completely replace one of the key elements of a system. When the XFree86 development team stagnated and did not meet the needs of the community, the X.org project was formed and was able to replace XFree86 as the X Window System implementation without impacting higher or lower levels of the system. The sendmail mail server, dating back to the early 1980s, can easily be replaced (if desired) by Postfix or other mail servers without requiring other applications to know which mail server they are talking to (due to common APIs and open standards).
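Here is a rough Python sketch of that "swap the mail server, nobody notices" idea. Everything in it is hypothetical: the `MailTransport` interface, the two stand-in classes and the addresses are invented for illustration, and nothing is actually sent. Real applications get the same effect through shared conventions such as SMTP.

```python
from typing import List, Protocol


class MailTransport(Protocol):
    """The shared contract: anything with this method can deliver mail."""

    def send(self, sender: str, recipient: str, body: str) -> str:
        ...


class SendmailLike:
    """Stand-in for one mail server (hypothetical, nothing is really sent)."""

    def send(self, sender: str, recipient: str, body: str) -> str:
        return f"sendmail-like: queued {len(body)} chars for {recipient}"


class PostfixLike:
    """Stand-in for a replacement mail server with the same interface."""

    def send(self, sender: str, recipient: str, body: str) -> str:
        return f"postfix-like: queued {len(body)} chars for {recipient}"


def send_report(transport: MailTransport, lines: List[str]) -> str:
    """Application code: it only knows the interface, not the server."""
    body = "\n".join(lines)
    return transport.send("cron@example.org", "admin@example.org", body)
```

Calling `send_report(SendmailLike(), ...)` or `send_report(PostfixLike(), ...)` requires no change to `send_report` itself, which is the whole point of agreeing on the interface first.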

A very healthy side effect of the modular approach is how it handles security issues. When a security issue arises, it is encapsulated in a module. As the module has well defined input and output characteristics, it is possible to fix the issue quickly, with minimal testing required to verify that it was addressed fully. A less modular system would require extensive testing and may ultimately fail to deliver a security patch that is fully effective.

The way free, open source software is developed makes modular design an almost mandatory requirement. Without a modular architecture, developers would need extensive up-front learning before they could help enhance a piece of software. This would lead to programming errors and a lack of commitment to contribute. Fortunately, this initial focus on design (due to the FOSS model of development) has made a lot of sense for all facets of the computing platform: ease of hardware migration, ease of security patching/release, ease of bug fixing, ease of continued development, ease of troubleshooting, ease of module replacement and so forth.

I must say, ZFS is truly impressing me. Check out this ZFS Presentation. It explains a lot of what ZFS is all about.. it's truly revolutionary in file system design.

More gloat…

  • auto-correcting - if you're reading from a RAID and let's say there is a bad block, ZFS continuously maintains checksums and will notice the bad block, read from the other mirror drive and correct the issue w/o user intervention. Great when your disk might develop sector errors but not necessarily fail completely.
  • pooled storage - you can add a drive, include it in an existing pool and it gives you access to that storage. No need to back up/rebuild a RAID or whatever. The RAID-Z method allows you to add disks ad hoc when your storage requirements demand it .. no need to back up, migrate to a full new RAID or any of that. Talk about great technology for an online file storage server.. just let it grow as needed by buying new drives..
  • quotas and reservations - file system level quotas (limit to a certain amount of storage) and reservations (guarantee a certain amount of storage). Very nicely done. One simple command and all is taken care of. Someone needs more space? No problem .. just increase the quota/reservation.
  • snapshots - have the system process snapshots on a nightly basis .. users now have access to historical versions of their documents. If a document changes slightly, only the byte changes are resaved .. this ends up making the use of snapshots very conservative for disk space (rsnapshot w/rsync & hard links provides similar functions but ultimately is saving the full file when it changes..) - I don’t know ANY user that wouldn’t find this beneficial.
  • backups - built on snapshot technology, ZFS allows for filesystem level backups to occur .. be it full snapshots, incrementals or a combination of both. Not only this, but it can transfer byte-level changes to a remote server to provide synchronization. Simply awesome. I’m guessing this is also VERY fast as they mention doing this on a *per-minute* basis for the entire file system. whoa. :)
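To get a feel for the auto-correcting point from the list above, here is a toy sketch in Python. It is emphatically not how ZFS is implemented: the `ToyMirror` class is made up, the "disks" are just dictionaries, and SHA-256 is simply a convenient checksum to reach for. It only illustrates the general idea of verifying each read against a stored checksum and repairing a bad copy from a good one.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Checksum used to verify a block (SHA-256 chosen for convenience)."""
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Two copies of every block, plus a checksum stored apart from the data."""

    def __init__(self) -> None:
        self.disks: List[Dict[int, bytes]] = [{}, {}]
        self.checksums: Dict[int, str] = {}
        self.repairs = 0

    def write(self, block: int, data: bytes) -> None:
        """Record the checksum, then write the block to both disks."""
        self.checksums[block] = checksum(data)
        for disk in self.disks:
            disk[block] = data

    def read(self, block: int) -> bytes:
        """Return a verified copy, repairing any copy that fails its checksum."""
        expected = self.checksums[block]
        for disk in self.disks:
            data = disk[block]
            if checksum(data) == expected:
                # Good copy found: overwrite every copy that does not verify.
                for other in self.disks:
                    if checksum(other[block]) != expected:
                        other[block] = data
                        self.repairs += 1
                return data
        raise IOError(f"block {block}: no copy matches its checksum")


# Simulate a silently corrupted sector on the first disk, then read.
mirror = ToyMirror()
mirror.write(0, b"important data")
mirror.disks[0][0] = b"imp0rtant data"
recovered = mirror.read(0)
```

After the read, `recovered` holds the original bytes and the first disk has been rewritten from the good copy, all without the caller doing anything special.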

I’m tempted to install OpenSolaris just to check out ZFS. It truly is an awesome file system. Lots of benefits for all levels of users. The second to last slide of the linked presentation pretty much sums it up:

Nightly “ztest” program does all of the following in parallel:

  • Read, write, create, and delete files and directories
  • Create and destroy entire filesystems and storage pools
  • Turn compression on and off (while filesystem is active)
  • Change checksum algorithm (while filesystem is active)
  • Add and remove devices (while pool is active)
  • Change I/O caching and scheduling policies (while pool is active)
  • Scribble random garbage on one side of live mirror to test self-healing data
  • Force violent crashes to simulate power loss, then verify pool integrity
  • Probably more abuse in 20 seconds than you’d see in a lifetime
  • ZFS has been subjected to over a million forced, violent crashes without losing data integrity or leaking a single block

So cool.