The UNIX Directory Structure

I considered myself familiar with Linux. I wouldn’t say proficient, but I can get around it, compile stuff, and so on. I thought that really getting into it was a matter of spending many hours hacking around. But the Startup Engineering course from Coursera has shown me that there are many basic things to learn, and that some of them are not that hard to pick up.

One of the things that has amazed me the most is the Unix filesystem hierarchy, which I’m gonna summarize here. It turns out that it doesn’t only make sense, but it’s great from a SysAdmin perspective.

So, we have a root directory everything hangs from, the famous /. A bunch of directories hang from it. Remember that “hanging” doesn’t necessarily mean that the files are physically one inside the other: the hierarchy is a logical structure, and on disk a file is just a set of blocks wherever the filesystem decided to put them. Different branches of the tree can even live on different disks. As we will see, some of the directories are actually virtual and represent entities that can be thought of as files.
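To make the “hanging” idea concrete, here’s a small Python sketch (the function names are my own, not from any standard tool). ancestors lists the directories a path hangs from, and same_device compares the st_dev field that os.stat reports, which tells you whether two paths sit on the same mounted filesystem. On a typical Linux box, /proc reports a different device than /, even though it hangs from it.

```python
import os
from pathlib import PurePosixPath


def ancestors(path: str) -> list[str]:
    """Return every directory `path` hangs from, nearest first, ending at '/'."""
    return [str(p) for p in PurePosixPath(path).parents]


def same_device(a: str, b: str) -> bool:
    """True if both paths live on the same mounted filesystem (same st_dev)."""
    return os.stat(a).st_dev == os.stat(b).st_dev


if __name__ == "__main__":
    print(ancestors("/usr/local/bin"))  # ['/usr/local', '/usr', '/']
    for d in ("/home", "/proc", "/tmp"):
        if os.path.exists(d):  # /proc, for instance, is Linux-only
            print(d, "same device as / :", same_device("/", d))
```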

Most of the directories contain system files, which are created when the OS is installed and are only read during normal operation. Some examples are /bin (essential binaries needed at boot time and by every user: ls, cp…), /sbin (essential binaries for system administration, normally run by root: mount…) and /lib (the shared libraries those binaries need).
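One thing worth knowing: on many recent distributions (Fedora and Arch among them), /bin, /sbin and /lib are no longer separate directories but symlinks into /usr, the so-called merged /usr. A quick way to check your own machine is a sketch like this one; describe_dir is just an illustrative helper name.

```python
import os


def describe_dir(path: str) -> str:
    """Say whether `path` is a real directory, a symlink (and to what), or missing."""
    if os.path.islink(path):
        # On a merged-/usr system this prints something like 'usr/bin'.
        return f"symlink -> {os.readlink(path)}"
    if os.path.isdir(path):
        return "directory"
    return "missing"


if __name__ == "__main__":
    for d in ("/bin", "/sbin", "/lib"):
        print(d, describe_dir(d))
```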

Some directories can be classified as “virtual”, because they represent things other than traditional files: /dev (access to devices) and /proc (access to processes and kernel information). There is also /tmp, which holds temporary files that are usually wiped on reboot. Two others, /media and /mnt, contain mount points for other disks (/media for removable media, /mnt for temporary mounts). That is, the files you see under them actually live on an external hard drive, a USB stick or another partition. This is also where disk images get mounted, which can happen while installing certain software packages.
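/proc is a good example of “things that can be thought of as files”: reading /proc/self/status gives you information about the very process doing the reading, generated by the kernel on the fly. Here’s a small Python sketch (parse_proc_status is my own helper) that reads it like any other text file.

```python
import os


def parse_proc_status(text: str) -> dict[str, str]:
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value.strip()
    return info


if __name__ == "__main__":
    # /proc only exists on Linux; its contents are produced by the kernel on read.
    if os.path.exists("/proc/self/status"):
        with open("/proc/self/status") as f:
            status = parse_proc_status(f.read())
        print(status.get("Name"), status.get("Pid"), status.get("VmRSS"))
```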

The best known is /home, where every user owns a subdirectory, normally accessible only to them and to the administrators, in which they keep their documents.
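Whether a home directory really is private comes down to its permission bits, and the defaults vary between distributions. Here’s a Python sketch to check yours (the helper names are illustrative). Note that root bypasses these checks anyway, which is why administrators can still get in.

```python
import os
import stat


def permissions(path: str) -> str:
    """Return an ls-style permission string such as 'drwx------'."""
    return stat.filemode(os.stat(path).st_mode)


def others_can_enter(path: str) -> bool:
    """True if users outside the owner and group can traverse the directory."""
    return bool(os.stat(path).st_mode & stat.S_IXOTH)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    print(home, permissions(home), "others can enter:", others_can_enter(home))
```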

The interesting part comes with /var, /usr, /etc and /opt. /var contains variable files, that is, files that change a lot, such as caches or logs. There is also /var/tmp, which contains longer-term temporary files that are preserved across reboots. Databases and other persistent application state also tend to live under /var, typically in /var/lib (for instance, /var/lib/mysql). Much of /var is not critical, though: logs are only read from time to time, and losing a cache costs little. /etc, as we’ll see in a moment, is a different story.
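If you want to see which parts of /var actually grow on your machine, a rough Python sketch like this adds up file sizes per subdirectory. It’s similar in spirit to du, though it counts apparent file sizes rather than disk blocks, and dir_size is just an illustrative name.

```python
import os
import stat


def dir_size(path: str) -> int:
    """Total size in bytes of the regular files under `path`."""
    total = 0
    # By default os.walk silently skips directories it cannot list.
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))  # don't follow symlinks
            except OSError:
                continue  # vanished (logs rotate) or unreadable
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


if __name__ == "__main__":
    for sub in ("/var/log", "/var/cache", "/var/lib", "/var/tmp"):
        if os.path.isdir(sub):
            print(f"{sub:12} {dir_size(sub) / 1e6:10.1f} MB")
```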

/etc contains “other stuff”, and in typical scenarios that means system-wide configuration files. They don’t change often, and they are kept apart from both the core OS and the programs themselves. Keeping them separate from the binaries makes it easy to upgrade packages without losing the fine-tuning.
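Most files in /etc are plain text you can read (and edit) without special tools. As a small illustration, /etc/os-release on most current distributions describes the installed OS with simple KEY=value lines, and the sketch below parses it in Python. parse_os_release is my own helper, and on some systems /etc/os-release is itself a symlink to a copy under /usr/lib.

```python
import os
import shlex


def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an os-release style file, dropping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        # Values follow shell quoting rules, so let shlex strip the quotes.
        info[key] = " ".join(shlex.split(value))
    return info


if __name__ == "__main__":
    if os.path.exists("/etc/os-release"):
        with open("/etc/os-release") as f:
            print(parse_os_release(f.read()).get("PRETTY_NAME"))
```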

The trickiest one is /usr. Its name does come from “user” (early Unix systems kept home directories there), but nowadays it has no obvious relation to users. If the root directory is the first layer of the executable part of the system, /usr would be the second one. It has /usr/bin, /usr/sbin and /usr/lib, which contain programs and libraries that are part of the distribution but aren’t needed to boot. For instance, interpreters such as python, perl and ruby usually live in /usr/bin, with their libraries mostly under /usr/lib.
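Your PATH reflects these layers. This Python sketch labels each PATH entry; layer_of is an illustrative helper, and the layer names are just the terminology of this post, not an official classification.

```python
import os
from pathlib import PurePosixPath

# Directories of the first layer, hanging directly from /.
FIRST_LAYER = ("/bin", "/sbin", "/lib")


def layer_of(directory: str) -> str:
    """Classify a directory by the executable 'layer' it belongs to."""
    p = PurePosixPath(directory)
    # Check /usr/local before /usr, since it is nested inside it.
    if p.is_relative_to("/usr/local"):
        return "third layer (/usr/local)"
    if p.is_relative_to("/usr"):
        return "second layer (/usr)"
    if any(p.is_relative_to(d) for d in FIRST_LAYER):
        return "first layer (/)"
    return "outside the layers"


if __name__ == "__main__":
    for d in os.environ.get("PATH", "").split(os.pathsep):
        if d:
            print(f"{d:25} {layer_of(d)}")
```

Because is_relative_to compares whole path components, a directory like /usrlocal is not mistaken for part of /usr.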

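/usr has one very special child, /usr/local. It starts out essentially empty and is where the administrator installs software that isn’t managed by the distribution’s package manager (many programs compiled from source install there by default), or places links to other executables to make them available system-wide. It mirrors the /usr layout, with /usr/local/bin, /usr/local/lib and so on. This would be a third layer of executable stuff.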
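The reason this third layer works is search order: most default PATHs list /usr/local/bin before /usr/bin, so a locally installed program shadows the distribution’s copy. A Python sketch that lists every match in order, similar to which -a (which_all is my own name), makes this visible. It ignores aliases, shell builtins and the shell’s lookup cache.

```python
import os


def which_all(cmd: str, dirs: list[str]) -> list[str]:
    """Every executable named `cmd` in `dirs`, in search order.
    The first entry is the one found first on that PATH."""
    hits = []
    for d in dirs:
        candidate = os.path.join(d, cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    print(which_all("python3", path_dirs))
```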

And last but not least, /opt contains add-on software packages that are too big or too complex to fit into /usr, so each one lives self-contained in its own directory; for instance, the Tomcat Java servlet container. Notice that the binaries of these packages need to be either linked into /usr/local/bin or added to the PATH.
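Both options are easy to script. The Python sketch below assumes a hypothetical /opt/tomcat/bin install location (Tomcat’s launcher scripts, such as catalina.sh, live in its bin directory); link_into and append_to_path are my own names.

```python
import os


def link_into(bin_dir: str, executable: str) -> str:
    """Create bin_dir/<name> as a symlink to `executable`; return the link path."""
    link = os.path.join(bin_dir, os.path.basename(executable))
    os.symlink(executable, link)
    return link


def append_to_path(path_value: str, directory: str) -> str:
    """Return a PATH string with `directory` searched last, without duplicates.
    Empty entries (which would mean the current directory) are dropped."""
    parts = [p for p in path_value.split(os.pathsep) if p and p != directory]
    return os.pathsep.join(parts + [directory])


if __name__ == "__main__":
    # Hypothetical install location; adjust to wherever the package lives.
    tomcat_bin = "/opt/tomcat/bin"
    print(append_to_path(os.environ.get("PATH", ""), tomcat_bin))
    # Linking instead (usually needs root):
    # link_into("/usr/local/bin", "/opt/tomcat/bin/catalina.sh")
```

Appending rather than prepending means a package in /opt can’t accidentally shadow a system command of the same name. Either way, the new PATH only takes effect once it’s exported, for example from a shell profile.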

If the machine is a server, /home would typically be very small, and instead a /srv directory would hold all the files to be served over the network (or to be interpreted when a request is received).

Does all of this make sense to you? It does for me, and I think it separates the assets very well in terms of priority, importance and access type. This allows a lot of optimizations if the system is big. For instance, since /var is written to constantly, it makes sense to mount it on a disk with very high write speed. /etc, on the other hand, doesn’t need that write speed, but it’s important to make sure its data won’t get corrupted; the same applies to /home. If I had to choose two directories for backups or a RAID, it would be these two. /tmp, in turn, is often mounted as tmpfs, so it lives in RAM instead of on disk.
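To see how a given machine actually splits things up, you can ask which mounted filesystem backs each directory. On Linux, /proc/mounts lists every mount as “device mount-point type options …”, and this Python sketch picks the mount with the longest matching mount point (parse_mounts and backing_mount are illustrative names). On a desktop, everything may well sit on one partition, with perhaps /tmp on tmpfs.

```python
import os
from pathlib import PurePosixPath
from typing import Optional


def parse_mounts(text: str) -> list[tuple[str, str, str]]:
    """(mount_point, fs_type, device) for each line of /proc/mounts."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            device, mount_point, fs_type = fields[:3]
            # The kernel escapes spaces in mount points as \040.
            mounts.append((mount_point.replace("\\040", " "), fs_type, device))
    return mounts


def backing_mount(path: str, mounts: list[tuple[str, str, str]]) -> Optional[tuple[str, str, str]]:
    """The mount whose mount point is the longest component-wise prefix of `path`."""
    p = PurePosixPath(path)
    best = None
    for m in mounts:
        if p.is_relative_to(m[0]):
            depth = len(PurePosixPath(m[0]).parts)
            # Later entries win ties, since a later mount hides an earlier one.
            if best is None or depth >= len(PurePosixPath(best[0]).parts):
                best = m
    return best


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            mounts = parse_mounts(f.read())
        for d in ("/", "/var", "/etc", "/home", "/tmp"):
            print(d, "->", backing_mount(d, mounts))
```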