How I Dig Into Linux File Systems: Inodes and VFS

I still remember the first time I crashed a production server because I didn’t understand how the Linux file system actually worked. It was 3:00 AM, and a database server was refusing to write new data. The error message was clear: “No space left on device.” Naturally, I ran df -h immediately. To my confusion, the disk was only 45% full. I spent an hour deleting log files and panicking before a senior sysadmin logged on, ran one command, and pointed out that we had run out of inodes, not block storage. A runaway script had created millions of zero-byte files, eating up the file system’s inode table entries while leaving the actual disk space mostly empty.

That night changed how I view Linux Administration. We often treat the file system as a simple black box where we dump data, but it’s actually a complex database of pointers, metadata, and abstraction layers. Whether you are doing digital forensics, performance tuning, or just trying to figure out why Docker is acting up, you need to understand what happens beneath the standard ls command.

The Illusion of the Directory Tree

When I teach Linux System Administration, I always start with the Virtual File System (VFS). It is the kernel’s abstraction layer that allows us to interact with different file systems—Ext4, XFS, NFS, or even procfs—using the exact same standard system calls. When you run a Python script to read a file, your Python code doesn’t know if that file is on a local NVMe drive, a USB stick, or a network share. The VFS handles that translation.

I think of VFS as the universal translator. It defines a common file model (inodes, superblocks, directory entries) that every specific file system driver must map to. This is why I can mount a Windows NTFS partition and copy files to a Linux Ext4 partition without the tools crashing.
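
To see the abstraction in action, here is a minimal Python sketch. It assumes a typical Linux layout, where /etc/hostname lives on a disk filesystem and /proc/version is generated by the kernel; the application code is identical either way:

# Same open()/read() path through the VFS, two very different "filesystems":
# /etc/hostname is (typically) on disk; /proc/version is built by the kernel
# on demand. The application cannot tell them apart.
for path in ("/etc/hostname", "/proc/version"):
    with open(path) as f:
        print(f"{path}: {f.read().strip()}")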

Inodes: The Real Identity of a File

Here is the reality that trips up many beginners: The filename is not the file. The filename is just a label pointing to an inode number. The inode (Index Node) is the actual data structure on the disk that describes the file.

When I’m debugging permission issues or trying to recover data, I stop looking at names and start looking at inodes. An inode contains the file’s metadata: permissions, owner, group, size, timestamps, and pointers to the disk blocks where the data lives. Crucially, it does not contain the filename.

You can see this relationship easily in the terminal. I use the -i flag with ls to reveal these hidden IDs:

$ touch critical_config.conf
$ ls -li critical_config.conf
245912 -rw-r--r-- 1 user user 0 Dec 26 10:00 critical_config.conf

That number, 245912, is the file. If I create a hard link, I’m just creating a second filename that points to the same inode number. This is why deleting a file in Linux is technically “unlinking” it. The data isn’t marked as free until the link count (the number of filenames pointing to that inode) drops to zero.
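
You can watch the link count change with a few lines of Python. This is just a sketch with throwaway filenames:

import os

open("critical_config.conf", "w").close()       # create an empty file
os.link("critical_config.conf", "backup.conf")  # second name, same inode

a = os.stat("critical_config.conf")
b = os.stat("backup.conf")
print(a.st_ino == b.st_ino)   # True: one inode, two directory entries
print(a.st_nlink)             # 2

os.unlink("backup.conf")      # rm is really unlink()
print(os.stat("critical_config.conf").st_nlink)  # back to 1; data untouched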

Why Deleted Files Don’t Always Disappear

This brings me back to my server outage story. A common issue I face in System Monitoring involves “deleted” files consuming space. If a process (like Apache or a Python script) holds a file open, the kernel maintains a reference to that inode. Even if you rm the file, the directory entry is gone, but the inode and data blocks remain active and occupied until the process closes the handle.

I use lsof (List Open Files) to hunt these ghosts down. It is an essential tool for Linux Forensics and general troubleshooting. If your disk usage doesn’t add up, run this:

# Check for deleted files that are still held open by processes
sudo lsof +L1

This command lists open files with a link count of less than one. If you see a massive log file in that list, simply restarting the process holding it (systemctl restart <service>, for example) will finally free the space; a daemon-reload alone won’t help, because the process never closes its file handle.
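
You can reproduce one of these ghosts in a few lines of Python; a minimal sketch:

import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1_000_000)   # roughly 1 MB of data
os.unlink(path)                  # the directory entry is gone...

st = os.fstat(fd)                # ...but our open fd still pins the inode
print("link count:", st.st_nlink)   # 0: no filename points at it anymore
print("size (bytes):", st.st_size)  # the blocks are still allocated
os.close(fd)                     # only now can the kernel free the space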

Exploring the Structure with Python

While Bash Scripting is great for quick checks, I prefer Python Automation when I need to analyze file system attributes across thousands of files. Python’s os and stat modules give you direct access to the stat system call structure.

Here is a script I use to audit file permissions and inode consistency. It’s particularly useful when I’m verifying security compliance or checking for SUID binaries that shouldn’t exist:

import os
import stat
import time

def analyze_file_metadata(filepath):
    try:
        # lstat (rather than stat) does not follow symlinks, so a symlink
        # is reported as a symlink instead of as its target
        file_stat = os.lstat(filepath)

        print(f"--- Analysis for: {filepath} ---")
        print(f"Inode Number: {file_stat.st_ino}")
        print(f"Device ID:    {file_stat.st_dev}")
        print(f"Hard Links:   {file_stat.st_nlink}")
        print(f"UID/GID:      {file_stat.st_uid}/{file_stat.st_gid}")
        print(f"Size (bytes): {file_stat.st_size}")
        print(f"Modified:     {time.ctime(file_stat.st_mtime)}")

        # Interpret the file type from the mode bits
        mode = file_stat.st_mode
        if stat.S_ISDIR(mode):
            type_str = "Directory"
        elif stat.S_ISREG(mode):
            type_str = "Regular File"
        elif stat.S_ISLNK(mode):
            type_str = "Symlink"
        else:
            type_str = "Special Device"

        print(f"File Type:    {type_str}")
        # Four octal digits so SUID/SGID/sticky bits stay visible (e.g. 4755)
        print(f"Octal Perms:  {format(stat.S_IMODE(mode), '04o')}")

    except FileNotFoundError:
        print(f"Error: {filepath} not found.")
    except PermissionError:
        print(f"Error: Permission denied accessing {filepath}")

# Example usage
analyze_file_metadata("/etc/passwd")
analyze_file_metadata("/bin/bash")

Running this kind of Python System Admin script helps me visualize exactly what the kernel sees. It strips away the user-friendly output of ls and shows the raw attributes.

Pseudo-Filesystems: /proc and /sys

One aspect of Linux that confused me for years was the /proc directory. I used to think these were real files sitting on my hard drive. They aren’t. Both /proc and /sys are pseudo-filesystems. They are windows into the kernel’s memory, presented as files so we can use standard Linux Utilities to read them.

When I run cat /proc/cpuinfo, the kernel intercepts that read request, generates the text on the fly based on the current hardware state, and returns it. No disk I/O occurs. This is incredibly powerful for Performance Monitoring.
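
A quick way to convince yourself of this is to read a /proc file twice. This little sketch uses /proc/uptime, which changes on every read because the kernel builds the text at read time:

import time

# /proc files are generated on demand: two reads of /proc/uptime, one
# second apart, return different contents; there is no file on disk at all.
for _ in range(2):
    with open("/proc/uptime") as f:
        print(f.read().strip())
    time.sleep(1)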

For instance, if I need to see the memory map of a running process for forensic reasons, I don’t need a specialized debugger immediately. I can just look at the filesystem:

# Get the PID of a shell
PID=$$
# Look at the memory map
cat /proc/$PID/maps

This exposes the virtual memory areas, shared libraries, and stack locations. In a security incident, I often check /proc/[pid]/exe, which is a symbolic link to the actual executable binary. If an attacker deletes or replaces a binary on disk while the process keeps running, the link target in /proc is suffixed with “ (deleted)”, and I can recover the malware sample by copying it straight out of /proc/[pid]/exe.
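
The same check is easy to script. This sketch inspects its own process, but the readlink call works for any PID you are allowed to see:

import os

pid = os.getpid()   # any PID you have permission to inspect works here
exe = os.readlink(f"/proc/{pid}/exe")
print(f"PID {pid} -> {exe}")

# If the on-disk binary was removed after launch, the link target ends in
# " (deleted)", yet the running image can still be copied out of /proc.
if exe.endswith(" (deleted)"):
    print("Binary deleted while running; recover it via /proc before the process exits.")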

Modern Filesystems: Ext4, XFS, and Btrfs

While the VFS handles the abstraction, the underlying filesystem driver manages the physical bits on the disk. In 2025, the debate usually lands on Ext4, XFS, or Btrfs. I have strong opinions here based on my workload.

I stick with Ext4 for standard boot partitions and general-purpose servers. It is rock solid. I’ve had power failures on Ext4 systems that recovered instantly upon reboot because the journaling is that robust.

However, for my data storage servers and database backends, I prefer XFS. XFS excels at parallel I/O and handling massive files. If you are managing a PostgreSQL database with terabytes of data on Linux, XFS usually offers better throughput than Ext4.

Then there is Btrfs. I use Btrfs specifically for my backup servers and workstations. The snapshot capability is indispensable. Being able to take an atomic snapshot of the file system before running a risky dnf update or apt upgrade has saved my weekend more than once. Btrfs is built on Copy-on-Write (CoW), which means data isn’t overwritten in place; it’s written to a new location and the pointer is updated. This architecture is fundamentally different from the journaled, update-in-place approach of Ext4.
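
The snapshot itself is a single command. Here is a minimal sketch of that pre-upgrade habit, assuming the root filesystem is a Btrfs subvolume and that a /.snapshots directory exists (both are assumptions about your layout):

import datetime
import subprocess

# Take a read-only, atomic snapshot of / before a risky package upgrade.
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
subprocess.run(
    ["btrfs", "subvolume", "snapshot", "-r", "/", f"/.snapshots/root-{stamp}"],
    check=True,   # raise if the snapshot fails; needs root and Btrfs
)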

OverlayFS and the Container Revolution

You cannot discuss Linux File Systems today without mentioning containers. When I work with Docker or Kubernetes clusters on Linux, I am relying heavily on OverlayFS. This is a union mount filesystem: it takes a lower directory (the read-only container image) and an upper directory (the writable layer) and merges them into a single view.

This explains why containers start so fast. They aren’t copying the whole OS. They are just mounting a new writable layer over the existing base image. If you modify a file, the file system copies it from the lower layer to the upper layer (copy-up) and modifies it there. The original remains untouched.

I often use this mechanism to create temporary test environments without Docker. Here is how to set up a simple overlay mount by hand to test dangerous commands without risking your actual data:

# Create directories
mkdir lower upper work merged

# 'lower' simulates our read-only base
echo "Base Config" > lower/config.txt

# Mount the overlay
sudo mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged

# Verify
cat merged/config.txt
# Output: Base Config

# Modify the file in the merged view
echo "Modified Config" > merged/config.txt

# Check the source
cat lower/config.txt
# Output: Base Config (Untouched!)

cat upper/config.txt
# Output: Modified Config (The change is here)

Understanding this layering is critical for Linux DevOps. When a container runs out of space, it’s usually because the “upper” layer (the writable layer) is filling up the host’s disk, even if the base image is stored elsewhere.
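
When I suspect that, I size up the writable layers directly. A rough sketch, assuming Docker’s default overlay2 storage root (the path changes if you have moved data-root):

import os

ROOT = "/var/lib/docker/overlay2"   # Docker's default data-root; adjust if moved

def dir_size(path):
    """Sum file sizes under path, skipping anything unreadable."""
    total = 0
    for dirpath, _, filenames in os.walk(path, onerror=lambda e: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass
    return total

for layer in os.listdir(ROOT):
    diff = os.path.join(ROOT, layer, "diff")   # the "upper" writable dir
    if os.path.isdir(diff):
        print(f"{dir_size(diff) / 1e6:10.1f} MB  {layer[:16]}")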

Filesystem Corruption and Recovery

I dread the moment I see “Input/output error” in the console. It usually implies physical disk failure or severe filesystem corruption. When this happens, fsck (File System Consistency Check) is the first line of defense, but you must use it correctly.

I never run fsck on a mounted partition. Doing so will almost certainly destroy data, because the kernel might write to the disk while fsck is trying to fix it. I always boot into rescue mode or use a live USB (a standard Ubuntu live image works fine) so the target partition stays unmounted while I check it.

For deeper analysis, I use debugfs. This is an interactive file system debugger for Ext2/3/4. It allows you to navigate the file system structures without mounting them. I’ve used it to recover deleted files by manually resetting the inode pointers, though that is a tedious process I wouldn’t wish on anyone.

# Open the partition with debugfs (it opens read-only by default, which is safer)
sudo debugfs /dev/sda1

# Inside the prompt, list directory entries, including deleted ones
debugfs: ls -d /home/user/deleted_directory

Permissions and ACLs

Standard Linux Permissions (rwx) are often insufficient for complex environments. I frequently encounter scenarios where I need to give a specific user access to a log file without adding them to the root group. This is where Access Control Lists (ACLs) come in.

I find that many admins ignore ACLs, but they are built into the file system. Using setfacl and getfacl allows for granular control. For example, if I’m managing a web server and need a developer to have write access to /var/www/html without changing the owner from www-data, I use:

# Grant user 'dev1' rwx permissions on the directory
setfacl -m u:dev1:rwx /var/www/html

# Ensure new files created inside inherit these permissions
setfacl -d -m u:dev1:rwx /var/www/html

If you see a + sign at the end of the permission string in ls -l (e.g., drwxr-xr-x+), that means ACLs are active on that file. Ignoring this little plus sign can lead to hours of debugging why a user has access when the standard permissions say they shouldn’t.
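
If you want to audit for that plus sign at scale, shelling out to getfacl works well enough. A quick sketch with illustrative target paths (requires the acl tools installed):

import subprocess

def has_extended_acls(path):
    """True if getfacl reports entries beyond the base user/group/other."""
    out = subprocess.run(
        ["getfacl", "--omit-header", path],
        capture_output=True, text=True, check=True,
    )
    entries = [line for line in out.stdout.splitlines() if line]
    return len(entries) > 3   # base ACL is exactly user::, group::, other::

for path in ("/var/www/html", "/etc/passwd"):   # illustrative targets
    print(path, "->", "extended ACLs" if has_extended_acls(path) else "plain mode bits")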

Wrapping Up

The Linux file system is more than just a storage bin; it is a sophisticated interface for the kernel, hardware, and processes to communicate. Whether you are using LVM to manage volumes, analyzing malware, or just trying to keep a production server alive, the file system is the foundation.

My recommendation is to stop taking commands like ls, cp, and rm for granted. Spend some time exploring /proc, play with debugfs on a test VM, and write a few Python scripts to interact with inodes directly. The next time a server acts up at 3:00 AM, you won’t be guessing—you’ll be navigating the structures like a pro.
