Wednesday, 17 July 2024

Streamlining Disk Space Usage with a Smart Bash Script


When managing server resources, particularly disk space, it’s essential to optimize how space is utilized. An efficient way to identify heavy usage is by finding subfolders that consume a significant amount of disk space. Let’s dive into creating a more effective Bash script that not only identifies these subfolders but also respects a given size threshold, minimizing redundant output in the process.

The Challenge

Our goal is to write a Bash script that takes two arguments:

  1. A parent directory (typically /apps/).
  2. A size threshold (e.g., 200M).

The script should highlight subfolders exceeding the specified threshold without cluttering the output with redundant data, such as parent directories already represented by their subfolders.

Current Approach Limitations

The existing command:

cd /apps/ && du -aBM 2>/dev/null | sort -nr | head -n 15

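Because of the -a flag, this lists the 15 largest entries under /apps/, files as well as directories. Every directory's total also includes everything beneath it, so the same space is reported again at each level of the tree, which makes the output repetitive and hard to act on.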
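The repetition is easy to reproduce on a throwaway tree. The sketch below is illustrative only: the directory names and the 3 MiB file are made up, and it assumes GNU du (for -BM).

```bash
# Build a scratch tree: demo/a/b/big.bin (names are arbitrary)
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
head -c 3145728 /dev/zero > "$demo/a/b/big.bin"   # 3 MiB of zeros

# Directories only (no -a), largest first
listing=$(cd "$demo" && du -BM 2>/dev/null | sort -nr)
printf '%s\n' "$listing"

# Clean up the scratch tree
rm -rf "$demo"
```

The listing has one line each for ./a/b, ./a and the top-level directory, and all three account for the same single file.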

Proposed Solution

To create a script that provides clear, actionable information, we need to enhance our approach to:

  • Skip a parent directory when one of its subdirectories is already listed.
  • Display only directories whose size meets or exceeds the threshold.
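The two rules can be tried in isolation on canned input before touching a real disk. This is a minimal sketch under stated assumptions: the sample sizes and paths are invented, the helper name filter_redundant is hypothetical, and the input is assumed to be in the order du produces, with subdirectories listed before their parents.

```bash
# Hypothetical du-style output (size<TAB>path); children appear before parents.
sample=$(printf '%s\t%s\n' \
    '300M' '/apps/web/cache' \
    '50M'  '/apps/web/logs' \
    '350M' '/apps/web' \
    '120M' '/apps/db' \
    '470M' '/apps')

filter_redundant() {   # usage: filter_redundant THRESHOLD_IN_MIB
    awk -F'\t' -v thresh="$1" '
    {
        size = $1; sub(/M$/, "", size)          # "300M" -> "300"
        # Skip entries below the threshold or already covered by a descendant
        if (size + 0 < thresh + 0 || ($2 in covered)) next
        print
        # Mark every ancestor of the printed path as covered
        anc = $2
        while (sub(/\/[^\/]*$/, "", anc)) covered[anc] = 1
    }'
}

result=$(printf '%s\n' "$sample" | filter_redundant 200)
printf '%s\n' "$result"
```

With a threshold of 200, only /apps/web/cache is printed: /apps/web and /apps also exceed the threshold, but both are ancestors of a directory that has already been listed.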

Script Overview

Here’s an improved version of the script that meets our requirements:

#!/bin/bash

# Validate arguments
if [[ $# -ne 2 ]]; then
    echo "Usage: $0 [directory] [size threshold]" >&2
    exit 1
fi

directory=$1
threshold=${2%[Mm]}   # accept "200M" or "200"; sizes are compared in MiB

if [[ ! $threshold =~ ^[0-9]+$ ]]; then
    echo "Threshold must be a whole number of MiB, e.g. 200M" >&2
    exit 1
fi

# List directories only (no -a, which would add every file). du prints a
# directory after its subdirectories, so a parent is always seen after any
# descendant that has already been printed.
du -BM "$directory" 2>/dev/null | awk -F'\t' -v thresh="$threshold" '
{
    size = $1; sub(/M$/, "", size)     # "250M" -> "250"
    path = $2; sub(/\/+$/, "", path)   # drop trailing slashes for lookups
    if (size + 0 >= thresh + 0 && !(path in parent)) {
        print
        # Mark every ancestor of this directory so it is skipped later
        while (sub(/\/[^\/]*$/, "", path)) {
            parent[path] = 1
        }
    }
}
' | sort -nr | head -n 15

Key Enhancements

  1. Threshold Handling: The threshold is interpreted as a number of mebibytes (200M and 200 are equivalent), and each size reported by du -BM is compared against it.
  2. Redundancy Removal: du reports a directory only after its subdirectories, so whenever an entry is printed the script records all of its ancestors in an associative array (parent) and skips them when they appear later.
  3. Output Control: The script displays at most 15 entries that meet or exceed the size threshold, sorted in descending order.
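The comparison works in mebibytes only. If thresholds such as 1G should also be accepted, one possible extension is a small helper that converts a suffixed size to MiB before it is handed to awk. The function name to_mib and the set of accepted suffixes are assumptions for illustration, not part of the original script.

```bash
# Hypothetical helper: convert a size such as 512K, 200M, 1G or 2T to whole MiB.
# A bare number is treated as MiB. Prints the result, or returns 1 on bad input.
to_mib() {
    local value=$1 number unit
    number=${value%[KkMmGgTt]}     # strip one trailing unit letter, if any
    unit=${value#"$number"}        # whatever was stripped ("" if nothing)

    case $number in
        ''|*[!0-9]*|0?*)           # empty, non-digits, or leading zeros
            echo "invalid size: $value" >&2
            return 1 ;;
    esac

    case $unit in
        [Kk])    echo $(( (number + 1023) / 1024 )) ;;   # round up to 1 MiB steps
        ''|[Mm]) echo "$number" ;;
        [Gg])    echo $(( number * 1024 )) ;;
        [Tt])    echo $(( number * 1024 * 1024 )) ;;
    esac
}

to_mib 200M
to_mib 1G
```

In the script, the result could replace the plain suffix stripping, for example threshold=$(to_mib "$2") || exit 1.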

Usage

Run the script by passing the target directory and the size threshold:

./find_large_folders.sh /apps/ 200M

Conclusion

This script is a handy tool for systems administrators who need to keep a close eye on disk usage, especially on cloud servers where disk space can be costly. By listing only the deepest directories that exceed the threshold, rather than every level above them, it makes it easier to decide where to clean up and optimize storage space.
