Master Python Automation: Extract and Display File Info Like a Pro

#python #automation #linux #aws

Scenario

“Your company needs to learn about the files located on various machines. You have been asked to build a script that extracts information such as the name and size about the files in the current working directory and stores it in a list of dictionaries.” -LUIT, Python2

Automation scripting is a game-changer, and the LUIT-Python2 GitHub repository provides a practical starting point: a Python script that gathers file details from a directory automatically. The same scripting approach extends to larger tasks such as network provisioning, application deployment, and system configuration.

Why Automate Infrastructure Management?

Efficiency: By automating repetitive tasks, you save time and reduce the risk of human error.

Consistency: Each deployment runs the same steps the same way, which helps prevent "it works on my machine" problems.

Scalability: The same script can be run against many machines or environments with just a few commands.

Maintainability: Once an automated script is written, it can be reused, updated, and version-controlled.

Getting Started

Before diving into the repository, make sure you have Python 3.6 or later installed on your machine. The script uses f-strings, which Python 2.7 and Python 3.5 and earlier do not support. You will also need a basic understanding of Linux-based systems, as the script's output is modeled on the Linux ls command.
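To confirm which interpreter you have, run python3 --version in a terminal, or check from inside Python as in this small sketch. The 3.6 cutoff reflects the f-strings used in print_ll_view; supports_fstrings is a hypothetical helper name, not part of the repository.

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Hypothetical helper: True if the interpreter is Python 3.6 or newer."""
    # Compare only (major, minor) so patch releases do not matter
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print(sys.version)  # Full version string of the running interpreter
    print("f-strings supported:", supports_fstrings())
```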

Clone the Repository: Start by cloning the LUIT-Python2 repository from GitHub:
```bash
git clone https://github.com/Judewakim/LUIT-Python2.git
```

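Install Dependencies: To install any packages listed in the repository's requirements.txt, move into the cloned folder and use pip. The data_extraction.py script itself only imports os and time from Python's standard library, so it does not need any third-party packages.
```bash
cd LUIT-Python2
pip install -r requirements.txt
```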

Writing Python

To break down the essential components of the data_extraction.py file, let's start with the imports. The script needs two standard-library modules: os, for walking directories and reading file metadata, and time, for turning timestamps into readable dates.

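```python
import os    # Directory traversal and file metadata (os.walk, os.stat)
import time  # Converting timestamps into readable dates (time.ctime)
```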
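Most of the metadata comes from os.stat. Its st_mode field packs the file type and the permission bits into a single integer, and the function below keeps only the permission digits with oct(file_stat.st_mode)[-3:]. This sketch uses a hard-coded example mode (a regular file with 644 permissions) to show what that slice produces, and how the standard-library function stat.filemode gives the symbolic form that ls -al prints. The variable names and the stat import are illustrative additions, not part of the original script.

```python
import stat

# Example mode for a regular file with rw-r--r-- permissions (0o100644)
mode = stat.S_IFREG | 0o644

octal_perms = oct(mode)[-3:]          # Same slicing the script uses -> '644'
symbolic_perms = stat.filemode(mode)  # ls-style string -> '-rw-r--r--'
print(octal_perms, symbolic_perms)    # 644 -rw-r--r--

# Caveat: the three-digit slice drops the setuid/setgid/sticky bits;
# stat.S_IMODE keeps every permission bit if you need them.
print(oct(stat.S_IMODE(mode)))        # 0o644
```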

Next, we define the first function, which collects all the information we need. Remember, the goal of this code is to “build a script that extracts information such as the name and size about the files in the current working directory,” so the function gathers details about the working directory (or whatever directory is specified), and a second function later presents them in a layout similar to the Linux ls -al command. Because it uses os.walk, it also descends into every subdirectory. That function will look like this:

```python
def get_files_info(path='.'):  # Function that collects file details, defaulting to current directory
    files_info = []  # List to store file details

    for root, _, files in os.walk(path):  # Recursively traverse directories and files
        for filename in files:
            file_path = os.path.join(root, filename)  # Construct the full file path
            try:
                file_stat = os.stat(file_path)  # Get file details/statistics
            except OSError:
                continue  # Skip broken symlinks or files we are not allowed to read

            files_info.append({  # Store file details in a dictionary
                'name': filename,  # File name
                'path': file_path,  # Full file path
                'size_bytes': file_stat.st_size,  # File size in bytes
                'last_modified': time.ctime(file_stat.st_mtime),  # Last modified time
                'permissions': oct(file_stat.st_mode)[-3:],  # File permissions in octal format (last 3 digits)
            })

    return files_info  # Return the list of file details
```
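Because os.walk descends into every subdirectory, get_files_info can return far more entries than ls -al would show for the same folder. If you only want the files directly inside one directory, which is closer to the scenario's "current working directory" wording, a sketch using os.scandir could look like this. get_top_level_files_info is a hypothetical name; it keeps the same dictionary keys and, as an added assumption, skips entries whose metadata cannot be read and sorts the results by name.

```python
import os
import time


def get_top_level_files_info(path='.'):
    """Hypothetical non-recursive variant: only files directly inside `path`."""
    files_info = []
    with os.scandir(path) as entries:  # Context-manager form needs Python 3.6+
        for entry in entries:
            if not entry.is_file():  # Skip subdirectories; broken symlinks also return False
                continue
            try:
                file_stat = entry.stat()  # Follows symlinks, like os.stat
            except OSError:
                continue  # e.g. permission denied or the file vanished mid-scan
            files_info.append({
                'name': entry.name,
                'path': entry.path,
                'size_bytes': file_stat.st_size,
                'last_modified': time.ctime(file_stat.st_mtime),
                'permissions': oct(file_stat.st_mode)[-3:],
            })
    return sorted(files_info, key=lambda f: f['name'])  # Alphabetical order, like ls


if __name__ == "__main__":
    for info in get_top_level_files_info('.'):
        print(info['permissions'], info['size_bytes'], info['name'])
```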

Now we have a function that collects the name, path, size, last modification time, and permissions of every file under the given path. The next step is to create a second function that displays this collected data the way we want it. That second function will look like this:

```python
def print_ll_view(files_info):  # Function to print file details in 'll' view format
    for file in files_info:  # Loop through the list of file dictionaries
        print(f"{file['permissions']} {file['size_bytes']} {file['last_modified']} {file['path']}")  # Print file details
```
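print_ll_view separates fields with a single space, so sizes of different widths make the columns drift. A possible alternative, sketched here under that assumption, right-aligns the size column and formats sizes in 1024-based units similar to ls -lh. The names print_aligned_view and human_size are hypothetical, and the sample records are made up; the input has the same shape as the list get_files_info returns.

```python
def human_size(num_bytes):
    """Hypothetical helper: format a byte count using 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            # Whole bytes for small files, one decimal place for larger units
            return f"{int(size)}{unit}" if unit == "B" else f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"  # Anything larger is shown in terabytes


def print_aligned_view(files_info):
    """Hypothetical variant of print_ll_view with a right-aligned size column."""
    if not files_info:
        print("(no files found)")
        return
    # Width of the widest formatted size, so every row lines up
    width = max(len(human_size(info['size_bytes'])) for info in files_info)
    for info in files_info:
        print(f"{info['permissions']} {human_size(info['size_bytes']):>{width}} "
              f"{info['last_modified']} {info['path']}")


if __name__ == "__main__":
    # Made-up records shaped like the dictionaries get_files_info builds
    sample = [
        {'permissions': '644', 'size_bytes': 512,
         'last_modified': 'Mon Jan  1 00:00:00 2024', 'path': './a.txt'},
        {'permissions': '755', 'size_bytes': 2_500_000,
         'last_modified': 'Mon Jan  1 00:00:00 2024', 'path': './run.sh'},
    ]
    print_aligned_view(sample)
```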

Lastly, we call these functions and add a bit of user interaction to make the program easier to use. This last block runs only when the file is executed directly (for example, python3 data_extraction.py). If another script imports data_extraction, the block is skipped, but get_files_info and print_ll_view can still be imported and reused. This is what it looks like:

```python
if __name__ == "__main__":  # Run the script only if executed directly
    path = input("Enter the directory path (press Enter to use current directory): ").strip() or "."  # Prompt user for path, default to current directory
    if not os.path.isdir(path):  # os.walk silently yields nothing for a bad path, so check first
        print(f"Error: '{path}' is not a directory.")
    else:
        files_data = get_files_info(path)  # Get file details for the specified path
        print("\nLinux CLI 'll' View:")  # Print header
        print_ll_view(files_data)  # Display file details
```
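Because the script prompts with input(), it is awkward to run unattended, for example from cron or over SSH on many machines. One option, sketched here as an assumption rather than what the repository does, is to read the directory from the command line with argparse and fall back to the current directory. parse_path is a hypothetical helper name.

```python
import argparse
import os


def parse_path(argv=None):
    """Hypothetical helper: read the directory from the command line instead of input()."""
    parser = argparse.ArgumentParser(description="List file details in an ls-like view.")
    parser.add_argument("path", nargs="?", default=".",
                        help="directory to scan (defaults to the current directory)")
    args = parser.parse_args(argv)  # argv=None makes argparse read sys.argv[1:]
    if not os.path.isdir(args.path):
        parser.error(f"not a directory: {args.path}")  # Prints usage and exits with status 2
    return args.path
```

With this helper, the interactive prompt could be replaced by path = parse_path() followed by print_ll_view(get_files_info(path)), so the script can be called as python3 data_extraction.py /some/directory.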

Running the Program

The program is now ready to use. Navigate to the directory where it is stored; if you cloned it from my GitHub repository, that is the LUIT-Python2 folder. From there, run it with python3 data_extraction.py on Linux or macOS, or python .\data_extraction.py on Windows.

At this point, you have a Python program that collects file details from any directory you specify, stores them in a list of dictionaries, and prints them in a Linux ls-style layout for easy reading. You can run it again against any path you need.
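Because get_files_info returns a plain list of dictionaries, the data can also be analyzed in Python instead of only printed. A hedged sketch that reports the largest files and the total size; largest_files and total_size are hypothetical helpers, and the sample records are made up for illustration.

```python
def largest_files(files_info, count=3):
    """Hypothetical helper: return the `count` biggest entries, largest first."""
    return sorted(files_info, key=lambda f: f['size_bytes'], reverse=True)[:count]


def total_size(files_info):
    """Hypothetical helper: sum of all file sizes in bytes."""
    return sum(f['size_bytes'] for f in files_info)


if __name__ == "__main__":
    # Made-up records shaped like the dictionaries get_files_info builds
    sample = [
        {'name': 'a.txt', 'path': './a.txt', 'size_bytes': 120},
        {'name': 'b.log', 'path': './logs/b.log', 'size_bytes': 4096},
        {'name': 'c.py', 'path': './c.py', 'size_bytes': 980},
    ]
    for f in largest_files(sample, count=2):
        print(f['path'], f['size_bytes'])
    print("Total bytes:", total_size(sample))  # Total bytes: 5196
```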

You can adapt this program to your company's needs by cloning it locally or forking it on GitHub.

This entire project is available on GitHub

Originally published on Medium

Find me on LinkedIn

DEV Community