Monday, 8 December 2025

Creating an ETL Pipeline Using Perl and DBD::SQLite

Prerequisites

  1. Perl: Ensure Perl is installed on your system.
  2. CPAN Modules: Install the necessary CPAN modules:
    • Text::CSV_XS for CSV handling.
    • DBD::SQLite for SQLite database interaction.

You can install these modules using CPAN:

cpan Text::CSV_XS
cpan DBD::SQLite

ETL Pipeline Script

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use DBI;

# Configuration
my $input_csv  = 'input_data.csv';
my $output_db  = 'output_data.db';
my $table_name = 'transformed_data';

# Step 1: Extract - Read data from CSV
sub extract_data {
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(UTF-8)", $input_csv or die "Cannot open $input_csv: $!";

    # Skip the header row; otherwise the column names would be loaded as data
    # and the text "id" would violate the INTEGER PRIMARY KEY column.
    $csv->getline($fh);

    my @data;
    while (my $row = $csv->getline($fh)) {
        push @data, $row;
    }
    close $fh;

    return \@data;
}

# Step 2: Transform - Perform data transformation
sub transform_data {
    my ($data) = @_;
    my @transformed_data;

    foreach my $row (@$data) {
        # Example transformation: Convert all names to uppercase
        my ($id, $name, $age) = @$row;
        $name = uc($name);

        push @transformed_data, [$id, $name, $age];
    }

    return \@transformed_data;
}

# Step 3: Load - Insert transformed data into SQLite database
sub load_data {
    my ($data) = @_;

    # Connect to SQLite database
    my $dbh = DBI->connect("dbi:SQLite:dbname=$output_db", "", "", { RaiseError => 1, AutoCommit => 1 })
        or die "Could not connect to database: $DBI::errstr";

    # Create table if it doesn't exist
    $dbh->do("CREATE TABLE IF NOT EXISTS $table_name (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");

    # Prepare insert statement
    my $sth = $dbh->prepare("INSERT INTO $table_name (id, name, age) VALUES (?, ?, ?)");

    # Insert each row inside a single transaction
    # (much faster than committing every insert individually)
    $dbh->begin_work;
    foreach my $row (@$data) {
        $sth->execute(@$row);
    }
    $dbh->commit;

    # Disconnect from database
    $dbh->disconnect();
}

# Main ETL Process
sub main {
    my $extracted_data = extract_data();
    my $transformed_data = transform_data($extracted_data);
    load_data($transformed_data);

    print "ETL process completed successfully!\n";
}

# Run the ETL pipeline
main();

Explanation

  1. Extract:
    • The extract_data function reads data from the CSV file using the Text::CSV_XS module.
    • It skips the header row and stores each remaining row in an array of arrays.
  2. Transform:
    • The transform_data function takes the extracted data and applies transformations; a sketch of how to extend it follows this list.
    • In this example, it converts all names to uppercase.
  3. Load:
    • The load_data function connects to an SQLite database via the DBD::SQLite driver.
    • It creates the table if it doesn't exist and inserts the transformed rows inside a single transaction.
  4. Main:
    • The main function orchestrates the ETL process by calling the extract, transform, and load functions in sequence.
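
Real pipelines usually chain several transformations. As a sketch of one possible extension (not part of the script above; the validation rule is an assumption for illustration), transform_data could also drop rows whose age field is not a whole number:

# Sketch only: uppercase names and skip rows with a malformed age
sub transform_data {
    my ($data) = @_;
    my @transformed;

    foreach my $row (@$data) {
        my ($id, $name, $age) = @$row;
        next unless defined $age && $age =~ /^\d+$/;   # assumed rule for illustration
        push @transformed, [ $id, uc($name), $age ];
    }

    return \@transformed;
}

Because the function keeps its arrayref-in, arrayref-out contract, the rest of the pipeline needs no changes.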

Running the Script

  1. Save the script to a file, e.g., etl_pipeline.pl.
  2. Make the script executable:
    chmod +x etl_pipeline.pl
    
  3. Run the script:
    ./etl_pipeline.pl
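
A successful run simply prints the script's completion message:

$ ./etl_pipeline.pl
ETL process completed successfully!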
    

Sample Input CSV (input_data.csv)

id,name,age
1,John Doe,30
2,Jane Smith,25
3,Alice Johnson,35

Expected Output in SQLite Database

After running the script, the output_data.db SQLite database will contain a table transformed_data with the following data:

id | name          | age
---|---------------|----
1  | JOHN DOE      | 30
2  | JANE SMITH    | 25
3  | ALICE JOHNSON | 35
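
You can confirm the rows landed correctly by reading them back with DBI. A minimal verification sketch, run as a separate one-off script and assuming the database name and table used above:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Read the loaded rows back out of output_data.db and print them
my $dbh  = DBI->connect("dbi:SQLite:dbname=output_data.db", "", "", { RaiseError => 1 });
my $rows = $dbh->selectall_arrayref("SELECT id, name, age FROM transformed_data ORDER BY id");
print join(" ", @$_), "\n" for @$rows;
$dbh->disconnect();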

This Perl script provides a basic yet complete ETL pipeline: it extracts data from a CSV file, transforms it, and loads it into an SQLite database. The structure is easy to extend to more complex scenarios, making it a useful starting point for developers implementing ETL processes in Perl.


Saturday, 6 December 2025

Safest Ways to Iterate Through Perl Hash Keys

When working with Perl hashes, choosing the right way to iterate through keys is essential for avoiding unexpected behavior and ensuring efficient memory usage. Here are the most common methods, their advantages, and the potential pitfalls associated with each.

Iterating with each

The each function retrieves one key-value pair at a time. This is memory-efficient and works well for large hashes or tied hashes.
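
A minimal sketch, assuming a hypothetical %config hash:

my %config = ( host => 'localhost', port => 8080, debug => 1 );

# each returns one ($key, $value) pair per call until the hash is exhausted
while ( my ($key, $value) = each %config ) {
    print "$key => $value\n";
}

Note that each keeps an internal iterator on the hash, so avoid adding or deleting keys mid-loop (deleting the key just returned by each is the one safe exception).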



Friday, 5 December 2025

How to Measure Program Execution Time in the Linux Shell

When running commands or scripts in the Linux shell, it’s often useful to know how long they take to execute, especially when optimizing or testing under different conditions. Here are several ways to measure execution time in Bash, from basic to more advanced methods.

1. Using the time Command

The simplest way to measure execution time is the shell's time keyword, which reports the real, user, and system time consumed by a command.

time sleep 2


Tuesday, 2 December 2025

Python Image Cropping: The Ultimate Guide for Beginners & Pros

Image cropping is a fundamental part of image processing and computer vision. Whether you’re building a photo editing app, preparing datasets for machine learning, or automating document processing, the ability to programmatically crop images is invaluable. With Python, cropping images is easier than ever, thanks to libraries like OpenCV and Pillow (PIL).

In this comprehensive blog post, you’ll learn everything about image cropping with Python—from simple manual crops to automatic cropping using edge detection. We’ll cover real-world use cases, multiple code examples, advanced tips, and troubleshooting common pitfalls.

Table of Contents

  1. Why Crop Images? Common Use Cases
  2. Popular Python Libraries for Image Cropping

    • OpenCV
    • Pillow (PIL)
    • scikit-image
  3. Basic Cropping with Pillow (PIL)
  4. Cropping with OpenCV (cv2)
  5. Automatic Cropping: Detect and Crop Objects
  6. Advanced Cropping: Smart and Dynamic Techniques
  7. Batch Cropping Images in Folders
  8. Tips, Troubleshooting & Common Pitfalls
  9. Conclusion & Further Resources

Monday, 1 December 2025

Essential Helm Commands for Kubernetes Applications

Helm, the package manager for Kubernetes, simplifies application deployment and management. By using Helm charts, you can automate deployments, manage configuration, and streamline upgrades. Here are some of the most important Helm commands with simple explanations to help you manage Kubernetes applications efficiently.

1. helm repo add

Command:

helm repo add stable https://charts.helm.sh/stable

Adds a Helm repository to your system. This is where Helm looks for charts when installing applications.



Sunday, 30 November 2025

How to Specify Multiple Return Types Using Type Hints in Python


In Python, type hints improve code readability and help developers understand what kind of values are expected for function arguments and returns. But sometimes, a function can return multiple types, which raises the question: how can we specify these multiple return types?

In this blog post, I’ll walk you through different ways to handle multiple return types using Python type hints, covering various versions of Python and how to use unions, tuples, and type-checking libraries.



Saturday, 29 November 2025

Mastering Variable Names with Perl’s Data::Dumper

Debugging in Perl often involves delving into complex data structures, so readable output matters. Data::Dumper's default habit of generating generic variable names like $VAR1 and $VAR2 is unhelpful for intricate debugging sessions, or when you want output that can be pasted back into code as-is. This post explores several approaches to customizing Data::Dumper output, each illustrated with its own code example.
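
As a taste of what the post covers, Data::Dumper->Dump accepts a second list of names to use in place of $VAR1; a minimal sketch with a hypothetical %config hash:

use Data::Dumper;

my %config = ( host => 'localhost', port => 8080 );

# The leading '*' makes Dump emit "%config = (...)" rather than "$VAR1 = {...}"
print Data::Dumper->Dump( [ \%config ], ['*config'] );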

