Thursday, 21 November 2024

Creating an ETL Pipeline Using Perl and DBD::SQLite

Prerequisites

  1. Perl: Ensure Perl is installed on your system.
  2. CPAN Modules: Install the necessary CPAN modules:
    • Text::CSV_XS for CSV handling.
    • DBD::SQLite for SQLite database interaction.

You can install these modules using CPAN:

cpan Text::CSV_XS
cpan DBD::SQLite
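
Or, if you use cpanminus, both modules can be installed with a single command:

cpanm Text::CSV_XS DBD::SQLite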

ETL Pipeline Script

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use DBI;

# Configuration
my $input_csv  = 'input_data.csv';
my $output_db  = 'output_data.db';
my $table_name = 'transformed_data';

# Step 1: Extract - Read data from CSV
sub extract_data {
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(UTF-8)", $input_csv or die "Cannot open $input_csv: $!";

    # Discard the header row (id,name,age) so it is not transformed and loaded as data
    $csv->getline($fh);

    my @data;
    while (my $row = $csv->getline($fh)) {
        push @data, $row;
    }
    close $fh;

    return \@data;
}

# Step 2: Transform - Perform data transformation
sub transform_data {
    my ($data) = @_;
    my @transformed_data;

    foreach my $row (@$data) {
        # Example transformation: Convert all names to uppercase
        my ($id, $name, $age) = @$row;
        $name = uc($name);

        push @transformed_data, [$id, $name, $age];
    }

    return \@transformed_data;
}

# Step 3: Load - Insert transformed data into SQLite database
sub load_data {
    my ($data) = @_;

    # Connect to SQLite database
    my $dbh = DBI->connect("dbi:SQLite:dbname=$output_db", "", "", { RaiseError => 1, AutoCommit => 1 })
        or die "Could not connect to database: $DBI::errstr";

    # Create table if it doesn't exist
    $dbh->do("CREATE TABLE IF NOT EXISTS $table_name (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");

    # Prepare insert statement
    my $sth = $dbh->prepare("INSERT INTO $table_name (id, name, age) VALUES (?, ?, ?)");

    # Insert each row
    foreach my $row (@$data) {
        $sth->execute(@$row);
    }

    # Disconnect from database
    $dbh->disconnect();
}

# Main ETL Process
sub main {
    my $extracted_data = extract_data();
    my $transformed_data = transform_data($extracted_data);
    load_data($transformed_data);

    print "ETL process completed successfully!\n";
}

# Run the ETL pipeline
main();

Explanation

  1. Extract:

    • The extract_data function reads data from a CSV file using the Text::CSV_XS module.
    • It discards the header row, then stores each data row in an array of arrays; a hash-based variant is sketched after this list.
  2. Transform:

    • The transform_data function takes the extracted data and applies a row-by-row transformation.
    • In this example, it converts all names to uppercase; a variant that also validates rows is sketched after this list.
  3. Load:

    • The load_data function connects to an SQLite database through DBI, using the DBD::SQLite driver.
    • It creates the table if it doesn't exist and inserts each transformed row; a transaction-wrapped variant is sketched after this list.
  4. Main:

    • The main function orchestrates the ETL process by calling the extract, transform, and load functions in sequence.
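
For CSVs with more columns, reading each row as a hash keyed by the header makes transformations self-documenting. Here is a minimal sketch of an alternative extract step using Text::CSV_XS's column_names and getline_hr (it assumes the same $input_csv and header row as above):

# Alternative extract: rows as hashrefs keyed by column name
sub extract_data_hr {
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(UTF-8)", $input_csv or die "Cannot open $input_csv: $!";

    # Use the header row (id,name,age) as the column names
    $csv->column_names($csv->getline($fh));

    my @data;
    while (my $row = $csv->getline_hr($fh)) {
        push @data, $row;    # e.g. { id => '1', name => 'John Doe', age => '30' }
    }
    close $fh;

    return \@data;
}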
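
Real inputs are rarely clean, so the transform step is a natural place for validation. A sketch that uppercases names and also drops rows whose age is not numeric (the looks_like_number check is just one possible rule, shown as an example):

use Scalar::Util qw(looks_like_number);

# Alternative transform: uppercase names, skip rows with bad ages
sub transform_data_validated {
    my ($data) = @_;
    my @transformed;

    foreach my $row (@$data) {
        my ($id, $name, $age) = @$row;
        next unless looks_like_number($age);   # skip malformed rows
        push @transformed, [$id, uc($name), $age];
    }

    return \@transformed;
}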
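
Because the script connects with AutoCommit => 1, every execute in load_data commits individually, which becomes slow for large files. Here is a sketch of the load step wrapped in a single DBI transaction (same database, table, and insert statement; it rolls back if any row fails):

# Alternative load: all inserts in one transaction
sub load_data_tx {
    my ($data) = @_;

    my $dbh = DBI->connect("dbi:SQLite:dbname=$output_db", "", "",
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do("CREATE TABLE IF NOT EXISTS $table_name (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");
    my $sth = $dbh->prepare("INSERT INTO $table_name (id, name, age) VALUES (?, ?, ?)");

    $dbh->begin_work;                # suspend AutoCommit for the batch
    eval {
        $sth->execute(@$_) for @$data;
        $dbh->commit;                # one commit for all rows
        1;
    } or do {
        my $err = $@ || 'unknown error';
        $dbh->rollback;              # undo any partial inserts
        die "Load failed, rolled back: $err";
    };

    $dbh->disconnect;
}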

Running the Script

  1. Save the script to a file, e.g., etl_pipeline.pl.
  2. Make the script executable:
    chmod +x etl_pipeline.pl
    
  3. Run the script:
    ./etl_pipeline.pl
    

Sample Input CSV (input_data.csv)

id,name,age
1,John Doe,30
2,Jane Smith,25
3,Alice Johnson,35

Expected Output in SQLite Database

After running the script, the output_data.db SQLite database will contain a table transformed_data with the following data:

id  name           age
1   JOHN DOE       30
2   JANE SMITH     25
3   ALICE JOHNSON  35
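
You can verify the result with the sqlite3 command-line tool, if you have it installed:

sqlite3 -header -column output_data.db "SELECT * FROM transformed_data;"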

This Perl script provides a basic but complete ETL pipeline: it extracts data from a CSV file, transforms it, and loads it into an SQLite database. The structure is easy to extend for more complex scenarios, such as additional input formats, richer transformations, or the transactional bulk loading sketched above, and should serve as a useful reference for implementing ETL processes in Perl.
