Thursday, 21 November 2024

Creating an ETL Pipeline Using Perl and DBD::SQLite

Prerequisites

  1. Perl: Ensure Perl is installed on your system.
  2. CPAN Modules: Install the necessary CPAN modules:
    • Text::CSV_XS for CSV handling.
    • DBD::SQLite for SQLite database interaction.

You can install these modules using CPAN:

cpan Text::CSV_XS
cpan DBD::SQLite
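
Or, if you use cpanminus, both modules can be installed with a single command:

cpanm Text::CSV_XS DBD::SQLite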

ETL Pipeline Script

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use DBI;

# Configuration
my $input_csv  = 'input_data.csv';
my $output_db  = 'output_data.db';
my $table_name = 'transformed_data';

# Step 1: Extract - Read data from CSV
sub extract_data {
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(UTF-8)", $input_csv or die "Cannot open $input_csv: $!";

    # Discard the header row (id,name,age) so it is not transformed and loaded as data
    $csv->getline($fh);

    my @data;
    while (my $row = $csv->getline($fh)) {
        push @data, $row;
    }
    close $fh;

    return \@data;
}

# Step 2: Transform - Perform data transformation
sub transform_data {
    my ($data) = @_;
    my @transformed_data;

    foreach my $row (@$data) {
        # Example transformation: Convert all names to uppercase
        my ($id, $name, $age) = @$row;
        $name = uc($name);

        push @transformed_data, [$id, $name, $age];
    }

    return \@transformed_data;
}

# Step 3: Load - Insert transformed data into SQLite database
sub load_data {
    my ($data) = @_;

    # Connect to SQLite database
    my $dbh = DBI->connect("dbi:SQLite:dbname=$output_db", "", "", { RaiseError => 1, AutoCommit => 1 })
        or die "Could not connect to database: $DBI::errstr";

    # Create table if it doesn't exist
    $dbh->do("CREATE TABLE IF NOT EXISTS $table_name (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");

    # Prepare insert statement
    my $sth = $dbh->prepare("INSERT INTO $table_name (id, name, age) VALUES (?, ?, ?)");

    # Insert each row
    foreach my $row (@$data) {
        $sth->execute(@$row);
    }

    # Disconnect from database
    $dbh->disconnect();
}

# Main ETL Process
sub main {
    my $extracted_data = extract_data();
    my $transformed_data = transform_data($extracted_data);
    load_data($transformed_data);

    print "ETL process completed successfully!\n";
}

# Run the ETL pipeline
main();

Explanation

  1. Extract:

    • The extract_data function reads data from a CSV file using the Text::CSV_XS module.
    • It discards the header row, then stores each data row in an array of arrays; a hash-based variant is sketched after this list.
  2. Transform:

    • The transform_data function takes the extracted data and applies a row-by-row transformation.
    • In this example, it converts all names to uppercase; a variant that also validates rows is sketched after this list.
  3. Load:

    • The load_data function connects to an SQLite database through DBI, using the DBD::SQLite driver.
    • It creates the table if it doesn't exist and inserts each transformed row; a transaction-wrapped variant is sketched after this list.
  4. Main:

    • The main function orchestrates the ETL process by calling the extract, transform, and load functions in sequence.
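
For CSVs with more columns, reading each row as a hash keyed by the header makes transformations self-documenting. Here is a minimal sketch of an alternative extract step using Text::CSV_XS's column_names and getline_hr (it assumes the same $input_csv and header row as above):

# Alternative extract: rows as hashrefs keyed by column name
sub extract_data_hr {
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(UTF-8)", $input_csv or die "Cannot open $input_csv: $!";

    # Use the header row (id,name,age) as the column names
    $csv->column_names($csv->getline($fh));

    my @data;
    while (my $row = $csv->getline_hr($fh)) {
        push @data, $row;    # e.g. { id => '1', name => 'John Doe', age => '30' }
    }
    close $fh;

    return \@data;
}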
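
Real inputs are rarely clean, so the transform step is a natural place for validation. A sketch that uppercases names and also drops rows whose age is not numeric (the looks_like_number check is just one possible rule, shown as an example):

use Scalar::Util qw(looks_like_number);

# Alternative transform: uppercase names, skip rows with bad ages
sub transform_data_validated {
    my ($data) = @_;
    my @transformed;

    foreach my $row (@$data) {
        my ($id, $name, $age) = @$row;
        next unless looks_like_number($age);   # skip malformed rows
        push @transformed, [$id, uc($name), $age];
    }

    return \@transformed;
}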
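
Because the script connects with AutoCommit => 1, every execute in load_data commits individually, which becomes slow for large files. Here is a sketch of the load step wrapped in a single DBI transaction (same database, table, and insert statement; it rolls back if any row fails):

# Alternative load: all inserts in one transaction
sub load_data_tx {
    my ($data) = @_;

    my $dbh = DBI->connect("dbi:SQLite:dbname=$output_db", "", "",
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do("CREATE TABLE IF NOT EXISTS $table_name (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");
    my $sth = $dbh->prepare("INSERT INTO $table_name (id, name, age) VALUES (?, ?, ?)");

    $dbh->begin_work;                # suspend AutoCommit for the batch
    eval {
        $sth->execute(@$_) for @$data;
        $dbh->commit;                # one commit for all rows
        1;
    } or do {
        my $err = $@ || 'unknown error';
        $dbh->rollback;              # undo any partial inserts
        die "Load failed, rolled back: $err";
    };

    $dbh->disconnect;
}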

Running the Script

  1. Save the script to a file, e.g., etl_pipeline.pl.
  2. Make the script executable:
    chmod +x etl_pipeline.pl
    
  3. Run the script:
    ./etl_pipeline.pl
    

Sample Input CSV (input_data.csv)

id,name,age
1,John Doe,30
2,Jane Smith,25
3,Alice Johnson,35

Expected Output in SQLite Database

After running the script, the output_data.db SQLite database will contain a table transformed_data with the following data:

id  name           age
1   JOHN DOE       30
2   JANE SMITH     25
3   ALICE JOHNSON  35
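
You can verify the result with the sqlite3 command-line tool, if you have it installed:

sqlite3 -header -column output_data.db "SELECT * FROM transformed_data;"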

This Perl script provides a basic but complete ETL pipeline: it extracts data from a CSV file, transforms it, and loads it into an SQLite database. The structure is easy to extend for more complex scenarios, such as additional input formats, richer transformations, or the transactional bulk loading sketched above, and should serve as a useful reference for implementing ETL processes in Perl.
