Creating an ETL Pipeline Using Perl and DBD::SQLite
Prerequisites
- Perl: Ensure Perl is installed on your system.
- CPAN Modules: Install the necessary CPAN modules:
  - Text::CSV_XS for CSV handling.
  - DBD::SQLite for SQLite database interaction.
You can install these modules using CPAN:
cpan Text::CSV_XS
cpan DBD::SQLite
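Alternatively, if you have App::cpanminus installed, both modules can be installed in one step:
cpanm Text::CSV_XS DBD::SQLite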
ETL Pipeline Script
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use DBI;

# Configuration
my $input_csv  = 'input_data.csv';
my $output_db  = 'output_data.db';
my $table_name = 'transformed_data';

# Step 1: Extract - Read data from CSV
sub extract_data {
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(utf8)", $input_csv
        or die "Cannot open $input_csv: $!";

    # Skip the header row so column names are not loaded as data
    $csv->getline($fh);

    my @data;
    while (my $row = $csv->getline($fh)) {
        push @data, $row;
    }
    close $fh;
    return \@data;
}

# Step 2: Transform - Perform data transformation
sub transform_data {
    my ($data) = @_;
    my @transformed_data;
    foreach my $row (@$data) {
        # Example transformation: convert all names to uppercase
        my ($id, $name, $age) = @$row;
        $name = uc($name);
        push @transformed_data, [$id, $name, $age];
    }
    return \@transformed_data;
}

# Step 3: Load - Insert transformed data into SQLite database
sub load_data {
    my ($data) = @_;

    # Connect to the SQLite database (the file is created if it does not exist)
    my $dbh = DBI->connect("dbi:SQLite:dbname=$output_db", "", "",
        { RaiseError => 1, AutoCommit => 1 })
        or die "Could not connect to database: $DBI::errstr";

    # Create the table if it doesn't exist
    $dbh->do("CREATE TABLE IF NOT EXISTS $table_name (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");

    # Prepare the insert statement once and reuse it for every row
    my $sth = $dbh->prepare("INSERT INTO $table_name (id, name, age) VALUES (?, ?, ?)");

    # Insert all rows inside a single transaction for speed
    $dbh->begin_work;
    foreach my $row (@$data) {
        $sth->execute(@$row);
    }
    $dbh->commit;

    # Disconnect from the database
    $dbh->disconnect();
}

# Main ETL Process
sub main {
    my $extracted_data   = extract_data();
    my $transformed_data = transform_data($extracted_data);
    load_data($transformed_data);
    print "ETL process completed successfully!\n";
}

# Run the ETL pipeline
main();
Explanation
- Extract: The extract_data function reads data from the CSV file using the Text::CSV_XS module. It skips the header row, then stores each remaining row in an array of array references.
- Transform: The transform_data function takes the extracted data and applies the transformations. In this example, it converts all names to uppercase; a slightly extended variant with basic validation is sketched after this list.
- Load: The load_data function connects to an SQLite database through the DBD::SQLite driver. It creates the table if it doesn't exist and inserts the transformed rows using a prepared statement inside a single transaction.
- Main: The main function orchestrates the ETL process by calling the extract, transform, and load functions in sequence.
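The transform step is also the natural place for data cleaning. Below is a minimal sketch of a hypothetical transform_data_validated variant (not part of the script above, and assuming the id,name,age column layout) that trims whitespace and drops rows with a non-numeric age:
# Hypothetical variant of transform_data with basic validation
sub transform_data_validated {
    my ($data) = @_;
    my @clean;
    foreach my $row (@$data) {
        my ($id, $name, $age) = @$row;
        next unless defined $name && defined $age;
        $name =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
        next unless $age =~ /^\d+$/;    # drop rows with a non-numeric age
        push @clean, [$id, uc($name), $age];
    }
    return \@clean;
}
Swapping this in for transform_data in main() would keep the rest of the pipeline unchanged, since both return the same array-of-arrays structure.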
Running the Script
- Save the script to a file, e.g., etl_pipeline.pl.
- Make the script executable:
chmod +x etl_pipeline.pl
- Run the script:
./etl_pipeline.pl
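On success, the script prints the confirmation message from main:
$ ./etl_pipeline.pl
ETL process completed successfully!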
Sample Input CSV (input_data.csv)
id,name,age
1,John Doe,30
2,Jane Smith,25
3,Alice Johnson,35
Expected Output in SQLite Database
After running the script, the output_data.db SQLite database will contain a table transformed_data with the following data:

| id | name          | age |
|----|---------------|-----|
| 1  | JOHN DOE      | 30  |
| 2  | JANE SMITH    | 25  |
| 3  | ALICE JOHNSON | 35  |
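You can verify the loaded data directly with the sqlite3 command-line shell, if it is installed:
sqlite3 output_data.db "SELECT * FROM transformed_data;"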
This Perl script provides a basic yet complete ETL pipeline that can be easily extended or modified to suit more complex scenarios. It demonstrates how to extract data from a CSV file, transform it, and load it into a SQLite database. This example should serve as a useful reference for developers looking to implement ETL processes in Perl.