Prerequisites
- Perl: Ensure Perl is installed on your system.
- CPAN Modules: Install the necessary CPAN modules:
  - Text::CSV_XS for CSV handling.
  - DBD::SQLite for SQLite database interaction.

You can install these modules using CPAN:

```shell
cpan Text::CSV_XS
cpan DBD::SQLite
```
ETL Pipeline Script
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use DBI;

my $input_csv  = 'input_data.csv';
my $output_db  = 'output_data.db';
my $table_name = 'transformed_data';

# Extract: read the data rows from the CSV file.
sub extract_data {
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(utf8)", $input_csv or die "Cannot open $input_csv: $!";
    my $header = $csv->getline($fh);    # skip the header row
    my @data;
    while (my $row = $csv->getline($fh)) {
        push @data, $row;
    }
    close $fh;
    return \@data;
}

# Transform: uppercase each name.
sub transform_data {
    my ($data) = @_;
    my @transformed_data;
    foreach my $row (@$data) {
        my ($id, $name, $age) = @$row;
        $name = uc($name);
        push @transformed_data, [$id, $name, $age];
    }
    return \@transformed_data;
}

# Load: write the transformed rows into an SQLite table.
sub load_data {
    my ($data) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$output_db", "", "",
                           { RaiseError => 1, AutoCommit => 1 })
        or die "Could not connect to database: $DBI::errstr";
    $dbh->do("CREATE TABLE IF NOT EXISTS $table_name (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");
    my $sth = $dbh->prepare("INSERT INTO $table_name (id, name, age) VALUES (?, ?, ?)");
    foreach my $row (@$data) {
        $sth->execute(@$row);
    }
    $dbh->disconnect();
}

sub main {
    my $extracted_data   = extract_data();
    my $transformed_data = transform_data($extracted_data);
    load_data($transformed_data);
    print "ETL process completed successfully!\n";
}

main();
```
Explanation
- Extract: The extract_data function reads data from the CSV file using the Text::CSV_XS module, storing each row as an array reference inside a single array.
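As a variation on the extract step, Text::CSV_XS can also return each row as a hash reference keyed by the header line, which keeps the header out of the data and makes downstream code more readable. A minimal, self-contained sketch (the file name demo_input.csv and the function name extract_data_hr are just for this illustration):

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Variant of extract_data that keys each row by the CSV header line.
sub extract_data_hr {
    my ($file) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(utf8)", $file or die "Cannot open $file: $!";
    $csv->column_names($csv->getline($fh));    # consume the header row
    my @rows;
    while (my $row = $csv->getline_hr($fh)) {  # { id => ..., name => ..., age => ... }
        push @rows, $row;
    }
    close $fh;
    return \@rows;
}

# Quick demonstration with a throwaway CSV file.
open my $out, '>', 'demo_input.csv' or die $!;
print $out "id,name,age\n1,John Doe,30\n";
close $out;
my $rows = extract_data_hr('demo_input.csv');
print "first name: $rows->[0]{name}\n";   # prints "first name: John Doe"
unlink 'demo_input.csv';
```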
- Transform: The transform_data function takes the extracted rows and applies the transformations; in this example, it converts every name to uppercase.
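The transform step is the natural place for cleanup rules beyond uppercasing. As a sketch of one possible extension (the function transform_data_strict is hypothetical, not part of the script above), a stricter variant might trim whitespace and drop rows whose age is not numeric:

```perl
use strict;
use warnings;

# Hypothetical stricter variant of transform_data: trims whitespace,
# uppercases the name, and skips rows with a non-numeric age.
sub transform_data_strict {
    my ($data) = @_;
    my @out;
    for my $row (@$data) {
        my ($id, $name, $age) = @$row;
        next unless defined $age && $age =~ /^\d+$/;   # skip malformed rows
        $name =~ s/^\s+|\s+$//g;                       # trim surrounding whitespace
        push @out, [ $id, uc($name), $age ];
    }
    return \@out;
}

my $rows = transform_data_strict([
    [1, '  John Doe ', 30],
    [2, 'Jane Smith', 'n/a'],   # malformed age, dropped
]);
print scalar(@$rows), " row(s) kept; first name: $rows->[0][1]\n";
# prints "1 row(s) kept; first name: JOHN DOE"
```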
- Load: The load_data function connects to an SQLite database through the DBD::SQLite driver, creates the table if it doesn't exist, and inserts the transformed rows.
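For larger files, the load step benefits from wrapping all inserts in a single transaction rather than letting SQLite commit each INSERT individually. A sketch of that variant using DBI's transaction calls (the function and parameter names are illustrative, not part of the original script):

```perl
use strict;
use warnings;
use DBI;

# Variant of load_data that batches all inserts into one transaction
# and rolls back the whole batch if any insert fails.
sub load_data_tx {
    my ($data, $db_file, $table) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", "", "",
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do("CREATE TABLE IF NOT EXISTS $table (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");
    my $sth = $dbh->prepare("INSERT INTO $table (id, name, age) VALUES (?, ?, ?)");
    $dbh->begin_work;                # suspend AutoCommit for the batch
    eval {
        $sth->execute(@$_) for @$data;
        $dbh->commit;
    };
    if ($@) {
        $dbh->rollback;              # undo the partial batch on any error
        die "Load failed: $@";
    }
    $dbh->disconnect;
}
```

One transaction per batch avoids a disk sync per row, which is usually the dominant cost when loading many rows into SQLite.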
- Main: The main function orchestrates the ETL process by calling the extract, transform, and load functions in sequence.
Running the Script
- Save the script to a file, e.g., etl_pipeline.pl.
- Make it executable: chmod +x etl_pipeline.pl
- Run it: ./etl_pipeline.pl (or run it through the interpreter with perl etl_pipeline.pl)
Sample Input CSV (input_data.csv)

```
id,name,age
1,John Doe,30
2,Jane Smith,25
3,Alice Johnson,35
```
Expected Output in SQLite Database
After running the script, the output_data.db SQLite database will contain a table transformed_data with the following data:

| id | name          | age |
|----|---------------|-----|
| 1  | JOHN DOE      | 30  |
| 2  | JANE SMITH    | 25  |
| 3  | ALICE JOHNSON | 35  |
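To sanity-check the result without opening the sqlite3 shell, the expected table contents can be reproduced and queried from Perl. This self-contained snippet rebuilds the same schema in an in-memory database (so it doesn't depend on output_data.db existing) and prints the rows back:

```perl
use strict;
use warnings;
use DBI;

# Recreate the expected transformed_data table in memory and query it.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "", { RaiseError => 1 });
$dbh->do("CREATE TABLE transformed_data (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)");
my $sth = $dbh->prepare("INSERT INTO transformed_data (id, name, age) VALUES (?, ?, ?)");
$sth->execute(@$_) for ([1, 'JOHN DOE', 30], [2, 'JANE SMITH', 25], [3, 'ALICE JOHNSON', 35]);

my $rows = $dbh->selectall_arrayref("SELECT id, name, age FROM transformed_data ORDER BY id");
printf "%-3s %-15s %s\n", @$_ for @$rows;
$dbh->disconnect;
```

Pointing the same SELECT at output_data.db instead of :memory: verifies the real pipeline output.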
This Perl script provides a basic yet complete ETL pipeline that can be easily extended or modified to suit more complex scenarios. It demonstrates how to extract data from a CSV file, transform it, and load it into a SQLite database. This example should serve as a useful reference for developers looking to implement ETL processes in Perl.