Friday 5 August 2022

Given the input string 'hello boss!,where are you?,"how, are you!",hi,"di chellam","welcome"', our objective is to write a Perl program that splits this CSV-like string into its individual fields. The difficulty lies in correctly handling quoted fields that themselves contain commas, such as "how, are you!".

CODE:

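use strict;
use warnings;
my $input_string = 'hello boss!,where are you?,"how, are you!",hi,"di chellam","welcome"';

# Initialize an empty array to store the extracted fields
my @words = ();

# Use a while loop with a global (/g) regex match: each iteration extracts one field
while ($input_string =~ /((?:(?<=,)|^)\s*"[^"]*"\s*(?=,|$)|[^,]*),?/g) {
    my $word = $1;
    # The pattern also matches an empty string at the very end of the input;
    # skip that spurious match unless the input really ends with a comma
    # (i.e. with a genuinely empty last field).
    last if $word eq '' && pos($input_string) == length($input_string)
        && $input_string !~ /,\z/;
    # Remove leading and trailing spaces
    $word =~ s/^\s+|\s+$//g;
    # Check if the field is quoted
    if ($word =~ /^\"(.*)\"$/) {
        # Extract the text without the quotes
        my $quoted_word = $1;
        # Add the field to the array
        push @words, $quoted_word;
    } else {
        # Add the field to the array unchanged
        push @words, $word;
    }
}

# Print the extracted fields, one per line
print join("\n", @words), "\n";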


The Perl code above extracts the fields of an input string formatted as a comma-separated list. It applies a single regular expression repeatedly in a while loop, then trims each field and strips any surrounding double quotes.

Let's break down the code step by step to understand how it works:

use strict;
use warnings;

These lines enable the strict and warnings pragmas. strict forbids undeclared variables, symbolic references and unquoted barewords, while warnings reports suspicious constructs such as the use of undefined values. Together they catch many mistakes early.
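To see what strict actually enforces, the sketch below compiles a small snippet that assigns to an undeclared variable. The string eval is only there so that the compile-time error is captured in $@ instead of stopping the script; the variable name $undeclared is arbitrary.

```perl
use strict;
use warnings;

# Compile a small snippet under 'use strict' with a string eval, so that the
# compile-time error is captured in $@ instead of stopping this script.
my $compiled = eval q{
    use strict;
    $undeclared = 1;    # no 'my' declaration, so strict rejects this
    1;
};

if ($compiled) {
    print "snippet compiled\n";
} else {
    print "strict rejected the snippet: $@";
}
```

The exact wording of the error varies between Perl versions, but it starts with "Global symbol" and names the undeclared variable.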


my $input_string = 'hello boss!,where are you?,"how, are you!",hi,"di chellam","welcome"';

Here, a variable named $input_string is declared and assigned the input string. It contains phrases separated by commas. Some of the phrases are enclosed in double quotes, and one of them ("how, are you!") contains a comma of its own.

my @words = ();

An empty array named @words is initialized. This array will be used to store the extracted words.


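while ($input_string =~ /((?:(?<=,)|^)\s*"[^"]*"\s*(?=,|$)|[^,]*),?/g) {
    my $word = $1;
    # skip the spurious empty match at the very end of the input
    last if $word eq '' && pos($input_string) == length($input_string)
        && $input_string !~ /,\z/;
    $word =~ s/^\s+|\s+$//g;
    if ($word =~ /^\"(.*)\"$/) {
        my $quoted_word = $1;
        push @words, $quoted_word;
    } else {
        push @words, $word;
    }
}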

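This while loop is the heart of the code. Because the match uses the /g modifier in scalar context, each iteration resumes from the position where the previous match ended (available through pos($input_string)), so every pass through the loop consumes one field and its trailing comma. Let's break down the regular expression: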

((?:(?<=,)|^)\s*"[^"]*"\s*(?=,|$)|[^,]*),?

(?:(?<=,)|^) - This part ensures that a quoted field either follows a comma (the lookbehind (?<=,)) or starts at the beginning of the string (^).

\s*"[^"]*"\s* - This part matches a field enclosed in double quotes, allowing for spaces before and after the quotes. The [^"]* matches any run of characters other than a double quote, so commas inside the quotes are matched as part of the field. Doubled quotes ("") inside a quoted field are not handled.

(?=,|$) - This lookahead ensures that the quoted field is followed by either a comma or the end of the string, without consuming it.

[^,]* - If the quoted alternative fails, this part matches any run of non-comma characters, i.e. an unquoted field. It can also match an empty string, which is how empty fields arise.

,? - This part consumes the comma that separates the field from the next one, if there is one.
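As a reading aid, the same pattern can be written with the /x modifier, which ignores literal whitespace in the pattern and allows a comment on each piece. The sketch below collects the raw captures (before trimming and quote removal) and prints each one in brackets; the names $field_re and @raw are illustrative. The brackets make visible that the pattern also produces one extra empty match at the very end of the input, so the last line printed is [].

```perl
use strict;
use warnings;

my $input_string = 'hello boss!,where are you?,"how, are you!",hi,"di chellam","welcome"';

# The same pattern as in the loop, rewritten with /x so each piece can carry
# a comment. Literal spaces are ignored; the \s* tokens still match spaces.
my $field_re = qr{
    (                       # capture group 1: one field
        (?: (?<=,) | ^ )    # start of string, or just after a comma
        \s* "[^"]*" \s*     # a double-quoted field, optional surrounding spaces
        (?= , | $)          # must be followed by a comma or end of string
      |                     # ...otherwise...
        [^,]*               # any run of non-comma characters (may be empty)
    )
    ,?                      # consume the separating comma, if any
}x;

my @raw;
while ($input_string =~ /$field_re/g) {
    push @raw, $1;          # raw capture, quotes and spaces still included
}
print "[$_]\n" for @raw;
```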

Capture group 1 holds the matched field, which is copied into the variable $word. The substitution s/^\s+|\s+$//g then removes any leading and trailing whitespace; the /g flag is needed so that both alternatives (leading and trailing) are applied.

Next, the code checks whether the field is enclosed in double quotes using the regular expression ^\"(.*)\"$. If it is, the text between the quotes is captured and stored in the variable $quoted_word. Otherwise, $word is used unchanged.

Finally, whichever value applies ($quoted_word or $word) is appended to the @words array with the push function.
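The two clean-up steps described above can be tried in isolation. The sketch below wraps them in clean_field, a hypothetical helper that is not part of the original program: it trims the field first and then strips one pair of surrounding double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original program) that applies the
# same two clean-up steps as the loop body.
sub clean_field {
    my ($field) = @_;
    $field =~ s/^\s+|\s+$//g;       # trim leading and trailing whitespace
    if ($field =~ /^"(.*)"$/) {     # quoted? keep only the inside
        $field = $1;
    }
    return $field;
}

my @samples = ('  hi  ', ' "how, are you!" ', '"welcome"', 'plain');
print "[", clean_field($_), "]\n" for @samples;
```

Trimming has to come first: with the original spaces still present, ' "how, are you!" ' would not match ^"(.*)"$ and the quotes would survive.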

print join("\n", @words), "\n";

This line prints the extracted fields stored in the @words array. The join function concatenates them with a newline character (\n) as the separator, and print writes the resulting string followed by one final newline.
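For comparison, Perl's core Text::ParseWords module provides parse_line, which performs this kind of quote-aware splitting. The sketch below is an alternative to the hand-written regex, not the approach used in this post. Note that parse_line also treats single quotes and backslashes specially, so input containing apostrophes (for example don't) may not split as intended.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module shipped with Perl

my $input_string = 'hello boss!,where are you?,"how, are you!",hi,"di chellam","welcome"';

# parse_line(DELIMITER, KEEP, LINE): DELIMITER is a regex, and a false KEEP
# value removes the quotes (and backslash escapes) from the returned fields.
my @fields = parse_line(',', 0, $input_string);

print join("\n", @fields), "\n";
```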
