Monday, April 24, 2006

Removing trailing and leading spaces in Perl

This was hidden somewhere in my forum, and I thought I would post it here:

I have something like this:

a231aaaa 321bbbbbbbb cccccccccc123

I have designed a parser with the space as the delimiter to accept 3 tokens.

The grammar is something like this:

\s*(.*)\s+(.*)\s+(.*)

The problem is that after parsing, I get spaces included in my tokens (the original string had some trailing spaces). How do I remove the spaces from the tokens?
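As an aside, the stray spaces come from `.*`, which matches spaces as readily as any other character. A minimal sketch of an alternative tokenizer, not the pattern from the question, is Perl's built-in `split` with a single-space string as the separator, which splits on runs of whitespace and skips leading whitespace. The variable names and the padded sample line are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical input: the sample line padded with spaces at both ends.
my $line = "  a231aaaa 321bbbbbbbb cccccccccc123   ";

# split ' ' splits on runs of whitespace, ignores leading whitespace,
# and drops trailing empty fields, so no token carries stray spaces.
my @tokens = split ' ', $line;

print "[$_]\n" for @tokens;
```

This only helps when the tokens themselves never contain spaces; otherwise trimming each captured token is still needed.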

Here is a small piece of code that does the trick for both ends.

Method 1:
$string =~ s/^\s+//;   # front
$string =~ s/\s+$//;   # end
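To apply Method 1 to each parsed token, the two substitutions can be wrapped in a small subroutine. This is only a sketch: `trim` is a hypothetical helper name (not a Perl built-in), and the sample tokens are made up for illustration.

```perl
use strict;
use warnings;

# trim() is a hypothetical helper, not part of core Perl.
# It works on a copy, so the caller's string is left untouched.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;   # strip leading whitespace
    $text =~ s/\s+$//;   # strip trailing whitespace
    return $text;
}

# Made-up tokens with spaces, a tab, and a newline around them.
my @raw     = ("  a231aaaa", "321bbbbbbbb  ", "\tcccccccccc123 \n");
my @trimmed = map { trim($_) } @raw;

print "[$_]\n" for @trimmed;
```

Returning a trimmed copy rather than editing in place keeps the original tokens available if they are needed later.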

Some books even have this code:

Method 2:
$string =~ s[^\s*(.*?)\s*$][$1];

Method 1 is faster and much better than Method 2, and there is a very good reason for that. Method 1 needs essentially no backtracking and executes very quickly. Method 2 can involve a great deal of backtracking: each time the lazy (.*?) grows by one character, the engine retries the trailing pattern from scratch, so in the worst case it can take a very long time indeed. As a contrived example, run this:

$string = ' a' . ' ' x 100000 . 'z ';
print "Starting first trim method\n";
$string =~ s/^ +//;   # front
$string =~ s/ +$//;   # end
print "Finished\n"; # Instantly

$string = ' a' . ' ' x 100000 . 'z ';
print "Starting second trim method\n";
$string =~ s/^ *(.*?) *$/$1/;
print "Finished\n"; # Six minutes later... zzzzzzz...
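For a measurement that does not require waiting minutes, the same comparison can be sketched with the core Benchmark module on a smaller input. The subroutine names and the size of 2000 spaces are assumptions chosen for illustration, and the actual numbers will vary with the Perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Far fewer spaces than the 100000 above, so the slow variant finishes quickly.
# The size is an arbitrary choice for this sketch.
my $input = ' a' . ' ' x 2000 . 'z ';

# Hypothetical wrapper for the two-substitution trim.
sub trim_two_step {
    my ($s) = @_;
    $s =~ s/^ +//;
    $s =~ s/ +$//;
    return $s;
}

# Hypothetical wrapper for the single-regex trim.
sub trim_one_regex {
    my ($s) = @_;
    $s =~ s/^ *(.*?) *$/$1/;
    return $s;
}

# A negative count asks Benchmark to run each variant for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    two_step  => sub { trim_two_step($input) },
    one_regex => sub { trim_one_regex($input) },
});
```

Both variants return the same trimmed string; only the time taken differs.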

I understand the natural desire to express the conceptually atomic trim operation as a single line. A single-line version of the first method is:

s/^ +//, s/ +$// for $string;

Even better, this generalises neatly for more than one string:

s/^ +//, s/ +$// for $string1, $string2, $string3;
s/^ +//, s/ +$// for @whole_file_of_lines;
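The one-liner works because the `for` statement modifier aliases `$_` to each element in turn, so the substitutions change the original variables rather than copies. A small sketch with made-up lines shows the array being trimmed in place:

```perl
use strict;
use warnings;

# Made-up sample data standing in for a file's lines.
my @lines = ("  first line  ", "second line   ", "   third line");

# $_ is an alias for each element of @lines, so both substitutions
# modify the array contents directly.
s/^ +//, s/ +$// for @lines;

print "[$_]\n" for @lines;
```

Spaces inside each line are left alone; only the leading and trailing runs are removed.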

Credit is duly given to all those who helped by posting the answer on other Perl forums. Thanks!
