Arrays vs. Lists in Perl: What's the Difference?

One of the most common sources of confusion for new Perl programmers is the difference between arrays and lists. Although they sometimes look similar, they are very different things, and many bugs and misunderstandings come from not fully understanding the differences. Even experienced Perl programmers sometimes treat arrays and lists as if they were the same thing.

Here's a table which outlines some of the differences. Each of these differences is discussed in detail below.

Arrays                                                         | Lists
---------------------------------------------------------------+---------------------------------------------------------------------
...are variables.                                              | ...are ephemeral.
...can be changed.                                             | ...are immutable.
...can be named or anonymous.                                  | ...never have names.
...use the sigil @.                                            | ...do not have sigils.
...can be referenced.                                          | ...distribute references.
...cannot be passed to or returned from subroutines.           | ...are the only thing passed to and returned from subroutines.
...can be multidimensional.                                    | ...are always one-dimensional.
...have known behavior in scalar context.                      | ...do not exist in scalar context.

Arrays are variables. Lists are...not

Perl has three basic variable types: scalars, arrays, and hashes. Scalars are single values: numbers, strings, and so on. They also include a type of pointer called a reference. All the data that you work with in Perl are scalar in nature; arrays and hashes are collections of scalar values. Arrays are ordered sequences of scalars, and hashes are unordered baskets of scalars, each of which is associated with a single key. Hash keys are always strings.

Most of the confusion about lists and arrays comes from the fact that arrays are often initialized using lists.

my @foo = ( 4, 5, 6 );

In the example above, the thing on the right-hand side of the equals sign is a list. We assign that list to the variable @foo to initialize it. That variable is an array; we know that because it begins with the @ sigil character. So a list can be assigned to an array, but arrays and lists are not the same thing. We could construct the exact same array this way:

my @foo;
$foo[0] = 4;
$foo[1] = 5;
$foo[2] = 6;

Or even like this:

my @foo;
push @foo, $_ for 4..6;

Lists can also be assigned to hashes. You've probably seen hash initialization code like this:

my %hash = ( foo => 42, bar => 43, baz => 44 );

This doesn't look exactly like the first list example, due to the presence of the => operator. That operator is known as the "fat comma" and is actually equivalent to a comma that quotes a bareword on its left side. So the above example is exactly equivalent to:

my %hash = ( 'foo', 42, 'bar', 43, 'baz', 44 );

Both arrays and hashes can be initialized with lists, and both can return lists when used in certain ways (see below). But neither arrays nor hashes are lists; they are distinct variable types. A list can never be manipulated the way a variable can; lists are ephemeral, and they do not persist beyond the expression in which they are used.
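As a quick check of the fat comma equivalence described above, here is a minimal sketch. The variable names (%with_fat_comma, %with_commas, @flat) are made up for illustration. It also shows that the very same list can initialize an array instead of a hash.

```perl
use strict;
use warnings;

# Two hashes built from lists that differ only in their use of the fat comma.
my %with_fat_comma = ( foo => 42, bar => 43, baz => 44 );
my %with_commas    = ( 'foo', 42, 'bar', 43, 'baz', 44 );

# Compare the two hashes key by key.
my $same = keys(%with_fat_comma) == keys(%with_commas) ? 1 : 0;
for my $key ( sort keys %with_fat_comma ) {
    $same = 0
        unless exists( $with_commas{$key} )
        && $with_commas{$key} == $with_fat_comma{$key};
}
print $same ? "identical\n" : "different\n";

# The same list assigned to an array simply yields six elements.
my @flat = ( foo => 42, bar => 43, baz => 44 );
print scalar(@flat), "\n";
```

The list itself has no idea whether it is destined for a hash or an array; the variable on the left-hand side decides how the values are stored.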

Arrays can be changed

The value of a variable can vary over the course of your program. Here's an example of an array which is declared, populated, then emptied, then populated again:

my @array = ( 1, 2, 3 );
push @array, $_ for 4..8;
printf "value = %d\n", shift @array while @array;
@array = ( 'Larry', 'Curly', 'Moe' );
print "My stooges are: " . join( ", ", @array ) . "\n";

Try running this code to see what happens. In this example we declare an array called @array, and modify it in a few different ways. First, we assign a simple three-element list to the array when we declare it. Then, we use the push operator to append the numbers 4 through 8 to the array. push actually takes a list as its second argument; in this case we are pushing a list of one item five times. We could have also written one of the following equivalent operations:

push @array, 4..8;
push @array, 4, 5, 6, 7, 8;

The next step is to use shift to destructively loop through the array. shift removes the first element of an array and returns it. The while condition keeps executing the statement as long as @array is true; an array used as a condition is true while it still has elements and becomes false once it is empty.

Then we repopulate the array with completely new values, strings this time, and use it to print out a message.
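The destructive shift loop can also be written with an explicit while block, which may make the condition easier to see. This is only a sketch; @queue and @drained are names invented for the example.

```perl
use strict;
use warnings;

my @queue = ( 1 .. 3 );
my @drained;

# As a loop condition, @queue is evaluated in scalar (boolean) context,
# where it yields its element count. Zero is false, so the loop ends
# once every element has been shifted off.
while (@queue) {
    push @drained, shift @queue;
}

print "drained: @drained\n";
print "left: ", scalar(@queue), "\n";
```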

What happens if we try to use the same kinds of operations on lists?

perl -e 'push ( 3, 4, 5 ), (6, 7)'
Type of arg 1 to push must be array (not constant item) at -e line 1, near "5 )"

This is a fatal error, because lists can't be modified.

Arrays can have names, lists can't

So far we've looked at examples of named arrays like @foo or @array. But not all arrays in Perl have names. Square brackets can be used to create a reference to an anonymous array:

[ 1, 2, 3, 'foo', 'bar' ]

What's that inside the square brackets? A list! That list gets assigned to an unnamed array in Perl's memory pool, and the expression returns a reference to that array. Perl references are like pointers; they hold the memory address of another variable. All references are scalars, so we can save the anonymous array reference to a variable like this:

my $ref = [ 1, 2, 3, 'foo', 'bar' ];

By contrast, lists never have names. If you think about the idea of ephemerality that we learned above, this makes sense. Lists go away immediately after they are used. For example, the list inside our anonymous array reference disappears once the array to which it is assigned is constructed. They never persist, so they never need a name by which one can refer to them.

Arrays have sigils

Perl uses sigils to indicate the basic types of variables.

my $scalar = "a single value";
my @array = ( "some", "ordered", "values" );
my %hash = ( a => "mapping", of => "keys", to => "values" );

Even when dealing with references, we can use the sigil of the underlying type to get at the referenced value, even if it is anonymous.

my $ref = [ 1, 2, 3 ];
push @$ref, 4, 5, 6;
print "$_\n" for @$ref;

Lists never have sigils, because lists aren't variables. Lists appear in expressions, but can never be modified or referenced.

Arrays can be referenced

So far we've looked at anonymous array references, but named arrays can also be referenced:

my @array = ( 'foobity', 'doobity' );
my $ref = \@array;

What happens if we try to take a reference to a list? The results may be a little surprising.

perl -le 'print for \( 4, 5, 6 );'
SCALAR(0x7f9c3a82a138)
SCALAR(0x7f9c3a82a150)
SCALAR(0x7f9c3a82a168)

What's going on here? When you apply the reference operator (\) to a list, the result is a new list which distributes the reference operator over all of its items. Thus, the expression

\( 4, 5, 6 )

is equivalent to

( \4, \5, \6 )

a list of three scalar references. The odd output of the code sample above is what references look like when they are printed out. The format consists of the type of thing they reference (scalars in this case) and the memory address of the referenced item.

References can only point to variables, and lists are not variables. It makes sense that they can't be referenced.
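If you'd rather not read memory addresses, the built-in ref function reports what kind of thing a reference points to. The sketch below (variable names are illustrative) contrasts the distributed scalar references with a single anonymous array reference.

```perl
use strict;
use warnings;

# The reference operator distributes over the list: three scalar references.
my @refs = \( 4, 5, 6 );
print scalar(@refs), " references\n";
print ref($_), "\n" for @refs;

# Dereferencing each one gets the original values back.
my @values = map { $$_ } @refs;
print "@values\n";

# Square brackets, by contrast, produce one reference to one array.
my $aref = [ 4, 5, 6 ];
print ref($aref), "\n";
```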

Lists can be passed to and returned from subroutines

Now that we've covered the mechanics of how lists and arrays work, it's time to look at an aspect of Perl that is often extremely confusing for beginners.

The arguments passed to subroutines are a list. Similarly, the values returned from subroutines are a list. There are no exceptions. Let's look at a few examples:

sub foo { 
    my @args = @_;
    print "$_\n" for @args;
}

my @numbs = ( 1.41421, 2.71828, 3.14159 );
my @ducks = ( 'Huey', 'Dewey', 'Louie' );

foo( @numbs, @ducks );

The output of this example is

1.41421
2.71828
3.14159
Huey
Dewey
Louie

Once we got into the subroutine, Perl treated our two arrays as one. Why? When an expression containing an array is evaluated in list context, the array returns a list of its elements. That means our subroutine call is equivalent to

foo( 1.41421, 2.71828, 3.14159, 'Huey', 'Dewey', 'Louie' );

Perl puts subroutine arguments into the special array @_, and we then assign that array to a lexical array called @args to work with it. Therefore, our assignment to @args is the same as:

my @args = ( 1.41421, 2.71828, 3.14159, 'Huey', 'Dewey', 'Louie' );

We've lost information by concatenating the lists returned by two arrays into one big list. What about hashes?

my %toons = ( bugs => "bunny", daffy => "duck", donald => "duck", mickey => "mouse" );

foo( @ducks, %toons );

Just like arrays, hashes will also return a list of their elements when used in list context. In this case, the list consists of key-value pairs. (Note that the order of the pairs may not be the same as when you assigned them!) Here's what the output for this example might look like:

Huey
Dewey
Louie
daffy
duck
mickey
mouse
donald
duck
bugs
bunny

We have a jumbled mess of keys and values and elements from an array, with no information about how to reconstruct these objects.

When passing a single array or hash to a subroutine, there's really no problem, because we can always construct a new array or hash from the argument list. But when we want to pass multiple aggregates, the only way to do so is to use references. Since references are scalars, we know where they'll be in the argument list and can copy or dereference their underlying values as necessary.
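Here is one way that might look. The subroutine name describe and its output format are invented for this sketch; the point is only that each aggregate arrives as a single scalar in a known position of @_.

```perl
use strict;
use warnings;

# Hypothetical subroutine: expects an array reference, then a hash reference.
sub describe {
    my ( $ducks_ref, $toons_ref ) = @_;

    my @lines;
    push @lines, "duck: $_" for @$ducks_ref;

    # Sort the keys so the output order is predictable.
    push @lines, "$_ is a $toons_ref->{$_}" for sort keys %$toons_ref;

    return @lines;
}

my @ducks = ( 'Huey', 'Dewey', 'Louie' );
my %toons = ( bugs => "bunny", daffy => "duck", donald => "duck", mickey => "mouse" );

# Pass two scalars (references) rather than two flattened aggregates.
my @report = describe( \@ducks, \%toons );
print "$_\n" for @report;
```

Because the subroutine receives references, it is working with the caller's original array and hash. Copy them first (for example, my @copy = @$ducks_ref;) if the subroutine should not be able to modify the caller's data.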

Return values from subroutines work the same way.

sub bar { 
    my @captains = ( "Kirk", "Picard", "Sisko", "Janeway" );
    my @ferengi = ( "Quark", "Nog", "Rom", "Brunt", "Zek" );
    return @captains, @ferengi;
}

print "$_\n" for bar();

Just like the first example, our two arrays will be flattened into one list. This code is equivalent to

print "$_\n" for ( "Kirk", "Picard", "Sisko", "Janeway", "Quark", "Nog", "Rom", "Brunt", "Zek" );

We can also use references to solve the flattening problem for return values.
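A sketch of the return-value version, using a hypothetical subroutine named crews that returns a two-element list of array references instead of one flattened list:

```perl
use strict;
use warnings;

# Hypothetical variant of the subroutine above: returns two array references.
sub crews {
    my @captains = ( "Kirk", "Picard", "Sisko", "Janeway" );
    my @ferengi  = ( "Quark", "Nog", "Rom", "Brunt", "Zek" );
    return ( \@captains, \@ferengi );
}

# The returned list has exactly two items, so we can unpack it by position.
my ( $captains, $ferengi ) = crews();

print scalar(@$captains), " captains, ", scalar(@$ferengi), " Ferengi\n";
print "First captain: $captains->[0]\n";
print "First Ferengi: $ferengi->[0]\n";
```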

Arrays can be multidimensional

So far we've only looked at one-dimensional structures. Arrays and hashes can only contain scalars, but references are scalar values and they can point to other arrays and hashes. That means we can make multidimensional arrays in Perl easily:

my @first_row = ( 1, 2, 3 );
my @second_row = ( 4, 5, 6 );
my @multi = ( \@first_row, \@second_row );

print $multi[1][2];    # prints 6

We could also use anonymous array references to make that a little cleaner.

my @multi = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );

On the other hand, lists are always one-dimensional. What happens if we try to put a list in a list?

( 'foo', 'bar', ( 'baz', 'quux' ) )

That's exactly equivalent to

( 'foo', 'bar', 'baz', 'quux' )
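You can confirm the flattening by assigning each form to an array and counting the elements. In this sketch (array names are illustrative), the third array shows what it takes to actually keep the inner grouping: an anonymous array reference.

```perl
use strict;
use warnings;

# Nested parentheses do not create nested structure; the list is flattened.
my @nested_looking = ( 'foo', 'bar', ( 'baz', 'quux' ) );
print scalar(@nested_looking), "\n";    # four elements

# To preserve the grouping, store a reference to an anonymous array.
my @structured = ( 'foo', 'bar', [ 'baz', 'quux' ] );
print scalar(@structured), "\n";        # three elements
print $structured[2][1], "\n";          # the second item of the inner array
```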

Arrays may be used in scalar context

Finally, we come to one of the most misunderstood aspects of lists, arrays, and context in Perl. When you assign an expression to a scalar variable, that puts the expression in scalar context. Many things in Perl behave differently depending on context, including arrays. If you write

my @things = ( "doodad", "doohickey", "widget", "thingamajig" );
my $scalar = @things;

the value of $scalar will be 4. That's because when you evaluate an array in scalar context, it returns the number of elements it contains. What happens if we try the same thing with a list?

my $scalar = ( "doodad", "doohickey", "widget", "thingamajig" );

In this case, the value of $scalar will be thingamajig! Many people assume that this means that a list in scalar context returns its last item for some reason instead of its size. But that's not actually correct. Although what we wrote looks like a list, lists do not exist in scalar context. What we have is an expression in scalar context that uses the comma operator. In list context, that operator separates items of a list, but in scalar context, it behaves differently: it evaluates its left-hand operand, throws the result away, and returns its right-hand operand. Our expression therefore reduces as follows:

my $scalar = ( "doodad", "doohickey", "widget", "thingamajig" );
my $scalar = ( "doohickey", "widget", "thingamajig" );
my $scalar = ( "widget", "thingamajig" );
my $scalar = ( "thingamajig" );

That's why you get the last item in the expression.
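The two behaviors are easy to compare side by side. This is a minimal sketch with made-up variable names. Note that with warnings enabled, Perl normally complains about the discarded constants ("Useless use of a constant ... in void context"), which is a good hint that the comma expression is not doing what it appears to do; the sketch silences that warning locally.

```perl
use strict;
use warnings;

my @things = ( "doodad", "doohickey", "widget", "thingamajig" );

# An array in scalar context yields its element count.
my $count = @things;

my $last;
{
    # The comma operator discards its left operands, which would
    # otherwise trigger "void" warnings for each discarded constant.
    no warnings 'void';
    $last = ( "doodad", "doohickey", "widget", "thingamajig" );
}

print "count = $count\n";
print "last  = $last\n";
```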

Acknowledgements

My thanks to Tim Heaney for pointing out a major error in one of the code examples in this post.

About this Entry

This page contains a single entry by Mike Friedman published on July 10, 2013 3:21 PM.
