Friday, September 06, 2013

Some details on perl's `my` internals

Recently I was asked why using my to unpack arguments inside a function is slower than unpacking them into variables declared outside the function:

    sub foo { my $bar = shift; ... } # slower
    # vs.
    my $bar;
    sub foo { $bar = shift; ... } # faster

On the other hand, the following shows the opposite behaviour:

    sub foo { my ($bar, $baz) = (@_); ... } # faster
    # vs.
    my ($bar, $baz);
    sub foo { ($bar, $baz) = (@_); ... } # slower
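If you want to check these claims on your own perl, here is a minimal sketch using the core Benchmark module. The sub names and the iteration count are my own choices for illustration, and the numbers depend heavily on perl version and build, so the gap described above may not reproduce exactly on your machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Scalar case: my inside the sub vs. a lexical declared outside it.
my $outer;
sub with_my    { my $bar = shift; return $bar }
sub with_outer { $outer  = shift; return $outer }

# List case: my on the LHS vs. outer lexicals on the LHS.
my ($o1, $o2);
sub list_my    { my ($bar, $baz) = (@_); return $bar + $baz }
sub list_outer { ($o1, $o2)      = (@_); return $o1 + $o2 }

# The iteration count is arbitrary; raise it for steadier numbers.
cmpthese(500_000, {
    scalar_my    => sub { with_my(1) },
    scalar_outer => sub { with_outer(1) },
    list_my      => sub { list_my(1, 2) },
    list_outer   => sub { list_outer(1, 2) },
});
```

cmpthese prints a table of rates and relative percentages, which is enough to see which variant wins in each pair.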

Short answer

A function has its own scope, and my declares a variable with lexical scope, so when the scope ends perl has to take additional action to clean that variable up.

In the second case the same rule applies, so the second variant could be expected to win. However, a list assignment without my on the left-hand side (LHS) and with @_ on the right-hand side (RHS) has to deal with the possibility that both sides share the same variable, as in ($foo, $bar) = ($bar, $foo). So perl has to copy the values on the RHS first, and this penalty is larger than the win.
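To see why that copy is needed, here is a small sketch (variable and sub names are mine, for illustration only). It shows a swap through list assignment, what a naive element-by-element assignment would do instead, and how @_ can alias the very variables on the LHS.

```perl
use strict;
use warnings;

# List assignment handles shared variables correctly.
my ($foo, $bar) = (1, 2);
($foo, $bar) = ($bar, $foo);
print "$foo $bar\n";    # prints "2 1"

# Assigning one element at a time, without copying the RHS first,
# loses a value.
my ($x, $y) = (1, 2);
$x = $y;
$y = $x;
print "$x $y\n";        # prints "2 2"

# @_ aliases the caller's arguments, so here the RHS elements are
# the same variables as the LHS ones.
my ($p, $q) = (1, 2);
sub assign_from_args { ($p, $q) = (@_); return }
assign_from_args($q, $p);
print "$p $q\n";        # prints "2 1"
```

With my on the LHS the variables are brand new, so nothing on the RHS can refer to them and perl can skip the copy.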

Long answer and how to draw conclusions on your own

Let's look at the following code:

    my $bar;
    sub foo { $bar = shift; ... } # faster

It has the problem of not being safe for recursion. You cannot call foo from inside foo, because a shared ("global") variable stores the argument. More correct code would be:

    our $bar;
    sub foo { local $bar = shift; ... }

It is certainly slower than either of the two variants.
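Here is a minimal sketch of the recursion problem, using a made-up example that sums the numbers from 1 to n. All names are hypothetical; only the three ways of storing the argument come from the snippets above.

```perl
use strict;
use warnings;

# Shared lexical: the inner call overwrites the caller's value.
my $n_shared;
sub sum_shared {
    $n_shared = shift;
    return 0 if $n_shared == 0;
    my $rest = sum_shared($n_shared - 1);   # leaves $n_shared at 0
    return $n_shared + $rest;
}

# my: every call gets its own $n.
sub sum_my {
    my $n = shift;
    return 0 if $n == 0;
    return $n + sum_my($n - 1);
}

# local on a package variable: the old value is restored on scope exit.
our $n_local;
sub sum_local {
    local $n_local = shift;
    return 0 if $n_local == 0;
    my $rest = sum_local($n_local - 1);
    return $n_local + $rest;
}

print join(' ', sum_shared(3), sum_my(3), sum_local(3)), "\n";  # prints "0 6 6"
```

The shared variant is fast precisely because it skips the bookkeeping that makes the other two correct.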

How to see difference

You can see the difference without looking into the perl source code: just look at the optree of your code with the B::Concise module:

    $ perl -MO=Concise,foo -E 'my $aa; sub foo { $aa = shift; undef }'
    ...
    4        <2> sassign vKS/2 ->5
    2           <0> shift s* ->3
    3           <0> padsv[$aa:FAKE:] sRM* ->4
    ...

I left out the other lines to highlight the assignment code. Compare the above to:

    $ perl -MO=Concise,foo -E 'sub foo { my $aa = shift; undef}'
    ...
    4        <2> sassign vKS/2 ->5
    2           <0> shift s* ->3
    3           <0> padsv[$aa:47,48] sRM*/LVINTRO ->4
    ...

As you can see, the primary difference is the LVINTRO flag on the padsv operation. The pad* operations fetch a variable declared with 'my' from special lists for such variables (called PADs or PADLISTs). It's much harder to figure out what LVINTRO means without looking into the source code, but in most cases it means that the operation should be "localized", whatever localization means for that operation.
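If you would rather check for the flag from a script than read the dump by eye, B::Concise can also be driven programmatically. This is only a sketch: optree_text is a helper I made up, and the exact ops in the dump differ between perl versions (newer perls may fuse the assignment into a different op), so the code only searches for the LVINTRO string.

```perl
use strict;
use warnings;
use B::Concise ();

my $outer;
sub assign_outer { $outer = shift; undef }
sub assign_my    { my $inner = shift; undef }

# Hypothetical helper: render one sub's optree into a string
# instead of printing it to STDOUT.
sub optree_text {
    my ($code) = @_;
    my $text = '';
    B::Concise::walk_output(\$text);    # send the dump to $text
    B::Concise::compile($code)->();     # walk the optree of $code
    return $text;
}

my %subs = (assign_outer => \&assign_outer, assign_my => \&assign_my);
for my $name (sort keys %subs) {
    my $has = optree_text($subs{$name}) =~ /LVINTRO/ ? 'yes' : 'no';
    print "$name: LVINTRO $has\n";
}
```

I would expect the flag to show up only for the my variant, matching the two dumps above.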

How to find it in the perl code

Let's look at the code of padsv. You can find it by looking for pp_padsv in the pp_*.c files:

    $ ack pp_padsv -A 30 pp_*.c
    pp_hot.c
    392:PP(pp_padsv)
    393-{
    ...
    406-    if (op->op_flags & OPf_MOD) {
    407-        if (op->op_private & OPpLVAL_INTRO)
    408-            if (!(op->op_private & OPpPAD_STATE))
    409-                save_clearsv(padentry);
    ...

Once again I've highlighted only the relevant code, which says: if padsv is in lvalue context, is localizing, and is not a state declaration, then save (not safe) our target variable from the PAD to be cleared when the scope ends.
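The visible effect of that clearing can be seen from plain Perl. The sketch below uses made-up names and shows two things: a my variable is undef again on the next call, and when something still refers to the old variable at scope exit (a closure here), each call ends up with its own independent variable.

```perl
use strict;
use warnings;

# The value stored in $x does not survive into the next call.
sub fresh {
    my $x;
    my $seen = defined $x ? 'defined' : 'undef';
    $x = 42;
    return $seen;
}
print fresh(), ' ', fresh(), "\n";      # prints "undef undef"

# Each call creates a separate $count that the closure keeps alive.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}
my $c1 = make_counter(10);
my $c2 = make_counter(20);
print join(' ', $c1->(), $c1->(), $c2->()), "\n";   # prints "10 11 20"
```

A variable declared outside the sub gets none of this treatment, which is where the saved time comes from.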

Exercise for readers

Now you have all the tools to figure out the differences in the other situations mentioned. Here are the commands to get you there quicker, so you have no excuse to skip them:

    perl -MO=Concise,foo -E 'sub foo { local $aa = shift; undef}'

    perl -MO=Concise,foo -E 'my ($aa, $bb); sub foo { ($aa, $bb) = (@_); undef }'

    perl -MO=Concise,foo -E 'sub foo { my ($aa, $bb) = (@_); undef }'

    perl -MO=Concise,foo -E 'my ($aa, $bb); sub foo { ($aa, $bb) = (shift, shift); undef }'

A comment explaining the list assignment would be nice :)