In my current job as an assistant to a web developer, I enjoy frequent debates about how best to style PHP code. All arguments that PHP is impossible to style aside, the last few weeks have gotten me thinking a lot about the many different ways PHP offers to perform the simplest of tasks.

Take output, for example. To print a simple string to the client you can use either echo or print.
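
For illustration, a minimal sketch of the two forms, assuming echo and print are the constructs meant here (the string itself is just a placeholder of mine):

```php
<?php
// Two equivalent ways to send a string to the client.
echo "Hello, world!\n";
print "Hello, world!\n";
```

Both lines produce the same output. The practical difference is small: print behaves like an expression and always returns 1, while echo returns nothing and accepts several comma-separated arguments.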

If you want to insert a variable into that string there are several options. One debate focused on two of them: concatenation and interpolation.
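
To make the two options concrete, here is a minimal sketch. The variable name and the greeting are placeholders of my own:

```php
<?php
$name = 'World';

// Concatenation: the variable stays outside the literal,
// joined to it with the dot operator.
$concatenated = 'Hello, ' . $name . '!';

// Interpolation: the variable is embedded in a double-quoted literal.
$interpolated = "Hello, $name!";

echo $concatenated, "\n";
echo $interpolated, "\n";
```

Both variables end up holding the same string; only the way the source is written differs.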

This came up right at the beginning of my current project, and my new boss told me off for using concatenation, reminding me that although performance is not the be-all and end-all in the work we're doing, every little helps and I should inline variables wherever possible.

I'd slept late the night before and had only downed three mugs of strong, black coffee so far, so I wasn't really paying attention beyond that and I accepted his thoughts at face value.

But now it dawns on me that he was wrong.

When PHP parses the code (which, without an opcode cache, it does on every request), it scans double-quoted string literals for interpolated variables. There are several syntaxes for embedding variables into literals, and PHP has to detect all of them:
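
As a rough sketch of the common forms (the variable names and values are invented for the example, and this may not be the exact list I had in mind at the time):

```php
<?php
$fruit  = 'apple';
$prices = ['apple' => 3];
$basket = new stdClass();
$basket->owner = 'Ann';

$forms = [
    "I have an $fruit",               // simple syntax: bare variable
    "I have two {$fruit}s",           // curly syntax: braces mark the variable's end
    "It costs $prices[apple]",        // simple syntax: array element, unquoted key
    "It costs {$prices['apple']}",    // curly syntax: array element, quoted key
    "It belongs to $basket->owner",   // simple syntax: object property
    "It belongs to {$basket->owner}", // curly syntax: object property
];

foreach ($forms as $line) {
    echo $line, "\n";
}
```

Note that the array key is unquoted in the simple form but quoted inside the braces, which is exactly the kind of inconsistency that makes these rules easy to trip over.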

So whenever the parser comes across a $-sign or an opening curly brace, it then has to scan for either an ending curly brace, or the most likely end of a variable, and continue.

Not to mention the method is prone to some easily-made user typos and thus could be considered inexplicit and hard to use:
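
One typo of this kind, sketched with a made-up variable: a letter placed directly after the variable name gets absorbed into the name.

```php
<?php
$fruit = 'apple';

// Intended "two apples", but PHP reads the name greedily as $fruits.
// That variable is undefined, so it interpolates as an empty string and
// raises an undefined-variable notice or warning, depending on the PHP version.
// The @ only silences that message for the purposes of this sketch.
$wrong = @"two $fruits";

// Braces make the end of the variable name explicit.
$right = "two {$fruit}s";

echo $wrong, "\n";
echo $right, "\n";
```

The first string silently loses the word, which is easy to miss when the message is suppressed or logged somewhere nobody looks.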

Today I got to reading an old blog entry from Chris Shiflett responding to an entry from Richard Davey, which in turn was an assessment of a rather poor list of PHP tips. What I'm seeing from this entry is that Chris, and indeed the majority of the commenters, prefer interpolation stylistically.

And I'm not surprised. It is, after all, easier to read:
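
Here is a sketch of the same string written both ways. The variables and the markup are invented for the comparison:

```php
<?php
$user  = 'alice';
$count = 3;
$url   = '/inbox';

// Interpolated: reads almost like the finished sentence.
$interpolated = "Hello $user, you have $count new messages in <a href=\"$url\">your inbox</a>.";

// Concatenated: same result, but the quotes and dots interrupt the text.
$concatenated = 'Hello ' . $user . ', you have ' . $count . ' new messages in <a href="' . $url . '">your inbox</a>.';

echo $interpolated, "\n";
echo $concatenated, "\n";
```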

There is no denying that the concatenated version is far harder to read at a glance.

However, as your application gets larger and you start interpolating complex variables from objects and arrays, you will notice a performance hit. Maybe not a huge one at first, but concatenation is faster. With concatenation, PHP will do a quick scan for dollar signs and curly braces, likely find very few and rarely have to scan for variable endings. The string literals are clearly delimited and the variable bounds are explicit. It can't go wrong.

So what's the compromise? Do we err on the side of legibility and maintainability, or of speed and safety?

Is premature optimization still the root of all evil?

The next major issue to come up was that of references. Consider the following example:
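
A minimal sketch of the kind of function in question; buildList() is a hypothetical helper of mine, not code from the project:

```php
<?php
// Hypothetical helper that builds a large array and returns it.
function buildList($size)
{
    $list = range(1, $size);
    return $list;
}

$numbers = buildList(1000);
echo count($numbers), "\n";
```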

Does PHP return a copy of the array, or does it automatically optimize and return a reference to the existing one?

It took me a while to figure this one out as there appears to be little reliable documentation on the subject. I spent time poring through online boilerplate tutorials which recommended the standard loop-condition optimizations, output buffering and data caching. None of them made any reference to function return behaviour, and in fact the implication was that PHP does no optimization on its own.

So I began writing explicit reference operators into my code:
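
Roughly like this, using a hypothetical buildListByRef() helper to stand in for the real code:

```php
<?php
// The & before the function name declares a return by reference.
function &buildListByRef($size)
{
    $list = range(1, $size);
    return $list;
}

// The caller has to use =& as well to bind to the returned reference;
// a plain = would assign the value rather than bind the reference.
$numbers =& buildListByRef(1000);
echo count($numbers), "\n";
```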

Notice the ampersand in the function prototype. PHP would now return an array reference for sure. Having doubtless achieved a significant performance increase at no loss to legibility or maintainability, I was happy and continued for a while with the next issue.

But my boss wasn't so sure. Ever the stalwart of concise code, he proclaimed with certainty that the reference operator was unnecessary and that PHP would automatically notice the complexity of the returned variable and alter it to a reference accordingly. (This was not long after he'd finished instructing me on how to optimize my code with much the same language as found in those tutorials.)

So I was confused again. I had conflicting evidence. On the one hand, to write efficient PHP code one must be sure to optimize one's writing. But on the other, PHP was to automatically optimize by-ref/by-val returns? This didn't seem likely. It also struck me as odd that the language would include an explicit return-reference operator if it were unnecessary.

I went back to my function and took another look. As it turns out, my boss was right.

The engine handles this on its own: PHP arrays are copy-on-write, so returning one does not duplicate its contents unless one of the copies is later modified. The return-by-reference syntax is there for when you genuinely need the caller and the function to share one variable, so that a change made through one is visible through the other.

So, if the engine is smart enough to do that, why can't it optimize loop conditions? Manually doing this leads to benchmarkable performance increases, as in the following example:
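
A sketch of the usual before-and-after, with an arbitrary array of my own choosing:

```php
<?php
$items = range(1, 1000);

// First version: count() is called again on every iteration.
$sumA = 0;
for ($i = 0; $i < count($items); $i++) {
    $sumA += $items[$i];
}

// Second version: the length is computed once and reused.
$sumB = 0;
$length = count($items);
for ($i = 0; $i < $length; $i++) {
    $sumB += $items[$i];
}

echo $sumA, "\n";
echo $sumB, "\n";
```

Both loops compute the same total; the only difference is how often count() runs.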

The latter version runs faster, although a sub-millisecond speedup won't be useful unless you're working with large loops or you perform many of them. (Of course, in this case a foreach() structure would be far more appropriate and efficient; it's an arbitrary example.)

I eventually found a very useful article on PHP optimization which, had I read it originally, would have answered all my questions from the get-go.