
Articles
Month: July 2008

31st Jul 2008

Jar Head Bear Meets Tragic End

A two-year-old wild black bear has sadly been shot after it wandered into Minnesota with a plastic jar on its head. The jar had likely become lodged around the animal's head as it foraged for food, and whilst the bear could breathe, it could not eat or drink.

A local wildlife supervisor said efforts had been made to capture the bear alive as it moved through areas near the town of Lake George (where it was first spotted on 21 July), but that attempts to tranquilise the animal had failed because the bear "stayed in forested areas".

Surely, though, if it was easy enough to shoot the bear dead as it entered a city, it would have been easy enough to shoot it with a tranquiliser at that point, haul it back into the woods, cut off the jar and leave some tasty food behind. So I see this as a pretty pointless waste of a beautiful life. What a shame.

Tags: News

30th Jul 2008

How To Provoke RSS Duplicates in Thunderbird

Aside from this blog I maintain a handful of other publications with feeds, and for convenience I have them aggregated in Mozilla Thunderbird v2.0.0.16 alongside all my mail. It's not a completely solid piece of software, but it does the job. However, I found one of its foibles just the other day.

On one of my publications the "title" holds important metadata about an entry. In this case the item that the publication is about had just entered a new phase, and although we were already three days into it, I decided to put a day count in entry titles from then on. Needless to say, I wanted the existing three entries to have day counts in their titles too, for consistency, so I went back and edited them. That was fine.

But then I went into Thunderbird, and the old titles were still on the existing entries. Well, of course. I wouldn't expect any RSS aggregator to re-retrieve a post that had already been stored; they deliberately don't do that. So I figured that if I just deleted the old entries, they'd be re-downloaded the next time Thunderbird synced with the internet.

No. They weren't.

Poking around in Thunderbird's internals can often be fun and it's surprisingly easy, thanks to the filesystem-based message storage. I have my profile stored in F:\thunderbird, and I soon deduced that the file F:\thunderbird\Mail\News & Blogs\feeditems.rdf was responsible for 'remembering' the UIDs of the most recently downloaded entries, thereby ensuring against duplicates.

So I figured, if I just erased the file's contents, the cache would have to be re-built, the last five entries would be re-downloaded, and I could simply delete the duplicates manually this one time. That's exactly what happened, the first time. And the second time, and the third… in fact, the cache no longer seemed to be working at all, even though it was being reconstructed inside feeditems.rdf.

I struggled with this for some time, recreating subscriptions, shuffling folders and even removing the entire post history. Nothing worked. In frustration my finger began to hover over the "delete" key and, as it happens, that file was selected in Directory Opus at the time. feeditems.rdf was gone, and suddenly the "new mail" alert sound stopped ringing in my ears. The duplicates had stopped piling up on each other.

I checked F:\thunderbird\Mail\News & Blogs\ and, indeed, feeditems.rdf had been recreated and reconstructed, and this time it was working properly. To get a clean setup back I purged my feed folder, deleted feeditems.rdf, did a Thunderbird sync and everything was how it should be. My blog entries were back and they were not duplicating.

Conclusion

As it turned out, if feeditems.rdf is empty or otherwise doesn't contain valid RDF XML, the caching system breaks completely. But as long as you remove the file entirely, it will be reconstructed on the next sync. I can only imagine that there's some flag inside the application that isn't being set properly when feeditems.rdf doesn't physically need recreating on disk, even though its contents need reconstructing.

And also as it turns out, a far better way of re-grabbing a blog entry is to just remove its <RDF:Description/> tag from feeditems.rdf. I'm not really sure why I didn't think of that before.

Tags: Software

29th Jul 2008

C++'s Three Asterisks

Someone came into #C++ this evening with what turned out to be a very simple compiler error.

In header.c:

MyObjCluster::MyObjCluster(const std::vector<MyObj> a, std::vector<MyObjCluster>& b) { /* ... */ };

In main.c:

std::vector<MyObj> A = new std::vector<MyObj>();
std::vector<MyObjCluster> B;
MyObjCluster testfinder = new MyObjCluster(A,&B);

The error? "Conversion from std::vector<MyObjCluster, std::allocator<MyObjCluster> >* to non-scalar type std::vector<MyObjCluster, std::allocator<MyObjCluster> > requested."

And C++'s confusing type system strikes once again.

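The guy was confused, thinking that you need the '&' in both the function signature and the calling line to pass by reference. In this case the solution is, of course, simply to omit the '&' before B in main.c: the '&' in the signature already makes the parameter a reference, whereas '&B' at the call site takes B's address and produces a pointer. (The two new expressions have the same problem, since new also yields a pointer; A and testfinder should just be declared as ordinary objects.)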
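To make the fix concrete, here is a minimal compilable sketch. The original MyObj and MyObjCluster definitions weren't shown, so both classes below are hypothetical stand-ins; only the constructor's signature is carried over. Having the constructor append to b is also invented, purely so there is something visible to prove that the caller's own vector is being modified.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in: the real MyObj was not shown.
struct MyObj {
    int value = 0;
};

// Hypothetical stand-in with the same constructor signature as in header.c.
class MyObjCluster {
public:
    // 'a' is a const copy of the caller's vector; 'b' refers to the caller's own vector.
    MyObjCluster(const std::vector<MyObj> a, std::vector<MyObjCluster>& b)
        : count_(a.size()) {
        b.push_back(*this);  // invented behaviour: modifies the caller's vector, not a copy
    }

    std::size_t count() const { return count_; }

private:
    std::size_t count_;
};

// What main.c was aiming for: plain objects, no 'new', and no '&' at the call site.
std::size_t clustersAfterConstruction() {
    std::vector<MyObj> A(3);        // an ordinary object, not 'new std::vector<MyObj>()'
    std::vector<MyObjCluster> B;
    MyObjCluster testfinder(A, B);  // B binds to the reference parameter by itself
    return B.size();                // the constructor appended one element to this B
}
```

Calling clustersAfterConstruction() returns 1: nothing special was written at the call site, yet the constructor changed the caller's B through the reference parameter.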

"Messy type system?", you ask dubiously. Here's a typedef that makes the token myFunctionPointer name the type "pointer to a function like void* myFunction(int a, char* b)":

typedef void* (*myFunctionPointer)(int, char*);
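Here's a short sketch of a pointer-to-function typedef of this shape, void* (*)(int, char*), actually in use. The body of myFunction and the callThrough helper are hypothetical, made up only so the pointer has something to point at and be called through. The using alias at the end is the C++11 spelling of the same type, which postdates this post.

```cpp
// Pointer to a function taking (int, char*) and returning void*.
typedef void* (*myFunctionPointer)(int, char*);

// Hypothetical function with the matching signature: returns a pointer 'a' chars into 'b'.
void* myFunction(int a, char* b) {
    return b + a;
}

// The same type as a C++11 alias declaration, which reads left to right.
using MyFunctionPointer = void* (*)(int, char*);

// Hypothetical helper: call through the pointer and convert the result back to char*.
char* callThrough(myFunctionPointer fp, int offset, char* text) {
    return static_cast<char*>(fp(offset, text));
}
```

Both names denote the same type, so a myFunctionPointer and a MyFunctionPointer holding myFunction compare equal and can be passed to callThrough interchangeably.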

Yes, it's a mess. Add to that the fact that the asterisk has two largely distinct meanings (three, if you count multiplication on top of pointer types and dereferencing), and the ampersand has the same problem (reference types, address-of and bitwise AND), which is what started this particular topic off.
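For anyone who wants to see all of those roles side by side, here's a small illustration (the function name symbolRoles is made up for the example):

```cpp
// Each commented line uses '*' or '&' in a different role.
int symbolRoles() {
    int x = 6;
    int* p = &x;   // '*' in a declaration: pointer type; unary '&': address-of
    int& r = x;    // '&' in a declaration: r is a reference, another name for x
    *p = 7;        // unary '*': dereference, so x becomes 7
    r = r * 2;     // binary '*': multiplication, so x becomes 14
    return x & 3;  // binary '&': bitwise AND, 14 & 3 == 2
}
```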

Discuss.

Tags: C++

28th Jul 2008

How Cool Is Cuil, Really?

CNN reports that Anna Patterson, the brain behind Google's 2004 search engine upgrade, has banded together with her friends to create a new service that she feels will supersede the beast and its shortcomings.

The project, named "Cuil", is backed by $33m in venture capital and opened for business this morning. Ms Patterson believes that, amongst other things, the fact that Google's look-and-feel hasn't changed significantly in ten years is not beneficial, so she's come up with a new interface and a new way of displaying results.

Cool idea. However, upon taking the service for a spin, I immediately missed that most blessed of Google's features: the one which removes (or at least hides) similar pages from its results. As it turns out, having the largest number of sites indexed doesn't necessarily give you the best search service when there are so many cloning and mirroring sites out there. So far I'm on page 10 of the results for my name, and I'm still seeing the same random blog comment returned over and over again, recorded by Cuil's indexer from the gazillion sites that mirrored the article in question.

So, Cuil. Brave? Yes. A good idea? Yes. Nice layout? Yes. Is it refreshing to see someone tackle the search monster head-on with a realistic attempt? Absolutely.

But is it actually useful? Maybe not. And to be honest, I enjoy the simplicity of Google's results page. Perhaps there's a good reason that it hasn't changed in ten years.

Tags: Internet, News

28th Jul 2008

Belkin F5D7632 Wireless ADSL Modem/Router

I bought this router to replace my faithful old Netgear DG834G, which our ADSL ISP suspected was contributing to some ridiculous loss-of-sync issues. It was a straight off-the-shelf purchase and I hadn't read any reviews, but knowing to stay away from Linksys, and being careful not to go with the same brand that might have been ruining my connection, I figured Belkin would be safe.

So I took it home, plugged it in, and immediately noticed a speed improvement. The web-based configuration system is easy to use and comprehensive. I experienced a minor glitch that night and had to reboot the device, but I upgraded the firmware and thought little of it until the next day.

It wasn't until then that I realised this was no minor glitch. The router was hanging up at least once per day. I finally got around to reading some reviews online and discovered that overheating and crashing were a known problem with Belkins. People had found themselves with devices that would never stay working for a full day, and here I was having to power cycle mine constantly. Had I been the only one using it, that would simply have been an annoyance, but with an entire family wanting to use the internet and with me not always around or awake, it was a real problem.

It wasn't always just freezing, either. Sometimes the config system would be accessible, and sometimes it wouldn't. But no matter what happened, and no matter how long it took me to notice and power cycle it (be it five minutes or an hour), as soon as the device came back up, the internet connection was available again. And it's not even overheating any more, now that I have it hanging out of a window into the fresh summer air.

So that's a firm thumbs down, which is a shame because despite the fact that it doesn't work it's a full-featured device and provides a fantastic service when it's operational. But the more-than-daily crashing renders it useless, and even at £50 retail that's not what you expect from a world-class brand.

Tags: Reviews

28th Jul 2008

Writing PHP With Style

In my current job as an assistant to a web developer, I enjoy frequent debates about how best to style PHP code. All arguments that PHP is impossible to style aside, the last few weeks have gotten me thinking a lot about the many different ways PHP offers to perform the most simple of tasks.

Take output, for example. To print a simple string to the client you simply type either of:

print "Hello World!";

echo "Hello World!";

If you want to insert a variable into that string there are several options. One debate focused on two of them: concatenation and interpolation.

echo "Hello $name!";           // interpolation
echo "Hello " . $name . "!";   // concatenation

This came up right at the beginning of my current project, and my new boss told me off for using concatenation, reminding me that although performance is not the be-all and end-all in the work we're doing, every little helps and I should inline variables wherever possible.

I'd gone to bed late the night before and had only downed three mugs of strong, black coffee so far, so I wasn't really paying attention beyond that, and I accepted his thoughts at face value.

But now it dawns on me that he was wrong.

When PHP parses the code (which it does on every load, since it's an interpreted language), it scans string literals for interpolated variables. There are several syntaxes for embedding variables into literals, and PHP has to detect all of them:

$name = "John";
$person = Array('name' => "John");
$guy = (Object)$person;

echo "Hello ${name}!";
echo "Hello {$name}!";
echo "Hello $person[name]!";
echo "Hello {$person['name']}!";
echo "Hello {$guy->name}!";

So whenever the parser comes across a $-sign or an opening curly brace, it then has to scan for either an ending curly brace, or the most likely end of a variable, and continue.

Not to mention that interpolation is prone to some easily made typos, and so could be considered less explicit and harder to use:

$thing = "guy";
echo "Hello $things!";
// Variable $things is undefined
// Expected output: Hello guys!
// Output: Hello !

Today I got to reading an old blog entry from Chris Shiflett responding to an entry from Richard Davey, which in turn was an assessment of a rather poor list of PHP tips. What I'm seeing from this entry is that Chris, and indeed the majority of the commenters, prefer interpolation stylistically.

And I'm not surprised. It is, after all, easier to read:

$name = "John";
$verb = "speaking";
$myCountry = "England";
$language = "PHP";

echo "Hi $name, I am $verb to you from $myCountry and writing in $language.";
echo "Hi " . $name . ", I am " . $verb . " to you from " . $myCountry . " and writing in " . $language . ".";

There is no denying that the concatenated version is far harder to read at a glance.

However, as your application gets larger and you start interpolating complex variables from objects and arrays, you will notice a performance hit. Maybe not a huge one at first, but concatenation is faster. With it, PHP will do a quick scan for dollar signs and curly braces, likely find very few and rarely have to scan for variable endings. The string literals are clearly delimited and the variable bounds are explicit. It can't go wrong.

So what's the compromise? Do we err on the side of legibility and maintainability, or of speed and safety?

Is premature optimization still the root of all evil?

The next major issue to come up was that of references. Consider the following example:

function a()
{
    global $db;
    $data = Array();
    $result = $db->query("SELECT * FROM large_table");
    while ($row = $result->fetch_assoc())
        $data[] = $row;
    $result->close();
    return $data;
}

Does PHP return a copy of the array, or does it automatically optimize and return a reference to the existing one?

It took me a while to figure this one out as there appears to be little reliable documentation on the subject. I spent time poring through online boilerplate tutorials which recommended the standard loop-condition optimizations, output buffering and data caching. None of them made any reference to function return behaviour, and in fact the implication was that PHP does no optimization on its own.

So I began writing explicit reference operators into my code:

function &a()
{
    global $db;
    $data = Array();
    $result = $db->query("SELECT * FROM large_table");
    while ($row = $result->fetch_assoc())
        $data[] = $row;
    $result->close();
    return $data;
}

Notice the ampersand in the function prototype. PHP would now return an array reference for sure. Having doubtless achieved a significant performance increase at no loss to legibility or maintainability, I was happy and continued for a while with the next issue.

But my boss wasn't so sure. Ever the stalwart of concise code, he proclaimed with certainty that the reference operator was unnecessary and that PHP would automatically notice the complexity of the returned variable and alter it to a reference accordingly. (This was not long after he'd finished instructing me on how to optimize my code with much the same language as found in those tutorials.)

So I was confused again. I had conflicting evidence. On the one hand, to write efficient PHP code one must be sure to optimize one's writing. But on the other, PHP was to automatically optimize by-ref/by-val returns? This didn't seem likely. It also struck me as odd that the language would include an explicit return-reference operator if it were unnecessary.

I went back to my function and took another look. As it turns out, my boss was right.

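The engine optimizes it on its own. PHP arrays are copy-on-write: returning one by value just hands the caller another reference-counted handle to the same data, and an actual copy is only made if one side later modifies it. The return-by-reference syntax exists for a different job, namely letting the caller bind to (and modify) the original variable, such as an object property or a static variable; it isn't meant as a way to speed up returns of any type.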

So, if the engine is smart enough to do that, why can't it optimize loop conditions? Manually doing this leads to benchmarkable performance increases, as in the following example:

$myArray = ...;

// count() is re-evaluated on every iteration
for ($i = 0; $i < count($myArray); $i++) {
    echo $myArray[$i];
}

// count() is evaluated once and cached in $max
for ($i = 0, $max = count($myArray); $i < $max; $i++) {
    echo $myArray[$i];
}

The latter version runs faster, although a sub-millisecond speedup won't be useful unless you're working with large loops or you perform many of them. (Of course, in this case a foreach() structure would be far more appropriate and efficient; it's an arbitrary example.)

I eventually found a very useful article on PHP optimization which, had I read it originally, would have answered all my questions from the get-go.

Tags: PHP

Tom Lachecki

(Tomalak Geret'kal)
