Giacomo Vacca: 2010

Friday, 12 November 2010

Inside-Out Objects for Perl – I’m not convinced

When I found Inside-Out Objects not only mentioned but recommended in Perl Best Practices (see the "Always Use Fully Encapsulated Objects" section), I thought: “Cool, here’s a solution to enforce encapsulation on Perl classes. This was badly needed”. But then the list of advantages wasn’t too exciting, while the drawbacks were quite scary… See for example this analysis.

To simplify the problem, note that all I want is: “a mechanism to prevent an object attribute from being read or changed from outside the object”. In Java, C++, PHP, you just declare that class attribute as private. Simple. (Python has a form of “privatization by obfuscation” that I don’t really like, but it may be better than what Perl does, which is nothing).

Perl (5) is not a full OO programming language, just a procedural language with some additions that allow for OO dynamics, but I'm still surprised I can't "protect" object attributes... Getting back to Inside-Out Objects, the best analysis I’ve found is from perlmonks in 2005.

I agree with the points raised by jhedden (however I find points 1, 2, 3 and 9 “less strong” than the others):

1. Hash-based objects are the standard. Using a different object API may lead to maintainability issues.
2. Inside-out object classes aren't compatible with hash-based object classes.
3. Inside-out objects are just another kludge on top of Perl's already kludgy OO mechanisms.
4. I haven't had any problems using hash-based objects. The encapsulation and compile-time checking advantages of inside-out objects aren't important enough to induce me to use them.
5. The encapsulation provided by inside-out objects is unenforceable because the source code is available.
6. Inside-out objects require a DESTROY method, and I don't want to have to remember to write one each time I create an inside-out class.
7. There are too many alternative inside-out object support modules that aren't compatible with each other.
8. I'm leery of the 'magic' the inside-out object support modules do 'under the covers'. There may be bugs or they may lead to unexpected problems when interacting with other modules.
9. I tried module Foo::Bar, and had problems with it so I gave up on trying to use inside-out objects.
10. You can't serialize inside-out objects either in general, or using Data::Dumper or Storable.
11. Inside-out objects are not thread-safe because they usually use refaddr as the key for storing object data.
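
For readers who have never seen one, here is a minimal sketch of what an inside-out class can look like. The Counter class and its methods are made up for illustration; the point is only to show where the data lives (a lexical hash keyed by refaddr) and why a DESTROY method is needed, as mentioned in the list above.

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical class, for illustration only
    use Scalar::Util qw(refaddr);

    # One lexical hash per attribute, keyed by the object's address.
    # Nothing outside this block can reach %count_of directly.
    my %count_of;

    sub new {
        my ( $class, $start ) = @_;
        # The object itself is just a blessed, empty scalar reference.
        my $self = bless \do { my $anon }, $class;
        $count_of{ refaddr $self } = defined $start ? $start : 0;
        return $self;
    }

    sub increment {
        my ($self) = @_;
        return ++$count_of{ refaddr $self };
    }

    sub count {
        my ($self) = @_;
        return $count_of{ refaddr $self };
    }

    # Without this, %count_of would keep one stale entry per dead object.
    sub DESTROY {
        my ($self) = @_;
        delete $count_of{ refaddr $self };
        return;
    }
}

my $counter = Counter->new(5);
$counter->increment;
print $counter->count, "\n";    # prints 6
```

Since $counter is only a reference to an empty scalar, there is no hash to poke at from the outside, which is the whole attraction. The price is the extra bookkeeping shown here.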

I was hoping Perl 6 would eventually solve the problem of private attributes (which is all I want here), and it seems I’m in luck:

Attributes are defined with the has keyword, and are specified with a special syntax:

class MyClass {
    has $!x;
    has $!y;
}

The ! twigil specifies that this variable is an attribute on an object and not a regular variable.

Attributes are private, but you can easily add an accessor for it, that is a method of the same name that can be called from the outside of the class and returns the value of the attribute. These accessors are automatically generated for you if you declare the attribute with the . twigil instead:

class MyClass {
    has $.x;
    has $.y;
}

When you assign to an attribute from within the class, you still have to use the ! form.

Do I like that syntax? I do not. Does that mechanism satisfy my need? Yes, thank you.

Another reason to look forward to Perl 6.

Friday, 29 October 2010

Test reports on Hudson in 30 seconds (well, let's say quickly)

There are so many articles on how to generate test reports on Hudson for perl modules or applications, that I thought it was somehow complicated. It's not.

The underlying problem is that Hudson only supports test results in JUnit format, while Perl tests produce TAP. To solve this you need to convert the test results from TAP to JUnit.

Of course there are some prerequisites:

Your perl code must have tests (duh)
You must have Hudson set up to run those tests
You need to install TAP::Harness::JUnit on the building box
You need to install 'prove' on the building box

Then all you have to do is add this command in the shell script you execute to run the tests:

prove --harness=TAP::Harness::JUnit

and configure the "Publish JUnit Test Result Reports" section of your project to point to the resulting XML file.

No mumbo jumbo.

Since TAP::Harness::JUnit is not available as a debian package, if you need to install it properly, just follow the usual procedure to generate debian packages from CPAN modules:

wget http://search.cpan.org/CPAN/authors/id/L/LK/LKUNDRAK/TAP-Harness-JUnit-0.32.tar.gz
tar -pzxvf TAP-Harness-JUnit-0.32.tar.gz
dh-make-perl TAP-Harness-JUnit-0.32/
cd TAP-Harness-JUnit-0.32
debuild

There you go.

Saturday, 23 October 2010

A rather serious matter – deciding what to read

If you are not interested in the full post, the concept of it is: “What are the 500 books that are worth reading before I die?” What do you recommend?

Here’s the full post:

I’ve been a keen reader since I was a child, and I’ve recently become an enthusiastic user of Kindle.

Infinite (well, I know they are finite, but allow me this mathematically inconsistent emphasis) opportunities have suddenly become available: I can have thousands of books on my Kindle in seconds. Kindle books cost less than their paper versions, and are often even free (what’s slightly concerning is how easily you can buy a Kindle book: you just click on a button and your Amazon account is charged automatically - that’s why I set up a password to lock the device).

Anyway the point of this post is about decisions: what to read.

Let’s start by estimating how much I can read in my life, to give more sense to what follows. I need about one month to read a technical book of 400-500 pages, and about two weeks to read an interesting novel of 200-300 pages.

I’m not sure I should consider technical books as a separate topic, because I typically like them, and so they are relevant not only for my job but in general for me (or I just can’t manage to read them, so the time spent is 0).

For sure, they require considerably more time than novels (mainly because it’s more “studying” than “reading”).

I’ll leave them out to focus more on “what I should read as a human being, rather than as an engineer”, and to simplify my point.

So in a month I could read 2-3 interesting books. Holidays will probably tend to increase the average, but first I don’t have so many holidays, and second there are working months in which I don’t find the time to read as much as I’d like.

Let’s say 2 books/month. It’s 24 per year; 20 with a more conservative estimate.

20 books per year means that from today until I die – depending on how things go – I could read from 0 (damn double decker on a Sunday morning) to about 800 books.

If you just refer to some “1000 books to read before you die” articles, you see that I already can’t read 200 of them (I think I can safely remove the “Love” section for now though), and still there are thousands of options out there. Furthermore, each year new books worth reading are published, let’s say between 5 and 10? This means that even if I only want to read books that are really worth the time, I can pick only about 600 existing books now. 500 makes it a nicer number.

So the question is: “What are the 500 books that are worth reading before I die?”

What do you suggest?

(hint: consider that 40 good books will keep me quiet for a couple of years, so no need to completely drain your energies on the other 460 for the moment).

I know what books I liked so far though: I’ll write a separate post on it.

Saturday, 16 October 2010

A forgotten note

Almost a year ago I attended "Literature and Freedom : Writers in Conversation", a discussion hosted by Tate Modern which featured, among others, Tahar Ben Jelloun.

Now I wish I had taken more notes during that discussion, but this afternoon I found one - I'd rather write it here before losing even that piece of paper:

Behind every work of fiction lies a tragedy.

It's something you may easily believe after reading "This blinding absence of light", which really impressed me this Spring and which I approached as a long work of poetry rather than a novel. Here's a review from The Guardian.

Need some constructive criticism on your code? Ask perlcritic.

perlcritic is "a static source code analysis engine", a CPAN module which can also be used as a command.

An example:

$ perlcritic -severity 3 src/AModule.pm

Subroutine does not end with "return" at line 24, column 1. See page 197 of PBP. (Severity: 4)

Return value of eval not tested. at line 32, column 5. You can't depend upon the value of $@/$EVAL_ERROR to tell whether an eval failed.. (Severity: 3)

As you can see, you can get extremely useful information about the sanity of your code and how well it complies with good coding practices. There are 5 severity levels: running at level 5 is the gentlest setting, because it reports only the most severe violations, while level 1 reports everything.

Most Policy modules are based on Damian Conway's book Perl Best Practices (PBP), but perlcritic is not limited to it.

I run perlcritic (typically at severity 3) automatically on all the source code files, every time I run the unit tests and/or build a package, to get valuable feedback as soon as possible after code changes. Since the same build scripts are used by a CI tool, I'm tempted to make the build fail if policies at level 4 or 5 are violated (as I do whenever even a single unit test fails), but I'm still not sure.

Anyway I strongly recommend this tool to anyone writing perl code, even if it's just scripts with a few lines (which are very likely to grow in the future, as a law on the increase of code complexity over time - which I can't recall now - says ;-) ).
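
To make the two messages above more concrete, here is a small sketch of code written so that neither complaint should apply. The function names are invented for the example, and I'm assuming the two messages come from the Subroutines::RequireFinalReturn and ErrorHandling::RequireCheckingReturnValueOfEval policies.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts a string of digits, dies on anything else.
sub parse_port {
    my ($text) = @_;
    if ( $text !~ /\A\d+\z/ ) {
        die "not a port number: $text\n";
    }
    return $text + 0;    # explicit final return
}

# Check what eval itself returns instead of inspecting $@ afterwards.
sub safe_parse_port {
    my ($text) = @_;
    my $port;
    my $ok = eval { $port = parse_port($text); 1 };
    if ( !$ok ) {
        warn "parse failed: $@";
        return;          # undef in scalar context
    }
    return $port;
}

my $good = safe_parse_port('8080');
my $bad  = safe_parse_port('http');

print "good: $good\n";
print 'bad: ', ( defined $bad ? $bad : 'rejected' ), "\n";
```

The trailing 1 inside the eval block is what makes $ok true on success, so a failure is detected even if something clears $@ before you look at it.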

Personally, even if I don't agree 100% on what PBP recommends, I'd rather follow it and have a documented approach than do what I prefer (maybe just because I'm used to it) without a specific, well-documented reason.

If you just want to have a quick look, upload some perl code here, and see what feedback you get.

A note on the term constructive criticism: I chose it in the title as a mere translation from something I could have written in Italian, but then reading its definition on Wikipedia I think I've learned something more. Here's a quote:

Constructive criticism, or constructive analysis, is a compassionate attitude towards the person qualified for criticism. [...] the word constructive is used so that something is created or visible outcome generated rather than the opposite.

Wednesday, 6 October 2010

Merging hashes with Perl - easy but (maybe) tricky

Merging hashes with Perl is extremely easy: you can just "list" them.

For example try this:

use strict;
use warnings;

use Data::Dumper;

my %hash1 = ('car' => 'FIAT', 'phone' => 'Nokia');
my %hash2 = ('laptop' => 'Dell', 'provider' => 'Orange');

my %mergedHash = (%hash1, %hash2);

print "hash1: " . Dumper(\%hash1);
print "hash2: " . Dumper(\%hash2);
print "merged hash: " . Dumper(\%mergedHash);

You'll see that %mergedHash contains the result of merging the two hashes.

This is the simplest but not the most efficient way to do it: here the Perl Cookbook presents a different method, which uses less memory - interesting if you're merging big hashes (and have memory constraints).
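
As far as I remember, the memory-friendly method copies one key/value pair at a time with each, instead of flattening both hashes into one temporary list. What follows is my own sketch of that idea, not a quotation from the book:

```perl
use strict;
use warnings;

my %hash1 = ( car    => 'FIAT', phone    => 'Nokia' );
my %hash2 = ( laptop => 'Dell', provider => 'Orange' );

# Copy pair by pair, so no intermediate list holding everything is built.
my %merged;
for my $source ( \%hash1, \%hash2 ) {
    while ( my ( $key, $value ) = each %{$source} ) {
        $merged{$key} = $value;    # on duplicate keys the later hash wins
    }
}

print join( ', ', map { "$_=$merged{$_}" } sort keys %merged ), "\n";
# prints: car=FIAT, laptop=Dell, phone=Nokia, provider=Orange
```

Both approaches behave the same way with duplicate keys: the value from the hash that comes later overwrites the earlier one.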

Now the tricky bit is that you can have this merge even when maybe you don't expect it...

For example add this code to the previous lines:

useHashes(%hash1, %hash2);

sub useHashes {
    my(%hashInput1, %hashInput2) = @_;

    print "hashInput1: " . Dumper(\%hashInput1);
    print "hashInput2: " . Dumper(\%hashInput2);
}

What would you expect? Try it out.

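What actually happens is that the two hashes are flattened into a single list of keys and values when useHashes() is called. Inside the function %hashInput1 absorbs the whole list, so it ends up holding the merged hash, while %hashInput2 is left empty. Everything works as if useHashes() received a single argument (the merged hash), instead of two separate arguments (the two original hashes).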
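
If you do want the function to see two distinct hashes, one common way is to pass references instead. A minimal sketch follows; use_hash_refs is a name I've made up for the example.

```perl
use strict;
use warnings;

my %hash1 = ( car    => 'FIAT', phone    => 'Nokia' );
my %hash2 = ( laptop => 'Dell', provider => 'Orange' );

# Each reference is a single scalar, so nothing gets flattened together.
sub use_hash_refs {
    my ( $first, $second ) = @_;

    print 'first: ',  join( ', ', sort keys %{$first} ),  "\n";
    print 'second: ', join( ', ', sort keys %{$second} ), "\n";

    # Return the sizes so the caller can check what was received.
    return ( scalar keys %{$first}, scalar keys %{$second} );
}

my ( $first_size, $second_size ) = use_hash_refs( \%hash1, \%hash2 );
# prints:
# first: car, phone
# second: laptop, provider
```

The function now works on the caller's hashes rather than on copies, so any change made through $first or $second is visible outside.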

Monday, 27 September 2010

Configuration files: an important assumption debian makes

It took me longer than expected (and in fact someone found it for me) to work out why a file installed by a debian package and not listed inside debian/conffiles was handled as a standard configuration file.

The file is installed in a subdirectory of /etc/, so intuitively you may assume it must be considered a configuration file, although I hadn't seen this clearly stated in a document until I read this in the Debian New Maintainer's Guide, chapter 5:

Since debhelper V3, dh_installdeb(1) will automatically flag any files under the /etc directory as conffiles, so if your program only has conffiles there you do not need to specify them in this file. For most package types, the only place there is (and should be conffiles) is under /etc and so this file doesn't need to exist.

The conclusion is that if you install a file in a subfolder of /etc/, you don't need to list it inside debian/conffiles.

Saturday, 25 September 2010

Order of Ignorance

This is a concept that I've always found interesting.

This evening I recalled it and found a blog article with other classifications of order of ignorance.

This is the version I already knew:

0OI: I have Zeroth Order Ignorance (0OI) when I know something and can demonstrate my lack of ignorance in some tangible form, such as by building a system that satisfies the user.

1OI: Lack of Knowledge. I have First Order Ignorance (1OI) when I don’t know something and I can readily identify that fact.

2OI: Lack of Awareness. I have Second Order Ignorance (2OI) when I don’t know that I don’t know something. That is to say, not only am I ignorant of something (I have 1OI), I am unaware of that fact.

3OI: Lack of Process. I have Third Order Ignorance (3OI) when I don’t know of a suitably efficient way to find out that I don’t know that I don’t know something. So I am unable to resolve my 2OI.

In my words:

Level 0: I know the problem and I can solve it because I've already done it before.

Level 1: I know the problem but I don't know how to solve it.

Level 2: I don't know I have a problem.

Level 3: I don't know how to identify if I have a problem.

The new (for me) idea I was reading about applies a similar classification to uncertainty:

Variation is where assigned cause has been eliminated and only chance cause exists within a known and understood system.

Foreseen Uncertainty is where there are identifiable risks and understood issues which affect the delivery of the project but the basic market for the deliverable is understood and the business model or go-to-market strategy is understood. However, there is sufficient uncertainty that assignable cause variation will be observed and must be dealt with through aggressive issue log management.

Unforeseen Uncertainty will feel out of control most of the time and what gets delivered won't be exactly what the customer wanted or when they wanted it. This could be because the software development is happening with a new paradigm of tools or method - when teams start using MDA or DSLs for example - but it may also occur in new markets where the business model is not understood and the degree of variation cannot be predicted. The answer is to iterate frequently and plan adaptively. It is primarily this class of project which Doug DeCarlo addresses with his Extreme Project Management method.

Finally, there is Chaos, the land where we don't know what we don't know and we are trying to find out - neither the market, the business model, the customer base, the product features, or the technology are understood. It's the land of research projects.

In my words:

Variation: No uncertainty. You know the market and what to push on it.

Foreseen uncertainty: You know what problems can arise and you're ready to face them.

Unforeseen uncertainty: What you deliver is different from what you planned because uncertainties moved your scope.

Chaos: You don't know what to do and how to do it.

If I could decide, I'd choose to work in a scenario where I have Order of Ignorance 1 (I know the problem but I need to learn how to solve it) and Foreseen Uncertainty, as I find it the most interesting to be in.

I know what your question might be at this point: "It's Saturday night and the most exciting thing you can do is... write this post?".

Well, you can classify your plans for a Saturday night in 4 different categories...

Tuesday, 10 August 2010

Programming: the first thing to learn

Many of us have taken courses and attended lectures and talks on programming. Many of us have read programming books (even many of them), and regularly read articles and blogs about programming.

Have you ever thought of what should have been the first thing to learn in programming? The usual "variables, operators, data structures, classes, templates, metaprogramming, ..." or something else?

I've come to the conclusion that the most important thing, the first thing to learn, the one that if you're not familiar with you shouldn't even keep studying programming, is the answer to this question: "How can I verify my code works?".

If I were a teacher, right before writing my name on the blackboard (yes, this is how I like to imagine it), I'd write this: "How can I verify my code works?".

And I'd spend the entire lesson waiting for someone to come up with a reasonable solution. If nobody got close to it, I'd assign it as an essay for the following lesson.

The concept I want to stress here is simple: The Code You Write Must Be Automatically Testable.

Emphasis on: Testable.

Emphasis on: Automatic.

1. At any given moment, you must be able to launch a script and verify that your code works as expected, while the result of the verification is automatically checked (refer to Unit Testing). Better if the script is fast.

2. Before changing your code, you must first write the tests to verify your changes will do what you expect (this is called TDD, Test Driven Development - there are many interesting and well written articles on this topic). Those tests will fail: your job will be making them pass.

3. You must not commit changes when some of the tests fail.

4. You must not build a package when some of the tests fail.

5. You must not be happy until all your tests pass.

(As a collateral note, you should beware of programming languages that don't provide mature and well maintained tools for unit testing.)
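
To give an idea of how little is needed to get started, here is a minimal sketch of an automated test in Perl using Test::More, which ships with perl. The merge_hashes function is a made-up example; in real code it would live in its own module and the test in a file under t/.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical function under test.
sub merge_hashes {
    my ( $first, $second ) = @_;
    return { %{$first}, %{$second} };
}

my $merged = merge_hashes( { a => 1 }, { b => 2 } );
is( $merged->{a}, 1, 'keeps keys from the first hash' );
is( $merged->{b}, 2, 'keeps keys from the second hash' );

my $overridden = merge_hashes( { a => 1 }, { a => 3 } );
is( $overridden->{a}, 3, 'second hash wins on duplicate keys' );
```

Running the file with perl or prove prints one "ok" line per check and fails loudly if any expectation is not met, so nobody has to read the output to know whether the code works.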

What happens if you don't follow this approach? Well, you will be able to write your code, have fun with it, pass your exams or even get paid for it, but sooner or later you'll end up with one or more of these questions:

"When did I break this?"

"How come I didn't realize this wasn't working?"

"Now what else am I going to break if I add this new feature?"

"How can I test those 10K lines of code, without touching them?"

"How come the testing guys never invite me for a beer?"

Inside the brilliant Head First - Software Development, you find the description of this simple process:

1. Write the tests

2. Make the code work

3. Refactor

4. Back to 1

On the Perl side, "Extreme Perl" gives a pragmatic view of testing.

Things start to get more complicated when the application you're writing has to interact with other "external" entities, which may be even remote. I'll write on this in the future.

Friday, 6 August 2010

Perl modules: from a new idea to a debian package in one minute

Some time ago I wrote something about debianizing a perl module.

If you're applying it not to a tar downloaded from CPAN (or a third party module in general) but to your own module, then you may find it useful to build the module's dir structure using module-starter (installed with libmodule-starter-perl).

The CPAN page for module-starter contains a simple description on how to use it and this post has interesting comments on even easier ways to achieve the same result (using dzil for example).

In summary this is what you need to do:

$ module-starter --module=My::AModule --author="Giacomo Vacca" --email="giacomo.vacca@email.email" --builder=Module::Install

Inside the created dir (My-AModule):

$ perl Makefile.PL

$ make

$ make test

(I'm not running make install on purpose)

Outside of My-AModule:

$ dh-make-perl My-AModule/

(builds the debian dir using the current configuration)

Inside My-AModule:

$ debuild

or, if you don't want to sign the deb:

$ debuild -us -uc

Then I suggest having dput configured and uploading the package to your repo.

Friday, 16 April 2010

Building a debian package for a CPAN module, and a known issue

The good news is that building a debian package from the source of a CPAN module is easy, thanks to the dh-make-perl tool.

The bad news is that the configuration of dh-make-perl can lead to a debian package containing unneeded files and, as a side effect, prevent package installation because the same files belong to more than one package.

For example you can end up with something like:

dpkg: error processing /var/cache/apt/archives/scnlibpoe-filter-stomp-perl_0.01_all.deb (--unpack):
trying to overwrite `/usr/lib/perl/5.8/perllocal.pod', which is also in package scnlibpoe-component-client-stomp-perl

Somebody suggests a change in the configuration file used by dh-make-perl.

The same result can be achieved by applying this change to your debian/rules file:

--- debian/rules (revision 32211)
+++ debian/rules (working copy)
@@ -51,8 +51,11 @@
- rmdir --ignore-fail-on-non-empty --parents $(TMP)/usr/lib/perl5
+ rm -rfv $(TMP)/usr/lib

Tuesday, 13 April 2010

Creating (easily) a local debian repo

The goal is to make my life easier when the same package may go to different distributions.

This is not useful per se, but it becomes so when you use custom distribution names to distinguish separate product branches.

Let's assume you have an application app, which has different flavours, for example alpha and beta.

You may want to have two separate debian repos for app, one called alpha and one called beta.

Hosts within environment A may refer to the alpha repo, while hosts in another environment, let's say environment B, may want to refer to the beta repo.

This keeps the small or big differences in app isolated.

Now, assume app reaches a point when the only difference between A and B is the distribution name, so you can have the same version for the alpha and beta repo: how can you avoid building the package twice? Is there a way to make both repos happy?

These are the questions that led me to the need of having a local debian repo, to experiment a little on this topic (I know I should read all the Debian Policy docs and come up with one, single, incontrovertible answer. But I prefer the empirical way, sometimes).

So far I've just created a local repo using reprepro, following this easy tutorial. I'll update this post with a simple procedure to build your own repo and some results on how to manage the distributions (although I must admit that at this moment I expect a simple solution to be:

1. Just stick both distributions inside debian/changelog

2. Add a Provides directive to debian/control

)

Giacomo Vacca