Giacomo Vacca

Friday, 19 October 2012

ejabberd clustering: a clever idea for Mnesia replication setup

The process of configuring a cluster of ejabberd entities serving the same domain is apparently simple (considering the ejabberd Installation and Operation Guide).

Assuming you have a node running, you create a second node with the same Erlang cookie, then setup Mnesia replication between the two. Assuming both ejabberd instances have a similar configuration setup and can connect to each other over the network, you're pretty much done.
This can be generalized and made N times to build your cluster.

ejabberd uses Mnesia as its internal DB. Although you can easily move to MySQL as storage backend, it's important to note that Mnesia is still necessary to successfully build the cluster.

ejabberdctl is the control script that can be used to start, stop, restart ejabberd. It can be also used to attach a debug console to a running ejabberd instance, or execute any command exposed by an ejabberd module.

The idea presented in Easy ejabberd clustering procedure is to extend ejabberdctl with an additional command ("attach") which hides the complexity of connecting to an Erlang node and setup Mnesia replication. I've found it an interesting idea, firstly because it becomes easier to automate the clustering configuration, secondly because goes in the right direction of hiding the setup complexities behind a single control tool.

Check the table size of a MySQL DB

I wanted to create a MySQL dump of a small DB, so run mysqldump and realized that what I was expecting to take a few MB was instead half a GB.
Where did things go wrong? Which table is storing such an unexpected amount of data?

Googling around I've found a few interesting posts on how to check the size of a table, and then selected this one.

My suggested approach is:

SELECT TABLE_NAME, table_rows, data_length, index_length, 
round(((data_length + index_length) / 1024 / 1024),2) "Size in MB"
FROM information_schema.TABLES WHERE table_schema = "schema_name";

where schema_name is the DB you're interested in.

I've then immediately spotted the "offending" table :-)

Monday, 9 April 2012

debian - Managing maintainer scripts for packages with multple .debs

If you've ever installed a package from command line on Linux, you must have noticed two main prompts related to package configuration: one asking what to do with installed configuration files with local changes, and one providing feedback right after the installation/upgrade/removal/purge has completed.

Under Debian the latter was most probably the postinst script, one of the Package Maintainer Scripts which is executed after installation and configuration.

These diagrams are very useful to understand what happens to the Maintainer Scripts in different circumstances: first installation, upgrade, removal, purge.

Their name is probably self-explanatory: preinst, postinst, prerm, postrm. Each of them takes zero or more arguments depending on the scenario. It can help to understand that those are really just scripts executed by dpkg - and typically they are shell scripts, sometimes with interactive prompts.

If you're building your own packages, you surely already have one of more of those scripts in the source debian/ directory. They are automatically included in the .deb by dpkg-buildpackage (or any of its wrappers) during the build.

What you may find interesting - and this is the purpose of this article - is that if you're building multiple .deb packages from the same debian/ directory (i.e. using the same debian/control makefile), having for example a single postinst script is not sufficient. Only the first built .deb will contain it, while the others won't.
The symptom in this example is that when installing the packages other than the first built, you won't see any of the prompts, feedback or actions you are delegating to postinst.

To fix this, it's sufficient to include for example a postinst script per package:
debian/control
debian/rules
[...]
debian/package_1.postinst
debian/package_2.postinst
debian/package_3.postinst

In case you're wondering, yes, you can have a single postinst script and just create a symbolic link to it for each package.

Tuesday, 3 April 2012

CANCELing a call - Trip-wires for the SIP fans

SIP is a relatively simple, text-based, human readable protocol that is now the standard de facto for VoIP signalling.
The protocol though (in my opinion!) is a little tricky, where typically the tricks are: details.

In this first post of a series of "Trip-wires for the SIP fans", I'll talk about CANCEL.

The main concept is easy: a caller may decide to cancel a call before this is answered. To do this, it sends a CANCEL request to the called party.

What's important to know is that:
A CANCEL request relates to an INVITE request, and does not relate to the SIP dialog the request may have created (or will create).

For this reason the To header tag must be the same as the INVITE request, even if meanwhile there's been a provisional response to the INVITE creating a dialog (e.g. a 180 with a tag in the To header).

From RFC 3261, 9.1:

The following procedures are used to construct a CANCEL request. The
Request-URI, Call-ID, To, the numeric part of CSeq, and From header
fields in the CANCEL request MUST be identical to those in the
request being cancelled, including tags. A CANCEL constructed by a
client MUST have only a single Via header field value matching the
top Via value in the request being cancelled. Using the same values
for these header fields allows the CANCEL to be matched with the
request it cancels [...].

Although not so common I guess, consider that a CANCEL can be sent also for a re-INVITE (i.e. a request to update an ongoing session, e.g. to add video). In this case there will be a tag in the To header, as the INVITE is an in-dialog request and the CANCEL just refers to it.

And if you use sipp unfortunately no, you can't rely on the [branch] syntax to assign a branch to the Via in the CANCEL request, because sipp doesn't have perception of precedent open transactions.
If the branch in the CANCEL's Via header is different than the one in the INVITE top most Via, the UAS will not be able to match the two requests and canceling will (most likely) fail.

Sunday, 4 March 2012

Script vs Program - A pragmatic view

First, it'd be useless to talk about the distinction between a "scripting language' and a 'programming language', because it's clear that the same language can be used in different contexts and environments, be interpreted in some cases or compiled in others.

The only distinction worth discussing in my opinion is whether a portion of source code is a script or a program.

A very easy conclusion can be found in "Building Skills in Python", S. F. Lott:

The “scripting” distinction is an operational feature of POSIX-compliant operating systems. Files which begin with the ‘#!/path/to/interpreter’ will be used as scripts by the OS. They can be executed from the command-line because the interpreter is named in the first line of the file.
Languages like Java, C and C++ do not have this feature; these files must be compiled before they can be executed.

So what happens if you have, say, a couple of thousands lines of Perl code, distributed in about a hundred classes, using a few CPAN modules?

It'll be interpreted, not compiled, but would you present it as a script? I don't think so.

Also, the "shebang" (#!/path/to/interpreter) is not really the point, is it? You can omit it and then specify which interpreter has to be used and still have an interpreted execution.

See this other definition (Think Python - How to Think Like a Computer Scientist, A. Downey):

script: A program stored in a file (usually one that will be interpreted)

As superficial as this may seem, I think this is actually getting to the point, so here comes my personal definition of "script":

A script is a sequence of instructions, stored in a file, which can be directly executed by an interpreter.

Of course this has be taken in an honest and pragmatic way. A good software developer won't put the thousands of lines of Perl mentioned above in a single file, even if it's absolutely legal.

When a script starts to become so big (say more than a page? - 200 lines of code or so) to require the inclusion of other files, depending on my definition that becomes a program, even if it's still interpreted.

What do you think?

Monday, 13 February 2012

A Practical Approach to the RSA Algorithm

So you're curious about the RSA algorithm, you like Internet technologies and mathematics?
This is a short article on how the RSA Algorithm works, with practical examples.

Preface and Disclaimer
Obviously the article itself and the code here presented are suitable for a learning activity only and in case you use them, you do it at your own responsibility.

I based this on the 1977's article from Rivest, Shamir and Adleman, (from which initial letters the algorithm takes the name), and ran the code on a Mac with python 2.6.1.

The code presented here is also not optimized. The reader is invited to suggest better implementations (in languages other than python as well).

Mathematics concepts you need to know
- What a 'prime number' is. (A integer number which can be divided evenly only by 1 and itself. e.g. 1, 2, 3, 5, 7, 11, ..., 47, ...)
- What the 'modulus' (here referred as 'mod') operation is. ('A mod B' is the reminder of A divided B. e.g. 7 mod 3 = 1)
- What 'factoring a number' means. (Finding the list of prime numbers that multiplied give that number. e.g. Factoring 12 leads to 2, 2, 3, because 2 * 2 * 3 = 12)

Usage of RSA
This algorithm is used to define a cryptographic system which uses a Public Key to encrypt messages, and a Private Key to decrypt them.

If Alice wants to send encrypted data to Bob, Alice encrypts the data with Bob's Public Key.
Only Bob, with his Private Key, can decrypt the data.

Security is the core of this algorithm, for which, if the keys are properly chosen, it's not possible, or so computationally hard to become impractical, to decrypt the data unless you have Bob's private key.

For this reason, the Public Key can be exchanged from Bob to Alice through an insecure transport, e.g. public Internet.

The mathematics behind it
The Public Key is a pair of integer numbers: e and n. (You can associate 'e' with 'e'ncryption).
The Private Key is a pair of integer numbers: d and n, (You can associate 'd' with 'd'ecryption).

'n' is computed as the product of two large prime numbers, p and q.
'd' is another prime number, bigger than p and q and 'relatively prime to p and q', for which exists an integer 'e', with this property:

e * d = 1 (mod fi)

where

fi = (p - 1) * (q - 1)

that is, the product of d and e, mod fi, equals 1, and said in other terms, 'd is the multiplicative inverse of e, modulus fi'.

'd is relatively prime to p and q' means that the greatest common divisor between d and fi is 1.

For the demonstration please refer to the original article and its related references.

If these conditions are met, then given a number M (< n) to be encrypted, Alice can encrypt it by computing:

C = (M exp e) mod n
Alice then sends C to Bob.
Once Bob receives C, he can decrypt it (because he knows the Private Key (d, n)) with:

D = (C exp d) mod n

D (the decrypted number) equals M (the original number).

As you can see, after having chosen the key components (n, d and e), encryption and decryption are conceptually easy (although consider that security relies on n being an extremely large number - say 200 digits).

Why is RSA secure

The algorithm is secure because to compute d, the Private Key, it's not sufficient to know 'n' and 'e' (which are included in the Public Key).
This is because d can be computed only knowing p and q and to know them, an attacker should retrieve them from n, i.e. it should 'factor n'.
It can be proved that 'factor n', when p and q are large, is so hard to be considered impossible (in a reasonable amount of... years).

A practical example
I'll use small values for p and q, just for the purpose of having simple computations that can be done with a normal calculator.
Again, remember that the security here relies on using very large prime numbers to compute n.

In fact, let's take the example from the original article:

p = 47
q = 59
d = 157
n = p * q = 2773
fi = (p - 1) * (q - 1) = 46 * 58 = 2668

With this values,
e = 17

So say Alice wants to encrypt the number 920 (associated through a dictionary to a combination of characters for example):

C = (M exp e) mod n = 948

Alice will then send 948 to Bob.

Bob will try to decrypt C with d:

D = (C exp d) mod n = 920

D = M, success!

How to compute d and e
The tricky bit, as you may have guessed, was to compute d and e.
Given p and q, and hence n, d was chosen from a set of candidates (the 'keyspaceì').
e was computed from d, p and q.

To experiment on this, first I wrote the algorithm to compute e given p, q and k:


def compute_bn(xn, x0, x1):
 an = -1
 bn = (xn - (an * x0)) / x1
 return bn

def compute_e(p, q, d):
 x0 = (p - 1) * (q - 1)
 x1 = d

 xnm2 = x0
 xnm1 = x1

 while True:
   xn = xnm2 % xnm1

   if (xn == 0):
     return compute_bn(xnm1, x0, x1)

  xnm2 = xnm1
  xnm1 = xn

 return 0

then I cycled through prime numbers in a certain range looking for values of d which gave an integer value for e:


#! /usr/bin/python

from rsa import *

def find_p_q_d_in_limit(starting_p, starting_q, limit):
 M = 920
 for p in range(starting_p, limit):
   if is_prime(p):
     for q in range(starting_q, limit):
       if is_prime(q):
          for d in range(starting_q + 1, limit):
            if (d > p and d > q and is_prime(d)):
              if (compute_and_check(p, q, d, M) == 0):
                print "p: ", p, ", q: ", q, ", d: ", d

find_p_q_d_in_limit(31, 37, 1000)

where compute_and_check(p, q, d, M) is:


def compute_and_check(p, q, d, M):
 n = p * q
 e = compute_e(p, q, d)
 Mexpe = M ** e
 C = Mexpe % n
 D = (C ** d) % n

 if M == D:
   print "p: ", p, ", q: ", q, ", d: ", d,
   return 0

 return 1

and is_prime(N):


def is_prime(N):
  i = 2
  while (i < N):
    if (N % i == 0):
      return False
    else:
      i += 1
  return True

With small values of p and q I've got a list of possible candidates for d, e.g.:


[...]
p:  47 , q:  59 , d:  157
p:  47 , q:  61 , d:  251
p:  47 , q:  79 , d:  97
p:  47 , q:  97 , d:  631
p:  47 , q:  97 , d:  773
p:  47 , q:  101 , d:  107
[...]

and then given p = 47 and q = 59 used d = 157 as in the original example.

The key of this process is the computation of e given d, p and q, which was done with these two functions:


def compute_bn(xn, x0, x1):
 an = -1
 bn = (xn - (an * x0)) / x1
 return bn

def compute_e(p, q, d):
 x0 = (p - 1) * (q - 1)
 x1 = d

 xnm2 = x0
 xnm1 = x1

 while True:

   xn = xnm2 % xnm1

   if (xn == 0):
     return compute_bn(xnm1, x0, x1)

  xnm2 = xnm1
  xnm1 = xn

 return 0

compute_e(p, q, d) gives the value of e using a variation of the Euclidean algorithm to compute the Greatest Common Divisor, as suggested by R, S and A in their article.

Encrypting
So at the end given you have a number to encrypt (M) and a public key (e, n) you can just call:

$ python encrypt.py 920 17 2773

encrypt.py


#! /usr/bin/python

 import sys
 from rsa import *

 M = int(sys.argv[1]);
 e = int(sys.argv[2]);
 n = int(sys.argv[3]);

 C = encrypt_M(M, e, n)

 print "C: ", C

with a very simple function (defined inside the module called rsa.py):


def encrypt_M(M, e, n):
 return pow(M, e, n)

(note here that I'm using a standard python function, pow(), which does all the 'M ** e mod n' computation for us)

Decrypting
When you have an encrypted number C, and the private key (d, n) you can just call:

$ python decrypt.py 948 157 2773

decrypt.py:


#! /usr/bin/python

 import sys
 from rsa import *

 C = int(sys.argv[1]);
 d = int(sys.argv[2]);
 n = int(sys.argv[3]);

 D = decrypt_C(C, d, n)

 print "D: ", D

with a very simple function (defined inside the module called rsa.py)


def decrypt_C(C, d, n):
 return pow(C, d, n)

Input is not sanity checked: you can/should add it depending on your needs.

Feedback is welcome!

Saturday, 5 March 2011

It's not that hard to manage expectations (with Perl)

Developers with a background in Ruby on Rails and PHP are familiar with the concepts of mocking objects and setting expectations on them.

The good news is that these powerful techniques for unit testing are available for Perl as well. Should I add you can find them on CPAN?

Before an example though, just a simple explanation about the topic.

Unit testing “is a method by which individual units of source code are tested to determine if they are fit for use” (Wikipedia). It’s a common practice to perform unit testing in isolation; in other words you focus testing on the source code, limiting as much as possible the interaction across modules or systems.

It’s almost always practically impossible to test a class without instantiating other classes on which it depends or interacts. What can be done is mocking objects: creating “empty objects” that emulate the external behaviour of real objects. They must be able to “fool” the class under test and allow the creation of an exhaustive set of tests around it.

Since the class under tests “expects” the other classes to do something, here comes the term expectation: the unit test expects that the class under test uses the mock object by calling a specific method, optionally in a specific order and optionally with specific arguments and return values.

An example with PHPUnit, where the class under test Person depends on a class Company, which is mocked:

$company = $this->getMock(‘Company’);
$person = new Person();
$person->setCompany($company);
$company->expects( $this->atLeastOnce() )->method(‘giveLaptop’);
$this->assertTrue($person->startFirstDay());

In this silly example, we are testing Person, and we want to verify that on the first day of work with a company, that person has a laptop assigned.
Person is actually instantiated, but Company, possibly a bigger and more complicated class, with other dependencies, is just mocked.
What we check is that inside Person::startFirstDay(), there’s at least one call to Company::giveLaptop().

As mentioned at the beginning of this article, expectations are available on Perl too, with the module Test::Expectation. The equivalent of the example before could be:

my $person = new Person();
my $company = Test::MockObject->new();

$person->setCompany($company);

it_is_a('Company');

it_should "give a laptop", sub {
Company->expects('giveLaptop');
is(1, $person->startFirstDay());
};

(Note that Test::Expectation uses internally Test::More with a plan, so if you’re using Test::Expectation AND Test::More you can’t set a plan with the latter, as perl will complain that there’s already a plan set by the former)

Unfortunately Test::Expectation is not available as a standard debian package, so if needed you may debianize it (just download, untar, dh-make-file and debuild. I wrote this article with some basic instructions).

Disclaimer:
- The code in this article can't work as is, i.e. it needs modules to be installed and configured, and more lines to include those modules.
- The code in this article has not been tested, and it doesn't come with any warranty
- I'm not implying that a company should provide each employee with a laptop during their first day