Monday 13 February 2012

A Practical Approach to the RSA Algorithm

So you're curious about the RSA algorithm, you like Internet technologies and mathematics?
This is a short article on how the RSA Algorithm works, with practical examples.

Preface and Disclaimer
Obviously the article itself and the code here presented are suitable for a learning activity only and in case you use them, you do it at your own responsibility.

I based this on the 1977's article from Rivest, Shamir and Adleman, (from which initial letters the algorithm takes the name), and ran the code on a Mac with python 2.6.1.

The code presented here is also not optimized. The reader is invited to suggest better implementations (in languages other than python as well).


Mathematics concepts you need to know
- What a 'prime number' is. (A integer number which can be divided evenly only by 1 and itself. e.g. 1, 2, 3, 5, 7, 11, ..., 47, ...)
- What the 'modulus' (here referred as 'mod') operation is. ('A mod B' is the reminder of A divided B. e.g. 7 mod 3 = 1)
- What 'factoring a number' means. (Finding the list of prime numbers that multiplied give that number. e.g. Factoring 12 leads to 2, 2, 3, because 2 * 2 * 3 = 12)

Usage of RSA
This algorithm is used to define a cryptographic system which uses a Public Key to encrypt messages, and a Private Key to decrypt them.

If Alice wants to send encrypted data to Bob, Alice encrypts the data with Bob's Public Key.
Only Bob, with his Private Key, can decrypt the data.

Security is the core of this algorithm, for which, if the keys are properly chosen, it's not possible, or so computationally hard to become impractical, to decrypt the data unless you have Bob's private key.

For this reason, the Public Key can be exchanged from Bob to Alice through an insecure transport, e.g. public Internet.


The mathematics behind it
The Public Key is a pair of integer numbers: e and n. (You can associate 'e' with 'e'ncryption).
The Private Key is a pair of integer numbers: d and n, (You can associate 'd' with 'd'ecryption).

'n' is computed as the product of two large prime numbers, p and q.
'd' is another prime number, bigger than p and q and 'relatively prime to p and q', for which exists an integer 'e', with this property:

e * d = 1 (mod fi)

where

fi = (p - 1) * (q - 1)

that is, the product of d and e, mod fi, equals 1, and said in other terms, 'd is the multiplicative inverse of e, modulus fi'.

'd is relatively prime to p and q' means that the greatest common divisor between d and fi is 1.

For the demonstration please refer to the original article and its related references.

If these conditions are met, then given a number M (< n) to be encrypted, Alice can encrypt it by computing:

C = (M exp e) mod n
Alice then sends C to Bob.
Once Bob receives C, he can decrypt it (because he knows the Private Key (d, n)) with:

D = (C exp d) mod n

D (the decrypted number) equals M (the original number).

As you can see, after having chosen the key components (n, d and e), encryption and decryption are conceptually easy (although consider that security relies on n being an extremely large number - say 200 digits).


Why is RSA secure

The algorithm is secure because to compute d, the Private Key, it's not sufficient to know 'n' and 'e' (which are included in the Public Key).
This is because d can be computed only knowing p and q and to know them, an attacker should retrieve them from n, i.e. it should 'factor n'.
It can be proved that 'factor n', when p and q are large, is so hard to be considered impossible (in a reasonable amount of... years).

A practical example
I'll use small values for p and q, just for the purpose of having simple computations that can be done with a normal calculator.
Again, remember that the security here relies on using very large prime numbers to compute n.

In fact, let's take the example from the original article:

p = 47
q = 59
d = 157
n = p * q = 2773
fi = (p - 1) * (q - 1) = 46 * 58 = 2668

With this values,
e = 17

So say Alice wants to encrypt the number 920 (associated through a dictionary to a combination of characters for example):

C = (M exp e) mod n = 948

Alice will then send 948 to Bob.

Bob will try to decrypt C with d:

D = (C exp d) mod n = 920

D = M, success!


How to compute d and e
The tricky bit, as you may have guessed, was to compute d and e.
Given p and q, and hence n, d was chosen from a set of candidates (the 'keyspaceƬ').
e was computed from d, p and q.

To experiment on this, first I wrote the algorithm to compute e given p, q and k:


def compute_bn(xn, x0, x1):
 an = -1
 bn = (xn - (an * x0)) / x1
 return bn

def compute_e(p, q, d):
 x0 = (p - 1) * (q - 1)
 x1 = d

 xnm2 = x0
 xnm1 = x1

 while True:
   xn = xnm2 % xnm1

   if (xn == 0):
     return compute_bn(xnm1, x0, x1)

  xnm2 = xnm1
  xnm1 = xn

 return 0



then I cycled through prime numbers in a certain range looking for values of d which gave an integer value for e:


#! /usr/bin/python

from rsa import *

def find_p_q_d_in_limit(starting_p, starting_q, limit):
 M = 920
 for p in range(starting_p, limit):
   if is_prime(p):
     for q in range(starting_q, limit):
       if is_prime(q):
          for d in range(starting_q + 1, limit):
            if (d > p and d > q and is_prime(d)):
              if (compute_and_check(p, q, d, M) == 0):
                print "p: ", p, ", q: ", q, ", d: ", d

find_p_q_d_in_limit(31, 37, 1000)


where compute_and_check(p, q, d, M) is:


def compute_and_check(p, q, d, M):
 n = p * q
 e = compute_e(p, q, d)
 Mexpe = M ** e
 C = Mexpe % n
 D = (C ** d) % n

 if M == D:
   print "p: ", p, ", q: ", q, ", d: ", d,
   return 0

 return 1



and is_prime(N):


def is_prime(N):
  i = 2
  while (i < N):
    if (N % i == 0):
      return False
    else:
      i += 1
  return True 



With small values of p and q I've got a list of possible candidates for d, e.g.:

[...]
p:  47 , q:  59 , d:  157
p:  47 , q:  61 , d:  251
p:  47 , q:  79 , d:  97
p:  47 , q:  97 , d:  631
p:  47 , q:  97 , d:  773
p:  47 , q:  101 , d:  107
[...]

and then given p = 47 and q = 59 used d = 157 as in the original example.

The key of this process is the computation of e given d, p and q, which was done with these two functions:


def compute_bn(xn, x0, x1):
 an = -1
 bn = (xn - (an * x0)) / x1
 return bn

def compute_e(p, q, d):
 x0 = (p - 1) * (q - 1)
 x1 = d

 xnm2 = x0
 xnm1 = x1

 while True:

   xn = xnm2 % xnm1

   if (xn == 0):
     return compute_bn(xnm1, x0, x1)

  xnm2 = xnm1
  xnm1 = xn

 return 0



compute_e(p, q, d) gives the value of e using a variation of the Euclidean algorithm to compute the Greatest Common Divisor, as suggested by R, S and A in their article.

Encrypting
So at the end given you have a number to encrypt (M) and a public key (e, n) you can just call:

$ python encrypt.py 920 17 2773

encrypt.py


#! /usr/bin/python

 import sys
 from rsa import *

 M = int(sys.argv[1]);
 e = int(sys.argv[2]);
 n = int(sys.argv[3]);

 C = encrypt_M(M, e, n)

 print "C: ", C


with a very simple function (defined inside the module called rsa.py):


def encrypt_M(M, e, n):
 return pow(M, e, n)


(note here that I'm using a standard python function, pow(), which does all the 'M ** e mod n' computation for us)

Decrypting
When you have an encrypted number C, and the private key (d, n) you can just call:

$ python decrypt.py 948 157 2773

decrypt.py:

#! /usr/bin/python

 import sys
 from rsa import *

 C = int(sys.argv[1]);
 d = int(sys.argv[2]);
 n = int(sys.argv[3]);

 D = decrypt_C(C, d, n)

 print "D: ", D


with a very simple function (defined inside the module called rsa.py)


def decrypt_C(C, d, n):
 return pow(C, d, n)


Input is not sanity checked: you can/should add it depending on your needs.

Feedback is welcome!

No comments:

Post a Comment

Wireshark setting to interpret UDP as RTP automatically

 Before I forget again, a Wireshark setting that can help saving time by trying to interpret any UDP as RTP, if possible: Analyze --> Ena...