MUST DO
-------

* add a -z option to the master to enable compression of commands and results
* get rid of Pyro, use MPI 2 instead?
* add --load-probe option to pause/resume jobs depending on load
  give a signal number as parameter, so that people can say their software
  to checkpoint
* add data node failure management
  (reschedule jobs not yet done when all commands were sent once, remove
  commands from the TODO list only once we received their output)
  this should get rid of the laggers problem and computing node failure, nice!
* add -[de]mux, then introduce data prefetching when this is done to increase
  data transfer overlap with computation

* introduce a lock to prevent multiple downloads at the same time on same dm
* the get command and its dirty lock can be done safer if we use an object
  provided by the getter, then in the same layer we can do everything
* allow to compress on put, almost same system than using checksums

* find a faster way to wait for the daemon to be started?
* start processing results before workers are finished
  do we really need locks in worker threads?
* use the same logger than DataManager.py in parallel.py instead of print
  use a shorter log format for time also
* main became a pretty big function, cut it into sub-functions so we
  can read it on one screen

NICE TO HAVE
------------
* add a distributed test and also another but parallel and distributed
  I just did one by hand, it is not so difficult using ssh and commands
  that just sleep some time then echo something else in order to mimic
  they are CPU (but not data) intensive
* think about the security of the client-server model:
  - a client shouldn't accept to execute commands from an untrusted server
    (commands could be any Unix command, including rm)
  - server shouldn't accept to give commands to untrusted clients
    (this would deplete the commands list, probably without doing the
     corresponding jobs, and this would expose the commands)
  - a client for which command execution result in errors should not be able
    to deplete the commands list (at least, we should detect they were not
    performed and reschedule them later on. At most, we should not send him
    commands too fast)
  Here is an idea for a safe communication protocol using asymmetric
  encryption:
    1) The server compress | encrypt | sign commands prior to sending them
       to workers.
       Compression could be a real time one like LZOP. This would decrease
       commands size and increase encryption quality.
    2) The workers verify signature | decrypt | decompress commands.
       If any of the above fails, it is logged and corresponding command is
       ignored.
    3) Workers can encrypt their results like the server does for commands:
       (compress | encrypt | sign).
  Here is another encryption idea for paranoid users ("one time pad" style):
    1) The server has a different random data source for each client
       (a cryptographic random number generator initialized differently for
        each client should be enough for non paranoid users)
    2) Each client also has the same random data source than the one the
       server is using to communicate with it
    3) Encryption: compress | one time pad [| sign ?]
    4) Decryption: [ verif sign? |] unpad | decompress
    5) random data must never be used twice (forward only in the random pool)
* profile with the python profiler
* add a code coverage test, python is not compiled and pychecker is too
  light at checking things
* generate tests to verify each option have an effect when used on the CLI

JUST DREAMING
-------------
* The client could be at the same time a server for other clients in order to
  scale by using hierarchical layers of servers instead of having only one
  server managing too many clients (Russian doll/fractal architecture).
  This would be only useful in a multi-cluster environment, and such bridge
  between cluster servers would be easily a bottleneck for data transfer.
  This kind of "special client" should request several jobs at a time, and
  send several results at a time (because he is the proxy for maybe an entire
  cluster, not just a "standard client").
* a little bit related with previous idea, if nodes have same speed
  and jobs require same amount of computations, nodes could request
  several jobs at the same time and do data prefetching for the next job
  while computing another
