TODO

Not everything here makes sense or will necessarily get done.

Short term:

* Fix autoconf macro so that it doesn't depend on locate.  Or maybe
   use BuildSystem.

* Change all occurrences of archive to repository.

* Make the default log in _arx/log?

* have a history of which URL's you merged from in the project tree?
   Not really needed with hashed revisions, because you just sync repos?
* archive-cache mirroring is not robust.
* Have a way to make annotate start from a particular revision.

* Make inventory non-recursive by default?  What does "cvs status" do?
* Allow use of just revision numbers ",23" when inside a project
   tree.  We have to handle all of the --dir options.
* Get rid of default archive names.  Get default archive and branch
  from the project tree.
* Use ARCHROOT or local tree instead of my-default-archive.
* Use revision ranges in a lot more places.  Something like
   foo.1.2,3-6, foo.1.2,-6, and foo.1.2,3-
   Use it in log, library, replay, get-patch, archive-cache,
   tree-cache, mirror, sig.
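
   A minimal sketch of parsing the three range forms above plus a
   single revision.  The function and type names are made up for
   illustration and are not part of ArX; an open end is returned as
   None so that the caller can decide what "start" and "latest" mean.

```python
import re
from typing import NamedTuple, Optional


class RevisionRange(NamedTuple):
    branch: str
    first: Optional[int]  # None: open at the start, as in "foo.1.2,-6"
    last: Optional[int]   # None: open at the end, as in "foo.1.2,3-"


# branch name, a comma, then one of "N", "N-M", "-M" or "N-"
_RANGE = re.compile(r"^(?P<branch>[^,]+),(?P<first>\d*)(?P<dash>-?)(?P<last>\d*)$")


def parse_revision_range(spec: str) -> RevisionRange:
    """Parse a (hypothetical) revision range such as "foo.1.2,3-6"."""
    m = _RANGE.match(spec)
    if m is None or not (m["first"] or m["last"]):
        raise ValueError(f"not a revision or revision range: {spec!r}")
    first = int(m["first"]) if m["first"] else None
    if not m["dash"]:
        # A plain "branch,N" is treated as the one-element range N-N.
        return RevisionRange(m["branch"], first, first)
    last = int(m["last"]) if m["last"] else None
    if first is not None and last is not None and first > last:
        raise ValueError(f"range runs backwards: {spec!r}")
    return RevisionRange(m["branch"], first, last)
```

   Commands such as log or replay could then share one parser instead
   of each handling the comma syntax separately.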

* figure out why commit emails are not working

* Better annotate:  Needed if we want to support weave merge.  Also
   just nice to have.

   Want to get the contributions from both sides of a merge.  E.g. we
   have

   A1
   |
   A2
   | \
   A3 B1
   |  |
   A4 B2
   | /
   A5


   From A5, we annotate through both lines (A and B).  We will get
   different results because some things were changed in A and others
   were changed in B.  Both annotations will attribute some lines to
   A5; those lines changed because of the merge itself.  We can then
   compare the annotations side by side.  If a line is attributed to
   A5 on one side and to anything else on the other, then it does not
   come from A5, so use the other revision.  If a line has both A3 and
   B1, then pick one (A?).  This might be caused by cherry-picking, so
   we need a way to figure out how to annotate cherry-picking.

   First thing we have to figure out is if a patch actually affected
   the file.  For this to work, we really have to have the inventory
   id's in the log, because the names can change.  Otherwise, we would
   have to actually download the whole patch.  Maybe we can have a
   single metadata file instead of separate log and
   (orig|mod)-(file|dir) files.

   Given that, we can see if a particular patch hunk works against the
   source file.  If it does, then attribute those lines to that patch.
   The problem arises if we have a series of patches.  How do we
   decide on an ordering?  Time?  This is starting to sound like where
   you get into darcs' problems as you try to rearrange patch
   orderings to get a conflict-free result.  Given N patches, there
   are N! different ways to arrange the ordering.  We have some
   partial ordering, but I could imagine cases where someone
   deliberately applied a patch out of order because it would not
   conflict.  Maybe the proper response in that case is "You're out of
   luck".  Or maybe make it a command-line argument.  Bah.
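
   A rough sketch of the side-by-side comparison described above.  It
   assumes each side has already been annotated into a list with one
   revision name per line of the merged file; the function name is
   made up, and preferring the A side on a tie is only the tentative
   choice from the note.  Cherry-picking is not handled.

```python
from typing import List


def combine_annotations(side_a: List[str], side_b: List[str],
                        merge_rev: str) -> List[str]:
    """Combine two per-line annotations taken through each parent of a merge."""
    if len(side_a) != len(side_b):
        raise ValueError("both annotations must cover the same lines")
    combined = []
    for a, b in zip(side_a, side_b):
        if a == b:
            # Both sides agree (this includes real merge-only changes).
            combined.append(a)
        elif a == merge_rev:
            # One side blames the merge, the other knows better.
            combined.append(b)
        elif b == merge_rev:
            combined.append(a)
        else:
            # Both sides name a different revision: pick the A side.
            combined.append(a)
    return combined
```

   For the graph above, a line annotated as A5 on the A side and B1
   on the B side ends up attributed to B1.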


* Write our own version of patch which does not complain about patches
   already applied.  Then we have most of idempotent merge.

* hooks for ediff etc.

* libcurl + libssh, or python with paramiko?

* CIA script

* Combine patch logs in the tree, so that we have a single file at
  patch-logs/branch/,logs.  Either serialize a vector of strings, or
  use a real database (sqlite?).  Need to modify mkpatch/dopatch so
  that it looks at the single file and not at the file directories,
  but it still reads/writes patches in those directories.

  This only saves space in the checked out tree, not in the archive.
  I am not sure how useful that really is, since everything can be hard
  linked.

  If we just put all of the patch logs into a single file, then
   patches would truly be invertible.

* Combine the patch-logs within a project tree into a single zip file
   for each branch

* Get rid of patch logs in the tree, and only record the names of the
   patches in a single file.  It would be the same as running "ls" on
   the archive, with each node showing its parent.  It would also get
   rid of most of the need for the annoying --remote option.

Long term:

* spawn different threads to download patches in parallel.  Use
   boost::thread or glib's thread facilities.  Probably requires a
   better tempname facility because otherwise temp files will start
   stepping on each other.

   Or just use the asynchronous download facilities in gnome-vfs et al.
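
   A small sketch of the threaded variant, written in Python rather
   than with boost::thread to keep it short.  The fetch callable is a
   stand-in for whatever actually downloads a patch, and the function
   name is made up.  The point is the temp-file handling: each worker
   gets its name from mkstemp, which creates the file atomically, so
   two threads cannot step on each other.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(names: Iterable[str],
                 fetch: Callable[[str], bytes],
                 workers: int = 4) -> Dict[str, str]:
    """Fetch every patch in parallel; return {patch name: temp file path}."""
    tmpdir = tempfile.mkdtemp(prefix="patches-")

    def fetch_one(name: str) -> str:
        data = fetch(name)  # hypothetical downloader supplied by the caller
        # mkstemp picks a unique name and creates the file in one step.
        fd, path = tempfile.mkstemp(dir=tmpdir)
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        return path

    names = list(names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        paths = list(pool.map(fetch_one, names))
    return dict(zip(names, paths))
```

   The caller is responsible for removing the temporary directory
   once the patches have been applied.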

* Use timestamps, but check before using by creating and touching a
   file twice.  If the file timestamp changes each time, then use
   timestamps.  Otherwise, go back to diffing the whole tree.
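
   One way the probe could look.  This is a sketch only: the pause
   between writes is an assumed value that would need tuning, and
   "touching" is done here by appending a byte.

```python
import os
import tempfile
import time


def timestamps_are_reliable(directory: str, pause: float = 0.01) -> bool:
    """Return True if two successive touches each produce a newer mtime."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        seen = [os.stat(path).st_mtime_ns]
        for _ in range(2):
            time.sleep(pause)  # assumed delay between touches
            with open(path, "ab") as f:
                f.write(b"x")
            seen.append(os.stat(path).st_mtime_ns)
        # On a coarse-grained filesystem some of these will be equal.
        return seen[0] < seen[1] < seen[2]
    finally:
        os.unlink(path)
```

   If this returns False for the project tree's filesystem, fall back
   to diffing the whole tree.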

* More robust tempname facility

* L10N

* get arch-pqm to accept patches mailed directly, have
   something in ArX which emails patches.  Add a "send" command.

* Get cscvs or maybe tailor.py working with ArX

* ViewARCH or ArchZOOM

* VCG graph output for ancestry

* Have a status command

* modify xtla for emacs mode?

Archive breakers:

* move _arx to .arx and make it hidden on windows?
* Get rid of the "Summary:" header, and just use the body.
* Get rid of the "Standard-Date:" header

* Change the comma "," prefix to a period "." postfix so that ftp
   upload places with restrictive policies are ok.

   Or maybe make branches have periods on the end, and everything else
   that used to be prefixed with a comma now has nothing.

   From comcast website: 

   NOTE: File names must consist of characters from "a-z", "A-Z",
   "0-9", '_' (underscore), '.' (period), '-' (hyphen). No other
   characters (including spaces) can be included in the file
   name. File names must not start with '.' or '-'.
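
   The quoted policy translates directly into a regular expression,
   which makes it easy to check candidate naming schemes.  The
   function name below is made up for illustration.

```python
import re

# First character: letter, digit or underscore (not "." or "-").
# Remaining characters: letters, digits, "_", "." or "-".
_ALLOWED = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]*")


def upload_safe(name: str) -> bool:
    """Return True if the file name satisfies the quoted upload policy."""
    return _ALLOWED.fullmatch(name) is not None
```

   Under this check a comma-prefixed name such as ",23" is rejected,
   while the period-postfix form "23." passes.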

* skip-deltas? http://svn.collab.net/repos/svn/trunk/notes/skip-deltas
  It would make it longer to commit.  We might also need to make a
  delta combiner.  But O(log(n)) to get any revision is tempting.
  That might make it unnecessary to have archive caches.  Hmm.  Wait
  until the asynchronous downloading is done and go from there.

  Also, I should write a delta-combiner first.  That would be useful
   for other things, and would make skip-deltas pretty easy to
   implement.

  Delta-combiners require the latest continuation revision to be ,0.
   We can enforce that once we have hashes for revisions, because
   there is no longer a need to branch to a higher number.

  Some numbers:  For arx.2.2 revisions 0-161
  Size of cached revision ,112:   828 K
  Size of all patches         :  6700 K
  Size of all skip deltas     : 10280 K

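   So there is about a 37% space penalty relative to the patches plus
   the cached revision (7528 K), or about 53% relative to the patches
   alone.  This is for revisions with lots of PDF's.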

  Benefits:
   log(N) access to any revision

   no need for cached revisions except for the beginning of a branch.
   Make that automatic?  Because a new branch means that you are
   splitting off and don't intend to merge.

  Drawbacks:
   commits become more complicated (have to get an older revision and
   diff against that, unless we just use a delta-combiner)

   Getting a patch for a single revision is no longer simple.  This
   makes annotate and replay harder.

   slight space increase.

  Merge might be faster (though maybe not, because we can generally
   patch from the current tree to the merge revisions with hashed
   revision names).


   We more or less have to have a delta combiner.  Otherwise, during
   commit, we have to create another tree.  This will be slow,
   especially for no-edit trees.

   We also have to have a delta splitter.  Otherwise, annotate will
   take too long.

   To make a delta combiner/splitter, we have to parse gnu patches
   (possible), but we also have to deal with xdelta patches.  We have
   to make sure that the checksums work out.  The big problem is when
   files alternate between binary and text.  Then, in the combined
   patch, we have to do something like .xdelta1, .patch2, .xdelta3,
   etc.  Annoying, but possible.
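
   To make the O(log(n)) claim concrete, here is a sketch that
   assumes the scheme from the Subversion note linked above: each
   revision is stored as a delta against the revision whose number
   has the lowest set bit cleared.  Whether ArX would pick the same
   bases is an open question; the function names are made up.

```python
from typing import List


def skip_delta_base(n: int) -> int:
    """Revision that revision n would be stored as a delta against."""
    if n <= 0:
        raise ValueError("revision 0 is stored in full and has no base")
    return n & (n - 1)  # clear the lowest set bit


def delta_chain(n: int) -> List[int]:
    """Revisions to fetch, newest first, to reconstruct revision n."""
    chain = []
    while n > 0:
        chain.append(n)
        n = skip_delta_base(n)
    chain.append(0)
    return chain
```

   Under that assumption, getting ,161 needs the deltas for 161, 160
   and 128 on top of ,0, instead of 161 consecutive patches.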


* Use hashes to uniquify revisions.  We already have the hash, so you
  can append the hash to the revision to uniquely specify a revision.
  As in

        foo.bar,23,4239874a

  We only use the first eight hex characters, because that is all that
  is needed to guard against accidental collisions.  In the archive,
  we also append the hash of the previous revision.  That will allow
  us to determine the entire graph with a single "ls".

  Don't use UUIDs, because they don't have the self-verifying
  properties that hashes do.

  When committing, require --force if creating divergence?
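
  A sketch of how the graph could be recovered from a single listing.
  The entry layout ",NUMBER,HASH,PARENTHASH" is an assumption, since
  the note does not say how the previous hash is appended, and SHA-1
  is likewise only a placeholder for whatever hash is in use.  All
  names are made up for illustration.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


def short_hash(payload: bytes) -> str:
    """First eight hex characters of the payload's hash (SHA-1 assumed)."""
    return hashlib.sha1(payload).hexdigest()[:8]


def build_graph(listing: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each revision hash to its parent's hash (None for a root)."""
    parents: Dict[str, Optional[str]] = {}
    for entry in listing:
        # Assumed layout: ",NUMBER,HASH" or ",NUMBER,HASH,PARENTHASH"
        fields = entry.lstrip(",").split(",")
        own = fields[1]
        parents[own] = fields[2] if len(fields) > 2 else None
    return parents


def divergent(parents: Dict[str, Optional[str]]) -> List[str]:
    """Hashes that are the parent of more than one revision."""
    counts: Dict[str, int] = {}
    for parent in parents.values():
        if parent is not None:
            counts[parent] = counts.get(parent, 0) + 1
    return sorted(p for p, c in counts.items() if c > 1)
```

  A commit whose parent already has a child would show up in
  divergent(), which is the case where --force might be required.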

* need a copy operation

   Call it "cp".  Make a special log header: Copied-files with from
   and to.  In the patch, it looks almost exactly like a new file.
   There would just be an extra file that is a patch back to the
   original file that was copied from.  That makes annotate pretty
   easy.

   For the "cp", add a new type to the ++changes file which is a
   copy.  It would be from the current location of the source file, so
   you have to figure out where it originally came from.

Maybe:

* Get AIX working

* Use new sha-2 code?

* Binaries for Debian, Red Hat, Suse, Mandrake, AIX, HP-UX, OS X,
   Irix: relocatable install

* Write a post-commit hook that does smarter archive caching.
   Something like 'ssh -f foo@bar "(arx archive-cache -a $ARX_REVISION
   && arx archive-cache -d $ARX_PREVIOUS_REVISION)"'

* install a signal handler: We have to have a global list of temp
   directories and archive locks.  Then they could be cleaned up there.
   The signal handler in svn is really annoying.

* update to latest boost, maybe use it as is?  Have to get copy and
   serialization working.

* have a way of sealing and unsealing an archive.  Not like tla which
   has versionfix-N stuff.  It just prevents you from committing.
   Maybe a simple chmod on the archive/category/branch/ directory?
   Can also seal entire archives and categories.

* run gcov to figure out what is and is not covered by the test
   suite.  Already done, but difficult to figure out.

* Have a command, apply ordinary diff, that first runs "edit" on all
   of the files to be patched, and then patches.  Or maybe just tell
   people to run "commit *"

* trackdown, which allows you to run a function on every revision until it
   returns true.  Something like

   for i in `arx log --header Revision`; do
     arx get "$i" temp || exit 1
     # "function" stands for whatever test you want to run in the tree
     if (cd temp && function); then
       echo "$i"
       rm -rf temp
       break
     fi
     rm -rf temp
   done

   Maybe just document it.

* make it possible to force xdelta diffing on a file with a property
   arx:binary?  Or use a mime-type?
* Archway? Written in perl, so it ain't gonna be me.
* default regexes for _arx/ignore with "param ignore-default"

* Make ignore regexes work on the entire path?

* reorganize the archive so that we have ,0-100 , ,101-200 , ,201-300
   , etc. for the patches, and then each patch is within those
   directories.

   We could use a merkle tree, but then we could end up with a broken
   database.  Hmm.

   Currently, we use 20K for each revision (4K per file, patch & sig,
   sha & sig, and log).  This is for something that is really only
   1.8K big.  Waste, waste, waste.  Need to get rid of the multiple
   files, but then we can't just add sigs and hashes.  Gah.  Makes me
   want to use a database.  But Monotone's usage was higher.  Weird.
   Also, with skip deltas, the deltas themselves get larger, making it
   less useful to combine the files.  Hmm.

   With hashes for revisions, we don't have to sign the hash anymore,
   so that gets rid of two files.  It would be nice to get rid of the
   log.  Hmm.
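
   The bucket layout from the first paragraph can be computed from
   the revision number alone.  A sketch, with a made-up function name
   and the bucket size left as a parameter:

```python
def bucket_for(revision: int, size: int = 100) -> str:
    """Directory holding a revision: ",0-100", ",101-200", ",201-300", ..."""
    if revision < 0:
        raise ValueError("revision numbers start at 0")
    if revision <= size:
        # The first bucket also holds revision 0, so it has size + 1 entries.
        lo, hi = 0, size
    else:
        lo = (revision - 1) // size * size + 1
        hi = lo + size - 1
    return f",{lo}-{hi}"
```

   With this, revision ,112 would live in the ",101-200" directory.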

* Have a way to mark whether an archive is remote or not (NFS over
   thin pipes)
* Use the linux crypto API if it exists
* Pure-merge, where it applies each patch in turn and then commits
   with the original message.  Make it an option to "merge", or maybe
   "replay", since replay is already doing things one at a time.

   Or maybe not worry about it, since with revision hashes, they
   should have been using the same branch?

DONE

* make-archive
* add
* delete
* inventory
* move
* tagging-method
* tree-lint
* dopatch
* mkpatch
* naming-convention
* tree-version
* archives
* my-browser
* my-default-archive
* my-editor
* my-guidiff
* my-id
* register-archive
* whereis-archive
* undo/redo
* hooks
* make-log
* file-diffs, file-undo
* log-ls
* logs
* cat-log
* changelog
* patch-report
* what-changed
* SWIG
* Integrate tla-pqm
* networking
* categories, branches, versions, revisions
* get-patch
* get
* archive-cache-revision etc. renames to cache-revision (-d for uncache)
* pristines, delete-pristine, add-pristine
* my-revision-library
* library-find, remove, archives, categories, branches, versions,
   revisions.  Get rid of log, file.  Replace it all with
   library-browse.
* browse, to replace categories, branches, versions, revisions and *-readme.
* commit
* darcs critique
* create-version -> init-tree
* delete-category/branch
* create-branch -> fork
* tag
* replay
* update
* push-mirror
* Replace create-version with init-tree when versions are removed.
* break-lock
* star-merge
* Use g_spawn instead of system()
* make-sync-tree -> history
* explicit-default, join-branch lasts until the C++ rewrite is done
* arch-pqm
* build/update/replay-config
* Something like cat-library-file and library-find.  Maybe file-orig --uri?
* update-distributions -> make-dist
* derive arx_error virtually from std::exception
* change parse_package_name to automatically get the archive name
* change =tagging-method to tagging-method
* Change regexes to only four things: source, ignored, temp,
   unrecognized.  Everything else is unrecognized, and tree lint will
   complain.  Source is the current regex, ignored almost everything
   else, and temp is the junk syntax.
* Make initial patch to base-0 not include any of the directories in {arch}.
* change exclude regex to control
* Save the tag, not the type of tag + the tag
* make naming inventory {arch} like everyone else
* Change all arch to arx (arch-params, {arch})
* Remove internal_deprecated
* get rid of (orig|mod)-only-metadata in the patch.  This is just
   removing the save_directory_permissions call in make_patch.
* Change smash_non_graphical to url encoding
* when creating logs with commit, don't change the log contents unless
   you have to.  So Standard-Date etc. will be preserved, though not
   Revision.
* Make dopatch non-destructive by copying files, not just moving them.
* replace .arch-ids with a single checksum file
* speed up inventory by using an algorithm for names_tagging that
   doesn't require a directory traversal to find the root for every tag.
* Change file-diffs to file-diff
* Change all references from inventory tag to inventory id.
* get rid of names and internal inventory methods, and change
   {arx}/tagging-method to {arx}/ignore.  There will be a global
   ignore regex that matches current junk files (,,*).  It might also
   have *.a, *.o, and *.so.  That can be extended with appropriate
   regexes in the ignore file.  New ignore command.
* Change the ++foo files to ,,foo
* move patch-logs to patch-log/archive/branch/revision.
* get rid of locked/unlocked pristine, and change ++pristine-trees to
   ++cached
* Change =README, =meta-info, etc to ,README etc.
* make the default regexes not be so complicated, with all of the
   CVS, RCS, etc. stuff.  Make it empty. 
* get rid of short_revision in most places.
* change the syntax from a/c--b--v--r to a/c.b.v,r
* Change archive and library layout to cat/branch/patch
* Generalize delete-branch, remove delete-category.
* Allow any branching depth: first remove branch, fix all the errors,
   and change version to branch.
* Make history, list_tree_cached_revisions, list_tree_patch_logs,
   browse, library_browse, and mirror
   all use the same recursive browsing mechanism.
* Have invoke-hook make branches or revisions, not categories or versions.
* Make patch-number handle ",0" revisions.
* Clean up the short options
* Fix a problem with relative paths with the --paths option (--paths
   foo doesn't work when in a subdirectory bar, but --paths bar/foo
   from the parent does work)
* tests for (library-)browse and tree-cache.
* configure check for python
* Make everything accept -H
* Make the --paths option require arguments
* I18N: use a serializer from boost, not this -> (url encode
   everything that goes into files, including filenames.  Then url
   decode when we print things out.  But do we want to url-encode
   Summary: and the body in logs?)
* Make init-tree automatically add all files in the current directory.
* Change {arx} to _arx
* Instead of versioning permissions directly (which you don't usually
   want to do), allow arbitrary properties which can do things on
   "get" and when patching.  Do this by adding a map<string,string> to
   file_attributes (which should be path_attributes) and getting rid
   of permissions.
* Use bzip instead of gzip (NOT: bzip is slower.  For unpacking lots of small
   files (package-framework patches 1-532): gzip ~3.5 s, bzip ~5.8 s)
* Get rid of ".original" and ".modified" in patches, and just use
   ".orig" and ".mod"
* Make most boost::filesystem exceptions caught in main.
* update to latest scons
* delete-revision, which deletes the contents of a revision, but not
   the directory.  Then you can tag off of an old revision.
* Change move to mv and delete to rm.
* get rid of make-log and init-tree's call to make-log.  The
   interactive stuff can cause problems if you aren't using a
   graphical editor.
* Put dists and arx together.  Only after replacing .arch-ids.
* dump/restore
* option to do_patch to speed it up for exact patching
* integrate xdelta
* Consolidate my-* commands into a single my-prog command
* Combine log, changelog, and revisions into one command and have
   --formatted and --remote options
* consolidate library-browse and library-revisions
* get rid of --paths arguments: undo, mkpatch, commit, diff
* automatically get revisions for diffs etc.
* arx edit
   1) in fill_path_list, check whether _arx/++edit exists.  If so,
   read from it and _arx/++changes and add all of the items to the
   path list.  Need to match up the user-supplied path list.

   2) get --no-edit makes an empty _arx/++edit and chmod's all of the
   files in the inventory

   3) mv, rm, add, and property don't need to do anything special,
   since they are already listed in ++changes.

   4) The _arx/++edit list only gets cleaned up and everything chmod'd
   back during full commits, not during a partial commit.  Need to
   make sure that ++changes isn't cleaned up in partial commits
   either.

   5) arx edit adds an entry into the ++edit file with inventory id
   and un-chmod's the file.  When fill_path_list reads ++edit, it
   matches the inventory id against any moves or deletes to get the
   current name.
* when getting a revision, replace the comma with a period.
* add --no-edit and --no-pristine option to config --get, fork,
   merge --new-tree.
* Make a way for get_revision to use hard links.  Have "get --no-edit"
   use that, and have "arx edit" break the hard link.
* set ARXTREEROOT for hooks
* improve the emacs mode for --no-edit trees.
* Make add_path take a list, so that we don't have to write a file
   10,000 times.
* Get rid of _arx/.arx-project-tree
* Use gnome_vfs_xfer_uri_list when getting a whole lot of patches:
   Tried (arx.2.1,84), it is actually slower for high-latency networks,
   about the same for local.  Much more complicated logic.
* Make it so that figuring out the continuation revisions doesn't take
   forever.  Something like O(number of continuations) rather than
   O(number of revisions).
   Add a header when committing "Last-continuation" that holds the last
   continuation before the current revision (empty if Continuation
   header is present).  Then we can just read that header to find out
   what the last continuation is.  That should get rid of any O(number
   of revisions) behavior.
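
   A sketch of the lookup this enables.  Logs are modelled as a
   dictionary from revision number to headers, which is purely an
   illustration.  Stepping to the revision just before a continuation
   to find the earlier ones is an assumption and is not stated above.

```python
from typing import Dict, List, Optional

Log = Dict[str, str]


def last_continuation(logs: Dict[int, Log], revision: int) -> Optional[int]:
    """Latest continuation at or before `revision`, from one log read."""
    headers = logs[revision]
    if "Continuation" in headers:
        return revision
    value = headers.get("Last-continuation", "")
    return int(value) if value else None


def all_continuations(logs: Dict[int, Log], revision: int) -> List[int]:
    """All continuations at or before `revision`, newest first."""
    found = []
    current = revision
    while current >= 0:
        cont = last_continuation(logs, current)
        if cont is None:
            break
        found.append(cont)
        current = cont - 1  # assumption: look again just before it
    return found
```

   Each loop iteration reads one log, so the cost follows the number
   of continuations rather than the number of revisions.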
* With no-edit, if mv then rm, diff will fail.  To fix that, when
   reading the files in fill_path_list, we need to save the initial
   state of any moved files (including if they were just added).
* Also, if we rm and then diff against a version that never had that
   file, diff fails because that file is not in any manifest.  So we
   really want to just skip it.  So we skip if we are doing a straight
   diff with no-edit.
* Better-SCM site, dmoz, zooko.com, wikipedia
* Add a --delete-removed option to replay and merge
* Make a temp_file which deletes itself in the destructor.
* Use __DATE__ and __TIME__ in the version string.
* signatures
   1) Add a sign command with --patch, --revision options to sign only
   the patch or revision, a --delete option to remove signatures,
   a --replace option to replace the current signature, and a --verify
   option to verify the signature
   2) Add a --signature option to make-archive.  It puts the public
   keys in ,meta-info/sigs/NAME.  It can also be used to add public
   keys later.  Or maybe just an --archive option to sig.
   3) register-archive automatically downloads all of the possible
   signatures in ,meta-info/sigs/NAME.  Stores the location and list
   of valid signatures in .arx/locations/ARCHIVE
   4) Make verify_manifest_checksum also check signature matches one
   of the signatures listed for that archive.
   5) When committing, if an archive is signed, then require a
   signature before committing.  Check signature against list of
   signatures for that archive.
   6) Default is off.
   7) Store signatures in archive in patch.sig and revision.sig
   8) Have a param "gpg" that can be used to override what to use for
   the gpg program (e.g. agpg).
   9) Have a param "sign".  If true, then every archive should be signed.
* check archive public keys to make sure signature is listed before
   signing.
* Make sig recursive
* combine archives and register-archive into archives (-a -d).
* make-dist doesn't stop if it can't find a config file.
* Ask for confirmation for delete-branch and delete-revision unless
   --force.  Print out "arx browse" or "arx log --all" respectively
   before asking.  With delete-revision, delete any archive caches.
   Also need to make sure that deleted revisions are not "gotten".
   Perhaps by treating it as a continuation?
* Get plain http working better with something like http-blows and
   automatically updating .listing files, or maybe just use a cgi
   script, or maybe a post-commit hook and "arx fix-listing".
* Get a better error message when trying to mirror a revision that
   does not exist in the master.
* Get OS X and Windows working
* Make archive adding atomic
* make patch-report work on archives and tar.gz files, so you don't
   have to get-patch, untar, and then run patch-report.
* rename init-tree to init
* remove make-dist, add export, and have export print out the target
   directory.  Then we can tar manually.
* Have merge and replay do in-place operations by default.  Use the
   --new-dir option or make it just the last argument.
* Make configs be stored separately in the archive.  Then you can do a
   simple "get", "update", "diff", etc. to the whole tree.  That would
   get rid of the --config options, because you can just do the
   operations directly on the configs.

   Commands to fix: tag, config, make-dist

   Fixed commands: commit, fork, file-diff, file-orig, file-undo,
   tree-cache, get-patch, get, missing, archive-cache, sig, merge,
   replay, library
   
   usage: tag revision head (tail_dir tail)...
          tag revision -f FILE

   Tags are just a special case of configs.  So a syntax
     tag revision head (tail_dir tail)...
   would specify exact revisions
     tag branch --update would just update those revisions

   Store the projects and directories in the header "Tags".  It would be
   a list of revisions and directories.  We would also need a list of
   revisions and checksums, so that the log file could be
   checksummed.  Then we would check the checksum when diff'ing,
   get'ing, etc.

   This would make tagging very, very cheap, like it used to be.

   tags only store the log.  The log is the payload, so it will have
   to be checksummed and signed.

   You can't commit after a tag, or fork from a tag.  merge,
   missing, replay, get, make-dist, and archive-cache work
   recursively.

   diff, file-diff, file-orig, file-undo do not work with tags.

   sig needs to be aware that tags do not have a separate payload.

   Mirror doesn't need anything
* Add a --recursive option to diff
* Rename "browse" to "ls"?  No.  CVS stores everything in the ,v
   files, so "ls" actually maps well to the storage.  That is not the
   case for "browse".
* Get a good copy of the BGL
* diff should be able to be faster.  It spends a lot on IO, but it
   should be able to read the whole manifest in one second.  Maybe
   read it into a string and then do all of the manipulations.
   Actually, it spends most of its time stat'ing the project, checking
   for tree-lint problems.
* make update-listing take --add and --delete.
* have "arx help" work.
* something like blame, file-history?  Search the patch logs to find
   out which patches modify the files, including renames and deletes.
   Then manually scan the patches and put the prefix in front of the
   appropriate lines.
* Have an option to log that only prints out patches that modified a file.
* Have an option to log that prints out which patches modified which line
* get python autodetect from http://autoconf-archive.cryp.to/ax_python.html
* Fix bug when you delete a file with no-edit trees and do a partial
   commit.  The file is no longer in the manifest, but still in the
   ++edit file.
* Put note in docs about sftp needing auto-login.
* make dopatch not barf when adding the same file with the same id and
   contents twice.  If different contents, then make orig and mod files.
* If we have a revision library, use that exclusively.  Need a way to set
   policy on the revision library, so that some branches have
   revisions deleted as new ones are created, while others keep them
   around.  The default is to delete.  library-policy?
   That way, the linking of branches makes branching even cheaper.

   Or maybe get rid of revision libraries?
* When merging, merging a new file into a directory that had moved does
   not put the new file into the new directory, but uses the old
   directory instead.

   Just make sure that all parent directory id's are included in
   the patch?  Then we don't have to change the patch format.
* Make merge error out when there is more than one ancestor, because
   that could be a criss-cross merge.  In that case, suggest 2-way
   merge?
* add a --header option to commit, so that you can set creator and date.
* Use -v -v and -q -q instead of --silent, --quiet, --report, --verbose
* Fix merge speed in the default update case so that it doesn't download and
   patch twice
* Add conflict markers so that you can't commit unless you run resolve
* Make an update command that just runs merge --update
* 3-way merge: just use diff3 + a way to deal with conflicting moves.


