.. -*- mode: rst; fill-column: 78 -*-
.. ex: set sts=4 ts=4 sw=4 et tw=79:
  ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
  #
  #   See COPYING file distributed along with the PyMVPA package for the
  #   copyright and license terms.
  #
  ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###


.. _scenarios:

.. index:: analysis scenarios

********************
 Analysis Scenarios
********************

.. index:: searchlight, cross-validation, NIfTI

Searchlight
===========

The term Searchlight_ refers to an algorithm that runs a
scalar `DatasetMeasure` on all possible spheres of a certain size within a
dataset. The measure typically computed is a cross-validated transfer error
(see :ref:`CrossValidatedTransferError <cross-validation>`). The idea to use
a searchlight as a sensitivity analyzer stems from a paper by Kriegeskorte and
colleagues [1]_.

A searchlight analysis can be performed with just a few lines of code. The
following code snippet shows a draft of a complete analysis.

  >>> from mvpa.datasets.maskeddataset import MaskedDataset
  >>> from mvpa.datasets.splitter import OddEvenSplitter
  >>> from mvpa.clfs.svm import LinearCSVMC
  >>> from mvpa.clfs.transerror import TransferError
  >>> from mvpa.algorithms.cvtranserror import CrossValidatedTransferError
  >>> from mvpa.measures.searchlight import Searchlight
  >>> from mvpa.misc.data_generators import normalFeatureDataset
  >>>
  >>> # overcomplicated way to generate an example dataset
  >>> ds = normalFeatureDataset(perlabel=10, nlabels=2, nchunks=2,
  ...                           nfeatures=10, nonbogus_features=[3, 7],
  ...                           snr=5.0)
  >>> dataset = MaskedDataset(samples=ds.samples, labels=ds.labels,
  ...                         chunks=ds.chunks)
  >>>
  >>> # setup measure to be computed in each sphere (cross-validated
  >>> # generalization error on odd/even splits)
  >>> cv = CrossValidatedTransferError(
  ...          TransferError(LinearCSVMC()),
  ...          OddEvenSplitter())
  >>>
  >>> # setup searchlight with 5 mm radius and measure configured above
  >>> sl = Searchlight(cv, radius=5)
  >>>
  >>> # run searchlight on dataset
  >>> sl_map = sl(dataset)

If this analysis is done on an fMRI dataset using `NiftiDataset`, the resulting
searchlight map (`sl_map`) can be mapped back into the original data space
and viewed as a brain overlay. The :ref:`example section <example_searchlight>`
contains a typical application of this algorithm.
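Conceptually, a searchlight simply slides a measure across all local feature
neighborhoods. The following self-contained NumPy sketch (independent of
PyMVPA, and using a made-up class-difference measure instead of a
cross-validated transfer error) illustrates the principle on a 1-D feature
grid:

```python
import numpy as np

# hypothetical toy data: 20 samples x 10 features on a 1-D grid
rng = np.random.RandomState(0)
samples = rng.randn(20, 10)
labels = rng.randint(0, 2, 20)
# inject signal into features 3 and 7
samples[labels == 1, 3] += 2.0
samples[labels == 1, 7] += 2.0

coords = np.arange(10)          # feature positions (one 'voxel' per unit)
radius = 1                      # neighborhood radius in grid units

def measure(data, labels):
    """Toy measure: absolute mean difference between the two classes,
    averaged over all features in the sphere (a stand-in for the
    cross-validated transfer error used by PyMVPA's Searchlight)."""
    return np.abs(data[labels == 0].mean(axis=0)
                  - data[labels == 1].mean(axis=0)).mean()

# run the 'searchlight': one measure value per center feature
sl_map = np.array(
    [measure(samples[:, np.abs(coords - c) <= radius], labels)
     for c in coords])
```

In a real analysis the measure inside each sphere would be the cross-validated
transfer error, and the coordinates would be 3-D voxel positions.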

.. _Searchlight: api/mvpa.measures.searchlight.Searchlight-class.html

.. Mention the fact that it also is a special `SensitivityAnalyzer`

.. [1] Kriegeskorte, N., Goebel, R. & Bandettini, P. (2006).
       'Information-based functional brain mapping.' Proceedings of the
       National Academy of Sciences of the United States of America 103,
       3863-3868.


.. index:: statistical testing

Statistical Testing of Classifier-based Analyses
================================================

It is often desirable to be able to make statements like *"Performance is
significantly above chance-level"*. However, as with other applications of
statistics in classifier-based analyses there is the problem that we do not
know the distribution of a variable like error or performance under the *H0*
hypothesis to assign the adored p-values, i.e. the probability of a result
given that there is no signal. Even worse, the chance-level or guess
probability of a classifier depends on the content of the validation dataset
(e.g. balanced or unbalanced number of samples per label, and the total
number of labels).
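For instance, with an unbalanced validation dataset a classifier that always
guesses the most frequent label already beats the naive 1/nlabels chance
level. A quick illustration (plain NumPy, not part of PyMVPA):

```python
import numpy as np

# hypothetical unbalanced validation labels: 30 of label 0, 10 of label 1
labels = np.array([0] * 30 + [1] * 10)

# naive chance level assuming balanced labels
naive_chance = 1.0 / len(np.unique(labels))

# accuracy of a degenerate classifier always guessing the majority label
majority_guess = np.bincount(labels).max() / float(len(labels))

print(naive_chance, majority_guess)  # 0.5 vs. 0.75
```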

One approach to deal with this situation is to estimate the *NULL* distribution.
A generic way to do this are permutation tests (aka *Monte Carlo*). The *NULL*
distribution is estimated by computing some measure multiple times using
datasets with no relevant signal in them. These datasets are generated by
permuting the labels of all samples in the training dataset each time the
measure is computed, and therefore randomizing/removing any possible relevant
information.

Given the measures computed using the permuted datasets one can now determine
the probability of the empirical measure (i.e. the one computed from the
original training dataset) under the *no signal* condition. This is simply
the fraction of measures from the permutation runs that are larger or smaller
than the empirical one (depending on whether one is looking at performances
or errors).
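The whole procedure can be sketched in a few lines of plain NumPy. Note that
this is only a generic illustration with a toy error function, not PyMVPA's
actual MCNullDist_ implementation:

```python
import numpy as np

rng = np.random.RandomState(42)

def error(samples, labels):
    """Toy 'transfer error': misclassification rate of a nearest
    class-mean classifier trained and tested on the same data (a
    stand-in for a proper cross-validated transfer error)."""
    uniq = np.unique(labels)
    means = np.array([samples[labels == l].mean(axis=0) for l in uniq])
    dists = ((samples[:, None, :] - means[None]) ** 2).sum(axis=2)
    pred = uniq[dists.argmin(axis=1)]
    return np.mean(pred != labels)

# data with signal: class 1 is shifted along both features
samples = rng.randn(40, 2)
labels = np.repeat([0, 1], 20)
samples[labels == 1] += 1.5

empirical = error(samples, labels)

# estimate the NULL distribution by permuting the labels each time
null = [error(samples, rng.permutation(labels)) for _ in range(100)]

# left-tail p-value: fraction of permutation errors at least as low
# as the empirical one (lower errors are better)
p = np.mean(np.array(null) <= empirical)
```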

PyMVPA supports such permutation tests for :ref:`transfer errors
<transfer_error>` and all :ref:`dataset measures <measure>`. In both cases
the object computing the measure or transfer error takes an optional
constructor argument `null_dist`. The value of this argument is an instance
of some Distribution_ estimator. If this is provided, the respective
TransferError_ or DatasetMeasure_ instance will automatically use it to
estimate the *NULL* distribution and store the associated *p*-values in a
state variable named `null_prob`.


.. _DatasetMeasure: api/mvpa.measures.base.DatasetMeasure-class.html
.. _TransferError: api/mvpa.clfs.transerror.TransferError-class.html
.. _Distribution: api/mvpa.clfs.stats.Distribution-class.html
.. _MCNullDist: api/mvpa.clfs.stats.MCNullDist-class.html

  >>> # lazy import
  >>> from mvpa.suite import *
  >>>
  >>> # some example data with signal
  >>> train = normalFeatureDataset(perlabel=50, nlabels=2, nfeatures=3,
  ...                              nonbogus_features=[0,1], snr=3, nchunks=1)
  >>>
  >>> # define class to estimate NULL distribution of errors
  >>> # use the left tail of the distribution since lower errors indicate
  >>> # better performance
  >>> # in a real analysis the number of permutations should be MUCH larger
  >>> terr = TransferError(clf=SMLR(),
  ...                      null_dist=MCNullDist(permutations=10,
  ...                                           tail='left'))
  >>>
  >>> # compute classifier error on training dataset (should be low :)
  >>> err = terr(train, train)
  >>> err < 0.4
  True
  >>> # check that the result is highly significant since we know that the
  >>> # data has signal
  >>> terr.null_prob < 0.01
  True
