ift issues

https://gitlab.mpcdf.mpg.de/groups/ift/-/issues 2017-07-06T05:50:22Z

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/16 Add out-array parameter to numerical d2o operations. 2017-07-06T05:50:22Z Theo Steininger

Add out-array parameter to numerical d2o operations.

Numpy supports specifying an out array in order to avoid memory reallocation.

    a = np.array([1,2,3,4])
    b = np.array([5,6,7,8])
    # slow:
    a = a + b
    # fast:
    np.add(a,b,out=a)

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/15 d2o: Contraction functions rely on non-degeneracy of distribution strategy 2017-07-06T05:50:22Z Theo Steininger
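Regarding the out-array request (issue 16): a minimal sketch, using plain numpy only, of what the `out` argument buys. It checks that the result is written into the existing buffer of `a`. The variable names are illustrative, and nothing here is d2o API.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Address of the memory block backing `a` before the operation.
buffer_before = a.__array_interface__["data"][0]

# The sum is written into a's existing buffer; no new result array is allocated.
result = np.add(a, b, out=a)

buffer_after = a.__array_interface__["data"][0]

same_object = result is a                      # np.add returns the out array itself
same_buffer = buffer_before == buffer_after    # the data did not move
```

A d2o version would presumably forward a local out array to the local numpy call in the same way, but that design is an assumption.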

d2o: Contraction functions rely on non-degeneracy of distribution strategy

Several methods of the distributed_data_object rely on the fact that the distribution strategy behaves as if the local data were non-degenerate. Currently the non-distributor fixes this by returning trivial (local) results in the _allgather and the _Allreduce_sum methods.

Affected d2o methods are at least: _contraction_helper, mean

Fix: Move the functionality for sum, prod, etc. into the distributor.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/14 d2o: _contraction_helper does not work when using numpy keyword arguments 2017-07-06T05:50:22Z Theo Steininger
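Regarding the non-degeneracy assumption (issue 15): a small simulation without MPI of why a generic "sum the local sums" contraction only works for non-degenerate distributions. Ranks are modelled as a plain list of local arrays, and `naive_global_sum` is a hypothetical stand-in for an Allreduce with SUM, not the actual d2o code.

```python
import numpy as np

data = np.arange(6)
n_ranks = 3

# Non-degenerate case: every simulated rank holds a disjoint slice.
sliced_local = np.array_split(data, n_ranks)

# Degenerate case: every simulated rank holds a full copy of the data.
replicated_local = [data.copy() for _ in range(n_ranks)]


def naive_global_sum(local_arrays):
    """Add up the per-rank partial sums, as an Allreduce(SUM) would."""
    return sum(int(local.sum()) for local in local_arrays)


sliced_total = naive_global_sum(sliced_local)          # matches data.sum()
replicated_total = naive_global_sum(replicated_local)  # counts every element n_ranks times
```

Moving the contraction into the distributor would let each strategy decide for itself whether partial results have to be combined at all.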

d2o: _contraction_helper does not work when using numpy keyword arguments

The `_contraction_helper` passes keyword arguments to the underlying numpy functions (axis=, keepdims=). The result of the _contraction_helper's local computation is then an array and not a scalar. Therefore the dtype check fails.

Fix: After solving theos/NIFTy#2, adapt to the case that the local run's result object is an array and make a further distinction of cases, e.g. for something like axis=0 for the slicing_distributor.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/13 Add `axis` keyword functionality to unary methods. 2017-07-06T05:50:23Z Theo Steininger
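Regarding the `_contraction_helper` keyword problem (issue 14): a sketch of the scalar-versus-array distinction in plain numpy. How the actual dtype check in d2o is written is not shown in this issue, so `result_dtype` is only a hypothetical helper that works for both kinds of local result.

```python
import numpy as np

local = np.arange(6, dtype=np.float64).reshape(2, 3)

full = np.sum(local)              # no axis given: a numpy scalar
partial = np.sum(local, axis=0)   # axis given: an ndarray of shape (3,)


def result_dtype(local_result):
    """Return the dtype of a local result, whether scalar or array."""
    # np.asarray wraps a scalar into a 0-d array and leaves arrays unchanged,
    # so .dtype is available in both cases.
    return np.asarray(local_result).dtype


full_is_scalar = np.isscalar(full)
partial_is_scalar = np.isscalar(partial)
```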

Add `axis` keyword functionality to unary methods.

Many numpy functions support the `axis` keyword in order to perform an operation only along certain directions of the array. The current implementation of d2o does not support this, e.g. for `all`, `any`, `sum`, etc.

Related to: theos/NIFTy#3

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/12 Add `source_rank` parameter to d2o.set_full_data() 2017-07-06T05:50:23Z Theo Steininger
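Regarding `axis` support (issue 13): a sketch of the case distinction an axis-aware `sum` needs when the data is sliced along axis 0. The slabs are simulated with `np.array_split`, `sliced_sum` is a hypothetical name, and only a single non-negative axis is handled.

```python
import numpy as np

a = np.arange(24).reshape(4, 3, 2)

# Simulated slicing distribution: two ranks, split along axis 0.
local_slabs = np.array_split(a, 2, axis=0)


def sliced_sum(slabs, axis):
    """Sum along `axis` for data that is distributed along axis 0."""
    partials = [slab.sum(axis=axis) for slab in slabs]
    if axis == 0:
        # The distributed axis is reduced away: add the partial results
        # elementwise (what an Allreduce with SUM would do).
        return np.sum(partials, axis=0)
    # The distributed axis survives as axis 0 of the result:
    # the partial results only need to be stitched together.
    return np.concatenate(partials, axis=0)
```

Under these assumptions, `sliced_sum(local_slabs, axis)` agrees with `a.sum(axis=axis)` for axis 0, 1 and 2. Other distribution strategies would need their own case distinction.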

Add `source_rank` parameter to d2o.set_full_data()

A source_rank parameter should be added to the distributed_data_object in order to specify the node on which the source data array resides.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/11 Move `flatten` method into the distributor. 2017-07-06T05:50:23Z Theo Steininger

Move `flatten` method into the distributor.

At the moment `flatten` is performed by the distributed_data_object itself. It thereby assumes that flattening the local arrays produces the right result. In general, with arbitrary distribution strategies, this is wrong.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/10 Profile the d2o.bincount method 2017-07-06T05:50:23Z Theo Steininger
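Regarding `flatten` (issue 11): a small numpy-only illustration of when flattening the local arrays is valid. The distribution is simulated by splitting along a chosen axis, and `flatten_locally` is a hypothetical helper, not d2o code.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)


def flatten_locally(array, axis, n_ranks=2):
    """Split along `axis`, flatten every piece locally, then join in rank order."""
    pieces = np.array_split(array, n_ranks, axis=axis)
    return np.concatenate([piece.flatten() for piece in pieces])


# Split along the first axis: the local flattening reproduces the global order.
rows_ok = np.array_equal(flatten_locally(a, axis=0), a.flatten())

# Split along the second axis: the elements come out in the wrong order.
cols_ok = np.array_equal(flatten_locally(a, axis=1), a.flatten())
```

Here `rows_ok` is True and `cols_ok` is False, which is the failure mode described above for strategies other than slicing along the first axis.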

Profile the d2o.bincount method

The d2o.bincount method scales well with MPI parallelization, but it has a rather large overhead compared to single-core np.bincount.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/9 Semi-advanced indexing is not recognized 2017-07-06T05:50:23Z Theo Steininger

Semi-advanced indexing is not recognized

    a = np.arange(24).reshape((3, 4, 2))
    obj = distributed_data_object(a)

Semi-advanced indexing

    a[(2,1,1),1]

yields

    array([[18, 19], [10, 11], [10, 11]])

The "indexinglist" scheme in d2o expects either scalars or numpy arrays as tuple elements and therefore:

    obj[(2,1,1),1] -> AttributeError

However, obj[np.array((2,1,1)), 1] works.

Solution: Parse the elements and, if in doubt, cast them to numpy arrays.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/8 Add support for `from array` indexing 2017-07-06T05:50:23Z Theo Steininger
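Regarding semi-advanced indexing (issue 9): a sketch of the proposed "parse and cast" step, tried on a plain numpy array. `cast_index_elements` is a hypothetical helper name. It only converts list and tuple elements and leaves scalars, slices and existing arrays untouched.

```python
import numpy as np


def cast_index_elements(key):
    """Turn list/tuple elements of an index key into numpy arrays."""
    if not isinstance(key, tuple):
        key = (key,)
    return tuple(
        np.asarray(element) if isinstance(element, (list, tuple)) else element
        for element in key
    )


a = np.arange(24).reshape((3, 4, 2))

# The key from the report: a tuple as first element, a scalar as second.
key = cast_index_elements(((2, 1, 1), 1))
picked = a[key]
```

After the cast, the key has the form that the indexing scheme already accepts according to the report (`np.array((2,1,1)), 1`).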

Add support for `from array` indexing

When building the kdict from pindex and kindex, something of the following form must be done (a==kindex, b==pindex):

    a = np.arange(16)*2
    b = np.array([[3,2],[1,0]])

    In [1]: a[b]
    Out[1]:
    array([[6, 4],
           [2, 0]])

Currently, this is solved using a hack: p.apply_scalar_function(lambda z: obj[z])

This functionality could easily be added to the get_data interface.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/7 The d2o_librarian will fail when mixing different MPI comms 2017-07-06T05:50:23Z Theo Steininger
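Regarding `from array` indexing (issue 8): a numpy-only comparison of direct array indexing with an element-by-element lookup. The `np.vectorize` call is only an analogy for the `apply_scalar_function` hack mentioned in the report, not the d2o implementation.

```python
import numpy as np

a = np.arange(16) * 2              # plays the role of kindex
b = np.array([[3, 2], [1, 0]])     # plays the role of pindex

# Direct "from array" indexing: the result takes the shape of b.
direct = a[b]

# Element-by-element lookup, analogous to applying lambda z: obj[z] to every entry.
elementwise = np.vectorize(lambda z: a[z])(b)
```

Both give the same result. The direct form avoids one Python-level function call per element.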

The d2o_librarian will fail when mixing different MPI comms

Every local librarian instance on a node of an MPI cluster just increments its internal counter by one when a new d2o is registered. This gets out of sync when only a part of the full cluster is covered by a special comm.

Possible solution: The individual librarians store the id of 'their' d2o and communicate a common id for their dictionary. Con: Involves MPI communication.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/6 d2o cumsum and flatten rely on certain features of distribution strategy 2017-07-06T05:50:23Z Theo Steininger
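Regarding the librarian counters (issue 7): a pure-Python simulation of the desynchronisation, followed by one possible way to agree on a common id. The `Librarian` class and `register_with_common_id` are hypothetical stand-ins. The `max` over the participating ranks models what would be an Allreduce with MAX in real MPI code.

```python
class Librarian:
    """Hypothetical stand-in for the local librarian on one rank."""

    def __init__(self):
        self.counter = 0
        self.library = {}

    def register(self, d2o):
        # Current behaviour as described: purely local increment.
        self.counter += 1
        self.library[self.counter] = d2o
        return self.counter


librarians = [Librarian() for _ in range(4)]   # one per simulated rank

# A d2o on a sub-communicator that covers ranks 0 and 1 only.
sub_ids = [librarians[rank].register("sub-comm d2o") for rank in (0, 1)]

# A d2o on the full communicator: the ranks now disagree about its id.
full_ids = [lib.register("world d2o") for lib in librarians]


def register_with_common_id(participants, d2o):
    """Agree on one id that is unused on every participating rank."""
    common_id = max(lib.counter for lib in participants) + 1
    for lib in participants:
        lib.counter = common_id
        lib.library[common_id] = d2o
    return common_id
```

The price is the one noted in the report: every registration now involves communication.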

d2o cumsum and flatten rely on certain features of distribution strategy

cumsum and flatten assume: if the shape of the d2o changes through flattening, the distribution strategy was "slicing".

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/5 Add function d2o.arange 2017-07-06T05:50:23Z Theo Steininger
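Regarding cumsum (issue 6): a sketch of what a cumulative sum looks like when it is tied to a 1-d slicing distribution. Each simulated rank needs the total of all lower ranks as an offset. `sliced_cumsum` is a hypothetical helper, and the rank order is assumed to match the global element order, which is exactly the kind of assumption the report points out.

```python
import numpy as np

a = np.arange(1, 9)

# Simulated 1-d slicing over three ranks, in rank order.
local_parts = np.array_split(a, 3)


def sliced_cumsum(parts):
    """Local cumsum per rank, shifted by the total of all lower ranks."""
    results = []
    offset = 0
    for part in parts:
        results.append(np.cumsum(part) + offset)
        offset += part.sum()
    return results


local_results = sliced_cumsum(local_parts)
```

For any strategy in which the local data is not a contiguous block of the flattened global array, this offset logic does not apply.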

Add function d2o.arange

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/4 Add unit tests for `copy=True/False` functionality 2017-07-06T05:50:23Z Theo Steininger

Add unit tests for `copy=True/False` functionality

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/3 Move obj.bincount and obj.unique into distributor and make them more efficient. 2016-05-26T11:09:09Z Theo Steininger

Move obj.bincount and obj.unique into distributor and make them more efficient.

For efficiency, use Allreduce instead of allgather in bincount. Use `fast-summation` in obj.unique <http://materials.jeremybejarano.com/MPIwithPython/collectiveCom.html#fastsum>

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/2 Add tensor-/outer-dot to d2o 2016-05-26T11:09:02Z Theo Steininger
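Regarding bincount via Allreduce (issue 3): a simulation without MPI of the reduction-based approach. Every simulated rank counts its own data into a fixed-length array, and the arrays are then added elementwise, which is what an Allreduce with SUM would deliver. The common length is assumed to be agreed on beforehand; in real MPI code that would take one scalar MAX reduction.

```python
import numpy as np

data = np.array([0, 1, 1, 3, 2, 1, 7, 3])
local_parts = np.array_split(data, 3)   # simulated ranks

# Common output length, assumed to be agreed on across ranks.
length = max(int(part.max()) for part in local_parts if part.size) + 1

# Every rank counts locally into an array of the same length.
local_counts = [np.bincount(part, minlength=length) for part in local_parts]

# Elementwise sum over ranks, the analog of Allreduce(SUM).
global_counts = np.sum(local_counts, axis=0)
```

Compared with gathering all data on every rank, only `length` integers per rank have to be communicated.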

Add tensor-/outer-dot to d2o

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/17 Add axis keyword to obj.vdot 2017-07-06T05:50:22Z Theo Steininger

Add axis keyword to obj.vdot

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/18 Improve obj.searchsorted such that it supports scalar, numpy arrays and distr... 2017-07-06T05:50:22Z Theo Steininger

Improve obj.searchsorted such that it supports scalar, numpy arrays and distributed_data_objects as input...

...and return a distributed_data_object if the input was distributed.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/19 reshaping -> enfold/defold 2017-07-06T05:50:22Z Theo Steininger

reshaping -> enfold/defold

enfold in the slicing distributor relies on the fact that the global axes correspond to the local array axes, as it operates on local data from `get_local_data()`. In general, this is not true for generic distribution strategies.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/20 Initialization: prefer global_shape/local_shape over shape of data 2017-07-06T05:50:22Z Theo Steininger

Initialization: prefer global_shape/local_shape over shape of data

Right now, during initialization, if some init-data is provided, the shape of this data is preferred over an explicitly given shape.

-> Change the behavior in Distributor-Factory.
-> Make a disperse_data instead of a distribute_data at the end of init.

Theo Steininger

https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/21 Create dedicated object for 'distribution_strategy' 2017-07-06T05:50:22Z Theo Steininger

Create dedicated object for 'distribution_strategy'

Only global-type distribution strategies are comparable by their name directly. In order to compare local-type distribution strategies as well -> implement an object which represents the distribution strategy. For 'freeform' this includes the individual slice lengths.

Theo Steininger
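Regarding a dedicated strategy object (issue 21): a minimal sketch of what such an object could look like. The class name, its fields and the strategy name 'equal' used in the example are illustrative assumptions; only 'freeform' and its slice lengths are taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DistributionStrategy:
    """Hypothetical value object; equality compares all fields."""

    name: str
    # Only meaningful for local-type strategies such as 'freeform'.
    slice_lengths: Optional[Tuple[int, ...]] = None


# Global-type strategy: the name alone decides equality.
equal_a = DistributionStrategy("equal")
equal_b = DistributionStrategy("equal")

# Local-type strategies: the same name, but different slice lengths.
freeform_a = DistributionStrategy("freeform", (3, 3, 2))
freeform_b = DistributionStrategy("freeform", (4, 2, 2))
```

With a frozen dataclass the comparison falls out of the generated `__eq__`, and the objects stay hashable, so they could also serve as dictionary keys.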