D2O issueshttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues2018-03-20T14:57:13Zhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/25Meta-issue: how to deal with open D2O issues?2018-03-20T14:57:13ZMartin ReineckeMeta-issue: how to deal with open D2O issues?@theos, @ensslint:
There are currently 22 open issues in D2O, and I don't expect that Theo will have the time to work on them. Unfortunately, no one else has the necessary knowledge.
Any suggestions how to proceed here?@theos, @ensslint:
There are currently 22 open issues in D2O, and I don't expect that Theo will have the time to work on them. Unfortunately, no one else has the necessary knowledge.
Any suggestions how to proceed here?https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/24Missing file in last commit?2018-03-18T10:55:45ZMartin ReineckeMissing file in last commit?Is it possible that you forgot to add a file `random.py` in your last commit?
As things are, D2O now imports Python's default `random` module, but this probably doesn't address the problems with MPI-parallel seeding you mentioned.Is it possible that you forgot to add a file `random.py` in your last commit?
As things are, D2O now imports Python's default `random` module, but this probably doesn't address the problems with MPI-parallel seeding you mentioned.https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/4Add unit tests for `copy=True/False` functionality2017-07-06T05:50:23ZTheo SteiningerAdd unit tests for `copy=True/False` functionalityTheo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/5Add function d2o.arange2017-07-06T05:50:23ZTheo SteiningerAdd function d2o.arangeTheo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/6d2o cumsum and flatten rely on certain features of distribution strategy2017-07-06T05:50:23ZTheo Steiningerd2o cumsum and flatten rely on certain features of distribution strategycumsum and flatten assume: if the shape of the d2o changes through flattening, the distribution strategy was "slicing". cumsum and flatten assume: if the shape of the d2o changes through flattening, the distribution strategy was "slicing". Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/7The d2o_librarian will fail when mixing different MPI comms2017-07-06T05:50:23ZTheo SteiningerThe d2o_librarian will fail when mixing different MPI commsEvery local librarian instance on a node of a MPI cluster just increments its internal counter by one when a new d2o is registered. This gets out of sync, when only a part of the full cluster is covered by a special comm.
?Possible s...Every local librarian instance on a node of a MPI cluster just increments its internal counter by one when a new d2o is registered. This gets out of sync, when only a part of the full cluster is covered by a special comm.
?Possible solution: The individual librarians store the id of 'their' d2o and communicate a common id for their dictionary.
Con: Involves MPI communication.Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/8Add support for `from array` indexing2017-07-06T05:50:23ZTheo SteiningerAdd support for `from array` indexingWhen building the kdict from pindex and kindex something of the following form must be done (a==kindex, b==pindex):
a = np.arange(16)*2
b = np.array([[3,2],[1,0]])
In [1]: a[b]
Out[1]:
array([[6, 4],
...When building the kdict from pindex and kindex something of the following form must be done (a==kindex, b==pindex):
a = np.arange(16)*2
b = np.array([[3,2],[1,0]])
In [1]: a[b]
Out[1]:
array([[6, 4],
[2, 0]])
Currently, this is solved using a hack:
p.apply_scalar_function(lambda z: obj[z])
This functionality could easily be added to the get_data interface.
Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/9Semi-advanced indexing is not recognized2017-07-06T05:50:23ZTheo SteiningerSemi-advanced indexing is not recognized a = np.arange(24).reshape((3, 4,2))
obj = distributed_data_object(a)
Semi-advanced indexing
a[(2,1,1),1]
yields
array([[18, 19],
[10, 11],
[10, 11]])
The ``indexinglist'' scheme in d2o expects... a = np.arange(24).reshape((3, 4,2))
obj = distributed_data_object(a)
Semi-advanced indexing
a[(2,1,1),1]
yields
array([[18, 19],
[10, 11],
[10, 11]])
The ``indexinglist'' scheme in d2o expects either scalars or numpy arrays as tuple elements and therefore:
obj[(2,1,1),1] -> AttributeError
However,
obj[np.array((2,1,1)), 1]
works.
Solution: Parse the elements and in doubt cast them to numpy arrays.
Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/10Profile the d2o.bincount method2017-07-06T05:50:23ZTheo SteiningerProfile the d2o.bincount methodThe d2o.bincount method scales well with MPI parallelization but compared to single-core np.bincount has a rather big overhead. The d2o.bincount method scales well with MPI parallelization but compared to single-core np.bincount has a rather big overhead. Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/11Move `flatten` method into the distributor.2017-07-06T05:50:23ZTheo SteiningerMove `flatten` method into the distributor.At the moment `flatten` is performed by the distributed_data_object itself. Thereby it assumes, that flattening the local arrays produces the right result. In general with arbitrary distribution strategies this is wrong. At the moment `flatten` is performed by the distributed_data_object itself. Thereby it assumes, that flattening the local arrays produces the right result. In general with arbitrary distribution strategies this is wrong. Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/12Add `source_rank` parameter to d2o.set_full_data()2017-07-06T05:50:23ZTheo SteiningerAdd `source_rank` parameter to d2o.set_full_data()A source_rank parameter should be added to the distributed_data_object in order to specify on which node the source data-array resides on. A source_rank parameter should be added to the distributed_data_object in order to specify on which node the source data-array resides on. Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/13Add `axis` keyword functionality to unary methods.2017-07-06T05:50:23ZTheo SteiningerAdd `axis` keyword functionality to unary methods.Many numpy functions support the `axis` keyword in order to perform an operation only along certain directions of the array. The current implementation of d2o does not support this, e.g. for `all`, `any`, `sum`, etc...
Related to: the...Many numpy functions support the `axis` keyword in order to perform an operation only along certain directions of the array. The current implementation of d2o does not support this, e.g. for `all`, `any`, `sum`, etc...
Related to: theos/NIFTy#3https://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/14d2o: _contraction_helper does not work when using numpy keyword arguments2017-07-06T05:50:22ZTheo Steiningerd2o: _contraction_helper does not work when using numpy keyword argumentsThe `_contraction_helper` passes keyword arguments to the underlying numpy functions (axis=, keepdims=). The result of the _contraction_helper's local computation is then an array and not a scalar. Therefore the dtype check fails.
Fi...The `_contraction_helper` passes keyword arguments to the underlying numpy functions (axis=, keepdims=). The result of the _contraction_helper's local computation is then an array and not a scalar. Therefore the dtype check fails.
Fix: After solving theos/NIFTy#2, adopt to the case that the local run's result object is an array and make a further distinction of cases, i.e for something like axis=0 for the slicing_distributor. Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/15d2o: Contraction functions rely on non-degeneracy of distribution strategy2017-07-06T05:50:22ZTheo Steiningerd2o: Contraction functions rely on non-degeneracy of distribution strategySeveral methods of the distributed_data_object rely on the fact, that the distribution strategy behaves as if the local data was non-degenerate. Currently the non-distributor fixes this by returning trivial (local) results in the _allgat...Several methods of the distributed_data_object rely on the fact, that the distribution strategy behaves as if the local data was non-degenerate. Currently the non-distributor fixes this by returning trivial (local) results in the _allgather and the _Allreduce_sum method.
Affected d2o methods are at least: _contraction_helper, mean
Fix: Move the functionality for sum, prod, etc... into the distributor.Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/16Add out-array parameter to numerical d2o operations.2017-07-06T05:50:22ZTheo SteiningerAdd out-array parameter to numerical d2o operations.Numpy supports to specify an out array in order to avoid memory reallocation.
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
# slow:
a = a + b
# fast:
np.add(a,b,out=a)
Numpy supports to specify an out array in order to avoid memory reallocation.
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
# slow:
a = a + b
# fast:
np.add(a,b,out=a)
Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/17Add axis keyword to obj.vdot2017-07-06T05:50:22ZTheo SteiningerAdd axis keyword to obj.vdotTheo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/18Improve obj.searchsorted such that it supports scalar, numpy arrays and distr...2017-07-06T05:50:22ZTheo SteiningerImprove obj.searchsorted such that it supports scalar, numpy arrays and distributed_data_objects as input......and return a distributed_data_object if the input was distributed.
...and return a distributed_data_object if the input was distributed.
Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/19reshaping -> enfold/defold2017-07-06T05:50:22ZTheo Steiningerreshaping -> enfold/defoldenfold in the slicing distributor relies on the fact, that the global axes correspond to the local array axes, as it operates on local data from `get_local_data()`.
In general this is not true for generic distribution strategies. enfold in the slicing distributor relies on the fact, that the global axes correspond to the local array axes, as it operates on local data from `get_local_data()`.
In general this is not true for generic distribution strategies. Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/20Initialization: prefer global_shape/local_shape over shape of data2017-07-06T05:50:22ZTheo SteiningerInitialization: prefer global_shape/local_shape over shape of dataRight now, durring initialization if some init-data is provided, the shape of this data is prefered over an explicitly given shape.
-> Change the behavior in Distributor-Factory.
-> Make a disperse_data instead of a distribute_data a...Right now, durring initialization if some init-data is provided, the shape of this data is prefered over an explicitly given shape.
-> Change the behavior in Distributor-Factory.
-> Make a disperse_data instead of a distribute_data at the end of init.
Theo SteiningerTheo Steiningerhttps://gitlab.mpcdf.mpg.de/ift/D2O/-/issues/21Create dedicated object for 'distribution_strategy'2017-07-06T05:50:22ZTheo SteiningerCreate dedicated object for 'distribution_strategy'Only global-type distribution strategies are comparable by their name directly. In order to compare local-type distribution strategies as well -> implement an object which represents the distribution strategy. For 'freeform' this include...Only global-type distribution strategies are comparable by their name directly. In order to compare local-type distribution strategies as well -> implement an object which represents the distribution strategy. For 'freeform' this includes the individual slices lengths. Theo SteiningerTheo Steininger