0.25.0:
- general:
  - try to fix the package on 32bit platforms

- nufft:
  - significant performance and accuracy improvements

- wgridder:
  - recalculated kernels, improved error model
  - small performance tweaks


0.24.0:
- general:
  - work around a compilation problem with gcc 7

- nufft:
  - beginnings of a non-uniform FFT module
    Conventions are closely following the FINUFFT library.
    The interface is not finalized yet.

- wgridder:
  - improved pre-sorting of visibilities


0.23.0:
- general:
  - improved template code for multi-array operations (internal detail)

- fft:
  - fix a bug in multi-D Hartley transform which was introduced in ducc0 0.21.
    This bug was triggered in cases with two or three transformed axes and at
    least one untransformed axis.
  - use clear dual-license headers in all files required for the FFT component

- healpix:
  - input arrays to all functions can now be float32/int32 as well

- wgridder:
  - performance tweaks to FFT and kernel evaluation parts; performance gain
    on the order of 10%.


0.22.0:
- general:
  - many internal cleanups and consistency improvements
  - preparations for release as an Alpine Linux package

- fft:
  - re-introduce plan caching. This is possible since plans for large 1D
    transforms no longer scale with the length of the transform, but only
    with its square root, limiting the memory overhead
  - code tweaks to improve copying steps for multi-D transforms (basically a
    workaround for mis-optimizations by gcc)


0.21.0:
- general:
  - support for more platforms (e.g. Raspberry Pi)
  - rewrite of the classes for multidimensional array views, which allows
    many simplifications, multithreading etc.

- fft:
  - low-level tweaks which accelerate internal function calls; this especially
    helps multi-D transforms with short axis lengths
  - genuine Hartley transforms over 2 and 3 axes no longer require big temporary
    arrays
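
    For readers unfamiliar with the term: a "genuine" multi-D Hartley
    transform uses the kernel cas(phi) = cos(phi) + sin(phi) of the summed
    phase over all axes, whereas a separable one applies 1D Hartley
    transforms axis by axis. The NumPy sketch below only illustrates these
    definitions; it is not how ducc0 computes them (it deliberately goes
    through a full complex temporary, which is what the change above avoids),
    and the function names are hypothetical.

```python
import numpy as np


def genuine_hartley_reference(x):
    """Genuine multi-D Hartley transform of a real array (naive reference).

    Uses the identity H = Re(F) - Im(F), where F is the complex FFT over
    all axes with the usual exp(-i*phi) sign convention.
    """
    f = np.fft.fftn(np.asarray(x, dtype=float))
    return f.real - f.imag


def separable_hartley_reference(x):
    """Separable Hartley transform: a 1D Hartley transform along every axis."""
    h = np.asarray(x, dtype=float)
    for axis in range(h.ndim):
        f = np.fft.fft(h, axis=axis)
        h = f.real - f.imag
    return h
```

    Both variants are their own inverse up to a factor equal to the number
    of array elements, but they generally give different results for arrays
    with more than one dimension.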

- healpix:
  - multithreading support for most functions


0.20.0:
- general:
  - minimum required Python version is now 3.7
  - tests: retire Ubuntu 18, improve tests with icpx
  - fix compilation failure on non-x86 platforms

- fft:
  - allow individual, compile-time, selection of SIMD types to be used

- sht/healpix:
  - prepare better support for Healpix pixelization

- misc:
  - convenience function for building numpy arrays without critical strides


0.19.0:
- general:
  - binary wheels can now be built and uploaded to PyPI; the installation
    instructions have been updated accordingly. Please provide feedback in case
    of problems!

- fft:
  - C++ sources for FFT calculation now have their own subdirectory.
  - new function `r2r_fftw`, which supports FFTW's halfcomplex storage scheme.
  - new function `convolve_axis`, which performs efficient convolution of arrays
    with arbitrary 1D kernels, optionally followed by zero-padding/truncation.
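
    As background on the halfcomplex storage scheme: FFTW documents it as
    storing a length-n real-to-complex result in n real numbers, ordered
    r0, r1, ..., r(n/2), i((n+1)/2-1), ..., i2, i1. The NumPy sketch below
    rearranges the output of `numpy.fft.rfft` into that order. It only
    illustrates the layout and does not call `r2r_fftw`; the helper name is
    hypothetical, and it is an assumption that `r2r_fftw` follows exactly
    this ordering.

```python
import numpy as np


def rfft_to_halfcomplex(x):
    """Return the forward real FFT of a 1D array in FFTW's halfcomplex order."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    c = np.fft.rfft(x)  # n//2 + 1 complex coefficients
    out = np.empty(n)
    # Real parts r0 ... r(n/2) come first.
    out[: n // 2 + 1] = c.real
    # Imaginary parts follow in reverse order; i0 (and i(n/2) for even n)
    # are always zero and are therefore not stored.
    out[n // 2 + 1:] = c.imag[(n + 1) // 2 - 1: 0: -1]
    return out
```

    For example, the input [1, 2, 3, 4] has the full FFT
    [10, -2+2j, -2, -2-2j], so its halfcomplex form is [10, -2, -2, 2].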


0.18.0:
- sht:
  - implement adjoint_analysis
    CAUTION: this is still really experimental!

- wgridder:
  - improve cost model, assuming that the FFT component will not scale perfectly


0.17.0:
- general:
  - more information available on PyPI

- fft:
  - performance tweaks for 1D FFTs
  - reduced memory overhead for 1D FFTs
  - multithreading support for 1D FFTs (this is only advantageous for very long
    transforms at the moment)

- sht:
  - interface for fully general SHTs is now accessible from Python; this is not
    completely finalized, however.
  - improved a_lm rotation performance


0.16.0:
- general:
  - the GIL is now released in many more functions

- fft:
  - very long 1D transforms now have a lower memory overhead and should be
    faster

- sht:
  - a_lm rotation is now much more accurate, but slightly slower
  - the improved spherical harmonic analysis capabilities are now documented

- misc:
  - two new convenience functions vdot() and l2error() were added


0.15.0:
- general:
  - the code is now compiled with the "-fvisibility=hidden" flag, which reduces
    the size of the resulting binary.
  - demo codes were adjusted to use the new SHT interface.

- fft:
  - added some functions to reduce the amount of unnecessary memory
    allocations and data copying.

- sht:
  - it is no longer necessary to pre-allocate an array for the output of the
    `sht.experimental.*2d*` functions. If not provided, the functions will
    create the array automatically now, which requires passing of new `ntheta`
    and `nphi` parameters in some cases.
  - the `sht.experimental.*2d*` functions now take an optional `mmax` parameter
    which can be used to limit the maximum azimuthal moment of a transform.
    If not supplied, the code assumes that `mmax` is equal to `lmax`.
  - added some unit tests for the new SHT interface.
  - reduced memory overhead for some of the `sht.experimental.*2d*` functions.

- misc:
  - added functionality (originally from Planck Level-S) to simulate time
    streams of detector noise.


0.14.0:
- general:
  - ducc0.__version__ is now also defined under Windows

- sht:
  - further performance improvements
  - added functions for manipulation of "2D maps", i.e. maps consisting of
    (ntheta*nphi) pixels with equidistant pixels in phi, and rings distributed
    along theta according to one of the CC, DH, F1, F2, GL, MW, MWflip schemes.

- totalconvolve:
  - bug fix in the adjoint convolution: results were inadvertently conjugated


0.13.0:
- general:
  - more comprehensive references in README.md

- sht:
  - bug fixes
  - tweaks to the experimental interface for extracting moments up to lmax
    from maps with only lmax+1 or lmax+2 equidistant rings.


0.12.0:
- general:
  - update installation instructions in README.md

- sht:
  - expose functionality for computing gradient maps from spherical harmonic
    coefficients


0.11.0:
- general:
  - beginning of Doxygen documentation for the C++ part
  - fixes to the #include statements in header files; now every header can be
    included in isolation.
  - some CI streamlining


0.10.0:
- general:
  - HTML documentation generation using Sphinx
    Up-to-date documentation for the ducc0 branch is available at
    https://mtr.pages.mpcdf.de/ducc/.
  - more and improved docstrings
  - SIMD datatypes are now much more compatible with C++'s upcoming SIMD types.
    The code can be compiled with the types from <experimental/simd> if
    available, with very small manual changes.
  - reshuffling and renaming of files

- fft:
  - 1D transforms have been rewritten using a much more flexible class hierarchy
    which allows more optimizations. For example 1D FFTs can now be partially
    multi-threaded and the Bluestein algorithm can be used as a single pass
    instead of just replacing a whole transform.

- sht:
  - design of a new SHT interface. Parts of this interface are made visible
    from Python, in the "sht.experimental" submodule. The "sharpjob_d"-based
    interface will be kept for compatibility purposes until ducc1 is released.
  - experimental support for spherical harmonic analysis that only requires
    lmax+1 or lmax+2 equidistant rings for exact analysis up to lmax.
  - misc.rotate_alm was moved to the sht submodule.

- totalconvolve:
  - interface change to synchronize it better with the upcoming SHT interface.
    Basically, if an array has a "number of components" axis, this is now
    always in first place.
    Strictly speaking this is an interface-breaking change, but to the best of
    my knowledge the interface in question has not been used in other projects
    yet.


0.9.0:
- general:
  - improved and faster computation of Gauss-Legendre nodes and weights
    using Ignace Bogaert's implementation (https://doi.org/10.1137/140954969,
    https://sourceforge.net/projects/fastgausslegendrequadrature/)
  - Intel OneAPI compilers are now supported
  - new accepted value "none-debug" for DUCC0_OPTIMIZATION

- wgridder:
  - fixed a bug which could cause memory accesses beyond the end of an array

- fft:
  - slightly improved buffer re-use

- misc:
  - substantially faster a_lm rotation code based on Mikael Slevinsky's
    FastTransforms package (https://github.com/MikaelSlevinsky/FastTransforms)


0.8.0:
- general:
  - compiles and runs on MacOS 11
  - choice of various optimization and debugging levels by setting
    the DUCC0_OPTIMIZATION variable before compilation.
    Valid choices are
    "none":
      no optimization or debugging, fast compilation
    "portable":
      Optimizations which are portable to all CPUs of a given family
    "portable-debug":
      same as above, with debugging information
    "native":
      Optimizations which are specific to the host CPU, non-portable library
    "native-debug":
      same as above, with debugging information
    Default is "native".

- wgridder:
  - more careful treatment of u,v,w-coordinates and phase angles, leading to
    better achievable accuracies for single-precision runs
  - performance improvements by making the computed interval in "n-1" symmetric
    around 0. This reduces the number of required w planes significantly.
    Speedups are bigger for large FOVs and when FFT is dominating.
  - allow working with dirty images that are shifted with respect to the phase
    center. This can be used for faceting and incorporating DDEs.
  - new optional flag "double_precision_accumulation" for gridding routines,
    which causes accumulation onto the uv grid to be done in double precision,
    regardless of input and output precision. This can be helpful to avoid
    accumulation errors in special circumstances.

- pointingprovider:
  - improved performance via vectorized trigonometric functions


0.7.0:
- general:
  - compilation with MSVC on Windows is now possible

- wgridder:
  - performance (especially scaling) improvements
  - oversampling factors up to 2.5 supported
  - new, more flexible interface in submodule `wgridder.experimental`
    (subject to further changes!)

- totalconvolve:
  - now performs non-equidistant FFT interpolation also in psi direction,
    making it much faster for large kmax.
  - new low-level interface which allows flexible re-distribution of work
    over MPI tasks (responsibility of the caller)


0.6.0:
- general:
  - multi-threading improvements contributed by Peter Bell

- wgridder:
  - new, smaller internal data structure


0.5.0:
- wgridder:
  - internally used grid size is now chosen automatically, and the parameters
    "nu" and "nv" are ignored; they will be removed in ducc1.


0.3.0:
- general:
  - The package should now be installable from PyPI via pip even on MacOS.
    However, MacOS >= 10.14 is required.

- wgridder:
  - very substantial performance and scaling improvements


0.2.0:
- wgridder:
  - kernels are now evaluated via polynomial approximation, allowing much
    more freedom in the choice of kernel function
  - switch to 2-parameter ES kernels for better accuracy
  - unnecessary FFT calculations are skipped

- totalconvolve:
  - improved accuracy by making use of the new wgridder kernels
  - *INTERFACE CHANGE* removed method "epsilon_guess()"

- pointingprovider:
  - new, experimental module for computing detector pointings from a time
    stream of satellite pointings. To be used by litebird_sim initially.