Implements a JaxOpt-style optimization for variational inference (VI) in jifty: `OptimizeVI`.

Additionally reimplements `optimize_kl` using it.
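For readers unfamiliar with the pattern, a JaxOpt-style optimizer is an immutable object that exposes `init_state` and `update`. All mutable state is passed explicitly, so the caller drives the loop. The sketch below illustrates the pattern on a toy gradient-descent problem. `ToyGradientDescent` and `ToyState` are hypothetical names, and the sketch does not reproduce the actual `OptimizeVI` interface.

```python
from dataclasses import dataclass
from typing import Callable, NamedTuple

import jax
import jax.numpy as jnp


class ToyState(NamedTuple):
    """Explicit optimizer state, handed back to the caller after every step."""
    nit: int
    value: jax.Array


@dataclass(frozen=True)
class ToyGradientDescent:
    """Hypothetical JaxOpt-style solver: immutable object, explicit state."""
    loss: Callable[[jax.Array], jax.Array]
    step_size: float = 0.1

    def init_state(self, params):
        # Build the initial state without taking a step.
        return ToyState(nit=0, value=self.loss(params))

    def update(self, params, state):
        # One optimization step: returns new parameters and a new state.
        grads = jax.grad(self.loss)(params)
        new_params = jax.tree_util.tree_map(
            lambda p, g: p - self.step_size * g, params, grads
        )
        return new_params, ToyState(nit=state.nit + 1, value=self.loss(new_params))

    def run(self, params, n_steps):
        # Convenience driver; callers may instead loop over `update` themselves.
        state = self.init_state(params)
        for _ in range(n_steps):
            params, state = self.update(params, state)
        return params, state


solver = ToyGradientDescent(loss=lambda x: jnp.sum((x - 3.0) ** 2))
params, state = solver.run(jnp.zeros(2), n_steps=100)
```

Because the state is explicit, a user-written loop can replace the one in `run`, for example to log, checkpoint, or adjust settings between steps.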
As already done in the VC Gaussian, use `vdot` to compute the sum of squared residuals.

@gedenhof Shouldn't there be a cast to real numbers? With `vdot` alone, the result is still complex, with a negligible imaginary part.
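The following is a minimal sketch of a `vdot`-based sum of squared residuals. It includes the real cast raised in the review comment as one possible way to address it. The function name `sum_of_squared_residuals` is hypothetical, and the sketch uses an unweighted residual rather than any specific likelihood.

```python
import jax.numpy as jnp


def sum_of_squared_residuals(residual):
    """Return sum(|r|**2) as a real scalar."""
    # jnp.vdot flattens both inputs and conjugates the first one,
    # so vdot(r, r) == sum(conj(r) * r) == sum(|r|**2).
    ssr = jnp.vdot(residual, residual)
    # For a complex `residual` the result keeps a complex dtype whose
    # imaginary part is zero up to round-off; jnp.real casts it to a real
    # scalar. For real inputs jnp.real is a no-op.
    return jnp.real(ssr)


r = jnp.array([1.0 + 2.0j, 3.0 - 1.0j])
ssr = sum_of_squared_residuals(r)  # |1+2j|^2 + |3-1j|^2 = 5 + 10
```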
Fix the `sample_evi` docstring in `kl.py`.

Give the user more control over how the drawing of samples is mapped. In doing so, restructure the way minimization works.
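One way to hand this choice to the user is to accept a higher-order "map" function. The sketch below assumes that design and uses hypothetical names (`draw_one`, `draw_samples`, `map_fn`, `lax_map`), with a toy Gaussian draw standing in for the actual sample generation. `jax.vmap` vectorizes all draws at once. Wrapping `jax.lax.map` evaluates them sequentially, which lowers peak memory.

```python
import jax
import jax.numpy as jnp


def draw_one(key, std):
    """Hypothetical single-sample draw: zero-mean Gaussian with scale `std`."""
    return std * jax.random.normal(key, (3,))


def draw_samples(key, n_samples, std, map_fn=jax.vmap):
    """Draw `n_samples` samples and let the caller choose how to map.

    `map_fn` takes a per-key function and returns a function over a batch
    of keys, e.g. `jax.vmap` or the sequential `lax_map` below.
    """
    keys = jax.random.split(key, n_samples)
    return map_fn(lambda k: draw_one(k, std))(keys)


def lax_map(f):
    # Sequential mapping via lax.map: one sample at a time inside a loop.
    return lambda xs: jax.lax.map(f, xs)


key = jax.random.PRNGKey(42)
s_vmap = draw_samples(key, 4, 2.0)                  # vectorized
s_lmap = draw_samples(key, 4, 2.0, map_fn=lax_map)  # sequential
```

Both strategies consume the same per-sample keys, so only the evaluation strategy changes, not the samples.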

Approximates tr(log(M)) using tr(log(T)), where T is the projection of M onto the Krylov subspace K(M, v) and v is a sample from the metric M (i.e. v = v_lh + v_pr, where v_lh and v_pr are samples from the likelihood and prior metric, respectively). In addition, the projected sample is constructed by taking v_pr and projecting out the subspace K(M, v) using its eigenbasis. This ensures that both the prior-dominated part of v and the part already covered by tr(log(T)) are projected out.

Unfortunately, it performs much worse than expected.
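The two ingredients described above can be sketched as follows. This assumes a symmetric positive-definite M that is available only through a matrix-vector product. The sketch builds T with k steps of Lanczos, a standard choice for symmetric M, and that choice is made here rather than taken from the text. Full reorthogonalization keeps the basis orthonormal. `lanczos` and `krylov_logdet_and_projection` are hypothetical names. The text does not specify how tr(log(T)) and the projected sample are combined into a final estimate, so the sketch stops at computing both.

```python
import jax
import jax.numpy as jnp


def lanczos(matvec, v, k):
    """k-step Lanczos with full reorthogonalization.

    Returns an orthonormal basis Q (n x k) of K_k(M, v) and the tridiagonal
    T = Q^T M Q (k x k). Assumes no breakdown (beta > 0) before step k.
    """
    basis, alphas, betas = [v / jnp.linalg.norm(v)], [], []
    for j in range(k):
        w = matvec(basis[j])
        alphas.append(jnp.dot(basis[j], w))
        Q = jnp.stack(basis, axis=1)
        # Orthogonalize against all previous vectors (twice for stability);
        # this also removes the alpha_j q_j and beta_{j-1} q_{j-1} terms.
        for _ in range(2):
            w = w - Q @ (Q.T @ w)
        if j == k - 1:
            break
        beta = jnp.linalg.norm(w)
        betas.append(beta)
        basis.append(w / beta)
    Q = jnp.stack(basis, axis=1)
    off = jnp.array(betas)
    T = jnp.diag(jnp.array(alphas)) + jnp.diag(off, 1) + jnp.diag(off, -1)
    return Q, T


def krylov_logdet_and_projection(matvec, v, v_pr, k):
    Q, T = lanczos(matvec, v, k)
    evals, U = jnp.linalg.eigh(T)
    # tr(log(T)) is the sum of the log-eigenvalues of the k x k projection.
    tr_log_T = jnp.sum(jnp.log(evals))
    # Eigenbasis of the Krylov subspace, expressed in the full space.
    E = Q @ U
    # Remove the Krylov-subspace component from v_pr.
    v_pr_perp = v_pr - E @ (E.T @ v_pr)
    return tr_log_T, v_pr_perp


# Usage on a small SPD matrix; v and v_pr are random stand-ins.
n = 6
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
A = jax.random.normal(k1, (n, n))
M = A @ A.T + n * jnp.eye(n)
v = jax.random.normal(k2, (n,))
v_pr = jax.random.normal(k3, (n,))
tr_log_T, v_pr_perp = krylov_logdet_and_projection(lambda x: M @ x, v, v_pr, k=3)
```

When k = n and no breakdown occurs, Q is square and orthogonal, so tr(log(T)) equals log det(M) exactly in exact arithmetic. For k < n, it is only an approximation.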

Change the transition operator to a transition function and update the corresponding tests.