normalize descent direction for SteepestDescent and VL_BFGS (but not for RelaxedNewton!)
Following a discussion with Jakob, it appears that the length of the descent direction vector is only meaningful for RelaxedNewton. The other two minimizers can get into trouble (e.g. by stepping to extreme coordinates and causing overflows) when supplied with very long descent vectors, so I'm normalizing their descent directions. A good step length should then be found by the line search algorithm.
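
For illustration only, here is a minimal Python sketch of the kind of normalization described above. It is not the code from this change: the helper name `normalized_direction` and the choice to return a zero vector unchanged are assumptions made for the example.

```python
import math


def normalized_direction(direction):
    """Return `direction` scaled to unit Euclidean length.

    Hypothetical helper, not part of the project. A zero vector is
    returned unchanged, since it has no direction to preserve.
    """
    # math.hypot avoids the intermediate overflow that squaring very
    # large components by hand could cause (multi-argument form needs
    # Python 3.8+).
    norm = math.hypot(*direction)
    if norm == 0.0:
        return list(direction)
    return [x / norm for x in direction]


# Only the direction survives; the line search is then responsible
# for choosing how far to move along it.
unit = normalized_direction([3e200, 4e200])
```

With a unit-length direction, the step size chosen by the line search equals the actual distance moved, which keeps a single very long descent vector from pushing the coordinates to extreme values.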