If this is how Hugin/Enfuse do alignment (Hugin's alignment method is based on detection and matching of distinctive local patterns of pixel values, i.e. "control points"), then I think it might just be a coincidence that Hugin/Enfuse "worked better" than Zerene. Feature extraction depends heavily on texture. Back in 2008 I did some work on this: I used feature extraction to match between images, and applied that to focus stacking, image stabilization, super-resolution, etc. One key point of failure in all of them is a lack of "features", or an image that is out of focus. At the time, computing power was still limited, so the more advanced algorithms were too slow for real-time computer vision work. For image stabilization, for example, I was merely computing a first-order derivative at each pixel and running a simple matching algorithm, which was fast enough for real-time stabilization.
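To make the idea concrete, here is a minimal sketch of that kind of derivative-based alignment: compute first-order derivatives (gradients) of two frames, then brute-force search for the integer shift that best matches them. This is only an illustration of the general technique, not the actual 2008 code or Hugin's algorithm; the function names and the NumPy-based approach are my own assumptions.

```python
import numpy as np

def gradient_magnitude(img):
    # First-order derivatives (finite differences) in y and x,
    # combined into a single gradient-magnitude map.
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def estimate_shift(ref, cur, max_shift=5):
    # Brute-force search for the integer (dy, dx) that, applied to
    # `cur`, best aligns its gradient map with that of `ref`
    # (minimum mean squared difference).
    g_ref = gradient_magnitude(ref)
    g_cur = gradient_magnitude(cur)
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(g_cur, (dy, dx), axis=(0, 1))
            err = np.mean((g_ref - shifted) ** 2)
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

# Synthetic check: a well-textured frame shifted by a known amount.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
cur = np.roll(ref, (-3, 2), axis=(0, 1))   # camera "moved" the frame
print(estimate_shift(ref, cur))            # recovers the inverse shift
```

Note the failure mode called out above: on a flat, textureless (or badly out-of-focus) frame, the gradient map is nearly zero everywhere and every candidate shift scores about the same, so the estimate becomes meaningless. That is exactly why texture matters so much for any feature- or gradient-based method.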
Of course, computer vision has advanced so much since then that almost anything is possible now, so it's time to dive back into this.