I've abandoned trying to use the espeak-ng speech synthesizer. After many weeks of working on it, it still will not apply stress rules correctly (it adds stress to unstressed words), and I've run into other serious problems, such as "d", "g", and "n" sounding so similar that these consonants are easy to mishear.
Instead, for speaking Cherokee I've switched to a Tacotron 2 TTS system published by Tomáš Nekvinda. While more work definitely needs to be done, I think this is a good start, and it generally sounds much better than espeak-ng. Samples follow.