I think you should pay more attention to real singers. They’re rarely perfectly understandable. That’s why we have mondegreens.
Taylor Swift is a human singer, and yet in Blank Space, everyone thinks she’s saying “all the lonely Starbucks lovers” instead of “got a long list of ex-lovers.” This is probably because of the weird cadence placing unnatural emphasis on “of.” Western Vocaloid producers tend to ignore cadence when writing lyrics. If all the words ARE emPHAsized weirdLY, that makes it difficult to understand.
Another major contributor to whether lyrics are comprehensible is predictability. Early Fall Out Boy songs are known for their hard-to-make-out lyrics. If you listen to their song Lake Effect Kid (a new studio version of a song they wrote a long, long time ago), it’s almost impossible to understand because the lyrics are unusual and every line is about a completely new subject. Western Vocaloid producers don’t usually write about typical subjects (love, dancing, etc.), so it’s difficult to predict what will be said.
This is actually one of my issues with vocal synths. There’s too much emphasis on making pronunciation understandable rather than making it sound natural for a singer.