Here’s a video showcasing this.
Described in words:
The phoneme transition is from [ dh ax ] into [ k ey ]. In the video, it first plays back smoothly and (presumably) as intended, with [ k ey ] descending a whole step from the previous note. However, when the note for [ k ey ] is moved up so that it ascends by a half step instead, the synthesis produces an extra clicking noise just before the [ k ] phoneme.
Although I can probably find a workaround myself, this doesn’t seem like it should happen. I tested the same transition with Koharu Rikka AI (English cross-lingual synthesis) and Tsurumaki Maki AI English Lite, and neither of them produces this click.