Given that both names are flashed up, it wouldn't be the same trick, demonstration, experiment, whatever, without the visual cues, because the input would be monomodal, not multimodal.
Also, it's not "proving" anything. It's demonstrating an analog of the McGurk effect, that is, multimodal stimuli, and the results suggest different people react differently, no doubt due to differences in stimuli 3 to X. To prove anything, this would have to be done under controlled conditions, with participants selected according to what you were attempting to assert, such as a group of similar age but differing hearing, as verified by lab hearing tests, or comparisons between age groups. All samples, of course, need to be of sufficient size to be statistically significant, and selected to eliminate confounding factors (like educational level, native language, etc). The latter because some populations, like those raised on tonal languages, or with very different common pronunciations of certain letters (like the W or V sounds), will differ in their perception (and expectation) of what results typically come from certain base sounds.
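To put a rough number on the "sufficient size" point, here's a quick back-of-the-envelope power calculation using the standard two-proportion z-test approximation. All the response rates below are invented purely for illustration, not taken from any actual McGurk study:

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_n(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-proportion z-test (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_b = NormalDist().inv_cdf(power)           # critical value for desired power
    p_bar = (p1 + p2) / 2                       # pooled proportion under H0
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# e.g. hoping to detect a 50% vs 70% "fused percept" rate between two groups
print(required_n(0.5, 0.7))  # 93 per group
```

Even for a fairly large effect like that, you need close to a hundred people per group, which is why casual social-media demos can only suggest, never prove.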
But what's being demonstrated, ultimately, is that hearing, or more specifically speech recognition, is affected by more than just pure sound, and that we "hear" speech (and many other sounds) in our brains, not our ears.