Traditional Windows SAPI5 speech synthesis voices are language-specific: each voice is designed and recorded for one particular language. This means that if you feed English text to a French voice, it will read and pronounce it as if it were French, making it sound nonsensical.
Google offers a speech synthesis service as well, available in the Chrome browser and as an API, and it behaves quite differently. If you feed English text to a Dutch Google voice, it pronounces the English correctly but with a strong Dutch accent. This is not possible with traditional text-to-speech.
Please describe both processes and explain to me, like I'm five, the fundamental difference between them that makes this possible.