Google was quite 'involved' in adding voice chat integration to XMPP. They seemed to be pushing their solution as an 'industry standard' (which would have been great), but they never even provided a proper specification and kept breaking things that had already been agreed on. This actually slowed down the process and ended with a few unofficial, incompatible extensions for voice chat.
Also, the very foundation of the XMPP protocol ('XML streams') was quite an unfortunate choice, making implementations difficult and inefficient. I know that because I have written three different clients, each time fighting XML parsers to make them do what XMPP expects (which is not what XML was designed for). And each extension to the protocol made things more complicated and more verbose, which didn't make adding features like voice chat any easier.
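To illustrate the mismatch, here is a minimal Python sketch, not taken from any real client. An XMPP session is one long-lived XML document whose root element stays open for the whole connection, so a parser that waits for a complete document never returns anything. The client has to parse incrementally and treat each completed child of the root as a stanza. The helper name `iter_stanzas` and the example address are made up for illustration; only the standard library's `XMLPullParser` is assumed.

```python
from xml.etree.ElementTree import XMLPullParser


def iter_stanzas(chunks):
    """Yield each completed direct child of the stream root (a stanza)."""
    parser = XMLPullParser(events=("start", "end"))
    depth = 0  # 0 = outside the root, 1 = inside <stream:stream>
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # An element just closed directly under the root.
                    yield elem


# Network reads split the data at arbitrary points, and the root
# <stream:stream> element is never closed while the session is live.
chunks = [
    "<stream:stream xmlns='jabber:client' "
    "xmlns:stream='http://etherx.jabber.org/streams'>",
    "<message to='a@example.org'><bo",
    "dy>hi</body></message><presence",
    "/>",
]

stanzas = list(iter_stanzas(chunks))
for stanza in stanzas:
    print(stanza.tag)
```

Even this toy version has to track nesting depth by hand, and it ignores things a real client must handle, such as stream restarts after TLS or SASL negotiation and freeing stanzas that pile up under the still-open root.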
But e-mail is sent from one entity to another, through servers providing service for one party or the other. Most Lemmy and Mastodon activities are broadcast publicly and can be received and collected by any federated server.