This is a great idea and here is what I suggest:
VPN into a local PC on the same network as the ATEM, and use ATEM Software Control on that PC to confirm it can see the ATEM on the network. Once the software connects, you have control of the ATEM.
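If ATEM Software Control does not find the switcher, a quick reachability check from that local PC can help separate a network problem from a software problem. Below is a minimal Python sketch that just wraps the operating system's `ping` command. The function names are my own, and the address you pass in is whatever IP your ATEM is actually set to; none of this is part of any Blackmagic tool.

```python
import platform
import subprocess


def build_ping_command(host, count=3, system=None):
    """Return the ping argument list for this OS.

    Windows ping uses -n for the packet count; Linux and macOS use -c.
    """
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def atem_is_reachable(host, count=3, timeout_s=10):
    """Return True if ping to the given host exits successfully."""
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # ping took too long, or no ping executable was found on this PC
        return False
    return result.returncode == 0
```

Call `atem_is_reachable("<your ATEM's IP address>")` on the local PC. A successful ping only tells you the address answers on the network; ATEM Software Control actually connecting is still the real test.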
Connect the ATEM's USB out to the same local PC you are using to control the ATEM. (I am assuming we are talking about the ATEM Mini line; if not, you will need a USB video capture card instead.) Which software you use to send the audio and video back to the remote operator is then your choice. I would suggest Skype because a) it has very low latency and works in both directions, and b) it lets the remote operator send audio back to the PC, which gives you a talkback channel in production terms. That return audio can be routed from the PC sound card back into the audio chain (for example into an audio mixer, to talk back to your camera people or the talent on location) should you need this functionality.
NDI does not play nicely with ATEM gear in general, and it also adds latency because the audio and video have to be encoded and then decoded. By default this puts the remote operator many milliseconds, if not seconds, behind the live action, so I would avoid NDI altogether.
Hope that helps.