Zammo does not currently support uploading files (such as MP3s) directly into UI Builder. To serve an MP3 file as part of an answer, first upload it to a hosting location such as Azure Storage, then copy the file's link and add it in UI Builder.
On voice channels, the MP3 plays immediately if you define the answer to behave that way. To try it, go to https://azure.microsoft.com/en-ca/services/cognitive-services/text-to-speech/#features and scroll down to the section where you can test the service. Select SSML and paste this:
<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US"><voice name="en-CA-ClaraNeural">I will start playing the mp3 now: <audio src="https://file-examples-com.github.io/uploads/2017/11/file_example_MP3_1MG.mp3"></audio></voice></speak>
The snippet plays the MP3 referenced in the src attribute of the audio tag. Replace that URL with the link to your own hosted file.
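If you need to produce this SSML for several files, the snippet above can be generated rather than edited by hand. The following is a minimal sketch in Python, not something Zammo provides: the function name `build_audio_ssml` and its parameters are hypothetical, and the voice and language defaults are simply copied from the snippet above. It escapes the text and URL and checks that the result is well-formed XML before returning it.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def build_audio_ssml(mp3_url, intro_text,
                     voice_name="en-CA-ClaraNeural", lang="en-US"):
    """Return an SSML string that speaks intro_text, then plays mp3_url.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    ssml = (
        '<speak xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="http://www.w3.org/2001/mstts" '
        f'version="1.0" xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice_name)}>'
        f'{escape(intro_text)} '
        # quoteattr escapes characters such as & in query strings
        f'<audio src={quoteattr(mp3_url)}></audio>'
        '</voice></speak>'
    )
    # Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    ET.fromstring(ssml)
    return ssml


if __name__ == "__main__":
    print(build_audio_ssml(
        "https://file-examples-com.github.io/uploads/2017/11/file_example_MP3_1MG.mp3",
        "I will start playing the mp3 now:",
    ))
```

The printed string can be pasted into the SSML test box in the same way as the hand-written snippet.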
On chat channels, the answer is shown as an audio card, which gives the user a player; the user presses play to hear the file.
You can test it by going to https://microsoft.github.io/BotFramework-WebChat/01.getting-started/a.full-bundle/ and typing audio into the chat.
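For reference, an audio card in the Bot Framework is a message attachment with the content type `application/vnd.microsoft.card.audio` and a `media` list holding the file URL. The sketch below builds such a message as a plain Python dictionary. It is an illustration only: whether Zammo emits exactly this payload is an assumption, and the function name `build_audio_card_activity` and the example title are hypothetical.

```python
import json

# Content type used by Bot Framework audio card attachments.
AUDIO_CARD_CONTENT_TYPE = "application/vnd.microsoft.card.audio"


def build_audio_card_activity(mp3_url, title):
    """Return a message activity (as a dict) carrying one audio card.

    Hypothetical helper for illustration; not part of Zammo or any SDK.
    """
    return {
        "type": "message",
        "attachments": [
            {
                "contentType": AUDIO_CARD_CONTENT_TYPE,
                "content": {
                    "title": title,
                    "media": [{"url": mp3_url}],
                    # In chat the user starts playback from the player.
                    "autostart": False,
                },
            }
        ],
    }


if __name__ == "__main__":
    activity = build_audio_card_activity(
        "https://file-examples-com.github.io/uploads/2017/11/file_example_MP3_1MG.mp3",
        "Example audio",
    )
    print(json.dumps(activity, indent=2))
```

Setting `autostart` to `False` reflects the chat behaviour described above, where the user presses play instead of the audio starting on its own.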