For Text:

In a Send Response block, add the following activity. The value is the delay in milliseconds, so 5000 pauses for 5 seconds:

[Activity
     Value = 5000
     Type = delay
]
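If you generate these responses from a script, a small helper can render the block for any pause length. This is only a sketch: `delay_activity` is a made-up name, and it assumes `Value` is expressed in milliseconds, as the 5000 in the snippet above suggests.

```python
def delay_activity(seconds: float) -> str:
    """Render a delay activity block for a pause of the given length."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Assumption: Value is expressed in milliseconds.
    milliseconds = round(seconds * 1000)
    return "\n".join([
        "[Activity",
        f"     Value = {milliseconds}",
        "     Type = delay",
        "]",
    ])


print(delay_activity(5))
```

Calling `delay_activity(5)` prints the same block as shown above.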

For Voice:

In the SSML voice response, add a <break time="3s"/> element at the point where you want the pause, and adjust the duration as needed. For example:


<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <prosody rate="0%" pitch="0%">
      <break time="3s"/>This is where I would connect to the Salesforce Einstein bot.
    </prosody>
  </voice>
</speak>
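If the message text is dynamic, you could build the SSML in code rather than by hand. The following is a minimal sketch, not an official API: `ssml_with_pause` and its parameters are hypothetical names, and the voice defaults to the one used in the example above. It writes the pause in milliseconds (SSML accepts both `s` and `ms` units), escapes the message text, and omits the optional `prosody` wrapper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def ssml_with_pause(text, pause_seconds=3.0,
                    voice="en-US-JennyNeural", lang="en-US"):
    """Return an SSML document that pauses before speaking `text`."""
    if pause_seconds < 0:
        raise ValueError("pause_seconds must be non-negative")
    # Milliseconds allow fractional-second pauses such as 1.5 s.
    pause_ms = round(pause_seconds * 1000)
    return (
        '<speak xmlns="http://www.w3.org/2001/10/synthesis" '
        f'version="1.0" xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<break time="{pause_ms}ms"/>'
        # escape() protects characters like & and < in the message.
        f'{escape(text)}'
        '</voice></speak>'
    )


ssml = ssml_with_pause("Connecting you to the bot.", pause_seconds=3)
ET.fromstring(ssml)  # raises ParseError if the output is not well-formed XML
print(ssml)
```

Parsing the result with `ET.fromstring` is a quick way to catch malformed markup before it is sent to the voice channel.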