For Text:

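In a Send Response block, add a delay activity with the following properties. Value is the length of the pause in milliseconds, so 5000 waits 5 seconds before the next message is sent: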

     Value = 5000
     Type = delay
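If the reply is built in code rather than in a designer block, the same pause can be expressed as a list of activities. The Python below is a minimal sketch, assuming the Bot Framework Activity JSON shape, where a "delay" activity's value is the pause in milliseconds; the helper name delay_then_message is hypothetical.

```python
import json

def delay_then_message(text: str, delay_ms: int = 5000) -> list:
    """Return a delay activity followed by a message activity.

    Assumption: the channel follows the Bot Framework Activity schema,
    where a "delay" activity's value is the pause length in milliseconds.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return [
        {"type": "delay", "value": delay_ms},  # pause before the next activity
        {"type": "message", "text": text},     # reply shown after the pause
    ]

print(json.dumps(delay_then_message("Thanks for waiting!"), indent=2))
```

The delay only holds back the activities that follow it, so it has to be placed before the message it is meant to postpone.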

For Voice:

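In the SSML for the voice response, insert a <break> element where the pause should occur, such as <break time="3s"/>, and set the time value to however many seconds you need. The following example pauses for 3 seconds before speaking: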

     <speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
       <voice name="en-US-JennyNeural">
         <prosody rate="0%" pitch="0%">
           <break time="3s"/>This is where I would connect to the Salesforce Einstein bot.
         </prosody>
       </voice>
     </speak>
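When the SSML is generated rather than typed by hand, it helps to build it from a function that escapes the spoken text and checks that the markup is well-formed. The Python below is a minimal sketch using only the standard library; the function name ssml_with_pause is hypothetical, and it omits the prosody element and the mstts/emo namespaces from the example above because a plain break does not need them.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def ssml_with_pause(text, pause_seconds=3, voice="en-US-JennyNeural", lang="en-US"):
    """Return an SSML string that waits pause_seconds before speaking text."""
    if pause_seconds < 0:
        raise ValueError("pause_seconds must be non-negative")
    pause = f"{pause_seconds}s"  # e.g. "3s"; SSML also accepts values like "500ms"
    ssml = (
        f'<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f'<break time="{pause}"/>{escape(text)}'  # escape &, <, > in the spoken text
        "</voice></speak>"
    )
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml

print(ssml_with_pause("This is where I would connect to the Salesforce Einstein bot."))
```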