Robots are quickly becoming fixtures of modern life, appearing in contexts from museums to classrooms to train stations. Yet the timing requirements of face-to-face natural language interaction with humans are strict. In fact, they are so strict that some natural language systems implement explicit protocols for displaying hesitation in order to seem more natural. Human speakers in face-to-face interaction rapidly and incrementally integrate syntactic, semantic, and pragmatic information with information from the visual environment and from the words of the utterance itself, interpreting the utterance as it is being spoken. While listening, humans also produce backchannel feedback such as eye-gaze movements and ``uh-huh'' vocalizations. If this feedback is absent or delayed, the discourse seems unnatural and confusion is common. The stage-by-stage natural language processing pipelines often used by robots, however, require that an utterance be completed and a full syntactic parse tree constructed before any semantic understanding can occur. Such systems cannot produce meaningful backchannel feedback during an utterance and struggle to meet the timing requirements of turn-taking. We present TIDE, a timing-sensitive incremental discourse engine that simultaneously and incrementally processes an utterance at the syntactic, semantic, and pragmatic levels while it is still being spoken. TIDE can perform backchannel feedback actions at appropriate times during an utterance, respond to utterances within the bounds of human turn-taking, and even interrupt a speaker when necessary. We argue that TIDE operates at the level of incrementality required for natural language interaction with humans, and we demonstrate its utility as a framework for further extension by implementing a model of reference resolution in a shared visual environment.