For my Bachelor’s thesis, I investigated whether reinforcement learning outperforms supervised learning in task-completion dialogues. You can find the code for the RL part here and my thesis here.