[2404.09129] When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models