I’ve written and talked before about how mindfulness practice can support the work of evaluators. Well, would you believe me if I said that mindfulness can be indirectly enhanced through the use of an artificial intelligence technology?
Let me explain how. Also, see the video embedded below.
Generative Pre-trained Transformer 3 (GPT-3) was introduced in mid-2020 by OpenAI, a San Francisco-based artificial intelligence research company. GPT-3 is built on the Transformer, a deep learning neural network architecture that was first introduced in 2017. Shout out to Vasili Shynkarenka, who taught a fantastic course that accelerated my understanding of how to use GPT-3 and its possibilities.
GPT-3 can be used by people who are not proficient in coding because it has a “text in, text out” interface that can complete many types of English-language tasks. It was trained on data from Common Crawl, a regularly updated archive of the web, as well as other web text, books, and Wikipedia. It has 175 billion parameters, compared with 1.5 billion for the previous version, GPT-2, and this increase in scale brings a sizable step up in performance.
Individuals and organizations can use GPT-3 through the OpenAI Playground, a web-based interface, or create applications that use the GPT-3 API in the background. The types of use cases for GPT-3 include: classification, completion, conversation, semantic search, summarization, factual responses, generation, and transformation. Developers have been working on many creative applications of GPT-3; here are some great examples.
GPT-3 has the potential to be dangerous because it can be difficult to tell the difference between its output and text written by people. There are many opportunities for misuse, such as generating spam, committing fraud, and plagiarizing. The developers highlight these potential dangers in their academic paper. Researchers such as Timnit Gebru and colleagues have also identified challenges with large-scale language models like this one, including environmental costs (due to high energy consumption) that may disproportionately affect marginalized communities, and the effects of the racist, sexist, and abusive language that exists in the training data. Clearly, GPT-3 has to be used with care and supervision.
Regardless, in the coming years Transformers such as GPT-3 will likely become ubiquitous because they can be very useful for many text-related tasks. Microsoft, which invested a billion dollars in OpenAI, has obtained an exclusive license to GPT-3, so I imagine it will integrate the model into its products or offer it in other ways soon. Google also recently reported training a language model, called the Switch Transformer, with 1.6 trillion parameters, so there is competition, and growth in this space is likely.
GPT-3 can already provide substantial benefits for professional evaluators working on program evaluation projects. I have also used it for other purposes, including brainstorming learning outcomes and developing multiple-choice questions to test knowledge of a text about Daniel Kahneman’s research on heuristics and biases. It’s been great to have access to this tool.
Discipline of Evaluation
How does all of this relate to evaluation? First off, what is the discipline of evaluation? Various definitions of evaluation have been proposed through the years. In 2014, an American Evaluation Association task force proposed the following definition as a starting point for discussion and elaboration: “Evaluation is a systematic process to determine merit, worth, value or significance.”
I talked with Amy Gullickson about her 2020 article on this topic. Building on the work of other evaluators, she proposed the following definition: “Evaluation is the generation of a credible and systematic determination of merit, worth, and/or significance of an object through the application of defensible criteria and standards to demonstrably relevant empirical facts.”
When people ask me what I do, I usually say something like: “I help to determine how programs that are run by organizations that help people are doing and how to make them better.” If they ask follow-up questions, I use a more formal definition.
But to accomplish this aim, evaluators have to be able to think clearly. Specifically, as Ernest House (2015) argues, it is important for us as evaluators to become aware of our biases in thinking and to take active steps toward mitigating them, drawing on both the research methodology and social psychology literatures.
Mindfulness and the Role of Reflective Questions
But how does all of this relate to mindfulness? First off, what is mindfulness? Jon Kabat-Zinn (2017) described it as follows, “Mindfulness is awareness that arises through paying attention, on purpose, in the present moment, non-judgmentally… in the service of self-understanding and wisdom.”
Some mindfulness-based programs encourage the use of reflective questions to promote mindfulness in participants. Ronald Epstein and his colleagues (2008) describe the habit of self-questioning as important to developing self-monitoring and mindfulness in physicians. They indicated, “Using reflective questions enhances the ability to see familiar situations with new eyes and to self-monitor one’s actions during actual practice.”
This metacognitive practice can make one more aware of bias in one’s own thinking. They also state, “In this way, we can consciously sense our tendency to draw rapid conclusions to quell our anxiety and have a conscious choice to inhibit such automatic reactions from controlling our decision-making processes.”
Cognitive Dissonance as an Example of Biased Thinking
Cognitive dissonance, an example of a process that can lead to bias, occurs when we hold contradictory cognitions or engage in a behavior that is inconsistent with a cognition. To manage the stress or discomfort that is experienced with this dissonant state, we may rationalize or justify how it is possible to have what appear to be incompatible views or behaviors.
An example of cognitive dissonance used by Leon Festinger, the originator of cognitive dissonance theory, is a person who smokes cigarettes despite being confronted with evidence that smoking is harmful. Such a person may initially experience an emotionally uncomfortable dissonant state. This dissonance can be alleviated by changing beliefs, for instance, by devaluing the evidence of harm from research studies or by rationalizing that smoking helps manage stress and therefore has an overall positive effect on cardiovascular health. We all do this, often without awareness, and it can lead to irrational beliefs and poor decisions. I will come back to how reflective questions can be used to mitigate the impact of this process.
Reflective Questions for Evaluators
In their paper, Epstein and colleagues list a sample of the reflective questions they use with physicians in their program, and I thought these could be adapted for professional evaluators. Below are three of their questions along with reworded versions I developed for program evaluators.
Physician: If there were data that I ignored, what might they be?
Evaluator: What are other data sources to which I have not paid attention?
Physician: Is there another way in which I can formulate this patient’s story and/or my response?
Evaluator: Is there another way I can understand what is happening in the evaluation and/or my response?
Physician: What would a trusted peer say about how I am managing or feeling about this situation?
Evaluator: What would a respected colleague say about how I am conducting, managing, and feeling about this evaluation?
Using GPT-3 to Generate Reflective Questions
Using these pairs as examples for what’s called “few-shot learning” (giving the model a handful of examples in the prompt), iteratively adjusting GPT-3’s generation settings, and running the API about 30 times, I got GPT-3 to generate some useful physician and evaluator reflective questions. I am most interested in the evaluator questions, but I have listed the physician ones as well. Here are 10 of the better pairs it generated.
Physician: What aspects of this situation have caused me to feel uncomfortable?
Evaluator: What aspects of this evaluation have caused me to feel uncomfortable?
Physician: What is the most difficult issue or emotion that I have encountered in this clinical situation?
Evaluator: What is the most difficult issue or emotion that I have encountered in this evaluation?
Physician: What is the part of this clinical situation that I find most difficult to understand?
Evaluator: What is the part of this evaluation that I find most difficult to understand?
Physician: What would I tell a colleague to do in my situation?
Evaluator: What would I tell a colleague to do in his or her situation?
Physician: What am I most grateful for in this patient?
Evaluator: What am I most grateful for in this evaluation?
Physician: What can I do to reduce any biases that I may have in this case?
Evaluator: What can I do to reduce any biases I may have in this evaluation?
Physician: What questions have I not asked that I should have asked?
Evaluator: What questions have I not asked that I should have asked?
Physician: What have I done that, if I could do it again, I would do differently?
Evaluator: What have I done that, if I could do it again, I would do differently?
Physician: What have I learned about myself as a physician?
Evaluator: What have I learned about myself as an evaluator?
Physician: What have I learned from this situation?
Evaluator: What have I learned from this evaluation?
This video shows how I used GPT-3 to generate these questions, starting at 8:46.
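The few-shot setup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the prompt I actually used: the layout (labeled pairs separated by blank lines, ending with an open “Physician:” line for the model to continue) and the helper names build_prompt and parse_pairs are hypothetical. The code only assembles the prompt text from the three seed pairs and pulls labeled pairs back out of a completion; the call to GPT-3 itself is left out, since that depends on the API client and settings in use.

```python
from typing import List, Optional, Set, Tuple

# Seed pairs from the post: Epstein et al.'s physician questions and the
# evaluator rewordings. The prompt layout below is an assumption.
SEED_PAIRS: List[Tuple[str, str]] = [
    ("If there were data that I ignored, what might they be?",
     "What are other data sources to which I have not paid attention?"),
    ("Is there another way in which I can formulate this patient’s story and/or my response?",
     "Is there another way I can understand what is happening in the evaluation and/or my response?"),
    ("What would a trusted peer say about how I am managing or feeling about this situation?",
     "What would a respected colleague say about how I am conducting, managing, and feeling about this evaluation?"),
]


def build_prompt(pairs: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: format each pair as labeled lines, separate pairs
    with a blank line, and end with an open 'Physician:' line for the model."""
    blocks = [f"Physician: {p}\nEvaluator: {e}" for p, e in pairs]
    return "\n\n".join(blocks) + "\n\nPhysician:"


def parse_pairs(text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: collect (physician, evaluator) pairs from text in
    the same labeled format. A 'Physician:' line with no following
    'Evaluator:' line is ignored, and exact duplicate pairs are dropped."""
    pairs: List[Tuple[str, str]] = []
    seen: Set[Tuple[str, str]] = set()
    physician: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Physician:"):
            physician = line[len("Physician:"):].strip()
        elif line.startswith("Evaluator:") and physician:
            evaluator = line[len("Evaluator:"):].strip()
            pair = (physician, evaluator)
            if evaluator and pair not in seen:
                seen.add(pair)
                pairs.append(pair)
            physician = None
    return pairs


if __name__ == "__main__":
    print(build_prompt(SEED_PAIRS))
    # Illustrative completion typed by hand (reusing a pair from the list
    # above); a real completion would come back from the API.
    completion = (" What have I learned from this situation?\n"
                  "Evaluator: What have I learned from this evaluation?\n")
    # The model continues after the trailing 'Physician:', so re-attach it.
    print(parse_pairs("Physician:" + completion))
```

Filtering out duplicate and incomplete pairs is only a cheap first pass over repeated runs; picking out the handful of genuinely useful questions, as with the 10 pairs above, still takes human judgment.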
Reflective Questions Can Decrease Bias in Thinking
F. Scott Fitzgerald (1936) said, “The test of a first-rate intelligence is the ability to hold two opposed ideas in the mind at the same time, and still retain the ability to function.” This quote relates to the approach Carol Tavris and Elliot Aronson (2020) suggest for managing cognitive dissonance effectively. They suggest that when we feel the discomfort of dissonance, instead of diminishing it by unreasonably changing a belief or attitude, we should notice the feeling, sit with it, and reflect honestly and openly on the incompatible beliefs and/or behaviors, thereby engaging in a potentially fruitful struggle toward insight and truth.
Reflective questions, such as the ones above, can be used to improve self-awareness and decrease the impact of cognitive bias on judgment and thinking. For instance, the question “What aspects of this evaluation have caused me to feel uncomfortable?” can help identify the kind of discomfort that, according to Tavris and Aronson, signals a dissonant state and, if left unexamined, can lead to the negative effects they describe.
Tavris and Aronson also emphasize that it is essential to hold the mindset that making mistakes is permissible and that making a mistake does not mean we are stupid or inherently bad. They note that although most people say this, many do not really believe it. If we believe that mistakes are unacceptable, cognitive dissonance can lead us to deny that we made a mistake or to justify our behavior in some way. The evaluator questions “What have I done that, if I could do it again, I would do differently?” and “What have I learned about myself as an evaluator?” help cultivate the view that there is always room for improvement and that missteps are natural and are opportunities to learn.
Overall Impression and Next Steps
After using GPT-3 for this purpose, it became clear that it can be a great idea-generation tool, but only with careful user guidance and supervision. Most of what it produced was not valuable, but I think there are some gems in the list of reflective questions above that can help deepen the mindfulness practice of evaluators. You can request access to the GPT-3 Playground here. I am going to keep exploring GPT-3’s utility and will share my findings with the evaluation community. I am also going to use reflective questions more deliberately in my evaluation practice and devise ways of using them more routinely.
Let me know what you think about the reflective questions that were generated and GPT-3!