Replication of “Prescribed Optimism: Is it Right to Be Wrong About the Future?”
Posted on December 31, 2018 in Research Methodologies in Humanities and Science
The main assumption about personal predictions, although untested, has always been that people want to be as accurate as possible. In 2008, Armor et al. explored this assumption and found that personal predictions about the future are in fact often optimistically biased. In this paper, we attempt to replicate their research. We share the study design we adopted, the methods we applied, and the results we obtained, along with our commentary on certain aspects of the original study, its first replication by van’t Veer et al., and our own replication.
accuracy, bias, open science framework, optimism, pessimism, prescription, questionnaire, replication, research
People often make optimistically biased predictions. This rather simple observation can have important implications for psychologists, economists and decision-making theorists interested in rationality and the accuracy of self-knowledge (Armor & Taylor, 2002; Krizan & Windschitl, 2007; Sweeny, Carroll, & Shepperd, 2006). Nevertheless, the usual assumption is that people’s desire to be accurate when making personal predictions takes precedence over optimistic or pessimistic prescriptions.
Surprisingly, this untested assumption that people’s main goal is to be accurate is contradicted by observations. This metacognitive phenomenon was tested by Armor, Massey and Sackett in 2008. Their study found that participants (n = 127) clearly recommended optimistic predictions, t(124) = 10.36, prep > 0.99, p < .001, Cohen’s d = 0.93 (Armor et al., 2008). These findings, as well as those of the replication by van’t Veer, Lassetter, Brandt and Mehta in 2013, support the view that people believe optimistically biased predictions are ideal. Like the original study and its replication, we explore what kinds of predictions (accurate, optimistic or pessimistic) people think one ought to make.
For the experiment, we conducted a survey with the same material as the original study: a questionnaire. We took the texts as provided in the replication material.
Participants are presented with a scenario chosen at random from a pool of four (4) scenarios:
- Award decision: Lisa is nominated for an award.
- Investment decision: Jane has received an inheritance.
- Party decision: Joe is organizing a party.
- Surgery decision: Mr. C. has been diagnosed with a heart condition.
See fig. 4 for a table of the distribution of the scenarios.
For each of these scenarios, there are eight (8) vignettes, which could be described as variations on the scenario. In each vignette, changes are applied to the wording in order to manipulate three (3) variables: commitment, agency, and control (defined in the 3.2.2 Robustness Check section).
For example, when agency is manipulated, one version of the investment scenario has Jane choose what to invest in, while in another version the executor of the estate is responsible for investing the inheritance.
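The count of eight vignettes per scenario follows from crossing three two-level variables (2 × 2 × 2 = 8). The sketch below enumerates these combinations for commitment, agency, and control. The "low"/"high" labels and the function name are assumptions made for illustration, not taken from the questionnaire application.

```python
from itertools import product

# The three manipulated variables; the "low"/"high" labels are an
# assumption made for illustration.
VARIABLES = ("commitment", "agency", "control")
LEVELS = ("low", "high")


def build_vignettes():
    """Return every combination of levels: 2 x 2 x 2 = 8 vignettes."""
    combos = product(LEVELS, repeat=len(VARIABLES))
    return [dict(zip(VARIABLES, combo)) for combo in combos]


vignettes = build_vignettes()
for number, vignette in enumerate(vignettes, start=1):
    print(number, vignette)
```

Each dictionary describes one vignette setting, so a participant assigned to a scenario would see all eight settings of that scenario.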
Before going through the survey, participants answered a Life Orientation Test (LOT), which makes it possible to determine whether they self-identify as optimistic, pessimistic, or neutral.
Participants then read through each of the eight (8) vignettes of the scenario to which they had been randomly assigned. They then answered questions that reveal what they believed to be the best course of action in different contexts:
- Objectively: The question is framed so that it asks what is best to do in general, in life, given such a situation.
- For the protagonist: Questions ask what the protagonist should do, given the situation described in the vignette.
- For themselves: In this case, participants must answer what they would do if they were in the same situation.
At the end of the survey, participants responded to a few demographics questions.
It is important to note that in our replication, like in the one by van’t Veer et al., the main focus was on replicating only the optimism prescription part: participants read the scenario and prescribe how pessimistic, accurate or optimistic the protagonist of the scenario should be. We also ran a robustness check for the three independent variables. We did not try to replicate the descriptive benchmarks about what kind of prediction the protagonist in each vignette would make or what kinds of predictions the participants themselves would make (reflected by two out of the six questions in each vignette).
Unfortunately, when conducting the survey we failed to omit these two questions, which made the questionnaire longer than needed. This in turn might have contributed to participants finding the questionnaire boring and rather repetitive, which may have disturbed their focus and affected how representative their answers were.
2.1 Planned Sampling
The original study of Armor et al. used a rather large sample of 127 participants for the optimism prescription. In the replication of van’t Veer et al., the authors’ power analysis indicates that one would need a sample size of thirty-eight (38) people to replicate the study, assuming a high level of power and a small effect size compared to the original. The authors of the replication study also recommended adding at least ten (10) people to serve as a “cushion” in case of missing data, giving a final number of forty-eight (48) participants as a lower limit for the planned sample size.
In our replication, we managed to acquire forty-two (42) participants in our questionnaire and we decided to take into account the data provided by forty (40) of them, since two participants took an extraordinarily short time (less than four minutes) to complete the survey. Needless to say, we were hoping to acquire a larger sample so that we could infer conclusions for the population of Catalonia. Still, taking into account the power analysis of the first replication, our sample covers a bit more than thirty-eight (38) people and our improvised “cushion” of four people turned out to be enough.
While our project aimed to reproduce the original study and its replication, we were also tasked with adapting some aspects. We noticed that the original study and its replication mostly had university students, many of them at elite universities, as participants. We therefore thought it reasonable to believe that they would likely be optimistic no matter the context of the general population.
Universitat Pompeu Fabra (UPF), where our replication study was conducted, is a public university in Barcelona. Spain is also in a different socio-economic context than the United States or the Netherlands. We originally thought to investigate this aspect to see if there could be indicators of a difference, but we quickly ran into the complex issue of how to compare such contexts without resorting to generalities and approximations.
Instead, we chose to expand our research by offering translations (see the 2.3 Procedure section), and by expanding the demographics questions. Here are a few interesting examples:
- The gender question in both the original study and its replication offered binary options for “male”/“female”. We opted to add the option “other,” which was indeed chosen by a few people (see fig. 6).
- We also expanded the age range of participants as much as possible in order to see if there could be a change in participants’ optimism/pessimism that we could relate to their age.
- We changed the question about what is the highest level of education the participant completed. The replication material had language that was very American-oriented (“college,” “freshman,” “sophomore”), so we adapted to something more local. Since we intended to widen our participants’ age range, we also thought that it could be possible that there would be varied types of education levels, so we included elementary school (“escuela primaria”), high school (“bachillerato”), and the different levels of university (Bachelor’s degree, Master’s degree, PhD).
- We also asked about the income range of the household of the participants, and the number of people living in said households.
Where the original study and its replication asked participants to complete the study in a paper-and-pencil format, we opted to create a web-based questionnaire. Our reasoning was supported by a few arguments:
- Avoiding manual data entry: Before designing our experiment, we hesitated between paper questionnaires, which meant manual data entry into a database afterwards, and software that would save to the database directly. We chose the latter option, and then developed the software. The code of the application has been open-sourced and is publicly available for review.
- Easier and cheaper to fix issues: Working on an online application would also allow us to improve and correct issues along the way as they occurred, at no additional cost. Had we gone the paper way, we would have had to discard the copies already made in case of an error, and pay to have them reprinted.
- Easier to convince participants: The original study had people complete the study in a laboratory. We did not have access to this kind of facility, and we knew this would be time consuming for participants, thus a likely deterrent.
Since our replication took place a year almost to the day after the Catalan independence referendum of 2017, we opted to translate the texts into both Spanish and Catalan to reduce any potential friction due to political beliefs. Pol Ricart translated all questions, responses, and options from the original English text.
In order to invite participants, we created small paper handouts that had a short paragraph in all three languages and a URL to the online questionnaire. These were mostly given out to students at the Universitat Pompeu Fabra (UPF) Poblenou Campus, but also on the Ciutadella Campus.
Moreover, since one of our expansions on the original study was to see if different contexts (socio-economics, age, education) could affect the participants’ responses, we also invited participants from cafés and other locations around the city. Pol Ricart also shared the link with his extended family, in the hopes of getting participants of a wider age range. The other researchers did not recruit their friends and family members because they were not living in Spain, and thus were outside the scope of the research.
2.4 Details About the Questionnaire Web Application
As mentioned above, we opted to build an application for our questionnaire. We originally investigated online services such as SoSci Survey and Google Forms. Unfortunately, at one point or another, these services lacked the flexibility that we desired (e.g. translations, randomization, control over data format, privacy, etc.).
As mentioned above, the source code for the application is available for review and replication from a repository on GitHub.
3.1 Some Statistics
Before we look at the results from our tests, we believe it is interesting to mention a few statistics concerning the participants. It is worth mentioning that in order to respect the privacy of our participants, we did not collect any data that could link their devices to the demographic information they provided us. We ensured their anonymity as much as possible, even though we were asking for potentially sensitive information.
As we explain further in the 3.1.2 Duration section, there were participants whose responses we should discard from our optimism significance tests. However, we chose to keep their responses for analyzing general statistics.
3.1.1 Drop Rate
We tracked the number of questionnaires started by adding an entry to a database when participants clicked on the “Begin” button, and committed another entry when participants clicked on the “Finish” button. This allowed us to discover that there was a drop rate of 42.5% (see fig. 1).
While it could mean that only forty-two (42) of the seventy-three (73) participants that started the questionnaire actually completed it, we believe—based on the various comments and feedback given by participants—that it could also mean that some participants started, abandoned, and then started at another point later that day, or at another time.
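The drop rate can be checked directly from the two counts given in the text (73 questionnaires started, 42 finished). A minimal sketch follows; the function name is only illustrative.

```python
def drop_rate(started: int, finished: int) -> float:
    """Share of started questionnaires that were never finished."""
    if started <= 0:
        raise ValueError("started must be positive")
    if not 0 <= finished <= started:
        raise ValueError("finished must be between 0 and started")
    return (started - finished) / started


# Counts reported in the text: 73 "Begin" entries, 42 "Finish" entries.
rate = drop_rate(started=73, finished=42)
print(f"Drop rate: {rate:.1%}")
```

As noted above, this figure counts questionnaires rather than people, so a participant who abandoned one session and restarted later would contribute to the drop rate.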
3.1.2 Duration
When saving the beginning and the end of the questionnaire, we were also committing the associated timestamps to the database. Thanks to these, we were able to obtain some numbers on how much time participants took to complete our survey. A few items are worth mentioning before moving forward:
- Two (2) participants completed the questionnaire in less than five (5) minutes. With eight (8) vignettes of six (6) questions each, as well as a Life Orientation Test (LOT) and demographics questions, we believe it is unlikely that anyone could answer the questionnaire attentively in such a short time. It is more likely that those two people rushed through the survey just to complete it. As such, we discarded their data from our tests.
- Another participant took over three and a half hours (3h30+) to complete the questionnaire. While the survey asked quite a few questions, we do not think the participant really needed that long. It is more likely that this participant left a browser tab open on the survey while working or doing other tasks.
If we remove the outliers mentioned in the bullet points above, we obtain these values:
- Mean time taken: ~21:20 minutes;
- Median time taken: ~14:45 minutes.
See fig. 2 and fig. 3 to compare the time taken by participants to complete the questionnaire.
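The duration figures above come from begin and finish timestamps with the outliers removed. The sketch below shows one way to derive a mean and a median from such timestamps. The records are hypothetical, the five-minute lower bound is the one mentioned in the text, and the three-hour upper bound is an assumption chosen only to exclude the very long session.

```python
from datetime import datetime, timedelta
from statistics import mean, median

# Hypothetical (begin, finish) timestamp pairs; not the study's data.
records = [
    ("2018-10-02 10:00:00", "2018-10-02 10:03:30"),  # too fast
    ("2018-10-02 10:05:00", "2018-10-02 10:19:00"),
    ("2018-10-02 11:00:00", "2018-10-02 11:12:30"),
    ("2018-10-02 12:00:00", "2018-10-02 12:31:00"),
    ("2018-10-02 13:00:00", "2018-10-02 16:40:00"),  # tab left open
]

FORMAT = "%Y-%m-%d %H:%M:%S"
MIN_DURATION = timedelta(minutes=5)  # threshold mentioned in the text
MAX_DURATION = timedelta(hours=3)    # assumed cut-off for the long outlier


def durations_in_seconds(rows):
    """Convert (begin, finish) strings into elapsed seconds."""
    return [
        (datetime.strptime(end, FORMAT)
         - datetime.strptime(start, FORMAT)).total_seconds()
        for start, end in rows
    ]


def without_outliers(seconds):
    """Keep only durations inside the plausible range."""
    low, high = MIN_DURATION.total_seconds(), MAX_DURATION.total_seconds()
    return [s for s in seconds if low <= s <= high]


def as_minutes_seconds(seconds):
    """Format a number of seconds as M:SS."""
    minutes, rest = divmod(round(seconds), 60)
    return f"{minutes}:{rest:02d}"


kept = without_outliers(durations_in_seconds(records))
print("Mean:", as_minutes_seconds(mean(kept)))
print("Median:", as_minutes_seconds(median(kept)))
```

A mean well above the median, as in the reported values, indicates that a few slow sessions pull the average upwards.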
3.1.3 Scenario Distribution
After completing the Life Orientation Test (LOT), participants were assigned one of four scenarios randomly. Fig. 4 shows that the scenario distribution was mostly equal, with a variation of ±6%.
3.1.4 Age Distribution
As we mentioned in the 2.2 Material section, our experiment differs from the original in that we chose to expand the age range of our participants. Fig. 5 shows that while the majority of our respondents were in their twenties, we were able to obtain a few older participants.
It is likely that we were unable to recruit more older people because most of us did not speak enough Spanish, and spoke no Catalan. As such, it was easier to convince university students of the interest and intent of our research, since many of them spoke English.
3.1.5 Gender Distribution
Our idea of adding another option (“other”) to the genders bore fruit: 3 people (7.1%) chose that option (see fig. 6).
3.1.6 Language Distribution
On the online questionnaire itself, buttons allowed users to swap languages at any moment during the questionnaire. Nevertheless, the vast majority of participants (~71.4%) opted to respond in English (see fig. 7).
3.1.7 Education Level Distribution
A little bit less than half (47.6%) of our respondents were studying at the time they participated in our experiment. Most of these participants had a bachelor’s degree or were currently studying for their Master’s degree (see fig. 8).
3.1.8 Household Income and Size
Our data shows that most of our participants’ household incomes range from 0 to €40,000 per annum (see fig. 9). At the same time, our data also shows that most of our respondents share their household, since most of them live in households of two (2) or three (3) people (see fig. 10).
We acknowledge that a bug in the demographics questions page allowed users to choose a household size of zero (0). A single user chose that option, and it is impossible for us to know if the user forgot to fill that field, and continued since there was no error message, or if that user chose willfully not to provide a valid response.
3.2 Significance Tests
Even though our sample size was greater than n = 30, the median optimism score was greater than the mean. Our sample distribution is therefore not symmetric and is left-skewed (see fig. 11). We performed a preliminary test for normality with an alpha level of 5%, which indicated that our sample distribution differed significantly from the normal distribution (Shapiro–Wilk test: p-value = 0.0152 < 0.05). We therefore used only non-parametric statistical tests to analyze our data.
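The left-skew argument (median above the mean) can be illustrated with a short check using only the Python standard library. The scores are hypothetical and the helper name is illustrative. A Shapiro–Wilk test itself is not part of the standard library, so it is not reproduced here; a statistics package such as SciPy (`scipy.stats.shapiro`) would be needed for that step.

```python
from statistics import mean, median, pstdev

# Hypothetical optimism scores on the -4..+4 scale; not the study's data.
scores = [-2, 0, 1, 1, 2, 2, 2, 2, 3, 3]


def skewness(values):
    """Population skewness: the mean of the cubed standardized deviations."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("skewness is undefined for constant data")
    return sum(((v - centre) / spread) ** 3 for v in values) / len(values)


print("mean:", mean(scores))
print("median:", median(scores))
print("skewness:", round(skewness(scores), 3))
```

A negative skewness together with a median above the mean is the pattern described above: a long tail towards the pessimistic end of the scale.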
We performed a one-sample Wilcoxon signed-rank test to compare optimism prescriptions, ranging from -4 (extremely pessimistic), through 0 (accurate), to +4 (extremely optimistic), against a reference value of 0 (accurate responses). The test shows that the prescriptions in our sample (M = 1.13) differ significantly from 0 (p-value = 0.00996). Hence, we can assume that people’s prescribed predictions tend to be optimistic rather than accurate. Furthermore, our mean estimate corresponds to that of the original study.
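For readers unfamiliar with the test, here is a minimal standard-library sketch of a one-sample Wilcoxon signed-rank test against 0. It uses the normal approximation without tie or continuity correction, so its p-values are only approximate and will not exactly match those of a statistics package (for instance `scipy.stats.wilcoxon`). The data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def wilcoxon_one_sample(scores, reference=0.0):
    """Two-sided signed-rank test of the scores against a reference value.

    Scores equal to the reference are dropped. The p-value comes from
    the normal approximation, without tie or continuity correction.
    """
    diffs = [s - reference for s in scores if s != reference]
    n = len(diffs)
    if n == 0:
        raise ValueError("all scores equal the reference value")
    ranks = average_ranks([abs(d) for d in diffs])
    # Sum of the ranks that belong to positive differences.
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    spread = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - expected) / spread
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, p_value


# Hypothetical prescriptions on the -4..+4 scale; not the study's data.
prescriptions = [2, 1, 0, 3, 2, -1, 2, 1, 4, 2, 0, 1, -2, 3, 2, 1]
w_plus, p_value = wilcoxon_one_sample(prescriptions)
print(f"W+ = {w_plus}, p = {p_value:.4f}")
```

A rank sum for positive differences far above its expected value under the null hypothesis indicates that prescriptions lean towards the optimistic side of the scale.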
3.2.1 Further Testing
As a next step, we explored whether there was any relationship between the participants’ age and the optimism of their prescriptions. We conducted a non-parametric Spearman correlation test, which indicated a slight negative correlation (r = -0.169, see fig. 12): in our sample, optimism tends to decrease slightly with age. However, the correlation was not statistically significant (p-value = 0.29648).
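Spearman’s coefficient is the Pearson correlation computed on ranks rather than on raw values. The following sketch shows that computation with hypothetical ages and scores. It returns only the coefficient, not a p-value, and the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(variance_x * variance_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("both sequences must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages and optimism scores; not the study's data.
ages = [21, 23, 24, 27, 31, 38, 45, 52]
scores = [3, 2, 2, 1, 2, 0, 1, -1]
rho = spearman(ages, scores)
print(f"Spearman correlation: {rho:.3f}")
```

Because the coefficient only uses ranks, it measures a monotonic relationship and does not assume that optimism changes linearly with age.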
We also wished to compare age groups. We divided our sample in two groups, using the age of thirty (30) years old as a threshold. We obtained a group of twenty-six (26) people younger than the threshold, and another group of the remaining fourteen (14) people (see fig. 13 and fig. 14). First, we performed a one-sample Wilcoxon test for each group against the population mean (μ = 0). The optimism score of the younger group was significantly different from the population mean (p-value = 0.00327 < 0.05). On the other hand, the optimism score of the older group was not significantly different from the population mean (p-value = 0.11995 > 0.05), suggesting that the older group prescribed more accuracy than the younger one, in line with the weak negative correlation between age and optimism. However, the Mann–Whitney U test showed no statistically significant difference between the optimism prescriptions of the younger and older groups (U-stat = 174.0, p-value = 0.41363).
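To make the group comparison concrete, here is a minimal standard-library sketch of the Mann–Whitney U statistic with a two-sided normal-approximation p-value (no tie correction, so only approximate). The data and function names are hypothetical. For reference, with groups of 26 and 14 people the expected value of U under the null hypothesis is 26 × 14 / 2 = 182, which is close to the reported statistic.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def mann_whitney_u(group_a, group_b):
    """Return U for group_a and an approximate two-sided p-value."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must contain observations")
    # Rank all observations together, then sum the ranks of group_a.
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    expected = n_a * n_b / 2
    spread = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - expected) / spread
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical optimism scores for two age groups; not the study's data.
younger = [2, 3, 1, 2, 4, 2]
older = [1, 0, 2, -1, 1]
u_younger, p_value = mann_whitney_u(younger, older)
print(f"U = {u_younger}, p = {p_value:.4f}")
```

The two possible U statistics always add up to the product of the group sizes, which is a convenient sanity check on any implementation.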
Furthermore, we wanted to see if there would be any differences in the personal predictions between male and female participants (see fig. 15). We conducted another Mann–Whitney U test, but again failed to achieve statistical significance (U-stat = 162.5, p-value = 0.40094). Interestingly, since we included a gender option of “other” (besides “male” and “female”) in our demographics questions, we are able to see a trend in our sample. Although only three (3) people identified themselves as “other,” their responses (see fig. 16) differ visibly from those of the other two gender groups (mean score = -0.67, median score = -1.0). With so few people this is only an observation, but it is an interesting one and a legitimate ground for further research.
In addition, since the original study also explores this, we decided to see if people who self-identified as pessimists in the Life Orientation Test-Revised (LOT-R) would also prescribe optimism. We identified eleven people as “pessimists” in the LOT-R (people who scored lower than the neutral answer (3) of the test’s scale) and compared their personal predictions (mean = 1.36, median = 1.0) against 0 with a Wilcoxon test. Our results were statistically significant (p-value = 0.01644), indicating that even people who think of themselves as pessimists prescribe optimism throughout the different scenarios, which corresponds to the results of the original study.
Moreover, we identified the optimists group (participants who had a mean score in the LOT-R greater than 3) in order to compare the prescribed optimism score between the two groups (pessimists and optimists). The score of the optimists was significantly different from the population mean (one-sample Wilcoxon test: p-value = 0.01001 < 0.05). Comparing the scores of the pessimists and the optimists, we see that there is no significant difference between the two groups (Mann–Whitney test: p-value = 0.32421; Kruskal–Wallis test: p-value = 0.63212).
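The grouping rule described above (mean LOT-R score below or above the neutral answer of 3) can be sketched as follows. The participant identifiers and item scores are hypothetical, and the sketch assumes that any reverse-scored items have already been recoded.

```python
from statistics import mean

NEUTRAL = 3  # neutral answer on the scale, as stated in the text


def classify(item_scores):
    """Label a participant by their mean LOT-R item score."""
    average = mean(item_scores)
    if average < NEUTRAL:
        return "pessimist"
    if average > NEUTRAL:
        return "optimist"
    return "neutral"


# Hypothetical item scores per participant; not the study's data.
# Assumes reverse-scored items were recoded beforehand.
participants = {
    "p01": [2, 3, 2, 1, 3, 2],
    "p02": [4, 5, 3, 4, 4, 5],
    "p03": [3, 3, 3, 3, 3, 3],
}

groups = {pid: classify(scores) for pid, scores in participants.items()}
print(groups)
```

The prescribed optimism scores of each resulting group can then be tested against 0 and compared with one another, as described above.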
Finally, we wanted to explore further relationships. We compared the prescribed optimism score of the participants who were currently studying with that of those who were not, as well as the score of the people who were working with that of people who were not. In both cases, we again performed a one-sample Wilcoxon test against the population mean, followed by the Mann–Whitney and the Kruskal–Wallis tests to check for differences between the groups.
The optimism score (mean = 0.84, median = 1.0) of the participants who were studying (n = 19) was not significantly different from the population mean (one-sample Wilcoxon test: p-value = 0.10199 > 0.05). On the other hand, the optimism score (mean = 1.38, median = 2.0) of the participants who were not studying (n = 21) was significantly different from the population mean (one-sample Wilcoxon test: p-value = 0.00176 < 0.05). Taken alone, these results suggest that participants who were studying prescribed something closer to accuracy than to optimism. However, when we compared the optimism scores of the two groups, we did not find any significant difference, as the p-values of both the Mann–Whitney test (p-value = 0.17954) and the Kruskal–Wallis test (p-value = 0.35184) are greater than 0.05.
As for the working and not working group comparison, the prescribed optimism score (mean = 1.19, median = 2.0) of the participants who were working (n = 32) was significantly different from the population mean (one-sample Wilcoxon test: p-value = 0.00257 < 0.05). However, the optimism score (mean = 0.88, median = 1.0) of the participants who were not working (n = 8) was not significantly different from the population mean (one-sample Wilcoxon test: p-value = 0.16701 > 0.05). This result suggests that participants who were not working prescribed something closer to accuracy than to optimism, although the sample size is very small and both the Mann–Whitney and the Kruskal–Wallis tests showed no significant difference between the scores of the two groups (Mann–Whitney test: p-value = 0.27765; Kruskal–Wallis test: p-value = 0.54373).
3.2.2 Robustness Check
An essential aspect of the original study was implementing three independent variables into the questionnaire to see if and how they influence personal predictions:
- Commitment: whether the decision to engage in a particular action has or has not been made;
- Agency: whether the decision to commit was, or will be, made by the protagonist or by another person;
- Control: the degree to which the protagonist can influence the predicted outcome.
The assumption was that these manipulations would have an effect on people’s personal predictions, and that optimistic prescriptions would be higher when the decision to act had already been made (high commitment), when the decision to act was, or would be, made by the main protagonist (high agency), and when control of the unfolding events lay with the protagonist (high control). The original study showed that these assumptions hold and that the effects are statistically significant.
To replicate these results, we conducted three separate dependent-samples, one-tailed Wilcoxon tests, one for each independent variable, comparing optimism prescriptions when the corresponding independent variable was set to “low” against it being set to “high.”
With alpha = 0.05, the test results for commitment and control, but not for agency, give us enough evidence to assume a significant positive difference: prescriptions were more optimistic when commitment had already been undertaken and when control was high than in scenarios without commitment or with little control over future events. The results were:
- Results for commitment: statistic = 997.0, p-value = 0.0000463;
- Results for agency: statistic = 1277.0, p-value = 0.1313264;
- Results for control: statistic = 1007.0, p-value = 0.0003727.
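The dependent-samples, one-tailed variant of the test works on the paired differences between the “high” and the “low” setting of a variable. Below is a minimal standard-library sketch under the same caveats as any normal approximation (no tie or continuity correction, approximate p-value). The data and function names are hypothetical, and the statistic convention may differ from the one behind the figures listed above.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def paired_wilcoxon_greater(high, low):
    """One-tailed test of whether `high` scores tend to exceed `low` scores."""
    if len(high) != len(low):
        raise ValueError("paired samples must have the same length")
    # Pairs with identical scores carry no information and are dropped.
    diffs = [h - l for h, l in zip(high, low) if h != l]
    n = len(diffs)
    if n == 0:
        raise ValueError("all pairs are identical")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    spread = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - expected) / spread
    p_value = 1 - NormalDist().cdf(z)  # upper tail only
    return w_plus, p_value


# Hypothetical prescriptions from the same ten participants under the
# high and low setting of one variable; not the study's data.
high_setting = [2, 3, 1, 2, 4, 2, 1, 3, 2, 2]
low_setting = [1, 1, 1, 0, 2, 3, 0, 1, 2, 1]
w_plus, p_value = paired_wilcoxon_greater(high_setting, low_setting)
print(f"W+ = {w_plus}, p = {p_value:.4f}")
```

Using a one-tailed test is appropriate here because the hypothesis is directional: prescriptions are expected to be more optimistic under the “high” setting.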
To ensure that the participants actually perceived the differences in commitment, agency, and control in the text of the vignettes, the questionnaire included three control questions in each vignette. Our logic was the following: we would divide the answers to the control questions in two groups (for the “low” and the “high” vignette settings), then compute the mean of each group to see whether it corresponded with the actual vignette setting.
For example, if the vignette implies low control, the mean of the answers to the control question should be equal to or lower than 33%, or ideally 25%. For vignettes with high control, we would want a mean higher than, say, 66%, or ideally 75%. Otherwise we cannot be sure that the participants understood the vignette setting, and our conclusions about the effects of the independent variables would be invalid. The means we obtained were:
- Low commitment mean: 40.8125;
- High commitment mean: 69.1875;
- Low agency mean: 46.3125;
- High agency mean: 50.75;
- Low control mean: 55.625;
- High control mean: 64.3125.
Unfortunately, the mean scores do not match our rationale: none of the “low” means falls at or below 33%, and only the high commitment mean exceeds 66%. We therefore cannot assume that the manipulations of the three independent variables had a valid effect on the personal predictions of the participants.
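The comparison of the reported means against the 33% and 66% thresholds can be written out explicitly. The sketch below uses the six means listed above and assumes they are expressed on a 0 to 100 scale. The constant and function names are illustrative.

```python
LOW_MAX = 33.0   # a "low" setting should score at or below this
HIGH_MIN = 66.0  # a "high" setting should score above this

# Means of the control questions as reported in the text,
# assumed here to be on a 0-100 scale.
means = {
    ("commitment", "low"): 40.8125,
    ("commitment", "high"): 69.1875,
    ("agency", "low"): 46.3125,
    ("agency", "high"): 50.75,
    ("control", "low"): 55.625,
    ("control", "high"): 64.3125,
}


def check_passes(level, value):
    """Return True when the mean matches the intended vignette setting."""
    if level == "low":
        return value <= LOW_MAX
    return value > HIGH_MIN


passed = [key for key, value in means.items() if check_passes(key[1], value)]
for (variable, level), value in means.items():
    verdict = "ok" if (variable, level) in passed else "not met"
    print(f"{level} {variable}: {value:.2f} -> {verdict}")
```

Only the high commitment condition meets its threshold, which is why the effects of the manipulations cannot be interpreted with confidence.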
4.1 Participants’ Reactions
As we can see from the statistics, quite a few people did not complete the questionnaire (see fig. 1). Participants who knew us personally mentioned that the survey was quite boring, since all the questions looked alike. A few of them believed that the questionnaire was broken because they could not tell the questions apart. In short, they were telling us that they did not pay attention to the questions and missed the subtleties mentioned above (see the 2. Methods section).
Other participants noted that while the number of the question changed, it was impossible to understand how far along in the questionnaire they were. They suggested adding a progress bar, so that other participants could quickly see their progress. This feature was added shortly after the survey started.
4.2 Possible Improvements for Future Iterations
While we believe our experiment went well, without major incidents, there is still room for improvement. We discuss below a few ideas and corrections that we believe could be of use.
First, we chose to observe the same data as the replication project, meaning that we only wanted to explore what kind of optimism or pessimism our participants would prescribe to the protagonists. We understood a bit late that two (2) of the questions per vignette were related to other parts of the original research. As such, we believe that if the experiment were to be conducted another time, it would be wiser to remove those two additional questions. This would save participants a substantial amount of time, as it would remove sixteen (16) questions.
A few times, participants told us that they wanted to respond with the default value of the Likert scale (see fig. 20), but they were surprised that they obtained an error message when trying to move to the next question. In the current state of the application, all input fields are mandatory, to ensure that users do answer each question. Participants only had to click on the handle of the range scale, and their answer would be set.
However, it seems this wasn’t clearly conveyed to users by the interface alone. It would be interesting to conduct A/B tests with different layouts to discover what would be the best way to indicate to the users how easy it is to actually set the value: Should a small paragraph in the introduction indicate how to proceed? Should this paragraph be in the error message, so participants obtain this only once? Is there another way to inform respondents, and reduce that friction? We do not have an answer at this time, but we believe it is worth improving.
Running through the questionnaire ourselves in debug mode led us to discover that while the eight (8) vignettes were presented to participants in a randomized order, as designed, the six (6) questions associated with each vignette do not seem to be presented in a randomized order. While this is not a major issue, it could have contributed to participants believing they were answering the same question multiple times.
Since the application source code is hosted on GitHub, we created items to be fixed in the issue tracking system of the code repository.
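The missing randomization of questions within each vignette could be addressed along the following lines. This is only a sketch in Python of the intended behaviour, not the code of the questionnaire application. The function name, the question labels, and the seed are hypothetical.

```python
import random


def randomized_questionnaire(vignettes, questions, rng):
    """Shuffle the vignette order and, separately, the question order
    within each vignette."""
    vignette_order = list(vignettes)
    rng.shuffle(vignette_order)
    plan = []
    for vignette in vignette_order:
        question_order = list(questions)
        rng.shuffle(question_order)  # fresh order for every vignette
        plan.append((vignette, question_order))
    return plan


# Hypothetical labels: eight vignettes with six questions each.
rng = random.Random(2018)  # fixed seed only to make the sketch repeatable
questions = ["q1", "q2", "q3", "q4", "q5", "q6"]
plan = randomized_questionnaire(range(1, 9), questions, rng)
for vignette, order in plan:
    print(vignette, order)
```

Shuffling a copy of the question list for every vignette keeps the original list intact, so every participant still answers all six questions for all eight vignettes.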
Finally, we think it may be worth rewriting the questions in a clearer manner. Most participants believed they were answering the same question over and over; not all of them were able to decipher the small textual variations. This could be an issue, since these variations are exactly what this research uses to judge the level of optimism of participants. This redesign would require some thought or A/B testing to uncover the best way to make sure respondents notice the variations and respond accordingly. Would it be necessary to be overly clear and mention the differences outright? How would this be presented to the users: a table, emphasis on text, etc.?
5.1 Summary of Replication Attempt
Overall, the aspects of the effect found in the original experiment were replicated in our experiment.
In our sample, participants displayed a tendency to prescribe (or recommend) optimism, rather than pessimism or even accurate predictions. The modal prescription was moderately optimistic (+2 on the scale), which was recommended almost one third of the time (30.3%), whereas the accurate prescription (0 on the scale) was chosen 12.8% of the time. Like the first replication, based on the statistical significance of the test and the appropriate size of the sample, we can say that the effect here replicates the primary effect found by Armor et al.
The robustness check for the assumed effects of three independent variables (commitment, agency and control) on the dependent variable (personal predictions) showed statistical significance for commitment and control, but not for agency. Unfortunately, the participants’ answers to the control questions integrated in the survey did not confirm that they grasped the differences between the vignette situations, so we could not draw valid conclusions and thus could not replicate this part of the original study.
Finally, consistent with the original explorations, we observed that even people who identify themselves as pessimists give rather optimistic prescriptions throughout the different scenarios.
Given the time and resources we had at our disposal, we decided to see if there were any other relations between personal predictions and some further variables such as age and gender. Although we did not find any significant differences or correlations, it would certainly be interesting to explore further what else might drive people’s predictions in an optimistic or a pessimistic direction. We should point out, though, that acquiring a larger sample and applying a more refined sampling method, instead of “opportunity sampling” as in our case, could produce a better overview of such relations.
A good argument for further research could be made regarding the people who identified themselves as “other” in the gender options, since the scores of their group, despite its small size, differed noticeably from the rest.
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. doi:10.1126/science.aac4716
- Armor, D. A., Massey, C., & Sackett, A. M. (2008). Prescribed Optimism: Is It Right to Be Wrong About the Future? Psychological Science, 19(4), 329–331. doi:10.1111/j.1467-9280.2008.02089.x
- Janson Blanchet, M. (2018). Prescribed Optimism Questionnaire, GitHub repository, https://github.com/jansensan/prescribed-optimism-questionnaire
- Janson Blanchet, M. (2018, December 11). Prescribed optimism replication report presentation. Retrieved December 18, 2018, from https://academia.jansensan.net/30845/prescribed-optimism-replication-report-presentation/
- Janson Blanchet, M. (2018). Questionnaire. Retrieved December 26, 2018, from https://projects.jansensan.net/questionnaire/
- Krizan, Z., & Windschitl, P. D. (2007). The influence of outcome desirability on optimism. Psychological Bulletin, 133(1), 95–121. doi:10.1037/0033-2909.133.1.95
- Sweeny, K., Carroll, P. J., & Shepperd, J. A. (2006). Is optimism always best? Future outlooks and preparedness. Current Directions in Psychological Science, 15, 302–306.
- Van’t Veer, A., Lassetter, B., Brandt, M. J., & Mehta, P. H. (2013, October 10). Replication of Armor, Massey & Sackett (2008, PS, Study 1). Retrieved December 18, 2018, from https://osf.io/qlzap/