Global versus task-specific postoperative feedback in surgical procedure learning

Background: Task-specific checklists and global rating scales are both recommended assessment tools to provide constructive feedback on surgical performance. This study evaluated the most effective feedback tool by comparing the effects of the Observational Clinical Human Reliability Analysis (OCHRA) and the Objective Structured Assessment of Technical Skills (OSATS) on surgical performance in relation to the visual-spatial ability of the learners.
Methods: In a randomized controlled trial, medical students were allocated to either the OCHRA (n = 25) or OSATS (n = 25) feedback group. Visual-spatial ability was measured by a Mental Rotation Test. Participants performed an open inguinal hernia repair procedure on a simulation model twice. Feedback was provided after the first procedure. Improvement in performance was evaluated blindly using a global rating scale (performance score) and hand-motion analysis (time and path length).
Results: Mean improvement in performance score was not significantly different between the OCHRA and OSATS feedback groups (P = .100). However, mean improvement in time (371.0 ± 223.4 vs 274.6 ± 341.6; P = .027) and path length (53.5 ± 42.4 vs 34.7 ± 39.0; P = .046) was significantly greater in the OCHRA feedback group. When stratified by mental rotation test scores, the greater improvement in time (P = .032) and path length (P = .053) was observed only among individuals with low visual-spatial abilities.
Conclusion: A task-specific (OCHRA) feedback is more effective in improving surgical skills in terms of time and path length in novices compared to a global rating scale (OSATS). The effects of a task-specific feedback are present mostly in individuals with lower visual-spatial abilities. (http://creativecommons.org/licenses/by/4.0/).


Introduction
Feedback has long been recognized for its positive effect in surgical knowledge and skills training. 1 It has been shown to be crucial in technical skill development because it increases motivation, prevents incorrect actions, and reinforces correct actions. 2-3 Feedback can be provided based on direct observation of technical skills. 4 Within the surgical field, different observational assessment tools are available. 5 Assessment tools assess surgical performance on competences, skills, or surgical-specific items on a checklist. These tools can be used as a medium for feedback to provide information regarding a trainee's performance to improve on specific items that are being assessed. 1,5 Two main types of assessment tools can be recognized: global rating scales, which rate general surgical skills and are applicable to all surgical procedures, and procedure-specific checklists. 5 In both categories, many tools have been developed and validated. 4-5
A commonly used assessment tool, generally accepted as the "gold standard," is the Objective Structured Assessment of Technical Skills (OSATS), a global rating scale introduced by Martin et al for assessing technical skills of an entire surgical procedure. 5-6 OSATS is a reliable, validated tool that assesses 7 competencies on a 5-point Likert scale. 6 It is feasible and effective in assessment of surgical skills of trainees in the operating room. 7 Although global rating scales such as the OSATS are easy to use, these scales can be imprecise. 4 A task-specific method may provide more concise and precise feedback. 4 One such task-specific technical skills assessment method is the Observational Clinical Human Reliability Analysis (OCHRA). 8 An OCHRA checklist assesses in a stepwise manner whether a substep was correct or incorrect. 8 Both OSATS and OCHRA assessment tools have been shown to be valid for providing constructive feedback. 4,7 However, according to constructive alignment theory, the OCHRA feedback might be more effective when the surgical procedure is also learned in a stepwise manner. 9
Although the validity of OSATS and OCHRA has been demonstrated, these assessment tools are still based on individual judgments, which are inevitably associated with subjectivity. 10 Quantifying measures of technical skills may potentially mitigate this subjectivity. For open surgery, different motion tracking devices have been described to measure either hand or instrument movements. 11-14 The outcomes of time to complete a task and total path length can differentiate between novices and experts. 13-15
Additionally, the effect of feedback in relation to visual-spatial ability, as another determining factor for technical skills development, remains unknown. Visual-spatial ability is defined as the ability that allows individuals to construct visual-spatial (ie, 3-dimensional) mental representations of 2D images and to mentally manipulate these representations. 16,17 This ability determines how well individuals are able to translate the acquired anatomical knowledge into clinical and surgical practice. Consequently, visual-spatial ability determines how well surgical residents are able to understand and perform spatially complex procedures. The positive association between visual-spatial ability and acquisition of surgical skills, including quality of hand motion, has been observed especially in the early phases of surgical training. 15,18-20 Moreover, visual-spatial ability can have a modifying effect on outcomes. Individuals with lower visual-spatial abilities tend to perform worse than individuals with high visual-spatial abilities on acquisition of anatomical knowledge and surgical skills. However, with supportive instructional methods and deliberate practice and feedback they are able to achieve a comparable level of competency. 15,21-23
The aim of this study was to investigate whether a task-specific, stepwise feedback checklist (OCHRA) leads to a greater improvement in performance of a surgical procedure compared to a global rating scale method (OSATS) in terms of improvement of overall performance score, time to complete task, and total path length. These outcomes were also evaluated in relation to learners' visual-spatial ability.

Study design and population
A randomized controlled trial was conducted at the Leiden University Medical Center, The Netherlands. Participants were medical students who were novices in almost all types of surgical procedures. Only right-handed students were included because left-handed novice students may have difficulties with the surgical instruments. 24 Participation was voluntary, and written consent was obtained from all participants. The study protocol was approved by the Netherlands Association for Medical Education Ethical Review Board (NERB dossier number: 1013) (Fig 1).

Randomization
Participants were randomly allocated to either the OCHRA feedback (n = 25) or OSATS feedback group (n = 25) using an Excel random group generator.
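For readers who want to reproduce a balanced two-arm allocation outside Excel, the following is a minimal Python sketch. It is not the generator used in the study: the function name `allocate`, the participant identifiers 1 to 50, and the seed value are illustrative assumptions, and the shuffle-then-alternate scheme is only one of several ways to obtain equal group sizes.

```python
import random


def allocate(participant_ids, groups=("OCHRA", "OSATS"), seed=None):
    """Shuffle participants, then deal them alternately into the groups.

    Hypothetical helper: the study itself used an Excel random group
    generator, whose exact algorithm is not described in the text.
    """
    rng = random.Random(seed)  # local generator so the seed is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Alternating assignment after shuffling yields equal group sizes
    # whenever the number of participants is a multiple of len(groups).
    return {pid: groups[i % len(groups)] for i, pid in enumerate(ids)}


# Illustrative use with 50 made-up participant numbers.
allocation = allocate(range(1, 51), seed=2024)
counts = {g: list(allocation.values()).count(g) for g in ("OCHRA", "OSATS")}
print(counts)
```

Fixing the seed makes the allocation auditable after the fact; in a real trial the seed and the allocation list would normally be concealed from the person enrolling participants.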

Surgical procedure
The Lichtenstein open inguinal hernia repair was chosen as a procedure containing multiple surgical steps and because of its spatial complexity, which requires a certain level of surgical anatomical knowledge and visual-spatial ability of the learner. The first part of the surgery, until resecting the hernia sac, requires solely basic surgical skills such as incising, dissecting, and ligating. The second part, the placement and fixation of the mesh, is more complex. Each participant performed the Lichtenstein open inguinal hernia repair 2 times on a validated simulation model. 25 Participants were given access to the online course 1 week before the experiment to prepare for the experiment. The course consisted of 3 components: an introductory description that included text and figures regarding the surgical anatomy, a stepwise textual description, and a video demonstration of the procedure on the identical model used during the experiment (Supplementary Table S1). 26 The video demonstration depicted all important steps that need to be undertaken during surgery. The video was accompanied by auditory explanation. Participants were able to retrieve the materials as many times as they wanted and could work through them at their own pace.
On the day of the experiment, participants were given 30 minutes to complete each procedure. 27 The second procedure was performed directly after feedback on the first procedure was provided. Both procedures were recorded on video for blinded assessment. Participants wore a right-hand glove for the recording of motion by a motion tracking device (PST Base, PS-Tech B.V., Amsterdam, The Netherlands).

Demographic questionnaire
The questionnaire was administered before the experiment to account for factors that could possibly influence the performance (Supplementary Table S1). In a previous study, the time students studied for the open inguinal hernia repair with the use of a video demonstration had a significant modifying effect on surgical performance. 27 Therefore, study time was included in the questionnaire and was accounted for in the data analysis.

Visual-spatial ability
Visual-spatial ability was measured by the mental rotation test (MRT) before the experiment. The MRT is a validated 24-item psychometric test and is the gold standard in assessing visual-spatial ability in anatomical and surgical education. 19,28-30 Participants were given 10 minutes to complete the test. The maximum possible score for the test was 24 points.

Interventions
In the OCHRA feedback group, postoperative feedback was provided using OCHRA. The OCHRA checklist is a reliable and valid instrument that has been successfully used in assessment of performance in various surgical procedures. 8,31-33 It is a procedure-specific step-by-step skills assessment checklist that is characterized by a breakdown of a procedure into tasks. 26 Each step is assessed for whether it was performed correctly and whether errors were made during that step. Provided feedback was based on the evaluation of each performed procedural step (Supplementary Table S2). If a particular step was performed incorrectly, the error was discussed and a proper execution of the step was explained. No points or final scores were awarded for the performance.
In the OSATS feedback group, postoperative feedback was provided using the OSATS assessment tool (Supplementary Table S3). OSATS is a 7-item global rating scale that focuses on the following overall competencies: (1) respect for tissue, (2) time and motion, (3) instrument handling, (4) knowledge of instruments, (5) use of assistance, (6) flow of operation, and (7) knowledge of procedure. 6 The tool has been previously validated in a wide range of surgical procedures and disciplines with reasonable index of reliability. 6,34,35 Provided feedback was based on the evaluation of each of the 7 competencies in the exact order of OSATS. Suboptimal performance and errors made within a competence were discussed based on an example followed by an explanation for the improvement. No points or final scores were awarded for the performance to avoid any bias that could be introduced by grading the performance during the feedback phase.
In both groups, feedback was provided immediately after performing the first procedure. The total feedback time was held constant in both conditions and was approximately 10 minutes. Feedback was provided by 1 of the 2 researchers who were trained in providing both types of feedback in the context of this experiment. Care was taken to ensure that the feedback was complete and that participants were able to ask questions and verify whether they understood the information properly.

Performance score
Video-recorded procedures were assessed blindly by 2 independent researchers using OSATS, as the most common assessment tool for surgical performance. A minimum of 1 and a maximum of 5 points could be awarded for each of the 7 competences. A maximum possible performance score for each procedure was 35 points. Both researchers were trained in assessment of recorded procedures. Training was facilitated by a surgeon who is an expert in this field. It included a comprehensive study of the procedure using the provided study material followed by execution of the procedure on the model themselves. After that, researchers were trained in assessment until they got sufficiently familiar with all aspects of OSATS. The actual assessment of recorded procedures was performed independently. In case of discrepancies, consensus was reached by re-evaluating the procedure. Additionally, 5% of procedures were randomly selected and assessed by the expert to detect any discrepancies in scoring. No differences in ratings were identified.

Motion tracking
Motion tracking analysis was performed using a combination of a commercially available optical tracker system (PST Base, PS-Tech B.V., Amsterdam, The Netherlands) and a customized glove for the dominant right hand. This system could track position with 6 degrees of freedom in Cartesian coordinates (X, Y, and Z axis) at a rate of 30 samples per second. Time to complete the task and path length were measured. These have been shown to be excellent markers of surgical performance. 11,36-38 Because not all participants were able to complete the procedure within 30 minutes, the completion of the step of hernia sac removal was chosen as the endpoint for the outcomes of motion tracking analysis.
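The two motion outcomes can be illustrated with a short Python sketch. It assumes the tracker exports a list of (x, y, z) hand positions already expressed in meters and sampled at the constant 30 Hz rate stated above; the export format, the function names, and the sample values are assumptions for illustration and do not describe the PST Base software.

```python
import math

SAMPLE_RATE_HZ = 30  # sampling rate reported in the text


def path_length(positions):
    """Total distance travelled: sum of straight-line (Euclidean)
    distances between consecutive (x, y, z) samples."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def duration_seconds(positions, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time spanned by the samples, assuming a constant sampling rate."""
    return max(len(positions) - 1, 0) / sample_rate_hz


# Made-up samples (meters), truncated at a hypothetical endpoint such as
# the frame in which the hernia sac removal step is completed.
samples = [
    (0.0, 0.0, 0.0),
    (0.3, 0.4, 0.0),
    (0.3, 0.4, 1.2),
    (0.3, 0.4, 1.2),  # hand at rest: adds time but no path length
]
print(path_length(samples), duration_seconds(samples))
```

Raw optical tracking data usually contain jitter, so a real pipeline would likely smooth or filter the positions before summing distances; that step is omitted here.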

Outcomes
The study outcomes were defined as the differences between the 2 groups in mean improvement, from the first to the second procedure, in performance score (as measured by the OSATS assessment tool), time (in seconds), and path length (in meters). Outcomes were stratified by MRT scores. Individuals who scored below the mean were assigned to the MRT-low group (n = 22). Students who scored above the mean were assigned to the MRT-high group (n = 28).
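The outcome definitions above can be sketched in a few lines of Python. Two choices in this sketch are assumptions rather than details from the study: improvement in time and path length is computed as first minus second (so that a positive value means a faster or shorter second attempt), and a participant scoring exactly at the mean MRT score is placed in the MRT-high group, a case the text does not specify. All names and values are illustrative.

```python
from statistics import mean


def improvement(first, second, lower_is_better=False):
    """Improvement from the first to the second procedure.

    Assumed sign convention: positive always means 'got better'.
    Scores should rise; time and path length should fall.
    """
    return first - second if lower_is_better else second - first


def stratify_by_mrt(mrt_scores):
    """Split participants at the mean MRT score.

    Assumption: a score equal to the mean goes to 'MRT-high'.
    """
    cutoff = mean(mrt_scores.values())
    return {
        pid: ("MRT-low" if score < cutoff else "MRT-high")
        for pid, score in mrt_scores.items()
    }


# Made-up example values.
mrt = {"p1": 8, "p2": 12, "p3": 16, "p4": 20}  # mean = 14
strata = stratify_by_mrt(mrt)
delta_score = improvement(18, 24)                              # OSATS points
delta_time = improvement(900.0, 600.0, lower_is_better=True)   # seconds
print(strata, delta_score, delta_time)
```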

Statistical analysis
Because of the novelty of this study, no previous data were available to calculate the sample size. A sample size of 50 participants was assumed to be appropriate. Participants' baseline characteristics were summarized using descriptive statistics. Differences in baseline measurements were assessed with an independent t test for differences in means and a χ² test for differences in proportions. The differences in mean performance scores of the first procedure between groups were assessed with an independent t test. The improvement between the second and first procedure within a group was assessed with a paired t test. The difference in mean improvement (Δ) in performance scores between the second and first procedure between groups was assessed with a 1-way ANCOVA. ΔPerformance score was included as the dependent variable, intervention group and study time (0-1 vs 1-2 vs 2-3 hours) as fixed factors, and performance score on the first procedure and MRT score as covariates. Additionally, the outcomes were stratified by MRT score to evaluate the effect of intervention for different levels of visual-spatial ability. The analyses were repeated for mean improvement in time (Δtime) and path length (Δpath length). Partial eta squared was calculated and used as an effect size (0.2 = small effect, 0.5 = moderate effect, 0.8 = large effect). Analyses were performed using SPSS statistical software package version 25.0 for Windows (IBM Corp, Armonk, NY). Statistical significance was determined at the level of P < .05.
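Two of the quantities named above, the paired t statistic and partial eta squared, are simple enough to illustrate directly. The Python sketch below uses only the standard library and made-up numbers; it is not the SPSS procedure used in the study, it does not compute P values, and it does not fit the ANCOVA itself (the sums of squares passed to `partial_eta_squared` would come from a fitted model).

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two measurements
    on the same participants (second minus first)."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


def partial_eta_squared(ss_effect, ss_error):
    """Share of variance attributable to an effect after excluding
    variance explained by the other terms in the model."""
    return ss_effect / (ss_effect + ss_error)


# Made-up performance scores for four participants.
first_scores = [10, 12, 14, 16]
second_scores = [13, 14, 18, 19]
t_value, df = paired_t(first_scores, second_scores)
print(round(t_value, 3), df)
print(partial_eta_squared(26.0, 74.0))
```

Keeping the effect size as a separate function makes it easy to reuse for each outcome (Δperformance score, Δtime, Δpath length) once the corresponding sums of squares are available.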

Results
A total of 50 medical students was included. There were no significant differences between groups on baseline characteristics, as shown in Table I.
Both groups improved significantly in terms of total OSATS score, time, and path length between the first and second time of performing the procedure (Table II). Since not all participants were able to complete the procedure within 30 minutes, the completion of the step of hernia sac removal was chosen as the endpoint for the outcome measures time (s) and path length (m). This step was performed by 42 (84%) of the participants. Path length data for 5 of the participants were missing due to technical issues.

Effect of visual-spatial ability
When outcomes were stratified by MRT scores, the greater improvement in time in the OCHRA feedback group was observed only among individuals with lower visual-spatial abilities (β = -220.2; 95% CI [-418.4 to -22.1]; η² = 0.26; P = .032) (Fig 2). As shown in Figure 3, a similar trajectory was observed for the improvement in path length. However, this difference did not reach the significance level (β = -28.2

Discussion
The aim of this study was to investigate whether a task-specific, stepwise feedback checklist (OCHRA) leads to a greater improvement in surgical performance compared to a global rating scale feedback method (OSATS). The outcomes were evaluated in relation to visual-spatial ability. The mean improvement in performance scores was not significantly different between the OCHRA and OSATS feedback groups. However, the OCHRA feedback showed a significant improvement on performance in terms of time and path length, as measured by the hand-motion analysis system. The effects of OCHRA feedback were present mainly among individuals with lower visual-spatial abilities.
The observed effectiveness of OCHRA feedback on surgical performance in a simplified hernia repair model, as a more precise and concise approach, is supported by the instructional alignment theory. 39 When training and assessment methods are aligned, the effects of instruction are up to 4 times greater than in nonaligned methods. 39 In the current study, participants prepared for the open inguinal hernia repair procedure using a stepwise video demonstration. As OCHRA feedback was based on the evaluation of the subsequent surgical steps instead of competencies as part of the OSATS feedback, a greater alignment between learning and feedback could be achieved. Although this did not result in a difference in outcome in terms of the surgical scores, differences were found for the time and path length. In this study, most participants could not finish the entire surgical procedure within the 30-minute timeframe. Possibly, differences in surgical scores would have been found if students did complete the entire surgical procedure. Additionally, the value of a checklist (OCHRA) and global rating scale (OSATS) assessments may depend on the level of learners' experience. 40 Global rating scales have been reported to be more useful for learners with higher levels of expertise, whereas checklists may be more useful for novice learners, such as the participants in this study. 40,41
The observed modifying effect of visual-spatial ability on time and path length leads to important considerations. First, the findings are in line with previous research reporting positive association between visual-spatial ability and hand motion. 15,42-45 However, by treating visual-spatial ability as a possible effect modifier, this study showed that this association was present only for individuals with lower levels of visual-spatial ability. This effect, also referred to as the aptitude-treatment effect, 46,47 has been repeatedly observed in the research field of anatomical education. 21-23,46 Therefore, it is instrumental to consider possible modifying effects of visual-spatial ability on outcomes when designing new research. Second, the observed differences could be explained by the cognitive load theory. 48 Students with lower visual-spatial abilities are in general less effective in processing new spatial information in their working memory than students with higher visual-spatial abilities. However, in contrast to a global approach, the information from a task-specific stepwise feedback, building up on an already existing stepwise schema of a surgical procedure, could have decreased the cognitive load. 48 Subsequently, more working memory capacity could be created to process new procedural skills among low-performing individuals. This emphasizes the importance of an aptitude-based approach in learning and teaching surgical technical skills to novices. Lastly, the effect of visual-spatial ability on OSATS scores was found to be not significant. This could be because of the inability of most participants to complete the entire procedure within the given timeframe.
OSATS was used both as an intervention and assessment scoring tool in this study. The rationale behind the choice to use the OSATS as an assessment scoring tool is that OSATS is considered to be the "gold standard" assessment tool for surgical performance and one of the few actually used in residency training and research. 5,15 In The Netherlands, OSATS is incorporated within the surgical residency training. 49 Second, a systematic review comparing checklists with global rating scales as assessment tools reported that global rating scales might be better in capturing nuanced elements of expertise. 40 Other assessment tools for surgical performance, such as the recently reported Surgical Quality Assurance (SQA), could have been an option, and perhaps would have found differences in surgical performance. 50
The timing of feedback is still debated. Xeroulis et al distinguished feedback provided during the task (concurrent feedback) and feedback upon completing the task (summary feedback). 3 The latter was found to be superior for learning basic surgical skills; however, Al Fayyad et al found the opposite. In their study, concurrent (immediate) feedback was perceived as superior in learning basic surgical skills compared to summary (delayed) feedback. 51 In our study, summary feedback was chosen because the students operated on a simulation model without the risk of doing any harm. With an actual patient, a trainee needs guidance from a surgeon using concurrent feedback to avoid harmful errors.

Limitations
This study has several limitations. First, the sample size could not be calculated beforehand due to the novelty of the study aim and design. Although it was sufficient to reveal significant differences in terms of time and path length, the sample size could have been too small to detect significant differences in OSATS scores. Second, not all participants were able to complete the procedure within the given 30 minutes. As the step of hernia sac removal was reached by most participants, it was used as the endpoint to ensure a justified comparison in terms of time and path length. Allowing participants to complete the entire procedure would have provided a better display of their performance. Third, the participants were medical students with low and slightly varying levels of anatomical knowledge and technical skills, including suturing. Due to random allocation, these differences are expected to have little to no effect on outcomes. Additionally, the mean improvement in outcome measures was chosen instead of the absolute scores to account for those differences. Another limitation is the possible inability to generalize the conclusions to left-handed students, because this study only included right-handed students. Furthermore, these findings cannot be generalized to procedures other than inguinal hernia repair. Last, the effect of OCHRA feedback was evaluated in a simulated environment. This study should be repeated among surgical residents with higher levels of anatomical knowledge and technical skills in a clinical setting on multiple procedures.
The findings of this study have implications for both practice and research. In this study, the open inguinal hernia repair was chosen as an exemplary procedure. It is unknown whether an inguinal hernia repair simulation is ideally suited to detect the effects of different types of feedback on study outcomes. The implementation of structured, stepwise feedback that is aligned with the learning activities should be considered especially in the early phases of surgical training. The aligned stepwise instruction using stepwise video demonstrations and procedure-specific OCHRA checklist assessment can be transferred to other surgical procedures. The stepwise segmentation of a surgical procedure can be made using the step-by-step framework. 26 This stepwise description of a surgical procedure can then be used to create a procedure-specific OCHRA checklist. Moreover, an aptitude-based approach in teaching and learning of surgical procedural skills could be of benefit for individuals with lower visual-spatial abilities. As demonstrated, it is crucial to consider the modifying effect of visual-spatial ability on surgical outcomes when setting up new research. In fact, when overall outcomes are not evaluated for different levels of visual-spatial abilities, the real differences may remain unrevealed.
In conclusion, a task-specific, stepwise feedback checklist (OCHRA) proves to be more effective in improving surgical skills, in terms of time and path length, among surgical novices compared to a global rating scale feedback (OSATS). The effects of a task-specific feedback are present mostly in individuals with lower visual-spatial abilities.

Funding/Support
None.