






Yixin Gao¹, S. Swaroop Vedula¹, Carol E. Reiley¹, Narges Ahmidi¹, Balakrishnan Varadarajan², Henry C. Lin³, Lingling Tao¹, Luca Zappella⁴, Benjamín Béjar⁵, David D. Yuh⁶, Chi Chiung Grace Chen⁷, René Vidal¹, Sanjeev Khudanpur¹ and Gregory D. Hager¹

¹ Johns Hopkins University, Baltimore, MD 21210
² Google Inc., Mountain View, CA 94043
³ Intuitive Surgical, Inc., Sunnyvale, CA 94086
⁴ Metaio GmbH, 80335 Munich, Germany
⁵ Swiss Federal Institute of Technology in Lausanne, 1015 Lausanne, Switzerland
⁶ Yale University School of Medicine, New Haven, CT 06520
⁷ Johns Hopkins Medical Institutions, Baltimore, MD 21224
Abstract. Dexterous surgical activity is of interest to many researchers in human motion modeling. In this paper, we describe a dataset of surgical activities and release it for public use. The dataset was captured using the da Vinci Surgical System and consists of kinematic and video data from eight surgeons with different levels of skill performing five repetitions of three elementary surgical tasks on a bench-top model. The tasks, which include suturing, knot-tying and needle-passing, are standard components of most surgical skills training curricula. In addition to the kinematic and video data captured from the da Vinci Surgical System, we are also releasing manual annotations of surgical gestures (atomic activity segments) and surgical skill using global rating scores, a standardized cross-validation experimental setup, and a C++/Matlab toolkit for analyzing surgical gestures using hidden Markov models and linear dynamical systems. We refer to the dataset as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) to indicate the collaboration between Johns Hopkins University (JHU) and Intuitive Surgical Inc. (ISI), Sunnyvale, CA, on collecting these data.
Studying dexterous human motion is important for at least three reasons. First, insights into how humans acquire dexterity in motion can be applied to facilitate skill acquisition. Second, skill may be objectively assessed using automated technology. Third, dexterous human motion may be partially or completely automated. Partial automation involves human-machine collaboration, where humans and robots perform separate parts of the task, for example [8]. Surgery involves dexterous human motion, and the eventual goal of studying surgical motion is to improve the safety and effectiveness of surgical patient care.
Poor surgical technical skill has been shown to be associated with higher rates of post-surgical patient complications, including death [3]. Surgical technical errors were the most common reason for post-surgical complications, including re-operation and re-admission [9]. Thus, improving how surgeons acquire technical skill can positively impact the safety and effectiveness of surgical patient care.
1.1 The Language of Surgery project
We believe that surgical motion is analogous to human language because it is a composition of elementary activities that are performed sequentially under certain constraints. Consequently, surgical motion can be modeled using techniques that have been applied successfully to analyzing human language and speech. We define the Language of Surgery as a systematic description of surgical activities or proceedings in terms of constituents and rules of composition. Based upon [6], the language of surgical motion involves describing specific actions that surgeons perform with their instruments or hands to achieve an intended surgical goal.

We study surgical activity as an example of dexterous human motion within the Language of Surgery project at the Johns Hopkins University. The overall goals of the project are to establish archives of surgical motion datasets; to develop procedures and protocols to curate, securely store, and share the datasets; to develop and evaluate models for analyzing surgical motion data; to develop applications that use our models for teaching and assessing skillful motion to surgical trainees; and to conduct research towards human-machine collaboration in surgery.

Surgical activity data may encompass several types of variables related to human activity during surgery, such as surgical tool motion (kinematics), video, a log of events occurring within and beyond the surgical field, and other variables such as the surgeon's posture, speech, or manual annotations.

The objective of this paper is to release and describe a dataset we compiled within one of our studies on skilled human motion, in which surgeons performed elementary tasks on a bench-top model in the laboratory using the da Vinci Surgical System [5] (dVSS; Intuitive Surgical, Inc., Sunnyvale, CA). The surgical tasks included in this dataset are typically part of a surgical skills training curriculum. We refer to the dataset being released as the "JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS)" to indicate that it was collected through a collaborative project between the Johns Hopkins University (JHU) and Intuitive Surgical Inc. (ISI).
1.2 The dVSS and its research interface
Within the Language of Surgery project, several studies have been conducted in which we captured surgical activity data using the Application Programming Interface (API) of the dVSS. The dVSS is a tele-robotic surgical system that provides surgeons with enhanced dexterity, precision, and control. It has been widely used to perform minimally invasive procedures in urology, gynecology, general surgery, and cardiothoracic surgery [5]. The dVSS comprises a master-side console with two master tool manipulators (MTMs) that are operated by the surgeon, and a patient-side robot with three patient-side manipulators (PSMs)
reported having more than 100 hours, subjects B, G, H, and I reported having fewer than 10 hours, and subjects C and F reported between 10 and 100 hours of robotic surgical experience. All subjects were reportedly right-handed.
2.3 Task repetitions
All subjects repeated each surgical task five times. We refer to each of the five instances of a study task performed by a subject as a "trial". Each trial is, on average, about two minutes in duration for all three surgical tasks. We assigned each trial a unique identifier of the form "Task_UidRep". For example, "Knot_Tying_B001" in the dataset files denotes the first repetition of the knot-tying task by subject "B". The JIGSAWS consists of 39 trials of suturing (SU), 36 trials of knot-tying (KT), and 28 trials of needle-passing (NP). The data for the remaining trials (1 for SU, 4 for KT, and 12 for NP) are unusable because of corrupted data recordings.
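For illustration, a minimal Python sketch that splits such an identifier into task, subject, and repetition, assuming underscores separate the fields as in "Knot_Tying_B001"; the helper parse_trial_id is hypothetical and not part of the released toolkit.

import re

# Minimal sketch: split a trial identifier of the form "Task_UidRep",
# e.g. "Knot_Tying_B001" -> task "Knot_Tying", subject "B", repetition 1.
TRIAL_ID = re.compile(r"^(?P<task>[A-Za-z_]+)_(?P<subject>[A-Z])(?P<rep>\d{3})$")

def parse_trial_id(trial_id):
    match = TRIAL_ID.match(trial_id)
    if match is None:
        raise ValueError("Unrecognized trial identifier: " + trial_id)
    return match["task"], match["subject"], int(match["rep"])

print(parse_trial_id("Knot_Tying_B001"))  # ('Knot_Tying', 'B', 1)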
2.4 Data description
Kinematic data  The JIGSAWS consists of three components: kinematic data, video data, and manual annotations. We captured the kinematic data from the dVSS using its API at 30 Hz. The dataset includes the left and right MTMs and the first and second PSMs (PSM1 and PSM2, also referred to as the right and left PSMs in this dataset). The motion of each manipulator is described by 19 kinematic variables defined with respect to a local frame attached at the far end of the manipulator, so the kinematics of all four manipulators are described by 76-dimensional data. The 19 kinematic variables for each manipulator are the Cartesian position (3 variables, denoted xyz), a rotation matrix (9 variables, denoted R), linear velocities (3 variables, denoted x′y′z′), angular velocities (3 variables, denoted α′β′γ′, where αβγ are Euler angles), and a gripper angle (denoted θ). All kinematic variables are represented within a common coordinate system. Table 1 details the variables included in the kinematic dataset. The kinematic data for the MTMs and PSMs and the video data were synchronized and sampled at the same rate.
Table 1. Kinematic data variables.
Column indices   Number of variables   Description of variables
1-3              3                     Left MTM tool tip position (xyz)
4-12             9                     Left MTM tool tip rotation matrix (R)
13-15            3                     Left MTM tool tip linear velocity (x′y′z′)
16-18            3                     Left MTM tool tip rotational velocity (α′β′γ′)
19               1                     Left MTM gripper angle velocity (θ)
20-38            19                    Right MTM kinematics
39-41            3                     PSM1 tool tip position (xyz)
42-50            9                     PSM1 tool tip rotation matrix (R)
51-53            3                     PSM1 tool tip linear velocity (x′y′z′)
54-56            3                     PSM1 tool tip rotational velocity (α′β′γ′)
57               1                     PSM1 gripper angle velocity (θ)
58-76            19                    PSM2 kinematics
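As an aid to readers, the following Python sketch loads one kinematic trial and slices the 76 columns into the per-manipulator blocks of Table 1. It assumes the kinematic file is plain text with one whitespace-separated 76-value row per 30 Hz sample; if the released files use a different encoding, only the loader line needs to change.

import numpy as np

# Sketch following the column layout of Table 1 (1-indexed in the paper,
# 0-indexed here). The plain-text, whitespace-separated format is an assumption.
def load_kinematics(path):
    data = np.loadtxt(path)            # shape: (n_samples, 76)
    assert data.shape[1] == 76
    return {
        "MTM_left":  data[:,  0:19],
        "MTM_right": data[:, 19:38],
        "PSM1":      data[:, 38:57],
        "PSM2":      data[:, 57:76],
    }

def split_variables(block):
    # Each 19-variable block: position, rotation matrix, linear velocity,
    # rotational velocity, gripper angle (Table 1).
    return {
        "position":         block[:, 0:3],
        "rotation":         block[:, 3:12].reshape(-1, 3, 3),
        "linear_velocity":  block[:, 12:15],
        "angular_velocity": block[:, 15:18],
        "gripper_angle":    block[:, 18],
    }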
Video data  We captured stereo video from both endoscopic cameras of the dVSS at 30 Hz and at a resolution of 640 x 480. The video and kinematic data were synchronized such that each video frame corresponds to a kinematic data frame captured at the same instant of time. The videos in the dataset being released are saved as AVI files encoded in four character code (FOURCC) format with the DX codec. The video files named "capture1" and "capture2" were recorded from the left and right endoscopic cameras, respectively. Figure 2 shows a snapshot of corresponding left and right images for a single frame. The dataset does not include calibration parameters for the two endoscopic cameras.

Fig. 2. Screenshots of the corresponding left and right images for a single frame in the suturing task.
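Because each video frame corresponds to a kinematic sample of the same index, pairing the two streams reduces to iterating them in lockstep. The sketch below illustrates this with OpenCV; "capture1" denotes the left camera per the text, while the exact file paths and the plain-text kinematic loader are assumptions.

import cv2
import numpy as np

# Sketch: pair each video frame with the kinematic sample of the same index,
# relying on the stated 30 Hz synchronization of the two streams.
def iter_synchronized(video_path, kinematics_path):
    kin = np.loadtxt(kinematics_path)    # (n_samples, 76) kinematic rows
    cap = cv2.VideoCapture(video_path)   # e.g. the "capture1" (left camera) AVI
    try:
        for row in kin:
            ok, frame = cap.read()       # one 640 x 480 BGR frame
            if not ok:                   # stop early if the video is shorter
                break
            yield frame, row
    finally:
        cap.release()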
2.5 Manual annotations
Surgical activity annotation  A key feature of the JIGSAWS is the manually annotated ground truth for atomic surgical activity segments called "gestures" or "surgemes" [6,11]. A surgical gesture is defined as an atomic unit of intentional surgical activity resulting in a perceivable and meaningful outcome. We specified a common vocabulary of 15 gestures, detailed in Table 2, to describe all three tasks in the dataset; the vocabulary was developed in consultation with an experienced cardiac surgeon with an established robotic surgical practice.
Table 2. Gesture vocabulary
Gesture index   Gesture description
G1              Reaching for needle with right hand
G2              Positioning needle
G3              Pushing needle through tissue
G4              Transferring needle from left to right
G5              Moving to center with needle in grip
G6              Pulling suture with left hand
G7              Pulling suture with right hand
G8              Orienting needle
G9              Using right hand to help tighten suture
G10             Loosening more suture
G11             Dropping suture at end and moving to end points
G12             Reaching for needle with left hand
G13             Making C loop around right hand
G14             Reaching for suture with right hand
G15             Pulling suture with both hands
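For convenience, the vocabulary of Table 2 can be carried as a lookup table in code. The transcription parser below is hypothetical: it assumes each annotation line lists a start frame, an end frame, and a gesture label such as "G2"; consult the released transcription files for the actual layout.

# The 15-element gesture vocabulary of Table 2 as a lookup table.
GESTURES = {
    "G1":  "Reaching for needle with right hand",
    "G2":  "Positioning needle",
    "G3":  "Pushing needle through tissue",
    "G4":  "Transferring needle from left to right",
    "G5":  "Moving to center with needle in grip",
    "G6":  "Pulling suture with left hand",
    "G7":  "Pulling suture with right hand",
    "G8":  "Orienting needle",
    "G9":  "Using right hand to help tighten suture",
    "G10": "Loosening more suture",
    "G11": "Dropping suture at end and moving to end points",
    "G12": "Reaching for needle with left hand",
    "G13": "Making C loop around right hand",
    "G14": "Reaching for suture with right hand",
    "G15": "Pulling suture with both hands",
}

# Hypothetical transcription parser: assumes "start_frame end_frame label" lines.
def parse_transcription(path):
    segments = []
    with open(path) as f:
        for line in f:
            start, end, label = line.split()[:3]
            segments.append((int(start), int(end), GESTURES[label]))
    return segments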
Table 3. Elements of modified global rating score
Each element is rated on a 1-5 scale with the following anchors:

Respect for tissue
  1: Frequently used unnecessary force on tissue
  3: Careful tissue handling but occasionally caused inadvertent damage
  5: Consistent appropriate tissue handling

Suture/needle handling
  1: Awkward and unsure with repeated entanglement and poor knot tying
  3: Majority of knots placed correctly with appropriate tension
  5: Excellent suture control

Time and motion
  1: Made unnecessary moves
  3: Efficient time/motion but some unnecessary moves
  5: Clear economy of movement and maximum efficiency

Flow of operation
  1: Frequently interrupted flow to discuss the next move
  3: Demonstrated some forward planning and reasonable procedure progression
  5: Obviously planned course of operation with efficient transitions between moves

Overall performance
  1: Very poor
  3: Competent
  5: Clearly superior

Quality of final product
  1: Very poor
  3: Competent
  5: Clearly superior
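Since each of the six elements in Table 3 is rated on a 1-5 scale, a total modified global rating score ranges from 6 to 30. The sketch below simply sums the element scores; the released meta file should be consulted for whether it stores per-element ratings, totals, or both.

# Sketch: combine the six modified-GRS elements of Table 3 into a total score.
GRS_ELEMENTS = (
    "respect_for_tissue",
    "suture_needle_handling",
    "time_and_motion",
    "flow_of_operation",
    "overall_performance",
    "quality_of_final_product",
)

def total_grs(scores):
    # scores: dict mapping each element name to its 1-5 rating
    for name in GRS_ELEMENTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(name + " must be rated between 1 and 5")
    return sum(scores[name] for name in GRS_ELEMENTS)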
dictionary; 3) a multiple kernel learning framework to optimally combine the LDS and BoF approaches.
4.1 Accessing the dataset
The JIGSAWS can be downloaded without a fee from the Language of Surgery website: http://cirl.lcsr.jhu.edu/jigsaws. Access to the dataset requires a free registration, including provision of a valid email address so that we can inform investigators about any issues or corrections to the dataset.
4.2 Data organization
The JIGSAWS is available for download as zip files. Users are free to choose among tasks and data types, including kinematics or videos, together with transcriptions of the surgical gesture annotations and a meta file containing the self-reported skill level and the overall OSATS score for each task. In addition, users can optionally download the experimental setup with the standardized cross-validation folds we used in our analyses, and a C++/Matlab toolkit that uses this experimental setup for surgical gesture modeling. Partitioned downloads are enabled.
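For illustration only, the sketch below forms leave-one-user-out folds by grouping trials on the subject letter encoded in each identifier (e.g. "Knot_Tying_B001" belongs to subject "B"). This is one common way to split such data; the standardized folds shipped with the dataset should be preferred and may differ from this grouping.

from collections import defaultdict

# Illustrative only: build leave-one-user-out folds from trial identifiers.
def leave_one_user_out(trial_ids):
    by_subject = defaultdict(list)
    for tid in trial_ids:
        subject = tid.split("_")[-1][0]    # assumes the Task_UidRep naming
        by_subject[subject].append(tid)
    folds = []
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        train = [t for s, ts in by_subject.items() if s != held_out for t in ts]
        folds.append((train, test))
    return folds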
The JIGSAWS has been used in several research studies with two major areas of focus: surgical activity recognition and skill assessment. Techniques related to speech recognition, control theory, and computer vision have been developed and applied in studies using this dataset. We have worked with various features extracted from the dataset, for example, features composed of a subset of the raw or transformed kinematics, features extracted from the videos, or features combining information from both kinematics and video, to classify surgical activities (assign class labels with known segment boundaries), to recognize them (assign class labels with unknown segment boundaries), and to evaluate dexterous skill. Below, we briefly summarize prior work with the JIGSAWS.

In an early study, [12] performed gesture classification using a 3-state elementary HMM to model each gesture in a trial after applying linear discriminant analysis (LDA) to the continuous kinematic data for dimensionality reduction. Later, [11] applied an HMM to discrete spectrum features extracted from a short-time Fourier transform of the velocity features in the kinematic data, with the goal of evaluating surgical skill in tasks and sub-tasks, and [10] studied the relation between subtasks and skill using the same features. [17] accomplished a surgical gesture recognition task (jointly identifying gesture boundaries and assigning gesture labels) by applying an HMM to features derived from modeling the kinematic data (after an optional LDA) using Gaussian mixture models (GMMs); the HMM topology in [17] was entirely data-derived. These methods were enhanced in [16] through the development of various statistical models.
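As a rough illustration of the recipe used in [12], the sketch below reduces the kinematic frames with linear discriminant analysis, fits one small Gaussian HMM per gesture, and classifies a pre-segmented gesture by the model with the highest log-likelihood. It uses scikit-learn and hmmlearn as stand-ins; the released C++/Matlab toolkit, not this sketch, implements the actual experimental setup.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from hmmlearn import hmm

# Sketch: LDA dimensionality reduction + one Gaussian HMM per gesture class,
# classification of a pre-segmented gesture by maximum log-likelihood.
def train(segments, labels, n_lda=5, n_states=3):
    # segments: list of (n_frames, 76) arrays; labels: gesture label per segment
    frames = np.vstack(segments)
    frame_labels = np.concatenate(
        [np.full(len(s), l) for s, l in zip(segments, labels)])
    lda = LinearDiscriminantAnalysis(n_components=n_lda).fit(frames, frame_labels)
    models = {}
    for g in set(labels):
        seqs = [lda.transform(s) for s, l in zip(segments, labels) if l == g]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=25)
        m.fit(np.vstack(seqs), lengths=[len(s) for s in seqs])
        models[g] = m
    return lda, models

def classify(segment, lda, models):
    x = lda.transform(segment)
    return max(models, key=lambda g: models[g].score(x))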