Spine endoscopic atlas: an open-source dataset for surgical instrument segmentation
Ethics statement
Shenzhen Nanshan People’s Hospital’s ethics committee (Ethics ID: ky-2024-101601) granted a waiver of informed consent for this retrospective study. Because all data were strictly anonymized after video export and involved no identifiable or sensitive patient information, the committee approved the use of the anonymized data for research purposes, including open publication of the dataset.
Data collection
This study retrospectively collected endoscopic surgery videos of the cervical and lumbar spine from a total of 119 patients. All operations were performed by senior surgeons at two medical centers between January 1, 2022 and December 31, 2023. The videos were captured using standard endoscopic equipment from STORZ (IMAGE1 S) and L’CARE, recorded at 1080p or 720p resolution with a frame rate of 60 frames per second. The endoscopes were from Joimax and Spinendos, respectively. To preserve the originality and completeness of the data, no preprocessing was performed before storage: the videos were not denoised, cropped, color-adjusted, or otherwise modified, so all potentially important intraoperative details were retained as authentic foundational data for subsequent image analysis and model training. All videos and images were automatically anonymized when exported from the recording devices and during frame extraction. This procedure ensured that the final dataset contained no personally identifiable information and that individual participants could not be traced from the stored data.
Data processing, annotation and quality assessment
Image data
We created the SEA dataset from the collected video data. An expert with extensive experience in image processing selected segments from each video in which instruments appeared frequently. One frame per second was extracted from these segments, with priority given to images that posed segmentation challenges. Frames in which no instrument appeared in the field of view were excluded. Each sample was stored as an independent entry in the corresponding folder.
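As a rough illustration of this sampling step (not the authors’ released tooling), the following Python sketch extracts one frame per second from a recording with OpenCV. The file names are hypothetical, and the segment selection itself was performed manually by the expert.

```python
# Minimal sketch of 1-frame-per-second extraction from a surgery video,
# assuming OpenCV (cv2) is available; file names are illustrative only.
import cv2

video_path = "P7-2.mp4"            # hypothetical video file
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)     # recordings were captured at 60 fps

frame_idx, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # keep one frame per second of video
    if frame_idx % int(round(fps)) == 0:
        cv2.imwrite(f"P7-2-{saved:04d}.jpg", frame)
        saved += 1
    frame_idx += 1
cap.release()
```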
The appearance of surgical instruments varies with the viewing perspective: the shape, size, contours, and visible parts of an instrument in the image may change as the endoscopic field of view changes. The images were therefore first classified by the size of the surgical working channel. Images in which the instrument boundaries were continuous and clear were categorized as “Normal scenario”, whereas images in which the instruments were obscured by blood, bubbles, or tissue were classified as “Difficult scenario”. The six main instruments appearing in the dataset were further sorted into designated folders. After classification, two experts reviewed the dataset repeatedly. It is worth noting that, despite efforts to minimize subjective errors through re-evaluation of the dataset, some errors caused by lighting conditions (such as overexposure or underexposure) may still persist.
Segmentation of images
Segmentation was performed using LabelMe (v5.0.2) and 3D Slicer (v5.0.2). Before annotation began, multiple preparatory meetings were held to explain the annotation standards for the various scenarios; these sessions included detailed guidelines and hands-on practice with representative examples, and feedback was provided to ensure that the annotators reached a consistent understanding of the criteria. In the initial stage, two annotators independently labeled the same set of 200 images, and the results were analyzed for consistency. These annotations were then reviewed by two junior reviewers, and any that did not meet the standards were returned for re-labeling until approved. Formal annotation then began, with the annotators dividing the remaining work between them. Throughout the process, the two junior reviewers continued to assess the annotations; when uncertain cases arose, a senior reviewer conducted a final review to determine whether an annotation should be returned for revision. The workflow is shown in Fig. 1. Notably, for instruments partially obscured by blood, tissue, or lighting issues, clinicians were still required to outline the full contours of the instruments as accurately as possible; when most of an instrument was unclear, segmentation focused on the visible parts. Figure 2 shows example segmentation results for the various types of surgical instruments.
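For users working with the LabelMe annotations, the sketch below shows one way to rasterize a polygon annotation into a binary mask. The field names follow the standard LabelMe JSON schema; the file name is hypothetical.

```python
# Hedged sketch: converting a LabelMe polygon annotation to a binary mask.
# Keys ("shapes", "points", "imageHeight", ...) are the standard LabelMe schema.
import json
import numpy as np
import cv2

with open("P7-2-0204.json") as f:       # hypothetical annotation file
    ann = json.load(f)

mask = np.zeros((ann["imageHeight"], ann["imageWidth"]), dtype=np.uint8)
for shape in ann["shapes"]:
    if shape["shape_type"] == "polygon":
        pts = np.array(shape["points"], dtype=np.int32)
        cv2.fillPoly(mask, [pts], 255)  # mark instrument pixels as foreground

cv2.imwrite("P7-2-0204_mask.png", mask)
```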

Fig. 1 Illustration of the instrument annotation, review, and revision process.

Fig. 2 Instrument segmentation of ESS using LabelMe and 3D Slicer. (a) grasping forceps, (b) bipolar, (c) drill, (d) scissor, (e) dissector, (f) punch. (a) to (c) were segmented using LabelMe, while (d) to (f) were segmented using 3D Slicer.
Data statistics
Tables 1 and 2 give an overview of the data records and image dimensions in the dataset. The dataset comprises a total of 48,510 images. The images come in three different dimensions and are stored in both JPG and PNG formats, with a total size of 9.99 GB. The segmentation masks are stored in NRRD format only, with a total size of 0.08 GB.
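A minimal sketch for loading one of the NRRD segmentation masks, assuming the pynrrd package; the file name is illustrative.

```python
# Reading an NRRD segmentation mask with pynrrd (pip install pynrrd).
import nrrd
import numpy as np

mask, header = nrrd.read("P7-2-0204.nrrd")  # hypothetical mask file
print(header["sizes"])                       # pixel dimensions of the mask
print(np.unique(mask))                       # label values present in the mask
```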
Figure 3a shows the proportion of images in the two scenarios, with a clear bias toward difficult scenarios in the dataset. We did not apply any specific measure to correct this imbalance: a model that performs well only in normal scenarios may fall short in practical applications, and increasing the number of difficult-scenario images gives a model more training data under these complex conditions, thereby enhancing its robustness and generalization capability.

Fig. 3 (a) Number of images in the Spine Endoscopic Atlas dataset for the normal and difficult scenarios; (b) number of images for each instrument category in the Spine Endoscopic Atlas dataset.
In a fluid-irrigated environment, instrument visibility issues can pose significant challenges. Figure 4 illustrates the different types of challenging situations. An instrument occupying less than 20% of the entire field of view is considered to have a small proportion. The dataset does not provide individual annotations for each specific difficulty, because these challenges rarely occur in isolation; they typically arise from a combination of factors occurring simultaneously, such as limited visibility, obstructed instrument operation, and difficulty in controlling bleeding.
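The stated 20% threshold can be checked directly on a binary mask; the snippet below is our reading of that criterion, not code released with the dataset.

```python
# Sketch of the <20% "small proportion" criterion applied to a binary mask.
import numpy as np

def is_small_proportion(mask: np.ndarray, threshold: float = 0.20) -> bool:
    """Return True if instrument pixels cover less than `threshold` of the frame."""
    instrument_ratio = np.count_nonzero(mask) / mask.size
    return instrument_ratio < threshold
```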

Fig. 4 Difficult scenarios of surgical instrument segmentation during endoscopic spinal surgery. (a) bubbles, (b) working channel interference, (c) bleeding, (d) underexposed, (e) overexposed, (f) bone debris, (g) small proportion of instrument, (h) tissue obstruction, (i) multiple difficult scenarios.
Figure 3b shows how the various types of surgical instruments are distributed across the dataset. Grasping forceps and bipolar have the most images, at 4,918 and 2,933, respectively. Drill and punch have slightly fewer images, reflecting their use only in specific steps of surgery. The dataset contains only 407 images of scissors and 241 images of dissectors, indicating that these instruments are used relatively infrequently during surgery.
Data description
The dataset is available in the Figshare repository30. The overall folder structure is shown in Fig. 5. The image data are divided into two main folders, “classified” and “unclassified”. To increase the diversity of recording equipment and samples, data from two medical centers are included. Within the “classified” folder, the data are further divided into “big channel” and “small channel” subfolders according to the size of the working channel (diameter 6.0 mm and 3.7 mm, respectively), and then by surgical region into “cervical” and “lumbar” folders. Each patient is treated as an individual sample stored in a separate folder; samples from medical centers 1 and 2 are labeled “P + Patient ID” and “T + Patient ID”, respectively. Inside each sample folder, instrument images are classified under “Normal scenario” and “Difficult scenario” subfolders, and within each scenario folder the images and their corresponding annotation files are organized into instrument-specific subfolders (e.g., “bipolar”, “grasping forceps”, “drill”, “dissector”, “punch”, “scissor”). Both image and annotation files follow the naming convention “P/T + Patient ID + video index + frame index” (e.g., P7-2-0204). All unclassified data come from medical center 1 and provide researchers with a rich source of raw material for developing more precise algorithms that automatically identify and classify complex medical images and instrument features.
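The naming convention lends itself to simple programmatic parsing. The sketch below is one possible reading of that convention; the root folder name is hypothetical, and the exact on-disk layout should be verified against the Figshare release.

```python
# Hedged sketch: parsing the "P/T + Patient ID + video index + frame index"
# file-naming convention (e.g., P7-2-0204) while walking the dataset tree.
import re
from pathlib import Path

NAME_RE = re.compile(r"^(?P<center>[PT])(?P<patient>\d+)-(?P<video>\d+)-(?P<frame>\d+)$")

def parse_sample_name(stem: str) -> dict:
    """Split a file stem like 'P7-2-0204' into its naming-convention fields."""
    m = NAME_RE.match(stem)
    if m is None:
        raise ValueError(f"unexpected file name: {stem}")
    info = m.groupdict()
    info["center"] = "medical center 1" if info["center"] == "P" else "medical center 2"
    return info

for img in Path("classified").rglob("*.jpg"):  # hypothetical root folder
    print(img.name, "->", parse_sample_name(img.stem))
```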

Fig. 5 Overview of the Spine Endoscopic Atlas dataset’s structure.
