In endovascular surgery, interventionists push a thin tube called a catheter, guided by a thin wire, to a treatment site inside the patient’s blood vessels to treat conditions such as blood clots, aneurysms, and malformations. Guidewires with robotic tips can enhance maneuverability, but they present challenges in modeling and control. Automating soft robotic guidewire navigation has the potential to overcome these challenges, increasing the precision and safety of endovascular navigation. In other surgical domains, end-to-end imitation learning has shown promising results. We therefore develop a transformer-based imitation learning framework with goal conditioning, relative action outputs, and automatic contrast dye injections to enable generalizable soft robotic guidewire navigation in an aneurysm targeting task. We train the model on 36 modular bifurcated geometries, generating 647 demonstrations under simulated fluoroscopy, and evaluate it on three previously unseen vascular geometries. The model autonomously drives the tip of the robot to the aneurysm location with a success rate of 83% on the unseen geometries, outperforming several baselines. In addition, we present ablation and baseline studies to evaluate the effectiveness of each design and data collection choice.
The videos below show results for our best-performing policy, trained with the feature-map goal representation, relative action representation, and all recovery data.
The ability to recover was also validated by starting the robot in a failed state (e.g. in the wrong branch).
The policy can generalize to differing visual feedback, such as a roadmap overlay. In this feedback mode, commonly used by clinicians, the vessel roadmap is overlaid on the live view. However, the registration of the static roadmap vessels to the live, deforming vessels can be inaccurate.
We perform a bench-top experiment investigating the feasibility of autonomous navigation of a soft fluidic robot for intracranial aneurysm treatment. As a critical step towards achieving autonomous navigation, we conduct our study using a large-scale robot. While progress towards miniaturizing soft robotic tools is being made, larger-scale prototypes are currently more mechanically robust and easier to track in camera images. Thus, they enable us to reliably collect hundreds of demonstrations to validate the autonomous navigation approach before extending to the small scale. In addition, we restrict our focus to a single bifurcation projected onto a 2D plane. In a real case, navigation can be reduced to traversing a single bifurcation at a time, which often lies roughly in a plane after appropriate C-arm positioning. With this setup, we preserve several core difficulties of endovascular intervention: unpredictable vessel-tool forces via a soft robotic steerable guidewire attached to a flexible tube; high geometrical variation via modular 3D-printed vessel mazes; and ambiguous, incomplete visual feedback via a fluoroscopy simulator.
Since the materials used in soft robots are highly nonlinear, exhibit significant hysteresis, and are easily deformed by their environments, they are challenging to model and control [1]. Further, there are significant visual constraints in endovascular surgery. Under X-ray fluoroscopy, the vessels are not visible until a radiopaque contrast dye is injected that fills the vessels and then diffuses after a few seconds. When the dye has filled the vessels, a snapshot of the vessels called a vessel roadmap is captured. The static roadmap can be referenced while navigating the guidewire, but it is often slightly misregistered due to vessel deformation and inadvertent patient movement. Moreover, sensorization, while potentially beneficial, is still an open research challenge due to the tools' millimeter-to-sub-millimeter sizes and high flexibility [2]. Endovascular interventionists must rely on trial-and-error involving a combination of advancing, retracting, and rotating to steer the device into the correct vessel [3].
To control the soft robotic guidewire, a user inputs force commands through a teleoperated control handle. These forces are proportionally mapped to the bending and translation velocities of the robot, achieved by the syringe pump and translation drive, respectively.
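The proportional force-to-velocity mapping described above can be sketched as follows. This is a minimal illustration under assumed gains and limits; the function name, gain values, and velocity caps are hypothetical, not the authors' implementation.

```python
def map_forces_to_velocities(f_bend, f_translate,
                             k_bend=0.5, k_translate=2.0,
                             v_bend_max=1.0, v_translate_max=5.0):
    """Proportionally map control-handle forces to actuator velocities.

    f_bend drives the syringe pump (tip bending); f_translate drives the
    translation drive (insertion/retraction). Outputs are clamped to
    assumed safe maxima. All gains and limits here are illustrative.
    """
    def clamp(v, v_max):
        return max(-v_max, min(v_max, v))

    v_bend = clamp(k_bend * f_bend, v_bend_max)                # syringe pump velocity
    v_translate = clamp(k_translate * f_translate, v_translate_max)  # translation velocity
    return v_bend, v_translate
```

Clamping the output is a common safety choice in teleoperation so that an unusually hard press on the handle cannot command an unbounded velocity.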
To simulate an aneurysm navigation task in a 2D environment, we created 3D-printed modular bifurcated mazes. Each maze consisted of an entry, a bifurcation, and two branches. The bifurcations varied in the angle of each connecting branch across the range of 25-70 degrees. Similarly, the branches varied in width, aneurysm distance from the bifurcation, which side the aneurysm branches from, and aneurysm diameter. Additional variations included secondary bends, empty branches, and bumps along the wall. Different combinations of the modular pieces led to three sets of mazes: the training set, the rearranged test set, and the novel test set.
We use a transformer-based action chunking model for soft robotic guidewire navigation, inspired by the Surgical Robot Transformer (SRT) [4] which applies [5] to surgical tasks with the da Vinci robot.
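The core idea of action chunking is that the policy predicts a short sequence of future actions at once, which is executed before the next prediction. The control-loop sketch below illustrates this pattern; `policy` and `env` are stand-in interfaces assumed for illustration, not the authors' code.

```python
import numpy as np

def rollout_with_chunking(policy, env, chunk_size=20, max_steps=400):
    """Query the policy for a chunk of `chunk_size` actions, execute them
    open-loop, then re-observe and query again, up to `max_steps` steps.

    `policy(obs)` is assumed to return an array of shape
    (chunk_size, action_dim); `env.step(action)` returns (obs, done).
    """
    obs = env.reset()
    for _ in range(0, max_steps, chunk_size):
        chunk = policy(obs)              # predict a block of future actions
        for action in chunk[:chunk_size]:
            obs, done = env.step(action)
            if done:
                return obs
    return obs
```

Predicting chunks rather than single actions reduces compounding error and the effective decision horizon, which is the motivation behind the action chunking formulation of [5].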
Intuitively, the goal representation provides a guiding vector at each pixel location within the vessels. This provides richer information than an image of the goal state or location and is naturally suited for input to a CNN. A relative action representation can account for the inconsistent relationship between shape, pressure, and syringe displacement by approximating it locally. The dataset consisted largely of "recovery" demonstrations, in which the robot begins in a failed state (e.g., in the wrong branch, past the target, or nearly buckling against the wall), as opposed to "normal" demonstrations that begin at the base of the maze.
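The relative action representation can be illustrated with a small sketch: demonstrated absolute motor positions are converted to deltas from the position at prediction time, and at run time the predicted deltas are added back to the current position. Function and variable names are assumptions for illustration.

```python
import numpy as np

def to_relative_actions(abs_trajectory):
    """Convert a (T, action_dim) absolute motor-position trajectory into
    per-step targets relative to the position at prediction time (step 0)."""
    current = abs_trajectory[0]
    return abs_trajectory[1:] - current

def apply_relative_chunk(current_position, relative_chunk):
    """Recover absolute motor targets from predicted deltas at run time."""
    return current_position + relative_chunk
```

Because each target is expressed relative to the current motor state, the policy only needs the shape/pressure/displacement relationship to hold locally, rather than globally across the workspace.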
To evaluate our model, we designed two sets of geometries that were not in the training set: (1) three rearranged geometries, which were mazes containing branches that appeared in the training set but in a unique combination, and (2) three novel geometries, which were mazes with branches and bifurcations not seen at all in the training set. We evaluated the policy against several ablative models described below and also had two trained clinicians perform the task. We measured the success rate in reaching the aneurysm and the final distance to the aneurysm boundary at the end of each trial.
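The final-distance metric can be sketched as follows, assuming the aneurysm is approximated as a circle in the 2D projection (an assumption for illustration; the paper's exact boundary definition may differ) and that distance is zero once the tip is inside.

```python
import numpy as np

def distance_to_aneurysm(tip_xy, center_xy, radius):
    """Distance from the robot tip to a circular aneurysm boundary,
    in the same units as the inputs; 0 if the tip is inside."""
    d = np.linalg.norm(np.asarray(tip_xy, dtype=float)
                       - np.asarray(center_xy, dtype=float))
    return max(0.0, d - radius)
```

Reporting the distance to the boundary rather than the center keeps the metric at zero for any successful trial, regardless of aneurysm diameter.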
To investigate the effect of including recovery data, we trained two ablation models: one trained on 218 normal demonstrations, excluding any recovery demonstrations ("0% recovery"), and another trained on half of the 429 recovery demonstrations in addition to the 218 normal ones, totaling 430 demonstrations ("50% recovery").
To evaluate the decision of whether to predict contrast injections within the imitation learning framework or to use a naive approach such as injecting at a constant interval, we evaluated the policy while only injecting contrast at two fixed intervals: every eight seconds ("Constant contrast (8 sec.)") and every sixteen seconds ("Constant contrast (16 sec.)").
To investigate the choice of goal representation, we trained two models: the "Binary goal" model, which uses the roadmap concatenated with a binary mask indicating the aneurysm location, and the "No goal" model, which is trained with just the roadmap as the goal.
Finally, we trained a model that outputs the motor actions as absolute positions ("Absolute actions"), rather than relative to the motors' current positions.
We also evaluated several baselines for comparison with our proposed model: a state-of-the-art Diffusion policy [6], a multi-layer perceptron (MLP), a classical centerline-following controller, and two clinicians trained in neurointerventional surgery. The Diffusion and MLP policies use the same inputs (live and goal images passed through ResNet encoders) and outputs (action chunk and contrast prediction) as our proposed model. We evaluated all of these policies on the novel geometries test set.
The videos below show representative failure modes for the policies trained with 50% recovery data, constant contrast injections, a binary goal representation, and absolute actions.
The following videos illustrate common failure modes of the baseline policies.