Abstract
The goal of this paper is to present a dedicated high-performance computing (HPC) infrastructure which is used in the development of a so-called reduced-order model (ROM) for simulating the outcomes of interventional procedures which are contemplated in the treatment of valvular heart conditions. Following a brief introduction to the problem, the paper presents the design of a model execution environment, in which representative cases can be simulated and the parameters of the ROM fine-tuned to enable subsequent deployment of a decision support system without further need for HPC. The presentation of the system is followed by information concerning its use in processing specific patient cases in the context of the EurValve international collaboration.
1. Introduction
In this paper, we present the outcome of 3 years of development work in the EurValve project [1], the aim of which was to develop a decision support system (DSS) for treatment of valvular heart conditions, specifically aortic valve stenosis and mitral valve regurgitation. Since the DSS must be able to operate without the need for high-performance computing (HPC) resources, a simplified model needed to be created where prospective patient cases could be analysed without requiring full computational fluid dynamics (CFD) simulations of the heart. In order to create such a model, an integrated solution for medical simulations was developed; this is referred to as the model execution environment (MEE). Starting with a definition of the problem (which involves simulating valvular heart conditions and the outcomes of treatment procedures), we provide a description of the HPC environment used to process retrospective cases in order to create a knowledge base which underpins the EurValve DSS and the reduced-order model (ROM) upon which it is based. We also provide specific examples of MEE usage and corresponding statistics.
The main focus of this paper is on the infrastructure created to facilitate the development and usage of the models and computations carried out in the course of the EurValve project. The clinical results and detailed computational methods will be described in separate publications that are currently in preparation.
2. Definition of the problem
One of the goals of the EurValve project was to elaborate and operate a flexible, easy-to-use environment (the MEE) for the development, deployment and execution of the large-scale simulations that are required for the development of the learning process, for sensitivity analyses and for the associated data storage. The simulations themselves require HPC resources and scientific toolkits such as Matlab and ANSYS (specifically, we used Fluent Meshing and Fluent for meshing and flow simulations, respectively) [2]; however, their outcome, in the form of decision rulesets, is transferred from the research infrastructure to the DSS, enabling the results of EurValve to be accessed in a clinical setting. The challenge, then, was to create an environment which enables clinical researchers (also referred to as domain scientists) to carry out complex simulations with the use of HPC resources and manage their results in a secure manner, while not requiring in-depth knowledge of HPC infrastructures and distributed storage resources [3].
At the core of the problem which EurValve addresses is the concept of deriving a heart model (described by a set of parameters) which, when coupled with patient-specific data, such as medical scans, can be used to compute the likely outcome of various treatment regimens. In a clinical setting, however, this computation cannot rely on HPC resources—this is because most hospitals do not operate their own HPC infrastructures and legal restrictions prevent patient data from being moved out of the clinical setting for external processing. Consequently, computationally expensive parts of the modelling process must be precomputed instead. The initial step is characterizing the valves, which can be done using a number of methods, depending on the available data:
— if the dataset contains images of the valve that are compatible with parametrized segmentation, a ROM can be created;
— if the images are not compatible, the valve can be characterized using CFD.
The most efficient method currently is the creation of a ROM, described in §5 and in detail in [2], which involves multiple runs of a full CFD simulation (figure 1a) prior to creating a patient-specific heart model with a response surface that allows for quick calculations of valve characteristics for any set of geometrical parameters, and thus any geometry. The CFD-based method requires a model of flow through the valve at different flow rates to be solved to estimate the pressure gradient versus flow relationship (although note that the specifics of the applied CFD methodology are beyond the scope of this paper and are not discussed here—they will, instead, be the focus of separate publications). This relationship is then used to characterize the valve. The valve characterization is then used in the heart model (figure 1b), which combines it with the corresponding zero-dimensional (0D) model. The 0D model is then parameterized, to represent the specific patient, using optimization methods. The heart model is subsequently used in the full sensitivity analysis pipeline (figure 2).
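To make the valve characterization step more concrete, the sketch below fits a simple pressure gradient versus flow relationship of the form ΔP ≈ aQ + bQ² to a handful of hypothetical CFD results; the functional form, sample values and units are illustrative assumptions and do not reproduce the EurValve characterization procedure.

```python
import numpy as np

# Hypothetical CFD results: flow rates Q (ml/s) and computed pressure drops dP (mmHg).
# In the real workflow, each (Q, dP) pair would come from one full CFD run for a given valve.
Q = np.array([50.0, 100.0, 200.0, 300.0, 400.0])
dP = np.array([1.2, 3.1, 9.8, 20.5, 35.0])

# Fit dP ~ a*Q + b*Q**2 (viscous + inertial term) by linear least squares.
A = np.column_stack([Q, Q**2])
(a, b), *_ = np.linalg.lstsq(A, dP, rcond=None)

def pressure_drop(q):
    """Estimated transvalvular pressure drop for an arbitrary flow rate q."""
    return a * q + b * q**2

print(f"a = {a:.4g}, b = {b:.4g}, dP(250 ml/s) = {pressure_drop(250.0):.1f} mmHg")
```

Once such a relationship is available, it can be embedded in the 0D heart model as the valve characterization, without any further CFD runs.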
Interactions within the heart model, which includes the 0D model and ROM, are as follows.
— The 0D model includes a ROM representation of the aortic valve or a valve characterization of either the mitral or aortic valve, which relate the flow through the valve to the pressure gradient across it. The purpose of the characterization, or ROM, which requires significant three-dimensional computations in its development and validation (but not in its eventual operation), is to support rapid computation of the cardiac and systemic circulation physiology for a patient-specific valve anatomy when there is no time to produce full four-dimensional analyses.
— For the aortic valve, a ROM describes the relationship between the transvalvular flow and the pressure gradient as a function of anatomical/geometrical parameters describing the valve shape. The system's model uses these relationships, together with cardiac and circulation parameters, to compute the flows and pressures throughout the system for the individual patient in one or more physiological states. The ROM itself is based on large-scale simulations of the haemodynamics of a parameterized valve configuration under a range of flow rates that span the physiological space.
— To calculate the pressure drop, the ROM builder needs to create a response surface which establishes the relationship between the geometrical parameters and flow, and the pressure drop, in a form that is understandable by the ROM interpolation tool. To create the response surface, the ROM builder has to run full three-dimensional simulations multiple times with different input parameters. The result of each simulation is a single pressure drop value. The number of runs needed to characterize the response surface depends (nonlinearly) on the number of input parameters. This process needs to be run on an HPC for the final generation of the model, as processing the data for an individual case requires approximately 4000 CPU hours and involves over a dozen distinct simulation runs, with over 450 full CFD simulations involved in creating the ROM (please refer to §5 for details regarding the consumption of computing resources in the generation of the ROM); a minimal illustration of the response-surface step is sketched after this list.
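The following is a minimal sketch of the response-surface idea described in the last item above, assuming a generic radial basis function interpolant; the geometrical parameter names, training values and library choice are illustrative and do not correspond to the actual ROM builder used in the project.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Hypothetical training points: each row is (annulus_diameter, leaflet_angle, flow_rate),
# normalized to [0, 1]; in the real workflow each row corresponds to one full CFD run.
X_train = rng.uniform(size=(60, 3))
# Hypothetical pressure drop returned by each CFD run (here an arbitrary smooth function).
dP_train = 5.0 + 30.0 * X_train[:, 2] ** 2 / (0.2 + X_train[:, 0]) + 2.0 * X_train[:, 1]

# The "response surface": an interpolant mapping geometry and flow to pressure drop.
response_surface = RBFInterpolator(X_train, dP_train, kernel="thin_plate_spline")

# Once built, evaluating the surface for a new valve geometry is effectively instantaneous,
# which is what allows the ROM to run inside the DSS without HPC resources.
new_case = np.array([[0.45, 0.30, 0.80]])
print(f"predicted pressure drop: {response_surface(new_case)[0]:.2f}")
```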
The sensitivity analysis tool executes the heart model multiple times based on the input population data, which consist of multiple parameters with ranges. Note that even the simplest cardiac/systemic circulation model has over 20 parameters, and it is very difficult to tune them effectively given the relatively sparse patient-specific physiological data. The sensitivity analysis will help us to focus our tuning efforts on the most important/rewarding parameters for personalization.
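Purely as an illustration of such screening (the project's sensitivity analysis relies on the agPCE tool mentioned in the requirements below, not on the naive approach shown here), the following sketch samples hypothetical 0D-model parameters over their ranges and ranks them by correlation with a scalar model output:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter ranges for a 0D heart/circulation model (names are invented).
ranges = {
    "ventricular_elastance": (1.0, 4.0),
    "systemic_resistance":   (0.5, 2.0),
    "aortic_valve_area":     (0.6, 3.0),
    "heart_rate":            (50.0, 110.0),
}

def zero_d_model(sample):
    """Placeholder for the real 0D model: returns one scalar output (e.g. mean gradient)."""
    return (10.0 / sample["aortic_valve_area"]
            + 2.0 * sample["systemic_resistance"]
            + 0.05 * sample["heart_rate"])

n = 2000
samples = {k: rng.uniform(lo, hi, n) for k, (lo, hi) in ranges.items()}
outputs = np.array([zero_d_model({k: v[i] for k, v in samples.items()}) for i in range(n)])

# Rank parameters by absolute correlation with the output: a crude sensitivity screening
# that indicates where personalization effort is likely to be most rewarding.
for name, values in sorted(samples.items(),
                           key=lambda kv: -abs(np.corrcoef(kv[1], outputs)[0, 1])):
    print(f"{name:24s} |r| = {abs(np.corrcoef(values, outputs)[0, 1]):.2f}")
```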
A ROM describes a parameterized valve, and this is the subject of a major, independent, operation of the computational infrastructure. Parameter estimation is a separate operation that is associated with the setting of parameters that are required by the system's physiology model, such as ventricular elastance or systemic resistance, and that might be associated with parameters/observations in the clinical record. These parameters are used subsequently in the ‘what-if’ scenarios, representing multiple possible treatment strategies, and returning the resulting parameters together with their uncertainty.
The detailed requirements of the sensitivity analysis are as follows:
— Input for the 0D model is a 30-element vector; output is a 25-element vector.
— Output of the agPCE model consists of three plain-text files.
The presented pipelines may be repeated for the following reasons:
— new input patient or population data become available;
— new versions of models are developed;
— earlier results need to be reproduced.
For these reasons, the main requirement for the MEE is to support automation and repeatability of such pipelines.
3. State of the art—computational infrastructures for e-science
Although the primary output of the project is a clinically compliant DSS, in order to develop such a system—as explained in the following section—a dedicated research environment is required. This environment must be capable of leveraging the resources provided by HPC frameworks and deliver an easy-to-use interface which will facilitate the computations required to fine-tune the parameters of the DSS. This section details the progress which has recently been made in the area of computational infrastructures for e-science and outlines the innovation of the EurValve MEE.
3.1. Container technologies
Operating cloud-based resources is now routine for distributed applications. Although umbrella programming interfaces allow for easy integration with many available cloud providers, dynamic computational and data load migration among cloud providers is still an issue. In clinical applications, where data protection is paramount, lightweight machine migration and execution is a necessity. The use of container technologies—such as Docker [4] or Singularity [5]—makes migration between infrastructural cloud sites less problematic. Lightweight container technologies can be used to package and publish migrating application components that will run on various cloud resources, including standalone computers. Thus, support for encapsulation of computation steps in the form of lightweight containers is a desirable feature in a system whose purpose is to execute computational pipelines consisting of multiple stages.
3.2. New cloud computing solutions
The classic cloud provisioning model assumes starting a virtual machine and paying for it in hour-based intervals. For an application like the DSS, this can result in an unnecessary waste of money, because user interaction with the application can be quite short (of the order of minutes). For this type of application, it is very convenient to pay per single request, as is possible with new services offered by cloud providers. Examples include Amazon AWS Lambda [6], Google App Script [7] and Azure WebJobs [8]. This introduces the possibility of novel computation policies—e.g. a biological model can be executed when a new file appears in Amazon S3 storage, when a dedicated event is generated by the user, or at scheduled intervals. In effect, these technologies enable us to contemplate an event-based infrastructure for a distributed DSS. On the other hand, the use of Platform as a Service infrastructures (such as Heroku [9], Cloud Foundry [10], OpenShift [11], Google App Engine [12]) allows resource consumption to be scaled up and down automatically, with payment only for the resources actually used. Thus, the research infrastructure can be maintained in a ‘dormant' state (incurring no costs) when unused, awakening only when an execution event appears.
Going beyond the state of the art, the presented platform aims to combine these innovations to provide a flexible and cost-effective research infrastructure. Operations-wise, the platform must be able to support various deployment models and be able to select specific implementations and deployments of services depending on their availability at the moment the given service is required. This also entails maintaining a data management infrastructure capable of delivering input data and collecting output regardless of where and how a given computation is taking place.
3.3. Integrated authentication, authorization and accounting
Security aspects represent one of the most crucial issues for any IT system operating on medical data. One of these aspects is ensuring proper authentication, authorization and accounting mechanisms, for which several solutions are currently available. Kerberos [13,14], despite its age and complexity, is still used in many systems and, as a good example of a ticket-based access system, provides inspiration for more lightweight solutions. Another interesting solution is Shibboleth [15], which is based on Security Assertion Markup Language assertions. Its architecture is well suited for web applications and services. OpenID is an even more relaxed authentication mechanism, based on a defined set of providers, which can be chosen freely from the numerous available ones or established for a specific purpose. On the other hand, there is no need to form a closed federation between providers: authorization functionality can be provided by an external service, either custom-built or based on a standard such as OAuth [16]. Finally, there are numerous simple authentication systems such as plain username and password mechanisms, mutual transport layer security (TLS) authentication using X.509 certificates or various locally generated ticket systems (such as the Keystone component of OpenStack). These might still be appropriate for some specific services.
While the presented technologies address specific aspects of IT platform security, the goal behind MEE is to provide a lightweight solution which would not impact the performance of particular services (e.g. data storage access, model execution), yet on the other hand be pluggable and extendable, to permit integration with arbitrary computational platforms, as described in the previous section.
In addition to the need for proper authentication and authorization, services running in cloud infrastructures (especially public ones) require a high level of data security during the storage and processing phases. Additionally, a mechanism that ensures that data cannot be recovered within reasonable time and with reasonable resources after being deleted is also required. Multiple factors, such as the characteristics of certain physical media (especially magnetic) as well as possible write optimizations used in modern storage systems, mean that it is impossible to ensure that data are destroyed even when they are overwritten. This results in the need to secure (e.g. encrypt) data before storing them (if permanent storage is needed). Multiple strong encryption systems exist, such as the advanced encryption standard (AES) [17]. Some situations may require the usage of asymmetric systems such as the Rivest–Shamir–Adleman algorithm [18], which can be used to encrypt data or verify their integrity without the need to possess the secrets required for decryption or signing. A comprehensive analysis of the problem has been performed in the scope of the CIRRUS project [19], including papers describing various aspects of data security and privacy. This reveals that the problem is complex on many levels, both technical and legal. Some of those aspects have more recently been addressed by the TClouds project [20], which assessed legal aspects and developed a ‘Trustworthy Internet-scale Computing Platform' to address technical issues related to data security. Another project—SECCRIT [21]—also focuses on legal and political aspects, revealing the need for complex planning, from the initial assessment of threats, through the development of procedures, to the ability to ensure accountability if something goes wrong. It is also important to mention that—while still relatively uncommon—some solutions exist on the provider side. A good example is a service called CloudHSM (Hardware Security Module) [22]. It enables secure storage of cryptographic material protected from unauthorized access, which can be used both for in-house cryptographic solutions and for Infrastructure as a Service offerings, such as encrypted elastic block storage or databases.
Of course, as far as security is concerned, the main effort needs to be focused on preventing unauthorized access to the system; however, no security plan would be complete without the ability to handle security incidents that might happen regardless of any effort to prevent them. Part of this plan would need to take into account the complex nature of cloud forensics, as normal methods that assume access to real physical hardware do not apply in this instance. These problems, along with promising solutions, have been described in the literature [23], and the MEE framework builds upon these achievements to ensure secure storage of data in the cloud, integrating existing solutions where feasible. This involves detecting anomalous connection attempts based on request origin and header content, along with encryption of the data at rest with a strong cryptographic algorithm (AES256) to make data hard to decode in case of a leak from the system.
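As a minimal sketch of at-rest encryption of the kind mentioned above, assuming the widely used Python cryptography package and a locally generated key (the MEE's actual key management and storage integration are not reproduced here):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Illustrative only: in a real deployment the key would come from a key-management
# service (or an HSM such as the CloudHSM service mentioned above), never from code.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

plaintext = b"anonymized patient record"
nonce = os.urandom(12)  # unique per encryption operation
ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data=None)

# The nonce is stored alongside the ciphertext; decryption fails if either is tampered with.
assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext
```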
4. The model execution environment
4.1. Architecture and usage
The goal of the EurValve project is to combine a set of complex modelling tools to deliver a workflow which will permit the evaluation of medical prospects and outlook for individual patients presenting with cardiovascular symptoms suggesting valvular heart disease (VHD). The goal of this section is therefore to present the platform's conceptual view. The vision of the MEE also follows from our research related to methods, techniques and environments for programming and execution of complex scientific applications on clusters, grids, cloud and container infrastructures [24–26], as well as the elaboration of environments for multi-scale, multi-infrastructure and multi-programming applications [27].
Figure 3 presents the architecture of the MEE as implemented during the EurValve project to facilitate the execution of the above-mentioned HPC computations.
The environment itself consists of a user interface enabling the execution of computational pipelines and retrieval of results, along with a middleware layer which takes care of submitting computations to computational resources, monitoring their status and orchestrating execution pipelines. It provides added value to users of the underlying HPC resources, specifically by:
— integrating secure data storage repositories (including anonymized data describing patient cases which are derived from medical databases, such as the Trial Connect system deployed at Sheffield Teaching Hospitals);
— allowing computational pipelines which process these data to be run on HPC resources using a simple, user-friendly GUI, without requiring domain scientists to become familiar with the specifics of HPC operation;
— enabling individual steps of computational pipelines to be scheduled for execution on either HPC or cloud-based infrastructures;
— implementing a shared security layer on top of all resources, with single sign-on access to all parts of the infrastructure (including computations on HPC clusters);
— providing a range of visualization and data comparison interfaces to enable users to monitor the progress of their computations, visualize intermediate and final results, and compare the outputs of multiple simulation runs.
The MEE organizes data and computations into context domains (referred to as Patients); for each patient, an arbitrary number of simulations (referred to as Models in figure 3) can be performed to generate the ROM described in §2. This workflow-like approach lends itself well to a variety of use cases where complex data have to be processed using HPC in the context of a specific entity (patient, project, observation, event, etc.).
4.2. Secure data storage and processing
Access to homogeneous data is crucial in the context of the EurValve research pipelines—and this involves both tabular and file data. Tabular data store all information about the patient which can be measured during the clinical study (e.g. blood pressure). File data comprise raw images, usually in the DICOM format, binary representations resulting from segmentation (also in the DICOM format) or model surface/volume representations, expressed in formats such as STL or VTK. Thus, the environment has to provide an appropriate storage solution, transfer data from the hospitals and share them with project partners. Both capabilities have been implemented and deployed in the MEE release.
EurValve processes only anonymized data, but those data should nevertheless be robustly protected to ensure that they are accessible only by those who are legitimate researchers conforming to ethics approvals. Consequently, the system includes encryption protocols that ensure data privacy both during transfer (TLS) and at rest (AES-encrypted File Store) and grants access only to authorized parties. This is why the second most challenging objective facing the MEE is to deliver a security solution that takes into account the specific constraints inherent in processing medical data. To ensure proper access control, a policy decision point (PDP) and a policy enforcement point (PEP) were created in the scope of the framework, deployed and used by the storage and computing part of the MEE, as well as external services such as metadata storage (figure 4).
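The division of responsibilities between the two components can be summarized by the following minimal sketch; the policy format, attribute names and decision logic are invented for illustration and do not reflect the actual MEE policy model:

```python
# Hypothetical policy entries: who may perform which action on which resource.
POLICIES = [
    {"user": "alice", "action": "read",  "resource": "patient/42/scan.dcm"},
    {"user": "alice", "action": "write", "resource": "patient/42/rom_input.txt"},
]

def pdp_decide(user: str, action: str, resource: str) -> bool:
    """Policy decision point: evaluates a request against the stored policies."""
    return any(p["user"] == user and p["action"] == action and p["resource"] == resource
               for p in POLICIES)

def pep_guard(user: str, action: str, resource: str) -> None:
    """Policy enforcement point: sits in front of the storage/compute service
    and refuses to proceed unless the PDP grants access."""
    if not pdp_decide(user, action, resource):
        raise PermissionError(f"{user} may not {action} {resource}")

pep_guard("alice", "read", "patient/42/scan.dcm")   # allowed
# pep_guard("bob", "read", "patient/42/scan.dcm")   # would raise PermissionError
```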
Given that the MEE is used to process sensitive data, including the results of medical scans and examinations, proper auditing and logging of operations is essential. To this end, the platform is capable of so-called anomaly detection (flagging suspicious access attempts such as new logins from unusual geographical locations, browser changes, etc.). Such situations are logged, and a configurable auditing subsystem is used to notify administrators of suspicious activity and undertake preventative actions. An auditor's dashboard is implemented as a distinct platform component to enable users to track their activity and to provide administrators with detailed access statistics.
4.3. Compute platform
With data and security components in place, researchers can perform time/memory/storage intensive simulations; these are in silico experiments which are the main focus of the EurValve project. These calculations generate another objective for the MEE: we need to launch computational jobs in a distributed environment. Depending on the computational requirements, this can include private workstations, cloud infrastructures or the most powerful supercomputer in Poland (Prometheus—103rd place in the top 500 list). As deployed, the MEE provides full support for blood flow simulations and 0D heart model computations. For every registered patient in the system, all computations outlined in §2 can be invoked regardless of their specific deployment characteristics (whether HPC-based or cloud-based). In addition, both input and results can be directly retrieved by users at each stage of the pipeline computation process, with visualization plug-ins provided for popular file formats.
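As an illustration of how such a job might be dispatched to a batch-managed cluster like Prometheus (operated with a SLURM-style scheduler), the sketch below generates and queues a batch script for one CFD step; the partition, module and script names, as well as the resource figures, are placeholders rather than the actual MEE submission code:

```python
import subprocess
from pathlib import Path

def submit_cfd_step(case_id: str, cores: int = 12, hours: int = 6) -> str:
    """Generate a batch script for one valve-characterization CFD run and queue it.
    All names (partition, module, journal file) are placeholders for illustration."""
    script = Path(f"run_{case_id}.sh")
    script.write_text(f"""#!/bin/bash
#SBATCH --job-name=eurvalve_{case_id}
#SBATCH --ntasks={cores}
#SBATCH --time={hours}:00:00
#SBATCH --partition=plgrid
module load ansys
# Illustrative Fluent batch invocation; the journal file encodes the meshing/flow setup.
fluent 3ddp -g -t{cores} -i journal_{case_id}.jou
""")
    result = subprocess.run(["sbatch", str(script)],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()   # e.g. "Submitted batch job 123456"

# job_id = submit_cfd_step("patient_042")
```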
5. Results
Preparation of heart model data for the EurValve DSS called for processing of a cohort of patients referred to as ‘retrospective' (i.e. patients who had already undergone treatment for valvular heart conditions, enabling researchers to compare the outcome predicted by the DSS under a variety of conditions with the actual outcome observed for each patient). The analysis of each case was divided into two computationally intensive parts. The first, characterization of the valve, which extracted the valve coefficients, required running a set of five CFD simulations for each case. The meshing and simulation sets took up to 6 h to complete using 12 cores. Each individual valve was meshed separately using parameters that generated a good quality mesh for all the cases. The second part, personalization of the model, was run on 24 cores using genetic algorithm-based optimization, with processing times ranging from 30 to 400 min.
Altogether, the system was used to process 66 distinct patient cases (42 patients requiring mitral valve replacements; 24 patients requiring aortic valve replacements, including 12 based on ROM), which involved the consumption of over 250 000 CPU hours on an HPC cluster—specifically, the Prometheus cluster at ACC Cyfronet AGH [28]. A summary of the requested and used computational resources can be found in table 1.
grant name | start date | end date | consumed CPU hours
EurValve 1 | 12.01.2016 | 12.01.2017 | 22 869
EurValve 2 | 17.02.2016 | 17.02.2017 | 38 018
EurValve 3 | 22.02.2017 | 22.02.2018 | 94 279
EurValve 4 | 05.03.2018 | 05.03.2019 | 117 333
In addition to making use of HPC resources provided by the EurValve infrastructure, the storage components of MEE were used to store 71 507 images and other patient input and output files (264 GB), along with platform logs. Both file-based and tabular data storage browsers were provided to platform users for convenience and in order to meaningfully manage pipeline runs and the associated input/output data.
The parametric geometrical model developed by Philips allows for the creation of a ROM that acts as a surrogate for the compute-intensive CFD simulations (see §2). Unfortunately, owing to the large variability of mitral valve insufficiency, we were only able to make a reliable ROM for aortic valve stenosis. The ROM uses response surface methodology to analytically relate simulation input parameters to simulation output parameters. Many simulations are required to adequately train a ROM (in particular with many input parameters) that is accurate enough to replace the CFD simulations [29].
The HPC infrastructure outlined in the previous section, together with the MEE, was used to generate the necessary data points to train the ROM. A Python script was used to control a CFD simulation pipeline. This pipeline takes a set of parameters that define the shape of the valve and the appropriate boundary conditions. The input parameters are provided in a text file, where each line is associated with a single training point. Typically, the text file is generated on a workstation and subsequently uploaded to the MEE infrastructure to run the simulations. A bash script is used to distribute all defined training points over a given number of cores. Enough ANSYS licences were provided to launch 30 jobs in parallel. Following the successful processing of each patient case, simulation results were downloaded to a local workstation where a ROM was generated.
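A hedged sketch of this distribution step is given below; the one-training-point-per-line input format follows the description above, but the wrapper script name, worker count and parallelization strategy are assumptions rather than the project's actual control scripts:

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

MAX_PARALLEL = 30   # matches the number of available ANSYS licences

def run_training_point(line: str) -> int:
    """Launch one CFD simulation for a single training point (one line of the input file)."""
    params = line.split()
    # 'run_cfd_case.sh' is a hypothetical wrapper around the meshing + Fluent pipeline.
    return subprocess.run(["./run_cfd_case.sh", *params]).returncode

with open("training_points.txt") as f:
    points = [ln.strip() for ln in f if ln.strip()]

with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    return_codes = list(pool.map(run_training_point, points))

print(f"{return_codes.count(0)}/{len(points)} training points completed successfully")
```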
During the EurValve project, the parametric model was iteratively improved. For each improvement iteration of the parametric model, the data required to train the ROM were re-generated on the EurValve infrastructure. The quality of the ROM was evaluated with respect to CFD performed on the reconstructed mesh, that is, the parametric description of the segmentation mesh.
All datasets were successfully processed through the MEE, resulting in a set of parameters describing the predicted haemodynamic changes following valve intervention. The usefulness of these data was examined by presenting five cases (three patients with aortic stenosis and two patients with mitral regurgitation) to 45 clinicians involved in the management of patients with VHD. Seventy-three per cent of these clinicians felt that this information was useful and would aid clinical management. We saw that this information presented as part of a Clinical Decision Support System can have a significant influence on the treatment decisions made by clinicians in borderline cases with aortic stenosis and improve confidence in all clinical decisions made. Whether this can have a positive effect on patients' outcomes needs further evaluation in a larger randomized control trial.
6. Conclusion and future work
As the MEE is a generic architecture, capable of supporting a wide range of simulations and HPC studies, not necessarily limited to medical sciences, work is underway to extend its usage to applications which call for simultaneous processing of data across a range of HPC centres. This is being done in research investigating the major challenges and the basic architecture of a scalable storage and compute platform for exascale processing of extremely large datasets [30,31], in the context of the PROCESS project [32], and involves developing support for simulation steps encapsulated in application containers (e.g. Docker and Singularity), which can be freely distributed to multiple computing infrastructures for better parallelization of computational tasks. Further application of the MEE, and extension of its functionality, is taking place in research into efficient support for childhood cancer evaluation, where it is proposed as a cohort-wide multi-pipeline simulation environment, automating HPC utilization for neuroblastoma and diffuse intrinsic pontine glioma (early childhood cancers) [33], in the framework of the PRIMAGE project [34]. The MEE is also being considered for broader application to a spectrum of computational medicine research projects, in the context of the Sano Teaming for Excellence Centre [35].
Data accessibility
This article has no additional data.
Authors' contributions
M.B. led the research described in this paper and was involved in the design of the presented framework. K.C. was involved in the development of the reduced-order model, as described in the paper, using the presented IT tools and services. D.R.H. contributed the research models which are processed by the presented infrastructure, and was involved in authoring the corresponding sections of the manuscript. T.G., M.K., M.M., J.M. and P.N. were the principal developers of the presented framework, ensuring integration with HPC, data storage and security infrastructures, along with the execution of computational models. They also authored the corresponding parts of the manuscript. S.W. was responsible for defining and developing the data storage components of the presented framework.
Competing interests
We declare we have no competing interests.
Funding
This study was supported by the EU projects: Eur-Valve Personalised Decision Support for Heart Valve Disease, H2020 PHC-30-2015 689617, and PRIMAGE (PRedictive In-silico Multiscale Analytics to support cancer personalised diaGnosis and prognosis, empowered by imaging biomarkers), H2020 SC1-DTH-07-2018 826494, as well as by the PL-Grid Infrastructure (www.plgrid.pl).
Acknowledgements
The authors would like to mention specific partners of the EurValve consortium whose work contributed to this paper: Martijn Hoeijmakers from the Eindhoven University of Technology and Ansys France, who created the reduced-order model; Roel Meiburg from the Eindhoven University of Technology, who carried out the sensitivity and uncertainty quantification computations; Philips Research Laboratories, Hamburg, who provided the parametrized geometrical model; Gareth Archer, Sheffield Teaching Hospitals; Marcus Kelm, Deutsches Herzzentrum Berlin; Jo Zelis, Catharina Hospital Eindhoven; who carried out the clinical analysis.