A Hybrid Architecture (CO-CONNECT) to Facilitate Rapid Discovery and Access to Data Across the United Kingdom in Response to the COVID-19 Pandemic: Development Study.
Jefferson E., Cole C., Mumtaz S., Cox S., Giles TC., Adejumo S., Urwin E., Lea D., Macdonald C., Best J., Masood E., Milligan G., Johnston J., Horban S., Birced I., Hall C., Jackson AS., Collins C., Rising S., Dodsley C., Hampton J., Hadfield A., Santos R., Tarr S., Panagi V., Lavagna J., Jackson T., Chuter A., Beggs J., Martinez-Queipo M., Ward H., von Ziegenweidt J., Burns F., Martin J., Sebire N., Morris C., Bradley D., Baxter R., Ahonen-Bishopp A., Smith P., Shoemark A., Valdes AM., Ollivere B., Manisty C., Eyre D., Gallant S., Joy G., McAuley A., Connell D., Northstone K., Jeffery K., Di Angelantonio E., McMahon A., Walker M., Semple MG., Sims JM., Lawrence E., Davies B., Baillie JK., Tang M., Leeming G., Power L., Breeze T., Murray D., Orton C., Pierce I., Hall I., Ladhani S., Gillson N., Whitaker M., Shallcross L., Seymour D., Varma S., Reilly G., Morris A., Hopkins S., Sheikh A., Quinlan P.
BACKGROUND: COVID-19 data have been generated across the United Kingdom as a by-product of clinical care and public health provision, as well as numerous bespoke and repurposed research endeavors. Analysis of these data has underpinned the United Kingdom's response to the pandemic and informed public health policies and clinical guidelines. However, these data are held by different organizations, and this fragmented landscape has presented challenges for public health agencies and researchers as they struggle to find, access, and interrogate the data they need to inform the pandemic response at pace.
OBJECTIVE: We aimed to transform UK COVID-19 diagnostic data sets to be findable, accessible, interoperable, and reusable (FAIR).
METHODS: A federated infrastructure model (COVID - Curated and Open Analysis and Research Platform [CO-CONNECT]) was rapidly built to enable the automated and reproducible mapping of health data partners' pseudonymized data to the Observational Medical Outcomes Partnership Common Data Model without the need for any data to leave the data controllers' secure environments, and to support federated cohort discovery queries and meta-analysis.
RESULTS: A total of 56 data sets from 19 organizations are being connected to the federated network. The data include research cohorts and COVID-19 data collected through routine health care provision linked to longitudinal health care records and demographics. The infrastructure is live, supporting aggregate-level querying of data across the United Kingdom.
CONCLUSIONS: CO-CONNECT was developed by a multidisciplinary team. It enables rapid COVID-19 data discovery and instantaneous meta-analysis across data sources, and the team is researching streamlined data extraction for use in a Trusted Research Environment for research and public health analysis.
CO-CONNECT has the potential to make UK health data more interconnected and better able to answer national-level research questions while maintaining patient confidentiality and local governance procedures.
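To illustrate the federated cohort discovery idea described in the METHODS, the following is a minimal sketch, not the CO-CONNECT implementation. It assumes each data partner holds records mapped to a shared schema and answers a query with an aggregate count only, so no row-level data leave the partner. The class and function names (`Person`, `DataPartner`, `federated_count`), the simplified record layout, and the placeholder concept identifier are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Person:
    # Hypothetical, simplified stand-in for a record mapped to a common data model.
    person_id: int
    year_of_birth: int
    condition_concept_ids: FrozenSet[int]


class DataPartner:
    """A site that keeps its records private and exposes only aggregate counts."""

    def __init__(self, name: str, records: List[Person]) -> None:
        self.name = name
        self._records = records  # never returned to the caller

    def count_matching(self, predicate: Callable[[Person], bool]) -> int:
        # The query runs locally; only the resulting integer leaves the site.
        return sum(1 for person in self._records if predicate(person))


def federated_count(
    partners: List[DataPartner], predicate: Callable[[Person], bool]
) -> Dict[str, int]:
    """Send the same query to every partner and collect per-site counts."""
    return {partner.name: partner.count_matching(predicate) for partner in partners}


# Placeholder concept identifier (an assumption, not a real vocabulary code).
COVID_CONCEPT = 1

site_a = DataPartner("site_a", [
    Person(1, 1950, frozenset({COVID_CONCEPT})),
    Person(2, 1980, frozenset({COVID_CONCEPT})),
    Person(3, 1945, frozenset()),
])
site_b = DataPartner("site_b", [
    Person(1, 1960, frozenset({COVID_CONCEPT})),
    Person(2, 1990, frozenset({99})),
])


def born_before_1970_with_covid(person: Person) -> bool:
    return person.year_of_birth < 1970 and COVID_CONCEPT in person.condition_concept_ids


per_site = federated_count([site_a, site_b], born_before_1970_with_covid)
total = sum(per_site.values())
print(per_site, total)
```

A real deployment would add governance controls that this sketch omits, such as authentication and disclosure checks on small counts. The sketch shows only that the coordinating service needs per-site totals to answer a discovery query.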