Combining Optical Character Recognition with Paper ECG Digitization

Objective: We propose a vendor-agnostic, MATLAB-based tool that converts electrocardiography (ECG) waveforms from paper ECG records into digitized ECG signals. The tool is packaged as an open-source, standalone application with a graphical user interface (GUI).

Methods and procedures: To this end, we (1) preprocess the ECG records, including skew correction, background grid removal, and linear filtering; (2) segment the ECG signals using Connected Components Analysis (CCA); and (3) apply Optical Character Recognition (OCR) both to remove ECG lead characters that overlap the waveforms and to link patients' demographic information with their research records or electronic medical records (EMRs). The digitization results are validated through a reader study that compares clinically salient features, such as QRST complex intervals, between the paper ECG records and the digitized ECG records.

Results: Comparing clinically important features between the paper ECG records and the digitized ECG signals yields intra- and inter-observer correlations of 0.86-0.99 and 0.79-0.94, respectively. The kappa statistic averaged 0.86 for intra-observer agreement and 0.72 for inter-observer agreement.

Conclusion: Clinically salient features of the ECG waveforms, such as the QRST complex intervals, are preserved during digitization.

Clinical impact: This open-source tool can be used to digitize paper ECG records, enabling new prediction algorithms that risk-stratify individuals for cardiovascular disease and/or supporting ECG-based cardiovascular diagnoses made by automated digital algorithms.
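To make the grid-removal and CCA segmentation steps more concrete, here is a minimal Python sketch. The authors' tool is written in MATLAB, and this abstract does not give its exact procedure, so everything below is an assumption rather than their method. The sketch thresholds a grayscale scan so that dark trace pixels survive while lighter grid pixels are dropped (a crude stand-in for grid removal). It then labels 8-connected components and keeps the largest one as the lead trace, on the hypothetical premise that smaller blobs are lead characters or specks. Finally, it converts that trace into one amplitude per pixel column. The function names (`binarize_trace`, `connected_components`, `trace_to_signal`), the threshold of 100, and the toy image are all hypothetical.

```python
from collections import deque

# Hypothetical toy scan: grayscale rows (0 = black, 255 = white).
# Values near 150 mimic light grid lines; values near 20-30 mimic the dark ECG trace.
EXAMPLE_IMAGE = [
    [200, 150, 200, 150, 200],
    [20, 150, 200, 150, 20],
    [200, 20, 200, 20, 200],
    [150, 150, 20, 150, 150],
    [200, 150, 200, 150, 30],
]


def binarize_trace(image, threshold=100):
    """Keep pixels darker than `threshold` (assumed trace); drop lighter grid/background."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def connected_components(mask):
    """Return the 8-connected foreground regions of `mask` as lists of (row, col)."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    components = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def trace_to_signal(pixels, n_rows, n_cols):
    """Average the trace's row index in each column and flip it so 'up' is positive.

    Columns without trace pixels yield None. Units are pixels, not millivolts.
    """
    rows_by_col = {}
    for y, x in pixels:
        rows_by_col.setdefault(x, []).append(y)
    signal = []
    for x in range(n_cols):
        ys = rows_by_col.get(x)
        signal.append(None if ys is None else (n_rows - 1) - sum(ys) / len(ys))
    return signal


if __name__ == "__main__":
    mask = binarize_trace(EXAMPLE_IMAGE)
    components = connected_components(mask)
    # Assumption: the largest component is the lead trace; smaller blobs
    # (e.g., lead labels or specks) are discarded.
    trace = max(components, key=len)
    print(trace_to_signal(trace, len(EXAMPLE_IMAGE), len(EXAMPLE_IMAGE[0])))
    # prints [3.0, 2.0, 1.0, 2.0, 3.0]
```

On the toy image, the isolated dark pixel in the bottom-right corner forms its own one-pixel component and is discarded. Converting the pixel amplitudes to millivolts and seconds would additionally require calibrating against the grid spacing, which this sketch omits.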

