Home
CliniDeID

CliniDeID

Automatic clinical text de-identification

CliniDeID® automatically de-identifies clinical notes and structured data according to the HIPAA Safe Harbor method. It accurately finds identifiers and tags or replaces them with realistic surrogates for better anonymity. It improves access to richer, more detailed, and more accurate clinical data for clinical researchers. It eases research data sharing, and helps healthcare organizations protect patient data confidentiality.

Available as free open source software (GPL v3 license): download link.

Concerned about exposing patients to privacy breaches?
About HIPAA violations?
Trying to share clinical data for research?

Increase patient trust through the ethical protection of confidentiality.
Magnify the impact of clinical data for research by providing the ability to reuse data without patient informed consent.
Greatly reduce financial risks by avoiding fines and other penalties that could otherwise arise from a leak of patient data.
Save money by reducing the cost and increasing the efficiency of clinical data de-identification.
Expand and scale existing research opportunities by providing richer, more detailed clinical data.
Uncover and facilitate new research opportunities through shareable, de-identified clinical data as well as provide larger, NIH-sponsored research with required data sharing capabilities.
“Future-proof” clinical data for unforeseen research opportunities by allowing records to be accurately linked even after de-identification.

Features

Clinical text de-identification: Uses advanced artificial intelligence algorithms to accurately identify all mentions of identifiers (PII) in unstructured clinical text notes and replaces them with realistic surrogates (PII resynthesis) or tags, as desired. Does not rely on any known identifiers but can use known identifiers to double-check the PII identification if available. Generalizes well to all common types of clinical notes.

Structured data de-identification, integrated with unstructured text de-identification for consistent de-identification throughout the patient record (in CliniDeiD-Complete). Currently compatible with standard data models: OMOP CDM v5.3 and v6; HL7 FHIR coming soon)

Replacement of identified PII with realistic surrogates and consistently across the whole patient record (PII resynthesis), or with tags (generic or PII categories).

Highly accurate identification of PII (as demonstrated in several peer-reviewed evaluations and comparisons available at the bottom of this page)

Multiple input and output data formats: plain text, HL7 CDA , relational databases (PostgreSQL, Oracle , MySQL, MS SQL Server, DB2)

Get CliniDeID, free and open source (GPL 3)

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

CliniDeID

Automatic clinical text de-identification

Features

Resources

De-Identification of Clinical Text: Stakeholders’ Perspectives and Acceptance of Automatic De-Identification

Improving De-identification of Clinical Text with Contextualized Embeddings

Comparative Study of Various Approaches for Ensemble-based De-identification of Electronic Health Record Narratives

A Comparative Analysis of Speed and Accuracy for Three Off-the-Shelf De-Identification Tools

Voting Ensemble Pruning for De-identification of Electronic Health Records

Ensemble-based Methods to Improve De-identification of Electronic Health Record Narratives

Clinical Text Automatic De-Identification to Support Large Scale Data Reuse and Sharing: Pilot Results