
What is Protected Health Information (PHI)?

The acronym: PHI stands for Protected Health Information. It does not stand for personal health information (although that's in essence what it implies) or personally identifiable health information (I've seen it used, although that would technically be PIHI), and I'm sure you've heard other variants as well.

The definition: Here's the Wikipedia definition. Protected health information (PHI) is any information about health status, provision of health care, or payment for health care that can be linked to a specific individual. HHS provides an even simpler definition of PHI - individually identifiable health information transmitted or maintained in any form or medium by a Covered Entity or its Business Associate; the definition of a "business associate" was extended by the HIPAA Omnibus rule that went into effect in 2013. The term "information" is interpreted rather broadly and includes any part of a patient's medical record or payment history.

The key here is the phrase "that can be linked to a specific individual". This is where the other acronym, PII (Personally Identifiable Information), becomes relevant. The major difference between PHI and PII is that PII is a legal definition - i.e. PII is anything that could be used to uniquely identify an individual. PHI is a subset of PII in that a medical record could be used to identify a person - especially if the disease or condition is rare enough.


The core of the HIPAA regulations is to ensure that ownership of any and all medical data is retained solely by the individual. The individual can then decide to parcel out access to others - providers, family members, employers - out of necessity or simply by preference. Only an individual has the right to grant access to their medical data. This was mainly done for the following reasons:

  1. Privacy: Obviously we would prefer that our neighbor (or in some cases, family members) not know about whatever condition we might be suffering from or medication we are taking.

  2. Bias and discrimination: AIDS, mental health, and other conditions have some (albeit declining) social stigma associated with them. The HIPAA PHI provisions ensure that employers and others do not have access to one's medical record and cannot use the information it contains to discriminate against the individual based on their health information.


Obviously protection and privacy come into play once the individual can be, or has been, uniquely identified. There are, after all, 25.8 million Americans who have diabetes. This leads to the question of what data can be used to uniquely identify an individual. The generally accepted set of individually unique data elements includes the following:






Names

Well, of course, i.e. first name, last name, maiden name combinations. One could argue that any one of the above alone doesn't uniquely identify an individual; after all, "James" is a pretty common name. But it could be possible to identify an individual using a combination of data, e.g. first name, zip code, street address, etc.


Geographic locators

Everything (street address, city, precinct, zip code, lat-long coordinates, etc.) is considered PII. The first three digits of the zip code are usually considered ok for use, except for three-digit prefixes that cover a small population (20,000 people or fewer). There are currently 17 three-digit prefixes that fit that profile - 036, 692, 878, 059, 790, 879, 063, 821, 884, 102, 823, 890, 203, 830, 893, 556, 831. So three-digit zip codes are ok to use except for the ones listed above.
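As a rough illustration of the three-digit rule above, here is a minimal Python sketch. The helper name generalize_zip is hypothetical, and the "000" placeholder used for the small-population prefixes is an assumption for this example, not something this post specifies.

```python
# The 17 small-population three-digit prefixes listed in the text above.
RESTRICTED_ZIP3 = {
    "036", "692", "878", "059", "790", "879", "063", "821", "884",
    "102", "823", "890", "203", "830", "893", "556", "831",
}


def generalize_zip(zip_code: str) -> str:
    """Reduce a US zip code to its first three digits, masking restricted prefixes."""
    # Keep digits only, so ZIP+4 values such as "53703-1234" also work.
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        raise ValueError(f"not a 5-digit zip code: {zip_code!r}")
    prefix = digits[:3]
    # Assumption: "000" stands in for prefixes covering a small population.
    return "000" if prefix in RESTRICTED_ZIP3 else prefix
```

Calling generalize_zip on a zip code whose prefix is not in the restricted set returns just the first three digits; any of the 17 prefixes above collapses to the placeholder.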



Dates

All dates pertaining to significant events in an individual's life - birth, death, marriage, admission, discharge, etc. Just the year is generally considered fine for use, except in the case of the very elderly (>89 years of age), who would instead be represented by an aggregate category, e.g. 90 or older.
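A small Python sketch of the year-only rule might look like the following. The function names age_on and generalize_birth_date and the "90 or older" label are illustrative assumptions, not part of any official tooling.

```python
from datetime import date


def age_on(birth: date, as_of: date) -> int:
    """Age in whole years on the as_of date."""
    years = as_of.year - birth.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years


def generalize_birth_date(birth: date, as_of: date) -> str:
    """Return only the birth year, or an aggregate label for people older than 89."""
    if birth > as_of:
        raise ValueError("birth date is after the reference date")
    if age_on(birth, as_of) > 89:
        # The year itself would reveal the age, so it is aggregated as well.
        return "90 or older"
    return str(birth.year)
```

The same year-only treatment would apply to admission, discharge, and other event dates; only the birth date needs the extra age check, since the year alone can give away that someone is over 89.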


Phone numbers

Well, of course.


Fax numbers

This is, IMHO, a carryover from the old days. Do you know a lot of people with a personal fax number? But, it does make sense.


Electronic mail addresses (email)



Social Security numbers



Medical record numbers

This is usually a "random" number, but it could be used to identify someone if one also knew the institution that assigned it.


Health plan beneficiary numbers

This is your insurance card/member ID.


Account numbers

Bank account numbers, etc.


Certificate/license numbers

Driver's license, birth certificate number, etc.


Vehicle identifiers and serial numbers, including license plate numbers

If it's good enough for the police to track someone down...


Device identifiers and serial numbers

Medical devices have unique serial numbers. Your computer's MAC address is unique as well.


Web Universal Resource Locators (URLs)

This is a bit murky but is in here to cover all possibilities. A URL on its own often isn't very unique. But if logged within a specific application, it could indeed be very unique to an individual.


Internet Protocol (IP) address numbers

Your IP address can be used to easily look up your approximate location. There are several free services that offer this (do a quick Google search for "address from IP").


Biometric identifiers, including finger and voiceprints

Don't forget retinal images.


Full face photographic images and any comparable images



Any other unique identifying number, characteristic, or code

Recode - note this does not mean the unique code assigned by the system to code the data.

These 18 elements are the core set of data elements that individually or in combination can be used to uniquely identify an individual. And, considering the fact that the above list of identifiers has fax numbers and not Twitter @usernames, Facebook IDs, or a host of other modern, more common identifiers, it's clear that the PII list is not the most up to date and some more thought should go into recognizing and protecting identifiable information. However, since patient data is valuable in clinical trials, medical case studies, etc., the above list is used as a guideline to ensure privacy. This leads to...

Anonymization & De-identification

Anonymization is a process by which PHI elements are removed or changed with the purpose of minimizing/removing the possibility of going back to the original data set. This involves removing all identifying data to create unlinkable data.

De-identification under HIPAA occurs when data has been stripped of common identifiers by one of two methods:

  1. Remove the 18 elements listed above;

  2. If another approach is used, ensure a statistically small/negligible risk of re-identification which is validated by a statistics expert (you have to love the interpretability of that rule).
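To make the first method concrete, here is a minimal Python sketch that drops identifier fields from a structured record. The field names in IDENTIFIER_FIELDS and the helper strip_identifiers are hypothetical; a real schema would need its own mapping onto the 18 elements, and free-text fields would need separate handling.

```python
from typing import Any, Dict

# Hypothetical field names, loosely mapped to the identifier categories above.
IDENTIFIER_FIELDS = frozenset({
    "first_name", "last_name", "maiden_name",
    "street_address", "city", "zip_code",
    "birth_date", "admission_date", "discharge_date",
    "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_serial",
    "url", "ip_address", "fingerprint", "photo", "other_unique_id",
})


def strip_identifiers(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without any of the listed identifier fields."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFIER_FIELDS}


# Made-up example record, for illustration only.
example = {"first_name": "James", "ssn": "000-00-0000", "diagnosis": "diabetes"}
cleaned = strip_identifiers(example)  # only the "diagnosis" field remains
```

This sketch simply removes fields. Elements that may be kept in a generalized form, such as the year of a date or the first three digits of a zip code, would be transformed rather than dropped in a fuller implementation.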

Designing HIPAA compliant solutions

Given the above constraints, it is essential that any application you design takes these anonymization and/or de-identification requirements into account before any data is shared with any external entity. PHI data can be "shared" with an external entity provided you have a BAA (business associate agreement) in place and the individual has signed the appropriate consent documents. How PHI needs to be managed under HIPAA is a whole topic in itself. Look for a blog post from us shortly describing how we enable HIPAA compliance and simplify adherence to the letter and spirit of the law at Datica.