the data subject and even without any direct human involvement. The section of this report that
addresses other laws pursues this issue in the discussion of the Fair Credit Reporting Act.
B. Defining Big Data
No serious person doubts the value of data in human endeavors, and especially in science, health
care, public policy, education, business, history, and many other activities.[63] When data
becomes big data is not entirely clear.[64] A common description is “[b]ig data is a term for data
sets that are so large or complex that traditional data processing application software is
inadequate to deal with them.”[65] Some definitions focus on the “high‐volume, high‐velocity and
high‐variety information assets that demand cost‐effective, innovative forms of information
processing for enhanced insight and decision making.”[66] Another definition adds veracity and
value as fourth and fifth “v’s,” along with volume, velocity, and variety.[67] Perhaps vague is
another word applicable to big data. Two scholars call big data “the buzzword of the decade.”[68]
There is no reason here to attempt a definition. However, it is noteworthy that a common feature
of the definitions is a lack of any formal, objective, and clear distinction between data and big
data. A statutory definition that draws a bright line is absent.[69] That presents a challenge for
any regulation, a challenge similar to the problem of defining health information outside the HIPAA
context. What cannot be defined cannot be regulated.
A good example of big data outside HIPAA regulation comes from the Precision Medicine
Initiative, now known as the All of Us Research Initiative.[70] This program involves the building
of a national research cohort of one million or more U.S. participants. The protocol for the
initiative describes the plan for the dataset the program will maintain.
Ideally, in time, the core dataset will include participant provided information
(PPI), physical measurements, baseline biospecimen assays, and baseline health
[63] The issues raised here are international in scope. See, e.g., Rosemary Wyber, Samuel Vaillancourt, William
Perry, Priya Mannava, Temitope Folaranmi & Leo Anthony Celi, Big data in global health: improving health in low-
and middle-income countries, 93 Bulletin of the World Health Organization 203 (2015),
http://www.who.int/bulletin/volumes/93/3/14-139022/en/. This short but useful article summarizes many of the
issues identified here.
[64] See, e.g., MIT Technology Review, The Big Data Conundrum: How to Define It? (Oct. 2013),
https://www.technologyreview.com/s/519851/the-big-data-conundrum-how-to-define-it/.
[65] Big Data entry (Oct. 9, 2017), Wikipedia, https://en.wikipedia.org/wiki/Big_data.
[66] See, e.g., Gartner, Inc., IT Glossary, cited in President’s Council of Advisors on Science and Technology, Big
Data and Privacy: A Technological Perspective (Obama 2014),
https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf.
[67] See, e.g., Bernard Marr, Why only one of the 5 Vs of big data really matters (IBM Big Data & Analytics Hub
2015), http://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters.
[68] Solon Barocas and Andrew D. Selbst, Big Data’s Disparate Impact, 104 Calif. L. Rev. 671 (2016),
http://www.californialawreview.org/wp-content/uploads/2016/06/2Barocas-Selbst.pdf.
[69] None of the eight bills introduced through October 9, 2017, that contain the words “big data” offers a definition.
[70] See https://www.nih.gov/AllofUs-research-program/pmi-cohort-program-announces-new-name-all-us-research-program.
See also testimony of Stephanie Devaney, All of Us Research Program, National Institutes of Health,
National Committee on Vital and Health Statistics Full Committee (Sep. 13, 2017),
https://www.ncvhs.hhs.gov/transcripts-minutes/transcript-of-the-september-13-2017-ncvhs-full-committee-meeting/.