In data management and analytics, we often focus on correcting the apparent inability or unwillingness of business leaders to gather and capitalize on data resources effectively. From that perspective, ethics can look like a side issue that is difficult to prioritize given the scale and persistence of our other challenges.
At least that was my perspective, and my initial response when confronted recently by a family member on this topic. Her view from outside the field was that ethics should be a primary concern. As I’ve reflected on this conversation, I’ve come around to her point.
In recent years we’ve seen many examples of data misuse due to ethical lapses. Here’s a post that gives five examples, including police officers looking up individuals’ data for reasons unrelated to any police business, an employee posting personal data, including SSNs, to a text-sharing site, and Uber’s “god view”, available at the corporate level, which an employee used in 2014 to track a journalist’s location.
Over the past year there’s been an upsurge of interest in establishing ethical principles for data management and analytics. This Wired article describes a February 2018 conference during which a group of data scientists began drafting a data science code of ethics (see the Global Principles below).
It may seem pretentious to think in terms of a Hippocratic Oath for data analysts. Care for a person’s data is arguably not as important as care for his or her health. However, it’s obvious that loss of a person’s data can result in harm, and there are many ways in which a data ethics lapse can harm entire populations. Consider election manipulation, or theft of millions of people’s credit histories associated with “failure to use well-known security best practices and a lack of internal controls and routine security reviews.”
The Hippocratic Oath establishes a set of values still broadly shared within the medical community:
It encourages practicing “in purity and according to divine law” and admonishes physicians not to “use the knife,” but rather, to “leave this to those who are trained in this craft.” It prohibits acts “of impropriety or corruption, including the seduction of women or men.” It advocates for patient privacy and concludes with the oath-taker’s need to strive for respect. (here)
While it also includes some more controversial admonitions, and not all medical schools adhere to its modern revisions, it still influences how we think about medical ethics today. Doctors don’t tend to talk about dropping the Oath, but rather replacing it with something more relevant.
What would a Hippocratic Oath for data look like? Despite the loose organization of our field, we seem to agree on certain baseline values of ethical data-related behavior that a declaration of ethical data principles should include:
- Protection of personal data
- Pre-approval before use of personal data
- Data accuracy
- Transparency of data collection and interpretation
- Correctness of algorithms
- Repeatability of analyses
- Non-deceptive presentation of results
A data ethics statement should be broad-brush yet still meaningful, and in particular should avoid specifics tied to a single sub-discipline or industry. It should permit no exceptions to following ethical practices. And it should steer clear of political stances not broadly shared in the data community.
There are a number of examples of codes of ethics for data management and analytics. Here are brief notes on three prominent ones:
The datapractices.org Manifesto is concise, to the point, and seems to cover the bases. To me, its one minor omission is that it isn’t specific about pre-approval of data use in its statement “Protect the privacy and security of individuals represented in our data.”
It’s somewhat confusing that datapractices.org includes both the Manifesto linked above and a separate Global Principles on Ethical Data Practices. The Principles are an annotated restatement of the FORTS (Fairness, Openness, Reliability, Trust, and Social Benefit) framework found at www.datafordemocracy.org. The FORTS framework is both more specific and less assertive than the Manifesto. To my reading, the prevalence of conditionals like “make best effort” and “where possible” weakens what is otherwise a very comprehensive statement of good data practices.
This well-established code of conduct, while not strictly data-focused, is comprehensive and detailed. Its meaningful headings and concise but detailed text make it an unambiguous statement of what “a computing professional should” do. For example, the “Respect Privacy” section includes paragraphs admonishing the computing professional to understand privacy, in all its dimensions, and the rights and responsibilities of those who collect personal data. It goes on to describe the computing professional’s role in both setting policies and processes for protecting data privacy and faithfully implementing the policies.
Of course, codes of ethics in themselves don’t change behavior. Last August Mike Loukides, Hilary Mason, and DJ Patil published a concise book, free of charge for Amazon Kindle, on how to put ethical principles of data science into action. They suggest specific steps for organizations, and society as a whole, to bake ethical practices into their culture and behavior. Recommendations include ethics and security training, application of ethics checklists for data and analytics projects, taking specific actions to build ethics into organizational culture, and, at the national level, implementing regulations to enforce ethical behaviors, like GDPR or ethics enforcement by the US Federal Trade Commission.
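To make the idea of an ethics checklist concrete, here is a minimal sketch in Python. It is my own illustration, not taken from the book: the checklist items simply restate the baseline values listed earlier in this post, and the names `ChecklistItem`, `default_checklist`, `unresolved`, and `ready_for_release` are hypothetical. It assumes a project should not proceed until every item has been reviewed and marked satisfied, in keeping with the "no exceptions" idea above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChecklistItem:
    """One ethics question a project team must answer (illustrative only)."""
    principle: str
    question: str
    satisfied: bool = False
    note: str = ""


def default_checklist() -> List[ChecklistItem]:
    # Items mirror the baseline values listed earlier in this post.
    # The wording of each question is an assumption, not an official standard.
    return [
        ChecklistItem("Protection of personal data",
                      "Is personal data secured in storage and in transit?"),
        ChecklistItem("Pre-approval before use of personal data",
                      "Have the individuals approved this use of their data?"),
        ChecklistItem("Data accuracy",
                      "Has the source data been validated?"),
        ChecklistItem("Transparency of data collection and interpretation",
                      "Are collection methods and assumptions documented?"),
        ChecklistItem("Correctness of algorithms",
                      "Have the algorithms been tested and reviewed?"),
        ChecklistItem("Repeatability of analyses",
                      "Can someone else rerun the analysis and get the same result?"),
        ChecklistItem("Non-deceptive presentation of results",
                      "Do charts and summaries fairly represent the findings?"),
    ]


def unresolved(items: List[ChecklistItem]) -> List[ChecklistItem]:
    """Return the items that have not yet been marked satisfied."""
    return [item for item in items if not item.satisfied]


def ready_for_release(items: List[ChecklistItem]) -> bool:
    """A project passes only when every item is satisfied: no exceptions."""
    return len(unresolved(items)) == 0
```

A team might call `default_checklist()` at project kickoff, set `satisfied = True` and add a `note` as each question is reviewed, and block release while `unresolved(...)` still returns anything. A real checklist would need questions tailored to the organization and a record of who signed off on each one.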
In recent TDWI articles, Barry Devlin makes the point that privacy and appropriate use of personal data will depend in large part on data professionals choosing ethical behavior (here and here). Because data management and analytics is an emerging discipline, it is understandable that a consensus statement of shared values has not yet emerged. However, given the risks, time is short. Every data professional should understand ethical challenges in this field and help promote ethical behavior based on shared values. In Dr. Devlin’s words,
The time has come for us, as data management professionals, to make difficult choices. The need now is to apply ethical judgement to the entire analytics process, from requirements through to design, development, ongoing use, and eventual decommissioning.