8 Ethics
We often work with sensitive health data from vulnerable populations — people whose information and trust we can’t take lightly. We have to produce good science and protect the people whose data we use. Those aren’t separate obligations.
8.1 Research Ethics and Responsible Conduct of Research
Responsible conduct of research (RCR) covers how we design studies, handle data, run analyses, and report findings. The short version: don’t cut corners, and document what you do. In computational epidemiology, this means:
- Design and planning: Being thoughtful about research questions and methods upfront. Don’t retrofit a hypothesis to fit your results.
- Data stewardship: Handling data carefully — research participants and institutions gave us access, and we owe them rigor. We document our analyses, maintain data integrity, and report findings honestly — including null results and limitations.
- Authorship and attribution: Being clear about contributions and honest about who did what. Co-authors take responsibility for the work we publish together.
- Reproducibility: Writing code and documentation that allows others (and our future selves) to understand and verify our work. If somebody wants to check our work, they should be able to. This builds trust in our science.
Fabrication, falsification, and plagiarism can be career-ending.1 They also make it harder for everyone else to get their work taken seriously.
8.2 Human Subjects and IRB
Many projects in this lab involve human subjects or their data. We have a legal and ethical obligation to protect participants and minimize risks.
Here’s what this means for you:
- If you’re designing a study involving human subjects, surveys, interviews, or human-derived data, we need IRB approval before you begin. This is non-negotiable.
- We submit protocols through Stanford’s eProtocol system. I’ll help you with this, but it requires thinking carefully about: Who are your participants? What are the risks? How will you protect confidentiality? What informed consent (if any) do they receive?
- Some projects may qualify for expedited review, exemption, or a human subjects determination if they involve minimal risk or existing data without identifiers. But that’s for the IRB to decide, not us.
- You should always use the data with the lowest risk surface for answering your question. If ecological or public-use deidentified data will answer the question, use that. If you need individual-level data but it can be deidentified, use that. And so on. We only use RIF or identifiable data in the rare circumstances when it is necessary to answer our research question.
Even if you think a project doesn’t need IRB oversight, check with me or the IRB.
8.3 Data Privacy and Security
We handle sensitive health data. This is a privilege and a responsibility. We often access these data through the Center for Population Health Sciences or the US Census Bureau. When doing so, you must adhere to the rules of the data provider, without exception.
More generally, here’s how we approach data privacy:
HIPAA and Protected Health Information (PHI):
- If we’re working with data from healthcare systems, patient records, or anything derived from patients, it likely contains PHI. HIPAA rules apply.
- We only access and use PHI for purposes explicitly approved in our IRB protocol. No side analyses, no exploratory work with restricted data unless approved.
Restricted Data Enclaves:
- Some of our data lives in restricted data enclaves — secure computing environments where you can access sensitive data without downloading it to your laptop. This can be PHS’s secure data environment or the US Census Bureau’s FSRDC. Always adhere to the rules of the data and computational environment.
General data security practices:
- Use strong passwords. Use multi-factor authentication on accounts with sensitive data.
- Keep full-disk encryption enabled on your laptop at all times.
- Don’t email health data or identifiable information.
- Document what data you have, where it lives, who can access it, and when you’ll delete it. Data retention timelines matter legally and ethically.
- De-identify or aggregate data whenever possible for analyses and papers.
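In practice, de-identifying before analysis often means dropping direct identifiers, replacing IDs with one-way pseudonyms, and coarsening quasi-identifiers like exact age. Here is a minimal, illustrative sketch in Python — the field names, salt, and banding scheme are hypothetical, and nothing here substitutes for your IRB protocol or the data provider's rules:

```python
# Toy de-identification sketch. Field names ("mrn", "dx", etc.) and the
# salting scheme are hypothetical; follow your IRB protocol and the data
# provider's requirements for real data.
import hashlib

# Direct identifiers to drop entirely (illustrative list, not HIPAA-complete).
DIRECT_IDENTIFIERS = {"name", "mrn", "address", "phone"}

def deidentify(record: dict, salt: str) -> dict:
    """Drop direct identifiers, pseudonymize the ID, and coarsen age."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # One-way pseudonym: stable within a project (same salt), not reversible.
    clean["pid"] = hashlib.sha256((salt + record["mrn"]).encode()).hexdigest()[:12]
    # Coarsen a quasi-identifier: exact age -> 5-year band.
    lo = (record["age"] // 5) * 5
    clean["age_band"] = f"{lo}-{lo + 4}"
    del clean["age"]
    return clean

row = {"name": "A. Smith", "mrn": "12345", "address": "…", "phone": "…",
       "age": 63, "dx": "I10"}
out = deidentify(row, salt="project-specific-secret")
```

The point is not this particular recipe — it's that the transformation is explicit, documented in code, and applied before data leave the secure environment.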
Data breaches happen. If you suspect you’ve exposed sensitive data — even accidentally — tell me immediately. Don’t panic or hide it. We’ll figure out what happened and what to do next.
8.4 Conflicts of Interest
We all have lives outside the lab. A conflict arises when those outside interests could bias our research or create divided loyalties. Stanford takes this seriously.
- Financial conflicts: If you receive funding, consulting fees, stock, or other payments from companies related to your research, disclose it. This includes work outside Stanford.
- Personal relationships: Close relationships with collaborators can create dynamics that affect research judgment. Be transparent about these.
- Other commitments: If you’re doing significant outside work, we should talk about how it fits with your lab responsibilities.
You’ll fill out a conflict of interest form when needed. This isn’t about accusing you of wrongdoing — it’s about transparency so we can sort out any conflicts before they become problems. Many conflicts can be managed; the problem is hiding them.
Always err on the side of overdisclosure.
8.5 AI and LLM Use in Research
AI use in research is changing fast. This policy will adapt as we learn more, but I will always expect three things: you are responsible for your work, you disclose your AI use, and you assume anything you put into an AI tool will be made public.
Large language models and AI tools are increasingly part of how we work and do research. They can help with code, writing, and analysis — but they hallucinate, produce subtle errors in code, and make it harder to reproduce your work if you’re not careful.
Further, in computational epidemiology, writing is thinking and coding is researching. Those activities are difficult but necessary components of the research process. Outsourcing them to AI will generally lead to poorer research products: writing your arguments down forces you to strengthen them, and working through coding errors yourself teaches you your data.
When it comes to AI in science generally, here’s our approach:
- Disclose use: If you use ChatGPT, Claude, GitHub Copilot, or other AI tools in your research (including methods, code, writing, or figures), be transparent about it. Put it in your methods section, your code comments, or your acknowledgments. Reviewers and readers need to know.
- Verify outputs: AI tools make mistakes and make things up. Never use AI-generated text, code, or analysis without checking it yourself. You’re responsible for what’s in your papers and code.
- Responsible use: Don’t paste sensitive data into public AI tools. Don’t use AI to write methods you didn’t actually use, or results you didn’t actually get. Don’t let AI bypass your own critical thinking.
- Check our policies: Stanford, journals, and funding agencies are updating their policies on AI in research regularly. When you’re writing a paper or grant, check current guidance and be consistent with funding agency requirements. The vast majority of funding agencies and journals do not allow AI use in peer review or critiques.
The principle is simple: you stay accountable for your research. AI is a tool, not an author or excuse. Always follow the institutional policies regarding AI use.
If you do decide to use AI for research, you must be transparent. Your collaborators should know before you run anything through an AI. If you are reviewing somebody else’s project, manuscript, or grant, confirm with them first that they are comfortable with it being run through an AI. For example, grant applications may contain sensitive material the author did not want shared with an AI tool, and some people do not want early-stage projects sent through an AI for fear of getting scooped. We respect each other’s preferences.
8.6 Questions and Concerns
Ethics questions don’t have perfect answers. If you’re unsure about something, talk to me. It’s always better to have the conversation than to second-guess yourself later.
Or at least they should be. If you’re ever unsure about research practices, ask me.