Good Science, Good Code

16 April 2026

The Quiet Artifacts of Science

What the Single Responsibility Principle means for research code, plus the quiet artifacts of science that never get shared.

Hi,

I'm Caroline. I'm a doctor, epidemiologist, and senior software developer. I run a consulting company, teach on a health data science MSc, and I'm doing a PhD investigating how to create good-quality synthetic data. I've spent the last few years thinking about one question: why is it so hard to write good research code, and what can we do about it?

This newsletter is a monthly digest of what I'm writing, building, and reading on that topic. It's short with no filler.

What I wrote this month

The Single Responsibility Principle as Applied to Research Code

The Single Responsibility Principle says that a function should do one thing, and one thing only. It is a well-known idea in software engineering, but I have been thinking a lot about how it applies to research code. We have all seen code (or written it!) where a single huge function performs a whole series of steps. Functions like this can easily obscure the method you are applying, and they are harder to read and debug. This post looks at what SRP means in practice for researchers.
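As a rough illustration, here is a minimal Python sketch of the same small analysis written two ways: first as one function that does several jobs, then as single-purpose functions. The field names, function names, and records are hypothetical and chosen only for illustration; they are not taken from the post.

```python
from statistics import mean

# Hypothetical example records: age is stored as text, and one age is missing.
example_records = [
    {"age": "34", "bmi": 24.0},
    {"age": None, "bmi": 30.0},
    {"age": "16", "bmi": 19.0},
    {"age": "52", "bmi": 28.0},
]


# Before: one function drops missing values, converts types, filters, and summarises.
def analyse(records):
    adults = []
    for r in records:
        if r.get("age") is None:
            continue
        age = int(r["age"])
        if age >= 18:
            adults.append(dict(r, age=age))
    return round(mean(r["bmi"] for r in adults), 1)


# After: each function has one job and can be read and tested on its own.
def drop_missing_age(records):
    """Remove records with no recorded age."""
    return [r for r in records if r.get("age") is not None]


def parse_age(records):
    """Convert age from text to an integer."""
    return [dict(r, age=int(r["age"])) for r in records]


def select_adults(records, min_age=18):
    """Keep records for people aged min_age or over."""
    return [r for r in records if r["age"] >= min_age]


def mean_bmi(records):
    """Mean BMI, rounded to one decimal place."""
    return round(mean(r["bmi"] for r in records), 1)


def analyse_split(records):
    """The same analysis, composed from single-purpose steps."""
    return mean_bmi(select_adults(parse_age(drop_missing_age(records))))


print(analyse(example_records), analyse_split(example_records))  # prints 26.0 26.0
```

The split version is longer, but each step now has a name that describes the method, and a change such as a different age cut-off touches only one function.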

Synthetic Data: The Series

This month I published the index page for my synthetic data series. I have been writing about synthetic data for around a year, and the series now has enough posts that it needed a proper home. I have grouped the posts into sections (foundations, methods, applications, evaluation) and given each one a short description so you can jump to whatever interests you.

What I am building

I am starting to put together a free online course called "How to Document Your Research Code." It is aimed at any researcher who writes code, and it covers everything from writing a good README to documenting your environment. The course will be interactive, with exercises built around realistic research code rather than toy examples. You can follow along here. No courses are live yet, but they are in progress, so bookmark this page! I will also announce the courses as they come online, probably via this newsletter.

Something worth reading

Good Enough Practices in Scientific Computing by Wilson et al. (2017). An oldie but a goodie. If you have not read this paper, I think it is the single best starting point for thinking about computing practices in research. It covers data management, project organisation, collaboration, and version control, all aimed at researchers who write code. It is not specific to any particular domain and is worth reading no matter your area. I come back to it regularly and still find it useful. 


One thing on my mind

This month I travelled up to Glasgow to teach postgraduates and researchers how to create codelists for epidemiological research. A codelist is a carefully curated list of clinical codes that defines what counts as a particular condition, medication, or procedure when you are working with health data. If you want to study diabetes, someone has to decide exactly which codes mean "diabetes." That is what a codelist does.
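To make the idea concrete, here is a minimal Python sketch, assuming patient records that each carry a list of ICD-10 codes. The codelist, function name, and records are hypothetical illustrations, not a validated codelist or a real dataset; real codelists can use other coding systems such as SNOMED CT and involve far more clinical judgement.

```python
# Illustrative codelist only, NOT a validated research codelist:
# ICD-10 categories E10 (type 1 diabetes mellitus) and E11 (type 2 diabetes mellitus).
DIABETES_CODELIST = {"E10", "E11"}

# Hypothetical patient records: each patient has a list of recorded ICD-10 codes.
patients = {
    "p1": ["E11.9", "I10"],
    "p2": ["J45.9"],
    "p3": ["E10.9"],
}


def has_condition(patient_codes, codelist):
    """True if any recorded code falls under a category in the codelist."""
    prefixes = tuple(codelist)  # str.startswith accepts a tuple of prefixes
    return any(code.startswith(prefixes) for code in patient_codes)


# The codelist, not the analysis code, decides who counts as having diabetes.
cohort = sorted(
    pid for pid, codes in patients.items() if has_condition(codes, DIABETES_CODELIST)
)
print(cohort)  # ['p1', 'p3']
```

Keeping the codelist as data, separate from the code that applies it, also makes it easier to save and share alongside the study.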

Creating a good one can take many hours of careful thought and clinical expertise. But codelists are often thrown away at the end of a study or hidden in a private project folder, which makes it very difficult for anyone else to reproduce the work or build on it.

It got me thinking about the quiet artifacts that different areas of science produce. I am sure that every field has them: the recalibration protocols for a specific analyser in a lab, the hand-labelled training data in a machine learning project, or the survey questions refined through rounds of piloting in social science. These represent a huge amount of intellectual effort, but when it comes to keeping and sharing them, they rarely get the same attention as the code or the paper.

Things are improving. In my field, platforms like OpenCodelists have made a real difference. But there is still a long way to go, and I suspect every discipline has its own version of this problem. I have been wondering what other domains do with these quiet artifacts. If you have thoughts on this, get in touch by commenting or contacting me here.

www.carolinemorton.co.uk