Diffing iPython notebook code in Git

I use iPython notebooks a lot in my software development nowadays. They're a nice way to debug things without having to fire up pdb, and I often use them when I'm trying to debug and explore a new API.

Unfortunately, notebooks are really hard to diff in Git. I use Magit and git diffs pretty extensively when I change code, and I rely heavily on them to make sure I haven't introduced typos or bugs. iPython notebooks are just JSON blobs, though, so Git gives me a horrible, incoherent mess. Nowadays I basically commit them blindly without checking the code at all, which isn't ideal.

To solve this, I used a trick from a previous job: generate a readable version of the notebook and check the diff of that instead. Specifically, I wrote a script that extracts only the Python code from the iPython notebook (which is essentially a JSON file). Then, whenever I commit a change to the notebook, it:

  1. Automatically generates the Python-only version alongside the original notebook.
  2. Commits both files to the repository.
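A minimal sketch of the extraction step might look like the following. It assumes the standard nbformat 4 JSON layout (a top-level "cells" list whose entries have a "cell_type" and a "source" that is either a string or a list of lines). The function names and the convention of writing demo.ipynb's code to demo.py are hypothetical placeholders, not taken from the actual hook.

```python
import json
from pathlib import Path


def extract_code(nb: dict) -> str:
    """Concatenate the non-empty code cells of a parsed notebook into one Python source string."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # drop markdown and raw cells; only code is kept
        source = cell.get("source", "")
        # nbformat allows "source" to be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line so cell boundaries stay visible in diffs.
    return "\n\n".join(chunks) + "\n"


def write_python_version(nb_path: str) -> Path:
    """Write the extracted code next to the notebook (hypothetical naming: demo.ipynb -> demo.py)."""
    nb_file = Path(nb_path)
    nb = json.loads(nb_file.read_text(encoding="utf-8"))
    out_path = nb_file.with_suffix(".py")
    out_path.write_text(extract_code(nb), encoding="utf-8")
    return out_path
```

Dropping Markdown cells entirely keeps the generated file down to exactly the code you'd want to review; the trade-off is that a sibling .py with the same name as the notebook would be overwritten, so a real hook may well choose a different naming scheme.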

Git diffs the old and new Python versions cleanly, so I can easily see what code I've changed.

One of these is easier to read than the other!

To make sure it runs when I need it, I created a Git pre-commit hook. Git's built-in hooks are a little difficult to use directly, so instead I use the pre-commit package, which makes it easy to define and run Git hooks in whatever language you like.
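The pre-commit framework passes the staged files that match a hook's file pattern to the hook as command-line arguments, so the entry point can be a small function like the hypothetical one below. It regenerates the sibling .py for each notebook and returns 1 when anything changed, which makes pre-commit stop the commit so the regenerated file can be reviewed and staged. That fail-then-restage behavior is an assumption of this sketch, not necessarily how the clean-notebook hook handles committing both files.

```python
import json
from pathlib import Path


def extract_code(nb: dict) -> str:
    """Concatenate the non-empty code cells of a parsed notebook."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            src = "".join(src) if isinstance(src, list) else src
            if src.strip():
                chunks.append(src.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def main(argv: list[str]) -> int:
    """Regenerate the sibling .py for each notebook path pre-commit passes in.

    Returns 1 if any .py file was created or changed (so the commit stops and the
    regenerated file can be reviewed and staged), otherwise 0.
    """
    changed = False
    for name in argv:
        nb_path = Path(name)
        if nb_path.suffix != ".ipynb":
            continue  # ignore anything that isn't a notebook
        nb = json.loads(nb_path.read_text(encoding="utf-8"))
        new_text = extract_code(nb)
        py_path = nb_path.with_suffix(".py")
        old_text = py_path.read_text(encoding="utf-8") if py_path.exists() else None
        if new_text != old_text:
            py_path.write_text(new_text, encoding="utf-8")
            print(f"Regenerated {py_path}")
            changed = True
    return 1 if changed else 0
```

Run as a hook, the script would end with sys.exit(main(sys.argv[1:])) so that pre-commit sees the exit status. Returning 0 when the .py is already up to date means a second commit attempt, after staging, goes through.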

If you want to try it out, set up pre-commit (you likely already have it if you autoformat on commit), then add the hook to the repos list in your .pre-commit-config.yaml file:

    repos:
      - repo: https://github.com/moonglow-ai/pre-commit-hooks
        rev: v0.1.1
        hooks:
          - id: clean-notebook

If pre-commit isn't active in the repository yet, run pre-commit install once so that Git actually calls the hook on each commit.

If you want to edit or fork the hook, it's open source! You can do so here: https://github.com/moonglow-ai/pre-commit-hooks.

Some other approaches to solving this problem that I've seen include:

Stripping notebook outputs: The nbstripout package does this and also includes a Git hook. It's a good idea for general security and hygiene reasons, but what's left is still notebook JSON, so it doesn't give me the easy code diff-ability that I want.
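For comparison, here is a rough sketch of what output stripping does to a parsed notebook: code cells keep their source but lose their outputs and execution counts. This only illustrates the idea under the nbformat 4 layout; it is not nbstripout's implementation, which handles more cases and has its own options.

```python
import copy
import json


def strip_outputs(nb: dict) -> dict:
    """Return a copy of a parsed notebook with code-cell outputs and execution counts cleared."""
    clean = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


example = {
    "cells": [{
        "cell_type": "code",
        "execution_count": 7,
        "metadata": {},
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["42\n"]}],
        "source": ["print(6 * 7)\n"],
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# Still a JSON document after stripping, just a smaller one.
print(json.dumps(strip_outputs(example), indent=1))
```

The printed result is smaller and free of noisy outputs, but it is still JSON with the code split across quoted string lists, which is why the diff stays hard to read.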

Just using Python files with the %% format (aka percent syntax): This is a neat notebook format you can use in VSCode, where a # %% comment line marks the start of each cell, and many people I know use it as their primary way of running notebooks. It seems a little extreme to switch to an entirely new format altogether, though.
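To make the percent format concrete: a notebook becomes a plain .py file in which each "# %%" line starts a code cell and "# %% [markdown]" starts a Markdown cell whose text is commented out. The converter below is a simplified, hypothetical sketch of that mapping (it skips raw cells and ignores cell metadata); Jupytext's own converter handles many more details.

```python
def to_percent_script(nb: dict) -> str:
    """Render a parsed notebook as a percent-format script with "# %%" cell markers."""
    parts = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        src = ("".join(src) if isinstance(src, list) else src).rstrip("\n")
        kind = cell.get("cell_type")
        if kind == "code":
            # Code is copied verbatim under a plain "# %%" marker.
            parts.append(("# %%\n" + src) if src else "# %%")
        elif kind == "markdown":
            # Markdown lines are commented out so the file stays valid Python.
            lines = [("# " + line) if line else "#" for line in src.split("\n")]
            parts.append("# %% [markdown]\n" + "\n".join(lines))
        # Raw cells are skipped in this simplified sketch.
    return "\n\n".join(parts) + "\n"
```

Because the output is ordinary Python, it diffs as cleanly as the code-only extraction while still keeping the cell structure; the cost, as noted above, is adopting it as your working format.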

jupytext: A library that 'pairs' an iPython notebook with a Python file and keeps the two in sync. It's actually quite similar in implementation to this hook. However, it runs on the Jupyter server, so it doesn't work out of the box with the VSCode editor.


If you liked this, give Moonglow a try! It lets you start and stop GPU instances and integrates with VSCode so that you can connect iPython notebooks to them without leaving your editor. We also support connecting to your own AWS instances. You can try it out for free: we give you $5 of free GPU credit when you sign up.

Subscribe to Moonglow Blog: tech notes for Jupyter notebook users