Pragmatic and predictable notebooks using Python environments

While working through the second half of the fastai course, I kept running into problems running the notebooks. I am relatively new to Python package management, so some of the more cryptic error messages left me baffled about how to resolve them.

An example

When trying to run notebook #9 from part 2 (earlier notebooks may have failed too), I got errors about a NoneType object not being iterable when getting data back from the dataloader. To make sure I had the correct library versions installed, I went to the terminal and tried a pip3 install of the PyTorch libraries, which produced this error:

× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
    xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a Python library that isn't in Homebrew,
    use a virtual environment:
    
    python3 -m venv path/to/venv
    source path/to/venv/bin/activate
    python3 -m pip install xyz

First, I realized after much searching that the error message above is a rather passive way ("If you wish to install a Python library...") of telling you to set up a separate Python environment in which to install the specific libraries you need.
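A quick way to tell whether a given interpreter is already running inside a virtual environment is to compare sys.prefix with sys.base_prefix. This is only a minimal sketch; the helper name in_virtualenv is my own and not something pip or Homebrew provides.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the Python it was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; pip may refuse to install here.")
```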

Second, I had installed newer dependencies than the fastai course expects: torch needs to be version 2.0.1 and numpy needs to be below 2. I therefore needed those older versions installed to run the fast.ai code. For example, my machine has Python 3.12 installed, which is incompatible with torch 2.0.1.
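The interpreter constraint can be checked before installing anything. The sketch below assumes torch 2.0.1 supports Python 3.8 up to (but not including) 3.12. The upper bound comes from the incompatibility described above, while the lower bound and the function name are assumptions for illustration, so adjust them if your setup differs.

```python
import sys

# Assumed supported range for torch 2.0.1 (inclusive lower, exclusive upper).
MIN_PYTHON = (3, 8)
MAX_PYTHON_EXCLUSIVE = (3, 12)


def python_is_compatible(version=None) -> bool:
    """Check a (major, minor) version against the assumed supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_PYTHON <= tuple(version[:2]) < MAX_PYTHON_EXCLUSIVE


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    verdict = "OK" if python_is_compatible() else "outside the assumed range for torch 2.0.1"
    print(f"Python {major}.{minor}: {verdict}")
```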

So I was wrestling with how to make those specific versions available to the Jupyter runtime. Python virtual environments solve this problem. The premise is that you manage all of the library versions for one environment independently, without destabilizing the Python libraries that other projects depend on. You can think of an environment as a container: the versions you install are only visible inside it.
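To make the container idea concrete, here is a minimal sketch using Python's standard venv module, the same machinery behind python3 -m venv. The helper name create_demo_env and the demo-env directory are made up for illustration.

```python
import tempfile
import venv
from pathlib import Path


def create_demo_env(parent: str) -> Path:
    """Create a bare virtual environment under `parent` and return its path."""
    env_dir = Path(parent) / "demo-env"
    # with_pip=False keeps the demo fast; a real environment would want pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_demo_env(tmp)
        # pyvenv.cfg marks the directory as a virtual environment.
        print((env_dir / "pyvenv.cfg").read_text())
```

The pyvenv.cfg file it prints records which base Python the environment was created from and whether system-wide packages are visible inside it.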

I had never worked with Python environments before, so I had to learn how to use them. They turn out to be pretty easy to set up (step 2 below is an example). Using a Python environment, I was able to isolate all of the package versions and run the notebooks without further problems. The steps below are what worked for me.

Figure 1: How a Python environment lets you target specific library versions

Solution

Steps to get a local notebook running (on a MacBook) for fastai/course22p2 from the GitHub repo:

  1. Get Python 3.10
    • brew install python@3.10
  2. Create a virtual environment based on Python 3.10
    • In the terminal window run
      • python3.10 -m venv ~/.fastai2
      • source ~/.fastai2/bin/activate
    • Note: this creates a virtual environment in which you can install prerequisites that may be incompatible with the default Python version installed on your machine. For example, PyTorch 2.0.1 is incompatible with Python 3.12.
  3. Install the prerequisites (in the terminal, with the Python environment active)
    • pip3 install -Uqq git+https://github.com/fastai/course22p2
      • This installs all of the miniai components created in earlier notebooks.
    • pip3 install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 
    • pip3 install matplotlib
    • pip3 install jupyter
    • pip3 install numpy==1.23
  4. Run Jupyter (in the terminal, with the Python environment active)
  5. Launch Colab in the browser
  6. Run the cells in the notebook, voila!
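
If the notebooks still misbehave, it can help to confirm that the active environment really contains the pinned versions from step 3. The sketch below uses the standard importlib.metadata module; the check_pins helper and its prefix-matching rule are my own assumptions, and the rule is more lenient than pip's == operator.

```python
from importlib import metadata

# Pins taken from the install steps above.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "numpy": "1.23",
}


def check_pins(expected, get_version=metadata.version):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, pin in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # Drop any local build tag such as "+cpu" before comparing.
        public = installed.split("+")[0]
        # Match on whole components: a pin of "1.23" accepts "1.23.5" but not "1.230".
        matches = public == pin or public.startswith(pin + ".")
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

Run it with the environment activated, so that it inspects the environment's packages and not the system-wide ones.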

I hope this saves someone the time of figuring out these dependencies in the future.