Anaconda is a free Python distribution from Continuum Analytics (now Anaconda, Inc.) that includes conda, a package and environment manager. It has gained traction because it makes it easy to package and replicate individual modules or entire Python environments on different systems. The distribution includes a set of core Python packages, and additional packages can be installed from remote “channels”. Using Anaconda on the SCC requires some setup beyond loading the module; please follow the instructions below to configure it.

Configuring Anaconda for the SCC

Anaconda allows you to create isolated programming environments. Each environment contains its own copy of every Python package your code needs, so these environments can take up a considerable amount of disk space and should be stored in your project space rather than in your home directory. To do this, create the file ~/.condarc in your home directory and add the following, replacing your_project and your_loginname appropriately:

envs_dirs:
    - /projectnb/your_project/your_loginname/.conda/envs
    - ~/.conda/envs
pkgs_dirs:
    - /projectnb/your_project/your_loginname/.conda/pkgs
    - ~/.conda/pkgs
env_prompt: ({name})

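Also, replace /projectnb with /restricted/projectnb if that is where your project has disk space available. conda creates new environments and stores downloaded packages in the first writable directory of each list, which is why the project-space entries come first.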
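
If you prefer to generate the file rather than type it, the following minimal sketch writes the same ~/.condarc. write_condarc is a hypothetical helper, not an SCC or conda tool; it refuses to overwrite an existing ~/.condarc, and its optional third argument covers the /restricted/projectnb case described above.

```shell
# write_condarc: hypothetical helper (not part of conda or the SCC modules).
# Writes the ~/.condarc shown above for the given project and login name.
# Usage: write_condarc PROJECT LOGIN [PROJECT_ROOT]
#   PROJECT_ROOT defaults to /projectnb; pass /restricted/projectnb if needed.
# The body runs in a subshell ( ... ) so its variables do not leak into your shell.
write_condarc() (
    project="$1"
    login="$2"
    root="${3:-/projectnb}"
    if [ -z "$project" ] || [ -z "$login" ]; then
        echo "usage: write_condarc PROJECT LOGIN [PROJECT_ROOT]" >&2
        exit 1
    fi
    # Never clobber an existing configuration file.
    if [ -e "$HOME/.condarc" ]; then
        echo "$HOME/.condarc already exists; edit it by hand instead" >&2
        exit 1
    fi
    # Unquoted here-document: $root, $project and $login are expanded,
    # while ~ and {name} are written literally, exactly as conda expects.
    cat > "$HOME/.condarc" <<EOF
envs_dirs:
    - $root/$project/$login/.conda/envs
    - ~/.conda/envs
pkgs_dirs:
    - $root/$project/$login/.conda/pkgs
    - ~/.conda/pkgs
env_prompt: ({name})
EOF
)
```

For example, run write_condarc your_project your_loginname once; after loading the module, conda config --show envs_dirs should then list the project-space directory first.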

Setting up a custom installation of Python

The conda command is a tool to interact with, create, and modify a personal installation of Python (called an environment). You would want to create your own environment if

  • you needed a different version of a Python package
  • you needed to install a package (with conda or pip) that is not already installed
  • you prefer to manage and maintain your own library installations

conda has access to a repository of pre-compiled copies of essential third-party dependencies, for example the wxWidgets graphics library needed by wxPython. Installing hard-to-compile dependencies is therefore simply a matter of downloading the pre-compiled versions with the conda tool.

Below we provide several examples of working with conda environments. Example 1 shows how to find and activate existing environments; how you activate an environment depends on which shell you use. Example 2 starts a new environment from scratch, installing Python 3.4 along with numpy, scipy and matplotlib. Example 3 clones an existing environment, namely the “root” environment (listed as base in newer versions of conda), which is the default environment you get when you load the module. This is the recommended way to start a new environment (see below: “Warning about mpich2, mpi4py and readline”). Examples 4 and 5 show how to move environments out of your home directory and how to free disk space by cleaning conda's package cache. Finally, Example 6 shows how to install Python packages using either conda or pip, the standard Python installer.


Example 1: Finding and Activating conda Environments

The conda info command provides details about your Anaconda installation. Its -e flag (for environments) lists all the environments available to you, for example:

scc1% module load miniconda
scc1% conda info -e
# conda environments:
#
gbrs                     /projectnb/scv/cjahnke/.conda/envs/gbrs
snakemake                /projectnb/scv/cjahnke/.conda/envs/snakemake
base                  *  /share/pkg.7/miniconda/4.7.5/install

The * marks the currently active environment. In the example above, the default environment, listed as base (older versions of conda call it “root”), is active. This is the environment you get when you first load the module.
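
If you want to check the active environment from a script, the * marker can be picked out of this listing with awk. The following is a minimal sketch of a hypothetical helper, active_conda_env (not part of conda), which assumes the column layout shown above: name, then * for the active environment, then its path.

```shell
# active_conda_env: hypothetical helper (not part of conda).
# Reads `conda info -e` output on stdin and prints the name and path of the
# environment marked with '*', i.e. the currently active one.
# Comment lines starting with '#' are skipped.
active_conda_env() {
    awk '$1 !~ /^#/ && $2 == "*" { print $1, $3 }'
}
```

For the listing above, conda info -e | active_conda_env prints base /share/pkg.7/miniconda/4.7.5/install.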

Activating an Environment in Bash

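In Bash, you activate an environment with the activate script, giving the environment name as its argument. In the following example, we activate an environment named gbrs (newer versions of conda also provide the equivalent conda activate gbrs, which is the form used in the Jupyter section below):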

scc1% source activate gbrs
(gbrs)scc1%

As shown above, your command prompt changes to indicate which environment is currently active. You can switch environments by activating another one (including the root environment), or you can deactivate the current environment, as shown next:

(gbrs)scc1% source deactivate
scc1%

This has the same effect as activating the ‘root’ environment, except that the prompt returns to its original form.
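
To confirm that an activation worked, you can check which python executable your shell will actually run. A minimal sketch of a hypothetical helper, cmd_from_env (not part of conda), which compares the first match on your PATH with the environment's bin directory; the prefix to pass is the path printed by conda info -e.

```shell
# cmd_from_env: hypothetical helper (not part of conda).
# Succeeds if the first CMD found on PATH is PREFIX/bin/CMD.
# Usage: cmd_from_env PREFIX [CMD]    (CMD defaults to python)
cmd_from_env() {
    [ "$(command -v "${2:-python}")" = "$1/bin/${2:-python}" ]
}
```

For example, after source activate gbrs, cmd_from_env /projectnb/scv/cjahnke/.conda/envs/gbrs && echo "python comes from gbrs" should print the message. Passing pip as the second argument checks pip the same way before you install packages with it (Example 6). The comparison is a plain string match, so give the prefix exactly as conda prints it, without a trailing slash.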

Activating an Environment in csh/tcsh

In csh the activate script is not available, so instead you add the environment's bin directory to your PATH environment variable. To find where the environment is installed, look at the path printed next to its name in the output of ‘conda info -e’. When you create a new environment (more about this below), conda also prints a message similar to the following:

Package plan for installation in environment /projectnb/scv/cjahnke/.conda/envs/gbrs

So you would want to add /projectnb/scv/cjahnke/.conda/envs/gbrs/bin to your PATH environment variable:

scc1% setenv PATH /projectnb/scv/cjahnke/.conda/envs/gbrs/bin:$PATH
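
Rather than copying the path by hand, you can extract it from the conda info -e listing. A minimal sketch, assuming the layout shown in Example 1, where the environment's path is the last field on its line whether or not it carries the * marker; conda_env_bin is a hypothetical helper written for Bash and other POSIX shells, not part of conda.

```shell
# conda_env_bin: hypothetical helper (not part of conda).
# Reads `conda info -e` output on stdin and prints the bin directory of the
# environment named NAME. Exits with status 1 if no such environment is listed.
# Usage: conda info -e | conda_env_bin NAME
conda_env_bin() {
    awk -v n="$1" '$1 == n { print $NF "/bin"; found = 1 } END { exit !found }'
}
```

With the listing from Example 1, conda info -e | conda_env_bin gbrs prints /projectnb/scv/cjahnke/.conda/envs/gbrs/bin. csh cannot define this function, but the awk pipeline itself also works in csh, so csh users can run conda info -e | awk '$1 == "gbrs" { print $NF "/bin" }' and paste the result into the setenv command above.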

Example 2: Creating a new environment with conda

In this example we use the conda create command to create a new environment named mynewenv; this name is used later to activate the environment. After the environment name, we list the packages to install into it, namely Python 3.4, numpy, scipy and matplotlib.

scc1% module load miniconda
scc1% conda create -n mynewenv python==3.4 numpy scipy matplotlib
Fetching package metadata: ..
Solving package specifications: .............
Package plan for installation in environment /projectnb/scv/cjahnke/.conda/envs/mynewenv:

The following NEW packages will be INSTALLED:

    dateutil:   2.1-py34_2
    freetype:   2.4.10-0
    libpng:     1.5.13-1
    matplotlib: 1.4.0-np19py34_0
    numpy:      1.9.0-py34_0
    openssl:    1.0.1h-1
    pyparsing:  2.0.1-py34_0
    pyqt:       4.10.4-py34_0
    python:     3.4.0-0
    pytz:       2014.7-py34_0
    qt:         4.8.5-0
    readline:   6.2-2
    scipy:      0.14.0-np19py34_0
    sip:        4.15.5-py34_0
    six:        1.8.0-py34_0
    sqlite:     3.8.4.1-0
    system:     5.8-1
    tk:         8.5.15-0
    zlib:       1.2.7-0

Proceed ([y]/n)? y

Linking packages ...
[      COMPLETE      ] |##################################################| 100%
#
# To activate this environment, use:
# $ source activate mynewenv
#
# To deactivate this environment, use:
# $ source deactivate
#

The last instruction explains how to activate this environment. As explained above, this instruction only works in Bash; in [t]csh you must modify your PATH environment variable manually.
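
In batch job scripts, where nobody is there to answer the Proceed ([y]/n)? prompt, it can help to wrap conda create so that it answers yes automatically and skips environments that already exist. A minimal sketch; create_env_once is a hypothetical helper (not part of conda), and its existence check assumes the conda info -e layout shown in Example 1.

```shell
# create_env_once: hypothetical helper for job scripts (not part of conda).
# Creates environment NAME with the given packages, unless `conda info -e`
# already lists an environment of that name. --yes answers the
# "Proceed ([y]/n)?" prompt automatically.
# Usage: create_env_once NAME PACKAGE...
create_env_once() (
    name="$1"
    shift
    # awk exits 0 only if some line's first field equals NAME.
    if conda info -e | awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
        echo "environment $name already exists; not recreating it" >&2
        exit 0
    fi
    conda create --yes -n "$name" "$@"
)
```

For example, create_env_once mynewenv python==3.4 numpy scipy matplotlib creates the environment from this example on the first run and does nothing on later runs.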

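Example 3: Cloning an existing environment with conda

Here we clone the root environment (listed as base in newer versions of conda) into a new environment named new_root: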

scc1% module load miniconda
scc1% conda create -n new_root --clone root
src_prefix: '/share/pkg/anaconda/2.0.0/install'
dst_prefix: '/projectnb/scv/cjahnke/.conda/envs/new_root'
Packages: 129
Files: 31
Fetching package metadata: ..
Linking packages ...
[      COMPLETE      ] |##################################################| 100%
#
# To activate this environment, use:
# $ source activate new_root
#
# To deactivate this environment, use:
# $ source deactivate
#

As with new environments, how you activate the cloned environment depends on whether you use bash or [t]csh. Once you have cloned and activated your own personal environment, you can modify it by installing new packages with either conda or pip, as described in Example 6.

Example 4: Moving an existing environment from your Home Directory to your Project Space

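If you created environments before configuring Anaconda for the SCC, you will want to move them to your project space. First, create your ~/.condarc file as described in Configuring Anaconda for the SCC. Then export the list of packages in the old environment, recreate it under a new name (conda now places new environments in your project space), and activate the new copy: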

scc1% module load miniconda
scc1% source activate my_env1
(my_env1) scc1% conda list --explicit > my_env1_pkgs.txt
(my_env1) scc1% source deactivate
scc1% conda create --name my_env2 --file my_env1_pkgs.txt
scc1% source activate my_env2

Once the conda environment is recreated in the Project Space, remove the environment from the home directory using the following command:

scc1% conda remove --name my_env1 --all
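
To find out which of your environments still live in your home directory, and therefore need to be moved, you can filter the conda info -e listing by path. A minimal sketch; envs_in_home is a hypothetical helper (not part of conda), and it assumes conda reports your home directory with the same path as $HOME.

```shell
# envs_in_home: hypothetical helper (not part of conda).
# Reads `conda info -e` output on stdin and prints the name and path of every
# environment whose path lies under $HOME. The trailing slash stops a
# directory such as /home/alicex from matching /home/alice.
# Usage: conda info -e | envs_in_home
envs_in_home() {
    awk -v home="$HOME/" '$1 !~ /^#/ && index($NF, home) == 1 { print $1, $NF }'
}
```

Each name it prints is a candidate for the export, recreate and remove steps above.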

Example 5: Increase Disk Space by Cleaning Anaconda Cached Packages

When you install packages into environments, Anaconda stores an index cache, lock files, unused cached packages, and tarballs. These make it quicker to create new environments that contain packages similar to those in existing environments. However, if you have used Anaconda on the SCC without the configuration above, these cached files are in your home directory and count against its quota, so you will want to remove them; all cached files belong in your Project Disk Space. Once your ~/.condarc contains the settings from Configuring Anaconda for the SCC, all future installs will use only your Project Disk Space, protecting you from going over quota in your home directory. To remove (clean) the cached files, run:

scc1% module load miniconda
scc1% conda clean -a
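
To see how much space the caches occupy, and therefore how much cleaning frees, you can measure the package directories before and after running conda clean. A minimal sketch; cache_usage is a hypothetical helper (not part of conda), and the directories you pass it would be the pkgs_dirs entries from your ~/.condarc.

```shell
# cache_usage: hypothetical helper (not part of conda).
# Prints the disk usage of each directory given as an argument,
# silently skipping directories that do not exist.
# Usage: cache_usage DIR...
cache_usage() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            du -sh "$d"
        fi
    done
}
```

For example, run cache_usage ~/.conda/pkgs /projectnb/your_project/your_loginname/.conda/pkgs before and after cleaning. If you want to preview the cleanup first, conda clean -a --dry-run lists what would be removed without deleting anything.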

Example 6: Installing Python packages: When to use conda and when to use pip

If you’ve created a custom Python environment with conda, you can install your own packages using either pip or conda. When you use conda, it downloads a pre-compiled version of the package you request. conda has access to a limited set of packages that are essential to research computing, hard to compile, or both. pip is a Python program that downloads and installs Python packages from pypi.python.org, and it is available when you load the module. Unlike conda, pip may need to download a package's source code and compile it before installing it, when no pre-built version is available for your system.

Which should you use? First try conda and then pip. If you have trouble installing a package, please contact us for help.
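
The “conda first, then pip” advice can be written as a small shell function. A minimal sketch; conda_or_pip is a hypothetical helper (not part of conda), it assumes your custom environment is already activated, and it treats any conda failure (not only “package not found”) as a reason to try pip, so read conda's error message before relying on the fallback.

```shell
# conda_or_pip: hypothetical helper (not part of conda).
# Tries to install PACKAGE with conda; if conda fails for any reason,
# falls back to pip. Run it with your custom environment activated.
# Usage: conda_or_pip PACKAGE
conda_or_pip() {
    if conda install --yes "$1"; then
        return 0
    fi
    echo "conda could not install $1; trying pip" >&2
    pip install "$1"
}
```

For example, conda_or_pip pint would fall through to pip if conda cannot provide pint from its configured channels.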

How do you install a package? With conda, use the install command. You can request a specific version by appending ‘==version-number’ to the package name; for example, numpy==1.7 installs numpy version 1.7. Here’s an example where we install pandas version 0.14 into our custom environment, named ‘mynewenv’:

scc1% module load miniconda
scc1% source activate mynewenv
(mynewenv)scc1% conda install pandas==0.14
Fetching package metadata: ..
Solving package specifications: .
Package plan for installation in environment /projectnb/scv/yannpaul/.conda/envs/mynewenv:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pandas-0.14.0              |       np18py34_0         9.6 MB
    python-2.7.13              |                4        22.6 MB
    scipy-0.14.0               |       np18py34_0        28.9 MB
    setuptools-5.8             |           py34_0         431 KB
    ------------------------------------------------------------
                                           Total:        61.5 MB

The following NEW packages will be INSTALLED:
    pandas:     0.14.0-np18py34_0
    setuptools: 5.8-py34_0
    xz:         5.0.5-0

The following packages will be UPDATED:
    scipy:      0.14.0-np19py34_0 --> 0.14.0-np18py34_0

The following packages will be DOWNGRADED:
    numpy:      1.9.0-py34_0      --> 1.8.2-py34_0

Proceed ([y]/n)? y

Fetching packages ...
pandas-0.14.0- 100% |################################| Time: 0:00:01   7.11 MB/s
scipy-0.14.0-n 100% |################################| Time: 0:00:00  35.84 MB/s
setuptools-5.8 100% |################################| Time: 0:00:00   1.79 MB/s
Extracting packages ...
[      COMPLETE      ] |##################################################| 100%
Unlinking packages ...
[      COMPLETE      ] |##################################################| 100%
Linking packages ...
[      COMPLETE      ] |##################################################| 100%

Note that to install pandas 0.14, conda had to downgrade some packages, update others, and install new ones. Regardless, the whole process is automated.

You could install pandas with pip as well, but pip might then have to compile it from source. pip is more useful for packages that conda cannot provide. As an example, here we install pint, a package for managing unit conversions, into the same environment using pip:

(mynewenv)scc1% pip install pint
Downloading/unpacking pint
  Downloading Pint-0.5.2.zip (134kB): 134kB downloaded
  Running setup.py (path:/tmp/pip_build_yannpaul/pint/setup.py) egg_info for package pint

Installing collected packages: pint
  Running setup.py install for pint

Successfully installed pint
Cleaning up...

There is additional information on available Python versions here.

Adding Conda Environment to Jupyter Notebook

To use a conda environment as a kernel in Jupyter Notebook, you must register the environment with Jupyter. This requires the ipykernel package to be installed in that environment, and the installation needs to be done from a terminal.

If creating a new environment, add ipykernel to the conda create command:

scc1% module load miniconda
scc1% conda create -n py3 python==3.6 numpy scipy matplotlib ipykernel

If you already have the conda environment created but don’t have ipykernel installed within that environment, then first activate the environment and use the conda install command to install ipykernel.

scc1% conda activate py3
(py3) scc1% conda install ipykernel

Next, use the ipykernel package to register the activated environment with Jupyter. Make sure the environment you want to register is activated first. For the --name flag we suggest using the same name as the conda environment, because this is the kernel name that Jupyter will list and that you will need to select.

scc1% conda activate py3
(py3) scc1% python -m ipykernel install --user --name py3 

If successful, this outputs a message indicating that the kernel was installed in a sub-directory of .local:

Installed kernelspec py3 in /usr1/scv/yannpaul/.local/share/jupyter/kernels/py3

Jupyter is now aware of the conda environment. Start a Jupyter Notebook server and create a new notebook by clicking “New” (top right corner); the environment you just registered is listed in the drop-down menu.
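
To check which kernels you have registered with --user, you can look in the directory reported by the installation message. A minimal sketch; list_user_kernels is a hypothetical helper (not part of Jupyter) that assumes the default location ~/.local/share/jupyter/kernels shown above (setting the JUPYTER_DATA_DIR environment variable moves it). If Jupyter itself is on your PATH, jupyter kernelspec list reports the same information.

```shell
# list_user_kernels: hypothetical helper (not part of Jupyter).
# Prints the name of every kernel registered under the default per-user
# location, i.e. each sub-directory that contains a kernel.json file.
list_user_kernels() (
    dir="$HOME/.local/share/jupyter/kernels"
    [ -d "$dir" ] || exit 0
    for spec in "$dir"/*/kernel.json; do
        # Skip the unexpanded pattern when no kernels are registered.
        [ -e "$spec" ] || continue
        basename "$(dirname "$spec")"
    done
)
```

After the registration step above, list_user_kernels should print py3.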