Anaconda is a free package and environment manager for Python distributed by Anaconda, Inc. (formerly Continuum Analytics). It has gained traction because it makes it easy to package and replicate modules or entire Python environments on different systems. The distribution includes a set of core Python packages, and additional user packages can be installed from remote “channels”. Anaconda on the SCC requires additional setup beyond loading the module. Please follow the instructions below to configure Anaconda on the SCC.
- Configuring Anaconda for the SCC
- Setting up a custom installation of Python
  - Example 1: Finding and Activating conda Environments
  - Example 2: Creating a new environment with conda
  - Example 3: Cloning an existing environment with conda
  - Example 4: Moving an existing environment from your Home Directory to your Project Space
  - Example 5: Increase Disk Space by Cleaning Anaconda Cached Packages
  - Example 6: Installing Python packages: When to use conda and when to use pip
- Adding Conda Environment to Jupyter Notebook
Configuring Anaconda for the SCC
Anaconda allows you to create an isolated programming environment or a container. Each Anaconda environment needs its own copy of all the Python packages required to run your code, so environments can take up a considerable amount of disk space and should be saved in your project space. To do this, create the file ~/.condarc in your home directory and add the following, making sure to replace your_project and your_loginname appropriately:
envs_dirs:
  - /projectnb/your_project/your_loginname/.conda/envs
  - ~/.conda/envs
pkgs_dirs:
  - /projectnb/your_project/your_loginname/.conda/pkgs
  - ~/.conda/pkgs
env_prompt: ({name})
Also, replace /projectnb with /restricted/projectnb if that is where your project has disk space available.
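As a minimal sketch, the same file could be generated with a short Python script instead of being typed by hand. The function name render_condarc and its parameters are illustrative assumptions, not part of conda or of the SCC tooling; the script only builds the text and prints it.

```python
def render_condarc(project, login, root="/projectnb"):
    """Return .condarc text that puts environments and package caches in project space.

    `root` would be "/restricted/projectnb" for projects whose disk space lives there.
    """
    base = f"{root}/{project}/{login}/.conda"
    lines = [
        "envs_dirs:",
        f"  - {base}/envs",
        "  - ~/.conda/envs",
        "pkgs_dirs:",
        f"  - {base}/pkgs",
        "  - ~/.conda/pkgs",
        "env_prompt: ({name})",  # literal text, not an f-string placeholder
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "your_project" and "your_loginname" are placeholders, as in the text above.
    print(render_condarc("your_project", "your_loginname"), end="")
```

If the script were saved under a name such as make_condarc.py (a hypothetical file name), its output could be reviewed on screen first and then redirected into ~/.condarc.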
Setting up a custom installation of Python
The conda command is a tool to interact with, create, and modify a personal installation of Python (called an environment). You may want to create your own environment if
- you need a different version of a Python package
- you need to install a package (with conda or pip) that is not already installed
- you prefer to manage and maintain your own library installations
conda is aware of a repository of pre-compiled copies of essential third-party dependencies, for example the graphics library wxWidgets needed by wxPython. Installing hard-to-compile dependencies is therefore a simple matter of downloading the pre-compiled versions with the conda tool.
Below we provide several examples of initializing new conda environments. First we show you how to find and activate existing environments. This depends on which shell you use. The next example shows you how to start a new environment from scratch, and for that example we install Python 3.4 along with numpy, scipy and matplotlib. The following example shows you how to clone an existing environment, namely the default “root” environment (named “base” in newer versions of conda), which is the environment you access when you load the anaconda module. This is the recommended way to start a new environment (see below: “Warning about mpich2, mpi4py and readline”). Lastly, we provide an example of installing Python packages using either conda or pip, the standard Python installer.
Warning about mpich2, mpi4py and readline
Warning about virtualenv
Example 1: Finding and Activating conda Environments
The conda command info provides you details about your Anaconda installation. The flag -e, for environments, lists all the environments available to you, for example:
scc1% module load miniconda
scc1% conda info -e
# conda environments:
#
gbrs /projectnb/scv/cjahnke/.conda/envs/gbrs
snakemake /projectnb/scv/cjahnke/.conda/envs/snakemake
base * /share/pkg.7/miniconda/4.7.5/install
The * indicates which conda environment is currently active. The above example shows that the default “base” environment (called “root” in older versions of conda) is active. This is the environment you get when you initially load the module.
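The listing above is easy to process in a script, for example to look up the path of an environment or to check which one is active. The following is a minimal sketch that parses the text shown above; the function name parse_env_list is a hypothetical helper, and the parsing assumes that environment paths contain no spaces.

```python
def parse_env_list(output):
    """Parse the text printed by `conda info -e` into a list of dictionaries."""
    envs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        active = "*" in fields
        fields = [field for field in fields if field != "*"]
        if not fields:
            continue
        # A named environment has two fields (name, path); an unnamed one has only a path.
        name = fields[0] if len(fields) > 1 else ""
        envs.append({"name": name, "path": fields[-1], "active": active})
    return envs


# Sample text copied from the listing above.
sample = """\
# conda environments:
#
gbrs                     /projectnb/scv/cjahnke/.conda/envs/gbrs
snakemake                /projectnb/scv/cjahnke/.conda/envs/snakemake
base                  *  /share/pkg.7/miniconda/4.7.5/install
"""

for env in parse_env_list(sample):
    marker = "(active)" if env["active"] else ""
    print(env["name"], env["path"], marker)
```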
Activating an Environment in Bash
With Bash, each environment is given a script to activate that environment. In the following example, we will activate an environment named gbrs:
scc1% source activate gbrs
(gbrs)scc1%
You will find, as we see above, that your command prompt will change to indicate which environment is currently active. You can change the environment by activating another, even the root environment, but you can also deactivate the environment, as we show next:
(gbrs)scc1% source deactivate
scc1%
This is the same as activating the ‘root’ environment, except now the prompt looks as it originally did.
Activating an Environment in csh/tcsh
In csh the activate script is not available, so you simply add the environment’s bin directory to your PATH environment variable. To find where the environment is installed, look at the path printed next to its name in the output of ‘conda info -e’. Alternatively, when you create a new environment (more about this below), the output includes a message similar to the following:
Package plan for installation in environment /projectnb/scv/cjahnke/.conda/envs/gbrs
So you would want to add /projectnb/scv/cjahnke/.conda/envs/gbrs/bin to your PATH environment variable:
scc1% setenv PATH /projectnb/scv/cjahnke/.conda/envs/gbrs/bin:$PATH
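Running the setenv line more than once adds the same directory to PATH repeatedly. As a rough sketch of what the manual step does, the hypothetical helper below (prepend_env_bin is not part of conda) builds the new PATH value with the environment's bin directory first and without duplicates, then prints the matching setenv command. The starting PATH value is an assumption chosen for illustration.

```python
def prepend_env_bin(path_value, env_prefix):
    """Return PATH with the environment's bin directory first, without duplicates."""
    env_bin = env_prefix.rstrip("/") + "/bin"
    # Drop empty entries and any earlier copy of the environment's bin directory.
    parts = [part for part in path_value.split(":") if part and part != env_bin]
    return ":".join([env_bin] + parts)


# Environment path taken from the example above; the starting PATH is illustrative.
env_prefix = "/projectnb/scv/cjahnke/.conda/envs/gbrs"
new_path = prepend_env_bin("/usr/local/bin:/usr/bin:/bin", env_prefix)
print(f"setenv PATH {new_path}")
```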
Example 2: Creating a new environment with conda
In this example we use the conda create command to create a new environment named py3. This name can then be used later on to access this installation. Following the create instruction, we list the packages that should be installed into the new environment, namely python 3.4, numpy, scipy and matplotlib.
scc1% module load miniconda
scc1% conda create -n py3 python==3.4 numpy scipy matplotlib
Fetching package metadata: ..
Solving package specifications: .............
Package plan for installation in environment /projectnb/scv/cjahnke/.conda/envs/py3:
The following NEW packages will be INSTALLED:
dateutil: 2.1-py34_2
freetype: 2.4.10-0
libpng: 1.5.13-1
matplotlib: 1.4.0-np19py34_0
numpy: 1.9.0-py34_0
openssl: 1.0.1h-1
pyparsing: 2.0.1-py34_0
pyqt: 4.10.4-py34_0
python: 3.4.0-0
pytz: 2014.7-py34_0
qt: 4.8.5-0
readline: 6.2-2
scipy: 0.14.0-np19py34_0
sip: 4.15.5-py34_0
six: 1.8.0-py34_0
sqlite: 3.8.4.1-0
system: 5.8-1
tk: 8.5.15-0
zlib: 1.2.7-0
Proceed ([y]/n)? y
Linking packages ...
[ COMPLETE ] |##################################################| 100%
#
# To activate this environment, use:
# $ source activate py3
#
# To deactivate this environment, use:
# $ source deactivate
#
The last instruction explains how to activate this environment. As explained above, this instruction only works if you are using Bash; for [t]csh you must manually modify your PATH environment variable.
Example 3: Cloning an existing environment with conda
scc1% module load miniconda
scc1% conda create -n new_root --clone root
src_prefix: '/share/pkg/anaconda/2.0.0/install'
dst_prefix: '/projectnb/scv/cjahnke/.conda/envs/new_root'
Packages: 129
Files: 31
Fetching package metadata: ..
Linking packages ...
[ COMPLETE ] |##################################################| 100%
#
# To activate this environment, use:
# $ source activate new_root
#
# To deactivate this environment, use:
# $ source deactivate
#
As with creating new environments, how you activate this new cloned environment depends on whether you’re using bash or [t]csh. After cloning and activating your own personal environment, you can modify it by installing new packages, either with conda or pip, as is described next.
Example 4: Moving an existing environment from your Home Directory to your Project Space
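If you created environments before configuring Anaconda for the SCC, you will want to move them to your project space. First, create your ~/.condarc file as described above. Then export the exact package list of the existing environment and use it to create a new environment; with ~/.condarc in place, the new environment is created in your project space: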
scc1% module load miniconda
scc1% source activate my_env1
(my_env1) scc1% conda list --explicit > my_env1_pkgs.txt
(my_env1) scc1% source deactivate
scc1% conda create --name my_env2 --file my_env1_pkgs.txt
scc1% source activate my_env2
Once the conda environment is recreated in the Project Space, remove the environment from the home directory using the following command:
scc1% conda remove --name my_env1 --all
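Before removing the old environment, it can be worth confirming that the exported package list looks complete. The sketch below assumes the usual layout of a file written by conda list --explicit: a few comment lines, a line containing @EXPLICIT, and then one package URL per line. The helper name read_explicit_spec and the sample URLs are hypothetical; a real file would be read from my_env1_pkgs.txt.

```python
def read_explicit_spec(text):
    """Return the package URLs listed after the @EXPLICIT marker of a spec file."""
    urls = []
    seen_marker = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        if line == "@EXPLICIT":
            seen_marker = True
            continue
        if seen_marker:
            urls.append(line)
    if not seen_marker:
        raise ValueError("not an explicit spec file: missing @EXPLICIT marker")
    return urls


# Hypothetical file contents; a real file comes from `conda list --explicit`.
sample = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://repo.example.org/linux-64/numpy-1.9.0-py34_0.tar.bz2
https://repo.example.org/linux-64/python-3.4.0-0.tar.bz2
"""

packages = read_explicit_spec(sample)
print(len(packages), "packages listed")
```

Comparing this count with the number of packages reported by conda list in the old environment gives a quick sanity check before the old environment is deleted.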
Example 5: Increase Disk Space by Cleaning Anaconda Cached Packages
Anaconda stores an index cache, lock files, unused cached packages, and tarballs when packages are installed into environments. This is convenient for quickly creating environments that contain packages similar to those in existing environments. However, if you have used Anaconda with your home directory on the SCC, you will want to remove these cached files, since all cached files should be in your Project Disk Space to keep your home directory under quota. If you have set up your ~/.condarc with the information from Configuring Anaconda for the SCC, then all future installs will only use Project Disk Space, protecting you from going over quota in your home directory. To remove (or clean) these cached files, run:
scc1% module load miniconda
scc1% conda clean -a
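To see how much space the cache occupies before and after cleaning, the directory sizes can be added up with a few lines of Python. This is only a sketch: dir_size_bytes is a hypothetical helper, and ~/.conda/pkgs is assumed to be the home-directory cache location, as listed in the ~/.condarc shown earlier. Hard-linked files are counted once per link, so the figure is an upper bound.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files below `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Assumed cache location in the home directory.
    cache = os.path.expanduser("~/.conda/pkgs")
    if os.path.isdir(cache):
        print(f"{cache}: {dir_size_bytes(cache) / 1e6:.1f} MB")
    else:
        print(f"{cache} does not exist")
```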
Example 6: Installing Python packages: When to use conda and when to use pip
If you’ve created a custom Python environment with conda, you can install your own packages using either pip or conda. When you use conda, it downloads a pre-compiled version of the package you request. conda has access to a limited set of packages that are either essential to research computing or are hard to compile, or both. pip is a Python program to download and install Python packages from pypi.python.org, and it is available when you use the anaconda module. Unlike conda, pip may have to download, compile, and install a package from source code when no pre-built version (a “wheel”) is available for it.
Which should you use? First try conda and then pip. If you have trouble installing a package, please contact us for help.
How do you install a package? With conda, there is an install command. You can also specify a version by appending ‘==version-number’ to the package name; for example, numpy==1.7 will install numpy version 1.7. Here’s an example where we install pandas version 0.14 into our custom environment, named ‘py3’:
scc1% module load miniconda
scc1% source activate py3
(py3)scc1% conda install pandas==0.14
Fetching package metadata: ..
Solving package specifications: .
Package plan for installation in environment /projectnb/scv/yannpaul/.conda/envs/py3:
The following packages will be downloaded:
package | build
---------------------------|-----------------
pandas-0.14.0 | np18py34_0 9.6 MB
python-2.7.13 | 4 22.6 MB
scipy-0.14.0 | np18py34_0 28.9 MB
setuptools-5.8 | py34_0 431 KB
------------------------------------------------------------
Total: 61.5 MB
The following NEW packages will be INSTALLED:
pandas: 0.14.0-np18py34_0
setuptools: 5.8-py34_0
xz: 5.0.5-0
The following packages will be UPDATED:
scipy: 0.14.0-np19py34_0 --> 0.14.0-np18py34_0
The following packages will be DOWNGRADED:
numpy: 1.9.0-py34_0 --> 1.8.2-py34_0
Proceed ([y]/n)? y
Fetching packages ...
pandas-0.14.0- 100% |################################| Time: 0:00:01 7.11 MB/s
scipy-0.14.0-n 100% |################################| Time: 0:00:00 35.84 MB/s
setuptools-5.8 100% |################################| Time: 0:00:00 1.79 MB/s
Extracting packages ...
[ COMPLETE ] |##################################################| 100%
Unlinking packages ...
[ COMPLETE ] |##################################################| 100%
Linking packages ...
[ COMPLETE ] |##################################################| 100%
Note that some packages needed to be downgraded, some needed to be upgraded, and others needed to be installed to get pandas 0.14 installed. Regardless, the whole process is automated.
You can do the same thing with pip, but pandas may then need to be compiled from source. pip is more useful for packages that are not available to conda. As an example, here we also install pint, a package to manage unit conversions, using the pip program:
(py3)scc1% pip install pint
Downloading/unpacking pint
Downloading Pint-0.5.2.zip (134kB): 134kB downloaded
Running setup.py (path:/tmp/pip_build_yannpaul/pint/setup.py) egg_info for package pint
Installing collected packages: pint
Running setup.py install for pint
Successfully installed pint
Cleaning up...
There is additional information on available Python versions here.
Adding Conda Environment to Jupyter Notebook
To use an Anaconda environment in Jupyter Notebook as a kernel, the environment needs to be registered with Jupyter as a kernel. To accomplish this, the environment needs to have the ipykernel package installed. The installation of ipykernel needs to be done in the terminal.
If creating a new environment, add ipykernel to the conda create command:
scc1% module load miniconda
scc1% conda create -n py3 python==3.6 numpy scipy matplotlib ipykernel
If you already have the conda environment created but don’t have ipykernel installed within that environment, then first activate the environment and use the conda install command to install ipykernel.
scc1% conda activate py3
(py3) scc1% conda install ipykernel
Next, use the ipykernel package to register the activated environment with Jupyter. Make sure the environment you want to be registered is activated first. For the flag --name it is suggested that you use the same name as the conda environment, as this will be the kernel name listed in Jupyter that you will need to select.
scc1% conda activate py3
(py3) scc1% python -m ipykernel install --user --name py3
If successful, this will output a message indicating that the kernel was successfully installed in a sub-directory of .local.
Installed kernelspec py3 in /usr1/scv/yannpaul/.local/share/jupyter/kernels/py3
Jupyter is now aware of the Conda Environment. Start an instance of Jupyter Notebook server and then create a new notebook by clicking “New” (top right corner). The Conda Environment that was just registered is now listed in the drop down menu.
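If the environment does not show up in the menu, one way to check the registration is to look at the kernel directories under ~/.local/share/jupyter/kernels, the location reported in the message above. The sketch below reads the kernel.json file in each sub-directory and reports its display_name field; list_user_kernels is a hypothetical helper, not part of Jupyter. The jupyter kernelspec list command reports similar information from the command line.

```python
import json
import os


def list_user_kernels(kernels_dir):
    """Map each kernel directory name to the display name stored in its kernel.json."""
    kernels = {}
    if not os.path.isdir(kernels_dir):
        return kernels
    for name in sorted(os.listdir(kernels_dir)):
        spec_path = os.path.join(kernels_dir, name, "kernel.json")
        if os.path.isfile(spec_path):
            with open(spec_path) as handle:
                spec = json.load(handle)
            # Fall back to the directory name if no display name is recorded.
            kernels[name] = spec.get("display_name", name)
    return kernels


if __name__ == "__main__":
    # Location taken from the message printed by `python -m ipykernel install --user`.
    user_dir = os.path.expanduser("~/.local/share/jupyter/kernels")
    for name, display in list_user_kernels(user_dir).items():
        print(f"{name}: {display}")
```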