- Where Should you Store your Files?
- Checking How Much Space you are Using
- Transferring Files To and From the SCC
- Working with Files/Directories under Linux
- Controlling Access to your Files
- Sharing Files with Colleagues Outside BU
- Recovering Lost Files
- Tar and Compressed Files
Where Should you Store your Files?
Users on the SCC are automatically granted several locations to store their files. Our overall file storage system is described here. Most users will be primarily storing files in three areas, all of which are generally accessible from all of the login and compute nodes; the exception is that the /restricted/ partitions are only accessible from the scc4.bu.edu login node and all of the compute nodes:
- Home Directory – This directory is entirely controlled by you and the default permissions are that nobody else can see or otherwise access your files. Home directories have a quota of 10 GB and this will generally not be increased. You will naturally store files directly related to your account here, such as dotfiles. It is also commonly used to store personal files, such as email or personal images. Although it is possible to do work in your home directory if it fits within the 10GB limit, we recommend you use Project Disk Space in case you end up needing more space than you anticipate. Home directories are both protected by Snapshots and also backed up off site.
- Backed Up Project Disk Space – Projects are by default granted 50 GB of space under /project/project_name/ (or /restricted/project/project_name/ for most BUMC projects). This can be increased to a maximum of 200 GB at the request of the project leader(s), but it cannot go beyond that. This data is both protected by Snapshots and backed up off site. Depending on the workflow of the project, a reasonable approach is to keep code and files you hand-edit in /project/ and files downloaded or generated by code or applications in /projectnb/.
- Not Backed Up Project Disk Space – Projects are by default granted 50 GB of space under /projectnb/project_name/ (or /restricted/projectnb/project_name/ for most BUMC projects). This can be increased for free up to a maximum total free allocation of 1000 GB; beyond that, additional Not Backed Up space can be purchased through either Buy-In or Storage-as-a-Service. Despite its name, this space is protected by both hardware RAID (protecting against disk failures) and daily Snapshots (protecting against accidental deletion of files). You will want to use this space for any large quantities of data you have. We have guidelines for what data should be stored in each partition.
- You can see which projects you belong to by running the command groups, and you can see how much space each of them has available, and where, by running pquota. Note that there are a few special groups, like gaussian, that do not have any disk space associated with them; pquota will tell you where you have disk space you can use. However, for those with space in /restricted/project/ or /restricted/projectnb/, groups will list the directories as /rproject/project_name/ and /rprojectnb/project_name/, but you must type the full /restricted/... path to access them.
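As a rough sketch of how the paths above fit together, the loop below checks, for each group reported by id -Gn (the same names that groups prints), which of the four directory roots named in this section contains a directory for that group. It assumes that the project directory name matches the group name; the function name list_project_dirs is made up for this illustration, and pquota remains the authoritative source.

```shell
#!/bin/sh
# Sketch: print the project directories that exist for each of your groups.
# Assumption: a project's directory is named after its Unix group.
list_project_dirs() {
    for g in $(id -Gn); do
        for base in /project /projectnb /restricted/project /restricted/projectnb; do
            # Groups without disk space (for example gaussian) simply match nothing.
            if [ -d "$base/$g" ]; then
                echo "$base/$g"
            fi
        done
    done
}

list_project_dirs
```

On a machine that is not part of the SCC the function prints nothing, since none of these roots exist there.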
Checking How Much Space you are Using
Use the command pquota to see your quota and usage:
scc1% pquota -u animate
                                     quota     usage    usage
project space                         (GB)      (GB)  (files)
----------------------------------- ------ --------- --------
/project/animate                        50      0.00        1
/projectnb/animate                      50      3.45     4328
  15407                                         0.09       80
  73043                                         0.25       61
  82363                                         0.11      243
  dcornell                                      0.29      104
  laura                                         1.02     2114
  rcrnl                                         1.68     1723
  root                                          0.00        3
The -u option asks for a breakdown of usage by the users on the project, in addition to the default project totals. Information on quota (in GB), usage (in GB), and number of files is given for each partition the selected project group has access to. If there are any numbers instead of login names in the list, as in the example above, they refer to files owned by users who had accounts on the system long ago. Note that the pquota data on most filesystems is updated every five minutes, so if you delete some files you will need to wait a few minutes to see the change reflected by the command.
A project’s Lead Project Investigator (LPI) or IT/Admin Contact can request that we delete, or make accessible to them, any files in a given project’s Project Disk Space or STASH areas. This request should be sent to help@scc.bu.edu.
If your project needs more space, the project LPI or IT/Admin Contact can request additional space but there is a charge for requests over 1000 GB.
The command for home directories to show quota (10 GB for almost all users) and usage is quota -s:
scc1% quota -s
Home Directory Usage and Quota:
Name GB quota limit in_doubt grace | files quota limit in_doubt grace
adftest2 0.00212 10.0 11.0 0.0 none | 287 0 0 0 none
The important items are the GB, quota, and limit columns on the left and the files column on the right. They show your usage (in GB), quota (in GB), hard limit (in GB), and the number of files you have. You can exceed your quota for a period of up to 7 days, but you can never exceed your hard limit. You should NEVER go up to your hard limit: once there you cannot write any files, and this usually prevents you from logging in. Note that the quota data is only updated every five minutes, so after you delete some files you will need to wait a few minutes to see the change reflected by the command.
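To make the relationship between usage, quota, and hard limit concrete, here is a small sketch that reads lines in the layout shown above and reports usage as a percentage of quota. It assumes the columns appear exactly as in the sample output (name, usage in GB, quota, limit); the function name check_home_quota is hypothetical and is not an SCC command.

```shell
#!/bin/sh
# Sketch: summarize quota -s style output read from standard input.
# Assumed column order (from the sample above): name, GB used, quota, limit, ...
check_home_quota() {
    awk '
        # Data rows have numeric 2nd and 3rd fields; header lines do not.
        $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ && $3 > 0 {
            pct = 100 * $2 / $3
            status = ($2 >= $4) ? "AT HARD LIMIT" : (($2 > $3) ? "OVER QUOTA" : "ok")
            printf "%s: %.2f of %.1f GB (%.1f%%) %s\n", $1, $2, $3, pct, status
        }'
}

# Demonstration using the sample line from this page.
printf '%s\n' 'adftest2 0.00212 10.0 11.0 0.0 none | 287 0 0 0 none' | check_home_quota
```

On the SCC itself the same function could presumably be fed directly, as in quota -s | check_home_quota, provided the output format has not changed.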
You can see which directories, files, and dotfiles are taking up the most space in your home directory by running du -s .[^.]* * | sort -n; the largest items will be listed last:
scc1% du -s .[^.]* * | sort -n
...
32 helloWorld.o9436314
224 newdir
1760 .matlab
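The du command above can be wrapped in a small helper if you run it often. The sketch below is one possible variant, not an SCC-provided tool: the function name largest_items is made up, it reports sizes in kilobytes (du -k), and it uses the portable glob .[!.]* in place of .[^.]* to match dotfiles.

```shell
#!/bin/sh
# Sketch: show the largest entries (including dotfiles) in a directory.
# $1 = directory to inspect (default: current directory)
# $2 = how many entries to show (default: 10)
largest_items() {
    (
        cd "${1:-.}" || exit 1
        # Errors from globs that match nothing are discarded.
        du -sk -- .[!.]* * 2>/dev/null | sort -n | tail -n "${2:-10}"
    )
}

# Largest five entries in the current directory, biggest last.
largest_items . 5
```

Running largest_items "$HOME" 20 would list the twenty largest entries in your home directory, with sizes in KB.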
Transferring Files To and From the SCC
Please consult the appropriate instructions based on the operating system of the machine you are using to connect to the SCC.
Another option for file transfer is Globus Online which allows for transfer between your desktop/laptop and the SCC and also allows you to access data stored on a variety of national research clusters.
Finally, there is also the SCC Data Transfer Node which also runs on scc-globus.bu.edu but does not require using the Globus Online framework.
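For orientation only, the sketch below prints the general shape of scp commands for copying a file to and from Project Disk Space; it does not run them. The login name yourlogin, the host scc1.bu.edu (inferred from the scc1% prompt used in the examples on this page), the file name results.csv, and both function names are assumptions for illustration. The operating-system-specific instructions referred to above take precedence.

```shell
#!/bin/sh
# Sketch: print (do not run) scp commands for moving a file to or from the SCC.
# All values below are placeholders, not values documented on this page.
SCC_USER="${SCC_USER:-yourlogin}"      # hypothetical login name
SCC_HOST="${SCC_HOST:-scc1.bu.edu}"    # assumed login node hostname

# $1 = local file, $2 = destination directory on the SCC
scc_upload_cmd() {
    printf 'scp %s %s@%s:%s\n' "$1" "$SCC_USER" "$SCC_HOST" "$2"
}

# $1 = file on the SCC, $2 = local destination
scc_download_cmd() {
    printf 'scp %s@%s:%s %s\n' "$SCC_USER" "$SCC_HOST" "$1" "$2"
}

scc_upload_cmd results.csv /projectnb/project_name/
scc_download_cmd /projectnb/project_name/results.csv .
```

The printed lines can be copied into a terminal on your own machine once the placeholders are replaced with real values.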
Working with Files/Directories under Linux
If you are not familiar with the commands for working with files and directories under Linux, please consult our Getting Started section, in particular the pages on commands and filesystem navigation.
Controlling Access to your Files
You can determine who, if anyone, can have read, write, and/or execute permission to your files using the commands chmod and umask. You can limit/allow access to each of your files/directories to yourself, your collaborators on a given research project, and/or all users of the system. The default behavior is that only you can modify the files/directories you create but others can read and, if applicable, execute them if they have access to the directory in which they are stored. The default is that others do not have any access to your home directory but your group members do have access to the Project Disk Space belonging to the project group.
People who are not members of your project group will not be able to access any files belonging to the group in general. For non-/restricted directories, the Lead Project Investigator for a project can have us change this default so that outside users can have access to files belonging to the group. More details on this policy are available.
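The following sketch illustrates how umask and chmod interact, using a scratch directory so that none of your real files are touched. The file names and the helper perm are invented for the example, and the umask values are illustrative choices rather than the SCC defaults.

```shell
#!/bin/sh
# Sketch: effect of umask on new files and of chmod on existing ones.
workdir=$(mktemp -d)          # scratch directory; remove it when you are done
cd "$workdir" || exit 1

# Helper: print just the permission string of a file or directory.
perm() { ls -ld "$1" | cut -c1-10; }

umask 022                     # new files: owner read/write, everyone else read
touch notes.txt
chmod go-rwx notes.txt        # remove all access for group and others

umask 027                     # new files: group may read, others get nothing
touch shared.txt
chmod g+w shared.txt          # additionally let group members modify the file

mkdir results
chmod 770 results             # owner and group: full access; others: none

for f in notes.txt shared.txt results; do
    echo "$(perm "$f") $f"
done
```

Note that umask only affects files created afterwards in the same shell session, while chmod changes files that already exist.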
Sharing Files with Colleagues Outside BU
If you have files that you wish to share (in either direction) with a colleague who is not at BU, either you or your colleague will need an account at the other person’s institution, because transferring files between machines requires an account on both sides of the transfer. If you are the LPI of an SCC project, you can add your colleague to your project as an external user, or you can ask them to create an account for you on their system. You can then follow the instructions on doing file transfers from or to the SCC.
For small files, using email may also be an alternative.
Recovering Lost Files
Every night starting at 12:01am copies are made of your files using Snapshots. This feature will let you recover files you mistakenly delete or overwrite. Snapshots are implemented for Home Directories, Project Disk Space, and STASH space. Follow the example here to recover your files. Note that there is generally no way to recover a file you just created; the file(s) must have had a chance to be snapshotted overnight.
Tar and Compressed Files
There are four main archiving (combining multiple files into one archive file) and compression (reducing the size of a file) tools on Linux systems, each with an associated tool to reverse the process. It is common to archive a set of files and then compress the archive, producing a file with a name such as myarchive.tar.gz.
| Archive/Compression Tool | Unarchive/Uncompression Tool | Archived/Compressed Filename Extension | Purpose |
|---|---|---|---|
| gzip | gunzip | .gz | Compress a large single file |
| compress | uncompress | .Z | Compress a large single file |
| zip | unzip | .zip | Creating a single compressed file from a group of files |
| tar -c | tar -x | .tar | Creating a single file from a group of files |
Usage of the first two tools (gzip and compress) and their counterparts is straightforward:
scc1% ls
myfile
scc1% compress myfile # or replace with gzip
scc1% ls
myfile.Z
scc1% uncompress myfile.Z # or replace with gunzip of a .gz file
scc1% ls
myfile
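By default gzip and gunzip replace the file they are given, as compress and uncompress do in the session above. If you would rather keep the original, one option is to write to standard output with -c, as in this sketch. It works in a scratch directory, and the file names are placeholders.

```shell
#!/bin/sh
# Sketch: compress and decompress with gzip while keeping the input files.
work=$(mktemp -d)                     # scratch directory for the demonstration
cd "$work" || exit 1
printf 'line %s\n' 1 2 3 4 5 > myfile

gzip -c myfile > myfile.gz            # -c writes to stdout, so myfile is kept
gzip -t myfile.gz && echo "myfile.gz passed the integrity check"

gunzip -c myfile.gz > myfile.copy     # decompress without removing myfile.gz
cmp myfile myfile.copy && echo "round trip matches the original"
ls
```

The -t check is a quick way to confirm that a downloaded or transferred .gz file is not corrupted before you rely on it.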
On Linux systems, tar is used much more commonly for archiving sets of files than zip. However, you may very well come across ZIP archives, often generated on Windows, which you will need to unzip:
scc1% unzip example.zip
Archive: example.zip
creating: Packet1/
creating: Packet2/
scc1% ls
Packet1/ Packet2/ example.zip
Using Tar
Tar has many options. Shown below is an example of generating and then expanding a simple archive. Note that if you have a compressed tar file such as myarchive.tar.gz you will generally first want to uncompress it using the appropriate tool above and then untar it.
scc1% ls mydir
file1 file2
scc1% tar -cvf mydir.tar mydir/ # Generate the archive mydir.tar from the directory mydir and all of its contents.
mydir/
mydir/file1
mydir/file2
scc1% rm -r mydir # Remove the original directory for now
scc1% ls
mydir.tar
scc1% tar -tvf mydir.tar # Look at what is in the archive file.
drwx------ aarondf/scv 0 2014-04-30 13:26 mydir/
-rw------- aarondf/scv 8 2014-04-30 13:25 mydir/file1
-rw------- aarondf/scv 10 2014-04-30 13:26 mydir/file2
scc1% tar -xvf mydir.tar # Expand the archive file in my current directory.
mydir/
mydir/file1
mydir/file2
scc1% ls *
mydir:
file1 file2
mydir.tar
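With GNU tar, which is what Linux systems normally provide, the archive and compression steps can also be combined by adding the -z option. The sketch below shows that variant next to the two-step approach described above; it runs in a scratch directory, and the directory names restore and restore2 are arbitrary.

```shell
#!/bin/sh
# Sketch: create, list, and extract a gzip-compressed tar archive.
demo=$(mktemp -d)                     # scratch directory for the demonstration
cd "$demo" || exit 1
mkdir mydir
echo "first"  > mydir/file1
echo "second" > mydir/file2

tar -czf mydir.tar.gz mydir/          # -z compresses with gzip while archiving
tar -tzf mydir.tar.gz                 # list the contents without extracting

mkdir restore restore2
tar -xzf mydir.tar.gz -C restore      # one step: decompress and extract into restore/

# Two-step equivalent: decompress to a pipe, then untar from standard input.
gunzip -c mydir.tar.gz | tar -xf - -C restore2

diff -r restore/mydir restore2/mydir && echo "both methods give the same files"
```

The -C option tells tar to change into the given directory before extracting, which avoids overwriting an existing mydir in the current directory.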
