Managing your data in the cloud

Description:

Once you are familiar with downloading data from the NEON Data API, or from other resources on the internet, into your cloud instances, you will eventually need to move those data and store them somewhere more permanent.

It's important to accept that many of these public data repositories are stable and that data will be available from them in the future.

This means that you should not create copies of original data unless the data are very large and downloading them again would take a prohibitive amount of time.


Setting up iCommands

Data Store Guide

CyVerse Data Store uses a platform called iRODS to manage its data. iRODS has a command line application called iCommands for moving data over the terminal.

First, we need to initiate a connection to the CyVerse iRODS.

1. In the terminal, type iinit

This should echo out a set of information in the terminal:

One or more fields in your iRODS environment file (irods_environment.json) are missing;
please enter them.

Enter the host name (DNS) of the server to connect to:

2. Enter in the following data for each field:

Enter the host name (DNS) of the server to connect to: data.cyverse.org
Enter the port number: 1247
Enter your irods user name: user_name
Enter your irods zone: iplant
Those values will be added to your environment file (for use by
other iCommands) if the login succeeds.

Enter your current iRODS password:

The values to enter are:

  • host name (DNS): data.cyverse.org

  • port number: 1247

  • irods user name: <your CyVerse username>

  • irods zone: iplant

  • current iRODS password: <your current password>
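The same four values can also be written to the environment file directly, which can be handy when an instance is rebuilt often. The sketch below is a minimal example, not the official setup procedure. It assumes the iRODS 4.x key names (irods_host, irods_port, irods_user_name, irods_zone_name) and the default file location ~/.irods/irods_environment.json. write_irods_env is a hypothetical helper, and iinit is still needed afterwards to enter the password.

```shell
# Hypothetical helper: write a minimal iRODS environment file.
# Usage: write_irods_env <directory> <cyverse-username>
# Assumption: key names follow the iRODS 4.x environment file format.
write_irods_env() {
    env_dir=$1
    user_name=$2
    mkdir -p "$env_dir"
    cat > "$env_dir/irods_environment.json" <<EOF
{
    "irods_host": "data.cyverse.org",
    "irods_port": 1247,
    "irods_user_name": "$user_name",
    "irods_zone_name": "iplant"
}
EOF
}

# Demonstration in a scratch directory so an existing file is not overwritten.
# For real use, pass "$HOME/.irods" (assumed default location), then run iinit.
scratch_dir=$(mktemp -d)
write_irods_env "$scratch_dir" "user_name"
cat "$scratch_dir/irods_environment.json"
```

Replace user_name with your CyVerse username, as in Step 2.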

3. You should now be authenticated to the Data Store.

To test, type ils

If nothing is echoed back, repeat Step 2.
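The connection test can be wrapped in a small function that gives a clearer message than an empty terminal. This is only a sketch: check_data_store is a hypothetical name, and it assumes that ils exits with a non-zero status when no valid session exists.

```shell
# Hypothetical helper: report whether iCommands is installed and
# whether ils can reach the Data Store.
check_data_store() {
    if ! command -v ils >/dev/null 2>&1; then
        echo "iCommands not found on PATH"
        return 2
    fi
    # Redirect stdin so ils cannot stop and wait for a password prompt.
    if ils </dev/null >/dev/null 2>&1; then
        echo "Connected to the Data Store"
    else
        echo "Not connected: run iinit and re-enter the values from Step 2"
        return 1
    fi
}

check_data_store || true
```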

Uploading with iCommands

4. Type in ils

rstudio@a4bdcc31:~$ ils

/iplant/home/username:
C- /iplant/home/username/analyses
C- /iplant/home/username/NEON_Downloads

You should now see the contents of your personal Data Store.

5. Upload a single file to the Data Store using iput

You need to select the file you want to copy, and the location in the Data Store you want to copy it to.

iput -KPvf /home/rstudio/neon-shiny-browser/background.R /iplant/home/username/NEON_Downloads/

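This command copies the single file background.R from the container to the Data Store folder /iplant/home/username/NEON_Downloads/

The flags are described in the help file (iput -h): K verifies the checksum after the transfer, P prints progress, v is verbose, and f forces an overwrite if the file already exists.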

6. Upload a folder with recursive sub-folders and files

Next, we want to upload an entire directory with many folders and files in it.

iput -KPbrvf /home/rstudio/NEON_Downloads/NEON_HARV_DP1.30003.001_2019 /iplant/home/<your-user-name>/NEON_Downloads/

I have added the flags b for bulk, and r for recursive to the iput command. This will upload the entire directory NEON_HARV_DP1.30003.001_2019 to the data store.
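The choice between the two flag sets can be sketched as a small helper that prints the iput command for review before you run it. iput_command is a hypothetical name, and the scratch paths below exist only for the demonstration. The printed command is not quoted, so paths containing spaces would need quoting by hand.

```shell
# Hypothetical helper: print the iput command for a local path.
# Directories get the extra b (bulk) and r (recursive) flags.
iput_command() {
    local_path=$1
    remote_dir=$2
    if [ -d "$local_path" ]; then
        flags="-KPbrvf"
    else
        flags="-KPvf"
    fi
    printf 'iput %s %s %s\n' "$flags" "$local_path" "$remote_dir"
}

# Demonstration with scratch paths; <your-user-name> stays a placeholder.
demo_dir=$(mktemp -d)
touch "$demo_dir/background.R"
iput_command "$demo_dir/background.R" "/iplant/home/<your-user-name>/NEON_Downloads/"
iput_command "$demo_dir" "/iplant/home/<your-user-name>/NEON_Downloads/"
```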

7. The P flag for progress and the v flag for verbose echo the progress of the upload until it completes.

When it is complete, the terminal should be available again.

To test whether your files are now in CyVerse try:

ils /iplant/home/<your-user-name>/NEON_Downloads/

# and then

ils /iplant/home/<your-user-name>/NEON_Downloads/NEON_HARV_DP1.30003.001_2019

You should be able to see the contents of your directory in the Data Store
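Beyond looking at the listing, you can compare the number of files in the Data Store with the number on the local disk. The sketch below counts files in the output of ils -r. It assumes the usual listing layout, where folder headers start at the left margin and end with a colon, sub-folders appear as indented "C- " lines, and files appear as indented names. count_ils_objects is a hypothetical helper, and the file names in the sample listing are made up for the demonstration.

```shell
# Hypothetical helper: count files in `ils -r` output read from stdin.
# Assumption: files are listed as "  <name>" and sub-folders as "  C- <path>".
count_ils_objects() {
    grep -v '^  C- ' | grep -c '^  '
}

# Made-up sample listing, used only to demonstrate the counting.
sample_listing='/iplant/home/username/NEON_Downloads:
  benchmarking.rmd
  C- /iplant/home/username/NEON_Downloads/NEON_HARV_DP1.30003.001_2019
/iplant/home/username/NEON_Downloads/NEON_HARV_DP1.30003.001_2019:
  tile_1.laz
  tile_2.laz'
printf '%s\n' "$sample_listing" | count_ils_objects   # prints 3

# Real use (requires an active iinit session):
#   ils -r /iplant/home/<your-user-name>/NEON_Downloads/NEON_HARV_DP1.30003.001_2019 | count_ils_objects
#   find /home/rstudio/NEON_Downloads/NEON_HARV_DP1.30003.001_2019 -type f | wc -l
```

If the two counts differ, re-running the iput command from Step 6 is a reasonable first step.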

8. These files are now in your private user space. No one else can see them. To share them, modify their permissions directly in the Discovery Environment, as shown in Step 1, or use the following command:

ichmod

Follow the instructions in the help menu (ichmod -h) to set the user privileges and ownership.

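This example gives the anonymous user read access to your data directory, which makes it public on the internet as a read-only archive:

ichmod read anonymous /iplant/home/<your-user-name>/NEON_Downloads/

As written, the command changes only the directory itself. ichmod also accepts the -r flag to apply the permission recursively to the files and folders inside it.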

Downloading with iCommands

You will also likely need to download data from the Data Store into your running Apps.

9. Use the ils command to look for some shared data in the Data Store

ils /iplant/home/username/NEON_Downloads

10. Download a file using iget

iget -KPvf /iplant/home/username/NEON_Downloads/benchmarking.rmd

This should download an Rmd file into your local instance, in the current working directory of your terminal.

11. Download a directory using iget

time iget -KPbvrf /iplant/home/username/NEON_Downloads/NEON_HARV_DP1.30003.001_2019/

Here we prefix the command with time, a shell command that reports how long the download takes.

Downloading with WebDAV

CyVerse Data Store also supports WebDAV, an HTTPS-based protocol, for read-only data downloads from the Data Store.

We can use wget or curl commands in the terminal to download files this way.
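For a single file, curl is enough. The sketch below builds the anonymous WebDAV URL by prefixing a Data Store path with https://data.cyverse.org/dav-anon, the same prefix used in the wget example in this section. dav_anon_url and dav_download are hypothetical helper names, and username is a placeholder, so the download call is left commented out.

```shell
# Hypothetical helper: map a Data Store path to its anonymous WebDAV URL.
dav_anon_url() {
    printf 'https://data.cyverse.org/dav-anon%s\n' "$1"
}

# Hypothetical helper: download one file into the current directory.
# -f fails on HTTP errors, -L follows redirects, -O keeps the remote name.
dav_download() {
    curl -fLO "$(dav_anon_url "$1")"
}

file_url=$(dav_anon_url "/iplant/home/username/NEON_Downloads/benchmarking.rmd")
echo "$file_url"

# Real use, once the path points at a publicly readable file:
#   dav_download "/iplant/home/username/NEON_Downloads/benchmarking.rmd"
```

The file must be readable by the anonymous user for this to work.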

12. Download a directory using wget

time wget -r -nH --cut-dirs=5 --no-parent -l8 --reject="index.html*" https://data.cyverse.org/dav-anon/iplant/home/username/NEON_Downloads/NEON_HARV_DP1.30003.001_2017/

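Again, we prefix the command with time to see how long the download takes.

The wget flags limit the download to the data and folders in the Data Store: -r downloads recursively, -nH skips creating a directory named after the host, --cut-dirs=5 drops the five leading path components (dav-anon/iplant/home/username/NEON_Downloads), --no-parent keeps wget from ascending to the parent directory, -l8 limits the recursion depth to eight levels, and --reject="index.html*" discards the generated index pages.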
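The value passed to --cut-dirs depends on how deep the target folder sits in the URL, so it changes if you download from a different location. A minimal sketch for working it out, assuming the URL points at a folder and using the hypothetical helper name cut_dirs_for:

```shell
# Hypothetical helper: count the directories above the target folder in a
# URL. This is the number to pass to --cut-dirs so that the local copy
# starts at the target folder.
cut_dirs_for() {
    url_path=${1#*://*/}    # drop the scheme and host
    url_path=${url_path%/}  # drop a trailing slash, if any
    # Each remaining slash separates one parent directory from the next level.
    printf '%s\n' "$url_path" | tr -cd '/' | wc -c | tr -d ' '
}

cut_dirs_for "https://data.cyverse.org/dav-anon/iplant/home/username/NEON_Downloads/NEON_HARV_DP1.30003.001_2017/"   # prints 5
```

The result matches the --cut-dirs=5 used in Step 12: dav-anon, iplant, home, username, and NEON_Downloads are removed from the local path.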

Other Services: Downloading with S3

Many organizations are hosting data on Amazon Web Services S3, Google Cloud Storage, or Microsoft Azure.

Cloud buckets, like S3, use HTTPS protocols, just like WebDAV.

OpenTopography.org (re)hosts some NEON lidar data, e.g. NEON D17 Pacific Southwest - California.

We can download these using their Point Cloud Bulk Data Download option:

aws s3 cp s3://pc-bulk/NEON_D17/ . --recursive --endpoint-url https://opentopography.s3.sdsc.edu --no-sign-request
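Because a recursive copy of a lidar collection can be very large, it may help to inspect the bucket before downloading. The sketch below reuses the bucket path and endpoint from the command above. preview_bucket and dry_run_copy are hypothetical helper names, and neither function is called automatically because both need the aws CLI and network access.

```shell
# Values taken from the aws s3 cp command above.
BUCKET_PATH="s3://pc-bulk/NEON_D17/"
ENDPOINT_URL="https://opentopography.s3.sdsc.edu"

# Hypothetical helper: list the bucket contents and total size
# without downloading anything.
preview_bucket() {
    aws s3 ls "$BUCKET_PATH" --endpoint-url "$ENDPOINT_URL" --no-sign-request \
        --recursive --human-readable --summarize
}

# Hypothetical helper: show what `aws s3 cp` would copy, without
# transferring any data.
dry_run_copy() {
    aws s3 cp "$BUCKET_PATH" . --recursive --endpoint-url "$ENDPOINT_URL" \
        --no-sign-request --dryrun
}

if command -v aws >/dev/null 2>&1; then
    echo "aws CLI found: call preview_bucket or dry_run_copy"
else
    echo "aws CLI not found: install it before running these helpers"
fi
```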

