
Frequently Asked Questions#

Access#

Can I Access Elm from Sherlock and/or SCG?#

Yes. Although Elm is not mounted on Sherlock or SCG directly, you can use the S3 API to access Elm with many of the same tools you'd use to interact with common cloud vendor storage, such as rclone, aws-cli, and s5cmd. All of these systems, including Elm, are managed by Stanford Research Computing and connected to the Stanford Research Network at 100Gb/s. See our Elm User Guide page for more information.

Is Elm cold storage accessible through existing gateway access?#

No. The Elm core system can be accessed only via the S3 API and the MinIO console. The Elm team focuses on those two methods of access, especially the S3 API. In addition, Stanford Research Computing maintains various Globus connectors (AWS, Google Cloud) and recently added one for Elm. No other access methods or gateways to Elm are provided, but many tools support S3 that you can use, like rclone.

I'm the administrator for my Elm workgroup; how does Elm handle access control and file sharing?#

Stanford Workgroup Manager provides you with three Stanford Workgroups that let you manage varying levels of access to your bucket. These are created automatically based on your bucket name and risk classification. Workgroups define who can read, write, or upload data to the bucket. For example, a bucket called examplebucket in Moderate-Risk will generate:

  • elm:p-campus_examplebucket

    • Read-only membership: Members can view objects but cannot create or modify them.
  • elm:p-campus_examplebucket_uploader

    • Read and limited write membership: Members can view and upload objects, but cannot delete them.
  • elm:p-campus_examplebucket_editor

    • Full read-write membership: Members can view, upload, and delete objects.

Note

High-Risk buckets use the elm-hr: prefix instead of elm: and use campus.elm-hr.stanford.edu for the URL.
The bucket’s workgroups become elm-hr:p-campus_<bucket>, etc.

Workgroup administrators can add or remove people in these workgroups via Stanford Workgroup Manager, ensuring your team has the correct level of access.

What if I want more granular control over who accesses the data?#

On Elm, each bucket comes with corresponding workgroups for pre-defined access control based on common roles. Please see the additional admin section above. If finer-grained access control is required for the data, it may be useful to create separate buckets, each with its own specific workgroups.

Moving Data#

Can I access Elm from my desktop/laptop?#

Yes, please see the Elm User Guide page.

Can I mount Elm on my own computer at Stanford?#

Yes. Tools like rclone allow you to mount S3 storage targets like Elm.

How does data move through Elm?#

Though Elm presents itself as a unified file system, there are many components behind the scenes that facilitate the journey your data takes when it is sent to a bucket. When data is initially received, it is directed to a resilient disk tier based on the same trusted technology as our Oak service. This ensures that initial uploads and reads are relatively quick — at least, for an archival storage platform! Typically, data deposited into Elm will remain on this disk tier for several days before being offloaded to its permanent home in Elm's LTO tape library. Once your data is at rest on tape, it will take significantly longer to retrieve.

Are there any data limitations I should know about?#

Some limits to consider with Elm:

  • Object limits: 2,000 objects (files) per TiB of quota purchased, though we recommend staying closer to 1,000 objects per TiB, which works better in practice.
  • Upload limits: 5 TiB object size limit. This limitation is inherent to the S3 protocol utilized by Elm.
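These limits can be sanity-checked with simple arithmetic before an upload. The sketch below uses only the numbers stated above; the function and message names are illustrative, not part of any Elm tooling.

```python
# Back-of-the-envelope check against Elm's documented limits.
# Helper names here are illustrative only, not part of any Elm tool.

TIB = 1024 ** 4

MAX_OBJECTS_PER_TIB = 2_000   # hard limit per TiB of quota
RECOMMENDED_PER_TIB = 1_000   # recommended practical ceiling
MAX_OBJECT_SIZE = 5 * TIB     # S3 single-object size limit

def check_dataset(quota_tib: int, num_files: int, largest_file_bytes: int) -> list[str]:
    """Return a list of warnings for a planned upload."""
    warnings = []
    if num_files > quota_tib * MAX_OBJECTS_PER_TIB:
        warnings.append("over the hard object limit: pack files first")
    elif num_files > quota_tib * RECOMMENDED_PER_TIB:
        warnings.append("over the recommended object count: consider packing")
    if largest_file_bytes > MAX_OBJECT_SIZE:
        warnings.append("object exceeds the 5 TiB S3 size limit: split it")
    return warnings

# A 50 TiB quota holding 70,000 files is under the 100,000-object hard
# limit but over the 50,000-object recommendation.
print(check_dataset(50, 70_000, 10 * 1024**3))
```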

Should You Pack Your Files Before Transferring to Elm?#

Don't pack; transfer the files directly: If you have relatively few, larger files (fewer than 1,000 files per terabyte), you can use Globus or an S3 client like rclone to transfer your files directly to Elm.

Do pack first: If you have many small files (more than 1,000 files per terabyte), use elm_archive on Sherlock or Oak to help you pack and transfer your data.

Why does this matter?#

Transferring thousands of tiny files one-by-one is slow and inefficient. Packing them into larger bundles first will make the transfer much faster, and restores will be more efficient.

Important tip about packing:

Pack files in groups you'd likely want to restore together later. For example:

  • Pack each user's data in separate folders
  • Pack each project separately
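The grouping advice above can be sketched with Python's standard tarfile module: one archive per top-level folder (per user, or per project), so each bundle can later be restored on its own. This is a minimal illustration only, not a replacement for elm_archive, and the function name is invented for this example.

```python
import tarfile
from pathlib import Path

def pack_per_subfolder(src: Path, dest: Path) -> list[Path]:
    """Create one .tar archive per immediate subfolder of src.

    Each archive corresponds to a unit you might restore by itself
    (one user's folder, one project, ...). Minimal sketch only;
    elm_archive handles packing, chunking, and uploading in practice.
    """
    dest.mkdir(parents=True, exist_ok=True)
    archives = []
    for sub in sorted(p for p in src.iterdir() if p.is_dir()):
        out = dest / f"{sub.name}.tar"
        with tarfile.open(out, "w") as tar:
            tar.add(sub, arcname=sub.name)
        archives.append(out)
    return archives
```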

Not sure how big your data is?#

Run the ncdu command on Sherlock or Oak. It will scan all your files and report the total disk usage, apparent size, and number of objects/files.

Example:

$ ml system ncdu

$ ncdu -t20 /oak/stanford/groups/bprogers/vnwong

  • Total disk usage: 500 GiB Apparent size: 122.0 MiB Items: 2,304

    • With total disk size of 500 GiB and 2,304 files, I would use elm_archive to transfer my data since the number of files is over 1,000.
  • Total disk usage: 500 GiB Apparent size: 122.0 MiB Items: 30

    • With total disk size of 500 GiB and 30 files, I would use Globus to transfer directly to Elm.
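If ncdu is unavailable, the same numbers can be gathered with a few lines of Python. The 1,000-files-per-TiB threshold below is the rule of thumb from this FAQ; the function names are illustrative only.

```python
import os

# Rule of thumb from this FAQ; function names are illustrative only.
FILES_PER_TIB_THRESHOLD = 1_000
TIB = 1024 ** 4

def scan(path: str) -> tuple[int, int]:
    """Walk a directory tree and return (total_bytes, file_count)."""
    total, count = 0, 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):  # skip broken symlinks
                total += os.path.getsize(full)
                count += 1
    return total, count

def recommend(total_bytes: int, file_count: int) -> str:
    """Pack first, or transfer directly, per the guidance above."""
    files_per_tib = file_count / max(total_bytes / TIB, 1e-9)
    if files_per_tib > FILES_PER_TIB_THRESHOLD:
        return "pack with elm_archive first"
    return "transfer directly (Globus or an S3 client)"
```

For the first ncdu example above (500 GiB, 2,304 files), `recommend` returns the packing suggestion; for the second (500 GiB, 30 files), it suggests a direct transfer.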

What is the data retrieval time from Elm?#

Files uploaded to Elm are guaranteed to remain accessible for at least one month on a storage disk tier, allowing time for data management and verification. After this period, some or all files will only be retrievable from Elm's tape backend, resulting in longer access times. Retrieval can take a few minutes to a few hours for single files, depending on system load and user demand, as Elm automatically loads the relevant tape to read the data back to disk. Restoring large datasets, such as directories with hundreds of files or terabytes of data, could take many hours or even days. Users needing mass restores should contact us at srcc-support@stanford.edu for assistance.

What's involved with data retrieval on Elm?#

For the sake of simplicity, let's say you stored a single 5 GiB file on Elm and need to retrieve it a year later. When you initiate the download, Elm will:

  • Locate which tape (or tapes) the file resides on
  • Send a robot to retrieve the tape from its physical storage location
  • Load the tape into a tape drive to read the data
  • Verify that the data is complete and matches what you originally uploaded
  • Copy your 5 GiB file to the faster disk tier to cache it in case you need to download it again soon
  • Notify the program you're using that the file is ready for download

From your perspective, it will still appear as a single filesystem, but it’s helpful to know what’s happening behind the scenes. While this technology allows us to offer very low-cost archival storage, you can probably already infer that restoring thousands or even millions of files can take a very long time due to the mechanical operations inherent to tape storage and retrieval. This is exactly why we encourage storing data on Elm using fewer large files, rather than simply copying an entire existing folder structure. The extra effort to prepare your data up front will pay dividends if and when you eventually need to retrieve it.

Why do I have to use a 5 GiB chunk size?#

Please see Minimum Chunk Size for details.
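To get a feel for why chunk size matters, here is the multipart arithmetic, assuming a 5 GiB part size and the standard S3 ceiling of 10,000 parts per multipart upload. The function name is illustrative, not part of any Elm tool.

```python
import math

GIB = 1024 ** 3
TIB = 1024 ** 4

PART_SIZE = 5 * GIB      # assumed chunk (part) size
S3_MAX_PARTS = 10_000    # standard S3 multipart-upload ceiling

def parts_needed(object_bytes: int) -> int:
    """Number of multipart chunks an object of this size would need."""
    n = math.ceil(object_bytes / PART_SIZE)
    if n > S3_MAX_PARTS:
        raise ValueError("object too large for this part size")
    return n

# A 5 TiB object (Elm's maximum) splits into 1,024 parts of 5 GiB each,
# comfortably under the 10,000-part limit.
print(parts_needed(5 * TIB))
```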

I'm using elm_archive, should I change the label name each time I run an upload?#

When using elm_archive, the name of the tar files uploaded to Elm is derived from the label provided by the --label option. In most cases, we recommend allowing elm_archive to automatically select the label from the source path rather than specifying one manually. If your source names are similar, it's crucial to choose the label carefully, as renaming objects in Elm after they are uploaded requires creating a full copy. However, if you make a mistake with the label, rest assured that your data won’t be overwritten since Elm buckets on campus instances are versioned.

Can I use the same directory target each time I run elm_archive?#

You can use the same destination argument each time, assuming the label is changing. All tar files will be available in the target directory.

Backup#

Is Elm backed up?#

No, Elm is not backed up. While Elm has redundancy features like erasure coding, it’s still located on campus. This means Elm should be considered a single copy of your data. If there’s a big disaster (think earthquakes), there’s a chance you might lose access. For extra-important data, it’s a good idea to keep a second copy on another system.

Performance#

How fast are reads and writes?#

Since Elm is designed for archiving data, it performs fastest when you are adding data to your bucket. Because retrieving data from Elm is expected to be infrequent, retrieving data will take longer. As with Oak, Elm operates as a shared storage system, and its performance may be impacted by concurrent activity from other users.

Performance will be dependent on many factors, including current Elm load, but we have seen upload rates of ~500MB/s to Elm via Globus when connected to the Stanford Research Network using a high performance link (10Gb/s or more). This is probably an upper transfer rate limit for Elm, and actual performance may be lower or may fluctuate depending on load and other factors.

In our early access phase, we've observed transfer speeds between Oak and Elm (using official Globus endpoints) ranging from 250-500MB/s with file sizes averaging ~100MB. Transfers of many small files into Elm can be significantly slower, so we highly recommend preparing data that will be sent to Elm with tools like tar or elm_archive. For optimal performance, we recommend file sizes averaging 100MB, but the larger, the better.

Please note that file permission issues may arise with elm_archive. To resolve this, send an email to srcc-support@stanford.edu with approval from the PI granting read permission for all files to the specific user running elm_archive.

Billing#

How is Elm storage billed?#

Unlike other cloud storage providers that charge you based on how much you use, Elm charges a flat fee based on the size of your bucket. For instance, similar to Oak storage, if you buy 50TB of Elm storage but only use 30TB, you'll still pay for the full 50TB. You are always welcome to reduce or increase your existing Elm storage by submitting a ServiceNow ticket.

Quotas#

How are Elm quotas measured?#

Elm quotas are measured in allocations of 1 TiB. Your quota is a hard limit that you set before you upload data to Elm. This ensures you never get a surprise storage bill and unlike cloud competitors, ongoing costs can be predictably managed and forecasted.

Directory#

Can I create a directory in S3?#

There is no concept of a directory in S3, just object keys and key prefixes. An S3 key may have a '/' character in it that serves as a useful analog to a directory path: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-prefixes.html

Under the hood, MinIO is backed by a real filesystem so it is in fact capable of creating empty directories, but there's no guarantee empty directories will be visible at the S3 layer: https://blog.min.io/prefix-vs-folder/
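The prefix-not-directory model is easy to demonstrate locally. The keys below are invented for illustration; the function simply mimics the string matching an S3 ListObjectsV2 prefix filter performs.

```python
# S3 has no directories, only keys; a '/' in a key is just a character
# that clients conventionally treat as a path separator. These keys are
# invented for illustration.
keys = [
    "project1/raw/sample001.fastq.gz",
    "project1/raw/sample002.fastq.gz",
    "project1/results/summary.csv",
    "project2/notes.txt",
]

def list_by_prefix(keys: list[str], prefix: str) -> list[str]:
    """Mimic an S3 prefix filter: it is plain string matching."""
    return [k for k in keys if k.startswith(prefix)]

print(list_by_prefix(keys, "project1/raw/"))
# There is no "project1/" object itself: the "folder" appears to exist
# only because some key happens to start with that string.
```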

News & Alerts#

How can I stay up-to-date on Elm?#

We highly recommend that all Elm users join the SRCC Slack workspace (SUNet ID Required) and follow the #elm-announce channel to stay abreast of system updates, maintenance, or outages. #elm-users is best for sharing best practices and asking questions among a welcoming community of other Elm users.

Email notification for system maintenance and issues#

Elm administrators maintain a mailing list called Elm-announce. This very low-traffic list is used for official announcements such as planned maintenance. Elm users are automatically subscribed to this list by default. For more information, please see the following page:

Elm-announce List Information