How to use BaseSpace CLI to upload FASTQs to BaseSpace Sequence Hub

03/23/22


BaseSpace Sequence Hub (BSSH) is the Illumina cloud-based platform for data management, storage, and analysis. In addition to instrument run data, locally generated sample data in the form of demultiplexed FASTQs that meet the file upload requirements can be imported into a project for use as input for data analysis applications. The BSSH web importer allows single-sample uploads with a maximum size of 250 GB and 16 files per upload. To upload multiple samples or larger files, the BaseSpace CLI tool is required; it communicates directly with the BSSH API. Note that using BaseSpace CLI requires familiarity with operating in a command line environment.

BaseSpace Command Line Interface (CLI)

BaseSpace CLI is available for Linux, Windows, and Mac OS X and uploads data directly to an existing project from the command line. The project ID number is required. It can be obtained either from the URL of the project in the BSSH web user interface (such as https://basespace.illumina.com/projects/53489437, where 53489437 is the project ID number) or by generating a list of projects with BaseSpace CLI using the command:

bs list projects
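If the project URL is already at hand, the ID can be pulled out of it without opening a browser or listing projects. The following is a minimal sketch in POSIX shell; the function name project_id_from_url is hypothetical and is not part of BaseSpace CLI.

```shell
# Hypothetical helper (not part of BaseSpace CLI): print the numeric
# project ID from a BSSH project URL, or nothing if the URL has none.
project_id_from_url() {
  # Keep only the digits that follow "/projects/".
  printf '%s\n' "$1" | sed -n 's#^.*/projects/\([0-9][0-9]*\).*$#\1#p'
}

# Example with the URL used in this article.
project_id_from_url "https://basespace.illumina.com/projects/53489437"
# prints 53489437
```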

Installation note: If your system does not have the wget download tool installed, the bs executable can be downloaded manually using the direct download link for the appropriate platform: Linux, Mac, or Windows.

For all platforms, the basic FASTQ upload command is:

bs dataset upload -p {ProjectIDNumber} --recursive {PathToFiles}

The path to the files can contain multiple folders. When uploading from within a folder that contains all of the FASTQs to be uploaded, use a period (.) as the path, which means “this folder”. An example command is:

bs dataset upload -p 53489437 --recursive .
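Before starting a large upload, it can help to confirm that the target folder actually contains FASTQs and to review the exact command first. The sketch below only prints the upload command and does not run it; the function name print_upload_command is hypothetical, and the check for the .fastq.gz extension is an assumption about how the files are named.

```shell
# Hypothetical helper (not part of BaseSpace CLI): print, but do not run,
# the upload command for a folder after confirming it holds FASTQ files.
print_upload_command() {
  project_id=$1
  fastq_dir=${2:-.}   # default to "this folder"

  # Count .fastq.gz files in the folder and all of its subfolders.
  count=$(find "$fastq_dir" -type f -name '*.fastq.gz' | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "No .fastq.gz files found under $fastq_dir" >&2
    return 1
  fi

  # Quote the path so folders with spaces survive copy and paste.
  printf 'bs dataset upload -p %s --recursive "%s"\n' "$project_id" "$fastq_dir"
}

# Usage (assumed project ID from this article):
#   print_upload_command 53489437 .
```

Once the printed command looks right, it can be copied and run as-is.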

Notes:

  • Merged lane FASTQs produced by bcl2fastq using the --no-lane-splitting command line option are not immediately compatible for upload. To upload merged lane files, change the file name to include a lane number so that the file matches the format of SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz and add the --allow-invalid-readnames option to the CLI upload command. Merged lane files cannot be uploaded with the BaseSpace Sequence Hub web importer.
  • If you receive an error about files not matching the Illumina naming convention, make sure that the file name is formatted exactly as in the format shown above. Extra underscores in the SampleName will cause an upload failure because underscores are the delimiter for the rest of the file name.
  • CLI upload defaults to FASTQ as the file type. Other file types such as BAMs, VCFs, and BEDs can also be uploaded with CLI by adding to the command line:  --type common.files
    Note that FASTQs cannot be uploaded with the common.files option, so they must be uploaded separately from other file types.
  • Other special upload scenarios are discussed on the CLI Examples page.
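The file naming notes above can be checked before an upload is attempted. The following sketch assumes that the format SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz looks like Sample1_S1_L001_R1_001.fastq.gz in practice, and that merged lane files look like Sample1_S1_R1_001.fastq.gz; both function names are hypothetical, the pattern covers only R1 and R2 read files, and L001 is an arbitrary lane number chosen for illustration.

```shell
# Assumed pattern: SampleName_S<number>_L<3 digits>_R<1 or 2>_001.fastq.gz,
# with no underscores inside SampleName. Returns 0 when the name matches.
is_illumina_fastq_name() {
  printf '%s\n' "$1" | grep -Eq '^[^_]+_S[0-9]+_L[0-9]{3}_R[12]_001\.fastq\.gz$'
}

# Hypothetical helper: suggest a lane-numbered name for a merged lane file.
# Names that do not look like merged lane files are printed unchanged.
add_lane_number() {
  printf '%s\n' "$1" |
    sed 's/^\([^_][^_]*_S[0-9][0-9]*\)_\(R[12]_001\.fastq\.gz\)$/\1_L001_\2/'
}

# Report on a few example names (no files are touched).
for name in Sample1_S1_L001_R1_001.fastq.gz My_Sample_S1_L001_R1_001.fastq.gz; do
  if is_illumina_fastq_name "$name"; then
    echo "ok: $name"
  else
    echo "check name: $name"
  fi
done

add_lane_number "Sample1_S1_R1_001.fastq.gz"
# prints Sample1_S1_L001_R1_001.fastq.gz
```

A merged lane file renamed this way still needs the --allow-invalid-readnames option on the upload command, as described in the first note.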