Data Submission Flows

There are two ways to submit datasets through the Dataset Exchange API: one for users who already have a dataset available at a pre-signed URL, and one for users who need to request a pre-signed URL where they can load a dataset.

These flows are available after you've completed Authentication to the Dataset Exchange API.

Use case 1: You have a dataset (JSON or CSV) available at a pre-signed URL.

If your dataset (formatted as either JSON or CSV) is already available via a pre-signed URL*, you can load it directly into the Dataset Exchange API:

  1. Hit POST /datasets to create the new dataset in our database and define the schema.
  2. Once you've received an id for your newly created dataset (in the response body of step 1), you'll hit POST /datasets/:id/records:load with your pre-signed URL in the body of the request.
  3. You're done!

*Pre-signed URLs can come from AWS S3 or Google Cloud Storage (GCS).
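
The two calls in use case 1 can be sketched in Python as follows. This is a minimal illustration rather than official client code, and several details are assumptions not stated in this document: the base URL `https://api.example.com`, bearer-token authentication, the `"url"` field name in the load request body, and the dataset id being returned under an `"id"` key. Check the endpoint reference for the real names.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one


def build_post(path, payload, token):
    """Build an authenticated JSON POST request (bearer auth is an assumption)."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_json(request):
    """Send a request and decode the JSON response body (empty body -> {})."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else {}


def load_from_presigned_url(schema, presigned_url, token, sender=send_json):
    """Step 1: create the dataset. Step 2: point the API at the pre-signed URL."""
    created = sender(build_post("/datasets", schema, token))
    dataset_id = created["id"]  # assumed response field name
    # The "url" key below is a hypothetical body field name.
    return sender(
        build_post(f"/datasets/{dataset_id}/records:load", {"url": presigned_url}, token)
    )
```

The `sender` parameter exists only so the flow can be exercised without network access; in normal use you would call `load_from_presigned_url(schema, presigned_url, token)` and let it use `send_json`.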

Use case 2: You do NOT already have a dataset (JSON or CSV) available at a pre-signed URL.

If you need somewhere to load your dataset, you can request a pre-signed upload URL from the Dataset Exchange API.

To generate a pre-signed URL and load a dataset, you'll need to:

  1. Hit POST /datasets to create the new dataset in our database and define the schema.
  2. Once you've received an id for your newly created dataset (in the response body of step 1), you'll hit POST /datasets/:id/uploadUrl with a string specifying the content type of your dataset (either CSV or JSON). The /uploadUrl endpoint returns a pre-signed URL to a GCS bucket where you can load your dataset. Each pre-signed URL is valid for only 1 hour, so complete step 3 within that window.
  3. You'll load your dataset to the pre-signed URL you received in step 2.
  4. You're done!
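
The steps in use case 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not official client code. The base URL `https://api.example.com`, bearer-token authentication, the `"contentType"` request field, the `"url"` response field, and the use of HTTP PUT for the upload are all assumptions that this document does not confirm. PUT is a common choice for pre-signed GCS upload URLs, but verify it against the endpoint reference.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one
MIME_TYPES = {"CSV": "text/csv", "JSON": "application/json"}


def build_post(path, payload, token):
    """Build an authenticated JSON POST request (bearer auth is an assumption)."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_upload(presigned_url, data, content_type):
    """Build the upload request; PUT is assumed, not confirmed by the docs."""
    return urllib.request.Request(
        url=presigned_url,
        data=data,
        headers={"Content-Type": MIME_TYPES[content_type]},
        method="PUT",
    )


def send_json(request):
    """Send a request and decode the JSON response body (empty body -> {})."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else {}


def upload_dataset(schema, data, content_type, token, sender=send_json):
    """Create a dataset, request an upload URL, then upload the bytes to it."""
    if content_type not in MIME_TYPES:
        raise ValueError("content_type must be 'CSV' or 'JSON'")
    # Step 1: create the dataset and read back its id (assumed field name).
    dataset_id = sender(build_post("/datasets", schema, token))["id"]
    # Step 2: request a pre-signed URL ("contentType" and "url" are hypothetical).
    grant = sender(
        build_post(f"/datasets/{dataset_id}/uploadUrl", {"contentType": content_type}, token)
    )
    # Step 3: upload promptly, since the pre-signed URL expires after 1 hour.
    sender(build_upload(grant["url"], data, content_type))
    return dataset_id
```

A call might look like `upload_dataset(schema, open("data.csv", "rb").read(), "CSV", token)`. The `sender` parameter is there only so the sequence of requests can be checked without touching the network.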