Data Submission Flows
There are two options for submitting datasets through the Dataset Exchange API: one for users who already have a pre-signed URL, and one for users who want to request a pre-signed URL to which they can load a dataset.
These flows are available after you've completed Authentication to the Dataset Exchange API.
Use case 1: You have a dataset (JSON or CSV) available at a pre-signed URL.
If you already have your dataset (formatted as either JSON or CSV) available via a pre-signed URL*, you can load it directly to the Dataset Exchange API:
- Call POST /datasets to create the new dataset in our database and define its schema.
- Once you've received an id for your newly created dataset (in the response body of the POST /datasets call), call POST /datasets/:id/records:load with your pre-signed URL in the body of the request.
- You're done!
*Pre-signed URLs can be from AWS S3 or GCP Cloud Storage.
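The two calls in use case 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the official client: the host in BASE_URL, the Bearer token header, and the body field names ("name", "schema", "url") are hypothetical placeholders, so check the API reference for the real ones. Only the endpoint paths come from the steps above.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; use your real API host


def _post(path, token, payload):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


def create_dataset_request(token, name, schema):
    # POST /datasets creates the dataset and defines its schema.
    # "name" and "schema" are assumed body field names.
    return _post("/datasets", token, {"name": name, "schema": schema})


def load_records_request(token, dataset_id, presigned_url):
    # POST /datasets/:id/records:load points the API at your pre-signed URL.
    # "url" is an assumed body field name.
    return _post(f"/datasets/{dataset_id}/records:load", token, {"url": presigned_url})


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Typical sequence (requires network access and valid credentials):
#   created = send(create_dataset_request(token, "my-dataset", schema))
#   send(load_records_request(token, created["id"], presigned_url))
```

Building the requests separately from sending them keeps each step easy to inspect or log before anything goes over the network.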
Use case 2: You do NOT already have a dataset (JSON or CSV) available at a pre-signed URL.
If you need to load your dataset to a pre-signed URL, you can request an upload URL from the Dataset Exchange API.
To generate a pre-signed URL and load a dataset, you'll need to:
- Call POST /datasets to create the new dataset in our database and define its schema.
- Once you've received an id for your newly created dataset (in the response body of the POST /datasets call), call POST /datasets/:id/uploadUrl with a string specifying the content type of your dataset. Content type options are CSV or JSON. The /uploadUrl endpoint will return a pre-signed URL to a GCS bucket where you can load your dataset. Pre-signed URLs are only valid for 1 hour.
- Load your dataset to the pre-signed URL you received from the /uploadUrl endpoint.
- You're done!
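The upload-URL request and the upload itself might look like the sketch below. Several details are assumptions rather than documented behavior: the host in BASE_URL, the Bearer token header, the "contentType" body field, and the use of HTTP PUT for the upload (PUT is common for GCS signed URLs, but the URL you receive may require a different method or headers). The create step, POST /datasets, is the same as in use case 1 and is omitted here.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; use your real API host

# The two content type options named in the steps, mapped to MIME types
# for the upload request (the mapping itself is an assumption).
CONTENT_TYPES = {"CSV": "text/csv", "JSON": "application/json"}


def upload_url_request(token, dataset_id, content_type):
    """Build the POST /datasets/:id/uploadUrl request (not sent here)."""
    if content_type not in CONTENT_TYPES:
        raise ValueError("content_type must be 'CSV' or 'JSON'")
    payload = {"contentType": content_type}  # assumed body field name
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets/{dataset_id}/uploadUrl",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


def upload_request(presigned_url, data, content_type):
    """Build the request that sends the dataset bytes to the pre-signed URL.

    Assumes the signed URL accepts an HTTP PUT with a matching Content-Type.
    """
    return urllib.request.Request(
        url=presigned_url,
        data=data,
        headers={"Content-Type": CONTENT_TYPES[content_type]},
        method="PUT",
    )


# Typical sequence (requires network access and valid credentials):
#   with urllib.request.urlopen(upload_url_request(token, dataset_id, "CSV")) as r:
#       body = json.loads(r.read())   # the field holding the URL is not specified here
#   urllib.request.urlopen(upload_request(url_from_body, csv_bytes, "CSV"))
```

Because the pre-signed URL expires after 1 hour, it is safest to request it immediately before uploading rather than caching it.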