Creating a dataset is a two-step process:
Make a request to create the dataset.
Make a request to upload data to the dataset.
After the initial data upload, you can upload new data to the dataset to create a new version of the dataset.
Create the dataset
Create a dataset by sending a POST request to the datasets endpoint:
https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets
Pass a JSON body to the request defining the dataset. You must:
Specify the displayName of the dataset. The value of displayName must be unique for all datasets.
Set usage to USAGE_DATA_DRIVEN_STYLING.
For example:
curl -X POST -d '{
    "displayName": "My Test Dataset",
    "usage": "USAGE_DATA_DRIVEN_STYLING"
  }' \
  -H 'X-Goog-User-Project: PROJECT_NUMBER_OR_ID' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  "https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets"
The response contains the ID of the dataset, in the form projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID, along with additional information. Use the dataset ID when making requests to update or modify the dataset.
{
  "name": "projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46",
  "displayName": "My Test Dataset",
  "usage": [
    "USAGE_DATA_DRIVEN_STYLING"
  ],
  "createTime": "2022-08-15T17:50:00.189682Z",
  "updateTime": "2022-08-15T17:50:00.189682Z"
}
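The same create request can also be assembled in code. The following is a minimal Python sketch that uses only the standard library. The helper names build_create_request and dataset_id_from_name are hypothetical, not part of any Google client library, and the sketch only constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://mapsplatformdatasets.googleapis.com/v1"


def build_create_request(project, display_name, token):
    """Build (but do not send) the POST request that creates a dataset.

    Hypothetical helper for illustration; it mirrors the curl example.
    """
    body = {
        "displayName": display_name,
        "usage": "USAGE_DATA_DRIVEN_STYLING",
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/projects/{project}/datasets",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Goog-User-Project": project,
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def dataset_id_from_name(name):
    """Return DATASET_ID from projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID."""
    return name.rsplit("/", 1)[-1]
```

To send the request, pass the returned object to urllib.request.urlopen and decode the JSON response. The name field of that response is the value to give to dataset_id_from_name.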
Upload data to the dataset
After you create the dataset, upload the data from Google Cloud Storage or from a local file to the dataset.
The upload operation is asynchronous. After you upload the data, the data is ingested and processed. That means you must make an HTTP GET request to monitor the state of the dataset to determine when the dataset is ready to use or if there were any errors. For more information, see Get data processing state.
Upload data from Cloud Storage
To upload data from Cloud Storage to your dataset, send a POST request to the datasets endpoint that also includes the ID of the dataset:
https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID:import
In the JSON request body:
Use inputUri to specify the file path to the resource containing the data in Cloud Storage. This path is in the form gs://GCS_BUCKET/FILE.
The user making the request requires the Storage Object Viewer role, or any other role that includes the storage.objects.get permission. For more information about managing access to Cloud Storage, see Overview of access control.
Use fileFormat to specify the file format of the data as one of: FILE_FORMAT_GEOJSON (GeoJSON file), FILE_FORMAT_KML (KML file), or FILE_FORMAT_CSV (CSV file).
For example:
curl -X POST -d '{
    "gcs_source": {
      "inputUri": "gs://my_bucket/my_csv_file",
      "fileFormat": "FILE_FORMAT_CSV"
    }
  }' \
  -H 'X-Goog-User-Project: PROJECT_NUMBER_OR_ID' \
  -H "content-type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  "https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46:import"
The response is in the form:
{ "name": "projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID@VERSION_NUMBER" }
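As an illustration of the request body and response described above, here is a small Python sketch. Both function names are hypothetical, and the validation it performs (the gs:// prefix and the three file formats) reflects only what this page states, not additional server-side rules.

```python
import json

ALLOWED_FORMATS = {"FILE_FORMAT_GEOJSON", "FILE_FORMAT_KML", "FILE_FORMAT_CSV"}


def build_gcs_import_body(input_uri, file_format):
    """Return the JSON body for a Cloud Storage import request.

    Hypothetical helper; it mirrors the body used in the curl example.
    """
    if not input_uri.startswith("gs://"):
        raise ValueError("inputUri must have the form gs://GCS_BUCKET/FILE")
    if file_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    return json.dumps(
        {"gcs_source": {"inputUri": input_uri, "fileFormat": file_format}}
    )


def split_versioned_name(name):
    """Split 'projects/.../datasets/DATASET_ID@VERSION_NUMBER' at the '@'."""
    dataset_name, _, version = name.partition("@")
    return dataset_name, version
```

Checking the arguments locally is a design choice made for this sketch so that obvious mistakes surface before the request is sent.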
Upload data from a file
To upload data from a file, send an HTTP POST request to the datasets endpoint that also includes the ID of the dataset:
https://mapsplatformdatasets.googleapis.com/upload/v1/projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID:import
The request contains:
The X-Goog-Upload-Protocol header, set to multipart.
The metadata property, specifying the path to a file that specifies the type of data to upload, as one of: FILE_FORMAT_GEOJSON (GeoJSON file), FILE_FORMAT_KML (KML file), or FILE_FORMAT_CSV (CSV file). The contents of this file have the following format:
{"local_file_source": {"file_format": "FILE_FORMAT_GEOJSON"}}
The rawdata property, specifying the path to the GeoJSON, KML, or CSV file containing the data to upload.
The following request uses the curl -F option to specify the path to the two files:
curl -X POST \
  -H 'X-Goog-User-Project: PROJECT_NUMBER_OR_ID' \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Goog-Upload-Protocol: multipart" \
  -F "metadata=@csv_metadata_file" \
  -F "rawdata=@csv_data_file" \
  "https://mapsplatformdatasets.googleapis.com/upload/v1/projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46:import"
The response is in the form:
{ "name": "projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID@VERSION_NUMBER" }
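The metadata file passed with -F "metadata=@..." can be generated rather than written by hand. The Python sketch below does this under one assumption that does not come from this page: that the data format can be inferred from the file extension (.geojson, .kml, or .csv). The function names are hypothetical.

```python
import json
from pathlib import Path

# Assumed mapping from file extension to format; adjust to your own file names.
EXTENSION_TO_FORMAT = {
    ".geojson": "FILE_FORMAT_GEOJSON",
    ".kml": "FILE_FORMAT_KML",
    ".csv": "FILE_FORMAT_CSV",
}


def metadata_json(data_path):
    """Return the metadata file contents for the given data file."""
    suffix = Path(data_path).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"cannot infer file format from extension {suffix!r}")
    return json.dumps(
        {"local_file_source": {"file_format": EXTENSION_TO_FORMAT[suffix]}}
    )


def write_metadata_file(data_path, metadata_path):
    """Write the metadata file and return the path it was written to."""
    path = Path(metadata_path)
    path.write_text(metadata_json(data_path), encoding="utf-8")
    return path
```

The file produced by write_metadata_file is the one to reference in the metadata form field, alongside the data file in rawdata.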
Get data processing state
The upload operation is asynchronous. That means after the API call to upload the data to the dataset returns, you must then poll the dataset to determine whether data ingestion and processing succeeded or failed.
To determine the state of the dataset, use Get a dataset. For example, while the data is being processed, the state is set to STATE_PROCESSING. When the dataset is ready to use in your app, the state is set to STATE_COMPLETED.
For example, make a GET call on the dataset:
curl -X GET \
  -H "X-Goog-User-Project: PROJECT_NUMBER_OR_ID" \
  -H "Authorization: Bearer $TOKEN" \
  "https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46"
For a successful upload, the state of the dataset is STATE_COMPLETED:
{
  "name": "projects/119757857/datasets/f57074a0-a8b6-403e-9df1-e9fc46",
  "displayName": "My Test Dataset",
  "description": " ",
  "versionId": "837c5a9e-c885-4a5f-a462-7e35673e5218",
  "usage": [
    "USAGE_DATA_DRIVEN_STYLING"
  ],
  "localFileSource": {
    "filename": "Parks_Properties_20240529.csv",
    "fileFormat": "FILE_FORMAT_CSV"
  },
  "createTime": "2024-05-30T16:41:11.130816Z",
  "updateTime": "2024-05-30T16:41:14.416130Z",
  "versionCreateTime": "2024-05-30T16:41:14.416130Z",
  "status": {
    "state": "STATE_COMPLETED"
  },
  "sizeBytes": "6916924",
  "downloadable": true
}
When data processing fails, state is set to a value other than STATE_COMPLETED, such as STATE_PUBLISHING_FAILED or any other state ending in the string _FAILED.
For example, suppose you upload data to a dataset and then make a GET request to get the dataset details. If processing failed, the response includes, along with the state property, a single errorMessage property containing a description of the error:
{
  "name": "projects/119757857/datasets/f57074a0-a8b6-403e-9df1-e9fc46",
  "displayName": "My Test Dataset",
  "description": " ",
  "versionId": "837c5a9e-c885-4a5f-a462-7e35673e5218",
  "usage": [
    "USAGE_DATA_DRIVEN_STYLING"
  ],
  "localFileSource": {
    "filename": "Parks_Properties_20240529.csv",
    "fileFormat": "FILE_FORMAT_CSV"
  },
  "createTime": "2024-05-30T16:41:11.130816Z",
  "updateTime": "2024-05-30T16:41:14.416130Z",
  "versionCreateTime": "2024-05-30T16:41:14.416130Z",
  "status": {
    "state": "STATE_PUBLISHING_FAILED",
    "errorMessage": "INVALID_ARGUMENT: Skipping row because address could not be geocoded: 5521 18 AVENUE (from line 79)"
  },
  "sizeBytes": "6916924",
  "downloadable": true
}
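The polling described in this section can be sketched as a small loop. In the Python sketch below, wait_for_dataset and is_failed are hypothetical names, and fetch_dataset is a caller-supplied function assumed to perform the GET request and return the decoded JSON response as a dict. The polling interval and the maximum number of polls are arbitrary illustrative defaults, not values recommended by the API.

```python
import time


def is_failed(state):
    """Return True for any state ending in _FAILED."""
    return state.endswith("_FAILED")


def wait_for_dataset(fetch_dataset, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the dataset reaches STATE_COMPLETED or a *_FAILED state.

    fetch_dataset is a caller-supplied callable (an assumption of this
    sketch) that returns the decoded JSON body of a GET on the dataset.
    """
    for _ in range(max_polls):
        dataset = fetch_dataset()
        status = dataset.get("status", {})
        state = status.get("state", "")
        if state == "STATE_COMPLETED":
            return dataset
        if is_failed(state):
            message = status.get("errorMessage", "no error message")
            raise RuntimeError(f"{state}: {message}")
        # Still processing (for example STATE_PROCESSING): wait and retry.
        sleep(poll_seconds)
    raise TimeoutError(f"dataset not ready after {max_polls} polls")
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular HTTP library and makes it straightforward to exercise with canned responses.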
Get data processing errors
When data ingestion and processing fail, the errorMessage property contains a single message describing the error. However, a single error message doesn't necessarily provide sufficient information to identify and fix the problems.
To get complete error information, call the fetchDatasetErrors API. This API returns all data processing errors associated with a dataset:
curl -X GET \
  -H "X-Goog-User-Project: PROJECT_NUMBER_OR_ID" \
  -H "Authorization: Bearer $TOKEN" \
  "https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46:fetchDatasetErrors"
The response contains the errors array. This array contains up to 50 errors of type Status per call, and supports up to 500 errors in total:
{
  "nextPageToken": "cigKJkIkMTU3MzM0NjQtYzlmMy00YzYxLWIxM2YtYmVkYjFjYjRkYzRj",
  "errors": [
    {
      "code": 3,
      "message": "INVALID_ARGUMENT: No address was derived from fields 2. (from line 631)"
    },
    {
      "code": 3,
      "message": "INVALID_ARGUMENT: No address was derived from fields 2. (from line 457)"
    },
    {
      "code": 3,
      "message": "INVALID_ARGUMENT: No address was derived from fields 2. (from line 31)"
    },
    ...
  ]
}
If there are more than 50 errors, meaning more than one page of errors, then the response contains a page token in the nextPageToken field. Pass that value in the pageToken query parameter of a subsequent call to get the next page of errors. When nextPageToken is empty, there are no more pages.
For example, to get the next page of errors using the token from the previous response:
curl -X GET \
  -H "content-type: application/json" \
  -H "X-Goog-User-Project: PROJECT_NUMBER_OR_ID" \
  -H "Authorization: Bearer $TOKEN" \
  "https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46:fetchDatasetErrors?pageToken=cigKJkIkMTU3MzM0NjQtYzlmMy00YzYxLWIxM2YtYmVkYjFjYjRkYzRj"
By default, the response contains a maximum of 50 errors per page. Use the pageSize query parameter to control the page size.
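Following nextPageToken until it is empty can be expressed as a short loop. In this Python sketch, fetch_all_errors is a hypothetical name, and fetch_page is a caller-supplied function assumed to call fetchDatasetErrors with the given pageToken and pageSize and return the decoded JSON response as a dict.

```python
def fetch_all_errors(fetch_page, page_size=50):
    """Collect the errors from every page of fetchDatasetErrors results.

    fetch_page(page_token, page_size) is a caller-supplied callable (an
    assumption of this sketch). page_token is None for the first page.
    """
    errors = []
    page_token = None
    while True:
        page = fetch_page(page_token, page_size)
        errors.extend(page.get("errors", []))
        page_token = page.get("nextPageToken")
        if not page_token:
            # A missing or empty nextPageToken means there are no more pages.
            return errors
```

The message field of each collected error includes the source line number, as in the sample response, which helps locate the offending rows in the uploaded file.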
Upload new data to the dataset
After you create the dataset and upload the initial data successfully, the state of the dataset is set to STATE_COMPLETED. That means the dataset is ready to use in your app. To determine the state of the dataset, see Get a dataset.
You can also upload new data to the dataset to create a new version of the dataset. To upload new data, use the same process as you did to Upload data from Cloud Storage or Upload data from a file, and specify the new data to upload.
If the new data uploads successfully:
The state of the new version of the dataset is set to STATE_COMPLETED.
The new version becomes the "active" version and is the version used by your app.
If there is an error in the upload:
The state of the new dataset version is set to one of the following states:
STATE_IMPORT_FAILED
STATE_PROCESSING_FAILED
STATE_PUBLISHING_FAILED
STATE_DELETION_FAILED
The previous successful version of the dataset stays as the "active" version and is the version used by your app.