Create a Dataset

A dataset can be created by sending a POST request to /v2/datasources. We support three types of datasets:

  1. A PDF or Excel file that can be uploaded to the One Creation platform.
  2. Connecting to a supported data location, which will allow us to automatically discover the metadata.
  3. Aggregating two or more existing datasets together, which will enable us to virtually merge them.

This guide details how to utilize our API to create any of these datasets.

Request Body

The request body contains two parameters, as shown in the table below. If you are uploading a file as a dataset, populate the "file" parameter in addition to "request". Otherwise, the "request" parameter alone is enough to connect to a data location or to aggregate existing datasets.

| Key     | Type                | Description |
|---------|---------------------|-------------|
| request | application/json    | Required. Contains connection information to your data location (database or filesystem). |
| file    | multipart/form-data | Only required if you are attaching a file to be protected. |
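
For instance, a minimal sketch in Python using the requests library might look like the following. The host, token, and payload values are placeholders, not part of the API; substitute values for your One Creation deployment.

import json
import requests

# Placeholder host and bearer token; adjust for your deployment.
BASE_URL = "https://one-creation.example.com"
HEADERS = {"Authorization": "Bearer <your-token>"}

request_payload = {
    "name": "my-dataset",
    "source": "POSTGRESQL",
    "fields": [],                # see "Common Fields" below
    "datasourceConnection": {},  # see "Connecting to a Dataset" below
}

# The "request" parameter is sent as a JSON-typed multipart form part.
response = requests.post(
    f"{BASE_URL}/v2/datasources",
    headers=HEADERS,
    files={"request": (None, json.dumps(request_payload), "application/json")},
)
print(response.status_code, response.text)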

Common Fields

Here is the list of common fields within the "request" parameter:

name

The name of the dataset.

source

The type of the dataset. One of the following enumerated values:

- AWSS3
- DATABRICKS
- DREMIO
- EXCEL: as an attachment
- KX
- MYSQL
- ORACLE
- PDF: as an attachment 
- POSTGRESQL
- REDSHIFT
- SHAREPOINT
- SNOWFLAKE

reviewer

Optional. The email address of the reviewer.

fields

The list of metadata/attributes to be used for setting up data sharing rules in the following format:

"fields":[
  {"level":0,"type":"folder","format":"","path":["table name"]},
  {"level":1,"type":"folder","format":"string","path":["table name","column1"]},
  {"level":1,"type":"folder","format":"string","path":["table name","column2"]},
  {"level":1,"type":"folder","format":"string","path":["table name","column3"]}]

For example, assume an Excel file or database table called "users" looks like this:

| name    | gender | birthday  |
|---------|--------|-----------|
| Yang C  | Female | 7/19/2011 |
| Liat Z  | Female | 6/21/1993 |
| Kevin B | Male   | 2/1/1983  |

The value of "fields" will look like this:

"fields":[
  {"level":0,"type":"folder","format":"","path":["users"]},
  {"level":1,"type":"folder","format":"string","path":["users","name"]},
  {"level":1,"type":"folder","format":"string","path":["users","gender"]},
  {"level":1,"type":"folder","format":"string","path":["users","birthday"]}]

Connecting to a Dataset

The following fields establish a connection to your data location so that your One Creation dataset can be created from it. They are optional if you are creating a dataset manually (by file upload) or through aggregation.

datasourceConnection

Contains the dataset-specific connection fields:

Databricks

  • dbType: "databricks"
  • dbName: the name of the database
  • dbHost: the database host URL
  • dbPort: the database port number
  • dbUsername: the login username
  • dbPassword: the login password
  • dbTable: the name of the table/view

MySQL

  • dbType: "mysql"
  • dbName: the name of the database
  • dbHost: the database host URL
  • dbPort: the database port number
  • dbUsername: the login username
  • dbPassword: the login password
  • dbTable: the name of the table/view

Oracle

  • dbType: "oracle"
  • dbName: the name of the database
  • dbHost: the database host URL
  • dbPort: the database port number
  • dbUsername: the login username
  • dbPassword: the login password
  • dbSchema: the name of the database schema
  • dbTable: the name of the table/view

PostgreSQL

  • dbType: "postgresql"
  • dbName: the name of the database
  • dbHost: the database host URL
  • dbPort: the database port number
  • dbUsername: the login username
  • dbPassword: the login password
  • dbTable: the name of the table/view

Redshift

  • dbType: "redshift"
  • dbName: the name of the database
  • dbHost: the database host URL
  • dbPort: the database port number
  • dbUsername: the login username
  • dbPassword: the login password
  • dbTable: the name of the table/view

Snowflake

  • dbType: "snowflake"
  • dbName: the name of the database
  • dbHost: the database host URL
  • dbUsername: the login username
  • dbPassword: the login password
  • dbWarehouse: the name of the warehouse
  • dbRole: the login user role
  • dbSchema: the name of the database schema
  • dbTable: the name of the table/view

Dremio

  • dbType: "dremio"
  • dbSchema: the name of the database schema
  • dbHost: the database host URL
  • dbPort: the database port number
  • dbUsername: the login username
  • dbPassword: the login password
  • dbTable: the name of the table/view

AWS S3

  • awsAccessKeyId: the S3 bucket access key
  • awsSecretAccessKey: the secret key
  • bucketName: the name of the S3 bucket
  • region: the region of the S3 bucket. e.g. eu-central-1
  • templateName: the name of the template file with file extension. E.g. "data_template.xlsx"
  • filterPattern: optional. The prefix of the data files; if the bucket contains multiple files with different names, this prefix is used to find the data file for distribution.
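
Put together, an S3 connection might look like the sketch below (all values are placeholders):

# Hypothetical values; the field names follow the list above.
s3_connection = {
    "awsAccessKeyId": "<access-key-id>",
    "awsSecretAccessKey": "<secret-access-key>",
    "bucketName": "my-data-bucket",
    "region": "eu-central-1",
    "templateName": "data_template.xlsx",
    "filterPattern": "daily_export_",  # optional prefix used to locate data files
}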

OneDrive

  • tenantId: the tenant ID
  • secretValue: the client secret key
  • applicationId: the application ID
  • driveItemId: the ID of the drive item to use for the dataset
  • siteId: the site ID

KX

  • dbType: "kx"
  • dbHost: the database host IP
  • dbPort: the database port number
  • dbUsername: the login username
  • dbPassword: the login password
  • dbStoredProcedure: the name of the stored procedure
  • dbStoredProcedureParameters: the stored procedure parameter, in the following format:

    {
      "label": "",       - the label of the stored procedure
      "type": "",        - the type: one of String, Number, Date, Time, Datetime, or Boolean
      "length": "5",     - the length of the value
      "value": "trade"   - the value of the stored procedure
    }
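
For reference, a complete KX connection might look like the sketch below (all values are placeholders):

# Hypothetical values; the field names follow the list above.
kx_connection = {
    "dbType": "kx",
    "dbHost": "10.0.0.12",
    "dbPort": "5001",
    "dbUsername": "kxuser",
    "dbPassword": "<password>",
    "dbStoredProcedure": "getTrades",
    "dbStoredProcedureParameters": {
        "label": "sym", "type": "String", "length": "5", "value": "trade"
    },
}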

Aggregation

If you wish to create a dataset that contains data from multiple connections or multiple files, you can create an aggregated dataset. To do so, you will need to create the individual datasets separately before aggregating them.

For example, if you wish to create a dataset that combines data from S3 and MySQL, first create those two datasets individually. Once they are created, you will need their respective datasourceIds. Please see Get/Delete Dataset(s) to learn how to retrieve datasourceIds.

The fields used for aggregation, shown in the sketch after this list, are:

  • dataSourceIds: the ID of each dataset you wish to aggregate. Required only when aggregating.

  • aggregationDatasourceKeys: the attributes to aggregate on between the datasets, typically the name of a column in a spreadsheet. Required only when aggregating.
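
A minimal aggregation request might look like the sketch below, assuming two previously created datasets that share a "customer_id" column. The IDs, names, and any fields beyond the two aggregation fields are placeholders:

# Hypothetical values; the dataset IDs come from previously created datasets.
aggregation_request = {
    "name": "s3-mysql-aggregate",
    "fields": [],  # metadata for the merged dataset, in the format shown above
    "dataSourceIds": ["<s3-datasource-id>", "<mysql-datasource-id>"],
    "aggregationDatasourceKeys": ["customer_id"],
}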

Manually Uploading a File

If you wish to manually upload a PDF or Excel file as your dataset, the request follows the same structure as detailed above, with the PDF or XLSX file added to the request as an attachment. When manually uploading a file, datasourceConnection becomes optional and the source field must be either "EXCEL" or "PDF".
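
Assuming the same placeholder host and token as the earlier sketch, attaching an Excel file might look like this:

import json
import requests

request_payload = {
    "name": "users-upload",
    "source": "EXCEL",  # or "PDF" for a PDF attachment
    "fields": [],       # describe the sheet's table and columns as shown above
}

# The file is sent alongside the "request" part in the same multipart form.
with open("users.xlsx", "rb") as f:
    response = requests.post(
        "https://one-creation.example.com/v2/datasources",  # placeholder host
        headers={"Authorization": "Bearer <your-token>"},
        files={
            "request": (None, json.dumps(request_payload), "application/json"),
            "file": ("users.xlsx", f),
        },
    )
print(response.status_code, response.text)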

Testing and Example

If you wish to test these endpoints, refer to the corresponding Swagger page, which shows an example request body and lets you send requests to the endpoint. You can also use external software such as Postman; you can download our Postman examples here.

Here's a sample "request" payload:

{
  "name": "NAME",
  "source": "POSTGRESQL",
  "reviewer": "reviewer email",
  "fields": [
    {"level":0,"type":"folder","format":"","path":["table name"]},
    {"level":1,"type":"folder","format":"string","path":["table name","column1"]},
    {"level":1,"type":"folder","format":"string","path":["table name","column2"]},
    {"level":1,"type":"folder","format":"string","path":["table name","column3"]}
  ],
  "datasourceConnection": {
    "dbType": "postgresql",
    "dbName": "database name",
    "dbHost": "database host url",
    "dbPort": "port number",
    "dbUsername": "username",
    "dbPassword": "password",
    "dbTable": "table name"
  }
}

Next Step

Once a dataset is created, you can proceed to Create a Policy.


Last update: May 18, 2023