Skip to content

Create Data Set#

🔒 CmfDataSet.Create

Overview#

The Data Set is a Data Platform entity that can be easily created by authorized users. In addition to the normal creation process, you can create Data Sets by using templates in .xml format and importing them using the Import button.

Data Sets are automatically created as part of the process of creating IoT Event Definitions, or they can be manually created by defining SQL queries. Data Sets created this way, without a direct link to IoT Event Definitions, are of type Query, meaning they behave as user-defined views on top of database tables.

Setup#

No specific setup is required other than to meet the preconditions of the transaction.

Preconditions#

  • The Data Set name must be unique in the folder in which it will be created.

Sequence of Steps#

Step 1: General Data#

  1. Enter the Name of the Data Set.

    Naming Restrictions

    This name will identify the Data Set throughout the Data Platform infrastructure, so choose a name that is unique and meaningful for later reference. Furthermore, do not use any of the following characters:

    \ / : ? | < > * " .
    

    If special characters are used, the corresponding OData URLs must be percent-encoded to access the data successfully.

  2. Enter the Description.

  3. Select a Data Group if you want to restrict the access to the data in the Data Set (assuming Objects Security Level is active in MES).
  4. Select the Folder where the Data Set should be placed.

    Invalid Characters

    When creating a new Folder to store a Data Set, do not use any of the following characters:

    \ / : ? | < > * " .
    
    If special characters are used, the corresponding OData URLs must be percent-encoded to access the data successfully.

  5. Select the Source which contains the tables with the data you want to access:

    • CDM - provides access to Canonical Data Model tables containing IoT Events data, including both system-generated CDM events and user-defined IoT Events.
    • DWH - provides access to Data Warehouse tables with automatically aggregated and transformed data for analytics purposes.
    • ODS - provides access to the Operational Data Store tables from the MES SQL Server database.
    • MES - provides access to the ClickHouse MES database.
    • ODS Data Silos - only available with Enterprise Data Platform license. Provides access to the ODS tables stored in each individual Data Silo. These are shown at the end of the list, separated by a divider, as shown in the image below.
  6. Choose the Owner Type, depending on which you will need to set additional configurations:

    • App - when available, select the App that will be using this Data Set.
    • Role - select the Owner Role for this Data Set.
    • User - select the Owner User for this Data Set.
  7. Select Next to continue.

    Create Data Set

    Create Data Set

Step 2: Fields#

In this step there are two tabs, Query and Fields.

  1. Write your SQL SELECT statement in the Query tab according to the guidelines below.

    The SQL SELECT statement can use database parameters, for example: SELECT * FROM PostTelemetry WHERE Material_Name = {name: String}. Each parameter must follow the syntax {<name>: <datatype>}, where <name> is the parameter name and <datatype> is the ClickHouse datatype ⧉ it is converted to.

    The query can also use reserved database macros to reference MES ClickHouse databases. These macros are automatically replaced by the Data Manager with the corresponding database name when the query is executed. Supported macros include: {db_MES}, {db_ODS}, {db_CDM}, {db_DWH}, and {db_<datasiloName>ODS}.

    An example of a SQL SELECT statement is shown below:

    SELECT * FROM {db_MES}.T_DatabasePatches

    Query Limitations in ClickHouse

    ClickHouse does not support cases where a parameterized view references another parameterized view across different databases. All parameterized views in a dependency chain must live in the same database, otherwise the query will fail during resolution (for example, using CoreDataModel_V_ST_ResolvedServiceContext cannot be used if the source selected is not ODS).

    Additionally, database macros are reserved and case-sensitive, meaning the exact macro name must be used. For example, using {db_mes} or {DB_MES} will cause the query to fail.

  2. Select the Validate button to validate your query and retrieve the fields.

    Query

    The Validate button will populate the grid of fields in the Fields tab, located on the top right. This enables you to edit the following properties for each field:

    • Description - choose the description for each field.
    • Field Type - choose the field type from the options below:

      • Dimension
      • Metric
      • Timestamp
      • None

      Info

      Fields of type Timestamp are automatically picked for the x-axis in time-series visualization in Grafana, so if a field has that type and you do not want to use it for that purpose you should set the type to None.

    • User Identifier - if enabled, it will show the user identity in reports built using this Data Set. This option is only available if the Data Type is set to String or String Array.

      User Identifier Behavior

      The User Identifier flag takes effect only when all of the following conditions are met:

      • The CmfDataSet.IdentityAccess security feature is enabled (this is the default setting). For more information, see Features.
      • When operating in EDP mode, the AppProperties_DataSilo is included in the OData selected fields. This requirement applies only to CDM and DWH tables and is not required for ODS or ODS tables.
    • Low Cardinality - this is a ClickHouse setting. If enabled, it changes the internal representation of other data types to be dictionary-encoded. For more information, see ClickHouse Low Cardinality ⧉.

  3. When you have edited the fields, select Create to complete the creation of the Data Set.

    Fields

The Data Set should now be available in the folder you picked, and its data readily available to be accessed via OData client such as Grafana, Power BI or Excel.