---
alias: user-guide-create-ml-model
tags:
  - data platform
description: "Create an ML Model object by defining data sets, features, and splitting data for training and evaluation"
---

# Create ML Model

:lock: MLModel.**Create**
:lock: MLModel.**CreateTemplate**
:lock: MLModel.**CreateFromTemplate**

## Overview

This **Data Platform** operation is used to create an **ML Model**. You can also create a template using a similar procedure and use a template to create a new object (example: through the import/export of an XML file with the object settings). This selection is done in the **ML Model** menu.

## Setup

No specific setup is required other than to meet the preconditions of the transaction.

## Preconditions

* The **ML Model** name must be unique.

## Sequence of Steps

There are several ways to create a new versioned object. Depending on the level, follow these steps to get started:

* Entity - In the landing page of this entity type in the Business Data menu or in the details page of an existing entity of the same type, select **New** on the top ribbon. For more information, see [Creating Entity Objects](../../../general/creating_entity_objects.md).
* Revision - If you want to create a new revision, go to the **New** dropdown button on the top ribbon and select **Revision**. For more information, see [Revisions](../../../general/revisions/index.md).
* Version - If you want to create a version associated with an existing revision, go to the **New** dropdown button on the top ribbon and select **Version**. For more information, see [Versions](../../../general/versions/index.md).

### Step 1: Change Set

1. Select an existing Change Set or select **Create** to create a new Change Set. If the system is configured to support implicit Change Sets, you can also check the **Automatic Change Set** option.
2. Optionally, select an Approval Role.

### Step 2: General Data

1. Enter the **Name** of the **ML Model** (must be unique).
2. Enter the **Description**.
3. Select **Next** to continue.

![Screenshot showing a UI with a filename field labeled "create ml model step 01" and nearby instructions to select "Next".](images/create_ml_model_step_01.png)

### Data Set

In this tab, select the **Data Set** to be used as a data source.

1. Choose a **Data Set** that will be used as the source of data for this **ML Model**.
2. Choose a specific field to use when sorting the **Data Set**. You can preview the data retrieved through the **Data Set** by selecting the **Preview** button on the right side of the Order By selection field and reviewing the information displayed in the grid below.
3. Select **Next** to continue.

![Screenshot showing a data selection interface with "4 Great ML Models" and "Select Data Set!" buttons.](images/create_ml_model_step_03.png)

### Features

In this tab, you can view a summary of the data and edit the properties and transformations applied to the features associated with each of the fields retrieved through the selected **Data Set**.

You can set the following properties depending on the Field Type:

| Feature Field Type | Editable properties |
| ------------------ | ------------------- |
| Dimension | Mark as Label<br>Replace Nulls with Most Frequent<br>Encoding (`One Hot` or `Ordinal`) |
| Numeric | Mark as Label<br>Replace Nulls with Mean<br>Normalize Min-Max<br>Remove Outliers (also specify the Sigma threshold) |
| Timestamp | None |

Only features that meet specific criteria are available for editing. The system ignores properties (typically known as "features" in Machine Learning vocabulary) that have a single value (no cardinality) or very high cardinality (example: all values are different).

From the recommended features you may now:

* Mark a property as a label (a label is the target we want the machine learning model to learn).
* Replace null cells with the mean value.
* Normalize the data by applying a MinMax Scaler.
* Remove outliers from the dataset.

![Screenshot showing a machine learning model creation step with a title "Create ML Model" and options for number of cores.](images/create_ml_model_step_04.png)

!!! info

    For supervised learning, one of the features must have the `Mark as Label` property selected. For unsupervised learning, do not select any label.

### Data Splitting

In this tab, choose the weight of each data split that the **ML Model** will use. You can set the weights either by adjusting the sliders or by entering the proportion of records to be included in each set.

| Data Split | Minimum Weight | Maximum Weight |
| ---------- | :------------: | :------------: |
| Train      |      50%       |      80%       |
| Validate   |      10%       |      40%       |
| Test       |      10%       |      25%       |

The weights must add up to 100%, and each weight must stay within the limits shown in the table above.

After you have finished the configuration, select **Create** to complete the operation.

![Screenshot showing the "Create ML Model" step in the configuration process.](images/create_ml_model_step_05.png)

!!! info

    It is good practice to split the validation and test sets evenly, as shown in the picture. All sets should also present a similar data distribution. If you are not sure, use the default values.
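
The Data Splitting rules can be checked before you enter the weights in the wizard. The following Python sketch is illustrative only and is not part of the product: the names `SPLIT_LIMITS` and `check_split_weights` are hypothetical, the limits are copied from the Data Splitting table, and the sample weights are arbitrary values, not the system defaults.

```python
# Limits copied from the Data Splitting table (percentages).
# SPLIT_LIMITS and check_split_weights are hypothetical names for illustration.
SPLIT_LIMITS = {
    "Train": (50, 80),
    "Validate": (10, 40),
    "Test": (10, 25),
}


def check_split_weights(weights):
    """Return a list of problems; an empty list means the weights are valid."""
    problems = []
    for name, (low, high) in SPLIT_LIMITS.items():
        value = weights.get(name)
        if value is None:
            problems.append(f"{name}: missing weight")
        elif not low <= value <= high:
            problems.append(f"{name}: {value}% is outside {low}%-{high}%")
    # The three weights together must cover the whole data set.
    total = sum(weights.get(name, 0) for name in SPLIT_LIMITS)
    if total != 100:
        problems.append(f"weights add up to {total}%, expected 100%")
    return problems


# Arbitrary sample values, chosen only to exercise the checks.
print(check_split_weights({"Train": 70, "Validate": 15, "Test": 15}))
print(check_split_weights({"Train": 80, "Validate": 10, "Test": 25}))
```

The first call returns an empty list because every weight is within its limits and the total is 100%. The second call reports a problem: each weight is within its own limits, but the total exceeds 100%.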