Skip to content

R Integration#

Introduction#

Starting with the release of SQL Server 2016, Microsoft included the installation of an open source distribution of R, as well as packages provided by Microsoft that support distributed and/or parallel processing.

The ability to run R scripts comes through a new feature: SQL Server R Services.

The architecture is designed such that external scripts using R run in a separate process from SQL Server and enabling the passing of data between SQL Server engine and R.

R code that is run from "inside" SQL Server is executed by calling a stored procedure. Thus, any application that can make a stored procedure call can initiate execution of R code. Thereafter SQL Server manages the execution of R code as summarized in the following diagram:

rintegration01

  1. A request for the R runtime is indicated by the parameter @language='R' passed to the stored procedure, sp_execute_external_script. SQL Server sends this request to the Launchpad service
  2. The Launchpad service starts the appropriate launcher; in this case, RLauncher
  3. RLauncher starts the external R process
  4. BxlServer coordinates with the R runtime to manage exchanges of data with SQL Server and storage of working results
  5. SQL Satellite manages communications about related tasks and processes with SQL Server
  6. BxlServer uses SQL Satellite to communicate status and results to SQL Server
  7. SQL Server gets results and closes related tasks and processes

Before using the R language integration, you must enable your SQL Server instance to support external scripts, but before you can do that, you must ensure that R Services is installed correctly and up and running. Please refer to Set up SQL Server R Services (In-Database) ⧉ on how to install and configure SQL Server R Integration.

Installation Checklist#

To check if the R Integration is done successfully, and run the first test script, please go through the following steps:

  1. Connect to your SQL Server Database (where R Services is installed) using SQL Server Management Studio
  2. Run exec sp_configure and check the "external scripts enabled" variable. If the running value is already set to 1 go to step 6. Otherwise proceed to the next step
  3. Run exec sp_configure 'external scripts enabled',1; reconfigure with override;
  4. Restart the SQL Server Service. This should also restart the SQL Server Trusted Launchpad Service.
  5. Rerun exec sp_configure 'external scripts enabled' and check the running value. This should now be set to 1
  6. Run the following test script:
exec sp_execute_external_script  @language =N'R', @script=N'OutputDataSet<-InputDataSet', @input_data_1 =N'select 1 as hello' with result sets (([hello] int not null));
go

If all the steps were executed correctly, you should obtain the following result:

rintegration02

Using External Tools#

You can also connect to SQL Server R Services using a R Client such as RStudio ⧉.

The connection between SQL Server and RStudio is made using the RODBC library. Before using it, the RODBC package must be installed in RStudio.

After starting RStudio, open a command window and run Install.packages("RODBC"). This downloads and installs all the necessary packages and functionalities.

After the correct installation, run library(RODBC) in order to load the library in your current environment.

You can now connect to SQL Server using: (please change "SQL Server Server name" and "Database Name" to match your Critical Manufacturing MES installation, e.g. CMF-VM-CLT-DB1\ONLINE2016 and ProductionODS)

cn<-odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};server=<SQL Server Server name>;database=<Database Name>;trusted_connection=yes;")

The "cn" variable now holds the SQL Server connection and must be used to retrieve and save data directly to SQL Server

Examples:

table<-sqlFetch(cn,"table name",colnames=FALSE,rows_at_time=1000)
machinefailuresdata<-sqlQuery(cn,"select * from analytics.MachineFailures")
sqlSave(cn,machinefailuresdata, rownames=FALSE, tablename="MachineFailures_R",colname=FALSE,append=FALSE)
odbcClose(cn)

Please see Package 'RODBC' ⧉ for an extensive package documentation and usage examples.

Critical Manufacturing MES R Integration#

Critical Manufacturing MES uses R Integration to build rich content graphics and statistical analysis. Rich content graphics can be created and, in conjunction with SQL Server Report Builder, can be shown directly in a report as an image.

To enable this, Critical Manufacturing MES has built in a set of procedures that already handle the necessary R connection, image processing and cleanup, so that the integration is done seamlessly.

In addition to the standard R graphics library (Graphics), Critical Manufacturing MES also uses the GGplot2, Lattice ⧉ libraries.

For statistical analysis, Critical Manufacturing MES uses the Car and Stats ⧉ libraries.

Because these libraries aren't part of the standard distribution of R, you have to download them from CRAN ⧉ and install it. The installation must be done directly in the server where R Services is running (the same place where SQL Server is also running) using administrative rights.

To do so, please follow the next steps:

  1. Open a remote connection to the server where SQL Server is running
  2. Open a command prompt with Administrative privileges
  3. Go to C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\bin or where the R Services binaries are located
  4. Located in this directory you have the R.exe application that opens a terminal directly to the running R installation
  5. Run R.exe in order to install the necessary packages directly from the internet or from a zip file
  6. To install a package from CRAN (from the Internet), simply type install.packages("package_name"). This will take care of the download, its dependencies and installation of the desired package
  7. To install a package from a local zip file, type install.packages("package_zip_file_name", repos=null)

Please run either step 6. or 7. for the ggplot2, lattice, car and stats. After doing this, these packages can now be widely used by your current R installation and Critical Manufacturing MES packages.

Available Critical Manufacturing MES R Packages#

As stated before, in order to enable access to R Services, Critical Manufacturing MES installation has built-in a set of stored procedures that wrap a group of useful R graphics and statistical functions. These functions are accessible directly via the SQL Server Reporting Services (graphics and statistics) or via SQL Server Management Studio (statistics).

The output of each graphics procedure is a binary string that represents the result in a jpeg format. This output string can then be used directly in SQL Server Report Builder as an image by creating a dataset with the following generic procedure call:

DECLARE @id uniqueidentifier=NEWID()
exec p_dynamic_plot
    @id,@DependentVar,@IndependentVar,@Parameter3,@Parameter4,
    @Table,
    @GraphName,
    @Flip,
    @Position,
    @Width,
    @Scalefill,
    @FacetWrap,
    @Smooth,
    @Jitter,
    @DateTimeColumn,
    @BeginDate,
    @EndDate,
    @AdditionalFilter

SELECT plot FROM GraphPlots where identifier=@id
DELETE FROM GraphPlots where identifier=@id

This will create a dataset with one column (plot) and one record. This column will contain the binary string that represents the jpeg image.

You can then add an image object to the report, specifying the source to be "Database", MIME type image/jpeg and then selecting the field as an expression with the content: =First(Fields!plot.Value, "Plot") (this code assumes that the dataset created above is called "Plot")

rintegration03

rintegration04

rintegration05

Statistical tests return a Key/Value (K/V) pair dataset. These K/V pair will vary between each individual test as they represent the output of the test itself.

Similar to the graphics integration, statistical tests can also be used in conjunction with SQL Server Report Builder. To do so, create a dataset with a generic procedure call:

DECLARE @id uniqueidentifier=NEWID()

exec p_dynamic_statistic_test @id,
                              @TestName,
                              @Table,
                              @DependentVar,
                              @IndependentVar,
                              @Parameter3,
                              @DateTimeColumn,
                              @BeginDate,
                              @EndDate,
                              @AdditionalFilter,
                              @Paired,
                              @Method,
                              @Alternative,
                              @ConfidenceLevel,
                              @Ordered,
                              @MedianTest

SELECT * FROM StatisticsTestTable where identifier=@id
DELETE FROM StatisticsTestTable where identifier=@id

This will output a table with four columns (RowNumber, Description, Value and Identifier). The dataset will contain as many lines as necessary to output the test result.

rintegration06

You can then insert this dataset as a pivot table in you report by creating a matrix using the following configuration:

rintegration07

Graphics#

Critical Manufacturing MES contains a set of stored procedures that serve as a wrapper to the R graphics functions. The main entry point for these procedures is dbo.p_dynamic_plot. This procedure accepts multiple arguments. Some arguments are shared between all of the graphics procedures (e.g. table name that contains the raw data), others are optional depending on the graph type that is being called.

PROCEDURE p_dynamic_plot(@id uniqueidentifier,
       @Parameter1 nvarchar(256),
       @Parameter2 nvarchar(256),
       @Parameter3 nvarchar(256),
       @Parameter4 nvarchar(256),
       @Table varchar(256),
       @GraphName varchar(256),
       @Flip integer,
       @Position nvarchar(256),
       @Width Decimal,
       @Scalefill nvarchar(256),
       @FacetWrap integer,
       @Smooth integer,
       @Jitter integer,
       @DateTimeColumn varchar(256),
       @BeginDate datetime,
       @EndDate datetime,
       @AdditionalFilter varchar(256))
Parameter Name Description
@id Internal identifier
@Parameter1 Column name for chart E.g.: 'Humidity'. This column name must match one column name of the input dataset (@Table)
@Parameter2,@Parameter3,@Parameter4 Optional parameters that specify the dependent, numeric variables or categorical variables. These values are dependent of graph type being called and always refer to the @Table input parameter
@Table Table name that contains the raw data or dataset. This can be any database object (table, view or function) as long as it is created in the analytics schema
@GraphName Graph name to be created
@Flip Coordinate flipping (0 - no, 1 - yes)
@Position 'identity' to use default positioning or 'dodge' for side by side grouping
@Width Histogram chart bar width
@Scalefill Color scheme. 'none' - standard, 'blue' or 'grey'
@FacetWrap Group by facet type (0 - no, 1 - yes)
@Smooth Use linear smoothing lines in ggplot2 ScatterPlot (0 - no, 1 - yes)
@Jitter Adds a small amount of random variation to the location of each point for Box Plot using ggplot2
@DateTimeColumn Identifies the one Datetime column in the input dataset (@Table)
@BeginDate Useful to filter out data using @DateTimeColumn
@EndDate Same as above
@AdditionalFilter Extra filter clause, e.g.: "where/and Temperature>60". If used with @DatetimeColumn use "and" else use "where"

Table: R - Graphics

The next table describes all the available graph types and also what R function they implement

Graph Name R Package R Function
MultipleRegressionPlot graphics ⧉ lm ⧉
Estimate-Real Plot graphics ⧉ lm using abline ⧉
Barplot ggplot2 ⧉ geom_bar ⧉
Barplot2Var ggplot2 ⧉ geom_bar ⧉ using two variables
StackedBarplot ggplot2 ⧉ geom_bar ⧉
Histogram ggplot2 ⧉ geom_histogram ⧉
Scatterplot ggplot2 ⧉ geom_point ⧉
StripPlot ggplot2 ⧉ geom_point ⧉
Boxplot ggplot2 ⧉ geom_boxplot ⧉
Boxplot1var ggplot2 ⧉ geom_boxplot ⧉
AreaGraph ggplot2 ⧉ geom_area ⧉
DensityGraph ggplot2 ⧉ geom_density ⧉
LevelPlot ggplot2 ⧉ geom_tile ⧉
StackedDensity ggplot2 ⧉ geom_density ⧉
QqPlot ggplot2 ⧉ geom_qq ⧉
CirclePlot ggplot2 ⧉ geom_bar and coord_polar ⧉
StackedAreaGraph ggplot2 ⧉ geom_area ⧉
DotPlot ggplot2 ⧉ geom_dotplot ⧉
LinePlot ggplot2 ⧉ geom_line ⧉
BoxplotLattice lattice ⧉ bwplot ⧉
DensityLattice lattice ⧉ densityplot ⧉
QqplotLattice lattice ⧉ qqmath ⧉
3DScatterPlotLattice lattice ⧉ cloud ⧉
ScatterPlotLattice lattice ⧉ xyplot ⧉
HistogramLattice lattice ⧉ histogram ⧉
StripPlotLattice lattice ⧉ stripplot ⧉
LevelPlotLattice lattice ⧉ levelplot ⧉
BarchartLattice lattice ⧉ barchart ⧉
WireframeLattice lattice ⧉ wireframe ⧉
ContourPlotLattice lattice ⧉ contourplot ⧉

Table: R - Available graph types

Statistical Tests#

Similarly to the graphics procedures, all the statistical tests procedures are wrapped with a generic stored procedure named dbo.p_dynamic_statistic_test. This procedure also accepts multiple arguments. Some arguments are shared between all of the statistics tests procedures (e.g. table name that contains the raw data), others are optional depending on the test type that is being called.

PROCEDURE p_dynamic_statistic_test(@id uniqueidentifier,
           @TestName varchar(256),
           @Table varchar(256),
           @Parameter1 nvarchar(256),
           @Parameter2 nvarchar(256),
           @Parameter3 nvarchar(256),
           @DateTimeColumn varchar(256) NULL,
           @BeginDate datetime NULL,
           @EndDate datetime NULL,
           @AdditionalFilter varchar(256) NULL,
           @Paired integer NULL,
           @methodCor varchar(256) NULL,
           @alternat varchar(256)NULL,
           @conflevel float NULL,
           @Ordered integer,
           @median integer)
Parameter Name Description
@id Internal identifier
@TestName Test name to run
@Table Table name that contains the raw data or dataset. This can be any database object (table, view or function) as long as it is created in the analytics schema
@Parameter1 Comma separated test column names, e.g.: 'Humidity,Temperature'. These values are dependent of test type being called and always refer to the @Table input parameter
@Parameter2 Comma separated test column names (optional), e.g.: 'Humidity,Temperature'. Same as above
@Parameter3 Test column name (optional), e.g.: 'Humidity'. Same as above
@DateTimeColumn Identifies the one Datetime column in the input dataset (@Table)
@BeginDate Useful to filter out data using @DateTimeColumn
@EndDate Same as above
@AdditionalFilter Extra filter clause, e.g.: "where/and Temperature>60". If used with @DatetimeColumn use "and" else use "where"
@Paired Optional test parameter (0 - Samples are not paired, 1 - Samples are paired)
@methodCor Optional correlation method name ('kendall', 'pearson' or 'spearman')
@Alternat Test dependent parameter indicating the sign of autocorrelation in alternative hypothesis ('two.sided','less' or 'greater')
@ConfLevel Confidence level for the returned confidence interval. Default value is 0.95 (95%)
@Ordered A logical value indicating if the levels of the factor should be ordered according to increasing average in the sample before taking differences (0 - not ordered, 1 - ordered)
@Median A number specifying an optional parameter used to form the null hypothesis (Wilcoxon test)

Table: R - Statistical tests

The next table describes all the available statistical test types and also what R function they implement

Test Name R Package R Function Description
MultipleRegression stats ⧉ lm ⧉ Fitting Linear Models
Anova stats ⧉ anova ⧉ Anova Tables
Durbin car ⧉ durbinWatsonTest ⧉ Durbin-Watson Test For Autocorrelated Errors
Levene car ⧉ leveneTest ⧉ Levene's Test
Tukey stats ⧉ TukeyHSD ⧉ Compute Tukey Honest Significant Differences
Outlier car ⧉ outlierTest ⧉ Bonferroni Outlier Test
Correlation stats ⧉ cor.test ⧉ Test For Association/Correlation Between Paired Samples
Covariance stats ⧉ cov ⧉ Correlation, Variance And Covariance (Matrices)
Ttest stats ⧉ t.test ⧉ Student's T-Test
OneSampletest stats ⧉ t.test ⧉ One Sample Student's T-Test
ChisqTest stats ⧉ chisq.test ⧉ Pearson's Chi-Squared Test For Count Data
Shapiro stats ⧉ shapiro.test ⧉ Shapiro-Wilk Normality Test
Smirnov stats ⧉ ks.test ⧉ Kolmogorov-Smirnov Tests
Wilcoxon stats ⧉ wilcox.test ⧉ Wilcoxon Rank Sum And Signed Rank Tests
Friedman stats ⧉ friedman.test ⧉ Friedman Rank Sum Test

Table: R - Statistical test types