DMX: An Introduction to SQL Server Data Mining


SQL #dmx 2012-11-14 09:12

In SQL Server, DMX can be used from:

1 SSMS

2 SSDT

3 Excel Data Mining Add-in

 

The basic syntax is as follows:

Creating a structure

CREATE MINING STRUCTURE [People]
(
[CustID] LONG KEY,
[Name] TEXT DISCRETE,
[Gender] TEXT DISCRETE,
[Age] LONG CONTINUOUS,
[CarMake] TEXT DISCRETE,
[CarModel] TEXT DISCRETE
)


Creating a model

ALTER MINING STRUCTURE [People]
ADD MINING MODEL [PredictGender-Tree]
(
[CustID],
[Gender] PREDICT,
[Age],
[CarModel]
) USING Microsoft_Decision_Trees


Training

INSERT INTO MINING STRUCTURE [People]
([CustID], [Name], [Gender], [Age], [CarMake],[CarModel])
OPENQUERY(Chapter3Data,
'SELECT [Key], Name, Gender, Age, CarMake, CarModel
FROM People')


Querying a structure

SELECT * FROM MINING STRUCTURE People.CASES WHERE IsTestCase()
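IsTestCase() is only useful when the structure has set aside some of its cases for testing. As a minimal sketch, assuming SQL Server 2008 or later, a holdout set can be requested when the structure is created. The structure name [PeopleHoldout] and the 30 percent figure are hypothetical and chosen only for illustration.

```sql
-- Sketch only: [PeopleHoldout] and the 30 percent split are assumptions.
-- The HOLDOUT clause reserves part of the training data as test cases.
CREATE MINING STRUCTURE [PeopleHoldout]
(
[CustID] LONG KEY,
[Gender] TEXT DISCRETE,
[Age] LONG CONTINUOUS,
[CarModel] TEXT DISCRETE
)
WITH HOLDOUT (30 PERCENT)
```

After training such a structure, filtering its cases with IsTestCase() or IsTrainingCase() should return the two partitions separately.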


Querying a model

SELECT * FROM ClusterDrillthrough.CASES WHERE IsInNode('001')


 

Prediction query

SELECT t.[Name], Predict([Gender]) AS PredictedGender
FROM [PredictGender-Bayes]
PREDICTION JOIN
OPENQUERY(Chapter3Data,
'SELECT [Key], [Name], [Age], [CarModel]
FROM [NewPeople]') AS t
ON [PredictGender-Bayes].[Age] = t.[Age] AND
[PredictGender-Bayes].[CarModel] = t.[CarModel]
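For a quick check without an external data source, a singleton query can supply a single case inline. The sketch below runs against the [PredictGender-Tree] model defined earlier. The input values (age 35, car model 'Civic') are made up for illustration and are not taken from any real data set.

```sql
-- Sketch only: the input values below are hypothetical.
-- NATURAL PREDICTION JOIN matches input columns to model columns by name,
-- so no ON clause is needed.
SELECT Predict([Gender]) AS PredictedGender,
PredictProbability([Gender]) AS Probability
FROM [PredictGender-Tree]
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age], 'Civic' AS [CarModel]) AS t
```

PredictProbability returns the probability of the predicted state, which helps judge how much weight to give the prediction.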

 

SQL Server provides the following schema rowsets for querying metadata:

 

DMSCHEMA_MINING_SERVICES,
DMSCHEMA_MINING_SERVICE_PARAMETERS,
DMSCHEMA_MINING_MODELS,
DMSCHEMA_MINING_COLUMNS,
DMSCHEMA_MINING_MODEL_CONTENT,
DMSCHEMA_MINING_FUNCTIONS,
DMSCHEMA_MINING_STRUCTURES,
DMSCHEMA_MINING_STRUCTURE_COLUMNS,
DMSCHEMA_MINING_MODEL_XML,
DMSCHEMA_MINING_MODEL_PMML
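As a minimal sketch, assuming SQL Server 2008 or later, these rowsets can be queried from a DMX query window by prefixing the rowset name with $system. The statement below lists the algorithms installed on the server.

```sql
-- Sketch only: assumes the $system prefix is available (SQL Server 2008 or later).
-- Returns one row per installed data mining algorithm.
SELECT SERVICE_NAME
FROM $system.DMSCHEMA_MINING_SERVICES
```

The other rowsets in the list can be queried the same way by swapping the rowset name.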

 

These queries show that SQL Server provides nine algorithms:

service_name
Microsoft_Association_Rules
Microsoft_Clustering
Microsoft_Decision_Trees
Microsoft_Naive_Bayes
Microsoft_Neural_Network
Microsoft_Sequence_Clustering
Microsoft_Time_Series
Microsoft_Linear_Regression
Microsoft_Logistic_Regression

In practice there are only seven: Linear Regression is a variant of Decision Trees, and Logistic Regression is a variant of Neural Network.

The same queries also show which content types each algorithm supports. For example:

service_name supported_input_content_types
Microsoft_Association_Rules Cyclical,Discrete,Discretized,Key,Table,Ordered

The list has no Continuous entry, so Association Rules does not accept continuous data as input.
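A row like the one above could be retrieved with a filtered query against the services rowset. This is a sketch that assumes the $system prefix (SQL Server 2008 or later) and that the rowset accepts a simple WHERE filter.

```sql
-- Sketch only: lists the input content types one algorithm accepts.
SELECT SERVICE_NAME, SUPPORTED_INPUT_CONTENT_TYPES
FROM $system.DMSCHEMA_MINING_SERVICES
WHERE SERVICE_NAME = 'Microsoft_Association_Rules'
```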

They also list the functions each algorithm supports. For example, Naive Bayes supports the following functions:

function_name
Predict
Predict
PredictAdjustedProbability
PredictAssociation
PredictHistogram
PredictNodeId
PredictProbability
PredictSupport
$AdjustedProbability
$NodeId
$Probability
$Support
BottomCount
BottomPercent
BottomSum
IsDescendent
RangeMax
RangeMid
RangeMin
TopCount
TopPercent
TopSum
IsTrainingCase
IsTestCase
Exists
StructureColumn
StructureColumn
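A listing like this could come from the functions rowset, filtered by service name. The query is a sketch under the same assumptions as before ($system prefix, SQL Server 2008 or later).

```sql
-- Sketch only: lists the DMX functions available to Naive Bayes models.
SELECT FUNCTION_NAME
FROM $system.DMSCHEMA_MINING_FUNCTIONS
WHERE SERVICE_NAME = 'Microsoft_Naive_Bayes'
```

Names that appear twice in the output, such as Predict and StructureColumn, are presumably separate overloads of the same function.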

They also show the parameters each algorithm accepts. For example, Clustering accepts the following parameters:

parameter_name
CLUSTER_COUNT
CLUSTER_SEED
CLUSTERING_METHOD
MAXIMUM_INPUT_ATTRIBUTES
MAXIMUM_STATES
MINIMUM_SUPPORT
MODELLING_CARDINALITY
SAMPLE_SIZE
STOPPING_TOLERANCE
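The parameter list can be retrieved from the service parameters rowset, and a parameter is then set in the USING clause when a model is added. Both statements below are sketches: the model name [PeopleClusters] and the value 5 are hypothetical, and each statement would be run on its own.

```sql
-- Sketch only: run each statement separately.
-- 1. List the parameters of the clustering algorithm with their defaults.
SELECT PARAMETER_NAME, DEFAULT_VALUE
FROM $system.DMSCHEMA_MINING_SERVICE_PARAMETERS
WHERE SERVICE_NAME = 'Microsoft_Clustering'

-- 2. Add a clustering model to the [People] structure and set one parameter.
-- [PeopleClusters] and CLUSTER_COUNT = 5 are illustrative assumptions.
ALTER MINING STRUCTURE [People]
ADD MINING MODEL [PeopleClusters]
(
[CustID],
[Gender],
[Age],
[CarModel]
) USING Microsoft_Clustering (CLUSTER_COUNT = 5)
```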

