DMX: An Introduction to SQL Server Data Mining
SQL #dmx 2012-11-14 09:12
In SQL Server, DMX (Data Mining Extensions) can be used from:
1 SSMS
2 SSDT
3 Excel Data Mining Add-in
The basic syntax is as follows:
Create a structure
CREATE MINING STRUCTURE [People] (
    [CustID] LONG KEY,
    [Name] TEXT DISCRETE,
    [Gender] TEXT DISCRETE,
    [Age] LONG CONTINUOUS,
    [CarMake] TEXT DISCRETE,
    [CarModel] TEXT DISCRETE
)
Create a model
ALTER MINING STRUCTURE [People]
ADD MINING MODEL [PredictGender-Tree] (
    [CustID],
    [Gender] PREDICT,
    [Age],
    [CarModel]
) USING Microsoft_Decision_Trees
Train the structure
INSERT INTO MINING STRUCTURE [People]
    ([CustID], [Name], [Gender], [Age], [CarMake], [CarModel])
OPENQUERY(Chapter3Data,
    'SELECT [Key], Name, Gender, Age, CarMake, CarModel FROM People')
Query the structure
SELECT * FROM MINING STRUCTURE People.CASES WHERE IsTestCase()
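IsTestCase() only returns rows when the structure was created with a holdout set. The following is a minimal sketch of how such a structure might be declared; the name [PeopleHoldout] and the 30 percent figure are assumptions for illustration and do not come from the original example.

```sql
-- Hypothetical structure: same columns as [People], but 30 percent of the
-- cases are reserved as test cases when the structure is trained.
CREATE MINING STRUCTURE [PeopleHoldout] (
    [CustID] LONG KEY,
    [Name] TEXT DISCRETE,
    [Gender] TEXT DISCRETE,
    [Age] LONG CONTINUOUS,
    [CarMake] TEXT DISCRETE,
    [CarModel] TEXT DISCRETE
) WITH HOLDOUT (30 PERCENT)
```

After training, the complementary filter IsTrainingCase() returns the remaining cases.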
Query a model
SELECT * FROM ClusterDrillthrough.CASES WHERE IsInNode('001')
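The .CASES query above requires that drillthrough is enabled on the model. A model's learned content can be inspected without drillthrough; the sketch below assumes the [PredictGender-Tree] model created earlier has been trained.

```sql
-- List the nodes of the trained model with their captions and support counts.
SELECT NODE_UNIQUE_NAME, NODE_CAPTION, NODE_SUPPORT
FROM [PredictGender-Tree].CONTENT
```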
Prediction query
SELECT t.[Name], Predict([Gender]) AS PredictedGender
FROM [PredictGender-Bayes]
PREDICTION JOIN
    OPENQUERY(Chapter3Data,
        'SELECT [Key], [Name], [Age], [CarModel] FROM [NewPeople]') AS t
ON  [PredictGender-Bayes].[Age] = t.[Age]
AND [PredictGender-Bayes].[CarModel] = t.[CarModel]
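A prediction is often more useful alongside its confidence. As a sketch, the same query can also return PredictProbability, assuming the [PredictGender-Bayes] model and the Chapter3Data data source exist as in the query above.

```sql
-- Return the predicted gender and the probability of that prediction
-- for every row in the NewPeople table.
SELECT
    t.[Name],
    Predict([Gender]) AS PredictedGender,
    PredictProbability([Gender]) AS Probability
FROM [PredictGender-Bayes]
PREDICTION JOIN
    OPENQUERY(Chapter3Data,
        'SELECT [Key], [Name], [Age], [CarModel] FROM [NewPeople]') AS t
ON  [PredictGender-Bayes].[Age] = t.[Age]
AND [PredictGender-Bayes].[CarModel] = t.[CarModel]
```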
SQL Server provides the following schema rowsets that can be queried:
DMSCHEMA_MINING_SERVICES,
DMSCHEMA_MINING_SERVICE_PARAMETERS,
DMSCHEMA_MINING_MODELS,
DMSCHEMA_MINING_COLUMNS,
DMSCHEMA_MINING_MODEL_CONTENT,
DMSCHEMA_MINING_FUNCTIONS,
DMSCHEMA_MINING_STRUCTURES,
DMSCHEMA_MINING_STRUCTURE_COLUMNS,
DMSCHEMA_MINING_MODEL_XML,
DMSCHEMA_MINING_MODEL_PMML
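As a sketch of how these rowsets are queried, assuming Analysis Services 2008 or later where they are exposed under the $system schema, the installed algorithms can be listed from a DMX query window:

```sql
-- One row per data mining algorithm installed on the server.
SELECT SERVICE_NAME
FROM $system.DMSCHEMA_MINING_SERVICES
```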
These queries show that SQL Server provides nine algorithms:
service_name
Microsoft_Association_Rules
Microsoft_Clustering
Microsoft_Decision_Trees
Microsoft_Naive_Bayes
Microsoft_Neural_Network
Microsoft_Sequence_Clustering
Microsoft_Time_Series
Microsoft_Linear_Regression
Microsoft_Logistic_Regression
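In practice there are only seven distinct algorithms: linear regression is a variant of Decision Trees, and logistic regression is a variant of Neural Network.
The same queries also show which content types each algorithm supports, for example: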
service_name supported_input_content_types
Microsoft_Association_Rules Cyclical,Discrete,Discretized,Key,Table,Ordered
Continuous is not in the list, so Microsoft_Association_Rules does not support continuous data.
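A result like the one above might be obtained with a query along these lines, again assuming the $system schema is available:

```sql
-- Show which input content types the association rules algorithm accepts.
SELECT SERVICE_NAME, SUPPORTED_INPUT_CONTENT_TYPES
FROM $system.DMSCHEMA_MINING_SERVICES
WHERE SERVICE_NAME = 'Microsoft_Association_Rules'
```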
They also list the functions each algorithm supports. For example, Naive Bayes supports the following functions:
function_name
Predict
Predict
PredictAdjustedProbability
PredictAssociation
PredictHistogram
PredictNodeId
PredictProbability
PredictSupport
$AdjustedProbability
$NodeId
$Probability
$Support
BottomCount
BottomPercent
BottomSum
IsDescendent
RangeMax
RangeMid
RangeMin
TopCount
TopPercent
TopSum
IsTrainingCase
IsTestCase
Exists
StructureColumn
StructureColumn
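The list above can be reproduced with a query such as the following sketch. A function name appears more than once when it has several signatures, which the FUNCTION_SIGNATURE column makes visible.

```sql
-- List every function available to Naive Bayes models, with its signature.
SELECT FUNCTION_NAME, FUNCTION_SIGNATURE
FROM $system.DMSCHEMA_MINING_FUNCTIONS
WHERE SERVICE_NAME = 'Microsoft_Naive_Bayes'
```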
The parameters each algorithm accepts are listed as well. For example, Clustering accepts the following parameters:
parameter_name
CLUSTER_COUNT
CLUSTER_SEED
CLUSTERING_METHOD
MAXIMUM_INPUT_ATTRIBUTES
MAXIMUM_STATES
MINIMUM_SUPPORT
MODELLING_CARDINALITY
SAMPLE_SIZE
STOPPING_TOLERANCE
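The sketch below shows how the parameter list might be retrieved and how a parameter is then passed in the USING clause. The model name [PeopleClusters] and the value 5 are assumptions for illustration; the two statements should be run one at a time.

```sql
-- Statement 1: list the parameters accepted by the clustering algorithm.
SELECT PARAMETER_NAME, PARAMETER_TYPE, DESCRIPTION
FROM $system.DMSCHEMA_MINING_SERVICE_PARAMETERS
WHERE SERVICE_NAME = 'Microsoft_Clustering'

-- Statement 2 (run separately): add a hypothetical clustering model to the
-- [People] structure, asking for five clusters and enabling drillthrough.
ALTER MINING STRUCTURE [People]
ADD MINING MODEL [PeopleClusters] (
    [CustID],
    [Gender],
    [Age],
    [CarModel]
) USING Microsoft_Clustering (CLUSTER_COUNT = 5)
WITH DRILLTHROUGH
```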