The difference between SEMMA and CRISP-DM

SEMMA stands for Sample, Explore, Modify, Model, Assess.

CRISP-DM stands for Cross-Industry Standard Process for Data Mining.

SEMMA and CRISP-DM are both process models used in the field of data mining and machine learning to guide the steps involved in developing predictive models and extracting useful insights from data. 

While they share some similarities, they also have distinct differences. Below is a comparison of SEMMA and CRISP-DM.

Origin and Purpose of SEMMA and CRISP-DM

CRISP-DM: Developed in the late 1990s, CRISP-DM is a comprehensive and widely recognized framework for data mining projects. It was designed to provide a structured approach to guide the entire data mining process, from understanding business objectives to deploying models.

SEMMA: SEMMA was developed by SAS (a software company) as a framework for its data mining software. It concentrates on the technical steps around model building and is closely tied to SAS's software suite, although it has also been applied more broadly to data analysis and modeling.

Six phases with CRISP-DM vs Five phases with SEMMA

CRISP-DM: CRISP-DM defines six distinct phases: 

  1. business understanding, 
  2. data understanding, 
  3. data preparation, 
  4. modeling, 
  5. evaluation, and 
  6. deployment.

CRISP-DM covers the entire data mining project lifecycle, including understanding business goals, data collection and preparation, model building, evaluation, and deployment.
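The six phases above can be written down as an ordered structure. This is only a minimal sketch: the names `CrispDmPhase`, `next_phase`, and `remaining_phases` are hypothetical, and the code models just the nominal forward order, whereas real projects often loop back to earlier phases.

```python
from enum import Enum


class CrispDmPhase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(phase):
    """Return the phase that nominally follows `phase`, or None after deployment."""
    if phase is CrispDmPhase.DEPLOYMENT:
        return None
    return CrispDmPhase(phase.value + 1)


def remaining_phases(phase):
    """List every phase from `phase` through deployment, inclusive."""
    return [p for p in CrispDmPhase if p.value >= phase.value]
```

For example, `remaining_phases(CrispDmPhase.EVALUATION)` lists evaluation and deployment, the two phases still ahead once a model has been built.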

SEMMA: SEMMA outlines five key phases: 

  1. sample, 
  2. explore, 
  3. modify, 
  4. model, and 
  5. assess.

SEMMA concentrates on the technical, model-building part of a project: it offers guidance on data sampling, exploration, modification, modeling, and model assessment, but not on business understanding or deployment.
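To make the five steps concrete, here is a rough sketch of one pass through them using only the Python standard library. Everything in it is an assumption made for illustration: the synthetic data from `make_population`, the choice of a straight-line model, and all function names. It is not how SAS implements SEMMA.

```python
import random
import statistics


def make_population(n, seed=0):
    """Hypothetical data source: y is roughly 3*x + 2 plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3.0 * x + 2.0 + rng.gauss(0, 1)))
    return rows


def sample(rows, k, seed=0):
    """Sample: draw a manageable random subset of the data."""
    return random.Random(seed).sample(rows, k)


def explore(rows):
    """Explore: summarise the input variable before changing anything."""
    xs = [x for x, _ in rows]
    return {"mean_x": statistics.mean(xs), "stdev_x": statistics.stdev(xs)}


def modify(rows, summary):
    """Modify: standardise x using the statistics found during exploration."""
    return [((x - summary["mean_x"]) / summary["stdev_x"], y) for x, y in rows]


def model(rows):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def assess(rows, fitted):
    """Assess: mean squared error of the fitted line on the given rows."""
    slope, intercept = fitted
    return statistics.mean([(y - (slope * x + intercept)) ** 2 for x, y in rows])


def run_semma():
    """Run the five steps once, holding out part of the sample for assessment."""
    drawn = sample(make_population(1000), 200)
    train, holdout = drawn[:150], drawn[150:]
    summary = explore(train)
    fitted = model(modify(train, summary))
    mse = assess(modify(holdout, summary), fitted)
    return summary, fitted, mse
```

Note that nothing in this sketch asks why the model is being built or where it will run afterwards, which matches SEMMA's narrower scope.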

Which is more flexible: SEMMA or CRISP-DM?

CRISP-DM: CRISP-DM is generally considered the more flexible and comprehensive framework. It is not tied to any vendor's tools, which makes it suitable for a wide range of data mining and machine learning projects.

SEMMA: SEMMA is more specific to SAS software and is often used as a companion to other, more comprehensive methodologies like CRISP-DM.
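One way to picture SEMMA as a companion to CRISP-DM is to line its steps up against the CRISP-DM phases. The mapping below is an illustrative assumption, not part of either methodology (some practitioners would place "sample" under data preparation instead), and the names `SEMMA_TO_CRISP_DM` and `uncovered_phases` are hypothetical.

```python
# The six CRISP-DM phases, in order.
CRISP_DM_PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
]

# Assumed alignment of each SEMMA step with the closest CRISP-DM phase.
SEMMA_TO_CRISP_DM = {
    "sample": "data understanding",
    "explore": "data understanding",
    "modify": "data preparation",
    "model": "modeling",
    "assess": "evaluation",
}


def uncovered_phases():
    """Return the CRISP-DM phases that no SEMMA step maps onto."""
    covered = set(SEMMA_TO_CRISP_DM.values())
    return [phase for phase in CRISP_DM_PHASES if phase not in covered]
```

Under this assumed mapping, `uncovered_phases()` returns business understanding and deployment, the two ends of the lifecycle that a broader methodology has to supply.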

Which is more practical?

CRISP-DM is widely adopted and has extensive documentation and support from the data mining community. It is generally seen as a practical and effective methodology for data mining projects.

SEMMA, while useful for model-building within the SAS environment, may be less familiar and less widely adopted outside of the SAS user base.

CRISP-DM Lifecycle versus SEMMA Framework

CRISP-DM is a more comprehensive and widely accepted data mining process model that covers the entire project lifecycle. 

SEMMA, on the other hand, is a more specialized framework that focuses on the model-building steps and is closely associated with SAS software.

The choice between the two depends on the specific needs and tools of a given project, with CRISP-DM as a more general and flexible approach.