Methods used for finding patterns in data:

  • Cluster analysis – an algorithm finds groups of similar data points by examining the distances between points, their density, value ranges, etc. Models for cluster analysis:
    • connectivity – organizes points based on how close they are to each other
    • partitioning – each data point is assigned to a cluster (the most commonly used algorithm is “K-means”)
    • distribution model – assumes the points in a cluster follow a common statistical distribution
    • density analysis – based on how close points are – DBSCAN for highly concentrated data, OPTICS for more broadly distributed data (clusters of varying density)
    • Clusters can be:
      • hard – every point belongs to exactly one cluster
      • soft – a point can belong to more than one cluster
    • Rules for partitioning – strict, overlapping, hierarchical
  • Detection of anomalies
  • Association rules – if-then relationships between items that frequently occur together in the data; they represent a set of decisions that can be made based on the data we have.
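
To make the partitioning model from the list above more concrete, here is a minimal pure-Python sketch of K-means producing a hard clustering. The function name `kmeans`, the sample points, and the choice to use the first k points as initial centroids are assumptions made for illustration; practical implementations usually pick the initial centroids more carefully.

```python
import math


def kmeans(points, k, max_iter=100):
    """Hard partitioning: every point ends up in exactly one of k clusters."""
    # Assumption for this sketch: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)     # cluster index of each point
print(centroids)  # final cluster centers
```

Each entry of `labels` holds a single cluster index, which is what makes the result a hard clustering; a soft variant would return a membership weight per cluster for each point instead.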