by admin | Nov 22, 2015 | R language - statistics, data mining

To find patterns and relationships in text we can use the same techniques as for data sets based on numbers. The source is the “document term matrix” we created using the techniques for text preparation. k-means clustering; find frequent words – in R:...
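A minimal sketch of the idea in base R: build a small document term matrix by hand, list the frequent words, and run k-means on the rows. The toy documents, the threshold of 2 and all variable names are assumptions made for illustration; the excerpt's own code is cut off and may well rely on the tm package instead.

```r
# Toy documents (made up for illustration; not from the original post).
docs <- c(
  "cats chase mice",
  "mice fear cats",
  "stocks rise today",
  "stocks fall today"
)

# Split each document into words and collect the vocabulary.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Document term matrix: one row per document, one column per term,
# each cell holding how often the term occurs in that document.
dtm <- t(vapply(
  tokens,
  function(words) as.numeric(tabulate(match(words, vocab), nbins = length(vocab))),
  numeric(length(vocab))
))
dimnames(dtm) <- list(paste0("doc", seq_along(docs)), vocab)

# Frequent words: terms whose total count reaches a chosen threshold (here 2).
term_counts <- colSums(dtm)
frequent    <- names(term_counts[term_counts >= 2])
frequent

# k-means on the rows of the matrix, just as with any numeric data set.
set.seed(1)
fit <- kmeans(dtm, centers = 2, nstart = 10)
fit$cluster
```

Documents that share vocabulary end up in the same cluster because their rows in the matrix are close to each other.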
by admin | Nov 22, 2015 | R language - statistics, data mining

To process text we need to make some preparations: convert the whole text to lowercase – in R: tm_map(ourtextvariable, tolower); remove punctuation – in R: tm_map(ourtextvariable, removePunctuation); remove numbers – in R: tm_map(ourtextvariable,...
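The same preparation steps can be sketched on a plain character vector with base R only, which makes it easy to see what each step does. The sample strings and the variable names `ourtext` and `prepared` are assumptions, and the two whitespace steps at the end are an extra tidy-up that the visible part of the excerpt does not list. With the tm calls quoted above, recent tm versions usually expect base functions to be wrapped, as in content_transformer(tolower).

```r
# Made-up sample text; 'ourtext' is a hypothetical variable name.
ourtext <- c("Hello, World! 123", "R is GREAT: 2 times better.")

prepared <- tolower(ourtext)                     # convert to lowercase
prepared <- gsub("[[:punct:]]", "", prepared)    # remove punctuation
prepared <- gsub("[[:digit:]]", "", prepared)    # remove numbers

# Extra tidy-up (an assumption, not a step shown in the excerpt):
prepared <- gsub("[[:space:]]+", " ", prepared)  # collapse repeated whitespace
prepared <- trimws(prepared)                     # drop leading/trailing spaces
prepared
```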
by admin | Nov 22, 2015 | R language - statistics, data mining

Let’s try density on the data from the k-means example:

> dens1 <- density(aggtime)
> dens1

Call:
	density.default(x = aggtime)

Data: aggtime (14 obs.);	Bandwidth 'bw' = 10.25

       x                 y
 Min.   :-29.855   Min.   :3.287e-05
 1st Qu.:  1.298   1st Qu.:2.239e-03
 Median : 32.450...
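A small, self-contained sketch of what density() returns may help when reading that output. The vector `times` is a made-up stand-in with 14 values, since the contents of aggtime are not shown here.

```r
# Hypothetical stand-in for 'aggtime' (14 made-up observations).
times <- c(12, 15, 18, 19, 20, 21, 22, 24, 25, 27, 55, 58, 62, 65)

dens <- density(times)   # Gaussian kernel and default bandwidth rule

dens$bw                  # the bandwidth that was chosen
length(dens$x)           # number of grid points (512 by default)
range(dens$x)            # the grid extends beyond the data on both sides

# The estimated curve integrates to roughly 1 over the grid.
area <- sum(dens$y) * diff(dens$x[1:2])
area

# Location of the highest peak of the estimate.
peak <- dens$x[which.max(dens$y)]
peak
# plot(dens)  # draws the density curve
```

This also explains the negative minimum of x in the printed summary: the evaluation grid starts a few bandwidths below the smallest observation.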
by admin | Nov 21, 2015 | R language - statistics, data mining

A very important pattern-finding method in data mining (data analytics). It is the basic method for intrusion / fraud detection and system health checks. All three of these areas need to know about anomalies, which are very different from other data points. Very simple...
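As one possible illustration of flagging points that are very different from the rest, the sketch below uses the boxplot rule (1.5 times the interquartile range) from base R. This is a generic technique chosen for the example and not necessarily the method the post goes on to describe; the data are made up.

```r
# Made-up response times in ms; one value is clearly unusual.
x <- c(10, 12, 11, 13, 12, 11, 10, 14, 12, 95)

# Boxplot rule: points further than 1.5 * IQR from the hinges are outliers.
outliers <- boxplot.stats(x)$out
outliers

# Positions of those points in the original data.
positions <- which(x %in% outliers)
positions
```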
by admin | Nov 21, 2015 | R language - statistics, data mining

Clusters form a hierarchy: smaller clusters are part of bigger clusters, and so on. There are two ways to achieve this. Logically they are “from the top to the bottom” and vice versa. Implementation in R: “hclust” Clusters...
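A short sketch of hclust on made-up points, assuming Euclidean distances and complete linkage (both are choices made for this example). hclust works bottom-up: it starts with every point as its own cluster and merges the closest pair at each step.

```r
# Made-up 2-D points forming two visible groups.
pts <- rbind(
  a = c(1.0, 1.0), b = c(1.2, 0.9), c = c(0.8, 1.1),
  d = c(5.0, 5.0), e = c(5.1, 4.8), f = c(4.9, 5.2)
)

d    <- dist(pts)                       # pairwise Euclidean distances
tree <- hclust(d, method = "complete")  # bottom-up (agglomerative) merging

tree$merge                     # which clusters were joined at each step
groups <- cutree(tree, k = 2)  # cut the hierarchy into 2 clusters
groups
# plot(tree)  # draws the dendrogram
```

Cutting the same tree with a different k gives a finer or coarser grouping without rerunning the clustering.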
by admin | Nov 21, 2015 | R language - statistics, data mining

This is similar to “k-means” clustering, but it does not create new points as centroids. Instead it uses existing data points and tries to find “k” centroids among them. At the start it randomly chooses “k” data points and computes the distance of...
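The idea can be sketched in a few lines of base R. The function `k_medoids` below is a hypothetical helper written for this example: it alternates between assigning points to the nearest medoid and picking the best member of each cluster as its new medoid. This is a simplified variant, not the full PAM algorithm, and it assumes the data points are distinct. In practice the pam function from the cluster package is a common choice.

```r
# Simplified k-medoids sketch (alternating assign / update steps).
# 'k_medoids' is a hypothetical helper, not a function from any package.
k_medoids <- function(x, k, max_iter = 100) {
  d <- as.matrix(dist(x))            # all pairwise distances
  medoids <- sample(nrow(x), k)      # start from k randomly chosen data points
  for (i in seq_len(max_iter)) {
    # Assign every point to its nearest medoid.
    cluster <- apply(d[, medoids, drop = FALSE], 1, which.min)
    # In each cluster, the member with the smallest total distance
    # to the other members becomes the new medoid.
    new_medoids <- vapply(seq_len(k), function(j) {
      members <- which(cluster == j)
      members[which.min(colSums(d[members, members, drop = FALSE]))]
    }, integer(1))
    if (all(new_medoids == medoids)) break
    medoids <- new_medoids
  }
  # Final assignment for the medoids that were kept.
  cluster <- apply(d[, medoids, drop = FALSE], 1, which.min)
  list(medoids = medoids, cluster = unname(cluster))
}

# Usage on made-up points with two visible groups.
set.seed(1)
pts <- rbind(c(1.0, 1.0), c(1.2, 0.9), c(0.8, 1.1),
             c(5.0, 5.0), c(5.1, 4.8), c(4.9, 5.2))
fit <- k_medoids(pts, k = 2)
pts[fit$medoids, ]   # the medoids are rows of the original data
fit$cluster
```

Because the centers are always actual observations, the result stays interpretable even when averaging data points would not make sense.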