Class: knn

knn()

new knn()

KNN module. The approach of kNN clustering via Levenshtein distance is taken from OpenRefine: https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth

Methods

(static) addNgram(str, id, ngrams) → {Void}

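Helper function that adds str, together with the id of its element in data, to the ngrams collection.
Parameters:
Name    Type     Description
str     string   String to be added to the ngrams collection.
id      integer  Id of the element in data.
ngrams  object   Collection of ngrams that str is added to.
Returns:
Nothing; the ngrams object is modified in place.
Type
Void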
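A minimal sketch of what addNgram might do, assuming ngrams is a plain object that maps each ngram string to an array of element ids; the real data shape is not documented above. Creating the object with Object.create(null) keeps an ngram such as "__proto__" from being treated as a special property.

```javascript
// Hypothetical sketch: register element `id` under the ngram `str`.
// Assumes `ngrams` is an object mapping ngram -> array of ids.
function addNgram(str, id, ngrams) {
  if (!Object.prototype.hasOwnProperty.call(ngrams, str)) {
    ngrams[str] = [];
  }
  // Avoid listing the same element twice when a string repeats an ngram.
  if (!ngrams[str].includes(id)) {
    ngrams[str].push(id);
  }
}

// Null prototype: every ngram string is a safe key.
const ngrams = Object.create(null);
addNgram("ber", 0, ngrams);
addNgram("ber", 2, ngrams);
addNgram("ber", 2, ngrams); // duplicate, ignored
console.log(ngrams);
```

The function returns nothing, matching the documented Void return type; its only effect is the mutation of ngrams.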

(static) analyse(_clusters, _data, limit, type) → {object}

Analyses the ngram groups by comparing the strings within each group.
Parameters:
Name       Type    Description
_clusters  object  Created in module.prepare.
_data      array   Original array of strings.
limit      float   Threshold at which two strings are assumed to be similar (Levenshtein distance as a percentage), default: 0.1.
type       string  How the Levenshtein distance is checked: absolute | percentage, default: percentage.
Returns:
results.
Type
object
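A sketch of the pairwise analysis under stated assumptions: _clusters maps each ngram to an array of ids into _data, and the per-pair Levenshtein check is passed in as a hypothetical isSimilar callback instead of the documented limit and type parameters. Because two similar strings usually share many ngrams, the same pair appears in many groups, so the sketch compares each pair only once.

```javascript
// Hypothetical sketch: compare each pair of ids that share an ngram group
// exactly once and record the pairs that isSimilar accepts.
// Assumes _clusters is an object mapping ngram -> array of ids into _data.
function analyse(_clusters, _data, isSimilar) {
  const seen = new Set(); // "a|b" keys of pairs already compared
  const results = {};     // id -> ids judged similar to it
  for (const ids of Object.values(_clusters)) {
    for (let i = 0; i < ids.length; i++) {
      for (let j = i + 1; j < ids.length; j++) {
        const a = Math.min(ids[i], ids[j]);
        const b = Math.max(ids[i], ids[j]);
        const key = `${a}|${b}`;
        if (seen.has(key)) continue; // pair already compared in another group
        seen.add(key);
        if (isSimilar(_data[a], _data[b])) {
          (results[a] = results[a] || []).push(b);
          (results[b] = results[b] || []).push(a);
        }
      }
    }
  }
  return results;
}

const data = ["Berlin", "berlin", "Bern"];
const groups = { Ber: [0, 2], erl: [0, 1], rli: [0, 1], lin: [0, 1] };
// Stand-in predicate for this example only: case-insensitive equality.
const results = analyse(groups, data, (x, y) => x.toLowerCase() === y.toLowerCase());
console.log(results);
```

Here ids 0 and 1 share three ngram groups, yet the predicate runs on that pair once.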

(static) cluster(results) → {object}

Clusters the results of module.analyse.
Parameters:
Name     Type    Description
results  object  Created in module.analyse.
Returns:
outClusters - Clustered map.
Type
object
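One plausible way to turn pairwise matches into clusters is transitive merging with union-find. Whether module.cluster works this way is an assumption, and OpenRefine's own kNN clusterer may group neighbours differently. The sketch assumes results maps each id to the ids it was found similar to, and returns a Map from a representative id to the ids in its cluster.

```javascript
// Hypothetical sketch: merge pairwise matches into clusters by transitive
// closure, using a union-find (disjoint-set) structure with path halving.
function cluster(results) {
  const parent = new Map();
  const find = (x) => {
    if (!parent.has(x)) parent.set(x, x);
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x))); // path halving
      x = parent.get(x);
    }
    return x;
  };
  const union = (a, b) => {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(rb, ra);
  };
  for (const [id, neighbours] of Object.entries(results)) {
    for (const other of neighbours) union(Number(id), other);
  }
  // Group every id under the root of its set.
  const outClusters = new Map();
  for (const id of parent.keys()) {
    const root = find(id);
    if (!outClusters.has(root)) outClusters.set(root, []);
    outClusters.get(root).push(id);
  }
  return outClusters;
}

const merged = cluster({ 0: [1], 1: [0, 3], 3: [1], 5: [6], 6: [5] });
console.log(merged);
```

The trade-off of transitive merging: if A matches B and B matches C, all three end up in one cluster even when A and C alone would exceed the limit.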

(static) prepare(_data, ngramSize) → {object}

Collects strings into groups whose members share an ngram.
Parameters:
Name       Type     Description
_data      array    Array of strings.
ngramSize  integer  Size of the ngrams.
Returns:
Clusters of ngram groups.
Type
object
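This grouping step corresponds to what the OpenRefine clustering documentation calls blocking: only strings that share an ngram are compared later, which avoids comparing every pair in the column. A minimal sketch, assuming character ngrams taken from each string as-is (no case folding) and a plain object keyed by ngram as the return value:

```javascript
// Hypothetical sketch of ngram blocking: each string is split into
// overlapping character ngrams, and strings sharing an ngram land in the
// same group. Only groups with two or more members need comparing.
function prepare(_data, ngramSize) {
  const groups = Object.create(null); // null prototype: any ngram is a safe key
  _data.forEach((value, id) => {
    const str = String(value);
    // A string shorter than ngramSize yields itself as its only ngram.
    const last = Math.max(0, str.length - ngramSize);
    for (let i = 0; i <= last; i++) {
      const gram = str.slice(i, i + ngramSize);
      if (!groups[gram]) groups[gram] = [];
      if (!groups[gram].includes(id)) groups[gram].push(id);
    }
  });
  // Drop singleton groups: a lone string has nothing to be compared with.
  const clusters = Object.create(null);
  for (const gram of Object.keys(groups)) {
    if (groups[gram].length > 1) clusters[gram] = groups[gram];
  }
  return clusters;
}

const clusters = prepare(["Berlin", "berlin", "Bern"], 3);
console.log(clusters);
```

A larger ngramSize produces fewer, tighter groups (fewer comparisons, but more risk of missing near-duplicates); a smaller one does the opposite.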

(static) process(id, clusters, data, limit, type) → {object}

Calculates and reports the Levenshtein distances for a group of strings.
Parameters:
Name      Type     Description
id        integer  Id of the item in the array.
clusters  array    Strings of the group.
data      array    Original data.
limit     float    Threshold at which two strings are assumed to be similar (Levenshtein distance as a percentage), default: 0.1.
type      string   How the Levenshtein distance is checked: absolute | percentage, default: percentage.
Returns:
results.
Type
object
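A sketch of the distance step, assuming clusters holds the ids of one ngram group (indices into data) and that the percentage check divides the edit distance by the length of the longer string; both are assumptions. The function is given the hypothetical name processGroup so it does not shadow Node's global process. The Levenshtein distance itself is the classic two-row dynamic-programming version.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions turning a into b (two rolling rows).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical sketch: compare data[id] with every other member of its group.
// "percentage" (assumed): distance / length of the longer string.
// "absolute": raw edit count compared with limit.
function processGroup(id, clusters, data, limit = 0.1, type = "percentage") {
  const results = [];
  for (const other of clusters) {
    if (other === id) continue;
    const a = data[id];
    const b = data[other];
    const distance = levenshtein(a, b);
    const longest = Math.max(a.length, b.length);
    const score = type === "absolute" ? distance : longest === 0 ? 0 : distance / longest;
    if (score <= limit) results.push({ id: other, distance });
  }
  return results;
}

const data = ["Hamburg-Altona", "Hamburg Altona", "Hamburg"];
const matches = processGroup(0, [0, 1, 2], data);
console.log(matches);
```

With the default limit of 0.1, "Hamburg-Altona" and "Hamburg Altona" match (1 edit over 14 characters), while "Hamburg" does not (7 edits).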

(static) readableCluster(clusters, reduced_data, data) → {object}

Translates the clusters from module.cluster into an object that is easy to read and edit.
Parameters:
Name          Type    Description
clusters      object  Created in module.cluster.
reduced_data  array   Created in module.reduce.
data          array   Original data.
Returns:
Clustered map.
Type
object
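A sketch of one possible readable format, assuming clusters is a Map from a cluster key to an array of ids into reduced_data. The output shape, a suggested replacement value plus each member with its number of occurrences in data, is a hypothetical choice; counting occurrences lets the most common spelling be proposed as the default replacement.

```javascript
// Hypothetical sketch: turn id clusters into an editable structure.
// Assumes `clusters` is a Map: key -> array of ids into `reduced_data`;
// `data` is only used to count how often each value occurs.
function readableCluster(clusters, reduced_data, data) {
  const counts = new Map();
  for (const value of data) counts.set(value, (counts.get(value) || 0) + 1);

  const readable = [];
  for (const ids of clusters.values()) {
    const members = ids.map((id) => ({
      value: reduced_data[id],
      count: counts.get(reduced_data[id]) || 0,
    }));
    // Most frequent spelling first (sort is stable, so ties keep cluster order).
    members.sort((a, b) => b.count - a.count);
    readable.push({ replacement: members[0].value, members });
  }
  return readable;
}

const data = ["Berlin", "berlin", "Berlin", "Bern"];
const reduced = ["Berlin", "berlin", "Bern"];
const readable = readableCluster(new Map([[0, [0, 1]]]), reduced, data);
console.log(JSON.stringify(readable, null, 2));
```

A user can then edit replacement per cluster before the values are written back to the column.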

(static) reduce(_data) → {array}

Removes duplicates from an array.
Parameters:
Name   Type   Description
_data  array  Array of strings.
Returns:
reduced_column: the array without duplicates.
Type
array
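Deduplication can be done with a Set, which keeps the order of first occurrence; this is a minimal sketch, not necessarily the module's implementation. Working on distinct values presumably means each spelling is compared only once in the later steps, however often it appears in the column.

```javascript
// Hypothetical sketch of reduce: drop duplicate strings while keeping the
// order of first occurrence. Set uses SameValueZero equality, so
// "Berlin" and "berlin" remain distinct entries.
function reduce(_data) {
  return [...new Set(_data)];
}

const reduced_column = reduce(["Berlin", "berlin", "Berlin", "Bern"]);
console.log(reduced_column); // [ 'Berlin', 'berlin', 'Bern' ]
```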