new knn()
KNN Module.
The concept of kNN (nearest-neighbour) clustering via Levenshtein distance is taken from OpenRefine:
https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth
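The whole module revolves around the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions that turn one string into another. For reference, here is the textbook dynamic-programming version of that distance. It is a sketch, not taken from the module's source, and the function name `levenshtein` is an assumption.

```javascript
// Textbook dynamic-programming Levenshtein distance (not the module's own code).
// Only two rows of the edit-distance table are kept in memory.
function levenshtein(a, b) {
  // prev[j] = distance between the first i - 1 characters of a and the first j of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning the first i characters of a into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i - 1]
        curr[j - 1] + 1, // insert b[j - 1]
        prev[j - 1] + cost // substitute, or keep a matching character
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3
```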
Methods
(static) addNgram(str, id, ngrams) → {Void}
Helper function that adds the ngram str of data element id to ngrams.
Parameters:
Name | Type | Description |
---|---|---|
str | string | String to be added to the ngrams array. |
id | integer | ID of the element in the data. |
ngrams | object | Array of ngrams to which str is added. |
Returns:
Nothing; ngrams is updated in place.
Type: Void
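A minimal sketch of such a helper, assuming that `ngrams` is a plain object mapping each ngram to the list of data ids that contain it. The documentation calls it an array, so the module's real structure may differ.

```javascript
// Hypothetical sketch: `ngrams` is assumed to map ngram -> array of data ids.
function addNgram(str, id, ngrams) {
  // hasOwnProperty avoids clashing with inherited keys such as "constructor"
  if (!Object.prototype.hasOwnProperty.call(ngrams, str)) {
    ngrams[str] = [];
  }
  // Record each id only once, even if the ngram occurs twice in one string
  if (!ngrams[str].includes(id)) {
    ngrams[str].push(id);
  }
}

const ngrams = {};
addNgram("ap", 0, ngrams);
addNgram("ap", 3, ngrams);
addNgram("ap", 3, ngrams);
console.log(ngrams.ap); // [ 0, 3 ]
```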
(static) analyse(_clusters, _data, limit, type) → {object}
Analyse the ngram groups created by module.prepare, comparing the strings in each group by Levenshtein distance.
Parameters:
Name | Type | Description |
---|---|---|
_clusters | object | Created by module.prepare. |
_data | array | Original array of strings. |
limit | float | Threshold at which two strings are considered similar (Levenshtein distance, interpreted according to type); default: 0.1. |
type | string | How the Levenshtein distance is checked against limit: absolute or percentage; default: percentage. |
Returns:
The analysis results.
Type: object
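One possible shape for this step, sketched under several assumptions: each group is an array of indices into `_data`, `percentage` means the distance divided by the length of the longer string, and a pair counts as similar when that score is at most `limit`. Because two strings often share several ngrams, each pair is compared only once. The name `analyse` mirrors the documented method, but the return format (an array of `[i, j]` index pairs) is invented for illustration.

```javascript
// Standard Levenshtein distance (two-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical sketch: compare the strings inside every ngram group and
// collect the index pairs whose score is within `limit`.
function analyse(groups, data, limit = 0.1, type = "percentage") {
  const seen = new Set(); // pairs already compared in an earlier group
  const results = [];
  for (const ids of groups) {
    for (let x = 0; x < ids.length; x++) {
      for (let y = x + 1; y < ids.length; y++) {
        const i = Math.min(ids[x], ids[y]);
        const j = Math.max(ids[x], ids[y]);
        const key = `${i},${j}`;
        if (seen.has(key)) continue;
        seen.add(key);
        const distance = levenshtein(data[i], data[j]);
        // Assumption: "percentage" = distance relative to the longer string
        const score =
          type === "absolute"
            ? distance
            : distance / Math.max(data[i].length, data[j].length, 1);
        if (score <= limit) results.push([i, j]);
      }
    }
  }
  return results;
}

const data = ["Hamburg", "Hamburgh", "Harburg"];
console.log(analyse([[0, 1], [0, 1, 2]], data, 0.2)); // [ [ 0, 1 ], [ 0, 2 ] ]
```

With the default limit of 0.1 none of these pairs qualifies (their scores are 0.125, about 0.143 and 0.25), which shows how sensitive the result is to that threshold.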
(static) cluster(results) → {object}
Cluster the results of module.analyse.
Parameters:
Name | Type | Description |
---|---|---|
results | object | Created by module.analyse. |
Returns:
outClusters: the clustered map.
Type: object
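A sketch of one way to turn similar pairs into clusters, assuming `results` is an array of `[i, j]` index pairs (an invented format). Each pair is treated as an edge of a graph, and the connected components are collected with a union-find structure. This makes similarity transitive (if A is close to B and B is close to C, all three end up together), which is a design choice rather than something the documentation states.

```javascript
// Hypothetical sketch: group ids connected by similar pairs (union-find).
function cluster(results) {
  const parent = new Map(); // id -> parent id; roots point to themselves

  function find(x) {
    if (!parent.has(x)) parent.set(x, x);
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x))); // path halving
      x = parent.get(x);
    }
    return x;
  }

  for (const [a, b] of results) {
    const rootA = find(a);
    const rootB = find(b);
    if (rootA !== rootB) parent.set(rootB, rootA); // merge the two sets
  }

  const outClusters = new Map(); // root id -> ids in that cluster
  for (const id of parent.keys()) {
    const root = find(id);
    if (!outClusters.has(root)) outClusters.set(root, []);
    outClusters.get(root).push(id);
  }
  return outClusters;
}

console.log([...cluster([[0, 1], [0, 2], [5, 6]]).values()]); // [ [ 0, 1, 2 ], [ 5, 6 ] ]
```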
(static) prepare(_data, ngramSize) → {object}
Collect strings into groups that share ngrams of length ngramSize.
Parameters:
Name | Type | Description |
---|---|---|
_data | array | Array of strings. |
ngramSize | integer | Size (length) of the ngrams. |
Returns:
Cluster of ngram groups (the _clusters input of module.analyse).
Type: object
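Grouping by shared ngrams means the expensive pairwise Levenshtein comparison only has to run inside each group rather than over every pair of strings. A minimal sketch, assuming character ngrams taken with a sliding window and groups returned as arrays of indices into `_data`. The module's actual return object may be shaped differently.

```javascript
// Hypothetical sketch of ngram blocking: strings that share a character
// ngram end up in the same group, and only groups of 2+ strings are kept.
function prepare(data, ngramSize) {
  const ngrams = new Map(); // ngram -> ids of the strings containing it
  data.forEach((str, id) => {
    // Strings shorter than ngramSize contribute themselves as one ngram
    const count = Math.max(1, str.length - ngramSize + 1);
    for (let i = 0; i < count; i++) {
      const gram = str.slice(i, i + ngramSize);
      if (!ngrams.has(gram)) ngrams.set(gram, []);
      const ids = ngrams.get(gram);
      // ids grow in order, so a repeat within one string is always the last entry
      if (ids[ids.length - 1] !== id) ids.push(id);
    }
  });
  // Drop singletons and collapse groups with identical members
  const groups = new Map();
  for (const ids of ngrams.values()) {
    if (ids.length > 1) groups.set(ids.join(","), ids);
  }
  return [...groups.values()];
}

console.log(prepare(["apple", "appel", "banana", "bananna"], 2)); // [ [ 0, 1 ], [ 2, 3 ] ]
```

Larger ngram sizes give smaller, stricter groups (fewer comparisons, but more missed matches); smaller sizes do the opposite.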
(static) process(id, clusters, data, limit, type) → {object}
Calculate and report the Levenshtein distance for a group of strings.
Parameters:
Name | Type | Description |
---|---|---|
id | integer | ID of the item in the array. |
clusters | array | Array of strings. |
data | array | Original data. |
limit | float | Threshold at which two strings are considered similar (Levenshtein distance, interpreted according to type); default: 0.1. |
type | string | How the Levenshtein distance is checked against limit: absolute or percentage; default: percentage. |
Returns:
Results.
Type: object
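A sketch of the per-group calculation, assuming the group is given as an array of indices into `data` and that `percentage` means the distance divided by the length of the longer string. The function is called `processGroup` here (a hypothetical name) so that it does not shadow Node's global `process`; the report format is likewise invented.

```javascript
// Standard Levenshtein distance (two-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical sketch: report the distance for every pair in one group.
function processGroup(ids, data, limit = 0.1, type = "percentage") {
  const report = [];
  for (let x = 0; x < ids.length; x++) {
    for (let y = x + 1; y < ids.length; y++) {
      const a = data[ids[x]];
      const b = data[ids[y]];
      const distance = levenshtein(a, b);
      // Assumption: "percentage" = distance relative to the longer string
      const score =
        type === "absolute" ? distance : distance / Math.max(a.length, b.length, 1);
      report.push({ pair: [ids[x], ids[y]], distance, similar: score <= limit });
    }
  }
  return report;
}

const data = ["Hamburg", "Hamburgh", "Harburg"];
for (const r of processGroup([0, 1, 2], data, 1, "absolute")) {
  console.log(r.pair, r.distance, r.similar);
}
```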
(static) readableCluster(clusters, reduced_data, data) → {object}
Translates the cluster from module.cluster into an easy to read and edit object.
Parameters:
Name | Type | Description |
---|---|---|
clusters | object | Created by module.cluster. |
reduced_data | array | Created by module.reduce. |
data | array | Original data. |
Returns:
Clustered map.
Type: object
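A sketch of one possible readable format, assuming `clusters` is an array of index arrays pointing into `reduced_data`, with `data` used only to count how often each variant occurs. Keying each cluster by its most frequent variant is an assumption made for illustration, not documented behaviour.

```javascript
// Hypothetical sketch: turn index clusters into { mostFrequentVariant: [...] }.
function readableCluster(clusters, reducedData, data) {
  // How often each distinct string occurs in the original data
  const counts = new Map();
  for (const value of data) counts.set(value, (counts.get(value) || 0) + 1);

  const readable = {};
  for (const ids of clusters) {
    if (ids.length === 0) continue;
    const variants = ids
      .map((id) => ({ value: reducedData[id], count: counts.get(reducedData[id]) || 0 }))
      .sort((p, q) => q.count - p.count); // most frequent first (stable sort)
    readable[variants[0].value] = variants;
  }
  return readable;
}

const data = ["Hamburg", "Hamburg", "Hamburgh", "Harburg", "Kiel"];
const reduced = ["Hamburg", "Hamburgh", "Harburg", "Kiel"];
console.log(readableCluster([[0, 1, 2]], reduced, data));
```

A flat object keyed by one variant is easy to inspect and edit by hand. The trade-off is that two clusters whose most frequent variant is the same string would overwrite each other.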
(static) reduce(_data) → {array}
Remove duplicates from an array of strings.
Parameters:
Name | Type | Description |
---|---|---|
_data | array | Array of strings. |
Returns:
reduced_column: the array without duplicates.
Type: array
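In JavaScript a `Set` keeps the first occurrence of each value in insertion order, so a minimal version of this step can be a one-liner. This is a sketch, not the module's code.

```javascript
// Sketch: drop duplicate strings, keeping the order of first occurrence.
function reduce(data) {
  return [...new Set(data)];
}

const reduced_column = reduce(["Kiel", "Hamburg", "Kiel", "Harburg"]);
console.log(reduced_column); // [ 'Kiel', 'Hamburg', 'Harburg' ]
```

`Set` compares strings exactly, so "Kiel" and "kiel" both survive. Any case or whitespace normalisation would have to happen before this step.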