Nearest neighbour
Posted: Thu Sep 04, 2014 11:43 am
Slightly random question, but can anyone recommend a fast algorithm for nearest-neighbor search in four(ish) dimensions? Heuristic/probabilistic solutions also welcome.
It's all integers. What I'm trying to construct is really a sort of spatial queue: when you enqueue an object it gets inserted into the space, and when you dequeue from a particular point in the space it gives you back the nearest object.

MessiahAndrw wrote:
What units are you using?
If your dataset is a grid (w, x, y, z) and you have a decimal number, just round it to the nearest whole number.
If it's freeform data, you can loop through each item and save the nearest one (O(n) complexity). The algorithm for calculating the distance between two 4D points is below. However, since you're just comparing them, you can use the distance squared (for non-negative distances, d1 < d2 implies d1 * d1 < d2 * d2), which saves you a computationally expensive square root: return mag2 instead of sqrt(mag2), and square-root the number later, once you've found the closest one, if needed.

Code: Select all
#include <math.h>  /* for sqrt */

float dist(float w1, float x1, float y1, float z1,
           float w2, float x2, float y2, float z2) {
    float dw = w1 - w2;
    float dx = x1 - x2;
    float dy = y1 - y2;
    float dz = z1 - z2;
    float mag2 = dw * dw + dx * dx + dy * dy + dz * dz;
    return sqrt(mag2);  /* return mag2 instead to skip the square root */
}
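As a concrete illustration of that linear scan, here's a minimal sketch using the squared distance throughout (the point4 type, array layout and function names are assumptions for illustration, not from the post):

Code: Select all
#include <stddef.h>

typedef struct { float w, x, y, z; } point4;

/* Squared distance between two 4D points; no square root is needed while
   we're only comparing candidates against each other. */
static float dist2(point4 a, point4 b) {
    float dw = a.w - b.w, dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dw * dw + dx * dx + dy * dy + dz * dz;
}

/* Linear O(n) scan: return the index of the point nearest to q,
   or -1 if the array is empty. */
static ptrdiff_t nearest_linear(const point4 *pts, size_t n, point4 q) {
    ptrdiff_t best = -1;
    float best_d2 = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d2 = dist2(pts[i], q);
        if (best < 0 || d2 < best_d2) {
            best = (ptrdiff_t)i;
            best_d2 = d2;
        }
    }
    return best;
}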
If you have a large dataset (millions of points!) and want reasonable or realtime performance, you'd probably want to spatially divide the dataset.
If the data is fairly evenly distributed across a closed area, you could probably make a 4D grid of 'buckets' and just scan your own bucket; if nothing is found, scan the buckets at +1 and -1 in every direction around you, then at +2 and -2, and so on.
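A rough sketch of that bucket scan, assuming the coordinates are small non-negative integers so each coordinate tuple maps directly to a bucket (GRID, the names and the linked-list layout are mine, not from the post). One subtlety: after the first hit you can't stop straight away, because a point one ring further out can still be closer in straight-line distance, so the loop keeps expanding until the ring radius squared reaches the best squared distance found.

Code: Select all
#include <stdlib.h>

#define GRID 16  /* buckets per axis; assumed coordinate range [0, GRID) */

typedef struct { int w, x, y, z; } ipoint4;
typedef struct bucket_node { ipoint4 p; struct bucket_node *next; } bucket_node;

/* One bucket (a linked list of points) per integer coordinate tuple. */
static bucket_node *buckets[GRID][GRID][GRID][GRID];

static void bucket_insert(ipoint4 p) {
    bucket_node *n = malloc(sizeof *n);
    if (!n) abort();
    n->p = p;
    n->next = buckets[p.w][p.x][p.y][p.z];
    buckets[p.w][p.x][p.y][p.z] = n;
}

static long dist2i(ipoint4 a, ipoint4 b) {
    long dw = a.w - b.w, dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dw * dw + dx * dx + dy * dy + dz * dz;
}

static int clampi(int v) { return v < 0 ? 0 : v >= GRID ? GRID - 1 : v; }

/* Scan a cube of buckets of growing radius r around q's bucket.  Any point
   in a bucket at Chebyshev radius r is at least distance r away, so once
   r * r >= best squared distance it is safe to stop.  (For brevity this
   rescans the whole cube at each radius instead of just the new shell.) */
static int bucket_nearest(ipoint4 q, ipoint4 *out) {
    long best_d2 = -1;
    for (int r = 0; r < GRID; r++) {
        if (best_d2 >= 0 && (long)r * r >= best_d2)
            break;
        for (int w = clampi(q.w - r); w <= clampi(q.w + r); w++)
        for (int x = clampi(q.x - r); x <= clampi(q.x + r); x++)
        for (int y = clampi(q.y - r); y <= clampi(q.y + r); y++)
        for (int z = clampi(q.z - r); z <= clampi(q.z + r); z++)
            for (bucket_node *n = buckets[w][x][y][z]; n; n = n->next) {
                long d2 = dist2i(n->p, q);
                if (best_d2 < 0 || d2 < best_d2) { best_d2 = d2; *out = n->p; }
            }
    }
    return best_d2 >= 0;
}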
If the data is unevenly distributed or in an open world, you could probably use a 4D variation of an octree (a tree with two children per axis, so 2^4 = 16 children per node in 4D). Walk down the tree from the top, scanning the children closest to your point first, and only scan the other children if their regions are closer than your nearest found point. A well-balanced 4D octree may get you O(log n) complexity.
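For the 16-way split, a node might look something like the sketch below (the layout and names are assumptions, not from the post): each of the four coordinates contributes one bit to the child index, giving 2^4 = 16 children. The pruning during the nearest search then works the same way as in the k-d tree sketch further down.

Code: Select all
/* 4D analogue of an octree node: space is split at a centre point, and each
   sign combination of (w, x, y, z) relative to that centre gets a child. */
typedef struct hextree_node {
    float cw, cx, cy, cz;            /* centre of this node's region   */
    struct hextree_node *child[16];  /* NULL where no child exists yet */
    /* ... points stored in leaves, region half-size, and so on ...    */
} hextree_node;

/* Index of the child whose region contains (w, x, y, z): one bit per axis,
   set when that coordinate falls on the 'high' side of the centre. */
static int hextree_child_index(const hextree_node *n,
                               float w, float x, float y, float z) {
    return (w >= n->cw ? 1 : 0)
         | (x >= n->cx ? 2 : 0)
         | (y >= n->cy ? 4 : 0)
         | (z >= n->cz ? 8 : 0);
}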
A k-d tree is similar to an octree, except that instead of having 16 children (in 4D), each node is split into two children at a point along a single axis. Searching a k-d tree starts with walking down it: find which axis the node is split along, test whether you're before or after the split, and scan that child first; only scan the other side if the split point is closer (along that axis) than the nearest point found so far.
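A sketch of that k-d tree descent in 4D (the struct layout and names are my own, for illustration): go down the side of each split that contains the query first, then visit the far side only when the splitting plane is closer than the best squared distance found so far.

Code: Select all
#include <stddef.h>

typedef struct kd_node {
    float p[4];               /* the point stored at this node        */
    int axis;                 /* 0..3: coordinate this node splits on */
    struct kd_node *left;     /* subtree with p[axis] below the split */
    struct kd_node *right;    /* subtree with p[axis] at or above it  */
} kd_node;

static float kd_dist2(const float a[4], const float b[4]) {
    float s = 0.0f;
    for (int i = 0; i < 4; i++) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

/* Recursive nearest-neighbour search over the k-d tree. */
static void kd_nearest(const kd_node *n, const float q[4],
                       const kd_node **best, float *best_d2) {
    if (!n)
        return;

    float d2 = kd_dist2(n->p, q);
    if (!*best || d2 < *best_d2) {
        *best = n;
        *best_d2 = d2;
    }

    /* Signed distance from the query to this node's splitting plane. */
    float diff = q[n->axis] - n->p[n->axis];
    const kd_node *near_side = diff < 0 ? n->left : n->right;
    const kd_node *far_side  = diff < 0 ? n->right : n->left;

    kd_nearest(near_side, q, best, best_d2);
    if (diff * diff < *best_d2)   /* far side can still hold a closer point */
        kd_nearest(far_side, q, best, best_d2);
}

Call it as kd_nearest(root, query, &best, &best_d2) with best initialised to NULL; on return, best points at the nearest node and best_d2 holds its squared distance.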
Feel free to ask any questions if none of that made sense.
Code: Select all
/* Variant of the squared-distance calculation for when the squared z
   difference has already been computed elsewhere: only the w, x and y
   differences remain to be squared and summed. */
float dist_squared(float w1, float x1, float y1,
                   float w2, float x2, float y2, float dz_squared) {
    float dw = w1 - w2;
    float dx = x1 - x2;
    float dy = y1 - y2;
    return dw * dw + dx * dx + dy * dy + dz_squared;
}
Code: Select all
/* Variant for when all four squared component differences are already
   known: the squared distance is just their sum. */
float dist_squared(float w_squared, float x_squared, float y_squared, float z_squared) {
    return w_squared + x_squared + y_squared + z_squared;
}