Random search in 2da

I need to find in a small 2da all the lines that contain in a column, a certain parameter and then get one randomly, I have absolutely no idea how to get it, someone can help me.

A typical way to do this would be to first run a loop over the 2DA to see how many elements there are, then use Random() to pick a random row. Something like this:

string GetRandom2DAString(string s2DA, string sColumn) {
    int rows;
    for (rows = 0; Get2DAString(s2DA, sColumn, rows) != ""; rows++);
    return Get2DAString(s2DA, sColumn, Random(rows));
}

there is a faster way to find the row count, but it’s probably needlessly complicated for what you want to do.

If I’ve understood the OP correctly, a second counter is required, to include matching values only.

int GetRandom2DAString(string s2DA, string sColumn, string sMatch) {
    int rows       = -1;
    int matches = 0;
    int chosen;
    string sValue = Get2DAString(s2DA, sColumn, ++rows);
    while (sValue != "")
       {
         if (sValue == sFilter) ++matches;
         sValue = Get2DAString(s2DA, sColumn, ++rows);         
       }
    chosen = Random(matches) + 1;
    rows       = -1;
    matches  = 0;
   sValue = Get2DAString(s2DA, sColumn, ++rows);
    while (sValue != "")
       {
         if (sValue == sFilter) ++matches;
         if (matches == chosen) return rows;
         sValue = Get2DAString(s2DA, sColumn, ++rows);         
       }
}

This function returns a random row number in which the search column matches the parameter. Get2DAString can then be used on that row to return whatever column(s) are required.

It can be made much more efficient, but for a small table the above will work.

The code assumes that sColumn can never contain blank values. If that is not true, the loops must be based on another column which is guaranteed to contain non-null values, otherwise they will terminate before end-of-file.

thanks to everyone, I’ll do some tests.

I forget to mention the case where no rows match the parameter - the code should also test for that, returning -1 or whatever for failure.

Ah, I misunderstood the OP. Proleric’s code is straightforward and will do what you want. I advise using that.

However, here’s an optimized but more complicated solution that only iterates the 2DA once:

int GetRandom2DARow(string s2DA, string sColumn, string sFilter) {
    int chosen = -1, matched_count = 0, i = 0;
    while (1) {
        string val = Get2DAString(s2DA, sColumn, i++);
        if (val == "") // reached end of 2da
            break;

        if (val == sFilter)
            chosen = (Random(++matched_count) != 0) ? chosen : i - 1;
    }
    return chosen;
}

the trick used here is that Random(N) == 0 has 1/N chance of changing which row is chosen, and (N-1)/N chance of leaving it. So first time you see a match, there’s a 100% chance it’ll be chosen. Next match, it’s 50/50 whether you keep the first one or switch over to the second. On average, after you’ve iterated the whole 2DA each matching row has 1/N chance of being picked.

EDIT: Fix typo (== -> !=), thanks @NWShacker!

3 Likes

Put your data in your 0 row (IE how many rows, valid items in column etc…) this will save you on multiple runs on the 2da.

@sherincall’s algorithm is the O(n) reservoir sampling with 1-element memory. If you follow the link you can see how it can be expanded if needed to sample several values in one run.

I believe however this line should be like this:

chosen = (Random(++matched_count) != 0) ? chosen : i - 1;

to match the 1/n chance to change.

Except the first one no, it’s OK :slight_smile:


If you plan to call such function multiple times, my approach would be to iterate over the 2DA once and store indexes of matching rows in smaller array, say a string like this: 1|20|40|78|120 (or values themselves if you don’t need to access other columns: foo|bar|baz).

Then just call Random on the number of delimiters (d). To reduce sampling complexity from O(d) to O(1) (read: avoid using FindSubString), consider padding the numbers to fixed width: 1xxxx|20xxx|120xx. This should have good performance to few thousands of elements.

1 Like

In fact I tried both approaches, creating a dedicated 2da, I will decide later which one is better, I thank everyone for the help you gave me.

Just FYI, accessing a cell in a custom 2DA is O(1). The very first time you do it, you have slow disc access, but all subsequent accesses are O(1) lookups. So for static arrays that don’t change it’s probably a better option than anything else.

I know it gets cached, the problem arises when you loop over it… for several items :wink: The string method is a good tier-2 cache if one is ready to write extra code (fixed element width = offset calculation = O(1) too).