Advanced Search Time Improved But Not Fixed By 9.1.0

The list of items included in the 9.0.1 update did not mention any performance improvements in RM9’s Advanced Search. The list of update items did mention performance improvements when filtering by groups. It has long seemed to me (with only weak evidence) that the root cause of poor performance in RM9’s Advanced Search might be similar to the root cause for RM9’s poor performance when filtering by groups. So I decided to test the performance of Advanced Search before and after the 9.0.1 update in my database of about 42,000 people.

I tried two different search criteria: Birth => Date => Is Blank and also Birth => Date => Is Not Blank. These are two separate criteria for two separate tests.

Before the update, either criterion required about 65 seconds to complete the search. After the update, either criterion required about 24 seconds to complete the search. So something has improved, even though no improvements for Advanced Search were listed in the list of items for the 9.0.1 update. However, the improvement in filtering groups with the same criteria was much more dramatic, going from about 15 seconds to about 1 second or less.

The thing that’s always puzzled me about the Advanced Search in RM9 is that in theory all the search has to do is to produce a list of RIN numbers that satisfy the search. And indeed, RM9 appears to do that part of the search very quickly, like in under one second, and that part of the search is way faster than in RM7. But then the RM9 Advanced Search locks up and stays locked up for an extended period of time - first I/O bound like it’s reading the entire RM database very inefficiently and then CPU bound like it’s building a table of results very inefficiently.

I don’t like RM9’s Advanced Search very much anyway because it doesn’t support simply stepping through people that match the search one at a time. But if it’s going to produce the whole list of people instead of stepping through the list of people one at a time, it seems like it could just list the RIN numbers and names (and even the birth year and death year because they are in the same database table as the names). If it did that, RM9’s Advanced Search should be pretty nearly instantaneous. Then the only thing I would have left to complain about with respect to RM9’s Advanced Search would be the lack of Next and Previous options.

Jerry, thanks and very well done. Your extensive efforts on our behalf have paid off. I suspect, as you clearly do, that there are still significant inefficiencies in various areas of the code and chipping away at particular repeatable examples of surprisingly poor performance will, over time, result in improvements across a number of areas, not just the specific one reported. We need to keep up the pressure and report repeatable examples.

Confirming issue has been reported to development.

I agree that the advanced search needs to be fixed, but I think I showed in response to an earlier post that any part of the problem caused by retrieving all the data fields to be shown for the people found is insignificant. You can show this very clearly now by using a large group to filter the people screen; the result is almost instantaneous.

As a further demonstration, I did an advanced search with a single criterion - Birth place contains Ireland. As usual, this whizzed through (2-3 seconds) to 99% complete and then got stuck (30 seconds+) with the inefficient stuff that you have commented on. The result then shows up with 12,000 of the 60,000 people in my database.

You can split this operation into the two parts that you describe by using groups. Creating a criterion group for the same criterion is essentially making a list of the RIN numbers, and filtering the people screen on the new group displays the values. The first part is near instantaneous and the second part takes something like half a second. The combined time of circa 1 second is what the search would take if it were written efficiently; there is no need to bring the people up one at a time. Indeed, carrying out a single SQL operation should always be more efficient in a relational database than having code work through a record at a time.
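To make the "single SQL operation versus one record at a time" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `person` and `event` tables, their columns, and the function names are all made up for illustration; this is not RM's actual schema and says nothing about how RM's Advanced Search is really coded.

```python
import sqlite3

def build_demo_db(n_people=1000):
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (rin INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE event (
            event_id INTEGER PRIMARY KEY,
            rin INTEGER REFERENCES person(rin),
            fact TEXT,
            place TEXT
        );
        CREATE INDEX idx_event_rin ON event(rin);
    """)
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(i, f"Person {i}") for i in range(1, n_people + 1)],
    )
    # Every fifth person gets a birth place containing "Ireland".
    conn.executemany(
        "INSERT INTO event (rin, fact, place) VALUES (?, 'Birth', ?)",
        [(i, "Cork, Ireland" if i % 5 == 0 else "Ohio, USA")
         for i in range(1, n_people + 1)],
    )
    conn.commit()
    return conn

def search_set_based(conn, text):
    """One SQL statement finds the matches and fetches the display columns."""
    return conn.execute("""
        SELECT p.rin, p.name
        FROM person AS p
        WHERE EXISTS (SELECT 1 FROM event AS e
                      WHERE e.rin = p.rin
                        AND e.fact = 'Birth'
                        AND e.place LIKE '%' || ? || '%')
        ORDER BY p.rin
    """, (text,)).fetchall()

def search_row_by_row(conn, text):
    """Same result, but with one extra query per matching person."""
    rins = [row[0] for row in conn.execute("""
        SELECT DISTINCT rin FROM event
        WHERE fact = 'Birth' AND place LIKE '%' || ? || '%'
        ORDER BY rin
    """, (text,))]
    rows = []
    for rin in rins:
        # Each lookup is a separate round trip through the SQL engine.
        rows.append(conn.execute(
            "SELECT rin, name FROM person WHERE rin = ?", (rin,)).fetchone())
    return rows
```

Both functions return the same list of (RIN, name) pairs; the second simply does it with one query per match. Timing the two against a large table (for example with `time.perf_counter`) is one way to see how much the per-record overhead costs on a given machine.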

I have wondered the same thing, but haven’t commented on it. You are correct that People List view seems to show that the time required to retrieve all the data is not the problem because People List view with no filter seems to be extremely fast, no matter how many columns of data are defined for People List view nor what those columns are. But then that raises the question of what on earth Advanced Search is doing after that first second or two when it seems already to have retrieved all the RINs.

It’s really hard to reverse engineer what software is doing internally just by observing its external behavior. By watching RM with Windows Task Manager during an Advanced Search, I have guessed (with emphasis on guess) that what it is doing is retrieving all the columns of data. But if that’s what it’s doing, then why is it doing it so much more slowly than People List view? And if that’s not what it’s doing, then what on earth is it doing during all that time? These are questions for which I don’t have an answer. I’m sure my guess about retrieving and formatting all the data is not entirely correct and it may not even be close to right. But Advanced Search is still extremely slow in a way that makes very little sense just looking at RM from the outside and trying to look inside.

Jerry-
I was intrigued by the performance issue you see where the advanced search query- Birth Date is blank- takes a short time to show the 99% finished and a long time to display the results.
I’m trying to reproduce that and I can’t.
On my machine, I also get the 99% in under a second but the results are finished displaying in under 2 seconds.
My database has only 12K people, 62K events. The query mentioned above returns about 225 people.
When I run is not blank, instead, the results take about 3 s, but that is returning about 12K - 225 =~ 12K people.
I remember when v8 came out, you also had slow scrolling (in the People List??).
I’m just interested in what the difference is between our environments.

I’ve been thinking about creating a couple of “standard databases” that could be available for download that were sanitized, in the sense that there would be no personal info to worry about when the db is shared.

It’s hard to know why the performance test results can be so different. I do have plenty of hardware.

The performance problem seems to be related to the total number of people in the database and also to the number of people who match the search criteria. At about 42,000 people in my database, I get good search performance when the number of matches is a few hundred or maybe even a very few thousand. But the performance is very poor when the number of matches is tens of thousands.

As you say, it is very hard to find out what it is doing, but we can do some experiments. For example, is the delay related to the number of records returned (which it might be if the problem was either in retrieving the data for them or sorting them)? Or is it related to the criteria (text or numeric, text equals or contains, etc)? Or is it related to the number of different criteria? Or to the need to filter out duplicates?

I tested the first question by searching for the existence of a custom fact only attached to one person (x fact exists true). This obviously produces a single record in response. RM9 took 2-3 seconds to get up to 99%, and from there less than a second to render the final result. So then I tried the opposite (x fact exists false). This obviously produces all the people in the database except one (roughly 60,000). This took the same 2-3 seconds to get to 99% and also rendered the final view in less than a second after that. There was none of the very inefficient stuff that you have remarked on. From this we can firmly conclude that the problem is not simply a function of the number of records for which there is data to retrieve.

Then I tried date edited = 11/11/2023 (c 10 records) and date edited <> 11/11/2023 (c 60,000 records). As before, the first stage took 2-3 seconds in each case, and the second stage was practically instantaneous. After the first result, this is what I expected.

Next I tried to split the number of show/don’t show records a little more evenly. For this, I first used date edited < 11 Oct 2023 (c 57,000 records). The first stage of this took 2-3 seconds as before, but the second stage took very slightly longer - say a second or so. To vary things some more, I tried date edited > 9 June 2023 (the date I created my RM9 database) (c 7,000 records). I ran this twice to double-check the timings; the second stage took 20 seconds, and as you have frequently noted, RM’s use of CPU dropped during this time but its use of disk reached 85% of the maximum.

So the process works very quickly if either only a few records are included or if almost all of them are, but takes longer (with lots of disk) when it has to split the records a little more evenly.

I also tried a large number of other options, with more complicated criteria and tests that involved scanning lots of text fields (eg any fact place contains string). Not surprisingly the first stage of the process took more time as the criteria became more complicated. (The same applies for criterion groups). And any result which returned between say 15,000 and 45,000 records took much longer in the second stage than one with fewer or more records than that.

As to what’s going on, it is hard to tell. It looks as if the process for creating criterion groups uses a single SQL statement. Perhaps the advanced search first creates a temporary table and there is some inefficiency in working on that to retrieve the unique RINs required and sort them in numerical order. But if it was just this, then the process would take longer as the number of records increased. Without looking at the code or looking more closely at the database while the process is running, it isn’t possible to know.

Thanks for your efforts. I hadn’t managed to characterize it quite like you just did but I think I agree with your characterization. For example, I had noticed that a search with criteria that literally matched everybody was very fast and I had noticed that a search with a criterion that matched a small number of people was very fast.

My test cases of Birth Date Is Blank and Birth Date Is Not Blank are more in your “split the records a little more evenly” situation. That’s because I have a distressingly large number of people in my database with a blank birth date.

In RM7, I was using the search for Birth Date Is Blank as my “to do” list to find people without birth dates so I could fix them by finding them and giving them a birth date. RM7 is slower than RM9 going through the whole list, except that RM7 doesn’t go through the whole list. It stops on the first match very quickly, and I’m ready to go to work. And my first match usually finds a whole family grouping of people without birth dates, and I work through the whole family grouping just from the one match. And after that on RM7, I could do a Find Next to get the next person to work on. Unlike what everybody seems to say, RM7’s Find Next does not start over again. It picks up where it left off. Making the whole list of matches is not an advantage when all I need to go to work is the first match.
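For anyone curious what a Find Next that "picks up where it left off" could look like at the database level, here is a rough sketch in Python with sqlite3. It assumes a hypothetical `person` table with a `birth_date` column and assumes the search walks people in RIN order; neither is taken from RM7 or RM9, whose internals I can't see.

```python
import sqlite3

def find_next(conn, after_rin=0):
    """Return the first person with a blank birth date whose RIN is
    greater than after_rin, or None when there are no more matches."""
    return conn.execute("""
        SELECT rin, name
        FROM person
        WHERE rin > ?
          AND (birth_date IS NULL OR birth_date = '')
        ORDER BY rin
        LIMIT 1
    """, (after_rin,)).fetchone()

def make_demo_db():
    """Small in-memory table with made-up people for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (rin INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
        (1, "Ann", "1850"),
        (2, "Ben", None),     # blank birth date
        (3, "Cal", "1871"),
        (4, "Dee", ""),       # blank birth date
        (5, "Eli", None),     # blank birth date
    ])
    return conn
```

Calling `find_next(conn)` gives the first match, and passing that match's RIN back in as `after_rin` resumes from there. Because each call asks for only one row past the last position, the cost of getting to work does not depend on building the full list of matches first.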

I tried using the same technique in RM8 and now in RM9 with extremely long search times as I have reported several times. I did submit my database to the RM HelpDesk for problem analysis. They reported that they could see the same results as me except that their search times were not as dramatically bad as mine - bad, but not quite as bad as mine.

The workaround I have developed is to make a group instead of doing an Advanced Search because making groups is so much faster than doing an Advanced Search. The group becomes my “to do” list. Before Refreshable Groups, I would refresh the group manually and now I use Refreshable Groups instead. That left me with the slow group filtering problem. I solved the slow group filtering by making my groups much smaller, like by including a criterion such as Surname Begins With C or some such and I would work my way through the alphabet. Now that slow group filtering is fixed I no longer need to make my groups so small.

The blank birth dates are just one of many examples where in RM7 I used Advanced Searches as my “to do” lists and now in RM9 I use groups as my “to do” lists. For example, I use criteria such as Born Before 1910 And Died After 1910 And 1910 Census Exists Is False as a “to do” list for finding 1910 census. I use criteria such as Died After 1913 In Tennessee And Death Certificate Exists Is False as a “to do” list for finding Tennessee death certificates. I use criteria such as Died After 1925 And Obituary Exists Is False as a “to do” list for finding obituaries. Etc. There are lots more examples, and many of them include additional refinements to narrow things down further.
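As an illustration of how one of these “to do” criteria reads as a query, here is a sketch of Born Before 1910 And Died After 1910 And 1910 Census Exists Is False in Python with sqlite3. The `person` and `event` tables, the `birth_year` and `death_year` columns, and the `'Census 1910'` fact name are all invented for the example; RM's own tables and fact types are certainly laid out differently.

```python
import sqlite3

# Hypothetical criteria: alive across 1910 and no 1910 census fact recorded.
TODO_1910_CENSUS = """
    SELECT p.rin, p.name
    FROM person AS p
    WHERE p.birth_year < 1910
      AND p.death_year > 1910
      AND NOT EXISTS (SELECT 1 FROM event AS e
                      WHERE e.rin = p.rin AND e.fact = 'Census 1910')
    ORDER BY p.rin
"""

def make_demo_db():
    """Small in-memory database with made-up people for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (
            rin INTEGER PRIMARY KEY,
            name TEXT,
            birth_year INTEGER,
            death_year INTEGER
        );
        CREATE TABLE event (rin INTEGER, fact TEXT);
    """)
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
        (1, "Ann", 1880, 1930),  # alive in 1910, census not yet found
        (2, "Ben", 1885, 1940),  # alive in 1910, census already recorded
        (3, "Cal", 1915, 1990),  # born after 1910
        (4, "Dee", 1850, 1905),  # died before 1910
    ])
    conn.execute("INSERT INTO event VALUES (2, 'Census 1910')")
    return conn

def todo_list(conn):
    """Return the people who still need a 1910 census found."""
    return conn.execute(TODO_1910_CENSUS).fetchall()
```

With the demo data only Ann is on the list. Adding a `'Census 1910'` row for her and running `todo_list` again returns an empty list, which is the same behavior a refreshable group gives.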

I find this kind of tool to be much more in accord with my own personal work style than RM7’s To Do lists or RM9’s Tasks. I find my thing to do, I do it, and it automatically removes itself from my “to do” list. So I was and still am profoundly frustrated with RM8 and RM9’s Advanced Search. I’m perfectly willing to use groups instead, but I still think RM9 needs to solve the performance problem in Advanced Search and I still think RM9’s Advanced Search needs a Find Next type of facility.