New Manifestation of Poor Performance with Search or Making Groups

A question just came up on the Facebook page about finding facts without sources. The standard answer is to do it from the Fact Type List. For example, print a report of all Birth facts without a source. This process will not make a group, and you might want to make a group.

I was about to respond that, as an alternative, you could make a group using the following criterion.

Birth => Source => Does Not Exist

Or you might make the criterion a little more comprehensive as follows to avoid including people with no Birth facts in the group.

Birth => Date => Is Not Blank
AND
Birth => Source => Does Not Exist
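
For anyone who thinks better in queries, here is a minimal sketch of what the two-part criterion amounts to. The tables (person, fact, fact_source) are a made-up, simplified schema for illustration only and are not RootsMagic's actual tables. It also assumes both conditions apply to the same Birth fact, which I have not confirmed is how RM evaluates the AND.

```python
import sqlite3

# Hypothetical, simplified schema; NOT RootsMagic's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact (id INTEGER PRIMARY KEY, person_id INTEGER,
                   type TEXT, date TEXT);
CREATE TABLE fact_source (fact_id INTEGER, source_id INTEGER);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "D")])
conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", [
    (10, 1, "Birth", "1850"),  # dated, no source   -> belongs in the group
    (11, 2, "Birth", "1860"),  # dated, has source  -> excluded
    (12, 3, "Birth", ""),      # blank date         -> excluded
    (13, 4, "Death", "1900"),  # no Birth fact      -> excluded
])
conn.execute("INSERT INTO fact_source VALUES (11, 500)")

def unsourced_dated_births(conn):
    """People with a Birth fact whose date is not blank and has no source."""
    rows = conn.execute("""
        SELECT DISTINCT f.person_id
        FROM fact AS f
        WHERE f.type = 'Birth'
          AND f.date IS NOT NULL AND f.date <> ''
          AND NOT EXISTS (SELECT 1 FROM fact_source AS s
                          WHERE s.fact_id = f.id)
        ORDER BY f.person_id
    """).fetchall()
    return [r[0] for r in rows]

group_members = unsourced_dated_births(conn)
print(group_members)
```

Only person 1 lands in the group here, which is the behavior the two-part criterion is meant to give.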

Before posting my message to the Facebook group about this possibility, I tested these criteria. They do work, but on my database the groups result in a huge performance problem. Filtering by groups with these criteria locks up my RM for about 30 seconds. It’s a very persistent problem. For example, if I switch to the Home tab and then back to the People tab, the same 30 second lockup occurs again.

I tried it with a Free-Form group and also with Saved Criteria group. The same performance problem occurs either way.

This is not a new problem and I have posted about it before. It’s just a new variation on the same theme. You don’t have to be making a group to encounter the problem. In fact, I first encountered the problem using Advanced Search.

My current workaround is to make groups smaller, for example to include a limitation by surname in the criteria. But I sure wish this problem could be solved.

These are the filters I use:
Fact Type List, List type: Facts without sources.
Group with filter Any Fact - sources - does not exist

I am not seeing an issue with creating a group or searching. How many are in your database?

My database is presently down to about 43,000 individuals. I continue to do a lot of cleanup.

Windows 10. RM9.0.5.0. I have plenty of hardware - 32GB memory, 4 processor cores, fast SSD disk.

As I have discussed before, the severity of the problem seems not to be associated with the size of the database. Rather, it’s associated with the size of the group (or for an Advanced Search, the number of hits on the Advanced Search). If I use the criteria Birth => Date => Is Not Blank AND Birth => Source => Does Not Exist, then there are 26,440 hits. The reason the number is so large is that I entered or imported the people in question mostly back in the 1990’s before much software had fact sources and before many users had fact sources. Think PAF, for example. So Sources were person sources.

I’ve been cleaning up ever since and been moving person sources to the appropriate facts. Searches or groups have always seemed like the best way to manage which individuals need to be cleaned up in this fashion. RM7 was pretty instantaneous in filtering by large groups. RM7 was pretty instantaneous in getting to the first or next hit in a search with a large list of hits. RM8 and RM9 have instead a very serious performance problem with groups with lots of members or with searches with lots of hits. The workaround I have developed for RM9 is to add additional criteria to make the group smaller, like only people whose surname starts with a particular letter or things like that.

I have submitted my database with descriptions of how to reproduce the performance problem in other contexts. My only reason for this thread is just that I encountered a different way to reproduce the same problem.

Remember that I find it quite fast to make a large group and quite slow to do an Advanced Search with lots of hits. The performance problem with groups does not manifest until the group is used as a filter. So whatever is so slow in an Advanced Search with a large number of hits has moved in groups to the filtering stage.

If you want to open a support ticket with the file we could test using it.

You all already have my database. If I remember correctly, you all were able to reproduce my symptoms the first time around. I didn’t hear back anything about a solution. Do you wish for me to submit it again for this new manifestation? If so, I’ll be happy to submit it again.

I went ahead and submitted it just now.

I think that this is the same issue with criterion groups that I posted about here.

My main database now has just over 57,000 people in it. To test the issue I created two groups, a criterion group (has any fact with date < 1900) containing just under 34,000 people and a freeform group containing ancestors and their descendants going back 10 generations and then forward 10 generations (containing roughly 43,500 people).

The larger freeform group takes about 10 seconds to load in the index on the people screen and the smaller criterion group takes about 25 seconds. There is the same delay when you move from other screens to the people screen or move from for example the groups option back to the index option or when you remove a more detailed filter (eg typing ‘Watson’ into the search box and then clearing it) on the index.

In each case, RM9 seems to re-apply the filter, and in each case it takes longer to do this for a criterion group than for a freeform group with more members, even though it is not supposed to do any calculations on the fly for the criteria.

I am running the 64 bit version of RM9 on Windows 11 using a fast, modern laptop with lots of cores etc, 16gb of ram and a fast ssd which is not being synched anywhere else.

Thank you for posting. Many users seem not to see the problem, or at least do not seem to see the problem quite so acutely. So having additional instances reported helps to confirm that there really is a problem.

I suspect that having a freeform group vs. having a criterion group has little or nothing to do with it. You could test this by recreating your freeform group as a criterion group with a different name or vice versa and testing with those additional groups.

However, sometimes a freeform group cannot be recreated as a criterion group because the allowable definition of a criterion group is somewhat limited. For example, I don’t know of any way to create your group of ancestors and descendants as a criterion group. But you could recreate your “any fact date < 1900” group as a new freeform group with a different name, just for testing purposes.

I just now created your “any fact date < 1900” group in my database, both as a free form group and as a criterion group. There are 22305 people in the group either way, out of 42920 people in my database. The filter time for either group is about 25 seconds.

The amount of time required to filter a group does seem to be related to how many people are in the group. The filter time for a group of 100 people is fine. The filter time for a group of 25,000 people is not fine. But the filter time does not always seem to be directly proportional to the number of people in the group. For example, I just made a group of “any fact date is blank OR any fact date is not blank” which was deliberately designed to include all 42920 people in the database. The filter time is 15 seconds whether I made the group as free form or criterion, which is faster than the 25 seconds for a group of 22305 people.

Remember that this problem has been there since the advent of RM8, long before the implementation of criterion groups. I first noticed the problem in the new Advanced Search. During an Advanced Search, the initial pass at the criteria is quite fast, often less than a second and indeed faster than in RM7. But then there are two very slow phases that you can only see with Windows Task Manager. In the first very slow phase, the I/O rate is extremely high, and the Advanced Search seems to be reading the entire database and collecting data. In the second very slow phase, the CPU rate is very high with one processor core completely maxed out. It’s a guess, but during this phase I suspect that Advanced Search is building the results list in the format for final display.

The time seems to go up faster than the number of people in the group or in the Advanced Search. If N is the number of people in the group or the Advanced search, I suspect the time is going up proportional to N-squared rather than going up proportional to N.
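
One rough way to check the N versus N-squared suspicion is to time the same operation at two different group sizes and estimate the exponent k in time ≈ c × N^k. Below is a minimal sketch of that arithmetic. The function name and the timings are made up for illustration and are not measurements from RootsMagic.

```python
import math

def scaling_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ c * N**k from two (N, seconds) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)

# Made-up timings for illustration only.
linear_case = scaling_exponent(1_000, 1.0, 10_000, 10.0)      # 10x people, 10x time
quadratic_case = scaling_exponent(1_000, 1.0, 10_000, 100.0)  # 10x people, 100x time

print(round(linear_case, 2), round(quadratic_case, 2))
```

An exponent near 1 would point to time proportional to N, and an exponent near 2 to N-squared. Two data points give only a crude estimate, so several group sizes would be better.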

In using groups, I don’t see any of this delay while building a group. Rather, I see this delay while filtering by a group. And I only see the CPU bound phase of the delay when filtering by a group without seeing any of the I/O bound phase of the delay.

If the problem is going to be solved, I think the solution is going to need to start with the delays in Advanced Search. If the problem is solved there, the same solution will surely also work for filtering groups.

I had forgotten your previous comments about spikes in disk and cpu, so I set my notebook up with the screen split between RM9 and Task manager and watched. I carried out five different operations to check their time and their resource consumption.

  1. Filtering the index on my criterion group (any fact has date < 1900, c 34,000 people). As before this took c 25 seconds. CPU quickly rose to 16/17% and stayed there. I think there was some disk usage, but it was barely noticeable. (Filtering the people screen takes the same time and resources.)

  2. Filtering the index on my freeform group (ancestors going back 10 generations & their descendants, c 43,500 people). As before, this took about 10 seconds and as above, CPU was at 16/17% throughout with a barely noticeable tick in disk.

  3. Refreshing the people in my criterion group. (To be clear, nothing may have changed since I created it.) This took about 3 seconds. Disk peaked at 1% for about 1 sec; CPU rose to 16/17% for about a second.

  4. Re-creating my criterion group with the same criteria after having first deleted it. This followed a similar pattern to 3 above - rather impressive performance.

  5. Using advanced search to search for exactly the same criterion as used to define my group, all people with any fact before 1900. This took over a minute to run. It started with CPU climbing from zero to 16/17% for five seconds or so; then it used loads and loads of disk rising to 85% for about 40 seconds; then it went back to using CPU at 16/17% with disk slowing down and moving to zero.

In summary, there is plainly something very inefficient in the advanced search. It is hard to see why searching on a criterion and displaying the results should take so much longer than creating a group with the same criterion and filtering on it to display the results. In particular the group/filter process uses about 1% of disk for about 1 second whereas the advanced search takes about 85% of disk for about 40 seconds - about 3,400 times as much disk usage. Does the process for creating groups read an index whereas the search scans one or more large tables?
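
To illustrate the index-versus-scan question, here is a minimal sketch using SQLite as a stand-in, with a made-up fact table and index name. It says nothing about how RM9 actually queries its file. It only shows how the same query can go from a full table scan to an index lookup, as reported by EXPLAIN QUERY PLAN.

```python
import sqlite3

# Hypothetical table and index names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (id INTEGER PRIMARY KEY, "
             "person_id INTEGER, year INTEGER)")
conn.executemany("INSERT INTO fact (person_id, year) VALUES (?, ?)",
                 [(i, 1700 + i % 300) for i in range(1000)])

QUERY = "SELECT person_id FROM fact WHERE year < 1900"

def plan(conn, sql):
    """Return SQLite's query plan for sql as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 is the detail text

plan_without_index = plan(conn, QUERY)
conn.execute("CREATE INDEX idx_fact_year ON fact (year)")
plan_with_index = plan(conn, QUERY)

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should report a scan of the table and the second should mention the index.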

There is also something strange going on with the display of my criterion group, but it is not as weird as the advanced search and appears to involve extra CPU usage rather than disk.

Alan

Your 16/17% is maxing out one processor core, and 16/17% is about 1/6 of 100%… That clearly means that your machine has 6 processor cores. I get 25% because my machine has 4 processor cores.
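
The arithmetic behind that, as a small sketch. The function names are just for illustration. One caveat: as far as I know, the overall CPU percentage in Task Manager is relative to logical processors, so "cores" here may really mean logical processors on machines with hyper-threading.

```python
def single_core_share(cores):
    """Percent of total CPU shown when exactly one core is fully busy."""
    return 100.0 / cores

def cores_from_share(percent):
    """Infer the core count from the plateau seen with one core maxed out."""
    return round(100.0 / percent)

print(round(single_core_share(6), 1))  # share of one core out of 6
print(round(single_core_share(4), 1))  # share of one core out of 4
print(cores_from_share(16.5), cores_from_share(25.0))
```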

Your most telling experiment is experiment #5 with Advanced Search. The inefficiency with Advanced Search is much worse than the inefficiency associated with filtering by group. However, the inefficiency with Advanced Search only happens one time if you then can use the results of the Advanced Search over and over again without redoing the search. But with a group, the filter keeps getting re-applied over and over and over again as you navigate between RM screens. Both problems need to be solved.

RM7 did not have the RM8/9 style Advanced Search. It did Advanced Search in a very different way that met my needs much better than does the RM8/9 style Advanced Search. I only want to get the first or next match to get to one match at a time, and RM7 brought up the first or next match very quickly. I didn’t have to wait, sometimes for minutes, for the whole list to come up before I could start my work.

RM7 did support filtering by group for People List View. Such filtering was essentially instantaneous. So better filtering performance is obviously possible.

I can only guess about the nature of the problem. For Advanced Search, guess #1 is that the search is probably building a screen that is intended to include all the same columns as are currently defined for People List View. I think that’s not a good design. All that the search should be producing is a list of RIN’s. Guess #2 is that the search for a group may be building indexes to make sorting of the columns of People List view faster when filtered by a group, and maybe it’s building the indexes in a very inefficient way. What gives lie to my guesses is that People List View is very fast when it is not filtered by a group. It comes up very quickly and it sorts columns very quickly. So I don’t really understand why adding a group filter to the mix slows it down so much.

It was to anticipate your guess 1 that I added a note to my operation 1.

‘Filtering the people screen takes the same time and resources.’

Plainly, even if the advanced search is doing this, it is not doing it well.

Some queries are just not written well. And some databases mess up queries even when they are written well. I remember a system in which stock trading deals were given a sequential suffix to their reference number when amended: xxxxxxxxx/1, xxxxxxxxx/2 etc. A standard screen retrieved the latest version of a trade when its number (xxxxxxxxx) was input. On investigation of slow performance we found that the database was first retrieving the latest version of all the deals in the table and then selecting the deal concerned from that, rather than first retrieving all the versions of the deal concerned and then selecting the latest version. Nothing that you did to the SQL, the indexes etc, altered the underlying process; we had to write a database procedure to update a new table with the latest version of all deals as they were amended and then link to that. Then the enquiry was lightning fast.
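
For anyone curious, the shape of that fix can be sketched as follows. This is only an illustration in Python with SQLite, not the original system: the table names (deal_version, latest_deal), the function names, and the sample deals are all invented here. The idea is that every amendment also refreshes a one-row-per-deal table, so the enquiry never has to work out the latest version at read time.

```python
import sqlite3

# Invented schema for illustration; not the original trading system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deal_version (ref TEXT, version INTEGER, details TEXT,
                           PRIMARY KEY (ref, version));
CREATE TABLE latest_deal (ref TEXT PRIMARY KEY, version INTEGER,
                          details TEXT);
""")

def amend_deal(conn, ref, details):
    """Store a new version of a deal and refresh its row in latest_deal."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM deal_version WHERE ref = ?",
        (ref,)).fetchone()
    version = row[0] + 1
    conn.execute("INSERT INTO deal_version VALUES (?, ?, ?)",
                 (ref, version, details))
    # Keep exactly one row per deal in the lookup table.
    conn.execute("INSERT OR REPLACE INTO latest_deal VALUES (?, ?, ?)",
                 (ref, version, details))
    return version

def latest(conn, ref):
    """Enquiry: a single primary-key lookup, no max-over-versions at read time."""
    return conn.execute(
        "SELECT version, details FROM latest_deal WHERE ref = ?",
        (ref,)).fetchone()

amend_deal(conn, "D100", "buy 10")
amend_deal(conn, "D100", "buy 12")
amend_deal(conn, "D200", "sell 5")
print(latest(conn, "D100"))
```

The cost moves from the read to the write: each amendment does a little extra work so that the lookup stays a single keyed read.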