Search troubles

Let’s start with the search on the “Couple List” or “People List” page. If I enter all letters in lowercase, the search finds nothing. Only when the first letter is uppercase does the search work. Additionally, the search only works by surname; it doesn’t consider first names. This is very strange because if you’re using full-text indexing, there should be no difference for such a search.

Now let’s look at the general search at the top right with the placeholder “Find everywhere.” It does indeed search everywhere. But this search is too slow—10-20 seconds. Moreover, it is also case-sensitive.

Consider the search in the right sidebar called “Index.” It also searches only by surnames and is case-sensitive.

And finally, the “Search” in the left menu. Let’s start with “Person search.” If I enable the “Close match” option, no matter what I enter in the search fields (something that isn’t in my database), I get the same unchanged list of results. What does this mean? Instead of returning an empty list, the search returns my entire database? And again, the search is case-sensitive, even with the “Close match” option enabled. How does this option work at all?

Search with typos also doesn’t work, although full-text search should be able to do this.

So, I’m very surprised that the application has many search areas, but none of them work well:

  1. All searches are case-sensitive.
  2. Almost all searches can’t search by first name.
  3. Some of them are too slow but still don’t fully use the capabilities of full-text search.
  4. The “Close match” function should solve the case sensitivity and typo search issues (in my opinion), but it doesn’t.

Please explain what I misunderstood or didn’t fully grasp about the functionality. Or maybe I understood everything correctly, and these are not bugs, but that’s how it’s supposed to work according to the developer.

Thank you.

RM’s name search controls don’t work as “enter a keyword, then search.” Instead, they filter down an index of already-loaded two-part names in a distinct Surname, GivenName orientation, with the comma acting as a functional divider between the two parts. Within each of those two parts, the characters you type are checked first for their presence and second for their placement (first few characters, middle few, last few) against the full index. The filtered results are then displayed in ascending Surname, GivenName order.

In other words, start by typing a lower case ‘b’ and you get all surnames with a ‘b’ in them, sorted alphabetically by Surname and then by GivenName. The same goes for ‘be’ or ‘Be’, and for be,k or Be,k or BE,k or Be,K, etc.
If all you know is that their GivenName is Frank, enter ,frank or ,Frank to get all names with that term in the GivenName.
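
The filtering described above can be sketched roughly as follows. This is only an illustration of the behavior as described in this post, not RM's actual code: the function name `filter_names` and the sample names are made up, matching uses Python's Unicode-aware `str.casefold`, and only the "presence" check is modeled, not the placement ranking.

```python
def filter_names(index, query):
    """Filter (surname, given) pairs with a 'surname,given' style query.

    Hypothetical sketch: the text before the comma filters surnames,
    the text after it filters given names; either part may be empty.
    """
    surname_part, _, given_part = query.partition(",")
    surname_part = surname_part.strip().casefold()
    given_part = given_part.strip().casefold()
    matches = [
        (surname, given)
        for surname, given in index
        if surname_part in surname.casefold() and given_part in given.casefold()
    ]
    # Display in ascending Surname, GivenName order, ignoring case.
    return sorted(matches, key=lambda n: (n[0].casefold(), n[1].casefold()))


people = [("Peters", "Alfred"), ("Becker", "Karl"),
          ("Abbott", "Frank"), ("Bell", "Frank")]
print(filter_names(people, "be,k"))    # surnames containing "be", given names containing "k"
print(filter_names(people, ",frank"))  # any surname, given names containing "frank"
```

An empty part before or after the comma matches everything, which is why a leading comma restricts the filter to given names only.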

RM’s searching functions are not perfect. There are definitely some areas where I would like to see some improvements. But my experience is very different from the limitations you describe. In particular, in my experience the searches are not case sensitive and you can search by first name. For example, here is a search on the surname peters which does find Peters.

And here is a search on the first name of alfred which does find Alfred. I type a comma before the first name to get it to search for the first name rather than for the last name.

Yeah, it works. This is really strange behavior to me, but OK.

But lowercase is still not working. Maybe Cyrillic is the reason?

Find Everywhere doesn’t actually search everywhere, but it does search lots of places. I find that it searches enough places to be very useful.

It’s hard to see how to make it much faster. I don’t use it very often, but when I do use it, it is fast enough for my needs. The reason it’s hard to see how to make it much faster is that it’s searching areas of the RM database that really are pretty impractical to index, such as all the various note fields for facts, sources, and citations.

I think that Find Everywhere needs two particular improvements. One is that it really does need to search everywhere. The other is that there needs to be a checklist of items it searches and doesn’t search so that the search can be more targeted when such targeting is warranted. But these kinds of improvements to Find Everywhere are pretty far down my list of areas of RM which I think need improvement.

Yes, really. Lowercase search works for Latin but does not work for Cyrillic.
Is it possible to fix this issue?

Yes, Cyrillic is surely the reason.

If you are searching non-Latin alphabets, then I agree there are significant searching problems. And even with Latin alphabets, RM’s treatment of non-English letters is pretty sub-optimal.

Sometimes I think I understand these issues pretty well and sometimes they are so complicated that I think that I understand nothing. The RM database uses SQLite as its datastore. SQLite’s characters in turn are UNICODE characters. UNICODE can store pretty much any letter or character from any alphabet in the world, plus UNICODE can store lots of other symbols such as math symbols and even emojis. But the UNICODE character codes by themselves don’t give you a collation for an alphabet, especially when it comes to case insensitive collations and comparisons. (UNICODE does publish separate collation and case-folding rules, but a datastore has to implement them.) The most basic comparisons are simply bit by bit comparisons.

Well, the bit codes for English letters are arranged such that A is before B which is before C, etc. But there is nothing in UNICODE per se which will compare an A with an a as equals for case insensitive comparisons. Instead, that sort of thing is left to the application. In the case of SQLite, there is a NOCASE collating sequence that is defined, but it only works for English letters. RM itself has added an RMNOCASE collating sequence, but it is only intended to collate and search a few non-English letters as if they were English letters. For example, the Norwegian Å is collated by RM as if it were the English letter A, even though Å comes after Z in the Norwegian alphabet.

But there is nothing in either the NOCASE collation or the RMNOCASE collation that handles the Cyrillic alphabet. You are therefore stuck with case sensitive searching and sorting. The same is true for any other non-English alphabet, such as the Greek alphabet, the Arabic alphabet, or the Hebrew alphabet.
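
The limitation of SQLite's built-in NOCASE collation is easy to reproduce outside RM. Here is a minimal sketch using Python's standard `sqlite3` module. It only exercises the stock NOCASE collation, since RMNOCASE is RM's own and is not available in a plain SQLite connection; the helper name `equal_nocase` is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def equal_nocase(a, b):
    """Return True if SQLite's built-in NOCASE collation treats a and b as equal."""
    row = conn.execute("SELECT ? = ? COLLATE NOCASE", (a, b)).fetchone()
    return bool(row[0])


latin_equal = equal_nocase("Peters", "peters")     # ASCII letters: case is folded
cyrillic_equal = equal_nocase("Иванов", "иванов")  # Cyrillic letters: case is not folded
print(latin_equal, cyrillic_equal)  # prints: True False
```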

I made reference to UNICODE throughout my note. SQLite actually stores UNICODE in a UTF-8 encoding. UTF-8 is still UNICODE, but it encodes UNICODE so that English letters can be stored in 8 bits rather than in the 21 bits that would be required for pure UNICODE. And because of the way computers work, storing 21 bits for each letter is tantamount to storing 32 bits for each letter.
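
To make the storage sizes concrete, here is a small Python check of how many bytes a few characters take in UTF-8 compared with a fixed-width 32-bit encoding. The helper name `encoded_sizes` is just for illustration.

```python
def encoded_sizes(ch):
    """Return (bytes in UTF-8, bytes in UTF-32) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-le"))


for ch in ["A", "Я", "😀"]:
    utf8, utf32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: {utf8} byte(s) in UTF-8, {utf32} bytes in UTF-32")
```

English letters take one byte in UTF-8, Cyrillic letters take two, and characters outside the basic plane such as emojis take four, while the fixed-width encoding always uses four.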

I’m just an RM user and not an RM developer, but I think it would be profoundly difficult to fix this issue - not impossible perhaps, but profoundly difficult.

First of all, I can’t see any way to sort mixed alphabet names in any rational sense. Like what if you had a bunch of English surnames along with a bunch of Cyrillic surnames in your database? How would you expect to sort the English names as compared to the Cyrillic names?

I do think that the problem of case insensitive searches could be solved. To solve it, the functioning of RMNOCASE would need to be radically changed. First of all, it would need to give up its attempt to collate non-English letters in the Latin alphabet as if they were the English letters they most resembled. Then for as many alphabets as possible, including Cyrillic, it would need to define a collation that placed an upper case letter and the corresponding lower case letter in the same order. I do think that’s possible, but I’m not 100% sure.
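
As a rough illustration that such a collation is possible in SQLite, here is a sketch using Python's standard `sqlite3` module and its `create_collation` method. The collation name `UNICODE_NOCASE` and the table are made up for the example, and this says nothing about how RMNOCASE is actually implemented. It compares strings after Unicode case folding, so upper and lower case letters of any cased alphabet compare as equal.

```python
import sqlite3


def casefold_collation(a, b):
    """Compare two strings after Unicode case folding; return -1, 0, or 1."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)


conn = sqlite3.connect(":memory:")
conn.create_collation("UNICODE_NOCASE", casefold_collation)  # hypothetical name
conn.execute("CREATE TABLE names (surname TEXT)")
conn.executemany("INSERT INTO names VALUES (?)",
                 [("Иванов",), ("peters",), ("борисов",), ("Adams",)])

# Case insensitive equality on a Cyrillic surname.
found = conn.execute(
    "SELECT surname FROM names WHERE surname = ? COLLATE UNICODE_NOCASE",
    ("иванов",)).fetchall()

# Case insensitive ordering across both alphabets.
ordered = [row[0] for row in conn.execute(
    "SELECT surname FROM names ORDER BY surname COLLATE UNICODE_NOCASE")]

print(found)
print(ordered)
```

Note that this sketch simply places all Latin names before all Cyrillic names, because that is the order of the underlying character codes. That is one arbitrary answer to the mixed alphabet sorting question, not necessarily a rational one, and it drops the treatment of letters like Å as A.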

RM has mentioned publicly that they have long range plans to support languages other than English. That’s not the same thing as a commitment to a time frame. Nor is it a commitment to what that support might look like. There is genealogy software on the market that does support languages other than English. But I don’t know what that feature actually looks like, nor do I know if that feature supports case insensitive sorting and searching in languages other than English. I would only repeat that case insensitive sorting in multiple languages is surely not an impossible problem, but equally surely it is a very difficult problem.

It depends on how you choose to solve this problem. I know one solution where you don’t need to match uppercase letters with lowercase ones. This solution looks like this (for developers):

Let’s assume you currently have names and surnames stored in the database in the name and surname columns. You need to add two more paired fields, which you could call, for example, index_name and index_surname. These fields will duplicate the previous two, but during data entry, they will be converted to lowercase using a simple native function. In PHP, for example, this is the mb_strtolower() function (the multibyte-aware counterpart of strtolower(), which is what you need for Cyrillic). Other languages also have similar functions. These two columns are the ones you need for full-text indexing for search. When a user performs a search, you should apply the same action to the search string: convert it to lowercase. Then you can perform a full-text search or a regular search in index_name and index_surname. This completely solves the case sensitivity issue regardless of the language.
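
Here is a minimal sketch of that idea in Python with SQLite, just to show the moving parts. The table name `people`, the helper functions, and the sample data are assumptions made for the example; only the column names come from the description above. It uses a regular indexed lookup rather than a full-text index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE people (
    name TEXT, surname TEXT,
    index_name TEXT, index_surname TEXT)""")
# An ordinary index on the lowercased column keeps lookups fast.
conn.execute("CREATE INDEX idx_people_surname ON people (index_surname)")


def add_person(name, surname):
    """Store the original values plus lowercased copies for searching."""
    conn.execute("INSERT INTO people VALUES (?, ?, ?, ?)",
                 (name, surname, name.lower(), surname.lower()))


def find_by_surname(term):
    """Lowercase the search term the same way, then compare exactly."""
    rows = conn.execute(
        "SELECT name, surname FROM people "
        "WHERE index_surname = ? ORDER BY index_name",
        (term.lower(),))
    return rows.fetchall()


add_person("Иван", "Петров")
add_person("Alfred", "Peters")
print(find_by_surname("петров"))
print(find_by_surname("PETERS"))
```

Because both the stored copy and the search term go through the same lowercase conversion in the application, the database never has to know anything about case in any alphabet.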

It’s interesting you should mention this solution. RM has always had three name columns called Surname, Given, and Nickname. RM8 introduced three new columns called SurnameMP, GivenMP, and NicknameMP. As of RM8, the three new columns were not being populated. As of RM10, they are being populated, though I don’t remember noticing when they first began being populated.

Even though the columns are now being populated, they do not yet appear to be doing anything. Perhaps they are there in preparation for some new feature. For example, perhaps they are there in preparation for being able to sort and search names in languages other than English as you need for your Cyrillic names.

That is a wild guess, and remember that I’m just a user and RM does not announce future plans. Another wild guess (perhaps not so wild), would be that the M and P stand for something like Matronymic and Patronymic and that those types of names are the purposes for the new columns rather than for sorting and searching names in languages other than English. We as users will know only when and if a new feature appears. And there are columns in RM’s database that have been there for many years without any apparent connection to any current RM feature.

I confess I was looking at the problem of case insensitive searching in languages other than English in a more holistic sense than just searching on name columns. Having search columns for names as you suggest would be very easy to implement and would be a very powerful new tool to be added to RM. But being able to have case insensitive searching everywhere including in things such as notes would be a much more difficult problem to solve. I can’t picture replicating every single column containing alphabetical characters, with one column being upper and lower case with the other column being lower case only. That actually would solve the case insensitive search and comparing problem, and conceptually it would be easy. But it would nearly double the size of the database and it would double the time for any update operations for those columns.

Yes, I agree with you.