Beware of merging sources and/or citations

I recently made a resolution to move entirely from RM7 to RM9, largely because the search features in RM9 make it much easier to work with names and sources, reducing duplication since I can find what I want in the lists. I worked for ten days and cleaned up a lot, including more than 500 separate sources from Find-a-Grave. Each uses the built-in (RM7) template and is listed separately in my source list like this: “Prescott, Sarah Hayward (1645-1709) - FindaGrave #37130889.”

I went on to other clean-up tasks and then ran the tools to merge duplicate sources and duplicate citations. I thought it was amazing that the source count went from 2330 to 1520, and the citations list went from 29370 to 4640. I thought – “wow, what an improvement!” I spent the next five days working on all sorts of other details to freshen my main file.

I should have been thinking “OMG, what a disaster,” because I just now found only FOUR FindaGrave citations left in the file after I ran the merge tools. It doesn’t appear other sources were affected like this. Now I need to choose whether to rebuild yesterday’s file (without nearly 500 FindaGrave sources) or last week’s backup (with all the FindaGrave sources intact but without five days of other editing).
So, before I do anything I’m asking for help and advice!

There is a rock and a hard place involved in your question.

On the one hand, it’s almost mandatory that you run the Merge Duplicate Citations tool after you import your RM7 database into RM9. Otherwise, duplicate footnotes and endnotes will not merge in any way when you run reports. For me at least, that’s pretty awful.

On the other hand, it’s almost mandatory that you not run the Merge Duplicate Citations tool after you import your RM7 database into RM9. Otherwise, certain citations that are not really duplicate will merge anyway creating the kind of disaster you have just encountered. That’s also pretty awful.

The situation that I’m most familiar with concerning false merges of citations that are not really duplicate is when the citations differ only in having a different image file. That can happen when downloading from Ancestry using TreeShare when the Ancestry collection providing the source is only partially indexed and when the citations in Ancestry differ only by the image file.

I’m not familiar with the same thing happening with FindaGrave citations. Your example of “Prescott, Sarah Hayward (1645-1709) - FindaGrave #37130889” sounds on its face like it should be OK, because there seems to be a lot of information that should be different for every citation. However, my RM7 does not have a built-in source template for FindaGrave, so your RM7 doesn’t either. That template has to have come into your RM7 database some other way. Maybe it was put there by TreeShare?

I enter all my sources completely by hand, so I don’t run into the problem of citations that differ only by the media file. Citations that differ only by a WebTag could be falsely merged in the same way.

I don’t understand your situation well enough to provide a complete solution. But I do think you are going to need to delete your current RM9 database and import it all over again from RM7. But before you do, I think you are going to need to look into your RM7 database and look at that FindaGrave template to see what’s going on. Also, look at some of the FindaGrave citations in RM7 to see how they do or do not differ.

If the Citation Details are blank and you have only added items like Media or WebTags, the merge tool won’t see the citations as unique and they will be merged. To prevent those merges you need to give each Citation a unique Citation Name.

To preclude such false merges, the RM application needs to take into account whether there are differences among WebTags and among Media tags. This has been a requested fix for the user trap created by the Merge All Duplicate Citations feature since it was first reported years ago.
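The behavior described in the two posts above can be illustrated with a small Python sketch. This is not RootsMagic's actual code: the field names (`source`, `citation_name`, `details`, `media`, `webtags`), the sample citations, and the `include_attachments` switch are all made up for illustration. The sketch only assumes what the thread reports, namely that Media and WebTags are left out of the duplicate comparison.

```python
from collections import defaultdict

# Hypothetical field names; RootsMagic's real merge logic is not shown in this thread.
COMPARED_FIELDS = ("source", "citation_name", "details")
IGNORED_FIELDS = ("media", "webtags")


def merge_key(citation, include_attachments=False):
    """Build the tuple used to decide whether two citations count as the same."""
    fields = COMPARED_FIELDS + (IGNORED_FIELDS if include_attachments else ())
    return tuple(citation.get(name, "") for name in fields)


def merge_duplicates(citations, include_attachments=False):
    """Group citations by merge key and keep only the first of each group."""
    groups = defaultdict(list)
    for citation in citations:
        groups[merge_key(citation, include_attachments)].append(citation)
    return [group[0] for group in groups.values()]


# Made-up sample data: the first two citations differ only by their image file.
citations = [
    {"source": "City Directories", "citation_name": "", "details": "", "media": "page_0012.jpg"},
    {"source": "City Directories", "citation_name": "", "details": "", "media": "page_0345.jpg"},
    {"source": "City Directories", "citation_name": "Smith, 1921", "details": "", "media": "page_0400.jpg"},
]

merged_now = merge_duplicates(citations)  # media ignored, as reported
merged_with_fix = merge_duplicates(citations, include_attachments=True)  # requested behavior
print(len(merged_now), len(merged_with_fix))  # prints: 2 3
```

In this toy model the two blank-named citations collapse into one and the second image is orphaned, while the citation with a unique Citation Name survives either way. That matches the workaround suggested above.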

Thank you for helping here. I’ve been digging deeper and see that four of these FindaGrave records did survive (four out of 542), but I don’t see why. Three are based on my RM7 template (which I think came with an update) and one on the ee-FindaGrave template. Here’s a sample of one that didn’t survive. I don’t have anything in the “citation detail text, media” box, but obviously I have unique information in each “citation details” panel.

Maybe the more important question is “what now?” Do I re-enter every single FindaGrave source using the ee-FindaGrave template, and then put something unique in the “citation detail text, etc.” panel? I guess I need to experiment. Maybe someone can suggest a shortcut to ease the pain!

One suggestion I might make is to go back to your last RM9 backup before the Auto Merge rather than going all the way back to RM7 as a part of recovering.

I have some puzzlement about why so many of the FindaGrave citations merged. The only way to tell would be to look at the citation in RM7 or to restore to RM9 from before the merge and look at the citations there. I don’t think there is any way to figure out what happened just by looking at your RM9 database after the merge.

Looking at your screen shot, any differences in the Citation Name or Name or Born-Died or Comments or Memorial No. or Research Note or Detail Comment should prevent the merge. Differences in Media and Citation Web tags will not prevent the merge, but this citation doesn’t have any Media or Citation Web tags anyway.

If you do decide to restore your “before the merge” RM9 database, be aware that it will be restored with the same database name as before. There is no way to rename it as a part of the restore process. That means that by default, the restore will destroy your “after the merge” RM9 database. You can solve this problem either by restoring to a different folder or by renaming the “after the merge” RM9 database before the restore. The rename is probably simplest. Just rename it to something like “after_the_merge” without the quotes and then do the restore.
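For anyone who prefers to script the rename rather than do it in the file manager, here is a minimal Python sketch. The function name `set_aside` and the example file name are hypothetical, and the sketch appends the label to the existing name instead of replacing it. It assumes the database is a single file on disk and that RootsMagic is closed while the file is renamed.

```python
from pathlib import Path


def set_aside(database, label="after_the_merge"):
    """Rename a database file so that a later restore cannot overwrite it.

    Returns the new path and refuses to replace an existing file.
    """
    database = Path(database)
    target = database.with_name(f"{database.stem}_{label}{database.suffix}")
    if target.exists():
        raise FileExistsError(f"{target} already exists; pick another label")
    database.rename(target)
    return target


# Example call (hypothetical file name, adjust to your own database):
# set_aside("MyFamily.rmtree")  # -> MyFamily_after_the_merge.rmtree
```

The explicit existence check is there because `Path.rename` silently replaces the target on some platforms, which would defeat the purpose.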

Then look very closely at some of the FindaGrave citations that merged that shouldn’t have. What you see there is going to guide the “what do I do now?” process.

I am confused by the attached image. The “Citation Used”= 1 so this doesn’t appear to show a merged citation. Also, the source name indicates that you’re a source splitter so I wouldn’t expect this source to have many citations under it. I’m struggling to see this screenshot as an example of a merge problem.

[edit: added the following additional text]
Typically, when people post about automerge problems it’s related to problems resulting from Ancestry TreeShare. Some Ancestry collections are severely lacking in the source and citation information that gets sent across the API to RM. (Ancestry’s US Directory collection is a prime example of this issue.) If the RM user doesn’t review and edit the citation after it’s been TreeShared, then the only unique item will be the attached media. Neither the media field nor the WebTag field is considered in RM’s merge process, and the result is a mess. This is the issue that Jerry and Tom were raising. The problem is avoided if the user adds any unique information to the citation. The image you posted shows fields that are not in the source template that gets created via TreeShare, and it shows source and citation details that I assume you manually entered and that would have made the citation unique. It shouldn’t have caused a problem when merging.

I am inclined to agree and suggest that its loss is the result of manually merging the different FindAGrave Sources. There still may be another unwanted outcome from the flawed Merge All Duplicate Citations but that example looks to be a unique Source that should have persisted through a Merge All Duplicate Sources type operation.

Yes, that would make sense. I reread the original post and it speaks of merge vs automerge. When I saw the numbers involved I assumed automerge was the culprit.

All the merges for sources and citations have been from the “merge all duplicate” buttons. I’ve thought of various tests and will get to them after the holidays. I created a group in the RM9 file (the one that lost all but four of the FindaGrave sources) that includes all 150 people I had added to the file since I had imported from RM7. Then I started over: 1) importing into RM9 the original RM7 file that has all 550 FindaGrave sources but lacks the 150 later edits, 2) running “merge all duplicate citations” then “merge all duplicate sources,” and rejoicing when it worked without any apparent loss of sources, 3) importing a GEDCOM of 150 names I had edited since I had closed the original RM7 file.

Going forward I’m doing only ONE file operation at a time, then backing up with a unique name and making a note so I’m clear what I have done and when.
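That one-operation-then-backup routine could be scripted along these lines. This is only a sketch: it makes a plain file copy with a timestamp in the name and is not RootsMagic's own backup feature. The function name `backup_with_note`, the `backups` folder, and the log file name are assumptions, and it should be run with the database closed.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_with_note(database, note, backup_dir="backups", log_name="backup_log.txt"):
    """Copy the database under a timestamped name and log what was just done."""
    database = Path(database)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{database.stem}_{stamp}{database.suffix}"
    if target.exists():
        raise FileExistsError(f"{target} already exists; wait a second and retry")

    shutil.copy2(database, target)  # copy2 also preserves the file's timestamps

    # One line per backup: when, which file, and which operation preceded it.
    with open(backup_dir / log_name, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{target.name}\t{note}\n")
    return target


# Example call (hypothetical file name and note):
# backup_with_note("MyFamily.rmtree", "after Merge All Duplicate Citations")
```

Keeping the note in a separate log file means the record of what was done survives even if a backup is later renamed or moved.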

In the meantime, thank you all for your helpful ideas. If I find the gremlin, or can duplicate my error, I’ll give an update.
