Merging Citations - Caution!

I discovered independently yesterday that I have the same problems created by choosing merge citations. However, I think that my last known good backup is from last October, when I first switched to RM8 from RM7. This means that I will have to redo hours and hours of work … and yes, I do back up every time I use RM8, but a backup of bad data does no good at all. I have found the problem in many, many sources, and I think it is a bit disingenuous to blame the problem on how Ancestry might handle sources and citations.

As a user (even as a user with years of professional IT and computer experience) I would assume that when a program says it will merge identical citations that it would do just that. I would never for a moment think that two citations with different media attached would be considered identical citations.

Now that I am faced with a monumental cleanup task I may just also try out competing genealogy programs after being a faithful RM7 user and RM8 tester. The various “problems” in RM8 usability such as multiple mouse clicks for simple tasks compared to RM7 combined with this merging citations idiocy have, I think, brought me to the end of the road with RM8.

It seems that I have the same issue on a few sources and I have no clue when the issue occurred. That means I don’t know when the ‘last known good’ backup is. I have done a lot of work in the last couple of weeks and it seems no matter which route I decide to go (Try and tidy up the merged media or backup and redo all the work) it’s going to be a lot of work.

I’m not too sure how a source with different media can be classed as ‘identical’, but there you go.

Somehow I just realized that I have about 2 dozen sources that have merged most of the citations and, you guessed it … different media tags. My backup files all have the error, so I checked out my Ancestry trees and found a previous version from January, but that tree is no longer linked to my RM tree and I have added so many people, made so many corrections, cleaned up place names, sources, etc. that I don’t think it is worth using. I read somewhere that exporting your RM tree to GEDCOM would separate all of the citations, but it didn’t. I also tried a drag and drop. Has anyone come up with a fix? I agree with all here who say citations with different media are NOT duplicates!!!

Exporting to GEDCOM actually does separate all of the citations because GEDCOM doesn’t support the merged citations so it has to separate them back out. But exporting to GEDCOM will not get you your lost media tags back which is probably what you are looking for.

RM has a serious design problem with media tags and citations. As you suspect, if you merge citations that differ only in media tags, you lose media tags. For citations you enter yourself, you are extremely unlikely to encounter this problem. Citations you enter yourself which have different media tags are almost certain to differ in other ways as well. For citations you import from Ancestry via TreeShare, there is a possibility of encountering this problem, and you obviously have encountered it. Most Ancestry data collections seem not to have the problem, but some do. The collections with the problem are “incompletely indexed” (my term for it - I’m not sure if there is a better term), where different images are indexed the same but have a different media link. When TreeShare does the import, the Ancestry media link is turned into a media tag in RM and the Ancestry index is turned into a citation. So the result is citations in RM which are the same except for different media tags.

This is your situation. If you don’t merge duplicate citations, then there are two problems. One is that you are not able to take advantage of RM’s reusable citations feature which is a very useful feature. The other problem is that you are not able to take advantage of RM’s feature to Reuse Endnote Numbers in reports which also is a very useful feature. And if you do merge duplicate citations, you lose media tags. The best advice is probably not to merge duplicate citations if you are using TreeShare to import citations, but I find that to be very unsatisfying advice.

I really don’t know how to clean up the situation you find yourself in, where you have already merged duplicate citations. I use a lot of data from Ancestry, but type it in myself rather than importing it from Ancestry. I don’t try to keep my RM tree and my Ancestry tree synced. I just use Ancestry as a resource for data. Maybe RM users who do keep their RM databases in sync with Ancestry can offer some advice.

Jerry, thank you for that. I have not lost the media tags, but merging the duplicate citations resulted in one citation with anywhere from 5 to 500 media tags. After doing more digging last night I found that I had an RM7 tree that was dated August 31 (I fought having to switch to RM10!), so I am going to use that one. Before I update it to RM 10 I am going to run some of those point form reports that I love so much and cannot get in RM10. I think there are also some other SQL scripts that I like that don’t work in RM 10. So I’m not as bad off as I thought. I will never merge citations again unless RM changes the way the merge duplicate citations works. Thanks for your explanation! Jaime

That’s interesting. Either I was totally wrong about the way it works (or the way it used to work), or else I was right about the way it used to work and something has changed. I’m 99% sure that something has changed because I’m 99% sure that I once investigated this thoroughly and that it only kept the media tag from the first of several citations that were merged.

In any case, doing the merge while keeping all the media tags may not be quite as bad as losing media tags. But it’s still pretty bad because then there is no way to know which media tag goes with each usage of the citation.

Here’s an analogy. Suppose you were using Smith Family Book as a source and suppose you had citations for pages 12, 97, and 104. So your footnotes would be something like the following.

Smith Family Book, page 12.
Smith Family Book, page 97.
Smith Family Book, page 104.

(I’m not saying these are the greatest citations in the world. They are just a simple example.) Then suppose that the Merge All Duplicate Citations tool merged these three citations into a single citation as follows.

Smith Family Book, page 12, page 97, page 104.

The citation in RM where you had used the info from page 12 would then say
Smith Family Book, page 12, page 97, page 104.

The citation in RM where you had used the info from page 97 would then say
Smith Family Book, page 12, page 97, page 104.

The citation in RM where you had used the info from page 104 would then say
Smith Family Book, page 12, page 97, page 104.

I would emphasize that the Merge All Duplicate Citations tool does not make this mistake when there are differences in the text of the footnotes, such as the differences in the page numbers. This was just an example. But the Merge All Duplicate Citations tool does make this mistake for the citations from Ancestry that really are different but which differ only in their Web tags. I continue to believe that this is a bug that needs to be fixed.
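To make the behavior concrete, here is a minimal Python sketch. It is not RootsMagic's actual code: the `Citation` class, its fields, and `merge_by_footnote` are hypothetical names made up for illustration, and the sketch simply assumes that the merge compares footnote text only and pools whatever media was attached, as reported in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    """Hypothetical stand-in for a citation; not RootsMagic's real schema."""
    footnote: str
    media: list[str] = field(default_factory=list)


def merge_by_footnote(citations):
    """Merge citations whose footnote text is identical, pooling their media.

    Assumption: only the footnote text is compared, so citations that differ
    solely in their attached media are treated as duplicates.
    """
    merged = {}
    for c in citations:
        if c.footnote not in merged:
            merged[c.footnote] = Citation(c.footnote, list(c.media))
        else:
            # Media from every "duplicate" piles up on the surviving citation.
            for m in c.media:
                if m not in merged[c.footnote].media:
                    merged[c.footnote].media.append(m)
    return list(merged.values())


citations = [
    Citation("1900 U.S. Census", ["image_0012.jpg"]),
    Citation("1900 U.S. Census", ["image_0097.jpg"]),
    Citation("1900 U.S. Census", ["image_0104.jpg"]),
    Citation("Smith Family Book, page 12", ["scan_p12.jpg"]),
]

result = merge_by_footnote(citations)
for c in result:
    print(c.footnote, c.media)
```

In this sketch the three census citations collapse into one citation carrying all three images, so every fact that used one of them now points at the same pile of media, with nothing left to say which image belonged to which fact.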

I follow your logic, Jerry, and think you are right that it is better to have all of the media tags instead of just a few. I forgot to mention that these same citations have a corresponding number of weblinks too. I am fairly sure that the media file(s) pertaining to the citation being opened are at the top of the list, but most are not identifiable by their filename. It turns out that I had a late July version of my tree in RM7 (before I fully committed to RM 10) and plan to upgrade it to RM 10, but of course it will not sync with my Ancestry tree, so I am thinking it all out before I do anything hasty. I’ve gotten myself into trouble too many times before!

The best thing to do is add a citation name when there is nothing in the citation fields and different media and webtags are linked to it. Merging citations will only look at what is contained in a footnote. Those other items are not being considered.

Yeah, I didn’t notice this right away and lost months of work. It completely destroyed my Ancestry tree and I had to re-upload a 25,000 person tree. Personally I think they should disable the function until it works correctly.

I’m not sure what you are suggesting. To disable TreeShare? To disable Merge All Duplicate Citations? Which function should be disabled?

A lot of users depend on TreeShare. A lot of users depend on Merge All Duplicate Citations. For the most part, the conflict is only between TreeShare and Merge All Duplicate Citations. And even then, the conflict does not involve all citations that come into RM via TreeShare. The conflict is only for those citations that come into RM via TreeShare and that differ only in WebTags. The problem definitely needs some attention. But as a practical matter, I’m not sure what could be turned off in the interim.

Sorry Jerry I wasn’t clear.

It’s obvious the “Merge All Duplicate Citations” isn’t working as people expect due to the media tags, and should be disabled until fixed.

In my case the huge number of media that ended up attached to each citation were impossible to clean out through any effective means on Ancestry once they were updated via TreeShare, so my Ancestry tree was basically trashed.

In addition it made reporting and exporting a GEDCOM impossible.

I was able to “fix” the problem by manually deleting all the citations that had multiple media items, but all those citations were permanently lost. My bad for not recognizing the problem right away, but I was still reeling from the issue with placeids > 32768 corrupting places on a place details merge.

In any case, as long as the possibility exists that someone could essentially lose all the Census citations because they all merge, and then lose the ability to create a GEDCOM, I feel Merging Citations should be turned off. Saying someone needs to go into thousands of citations and add a comment is ridiculous.

I would have to object to this request. Clearly you are having serious issues with your use case (which I assume is related to Treeshare.) My use case is different. I do not use Treeshare but I do import GEDCOMs regularly from another tool. The merge facility has proven to be a valuable tool for me so I think “turning it off” would be a mistake.

As with any tool that has the potential to alter massive amounts of data at once, it should only be used if you have fully recoverable backups at your disposal.

Perhaps the best resolution is to have the developers issue a precise definition of which fields are used/not used to define a Duplicate Citation or Source and to describe the expected results of the Merge operation.

Users guessing and running test cases to determine how these tools work seems to be far less than optimal.

The issue at the heart of this is a bug, and the effects of that bug aren’t immediately apparent, and not simply isolated to TreeShare. Any situation where the thing that made the citation unique was the media was affected. For example, if you cited “Research Notes from X” where the individual citations were one of the 20 pages, then post merge you ended up with a citation with 20 media items attached. Its only relation to TreeShare is that when you merge all your census sources to one or more per census, all the individual media items end up on one merged citation, and Census is one of the most common things cited repeatedly with media via TreeShare.

Discovering the negative effects of the bug could take a person months if their database was smaller. The only reason I found it was it disabled my ability to generate certain books via publisher and export a GEDCOM. At that point your backup is worthless as teets on a boar hog.

So they need to either a) fix the bug, b) disable the function, or c) post up a message that says “You’re about to totally hose your database if you have media items attached to your citations, are you sure?” before running the merge citations.
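For option c), the check behind such a warning could be fairly small. The following Python sketch is only an illustration under assumptions: the `(footnote, media)` pairs and the `find_risky_groups` function are hypothetical and are not taken from RootsMagic, and it assumes that citations with identical footnote text are the ones that would be merged.

```python
from collections import defaultdict


def find_risky_groups(citations):
    """Return footnotes whose would-be-merged citations carry different media.

    `citations` is an iterable of (footnote, media) pairs, where `media` is a
    collection of file names. Both the shape and the names are hypothetical.
    """
    groups = defaultdict(set)
    for footnote, media in citations:
        # frozenset makes the attached media hashable and order-independent.
        groups[footnote].add(frozenset(media))
    # A group is risky when identical footnotes map to more than one media set.
    return sorted(fn for fn, media_sets in groups.items() if len(media_sets) > 1)


sample = [
    ("1900 U.S. Census", ["image_0012.jpg"]),
    ("1900 U.S. Census", ["image_0097.jpg"]),
    ("Smith Family Book, page 12", ["scan_p12.jpg"]),
    ("Smith Family Book, page 12", ["scan_p12.jpg"]),
]

risky = find_risky_groups(sample)
if risky:
    print("Warning: merging would combine citations with different media:")
    for footnote in risky:
        print("  -", footnote)
```

Here only the census footnote is flagged, because the two Smith Family Book citations really are identical, media included.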

Or an accompanying informational error message with either the conflicting field or the actual textual difference(s) (ie. the “delta”).

This feature screwed up one of my files over a year ago using RM 9, but I didn’t realize it until I had added a great many new records to the file, so it didn’t make sense to restore an earlier backup. I think the feature is a poor design, since media and tags both have a direct relationship with the citations being merged and should be considered. If the feature is not eliminated, a very large warning should be displayed to users when they use it. I teach monthly in a local RM user group and I always warn people NEVER to use this feature. It would be better to add a citation merge feature similar to the Place merge feature, with boxes, and allow the user to consider the media and tag data before completing the merge.

If I had to follow the advice NEVER to use this feature, then it would have been necessary for me to remain on RM7 forever. I depend heavily on printed descendant narrative reports for family reunions. For those printed reports, I depend heavily on the Reuse Duplicate Endnote option. The Reuse Duplicate Endnote option does not work unless you DO use this feature after importing from RM7 to RM8/9/10.

I am a sample size of one. I enter all my citations by hand. I do not import citations from Ancestry via TreeShare. I don’t have any citations that are different and where the only difference is in the Web Tags or in the media files. Therefore, I do not encounter the problem. But I’m aware that many RM users do encounter the problem. TreeShare is one way to encounter the problem, but it’s not the only way.

I really think it’s a bug, and I think that it’s a bug that needs to be fixed posthaste. If it’s an intentional design rather than a bug, then it’s a bad design and the software is broken as designed. The suggestion for users to solve the problem themselves by going to each citation name one at a time and making each citation name unique is not even remotely practical.

I am definitely not denying the shortcomings of this feature and the potential problems it can create. While I support your suggestions a and c (preferably option a), option b is a rather “big hammer” response that removes a feature that has proven useful in my use case even with all its shortcomings. I think I have successfully avoided this problem for the most part because almost none of my citation details have media attached at that level.

To me, the ideal situation would be for Merge Duplicate Citations to have checkboxes of what you wanted it to compare. You could check webtags and media tags so that citations that were the same except for one of those tags were not considered duplicates. Of course I am no software engineer, but I am thinking of the way you can merge duplicate people by comparing the spelling of their names, birth dates, etc. If merge duplicate citations worked the same way, everybody could have it the way they wanted.
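As a rough illustration of the checkbox idea, here is a short Python sketch in which the list of fields to compare is chosen by the user. Everything in it is hypothetical: the dictionary keys (`footnote`, `media`, `webtags`) and the `duplicate_groups` function are invented for the example and do not describe how RootsMagic stores or compares citations.

```python
def duplicate_groups(citations, compare_fields):
    """Group citation dicts that match on every field in `compare_fields`."""
    groups = {}
    for c in citations:
        # Tuples make list-valued fields (such as tags) hashable for the key.
        key = tuple(
            tuple(c[f]) if isinstance(c[f], list) else c[f]
            for f in compare_fields
        )
        groups.setdefault(key, []).append(c)
    # Only groups with more than one member are duplicate candidates.
    return [g for g in groups.values() if len(g) > 1]


citations = [
    {"footnote": "1900 U.S. Census", "media": ["image_0012.jpg"], "webtags": ["url-a"]},
    {"footnote": "1900 U.S. Census", "media": ["image_0097.jpg"], "webtags": ["url-b"]},
    {"footnote": "1900 U.S. Census", "media": ["image_0012.jpg"], "webtags": ["url-a"]},
]

# Footnote only: all three citations look like duplicates.
loose = duplicate_groups(citations, ["footnote"])
# Footnote plus media and WebTags "checked": only the true duplicates match.
strict = duplicate_groups(citations, ["footnote", "media", "webtags"])
print(len(loose[0]), len(strict[0]))
```

With only the footnote box checked, all three citations fall into one group. With media and WebTags checked as well, only the first and third are grouped, and the second citation keeps its own image and link.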
