Maybe, but the last drag’n’drop was on RM10, and it took over 48 hours to complete. The issue is how I get rid of all of the refuse.
I’m thinking my main option may be to delete all the media and citation links from a certain time period. It’s going to hose the sh*t out of some of my recent work, but right now I can do nothing, as GEDCOM, Books, etc., are totally disabled.
If your RM10 drag’n’drop started with 30k citations and 31k media links, then that shows its procedure is also disintegrating citation reuses into individual citations. This is not refuse that you need to clean out; outputs such as reports should be identical from both files. Either keep it as-is or use the Merge All Duplicate Citations tool to get the numbers back down - don’t delete!
Of course, one wonders how transparent both procedures are.
Apologies in advance that this suggestion is far less than ideal, but I’ve been following the issue you’re working through and am not sure there is an ideal solution, beyond a code fix that might be a long time coming.
I’m not sure what the original issue was that caused you to do the drag’n’drop that expanded your citation count by a factor of 5 and your media links by a factor of 50, but have you considered importing the 11/23 DB into RM10 and doing an immediate drag’n’drop?
I realize that you’d lose all updates, but at a minimum you could determine whether the new DB copy process for drag’n’drop yields better results than GEDCOM export/import. The size of your DBs makes me hesitate to offer strategies to identify an update delta. If that could be done somehow, perhaps the new, more granular import capabilities might help. While painful, this approach avoids the risks that Merge All Duplicate Sources and Citations carries if you TreeShared those Ancestry birth and death records.
Rather than assuming that they’re duplicates, wouldn’t it be better to use some SQLite tools to see what they actually are?
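For anyone who wants to look before deleting anything, here’s a minimal read-only sketch in Python against a copy of the database. The table and column names (CitationTable, MediaLinkTable, OwnerType = 4 for citation-owned media links) are my assumptions based on community schema notes, not anything official - verify them against your own file first.

```python
# Read-only inspection of a COPY of the RootsMagic .rmtree file (stdlib only).
# Table/column names and OwnerType = 4 (citation-owned media) are assumptions
# from community schema notes - verify against your own database first.
import sqlite3

con = sqlite3.connect("file:copy_of_tree.rmtree?mode=ro", uri=True)  # hypothetical path

# Citations per source: a handful of sources owning thousands of citations
# is the signature of the blowup described in this thread.
for source_id, n in con.execute(
        """SELECT SourceID, COUNT(*) AS n
             FROM CitationTable
            GROUP BY SourceID
            ORDER BY n DESC
            LIMIT 10"""):
    print(f"SourceID {source_id}: {n} citations")

# Media links per citation: merged-then-split citations carry many links each.
for citation_id, n in con.execute(
        """SELECT OwnerID, COUNT(*) AS n
             FROM MediaLinkTable
            WHERE OwnerType = 4      -- assumed: 4 = link owned by a citation
            GROUP BY OwnerID
            ORDER BY n DESC
            LIMIT 10"""):
    print(f"CitationID {citation_id}: {n} media links")

con.close()
```

Grouping and sorting on integer columns also sidesteps RM’s custom RMNOCASE collation, which otherwise trips up generic SQLite tools on the text columns.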
Also, when you do a GEDCOM export, does it slow down in a specific phase? And if so, could that point to problems with particular objects?
When I do an export here, with a test tree with more than 900,000 persons, I can see how fast it collects persons, families, etc. In your case, it might slow down collecting citations, or media items and links. Does it do that? Or does it slow down in a later phase, while linking these objects before the final stage of the export?
The issue that caused me to drag and drop in the first place was that when the ID of a Place or PlaceDetail was greater than 32768 and you attempted to merge them, the UI merged with the wrong place and your place information was lost. I’m pretty sure the index in the UI was 16 bits and overflowed.
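To make the 32768 boundary concrete, here’s a toy illustration (pure speculation about RM’s internals, not its actual code) of what truncating a larger PlaceID to a signed 16-bit value would do:

```python
# Toy illustration (speculation about RM internals, not its actual code):
# if a UI component truncates a PlaceID to a signed 16-bit integer, 32768 is
# exactly where lookups start landing on the wrong row.
import ctypes

for place_id in (32767, 32768, 40000):
    truncated = ctypes.c_int16(place_id).value  # keep only the low 16 bits, signed
    print(f"PlaceID {place_id} -> 16-bit view {truncated}")

# 32767 -> 32767 (last safe ID); 32768 -> -32768; 40000 -> -25536.
# A merge keyed on the truncated value would target a different place entirely.
```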
I saw it because I clean up places after every session (which is nearly daily now) and know that accurate places are critical for the electronic research I’m doing almost exclusively.
Once I realized my places were corrupted, I started working backwards through my backups to find the newest DB without corruption (the 11/23 one) and restored that. I then used the RecordID and UTCModDate in the DB to determine which records were updated in the latest DB and marked them as a group. I exported all of them as a GEDCOM and imported them into the restored database AFTER I’d done a drag and drop on it (using version 9 at the time) to resequence all the PlaceIDs and get things (mostly) current. I also needed to manually export the ConfigTable to SQL INSERT statements and recreate all my Books.
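For what it’s worth, the “find everything changed since the good backup” step can be sketched in Python against a copy of the current database. This assumes UTCModDate is the Julian-day float it appears to be in RM tables, and the per-table ID column names are my guesses - spot-check a few known-recent records before trusting the cutoff.

```python
# Sketch of the "what changed since 11/23" step, against a COPY of the
# current database. Assumes UTCModDate is a Julian-day float (as it appears
# to be throughout RM tables) and guesses the per-table ID column names.
import sqlite3

con = sqlite3.connect("file:copy_of_current.rmtree?mode=ro", uri=True)  # hypothetical path
cutoff = con.execute("SELECT julianday('2023-11-23')").fetchone()[0]

tables = {"PersonTable": "PersonID", "EventTable": "EventID",
          "CitationTable": "CitationID", "PlaceTable": "PlaceID"}
for table, id_col in tables.items():
    ids = [row[0] for row in con.execute(
        f"SELECT {id_col} FROM {table} WHERE UTCModDate > ?", (cutoff,))]
    print(f"{table}: {len(ids)} records modified after 2023-11-23")

con.close()
```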
So no, this isn’t my first (or even second) rodeo. That’s why I’d like RM to address the root causes and improve their cleanup tools so I don’t have to do this again. This is my second big issue in three months.
As I have posited earlier, it sounds very much like you have run the Merge All Duplicate Citations in your RM9 database.
Unfortunately, it also sounds like you made extensive use of Ancestry sources delivered via TreeShare. Some Ancestry collections deliver citations that are identical except for the media item, so the RM Merge tool thinks they are duplicates. Moreover, the Source may be simply Ancestry.com for all the citations. The end result is one Citation per collection (or fraction of a collection), with each Citation reused many times. All the individual media items previously linked to the pre-merge duplicate Citations are now linked to the single merged Citation - in your case, resulting in 529 merged Citations, each with multiple reuses and a corresponding number of media links.
When you drag’n’drop or export-import, each reuse is converted back into an individual Citation record, and each record gets as many media links as the merged Citation had. So media links really mushroom.
Now that’s just normal bad behaviour by RM and should have no bearing on your problem that the export cannot complete. I think something else is afoot. However, I think your citations were pretty useless after they were merged, and they were already uninformative for text reports beforehand.
Unfortunately, Tom, you seem to be correct on all counts. I think I’m going to need to delete the offending Source, and thus lose all the potentially valid references to it. I’m taking one last shot at merging all duplicate citations, which I’ll let run overnight.
Given what I believe you have, do not use Merge All Duplicate Citations. That is going to cause irreversible damage to your citations. RM needs to enhance the tool to take into account differences in media and webtags for it to be safe to use.
Is this still true? I never edit places in RM, for a couple of reasons, and in the database, the ID is an integer. I checked that with an SQLite browser, and I can also see that in my test database, with more than 900,000 people, there are almost 59,000 places, and the SQLite browser shows IDs above 59,000.
This suggests that RM 9.1.1 can at least work with 16-bit unsigned integers, but given the number of persons, I’d rather say that it works with 32-bit signed integers, so 2 billion places should not be a problem.
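The claim is easy to check programmatically as well. SQLite itself stores INTEGER columns as up to 64-bit signed values, so any 16-bit ceiling would have to live in the UI layer; this sketch (same assumed table and column names as elsewhere in this thread) reads a copy of the file:

```python
# Check what the database itself holds: SQLite INTEGER columns store up to
# 64-bit signed values, so any 16-bit ceiling would have to live in the UI.
# Same assumed table/column names as elsewhere in this thread.
import sqlite3

con = sqlite3.connect("file:copy_of_tree.rmtree?mode=ro", uri=True)  # hypothetical path
max_id, n = con.execute("SELECT MAX(PlaceID), COUNT(*) FROM PlaceTable").fetchone()
print(f"{n} rows in PlaceTable, highest PlaceID = {max_id}")
print("beyond signed 16-bit range" if max_id > 32767 else "still within 16-bit range")
con.close()
```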
It was true with 9.1.3, and I assume it’s still an issue with 10, as the UI hasn’t changed. While the database contains a full 32-bit integer, what I can see is that the UI component for merging place details does not. Since places and place details are stored in the same table, what was happening was that places and place details were disappearing after a merge.
This was particularly problematic for me, as I make heavy use of place details to keep the places clean for electronic matching, so I had several places with 50-60 place details attached that simply went “poof”, and I found that I had a number of place detail rows whose MasterIDs no longer existed.
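Orphans like that should be findable with a query along these lines - a sketch assuming the PlaceType values (0 = place, 2 = place detail) that community schema notes describe:

```python
# Look for the orphans described above: place-detail rows whose MasterID no
# longer points at an existing place. The PlaceType values (0 = place,
# 2 = place detail) are assumptions from community schema notes.
import sqlite3

con = sqlite3.connect("file:copy_of_tree.rmtree?mode=ro", uri=True)  # hypothetical path
orphans = con.execute(
    """SELECT PlaceID, MasterID
         FROM PlaceTable
        WHERE PlaceType = 2                    -- assumed: place detail
          AND MasterID NOT IN (SELECT PlaceID
                                 FROM PlaceTable
                                WHERE PlaceType = 0)   -- assumed: place
    """).fetchall()
print(f"{len(orphans)} place-detail rows with a missing parent place")
con.close()
```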
If this was during the flakier era post-RM7, might it have been a failed merge, one that stopped before completion? Given your many complaints about slowness and the large record counts you seem to have in some tables, that seems to me to have been, and still may be, a real possibility.
As an update, I just had a nice online session with Diane from support. I should have apologized in advance, because I had RM up on three machines (two remote hookups) and made a separate copy of my DB for the session, so it was confusing as hell until I got a bunch of windows closed. I assumed we’d be sharing a window, not the whole desktop. Sorry, Diane.
In any case, it looks like the root cause is the drag’n’drop I did a few months back to fix the Place Details problem. From what I can see, all the media for a source got linked to every citation for that source. I have 529 media items linked to each of 694 citations, and another 512 media items linked to each of another 1,189 citations. So that’s 529 × 694 + 512 × 1,189 = 367,126 + 608,768 = 975,894 links - the bulk of my extraneous links. The issue is how to get rid of them.
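If it helps to confirm the clustering, a rollup of citation media links by source should make the two offending sources stand out. Same caveats as before: a copy of the file, community-documented table names, and OwnerType = 4 as an assumption to verify against your own database.

```python
# Roll citation media links up by source so the offending sources stand out.
# Table names and OwnerType = 4 (citation-owned media link) are assumptions.
import sqlite3

con = sqlite3.connect("file:copy_of_tree.rmtree?mode=ro", uri=True)  # hypothetical path
query = """SELECT c.SourceID,
                  COUNT(DISTINCT c.CitationID) AS citations,
                  COUNT(*)                     AS media_links
             FROM MediaLinkTable m
             JOIN CitationTable c ON c.CitationID = m.OwnerID
            WHERE m.OwnerType = 4
            GROUP BY c.SourceID
            ORDER BY media_links DESC
            LIMIT 5"""
for source_id, citations, links in con.execute(query):
    print(f"SourceID {source_id}: {citations} citations, {links} media links")
con.close()
```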
The blowup of the media links has nothing to do with Place Details. It is entirely due to having used Merge All Duplicate Citations when you had many Citations differing only in their one media item followed by a drag’n’drop which converted all the reuses of a Citation having many media items to individual Citations, each now with multiple media items.
Maybe you did the drag’n’drop on the advice of Support to try to solve your problem with Place Details. I fault RM for advocating drag’n’drop as a solution for anything, because it is lossy, and for letting MADC (Merge All Duplicate Citations) into the wild with the glaring oversight that it ignores differences in media and webtags. So your citations were already screwed up and only made worse by drag’n’drop.
You’ll never be able to reverse the mess. Best to restore from a backup made before MADC and redo everything you’ve done since except MADC and drag’n’drop.
The only reason the blowup has anything to do with place details is that it’s the only time I’ve done a drag’n’drop.
I agree there’s no clean way to fix this; I’m going to lose some citations, but the links in question seem to be clustered on a few of them. I have no idea why all the media for a source got linked to every citation for that source - I need to fix this issue first, then work on that.
Since I’m fairly good with backups, I can likely go back to before the DND and try it again, just to see what happens.
One other possibility is going back to 11/23, grabbing all the updates from the current file into a GEDCOM like I did before, and running it forward again. I still have the change date. It’d be a royal PITA, but I could reconstruct most of the citations from the synced Ancestry tree.
Likely the only “clean” way to do it is to mark all the changes since 11/23, export a GEDCOM, do a drag and drop of the 11/23 database into a fresh one, mark all those changes as synced with Ancestry, load the GEDCOM of changes, and painfully sync them click by click to get current.
I’ve explained this repeatedly but perhaps poorly so here’s an example:
Treeshare downloaded 100 Citations for one Source. Each Citation has one unique media item but they are from a Collection that produces no differences among them in their text fields.
Next, run Merge All Duplicate Citations (MADC). Now there is but 1 Citation (because the 100 were identical, text-wise) and it has the 100 different media links and is used 100 times. At this point, there is no relationship with which you can tell which media was the evidence for the fact the Citation is supposed to support. So the Citation is pretty useless and the Merge cannot be reversed.
Then drag’n’drop converts the 100 uses of the 1 merged Citation having 100 media links into 100 Citations, each with 100 media links. That’s how the original 100 media links to 100 images downloaded from an Ancestry collection became 10,000 media links (100 links from each Citation to the 100 different media files).
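The arithmetic is simple enough to put in a few lines, with the counts reported upthread plugged in (`blowup` is a hypothetical helper, not an RM function; swap in your own numbers):

```python
# The arithmetic of the example above, parameterized so you can plug in your
# own counts. `blowup` is a hypothetical helper, not an RM function.
def blowup(citation_uses: int, media_on_merged_citation: int) -> int:
    # After MADC: 1 citation carrying all the media links, reused N times.
    # After drag'n'drop: each reuse becomes its own citation and copies
    # every media link on the merged citation.
    return citation_uses * media_on_merged_citation

print(blowup(100, 100))    # the worked example: 10,000 links
print(blowup(694, 529))    # 367,126 - the first cluster reported upthread
print(blowup(1189, 512))   # 608,768 - the second cluster
```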
Do you now see why I argue that you need to wind back to before you ran MADC? That said, I get that the Ancestry Sources which MADC mishandles might be less important to you than other work you have done since then and deleting them may be a tolerable loss if it means you can regain the functionality you need along with the more valuable data and evidence.