Source/Citation Transfer from RootsMagic to TreeShare for Ancestry

I understand it might be difficult to get Ancestry to make modifications to their API, but it’s worth a try.

If there is one thing that could be implemented from my wish list for TreeShare improvements, it would be that the length of the citations be increased to, say, 1,000 characters. I think some of the other shortcomings can be dealt with. It would at least allow the important information to be transferred even though it wouldn’t be in true EE form. I was able to add the full long citation to the Detail Comment under the “Citation Detail, text, media, etc.” heading in RM8.

It transfers nicely to the Ancestry tree under Citation Information just below the Details in a separate section called Other Information. It is missing italics for one field, and I replaced the “>” character with the character “|”, which I have seen used to separate items in a website path. When I was testing, I found that the Other Information category could accept over 20,000 characters. I know, I went a little overboard. I was looking for a limit.

The only problem is that the truncated citation detail still shows up in the Ancestry tree. That information has to be left in the RM record so that reports will be created correctly.

I agree the character limit is an issue, but I think it is an Ancestry issue. The exact same problem exists when you sync FTM 2019 with an Ancestry tree. FTM 2019 has two citation fields, “Citation detail” and “Citation text”. The “Citation detail” field goes to the same field in Ancestry as the equivalent RM 8 field does. FTM 2019 data gets truncated, so you lose important information. Getting Ancestry to change anything is nigh on impossible; I have tried to get them to fix a bug in their iOS app. I share your frustration.

My initial suspicion was that the character limit is imposed by Ancestry. Doesn’t the RootsMagic team work with Ancestry regarding TreeShare, and can’t they propose changes to the API? It seems like a relatively straightforward task to increase the size of a text based data object. No data should be lost. As I mentioned earlier, some of the other text data objects in Ancestry trees allow over 20,000 characters to be transferred.

I suspect it has to do with how Ancestry implemented the GEDCOM standard. Each line in the GEDCOM standard has a limit of 255 characters, including white space. To populate a field, the application reading the data has to be programmed to read successive lines and concatenate them. I guess they assumed that this particular field would never hold more than 255 characters. The same issue is prevalent in the Ancestry tree syncing feature in FTM 2019: the Citation detail field, which goes to the same Ancestry field as RootsMagic’s data, is truncated if the data in that field is longer than 255 characters. The problem probably has something to do with the historical ownership of FTM by Ancestry. The behemoth that is Ancestry is very unlikely to change for either RootsMagic or MacKiev, the owners of FTM.


I have no idea whether Ancestry’s problem is the 255 character limit per line in GEDCOM or not. But if it is, then they are doing a terrible job of understanding GEDCOM. That 255 character limit per line has nothing to do with how big a note can be transmitted in GEDCOM. For all practical purposes, the length of a note in GEDCOM is unlimited. That’s because a note can be encoded as multiple GEDCOM lines via judicious use of GEDCOM’s CONC and CONT tags.

These essentially are tags that let you continue a note. You have to be judicious in how the CONC and CONT tags are mixed and matched because that’s the GEDCOM mechanism for transmitting line breaks that are in the notes. An actual carriage return character will terminate a line of GEDCOM rather than being a new line in a note. That’s why separate CONC and CONT tags exist rather than having just one GEDCOM tag for continuing a note, and not all genealogy software processes the CONC and CONT tags in quite the same way.
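To make the CONC/CONT mechanism concrete, here is a minimal Python sketch of how an exporter might split a long note. The function names and the chunk size of 70 are illustrative assumptions, not taken from RootsMagic, Ancestry, or the GEDCOM spec; the only behaviour relied on is the one described above, namely that CONT carries a line break and CONC continues the same line of the note.

```python
def split_chunks(text, size):
    """Cut text into pieces of at most `size` characters.

    The cut point is moved back when it would fall next to a space,
    because some readers trim spaces at the ends of GEDCOM lines.
    """
    chunks = []
    while len(text) > size:
        cut = size
        while cut > 1 and (text[cut - 1] == " " or text[cut] == " "):
            cut -= 1
        chunks.append(text[:cut])
        text = text[cut:]
    chunks.append(text)
    return chunks


def encode_gedcom_value(level, tag, text, max_value_len=70):
    """Return GEDCOM lines for `text`, using CONT for line breaks in the
    note and CONC for pieces that continue the same line of the note.
    max_value_len is an arbitrary illustrative chunk size (assumption)."""
    lines = []
    first = True
    for paragraph in text.split("\n"):
        for j, chunk in enumerate(split_chunks(paragraph, max_value_len)):
            if first:
                head = f"{level} {tag}"
                first = False
            elif j == 0:
                head = f"{level + 1} CONT"   # new line inside the note
            else:
                head = f"{level + 1} CONC"   # same line, just continued
            lines.append(f"{head} {chunk}" if chunk else head)
    return lines
```

With a deliberately tiny chunk size, `encode_gedcom_value(1, "NOTE", "abcdefghij\nxyz", max_value_len=4)` yields one `1 NOTE` line, two `2 CONC` lines and one `2 CONT` line, which is the mix-and-match the post describes.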


@mikeb
The issue of compatibility of RM sources and citations when interchanged with other systems predates TreeShare and is a longstanding one. See
https://sqlitetoolsforrootsmagic.com/source-templates/#A-Trio-of-Templates
for some of my study when GEDCOM was the only channel between an Ancestry Tree and a personal database.

At heart, TreeShare is likely a subset of GEDCOM and the Ancestry database structure for Trees shares much in common with GEDCOM but adapted, interpreted and extended to support online features.

GEDCOM standards before 7 have no support for templates other than what corresponds closely, if not exactly, to RM’s Basic Book Template. But here’s the rub: RM converts all source types to its Free Form type on exporting to just two of the several fields for sources in GEDCOM, same for TreeShare. The irony is that TreeShare downloads in Ancestry Record Template which is a modified Basic Book Template. When RM uploads to Ancestry, it is likely converting it to FreeForm, putting the sentence sans citation detail into Title and the latter’s field values concatenated into Page as it does for standard GEDCOM.

TomH

Thanks for the history. My main issues are for very long citations that follow the Evidence Explained layered format, where the first layer is the source title and the specific citation, the second layer is mainly identifying online database or image locations, and the third layer is for the source of the source. The layers are separated by semi-colons. Second and third layers are not needed for many book type sources. I believe the basic source information is transferred properly via TreeShare for many of my custom RM source templates being used. They seem to be using the long footnote portion of the sentence up to the first citation field. They add words from the template that are not part of the field values, but don’t handle font formatting for italics (also no bold or underline, although I’ve had no need for those). It’s the citation information that is concatenating all of the citation fields and separating them by semi-colons, which conflicts with the layered approach used by Evidence Explained. They also do not handle italics or extra words that are part of the citation sentence. It seems like if the same methodology is used for the citation as for the source information, that would make the citation on Ancestry look better. And of course, for long citations, they get truncated at 256 characters. Ancestry would need to increase the field length to accommodate them.

My interest is only in transferring my RM data to Ancestry, but if one is interested in downloading Ancestry trees into RM, they could always transfer the source in one long field and the citation in another, essentially making all Ancestry to RM sources free-form. I believe that is how it is working now.

Ancestry sources are downloaded in the Ancestry Record Template structure which is why we saw that template introduced when TreeShare was launched. It did not exist previously. See
https://sqlitetoolsforrootsmagic.com/ancestry-treeshare-impact/#New-Source-Templates-RM75-RM8

Set aside Evidence Explained and your concept of “layers”, at least insofar as there being more than two. GEDCOM and effectively all systems whose database structures are compatible with it have only two levels:
SOUR {SOURCE}:=
The initial or original material from which information was obtained. (e.g., the book)
PAGE {PAGE}:=
A number or description to identify where information can be found in a referenced work. (obviously, the page number in the book source)

An example from the GEDCOM 5.5.1 spec:

0 @1@ INDI
1 NAME Robert Eugene/Williams/
1 SEX M
1 BIRT
2 DATE 02 OCT 1822
2 PLAC Weston, Madison, Connecticut
2 SOUR @6@
3 PAGE Sec. 2, p. 45
3 EVEN BIRT
4 ROLE CHIL

0 @6@ SOUR
1 DATA
2 EVEN BIRT, DEAT, MARR
3 DATE FROM Jan 1820 TO DEC 1825
3 PLAC Madison, Connecticut
2 AGNC Madison County Court, State of Connecticut
1 TITL Madison County Birth, Death, and Marriage Records
1 ABBR VITAL RECORDS
1 REPO @7@
2 CALN 13B-1234.01
3 MEDI Microfilm

The lines
2 SOUR @6@
3 PAGE Sec. 2, p. 45
are the citation, combining the PAGE (Citation Detail) info with the SOUR (Master Source) info under
0 @6@ SOUR
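For readers who want to see that two-level linkage mechanically, the following Python sketch pairs each citation's PAGE value with the TITL of the source record it points to. It is a simplified reader written for this illustration (the names `parse_line` and `resolve_citations` are made up here), it ignores CONC/CONT lines, and it is not how RM or Ancestry actually read GEDCOM.

```python
import re

# level, optional @xref@, tag, optional value
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    m = LINE_RE.match(line.strip())
    if not m:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = m.groups()
    return int(level), xref, tag, value or ""


def resolve_citations(lines):
    """Return (source title, page) for every 'n SOUR @x@' citation."""
    titles = {}          # source xref -> TITL value
    cites = []           # citations in order of appearance
    current_source = None
    cite_level = None    # level of the citation currently open, if any
    for raw in lines:
        if not raw.strip():
            continue
        level, xref, tag, value = parse_line(raw)
        if level == 0:
            current_source = xref if tag == "SOUR" else None
            cite_level = None
        elif tag == "TITL" and level == 1 and current_source:
            titles[current_source] = value
        elif tag == "SOUR" and value.startswith("@"):
            cites.append({"source": value, "page": ""})
            cite_level = level
        elif tag == "PAGE" and cite_level is not None and level == cite_level + 1:
            cites[-1]["page"] = value
        elif cite_level is not None and level <= cite_level:
            cite_level = None   # left the citation's substructure
    return [(titles.get(c["source"], ""), c["page"]) for c in cites]
```

Fed the example above, it returns the title `Madison County Birth, Death, and Marriage Records` together with the page `Sec. 2, p. 45`, which is all the structure a receiving program has to build a footnote from.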

All the softwares that produce sentences from these fields recognise the most common tags and make assumptions about the sentence construct. The ones in common use are Title, Author, Publisher, Publishing Place… for the Source plus Page for the citation. Evidence Explained was not written for software developers nor does it consider the constraints of GEDCOM; rather it is about recommended content and style for many different kinds of sources for authors to use.

So software developers are faced with trying to fit many different shapes of pegs into one round hole when they try to support both EE styles in their software’s reports and websites outputs and transport of the sources and citations to 3rd party systems using GEDCOM and its ilk (TreeShare). RM’s solution for export is to use only the TITL and PAGE tags, regardless of the template structure. That makes some sense but I wish they had left open the option for the Basic Book Format sources to be exported intact as that structure would be preserved by almost all software that I know of.

So imagine trying to take an EE template with a dozen arbitrarily named fields in the Master Source and a half-dozen in the Citation Detail surrounded by conditional switches with conditional and non-conditional text and punctuation and mapping that into the very limited set of tags in GEDCOM. Each source template would need to have its own export template to create something that would work well across many platforms. That is, a sentence template for the TITL tag and another for the PAGE tag. That has been requested long ago because RM has only the sentence template for its own reports, which you would still want. And so its solution was to export to TITL the sentence stripped of the Citation Detail field values and dependent text and punctuation and concatenate their values to the PAGE field. The other software then, hopefully, reassembles those as TITL, PAGE without extraneous stuff.


My point regarding the layer concept is that the length of a multi-layer source/citation often increases beyond what is currently allowed in Ancestry trees. Both the source and the citation detail fields in Ancestry are limited to 256 characters. Therefore long citations created in RM get truncated. I can see that the construct of complex source templates would be difficult to program as part of TreeShare. I am not suggesting that additional layers be accommodated, but rather have the citation (or PAGE) include any additional layer information not part of the basic source (or TITL). If Ancestry allowed longer citations, that would solve the truncation problem.

It seems next to impossible to have Ancestry source/citations uploaded to RM with all the fields generally used in RM source template constructs. I like the way RM handles the footnote constructs for its own reports, but just wish the long footnotes would all transfer into Ancestry.

The length limitation is not just an issue with Ancestry Trees. It is an issue with GEDCOM and some other software as well. Arguably, it would be solved also if the software restricted data entry so that the output to a tag value (whether GEDCOM or TreeShare or FamilySearch Family Tree) was restricted to the maximum allowable length. But even Ancestry disregards that with its own generation of text for sources which can be truncated in the download via TreeShare or GEDCOM.

Within RM, the easiest place to apply that restriction would be on the FreeForm Footnote and Page fields but it should also be feasible to measure the aggregates from a templated source that would be exported to individual tags or fields and prevent entry or give warning whenever it is exceeded.
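As a rough illustration of such a warning, the Python sketch below compares the text destined for each tag against a table of limits. The 256-character limits are assumptions based on the truncation reported in this thread, not documented TreeShare or Ancestry values, and `check_lengths` is a hypothetical helper, not an RM feature.

```python
# Assumed limits, taken from the truncation observed in this thread.
ASSUMED_LIMITS = {"TITL": 256, "PAGE": 256}


def check_lengths(values, limits=ASSUMED_LIMITS):
    """Return (tag, actual length, limit) for every value that is too long.

    `values` maps a tag name to the text that would be exported to it.
    Tags without a known limit are not checked.
    """
    return [
        (tag, len(text), limits[tag])
        for tag, text in values.items()
        if tag in limits and len(text) > limits[tag]
    ]


def format_warnings(problems):
    """Turn the tuples from check_lengths into readable messages."""
    return [
        f"{tag}: {length} characters, {length - limit} over the limit of {limit}"
        for tag, length, limit in problems
    ]
```

A data-entry screen could run a check like this on every keystroke or on save, and either block the entry or just show the message.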

Tom

On most any genealogy topic, Tom has more expertise than me so I’m reluctant to stick my neck out with a question. But here I go anyway.

It was a long time ago, but I did once upon a time spend some time studying the way GEDCOM deals with sources and citations. I came to the conclusion that there were four GEDCOM tags of interest. There were the obvious two, namely SOUR and PAGE which have been well discussed in this thread. But my conclusion was that you also had to deal with the ADDR and REPO tags to the extent that your sources and citations dealt with repositories.

With so many things online, I think the whole treatment of repositories needs to be re-examined. And even before things were online, was the real repository for a reel of census microfilm my local genealogy library or was it the National Archives? But in any case, it seems to me that footnote sentences sometimes include repository information that is outside the scope of GEDCOM’s SOUR and PAGE tags and which instead involves the ADDR and REPO tags.

To tell you the truth, I’m interested in reducing this down to just one GEDCOM tag - call it FOOTNOTE if you wish, even though that’s more than four letters. But I’m sure there must be a fatal flaw in my scheme.

Looking at the GEDCOM created from my RM file, it appears that the source that is being transferred through TreeShare to my Ancestry tree comes directly from the 0 @S1312@ SOUR, 1 TITL and the two continuation lines 2 CONC shown below. This matches the source portion of the long footnote in RM.

0 @S1312@ SOUR
1 ABBR Finland, Oulu, Kuusamo Church Records, births & baptisms, 1837-1855
1 TITL Kuusamo seurakunta [parish] (Kuusamo, Oulu, Finland), Syntyneiden luett
2 CONC elo [List of births], IC: 5, 1837-1855, (also includes marriages and de
2 CONC aths).

The citation appears to be coming from the PAGE tag for this source shown below, which is 548 characters excluding the 7 characters of 3_PAGE_. This was all on one line in the GEDCOM and was built from the individual fields in my citation separated by semi-colons.

2 SOUR @S1312@
3 PAGE birth / baptism entry for Johan Kallungi, 28 January 1846 / 20 February 1846, unnumbered pages arranged chronologically by baptism date; Kansallisarkisto [National Archives of Finland]; Digitaaliarkisto [Digital Archives]; browsable images; Digitaaliarkisto ||http://digi.narc.fi/digi/puu.ka; http://digi.narc.fi/digi/view.ka?kuid=5583057; downloaded; 24 September 2021; > Kuusamon seurakunta > Kuusamon seurakunnan arkisto > Syntyneiden ja kastettujen luettelot > Syntyneiden luettelo 1837-1855 (IC:5); image 77 of 215

What seems to be transferred to my Ancestry tree is shown below, which is truncated at 256 characters not counting the 7 characters of 3_PAGE_.

2 SOUR @S1312@
3 PAGE birth / baptism entry for Johan Kallungi, 28 January 1846 / 20 February 1846, unnumbered pages arranged chronologically by baptism date; Kansallisarkisto [National Archives of Finland]; Digitaaliarkisto [Digital Archives]; browsable images; http://digi.nar

It appears to me that the SOUR tag and the following TITL and CONC tags may be used to transfer the main source, and the PAGE tag is used for the citation by TreeShare. It would seem that Ancestry truncates the PAGE tag to 256 characters. To me it seems natural that the PAGE tag for the citation could be created by RM to follow the same logic as the SOUR, TITL and CONC tags. My thoughts are that if the PAGE tag size could be increased by Ancestry, then all of the additional information used to create long Evidence Explained source/citations could be added as part of the citation portion. Then everything could be transferred with SOUR and PAGE tags.

As a side note, the GEDCOM file has a FOOTNOTE tag which matches exactly with the long footnote template from my RM file. All of the field names are also found in the GEDCOM file.

1 NAME __Church Records (online images)
1 DESC Church records; church & series as lead element
1 CAT [EE, QC-7, p314]
1 FOOTNOTE <[Church_Author]>< ([Location])><, [RecordSeries]><, [RecordBookID]><, [
2 CONC RecordType]><, [ItemOfInterest]><; imaged in< [WebsiteCreator], > [W
2 CONC ebsiteTitle]><, [ItemType]><, ([Permalink_URL] :>< [AccessType]>< [
2 CONC AccessDate])><, path: [Path]><, [Image_ID]><; citing [Citing]>.

@mikeb, yes, you’ve refreshed my memory and got what I described:

  1. RM transmits (exports) data more or less in accord with the GEDCOM 5.5.1 spec
  2. The specified maximum length of a GEDCOM line is 255 characters including the tag.
  3. For Sources, RM transmits with the TITL tag the Footnote sentence constructed from the template variables and sentence template but devoid of the Citation Detail fields and dependent text and punctuation. The TITLe tag value can be extended with CONCatenation and CONTinuation (new line) tags. In the case of FreeForm, the entire value of the Footnote field (not the Footnote sentence) is sent to TITL. The spec’d maximum length of the TITL value is in the ballpark of 4000 altho’ I cannot find the reference (the spec shows {1:M}). edit: 5.5.1 set no max; the gedcom.org 5.5.5 fork spec’d 4095
  4. For the citation, RM concatenates all of the values of the Citation Detail fields, separated by semicolons, and appends the result to the PAGE tag (WHERE_WITHIN_SOURCE) whose value is spec’d limited to 1 line of 248 characters.
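Points 3 and 4 can be sketched in a few lines of Python. This is only a model of the behaviour described in the list, not RM's code: `build_export` is a hypothetical name, the "; " separator is inferred from the exported PAGE line quoted earlier in the thread, and 248 is the spec'd PAGE length mentioned in point 4.

```python
def build_export(source_sentence, citation_fields, page_limit=248):
    """Model of the export described above.

    source_sentence: the footnote sentence with the Citation Detail
        parts already stripped out; it goes to TITL unchanged.
    citation_fields: the Citation Detail field values, in template order;
        empty values are skipped and the rest are joined with "; ".
    """
    page = "; ".join(value for value in citation_fields if value)
    return {
        "TITL": source_sentence,
        "PAGE": page,
        "page_over_limit": len(page) > page_limit,
    }


example = build_export(
    "Kuusamo seurakunta [parish] (Kuusamo, Oulu, Finland)",
    ["birth / baptism entry for Johan Kallungi", "", "image 77 of 215"],
)
print(example["PAGE"])
```

Because the join knows nothing about the sentence template, the words, punctuation and italics that the template would have placed between the fields are lost, which is the formatting complaint raised earlier in the thread.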

The limitations are a legacy of the specifications’ evolution from the earliest days of personal computers and the maintenance of some level of backwards compatibility. There may still be software in use whose database system has defined field or column widths determined by those limits. That is not the case for RM since v4 with SQLite but may have been in previous versions using an xBase engine. We know even less about Ancestry’s limitations.

So it is interesting that you found that RM exported a string of 548 characters to GEDCOM as the PAGE value and Ancestry displayed that value truncated at 256. What we do not know is whether the truncation occurs because of a TreeShare specification that RM must abide by or because Ancestry’s Tree database is the constraint. You might try editing the citation in your Ancestry Tree to see if the value can be extended. I did this a couple of years ago but don’t remember and there have been many changes on that side in the interim.

What is also interesting is that the entire source-citation sentence could be accommodated by the TITL tag. Perhaps that is true also on the Ancestry Tree side and you might wish to try it and find the limit. If so, then it is another justification for the use of extremely split sources in RM. The footnote sentence is delivered complete, intact, exactly as generated in a RM report with very little risk of truncation (that would be a footnote longer than a page!).

Moving the mountain beyond 256 characters is not going to be easy. I asked Ancestry for it more than a decade ago. Easier and immediate is to adapt one’s practices to work within the limits.

And, re your side note, the FOOTNOTE tag is RM’s proprietary customisation of GEDCOM that only a few other softwares recognise. It is exported along with other custom tags when you check the option in the export dialog to include RM special features and is the means by which custom source template definitions can be transported between RM databases.

Removing those length limits is one of the major things GEDCOM 7 sets out to do. It will eliminate the CONCatenation tag as it would be unneeded. So when both Ancestry and RM start exporting and importing GEDCOM 7 files, there is a possibility that TreeShare constraints would also be lifted. Are we anywhere close to seeing GEDCOM 7 from either of them? I doubt it.

TomH, Thanks again for your detailed replies.

I did try editing the citation in my Ancestry Tree and determined that the value cannot be extended beyond 256 characters. I also tried it with the source and it also is limited to 256 characters. In a different string on the RM Community, someone suggested using the Detail Text Comments field in RM for the additional source/citation layer information. It shows up in the Ancestry tree under the heading “Other Information” below the “Detail” in “Citation Information”. I tried that and it worked. I even experimented with it and found that TreeShare could transfer upwards of 20,000 characters into this field. If Ancestry can allow long text strings in this field, it seems like they could probably allow it in the source and citation fields. The GEDCOM file for this test named the field “NOTE” as shown below. This contains 404 characters.

3 NOTE ; imaged in Kansallisarkisto [National Archives of Finland], Digitaalia
4 CONC rkisto [Digital Archives], digital images, (http://digi.narc.fi/digi/vi
4 CONC ew.ka?kuid=5583057 : downloaded 24 September 2021), path: Kuusamon seur
4 CONC akunta | Kuusamon seurakunnan arkisto | Syntyneiden ja kastettujen luet
4 CONC telot | Syntyneiden luettelo 1837-1855 (IC:5), image 77 of 215; citing O
4 CONC ulun maakunta-arkisto [Oulu Provincial Archive].

This is how it looks in the Ancestry tree:

There is an option within RM that appends the Detail Text Comments to the end of the footnote, so this can be done as a workaround, but it kind of defeats the advantages of using the source templates to create complex footnotes. I am not sure how this transfers in TreeShare when uploading an Ancestry tree to RM. I haven’t tested it yet.

That’s a Report option, not available in TreeShare.

Thanks for confirming that the Source is also limited to 256. That was my hazy recollection. So TreeShare is not even supporting the GEDCOM Source-Record structure which allows a value for TITL to be extended across multiple lines.

Have you looked at what happens when you TreeShare download your uploaded sources to a new database? Apart from the loss of template structure, I think the same Source for different people will be transformed into independent sources (unless RM is auto-merging duplicate sources).

5.5.1 is a FamilySearch specification (drafted in 1999, promulgated as a Standard in Nov, 2019)
5.5.5 is a GEDCOM.ORG specification created by an independent group seeking to clean up the somewhat higgledy-piggledy, ambiguous, inconsistent and loaded with historical baggage predecessor. Published in Oct, 2019 - did that trigger the Nov release by FS of 5.5.1 as a Standard?

From page 20 of 5.5.5:

The GEDCOM 5.5.1 specification did not set maximum length for the records concerned. The
GEDCOM 5.5.5 specification sets their maximum length at either 248, 4095 or 32.767 (2^15-1)
code units. This is in deference to the old (GEDCOM 5.5.1) rule that GEDCOM records should
fit into a memory of less than 32KB. A Unicode application using UTF-16 internally will need no
more than a 64 KB buffer to hold the largest legal values.
item                                      maximum
IND.DSCR <PHYSICAL_DESCRIPTION>           4095
SOUR.AUTH <SOURCE_ORIGINATOR>             248
SOUR.TITL <SOURCE_DESCRIPTIVE_TITLE>      4095
SOUR.PUBL <SOURCE_PUBLICATION_FACTS>      4095
SOUR.DATA.TEXT <TEXT_FROM_SOURCE>         32.767
SOUR.TEXT <TEXT_FROM_SOURCE>              32.767
NOTE <USER_TEXT>                          32.767

GEDCOM 7.0 also seems non-explicit with respect to the limit on the length of a line value (although a superficial read might suggest it’s x10FFFF or 1MB) or the maximum payload spread over multiple CONTinuation lines. From page 15 of FamilySearch GEDCOM 7.0:

Previous versions limited the number of characters that could appear in
a tag, cross-reference identifier, and line-value. Those restrictions were removed
in version 7.0. The CONC pseudo-structure, which allowed line values to have a
shorter length restriction than payloads, was also removed

TomH,

I did a TreeShare download of my Ancestry tree into a new RM8 database as you suggested. The source type it created in the new RM8 file was an Ancestry Record with the Title and Source Name and Citation Name and Detail headings matching the corresponding Source Information Title and Citation information Detail in the Ancestry tree. The Author, Publish Place, Publisher, and Publish Date fields, all part of the Ancestry Record template, are all blank. Also, the Detail Comment in RM8 came directly from the Citation information Other information heading in the Ancestry tree.

However, when I print a Narrative Report from RM8, the created footnote has converted the entire Title field to italics and it added (N.p.: n.p., n.d.) to the end because the Publish Place, Publisher, and Publish Date fields are all blank. It also starts with a comma because there is no Author. And of course, none of the Citation Detail contains any italics formatting because that was stripped during the TreeShare transfer to Ancestry tree done originally. That is another TreeShare issue that probably requires modifications from the Ancestry side. And the RM report settings need to be modified to include the notes. But the main thing is, all of the information can be transferred to and from an Ancestry tree, just not formatted as cleanly as I had hoped. It also means modifying all potentially long RM source/citation templates to have one source field and one citation detail field. Then include what I call the 2nd and 3rd citation layers to the citation note field.

Yup. Old story from 2017 that the highly responsive development team has yet to do anything about. For an interim (how long is anybody’s guess now that 4 years have passed) workaround, see:
https://sqlitetoolsforrootsmagic.com/ancestry-treeshare-impact/#Correction-of-the-Flawed-Ancestry-Record-Source-Template-RM75-RM8

Tom,
Jean-Paul Sartre once said that not making a decision is making a decision. Perhaps the development team uses this philosophy when deciding/not deciding which issues to respond to.

Rich