Ancestry data is a no-go

Initially, I was very excited about the prospect of allowing the import of Ancestry DNA data into Opus 23. After all, Ancestry uses the same Illumina technology as 23andMe (although 23andMe apparently has its own unique chip). Initial testing was promising. Like 23andMe, Ancestry supplies raw data in a basic tab-delimited text file. It’s in a slightly different format, but there was no problem parsing it. In fact, Ancestry offered several possible advantages over 23andMe.
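To give a feel for how little separates the two files, here is a minimal Python sketch of a parser that copes with both. It is not the Opus 23 importer. It assumes the layouts I understand the two downloads to use: comment lines starting with ‘#’, then tab-separated columns, with 23andMe putting the genotype in a single fourth column and Ancestry splitting it into two allele columns. The function name and the sample rows (including the positions) are made up for illustration.

```python
import csv
import io


def parse_raw(handle):
    """Map SNP id -> genotype string for a 23andMe- or Ancestry-style file.

    Assumed layouts (not taken from either vendor's documentation):
      23andMe:  rsid, chromosome, position, genotype
      Ancestry: rsid, chromosome, position, allele1, allele2
    """
    genotypes = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, '#' comment lines, and a bare 'rsid' header row.
        if not row or row[0].startswith("#") or row[0].lower() == "rsid":
            continue
        # Everything after the position column is genotype data:
        # one column for 23andMe, two for Ancestry.
        genotypes[row[0]] = "".join(row[3:])
    return genotypes


# Invented sample rows; the positions are placeholders, not real coordinates.
ANCESTRY_SAMPLE = (
    "#AncestryDNA raw data download\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs1801133\t1\t100\tC\tT\n"
)
TWENTYTHREE_SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1801133\t1\t100\tCT\n"
)

ancestry = parse_raw(io.StringIO(ANCESTRY_SAMPLE))
twentythree = parse_raw(io.StringIO(TWENTYTHREE_SAMPLE))
```

Joining whatever follows the position column is what lets one function handle both layouts without needing to know which company produced the file.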

The Ancestry test retails for $99 USD, which used to be what 23andMe charged, until they were cleared by the FDA to supply some very basic health insight data, at which time they hiked the price of the test up to an almost extortionate $199 USD, because now they can tell you what eye color you might have and whether your earwax is soft or hard.

Ancestry SNPs are all reported with the ‘rs’ number. In order to cross-reference SNPs in any of the bioinformatics databases, we need their official id number, which is, as per dbSNP, referenced as ‘rs[the id number]’. For example, the main SNPs (C677T and A1298C) for the MTHFR gene are rs1801133 and rs1801131. 23andMe uses a lot of internal SNP ids, which they prefix with an ‘i’. The internal references usually do indicate SNPs that otherwise have rs id numbers, and if you are dogged enough, you can usually get the proper rs id for an internal SNP, but they don’t make it easy.
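Telling the two kinds of id apart is the easy part. A minimal Python sketch, assuming that rs ids are ‘rs’ followed by digits and that internal ids are ‘i’ followed by digits; the function name and the internal id in the example are hypothetical:

```python
import re

RS_ID = re.compile(r"rs\d+")       # dbSNP-style id, e.g. rs1801133
INTERNAL_ID = re.compile(r"i\d+")  # assumed shape of a 23andMe internal id


def classify(snp_id):
    """Return 'dbsnp', 'internal', or 'unknown' for a SNP id string."""
    if RS_ID.fullmatch(snp_id):
        return "dbsnp"
    if INTERNAL_ID.fullmatch(snp_id):
        return "internal"
    return "unknown"


# 'i3000001' is a made-up id used only to exercise the internal branch.
kinds = [classify(s) for s in ("rs1801133", "rs1801131", "i3000001")]
```

This only sorts the ids into piles. Mapping an internal id back to its rs number still needs a lookup table from somewhere, which is the part that takes the doggedness.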

Problems arose when it occurred to me that it would be prudent to cross-compare the Ancestry and 23andMe SNPs with the basic Opus 23 curated SNP database. Opus 23 accesses several SNP databases, but its own internal database is the jewel in the crown, hand-curated by our developers with special reference to clinical utilization and nutrigenomics significance. So I wrote a simple Perl script to do the work.

The results were depressing, to say the least. Roughly 35-40% of the SNPs in the Opus curated database that are reported by 23andMe are not reported by Ancestry. And some of these are biggies, like the SNPs that control secretor status on FUT2. This leads me to believe that the Ancestry DNA analysis is skewed towards genealogy determination (perhaps not surprisingly) and not health outcomes.
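The cross-comparison itself comes down to a little set arithmetic. The following is a Python sketch of the idea, not the Perl script I actually ran. The function name is hypothetical and the ids are invented toy data, so the fraction it produces says nothing about the real figures.

```python
def coverage_gap(curated, reported_a, reported_b):
    """Find curated SNPs that source A reports but source B does not.

    Returns (missing_ids, fraction), where fraction is the share of the
    curated SNPs reported by A that are absent from B.
    """
    in_a = curated & reported_a   # curated SNPs that A covers
    missing = in_a - reported_b   # ...of those, the ones B lacks
    fraction = len(missing) / len(in_a) if in_a else 0.0
    return missing, fraction


# Toy data with invented ids, standing in for the three real id sets.
curated = {"rs111", "rs222", "rs333", "rs444"}
from_23andme = {"rs111", "rs222", "rs333", "rs444", "rs999"}
from_ancestry = {"rs111", "rs222", "rs999"}

missing, fraction = coverage_gap(curated, from_23andme, from_ancestry)
```

Restricting the denominator to curated SNPs that 23andMe reports is what makes the percentage a like-for-like comparison, rather than penalizing Ancestry for SNPs that neither company covers.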