Coding/ Design

  • Vitamin B12, Secretor Status and Ancestry Raw Data

    The Opus 23 genetic interpretation software [1] checks for ABH secretor status as part of the hereditary genetics algorithms. Knowing whether your client is a secretor or non-secretor is important for many reasons, one being that non-secretors have increased levels of serum vitamin B12. This is a well-known association: at least five GWAS studies found that SNPs in FUT2 show the strongest statistical association with circulating vitamin B12 [2-6]. Two of these studies also report a 10-25% increase in circulating total vitamin B12 concentration in homozygotes for the common non-secretor alleles, as determined by the FUT2 genotype of the nonsense stop-gain mutation W143X, rs601338. The mechanism for this, however, is unclear.

    Previous theories have included the influence of H. pylori infection, which has been associated with vitamin B12 deficiency. [7, 8] Some authors have proposed that FUT2 genotype can influence the extent to which H. pylori attaches to gastric mucosa and influences vitamin B12 absorption. [9, 10] This was refuted by a subsequent study in 2012, which found that secretor status as determined by FUT2 variation correlates with plasma vitamin B12 concentrations, but is independent of H. pylori serotype. [11]

    Chery et al. proposed that FUT2 genotype could affect the glycosylation status of another vitamin B12 transporter, gastric intrinsic factor (GIF) [12]. This was a small study, however, and although a potential effect was observed on GIF secretion and glycosylation according to FUT2 rs601338 genotype, the GIF phenotypes of the FUT2 rs601338 GA heterozygotes more closely aligned with those of the non-secretor genotype (AA) than those with the secretor genotype (GG). It is currently unclear to what extent FUT2 genotype influences GIF secretion and thereby alters vitamin B12 concentration in the general population.

    It is important to note that all the above-mentioned studies measured only total circulating vitamin B12, which does not distinguish the proportion of B12 bound to its two separate carrier proteins, transcobalamin and haptocorrin. Haptocorrin, also known as transcobalamin-1 (TCN1), is a glycoprotein produced by the salivary glands of the oral cavity in response to ingestion of food. This protein binds strongly to vitamin B12 in the mouth to protect it from the acidic environment of the stomach. Haptocorrin also circulates and binds approximately 80% of circulating B12, rendering it unavailable for cellular delivery by transcobalamin II. These carrier proteins carry significantly different quantities of vitamin B12 in blood, and have different biological properties: transcobalamin II delivers vitamin B12 to all tissues, while vitamin B12 carried by haptocorrin is ultimately returned to the gut. In a recent paper on a GWAS study by Velkova et al. using the Trinity Student Study population of 2,524 subjects in Ireland, the authors hypothesized that the expression of functional FUT2 enzyme could influence total circulating vitamin B12 concentration by altering the glycosylation of haptocorrin. This is the first study to assess the relationship between ‘active’ B12, total B12 and the FUT2 secretor status variant. [13]

    The authors reported that FUT2 genotype influences the concentration of haptocorrin-bound vitamin B12 to a far greater extent than transcobalamin-bound vitamin B12. This is consistent with FUT2 exerting influence via its fucosylation function, as haptocorrin is a glycosylated protein and transcobalamin is not. They also suggest that FUT2 activity impacts the intra-organismal recycling of vitamin B12, not the absorption and assimilation of the vitamin from the diet.

    In both the H. pylori and the GIF models described above, FUT2 genotype would alter the pool of vitamin B12 absorbed from the gut. As vitamin B12 transported from the gut binds to transcobalamin in plasma, these models are not consistent with the data from Velkova et al., which show that FUT2 genotype influences the concentration of haptocorrin-bound vitamin B12 to a far greater extent than transcobalamin-bound vitamin B12.

    The connection between secretor status and B12 levels is consistent with FUT2 exerting influence via its fucosylation function on B12 carriers, as haptocorrin is a glycosylated protein and transcobalamin is not. It also suggests that FUT2 activity impacts the intra-organismal recycling of vitamin B12, not the absorption and assimilation of the vitamin from the diet. This could be the reason why the standard test for vitamin B12 has significant false positive and false negative rates: only ~20% of circulating vitamin B12 (holoTC) represents the “active” bioavailable form, meaning that the most commonly ordered clinical test for vitamin B12 mainly measures the holoHC, which could mask an existing vitamin B12 deficiency. When evaluating or confirming vitamin B12 deficiency, additional markers of vitamin B12-dependent enzyme activity such as methylmalonic acid (MMA) and total homocysteine are also problematic. FUT2 secretor status may therefore be useful when considering the overall B12 status of an individual, and non-secretors may appear to have falsely elevated serum total B12 when compared with active B12.

    Opus 23 handles a range of raw data files. However, Ancestry.com and Genos data files do not include the rs601338 SNP, which denotes the non-secretor mutation when homozygous. When only these data files are loaded, Opus 23 looks for another SNP on FUT2 that is reported in Ancestry.com and Genos raw data files and is in perfect linkage disequilibrium with rs601338. This gives you the client’s imputed secretor status, and therefore indications for interpreting serum vitamin B12 tests. When looking for secretor status, Opus 23 also checks for another FUT2 non-secretor SNP found only in Asians and not in Caucasians.
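
    The fallback described above can be sketched in a few lines of Python. This is only an illustration, not the Opus 23 code (which is written in Perl): the proxy SNP is not named in this post, so PROXY_SNP and the PROXY_TO_PRIMARY allele mapping are placeholders, and the function name is hypothetical. The only genotype rule taken from the text is that rs601338 AA denotes a non-secretor.

```python
# Sketch only: PROXY_SNP and its allele mapping are placeholders, not Opus 23 values.
PRIMARY_SNP = "rs601338"                  # FUT2 W143X; AA = non-secretor (from the text)
PROXY_SNP = "rs_proxy_placeholder"        # hypothetical stand-in for the linked FUT2 SNP
PROXY_TO_PRIMARY = {"T": "A", "C": "G"}   # hypothetical allele correspondence


def secretor_status(genotypes):
    """Return (status, source) from a dict of SNP id -> two-letter genotype."""
    call = genotypes.get(PRIMARY_SNP)
    source = "direct"
    if call is None or "-" in call:
        # rs601338 is missing or a no-call: fall back to the linked proxy SNP.
        proxy = genotypes.get(PROXY_SNP)
        if proxy is None or any(a not in PROXY_TO_PRIMARY for a in proxy):
            return ("unknown", "none")
        call = "".join(PROXY_TO_PRIMARY[a] for a in proxy)
        source = "imputed"
    status = "non-secretor" if sorted(call) == ["A", "A"] else "secretor"
    return (status, source)
```

    Returning the source alongside the status lets a report flag when the result was imputed rather than read directly.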

    References: 

    1. Opus 23 Pro genetic analysis and reporting software by Dr P. D’Adamo www.opus23.com.

    2. Hazra, A., Kraft, P., Selhub, J., Giovannucci, E.L., Thomas, G., Hoover, R.N., Chanock, S.J. and Hunter, D.J. (2008) Common variants of FUT2 are associated with plasma vitamin B12 levels. Nat Genet, 40, 1160-1162. PMID 18776911.

    3. Lin, X., Lu, D., Gao, Y., Tao, S., Yang, X., Feng, J., Tan, A., Zhang, H., Hu, Y., Qin, X. et al. (2012) Genome-wide association study identifies novel loci associated with serum level of vitamin B12 in Chinese men. Hum Mol Genet, 21, 2610-2617. PMID 22367966.

    4. Tanaka, T., Scheet, P., Giusti, B., Bandinelli, S., Piras, M.G., Usala, G., Lai, S., Mulas, A., Corsi, A.M., Vestrini, A. et al. (2009) Genome-wide association study of vitamin B6, vitamin B12, folate, and homocysteine blood concentrations. Am J Hum Genet, 84, 477-482. PMID 19303062.

    5. Grarup, N., Sulem, P., Sandholt, C.H., Thorleifsson, G., Ahluwalia, T.S., Steinthorsdottir, V., Bjarnason, H., Gudbjartsson, D.F., Magnusson, O.T., Sparso, T. et al. (2013) Genetic architecture of vitamin B12 and folate levels uncovered applying deeply sequenced large datasets. PLoS Genet, 9, e1003530. PMID 23754956.

    6. Hazra, A., Kraft, P., Lazarus, R., Chen, C., Chanock, S.J., Jacques, P., Selhub, J. and Hunter, D.J. (2009) Genome-wide significant predictors of metabolites in the one-carbon metabolism pathway. Hum Mol Genet, 18, 4677-4687. PMID 19744961.

    7. Kaptan, K., Beyan, C., Ural, A.U., Cetin, T., Avcu, F., Gulsen, M., Finci, R. and Yalcin, A. (2000) Helicobacter pylori–is it a novel causative agent in Vitamin B12 deficiency? Arch Intern Med, 160, 1349-1353. PMID 10809040.

    8. Carmel, R., Perez-Perez, G.I. and Blaser, M.J. (1994) Helicobacter pylori infection and food-cobalamin malabsorption. Dig Dis Sci, 39, 309-314. PMID 8313813.

    9. Ikehara, Y., Nishihara, S., Yasutomi, H., Kitamura, T., Matsuo, K., Shimizu, N., Inada, K., Kodera, Y., Yamamura, Y., Narimatsu, H. et al. (2001) Polymorphisms of two fucosyltransferase genes (Lewis and Secretor genes) involving type I Lewis antigens are associated with the presence of anti-Helicobacter pylori IgG antibody. Cancer Epidemiol Biomarkers Prev, 10, 971-977. PMID 11535550.

    10. Magalhaes, A., Rossez, Y., Robbe-Masselot, C., Maes, E., Gomes, J., Shevtsova, A., Bugaytsova, J., Boren, T. and Reis, C.A. (2016) Muc5ac gastric mucin glycosylation is shaped by FUT2 activity and functionally impacts Helicobacter pylori binding. Sci Rep, 6, 25575. PMID 27161092.

    11. Oussalah, A., Besseau, C., Chery, C., Jeannesson, E., Gueant-Rodriguez, R.M., Anello, G., Bosco, P., Elia, M., Romano, A., Bronowicki, J.P. et al. (2012) Helicobacter pylori serologic status has no influence on the association between fucosyltransferase 2 polymorphism (FUT2 461 G->A) and vitamin B-12 in Europe and West Africa. Am J Clin Nutr, 95, 514-521. PMID 22237057.

    12. Chery, C., Hehn, A., Mrabet, N., Oussalah, A., Jeannesson, E., Besseau, C., Alberto, J.M., Gross, I., Josse, T., Gerard, P. et al. (2013) Gastric intrinsic factor deficiency with combined GIF heterozygous mutations and FUT2 secretor variant. Biochimie, 95, 995-1001. PMID 23402911.

    13. Velkova, A., Diaz, J.E.L., Pangilinan, F. et al. (2017) The FUT2 secretor variant p.Trp154Ter influences serum vitamin B12 concentration via holo-haptocorrin (holoHC), but not holo-transcobalamin (holoTC), and is associated with haptocorrin glycosylation. Hum Mol Genet, 26, 4975-4988. PMID 29040465.

  • Opus 23 now supports multiple platforms

    The recent change in the reporting done by 23andMe from the V4 to V5 chip has thrown things into a bit of a dither. The earlier V4 SNP array was more robust, at least with SNPs of interest to those who work in nutrigenomics. For example, V4 reported over ten MAO SNPs of nutritional interest, whilst V5 reports none. To circumvent the problem, I’ve recoded Opus 23 to allow the clinician to upload, singly or in combination, data files from 23andMe (V3, V4, V5), Ancestry DNA and the ‘Export to Promethease’ file available from Genos. Moving Opus in this direction required a lot of recoding and I thank all our users for their support and patience.

    The first time you load an existing client profile into Opus it will take a bit longer to process the file. This is because the profile is being upgraded to the new data storage system. After that it should load as usual. Manage->Profiles->Append Raw Data to Current Client will take you to the BLENDER app, which allows you to merge raw data files. This will only be important as people begin to use Ancestry DNA, perhaps in combination with 23andMe V5. Since almost everyone currently in Opus is 23andMe V4 you really don’t need to do anything.

    The ‘Upload New Client Raw Data’ script has been extensively rewritten. You still upload a ZIP file, but the script will identify the platform (V3/V4, V5, Ancestry DNA) and let you know. It also now features an extra screen so that you can verify/validate your form input before doing the final upload. Hopefully this will cut down on people contacting us having uploaded the same client twice.
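
    How the upload script recognizes a platform is not described here, so the following Python sketch shows just one plausible approach rather than the actual Opus 23 logic. It relies on the file layouts as I understand them: 23andMe rows have four tab-separated columns (rsid, chromosome, position, genotype), while Ancestry DNA rows have five (the two alleles are in separate columns). Telling V3/V4 from V5 would need something more, such as marker counts, and is not attempted. The function name is hypothetical.

```python
def detect_platform(lines):
    """Guess the vendor from the first data row of an unzipped raw data file."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or line.lower().startswith("rsid"):
            continue  # skip blank lines, comment lines and header rows
        columns = line.split("\t")
        if len(columns) == 4:
            return "23andMe"        # rsid, chromosome, position, genotype
        if len(columns) == 5:
            return "AncestryDNA"    # rsid, chromosome, position, allele1, allele2
        return "unknown"
    return "unknown"
```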

    Merged V5 and Ancestry DNA client data cover about 74% of Opus-curated SNPs, while the prior V4 chip alone covers about 79%.
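
    To make the merge-and-coverage idea concrete, here is a minimal Python sketch, not the BLENDER code itself. It assumes each raw data file has already been parsed into a dict of SNP id to two-letter genotype, that ‘--’ and ‘00’ mark no-calls, and that both files report alleles on the same strand. The function names are hypothetical.

```python
NO_CALLS = {"--", "00", ""}  # assumed no-call markers


def merge_genotypes(*sources):
    """Union several genotype dicts; earlier sources win, disagreements are flagged."""
    merged = {}
    conflicts = set()
    for source in sources:
        for snp_id, call in source.items():
            if call in NO_CALLS:
                continue
            previous = merged.get(snp_id)
            if previous is None:
                merged[snp_id] = call
            elif sorted(previous) != sorted(call):
                conflicts.add(snp_id)  # same SNP, different call on the two platforms
    return merged, conflicts


def coverage(merged, curated_ids):
    """Fraction of curated SNP ids that have a call in the merged data."""
    curated = set(curated_ids)
    if not curated:
        return 0.0
    return sum(1 for snp_id in curated if snp_id in merged) / len(curated)
```

    Comparing sorted genotypes treats ‘AG’ and ‘GA’ as the same call, which matters because platforms do not always report alleles in the same order.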

    If you do upload Ancestry DNA data, be advised that Ancestry names its raw data files in a non-unique manner, usually something like ‘dna-data-2017-09-03.zip’. This blunts the ability of the program to warn you that you are using the same data file on two different clients. You should rename the client raw data ZIP file on your hard drive to something unique. We recommend replacing the Ancestry DNA filename with the client’s first and last initials and date of birth; in this case ‘dna-data-2017-09-03.zip’ might become ‘MG-11-22-1956.zip’. But you can use any system you wish as long as each uploaded filename is unique.
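
    If you rename many files, the suggested convention is easy to script. The Python helper below is a hypothetical convenience, not part of Opus 23, and the client name in the usage note is invented purely to reproduce the example filename.

```python
from datetime import date


def unique_zip_name(first_name, last_name, birth_date):
    """Build an 'initials-MM-DD-YYYY.zip' filename from client details."""
    initials = (first_name[0] + last_name[0]).upper()
    return f"{initials}-{birth_date:%m-%d-%Y}.zip"
```

    For example, unique_zip_name("Mary", "Green", date(1956, 11, 22)) gives ‘MG-11-22-1956.zip’.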

    It looks like the best short-term solution will be to have the client do BOTH 23andMe V5 and Ancestry. Opus 23 now allows you to sequentially upload the raw data and merge it. We will eventually move towards a dedicated chip. However, this change from V4 to V5 caught everyone (not just Opus/Datapunk) flat-footed as to the huge drop in clinically significant SNPs that are reported in V5. Even in the best of circumstances it will be weeks or months until a specialized chip becomes available. In the meantime, piggybacking 23andMe V5 with Ancestry DNA appears to be not all that bad of a temporary fix. Many of these SNP panels are having significant price drops, so having the client do both 23andMe V5 and Ancestry DNA should not be prohibitively expensive.

     

    In Other News

    You can now compare V5, V4, Genos Promethease export, and Ancestry data against the core 2600 Opus SNPs. Just log in, click the ‘Informatics’ pull-down, then select ‘Tools/ Extras’ and ‘Platform Comparisons’. The table is searchable, sortable and filterable.

  • Decoding 23andMe ‘i’ Numbers

    23andMe currently reports over 600,000 SNPs in the genome explorer, which are analyzed by their custom 2014 v4 chip. The process used is genotyping, rather than sequencing. The former is cheaper and quicker, and targets specific parts of the genome that are known to have variants in some or many people; the latter is used to find out the code of nucleotide base pairs in a sequence (or continuous stretch) of DNA, the exome (the coding part of DNA), or all the DNA in the whole genome.

    Genotyping does not report on all possible insertions or deletions. In general, it only reports small changes, spanning only one or a few bases. Sequencing will check whether all the DNA code in a region is found in the usual configuration or whether there are any unknown insertions or deletions.

    23andMe doesn’t test for all the SNPs they report on, but might impute variants present on larger chips or in sequencing analysis, using a statistical method that allows researchers to fill in missing data. This may be the reason 23andMe say “This data has undergone a general quality review, however only a subset of markers have been individually validated for accuracy.” [1]

    An example of this might be RhD blood group status: if you have a double deletion (DD) at “i4001527” you are RhD negative; if you don’t have the double deletion (DI or II) you are RhD positive. This number is available from a search in the 23andMe explorer, but is not found in the raw data that can be downloaded as an ASCII text file and used for uploading to Opus23 Pro.
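
    The RhD rule above is simple enough to express as a small Python function. This is an illustrative sketch with a hypothetical function name; it takes the genotype string as shown in the 23andMe explorer and applies only the DD / DI / II rule stated in the text.

```python
def rhd_status(genotype):
    """Interpret an i4001527 call, where D marks the deletion and I its absence."""
    call = "".join(sorted(genotype.upper()))  # treat 'ID' and 'DI' alike
    if call == "DD":
        return "RhD negative"
    if call in ("DI", "II"):
        return "RhD positive"
    return "unknown"
```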

    Most of the numbers representing SNPs in the 23andMe raw data begin with ‘rs’, which are reference SNP identifiers, or reference SNP cluster IDs. [2] These rsIDs are assigned and managed by dbSNP, the official database for short genetic variations. However some numbers in the 23andMe raw data begin with ‘i’, which is an internal number assigned by 23andMe for testing locations on the genome for various reasons. This includes SNPs where the probes used differ from the reference sequence.[3] Some ‘i’ numbers are SNPs that don’t have rsIDs: 23andMe maps the i-number to the chromosome position, and internally they map this number to anything else they need to know about the SNPs to put it on a chip (many of these SNPs come from the custom portion of the genotyping array). Other ‘i’ numbers relate to SNPs that could highlight a genetic mutation in a user which is related to significant health risks or genetic conditions. The FDA don’t want users to be able to find out that they have these problems without genetic counselling, except for under specific circumstances where the user has made a declaration that they understand the consequences of accessing this data and what it might mean. The FDA are currently seeking medical opinion on situations where genetic test results might be available directly to the user. Comments can be submitted online  to the FDA by March 31st 2016. All submissions must include reference to: “Docket No.  FDA-2015-N-4809 for `Patient and Medical Professional Perspectives on the Return of Genetic Test Results; Public Workshop; Request for Comments.’”

    How does Opus23 Pro deal with ‘i’ numbers?

    Opus23 Pro curators use the genomic location linked with the coded ‘i’ numbers to find the rsID (if one exists), and if relevant, the ‘i’ numbers are added to the Opus23 Pro SNP database, and a lookup is performed by the software when analysing a client’s raw data. The ‘i’ numbers are linked with the rsID in the software, and this gives the practitioner a reference for further research in published medical literature. Any significant genetic risk factors can be added to the client report and explained to the patient, along with genetic counselling as necessary.
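
    The position-based lookup can be pictured with the following Python sketch. It is not the curators’ actual workflow: position_index is a hypothetical dict mapping (chromosome, position) to an rsID, which would have to be built from dbSNP on the same genome build as the raw data, and the function name is invented for illustration.

```python
def link_i_numbers(raw_rows, position_index):
    """Map each 'i' marker to the rsID at the same chromosome and position.

    raw_rows: iterable of (marker, chromosome, position, genotype) tuples.
    position_index: hypothetical dict of (chromosome, position) -> rsID.
    Markers with no rsID at that location map to None.
    """
    links = {}
    for marker, chromosome, position, _genotype in raw_rows:
        if marker.startswith("i"):
            links[marker] = position_index.get((chromosome, int(position)))
    return links
```

    Markers that come back as None are the ones with no known rsID at that location, so they would still need manual review.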

    References:

    1. Web page: “How 23andMe Reports Genotypes” https://customercare.23andme.com/hc/en-us/articles/212883677-How-23andMe-Reports-Genotypes.  Accessed 3/5/16
    2. The NCBI Handbook [Internet]. 2nd edition. Bethesda (MD): National Center for Biotechnology Information (US); 2013-. Accessed 3/5/16

    3. 23andMe forum “23andMe upgrading to NCBI Build 37 coordinates on Aug. 1” https://www.23andme.com/you/community/thread/14308/6/ Accessed 3/5/16
  • Strobing the tissues

    Opus 23 provides many unique opportunities for data integration and visualization. One app that I’ve just added to the Opus toolbox is STROBE, a new Opus analytic app that allows you to drill down into client genomic data by organ, tissue or cell distribution. To do this Opus mashes up its own internal SNP data with gene tissue expression data derived from a variety of public databases. I then sub-organized the tissue expression data by system (immune, cardiovascular, etc.) so that the user could filter by clinical relevance.

    Once STROBE is fired up you’re presented with the screen depicted above. It is a typical squarified heat-map rendered in Highcharts. Values are assigned by virtue of the aggregate power factors of client SNP mutations and the number of genes associated with that tissue. Clicking on any tissue brings up a modal popup with a Manhattan-type gene distribution display.

    strobe1

    The Y axis shows the cumulative power factors of SNPs testing positive for that gene. The X axis is a rough approximation of gene locus position. Yellow points indicate genes with exclusively heterozygous mutations. Orange points denote genes that contain homozygous mutations. Drag a rectangle around any area to zoom into that part of the map. Use ‘Reset Zoom’ to return to full zoom.
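
    As a rough illustration of how each point in that display could be derived, here is a short Python sketch. It is not the STROBE code: the input format (one tuple per positive SNP with a gene symbol, a numeric power factor and a zygosity label) and the function name are assumptions.

```python
from collections import defaultdict


def gene_points(snp_hits):
    """Sum power factors per gene and choose the point color.

    snp_hits: iterable of (gene, power_factor, zygosity) for positive SNPs.
    Returns {gene: (cumulative_power, color)}.
    """
    totals = defaultdict(float)
    has_homozygous = defaultdict(bool)
    for gene, power, zygosity in snp_hits:
        totals[gene] += power
        has_homozygous[gene] |= (zygosity == "homozygous")
    return {
        gene: (totals[gene], "orange" if has_homozygous[gene] else "yellow")
        for gene in totals
    }
```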

    Click on any data point to bring up the information screen on that gene. Here you can notate, or even move on to examining SNPs, agents and algorithms associated with the gene.

    strobe2

    From the main screen you can use the select pull-down menu to limit tissues to specific systems. Here we limit the display to tissues, organs and cells of the immune system:

    strobe0a

  • Psychic

    The Opus 23 PSYCHIC app allows you to search for natural products known to control gene expression. However, unlike a simple search engine, PSYCHIC is able to crawl up and down the molecular ‘Interactome’ (protein-protein interactions and gene expression data) to determine the upstream and downstream genes that interact with the gene you’ve searched for. In addition, PSYCHIC allows you to choose which type of natural products (agonists/ antagonists) to include in the upstream and downstream results.

    As seen above, when the PSYCHIC screen loads you will be presented with the results of the default search term for the current client, the MTOR gene. The main infographic is comprised of a bar graph divided into two halves. The left half displays the upstream results, while the right half displays the downstream results, based on MTOR’s position in the interactome. The labels along the x-axis display the various natural products and their gene targets PSYCHIC has found that meet the search criteria. The y-axis value of each bar in the graph is determined by the evidence basis and strength of the position in the network for the gene depicted.

    At the bottom, a small half-pie chart depicts the SNPs for that gene contained in the Opus 23 Pro database.

    psychic2

    You can set filters on each half of the graph to limit results to a specific type (agonism or antagonism) by selecting an option from the pull down menu below. There are four options:

    • Inhibit/ Drain: This will tell PSYCHIC to return all upstream antagonists and downstream agonists
    • Inhibit/ Bottleneck: This will tell PSYCHIC to return all upstream antagonists and downstream antagonists
    • Stimulate/ Drain: This will tell PSYCHIC to return all upstream agonists and downstream agonists
    • Stimulate/ Bottleneck: This will tell PSYCHIC to return all upstream agonists and downstream antagonists
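
    The four options above amount to a small lookup table. The Python sketch below shows one way to express them; it is illustrative only (PSYCHIC itself is Perl), and the hit format of (agent, gene, effect) tuples and the function name are assumptions.

```python
# Each option maps to (upstream effect to keep, downstream effect to keep).
FILTERS = {
    "Inhibit/Drain": ("antagonist", "agonist"),
    "Inhibit/Bottleneck": ("antagonist", "antagonist"),
    "Stimulate/Drain": ("agonist", "agonist"),
    "Stimulate/Bottleneck": ("agonist", "antagonist"),
}


def apply_filter(option, upstream_hits, downstream_hits):
    """Keep only the hits whose effect matches the chosen option.

    Hits are assumed to be (agent, gene, effect) tuples, with effect
    being 'agonist' or 'antagonist'.
    """
    up_type, down_type = FILTERS[option]
    upstream = [hit for hit in upstream_hits if hit[2] == up_type]
    downstream = [hit for hit in downstream_hits if hit[2] == down_type]
    return upstream, downstream
```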

    To select a gene to run in PSYCHIC, simply begin typing in its gene symbol in the text input field; PSYCHIC will auto-complete the entry with any genes for which it has data. If multiple options are displayed, simply select the gene you wish to analyze.

    When you’re ready, press the ‘Run Psychic’ button to have PSYCHIC run results.

    PSYCHIC uses highcharts.js for its data depiction, the CPAN Perl module graph.pm for creating the abstract data structures, the PPI (protein-protein interactions) database, and Opus 23’s own internal agent/gene expression database of PubMed citations.

  • Agency

    The Agency app in Opus 23 takes pharmacogenomics to the next level. Relying on the extensive Opus 23 database of published research detailing gene expression data linked to natural products, Agency provides a visual representation of their interaction web. Multiple agents can be displayed, allowing the clinician to synthesize multi-target strategies.

    In the image above we see the expression pattern for co-enzyme Q10, a powerful anti-oxidant. Co-enzyme Q10 (COQ10) is the light green node in the center. The tan nodes surrounding it are the genes that interact with COQ10: Red edges (connecting lines) with a ‘T-bar’ indicate that COQ10 inhibits the expression of that gene. Green edges with an arrow indicate that COQ10 enhances the expression of that gene. Genes with a reddish color indicate that they may have compromised function in the current active client in Opus 23. Other agents with especially high abilities to influence the expression of any genes associated with COQ10 are at the periphery of the map and colored gray.

    A few examples that can be gleaned from the map:

    • The ability of COQ10 to increase superoxide dismutase (SOD1) might be enhanced by concurrent administration of Silymarin and Pycnogenol
    • The ability of COQ10 to antagonize interleukin 6 (IL6) might be enhanced by concurrent administration of omega 3 fatty acids
    • The effect of COQ10 to antagonize vascular endothelial growth factor A (VEGFA) might be enhanced by concurrent administration of wogonin (Scutellaria baicalensis), honokiol (Magnolia officinalis) and matrine (Sophora flavescens). Notice that this gene is compromised in the current client.
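
    Reading the map this way boils down to finding other agents that push a gene in the same direction as the primary agent. A minimal Python sketch of that idea follows; the edge format of (agent, gene, effect) tuples and the function name are assumptions, not the Agency data model.

```python
def supporting_agents(edges, primary):
    """For each gene the primary agent acts on, list the other agents
    that have the same effect ('enhances' or 'inhibits') on that gene."""
    primary_effects = {
        gene: effect for agent, gene, effect in edges if agent == primary
    }
    support = {gene: [] for gene in primary_effects}
    for agent, gene, effect in edges:
        if agent != primary and primary_effects.get(gene) == effect:
            support[gene].append(agent)
    return support
```

    Fed with the SOD1 and IL6 relationships listed above, this would return Silymarin and Pycnogenol for SOD1 and omega 3 fatty acids for IL6.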

    Like virtually every data depiction in Opus 23, clicking on any node brings up the information popup for that entity: clicking on an agent node brings up its expression data (with links to PubMed citations), while clicking on any gene brings up its genomic data and any relevant SNP data associated with that gene.

    In addition to its network (web) depiction, any natural product can have its gene expression data depicted as a polar chart.

    agency2

    The polar chart format displays the gene expression data as orange if the evidence is suggestive of an antagonistic effect, or green if the effect is agonistic. The strength of the evidence is computed from the sum total of the evidence, scaled by the type of experimental subject (in vitro, animal or human study).
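
    The actual weights and formula are not given here, so the following Python sketch is only one way such a score could work. The SUBJECT_WEIGHT values are invented for illustration, as is the choice to net agonistic findings against antagonistic ones.

```python
# Hypothetical weights: evidence in humans counts more than animal or in vitro work.
SUBJECT_WEIGHT = {"in vitro": 1.0, "animal": 2.0, "human": 3.0}


def polar_value(findings):
    """Score one gene from a list of (direction, subject) findings.

    direction is +1 for an agonistic finding and -1 for an antagonistic one.
    Returns (strength, color) following the green/orange convention.
    """
    score = sum(direction * SUBJECT_WEIGHT[subject] for direction, subject in findings)
    if score > 0:
        color = "green"
    elif score < 0:
        color = "orange"
    else:
        color = "neutral"
    return abs(score), color
```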

  • SuperMogadon

    SuperMogadon is a highly flexible search and sort tool that allows you to easily compare the client’s genotype with results from Genome Wide Association Studies* (GWAS) through the Opus 23 Pro database.

    As you can see, there are more than 50 pages of GWAS data in SuperMogadon, which makes grinding through the data page by page rather impractical. Like most data displays in Opus 23 that deal with large amounts of data, SuperMogadon features a ‘filterable’ display. Type full or partial search terms into the search box at the upper left-hand corner and SuperMogadon immediately displays only those results.

    Click on the graph icon to display SNP distribution for that pathology or trait as a Manhattan Plot. Click on any column title to sort by that column.

    SuperMogadon Index Page

    Clicking on the blue graph icon in the ‘Show Plot’ column will launch the SuperMogadon Manhattan plotter for that disease/trait.

    The SuperMogadon Viewer is a GWAS Manhattan plotter, a type of scatter chart used to display data with a large number of data-points. Genomic SNP coordinates (marked by chromosome) are displayed along the X-axis, with the negative logarithm of the association P-value for the disease or pathology displayed on the Y-axis. Because the strongest associations have the smallest P-values (e.g., 10⁻¹⁵), their negative logarithms will be the greatest (e.g., 15). In the example above, we see the graph for ‘Type II diabetes’ as a GWAS Manhattan plot.
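
    The Y-axis transform is a one-liner. In Python (the function name is just for illustration):

```python
import math


def manhattan_y(p_value):
    """Negative base-10 logarithm of a P-value, as plotted on the Y-axis."""
    return -math.log10(p_value)
```

    A P-value of 10⁻¹⁵ plots at roughly 15, while a weak association with P = 0.05 plots at about 1.3, which is why the strongest hits tower over the rest of the plot.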

    Client SNP genotype results are shape and color-coded:

    • Gray-colored points denote client SNPs that do not contain the risk allele
    • Orange-colored points denote that the client is homozygous for the risk allele
    • Yellow-colored points denote that the client is heterozygous for the risk allele
    • Square-shaped points signify that the SNP is in the GWAS and Opus 23 Pro databases and when clicked will trigger an information pop-up
    • Circle-shaped points signify that the SNP is not in the Opus 23 Pro database but is in the GWAS database and is reported by 23andMe. When clicked, these SNPs bring up their GWAS PubMed reference article
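
    The coding scheme above can be summarized in a few lines of Python. This is an illustrative sketch with a hypothetical function name, assuming a two-letter genotype and a single-letter risk allele.

```python
def point_style(genotype, risk_allele, in_opus_db):
    """Return (color, shape) for one client SNP on the Manhattan plot."""
    copies = min(genotype.count(risk_allele), 2)   # 0, 1 or 2 risk alleles
    color = {0: "gray", 1: "yellow", 2: "orange"}[copies]
    shape = "square" if in_opus_db else "circle"   # square: also in the Opus 23 Pro database
    return color, shape
```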

    Drag-select to zoom section or use the scroller at the bottom. Hover over any point to learn more. Clicking on any point triggers a full-information popup window. Like any other element in Opus 23 Pro, you can notate the SNP in SuperMogadon Viewer by clicking on the link to bring up the information pop-up, then clicking the ‘Add/Edit Note’ button at the top of the pop-up screen. You can also send any popup element directly to curation (so that it shows up in the Client Report.)

    Opus 23 Pro subscribes to the philosophy of ‘TMTOWTDI’ (There’s more than one way to do it, pronounced ‘Tim Toady’.) The program was designed with this idea in mind, in that it ‘doesn’t try to tell the physician how to parse the data.’ Rather, it presents many different frameworks and cross-sections of the available client data, using a myriad of infographic treatments. This (and our ability as a species to excel at pattern recognition) dramatically increases the odds that a noteworthy finding will not go undiscovered.


    * In genetic epidemiology, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS), is an examination of many common genetic variants in different individuals to see if any variant is associated with a trait. GWASs typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major diseases.

  • Ancestry data is a no-go

    Initially, I was very excited about the prospect of allowing the import of ancestry.com DNA data into Opus 23. After all, they use the same Illumina technology as 23andMe (although 23andMe apparently have their own unique chip). Initial testing was promising. Like 23andMe, Ancestry supplies raw data in a basic tab-delimited text file. It’s in a slightly different format, but there was no problem parsing it. In fact Ancestry offered several possible advantages over 23andMe.

    It retails for $99 USD, which used to be what 23andMe charged, until they were cleared by the FDA to supply some very basic health insight data, at which time they hiked the price of the test up to an almost extortionate $199 USD, because now they can tell you what color eye you might have and whether your earwax is soft or hard.

    Ancestry SNPs are all reported with the ‘rs’ number. In order to cross-reference SNPs in any of the bioinformatics databases, we need their official id number, which is, as per dbSNP, referenced as ‘rs[the id number]’. For example, the main SNPs (C677T and A1298C) for the MTHFR gene are rs1801133 and rs1801131. 23andMe uses a lot of internal SNP ids, which they prefix with an ‘i’. The internal references usually do indicate SNPs that otherwise have rs id numbers, and if you are dogged enough, you can usually get the proper rs id for an internal SNP, but they don’t make it easy.

    Problems arose when it occurred to me that it would be prudent to cross-compare the Ancestry and 23andMe SNPs with the basic Opus 23 curated SNP database. Opus 23 accesses several SNP databases, but its own internal database is the jewel in the crown, hand-curated by our developers with special reference to clinical utilization and nutrigenomics significance. So I wrote a simple Perl script to do the work.

    The results were depressing, to say the least. Roughly 35-40% of the SNPs in the Opus curated database that are reported by 23andMe are not reported by Ancestry. And some of these are biggies, like the SNPs that control secretor status on FUT2. This leads me to believe that the Ancestry DNA analysis is skewed towards genealogy determination (perhaps not surprisingly) and not health outcomes.
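
    The original comparison was done with a Perl script that is not shown here. For readers who want to try something similar, the same set arithmetic can be sketched in Python as below. The function name is hypothetical, and each argument is simply a collection of rs ids.

```python
def missing_from_platform(curated_ids, reference_ids, candidate_ids):
    """Curated SNPs reported by the reference platform but not the candidate.

    Returns (sorted list of missing ids, share of the reference-covered
    curated SNPs that the candidate platform lacks).
    """
    curated_on_reference = set(curated_ids) & set(reference_ids)
    missing = curated_on_reference - set(candidate_ids)
    if not curated_on_reference:
        return [], 0.0
    return sorted(missing), len(missing) / len(curated_on_reference)
```

    Here the curated database would be the first argument, 23andMe the reference and Ancestry the candidate; a share of roughly 0.35 to 0.40 would correspond to the figure quoted above.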

  • Microbiome mashup

    There has been an explosion of interest in the microbiome. Outfits such as uBiome have made it relatively inexpensive and easy to have your microbiome profiled. These services extract the bacterial DNA out of the sample and identify each of the bacteria that the DNA came from.

    There appear to be some limitations with the technology. I’ve been told by sources whom I consider informed that uBiome becomes less accurate the deeper you go into the phylogeny. Genus-level data may only be 40-60% reliable by some estimates. So while the major distinctions such as phylum and class may be reliable, drilling down to the precise distribution of particular species may not be so helpful.

    Nonetheless there may be some advantages to importing microbiome data into Opus 23 Pro. From a research perspective we’d have the benefit of cross-comparing genomic data with microbiome data, and the ability to perhaps correlate dietary changes based on genomic analysis with progressive changes in sequential microbiome samples.

    One advantage is the ease of working with uBiome raw data. The most basic raw data you can download is a simple JSON data file that usually runs about 30K in size, so we’re not talking about any sort of server stress. This data is straightforward enough to parse. The image above was generated out of some basic uBiome data and ported to a visualization script (D3.js) to produce a sunburst information distribution.
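
    To give a flavor of the parsing step, here is a Python sketch that turns a flat list of taxa into the nested name/children structure that D3 hierarchy layouts such as a sunburst typically consume. The JSON field names used (a top-level ‘ubiome_bacteriacounts’ list whose items carry ‘taxon’, ‘parent’, ‘tax_name’ and ‘count’) are my assumption about the export format and should be checked against a real file; the function name is hypothetical.

```python
import json


def build_hierarchy(json_text, list_key="ubiome_bacteriacounts"):
    """Nest a flat taxon list by parent id; returns the list of root nodes."""
    records = json.loads(json_text)[list_key]
    nodes = {
        r["taxon"]: {"name": r["tax_name"], "value": r["count"], "children": []}
        for r in records
    }
    roots = []
    for r in records:
        parent = nodes.get(r["parent"])
        # Taxa whose parent is not in the file become top-level nodes.
        (parent["children"] if parent else roots).append(nodes[r["taxon"]])
    return roots
```

    The returned list can be dumped back to JSON with json.dumps and handed to the visualization script.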

    The same data ported to a dendrogram based on taxonomic distribution:

    dendrog

    Although this feature will probably not ship with Opus 23 Pro when it hits the pavement in January, I’ll probably add this microbiome tracking ability sometime shortly afterwards.

  • Pearl of an idea

    Like most of my coding projects, Opus 23 Pro is written primarily in the Perl scripting language. Perl handles most of the basic server-side functions (like disk reading, etc.) and sends its output to the browser via HTML, JavaScript, etc.

    Perl is widely used in bioinformatics, and has been nicknamed “the Swiss Army chainsaw of scripting languages” because of its flexibility and power. It has also been referred to as the “duct tape that holds the Internet together.” The Perl language borrows features from other programming languages, most significantly C. It has powerful and unsurpassed text processing facilities, one of the reasons it saw major use during the development of the Human Genome Project.

    Why, despite the protestations of my daughter that I should move to Python, I continue to code in Perl:

    • It is universal.
    • It is robust. Perl has an amazing library of existing modules that perform a variety of functions. If you can think of a task, CPAN (the Comprehensive Perl Archive Network) probably contains a module that will spare you the job of having to re-invent the wheel
    • It has strong bioinformatics roots. Perl was the computer language credited with ‘saving’ the Human Genome Project. Perl also has an extraordinary library of existing bioinformatics modules (BioPerl). Perl has a robust library of Application Programming Interfaces (APIs) that interface with the National Center for Biotechnology Information (NCBI) server, including access to PubMed, MeSH, etc.
    • It has a great didactic heritage. Perl has an enormous library of books, ranging from ‘Perl for Dummies’ to advanced bioinformatics textbooks

    One of the great qualities of Perl is its ability to do any one thing in any number of ways. According to its creator Larry Wall, Perl has two slogans. The first is “There’s more than one way to do it”, commonly known as TMTOWTDI. The second slogan is “Easy things should be easy and hard things should be possible.”

    Thanks to Perl, Opus 23 Pro has been made possible.
