Monthly Archives: December 2015

  • MoboCaster

    MoboCaster is an Opus 23 Pro informatics app that performs scenario-specific genomic analysis since, as clinicians, that’s pretty much how we think about genomics.

    The browser screen (above) in MoboCaster gives an overview of several genomic scenarios, such as the HPA Axis, Oxalate Genomics, Phase I Detoxification, etc. Each scenario is displayed as a heat map of colored boxes representing potentially problematic genes for the current client. The size and color of a box indicate the significance of the gene in relation to the others in the scenario: the larger and darker the box, the more problematic the gene may be for this client. Genes whose curated SNPs have the highest power factors assigned by the Opus 23 Pro editors show up as larger boxes in a darker blue. A bar below each scenario gives a key to the relative value assigned to each color. Hovering your mouse cursor over a box will move the arrow along the color key to indicate the aggregated power factor of that gene, as well as the number of SNPs used to calculate this value.
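
    For readers who like to see the arithmetic, here is a minimal Python sketch of how a per-gene value and SNP count like the ones in the color key could be computed. It is not the Opus 23 Pro code: the record layout, the "+/+" style status strings, the power factor numbers, and the choice of a simple sum are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (gene, rs ID, power factor, client status).
# The power factors and statuses are invented; "-/-" stands for homozygous negative.
SCENARIO_SNPS = [
    ("MTHFR", "rs1801133", 3, "+/-"),
    ("MTHFR", "rs1801131", 2, "+/+"),
    ("CYP1A2", "rs762551", 4, "-/-"),
]


def aggregate_by_gene(snps):
    """Sum power factors per gene, skipping homozygous negative SNPs (an assumption)."""
    totals = defaultdict(lambda: {"power": 0, "snp_count": 0})
    for gene, _rsid, power, status in snps:
        if status == "-/-":
            continue  # contributes nothing, so an all-negative gene gets no entry
        totals[gene]["power"] += power
        totals[gene]["snp_count"] += 1
    return dict(totals)


boxes = aggregate_by_gene(SCENARIO_SNPS)
```

    In this sketch `boxes` holds one entry per gene that would get a box; box size and color would then be scaled from the `power` value.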

    Clicking on any box will open a pop-up with information about the gene and the SNP status for the active client.


    If your client has all homozygous negative SNPs for a gene in a MoboCaster scenario, that gene will not appear in the MoboCaster overview. You can display the MoboCaster scenarios as either a heatmap (above) or as a polar chart:


    Clicking on the name of a scenario opens it in MoboCaster.


    This links to a page with a description of the scenario and details of all the SNPs and their values in this scenario, listed in alphabetical order of the gene symbols. Clicking on a symbol from the list at the top of the page will take you directly to that gene and its description.

    If the gene is also listed in any maps or other MoboCaster scenarios, or has associated natural products, these will be listed here with a hyperlink. The description of the gene may be curated to be specific to this scenario; to find general information about the gene, click on the gene symbol link to open the gene pop-up. Clicking on the SNP rs ID link will open a SNP pop-up.

    Some scenarios may examine only specific SNPs in a gene. For example, in the Phase I Detoxification scenario, CYP1A2 lists only rs762551, as this SNP is known to be a high-inducibility variant. In this situation, clicking on the link to CYP1A2 will open a pop-up giving you information about the other CYP1A2 SNPs for the current client.

  • Ancestry data is a no-go

    Initially, I was very excited about the prospect of allowing the import of Ancestry DNA data into Opus 23. After all, they use the same Illumina technology as 23andMe (although 23andMe apparently have their own unique chip). Initial testing was promising. Like 23andMe, Ancestry supplies raw data in a basic tab-delimited text file. It’s in a slightly different format, but there was no problem parsing it. In fact, Ancestry offered several possible advantages over 23andMe.

    It retails for $99 USD, which used to be what 23andMe charged until they were cleared by the FDA to supply some very basic health insight data. At that point they hiked the price of the test up to an almost extortionate $199 USD, because now they can tell you what color eyes you might have and whether your earwax is soft or hard.

    Ancestry SNPs are all reported with the ‘rs’ number. In order to cross-reference SNPs in any of the bioinformatics databases, we need their official ID number, which is, as per dbSNP, referenced as ‘rs[the id number]’. For example, the main SNPs (C677T and A1298C) for the MTHFR gene are rs1801133 and rs1801131. 23andMe uses a lot of internal SNP IDs, which they prefix with an ‘i’. The internal references usually do indicate SNPs that otherwise have rs ID numbers, and if you are dogged enough, you can usually get the proper rs ID for an internal SNP, but they don’t make it easy.
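
    To make the distinction concrete, here is a rough Python sketch that reads tab-delimited raw data and separates dbSNP-style rs IDs from ‘i’-prefixed internal IDs. The sample rows, the header line, and the assumption that the ID sits in the first column are mine for illustration; the chromosome, position and genotype values are placeholders, and real vendor files should be checked against their own documentation.

```python
import csv
import io
import re

RS_ID = re.compile(r"^rs\d+$")        # dbSNP-style identifier
INTERNAL_ID = re.compile(r"^i\d+$")   # 'i'-prefixed internal identifier


def read_snp_ids(raw_text):
    """Return (rs_ids, internal_ids) found in the first column of tab-delimited text."""
    rs_ids, internal_ids = set(), set()
    for row in csv.reader(io.StringIO(raw_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        snp_id = row[0].strip()
        if RS_ID.match(snp_id):
            rs_ids.add(snp_id)
        elif INTERNAL_ID.match(snp_id):
            internal_ids.add(snp_id)
        # anything else (e.g. a column-header row starting with "rsid") is ignored
    return rs_ids, internal_ids


# Invented sample: positions and genotypes are placeholders, i0000001 is made up.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rsid\tchromosome\tposition\tgenotype\n"
    "rs1801133\t1\t0\tAG\n"
    "rs1801131\t1\t0\tTT\n"
    "i0000001\t1\t0\tCC\n"
)
rs_ids, internal_ids = read_snp_ids(SAMPLE)
```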

    Problems arose when it occurred to me that it would be prudent to cross-compare the Ancestry and 23andMe SNPs with the basic Opus 23 curated SNP database. Opus 23 accesses several SNP databases, but its own internal database is the jewel in the crown, hand-curated by our developers with special reference to clinical utilization and nutrigenomics significance. So I wrote a simple Perl script to do the work.

    The results were depressing, to say the least. Roughly 35-40% of the SNPs in the Opus curated database that are reported by 23andMe are not reported by Ancestry. And some of these are biggies, like the SNPs that control secretor status on FUT2. This leads me to believe that the Ancestry DNA analysis is skewed towards genealogy determination (perhaps not surprisingly) and not health outcomes.
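
    The comparison itself is just set arithmetic. My script was written in Perl; the Python sketch below is an analogue of the idea, not that script. The function name and the toy ID sets are hypothetical (the rs0000001 and rs0000002 IDs are placeholders), and the numbers it produces say nothing about the real coverage figures.

```python
def coverage_gap(curated, vendor_a, vendor_b):
    """Curated SNPs that vendor_a reports but vendor_b does not, plus their share."""
    reported_by_a = curated & vendor_a
    missing_from_b = reported_by_a - vendor_b
    fraction = len(missing_from_b) / len(reported_by_a) if reported_by_a else 0.0
    return missing_from_b, fraction


# Toy data only: which vendor reports which SNP here is invented.
CURATED = {"rs1801133", "rs1801131", "rs762551", "rs0000001"}
VENDOR_A = CURATED | {"rs0000002"}
VENDOR_B = {"rs1801133", "rs1801131", "rs762551"}

missing, fraction = coverage_gap(CURATED, VENDOR_A, VENDOR_B)
```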

  • Microbiome mashup

    There has been an explosion of interest in the microbiome. Outfits such as uBiome have made it relatively inexpensive and easy to have your microbiome profiled. These services extract the bacterial DNA from the sample and identify the bacteria that the DNA came from.

    There appear to be some limitations with the technology. I’ve been told by sources whom I consider informed that uBiome becomes less accurate the deeper you go into the phylogeny. Genus-level data may only be 40-60% reliable by some estimates. So while the major distinctions such as phylum and class may be reliable, drilling down to the precise distribution of particular species may not be so helpful.

    Nonetheless there may be some advantages to importing microbiome data into Opus 23 Pro. From a research perspective we’d have the benefit of cross-comparing genomic data with microbiome data, and the ability to perhaps correlate dietary changes based on genomic analysis with progressive changes in sequential microbiome samples.

    One advantage is the ease of working with uBiome raw data. The most basic raw data you can download is a simple JSON data file that usually runs about 30K in size, so we’re not talking about any sort of server stress. This data is straightforward enough to parse. The image above was generated out of some basic uBiome data and ported to a visualization script (D3.js) to produce a sunburst information distribution.
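
    As a rough idea of the parsing step, the Python sketch below turns a flat list of taxon records into the nested name/children structure that hierarchical D3 layouts typically consume. The field names (`taxon`, `parent`, `tax_name`, `tax_rank`, `count`), the top-level `taxa` key and the counts are assumptions for illustration, not the documented uBiome schema.

```python
import json

# Invented sample in an assumed layout; counts are made up.
RAW = json.dumps({"taxa": [
    {"taxon": 1, "parent": None, "tax_name": "Bacteria", "tax_rank": "superkingdom", "count": 100},
    {"taxon": 2, "parent": 1, "tax_name": "Firmicutes", "tax_rank": "phylum", "count": 60},
    {"taxon": 3, "parent": 1, "tax_name": "Bacteroidetes", "tax_rank": "phylum", "count": 40},
    {"taxon": 4, "parent": 2, "tax_name": "Clostridia", "tax_rank": "class", "count": 55},
]})


def build_tree(raw_json):
    """Link each record to its parent and return the list of root nodes."""
    records = json.loads(raw_json)["taxa"]
    nodes = {
        r["taxon"]: {"name": r["tax_name"], "rank": r["tax_rank"],
                     "count": r["count"], "children": []}
        for r in records
    }
    roots = []
    for r in records:
        parent = nodes.get(r["parent"])
        if parent is None:
            roots.append(nodes[r["taxon"]])   # no known parent: top of the tree
        else:
            parent["children"].append(nodes[r["taxon"]])
    return roots


tree = build_tree(RAW)
```

    The same nested structure could feed either a sunburst or a dendrogram; only the layout on the browser side changes.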

    The same data ported to a dendrogram based on taxonomic distribution:


    Although this feature will probably not ship with Opus 23 Pro when it hits the pavement in January, I’ll probably add this microbiome tracking ability sometime shortly afterwards.

  • Protocol development

    Two years ago I developed a software app called SkySaw for use on my teaching shift at the COEGM. SkySaw allows clinicians to structure patient encounters as a linked network (technically a directed acyclic graph). What made this attractive was that these individual networks could be connected together into a great network. One could then use graphing tools to data-mine relationships, trends and outcomes. As the first stage of Opus 23 Pro moves to completion, I decided to port this app over to Opus. What follows is the online doc file for PROTO, the Opus 23 app.


    PROTO allows you to develop a flow-chart (network) based approach to health protocol management. Creating a protocol network in PROTO is easy. Protocols can then be added to the client report, allowing for a more client-friendly way of relating your clinical decisions.

    Actions are classified by various roles you can assign any node:


    Just choose an option from the menu at the top right of the window:

    From left to right, the options are:

    • Reset the screen: Ticking this will reset the screen and zoom level. You can zoom in and out by using the scroll function of your mouse. You can move the network around by click-grabbing the network and moving your mouse.
    • Back to protocol dataset list: Ticking this will return you to the list of datasets for the various protocols you’ve created.
    • Add a new node (plus sign): Ticking this will launch a popup window that allows you to add a new node to the network. You then give the node a name and a type (food, drug, molecular target, lab test, etc.) and if you wish provide a short bit of accompanying text.
    • Add a new edge (two connected nodes): Ticking this will launch a popup window that allows you to add a new edge to the network. Edges connect nodes. The input fields will autosuggest nodes to use based on the existing nodes in the network.
    • Info screen: Ticking this will launch this popup window.
    • Active dataset indicator: This icon shows if the protocol you are working on is the active dataset (i.e., this is the protocol that will go on to be included in the client report, if you desire).

    Editing nodes and edges: You can edit any node or edge in the map by clicking on it. This will launch the appropriate popup, populated with the existing data. You can make any changes and then save.

    Map direction: The map is designed to proceed in its development from left to right.
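
    For the technically curious, the node-and-edge model described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the PROTO source: the `Protocol` class, its method names, and the rule that an edge is refused when it would create a cycle (keeping the network a directed acyclic graph, as in SkySaw) are all hypothetical.

```python
class Protocol:
    """Toy protocol network: typed nodes joined by directed edges, kept acyclic."""

    def __init__(self):
        self.nodes = {}   # name -> {"type": ..., "text": ...}
        self.edges = {}   # name -> set of names it points to

    def add_node(self, name, node_type, text=""):
        self.nodes[name] = {"type": node_type, "text": text}
        self.edges.setdefault(name, set())

    def suggest(self, prefix):
        """Autosuggest existing node names that start with the typed prefix."""
        p = prefix.lower()
        return sorted(n for n in self.nodes if n.lower().startswith(p))

    def _reachable(self, start, target):
        stack, seen = [start], set()
        while stack:
            current = stack.pop()
            if current == target:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.edges[current])
        return False

    def add_edge(self, source, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must already exist as nodes")
        if self._reachable(target, source):
            raise ValueError("this edge would create a cycle")
        self.edges[source].add(target)


# Placeholder node names; the types come from the list of node types above.
proto = Protocol()
proto.add_node("Gene X", "molecular target")
proto.add_node("Food Y", "food")
proto.add_node("Lab test Z", "lab test")
proto.add_edge("Gene X", "Food Y")
proto.add_edge("Food Y", "Lab test Z")
```

    Because the graph stays acyclic, it can always be laid out so that every edge points left to right.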

    Adding nodes on the fly: Certain node types (molecular targets, agents and foods) can be added from their own information popups. For example, clicking on any gene symbol in Opus 23 Pro launches the gene information popup for that gene. Clicking on the ‘add to protocol’ icon

    will add that gene to the current protocol as a ‘molecular target’ node.


  • Pearl of an idea

    Like most of my coding projects, Opus 23 Pro is written primarily in the Perl scripting language. Perl handles most of the basic server-side functions (like disk reading, etc.) and sends its output to the browser via HTML, JavaScript, etc.

    Perl is widely used in bioinformatics. It has been nicknamed “the Swiss Army chainsaw of scripting languages” because of its flexibility and power, and has also been referred to as the “duct tape that holds the Internet together.” The Perl language borrows features from other programming languages, most significantly C. It has powerful and unsurpassed text processing facilities, one of the reasons it saw major use during the development of the Human Genome Project.

    Why, despite the protestations of my daughter that I should move to Python, I continue to code in Perl:

    • It is universal.
    • It is robust. Perl has an amazing library of existing modules that perform a variety of functions. If you can think of a task, CPAN (the Comprehensive Perl Archive Network) probably contains a module that will spare you the job of having to re-invent the wheel.
    • It has strong bioinformatics roots. Perl was the computer language credited with ‘saving’ the Human Genome Project. Perl also has an extraordinary library of existing bioinformatics modules (BioPerl). Perl has a robust library of Application Programming Interfaces (APIs) that interface with the National Center for Biotechnology Information (NCBI) server, including access to PubMed, MeSH, etc.
    • It has a great didactic heritage. Perl has an enormous library of books, ranging from ‘Perl for Dummies’ to advanced bioinformatics textbooks.

    One of the great qualities of Perl is its ability to do any one thing in any number of ways. According to its creator, Larry Wall, Perl has two slogans. The first is “There’s more than one way to do it”, commonly known as TMTOWTDI. The second slogan is “Easy things should be easy and hard things should be possible.”

    Thanks to Perl, Opus 23 Pro has been made possible.

  • Client-friendly reporting

    I designed Opus 23 Pro to serve two user audiences: the physician who works in the development environment to generate and curate information; and the client, who represents the end-user of that information. Both have widely differing needs and points of reference. One aspect of Opus 23 Pro that I am especially proud of is the great lengths the program goes to to make its final product (the Client Report) simple, concise and easy to understand. One area that has an especially interesting and dynamic quality is the way that Opus 23 explains the actions and consequences of genes to the client.

    Opus 23 Pro aggregates much of its data from public, peer-reviewed sources. This includes the descriptive text that accompanies information about genes. Typically Opus 23 grabs this information from such sources. However, these short abstracts are often highly technical and cryptic, and unlikely to be very helpful to a patient who does not have a background in genetics. So we created an alternate database of gene descriptions specifically written for the layperson, which is used when Opus 23 Pro generates the client report.

    However, even simplified gene descriptions can be somewhat technical and often rely on the reader having some form of base knowledge.

    Here is a nice touch you don’t see very often. When Opus prints out the gene description in the client report, it checks its internal glossary for any advanced medical concepts. If any of them appear in the description, the ‘smart owl’ will activate and give the client a simple description of the term.


    To avoid repeating itself endlessly, Opus only adds the glossary description to the first description that requires it. In this screen shot it is explaining what the terms ‘gene,’ ‘interleukin,’ ‘cytokine’ and ‘protein’ mean.
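
    The ‘explain each term only once’ behavior can be sketched in Python as below. This is an illustration, not the Opus 23 code: the function name, the whole-word matching rule, the sample gene descriptions and the glossary wording are all placeholders of my own.

```python
import re

# Placeholder glossary wording, not the text used in the client report.
GLOSSARY = {
    "gene": "A stretch of DNA that carries the instructions for making a protein.",
    "protein": "A large molecule, built from amino acids, that does work in the body.",
    "cytokine": "A small signaling protein released by cells, especially immune cells.",
    "interleukin": "One family of cytokines.",
}


def annotate(descriptions, glossary):
    """Pair each description with glossary notes for terms not yet explained."""
    explained = set()
    annotated = []
    for text in descriptions:
        # Whole-word, lowercase matching only; plurals such as "genes" are not caught.
        words = set(re.findall(r"[a-z]+", text.lower()))
        notes = []
        for term in sorted(glossary):
            if term in words and term not in explained:
                explained.add(term)
                notes.append((term, glossary[term]))
        annotated.append((text, notes))
    return annotated


report = annotate(
    [
        "This gene encodes an interleukin, a cytokine protein.",
        "This gene encodes a protein receptor.",
    ],
    GLOSSARY,
)
```

    Here the first description picks up all four notes and the second picks up none, since its terms were already explained.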


  • Multi-SNP algorithms

    Algorithms are perhaps the most significant and flexible aspect of Opus 23 data. They are usually the easiest result for the non-medical person to understand, because their conclusions are usually simplified statements in everyday language. Algorithms are processed by the LUMEN app in Opus 23 Pro.

    LUMEN is one of the most powerful apps in the Opus 23 Pro toolbox. This app allows you to examine the effects of multi-SNP, multi-gene consequences. Very few single SNPs exert their effects independently; they more typically interact through epistatic relationships (where the effect of one gene on the phenotype is modified by one or more other genes) or as a haplotype: a set of several SNPs on a single chromatid of a chromosome pair that are statistically associated.

    LUMEN does this by using algorithms: small step-by-step programs that run inside LUMEN and look at the logical result of multi-SNP combinations. This type of logic is essentially ‘Boolean’ in that its queries yield a yes or no answer. Combining several of these yes-or-no tests allows LUMEN to provide you with insight into complex SNP arrangements and relationships that might otherwise escape detection.

    It’s helpful to think of an Opus 23 algorithm as a tiny flowchart that, depending on which way the result branches, generates a ‘true or false’ result.

    For example, a simple algorithm to determine if you should get out of bed might be:

    • If you hear the alarm clock, open your eyes.
    • If it’s dark outside, go back to bed.
    • If it’s light outside, check the time.
    • If it’s earlier than 7AM, go back to bed.
    • If it’s later than 7AM, get up and check the calendar.
    • If it’s Saturday, go back to bed.

    As can be seen, there are a lot of ways you can go back to bed with this algorithm! The same is true of the Opus 23 Pro algorithms: for an algorithm to be true, all of its conditions must be fulfilled. If even one condition fails, the whole algorithm will be false.
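
    In code terms, the ‘every condition must hold’ rule is a logical AND across the conditions. The Python sketch below shows the idea only; the data layout, the title, and the genotype sets are invented and carry no clinical meaning, and this is not how LUMEN itself is implemented.

```python
# Hypothetical algorithm: each condition names an rs ID and the genotypes that satisfy it.
# The genotype sets are invented for illustration.
ALGORITHM = {
    "title": "Hypothetical conclusion shown in the client report",
    "conditions": [
        ("rs1801133", {"AA", "AG"}),
        ("rs1801131", {"GG"}),
    ],
}


def evaluate(algorithm, genotypes):
    """True only if every condition is met; a missing genotype counts as a failure."""
    return all(
        genotypes.get(rsid) in allowed
        for rsid, allowed in algorithm["conditions"]
    )


client_a = {"rs1801133": "AG", "rs1801131": "GG"}   # meets both conditions
client_b = {"rs1801133": "AG", "rs1801131": "GT"}   # fails the second condition
result_a = evaluate(ALGORITHM, client_a)
result_b = evaluate(ALGORITHM, client_b)
```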

    Each algorithm is displayed in its own box, which contains information about the genes and SNPs used in its creation. The title of the algorithm is generally its conclusion. Typically, the client report contains only true algorithms, although you may choose to include false algorithms as well, especially if it would be helpful to make the client aware of something they’re unlikely to be prone to. Thus:

    • An algorithm that returns a true will have a ‘check’ icon in the bottom left-hand box. The conclusions of these algorithms pertain to the client based on their genomic data.
    • An algorithm that returns a false will have a ‘cross’ icon in the bottom left-hand box. The conclusions of these algorithms do not pertain to the client based on their genomic data, other than perhaps the added knowledge that this is one less thing in life to worry about.
    An Opus 23 algorithm triangulates four genes and six SNPs to estimate genomic variation in serum magnesium levels.