GGF Global Genomics

GGF Generics Library

General – reference database
Children’s genetic diseases database
Cancer database

This project could be adopted by any cancer or disease research institute, or by any institution that could introduce a program encouraging patients to contribute samples to build the databases. This could involve building both a public GGF library or database and a specialized database for the sponsor institution.

The GGF algorithm analyzes DNA and produces GGF motifs, which are meaningful shapes rather than statistical plots that are too ‘noisy’ to classify usefully into generic types for ‘technical analysis’. By way of analogy, technical analysis in the stock market involves classifying statistical plots into generic shapes such as ‘candlestick patterns’. One article lists 16 candlestick patterns, analyzed by shape, shadow, tails, direction, range and frequency of movement, with colour coding for price rises and falls. Distinct patterns are even given names, including three line strike, two black gapping, three black crows, evening star and abandoned baby (per www.investopedia.com). Some genetic software does something similar, i.e. it produces patterns from DNA that can be classified for reference; one example is GraphClust, which clusters DNA patterns for this purpose (described in the following extract). However, as explained below, GGF has greater utility for DNA analysis because it uses a library of generic GGF patterns obtained from prior GGF compilations.

The following extract from our patent application explains:

The GGF Algorithm “is a new method designed to extract and recognize patterns related to the geometries of DNA that may reflect transcription outcomes to portray or map DNA sequences visually which will allow new ontologies of DNA or protein structure to be created. The GGF method is intended to allow new techniques of image and signal processing to be utilized to enhance existing DNA or protein libraries or databases which have higher recognition signatures amenable to better classification methods and query systems. GGF images and motifs can provide superior signatures and motifs to existing graphical representations of DNA or other coding macromolecules which are designed to greatly improve interpretation, analysis and understanding of genomic data. For example, GGF produces meaningful images in non coding intergenic zones of DNA providing new images that could provide a basis for discovering new gene expression mechanisms, evolutionary insights or epigenetic features otherwise unseen. The Wu-Kabat variability plots can gauge non randomness whilst the GGF can not only indicate non randomness but produce meaningful motifs or signatures in clear images that can include generic motifs that may lead to matching other sequences which may otherwise be difficult to match e.g. where substantial ‘noise’ obscures a generic or matched sequence. This is because GGF is not matching similar DNA sequences but producing motifs or signatures that can be compared. These sequences may be quite different but the images of the motifs are similar, giving perhaps the only way to link the two loci of DNA under study. Thus, the GGF can provide solutions to the increasing problems of massive amounts of raw code derived from sequences.

To take one example of a graphical method which specifically analyses a type of DNA sequence representing clusters of non coding RNA sequences, the method is described in an article: Heyne S, Costa F, Rose D, Backofen R. GraphClust: alignment-free structural clustering of local RNA secondary structures. Bioinformatics. 2012 Jun 15;28(12):i224-32. doi: 10.1093/bioinformatics/bts224.

In that article the authors refer to applying their GraphClust method to >220,000 sequence fragments to obtain a “small number of probable, but sufficiently different, structures for each RNA sequence. We then encode each structure as a labelled graph preserving all information about the nucleotide type and the bond type …in this way a sequence’s structure is represented as a graph with several disconnected components. We could now compute the similarity between the representative graphs using a graph kernel.” By use of various matching techniques such as ‘nearest neighbour’, covariance and refinement analysis, matches between the sample sequence and the target sequence can be made.

The GGF method has a new inventive step beyond methods such as GraphClust. Whilst GraphClust is a statistical visualization method designed to produce non-recursive graphical representations that merely aid statistical correlation between the sample sequence and the target sequence, the GGF method is more than a statistical visualization method: it contains recursive, geometrical and formatting features designed to model actual possible biological processes, such as the possible manner in which the recursive generation of folding polypeptides occurs, or the possible manner in which micro- or macromolecules tile into a macromolecule or tissues of the body. Generating processes in the genome and proteome are known to be recursive, and so recursive representations better model biological processes.”
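
To make the graph-based comparison described in the extract more concrete, the following is a minimal Python sketch. It is not the GraphClust implementation and says nothing about how GGF itself works. It assumes a secondary structure is supplied in dot-bracket notation, encodes it as a labelled graph (nucleotide labels on nodes, "backbone" and "pair" labels on edges), and compares two graphs with a simple Weisfeiler-Lehman-style label-refinement kernel, which is swapped in here purely for illustration. All function names and the example sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def graph_from_dot_bracket(seq, structure):
    """Build a labelled graph: nodes carry the nucleotide, edges carry the bond type.

    Assumes `structure` is balanced dot-bracket notation of the same length as `seq`.
    """
    nodes = {i: nt for i, nt in enumerate(seq)}
    edges = [(i, i + 1, "backbone") for i in range(len(seq) - 1)]
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            edges.append((stack.pop(), i, "pair"))
    return nodes, edges


def wl_features(nodes, edges, iterations=2):
    """Count node labels after repeated neighbourhood refinement (illustrative kernel)."""
    adjacency = {n: [] for n in nodes}
    for u, v, bond in edges:
        adjacency[u].append((bond, v))
        adjacency[v].append((bond, u))
    labels = dict(nodes)
    features = Counter(labels.values())
    for _ in range(iterations):
        refined = {}
        for n in nodes:
            neighbours = sorted(f"{bond}:{labels[m]}" for bond, m in adjacency[n])
            refined[n] = labels[n] + "(" + ",".join(neighbours) + ")"
        labels = refined
        features.update(labels.values())
    return features


def kernel(f1, f2):
    """Dot product of two label-count vectors."""
    return sum(count * f2[label] for label, count in f1.items())


def similarity(g1, g2, iterations=2):
    """Normalised kernel value in [0, 1]; 1 means identical feature counts."""
    f1 = wl_features(*g1, iterations)
    f2 = wl_features(*g2, iterations)
    denom = sqrt(kernel(f1, f1) * kernel(f2, f2))
    return kernel(f1, f2) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical example fragments, not taken from any dataset.
    query = graph_from_dot_bracket("GGGAAACCC", "(((...)))")
    library = {
        "hairpin_variant": graph_from_dot_bracket("GGCAAAGCC", "(((...)))"),
        "unstructured": graph_from_dot_bracket("GGGAAACCC", "........."),
    }
    # Nearest-neighbour lookup: pick the library entry most similar to the query.
    best = max(library, key=lambda name: similarity(query, library[name]))
    print(best, similarity(query, library[best]))
```

A nearest-neighbour match, as mentioned in the extract, then reduces to choosing the library entry with the highest similarity score. A production system would use a richer kernel and indexing scheme than this sketch.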
