Long, slightly off topic, slightly ranty post incoming.
I have very real problems with Phylos. Back in a different life, I did evolutionary genomics research. It involved sequencing random snippets of DNA from across the entire genome and comparing those snippets from hundreds of individuals against each other to determine their phylogenetic relationships, to identify regions of the genome underlying specific traits, and to identify genomic regions under selection in different habitats. I don’t do anything even remotely similar anymore, but I do have a very strong background in genomics research.
There are some giant pitfalls that can make genomic analysis extremely misleading. This is why scientific papers are extremely long and extremely dry. The majority of the paper is simply outlining the methods used so other scientists can determine if the results are justified or not.
These issues include but are not limited to 3 big things. What region or regions of the genome are being sequenced and analyzed? What algorithms are being used for the analysis? Finally, what is the complete data set used for the analysis?
The data set: without a sufficiently large data set, results can be deceiving. This is why your Ancestry.com or 23andMe results can change over time. Your DNA isn’t changing; their data set is becoming larger and more complete. (Side note - I think those sites are a ripoff).
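To make the data set point concrete, here’s a rough Python sketch. Everything in it is made up for illustration (the sequences, the group names, and the simple mismatch-count scoring), and it is not how any testing company actually does it. It just shows how the "best match" for the same sample can change once the reference panel gets bigger.

```python
def closest_reference(sample, references):
    """Return the name of the reference genotype with the fewest mismatches to the sample."""
    def mismatches(ref):
        return sum(s != r for s, r in zip(sample, ref))
    return min(references, key=lambda name: mismatches(references[name]))

# Toy, invented data: one allele per marker for a sample and some reference groups.
sample = "AACCGGTT"
small_panel = {"group_1": "AACCAAAA", "group_2": "TTTTGGAA"}
# Same panel plus one extra reference group that happens to be a closer match.
large_panel = dict(small_panel, group_3="AACCGGTA")

print(closest_reference(sample, small_panel))  # best match in the small panel
print(closest_reference(sample, large_panel))  # best match once the panel grows
```

The sample never changes between the two calls. Only the panel does.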
Algorithms: this is a hugely complex topic, but suffice it to say there are multiple methods for analyzing genomic data, and some are better suited than others to specific analyses or regions of the genome. Selecting the correct algorithm is a very important part of any genomic analysis.
What genomic region or regions are being sequenced? It’s important to understand that an organism’s genome doesn’t have one single lineage. Rather, the genome is composed of thousands and thousands of genomic regions that each have their own distinct lineage. A gene that has undergone strong selection in multiple unrelated populations can lead someone looking only at that gene to erroneously conclude that the populations are closely related. For a relationship analysis of closely related plants, you ideally want to sequence thousands of genomic regions spread throughout the entire genome. At this point, I have no idea what genomic regions are being sequenced by Phylos, but I do have some guesses. In short, I don’t think their analyses are particularly accurate because I don’t think they’re sequencing enough of the genome.
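Here’s a toy Python sketch of that pitfall. The three "populations" and their alleles are invented purely for illustration, and the distance measure (fraction of mismatched loci) is a deliberately crude stand-in for a real phylogenetic method. Locus 0 plays the role of a gene under the same strong selection in two unrelated populations.

```python
def mismatch_fraction(a, b, loci):
    """Fraction of the chosen loci at which two samples carry different alleles."""
    return sum(a[i] != b[i] for i in loci) / len(loci)

# Toy, invented data: one allele per locus, 10 loci per population.
# In this toy, A and B are close relatives and C is unrelated.
# Locus 0 is the "selected" gene: A and C share an allele there by convergence.
pop_a = "T" + "A" * 9
pop_b = "G" + "A" * 9
pop_c = "T" + "C" * 9

selected_only = [0]      # sequencing just the one gene
genome_wide = range(10)  # sequencing loci across the whole (toy) genome

for name, loci in [("selected locus only", selected_only), ("genome-wide", genome_wide)]:
    print(name,
          "A-B:", mismatch_fraction(pop_a, pop_b, loci),
          "A-C:", mismatch_fraction(pop_a, pop_c, loci))
```

Looking only at locus 0, A looks identical to C and unlike B. Across all ten loci the picture flips, which is the whole argument for sampling many regions.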
Finally, on to OGs and potential S1s. For a polyhybrid like the OG (I’m assuming), you should absolutely be able to tell an S1 from the parent with a thorough genomic analysis. Selfing still reshuffles every gene that is heterozygous in the parent, so on average an S1 ends up homozygous at about half of those loci, and a good genomic analysis can identify this. This is why I don’t find Phylos’ analyses particularly compelling. If they’re finding all OGs to be identical, then they’re not genotyping individuals to a thorough enough level. They’re likely just sequencing a few genic regions, which is an outdated method of comparing relationships among closely related plants.
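A quick back-of-the-envelope sketch in Python of why marker count matters here. The assumptions are mine and simplified: markers are unlinked, there is no genotyping error, and the function names are just for illustration. At any marker where the parent is heterozygous, an S1 has a 1/2 chance of coming out heterozygous like the parent, so the chance an S1 looks identical to the parent across n such markers is 0.5^n.

```python
import random

def self_cross(parent, rng):
    """Simulate one S1 seed: at each locus, draw one allele from each of two gametes of the same parent."""
    return [tuple(sorted((rng.choice(locus), rng.choice(locus)))) for locus in parent]

def prob_s1_matches_parent(n_het_markers):
    """Chance an S1 matches the parent at every heterozygous marker (assumes unlinked markers, no genotyping error)."""
    return 0.5 ** n_het_markers

rng = random.Random(0)

# Hypothetical parent genotyped only at markers where it happens to be homozygous:
homozygous_panel = [("C", "C")] * 5
print(self_cross(homozygous_panel, rng) == homozygous_panel)  # selfing cannot change these markers

# Hypothetical panels that do cover heterozygous markers:
for n in (5, 20, 100):
    print(n, "het markers -> P(S1 looks identical to parent) =", prob_s1_matches_parent(n))
```

With 5 informative markers about 3% of S1s would still pass as the parent, and with a panel that only hits homozygous markers every S1 passes. With hundreds of heterozygous markers the chance is effectively zero.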
Personally, I believe all the pure OGs are either 1) renamed cuts, 2) S1s of the original Ghost OG from Orgnkid, or 3) S1s of S1s. Also, I personally believe the original OG came from a Triangle Kush S1.
Here’s an interesting post from ThaDocta on OG cuts.
https://www.icmag.com/ic/showpost.php?p=7664527&postcount=1849