expected f-statistics #227
Codecov Report: All modified and coverable lines are covered by tests ✅
Thank you for implementing this! It all looks good to me. I made some suggestions on the documentation. About the quartet names, I agree that relying on the user to provide the right one is not sustainable.
I think I like the second solution better. When we create a The downside is that it requires a new field (I don't know if that is expensive?), while the first solution might be easier to implement.
|
The second solution would be most robust indeed! The main issue, I think, is memory. The number of 4-taxon sets explodes quickly, for example with 1365 Perhaps we could have But this approach still requires some kind of memory to know that the object has type
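To put a number on how quickly the count of 4-taxon sets grows: 1365 is the number of 4-taxon sets among 15 taxa (15 choose 4). The sketch below is in Python rather than Julia, purely for illustration, and `n_quartets` is a hypothetical helper, not a function from the package. It only counts the sets; the actual memory cost of an extra field per quartet depends on how that field is stored.

```python
from math import comb

def n_quartets(n_taxa: int) -> int:
    """Number of distinct 4-taxon sets among n_taxa taxa: C(n_taxa, 4)."""
    return comb(n_taxa, 4)

# Growth of the number of 4-taxon sets with the number of taxa.
for n in (10, 15, 50, 100):
    print(n, n_quartets(n))
```

Any per-quartet field is multiplied by this count, which is the concern with storing the same name vector in every quartet object.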
|
Ah yes, good point about the waste of memory for vectors! Another solution could be to create a new Maybe
|
The last commit uses strategy 1: with a new function
|
I agree with exporting I like this solution, it's simple and adds no memory burden. We'll just need to remember to possibly create a new function if we add a new type of quartet data structure in addition to CF and f4. Thanks!
- `expectedf2matrix`, `expectedf4table`
- `tablequartetdata`: code from `tablequartetCF`, which now calls `tablequartetdata(..., prefix="CF")`
- `descendenceweight` (basis to calculate Ω then f2), `expectedf3matrix`, `check_valid_gammas`
- `hammingdistancematrix` and `distancecorrection_JC!`, whose code was extracted from `startingBL!`
- `pairwisetaxondistancematrix` and `countquartetsintrees`.

Problem to be discussed: For quartet concordance factors, there is more symmetry than for f4-statistics: CF(12|34) = CF(12|43), yet f4(12|34) = - f4(12|43). For this reason, the 3 f4s are listed for the following taxon orders: 12_34, 13_42, 14_23, and then these 3 values sum up to 0.

Yet, the default column names used by `tablequartetCF` are: 12_34, 13_24, 14_23 (note _24 instead of _42 in the second).

I'm not sure what's the best way to do this.
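To make the sign issue concrete, here is a small sketch in Python (for illustration only, not the package's Julia code). It assumes the standard definition of f4 as the average over loci of (p_a - p_b)(p_c - p_d), where p_x is the allele frequency in taxon x. The frequencies are made-up toy values, and `f4` and `f4_triplet` are hypothetical helper names.

```python
from fractions import Fraction as F

def f4(p, a, b, c, d):
    """f4(ab|cd): average over loci of (p_a - p_b) * (p_c - p_d)."""
    n_loci = len(p[a])
    return sum((p[a][i] - p[b][i]) * (p[c][i] - p[d][i])
               for i in range(n_loci)) / n_loci

def f4_triplet(p):
    """The 3 f4 values in the taxon orders 12_34, 13_42, 14_23."""
    return [f4(p, 1, 2, 3, 4), f4(p, 1, 3, 4, 2), f4(p, 1, 4, 2, 3)]

# Toy allele frequencies at 3 loci for taxa 1..4 (made up; exact fractions).
freqs = {
    1: [F(1, 10), F(1, 2), F(9, 10)],
    2: [F(1, 5), F(2, 5), F(7, 10)],
    3: [F(3, 10), F(3, 5), F(1, 2)],
    4: [F(2, 5), F(1, 10), F(3, 10)],
}

# Swapping the last two taxa flips the sign, unlike for concordance factors.
print(f4(freqs, 1, 2, 3, 4), f4(freqs, 1, 2, 4, 3))

# With the orders 12_34, 13_42, 14_23 the three values sum to zero.
print(sum(f4_triplet(freqs)))

# Under the CF-style name 13_24, the second value has the opposite sign.
print(f4(freqs, 1, 3, 2, 4), f4(freqs, 1, 3, 4, 2))
```

The zero sum follows from expanding the three products, so it holds for any frequencies. A value stored under the column name 13_24 but computed in the order 13_42 would therefore be read with the wrong sign.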