More informative phylogenetic trees


In presentations of phylogenetic trees, the geometric shape of the tree often looks somewhat arbitrary, driven more by aesthetics than by informative considerations.

But wouldn't it make sense to give the geometry (distances and angles) specific meanings? Or is that already done (when it is done properly)?

Some basic principles:

  1. The vertical axis represents time (on a linear, logarithmic, or "anti-logarithmic" scale).
  2. The horizontal axis represents the amount of genetic change (measured in appropriate units).
  3. The line of one species (= a population with virtually identical DNA) runs vertically from bottom to top.
  4. The line ends when the species goes extinct.
  5. A new species branches off an existing species with a horizontal line perpendicular to the line of the latter.
  6. The length of the new (horizontal) line reflects the amount of genetic change.
  7. At its end, the line continues vertically.
  8. There are no other branchings along the horizontal line.

Following these principles, and with perfect knowledge (and a good measure of genetic change), we would get a perfectly rectangular tree.
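To make the principles concrete, here is a minimal sketch of how such a rectangular layout could be computed. The `Species` record, its field names, and the sample values are all hypothetical; the sketch only assumes that each species has a parent, a branching time, an end time, and a signed amount of genetic change relative to its parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Species:
    name: str
    parent: Optional[str]   # None for the root lineage
    origin: float           # time at which the lineage branches off
    end: float              # extinction time, or the present
    change: float = 0.0     # signed genetic change relative to the parent


def layout(species):
    """Return vertical lines {name: (x, t_start, t_end)} and horizontal connectors."""
    by_name = {s.name: s for s in species}
    x = {}

    def x_of(name):
        # Horizontal position = parent's position plus the genetic change.
        if name not in x:
            s = by_name[name]
            x[name] = 0.0 if s.parent is None else x_of(s.parent) + s.change
        return x[name]

    vertical = {s.name: (x_of(s.name), s.origin, s.end) for s in species}
    horizontal = [
        (x_of(s.parent), x_of(s.name), s.origin)   # (x_from, x_to, time)
        for s in species
        if s.parent is not None
    ]
    return vertical, horizontal


# Hypothetical example data
species = [
    Species("A", None, 0.0, 10.0),
    Species("B", "A", 3.0, 8.0, change=2.0),     # goes extinct at t = 8
    Species("C", "B", 5.0, 10.0, change=-0.5),
]
vertical, horizontal = layout(species)
```

Each vertical segment can then be drawn from `(x, t_start)` to `(x, t_end)` and each horizontal connector at its branching time, which corresponds to principles 3 to 7 above.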

But our knowledge is not perfect: we know little to nothing about many intermediate species that left no traces. Hypotheses about these unknown species can be reflected by oblique lines, whose slopes could indicate, for example:

a) There were many pronounced genetic changes in a short period after the new species branched off. (grey: assumed intermediate species)

b) There were few, small genetic changes over a long period after branching.

I am looking for discussions/considerations of this way of drawing phylogenetic trees (papers, web pages, or just good search terms).


An interesting point of view, and one that is new to me, is given in Joseph Ahrens's answer to this Quora question:

How do you interpret ancestors in phylogenetic trees?

What is slowly becoming clear to me: it is about the interpretation of straight line segments, maximal straight lines, horizontal vs. vertical lines, and nodes (= branching points) in phylogenetic trees, and in particular: which of them should be interpreted as a species?


Here are the problems with your well-thought-out project:

1/ Temporal datasets to proportion trees: this is probably the most absent thing in species measurement. You would need time datasets for beetle species, for mammals, for plants, in millennia and millions of years. 90+% of that information is missing.

2/ Genetic proximity to arrange trees. Scientists publish their genome studies in the ToL project. There is plenty of distance data to compare all animals... Genetic distance does not draw easily, because on the same tree some distances can be 500 times longer than others. On a tree of 20 extant species you will have branches 5 mm long and branches 5 metres long; half of the species are on your screen and the other half are higher than the ceiling. It is best to represent the distances with a different symbol, or with interactive 3D trees where you can switch between compact and large views and zoom out wildly, which is illustrative and cool, or with a weighted proximity length of 1 + similarity/50, which does not really work either. In any case, I could not find a way to display that information clearly on the tree.

Phylogeny trees do not lend themselves well to being measured informatively. You start out thinking "Oh, I could find a nice way to do this", and then when you apply the datasets your trees become unmanageable and cluttered because of exponential branch lengths.
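The scaling problem can be seen in a few lines. The sketch below compares plain linear scaling with a logarithmic compression; the log transform is not what the answer proposes, just one common way to squeeze a 500-fold range onto a screen, and the distances and pixel budget are made-up values.

```python
import math


def scale_linear(lengths, longest_px=500.0):
    """Pixel lengths strictly proportional to genetic distance."""
    top = max(lengths)
    return [longest_px * length / top for length in lengths]


def scale_log(lengths, longest_px=500.0):
    """Pixel lengths proportional to log(1 + distance): readable, but no longer to scale."""
    top = math.log1p(max(lengths))
    return [longest_px * math.log1p(length) / top for length in lengths]


lengths = [1.0, 20.0, 500.0]      # hypothetical distances spanning a 500-fold range
linear = scale_linear(lengths)    # the shortest branch gets a single pixel
logged = scale_log(lengths)       # the shortest branch stays clearly visible
```

The price of the compression is that branch lengths can no longer be compared by eye, which is exactly the trade-off described above.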

I have programmed phylogenetic trees based on the 81 MB ToL (Tree of Life) project to represent many millions of species, to do 3D physics trees and HTTP lookups that fetch wiki texts and images for all species. There is www.biostars.org, a very friendly and helpful bioinformatics forum for graduate students who do trees all day.

There are many (dozens of) informative tree-graphics programs coded by academics to visualize and analyse species data, arranged in different trees and sortable through them.


Evolutionary Trees

Abstract

Evolutionary, or phylogenetic, trees depict the evolution of a set of taxa from their most recent common ancestor (MRCA). A species tree is a phylogenetic tree that models the evolutionary history of a set of species (or populations). A gene tree is a phylogenetic tree that models the genealogy of a gene. Gene trees of different genes sampled from a set of species may disagree with one another, as well as with the species tree, owing to a variety of factors. A wide array of algorithms and computer programs is available for inferring phylogenetic trees from different types of data. While true evolutionary trees are rooted and mostly binary (bifurcating), inferred trees may be unrooted or multifurcating.
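As a small illustration of the MRCA concept on a rooted tree, the sketch below stores the tree as a child-to-parent map and walks up from each taxon. The node names and the tiny example tree are hypothetical.

```python
def ancestors(node, parent):
    """Path from a node up to the root, including the node itself."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def mrca(taxa, parent):
    """Most recent common ancestor of the given taxa in a rooted tree."""
    taxa = list(taxa)
    common = set(ancestors(taxa[0], parent))
    for taxon in taxa[1:]:
        common &= set(ancestors(taxon, parent))
    # Walking upward from the first taxon, the first shared node is the most recent one.
    for node in ancestors(taxa[0], parent):
        if node in common:
            return node


# Hypothetical rooted tree: ((dog, wolf)n1, cat)root
parent = {"root": None, "n1": "root", "dog": "n1", "wolf": "n1", "cat": "root"}
```

Here `mrca(["dog", "wolf"], parent)` returns the internal node `n1`, while adding `cat` pushes the MRCA up to the root.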


The Levels of Classification

Taxonomy (which literally means "arrangement law") is the science of naming and grouping species to construct an internationally shared classification system. The taxonomic classification system (also called the Linnaean system after its inventor, Carl Linnaeus, a Swedish naturalist) uses a hierarchical model. A hierarchical system has levels, and each group at one of the levels includes groups at the next lowest level, so that at the lowest level each member belongs to a series of nested groups. An analogy is the nested series of directories on the main disk drive of a computer. For example, in the most inclusive grouping, scientists divide organisms into three domains: Bacteria, Archaea, and Eukarya. Within each domain is a second level called a kingdom. Each domain contains several kingdoms. Within kingdoms, the subsequent categories of increasing specificity are: phylum, class, order, family, genus, and species.

As an example, the classification levels for the domestic dog are shown in Figure 12.1.2. The group at each level is called a taxon (plural: taxa). In other words, for the dog, Carnivora is the taxon at the order level, Canidae is the taxon at the family level, and so forth. Organisms also have a common name that people typically use, such as domestic dog or wolf. Each taxon name is capitalized except for species, and the genus and species names are italicized. Scientists refer to an organism by its genus and species names together, commonly called a scientific name or Latin name. This two-name system is called binomial nomenclature. The scientific name of the wolf is therefore Canis lupus. Recent study of the DNA of domestic dogs and wolves suggests that the domestic dog is a subspecies of the wolf, not its own species, so it is given an extra name to indicate its subspecies status, Canis lupus familiaris.
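The nesting of ranks can be sketched as an ordered list per organism. In the example below the dog lineage follows the text; the domestic cat lineage is not given in the text and is added from standard taxonomy purely for comparison. The function names are illustrative.

```python
RANKS = ["domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"]

# One taxon per rank, from most to least inclusive.
dog = ["Eukarya", "Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae", "Canis", "lupus"]
cat = ["Eukarya", "Animalia", "Chordata", "Mammalia", "Carnivora", "Felidae", "Felis", "catus"]


def shared_ranks(a, b):
    """Return the (rank, taxon) pairs two lineages share, stopping at the first difference."""
    shared = []
    for rank, taxon_a, taxon_b in zip(RANKS, a, b):
        if taxon_a != taxon_b:
            break
        shared.append((rank, taxon_a))
    return shared


def binomial(lineage):
    """Scientific name: genus and species names together."""
    return f"{lineage[6]} {lineage[7]}"
```

Because the groups are nested, two organisms that share a taxon at one rank necessarily share every more inclusive taxon above it, which is why the loop can stop at the first mismatch.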

Figure 12.1.2 also shows how taxonomic levels move toward specificity. Notice how we find the dog grouped within the domain containing the widest diversity of organisms. This domain includes plants and other organisms not pictured, such as fungi and protists. At each sublevel the organisms become more similar because they are more closely related. Before Darwin's theory of evolution was developed, naturalists sometimes classified organisms using arbitrary similarities, but since the theory of evolution was proposed in the 19th century, biologists have worked to make the classification system reflect evolutionary relationships. This means that all of the members of a taxon should have a common ancestor and be more closely related to each other than to members of other taxa.

Recent genetic analysis and other advancements have found that some earlier taxonomic classifications do not reflect actual evolutionary relationships, and therefore changes and updates must be made as new discoveries take place. One dramatic and recent example was the breaking apart of prokaryotic species, which until the 1970s were all classified as bacteria. Their division into Archaea and Bacteria came about after the recognition that their large genetic differences warranted their separation into two of the three fundamental branches of life.

Figure 12.1.2: At each sublevel in the taxonomic classification system, organisms become more similar. Dogs and wolves are the same species because they can breed and produce viable offspring, but they are different enough to be classified as different subspecies. (credit "plant": modification of work by "berduchwal"/Flickr; credit "insect": modification of work by Jon Sullivan; credit "fish": modification of work by Christian Mehlführer; credit "rabbit": modification of work by Aidan Wojtas; credit: modification of work by Kevin Bacher, NPS; credit "jackal": modification of work by Thomas A. Hermann, NBII, USGS; credit "wolf": modification of work by Robert Dewar; credit "dog": modification of work by "digital_image_fan"/Flickr)

At what levels are cats and dogs considered to be part of the same group?

Visit this PBS site to learn more about taxonomy. Under Classifying Life, click Launch Interactive.


Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough

Citation: Philippe H, Brinkmann H, Lavrov DV, Littlewood DTJ, Manuel M, Wörheide G, et al. (2011) Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough. PLoS Biol 9(3): e1000602. https://doi.org/10.1371/journal.pbio.1000602

Academic Editor: David Penny, Massey University, New Zealand

Published: March 15, 2011

Copyright: © 2011 Philippe et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The work was funded by NSERC (www.nserc-crsng.gc.ca), CRC (www.chairs-chaires.gc.ca), Agence Nationale de la Recherche (http://www.agence-nationale-recherche.fr/), ARC Biomod (www.cfwb.be), and DFG (http://www.dfg.de/en/index.jsp). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: BS, bootstrap support; EST, expressed sequence tag; LBA, long-branch attraction

In the quest to reconstruct the Tree of Life, researchers have increasingly turned to phylogenomics, the inference of phylogenetic relationships using genome-scale data (Box 1). Mesmerized by the sustained increase in sequencing throughput, many phylogeneticists entertained the hope that the incongruence frequently observed in studies using single or a few genes [1] would come to an end with the generation of large multigene datasets. Yet, as so often happens, reality has turned out to be far more complex, as three recent large-scale analyses, one published in PLoS Biology [2]–[4], make clear. The studies, which deal with the early diversification of animals, produced highly incongruent (Box 2) findings despite the use of considerable sequence data (see Figure 1). Clearly, merely adding more sequences is not enough to resolve the inconsistencies.

Box 1. From Phylogenetics to Phylogenomics

Phylogenetics, the determination of evolutionary relationships among organisms, is central to our understanding of the evolution of life. For instance, the three phylogenies of Figure 1 entail very different interpretations of the complexity of the common ancestor of all animals. Important body-plan characters (e.g., nervous-sensory and digestive systems and muscle cells) are found in cnidarians, ctenophores, and bilaterians, but not in sponges and placozoans. According to the phylogenies of Schierwater et al. [4] and Dunn et al. [2], the taxonomic distribution of these characters implies either (i) that the ancestral metazoan already displayed these features and that sponges (and placozoans) have secondarily lost them, or (ii) that these characters were acquired several times independently by convergence (e.g., in the cnidarian + ctenophore and in the bilaterian lineages, according to the tree of Figure 1A). In contrast, the phylogeny of Philippe et al. [3] is more congruent with morphological characters and compatible with a simple metazoan ancestor and a later emergence of these characters only once, in the lineage leading to the common ancestor of coelenterates (cnidarians + ctenophores) and bilaterians.

Phylogenies are generally depicted as trees (which are non-reticulated graphs, as in Figure 1) because vertical descent is indisputably the primary mechanism of inheritance of genetic material. However, the existence of horizontal transfer (e.g., hybridization of closely related taxa, organelle acquisition by endosymbiosis, and horizontal gene transfer) makes phylogenetic trees only pragmatic approximations, which are likely to be replaced in the long term by phylogenetic networks (especially for unicellular organisms).

Recently, phylogenomics, the use of genomic data to infer evolutionary relationships, has emerged as a new domain of phylogenetics. The main strength of phylogenomics is the drastic reduction in random (or sampling) error brought about by the use of large (multigene) datasets. Numerous approaches can be used to take advantage of genomic data (for a review see [49]). Briefly, new methods based on oligonucleotide content, gene content, or intron positions look promising (as shown by their ability to yield reasonable trees), but require additional theoretical developments to reach their full potential. Therefore, the two most popular phylogenomic approaches are simple extensions of the standard phylogenetic methods applied to single-gene datasets. The first, known as the "supermatrix" (or superalignment) approach, consists of concatenating numerous orthologous genes into a single supergene, which is analysed using standard methods (or slightly modified methods such as separate models that allow multiple sets of branch lengths [50]). The second, "supertree", approach takes the opposite path by first inferring a tree for each gene in the dataset and then combining these individual trees into a single supertree. The supermatrix approach is the most commonly used, in line with the handful of studies suggesting that it offers greater accuracy than the supertree approach [13],[51], although this remains to be formally demonstrated.
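The concatenation step of the supermatrix approach can be sketched in a few lines. This is only an illustration under simple assumptions: each gene alignment is a dictionary mapping taxon names to already aligned sequences, and taxa missing from a gene are padded with gap characters. The function name and the toy sequences are hypothetical.

```python
def supermatrix(gene_alignments, missing="-"):
    """Concatenate per-gene alignments ({taxon: sequence}) into a single alignment."""
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon that was not sampled for this gene contributes only gaps.
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data: the cnidarian was not sampled for the second gene.
gene1 = {"sponge": "ACGT", "cnidarian": "ACGA", "bilaterian": "ACTA"}
gene2 = {"sponge": "GG", "bilaterian": "GC"}
matrix = supermatrix([gene1, gene2])
```

The resulting "supergene" alignment would then be handed to a standard tree inference program; the supertree approach would instead infer one tree per entry of `gene_alignments` and combine the trees afterwards.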

Box 2. Glossary

Homology/orthology/paralogy/xenology: Genes that derive from a common ancestor are called homologs. Two homologous genes are orthologous if they diverged through a speciation event. In contrast, paralogs arise by duplication of a single gene within a given lineage, whereas xenologs result from the horizontal transfer of a gene from a donor species to a recipient species (which may eventually have its original copy replaced by the xenolog).

Homoplasy/convergence: Spurious similarity that is due to convergence or reversion rather than to common descent is called homoplasy. Convergence describes the independent acquisition by separate evolutionary lineages of the same nucleotide (or amino acid) at a given position. It is a direct consequence of multiple substitutions.

Incomplete lineage sorting: The transient retention of ancestral polymorphisms across speciation events. Speciation events that are compressed in time and large breeding populations both increase the probability of this phenomenon. Considering three lineages that diverged rapidly, some sequence positions will by chance be shared between one pair, others between another pair, and still others between the third possible pair, thus blurring the phylogenetic signal on the corresponding branches.

Incongruence: Two (or more) phylogenetic trees are said to be incongruent when they display conflicting branching orders (i.e., topologies) and cannot be superimposed. This implies that at least one node (also known as a bipartition) found in one tree is not found in the other(s), where it is replaced by alternative groupings of taxa.

Model of sequence evolution: A statistical description of the process of substitution in nucleotide or amino acid sequences. Complex models approximate the evolutionary process better, but at the cost of more parameters and computation time. Since parameter-rich models need more data to behave properly, they have become truly useful with the advent of phylogenomic datasets.

Monophyly: To be considered monophyletic, a taxonomic group must satisfy two conditions: (i) all of its taxa must derive from a single ancestor and, reciprocally, (ii) all taxa deriving from this common ancestor must belong to the group.

Non-phylogenetic signal: The combination of different kinds of structured noise (e.g., undetected homoplasies) that competes with the genuine phylogenetic signal during tree reconstruction. Even though the non-phylogenetic content is partly a property of a multiple sequence alignment (related in particular to its saturation level), the non-phylogenetic signal actually inferred depends heavily on the method and the model of evolution chosen. In probabilistic methods, the non-phylogenetic signal mainly results from the data violating the model of sequence evolution. These violations arise because our models are inevitably oversimplified compared with the complexity of the natural evolutionary process. Ultimately, the apparent signal that is analysed will be a mixture of phylogenetic and non-phylogenetic signal.

Outgroup/ingroup: Almost all tree reconstruction methods produce unrooted trees, in which the inferred relationships convey no information about the direction of time. To root a tree and turn it into a phylogeny, one has to include in the analysis a group of taxa known to lie outside the group under study. This reference group is called the outgroup, while the taxa of interest make up the ingroup.

Patristic distance: The sum of the lengths of the branches that connect two nodes in a phylogenetic tree, where those nodes are typically terminal nodes representing extant taxa. It is thus an inferred distance (accounting for multiple substitutions) that is larger than the uncorrected distance computed directly from the number of differences observed between the two corresponding sequences in the alignment.
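As an illustration of the patristic distance, the sketch below sums branch lengths along the path between two nodes of a rooted tree. The tree is stored as a child-to-parent map plus the length of the branch above each node; the node names and lengths are invented for the example.

```python
def path_to_root(node, parent):
    """Nodes visited when walking from a node up to the root."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def patristic(a, b, parent, length):
    """Sum of the branch lengths on the path connecting nodes a and b."""
    ancestors_b = set(path_to_root(b, parent))
    # The first ancestor of a that is also an ancestor of b is where the paths meet.
    meet = next(n for n in path_to_root(a, parent) if n in ancestors_b)
    total = 0.0
    for start in (a, b):
        node = start
        while node != meet:
            total += length[node]    # length of the branch above `node`
            node = parent[node]
    return total


# Hypothetical tree ((A:0.1, B:0.2)n1:0.3, C:0.5)root
parent = {"root": None, "n1": "root", "A": "n1", "B": "n1", "C": "root"}
length = {"n1": 0.3, "A": 0.1, "B": 0.2, "C": 0.5}
```

For this tree the distance between A and B is 0.1 + 0.2, while A and C are separated by 0.1 + 0.3 + 0.5.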

Phylogenetic signal/synapomorphy: The substitutions that occur along a given branch of the evolutionary tree. The strength of the phylogenetic signal is proportional to the number of substitutions occurring along the branch. In non-probabilistic methods, the signal is encoded in synapomorphies, that is, shared residues (nucleotides or amino acids) at aligned positions that are specific to a set of sequences derived from a common ancestor. In probabilistic methods, the amount of phylogenetic signal actually extracted from a given dataset depends on the model and is expected to increase with the fit of the model to the data (i.e., the ability of the model to explain the data).

Phylogenetic tree: A (connected acyclic) graph that describes the inferred evolutionary relationships among a group of species. In molecular trees, branch lengths are proportional to the genetic distances (and thus, to some extent, to time) inferred from the analysis of a multiple alignment of homologous sequences (nucleotide or amino acid sequences).

Probabilistic methods: A family of methods for reconstructing trees from multiple sequence alignments that are grounded in statistical theory and make use of explicit models of sequence evolution. They include maximum likelihood and Bayesian inference approaches and are known to be the most accurate, but also the most computationally demanding.

Saturation: When sequences in a multiple alignment have undergone so many multiple substitutions that apparent distances largely underestimate the real genetic distances, the alignment is said to be saturated. Phylogenetic inference works best with datasets that are only slightly saturated. Because of their reduced state space (four possible bases), nucleotide sequences saturate faster than protein sequences (20 possible amino acids).
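The gap between apparent and real distances can be illustrated with the simplest correction available. The sketch below uses the Jukes-Cantor formula and its equal-rates analogue for 20 states; this model is chosen here only for illustration and is not one the glossary names.

```python
import math


def p_distance(seq1, seq2):
    """Uncorrected distance: proportion of aligned positions that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p, states=4):
    """Corrected distance d = -b * ln(1 - p / b), with b = (states - 1) / states.

    With states=4 this is the Jukes-Cantor formula for nucleotides; states=20
    gives the analogous equal-rates correction for amino acids.
    """
    b = (states - 1) / states
    if p >= b:
        return math.inf          # fully saturated: no finite estimate exists
    return -b * math.log(1.0 - p / b)
```

For the same observed proportion of differences, say p = 0.6, the nucleotide correction is much larger than the amino acid one, and nucleotide distances become undefined as p approaches 0.75. This is one way to see why the smaller state space saturates faster.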

Site-homogeneous/site-heterogeneous models: Most models of sequence evolution assume that the same evolutionary process operates at every position (or site) of an alignment. With such models, only the evolutionary rate can be modelled as heterogeneous across sites, usually through a gamma distribution of rates. However, selective constraints are known to be quite heterogeneous across positions, which seriously violates the hypotheses of site-homogeneous models. Site-heterogeneous models, on the other hand, assume that the evolutionary process varies widely across sites, in particular the set of acceptable amino acids (e.g., in the CAT model). A number of studies have shown that site-heterogeneous models offer a better fit to phylogenomic datasets and tend to reduce the sensitivity to tree reconstruction artefacts (e.g., LBA).

(A) Schierwater et al. [4] tree. (B) Dunn et al. [2] tree. (C) Philippe et al. [3] tree. Numbers in parentheses after taxon names indicate the number of species included in the dataset for the corresponding taxon. Bootstrap support values above 90% are indicated by a bullet (for nodes) or by underlining (for terminal taxa). It is worth mentioning that the monophyly of Porifera is not unequivocally accepted [28],[46]; only the analysis of 30,000 positions with a rich taxon sampling and a complex model of evolution recovers it with significant statistical support [3]. Although such a faint phylogenetic signal will require harnessing the full potential of phylogenomics to be resolved with confidence, this question is beyond the scope of this study. Simplified drawings (redrawn from [74]) at the bottom illustrate the great morphological disparity that exists among the five terminal taxa. Porifera corresponds to sponges; Cnidaria to sea anemones, jellyfish, and allies; Ctenophora to comb jellies; and Bilateria to all other animals (characterized by their bilateral symmetry) except Trichoplax (Placozoa), which morphologically appears to be the most simply organized animal phylum.

Here, taking these three studies as an example, we discuss pitfalls that the simple addition of sequences cannot avoid, and show how the observed incongruence can be largely overcome and how improved bioinformatics methods can help to reveal the full potential of phylogenomics.


Materials and Methods

Code Availability

SaRTree is software written in Perl and freely available under the GPL v3.0 license. The latest version and the supporting files, including a simple example used in this study, can be downloaded from https://github.com/DalongHu/SaRTree (last accessed October 31, 2019). The source-code scripts for the programs and modules are available, but may need to be adjusted to suit the configuration of the computer being used.

The steps and algorithms involved in the SaRTree pipeline, which implements the six "living trees" modules

A: Raw data processing module. Section A illustrates the pretreatment of raw data, which includes assembled sequences, both from the assembly of raw NGS reads and downloaded directly from online databases such as NCBI, as well as the input of a closed (circularized) complete reference genome. SaRTree is designed for comparing assembled genome sequences, but will work with shorter sequences. The raw data are compared with the reference genome, with SNP-calling files as the end result.

A1: Compare strain sequences with the reference genome. Contigs of the assembled sequences are mapped to the complete genome with alignment software; progressiveMauve v2.3.1 (Darling et al. 2010) is recommended here. Users can easily revise the commands to use different parameters or different software to suit a specific use.

A2: Coverage pruning. Using a cutoff (usually based on a rough estimate of the proportion of core genes in the genome), strains with low coverage relative to the reference are sent to module E, while those with high coverage are sent to module B to take part in building the main tree.

B: SaRTree core module. The programs and tools in section B play the central role in the pipeline: SNP information is compiled, transformed, analysed, and finally refined into a "real" mutation list, a recombination list, and an SNP alignment sequence as input for section D.

B1: Combination and transformation of SNP files. One SNP list file is produced by combining the SNP files for all strains, so that it includes all SNP locations and base differences relative to the reference genome.

B2: Determine the SNP distribution pattern among the strains. For each locus with SNPs, strains with the same base type are grouped under a marker (such as "A" or "B"); the markers are then strung together in a default order of strains to represent a distribution pattern that is equivalent to a putative branch on the tree. All SNPs are then separated by distribution pattern, because SNPs within one recombination event show a consistent pattern.

B3: Make the inter-SNP distance list. SNPs with the same distribution pattern will be on the same branch, and the inter-SNP distance is determined for each pair of adjacent SNPs on that putative branch.
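Steps B2 and B3 can be sketched as follows. This is not SaRTree's Perl code, only a Python illustration of the idea; the input format (one string of bases per SNP position, in a fixed strain order) and the toy data are assumptions.

```python
from collections import defaultdict


def distribution_pattern(bases):
    """Label strains by base in a fixed strain order, e.g. 'AAGA' -> 'AABA'."""
    labels = {}
    pattern = []
    for base in bases:
        if base not in labels:
            labels[base] = chr(ord("A") + len(labels))   # first base seen is 'A', next 'B', ...
        pattern.append(labels[base])
    return "".join(pattern)


def inter_snp_distances(snps):
    """snps: {position: bases per strain}. Returns {pattern: [distances between adjacent SNPs]}."""
    positions_by_pattern = defaultdict(list)
    for pos in sorted(snps):
        positions_by_pattern[distribution_pattern(snps[pos])].append(pos)
    return {
        pattern: [b - a for a, b in zip(positions, positions[1:])]
        for pattern, positions in positions_by_pattern.items()
    }


# Toy data: four strains; three SNPs single out strain 3, one strain 4, one strain 2.
snps = {100: "AAGA", 130: "CCTC", 135: "TTCT", 900: "GGGA", 5000: "ACAA"}
distances = inter_snp_distances(snps)
```

SNPs at positions 100, 130 and 135 share the pattern `AABA` even though the actual bases differ, so they end up on the same putative branch with inter-SNP distances 30 and 5.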

B4: Make a list of recombination events. Detecting recombination events from the distance lists is the core component of the SaRTree core section; it involves sending data to section C and retrieving the recombination list as output.

B5: Get "real" mutations. The SNPs in recombination regions are removed to obtain the "real" mutation list.

B6: Develop an SNP alignment sequence for tree building. Stringing together the bases at SNP locations for each sample and for the reference genome produces an SNP alignment sequence file, which is used to build the phylogenetic tree in section D.
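A minimal sketch of steps B5 and B6 follows, assuming that all genomes are already expressed in reference coordinates (0-based here) and that recombination regions arrive as inclusive `(start, end)` pairs. Both assumptions and all names are illustrative, not taken from SaRTree.

```python
def real_mutations(snp_positions, recombination_regions):
    """Drop every SNP that falls inside an inclusive (start, end) recombination region."""
    return [
        pos
        for pos in snp_positions
        if not any(start <= pos <= end for start, end in recombination_regions)
    ]


def snp_alignment(genomes, positions):
    """String together the bases at the given positions for every genome."""
    return {name: "".join(seq[p] for p in positions) for name, seq in genomes.items()}


# Toy genomes in reference coordinates
genomes = {
    "reference": "ACGTACGTAC",
    "strain1":   "ACCTACGTTC",
    "strain2":   "ATGTACGTTC",
}
snps = [1, 2, 8]                              # positions where any strain differs
kept = real_mutations(snps, [(0, 1)])         # position 1 lies in a recombination region
alignment = snp_alignment(genomes, kept)
```

The resulting dictionary holds one short sequence per genome and could be written out as a FASTA file for the tree-building step.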

C: RecDetect module. This process, called RecDetect, is based on statistical testing and is an independent component of SaRTree. Using a statistical model whose only assumption is that mutations are randomly distributed, recombination events are estimated from the inter-SNP distance list generated from an SNP location list in B3. RecDetect works well with thousands of sequences and is easy to implement on distributed computing systems. It balances speed and accuracy and is the default program in SaRTree. RecDetect is good for high-resolution work with few strains that will later be checked by hand, is robust with thousands of strains, and suits genomes from only one lineage (supplementary fig. 3, Supplementary Material online) as well as genomes with very high recombination rates. Users can manually replace RecDetect with any other third-party recombination detection program by revising the pipeline script.

C1: Sorting and initial testing of distances. After the distances between SNPs have been sorted by length and the cutoff has been initialized to zero, the distance list is passed to the first Kolmogorov-Smirnov test to evaluate how well an exponential distribution matches the input distances.

C2 and C3: Iteration step with a non-fitting result. If the test in C1 rejects the hypothesis that the distances follow an exponential distribution (a non-fitting result), the cutoff is increased by 10 (the default) and the SNP location list is filtered again by deleting the SNPs that lie less than 10 bases from their neighbours. The renewed SNP location list is then used to recalculate the distance list, which is retested as in C1.

C4: Finalizing step with a fitting result. If the test shows that the distances fit an exponential distribution, locations with distances smaller than the final cutoff (cf) are assigned to recombination regions.

C5: Calculating the boundaries of recombination events. The boundaries of these recombination events are extended to estimate the length of the incorporated DNA fragments, by adding on each side half of the mean distance between SNPs in the incorporated DNA.

C6: Output of the recombination region list. The last step in section C is to send the recombination list back to section B.
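The loop in C1 to C5 can be sketched roughly as below. This is an illustration, not the RecDetect implementation: the Kolmogorov-Smirnov test is hand-rolled, uses the large-sample 5% critical value 1.36/sqrt(n), and fits the exponential with the sample mean, all of which are simplifying assumptions. Function names, the `max_cutoff` guard, and the test data are hypothetical.

```python
import math


def gaps(positions):
    """Distances between adjacent (sorted) positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


def ks_exponential(distances):
    """KS statistic between the sample and an exponential with the sample mean."""
    xs = sorted(distances)
    n = len(xs)
    mean = sum(xs) / n
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x / mean)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def fits_exponential(distances, coeff=1.36):
    # coeff / sqrt(n): approximate 5% critical value (assumption, see lead-in)
    return ks_exponential(distances) <= coeff / math.sqrt(len(distances))


def rec_detect(positions, step=10, max_cutoff=1000):
    """Return (cutoff, clustered SNPs) following the C1-C4 iteration."""
    positions = sorted(set(positions))
    cutoff, kept = 0, positions
    while cutoff < max_cutoff:
        d = gaps(kept)
        if len(d) < 2 or fits_exponential(d):
            break                                   # C4: fitting result
        cutoff += step                              # C2/C3: raise cutoff, re-filter
        kept = [
            p for i, p in enumerate(positions)
            if not ((i > 0 and p - positions[i - 1] < cutoff)
                    or (i + 1 < len(positions) and positions[i + 1] - p < cutoff))
        ]
    kept_set = set(kept)
    return cutoff, [p for p in positions if p not in kept_set]


def region_bounds(cluster):
    """C5: extend both ends by half the mean inter-SNP distance (needs >= 2 sorted SNPs)."""
    half = sum(gaps(cluster)) / (len(cluster) - 1) / 2
    return cluster[0] - half, cluster[-1] + half


# Made-up data: background SNPs at exponential quantiles (mean about 1000 bp) ...
background_gaps = [round(-1000 * math.log(1 - (i + 0.5) / 40)) for i in range(40)]
positions, pos = [], 0
for g in background_gaps:
    pos += g
    positions.append(pos)
# ... plus a dense run of 15 SNPs, 3 bp apart, inside the largest gap.
cluster = [positions[-2] + 2000 + 3 * k for k in range(15)]
cutoff, clustered = rec_detect(positions + cluster)
```

In this constructed example the first test fails because of the dense run, one cutoff increment removes it, and the remaining background distances pass the test, so the dense run is reported as a putative recombination region.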

D: Tree and divergence dates module. This module uses the "real" mutation list and the revised recombination list to build an accurate phylogenetic tree, plus divergence dates if an isolation date list is available.

D1: Build the target tree. Using third-party software (RAxML version 8 is recommended), a maximum likelihood tree is built as the target phylogenetic tree.

D2: Root the temporal tree. The phylogenetic tree can be rooted by TreeRooter (Module F) based on the nomination of an appropriate outgroup sequence, or manually using a user-defined strategy.

D3: Divergence date estimation (optional). Using the third-party software BEAST with the isolation date list and the SNP alignment sequence, an MCMC run produces a temporary file with the suffix ".trees", which records the divergence dates. BEAST can only give meaningful results if the isolation dates are sufficiently spread out relative to the depth of the branches (Comas et al. 2013). It cannot give useful results if only the terminal branches are long, or if the isolation dates cover a short period (Comas et al. 2013). BEAST has many parameters that will need to be set.

D4: Mark divergence dates on the target tree. The temporary trees file is annotated using the third-party software TreeAnnotator, which is included with BEAST (Drummond et al. 2012). The output tree is already formatted by date.

D5: Mark evolutionary details on the phylogenetic tree. The numbers and locations of the "real" mutations and recombination events are mapped back onto the final tree with divergence dates. This BioPerl script is included in SaRTree for manual use after the SaRTree pipeline has completed.

E: StrainLocater-module. 'n Eenvoudige algoritme word geïmplementeer om nuwe stamme op 'n bestaande SaRTree-gegenereerde boom op te spoor. StrainLocater gebruik evolusionêre gebeure wat aan takke toegeken is as die enigste basis vir ligging en laat stroomaf studies toe gebaseer op die SaRTree-pyplyn.

E1: Voorbehandeling van nuwe monsterreekse. Nuwe voorbeeldreekse moet aanvanklik deur Module A binne SaRTree verwerk word om 'n geformateerde karteringlys as die invoer van StrainLocater te kry.

E2: Vergelyking met bestaande mutasielys. 'n Besparende metode word onderneem om te bepaal watter van die basisse op die teikenvertakking ook op die navraagvolgorde is, en te bepaal watter voor die nuwe nodus geleë sal wees, en watter daarna. Dit word geïmplementeer deur elke navraagvolgorde-karteringlys te vergelyk met die bestaande mutasielys wat deur SaRTree geskep is vir 'n teikenboom, 'n vals mutasielys word gebou om te wys aan watter tak die navraagstamme vir elke mutasiepunt behoort.

E3: Tallying mutations for each query. Using a scoring matrix and the target tree file, the numbers of present, absent, and gap states of the query strain at each mutation site are summed and converted into a score that represents the fit to each branch.

E4: Calculating final branch scores and assigning the query to the best branch. Using the scores obtained in E3, and ignoring branches with zero or negative scores, the final score of each branch is calculated as the sum of its initial score plus all ancestral branch scores on the path to the root. The branch with the highest final score is then chosen as the candidate. The final result branch is whichever of the candidate branch and its two descendants generates the fewest homoplastic SNPs.
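The scoring in E4 can be sketched as follows. This is an illustration only, under stated assumptions: the tree is given as a hypothetical `parent` dictionary (branch name to parent branch, `None` at the root), branches with zero or negative initial scores are dropped as candidates but still contribute their score when they are ancestors, and the final homoplasy check among the candidate and its two descendants is not modelled.

```python
def final_scores(parent, initial):
    """Final score = initial score + scores of all ancestral branches up to the root.

    parent:  dict mapping branch -> parent branch (None for the root branch).
    initial: dict mapping branch -> initial score from the previous step.
    Branches whose initial score is zero or negative are not candidates.
    """
    finals = {}
    for branch, score in initial.items():
        if score <= 0:
            continue                          # ignored as a candidate
        total = score
        node = parent.get(branch)
        while node is not None:               # walk up towards the root
            total += initial.get(node, 0)
            node = parent.get(node)
        finals[branch] = total
    return finals


def best_candidate(parent, initial):
    """Return the branch with the highest final score, or None if none qualify."""
    finals = final_scores(parent, initial)
    return max(finals, key=finals.get) if finals else None
```

With `parent = {"b1": None, "b2": "b1", "b3": "b1", "b4": "b2"}` and initial scores `{"b1": 2, "b2": 3, "b3": -1, "b4": 1}`, the final scores are 2, 5 and 6 for b1, b2 and b4, so b4 is the candidate.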

F: TreeRooter module. The StrainLocater function can be used to locate an appropriate outgroup on an existing tree in order to root it. The traditional way to root a tree is to run the tree program with the outgroup strain included. This always reduces the amount of shared sequence available for generating the tree. When no outgroup sequences are available, or in other special situations, users can root the tree manually or use third-party software such as pplacer (Matsen et al. 2010) or the EPA algorithm (Berger et al. 2011) recommended by RAxML (Stamatakis 2014).

F1: Pretreatment of outgroup sequences. The outgroup sequence or sequences must be initially processed by Module A within SaRTree to get a formatted mapping list as the input for TreeRooter.

F2: Locating outgroup onto unrooted tree. The outgroup mapping list created in the last step and the target unrooted tree generated in section D with its mutation list generated in section B are used as input for StrainLocater to locate an outgroup strain onto a branch of the target tree.

F3: Weighting new branch lengths. After the outgroup has been located onto the main tree to give a new rooted tree, the branch that receives the outgroup is split into two new branches at the attachment point. The two new branch lengths are weighted by the proportion of mutation events on each.
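The weighting in F3 amounts to a proportional split. A minimal sketch, assuming the event counts on each side of the attachment point are already known; the function name and the even split used when there are no events at all are assumptions, not documented TreeRooter behaviour.

```python
def split_branch_length(length, events_first, events_second):
    """Split one branch length in proportion to the mutation events on each part.

    length:        length of the original branch.
    events_first:  number of mutation events on the first new branch.
    events_second: number of mutation events on the second new branch.
    """
    total = events_first + events_second
    if total == 0:
        # Assumption: with no events to weight by, split the branch evenly.
        return length / 2, length / 2
    first = length * events_first / total
    return first, length - first
```

A branch of length 8.0 with 3 events on one side and 1 on the other is split into 6.0 and 2.0.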

Phylogenetic Reconstructions and Comparative Genomic Analysis for the Examples

All the examples in this study were processed on the High Performance Computing system at The University of Sydney, a cluster based on Dell PowerEdge R630 servers with Intel Xeon E5-2680 V3 CPUs (2.6 GHz) and Dell PowerEdge C6320 servers with Intel Xeon E5-2697A-V4 CPUs (2.6 GHz). The general profile of all the trees built in this study is shown in supplementary table 4, Supplementary Material online, and the original files of those trees can be found at https://figshare.com/s/ac165d520410c994f587 (last accessed October 31, 2019).

For the example run shown in supplementary figure 2, Supplementary Material online, which was generated directly by FigTree v1.4.3 (Drummond et al. 2012), we show the simplest default run and its raw output for eight A. baumannii global clone II strains randomly selected from the NCBI RefSeq database (O’Leary et al. 2016). Strain ACICU was used as the reference and global clone I strain 307-0294 was set as the outgroup. As a quick-start demonstration, the RecDetect parameter in the configuration file for this example was set to “-f -t -x 40000” to select the simplest “fast” and “strict” algorithm.

For the A. baumannii Global Clone I phylogenetic tree, we downloaded sequences of all strains used in the study by Holt et al. (2016) and built a SaRTree tree, using strain 1656-2 as the outgroup to root the tree with TreeRooter. Strains TG19582 and 307-0294, which were described in the previous study as being of low quality or uncertain placement, were excluded manually in the first run and then located onto the tree using StrainLocater. The tree is displayed with FigTree v1.4.3 in figure 3. For the serotypes of the strains, note that there are at least 3 naming systems for these oligosaccharides, and we have retained the one used in the original paper to avoid confusion.

For the second example, we downloaded 2,003 A. baumannii-calcoaceticus complex genome DNA sequences from the NCBI RefSeq database (O’Leary et al. 2016). We then ran the standard SaRTree pipeline with strain ACICU as the reference (Snitkin et al. 2011), applying a 90% cutoff value that filtered out 725 low-coverage strains; the remaining 1,278 strains were input to the SaRTree main script. Owing to our lack of knowledge of the origins of the A. baumannii major clones, the output tree was rooted manually after comparison with some published trees (Wallace et al. 2016). The tree shown in supplementary figure 3, Supplementary Material online, is displayed with GraPhlAn v0.9.7 (Asnicar et al. 2015). The MLST analysis follows the Pasteur protocol (Diancourt et al. 2010). The 7 marker genes were extracted from the 2,003 genome sequences with blast+ v2.2.26 (Camacho et al. 2009) and compared with the database at https://pubmlst.org/abaumannii/. The MLST result is shown in supplementary table 1, Supplementary Material online. Comparative genomic analysis of antibiotic resistance genes is based on the CARD database (Jia et al. 2017): nucleotide sequences of antibiotic resistance genes were used in a Blast search with “-e 1e-100” as the threshold, and hits with coverage above 50% were recorded as antibiotic resistance genes. The result is also displayed with GraPhlAn v0.9.7 in supplementary figure 3, Supplementary Material online.
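The two thresholds used for the resistance-gene search (an e-value cutoff of 1e-100 and coverage above 50%) can be expressed as a small filter. This is a sketch under assumptions, not the authors' script: each hit is a hypothetical dictionary, the gene names are placeholders, and coverage is assumed to mean aligned length divided by the length of the resistance gene, which the text does not define.

```python
def filter_resistance_hits(hits, max_evalue=1e-100, min_coverage=0.5):
    """Return the gene names of hits that pass both thresholds.

    hits: list of dicts with keys "gene", "evalue", "align_len", "gene_len"
          (hypothetical layout; coverage = align_len / gene_len is an assumption).
    """
    kept = []
    for hit in hits:
        coverage = hit["align_len"] / hit["gene_len"]
        if hit["evalue"] <= max_evalue and coverage > min_coverage:
            kept.append(hit["gene"])
    return kept
```

A hit with e-value 1e-150 covering 90% of the gene is kept, while one with e-value 1e-50, or one covering only 40% of the gene, is dropped.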

Based on the tree with 1,278 strains, 73 Global Clone II related representatives were then selected for a second SaRTree run to generate the high-resolution phylogenetic tree shown in figure 2. All had a known date of isolation. We used the same ACICU strain as the reference and chose ST25 strains XH857 and AB-HKU3-08 as the outgroup, rooting the tree with the manual rooting option “-r man” in SaRTree. BEAST v1.10.4 (Drummond et al. 2012) was used to estimate the divergence dates of the branches under a relaxed molecular clock, with a coalescent constant population size and a general time-reversible model with gamma correction; these were selected as the best models based on effective sample size compared with other combinations of models. The first 1,000,000 of the 10,000,000 MCMC steps were removed as burn-in. The final tree with divergence dates was built by TreeAnnotator within the BEAST package. The remaining 1,097 Global Clone II related strains were then located onto this tree by a third run of SaRTree with the “-l” setting and the “-m formatted” module to save computing time. A total of 460 strains could not be located onto specific branches because of missing sequences or low quality, and two ST215 strains, T271 and AB-HKU3-10, were located outside the tree. The other 635 strains were located successfully. The final result in supplementary figure 4, Supplementary Material online, combines information on antibiotic resistance genes and isolation details.

For the example shown in figure 4 and supplementary figure 5, Supplementary Material online, we downloaded the 411 available E. coli and Shigella complete genomes from the NCBI RefSeq database (O’Leary et al. 2016) and selected 351 strains with good background information for a SaRTree run with the “-e both” setting, which filters out low-quality SNP calls arising from the unknown genomic diversity in this set of sequences. Genotype grouping was done by adding strains onto the tree described by Clermont et al. (2013), which defines the 7 groups based on 13 homologous genes. Strains located onto existing branches for those 7 groups were assigned to the same groups. Strains on none of those 7 branches were treated as new groups; some strains lacked some of the 13 genes and could not be identified by this method. We then selected 29 strains from the 351-strain tree, plus some Shigella strains from groups not on the tree, to rebuild an accurate tree as shown in figure 2. The same parameters, reference genome, and outgroup were set for the 29-strain tree, and all 11,162 available assembled E. coli and Shigella genomes in the NCBI GenBank database were located onto the 29-strain tree by a StrainLocater script modified to optimize running speed and memory use on the specific computing system, given the large number of input strains. The raw result is shown in supplementary table 5, Supplementary Material online. The grouping results by the traditional method for the 11,162 strains were generated in the same way as for the 351-strain tree. The grouping result for Shigella strains is based on their molecular serotyping (The et al. 2016), using blast+ v2.2.26 (Camacho et al. 2009) to search their wzx and wzy genes against a set of standard wzx and wzy genes for each serotype described previously (Liu et al. 2008). A comparison of the grouping results from the traditional method and the StrainLocater method is shown in supplementary table 3, Supplementary Material online.

For the V. cholerae example shown in figure 5, the raw sequencing data (supplementary table 1, Supplementary Material online) were downloaded from the NCBI SRA database (Kodama et al. 2012) and then assembled by SPAdes v3.10 with the “--careful” setting (Bankevich et al. 2012). A SaRTree pipeline was then run with “-l” and defaults for the other settings to load a standard StrainLocater module. A published tree with its mutation list from our previous study (Hu et al. 2016) was used as the target tree. Figure 5 was generated by FigTree v1.4.3 (Drummond et al. 2012) and then decorated manually.


Estimating relatedness

Cladograms can be constructed with the aid of technologies that estimate molecular divergences in key sequences of DNA or protein amino acids. Similar to the progress seen in estimating the age of organic substances with the use of radioactive decay technologies and carbon dating, the advent of molecular biological technologies in the latter half of the 20th century has increasingly allowed scientists to estimate the degree of evolutionary relatedness more accurately at the genetic level. Taking two homologous DNA sequences in different species, one can estimate evolutionary distance by measuring the number of nucleotide substitutions that have occurred over time. Alternatively, using protein products of DNA expression, one can measure the number of amino acid substitutions that have occurred between homologous protein sequences.
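As a concrete illustration of counting substitutions, the sketch below computes the proportion of differing sites (the p-distance) between two aligned sequences and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which accounts for multiple substitutions at the same site. The Jukes-Cantor model is one common choice brought in here for illustration; the text above does not name a specific model, and the example sequences are made up.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]          # skip alignment gaps
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For the made-up pair `ACGTACGTAC` and `ACGTTCGTAA`, two of ten sites differ, so the p-distance is 0.2 and the corrected distance is about 0.233 substitutions per site.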


Free Response

How does a phylogenetic tree relate to the passing of time?

The phylogenetic tree shows the order in which evolutionary events took place and in what order certain characteristics and organisms evolved in relation to others. It does not relate to time.

Some organisms that appear very closely related on a phylogenetic tree may not actually be closely related. Why is this?

In most cases, organisms that appear closely related actually are; however, there are cases where organisms evolved through convergence and appear closely related but are not.


METHODS

This investigation was conducted during the second course of a two-course introductory biology series for science majors at a large, public university with very high research activity (Carnegie Foundation, 2013) in the midwestern United States. The large-enrollment course (n = 88) served students pursuing a number of majors (Table 2) at various stages in their academic careers (24% freshmen, 33% sophomores, 18% juniors, and 25% seniors). The first course in the introductory series focused on cell biology and included little or no exposure to phylogenetic trees. Although recommended, completion of the first course was not a prerequisite for the second course.

Table 2. Course enrollment by major group

Course Context

The instructor used a learner-centered approach to teaching biology, in which multiple forms of active engagement were used in place of passive lectures. Course activities included letter card questions (Freeman et al., 2007), collaborative learning groups (Smith, 2000; Tanner et al., 2003), small-group and whole-class discussions, think–pair–share sessions (Lyman, 1981), and case studies (Herreid, 1994). Model-based instruction (Hestenes, 1987; Hmelo et al., 2000; Brewe, 2008; Liu and Hmelo-Silver, 2009) was a prominent pedagogical strategy, as students frequently constructed box-and-arrow models of complex biological processes, such as evolution, nutrient cycles, and energy flow through ecosystems. Students worked in permanent, self-selected groups of three or four individuals on nearly all aspects of the course, including pyramid exams (Eaton, 2009) with individual and group components (75 and 25% of points, respectively). Learning objectives, instruction, and assessments largely targeted higher-order cognitive skills of analysis, synthesis, and evaluation (Bloom et al., 1956; Crowe et al., 2008; Momsen et al., 2010, 2013).

The introductory biology course included three primary units: evolution, form and function, and ecology (Figure 2). Although most prominent during the evolution unit, phylogenetic trees were used throughout the course when appropriate. For example, phylogenetic trees appeared in the form and function unit to help students visualize and reason about evolved traits required for plant survival on land.

Figure 2. Timeline of primary course units and data collection from assessments.

Instruction and Data Collection

Two homework assignments and two exams were the data sources for this study (Figure 2). The initial phylogenetic tree homework was completed in groups soon after phylogenetic trees were introduced as part of the evolution unit. The introduction consisted of a series of questions posed by the instructor and answered by students using letter cards. The questions familiarized students with structural characteristics of phylogenetic trees, such as nodes (which represent common ancestors) and monophyletic groups, and presented the idea that taxa relatedness is determined by common ancestry. Letter card questions were followed by small-group and whole-class discussions until the entire class established the correct answer using appropriate reasoning. All phylogenetic tree questions used during class and for assessments referred to cladograms, in which only branching patterns have meaning. Chronograms (which show absolute time) and phylograms (which show amount of change) were briefly mentioned by the instructor, but students were never required to interact with or reason from them during the course (for further descriptions of phylogenetic tree types, see Baum and Offner, 2008; Omland et al., 2008).

The initial phylogenetic tree homework featured a short series of open-ended questions designed around a phylogenetic tree of chordates. In addition to prompts about recent common ancestors, synapomorphies, and monophyletic groups, one question regarding taxa relatedness appeared on the group homework (Figure 3). Poor group performance for this question compelled the instructor to revisit phylogenetic tree interpretations during class. The question was presented to students again and debated through directed, small-group discussions. A subsequent whole-class discussion acknowledged most recent common ancestry as an appropriate reasoning strategy for determining taxa relatedness on phylogenetic trees. After the initial homework was revisited during class, taxa relatedness was specifically targeted through two additional letter card questions. Instruction specific to phylogenetic trees and evolutionary relatedness occurred across three consecutive course meetings, ending in week 5. We therefore include each student's average attendance across these 3 d in subsequent analysis as a reflection of the potential impact of instruction on student reasoning with phylogenetic trees.
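Reasoning by most recent common ancestry can be made concrete with a short sketch. The cladogram below is a toy example, not the tree from Figure 3, and the representation (a `parent` dictionary with hypothetical internal node names) is an assumption for illustration. Of two taxa, the one whose most recent common ancestor with the focal taxon lies further from the root is the closer relative; if both share the same ancestor with the focal taxon, they are equally related to it.

```python
def ancestors(parent, node):
    """Return the path from a node up to the root, starting with the node itself."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def mrca(parent, a, b):
    """Most recent common ancestor of taxa a and b."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("taxa are not in the same tree")


def closer_relative(parent, focal, a, b):
    """Return whichever of a or b is more closely related to focal, or None if tied."""
    depth_a = len(ancestors(parent, mrca(parent, focal, a)))
    depth_b = len(ancestors(parent, mrca(parent, focal, b)))
    if depth_a == depth_b:
        return None                     # equally related to the focal taxon
    return a if depth_a > depth_b else b


# Toy cladogram: (lamprey, (shark, (frog, (lizard, mouse))))
toy_parent = {
    "root": None, "lamprey": "root", "n1": "root",
    "shark": "n1", "n2": "n1",
    "frog": "n2", "n3": "n2",
    "lizard": "n3", "mouse": "n3",
}
```

On this toy tree, `closer_relative(toy_parent, "frog", "shark", "mouse")` returns `"mouse"`, while `closer_relative(toy_parent, "frog", "lizard", "mouse")` returns `None` because the frog shares the same most recent common ancestor with both.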

Figure 3. Phylogenetic tree and taxa-relatedness question from the initial homework.

Phylogenetic trees and taxa-relatedness questions similar to the initial homework were placed on three subsequent assessments, which followed the end of instruction by 1, 10, and 12 wk, respectively (Figure 2). Such prompts were included on both the individual and group components of the evolution unit exam, in which students completed the individual component before the group component (Supplemental Figures S1 and S2). A phylogenetic tree was provided for the individual component, but the group component required students to construct a phylogenetic tree from data before answering a taxa-relatedness question. Students were never asked to construct phylogenetic trees before completing the evolution unit exam. A phylogenetic tree and taxa-relatedness questions were also placed on the review homework 2 wk before the final exam (Figure S3) and on the individual component of the final exam (Figure S4). The prompt structure for the review homework and final exam was changed slightly from a two-choice prompt with open-ended reasoning to a four-choice prompt with open-ended reasoning. This alteration was made for several reasons. First, students had seen several taxa-relatedness questions throughout the semester; to avoid retest concerns, we created prompts that were familiar to students but offered a somewhat new opportunity to interpret relatedness. Second, the multiple-choice foils prevented students from feeling obligated to select one taxon or the other, providing students with the option to identify taxa as equally related or unrelated. In both the review homework and final exam, the taxa involved were equally related. The phylogenetic tree on the final exam was also the only phylogenetic tree used as part of this investigation that did not include labeled synapomorphies.

Rubric Development and Coding

The initial rubric for coding student responses to taxa-relatedness questions was developed using a grounded theory approach (Glaser and Strauss, 1967). This reflected the nature of the project as developing in real time in response to classroom experiences and student learning difficulties.

Existing literature on phylogenetic tree interpretations (Table 1) was then used to confirm and refine some categories for the final rubric (Supplemental Material) and to identify two new reasoning strategies. Specifically, we found evidence that students determine relatedness by counting synapomorphies (taxa relatedness is determined by counting synapomorphies between the taxa on phylogenetic trees) and by using negation reasoning (reasoning includes descriptions of how not to interpret taxa relatedness on phylogenetic trees; in all cases, this reasoning occurs concurrently with other reasoning; see the Supplemental Material). In addition, we found evidence of students using monophyletic grouping (taxa in the same monophyletic group are more closely related to each other than to a taxon outside the monophyletic group) to reason about relatedness. While some research has identified monophyletic grouping as a possible reasoning approach, no one has provided evidence to show that students actually use monophyletic grouping.

For training the raters, all responses from the initial homework and both components of the evolution unit exam were numbered, and a random number generator was used to select 20 initial responses (15% of the total at the time). Two independent raters coded the initial responses and reached consensus through discussion. Following rubric calibration, agreement between the two raters was 94% for the remaining 258 responses from all four assessments, and disagreements were resolved through discussion. Student responses often included more than one form of reasoning and consequently fell into multiple rubric categories, resulting in 360 total reasoning codes assigned to 278 group and individual responses. Coding was partially blind, in which one rater was aware of group and individual identities while the second rater was not. Due to high agreement between independent raters, we do not believe rater bias was a significant issue for this investigation.
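The 94% figure is a simple percent agreement between two raters. The paper does not state exactly how a match was counted when a response carried several codes, so the sketch below assumes the strictest reading: two raters agree on a response only if they assigned exactly the same set of rubric codes. The code labels in the example are placeholders.

```python
def percent_agreement(codes_rater_1, codes_rater_2):
    """Percentage of responses to which both raters assigned identical code sets.

    Each argument is a list with one entry per response; each entry is an
    iterable of rubric codes (a response may carry more than one code).
    """
    if len(codes_rater_1) != len(codes_rater_2) or not codes_rater_1:
        raise ValueError("both raters must code the same non-empty set of responses")
    matches = sum(1 for a, b in zip(codes_rater_1, codes_rater_2)
                  if set(a) == set(b))
    return 100.0 * matches / len(codes_rater_1)
```

For instance, if the raters assign identical code sets to three of four responses, the agreement is 75.0%.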

The taxa-relatedness questions used throughout the course required students to choose an answer and provide reasoning for their selection. Because answers selected by students were not always consistent with their reasoning, responses were coded again for answer (correct or incorrect) and reasoning used to support the answer (correct, incorrect, or mixed, i.e., a mix of correct and incorrect reasoning). The categories of most recent common ancestry and monophyletic grouping were considered correct reasoning, while negation reasoning always appeared with other forms of reasoning and was considered neither correct nor incorrect. All other rubric categories were deemed incorrect reasoning for taxa relatedness. This coding procedure identified students who guessed correct answers (correct answer with incorrect reasoning), and students who memorized correct reasoning without understanding its application (incorrect answer with correct reasoning). Only responses with both correct answers and correct reasoning demonstrated understanding of taxa relatedness on phylogenetic trees.
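The second coding pass pairs each answer with its reasoning. A minimal sketch of that decision table, using labels taken from the paragraph above; the function name, the returned strings, and the catch-all "unclassified" label for combinations the text does not interpret (for example, mixed reasoning) are assumptions.

```python
def classify_response(answer_correct, reasoning):
    """Interpret one response from its answer and its reasoning category.

    answer_correct: True if the selected answer was correct.
    reasoning:      "correct", "incorrect", or "mixed".
    """
    if reasoning not in {"correct", "incorrect", "mixed"}:
        raise ValueError("unknown reasoning category: " + repr(reasoning))
    if answer_correct and reasoning == "correct":
        return "understanding"      # correct answer with correct reasoning
    if answer_correct and reasoning == "incorrect":
        return "guessed"            # correct answer with incorrect reasoning
    if not answer_correct and reasoning == "correct":
        return "memorized"          # incorrect answer with correct reasoning
    return "unclassified"           # combinations the text does not interpret
```

Only the first case counts as demonstrated understanding of taxa relatedness.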

Statistical Analyses

Following the suggestion of Theobald and Freeman (2014), we constructed statistical models to test various hypotheses regarding student reasoning about phylogenetic trees. To assess hypotheses related to reasoning and answer selection, we constructed statistical models that accounted for variables affecting reasoning and answer selection. In addition, random effects were used to capture repeated measurements on the same groups and individuals on multiple assessments. Specifically, mixed-effect ordinal logistic-regression models were used to analyze taxa-relatedness reasoning, while mixed-effect logistic-regression models were used to analyze correct answers. For group reasoning, group assignment was modeled as a random effect, and assessment was a fixed effect. For individual reasoning, student was modeled as a random effect, while assessment, class attendance, year in school, and academic major were fixed effects. For group correctness, group assignment was modeled as a random effect, and assessment and reasoning (correct, incorrect, or mixed) were fixed effects. For individual correctness, student was modeled as a random effect, while reasoning, assessment, class attendance, year in school, and academic major were fixed effects. F-tests were used to determine significance of batches of explanatory variables (e.g., major), while t tests were used to determine significance of individual explanatory variables. Additional details of the statistical analyses (e.g., odds ratios) are available in the Supplemental Material.


Continental drift over geologic time helps explain species distributions

Over geologic time, not only have species diversity and composition changed, but the locations of the continents themselves have also shifted. Continental drift is the very gradual movement, assembly, and rifting of the crustal plates and their associated continents. This process means that when a taxon arose millions of years ago, it probably lived in a different location with respect to the equator and poles, and in a location that may have been physically connected with what are now separate continents. The short animation below shows the projected movement of continents, based on evidence from the magnetic rock record and other geological clues. As you view it, consider a specific group, such as reptiles, and when they arose and flourished globally.


'Tree of life' for 2.3 million species released

This circular family tree of Earth's lifeforms is considered a first draft of the 3.5-billion-year history of how life evolved and diverged. Credit: opentreeoflife.org

A first draft of the "tree of life" for the roughly 2.3 million named species of animals, plants, fungi and microbes—from platypuses to puffballs—has been released.

A collaborative effort among eleven institutions, the tree depicts the relationships among living things as they diverged from one another over time, tracing back to the beginning of life on Earth more than 3.5 billion years ago.

Tens of thousands of smaller trees have been published over the years for select branches of the tree of life (some containing upwards of 100,000 species), but this is the first time those results have been combined into a single tree that encompasses all of life. The end result is a digital resource that is available free online for anyone to use or edit, much like a "Wikipedia" for evolutionary trees.

"This is the first real attempt to connect the dots and put it all together," said principal investigator Karen Cranston of Duke University. "Think of it as Version 1.0."

The current version of the tree—along with the underlying data and source code—is available to browse and download at https://tree.opentreeoflife.org.

It is also described in an article appearing Sept. 18 in the Proceedings of the National Academy of Sciences.

Evolutionary trees, branching diagrams that often look like a cross between a candelabra and a subway map, aren't just for figuring out whether aardvarks are more closely related to moles or manatees, or pinpointing a slime mold's closest cousins. Understanding how the millions of species on Earth are related to one another helps scientists discover new drugs, increase crop and livestock yields, and trace the origins and spread of infectious diseases such as HIV, Ebola and influenza.

Rather than build the tree of life from scratch, the researchers pieced it together by compiling thousands of smaller chunks that had already been published online and merging them together into a gigantic "supertree" that encompasses all named species.

The initial draft is based on nearly 500 smaller trees from previously published studies.

To map trees from different sources to the branches and twigs of a single supertree, one of the biggest challenges was simply accounting for the name changes, alternate names, common misspellings and abbreviations for each species. The eastern red bat, for example, is often listed under two scientific names, Lasiurus borealis and Nycteris borealis. Spiny anteaters once shared their scientific name with a group of moray eels.
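The name-reconciliation problem can be pictured as a lookup from every known spelling or synonym to one accepted name, applied before trees are merged. The sketch below is only an illustration of that idea, not the Open Tree of Life software; the synonym table is hand-written, and treating Lasiurus borealis as the accepted name is an arbitrary choice for the example.

```python
def canonical_name(name, synonyms):
    """Map a species name to its accepted form using a synonym table.

    name:     species name as it appears in a source tree.
    synonyms: dict mapping lower-case names (with single spaces) to the
              accepted name; names not in the table are returned cleaned up.
    """
    cleaned = " ".join(name.split())        # trim and collapse whitespace
    return synonyms.get(cleaned.lower(), cleaned)


# Hypothetical synonym table for the eastern red bat example.
bat_synonyms = {
    "lasiurus borealis": "Lasiurus borealis",
    "nycteris borealis": "Lasiurus borealis",
}
```

With this table, both `"Nycteris  borealis "` and `"lasiurus borealis"` resolve to `"Lasiurus borealis"`, so tips from different source trees can be matched to the same species.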

"Although a massive undertaking in its own right, this draft tree of life represents only a first step," the researchers wrote.

For one, only a tiny fraction of published trees are digitally available.

A survey of more than 7,500 phylogenetic studies published between 2000 and 2012 in more than 100 journals found that only one out of six studies had deposited their data in a digital, downloadable format that the researchers could use.

The vast majority of evolutionary trees are published as PDFs and other image files that are impossible to enter into a database or merge with other trees.

"There's a pretty big gap between the sum of what scientists know about how living things are related, and what's actually available digitally," Cranston said.

As a result, the relationships depicted in some parts of the tree, such as the branches representing the pea and sunflower families, don't always agree with expert opinion.

Other parts of the tree, particularly insects and microbes, remain elusive.

That's because even the most popular online archive of raw genetic sequences, from which many evolutionary trees are built, contains DNA data for less than five percent of the tens of millions of species estimated to exist on Earth.

"As important as showing what we do know about relationships, this first tree of life is also important in revealing what we don't know," said co-author Douglas Soltis of the University of Florida.

To help fill in the gaps, the team is also developing software that will enable researchers to log on and update and revise the tree as new data come in for the millions of species still being named or discovered.

"It's by no means finished," Cranston said. "It's critically important to share data for already-published and newly-published work if we want to improve the tree."

"Twenty five years ago people said this goal of huge trees was impossible," Soltis said. "The Open Tree of Life is an important starting point that other investigators can now refine and improve for decades to come."


