
What are the limitations of current nucleotide sequencing technologies?


Using the Illumina platform, it is cheap and (relatively) easy to sequence large amounts of DNA or RNA. There are several other platforms out there (Roche/454, SOLiD, PacBio, Ion Torrent), each with its own distinct advantages, but Illumina appears to be fairly popular for many applications, despite its limitations.

Ideally, we want a sequencing technology that delivers long, error-free reads at high throughput. At this point, however, it seems we have to choose: throughput or length (and quality). PacBio looks promising, but last I heard, they still could not deliver on their claims.

What are the molecular and biochemical constraints on our current sequencing technologies? Why don't we already have long, error-free reads with high throughput?


It seems you have answered your own question: the signal from a population of molecules being run through an enzyme or a polymerase tends to fall out of synchronisation after a few hundred bases. If the enzyme used for sequencing were stricter in its time-stepping, that might help, for example. The machines read traces in four channels with nice bumps for each base. See this article for a nice example. You can see that if there are too many of the same base in a row, it becomes hard to tell how many bases there are. Over time all four traces start to smear out and you can't tell Adam from Thelma, if you see what I mean.
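To put a rough number on the loss of synchronisation described above, here is a minimal sketch. It assumes (my simplification, not something from the answer) that every template copy independently falls behind or runs ahead with a fixed probability per cycle and never gets back in step. The function name and the example probabilities are hypothetical.

```python
def in_phase_fraction(cycles: int, lag_prob: float, lead_prob: float = 0.0) -> float:
    """Fraction of copies still exactly in step after `cycles` cycles.

    Assumption: each copy lags with probability `lag_prob` or leads with
    probability `lead_prob` in every cycle, independently, and an
    out-of-step copy never resynchronises.
    """
    slip = lag_prob + lead_prob
    if not 0.0 <= slip <= 1.0:
        raise ValueError("lag_prob + lead_prob must lie between 0 and 1")
    return (1.0 - slip) ** cycles


if __name__ == "__main__":
    # Illustrative per-cycle slip of 0.5%: the clean signal decays geometrically.
    for n in (50, 100, 300, 1000):
        print(n, round(in_phase_fraction(n, lag_prob=0.005), 3))
```

Even a small per-cycle slip compounds over hundreds of cycles, which is one way to see why read length and base quality trade off in ensemble methods.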

But there are other bottlenecks.

The sequencers are currently putting out such a high volume of data that the output cannot be analysed for meaning fast enough. This follows the trend in biotechnology over the past 12 years or so: more sequence data, more microarray data, more mutation data, and more genomes than there are people who can actually use them to understand the biology. There is now something of an analysis bottleneck.

So some of these sequencers have longer read lengths, which can make it easier to assemble a sequence. These sequencers usually cost more. If you have a library to sequence, say, a small fungal or algal genome, you will get the answer back within a day or less now, in the form of 1 Tb of reads perhaps 50 to 200 bp long. It can take quite a while to assemble that into a new genome sequence, and even longer to find the genes, build the gene networks from a template of pathways, and so on. Just imagine a thousand sequencers pumping out data day and night and you get the picture I am trying to paint here.

On cost: Ion Torrent and the new Oxford Nanopore sequencers are really cheap, from about $50k down to perhaps $900 for Oxford Nanopore's USB sequencer. Most other systems cost hundreds of thousands of dollars. Ion Torrent and Nanopore rely more on disposables (you throw away a chip or even the whole sequencer) at a cost of hundreds of dollars per sample.


Q1) What are the molecular and biochemical constraints on our current sequencing technologies?

A1) AFAIK:

Illumina struggles to produce long reads (although the MiSeq can now generate 300 bp reads that can be paired, the so-called paired-end 2×300). Illumina is sequencing by synthesis: basically, you add bases and measure fluorescence at each cycle. After a certain number of bases have been synthesised and captured on camera, i.e. after a certain number of "cycles", the strands can lose sync and base quality drops.

PacBio can generate very long reads, but they still have major problems with the reliability of the base calls (I don't know what the problem is here).

Q2) Why don't we already have long, error-free reads with high throughput?

A2) Because it is hard to do! But we are moving in that direction!


A brief introduction to three generations of genome sequencing technology

It has been more than 30 years since the first generation of DNA sequencing technology was developed in 1977. During this period, sequencing technology has made considerable progress. From the first generation to the third and even the fourth, read lengths have gone from long to short and from short back to long. Although second-generation, short-read sequencing technology still dominates the current global sequencing market, third- and fourth-generation sequencing technologies have developed rapidly over the past two years. Every transformation of sequencing technology has played a major role in advancing genome research, medical research on disease, drug development, breeding and other fields. This blog focuses mainly on the current genome sequencing technologies and their sequencing principles.

The development of sequencing technology
In 1952, Hershey and Chase completed the famous experiment on T2 phage infection of bacteria, which effectively proved that DNA is the genetic material. In 1953, Crick and Watson presented their model of DNA in the British journal Nature; after thorough study at the University of Cambridge, they described DNA as a "double helix". In 1958, Francis Crick proposed the central dogma of genetics, which he restated in Nature in 1970. The genetic code, also known as codons or triplet codes, determines how the nucleotide sequence specifies the amino acid sequence of a protein; each codon consists of three consecutive nucleotides. In 1966, Hola announced that the genetic code had been deciphered. In 1974, the Polish geneticist Szybalski proposed that recombinant genetic technology was the concept of synthetic biology. Recombinant DNA technology, also known as genetic engineering, aims to recombine DNA molecules in vitro and have them proliferate in suitable cells. In 1983, PCR (polymerase chain reaction) was developed by Dr. Kary B. Mullis. It is a molecular biology technique used to amplify specific DNA fragments and can be regarded as a special form of DNA replication in vitro.

In 1977, A.M. Maxam and W. Gilbert first established a method for sequencing DNA fragments, also called the Maxam-Gilbert chemical degradation method. At present, this chemical degradation method and the enzymatic method (the dideoxy chain-termination method) proposed by Sanger are rapid sequencing techniques. In 1986, the first automated sequencer, the ABI Prism 310 Genetic Analyzer, was developed by the American company PE ABI. Hood and Smith then used fluorescently labelled dNTPs with electrophoresis, and so the first commercial automated sequencer was born. Subsequently, the capillary electrophoresis sequencer was developed in 1996 and the 3700-type automated sequencer in 1998.

In 2005, Roche designed the 454 technology Genome Sequencer 20 System, an ultra-high-throughput genome sequencing system that Nature hailed as a milestone in the development of sequencing technology. In 2006, the Illumina sequencer was developed; it is suitable for DNA libraries prepared by a variety of methods. In 2007, the SOLiD System was developed.

In 2008, the Quake group designed and developed the HeliScope sequencer, which is also a cyclic-chip sequencing instrument. In the same year, nanopore sequencing based on electrophoresis technology was developed. The following year, SMRT was developed. In 2010, the Ion PGM and GeXP were brought into use.

First-generation sequencing technology
First-generation sequencing technology is based on the chain-termination method developed by Sanger and Coulson in 1975, or on the chemical (chain degradation) method invented by Maxam and Gilbert during 1976 and 1977. In 1977, Sanger determined the first genome sequence, that of phage φX174, with a total length of 5,375 bases. From then on, people gained the ability to probe the nature of the genetic differences of life, and this also marked the start of the genomic era. Researchers continued to improve the Sanger method in practice. In 2001, the first human genome map was completed on the basis of the improved Sanger method. The core principle of the Sanger method is that a ddNTP cannot form a phosphodiester bond with the next nucleotide during DNA synthesis because it lacks the hydroxyl groups at its 2′ and 3′ positions, so it can be used to interrupt the DNA synthesis reaction. A certain proportion of radioisotope-labelled ddNTP (ddATP, ddCTP, ddGTP or ddTTP) is added to each of four separate DNA synthesis reactions. After gel electrophoresis and autoradiography, the DNA sequence of the sample can be determined from the positions of the electrophoretic bands.
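The four-reaction principle can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions of my own: the template is written in the order the polymerase reads it, every strand starts at position 1, and the gel resolves single-base length differences perfectly. The function names, the molecule count and the ddNTP fraction are all hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_lengths(template, dd_base, rng, n_molecules=2000, dd_fraction=0.1):
    """Lengths of strands that ended on a dideoxy `dd_base` in one reaction."""
    lengths = set()
    for _ in range(n_molecules):
        for position, template_base in enumerate(template, start=1):
            # The polymerase adds the complement of the template base; with
            # probability dd_fraction it picks the chain-terminating ddNTP.
            if COMPLEMENT[template_base] == dd_base and rng.random() < dd_fraction:
                lengths.add(position)
                break
    return lengths


def read_ladder(template, seed=0):
    """Read the synthesised strand off the four lanes, shortest band first."""
    rng = random.Random(seed)
    lanes = {base: terminated_lengths(template, base, rng) for base in "ACGT"}
    longest = max((max(found) for found in lanes.values() if found), default=0)
    called = []
    for length in range(1, longest + 1):
        hits = [base for base, found in lanes.items() if length in found]
        called.append(hits[0] if len(hits) == 1 else "N")  # N = no band seen
    return "".join(called)


if __name__ == "__main__":
    print(read_ladder("TACGGATCCA"))
```

Each lane contributes the positions where its ddNTP stopped synthesis; ordering all bands by length recovers the newly synthesised strand, which is the complement of the template.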

Besides the Sanger method, it is worth noting that many other sequencing technologies emerged during the development of sequencing technology, such as pyrosequencing, the ligase method, and so on. Of these, pyrosequencing was later used by Roche for its 454 technology, while the ligase method was used by ABI for its SOLiD technology. The core approach shared by both was to use dNTPs that can interrupt DNA synthesis, similar to the ddNTPs in the Sanger method.

All in all, the main characteristic of first-generation sequencing technology is a read length of up to 1,000 bp with 99.999% accuracy. However, its high cost, low throughput and other drawbacks seriously limit its practical large-scale application. First-generation sequencing is therefore not the ideal sequencing method. After further development and improvement, second-generation sequencing technology was born, represented by Roche's 454 technology, Illumina's Solexa and HiSeq technology, and ABI's SOLiD technology. Second-generation sequencing not only greatly reduces sequencing cost but also dramatically increases sequencing speed while maintaining high accuracy. The turnaround time to complete a human genome project with second-generation sequencing can be as little as one week, whereas achieving the same goal with first-generation sequencing took three years. However, the read length of second-generation sequencing is much shorter than that of the first generation.

In the next chapter of this blog, we will go on to introduce second-generation sequencing technology.


Abstract

The field of single-cell genomics is advancing rapidly and is generating many new insights into complex biological systems, ranging from the diversity of microbial ecosystems to the genomics of human cancer. In this review, we provide an overview of the current state of the field of single-cell genome sequencing. First, we focus on the technical challenges of making measurements that start from a single molecule of DNA, and then explore how some of these recent methodological advances have enabled the discovery of unexpected new biology. Areas highlighted include the application of single-cell genomics to interrogate microbial dark matter and to evaluate the pathogenic roles of genetic mosaicism in multicellular organisms, with a focus on cancer. We then attempt to predict advances that we expect to see in the next few years.


Next-generation sequencing and its applications

Anuj Kumar Gupta, UD Gupta, in Animal Biotechnology (Second Edition), 2020

Ion semiconductor sequencing

Ion Torrent: This technology works on the principle of detecting the hydrogen ion released during incorporation of a new nucleotide into the growing DNA strand. In nature, when a nucleotide is incorporated into a DNA strand by a polymerase, a hydrogen ion is released as a by-product. Ion Torrent, with its Ion Personal Genome Machine (PGM™) sequencer, uses a high-density array of micromachined wells to carry out nucleotide incorporation in a massively parallel way. Each well holds a different DNA template. Beneath the wells is an ion-sensitive layer, followed by a proprietary ion sensor. The ion changes the pH of the solution, which is detected by the ion sensor. If there are two identical bases in a row on the DNA strand, the output voltage is doubled and the chip records two identical bases called, all without scanning, cameras or light. Instead of detecting light as in 454 pyrosequencing, Ion Torrent technology creates a direct connection between the chemical and the digital events. Hydrogen ions are detected on ion semiconductor sequencing chips. These ion semiconductor chips are designed and manufactured like any other semiconductor chips used in electronic devices. They are cut as wafers from a silicon boule. The transistors and circuits are then pattern-transferred and subsequently etched onto the wafers using photolithography. This process is repeated 20 times or more, creating a multilayer system of circuits.
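The proportional relationship between signal and homopolymer length can be sketched as follows. This is an illustration only, not the vendor's base-calling algorithm: the cyclic flow order `TACG`, the idealised signal of exactly one unit per incorporated base, and both function names are assumptions of mine.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order, for illustration


def simulate_flows(synthesised, n_flows, flow_order=FLOW_ORDER):
    """Ideal signal per flow: one unit per base incorporated in that flow."""
    signals = []
    position = 0
    for flow in range(n_flows):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated entirely within a single flow.
        while position < len(synthesised) and synthesised[position] == base:
            run += 1
            position += 1
        signals.append(float(run))
    return signals


def call_bases(signals, flow_order=FLOW_ORDER):
    """Round each (possibly noisy) signal to the nearest run length."""
    calls = []
    for flow, signal in enumerate(signals):
        run = max(0, round(signal))
        calls.append(flow_order[flow % len(flow_order)] * run)
    return "".join(calls)


if __name__ == "__main__":
    ideal = simulate_flows("TTAGGGC", n_flows=8)
    print(ideal, call_bases(ideal))
    # A noisy reading of the GGG flow (3.6 instead of 3.0) is over-called.
    print(call_bases([2.1, 0.9, 0.1, 3.6, 0.0, 0.2, 1.1, 0.0]))
```

Because run length has to be inferred from signal amplitude, a modest amount of noise on a long homopolymer flips the call, which fits the higher error rate the chapter reports for this platform.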

Ion has come out with a range of sequencers with small and large data outputs, to be chosen according to application and use. The Ion Torrent PGM™ generates a total data output of 30 Mb to 2 Gb, depending on the type of ion semiconductor sequencing chip used. In September 2012, however, Ion Torrent launched their larger system, the Ion Proton. It uses larger chips with higher densities and can therefore be suitable for transcriptome, exome and larger gene panels. Although the Ion Proton can generate much larger outputs, around 10 Gb, it is considerably more expensive. Their newer higher-throughput sequencers, the Ion S5 and S5XL, can generate 2–130 million reads in a run of about 4 hours, depending on the type of chip used. The read lengths obtained are 200 and 400 bp for Ion Torrent and Ion Proton, while the S5 and S5XL can also sequence 600 bp reads.

Advantages: Ion Torrent generates read lengths of about 200–600 bp, which are used to fill gaps in assemblies produced by other technologies. Because of the low cost involved, Ion platforms have gained recognition in the clinical sector. The short run time of this technique also allows multiple runs, generating more data in a given time.

Limitations: Ion platforms sit between the large-data technologies and the long-read technologies. While short-read technologies are helped by the large amount of data they generate, Ion needs to improve its total data output. The reported higher error rate and premature sequence truncation may make it hard for Ion to be a primary choice where very high quality data are required (PubMed Central ID: PMC4249215).


This process involves a mixture of techniques: bacterial cloning or PCR; template purification; labelling of DNA fragments by the chain-termination method with energy-transfer, dye-labelled dideoxynucleotides and a DNA polymerase; capillary electrophoresis; and fluorescence detection that provides four-colour plots to reveal the DNA sequence.

A quality measure for a sequenced genome. A finished-grade genome, commonly referred to as a finished genome, is of higher quality than a draft-grade genome, with more base coverage and fewer errors and gaps (for example, the human genome reference contains 2.85 Gb, covers 99% of the genome with 341 gaps, and has an error rate of 1 in every 100,000 bp).

This recombinant DNA molecule consists of a known region, usually a vector or adaptor sequence to which a universal primer can bind, and the target sequence, which is typically an unknown portion that is to be sequenced.

Assays that use next-generation sequencing technologies. They include methods for determining the sequence content and abundance of mRNAs, non-coding RNAs and small RNAs (collectively called RNA-seq) and methods for measuring genome-wide profiles of immunoprecipitated DNA-protein complexes (ChIP-seq), methylation sites (methyl-seq) and DNase I hypersensitivity sites (DNase-seq).

This review mostly describes technology platforms that are associated with a respective company, but the Polonator G.007 instrument, which is manufactured and distributed by Danaher Motions (a Dover company), is an open-source platform with freely available software and protocols. Users manufacture their own reagents based on published reports or by collaborating with George Church and colleagues or other technology developers.

A fragment library is prepared by randomly shearing genomic DNA into small sizes of <1 kb, and requires less DNA than would be needed for a mate-pair library.

A genomic library is prepared by circularising sheared DNA that has been selected for a given size, such as 2 kb, thereby bringing the ends that were previously far apart into close proximity. Cutting these circles into linear DNA fragments creates mate-pair templates.

This occurs with step-wise addition methods when growing primers move out of synchronicity for any given cycle. Lagging strands (for example, n − 1 from the expected cycle) result from incomplete extension, and leading strands (for example, n + 1) result from the addition of multiple nucleotides or probes in a population of identical templates.

Dark nucleotides or probes

A nucleotide or probe that does not contain a fluorescent label. It can be generated from its cleavage and carry-over from the previous cycle or be hydrolysed in situ from its dye-labelled counterpart in the current cycle.

Total internal reflection fluorescence

A total internal reflection fluorescence imaging device produces an evanescent wave, that is, a near-field stationary excitation wave with an intensity that decreases exponentially away from the surface. This wave propagates across a boundary surface, such as a glass slide, resulting in the excitation of fluorescent molecules near (<200 nm) or at the surface and the subsequent collection of their emission signals by a detector.

Libraries of mutant DNA polymerases

Large numbers of genetically engineered DNA polymerases can be created by either site-directed or random mutagenesis, which leads to one or more amino acid substitutions, insertions and/or deletions in the polymerase. The goal of this approach is to incorporate modified nucleotides more efficiently during the sequencing reaction.

These are only useful for single-molecule techniques and are produced by sequencing the same template molecule more than once. The data are then aligned to produce a 'consensus read', which reduces the stochastic errors that may occur in any given sequence read.
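The consensus idea can be sketched as a per-column majority vote. This assumes the repeated passes have already been aligned and are of equal length (real data need an alignment step to handle insertions and deletions); the function name and the use of `N` for ties are my own choices.

```python
from collections import Counter


def consensus_read(aligned_passes):
    """Column-wise majority vote over equally long, pre-aligned passes."""
    if not aligned_passes:
        raise ValueError("at least one pass is required")
    length = len(aligned_passes[0])
    if any(len(p) != length for p in aligned_passes):
        raise ValueError("passes must already be aligned to equal length")
    consensus = []
    for column in zip(*aligned_passes):
        ranked = Counter(column).most_common()
        # A tie between the two most frequent bases is reported as ambiguous.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    # Three passes over one molecule, each with a different random error.
    print(consensus_read(["ACGTAC", "ACCTAC", "ACGTTC"]))
```

Random errors rarely hit the same position in most passes, so the vote removes them; systematic errors that recur in every pass would survive.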

An oligonucleotide sequence in which one interrogation base is associated with a particular dye (for example, A in the first position corresponds to a green dye). An example of a one-base degenerate probe set is '1-probes', which indicates that the first nucleotide is the interrogation base. The remaining bases consist of either degenerate (four possible bases) or universal bases.

An oligonucleotide sequence in which two interrogation bases are associated with a particular dye (for example, AA, CC, GG and TT are coded with a blue dye). '1,2-probes' indicates that the first and second nucleotides are the interrogation bases. The remaining bases consist of either degenerate or universal bases.

A nucleotide substitution will have two colour calls, one from the 5′ position and one from the 3′ position of the dinucleotide sequence. When compared with a reference genome, a base substitution in the target sequence is encoded by two specific, adjacent colours. In Figure 3b, the sequence 'CCT' is encoded as blue-yellow ('CC' = blue; 'CT' = yellow), but substituting the middle 'C' with 'A' results in two colour changes, to green-red. Any other colour sequence can be discarded as an error.

With two-base-encoded probes, the fluorescent signal or colour obtained during imaging is associated with four dinucleotide sequences that have a 5′ and a 3′ base. Colour space is the sequence of overlapping dinucleotides that codes for four simultaneous nucleotide sequences. Alignment with a reference genome is the most accurate method for translating colour space into a single nucleotide sequence.
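A small sketch makes the two-base encoding concrete. The text above only fixes five assignments (AA, CC, GG, TT = blue; CT = yellow; and CA then AT = green then red). The full table below fills in the rest with the commonly described XOR scheme, which is consistent with those examples but should be treated as an assumption; the function names are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"
COLOUR_NAME = ["blue", "green", "yellow", "red"]  # colours 0..3 (assumed table)


def encode(sequence):
    """Colour of each overlapping dinucleotide: XOR of the two base codes."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base, colours):
    """Rebuild the bases from a known first base plus the colour calls."""
    bases = [first_base]
    for colour in colours:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ colour])
    return "".join(bases)


if __name__ == "__main__":
    for seq in ("CCT", "CAT"):
        print(seq, [COLOUR_NAME[c] for c in encode(seq)])
    # One wrong colour call corrupts every base after it when decoding alone.
    print(decode("C", [0, 2, 1]), decode("C", [3, 2, 1]))
```

A true substitution changes two adjacent colours while a measurement error changes one, which is why single-colour mismatches against a reference can be discarded. It also shows why decoding without a reference is fragile: one bad colour shifts every later base.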

Zero-mode waveguide detectors

This nanostructure device is 100 nm in diameter, which is smaller than the 532 nm and 643 nm laser wavelengths used in the Pacific Biosciences platform. Light cannot propagate through these small waveguides, hence the term zero-mode. These aluminium-clad waveguides are designed to produce an evanescent wave (see the 'total internal reflection fluorescence' glossary term) that substantially reduces the observation volume at the surface of the polymerase reaction, down to the zeptolitre range (10⁻²¹ l). This provides an advantage for the polymerisation reaction, which can be performed at higher dye-labelled nucleotide concentrations.

Fluorescence resonance energy transfer

This is generally a system that consists of two fluorescent dyes, one being a donor dye (a bluer fluorophore) and the other an acceptor dye (a redder fluorophore). When the two dye molecules are brought into close proximity (usually ≤30 nm), the energy from the excited donor dye is transferred to the acceptor dye, increasing its emission intensity signal.

All sequence variants other than single-nucleotide variants, including block substitutions, insertions or deletions, inversions, segmental duplications and copy-number differences.

A project aimed at discovering rare sequence variants with minor allele frequencies of 1% in normal genomes derived from HapMap samples.

A project aimed at developing and validating cost-effective, high-throughput technologies for resequencing all of the protein-coding regions of the human genome.

The study of communities of mixed microbial genomes that reside in animals, plants and environmental niches. Samples are collected and analysed without the need to culture isolated microbes in the laboratory. The Human Microbiome Project aims to characterise a reference set of microbial genomes from different habitats within the human body, including nasal, oral, skin, gastrointestinal and urogenital regions, and to determine how changes in the human microbiome affect health and disease.

A project aimed at discovering single-nucleotide variants and structural variants associated with major cancers, such as brain cancer (glioblastoma multiforme), lung cancer (squamous carcinoma) and ovarian cancer (serous cystadenocarcinoma).

A project aimed at providing open access to human genome sequences from volunteers and at developing tools for interpreting this information and correlating it with related personal medical information.


Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacbio Sequel II system and ultralong reads of Oxford Nanopore

The availability of reference genomes has revolutionised the study of biology. Multiple competing technologies have been developed to improve the quality and robustness of genome assemblies during the past decade. The two widely used providers of long-read sequencing, Pacbio (PB) and Oxford Nanopore Technologies (ONT), have recently updated their platforms: PB enables high-throughput HiFi reads with base-level resolution at >99% accuracy, and ONT generates reads as long as 2 Mb. We applied the two updated platforms to a single rice individual and then compared the two assemblies to investigate the advantages and limitations of each. The results showed that ONT ultralong reads delivered higher contiguity, producing a total of 18 contigs, of which 10 were assembled into a single chromosome, compared with 394 contigs and three chromosome-level contigs for the PB assembly. The ONT ultralong reads also prevented assembly errors caused by long repetitive regions, for which we observed a total of 44 genes of false redundancies and 10 genes of false losses in the PB assembly, leading to over- or underestimation of the gene families in those long repetitive regions. We also noted that the PB HiFi reads generated assemblies with considerably fewer errors at the level of single nucleotides and small InDels than those of the ONT assembly, which had on average 1.06 errors per kb of assembly and ultimately caused 1,475 incorrect gene annotations via altered or truncated protein predictions.


DETECTION OF GENOME ALTERATIONS VIA -OMICS TECHNOLOGIES

In the past 15 years, several advanced technologies have been developed that permit the accumulation and assessment of large-scale datasets of biological molecules, including DNA sequence (the genome), transcripts (the transcriptome, involving RNA), DNA modification (the epigenome) and, to a lesser extent, proteins and their modifications (the proteome) and metabolites (the metabolome). Such datasets enable comparative analyses of non-GE and GE lines so that effects on plant gene expression, metabolism and composition can be assessed in a more informed manner. Access to the technologies also permits analysis of the extent of natural variation in a crop species at the DNA, RNA, protein, metabolite and epigenetic levels, which makes it possible to determine whether variation in GE crops is within the range found naturally and among cultivars. As discussed below for each of the -omics data types, technologies for accessing the molecules were relatively recent as of 2015 but had advanced rapidly. Some technologies were ready to be deployed to generate datasets for assessing the effects of genetic engineering events when the committee's report was written. Others will improve in accuracy and throughput over the coming decade and may one day be useful technologies for assessing the effects of genetic engineering events. The Precision Medicine Initiative announced by President Obama in January 2015 focuses on understanding how genetic differences between individuals and mutations present in cancer and diseased cells (versus healthy cells) affect human health. An analogous project that uses diverse -omics approaches in crop plants with genetic engineering and conventional breeding could provide in-depth improvements in the understanding of plant biological processes, which in turn could be applied to assessing the effects of genetic modifications in crop plants.

Genomics

One way to ascertain whether genetic engineering has resulted in off-target effects (whether through nuclear transformation with Agrobacterium or gene guns, RNAi, or such emerging technologies as genome editing) is to compare the genome of the GE plant with an example, or reference, genome of the parental non-GE plant. The reference genome is like a blueprint for the species, revealing allelic diversity and identifying the genes associated with phenotype. With knowledge of the variation that occurs naturally in a species, one can compare the engineered genome with the reference genome to reveal whether genetic engineering has caused any changes, expected or unintended, and to gain context for assessing whether changes might have adverse effects. Because there is inherent DNA-sequence variation among plants within a species, and even among cultivars, any genetically engineered changes will need to be compared with the non-GE parent and with the range of natural genomic variation. That is, changes made by genetic engineering need to be placed in an appropriate context.

Background

In July 1995, the first genome sequence of a living organism, that of the bacterium Haemophilus influenzae (1,830,137 base pairs), was reported (Fleischmann et al., 1995). This paradigm-changing technological achievement was possible because of the development of automated DNA-sequencing methods, improved computer processing power, and the development of algorithms for reconstructing a full genome on the basis of fragmented, random DNA sequences. In October 1995, the genome of the bacterium Mycoplasma genitalium was released (Fraser et al., 1995); this solidified whole-genome shotgun sequencing and assembly as the method for obtaining genome sequences. In the following two decades, higher-throughput and cheaper methods for genome sequencing and assembly emerged (for review, see McPherson, 2014) and permitted the sequencing of the genomes of hundreds of species, as well as thousands of individuals, in all kingdoms of life. For example, since the release of the draft sequence of the human reference genome in 2001 (Lander et al., 2001; Venter et al., 2001), thousands of individual human genomes have been sequenced, including such comparative genome-sequencing projects as a deep catalogue of human variation from thousands of individuals, normal versus tumour cells from a single individual, families with inherited genetic disorders, and diseased versus healthy populations. Those projects have focused on detecting the allelic diversity in a species and associating genes with phenotypes, such as the propensity for specific diseases.

Limitations in Current De Novo Genome-Sequencing and Assembly Methods for Plants

Current methods for sequencing a genome and assembling it de novo involve random fragmentation of DNA, generation of sequence reads, and reconstruction of the original genome sequence by means of assembly algorithms. Although the methods are robust and continue to improve, it is important to note that they fail to deliver the full genome sequence of complex eukaryotes. Indeed, even the human genome sequence, for which billions of dollars have been spent to obtain a high-quality reference genome sequence that has provided a wealth of useful information for understanding human biology, including cancer and other diseases, is still incomplete. For plants, the benchmark for a high-quality genome assembly is that of the model species Arabidopsis thaliana, which has an extremely small genome that was published in 2000 (Arabidopsis Genome Initiative, 2000). More than 15 years after the release of the A. thaliana reference genome sequence, and with the availability of sequences from more than 800 additional accessions, an estimated 30 million nucleotides of sequence were still missing from the A. thaliana Col-0 reference genome assembly (Bennett et al., 2003). Most of the missing sequences are highly repetitive (such as ribosomal RNA genes and centromeric repeats), but some gene-containing regions are absent because of technical challenges. With increased genome size and repetitive-sequence complexity, complete representation of the genome sequence becomes more challenging. Indeed, the genome assemblies of most major crop species (maize, wheat, barley and potato) are all of only draft quality and have substantial gaps (Schnable et al., 2009; Potato Genome Sequencing Consortium, 2011; International Barley Genome Sequencing, 2012; Li et al., 2014a); none provides a full, complete representation of the genome.
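The "fragment, read, reconstruct" pipeline can be illustrated with a deliberately naive greedy overlap assembler. This is a teaching sketch, not one of the assembly algorithms the report refers to: it assumes error-free reads from a single strand, and the function names and the minimum overlap of 3 are hypothetical.

```python
def overlap(left, right, min_len):
    """Longest suffix of `left` equal to a prefix of `right` (0 if < min_len)."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    n = overlap(left, right, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    genome = "ATGCGTACGTTAGC"
    reads = [genome[0:8], genome[4:12], genome[8:14]]
    print(greedy_assemble(reads))
```

The toy genome reassembles into one contig because every overlap is unique. When a repeat is longer than the reads, several joins look equally good and an assembler has to guess or leave a gap, which is the difficulty the paragraph above describes for repeat-rich plant genomes.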

In several major crops, when the committee was writing its report, projects equivalent to the human 10,000-genomes project were under way to determine the overall diversity of the species by documenting the “pan-genome” (Weigel and Mott, 2009). It has been surprising in several of these studies that there is substantial genomic diversity in some plant species not only in allelic composition but also in gene content (Lai et al., 2010; Hirsch et al., 2014; Li et al., 2014b). Thus, a single “reference” genome sequence derived from a single individual of a species will fail to represent the genetic composition and diversity of the overall population adequately and will therefore limit interpretations of directed changes in the genome (such as ones that can be delivered by emerging genome-editing methods that are being used to generate GE crops).

Resequencing: Assessing Differences Between the Reference and Query Genome

Once the DNA sequence of a crop's genome is assembled well enough to serve as a reference genome, resequencing becomes a powerful and cost-effective method for detecting genomic differences among related accessions (individuals) or GE lines. Resequencing entails generating random-sequence reads of the query genome (the genome that is being compared with the reference genome), aligning those sequence reads with a reference genome, and using algorithms to determine differences between the query and the reference. The strengths of this approach are that it is inexpensive and permits many query genomes to be compared with the reference genome and thereby provides substantial data about similarities and differences between individuals in a species (Figure 7-5). However, limitations of the approach can affect determination of whether two genomes are different. First, sequence read quality will affect data interpretation in that read errors can be misinterpreted as sequence polymorphisms. Second, the coverage of sequence reads generated can limit interrogation of the whole genome because the sampling is random and some regions of the genome are underrepresented in the read pool. Third, library construction and sequencing bias will affect which sequences are present in the resequencing dataset and consequently available for alignment with the reference genome. Fourth, read-alignment algorithms fail to detect all polymorphisms if the query diverges too widely from the reference, especially with insertions and deletions or with SNPs near them. Fifth, read alignments and polymorphism detection are limited to nonrepetitive regions of the genome, so regions that are repetitive in the genome cannot be assessed for divergence. Although obstacles remain, resequencing is a powerful method for measuring differences in genome sequences between wild-type plants (normal untransformed individuals) and engineered plants.
With expected improvements in technology, the resolution of resequencing to reveal differences between two genomes will improve.
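The resequencing steps described above can be sketched in a few lines of Python. This is a toy illustration, not a real variant caller: it assumes reads have already been aligned (each read is given with its start position on the reference), it handles only single-base substitutions, and the names `call_variants`, `min_depth`, and `min_fraction` are hypothetical. It shows how a depth threshold leaves poorly covered positions unassessed (the second limitation) and how requiring agreement among reads keeps an isolated read error from being reported as a polymorphism (the first limitation).

```python
from collections import Counter, defaultdict

def call_variants(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Toy substitution caller over pre-aligned reads.

    aligned_reads is a list of (start_position, read_sequence) pairs.
    Thresholds are illustrative assumptions, not recommended values.
    """
    # Build a pileup: for each reference position, count the bases observed.
    pileup = defaultdict(Counter)
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1

    variants = {}    # position -> (reference base, query base)
    uncovered = []   # positions with too few reads to assess
    for pos, ref_base in enumerate(reference):
        depth = sum(pileup[pos].values())
        if depth < min_depth:
            uncovered.append(pos)
            continue
        base, count = pileup[pos].most_common(1)[0]
        # Report a difference only when most reads agree on a non-reference base.
        if base != ref_base and count / depth >= min_fraction:
            variants[pos] = (ref_base, base)
    return variants, uncovered

reference = "ACGTACGT"
reads = [
    (0, "ACGAAC"),
    (0, "ACGAAC"),
    (0, "ACGAAC"),
    (0, "AGGAAC"),  # carries a single read error at position 1
]
variants, uncovered = call_variants(reference, reads)
print(variants)   # the consistent T-to-A difference at position 3
print(uncovered)  # positions 6 and 7 have no read coverage
```

The isolated G at position 1 is outvoted by the other three reads, whereas the change at position 3 is supported by every read and is reported.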

FIGURE 7-5

Detection of genome, epigenome, transcriptome, proteome, and metabolome alterations in genome-edited, genetically engineered plants. SOURCE: Illustration by C. R. Buell. NOTE: To perform various -omics assessments of genome-edited plants, both the wild-type ...

Computational Approaches

Alternatives to resequencing approaches to identify polymorphisms in DNA sequence between two genomes were emerging when the committee was writing its report. The foundation of computational approaches to identify polymorphisms is algorithms that perform k-mer counting (a k-mer is a unique nucleotide sequence of a given length), in which unique k-mers are identified in two read pools (for example, wild type and mutant) and k-mers that differ between the two samples are then computationally identified. Those k-mers are then further analyzed to identify the nature of the polymorphism (SNP versus insertion or deletion) and to associate the polymorphism with a gene and potential phenotype (Nordstrom et al., 2013; Moncunill et al., 2014). The sensitivity and specificity of such programs are comparable with or better than those of the current methods that detect SNPs and insertions/deletions by using genome-sequencing methods, and the programs thus have the potential to identify genome variation introduced through genetic engineering more robustly. The committee expects the field to continue to develop rapidly and to enable researchers to read genomic DNA with increased sensitivity and specificity.
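A minimal Python sketch of the k-mer comparison described above follows. It is an illustration under simplifying assumptions rather than the method of the cited programs: reverse complements are ignored, the function names and the `min_count` threshold are hypothetical, and real tools use far more memory-efficient counting. A k-mer is kept only if it occurs at least `min_count` times in one pool and never in the other, which is one simple way to discount k-mers created by isolated read errors.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring across a pool of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def differing_kmers(pool_a, pool_b, k, min_count=2):
    """Return the k-mers found only in pool_a and only in pool_b.

    min_count is an assumed filter against k-mers that arise from
    single read errors; its value here is purely illustrative.
    """
    a = count_kmers(pool_a, k)
    b = count_kmers(pool_b, k)
    only_a = {kmer for kmer, n in a.items() if n >= min_count and kmer not in b}
    only_b = {kmer for kmer, n in b.items() if n >= min_count and kmer not in a}
    return only_a, only_b

# Two tiny read pools that differ by a single A-to-T substitution.
wild_type = ["ACGTACGTGA", "ACGTACGTGA"]
mutant = ["ACGTTCGTGA", "ACGTTCGTGA"]
only_wt, only_mut = differing_kmers(wild_type, mutant, k=4)
print(sorted(only_wt))
print(sorted(only_mut))
```

Every k-mer that overlaps the substituted base differs between the pools, so the pool-specific k-mers cluster around the polymorphism; downstream analysis would then assemble them to determine the nature of the change.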

Utility of Transcriptomics, Proteomics, and Metabolomics in Assessing Biological Effects of Genetic Engineering

As stated in the 2004 National Research Council report Safety of Genetically Engineered Foods, understanding the composition of food at the RNA, protein, and metabolite levels is critical for determining whether genetic engineering results in a difference in substantial equivalence compared to RNA, protein, and metabolite levels in conventionally bred crops (NRC, 2004; see Chapter 5). Although the genome provides the “blueprint” for the cell, assessment of the transcriptome, proteome, and metabolome can provide information on the downstream consequences of genome changes that lead to altered phenotype. Methods used to assess transcripts, proteins, and metabolites in plants are described below with the committee's commentary on limitations of the sensitivity and specificity of detection and interpretation that existed when this report was being written. One caveat in the use of any of these techniques is related to inherent biological variation regardless of genetic-engineering status. Even with identical genotypes grown under identical conditions, there is variation in the transcriptome, proteome, and metabolome. Scientists address such variation by using biologically replicated experiments and multiple -omics and molecular-biology approaches. In addition to biological variation, allelic variation results in different levels of transcripts, proteins, and metabolites in different accessions. To provide context to any observed changes in the transcriptome, proteome, or metabolome attributable to a genetic-engineering event, the broader range of variation in commercially grown cultivars of a crop species can be compared with that of a GE line to determine whether modified levels are outside the realm of variation in a crop. Thus, in assessment of GE crops, interpretation must be in the context of inherent biological and allelic variation of the specific crop.
Assessment is also made difficult by the fact that scientists have little or no knowledge of what functions a substantial number of genes, transcripts, proteins, and metabolites perform in a plant cell.

Transcriptomics

Advancements in high-throughput sequencing technologies have enabled the development of robust methods for quantitatively measuring the transcriptome, the expressed genes in a sample. One method, known as RNA sequencing (RNA-seq), entails isolation of RNA, conversion of the RNA to DNA, generation of sequence reads, and bioinformatic analyses to assess expression levels, alternative splicing, and alternative transcriptional initiation or termination sites (Wang et al., 2009; de Klerk et al., 2014). This method can be applied to mRNA, small RNAs (which include interfering RNAs involved in RNAi), total RNA, RNA bound to ribosomes, and RNA-protein complexes to gain a detailed assessment of RNAs in a cell. Methods to construct RNA-seq libraries, generate sequence reads, align to a reference genome, and determine expression abundances are fairly robust even with draft genome sequences if they provide nearly complete representation of the genes in the genome (Wang et al., 2009; de Klerk et al., 2014). Statistical methods to determine differential expression between any two samples, such as two plants with identical genotypes at different developmental stages, are continuing to mature but are limited by inherent biological variation in the transcriptome. Indeed, variation between independent biological replicates of wild-type tissues is well documented. For example, estimation of whole-transcriptome expression abundance in independent biological replicates of a given experimental treatment is considered to be highly reproducible if Pearson's correlation values are more than 0.95; values greater than 0.98 are typically observed. However, even with high Pearson's correlation values, numerous genes may exhibit different expression among biological replicates.
Thus, differential gene expression in GE plants would need to be compared with the observed variation in gene expression in biological replicates of untransformed individuals to ensure the absence of major effects of the genetic-engineering event on the transcriptome.
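The point that a high Pearson correlation between replicates can coexist with individual genes that differ can be seen in a small worked example. The expression values below are invented for illustration, and the twofold threshold and pseudocount in `genes_exceeding_fold_change` are assumptions rather than values taken from the report.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def genes_exceeding_fold_change(x, y, fold=2.0, pseudocount=1.0):
    """Indices of genes whose abundance ratio between replicates is at
    least `fold` in either direction (threshold is an assumption)."""
    flagged = []
    for i, (a, b) in enumerate(zip(x, y)):
        ratio = (a + pseudocount) / (b + pseudocount)
        if ratio >= fold or ratio <= 1.0 / fold:
            flagged.append(i)
    return flagged

# Hypothetical expression abundances for six genes in two biological
# replicates of the same wild-type tissue (made-up numbers).
rep1 = [10.0, 200.0, 35.0, 1200.0, 4.0, 560.0]
rep2 = [12.0, 190.0, 90.0, 1180.0, 5.0, 575.0]

r = pearson(rep1, rep2)
flagged = genes_exceeding_fold_change(rep1, rep2)
print(f"Pearson r = {r:.4f}")
print("genes differing at least twofold:", flagged)
```

In this made-up dataset the correlation is above 0.98, yet the third gene differs more than twofold between replicates, because the coefficient is dominated by the highly expressed genes. That is why replicate-to-replicate variation in untransformed plants is the baseline against which differences in GE plants need to be judged.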

Overshadowing any expression differences discovered between a wild-type plant and an engineered plant is the fact that little is known about the exact function of a substantial number of genes, transcripts, and proteins for any plant species. In maize, nearly one-third of the genes have no meaningful functional annotation; even when informative functional annotation is provided, the annotation was most likely assigned by using automated transitive annotation methods that depend heavily on sequence similarity. Thus, even if differentially expressed genes are detected between the wild-type and GE samples, interpreting them in the context of health or effects on the ecosystem may be challenging at best. For example, a study of the effects of expression of the antifungal protein in rice that was introduced with genetic engineering showed changes in about 0.4 percent of the transcriptome in the GE lines (Montero et al., 2011). Analysis of 20 percent of the changes indicated that 35 percent of the unintended effects could be attributed to the tissue-culture process used for plant transformation and regeneration, whereas 15 percent appeared to be event-specific and attributable to the presence of the transgene. About 50 percent of the changes that were attributed to the presence of the transgene were in expression of genes that could be induced in the non-GE rice by wounding. It is impossible to determine whether the changes in transcript levels recorded in the study indicate that the GE rice might be worse than, equal to, or better than its non-GE counterpart as regards food safety. One way to assess the biological effects of genetic engineering on the transcriptome is to include a variety of conventionally bred cultivars in the study and determine whether the range of expression levels in the GE line falls within the range observed for the crop, but this method will not provide definitive evidence of food or ecosystem safety.

Proteomics

Several methods permit comparison of protein composition and post-translational protein modifications between samples (for review, see May et al., 2011). For example, two-dimensional difference in-gel electrophoresis permits quantitative comparison of two proteomes through differential labeling of the samples followed by separation and quantification (Figure 7-5D). In mass spectrometry (MS), another method for examining the proteome, proteins are first broken into specific fragments (often by proteases, which are enzymes that catalyze the cleavage of proteins into peptides at specific sites) and fractionated with such techniques as liquid chromatography. Then the mass-to-charge ratios of the peptides are detected with MS. MS data typically provide a unique “signature” for each peptide, and the identity of the peptides is typically determined by using search algorithms to compare the signatures with databases of predicted peptides and proteins derived from genome or transcriptome sequence data. Differential isotope labeling can be used in the MS approach to determine quantitative differences in protein samples. One limitation of all current proteomic techniques is sensitivity; whole-proteome studies typically detect only the most abundant proteins (Baerenfaller et al., 2008). Furthermore, sample-preparation methods need to be modified to detect different fractions of the proteome (such as soluble versus membrane-bound and small versus large proteins) (Baerenfaller et al., 2008). Thus, to provide a broad assessment of the proteome, an array of sample-preparation methods must be used. Finally, as with the other -omics methods, interpretation of the significance of proteomic differences is made difficult by the fact that scientists have little knowledge of what a large number of proteins do in a plant cell.

Metabolomics

It is common practice in evaluating GE crops for regulatory approval to require targeted profiling of specific metabolites or classes of metabolites that may be relevant to the trait being developed or that are known to be present in the target species and to be potentially toxic if present at excessive concentrations. Under current regulatory requirements, substantial metabolic equivalence is assessed on the basis of concentrations of gross macromolecules (for example, protein or fiber), such nutrients as amino acids and sugars, and specific secondary metabolites that might be predicted to cause concern.

As with genomics, transcriptomics, and proteomics, the approaches collectively known as metabolomics have been developed to determine the nature and concentrations of all metabolites in a particular organism or tissue. It has been argued that such information should be required before a GE crop clears regulatory requirements for commercialization. However, in contrast with genomic and transcriptomic approaches, with which current sequencing technologies make it technically easy to assess DNA sequences and to measure relative concentrations of most or all transcripts in an organism, respectively, metabolomics as currently performed can provide useful data only on a subset of metabolites. That is because each metabolite is chemically different, whereas DNA and RNA comprise different orderings of just four nucleotide bases. Metabolites have to be separated, usually with gas chromatography or high-performance liquid chromatography; their nature and concentrations are then determined, usually with MS. The mass spectra are compared with a standard library of chemicals run on the same analytical system. The major problem for this type of metabolomic analysis of plants is the possession in the plant kingdom of large numbers of genus-specific or even species-specific natural products (see section “Comparing Genetically Engineered Crops and Their Counterparts” in Chapter 5 for discussion of plant natural products). Advanced commercial platforms for plant metabolomics currently measure about 200 identified compounds, usually within primary metabolism, and less broadly distributed natural products are poorly represented (Clarke et al., 2013). However, these approaches can differentiate a much larger number of distinct but unidentified metabolites, and it is useful to know whether concentrations of a metabolite are specifically affected in a GE crop even if the identity of the particular metabolite is not known.
For example, with a combination of separation platforms coupled to mass spectrometry, it was possible to resolve 175 unique identified metabolites and 1,460 peaks with no or imprecise metabolite annotation, together estimated to represent about 86 percent of the chemical diversity of tomato (Solanum lycopersicum) as listed in a publicly available database (Kusano et al., 2011). Although such an approach allows one to determine whether metabolite peaks are present in a GE crop but not in the non-GE counterpart or vice versa, metabolomics, in the absence of a completely defined metabolome for the target species in which the toxicity of all components is known, is not able to determine with confidence that a GE or non-GE plant does not contain any chemically identified molecule that is unexpected or toxic.

An alternative approach to nontargeted analysis of metabolites is to perform metabolic fingerprinting and rely on statistical tools to compare GE and non-GE materials. That does not necessarily require prior separation of metabolites and can use flow-injection electrospray ionization mass spectrometry (Enot et al., 2007) or nuclear magnetic resonance (NMR) spectroscopy (Baker et al., 2006; Ward and Beale, 2006; Kim et al., 2011). NMR spectroscopy is rapid and requires no separation but depends heavily on computational and statistical approaches to interpret spectra and evaluate differences.

Generally, with a few exceptions, metabolomic studies have concluded that the metabolomes of crop plants are affected more by environment than by genetics and that modification of plants with genetic engineering typically does not bring about off-target changes in the metabolome that would fall outside natural variation in the species. Baseline studies of the metabolomes (representing 156 metabolites in grain and 185 metabolites in forage) of 50 genetically diverse non-GE DuPont Pioneer commercial maize hybrids grown at six locations in North America revealed that the environment had a much greater effect on the metabolome (affecting 50 percent of the metabolites) than did the genetic background (affecting only 2 percent of the metabolites); the difference was more striking in forage samples than in grain samples (Asiago et al., 2012). Environmental factors were also shown to play a greater role than genetic engineering on the concentrations of most metabolites identified in Bt rice (Chang et al., 2012). In soybean, nontargeted metabolomics was used to demonstrate the dynamic ranges of 169 metabolites from the seeds of a large number of conventionally bred soybean lines representing the current commercial genetic diversity (Clarke et al., 2013). Wide variations in concentrations of individual metabolites were observed, but the metabolome of a GE line engineered to be resistant to the triketone herbicide mesotrione (which targets the carotenoid pathway that leads to photobleaching of sensitive plants) did not deviate with statistical significance from the natural variation in the current genetic diversity except in the expected changes in the targeted carotenoid pathway.
Similar metabolomic approaches led to the conclusion that a Monsanto Bt maize was substantially equivalent to conventionally bred maize if grown under the same environmental conditions (Vaclavik et al., 2013) and that carotenoid-fortified GE rice was more similar to its parental line than to other rice varieties (Kim et al., 2013). Those studies suggest that use of metabolomics for assessing substantial equivalence will require testing in multiple locations and careful analysis to differentiate genetic from environmental effects, especially because there will probably be effects of gene–environment interactions.

Some metabolomic and transcriptomic studies have suggested that transgene insertion or the tissue-culture process involved in regeneration of transformed plants can lead to “metabolic signatures” associated with the process itself (Kusano et al., 2011; Montero et al., 2011). That was reported for GE tomatoes with overproduction of the taste-modifying protein miraculin, although it was pointed out by the authors that, as in comparable studies with other GE crops, “the differences between the transgenic lines and the control were small compared to the differences observed between ripening stages and traditional cultivars” (Kusano et al., 2011).

For metabolomics to become a useful tool for providing enhanced safety assessment of a specific GE crop, it will be necessary to develop a chemical library that contains all potential metabolites present in the species under all possible environmental conditions. It is a daunting task that may be feasible for a few major commodity crops under currently occurring biotic and abiotic stresses, but even that would not necessarily cover future environmental conditions. Annotated libraries of metabolites are unlikely to be developed for minor crops in the near future.

The Epigenome

Background

Whereas the DNA sequence of a gene encodes the mRNA that is translated into the corresponding protein, the rate at which a gene in the nucleus of a eukaryotic cell is transcribed into mRNA can be heavily influenced by chemical modification of the DNA of the gene and by chemical modification of the proteins associated with the DNA. In plants and other eukaryotes, genomic nuclear DNA can be chemically modified and is bound to an array of proteins in a DNA–protein complex termed chromatin. The major proteins in chromatin are histone proteins, which have an important role in regulating the accessibility of the transcriptional machinery to the gene and its promoter (regulatory region) and thereby control synthesis of mRNAs and proteins. Multiple types of histone proteins are found in plants, each with an array of post-translational modification (for example, acetylation and methylation) that can affect transcriptional competence of a gene. DNA can also be covalently modified by methylation of cytosines that affect transcriptional competence. Collectively, those modifications, which influence the expression of genes and are inheritable over various time spans, are known as epigenetic marks.

Epigenetic marks are determinants of transcriptional competence, and alteration of the epigenetic state (which occurs naturally but infrequently) can alter expression profiles or patterns of target genes. For example, when a transposable element inserts in or near a gene, the gene can be “silenced” as regions near a transposon become highly methylated and transcriptionally suppressed owing to the activity of the cell's native RNA-mediated DNA methylation machinery. Different epigenetic marks occur naturally in crop species; examples of transposable element-mediated gene silencing include allelic variation at the tomato 2-methyl-6-phytylquinol methyltransferase gene involved in vitamin E biosynthesis (Quadrana et al., 2014) and imprinting as seen in endosperm tissue, in which differential insertion of transposable elements occurs in the maternal and paternal parents (Gehring et al., 2009).

Methods of Characterizing the Epigenome

Methods of characterizing the epigenome are available and improving rapidly. For DNA methylation, high-throughput, single-nucleotide resolution can be obtained through bisulfite sequencing (BS-seq; for review, see Feng et al., 2011; Krueger et al., 2012). BS-seq methods mirror those of genome resequencing except that the genomic DNA is first treated with bisulfite, which converts cytosines to uracils but does not affect 5-methyl-cytosine residues. As a consequence, nonmethylated cytosines will be detected as thymidines after the polymerase chain reaction step during epigenome-library construction. After sequencing, reads are aligned with a reference genome sequence, and nonmethylated cytosines are detected as SNPs and compared with a parallel library constructed from untreated DNA (see section above “Resequencing: Assessing Differences Between the Reference and Query Genome”; Figure 7-5). There are limitations of BS-seq approaches, such as incomplete conversion of cytosines, degradation of DNA, and an inability to assess the full methylome because of read mapping limitations, sequencing depth, and sequencing errors, as described above for resequencing. Another limitation is the dynamic nature of plant genome cytosine methylation. Plants derived from an identical parent that have not been subject to any traditional selection or GE transformation can have different epigenomes, an example of “epigenetic drift” (Becker et al., 2011). Thus, determining the epigenome of a plant at one specific point in time will not necessarily indicate the future epigenome of offspring of that plant.
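The logic of bisulfite sequencing can be illustrated with a short Python simulation. This is a sketch under strong simplifying assumptions: only one DNA strand is modeled, conversion is assumed to be complete, reads are error-free and already aligned, and the function names are hypothetical. It shows how a cytosine that still reads as C after treatment is inferred to be methylated, whereas a cytosine that reads as T is inferred to be nonmethylated.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand:
    nonmethylated C is read as T, 5-methyl-C remains C."""
    converted = []
    for i, base in enumerate(sequence):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)

def call_methylation(reference, treated_read):
    """For each reference cytosine, report True if the treated read
    still shows C (methylated) and False if it shows T (nonmethylated)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, treated_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
        # Any other base is left uncalled; it may be a true polymorphism
        # or a sequencing error rather than a conversion event.
    return calls

reference = "ACGTCCGA"
# Assume, for illustration, that only the cytosine at index 1 is methylated.
read = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, read)
print(read)
print(calls)
```

Because a genuine C-to-T polymorphism looks identical to a converted cytosine in this scheme, the comparison with a parallel library from untreated DNA is what distinguishes the two, and incomplete conversion would show up here as a false methylation call.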

Histone marks can be detected through chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq; for review, see Yamaguchi et al., 2014; Zentner and Henikoff, 2014). First, chromatin is isolated so that the proteins remain bound to the DNA. Then the DNA is sheared, and the DNA that is bound to specific histone proteins is selectively removed by using antibodies specific to each histone mark. The DNA bound to an antibody is then used to construct a library that is sequenced and aligned with a reference genome, and an algorithm is used to define the regions of the genome in which the histone mark is found. Sensitivity and specificity of ChIP-Seq depend heavily on the specificity of the histone-mark antibodies, on technical limitations in alignment of sequence reads with the reference genome, and on the overall quality of the reference genome itself. Also, the present state of understanding does not permit robust prediction of the effects of many epigenetic modifications on gene expression, and gene expression can be more thoroughly and readily assessed by transcriptomics.

Evaluation of Crop Plants Using -Omics Technologies

The -omics evaluation methods described above hold great promise for assessment of new crop varieties, both GE and non-GE. In a tiered regulatory approach (see Chapter 9), -omics evaluation methods could play an important role in a rational regulatory framework. For example, consider the introduction of a previously approved GE trait such as a Bt protein in a new variety of the same species. Having an -omics profile in a new GE variety that is comparable to the profile of a variety already in use should be sufficient to establish substantial equivalence (Figure 7-6, Tier 1). Furthermore, -omics analyses that reveal a difference that is understood to have no adverse health effects (for example, increased carotenoid content) should be sufficient for substantial equivalence (Figure 7-6, Tier 2).

FIGURE 7-6

Proposed tiered crop evaluation strategy using -omics technologies. SOURCE: Illustration by R. Amasino. NOTE: A tiered set of paths can be taken depending on the outcome of the various -omics technologies. In Tier 1, there are no differences between ...

The approach described above could also be used across species. For example, once it is established that production of a protein (such as a Bt protein) in one plant species poses no health risk, then the only potential health risk of Bt expression in another species is unintended off-target effects. -Omics analyses that reveal no differences (Figure 7-6, Tier 1) or in which revealed differences present no adverse health effects (Figure 7-6, Tier 2) in comparison with the previously deregulated GE crop or the range of variation found in cultivated, non-GE varieties of the same species provide evidence for substantial equivalence. As discussed in Chapter 5 (see section “Newer Methods for Assessing Substantial Equivalence”), there have been more than 60 studies in which -omics approaches were used to compare GE and non-GE varieties, and none of these studies found differences that were cause for concern.

There are also scenarios for which -omics analyses could indicate that further safety testing is warranted, such as if -omics analyses reveal a difference that is understood to have potential adverse health effects (for example, increased expression of genes responsible for glycoalkaloid synthesis) (Figure 7-6, Tier 3). Another scenario is if -omics analyses reveal a change of a protein or metabolite for which the consequences cannot be interpreted and are outside the range observed in GE and non-GE varieties of the crop (Figure 7-6, Tier 4). It is important to note that a Tier 4 scenario is not in and of itself an indication of a safety issue. The functions or health effects of consumption of many genes and corresponding RNAs, proteins, and metabolites in non-GE plants are not known. Furthermore, the chemical structure of many metabolites in plants that can be detected as “peaks” in various analytical systems is not known. Substantially more basic knowledge is needed before -omics datasets can be fully interpreted.
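The four tiers described above can be read as a simple decision rule, sketched below in Python. This is one possible reading of the text and not the committee's procedure: the `Difference` record, its field names, the treatment of differences that fall within the known range of variation as equivalent to no difference, and the choice to let a Tier 3 finding take precedence over a Tier 4 finding are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Difference:
    """One -omics difference between a new variety and its comparator."""
    name: str
    within_known_range: bool       # inside the range seen in existing varieties
    health_effect: Optional[str]   # "none", "potential_adverse", or None if unknown

def classify_tier(differences: List[Difference]) -> int:
    """Map a list of observed differences to a tier (1 to 4)."""
    # Assumption: differences inside the known range count as no difference.
    relevant = [d for d in differences if not d.within_known_range]
    if not relevant:
        return 1  # no differences: evidence of substantial equivalence
    # Assumption: an understood potential hazard outranks an unknown.
    if any(d.health_effect == "potential_adverse" for d in relevant):
        return 3  # understood potential adverse effect: further testing
    if any(d.health_effect is None for d in relevant):
        return 4  # consequences cannot be interpreted: further evaluation
    return 2      # differences understood to have no adverse health effects

examples = {
    "no differences": [],
    "more carotenoid": [Difference("carotenoid content", False, "none")],
    "glycoalkaloid genes up": [
        Difference("glycoalkaloid synthesis genes", False, "potential_adverse")
    ],
    "unknown peak": [Difference("unidentified metabolite peak", False, None)],
}
for label, diffs in examples.items():
    print(label, "-> Tier", classify_tier(diffs))
```

As the text notes, a Tier 4 result in such a scheme only flags a difference that cannot yet be interpreted; it is not in itself an indication of a safety issue.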

The state of the art of the different -omics approaches varies considerably. Advances in the efficiency of DNA-sequencing technology enable a complete genome or transcriptome to be sequenced at a cost that is modest on the scale of regulatory costs. Transcriptomics could play an important role in evaluation of substantial equivalence because it is relatively straightforward to generate and compare extensive transcriptomic data from multiple biological replicates of a new crop variety versus its already-in-use progenitor. As noted above, if no unexpected differences are found, this is evidence of substantial equivalence. It is possible that two varieties with equivalent transcriptomes have a difference in the level of a metabolite due to an effect of the product of a transgene on translation of a particular mRNA or on activity of a particular protein, but these are unlikely scenarios.

It is also straightforward and relatively low in cost to generate genome-sequence data from many individuals from a new GE or non-GE variety to determine which lineage has the fewest nontarget changes to its genome. As noted earlier in the chapter, mutagenesis, although currently classified as conventional breeding, can result in extensive changes to the genome; thus, generating DNA sequence data will be useful in evaluating varieties produced by this method.

Metabolomic and proteomic techniques cannot presently provide a complete catalog of the metabolome or proteome. Nevertheless, these -omics approaches can play a role in assessment. For example, a similar metabolome or proteome in a new variety compared to an existing variety provides supporting evidence of substantial equivalence, whereas a difference can indicate that further evaluation may be warranted.

The most thorough evidence of substantial equivalence would result from a complete knowledge of the biochemical constituents of one crop variety compared to other varieties. As noted above, that is not possible with present techniques for the proteome and metabolome. However, looking to the future, an increasing knowledge base of plant biochemistry will translate into fewer analyses that result in a Tier 4 situation, and basic research in plant biochemistry will continue to expand the knowledge base that will enable the thorough and rational evaluation of new crop varieties; basic research will also expand fundamental understanding of basic biological processes in plants and thus enable advances in molecular plant breeding.

FINDING: Application of -omics technologies has the potential to reveal the extent of modifications of the genome, the transcriptome, the epigenome, the proteome, and the metabolome that are attributable to conventional breeding, somaclonal variation, and genetic engineering. Full realization of the potential of -omics technologies to assess substantial equivalence would require the development of extensive species-specific databases, such as the range of variation in the transcriptome, proteome, and metabolome in a number of genotypes grown in diverse environmental conditions. Although it is not yet technically feasible to develop extensive species-specific metabolome or proteome databases, genome sequencing and transcriptome characterization can be performed.

RECOMMENDATION: To realize the potential of -omics technologies to assess intended and unintended effects of new crop varieties on human health and the environment and to improve the production and quality of crop plants, a more comprehensive knowledge base of plant biology at the systems level (DNA, RNA, protein, and metabolites) should be constructed for the range of variation inherent in both conventionally bred and genetically engineered crop species.