[BioPython] genbank annotation (python.bio.general, msg#00009)
Hi!

I have two GenBank genome files: one of the old kind, where each region is noted twice, and one where each region appears only once. What I would like to extract from them is the feature information, in this format:

    Type start stop direction name

In the first file, where almost all regions are noted twice, I'd like only one of each pair to be included in the list. Biopython has a GenBank parser that I'd like to use, but I cannot figure out how to use it to do this.

The files:

The first:

     source          1..2944528
                     /organism="Listeria monocytogenes"
                     /mol_type="genomic DNA"
                     /strain="EGD-e"
                     /db_xref="taxon:1639"
     gene            305..1673
                     /gene="dnaA"
     RBS             305..310
     CDS             318..1673
                     /codon_start=1
                     /transl_table=11
                     /product="Chromosomal replication initiation protein DnaA"
                     /protein_id="CAC98216.1"
                     /db_xref="GI:16409360"
                     /db_xref="GOA:Q8YAW2"
                     /db_xref="UniProt/Swiss-Prot:Q8YAW2"
                     /translation="MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIIS
                     APNNFVRDWLEKSYTQFIANILQEITGRLFDVRFIDGEQEENFEYTVIKPNPALDEDG
                     IEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPAKAYNPLFIYGGVGLGKTHLM
                     HAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVLLIDDIQFL
                     AGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPD
                     LETRIAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITA
                     GLAAEALKDIIPSSKSQVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYL
                     SRELTDASLPKIGDEFGGRDHTTVIHAHEKISQLLKTDQVLKNDLAEIEKNLRKAQNM
                     F"
     gene            1856..3062
                     /gene="dnaN"
     RBS             1856..1860
     CDS             1867..3012
                     /codon_start=1
                     /transl_table=11
                     /product="DNA polymerase III, beta chain"
                     /protein_id="CAC98217.1"
                     /db_xref="GI:16409361"
                     /db_xref="GOA:Q8YAW1"
                     /db_xref="UniProt/TrEMBL:Q8YAW1"
                     /translation="MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTL
                     TGSDSDISIEAFIPLIENDEVIVEVESFGGIVLQSKYFGDIVRRLPEENVEIEVTSNY
                     QTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPINVLKNIVRQTVFAVSAIEVRP
                     VLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSELNKLLDDA
                     SESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAID
                     RASLLARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKY
                     MMDALRAFEGDDIQISFSGTMRPFVLRPKDAANPNEILQLITPVRTY"

The second:
     source          1..4214630
                     /strain=168
                     /organism="Bacillus subtilis subsp. subtilis str. 168"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:224308"
     CDS             410..1750
                     /function="initiation of chromosome replication (DNA synthesis)"
                     /gene="dnaA"
                     /protein_id="CAB11777.1"
                     /locus_tag="BSU00010"
                     /transl_table=11
                     /translation="MENILDLWNQALAQIEKKLSKPSFETWMKSTKAHSLQGDTLTIT
                     APNEFARDWLESRYLHLIADTIYELTGEELSIKFVIPQNQDVEDFMPKPQVKKAVKED
                     TSDFPQNMLNPKYTFDTFVIGSGNRFAHAASLAVAEAPAKAYNPLFIYGGVGLGKTHL
                     MHAIGHYVIDHNPSAKVVYLSSEKFTNEFINSIRDNKAVDFRNRYRNVDVLLIDDIQF
                     LAGKEQTQEEFFHTFNTLHEESKQIVISSDRPPKEIPTLEDRLRSRFEWGLITDITPP
                     DLETRIAILRKKAKAEGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLINKDIN
                     ADLAAEALKDIIPSSKPKVITIKEIQRVVGQQFNIKLEDFKAKKRTKSVAFPRQIAMY
                     LSREMTDSSLPKIGEEFGGRDHTTVIHAHEKISKLLADDEQLQQHVKEIKEQLK"
                     /db_xref="GOA:P05648"
                     /db_xref="SUBTILIS:BG10065"
                     /db_xref="SWISS-PROT:P05648"
                     /note="alternate gene name: dnaH, dnaJ, dnaK"
     CDS             1939..3075
                     /locus_tag="BSU00020"
                     /transl_table=11
                     /translation="MKFTIQKDRLVESVQDVLKAVSSRTTIPILTGIKIVASDDGVSF
                     TGSDSDISIESFIPKEEGDKEIVTIEQPGSIVLQARFFSEIVKKLPMATVEIEVQNQY
                     LTIIRSGKAEFNLNGLDADEYPHLPQIEEHHAIQIPTDLLKNLIRQTVFAVSTSETRP
                     ILTGVNWKVEQSELLCTATDSHRLALRKAKLDIPEDRSYNVVIPGKSLTELSKILDDN
                     QELVDIVITETQVLFKAKNVLFFSRLLDGNYPDTTSLIPQDSKTEIIVNTKEFLQAID
                     RASLLAREGRNNVVKLSAKPAESIEISSNSPEIGKVVEAIVADQIEGEELNISFSPKY
                     MLDALKVLEGAEIRVSFTGAMRPFLIRTPNDETIVQLILPVRTY"
                     /product="DNA polymerase III (beta subunit)"
                     /function="DNA synthesis"
                     /gene="dnaN"
                     /EC_number="2.7.7.7"
                     /protein_id="CAB11778.1"
                     /db_xref="GOA:P05649"
                     /db_xref="SUBTILIS:BG10066"
                     /db_xref="SWISS-PROT:P05649"
                     /note="alternate gene name: dnaG, dnaK"

Karin

--
Karin Lagesen, PhD student
karin.lagesen@xxxxxxxxxxxxxx
http://www.cmbn.no/rognes/

_______________________________________________
BioPython mailing list - BioPython@xxxxxxxxxxxxx
http://biopython.org/mailman/listinfo/biopython
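A minimal sketch of one way to produce the "Type start stop direction name" list with Biopython's `Bio.SeqIO` GenBank parser. It assumes a recent Biopython release (it reads `feature.location.strand`) and assumes that a region "noted twice" means a gene feature with a CDS nested inside it on the same strand, as with dnaA and dnaN in the first file. The helper names, the script name `genbank_features.py` and the `?` placeholder for unnamed features are hypothetical choices made for this sketch, not part of Biopython.

```python
from typing import Iterable, List, NamedTuple, Optional


class FeatureRow(NamedTuple):
    """One output line: Type start stop direction name."""
    type: str
    start: int      # 1-based, as written in the GenBank file
    stop: int       # 1-based, inclusive
    direction: str  # "+", "-" or "?"
    name: str


def strand_symbol(strand: Optional[int]) -> str:
    # Biopython stores strand as 1 (forward), -1 (reverse) or None (unknown).
    return {1: "+", -1: "-"}.get(strand, "?")


def record_rows(record, types=("gene", "CDS")) -> List[FeatureRow]:
    """Turn the features of one parsed SeqRecord into FeatureRows."""
    rows = []
    for feature in record.features:
        if feature.type not in types:
            continue
        loc = feature.location
        quals = feature.qualifiers  # dict mapping qualifier name -> list of values
        # Hypothetical fallback: use the locus tag, then "?", when there is no /gene.
        name = (quals.get("gene") or quals.get("locus_tag") or ["?"])[0]
        # Biopython locations are 0-based and end-exclusive, so add 1 to the
        # start to get back the coordinates printed in the file.
        rows.append(FeatureRow(feature.type, int(loc.start) + 1, int(loc.end),
                               strand_symbol(loc.strand), name))
    return rows


def collapse_gene_cds_pairs(rows: Iterable[FeatureRow]) -> List[FeatureRow]:
    """Drop every CDS that lies inside a gene feature on the same strand.

    Assumption: a region that is "noted twice" is a gene with a CDS nested
    inside it. The gene row is kept because it carries the /gene name. CDS
    rows with no enclosing gene (e.g. a CDS-only file) pass through unchanged.
    """
    rows = list(rows)
    genes = [r for r in rows if r.type == "gene"]
    kept = []
    for row in rows:
        nested = row.type == "CDS" and any(
            g.direction == row.direction and g.start <= row.start and row.stop <= g.stop
            for g in genes
        )
        if not nested:
            kept.append(row)
    return kept


def format_row(row: FeatureRow) -> str:
    # Tab-separated: Type, start, stop, direction, name.
    return "\t".join([row.type, str(row.start), str(row.stop), row.direction, row.name])


def main(paths: List[str]) -> None:
    from Bio import SeqIO  # Biopython is only needed for the actual parsing

    for path in paths:
        # A file may hold several records (e.g. chromosome plus plasmids);
        # collapse pairs within each record so coordinates are never mixed.
        for record in SeqIO.parse(path, "genbank"):
            for row in collapse_gene_cds_pairs(record_rows(record)):
                print(format_row(row))


if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print("usage: genbank_features.py FILE.gbk [FILE.gbk ...]", file=sys.stderr)
    else:
        main(sys.argv[1:])
```

It would be run as `python genbank_features.py first.gbk second.gbk` (hypothetical file names). On the dnaA/dnaN excerpt of the first file the nested CDS features are dropped and the two gene rows remain; the second file has no gene features, so its CDS rows are printed with their /gene names. RBS and other feature types can be included through the `types` argument, and if the CDS coordinates are preferred, the rule can be inverted to drop the enclosing gene instead. The containment check is quadratic per record, which is acceptable for a few thousand genes but would call for an interval index on much larger annotations.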