How to filter out sequences based on a given data file using Python?


I want to filter out sequences that I don't want, based on a given file, a.fasta. The original file contains all the sequences. A FASTA file is a file in which each record starts with a sequence ID, followed by the nucleotides represented as A, T, C, G. Can anyone help me?

a.fasta

>chr12:15747942-15747949
tgacatca
>chr2:130918058-130918065
tgacctca

original.fasta

>chr3:99679938-99679945
tgacgtaa
>chr9:135822160-135822167
tgacctca
>chr12:15747942-15747949
tgacatca
>chr2:130918058-130918065
tgacctca
>chr2:38430457-38430464
tgacctca
>chr1:112381724-112381731
tgacatca

expected output c.fasta

>chr3:99679938-99679945
tgacgtaa
>chr9:135822160-135822167
tgacctca
>chr2:38430457-38430464
tgacctca
>chr1:112381724-112381731
tgacatca

code

import sys
import warnings

from Bio import SeqIO
from Bio import BiopythonDeprecationWarning

warnings.simplefilter('ignore', BiopythonDeprecationWarning)

fasta_file = sys.argv[1]   # input FASTA file
remove_file = sys.argv[2]  # input wanted file, 1 gene name per line
result_file = sys.argv[3]  # output FASTA file

# Collect every non-empty line of the remove file
remove = set()
with open(remove_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            remove.add(line)

fasta_sequences = SeqIO.parse(fasta_file, 'fasta')

with open(result_file, "w") as f:
    for seq in fasta_sequences:
        nuc = str(seq.seq)  # str() replaces the old Seq.tostring() call
        # Note: this compares the nucleotide string, not the sequence ID
        if nuc not in remove and len(nuc) > 0:
            SeqIO.write([seq], f, "fasta")

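The code above also filters out repeated sequences, but I want to keep repeated sequences if they appear in the expected output. For example, chr9:135822160-135822167 has the same sequence (tgacctca) as a record in a.fasta, yet it should be kept because its ID is not listed there.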

Take a look at Biopython. Here is a solution using it:

from Bio import SeqIO

input_file = 'a.fasta'
merge_file = 'original.fasta'
output_file = 'results.fasta'

# Collect the IDs of the records that should be excluded
exclude = set()
for fasta in SeqIO.parse(input_file, 'fasta'):
    exclude.add(fasta.id)

# Write every record whose ID is not in the exclusion set
with open(output_file, 'w') as output_handle:
    for fasta in SeqIO.parse(merge_file, 'fasta'):
        if fasta.id not in exclude:
            SeqIO.write([fasta], output_handle, "fasta")
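
If Biopython is not available, the same ID-based filtering can be sketched with the standard library alone. This is only a minimal sketch, not part of the original answer: the helpers read_fasta, record_id and filter_by_id are hypothetical names introduced here, and the ID is assumed to be the first whitespace-separated word of the header line, which is how Biopython's record.id behaves for FASTA input. The example data is the content of a.fasta and original.fasta from the question, held in memory instead of read from disk.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Return the first word of a header (assumed to be the sequence ID)."""
    parts = header.split()
    return parts[0] if parts else ""


def filter_by_id(original_lines, exclude_lines):
    """Keep records from original_lines whose ID is absent from exclude_lines."""
    exclude = {record_id(h) for h, _ in read_fasta(exclude_lines)}
    return [(h, s) for h, s in read_fasta(original_lines)
            if record_id(h) not in exclude]


# Example data copied from the question
a_fasta = """>chr12:15747942-15747949
tgacatca
>chr2:130918058-130918065
tgacctca
""".splitlines()

original_fasta = """>chr3:99679938-99679945
tgacgtaa
>chr9:135822160-135822167
tgacctca
>chr12:15747942-15747949
tgacatca
>chr2:130918058-130918065
tgacctca
>chr2:38430457-38430464
tgacctca
>chr1:112381724-112381731
tgacatca
""".splitlines()

for header, seq in filter_by_id(original_fasta, a_fasta):
    print(f">{header}\n{seq}")
```

Because the comparison uses the ID and never the nucleotide string, records that share a sequence with an excluded record (such as the two remaining tgacctca entries) are kept. To work on real files, open file objects can be passed in place of the in-memory line lists.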
