Perl script to parse a FastA file?

Zuko435

New member
I need a Perl script that will parse through a FastA file generated by NCBI Protein and return the following:

1. Gene name
2. Sequence
3. Host organism

Does anyone have a prototype that does something similar?
 
Something like this will work; the details depend on how you are storing the info. I like hashes, so I put it in a hash.

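use strict;
use warnings;

my (%genomes, $genename, $species);
my $sequence = '';

open(my $seq_fh, '<', 'proteinfile.fasta') or die "Cannot open proteinfile.fasta: $!";
while (my $line = <$seq_fh>) {
    chomp $line;
    if ($line =~ /^>/) {
        # A new header: save the record collected so far (none before the first header)
        store_record() if defined $genename;
        # Replace "some regex" with a pattern that has two capture groups
        ($genename, $species) = $line =~ /some regex/;
        $sequence = '';
    }
    else {
        # Sequence line: append its residues to the current record
        my ($seq) = $line =~ /([A-Z]+)/i;
        $sequence .= $seq if defined $seq;
    }
}
close $seq_fh;
store_record() if defined $genename;    # the last record has no header after it

# Store the current record under a combined "genename_species" key
sub store_record {
    $genomes{"${genename}_${species}"} = {
        genename => $genename,
        species  => $species,
        sequence => $sequence,
    };
}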

This job is so short that I would not take the trouble of learning how to deal with a CPAN module. Just write it already.
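For the "some regex" part, here is a minimal sketch. It assumes the headers look like ">ACCESSION description [Organism name]", which is my assumption about the NCBI Protein export and is not confirmed anywhere in this thread, so check it against your own file. The sub names parse_header and parse_fasta and the sample record are made up for illustration, and the sample is read from an in-memory string so the snippet runs without a file on disk.

```perl
use strict;
use warnings;

# Assumed header layout (check against your own file):
#   >ACCESSION description [Organism name]
sub parse_header {
    my ($header) = @_;
    # The last [...] on the line is taken as the organism,
    # the text between the accession and that bracket as the gene/protein name.
    if ($header =~ /^>(\S+)\s+(.*?)\s*\[([^\[\]]+)\]\s*$/) {
        return ($1, $2, $3);
    }
    return;    # header does not follow the assumed layout
}

# Read FASTA records from an open filehandle into a list of hash references.
sub parse_fasta {
    my ($fh) = @_;
    my (@records, $current);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;              # tolerate Windows line endings
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>/) {
            my ($id, $name, $species) = parse_header($line);
            $current = {
                id       => $id,
                genename => $name,
                species  => $species,
                sequence => '',
            };
            push @records, $current;
        }
        elsif ($current) {
            $line =~ s/\s+//g;         # drop stray whitespace inside the sequence
            $current->{sequence} .= $line;
        }
    }
    return @records;
}

# Made-up sample record, read from an in-memory string instead of a file.
my $sample = <<'END';
>XP_000001.1 example protein [Example organism]
MKTAYIAKQR
QISFVKSHFS
END

open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my @records = parse_fasta($fh);
close $fh;

# Print name, organism and sequence length for each record.
for my $rec (@records) {
    print join("\t", $rec->{genename}, $rec->{species}, length $rec->{sequence}), "\n";
}
```

To run it on a real file, open "proteinfile.fasta" instead of \$sample and pass that filehandle to parse_fasta. Headers that do not match the assumed layout come back with undefined name and species, so those records are easy to spot and the regex can be adjusted.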
 