Perl script to parse a FastA file?

Zuko435 · Jul 1, 2011

I need a Perl script that will parse through a FastA file generated by NCBI Protein and return the following:

1. Gene name
2. Sequence
3. Host organism

Does anyone have any prototype that does something similar to that?

swbarnes2 · Jul 2, 2011

Something like this will work; the details depend on how you are storing the info. I like hashes, so I put it in a hash.

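use strict;
use warnings;

my %genomes;
open(my $seq_fh, '<', 'proteinfile.fasta') or die "Dang! $!";

# The first line of the file is the header of the first record.
my $name = <$seq_fh>;
my ($genename, $species) = $name =~ /some regex/;   # put your own header pattern here
my $sequence = '';

while (<$seq_fh>) {
    if (/^>/) {
        # New header: store the record just finished, then start the next one.
        $genomes{"${genename}_${species}"} = {
            genename => $genename,
            species  => $species,
            sequence => $sequence,
        };
        $sequence = '';
        ($genename, $species) = /some regex/;
    }
    else {
        # Sequence line: keep only the letters, which also drops the newline.
        my ($seq) = /([A-Z]+)/i;
        $sequence .= $seq if defined $seq;
    }
}
close $seq_fh;

# No header follows the last record, so store it after the loop.
$genomes{"${genename}_${species}"} = {
    genename => $genename,
    species  => $species,
    sequence => $sequence,
};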

This job is so short that I would not take the trouble to learn a CPAN module. Just write it already.
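The header pattern is left open above, so here is one possible way to fill it in. This is only a sketch and it rests on an assumption about the header layout: that each header looks like `>ID description [Organism]`, with the host organism in the last pair of square brackets. The sub name `parse_fasta`, the hash keys, and the sample records are made up for illustration. Note that headers of this kind usually carry a protein description rather than a gene symbol, so `name` below is simply whatever text sits between the ID and the bracketed organism.

```perl
use strict;
use warnings;

# Parse FASTA records from an open filehandle.
# Returns a list of hash references with keys: id, name, species, sequence.
sub parse_fasta {
    my ($fh) = @_;
    my (@records, $current);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;          # strip the newline (also handles \r\n)
        next if $line eq '';
        if ($line =~ /^>(\S+)\s*(.*)$/) {
            my ($id, $description) = ($1, $2);
            my ($name, $species) = ($description, '');
            # Assumption: the organism is the last [bracketed] part of the header.
            if ($description =~ /^(.*?)\s*\[([^\[\]]+)\]$/) {
                ($name, $species) = ($1, $2);
            }
            $current = { id => $id, name => $name, species => $species, sequence => '' };
            push @records, $current;
        }
        elsif ($current) {
            $line =~ s/\s+//g;       # drop any whitespace inside the sequence line
            $current->{sequence} .= $line;
        }
    }
    return @records;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = <<'END_FASTA';
>XP_000001.1 example protein A [Examplus organismus]
MKTAYIAKQR
QISFVKSHFS
>XP_000002.1 example protein B
GGGAAA
END_FASTA

open(my $sample_fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my @sample_records = parse_fasta($sample_fh);
close $sample_fh;

printf "%s\t%s\t%s\n", $_->{name}, $_->{species}, $_->{sequence} for @sample_records;
```

To run it on a real file, open `proteinfile.fasta` instead of the in-memory string and pass that filehandle to `parse_fasta`. Headers without a bracketed organism come back with an empty `species`, so it is worth checking a few records by eye before trusting the pattern.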

martinthurn · Jul 6, 2011

DO NOT REINVENT THE WHEEL
Follow this link for pre-existing Perl code for FASTA.
