How do you, in Perl, truncate multiple header lines?

andr0x

New member
I don't know Perl, but I do know regular expressions (a bit; I'm not sure anyone fully understands regex). Keep in mind that Perl has many great features for regular expressions and there may be an easier "Perl" way of doing this, but here is a regex to match what you need:

/^\S+\s(DQA1\*\S+)/

Now, I don't know how your headers work. This will match the token beginning with DQA1* that follows the first space, all the way to the next space. You can then either replace the old string with the match or write a new file with only the matches, whichever is easier in Perl. If, say, you want the matches to include not only DQA1 but also that Pacy-DQA1, then change the expression in the parentheses to
(\S*DQA1\*\S+)

Of course, if you have items that begin with something other than DQA1, this won't work.
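For what it's worth, here is a minimal sketch of how that second pattern could be applied in Perl. The helper name `shorten_header` is made up for illustration, and keeping the leading `>` (so the result is still valid FASTA) is my assumption; drop it from the return value if you really want the bare name. Lines that don't match, including the sequence lines, are passed through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: shorten one FASTA header line.
# Returns the line unchanged if it is not a header containing "DQA1*".
sub shorten_header {
    my ($line) = @_;
    # Skip the ">ID" token and the whitespace after it, then capture the
    # next token that contains "DQA1*", with any prefix such as "Pacy-".
    if ($line =~ /^>\S+\s+(\S*DQA1\*\S+)/) {
        return ">$1";    # assumption: keep ">" so the file stays FASTA
    }
    return $line;
}

# Sample lines taken from the question (sequence shortened for the demo).
my @lines = (
    '>IPD:MHC00660 Pacy-DQA1*0102',
    'TCTTACGGTTTCTCTGGCCAGTTCACC',
    '>HLA:HLA00608 DQA1*03:01:01 768 bp',
);

print shorten_header($_), "\n" for @lines;
```

To run it over a real file, the sample list could be swapped for a loop such as `while (my $line = <>) { chomp $line; print shorten_header($line), "\n"; }` and the output redirected to a new file.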

Here's a great, easy introduction to regular expressions. It's written for the Python programming language, but the regex syntax it covers will work in Perl.

http://code.google.com/edu/languages/google-python-class/regular-expressions.html
 
I need to shorten the header for each sequence in a multi-FASTA file. The format of the sequences and headers is as follows:

>IPD:MHC00660 Pacy-DQA1*0102
TCTTACGGTTTCTCTGGCCAGTTCACCCATGAATTCGATGGAGACGAGCAGTTCTACGTGGACCTGGAGAGGAAGGAGACGGCCTGGCGATGGCCTGAGTTAAGCAAATTTGGAGGTTTTGACCCGCAGGGTGCACTGAGAAACTTGGC

There are many of these, so I just need to go through each header and shorten it.
For instance I need to go from: >HLA:HLA00608 DQA1*03:01:01 768 bp
to: DQA1*03:01:01

What kind of regular expression can I use to delete things? Or should I not be using a regular expression at all?
 