I need to shorten the header of each sequence in a multi-FASTA file. The headers and sequences are formatted as follows:
>IPD:MHC00660 Pacy-DQA1*0102
TCTTACGGTTTCTCTGGCCAGTTCACCCATGAATTCGATGGAGACGAGCAGTTCTACGTGGACCTGGAGAGGAAGGAGACGGCCTGGCGATGGCCTGAGTTAAGCAAATTTGGAGGTTTTGACCCGCAGGGTGCACTGAGAAACTTGGC
There are many of these records, so I need to go through each header and shorten it.
For instance, I need to go from: >HLA:HLA00608 DQA1*03:01:01 768 bp
to: DQA1*03:01:01
What kind of regular expression can I use to delete the unwanted parts? Or should I not be using a regular expression at all?
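
To make the goal concrete, here is a minimal sketch in Python of one way a regular expression could do this. It assumes the name to keep is always the second whitespace-separated field of the header, and that the leading ">" should stay so the result is still valid FASTA; both are assumptions, not requirements stated above. The names `HEADER_RE`, `shorten_header`, and `shorten_fasta` are made up for illustration.

```python
import re

# Header line: '>' plus the database ID, whitespace, then the allele name
# (second field). Anything after the second field is ignored.
HEADER_RE = re.compile(r"^>\S+\s+(\S+)")


def shorten_header(line):
    """Reduce a FASTA header to '>' + its second field.

    Sequence lines, and headers with only one field, are returned unchanged.
    """
    match = HEADER_RE.match(line)
    if match:
        # Assumption: keep the '>' so the output is still FASTA.
        return ">" + match.group(1)
    return line


def shorten_fasta(lines):
    """Apply shorten_header to every line, stripping trailing newlines."""
    return [shorten_header(line.rstrip("\n")) for line in lines]


if __name__ == "__main__":
    example = [
        ">HLA:HLA00608 DQA1*03:01:01 768 bp",
        "TCTTACGGTTTCTCTGGCC",
        ">IPD:MHC00660 Pacy-DQA1*0102",
        "TCTTACGGTTTCTCTGGCC",
    ]
    for line in shorten_fasta(example):
        print(line)
```

If the ">" should be dropped as well, returning `match.group(1)` alone would do that. A plain `line.split()[1]` on header lines would likely work just as well as the regular expression under the same second-field assumption.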