Awk Help > Can Pay Fg
Nov 7 2017 06:36pm
I'm trying to use awk to get an output like:
Transcript_65_23 GO:0008270 GO:0005515
Transcript_49_6 GO:0005089
Transcript_8_17 GO:0003676|GO:0005524
Transcript_28_6 GO:0047262

From:
Transcript_65_23 4ba426a2971a784a80baf5a353f58c7d 815 Pfam PF04434 SWIM zinc finger 684 717 2.7E-9 T 04-11-2017 IPR007527 Zinc finger, SWIM-type GO:0008270
Transcript_65_23 4ba426a2971a784a80baf5a353f58c7d 815 SMART SM00666 54 137 1.1E-17 T 04-11-2017 IPR000270 PB1 domain GO:0005515
Transcript_65_23 4ba426a2971a784a80baf5a353f58c7d 815 SMART SM00575 693 720 4.0E-10 T 04-11-2017 IPR006564 Zinc finger, PMZ-type GO:0008270
Transcript_65_23 4ba426a2971a784a80baf5a353f58c7d 815 Pfam PF03108 MuDR family transposase 235 299 2.6E-28 T 04-11-2017 IPR004332 Transposase, MuDR, plant
Transcript_65_23 4ba426a2971a784a80baf5a353f58c7d 815 PANTHER PTHR31973 36 811 0.0 T 04-11-2017
Transcript_65_23 4ba426a2971a784a80baf5a353f58c7d 815 Pfam PF00564 PB1 domain 59 127 1.4E-11 T 04-11-2017 IPR000270 PB1 domain GO:0005515
Transcript_65_23 4ba426a2971a784a80baf5a353f58c7d 815 Pfam PF10551 MULE transposase domain 430 524 6.0E-15 T 04-11-2017 IPR018289 MULE transposase domain
Transcript_65_23 4ba426a2971a784a80baf5a353f58c7d 815 PANTHER PTHR31973:SF12 36 811 0.0 T 04-11-2017
Transcript_49_6 9eb5dbdee729819a9712cec3b2c7339b 32 PANTHER PTHR33101 1 32 3.0E-12 T 04-11-2017
Transcript_49_6 9eb5dbdee729819a9712cec3b2c7339b 32 PANTHER PTHR33101:SF1 1 32 3.0E-12 T 04-11-2017
Transcript_49_6 9eb5dbdee729819a9712cec3b2c7339b 32 Pfam PF03759 PRONE (Plant-specific Rop nucleotide exchanger) 1 32 8.6E-11 T 04-11-2017 IPR005512 PRONE domain GO:0005089
Transcript_8_17 011ac9f705e374501d0a28836d5f0fc5 497 PANTHER PTHR24031 7 451 9.3E-230 T 04-11-2017
Transcript_8_17 011ac9f705e374501d0a28836d5f0fc5 497 SMART SM00490 239 319 9.6E-31 T 04-11-2017 IPR001650 Helicase, C-terminal
Transcript_8_17 011ac9f705e374501d0a28836d5f0fc5 497 Pfam PF00271 Helicase conserved C-terminal domain 217 319 3.3E-30 T 04-11-2017 IPR001650 Helicase, C-terminal
Transcript_8_17 011ac9f705e374501d0a28836d5f0fc5 497 SMART SM00487 2 202 2.5E-49 T 04-11-2017 IPR014001 Helicase superfamily 1/2, ATP-binding domain
Transcript_8_17 011ac9f705e374501d0a28836d5f0fc5 497 Pfam PF00270 DEAD/DEAH box helicase 8 171 3.2E-45 T 04-11-2017 IPR011545 DEAD/DEAH box helicase domain GO:0003676|GO:0005524
Transcript_8_17 011ac9f705e374501d0a28836d5f0fc5 497 PANTHER PTHR24031:SF326 7 451 9.3E-230 T 04-11-2017
Transcript_28_6 a8df5bea5d69be37055f5af229bbb61a 118 PANTHER PTHR32116:SF7 33 113 7.4E-16 T 04-11-2017
Transcript_28_6 a8df5bea5d69be37055f5af229bbb61a 118 PANTHER PTHR32116 33 113 7.4E-16 T 04-11-2017 IPR029993 Plant galacturonosyltransferase GAUT GO:0047262

Each transcript should be listed only once, and a transcript's row can't contain the same GO:000XXXX value twice. (For example, Transcript_65_23 has two copies each of GO:0008270 and GO:0005515 in the input, but each value should be printed only once on its line.)
If anybody could show me some code that would do some or all of this, with some basic explanation, I would be so appreciative and I can give FG for your time.
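
One way this could be done is sketched below. It is a minimal sketch, not a definitive answer, and it rests on a few assumptions: every record sits on a single line, fields are separated by whitespace as in the pasted sample, and the GO terms (when a line has any) are always the last field. The function name `collect_go` and the file name `interpro.tsv` are made up for illustration. The sketch also splits pipe-joined terms such as GO:0003676|GO:0005524 so each one is deduplicated on its own, which means they come out space-separated rather than pipe-joined as in the sample output for Transcript_8_17.

```shell
# Hypothetical helper: print each transcript once, followed by its unique
# GO terms in the order they first appear in the input.
collect_go() {
  awk '
    {
      id = $1

      # Remember the order in which transcripts first appear.
      if (!(id in seen_id)) {
        seen_id[id] = 1
        order[++n] = id
      }

      # Assumption: GO terms, when present, are the last field,
      # and several terms may be joined with a pipe character.
      if ($NF ~ /^GO:/) {
        k = split($NF, terms, "[|]")
        for (i = 1; i <= k; i++) {
          # Only append a term the first time it is seen for this transcript.
          if (!((id, terms[i]) in seen_go)) {
            seen_go[id, terms[i]] = 1
            go[id] = go[id] " " terms[i]
          }
        }
      }
    }
    END {
      # One line per transcript: the ID, then its collected GO terms.
      for (j = 1; j <= n; j++) print order[j] go[order[j]]
    }
  ' "$@"
}

# Usage (interpro.tsv is a placeholder for the real input file):
#   collect_go interpro.tsv > go_per_transcript.txt
```

The `seen_id` array keeps every transcript to a single output line, and `seen_go` is keyed on the transcript and the term together, so a term repeated for one transcript is skipped while the same term can still appear under a different transcript. A transcript with no GO terms at all is printed as a bare ID. If the pipe-joined form should be kept as-is, the `split` loop could be replaced by treating `$NF` as a single term. If the real file is tab-separated with further columns after the GO column, the `$NF` test would need to point at the GO column instead.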