I made a couple of assumptions to come up with a way to split the files by gene.
Assumptions:
You are working with plain .txt files
You want the newly organized genes in plain .txt files (there are a million other things you might want, but I figured this was the simplest)
Rather than doing this in bash, which is far from my strong suit, I wrote a simple Ruby script that reads the raw files from one folder, parses them by gene, and writes the new files to a new folder. Because the output files are opened in append mode, you end up with only one file for each gene.
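path = 'genes'

Dir.foreach(path) do |filename|
  # Skip directory entries and macOS metadata
  next if filename == '.' || filename == '..' || filename == '.DS_Store'

  gene_file = nil
  header = true # the first non-empty line of each record is its header
  puts "working on #{filename}"
  file = File.open("#{path}/#{filename}", 'r')
  file.each_line do |line|
    puts line
    if line.empty? || line == "\n"
      # A blank line ends the current record; the next line is a header
      puts "line empty"
      header = true
      gene_file&.close
    elsif header
      # Derive the output file name from the header and append to it
      puts "opening new file"
      gene_file = File.open("new_gene_files/#{line.gsub(/_id\d/, '').gsub('>', '').strip}.txt", 'a')
      puts "adding to file"
      gene_file.puts line
      header = false
    else
      puts "adding to file"
      gene_file&.puts line
    end
  end
  # Close the last gene file in case the input has no trailing blank line
  gene_file&.close
  file.close
end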
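To make the file naming a bit more concrete, here is a small sketch of how a header line gets turned into an output file name. The header format (`>BRCA1_id1`) and the helper name `gene_file_name` are my own assumptions for illustration, so your headers may look different. It also creates the `new_gene_files` folder up front, since `File.open` in append mode does not create a missing directory.

```ruby
require 'fileutils'

# Hypothetical helper (not part of the script): applies the same gsub chain
# the script uses to build a file name from a header line.
def gene_file_name(header_line)
  header_line.gsub(/_id\d/, '').gsub('>', '').strip + '.txt'
end

# Make sure the output folder exists before the script tries to write to it.
FileUtils.mkdir_p('new_gene_files')

puts gene_file_name(">BRCA1_id1\n") # prints "BRCA1.txt"
```

One thing to watch: `/_id\d/` matches a single digit, so a header ending in `_id12` would leave a stray `2` in the name. If your ids can have more than one digit, `/_id\d+/` is probably what you want.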