DEV Community

Discussion on: What do you do on a daily basis for your job?

Dave Jacoby

I am a developer (right now, the developer) for a genomics core lab.

What we do is gene sequencing (reading, not writing) for a research institution. There are many labs that need sequences done, and few of them do it enough for it to be worthwhile for them to get their own sequencer, so there's us.

What I do, most days, is look at two screens at a standing desk and type.

My corner of the lab is mostly about metadata. Who brought in samples? What are the samples? What do you want done with them? How are we getting paid? This is mostly web forms and SQL, and I go in to fix data much more often than I go in to fix or update the tools, but that does happen.

Because nobody wants to waste time and money sequencing bad samples, there are Quality Assurance steps. My code generates config files, so we don't get "sample 1, sample 2" in output that we have to associate with the sample IDs later. I also have some visualization tools that allow us to inform our customers where we are in the process.
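As a rough illustration of that config-generation idea (not the lab's actual code), here is a minimal Python sketch that writes a sample sheet tying each run position to its real sample ID. The column names, the `build_sample_sheet` function, and the dictionary keys are all hypothetical; a real sequencer expects its own sheet format.

```python
import csv
import io


def build_sample_sheet(samples):
    """Return CSV text mapping each run position to its real sample ID.

    `samples` is a list of dicts with hypothetical keys "sample_id" and
    "customer", e.g. rows pulled from the metadata database.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: these column names are assumptions for illustration.
    writer.writerow(["position", "sample_id", "customer"])
    for position, sample in enumerate(samples, start=1):
        writer.writerow([position, sample["sample_id"], sample["customer"]])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"sample_id": "S-001", "customer": "lab_a"},
        {"sample_id": "S-002", "customer": "lab_b"},
    ]
    print(build_sample_sheet(rows), end="")
```

The point is only that the real IDs travel with the run from the start, so nothing has to be matched up by hand afterwards.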

Sequencing can take anywhere from a few hours to a week, depending on the engine. You end up with about 100 characters of ACGT per read, with the reads spread across many files, and the assembly process then combines them into one full genome, which can be several hundred GB. Here we must remember that a file system has a finite number of inodes, and you can kill a file system with a huge number of small files, even if disk usage is still small. This is not part of my workflow, yet.
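To make the inode point concrete, here is a small Python sketch that compares inode usage with block usage for a path. It assumes a Unix-like system where `os.statvfs` is available; the helper names are made up for this example, and some file systems report zero total inodes, which the sketch treats as "unknown".

```python
import os


def usage_fraction(total, free):
    """Return the used fraction of a resource, or None if total is unknown."""
    if total <= 0:
        return None  # some file systems do not report a total
    return (total - free) / total


def inode_usage(path):
    """Fraction of inodes in use on the file system holding `path`."""
    stats = os.statvfs(path)
    return usage_fraction(stats.f_files, stats.f_ffree)


def block_usage(path):
    """Fraction of disk blocks in use on the file system holding `path`."""
    stats = os.statvfs(path)
    return usage_fraction(stats.f_blocks, stats.f_bfree)


if __name__ == "__main__":
    # A file system full of tiny files can show high inode usage
    # while block usage is still low.
    print("inodes used:", inode_usage("."))
    print("blocks used:", block_usage("."))
```

On most Linux systems, `df -i` reports the same inode numbers from the command line.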

The output of the assembly is on a multi-petabyte storage system (shared across departments) connected to several research clusters, and we have several ways to share this data. If our customers are also on this cluster, we use Access Control Lists (ACLs) to ensure they can access it. Previously, it was done via ln and magic, but the new storage system supports ACLs, which is better. I wrote tools that add and remove access control based on stored rules across large directories, but now that that's working, I rarely have to think about them.
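A hedged sketch of what rule-driven access control could look like, assuming POSIX ACLs managed with `setfacl` (the storage system described here may well use a different ACL model). The rule format and the `acl_commands` function are hypothetical. The sketch only builds the command lines and does not run them.

```python
def acl_commands(rules):
    """Turn stored access rules into setfacl argument lists.

    Each rule is a dict with hypothetical keys:
      "action": "grant" or "revoke"
      "user":   the account name
      "path":   the directory to apply the rule to
      "perms":  optional permission string for grants (default "rX")
    """
    commands = []
    for rule in rules:
        if rule["action"] == "grant":
            # -R recurses into the directory, -m modifies (adds) an entry.
            entry = f"u:{rule['user']}:{rule.get('perms', 'rX')}"
            commands.append(["setfacl", "-R", "-m", entry, rule["path"]])
        elif rule["action"] == "revoke":
            # -x removes the named user's entry.
            entry = f"u:{rule['user']}"
            commands.append(["setfacl", "-R", "-x", entry, rule["path"]])
        else:
            raise ValueError(f"unknown action: {rule['action']!r}")
    return commands


if __name__ == "__main__":
    rules = [
        {"action": "grant", "user": "alice", "path": "/data/run42"},
        {"action": "revoke", "user": "bob", "path": "/data/run42"},
    ]
    for command in acl_commands(rules):
        print(" ".join(command))
```

Keeping the rules as data and generating the commands from them is what makes it practical to reapply or audit access across large directories.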

If a customer does not have access to these systems, or if we have moved their data onto tape storage, we use a service called Globus to give them access. Because the permission to share this data lies with us, not our customers, I wrote a proxy for this. I spend much more time checking that systems are up and helping the customers and their collaborators navigate this service than working on the tools themselves.

Additionally, I'm a computer guy in a lab where much of the work is biochemical, so fixing PCs, running cable, and answering questions on Excel also fall to me.

Weekly:

  • Monday morning, I reset permissions for Globus.
  • (Almost every) Tuesday afternoon, I go to the campus helpdesk at a coffee shop to ask the admins of the storage system and clusters questions, and sometimes to answer our customers' questions as well.
  • Every other Thursday, there's a meeting between my boss and other bioinformatics people. Every other meeting, the admins are there as well.

Nearly everything else changes depending on what's going on that day or week.