Proving I can code enough to study bioengineering - Part 1

A friend of mine approached me a while back to help him with a particular Python programming challenge: the entry-level programming proof for the master’s programme in bioengineering at a university in Austria. He never ended up applying, so we never solved the problem. But ever since, it has kept me wondering: would I be able to code this? Well, we’re about to find out.

We’ll start simple, by reading the requirements:

[Image: "Python - Programming", screenshot of the original requirements]

Free translation from me into English:

  1. Read a DNA sequence from a file in FASTA format

  2. Write a function to calculate the GC-content of the genome

  3. Write a function with a “sliding window” that runs through the genome and calculates the difference between the GC-content of each window (the sample) and the average GC-content of the genome

    • The size of the sliding window should be adjustable through a parameter; the default value is 50

    • The output should be the position with the highest difference
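Written out as formulas, requirements 2 and 3 could look like this. This is my own formalisation, not part of the task: it assumes the sequence contains only the letters A, C, G and T, and it reads “difference” as the absolute difference, since the task doesn’t say whether the sign matters. Here s = s_1 … s_n is the genome of length n, N_B(x) counts how often base B occurs in a stretch x, |x| is the length of x, and w is the window size:

```latex
\begin{align*}
\mathrm{GC}(x) &= \frac{N_G(x) + N_C(x)}{|x|} \times 100\,\% \\
x_i &= s_i s_{i+1} \cdots s_{i+w-1}, \qquad 1 \le i \le n - w + 1,\ w = 50 \text{ by default} \\
\Delta_i &= \bigl|\mathrm{GC}(x_i) - \mathrm{GC}(s)\bigr| \\
i^{*} &= \arg\max_{1 \le i \le n - w + 1} \Delta_i
\end{align*}
```

The position to output is i*; if several windows share the largest Δ_i, the task leaves open which one to report.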

I will roughly follow the CRISP-DM data science framework to solve this problem:

  • Data understanding: The biology part. I have to research the context of the requirements. What is “GC-content” and what is it used for? Gaining some basic understanding will help me when handling the data. That said, my plan is to move on to the next part as soon as possible.

  • Data preparation/modeling: The programming part. I will have to learn how to read a FASTA file and how to program the sliding window. I already have some first ideas of how to code it, but I will need a deeper understanding of the data to judge whether they are feasible.
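A first rough sketch of what the programming part could look like, covering all three requirements. It is a minimal sketch under my own assumptions, not a reference solution: the function names `read_fasta`, `gc_content` and `max_gc_deviation` are hypothetical, only the first record of the FASTA file is read, the sequence is assumed to contain only A, C, G and T, and “difference” is read as the absolute difference. It sticks to the Python standard library; Biopython’s `SeqIO` module is a common alternative for the parsing step.

```python
import os
import tempfile

# Sketch only: read_fasta, gc_content and max_gc_deviation are hypothetical
# helper names chosen for this example, not prescribed by the task.


def read_fasta(path):
    """Return the first record's sequence from a FASTA file as one uppercase string.

    Assumes a plain-text file where header lines start with '>' and the
    sequence may be wrapped over several lines.
    """
    seq_lines = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if seen_header:
                    break  # a second header starts the next record: stop here
                seen_header = True
                continue
            seq_lines.append(line.upper())
    return "".join(seq_lines)


def gc_content(seq):
    """Percentage of G and C bases in an uppercase seq (0.0 if seq is empty)."""
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def max_gc_deviation(seq, window=50):
    """Slide a window of `window` bases over seq and return (start, deviation)
    for the window whose GC-content differs most, in absolute terms, from the
    GC-content of the whole sequence. Starts are 0-based; ties keep the first.
    """
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    genome_gc = gc_content(seq)
    is_gc = [base in "GC" for base in seq]

    # GC count of the first window; afterwards it is updated, not recounted
    gc_in_window = sum(is_gc[:window])
    best_start = 0
    best_diff = abs(100.0 * gc_in_window / window - genome_gc)
    for start in range(1, len(seq) - window + 1):
        # drop the base leaving on the left, add the base entering on the right
        gc_in_window += is_gc[start + window - 1] - is_gc[start - 1]
        diff = abs(100.0 * gc_in_window / window - genome_gc)
        if diff > best_diff:
            best_start, best_diff = start, diff
    return best_start, best_diff


if __name__ == "__main__":
    # Tiny made-up record, only to show the calls end to end
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(">demo\nATATATATAT\nGCGCGCATAT\n")
    genome = read_fasta(tmp.name)
    os.remove(tmp.name)
    print(gc_content(genome))  # 30.0
    print(max_gc_deviation(genome, window=5))  # (10, 70.0)
```

The design choice worth noting is that the GC count is updated as the window moves one base at a time, so the scan is a single pass over the genome instead of recounting every window. Positions are 0-based start indices, so add 1 if you prefer to count bases from 1. For a real genome the call would be something like `max_gc_deviation(read_fasta("genome.fasta"))`, where the file name is just a placeholder.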

I expect that the real “biology” questions about the data will only arise once I start handling it. Hence I want to start programming as quickly as possible. The move back towards data understanding will happen naturally and continuously.

The good thing is, I’m not a complete novice to Python or programming per se. I code VBA at work and have written my own Python code to help me with my investments. This proof seems like a good way to stretch my programming skills, so let’s figure this one out. I’ll do a “pit-stop” every two weeks to track my progress, so stay tuned.

yours truly
