Coding for Cancer
Addressing serious diseases from the perspective of a computer scientist might seem at odds with the biological nature of health. However, as our collective information continues to grow, it seems likely that programming will become a dominant technique for attacking problems efficiently using catalogued information. My case for this argument stems from intracellular cascades, which biology identifies as many of the essential regulation checkpoints for cell growth, proliferation, and destruction. Depending on which pathway one is addressing, these cascades overlap with what are described as oncogene pathways and biosignaling markers, which is why I find them so important. If we can identify the components missing in these regulation pathways, we can combine catalogued information about the pathways with the measured genotype of an affected individual to determine the risk, cause, and treatment of cell regulation-based diseases, namely cancer.
Like many areas of cell and molecular biology, the study of intracellular signaling is constantly evolving, with discoveries being made on a nearly daily basis. I figure, then, that we have to start somewhere in writing a code for cancer. This code takes into account various cell pathways involved in cell regulation, including apoptosis and gene regulation through activation or inhibition of transcription factors. Because my background is more in biology, I thought it would be more interesting to focus on the computer science aspect of this problem. Therefore, the complexities of the pathways are not yet captured by this primordial code for a far greater task. Instead, the idea is to attack the problem at its most basic level, understanding that even though many variables are in play, the direct impact of the presence or absence of a signaling molecule is often binary. By creating data structures that are interwoven by code, one can then see how the binary direct impact of a single signaling molecule can have profound effects on the overall health of the cell.
How The Code Works
My proposed code attempts to answer the question: how can we use stored data to infer information about cancer risks and treatment? The stored data in this case comes from two sources: the person’s genotype and the information about cell cycle proteins and genetic expression that has been catalogued so far. My solution therefore starts from user input, which encodes a person’s genotype within certain parameters. That input is compared against data stored in lists and then analyzed by several functions to give the user a detailed report on the potential risk for a general cell with all other conditions standardized. Let’s take a look at each step in detail.
The user input allows a person to enter specific mutations within certain parameters, as stated before. This is one of the more limited steps of the code so far, because it is difficult for a person to gain access to their own genome, and even if the full genome can be analyzed, much of that information cannot be directly correlated with protein expression. My code therefore skips the step of reading the human genome (note: in the future this step will be critical for efficiency and should NOT be skipped) and instead takes input data about protein expression. In other words, the user enters which proteins are not expressed, or enters WT for wild-type (no genetic loss-of-function mutations). The user input looks like this, where given parameters are suggested for protein loss-of-function:
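A minimal Python sketch of such an input step might look like the following. It is not the original code: the function name `parse_lost_proteins`, the comma-separated format, and the protein names in `KNOWN_PROTEINS` are all assumptions made for illustration.

```python
# Hypothetical parameter list. These protein names are illustrative
# assumptions, not the set used in the original code.
KNOWN_PROTEINS = ["p53", "Rb", "PTEN", "Ras"]


def parse_lost_proteins(raw, known=KNOWN_PROTEINS):
    """Turn a comma-separated string into a duplicate-free list of lost proteins.

    Entering "WT" (wild-type) or nothing at all yields an empty list,
    meaning no loss-of-function mutations.
    """
    if raw.strip().upper() == "WT":
        return []

    # Map lowercase spellings back to the canonical names.
    canonical = {name.lower(): name for name in known}

    lost = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing comma
        name = canonical.get(token.lower())
        if name is None:
            raise ValueError(f"Unknown protein: {token!r}")
        if name not in lost:  # do not add duplicates
            lost.append(name)
    return lost


print(parse_lost_proteins("p53, rb, P53"))
```

In an interactive script, the string passed to `parse_lost_proteins` would come from Python's built-in `input()`. Rejecting unknown names outright, rather than silently dropping them, is a design choice of this sketch so that a typo cannot pass for a wild-type result.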
The stored data can come from any trustworthy collection of cell-cycle pathways, but in the future it will certainly come from objective databanks with a consistent vocabulary for each protein or gene in question. Analyzing the pathways is the most important part of the code, because the code must be written to respect a few critical rules of cell regulation. Two that I worked into my code are these: if an activating molecule is missing, regulation of that pathway is lost completely, while if an inhibiting molecule is missing, the pathway still functions but in the opposite direction from what a healthy cell would intend. This portion of the code must eventually account for the fact that genetic expression is not an “all-or-nothing” process. Very slight changes in the genome can result in a gradient of genetic expression by way of mutations that code for separate but similar amino acids or permit posttranslational modification unique to the wild-type. There is also the possibility of over-expression of genes, which can drive excessive cell division if the cell-growth checkpoints and inhibitors are overshadowed by a higher concentration of signaling molecules promoting cell proliferation. It is in this step that the complexity of the collective data will ultimately be simplified into an efficient way to understand the human genome, but such data is currently still a bit out of reach. The standard for data I used for the “all-or-nothing” protein function or loss-of-function in my code is shown below:
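To make the two all-or-nothing rules concrete, here is a minimal Python sketch under stated assumptions. The table `PATHWAY_TABLE`, the function `pathway_status`, and the status strings are hypothetical. The protein-to-pathway assignments are a heavily simplified textbook-level illustration, not the data set used in the original code.

```python
# Illustrative, heavily simplified table: protein -> (pathway, role).
# Names, pathways, and roles are assumptions chosen for this example.
PATHWAY_TABLE = {
    "p53": ("apoptosis", "activator"),
    "Ras": ("growth-factor signaling", "activator"),
    "Rb": ("G1/S cell-cycle entry", "inhibitor"),
    "PTEN": ("PI3K/AKT survival signaling", "inhibitor"),
}


def pathway_status(lost_proteins, table=PATHWAY_TABLE):
    """Apply the two all-or-nothing rules to every pathway in the table."""
    lost = set(lost_proteins)
    status = {}
    for protein, (pathway, role) in table.items():
        if protein not in lost:
            status[pathway] = "regulated normally"
        elif role == "activator":
            # Rule 1: a missing activator means regulation of the pathway is lost.
            status[pathway] = "regulation lost"
        else:
            # Rule 2: a missing inhibitor leaves the pathway functioning,
            # but opposite to what a healthy cell would intend.
            status[pathway] = "running opposite to healthy regulation"
    return status


for pathway, state in pathway_status(["p53", "Rb"]).items():
    print(f"{pathway}: {state}")
```

Each protein here controls exactly one pathway, which keeps the binary logic visible. A gradient of expression or over-expression could later be modeled by replacing the lost/not-lost test with a numeric expression level, but that goes beyond this sketch.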
The results of my code give clues about the potential health of specific pathways in the cell. Molecular cancer treatments can work by rescuing affected cell regulation pathways from the consequences of mutations. The user input accepts a list of up to as many proteins as appear in the parameters shown above, and it does not add duplicates. Examples of the results of my code are shown below:
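As a rough idea of how such a report could be assembled, the Python sketch below formats a mapping of pathway names to status strings into plain text. The function `format_report`, the `NORMAL` label, and the example data are hypothetical. The count of affected pathways is only a tally, not a clinical risk score or an actual result of the original code.

```python
NORMAL = "regulated normally"  # hypothetical label for an unaffected pathway


def format_report(lost_proteins, statuses):
    """Build a plain-text report from a {pathway: status} mapping."""
    genotype = ", ".join(lost_proteins) if lost_proteins else "WT"
    lines = [f"Loss of function: {genotype}"]

    # Any pathway whose status differs from NORMAL counts as affected.
    affected = {p for p, s in statuses.items() if s != NORMAL}

    for pathway in sorted(statuses):
        marker = "!" if pathway in affected else " "
        lines.append(f" {marker} {pathway}: {statuses[pathway]}")

    lines.append(f"Pathways affected: {len(affected)} of {len(statuses)}")
    return "\n".join(lines)


# Made-up example input, for illustration only.
example = {
    "apoptosis": "regulation lost",
    "G1/S cell-cycle entry": NORMAL,
}
print(format_report(["p53"], example))
```

Keeping the formatting separate from the pathway analysis means the same report function can be reused if the stored data or the rules change.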
If you have any questions, comments, or ideas of your own, please let me know. I will continue developing this code as more data surfaces with insights into the human genome and the cell cycle. My current troubleshooting involves some accuracy discrepancies, but overall the code runs smoothly for the most part. I will provide a brief conclusion next week on how humanity can use coding in biology and medicine, as well as what we must carefully consider as we take steps forward.