/sci/ - Science & Math

File: 63 KB, 1009x599, Codon.png

Anonymous Wed Dec 26 12:11:51 2018 No.10244417

On Christmas Eve, I created a program that randomly generates an amino acid sequence. The polypeptide chain is generated one codon at a time.

There are four nucleotides. A codon is composed of three nucleotides. Therefore, there are 64 unique codons.

There are 19 amino acids and 1 imino acid (proline). For simplicity, it's said there are 20 amino acids. Each of the 61 non-stop codons codes for one amino acid.

The program I created assumes all codons have an equal probability of being generated. Three codons are stop codons. The program terminates when a stop codon is generated.
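With every codon equally likely and 3 of the 64 being stops, the number of amino acids added before the first stop follows a geometric distribution, so its mean is (1 - p)/p with p = 3/64, i.e. 61/3 ≈ 20.3. Below is a minimal Java sketch of that calculation plus a simulation check, assuming exactly the uniform model described above and not counting any start residue; the class and method names are hypothetical, not taken from the OP's program.

```java
import java.util.Random;

class PeptideLengthEstimate {
    static final int TOTAL_CODONS = 64;
    static final int STOP_CODONS = 3;

    // Mean of a geometric distribution counting failures before the first success:
    // (1 - p) / p with p = 3/64, which is 61/3.
    static double expectedSenseCodons() {
        double p = (double) STOP_CODONS / TOTAL_CODONS;
        return (1 - p) / p;
    }

    // Monte Carlo check: draw codons uniformly until a stop codon appears.
    // Indices 0..2 stand in for the stop codons; with a uniform draw,
    // which three indices are used makes no difference.
    static double simulate(int trials, long seed) {
        Random rng = new Random(seed);
        long senseCodons = 0;
        for (int t = 0; t < trials; t++) {
            while (rng.nextInt(TOTAL_CODONS) >= STOP_CODONS) {
                senseCodons++;
            }
        }
        return (double) senseCodons / trials;
    }

    public static void main(String[] args) {
        System.out.printf("analytic:  %.3f%n", expectedSenseCodons());
        System.out.printf("simulated: %.3f%n", simulate(100_000, 42L));
    }
}
```

So under this model a typical random chain is only about 20 residues long, with a long tail of longer ones.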

Now, the next logical step is to create a program that randomly generates an alpha helix or beta-pleated sheet. However, to do this, I need to know which sequences of amino acids fold into an alpha helix and which fold into a beta-pleated sheet.

I've read many books about proteins and peptides, nanotechnology, and molecular biology but no book has provided the necessary information. So I'm wondering: is there a book or database which provides a list of amino acid sequences which fold into an alpha helix or beta-pleated sheet?

If not all possible configurations have been tested, is there at least a conjecture that would be helpful? For example, suppose polar amino acids tend to fold one way. Suppose cysteine tends to make the polypeptide chain fold another way. Suppose the size of the amino acid also determines which of the two possible secondary structures forms.

Note, the only type of structure I'm interested in right now is the secondary structure. Different rules apply to how polypeptide chains fold in protein domains, tertiary structures, and quaternary structures.

Anonymous Wed Dec 26 12:12:16 2018 No.10244420
File: 833 KB, 1434x1097, Secondary Structure.png

Protein domains are polypeptide chains which contain between 40 and 250 amino acids. If the information I need to create a computer program which randomly generates the secondary structure is available, the next logical step is to create a program which randomly generates a protein domain.

However, nothing I've read has even provided the range for the number of amino acids in a secondary structure. Based on the pictures in every textbook I've read, it appears as though a secondary structure is composed of roughly a dozen amino acids. But to create a computer program, I need much more precise information.
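To put "roughly a dozen amino acids" in numbers: with 20 amino acids, a chain of n residues has 20^n possible sequences. A small Java sketch using the standard BigInteger class to avoid overflow; the lengths 4, 12 and 40 are purely illustrative, and the count assumes only the 20 standard amino acids.

```java
import java.math.BigInteger;

class SequenceSearchSpace {
    // Number of distinct chains of the given length when each position
    // can be any of alphabetSize residues: alphabetSize^length.
    static BigInteger count(int alphabetSize, int length) {
        return BigInteger.valueOf(alphabetSize).pow(length);
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 12, 40}) {
            System.out.println(n + " residues: " + count(20, n) + " possible sequences");
        }
    }
}
```

For 12 residues that is 20^12 = 4,096,000,000,000,000 sequences, which is why exhaustive per-sequence lists are not something to expect.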

Anonymous Wed Dec 26 12:13:15 2018 No.10244423

>>10244417
>I've read many books about proteins and peptides, nanotechnology, and molecular biology but no book has provided the necessary information. So I'm wondering: is there a book or database which provides a list of amino acid sequences which fold into an alpha helix or beta-pleated sheet?
You may remember a little program called "Folding@home".

Anonymous Wed Dec 26 12:16:16 2018 No.10244430

>>10244420
>However, nothing I've read has even provided the range of amino acids in the secondary structure. Based on the pictures of every textbook I've read, it appears as though the secondary structure is composed of roughly dozen amino acids. Although, to create a computer program I need much more precise information.
On this, you might want to look into proteomics wrt enzymes. You'll note that proline in particular screws with stuff: its side chain loops back onto the backbone nitrogen, so it has no amide hydrogen to hydrogen-bond with and tends to kink or break helices. Also, whether amino acids are hydrophobic or hydrophilic affects how they end up arranged, but you're going for something way too complicated here.

Anonymous Wed Dec 26 12:21:16 2018 No.10244441

>>10244417
wtf is that monstrosity

Anonymous Wed Dec 26 12:28:06 2018 No.10244459

>>10244441
You see shit like this a lot when someone's trying to "unroll" loops in code for performance. It's annoying af.

Anonymous Wed Dec 26 12:28:07 2018 No.10244460
File: 75 KB, 960x720, Polypeptide chain.jpg

>>10244441
You mean the picture? It's a Java program that randomly generates a polypeptide chain. Note, methionine always initiates translation. The nucleotide sequence for the "start" codon is adenine, uracil, and guanine, in that order.

Anonymous Wed Dec 26 12:30:00 2018 No.10244464
File: 22 KB, 132x93, sun.png

>>10244417
Dude ... your code ...

Anonymous Wed Dec 26 12:31:17 2018 No.10244469

>>10244460
He's referring to the autist-level quantity of if statements.

Anonymous Wed Dec 26 12:41:57 2018 No.10244488
File: 89 KB, 962x902, Codon.jpg

>>10244469
That's pretty much all there is to the code. All the if statements are inside of a while loop. When any one of three "stop" codons is generated, the while loop ends. Methionine is generated prior to the while loop.
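The loop described here (methionine first, then draw codons until a stop) can be sketched with one 64-entry lookup string instead of 64 if statements. This is a hedged Java sketch, not the OP's code: it assumes the numbering U=0, C=1, A=2, G=3 and uses one-letter amino acid codes (the OP prints three-letter ones like "-Gly"); the table string is the standard genetic code listed in UCAG order, and the class name is hypothetical.

```java
import java.util.Random;

class RandomPeptide {
    // Standard genetic code, one-letter codes, '*' = stop.
    // Assumed numbering: U=0, C=1, A=2, G=3; index = 16*first + 4*second + third.
    static final String CODE =
        "FFLLSSSSYY**CC*W" + // UUx UCx UAx UGx
        "LLLLPPPPHHQQRRRR" + // CUx CCx CAx CGx
        "IIIMTTTTNNKKSSRR" + // AUx ACx AAx AGx
        "VVVVAAAADDEEGGGG";  // GUx GCx GAx GGx

    static char translate(int first, int second, int third) {
        return CODE.charAt(16 * first + 4 * second + third);
    }

    static String generate(Random rng) {
        StringBuilder peptide = new StringBuilder("M"); // AUG start codon -> Met
        while (true) {
            char aa = translate(rng.nextInt(4), rng.nextInt(4), rng.nextInt(4));
            if (aa == '*') {
                break; // UAA, UAG or UGA: stop
            }
            peptide.append(aa);
        }
        return peptide.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate(new Random()));
    }
}
```

Because each codon maps to exactly one table entry, a copy-paste slip that prints two amino acids for a single codon can't happen.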

Anonymous Wed Dec 26 12:45:11 2018 No.10244493

>>10244459
that's what compilers are for

Anonymous Wed Dec 26 12:48:17 2018 No.10244494
File: 7 KB, 211x152, 1530985286495.png

>>10244417
>that code

Anonymous Wed Dec 26 12:56:18 2018 No.10244506

>>10244493
>trusting a computer to understand code

Anonymous Wed Dec 26 13:04:30 2018 No.10244526

>>10244488
There is a better way to write it that makes it more decipherable and reworkable, which will likely become necessary as you progress through your project. Good luck to ya, anon.

Anonymous Wed Dec 26 13:06:43 2018 No.10244532

>>10244493
I'm not suggesting it's a good strategy. It can work, just not as often as the people who use it seem to think.

Anonymous Wed Dec 26 13:08:06 2018 No.10244537

>>10244488
Look up a "codon wheel". That'll give you an idea of how to make it way more efficient and readable.

Anonymous Wed Dec 26 13:08:18 2018 No.10244539

>>10244417
Whoa... that’s an awesome program dude! You’re really smart, and not a fucking retard at all. You should keep doing this, as it is a worthwhile way to spend your time!

Anonymous Wed Dec 26 13:09:52 2018 No.10244540

>>10244539
Don't be such a Negatron anon. It's more productive than nothing and the guy has learnt some basic bio.

Anonymous Wed Dec 26 13:14:04 2018 No.10244546
File: 460 KB, 1005x1120, Protein Domains.png

>>10244526
I organized the code the way I organize my brain. I'm able to decipher it fine.

Anonymous Wed Dec 26 13:20:52 2018 No.10244555

>>10244417
A lot of people are shitting on your code but leaving no suggestions. So here's one: convert to binary algebra.

You only need one byte, not three. The structure should be nucleotide = 0b00zzyyxx. I don't know how much you know about this, so here's a slow start:

Let's look at the first if statement; it has 4 different variations making up "-Gly":
1. x = 3 = 0b11, y = 3 = 0b11, z = 0 = 0b00, so the entire number would be 0b00001111 = 15

2. x = 3, y = 3, z = 1 -> 0b00011111 = 31

3. x = 3, y = 3, z = 2 -> 0b00101111 = 47

4. x = 3, y = 1, z = 0 -> 0b00000111 = 7

The entire if statement would then be:

if(nucleotide == 15 || nucleotide == 31 || nucleotide == 47 || nucleotide == 7){System.out.print("-Gly");}

This would increase readability by a lot.
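The same 0b00zzyyxx layout can be computed with bit shifts instead of writing the numbers out by hand. A hedged Java sketch with hypothetical names; with x = y = 3 (the -Gly entries in this layout) and z running from 0 to 3, it yields 15, 31, 47 and 63.

```java
class CodonPacking {
    // Packs three nucleotide values (each 0..3) into one int laid out as 0b00zzyyxx:
    // x in bits 0-1, y in bits 2-3, z in bits 4-5.
    static int pack(int x, int y, int z) {
        checkRange(x);
        checkRange(y);
        checkRange(z);
        return (z << 4) | (y << 2) | x;
    }

    static void checkRange(int v) {
        if (v < 0 || v > 3) {
            throw new IllegalArgumentException("nucleotide value must be 0..3, got " + v);
        }
    }

    public static void main(String[] args) {
        // x = y = 3 with z = 0..3 gives the four values for -Gly in this layout.
        for (int z = 0; z < 4; z++) {
            System.out.println(pack(3, 3, z));
        }
    }
}
```

Shifting keeps everything as integers, so no string conversion is needed to build the byte.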

Anonymous Wed Dec 26 13:21:31 2018 No.10244556

>>10244417
>>10244488
>>10244546
Are you sure you can read the code? If you get 3,3,3 you would print out -Gly-Ala with the code you posted.

Anonymous Wed Dec 26 13:26:42 2018 No.10244563

>>10244555
Because of codon degeneracy, you'd find that much better if you reversed the codon order.

Anonymous Wed Dec 26 13:28:11 2018 No.10244564

>>10244563
I have no idea what that is, I'm just a fellow cs engineer who sees poor code

Anonymous Wed Dec 26 13:33:48 2018 No.10244575
File: 660 KB, 2001x2309, Translation.jpg

>>10244556
Good eye. Alanine should be 313. I guess I should probably double check the program. Just created it two days ago. I recall copy and pasting the Ala if statement for other amino acids. There could be other errors as well.

Anonymous Wed Dec 26 13:37:03 2018 No.10244581

>>10244564
See >>10244488 vs suggestion here >>10244537
The first two nucleotides of the codons never change for any variation of an amino acid like Gly. So in that example you could have it as the values 60 thru 63 for Gly if you had z as the last two digits. Mildly neater. Easier to debug and things.

Anonymous Wed Dec 26 13:38:55 2018 No.10244585

>>10244575
This is part of why we write neat code anon.

Anonymous Wed Dec 26 13:44:07 2018 No.10244594

>>10244581
In fact, I just noticed you've written case 4 wrong: x and y should both be 0b11 for -Gly. And to clarify, I'm saying 0b00xxyyzz would work better.
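With the order flipped to 0b00xxyyzz, the four codons that share the first two nucleotides become consecutive numbers, so an amino acid that fills a whole block becomes a single range. A hypothetical Java sketch: Gly (x = y = 3) comes out as 60 to 63, and shifting off the low two bits identifies the block. Note this only fully collapses amino acids whose codons occupy exactly one block (Gly, Ala, Val, Pro, Thr); Leu, Ser and Arg have a full block plus two extra codons, and the rest share blocks.

```java
class CodonOrder {
    // Layout 0b00xxyyzz: first nucleotide (x) in the high bits, third (z) in the low bits.
    static int pack(int x, int y, int z) {
        return (x << 4) | (y << 2) | z;
    }

    // Shifting out the low two bits keeps only the first two nucleotides (0..15).
    static int block(int codon) {
        return codon >> 2;
    }

    // Every codon with x = 3 and y = 3 is Gly, i.e. the contiguous range 60..63.
    static boolean isGly(int codon) {
        return block(codon) == 0b1111;
    }

    public static void main(String[] args) {
        for (int z = 0; z < 4; z++) {
            int codon = pack(3, 3, z);
            System.out.println(codon + " -> Gly? " + isGly(codon));
        }
    }
}
```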

Anonymous Wed Dec 26 13:56:33 2018 No.10244631
File: 143 KB, 485x611, Protein Synthesis.jpg

>>10244585
I've made other programs in a similar style. I always spot mistakes the second time I look at the code. It doesn't take too many rounds to spot all the mistakes. I prefer the code this way, because to me it's better organized. As is, the code shows the importance of each nucleotide as a separate entity and drives the point home that a codon is composed of three nucleotides. All 64 possible outcomes are illustrated.

There's an additional bonus piece of information, somewhat subtly presented. The smaller the amino acid, the higher it is on the list.

Anonymous Wed Dec 26 13:59:04 2018 No.10244637

Stop using computers for this shit. We can't have the machine race becoming too knowledgeable about organics, or they will know how to eliminate us when they gain consciousness.

Anonymous Wed Dec 26 14:04:13 2018 No.10244650
File: 38 KB, 817x322, numGenJava.jpg

To help with syntax and number generation (since I realized you'd need a healthy mix of string and integer handling, which might be daunting):

1. Randomize the numbers x, y and z

2. Create a string "0b00"

3. Convert x, y and z to binary (google it).

4. Concatenate the string "00" + Integer.toString(x, 2) + ...

See pic related for full code, apparently 0b wasn't convention in java when it comes to converting.
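If you do go through strings, there is one pitfall: Integer.toString(x, 2) drops leading zeros (1 becomes "1", not "01"), which shifts every later field, and Integer.parseInt won't accept a "0b" prefix. A hedged Java sketch with hypothetical names that pads each field to two digits and cross-checks the result against plain shifts:

```java
class BinaryStringPacking {
    // Integer.toString(v, 2) drops leading zeros (1 -> "1"), so pad each field to two digits.
    static String twoBits(int v) {
        String s = Integer.toString(v, 2);
        return s.length() == 1 ? "0" + s : s;
    }

    // Builds "zzyyxx" and parses it in base 2. Integer.parseInt rejects a "0b" prefix,
    // so the string contains only the digits.
    static int packViaString(int x, int y, int z) {
        return Integer.parseInt(twoBits(z) + twoBits(y) + twoBits(x), 2);
    }

    // Same 0b00zzyyxx layout using shifts, with no strings involved.
    static int packViaShifts(int x, int y, int z) {
        return (z << 4) | (y << 2) | x;
    }

    public static void main(String[] args) {
        int mismatches = 0;
        for (int x = 0; x < 4; x++) {
            for (int y = 0; y < 4; y++) {
                for (int z = 0; z < 4; z++) {
                    if (packViaString(x, y, z) != packViaShifts(x, y, z)) {
                        mismatches++;
                    }
                }
            }
        }
        System.out.println("mismatches: " + mismatches);
    }
}
```

Both routes give the same byte; the shift version is shorter and has no padding step to forget.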

Anonymous Wed Dec 26 14:06:52 2018 No.10244656

>>10244417
>other stems make fun of compsci
>meanwhile they write code like that
why am I even surprised

Anonymous Wed Dec 26 14:49:05 2018 No.10244720
File: 91 KB, 960x720, folding.jpg

In any case, does anyone know of a list of amino acid sequences divided into two categories: 1. alpha helix 2. beta-pleated sheet? After fixing the mistakes in the primary structure program, I'd like to begin coding a program that generates the secondary structure.

Have all possible amino acid sequences for the secondary structure been determined? Has it been determined, for every possible sequence, whether it will fold into an alpha helix or a beta sheet? If not, has it at least been conjectured how the polypeptide chain will fold?

Anonymous Wed Dec 26 15:17:19 2018 No.10244755

>>10244656
This guy is neither. I think a general law should be "People write bad code". Btw, I see more bad code from people with a comp sci background, but only because more of them go into the industry. It's all equally bad.

Anonymous Wed Dec 26 15:26:33 2018 No.10244771
File: 92 KB, 960x720, coiling or folding.jpg

>>10244720
This image probably should have been called "coiling" or "alpha helix."

Anonymous Wed Dec 26 15:26:59 2018 No.10244772

use a lookup table jesus

Anonymous Wed Dec 26 15:35:33 2018 No.10244792

>>10244417
>I need to know which sequences of amino acids fold into an alpha helix and which peptide bond chains fold into a beat-pleated sheet.
>I've read many books about proteins and peptides, nanotechnology, and molecular biology but no book has provided the necessary information. So I'm wondering: is there a book or database which provides a list of amino acid sequences which fold into an alpha helix or beta-pleated sheet?
protein folding is one of the most difficult problems in biology.

Anonymous Wed Dec 26 15:41:07 2018 No.10244804
File: 73 KB, 960x720, Secondary Structure.jpg

On second thought, maybe I'm going about this all wrong. Maybe categorizing the secondary structure into two categories isn't an effective way to reduce the protein search space. At the very least, it should help in creating a general protein framework.

One of the main hurdles in a quantitative approach to biology is that there are practically no textbooks on the topic. I've never found one specifically on proteins. Ideally, there would be five volumes: I. Primary structure II. Secondary structure III. Protein domains IV. Tertiary structure V. Quaternary structure.

Does anyone here know of any textbooks that do this? How about online resources? Anything helpful?

Anonymous Wed Dec 26 15:42:22 2018 No.10244807
File: 243 KB, 1016x675, klossycrap.png

>>10244417
>that code
Holy fucking shit ahahahahhahahaha

Anonymous Wed Dec 26 15:56:23 2018 No.10244836

>>10244807
this is not a csfag you retard

Anonymous Wed Dec 26 16:06:25 2018 No.10244853

>>10244836
Implying you need to be a cs faggot to think this is shit. Just define a multidimensional array/list for your gay-ass protein translation thing as a lookup table and return the value array[x][y][z].
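The array[x][y][z] lookup table suggested here can be filled once from a 64-character string of the standard genetic code. A hedged Java sketch: the numbering U=0, C=1, A=2, G=3 is an assumption (the OP's own mapping isn't shown in full), the entries are one-letter codes with '*' for stop, and the names are hypothetical.

```java
class CodonTable3D {
    // Standard genetic code in UCAG order, one-letter codes, '*' = stop.
    // Assumed numbering: U=0, C=1, A=2, G=3.
    static final String CODE =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    // TABLE[x][y][z] = amino acid for first, second and third nucleotide.
    static final char[][][] TABLE = buildTable();

    static char[][][] buildTable() {
        char[][][] table = new char[4][4][4];
        for (int x = 0; x < 4; x++) {
            for (int y = 0; y < 4; y++) {
                for (int z = 0; z < 4; z++) {
                    table[x][y][z] = CODE.charAt(16 * x + 4 * y + z);
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(TABLE[3][3][3]); // GGG -> G (Gly)
        System.out.println(TABLE[3][1][3]); // GCG -> A (Ala)
    }
}
```

The whole if chain then becomes a single expression, TABLE[x][y][z], inside the while loop.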

Anonymous Wed Dec 26 16:14:27 2018 No.10244860

>>10244853
I know it's shit, but why did you post a csfag meme

Anonymous Wed Dec 26 16:15:38 2018 No.10244861

>>10244417
If you actually wrote that code, literally just kill yourself I am not joking

Anonymous Wed Dec 26 16:16:15 2018 No.10244864

>>10244860
Klossy needs my cum on her face and she knows it too

Anonymous Wed Dec 26 21:50:31 2018 No.10245467

>>>/g/69081112
Do yourself a favor OP and read through lol, also they found a mistake in your code

Anonymous Wed Dec 26 22:33:26 2018 No.10245530

Some friendly advice OP:
Use hash tables. It cuts down on the size and requires less brain power. I'm not sure about java, but in C this would look something like: https://pastebin.com/g80fvj9t
Implementing it like this allows you to extend your program, in case somehow a new codon is discovered or some shit. Extensibility allows you to be lazy and makes your life easier.
Good work though!
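The linked pastebin is C and isn't reproduced here; since the OP's program is Java, here is a hedged Java sketch of the same hash-table idea using java.util.HashMap, keyed by codon strings such as "GGA" and returning three-letter names like the OP's output. The class and field names are hypothetical, and the table is generated from the standard genetic code in UCAG order rather than typed out entry by entry.

```java
import java.util.HashMap;
import java.util.Map;

class CodonHashTable {
    static final String BASES = "UCAG";
    // Standard genetic code in UCAG order, one-letter codes, '*' = stop.
    static final String CODE =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    static final Map<String, String> THREE_LETTER = new HashMap<>();
    static final Map<String, String> CODONS = new HashMap<>();

    static {
        String[][] names = {
            {"A", "Ala"}, {"R", "Arg"}, {"N", "Asn"}, {"D", "Asp"}, {"C", "Cys"},
            {"Q", "Gln"}, {"E", "Glu"}, {"G", "Gly"}, {"H", "His"}, {"I", "Ile"},
            {"L", "Leu"}, {"K", "Lys"}, {"M", "Met"}, {"F", "Phe"}, {"P", "Pro"},
            {"S", "Ser"}, {"T", "Thr"}, {"W", "Trp"}, {"Y", "Tyr"}, {"V", "Val"},
            {"*", "Stop"}
        };
        for (String[] pair : names) {
            THREE_LETTER.put(pair[0], pair[1]);
        }
        // Key: codon string such as "GGA"; value: three-letter name such as "Gly".
        for (int i = 0; i < 64; i++) {
            String codon = "" + BASES.charAt(i / 16) + BASES.charAt((i / 4) % 4) + BASES.charAt(i % 4);
            CODONS.put(codon, THREE_LETTER.get(String.valueOf(CODE.charAt(i))));
        }
    }

    static String translate(String codon) {
        String aa = CODONS.get(codon);
        if (aa == null) {
            throw new IllegalArgumentException("unknown codon: " + codon);
        }
        return aa;
    }

    public static void main(String[] args) {
        System.out.println(translate("AUG") + "-" + translate("GGA"));
    }
}
```

Adding an entry is one put call, which is the extensibility point made above.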

Anonymous Wed Dec 26 22:34:27 2018 No.10245531

>>10245530
I made a few errors given my haste and copious amounts of liquor, so view it more as pseudo-code

Anonymous Wed Dec 26 23:02:05 2018 No.10245564

>>10244417
OP here, this was all just b8 to see how many butthurt csfags have infested this board. yup looks like quite a lot

Anonymous Wed Dec 26 23:08:43 2018 No.10245576

>>10245564
I was just trying to give constructive criticism, but alright

Anonymous Thu Dec 27 00:23:29 2018 No.10245658

This looks like a job for switch case! Just append x, y and z together and then go like
case 330:
case 331:
case 332:
case 333:
print "-Gly"
case 310:
case 311:
case 312:
case 313: (image shows 333 which would print -Gly. Bug?)
print "-Ala"
and so on.
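Transcribed literally into a Java switch, the pseudocode in that post would fall through from the Gly case into the Ala case, since there is no break. A hedged Java sketch that returns from each case instead; it fills in only the two amino acids from the post, and the method names are hypothetical.

```java
class CodonSwitch {
    // Appends the digits x, y, z as one decimal number, e.g. (3, 3, 0) -> 330.
    static int appendDigits(int x, int y, int z) {
        return 100 * x + 10 * y + z;
    }

    // Partial table with just the two entries from the post. Returning from each
    // case means there is no fall-through into the next amino acid.
    static String translate(int code) {
        switch (code) {
            case 330: case 331: case 332: case 333:
                return "-Gly";
            case 310: case 311: case 312: case 313:
                return "-Ala";
            default:
                throw new IllegalArgumentException("codon " + code + " not in this partial table");
        }
    }

    public static void main(String[] args) {
        System.out.println(translate(appendDigits(3, 3, 3)) + translate(appendDigits(3, 1, 3)));
    }
}
```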

Anonymous Thu Dec 27 00:25:32 2018 No.10245660

>>10245658
Using switch with raw integers is bad form, and a hash table would be better

Anonymous Thu Dec 27 00:37:56 2018 No.10245684

>>10245660
I'd argue that seeing he is going to have to build it by hand anyway, a switch case is simpler. If the program were to populate itself then I'd use a hash table or other structure.

Anonymous Thu Dec 27 00:50:36 2018 No.10245708

>>10245684
I would honestly always use a hash table. Especially in cases where the keys and values are constant. In higher level languages, these constructs are optimized to hell and back.

Anonymous Thu Dec 27 00:56:22 2018 No.10245717

>>10245708
Switch cases are extremely fast. But anyway, it hardly matters. Either way would be much easier to read and spot mistakes so he wouldn't have two codons being appended in one iteration like he currently does.

Anonymous Thu Dec 27 00:57:43 2018 No.10245720

>>10245717
Yeah, I guess we're arguing over nothing

Anonymous Thu Dec 27 02:47:00 2018 No.10245887

>>10244417
Hideous

Anonymous Thu Dec 27 02:53:36 2018 No.10245902
File: 1.26 MB, 865x1645, 1505546396485.png

>>10244417
Before you keep coding you should start reading. You'll thank me later.

Anonymous Thu Dec 27 05:13:43 2018 No.10246105
File: 1.25 MB, 400x400, 1528938172341.gif

Hey OP check this out: >>>/g/69081112

Anonymous Thu Dec 27 12:32:26 2018 No.10246830
File: 68 KB, 960x720, Primary Structure.jpg

>>10245467
There were actually two mistakes. As mentioned earlier, code from alanine was copied to other amino acids.

>>10245530
>in case somehow a new codon is discovered
A codon is made of three nucleotides, each one of Uracil, Cytosine, Adenine, or Guanine, making the search space 4^3 = 64. This is true of single-celled organisms such as bacteria, as well as humans.

There isn't anything new to discover when it comes to the primary structure.

Anonymous Thu Dec 27 12:33:22 2018 No.10246834

>>10246830
I was aware of that, hence why I put 'or some shit' right after what you quoted

Anonymous Thu Dec 27 13:24:40 2018 No.10246921

>>10246830
>A codon is made of three nucleotides. Uracil, Cytosine, Adanine, or Guanine.
Look up orthogonal nucleotides/subunits. Also RNA has quite a few more options than just UACG.

Anonymous Thu Dec 27 13:42:56 2018 No.10246958

>>10245708
I mean I guess it depends, I want to say when I quickly prototype/sketch out something maybe I wouldn't, but actually I always use a hash table or something similar. So much quicker to make and organise.

Anonymous Thu Dec 27 16:20:32 2018 No.10247297

>>10244459
But this isn't a loop, dumbass. Unrolling loops is supposed to remove the check for stopping the loop for each execution of the loop.

Anonymous Thu Dec 27 16:21:35 2018 No.10247300

Post your complete code, please. (use pastebin or github)

Anonymous Thu Dec 27 16:22:25 2018 No.10247303

>>10244417
Stop coding
Sincerely, /g/

Anonymous Thu Dec 27 16:23:26 2018 No.10247306
File: 29 KB, 349x642, db0[1].jpg

>>10245564

Anonymous Thu Dec 27 16:24:21 2018 No.10247309

>>10244417
let's show people how much of a smartass I am
>proceed to post on /sci/ an absolutely retarded piece of code that no one in his right mind would try to understand

Anonymous Thu Dec 27 20:23:28 2018 No.10247869

Yandev would be proud...

Anonymous Thu Dec 27 20:26:26 2018 No.10247877

>>10244417
looks like babby's first day programming course exercise. That's just terrible

Anonymous Thu Dec 27 20:26:45 2018 No.10247878

>>10244417
This is shitty code, not to mention it's written in Java, the worst programming language you could use for this type of thing. Use C.

Anonymous Fri Dec 28 02:58:01 2018 No.10248500

>>10247878
R is better for sciences

Anonymous Fri Dec 28 03:05:40 2018 No.10248508

>>10244417
So there's seriously no information on when an amino acid sequence folds into an alpha helix or beta pleated sheet? That's stuff I heard in high school, there has to be something.

Anonymous Fri Dec 28 03:07:18 2018 No.10248510

>>10244417
Ooooooh my God you used if statements for everything, you have to be trolling. At the very least you could've used switches.

Anonymous Fri Dec 28 03:34:06 2018 No.10248530

>>10244417
obvious bait

Anonymous Fri Dec 28 04:05:16 2018 No.10248566
File: 24 KB, 400x400, a060ac6fdb6d76102917d822f3892951_400x400.jpg

>>10248500
Python is better for everything ever

Anonymous Fri Dec 28 05:42:56 2018 No.10248676
File: 38 KB, 727x480, 1528835745426.jpg

>>10248500
LOL

Anonymous Fri Dec 28 08:52:58 2018 No.10248851

>>10247878
C is horrible for string manipulation.

Anonymous Fri Dec 28 09:14:54 2018 No.10248879

>>10245902
meme list

Anonymous Fri Dec 28 09:20:34 2018 No.10248888

>>10248566
If you're a scriptlet.

Anonymous Fri Dec 28 09:29:22 2018 No.10248903

>>10244417

I don't get it. Why do all this shit when there's plenty of software that already does this, and does it much better?
https://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software
I've read that artificial intelligence is already predicting this shit very accurately, but I'm too much of a lazy asshole to search for the source.
What is the goal you had in mind when you started to work on this project?

Anonymous Fri Dec 28 09:32:39 2018 No.10248912

I don't understand this. You claim to have read extensively about proteins, but somehow never encountered at least 1 chapter regarding secondary structures and the amino acids contained in them?
I mean, a fucking simple search in google for "alpha helix" lands you on the wikipedia article.
In this wikipedia article:
https://en.wikipedia.org/wiki/Alpha_helix#Amino-acid_propensities
If you're a smart guy you'll read that and you'll follow the references to read the papers and then you'll find out more on the literature regarding the topic.
Secondary structure has been figured out more or less, with many programs able to predict accurately which stretches of sequence will become what. It's the actual 3D folding that is the difficult part.
But I'm quite sure that AlphaFold, Google's crack at the folding problem with its deep learning AI, will get close to figuring it out.

Also:
http://predictioncenter.org/index.cgi

Also:
Check out these guys' labs, they're doing some next level shit with proteins
https://www.bakerlab.org/
http://yeateslab.mbi.ucla.edu/

Anonymous Fri Dec 28 11:32:35 2018 No.10249149

>>10248903
Google's DeepMind won an award recently folding proteins.

Anonymous Fri Dec 28 14:17:28 2018 No.10249407
File: 129 KB, 1389x583, Helix Propensity.png

>>10248912
>Helices observed in proteins can range from four to over forty residues long, but a typical helix contains about ten amino acids (about three turns).
This sentence in the Wikipedia article is relevant to creating a program that categorizes amino acid sequences into two categories: 1. Alpha helix 2. Beta-pleated sheet. Unfortunately, there's no source for the quoted sentence. Also, I need more specific information than that. I can figure out the search space based on the information provided by the quoted sentence (assuming it's accurate), but that's about it.
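The quoted numbers can at least be sanity-checked against the idealized alpha-helix geometry usually given in textbooks: 3.6 residues per turn and a rise of 1.5 Å per residue. A small hypothetical Java sketch of that arithmetic; real helices deviate from these ideal values.

```java
class HelixGeometry {
    // Idealized alpha-helix parameters as usually quoted in textbooks.
    static final double RESIDUES_PER_TURN = 3.6;
    static final double RISE_PER_RESIDUE_ANGSTROM = 1.5;

    static double turns(int residues) {
        return residues / RESIDUES_PER_TURN;
    }

    static double lengthAngstrom(int residues) {
        return residues * RISE_PER_RESIDUE_ANGSTROM;
    }

    public static void main(String[] args) {
        // 4, 10 and 40 residues: the short end, the typical size and the long end from the quote.
        for (int n : new int[] {4, 10, 40}) {
            System.out.printf("%2d residues: %.1f turns, %.1f angstrom%n", n, turns(n), lengthAngstrom(n));
        }
    }
}
```

Ten residues works out to about 2.8 turns and 15 Å, consistent with the "about three turns" in the quote.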

The paragraph on Amino-acid propensities is relevant, but doesn't get anywhere near the precision I'm looking for. At least there's a source cited. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1299714/
Pic related. The problem with this source though is that it only evaluates individual amino acids, rather than sequences of amino acids. Even then, only approximations are provided. The article is also twenty years old. It's hard to believe there hasn't been more detailed and precise data gathered since then.
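The usual way a per-residue scale gets applied to a whole sequence is to average it over a sliding window, which is the basic idea behind early predictors such as Chou-Fasman. A hedged Java sketch: the scale is supplied by the caller, the toy values in main are arbitrary and NOT taken from the cited paper or any published table, and whether higher or lower means "more helical" depends on the scale you plug in.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PropensityWindow {
    // Average of a per-residue scale over every window of the given width,
    // computed with a running sum. Residues missing from the scale throw.
    static double[] windowAverages(String sequence, Map<Character, Double> scale, int width) {
        if (width < 1 || width > sequence.length()) {
            throw new IllegalArgumentException("window width must be 1.." + sequence.length());
        }
        double[] out = new double[sequence.length() - width + 1];
        double sum = 0;
        for (int i = 0; i < sequence.length(); i++) {
            sum += valueOf(sequence.charAt(i), scale);
            if (i >= width) {
                sum -= valueOf(sequence.charAt(i - width), scale); // drop residue leaving the window
            }
            if (i >= width - 1) {
                out[i - width + 1] = sum / width;
            }
        }
        return out;
    }

    static double valueOf(char residue, Map<Character, Double> scale) {
        Double v = scale.get(residue);
        if (v == null) {
            throw new IllegalArgumentException("no scale value for " + residue);
        }
        return v;
    }

    public static void main(String[] args) {
        // Toy scale for illustration only, not real propensity data.
        Map<Character, Double> toy = new HashMap<>();
        toy.put('A', 0.0);
        toy.put('G', 1.0);
        toy.put('P', 3.0);
        System.out.println(Arrays.toString(windowAverages("AAAAPGAA", toy, 4)));
    }
}
```

This still only combines single-residue values; it does not capture genuine sequence-context effects, which is the gap described above.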

>>10248903
>>10249149
What I'd like to do is create a perfect information program. Google Deep Mind and others only approximate.

Everyone tries to jump straight to proteins. It would be more logical to focus on the secondary structure first. Even focusing just on protein domains would be better than the current approach.

Anonymous Fri Dec 28 14:25:38 2018 No.10249437
File: 57 KB, 750x750, hmm.jpg

>>10244417
Isn't there a basic terminator sequence set for every protein and enzyme section that is transcribed, so that no excess building-block material gets built into newly made proteins or enzymes?

Can't you go off those ending points?

Anonymous Fri Dec 28 14:30:21 2018 No.10249449

>>10244417
>go on NCBI BLAST
>find a protein you like
>find a region that has your desired structures
>find the sequence for that region (they're normally quite nicely organized)
>copy it
>repeat with other molecules for greater variety
