/sci/ - Science & Math


File: 63 KB, 1009x599, Codon.png
No.10244417

On Christmas Eve, I created a program that randomly generates an amino acid sequence. One codon at a time, the peptide bond chain is generated.

There are four nucleotides. A codon is composed of three nucleotides. Therefore, there are 64 unique codons.

There are 19 amino acids and 1 imino acid. For simplicity, it's said there are 20 amino acids. Each codon generates one amino acid.

The program I created assumes all codons have an equal probability of being generated. Three codons are stop codons. The program terminates when a stop codon is generated.
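
A minimal Java sketch of the program described above, for comparison. It is not the code from the picture: it assumes the standard genetic code, stores it as a 64-character string of one-letter amino acid codes (codon order UUU, UUC, UUA, UUG, UCU, ... GGG), and the names `RandomPeptide`, `CODE`, and `generate` are made up for illustration.

```java
import java.util.Random;

class RandomPeptide {
    // Standard genetic code; each codon position cycles through U, C, A, G.
    // '*' marks the three stop codons (UAA, UAG, UGA).
    static final String CODE =
        "FFLLSSSSYY**CC*W" +
        "LLLLPPPPHHQQRRRR" +
        "IIIMTTTTNNKKSSRR" +
        "VVVVAAAADDEEGGGG";

    // Draws codons uniformly at random until a stop codon appears.
    static String generate(Random rng) {
        StringBuilder chain = new StringBuilder("M"); // methionine starts the chain
        while (true) {
            char residue = CODE.charAt(rng.nextInt(64)); // 64 equally likely codons
            if (residue == '*') {
                return chain.toString();
            }
            chain.append(residue);
        }
    }

    public static void main(String[] args) {
        System.out.println(generate(new Random()));
    }
}
```

With three stop codons out of 64, each draw ends the chain with probability 3/64, so a chain averages about 21 residues including the initial methionine.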

Now, the next logical step is to create a program that randomly generates an alpha helix or beta-pleated sheet. However, to do this, I need to know which sequences of amino acids fold into an alpha helix and which peptide bond chains fold into a beta-pleated sheet.

I've read many books about proteins and peptides, nanotechnology, and molecular biology but no book has provided the necessary information. So I'm wondering: is there a book or database which provides a list of amino acid sequences which fold into an alpha helix or beta-pleated sheet?

If not all possible configurations have been tested, is there at least conjecture which would be helpful? For example, suppose polar amino acids tend to fold one way. Suppose cysteine tends to make the polypeptide chain fold another way. Suppose the size of the amino acid also determines which of the two possible secondary protein structures form.

Note, the only type of structure I'm interested in right now is the secondary structure. Different rules apply to how polypeptide chains fold in protein domains, tertiary structures, and quaternary structures.

>> No.10244420
File: 833 KB, 1434x1097, Secondary Structure.png

Protein domains are polypeptide chains which contain between 40 and 250 amino acids. If the information I need to create a computer program which randomly generates the secondary structure is available, the next logical step is to create a program which randomly generates a protein domain.

However, nothing I've read has even provided the range of amino acids in the secondary structure. Based on the pictures in every textbook I've read, it appears as though the secondary structure is composed of roughly a dozen amino acids. To create a computer program, though, I need much more precise information.

>> No.10244423

>>10244417
>I've read many books about proteins and peptides, nanotechnology, and molecular biology but no book has provided the necessary information. So I'm wondering: is there a book or database which provides a list of amino acid sequences which fold into an alpha helix or beta-pleated sheet?
You may remember a little program called "folding at home".

>> No.10244430

>>10244420
>However, nothing I've read has even provided the range of amino acids in the secondary structure. Based on the pictures in every textbook I've read, it appears as though the secondary structure is composed of roughly a dozen amino acids. To create a computer program, though, I need much more precise information.
On this, you might want to look into proteomics wrt enzymes. You'll note that proline in particular screws with stuff, it makes tighter coils because of its structure. Also, which amino acids are hydrophobic or hydrophilic affects how they end up arranged, but you're going for something way too complicated here.

>> No.10244441

>>10244417
wtf is that monstrosity

>> No.10244459

>>10244441
You see shit like this a lot when someone's trying to "roll out" loops in code for performance. It's annoying af.

>> No.10244460
File: 75 KB, 960x720, Polypeptide chain.jpg

>>10244441
You mean the picture? It's a Java program that randomly generates a polypeptide chain. Note, methionine always initiates translation. The nucleotide sequence for the "start" codon is adenine, uracil, and guanine, in that order.

>> No.10244464
File: 22 KB, 132x93, sun.png

>>10244417

Dude ... your code ...

>> No.10244469

>>10244460
He's referring to the autist-level quantity of if statements.

>> No.10244488
File: 89 KB, 962x902, Codon.jpg

>>10244469
That's pretty much all there is to the code. All the if statements are inside of a while loop. When any one of three "stop" codons is generated, the while loop ends. Methionine is generated prior to the while loop.

>> No.10244493

>>10244459
thats what compilers are for

>> No.10244494
File: 7 KB, 211x152, 1530985286495.png

>>10244417
>that code

>> No.10244506

>>10244493
>trusting a computer to understand code

>> No.10244526

>>10244488
There is a better way to write it that makes it more decipherable and reworkable, which will likely become necessary as you progress through your project. Good luck to ya, anon.

>> No.10244532

>>10244493
I'm not suggesting it's a good strategy. It can work, not as often as the people who use it seem to think though.

>> No.10244537

>>10244488
Look up a "codon wheel". That'll give you an idea of how to make it way more efficient and readable.

>> No.10244539

>>10244417
Whoa... that’s an awesome program dude! You’re really smart, and not a fucking retard at all. You should keep doing this, as it is a worthwhile way to spend your time!

>> No.10244540

>>10244539
Don't be such a Negatron anon. It's more productive than nothing and the guy has learnt some basic bio.

>> No.10244546
File: 460 KB, 1005x1120, Protein Domains.png

>>10244526
I organized the code the way I organize my brain. I'm able to decipher it fine.

>> No.10244555

>>10244417
A lot of people shitting on your code, but leave no suggestions. So here's one: convert to binary algebra.

You only need one byte, not three. Structure should be nucleotide = 0b00zzyyxx. I don't know how much you know about this so a slow start:

Let's look at the first if statement, it has 4 different variations making up "-Gly":
1. x = 3, x = 0b11. y = 3, y = 0b11. z = 0, z = 0b00. so the entire number would be: 0b00001111=15

2. x = 3, y = 3, z = 1 -> 0b00011111 = 31

3. x = 3, y = 3, z = 2 -> 0b00101111 = 47

4. x = 3, y =1, z = 0 -> 0b00000111 = 7

The entire if statement would then be:

if(nucleotide == 15 || nucleotide == 31 || nucleotide == 47 || nucleotide == 7){System.out.print("-Gly");}

This would increase readability by a lot.
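
A hedged Java sketch of the packing described in this post, assuming each of x, y, and z is already an integer from 0 to 3. The class and method names are hypothetical, not taken from the original program.

```java
class PackedCodon {
    // Packs three nucleotide values (each 0..3) into one int laid out as 0b00zzyyxx.
    static int pack(int x, int y, int z) {
        return (z << 4) | (y << 2) | x;
    }

    // Each accessor masks out one 2-bit field.
    static int x(int packed) { return packed & 0b11; }
    static int y(int packed) { return (packed >> 2) & 0b11; }
    static int z(int packed) { return (packed >> 4) & 0b11; }

    public static void main(String[] args) {
        // Hold x = 3 and y = 3 fixed and run z through all four values.
        for (int z = 0; z < 4; z++) {
            System.out.println(pack(3, 3, z));
        }
    }
}
```

Here `pack(3, 3, 0)` is 15 and `pack(3, 1, 0)` is 7, matching the arithmetic in the post, while `pack(3, 3, 3)` is 63.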

>> No.10244556

>>10244417
>>10244488
>>10244546
Are you sure you can read the code? If you get 3,3,3, the code you posted would print -Gly-Ala.

>> No.10244563

>>10244555
Because of the degeneracy of the codons, you'd find that much better if you reversed the codon order.

>> No.10244564

>>10244563
I have no idea what that is, I'm just a fellow cs engineer who sees poor code

>> No.10244575
File: 660 KB, 2001x2309, Translation.jpg

>>10244556
Good eye. Alanine should be 313. I guess I should probably double check the program. Just created it two days ago. I recall copy and pasting the Ala if statement for other amino acids. There could be other errors as well.

>> No.10244581

>>10244564
See >>10244488 vs suggestion here >>10244537
The first two nucleotides of the codons never change for any variations of an amino acid. So in that example you could have it as (I think) the values 60 thru 63 for gly if you had z as the last two digits. Mildly neater. Easier to debug and things.

>> No.10244585

>>10244575
This is part of why we write neat code anon.

>> No.10244594

>>10244581
In fact, just noticed you've written case 4 wrong, all x and y should be 0b11 for -Gly. And to clarify I'm saying 0b00xxyyzz would work better.

>> No.10244631
File: 143 KB, 485x611, Protein Synthesis.jpg

>>10244585
I've made other programs in a similar style. I always spot mistakes the second time I look at the code. It doesn't take too many rounds to spot all the mistakes. I prefer the code this way, because to me it's better organized. As is, the code shows the importance of each nucleotide as a separate entity and drives the point home that a codon is composed of three nucleotides. All 64 possible outcomes are illustrated.

There's an additional bonus piece of information, somewhat subtly presented. The smaller the amino acid, the higher it is on the list.

>> No.10244637

Stop using computers for this shit
We can't have the machine race being too knowledgeable about organics or they will know how to eliminate us when they gain consciousness

>> No.10244650
File: 38 KB, 817x322, numGenJava.jpg

To help with syntax and number generation (since I realized you'd need to do a healthy amount of string and integer mixing, which might be daunting):

1. Randomize the numbers x, y and z

2. Create a string "0b00"

3. Convert x, y and z to binary (google it).

4. Concatenate the string "00" + Integer.toString(x, 2) + ...

See pic related for full code, apparently 0b wasn't convention in java when it comes to converting.
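
The steps above might look like this in Java. This is a sketch, not the code from the picture: the bit order x, y, z is an assumption, and `twoBits` and `encode` are illustrative names. `Integer.toString(n, 2)` drops leading zeros, so each value is padded to two digits, and `Integer.parseInt(s, 2)` takes the digits without a `0b` prefix.

```java
import java.util.Random;

class BinaryCodon {
    // Converts a value 0..3 to exactly two binary digits ("00", "01", "10", "11").
    static String twoBits(int n) {
        String bits = Integer.toString(n, 2);
        return bits.length() < 2 ? "0" + bits : bits;
    }

    // Concatenates the digits as 00xxyyzz (assumed order) and parses them back to an int.
    static int encode(int x, int y, int z) {
        String bits = "00" + twoBits(x) + twoBits(y) + twoBits(z);
        return Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Step 1: randomize x, y and z, each in the range 0..3.
        int x = rng.nextInt(4), y = rng.nextInt(4), z = rng.nextInt(4);
        System.out.println(x + "," + y + "," + z + " -> " + encode(x, y, z));
    }
}
```

The string round trip is only there to mirror the steps in the post; shifting and OR-ing the three values gives the same number without building a string.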

>> No.10244656

>>10244417
>other stems make fun of compsci
>meanwhile they write code like that
why am I even surprised

>> No.10244720
File: 91 KB, 960x720, folding.jpg

In any case, does anyone know a list of amino acid sequences divided into two categories 1. alpha helix 2. beta-pleated sheets? After fixing the mistakes in the primary structure program, I'd like to begin coding a program that generates the secondary structure.

Have all possible amino acid sequences for the secondary structure been determined? Has it been determined, for every possible sequence, whether it folds into an alpha helix or a beta sheet? If not, has it at least been conjectured how the polypeptide chain will fold?

>> No.10244755

>>10244656
This guy is neither nor. I think a general law should be "People write bad code" btw, I see more comp sci background bad code because they go into the industry. It's all equally as bad.

>> No.10244771
File: 92 KB, 960x720, coiling or folding.jpg

>>10244720
This image probably should have been called "coiling" or "alpha helix."

>> No.10244772

use a lookup table jesus

>> No.10244792

>>10244417
>I need to know which sequences of amino acids fold into an alpha helix and which peptide bond chains fold into a beta-pleated sheet.
>I've read many books about proteins and peptides, nanotechnology, and molecular biology but no book has provided the necessary information. So I'm wondering: is there a book or database which provides a list of amino acid sequences which fold into an alpha helix or beta-pleated sheet?
protein folding is one of the most difficult problems in biology.

>> No.10244804
File: 73 KB, 960x720, Secondary Structure.jpg

On second thought, maybe I'm going about this all wrong. Maybe categorizing the secondary structure into two categories isn't an effective way to reduce the protein search space. At the very least, it should help in creating a general protein framework.

One of the main hurdles in a quantitative approach to biology is that there are practically no textbooks on the topic. I've never found one specifically on proteins. Ideally, there would be five volumes: I. Primary structure II. Secondary structure III. Protein domain IV. Tertiary structure V. Quaternary structure.

Does anyone here know of any textbooks that do this? How about online resources? Anything helpful?

>> No.10244807
File: 243 KB, 1016x675, klossycrap.png

>>10244417
>that code
Holy fucking shit ahahahahhahahaha

>> No.10244836

>>10244807
this is not a csfag you retard

>> No.10244853

>>10244836
Implying you need to be a cs faggot to think this is shit.
Just define a multidimensional array/list for your gay-ass protein translation thing as lookup table and return a value array[x][y][z]
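
A rough Java sketch of the `array[x][y][z]` lookup table suggested here. It assumes the digits 0, 1, 2, 3 stand for U, C, A, G (consistent with 33z being Gly and 31z being Ala elsewhere in the thread), uses the standard genetic code with one-letter amino acid codes, and the names `CodonCube`, `TABLE`, and `lookup` are hypothetical.

```java
class CodonCube {
    // Standard genetic code in U, C, A, G order; '*' marks a stop codon.
    static final String CODE =
        "FFLLSSSSYY**CC*W" +
        "LLLLPPPPHHQQRRRR" +
        "IIIMTTTTNNKKSSRR" +
        "VVVVAAAADDEEGGGG";

    // TABLE[x][y][z] holds the residue for first, second and third nucleotide.
    static final char[][][] TABLE = new char[4][4][4];

    static {
        for (int x = 0; x < 4; x++) {
            for (int y = 0; y < 4; y++) {
                for (int z = 0; z < 4; z++) {
                    TABLE[x][y][z] = CODE.charAt(16 * x + 4 * y + z);
                }
            }
        }
    }

    static char lookup(int x, int y, int z) {
        return TABLE[x][y][z];
    }

    public static void main(String[] args) {
        System.out.println(lookup(3, 3, 3)); // GGG
        System.out.println(lookup(3, 1, 3)); // GCG
    }
}
```

Each codon has exactly one cell in the table, so a single draw can never print two residues.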

>> No.10244860

>>10244853
I know it's shit, but why did you post a csfag meme

>> No.10244861

>>10244417
If you actually wrote that code, literally just kill yourself

I am not joking

>> No.10244864

>>10244860
Klossy needs my cum on her face and she knows it too

>> No.10245467

>>>/g/69081112
Do yourself a favor OP and read through lol, also they find a mistake in your code

>> No.10245530

Some friendly advice OP:
Use hash tables. It cuts down on the size and requires less brain power. I'm not sure about java, but in C this would look something like: https://pastebin.com/g80fvj9t
Implementing it like this allows you to extend your program, in case somehow a new codon is discovered or some shit. Extensibility allows you to be lazy and makes your life easier.
Good work though!
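
For reference, a hash table version might look roughly like this in Java. This is a sketch and not a translation of the pastebin: the table is deliberately partial (only Gly, Ala, Met, and the stop codons), and `CodonMap`, `TABLE`, and `translate` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

class CodonMap {
    // Partial codon table keyed by the codon's letters; extend with more put() calls.
    static final Map<String, String> TABLE = new HashMap<>();

    static {
        for (String codon : new String[] {"GGU", "GGC", "GGA", "GGG"}) {
            TABLE.put(codon, "Gly");
        }
        for (String codon : new String[] {"GCU", "GCC", "GCA", "GCG"}) {
            TABLE.put(codon, "Ala");
        }
        TABLE.put("AUG", "Met");
        for (String codon : new String[] {"UAA", "UAG", "UGA"}) {
            TABLE.put(codon, "Stop");
        }
    }

    // Returns "???" for codons that have not been entered yet.
    static String translate(String codon) {
        return TABLE.getOrDefault(codon, "???");
    }

    public static void main(String[] args) {
        System.out.println(translate("GCG"));
        System.out.println(translate("UGA"));
    }
}
```

Adding an entry is a single `put` call, which is the extensibility argument made above.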

>> No.10245531

>>10245530
I made a few errors given my haste and copious amounts of liquor, so view it more as pseudo-code

>> No.10245564

>>10244417
OP here, this was all just b8 to see how many butthurt csfags have infested this board.
yup looks like quite a lot

>> No.10245576

>>10245564
I was just trying to give constructive criticism, but alright

>> No.10245658

This looks like a job for switch case!
Just append x y and z together and then go like
case 330:
case 331:
case 332:
case 333:
print "-Gly"
case 310:
case 311:
case 312:
case 313: (image shows 333 which would print -Gly. Bug?)
print "-Ala"

and so on.
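
In Java the pseudocode above could be sketched like this. Only the two amino acids from the post are filled in, the key is built as 100*x + 10*y + z (one assumed way to "append" the digits), and `CodonSwitch` and `residue` are made-up names.

```java
class CodonSwitch {
    // Maps digits x, y, z (each 0..3) to a residue name via a three-digit key.
    static String residue(int x, int y, int z) {
        int key = 100 * x + 10 * y + z;
        switch (key) {
            case 330:
            case 331:
            case 332:
            case 333:
                return "Gly";
            case 310:
            case 311:
            case 312:
            case 313:
                return "Ala";
            default:
                return "???"; // the remaining codons are omitted in this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println("-" + residue(3, 3, 3));
        System.out.println("-" + residue(3, 1, 3));
    }
}
```

Java case labels fall through to the next one unless the group ends in `return` or `break`, so each group of four labels needs its own exit.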

>> No.10245660

>>10245658
Using switch with raw integers is bad form, and a hash table would be better

>> No.10245684

>>10245660
I'd argue that seeing he is going to have to build it by hand anyway, a switch case is simpler.
If the program were to populate itself then I'd use a hash table or other structure.

>> No.10245708

>>10245684
I would honestly always use a hash table. Especially in cases where the keys and values are constant. In higher level languages, these constructs are optimized to hell and back.

>> No.10245717

>>10245708
Switch cases are extremely fast.
But anyway, it hardly matters. Either way would be much easier to read and spot mistakes so he wouldn't have two codons being appended in one iteration like he currently does.

>> No.10245720

>>10245717
Yeah, I guess we're arguing over nothing

>> No.10245887

>>10244417
Hideous

>> No.10245902
File: 1.26 MB, 865x1645, 1505546396485.png

>>10244417
Before you keep coding you should start reading.
You'll thank me later.

>> No.10246105
File: 1.25 MB, 400x400, 1528938172341.gif

Hey OP check this out:
>>>/g/69081112

>> No.10246830
File: 68 KB, 960x720, Primary Structure.jpg

>>10245467
There were actually two mistakes. As mentioned earlier, code from alanine was copied to other amino acids.

>>10245530
>in case somehow a new codon is discovered
A codon is made of three nucleotides, each of which is uracil, cytosine, adenine, or guanine. That makes the search space 4^3 = 64. This is true of single-celled organisms such as bacteria, as well as humans.
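
The 4^3 = 64 count can be checked by listing every codon. A small Java sketch, with `AllCodons` and `enumerate` as illustrative names:

```java
import java.util.ArrayList;
import java.util.List;

class AllCodons {
    // Builds every three-letter combination of U, C, A, G.
    static List<String> enumerate() {
        char[] bases = "UCAG".toCharArray();
        List<String> codons = new ArrayList<>();
        for (char first : bases) {
            for (char second : bases) {
                for (char third : bases) {
                    codons.add("" + first + second + third);
                }
            }
        }
        return codons;
    }

    public static void main(String[] args) {
        List<String> codons = enumerate();
        System.out.println(codons.size() + " codons, from "
            + codons.get(0) + " to " + codons.get(codons.size() - 1));
    }
}
```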

There isn't anything new to discover when it comes to the primary structure.

>> No.10246834

>>10246830
I was aware of that, hence why I put 'or some shit' right after what you quoted

>> No.10246921

>>10246830
>A codon is made of three nucleotides. Uracil, Cytosine, Adenine, or Guanine.
Look up orthogonal nucleotides/subunits. Also RNA has quite a few more options than just UACG.

>> No.10246958

>>10245708
I mean I guess it depends, I want to say when I quickly prototype/sketch out something maybe I wouldn't but actually I always use a hash table or something similar. So much quicker to make and organise.

>> No.10247297

>>10244459
But this isn't a loop, dumbass. Unrolling a loop is supposed to remove the stop-condition check that otherwise runs on every iteration.

>> No.10247300

Post your complete code, please. (use pastebin or github)

>> No.10247303

>>10244417
Stop coding

Sincerely,

/g/

>> No.10247306
File: 29 KB, 349x642, db0[1].jpg

>>10245564

>> No.10247309

>>10244417
let's show people how much of a smartass I am
>proceed to post on /sci/ an absolutely retarded piece of code that no one in his right mind would try to understand

>> No.10247869

Yandev would be proud...

>> No.10247877

>>10244417
looks like babby's first day programming course exercise. That's just terrible

>> No.10247878

>>10244417
This is shitty code, not to mention it's written in Java, the worst programming language you could use for this type of thing.
Use C.

>> No.10248500

>>10247878
R is better for sciences

>> No.10248508

>>10244417
So there's seriously no information on when an amino acid sequence folds into an alpha helix or beta pleated sheet? That's stuff I heard in high school, there has to be something.

>> No.10248510

>>10244417
Ooooooh my God you used if statements for everything, you have to be trolling. At the very least you could've used switches.

>> No.10248530

>>10244417
obvious bait

>> No.10248566
File: 24 KB, 400x400, a060ac6fdb6d76102917d822f3892951_400x400.jpg

>>10248500
Python is better for everything ever

>> No.10248676
File: 38 KB, 727x480, 1528835745426.jpg

>>10248500
LOL

>> No.10248851

>>10247878
C is horrible for string manipulation.

>> No.10248879

>>10245902
meme list

>> No.10248888

>>10248566
If you're a scriptlet.

>> No.10248903

>>10244417

I don't get it. You go on doing all this shit, when there's millions of software that already do this and they do it much better?
https://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software
I've read that artificial intelligence is already predicting this shit very accurately, but I'm too much of a lazy asshole to search for the source.
What is the goal you had in mind when you started to work on this project?

>> No.10248912

I don't understand this. You claim to have read extensively about proteins, but somehow never encountered at least 1 chapter regarding secondary structures and the amino acids contained in them?
I mean, a fucking simple search in google for "alpha helix" lands you on the wikipedia article.
In this wikipedia article:
https://en.wikipedia.org/wiki/Alpha_helix#Amino-acid_propensities
If you're a smart guy you'll read that and you'll follow the references to read the papers and then you'll find out more on the literature regarding the topic.
Secondary structure has been figured out more or less, with many programs able to predict accurately which stretches of sequence will become what. It's the actual 3D folding that is the difficult part.
But I'm quite sure that AlphaFold, Google's crack at the folding problem with its deep learning AI, will get close to figuring it out.

Also:
http://predictioncenter.org/index.cgi

Also:
Check out these guys' labs, they're doing some next level shit with proteins
https://www.bakerlab.org/
http://yeateslab.mbi.ucla.edu/

>> No.10249149

>>10248903
Google's DeepMind won an award recently folding proteins.

>> No.10249407
File: 129 KB, 1389x583, Helix Propensity.png

>>10248912
>Helices observed in proteins can range from four to over forty residues long, but a typical helix contains about ten amino acids (about three turns).
This sentence in the Wikipedia article is relevant to creating a program that sorts amino acid sequences into two categories: 1. Alpha helix 2. Beta-pleated sheet. Unfortunately, there's no source for the quoted sentence. Also, I need more specific information than that. I can figure out the search space based on the information provided by the quoted sentence (assuming it's accurate), but that's about it.
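
As a rough illustration of that search space: with 20 amino acids, a stretch of n residues has 20^n possible sequences. The Java sketch below evaluates this for the lengths in the quoted sentence (4, 10, and 40); `HelixSearchSpace` and `sequences` are made-up names, and it counts all sequences of that length, not only the ones that actually form a helix.

```java
import java.math.BigInteger;

class HelixSearchSpace {
    // Number of distinct sequences of the given length over 20 amino acids.
    static BigInteger sequences(int length) {
        return BigInteger.valueOf(20).pow(length);
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 10, 40}) {
            System.out.println(n + " residues: " + sequences(n) + " sequences");
        }
    }
}
```

For a typical ten-residue helix that is 20^10 = 10,240,000,000,000 candidate sequences.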

The paragraph on Amino-acid propensities is relevant, but doesn't get anywhere near the precision I'm looking for. At least there's a source cited. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1299714/
Pic related. The problem with this source though is that it only evaluates individual amino acids, rather than sequences of amino acids. Even then, only approximations are provided. The article is also twenty years old. It's hard to believe there hasn't been more detailed and precise data gathered since then.

>>10248903
>>10249149
What I'd like to do is create a perfect information program. Google's DeepMind and others only approximate.

Everyone tries to jump straight to proteins. It would be more logical to focus on the secondary structure first. Even focusing just on protein domains would be better than the current approach.

>> No.10249437
File: 57 KB, 750x750, hmm.jpg

>>10244417
Isn't there a basic terminator gene set for every protein and enzyme section that is transcribed, so there isn't excess building block material built into newly made proteins or enzymes?

Can't you go off those ending points?

>> No.10249449

>>10244417
>go on NCBI BLAST
>find a protein you like
>find a region that has your desired structures
>find the sequence for that region (they're normally quite nicely organized)
>copy it
>repeat with other molecules for greater variety