
/sci/ - Science & Math



File: 99 KB, 1000x1500, 1576473231702.jpg
No.11408740

>biology paper has a small sample size
how do these experiments even get approved?

>> No.11408744

>>11408740
Because there's nothing else.
That's why bioinformatics is going to destroy everything

>> No.11408749

>>11408740
>approved
I'm not sure you understand how publishing scientific studies works, wojshit.

>> No.11408897

>>11408749
retard alert

>> No.11410854

>>11408740
Depends on what you mean by small sample size. Small sample sizes aren't a problem if you know what you're doing.

>> No.11411849

>>11410854
>biologists
>knowing what they're doing

>> No.11411990

>>11410854
>Small sample sizes aren't a problem if you know what you're doing.
?

>> No.11411999

It's fine, just bootstrap that shit.
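For anyone wondering what that actually means, a bare-bones sketch (Python; the sample values are made up) of bootstrapping a confidence interval for a mean:

import numpy as np

rng = np.random.default_rng(42)
sample = np.array([2.1, 3.4, 2.9, 3.8, 2.5, 3.1])  # made-up measurements

# Resample the data with replacement many times and look at the spread of the statistic.
boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
              for _ in range(10_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
# Caveat: the bootstrap only quantifies the uncertainty in the sample you have;
# it can't conjure up information a tiny sample never contained.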

>> No.11412001

>applied chemistry
not science

>> No.11412004

>>11411990
For example, if you want to prove the existence of a thing, a sample size of 1 is sufficient. To prove the thing is useful for some reason, you do need more samples. Similarly, a few samples may be enough to show that a theory does not hold within its assumed constraints (you could also have constructed an artificial counterexample, but the samples demonstrate that the counterexample occurs in real life too).

That said,

>>11411849
This.

>>11411999
Now you're thinking like a statistishit.

>>11408744
This but not really. Bioinformaticians at least kind of know their shit in most cases, but they're struggling hard because of the shit data generated by >>11408740
Think few samples is an issue? Now you get few samples and half the data in each sample is randomly corrupted. Is it time series data? The time points are sampled at basically random intervals plus some random time lag (which isn't recorded), and for all you know the mechanism you care about happens (or has predictive signal) only between the points that are sampled.

>> No.11412012
File: 17 KB, 659x431, Brain_weight_age.gif

>>11411990
If you know what you're doing and know what result you want, then you only need to sample until you get the result you're looking for.
Like if you want Marie Curie to discredit the claim that all women are stupid, don't go around sampling women until you find a smart one and a bunch of stupid ones; just go straight to Marie Curie and that's all the evidence you need.

>> No.11412029

>>11412004
>This but not really. Bioinformaticians at least kind of know their shit in most cases, but they're struggling hard because of the shit data generated by >>11408740 (OP)
>Think few samples is an issue? Now you get few samples and half the data in each sample is randomly corrupted. Is it time series data? The time points are sampled at basically random intervals plus some random time lag (which isn't recorded), and for all you know the mechanism you care about happens (or has predictive signal) only between the points that are sampled.
Yes, I'm literally having that problem.
Biologists are atrociously bad at collecting data. It's taken me ages to clean it.

>> No.11412038

>>11412029
>>11411999
middle school janitor here, what does it mean to clean data?

>> No.11412057

>>11412038
Basically you sort out the parts that look incomplete and arrange the rest so it fits neatly into rows and columns, etc.

>> No.11412059

>>11412057
what do you mean by sort out, like throw it out if it doesn't fit your hypothesis?

>> No.11412064

>>11412059
No, that's what biologists do, so I have to undo it.
What we want is to have all the data there and let the software tell us through statistics and modeling if it fits the hypothesis or not.
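To make that concrete for the janitor, a toy sketch (Python/pandas; the column names and values are invented) of tidying without throwing anything away: flag the suspect parts instead of deleting them.

import pandas as pd

# Invented example of a messy lab export: text where numbers should be, missing values.
raw = pd.DataFrame({
    "sample": ["s1", "s2", "s3"],
    "rep1":   ["0.42", "0.55", "n/a"],
    "rep2":   ["0.40", None,   "0.61"],
})

# Tidy: one row per (sample, replicate) measurement, numeric values, nothing deleted.
tidy = raw.melt(id_vars="sample", var_name="replicate", value_name="value")
tidy["value"] = pd.to_numeric(tidy["value"], errors="coerce")  # "n/a" becomes NaN, not dropped
tidy["is_missing"] = tidy["value"].isna()                      # flag instead of discarding
print(tidy)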

>> No.11412069

>>11412059
It depends on the data. For example, proteomics data is composed of centroided raw spectra (which look like lists of peak position and intensity pairs). It is common (despite being a huge mistake) to run a rolling window of 100 or so peaks over the spectrum and to remove any peak whose intensity isn't the maximum in at least one window. That's because in older instruments the electrical noise level was quite high and generated spurious peaks, and peak intensity corresponded to the frequency of observations. The practice is still common today even though intensity hasn't correlated with presence in any meaningful way for a very long time, and the electrical noise baseline of modern instruments is negligible. As a result, it ends up removing many important peaks while barely dealing with the noise peaks.
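A minimal sketch of that filtering practice (Python; the arrays stand in for a centroided spectrum, real pipelines obviously read vendor files):

import numpy as np

def local_max_filter(mz, intensity, window=100):
    """Keep only peaks that are the most intense peak in at least one rolling
    window of `window` consecutive peaks; everything else is discarded."""
    mz, intensity = np.asarray(mz), np.asarray(intensity)
    n = len(intensity)
    keep = np.zeros(n, dtype=bool)
    for start in range(max(n - window + 1, 1)):
        stop = min(start + window, n)
        keep[start + np.argmax(intensity[start:stop])] = True
    return mz[keep], intensity[keep]

# On a dense spectrum this keeps only the locally dominant peaks, which is exactly
# how real but low-intensity signals get thrown away.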

>> No.11412073

>>11412069
As
>>11412064
says:
>What we want is to have all the data there and let the software tell us through statistics and modeling if it fits the hypothesis or not.
Exactly this.
Unfortunately that's nowhere near possible right now; it's a huge political game to convince them to collect the data correctly, and experiments can cost anywhere from thousands to millions to rerun with proper collection standards.

>> No.11412362

>>11408897
Some journals will publish anything if it's saucy enough. Go on WebMD right now and you'll find some piss-poor papers.

>> No.11412377

>>11412362
Approved can mean a lot of things, like being given funding.

>> No.11412406

>>11412362
You can just go check Nature publications if you want piss-poor papers.

>> No.11412812

>>11412029
Damn, makes me thankful that physicsfags are good about recording everything they do in an experiment.

>> No.11414566

Biofag here, how do I record based data?

>> No.11414586

>>11414566
Depends what kind of thing you work with. As a rule of thumb,
- Don't discard any reading
- Don't process any reading prior to storage (store both the raw and the usual processed versions)
- Sample at equal timesteps
- Sample as finely as possible
There are other issues though. For example, if you're doing drug assays or such, it's usual to keep DMSO in the two columns on each side of the plate, despite it being well known that edge effects will cause more evaporation in those columns than in the middle. The proper method would be to randomize the positions of the DMSO wells on the plate and record that information, but it's not practical without rethinking the master plate (or motherplate). A newer technology that uses sound waves to acoustically push droplets into the plate wells already exists and is a good bet to alleviate these kinds of concerns, but the usual biofags say "muh protocol" and refuse to use it.
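A minimal sketch (Python; the plate size, well naming, and file name are just illustrative assumptions) of randomizing the control wells and recording the layout so the analysis can model edge effects later:

import csv
import random

ROWS = "ABCDEFGH"        # 96-well plate: rows A-H
COLS = range(1, 13)      # columns 1-12
N_CONTROLS = 16          # as many DMSO wells as two full columns

wells = [f"{r}{c}" for r in ROWS for c in COLS]
random.seed(0)                                    # fixed seed so the layout is reproducible
dmso_wells = set(random.sample(wells, N_CONTROLS))

with open("plate_layout.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["well", "content"])
    for w in wells:
        writer.writerow([w, "DMSO" if w in dmso_wells else "compound"])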

>> No.11414611

That's why I love being a microbiologist: much bigger sample sizes. Although counting and sorting thousands of colonies does get monotonous at times.

>> No.11414713

>>11410854
I've seen many neuroscience papers with n = 3-5 in each experimental group that then just compare the means with a t-test. They usually want to propose some new effect or mechanism.

Would you draw conclusions about a larger population based on just 3-5 individuals?

In most cases, unless it's explained sufficiently in materials and methods, I interpret a small sample size as selection and manipulation of the data: the researchers have probably done more experiments and just picked the measurements that fit their cause.
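A rough sketch (Python with scipy; the 1 SD effect size is an arbitrary example) of how little power those n = 3-5 t-tests have:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, n_sim = 4, 1.0, 10_000   # n per group; a true difference of 1 SD is assumed

hits = 0
for _ in range(n_sim):
    a = rng.normal(0.0, 1.0, n)      # control group
    b = rng.normal(effect, 1.0, n)   # treated group
    hits += stats.ttest_ind(a, b).pvalue < 0.05

print(f"estimated power at n={n} per group: {hits / n_sim:.2f}")
# Even with a genuinely large (1 SD) effect, only roughly a fifth to a quarter of
# such experiments reach p < 0.05, and the ones that do overestimate the effect.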

>> No.11414972

Anyone have good R packages that can add properties to a list of drug CAS numbers?
I have these CAS numbers, but it's really hard to do a chemoinformatic analysis because I'm bad at coding and it takes me ages to parse the properties from the PubChem pages after searching the CAS numbers individually.
I know a package that does it automatically, but I don't know how to apply it to every CAS number in my Excel file, so it would basically be the same as what I'm already doing.
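Not an R answer, but the pattern is the same in any language: read the spreadsheet, loop over the CAS numbers, query PubChem, collect the results. A rough Python sketch against PubChem's PUG REST API (the file names, the column name, and the exact property list are assumptions to adapt):

import time
import pandas as pd
import requests

URL = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{}"
       "/property/MolecularWeight,XLogP,CanonicalSMILES/JSON")

cas_numbers = pd.read_excel("cas_list.xlsx")["cas"]   # assumed file and column names

rows = []
for cas in cas_numbers:
    r = requests.get(URL.format(cas), timeout=30)
    if r.ok:
        rows.append({"cas": cas, **r.json()["PropertyTable"]["Properties"][0]})
    else:
        rows.append({"cas": cas})    # keep the row so failed lookups stay visible
    time.sleep(0.3)                  # be polite to the API

pd.DataFrame(rows).to_excel("cas_with_properties.xlsx", index=False)

The same shape works in R: read the sheet, then lapply/sapply your package's lookup function over the CAS column and bind the results.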

>> No.11415002

>>11408740
>>11411990
Large sample sizes are only needed in specific circumstances. You don't need to see a thousand elephants and a thousand dogs to see that one is bigger than the other. Big sample sizes are generally needed in studies of rare random events or small effects, not when you compare easily measurable things. (If an effect only shows up with a huge sample, it may well be an artifact, and it's probably not practically relevant anyway.)
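A back-of-the-envelope sketch (Python with scipy; the standard normal-approximation formula for a two-sample comparison, with made-up effect sizes) of how the required sample size scales with the size of the difference:

from scipy import stats

def n_per_group(effect_sd, alpha=0.05, power=0.80):
    """Approximate n per group to detect a difference of `effect_sd` standard
    deviations with a two-sample test (normal approximation)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 / effect_sd ** 2

print(n_per_group(3.0))   # elephant-vs-dog sized difference: about 2 per group
print(n_per_group(0.2))   # subtle difference: about 400 per group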

>> No.11415022

>>11415002
And this is why virtually every biology paper is bullshit, everybody. Literal inbreds like this poster.