
/sci/ - Science & Math



File: 99 KB, 1000x1500, 1576473231702.jpg
No.11408740

>biology paper has a small sample size
how do these experiments even get approved?

>> No.11408744

>>11408740
Because there's nothing else.
That's why bioinformatics is going to destroy everything

>> No.11408749

>>11408740
>approved
I'm not sure you understand how publishing scientific studies works, wojshit.

>> No.11408897

>>11408749
retard alert

>> No.11410854

>>11408740
Depends on what you mean by small sample size. Small sample sizes aren't a problem if you know what you're doing.

>> No.11411849

>>11410854
>biologists
>knowing what they're doing

>> No.11411990

>>11410854
>Small sample sizes aren't a problem if you know what you're doing.
?

>> No.11411999

It's fine, just bootstrap that shit.
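For anyone wondering what that actually means, a bare-bones sketch (Python; the sample values are made up) of bootstrapping a confidence interval for a mean:

import numpy as np

rng = np.random.default_rng(42)
sample = np.array([2.1, 3.4, 2.9, 3.8, 2.5, 3.1])  # made-up measurements

# Resample the data with replacement many times and look at the spread of the statistic.
boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
              for _ in range(10_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
# Caveat: the bootstrap only quantifies the uncertainty in the sample you have;
# it can't conjure up information a tiny sample never contained.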

>> No.11412001

>applied chemistry
not science

>> No.11412004

>>11411990
For example, if you want to prove the existence of a thing, a sample size of 1 is sufficient. To prove the thing is useful for some reason, you do need more samples. Similarly, a few samples may be enough to show that a theory does not hold within its assumed constraints (you could also have constructed an artificial counterexample, but the samples demonstrate that the counterexample occurs in real life too).

That said,

>>11411849
This.

>>11411999
Now you're thinking like a statistishit.

>>11408744
This but not really. Bioinformaticians at least kind of know their shit in most cases, but they're struggling hard because of the shit data generated by >>11408740
Think few samples is an issue? Now you get few samples and half the data in each sample is randomly corrupted. Is it time series data? The time points are sampled at basically random intervals plus some random time lag (which isn't recorded), and for all you know the mechanism you care about happens (or has predictive signal) only between the points that are sampled.

>> No.11412012
File: 17 KB, 659x431, Brain_weight_age.gif

>>11411990
If you know what you're doing and know what result you want, then you only need to sample until you get the result you're looking for.
Like if you want Marie Curie to discredit the claim that all women are stupid, don't go around sampling women until you find a smart one and a bunch of stupid ones; just go straight to Marie Curie and that's all the evidence you need.

>> No.11412029

>>11412004
>This but not really. Bioinformaticians at least kind of know their shit in most cases, but they're struggling hard because of the shit data generated by >>11408740 (OP)
>Think few samples is an issue? Now you get few samples and half the data in each sample is randomly corrupted. Is it time series data? The time points are sampled at basically random intervals plus some random time lag (which isn't recorded), and for all you know the mechanism you care about happens (or has predictive signal) only between the points that are sampled.
Yes, I'm literally having that problem.
Biologists are atrociously bad at collecting data. It's taken me ages to clean it.

>> No.11412038

>>11412029
>>11411999
middle school janitor here, what does it mean to clean data?

>> No.11412057

>>11412038
Basically you sort out the parts that look incomplete and arrange the rest so it fits neatly into rows and columns, etc.

>> No.11412059

>>11412057
what do you mean by sort out, like throw it out if it doesn't fit your hypothesis?

>> No.11412064

>>11412059
No, that's what biologists do, so I have to undo it.
What we want is to have all the data there and let the software tell us through statistics and modeling if it fits the hypothesis or not.
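To make that concrete for the janitor, a toy sketch (Python/pandas; the column names and values are invented) of tidying without throwing anything away: flag the suspect parts instead of deleting them.

import pandas as pd

# Invented example of a messy lab export: text where numbers should be, missing values.
raw = pd.DataFrame({
    "sample": ["s1", "s2", "s3"],
    "rep1":   ["0.42", "0.55", "n/a"],
    "rep2":   ["0.40", None,   "0.61"],
})

# Tidy: one row per (sample, replicate) measurement, numeric values, nothing deleted.
tidy = raw.melt(id_vars="sample", var_name="replicate", value_name="value")
tidy["value"] = pd.to_numeric(tidy["value"], errors="coerce")  # "n/a" becomes NaN, not dropped
tidy["is_missing"] = tidy["value"].isna()                      # flag instead of discarding
print(tidy)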

>> No.11412069

>>11412059
It depends on the data. For example, proteomics data is composed of centroided raw spectra (which look like lists of peak position and intensity pairs). It is common (despite being a huge mistake) to run a rolling window of 100 or so peaks over the spectrum and to remove any peak whose intensity isn't the maximum in at least one window. That's because in older instruments the electrical noise level was quite high and generated spurious peaks, and peak intensity corresponded to the frequency of observations. The practice is still common today even though intensity hasn't correlated with presence in any meaningful way for a very long time, and the electrical noise baseline of modern instruments is negligible. As a result, it ends up removing many important peaks while barely dealing with the noise peaks.
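A minimal sketch of that filtering practice (Python; the arrays stand in for a centroided spectrum, real pipelines obviously read vendor files):

import numpy as np

def local_max_filter(mz, intensity, window=100):
    """Keep only peaks that are the most intense peak in at least one rolling
    window of `window` consecutive peaks; everything else is discarded."""
    mz, intensity = np.asarray(mz), np.asarray(intensity)
    n = len(intensity)
    keep = np.zeros(n, dtype=bool)
    for start in range(max(n - window + 1, 1)):
        stop = min(start + window, n)
        keep[start + np.argmax(intensity[start:stop])] = True
    return mz[keep], intensity[keep]

# On a dense spectrum this keeps only the locally dominant peaks, which is exactly
# how real but low-intensity signals get thrown away.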

>> No.11412073

>>11412069
As
>>11412064
says:
>What we want is to have all the data there and let the software tell us through statistics and modeling if it fits the hypothesis or not.
Exactly this.
Unfortunately that's nowhere near possible right now; it's a huge political game to convince them to collect the data correctly, and experiments can cost anywhere from thousands to millions to rerun with proper collection standards.

>> No.11412362

>>11408897
Some journals will publish anything if it's saucy enough. Go on WebMD right now and you'll find some piss-poor papers.

>> No.11412377

>>11412362
Approved can mean a lot of things, like being given funding.

>> No.11412406

>>11412362
You can just go check Nature publications if you want piss-poor papers.

>> No.11412812

>>11412029
Damn, makes me thankful that physicsfags are good about recording everything they do in an experiment.

>> No.11414566

Biofag here, how do I record based data?

>> No.11414586

>>11414566
Depends what kind of thing you work with. As a rule of thumb,
- Don't discard any reading
- Don't process any reading prior to storage (store both the raw and the usual processed versions)
- Sample at equal timesteps
- Sample as finely as possible
There are other issues though. For example, if you're doing drug assays or such, it's usual to keep DMSO in the two columns on each side of the plate, despite it being well known that edge effects will cause more evaporation in those columns than in the middle. The proper method would be to randomize the positions of the DMSO wells on the plate and record that information, but it's not practical without rethinking the master plate (or motherplate). A newer technology that uses sound waves to acoustically push droplets into the plate wells already exists and is a good bet to alleviate these kinds of concerns, but the usual biofags say "muh protocol" and refuse to use it.
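A minimal sketch (Python; the plate size, well naming, and file name are just illustrative assumptions) of randomizing the control wells and recording the layout so the analysis can model edge effects later:

import csv
import random

ROWS = "ABCDEFGH"        # 96-well plate: rows A-H
COLS = range(1, 13)      # columns 1-12
N_CONTROLS = 16          # as many DMSO wells as two full columns

wells = [f"{r}{c}" for r in ROWS for c in COLS]
random.seed(0)                                    # fixed seed so the layout is reproducible
dmso_wells = set(random.sample(wells, N_CONTROLS))

with open("plate_layout.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["well", "content"])
    for w in wells:
        writer.writerow([w, "DMSO" if w in dmso_wells else "compound"])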

>> No.11414611

That's why I love being a microbiologist: much bigger sample sizes. Although counting and sorting thousands of colonies does get monotonous at times.

>> No.11414713

>>11410854
I've seen many neuroscience papers with n = 3-5 in each experimental group that then just compare the means with a t-test. They usually want to propose some new effect or mechanism.

Would you draw conclusions about a larger population based on just 3-5 individuals?

In most cases, unless it's explained sufficiently in materials and methods, I interpret a small sample size as selection and manipulation of the data: the researchers have probably done more experiments and just picked the measurements that fit their cause.
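A rough sketch (Python with scipy; the 1 SD effect size is an arbitrary example) of how little power those n = 3-5 t-tests have:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, n_sim = 4, 1.0, 10_000   # n per group; a true difference of 1 SD is assumed

hits = 0
for _ in range(n_sim):
    a = rng.normal(0.0, 1.0, n)      # control group
    b = rng.normal(effect, 1.0, n)   # treated group
    hits += stats.ttest_ind(a, b).pvalue < 0.05

print(f"estimated power at n={n} per group: {hits / n_sim:.2f}")
# Even with a genuinely large (1 SD) effect, only roughly a fifth to a quarter of
# such experiments reach p < 0.05, and the ones that do overestimate the effect.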

>> No.11414972

Anyone have good R packages that can add properties to a list of drug CAS numbers?
I have these CAS numbers, but it's really hard to do a chemoinformatic analysis because I'm bad at coding and it takes me ages to parse the properties from the PubChem pages after searching the CAS numbers individually.
I know a package that does it automatically, but I don't know how to apply it to every CAS number in my Excel file, so it would basically be the same as what I'm already doing.
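Not an R answer, but the pattern is the same in any language: read the spreadsheet, loop over the CAS numbers, query PubChem, collect the results. A rough Python sketch against PubChem's PUG REST API (the file names, the column name, and the exact property list are assumptions to adapt):

import time
import pandas as pd
import requests

URL = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{}"
       "/property/MolecularWeight,XLogP,CanonicalSMILES/JSON")

cas_numbers = pd.read_excel("cas_list.xlsx")["cas"]   # assumed file and column names

rows = []
for cas in cas_numbers:
    r = requests.get(URL.format(cas), timeout=30)
    if r.ok:
        rows.append({"cas": cas, **r.json()["PropertyTable"]["Properties"][0]})
    else:
        rows.append({"cas": cas})    # keep the row so failed lookups stay visible
    time.sleep(0.3)                  # be polite to the API

pd.DataFrame(rows).to_excel("cas_with_properties.xlsx", index=False)

The same shape works in R: read the sheet, then lapply/sapply your package's lookup function over the CAS column and bind the results.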

>> No.11415002

>>11408740
>>11411990
Large sample sizes are only needed in specific circumstances. You don't need to see a thousand elephants and a thousand dogs to see that one is bigger than the other. Big sample sizes are generally needed in studies of rare random events or small effects, not when you compare easily measurable things. (If an effect only shows up with a huge sample, it may well be an artifact, and it's probably not practically relevant anyway.)
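A back-of-the-envelope sketch (Python with scipy; the standard normal-approximation formula for a two-sample comparison, with made-up effect sizes) of how the required sample size scales with the size of the difference:

from scipy import stats

def n_per_group(effect_sd, alpha=0.05, power=0.80):
    """Approximate n per group to detect a difference of `effect_sd` standard
    deviations with a two-sample test (normal approximation)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 / effect_sd ** 2

print(n_per_group(3.0))   # elephant-vs-dog sized difference: about 2 per group
print(n_per_group(0.2))   # subtle difference: about 400 per group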

>> No.11415022

>>11415002
And this is why virtually every biology paper is bullshit, everybody. Literal inbreds like this poster.