Auxiliary - Bioinformatics Project Meeting Agenda
FGCZ - (Draft)
25 February, 2026
Source:vignettes/Auxiliary_ExDesignSurvey.Rmd
Auxiliary_ExDesignSurvey.RmdProject Nr :
Order Id :
Participants:
- User :
- Bioinformatician :
- Coach :
When?
Comments :
Overview of the analysis
In label-free quantification (LFQ) experiments, we measure relative protein changes among groups of subjects. Before the main experiment, a quality control (QC) experiment is performed. We use the data from the QC to estimate the within-group variance and to determine the number of samples in the main experiment. The measurements of the main experiment are modeled using linear models, mixed-effects linear models, which can be thought of as an extension of the t-test to more than two groups. These estimates are used to perform Gene Set Enrichment Analysis and Overrepresentation analysis using WebGestaltR and sigora. The meeting aims to obtain and document all the information relevant to perform a statistically sound LFQ data analysis.
Type of proteomics experiment
-
- Pro : robust, can always be applied
- Con : fewer ids and quant values than with other methods
-
- Pro : improved sensitivity, works better if more samples are measured.
- Con : needs at least 8 samples
-
- Pro : improves sensitivity (more proteins identified) by fractionating samples
- Con : fold change compression
Aim of the meeting
The goal of the meeting is this protocol specifying:
- which protein database to use
- which and how many samples to use for the QC
- determine all the parameters for the sample size estimation
- the design of your main experiment
- all the names of the factors and factor levels
- specify the hypothesis to be tested
The samples should be annotated with the names specified in the protocol in the b-fabric system. These names will also appear in the result tables and visualizations.
We perform the data analysis according to this protocol. Changing the analysis protocol (e.g. adding contrasts or blocking factors) after the analysis was delivered will incur time delays and additional costs.
Genomics Analysis
Did you perform a genomics experiment on the same samples?
What type of experiment?
- Was it run at the FGCZ?
- project Id:
- order Id:
- Bioinformatician :
- Are you interested in Proteomics Genomics data integration?
Note:
- NGS data can be used to create improved protein identification databases.
- If NGS - proteomics data integration is planned, compatible statistical analysis should be applied to the NGS and Proteomics data.
Protein sequence database and downstream analysis tools
Which sequence database should be used for protein identification?
Which organism?
- other :
Should additional protein sequences be added to the database?
Contaminant list?
- other :
Which downstream analysis tools will be used?
- other :
Note:
Typically we are using protein databases from uniprot.org
Ideally the identifiers in the protein database are compatible with the downstream analysis tool you intend to use. For instance WebGestalt, although it recognizes many types of identifiers: e.g., Uniprot Swiss-Prot, Gene Symbols, NCBI Entrez gene, it does not work with Uniprot Treml identifiers.
Effect size and size of test
These parameters are crucial for the QC experiment and will be assessed again when protein variance estimates from the QC experiment are available. They also will be used when visualizing the results in the final report (e.g., Volcano plots) or when running overrepresentation analysis (ORA).
Effect size
What is the smallest effect size (i.e., fold change in protein abundance) of biological relevance, you are interested in detecting? Typically fold changes are , , or . This information is needed to determine the sample size as well as to biologically interpret the data.
The number of samples you need to use highly depends on the effect size you want to detect. Detecting small effect sizes requires you to measure more samples. On the other hand, the statistical significance of a fold change does not imply biological relevance. Consequently, do not measure more samples than is required.
- other :
Size of the test1
Typically a difference between two conditions is considered to be significant if the p-Value is less than . For some applications a less significant p-Value can be used (e.g., ) for others a smaller p-Value might be required.
- other :
Power of test
The power of a binary hypothesis test is the probability that the test rejects the null hypothesis (H0) when a specific alternative hypothesis (H1) is true Wikipedia
- other :
Does the variance within groups differ?
Is the variance within the groups the same or different? E.g., do you expect some groups with larger within group differences?
If yes, which of the group’s might have the highest within-group variance? (Some treatment might have heterogeneous outcomes.)
If possible, use samples from this group for the QC experiment, which will enable you to get a conservative estimate of the required samples sizes. If the variance in one group is substantially higher, the power of the tests can be improved by allocating more samples to this group.
Difference in protein variances
From the QC experiment, we can estimate for each protein a within-group variance. For some proteins, we observe more variation within a group compared to other proteins.
Are there proteins or protein groups of particular interest?
Provide protein IDs:
Do you want to obtain significant p-Values for proteins with large within-group variances, or would you be satisfied with getting significances for the of low variance proteins?
- other :
Note:
- High abundant proteins will exhibit smaller technical variability compared with low abundant proteins.
Designs
This section covers the most frequently used designs.
Parallel Group Design
How many groups?
Factor name (camel case3 e.g., Treatment) :
Level names (camel case e.g. Control, Diet, Creme):
Factorial Design4
Applies when studying how two or more factors influence protein expression, e.g., and .
- How many factors? Typically 2 rarely 3.
- How many levels per factor? Typically 2 or 3 per factor.
| Factor Name | Number levels | level names |
|---|---|---|
- Provide computer-friendly names of the factors (grouping variables) in camel case.
- Provide names of levels for each grouping variable.
Example:
| Factor Name | Number levels | level names |
|---|---|---|
| Treatment | 3 | Control, Diet, Creme |
| AgeGroup | 2 | young, old |
Repeated measurements
Applies when measurements are repeated on the same subject, e.g. over time 5.
If yes please specify a computer-friendly factor name using camel case: e.g., PatientId.
Name of the factor :
The subject identifiers can contain upper and lower case letters and numbers but avoid special characters and white spaces. Neither the factor names or the levels of the blocking factors will be shown in the results or figures generated.
Blocking factors
A blocking factor is some variable that affects an experimental outcome but is itself of no interest, e.g., sample batch 6. Including blocking-factors can improve the statistical model.
| Factor Name | Number levels |
|---|---|
Contrasts - Hypothesis to be tested
If there are more than two groups, specify the hypothesis to be tested (comparison) using the factor and factor level names.
Contrasts to be computed:
Examples for parallel Group design (only one factor):
- Diet - Control
- Creme - Control
Example for factorial designs:
- Treatment_Control - Treatment_Creme
- AgeGroup_young - AgeGroup_old
- Treatment_Control:AgeGroup_young - Treatment_Control:AgeGroup_old
Note:
If one group is used in more comparisons than all other groups, it will benefit the power of the test 7 to allocate more samples to this group, since increasing the number of degrees of freedom in this group will benefit many comparisons.
Limit the number of hypotheses you test. Multiplicity - each comparison is a test of hypothesis. Therefore, it is required to perform a p-Value adjustment. For instance, using the Bonferroni correction, the obtained p-Values are multiplied by the number of the hypothesis tested. Given two comparisons, a p-Value of 0.03 after Bonferroni adjustment is , i.e., not significant given a test size of . Therefore, do not test all possible hypothesis! For instance, if including the contrast Diet - Creme, it is required to multiply the p-Values with the factor . i.e., a p-Value of after Bonferroni adjustment is - i.e., not significant.
Factorial designs generate many interactions (e.g., Treatment_Control:AgeGroup_young). A factorial design generates groups (unique combinations of factors) which makes it possible to specify many contrasts. Limit the number of hypotheses ideally to few (one or two) per experiment. It is required to adjust the p-Values based on the number of hypotheses examined.
Typically in factorial designs, tests for differences among main effects (differences between levels within a single factor) and second-order interactions of the factors.
main effects : Treatment_Control - Treatment_Diet
Test for interaction : (Treatment_Control:AgeGroup_young - Treatment_Control:AgeGroup_old) - (Treatment_Diet:AgeGroup_young - Treatment_Diet:AgeGroup_old)
More designs
- Time Series, e.g., growth curves - require at least 4-5 different well-designed time-points. When less than 4-time points are measured than the time points are typically treated as factors and can be analyzed using our standard pipeline for factorial designs.
We might be able to support you with some other hypothesis tests, depending on our availability.
- Dose-response curves
- Survival analysis