Auxiliary - Bioinformatics Project Meeting Agenda

Overview of the analysis

In label-free quantification (LFQ) experiments, we measure relative protein changes among groups of subjects. Before the main experiment, a quality control (QC) experiment is performed. We use the data from the QC experiment to estimate the within-group variance and to determine the number of samples needed in the main experiment. The measurements of the main experiment are modeled using linear models or linear mixed-effects models, which can be thought of as extensions of the t-test to more than two groups. The resulting estimates are used to perform Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) using WebGestaltR and sigora. The meeting aims to obtain and document all the information needed to perform a statistically sound LFQ data analysis.

Type of proteomics experiment

data-dependent acquisition (DDA) experiment (robust method, typically used for affinity purification experiments)
- Pro : robust, can always be applied
- Con : fewer identifications and quantified values than with other methods
data-independent acquisition (DIA) experiment (improved sensitivity)
- Pro : improved sensitivity; works better the more samples are measured
- Con : needs at least 8 samples
tandem mass tag (TMT) experiment
- Pro : improved sensitivity (more proteins identified) by fractionating samples
- Con : fold change compression

Sample modifications

Which fixed and variable modifications are used?

Variable modifications:
- [ ] acetyl
- [ ] biotin
- [ ] deamidation
- [ ] dimethyl
- [ ] formylation
- [ ] methylation
- [ ] O-glycosylation (HexNAc)

Aim of the meeting

The goal of the meeting is a protocol specifying:

which protein database to use
which and how many samples to use for the QC experiment
all parameters for the sample size estimation
the design of your main experiment
all names of the factors and factor levels
the hypotheses to be tested

The samples should be annotated in the b-fabric system with the names specified in this protocol. These names will also appear in the result tables and visualizations.

We perform the data analysis according to this protocol. Changing the analysis protocol (e.g., adding contrasts or blocking factors) after the analysis has been delivered will incur time delays and additional costs.

Genomics Analysis

Did you perform a genomics experiment on the same samples?

no
yes

What type of experiment?

$~$

Was it run at the FGCZ?
project Id:
order Id:
Bioinformatician :
Are you interested in Proteomics Genomics data integration?

$~$

Note:

NGS data can be used to create improved protein identification databases.
If NGS-proteomics data integration is planned, compatible statistical analyses should be applied to both the NGS and the proteomics data.

Protein sequence database and downstream analysis tools

Which sequence database should be used for protein identification?

Which organism?

Homo sapiens
Mus musculus
yeast
zebrafish
other :

Should additional protein sequences be added to the database?

$~$

Contaminant list?

Complete
Without human contaminants
other :

Which downstream analysis tools will be used?

WebGestalt (WEB-based GEne SeT AnaLysis Toolkit)
string-db (Known and predicted protein-protein interactions)
other :

$~$

Note:

Typically, we use protein databases from uniprot.org.
Ideally, the identifiers in the protein database are compatible with the downstream analysis tool you intend to use. For instance, WebGestalt recognizes many types of identifiers (e.g., UniProt Swiss-Prot accessions, gene symbols, NCBI Entrez Gene IDs), but it does not work with UniProt TrEMBL identifiers.
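You can inspect the FASTA headers of a database to check which identifier types it contains before choosing a downstream tool. The following is a minimal Python sketch. It assumes the standard UniProtKB header layout, where reviewed (Swiss-Prot) entries start with `>sp|` and unreviewed (TrEMBL) entries start with `>tr|`. The function names, the placeholder sequences, and the TrEMBL and contaminant headers are hypothetical.

```python
from collections import Counter

def uniprot_section(header: str) -> str:
    """Classify a FASTA header as Swiss-Prot, TrEMBL, or other (e.g., contaminants)."""
    # UniProtKB headers look like ">db|Accession|EntryName Description"
    db = header.lstrip(">").split("|", 1)[0]
    return {"sp": "Swiss-Prot", "tr": "TrEMBL"}.get(db, "other")

def count_sections(lines):
    """Count header lines per UniProt section; sequence lines are skipped."""
    return Counter(uniprot_section(line) for line in lines if line.startswith(">"))

# Hypothetical FASTA excerpt (sequences are placeholders)
fasta = [
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens",
    "MPEPTIDESEQ",
    ">tr|A0A0EXAMPLE|EXAMPLE_HUMAN Uncharacterized protein OS=Homo sapiens",
    "MPEPTIDESEQ",
    ">zz|CONTAMINANT|KERATIN Hypothetical contaminant entry",
]
print(count_sections(fasta))
```

WebGestalt would not recognize entries counted as TrEMBL.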

Effect size and size of test

These parameters are crucial for the QC experiment and will be reassessed once protein variance estimates from the QC experiment are available. They will also be used when visualizing the results in the final report (e.g., volcano plots) and when running over-representation analysis (ORA).

Effect size

What is the smallest biologically relevant effect size (i.e., fold change in protein abundance) that you are interested in detecting? Typical values are fold changes of $1.5$ , $2$ , or $4$ . This information is needed to determine the sample size and to interpret the data biologically.

The number of samples needed depends strongly on the effect size you want to detect: detecting small effects requires more samples. On the other hand, the statistical significance of a fold change does not imply biological relevance. Consequently, do not measure more samples than required.

1.5
2
4
other :

Size of the test¹

Typically, a difference between two conditions is considered significant if the p-Value is less than $0.05$ . For some applications a larger threshold can be used (e.g., $0.1$ ); for others a smaller threshold might be required.

0.01
0.05
0.1
other :

Power of test

The power of a binary hypothesis test is the probability that the test rejects the null hypothesis (H0) when a specific alternative hypothesis (H1) is true (Wikipedia).

0.8
other :
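The effect size, the size of the test, and the power together determine the sample size. The following is a minimal sketch. It assumes a two-sided comparison of two groups on the log2 scale, with a common within-group standard deviation `sd_log2` (for example, estimated from the QC experiment). It uses the normal approximation to the two-sample t-test. The function name and the example SD of 0.5 are hypothetical, and this is not necessarily the calculation used in the analysis pipeline.

```python
from math import ceil, log2
from statistics import NormalDist

def samples_per_group(fold_change, sd_log2, alpha=0.05, power=0.8):
    """Approximate samples per group for a two-sided two-sample test on log2 data."""
    delta = abs(log2(fold_change))               # effect size on the log2 scale
    std_normal = NormalDist()                    # N(0, 1)
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = std_normal.inv_cdf(power)           # quantile corresponding to the power
    n = 2 * ((z_alpha + z_beta) * sd_log2 / delta) ** 2
    return ceil(n)

# Smaller fold changes require more samples
for fc in (1.5, 2, 4):
    print(fc, samples_per_group(fc, sd_log2=0.5))
```

The normal approximation ignores the uncertainty in the estimated SD, so it underestimates the sample size when samples are few. A t-distribution-based calculation (for example, R's `power.t.test`) yields somewhat larger values.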

Does the variance within groups differ?

Is the variance within the groups the same or different? E.g., do you expect some groups to show larger within-group differences than others?

yes
no

If yes, which of the groups might have the highest within-group variance? (Some treatments might have heterogeneous outcomes.)

$~$ $~$

If possible, use samples from this group for the QC experiment; this will give a conservative estimate of the required sample sizes. If the variance in one group is substantially higher, the power of the tests can be improved by allocating more samples to this group.
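Allocating more samples to the noisier group helps because of how the variance of an estimated difference between two group means behaves. That variance is $\sigma_A^2/n_A + \sigma_B^2/n_B$. For a fixed total number of samples, it is minimized by choosing $n_A/n_B = \sigma_A/\sigma_B$. The following is a minimal sketch with hypothetical standard deviations; the function names are illustrative only.

```python
def allocate(total, sd_a, sd_b):
    """Split `total` samples between groups A and B proportionally to their SDs."""
    n_a = round(total * sd_a / (sd_a + sd_b))
    return n_a, total - n_a

def var_of_difference(n_a, n_b, sd_a, sd_b):
    """Variance of the difference between the two estimated group means."""
    return sd_a ** 2 / n_a + sd_b ** 2 / n_b

sd_a, sd_b = 0.6, 0.3  # hypothetical: group A is twice as variable as group B
n_a, n_b = allocate(12, sd_a, sd_b)
print(n_a, n_b)
print(var_of_difference(6, 6, sd_a, sd_b))      # equal allocation
print(var_of_difference(n_a, n_b, sd_a, sd_b))  # proportional allocation
```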

Difference in protein variances

From the QC experiment, we can estimate a within-group variance for each protein. Some proteins vary more within a group than others.

Are there proteins or protein groups of particular interest?

Provide protein IDs:

Do you want to obtain significant p-Values for proteins with large within-group variances, or would you be satisfied with obtaining significant results for the $X\%$ of proteins with the lowest variance?

$50\%$
$70\%$
other :
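Choosing $X\%$ amounts to picking a quantile of the per-protein standard deviations estimated in the QC experiment. That SD is then used in the sample size calculation, so proteins with a smaller SD are detected with at least the requested power. The following is a minimal sketch using Python's `statistics.quantiles`. The SD values and the function name are hypothetical.

```python
from statistics import quantiles

def sd_at_percent(protein_sds, percent):
    """SD below which roughly `percent`% of the proteins fall (percent in 1..99)."""
    cut_points = quantiles(protein_sds, n=100, method="inclusive")  # 99 percentiles
    return cut_points[percent - 1]

# Hypothetical within-group SDs (log2 scale) of ten proteins from a QC experiment
sds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
for pct in (50, 70):
    print(pct, sd_at_percent(sds, pct))
```

A larger $X\%$ selects a larger SD and therefore leads to a larger required sample size.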

Note:

Highly abundant proteins typically exhibit smaller technical variability than low-abundance proteins.

Designs

This section covers the most frequently used designs.

Parallel Group Design - Evaluates a single factor²
Factorial Design - Evaluates multiple factors simultaneously

Parallel Group Design

How many groups?

2
more. How many?

Factor name (camel case³ e.g., Treatment) :

Level names (camel case e.g. Control, Diet, Creme):

Factorial Design⁴

Applies when studying how two or more factors influence protein expression, e.g., $Knockout$ and $Treatment$ .

How many factors? Typically 2, rarely 3.
How many levels per factor? Typically 2 or 3.

Factor Name	Number of levels	Level names
$~$
$~$
$~$
$~$

Provide computer-friendly names of the factors (grouping variables) in camel case.
Provide names of levels for each grouping variable.

Example:

Factor Name	Number of levels	Level names
Treatment	3	Control, Diet, Creme
AgeGroup	2	young, old
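The factor and level names combine to form the groups of the design. The following is a minimal sketch that enumerates all level combinations in the example above. It assumes the `Factor_level` and `:` naming used in the contrast examples of this protocol. The dictionary layout and the function name are hypothetical.

```python
from itertools import product

# Factors and levels from the example table
factors = {
    "Treatment": ["Control", "Diet", "Creme"],
    "AgeGroup": ["young", "old"],
}

def group_names(factors):
    """Return one name per combination of factor levels, e.g. Treatment_Control:AgeGroup_young."""
    labelled = [[f"{factor}_{level}" for level in levels] for factor, levels in factors.items()]
    return [":".join(combination) for combination in product(*labelled)]

for name in group_names(factors):
    print(name)
```

With three factors having 3, 2, and 2 levels, the same code yields 12 groups.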

Repeated measurements

Applies when measurements are repeated on the same subject, e.g. over time ⁵.

Yes
No

If yes, please specify a computer-friendly factor name using camel case, e.g., PatientId.

Name of the factor :

The subject identifiers can contain upper- and lowercase letters and numbers, but avoid special characters and white spaces. Neither the names nor the levels of blocking factors (such as the subject identifier) will be shown in the generated results or figures.
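You can check these naming rules before annotating the samples. The following is a minimal sketch using regular expressions. It treats camel case as "starts with an uppercase letter, letters and digits only", which is an assumption based on the examples (Treatment, AgeGroup, PatientId). The function names are hypothetical.

```python
import re

FACTOR_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")  # e.g. Treatment, AgeGroup, PatientId
IDENTIFIER = re.compile(r"[A-Za-z0-9]+")        # letters and digits only

def is_valid_factor_name(name: str) -> bool:
    return FACTOR_NAME.fullmatch(name) is not None

def is_valid_identifier(value: str) -> bool:
    return IDENTIFIER.fullmatch(value) is not None

for candidate in ["PatientId", "patient id", "Age-Group"]:
    print(candidate, is_valid_factor_name(candidate))
for subject in ["P01", "P 01", "P-01"]:
    print(subject, is_valid_identifier(subject))
```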

Blocking factors

A blocking factor is a variable that affects the experimental outcome but is itself of no interest, e.g., sample batch ⁶. Including blocking factors in the model explains part of the variation between samples and thereby reduces the residual variance used in the tests.

Factor Name	Number of levels
$~$
$~$
$~$
$~$
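When batches shift all intensities, ignoring the batch inflates the within-group variance, whereas modelling the batch removes that variation. The following is a minimal sketch for a hypothetical balanced design with one sample per group and batch. It fits an additive two-way model (group + batch) using row and column means. The data are invented, and the analysis pipeline itself uses (mixed-effects) linear models rather than this shortcut.

```python
from statistics import mean

# Hypothetical log2 intensities of one protein; list positions are batches 1-3
y = {
    "Control":   [10.0, 12.1, 13.9],
    "Treatment": [11.1, 12.9, 15.0],
}

def residual_variance_ignoring_batch(y):
    """Pooled within-group variance (model: intensity ~ group)."""
    ss = sum((v - mean(vals)) ** 2 for vals in y.values() for v in vals)
    df = sum(len(vals) - 1 for vals in y.values())
    return ss / df

def residual_variance_with_batch(y):
    """Residual variance of intensity ~ group + batch (balanced, one sample per cell)."""
    groups = list(y)
    n_batches = len(y[groups[0]])
    grand = mean(v for vals in y.values() for v in vals)
    group_mean = {g: mean(y[g]) for g in groups}
    batch_mean = [mean(y[g][b] for g in groups) for b in range(n_batches)]
    ss = sum(
        (y[g][b] - (group_mean[g] + batch_mean[b] - grand)) ** 2
        for g in groups
        for b in range(n_batches)
    )
    df = (len(groups) - 1) * (n_batches - 1)
    return ss / df

print(residual_variance_ignoring_batch(y))
print(residual_variance_with_batch(y))
```

The estimated group difference is $1.0$ in both models. Only the residual variance shrinks, and with it the standard error used in the test.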

Further information

$~$

Contrasts - Hypothesis to be tested

If there are more than two groups, specify the hypotheses to be tested (comparisons) using the factor and factor level names.

Contrasts to be computed:

$~$

Examples for the parallel group design (only one factor):

Diet - Control
Creme - Control

Examples for factorial designs:

Treatment_Control - Treatment_Creme
AgeGroup_young - AgeGroup_old
Treatment_Control:AgeGroup_young - Treatment_Control:AgeGroup_old
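Each contrast is a difference of group means; on the log2 scale, this is a log2 fold change. The following is a minimal sketch with hypothetical cell means for the $3\times 2$ example. It assumes a balanced design, so the mean of a factor level is the average of the cell means that contain it. The numbers and the function name are illustrative only.

```python
# Hypothetical cell means (log2 scale) for a balanced 3 x 2 design
cell_means = {
    "Treatment_Control:AgeGroup_young": 10.0,
    "Treatment_Control:AgeGroup_old": 10.4,
    "Treatment_Diet:AgeGroup_young": 11.0,
    "Treatment_Diet:AgeGroup_old": 11.2,
    "Treatment_Creme:AgeGroup_young": 10.5,
    "Treatment_Creme:AgeGroup_old": 10.9,
}

def level_mean(level):
    """Average of the cell means whose name contains `level` (balanced design)."""
    cells = [m for name, m in cell_means.items() if level in name.split(":")]
    return sum(cells) / len(cells)

contrasts = {
    "Treatment_Control - Treatment_Creme":
        level_mean("Treatment_Control") - level_mean("Treatment_Creme"),
    "AgeGroup_young - AgeGroup_old":
        level_mean("AgeGroup_young") - level_mean("AgeGroup_old"),
    "Treatment_Control:AgeGroup_young - Treatment_Control:AgeGroup_old":
        cell_means["Treatment_Control:AgeGroup_young"]
        - cell_means["Treatment_Control:AgeGroup_old"],
}
for name, estimate in contrasts.items():
    print(f"{name}: {estimate:.3f}")
```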

Note:

If one group is used in more comparisons than the other groups (e.g., a common control), allocating more samples to this group increases the power of the tests ⁷, since increasing the number of degrees of freedom in this group benefits many comparisons.
Limit the number of hypotheses you test. Multiplicity: each comparison is a test of a hypothesis, so a p-Value adjustment is required. For instance, with the Bonferroni correction, the obtained p-Values are multiplied by the number of hypotheses tested. Given two comparisons, a p-Value of $0.03$ becomes $0.06$ after Bonferroni adjustment, i.e., not significant given a test size of $0.05$ . Therefore, do not test all possible hypotheses! For instance, when including the contrast Diet - Creme, the p-Values have to be multiplied by $3$ , i.e., a p-Value of $0.02$ becomes $0.06$ after Bonferroni adjustment, i.e., not significant.

Factorial designs generate many interactions (e.g., Treatment_Control:AgeGroup_young). A $3\times 2$ factorial design generates $6$ groups (unique combinations of factor levels), which makes it possible to specify many contrasts. Ideally, limit the number of hypotheses to a few (one or two) per experiment. The p-Values must be adjusted for the number of hypotheses examined.
Typically, factorial designs are used to test for differences among main effects (differences between levels of a single factor) and for second-order interactions of the factors.
main effects : Treatment_Control - Treatment_Diet
Test for interaction : (Treatment_Control:AgeGroup_young - Treatment_Control:AgeGroup_old) - (Treatment_Diet:AgeGroup_young - Treatment_Diet:AgeGroup_old)
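The Bonferroni adjustment described above is simple to compute. The following minimal sketch multiplies each p-Value by the number of hypotheses and caps the result at $1$. The function name and the raw p-Values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

alpha = 0.05
# Hypothetical raw p-values: Diet - Control, Creme - Control, Diet - Creme
two_contrasts = bonferroni([0.02, 0.01])
three_contrasts = bonferroni([0.02, 0.01, 0.2])

print("Diet - Control, 2 contrasts:", two_contrasts[0], two_contrasts[0] < alpha)
print("Diet - Control, 3 contrasts:", three_contrasts[0], three_contrasts[0] < alpha)
```

Adding the Diet - Creme contrast makes the Diet - Control comparison non-significant, although its raw p-Value is unchanged.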

More designs

Time series, e.g., growth curves, require at least 4-5 well-designed time points. When fewer than 4 time points are measured, the time points are typically treated as levels of a factor and can be analyzed using our standard pipeline for factorial designs.

We might be able to support you with some other hypothesis tests, depending on our availability.

Dose-response curves
Survival analysis

Auxiliary - Bioinformatics Project Meeting Agenda

FGCZ - (Draft)

07 May, 2026
