MetaX Cookbook
This is the guidebook for the MetaX GUI version. If you are using the CLI for analysis, we recommend reading the documentation for each MetaX module for instructions on how to use it from the command line.
Overview
MetaX is a novel tool for linking peptide sequences with taxonomic and functional information in metaproteomics. We introduce the Operational Taxon-Function (OTF) concept to explore microbial roles and interactions ("who is doing what and how") within ecosystems.
MetaX also features statistical modules and plotting tools for analyzing peptides, taxa, functions, proteins, and taxon-function contributions across groups.

Project Page
Visit GitHub for more information:
https://github.com/byemaxx/MetaX
Getting Started
- The main window of MetaX

- Click 'Tools Menu' to switch between modules

Exploring Data with MetaX
See the Preparing Your Data section to build the database and annotate peptides to OTFs before starting.
Module 1. OTF Analyzer
After obtaining the Operational Taxa-Functions (OTF) Table using the Peptide Annotator, you can perform downstream analysis with the OTF Analyzer.
1. Data Preparation
OTFs (Operational Taxa-Functions) Table: Obtained from the Peptide Annotator module.
Meta Table: The first column is sample names, and the other columns represent different groups. If no meta table is provided, meta info will be generated automatically: (1) all samples are in the same group; (2) each sample is a separate group.
Example Meta Table:
samples | Individuals | Treatment | Sweetener |
---|---|---|---|
sample_1 | V1 | Treatment | XYL |
sample_2 | V1 | Treatment | XYL |
sample_3 | V1 | Treatment | XYL |
sample_4 | V1 | Control | PBS |
sample_5 | V1 | Control | PBS |
sample_6 | V1 | Control | PBS |
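The two automatic groupings used when no meta table is provided can be pictured with the minimal sketch below. The function name `default_meta` and the column names `all_in_one_group` and `each_sample_a_group` are hypothetical and chosen only for illustration; they are not MetaX's own identifiers.

```python
def default_meta(sample_names):
    """Build a fallback meta table as a list of row dicts.

    Column names are illustrative assumptions, not MetaX's own.
    """
    rows = []
    for name in sample_names:
        rows.append({
            "samples": name,
            "all_in_one_group": "all",    # (1) every sample in the same group
            "each_sample_a_group": name,  # (2) each sample is its own group
        })
    return rows


meta = default_meta(["sample_1", "sample_2", "sample_3"])
for row in meta:
    print(row)
```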
You can load example data by clicking the button.

Then, click Go to start the analysis.
- Advanced Settings
- Peptide Column Name: Specifies the column in the OTF table that contains peptide information.
- Protein Column Name: Specifies the column in the OTF table that contains protein information (only required if protein summation is performed in downstream analysis).
- Sample Column Prefix: Identifies the prefix of sample columns to determine intensity columns in the OTF table.
- Any Data Mode: Allows analysis of any table using MetaX, not limited to OTF tables (only partial tool functionality is available).
- Customized Table Item Column Name: Specifies the column containing item names in any data mode. If left empty, the first column will be selected by default.
2. Data Overview
The Data Overview provides basic information about your data, such as the number of taxa, functions, and proportions.
- Set the threshold for linked peptides and the differences between them to plot figures.

- Select different functions to plot the proportion distribution.

- Filter out samples for downstream analysis.

3. Set TaxaFunc

Data Selection
Function: Select a function for downstream analysis (None in the list means no function is selected, focusing only on peptides and taxa).
Function Filter Threshold: The function with the highest proportion within the protein group of a peptide is considered the representative function for that peptide, provided its proportion reaches the threshold. The default threshold is 1.00 (100%).
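A minimal sketch of how such a threshold could be applied is shown below. It assumes the proportion is the fraction of proteins in the peptide's protein group that carry a given annotation; the function name `representative_function` is hypothetical, and MetaX's actual computation may differ.

```python
from collections import Counter


def representative_function(protein_functions, threshold=1.0):
    """Return the most common function if its proportion reaches the threshold.

    protein_functions: one annotation per protein in the peptide's protein group.
    Returns None when no function reaches the threshold (assumed behaviour).
    """
    if not protein_functions:
        return None
    func, count = Counter(protein_functions).most_common(1)[0]
    proportion = count / len(protein_functions)
    return func if proportion >= threshold else None


print(representative_function(["ko:K00625", "ko:K00625", "ko:K00625"]))
print(representative_function(["ko:K00625", "ko:K00625", "ko:K13788"]))
print(representative_function(["ko:K00625", "ko:K00625", "ko:K13788"], 0.6))
```

With the default threshold of 1.00, a peptide only receives a function when every protein in its group agrees; lowering the threshold accepts a majority annotation.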

Taxa Level: Select a taxa level for downstream analysis (Life in the list means no filtering by any taxa; the subsequent analysis then focuses on functions).
Split Function: Split annotations that contain multiple functions. For example:
KO | Intensity |
---|---|
ko:K00625,ko:K13788 | 10 |
is split to
KO | Intensity |
---|---|
ko:K00625 | 10 |
ko:K13788 | 10 |
If Share Intensity is checked, the intensity above is divided, so each split KO is given 5.
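The splitting rule can be sketched in a few lines. The function name `split_function` and its parameters are hypothetical; only the behaviour (copy the intensity to each split function, or divide it equally when sharing is enabled) follows the example above.

```python
def split_function(annotation, intensity, share_intensity=False, sep=","):
    """Split a multi-function annotation into (function, intensity) pairs."""
    parts = [p.strip() for p in annotation.split(sep) if p.strip()]
    if not parts:
        return []
    # With sharing, the intensity is divided equally among the split functions.
    value = intensity / len(parts) if share_intensity else intensity
    return [(p, value) for p in parts]


print(split_function("ko:K00625,ko:K13788", 10))
print(split_function("ko:K00625,ko:K13788", 10, share_intensity=True))
```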
Peptide Number Threshold: Only keep taxa (functions or OTFs) that have at least the set number of peptides.
Create Taxa and Func only from OTFs:
Without selection (checkbox not checked):
- Taxa table: Peptides are filtered based solely on taxa levels, without considering any functional categories.
- Function table: Peptides are filtered solely by functional categories and thresholds, regardless of their taxa levels.
- Taxa-Function (OTFs) table: Peptides are filtered by both taxa levels and functional categories simultaneously.
With selection (checkbox checked):
All tables are filtered by both taxa levels and functional categories simultaneously.
Sum Proteins Intensity
Click Create Proteins Intensity Table to sum peptides to proteins if the Protein column is in the original table.
- Occam's Razor, Anti-Razor and Rank: Methods available for inferring shared peptides.
- Razor:
- Build a minimal set of proteins to cover all peptides.
- For each peptide, choose the protein with the most peptides (if multiple proteins have the same number of peptides, share intensity to them).
- Anti-Razor:
- All proteins share the intensity of each peptide.
- Rank:
- Build the rank of proteins.
- Choose the protein with a higher rank for the shared peptide.
Methods to Build Protein Rank:
- unique_counts: Use the counts of proteins inferred by unique peptides.
- all_count: Use the counts of all proteins.
- unique_intensity: Use the intensity of proteins inferred by unique peptides.
- shared_intensity: Use the intensity divided by the number of shared peptides for each protein.
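The per-peptide assignment step of the Razor method can be illustrated with the simplified sketch below. It omits the step that builds a minimal protein set, and it assumes that tied proteins split the peptide intensity equally; the function name `razor_assign` and the toy data are hypothetical, and this is not MetaX's implementation.

```python
from collections import defaultdict


def razor_assign(peptide_to_proteins, peptide_intensity):
    """Assign each peptide's intensity to the protein(s) with the most peptides."""
    # Count how many peptides map to each protein.
    peptide_count = defaultdict(int)
    for proteins in peptide_to_proteins.values():
        for prot in proteins:
            peptide_count[prot] += 1

    protein_intensity = defaultdict(float)
    for pep, proteins in peptide_to_proteins.items():
        best = max(peptide_count[p] for p in proteins)
        winners = [p for p in proteins if peptide_count[p] == best]
        # Ties share the intensity equally (an assumption of this sketch).
        share = peptide_intensity[pep] / len(winners)
        for prot in winners:
            protein_intensity[prot] += share
    return dict(protein_intensity)


mapping = {"pep1": ["A"], "pep2": ["A", "B"], "pep3": ["B", "C"]}
intensity = {"pep1": 10, "pep2": 20, "pep3": 30}
result = razor_assign(mapping, intensity)
print(result)
```

In this toy example, proteins A and B each have two peptides, so they split pep2, while pep3 goes entirely to B because C has only one peptide.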
Data preprocessing
Quantitative Method:
Sum: Sum the peptide intensities directly into Taxa, Function or OTF intensities.
DirectLFQ: Use DirectLFQ to normalize the peptides and then estimate the intensity using intensity traces.
Outlier handling:
There are several methods for detecting and handling outliers.
- Two steps will be applied:
- Outlier Detection: Users can select a method to mark outlier values as NaN. Rows that contain only NaN values and 0 will then be removed. The remaining NaN values will be handled in the next step.
- Outlier Handling: Users can choose a method to fill the remaining NaN values.
Outliers Detection:
IQR: In a group, if the value is greater than Q3+1.5*IQR or less than Q1-1.5*IQR, the value will be marked as NaN.
Missing-Value: Detect missing values in the data. Any missing value will be marked as NaN.
Half-Zero: This rule applies to groups of data. If more than half of the values in a group are 0, while the rest are non-zero, then the non-zero values are marked as NaN. Conversely, if less than half of the values are 0, then the zero values are marked as NaN. If the group contains an equal number of 0 and non-zero values, all values in the group are marked as NaN.
Zero-Dominant: This rule applies to groups of data. If more than half of the values in a group are 0, then the non-zero values are marked as NaN.
Zero-Inflated Poisson: This method is based on the Zero-Inflated Poisson (ZIP) model, which is a type of model that is used when the data contains a lot of zeros, more than what is expected in a standard Poisson model. In this context, the ZIP model is used to detect outliers in the data. The process involves fitting the ZIP model to the data and then predicting the data values. If the predicted value is less than 0.01, then the data point is marked as an outlier (NaN).
Negative Binomial: This method is based on the Negative Binomial model, which is a type of model used when the variance of the data is greater than the mean. Similar to the ZIP method, the Negative Binomial model is fitted to the data and then used to predict the data values. If the predicted value is less than 0.01, then the data point is marked as an outlier (NaN).
Z-Score: The Z-score is a statistical measure that tells how far a data point is from the mean in terms of standard deviations. Outliers are often identified as points with Z-scores greater than 2.5 or less than -2.5.
Mahalanobis Distance: The Mahalanobis distance measures the distance between a point and a distribution, considering the correlation among variables. Outliers can be identified as points with a Mahalanobis distance that exceeds a certain threshold.
For all methods, you can choose one meta column by which outliers are detected and one meta column by which they are handled.
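Two of the detection rules above, IQR and Zero-Dominant, are simple enough to sketch with the standard library. Each function receives the intensities of one item within one group. The function names are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption; MetaX may compute quartiles differently.

```python
import math
import statistics


def mark_iqr_outliers(values):
    """Mark values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as NaN."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v if low <= v <= high else math.nan for v in values]


def mark_zero_dominant(values):
    """If more than half of the values are 0, mark the non-zero values as NaN."""
    zeros = sum(1 for v in values if v == 0)
    if zeros > len(values) / 2:
        return [v if v == 0 else math.nan for v in values]
    return list(values)


print(mark_iqr_outliers([10, 12, 11, 13, 100]))
print(mark_zero_dominant([0, 0, 0, 5]))
print(mark_zero_dominant([0, 3, 4, 5]))
```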
Outliers Imputation:
Drop: Remove peptides that contain any NaN values.
Original: Remove peptides that contain any NaN values.
Mean: Outliers will be imputed by the mean.
Median: Outliers will be imputed by the median.
KNN: Outliers will be imputed by KNN (K=5). The K-Nearest Neighbors algorithm uses the mean or median of the nearest neighbours to fill in missing values.
Regression: Outliers will be imputed by using IterativeImputer with regression method. This method uses round-robin linear regression, modelling each feature with missing values as a function of other features.
Multiple: Outliers will be imputed by using IterativeImputer with multiple imputations method. It uses the IterativeImputer with a specified number (K=5) of the nearest features.
You can choose to impute outliers by each group or by all samples.
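The difference between imputing by each group and by all samples can be seen in the following sketch of Mean/Median imputation for a single row of intensities. The helper names `impute` and `impute_by_group` and the toy values are hypothetical; this only illustrates the idea and is not MetaX's code.

```python
import math
import statistics


def impute(values, method="mean"):
    """Fill NaN entries with the mean or median of the observed values."""
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        return list(values)  # nothing to learn from; leave the NaNs in place
    fill = statistics.mean(observed) if method == "mean" else statistics.median(observed)
    return [fill if math.isnan(v) else v for v in values]


def impute_by_group(values, groups, method="mean"):
    """Impute each group separately, using only that group's observed values."""
    result = list(values)
    for g in set(groups):
        idx = [i for i, name in enumerate(groups) if name == g]
        filled = impute([values[i] for i in idx], method)
        for i, v in zip(idx, filled):
            result[i] = v
    return result


row = [1.0, math.nan, 3.0, 10.0, math.nan, 20.0]
groups = ["A", "A", "A", "B", "B", "B"]
print(impute_by_group(row, groups))  # fills 2.0 in group A and 15.0 in group B
print(impute(row))                   # fills 8.5 everywhere
```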
Remove Batch Effect:
Here, you can choose a group as the batch, then use reComBat (https://github.com/BorgwardtLab/reComBat) to remove the batch effect.
Data Transformation:
Log2, Log10, Square root transformation, Cube root transformation and Box-Cox.
Data Normalization:
Trace Shifting: Reframing the Normalization Problem with Intensity traces (inspired by DirectLFQ).
- Note: If both trace shifting and transformation are applied, normalization will be done before transformation.
Standard Scaling (Z-Score), Min-Max Scaling, Pareto Scaling, Mean centring and Normalization by Percentage.
If you use Z-Score, Mean centring or Pareto Scaling for data normalization, a minimum offset will then be applied to the data to avoid negative values.
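As an illustration of why an offset is needed, the sketch below applies Pareto scaling (centre on the mean, divide by the square root of the standard deviation) and then shifts the result so that its minimum is zero. The use of the sample standard deviation and the exact form of the offset are assumptions, and the function names are hypothetical.

```python
import math
import statistics


def pareto_scale(values):
    """Centre on the mean and divide by the square root of the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed)
    return [(v - mean) / math.sqrt(sd) for v in values]


def shift_non_negative(values):
    """Shift all values up so the smallest one becomes 0 (assumed offset rule)."""
    lowest = min(values)
    return [v - lowest for v in values] if lowest < 0 else list(values)


scaled = pareto_scale([1.0, 2.0, 3.0, 4.0, 5.0])
shifted = shift_non_negative(scaled)
print(scaled)   # centred values, some of them negative
print(shifted)  # same spacing, minimum at 0
```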
- Drag an item's name to change the order of data preprocessing.
Then, click Go to create a TaxaFunc object for analysis.

Then we can check the tables in the Table Review part and export them.


4. Basic Stats
PCA, Correlation and Box Plot

We can select meta groups or samples (default all) to plot PCA, Correlation, and Box Plot for [Taxa, Function, Taxa-Func, Peptide table, Protein table]




Setting and modifying the plot
Show or hide labels in the figure by checking the Show Labels checkbox
Select Sub Meta to plot with two meta
Change settings in the PLOT PARAMETER tab
Select specific groups with a condition
e.g., select the PBS, BAS and other groups only in Individual V1
Select specific samples to analyze
Number stats
We can plot a bar chart of the number of items in each table by groups or by samples

Taxa Specific
Alpha/Beta Diversity
Sunburst
TreeMap
Sankey
Heatmap and Bar Plot

- Select items (Taxa, Function, Taxa-Func and Peptide) to plot:
- Add All Taxa, or select one we are interested in.

Add items to Top List: select the top items to plot by a statistical method.
Checking filter with threshold filters the items by the padj of the ANOVA and T-TEST results, and by the padj and Log2FC of the DESeq2 result (set on the corresponding page).

Add a list for plotting:
Make sure there is one item per row

Setting:
Change the settings to fit your data.
- Rename Samples: Add group info to each sample name
- Rename Taxa: Only keep the last taxonomic level to shorten the name
Plot Mean: calculate the mean of each group before plotting
Sub Meta: select a second meta, then combine two meta by mean for Heatmap and 3D bar plot
Right-click Theme to plot all color maps for viewing
Plot:

Adjust the picture to fit the window to get the perfect picture:
Bar Plot:

- interactive function:

change to line plot:
3D Bar plot
Plot a 3D bar by selecting a sub meta.
Peptide Query
- Query all information about a peptide

5. Cross Test
T-TEST
- Select two groups to run a T-Test for [Taxa, Function, Taxa-Func, Peptide table and Proteins Table]

ANOVA-TEST
- Select some groups or all groups to run an ANOVA Test for [Taxa, Function, Taxa-Func and Peptide table]

Significant Taxa-Func
- This comparison finds taxa that show no significant difference between the two groups while their related functions are significantly different, as well as functions that are not significant while their related taxa are.
Plot Cross Heatmap
- The results of the T-Test and ANOVA Test will be shown in a new window

Plot Heatmap for results
Choose a table to plot the top differences heatmap or get the top table

- Taxa-Func cross heatmap:
- The orange cells mean that the corresponding function (X-axis) and taxon (Y-axis) are significantly different between groups.

Func(Taxa) Heatmap:
The colour shows the intensity of the significant Func(Taxa) between groups.

Significant Taxa-Func Heatmap:
The colored tiles represent the taxa which were not significantly different between groups but the related functions were.
Group-Control TEST
- Dunnett's Test
Set a group as "Control", then compare all groups to Control
Comparing in Each Condition: Select a meta such as individual, then compare groups to control in each individual.
DESeq2 Test
Bingo! You noticed the hidden function of MetaX: click Help -> About -> Like 3 times to unlock the function to compare all groups to control.
- Result of Dunnett's Test:
- The T-statistic value is shown in the heatmap
DESeq2
- Select two groups to calculate the fold change with PyDESeq2 (https://github.com/owkin/PyDESeq2)

- Select p-adjust and log2FC to plot
(Ultra-Up(Down): |log2FC| > Max log2FC)
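One way to read the Ultra-Up(Down) rule is as an extra category on top of the usual p-adjust and log2FC cut-offs. The sketch below is only an interpretation: the function name `classify_change`, the default cut-offs, the "Not significant" label and the direction of the inequalities for the ordinary Up/Down categories are all assumptions; only the `|log2FC| > Max log2FC` condition comes from the text above.

```python
def classify_change(log2fc, padj, padj_cut=0.05, min_log2fc=1.0, max_log2fc=10.0):
    """Label one item by its DESeq2 result (illustrative thresholds only)."""
    if padj >= padj_cut or abs(log2fc) < min_log2fc:
        return "Not significant"
    direction = "Up" if log2fc > 0 else "Down"
    # Items beyond the maximum log2FC get the Ultra- prefix.
    return f"Ultra-{direction}" if abs(log2fc) > max_log2fc else direction


print(classify_change(12.0, 0.001))
print(classify_change(-2.0, 0.01))
print(classify_change(3.0, 0.2))
```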
Volcano:
Sankey:
- The last node level is the functions linked to each Taxon (When plotting Taxa-Func)
TUKEY_TEST

Select a function:
Test the significant groups in this function.
Select a Taxon:
Test the significant groups in this taxon.
Select both function and taxon:
Test the significant groups in this function and this taxon.

Show Linked Taxa Only: only shows the taxa linked with the current function in the taxa combo box.
Show Linked Func Only: only shows the function linked with the current taxon in the function combo box.
Do not forget to click Reset Function Taxa List to reset all items after the filtering
Tukey result plot:
- The dots and lines show the difference in the mean value of the Tukey test

6. Expression Analysis
Co-Expression Networks & Heatmap
- select Groups or Samples to calculate the correlation and plot the network

- Select a table, and set the correlation method and threshold

- Add some items to the focus list (Optional)

Network Plot
The Red dots are focus items
- The depth of color and the width of edges represent the correlation value
- The size of the dot indicates the number of connections

- Correlation of expression
Expression Trends
- Add items to the list window to plot the clusters with similar trends of intensity

Clusters plot (clustered by k-means)
The coloured line is the average

Select a specific cluster to plot interactive lines or get the table
The dashed red line is the average
7. Taxa-Func Link
Taxa-Func Link Plot

Check all taxa in one function (or Check all functions in a taxon)
Select a function, and click the button Show Linked Taxa Only
- Linked Number: The number shows how many taxa are linked in this function
- The number starts with Taxa: The number shows how many peptides are in this Taxa-Func

- Filter items of the Taxa and Func list

Plot Heatmap or Bar
Select some groups (default all) to get the intensity of each taxon of this function

- Plot peptides in one Function of a Taxon


- Switch Bar to Stacked or not ( Line)

- Change Bar plot to Lines

Taxa-Func Network
- Select some groups or samples (default all)
- Add some taxa, func or taxa-func to the focus window (optional)

- Plot list only
- Plot List Only: Show only the items in the list and the items linked to them
Without Links: Only show the items in the focus list
Network plot
- The yellow dots are taxa and the grey dots are functions; the size of the dots represents the intensity
- The red dots are the taxa we focused on
- The green dots are the functions we focused on
- More parameters can be set in Dev->Settings->Others (e.g. Nodes Shape, color, Line Style)

8. Restore Last TaxaFunc Object
- Once you create a TaxaFunc object, it is saved automatically, and you can restore it next time.
- Also, we can export the current MetaX to a file and reload it again.
Preparing Your Data
Module 2. Database Builder
Note: The results from the MetaLab v2.3 MaxQuant workflow do not require database building. However, we do not recommend using these results as input to MetaX, as many peptides may be discarded.
- Build the database for the first time using the Database Builder.
Option 1: Build Database Using MGnify Data
Ensure you download the correct database type corresponding to your data.

Option 2: Build Database Using Own Data
- Annotation Table: A TSV table (tab-separated), with the first column as protein name joined with Genome by "_", e.g., "Genome1_protein1", and other columns containing annotation information.

- Taxa Table: A TSV table (tab-separated), with the first column as Genome name, e.g., "Genome1", and the second column as taxa.
Example Annotation Table:
Query | Preferred_name | EC | KEGG_ko |
---|---|---|---|
MGYG000000001_00696 | mfd | - | ko:K03723 |
MGYG000000001_02838 | hxlR | - | - |
MGYG000000001_01674 | ispG | 1.17.7.1,1.17.7.3 | ko:K03526 |
MGYG000000001_02710 | glsA | 3.5.1.2 | ko:K01425 |
MGYG000000001_01356 | mutS2 | - | ko:K07456 |
MGYG000000001_02630 | - | - | - |
MGYG000000001_02418 | ackA | 2.7.2.1 | ko:K00925 |
MGYG000000001_00728 | atpA | 3.6.3.14 | ko:K02111 |
MGYG000000001_00695 | pth | 3.1.1.29 | ko:K01056 |
MGYG000000001_02907 | - | - | ko:K03086 |
MGYG000000001_02592 | rplC | - | ko:K02906 |
MGYG000000001_00137 | - | - | ko:K03480,ko:K03488 |
Example Taxa Table:
Genome | Lineage |
---|---|
MGYG000000001 | d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;f_Peptostreptococcaceae;g_GCA-900066495;s_GCA-900066495 sp902362365 |
MGYG000000002 | d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Lachnospirales;f_Lachnospiraceae;g_Blautia_A;s_Blautia_A faecis |
MGYG000000003 | d_Bacteria;p_Bacteroidota;c_Bacteroidia;o_Bacteroidales;f_Rikenellaceae;g_Alistipes;s_Alistipes shahii |
MGYG000000004 | d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Oscillospirales;f_Ruminococcaceae;g_Anaerotruncus;s_Anaerotruncus colihominis |
MGYG000000005 | d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Peptostreptococcales;f_Peptostreptococcaceae;g_Terrisporobacter;s_Terrisporobacter glycolicus_A |
MGYG000000006 | d_Bacteria;p_Firmicutes;c_Bacilli;o_Staphylococcales;f_Staphylococcaceae;g_Staphylococcus;s_Staphylococcus xylosus |
MGYG000000007 | d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus intestinalis |
MGYG000000008 | d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Lactobacillus;s_Lactobacillus johnsonii |
MGYG000000009 | d_Bacteria;p_Firmicutes;c_Bacilli;o_Lactobacillales;f_Lactobacillaceae;g_Ligilactobacillus;s_Ligilactobacillus murinus |
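The naming conventions of the two tables can be checked with a short sketch like the one below. It assumes that the genome name is everything before the last "_" of the protein name (so the protein part itself contains no underscore) and that each lineage entry is a one-letter rank prefix followed by one or more underscores; both helper names are hypothetical.

```python
def genome_of(protein_id):
    """Return the genome part of a 'Genome_protein' identifier."""
    return protein_id.rsplit("_", 1)[0]


def parse_lineage(lineage):
    """Turn 'd_Bacteria;p_Firmicutes_A;...' into a rank -> name mapping."""
    ranks = {}
    for part in lineage.split(";"):
        prefix, _, name = part.partition("_")
        # lstrip also tolerates a double-underscore separator such as 'd__Bacteria'.
        ranks[prefix] = name.lstrip("_")
    return ranks


print(genome_of("MGYG000000001_00696"))
print(parse_lineage("d_Bacteria;p_Firmicutes_A;c_Clostridia;o_Lachnospirales"))
```

Linking the two tables then amounts to looking up `genome_of(protein)` in the first column of the Taxa Table.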
Module 3. Database Updater
The Database Updater allows updating the database built by the Database Builder or adding more annotations. This step is optional.
- Update the built database and extend annotations.

Option 1: Built-in Mode
We recommend some extended databases, such as dbCAN_seq.
Option 2: TSV Table
Extend the database by adding a new database to the database table. Ensure the column separator is a tab and the first column is the Protein name, with other columns containing function annotations.
Example:
Protein ID | COG | KEGG | ... |
---|---|---|---|
MGYG000000001_02630 | Function 1 | Function 1 | ... |
MGYG000000001_01475 | Function 2 | Function 1 | ... |
MGYG000000001_01539 | Function 3 | Function 1 | ... |
Module 4. Peptide Annotator
1. Results from MAG Workflow
The peptide results use metagenome-assembled genomes (MAGs) as the reference database for protein searches, e.g., MetaLab-MAG, MetaLab-DIA and other workflows which use MAG databases such as MGnify or a customized MAG database.
- Annotate the peptides to the Operational Taxa-Functions (OTF) Table before analysis using the Peptide Annotator.

Required:
Database: The database created by the Database Builder
Peptide Table:
Option 1: From MetaLab-MAG results (final_peptides.tsv)
Option 2: Create it manually, with the first column as the ID (e.g., peptide sequence), the second column as the protein IDs from MGnify (e.g., MGYG000003683_00301; MGYG000001490_01143) or your own database, and the other columns as the intensity of each sample.
Example:
Sequence | Proteins | Intensity_V1_01 | Intensity_V1_02 | Intensity_V1_03 | Intensity_V1_04 |
---|---|---|---|---|---|
(Acetyl)KGGVEPQSETVWR | MGYG000002716_01681;MGYG000000195_00452;MGYG000001616_00519;MGYG000002258_01582;MGYG000001300_00281;MGYG000002926_00231;... | 714650 | 0 | 0 | 0 |
(Acetyl)KVIPELNGK | MGYG000003589_01892;MGYG000001560_01812;MGYG000001789_00244;... | 0 | 0 | 0 | 0 |
(Acetyl)LAELGAKAVTLSGPDGYIYDPDGITTK | MGYG000001199_02893 | 0 | 0 | 0 | 0 |
(Acetyl)LLTGLPDAYGR | MGYG000001757_01206;MGYG000004547_02135;MGYG000001283_00124;MGYG000004758_00803;MGYG000002486_00845;MGYG000000271_01269 | 0 | 307519 | 0 | 0 |
(Acetyl)MDFTLDKK | MGYG000000076_01275;MGYG000003694_00879;MGYG000000312_02425;MGYG000000271_02102;MGYG000004271_00233;MGYG000002517_00542;MGYG000000489_01025 | 306231 | 0 | 0 | 1214497 |
Output Save Path: The location to save the result table.
LCA Threshold: Find the LCA with the proportion threshold for each peptide. The default is 1.00 (100%).
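A proportion-based LCA can be sketched as follows: starting from the deepest rank, take the most common lineage prefix among the peptide's proteins and accept the first one whose share reaches the threshold. This is a minimal sketch of the general idea under that assumption; the function name `lca_with_threshold` and the toy lineages are hypothetical, and MetaX's own procedure may differ in detail.

```python
from collections import Counter


def lca_with_threshold(lineages, threshold=1.0):
    """Return the deepest lineage prefix shared by at least `threshold` of the lineages.

    lineages: list of lists, each ordered from domain down to species.
    """
    if not lineages:
        return []
    depth = max(len(lin) for lin in lineages)
    for level in range(depth, 0, -1):
        prefixes = Counter(tuple(lin[:level]) for lin in lineages if len(lin) >= level)
        prefix, count = prefixes.most_common(1)[0]
        if count / len(lineages) >= threshold:
            return list(prefix)
    return []


proteins = [
    ["Bacteria", "Firmicutes", "Bacilli"],
    ["Bacteria", "Firmicutes", "Bacilli"],
    ["Bacteria", "Firmicutes", "Clostridia"],
]
print(lca_with_threshold(proteins))        # strict LCA at 100%
print(lca_with_threshold(proteins, 0.6))   # a two-thirds majority is enough
```

With the default of 1.00 the result is the classical LCA; a lower threshold lets a dominant lineage win even when a minority of proteins disagree.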
2. Results from MaxQuant Workflow
The peptide results from the MetaLab 2.3 MaxQuant workflow.
- Select the MetaLab result folder, which contains the maxquant_search folder.

The Peptide Annotator will automatically find the peptides_report.txt, BuiltIn.pepTaxa.csv, and functions.tsv in the maxquant_search folder. Alternatively, you can select the files manually.
Select OTFs Save To to set the location to save the result table.

Developer Tools
Export Log
You can export the log file for debugging or for reporting an issue.
Show or Hide the Console
Settings
Check the Auto Check Update box to enable or disable checking for updates at launch
- Choose in the settings whether to update from the stable version or the beta version
- Other Options Settings
Enjoy MetaX
If you have any issues or suggestions, please open a new issue on my GitHub.