TopPIC Suite

In addition to this tutorial, you can find video tutorials on TopPIC Suite (link) and the interpretation of TopPIC and TopMG identifications (link). We thank Dr. David Tabb for making these video tutorials.

1 Overview

In this tutorial, we use TopPIC Suite to analyze two top-down LC-MS/MS data files on a computer with a Windows Operating System. Annotated proteoform spectrum matches (PrSMs) identified by TopPIC from the data files can be browsed here.


2 Folders

Create the folders below for software packages and data sets used in this tutorial.

  1. Create a new folder named toppic_tutorial on the C: drive of your system.
  2. Create a new subfolder named toppic in the folder C:\toppic_tutorial\ for TopPIC Suite.
  3. Create a new subfolder named tutorial_1 in the folder C:\toppic_tutorial\.
  4. Create a new subfolder named tutorial_2 in the folder C:\toppic_tutorial\.
  5. Create a new subfolder named tutorial_3 in the folder C:\toppic_tutorial\.
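If you prefer to script this step, the minimal Python sketch below creates the same folder tree with pathlib. The root path and subfolder names come from the steps above. The helper name make_tutorial_folders is hypothetical and is not part of TopPIC Suite.

```python
from pathlib import Path

# Subfolders listed in the steps above.
SUBFOLDERS = ["toppic", "tutorial_1", "tutorial_2", "tutorial_3"]


def make_tutorial_folders(root: str = r"C:\toppic_tutorial") -> list[Path]:
    """Create the tutorial root folder and its subfolders (hypothetical helper).

    Existing folders are left untouched, so the function is safe to rerun.
    Returns the paths of the subfolders in the order listed above.
    """
    root_path = Path(root)
    created = []
    for name in SUBFOLDERS:
        folder = root_path / name
        # parents=True also creates the root; exist_ok=True makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for path in make_tutorial_folders():
        print(path)
```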

The resulting folder structure is shown in the screenshot below.



3 Software tools

3.1 MSConvert

MSConvert is a software tool in ProteoWizard that converts vendor raw files into open spectrum file formats such as mzML. Follow the steps below to download and install ProteoWizard:

  1. Go to the ProteoWizard download page.
  2. Choose the type 'Windows 64-bit installer' for end users and download ProteoWizard.
  3. Double click the downloaded installer (for example, pwiz-setup-3.0-x86-64.msi; the exact file name depends on the version) to install it.

3.2 TopPIC Suite

  1. Go to the download webpage of TopPIC Suite.
  2. Choose the download type "Windows 64-bit zip file", fill out the registration form, and click "I accept license agreement and download TopPIC Suite" to download it.
  3. Save it to the folder C:\toppic_tutorial\toppic\.
  4. Extract all the files of the downloaded zip file to the folder C:\toppic_tutorial\toppic\.

4 Tutorial 1

In this tutorial, we will use TopIndex, TopFD and TopPIC to analyze a top-down MS/MS data set of Salmonella typhimurium for proteoform identification.

4.1 Top-down MS/MS Dataset

In the MS experiment, the protein extract of S. typhimurium was reduced with dithiothreitol and alkylated with iodoacetamide. The protein mixture was first separated by gas-phase fractionation, resulting in 7 fractions. Each fraction was separated by an HPLC system coupled with an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific). MS and MS/MS spectra were collected at a resolution of 60,000 and 30,000, respectively. In this tutorial, we use only the data files of two fractions (st_1.raw and st_2.raw).

Click here to download the data set, save it in the folder C:\toppic_tutorial\tutorial_1\, and unzip it in the same folder.

4.2 Protein sequence database

A S. typhimurium proteome database of 4,533 proteins was downloaded from the UniProt database.

Click here to download the protein database and save it in the folder C:\toppic_tutorial\tutorial_1\.

The folder C:\toppic_tutorial\tutorial_1\ is shown in the screenshot below.


4.2.1 Index file generation

We use TopIndex to generate index files from the protein database. The index files speed up database searches in TopPIC and TopMG. This step is optional: skipping it only slows down the database search in Section 4.5. TopIndex supports multithreading, but on a computer with a spinning hard disk, index generation may be faster with a single thread than with multiple threads. TopIndex generates very large index files. For example, the index files for the target-decoy concatenated UniProt human proteome database are about 240 GB. For fast index generation, we recommend a computer with a solid-state drive (SSD) of at least 1 TB.

  1. Double click the executable file topindex_gui.exe in the folder C:\toppic_tutorial\toppic.
  2. Add the file C:\toppic_tutorial\tutorial_1\uniprot-st.fasta.
  3. Select Carbamidomethylation on cysteine as the fixed modification.
  4. Check the checkbox Decoy database.
  5. Click the "Start" button to generate index files.

The screenshot of topindex_gui is shown below.

TopIndex generates a folder C:\toppic_tutorial\tutorial_1\uniprot-st.fasta_idx containing index files.
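Index files can be very large, so it can help to check how much space the generated folder uses and how much free space remains on the drive. The Python sketch below does this with the standard library only. The folder name uniprot-st.fasta_idx comes from the sentence above. The helper names are hypothetical.

```python
import shutil
from pathlib import Path


def folder_size_bytes(folder: str) -> int:
    """Sum the sizes of all regular files under `folder`, recursively."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def report_index_usage(index_folder: str) -> tuple[float, float]:
    """Return (index size in GB, free space in GB) for the drive holding the index."""
    size_gb = folder_size_bytes(index_folder) / 1e9
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    free_gb = shutil.disk_usage(index_folder).free / 1e9
    return size_gb, free_gb


if __name__ == "__main__":
    size, free = report_index_usage(r"C:\toppic_tutorial\tutorial_1\uniprot-st.fasta_idx")
    print(f"index: {size:.2f} GB, free on drive: {free:.1f} GB")
```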

In the analysis, carbamidomethylation is selected as the fixed modification because the proteins were reduced with dithiothreitol and alkylated with iodoacetamide before the MS experiment. If the proteins are not reduced and alkylated, no fixed modification should be selected.
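Carbamidomethylation adds 57.021464 Da (monoisotopic) to each cysteine residue, which is why it appears as C57 in the command-line examples later in this tutorial. The Python sketch below computes the total fixed-modification mass shift for a protein sequence. It is an illustration only and not TopPIC's own mass calculation. The function name is hypothetical.

```python
# Monoisotopic mass shift of carbamidomethylation (iodoacetamide alkylation), in Da.
CARBAMIDOMETHYL_DA = 57.021464


def fixed_mod_shift(sequence: str) -> float:
    """Total mass added when every cysteine (C) carries carbamidomethylation."""
    return sequence.upper().count("C") * CARBAMIDOMETHYL_DA


if __name__ == "__main__":
    # A toy sequence with three cysteines (not a real S. typhimurium protein).
    print(fixed_mod_shift("MKCLVCAGC"))
```

For the toy sequence, the shift is 3 x 57.021464 = 171.064392 Da.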

4.3 File format conversion

We use MSConvertGUI to convert the raw files st_1.raw and st_2.raw to mzML files.

  1. Type "msconvert" in the search box on the Windows 10 taskbar and run the desktop app "MSConvert".
  2. Add the files C:\toppic_tutorial\tutorial_1\st_1.raw and C:\toppic_tutorial\tutorial_1\st_2.raw as input files.
  3. Add the filter "peakPicking vendor msLevel=1-" (important).
  4. Click the "Start" button to perform the file format conversion.

The screenshot of MSConvertGUI is shown below.

In the above file format conversion, the peak picking filter (step 3) makes MSConvert write centroided mzML files instead of profile mzML files. The spectral deconvolution tool TopFD requires centroided data.
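The same conversion can be scripted with the msconvert command-line tool that ships with ProteoWizard. The Python sketch below builds and runs the command. It assumes that msconvert.exe is on the PATH. It relies on msconvert's --mzML, --filter and -o options; confirm their behavior with `msconvert --help` for your installed version. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Same filter as step 3 above: vendor centroiding for all MS levels.
PEAK_PICKING = "peakPicking vendor msLevel=1-"


def msconvert_command(raw_file: str, out_dir: str) -> list[str]:
    """Build the msconvert argument list for one raw file (not executed here)."""
    return [
        "msconvert",
        raw_file,
        "--mzML",                  # write mzML output
        "--filter", PEAK_PICKING,  # centroid the spectra
        "-o", out_dir,             # output directory
    ]


def convert_all(folder: str) -> None:
    """Convert every .raw file in `folder` to centroided mzML in the same folder."""
    for raw in sorted(Path(folder).glob("*.raw")):
        # check=True raises CalledProcessError if msconvert fails.
        subprocess.run(msconvert_command(str(raw), folder), check=True)


if __name__ == "__main__":
    convert_all(r"C:\toppic_tutorial\tutorial_1")
```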

The resulting mzML files are

C:\toppic_tutorial\tutorial_1\st_1.mzML
C:\toppic_tutorial\tutorial_1\st_2.mzML

The two files are about 41 MB and 47 MB, respectively, and can be downloaded here. The file format conversion takes less than one minute.

4.4 Mass spectral deconvolution

We use topfd_gui for top-down mass spectral deconvolution.

  1. Double click the executable file topfd_gui.exe in the folder C:\toppic_tutorial\toppic.
  2. Add the files C:\toppic_tutorial\tutorial_1\st_1.mzML and C:\toppic_tutorial\tutorial_1\st_2.mzML as input files.
  3. Click the "Start" button to deconvolute the files.

The screenshot of topfd_gui is shown below.

TopFD reports ten text files and two folders.

  1. Two msalign files containing deconvoluted MS1 spectra:
    C:\toppic_tutorial\tutorial_1\st_1_ms1.msalign
    C:\toppic_tutorial\tutorial_1\st_2_ms1.msalign
  2. Two msalign files containing deconvoluted MS/MS spectra:
    C:\toppic_tutorial\tutorial_1\st_1_ms2.msalign
    C:\toppic_tutorial\tutorial_1\st_2_ms2.msalign
  3. Four text files containing LC-MS features:
    C:\toppic_tutorial\tutorial_1\st_1_ms1.feature
    C:\toppic_tutorial\tutorial_1\st_1_ms2.feature
    C:\toppic_tutorial\tutorial_1\st_2_ms1.feature
    C:\toppic_tutorial\tutorial_1\st_2_ms2.feature
  4. Two XML files containing LC-MS features:
    C:\toppic_tutorial\tutorial_1\st_1_feature.xml
    C:\toppic_tutorial\tutorial_1\st_2_feature.xml
  5. Two folders containing deconvoluted MS/MS spectra in the JavaScript format:
    C:\toppic_tutorial\tutorial_1\st_1_html\topfd
    C:\toppic_tutorial\tutorial_1\st_2_html\topfd

The output files and folders can be downloaded here.
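The msalign files are plain text, so you can inspect them with a short script. The Python sketch below makes the following assumptions about the TopFD msalign layout; check them against your own files before relying on the parser:

  1. Each spectrum is enclosed between BEGIN IONS and END IONS.
  2. Header lines have the form KEY=VALUE, for example SCANS= or PRECURSOR_MASS=.
  3. Every other line inside a block lists one peak as mass, intensity and charge, separated by whitespace.
  4. Lines starting with '#' are comments.

The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    header: dict[str, str] = field(default_factory=dict)
    # Each peak is (mass, intensity, charge).
    peaks: list[tuple[float, float, int]] = field(default_factory=list)


def parse_msalign(lines) -> list[Spectrum]:
    """Parse msalign text (an iterable of lines) into a list of spectra.

    Blank lines, lines starting with '#', and lines outside
    BEGIN IONS / END IONS blocks are ignored.
    """
    spectra: list[Spectrum] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = Spectrum()
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                key, value = line.split("=", 1)
                current.header[key] = value
            else:
                # Extra columns, if any, are ignored.
                mass, intensity, charge = line.split()[:3]
                current.peaks.append((float(mass), float(intensity), int(charge)))
    return spectra


if __name__ == "__main__":
    with open(r"C:\toppic_tutorial\tutorial_1\st_1_ms2.msalign") as handle:
        spectra = parse_msalign(handle)
    print(len(spectra), "spectra")
```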

    4.5 Mass spectral identification using TopPIC

    We use toppic_gui to search the MS/MS spectra in st_1_ms2.msalign and st_2_ms2.msalign against the protein database uniprot-st.fasta to identify PrSMs. Oxidation on methionine is specified as a variable PTM in the file var_mods.txt, which can be downloaded here; save it in the folder C:\toppic_tutorial\tutorial_1\.

    1. Double click the executable file toppic_gui.exe in the folder C:\toppic_tutorial\toppic.
    2. Select C:\toppic_tutorial\tutorial_1\uniprot-st.fasta as the protein database file.
    3. Add C:\toppic_tutorial\tutorial_1\st_1_ms2.msalign and C:\toppic_tutorial\tutorial_1\st_2_ms2.msalign as mass spectrum data files.
    4. Input "combined" as the file name for combined identifications.
    5. Select Carbamidomethylation on cysteine as the fixed modification.
    6. Select file "var_mods.txt" as the variable PTM file.
    7. Check the checkbox Decoy database.
    8. Select FDR as the spectrum level cutoff type.
    9. Select FDR as the proteoform level cutoff type.
    10. Click the "Start" button.

    The screenshots of toppic_gui are shown below.

    For each input msalign file, TopPIC reports four TSV files, two XML files, and collections of HTML files for identified proteoforms. For example, the output files for st_1_ms2.msalign are

    • A TSV file containing identified PrSMs with a 1% spectrum-level FDR. When a proteoform is shared by multiple proteins, all the proteins are reported.
      C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_prsm.tsv
    • A TSV file containing identified PrSMs with a 1% spectrum-level FDR. When a proteoform is shared by multiple proteins, only one protein is reported.
      C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_prsm_single.tsv
    • A TSV file containing identified proteoforms and their best PrSMs with a 1% proteoform-level FDR. When a proteoform is shared by multiple proteins, all the proteins are reported.
      C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_proteoform.tsv
    • A TSV file containing identified proteoforms and their best PrSMs with a 1% proteoform-level FDR. When a proteoform is shared by multiple proteins, only one protein is reported.
      C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_proteoform_single.tsv
    • An XML file containing identified proteoforms and their best PrSMs with a 1% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_proteoform.xml
    • An XML file containing all identified PrSMs without clustering and filtering:
      C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_prsm.xml
    • A folder containing JavaScript files of identified PrSMs with a 1% spectrum-level FDR:
      C:\toppic_tutorial\tutorial_1\st_1_html\toppic_prsm_cutoff
    • A folder containing JavaScript files of identified PrSMs with a 1% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_1\st_1_html\toppic_proteoform_cutoff
    • A folder containing HTML files for the visualization of identified PrSMs:
      C:\toppic_tutorial\tutorial_1\st_1_html\topmsv
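The TSV files can be opened in Excel, but you can also summarize them with a short script. The Python sketch below makes two assumptions that you should verify in your own output:

  1. The PrSM table may be preceded by parameter lines, so the sketch finds the header row by searching for a known column name.
  2. The columns are named Proteoform and E-value. Both names are placeholders passed as arguments.

For each proteoform, the sketch keeps the PrSM with the smallest E-value, in the spirit of the "identified proteoforms and their best PrSMs" files listed above. The function names are hypothetical.

```python
import csv


def read_prsm_table(lines, key_column: str = "Proteoform") -> list[dict[str, str]]:
    """Return TSV rows as dicts, skipping any lines before the header row.

    The header row is assumed to be the first tab-separated line that has
    `key_column` as one of its fields (an assumption about the file layout).
    """
    lines = list(lines)
    for start, line in enumerate(lines):
        if key_column in line.rstrip("\r\n").split("\t"):
            return list(csv.DictReader(lines[start:], delimiter="\t"))
    raise ValueError(f"no header row containing {key_column!r}")


def best_prsm_per_proteoform(rows, key_column="Proteoform", score_column="E-value"):
    """Keep the row with the smallest E-value for each distinct proteoform."""
    best: dict[str, dict[str, str]] = {}
    for row in rows:
        key = row[key_column]
        if key not in best or float(row[score_column]) < float(best[key][score_column]):
            best[key] = row
    return best


if __name__ == "__main__":
    path = r"C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_prsm.tsv"
    with open(path, newline="") as handle:
        rows = read_prsm_table(handle)
    best = best_prsm_per_proteoform(rows)
    print(len(rows), "PrSMs,", len(best), "distinct proteoforms")
```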

    In addition, the identifications reported for st_1_ms2.msalign and st_2_ms2.msalign are combined and filtered by a 1% spectrum-level FDR and a 1% proteoform-level FDR. The combined results are reported in the following files:

    • A TSV file containing combined PrSM identifications with a 1% spectrum-level FDR. When a proteoform is shared by multiple proteins, all the proteins are reported.
      C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_prsm.tsv
    • A TSV file containing combined PrSM identifications with a 1% spectrum-level FDR. When a proteoform is shared by multiple proteins, only one protein is reported.
      C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_prsm_single.tsv
    • A TSV file containing combined proteoform identifications and their best PrSMs with a 1% proteoform-level FDR. When a proteoform is shared by multiple proteins, all the proteins are reported.
      C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_proteoform.tsv
    • A TSV file containing combined proteoform identifications and their best PrSMs with a 1% proteoform-level FDR. When a proteoform is shared by multiple proteins, only one protein is reported.
      C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_proteoform_single.tsv
    • An XML file containing combined proteoform identifications and their best PrSMs with a 1% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_proteoform.xml
    • An XML file containing all identified PrSMs without clustering and filtering:
      C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_prsm.xml
    • A folder containing JavaScript files of combined PrSM identifications with a 1% spectrum-level FDR:
      C:\toppic_tutorial\tutorial_1\combined_html\toppic_prsm_cutoff
    • A folder containing JavaScript files of combined PrSM identifications with a 1% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_1\combined_html\toppic_proteoform_cutoff
    • A folder containing HTML files for the visualization of identified PrSMs:
      C:\toppic_tutorial\tutorial_1\combined_html\topmsv

    In the analysis, carbamidomethylation is selected as the fixed modification because the proteins were reduced with dithiothreitol and alkylated with iodoacetamide before the MS experiment. If the proteins are not reduced and alkylated, no fixed modification should be selected.

    A shuffled decoy database is concatenated to the target database to estimate spectrum-level and proteoform-level FDRs. All identified PrSMs are first filtered by a 1% spectrum-level FDR, and the resulting PrSMs are reported in the file combined_ms2_toppic_prsm.tsv. The proteoforms corresponding to these PrSMs are then filtered by a 1% proteoform-level FDR, and the resulting proteoforms and their best PrSMs are reported in the file combined_ms2_toppic_proteoform.tsv. Both files can be opened in Microsoft Excel. To browse the PrSM identifications, go to the folder combined_html\topmsv and open the file index.html in Google Chrome (Microsoft Edge and Firefox are not recommended).
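The target-decoy idea can be illustrated in a few lines. The Python sketch below uses the common textbook estimate: among all hits at least as good as a threshold, FDR = (number of decoy hits) / (number of target hits). It then converts these estimates to q-values. This is a generic illustration and not necessarily the exact procedure implemented in TopPIC. Each hit is a hypothetical (E-value, is_decoy) pair, and smaller E-values are better.

```python
def q_values(hits: list[tuple[float, bool]]) -> list[float]:
    """Return a q-value for each (e_value, is_decoy) hit, in input order.

    FDR at a threshold = decoys / targets among hits with E-value <= threshold.
    The q-value of a hit is the minimum FDR over all thresholds that include it.
    Hits with equal E-values are not grouped; this is a simplification.
    """
    order = sorted(range(len(hits)), key=lambda i: hits[i][0])
    fdr = [0.0] * len(hits)
    targets = decoys = 0
    for i in order:
        if hits[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = min(1.0, decoys / targets) if targets else 1.0
    # Make q-values monotone: sweep from the worst hit back to the best.
    q = [0.0] * len(hits)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def filter_targets(hits, cutoff=0.01):
    """Keep target hits whose q-value is at most `cutoff` (1% by default)."""
    return [h for h, q in zip(hits, q_values(hits)) if not h[1] and q <= cutoff]


if __name__ == "__main__":
    hits = [(1e-10, False), (1e-9, False), (1e-8, True), (1e-7, False), (1e-6, False)]
    print(q_values(hits))
    print(filter_targets(hits))
```

In this toy example, the decoy at E-value 1e-8 raises the estimated FDR for every hit at or beyond it. As a result, only the two best target hits pass the 1% cutoff.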

    The output files can be downloaded here.

    4.6 Data analysis using the command line interface

    4.6.1 Index file generation

    We use topindex to generate index files from the protein database uniprot-st.fasta to speed up database search of TopPIC and TopMG.

    File locations

    1. Executable file:
      C:\toppic_tutorial\toppic\topindex.exe
    2. Input protein database file:
      C:\toppic_tutorial\tutorial_1\uniprot-st.fasta

    Commands

    cd C:\toppic_tutorial\tutorial_1
    ..\toppic\topindex -f C57 -d uniprot-st.fasta 
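In a scripted analysis, you can skip index generation when the index folder already exists. The Python sketch below does that. It assumes that the index folder is named after the database with an _idx suffix, as in Section 4.2.1 (uniprot-st.fasta_idx). It reuses the flags from the command above: -f C57 for the fixed modification and -d for the decoy database. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def topindex_command(topindex: str, fasta: str) -> list[str]:
    """Arguments matching: topindex -f C57 -d <database>."""
    return [topindex, "-f", "C57", "-d", fasta]


def ensure_index(topindex: str, fasta: str) -> bool:
    """Run topindex only if the folder <fasta>_idx is missing; return True if it ran."""
    index_dir = Path(fasta + "_idx")
    if index_dir.is_dir():
        return False  # index already generated, nothing to do
    # check=True raises CalledProcessError if topindex fails.
    subprocess.run(topindex_command(topindex, fasta), check=True)
    return True


if __name__ == "__main__":
    ensure_index(r"C:\toppic_tutorial\toppic\topindex.exe",
                 r"C:\toppic_tutorial\tutorial_1\uniprot-st.fasta")
```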

    4.6.2 Mass spectral deconvolution

    We use topfd for top-down mass spectral deconvolution.

    File locations

    1. Executable file:
      C:\toppic_tutorial\toppic\topfd.exe
    2. Input mzML files:
      C:\toppic_tutorial\tutorial_1\st_1.mzML
      C:\toppic_tutorial\tutorial_1\st_2.mzML

    Commands

    cd C:\toppic_tutorial\tutorial_1
    ..\toppic\topfd st_*.mzML
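The command above passes the pattern st_*.mzML to topfd. The Windows command prompt does not expand wildcards itself, so the program receives the pattern unchanged. If you drive TopFD from a script, you can expand the pattern explicitly instead, as in the Python sketch below. The sketch assumes that topfd accepts several mzML files on one command line, as in the command above. The helper name is hypothetical.

```python
import subprocess
from pathlib import Path


def topfd_command(topfd: str, folder: str, pattern: str = "st_*.mzML") -> list[str]:
    """Build a topfd command with the wildcard expanded to explicit file names."""
    files = sorted(str(p) for p in Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r} in {folder}")
    return [topfd, *files]


if __name__ == "__main__":
    cmd = topfd_command(r"C:\toppic_tutorial\toppic\topfd.exe",
                        r"C:\toppic_tutorial\tutorial_1")
    subprocess.run(cmd, check=True)
```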

    4.6.3 Mass spectral identification using TopPIC

    We use toppic to search the MS/MS spectra in st_1_ms2.msalign and st_2_ms2.msalign against the protein database uniprot-st.fasta to identify PrSMs.

    File locations

    1. Executable file:
      C:\toppic_tutorial\toppic\toppic.exe
    2. Input protein database file:
      C:\toppic_tutorial\tutorial_1\uniprot-st.fasta
    3. Input MS1 msalign files:
      C:\toppic_tutorial\tutorial_1\st_1_ms1.msalign
      C:\toppic_tutorial\tutorial_1\st_2_ms1.msalign
    4. Input MS/MS msalign files:
      C:\toppic_tutorial\tutorial_1\st_1_ms2.msalign
      C:\toppic_tutorial\tutorial_1\st_2_ms2.msalign
    5. Input feature files:
      C:\toppic_tutorial\tutorial_1\st_1_ms1.feature
      C:\toppic_tutorial\tutorial_1\st_1_ms2.feature
      C:\toppic_tutorial\tutorial_1\st_2_ms1.feature
      C:\toppic_tutorial\tutorial_1\st_2_ms2.feature
    6. Variable PTM file:
      C:\toppic_tutorial\tutorial_1\var_mods.txt

    Commands

    cd C:\toppic_tutorial\tutorial_1
    ..\toppic\toppic -f C57 -d -t FDR -T FDR -b var_mods.txt -c combined uniprot-st.fasta st_*_ms2.msalign
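The options in this command correspond to the GUI choices in Section 4.5:

  1. -f C57: carbamidomethylation on cysteine.
  2. -d: decoy database.
  3. -t FDR: spectrum-level cutoff type.
  4. -T FDR: proteoform-level cutoff type.
  5. -b var_mods.txt: variable PTM file.
  6. -c combined: name of the combined output.

The Python sketch below assembles the same command from named settings, which makes it easy to reuse for other data sets. The function and parameter names are hypothetical. Other TopPIC options are not covered.

```python
import glob
import subprocess
from typing import Optional


def toppic_command(database: str, msalign_files: list[str],
                   fixed_mod: str = "C57", decoy: bool = True,
                   spectrum_cutoff_type: str = "FDR",
                   proteoform_cutoff_type: str = "FDR",
                   var_ptm_file: Optional[str] = None,
                   combined_name: Optional[str] = None,
                   toppic: str = r"..\toppic\toppic") -> list[str]:
    """Build a toppic command line from GUI-style settings (flags as in this tutorial)."""
    cmd = [toppic, "-f", fixed_mod]
    if decoy:
        cmd.append("-d")
    cmd += ["-t", spectrum_cutoff_type, "-T", proteoform_cutoff_type]
    if var_ptm_file:
        cmd += ["-b", var_ptm_file]
    if combined_name:
        cmd += ["-c", combined_name]
    return cmd + [database, *msalign_files]


if __name__ == "__main__":
    # Run from C:\toppic_tutorial\tutorial_1, like the command above.
    cmd = toppic_command("uniprot-st.fasta", sorted(glob.glob("st_*_ms2.msalign")),
                         var_ptm_file="var_mods.txt", combined_name="combined")
    subprocess.run(cmd, check=True)
```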

    5 Tutorial 2

    We will use TopMG to analyze the data set st_1.raw described in Tutorial 1. TopMG is still under development, so please let us know if you find any bugs.

    5.1 Data set and preprocessing

    The description of the data file and its preprocessing steps can be found in Sections 4.1 - 4.4. Click here to download the data files used in the analysis, save it in the folder C:\toppic_tutorial\tutorial_2\, and unzip it. It includes the following files.
    • A S. typhimurium protein database file:
      C:\toppic_tutorial\tutorial_2\uniprot-st.fasta
    • A deconvoluted MS1 data file:
      C:\toppic_tutorial\tutorial_2\st_1_ms1.msalign
    • A deconvoluted MS/MS data file:
      C:\toppic_tutorial\tutorial_2\st_1_ms2.msalign
    • Two MS feature files:
      C:\toppic_tutorial\tutorial_2\st_1_ms1.feature
      C:\toppic_tutorial\tutorial_2\st_1_ms2.feature
    • A text file containing a variable PTM: oxidation on methionine.
      C:\toppic_tutorial\tutorial_2\var_mods.txt
    • A folder containing deconvoluted MS/MS spectra in the JavaScript format.
      C:\toppic_tutorial\tutorial_2\st_1_html\topfd

    5.2 Index file generation

    To speed up database search, follow the steps in Section 4.2.1 to generate index files for the database file uniprot-st.fasta. If you already generated the index files in Tutorial 1, you do not need to regenerate them. Instead, copy the index folder C:\toppic_tutorial\tutorial_1\uniprot-st.fasta_idx to the folder C:\toppic_tutorial\tutorial_2\.
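You can also script the copy of the index folder. The Python sketch below uses shutil.copytree. The source folder name uniprot-st.fasta_idx comes from Section 4.2.1. The helper name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_index(src_dir: str, dst_parent: str) -> Path:
    """Copy an index folder (e.g. uniprot-st.fasta_idx) into dst_parent.

    Returns the destination path. If the destination already exists,
    it is left unchanged and nothing is copied.
    """
    src = Path(src_dir)
    dst = Path(dst_parent) / src.name
    if not dst.exists():
        shutil.copytree(src, dst)
    return dst


if __name__ == "__main__":
    copy_index(r"C:\toppic_tutorial\tutorial_1\uniprot-st.fasta_idx",
               r"C:\toppic_tutorial\tutorial_2")
```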

    5.3 Proteoform identification by TopMG

    1. Double click the executable file topmg_gui.exe in the folder C:\toppic_tutorial\toppic.
    2. Select C:\toppic_tutorial\tutorial_2\uniprot-st.fasta as the protein database file.
    3. Add C:\toppic_tutorial\tutorial_2\st_1_ms2.msalign as a mass spectrum data file.
    4. Select C:\toppic_tutorial\tutorial_2\var_mods.txt as the file of variable PTMs.
    5. Select Carbamidomethylation on cysteine as the fixed modification.
    6. Check the checkbox Decoy database.
    7. Select FDR as the spectrum level cutoff type.
    8. Set the spectrum level FDR cutoff to 0.05.
    9. Select FDR as the proteoform level cutoff type.
    10. Set the proteoform level FDR cutoff to 0.05.
    11. Click the "Start" button.

    The screenshots of topmg_gui are shown below.

    TopMG reports four TSV files, two XML files, and collections of HTML files for identified proteoforms.

    • A TSV file containing identified PrSMs with a 5% spectrum-level FDR. When a proteoform is shared by multiple proteins, all the proteins are reported.
      C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_prsm.tsv
    • A TSV file containing identified PrSMs with a 5% spectrum-level FDR. When a proteoform is shared by multiple proteins, only one protein is reported.
      C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_prsm_single.tsv
    • A TSV file containing identified proteoforms and their best PrSMs with a 5% proteoform-level FDR. When a proteoform is shared by multiple proteins, all the proteins are reported.
      C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_proteoform.tsv
    • A TSV file containing identified proteoforms and their best PrSMs with a 5% proteoform-level FDR. When a proteoform is shared by multiple proteins, only one protein is reported.
      C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_proteoform_single.tsv
    • An XML file containing identified proteoforms and their best PrSMs with a 5% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_proteoform.xml
    • An XML file containing all identified PrSMs without clustering and filtering:
      C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_prsm.xml
    • A folder containing JavaScript files of identified PrSMs with a 5% spectrum-level FDR:
      C:\toppic_tutorial\tutorial_2\st_1_html\topmg_prsm_cutoff
    • A folder containing JavaScript files of identified PrSMs with a 5% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_2\st_1_html\topmg_proteoform_cutoff
    • A folder containing HTML files for the visualization of identified PrSMs:
      C:\toppic_tutorial\tutorial_2\st_1_html\topmsv

    The output files can be downloaded here.

    To browse the PrSM identifications, go to the folder st_1_html\topmsv and open the file index.html in Google Chrome (Microsoft Edge and Firefox are not recommended).

    5.4 Data analysis using the command line interface

    File locations

    1. Executable file:
      C:\toppic_tutorial\toppic\topmg.exe
    2. Input protein database file:
      C:\toppic_tutorial\tutorial_2\uniprot-st.fasta
    3. Input MS1 msalign file:
      C:\toppic_tutorial\tutorial_2\st_1_ms1.msalign
    4. Input MS/MS msalign file:
      C:\toppic_tutorial\tutorial_2\st_1_ms2.msalign
    5. MS feature files:
      C:\toppic_tutorial\tutorial_2\st_1_ms1.feature
      C:\toppic_tutorial\tutorial_2\st_1_ms2.feature
    6. Variable PTM list:
      C:\toppic_tutorial\tutorial_2\var_mods.txt

    Commands

    cd C:\toppic_tutorial\tutorial_2 
    ..\toppic\topindex -f C57 -d uniprot-st.fasta 
    ..\toppic\topmg -f C57 -d -t FDR -v 0.05 -T FDR -V 0.05 -i var_mods.txt uniprot-st.fasta st_1_ms2.msalign

    6 Tutorial 3

    We will use TopPIC and TopDiff to compare the abundance of proteoforms and find differentially expressed proteoforms using two MS data files of Escherichia coli cells (ecoli_1.raw and ecoli_2.raw).

    In the MS experiment, the protein extract of E. coli was reduced with dithiothreitol and alkylated with iodoacetamide. The protein mixture was separated by capillary zone electrophoresis and analyzed by an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific). Two technical replicate runs of the same sample were generated to test proteoform quantification.

    6.1 Data set and preprocessing

    The raw data files were processed following the steps found in Sections 4.1 - 4.4. Click here to download the data files used in the analysis, save it in the folder C:\toppic_tutorial\tutorial_3\, and unzip it. It includes the following files.

    • An E. coli protein database file:
      C:\toppic_tutorial\tutorial_3\uniprot-ecoli.fasta
    • Two mzML files:
      C:\toppic_tutorial\tutorial_3\ecoli_1.mzML
      C:\toppic_tutorial\tutorial_3\ecoli_2.mzML
    • Two deconvoluted MS1 data files:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms1.msalign
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms1.msalign
    • Two deconvoluted MS/MS data files:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign
    • Four deconvoluted LC-MS feature files:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms1.feature
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms1.feature
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.feature
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.feature
    • Two XML files containing LC-MS features:
      C:\toppic_tutorial\tutorial_3\ecoli_1_feature.xml
      C:\toppic_tutorial\tutorial_3\ecoli_2_feature.xml
    • Two folders containing deconvoluted MS/MS spectra in the JavaScript format.
      C:\toppic_tutorial\tutorial_3\ecoli_1_html\topfd
      C:\toppic_tutorial\tutorial_3\ecoli_2_html\topfd

    6.2 Index file generation

    To speed up database search, follow the steps in Section 4.2.1 to generate index files for the database file uniprot-ecoli.fasta. If index files for this database already exist, you do not need to regenerate them.

    6.3 Mass spectral identification using TopPIC

    We use toppic_gui to search the MS/MS spectra in ecoli_1_ms2.msalign and ecoli_2_ms2.msalign against the protein database uniprot-ecoli.fasta to identify PrSMs.

    1. Double click the executable file toppic_gui.exe in the folder C:\toppic_tutorial\toppic.
    2. Select C:\toppic_tutorial\tutorial_3\uniprot-ecoli.fasta as the protein database file.
    3. Add C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign and C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign as mass spectrum data files.
    4. Select Carbamidomethylation on cysteine as the fixed modification.
    5. Check the checkbox Decoy database.
    6. Select FDR as the spectrum level cutoff type.
    7. Select FDR as the proteoform level cutoff type.
    8. Click the "Start" button.

    The screenshots of toppic_gui are shown below.

    For each input msalign file, TopPIC reports four TSV files, two XML files, and collections of HTML files for identified proteoforms. The output files for ecoli_1_ms2.msalign and ecoli_2_ms2.msalign are

    • Four TSV files containing identified PrSMs with a 1% spectrum-level FDR:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_prsm.tsv
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_prsm.tsv
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_prsm_single.tsv
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_prsm_single.tsv
    • Four TSV files containing identified proteoforms and their best PrSMs with a 1% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_proteoform.tsv
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_proteoform.tsv
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_proteoform_single.tsv
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_proteoform_single.tsv
    • Two XML files containing identified proteoforms and their best PrSMs with a 1% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_proteoform.xml
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_proteoform.xml
    • Two XML files containing all identified PrSMs without clustering and filtering:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_prsm.xml
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_prsm.xml
    • Two folders containing JavaScript files of identified PrSMs with a 1% spectrum-level FDR:
      C:\toppic_tutorial\tutorial_3\ecoli_1_html\toppic_prsm_cutoff
      C:\toppic_tutorial\tutorial_3\ecoli_2_html\toppic_prsm_cutoff
    • Two folders containing JavaScript files of identified PrSMs with a 1% proteoform-level FDR:
      C:\toppic_tutorial\tutorial_3\ecoli_1_html\toppic_proteoform_cutoff
      C:\toppic_tutorial\tutorial_3\ecoli_2_html\toppic_proteoform_cutoff
    • Two folders containing HTML files for the visualization of identified PrSMs:
      C:\toppic_tutorial\tutorial_3\ecoli_1_html\topmsv
      C:\toppic_tutorial\tutorial_3\ecoli_2_html\topmsv

    The output files can be downloaded here.

    6.4 Proteoform abundance comparison by TopDiff

    1. Double click the executable file topdiff_gui.exe in the folder C:\toppic_tutorial\toppic.
    2. Add C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign and C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign as mass spectrum data files.
    3. Click the "Start" button.

    The screenshots of topdiff_gui are shown below.

    TopDiff reports one TSV file containing the identified proteoforms and their abundances in the input mass spectrum data files:

    C:\toppic_tutorial\tutorial_3\sample_diff.tsv
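A natural first step in looking for differentially abundant proteoforms in sample_diff.tsv is the log2 ratio of each proteoform's two abundances. The Python sketch below computes it. The column names are assumptions made for illustration: Proteoform, plus one abundance column per input file (here the hypothetical ecoli_1_abundance and ecoli_2_abundance). Check the header of your sample_diff.tsv and adjust them. A pseudocount avoids division by zero when a proteoform is missing from one run.

```python
import csv
import math


def log2_ratios(rows, a_col: str, b_col: str, id_col: str = "Proteoform",
                pseudocount: float = 1.0) -> dict[str, float]:
    """Return {proteoform: log2((a + pseudocount) / (b + pseudocount))}.

    Empty abundance fields are treated as 0 (an assumption about how missing
    values appear in the file).
    """
    ratios = {}
    for row in rows:
        a = float(row[a_col] or 0)
        b = float(row[b_col] or 0)
        ratios[row[id_col]] = math.log2((a + pseudocount) / (b + pseudocount))
    return ratios


if __name__ == "__main__":
    with open(r"C:\toppic_tutorial\tutorial_3\sample_diff.tsv", newline="") as handle:
        rows = csv.DictReader(handle, delimiter="\t")
        # Hypothetical column names; replace them with those in your file.
        ratios = log2_ratios(rows, "ecoli_1_abundance", "ecoli_2_abundance")
    # Proteoforms with at least a two-fold change between the runs.
    changed = {k: v for k, v in ratios.items() if abs(v) >= 1}
    print(len(ratios), "proteoforms,", len(changed), "with |log2 ratio| >= 1")
```

Because the two E. coli runs are technical replicates, most ratios should be expected to lie near zero.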

    The output file can be downloaded here.

    6.5 Data analysis using the command line interface

    6.5.1 Mass spectral identification by TopPIC

    File locations

    1. Executable file:
      C:\toppic_tutorial\toppic\toppic.exe
    2. Input protein database file:
      C:\toppic_tutorial\tutorial_3\uniprot-ecoli.fasta
    3. MS1 msalign files:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms1.msalign
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms1.msalign
    4. MS/MS msalign files:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign
    5. LC-MS feature files:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms1.feature
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms1.feature
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.feature
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.feature

    Commands

    cd C:\toppic_tutorial\tutorial_3 
    ..\toppic\topindex -f C57 -d uniprot-ecoli.fasta 
    ..\toppic\toppic -f C57 -d -t FDR -T FDR uniprot-ecoli.fasta ecoli_*_ms2.msalign

    6.5.2 Comparing proteoform abundances using TopDiff

    File locations

    1. Executable file:
      C:\toppic_tutorial\toppic\topdiff.exe
    2. MS/MS msalign files:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign
    3. Proteoform identification files:
      C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_proteoform.xml
      C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_proteoform.xml

    Commands

    cd C:\toppic_tutorial\tutorial_3 
    ..\toppic\topdiff ecoli_1_ms2.msalign ecoli_2_ms2.msalign
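TopDiff reads the TopPIC proteoform XML files listed above in addition to the msalign files, so TopPIC must finish first. The Python sketch below derives the expected XML name from each msalign file, following the naming pattern used in this tutorial (ecoli_1_ms2.msalign gives ecoli_1_ms2_toppic_proteoform.xml). It runs topdiff only if all input files exist. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def proteoform_xml_for(msalign: str) -> Path:
    """ecoli_1_ms2.msalign -> ecoli_1_ms2_toppic_proteoform.xml (same folder)."""
    path = Path(msalign)
    return path.with_name(path.stem + "_toppic_proteoform.xml")


def missing_inputs(msalign_files: list[str]) -> list[Path]:
    """Return the msalign or proteoform XML files that do not exist yet."""
    missing = []
    for name in msalign_files:
        for path in (Path(name), proteoform_xml_for(name)):
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Run from C:\toppic_tutorial\tutorial_3, like the command above.
    inputs = ["ecoli_1_ms2.msalign", "ecoli_2_ms2.msalign"]
    absent = missing_inputs(inputs)
    if absent:
        raise SystemExit(f"run TopPIC first; missing: {absent}")
    subprocess.run([r"..\toppic\topdiff", *inputs], check=True)
```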