{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T18:27:54Z","timestamp":1772735274152,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1012258","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T00:00:00Z","timestamp":1721174400000}}],"reference-count":28,"publisher":"Public Library of Science (PLoS)","issue":"7","license":[{"start":{"date-parts":[[2024,7,5]],"date-time":"2024-07-05T00:00:00Z","timestamp":1720137600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Computational Science Engineering and Mathematics graduate program fellowship"},{"name":"Erisyon, Inc."},{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R35GM122480"],"award-info":[{"award-number":["R35GM122480"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100009633","name":"Eunice Kennedy Shriver National Institute of Child Health and Human Development","doi-asserted-by":"publisher","award":["HD085901"],"award-info":[{"award-number":["HD085901"]}],"id":[{"id":"10.13039\/100009633","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000928","name":"Welch Foundation","doi-asserted-by":"publisher","award":["F-1515"],"award-info":[{"award-number":["F-1515"]}],"id":[{"id":"10.13039\/100000928","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. 
Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. A Hidden Markov Model (HMM)-based approach extends whatprot, where we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique to estimate transition probabilities for an HMM using expectation maximization, but modified here to estimate a small number of parameter values directly rather than estimating every transition probability independently. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second independent implementation using a hybrid of DIRECT and Powell\u2019s method to reduce the root mean squared error (RMSE) between simulations and the real dataset was also developed. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell\u2019s method by most, but not all, criteria. 
Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1012258","type":"journal-article","created":{"date-parts":[[2024,7,5]],"date-time":"2024-07-05T17:47:46Z","timestamp":1720201666000},"page":"e1012258","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":5,"title":["Estimating error rates for single molecule protein sequencing experiments"],"prefix":"10.1371","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3145-1782","authenticated-orcid":true,"given":"Matthew Beauregard","family":"Smith","sequence":"first","affiliation":[]},{"given":"Kent","family":"VanderVelden","sequence":"additional","affiliation":[]},{"given":"Thomas","family":"Blom","sequence":"additional","affiliation":[]},{"given":"Heather D.","family":"Stout","sequence":"additional","affiliation":[]},{"given":"James H.","family":"Mapes","sequence":"additional","affiliation":[]},{"given":"Tucker M.","family":"Folsom","sequence":"additional","affiliation":[]},{"given":"Christopher","family":"Martin","sequence":"additional","affiliation":[]},{"given":"Angela M.","family":"Bardo","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8808-180X","authenticated-orcid":true,"given":"Edward M.","family":"Marcotte","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2024,7,5]]},"reference":[{"key":"pcbi.1012258.ref001","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1146\/annurev-biophys-102121-103615","article-title":"Protein Sequencing, One Molecule at a Time.","volume":"51","author":"BM Floyd","year":"2022","journal-title":"Annu Rev Biophys"},{"key":"pcbi.1012258.ref002","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/s41592-021-01143-1","article-title":"The 
emerging landscape of single-molecule protein sequencing technologies.","volume":"18","author":"JA Alfaro","year":"2021","journal-title":"Nat Methods"},{"key":"pcbi.1012258.ref003","doi-asserted-by":"crossref","first-page":"786","DOI":"10.1038\/s41565-018-0236-6","article-title":"Paving the way to single-molecule protein sequencing.","volume":"13","author":"L Restrepo-P\u00e9rez","year":"2018","journal-title":"Nat Nanotechnol."},{"key":"pcbi.1012258.ref004","doi-asserted-by":"crossref","first-page":"7261","DOI":"10.1007\/s00253-020-10745-2","article-title":"Leveraging nature\u2019s biomolecular designs in next-generation protein sequencing reagent development","volume":"104","author":"J Tullman","year":"2020","journal-title":"Appl Microbiol Biotechnol"},{"key":"pcbi.1012258.ref005","doi-asserted-by":"crossref","first-page":"730","DOI":"10.1021\/acsphotonics.1c01825","article-title":"Label-Free Optical Analysis of Biomolecules in Solid-State Nanopores: Toward Single-Molecule Protein Sequencing.","volume":"9","author":"Y Zhao","year":"2022","journal-title":"ACS Photonics."},{"key":"pcbi.1012258.ref006","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1186\/s12864-015-1519-z","article-title":"Genome assembly using Nanopore-guided long and error-free DNA reads","volume":"16","author":"M-A Madoui","year":"2015","journal-title":"BMC Genomics"},{"key":"pcbi.1012258.ref007","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1186\/2047-217X-3-22","article-title":"A reference bacterial genome dataset generated on the MinION portable single-molecule nanopore sequencer","volume":"3","author":"J Quick","year":"2014","journal-title":"GigaScience"},{"key":"pcbi.1012258.ref008","doi-asserted-by":"crossref","first-page":"1097","DOI":"10.1111\/1755-0998.12324","article-title":"A first look at the Oxford Nanopore MinION sequencer","volume":"14","author":"AS Mikheyev","year":"2014","journal-title":"Mol Ecol 
Resour"},{"key":"pcbi.1012258.ref009","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1038\/nbt.3103","article-title":"MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island","volume":"33","author":"PM Ashton","year":"2015","journal-title":"Nat Biotechnol"},{"key":"pcbi.1012258.ref010","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1186\/s13059-018-1462-9","article-title":"From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy","volume":"19","author":"FJ Rang","year":"2018","journal-title":"Genome Biol"},{"key":"pcbi.1012258.ref011","doi-asserted-by":"crossref","first-page":"2092","DOI":"10.1101\/gr.277031.122","article-title":"Sequencing Illumina libraries at high accuracy on the ONT MinION using R2C2","volume":"32","author":"A Zee","year":"2022","journal-title":"Genome Res"},{"key":"pcbi.1012258.ref012","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1186\/s12859-023-05226-y","article-title":"Complete sequence verification of plasmid DNA using the Oxford Nanopore Technologies\u2019 MinION device","volume":"24","author":"SD Brown","year":"2023","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1012258.ref013","article-title":"Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures","author":"J Swaminathan","year":"2018","journal-title":"Nat Biotechnol"},{"key":"pcbi.1012258.ref014","doi-asserted-by":"crossref","first-page":"e1004080","DOI":"10.1371\/journal.pcbi.1004080","article-title":"A theoretical justification for single molecule peptide sequencing.","volume":"11","author":"J Swaminathan","year":"2015","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1012258.ref015","doi-asserted-by":"crossref","first-page":"283","DOI":"10.3891\/acta.chem.scand.04-0283","article-title":"Method for Determination of the Amino Acid Sequence in Peptides","volume":"4","author":"P Edman","year":"1950","journal-title":"Acta 
Chem Scand"},{"key":"pcbi.1012258.ref016","doi-asserted-by":"crossref","first-page":"e1011157","DOI":"10.1371\/journal.pcbi.1011157","article-title":"Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier.","volume":"19","author":"MB Smith","year":"2023","journal-title":"PLOS Comput Biol"},{"key":"pcbi.1012258.ref017","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1214\/aoms\/1177697196","article-title":"A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.","volume":"41","author":"LE Baum","year":"1970","journal-title":"Ann Math Stat."},{"key":"pcbi.1012258.ref018","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1007\/BF00941892","article-title":"Lipschitzian optimization without the Lipschitz constant","volume":"79","author":"DR Jones","year":"1993","journal-title":"J Optim Theory Appl"},{"key":"pcbi.1012258.ref019","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1007\/s10898-020-00952-6","article-title":"The DIRECT algorithm: 25 years Later.","volume":"79","author":"DR Jones","year":"2021","journal-title":"J Glob Optim."},{"key":"pcbi.1012258.ref020","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1093\/comjnl\/7.2.155","article-title":"An efficient method for finding the minimum of a function of several variables without calculating derivatives.","volume":"7","author":"MJD Powell","year":"1964","journal-title":"Comput J"},{"key":"pcbi.1012258.ref021","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum Likelihood from Incomplete Data via the EM Algorithm.","volume":"39","author":"AP Dempster","year":"1977","journal-title":"J R Stat Soc Ser B Methodol."},{"key":"pcbi.1012258.ref022","doi-asserted-by":"crossref","first-page":"2595","DOI":"10.1021\/acschembio.1c00631","article-title":"Photoredox-Catalyzed Decarboxylative C-Terminal Differentiation for Bulk- and 
Single-Molecule Proteomics","volume":"16","author":"L Zhang","year":"2021","journal-title":"ACS Chem Biol"},{"key":"pcbi.1012258.ref023","doi-asserted-by":"crossref","first-page":"14856","DOI":"10.1021\/acs.langmuir.1c02644","article-title":"Studies of Surface Preparation for the Fluorosequencing of Peptides","volume":"37","author":"CM Hinson","year":"2021","journal-title":"Langmuir"},{"key":"pcbi.1012258.ref024","article-title":"Robust and scalable single-molecule protein sequencing with fluorosequencing.","author":"J Mapes","year":"2023","journal-title":"bioRxiv"},{"key":"pcbi.1012258.ref025","doi-asserted-by":"crossref","first-page":"18871","DOI":"10.1016\/S0021-9258(17)30595-1","article-title":"Amino acid sequences of two proline-rich bactenecins. Antimicrobial peptides of bovine neutrophils","volume":"265","author":"RW Frank","year":"1990","journal-title":"J Biol Chem"},{"key":"pcbi.1012258.ref026","doi-asserted-by":"crossref","first-page":"4403","DOI":"10.1073\/pnas.071047998","article-title":"Attomole level protein sequencing by Edman degradation coupled with accelerator mass spectrometry","volume":"98","author":"M Miyashita","year":"2001","journal-title":"Proc Natl Acad Sci"},{"key":"pcbi.1012258.ref027","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1002\/pro.2633","article-title":"Computer-aided design of a catalyst for Edman degradation utilizing substrate-assisted catalysis","volume":"24","author":"B Borgo","year":"2015","journal-title":"Protein Sci Publ Protein Soc"},{"key":"pcbi.1012258.ref028","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/0003-2697(84)90359-2","article-title":"Use of o-phthalaldehyde to reduce background during automated Edman degradation","volume":"137","author":"AW Brauer","year":"1984","journal-title":"Anal Biochem"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1012258","type":"new_version","label":"New 
version","source":"publisher","updated":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T00:00:00Z","timestamp":1721174400000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1012258","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T17:38:00Z","timestamp":1721237880000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1012258"}},"subtitle":[],"editor":[{"given":"Alexandre V.","family":"Morozov","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,7,5]]},"references-count":28,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2024,7,5]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1012258","relation":{"new_version":[{"id-type":"doi","id":"10.1371\/journal.pcbi.1012258","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,5]]}}}