{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T02:32:39Z","timestamp":1769049159487,"version":"3.49.0"},"reference-count":43,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T00:00:00Z","timestamp":1683763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"ASCEPI project"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>The Jewish community archive in Pisa owns a vast collection of documents and manuscripts that date back centuries. These documents contain valuable genealogical information, including birth, marriage, and death records. This paper aims to describe the preliminary results of the Archivio Storico della Comunita Ebraica di Pisa (ASCEPI) project, with a focus on the extraction of data from the Nati, Morti e Ballottati (NMB) Registry document in the archive. The NMB Registry contains about 1900 records of births, deaths, and balloted individuals within the Jewish community in Pisa. The study uses a semiautomatic pipeline of digitization, transcription, and Natural Language Processing (NLP) techniques to extract personal data such as names, surnames, birth and death dates, and parental names from each record. The extracted data are then used to build a knowledge base and a genealogical tree for a representative family, Supino. This study demonstrates the potential of using NLP and rule-based techniques to extract valuable information from historical documents and to construct genealogical trees.<\/jats:p>","DOI":"10.3390\/informatics10020042","type":"journal-article","created":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T01:30:29Z","timestamp":1683855029000},"page":"42","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Genealogical Data Mining from Historical Archives: The Case of the Jewish Community in Pisa"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5252-6966","authenticated-orcid":false,"given":"Angelica","family":"Lo Duca","sequence":"first","affiliation":[{"name":"Institute of Informatics and Telematics, National Research Council, 56124 Pisa, Italy"}]},{"given":"Andrea","family":"Marchetti","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telematics, National Research Council, 56124 Pisa, Italy"}]},{"given":"Manuela","family":"Moretti","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telematics, National Research Council, 56124 Pisa, Italy"}]},{"given":"Francesca","family":"Diana","sequence":"additional","affiliation":[{"name":"Department of Civilisation and Forms of Knowledge, University of Pisa, 56100 Pisa, Italy"}]},{"given":"Mafalda","family":"Toniazzi","sequence":"additional","affiliation":[{"name":"Department of Civilisation and Forms of Knowledge, University of Pisa, 56100 Pisa, Italy"}]},{"given":"Andrea","family":"D\u2019Errico","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telematics, National Research Council, 56124 Pisa, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2023,5,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1177\/1473871615621592","article-title":"Visualizing genealogy through a family-centric perspective","volume":"16","author":"Ball","year":"2017","journal-title":"Inf. Vis."},{"key":"ref_2","unstructured":"(2023, March 16). The ASCEPI Project. Available online: http:\/\/ascepi.iit.cnr.it\/."},{"key":"ref_3","unstructured":"Colorni, V. (1967). Itinerarium, Bologna, Italy, (anastatic reprint)."},{"key":"ref_4","unstructured":"Lonardo, P.M. (1982). Gli Ebrei a Pisa, Forni. Doc. VII."},{"key":"ref_5","unstructured":"Luzzati, M. (1985). La Casa dell\u2019Ebreo Saggi sugli Ebrei a Pisa e in Toscana nel Medioevo e nel Rinascimento, Nistri Lischi."},{"key":"ref_6","unstructured":"Toaff, T. (1990). La Nazione Ebrea a Livorno e a Pisa (1591\u20131700), Olschki."},{"key":"ref_7","first-page":"3","article-title":"L\u2019insediamento ebraico nella Pisa del \u2018600","volume":"24","year":"1987","journal-title":"Crit. Stor."},{"key":"ref_8","unstructured":"Salvadori, R. (1995). Breve Storia Degli Ebrei Toscani IX\u2013XX Secolo, Le Lettere."},{"key":"ref_9","unstructured":"Luzzati, M. (1998). Gli Ebrei di Pisa (Secoli IX\u2013XX), Pacini Editore."},{"key":"ref_10","unstructured":"Mortara, M. (1886). Indice Alfabetico dei Rabbini e Scrittori Israeliti, Sacchetto."},{"key":"ref_11","unstructured":"Amram, D. (1963). The Makers of Hebrew Books in Italy, London, The Holland Press Limited."},{"key":"ref_12","unstructured":"Grassi, R. (2003). Archivi&Computer: Automazione e Beni Culturali, Carocci."},{"key":"ref_13","unstructured":"(2023, March 16). The SpaCy library. Available online: https:\/\/spacy.io\/."},{"key":"ref_14","unstructured":"(2023, March 16). The iPages Flipbook Plugin. Available online: https:\/\/wordpress.org\/plugins\/ipages-flipbook\/."},{"key":"ref_15","unstructured":"Jayanthi, N., Indu, S., Hasija, S., and Tripathi, P. (2017). Advances in Computing and Data Sciences: First International Conference, ICACDS 2016, Ghaziabad, India, 11\u201312 November 2016, Springer. Revised Selected Papers 1."},{"key":"ref_16","unstructured":"Abrate, M., Del Grosso, A.M., Giovannetti, E., Duca, A.L., Luzzi, D., Mancini, L., Marchetti, A., Pedretti, I., and Piccini, S. (2014, January 26\u201331). Sharing Cultural Heritage: The Clavius on the Web Project. Proceedings of the LREC 2014, Ninth International Conference on Language Resources and Evaluation, Reykjavik, Iceland."},{"key":"ref_17","unstructured":"(2023, April 12). The New York Public Library. Available online: https:\/\/www.nypl.org\/collections\/nypl-recommendations\/guides\/goodspeed-manuscript-collection."},{"key":"ref_18","unstructured":"(2023, April 12). The Schoenberg Database of Manuscripts. Available online: https:\/\/sdbm.library.upenn.edu\/."},{"key":"ref_19","unstructured":"Sicuro, M. (2022). Una Piccola Comunit\u00e0 Ebraica al Confine Orientale Veneto-Asburgico in et\u00e0 Moderna: Ontagnano (1577\u20131797), EUT Edizioni Universit\u00e0 di Trieste."},{"key":"ref_20","unstructured":"(2023, March 13). The Friedberg Jewish Manuscript Society Project. Available online: https:\/\/fjms.genizah.org\/."},{"key":"ref_21","unstructured":"(2023, March 13). The Hebrew Manuscripts Digitisation Project. Available online: https:\/\/www.bl.uk\/hebrew-manuscripts."},{"key":"ref_22","unstructured":"(2023, March 13). The Cairo Genizah Collection. Available online: https:\/\/cudl.lib.cam.ac.uk\/collections\/genizah\/1."},{"key":"ref_23","unstructured":"(2023, March 13). The Judaica Collection. Available online: https:\/\/www.nli.org.il\/en\/at-your-service\/who-we-are\/collections\/judaism-collection."},{"key":"ref_24","unstructured":"(2023, March 14). The Jewish Atlantic World Project. Available online: https:\/\/rdc.reed.edu\/c\/jewishatl\/home\/."},{"key":"ref_25","unstructured":"Ehrmann, M., Hamdi, A., Pontes, E.L., Romanello, M., and Doucet, A. (2021). Named entity recognition and classification on historical documents: A survey. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Trias, F., Wang, H., Jaume, S., and Idreos, S. (2021, January 7\u201311). Named entity recognition in historic legal text: A transformer and state machine ensemble method. Proceedings of the Natural Legal Language Processing Workshop 2021, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.nllp-1.18"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Belhi, A., Bouras, A., Al-Ali, A.K., and Sadka, A.H. (2021). Data Analytics for Cultural Heritage, Springer.","DOI":"10.1007\/978-3-030-66777-1"},{"key":"ref_28","first-page":"262","article-title":"Exploring entity recognition and disambiguation for cultural heritage collections","volume":"30","author":"Verborgh","year":"2013","journal-title":"Digit. Sch. Humanit."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Erdmann, A., Wrisley, D.J., Allen, B., Brown, C., Cohen-Bod\u00e9n\u00e8s, S., Elsner, M., Feng, Y., Joseph, B., Joyeux-Prunel, B., and De Marneffe, M.-C. (2019, January 3\u20135). Practical, efficient, and customizable active learning for named entity recognition in the digital humanities. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA.","DOI":"10.18653\/v1\/N19-1231"},{"key":"ref_30","unstructured":"Pontes, E.L., Cabrera-Diego, L.A., Moreno, J.G., Boros, E., Hamdi, A., Sid\u00e8re, N., Coustaty, M., and Doucet, A. (2020). Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, 30 November\u20131 December 2020, Springer International Publishing."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Manjavacas, E., and Fonteyn, L. (2022). Adapting vs. Pre-training Language Models for Historical Languages. J. Data Min. Digit. Humanit., 1\u201319.","DOI":"10.46298\/jdmdh.9152"},{"key":"ref_32","unstructured":"Ehrmann, M., Romanello, M., Najem-Meyer, S., Doucet, A., Clematide, S., Faggioli, G., and Potthast, M. (2022). CEUR Workshop Proceedings (No. 3180), CEUR-WS."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1073","DOI":"10.1109\/TVCG.2010.159","article-title":"Geneaquilts: A system for exploring large genealogies","volume":"16","author":"Bezerianos","year":"2010","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_34","unstructured":"Gonzalez, J., Nguyen, N.V., and Dang, T. (2022). VisFCAC: An Interactive Family Clinical Attribute Comparison. arXiv."},{"key":"ref_35","unstructured":"Xiang, F., Zhu, S., Wang, Z., Maher, K., Liu, Y., Zhu, Y., Chen, K., and Liang, Z. (2020). ACM SIGGRAPH 2020 Art Gallery, Association for Computing Machinery."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1109\/THMS.2017.2693236","article-title":"GenealogyVis: A System for Visual Analysis of Multidimensional Genealogical Data","volume":"47","author":"Liu","year":"2017","journal-title":"IEEE Trans. Hum.-Mach. Syst."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Korst, J., Pronk, V., and van Wijk, J.J. (2020, January 8\u201310). A visualization of family relations inspired by the london metro map. Proceedings of the 13th International Symposium on Visual Information Communication and Interaction, Eindhoven, The Netherlands.","DOI":"10.1145\/3430036.3430065"},{"key":"ref_38","unstructured":"Mansueli, V.A.P., and Okano, M.T. (2018, January 22\u201326). Representations of genealogies in graph theory: K-Graphs. Proceedings of the 27th International of Association for Management Technology, Birmingham, UK."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Folkman, T., Furner, R., and Pearson, D. (2018, January 17\u201320). GenERes: A genealogical entity resolution system. Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore.","DOI":"10.1109\/ICDMW.2018.00079"},{"key":"ref_40","unstructured":"Leskinen, P., and Hyv\u00f6nen, E. (2021). The Semantic Web\u2013ISWC 2021: 20th International Semantic Web Conference, ISWC 2021, Virtual Event, October 24\u201328, 2021, Proceedings 20, Springer International Publishing."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, M., Feng, J., Shu, X., Jie, Z., and Tang, J. (2018, January 22\u201326). Photo to family tree: Deep kinship understanding for nuclear family photos. Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and first Multi-Modal Affective Computing of Large-Scale Multimedia Data, Seoul, Republic of Korea.","DOI":"10.1145\/3267935.3267936"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"2380","DOI":"10.1080\/13658816.2020.1821885","article-title":"Connecting family trees to construct a population-scale and longitudinal geo-social network for the U.S","volume":"35","author":"Koylu","year":"2020","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_43","unstructured":"Lo Duca, A., Bacciu, C., and Marchetti, A. (2019). Ancient Greek Art and European Funerary Art, Cambridge Scholars Publishing."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/10\/2\/42\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:33:12Z","timestamp":1760124792000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/10\/2\/42"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,11]]},"references-count":43,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,6]]}},"alternative-id":["informatics10020042"],"URL":"https:\/\/doi.org\/10.3390\/informatics10020042","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,11]]}}}