{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,21]],"date-time":"2025-12-21T10:03:47Z","timestamp":1766311427808,"version":"3.37.3"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2019,10,17]],"date-time":"2019-10-17T00:00:00Z","timestamp":1571270400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,10,17]],"date-time":"2019-10-17T00:00:00Z","timestamp":1571270400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000181","name":"Air Force Office of Scientific Research","doi-asserted-by":"publisher","award":["FA9550-15-1-007"],"award-info":[{"award-number":["FA9550-15-1-007"]}],"id":[{"id":"10.13039\/100000181","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000925","name":"John Templeton Foundation","doi-asserted-by":"publisher","award":["61066"],"award-info":[{"award-number":["61066"]}],"id":[{"id":"10.13039\/100000925","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["N66001-16-1-4067"],"award-info":[{"award-number":["N66001-16-1-4067"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["EPJ Data Sci."],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Reading remains a preferred leisure activity fueling an exceptionally competitive publishing market: among more than three million books published each year, only a tiny fraction are read widely. It is largely unpredictable, however, which book that will be, and how many copies it will sell. 
Here we aim to unveil the features that affect the success of books by predicting a book\u2019s sales prior to its publication. We do so by employing the <jats:italic>Learning to Place<\/jats:italic> machine learning approach, which can predict sales for both fiction and nonfiction books and explain the predictions by comparing and contrasting each book with similar ones. We analyze features contributing to the success of a book by feature importance analysis, finding that a strong driving factor of book sales across all genres is the publishing house. We also uncover differences between genres: for thrillers and mystery, the publishing history of an author (as measured by previous book sales) is highly important, while in literary fiction and religion, the author\u2019s visibility plays a more central role. These observations provide insights into the driving forces behind success within the current publishing industry, as well as how individuals choose what books to read.<\/jats:p>","DOI":"10.1140\/epjds\/s13688-019-0208-6","type":"journal-article","created":{"date-parts":[[2019,10,17]],"date-time":"2019-10-17T14:59:19Z","timestamp":1571324359000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Success in books: predicting book sales before 
publication"],"prefix":"10.1140","volume":"8","author":[{"given":"Xindi","family":"Wang","sequence":"first","affiliation":[]},{"given":"Burcu","family":"Yucesoy","sequence":"additional","affiliation":[]},{"given":"Onur","family":"Varol","sequence":"additional","affiliation":[]},{"given":"Tina","family":"Eliassi-Rad","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4028-3522","authenticated-orcid":false,"given":"Albert-L\u00e1szl\u00f3","family":"Barab\u00e1si","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,10,17]]},"reference":[{"key":"208_CR1","unstructured":"Statista: U.S. Book Industry\/Market\u2014Statistics & Facts. https:\/\/www.statista.com\/topics\/1177\/book-market\/ [Online; accessed 23-May-2018] (2018)"},{"key":"208_CR2","first-page":"1753","volume-title":"Proceedings of the 2013 conference on empirical methods in natural language processing","author":"VG Ashok","year":"2013","unstructured":"Ashok VG, Feng S, Choi Y (2013) Success with style: using writing style to predict the success of novels. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp\u00a01753\u20131764"},{"issue":"2","key":"208_CR3","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1080\/08997760701193720","volume":"20","author":"M Clement","year":"2007","unstructured":"Clement M, Proppe D, Rott A (2007) Do critics make bestsellers? Opinion leaders and the success of books. J Media Econ 20(2):77\u2013105","journal-title":"J Media Econ"},{"issue":"3","key":"208_CR4","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1509\/jmkr.43.3.345","volume":"43","author":"JA Chevalier","year":"2006","unstructured":"Chevalier JA, Mayzlin D (2006) The effect of word of mouth on sales: online book reviews. 
J Mark Res 43(3):345\u2013354","journal-title":"J Mark Res"},{"issue":"1","key":"208_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1177\/0001839214523602","volume":"59","author":"B Kov\u00e1cs","year":"2014","unstructured":"Kov\u00e1cs B, Sharkey AJ (2014) The paradox of publicity: how awards can negatively affect the evaluation of quality. Adm Sci Q 59(1):1\u201333","journal-title":"Adm Sci Q"},{"issue":"2","key":"208_CR6","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1007\/s10824-013-9203-0","volume":"38","author":"E Shehu","year":"2014","unstructured":"Shehu E, Prostka T, Schmidt-St\u00f6lting C, Clement M, Bl\u00f6meke E (2014) The influence of book advertising on sales in the German fiction book market. J Cult Econ 38(2):109\u2013130","journal-title":"J Cult Econ"},{"issue":"1","key":"208_CR7","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1632\/pmla.2013.128.1.238","volume":"128","author":"L Nakamura","year":"2013","unstructured":"Nakamura L (2013) \u201cWords with friends\u201d: socially networked reading on Goodreads. PMLA 128(1):238\u2013243","journal-title":"PMLA"},{"issue":"1","key":"208_CR8","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1007\/s10824-006-9029-0","volume":"31","author":"J Beck","year":"2007","unstructured":"Beck J (2007) The sales effect of word of mouth: a model for creative goods and estimates for novels. J Cult Econ 31(1):5\u201323","journal-title":"J Cult Econ"},{"issue":"1","key":"208_CR9","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1080\/08997764.2011.549428","volume":"24","author":"C Schmidt-St\u00f6lting","year":"2011","unstructured":"Schmidt-St\u00f6lting C, Bl\u00f6meke E, Clement M (2011) Success drivers of fiction books: an empirical analysis of hardcover and paperback editions in Germany. J Media Econ 24(1):24\u201347. 
https:\/\/doi.org\/10.1080\/08997764.2011.549428","journal-title":"J Media Econ"},{"issue":"4","key":"208_CR10","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1002\/dir.20087","volume":"21","author":"C Dellarocas","year":"2007","unstructured":"Dellarocas C, Zhang XM, Awad NF (2007) Exploring the value of online product reviews in forecasting sales: the case of motion pictures. J Interact Mark 21(4):23\u201345. https:\/\/doi.org\/10.1002\/dir.20087","journal-title":"J Interact Mark"},{"key":"208_CR11","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1109\/ASONAM.2010.50","volume-title":"Advances in social networks analysis and mining (ASONAM), 2010 international conference on","author":"F Abel","year":"2010","unstructured":"Abel F, Diaz-Aviles E, Henze N, Krause D, Siehndel P (2010) Analyzing the blogosphere for predicting the success of music and movie products. In: Advances in social networks analysis and mining (ASONAM), 2010 international conference on. IEEE Press, New York, pp\u00a0276\u2013280"},{"key":"208_CR12","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1145\/2818048.2820065","volume-title":"Proceedings of the 19th ACM conference on computer-supported cooperative work & social computing. CSCW \u201916","author":"J Park","year":"2016","unstructured":"Park J, Ciampaglia GL, Ferrara E (2016) Style in the age of instagram: predicting success within the fashion industry using social media. In: Proceedings of the 19th ACM conference on computer-supported cooperative work & social computing. CSCW \u201916. ACM, New York, pp\u00a064\u201373. https:\/\/doi.org\/10.1145\/2818048.2820065"},{"issue":"2","key":"208_CR13","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1177\/002224296002500206","volume":"25","author":"LA Fourt","year":"1960","unstructured":"Fourt LA, Woodlock JW (1960) Early prediction of market success for new grocery products. 
J Mark 25(2):31\u201338","journal-title":"J Mark"},{"issue":"8","key":"208_CR14","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0071226","volume":"8","author":"M Mesty\u00e1n","year":"2013","unstructured":"Mesty\u00e1n M, Yasseri T, Kert\u00e9sz J (2013) Early prediction of movie box office success based on Wikipedia activity big data. PLoS ONE 8(8):71226","journal-title":"PLoS ONE"},{"issue":"1","key":"208_CR15","doi-asserted-by":"publisher","DOI":"10.1140\/epjds\/s13688-017-0111-y","volume":"6","author":"O Varol","year":"2017","unstructured":"Varol O, Ferrara E, Menczer F, Flammini A (2017) Early detection of promoted campaigns on social media. EPJ Data Sci 6(1):13","journal-title":"EPJ Data Sci"},{"issue":"1","key":"208_CR16","doi-asserted-by":"publisher","DOI":"10.1140\/epjds\/s13688-018-0135-y","volume":"7","author":"B Yucesoy","year":"2018","unstructured":"Yucesoy B, Wang X, Huang J, Barab\u00e1si A-L (2018) Success in books: a big data approach to bestsellers. EPJ Data Sci 7(1):7","journal-title":"EPJ Data Sci"},{"key":"208_CR17","unstructured":"Group, B.I.S.: Complete BISAC Subject Headings List, 2017 Edition. http:\/\/bisg.org\/page\/BISACEdition [Online; accessed 4-October-2017] (2017)"},{"key":"208_CR18","unstructured":"Wikipedia: Data dumps. https:\/\/meta.wikimedia.org\/wiki\/Data_dumps [Online; accessed 13-April-2018] (2018)"},{"key":"208_CR19","unstructured":"Wikipedia: API:Main page. https:\/\/www.mediawiki.org\/wiki\/API:Main_page [Online; accessed 13-April-2018] (2018)"},{"key":"208_CR20","doi-asserted-by":"crossref","unstructured":"Spoerri A (2007) What is popular on Wikipedia and why? 
First Monday 12(4)","DOI":"10.5210\/fm.v12i4.1765"},{"issue":"5","key":"208_CR21","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1177\/0002764212469367","volume":"57","author":"B Keegan","year":"2013","unstructured":"Keegan B, Gergle D, Contractor N (2013) Hot off the Wiki: structures and dynamics of Wikipedia\u2019s coverage of breaking news events. Am Behav Sci 57(5):595\u2013622","journal-title":"Am Behav Sci"},{"issue":"1","key":"208_CR22","doi-asserted-by":"publisher","DOI":"10.1140\/epjds\/s13688-016-0079-z","volume":"5","author":"B Yucesoy","year":"2016","unstructured":"Yucesoy B, Barab\u00e1si A-L (2016) Untangling performance from success. EPJ Data Sci 5(1):17","journal-title":"EPJ Data Sci"},{"key":"208_CR23","volume-title":"Natural language processing with Python","author":"S Bird","year":"2009","unstructured":"Bird S, Klein E, Loper E (2009) Natural language processing with Python, 1st edn. O\u2019Reilly Media, Sebastopol","edition":"1"},{"key":"208_CR24","volume-title":"Foundations of statistical natural language processing","author":"CD Manning","year":"1999","unstructured":"Manning CD, Sch\u00fctze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge"},{"issue":"2","key":"208_CR25","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","volume":"28","author":"S Lloyd","year":"1982","unstructured":"Lloyd S (1982) Least squares quantization in pcm. IEEE Trans Inf Theory 28(2):129\u2013137","journal-title":"IEEE Trans Inf Theory"},{"issue":"2","key":"208_CR26","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1093\/oxfordjournals.pan.a004868","volume":"9","author":"G King","year":"2001","unstructured":"King G, Zeng L (2001) Logistic regression in rare events data. 
Polit Anal 9(2):137\u2013163","journal-title":"Polit Anal"},{"issue":"1","key":"208_CR27","first-page":"543","volume":"17","author":"D Hsu","year":"2016","unstructured":"Hsu D, Sabato S (2016) Loss minimization and parameter estimation with heavy tails. J Mach Learn Res 17(1):543\u2013582","journal-title":"J Mach Learn Res"},{"issue":"1","key":"208_CR28","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1111\/coin.12123","volume":"34","author":"M Maalouf","year":"2018","unstructured":"Maalouf M, Homouz D, Trafalis TB (2018) Logistic regression in large rare events and imbalanced data: a performance comparison of prior correction and weighting methods. Comput Intell 34(1):161\u2013174","journal-title":"Comput Intell"},{"issue":"1","key":"208_CR29","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-017-03011-5","volume":"7","author":"M Schubach","year":"2017","unstructured":"Schubach M, Re M, Robinson PN, Valentini G (2017) Imbalance-aware machine learning for predicting rare and common disease-associated non-coding variants. Sci Rep 7(1):2959","journal-title":"Sci Rep"},{"key":"208_CR30","unstructured":"Wang X, Varol O, Eliassi-Rad T (2019) L2P: an algorithm for estimating heavy-tailed outcomes. arXiv preprint. arXiv:1908.04628"},{"issue":"1","key":"208_CR31","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45(1):5\u201332","journal-title":"Mach Learn"},{"key":"208_CR32","first-page":"80","volume":"2","author":"F Mosteller","year":"1968","unstructured":"Mosteller F, Tukey JW (1968) Data analysis, including statistics. 
Handb Soc Psychol 2:80\u2013203","journal-title":"Handb Soc Psychol"},{"key":"208_CR33","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","volume":"36","author":"M Stone","year":"1974","unstructured":"Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc, Ser B, Methodol 36:111\u2013147","journal-title":"J R Stat Soc, Ser B, Methodol"},{"key":"208_CR34","first-page":"451","volume-title":"Advances in neural information processing systems","author":"WW Cohen","year":"1998","unstructured":"Cohen WW, Schapire RE, Singer Y (1998) Learning to order things. In: Advances in neural information processing systems, pp\u00a0451\u2013457"},{"key":"208_CR35","first-page":"569","volume-title":"Advances in neural information processing systems","author":"R Herbrich","year":"2007","unstructured":"Herbrich R, Minka T, Graepel T (2007) Trueskill\u2122: a Bayesian skill rating system. In: Advances in neural information processing systems, pp\u00a0569\u2013576"},{"key":"208_CR36","first-page":"133","volume-title":"Proc of the 8th ACM SIGKDD intl conf on knowledge discovery and data mining","author":"T Joachims","year":"2002","unstructured":"Joachims T (2002) Optimizing search engines using clickthrough data. In: Proc of the 8th ACM SIGKDD intl conf on knowledge discovery and data mining. 
ACM, New York, pp\u00a0133\u2013142"}],"container-title":["EPJ Data Science"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-019-0208-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1140\/epjds\/s13688-019-0208-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-019-0208-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T04:45:43Z","timestamp":1721882743000},"score":1,"resource":{"primary":{"URL":"https:\/\/epjdatascience.springeropen.com\/articles\/10.1140\/epjds\/s13688-019-0208-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,17]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["208"],"URL":"https:\/\/doi.org\/10.1140\/epjds\/s13688-019-0208-6","relation":{},"ISSN":["2193-1127"],"issn-type":[{"type":"electronic","value":"2193-1127"}],"subject":[],"published":{"date-parts":[[2019,10,17]]},"assertion":[{"value":"21 February 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 September 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 October 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"ALB is the founder and holds shares of three startups that use network science and data science tools. 
The other authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"31"}}