Visualization for Social Data Science

Author

Roger Beecham

Note

This is the online home of Visualization for Social Data Science, a book to be published by Chapman and Hall/CRC Press in July 2025. You can pre-order the book here.

Endorsements

A book that gives learners the inspiration, knowledge and worked examples to create cutting edge visualisations of their own.
– James Cheshire, Professor of Geographic Information and Cartography, University College London, co-author of Atlas of the Invisible, Where the Animals Go and London: The Information Capital.

Balance is at the root of good design, and resonates throughout Visualization for Social Data Science. The book’s harmony of concepts and techniques, precision and creativity, provide a perfect tonic for any social scientist seeking to scale up their visualization and data science knowledge and skills.
– Matt Duckham, Director of Information in Society EIP, RMIT, Melbourne

If a picture is worth a thousand words, a book demonstrating the how and why of effective visualisation that complements and strengthens data analytics is priceless. Beecham’s volume is such a prize, pairing examples and applications from across the social sciences with code and data to illustrate the power of statistical and visual analysis working in tandem.
– Rachel Franklin, Executive Director, Center for Geographic Analysis, Harvard University

This is an important book on an important topic. I particularly like the examples showing different visualizations of the same data and the parallel presentation of graphics and code. And I absolutely love the chapter on visual storytelling. I can’t wait to use this book in my classes.
– Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University, New York

This is a very well-structured, clearly-written introduction to visualizing social data, especially data with a strong spatial component. The examples are accessible, instructive, and beautifully developed within each chapter. While the book is an excellent introduction to methods and techniques, it never loses sight of why we want to look at data in the first place.
– Kieran Healy, Duke University and author of Data Visualization: A Practical Introduction

Novel sources of “found” data are creating new opportunities and greater responsibilities for understanding data to avoid specious discoveries. Visualization for Social Data Science enables social scientists to be careful, thoughtful and effective social data scientists by illuminating both how and why to incorporate visualization into scientific discovery.
– Harvey Miller, Bob and Mary Reusche Chair, Center for Urban and Regional Analysis, The Ohio State University

Preface

Social scientists have at their disposal an expanding array of data measuring very many social behaviours. This is undoubtedly a positive. Previously unmeasurable aspects of human behaviour can now be explored in a large-scale empirical way, while already measured aspects of behaviour can be re-evaluated. Such data are nevertheless rarely generated for the sole purpose of social research, and this fact elevates visual approaches in importance due to visualization’s emphasis on discovery. When encountering new data for the first time, data graphics help expose complex structure and multivariate relations, and in so doing advance an analysis in situations where the questions to be asked and techniques to be deployed may not be immediately obvious.

Visualization toolkits such as ggplot2, Vega-Lite and Tableau have been designed to ease the process of generating data graphics for analysis. There is a comprehensive set of texts and resources on visualization design theory, and several notable how-to primers on visualization practice. However, comparatively few existing resources demonstrate, with real data and real social science scenarios, how and why data graphics should be incorporated in a data analysis, and ultimately how they can be used to generate and claim knowledge.

This book aims to fill this space. It presents principled workflows, with code, for using data graphics and statistics in tandem. In doing so it equips readers with critical design and technical skills needed to analyse and communicate with a range of datasets in the social sciences.

The book emphasises application. Each chapter introduces concepts for analysis, with an accompanying technical implementation that uses real-world data on a range of Public Health, Transportation, Social and Electoral outcomes. The ambition is that by the end of each chapter, we have a more advanced knowledge and understanding of the phenomena under investigation.

Structure, content and outcomes

Chapters of the book are divided into Concepts and Techniques. The Concepts sections cover key literature, ideas and approaches that can be leveraged to analyse the dataset introduced in the chapter. In the Techniques sections, code examples are provided for implementing those concepts and ideas. Each chapter starts with a list of Knowledge and Skills outcomes that map to the Concepts and Techniques. To support the technical elements, chapters have a corresponding computational notebook file. These files contain pre-prepared code chunks to be executed. In the early chapters we aim at brevity in the Concepts sections, offset by slightly more lengthy Techniques sections. As the book progresses the balance shifts somewhat, with more involved conceptual discussion and more specialised and abbreviated technical demonstrations.

Readers of the book will learn how to:

Describe, process and combine social science datasets from a range of sources.
Design statistical graphics that expose structure in social science data and that are underpinned by evidence-backed practice in information visualization and cartography.
Use data science and visualization frameworks to produce data analysis code that is coherent and easily shareable.
Apply modern statistical and graphical techniques for analysing, representing and communicating data and model outputs with integrity.

Audience and assumed background

The book is for people analysing societal issues, broadly defined, including from within Geography, Public Health, Transportation and Political Science. It is aimed at postgraduate students and researchers, data journalists, and analysts working in public sector and commercial organisations.

All technical examples are implemented using the R programming environment; so too every data graphic that appears in this book. Some prior knowledge of the R ecosystem is assumed, and as the chapters progress, more advanced concepts and coding procedures are introduced. While the book covers many of the fundamentals of R for working with social science datasets, our ultimate aim is to demonstrate through example how data graphics can and should be used in a data analysis. In this way it complements core resources that more fully cover, from zero-level prior knowledge, these how-to aspects: R for Data Science (Wickham and Grolemund 2017), Tidy Modeling with R (Kuhn and Silge 2023) and Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019).

Omissions and additions

There are certain aspects of the book that might be surprising to those seasoned in reading data visualization textbooks. We do not cover interactivity in data graphics, and there is not a chapter dedicated to geospatial visualization, though numerous geospatial visualizations (maps) appear throughout to address particular analysis questions.

The reasons for this are principled as well as pragmatic. The R programming environment is not well-suited to highly flexible, interactive data graphics. Even if it were, we would question the need for interaction in many of the real-world data analysis scenarios covered in this book. The reason for omitting a dedicated geovisualization chapter will hopefully become clear by the end of Chapter 3: it is useful to apply the same theory, heuristics and coding ideas to designing and evaluating maps as one would to any other data graphic.

Space in the book is instead dedicated to introspecting into data graphics: the role of statistics and models for emphasising important structure and de-emphasising spurious structure, the differing purposes of data graphics at different analysis stages and the role of data graphics in building trust and integrity. Many of the book’s influences are from data journalism, as well as information visualization and cartography.

Acknowledgments

You will notice that the book is written in the first person, but with “we/our” rather than the singular pronoun “I/my”. The reasons for this are partly stylistic. They also, hopefully, betray that the ideas and work presented in the book are not entirely my own. In particular “I” would like to thank Jo Wood and Jason Dykes, whose thinking on visualization design and practice runs throughout the book; and Robin Lovelace, who helped get things kick-started, whose technical knowledge is legion and whose critique and encouragement are always welcome. Thanks also to Lara Spieker from CRC Press and Taylor & Francis for helping move from an early plan to full production. And finally, as ever, to the reviewers for providing expert feedback on the book’s structure and emphasis, and for the more general encouragement and positivity.

References

BBC Visual and Data Journalism Team. 2019. “BBC Visual and Data Journalism Cookbook for R Graphics.” https://github.com/bbc/rcookbook.

Beecham, R., N. Williams, and L. Comber. 2020. “Regionally-Structured Explanations Behind Area-Level Populism: An Update to Recent Ecological Analyses.” PLOS One 15 (3): e0229974. https://doi.org/10.1371/journal.pone.0229974.

Buja, A., D. Cook, H. Hofmann, M. Lawrence, E-K Lee, D. Swayne, and H. Wickham. 2009. “Statistical Inference for Exploratory Data Analysis and Model Diagnostics.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367 (1906): 4361–83. https://doi.org/10.1098/rsta.2009.0120.

Burn-Murdoch, J. 2023. “Making Charts That Make an Impact.” Invited talk, Data Visualization Society’s Outlier Conference. https://www.youtube.com/watch?v=tIbaQUo6H9g&ab_channel=DataVisualizationSociety.

Comber, A., C. Brunsdon, M. Charlton, G. Dong, R. Harris, B. Lu, Y. Lü, et al. 2023. “A Route Map for Successful Applications of Geographically Weighted Regression.” Geographical Analysis 55 (1): 155–78. https://doi.org/10.1111/gean.12316.

Franconeri, S. L., L. M. Padilla, P. Shah, J. M. Zacks, and J. Hullman. 2021. “The Science of Visual Data Communication: What Works.” Psychological Science in the Public Interest 22 (3): 110–61. https://doi.org/10.1177/15291006211051956.

Healy, K. 2019. Data Visualization: A Practical Introduction. Princeton, NJ: Princeton University Press. https://socviz.co.

Hullman, J., and A. Gelman. 2021. “Designing for Interactive Exploratory Data Analysis Requires Theories of Graphical Inference.” Harvard Data Science Review 3 (3).

Ismay, C., and A. Kim. 2020. Statistical Inference via Data Science: A ModernDive into R and the Tidyverse. New York, NY: CRC Press. https://doi.org/10.1201/9780367409913.

Kale, A., F. Nguyen, M. Kay, and J. Hullman. 2019. “Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data.” IEEE Transactions on Visualization and Computer Graphics 25 (1): 892–902. https://doi.org/10.1109/TVCG.2018.2864909.

Kay, M. 2021. “Uncertainty Visualization as a Moral Imperative.” Invited talk, BostonCHI meeting. https://www.youtube.com/watch?v=mfQ3QVyw4N0&ab_channel=BostonCHI.

Kosara, R. 2023. “Lesson 4: Presentation, Uncertainty, ISOTYPE.” ObservableHQ Notebook; ObservableHQ. https://observablehq.com/@observablehq/lesson-4-presentation-uncertainty-isotype?collection=@observablehq/advanced-data-vis-course.

Kuhn, M., and J. Silge. 2023. Tidy Modeling with R. Sebastopol, CA: O’Reilly.

Lovelace, R., J. Nowosad, and J. Muenchow. 2019. Geocomputation with R. London, UK: CRC Press.

Scherer, C. 2023. “Designing Data Visualizations to Successfully Tell a Story.” Workshop at Posit::conf(2023), Chicago, IL. https://posit-conf-2023.github.io/dataviz-storytelling/.

The Turing Way Community. 2025. “The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research.” Zenodo. https://doi.org/10.5281/zenodo.15213042.

Wickham, H., and G. Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol, CA: O’Reilly Media.

Wolf, L. J., L. Anselin, D. Arribas-Bel, and L. Rivers Mobley. 2021. “On Spatial and Platial Dependence: Examining Shrinkage in Spatially Dependent Multilevel Models.” Annals of the American Association of Geographers 111 (6): 1679–91. https://doi.org/10.1080/24694452.2020.1841602.

Yang, F., M. Cau, C. Mortenson, H. Fakhari, A. D. Lokmanoglu, J. Hullman, S. Franconeri, N. Diakopoulos, E. C. Nisbet, and M. Kay. 2024. “Swaying the Public? Impacts of Election Forecast Visualizations on Emotion, Trust, and Intention in the 2022 U.S. Midterms.” IEEE Transactions on Visualization and Computer Graphics 30 (1): 23–33. https://doi.org/10.1109/TVCG.2023.3327356.