Visualization for Social Data Science

Author

Roger Beecham

Preface

Version: 0.0.1

Social scientists have at their disposal an expanding array of data measuring very many social behaviours. This is undoubtedly a positive. Previously unmeasurable aspects of human behaviour can now be explored in a large-scale empirical way, whilst already measured aspects of behaviour can be re-evaluated. Such data are nevertheless rarely generated for the sole purpose of social research, and this fact elevates visual approaches in importance due to visualization’s emphasis on discovery. When encountering new data for the first time, data graphics help expose complex structure and multivariate relations, and in so doing help advance an analysis in situations where the questions to be asked and techniques to be deployed may not be immediately obvious.

Visualization toolkits such as ggplot2, vega-lite and Tableau have been designed to ease the process of generating data graphics for analysis. There is a comprehensive set of texts and resources on visualization design theory and practice, and several notable how-to primers on visualization. However, comparatively few existing resources demonstrate with real data and real social science scenarios how and why data graphics should be incorporated in a data analysis, and ultimately how they can be used to generate and claim knowledge.

This book aims to fill this space. It presents principled workflows, with accompanying code, for using data graphics and statistics in tandem. In doing so it equips readers with critical design and technical skills needed to analyse and communicate with a range of datasets in the social sciences.

The book emphasises application. Each chapter introduces concepts for analysis, with an accompanying technical implementation, using a real-world dataset located within a social science domain. The ambition is that by the end of each chapter, we have a more advanced knowlegde and understanding of the phenomena under investigation.

Structure, content and outcomes

Chapters of the book are divided into Concepts and Techniques. The Concepts sections cover key literature, ideas and approaches that can be leveraged to analyse the dataset introduced in the chapter. In the Techniques sections, code examples are provided for implementing those concepts and ideas. Each chapter starts with a list of Knowledge and Skills outcomes that map to the Concepts and Techniques. To support the technical elements, chapters have a corresponding computational notebook template file. The templates contain pre-prepared code chunks to be executed. In the early chapters of the book we aim at brevity in the Concepts sections, offset by slightly more lengthy Techniques sections. As the book progresses the balance shifts somewhat, with more involved conceptual discussion and more specialised and abbreviated technical demonstrations.

Readers of the book will learn how to:

  • Describe, process and combine social science datasets from a range of sources.
  • Design statistical graphics that expose structure in social science data and that are underpinned by evidence-backed practice in information visualization and cartography.
  • Use modern data science and visualization frameworks to produce data analysis code that is coherent and easily shareable.
  • Apply modern statistical techniques for analysing, representing and communicating data and model outputs with integrity.

Audience and assumed background

The book is for people working in the broadly defined social sciences, including Geography, Public Health, Transportation, Political Science, Business, Economics. It is aimed at postgraduate students and researchers, data journalists, analysts working in public sector, non-governmental and commercial organisations.

All technical examples are implemented using the R programming environment; so too every data graphic that appears in this book. Little prior knowledge of the R ecosystem is assumed, but as the chapters progress more advanced concepts and coding procedures are introduced. Whilst the book covers many of the fundamentals of R for working with social science datasets, our ultimate aim is to demonstrate through example how data graphics can and should be used in a data analysis. In this way it complements core resources that more fully cover these how-to aspects: R for Data Science (Wickham and Grolemund 2017), Tidymodelling with R (Kuhn and Silge 2023) and Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019).

Omissions and additions

There are certain omissions in the book, as well as certain additions, that might be surpising to those seasoned in reading data visualization books. We do not cover interactivity in data graphics and there is not a chapter dedicated to geospatial visualization, though numerous geospatial visualizations (maps) appear throughout to addresss particular analysis questions.

The reasons for this are principled as well as pragmatic. The R programming environment is not well-suited to highly flexible, interactive data graphics. Even if it were, we would question the need for interaction in many of the real-world data analysis scenarios covered in this book. The lack of a dedicated geovisualization chapter will hopefully become clear by the end of Chapter 3. It is useful to apply the same theory, heuristics, and coding ideas to designing and evaluating maps as one would any other data graphic.

Space in the book is instead dedicated to introspecting into data graphics: the role of statistics and models for emphasising important structure and de-emphaising spurious structure, the differing purposes of data graphics at different analysis stages and the role of data graphics in building trust and integrity. Many of the book’s influences are from data journalism, as well as information visualization and cartography.

Acknowledgments

You will notice that the book is written in the first person, but with “we/our” rather than the singular pronoun “I/my”. The reasons for this are partly stylistic. They also, hopefully, betray that the ideas presented in the book are not entirely “my own”. In particular I would like to thank Jo Wood and Jason Dykes, whose thinking on visualization design and practice runs throughout the book; and Robin Lovelace, who helped get things kick-started, whose technical knowledge is legion and whose critique and encouragement is always welcome. Thanks also to …

License

This website is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License.