Using ChatGPT for Fast and Impressive Data Visualisations in R

Workshop

Data visualization is a must-have skill for every empirical researcher, regardless of the field. The visual medium is often the most efficient way to explore data, generate hypotheses, communicate results, and convey the story behind a research paper and its underlying purpose. However, beautiful data visualizations come with a steep learning curve. Environments like R can produce incredibly beautiful results, but even seasoned researchers must often build their visualizations from scratch to reflect the particular features of each new research project and to ensure a replicable research process.

Thankfully, new developments in generative AI make it easier to produce beautiful visualizations, even for researchers who have little experience with coding or R. Good data visualisation with R requires understanding of several subjects:
 

  • The nature and properties of data that can be visualized;
  • A grounded understanding of how humans process visual inputs;
  • Best practices and the underlying principles of good visualization design;
  • The logic of the ggplot2 package and the 'Grammar of Graphics' theory behind it;
  • How AI tools such as ChatGPT and GitHub Copilot can support the creation of beautiful visualizations.


This introductory workshop, taught by Professor Peter Gruber (who has dual PhDs in physics and economics), provides PhD students, professors, and professionals with the right combination of theoretical foundations and practical tools to create engaging, statistically correct, and efficient data visualizations with AI assistants like ChatGPT and Copilot. The workshop features a variety of practical examples, critiques, and best-practice guidelines that participants can immediately apply to their daily research. Equal emphasis is placed on statistical soundness, great design, and efficient coding with AI support, allowing even beginners to get started right away.

Day 1 Topics covered:

  • An introduction to Data Visualization
    • The 'Grammar of Graphics' and the ggplot2 package for R
    • Simple data visualizations
  • Understanding data in order to understand visualization
    • R data types and R data.frames
    • Stratified data graphs
  • Perception, the Gestalt Principles, and rules of good design
  • Statistical data visualizations


Day 2 Topics covered:

  • The language of Data Visualization and graphical prompting with AI
    • Tools for efficient work: RStudio, ChatGPT, and GitHub Copilot
    • Refining data visualizations
  • Color and contrast
  • Creating classical and advanced data maps
  • Current research and the future of data visualization
    • Visualizing textual data
    • Advanced visualizations


Examples across the two days will include:

  • Stem and leaf plots
  • Histograms, density, box, violin, QQ plots, and Pareto charts; error bars
  • Bar and line charts, parallel coordinate plots
  • Scatter, regression and bubble plots
  • Pie, donut, waffle, and Marimekko charts; treemaps
  • Dot distribution maps, heat maps, choropleth, cartogram, grid and hexagon maps; statebins
  • Sankey charts, chord diagrams, word clouds


For PhD students and academic researchers, this workshop offers a unique opportunity to master an important research skill. Assuming no prior knowledge of data visualization and only basic R skills, participants will gain an understanding of how to harness the combined power of R, ChatGPT, and GitHub Copilot for data visualization, enabling them to conduct more efficient research and to communicate it more convincingly. The skills acquired in this workshop will not only enhance their research capabilities but also open up new avenues for innovation in their respective fields, while saving them significant time in their research workflows and helping them publish results more easily by presenting them in compelling and convincing ways.

The seminar will be taught via Zoom and features daily take-home skill challenges. All seminar Zoom recordings and material (including program input, output, data, and slides) will be available online for 30 days after the seminar begins, in case you would prefer to attend asynchronously or would like to revisit the seminar content after it concludes. An online seminar chat forum will also be monitored by the instructor during the seminar and for 30 days after the seminar concludes, so that you can ask questions related to seminar content outside of the live seminar sessions. An official Instats certificate of completion is provided at the conclusion of the seminar.

13 - 14 May 2024