Different graphs for different purposes

From Stepping Up


Once the data has been acquired and analyzed, it is time to present it in the clearest and most attractive way. Deciding on a format becomes easier once you know that there are specific presentation tools for each type of data:

  1. Charts displaying frequencies,
  2. Charts displaying distributions,
  3. Charts displaying correlations.


| I want to show: | I should use: | What type of data do I need? |
|---|---|---|
| Frequency of occurrence (how often); comparison of magnitude (how much or how strongly) | Bar chart; data table; Pareto chart | Categorized data, i.e. data tallied in subdivisions |
| Distribution of observations (how much variety, over what range) | Histogram | Several measurements (repeats) of a single experimental quantity, i.e. all parameters are identical between measurements. See Replication |
| Correlation between parameters (is one variable related to or affected by another) | Run chart (trend over time); map (correlation in space); scatter diagram | Measurements of an experimental quantity (the dependent variable) taken while varying some parameter of interest (the independent variable). Repeats, i.e. keeping the independent variable fixed and acquiring several observations, can be included. For a run chart, the independent variable is time; for a map, it is position. See Knowing the Variables: Control! |



Representing frequency of occurrence

Data Table

Tables convey precise numerical information whereas graphs are more helpful to grasp the bigger picture. In most cases, graphs will make it easier to interpret your data.

However, on some occasions tables may be preferred over graphs:

  • Each value can be labelled or described easily, for instance to represent data from various sources,
  • Values can be grouped in categories,
  • Incomplete or disparate data sets are more accurately represented,
  • The difference in order of magnitude can be better appreciated, e.g. to represent large, intermediate and small values at the same time.

Nevertheless, it is a good idea to include both tables and graphs of your raw data and results in the Data section of your report. Here is a good example of a table: Informative Table.

No pun intended, the Periodic Table is a very good example of arranging data by characteristic and category. A graphical representation of all the elements would be confusing and not very useful. Tables are beautiful in that we can see specific parameters of a given variable arranged neatly (the element is our variable, and the specifics are atomic mass, atomic number, number of orbitals, etc.).

Bar Chart

A Sample Bar Chart

Bar charts present results that compare different groups, and they work best when showing comparisons among categories. They are also the best way of showing "how much" or "how often" one category is affected in relation to another, depending on what you are trying to show. For example, if I wanted to show how many students get As after drinking Gatorade for the whole semester, compared to those who don't, a bar chart would clearly show the difference.

There are three bar chart types, each used in different situations: simple bar charts, grouped bar charts and stacked bar charts.

Simple bar charts sort data into simple categories. This is the type that you are most familiar with. Use vertical or horizontal bars; horizontal bars look better when long labels are being used.
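Before a simple bar chart can be drawn, the raw observations have to be tallied into categories. The sketch below shows one way to do that in Python; the survey answers and the helper names `tally` and `text_bar_chart` are made up for illustration, and the text bars only stand in for a real chart.

```python
from collections import Counter

# Hypothetical survey answers: one category label per respondent.
responses = ["bus", "car", "bike", "car", "bus", "car", "walk", "bike", "car"]


def tally(observations):
    """Count how often each category occurs, largest count first."""
    return Counter(observations).most_common()


def text_bar_chart(counts, symbol="#"):
    """Render (category, count) pairs as horizontal text bars."""
    width = max(len(label) for label, _ in counts)
    return [f"{label.ljust(width)} | {symbol * n} {n}" for label, n in counts]


counts = tally(responses)
for line in text_bar_chart(counts):
    print(line)
```

Horizontal bars are used here because, as noted above, they leave more room for long labels.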

Grouped bar charts divide data into groups within each category and show comparisons between individual groups as well as between categories. The bars of the groups within a category touch each other. For example, say I apply drug X to normal cells and cancer cells, and then apply a placebo drug to normal cells and cancer cells. I would likely sort the "drug X treated" bars into one category (have them touch) and the "placebo treated" bars into another category (have these bars touch on the sides). Another example is comparing rates of smoking in different age groups in 2005 and 1955. You could have the 2005 and 1955 bars touching within each age group, and then separate the age groups from one another.

Stacked bar charts show related groups one on top of the other. Each component of a bar in a stacked bar chart should be shown as a fraction (or percentage) of the whole. Consider using stacked bar charts rather than pie charts if you need to compare more than three sets of data.
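Expressing each component as a percentage of its bar is simple arithmetic, sketched below in Python. The age groups, the counts and the helper name `to_percentages` are invented for illustration only.

```python
# Hypothetical counts: smokers and non-smokers in three age groups.
groups = {
    "18-29": {"smoker": 30, "non-smoker": 70},
    "30-49": {"smoker": 45, "non-smoker": 105},
    "50+": {"smoker": 20, "non-smoker": 180},
}


def to_percentages(components):
    """Express each component of one bar as a percentage of that bar's total."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}


# One stacked bar per age group; the segments of every bar add up to 100 %.
shares = {group: to_percentages(parts) for group, parts in groups.items()}

for group, parts in shares.items():
    segments = ", ".join(f"{name}: {pct:.0f} %" for name, pct in parts.items())
    print(f"{group}: {segments}")
```

Because every bar is rescaled to 100 %, groups of very different sizes can be compared side by side.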

Here are a few examples courtesy of Concordia University.

Image:Vertical.jpg

Image:Grouped.jpg

Image:Stacked.jpg

Whenever possible, use bar or pie charts to support data interpretation. Do not assume that results or points are so clear and obvious that a chart is not needed for clarity.

Pie Chart

A Sample Pie Chart

The idea behind the pie chart is that the "pieces of pie" represent the relative proportion of various items in making up the whole quantity, i.e. how the "pie" is divided up. It is recommended that you do not use pie charts, for the following reasons:

  • Pie charts take up a lot of space for the information they represent,
  • The "pieces of pie" are not consistent in the way they are arranged spatially,
  • On some occasions, it is difficult for the eye to gauge the difference between pieces of pie.

However, if you would like to use a pie chart, make sure you use it when a certain fraction is overwhelmingly small or large compared to another. That is to say, pie charts are a quick way of representing two quantities when it is easy to see that one is significantly different from the other.

In most other cases, the same information can be represented in the form of a bar chart or a table. If you nevertheless decide to use pie charts, be careful not to use too many annotations. Keep the chart as simple as possible, including only the information necessary to interpret it.

Representing distribution of measurements

Histogram

Histograms, unlike bar charts, are used to display numerical/quantitative variables. For example, say you are measuring the pulse rate of 50 women to see how it differs between them.

Imagine using a bar chart to display this data: with one bar per woman, you would have 50 bars, which would be unreadable. For numerical data like this we use a histogram instead. We slice up the entire span of values covered by the quantitative variable into equal-width intervals called bins. Then we count the number of values that fall into each bin. The bins and the counts in each bin give the distribution of the quantitative variable. So for our pulse experiment, we would divide the X-axis into bins showing ranges of pulse rate, e.g. Bin 1 would be 55 - 60, Bin 2 would be 60 - 65, Bin 3 would be 65 - 70, etc. A value that falls exactly on a boundary, such as 60, must be counted in only one bin; a common convention is to put it in the higher bin. Then we total the number of counts in each bin. This count is the height of that bin's bar.

From this, we can see the distribution of the pulse rate among the 50 women. We might expect it to have a bell-shape.
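The binning procedure described above can be sketched in a few lines of Python. The pulse values below are invented (and fewer than 50, to keep the example short), the function name `histogram` is our own, and the rule that a boundary value goes into the higher bin is one common convention rather than the only possible choice.

```python
def histogram(values, start, width, n_bins):
    """Count values in n_bins equal-width bins beginning at `start`.

    Each bin includes its lower edge and excludes its upper edge, so with
    5-wide bins starting at 55 a value of 60 is counted in the 60 - 65 bin.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical pulse rates (beats per minute).
pulse = [58, 61, 64, 65, 66, 67, 68, 69, 70, 71, 72, 72, 73, 74, 76, 78, 81, 84]

counts = histogram(pulse, start=55, width=5, n_bins=6)

# Print one text bar per bin; the bar length is the count in that bin.
for i, count in enumerate(counts):
    low = 55 + 5 * i
    print(f"{low} - {low + 5}: {'#' * count}")
```

With these made-up values the tallest bars sit in the middle bins and the counts fall off on both sides, which is the rough bell shape mentioned above.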

Representing correlation between variables

Map

When thinking about a map, one might have in mind various cities or countries with their geography represented in relation to others. Maps describe the position and shape of objects in space. You can use this concept to your advantage and use maps to convey spatial information about your results.

Numerical or qualitative information can be added on top of the spatial information on a map, for example:

  • The amount of contaminant in soil samples can be mapped to their location of origin,
  • Important tumour suppressor genes can be mapped on the 1D map of human chromosome 1.

This additional information can be carried by various visual effects; use the one which works best:

  • Important sites and values can be indicated with labels and arrows,
  • A variable can be represented at different sites in a bubble graph, the size of the bubble being related to the value of the variable at this site,
  • The map can take different colors depending on the value of a variable; this is called a heat map. In this case, a color legend must be included.

Maps can easily represent information in 1D and 2D. Three-dimensional (3D) maps or renditions of objects can also be prepared, but note that, for the human eye, it is difficult to perceive perspective as represented on a flat surface such as the science fair board. If you have data with more than two dimensions (i.e. 3D, 4D, etc), you should seek an alternative way of representing it, for example:

  • A panel with a series of 2D maps,
  • A computer animation or an interactive 3D model,
  • A small-scale mock-up model that you keep at your stand!

Scatter Diagram

Scatter diagrams, also called scatterplots, may be the most common and most effective display for data. By looking at them you can see patterns, trends, associations, relationships and outliers. Scatterplots show the relationship between two quantitative variables. For example, we could do a scatterplot of "year" and "expenses of the Enron Corporation". This is a special type of scatterplot called a time plot.

Scatter diagrams relate two quantitative variables and ask whether there is an association between them. Are grades higher now than they used to be? Are height and weight of a person related? Is the cost of traffic congestion per person related to the peak speed on the freeway? How does the number of cells vary over time with a particular treatment? Is the time spent at the dinner table related to how many calories you burn? Is the speed of the roller coaster related to the drop?

Note: Just because a relation exists between two quantities does not make it meaningful; it is up to you to make sure that the relationship is logical. For example, there might be, by coincidence, a remarkable relationship between ice cream prices and the weather in Timbuktu. But this doesn't tell us anything about either variable, because we know that in the real world one did not cause the other.

A few good examples: Scatter Diagrams

Network Diagrams

Network diagrams are instrumental in projects that wish to plot how one item, such as a person, features in an interconnected web of relationships. Examples where this representation might be useful include plotting the influence one gene's expression might have on other genes in the same area of a certain chromosome, or determining what words are most likely to occur in a reader's mind when subjected to a certain kind of stimulus.

At first glance these diagrams can appear messy and convoluted, but they are well suited to capturing the interconnectedness of a given situation, provided that's what you want to highlight. An example: Network Map

Annotating Graphs

We will discuss three major components of graph annotation: labeling, making notes, and effective correlations.

Labeling

Any type of graph has the sole aim of representing what is most valuable about your raw data, without actually writing each value down in a table, ad nauseam. That being said, what you label on your graph and how you choose to title it are both important components of the graph itself.

You can consider labeling the maximum and minimum values on your graph, if either is crucial to your experiment. You can also consider labeling specific intervals on the graph to indicate where a medium was replenished, where certain conditions were changed, etc. These sorts of labels should be concise, ideally two to four words. Use a dialogue box type label, in which an arrow indicates the point of interest and the text box provides the information about that point.

Your graph's title should tell the reader in one phrase what the graph shows and which axes are involved. For example, in a project that constructed a fuel cell and tested its ability to generate hydrogen gas over time, one graph could be titled "Measurement of H2 Yield over Time" or "Fuel Cell Performance at 20, 30 and 40 °C." Both these titles immediately give the reader information as to what was measured, and in what conditions or against what setting (time, temperature, location, etc.).

Make sure that your labels and your title only add to the importance and informative nature of your graph.

Making Notes

It is customary to include a few sentences as a caption to each of your graphs, providing a brief analysis of the data and explaining their significance. This can serve as a replacement for or an addition to labeling, but usually the two complement each other nicely. These few sentences shouldn't conclude anything about the result of the given experiment, but rather should explain what is going on and what might have caused certain anomalies. You may be wondering how this is different from a conclusion; to answer that concern, consider this example:

Imagine a project wishes to examine how long a microbial fuel cell can last without replenishing any nutrients for the bacteria present. A graph might show steady output of current and a few hours later, the curve will taper off towards zero. In the notes below the graph, one might write that nutrients were not replenished in this experiment, explaining the reasons for the curve dropping off. However, making a conclusion that nutrients need to be replenished every few hours to keep the current output steady is not to be done in these notes.

Notes can also include any specific information about the nature of the experiment, how many times it was repeated or specific concentrations used that were otherwise unspecified in the title or the labels.

Correlations

In addition to what we have already told you about correlations, here are some more tips on representing correlations properly on your graph.

On a scatter diagram, a trend line (regression curve) can be a very quick means of assessing the quality of a relationship between two parameters. The R-squared value approaches 1 as the trend line accounts for more and more of the variation in the data. You can fit data with a linear, exponential, logarithmic or power-law trend line, depending on your own inferences as to how the two parameters should be related.

In Excel or JMP, the program can calculate an equation for the trend line empirically and give it along with the R-squared value. The equation might not be necessary unless you plan to extrapolate to values that you didn't or couldn't measure. The R-squared value, however, is a label you should usually include in any correlation graph.
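To see where a linear trend line and its R-squared value come from, here is a minimal sketch of an ordinary least-squares fit in plain Python. We assume this is the kind of calculation a spreadsheet performs for a linear trend line; the function name `linear_fit` and the data points are hypothetical.

```python
def linear_fit(x, y):
    """Fit y = slope * x + intercept by least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (variation left around the line) / (total variation in y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical measurements: independent variable x, dependent variable y.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept, r_squared = linear_fit(x, y)
print(f"y = {slope:.2f} x + {intercept:.2f}, R-squared = {r_squared:.4f}")
```

The function assumes the x values are not all identical and the y values are not all identical; otherwise one of the divisions would be by zero.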


This article was written by:

Aaron Hakim and Jean-Philippe Demers
