原文阅读| 提高图形质量的十条法则
请点击上方蓝字“学位论文写作”关注写作
原文引言译文:科学可视化被经典地定义为以图形方式显示科学数据的过程。然而,这个过程远非直接或自动的。有太多不同的方法来表示相同的数据:散点图、线性图、条形图和饼图,仅举几个例子。此外,相同的数据,使用相同类型的绘图,根据谁在看图形,可能会有很大的不同。科学可视化更准确的定义是人和数据之间的图形界面。在这篇短文中,我们不会假装解释关于这个图形界面的全部;对此读者应参见文献[1],[2]的介绍。相反,本文的目标是提供一套基本的规则来(帮助作者)改进图形设计,并解释一些常见的陷阱。
原文:Rougier NP, Droettboom M, Bourne PE (2014) Ten Simple Rules for Better Figures. PLoS Comput Biol 10(9): e1003833. https://doi.org/10.1371/journal.pcbi.1003833
指作者Rougier NP, Droettboom M, Bourne PE
排版说明:为了适应屏幕阅读,将原文每一段按句子分成几段(图注除外)
信息胜过美丽
Scientific visualization is classically defined as the process of graphically displaying scientific data.
However, this process is far from direct or automatic.
There are so many different ways to represent the same data: scatter plots, linear plots, bar plots, and pie charts, to name just a few.
Furthermore, the same data, using the same type of plot, may be perceived very differently depending on who is looking at the figure.
A more accurate definition for scientific visualization would be a graphical interface between people and data.
In this short article, we do not pretend to explain everything about this interface; rather, see [1], [2] for introductory work.
Instead we aim to provide a basic set of rules to improve figure design and to explain some of the common pitfalls.Rule 1: Know Your Audience
了解你的读者
Figure 1. Know your audience.
This is a remake of a figure that was originally published in the New York Times (NYT) in 2007. This new figure was made with matplotlib using approximated data. The data is made of four series (men deaths/cases, women deaths/cases) that could have been displayed using classical double column (deaths/cases) bar plots. However, the layout used here is better for the intended audience. It exploits the fact that the number of new cases is always greater than the corresponding number of deaths to mix the two values. It also takes advantage of the reading direction (English [left-to-right] for NYT) in order to ease comparison between men and women while the central labels give an immediate access to the main message of the figure (cancer). This is a self-contained figure that delivers a clear message on cancer deaths. However, it is not precise. The chosen layout makes it actually difficult to estimate the number of kidney cancer deaths because of its bottom position and the location of the labelled ticks at the top. While this is acceptable for a general-audience publication, it would not be acceptable in a scientific publication if actual numerical values were not given elsewhere in the article.
Rule 2: Identify Your Message
识别你的信息
Figure 2. Identify your message.
The superior colliculus (SC) is a brainstem structure at the crossroads of multiple functional pathways. Several neurophysiological studies suggest that the population of active neurons in the SC encodes the location of a visual target that induces saccadic eye movement. The projection from the retina surface (on the left) to the collicular surface (on the right) is based on a standard and quantitative model in which a logarithmic mapping function ensures the projection from retinal coordinates to collicular coordinates. This logarithmic mapping plays a major role in saccade decision. To better illustrate this role, an artificial checkerboard pattern has been used, even though such a pattern is not used during experiments. This checkerboard pattern clearly demonstrates the extreme magnification of the foveal region, which is the main message of the figure.
Rule 3: Adapt the Figure to the Support Medium
将图形调整至适应媒介
Figure 3. Adapt the figure to the support medium.
These two figures represent the same simulation of the trajectories of a dual-particle system 数学公式省略) where each particle interacts with the other. Depending on the initial conditions, the system may end up in three different states. The left figure has been prepared for a journal article where the reader is free to look at every detail. The red color has been used consistently to indicate both initial conditions (red dots in the zoomed panel) and trajectories (red lines). Line transparency has been increased in order to highlight regions where trajectories overlap (high color density). The right figure has been prepared for an oral presentation. Many details have been removed (reduced number of trajectories, no overlapping trajectories, reduced number of ticks, bigger axis and tick labels, no title, thicker lines) because the time-limited display of this figure would not allow for the audience to scrutinize every detail. Furthermore, since the figure will be described during the oral presentation, some parts have been modified to make them easier to reference (e.g., the yellow box, the red dashed line).
Rule 4: Captions Are Not Optional
必须使用图注
Rule 5: Do Not Trust the Defaults
不要相信缺省设置
Figure 4. Do not trust the defaults.
The left panel shows the sine and cosine functions as rendered by matplotlib using default settings. While this figure is clear enough, it can be visually improved by tweaking the various available settings, as shown on the right panel.
Rule 6: Use Color Effectively
有效利用颜色
Sequential: one variation of a unique color, used for quantitative data varying from low to high.
Diverging: variation from one color to another, used to highlight deviation from a median value.
Qualitative: rapid variation of colors, used mainly for discrete or categorical data.
Lastly, avoid using too many similar colors since color blindness may make it difficult to discern some color differences (see [6] for detailed explanation).
Figure 5. Use color effectively.
This figure represents the same signal, whose frequency increases to the right and intensity increases towards the bottom, using three different colormaps. The rainbow colormap (qualitative) and the seismic colormap (diverging) are equally bad for such a signal because they tend to hide details in the high frequency domain (bottom-right part). Using a sequential colormap such as the purple one, it is easier to see details in the high frequency domain. Adapted from [5].
Rule 7: Do Not Mislead the Reader
不要误导读者
Lastly, do not hesitate to ask colleagues about their interpretation of your figures.
Figure 6. Do not mislead the reader.
On the left part of the figure, we represented a series of four values: 30, 20, 15, 10. On the upper left part, we used the disc area to represent the value, while in the bottom part we used the disc radius. Results are visually very different. In the latter case (red circles), the last value (10) appears very small compared to the first one (30), while the ratio between the two values is only 3∶1. This situation is actually very frequent in the literature because the command (or interface) used to produce circles or scatter plots (with varying point sizes) offers to use the radius as default to specify the disc size. It thus appears logical to use the value for the radius, but this is misleading. On the right part of the figure, we display a series of ten values using the full range for values on the top part (y axis goes from 0 to 100) or a partial range in the bottom part (y axis goes from 80 to 100), and we explicitly did not label the y-axis to enhance the confusion. The visual perception of the two series is totally different. In the top part (black series), we tend to interpret the values as very similar, while in the bottom part, we tend to believe there are significant differences. Even if we had used labels to indicate the actual range, the effect would persist because the bars are the most salient information on these figures.
Rule 8: Avoid “Chartjunk”
避免图表中有无用元素
Figure 7. Avoid chartjunk.
We have seven series of samples that are equally important, and we would like to show them all in order to visually compare them (exact signal values are supposed to be given elsewhere). The left figure demonstrates what is certainly one of the worst possible designs. All the curves cover each other and the different colors (that have been badly and automatically chosen by the software) do not help to distinguish them. The legend box overlaps part of the graphic, making it impossible to check if there is any interesting information in this area. There are far too many ticks: x labels overlap each other, making them unreadable, and the three-digit precision does not seem to carry any significant information. Finally, the grid does not help because (among other criticisms) it is not aligned with the signal, which can be considered discrete given the small number of sample points. The right figure adopts a radically different layout while using the same area on the sheet of paper. Series have been split into seven plots, each of them showing one series, while other series are drawn very lightly behind the main one. Series labels have been put on the left of each plot, avoiding the use of colors and a legend box. The number of x ticks has been reduced to three, and a thin line indicates these three values for all plots. Finally, y ticks have been completely removed and the height of the gray background boxes indicate the [−1,+1] range (this should also be indicated in the figure caption if it were to be used in an article).
Rule 9: Message Trumps Beauty
信息胜过美丽
Remember, in science, message and readability of the figure is the most important aspect while beauty is only an option, as dramatically shown in Figure 8.
Figure 8. Message trumps beauty.
This figure is an extreme case where the message is particularly clear even if the aesthetic of the figure is questionable. The uncanny valley is a well-known hypothesis in the field of robotics that correlates our comfort level with the human-likeness of a robot. To express this hypothetical nature, hypothetical data were used (数学公式省略) and the figure was given a sketched look (xkcd filter on matplotlib) associated with a cartoonish font that enhances the overall effect. Tick labels were also removed since only the overall shape of the curve matters. Using a sketch style conveys to the viewer that the data is approximate, and that it is the higher-level concepts rather than low-level details that are important [10].
Rule 10: Get the Right Tool
使用正确工具
Matplotlib is a python plotting library, primarily for 2-D plotting, but with some 3-D support, which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. It comes with a huge gallery of examples that cover virtually all scientific domains (http://matplotlib.org/gallery.html).
R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly extensible.
Inkscape is a professional vector graphics editor. It allows you to design complex figures and can be used, for example, to improve a script-generated figure or to read a PDF file in order to extract figures and transform them any way you like.
TikZ and PGF are TeX packages for creating graphics programmatically. TikZ is built on top of PGF and allows you to create sophisticated graphics in a rather intuitive and easy manner, as shown by the Tikz gallery (http://www.texample.net/tikz/examples/all/).
GIMP is the GNU Image Manipulation Program. It is an application for such tasks as photo retouching, image composition, and image authoring. If you need to quickly retouch an image or add some legends or labels, GIMP is the perfect tool.
ImageMagick is a software suite to create, edit, compose, or convert bitmap images from the command line. It can be used to quickly convert an image into another format, and the huge script gallery (http://www.fmwconcepts.com/imagemagick/index.php) by Fred Weinhaus will provide virtually any effect you might want to achieve.
D3.js (or just D3 for Data-Driven Documents) is a JavaScript library that offers an easy way to create and control interactive data-based graphical forms which run in web browsers, as shown in the gallery at http://github.com/mbostock/d3/wiki/Gallery.
Cytoscape is a software platform for visualizing complex networks and integrating these with any type of attribute data. If your data or results are very complex, cytoscape may help you alleviate this complexity.
Circos was originally designed for visualizing genomic data but can create figures from data in any field. Circos is useful if you have data that describes relationships or multilayered annotations of one or more scales.
Notes
All the figures for this article were produced using matplotlib, and figure scripts are available from https://github.com/rougier/ten-rules.
References
1. Tufte EG (1983) The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press. 2. Doumont JL (2009) Trees, maps, and theorems. Brussels: Principiae. 3. Kosara R, Mackinlay J (2013) Storytelling: The next step for visualization. IEEE Comput 46: 44–50.
了解写作与学位基本知识
请长按下方二维码关注我们,或回到文章顶部后点击蓝色字体学位论文写作