How to…
Various rules for how (not) to behave
Kathy Yelick

Derived from:
• How to Give a Bad Talk: The Ten Commandments by David A. Patterson (slides by Rolf Riedi)
• Twelve Ways to Fool the Masses: Scientific Malpractice in High-Performance Computing by David Bailey

1. Thou shalt not waste space
Poster board is expensive. Your ideas are priceless.
My Space-Efficient Poster:
• Make sure to cover all white space:
  ◊ no borders or other separation between topics.
• Minimize line spacing.
• Use the smallest font legible from 1 foot away.

2. Thou shalt not be neat
Why vaste research time on prepare poster?
Ignore spell’g, grammer and legibilite.

2. Thou shalt not be neat
Why waste research time on preparing slides?
Ignore spelling, grammar and legibility.
Who cares what 30 people think?

3. Thou shalt not covet brevity
Do you want to promote the stereotype that computer scientists can't write? Always use complete sentences, never just key words. If possible, use whole paragraphs to make sure your visitors will have to stand by your poster for a long time just to read the text.

3. Thou shalt not covet brevity
• Use key words.
• Don’t plan to read your poster.

4. Omit needless background
• Assume you will always be present.
• No need for the poster to tell its own story.
• Don’t bother to label graphs – your memory is fine.
• Use inside lingo (e.g., Bassi, PSI, my laptop).

4. Omit needless background
• Write the poster to be reused without you.
• Critical information should be there:
  ◊ Label all axes on graphs (“Mflop/sec” not “speed”).
  ◊ Use globally understood terminology (e.g., “IBM Power5 with Federation switch” or “Pentium 4 with Gigabit Ethernet”).

5.
Covet Content over Structure
• Just get the facts on the poster; don’t worry about placement.
• Humans can be trained to read right-to-left, bottom-to-top, or in any other order.
  ◊ Experience with foreign languages proves this.
[Example poster layout, panels in scrambled order: “What we would do with more time”, “Results on 8 processors”, “Why this problem is important”, “Outline of our planned solutions”]

6. Thou shalt not use color
Flagrant use of color indicates uncareful research. It's also unfair to emphasize some words over others.
Aside: Using color doesn’t mean you need a fancy plotter.

7. Thou shalt not illustrate
• Confucius says
  ◊ “A picture is worth 1000 words,”
• but Dijkstra says
  ◊ “Pictures are for weak minds.”
• Who are you going to believe? Wisdom from the ages or the person who first counted goto's?

8. Let the Poster Speak for Itself
• Do not stand near your poster.
• Do not think about what you’re going to say to visitors.
• If you worked in a team, let your partner do all the talking.

9. Reuse, Recycle, Reclaim
Once the paper is written, you can just glue the pages to the poster board, right?

10. Do Not Plan Ahead
Why waste research time thinking about the poster?
  ◊ It could take hours out of your several weeks of project work.
  ◊ How can you appear spontaneous if you plan ahead?
  ◊ Don’t worry about presentation when you’re collecting results.
  ◊ Don’t get any feedback on your results.
Commandment X is most important. Even if you break the other nine, this one can save you.

Twelve Ways to Fool the Masses: Scientific Malpractice in High-Performance Computing
David H. Bailey
Lawrence Berkeley National Laboratory
http://crd.lbl.gov/~dhbailey

Lessons From History
• High standards of honesty and scientific rigor must be vigilantly enforced within a field.
• Rigorous peer review is essential.
• Scientific research must be based on solid empirical data and careful, objective analysis of that data.
• Scientists must be willing to provide all details of the experimental environment, so others can reproduce their results.
• A “politically correct” conclusion is no excuse for poor scholarship.
• Erudite-sounding technical terminology and fancy mathematical formulas are no substitutes for sound reasoning.
• Hype has no place in the scientific enterprise.
“There is a real world; its properties are not social constructs; facts and evidence do matter.” – Sokal

History of Parallel Computing
• 1976-1986: Initial research studies and demos.
• 1986-1990: First large-scale systems deployed.
• 1990-1994: Successes over-hyped; faults ignored. Shoddy measurement methods used. Questionable performance claims made.
• 1994-1998: Numerous firms fail; agencies cut funds.
• 1998-2002: Reassessment.
• 2002-2006: Recovering? Or slipping again into hype?

Parallel System Performance Practices, circa 1990
Performance results on small-sized parallel systems were linearly scaled to full-sized systems.
  ◊ Example: 8,192-CPU results were linearly scaled to 65,536-CPU results, simply by multiplying by 8.
  ◊ Rationale: “We can’t afford a full-sized system.”
  ◊ Sometimes this was done without any clear disclosure in the paper or presentation.

Parallel System Performance Practices, circa 1990
Highly tuned programs were compared with untuned implementations on other systems.
  ◊ In comparisons with vector systems, often little or no effort was made to tune the vector code.
  ◊ This was the case even for comparisons with SIMD parallel systems – here the SIMD code can be directly converted to efficient vector code.

Parallel System Performance Practices, circa 1990
Inefficient algorithms were employed, requiring many more operations, in order to exhibit an artificially high Mflop/s rate.
  ◊ Some scientists employed explicit PDE schemes for applications where implicit schemes were known to be much better.
  ◊ One paper described doing a discrete Fourier transform by direct computation, rather than by using an FFT (8n^2 operations rather than 5n log2(n)).
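
The two practices above can be put into numbers. The following is only a rough sketch: it takes the operation counts quoted on the slide (8n^2 for the direct transform, 5n log2(n) for the FFT) and the 8,192-to-65,536 CPU scaling at face value. The function names, the transform length and the run time are hypothetical values chosen for illustration, not figures from any of the papers discussed.

```python
import math


def direct_dft_ops(n: int) -> float:
    """Operation count for a direct DFT, using the 8*n^2 figure quoted above."""
    return 8.0 * n * n


def fft_ops(n: int) -> float:
    """Operation count for an FFT, using the 5*n*log2(n) figure quoted above."""
    return 5.0 * n * math.log2(n)


def mflops(op_count: float, seconds: float) -> float:
    """Rate in Mflop/s for a given operation count and run time."""
    return op_count / seconds / 1e6


def linear_projection(rate: float, measured_cpus: int, full_cpus: int) -> float:
    """The circa-1990 practice: scale a measured rate by the CPU ratio."""
    return rate * (full_cpus / measured_cpus)


n = 1024          # hypothetical transform length
seconds = 0.01    # hypothetical run time of the direct method

# Rate credited for the wasted work versus the rate the same run time
# would earn if operations were counted for the best known algorithm.
inflated = mflops(direct_dft_ops(n), seconds)
honest = mflops(fft_ops(n), seconds)
inflation_factor = direct_dft_ops(n) / fft_ops(n)

# Multiplier applied when 8,192-CPU results are "scaled" to 65,536 CPUs.
projection_factor = linear_projection(1.0, 8192, 65536)

print(f"quoted rate: {inflated:.1f} Mflop/s, honest rate: {honest:.1f} Mflop/s")
print(f"operation-count inflation at n={n}: {inflation_factor:.2f}x")
print(f"linear projection multiplier: {projection_factor:.0f}x")
```

Under these assumptions the two effects multiply: an inflated operation count and an unmeasured projection can each raise the quoted rate without any faster time to solution.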
Parallel System Performance Practices, circa 1990
Performance rates on 32-bit floating-point data on one system were compared with rates on 64-bit data on other systems.
  ◊ Using 32-bit data instead of 64-bit data effectively doubles data bandwidth, thus yielding artificially high performance rates.
  ◊ Some computations can be done safely with 32-bit floating-point arithmetic, but most cannot.
  ◊ Even 64-bit floating-point arithmetic is not enough for some scientific applications – 128-bit is required.

Parallel System Performance Practices, circa 1990
In some cases, performance experiments reported in published results were not actually performed.
  ◊ Abstract of published paper: “The current Connection Machine implementation runs at 300-800 Mflop/s on a full [64K] CM-2, or at the speed of a single processor of a Cray-2 on 1/4 of a CM-2.”
  ◊ Buried in text: “This computation requires 568 iterations (taking 272 seconds) on a 16K Connection Machine.”
    I.e., the computation was not run on a full 64K CM-2.
  ◊ “In contrast, a Convex C210 requires 909 seconds to compute this example. Experience indicates that for a wide range of problems, a C210 is about 1/4 the speed of a single processor Cray-2, …”
    I.e., the computation was not run on a Cray-2 at all – it was run on a Convex system, and a very dubious conversion factor was used.

Parallel System Performance Practices, circa 1990
Scientists were just as guilty as commercial vendors of questionable performance claims.
  ◊ The examples in my files were written by professional scientists and published in peer-reviewed journals and conference proceedings.
  ◊ One example is from an award-winning paper.
  ◊ Scientists in some cases accepted free computer time or research funds from vendors, but did not disclose this fact in their papers.
Scientists should be held to a higher standard than vendor marketing personnel.
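
The arithmetic behind the Connection Machine abstract quoted earlier can be reconstructed from the numbers buried in the paper's text. This is a sketch of one plausible reading, not the paper's own calculation: the variable names are hypothetical, and the only inputs are the three figures quoted above (272 seconds on a 16K CM-2, 909 seconds on a Convex C210, and the "about 1/4" rule of thumb).

```python
# Figures quoted from the paper's text.
cm2_16k_seconds = 272.0          # measured on a 16K Connection Machine
convex_c210_seconds = 909.0      # measured on a Convex C210
c210_fraction_of_cray2 = 0.25    # rule of thumb: C210 is about 1/4 of a Cray-2 CPU

# Estimated (never measured) run time on a single Cray-2 processor.
estimated_cray2_seconds = convex_c210_seconds * c210_fraction_of_cray2

# A 16K machine is this fraction of a full 64K CM-2.
cm2_fraction = 16 / 64

print(f"estimated Cray-2 time: {estimated_cray2_seconds:.2f} s (not measured)")
print(f"measured 16K CM-2 time: {cm2_16k_seconds:.0f} s")
print(f"fraction of a full CM-2 actually used: {cm2_fraction}")
```

Read this way, the claim "the speed of a single processor of a Cray-2 on 1/4 of a CM-2" compares one measured time (272 s) with one estimated time (about 227 s), and neither of the two machines named in the abstract was used for a measurement.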
Performance Plot A

Data for Plot A
Total objects   Parallel system run time   Vector system run time
20              8:18                       0:16
40              9:11                       0:26
80              11:59                      0:57
160             15:07                      2:11
990             21:32                      19:00
9600            31:36                      3:11:50*
Notes:
• In the last entry, the 3:11:50 figure (marked *) is an estimate.
• The vector system code is “not optimized.”
• The vector system performance is better except for the last (estimated) entry.

Performance Plot B

Facts for Plot B
• 32-bit performance rates on a parallel system are compared with 64-bit performance on a vector system.
• Parallel system results are linearly extrapolated to a full-sized system from a small system (only 1/8 size).
• The vector version of code is “unvectorized.”
• The vector system “curves” are straight lines – i.e., they are linear extrapolations from a single data point.
Summary: It appears that of all points on four curves in this plot, at most four points represent real timings.

Twelve Ways to Fool the Masses
1. Quote only 32-bit performance results, not 64-bit results.
2. Present performance figures for an inner kernel, and then represent these figures as the performance of the entire application.
3. Quietly employ assembly code and other low-level language constructs.
4. Scale up the problem size with the number of processors, but omit any mention of this fact.
5. Quote performance results projected to a full system.
6. Compare your results against scalar, unoptimized code on conventional systems.
7. When direct run time comparisons are required, compare with an old code on an obsolete system.
8. If Mflop/s rates must be quoted, base the operation count on the parallel implementation, not on the best sequential implementation.
9. Quote performance in terms of processor utilization, parallel speedups or Mflop/s per dollar.
10. Mutilate the algorithm used in the parallel implementation to match the architecture.
11. Measure parallel run times on a dedicated system, but measure conventional run times in a busy environment.
12. If all else fails, show pretty pictures and animated videos, and don't talk about performance.

Twelve Ways: Basic Principles
• Use well-understood, community-defined metrics.
• Base performance rates on operation counts derived from the best practical serial algorithms, not on schemes chosen just to exhibit artificially high Mflop/s rates on a particular system.
• Use comparable levels of tuning.
• Provide full details of experimental environment, so that performance results can be reproduced by others.
• Disclose any details that might affect an objective interpretation of the results.
• Honesty and reproducibility should characterize all work.
Danger: We can fool ourselves, as well as others.

New York Times, 22 Sept 1991

Excerpts from NYT Article
• “Rival supercomputer and work station manufacturers are prone to hype, choosing the performance figures that make their own systems look better.”
• “It’s not really to the point of widespread fraud, but if people aren’t somewhat more circumspect, it could give the field a bad name.”

Fast Forward to 2007: Five New Ways to Fool the Masses
• Dozens of runs are made, but only the best performance figure is cited in the paper.
• Runs are made on part of an otherwise idle system, but this is not disclosed in the paper.
• Performance rates are cited for a run with only one CPU active per node.
• Special hardware, operating system or compiler settings are used that are not appropriate for real-world usage.
• “Scalability” is defined as a successful execution on a large number of CPUs, regardless of performance.

Extra Slides

Example from Physics: Measurements of Speed of Light
Why the discrepancy between pre-1945 and post-1945 values?
Probably due to biases and sloppy experimental methods.

Example from Psychology: The “Blank Slate”
• The “blank slate” paradigm (1920-1990):
  ◊ The human mind at birth is a “blank slate.”
  ◊ Heredity and biology play no significant role in human personality – all behavioral traits are socially constructed.
• Current consensus, based on latest research:
  ◊ Humans at birth possess sophisticated facilities for language acquisition, pattern recognition and social life.
  ◊ Heredity, evolution and biology are major factors of personality development.
• How did these scientists get it so wrong?
  ◊ Sloppy experimental methodology and analysis.
  ◊ Pervasive biases and wishful thinking.
Ref: Steven Pinker, The Blank Slate: The Modern Denial of Human Nature

Example from Anthropology: The “Noble Savage”
• Anthropologists, beginning with Margaret Mead in the 1930s, taught that primitive societies (such as those of the South Sea Islands) were idyllic:
  ◊ No violence, jealousy or warfare.
  ◊ Happy, uninhibited – no psychological problems or “hangups.”
• Beginning in the 1980s, a new breed of anthropologists began to reexamine these findings. They concluded:
  ◊ Most of these societies have murder rates several times higher than large U.S. cities.
  ◊ Death rates from inter-tribe warfare exceed those of Western societies by factors of 10 to 100.
  ◊ Complex, jealous taboos surround courtship and marriage, often justifying the killing of non-virgin brides or suspected adulterers.
• Why were the earlier studies so wrong? Answer: “Anthropological malpractice” – Pinker

Postmodern Science Studies
These scholars study the social and political factors involved in scientific discoveries.
Some of these studies are interesting and useful, but others are highly questionable:
  ◊ Denials that science progresses towards fundamental truth.
  ◊ Claims that scientific theories are “socially constructed.”
  ◊ Politically charged rhetoric.
  ◊ Gratuitous use of erudite-sounding technical jargon.
  ◊ Significant misunderstandings of the mathematical and scientific topics being addressed.
  ◊ Application of arcane theories of math and physics to inappropriate arenas.
  ◊ Reluctance to submit scholarship to rigorous outside review.
Ref: Fashionable Nonsense by Alan Sokal and Jean Bricmont

The Sokal Hoax
In 1996, Alan Sokal, a physicist at NYU, wrote a spoof of a postmodern science article, entitled “Transgressing the Boundaries: Toward a Transformative Hermeneutics of Quantum Gravity”:
  ◊ Page after page of erudite-sounding nonsense.
  ◊ Numerous references to arcane scientific theories, including quantum mechanics, relativity, chaos theory, mathematical set theory, etc.
  ◊ Frequent, approving quotes from leading postmodern science scholars.
  ◊ Politically charged rhetoric.
  ◊ Deliberately written so that “any mathematician or physicist would realize that it was a spoof.”
In spite of these flaws, the article was accepted for publication in Social Text, a leading postmodern journal. It appeared in a special issue devoted to defending the science studies field against its detractors.

Excerpts from Sokal’s Article
Rather, [scientists] cling to the dogma … that there exists an external world, whose properties are independent of any individual human being and indeed of humanity as a whole; that these properties are encoded in “eternal” physical laws; and that human beings can obtain reliable, albeit imperfect and tentative, knowledge of these laws by hewing to the “objective” procedures and epistemological strictures prescribed by the (so-called) scientific method. [pg 217]
Note: Sokal is deriding even the most basic notions of scientific reality and common sense.

In this way the infinite-dimensional invariance group erodes the distinction between the observer and observed; the π of Euclid and the G of Newton, formerly thought to be constant and universal, are now perceived in their ineluctable historicity; and the putative observer becomes fatally de-centered, disconnected from any epistemic link to a space-time point that can no longer be defined by geometry alone. [pg 222]
Note: In addition to gratuitous usage of technical jargon, Sokal is saying that π and G are not constants!
Excerpts from Other (Serious) Articles in the Same Issue as Sokal’s Article
Most theoretical physicists, for example, sincerely believe that however partial our collective knowledge may be, ... one day scientists shall find the necessary correlation between wave and particle; the unified field theory of matter and energy will transcend Heisenberg’s uncertainty principle. [Aronowitz, pg 181]
Note: A “unified field theory” will not do away with wave-particle duality and Heisenberg’s uncertainty principle – these are inherent in quantum theory.

[P]assionate partisans of wave and matrix mechanics explanations for the behavior of electrons were unable to reach agreement for decades. [Aronowitz, pg 195]
Note: Even Aronowitz’s history is wrong – wave and matrix formulations of quantum mechanics were reconciled within weeks.

Once it is acknowledged that the West does not have a monopoly on all the good scientific ideas in the world, or that reason, divorced from value, is not everywhere and always a productive human principle, then we should expect to see some self-modification of the universalist claims maintained on behalf of empirical rationality. Only then can we begin to talk about different ways of doing science, ways that downgrade methodology, experiment, and manufacturing in favor of local environments, cultural values, and principles of social justice. [Ross, pg 3-4]
Note: Ross is advocating an extreme cultural relativism for science, discarding much of our rational, empirical methodology.

2005: A Sokal-Like Hoax in Computer Science
In early 2005, some MIT graduate students submitted two papers to the 9th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI).
  ◊ “Rooter: A Methodology for the Typical Unification of Access Points and Redundancy”
  ◊ “The Influence of Probabilistic Methodologies on Networking”
These papers were generated entirely by a computer program. They had reasonable sentence structure, but were otherwise simply a concatenation of computer science buzzwords, nonsensical charts and graphs, and nonexistent references.
The first was accepted as a “non-reviewed” submission; the second was rejected, but without referee reports or other explanation.
In neither case did either the referees or the Program Committee note that these papers were utter gibberish.

Abstracts of the Two Papers
Abstract of Paper #1: Many physicists would agree that, had it not been for congestion control, the evaluation of web browsers might never have occurred. In fact, few hackers worldwide would disagree with the essential unification of voice-over-IP and public-private key pair. In order to solve this riddle, we confirm that SMPs can be made stochastic, cacheable, and interposable.
Abstract of Paper #2: In recent years, much research has been devoted to the exploration of von Neumann machines; however, few have deployed the study of simulated annealing. In fact, few security experts would disagree with the investigation of online algorithms. STEEVE, our new system for game-theoretic modalities, is the solution to all of these challenges.

Recent Example #1
In 2003 a prominent computer vendor (which is also involved in the HPC world) submitted results on the SPEC benchmark:
  ◊ Used a special command to enable “memory read bypass,” which eliminates the need to wait for the snoop response required in a multiprocessor configuration.
  ◊ Used a special command to enable a maximum of eight hardware pre-fetch streams and disable software-based prefetching.
  ◊ Installed a special high-performance, single-threaded malloc library, geared for speed rather than memory efficiency.
These settings are not appropriate for normal production usage, and thus the resulting performance figures are unrealistic.

Recent Example #2
Recently a certain HPC vendor claimed, in a press release:
  ◊ Discovery of a “proof” of Amdahl’s law.
  ◊ “New” technology that is “provably optimal” by Amdahl’s law.
Several people in the HPC community responded, some rather sharply, to these claims. The vendor has responded also.
Lessons:
  ◊ Even if a firm or scientist has some good ideas, hype does not help their cause, and may endanger the community’s credibility.
  ◊ Peer-reviewed publications should accompany press announcements.
“Extraordinary claims require extraordinary evidence.” – Carl Sagan

Grid Computing Projects
[Figure labels: GRID, GEON, NSF Cyberinfrastructure, SETI@Home, Supernova Cosmology Infrastructure]
SETI@home sustains 35 Tflop/s on 2M+ systems: 1.7 × 10^21 flops over 3 years.
[Thanks to W. Johnston, LBNL]

What the Grid Does Well
• Providing national or international access to important scientific datasets.
• Providing a uniform scheme for remote system access and user authentication.
• Providing a high-performance parallel platform for certain very loosely coupled computations.
• Providing a high-capability platform for large computations that can run on a single remote system, chosen at run time.
• Enabling new types of multi-disciplinary, multi-system, multi-dataset research.

What the Grid Doesn’t Do So Well
• Scientific computations that require heavy interprocessor communication.
  ◊ Probably the majority of high-end scientific computations are of this nature.
  ◊ This doesn’t rule out such applications running remotely on a single system connected to the grid.
• Many classified or proprietary computations.
  ◊ Current grid security and privacy are not convincing for many of these users.
  ◊ This doesn’t rule out “internal grids” -- some have been quite successful.
The Role of Good Benchmarks in Combating Performance Abuse
• Well-designed, rigorous, scalable performance benchmark tests help bring order to the field.
• Well-thought-out and well-enforced “ground rules” are essential.
• A rational scheme must be provided for calculating performance rates.
• A well-defined test must be included to validate the correctness of the results.
• A repository of results must be maintained.
• Recent example: The HPCS benchmark suite.

Lessons from History: Back to the Future
• High standards of honesty and scientific rigor must be vigilantly enforced within the HPC field.
• Rigorous peer review is essential.
• Performance claims must be based on solid benchmark data and open, objective analysis of that data.
• Well-constructed, community-defined benchmarks are essential to combat performance abuse.
• Researchers must be willing to provide all details of the experimental environment, so others can reproduce their results.
• A “politically correct” conclusion is no excuse for poor scholarship.
• Hype has no place in the scientific enterprise.
Danger: We can fool ourselves, as well as others.
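
Two of the principles above (report every run, not only the best one, and base rates on the best serial operation count) are easy to apply when writing up timings. The following is a minimal sketch under stated assumptions: the helper names `summarize_runs` and `rate_mflops` are hypothetical, and the timings and operation count are made-up illustration values, not measurements from any system.

```python
import statistics


def summarize_runs(run_times_s):
    """Summarize every timed run, not only the fastest one."""
    if not run_times_s:
        raise ValueError("need at least one timed run")
    return {
        "runs": len(run_times_s),
        "best_s": min(run_times_s),
        "median_s": statistics.median(run_times_s),
        "worst_s": max(run_times_s),
    }


def rate_mflops(serial_op_count, seconds):
    """Mflop/s based on the operation count of the best serial algorithm."""
    return serial_op_count / seconds / 1e6


# Hypothetical timings (seconds) from three runs of the same job.
times = [2.0, 2.5, 4.0]
summary = summarize_runs(times)

# Hypothetical operation count for the best practical serial algorithm.
serial_ops = 5.0e6

print(summary)
print(f"median rate: {rate_mflops(serial_ops, summary['median_s']):.1f} Mflop/s")
print(f"best rate:   {rate_mflops(serial_ops, summary['best_s']):.1f} Mflop/s")
```

Printing the best and median figures side by side, together with the number of runs, lets a reader judge how representative the headline number is.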