Home About us Products Services Contact us Bookmark
:: wikimiki.org ::
Directed Acyclic Graph

Directed acyclic graph

In mathematics, a directed acyclic graph, also called a dag or DAG, is a directed graph with no directed cycles; that is, for any vertex v, there is no directed path starting and ending on v. DAGs appear in models where it doesn't make sense for a vertex to have a path to itself; for example, if an edge uv indicates that v is a part of u, such a path would indicate that u is a part of itself, which is impossible.

Terminology

A source is a vertex with no incoming edges, while a sink is a vertex with no outgoing edges. A finite dag has at least one source and at least one sink. The length of a finite dag is the length (number of edges) of a longest directed path.

Properties

Every directed acyclic graph has a topological sort, an ordering of the vertices such that each vertex comes before all vertices it has edges to. In general, this ordering is not unique. DAGs can be considered to be a generalization of trees in which certain subtrees can be shared by different parts of the tree. In a tree with many identical subtrees, this can lead to a drastic decrease in space requirements to store the structure. Conversely, a DAG can be expanded to a forest of rooted trees using this simple algorithm:
- While there is a vertex v with in-degree n > 1,
  - Make n copies of v, each with the same outgoing edges but no incoming edges.
  - Attach one of the incoming edges of v to each vertex.
  - Delete v. If we explore the graph without modifying it or comparing nodes for equality, this forest will appear identical to the original DAG. Some algorithms become simpler when used on DAGs instead of general graphs. For example, search algorithms like depth-first search without iterative deepening normally must mark vertices they have already visited and not visit them again. If they fail to do this, they may never terminate because they follow a cycle of edges forever. Such cycles do not exist in DAGs.

Applications

Directed acyclic graphs have many important applications in computer science, including:
- The parse tree constructed by a compiler
- Bayesian networks
- A reference graph that can be garbage collected using simple reference counting
- The reference graph of a purely functional data structure (although some languages allow purely functional cyclic structures)
- Dependency graphs such as those used in instruction scheduling and makefiles
- Serializability Theory of Transaction Processing Systems
- Information categorisation systems, such as the Wikipedia Categories. Category:Graphs

Mathematics

Mathematics is often defined as the study of topics such as quantity, structure, space, and change. Another view, held by many mathematicians, is that mathematics is the body of knowledge justified by deductive reasoning, starting from axioms and definitions. Practical mathematics, in nearly every society, is used for such purposes as accounting, measuring land, or predicting astronomical events. Mathematical discovery or research often involves discovering and cataloging patterns, without regard for application. The remarkable fact that the "purest" mathematics often turns out to have practical applications is what Eugene Wigner has called "the unreasonable effectiveness of mathematics." Today, the natural sciences, engineering, economics, and medicine depend heavily on new mathematical discoveries. The word "mathematics" comes from the Greek μάθημα (máthema) meaning "science, knowledge, or learning" and μαθηματικός (mathematikós) meaning "fond of learning". It is often abbreviated maths in Commonwealth English and math in North American English.

History

:Main article: History of mathematics The evolution of mathematics might be seen to be an ever-increasing series of abstractions, or alternatively an expansion of subject matter. The first abstraction was probably that of numbers. The realization that two apples and two oranges do have something in common, namely that they fill the hands of exactly one person, was a breakthrough in human thought. In addition to recognizing how to count concrete objects, prehistoric peoples also recognized how to count abstract quantities, like time -- days, seasons, years. Arithmetic (e.g. addition, subtraction, multiplication and division), naturally followed. Monolithic monuments testify to a knowledge of geometry. Further steps need writing or some other system for recording numbers such as tallies or the knotted strings called khipu used by the Inca empire to store numerical data. Numeral systems have been many and diverse. Historically, the major disciplines within mathematics arose, from the start of recorded history, out of the need to do calculations on taxation and commerce, to understand the relationships among numbers, to measure land, and to predict astronomical events. These needs can be roughly related to the broad subdivision of mathematics, into the studies of quantity, structure, space, and change. Mathematics since has been much extended, and there has been a fruitful interaction between mathematics and science, to the benefit of both. Mathematical discoveries have been made throughout history and continue to be made today.

Inspiration, pure and applied mathematics, and aesthetics

Mathematics arises wherever there are difficult problems that involve quantity, structure, space, or change. At first these were found in commerce, land measurement and later astronomy; nowadays, all sciences suggest problems studied by mathematicians, and many problems arise within mathematics itself. Newton invented infinitesimal calculus and Feynman his Feynman path integral using a combination of reasoning and physical insight, and today's string theory also inspires new mathematics. Some mathematics is only relevant in the area that inspired it, and is applied to solve further problems in that area. But often mathematics inspired by one area proves useful in many areas, and joins the general stock of mathematical concepts. As in most areas of study, the explosion of knowledge in the scientific age has led to specialization in mathematics. One major distinction is between pure mathematics and applied mathematics. Within applied mathematics, two major areas have split off and become disciplines in their own right, statistics and computer science. Many mathematicians talk about the elegance of mathematics, its intrinsic aesthetics and inner beauty. Simplicity and generality are valued. There is beauty also in a clever proof, such as Euclid's proof that there are infinitely many prime numbers, and in a numerical method that speeds calculation, such as the fast Fourier transform. G. H. Hardy in "A Mathematicians Apology" expressed the belief that these esthetic considerations are, in themselves, sufficient to justify the study of pure mathematics. Main article: Mathematical beauty.

Notation, language, and rigor

Mathematical writing is not easily accessible to the layperson. A Brief History of Time, Stephen Hawking's 1988 bestseller, contained a single mathematical equation. This was the author's compromise with the publisher's advice, that each equation would halve the sales. The reasons for the inaccessibility even of carefully-expressed mathematics can be partially explained. Contemporary mathematicians strive to be as clear as possible in the things they say and especially in the things they write (this they have in common with lawyers). They refer to rigor. To accomplish rigor, mathematicians have extended natural language. There is precisely-defined vocabulary for referring to mathematical objects, and stating certain common relations. There is an accompanying mathematical notation which, like musical notation, has a definite content and also has a strict grammar (under the influence of computer science, more often now called syntax). Some of the terms used in mathematics are also common outside mathematics, such as ring, group and category; but are not such that one can infer the meanings. Some are specific to mathematics, such as homotopy and Hilbert space. It was said that Henri Poincaré was only elected to the Académie Française so that he could tell them how to define automorphe in their dictionary. Rigor is fundamentally a matter of mathematical proof. Mathematicians want their theorems to follow mechanically from axioms by means of formal axiomatic reasoning. This is to avoid mistaken 'theorems', based on fallible intuitions; of which plenty of examples have occurred in the history of the subject (for example, in mathematical analysis). Axioms in traditional thought were 'self-evident truths', but that conception turns out not to be workable in pushing the mathematical boundaries. At a formal level, an axiom is just a string of symbols, which has an intrinsic meaning only in the context of all derivable formulas of an axiomatic system. It was the goal of Hilbert's program to put all of mathematics on a firm axiomatic basis, but according to Gödel's incompleteness theorem every (strong enough) axiom system has undecidable formulas; and so a final axiomatization of mathematics is unavailable. Nonetheless mathematics is often imagined to be (as far as its formal content) nothing but set theory in some axiomatization, in the sense that every mathematical statement or proof could be cast into formulas within set theory.

Is mathematics a science?

Carl Friedrich Gauss referred to mathematics as the Queen of the Sciences. The mathematician-physicist Leon M. Lederman has quipped: "The physicists defer only to mathematicians, and the mathematicians defer only to God (though you may be hard pressed to find a mathematician that modest)." If one considers science to be strictly about the physical world, then mathematics, or at least pure mathematics, is not a science. An alternative view is that certain scientific fields (such as theoretical physics) are mathematics with axioms that are intended to correspond to reality. In fact, the theoretical physicist, J. M. Ziman, proposed that science is public knowledge and thus includes mathematics. [http://info.med.yale.edu/therarad/summers/ziman.htm] In any case, mathematics shares much in common with many fields in the physical sciences, notably the exploration of the logical consequences of assumptions. Intuition and experimentation also play a role in the formulation of conjectures in both mathematics and the (other) sciences.

Overview of fields of mathematics

As noted above, the major disciplines within mathematics first arose out of the need to do calculations in commerce, to understand the relationships between numbers, to measure land, and to predict astronomical events. These four needs can be roughly related to the broad subdivision of mathematics into the study of quantity, structure, space, and change (i.e. arithmetic, algebra, geometry and analysis). In addition to these main concerns, there are also subdivisions dedicated to exploring links from the heart of mathematics to other fields: to logic, to set theory (foundations) and to the empirical mathematics of the various sciences (applied mathematics). The study of quantity starts with numbers, first the familiar natural numbers and integers and their arithmetical operations, which are characterized in arithmetic. The deeper properties of whole numbers are studied in number theory. The study of structure began with investigations of Pythagorean triples. Neolithic monuments on the British Isles are constructed using Pythagorean triples. Eventually, this led to the invention of more abstract numbers, such as the square root of two. The deeper structural properties of numbers are studied in abstract algebra and the investigation of groups, rings, fields and other abstract number systems. Included is the important concept of vectors, generalized to vector spaces and studied in linear algebra. The study of vectors combines three of the fundamental areas of mathematics, quantity, structure, and space. The study of space originates with geometry, beginning with Euclidean geometry. Trigonometry combines space and number. The modern study of space generalizes these ideas to include higher-dimensional geometry, non-Euclidean geometries (which play a central role in general relativity) and topology. Quantity and space both play a role in analytic geometry, differential geometry, and algebraic geometry. Within differential geometry are the concepts of fiber bundles, calculus on manifolds. Within algebraic geometry is the description of geometric objects as solution sets of polynomal equations, combining the concepts of quantity and space, and also the study of topological groups, which combine structure and space. Lie groups are used to study space, structure, and change. Topology in all its many ramifications may be the greatest growth area in 20th century mathematics. Understanding and describing change is a common theme in the natural sciences, and calculus was developed as a most useful tool. The central concept used to describe a changing quantity is that of a function. Many problems lead quite naturally to relations between a quantity and its rate of change, and the methods of differential equations. The numbers used to represent continuous quantities are the real numbers, and the detailed study of their properties and the properties of real-valued functions is known as real analysis. These have been generalized, with the inclusion of the square root of negative one, to the complex numbers, which are studied in complex analysis. Functional analysis focuses attention on (typically infinite-dimensional) spaces of functions. One of many applications of functional analysis is quantum mechanics. Many phenomena in nature can be described by dynamical systems; chaos theory makes precise the ways in which many of these systems exhibit unpredictable yet still deterministic behavior. Beyond quantity, structure, space, and change are areas of pure mathematics that can be approached only by deductive reasoning. In order to clarify the foundations of mathematics, the fields of mathematical logic and set theory were developed. Mathematical logic, which divides into recursion theory, model theory, and proof theory, is now closely linked to computer science. When electronic computers were first conceived, several essential theoretical concepts in computer science were shaped by mathematicians, leading to the fields of computability theory, computational complexity theory, and information theory. Many of those topics are now investigated in theoretical computer science. Discrete mathematics is the common name for the fields of mathematics most generally useful in computer science. An important field in applied mathematics is statistics, which uses probability theory as a tool and allows the description, analysis, and prediction of phenomena where chance plays a part. It is used in all the sciences. Numerical analysis investigates methods for using computers to efficiently solve a broad range of mathematical problems that are typically beyond human capacity, and taking rounding errors or other sources of error into account to obtain credible answers.

Major themes in mathematics

An alphabetical and subclassified list of mathematical topics is available. The following list of themes and links gives just one possible view. For a fuller treatment, see Areas of mathematics or the list of lists of mathematical topics.

Quantity

This starts from explicit measurements of sizes of numbers or sets, or ways to find such measurements. : :NumberNatural numberIntegers – Rational numbers – Real numbers – Complex numbers – Hypercomplex numbers – Quaternions – Octonions – Sedenions – Hyperreal numbers – Surreal numbers – Ordinal numbers – Cardinal numbers – p-adic numbers – Integer sequences – Mathematical constants – Number namesInfinityBase

Structure

:Pinning down ideas of size, symmetry, and mathematical structure. : :Abstract algebraNumber theoryAlgebraic geometryGroup theoryMonoids – AnalysisTopologyLinear algebraGraph theoryUniversal algebraCategory theoryOrder theoryMeasure theory

Space

:A more visual approach to mathematics. : :TopologyGeometryTrigonometryAlgebraic geometryDifferential geometryDifferential topologyAlgebraic topologyLinear algebraFractal geometry

Change

:Ways to express and handle change in mathematical functions, and changes between numbers. : :ArithmeticCalculusVector calculusAnalysisDifferential equations – Dynamical systems – Chaos theoryList of functions

Foundations and methods

:Approaches to understanding the nature of mathematics. :philosophy of mathematicsmathematical intuitionismmathematical constructivismfoundations of mathematicsset theorysymbolic logicmodel theorycategory theoryLogicreverse mathematicstable of mathematical symbols

Discrete mathematics

:Discrete mathematics involves techniques that apply to objects that can only take on specific, separated values. : :CombinatoricsNaive set theoryTheory of computationCryptographyGraph theory

Applied mathematics

:Applied mathematics uses the full knowledge of mathematics to solve real-world problems. :Mathematical physicsMechanicsFluid mechanicsNumerical analysisOptimizationProbabilityStatisticsMathematical economicsFinancial mathematicsGame theoryMathematical biologyCryptographyInformation theory

Important theorems

:These theorems have interested mathematicians and non-mathematicians alike. :See list of theorems for more :Pythagorean theoremFermat's last theoremGödel's incompleteness theorems – Fundamental theorem of arithmeticFundamental theorem of algebraFundamental theorem of calculusCantor's diagonal argumentFour color theoremZorn's lemmaEuler's identityclassification theorems of surfacesGauss-Bonnet theoremQuadratic reciprocityRiemann-Roch theorem.

Important conjectures

See list of conjectures for more :These are some of the major unsolved problems in mathematics. :Goldbach's conjectureTwin Prime ConjectureRiemann hypothesisPoincaré conjectureCollatz conjectureP=NP? – open Hilbert problems.

History and the world of mathematicians

See also list of mathematics history topics :History of mathematicsTimeline of mathematicsMathematiciansFields medalAbel PrizeMillennium Prize Problems (Clay Math Prize)International Mathematical UnionMathematics competitionsLateral thinkingMathematical abilities and gender issues

Mathematics and other fields

:Mathematics and architectureMathematics and educationMathematics of musical scales

Common misconceptions

Mathematics is not a closed intellectual system, in which everything has already been worked out. There is no shortage of open problems. Pseudomathematics is a form of mathematics-like activity undertaken outside academia, and occasionally by mathematicians themselves. It often consists of determined attacks on famous questions, consisting of proof-attempts made in an isolated way (that is, long papers not supported by previously published theory). The relationship to generally-accepted mathematics is similar to that between pseudoscience and real science. The misconceptions involved are normally based on:
- misunderstanding of the implications of mathematical rigour;
- attempts to circumvent the usual criteria for publication of mathematical papers in a learned journal after peer review, with assumptions of bias;
- lack of familiarity with, and therefore underestimation of, the existing literature. The case of Kurt Heegner's work shows that the mathematical establishment is neither infallible, nor unwilling to admit error in assessing 'amateur' work. And like astronomy, mathematics owes much to amateur contributors such as Fermat and Mersenne. Mathematics is not accountancy. Although arithmetic computation is crucial to accountants, their main concern is to verify that computations are correct through a system of doublechecks. Advances in abstract mathematics are mostly irrelevant to the efficiency of concrete bookkeeping, but the use of computers clearly does matter. Mathematics is not numerology. Numerology uses modular arithmetic to reduce names and dates down to numbers, but assigns emotions or traits to these numbers intuitively or on the basis of traditions. Mathematical concepts and theorems need not correspond to anything in the physical world. In the case of geometry, for example, it is not relevant to mathematics to know whether points and lines exist in any physical sense, as geometry starts from axioms and postulates about abstract entities called "points" and "lines" that we feed into the system. While these axioms are derived from our perceptions and experience, they are not dependent on them. And yet, mathematics is extremely useful for solving real-world problems. It is this fact that led Eugene Wigner to write an essay on The Unreasonable Effectiveness of Mathematics in the Natural Sciences. Mathematics is not about unrestricted theorem proving, any more than literature is about the construction of grammatically correct sentences. However, theorems are elements of formal theories, and in some cases computers can generate proofs of these theorems more or less automatically, by means of automated theorem provers. These techniques have proven useful in formal verification of programs and hardware designs. However, they are unlikely to generate (in the near term, at least) mathematics with any widely recognized aesthetic value.

See also


- Mathematical game
- Mathematical problem
- Mathematical puzzle
- Puzzle

Bibliography


- Benson, Donald C., The Moment Of Proof: Mathematical Epiphanies (1999).
- Courant, R. and H. Robbins, What Is Mathematics? (1941);
- Davis, Philip J. and Hersh, Reuben, The Mathematical Experience. Birkhäuser, Boston, Mass., 1980. A gentle introduction to the world of mathematics.
- Boyer, Carl B., History of Mathematics, Wiley, 2nd edition 1998 available, 1st edition 1968 . A concise history of mathematics from the Concept of Number to contemporary Mathematics.
- Gullberg, Jan, Mathematics--From the Birth of Numbers. W.W. Norton, 1996. An encyclopedic overview of mathematics presented in clear, simple language.
- Hazewinkel, Michiel (ed.), Encyclopaedia of Mathematics. Kluwer Academic Publishers 2000. A translated and expanded version of a Soviet math encyclopedia, in ten (expensive) volumes, the most complete and authoritative work available. Also in paperback and on CD-ROM.
- Kline, M., Mathematical Thought from Ancient to Modern Times (1973).
- Pappas, Theoni, The Joy Of Mathematics (1989).

External links


- [http://www.cut-the-knot.org/ Interactive Mathematics Miscellany and Puzzles] — A collection of articles on various math topics, with interactive Java illustrations at cut-the-knot
- Rusin, Dave: [http://www.math-atlas.org/ The Mathematical Atlas]. A guided tour through the various branches of modern mathematics.
- Stefanov, Alexandre: [http://us.geocities.com/alex_stef/mylist.html Textbooks in Mathematics]. A list of free online textbooks and lecture notes in mathematics.
- Weisstein, Eric et al.: [http://www.mathworld.com/ MathWorld: World of Mathematics]. An online encyclopedia of mathematics.
- Polyanin, Andrei: [http://eqworld.ipmnet.ru/ EqWorld: The World of Mathematical Equations]. An online resource focusing on algebraic, ordinary differential, partial differential (mathematical physics), integral, and other mathematical equations.
- A mathematical thesaurus maintained by the [http://nrich.maths.org/ NRICH] project at the University of Cambridge (UK), [http://thesaurus.maths.org/ Connecting Mathematics]
- [http://planetmath.org/ Planet Math]. An online math encyclopedia under construction, focusing on modern mathematics. Uses the GFDL, allowing article exchange with Wikipedia. Uses TeX markup.
- [http://www.mathforge.net/ Mathforge]. A news-blog with topics ranging from popular mathematics to popular physics to computer science and education.
- [http://www.youngmath.net/concerns Young Mathematicians Network (YMN)]. A math-blog "Serving the Community of Young Mathematicians". Topics include: Math News, Grad and Undergrad Life, Job Search, Career, Work & Family, Teaching, Research, Misc...
- [http://metamath.org/ Metamath]. A site and a language, that formalize math from its foundations.
- [http://world.std.com/~reinhold/dir/mathmovies.html Math in the Movies]. A guide to major motion pictures with scenes of real mathematics
- [http://math.cofc.edu/faculty/kasman/MATHFICT/default.html Mathematics in fiction]. Links to works of fiction that refer to mathematics or mathematicians.
- [http://www.mathhelpforum.com/math-help Math Help Forum]. A forum, for math help, math discussion and debate.
- [http://www.sosmath.com/CBB S.O.S. Mathematics Cyberboard] a math help forum which incorporates a LaTeX extension, making it easier for members to write and display math formulae.
- [http://www-history.mcs.st-and.ac.uk/~history/ Mathematician Bibliography]. Extensive history and quotes from all famous mathematicians.
- [http://www.physicsmathforums.com/ Physics Math Forums]
-
Category:School subjects fiu-vro:Matõmaatiga zh-min-nan:Sò·-ha̍k ko:수학 ms:Matematik ja:数学 simple:Mathematics th:คณิตศาสตร์

Directed cycle

In graph theory, a cycle graph is a graph that consists of a number of cycles, or in other words, vertices connected in closed chains. A graph consisting of a single cycle is a simple cycle graph, or cyclic graph. Some authors use "cycle graph" as a synonym for "cyclic graph", leading to some confusion of terminology.

Cycle graphs in group theory

See Cycle graph (group) for the application of cycle graphs to the illustration of the structure of small finite groups.

References


- Eric W. Weisstein, [http://mathworld.wolfram.com/CycleGraph.html Cycle Graph] at MathWorld. Category: Graph theory

Topological sort

In graph theory, a topological sort of a directed acyclic graph (DAG) is a linear ordering of its nodes which is compatible with the partial order R induced on the nodes where x comes before y (xRy) if there's a directed path from x to y in the DAG. An equivalent definition is that each node comes before all nodes to which it has edges. Every DAG has at least one topological sort, and may have many.

Examples

The canonical application of topological sorting is in scheduling a sequence of jobs. The jobs are represented by vertices, and there is an edge from x to y if job x must be completed before job y can be done. Then, a topological sort gives an order in which to perform the jobs. This has applications in computer science, such as in instruction scheduling, ordering of formula cell evaluation in spreadsheets, dependencies in makefiles, and symbol dependencies in linkers.

Algorithms

The usual algorithm for topological sorting has running time linear in the number of nodes plus the number of edges (Θ(|V|+|E|)). First, find a list of "start nodes" which have no incoming edges and insert them into a queue Q. Then, while graph is nonempty if Q is empty output error message remove a node n from Q output n for each node m with an edge e from n to m remove edge e from the graph if m has no other incoming edges insert m into Q If this algorithm terminates without outputting all the nodes of the graph, it means the graph has at least one cycle and therefore is not a DAG, so a topological sort is impossible; in this case, the algorithm may report an error. Note that, reflecting the non-uniqueness of the resulting sort, the structure Q need not be a queue. It may be a stack (corresponding to depth-first search) or simply a set.

External links


- [http://www.nist.gov/dads/HTML/topologcsort.html NIST Dictionary of Algorithms and Data Structure: topological sort]

References


- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0262032937. Section 22.4: Topological sort, pp.549–552. Category:Graph algorithms Category:Sort algorithms

Depth-first search

Depth-first search (DFS) is an algorithm for traversing or searching a tree, tree structure, or graph. Intuitively, one starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking. Formally, DFS is an uninformed search that progresses by expanding the first child node of the search tree that appears and thus going deeper and deeper until a goal state is found, or until it hits a node that has no children. Then the search backtracks and starts off on the next node. In a non-recursive implementation, all freshly expanded nodes are added to a LIFO queue (stack) for expansion. Space complexity of DFS is much lower than BFS (breadth-first search). It also lends itself much better to heuristic methods of choosing a likely-looking branch. Time complexity of both algorithms are proportional to the number of verticies plus the number of edges in the graphs they traverse. When searching large graphs that can not be fully contained in memory, DFS suffers from non-termination when the length of a path in the search tree is infinite. The simple solution of "remember which nodes I have already seen" doesn't work because there can be insufficient memory. This can be solved by maintaining an increasing limit on the depth of the tree, which is called iterative deepening depth-first search. For the following graph: Image:graph.traversal.example.png a depth-first search starting at A, assuming that the left edges in the shown graph are chosen before right edges, and assuming the search remembers previously-visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G. Performing the same search without remembering previously visited nodes results in visiting nodes in the order A, B, D, F, E, A, B, D, F, E, etc. forever, caught in the A, B, D, F, E cycle and never reaching C or G. Iterative deepening prevents this loop and will reach the following nodes on the following depths, assuming it proceeds left-to-right as above:
- 0: A
- 1: A (repeated), B, C, E (Note that iterative deepening has now seen C, when a conventional depth-first search did not.)
- 2: A, B, D, F, C, G, E, F (Note that it still sees C, but that it came later. Also note that it sees E via a different path, and loops back to F twice.)
- 3: A, B, D, F, E, C, G, E, F, B For this graph, as more depth is added, the two cycles "ABFE" and "AEFB" will simply get longer before the algorithm gives up and tries another branch.

Pseudocode

function DFS(Start, Goal) push(Stack,Start) while Stack is not empty var Node := Pop(Stack) if Node.color = Black continue //C style continue if Node = Goal return Node Color(Node, Black) for Child in Expand(Node) if Child.color = White push(Stack, Child) dfs(G) search(vertex v)

External links


- [http://www.boost.org/libs/graph/doc/depth_first_search.html C++ Boost Graph Library: Depth-First Search]
- [http://www.cs.duke.edu/csed/jawaa/DFSanim.html Depth-First Search Animation (for a directed graph)]

References


- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0262032937. Section 22.3: Depth-first search, pp.540–549. Category:Search algorithms Category:Trees (structure) ja:深さ優先探索

Computer science

Computer science, an academic discipline (abbreviated CS or compsci), is a body of knowledge generally about computer hardware, software, computation and its theory. The discipline itself includes, but is not limited to, the fundamentals of computer languages, operating systems and mathematical foundations of computer science. The study of these fundamentals may lead to a wide variety of topics, such as algorithms, formal grammars, programming languages, program design, artificial intelligence and computer engineering. There exist a number of technical definitions of computer science. The status of computer science as a science is often challenged, typically arguing that it is more like mathematics and that it does not follow the scientific method, however these facts are not unanimously accepted. In popular language, the term computer science is often confusingly used to denominate anything related to computers.

History of computer science

Evolutionary

Before the 1920s, computers were human clerks that performed calculations. They were usually under the lead of a physicist. Many thousands of computers were employed in commerce, government, and research establishments. Most of these computers were women, and they were known to have a degree in calculus. After the 1920s, the expression computing machine refered to any machine that performed the work of a human computer, especially those in accordance with effective methods of The Church-Turing Thesis. The thesis states that a mathematical method is effective if it could be set out as a list of instructions able to be followed by a human clerk with paper and pencil, for as long as necessary, and without ingenuity or insight. Machines that computed with discrete values became known as the analog kind. They used machinery that represented discrete numeric quantities, like the angle of a shaft rotation or difference in electrical potential. Digital machinery, in contrast to analog, were able to render a state of a numeric value and store each individual digit. Digital machinery used difference engines or relays before the invention of faster memory devices. The phrase computing machine gradually gave away, after the late 1940s, to just computer as the onset of electronic digital machinery became common. These computers were able to perform the calculations that were performed by the previous human clerks. Since the values stored by digital machines were not bound to physical properties like the analog device, a logical computer, based on digital equipment, was able to do anything that could be described "purely mechanical." Alan Turing, known as the Father of Computer Science, invented such a logical computer, also known as a Turing Machine, that evolved into the modern computer from the tasks performed by the previous human clerks. These new computers were also able to perform non-numeric computations, like music. Computability, by logical computers, began a science by being able to make evident which was not explicitly defined into ordinary sense more immediate.

Academic discipline

Computer science has roots in electrical engineering, logic, mathematics, and linguistics. In the last third of the 20th century computer science emerged as a distinct discipline and developed its own methods and terminology. Originally, CS was taught as part of mathematics or engineering departments, for instance at the University of Cambridge in England and at the Gdansk University of Technology in Poland, respectively. Cambridge claims to have the world's oldest taught qualification in computing. The first computer science department in the United States was founded at Purdue University in 1962, while the first college entirely devoted to computer science was founded at Northeastern University in 1982. Most universities today have specific departments devoted to computer science, while some conjoin it with engineering, with applied mathematics, or other disciplines. Most research in computer science has focused on von Neumann computers or Turing machines (computation models that perform one small, deterministic step at a time). These models resemble, at a basic level, most real computers in use today. Computer scientists also study other models of computation, which includes parallel machines and theoretical models such as probabilistic, oracle, and quantum computers.

Careers

Some of the potential careers for those who study computer science are listed below:

Demographics


- Nearly half of all computer programmers held a bachelor’s degree in 2002; about 1 in 5 held a graduate degree. [http://www.bls.gov/oco/ocos110.htm]
- Computer programmer employment is expected to grow much more slowly than that of other computer specialists.
- Education requirements range from a 2-year degree to a graduate degree. [http://www.bls.gov/oco/ocos042.htm]
- Employment, other than computer programmer, is expected to increase much faster than the average as organizations continue to adopt increasingly sophisticated technologies.

Sub-disciplines of computer science

Computer Science has a number of major sub-fields which can be classified by a number of means (for example the [http://www.acm.org/class/1998/overview.html ACM classification system]).

Algorithms

The study of algorithms is aimed at creating techniques that will enable a computer to perform a certain task in an efficient manner. An algorithm is a set of well-defined instructions for accomplishing some task, often explained by analogy with a culinary recipe. Algorithms are often implemented in software, and advancing the state of the art in algorithms is responsible for many of the most spectacular successes in computing. An algorithms specialist may come up with methods to accomplish new tasks, but just as often, they will work on improving the efficiency of an existing algorithm. These improvements can come in "time" (the length of time it takes for the algorithm to work) and "space" (the amount of computer memory the algorithm consumes. The field of algorithms is highly formal and many things can be proved about a given algorithm (using complexity theory), including roughly how long it will take to complete, as compared to the size of its input (the number of options it must consider). One interesting open question in algorithms concerns the complexity classes P and NP, and whether or not P = NP. If it does, then a whole range of seemingly difficult algorithms can in theory be performed quickly.

Data structures

A data structure is a way to store data so that it can be remembered and retrieved efficiently. A well-designed data structure means that an algorithm can be performed using as little memory space and time as possible. Some data structures are very simple: the simplest is an array which is simply a numbered list of items. Since the memory of a computer is usually modelled as an array (of bytes), this is also one of the most important data structures in computer science. For example, strings of text are usually modelled as arrays. But there are many other data structures such as linked lists, trees, hash tables and many others that are quite different but critical to the science. Type theory classifies data at a most basic (mathematical) level into different types, such as integers, complex numbers, strings, etc. and deals with how those types can interact. Abstract data types, at a more concrete level, deal with how types and data structures are used in software programming.

Listing of sub-disciplines

Computer science is closely related to a number of fields. These fields overlap considerably, though important differences exist:

Major fields of importance for computer science

See also


- Benchmark
- Computer jargon
- Computer numbering formats
- Computer slang
- Computing
- Data acquisition
- European Association for Theoretical Computer Science
- IEEE John von Neumann Medal
- Internet
- List of algorithms
- List of basic computer science topics
- List of computer science conferences
- List of computing topics
- List of data structures
- List of open problems in computer science
- List of publications in computer science
- List of prominent pioneers in computer science
- Multimedia
- Online computations and algorithms
- Sensor network
- Turing Award (ACM)

External links


- [http://www.dmoz.org/Computers/Computer_Science/ Open Directory Project: Computer Science]
- [http://www.techbooksforfree.com/science.shtml Downloadable Science and Computer Science books]
- [http://liinwww.ira.uka.de/bibliography/ Collection of Computer Science Bibliographies]
- [http://www.geocities.com/tablizer/science.htm Belief that title "science" in "computer science" is inappropriate] Category:Computer science ko:컴퓨터 과학 ja:情報工学 simple:Computer science th:วิทยาการคอมพิวเตอร์

Bayesian network

A Bayesian network or Bayesian belief network or just belief network is a directed graph of nodes representing variables and arcs representing dependence relations among the variables. If there is an arc from node A to another node B, then we say that A is a parent of B. If a node has a known value, it is said to be an evidence node. A node can represent any kind of variable, be it an observed measurement, a parameter, a latent variable, or a hypothesis. Nodes are not restricted to representing random variables; this is what is "Bayesian" about a Bayesian network. A Bayesian network is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be X(1), ..., X(n). Let parents(A) be the parents of the node A. Then the joint distribution for X(1) through X(n) is represented as the product of the probability distributions \Pr[X(i)\mid\operatorname(X(i))] for i = 1 to n. If X has no parents, its probability distribution is said to be unconditional, otherwise it is conditional. Questions about dependence among variables can be answered by studying the graph alone. It can be shown that the graphical notion called d-separation corresponds to the notion of conditional independence: if nodes X and Y are d-separated (given specified evidence nodes), then variables X and Y are independent given the evidence variables. The set of all other nodes that node X can directly depend on is given by X's Markov blanket. In order to fully specify the Bayesian network and to carry out numerical calculations, it is necessary to further specify for each node X the probability distribution for X conditional on its parents. The distribution of X given its parents may have any form. However, it is common to work with discrete or Gaussian distributions, since that simplifies calculations. Sometimes only constraints on a distribution are known; one commonly uses the principle of maximum entropy to specify the distribution with the greatest information entropy given these constraints. In this way the distribution makes the fewest additional assumptions beyond what is known to be true as expressed by the constraints. (Analogously, in the specific context of a dynamic Bayesian network, one commonly specifies the conditional distribution for the hidden state's temporal evolution to maximize the entropy rate of the implied stochastic process.) Often these conditional distributions depend on unknown parameters which must be estimated from data, ideally using maximum likelihood. However, the likelihood maximization is often intractable for many problems, leading to the use of iterative approximation techniques, the most widespread being the expectation-maximization algorithm. The goal of inference is typically to find the conditional distribution of a subset of the variables, conditional on known values for some other subset (the evidence), and integrating over any other variables. This conditional distribution is known as the posterior distribution of the subset of the variables given the evidence. The posterior gives a universal sufficient statistic for detection applications, when one wants to choose values for the variable subset which minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically constructing extensions of Bayes' theorem to more complex problems. The most common exact inference methods are variable elimination that eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the product, clique tree propagation that caches the computation so that the many variables can be queried at one time, and new evidence can be propagated quickly, recursive conditioning that allows for a space-time tradeoff but still allowing for the efficiency of variable elimination when enough space is used - all of these methods have complexity that is exponential in tree width. The most common approximate inference algorithms are stochastic simulation, mini-bucket elimination (which generalizes loopy belief propagation) and variational methods. Bayesian networks are used for modelling knowledge in gene regulatory networks, medicine, engineering, text analysis, image processing, data fusion, and decision support systems. Learning the structure of a Bayesian network is a very important part of machine learning. Given the information that the data is being generated by a Bayesian network and that all the variables are visible in every iteration, the following methods are used to learn the structure of the acyclic graph and the conditional probability table associated with it. The elements of a structure finding algorithm are a scoring function and a search strategy. An exhaustive search returning back a structure that maximizes the score is one implementation which is superexponential in the number of variables. A local search algorithm makes incremental changes aimed at improving the score of the structure. A global search algorithm like Markov chain Monte Carlo can avoid getting trapped in local minima. Friedman et. al. talk about using mutual information between variables and finding a structure that maximizes this. They do this by restricting the parent candidate set to k nodes and exhaustively searching therein.

See also


- Bayesian statistics
- graphical model
- machine learning

References


- Eugene Charniak [http://www.cs.ubc.ca/~murphyk/Bayes/Charniak_91.pdf Bayesian Networks Without Tears]. This is a nice intro for non-specialists.
- Kevin Murphy. An introduction to graphical models. 2001. http://www.ai.mit.edu/~murphyk/Papers/intro_gm.pdf
- Judea Pearl, Stuart Russell. Bayesian Networks. UCLA Cognitive Systems Laboratory, Technical Report (R-277), November 2000.
- Judea Pearl, Stuart Russell. Bayesian Networks, in M. A. Arbib (Ed.), Handbook of Brain Theory and Neural Networks, pp. 157–160, Cambridge, MA: MIT Press, 2003, ISBN 0262011972.
- Enrique Castillo, José Manuel Gutiérrez, and Ali S. Hadi. Expert Systems and Probabilistic Network Models. New York: Springer-Verlag, 1997. ISBN 0-387-94858-9
- Judea Pearl. Fusion, propagation, and structuring in belief networks. Artificial Intelligence 29(3):241–288, 1986.
- J.W. Comley and [http://www.csse.monash.edu.au/~dld D.L. Dowe], "Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages", [http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10478&mode=toc chapter 11] (pp265–294) in P. Grunwald, M.A. Pitt and I.J. Myung (eds)., [http://mitpress.mit.edu/catalog/item/default.asp?sid=4C100C6F-2255-40FF-A2ED-02FC49FEBE7C&ttype=2&tid=10478 Advances in Minimum Description Length: Theory and Applications], Cambridge, MA: MIT Press, April 2005, ISBN 0262072629. (This paper puts decision trees in internal nodes of Bayes networks using [http://www.csse.monash.edu.au/~dld/MML.html Minimum Message Length] (MML). An earlier version is [http://www.csse.monash.edu.au/~dld/Publications/2003/Comley+Dowe03_HICS2003.ref Comley and Dowe (2003)], [http://www.csse.monash.edu.au/~dld/Publications/2003/Comley+Dowe03_HICS2003_GeneralBayesianNetworksAsymmetricLanguages.pdf .pdf].)
- Christian Borgelt and Rudolf Kruse. [http://fuzzy.cs.uni-magdeburg.de/books/gm/ Graphical Models – Methods for Data Analysis and Mining], Chichester, UK: Wiley, 2002, ISBN 0-470-84337-3
- [http://www.cs.ust.hk/faculty/lzhang/bio.html Nevin Lianwen Zhang] and [http://www.cs.ubc.ca/spider/poole/ David Poole], [http://www.cs.ubc.ca/spider/poole/papers/canai94.pdf A simple approach to Bayesian network computations], Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference (AI-94), Banff, May 1994, 171-178. This paper presents variable elimination for belief networks. Category:Bayesian networks Category:Networks


Reference counting

In computer science, reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object or block of memory. It is typically used as a means of deallocating objects which are no longer referenced.

Use in garbage collection

Reference counting is often known as a garbage collection algorithm where each object contains a count of the number of references to it held by other objects. If an object's reference count reaches zero, the object has become inaccessible, and it is destroyed. Simple reference counts require frequent updates. Whenever a reference is destroyed or overwritten, the reference count of the object it references is decremented, and whenever one is created or copied, the reference count of the object it references is incremented.

Advantages and disadvantages

The main advantage of reference counting over tracing garbage collection is that objects are reclaimed as soon as they can no longer be referenced, and in an incremental fashion, without long pauses for collection cycles and with clearly defined lifetime of every object. In real-time applications or systems with limited memory, this is important to maintain responsiveness. Reference counting is also among the simplest forms of garbage collection to implement. It also allows for effective management of non-memory resources such as operating system objects, which are often much scarcer than memory (tracing GC systems use finalizers for this, but the delayed reclamation may cause problems). Weighted reference counts are a good solution for garbage collecting a distributed system. Reference counting in naïve form has two main disadvantages over the tracing garbage collection, both of which require additional mechanisms to ameliorate:
- The frequent updates it involves are a source of inefficency. While tracing garbage collectors can impact efficiency severely via context switching and cache line faults, they collect relatively infrequently, while accessing objects is done continually. Also, less importantly, reference counting requires every memory-managed object to reserve space for a reference count. In tracing garbage collectors, this information is stored implicitly in the references that refer to that object, saving space, although tracing garbage collectors, particularly incremental ones, can require additional space for other purposes.
- The naïve algorithm described above can't handle reference cycles, an object which refers directly or indirectly to itself. A mechanism relying purely on reference counts will never consider cyclic chains of objects for deletion, since their reference count is guaranteed to stay nonzero. Methods for repairing exist but can also increase the overhead and complexity of reference counting — on the other hand, these methods need only be applied to data that might form cycles, often a small subset of all data.

Graph interpretation

When dealing with garbage collection schemes, it's often helpful to think of the reference graph, which is a directed graph where the vertices are objects and there is an edge from an object A to an object B if A holds a reference to B. We also have a special vertex or vertices representing the local variables and references held by the runtime system, and no edges ever go to these nodes, although edges can go from them to other nodes. In this context, the simple reference count of an object is the in degree of its vertex. Deleting a vertex is like collecting an object. It can only be done when the vertex has no incoming edges, so it does not affect the out degree of any other vertices, but it can affect the in degree of other vertices, causing their corresponding objects to be collected as well. In the graph interpretation, reference cycles are cycles. The connected component containing the special vertex contains the objects that can't be collected, while other connected components of the graph only contain garbage. By the nature of reference counting, each of these garbage components must contain at least one cycle.

Dealing with inefficiency of updates

Incrementing and decrementing reference counts every time a reference is created or destroyed can significantly impede performance. Not only do the operations take time, but they damage cache performance and can lead to pipeline bubbles. Even read-only operations like calculating the length of a list require a large number of reads and writes for reference updates with naïve reference counting. One simple technique is for the compiler to combine a number of nearby reference updates into one. This is especially effective for references which are created and quickly destroyed. Care must be taken, however, to put the combined update at the right position so that a premature free is avoided. The Deustch-Bobrow method of reference counting capitalizes on the fact that most reference count updates are in fact generated by references stored in local variables. It ignores these references, only counting references in data structures, but before an object with reference count zero can be deleted, the system must verify with a scan of the stack and registers that no other reference to it still exists. Another technique devised by Henry Baker involves deferred increments, in which references which are stored in local variables do not immediately increment the corresponding reference count, but instead defer this until it is necessary. If such a reference is destroyed quickly, then there is no need to update the counter. This eliminates a large number of updates associated with short-lived references. However, if such a reference is copied into a data structure, then the deferred increment must be performed at that time. It is also critical to perform the deferred increment before the object's count drops to zero, resulting in a premature free. See the [http://citeseer.nj.nec.com/baker94minimizing.html paper] for more. A dramatic decrease in the overhead on counter updates was obtained by [http://www.cs.technion.ac.il/~levanoni/ Levanoni] and [http://www.cs.technion.ac.il/~erez/ Petrank]. They showed how to eliminate more than 99% of the counter updates for typical Java benchmarks. Furthermore, they present an enhanced algorithm that may run concurrently with multithreaded applications employing only fine synchronization. See the [http://www.cs.technion.ac.il/%7Eerez/Papers/refcount.pdf paper] for more.

Dealing with reference cycles

There are a variety of ways of handling the problem of collecting reference cycles. One is that a system may explicitly forbid reference cycles. In some systems like filesystems this is a common solution. Cycles are also sometimes ignored in systems with short lives and a small amount of cyclic garbage, particularly when the system was developed using a methodology of avoiding cyclic data structures wherever possible, typically at the expense of efficiency. Another solution is to periodically use a tracing garbage collector to reclaim cycles. Since cycles typically constitute a relatively small amount of reclaimed space, the collection cycles can be spaced much farther apart than with an ordinary tracing garbage collector. Bacon describes a cycle-collection algorithm for reference counting systems with some similarities to tracing systems, including the same theoretical time bounds, but that takes advantage of reference count information to run much more quickly and with less cache damage. It's based on the observation that an object cannot appear in a cycle until its reference count is decremented to a nonzero value. All objects which this occurs to are put on a roots list, and then periodically the program searches through the objects reachable from the roots for cycles. It knows it has found a cycle when decrementing all the reference counts on a cycle of references brings them all down to zero. An enhanced version of this algorithm is able to run concurrently with other operations. See the [http://www.research.ibm.com/people/d/dfb/papers/Bacon01Concurrent.pdf paper] for more.

Variants of reference counting

Although it's possible to augment simple reference counts in a variety of ways, often a better solution can be found by performing reference counting in a fundamentally different way. Here we describe some of the variants on reference counting and their benefits and drawbacks.

Weighted reference counting

In weighted reference counting, we assign each reference a weight, and each object tracks not the number of references referring to it, but the total weight of the references referring to it. The initial reference to a newly-created object has a large weight, such as 216. Whenever this reference is copied, half of the weight goes to the new reference, and half of the weight stays with the old reference. Because the total weight does not change, the object's reference count does not need to be updated. Destroying a reference decrements the total weight by the weight of that reference. When the total reaches zero, all references have been destroyed. If an attempt is made to copy a reference with a weight of 1, we have to "get more weight" by adding to the total weight and then adding this new weight to our reference, and then split it. The property of not needing to access a reference count when a reference is copied is particularly helpful when the object's reference count is expensive to access, for example because it is in another process, on disk, or even across a network. It can also help increase concurrency by avoiding many threads locking a reference count to increase it. Thus, weighted reference counting is most useful in parallel, multiprocess, database, or distributed applications. The primary problem with simple weighted reference counting is that destroying a reference still requires accessing the reference count, and if many references are destroyed this can cause the same bottlenecks we seek to avoid. Some adaptations of weighted reference counting seek to avoid this by attempting to give weight back from a dying reference to one which is still active. Weighted reference counting was independently devised by Bevan, in the paper Distributed garbage collection using reference counting, and Watson, in the paper An efficient garbage collection scheme for parallel computer architectures, both in 1987.

Examples of use

Delphi

A language that uses reference counting for garbage collection is Delphi. Delphi is not a completely garbage collected language, in that user-defined types must still be manually allocated and deallocated. It does provide automatic collection, however, for a few built-in types, such as strings, dynamic arrays, and interfaces, for ease of use and to simplify the generic database functionality. Some of the reasons reference counting may have been preferred to other forms of garbage collection in Delphi include:
- The general benefits of reference counting, such as prompt collection.
- Cycles either cannot occur or do not occur in practice using only the small set of garbage-collected built-in types.
- The overhead in code size required for reference counting is very small, and no separate thread of control is needed for collection as would be needed for a tracing garbage collector.
- Many instances of the most commonly used garbage-collected type, the string, have a short lifetime, since they are typically intermediate values in string manipulation.
- The reference count of a string is checked before mutating a string. This allows reference count 1 strings to be mutated directly whilst higher reference count strings are copied before mutation. This allows the general behaviour of old style pascal strings to be preserved whilst eliminating the cost of copying the string on every assignment.
- Because garbage-collection is only done on built-in types, reference counting can be efficiently integrated into the library routines used to manipulate each datatype, keeping the overhead needed for updating of reference counts low.

References

External links


- [http://www.memorymanagement.org/articles/recycle.html#reference The Memory Manager Reference: Beginner's Guide: Recycling: Reference Counts]
- [http://citeseer.nj.nec.com/baker94minimizing.html Minimizing Reference Count Updating with Deferred and Anchored Pointers for Functional Data Structures, Henry G. Baker]
- [http://www.research.ibm.com/people/d/dfb/papers/Bacon01Concurrent.pdf Concurrent Cycle Collection in Reference Counted Systems, David F. Bacon]
- [http://www.cs.technion.ac.il/%7Eerez/Papers/refcount.pdf An On-the-Fly Reference-Counting Garbage Collector for Java, Yossi Levanoni and Erez Petrank]
- [http://www.sdmagazine.com/documents/s=9730/cuj0412f/ Atomic Reference Counting Pointers: A lock-free, async-free, thread-safe, multiprocessor-safe reference counting pointer, Kirk Reinholtz] ja:参照カウント


Purely functional

Purely functional is a term in computing used to describe algorithms, data structures or programming languages that exclude destructive modifications (updates). As a consequence of this restriction, variables refer to immutable, persistent values. This means that old values available prior to a change continue to be accessible and may be updated following a modification.

Examples of purely functional data structures

Linked lists

This example is taken from Okasaki. See the bibliography. Singly linked lists are a bread and butter data structure in functional languages. In ML-derived languages and Haskell, they are purely functional because once a node in the list has been allocated, it cannot be modified, only copied or destroyed. Consider the two lists: xs = [0, 1, 2] ys = [3, 4, 5] These would be represented in memory by: image:purely_functional_list_before.gif where a circle indicates a node in the list (the arrow out showing the second element of the node which is a pointer to another node). Now concatenating the two lists: zs = xs ++ ys results in the following memory structure: image:purely_functional_list_after.gif Notice that the nodes in list xs have been copied, but the nodes in ys are shared. As a result, the original lists (xs and ys) persist and have not been modified. The reason for the copy is that the last node in xs (the node containing the original value 2) cannot be modified to point to the start of ys, because that would change the value of xs.

Trees

This example is taken from Okasaki. See the bibliography. Consider a binary tree used for fast searching, where every node has the recursive invariant that subnodes on the left are < (less than) the node, and subnodes on the right are > (greater than) the node. For instance, the set of data xs = [a, b, c, d, f, g, h] might be represented by the following binary search tree: Image:purely_functional_tree_before.gif A function which inserts data into the binary tree and maintains the invariant is: fun insert (x, E) = T (E, x, E) | insert (x, s as T (a, y, b)) = if x < y then T (insert (x, a), y, b) else if x > y then T (a, y, insert (x, b)) else s After executing ys = insert ("e", xs) we end up with the following: Image:purely_functional_tree_after.gif Notice two points: Firstly the original tree (xs) persists. Secondly many common nodes are shared between the old tree and the new tree. Such persistence and sharing is difficult to manage without some form of garbage collection (GC) to automatically free up nodes which have no live references, and this is why GC is a feature commonly found in functional programming languages.

Benefits and applications

The persistence property of purely functional data structures can be advantageous in the development of many applications which deal with multiple versions of an object. For example, say you have a thesaurus service on a website which uses a red-black tree to store its list of which words are synonyms for which other words. Because your thesaurus is comprehensive, this tree has about a million nodes. Now say you wish to add a feature that allows each user of your site to add their own custom words to their personal thesaurus. One way to do this is to make a copy of the tree for each user, and then add their custom words to it, but this is wasteful, because each user using the service simultaneously would require memory for at least a million nodes. Moreover, it would cause a significant processing delay to make the complete copy of the tree. A better approach is to store the words in a purely functional red-black tree. Then, you simply take the original version and produce a new tree based on it for each set of custom words. Because these new trees share large amounts of structure with the main tree, the space overhead for each additional user is at most 2klog2 n or about 40k nodes, where k is the number of custom nodes. Besides their efficiency benefits, the inherent referential transparency of functional data structures tends to make purely functional computation more amenable to analysis and optimization, both formal and informal.

Reference cycles

Since every value in a purely functional computation is built up out of existing values, it would seem that it is impossible to create a cycle of references, resulting in a reference graph (a graph which has edges from objects to the objects they reference) that is a directed acyclic graph. Under eager evaluation this is true with most types of data such as records, lists, and so on. However, in most functional languages functions can be defined recursively, referring to themselves, and with the added power of lazy evaluation, even regular data structures can be defined self-referentially. For example, in Haskell it is possible to create an infinite list of ones (1,1,1,...) which is represented as a circular linked list.

Bibliography

Purely functional data structures by Chris Okasaki, Cambridge University Press, 1998, ISBN 0-521-66350-4. Category:Functional programming

Data structure

In computer science, a data structure is a way of storing data in a computer so that it can be used efficiently. Often a carefully chosen data structure will allow a more efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data structure. A well-designed data structure allows a variety of critical operations to be performed, using as little resources, both execution time and memory space, as possible. Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to certain tasks. For example, B-trees are particularly well-suited for implementation of databases, while routing tables rely on networks of machines to function. In the design of many types of programs, the choice of data structures is a primary design consideration, as experience in building large systems has shown that the difficulty of implementation and the quality and performance of the final result depends heavily on choosing the best data structure. After the data structures are chosen, the algorithms to be used often become relatively obvious. Sometimes things work in the opposite direction - data structures are chosen because certain key tasks have algorithms that work best with particular data structures. In either case, the choice of appropriate data structures is crucial. This insight has given rise to many formalised design methods and programming languages in which data structures, rather than algorithms, are the key organising factor. Most languages feature some sort of module system, allowing data structures to be safely reused in different applications by hiding their verified implementation details behind controlled interfaces. Object-oriented programming languages such as C++ and Java in particular use objects for this purpose. Since data structures are so crucial to professional programs, many of them enjoy extensive support in standard libraries of modern programming languages and environments, such as C++'s Standard Template Library, the Java API, and the Microsoft .NET framework. The fundamental building blocks of most data structures are arrays, records, discriminated unions, and references. For example, the nullable reference, a reference which can be null, is a combination of references and discriminated unions, and the simplest linked data structure, the linked list, is built from records and nullable references. There is some debate about whether data structures represent implementations or interfaces. How they are seen may be a matter of perspective. A data structure can be viewed as an interface between two functions or as an implementation of methods to access storage that is organized according to the associated data type.

See also


- List of data structures
- Persistent data structure
- Unstructured data

External links


- [http://nist.gov/dads/ Descriptions] from the Dictionary of Algorithms and Data Structures
- [http://hal9k.com/cug/links/subject39.htm Links from the C/C++ User’s Group Page] Category:Data_structures ko:자료구조 ja:データ構造 th:โครงสร้างข้อมูล

Instruction scheduling

In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to rearrange the order of instructions so that a value is not used right after it is computed, which causes pipeline stalls, without changing the meaning of the code.

Data dependencies

Instruction scheduling is typically done on a single basic block. In order to determine whether rearranging the block's instructions in a certain way preserves the behavior of that block, we need the concept of a data dependency. There are three types of dependencies, which also happen to be the three data hazards: # Read after Write (RAW): Instruction 1 writes a value used by later Instruction 2. Instruction 1 must come first, or Instruction 2 will read the old value instead of the new. # Write after Read (WAR): Instruction 1 reads a location that is later overwritten by Instruction 2. Instruction 1 must come first, or it will read the new value instead of the old. # Write after Write (WAW): Two instructions both write the same location. They must occur in their original order. To make sure we respect these three types of dependencies, we construct a dependency graph, which is a directed graph where each vertex is an instruction and there is an edge from I1 to I2 if I1 must come before I2 due to a dependency. Then, any topological sort of this graph is a valid instruction schedule. Category: Compiler optimizations

Makefile

make is a utility that automates the process of converting files from one form to another, doing dependency tracking and invoking external programs to do additional work as needed. Its dependency tracking is very simple and centers around the modification time of the input files. Most frequently it is used for compilation of source code into object code, joining and then linking object code into executables or libraries. It uses files called "makefiles" to determine the dependency graph for a given output, and the build scripts which need passed to the shell to build them. The term "makefile" stems from their traditional file name of "makefile" or (later) "Makefile".

Origin

There are now a number of dependency-tracking build utilities, but make is one of the most wide-spread, primarily due to its inclusion in Unix, starting with the PWB variant, which featured a variety of tools targeting software development workloads. It was originally created by Stuart Feldman in 1977 at Bell Labs. In 2003 Dr. Feldman [http://campus.acm.org/public/membernet/storypage.May.2004.cfm?story=4&CFID=23207696&CFTOKEN=28895744 received] the ACM [http://www.acm.org/awards/ssaward.html Software System Award] for the invention of this important tool. Before make's introduction, the Unix build system would most likely consist of "make" and "install" shell scripts accompanying a program's source. Being able to combine the commands for the different targets into a single file, and being able to abstract out dependency tracking and archive handling, was an important step in the direction of modern build environments.

Modern versions

make has gone through a number of rewrites, and a number of from-scratch variants which used the same file format and basic algorithmic principles, and also provided a number of their own non-standard enhancements, in the time that followed. The two main variants of the traditional make utility are:
- BSD make, which is derived from Adam de Boor's work on a version of make capable of building targets in parallel, and survives with varying degrees of modification in FreeBSD, NetBSD and OpenBSD. Most notably, it has conditionals and iterative loops which are applied at the parsing stage and may be used to conditionally, and programmatically, construct the makefile, including generation of targets at runtime.
- GNU make, which is part of most GNU/Linux installations and is frequently used in conjunction with the GNU build system. Its notable departures from traditional make are most noticeable in pattern-matching in dependency graphs and build targets, as well as a number of functions which may be invoked to have the make utility do things like collect a list of all files in the current directory. Where BSD make has a rich set of internal macros at parse time, GNU make typically encourages the use of an external macro package like m4. POSIX includes standardization of the basic features and operation of the make utility, and is implemented with varying degrees of completeness in Unix-based versions of make. In general, simple makefiles may be used between various versions of make with reasonable success. GNU make and BSD make will look first for files named "GNUmakefile" and "BSDmakefile" respectively, which allows one to put makefiles which use implementation-defined behaviour in separate locations.

Advantages and disadvantages

Like most software which has been around for as long as make has, it has its share of fans and detractors. Many problems have surfaced with scaling make to work with modern, large software projects, some contend, but many also point out that for the common case, make works very well, and has very simple, but powerful, expressiveness. In any event, make is still used for the building of many complete operating systems, and modern replacements are not entirely different in their basic operation. With the advent of modern Integrated Development Environments, especially on non-Unix platforms, many programmers do not manually manage dependency tracking, or even the listing of which files are part of a project, and instead leave that task to be automated by their environment. Likewise, many modern programming languages have language-specific ways of listing dependencies which are more efficiently tracked through the use of language-specific build utilities.

Makefile structure

A makefile consists of lines of text which define a file (or set of files) or a rule name as depending on a set of files. Output files are marked as depending on their source files, for example, and source files are marked as depending on files which they include internally. After each dependency is listed, a series of lines of indented text may follow which define how to transform the input into the output, if the former has been modified more recently than the latter. In the case where such definitions are present, they are referred to as "build scripts" and are passed to the shell to generate the target file. The basic structure is:
# Comments use the pound sign (aka hash)
target: dependencies
	command 1
	command 2
	   .
	   .
	   .
	command n
A makefile also can contain definitions of variables and inclusion of other makefiles. Variables in makefiles may be overridden in the command line arguments passed to the make utility. This allows users to specify different behaviour for the build scripts and how to invoke programs, among other things. For example, the variable "CC" is frequently used in makefiles to refer to a C compiler, and the user may wish to provide an alternate compiler to use.

Example makefile

Below is a very simple makefile that would compile a source called "helloworld.c" using cc, a C compiler. The PHONY tag is a technicality that tells make that a particular target name does not produce an actual file. The $@ and $< are two of the so-called automatic variables and stand for the target name and so-called "implicit" source, respectively. There are a number of other automatic variables. Note that in the "clean" target, a minus prefixes the command, which tells make to ignore errors in running the command; make will normally exit if execution of a command fails at any point. In the case of a target to cleanup, typically called "clean", one wants to remove any files generated by the build process, without exiting if they don't exist. By tagging the clean target PHONY, we prevent make expecting a file to be produced by that target. Note that in this particular case, the minus prefixing the command is redundant in the common case, the -f or "force" flag to rm will prevent rm exiting due to files not existing. It may exit with an error on other, unintended errors which may be worth stopping the build for.
helloworld: helloworld.o
        cc -o $@ $<
 
helloworld.o: helloworld.c
        cc -c -o $@ $<
 
.PHONY: clean
clean:
        -rm -f helloworld helloworld.o

Similar tools

There are also many tools which are like make, but which are not compatible with its makefile format. A comprehensive list is available on the [http://www.a-a-p.org/tools_build.html A-A-P website]. Some of the most notable are:
- ant, which is popular for Java development and uses an XML file format.
- cook, a very powerful tool with a C like syntax.
- dmake (distributed make), which is used in the build process for Sun products such as OpenOffice.org and Solaris.
- jam, which is a generally enhanced, ground-up tool which is similar to make.
- Apache Maven, which is very similar to ant.
- mk - developed originally for Plan 9, said to have "improved upon make by removing all the vowels from the name", and introducing a completely new syntax that many consider to be easier to read and more powerful. It has been ported to Unix as part of plan9port.
- MPW Make -- Developed for Mac OS Classic and loosely similar to but not compatible with Unix make; it was designed with somewhat more complicated dependency syntax to allow for resource files. Mac OS X comes with GNU make. Available as part of the Macintosh Programmer's Workshop as a free, unsupported download from Apple.
- [http://search.cpan.org/dist/Module-Build/ Module::Build], which is a Perl module for the building and installation of other Perl modules. It replaces the existing [http://search.cpan.org/dist/ExtUtils-MakeMaker/ ExtUtils::MakeMaker] module, which actually used make to do the building.
- NAnt, a tool similar to ant for the .NET framework
- SCons, which is based around the Python programming language, and integrates a broader set of features than make. It is based on a Perl build driver called Cons.

External links


- [http://www.gnu.org/software/make/manual/html_chapter/make_toc.html The GNU make manual]
- [http://hepwww.ph.qmw.ac.uk/sim/gcclinux.html?desc=Compiling+and+Running+C%2B%2B+programs+on+Linux&file=gcclinux.html Compiling and Running C++ programs on Linux] Category:Compiling tools ja:Make

Wikipedia:Categories

This article is a quick introduction to categories. For detailed technical information on how to use categories, see Help:Category. For guidelines on creating and organizing categories, see Wikipedia:Categorization. For everything you ever wanted to know about categories, see Wikipedia:Categorisation FAQ.

To the new reader:

A category is a list page which serves to classify topics; for example, the :category:science is meant to hold the individual topics, such as biology and geography. If you are new to Wikipedia, and you have just selected the :category:science for reading you may have been surprised to have seen two lists: #Subcategories of :category:science, such as :category:astronomy, and #Articles, such as the Wikipedia article on astronomy.
- The first list contains more specialized categories where you can find detailed articles not in the top level category. For example, :category:astronomy contains the articles astrophysics and binary stars. There are often multiple levels of categories and subcategories.
- When you are interested in a general article, look in the second list (#2 above) for an article title. If you can not find a desired article, it is often in a subcategory. When you are reading an article and want to find the general category to explore, look at the bottom of that article for a "Categories:" box listing all the categories to which the topic of the page belongs. The Categories: links are also on the category pages allowing you to navigate up and down the lists to find the information you need.

Example: How to add category links to an article pag