Home About us Products Services Contact us Bookmark
:: wikimiki.org ::
Markov Random Field

Markov random field

A Markov network, also called a Markov random field, is represented by:
- an undirected graph G=(V,E) , where each vertex v \in V represents a random variable and each edge e = \lbrace u,v \in E \rbrace represents a dependency between random variables u and v,
- a set of potential functions \Phi = \phi_ : A_ \rightarrow R^+, where each c_i = \lbrace i_1, i_2, ... i_ \rbrace represents a clique in G, and A_ is the set of all possible assignments to the random variables represented by v_, v_, ... v_. In other words, each clique has associated with it a function from possible assignments to all nodes to the nonnegative real numbers. The potential functions used in a Markov network do not necessarily have a probabilistic interpretation by themselves, but a higher value indicates a more probable assignment to the variables in a clique. The network is used to represent the joint distribution over all variables represented by nodes in the graph. The probability of an assignment v_1, v_2, ... v_ to all nodes in the network is given by \frac \prod_^ \phi_(v_,v_, ... v_) , where Z is a normalizing constant \sum_ \prod_^ \phi_(v_,v_, ... v_) summing over all possible assignments to all nodes in the network. The Markov blanket of a node v_i in a Markov network is defined to be every node with an edge to v_i , i.e. all v_j such that \lbrace v_i, v_j \rbrace \in E. Every node v in a Markov network is conditionally independent of every other node given the Markov blanket of v. A Markov network is similar to a Bayesian network in its representation of dependencies, but a Markov network can represent dependencies that a Bayesian network can not. A Bayesian network, for instance, can not represent cyclic dependencies, while a Markov network can. As in a Bayesian network, one may calculate the conditional distribution of a set of nodes V' = \lbrace v_1 ... v_k \rbrace given values to another set of nodes W' = \lbrace w_1 ... w_l \rbrace in the Markov network by summing over all possible assignments to u \notin V',W'; this is called exact inference. However, exact inference is a #P-complete problem, and thus computationally intractable. Approximation techniques such as Markov chain Monte Carlo and belief propagation are more feasible in practice. Category:Probability theory Category:Computer science Category:Artificial intelligence

Undirected graph

:This article just presents the basic definitions. For a broader view see graph theory. For another mathematical use of "graph", see graph of a function. graph of a function In mathematics and computer science a graph is the basic object of study in graph theory. Informally, a graph is a set of objects called vertices joined by links called edges. Typically, a graph is depicted as a set of dots (vertices, nodes) joined by lines (the edges). Depending on the application some edges can be directed.

Definitions

Definitions in graph theory vary in the literature. Here are the conventions used in this encyclopedia.

Undirected graph

rightAn undirected graph or graph G is an ordered pair G:=(V, E) with
- V, a set of vertices or nodes,
- E, a set of unordered pairs of distinct vertices, called edges or lines. The vertices belonging to an edge are called the ends, endpoints, or end vertices of the edge. V (and hence E) are usually taken to be finite sets, and many of the well-known results are not true (or are rather different) for infinite graphs because many of the arguments fail in the infinite case.

Directed graph

rightA directed graph or digraph G is an ordered pair G:=(V, A) with
- V, a set of vertices or nodes,
- A, a set of ordered pairs of vertices, called directed edges, arcs, or arrows. An edge e = (x, y) is considered to be directed from x to y; y is called the head and x is called the tail of the edge. A variation on this definition is the oriented graph, which is a graph (or multigraph; see below) with an orientation or direction assigned to each of its edges. A distinction between a directed graph and an oriented simple graph is that if x and y are vertices, a directed graph allows both (x, y) and (y, x) as edges, while only one is permitted in an oriented graph. A more fundamental difference is that, in a directed graph (or multigraph), the directions are fixed, but in an oriented graph (or multigraph), only the underlying graph is fixed, while the orientation may vary. A directed graph may or may not be allowed to have loops, that is, edges where the start and end vertices are the same. By definition, this is forbidden in an oriented simple graph. A directed acyclic graph, also called a dag or DAG, is a directed graph with no directed cycles. A quiver is sometimes said to be simply a directed graph, but in practice it is a directed graph with vector spaces attached to the vertices and linear transformations attached to the arcs.

Mixed graph

A mixed graph G is an ordered triple G := (V,E,A) with V, E and A defined as above.

Variations in the definitions

As defined above, edges of undirected graphs have distinct ends, and E and A are sets (with distinct elements, like all sets). Many applications require more general possibilities, but terminology varies. A loop is an edge (directed or undirected) with both ends the same; these may be permitted or not permitted according to the application. In this context, an edge with two different ends is called a link. Sometimes E and A are allowed to be multisets, so that there can be more than one edge between the same two vertices. Another way to allow multiple edges is to make E a set, independent of V, and to specify the endpoints of an edge by an incidence relation between V and E. The same applies to a directed edge set A, except that there must be two incidence relations, one for the head and one for the tail of each edge. The unqualified word "graph" might allow or disallow loops or multiple edges in the literature, according to the preferences of the author and the requirements of the particular topic. If it is intended to exclude multiple edges (and, in the undirected case, to exclude loops), the graph can be called simple. On the other hand, if it is intended to allow multiple edges (and sometimes loops), the graph can be called a multigraph. Sometimes the word pseudograph is used to indicate that both multiple edges and loops are allowed. In exceptional situations it is even necessary to have edges with only one end, called halfedges, or no ends (loose edges); see for example signed graphs.

Further definitions

:For more definitions see Glossary of graph theory. Two edges of a graph are called adjacent (sometimes coincident) if they share a common vertex. Similarly, two vertices are called adjacent if they share a common edge, that is they are joined by an edge. An edge and a vertex on that edge are called incident. The graph with only one vertex and no edges is called the trivial graph. A graph with only vertices and no edges is known as an edgeless graph, empty graph, or null graph (there is no consistency in the literature). The graph with no vertices and no edges is sometimes called the null graph or empty graph, but not all mathematicians allow this object. In a weighted graph or digraph, each edge is associated with some value, variously called its cost, weight, length or other term depending on the application; such graphs arise in many contexts, for example in optimal route problems such as the traveling salesman problem. Normally, the vertices of a graph, by their nature as elements of a set, are distinguishable. This kind of graph may be called vertex-labeled. However, for many questions it is better to treat vertices as indistinguishable; then the graph may be called unlabeled. (Of course, the vertices may be still distinguishable by the properties of the graph itself, e.g., by the numbers of incident edges). If vertices are indistinguishable they may be distinguished by giving each vertex a label, hence the name vertex-labeled graph. The same remarks apply to edges, so that graphs which have labeled edges are called edge-labeled graphs. Graphs with labels attached to edges or vertices are more generally designated as labeled. Consequently, graphs in which vertices are indistinguishable and edges are indistinguishable are called unlabelled. (Note that in the literature the term labeled may apply to other kinds of labeling, besides that which serves only to distinguish different vertices or edges.)

Examples

Image:6n-graf.png
The picture is a graphic representation of the following graph
- V:=
- E:= The fact that vertex 1 is adjacent to vertex 2 is sometimes denoted by 1 ~ 2.
- In category theory a category can be considered a directed multigraph with the objects as vertices and the morphisms as directed edges. The functors between categories are then some, but not necessarily all, of the digraph morphisms.
- In computer science directed graphs are used to represent finite state machines and many other discrete structures.
- A binary relation R on a set X is a simple directed graph. Two edges x,y of X are connected by an arrow if xRy.

Important graphs


- In a complete graph each pair of vertices is joined by an edge, that is, the graph contains all possible edges.
- A planar graph can be drawn in a plane (embedded in a plane) with no crossing edges.
- A tree is a connected graph with no cycles.
- Bipartite graphs
- Perfect graphs
- Cographs
- Cayley graphs
- The Petersen graph and its generalizations

Operations on graphs

There are several operations that produce new graphs from old ones.

Unary operations


- Line graph
- Dual graph
- Complement graph

Binary operations


- Cartesian product of graphs
- Tensor product of graphs
- Strong product of graphs
- Lexicographic product of graphs

Generalizations

In a hypergraph, an edge can join more than two vertices. An undirected graph can be seen as a simplicial complex consisting of 1-simplices (the edges) and 0-simplices (the vertices). As such, complexes are generalizations of graphs since they allow for higher-dimensional simplices. Every graph gives rise to a matroid, but in general the graph cannot be recovered from its matroid, so matroids are not truly generalizations of graphs. In model theory, a graph is just a structure. But in that case, there is no limitation on the number of edges: it can be any cardinal number.

See also


- Polygon
- Tiling
- Glossary of graph theory
- List of graph theory topics
- Graph (data structure)
- Graph drawing
- Important publications in graph theory
- Dual graph

External links


- [http://www.math.uni-hamburg.de/home/diestel/books/graph.theory/ Graph Theory online textbook]
- [http://www.utm.edu/departments/math/graph/ Graph theory tutorial]
- [http://www.cs.wpi.edu/~dobrush/cs507/presentation/2001/Project10/ppframe.htm Graph theory]
- [http://students.ceid.upatras.gr/~papagel/project/contents.htm Some graph theory algorithm animations]
  - Step through the algorithm to understand it.
- [http://www2.hig.no/~algmet/animate.html The compendium of algorithm visualisation sites]
- [http://www.spectster.com/ A search site for finding algorithm implementations, explanations and animations]
- [http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graph-benchmarks.htm Challenging Benchmarks for Maximum Clique, Maximum Independent Set, Minimum Vertex Cover and Vertex Coloring]
- [http://www.nd.edu/~networks/gallery.htm Image gallery no.1: Some real-life networks]
- [http://www.aisee.com/graphs/ Image gallery no.2: More real-life graphs]
- [http://people.freenet.de/Emden-Weinert/graphs.html Graph links collection]
- [http://ttt.upv.es/~arodrigu/grafos/index.htm Grafos spanish copyleft software]
- [http://www.ffconsultancy.com/products/ocaml_for_scientists/complete/ Source code for computing neighbor shells in particle systems under periodic boundary conditions]
- [http://jgaa.info/accepted/2004/BoyerMyrvold2004.8.3.pdf Edge Addition Planarity Algorithm] — Online version of a paper that describes the Boyer-Myrvold planarity algorithm.
- [http://jgaa.info/accepted/2004/BoyerMyrvold2004.8.3/planarity.zip Edge Addition Planarity Algorithm Source Code] — Free C source code for reference implementation of Boyer-Myrvold planarity algorithm, which provides both a combinatorial planar embedder and Kuratowski subgraph isolator. Category:Discrete mathematics Category:Graph theory Category:Algebraic graph theory Category:Topological graph theory ko:그래프 th:กราฟ (คณิตศาสตร์)



Clique (graph theory)

In graph theory, a clique in an undirected graph G, is a set of vertices V such that for every two vertices in V, there exists an edge connecting the two. This is equivalent to saying that the subgraph induced by V is a complete graph. The size of a clique is the number of vertices it contains. Unfortunately finding cliques within a graph can be a hard problem. Finding the largest clique in any graph (the clique problem) for example, is NP-complete. The opposite of a clique is an independent set, in the sense that every clique corresponds to an independent set in the complement graph. Category:Graph theory

Nonnegative

A negative number is a number that is less than zero, such as −3. A positive number is a number that is greater than zero, such as 3. Zero itself is neither negative nor positive, though in computing zero is sometimes treated as though it were a positive number. The non-negative numbers are the real numbers that are not negative (positive or zero). The non-positive numbers are the real numbers that are not positive (negative or zero). In the context of complex numbers positive implies real, but for clarity one may say "positive real number".

Negative numbers

Negative integers can be regarded as an extension of the natural numbers, such that the equation xy = z has a meaningful solution for all values of x and y. The other sets of numbers are then derived as progressively more elaborate extensions and generalizations from the integers. Negative numbers are useful to describe values on a scale that goes below zero, such as temperature, and also in bookkeeping where they can be used to represent debts. In bookkeeping, debts are often represented by red numbers, or a number in parentheses.

Non-negative numbers

A number is nonnegative if and only if it is greater than or equal to zero, i.e. positive or zero. Thus the nonnegative integers are all the integers from zero on upwards, and the nonnegative reals are all the real numbers from zero on upwards. A real matrix A is called nonnegative if every entry of A is nonnegative. A real matrix A is called totally nonnegative by matrix theorists or totally positive by computer scientists if the determinant of every square submatrix of A is nonnegative.

Sign function

It is possible to define a function sgn(x) on the real numbers which is 1 for positive numbers, −1 for negative numbers and 0 for zero (sometimes called the signum function): :\sgn(x)=\left\

Real numbers

In mathematics, the real numbers are intuitively defined as numbers that are in one-to-one correspondence with the points on an infinite line—the number line. The term "real number" is a retronym coined in response to "imaginary number". Real numbers may be rational or irrational; algebraic or transcendental; and positive, negative, or zero. Real numbers measure continuous quantities. They may in theory be expressed by decimal fractions that have an infinite sequence of digits to the right of the decimal point; these are often (mis-)represented in the same form as 324.823211247… The three dots indicate that there would still be more digits to come, no matter how many more might be added at the end. Measurements in the physical sciences are almost always conceived as approximations to real numbers. Writing them as decimal fractions (which are rational numbers that could be written as ratios, with an explicit denominator) is not only more compact, but to some extent conveys the sense of an underlying real number. The real numbers are the central object of study in real analysis. A real number is said to be computable if there exists an algorithm that yields its digits. Because there are only countably many algorithms, but an uncountable number of reals, most real numbers are not computable. Some constructivists accept the existence of only those reals that are computable. The set of definable numbers is broader, but still only countable. Computers can only approximate most real numbers with rational numbers; these approximations are known as floating point numbers or fixed-point numbers; see real data type. Computer algebra systems are able to treat some real numbers exactly by storing an algebraic description (such as "sqrt(2)") rather than their decimal approximation. Mathematicians use the symbol R (or alternatively, \Bbb , the letter "R" in blackboard bold) to represent the set of all real numbers. The notation Rn refers to an n-dimensional space of real numbers; for example, a value from R3 consists of three real numbers and specifies a location in 3-dimensional space. In mathematics, real is used as an adjective, meaning that the underlying field is the field of real numbers. For example real matrix, real polynomial and real Lie algebra.

History

Vulgar fractions had been used by the Egyptians around 1000 BC; around 500 BC, the Greek mathematicians led by Pythagoras realized the need for irrational numbers. Negative numbers were invented by Indian mathematicians around 600 AD, and then possibly reinvented in China shortly after. They were not used in Europe until the 1600s, but even in the late 1700s, Leonhard Euler discarded negative solutions to equations as unrealistic. The development of calculus in the 1700s used the entire set of real numbers without having defined them cleanly. The first rigorous definition was given by Georg Cantor in 1871.

Definition

Construction from the rational numbers

The real numbers can be constructed as a completion of the rational numbers. For details and other construction of real numbers, see construction of real numbers.

Axiomatic approach

Let R denote the set of all real numbers. Then:
- The set R is a field, meaning that addition and multiplication are defined and have the usual properties.
- The field R is ordered, meaning that there is a total order ≥ such that, for all real numbers x, y and z:
  - if xy then x + zy + z;
  - if x ≥ 0 and y ≥ 0 then xy ≥ 0.
- The order is Dedekind-complete, i.e., every non-empty subset S of R with an upper bound in R has a least upper bound (also called supremum) in R. The last property is what differentiates the reals from the rationals. For example, the set of rationals with square less than 2 has a rational upper bound (e.g., 1.5) but no rational least upper bound, because the square root of 2 is not rational. The real numbers are uniquely specified by the above properties. More precisely, given any two Dedekind complete ordered fields R1 and R2, there exists a unique field isomorphism from R1 to R2, allowing us to think of them as essentially the same mathematical object.

Properties

Completeness

The main reason for introducing the reals is that the reals contain all limits. More technically, the reals are complete (in the sense of metric spaces or uniform spaces, which is a different sense than the Dedekind completeness of the order in the previous section). This means the following: A sequence (xn) of real numbers is called a Cauchy sequence if for any ε > 0 there exists an integer N (possibly depending on ε) such that the distance |xn − xm| is less than ε provided that n and m are both greater than N. In other words, a sequence is a Cauchy sequence if its elements xn eventually come and remain arbitrarily close to each other. A sequence (xn) converges to the limit x if for any ε > 0 there exists an integer N (possibly depending on ε) such that the distance |xn − x| is less than ε provided that n is greater than N. In other words, a sequence has limit x if its elements eventually come and remain arbitrarily close to x. It is easy to see that every convergent sequence is a Cauchy sequence. An important fact about the real numbers is that the converse is also true: :Every Cauchy sequence of real numbers is convergent. That is, the reals are complete. Note that the rationals are not complete. For example, the sequence (1, 1.4, 1.41, 1.414, 1.4142, 1.41421, ...) is Cauchy but it does not converge to a rational number. (In the real numbers, in contrast, it converges to the square root of 2.) The existence of limits of Cauchy sequences is what makes calculus work and is of great practical use. The standard numerical test to determine if a sequence has a limit is to test if it is a Cauchy sequence, as the limit is typically not known in advance. For example, the standard series of the exponential function : \mathrm^x = \sum_^ \frac converges to a real number because for every x the sums : \sum_^ \frac can be made arbitrarily small by choosing N sufficiently large. This proves that the sequence is Cauchy, so we know that the sequence converges even if we do not know ahead of time what the limit is.

"The complete ordered field"

The real numbers are often described as "the complete ordered field", a phrase that can be interpreted in several ways. First, an order can be lattice-complete. It is easy to see that no ordered field can be lattice-complete, because it can have no largest element (given any element z, z + 1 is larger), so this is not the sense that is meant. Additionally, an order can be Dedekind-complete, as defined in the section Axioms. The uniqueness result at the end of that section justifies using the word "the" in the phrase "complete ordered field" when this is the sense of "complete" that is meant. This sense of completeness is most closely related to the construction of the reals from Dedekind cuts, since that construction starts from an ordered field (the rationals) and then forms the Dedekind-completion of it in a standard way. These two notions of completeness ignore the field structure. However, an ordered group (and a field is a group under the operations of addition and subtraction) defines a uniform structure, and uniform structures have a notion of completeness (topology); the description in the section Completeness above is a special case. (We refer to the notion of completeness in uniform spaces rather than the related and better known notion for metric spaces, since the definition of metric space relies on already having a characterisation of the real numbers.) It is not true that R is the only uniformly complete ordered field, but it is the only uniformly complete Archimedean field, and indeed one often hears the phrase "complete Archimedean field" instead of "complete ordered field". Since it can be proved that any uniformly complete Archimedean field must also be Dedekind complete (and vice versa, of course), this justifies using "the" in the phrase "the complete Archimedean field". This sense of completeness is most closely related to the construction of the reals from Cauchy sequences (the construction carried out in full in this article), since it starts with an Archimedean field (the rationals) and forms the uniform completion of it in a standard way. But the original use of the phrase "complete Archimedean field" was by David Hilbert, who meant still something else by it. He meant that the real numbers form the largest Archimedean field in the sense that every other Archimedean field is a subfield of R. Thus R is "complete" in the sense that nothing further can be added to it without making it no longer an Archimedean field. This sense of completeness is most closely related to the construction of the reals from surreal numbers, since that construction starts with a proper class that contains every ordered field (the surreals) and then selects from it the largest Archimedean subfield.

Advanced properties

The reals are uncountable; that is, there are strictly more real numbers than natural numbers, even though both sets are infinite. This is proved with Cantor's diagonal argument. In fact, the cardinality of the reals is 2ω, i.e., the cardinality of the set of subsets of the natural numbers. Since only a countable set of real numbers can be algebraic, almost all real numbers are transcendental. The non-existence of a subset of the reals with cardinality strictly between that of the integers and the reals is known as the continuum hypothesis. The continuum hypothesis can neither be proved nor be disproved; it is independent from the axioms of set theory. The real numbers form a metric space: the distance between x and y is defined to be the absolute value |x − y|. By virtue of being a totally ordered set, they also carry an order topology; the topology arising from the metric and the one arising from the order are identical. The reals are a contractible (hence connected and simply connected), separable metric space of dimension 1, and are everywhere dense. The real numbers are locally compact but not compact. There are various properties that uniquely specify them; for instance, all unbounded, continuous, and separable order topologies are necessarily homeomorphic to the reals. Every nonnegative real number has a square root in R, and no negative number does. This shows that the order on R is determined by its algebraic structure. Also, every polynomial of odd degree admits at least one root: these two properties make R the premier example of a real closed field. Proving this is the first half of one proof of the fundamental theorem of algebra. The reals carry a canonical measure, the Lebesgue measure, which is the Haar measure on their structure as a topological group normalised such that the unit interval [0,1] has measure 1. The supremum axiom of the reals refers to subsets of the reals and is therefore a second-order logical statement. It is not possible to characterize the reals with first-order logic alone: the Löwenheim-Skolem theorem implies that there exists a countable dense subset of the real numbers satisfying exactly the same sentences in first order logic as the real numbers themselves. The set of hyperreal numbers is much bigger than R but also satisfies the same first order sentences as R. Ordered fields that satisfy the same first-order sentences as R are called nonstandard models of R. This is what makes nonstandard analysis work; by proving a first-order statement in some nonstandard model (which may be easier than proving it in R), we know that the same statement must also be true of R.

Generalizations and extensions

The real numbers can be generalized and extended in several different directions. Perhaps the most natural extension are the complex numbers which contain solutions to all polynomial equations. However, the complex numbers are not an ordered field. Ordered fields extending the reals are the hyperreal numbers and the surreal numbers; both of them contain infinitesimal and infinitely large numbers and thus are not Archimedean. Occasionally, the two formal elements +∞ and −∞ are added to the reals to form the extended real number line, a compact space which is not a field but retains many of the properties of the real numbers. Self-adjoint operators on a Hilbert space (for example, self-adjoint square complex matrices) generalize the reals in many respects: they can be ordered (though not totally ordered), they are complete, all their eigenvalues are real and they form a real associative algebra. Positive-definite operators correspond to the positive reals and normal operators correspond to the complex numbers. Category:Elementary mathematics Category:Real numbers Category:Set theory ko:실수 ja:実数 th:จำนวนจริง

Joint distribution

Given two random variables X and Y, the joint distribution of X and Y is the distribution of X and Y together.

The discrete case

For discrete random variables, the joint probability mass function can be written as Pr(X = x & Y = y). This is :P(X=x\ \mathrm\ Y=y) = P(Y=y|X=x)P(X=x)= P(X=x|Y=y)P(Y=y).\; Since these are probabilities, we have :\sum_x \sum_y P(X=x\ \mathrm\ Y=y) = 1.\;

The continuous case

Similarly for continuous random variables, the joint probability density function can be written as fX,Y(xy) and this is :f_(x,y)=f_(y|x)f_X(x) = f_(x|y)f_Y(y) \; where fY|X(y|x) and fX|Y(x|y) give the conditional distributions of Y given X = x and of X given Y = y respectively, and fX(x) and fY(y) give the marginal distributions for X and Y respectively. Since this is a probability density, we have :\int_x \int_y f_(x,y) \; dy \; dx= 1.

Joint distribution of independent variables

If for discrete random variables \ P(X = x \ \mbox \ Y = y ) = P( X = x) \cdot P( Y = y) for all x and y, or for continuous random variables \ p_(x,y) = p_X(x) \cdot p_Y(y) for all x and y, then X and Y are said to be independent.

Multidimensional distributions

The joint distribution of two random variables can be extended to many random variables X1, ..., Xn by adding them sequentially with the identity :f_(x_1, \ldots, x_n) = f_( x_n | x_1, \ldots, x_) f_( x_1, \ldots, x_ ) .

See also


- Copula (statistics)

External links


- Category:Probability theory

Normalizing constant

The concept of a normalizing constant arises in probability theory and a variety of other areas of mathematics.

Definition and examples

In probability theory, a normalizing constant is a constant by which an everywhere nonnegative function must be multiplied in order that the area under its graph is 1, i.e. it is a probability density function or a probability mass function. For example, we have :\int_^\infty e^\,dx=\sqrt, so that : \varphi(x) = \frac e^ is a probability density function. This is the density of the standard normal distribution. (Standard, in this case, means the expected value is 0 and the variance is 1.) Similarly, :\sum_^\infty \frac=e^\lambda , and consequently :f(n)=\frac is a probability mass function on the set of all nonnegative integers. This is the probability mass function of the Poisson distribution with expected value λ. Note that if the probability density function is a function of various parameters, so too will be its normalizing constant. The parametrised normalizing constant for the Boltzmann distribution plays a central role in statistical mechanics. In that context, the normalizing constant is called the partition function.

Bayes' theorem

Bayes' theorem says that the posterior probability measure is proportional to the product of the prior probability measure and the likelihood function . Proportional to implies that one must multiply or divide by a normalizing constant in order to assign measure 1 to the whole space, i.e., to get a probability measure. In a simple discrete case we have :P(H_0|D) = \frac where P(H0) is the prior probability that the hypothesis is true; P(D|H0) is the conditional probability of the data given that the hypothesis is true, but given that the data are known it is the likelihood of the hypothesis (or its parameters) given the data; P(H0|D) is the posterior probability that the hypothesis is true given the data. P(D) should be the probability of producing the data, but on its own is difficult to calculate, so an alternative way to describe this relationship is as one of proportionality: :P(H_0|D) \sim P(D|H_0)P(H_0). Since P(H|D) is a probability, the sum over all possible (mutually exclusive) hypotheses should be 1, leading to the conclusion that :P(H_0|D) = \frac . In this case, the value :P(D)=\sum_i P(D|H_i)P(H_i) \; is the normalizing constant. It can be extended from countably many hypotheses to uncountably many by replacing the sum by an integral.

Non-probabilistic uses

The Legendre polynomials are characterized by orthogonality with respect to the uniform measure on the interval [− 1, 1] and the fact that they are normalized so that their value at 1 is 1. The constant by which one multiplies a polynomial in order that its value at 1 will be 1 is a normalizing constant. Orthonormal functions are normalized such that :\langle f_i , \, f_j\rangle = \, \delta_ with respect to some inner product <fg>. Category:Probability theory

Conditionally independent

In probability theory, two events A and B are conditionally independent given a third event C precisely if the occurrence or non-occurrence of A and B are independent events in their conditional probability distribution given C. In other words, :\Pr(A \cap B|C) = \Pr(A|C)\Pr(B|C).\, Two random variables X and Y are conditionally independent given an event C if they are independent in their conditional probability distribution given C. Two random variables X and Y are conditionally independent given a third random variable W if for any measurable set S of possible values of W, X and Y are conditionally independent given the event [WS]. Conditional independence of more than two events, or of more than two random variables, is defined analogously.

Uses in Bayesian statistics

Let p be the proportion of voters who will vote "yes" in an upcoming referendum. In taking an opinion poll, one chooses n voters randomly from the population. For i = 1, ..., n, let Xi = 1 or 0 according as the ith chosen voter will or will not vote "yes". In a frequentist approach to statistical inference one would not attribute any probability distribution to p (unless the probabilities could be somehow interpreted as relative frequencies of occurrence of some event or as proportions of some population) and one would say that X1, ..., Xn are independent random variables. By contrast, in a Bayesian approach to statistical inference, one would assign a probability distribution to p regardless of the non-existence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that p is in any interval to which a probability is assigned. In that model, the random variables X1, ..., Xn are not independent, but they are conditionally independent given the value of p. In particular, if a large number of the Xs are observed to be equal to 1, that would imply a high conditional probability, given that observation, that p is near 1, and thus a high conditional probability, given that observation, that the next X to be observed will be equal to 1.

See also

de Finetti's theorem Category:Probability theory

Bayesian network

A Bayesian network or Bayesian belief network or just belief network is a directed graph of nodes representing variables and arcs representing dependence relations among the variables. If there is an arc from node A to another node B, then we say that A is a parent of B. If a node has a known value, it is said to be an evidence node. A node can represent any kind of variable, be it an observed measurement, a parameter, a latent variable, or a hypothesis. Nodes are not restricted to representing random variables; this is what is "Bayesian" about a Bayesian network. A Bayesian network is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be X(1), ..., X(n). Let parents(A) be the parents of the node A. Then the joint distribution for X(1) through X(n) is represented as the product of the probability distributions \Pr[X(i)\mid\operatorname(X(i))] for i = 1 to n. If X has no parents, its probability distribution is said to be unconditional, otherwise it is conditional. Questions about dependence among variables can be answered by studying the graph alone. It can be shown that the graphical notion called d-separation corresponds to the notion of conditional independence: if nodes X and Y are d-separated (given specified evidence nodes), then variables X and Y are independent given the evidence variables. The set of all other nodes that node X can directly depend on is given by X's Markov blanket. In order to fully specify the Bayesian network and to carry out numerical calculations, it is necessary to further specify for each node X the probability distribution for X conditional on its parents. The distribution of X given its parents may have any form. However, it is common to work with discrete or Gaussian distributions, since that simplifies calculations. Sometimes only constraints on a distribution are known; one commonly uses the principle of maximum entropy to specify the distribution with the greatest information entropy given these constraints. In this way the distribution makes the fewest additional assumptions beyond what is known to be true as expressed by the constraints. (Analogously, in the specific context of a dynamic Bayesian network, one commonly specifies the conditional distribution for the hidden state's temporal evolution to maximize the entropy rate of the implied stochastic process.) Often these conditional distributions depend on unknown parameters which must be estimated from data, ideally using maximum likelihood. However, the likelihood maximization is often intractable for many problems, leading to the use of iterative approximation techniques, the most widespread being the expectation-maximization algorithm. The goal of inference is typically to find the conditional distribution of a subset of the variables, conditional on known values for some other subset (the evidence), and integrating over any other variables. This conditional distribution is known as the posterior distribution of the subset of the variables given the evidence. The posterior gives a universal sufficient statistic for detection applications, when one wants to choose values for the variable subset which minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically constructing extensions of Bayes' theorem to more complex problems. The most common exact inference methods are variable elimination that eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the product, clique tree propagation that caches the computation so that the many variables can be queried at one time, and new evidence can be propagated quickly, recursive conditioning that allows for a space-time tradeoff but still allowing for the efficiency of variable elimination when enough space is used - all of these methods have complexity that is exponential in tree width. The most common approximate inference algorithms are stochastic simulation, mini-bucket elimination (which generalizes loopy belief propagation) and variational methods. Bayesian networks are used for modelling knowledge in gene regulatory networks, medicine, engineering, text analysis, image processing, data fusion, and decision support systems. Learning the structure of a Bayesian network is a very important part of machine learning. Given the information that the data is being generated by a Bayesian network and that all the variables are visible in every iteration, the following methods are used to learn the structure of the acyclic graph and the conditional probability table associated with it. The elements of a structure finding algorithm are a scoring function and a search strategy. An exhaustive search returning back a structure that maximizes the score is one implementation which is superexponential in the number of variables. A local search algorithm makes incremental changes aimed at improving the score of the structure. A global search algorithm like Markov chain Monte Carlo can avoid getting trapped in local minima. Friedman et. al. talk about using mutual information between variables and finding a structure that maximizes this. They do this by restricting the parent candidate set to k nodes and exhaustively searching therein.

See also


- Bayesian statistics
- graphical model
- machine learning

References


- Eugene Charniak [http://www.cs.ubc.ca/~murphyk/Bayes/Charniak_91.pdf Bayesian Networks Without Tears]. This is a nice intro for non-specialists.
- Kevin Murphy. An introduction to graphical models. 2001. http://www.ai.mit.edu/~murphyk/Papers/intro_gm.pdf
- Judea Pearl, Stuart Russell. Bayesian Networks. UCLA Cognitive Systems Laboratory, Technical Report (R-277), November 2000.
- Judea Pearl, Stuart Russell. Bayesian Networks, in M. A. Arbib (Ed.), Handbook of Brain Theory and Neural Networks, pp. 157–160, Cambridge, MA: MIT Press, 2003, ISBN 0262011972.
- Enrique Castillo, José Manuel Gutiérrez, and Ali S. Hadi. Expert Systems and Probabilistic Network Models. New York: Springer-Verlag, 1997. ISBN 0-387-94858-9
- Judea Pearl. Fusion, propagation, and structuring in belief networks. Artificial Intelligence 29(3):241–288, 1986.
- J.W. Comley and [http://www.csse.monash.edu.au/~dld D.L. Dowe], "Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages", [http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10478&mode=toc chapter 11] (pp265–294) in P. Grunwald, M.A. Pitt and I.J. Myung (eds)., [http://mitpress.mit.edu/catalog/item/default.asp?sid=4C100C6F-2255-40FF-A2ED-02FC49FEBE7C&ttype=2&tid=10478 Advances in Minimum Description Length: Theory and Applications], Cambridge, MA: MIT Press, April 2005, ISBN 0262072629. (This paper puts decision trees in internal nodes of Bayes networks using [http://www.csse.monash.edu.au/~dld/MML.html Minimum Message Length] (MML). An earlier version is [http://www.csse.monash.edu.au/~dld/Publications/2003/Comley+Dowe03_HICS2003.ref Comley and Dowe (2003)], [http://www.csse.monash.edu.au/~dld/Publications/2003/Comley+Dowe03_HICS2003_GeneralBayesianNetworksAsymmetricLanguages.pdf .pdf].)
- Christian Borgelt and Rudolf Kruse. [http://fuzzy.cs.uni-magdeburg.de/books/gm/ Graphical Models – Methods for Data Analysis and Mining], Chichester, UK: Wiley, 2002, ISBN 0-470-84337-3
- [http://www.cs.ust.hk/faculty/lzhang/bio.html Nevin Lianwen Zhang] and [http://www.cs.ubc.ca/spider/poole/ David Poole], [http://www.cs.ubc.ca/spider/poole/papers/canai94.pdf A simple approach to Bayesian network computations], Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference (AI-94), Banff, May 1994, 171-178. This paper presents variable elimination for belief networks. Category:Bayesian networks Category:Networks


Sharp-P-complete

#P-complete, pronounced "sharp P complete", is a complexity class in complexity theory. A problem is #P-complete if and only if it is in #P, and every problem in #P can be reduced to it in polynomial time. Examples of #P-complete problems include:
- How many different variables assignments will satisfy a given DNF formula?
- What is the permanent of a given matrix?
- How many perfect matchings are there for a given bipartite graph? It is believed that there is no polynomial-time algorithm for solving #P-complete problems, since this would imply P = NP. If it is difficult to solve exactly, then can it even be approximated? No deterministic algorithm is known that can even find the approximate answer to within some reasonable error bound. However, there are probabilistic algorithms that return good approximations to some #P-complete problems with high probability. This is one of the most striking demonstrations of the power of probabilistic algorithms. It is surprising that some #P-complete problems correspond to easy P problems. The third example problem above is in this category. The question "Is there a perfect matching for a given bipartite graph?" can be answered in polynomial time. In fact, for a graph with V vertices and E edges, it can be answered in O(VE) time. The corresponding question "how many perfect matchings does the given bipartite graph have?" is #P-complete.

See also


- Proof that Permanent is #P-complete Category:Complexity classes


Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) methods, sometimes called random walk Monte Carlo methods, are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its stationary distribution. The state of the chain after a large number of steps is then used as a sample from the desired distribution. The quality of the sample improves as a function of the number of steps. Usually it is not hard to construct a Markov Chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error. A good chain will have rapid mixing---the stationary distribution is reached quickly starting from an arbitrary position. Tools for proving rapid mixing include arguments based on conductance and the coupling method. Typical use of MCMC sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated MCMC-based algorithms such as coupling from the past can produce exact samples, at the cost of additional computation and an unbounded (though finite on average) running time. The most common application of these algorithms is numerically calculating multi-dimensional integrals. In these methods, an ensemble of "walkers" moves around randomly. At each point where the walker steps, the integrand value at that point is counted towards the integral. The walker then may make a number of tentative steps around the area, looking for a place with reasonably high contribution to the integral to move into next. Random walk methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are correlated. A Markov chain is constructed in such a way as to have the integrand as its equilibrium distribution. Surprisingly, this is often easy to do.

Overview

These Markov chain Monte Carlo methods are ones where the direction the walker is likely to move depends only on where the walker is, and what the function value is in the area. These methods are easy to implement and analyse, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered. This problem is called "slow mixing". More sophisticated algorithms use some method of preventing the walker from doubling back. For example, in "self avoiding walk" or SAW routines, the walker remembers where it has been before (at least for a few steps), and avoids stepping on those locations again. These algorithms are harder to implement, but may exhibit faster convergence (i.e. fewer steps for an accurate result). Various statistical problems can occur — for example, what happens when a walker paints itself into a corner? Multi-dimensional integrals often arise in Bayesian statistics and computational physics, so random walk Monte Carlo methods are widely used in those fields.

Random walk algorithms


- Metropolis-Hastings algorithm: Generates a random walk using a proposal density and a method for rejecting proposed moves.
- Gibbs sampling: Requires that all the conditional distributions of the target distribution can be sampled from exactly. Gibbs sampling has the advantage that it does not display random walk behaviour. However, it can run into problems when variables are strongly correlated. When this happens, a technique called simultaneous over-relaxation can be used.
- Hybrid Markov chain Monte Carlo: Tries to avoid random walk behaviour by introducing an auxiliary momentum vector and implementing Hamiltonian dynamics where the potential function is the target density. The momentum samples are discarded after sampling. The end result of Hybrid MCMC is that proposals move across the sample space in larger steps and are therefore less correlated and converge to the target distribution more rapidly.
- Slice sampling: Depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. This method alternates uniform sampling in the vertical direction with uniform sampling from the horizontal `slice' defined by the current vertical position.
- Reversible Jump

References


- Bernd A. Berg. "Markov Chain Monte Carlo Simulations and Their Statistical Analysis". Singapore, World Scientific 2004.
- George Casella and Edward I. George. "Explaining the Gibbs sampler". The American Statistician, 46:167-174, 1992. (Basic summary and many references.)
- A.E. Gelfand and A.F.M. Smith. "Sampling-Based Approaches to Calculating Marginal Densities". J. American Statistical Association, 85:398-409, 1990.
- Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. London: Chapman and Hall. First edition, 1995. (See Chapter 11.)
- S. Geman and D. Geman. "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
- C.P. Robert and G. Casella. "Monte Carlo Statistical Methods" (second edition). New York: Springer-Verlag, 2004. Category:Monte Carlo method

Category:Probability theory

Category:Probability and statistics Category:Measure theory ja:Category:確率論

Category:Artificial intelligence

Category:Computer science Category:Cognitive science Category:Cybernetics Category:Neuroscience Category:Interdisciplinary fields Category:Futurology Category:Consciousness studies ko:분류:인공 지능 ja:Category:人工知能 th:Category:ปัญญาประดิษฐ์

Road Atlanta

Road Atlanta is a 2.54-mile Grand Prix road course located in Braselton, Georgia. The road course has 12 challenging turns, including the famous "esses" series of turns coming off of the front straight. The track is owned by Panoz Motorsports, and is the home to Petit Le Mans, a 10 hour or 1000 mile endurance race of the American Le Mans Series. Category:Atlanta sports Category:Motor racing venues in the United States

witaminy zawory metalowe spalanie kalorii darmowe statystyki gry zrcznociowe










































:: RELATED NEWS ::
Medicit
Medici-suku oli firenzeläinen mahtisuku 1200-luvulta 1600-luvulle. Suku tuotti mm. kolme paavia, lukuisia Firenzen hallitsijoita, ja myöhemmin Ranskan kuningashuoneen jäseniä. Suku oli lähtöisin vaatimattomista oloista. Nimen Medici alkuperä ei ole varma, mutta se saattaa viitata apteekkialaan. Erään tarinan mu
Kierteissumu
Spiraaligalaksi on eräs Hubblen luokittelun mukainen galaksityyppi, jonka silmiinpistävin ominaisuus on spiraalimainen rakenne. Spiraaligalakseja arvellaan olevan noin 30 % kaikista galakseista. Kotigalaksimme Linnunrata on spiraaligalaksi.

Rakenne

Linnunrata Valtaosa spiraaligalaksin tähdistä kiertää galaksin keskusta sama
Sauvaspiraaligalaksi
Sauvaspiraaligalaksi eli tyypin SB galaksi on galaksi, jossa havaitaan spiraalihaarojen eli kierteishaarojen lisäksi sauvoja. Linnunradan uskotaan nykyisin olevan sauvaspiraaligalaksi tyyppiä SBbc. Esim NGC 1300 on litistyneen S:n muotoinen sauvaspiraaligalaksi. Sauvamainen rakenne ympäröi sauvaspiraaligalaksin elliptistä ydintä. sauvaspiraalit luokitellaan laajuutensa mukaan alaluokkiin SBa, SBb ja SBc. SBa on suppein, SBc on laajin. SBm-galaksit ovat kääpiösauvaspiraaligalakseja. Linnunrataa
Tykistöohjus
Tykistöohjus on epäsuoraa tulta ampuva tykistöase. Se on taktinen ohjus, jonka kantama on muutamia satoja kilometrejä (165–300 km). Tykistöohjuksen tyypillisiä maaleja ovat tavanomaisen tykistön kantaman ulkopuolelle jäävät vihollisen ryhmitysalueet, kauaskantoinen tykistö, sekä muut tärkeät maalit. Tykistöohjus eroaa tykistöraketista kahdella tapaa. Ensinnäkin, ohjuksen mootto
Blasaarit
Blasaari on kvasaaria muistuttava aktiivinen galaksi. Näissä galaksista lähtevä suihku suuntautuu meitä kohti tai lähes meitä kohti. Blasaarit jaetaan optisesti voimakkaasti muuttuviin (OVV) tyypin kvasaareihin ja BL Lacertae-kohteisiin. On myös löydetty mutamia kohteita, joiden ominaisuudet ovat näiden väliltä. blasaarien erikoispiirteitä ovat suuret kirkkaudet, ylivalonnopeudet, hyvin nopeat valonvaihtelut, suuri valon polarisaatio tavallisiin kvasaareihin verrattuna. Blasaareja
Epäsäännöllinen galaksi
Epäsäännöllinen galaksi (tyypin I ja Irr galaksi) on galaksi, jonka muoto on epämääräinen. Epäsäännöllinen galaksi ei ole spiraaligalaksi, sauvaspiraaligalaksi, elliptinen galaksi tai linssimäinen galaksi eli mikään
Sininen jättiläinen
Sininen jättiläinen on kuuma, sininen tai sinivalkoinen, nuori massiivinen tähti. Sinisen jättiläisen massa on yli 18 aurinkoa. Spektriluokka on O tai B. Absoluuttinen kirkkaus -5 tai -6 tai yli. Pintalämpötila yli 20.000 astetta tuottaa suuren osan säteilystä Linnunradassa oleva kuuman kaasun alue, jollaisia on galaksimme kiekossa kylmemmän tähtienvälisen kaasun sisässä. Kuplien kaasu on hiukkasten liikkeestä päätellen hyvin kuumaa mutta niin harvaa, ettei se käytännössä lämmitä. Aurinko sijaitsee paikallisessa kuumassa kuplassa eli Silmukka I:ssa. Kuumat kuplat syntyvät tähtien räjähtäessä supernovina sekä myös nuorten tähti
Silmukka I
Paikallinen kuuma kupla tai Paikallinen kuuma savupiippu on kuumaa kaasua sisältävä supernovajäännös, jonka sisällä Aurinkokin on. Kupla on syntynyt tähden räjähtäessä ja myös suurten tähtien tähtituulten vaikutuksista. Molemmat sinkoavat kaasuhiukkasia suurella nopeudella
All Rights Reserved 2005 wikimiki.org