Multivariate normal distribution
In probability theory and statistics, a multivariate normal distribution, also sometimes called a multivariate Gaussian distribution, is a probability distribution that generalizes the one-dimensional normal distribution (also called a Gaussian distribution) to higher dimensions.
General case
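A random vector $X = (X_1, \dots, X_N)^T$ follows a multivariate normal distribution if it satisfies the following equivalent conditions:
- every linear combination $Y = a_1 X_1 + \cdots + a_N X_N$ is normally distributed
- there is a random vector $Z = (Z_1, \dots, Z_M)^T$, whose components are independent standard normal random variables, a vector $\mu = (\mu_1, \dots, \mu_N)^T$ and an $N \times M$ matrix $A$ such that $X = AZ + \mu$.
- there is a vector $\mu$ and a symmetric, positive semi-definite matrix $\Sigma$ such that the characteristic function of $X$ is
::$\phi_X(u) = \exp\left(i \mu^T u - \tfrac{1}{2} u^T \Sigma u\right)$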
The following is not quite equivalent to the conditions above, since it fails to allow for a singular matrix as the variance:
- there is a vector $\mu$ and a symmetric, positive definite covariance matrix $\Sigma$ (an $N \times N$ matrix) such that $X$ has density
:$f_X(x) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
where $|\Sigma|$ is the determinant of $\Sigma$.
Note how the equation above reduces to that of the univariate normal distribution if $\Sigma$ is a scalar (i.e., a real number).
The vector $\mu$ in these conditions is the expected value of $X$ and the matrix $\Sigma = A A^T$ is the covariance matrix of the components $X_i$.
It is important to realize that the covariance matrix must be allowed to be singular.
That case arises frequently in statistics; for example, in the distribution of the vector of residuals in ordinary linear regression problems.
Note also that the $X_i$ are in general not independent; they can be seen as the result of applying the linear transformation $A$ to a collection of independent Gaussian variables $Z$.
The multivariate normal can be written in the following notation:
:$X \sim N(\mu, \Sigma)$
or, to make it explicit that $X$ is $N$-dimensional,
:$X \sim N_N(\mu, \Sigma)$
Cumulative distribution function
The cumulative distribution function (cdf) $F(x)$ is defined as the probability that all values in a random vector $X$ are less than or equal to the corresponding values in vector $x$. Though there is no closed form for $F(x)$, there are a number of algorithms that estimate it numerically. For one such example, see [http://alex.strashny.org/a/Multivariate-normal-cumulative-distribution-function-(cdf)-in-MATLAB.html] (includes MATLAB code).
A counterexample
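The fact that two random variables X and Y are each normally distributed does not imply that the pair (X, Y) has a bivariate normal distribution. A simple example is one in which X is normally distributed with expected value 0, and Y = X if |X| > 1 and Y = −X if |X| < 1. By symmetry Y has the same normal distribution as X, yet X + Y equals 0 with a probability strictly between 0 and 1, so this linear combination is not normally distributed.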
If X and Y are normally distributed and independent, then they are "jointly normally distributed", i.e., the pair (X, Y) does have a bivariate normal distribution. There are of course also many bivariate normal distributions in which the components are correlated.
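The counterexample above can be checked by simulation. The following is a minimal sketch that assumes X is standard normal; the function name `counterexample_pair`, the seed and the sample size are illustrative choices, not part of the original text.

```python
import random

def counterexample_pair(rng):
    """Draw (X, Y) with X standard normal, Y = X if |X| > 1, else Y = -X."""
    x = rng.gauss(0.0, 1.0)
    y = x if abs(x) > 1 else -x
    return x, y

rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
pairs = [counterexample_pair(rng) for _ in range(100_000)]

# X + Y is exactly 0 whenever |X| <= 1 and equals 2X otherwise.
# A non-constant normal variable cannot put positive mass on a single point,
# so the linear combination X + Y is not normal.
zero_fraction = sum(1 for x, y in pairs if x + y == 0.0) / len(pairs)
print(zero_fraction)
```

The printed fraction should be close to P(|X| < 1), roughly 0.68, even though X and Y are each marginally normal.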
Bivariate case
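In the 2-dimensional nonsingular case, the probability density function (with mean (0,0)) is
:$f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)} \left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2 \rho x y}{\sigma_x \sigma_y}\right)\right)$
where $\rho$ is the correlation between $X$ and $Y$, and $\sigma_x$ and $\sigma_y$ are their standard deviations.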
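As an illustration, the zero-mean bivariate density can be evaluated directly. This is a sketch under the standard parametrization by the standard deviations and the correlation; the function names `bivariate_normal_pdf` and `normal_pdf` are hypothetical helpers introduced here.

```python
import math

def bivariate_normal_pdf(x, y, sigma_x, sigma_y, rho):
    """Zero-mean bivariate normal density; requires |rho| < 1 (nonsingular case)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    one_minus_rho2 = 1.0 - rho * rho
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2)
    quad = (x * x / sigma_x ** 2
            + y * y / sigma_y ** 2
            - 2.0 * rho * x * y / (sigma_x * sigma_y))
    return math.exp(-quad / (2.0 * one_minus_rho2)) / norm

def normal_pdf(x, sigma):
    """Zero-mean univariate normal density, used only for comparison."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

# With rho = 0 the joint density factors into the product of the marginals.
joint = bivariate_normal_pdf(0.3, -1.2, 1.5, 0.7, 0.0)
product = normal_pdf(0.3, 1.5) * normal_pdf(-1.2, 0.7)
```

When the correlation is zero, `joint` and `product` agree up to rounding, which matches the statement that uncorrelated jointly normal components are independent.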
Linear transformation
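If $Y = BX$ is a linear transformation of $X$ where $B$ is an $M \times N$ matrix then $Y$ has a multivariate normal distribution with expected value $B\mu$ and variance $B \Sigma B^T$ (i.e., $Y \sim N(B\mu, B \Sigma B^T)$).
Corollary: any subset of the $X_i$ has a marginal distribution that is also multivariate normal.
To see this consider the following example: to extract the subset $(X_1, X_2, X_4)^T$, use
:$B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 0 & 0 & 1 & 0 & \ldots & 0 \end{bmatrix}$
which extracts the desired elements directly.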
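The selection-matrix argument can be sketched in code. The mean vector and covariance matrix below are made-up illustrative values (the covariance is symmetric and diagonally dominant, hence positive definite), and `matmul`, `transpose` and `selection_matrix` are hypothetical helpers.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def selection_matrix(indices, n):
    """Matrix with one row per selected index; each row picks one coordinate."""
    return [[1 if j == i else 0 for j in range(n)] for i in indices]

mu = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # column vector, illustrative values
Sigma = [[4, 1, 0, 2, 0],
         [1, 3, 1, 0, 0],
         [0, 1, 5, 1, 1],
         [2, 0, 1, 6, 0],
         [0, 0, 1, 0, 2]]

# Select the first, second and fourth components (0-based indices 0, 1, 3).
B = selection_matrix([0, 1, 3], 5)
marginal_mean = matmul(B, mu)                        # B mu
marginal_cov = matmul(matmul(B, Sigma), transpose(B))  # B Sigma B^T
```

The result is simply the corresponding entries of the mean and the corresponding rows and columns of the covariance matrix.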
Correlations and independence
In general, random variables may be uncorrelated but highly dependent. But if a random vector has a multivariate normal distribution then any two or more of its components that are uncorrelated are independent. This implies that any two or more of its components that are pairwise independent are independent.
But it is not true that two random variables that are (separately, marginally) normally distributed and uncorrelated are independent. Two random variables that are normally distributed may fail to be jointly normally distributed, i.e., the vector whose components they are may fail to have a multivariate normal distribution. For an example of two normally distributed random variables that are uncorrelated but not independent, see normally distributed and uncorrelated does not imply independent.
Conditional distributions
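If $\mu$ and $\Sigma$ are partitioned as follows
:$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ with sizes $\begin{bmatrix} q \times 1 \\ (N - q) \times 1 \end{bmatrix}$
:$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$ with sizes $\begin{bmatrix} q \times q & q \times (N - q) \\ (N - q) \times q & (N - q) \times (N - q) \end{bmatrix}$
then the distribution of $x_1$ conditional on $x_2 = a$ is multivariate normal, $(X_1 \mid X_2 = a) \sim N(\bar{\mu}, \bar{\Sigma})$, where
:$\bar{\mu} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (a - \mu_2)$
and covariance matrix
:$\bar{\Sigma} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$
This matrix is the Schur complement of $\Sigma_{22}$ in $\Sigma$.
Note that knowing the value of $x_2$ to be $a$ alters the variance; perhaps more surprisingly, the mean is shifted by $\Sigma_{12} \Sigma_{22}^{-1} (a - \mu_2)$; compare this with the situation of not knowing the value of $a$, in which case $x_1$ would have distribution
$N_q(\mu_1, \Sigma_{11})$.
The matrix $\Sigma_{12} \Sigma_{22}^{-1}$ is known as the matrix of regression coefficients.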
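In the bivariate case every block is a scalar, so the conditional mean and variance reduce to ordinary arithmetic. The sketch below assumes that special case; `conditional_normal_2d` and the numbers used are illustrative only.

```python
def conditional_normal_2d(mu1, mu2, s11, s12, s22, a):
    """Mean and variance of X1 given X2 = a when (X1, X2) is bivariate normal.

    s11 and s22 are the variances, s12 the covariance; s22 must be positive.
    """
    if s22 <= 0:
        raise ValueError("the variance of X2 must be positive")
    coef = s12 / s22                    # regression coefficient Sigma_12 / Sigma_22
    cond_mean = mu1 + coef * (a - mu2)  # mean shifted by coef * (a - mu2)
    cond_var = s11 - coef * s12         # Schur complement of Sigma_22
    return cond_mean, cond_var

cond_mean, cond_var = conditional_normal_2d(
    mu1=1.0, mu2=2.0, s11=4.0, s12=1.5, s22=3.0, a=4.0)
```

Here the conditional variance (3.25) is smaller than the unconditional variance of X1 (4.0), and it does not depend on the observed value a.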
Fisher information matrix
The Fisher information matrix (FIM) for a normal distribution takes a special formulation.
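The $(m, n)$ element of the FIM for $X \sim N(\mu(\theta), \Sigma(\theta))$ is
:$\mathcal{I}_{m,n} = \frac{\partial \mu^T}{\partial \theta_m} \Sigma^{-1} \frac{\partial \mu}{\partial \theta_n} + \frac{1}{2} \operatorname{tr}\left(\Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_m} \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_n}\right)$
where
- $\frac{\partial \mu}{\partial \theta_m} = \left[\frac{\partial \mu_1}{\partial \theta_m}, \ldots, \frac{\partial \mu_N}{\partial \theta_m}\right]^T$
- $\frac{\partial \mu^T}{\partial \theta_m} = \left(\frac{\partial \mu}{\partial \theta_m}\right)^T$
- $\frac{\partial \Sigma}{\partial \theta_m}$ is the $N \times N$ matrix whose $(i, j)$ entry is $\frac{\partial \Sigma_{i,j}}{\partial \theta_m}$
- $\operatorname{tr}$ is the trace function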
Kullback-Leibler divergence
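The Kullback-Leibler divergence from $N_0(\mu_0, \Sigma_0)$ to $N_1(\mu_1, \Sigma_1)$ is:
:$D_{KL}(N_0 \| N_1) = \frac{1}{2} \left(\ln \frac{\det \Sigma_1}{\det \Sigma_0} + \operatorname{tr}\left(\Sigma_1^{-1} \Sigma_0\right) + (\mu_1 - \mu_0)^T \Sigma_1^{-1} (\mu_1 - \mu_0) - N\right)$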
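For intuition, the one-dimensional case of the divergence between two normal distributions has the well-known closed form ln(σ1/σ0) + (σ0² + (μ1 − μ0)²) / (2σ1²) − 1/2. The sketch below evaluates only that univariate special case; `kl_univariate_normal` is a hypothetical helper name.

```python
import math

def kl_univariate_normal(mu0, sigma0, mu1, sigma1):
    """Kullback-Leibler divergence from N(mu0, sigma0^2) to N(mu1, sigma1^2)."""
    if sigma0 <= 0 or sigma1 <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu1 - mu0) ** 2) / (2.0 * sigma1 ** 2)
            - 0.5)

same = kl_univariate_normal(0.0, 1.0, 0.0, 1.0)      # identical distributions
shifted = kl_univariate_normal(0.0, 1.0, 1.0, 1.0)   # unit shift in the mean
forward = kl_univariate_normal(0.0, 1.0, 0.0, 2.0)
backward = kl_univariate_normal(0.0, 2.0, 0.0, 1.0)
```

The divergence is zero for identical distributions and is not symmetric in its two arguments, as `forward` and `backward` show.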
Estimation of parameters
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle and elegant. See estimation of covariance matrices.
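In short, the pdf is
:$f(x) = (2\pi)^{-N/2} \det(\Sigma)^{-1/2} \exp\left(-\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
and the ML estimator of the covariance matrix from $n$ independent observations $x_1, \dots, x_n$ with sample mean $\bar{x}$ is
:$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$
which is simply the sample covariance matrix.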
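A minimal sketch of the maximum-likelihood estimate, written with plain lists for clarity; `ml_covariance` and the three sample points are illustrative assumptions. Note that the ML estimator divides by n rather than by n − 1.

```python
def ml_covariance(samples):
    """Return (sample mean, ML covariance estimate) for a list of equal-length vectors."""
    n = len(samples)
    d = len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / n
            for j in range(d)] for i in range(d)]
    return mean, cov

samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]  # made-up data on the line y = 2x
mean, cov = ml_covariance(samples)
```

Because the illustrative points lie exactly on a line, the estimated covariance matrix is singular, which is one simple way a singular covariance matrix can arise in practice.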
Generating values drawn from the distribution
To generate values from a multivariate normal distribution given μ and A such that X = AZ + μ as detailed above, simply generate a suitable vector of independent standard normal values Z using for example the Box-Muller transform, and apply the foregoing equation.
In order to draw a random vector $X$ from the $N$-dimensional multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ (required to be symmetric and positive definite), one proceeds as follows: First, compute the Cholesky decomposition (matrix square root) of $\Sigma$, that is, find the unique lower triangular matrix $A$ with positive diagonal entries such that $A A^T = \Sigma$. Second, let $Z = (z_1, \dots, z_N)^T$ be a vector whose components are $N$ independent standard normal variates. Then compute $X$ as $\mu + AZ$.
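The two-step procedure can be sketched as follows. This is an illustrative pure-Python version, not a tuned implementation; `cholesky` and `sample_mvn` are hypothetical helper names, and the standard normal variates come from the standard library's `random.gauss` rather than an explicit Box-Muller step.

```python
import math
import random

def cholesky(S):
    """Lower triangular A with A A^T = S, for symmetric positive definite S."""
    n = len(S)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                d = S[i][i] - partial
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                A[i][j] = math.sqrt(d)
            else:
                A[i][j] = (S[i][j] - partial) / A[j][j]
    return A

def sample_mvn(mu, S, rng):
    """Draw one vector X = mu + A Z with Z independent standard normal."""
    A = cholesky(S)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(A[i][k] * z[k] for k in range(i + 1))
            for i in range(len(mu))]

Sigma = [[4.0, 2.0], [2.0, 3.0]]   # illustrative covariance matrix
A = cholesky(Sigma)
x = sample_mvn([1.0, -2.0], Sigma, random.Random(42))
```

For a singular covariance matrix the Cholesky step above raises an error; in that case one would need a different factor A, as in the general condition X = AZ + μ.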
Statistics
Statistics is a broad mathematical discipline which studies ways to collect, summarize and draw conclusions from data. It is applicable to a wide variety of academic disciplines from the physical and social sciences to the humanities, as well as to business, government, and industry.
Once data is collected, either through a formal sampling procedure or by recording responses to treatments in an experimental setting (cf experimental design), or by repeatedly observing a process over time (time series), graphical and numerical summaries may be obtained using descriptive statistics.
Patterns in the data are modeled to draw inferences about the larger population, using inferential statistics to account for randomness and uncertainty in the observations. These inferences may take the form of answers to essentially yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), prediction of future observations, descriptions of association (correlation), or modeling of relationships (regression).
The framework described above is sometimes referred to as applied statistics. In contrast, mathematical statistics (or simply statistical theory) is the subdiscipline of applied mathematics which uses probability theory and analysis to place statistical practice on a firm theoretical basis.
The word statistics is also the plural of statistic (singular), which refers to the result of applying a statistical algorithm to a set of data.
Origin
The word statistics ultimately derives from the modern Latin term statisticum collegium ("council of state") and the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state. It acquired the meaning of the collection and classification of data generally in the early nineteenth century. It was introduced into English by Sir John Sinclair. Thus, the original principal purpose of statistics was to supply data for use by governmental and (often centralized) administrative bodies. The collection of data about states and localities continues, largely through national and international statistical services; in particular, censuses provide regular information about the population. Today, however, the use of statistics has broadened far beyond the service of a state or government, to include such areas as business, natural and social sciences, and medicine, among others.
Statistical methods
Experimental and observational studies
A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on a response or dependent variable. There are two major types of causal statistical studies, experimental studies and observational studies. In both types of studies, the effect of changes of an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types is in how the study is actually conducted.
An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation may have modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data is gathered and correlations between predictors and the response are investigated.
An example of an experimental study is the famous Hawthorne studies, which attempted to test changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured productivity in the plant, then modified the illumination in an area of the plant to see if changes in illumination would affect productivity. Because of errors in experimental procedure, specifically the lack of a control group, the researchers were unable to answer the question they had set out to study, but their work gave the world the Hawthorne effect.
An example of an observational study is a study which explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers and then look at the number of cases of lung cancer in each group.
The basic steps for an experiment are to:
# plan the research including determining information sources, research subject selection, and ethical considerations for the proposed research and method,
# design the experiment concentrating on the system model and the interaction of independent and dependent variables,
# summarize a collection of observations to feature their commonality by suppressing details (descriptive statistics),
# reach consensus about what the observations tell us about the world we observe (statistical inference),
# document and present the results of the study.
Levels of measurement
There are four types of measurements or measurement scales used in statistics. The four types or levels of measurement (ordinal, nominal, interval, and ratio) have different degrees of usefulness in statistical research. Ratio measurements, where both a zero value and distances between different measurements are defined, provide the greatest flexibility in statistical methods that can be used for analysing the data. Interval measurements have meaningful distances between measurements but no meaningful zero value (such as IQ measurements or temperature measurements in degrees Celsius). Ordinal measurements have imprecise differences between consecutive values but a meaningful order to those values. Nominal measurements have no meaningful rank order among values.
Statistical techniques
Some well known statistical tests and procedures for research observations are:
- Student's t-test
- chi-square
- analysis of variance (ANOVA)
- Mann-Whitney U
- regression analysis
- correlation
- Fisher's Least Significant Difference test
- Pearson product-moment correlation coefficient
- Spearman's rank correlation coefficient
Probability
The probability of an event is often defined as a number between one and zero. In reality however there is virtually nothing that has a probability of 1 or 0. You could say that the sun will certainly rise in the morning, but what if an extremely unlikely event destroys the sun? What if there is a nuclear war and the sky is covered in ash and smoke?
We often round the probability of such things up or down because they are so likely or unlikely to occur, that it's easier to recognize them as a probability of one or zero.
However, this can often lead to misunderstandings and dangerous behaviour, because people are unable to distinguish between, e.g., a probability of $10^{-4}$ and a probability of $10^{-9}$, despite the very practical difference between them. If you expect to cross the road about $10^5$ or $10^6$ times in your life, then reducing your risk of being run over per road crossing to $10^{-9}$ will make it unlikely that you will be run over while crossing the road in your whole life, while a risk per road crossing of $10^{-4}$ will make it very likely that you will have an accident, despite the intuitive feeling that 0.01% is a very small risk.
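The road-crossing arithmetic can be made concrete. Assuming each crossing is an independent trial with the same per-crossing risk p (a simplifying assumption), the probability of at least one accident in n crossings is 1 − (1 − p)^n. The helper name `prob_at_least_one` is illustrative.

```python
import math

def prob_at_least_one(p, n):
    """Probability of at least one event in n independent trials of probability p.

    Uses log1p/expm1 so that very small p does not lose precision.
    """
    return -math.expm1(n * math.log1p(-p))

high_risk = prob_at_least_one(1e-4, 10 ** 5)  # risk 10^-4 over 10^5 crossings
low_risk = prob_at_least_one(1e-9, 10 ** 6)   # risk 10^-9 over 10^6 crossings
print(high_risk, low_risk)
```

Under these assumptions the first case gives a lifetime risk above 99.99%, while the second gives a lifetime risk of about 0.1%.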
Use of prior probabilities of 0 (or 1) causes problems in Bayesian statistics, since the posterior distribution is then forced to be 0 (or 1) as well. In other words, the data is not taken into account at all! As Lindley puts it, if a coherent Bayesian attaches a prior probability of zero to the hypothesis that the Moon is made of green cheese, then even whole armies of astronauts coming back bearing green cheese cannot convince him. Lindley advocates never using prior probabilities of 0 or 1. He calls it Cromwell's rule, from a letter Oliver Cromwell wrote to the synod of the Church of Scotland on August 5th, 1650 in which he said "I beseech you, in the bowels of Christ, consider it possible that you are mistaken."
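A short sketch of why a prior of exactly 0 can never be revised: in Bayes' theorem the prior multiplies the likelihood, so a zero prior gives a zero posterior whatever the evidence. The function `posterior` and the likelihood values are illustrative assumptions.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a hypothesis after one observation (Bayes' theorem)."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    if evidence == 0.0:
        raise ValueError("the observation has probability zero under the model")
    return prior * likelihood_if_true / evidence

# Evidence that is 99 times more likely if the hypothesis is true.
stuck = posterior(0.0, 0.99, 0.01)      # prior of exactly zero

belief = 1e-6                           # tiny but non-zero prior
for _ in range(10):                     # ten independent pieces of such evidence
    belief = posterior(belief, 0.99, 0.01)
```

With the zero prior the posterior stays at zero, whereas the tiny non-zero prior is driven close to 1 after ten such observations.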
Important contributors to statistics
- Carl Gauss
- Blaise Pascal
- Sir Francis Galton
- William Sealy Gosset (known as "Student")
- Karl Pearson
- Sir Ronald Fisher
- Gertrude Cox
- Charles Spearman
- Pafnuty Chebyshev
- Aleksandr Lyapunov
- Isaac Newton
- Abraham de Moivre
- Adolphe Quetelet
- Florence Nightingale
- John Tukey
- George Dantzig
See also list of statisticians.
Specialized disciplines
Some sciences use applied statistics so extensively that they have specialized terminology. These disciplines include:
- Biostatistics
- Business statistics
- Data mining (applying statistics and pattern recognition to discover knowledge from data)
- Economic statistics (Econometrics)
- Engineering statistics
- Statistical physics
- Demography
- Psychological statistics
- Social statistics (for all the social sciences)
- Statistical literacy
- Process analysis and chemometrics (for analysis of data from analytical chemistry and chemical engineering)
- Reliability engineering
- Statistics in various sports, particularly baseball and cricket
Statistics is also a key tool in business and manufacturing. It is used to understand the variability of measurement systems, to control processes (as in statistical process control or SPC), to summarize data, and to make data-driven decisions. In these roles it is a key tool, and perhaps the only reliable tool.
Software
Modern statistics is supported by computers to perform some of the very large and complex calculations required.
Whole branches of statistics have been made possible by computing, for example neural networks.
The computer revolution has implications for the future of statistics, with a new emphasis on 'experimental' and 'empirical' statistics.
Statistical packages in common use include:
See also
- Analysis of variance (ANOVA)
- Extreme value theory
- Instrumental variables estimation
- List of academic statistical associations
- List of national and international statistical services
- List of publications in statistics
- List of statistical topics
- List of statisticians
- Machine learning
- Misuse of statistics
- Multivariate statistics
- Permutation test
- Regression analysis
- Statistical package
- Statistical phenomena
External links
- [http://www.hkshum.net/stats/ Clear explanation of the three Statistical Distributions studied throughout secondary school] great for younger students.
General sites and organizations
- [http://lib.stat.cmu.edu/ Statlib: Data, Software and News from the Statistics Community (Carnegie Mellon)]
- [http://www.cbs.nl/isi/ International Statistical Institute]
- [http://www.mathcs.carleton.edu/probweb/probweb.html The Probability Web]
Link collections
- [http://www.cbs.nl/isi/FreeTools.htm Free Statistical Tools on the WEB (at ISI)]
- [http://www.york.ac.uk/depts/maths/histstat Materials for the History of Statistics (Univ. of York)]
- [http://www.xycoon.com/ Statistics resources and calculators (Xycoon)]
- [http://members.aol.com/johnp71/javastat.html StatPages.net (statistical calculations, free software, etc.)]
- [http://www.nih.gov/sigs/bioethics/casestudies.html Bioethics Resources on the Web from the U.S. National Institute of Health (links to tutorials, case studies, and on-line courses)]
Online courses and textbooks
- [http://www.statsoft.com/textbook/stathome.html Electronic Statistics Textbook (StatSoft,Inc.)]
- [http://www.vias.org/tmdatanaleng/ Teach/Me Data Analysis (a Springer-Verlag book)]
- [http://www.richland.cc.il.us/james/lecture/m170/ Statistics: Lecture Notes (from a professor at Richland Community College)]
- [http://statistics.cyberk.com/splash/ CyberStats: Electronic Statistics Textbook (CyberGnostics, Inc)]
- [http://www.stat.ucla.edu/%7Edinov/courses_students.html A variety of class notes and educational materials on probability and statistics]
Statistical software
- [http://www.r-project.org/ R Project for Statistical Computing (free software)]
- [http://www.socr.ucla.edu/ Statistics Online Computational Resource (UCLA)]
- [http://root.cern.ch/ Root Analysis Framework (CERN)]
- [http://www.newmdsx.com/ Multidimensional Scaling Software]
- [http://www.rosuda.org/Software/ Software for interactive graphical analyses]
- [http://www.rank1st.com/website_monitoring/index.html Website Analytics and Monitoring]
- [http://www.csdassn.org/software_reports.cfm Software Reports] by Statistical Software Newsletter
- [http://chirouble.univ-lyon2.fr/~ricco/tanagra/ Tanagra (free software)], including machine learning and data mining techniques
Other resources
- [http://www.sixsigmafirst.com/anova.htm ANOVA]
- [http://www.math.uah.edu/stat/index.html Virtual Laboratories in Probability and Statistics (Univ. of Alabama)] (requires MathML and Java 2 Runtime Environment)
- [http://www.ericdigests.org/2000-2/resources.htm Resources for Teaching and Learning about Probability and Statistics (ERIC Digests)]
- [http://www.ericdigests.org/1993/marriage.htm Resampling: A Marriage of Computers and Statistics (ERIC Digests)]
- [http://www.execpc.com/~helberg/statframes.html Statistical Resources on the Web]
- [http://www.conceptstew.co.uk/PAGES/s4t_glossary_A.html Statistics glossary]
- [http://www.statistics.com/content/glossary/index.php3 Statistics Glossary at statistics.com]
- [http://jobs.strategy-blogs.com/Statisticians.html Statistician Job Outlook - Analysis of wages and working environment for the occupation]
- [http://www.amstat.org/sections/sis/ Statistics in Sports (Section of the ASA)]
- [http://meta.wikimedia.org/wiki/Statistics Statistics - Meta], statistics of Wikimedia projects
Additional references
Lindley, D. Making Decisions. John Wiley. Second Edition 1985. ISBN 0471908088
Random vector
A multivariate random variable or random vector is a vector $X = (X_1, \dots, X_n)$ whose components are scalar-valued random variables on the same probability space $(\Omega, P)$. Every such random vector gives rise to a probability measure on $R^n$ with the Borel algebra as underlying sigma-algebra. This measure is also known as the joint distribution of the random vector. The distributions of the component random variables $X_i$ are called marginal distributions.
Matrix (math)
:For the square matrix section, see square matrix.
In mathematics, a matrix (plural matrices) is a rectangular table of numbers or, more generally, of elements of a ring-like algebraic structure. In this article, the entries of a matrix are real or complex numbers unless otherwise noted.
Matrices are useful to record data that depend on two categories, and to keep track of the coefficients of systems of linear equations and linear transformations.
For the development and applications of matrices, see matrix theory.
Definitions and notations
The horizontal lines in a matrix are called rows and the vertical lines are called columns. A matrix with m rows and n columns is called an m-by-n matrix (or m×n matrix) and m and n are called its dimensions.
The entry of a matrix A that lies in the i-th row and the j-th column is called the i,j entry or (i,j)-th entry of A. This is written as $A_{i,j}$ or A[i,j].
We often write $A := (a_{ij})_{m \times n}$ to define an m × n matrix A with each entry in the matrix A[i,j] called $a_{ij}$ for all 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Example
The matrix
:$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 7 \\ 4 & 9 & 2 \\ 6 & 1 & 5 \end{bmatrix}$
is a 4×3 matrix. The element A[2,3] or $a_{2,3}$ is 7.
Adding and multiplying matrices
Sum
If two m-by-n matrices A and B are given, we may define their sum A + B as the m-by-n matrix computed by adding corresponding elements, i.e.,
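(A + B)[i, j] = A[i, j] + B[i, j]. For example
:$\begin{bmatrix} 1 & 3 & 2 \\ 1 & 0 & 0 \\ 1 & 2 & 2 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 5 \\ 7 & 5 & 0 \\ 2 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 7 \\ 8 & 5 & 0 \\ 3 & 3 & 3 \end{bmatrix}$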
Another, much less often used notion of matrix addition is the direct sum.
Scalar multiplication
If a matrix A and a number c are given, we may define the scalar multiplication cA by
(cA)[i, j] = cA[i, j].
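For example
:$2 \begin{bmatrix} 1 & 8 & -3 \\ 4 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 16 & -6 \\ 8 & -4 & 10 \end{bmatrix}$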
These two operations turn the set M(m, n, R) of all m-by-n matrices with real entries into a real vector space of dimension mn.
Multiplication
Multiplication of two matrices is well-defined only if the number of columns of the first matrix is the same as the number of rows of the second matrix. If A is an m-by-n matrix (m rows, n columns) and B is an n-by-p matrix (n rows, p columns), then their product AB is the m-by-p matrix (m rows, p columns) given by
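:(AB)[i, j] = A[i, 1] B[1, j] + A[i, 2] B[2, j] + ... + A[i, n] B[n, j] for each pair i and j.
It is easy to remember how to do this by imagining the matrix as a vector of vectors:
:Let $A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}$ and $B = \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix}$, where each $a_i$ is a row of A and each $b_j$ is a column of B.
:Then $AB = \begin{bmatrix} a_1 \cdot b_1 & a_1 \cdot b_2 & \cdots & a_1 \cdot b_p \\ a_2 \cdot b_1 & a_2 \cdot b_2 & \cdots & a_2 \cdot b_p \\ \vdots & \vdots & \ddots & \vdots \\ a_m \cdot b_1 & a_m \cdot b_2 & \cdots & a_m \cdot b_p \end{bmatrix}$
For instance:
:$\begin{bmatrix} 1 & 0 & 2 \\ -1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 2 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 \cdot 3 + 0 \cdot 2 + 2 \cdot 1 & 1 \cdot 1 + 0 \cdot 1 + 2 \cdot 0 \\ -1 \cdot 3 + 3 \cdot 2 + 1 \cdot 1 & -1 \cdot 1 + 3 \cdot 1 + 1 \cdot 0 \end{bmatrix} = \begin{bmatrix} 5 & 1 \\ 4 & 2 \end{bmatrix}$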
This multiplication has the following properties:
- (AB)C = A(BC) for all k-by-m matrices A, m-by-n matrices B and n-by-p matrices C ("associativity").
- (A + B)C = AC + BC for all m-by-n matrices A and B and n-by-k matrices C ("right distributivity").
- C(A + B) = CA + CB for all m-by-n matrices A and B and k-by-m matrices C ("left distributivity").
It is important to note that commutativity does not generally hold; that is, given matrices A and B and their product defined, then generally AB ≠ BA.
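The definition of the product translates almost directly into code. The following sketch stores matrices as lists of rows; `matmul` is a hypothetical helper and the matrices are illustrative values.

```python
def matmul(A, B):
    """Product of an m-by-n matrix A and an n-by-p matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 0, 2], [-1, 3, 1]]      # 2-by-3
B = [[3, 1], [2, 1], [1, 0]]     # 3-by-2
AB = matmul(A, B)                # 2-by-2

# Two matrices whose products in the two orders differ.
P = [[0, 1], [0, 0]]
Q = [[0, 0], [1, 0]]
PQ = matmul(P, Q)
QP = matmul(Q, P)
```

Here `PQ` and `QP` are both defined but unequal, a small instance of the failure of commutativity.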
Matrices are said to anticommute if AB = −BA. Such matrices are very important in representations of Lie algebras and in representations of Clifford algebras.
Linear transformations, ranks and transpose
Matrices can conveniently represent linear transformations because matrix multiplication neatly corresponds to the composition of maps, as will be described next. This same property makes them powerful data structures in high-level programming languages.
Here and in the sequel we identify $R^n$ with the set of "columns" or n-by-1 matrices.
For every linear map f : $R^n \to R^m$ there exists a unique m-by-n matrix A such that f(x) = Ax for all x in $R^n$.
We say that the matrix A "represents" the linear map f.
Now if the k-by-m matrix B represents another linear map g : $R^m \to R^k$, then the linear map g ∘ f is represented by BA. This follows from the above-mentioned associativity of matrix multiplication.
More generally, a linear map from an n-dimensional vector space to an m-dimensional vector space is represented by an m-by-n matrix, provided that bases have been chosen for each.
The rank of a matrix A is the dimension of the image of the linear map represented by A; this is the same as the dimension of the space generated by the rows of A, and also the same as the dimension of the space generated by the columns of A.
The transpose of an m-by-n matrix A is the n-by-m matrix $A^{tr}$ (also sometimes written as $A^T$ or ${}^tA$) formed by turning rows into columns and columns into rows, i.e. $A^{tr}$[i, j] = A[j, i] for all indices i and j. If A describes a linear map with respect to two bases, then the matrix $A^{tr}$ describes the transpose of the linear map with respect to the dual bases, see dual space.
We have $(A + B)^{tr} = A^{tr} + B^{tr}$ and $(AB)^{tr} = B^{tr} A^{tr}$.
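The rule that the transpose of a product reverses the order of the factors can be checked on a small example. The helpers `transpose` and `matmul` and the matrices below are illustrative.

```python
def transpose(A):
    """Turn rows into columns: result[i][j] == A[j][i]."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]        # 2-by-3
B = [[1, 0], [0, 1], [2, -1]]     # 3-by-2

left = transpose(matmul(A, B))                 # (AB)^tr
right = matmul(transpose(B), transpose(A))     # B^tr A^tr
```

Note that the product of the transposes in the original order, A^tr B^tr, would be a 3-by-3 matrix here, so it cannot equal the 2-by-2 matrix (AB)^tr.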
Square matrices and related definitions
A square matrix is a matrix which has the same number of rows as columns. The set of all square n-by-n matrices, together with matrix addition and matrix multiplication is a ring. Unless n = 1, this ring is not commutative.
M(n, R), the ring of real square matrices, is a real unitary associative algebra. M(n, C), the ring of complex square matrices, is a complex associative algebra.
The unit matrix or identity matrix $I_n$, with elements on the main diagonal set to 1 and all other elements set to 0, satisfies $M I_n = M$ and $I_n N = N$ for any m-by-n matrix M and n-by-k matrix N.
For example, if n = 3:
:$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
The identity matrix is the identity element in the ring of square matrices.
Invertible elements in this ring are called invertible matrices or non-singular matrices. An n by n matrix A is invertible if and only if there exists a matrix B such that
:$AB = I_n$ ($= BA$).
In this case, B is the inverse matrix of A, denoted by $A^{-1}$.
The set of all invertible n-by-n matrices forms a group (specifically a Lie group) under matrix multiplication, the general linear group.
If λ is a number and v is a non-zero vector such that Av = λv, then we call v an eigenvector of A and λ the associated eigenvalue. (Eigen means "own" in German.) The number λ is an eigenvalue of A if and only if $A - \lambda I_n$ is not invertible, which happens if and only if $p_A(\lambda) = 0$. Here $p_A(x)$ is the characteristic polynomial of A. This is a polynomial of degree n and has therefore n complex roots (counting multiple roots according to their multiplicity). In this sense, every square matrix has n complex eigenvalues.
The determinant of a square matrix A is the product of its n eigenvalues, but it can also be defined by the Leibniz formula. Invertible matrices are precisely those matrices with nonzero determinant.
The Gauss-Jordan elimination algorithm is of central importance: it can be used to compute determinants, ranks and inverses of matrices and to solve systems of linear equations.
The trace of a square matrix is the sum of its diagonal entries, which equals the sum of its n eigenvalues.
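For a 2-by-2 matrix the characteristic polynomial is λ² − (trace)λ + (determinant), so the eigenvalues follow from the quadratic formula. The sketch below uses that special case to illustrate that the eigenvalues sum to the trace and multiply to the determinant; `eigenvalues_2x2` is a hypothetical helper.

```python
import cmath

def eigenvalues_2x2(M):
    """Both complex eigenvalues of a 2-by-2 matrix, via its characteristic polynomial."""
    (a, b), (c, d) = M
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

sym = [[2, 1], [1, 2]]       # real eigenvalues 3 and 1
rot = [[0, -1], [1, 0]]      # rotation by 90 degrees: eigenvalues i and -i
l1, l2 = eigenvalues_2x2(sym)
r1, r2 = eigenvalues_2x2(rot)
```

The rotation matrix has no real eigenvalues, which is why the eigenvalue count of n holds only over the complex numbers.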
Matrix exponential is defined for square matrices, using power series.
Special types of matrices
In many areas in mathematics, matrices with certain structure arise. A few important examples are
- Symmetric matrices are such that elements symmetric to the main diagonal (from the upper left to the lower right) are equal, that is, $a_{i,j} = a_{j,i}$.
- Skew-symmetric matrices are such that elements symmetric to the main diagonal are the negative of each other, that is, $a_{i,j} = -a_{j,i}$. In a skew-symmetric matrix, all diagonal elements are zero, that is, $a_{i,i} = 0$.
- Hermitian (or self-adjoint) matrices are such that elements symmetric to the diagonal are each other's complex conjugates, that is, $a_{i,j} = a^*_{j,i}$, where the superscript '*' signifies complex conjugation.
- Toeplitz matrices have common elements on their diagonals, that is, $a_{i,j} = a_{i+1,j+1}$.
- Stochastic matrices are square matrices whose columns are probability vectors; they are used to define Markov chains.
For a more extensive list see list of matrices.
Matrices in abstract algebra
If we start with a ring R, we can consider the set M(m,n, R) of all m by n matrices with entries in R. Addition and multiplication of these matrices can be defined as in the case of real or complex matrices (see above). The set M(n, R) of all square n by n matrices over R is a ring in its own right, isomorphic to the endomorphism ring of the left R-module $R^n$.
Similarly, if the entries are taken from a semiring S, matrix addition and multiplication can still be defined as usual. The set of all square n×n matrices over S is itself a semiring. Note that fast matrix multiplication algorithms such as the Strassen algorithm generally only apply to matrices over rings and will not work for matrices over semirings that are not rings.
If R is a commutative ring, then M(n, R) is a unitary associative algebra over R. It is then also meaningful to define the determinant of square matrices using the Leibniz formula; a matrix is invertible if and only if its determinant is invertible in R.
All statements mentioned in this article for real or complex matrices remain correct for matrices over an arbitrary field.
Matrices over a polynomial ring are important in the study of control theory.
History
The study of matrices is quite old. Latin squares and magic squares have been studied since prehistoric times.
Matrices have a long history of application in solving linear equations. Leibniz, one of the two founders of calculus, developed the theory of determinants in 1693. Cramer developed the theory further, presenting Cramer's rule in 1750. Carl Friedrich Gauss and Wilhelm Jordan developed Gauss-Jordan elimination in the 1800s.
The term "matrix" was first coined in 1848 by J. J. Sylvester. Cayley, Hamilton, Grassmann, Frobenius and von Neumann are among the famous mathematicians who have worked on matrix theory.
Olga Taussky Todd (1906-1995) started to use matrix theory when investigating an aerodynamic phenomenon called flutter, during WWII.
Further reading
A more advanced article on matrices is matrix theory.
External links
- [http://www.easycalculation.com/matrix/index.php Matrix Calculators: dynamic online calculators]
- [http://www.ualr.edu/~lasmoller/matrices.html Matrix name and history: very brief overview]
- [http://wims.unice.fr/wims/wims.cgi?module=tool/linear/matrix.en WIMS Matrix Calculator] computes determinant, rank, inverse etc. online.
- [http://www.xycoon.com/matrix_algebra.htm Introduction to Matrix Algebra: definitions and properties]
- [http://digilander.libero.it/foxes/ Excel add-ins for Matrix Algebra and Extended Precision functions] These are freeware, open source.
Positive-definite
In mathematics, a definite bilinear form B is one for which
:B(v, v)
has a fixed sign (positive or negative) when it is not 0.
To give a formal definition, let K be one of the fields R (real numbers) or C (complex numbers). Suppose that V is a vector space over K, and
:B : V × V → K
is a bilinear map which is Hermitian in the sense that B(x, y) is always the complex conjugate of B(y, x). Then B is positive-definite if
:B(x, x) > 0
for every nonzero x in V. If it is greater than or equal to zero, we say B is positive semidefinite. Similarly for negative definite and negative semidefinite. If it is otherwise unconstrained, we say B is indefinite.
A self-adjoint operator A on an inner product space is positive-definite if
:(x, Ax) > 0 for every nonzero vector x.
See in particular positive-definite matrix.
See also
- positive-definite function
- restricted negative-definite function
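Covariance matrix
In statistics and probability theory, the covariance matrix is a matrix of covariances between the elements of a random vector. It is the natural generalization to higher dimensions of the concept of the variance of a scalar-valued random variable.
If X is a column vector with n scalar random variable components, and <math>\mu_k</math> is the expected value of the kth element of X, i.e., <math>\mu_k = \operatorname{E}(X_k)</math>, then the covariance matrix is defined as:
:<math>\Sigma = \operatorname{E}\left[(X - \operatorname{E}[X])(X - \operatorname{E}[X])^\top\right]</math>
::<math>= \begin{bmatrix} \operatorname{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \cdots & \operatorname{E}[(X_1 - \mu_1)(X_n - \mu_n)] \\ \vdots & \ddots & \vdots \\ \operatorname{E}[(X_n - \mu_n)(X_1 - \mu_1)] & \cdots & \operatorname{E}[(X_n - \mu_n)(X_n - \mu_n)] \end{bmatrix}</math>
The (i, j) element is the covariance between <math>X_i</math> and <math>X_j</math>.
This concept generalizes to higher dimensions the concept of variance of a scalar-valued random variable X, defined as
:<math>\sigma^2 = \operatorname{var}(X) = \operatorname{E}[(X - \mu)^2]</math>
where <math>\mu = \operatorname{E}(X)</math>.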
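As a rough illustration of the definition, the following Python sketch computes the covariance matrix of a finite set of equally likely sample vectors. The function name `covariance_matrix` and the sample points are assumptions made for this example; it divides by N (the plain expectation over the sample points), not by N − 1.

```python
def covariance_matrix(samples):
    """Covariance matrix of equally weighted sample vectors (divides by N)."""
    n = len(samples)
    dim = len(samples[0])
    # Component-wise mean vector mu.
    mean = [sum(x[k] for x in samples) / n for k in range(dim)]
    # Entry (i, j) is the average of (x_i - mu_i) * (x_j - mu_j).
    return [
        [sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / n
         for j in range(dim)]
        for i in range(dim)
    ]


# Hypothetical sample points: the second component is twice the first.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
sigma = covariance_matrix(points)
for row in sigma:
    print(row)
```

The diagonal entries are the variances of the individual components, and the matrix is symmetric because the product inside the expectation does not depend on the order of i and j.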
Conflicting nomenclatures and notations
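Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector X, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector X. Unfortunately, several different conventions jar to some degree with each other:
Standard notation:
:<math>\operatorname{var}(X) = \operatorname{E}\left[(X - \operatorname{E}[X])(X - \operatorname{E}[X])^\top\right]</math>
Also standard notation (unfortunately conflicting with the above):
:<math>\operatorname{cov}(X) = \operatorname{E}\left[(X - \operatorname{E}[X])(X - \operatorname{E}[X])^\top\right]</math>
Also standard notation:
:<math>\operatorname{cov}(X, Y) = \operatorname{E}\left[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])^\top\right]</math> (the "cross-covariance" between two random vectors)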
The first two of these usages conflict with each other. The first and third are in perfect harmony. The first notation is found in William Feller's universally admired two-volume book on probability.
Properties
For and the following basic properties apply:
#
#
#
#
#
#
# If ,
#
# If and are independent, then
where and are a random vectors, is a random vector, is vector, and are matrices.
Although simple, the covariance matrix is a useful tool in many different areas. From it a transformation matrix can be derived that completely decorrelates the data or, from a different point of view, yields an optimal basis for representing the data in a compact way.
This is called principal components analysis (PCA) in statistics and the Karhunen-Loève transform (KL-transform) in image processing.
Complex random vectors
The variance of a complex scalar-valued random variable z with expected value μ is conventionally defined using complex conjugation:
:<math>\operatorname{var}(z) = \operatorname{E}\left[(z - \mu)(z - \mu)^*\right]</math>
where the complex conjugate of a complex number z is denoted <math>z^*</math>.
If Z is a column vector of complex-valued random variables, then we take the conjugate transpose by both transposing and conjugating, getting a square matrix:
:<math>\operatorname{E}\left[(Z - \mu)(Z - \mu)^*\right]</math>
where <math>Z^*</math> denotes the conjugate transpose, which is applicable to the scalar case since the transpose of a scalar is still a scalar.
Estimation
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle.
It involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a matrix than as a mere scalar.
See estimation of covariance matrices.
External references
- [http://mathworld.wolfram.com/CovarianceMatrix.html Covariance Matrix] at Mathworld
Determinant
In linear algebra, a determinant is a function depending on n that associates a scalar det(A) to every n×n square matrix A. The fundamental geometric meaning of a determinant is as the scale factor for volume when A is regarded as a linear transformation. Determinants are important both in calculus, where they enter the substitution rule for several variables, and in multilinear algebra.
For a fixed positive integer n, there is a unique determinant function for the n×n matrices over any commutative ring R. In particular, R may be the field of real or complex numbers.
A determinant of A is also sometimes denoted by |A|, but this notation is ambiguous: it is also used for certain matrix norms, and for the square root of <math>A^* A</math>.
Determinants of 2-by-2 matrices
The 2×2 matrix
:<math>A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}</math>
has determinant
:<math>\det(A) = ad - bc</math>.
The interpretation when the matrix has real number entries is that this gives the area of the parallelogram with vertices at (0,0), (a,c), (b,d), and (a + b, c + d), with a sign factor (which is −1 if A as a transformation matrix flips the unit square over).
A formula for larger matrices is given below.
Applications
Determinants are used to characterize invertible matrices (namely as those matrices, and only those matrices, with non-zero determinants), and to explicitly describe the solution to a system of linear equations with Cramer's rule. They can be used to find the eigenvalues of the matrix through the characteristic polynomial
:<math>p(x) = \det(xI - A)</math>
where I is the identity matrix of the same format as A.
One often thinks of the determinant as assigning a number to every sequence of n vectors in <math>\mathbb{R}^n</math>, by using the square matrix whose columns are the given vectors.
With this understanding, the sign of the determinant of a basis can be used to define the notion of orientation in Euclidean spaces. The determinant of a set of vectors is positive if the vectors form a right-handed coordinate system, and negative if left-handed.
Determinants are used to calculate volumes in vector calculus: the absolute value of the determinant of real vectors is equal to the volume of the parallelepiped spanned by those vectors. As a consequence, if the linear map <math>f : \mathbb{R}^n \to \mathbb{R}^n</math> is represented by the matrix A, and S is any measurable subset of <math>\mathbb{R}^n</math>, then the volume of f(S) is given by <math>|\det(A)| \cdot \operatorname{volume}(S)</math>. More generally, if the linear map <math>f : \mathbb{R}^n \to \mathbb{R}^m</math> is represented by the m-by-n matrix A, and S is any measurable subset of <math>\mathbb{R}^n</math>, then the n-dimensional volume of f(S) is given by <math>\sqrt{\det(A^\top A)} \cdot \operatorname{volume}(S)</math>. By calculating the volume of the tetrahedron bounded by four points, they can be used to identify skew lines.
The volume of any tetrahedron, given its vertices a, b, c, and d, is (1/6)·|det(a−b, b−c, c−d)|, or any other combination of pairs of vertices that form a simply connected graph.
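The tetrahedron formula above can be sketched in a few lines of Python. The helper names `det3` and `tetrahedron_volume` are chosen for this illustration; the edge vectors are placed in the rows of the matrix, which gives the same determinant as placing them in the columns.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def tetrahedron_volume(a, b, c, d):
    """Volume (1/6)*|det(a-b, b-c, c-d)| of the tetrahedron with vertices a, b, c, d."""
    def sub(p, q):
        return tuple(x - y for x, y in zip(p, q))

    edges = [sub(a, b), sub(b, c), sub(c, d)]
    return abs(det3(edges)) / 6


# Corner tetrahedron of the unit cube.
volume = tetrahedron_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
print(volume)

# Four coplanar points span a degenerate tetrahedron of volume zero.
flat = tetrahedron_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0))
print(flat)
```

A volume of zero signals that the four points are coplanar, which is the test for skew lines mentioned above: two lines are skew exactly when two points on one and two points on the other span a tetrahedron of non-zero volume.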
General definition and computation
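Suppose <math>A = (A_{i,j})</math> is a square matrix.
If A is a 1-by-1 matrix, then <math>\det(A) = A_{1,1}</math>.
If A is a 2-by-2 matrix, then <math>\det(A) = A_{1,1} A_{2,2} - A_{2,1} A_{1,2}</math>.
For a 3-by-3 matrix A, the formula is more complicated:
:<math>\det(A) = A_{1,1} A_{2,2} A_{3,3} + A_{1,2} A_{2,3} A_{3,1} + A_{1,3} A_{2,1} A_{3,2} - A_{1,3} A_{2,2} A_{3,1} - A_{1,1} A_{2,3} A_{3,2} - A_{1,2} A_{2,1} A_{3,3}</math>
For a general n-by-n matrix, the determinant was defined by Gottfried Leibniz with what is now known as the Leibniz formula:
:<math>\det(A) = \sum_{\sigma \in S_n} \sgn(\sigma) \prod_{i=1}^n A_{i,\sigma(i)}</math>
The sum is computed over all permutations σ of the numbers {1, 2, ..., n}, and sgn(σ) denotes the signature of the permutation σ: +1 if σ is an even permutation and −1 if it is odd (see even and odd permutations).
This formula contains n! (factorial) summands and is therefore impractical for calculating determinants for large n.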
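A direct transcription of the Leibniz formula into Python might look as follows. This is a minimal sketch for small matrices only; the names `permutation_sign` and `det_leibniz` are made up for the example, and the signature is obtained by counting inversions.

```python
from itertools import permutations


def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one (inversion count)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_leibniz(a):
    """Sum over all permutations of sign * product of a[i][perm[i]]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = permutation_sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total


print(det_leibniz([[1, 2], [3, 4]]))
```

The loop visits all n! permutations, which is why this approach is only practical for very small n.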
In general, determinants can be computed with the Gauss algorithm using the following rules:
- If A is a triangular matrix, i.e. <math>A_{i,j} = 0</math> whenever i > j, then <math>\det(A) = A_{1,1} A_{2,2} \cdots A_{n,n}</math>
- If B results from A by exchanging two rows or columns, then <math>\det(B) = -\det(A)</math>
- If B results from A by multiplying one row or column with the number c, then <math>\det(B) = c \det(A)</math>
- If B results from A by adding a multiple of one row to another row, or a multiple of one column to another column, then <math>\det(B) = \det(A)</math>
Explicitly, starting out with some matrix, use the last three rules to convert it into a triangular matrix, then use the first rule to compute its determinant.
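The rules above can be sketched as a short elimination routine. The function name `det_gauss` is an assumption of this example, and exact rational arithmetic (`fractions.Fraction`) is used only to keep the illustration free of rounding error; a practical numerical implementation would work in floating point with pivoting.

```python
from fractions import Fraction


def det_gauss(matrix):
    """Determinant via row reduction to upper triangular form."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # exchanging two rows flips the sign
        det *= a[col][col]  # diagonal entry of the final triangular matrix
        for r in range(col + 1, n):
            # Adding a multiple of one row to another leaves the determinant unchanged.
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


print(det_gauss([[0, 1], [1, 0]]))
```

The work grows roughly with the cube of n, compared with n! terms for the Leibniz formula.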
It is also possible to expand a determinant along a row or column using Laplace's formula, which is efficient for relatively small matrices. To do this along row i, say, we write
:<math>\det(A) = \sum_{j=1}^n A_{i,j} C_{i,j}</math>
where the <math>C_{i,j}</math> represent the matrix cofactors, i.e. <math>C_{i,j}</math> is <math>(-1)^{i+j}</math> times the minor <math>M_{i,j}</math>, which is the determinant of the matrix that results from A by removing the i-th row and the j-th column.
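Laplace's formula lends itself to a recursive sketch. The version below always expands along the first row (an arbitrary choice for this illustration; the name `det_laplace` is likewise an assumption) and uses zero-based indices, so the cofactor sign for column j is (−1)^j.

```python
def det_laplace(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # Minor: delete the first row and the j-th column.
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det_laplace(minor)
    return total


print(det_laplace([[1, 2], [3, 4]]))
```

Entries equal to zero contribute nothing to the sum, which is why expanding along a row or column with many zeros saves work when computing by hand.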
Example
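Suppose we want to compute the determinant of
:<math>A = \begin{bmatrix} -2 & 2 & -3 \\ -1 & 1 & 3 \\ 2 & 0 & -1 \end{bmatrix}.</math>
We can go ahead and use the Leibniz formula directly:
:<math>\det(A) = (-2)(1)(-1) + (2)(3)(2) + (-3)(-1)(0) - (-3)(1)(2) - (-2)(3)(0) - (2)(-1)(-1) = 2 + 12 + 0 + 6 - 0 - 2 = 18.</math>
Alternatively, we can use Laplace's formula to expand the determinant along a row or column. It is best to choose a row or column with many zeros, so we will expand along the second column:
:<math>\det(A) = (-1)^{1+2} \cdot 2 \cdot \det \begin{bmatrix} -1 & 3 \\ 2 & -1 \end{bmatrix} + (-1)^{2+2} \cdot 1 \cdot \det \begin{bmatrix} -2 & -3 \\ 2 & -1 \end{bmatrix} = (-2)(-5) + 8 = 18.</math>
A third way (and the method of choice for larger matrices) would involve the Gauss algorithm. When doing computations by hand, one can often shorten things dramatically by smartly adding multiples of columns or rows to other columns or rows; this doesn't change the value of the determinant, but may create zero entries, which simplifies the subsequent calculations. In our example, adding the second column to the first one is especially useful:
:<math>\det(A) = \det \begin{bmatrix} 0 & 2 & -3 \\ 0 & 1 & 3 \\ 2 & 0 & -1 \end{bmatrix}</math>
and this determinant can be quickly expanded along the first column:
:<math>\det(A) = (-1)^{3+1} \cdot 2 \cdot \det \begin{bmatrix} 2 & -3 \\ 1 & 3 \end{bmatrix} = 2 \cdot (6 + 3) = 18.</math>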
Properties
The determinant is a multiplicative map in the sense that
:<math>\det(AB) = \det(A)\det(B)</math> for all n-by-n matrices A and B.
This is generalized by the Cauchy-Binet formula to products of non-square matrices.
It is easy to see that <math>\det(rI_n) = r^n</math> and thus
:<math>\det(rA) = \det(rI_n \cdot A) = r^n \det(A)</math> for all n-by-n matrices A and all scalars r.
The matrix A (over the real or complex numbers, or some other field) is invertible if and only if det(A)≠0; in this case we have
:<math>\det(A^{-1}) = \det(A)^{-1}</math>
Expressed differently: the vectors v1,...,vn in Rn form a basis if and only if det(v1,...,vn) is non-zero.
A real matrix and its transpose have the same determinant:
:<math>\det(A^\top) = \det(A)</math>.
The determinants of a complex matrix and of its conjugate transpose are conjugate:
:<math>\det(A^*) = \det(A)^*</math>.
(Note that the conjugate transpose is identical to the transpose for a real matrix.)
If A and B are similar, i.e., if there exists an invertible matrix X such that <math>A = X^{-1} B X</math>, then by the multiplicative property,
:<math>\det(A) = \det(X)^{-1} \det(B) \det(X) = \det(B)</math>
This means that the determinant is a similarity invariant. Because of this, the determinant of some linear transformation T : V → V for some finite dimensional vector space V is independent of the basis for V. The relationship is one-way, however: there exist matrices which have the same determinant but are not similar.
If A is a square n-by-n matrix with real or complex entries and if λ1,...,λn are the (complex) eigenvalues of A listed according to their algebraic multiplicities, then
:<math>\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n</math>
This follows from the fact that A is always similar to its Jordan normal form, an upper triangular matrix with the eigenvalues on the main diagonal.
From this connection between the determinant and the eigenvalues, one can derive a connection between the trace function, the exponential function, and the determinant:
:<math>\det(\exp(A)) = \exp(\operatorname{tr}(A))</math>.
Performing the substitution <math>A \mapsto \log A</math> in the above equation yields
:<math>\det(A) = \exp(\operatorname{tr}(\log A))</math>
Derivative
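The determinant of real square matrices is a polynomial function from <math>\mathbb{R}^{n \times n}</math> to <math>\mathbb{R}</math>, and as such is everywhere differentiable. Its derivative can be expressed using Jacobi's formula:
:<math>d \det(A) = \operatorname{tr}(\operatorname{adj}(A) \, dA)</math>
where adj(A) denotes the adjugate of A. In particular, if A is invertible, we have
:<math>d \det(A) = \det(A) \operatorname{tr}(A^{-1} \, dA)</math>
or, more colloquially,
:<math>\det(A + X) - \det(A) \approx \det(A) \operatorname{tr}(A^{-1} X)</math>
if the entries in the matrix X are sufficiently small. The special case where A is equal to the identity matrix I yields
:<math>\det(I + X) \approx 1 + \operatorname{tr}(X)</math>.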
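The first-order approximation det(I + X) ≈ 1 + tr(X) for a matrix X with small entries can be checked numerically. The sketch below does so for a 2-by-2 perturbation whose entries are chosen arbitrarily for the illustration; in the 2-by-2 case the exact identity is det(I + X) = 1 + tr(X) + det(X), so the error of the approximation is exactly det(X), a second-order quantity.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def trace2(m):
    """Trace of a 2x2 matrix."""
    return m[0][0] + m[1][1]


# Hypothetical small perturbation X.
x = [[1e-3, 2e-3], [-1e-3, 3e-3]]
identity_plus_x = [[1 + x[0][0], x[0][1]], [x[1][0], 1 + x[1][1]]]

exact = det2(identity_plus_x)
approx = 1 + trace2(x)
error = abs(exact - approx)
print(exact, approx, error)
```

Halving every entry of X reduces the error by a factor of about four, as expected for a first-order approximation.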
Generalizations and related functions
As was pointed out above, it is possible to unambiguously define the determinant of any linear map f : V → V, if V is a finite-dimensional vector space.
It makes sense to define the determinant for matrices whose entries come from any commutative ring. The computation rules, the Leibniz formula and the compatibility with matrix multiplication remain valid, except that now a matrix A is invertible if and only if det(A) is an invertible element of the ground ring.
Abstractly, one may define the determinant as a certain anti-symmetric multilinear map as follows: if R is a commutative ring and <math>R^n</math> denotes the free R-module with n generators, then
:<math>\det : (R^n)^n \to R</math>
is the unique map with the following properties:
- det is R-linear in each of the n arguments.
- det is anti-symmetric, meaning that if two of the arguments are equal, then the determinant is zero.
- <math>\det(e_1, \ldots, e_n) = 1</math>, where <math>e_i</math> is that element of <math>R^n</math> which has a 1 in the i-th coordinate and zeros elsewhere.
Linear algebraists prefer to use the multilinear map approach to define determinant, whereas combinatorialists may prefer the Leibniz formula. (Of course, even when using the above abstract approach, one has to use the Leibniz formula to show that such a multilinear map actually exists.)
The Pfaffian is an analog of the determinant for 2n×2n antisymmetric matrices. It is a polynomial of degree n, and its square is equal to the determinant of the matrix.
There is no direct generalisation of determinants, or of the notion of volume, to spaces of infinite dimension. There are various approaches possible, including the use of the extension of the trace of a matrix, and functional determinants.
History
Historically, determinants were considered before matrices. Originally, a determinant was defined as a property of a system of linear equations. The determinant "determines" whether the system has a unique solution (which occurs precisely if the determinant is non-zero). In this sense, two-by-two determinants were considered by Cardano at the end of the 16th century and larger ones by Leibniz about 100 years later. Following him Cramer (1750) added to the theory, treating the subject in relation to sets of equations. The recurrent law was first announced by Bezout (1764).
It was Vandermonde (1771) who first recognized determinants as independent functions. Laplace (1772) gave the general method of expanding a determinant in terms of its complementary minors: Vandermonde had already given a special case. Immediately following, Lagrange (1773) treated determinants of the second and third order. Lagrange was the first to apply determinants to questions outside elimination theory; he proved many special cases of general identities.
Gauss (1801) made the next advance. Like Lagrange, he made much use of determinants in the theory of numbers. He introduced the word determinants (Laplace had used resultant), though not in the present signification, but rather as applied to the discriminant of a quantic. Gauss also arrived at the notion of reciprocal (inverse) determinants, and came very near the multiplication theorem.
The next contributor of importance is Binet (1811, 1812), who formally stated the theorem relating to the product of two matrices of m columns and n rows, which for the special case of m = n reduces to the multiplication theorem. On the same day (Nov. 30, 1812) that Binet presented his paper to the Academy, Cauchy also presented one on the subject. (See Cauchy-Binet formula.) In this he used the word determinant in its present sense, summarized and simplified what was then known on the subject, improved the notation, and gave the multiplication theorem with a proof more satisfactory than Binet's. With him begins the theory in its generality.
The next important figure was Jacobi (from 1827). He early used the functional determinant which Sylvester later called the Jacobian, and in his memoirs in Crelle for 1841 he specially treats this subject, as well as the class of alternating functions which Sylvester has called alternants. About the time of Jacobi's last memoirs, Sylvester (1839) and Cayley began their work.
The study of special forms of determinants has been the natural result of the completion of the general theory. Axisymmetric determinants have been studied by Lebesgue, Hesse, and Sylvester;
persymmetric determinants by Sylvester and Hankel; circulants by Catalan, Spottiswoode, Glaisher, and Scott; skew determinants and Pfaffians, in connection with the theory of orthogonal transformation, by Cayley; continuants by Sylvester; Wronskians (so called by Muir) by Christoffel and Frobenius; compound determinants by Sylvester, Reiss, and Picquet; Jacobians and Hessians by Sylvester; and symmetric gauche determinants by Trudi. Of the text-books on the subject Spottiswoode's was the first. In America, Hanus (1886) and Weld (1893) published treatises.
Expected value
In probability theory (and especially gambling), the expected value (or mathematical expectation) of a random variable is the sum of the probability of each possible outcome of the experiment multiplied by its payoff ("value"). Thus, it represents the average amount one "expects" to win per bet if bets with identical odds are repeated many times. Note that the value itself may not be expected in the general sense; it may be unlikely or even impossible.
For example, an American roulette wheel has 38 equally possible outcomes. A bet placed on a single number pays 35-to-1 (this means that you are paid 35 times your bet and your bet is returned, so you get 36 times your bet). So the expected value of the profit resulting from a $1 bet on a single number is, considering all 38 possible outcomes:
:<math>\left(-\$1 \times \frac{37}{38}\right) + \left(\$35 \times \frac{1}{38}\right)</math>
which is about -$0.0526. Therefore one expects, on average, to lose over five cents for every dollar bet.
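The roulette calculation can be reproduced with exact fractions. In this small sketch the variable names are chosen for the illustration; the profit is +35 on the single winning outcome and −1 on each of the 37 losing outcomes.

```python
from fractions import Fraction

# 38 equally likely outcomes, exactly one of which wins.
p_win = Fraction(1, 38)
p_lose = 1 - p_win

# Profit on a $1 single-number bet: +$35 on a win, -$1 on a loss.
expected_profit = 35 * p_win + (-1) * p_lose

print(expected_profit, float(expected_profit))
```

The exact value is −1/19 of a dollar per dollar bet, which rounds to the figure quoted above.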
Mathematical definition
In general, if X is a random variable defined on a probability space (Ω, P), then the expected value of X (denoted E(X) or sometimes ⟨X⟩ or <math>\mathbb{E}X</math>) is defined as
:<math>\operatorname{E}(X) = \int_\Omega X \, dP</math>
where the Lebesgue integral is employed. Note that not all random variables have an expected value, since the integral may not exist (e.g., Cauchy distribution). Two variables with the same probability distribution will have the same expected value, if it is defined.
If X is a discrete random variable with values <math>x_1</math>, <math>x_2</math>, ... and corresponding probabilities <math>p_1</math>, <math>p_2</math>, ... which add up to 1, then E(X) can be computed as the sum or series
:<math>\operatorname{E}(X) = \sum_i p_i x_i</math>
as in the gambling example mentioned above.
If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as
:<math>\operatorname{E}(X) = \int_{-\infty}^{\infty} x f(x) \, dx</math>
It follows directly from the discrete case definition that if X is a constant random variable, i.e. X = b for some fixed real number b, then the expected value of X is also b.
The expected value of an arbitrary function of X, g(X), with respect to the probability density function f(x) is given by
:<math>\operatorname{E}(g(X)) = \int_{-\infty}^{\infty} g(x) f(x) \, dx</math>
Properties
Linearity
The expected value operator (or expectation operator) is linear in the sense that
:<math>\operatorname{E}(aX + bY) = a \operatorname{E}(X) + b \operatorname{E}(Y)</math>
for any two random variables X and Y (which need to be defined on the same probability space) and any real numbers a and b.
Functional non-invariance
In general, the expectation operator and functions of random variables do not commute; that is
:<math>\operatorname{E}(g(X)) \neq g(\operatorname{E}(X))</math>
except as noted above.
Non-multiplicativity
In general, the expected value operator is not multiplicative, i.e. E(XY) is not necessarily equal to E(X)E(Y), except if X and Y are independent or uncorrelated.
This lack of multiplicativity gives rise to study of covariance and correlation.
Iterated expectation
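For any two discrete random variables X and Y one may define the conditional expectation:
:<math>\operatorname{E}[X \mid Y](y) = \operatorname{E}[X \mid Y = y] = \sum_x x \cdot \operatorname{P}(X = x \mid Y = y)</math>
Then the expectation of X satisfies
:<math>\operatorname{E}\left(\operatorname{E}[X \mid Y]\right) = \sum_y \operatorname{E}[X \mid Y = y] \cdot \operatorname{P}(Y = y) = \sum_y \sum_x x \cdot \operatorname{P}(X = x \mid Y = y) \cdot \operatorname{P}(Y = y) = \sum_x x \sum_y \operatorname{P}(X = x, Y = y) = \sum_x x \cdot \operatorname{P}(X = x) = \operatorname{E}[X]</math>
Hence, the following equation holds:
:<math>\operatorname{E}[X] = \operatorname{E}\left(\operatorname{E}[X \mid Y]\right)</math>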
The right hand side of this equation is referred to as the iterated expectation. This proposition is treated in law of total expectation.
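The identity E[X] = E(E[X | Y]) can be verified on a small discrete example. The joint distribution below is invented purely for illustration, and the helper names are assumptions of this sketch; exact fractions are used so that the two sides can be compared for equality.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(X = x, Y = y), keyed by (x, y).
joint = {
    (0, 0): Fraction(1, 8),
    (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}


def expectation_x(joint):
    """E[X] computed directly from the joint distribution."""
    return sum(x * p for (x, _), p in joint.items())


def marginal_y(joint):
    """P(Y = y) for every value y."""
    result = {}
    for (_, y), p in joint.items():
        result[y] = result.get(y, 0) + p
    return result


def conditional_expectation_x(joint, y, p_y):
    """E[X | Y = y] = sum over x of x * P(X = x, Y = y) / P(Y = y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y


p_y = marginal_y(joint)
iterated = sum(
    conditional_expectation_x(joint, y, prob) * prob for y, prob in p_y.items()
)
direct = expectation_x(joint)
print(direct, iterated)
```

Both computations give the same value, because weighting each conditional expectation by P(Y = y) simply regroups the terms of the direct sum.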
Inequality
If a random variable X is always less than or equal to another random variable Y, the expectation of X is less than or equal to that of Y:
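If X ≤ Y, then E[X] ≤ E[Y].
In particular, since X ≤ |X| and −X ≤ |X|, the absolute value of the expectation of a random variable is less than or equal to the expectation of its absolute value:
:<math>|\operatorname{E}[X]| \le \operatorname{E}[|X|]</math>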
Representation
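The following formula holds for any nonnegative real-valued random variable X (such that <math>\operatorname{E}[X^\alpha] < \infty</math>), and positive real number α:
:<math>\operatorname{E}[X^\alpha] = \alpha \int_0^\infty t^{\alpha - 1} \operatorname{P}(X > t) \, dt</math>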
Uses and applications of the expected value
The expected values of the powers of X are called the moments of X; the moments about the mean of X are expected values of powers of X − E(X). The moments of some random variables can be used to specify their distributions, via their moment generating functions.
To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. This estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates that (under fairly mild conditions) as the size of the sample gets larger, the variance of this estimate gets smaller.
In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose X is a discrete random variable with values <math>x_i</math> and corresponding probabilities <math>p_i</math>. Now consider a weightless rod on which are placed weights, at locations <math>x_i</math> along the rod and having masses <math>p_i</math> (whose sum is one). The point at which the rod balances (its center of gravity) is E(X). (Note however, that the center of mass is not the same as the center of gravity.)
Expectation of matrices
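If X is an m × n matrix, then the expected value of the matrix is a matrix of expected values:
:<math>\operatorname{E}[X] = \operatorname{E} \begin{bmatrix} x_{1,1} & \cdots & x_{1,n} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \cdots & x_{m,n} \end{bmatrix} = \begin{bmatrix} \operatorname{E}(x_{1,1}) & \cdots & \operatorname{E}(x_{1,n}) \\ \vdots & \ddots & \vdots \\ \operatorname{E}(x_{m,1}) & \cdots & \operatorname{E}(x_{m,n}) \end{bmatrix}</math>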
This property is utilized in covariance matrices.
See also
- Conditional expectation
- An inequality on location and scale parameters.
- Expected value is also a key concept in economics and finance.
- The general term expectation.
MATLAB
MATLAB refers to a numerical computing environment and its core programming language. Created by The MathWorks, MATLAB allows easy matrix manipulation, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs in other languages. Although it specializes in numerical computing, an optional toolbox interfaces with the Maple symbolic engine, making it a full computer algebra system. It is used by more than one million people in industry and academia and runs on most modern operating systems, including Windows, Mac OS, Linux and Unix. The current version is MATLAB 7.1 Service Pack 3. It is available for commercial use for approximately US$2000 and US$100 for an academic license with a limited set of Toolboxes.
History
Short for "MATrix LABoratory", MATLAB was invented in the late 1970s by Cleve Moler, then chairman of the computer science department at the University of New Mexico. He designed it to give his students access to LINPACK and EISPACK without having to learn Fortran. It soon spread to other universities and found a strong audience within the applied mathematics community. Jack Little, an engineer, was exposed to it during a visit Moler made to Stanford University in 1983. Recognizing its commercial potential, he joined with Moler and Steve Bangert. They rewrote MATLAB in C and founded The MathWorks in 1984 to continue its development. These rewritten libraries were lovingly known as JACKPAC. MATLAB was first adopted by control design engineers, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis.
Example MATLAB code
This code, excerpted from the function magic.m, creates a magic square M for odd values of n.
[J,I] = meshgrid(1:n);    % I holds the row index, J the column index of each entry
A = mod(I+J-(n+3)/2,n);
B = mod(I+2*J-2,n);
M = n*A+B+1;              % entries run from 1 to n^2
Note that this code performs operations on vectors and matrices without the use of "for" loops. Idiomatic MATLAB programs usually operate on whole arrays at a time. The MESHGRID utility function above creates arrays like these:
I = 1 1 1    J = 1 2 3
    2 2 2        1 2 3
    3 3 3        1 2 3
Most scalar functions can also be used on arrays, and will apply themselves in parallel to each element. Thus mod(2*J,n) will (scalar) multiply the entire J array by 2, before reducing each element modulo n.
MATLAB does include standard "for" and "while" loops, but it is almost always faster to write and execute code that expresses vector operations.
There are many other programs that do similar tasks to MATLAB; for a list, see List of numerical analysis software.
Criticism
MATLAB itself is a proprietary product of The MathWorks. Unlike common programming languages such as C or FORTRAN, the MATLAB language is not managed or specified by a 3rd party standards committee such as ANSI. Obtaining a fully compatible and up to date MATLAB platform requires purchasing the product. Some programs are available that implement significant subsets of the MATLAB programming language (notably the free software GNU Octave project), but these are not 100% compatible and do not include various domain-specific tools. Consequently, MATLAB customers may be subject to vendor lock-in.
MATLAB was originally written in FORTRAN and later re-written in C. The language shows this mixed heritage with a sometimes erratic syntax: neither C nor FORTRAN, but a combination of both. This mixed syntax can lead to interpretation problems. For example, the expression:
y = f(x)
could either refer to function f with argument x or the x value of matrix f. This ambiguity is difficult to resolve without closely examining the code. Similar difficulties surround the * and ' operators.
One of the basic datatypes in MATLAB is a matrix, an array of numbers devoid of important attributes required by real world data such as engineering units, sampling rates and time/date markers. Especially the lack of sample rate information is a serious shortcoming for signal processing applications, where data is typically sampled at a constant interval. These attributes must be managed by the user with custom programming, which is error-prone and time-consuming.
MATLAB is a procedural programming language, so it cannot automatically update variables in response to input changes as one might want for simulations or exploratory data analysis. Consider, for example, the following fragment:
t = 1:100;
y = log(t);
If variable t changes, e.g. t = 100:1000, the user must manually re-evaluate y to obtain the updated result. The MathWorks offers a supplementary package, Simulink, that partially automates these tasks for systems modeling and simulation applications.
Despite these shortcomings, MATLAB continues to be employed in many technical analysis applications, though several viable competitors are emerging.
See also
Toolboxes and other add-ons:
- Simulink
- Stateflow
- COMSOL Multiphysics (formerly FEMLAB)
Alternative programs with (quite) similar syntax:
- Scilab
- GNU Octave
External links
- [http://www.mathworks.com/products/matlab/ The MATLAB product page at The MathWorks]
- [http://www.mathworks.com/matlabcentral/ MATLAB Central the MATLAB user community]
- [http://dmoz.org/Science/Math/Software/MATLAB/ The MATLAB category at the Open Directory Project]
- [http://www.mathworks.com/company/newsletters/news_notes/clevescorner/dec04.html Additional information about the history of and inspiration for MATLAB, written by Cleve Moler]
- [news://comp.soft-sys.matlab comp.soft-sys.matlab]
Probability density function
In mathematics, a probability density function (pdf) serves to represent a probability distribution in terms of integrals. A probability density function is everywhere non-negative and its integral from −∞ to +∞ is equal to 1. If a probability distribution has density f(x), then intuitively the infinitesimal interval [x, x + dx] has probability f(x) dx. Informally, a probability density function can be seen as a "smoothed out" version of a histogram: if one empirically measures values of a continuous random variable repeatedly and produces a histogram depicting relative frequencies of output ranges, then this histogram will resemble the random variable's probability density (assuming that the variable is sampled sufficiently often and the output ranges are sufficiently narrow).
Formally, a probability distribution has density f(x) if f(x) is a non-negative Lebesgue-integrable function R → R such that the probability of the interval [a, b] is given by
:<math>\int_a^b f(x) \, dx</math>
for any two numbers a and b. This implies that the total integral of f must be 1. Conversely, any non-negative Lebesgue-integrable function with total integral 1 is the probability density of a suitably defined probability distribution.
Simplified explanation
A probability density function is any function f(x) that describes the probability density in terms of the input variable x in a manner described below.
- f(x) is greater than or equal to zero for all values of x
- The total area under the graph is 1:
::<math>\int_{-\infty}^{\infty} f(x) \, dx = 1</math>
The actual probability can then be calculated by taking the integral of the function f(x) by the integration interval of the input variable x.
For example: the variable x being within the interval 4.3 < x < 7.8 would have the actual probability of
:<math>\Pr(4.3 < x < 7.8) = \int_{4.3}^{7.8} f(x) \, dx</math>
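To make the interval calculation concrete, the sketch below approximates such an integral numerically with the midpoint rule. The density used here, an exponential density with rate 0.2, is an assumption chosen only so that the result can be compared with a known closed form; the text above does not specify f(x), and the function names are likewise made up for the example.

```python
import math


def density(x, rate=0.2):
    """Assumed example density: exponential with the given rate (zero for x < 0)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def interval_probability(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


# Probability that 4.3 < x < 7.8 under the assumed density.
prob = interval_probability(density, 4.3, 7.8)

# Sanity check: the total area under the density should be close to 1.
total = interval_probability(density, 0.0, 200.0, steps=200_000)
print(prob, total)
```

For this particular density the exact answer is exp(−0.2·4.3) − exp(−0.2·7.8), so the numerical result can be checked against it.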