Cumulative Distribution Functions and MATLAB

My textbook for my current statistics class does a pitiful job of describing the cumulative distribution function (CDF). It simply states:

FX(x) = P{X<=x}

The book then jumps onward for four pages talking about different types of random variables. So what is the CDF then? What is its significance? The CDF of a random variable X is the probability that the value of X is less than or equal to the parameter ‘x’. The CDF is bounded: as ‘x’ goes to negative infinity the CDF goes to ‘0’ (zero), and as ‘x’ goes to positive infinity it goes to ‘1’ (one). As the value of the parameter ‘x’ increases, the probability that X is less than or equal to ‘x’ never decreases; it either increases or stays flat.
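For reference, those properties can be written out compactly. This is just a restatement of the paragraph above, with the book's FX(x) written as F_X(x); the last line (the probability of landing in an interval, for any a <= b) is a standard consequence of the definition that I am adding, not something quoted from the textbook.

```latex
\begin{align*}
F_X(x) &= P\{X \le x\} \\
\lim_{x \to -\infty} F_X(x) &= 0, \qquad \lim_{x \to +\infty} F_X(x) = 1 \\
a \le b &\implies F_X(a) \le F_X(b) \\
P\{a < X \le b\} &= F_X(b) - F_X(a)
\end{align*}
```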

As an example, let X be a random variable representing the price of a stock on the NASDAQ market. If you wanted to compute FX($0.01), which would be the probability that a stock on the NASDAQ market is currently priced at $0.01 or less, the result would in all likelihood be quite small. On the other hand, FX($500) would return a number very close to 1.0, depending on the state of the market (note: Google is at $625.77 as of yesterday). FX($15.00) would return a number somewhere in between, perhaps closer to 0.5.

So what does that buy us? Well, the CDF is a model that can help one understand a random variable. We can come up with a set of parameters to pass into the CDF that will help us better understand how the values of the random variable are distributed across the population. With our NASDAQ example, we could pass in $1.00, $15.00, $25.00, $50.00, and $100.00 to get a feel for how the stock prices are spread across the price range.
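Here is a rough sketch of that idea in MATLAB, estimating the CDF straight from a sample instead of from a model. The prices vector is completely made up for illustration (these are not real NASDAQ quotes), and the names prices, thresholds, and F_hat are my own. The estimate at each threshold is simply the fraction of sample values at or below it.

```matlab
% HYPOTHETICAL prices in dollars, invented purely for illustration.
% These are NOT real NASDAQ quotes.
prices = [0.75 3.20 8.10 12.50 14.99 15.00 22.40 37.80 61.25 140.00];

% The thresholds mentioned in the post.
thresholds = [1.00 15.00 25.00 50.00 100.00];

% Empirical CDF: fraction of the sample that is <= each threshold.
F_hat = zeros(size(thresholds));
for k = 1:numel(thresholds)
    F_hat(k) = mean(prices <= thresholds(k));
end

% Print one line per threshold.
for k = 1:numel(thresholds)
    fprintf('F_X($%.2f) is roughly %.2f\n', thresholds(k), F_hat(k));
end
```

With a real list of prices in place of the invented one, the same loop gives a quick feel for where most of the values sit.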

How does one compute the CDF then? Well, if we are not given the model for the CDF, then we need to know more about the type of random variable. Take a normal or Gaussian random variable X. There is a mathematical expression for this type of random variable, such as this nasty integral for the normal distribution function, and having not done any calculus since 2001, there is no way in hell I will deal with the integrals. If I only had a calculator and a statistics book, I would hopefully be able to use some estimations and a precalculated table in the back of the book to work out the result. Better yet, I’d be at my desk with access to MATLAB so that I can use the built-in function for the CDF (wait for it) named ‘cdf’. For a normal/Gaussian random variable, to compute the CDF, use:

cdf('Normal', x, mu, sigma)

The first parameter selects the type of the random variable, the second parameter is the original CDF parameter ‘x’ from the mathematical expression at the beginning of this post, and the third and fourth parameters are the parameters of the normal/Gaussian distribution model: the mean (mu) and the standard deviation (sigma). For other types of random variables, refer to MATLAB’s documentation on the type of distribution as well as the required model parameters.
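As a sanity check, here is a small sketch that evaluates the normal CDF two ways. The values of mu, sigma, and x are arbitrary ones I picked for illustration. As far as I know, ‘cdf’ ships with the Statistics and Machine Learning Toolbox and not with base MATLAB, so the sketch also uses the closed form in terms of erfc (which is in base MATLAB) and only calls ‘cdf’ if it is found on the path.

```matlab
% Arbitrary illustration values (assumptions, not from any data set).
mu    = 0;            % mean of the normal distribution
sigma = 1;            % standard deviation, must be positive
x     = [-1 0 1 2];   % points at which to evaluate the CDF

% Closed form of the normal CDF via the complementary error function:
%   F_X(x) = 0.5 * erfc( -(x - mu) / (sigma * sqrt(2)) )
F_erfc = 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));
disp(F_erfc);

% Compare against the toolbox function from the post, if it is available.
if exist('cdf', 'file')
    F_cdf = cdf('Normal', x, mu, sigma);
    fprintf('largest difference between the two: %g\n', ...
            max(abs(F_cdf - F_erfc)));
end
```

The erfc form is handy on a machine without the toolbox; with the toolbox installed, the one-line ‘cdf’ call is all you need.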

On Probability/Statistics Education for Electrical Engineers

I am taking a course in random variables and stochastic processes this semester as a prerequisite for a communications systems theory course I want to take next semester. Why is it a prerequisite for communications systems theory? This course in random variables and stochastic processes aims to teach spectral estimation and signal detection, which is of course important for understanding how receiver technology works. Essentially, it lays the foundation for some of the theory in communications systems. The course description is as follows:

Probabilistic descriptions of signals and noise, including joint, marginal and conditional densities, autocorrelation, cross-correlation and power spectral density. Linear and nonlinear transformations. Linear least-squares estimation. Signal detection.

Before starting the class I was looking forward to it, but my confidence wavered as I flipped through the textbook. The first half of the book looked boring, with pages and pages of integral equations and abstruse mathematical notation. Ugh! Double integrals! Stuff I’ll never use and will quickly forget after the course is over.

Yet the last half of the book on stochastic processes looked very interesting: statistics, spectral estimation, entropy, and Markov chains and processes. Perhaps there would be hope after all, and the course would focus on the application of probability theory and random variables to real-world engineering problems?

Unfortunately though, the textbook is quite poorly written, introducing concepts but providing no real justification or rationale for why any of the material really matters. Instead of explaining things, the author points to complex equations or mathematical expressions. The book has numerous examples, yet the examples skip steps and make all sorts of assumptions that are not clearly explained. Why is such an equation used? What led us to use such an equation? The book, and the course as well, are in such a frantic push to get into the heavy meat of the course that we just blast through distribution functions and random variables, not stopping to examine anything, ask why or how, or even put into perspective the importance of what we are learning.

The textbook, and the course that follows this book, instead focus on performing calculations. Never mind discussing the material and its application to the world around us. Instead we shall perform calculations and show how clever we can be with algebraic manipulations in the problems at the end of each chapter. Let’s just do some calculations and call that learning and mastering the material. My greatest frustration with electrical engineering education is its focus on, and obsession with, performing calculations. I wonder how Computer Science curricula cover this material?

I shall stop ranting. I will get through this class because I have to, but I don’t have to like it. I am hoping the course will improve. I’m going to find a supplemental book to hopefully gain more of an intuition for the subject. I’m also going to learn how to do the calculations with MATLAB, because if I’m ever to use this material again, it will be with the help of MATLAB or R.