Monday, August 17, 2015

MAT133 Practice Problems: Financial Math, Linear Algebra, and Lagrange Multipliers

It's like I blinked and suddenly eight months have passed since my last update. That's enough time to gestate an entire human baby. Sadly(??), that's not today's topic.

I recently created some practice problems with solutions for the University of Toronto course MAT133, Calculus and Linear Algebra for Commerce. Download the practice problems here (PDF). It isn't much, but should help anyone looking for extra practice in financial math, matrices, and the method of Lagrange multipliers.

I'll update this post if I ever generate more MAT133-related practice material. The Fall/Winter MAT133 website hosts huge amounts of excellent material, including past tests.

That's all for today. See you in probably fewer than eight months!

Sunday, December 28, 2014

Driving A Car Over a Circular Hill

Here's a textbook physics question about circular motion:
A car drives over a hill with a circular vertical profile. At what speed will the car lift off?
Finding the minimum liftoff speed at the top of the hill is easy enough. Here's the free body diagram:
Figure 1
Two forces act on the car: weight, \(mg\), pulls downward, and a normal force, \(N\), pushes upward. If the car stays on the hill, then \(N\) must be less than \(mg\), since the net force \(F_\text{net} = mg - N\) must point downward to hold the car on a circular path around the hill's curve. As usual, the centripetal force required for the car's circular motion is \(F_\text{c} = mv^2/r\), where \(v\) is the car's speed and \(r\) is the hill's radius of curvature. This centripetal force must be provided entirely by \(F_\text{net}\):
\[\begin{equation}
\frac{mv^2}{r} = mg - N. \label{eq1}
\end{equation}
\]As soon as the car leaves the ground, the normal force will have nothing to act on, so it will vanish. As a result, setting \(N = 0\) gives the threshold speed at which the car lifts off:
\[\begin{align}
\frac{mv^2}{r} &= mg - 0 \nonumber \\
v &= \sqrt{gr}
\end{align}\]
The car will lift off if its speed at the top of the hill exceeds \(\sqrt{gr}\).

That was easy because we limited our analysis to the top of the hill. But the question didn't ask specifically about just the top of the hill. If the car begins its climb at some initial speed, would it still lift off right at the crest of the hill? Or would it fly out of control immediately? Or maybe it would lift off somewhere on the other side of the hill?

Depending on your hobbies, you might not have any intuition about how cars behave on circular hills with red and blue lights strobing in the bullet-cracked rear-view, your buddy firing back with his good arm while shrieking about churches in his native tongue. So let's analyze a more general question:
A car coasts up a circular hill with initial speed \(v_0\). Where on the hill does the car lift off, and for what initial speed?
Figure 2. Notes from a speed bump designer's last day on the job.

The car's position on the hill is specified by its angle from vertical, \(\theta\). The angle subtended by the hill from vertical is \(\theta_0\). The hill's height is \(h_0 = r - r\cos\theta_0\).

Some Restrictions

  • \(\theta_0 > 0\), otherwise there would be no hill.
  • Let's stick to the regime \(\theta_0 < \pi/2\), otherwise the "hill" would really be a terrain bubble with walls steeper than vertical.
  • We'll require \(\theta \leq \theta_0\). If not, then the car is beyond the circular extent of the hill, which doesn't make sense.

One Way to Proceed: Energy Conservation

The car rolls up the hill without applying power to its wheels, so its mechanical energy (kinetic + gravitational potential) is conserved. (Another plausible model would be to hit the gas to keep the car moving at constant speed, but that turns out not to be as interesting.) Initially, the car has total energy \(mv_0^2/2\). After climbing through a height \(h\), the car will slow to speed \(v < v_0\), having gained gravitational potential energy \(mgh\). Energy conservation gives the car's speed in terms of height:
\[\begin{align}
\frac{1}{2}mv_0^2 + 0 &= \frac{1}{2}mv^2 + mgh \nonumber \\
v^2 &= v_0^2 - 2gh \label{asdfjkl}
\end{align}\]

Getting Over the Hill

Before trying to find the liftoff speed, let's find the minimum initial speed required to get over the hill at all. Equating initial kinetic energy with potential energy at the crest of the hill gives \(mv_0^2/2 = mgh_0\), or \(v_0 = \sqrt{2gh_0}\). Substituting \(h_0 = r - r\cos \theta_0\) gives \(v_0 = \sqrt{2gr(1 - \cos \theta_0)}\) as the threshold initial speed for cresting the hill. If the car starts up the hill slower than this, it will roll to a stop partway up and roll back down the same side.

It seems reasonable that if the car can't even get over the hill, it wouldn't have enough speed to fly off the road. We'll look at this again toward the end.

The Liftoff Criterion

The car will stay on the hill if the radial component of its weight, \(mg \cos \theta\), minus the opposing normal force, \(N\), is strong enough to provide the centripetal force required for circular motion. This gives something very close to Eq. (\(\ref{eq1}\)):
\[\begin{align}
m\frac{v^2}{r} &= mg\cos \theta - N.
\end{align}\]
As before, setting \(N = 0\) gives the speed above which the car will lift off:
\[\begin{align}
v^2 &= gr\cos \theta. \nonumber
\end{align}\]
The car will lift off if its speed exceeds this threshold, corresponding to the inequality
\[\begin{align}
v^2 > gr\cos \theta. \nonumber
\end{align}\]
Substituting \(v^2 = v_0^2 - 2gh\) from Eq. (\(\ref{asdfjkl}\)) gives
\[\begin{align}
v_0^2 - 2gh &> gr \cos \theta. \label{someeq}
 \end{align}\]
Finally, inserting \(h = r\cos \theta - r\cos \theta_0\) gives
\[\begin{align}
v_0^2 - 2gr(\cos \theta - \cos \theta_0) &> gr \cos \theta \nonumber \\
v_0^2 + 2gr\cos \theta_0 &> 3gr \cos \theta \nonumber \\
\frac{v_0^2}{3gr} + \frac{2}{3} \cos \theta_0 &> \cos \theta \label{coastsol}
\end{align}\]
The car will leave the ground whenever this liftoff criterion is met. Note that everything on the left-hand side is constant.

A Closer Look at the Left Side of Equation (\(\ref{coastsol}\))

Let's start with the cosine term on the left side of the liftoff criterion. Since \(0 < \theta_0 < \pi/2\), we have \(0 < \cos \theta_0 < 1\), so \(0 < \frac{2}{3} \cos \theta_0 < \frac{2}{3}.\) Nothing surprising here. For any sizable hill, this term will be somewhat less than \(\frac{2}{3}\).

The other term on the left side of Eq. (\(\ref{coastsol}\)), \(v_0^2/(3gr)\), is never less than zero, and can be made arbitrarily large by choosing a high enough \(v_0\). If \(v_0\) is zero, the car obviously shouldn't lift off. Sure enough, setting \(v_0 = 0\) in Eq. (\(\ref{coastsol}\)) makes the liftoff criterion \(\frac{2}{3}\cos\theta_0 > \cos\theta\). In other words, \(\cos\theta_0\) must be so large that a mere \(2/3\) of it is enough to exceed \(\cos\theta\). Obviously, \(\theta = \theta_0\) won't work, since the inequality goes the wrong way: \(\frac{2}{3}\cos\theta < \cos\theta\) for \(0 < \theta < \pi/2\). We'll definitely need the strict inequality \(\theta_0 < \theta\) to make the car lift off when \(v_0 = 0\). But we've already noted the restriction that \(\theta_0 \geq \theta\) for any hill, so the liftoff criterion is never satisfied for \(v_0 = 0\).

Reassuringly, then, Eq. (\(\ref{coastsol}\)) makes sense in the zero-speed limit: given zero initial speed, the car will never lift off (unless \(\theta\) and \(\theta_0\) both exceed \(90°\), and the car just falls off). Furthermore, choosing a large enough \(v_0\) will ensure that the inequality (\(\ref{coastsol}\)) is satisfied for any value of \(\theta\), meaning that if the car begins with a high enough speed, it will leave the hill immediately.

The Minimum Initial Speed Required for Liftoff

Given \(g\), \(r\), and \(\theta_0\), we can choose a particular initial speed \(v_0\) such that \(v_0^2/(3gr) = \frac{1}{3}\cos \theta_0\). The special \(v_0\) value that makes this true is
\[\begin{equation}
v_0 = \sqrt{gr\cos\theta_0}. \label{liftoffspeed} \end{equation}\]
At this special speed, the liftoff criterion, Eq. (\(\ref{coastsol}\)), becomes
\[\begin{align}
\frac{1}{3}\cos \theta_0 + \frac{2}{3}\cos \theta_0 = \cos \theta_0 &> \cos \theta \nonumber \\
\implies \theta_0 &< \theta
\end{align} \]
As noted above, this inequality is never true, since it implies that the car's position has exceeded the maximum angular extent of the hill. Thus, the car will remain grounded if its initial speed is lower than \(\sqrt{gr\cos\theta_0}\).

If the car starts up the hill at a speed greater than \(\sqrt{gr\cos\theta_0}\), then \(v_0^2/(3gr) > \frac{1}{3}\cos \theta_0\), and the liftoff criterion becomes \(A \cos \theta_0 > \cos \theta\), where \(A > 1\). Even if \(\cos \theta_0\) is smaller than \(\cos \theta\), we can still satisfy \(A \cos \theta_0 > \cos \theta\) with a big enough \(A\). In particular, whenever \(A\) is a tiny bit bigger than 1, then \(\theta_0 = \theta\) is always enough to satisfy the liftoff criterion. But \(\theta_0 = \theta\) describes the car's initial position. Thus, if \(v_0 > \sqrt{gr\cos\theta_0}\), then the car flies off the surface of the hill right away!

In conclusion, there is no way to send a car coasting up a hill so that it lifts off at the top of the hill, or anywhere else, for that matter. The car will either stay on the hill for its entire motion, or it will lift off immediately (and may crash back onto the hill a short time later).
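Here's a rough numerical check of that conclusion: a minimal Python sketch that scans the hill from the car's starting angle, over the crest, to the far side, and reports the first angle at which the liftoff criterion holds. The function names, the 10 m hill radius, and \(g = 9.81\) m/s² are illustrative assumptions of mine, not part of the original problem.

```python
import math

def lifts_off(v0, theta, theta0, r, g=9.81):
    """Liftoff criterion: v0^2/(3 g r) + (2/3) cos(theta0) > cos(theta)."""
    lhs = v0**2 / (3 * g * r) + (2.0 / 3.0) * math.cos(theta0)
    return lhs > math.cos(theta)

def first_liftoff_angle(v0, theta0, r, g=9.81, steps=1000):
    """Scan from theta0 (start) through 0 (crest) to -theta0 (far side).

    Returns the first angle in radians where the criterion holds,
    or None if it never holds anywhere on the hill.
    """
    for i in range(steps + 1):
        theta = theta0 - 2 * theta0 * i / steps
        if lifts_off(v0, theta, theta0, r, g):
            return theta
    return None

if __name__ == "__main__":
    theta0 = math.pi / 4          # hypothetical 45-degree half-angle
    r = 10.0                      # hypothetical hill radius in metres
    v_lift = math.sqrt(9.81 * r * math.cos(theta0))
    for factor in (0.5, 0.9, 1.1, 2.0):
        angle = first_liftoff_angle(factor * v_lift, theta0, r)
        print(factor, None if angle is None else math.degrees(angle))
```

Under these assumptions the scan should only ever return `None` (the car stays grounded) or the starting angle itself (the car lifts off immediately), which is the all-or-nothing behaviour described above.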

Incidentally, recall that the minimum speed needed to get over the hill is \(v_0 = \sqrt{2gr(1-\cos\theta_0)}\). This will be lower than the minimum lift-off speed \(\sqrt{gr\cos\theta_0}\) if
\[\begin{align}
2 - 2\cos\theta_0 &< \cos\theta_0 \nonumber \\
2 &< 3\cos\theta_0 \nonumber \\
\theta_0 &< \cos^{-1}\left(\frac{2}{3}\right) \approx 48.2° \nonumber
\end{align}\]
So, for a small to sizable hill with half-angle \(\theta_0 < 48.2°\), the car can roll over smoothly to the other side if it starts with speed \(\sqrt{2gr(1-\cos\theta_0)} < v_0 < \sqrt{gr\cos\theta_0}\). For a steeper hill with \(\theta_0 > 48.2°\), the inequality reverses to \(\sqrt{2gr(1-\cos\theta_0)} > \sqrt{gr\cos\theta_0}\), meaning the liftoff speed is less than the speed required to get over the hill. On such a steep hill, either the car stops and rolls back down, or it starts with enough speed that it lifts off immediately; there's no way to roll it to the other side without catching air.
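The three outcomes can be sketched as a small Python classifier. This is only an illustration under the coasting model used here; the function name, the labels, and the sample radius are my own assumptions.

```python
import math

def classify_coast(v0, theta0, r, g=9.81):
    """Classify a coasting car's fate on a circular hill of half-angle theta0 (radians)."""
    v_crest = math.sqrt(2 * g * r * (1 - math.cos(theta0)))  # minimum speed to crest the hill
    v_lift = math.sqrt(g * r * math.cos(theta0))             # threshold speed for immediate liftoff
    if v0 > v_lift:
        return "lifts off immediately"
    if v0 > v_crest:
        return "rolls over the hill"
    # v0 == v_crest is an idealized edge case (car stops exactly at the crest);
    # it is lumped in with the stalling case here.
    return "stops and rolls back"

if __name__ == "__main__":
    r = 10.0  # hypothetical hill radius in metres
    for theta0_deg in (30, 60):
        for v0 in (3.0, 6.0, 8.0, 10.0):
            print(theta0_deg, v0, classify_coast(v0, math.radians(theta0_deg), r))
```

For a half-angle above about 48.2° the middle branch can never be reached, because `v_crest` then exceeds `v_lift`.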

Some Actual Numbers

For a concrete example, let's choose \(\theta_0 = \pi/4 = 45°\), so \(\cos\theta_0 = 1/\sqrt{2}\). Let's give the car an initial speed below the liftoff speed given by Eq. (\(\ref{liftoffspeed}\)) of \(v_0^2 = gr\cos\theta_0\). Let's go with \(v_0^2 = \frac{1}{2}gr\cos\theta_0 = gr/(2\sqrt{2})\). The car will lift off as soon as \(\theta\) satisfies
\[\begin{align}
\cos \theta &< \frac{2}{3} \cos \left( \frac{\pi}{4}\right) + \frac{gr/(2\sqrt{2})}{3gr} \nonumber \\ &= \frac{2}{3\sqrt{2}} + \frac{1}{6\sqrt{2}} \nonumber \\ &= \frac{4}{6\sqrt{2}} + \frac{1}{6\sqrt{2}} = \frac{5}{6\sqrt{2}}\nonumber \\ \theta &> \cos^{-1}\left(\frac{5}{6\sqrt{2}}\right) \approx 53.9°
\end{align}\]
The car would lift off if \(\theta\) could exceed 53.9°, but it can't, since the hill's half-angle is only 45°.

Now, what if the initial speed slightly exceeds the lift-off threshold? Let's set \(v_0^2 = 1.1 gr\cos\theta_0\) \(= 1.1 gr/\sqrt{2}\). The liftoff criterion in this case is
\[\begin{align}
\cos \theta &< \frac{2}{3} \cos \left(\frac{\pi}{4}\right) + \frac{1.1gr/\sqrt{2}}{3gr} \nonumber \\ &= \frac{2}{3\sqrt{2}} + \frac{1.1}{3\sqrt{2}} \nonumber \\ \theta &> 43.1°
\end{align}\]
Given this initial speed, the car will take off whenever it's more than 43.1° from vertical. This first occurs at the car's initial position, where \(\theta = 45°\), so the car lifts off immediately.
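Both threshold angles are easy to double-check numerically. Here's a minimal Python sketch; the helper name and the `speed_factor` parameter (the ratio \(v_0^2 / (gr\cos\theta_0)\)) are my own shorthand, not notation from the derivation.

```python
import math

def liftoff_threshold_deg(speed_factor, theta0_deg):
    """Angle (degrees) beyond which the liftoff criterion holds.

    speed_factor is v0^2 / (g r cos(theta0)), so g and r cancel out.
    """
    c0 = math.cos(math.radians(theta0_deg))
    rhs = (2.0 / 3.0) * c0 + speed_factor * c0 / 3.0
    if rhs >= 1.0:
        return 0.0  # criterion is met at every angle on the hill
    return math.degrees(math.acos(rhs))

if __name__ == "__main__":
    for factor in (0.5, 1.1):
        threshold = liftoff_threshold_deg(factor, 45.0)
        verdict = "lifts off at the start" if threshold < 45.0 else "stays grounded"
        print(f"v0^2 = {factor} g r cos(theta0): threshold {threshold:.1f} deg, {verdict}")
```

The first case gives a threshold above the 45° half-angle (so it's never reached), and the second gives a threshold below it (so it's already exceeded at the start).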

Monday, November 17, 2014

A Vector Sum Trap in a Related-Rates Problem

A student of mine recently showed me a math problem that's easy to get wrong, as I did, through hasty vector addition. Here's the problem (adapted from this PDF):
Two radar stations, A and B, are tracking a ship generally north of both stations. Station B is located 6 km east of station A. At a certain instant, the ship is 5 km from A and also 5 km from B. At the same instant, station A reads that the distance between station A and the ship is increasing at the rate of 28 km/h. Station B reads that the distance between station B and the ship is increasing at 4 km/h. How fast and in what direction is the ship moving?

Solution 1: Related Rates 

If you've studied related rates before (who hasn't!), you'll look for an equation involving distances, differentiate that equation with respect to time, and solve for the unknown rate.

Let's use Newton's notation for time derivatives: \(\dot a = \frac{da}{dt}\). Here \(a\) and \(b\) are the distances from the ship to stations A and B, \(h\) is the ship's perpendicular distance from the line AB, and \(x\) is the distance along AB from station A to the point directly below the ship. This problem gives you two rates: \(\dot a = v_a = 28 \text{ km/h} \) and \(\dot b = v_b = 4 \text{ km/h}\). You need to find the ship's speed and direction, i.e., its velocity vector \(v\), which you can express as the vector sum of horizontal and vertical components: \(v_x + v_y\). This is useful here because \(v_x\) and \(v_y\) are easily related to \(x\) and \(h\), as follows.

The large triangle's top corner stays anchored to the ship as it moves; the ship will "drag" the top corner while \(h\) remains vertical. So if the ship has horizontal speed \(v_x\), the right triangle base \(x\) must change at exactly \(v_x\) to move the top corner along with the ship, which means \(v_x = \dot x\). Similarly, \(v_y = \dot h\). So we just need to find \(\dot x\) and \(\dot h\) to solve this problem.

The right triangle with base \(x\) gives us \(h^2 + x^2 = a^2\), or \(h^2 = a^2 - x^2\). The other right triangle with base \((6 - x)\) gives \(h^2 + (6 - x)^2 = b^2\). Combining these equations gives
\[ \begin{equation}
b^2 = (6 - x)^2 + a^2 - x^2 \label{asdf}
\end{equation}
\]in which \(x\) is the only unknown. Good! The rate of change of \(x\) will give us the ship's horizontal speed. Differentiate both sides of (\(\ref{asdf}\)) with respect to time, using the chain rule on \(a(t), b(t),\) and \(x(t)\):
\[ \begin{align}
2b \dot b &= 2(6 - x)(-\dot x) + 2a \dot a - 2x \dot x \nonumber \\
b \dot b &= (x - 6) \dot x + a \dot a - x \dot x \nonumber \\
(6\text{ km}) \dot x &= a \dot a - b \dot b = (5\text{ km})(28\text{ km/h}) - (5\text{ km})(4\text{ km/h})  \nonumber \\
\dot x &= 20 \text{ km/h}  \nonumber
\end{align}
\]And from \(h^2 = a^2 - x^2\),
\[ \begin{align}
h \dot h &= (4\text{ km})\dot h = a \dot a - x \dot x = (5\text{ km})(28\text{ km/h}) - (3\text{ km})(20\text{ km/h}) \nonumber \\
\dot h &= 20 \text{ km/h} \nonumber
\end{align}
\] The ship's velocity is \(\mathbf{(v_x, v_y) = (20, 20)} \) km/h, or \(\mathbf{20\sqrt{2} \approx 28.3}\) km/h northeast.
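The arithmetic above can be reproduced in a few lines of Python. This is just a sketch of the same related-rates steps; the function name and the `d` parameter (the 6 km station separation) are labels I've introduced for illustration.

```python
import math

def ship_velocity(a, b, d, a_dot, b_dot):
    """Related-rates solution for a ship at distances a, b from stations d apart.

    Returns (x_dot, h_dot), the ship's east and north velocity components.
    """
    x = (a**2 - b**2 + d**2) / (2 * d)    # from b^2 = (d - x)^2 + a^2 - x^2
    h = math.sqrt(a**2 - x**2)            # from h^2 = a^2 - x^2
    x_dot = (a * a_dot - b * b_dot) / d   # differentiate the first relation
    h_dot = (a * a_dot - x * x_dot) / h   # differentiate the second relation
    return x_dot, h_dot

if __name__ == "__main__":
    vx, vy = ship_velocity(a=5.0, b=5.0, d=6.0, a_dot=28.0, b_dot=4.0)
    speed = math.hypot(vx, vy)
    heading = math.degrees(math.atan2(vy, vx))  # measured north of east
    print(f"v = ({vx:.1f}, {vy:.1f}) km/h, speed {speed:.1f} km/h, {heading:.1f} deg north of east")
```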

Solution 2: Vector Addition

To check that answer and withdraw from the bleakness of the real world for a little while longer, let's find the sum of the given velocities \(v_a\) and \(v_b\) by resolving them into their horizontal and vertical components and adding those:

Note that we don't actually need the value of \(\theta\), since it only ever appears as \(\cos \theta = 3/5\) or \(\sin \theta = 4/5\) (these values are readable straight from the first diagram). Adding up components gives:
\[ \begin{align}
v_x &= v_{a}\cos \theta - v_{b} \cos \theta \nonumber \\
&= \frac{3}{5}(28 - 4) \nonumber \\
&= \mathbf{14.4 \text{ km/h}} \nonumber \\
v_y &= v_{a}\sin \theta + v_{b} \sin \theta \nonumber \\
&= \frac{4}{5}(28 + 4) \nonumber \\
&= \mathbf{25.6 \text{ km/h}} \nonumber
\end{align}
\]...Wait, what the hell!? \((v_x, v_y) = (14.4, 25.6)\) km/h translates to a speed of 29.4 km/h in a direction 60.6° north of east, shown in red below. This sum clearly disagrees with the related-rates answer of 28.3 km/h northeast, shown in blue.

How could this happen? Adding two vectors is dead simple; no way did you screw that up. Did you flip a sign somewhere in the related-rates analysis? Maybe \(\dot x\) doesn't actually equal \(v_x\) or something? What is going on!?

What Is Going On

It's true that the unknown velocity vector \(v\) has components \(v_a = 28 \text{ km/h}\) and \(v_b = 4 \text{ km/h}\). The problem is that these components are taken along non-orthogonal (not right-angled) lines \(a\) and \(b\), so they can't simply add back up to \(v\)! Some of \(v_a\) points in the direction of \(v_b\), and some of \(v_b\) points along \(v_a\). Here's a surefire way to convince yourself that these measured velocities shouldn't add up to the ship's velocity: picture multiple radar stations, say 10 of them, distributed along the line AB. It wouldn't make sense to add all of their velocity measurements.

Here's a geometric way to realize why the vector sum \(v_a + v_b \ne v\). The a-component of the unknown velocity \(v\) is \(v_a\). This defines a whole family of vectors whose component along the line defined by \(a\) is \(v_a = 28 \text{ km/h}\). Here's that family of vectors in green:

Similarly, there is a family of vectors whose component along the line defined by \(b\) is \(v_b = 4 \text{ km/h}\):

Only one vector belongs to both families. That vector is the correct velocity of the ship, which we easily found using related rates to be \(20\sqrt{2} \approx 28.3 \text{ km/h}\) 45 degrees north of east:

You could of course solve this problem using vector geometry: the unknown vector \(v\) forms some angle \(\alpha\) with the positive x-axis, so the angle between \(v\) and \(v_a\) is \(\theta - \alpha\), etc. However, it's quicker to use related rates, as long as you don't add up vector components that point along non-orthogonal directions.
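The "one vector in both families" idea can also be written as a 2×2 linear system: each radar reading fixes the projection of \(v\) onto the unit vector pointing from that station to the ship. Here's a minimal Python sketch that solves the system with Cramer's rule and compares it to the naive vector sum. The coordinates (A at the origin, B at (6, 0), ship at (3, 4)) and the function names are my own choices, consistent with the distances in the problem.

```python
import math

def unit_vector(from_point, to_point):
    """Unit vector pointing from from_point to to_point."""
    dx = to_point[0] - from_point[0]
    dy = to_point[1] - from_point[1]
    length = math.hypot(dx, dy)
    return dx / length, dy / length

def velocity_from_range_rates(ship, station_a, station_b, rate_a, rate_b):
    """Solve v . u_a = rate_a and v . u_b = rate_b for v = (vx, vy)."""
    ax, ay = unit_vector(station_a, ship)
    bx, by = unit_vector(station_b, ship)
    det = ax * by - ay * bx               # nonzero as long as the ship is off the line AB
    vx = (rate_a * by - ay * rate_b) / det
    vy = (ax * rate_b - bx * rate_a) / det
    return vx, vy

def naive_sum(ship, station_a, station_b, rate_a, rate_b):
    """The tempting but wrong answer: add the two radial velocities as vectors."""
    ax, ay = unit_vector(station_a, ship)
    bx, by = unit_vector(station_b, ship)
    return rate_a * ax + rate_b * bx, rate_a * ay + rate_b * by

if __name__ == "__main__":
    ship, A, B = (3.0, 4.0), (0.0, 0.0), (6.0, 0.0)
    print("intersection of families:", velocity_from_range_rates(ship, A, B, 28.0, 4.0))
    print("naive vector sum:        ", naive_sum(ship, A, B, 28.0, 4.0))
```

The naive sum only matches the true velocity when the two lines of sight happen to be perpendicular.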

Tuesday, April 22, 2014

Bayes' Theorem, Part 2


As implied in Part One, this article series is supposed to be an easy introduction to Bayes' Theorem for non-experts (by a non-expert), not a thinly veiled job application directed at government agencies that don't officially exist.


Review: Testing Positive for a Rare Disease Doesn't Mean You're Sick

The previous article in this series illustrated a surprising fact about disease screening: if the disease you're testing for is sufficiently rare, then a positive diagnosis is probably wrong. This seemingly WTF outcome is an instance of the false positive paradox. It arises when the event of interest (in this case, being diseased) is so statistically rare that true positives are drowned out by a background of false positives.

Bayes' Theorem allows us to analyze this paradox, as shown below. But first, we need to define true and false positives and negatives.

False Positives and False Negatives

No classification test is perfect. Any real-world diagnostic test will sometimes mistakenly report disease in a healthy person. This type of error is defined as a false positive. If you test for the disease in a large number of people who are known to be healthy, a certain percentage of the test results will be false positives. This percentage is called the false positive rate of the test. It's the probability of getting a positive result if you test a healthy person.

The other type of classification error is the false negative -- for example, a clean bill of health mistakenly issued to someone who's actually sick. If you run your test on a large number of people known to be sick, the test will fail to detect disease in some percentage of them. This percentage is known as the test's false negative rate.

The lower the false positive rate and false negative rate, the better the test. Both rates are independent of population size and disease prevalence.

But now we get to the root of the false positive paradox: if the disease is rare enough, then the vast majority of people you test will be healthy. This unavoidable testing of crowds of healthy people represents plenty of opportunities to get false positives. These false positives drown out the relatively faint true positive signal coming from the few sick people in the population. And if the test obtains each true positive at the cost of many false positives, any given positive result is probably a false one. It's intuitive by this point that the error rate of a screening process depends not only on the accuracy of the test itself, but also on the rarity of what you're screening for.

For a more rigorous understanding, we need to derive Bayes' Theorem. To do that, we need some basic probability theory.

Probability Basics

The probability that some event \(A\) will occur is written \(P(A)\). All probabilities are limited to values between 0 ("impossible") and 1 ("guaranteed"). If we let \(H\) stand for the event that a fair coin lands heads up, then \(P(H) = 0.5\), or 50%. If \(X\) stands for rolling a "20" on a 20-sided die, then \(P(X) = 1/20\), or 5%.

If two events \(A\) and \(B\) cannot occur at the same time, they are said to be mutually exclusive, and the probability that either \(A\) or \(B\) occurs is just \(P(A) + P(B)\). Rolling a 19 and rolling a 20 on a 20-sided die are mutually exclusive events, so the probability of rolling 19 or 20 is \(1/20 + 1/20 = 1/10\).

The opposite of an event, or its complement, is denoted with a tilde (\(\text{~}\)) before the letter. The probability that \(A\) will not occur is written \(P(\text{~}A)\). If \(A\) has only two possible values, such as heads/tails, sick/healthy, or guilty/innocent, then \(A\) is called a binary event and \(P(A) + P(\text{~}A) = 1\), which just says that either \(A\) happens or it doesn't. Heads and tails, sickness and health, and guilt and innocence are all mutually exclusive binary events.

Conditional Probability

So far, we've considered probabilities of single events occurring in theoretical isolation: a single coin flip, a single die roll. Now, consider the probability of an event \(A\) given that some other event \(B\) has occurred. This new probability is read as "the probability of A given B" or "the probability of A conditional on B." Because this new probability quantifies the occurrence of A under the condition that B has definitely occurred, it is known as a conditional probability. Standard notation for conditional probability is:
\[ \begin{equation} P(A|B) \end{equation} \]
The vertical bar stands for the word "given." \(P(A|B)\) means "the probability of \(A\) given \(B\)."

It's really important to recognize right away that \(P(A|B)\) is not the same as \(P(B|A)\). To see why, dream up two related real-world events and think about their conditional probabilities:
  • probability that a road is wet given that it's raining: \(P(\text{wet road} ~ | ~ \text{raining})\)
  • probability that it's raining given that the road is wet: \(P(\text{raining} ~ | ~ \text{wet road})\)
The road will certainly get wet if it rains, but many things besides rain could result in a wet road (use your imagination). Therefore,
\[ \begin{equation} P(\text{wet road} ~ | ~ \text{raining})  >  P(\text{raining} ~ | ~ \text{wet road}). \end{equation} \]

Bayes' Theorem Converts between P(A|B) and P(B|A)

Okay, so \(P(A|B)\) does not equal \(P(B|A)\), but how are they related? If we know one quantity, how do we get the other? This section title blatantly gave away the answer.

To derive Bayes' Theorem, consider events \(A\) and \(B\) that have nonzero probabilities \(P(A)\) and \(P(B)\). Let's say that \(B\) has just occurred. What is the probability that \(A\) occurs given the occurrence of \(B\)? In symbols, what is \(P(A|B)\)?

Well, if \(A\) occurs given that \(B\) has occurred, then both \(A\) and \(B\) have occurred. The occurrence of both \(A\) and \(B\) is itself an event; let's call it \(AB\), with probability \(P(AB)\). Now, note that \(P(B)\) will always be greater than or equal to \(P(AB)\), because the "\(A\)" in "\(AB\)" represents an added criterion for event completion. The chance of both \(A\) and \(B\) occurring can't be higher than the chance of just \(B\) occurring (and the two are equal only if \(A\) is guaranteed to occur whenever \(B\) does).

The value of \(P(AB)\) itself isn't as interesting as the ratio of \(P(AB)\) to \(P(B)\), and here's why. This ratio compares the probability of both A and B to the probability of B just by itself. The ratio gives the proportion of possible occurrences of \(AB\) relative to the possible occurrences of \(B\). You should be able to convince yourself that this ratio is none other than the conditional probability \(P(A|B)\):
\[  \begin{equation} P(A|B) = \frac{P(AB)}{P(B)}.  \end{equation} \]
Rearranging gives
\[  \begin{equation} P(AB) = P(A|B)P(B). \label{whatstheuse} \end{equation} \]
Similarly,
\[ \begin{align} P(B|A) &= \frac{P(BA)}{P(A)} \\
P(BA) &= P(B|A)P(A). \end{align} \]
Since the order of \(A\) and \(B\) doesn't affect the probability of both occurring, we have \(P(AB) = P(BA)\), so
\[ \begin{equation} P(A|B)P(B) = P(B|A)P(A). \end{equation} \]
This leads to Bayes' Theorem:
\[  \begin{equation}P(A|B) = \frac{P(B|A)P(A)}{P(B)} \label{existentialangst} \end{equation} \]
There we have it: to convert from \(P(B|A)\) to \(P(A|B)\), multiply \(P(B|A)\) by the ratio \(P(A)/P(B)\). Let's see what this looks like in the disease example.
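Before that, here's a quick sanity check of the theorem by brute-force counting on the 20-sided die from earlier. The two events are hypothetical ones I picked for illustration (\(A\): the roll is a multiple of 4; \(B\): the roll is greater than 10), and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

OUTCOMES = range(1, 21)  # faces of a fair 20-sided die

def prob(event):
    """Exact probability of an event, given as a predicate on a single roll."""
    favourable = sum(1 for roll in OUTCOMES if event(roll))
    return Fraction(favourable, len(OUTCOMES))

def is_a(roll):
    return roll % 4 == 0   # hypothetical event A: multiple of 4

def is_b(roll):
    return roll > 10       # hypothetical event B: greater than 10

p_a = prob(is_a)
p_b = prob(is_b)
p_ab = prob(lambda roll: is_a(roll) and is_b(roll))

p_a_given_b = p_ab / p_b                 # definition of conditional probability
p_b_given_a = p_ab / p_a
bayes_rhs = p_b_given_a * p_a / p_b      # right-hand side of Bayes' Theorem

if __name__ == "__main__":
    print("P(A|B) by counting:", p_a_given_b)
    print("P(B|A) by counting:", p_b_given_a)
    print("P(B|A)P(A)/P(B):   ", bayes_rhs)
```

The first and third lines should agree, while \(P(A|B)\) and \(P(B|A)\) themselves differ.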

Back to the Disease Example

Let the symbols \(+\) and \(-\) stand for a positive and negative diagnosis, and let \(D\) stand for the event that disease is present. Since the test must return either \(+\) or \(-\) every time we test someone, \(P(+) + P(-) = 1\). And since disease must either be present or absent, \(P(D) + P(\text{~}D) = 1\).

Now, to determine precisely how much a positive diagnosis should worry us, we care about the probability of disease given a positive diagnosis. This is just \(P(D|+)\). By Bayes' Theorem (Equation \(\ref{existentialangst}\)), we need to calculate
\[  \begin{equation} P(D|+) = \frac{P(+|D)P(D)}{P(+)}  \end{equation} \]
Consider each term on the right side:

\(P(+|D)\) is just the probability of getting a positive diagnosis given the presence of disease, i.e., the probability that the test works as advertised as a disease detector. This is the definition of the true positive rate, AKA the sensitivity, a very commonly quoted test metric.

\(P(D)\) is the probability of disease in a person randomly selected from our population. In other words, \(P(D)\) is the disease prevalence (e.g., 15 per 10,000 people).

What about the denominator, \(P(+)\)? It's the probability of getting a positive diagnosis in a randomly selected person. A positive diagnosis can be either 1. a true positive, or 2. a false positive.
  1. A true positive is defined by the occurrence of both \(D\) and \(+\). Equation \(\ref{whatstheuse}\) says that the probability of both \(D\) and \(+\) is \(P(+|D)P(D)\). \(P(+|D)\) is the true positive rate, and \(P(D)\) is the disease prevalence.
  2. A false positive is defined by the occurrence of both \(\text{~}D\) and \(+\). The probability of this is \(P(+|\text{~}D)P(\text{~}D)\). \(P(+|\text{~}D)\) is the false positive rate (after which the paradox is named), and \(P(\text{~}D) = 1 - P(D)\).
Since true and false positives are mutually exclusive events, their probabilities add up to give the probability of a positive outcome, which is either a true positive or a false positive. Thus,
\[  \begin{equation} P(+) = P(+|D)P(D) + P(+|\text{~}D)P(\text{~}D). \end{equation} \]

Bayes' Theorem for the disease-screening example now looks like this:
\[  \begin{equation} P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|\text{~}D)P(\text{~}D)} \label{thehorror} \end{equation} \]

Plugging in Example Numbers

Part One of this series gave concrete numbers for a hypothetical outbreak of dancing plague. Let's insert those numbers into our newly minted Equation \(\ref{thehorror}\) to calculate the value of a positive diagnosis.
  • Dancing plague was assumed to affect one in 1000 people, so \(P(D) = 1/1000\). Since each person either has or does not have disease, \(P(D) + P(\text{~}D) = 1\).
  • Test sensitivity was 99%. Sensitivity is synonymous with the true positive rate, so this tells us that \(P(+|D) = 0.99\). And since the test must return either \(+\) or \(-\) when disease is present, \(P(-|D) = 1 - 0.99 = 0.01\).
  • Test specificity was 95%. Specificity is synonymous with the true negative rate, so \(P(-|\text{~}D) = 0.95\). Then \(P(+|\text{~}D) = 1 - 0.95 = 0.05\).
Inserting these numbers into Equation \(\ref{thehorror}\) gives
\[ \begin{align}
P(D|+) &= \frac{0.99 \cdot \frac{1}{1000}}{0.99 \cdot \frac{1}{1000} + 0.05 \cdot (1 - \frac{1}{1000})} \label{whyareyouevenwritingthis} \\
&= \frac{\frac{0.99}{1000}}{\frac{0.99}{1000} + 0.05 \cdot \frac{999}{1000}} \\
&= 0.01943 \\ &\simeq 1.9 \%
\end{align} \]
As expected, this is the same answer we got in Part One through a less rigorous approach.
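For anyone who'd rather let a computer do the plugging in, here's a minimal Python sketch of the same calculation. The function and parameter names are mine; the numbers are the dancing plague values used above.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(D|+) from Bayes' Theorem for a binary test.

    prevalence  = P(D)
    sensitivity = P(+|D), the true positive rate
    specificity = P(-|~D), the true negative rate
    """
    true_pos = sensitivity * prevalence                # P(+|D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(+|~D) P(~D)
    return true_pos / (true_pos + false_pos)

if __name__ == "__main__":
    p = posterior_given_positive(prevalence=1 / 1000, sensitivity=0.99, specificity=0.95)
    print(f"P(D|+) = {p:.5f}")
```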

Final Remarks

In this excessively long article, we derived Bayes' Theorem and used it to confirm our earlier reasoning in Part One that a positive diagnosis of dancing plague has only a 2% chance of being correct. This low number is an example of the false positive paradox, and Equation \(\ref{whyareyouevenwritingthis}\) reveals its origin.

The form of Equation \(\ref{whyareyouevenwritingthis}\) is [something] divided by [something + other thing], or \(\frac{t}{t + f}\). If \(f\) is small compared to \(t\), then \(\frac{t}{t+f} \simeq \frac{t}{t} = 1\), which means that the probability of disease given a positive test result is close to 100%. But if \(f\) becomes much larger than \(t\), then \(\frac{t}{t+f}\) becomes much less than 1. Looking at Equations \(\ref{thehorror}\) and \(\ref{whyareyouevenwritingthis}\), you can see that \(t\) matches up with the term \(P(+|D)P(D)\), the probability of getting a true positive, and \(f\) matches up with \(P(+|\text{~}D)P(\text{~}D)\), the probability of getting a false positive. In our example, \(t \simeq 0.001 \) and \(f \simeq 0.05 \). Thus, the chance of getting a false positive is 50 times higher than the chance of getting a true positive. That's why someone who tests positive probably has nothing to worry about, other than the social stigma of getting tested in the first place.
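To see how the balance between \(t\) and \(f\) shifts with rarity, here's a rough Python sketch that holds the test fixed (99% sensitive, 95% specific) and varies only the prevalence. The function name and the list of prevalences are illustrative assumptions of mine; only the 1-in-1000 case comes from the example.

```python
def positive_breakdown(prevalence, sensitivity=0.99, specificity=0.95):
    """Return (t, f, t / (t + f)) for a binary test.

    t = P(+|D) P(D)    probability of a true positive
    f = P(+|~D) P(~D)  probability of a false positive
    """
    t = sensitivity * prevalence
    f = (1 - specificity) * (1 - prevalence)
    return t, f, t / (t + f)

if __name__ == "__main__":
    for prevalence in (1 / 1000, 1 / 100, 1 / 10, 1 / 2):
        t, f, p = positive_breakdown(prevalence)
        print(f"P(D) = {prevalence:.3f}: t = {t:.5f}, f = {f:.5f}, f/t = {f / t:.2f}, P(D|+) = {p:.3f}")
```

As the disease becomes more common, \(f/t\) shrinks and a positive result becomes far more trustworthy, even though the test itself hasn't changed.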

Cliffhanger Ending

In my next post, I'll explain the Bayesian methodology I used in the course of my involvement with the series To Catch A Killer. Essentially, the above analysis can be adapted to homicide investigations by replacing rare-disease prevalence with the homicide rate for a specific time, place, and demographic, and by treating the presence or absence of forensic evidence as positive or negative diagnostic test outcomes.

Monday, January 27, 2014

Bayes' Theorem, Part 1: Not Just a Mnemonic for Apostrophe Placement

If you're intimately familiar with Bayes' Theorem or profoundly bored of it, you may still find value in this post by taking a shot every time you read the words "theorem" or "disease."

I first encountered Bayes' Theorem in a high school conversation about email spam filters. I didn't retain much about either the theorem or spam filters, but promptly added the term "Bayes' Theorem" to my mental list of Things That Sound Vaguely Technical And Also Possibly Sinister. (That list includes the names of every military and/or aerospace contractor that ever existed. If you think of any exceptions, send them my way.)

Years afterward, Bayes' Theorem started cropping up in my medical biophysics studies and after-hours discussions about airport and border security. More recently, I used Bayes' Theorem to weigh forensic evidence in the upcoming documentary series To Catch a Killer. The theorem seems to appear everywhere and makes you sound smart, but just what is it?

Basically, Bayes' Theorem tells you how to update your beliefs using new information. That's the best plain-English definition I can think of. More generally, Bayes' Theorem tells you how to manipulate conditional probabilities, saving you from fallacious logic along the lines of "most Pabst drinkers are hipsters, so most hipsters drink Pabst." (It may be true that most Pabst drinkers are not hipsters, but that's not the point of this fallacy. The lesson for me is that I come up with poor examples.)

Bayes' Theorem follows directly from basic probability principles, but proper derivations tend to look like field notes by Will Hunting on how to outperform pompous Harvard grad students at impressing Minnie Driver. Accordingly, this post shall include zero equations, which is great, since I figured out how to embed equations in my last post. Instead, I'll try to show the importance of Bayes' Theorem by posing the following brain teaser to you, dear reader.


Brain Teaser: You Tested Positive for a Rare Disease; Do You Really Have It?

Imagine that a disease afflicts 0.1% of the general population, or 1 in 1000 people. A particular diagnostic test returns either "positive" or "negative" to indicate the presence or absence of the disease. Let's say you know that this test is 99% sensitive. That's a compact way of saying that out of 100 people who truly do have the disease, 99 of them will correctly test positive, whereas 1 will erroneously test negative, even though they actually have the disease. Let's also say you know that the test is 95% specific. That means that out of 100 disease-free people, 95 will correctly test negative, but 5 of these healthy people will erroneously be told that they have the disease.

Suppose you run this test on yourself, and sweet buttery Jesus, it says you're positive. This deeply distresses you, as it should if the disease in question were, say, dancing plague. As psychosomatic head-bobbing sets in, you ask yourself the following question: given the positive test result, what are the chances that I actually have dancing plague?

Take another look at those goddamn numbers. The test is 99% sensitive and 95% specific. Should you embrace your groovy fate and invest in a bell-bottomed suit and unnervingly realistic John Travolta mask? Is all hope lost? Is the jig up?!

Think it over and decide on your final answer before reading on. At the very least, don't bother with precise numbers, but decide whether you think the chance of actually having dancing plague is more or less than 50%, given your positive diagnosis.

If you haven't seen this kind of question before, the chance that your answer exceeds 50% exceeds 50%. It turns out that even though you tested positive, the chance that you have the disease is only about 2%! Choreographed celebrations are in order.


Explanation

You don't actually need to know anything about Bayes' Theorem to correctly answer the above question, though you might end up stepping through the theorem without knowing it. Here's one way to proceed.

Pick a sample of 1000 people from the general population. On average, only 1 of these people will actually have the disease. The vast majority, 999 out of 1000, will be healthy. Our initial sample thus consists of 999 healthy people and 1 sick person. Now, test them all.

Our test is 99% sensitive. That means that when the one diseased guy in our sample gets tested, he'll be correctly identified as sick 99 times out of 100. Very rarely, 1 time in 100, the test will mess up and give him a negative result.

The specificity of 95% means that most healthy people will test negative, as they should. 95% of the initial 999 healthy people, or 949.05 of them on average, will correctly be told that they're disease-free. However, the remaining 49.95 healthy people will erroneously receive positive test results, even though they're fine.

Therefore, by testing each of our starting 1000 people, we'd find an average of 0.99 correct positive diagnoses and 49.95 incorrect positive diagnoses, giving 50.94 positive diagnoses in total. Rounding off the numbers, about 51 people in our initial 1000 will be freaked out by positive test results. However, only about one of these people will actually have the disease.

If you test positive, you could be any one of those 51 people, so try not to panic: the chance that you're the one person who actually has dancing plague is about 1/51, or just under 2%.
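For anyone who'd rather let a computer do the head-counting, here's a minimal Python sketch of the same argument. The function and variable names are my own invention for illustration; the only inputs are the prevalence, sensitivity, and specificity quoted above. Because it skips the rounding to whole people, it lands slightly below the 1/51 figure.

```python
def chance_sick_given_positive(prevalence, sensitivity, specificity, population=1000):
    """Count expected true and false positives in a sample, then take the ratio."""
    sick = population * prevalence                    # 1 person, on average
    healthy = population - sick                       # 999 people
    true_positives = sick * sensitivity               # sick people who test positive
    false_positives = healthy * (1 - specificity)     # healthy people who test positive
    return true_positives / (true_positives + false_positives)


chance = chance_sick_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(f"Chance of dancing plague given a positive test: {chance:.2%}")
```

The population size cancels out of the final ratio, so picking 1000 people is only a convenience that keeps the intermediate head counts easy to read.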


Final Remarks

What was that about Bayes' Theorem helping to update your beliefs? "Belief" refers to one possible way to interpret what it means for a random outcome to have some numerically determined chance of occurring. In the above disease example, it's sensible to think of the chance that someone is ill as a measure of how firmly you believe that they're ill.

If you randomly chose one person from the general population and didn't test them, you'd be pretty skeptical that they're ill, since the disease is so rare. The chance that you picked someone with the disease is 1/1000. Running the test then gives you new information -- specifically, the test outcome. That outcome is sometimes wrong, but you can still use the new information to update your prior belief that the person has the disease. If the person tests positive, your belief just jumped from a prior value of 1/1000 to a "posterior" value of about 1/51, roughly a 20-fold increase.
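The prior-to-posterior jump can also be sketched as a small Python function that takes a belief and a test outcome and hands back the updated belief. This is only an illustration under the numbers used in this post; `update_belief` and its parameter names are hypothetical, not part of any library.

```python
def update_belief(prior, sensitivity, specificity, tested_positive):
    """Return the updated chance of disease after seeing one test result."""
    if tested_positive:
        p_result_if_sick = sensitivity           # sick and correctly flagged
        p_result_if_healthy = 1 - specificity    # healthy but wrongly flagged
    else:
        p_result_if_sick = 1 - sensitivity       # sick but missed by the test
        p_result_if_healthy = specificity        # healthy and correctly cleared
    sick_and_result = prior * p_result_if_sick
    healthy_and_result = (1 - prior) * p_result_if_healthy
    return sick_and_result / (sick_and_result + healthy_and_result)


prior = 1 / 1000
posterior = update_belief(prior, sensitivity=0.99, specificity=0.95, tested_positive=True)
print(f"prior {prior:.3%}, posterior {posterior:.2%}, ratio {posterior / prior:.1f}x")
```

Passing `tested_positive=False` instead shows the flip side: a negative result pushes an already small prior even lower.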

Cliffhanger Ending

In a future post, we'll derive Bayes' Theorem and show how it applies to this and other problems. Until next time!

EDIT: Part 2 is here.

Tuesday, January 7, 2014

Typesetting Equations in Web Pages with MathJax

One blog feature that should have influenced my choice of blogging platform is the ability to typeset equations. In this imperfect world, I didn't think that far ahead, so I chose Blogger out of familiarity with Google's ways, and for the convenience of having created an account in 2011 that's lain dormant until recently. Fortunately, the equation-writing solution that I test in this post should work for just about any web document out there.

I'm sporadically active on the Astronomy Picture of the Day (APOD) forums, where discussion occasionally spirals into the astrophysical realm and waxes technical. That's where I first encountered the need to typeset equations online. I resorted to a quick and dirty solution:
  1. Decide whether laziness trumps steps 2-4.
  2. Use a LaTeX-to-image converter such as LaTeXiT to generate equation images.
  3. Upload images to university webspace.
  4. Embed images inline with text.
LaTeX is a mathematical markup language that's as common and useful in research circles as actual latex is in meatspace. There's a lot of general information out there about LaTeX, so here, I'll just focus on getting it to work in Blogger posts. At the risk of overstretching the bizarre dating/relationship theme that seems to pervade everything I write, introducing LaTeX on the second date (er, post) demonstrates the unrealistic optimism that it'll actually come in handy so the user won't have to. Okay, fun's over: it's pronounced LAH-tek or LAY-tek, probably in reference to the last name of LaTeX's creator, Leslie Lamport. (But how is that pronounced...?)

A cursory web search identified MathJax as the most magic-like way to type posts in LaTeX and see equations across all major browsers. I'd heard of MathML before, but since I'm so used to LaTeX already, MathJax looked way too easy to pass up, and it is. All the end user needs to do is to include the MathJax JavaScript library in their HTML via the following snippet [per official instructions]:

<script type="text/javascript"
  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

(That's the secure snippet, which imports the script over HTTPS. An HTTP variant is provided through the above official instructions link for those who'd rather flirt with men in the middle.)

After embedding the above script in your HTML, everything should be hunky dory. Hack up some LaTeX in your HTML view; the default math delimiters are \(...\) for inline equations and $$...$$ or \[...\] for paragraph equations. So, this string:

\[ \nabla \times \mathbf E = - \frac{\partial \mathbf B}{\partial t} \] 

gives you this:
\[ \nabla \times \mathbf E = - \frac{\partial \mathbf B}{\partial t} \]
Huge thanks go out to the MathJax team, and to you, dear reader, for reading all the way to the end of this healthcare.gov of a post. Until next time!

tl;dr: including a MathJax script in your web page lets you type LaTeX directly in HTML.

Thursday, January 2, 2014

Hello, world!

Thought I'd try out this newfangled "web log" fad that's been sweeping the BBSes! Wow, I hope my second sentence isn't that sarcastic.

My primary aim for this blog is to consolidate and share ideas that don't seem to fit elsewhere (i.e., anywhere), and to keep a publicly accessible electronic record of selected long-term projects. In other words, I hope that using this blog as an open lab notebook will force me to maintain guilt-fueled progress on stuff I write about by fabricating a pervasive sense of accountability to a merciless multitude of silent and/or imaginary readers. Updating frequently enough should also give me some much-needed practice at putting together word series that look good and then doing this again multiple times per minute with different words.

Coming soon: geographic profiling of serial homicide! Statistical crime analysis! A personalized, curse-laden introduction to the Google Maps API! Blood flow in arteries! Sundry language quirks! Comments from spammers, probably! Less sensationalism.

Hoping that signing off like this ends up looking perfectly natural and savvy,
Peter