<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Probability Theory: The Logic of Science</title>
	<atom:link href="http://blog.higher-order.net/2008/08/18/probability-theory-the-logic-of-science/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.higher-order.net/2008/08/18/probability-theory-the-logic-of-science/</link>
	<description>topics: functional programming, concurrency, web-development, REST, dynamic languages</description>
	<lastBuildDate>Sun, 04 Jul 2010 07:43:34 -0700</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Math World &#124; Higher-Order » Blog Archive » Probability Theory: The Logic of Science</title>
		<link>http://blog.higher-order.net/2008/08/18/probability-theory-the-logic-of-science/comment-page-1/#comment-1368</link>
		<dc:creator>Math World &#124; Higher-Order » Blog Archive » Probability Theory: The Logic of Science</dc:creator>
		<pubDate>Wed, 16 Sep 2009 09:50:01 +0000</pubDate>
		<guid isPermaLink="false">http://blog.higher-order.net/?p=85#comment-1368</guid>
		<description>[...] Excerpt from:  Higher-Order » Blog Archive » Probability Theory: The Logic of Science [...]</description>
		<content:encoded><![CDATA[<p>[...] Excerpt from:  Higher-Order » Blog Archive » Probability Theory: The Logic of Science [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: silkop</title>
		<link>http://blog.higher-order.net/2008/08/18/probability-theory-the-logic-of-science/comment-page-1/#comment-28</link>
		<dc:creator>silkop</dc:creator>
		<pubDate>Sat, 30 Aug 2008 13:05:51 +0000</pubDate>
		<guid isPermaLink="false">http://blog.higher-order.net/?p=85#comment-28</guid>
		<description>I suppose that this &quot;realized in the greatest number of ways&quot; remark is what I am nit-picky about. IIRC, Jaynes elsewhere chides the &quot;orthodoxians&quot; for considering not the data at hand, but rather &quot;what could have been, but is not&quot;. However, in order to justify maximum entropy, he seems to implicitly rely on a very similar approach:

First, consider all the possible &quot;worlds&quot; that agree with the constraints and are equally likely by the principle of indifference. In each such world a particular frequency distribution is &quot;realized&quot;. Then, examine which distributions are going to come up most often if you keep drawing randomly from the bag of worlds; this, of course, is a basic problem solved by the multinomial distribution.

If my remarks are not clear, think about his broken windows example. N windows have been broken into an integer number of pieces, and all that we know is the average number of pieces (this seems like a rather strange situation to me, but who am I to criticize textbook examples). If we assume some upper integer limit on the number of pieces per window, we can easily imagine a concrete world in which the &quot;first&quot; window was broken into p_1 pieces, the &quot;second&quot; window into p_2 pieces and so on until p_N. Now, if we enumerate all the possible worlds (and there&#039;s a finite number of them, based on our assumptions about the number of windows and pieces), some of them will satisfy the average-number-of-pieces constraint; most will not. Then, we conceptually put these matching worlds into a &quot;bag&quot;, sample from this bag and examine the relative frequency of each number of pieces in each drawn world. What the maxent principle says is that an overwhelming number of draws from the bag will have relative frequencies very close to those of most other draws, and that the most frequent frequency distribution can be calculated by maximizing entropy (why this correspondence holds is not explained very well in the book, I find).

Why are we willing to accept the maxent frequency distribution, which is, after all, based on a thought-up generative sampling model? So far, the only good answer I understand is that other distributions would also have to be based on thought-up generative sampling models - ones that are even more ridiculous (arbitrary) than the maxent one. Sometimes I wonder if it is the only answer.

As for CS people not knowing about Jaynes: I think it is &quot;Jaynes&#039;s fault&quot; - he assumes that his reader has a working knowledge of calculus (and often also &quot;orthodox&quot; statistics and history) to follow his reasoning. This may be true for physicists, but certainly isn&#039;t true for CS students. The funny thing is Jaynes has inspired me to improve my maths education. There&#039;s something magnetic in the way he explains stuff and deals with critics.</description>
		<content:encoded><![CDATA[<p>I suppose that this &#8220;realized in the greatest number of ways&#8221; remark is what I am nit-picky about. IIRC, Jaynes elsewhere chides the &#8220;orthodoxians&#8221; for considering not the data at hand, but rather &#8220;what could have been, but is not&#8221;. However, in order to justify maximum entropy, he seems to implicitly rely on a very similar approach:</p>
<p>First, consider all the possible &#8220;worlds&#8221; that agree with the constraints and are equally likely by the principle of indifference. In each such world a particular frequency distribution is &#8220;realized&#8221;. Then, examine which distributions are going to come up most often if you keep drawing randomly from the bag of worlds; this, of course, is a basic problem solved by the multinomial distribution.</p>
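<p>To make the counting concrete, here is a minimal Python sketch. It is an illustration added here, not code from Jaynes' book, and the window numbers in it are hypothetical. If each "world" is a labelled sequence of outcomes (window 1 broke into so many pieces, window 2 into so many, ...), the number of worlds that realize given frequency counts n_1, ..., n_k is the multinomial coefficient N!/(n_1! ... n_k!).</p>

```python
from math import factorial, prod


def multiplicity(counts):
    """Number of labelled worlds whose outcome counts are exactly `counts`:
    the multinomial coefficient N! / (n_1! * ... * n_k!)."""
    n = sum(counts)
    return factorial(n) // prod(factorial(c) for c in counts)


# Hypothetical example: 6 windows, each broken into 1, 2 or 3 pieces.
print(multiplicity([2, 2, 2]))  # 720 // 8  -> 90 worlds
print(multiplicity([3, 2, 1]))  # 720 // 12 -> 60 worlds
print(multiplicity([6, 0, 0]))  # 1 world: every window in exactly 1 piece
```

<p>Summed over every possible count vector, these multiplicities give 3**6 = 729, the total number of worlds. The even split (2, 2, 2) is the single most common frequency distribution, even though no individual world is privileged.</p>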
<p>If my remarks are not clear, think about his broken windows example. N windows have been broken into an integer number of pieces, and all that we know is the average number of pieces (this seems like a rather strange situation to me, but who am I to criticize textbook examples). If we assume some upper integer limit on the number of pieces per window, we can easily imagine a concrete world in which the &#8220;first&#8221; window was broken into p_1 pieces, the &#8220;second&#8221; window into p_2 pieces and so on until p_N. Now, if we enumerate all the possible worlds (and there&#8217;s a finite number of them, based on our assumptions about the number of windows and pieces), some of them will satisfy the average-number-of-pieces constraint; most will not. Then, we conceptually put these matching worlds into a &#8220;bag&#8221;, sample from this bag and examine the relative frequency of each number of pieces in each drawn world. What the maxent principle says is that an overwhelming number of draws from the bag will have relative frequencies very close to those of most other draws, and that the most frequent frequency distribution can be calculated by maximizing entropy (why this correspondence holds is not explained very well in the book, I find).</p>
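<p>The thought experiment can be run literally for a tiny case. The Python sketch below enumerates every frequency-count vector that satisfies the average constraint and reports the one realized by the most worlds. Its figures are hypothetical: 12 windows, 1 to 3 pieces per window and an average of 2.5 pieces, so 30 pieces in total. It is brute force and only feasible for very small N.</p>

```python
from math import factorial, prod


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multiplicity(counts):
    """Multinomial coefficient: worlds realizing these outcome counts."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


def most_realized(n_windows, max_pieces, total_pieces):
    """Among count vectors (n_1, ..., n_K), where n_i windows broke into i
    pieces, keep those with sum n_i = n_windows and sum i * n_i = total_pieces,
    and return the one realized by the most worlds, with its multiplicity."""
    best, best_w = None, 0
    for counts in compositions(n_windows, max_pieces):
        pieces = sum((i + 1) * c for i, c in enumerate(counts))
        if pieces != total_pieces:
            continue  # this world violates the average-value constraint
        w = multiplicity(counts)
        if w > best_w:
            best, best_w = counts, w
    return best, best_w


print(most_realized(12, 3, 30))  # ((1, 4, 7), 3960)
```

<p>For comparison, the maximum-entropy distribution on {1, 2, 3} with mean 2.5 is roughly (0.116, 0.268, 0.616). The most-realized counts here give (1/12, 4/12, 7/12), about (0.083, 0.333, 0.583). As N grows, the most-realized frequencies approach the maxent values.</p>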
<p>Why are we willing to accept the maxent frequency distribution, which is, after all, based on a thought-up generative sampling model? So far, the only good answer I understand is that other distributions would also have to be based on thought-up generative sampling models &#8211; ones that are even more ridiculous (arbitrary) than the maxent one. Sometimes I wonder if it is the only answer.</p>
<p>As for CS people not knowing about Jaynes: I think it is &#8220;Jaynes&#8217;s fault&#8221; &#8211; he assumes that his reader has a working knowledge of calculus (and often also &#8220;orthodox&#8221; statistics and history) to follow his reasoning. This may be true for physicists, but certainly isn&#8217;t true for CS students. The funny thing is Jaynes has inspired me to improve my maths education. There&#8217;s something magnetic in the way he explains stuff and deals with critics.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: admin</title>
		<link>http://blog.higher-order.net/2008/08/18/probability-theory-the-logic-of-science/comment-page-1/#comment-20</link>
		<dc:creator>admin</dc:creator>
		<pubDate>Thu, 21 Aug 2008 19:34:39 +0000</pubDate>
		<guid isPermaLink="false">http://blog.higher-order.net/?p=85#comment-20</guid>
		<description>Hello silkop.

Good to hear from you; an interesting response! I wasn&#039;t really expecting much activity on this thread, since this blog is centered around computer science, and if you are a computer scientist, it is quite unlikely that you have come across Jaynes&#039; book... I myself came across it purely by coincidence.

But you have! Great; and thanks for the reference to Cox&#039; original work, I will definitely read that when I get the time and opportunity ;-)

Regarding your problem with MaxEnt, I think Jaynes gives a really satisfactory explanation of this. One can think of the &quot;frequentist thing&quot; as a degenerate special case of proper Bayesian reasoning in which the prior information says nothing, i.e., when we use the principle of insufficient reason. Now, in the case of MaxEnt we have actual prior information, say in the form of average values that the solution must satisfy. Frequentist theory (conventional statistics) cannot make use of this prior information, but MaxEnt can: intuitively, one gets a prior which is as uniform as possible while respecting the given constraints. So the frequency correspondence is not a bad thing but a good one; it is what makes MaxEnt as noncommittal as possible.


If you read section 11.8, page 365 in the 2003 edition, you will find an interesting exposition on &#039;frequency correspondence&#039;: &quot;...the probability distribution which maximises entropy is numerically identical with the &lt;i&gt;frequency&lt;/i&gt; distribution which can be realized in the greatest number of ways (which is vastly greater than its competitors).&quot;</description>
		<content:encoded><![CDATA[<p>Hello silkop.</p>
<p>Good to hear from you; an interesting response! I wasn&#8217;t really expecting much activity on this thread, since this blog is centered around computer science, and if you are a computer scientist, it is quite unlikely that you have come across Jaynes&#8217; book&#8230; I myself came across it purely by coincidence.</p>
<p>But you have! Great; and thanks for the reference to Cox&#8217; original work, I will definitely read that when I get the time and opportunity ;-)</p>
<p>Regarding your problem with MaxEnt, I think Jaynes gives a really satisfactory explanation of this. One can think of the &#8220;frequentist thing&#8221; as a degenerate special case of proper Bayesian reasoning in which the prior information says nothing, i.e., when we use the principle of insufficient reason. Now, in the case of MaxEnt we have actual prior information, say in the form of average values that the solution must satisfy. Frequentist theory (conventional statistics) cannot make use of this prior information, but MaxEnt can: intuitively, one gets a prior which is as uniform as possible while respecting the given constraints. So the frequency correspondence is not a bad thing but a good one; it is what makes MaxEnt as noncommittal as possible.</p>
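<p>Here is a minimal sketch of what "as uniform as possible while respecting the given constraints" means for a single average-value constraint. The function name and the use of bisection are our own choices, not Jaynes'. Maximizing entropy subject to normalization and a prescribed mean gives the Gibbs form p_i proportional to exp(-lam * x_i). The code finds lam numerically, using the fact that the mean decreases as lam increases. When the prescribed mean equals the mean of the uniform distribution, lam = 0 and the result is uniform.</p>

```python
import math


def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution over `values` with a prescribed mean.
    Assumes target_mean lies strictly between min(values) and max(values)."""

    def dist(lam):
        # Gibbs form p_i = exp(-lam * x_i) / Z; subtract the largest
        # exponent before exponentiating for numerical stability.
        exps = [-lam * x for x in values]
        m = max(exps)
        w = [math.exp(e - m) for e in exps]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), values))

    # Bisection on lam: the mean is a decreasing function of lam.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) > target_mean:
            lo = mid  # mean still too high, so lam must be larger
        else:
            hi = mid
    return dist((lo + hi) / 2)


print(maxent_with_mean([1, 2, 3], 2.0))  # uniform: about 1/3 each
print(maxent_with_mean([1, 2, 3], 2.5))  # roughly [0.116, 0.268, 0.616]
```

<p>Only the mean constraint pulls the distribution away from uniform. Everything else about it stays as noncommittal as possible.</p>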
<p>If you read section 11.8, page 365 in the 2003 edition, you will find an interesting exposition on &#8216;frequency correspondence&#8217;: &#8220;&#8230;the probability distribution which maximises entropy is numerically identical with the <i>frequency</i> distribution which can be realized in the greatest number of ways (which is vastly greater than its competitors).&#8221;</p>
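<p>One way to see the correspondence numerically is through Stirling's approximation. Write W for the number of ways of realizing frequencies f with N trials, W = N!/((N f_1)! ... (N f_k)!), and H(f) = -sum f_i ln f_i for the entropy. Then ln W is approximately N * H(f). The Python sketch below checks this for arbitrary example frequencies. It also shows that a competitor whose entropy is lower by dH is outnumbered by a factor of roughly exp(N * dH), which is why the maximum is "vastly greater".</p>

```python
from math import lgamma, log


def log_multiplicity(counts):
    """ln(N! / (n_1! * ... * n_k!)), via lgamma to avoid huge integers."""
    return lgamma(sum(counts) + 1) - sum(lgamma(c + 1) for c in counts)


def entropy(freqs):
    """Shannon entropy in nats: H(f) = -sum f_i ln f_i."""
    return -sum(f * log(f) for f in freqs if f > 0)


f = [0.25, 0.25, 0.5]
for n in (80, 800, 8000):
    counts = [round(n * fi) for fi in f]
    # The per-trial log-count ln(W) / n approaches H(f) as n grows.
    print(n, log_multiplicity(counts) / n, entropy(f))

# A lower-entropy competitor is outnumbered by roughly exp(N * (H1 - H2)).
n = 1200
gap = log_multiplicity([400, 400, 400]) - log_multiplicity([300, 300, 600])
print(gap, n * (entropy([1 / 3] * 3) - entropy(f)))  # both roughly 70
```

<p>A log-ratio of about 70 means the uniform frequencies are realized in roughly e**70 times as many ways as the (0.25, 0.25, 0.5) competitor, already at N = 1200.</p>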
]]></content:encoded>
	</item>
	<item>
		<title>By: silkop</title>
		<link>http://blog.higher-order.net/2008/08/18/probability-theory-the-logic-of-science/comment-page-1/#comment-18</link>
		<dc:creator>silkop</dc:creator>
		<pubDate>Wed, 20 Aug 2008 18:18:48 +0000</pubDate>
		<guid isPermaLink="false">http://blog.higher-order.net/?p=85#comment-18</guid>
		<description>I agree wholeheartedly on all points! However, Jaynes&#039; book is dangerous to read before you get your PhD, because you start recognizing bs all around you, as in every second CS paper where the word &quot;probability&quot; comes up!

If you haven&#039;t read Cox&#039;s original exposition, it is worth a trip to the library. His little book, &quot;The Algebra of Probable Inference&quot;, is short and neat, maybe even easier to follow than Jaynes.

Anyway, I see one problem with the explanatory approach taken by Cox/Jaynes (which I started transcribing for the average Joe non-mathematician on my blog). Their discussion of the probability rules and their uniqueness in the context of logical propositions is fine and dandy, but it leaves a lingering question of where these proposition sets are supposed to come from in practice. When Jaynes explains his maximum entropy principle, it looks very much like he is doing a &quot;frequentist&quot; thing after all to arrive at the atomic probabilities. Basically, he&#039;s counting possibilities: he weighs a complex proposition by the number of atomic propositions that imply it and compares this against other complex propositions.</description>
		<content:encoded><![CDATA[<p>I agree wholeheartedly on all points! However, Jaynes&#8217; book is dangerous to read before you get your PhD, because you start recognizing bs all around you, as in every second CS paper where the word &#8220;probability&#8221; comes up!</p>
<p>If you haven&#8217;t read Cox&#8217;s original exposition, it is worth a trip to the library. His little book, &#8220;The Algebra of Probable Inference&#8221;, is short and neat, maybe even easier to follow than Jaynes.</p>
<p>Anyway, I see one problem with the explanatory approach taken by Cox/Jaynes (which I started transcribing for the average Joe non-mathematician on my blog). Their discussion of the probability rules and their uniqueness in the context of logical propositions is fine and dandy, but it leaves a lingering question of where these proposition sets are supposed to come from in practice. When Jaynes explains his maximum entropy principle, it looks very much like he is doing a &#8220;frequentist&#8221; thing after all to arrive at the atomic probabilities. Basically, he&#8217;s counting possibilities: he weighs a complex proposition by the number of atomic propositions that imply it and compares this against other complex propositions.</p>
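<p>The counting picture described above fits in a few lines of Python. The two-dice setup is a hypothetical illustration, not an example taken from Cox or Jaynes. Each atomic proposition gets equal probability by indifference, and the probability of a compound proposition is then the fraction of atoms that imply it. Probabilities defined this way automatically obey the sum and product rules.</p>

```python
from fractions import Fraction
from itertools import product

# Atomic propositions: the 36 ordered outcomes of two six-sided dice,
# each given the same probability by indifference (an assumption here).
ATOMS = list(product(range(1, 7), repeat=2))


def probability(proposition):
    """Fraction of equally probable atoms for which `proposition` holds."""
    favourable = sum(1 for atom in ATOMS if proposition(atom))
    return Fraction(favourable, len(ATOMS))


def sum_is_seven(atom):
    return atom[0] + atom[1] == 7


def doubles(atom):
    return atom[0] == atom[1]


def seven_or_doubles(atom):
    return sum_is_seven(atom) or doubles(atom)


print(probability(sum_is_seven))      # 1/6
print(probability(doubles))           # 1/6
print(probability(seven_or_doubles))  # 1/3, since the two never hold together
```

<p>This is exactly the step that looks "frequentist". The counts are over logically possible atoms rather than observed repetitions, but the arithmetic is the same.</p>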
]]></content:encoded>
	</item>
</channel>
</rss>
