<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>mikeash.com pyblog/friday-qa-2018-04-27-generating-text-with-markov-chains-in-swift.html comments</title><link>http://www.mikeash.com/?page=pyblog/friday-qa-2018-04-27-generating-text-with-markov-chains-in-swift.html#comments</link><description>mikeash.com Recent Comments</description><lastBuildDate>Sat, 06 Jun 2026 20:00:41 GMT</lastBuildDate><generator>PyRSS2Gen-1.0.0</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Chris C. - 2018-06-13 17:11:02</title><link>http://www.mikeash.com/?page=pyblog/friday-qa-2018-04-27-generating-text-with-markov-chains-in-swift.html#comments</link><description>NLTokenizer in the iOS 12 and macOS 10.14 betas looks really useful for something like this.
&lt;br /&gt;
&lt;br /&gt;&lt;a href="https://developer.apple.com/documentation/naturallanguage/nltokenizer?changes=latest_minor"&gt;https://developer.apple.com/documentation/naturallanguage/nltokenizer?changes=latest_minor&lt;/a&gt;</description><guid isPermaLink="true">53695090144d50676dab9922aa4f9b9e</guid><pubDate>Wed, 13 Jun 2018 17:11:02 GMT</pubDate></item><item><title>Noah - 2018-04-30 19:04:39</title><link>http://www.mikeash.com/?page=pyblog/friday-qa-2018-04-27-generating-text-with-markov-chains-in-swift.html#comments</link><description>Could a method like this also be used for forensics - determining if a given body of text was written by a suspect?
&lt;br /&gt;
&lt;br /&gt;For example, if you had a large body of writing from User_123 and used it to populate the probabilities of a Markov chain, you would expect another large body of writing from the same user to produce similar probabilities. Given a body of writing from an unknown user, you could then compare the two sets of probabilities and estimate whether User_123 wrote both. I imagine this approach would have quite large error bars, though, and would become more error-prone as the bodies of writing got smaller.
&lt;br /&gt;
&lt;br /&gt;I'm also guessing the accuracy of this approach would depend on the topic of the writing. For instance, a Markov chain generated from your blog would probably be very different from one generated from your text messages to family members, which likely contain fewer references to ARC and APIs.
&lt;br /&gt;
&lt;br /&gt;In any case, very interesting write-up!</description><guid isPermaLink="true">5646d1bc15083be4be8906594e4854ad</guid><pubDate>Mon, 30 Apr 2018 19:04:39 GMT</pubDate></item></channel></rss>
