Chris C. - 2018-06-13 17:11:02

Wed, 13 Jun 2018 17:11:02 GMT

NLTokenizer, new in the iOS 12 and macOS 10.14 betas, looks really useful for something like this.

https://developer.apple.com/documentation/naturallanguage/nltokenizer?changes=latest_minor
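
For reference, a minimal sketch of splitting text into words with NLTokenizer, which could feed a word-based Markov chain. It assumes an Apple platform where the NaturalLanguage framework is available (iOS 12 / macOS 10.14 or later); the helper name `wordTokens(in:)` is made up for illustration and is not part of the framework or the article.

```swift
import NaturalLanguage

/// Hypothetical helper: returns the word tokens of `text` as strings.
@available(iOS 12.0, macOS 10.14, *)
func wordTokens(in text: String) -> [String] {
    // Ask the tokenizer to split on word boundaries.
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    // tokens(for:) returns the ranges of the tokens inside `text`.
    let ranges = tokenizer.tokens(for: text.startIndex..<text.endIndex)
    return ranges.map { String(text[$0]) }
}
```

The resulting array can be used anywhere a simple whitespace split would otherwise supply the words.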

Noah - 2018-04-30 19:04:39

Mon, 30 Apr 2018 19:04:39 GMT

Could a method like this also be used for forensics, to determine whether a given body of text was written by a suspect?

For example, if you had a large body of writing from User_123 and used it to populate the probabilities of a Markov chain, you would expect another large body of writing from the same user to generate similar probabilities. So given a body of writing from an unknown user, you could compare the two sets of probabilities to judge whether User_123 wrote both. I imagine this approach would have quite large error bars, though, and would become more error-prone as the bodies of writing get smaller.
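
A rough sketch of that comparison, assuming a word-level chain with one word of context. The names `MarkovModel`, `buildModel(from:)`, and `distance(_:_:)` are invented here, and the mean total variation distance is just one plausible way to compare the probabilities, not an established forensic measure.

```swift
import Foundation

/// Transition probabilities: word -> (next word -> P(next | word)).
typealias MarkovModel = [String: [String: Double]]

/// Lowercases the text and splits it on anything that is not a letter or digit.
func words(in text: String) -> [String] {
    return text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
}

/// Counts word -> next-word transitions and normalizes them to probabilities.
func buildModel(from text: String) -> MarkovModel {
    let tokens = words(in: text)
    var counts: [String: [String: Int]] = [:]
    for (current, next) in zip(tokens, tokens.dropFirst()) {
        counts[current, default: [:]][next, default: 0] += 1
    }
    return counts.mapValues { followers in
        let total = Double(followers.values.reduce(0, +))
        return followers.mapValues { Double($0) / total }
    }
}

/// Mean total variation distance over the words that have followers in both
/// models: 0 means identical next-word distributions, 1 means no overlap.
/// Returns nil when the models share no such words.
func distance(_ a: MarkovModel, _ b: MarkovModel) -> Double? {
    let shared = Set(a.keys).intersection(b.keys)
    guard !shared.isEmpty else { return nil }
    var sum = 0.0
    for word in shared {
        let pa = a[word]!, pb = b[word]!
        let nexts = Set(pa.keys).union(pb.keys)
        let l1 = nexts.reduce(0.0) { $0 + abs((pa[$1] ?? 0) - (pb[$1] ?? 0)) }
        sum += l1 / 2
    }
    return sum / Double(shared.count)
}
```

A lower distance between the model built from User_123's known writing and the model built from the unknown text would count as weak evidence of common authorship. With small samples most words are seen only a handful of times, so the estimated probabilities are noisy and the distance says little.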

I'm also guessing the accuracy of the above approach would depend on the topic of the writing. For instance, the Markov chain generated from your blog would probably be very different from one generated from your text messages to family members, which presumably contain fewer references to ARC and APIs.

In any case, very interesting write-up!

mikeash.com pyblog/friday-qa-2018-04-27-generating-text-with-markov-chains-in-swift.html comments
