Jekyll
2018-10-12T15:11:12+00:00
https://proinsias.github.io/pages/proinsias/
An independent mind…
My home on the web.
Francis T. O'Donovan
francis.odonovan@gmail.com
https://proinsias.github.io/
Are categorical variables getting lost in your random forests?
2018-07-12T15:35:00+00:00
https://proinsias.github.io/pages/proinsias/Are-categorical-variables-getting-lost-in-your-random-forests
<p>I’ve been one-hot encoding categorical variables for as long as I have been using
scikit-learn.
It turns out that you can lose a lot of predictive power this way,
and that alternatives do exist.</p>
<blockquote>
<p>Decision tree models can handle categorical variables without one-hot encoding them.
However, popular implementations of decision trees (and random forests) differ
as to whether they honor this fact.
We show that one-hot encoding can seriously degrade tree-model performance.
Our primary comparison is between H2O (which honors categorical variables)
and scikit-learn (which requires them to be one-hot encoded).</p>
</blockquote>
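<p>To see why this can matter, here is a toy sketch in plain Python. It is not how H2O or scikit-learn actually search for splits; the data, the helper names, and the Gini-based scoring are my own assumptions for illustration. A split on a one-hot column can only separate one level from all the others, whereas a split on the raw categorical variable may group several levels together.</p>

```python
from itertools import combinations


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_impurity(rows, left_levels):
    """Weighted Gini impurity after sending `left_levels` to the left child."""
    left = [y for level, y in rows if level in left_levels]
    right = [y for level, y in rows if level not in left_levels]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical data: the target is 1 for levels A and C, 0 for B and D.
rows = [(level, 1 if level in "AC" else 0) for level in "ABCD" for _ in range(10)]
levels = sorted({level for level, _ in rows})

# One-hot encoding: a single split can only isolate one level from the rest.
one_hot_best = min(split_impurity(rows, {level}) for level in levels)

# Native categorical handling: a single split may send any subset of levels left.
subsets = [set(c) for r in range(1, len(levels)) for c in combinations(levels, r)]
categorical_best = min(split_impurity(rows, s) for s in subsets)

print("best one-hot split impurity:    ", one_hot_best)
print("best categorical split impurity:", categorical_best)
```

<p>In this contrived case the categorical split separates the classes in one step, while the one-hot tree needs further splits to get there.</p>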
<p>(via <a href="https://roamanalytics.com/2016/10/28/are-categorical-variables-getting-lost-in-your-random-forests/">Are categorical variables getting lost in your random forests?</a>)</p>
Using multiple worktrees with git
2017-07-12T12:06:00+00:00
https://proinsias.github.io/pages/proinsias/Using-multiple-worktrees-with-git
<blockquote>
<p>When working with multiple branches at the same time, people clone the whole
git repository again.</p>
</blockquote>
<p>I am one of these people. I have been using <code class="highlighter-rouge">git</code> for years, and I can’t believe
I hadn’t known about <code class="highlighter-rouge">git worktree</code>! It makes it easy to work on several
branches in the same repository without having to clone a new copy of the repo.</p>
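<p>As a rough sketch of the workflow, the snippet below wraps <code class="highlighter-rouge">git worktree add</code> and <code class="highlighter-rouge">git worktree list --porcelain</code> from Python. The helper names (<code class="highlighter-rouge">git</code>, <code class="highlighter-rouge">add_worktree</code>, <code class="highlighter-rouge">list_worktrees</code>) are hypothetical, and it assumes a <code class="highlighter-rouge">git</code> binary with worktree support is on your <code class="highlighter-rouge">PATH</code>.</p>

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def add_worktree(repo: Path, path: Path, branch: str) -> None:
    """Check out an existing `branch` into a second working directory at `path`."""
    git(repo, "worktree", "add", str(path), branch)


def list_worktrees(repo: Path) -> list[str]:
    """Return the directory of every worktree attached to `repo`."""
    lines = git(repo, "worktree", "list", "--porcelain").splitlines()
    return [line.split(" ", 1)[1] for line in lines if line.startswith("worktree ")]
```

<p>For example, <code class="highlighter-rouge">add_worktree(Path("repo"), Path("../repo-feature"), "feature")</code> would give you a second directory with the <code class="highlighter-rouge">feature</code> branch checked out, sharing one repository with the original checkout.</p>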
<p>(via <a href="https://stacktoheap.com/blog/2016/01/19/using-multiple-worktrees-with-git/">Using multiple worktrees with git</a>)</p>
Gathering weak npm credentials
2017-07-06T10:41:00+00:00
https://proinsias.github.io/pages/proinsias/Gathering-weak-npm-credentials
<p>We <em>all</em> know the importance of strong passwords, don’t we?</p>
<p>In case you don’t, here’s an
<a href="https://github.com/ChALkeR/notes/blob/master/Gathering-weak-npm-credentials.md">example</a>
of how a security researcher was able to
obtain direct publish access to <em>14%</em> of npm packages through some fairly
basic techniques that take advantage of poor password practices.</p>
<p>(via <a href="https://github.com/ChALkeR/">ChALkeR</a>)</p>
Canonical Correlation Analysis for Analyzing Sequences of Medical Billing Codes
2017-05-25T15:51:00+00:00
https://proinsias.github.io/pages/proinsias/CCA-for-Analyzing-Sequences-of-Medical-Billing-Codes
<p>Due to the vast number of medical billing codes, it is generally infeasible to
generate machine learning features from them as one-hot vectors.
<a href="https://arxiv.org/abs/1612.00516">This paper</a> discusses the use of
<a href="https://en.wikipedia.org/wiki/Canonical_correlation">Canonical Correlation Analysis</a>
to reduce this dimensionality and capture the inherent relationships that exist
between the codes.</p>
Explaining complex machine learning models with LIME
2017-05-01T10:48:00+00:00
https://proinsias.github.io/pages/proinsias/Explaining-complex-machine-learning-models-with-LIME
<p>Another nice
<a href="https://shiring.github.io/machine_learning/2017/04/23/lime">write-up</a>
on the use of Local Interpretable Model-Agnostic Explanations (LIME) to
explain complex machine learning models.</p>
Combating Fake News With a Smartphone
2017-03-09T10:01:00+00:00
https://proinsias.github.io/pages/proinsias/Combating-Fake-News-With-a-Smartphone
<blockquote>
<p>Automatically [add] extra digital proof data to all photos and videos you take.</p>
</blockquote>
<p>Pretty cool!</p>
<p>(via the <a href="https://guardianproject.info/2017/02/24/combating-fake-news-with-a-smartphone-proof-mode/">Guardian Project</a>)</p>
SHAttered
2017-02-23T10:43:00+00:00
https://proinsias.github.io/pages/proinsias/SHAttered
<p>Awesome work to demonstrate how to deliberately cause a SHA-1 collision.</p>
<blockquote>
<p>It is now practically possible to craft two colliding PDF files and obtain a
SHA-1 digital signature on the first PDF file which can also be abused as a
valid signature on the second PDF file.</p>
</blockquote>
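<p>The reason a collision is so damaging is that a signature typically covers the digest of a file rather than the file itself. Here is a minimal sketch using Python’s <code class="highlighter-rouge">hashlib</code>; the function name and the placeholder byte strings are my own, and the snippet does not contain the actual colliding PDFs.</p>

```python
import hashlib


def same_digest(doc_a: bytes, doc_b: bytes, algorithm: str = "sha1") -> bool:
    """True if both documents hash to the same digest under `algorithm`.

    A signature computed over that digest would then verify for either document.
    """
    digest_a = hashlib.new(algorithm, doc_a).digest()
    digest_b = hashlib.new(algorithm, doc_b).digest()
    return digest_a == digest_b


# Placeholder contents (hypothetical); ordinary inputs like these do not collide.
first = b"contents of the first PDF"
second = b"contents of the second PDF"

print("SHA-1 digests match:  ", same_digest(first, second, "sha1"))
print("SHA-256 digests match:", same_digest(first, second, "sha256"))
```

<p>If you download the two PDFs published by the SHAttered team and feed their bytes to this function, the SHA-1 comparison should come out true while the SHA-256 comparison stays false.</p>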
<p>(via <a href="https://shattered.it/">SHAttered</a>)</p>
Unlearning descriptive statistics
2017-02-10T16:54:00+00:00
https://proinsias.github.io/pages/proinsias/Unlearning-descriptive-statistics
<p>Top tips on better descriptive statistics:</p>
<ul>
<li>Instead of the mean, use the median and/or the mode.</li>
<li>Instead of the standard deviation, use the mean absolute deviation, the median absolute deviation, or the interquartile range.</li>
<li>Instead of z-scores, use percentile ranks.</li>
<li>Instead of skewness, use a QQ-plot or a histogram.</li>
<li>Instead of x standard deviations from the mean, use x median absolute deviations from the median.</li>
<li>Instead of correlation metrics, just use a scatterplot.</li>
</ul>
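<p>A few of the tips above can be tried out with nothing but Python’s standard <code class="highlighter-rouge">statistics</code> module. This is only a sketch: the sample data and the helper functions are made up for illustration, and percentile rank in particular has several competing definitions.</p>

```python
import statistics

# Hypothetical sample with one extreme outlier.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]


def median_absolute_deviation(data):
    """Median of the absolute deviations from the median."""
    center = statistics.median(data)
    return statistics.median(abs(x - center) for x in data)


def interquartile_range(data):
    """Distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


def percentile_rank(data, x):
    """Percentage of observations less than or equal to x (one common definition)."""
    return 100 * sum(v <= x for v in data) / len(data)


def mads_from_median(data, x):
    """How many median absolute deviations x lies from the median."""
    return (x - statistics.median(data)) / median_absolute_deviation(data)


print("mean:", statistics.mean(values), "median:", statistics.median(values))
print("stdev:", statistics.stdev(values), "MAD:", median_absolute_deviation(values))
print("IQR:", interquartile_range(values))
print("percentile rank of 4:", percentile_rank(values, 4))
print("100 in MADs from the median:", mads_from_median(values, 100))
```

<p>The single outlier drags the mean and standard deviation far from the bulk of the data, while the median and the median absolute deviation barely move.</p>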
<p>(via <a href="http://debrouwere.org/2017/02/01/unlearning-descriptive-statistics/">Unlearning descriptive statistics</a>)</p>
The legends of mathematics that almost never were
2017-02-07T14:06:00+00:00
https://proinsias.github.io/pages/proinsias/The-legends-of-mathematics-that-almost-never-were
<p>It’s always made me sad when people tell me they dislike mathematics.
I always wonder if it was the subject or their teachers that they disliked…</p>
<blockquote>
<p>Mathematical genius resides within every one of us. Most people just don’t know it yet. That’s because genius is fragile. If you don’t embrace genius and tend to it with care, it will slip away, leaving behind just a subdued vision of the mathematicians we could have become.</p>
</blockquote>
<p>(via <a href="https://medium.freecodecamp.com/mathematical-genius-is-fragile-society-needs-to-stop-destroying-it-5fdf3f08336e#.o72a1bds9">Mathematical genius is fragile. We need to stop destroying it.</a>)</p>
The Truth About Bad Science
2017-01-26T10:20:00+00:00
https://proinsias.github.io/pages/proinsias/The-Truth-About-Bad-Science
<p><a href="http://bit.ly/2kwWpYO">This wired.com article</a>
on ‘bad science’ speaks to me on so many levels.
It hurts my soul to see how many published studies are not reproducible.</p>
<blockquote>
<p>“A new study shows…” are “the four most dangerous words”.</p>
</blockquote>