Pandas: Display DataFrames side by side
Using HTML in a Jupyter Notebook
Pandas: Transforming two DataFrame columns into a dictionary
Using the zip command
How to Tag Docker Images with Git Commit Information
Link versions of a Docker image back to Git commits
Copy a file with progress and save hash to a different file
Using pv, tee, and sha256sum
Using a single sudo to run multiple &&-chained commands
There’s more than one way to skin a cat
Python: Best way to implement a simple queue
Implement queues using the collections.deque module
Git: Show commits in one branch but not another
Using git log
Jq: Getting all the values of an array
Different approaches to extract the information you want
Git: Create an empty commit
Using the allow-empty flag
Python: repr vs str dunder methods
Use repr() for Programmers vs str() for Users
Time downloads using sample files of various sizes
Using wget and time
Put a Christmas tree in your terminal using ctree
Celebration time!
Some useful command line tools
sponge and pee
Merge multiple JPEGs
Stack JPEGs vertically
Force python class to accept only certain attributes
Use the __slots__ attribute to limit the attributes of the class
How to display bash history without line numbers
Useful for direct copy & pasting of commands
How to find broken symlinks
Find stale symlinks optimally
Formatting text paragraphs in python
The textwrap module to the rescue!
You Should Be Using Python’s Walrus Operator
How and why to automatically use the walrus operator
Cherry Picking a Range of Commits with Git
Git’s cherry-picking syntax is easy to mess up
Show all Docker build output
Ensure all layer commands are visible
Preserve environment when using sudo
Keep your environment variables when elevating privileges
How to put backslash escape sequence into an f-string
Putting a backslash escape sequence into an f-string
Python: Implement a custom context manager
How to create your own ‘with’
Matplotlib: Determine which backend is in use
Quick one-liner to identify the backend
Shell: Identify process using a given port number
Quickly get the information you need to free a used port
Shell: Create a file backup with modification date as suffix
Useful prior to changing a configuration file
Shell: Dedup PATH variable
Perl to the rescue!
Python: Comment regular expressions
Use re.VERBOSE to your advantage
Git: Using 1Password, GPG and Git for seamless commit signing on Linux
Make your tools work for you!
Python: The divmod() function
Get the quotient and remainder of a division
Jupyter: Trust notebook from command line
Speed up your usage of notebooks
Docker: Verify docker-compose configuration
Validate your docker-compose.yml files
SQL: Select a random sample
Using ordering and limit functions
SQL: Calculate percentage of column
Using a cross join or a subselect query
Pandas: Split string column into separate columns
Using the expand option of pd.Series.str.split
Nix: Streaming gsutil transfers
Avoid landing data unnecessarily
Nix: Remove backspaces and tabs with col
Tidy up text output of various commands
Git: Clone a specific branch
Speed up your git clones
Python: Get a notification via knockknock
Add notifications to your scripts or python code
Pandas: Some notes on groupby
Groupby expert level
Nix: Make a noise!
Get a notification from your script
Website: How to create drafts in Jekyll
Save those valuable thoughts for later
Python: Use underscores as visual separators for numbers
Aid readability of numbers
Nix: Echo and sudo for files owned by root
How to modify a file owned by root
Pandas: Three new functions
Tips from Towards Data Science
Pandas: Extra pd.options.display options
More control over display of your pandas objects
Nix: Sponge soaks up standard input and writes it to a file
Easily construct pipelines that read from and write to the same file
Git: Sort branches by recency
A more useful default behavior
Git: Dynamic identity
A great example of when to use the gitconfig includeIf directive
Bash: Run entire shell script as root
Avoid prepending commands with sudo
Python: Optimize logging of expensive debugging operations
Only log if threshold is set appropriately
Python: Fix issues using PyCharm and Docker
Two troubleshooting tips
Pandas: Simplify filter expressions with between
Create more readable code
Pandas: Regular expressions with str.contains
How to enable regex flags
MacOS: How to validate your FileVault Recovery Key
Do a dry run from the Terminal
Nix: Split a file by line number
Extra options for output file names
Nix/macOS: See what a process is doing
Understand why that process is taking so long
Bash: Get source directory of a script within the script
Works no matter where the script is being called from
Nix: Convert reStructuredText to Markdown
Another use for pandoc
Mac: Log out a user from the command line
Skip the UI entirely
Bash: Escaping strings easily
Quick tip using history expansion
Python: The SimpleNamespace Utility Class
Easily add and remove attributes
Python: Get the most out of floats
Additional methods useful in various scenarios
Python: Format text paragraphs with textwrap
Wrap and fill lines
Python: Random string generation with digits and upper-case letters
Using random.SystemRandom
Nix: Find the total size of files within a directory
du command to the rescue!
Nix: Extract file extension from file name
Using variable manipulation
ICS: Providing a ‘subscribe in Google Calendar’ link for a feed
Show Google Calendar interface even with a custom feed
Python: Upgrading Homebrew packages using pip
Deep dive into homebrew python environments
GitHub Actions: Skipping a step without failing
The continue-on-error option
Nix: Stty - sane terminal settings
Fix garbled shell session
Nix: Mkfile - Create an empty file
Of any size!
Nix: Fs_usage - View File System Usage
As a continuous stream
Mac: Restart or shutdown
Using reboot and poweroff
Mac: Gatekeeper via the command line
Manage exceptions
Mac: Date and Time
From the command line
Mac: Ask user for password via GUI
Using Applescript
Python: Salted Hash
The hashlib module
Pandas: Create a DataFrame via the Clipboard
The read_clipboard function.
Pandas: Make a DataFrame with random floats
The aptly-named makeDataFrame function
Pandas: Combine Functions
The combine and combine_first functions
Sklearn: Tree diagram
Via the plot_tree command
Sklearn: Pipeline diagram
Visualize your sklearn pipelines in Jupyter
Sklearn: Column transformations
Specify which columns to apply the most appropriate preprocessing
Pandas: Pipe function
Create a method chain in pandas
Mac: Use Touch ID to Authenticate Sudo Commands
Stop typing your password so many times
Pandas: Named Aggregation
Simplify your groupbys
Mac: Remove quarantine flag from app
Using the xattr command
Mac: Software Update from the Command Line
Via the softwareupdate command
Nix: List files and group by extension
Using the X option of the ls command
Science: Light v Sound
A factor of almost exactly a million
Bitbucket/GitHub: Create pull request from command line
Via hub command or handy url
Git: Blocked ssh port
Fix this using your .ssh/config file
Git: Stash tips
Annotate or summarize your stashes, and more
MySQL: Show full query in process list
Using the show full processlist command
TSQL: Basic T-SQL
Some quick one liners
SQL: List Columns in Table
The ANSI standard
Docker: Filtered system prune
The filter option
Spark: Date Arithmetic with Multiple Columns
Using the expr function
Spark: Count number of duplicate rows
Using the groupby function
Mac: Sync with time server (Mojave)
Using the sntp command
Docker: Set Timezone
Using the zoneinfo command
Docker: Start Container as Root User
Using the -u option
Git: Work on multiple branches simultaneously
Using the git worktree command
Git: Untrack A File Without Deleting It
Using the cached flag
Git: Stashing Untracked Files
Using the untracked flag
Git: Snapshot
Using the git stash save command
Git: Stashing Only Unstaged Changes
Using the k flag
Git: Staging Stashes Interactively
A partial stash
Git: Two ways of squashing commits
Use git merge to squash a large number of commits
Git: Specify the ssh key to use
Via various configuration options
Git: Show The diffstat Summary Of A Commit
Using the git show --stat command
Git: Show All Commits For A File Beyond Renaming
Using the follow flag
Git: Interactively Unstage Changes
Using the git reset --patch command
Git: Undo a Git Mistake
Using the git reflog command
Git: Resetting A Reset
Using the git reflog command
Git: Accessing A Lost Commit
Using the git reflog command
Git: Reference A Commit Via Commit Message Pattern Matching
Using the git show command
Git: Split Up a Commit, Rewrite History
Via an interactive rebase
Git: Git Log With Authors
The author flag
Git: Git Log since
Summarizing your day’s work
Git: List Filenames Without The Diffs
The name-only flag
Git: List Different Commits Between Two Branches
The left-right option
Git: Last Commit A File Appeared In
Using the git log command
Git: Ignore Changes To A Tracked File
Using the git update-index command
Git: List Most Git Commands
Using the git help command
Git: Determine The Hash Id For A Blob
Using the git hash-object command
Git: Grep Over Commit Messages
Using the git log command
Git: LFS Track
Adding a new type of large file to your repository
Git: LFS Pull
Addressing git checkout failures
Git: LFS Prune
Tidying up your local git LFS cache
Git: Migrate LFS hosting provider
Using the git lfs command
Git: Intro to Large File Storage (LFS)
An extension of git
Git: LFS Fetch
Extra configuration
Git: LFS Clone
How LFS integrates with git
Git: Clean Up Old Remote Tracking References
Using git fetch origin --prune
Git: List All Files Changed Between Two Branches
Using git diff --name-only master
Git: Delete Remote Git Tags
Via git push
Git: Configure to Use Single Key Presses in Interactive Mode
The interactive.singlekey option
Git: Change default editor for git
The core.editor option
Git: Diffing With Patience
Using the diff.algorithm option
Git: Amend Author Of Previous Commit
With the amend author options
Git: Delete All Untracked Files
Via git clean
Git: Checkout Previous Branch
Using the hyphen shorthand
Git: Checkout Old Version Of A File
Using the appropriate hash
Git: Grab A Single File From A Stash
Using the checkout command
Git: Use a file from another branch
Using git checkout
Git: Clean Out All Local Branches
A handy one-liner
Git: Intent To Add
Using git add -N
Matplotlib: Save plot to file
Using the savefig function
Docker: ‘Could not find’ with network prune
Free up the address pool
Sklearn: Fix No Space Left on Device Error
Via the JOBLIB_TEMP_FOLDER environment variable
Docker: Attach/Detach
Interacting with containers
AWS CLI: Modify volume size
Using awscli
Travis: Skip unnecessary builds
Via the “ci skip” suffix
Matplotlib: Use LaTeX
Via the ‘text.usetex’ parameter
Jupyter: Show contents of external script
Using the pycat magic command
Jupyter: Share variables
Via the store_var magic command
Jupyter: Set environment variables
Via the env magic command
Jupyter: Get the output of all variables
Via the InteractiveShell.ast_node_interactivity configuration variable
Snowflake: Create table with comment
Be kind to future you, … and other developers
Pandas: Write DataFrame to table with to_sql
Via sqlalchemy
Pandas: sort_index
Sort a dataframe by index
Pandas: Use dtype to speed up reading with read_csv
Avoid inferring data types
Pandas: Use chunksize to iterate through files with read_csv
Handle files too large for memory
Pandas: Extract days from a timedelta64
Via the components property
Pandas: Inverse of boolean
Using the tilde operator
Pandas: Accessing additional parameters for a plot
Via the plot axes.
Pandas: Options
The get_option, reset_option, and set_option functions.
Pandas: Option Context
Temporarily change pandas options
Pandas: Speed up merges
Specify the key column of the merge as the index of your dataframes, then join instead of merge
Pandas: Get memory usage of DataFrame with info
Set the configuration option display.memory_usage
Pandas: Groupby Not As Index
Via the as_index parameter
Pandas: Iterating through groups
Accessing the name and DataFrameGroupBy
Pandas: Get rows with max value of group
Via the apply function
Pandas: DateOffset as a Frequency Increment
How to apply pandas.tseries.offsets.DateOffset
Pandas: Count number of non-NaN entries
Via the count method
Pandas: Complex aggregation expressions
Using a dictionary of aggregations
Pandas: Calculate percentile ranking relative to another column
Via the searchsorted command
CompSci: GUIDs are not strings
The string representation of a GUID should not be relevant to a program
AWS CLI: List account aliases
Using the aws iam list-account-aliases command
SQL: UNION v UNION ALL
UNION removes duplicate records, whereas UNION ALL does not
Jupyter: High-res plots
Via the InlineBackend.figure_format configuration option
Numpy: Set number of decimal places using set_printoptions
Via the precision argument
Mac: Emoji
The characters palette
Mac: Copy files intelligently with ditto
ditto is slightly more advanced than cp and can be advantageous over it for several reasons
Snowflake: GET_DDL statements
Generate a DDL statement that can be used to recreate the specified object
Git: Using multiple worktrees
Via the git worktree command
Slack: Accessing Direct Messages from Disabled Users
Via their Message Archives
Bash: Append to variable without creating leading colon if unset
Using an expansion operator
Emacs: Kill line from anywhere on that line
Using the shift-backspace command
LaTeX: Send a bit of LaTeX
Via the mathb.in service
Travis: Why is my build not running?
Using travis requests
Snowflake: Increment date and time values with dateadd
Some datetime examples
Matplotlib: Add title to collection of plots
Via the matplotlib.pyplot.suptitle command
Snowflake: Show file formats
List the file formats for which you have access privileges
Spark: Select all and add a new column
Using select
Python: Normalize text with unicodedata
Remove accents, etc.
Python: The last value printed
Using ‘_’
Python: Generate random but reproducible UUID with seed
Using the random module
Python: Time the execution of a program
Using the time module
Python: SpooledTemporaryFile
From the tempfile module
Python: Check available modules with pydoc
Find a list of all python modules installed on a machine
Python: Private variables
Follow the convention of putting two underscores at the beginning of the variable’s name
Python: Print without newline
Using the end parameter of the print function
Python: Flushing while printing
Using the flush keyword
Python: Pickle in Python2 and Python3
Control compatibility support
Python: Sort lists naturally not lexicographically with natsort
Using the natsorted function
Python: An improved tuple
The namedtuple
Python: MyPy variable annotations
New to Python 3.6
Python: Mutable default arguments
A python gotcha
Python: Lambdas as lexical closures
A function that remembers the values from the enclosing lexical scope even when the program flow is no longer in that scope
Python: Lambdas as function expressions
Defining a function inline
Python: Parallel for loops
Using joblib’s Parallel function
Python: Cache function output
Using sklearn.externals.joblib.Memory
Python: Invert a mapping
Using items()
Python: Hash a file
Using the hashlib module
Python: Function disassembler
Using the dis module
Python: Thousands Separator in Formatted Strings
Just add ‘:,’ to the format specifier
Python: Find the most common elements in an iterable
Using the collections module
Python: Specify requirements depending on python version using environment markers
For example python_version and sys.platform
Python: Difference between append and ‘+=’ for lists
An exception to the general interchangeability
Python: Deep copy a compound object
Using deepcopy
Python: Nested Comprehensions
Using two fors within a comprehension
Python: Collect garbage
Using the gc module
Python: Class inheritance
Using the issubclass command
Python: Check if string is null or empty
Using strip
Numpy: Extract days from timedelta64
Using np.timedelta64 itself
Mac: Power shortcuts
How to turn Mac on or off quickly
Spark: Calculating the length of a column with size
Using the size command
AWS CLI: Assuming a role
Using the aws config files
Matplotlib: Get current axis
Using the gca command
Matplotlib: Use logarithmic scale
Using the set_xscale and set_yscale commands
Matplotlib: Plot error as shaded region
Using the fill_between command
Matplotlib: Clearing a figure
cla v clf v close
Numpy: Return maximum of array ignoring NaNs using nanmax
Using np.nanmax
Nix: Check If A Port Is In Use
Using the lsof command
Nix: Check Ubuntu Version
Using the lsb_release command
Nix: Disk Speed Benchmark
Using the dd command
Nix: CPU Benchmark
Using the dd command
Nix: Saying Yes
Using the yes command
Nix: Max out CPU with Yes
Using the yes command
Nix: Where Are The Binaries
Using the where command
Nix: Get length of longest line with wc
Using the --max-line-length flag
Nix: Watch That Program
For example, watch ls
Nix: Monitor System Memory with vmstat
Can set a sampling period too
Nix: List contents of directories in a tree-like format
Via the tree command
Nix: Duplicate pipe content
Write to multiple files at the same time
Nix: Force cpus to run at max
Using the stress command
Nix: Sorted human readable sizes
Using the --human-numeric-sort flag with sort
Nix: Sort In Numerical Order
Using the --numeric-sort flag
Nix: See all versions of package in the archive
Using the apt-cache policy command
Nix: Search Man Page Descriptions
Using the apropos command
Nix: Search Files Specific To A Language
Using the ack command
Nix: Save matching and non matching lines with awk
Advanced awk usage
Nix: SSH With Port Forwarding
Using the -L flag
Nix: Stay connected without an interactive SSH shell
Using the -N flag
Nix: Run local scripts remotely with SSH
Via the bash command
Nix: SSH pipes
Execute half your command locally and half remotely
Nix: Killing A Frozen SSH Session
Something else to do rather than mashing the keyboard
Nix: SSH Escape Sequences
Power user commands
Nix: Reverse a String
Using the rev command
Nix: Repeat Yourself
Using the repeat command
Nix: Merge pdf files
Using the qpdf command
Nix: Monitor the progress of data through a pipe
Using the pv command
Nix: List parent pid with ps
Using the -f flag
Nix: Create A File Descriptor with Process Substitution
Make a command’s output appear to come from a file
Nix: PID Of The Current Shell
Using a special parameter of bash
Nix: Standard output to clipboard buffer
Using the pbcopy command on macOS
Nix: Open The Current Command In An Editor
Using Ctrl-x Ctrl-e
Nix: Printing with lpr
On macOS
Nix: List Of Sessions To A Machine
Using the last command
Nix: Kill Everything Running On A Certain Port
Using the lsof command
Nix: Install Packages From A Specific Repository
Using yum or apt-get
Nix: Find MAC address of network interfaces with ifconfig
Grep for the hardware address (HWaddr)
Nix: Search History
Using control-r
Nix: Pattern-matched search of your history
Quick unix tool tip
Nix: Last Argument Of The Last Command
Quick shortcut
Nix: Global Substitution On The Previous Command
Different substitution options
Nix: Hexdump A Compiled File
Using the C option
Nix: Only Show The Matches
Using the o flag
Nix: List Names Of Files With Matches
Using the l flag
Nix: Grep For Multiple Patterns
Using the e flag
Nix: Grep For Files Without A Match
Using the files-without-match flag
Nix: Command Line Length Limitations
Using getconf to retrieve standard configuration variables
Nix: Find process preventing file deletion with fuser
Lists process IDs of all processes that have one or more files open
Nix: Find Newer Files
Using the newer argument
Nix: Exclude A Directory With Find
Using the not and path arguments
Nix: Upgrading Ubuntu
Using the do-release-upgrade command
Nix: Do Not Overwrite Existing Files
Use the no-clobber option of cp
Nix: Determine The IP Address Of A Domain
Using the dig command
Nix: Curling For Headers
Using the head flag for curl
Nix: Curling With Basic Auth Credentials
Using the user flag of curl
Nix: Convert tabs to/from spaces
Using the expand/unexpand commands
Nix: Convert file contents to lower case with tr
tr is short for translate characters
Nix: Combine multiple consecutive blank lines into one
Using the squeeze-blank option
Nix: Change Default Shell For A User
Take a new shell for a spin!
Nix: Cat A File With Line Numbers
Using the number option
Emacs: List lines matching a regular expression
Using the occur command
Spark: Materializing and Unpersisting Cache
Optimize your use of Spark DataFrames
Tmux: tmux in your tmux
Use your prefix twice to access inner tmux instance
Tmux: Adjusting Window Pane Size
Using the resize-pane command
Tmux: Rename The Current Session
Using $
Tmux: Pane Killer
Using x
Tmux: Paging Up And Down
Using control-u and control-d
Tmux: Open New Window With A Specific Directory
Using new-window
Tmux: Create A New Session In A New Server
Using the -L flag
Tmux: Create A Named tmux Session
Using the new command
Tmux: List Sessions
Using the ls or list-sessions command
Tmux: List All Key Bindings
Using list-keys
Tmux: Kill The Current Session
Using kill-session
SQL: Word Count for a Column
Using array_length and regexp_split_to_array
SQL: Cleanup Databases
Using drop database
SQL: Day Of Week For A Date
Using date_part
SQL: Day Of Week By Name For A Date
Using to_char
SQL: Count Records By Type
Using group by
PSQL: Checking The Type Of A Value
Using the pg_typeof function
PSQL: Terminating A Connection
Using the pg_terminate_backend function
PSQL: List Connections To A Database
Using the pg_stat_activity view
PSQL: Sleeping
Using the pg_sleep function
PSQL: Get The Size Of An Index
Using the pg_size_pretty and pg_relation_size functions
PSQL: Restore a database
Using the pg_restore utility
PSQL: Get The Size Of A Table
Using the pg_relation_size function
PSQL: Dump a database
Using the pg_dump utility
PSQL: Get The Size Of A Database
Using the pg_database_size function
PSQL: Change The Current Directory
The cd meta-command
PSQL: Auto Expanded Display
Intelligently display results vertically or horizontally
PSQL: Sets With The Values Command
Create single or even multiple columns of values
PSQL: Who Is The Current User
The current_user variable
PSQL: Use Argument Indexes
Stop repeating yourself
PSQL: Use unlogged tables for caches
Trade crash-safety for speed
PSQL: Types By Category
Array, Boolean, String, Numeric, Composite, etc.
PSQL: Truncate Tables With Dependents
Truncate in pairs or via a cascade
PSQL: Truncate All Rows
Use truncate rather than delete
PSQL: Turn Timing On
Via the timing command
PSQL: List All Rows In A Table
The table command
PSQL: Limit Execution Time Of Statements
Set a hard timeout
PSQL: Special Math Operators
Factorial, square root, absolute value operators
PSQL: Find The Location Of Postgres Config Files
Via show config_file
PSQL: Find The Data Directory
Via show data_directory
PSQL: Configure The Timezone
Via show/set timezone
PSQL: Set A Seed For The Random Number Generator
Allow for reproducibility
PSQL: Send A Command To psql
Execute SQL from the command line
PSQL: Use a psqlrc File For Common Settings
Launch PSQL with a custom configuration
PSQL: A Better Null Display Character
Update the default null
PSQL: String Contains Another String
Via the position function
PSQL: Salt And Hash A Password With pgcrypto
Via the crypt and gen_salt functions
PSQL: Generating UUIDs With pgcrypto
Avoid the OSSP UUID library
PSQL: Compute Hashes With pgcrypto
md5, sha1, sha224, sha256, sha384 and sha512
PSQL: List Various Kinds Of Objects
Useful meta-commands
PSQL: List Database Users
Using the du command
PSQL: List Database Objects With Disk Usage
The dt command
PSQL: List All The Databases
Using the list command
PSQL: Insert Just The Defaults
Using the ‘default values’ clause
PSQL: Generate Series Of Numbers
Using the generate_series function
PSQL: Export Query Results To A CSV
Using the copy function
PSQL: Clear The Screen In psql
Via the clear shell command
PSQL: Storing Emails With citext
Ignore case in email addresses
PSQL: Getting A Slice Of An Array
Using brackets
PSQL: Defining Arrays
With one or two dimensions
PSQL: Renaming A Table
Using alter table
PSQL: Restart A Sequence
Using alter sequence
PSQL: Determining The Age Of Things
The aptly-named age function
Mac: Resizing Both Corners Of A Window
Using the option key
Homebrew: Switch Versions of a Brew Formula
Pick between installed versions
GitHub: Link to headers in READMEs
Quick way to generate a table of contents
GitHub: Exclude Whitespace Changes From GitHub Diffs
Just add w=1 to the diff URL
GitHub: Add Emoji To GitHub Repository Description
A workaround for the limited unicode set
Bash: Partial String Matching In Bash Scripts
Using a wildcard
Bash: Directional Commands
Move the cursor without the arrow keys
Bash: Jump To The Ends Of Your Shell History
The meta key to the rescue
Nix: The cron schedule expression editor
Quick way of getting the syntax right
Declare your python dependencies within your Jupyter notebook
Reproducible workflows are simplified with tools like Nix for shell scripts and juv for Jupyter notebooks, enabling dependency declarations directly within s...
Why you should really prepare for your one-on-ones
Maximize the impact of your 1-on-1 meetings by preparing thoroughly, not just with your direct reports but also with your managers, to boost both job perform...
The Most Harmful Inventor in History
It is difficult to surpass the magnitude of the damage caused by two particular inventions, and both were created by the same man
‘Fun’ asteroid simulator
Where do you want to take out?
Pets v Cattle: Making a personal disaster recovery plan
When disaster strikes, how quickly will you recover?
100 Years of Data Visualization – It’s Time to Stop Making the Same Mistakes
Data viz tips from a 1914 book we can still learn from
Helping people online
How to overcome the communication limitations of the internet and actually help people
The evolution of US gun violence
Our acceptance of violence today stands in striking contrast to Americans’ horror at the 1929 Valentine’s Day Massacre
The social contract of open source
Please be kind to your open source maintainers
Entropy Explained, With Sheep
You won’t fall asleep with this!
The true meaning of work-life balance
Hug your kids
How to talk to children
Fred Rogers’ tips
Debunk Flat Earthers
Equipped with cardboard
How to create a healthy society
Move fast and break things is an abomination if your goal is to create a healthy society
How to Use Astrophysics to Solve Earthbound Problems
Cross-pollination at its best
How to be Black
A satirical guide to race issues
The Importance of a Cup of Tea
A discriminating palate leads to novel rigorous statistical methods
Are categorical variables getting lost in your random forests?
Comparing two approaches
Gathering weak npm credentials
Taking advantage of poor password practices
Canonical Correlation Analysis for Analyzing Sequences of Medical Billing Codes
Addressing the high dimensionality of these codes
Explaining complex machine learning models with LIME
Using Local Interpretable Model-Agnostic Explanations
SHAttered
Deliberately cause a SHA-1 collision
The legends of mathematics that almost never were
Mathematical genius resides within every one of us
The Truth About Bad Science
Many published studies are not reproducible!
Why Should I Trust You?: Explaining the Predictions of Any Classifier
People don’t trust black-box models
Death by Diagnosis on Freakonomics
If we don’t measure patient outcomes, how do we know how well our healthcare is doing?!
Making machine learning models interpretable
From the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
The relativity of raw data
Why provenance of data is important
The Friendliest of Fire
An American pilot shoots down an American plane
Hello World!
My first post
Why You’re Not Getting Value from Your Data Science
If companies want to get value from their data, they need to focus on accelerating human understanding of data, scaling the number of modeling questions they...
List only untracked files
Using ls-files
Articles from arXiv.org as responsive HTML5 web pages
One character makes a big difference
Update your command line tools
Modern versions of common vintage tools
Absurd trolley problems
Playful thought experiments
Starting robust, reliable, and maintainable bash scripts
A useful header for bash scripts to avoid common bugs
Show a zoomable world map in your terminal
Browse the world from your command line
Nix: Play with jq with jq-play
Test run your jq filters
Match arguments to help text
Better understand your shell commands
Jupyter: View notebook in terminal
Via an open-source command line tool
HTTP: Simulate unexpected network conditions with deelay.me
Another webpage to add to your toolkit
Finding nulls in DataFrames
Handy one liners I use all the time
Machine Learning for Product Managers
And those who communicate with them
Communicating Data Science with impact
The difference between a junior and senior Data Scientist
Model error quantification
Visualize the cross-validation results
Data problems
9 things to check with a new data set.
Probabilistic interpretation of AUC
How to explain it to a layperson
How to use multiprocessing with pandas
Using the aptly named multiprocessing module
The Modal American
The most common values
Combating Fake News With a Smartphone
The Guardian Project
Spark: Orderby Partitioning
The spark.sql.shuffle.partitions configuration option
Unlearning descriptive statistics
Top tips
The Data Visualisation Catalogue
Get inspired to visualize!
Minnow Telescope Finds Massive Planet
Since ancient times, mankind has studied the sky and wondered what the ‘wandering stars’ (planets) might be. In the last two decades, we have found hundreds ...
The Little Telescope that Could
In the shadow of the ‘Big Eye’, this is the Little Telescope That Could…