Posts by Tag

til

(TIL) Python: Salted Hash

less than 1 minute read

The salt is just a randomly derived bit of data that you prefix or postfix your data with to dramatically increase the complexity of a dictionary atta...
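A minimal sketch of the idea, assuming SHA-256 from the standard library and a 16-byte random salt. The helper name `salted_hash` is hypothetical and does not come from the original post, and for real password storage a dedicated key-derivation function such as `hashlib.pbkdf2_hmac` is generally a better fit.

```python
import hashlib
import os
from typing import Optional, Tuple


def salted_hash(data: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Prefix data with a salt and return (salt, hex digest). Hypothetical helper."""
    if salt is None:
        salt = os.urandom(16)  # 16 random bytes; the size is an assumption
    digest = hashlib.sha256(salt + data).hexdigest()
    return salt, digest


salt, digest = salted_hash(b"correct horse")

# Re-hashing the same input with the stored salt reproduces the digest.
same_salt_digest = salted_hash(b"correct horse", salt)[1]
```

The salt has to be stored next to the digest, since verification needs it to recompute the hash.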

(TIL) Pandas: Make Data Frame

less than 1 minute read

pandas has a built-in function makeDataFrame() to return a DataFrame containing random floats. Note that this is using the private API, and the exact details...

(TIL) Pandas: Read Clipboard

less than 1 minute read

The pandas.read_clipboard() method is as simple as it sounds: it reads copy-pasted tabular data and parses it into a Data Frame. For instance, try running...

(TIL) Pandas: Pipe function

less than 1 minute read

Pandas introduced pipe() starting from version 0.16.2. pipe() enables user-defined methods in method chains.
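As a rough illustration of chaining user-defined functions with pipe(), assuming pandas is installed; the functions `add_total` and `keep_above` and the column names are hypothetical:

```python
import pandas as pd


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: add a row-wise total column."""
    return df.assign(total=df["a"] + df["b"])


def keep_above(df: pd.DataFrame, column: str, threshold: float) -> pd.DataFrame:
    """Hypothetical step: keep rows where `column` exceeds `threshold`."""
    return df[df[column] > threshold]


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# pipe() passes the frame as the first argument; extra arguments follow it.
result = df.pipe(add_total).pipe(keep_above, column="total", threshold=15)
```

Without pipe(), the same chain would read inside-out as `keep_above(add_total(df), "total", 15)`.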

(TIL) Sklearn: Tree diagram

less than 1 minute read

The plot_tree() function allows you to create a diagram of steps present in a decision tree model:

(TIL) Sklearn: Pipeline diagram

less than 1 minute read

Estimators can be displayed with an HTML representation when shown in a Jupyter notebook. This can be useful to diagnose or visualize a Pipeline with many ...

(TIL) Sklearn: Column transformations

less than 1 minute read

Scikit-learn provides a class called ColumnTransformer that allows you to easily specify which columns to apply the most appropriate preproces...

(TIL) Pandas: Named Aggregation

1 minute read

pandas>=0.25 supports named aggregation, allowing you to specify the output column names when you aggregate a groupby, instead of renaming. This will be e...
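A small sketch of named aggregation, assuming pandas 0.25 or later; the data and the output column names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)

# Each keyword names an output column; the tuple is (input column, aggregation).
summary = df.groupby("kind").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
    mean_weight=("weight", "mean"),
)
```

The result already carries the chosen column names, so no rename step is needed afterwards.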

(TIL) Mac: Remove quarantine flag from app

less than 1 minute read

I encountered an issue with my favorite macOS Markdown editor MacDown where macOS Catalina was reporting the file as damaged. It turns out that Catalina has ...

(TIL) Science: Light v Sound

less than 1 minute read

Roughly: Light travels about a foot per nanosecond, sound travels about a foot per millisecond. A factor of almost exactly a million. — Colin Wright ...

(TIL) Git: Blocked ssh port

less than 1 minute read

I mostly clone GitHub and Bitbucket repositories using SSH URLs, so that I can protect this access with an SSH private/public keypair. Unfortunately, some fi...

(TIL) Git: Stash tips

1 minute read

I’m a big fan of using git stash to shelve some changes in my repository so that I can move on to some other task. Here are some advanced git stash commands ...

(TIL) Spark: Count number of duplicate rows

less than 1 minute read

To count the number of duplicate rows in a pyspark DataFrame, you want to groupBy() all the columns and count(), then select the sum of the counts for the ro...

(TIL) Docker: Set Timezone

less than 1 minute read

To set which timezone your docker container should use, add the following to your Dockerfile:

(TIL) Git: Stashing Untracked Files

less than 1 minute read

Normally when stashing changes, using git stash, git is only going to stash changes to tracked files. If there are any new files in your project that aren’t ...

(TIL) Git: Snapshot

less than 1 minute read

To save a snapshot of your current work in git, try this command:

(TIL) Git: Stashing Only Unstaged Changes

less than 1 minute read

If you have both staged and unstaged changes in your project, you can perform a stash on just the unstaged ones by using the -k flag. The staged changes will...

(TIL) Git: Two ways of squashing commits

less than 1 minute read

It is handy to squash down your commits before merging your PR with my-new-cool-feature. You can either squash them down by doing an interactive rebase like ...

(TIL) Git: Interactively Unstage Changes

less than 1 minute read

I often use git add --patch to interactively stage changes for a commit. Git takes me through changes to tracked files piece by piece to check if I want to s...

(TIL) Git: Undo a Git Mistake

less than 1 minute read

git reflog is a record of your actions in Git. With this command, you can undo almost any Git mistake.

(TIL) Git: Resetting A Reset

less than 1 minute read

Sometimes we run commands like git reset --hard HEAD~ when we shouldn’t have. We wish we could undo what we’ve done, but the commit we’ve reset is gone forev...

(TIL) Git: Accessing A Lost Commit

less than 1 minute read

If you have lost track of a recent commit (perhaps you did a reset), you can generally still get it back. Run git reflog and look through the output to see i...

(TIL) Git: Git Log With Authors

less than 1 minute read

In my never-ending quest to better summarize my work at the end of the day using computers, I discovered today the Git --author flag. It works like this:

(TIL) Git: Git Log since

less than 1 minute read

At the end of each day, I try to record what I did, to jog my memory during the next morning’s standup. This is a helpful aid:

(TIL) Git: List Filenames Without The Diffs

less than 1 minute read

The git show command will list all changes for a given reference including the diffs. With diffs included, this can get rather verbose at times. If you just ...

(TIL) Git: Last Commit A File Appeared In

less than 1 minute read

In my project, I have a README.md file that I haven’t modified in a while. I’d like to take a look at the last commit that modified it. The git log command c...

(TIL) Git: LFS Track

less than 1 minute read

When you add a new type of large file to your repository, you’ll need to tell Git LFS to track it by specifying a pattern using the git lfs track command:

(TIL) Git: LFS Pull

less than 1 minute read

You can pull from a Git LFS repository using a normal git pull. No explicit commands are needed to retrieve Git LFS content. However, if the checkout fails f...

(TIL) Git: LFS Prune

1 minute read

You can delete files from your local Git LFS cache with the git lfs prune command. This will delete any local Git LFS files that are considered ‘old’. An old...

(TIL) Git: Migrate LFS hosting provider

less than 1 minute read

To migrate a Git LFS repository from one hosting provider to another, you can use a combination of git lfs fetch and git lfs push with the --all option speci...

(TIL) Git: LFS Fetch

1 minute read

Git LFS typically only downloads the files needed for commits that you actually check out locally. However, you can force Git LFS to download extra content fo...

(TIL) Git: LFS Clone

less than 1 minute read

Once Git LFS is installed, you can clone a Git LFS repository as normal using git clone. At the end of the cloning process Git will check out the default bran...

(TIL) Git: Delete Remote Git Tags

less than 1 minute read

Tagging releases with Git is a good idea. In case your tags get off track, here is how you delete a Git tag locally and on a remote:

(TIL) Git: Diffing With Patience

less than 1 minute read

The default diff algorithm used by Git is pretty good, but it can get misled by larger, complex changesets. The result is a noisier, misaligned diff output.

(TIL) Git: Delete All Untracked Files

less than 1 minute read

Git provides a command explicitly intended for cleaning up (read: removing) untracked files from a local copy of a repository.

(TIL) Git: Checkout Old Version Of A File

less than 1 minute read

When you want to return to a past version of a file, you can reset to a past commit. When you don’t want to abandon a bunch of other changes, this isn’t goin...

(TIL) Git: Use a file from another branch

less than 1 minute read

Sometimes you just need one file from another branch. Sure you could git cherry-pick but then you’re dealing with commits. That sort of thing gets sticky fas...

(TIL) Git: Clean Out All Local Branches

less than 1 minute read

Sometimes a project can get to a point where there are so many local branches that deleting them one by one is too tedious. This one-liner can help:

(TIL) Git: Intent To Add

less than 1 minute read

Git commands like git diff and git add --patch are awesome, but their little caveat is that they only work on files that are currently tracked in the reposit...

(TIL) Stats: Better descriptive statistics

less than 1 minute read

Instead of the mean, use the median and/or the mode. Instead of the standard deviation, use the mean absolute deviation, the median absolute deviation, ...

(TIL) Docker: Attach/Detach

less than 1 minute read

To detach from a container, you hold Ctrl and press P, then Q. This only works if the container was started with both -t and -i.

(TIL) Travis: Skip unnecessary builds

less than 1 minute read

Especially when you’re working with a large team with multiple Travis-enabled repositories, you’ll want to avoid running any unnecessary builds. The most ...

(TIL) Jupyter: Output of all variables

less than 1 minute read

If you try to see the output of several variables without explicitly writing print in front of each, only the last one is displayed. With this, you get the...

(TIL) Pandas: sort_index

less than 1 minute read

DataFrames have a new sort_index method to sort a DataFrame by index. This is equivalent to the deprecated sort method with the columns argument set to None.

(TIL) Pandas: Options

less than 1 minute read

You can use the following functions to interact with the options in pandas:

(TIL) Pandas: Option Context

less than 1 minute read

If you want to temporarily change pandas options, instead of doing so manually as follows:

(TIL) Pandas: Speed up merges

less than 1 minute read

You can improve the speed of a merge by first specifying the key column of the merge as the index of your dataframes, and then using join instead of merge:
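The pattern described above might look like the sketch below, assuming pandas is installed; the frames and column names are made up, and whether the join is actually faster depends on the data and the pandas version:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": [20.0, 30.0, 40.0]})

# Baseline: merge on the key column.
merged = left.merge(right, on="key", how="inner")

# Alternative: make the key the index of both frames, then join on the index.
joined = left.set_index("key").join(right.set_index("key"), how="inner")
```

Both results hold the same rows; the joined frame keeps `key` as its index rather than as a column.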

(TIL) CompSci: GUIDs are not strings

less than 1 minute read

GUIDs are not strings. They are numbers. We render them as strings for readability. We should not process them as strings. We should not pass them around ...

(TIL) AWS CLI: List account aliases

less than 1 minute read

If you want the URL for your sign-in page to contain your company name (or other friendly identifier) instead of your AWS account ID, you can create an al...

(TIL) Mac: Emoji

less than 1 minute read

Press Command-Control-Space to launch the character palette. Then click on the Emoji icon in the sidebar on the left side of the Character window. You c...

(TIL) Travis: Why is my build not running?

less than 1 minute read

Sometimes you push to Travis CI and there is no new build. What should you do in such a case? Has Travis CI got your commits? Is the branch you were using disabled?...

(TIL) Mac: Power shortcuts

less than 1 minute read

Control-Command-Power/Eject will reboot the Mac instantly. Command-Option-Control-Power/Eject will shut it down. Command-Shift-Q will log off. Shift...

(TIL) Python: SpooledTemporaryFile

less than 1 minute read

The tempfile.SpooledTemporaryFile function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds the pa...
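A short sketch of the spooling behaviour using the standard library's `max_size` argument; the 1024-byte threshold is an arbitrary choice for illustration:

```python
import tempfile

with tempfile.SpooledTemporaryFile(max_size=1024) as spool:
    spool.write(b"x" * 100)   # below max_size: data is held in memory
    spool.write(b"y" * 2000)  # total now exceeds max_size: rolled over to disk
    spool.seek(0)
    data = spool.read()       # reads work the same before and after rollover
```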

(TIL) Python: Private variables

less than 1 minute read

To make Python treat a variable as pseudo-private, follow the convention of putting two underscores (i.e., __) at the beginning of the variable’s name, e.g.:
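A minimal sketch of the convention; the `Account` class is a made-up example. The double underscore triggers name mangling, which hides the attribute under a different name but does not truly protect it:

```python
class Account:
    def __init__(self, balance: int) -> None:
        # Inside the class body, __balance is stored as _Account__balance.
        self.__balance = balance

    def balance(self) -> int:
        return self.__balance


acct = Account(100)

# Outside the class, the unmangled name does not exist on the instance.
try:
    acct.__balance
    reachable = True
except AttributeError:
    reachable = False

# The mangled name is still reachable, hence "pseudo-private".
mangled = acct._Account__balance
```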

(TIL) Python: Flushing while printing

less than 1 minute read

Since Python 3.3, you can force the normal print() function to flush without the need to use sys.stdout.flush(); just set the flush keyword argument to Tr...

(TIL) Python: Pickle in Python2 and Python3

less than 1 minute read

The Python 3 pickle.load() function has optional keyword arguments that can be used to control compatibility support for pickle streams generated by Python 2:

(TIL) Python: An improved tuple

1 minute read

A downside of plain tuples is that the data you store in them can only be pulled out by accessing it through integer indexes. You can’t give names to individ...

(TIL) Python: Lambdas as lexical closures

less than 1 minute read

A ‘lexical closure’ is a fancy name for a function that remembers the values from the enclosing lexical scope even when the program flow is no longer in that...
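One way to picture this, using a hypothetical `make_adder` factory: the lambda keeps access to `n` after the enclosing function has returned.

```python
def make_adder(n):
    # The returned lambda closes over n from this enclosing scope.
    return lambda x: x + n


add_three = make_adder(3)
add_ten = make_adder(10)

# Each lambda remembers the n it was created with.
results = (add_three(4), add_ten(4))
```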

(TIL) Python: Parallel for loops

1 minute read

Joblib provides a simple helper class to write parallel for loops using multiprocessing. The core idea is to write the code to be executed as a generator ...

(TIL) Python: Cache function output

1 minute read

Joblib traces parameters passed to a function, and if the function has been called with the same parameters it returns the return value cached on a disk.

(TIL) Python: Deep copy a compound object

2 minute read

Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain muta...
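A small sketch contrasting plain assignment, a shallow copy, and a deep copy with the standard `copy` module; the nested list is made up for illustration:

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                # same object, just a second name
shallow = copy.copy(original)   # new outer list, inner lists still shared
deep = copy.deepcopy(original)  # new outer list and new inner lists

# Mutate an inner list through the original name.
original[0].append(99)
```

After the mutation, `alias` and `shallow` both see the appended value, while `deep` is unaffected.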

(TIL) Python: Collect garbage

less than 1 minute read

If you have a variable with a large memory footprint, you can force garbage collection using the gc Garbage Collector module:
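A minimal sketch, assuming CPython; the large list is a stand-in for whatever object takes up the memory. In CPython, dropping the last reference usually frees an object right away, and `gc.collect()` additionally cleans up objects caught in reference cycles.

```python
import gc

big = [0] * 1_000_000  # stand-in for a variable with a large footprint

del big  # drop the reference so the object becomes unreachable

# Run a full collection; the return value is the number of
# unreachable objects the collector found.
unreachable = gc.collect()
```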

(TIL) AWS CLI: Assuming a role

less than 1 minute read

You can configure the AWS Command Line Interface to use a role by creating a profile for the role in the ~/.aws/config file. The following example shows a ro...

(TIL) Matplotlib: Get current axis

less than 1 minute read

matplotlib.pyplot.gca(**kwargs) gets the current Axes instance on the current figure matching the given keyword args, or creates one.

(TIL) Nix: Check If A Port Is In Use

less than 1 minute read

The lsof command is used to list open files. This includes listing network connections. This means I can check if a particular port is in use and what proces...

(TIL) Nix: CPU Benchmark

less than 1 minute read

dd in conjunction with any stream-processing CPU-intensive program can be used as a simple CPU benchmark!

(TIL) Nix: Saying Yes

less than 1 minute read

Tired of being prompted for confirmation by command-line utilities? Wish you could blindly respond ‘yes’ to whatever it is they are bugging you about? The ye...

(TIL) Nix: Watch That Program

less than 1 minute read

Have you ever been working in the terminal and found yourself repeating the same command many times? Delegate that work to the computer.

(TIL) Nix: Duplicate pipe content

less than 1 minute read

To duplicate the content while piping you can use the tee utility. One straightforward and useful example is that tee can be used to write to multiple fil...

(TIL) Nix: Sort In Numerical Order

less than 1 minute read

By default, the sort command will sort things alphabetically. If you have numerical input though, you may want a numerical sort. This is what the -n flag is ...

(TIL) Nix: Search Man Page Descriptions

less than 1 minute read

You can use the apropos command with a keyword argument to search for that word's occurrence throughout all the man pages on your system. For instance, invoki...

(TIL) Nix: SSH pipes

less than 1 minute read

One of the benefits of piping is that you can use it over networks and it does wonders for data transfer. Note that half of the command is executed locally, ...

(TIL) Nix: Killing A Frozen SSH Session

less than 1 minute read

Whenever an SSH session freezes, I usually mash the keyboard in desperation and then kill the terminal session. This can be avoided though. SSH will listen f...

(TIL) Nix: List parent pid with ps

less than 1 minute read

The ps command, which stands for process status, is a great way to find different processes running on a machine. Information like their pid (process id) is ...

(TIL) Nix: Printing with lpr

less than 1 minute read

Recently while trying to fix a printer I used lpr a bunch of times. It’s not exactly new to me, but never fails to surprise people when I use it.

(TIL) Nix: Search History

less than 1 minute read

Oftentimes there is a very specific command you have entered into your bash prompt that you need to run again. You don’t want to have to type it again and s...

(TIL) Nix: Last Argument Of The Last Command

less than 1 minute read

You can use !$ as a way to reference the last argument in the last command. This makes for an easy shortcut when you want to switch out commands for the same...

(TIL) Nix: Hexdump A Compiled File

2 minute read

The hexdump unix utility allows you to dump the contents of a compiled/executable file in a _readable_ hexadecimal format. Adding the -C flag includes a side...

(TIL) Nix: Only Show The Matches

less than 1 minute read

Tools like grep, ack, and ag make it easy to search for lines in a file that contain certain text and patterns. They all come with the -o flag which tells th...

(TIL) Nix: List Names Of Files With Matches

less than 1 minute read

I often use grep and ag to search for patterns in a group or directory of files. Generally I am interested in looking at the matching lines themselves. Howev...

(TIL) Nix: Grep For Multiple Patterns

less than 1 minute read

You can use the -e flag with the grep command to search for a pattern. Additionally, you can use multiple -e flags to search for multiple patterns. For insta...

(TIL) Nix: Grep For Files Without A Match

less than 1 minute read

The grep command is generally used to find files whose contents match a pattern. With the -L (--files-without-match) flag, grep can be used to find files tha...

(TIL) Nix: Find Newer Files

less than 1 minute read

Use the -newer flag with the name of a file to find files that have a newer modification date than the named file.

(TIL) Nix: Exclude A Directory With Find

less than 1 minute read

Using find is a handy way to track down files that meet certain criteria. However, if there are directories full of irrelevant files, you may end up with a l...

(TIL) Nix: Upgrading Ubuntu

less than 1 minute read

I recently discovered that my Linode box was running a fairly old version of Ubuntu. Because it is a remote box that I SSH into, there is no graphical user i...

(TIL) Nix: Curling For Headers

less than 1 minute read

If you want to inspect the headers of a response from some endpoint, look no further than a quick curl command. By including the -I flag, curl will return ju...

(TIL) Nix: Convert tabs to/from spaces

less than 1 minute read

The command expand in GNU coreutils converts tabs in each input file to spaces. The command unexpand does the reverse, converting spaces in each input file ...

(TIL) Nix: Change Default Shell For A User

less than 1 minute read

You can change the default shell program for a particular unix user with the chsh command. Just tell it what shell program you want to use (e.g. bash or zsh)...

(TIL) Spark: Orderby Partitioning

less than 1 minute read

Remember that orderBy uses the number of partitions specified by spark.conf.get("spark.sql.shuffle.partitions"). The default for this is 200. Can change manu...

(TIL) Tmux: tmux in your tmux

less than 1 minute read

If you are running tmux locally and you shell into another machine to access tmux remotely, you will suddenly find yourself in tmux inception. You will have ...

(TIL) Tmux: Adjusting Window Pane Size

less than 1 minute read

In tmux, the size of window panes can be adjusted incrementally with the resize-pane command. For instance, to resize a pane in any direction (left, down, up...

(TIL) Tmux: Rename The Current Session

less than 1 minute read

If you’ve created an unnamed tmux session or you no longer like the original name, you can open a prompt to change it by hitting

(TIL) Tmux: Pane Killer

less than 1 minute read

The current pane can be killed (closed) using the following key binding:

(TIL) Tmux: Paging Up And Down

less than 1 minute read

When in copy mode (<prefix>[), you can move the cursor around like you would in vim with the directional keys (hjkl). This works fine until you want to...

(TIL) Tmux: List Sessions

less than 1 minute read

Not sure if tmux is running or, if it is, which sessions are available? You can list all the currently running sessions right from the command-line.

(TIL) Tmux: List All Key Bindings

less than 1 minute read

There are a couple ways to list all the tmux key bindings. If you are not currently in a tmux session, you can still access the list from the terminal with

(TIL) Tmux: Kill The Current Session

less than 1 minute read

When you are done with the current tmux session and you no longer need it, you can simply kill it. You can do so within the session with the following comman...

(TIL) SQL: Day Of Week By Name For A Date

less than 1 minute read

By using the to_char() function with a date or timestamp, we can determine the day of the week by name (e.g. Monday). For instance, to determine what day tod...

(TIL) SQL: Count Records By Type

less than 1 minute read

If you have a table with some sort of type column on it, you can come up with a count of the records in that table by type. You just need to take advantage o...

(TIL) PSQL: List Connections To A Database

less than 1 minute read

The pg_stat_activity view can be used to determine what connections there currently are to the PostgreSQL server and to a particular database. To see the pr...

(TIL) PSQL: Sleeping

less than 1 minute read

Generally you want your SQL statements to run against your database as quickly as possible. For those times when you are doing some sort of debugging or just...

(TIL) PSQL: Get The Size Of A Table

less than 1 minute read

With the pg_relation_size() function, we can get the size of a given table. For instance, if we’d like to see the size of the reservations table, we can by e...

(TIL) PSQL: Dump a database

less than 1 minute read

Using pg_dump with the -Fc flag will create a dump of the given database in a custom format. The output of this command can be redirected into a file (th...

(TIL) PSQL: Get The Size Of A Database

less than 1 minute read

If you have connect access to a PostgreSQL database, you can use the pg_database_size() function to get the size of a database in bytes.

(TIL) PSQL: Change The Current Directory

less than 1 minute read

When you start a psql session, your current directory is what psql will use as its current directory. This is important for meta-commands that use relative p...

(TIL) PSQL: Auto Expanded Display

less than 1 minute read

By default, postgres has expanded display turned off. This means that results of a query are displayed horizontally. At times, the results of a query can be ...

(TIL) PSQL: Use Argument Indexes

less than 1 minute read

In Postgres, each of the arguments you specify in a select statement has a 1-based index tied to it. You can use these indexes in the order by and group by p...

(TIL) PSQL: Types By Category

less than 1 minute read

Postgres has many types, each of which falls into a particular category. These categories include Array, Boolean, String, Numeric, Composite, etc. Each of the...

(TIL) PSQL: Truncate All Rows

less than 1 minute read

Given a postgres database, if you want to delete all rows in a table, you can use the DELETE query without any conditions.

(TIL) PSQL: Turn Timing On

less than 1 minute read

When digging around your database and running queries, it is helpful to have an eye on the speed of those queries. This can give insight into where there are...

(TIL) PSQL: Find The Data Directory

less than 1 minute read

Where does postgres store all of the data for a database cluster? Well, in its data directory. Where exactly that data directory is can depend on how the dat...

(TIL) PSQL: Configure The Timezone

less than 1 minute read

Running show timezone; will reveal the timezone for your postgres connection. If you want to change the timezone for the duration of the connection, you can ...

(TIL) PSQL: A Better Null Display Character

less than 1 minute read

By default, psql will display null values with whitespace. This makes it difficult to quickly identify null values when they appear amongst a bunch of other ...

(TIL) PSQL: Compute Hashes With pgcrypto

less than 1 minute read

The pgcrypto extension that comes with PostgreSQL adds access to some general hashing functions. Included are md5, sha1, sha224, sha256, sha384 and sha512. A...

(TIL) PSQL: List Various Kinds Of Objects

less than 1 minute read

Our PostgreSQL database can end up with all kinds of objects: tables, sequences, views, etc. We can use a variety of psql meta-commands to list the different...

(TIL) PSQL: Insert Just The Defaults

less than 1 minute read

If you are constructing an INSERT statement for a table whose required columns all have default values, you may just want to use the defaults. In this situat...

(TIL) PSQL: Generate Series Of Numbers

less than 1 minute read

Postgres has a generate_series function that can be used to, well, generate a series of something. The simplest way to use it is by giving it start and stop ...

(TIL) PSQL: Export Query Results To A CSV

less than 1 minute read

Digging through the results of queries in Postgres’s psql is great if you are a programmer, but eventually someone without the skills or access may need to c...

(TIL) PSQL: Clear The Screen In psql

less than 1 minute read

The psql interactive terminal does not have a built-in way of clearing the screen. What I usually do if I really need the screen cleared is quit, run clear f...

(TIL) PSQL: Storing Emails With citext

less than 1 minute read

Email addresses should be treated as case-insensitive because they are. If a user is trying to sign in with their email address, we shouldn’t care if they ty...

(TIL) PSQL: Getting A Slice Of An Array

less than 1 minute read

Postgres has a very natural syntax for grabbing a slice of an array. You simply add brackets after the array declaring the lower and upper bounds of the slic...

(TIL) PSQL: Renaming A Table

less than 1 minute read

Using the alter table command in PostgreSQL, you can rename an existing table. This command will also update any references to the table such as via foreign ...

(TIL) PSQL: Restart A Sequence

less than 1 minute read

In postgres, if you are truncating a table or doing some other sort of destructive action on a table in a development or testing environment, you may notice ...

(TIL) GitHub: Link to headers in READMEs

less than 1 minute read

Anytime you add a header to a markdown file, GitHub attaches an href with its downcased name. ‘JavaScript’ receives a link to #javascript, for instance.

nix

(TIL) Nix: Check If A Port Is In Use

less than 1 minute read

The lsof command is used to list open files. This includes listing network connections. This means I can check if a particular port is in use and what proces...

(TIL) Nix: CPU Benchmark

less than 1 minute read

dd in conjunction with any stream-processing CPU-intensive program can be used as a simple CPU benchmark!

(TIL) Nix: Saying Yes

less than 1 minute read

Tired of being prompted for confirmation by command-line utilities? Wish you could blindly respond ‘yes’ to whatever it is they are bugging you about? The ye...

(TIL) Nix: Watch That Program

less than 1 minute read

Have you ever been working in the terminal and found yourself repeating the same command many times? Delegate that work to the computer.

(TIL) Nix: Duplicate pipe content

less than 1 minute read

To duplicate the content while piping you can use the tee utility. One straightforward and useful example is that tee can be used to write to multiple fil...

(TIL) Nix: Sort In Numerical Order

less than 1 minute read

By default, the sort command will sort things alphabetically. If you have numerical input though, you may want a numerical sort. This is what the -n flag is ...

(TIL) Nix: Search Man Page Descriptions

less than 1 minute read

You can use the apropos command with a keyword argument to search for that word's occurrence throughout all the man pages on your system. For instance, invoki...

(TIL) Nix: SSH pipes

less than 1 minute read

One of the benefits of piping is that you can use it over networks and it does wonders for data transfer. Note that half of the command is executed locally, ...

(TIL) Nix: Killing A Frozen SSH Session

less than 1 minute read

Whenever an SSH session freezes, I usually mash the keyboard in desperation and then kill the terminal session. This can be avoided though. SSH will listen f...

(TIL) Nix: List parent pid with ps

less than 1 minute read

The ps command, which stands for process status, is a great way to find different processes running on a machine. Information like their pid (process id) is ...

(TIL) Nix: Printing with lpr

less than 1 minute read

Recently while trying to fix a printer I used lpr a bunch of times. It’s not exactly new to me, but never fails to surprise people when I use it.

(TIL) Nix: Search History

less than 1 minute read

Oftentimes there is a very specific command you have entered into your bash prompt that you need to run again. You don’t want to have to type it again and s...

(TIL) Nix: Last Argument Of The Last Command

less than 1 minute read

You can use !$ as a way to reference the last argument in the last command. This makes for an easy shortcut when you want to switch out commands for the same...

(TIL) Nix: Hexdump A Compiled File

2 minute read

The hexdump unix utility allows you to dump the contents of a compiled/executable file in a _readable_ hexadecimal format. Adding the -C flag includes a side...

(TIL) Nix: Only Show The Matches

less than 1 minute read

Tools like grep, ack, and ag make it easy to search for lines in a file that contain certain text and patterns. They all come with the -o flag which tells th...

(TIL) Nix: List Names Of Files With Matches

less than 1 minute read

I often use grep and ag to search for patterns in a group or directory of files. Generally I am interested in looking at the matching lines themselves. Howev...

(TIL) Nix: Grep For Multiple Patterns

less than 1 minute read

You can use the -e flag with the grep command to search for a pattern. Additionally, you can use multiple -e flags to search for multiple patterns. For insta...

(TIL) Nix: Grep For Files Without A Match

less than 1 minute read

The grep command is generally used to find files whose contents match a pattern. With the -L (--files-without-match) flag, grep can be used to find files tha...

(TIL) Nix: Find Newer Files

less than 1 minute read

Use the -newer flag with the name of a file to find files that have a newer modification date than the named file.

(TIL) Nix: Exclude A Directory With Find

less than 1 minute read

Using find is a handy way to track down files that meet certain criteria. However, if there are directories full of irrelevant files, you may end up with a l...

(TIL) Nix: Upgrading Ubuntu

less than 1 minute read

I recently discovered that my Linode box was running a fairly old version of Ubuntu. Because it is a remote box that I SSH into, there is no graphical user i...

(TIL) Nix: Curling For Headers

less than 1 minute read

If you want to inspect the headers of a response from some endpoint, look no further than a quick curl command. By including the -I flag, curl will return ju...

(TIL) Nix: Convert tabs to/from spaces

less than 1 minute read

The command expand in GNU coreutils converts tabs in each input file to spaces. The command unexpand does the reverse, converting spaces in each input file ...

(TIL) Nix: Change Default Shell For A User

less than 1 minute read

You can change the default shell program for a particular unix user with the chsh command. Just tell it what shell program you want to use (e.g. bash or zsh)...

sql

(TIL) SQL: Day Of Week By Name For A Date

less than 1 minute read

By using the to_char() function with a date or timestamp, we can determine the day of the week by name (e.g. Monday). For instance, to determine what day tod...

(TIL) SQL: Count Records By Type

less than 1 minute read

If you have a table with some sort of type column on it, you can come up with a count of the records in that table by type. You just need to take advantage o...

(TIL) PSQL: List Connections To A Database

less than 1 minute read

The pg_stat_activity view can be used to determine what connections there currently are to the PostgreSQL server and to a particular database. To see the pr...

(TIL) PSQL: Sleeping

less than 1 minute read

Generally you want your SQL statements to run against your database as quickly as possible. For those times when you are doing some sort of debugging or just...

(TIL) PSQL: Get The Size Of A Table

less than 1 minute read

With the pg_relation_size() function, we can get the size of a given table. For instance, if we’d like to see the size of the reservations table, we can by e...

(TIL) PSQL: Dump a database

less than 1 minute read

Using pg_dump with the -Fc flag will create a dump of the given database in a custom format. The output of this command can be redirected into a file (th...

(TIL) PSQL: Get The Size Of A Database

less than 1 minute read

If you have connect access to a PostgreSQL database, you can use the pg_database_size() function to get the size of a database in bytes.

(TIL) PSQL: Change The Current Directory

less than 1 minute read

When you start a psql session, your current directory is what psql will use as its current directory. This is important for meta-commands that use relative p...

(TIL) PSQL: Auto Expanded Display

less than 1 minute read

By default, postgres has expanded display turned off. This means that results of a query are displayed horizontally. At times, the results of a query can be ...

(TIL) PSQL: Use Argument Indexes

less than 1 minute read

In Postgres, each of the arguments you specify in a select statement has a 1-based index tied to it. You can use these indexes in the order by and group by p...

(TIL) PSQL: Types By Category

less than 1 minute read

Postgres has many types, each of which falls into a particular category. These categories include Array, Boolean, String, Numeric, Composite, etc. Each of the...

(TIL) PSQL: Truncate All Rows

less than 1 minute read

Given a postgres database, if you want to delete all rows in a table, you can use the DELETE query without any conditions.

(TIL) PSQL: Turn Timing On

less than 1 minute read

When digging around your database and running queries, it is helpful to have an eye on the speed of those queries. This can give insight into where there are...

(TIL) PSQL: Find The Data Directory

less than 1 minute read

Where does postgres store all of the data for a database cluster? Well, in its data directory. Where exactly that data directory is can depend on how the dat...

(TIL) PSQL: Configure The Timezone

less than 1 minute read

Running show timezone; will reveal the timezone for your postgres connection. If you want to change the timezone for the duration of the connection, you can ...

(TIL) PSQL: A Better Null Display Character

less than 1 minute read

By default, psql will display null values with whitespace. This makes it difficult to quickly identify null values when they appear amongst a bunch of other ...

(TIL) PSQL: Compute Hashes With pgcrypto

less than 1 minute read

The pgcrypto extension that comes with PostgreSQL adds access to some general hashing functions. Included are md5, sha1, sha224, sha256, sha384 and sha512. A...

(TIL) PSQL: List Various Kinds Of Objects

less than 1 minute read

Our PostgreSQL database can end up with all kinds of objects: tables, sequences, views, etc. We can use a variety of psql meta-commands to list the different...

(TIL) PSQL: Insert Just The Defaults

less than 1 minute read

If you are constructing an INSERT statement for a table whose required columns all have default values, you may just want to use the defaults. In this situat...

(TIL) PSQL: Generate Series Of Numbers

less than 1 minute read

Postgres has a generate_series function that can be used to, well, generate a series of something. The simplest way to use it is by giving it start and stop ...

(TIL) PSQL: Export Query Results To A CSV

less than 1 minute read

Digging through the results of queries in Postgres’s psql is great if you are a programmer, but eventually someone without the skills or access may need to c...

(TIL) PSQL: Clear The Screen In psql

less than 1 minute read

The psql interactive terminal does not have a built-in way of clearing the screen. What I usually do if I really need the screen cleared is quit, run clear f...

(TIL) PSQL: Storing Emails With citext

less than 1 minute read

Email addresses should be treated as case-insensitive because they are. If a user is trying to sign in with their email address, we shouldn’t care if they ty...

(TIL) PSQL: Getting A Slice Of An Array

less than 1 minute read

Postgres has a very natural syntax for grabbing a slice of an array. You simply add brackets after the array declaring the lower and upper bounds of the slic...

(TIL) PSQL: Renaming A Table

less than 1 minute read

Using the alter table command in PostgreSQL, you can rename an existing table. This command will also update any references to the table such as via foreign ...

(TIL) PSQL: Restart A Sequence

less than 1 minute read

In postgres, if you are truncating a table or doing some other sort of destructive action on a table in a development or testing environment, you may notice ...

git

(TIL) Git: Blocked ssh port

less than 1 minute read

I mostly clone GitHub and Bitbucket repositories using SSH URLs, so that I can protect this access with an SSH private/public keypair. Unfortunately, some fi...

(TIL) Git: Stash tips

1 minute read

I’m a big fan of using git stash to shelve some changes in my repository so that I can move on to some other task. Here are some advanced git stash commands ...

(TIL) Git: Stashing Untracked Files

less than 1 minute read

Normally when stashing changes, using git stash, git is only going to stash changes to tracked files. If there are any new files in your project that aren’t ...

(TIL) Git: Snapshot

less than 1 minute read

To save a snapshot of your current work in git, try this command:

(TIL) Git: Stashing Only Unstaged Changes

less than 1 minute read

If you have both staged and unstaged changes in your project, you can perform a stash on just the unstaged ones by using the -k flag. The staged changes will...

(TIL) Git: Two ways of squashing commits

less than 1 minute read

It is handy to squash down your commits before merging your PR with my-new-cool-feature. You can either squash them down by doing an interactive rebase like ...

(TIL) Git: Interactively Unstage Changes

less than 1 minute read

I often use git add --patch to interactively stage changes for a commit. Git takes me through changes to tracked files piece by piece to check if I want to s...

(TIL) Git: Undo a Git Mistake

less than 1 minute read

git reflog is a record of your actions in Git. With this command, you can undo almost any Git mistake.

(TIL) Git: Resetting A Reset

less than 1 minute read

Sometimes we run commands like git reset --hard HEAD~ when we shouldn’t have. We wish we could undo what we’ve done, but the commit we’ve reset is gone forev...

(TIL) Git: Accessing A Lost Commit

less than 1 minute read

If you have lost track of a recent commit (perhaps you did a reset), you can generally still get it back. Run git reflog and look through the output to see i...

(TIL) Git: Git Log With Authors

less than 1 minute read

In my never-ending quest to better summarize my work at the end of the day using computers, I discovered today the Git --author flag. It works like this:

(TIL) Git: Git Log since

less than 1 minute read

At the end of each day, I try to record what I did, to jog my memory during the next morning’s standup. This is a helpful aid:

(TIL) Git: List Filenames Without The Diffs

less than 1 minute read

The git show command will list all changes for a given reference including the diffs. With diffs included, this can get rather verbose at times. If you just ...

(TIL) Git: Last Commit A File Appeared In

less than 1 minute read

In my project, I have a README.md file that I haven’t modified in a while. I’d like to take a look at the last commit that modified it. The git log command c...

(TIL) Git: LFS Track

less than 1 minute read

When you add a new type of large file to your repository, you’ll need to tell Git LFS to track it by specifying a pattern using the git lfs track command:

(TIL) Git: LFS Pull

less than 1 minute read

You can pull from a Git LFS repository using a normal git pull. No explicit commands are needed to retrieve Git LFS content. However, if the checkout fails f...

(TIL) Git: LFS Prune

1 minute read

You can delete files from your local Git LFS cache with the git lfs prune command. This will delete any local Git LFS files that are considered ‘old’. An old...

(TIL) Git: Migrate LFS hosting provider

less than 1 minute read

To migrate a Git LFS repository from one hosting provider to another, you can use a combination of git lfs fetch and git lfs push with the --all option speci...

(TIL) Git: LFS Fetch

1 minute read

Git LFS typically only downloads the files needed for commits that you actually check out locally. However, you can force Git LFS to download extra content fo...

(TIL) Git: LFS Clone

less than 1 minute read

Once Git LFS is installed, you can clone a Git LFS repository as normal using git clone. At the end of the cloning process Git will check out the default bran...

(TIL) Git: Delete Remote Git Tags

less than 1 minute read

Tagging releases with Git is a good idea. In case your tags get off track, here is how you delete a Git tag locally and on a remote:

(TIL) Git: Diffing With Patience

less than 1 minute read

The default diff algorithm used by Git is pretty good, but it can be misled by larger, complex changesets. The result is a noisier, misaligned diff output.

(TIL) Git: Delete All Untracked Files

less than 1 minute read

Git provides a command explicitly intended for cleaning up (read: removing) untracked files from a local copy of a repository.

(TIL) Git: Checkout Old Version Of A File

less than 1 minute read

When you want to return to a past version of a file, you can reset to a past commit. When you don’t want to abandon a bunch of other changes, this isn’t goin...

(TIL) Git: Use a file from another branch

less than 1 minute read

Sometimes you just need one file from another branch. Sure you could git cherry-pick but then you’re dealing with commits. That sort of thing gets sticky fas...

(TIL) Git: Clean Out All Local Branches

less than 1 minute read

Sometimes a project can get to a point where there are so many local branches that deleting them one by one is too tedious. This one-liner can help:

(TIL) Git: Intent To Add

less than 1 minute read

Git commands like git diff and git add --patch are awesome, but their little caveat is that they only work on files that are currently tracked in the reposit...

(TIL) GitHub: Link to headers in READMEs

less than 1 minute read

Anytime you add a header to a markdown file, GitHub attaches an href with its downcased name. ‘JavaScript’ receives a link to #javascript, for instance.

psql

(TIL) PSQL: List Connections To A Database

less than 1 minute read

The pg_stat_activity view can be used to determine what connections there currently are to the PostgreSQL server and to a particular database. To see the pr...

(TIL) PSQL: Sleeping

less than 1 minute read

Generally you want your SQL statements to run against your database as quickly as possible. For those times when you are doing some sort of debugging or just...

(TIL) PSQL: Get The Size Of A Table

less than 1 minute read

With the pg_relation_size() function, we can get the size of a given table. For instance, if we’d like to see the size of the reservations table, we can by e...

(TIL) PSQL: Dump a database

less than 1 minute read

Using pg_dump with the -Fc flag will create a dump of the given database in a custom format. The output of this command can be redirected into a file (th...

(TIL) PSQL: Get The Size Of A Database

less than 1 minute read

If you have connect access to a PostgreSQL database, you can use the pg_database_size() function to get the size of a database in bytes.

(TIL) PSQL: Change The Current Directory

less than 1 minute read

When you start a psql session, your current directory is what psql will use as its current directory. This is important for meta-commands that use relative p...

(TIL) PSQL: Auto Expanded Display

less than 1 minute read

By default, postgres has expanded display turned off. This means that results of a query are displayed horizontally. At times, the results of a query can be ...

(TIL) PSQL: Use Argument Indexes

less than 1 minute read

In Postgres, each of the arguments you specify in a select statement has a 1-based index tied to it. You can use these indexes in the order by and group by p...

(TIL) PSQL: Types By Category

less than 1 minute read

Postgres has many types, each of which falls into a particular category. These categories include Array, Boolean, String, Numeric, Composite, etc. Each of the...

(TIL) PSQL: Truncate All Rows

less than 1 minute read

Given a postgres database, if you want to delete all rows in a table, you can use the DELETE query without any conditions.

(TIL) PSQL: Turn Timing On

less than 1 minute read

When digging around your database and running queries, it is helpful to have an eye on the speed of those queries. This can give insight into where there are...

(TIL) PSQL: Find The Data Directory

less than 1 minute read

Where does postgres store all of the data for a database cluster? Well, in its data directory. Where exactly that data directory is can depend on how the dat...

(TIL) PSQL: Configure The Timezone

less than 1 minute read

Running show timezone; will reveal the timezone for your postgres connection. If you want to change the timezone for the duration of the connection, you can ...

(TIL) PSQL: A Better Null Display Character

less than 1 minute read

By default, psql will display null values with whitespace. This makes it difficult to quickly identify null values when they appear amongst a bunch of other ...

(TIL) PSQL: Compute Hashes With pgcrypto

less than 1 minute read

The pgcrypto extension that comes with PostgreSQL adds access to some general hashing functions. Included are md5, sha1, sha224, sha256, sha384 and sha512. A...

(TIL) PSQL: List Various Kinds Of Objects

less than 1 minute read

Our PostgreSQL database can end up with all kinds of objects: tables, sequences, views, etc. We can use a variety of psql meta-commands to list the different...

(TIL) PSQL: Insert Just The Defaults

less than 1 minute read

If you are constructing an INSERT statement for a table whose required columns all have default values, you may just want to use the defaults. In this situat...

(TIL) PSQL: Generate Series Of Numbers

less than 1 minute read

Postgres has a generate_series function that can be used to, well, generate a series of something. The simplest way to use it is by giving it start and stop ...

(TIL) PSQL: Export Query Results To A CSV

less than 1 minute read

Digging through the results of queries in Postgres’s psql is great if you are a programmer, but eventually someone without the skills or access may need to c...

(TIL) PSQL: Clear The Screen In psql

less than 1 minute read

The psql interactive terminal does not have a built-in way of clearing the screen. What I usually do if I really need the screen cleared is quit, run clear f...

(TIL) PSQL: Storing Emails With citext

less than 1 minute read

Email addresses should be treated as case-insensitive because they are. If a user is trying to sign in with their email address, we shouldn’t care if they ty...

(TIL) PSQL: Getting A Slice Of An Array

less than 1 minute read

Postgres has a very natural syntax for grabbing a slice of an array. You simply add brackets after the array declaring the lower and upper bounds of the slic...

(TIL) PSQL: Renaming A Table

less than 1 minute read

Using the alter table command in PostgreSQL, you can rename an existing table. This command will also update any references to the table such as via foreign ...

(TIL) PSQL: Restart A Sequence

less than 1 minute read

In postgres, if you are truncating a table or doing some other sort of destructive action on a table in a development or testing environment, you may notice ...

python

(TIL) Python: Salted Hash

less than 1 minute read

The salt is just a randomly derived bit of data that you prefix or postfix your data with to dramatically increase the complexity of a dictionary atta...
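As a minimal sketch of the idea (not the post's own code), the snippet below prefixes the data with a random salt before hashing. The function name salted_hash, the 16-byte salt length, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os


def salted_hash(data, salt=None):
    """Return (salt, hex digest) for data prefixed with a salt."""
    if salt is None:
        salt = os.urandom(16)  # 16 random bytes; the length is an illustrative choice
    digest = hashlib.sha256(salt + data).hexdigest()
    return salt, digest


salt, digest = salted_hash(b"correct horse")

# Re-hashing with the stored salt reproduces the digest, so it can be checked later.
_, again = salted_hash(b"correct horse", salt)
print(digest == again)
```

The salt has to be stored next to the digest so the hash can be recomputed. For real password storage, a deliberately slow function such as hashlib.pbkdf2_hmac or hashlib.scrypt is usually a better fit than a single SHA-256 pass.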

(TIL) Python: SpooledTemporaryFile

less than 1 minute read

The tempfile.SpooledTemporaryFile function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds the pa...

(TIL) Python: Private variables

less than 1 minute read

To make Python treat a variable as pseudo-private, follow the convention of putting two underscores (i.e., __) at the beginning of the variable’s name, e.g.:
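A small sketch of how the double-underscore convention behaves in practice; the Account class and its attribute are hypothetical names chosen for illustration.

```python
class Account:
    def __init__(self, balance):
        # Inside the class body, __balance is rewritten to _Account__balance.
        self.__balance = balance

    def balance(self):
        return self.__balance


acct = Account(100)
print(acct.balance())               # access through the class works as usual
print(hasattr(acct, "__balance"))   # the unmangled name does not exist on the instance
print(acct._Account__balance)       # the mangled name is still reachable
```

This is why the variable is only pseudo-private: name mangling avoids accidental clashes in subclasses, but it does not prevent deliberate access.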

(TIL) Python: Flushing while printing

less than 1 minute read

Since Python 3.3, you can force the normal print() function to flush without the need to use sys.stdout.flush(); just set the flush keyword argument to Tr...
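A minimal sketch of a progress line that relies on the flush keyword; the show_progress helper and its parameters are hypothetical.

```python
import sys
import time


def show_progress(steps, stream=sys.stdout, delay=0.0):
    """Print each step on one line, flushing so it appears immediately."""
    for step in range(steps):
        # end="" keeps the output on one line; flush=True bypasses buffering.
        print(f"step {step} ", end="", file=stream, flush=True)
        time.sleep(delay)
    print("done", file=stream, flush=True)


show_progress(3, delay=0.1)
```

Without flush=True, output that lacks a trailing newline may sit in the buffer until the line is completed, depending on how the stream is buffered.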

(TIL) Python: Pickle in Python2 and Python3

less than 1 minute read

The python3 pickle.load() function has optional keyword arguments that can be used to control compatibility support for pickle streams generated by Python 2:
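As an illustrative sketch, the snippet below uses pickle.loads, which accepts the same fix_imports, encoding, and errors keyword arguments as pickle.load. The byte string is a hand-written protocol 0 pickle of a Python 2 8-bit str, included only as sample input.

```python
import pickle

# Protocol 0 pickle of the Python 2 str '\xe9', written out by hand for illustration.
py2_pickle = b"S'\\xe9'\np0\n."

# Decode Python 2 8-bit strings as Latin-1 text.
as_text = pickle.loads(py2_pickle, encoding="latin1")

# Keep Python 2 8-bit strings as bytes objects instead of decoding them.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(repr(as_text), repr(as_bytes))
```

With the default encoding of ASCII, loading this particular stream would fail to decode, which is the situation the encoding argument is meant to handle.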

(TIL) Python: An improved tuple

1 minute read

A downside of plain tuples is that the data you store in them can only be pulled out by accessing it through integer indexes. You can’t give names to individ...
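One common answer to this limitation is collections.namedtuple; that this is the "improved tuple" the post has in mind is an assumption, and the Point type below is a hypothetical example.

```python
from collections import namedtuple

# A tuple subclass whose fields can be read by name as well as by position.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)
print(p.x, p.y)     # access by field name
print(p[0], p[1])   # integer indexes still work
x, y = p            # and so does unpacking
```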

(TIL) Python: Lambdas as lexical closures

less than 1 minute read

A ‘lexical closure’ is a fancy name for a function that remembers the values from the enclosing lexical scope even when the program flow is no longer in that...
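A short sketch of a lambda acting as a closure; make_adder is a hypothetical helper used only to illustrate the idea.

```python
def make_adder(n):
    # The lambda captures n from the enclosing scope and keeps it
    # after make_adder has returned.
    return lambda x: x + n


add_3 = make_adder(3)
add_5 = make_adder(5)
print(add_3(4), add_5(4))
```

Each call to make_adder creates a separate scope, so add_3 and add_5 remember different values of n.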

(TIL) Python: Parallel for loops

1 minute read

Joblib provides a simple helper class to write parallel for loops using multiprocessing. The core idea is to write the code to be executed as a generator ...

(TIL) Python: Cache function output

1 minute read

Joblib tracks the parameters passed to a function, and if the function has already been called with the same parameters, it returns the result cached on disk.

(TIL) Python: Deep copy a compound object

2 minute read

Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain muta...
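The difference between a binding, a shallow copy, and a deep copy can be sketched with a nested list; the variable names are illustrative.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # a second name for the same list
shallow = copy.copy(original)     # new outer list, shared inner lists
deep = copy.deepcopy(original)    # new outer list and new inner lists

original[0].append(99)

print(alias[0])    # sees the change: same object
print(shallow[0])  # sees the change: inner list is shared
print(deep[0])     # unaffected: fully independent copy
```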

(TIL) Python: Collect garbage

less than 1 minute read

If you have a variable with a large memory footprint, you can force garbage collection using the gc Garbage Collector module:
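A minimal sketch, assuming the large objects are caught in a reference cycle; the Node class and payload size are hypothetical. In CPython, objects without cycles are normally freed as soon as their last reference is deleted, so gc.collect() matters most for cyclic garbage.

```python
import gc


class Node:
    def __init__(self):
        self.payload = [0] * 1_000_000  # stand-in for a large memory footprint
        self.ref = None


a, b = Node(), Node()
a.ref, b.ref = b, a   # reference cycle: refcounts never drop to zero on their own

del a, b              # the names are gone, but the cycle keeps the objects alive
found = gc.collect()  # force a full collection; returns the number of unreachable objects found
print(found)
```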

pandas

(TIL) Pandas: Make Data Frame

less than 1 minute read

pandas has a built-in function makeDataFrame() to return a DataFrame containing random floats. Note that this is using the private API, and the exact details...

(TIL) Pandas: Read Clipboard

less than 1 minute read

The pandas.read_clipboard() method is as simple as it sounds: it reads copy-pasted tabular data and parses it into a Data Frame. For instance, try running...

(TIL) Pandas: Pipe function

less than 1 minute read

Pandas introduced pipe() starting from version 0.16.2. pipe() enables user-defined methods in method chains.
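A small sketch of chaining user-defined functions with pipe(); the functions add_total and keep_above and the sample data are hypothetical.

```python
import pandas as pd


def add_total(df):
    """Return a copy of df with a total column."""
    return df.assign(total=df["price"] * df["qty"])


def keep_above(df, column, threshold):
    """Keep rows where column exceeds threshold."""
    return df[df[column] > threshold]


orders = pd.DataFrame({"price": [2.0, 5.0, 1.0], "qty": [3, 1, 10]})

# Extra positional arguments after the function are forwarded to it.
result = orders.pipe(add_total).pipe(keep_above, "total", 5.5)
print(result)
```

The chain reads left to right, which is the main advantage over the nested form keep_above(add_total(orders), "total", 5.5).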

(TIL) Pandas: Named Aggregation

1 minute read

pandas>=0.25 supports named aggregation, allowing you to specify the output column names when you aggregate a groupby, instead of renaming. This will be e...
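A brief sketch of named aggregation on made-up data; the column names and values are illustrative only.

```python
import pandas as pd

animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)

# Each keyword becomes an output column: name=(input column, aggregation).
summary = animals.groupby("kind").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
    mean_weight=("weight", "mean"),
)
print(summary)
```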

(TIL) Pandas: sort_index

less than 1 minute read

Dataframes have a new sort_index method to sort a dataframe by index. This is equivalent to the deprecated sort method with the columns argument set to None.
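A minimal sketch of sort_index on a small, made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"value": [30, 10, 20]}, index=["c", "a", "b"])

by_index = df.sort_index()                      # ascending index order
by_index_desc = df.sort_index(ascending=False)  # descending index order
print(by_index)
print(by_index_desc)
```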

(TIL) Pandas: Options

less than 1 minute read

You can use the following functions to interact with the options in pandas:
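The excerpt does not say which functions the post lists; as a sketch, the snippet below uses three that pandas provides (get_option, set_option, reset_option), with display.max_rows picked as an example option.

```python
import pandas as pd

default_rows = pd.get_option("display.max_rows")  # read the current value

pd.set_option("display.max_rows", 10)             # change it
changed_rows = pd.get_option("display.max_rows")

pd.reset_option("display.max_rows")               # restore the default
print(default_rows, changed_rows, pd.get_option("display.max_rows"))
```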

(TIL) Pandas: Option Context

less than 1 minute read

If you want to temporarily change pandas options, instead of doing so manually as follows:
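Going by the post's title, a temporary change can be scoped with pd.option_context; the sketch below uses display.max_rows as an example option.

```python
import pandas as pd

before = pd.get_option("display.max_rows")

# The option is changed only inside the with block and restored on exit,
# even if the block raises an exception.
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(before, inside, after)
```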

(TIL) Pandas: Speed up merges

less than 1 minute read

You can improve the speed of a merge by first specifying the key column of the merge as the index of your dataframes, and then using join instead of merge:
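A sketch of the two approaches on tiny, made-up frames. It shows only that both produce the same rows; any speed difference depends on the data and is best measured on real inputs.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [20, 30, 40]})

# Column-based merge.
merged = left.merge(right, on="key", how="inner")

# Index-based join: set the key as the index on both sides first.
joined = left.set_index("key").join(right.set_index("key"), how="inner")

print(merged)
print(joined)
```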

tmux

(TIL) Tmux: tmux in your tmux

less than 1 minute read

If you are running tmux locally and you shell into another machine to access tmux remotely, you will suddenly find yourself in tmux inception. You will have ...

(TIL) Tmux: Adjusting Window Pane Size

less than 1 minute read

In tmux, the size of window panes can be adjusted incrementally with the resize-pane command. For instance, to resize a pane in any direction (left, down, up...

(TIL) Tmux: Rename The Current Session

less than 1 minute read

If you’ve created an unnamed tmux session or you no longer like the original name, you can open a prompt to change it by hitting

(TIL) Tmux: Pane Killer

less than 1 minute read

The current pane can be killed (closed) using the following key binding:

(TIL) Tmux: Paging Up And Down

less than 1 minute read

When in copy mode (<prefix>[), you can move the cursor around like you would in vim with the directional keys (hjkl). This works fine until you want to...

(TIL) Tmux: List Sessions

less than 1 minute read

Not sure if tmux is running or, if it is, which sessions are available? You can list all the currently running sessions right from the command-line.

(TIL) Tmux: List All Key Bindings

less than 1 minute read

There are a couple of ways to list all the tmux key bindings. If you are not currently in a tmux session, you can still access the list from the terminal with

(TIL) Tmux: Kill The Current Session

less than 1 minute read

When you are done with the current tmux session and you no longer need it, you can simply kill it. You can do so within the session with the following comman...

visualization

(TIL) Matplotlib: Get current axis

less than 1 minute read

matplotlib.pyplot.gca(**kwargs) gets the current Axes instance on the current figure matching the given keyword args, or creates one.

The Data Visualisation Catalogue

less than 1 minute read

Looking for inspiration for your data viz project? Can’t remember what a particular visualization is called? Check out the Data Visualisation Catalogue.

mac

(TIL) Mac: Remove quarantine flag from app

less than 1 minute read

I encountered an issue with my favorite macOS Markdown editor MacDown where macOS Catalina was reporting the file as damaged. It turns out that Catalina has ...

(TIL) Mac: Emoji

less than 1 minute read

Press Command-Control-Space to launch the character palette. Then click on the Emoji icon in the sidebar on the left side of the Character window. You c...

(TIL) Mac: Power shortcuts

less than 1 minute read

Control-Command-Power/Eject will reboot the Mac instantly. Command-Option-Control-Power/Eject will shut it down. Command-Shift-Q will log off. Shift...

matplotlib

(TIL) Matplotlib: Get current axis

less than 1 minute read

matplotlib.pyplot.gca(**kwargs) gets the current Axes instance on the current figure matching the given keyword args, or creates one.

plot

(TIL) Matplotlib: Get current axis

less than 1 minute read

matplotlib.pyplot.gca(**kwargs) gets the current Axes instance on the current figure matching the given keyword args, or creates one.

datascience

Machine Learning for Product Managers

less than 1 minute read

In a previous post, I discussed the importance of learning how to properly communicate Data Science to maximize the impact of your work. Product Managers are...

Communicating Data Science with impact

less than 1 minute read

One of the major differentiators between a new Data Scientist and a more experienced one is how the more senior practitioner spends a lot of time understandi...

The Data Visualisation Catalogue

less than 1 minute read

Looking for inspiration for your data viz project? Can’t remember what a particular visualization is called? Check out the Data Visualisation Catalogue.

Making machine learning models interpretable

less than 1 minute read

This month, the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning held a special session on “Interpretab...

The relativity of raw data

1 minute read

Data scientists often say that they want access to the ‘raw data’ – but what does that term mean?

spark

(TIL) Spark: Count number of duplicate rows

less than 1 minute read

To count the number of duplicate rows in a pyspark DataFrame, you want to groupBy() all the columns and count(), then select the sum of the counts for the ro...

(TIL) Spark: Orderby Partitioning

less than 1 minute read

Remember that orderBy uses the number of partitions specified by spark.conf.get("spark.sql.shuffle.partitions"). The default for this is 200. Can change manu...

github

(TIL) Git: Blocked ssh port

less than 1 minute read

I mostly clone GitHub and Bitbucket repositories using SSH URLs, so that I can protect this access with an SSH private/public keypair. Unfortunately, some fi...

(TIL) GitHub: Link to headers in READMEs

less than 1 minute read

Anytime you add a header to a markdown file, GitHub attaches an href with its downcased name. ‘JavaScript’ receives a link to #javascript, for instance.

jupyter

(TIL) Jupyter: Output of all variables

less than 1 minute read

If you try to see the output of several variables without explicitly writing print in front of each, only the last one is displayed. With this, you get the...

docker

(TIL) Docker: Set Timezone

less than 1 minute read

To set which timezone your docker container should use, add the following to your Dockerfile: