Nix: Convert reStructuredText to Markdown
I’ve used pandoc a lot before to convert Markdown files to PDFs. I just found out it can also convert reStructuredText files to Markdown:
```bash
#!/bin/bash
sudo launchctl bootout gui/$(id -u <username>)
# or
sudo launchctl bootout user/$(id -u <username>)
```
Works on macOS 10.11.x or later.
Enter a line of Bash starting with a # comment, then run !:q on the next line to see what that would be with proper Bash escaping applied.
To check what a long-running process is doing on Linux, use strace:
To count the number of fields in a delimiter-separated text file, use awk:
Similar to the int data type, floats also have several additional methods useful in various scenarios:
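A minimal sketch of a few of those float methods; the values are arbitrary and the selection of methods is mine, not necessarily the set the original post covered:

```python
x = 2.5

# is_integer() reports whether the float has no fractional part.
whole = (4.0).is_integer()      # True
not_whole = x.is_integer()      # False

# as_integer_ratio() returns an exact (numerator, denominator) pair.
ratio = x.as_integer_ratio()    # (5, 2)

# hex() and float.fromhex() round-trip the exact binary value.
round_tripped = float.fromhex(x.hex())

print(whole, not_whole, ratio, round_tripped)
```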
Python’s textwrap module is useful for rearranging text, e.g. wrapping and filling lines.
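As a small illustration (the sample sentence and the width of 20 are arbitrary choices), wrap() returns a list of lines while fill() returns a single newline-joined string:

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog and keeps running."

# wrap() returns a list of lines, each at most `width` characters long.
lines = textwrap.wrap(text, width=20)

# fill() is shorthand for "\n".join(textwrap.wrap(...)).
filled = textwrap.fill(text, width=20)

print(filled)
```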
The SimpleNamespace type from the types library provides an alternative to an empty class (class MyClass: pass) from which one can add and remove attribut...
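A short sketch of how that looks in practice; the attribute names are made up for illustration:

```python
from types import SimpleNamespace

# Attributes can be supplied as keyword arguments at construction time.
point = SimpleNamespace(x=1, y=2)

# Attributes can be added and removed afterwards, like on an empty class.
point.z = 3
del point.x

print(point)
```

Unlike a bare class instance, a SimpleNamespace also gets a readable repr and equality comparison for free.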
Here’s a nice one liner for generating a random string using just digits and upper-case letters:
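One way this could look in Python (a sketch, not necessarily the original one-liner; `random_string` and the default length are names and values chosen for illustration):

```python
import random
import string

# Alphabet of upper-case ASCII letters plus the digits 0-9.
alphabet = string.ascii_uppercase + string.digits


def random_string(length=8):
    """Return a random string drawn from upper-case letters and digits."""
    return "".join(random.choices(alphabet, k=length))


print(random_string(12))
```

The random module is not suitable for secrets; for tokens or passwords, the standard library's secrets.choice is the safer building block.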
Say you need to find the total size of jpegs within a directory with subdirectories. The following command has you covered:
If you need to know in a *nix script the extension of a file, try the following:
Homebrew Python packages work by setting up their own package-specific virtual environments. This means you can upgrade them without waiting for the tap.
I wanted to have a GitHub Action step run that might fail, but if it failed the rest of the steps should still execute and the overall run should be treat...
If you provide your own custom generated ICS file hosted at a URL, it’s nice to be able to give Google Calendar users an easy way to subscribe to that fee...
This function will use AppleScript to present a password entry dialog to make your scripts a little more user friendly:
List Available Timezones:
You can restart or shutdown from the command line:
Restore sane shell settings, in case your shell session went insane because some script or application turned it into a garbled mess:
Add Gatekeeper Exception:
View a continuous stream of file system access info:
Creates an empty 10 gigabyte test file:
The salt is just a randomly derived bit of data that you prefix or postfix your data with to dramatically increase the complexity of a dictionary atta...
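As a rough illustration of the idea, here is a sketch using Python's hashlib.pbkdf2_hmac. The function name `hash_password`, the 16-byte salt, and the iteration count are assumptions made for the example, not values taken from the original note:

```python
import hashlib
import os


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)  # illustrative salt length
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000  # illustrative iteration count
    )
    return salt, digest


salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")

# Same password, different salts: the stored digests differ,
# so one precomputed dictionary cannot cover both entries.
print(digest_a != digest_b)
```

To verify a login attempt, the stored salt is passed back in and the resulting digest is compared with the stored one.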
pandas has two handy functions for combining DataFrames:
pandas has a built-in function makeDataFrame() to return a DataFrame containing random floats. Note that this is using the private API, and the exact details...
The pandas.read_clipboard() method is as simple as it sounds: it reads copy-pasted tabular data and parses it into a Data Frame. For instance, try running...
Pandas introduced pipe() starting from version 0.16.2. pipe() enables user-defined methods in method chains.
The plot_tree() function allows you to create a diagram of steps present in a decision tree model:
Estimators can be displayed with an HTML representation when shown in a Jupyter notebook. This can be useful to diagnose or visualize a Pipeline with many ...
Scikit-learn has a class called ColumnTransformer which allows you to easily specify which columns to apply the most appropriate preproces...
On newer MacBook Pros, you can add Touch ID as an acceptable method of authenticating sudo commands. This is done by editing the /etc/pam.d/sudo file:
pandas>=0.25 supports named aggregation, allowing you to specify the output column names when you aggregate a groupby, instead of renaming. This will be e...
I encountered an issue with my favorite macOS Markdown editor MacDown where macOS Catalina was reporting the file as damaged. It turns out that Catalina has ...
There are lots of Terminal commands that you can use to change or update your Mac’s OS. My favorite is this quick tip to download macOS updates and installat...
ls -X will group files by extension. — Unix tool tip (UnixToolTip) October 22, 2019
Roughly: Light travels about a foot per nanosecond, sound travels about a foot per millisecond. A factor of almost exactly a million. — Colin Wright ...
Create a GitHub pull request from command line
I mostly clone GitHub and Bitbucket repositories using SSH URLs, so that I can protect this access with an SSH private/public keypair. Unfortunately, some fi...
I’m a big fan of using git stash to shelve some changes in my repository so that I can move on to some other task. Here are some advanced git stash commands ...
SHOW PROCESSLIST
```sql
-- Show all databases.
select name from master.sys.databases;
```
The ANSI standard way of listing all columns in a database table is:
Every now and then I run:
Say you have a timestamp column created_at, and an integer column number that represents a number of weeks, how do you use the date_add function to calculate...
To count the number of duplicate rows in a pyspark DataFrame, you want to groupBy() all the columns and count(), then select the sum of the counts for the ro...
To check and update your system time in macOS Mojave:
To set which timezone your docker container should use, add the following to your Dockerfile:
When sudo is not available in container, you can jump into a running container as root user using one of these commands:
Say you are on a feature branch, and want to make a bug fix in master. Rather than stashing your changes, or cloning the entire repository, you can create ...
Generally when I invoke git rm <filename>, I do so with the intention of removing a file from the project entirely. git-rm does exactly that, removing ...
Normally when stashing changes, using git stash, git is only going to stash changes to tracked files. If there are any new files in your project that aren’t ...
To save a snapshot of your current work in git, try this command:
If you have both staged and unstaged changes in your project, you can perform a stash on just the unstaged ones by using the -k flag. The staged changes will...
The -p flag can be used with git stash, just as it is used with git add, for interactively staging a stash.
It is handy to squash down your commits before merging your PR with my-new-cool-feature. You can either squash them down by doing an interactive rebase like ...
The following methods will tell git which private key to use.
Use the --stat flag when running git show on a commit to see the diffstat summary of that commit. For instance, this is what I get for a recent commit to del...
By including -- <filename> with a git log command, we can list all the commits for a file. The following is an example of such a command with some form...
I often use git add --patch to interactively stage changes for a commit. Git takes me through changes to tracked files piece by piece to check if I want to s...
git reflog is a record of your actions in Git. With this command, you can undo almost any Git mistake.
Sometimes we run commands like git reset --hard HEAD~ when we shouldn’t have. We wish we could undo what we’ve done, but the commit we’ve reset is gone forev...
If you have lost track of a recent commit (perhaps you did a reset), you can generally still get it back. Run git reflog and look through the output to see i...
Generally when referencing a commit, you’ll use the SHA or a portion of the SHA. For example with git-show:
When working on a branch with multiple commits, you can “go back in time” and revise previous commits any way you please.
In my never-ending quest to better summarize my work at the end of the day using computers, I discovered today the Git --author flag. It works like this:
At the end of each day, I try to record what I did, to jog my memory during the next morning’s standup. This is a helpful aid:
The git show command will list all changes for a given reference including the diffs. With diffs included, this can get rather verbose at times. If you just ...
There are times when I want to get a sense of the difference between two branches. I don’t want to look at the actual diff though, I just want to see what co...
In my project, I have a README.md file that I haven’t modified in a while. I’d like to take a look at the last commit that modified it. The git log command c...
Files that should never be tracked are listed in your .gitignore file. But what about if you want to ignore some local changes to a tracked file?
You can list most git commands by using the -a flag with git-help:
Git’s hash-object command can be used to determine what hash id will be used by git when creating a blob in its internal file system.
The git log command supports a --grep flag that allows you to do a text search (using grep, obviously) over the commit messages for that repository. For the ...
When you add a new type of large file to your repository, you’ll need to tell Git LFS to track it by specifying a pattern using the git lfs track command:
You can pull from a Git LFS repository using a normal git pull. No explicit commands are needed to retrieve Git LFS content. However, if the checkout fails f...
You can delete files from your local Git LFS cache with the git lfs prune command. This will delete any local Git LFS files that are considered ‘old’. An old...
To migrate a Git LFS repository from one hosting provider to another, you can use a combination of git lfs fetch and git lfs push with the --all option speci...
Setting up Git LFS
Git LFS typically only downloads the files needed for commits that you actually checkout locally. However, you can force Git LFS to download extra content fo...
Once Git LFS is installed, you can clone a Git LFS repository as normal using git clone. At the end of the cloning process Git will checkout the default bran...
After working on a Git-versioned project for a while, you may find that there are a bunch of references to remote branches in your local repository. You know...
The git-diff command can help with finding all files that have changed between two branches. For instance, if you are at the HEAD of your current feature bra...
Tagging releases with Git is a good idea. In case your tags get off track, here is how you delete a Git tag locally and on a remote:
When staging changes in interactive mode (git add -p), you have a number of options associated with single keys. y is yes, n is no, a is this and all remaini...
The default diff algorithm used by Git is pretty good, but it can get misled by larger, complex changesets. The result is a noisier, misaligned diff output.
The author of the previous commit can be amended with the following command
Git provides a command explicitly intended for cleaning up (read: removing) untracked files from a local copy of a repository.
Git makes it easy to checkout the last branch you were on.
When you want to return to a past version of a file, you can reset to a past commit. When you don’t want to abandon a bunch of other changes, this isn’t goin...
In git, you can reference a commit SHA or branch to checkout differing versions of files.
Sometimes you just need one file from another branch. Sure you could git cherry-pick but then you’re dealing with commits. That sort of thing gets sticky fas...
Sometimes a project can get to a point where there are so many local branches that deleting them one by one is too tedious. This one-liner can help:
Git commands like git diff and git add --patch are awesome, but their little caveat is that they only work on files that are currently tracked in the reposit...
It’s very straightforward to save a matplotlib figure to disk:
If you encounter the following error when starting a docker container:
While fitting a sklearn model, I encountered the following error:
Instead of the mean, use the median and/or the mode. Instead of the standard deviation, use the mean absolute deviation, the median absolute deviation, ...
To detach from a container, you hold Ctrl and press P, then Q. This only works if the container was started with both -t and -i.
Here’s how to modify the size of the volume attached to an EC2 instance “my_ec2”:
Especially when you’re working with a large team with multiple Travis-enabled repositories, you’ll want to avoid running any unnecessary builds. The most ...
Use LaTeX in your plots:
%pycat shows you (in a popup) the syntax highlighted contents of an external file:
To pass variables between notebooks, first store the variable using:
You can set environment variables directly from the notebook without having to restart the kernel:
If you try to see the output of more variables without explicitly writing print in front of each, only the last one gets outputted. With this, you get the...
You can specify comments for a table, and even the columns in the table:
You can write DataFrames to a database table via a sqlalchemy connection as follows:
Dataframes have a new sort_index method to sort a dataframe by index. This is equivalent to the deprecated sort method with the columns argument set to None.
By default, pandas will infer the data types of the columns when reading in a csv file. To speed up this read, you can specify the data types using the dtype...
Suppose you wish to iterate through a (potentially very large) file lazily rather than reading the entire file into memory.
To extract the integer value of days from a numpy.timedelta64 in pandas, use dt.days to obtain the days attribute as integers.
In pandas, you can use the tilde (~) to flip bool values:
You can use the following functions to interact with the options in pandas:
If you want to temporarily change pandas options, instead of doing so manually as follows:
You can improve the speed of a merge by first specifying the key column of the merge as the index of your dataframes, and then using join instead of merge:
The memory usage of a DataFrame (including the index) is shown when accessing the info method of a DataFrame. A configuration option, display.memory_usage, s...
Let’s consider the following DataFrame:
```python
df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
```
df.groupby('Sp').apply(lambda t: t[t.Count==t.Count.max()])
pandas’ DateOffset represents a regular frequency increment:
The count() method returns the number of non-NaN values in each column. Similarly, count(axis=1) returns the number of non-NaN values in each row.
```python
df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
```
Say we have two columns of data representing the same quantity; one column is from training data, the other is from validation data. How can we efficiently g...
GUIDs are not strings. They are numbers. We render them as strings for readability. We should not process them as strings. We should not pass them around ...
If you want the URL for your sign-in page to contain your company name (or other friendly identifier) instead of your AWS account ID, you can create an al...
UNION removes duplicate records, whereas UNION ALL does not.
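A quick way to see the difference is with an in-memory SQLite database from Python; the tables `a` and `b` and their contents are made up for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (n INTEGER)")
conn.execute("CREATE TABLE b (n INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION de-duplicates across both inputs: the value 2 appears once.
union_rows = conn.execute("SELECT n FROM a UNION SELECT n FROM b").fetchall()

# UNION ALL keeps every row: the value 2 appears twice.
union_all_rows = conn.execute("SELECT n FROM a UNION ALL SELECT n FROM b").fetchall()

print(sorted(union_rows))
print(sorted(union_all_rows))
```

Because UNION ALL skips the de-duplication step, it is usually the cheaper choice when duplicates are impossible or acceptable.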
You can enable high-resolution plots in jupyter notebook using the following configuration:
>>> import numpy as np
>>> np.set_printoptions(precision=4)
>>> print(np.array([1.123456789]))
[ 1.1235]
Press the Command-Control-space to launch the characters palette. Then click on the Emoji icon in the sidebar on the left side of the Character window. You c...
ditto is slightly more advanced but can be advantageous to ‘cp’ for several reasons, as it not only preserves ownership attributes and permissions but als...
Get_ddl statements
When working with multiple branches at the same time, people clone the whole git repository again.
You can’t actually delete a user in Slack. You can, however, disable a user. And the cool thing about disabled users is that you can still access the mess...
Say you need to append a directory to PATH, but only add a leading : if PATH is already set. The standard
Use M-shift-backspace.
You can use mathb.in to send someone a link to a bit of LaTeX.
Sometimes you push to Travis CI and there is no new build. What to do in such case? Has Travis CI got your commits? Is the branch you were using disabled?...
```sql
// Add two years to the current date:
select dateadd(year, 2, current_date());
```
Use matplotlib.pyplot.suptitle() to add a centered title to the figure:
To list the file formats for which you have access privileges, use:
Here’s how to select all columns and add a new column.
To extract the integer value of days from a numpy.timedelta64, you divide it with a timedelta64 of one day:
Control-Command-Power/Eject will reboot the Mac instantly. Command-Option-Control-Power/Eject will shut it down. Command-Shift-Q will log off. Shift...
Using the unicodedata Python module it’s easy to normalize any unicode data strings (remove accents etc):
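A minimal sketch of accent stripping, assuming NFKD decomposition followed by dropping combining marks is what is wanted (`strip_accents` is a hypothetical helper name):

```python
import unicodedata


def strip_accents(text):
    """Decompose characters, then drop the combining marks (accents)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("café crème"))  # cafe creme
```

Characters with no decomposition, such as "ø" or "ß", pass through unchanged, so this is not a full transliteration.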
In the python REPL, _ is assigned the last value printed:
I often generate UUIDs ( Universally Unique Identifiers ), but when I use these in testing, I want to do so reproducibly. Turns out you can do this using a s...
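One possible approach, assuming a seeded random.Random instance is an acceptable source of bits for tests (`seeded_uuid4` is a name invented for this sketch):

```python
import random
import uuid


def seeded_uuid4(rng):
    """Build a version-4 UUID from 128 bits drawn from the given RNG."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two generators with the same seed yield the same sequence of UUIDs.
first = seeded_uuid4(random.Random(42))
second = seeded_uuid4(random.Random(42))
print(first == second)  # True
```

Passing version=4 makes uuid.UUID overwrite the version and variant bits, so the result is still a well-formed UUID4. These values are predictable by design and belong in tests only.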
import time
start_time = time.time()
main()
print("--- %s seconds ---" % (time.time() - start_time))
The tempfile.SpooledTemporaryFile function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds the pa...
Find a list of all python modules installed on a machine by running the following command in a terminal:
To make Python treat a variable as pseudo-private, follow the convention of putting two underscores (i.e., __) at the beginning of the variable’s name, e.g.:
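A small sketch of what the double underscore actually does; the `Account` class is made up for illustration:

```python
class Account:
    def __init__(self):
        # Inside the class body, __balance is rewritten to _Account__balance.
        self.__balance = 100

    def balance(self):
        return self.__balance


acct = Account()

# The plain name is not an attribute of the instance...
print(hasattr(acct, "__balance"))   # False
# ...but the mangled name is still reachable, hence "pseudo-private".
print(acct._Account__balance)       # 100
```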
To print a string without appending the usual newline, use the end parameter of the print function:
Since Python 3.3, you can force the normal print() function to flush without the need to use sys.stdout.flush(); just set the flush keyword argument to Tr...
The python3 pickle.load() function has optional keyword arguments that can be used to control compatibility support for pickle stream generated by Python 2:
When you try to sort a list of strings that contain numbers, the normal python sort algorithm sorts lexicographically, so you might not get the results that ...
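A common workaround is a "natural sort" key that splits each string into text and number chunks. This is a sketch under the assumption that all names follow a similar text-then-number shape; `natural_key` is a hypothetical helper:

```python
import re


def natural_key(s):
    """Split into digit and non-digit runs; compare digit runs as integers."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", s)]


files = ["file10.txt", "file2.txt", "file1.txt"]

print(sorted(files))                   # ['file1.txt', 'file10.txt', 'file2.txt']
print(sorted(files, key=natural_key))  # ['file1.txt', 'file2.txt', 'file10.txt']
```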
A downside of plain tuples is that the data you store in them can only be pulled out by accessing it through integer indexes. You can’t give names to individ...
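collections.namedtuple is one standard-library answer to this; a minimal sketch with a made-up `Point` type:

```python
from collections import namedtuple

# A tuple subclass whose positions also have names.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)

print(p.x, p.y)   # access by name
print(p[0], p[1]) # index access still works
```

The result is still immutable and still compares equal to the plain tuple (1, 2).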
In Python 3.6, variables (in global, class or local scope) can now have type annotations using either of the following two forms:
One of the most confusing moments for new developers is when they discover how Python treats default arguments in function definitions.
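The classic surprise is a mutable default, which is created once when the function is defined rather than on each call. A sketch with hypothetical function names:

```python
def append_bad(item, bucket=[]):
    # The same list object is reused on every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and build a fresh list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_bad(1)
second = append_bad(2)
print(first is second, second)          # True [1, 2]
print(append_good(1), append_good(2))   # [1] [2]
```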
A ‘lexical closure’ is a fancy name for a function that remembers the values from the enclosing lexical scope even when the program flow is no longer in that...
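A minimal sketch of a closure; `make_adder` is an illustrative name:

```python
def make_adder(n):
    def add(x):
        # `n` is looked up in the enclosing scope, which outlives make_adder().
        return x + n
    return add


add5 = make_adder(5)

# make_adder has already returned, yet add5 still remembers n == 5.
print(add5(10))  # 15
```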
How do you define a function inline and then immediately call it? Like this:
Joblib provides a simple helper class to write parallel for loops using multiprocessing. The core idea is to write the code to be executed as a generator ...
Joblib traces parameters passed to a function, and if the function has been called with the same parameters it returns the return value cached on a disk.
Given this dictionary:
```python
import hashlib
```
You can use Python’s built-in dis module to disassemble functions and inspect their CPython VM bytecode:
It’s very easy to add thousands separators to numbers:
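For example, using the format specification mini-language (the numbers are arbitrary):

```python
n = 1234567

with_commas = f"{n:,}"                 # '1,234,567'
with_decimals = f"{1234567.891:,.2f}"  # '1,234,567.89'
with_underscores = f"{n:_}"            # '1_234_567'

print(with_commas, with_decimals, with_underscores)
```

The same specifiers work with format() and str.format() if f-strings are not available.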
```python
import collections
c = collections.Counter('helloworld')
```
You can use the environment markers to achieve this in requirements.txt since pip 6.0:
In many applications += and append are interchangeable, but for Python lists they are not.
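A short sketch of the difference: on a list, += behaves like extend() and accepts any iterable, while append() adds its argument as a single element.

```python
a = [1, 2]
a += [3, 4]        # extends element by element

b = [1, 2]
b.append([3, 4])   # adds the whole list as one nested element

c = []
c += "ab"          # any iterable works, so a string is split into characters

print(a)  # [1, 2, 3, 4]
print(b)  # [1, 2, [3, 4]]
print(c)  # ['a', 'b']
```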
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain muta...
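A small sketch contrasting plain assignment with the copy module's shallow and deep copies; the nested list is an arbitrary example:

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                 # same object, just another name
shallow = copy.copy(original)    # new outer list, shared inner lists
deep = copy.deepcopy(original)   # fully independent copy

original[0].append(99)

print(alias[0])    # [1, 2, 99]
print(shallow[0])  # [1, 2, 99]  (inner list is shared)
print(deep[0])     # [1, 2]
```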
You probably know how to rewrite a for loop such as:
If you have a variable with a large memory footprint, you can force garbage collection using the gc Garbage Collector module:
You can check for class inheritance relationships with the issubclass() built-in:
if mystring and mystring.strip():
    print("not null and not empty string")
else:
    print("null or empty string")
In Spark >= 1.5 you can use the size function to calculate the length of a column:
You can configure the AWS Command Line Interface to use a role by creating a profile for the role in the ~/.aws/config file. The following example shows a ro...
```python
import matplotlib.pyplot as plt
```
```python
import matplotlib.pyplot as plt
import numpy as np
```
matplotlib.pyplot.gca(**kwargs) gets the current Axes instance on the current figure matching the given keyword args, or creates one.
```python
import matplotlib.pyplot as plt
```
>>> import numpy as np
>>> np.nanmax([1, 2, np.nan])
2.0
The lsof command is used to list open files. This includes listing network connections. This means I can check if a particular port is in use and what proces...
Are you on Ubuntu? Want to know what version (release) of Ubuntu you are using?
In Linux, the dd command can be used for simple I/O performance measurements as follows:
dd in conjunction with any stream-processing CPU-intensive program can be used as a simple CPU benchmark!
Tired of being prompted for confirmation by command-line utilities? Wish you could blindly respond ‘yes’ to whatever it is they are bugging you about? The ye...
If you want a quick and easy method to max out the usage of a CPU core, just use yes:
When I want to know where an executable is, I use which like so:
wc -L shows the length of the longest line in a file.
Have you ever been working in the terminal and found yourself repeating the same command many times? Delegate that work to the computer.
vmstat allows the user to monitor virtual memory statistics such as processes, memory, paging, block IO, traps, disks and CPU activity.
Why don’t I use this command more?
To duplicate the content while piping you can use the tee utility. One straightforward and useful example is that tee can be used to write to multiple fil...
To make your 4 CPUs run flat out for 60 seconds:
du -hsc * | sort -h
By default, the sort command will sort things alphabetically. If you have numerical input though, you may want a numerical sort. This is what the -n flag is ...
Use apt-cache policy <package>. For example:
You can use the apropos command with a keyword argument to search for that word’s occurrence throughout all the man pages on your system. For instance, invoki...
The ack command makes it easy to narrow the set of searched files to those of a specific programming language. For instance, if you have a rails project and ...
awk '/pattern/ {print $0 >"yes.csv"; next}{print $0 >"no.csv"}' input.csv
Use the -L flag with ssh to forward a connection to a remote server
You may need to connect to a remote location via SSH, but if the remote location doesn’t provide an interactive shell, the connection is most likely to dr...
To run a local script enki.sh on a remote machine, use:
One of the benefits of piping is that you can use it over networks and it does wonders for data transfer. Note that half of the command is executed locally, ...
Whenever an SSH session freezes, I usually mash the keyboard in desperation and then kill the terminal session. This can be avoided though. SSH will listen f...
To see the SSH Escape Sequences, hit <Enter>~?.
Reverse a string with the rev command.
Use the repeat command to repeat some other command.
You can use qpdf to merge PDF files into a single file as follows:
You can use pv to monitor the progress of any pipe, by putting it between input/output pipes.
The ps command, which stands for process status, is a great way to find different processes running on a machine. Information like their pid (process id) is ...
Process substitution can be used to create a file descriptor from the evaluation of a shell command. The syntax for process substitution is <(LIST) where ...
$$ expands to the process ID of the shell. So, you can see the PID of the current shell with echo $$.
Adding | pbcopy to the end of any command will send the standard output to your clipboard.
If you are working with a complicated command in the terminal trying to get the arguments just right. Such as this curl:
Recently while trying to fix a printer I used lpr a bunch of times. It’s not exactly new to me, but never fails to surprise people when I use it.
The last command is a handy way to find out who has been connecting to a machine and when.
You can quickly kill everything running on a certain port with the following command.
Here’s how to install a package from a specific repository:
To get the MAC addresses of all your interfaces, grep the HWaddr from the ifconfig command:
Often times there is a very specific command you have entered into your bash prompt that you need to run again. You don’t want to have to type it again and s...
In bash, !?foo will repeat the most recent command that contained the string ‘foo’.
You can use !$ as a way to reference the last argument in the last command. This makes for an easy shortcut when you want to switch out commands for the same...
Let’s say we just executed the following command:
The hexdump unix utility allows you to dump the contents of a compiled/executable file in a _readable_ hexadecimal format. Adding the -C flag includes a side...
Tools like grep, ack, and ag make it easy to search for lines in a file that contain certain text and patterns. They all come with the -o flag which tells th...
I often use grep and ag to search for patterns in a group or directory of files. Generally I am interested in looking at the matching lines themselves. Howev...
You can use the -e flag with the grep command to search for a pattern. Additionally, you can use multiple -e flags to search for multiple patterns. For insta...
The grep command is generally used to find files whose contents match a pattern. With the -L (--files-without-match) flag, grep can be used to find files tha...
The other day I tried to run a rm command on the contents of a directory with a LOT of files.
Sometimes when deleting a file, the error “File is already in use” is encountered, with further trouble locating the process using the file.
Use the -newer flag with the name of a file to find files that have a newer modification date than the named file.
Using find is a handy way to track down files that meet certain criteria. However, if there are directories full of irrelevant files, you may end up with a l...
I recently discovered that my Linode box was running a fairly old version of Ubuntu. Because it is a remote box that I SSH into, there is no graphical user i...
When using the cp command to copy files, you can use the -n flag to make sure that you do not overwrite existing files.
The dig (domain information groper) command can be used to get more information about a domain name. To discover the IP address for a given domain, invoke d...
If you want to inspect the headers of a response from some endpoint, look no further than a quick curl command. By including the -I flag, curl will return ju...
I often use curl to take a quick look at the responses of particular endpoints. If I try to curl a URL that is secured with HTTP Basic Authentication, this i...
The command expand in GNU coreutils converts tabs in each input file to spaces. The command unexpand does the reverse, converting spaces in each input file ...
To convert the contents of a file to all lower case, you can use:
cat --squeeze-blank (or cat -s for short) will suppress repeated empty output lines.
You can change the default shell program for a particular unix user with the chsh command. Just tell it what shell program you want to use (e.g. bash or zsh)...
You can quickly view a file using cat
Remember that orderBy uses the number of partitions specified by spark.conf.get("spark.sql.shuffle.partitions"). The default for this is 200. Can change manu...
If you want to reuse a dataframe df without having to recreate it, you can use df.cache() to tell Spark to keep the dataframe in memory.
If you are running tmux locally and you shell into another machine to access tmux remotely, you will suddenly find yourself in tmux inception. You will have ...
In tmux, the size of window panes can be adjusted incrementally with the resize-pane command. For instance, to resize a pane in any direction (left, down, up...
If you’ve created an unnamed tmux session or you no longer like the original name, you can open a prompt to change it by hitting
The current pane can be killed (closed) using the following key binding:
When in copy mode (<prefix>[), you can move the cursor around like you would in vim with the directional keys (hjkl). This works fine until you want to...
When you initially start a tmux session, the default directory is based off of whatever the current working directory was. Any subsequent windows opened with...
Any tmux command will, by default, be invoked against the default server. You can instruct tmux to perform commands against a different server with the -L fl...
When creating a new tmux session
Not sure if tmux is running or, if it is, which sessions are available? You can list all the currently running sessions right from the command-line.
There are a couple ways to list all the tmux key bindings. If you are not currently in a tmux session, you can still access the list from the terminal with
When you are done with the current tmux session and you no longer need it, you can simply kill it. You can do so within the session with the following comman...
Assuming I have a database with a posts table:
drop database foo_development;
Given a date in PostgreSQL
By using the to_char() function with a date or timestamp, we can determine the day of the week by name (e.g. Monday). For instance, to determine what day tod...
If you have a table with some sort of type column on it, you can come up with a count of the records in that table by type. You just need to take advantage o...
The pg_typeof() function allows you to determine the data type of anything in Postgres.
Consider the scenario where you are trying to drop a database, but there are existing connections.
The pg_stat_activity view can be used to determine what connections there currently are to the PostgreSQL server and to a particular database. To see the pr...
Generally you want your SQL statements to run against your database as quickly as possible. For those times when you are doing some sort of debugging or just...
Want to get an idea of how much disk space that additional index is taking up?
To restore the dump, create a fresh database and then use pg_restore:
With the pg_relation_size() function, we can get the size of a given table. For instance, if we’d like to see the size of the reservations table, we can by e...
Using the pg_dump with the -Fc flag will create a dump of the given database in a custom format. The output of this command can be redirected into a file (th...
If you have connect access to a PostgreSQL database, you can use the pg_database_size() function to get the size of a database in bytes.
When you start a psql session, your current directory is what psql will use as its current directory. This is important for meta-commands that use relative p...
By default, postgres has expanded display turned off. This means that results of a query are displayed horizontally. At times, the results of a query can be ...
You can concisely create sets of values in PostgreSQL using the values command.
You can determine the current user of a psql session by selecting on the current_user
In Postgres, each of the arguments you specify in a select statement has a 1-based index tied to it. You can use these indexes in the order by and group by p...
Using a Postgres table for caching? You might want to try making it unlogged.
Postgres has many types, each of which falls into a particular category. These categories include Array, Boolean, String, Numeric, Composite, etc. Each of the...
If you have tables A and B where B has a foreign key referencing A, then trying to truncate A will result in something like this:
Given a postgres database, if you want to delete all rows in a table, you can use the DELETE query without any conditions.
When digging around your database and running queries, it is helpful to have an eye on the speed of those queries. This can give insight into where there are...
Perhaps the more common way to list all rows in a table is with the following select command:
You can limit the amount of time that postgres will execute a statement by setting a hard timeout. By default the timeout is 0 (see show statement_timeout;) ...
Postgres has all the mathematical operators you might expect in any programming language (e.g. +,-,*,/,%). It also has a few extras that you might not be exp...
If you can connect to your database with psql, then you can easily find the location of your Postgres config files. After connecting, I can ask Postgres to s...
Where does postgres store all of the data for a database cluster? Well, in its data directory. Where exactly that data directory is can depend on how the dat...
Running show timezone; will reveal the timezone for your postgres connection. If you want to change the timezone for the duration of the connection, you can ...
In PostgreSQL, the internal seed for the random number generator is a run-time configuration parameter. This seed parameter can be set to a particular seed i...
You can send a command to psql to be executed by using the -c flag.
There are a handful of settings that I inevitably turn on or configure each time I open up a psql session. I can save myself a little time and sanity by conf...
By default, psql will display null values with whitespace. This makes it difficult to quickly identify null values when they appear amongst a bunch of other ...
You can check if a string contains another string using the position function.
The pgcrypto extension that ships with PostgreSQL can be used to do a number of interesting things. This includes functions for doing salted password hashing...
If you check out the docs for the uuid-ossp extension, you’ll come across the following message.
The pgcrypto extension that comes with PostgreSQL adds access to some general hashing functions. Included are md5, sha1, sha224, sha256, sha384 and sha512. A...
Our PostgreSQL database can end up with all kinds of objects: tables, sequences, views, etc. We can use a variety of psql meta-commands to list the different...
Within psql, type \du to list all the users for a database and their respective permissions.
I’ll often use \d or \dt to check out the tables in my database. This shows the schema, object name, object type (e.g. table), and owner for each.
There are two ways to list all the available databases. The first is a psql-only command:
If you are constructing an INSERT statement for a table whose required columns all have default values, you may just want to use the defaults. In this situat...
Postgres has a generate_series function that can be used to, well, generate a series of something. The simplest way to use it is by giving it start and stop ...
Digging through the results of queries in Postgres’s psql is great if you are a programmer, but eventually someone without the skills or access may need to c...
The psql interactive terminal does not have a built-in way of clearing the screen. What I usually do if I really need the screen cleared is quit, run clear f...
Email addresses should be treated as case-insensitive because they are. If a user is trying to sign in with their email address, we shouldn’t care if they ty...
Postgres has a very natural syntax for grabbing a slice of an array. You simply add brackets after the array declaring the lower and upper bounds of the slic...
In postgres, an array can be defined using the array syntax like so:
Using the alter table command in PostgreSQL, you can rename an existing table. This command will also update any references to the table such as via foreign ...
In postgres, if you are truncating a table or doing some other sort of destructive action on a table in a development or testing environment, you may notice ...
In PostgreSQL, we can determine the age of something (or someone) by passing a timestamp to the age function.
Hold the option key while resizing a corner of a window and it will simultaneously and equivalently resize the opposite corner.
Anytime you add a header to a markdown file, GitHub attaches an href with its downcased name. ‘JavaScript’ receives a link to #javascript, for instance.
If you run a tidy ship and use plugins like vim-spacejam, then whitespace changes cluttering up your git diffs probably isn’t much of an issue.
Add Emoji To GitHub Repository Description
To compare two strings in a bash script, you will have a snippet of code similar to the following:
You can move the cursor without arrow keys. Here is the keyboard equivalent for each.
There are all sorts of ways to do things in your shell environment without reaching for the arrow keys. For instance, if you want to move up to the previous ...
If you’ve installed a couple versions of a program via brew and you’d like to switch from the currently linked version to the other installed version, you ca...
Do you always forget how to read/write cronjobs? Check out crontab.guru - the cron schedule expression editor.
All Physics students learn the Second Law of Thermodynamics, that entropy always increases. But not all such students understand why this is. To do this, let...
Some advice from a parent dealing with the death of his young son:
Carl Sagan debunks Flat Earthers using nothing more than a piece of cardboard:
As a parent of two small children, I have to constantly remind myself that some positive or negative event that seems trivial to me may be incredible or deva...
Microsoft Research’s danah boyd has been given an award by the Electronic Frontier Foundation, and gave a magnificent speech on her experience as a woman in ...
The cross-pollination that naturally occurs when people move along a diverse career path is of great benefit to the areas they find themselves in. This was e...
I just finished Baratunde Thurston’s How to be Black, a wonderful “satirical guide to race issues – written for black people and those who love them”.
A discriminating palate leads to novel rigorous statistical methods
I’ve been one-hot encoding categorical variables for as long as I have been using scikit-learn. It turns out that you can lose a lot of predictive power thi...
We all know the importance of strong passwords, don’t we?
Due to the vast number of medical billing codes, it is generally infeasible to generate machine learning features from them as one-hot vectors. The paper Can...
Another nice write-up on the use of Local Interpretable Model-Agnostic Explanations (LIME) to explain complex machine learning models.
Awesome work to demonstrate how to deliberately cause a SHA-1 collision.
It’s always made me sad when people tell me they dislike mathematics. I always wonder if it was the subject or their teachers that they disliked…
This wired.com article on ‘bad science’ speaks to me on so many levels. It hurts my soul to see how many published studies are not reproducible.
If you can’t understand how a model makes a prediction, how can you trust that prediction?
Q: What’s the number-one problem in healthcare? A: I think the number-one problem is we don’t measure performance. We don’t measure the outcomes of patien...
This month, the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning held a special session on “Interpretab...
Data scientists often say that they want access to the ‘raw data’ – but what does that term mean?
How on earth could an American pilot get a Distinguished Flying Cross for shooting down an American plane? The Friendliest of Fire (via Now I Know) tells all.
In the proud tradition of programmers everywhere, I use my first blog post to say “Hello World!”.
In a previous post, I discussed the importance of learning how to properly communicate Data Science to maximize the impact of your work. Product Managers are...
One of the major differentiators between a new Data Scientist and a more experienced one is how the more senior practitioner spends a lot of time understandi...
Another nice Medium post from Benjamin Obi Tayo has a good summary of the types of issues you should always be mindful of when you get a new data set:
When we train a machine-learning model, we almost always report some performance metric, such as accuracy, recall, or F1-score.
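For reference, here is a minimal sketch of how the three metrics named above can be computed from scratch. It assumes binary labels where 1 marks the positive class; the function name `classification_metrics` is illustrative and not taken from the original post.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, recall, and F1 for binary labels (1 = positive).

    Illustrative helper; the name and the dict layout are assumptions.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "recall": recall, "f1": f1}
```

Calling it with `y_true = [1, 1, 1, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 0, 1, 0, 0, 0, 0]` gives two true positives, one false negative, one false positive, and four true negatives.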
The area under the ROC (receiver operating characteristic) curve, or AUC, is a popular and robust metric for machine learning classification. However, one is...
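One way to build intuition for AUC is its pairwise interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way; the function name `roc_auc` is illustrative, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Illustrative helper: assumes binary labels (1 = positive, 0 = negative).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly.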
When trying to characterize a dataset, we often reach for the old standby: the mean of each property. If we give it some more thought, we might consider usin...
Automatically [add] extra digital proof data to all photos and videos you take.
Top tips on better descriptive statistics:
Looking for inspiration for your data viz project? Can’t remember what a particular visualization is called? Check out the Data Visualisation Catalogue.
Since ancient times, mankind has studied the sky and wondered what the “wandering stars” (planets) might be. In the last two decades, we have found hundreds ...
In the shadow of the ‘Big Eye’, this is the Little Telescope That Could…