In the Harvard Business Review, Kalyan Veeramachaneni has a great discussion of principles to meet this goal:
If companies want to get value from their data, they need to focus on accelerating human understanding of data, scaling the number of modeling questions they can ask of that data in a short amount of time, and assessing their implications. In our work with companies, we ultimately decided that creating true impact via machine learning will come from a focus on four principles:
- Stick with simple models: We decided that simple models, like logistic regression or those based on random forests or decision trees, are sufficient for the problems at hand. The focus should instead be on reducing the time between the data acquisition and the development of the first simple predictive model.
- Explore more problems: Data scientists need the ability to rapidly define and explore multiple prediction problems, quickly and easily. Instead of exploring one business problem with an incredibly sophisticated machine learning model, companies should be exploring dozens, building a simple predictive model for each one and assessing their value proposition.
- Learn from a sample of data—not all the data: Instead of focusing on how to apply distributed computing to allow any individual processing module to handle big data, invest in techniques that will enable the derivations of similar conclusions from a data subsample. By circumventing the use of massive computing resources, they will enable the exploration of more hypotheses.
- Focus on automation: To achieve both reduced time to first model and increased rate of exploration, companies must automate processes that are normally done manually. Over and over across different data problems, we found ourselves applying similar data processing techniques, whether it was to transform the data into useful aggregates, or to prepare data for predictive modeling—it’s time to streamline these, and to develop algorithms and build software systems that do them automatically.
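The third principle, learning from a sample rather than all the data, can be illustrated with a small sketch. The data here is synthetic and every name (`population`, `sample`, the 30% rate) is an assumption made for illustration, not something taken from the article:

```python
import random

# Synthetic stand-in for a large table: 1 = event happened, 0 = it did not.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]

# Draw a 2% simple random sample without replacement.
sample = rng.sample(population, 2_000)

full_rate = sum(population) / len(population)
sample_rate = sum(sample) / len(sample)

print(f"rate on all rows:   {full_rate:.3f}")
print(f"rate on the sample: {sample_rate:.3f}")
```

The two rates should land close together, which is the point of the principle: a cheap subsample is often enough to decide whether a question deserves the full dataset.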
git ls-files --others --exclude-standard
import pandas as pd
from IPython.display import display_html
from itertools import chain, cycle

def display_side_by_side(*args, titles=cycle([''])):
    # source: https://stackoverflow.com/questions/38783027/jupyter-notebook-display-two-pandas-tables-side-by-side
    # Render each DataFrame as an inline HTML table with its title above it.
    html_str = ''
    for df, title in zip(args, chain(titles, cycle(['</br>']))):
        html_str += '<th style="text-align:center"><td style="vertical-align:top">'
        html_str += "<br>"
        html_str += f'<h2>{title}</h2>'
        html_str += df.to_html().replace('table', 'table style="display:inline"')
        html_str += '</td></th>'
    display_html(html_str, raw=True)

df1 = pd.read_csv("file.csv")
df2 = pd.read_csv("file2")
display_side_by_side(df1.head(), df2.head(), titles=['Sales', 'Advertising'])
### Output
Two DataFrames side by side. [Photo by Lucas Soares.]
Via Lucas Soares.
import pandas as pd
df = pd.DataFrame(dict(a=["a","b","c"],b=[1,2,3]))
df_dictionary = dict(zip(df["a"],df["b"]))
df_dictionary
# Output is {'a': 1, 'b': 2, 'c': 3}
Via Lucas Soares.
I wanted to be able to track versions of the Docker image (and the Dockerfile used to create those images), and link those versions back to specific Git commits in the source repository.
This variation of the git log command will print only the full hash of the last commit to the repository: git log -1 --format=%H
If you prefer the shortened commit hash …, then just change the %H to %h, like this: git log -1 --format=%h.
You’ll need to add lines like this to your Dockerfile:
ARG GIT_COMMIT=unspecified
LABEL org.opencontainers.image.revision=$GIT_COMMIT
Note that I’ve updated the label name in the original post to reflect an update later in the post.
The first line defines a build-time argument, and [setting this to] =unspecified means that if the build-time argument is omitted or not supplied, it will default to the value of “unspecified”. The second line takes the information from the argument and adds it as a label on the image. [Now] build the image with the --build-arg flag:
docker build -t flask-local-build --build-arg GIT_COMMIT=$(git log -1 --format=%h) .
Note that the --build-arg flag applies to docker-compose commands too.
When you build the image this way, you can then see the Git commit attached to the image as a label using this command:
docker inspect flask-local-build | jq '.[].ContainerConfig.Labels'
Via Scott Lowe.
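The build step quoted above can also be scripted. What follows is a minimal sketch, not part of the original post: the helper names `current_commit` and `docker_build_command` are hypothetical, and it assumes `git` and `docker` are on the PATH when the commands are actually run.

```python
import subprocess

def current_commit(short=True):
    """Return the hash of the last commit via `git log -1 --format=%h` (or %H)."""
    fmt = "%h" if short else "%H"
    result = subprocess.run(
        ["git", "log", "-1", f"--format={fmt}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def docker_build_command(tag, commit, context="."):
    """Assemble the `docker build` argument list with GIT_COMMIT as a build arg."""
    return [
        "docker", "build",
        "-t", tag,
        "--build-arg", f"GIT_COMMIT={commit}",
        context,
    ]

# "abc1234" is a placeholder hash used only to show the resulting command.
print(" ".join(docker_build_command("flask-local-build", "abc1234")))
```

Inside a repository, `subprocess.run(docker_build_command("flask-local-build", current_commit()), check=True)` would run the real build. Passing the arguments as a list avoids shell quoting problems.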
pv file.txt | tee >(sha256sum > file.sha256) > file-copy.txt
Via commandlinefu.com.
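The same idea, copying a file and hashing it in a single read pass, can be sketched in Python with only the standard library. The function name `copy_with_sha256` and its parameters are assumptions for illustration. Unlike `pv`, this version shows no progress bar.

```python
import hashlib

def copy_with_sha256(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst and return the SHA-256 hex digest, reading src only once."""
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Each chunk feeds both the hash and the copy, like `tee` in the pipeline.
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
            fout.write(chunk)
    return digest.hexdigest()
```

Calling `copy_with_sha256("file.txt", "file-copy.txt")` returns the digest as a string, which can then be written to a checksum file if needed.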
# 1.
sudo -s <<< 'apt update -y && apt upgrade -y'
# 2.
sudo sh -c 'apt update -y && apt upgrade -y'
Via commandlinefu.com.
So it was nice to see an evolution of this service – now you can:
Change the “X” in any arXiv article link to the “5” in ar5iv to get a modern HTML5 document.
📢 Welcome to https://t.co/YKX9oX7hp4
— Deyan Ginev (@dginev) January 31, 2022
Change the "X" in any arXiv article link to the "5" in ar5iv to get a modern HTML5 document.
Thread: what is included, why now, and how we hope to merge back into arXiv. #OA #OpenScience #preprints
1/10
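The link rewrite described in the tweet is easy to automate. This is a small sketch that assumes the rule is exactly a host swap from arxiv.org to ar5iv.org. The function name `to_ar5iv` is hypothetical, and the article ID below is a placeholder, not a real paper.

```python
from urllib.parse import urlsplit, urlunsplit

def to_ar5iv(url):
    """Rewrite an arXiv article URL so it points at the ar5iv HTML5 rendering."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv URL: {url}")
    # Swap only the host; the path, query and fragment are kept unchanged.
    return urlunsplit(parts._replace(netloc="ar5iv.org"))

# Placeholder article ID, used only to show the shape of the result.
print(to_ar5iv("https://arxiv.org/abs/1234.56789"))
```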
A simple list can easily be used to implement a queue abstract data structure. A queue implies the first-in, first-out principle.
However, this approach will prove inefficient because inserts and pops from the beginning of a list are slow (all elements need shifting by one).
It’s recommended to implement queues using collections.deque, a class designed for fast appends and pops from both ends.
from collections import deque
queue = deque(["a", "b", "c"])
queue.append("d")
queue.append("e")
queue.popleft()
queue.popleft()
print(queue)
# output is: deque(['c', 'd', 'e'])
A reverse queue can be implemented by opting for appendleft instead of append and pop instead of popleft.
Via enki.com.
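The reverse variant mentioned above, which has no example in the original, might look like the following sketch. Items enter on the left and leave on the right, so the order is still first-in, first-out.

```python
from collections import deque

queue = deque()
for item in ["a", "b", "c"]:
    queue.appendleft(item)   # enqueue on the left end

first = queue.pop()          # dequeue from the right end: "a"
queue.appendleft("d")
second = queue.pop()         # "b"

print(first, second)
print(queue)
# output is:
# a b
# deque(['d', 'c'])
```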
To see a list of which commits are on one branch but not another, use git log:
git log --no-merges oldbranch ^newbranch
You can list multiple branches to include and exclude, e.g.:
git log --no-merges oldbranch1 oldbranch2 ^newbranch1 ^newbranch2
The --no-merges flag excludes commits that are merges.
[You can show] commits and commit contents from other-branch that are not in your current branch:
git show @..other-branch
Additionally you can apply the commits from other-branch directly to your current branch:
git cherry-pick @..other-branch
To show the commits in oldbranch but not in newbranch:
git log newbranch..oldbranch
To show the diff introduced by these commits (note there are three dots):
git diff newbranch...oldbranch
[To] list all branches [that] contain the commits from “branch-to-delete”:
git branch --contains branch-to-delete
Via Stack Overflow.
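What `git log oldbranch ^newbranch` (equivalently `newbranch..oldbranch`) selects can be pictured as a set difference over the commit graph. This is a toy model, not how Git is implemented. The graph `PARENTS` and the helpers `reachable` and `only_on` are hypothetical names made up for the sketch.

```python
# Toy history: A <- B <- C (tip of newbranch), and B <- D <- E (tip of oldbranch).
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["D"],
}

def reachable(parents, tip):
    """Return every commit reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen

def only_on(parents, include, exclude):
    """Commits reachable from any include tip but from no exclude tip."""
    included = set().union(*(reachable(parents, tip) for tip in include))
    excluded = set().union(*(reachable(parents, tip) for tip in exclude))
    return included - excluded

# Commits on oldbranch (tip E) that are not on newbranch (tip C).
print(sorted(only_on(PARENTS, include=["E"], exclude=["C"])))
# output is: ['D', 'E']
```

Listing several tips in `include` and `exclude` mirrors the multi-branch form `oldbranch1 oldbranch2 ^newbranch1 ^newbranch2`.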