PostgreSQL short-circuit evaluation

The purpose of this post is to demonstrate that Postgres follows short-circuit evaluation. That is, when two boolean values are joined with "or" and the first one is true, there is no need to check the second one, because the whole expression is already known to be true. For example:

true or true: true. Because the first operand is true and the operator is "or", the second operand does not need to be checked.
true or false: true. Same answer, for the same reason: the expression is true even though the second operand is false, because the first operand is true and the two are joined with "or".
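The same idea can be observed in Python, whose "or" operator also short-circuits. This is only a sketch of the general concept, not a statement about how PostgreSQL evaluates anything; the helper names (evaluate_or, operand) are made up for illustration.

```python
def evaluate_or(first_value, second_value):
    """Evaluate `first or second` and report which operands were actually checked."""
    evaluated = []

    def operand(name, value):
        # Record that this operand was evaluated, then hand back its value.
        evaluated.append(name)
        return value

    result = operand("first", first_value) or operand("second", second_value)
    return result, evaluated


# The first operand is true, so the second one is never evaluated.
print(evaluate_or(True, False))   # (True, ['first'])

# The first operand is false, so the second one has to be evaluated.
print(evaluate_or(False, True))   # (True, ['first', 'second'])
```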

Git Tricks

Always remember the difference between "log" and "diff". Log lists commits (optionally showing code changes using -p/--patch), while diff shows the code changes. Log can help you understand changes over time, while diff can help you get a holistic understanding of those changes.

Learning to learn

The first thing to know about git is how to read the man pages. They can be accessed by running commands like these from the command line: "man git", "man git-diff", "man git-log", etc. Note that there is a top-level "git" command (one that doesn't have "diff"/"log"/etc. following it), and that it has its own man page, which introduces all the git commands and the terminology/syntax.

Git Diff

For example, if you know someone implemented a feature but it took them multiple commits, you can use git-log to find the commit just before the first commit to implement the feature

Apple Reminders Tips

Apple Reminders is a bit buggy. If you have a ton of reminders like I do, it can start exhibiting some strange behavior.

If you see a reminder that's missing a description, BE CAREFUL. There's a possibility that what you're seeing is not what's truly there, and in that case any recent modifications you made to tasks (edits, deletions, marking as completed) might have been applied to the wrong tasks. If you just deleted any tasks, hit the undo button to un-delete them. After undoing any deletions, quit Reminders and reopen it. Hopefully, at that point, things will be correct and you'll be able to proceed as normal. (If you quit Reminders after deleting tasks, there's no way to get the tasks back unless you have a Time Machine backup, and even then I'm not sure how to restore those reminders. It would be a big hassle.)

Increasing daily outdoor temperature for the past 3 days

If you know the mnemonic "ROY G BIV" for the rainbow, then this graph should be pretty easy to read. R is red for today, O is orange for yesterday, Y yellow for the day before that, etc (green blue indigo violet). Just start from today and follow your eyes to the left until you reach the border, then mentally pick the next color, then find where that color starts on the right side of the graph. Repeat.
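The color rule above can be written down as a small lookup. This is only a sketch: the function name color_for is made up, and treating 2018-03-29 (the last date in the graph's caption) as "today" is an assumption.

```python
from datetime import date

# ROY G BIV, starting with today (red) and going back one day per color.
RAINBOW = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]


def color_for(day, today):
    """Return the line color for `day`: today is red, 6 days ago is violet."""
    days_ago = (today - day).days
    if not 0 <= days_ago < len(RAINBOW):
        raise ValueError("day is outside the 7-day window of the graph")
    return RAINBOW[days_ago]


today = date(2018, 3, 29)  # assumed: the last day shown on the graph
print(color_for(date(2018, 3, 28), today))  # orange (yesterday)
print(color_for(date(2018, 3, 23), today))  # violet (six days ago)
```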

My motivation for this post was that the outside temperature was noticeably higher yesterday than on the previous days; it was over 80°F. I had a feeling my graphs would be pretty interesting, and this graph shows that it consistently got warmer each day for the past 3 days.

San Jose, CA outdoor temperature from 2018-03-23 to 2018-03-29.

I haven't seen that on weather sites before! Siri says she can't tell you about past weather, but that should be even easier than trying to predict the future!

Machine Learning, self-explainability and ethics

Jason Yosinski gives a really good video demonstration of the Deep Visualization Toolbox http://yosinski.com/deepvis (near the bottom of the page). This toolbox allows users to understand how their trained image classification deep neural network is thinking under the hood - not just that it recognizes bookshelves, but all the things that go into recognizing a bookshelf, for example, the edges of the bookshelf and the text on the binding of the books.
Ideas on self-explainability

I think it becomes clearer from this video that the individual neurons in different layers learned things implicitly, like edge detection, text detection, and face detection. The last layer of the network is then basically running classification on the output of those intermediate steps to detect the things that are labeled in the image dataset, like animals and objects.

The idea I have regarding explainability is this: instead of relying on implicit feature detection (where you then have to work out what each neuron is doing by experimentation, like in this video), use explicit feature detection. Have the final output (or more likely the second-to-last layer) be labels of many tangible things, like "fur", "cat ears", "dog tail", "human nose", etc., and then run classification on the output of that layer.

It's possible that the network would lose some accuracy, because practitioners' hope with the current approach is that whatever features the network learns implicitly (although less understandable to humans) are more expressive for final classification than explicit intermediate features like "cat ears". But it's also possible that it would keep a similar level of accuracy while being able to explain itself to humans. The difficulty is that labeling the data would take a lot more effort: for every bookshelf image you'd have to label "no cat ears", "no dog ears", etc., and for every cat image you'd have to label "no books", "no book shelves", etc. On top of that, images can contain both cats and bookshelves, cats can be fur-less or tail-less, and bookshelves can be book-less.
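To make the explicit-feature idea concrete, here is a toy sketch of the last step only: classifying from named attribute scores and reporting which attributes drove the decision. Everything in it is hypothetical (the attribute "book spines", the weights, the scores), and it assumes some earlier part of the network has already produced a score for each named attribute.

```python
# Hypothetical weights from named attributes to classes, for illustration only.
CLASS_WEIGHTS = {
    "cat": {"fur": 1.0, "cat ears": 2.0, "book spines": -1.0},
    "bookshelf": {"fur": -1.0, "cat ears": -1.0, "book spines": 2.0},
}


def classify(attribute_scores):
    """Pick the class whose weighted sum over the named attributes is highest."""
    totals = {
        label: sum(w * attribute_scores.get(name, 0.0) for name, w in weights.items())
        for label, weights in CLASS_WEIGHTS.items()
    }
    return max(totals, key=totals.get), totals


def explain(label, attribute_scores):
    """List each attribute's contribution to the label's score, largest first."""
    contributions = {
        name: w * attribute_scores.get(name, 0.0)
        for name, w in CLASS_WEIGHTS[label].items()
    }
    return sorted(contributions.items(), key=lambda item: item[1], reverse=True)


# Assumed attribute scores, as if produced by the explicit second-to-last layer.
scores = {"fur": 0.9, "cat ears": 0.8, "book spines": 0.1}
label, _ = classify(scores)
print(label)                   # cat
print(explain(label, scores))  # "cat ears" contributes the most
```

Because every input to the final step has a human-readable name, the explanation is just the sorted list of contributions.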
Ethics, and thoughts on how to improve

I think that using images of a company's current employees to train a classifier that decides whether a job candidate would be a good employee is extremely unethical, for multiple reasons. The first and most glaring assumption is that current employees are good employees. Another assumption is that there is a correlation between someone's appearance and how good an employee they are. I think that's inherently flawed, and that there are many outliers who would be heavily discriminated against by this model. Another problem (one of the main concerns I hear raised about ethics in machine learning) is that this re-encodes existing human bias directly into the model and now quantifies it more strongly, in a way that is even more difficult to break. Maybe the current and past hiring managers are biased against a particular ethnic group; then this model would be biased against that ethnic group, and it's likely that no one from that group would ever be able to break a barrier that is now encoded into a model.

If people want to use machine learning to augment the hiring process, they will need a vastly different approach than this one. The output should not be "probably a good employee" or "probably not a good employee"; as I said before, it could be something more tangible, like "critical thinker", "team player", etc. - the same things that human interviewers look for in job candidates. (Although I have a vague idea of what the output of that model could look like based on those examples, I am not sure what the input would look like - a written essay, a video or audio interview clip, etc. - and I'm sure all of those have their own problems.)

Additional issues inherent to this approach

Another problem with their model: even assuming that images of current employees are good indicators of good employees, training a model also requires data for the contrasting negative label, i.e. images of people who are not good employees. Where does that data come from, who decides that, and how do they decide it? Even if this model had any chance of working, it would need far more data than images of all the current and past employees to be viable. Gathering and labeling that much data takes many people, and those people must both be able to judge good and bad employees and be trusted to label according to the metrics they are given. If someone in a particular group, an ethnic group or any other kind of group, decides to give a leg up to other people in their group rather than labeling based on good/bad employee status, then the model will behave differently than expected. But if the label-workers do label the data in the way that they are asked, then it starts feeling like the Milgram experiment (Wikipedia).

My Anaconda/Jupyter Setup

This is mainly for myself so that I could rebuild my setup from scratch, but it would be nice if anyone else can benefit from it!

This is incomplete. I have class in 30 minutes. But this should have the majority of the information necessary for me to complete this post. Some of this information or these urls may even be repeated here...

https://www.anaconda.com/download/ - it will automatically detect your operating system

Note: when installing Python packages in Anaconda, always try conda first, e.g. `conda install somepackage`. If that doesn't work, search to see whether the package can be installed with conda from a different channel, e.g. "-c ericmjl" in "conda install -c ericmjl environment_kernels" (from below). Finally, if those fail, resort to "pip install somepackage".

conda create python=3.6 -n env_name
# activate the environment. after activating it, all installations will go in there.
source activate env_name
conda install -c conda-forge jupyter_contrib_nbextensions
# this is automatically installed as a dependency of jupyter_contrib_nbextensions, so there's no need to install it if installing the other one
# conda install -c conda-forge jupyter_nbextensions_configurator

# ipdb (IPython debugger): from conda, on github
conda install -c conda-forge ipdb

https://anaconda.org/conda-forge/ipython-sql
conda install -c conda-forge ipython-sql

to see which environment the notebook is currently running in:
conda install -c ericmjl environment_kernels
https://stackoverflow.com/a/39070588/2821804
which links to http://stuartmumford.uk/blog/jupyter-notebook-and-conda.html

https://github.com/Cadair/jupyter_environment_kernels
https://stackoverflow.com/questions/37085665/in-which-conda-environment-is-jupyter-executing

pip install environment_kernels # can this not be done with "conda install" instead?

Good example of a Jupyter Notebook: https://www.kaggle.com/ash316/novice-to-grandmaster/notebook (which analyzes survey results of people who participate on kaggle.com, where Kaggle is "The Home of Data Science & Machine Learning").

Naming Conda Environments so they can be seen from inside Jupyter Notebooks:
http://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-different-environments
ipykernel should be installed. The URL above explains how to install it.
python -m ipykernel install --user --name myenv --display-name "Python 3.? (myenv)"
If you are already in a Jupyter Notebook in that environment, you can reload the page and go to Jupyter Notebook Menubar -> Kernel -> Change Kernel, where you should be able to see the kernel you just named. (Even if you are already in that kernel/environment, the name won't show up until you reload that kernel.)
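A quick way to double-check which environment a notebook is really using is to ask the running interpreter from inside a cell. This is a minimal sketch using only the standard library; the function name describe_environment is made up.

```python
import sys


def describe_environment():
    """Return the interpreter path and environment root of the running kernel."""
    return {
        "executable": sys.executable,  # path to the Python binary the kernel runs
        "prefix": sys.prefix,          # root directory of the active environment
    }


print(describe_environment())
```

If the kernel belongs to the conda environment you expect, both paths should point inside that environment's directory.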

How I learn efficiently

Have you ever felt like you should be able to remember something you recently learned, but can't? Do you forget most of the information you learned in a class within months after it ends?

Dr. Piotr Wozniak, creator of the SuperMemo software (released by 1995), has spent much of his life experimenting to develop an algorithm that helps people learn the most information in the least amount of time while also retaining that information for longer after learning it (days, weeks, months, or years).

The software works by calculating a "forgetting curve" from your history of correct recall, both across all flash cards and for each specific flash card. It then predicts when you will forget a piece of information and tries to ask you again right before you do.

Asking right before you forget (the algorithm might calculate that the next review is due within days or hours) has been shown to help you retain the information for a longer period of time after reviewing it. Being asked right before or right as you are forgetting creates a small struggle to remember (while you are still actually able to remember), which seems to help strengthen the memory pathways in your brain.
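As a rough illustration of scheduling a review "right before you forget", here is a sketch based on a simple exponential forgetting curve, R = exp(-t / S). This is not SuperMemo's or Anki's actual algorithm; the stability parameter S, the 0.9 target, and the function names are all assumptions made for the example.

```python
import math


def retention(days_elapsed, stability):
    """Estimated probability of recall after `days_elapsed`, using R = exp(-t / S)."""
    return math.exp(-days_elapsed / stability)


def days_until_review(stability, target_retention=0.9):
    """Days until estimated recall decays to `target_retention` (solves R = exp(-t / S) for t)."""
    return -stability * math.log(target_retention)


# A card with stability 10 (assumed) would be due after roughly one day;
# a better-known card with stability 100 would be due after roughly ten days.
print(round(days_until_review(10.0), 2))
print(round(days_until_review(100.0), 2))
```

In this toy model, each successful review would increase the stability, which is what pushes the next review further into the future.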

SuperMemo's website has a fresh look now (2018-01-20) compared to the last time I looked at it a couple of years ago, which could indicate that the software has also improved, but SuperMemo is known for being complicated and Windows-only. I use a more modern program called Anki, which is based on SuperMemo and one of its older, publicly posted algorithms.

I used Anki to get As in my college courses in Calculus, Calculus II, and Linear Algebra. This list would probably be longer if I had known about Anki earlier in my life.

However, flash cards should not be your only source of studying and learning. If you try to memorize information that you would otherwise never use in your life, you may be able to consistently answer those flash cards correctly, but you may not be able to remember that information outside of flash card sessions. It also helps to memorize information that is related to information you already know. Dr. Piotr Wozniak wrote an article with helpful information on rules of learning and flashcard-making called Effective learning: Twenty rules of formulating knowledge.

Gary Wolf wrote an in-depth article on WIRED about Dr. Piotr Wozniak and his development of SuperMemo called Want to Remember Everything You'll Ever Learn? Surrender to This Algorithm.

Anki's documentation has an Introduction section that explains its purpose and differences compared to SuperMemo.

SuperMemo's main website is https://www.supermemo.com/en.

You may also be able to improve your memorization skills by using Harry Lorayne's methods in The Memory Book: The Classic Guide to Improving Your Memory at Work, at School, and at Play (1996). It is easier to remember information that is related to things we already know. For example, if you already know about history and you are presented with a new piece of historical information that relates to your existing knowledge, it will be easier to remember because it fits in and makes sense. Lorayne's methods basically let you create associations between completely unrelated pieces of information using a variety of systems, such as turning numbers into words, and memorizing lists of items by creating ridiculous mental images that tie those items together (for an example and explanation of some memory systems/techniques, see https://www.memory-improvement-tips.com/memory-association.html). While I still remember all of the number-sound associations for turning numbers into words from that book, I have had difficulty forming a habit of using his techniques on a daily basis. This stuff should probably be taught in elementary/middle/high school to help form long-lasting habits that will stick with people for life. He also has newer books that I have not reviewed, which could be even better than the one I listed.

In Anki, I use reversible cards and the type-answer method.

Typing the answer, or writing it on a whiteboard (or with a tablet computer): To get the type-answer method, follow this reddit answer (there's a reddit-anki community). Writing out the answer improves my ability to gauge how well I knew it. If I just think of the answer in my head and then reveal the back of the card, it's easier to fool myself into thinking I knew the answer when really I was wavering between two different answers, or didn't fully produce the answer in my head. By typing the answer or writing it on a whiteboard, I can't fool myself, because my response and the true answer are clearly there for comparison. (The typed answer doesn't have to be exact unless you're memorizing something verbatim, like a poem or a legal definition, where the exact words matter rather than just the ideas.) However, I have found it interesting with certain mathematical definitions that I might remember the concept but forget a detail, for example whether i=1,2,3,...,k or i=1,2,3,...,infinity. At that point, it's up to you to decide based on context whether to penalize yourself for forgetting that detail (for example, you would lose points if you got it wrong on an upcoming midterm) or to let it go (you're just trying to get a good overview of the material).
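For a rough, mechanical version of that comparison, the typed response can be scored against the true answer. This sketch uses Python's standard difflib module; it is not how Anki compares typed answers, and the function name similarity is made up.

```python
from difflib import SequenceMatcher


def similarity(typed, expected):
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    a = typed.strip().lower()
    b = expected.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


# Same idea, one forgotten detail: the score is high but not perfect.
print(similarity("i = 1, 2, 3, ..., k", "i = 1, 2, 3, ..., infinity"))
```

A score like this only flags differences; deciding whether a missed detail matters is still a judgment call based on context.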

Reversible cards: To get reversible cards, follow part 2 of this tutorial. Not all cards make sense to have a reverse. For example, "how many bits are there in the key of the DES encryption algorithm?" should have the answer "64" (counting the 8 parity bits; only 56 of the bits are effective key material), but the reverse shouldn't say "what is 64?", because 64 can be many things besides the number of bits in the DES key, like 2^6, a Beatles song, the name of a magazine, etc. (Wikipedia). However, if you really want a mental cue for that reverse, it could say "what is 64 regarding the DES encryption algorithm?" as long as 64 doesn't appear anywhere else in the DES algorithm (note that DES also uses a 64-bit block size, so the cue may need to be more specific).