NumPy Basics

Posted on 2018-05-15 by violet

NumPy is a Python library for working with arrays efficiently. Under the hood, it can leverage compiled C and Fortran code to make those array operations fast.
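As a rough illustration (not a benchmark), here is the same computation written as a Python list comprehension and as a single vectorized NumPy expression. In the NumPy version, the element-by-element loop runs in compiled code rather than in the Python interpreter. The array size is an arbitrary choice for this sketch.

```python
import numpy as np

n = 1_000_000  # arbitrary size for illustration

# Pure Python: the interpreter executes the loop body once per element.
values = list(range(n))
squares_py = [v * v for v in values]

# NumPy: one vectorized expression; the per-element loop happens in compiled code.
arr = np.arange(n, dtype=np.int64)
squares_np = arr * arr

print(squares_np[:5].tolist())            # [0, 1, 4, 9, 16]
print(squares_np.tolist() == squares_py)  # True
```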

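It's important to note that if you want to use NumPy for a single element, use a 1-element array like np.array([1]) rather than a 0-dimensional array like np.array(1) or a NumPy scalar like np.uint8(1). Arithmetic on a NumPy scalar or a 0-dimensional array returns a NumPy scalar rather than an array, and in edge cases it isn't guaranteed to return a NumPy data type at all, or to behave as you'd expect. The session below was recorded with a 2018-era NumPy 1.x; NumPy 2.0 changed how Python integers combine with NumPy values, so newer versions may behave differently here: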

In [35]: type(np.array(10000) * 1000000000000000000000000000000)
Out[35]: int

In [36]: a = np.array(10000)

In [37]: a *= 1000000000000000000000000000000
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 a *= 1000000000000000000000000000000

TypeError: ufunc 'multiply' output (typecode 'O') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
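To see the difference in return types without relying on huge numbers, here is a short sketch that checks the type of each result. The behavior it shows (arithmetic on a 0-dimensional array or a NumPy scalar gives back a NumPy scalar, while a 1-element array stays an array) should hold in both NumPy 1.x and 2.x. The dtypes are pinned explicitly because default integer types can vary by platform.

```python
import numpy as np

# A 1-element array stays an ndarray after arithmetic.
one_elem = np.array([1], dtype=np.int64)
r1 = one_elem + 1
print(type(r1), r1.shape)   # <class 'numpy.ndarray'> (1,)

# A 0-dimensional array collapses to a NumPy scalar, not an array.
zero_d = np.array(1, dtype=np.int64)
r2 = zero_d + 1
print(type(r2))             # <class 'numpy.int64'>

# Arithmetic on NumPy scalar types such as np.uint8 also yields a scalar.
r3 = np.uint8(1) + np.uint8(1)
print(type(r3))             # <class 'numpy.uint8'>
```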

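It's also important to remember that NumPy integer arrays store each element in a fixed number of bits, so under normal operation you need to handle overflow yourself: a result that doesn't fit in the dtype silently wraps around instead of raising an error. Depending on your application, that wraparound can even be desirable. Here's an example of overflow: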

In [1]: import numpy as np
In [3]: a = np.array([255], dtype=np.uint8)
In [4]: a
Out[4]: array([255], dtype=uint8)
In [5]: a+1
Out[5]: array([0], dtype=uint8)

Instead of 255 + 1 becoming 256, it became 0. 255 is the largest value a uint8 can hold (binary 11111111); adding 1 produces 100000000, which needs 9 bits, so the carry out of the top bit is dropped and the 8 bits that remain are 00000000. In other words, uint8 arithmetic wraps around modulo 256. As for the name uint8: "u" means unsigned (no negative numbers), "int" means integer (no fractional part), and "8" means 8 bits, i.e. 8 binary digits, each of which is either a 0 or a 1.
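To check the wraparound rule on more than one value, here is a small sketch (the values 250, 255 and 10 are arbitrary) that compares NumPy's uint8 results with plain modulo-256 arithmetic. It also shows one common way to avoid overflow when you don't want it: converting to a wider dtype such as uint16 before the addition.

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype.
info = np.iinfo(np.uint8)
print(info.min, info.max)   # 0 255

a = np.array([250, 255], dtype=np.uint8)

# Adding 10 wraps around: each result is (value + 10) mod 256.
wrapped = a + np.uint8(10)
print(wrapped)              # [4 9]

# The same numbers computed with plain Python ints and an explicit modulo.
expected = [(int(v) + 10) % 256 for v in a]
print(expected)             # [4, 9]

# To keep the mathematically exact result, upcast to a wider dtype first.
widened = a.astype(np.uint16) + 10
print(widened)              # [260 265]
```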

Internet Security for Everyone

Posted on 2018-05-14 by violet

Why you should care: These days, almost everyone uses computers or "smart" phones (or internet-connected toys/refrigerators/thermostats). It's difficult to avoid them. But you should know some basic things when using them, so that you don't unwittingly give away your passwords or credit card numbers, or let your computer become a zombie in a botnet.

Security things to remember when using computers:

PostgreSQL short-circuit evaluation

Posted on 2018-05-11 by violet

The purpose of this post is to demonstrate that Postgres follows short-circuit evaluation. That means that if you are checking whether either of two boolean values is true (joining them with "or"), and the first one is true, then you don't need to check the second one, because you already know the whole expression is true. For example:

true or true: true. Because the values are joined with "or" and the first one is "true", we don't need to check the second one.
true or false: true, for the same reason. The expression is still true even though the second value is "false", because the first value is true and they are joined with "or".
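The same idea is easy to observe in Python, whose "or" operator is documented to short-circuit. This is a Python analogue of the truth-table reasoning above, not a test of Postgres itself; `run` and `check` are hypothetical helpers that record which operands actually got evaluated.

```python
def run(first, second):
    """Evaluate `first or second`, recording which operands were evaluated."""
    evaluated = []

    def check(name, value):
        # Log that this operand was evaluated, then return its value.
        evaluated.append(name)
        return value

    result = check("first", first) or check("second", second)
    return result, evaluated

print(run(True, True))    # (True, ['first'])
print(run(True, False))   # (True, ['first'])
print(run(False, True))   # (True, ['first', 'second'])
```

When the first operand is true, the second is never evaluated; only a false first operand forces the second check.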

Git Tricks

Posted on 2018-05-11 by violet

Always remember the difference between "log" and "diff". Log lists commits (optionally showing each commit's code changes with -p/--patch), while diff shows the combined code changes between two points, such as two commits. Log helps you understand how the code changed over time, while diff helps you see the net effect of those changes as a whole.

Learning to learn

The first thing to know about git is how to read its man pages. You can open them from the command line with "man git", "man git-diff", "man git-log", etc. Note that the top-level "git" command (without "diff", "log", etc. after it) has its own man page too; it introduces all the git commands along with git's terminology and syntax.

Git Diff

For example, if you know someone implemented a feature but it took them multiple commits, you can use git log to find the commit just before the first commit that implemented the feature.

Local Article Saver Database System (Like my own Pocket)

Posted on 2018-04-29 by violet

This has been on my to-do list for a long time, but I finally made my own article saver! It's like Pocket!

https://github.com/mica5/article-saver

Database in PostgreSQL.
Page viewer as a Python Falcon server.
Saver/searcher in a Jupyter notebook.

Apple Reminders Tips

Posted on 2018-04-26 by violet

Apple Reminders is a bit buggy. If you have a ton of reminders like I do, it can start exhibiting some strange behavior.

If you see a reminder that's missing its description, BE CAREFUL. What you're seeing may not be what's really there, and in that case any recent changes you made to tasks (edits, deletions, or marking tasks complete) might have been applied to the wrong tasks. If you just deleted tasks, hit the undo button to restore them. After undoing any deletions, quit Reminders and reopen it. Hopefully, at that point, things will be correct and you'll be able to proceed as normal. (If you quit Reminders after deleting tasks, there's no way to get the tasks back unless you have a Time Machine backup, and even then I'm not sure how to restore those reminders; it would be a big hassle.)

Increasing daily outdoor temperature for the past 3 days

Posted on 2018-03-29 by violet

If you know the mnemonic "ROY G BIV" for the colors of the rainbow, then this graph should be pretty easy to read: R is red for today, O is orange for yesterday, Y is yellow for the day before that, and so on (green, blue, indigo, violet). Start from today's line and follow it to the left with your eyes until you reach the border, then mentally pick the next color and find where that color starts on the right side of the graph. Repeat.

My motivation for this post was that the outside temperature was noticeably higher yesterday than on the previous days; it was over 80°F. I had a feeling my graphs would be pretty interesting, and this graph shows that it consistently got warmer each day for the past 3 days.

San Jose, CA outdoor temperature from 2018-03-23 to 2018-03-29.

I haven't seen that on weather sites before! Siri says she can't tell you about past weather, but that should be even easier than predicting the future!

Machine Learning, self-explainability and ethics

Posted on 2018-03-15 by violet

Jason Yosinski gives a really good video demonstration of the Deep Visualization Toolbox http://yosinski.com/deepvis (near the bottom of the page). This toolbox allows users to understand how their trained image classification deep neural network is thinking under the hood - not just that it recognizes bookshelves, but all the things that go into recognizing a bookshelf, for example, the edges of the bookshelf and the text on the binding of the books.

Ideas on self-explainability

I think the video makes it clear that individual neurons in different layers have implicitly learned things like edge detection, text detection, and face detection, and that the last layer of the network is essentially running classification on the output of those intermediate steps to detect the categories labeled in the image dataset, like animals and objects. My idea regarding explainability is this: instead of relying only on implicit feature detection (where you then have to work out what each neuron is doing through experimentation, as in this video), make feature detection explicit by having the final output (or, more likely, the second-to-last layer) consist of labels for many tangible things, like "fur", "cat ears", "dog tail", "human nose", etc., and then run classification on the output of that layer. The network might lose some accuracy, since practitioners' hope with the current approach is that whatever features it learns implicitly (although less understandable to humans) are more expressive for the final classification than explicit intermediate features like "cat ears". But it's also possible that it would keep a similar level of accuracy while also being able to explain itself to humans. The difficulty is that labeling the data would take much more effort: every bookshelf image would need labels like "no cat ears" and "no dog ears", and every cat image would need labels like "no books" and "no bookshelves". On top of that, images can contain both cats and bookshelves, cats can be furless or tailless, and bookshelves can be bookless.
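The explicit-feature idea can be sketched as a two-stage model. This is a toy NumPy illustration under loud assumptions, not a working classifier: the concept names, class names, and feature size are hypothetical, and the weights are random placeholders standing in for what training would learn. Each concept gets an independent sigmoid score (multi-label), since an image can contain both a cat and a bookshelf; the final class is a softmax over only those concept scores, so the concept scores double as the explanation.

```python
import numpy as np

# Hypothetical human-readable concepts for the second-to-last layer.
CONCEPTS = ["fur", "cat ears", "dog tail", "human nose", "books"]
CLASSES = ["cat", "dog", "bookshelf"]

rng = np.random.default_rng(0)
n_features = 16  # hypothetical size of an image feature vector

# Untrained placeholder weights; a real model would learn these from images
# labeled with both concepts (e.g. "no cat ears") and final classes.
W_concept = rng.normal(size=(n_features, len(CONCEPTS)))
W_class = rng.normal(size=(len(CONCEPTS), len(CLASSES)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()

def predict(features):
    # Stage 1: an independent probability per named concept.
    concept_probs = sigmoid(features @ W_concept)
    # Stage 2: the class is computed only from the named concepts.
    class_probs = softmax(concept_probs @ W_class)
    return dict(zip(CONCEPTS, concept_probs)), dict(zip(CLASSES, class_probs))

concepts, classes = predict(rng.normal(size=n_features))
for name, p in concepts.items():
    print(f"{name:>10}: {p:.2f}")
print("predicted class:", max(classes, key=classes.get))
```

Because the final layer only sees the concept scores, any prediction can be traced back to which named concepts were (or weren't) detected.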

Ethics, and thoughts on how to improve

I think that using images of a company's current employees to train a classifier that decides whether a job candidate would be a good employee is extremely unethical, for multiple reasons. The first and most glaring assumption is that current employees are good employees. Another assumption is that there is a correlation between someone's appearance and how good an employee they are; I think that's inherently flawed, and that many outliers would be heavily discriminated against by this model. Another problem (one of the main concerns I hear in discussions of machine learning ethics) is that this re-encodes existing human bias directly into the model and quantifies it more strongly, in a way that is even harder to break. If the current and past hiring managers are biased against a particular ethnic group, then this model would be biased against that group too, and it's likely that no one from that group would ever be able to break a barrier that is now encoded into a model. If people want to use machine learning to augment the hiring process, they will need a vastly different approach. The output should not be "probably a good employee" or "probably not a good employee"; as I said before, it could be something more tangible, like "critical thinker" or "team player", the same things human interviewers look for in job candidates. (Although I have a vague idea of what that model's output could look like based on these examples, I'm not sure what its input would be: a written essay, a video or audio interview or clip, etc. I'm sure each of those has its own problems.)

Additional issues inherent to this approach
Another problem with their model: even assuming that images of their current employees are good indicators of good employees, training a model also requires data that contrasts with the positive label, i.e. negatively labeled examples, which here would be images of people who are not good employees. Where does that data come from, who decides which people count, and how do they decide? Even if this model could work at all, they would need far more data than images of their current and past employees, and gathering that much data would require many people to collect and label it. Those people would need to be able to judge good and bad employees, and to be trusted to label according to the metrics they are given. If someone in a particular group, whether an ethnic group or any other kind of group, decides to give a leg up to other members of their group rather than labeling by good/bad employee status, then the model will behave differently than expected. But if the labelers do label the data the way they are asked to, then it starts to feel like the Milgram experiment (Wikipedia).

Laptop battery plot over time

Posted on 2018-03-13 by violet

By-Day Minimum/Maximum Temperatures/Humidities in my Room Since November 2017

Posted on 2018-02-05 by violet
