Dev Week 18

Posted on 2024-12-20 by violet

Another backdated blog post.

I've found some nirvana: AI + BookStack for a self-writing Knowledge Base.

I have tried wiki software for developing knowledge bases, including MediaWiki (the software behind the Wikipedia family of websites) and Trac (Python-based), but it always fails my use cases. Those tools are excellent for knowledge graphs, which have arbitrarily many connections, but not so great for a general knowledge base. You must manage the indexes and connections yourself, which becomes tedious very quickly.

Enter BookStack (GitHub). The simplest way to use BookStack is to have Books with pages. If you fall in love with it as much as I have, you'll also be using chapters and shelves.

Shelves have books; books have pages and chapters; chapters have pages. This provides four levels of organization.

A book can appear on multiple shelves; a shelf is a lightweight grouping of some books into a logical unit. Your "Network Security" book could appear on both the Network shelf and the Security shelf.

There's a BookStack API integration on GitHub/PyPI, which is just a lightweight wrapper that dynamically loads the OpenAPI spec from your BookStack instance to know what the API looks like (i.e. it doesn't have the API pre-built in; this library could likely be aimed at other APIs without substantial effort).

There was a bug: the URL it constructed to grab the OpenAPI spec ended with an extraneous /, which resulted in a redirect to the URL without the /, and the authentication was not applied to the redirected request. I submitted a PR and hopefully it will get merged in someday.
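
To illustrate the kind of fix involved, here's a minimal sketch of building the spec URL so it never ends in a slash. This is not the library's actual code; the function name spec_url and the api/docs.json path are assumptions for illustration.

```python
def spec_url(base_url: str, spec_path: str = "api/docs.json") -> str:
    """Join a base URL and a spec path without leaving a trailing slash.

    Both the function name and the default path are assumptions for
    illustration; they are not taken from the library.
    """
    # Strip slashes on both sides of the join so exactly one separates them,
    # and none is left dangling at the end of the result.
    return f"{base_url.rstrip('/')}/{spec_path.strip('/')}"


print(spec_url("https://wiki.example.com/"))
```

Normalizing the URL up front means the client never depends on how the server (or the HTTP library) treats credentials across a redirect.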

Using this BookStack API library, I wrote a script (in a Jupyter Notebook) to download the entire hierarchy of content from my BookStack, and a printer that accepts an optional depth. Here's an example snippet:

Systems and Infrastructure
    Gentoo
        📄 Miscellaneous
        📄 Portage Package Management
    Desktop Environments
        📗 Input Configuration
            📄 X11 Input Tools (xev, xinput)
            📄 Kernel Input Tools (evtest, showkey)
            📄 udev Configuration
            📄 libinput Setup
        📗 X11 System
            📄 Window Management with xdotool and X11 Tools
            📄 X Authentication
    Linux
        📗 Application Standards
            📄 XDG Base Directory Specification
            📄 Icon Themes and Resources
            📄 Desktop Entry Files
            📄 Application Autostart
            📄 Runtime Directories
    System Administration
        📄 Sudo Configuration
        📄 Borg and Frontends
        📄 Common CLI Tools Reference
        📄 Mount Information
        📗 System Logging
            📄 Logrotate Configuration
            📄 Rsyslog Setup
            📄 Systemd Journal Management
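
The printer itself can be sketched roughly like this. It is a simplified stand-in rather than my actual notebook code: the Node class and render function are hypothetical names, and the hierarchy is built by hand here instead of being downloaded through the API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Chapters and pages get an icon; shelves and books print as plain names.
ICONS = {"chapter": "📗", "page": "📄"}


@dataclass
class Node:
    kind: str  # "shelf", "book", "chapter", or "page"
    name: str
    children: List["Node"] = field(default_factory=list)


def render(node: Node, max_depth: Optional[int] = None, depth: int = 0) -> List[str]:
    """Return indented lines for node and its descendants, down to max_depth levels below it."""
    icon = ICONS.get(node.kind)
    label = f"{icon} {node.name}" if icon else node.name
    lines = ["    " * depth + label]
    # Descend only while we are above the requested depth limit.
    if max_depth is None or depth < max_depth:
        for child in node.children:
            lines.extend(render(child, max_depth, depth + 1))
    return lines


shelf = Node("shelf", "Systems and Infrastructure", [
    Node("book", "Gentoo", [Node("page", "Miscellaneous")]),
    Node("book", "Desktop Environments", [
        Node("chapter", "X11 System", [Node("page", "X Authentication")]),
    ]),
])
print("\n".join(render(shelf)))
```

Passing max_depth=1 prints just the shelf and its books, which is handy when the chatbot only needs the top-level layout.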

The benefit of this is that I can quickly and easily share my structure with [insert your friendly chatbot's name here] and ask for recommendations on the existing structure and how to restructure based on new content.

As an example, the XDG Base Directory Specification would have originally gone in the Desktop Environments book since it's related to X11, but Claude suggested that since it's a specification and more general to Linux and not tied to desktop environments, it makes more sense to put it in the Linux book. A big part of my development of this whole structure is understanding how the pieces relate to each other. For example, I didn't know whether a Riesling wine is a type of grape or a company (I just looked it up, and it's a variety of grape). Codifying my knowledge into a knowledge base as I learn helps me solidify the hierarchy of knowledge and relationships between concepts.

Sometimes I expect a knowledge base article for a given topic to be pretty straightforward, in which case I just ask the AI, "please write a md kb article about common SSH tunneling commands." (md: markdown format; kb: Knowledge Base.) For concepts I'm less certain about, such as XAuth, I follow a line of questioning until I feel the conversation covers my previous lack of understanding reasonably well, and then ask, "would you please summarize our conversation as a md kb article?"

From there, if it's not clear where it fits in the hierarchy, I can provide some context from the output of my program, such as the snippet included above, removing anything clearly unrelated, and ask for it to propose where the new KB article fits, and to suggest any improvements to the overall structure and organization. Just a few days ago I did a major restructuring, because one of my books was originally "Linux and Gentoo", which was extremely restrictive and made it more difficult to add content for either one. Claude even included some of the 📗 and 📄 emojis in its output along with following the same indentation structure, making it very easy to verify and integrate the suggestions into my BookStack instance.

Overall, I'm elated with this combination. I haven't verified every piece of information in all the articles, and I'm certain there's a lot of information that isn't correct, but I feel like the happiest rat collecting all manner of shiny objects and gathering them back in my tunnel. It gives me a sense of satisfaction with building my own knowledge base of things I care about and am interested in. It also incentivizes me to try to understand things that I previously had no hope of remembering later. Now I have a hierarchy clearly laid out. Sometimes it's more important to know where to find information than it is to know the information itself. This is especially true today in the Information Age (AI Age? AI Era?), when it's impossible to know everything and vital to be able to figure anything out quickly, since that's the majority of my day-to-day life now.

Dev Week 17

Posted on 2024-12-13 by violet

This post is actually being written 2025-01-01, but I'm backdating some posts to make up for the weeks I missed. This post is about OpenAI's recent announcements of its o1 models, which makes it all the more puzzling that I'm backdating something to a time before it became known/relevant to the world.

The typical end-user AI that most people have access to is still pretty dumb, and I wouldn't count on it getting a whole lot smarter. I'm not saying it won't; I'm saying I'm keeping my expectations low. However, I always thought it was a bad idea to feed input directly from the user to an LLM and provide the output directly back to the user. Even with all the attempts at "guardrails", this makes a lot of assumptions of trust in both the user and the AI, and researchers have shown that AIs can be made to spill their secrets or act/speak against their safety directives.

I always felt that there needs to be some smaller, simpler AI watching the output as it comes out of the model to check for basics such as: is the output response actually addressing the input request? Is its reasoning sound? Is it hallucinating? This seems pretty obvious to me, and apparently one of OpenAI's biggest recent breakthroughs was to have their model produce many (hundreds? thousands?) of candidate outputs, and to have another model, trained directly on assessing the output of other models, pick the best (or at least most sound?) among all the responses. My idea would have involved a more programmatic/software approach with more communication overhead, but I'm certain those smart people over there could figure out how to integrate it all into a single model architecture and reduce communication between the components if that's what they're going for.
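
The programmatic version I had in mind looks roughly like the toy sketch below. To be clear, this is only the generate-then-judge idea as I understand it, not OpenAI's actual design; best_of_n, generate, and score are hypothetical names, and the stand-in lambdas exist only to make the sketch runnable.

```python
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: the big model producing one candidate
    score: Callable[[str, str], float],  # hypothetical: the smaller model judging a candidate
    n: int = 8,
) -> str:
    """Sample n candidate answers and return the one the judge scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda candidate: score(prompt, candidate))


# Stand-ins for the two models, only to make the sketch runnable.
canned = iter(["maybe", "4", "four-ish"])
answer = best_of_n(
    "What is 2 + 2?",
    generate=lambda prompt: next(canned),
    score=lambda prompt, candidate: 1.0 if candidate == "4" else 0.0,
    n=3,
)
print(answer)
```

The communication overhead I mentioned is visible here: every candidate makes a round trip between the generator and the judge.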

Apparently their new stuff is blowing everything previous out of the water. Every time a new benchmark gets created with the intention of stumping the models for a while longer, it gets beaten within 1-2 months.

The foreshadowing to OpenAI's release of these models was Sam Altman's earlier unsupported claim that we could be hitting AGI soon. This originally sounded like a way to appease shareholders, but now we know what he was so excited about.

This raises a million possible next conversation points, from politics to capitalism to marginalization to World War 3, but I'll leave it at that and get to my next back-dated post.

Reverse SSHFS

Posted on 2021-08-16 by violet

Mount a local folder on a remote machine.

This method uses ssh -R to open a reverse tunnel, and the remote machine has to have authorization to ssh back into the local machine. If that is not acceptable in your environment, this method is not viable.

First, from the local machine:

ssh remote-machine -R 2222:localhost:22

2222:localhost:22 says "on the remote machine, open port 2222 to point back to 'localhost:22' from my vantage point, which is the ssh port back into the machine I'm ssh'ing from."

Then, on the remote machine within that ssh session:

sshfs -p 2222 localhost:local_folder remote_mount_folder -o uid=$(id -u),gid=$(id -g),allow_other

By default, if you try executing code files inside of the folder, you get a lot of strange "can't open file" errors, even when you try doing so with sudo. The uid=...,gid=...,allow_other options set the permissions such that your user temporarily "owns" the stuff that's mounted, and other users are allowed to access data in the directory as well. This allows you, for example, to run code out of that directory as a less-privileged user.
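
Since the tunnel port has to match in both commands, a tiny helper can generate the pair. This is just a sketch: the function name and its parameters are made up for illustration, and it only builds the command strings shown above rather than running them.

```python
import shlex


def reverse_sshfs_commands(remote_host, local_folder, remote_mount,
                           tunnel_port=2222, local_ssh_port=22):
    """Return (command to run locally, command to run inside the ssh session)."""
    # Run on the local machine: open the reverse tunnel back to our own sshd.
    ssh_cmd = f"ssh {shlex.quote(remote_host)} -R {tunnel_port}:localhost:{local_ssh_port}"
    # Run on the remote machine: mount through the tunnel. $(id -u) and $(id -g)
    # are left unquoted on purpose so the remote shell expands them.
    sshfs_cmd = (
        f"sshfs -p {tunnel_port} localhost:{shlex.quote(local_folder)} "
        f"{shlex.quote(remote_mount)} -o uid=$(id -u),gid=$(id -g),allow_other"
    )
    return ssh_cmd, sshfs_cmd


for command in reverse_sshfs_commands("remote-machine", "local_folder", "remote_mount_folder"):
    print(command)
```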

I find this method very helpful when developing on a local codebase but executing/running on a remote server.

The Average Internet User's Guide to Being Paranoid on the Internet

Posted on 2021-07-16 by violet

This is a rant from 2021-07-16 that will develop into a more polished and thorough blog post and/or series of blog posts.

when it comes to malware, prevention is the best. once you are infected with malware, you don't know how sophisticated the attacker is, and if they are sophisticated, then you almost certainly have a backdoor or a means of persistence (surviving reboots, surviving attempts at removal, etc). if you want to get rid of it, i really don't know where i'd start other than a reputable antivirus software.

have windows defender do a full scan. it can do some amount of detection and removal. i don't know how well (if at all) it handles rootkits. malware with sophistication hides itself from tools, modifying the operating system so that attempts to detect it are difficult or impossible. once you've run a full scan with windows defender, you want to turn your computer off and boot from external media like a flash drive, probably using a linux-based operating system, and then you can scan the drives with that.

with regards to prevention, there are a lot of things i do on a regular basis to prevent getting malware. you want to harden your machine, which means making it less susceptible to attack, and the means is usually just to reduce your attack surface. user-friendly operating systems like windows and macOS try so hard to be user-friendly that they basically have their genitals hanging out in the wind asking to get attacked. they work under the assumption that their software is secure, but there's always new zero days or other previously unknown vulnerabilities, as bugs are written faster than they can be found and squashed. such is the state of the software industry, and we trust much of our lives to software and operating system producers.

in all your email clients (web-based or native), set it to not download/load/display external content. even if not for malware, there's still tracking beacons in the form of single-pixel images that simply make your computer call out to all the peeping toms and say "I'm here! I'm here! come track me!" (by this, i mean advertising companies, whether they be giants like facebook or google, or even more vicious advertising groups).

go through all your operating system settings and turn off "let my device be discovered on the network". turn off bluetooth, wifi, and other wireless technologies. again, having wireless technologies turned on is like having your genitals dangling in the wind, asking to get prodded by anyone who knows how to put kali linux on a flash drive (hint: it's not difficult). i only use wireless technologies on a regular basis on my phone. otherwise, i'm wired for everything else, including my headphones.

macOS has a feature called "power nap", where even when it's asleep it will occasionally turn on briefly to check emails, text messages, etc… turn it off.

on your home router, go through your settings and turn off UPnP (universal plug and play). having that on is one of the absolute best ways to get hacked. i know it's tempting to open things up for gaming or so you can reach your NAS from anywhere in the world. don't do it. if you want to connect to your home network from anywhere in the world, use a VPN. there are VPN routers you can get to achieve this.

i personally would also use ipv4 with NAT, and turn off ipv6. your refrigerator and your alexa don't need their own public IP addresses on the open internet. NAT effectively works like a firewall, not allowing any unexpected traffic in. you're most likely already using NAT. when you access stuff on the internet from your computer, your router opens up a tunnel back in from whoever you're talking to so the communications can get back in to you. otherwise, unsolicited traffic is dropped at the router and never makes it inside your network (unless you have UPnP on or port forwarding enabled on some ports).

in your browsers, install ublock origin. it blocks ads, and a lot of ads are called "malvertising" because attackers can often inject a malicious payload into ads and get them downloaded by users. so, as much as someone might want to "support websites by keeping ads enabled".. you're also letting in attackers 👍️

downloading files.. this one is a little complicated.. but it's still very important to understand if you ever plan on downloading things (most people do). only download things from sources you trust. AND, only download those things from sites that have HTTPS! (S is short for "secure", and HTTPS is "Hypertext Transfer Protocol Secure".) if you visit a website and your browser says "hey, this page isn't safe!" and then you go ahead anyways, and then you proceed to download files, you're asking to get hacked. because someone can MitM ("man in the middle"), modify the downloaded file in-route to your computer, and inject malicious stuff in there to take over your computer.

OKAY, so there is an exception. there are such things as "mirrors", and many of them don't use HTTPS. AS LONG AS YOU DO SOME VERIFICATION, this MAY be safe. see, the original source ~~will~~ [should] give you a hash. hashes are one-way functions that take data of arbitrary length and return a fixed-length string of characters, such as a9b07e070fa2a28976a7d460abb300d1. whenever you download a file from an http source (and preferably also from an https source), you MUST check the file hash! and make sure the hash they provide isn't md5 or sha1, because those have been cracked. attackers can inject malware and still make calculated modifications to the payload to get the hash to match. with stronger hashing algorithms like sha256, sha512, there currently aren't any publicly known ways to crack them.
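
if you have python handy, checking a sha256 hash takes a few lines. this is a minimal sketch using only the standard library; the function names are made up for illustration, and the published hash is whatever the original (trusted) source lists, not the mirror.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large downloads don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """True only if the file's sha256 equals the hash published by the source."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

call matches_published_hash("downloaded_file.iso", "the hash from the original site") and treat anything other than True as a failed download.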

if you get a link from a mirror and aren't given a hash, assume it's malicious. seriously.

and even if you are given a hash, if you don't trust the source, still don't download it.

ahh, checking links before you click. because one click is all it takes to get attacked. and it's not like clicking the link will cause you to obviously and immediately get compromised. it's not like you click the link, a webpage loads, and it says "haha! i got you!" or your screen goes dark or something. if attackers have any level of sophistication, they will hide as best they can. and they will use your machine to send emails, launch DDoS attacks, and possibly to try to infect other machines on your local network (i.e. your computers on your home network aren't even safe from each other! don't leave important data/files on an unauthenticated file server!!)

you can almost always hover over a link (in email clients, web browsers, etc) and see where they go. if you can't, then you should scream expletives as loudly as you can directed at the vendor of the software, and then never use the software again, and also never trust the company, and then publicly defame the company whenever you get a chance.

the most important part of a url is the domain name. unfortunately, lately many companies have sold their souls to advertising companies and bypassed important security guarantees of the internet, and they've basically allowed themselves to be taken over so they can continue to track their users (well, now the site A that decided to do that, with advertising company B, and now B has their hands in A's pants or wallet or wherever they want to put their hands).

okay, back to urls and domain names. in https://www.google.com/q?=something+interesting, the base domain is google.com, and www.google.com is a subdomain of it. i personally own two domains. it's about $16 a year. and there are attackers who buy look-alike domains like paypai instead of paypal, called "typosquatting", in the hopes that they can trick you into clicking links to their site and dropping your credentials to them, or simply downloading a tasty malicious payload to your beloved machines. so be very careful when checking links to make sure that the base domain (google.com in this example) is exactly who you think it is. READ VERY CAREFULLY!
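
the same check can be written down as code. this is a rough sketch with made-up function names: it pulls the hostname out of a url and accepts it only if it is exactly the domain you trust or a subdomain of it.

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the lowercased hostname of a url, or '' if there isn't one."""
    return (urlsplit(url).hostname or "").lower()


def belongs_to(url, trusted_domain):
    """True if the url's host is trusted_domain itself or one of its subdomains."""
    host = hostname_of(url)
    trusted = trusted_domain.lower()
    # The leading dot matters: it keeps "notgoogle.com" from passing as "google.com".
    return host == trusted or host.endswith("." + trusted)


print(belongs_to("https://www.google.com/q?=something+interesting", "google.com"))
print(belongs_to("https://www.paypai.com/login", "paypal.com"))
print(belongs_to("https://paypal.com.evilhacker.com/login", "paypal.com"))
```

the last example is the sneaky one: the url starts with paypal.com, but the base domain is evilhacker.com.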

in addition to that, don't click a link that says https://evilhacker.com/innocent_webpage.html.

AND, even if the blue, underlined link text LOOKS like a full url, STILL HOVER OVER IT AND READ THE URL!! DON'T LET THEM TRICK YOU!! it's so unimaginably easy to make the displayed text the url of a friendly website while the actual url, which may look nearly the same, points to a malicious website.
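
to show how cheap the trick is to pull off (and to catch), here's a small sketch that scans html for links whose visible text is a url on one host while the real href goes to another. the class and function names are made up for illustration, and it only looks at visible text that contains "://".

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditor(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(url):
    return (urlsplit(url).hostname or "").lower()


def suspicious_links(html):
    """Return (visible text, real href) for links whose text names a different host."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    flagged = []
    for href, text in auditor.links:
        shown_host = _host(text) if "://" in text else ""
        if shown_host and shown_host != _host(href):
            flagged.append((text, href))
    return flagged


page = '<a href="https://evilhacker.com/innocent_webpage.html">https://www.paypal.com/login</a>'
print(suspicious_links(page))
```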

Information Representation, Understanding, Mnemonics, Information Compression

Posted on 2018-10-07 by violet

My brother got me these little notebooks - FIELD NOTES (website). Inside the package with the notebooks came a tiny promotional/marketing slip. It says something that resonates with me - "I'm not writing it down to remember it later, I'm writing it down to remember it now." I kept the tiny promotional slip due to that quote, because it seems to follow a central theme in the way human brains work and a big part of the reason I have a blog: you remember things better when you write (or draw) them down.

Here are some thoughts on the subject.

What writing is:

Internal representation: If you can encode your thoughts into either words or drawings, you know your thoughts better than if you can't
An encoding of your understanding/beliefs/knowledge at a given moment. It's an encoding because it depends on your knowledge - someone who can't read the language you're writing in can't understand it, and you may be writing things that even people who can read the language you write can't read. If you get amnesia, you might not even be able to understand your own notes.

Representations and other ideas:

Graphs: your brain can attach related ideas like in graphs (graphs as in nodes and edges, not as in graph paper), even in simple graphs like trees or DAGs. The more you already know about a topic, the easier it is to learn something new about it, because it fits as a new node on the graph, connected to one or more other nodes - it might be the case that the more nodes that new node can connect to, the easier it will be to remember it. For example, it's easier to remember that Sundar Pichai is the CEO of Google than it is to remember the name of a CEO you've never heard of belonging to a company you've never heard of.
Compression: the movie Alien was pitched as "Jaws in space" - it shares a common structure with something that many people are already familiar with. The person who pitched the movie didn't have to go into a long explanation about the movie. They got the whole idea across in 3 words by using previous knowledge.
Mnemonics: it's easy to remember a simple sentence like "every good boy does fine" to remember that the musical staff goes "EGBDF". The Memory Book by Harry Lorayne and Jerry Lucas is designed to help you mnemonic-ize every bit of information by exploiting structure in information or converting it into a form that allows you to relate it to something you already know. Here is an example from their book, outlining the method of remembering names and faces:
- "Most of us recognize faces (did you ever hear anyone say, 'Oh, I know your name, but I don't recognize your face'?). It's the names we have trouble with. Since we do usually recognize faces, the thing to do is apply a system wherein the face tells us the name. That is basically what our system accomplishes, if it is applied correctly."
- The method first converts the name into something memorable. "...many names that already have meaning...immediately create pictures in your mind." (And there is a method of converting unmemorable names into memorable phrases)
- Then find something memorable about their face
- Then attach the two together, using methods first introduced earlier in the book.
Structure
- We have schedule books so we can see how pieces of information relate to each other in time
  - In police television shows when they construct a timeline, they're taking information from different formats (e.g. free-form text) and re-organizing those pieces of information as they relate to each other through time. If the information were in a SQL database, then it would be as simple as "...WHERE date IS NOT NULL ... ORDER BY date"
- When we first meet someone and add them to our address book (or digital equivalent), we parse out their information into first name, last name, phone, email, etc. Our contact lists aren't typically filled with unstructured text biographies with phone numbers and email addresses scattered around. Having the structure allows us to quickly look up specific pieces of information later - when you're providing someone's phone number to someone else, you don't scan through an entire biography to find their phone number: it's right there in the "phone number" field.
- When pieces of related information are introduced by ignoring the common structure of those pieces and instead diving directly into the first one, it's more difficult to assimilate that knowledge than if the "meta" information was introduced first and the different pieces were compared/contrasted so that the information could be compressed and remembered more easily. An example was when I was taking statistics - we learned about the Bernoulli Distribution before really learning what a probability distribution was. I had difficulty remembering specific details about the distributions, like their formulas and applications, until I created a tabulation of the distributions and saw their applications and formulas side-by-side and saw how they were similar and different.
Even as I write this, I'm structuring the information as a tree - each main bullet point splits into bullet points that are related to their parent bullet, and the bullet points on the same level are related to each other in some way.
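
The timeline example from the Structure bullets can be sketched in a few lines of Python, mirroring the "WHERE date IS NOT NULL ... ORDER BY date" query. The Fact class and the sample facts are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Fact:
    description: str
    when: Optional[date] = None  # None when the source text gave no date


def timeline(facts: List[Fact]) -> List[Fact]:
    """Keep only dated facts and order them by time."""
    dated = [fact for fact in facts if fact.when is not None]  # WHERE date IS NOT NULL
    return sorted(dated, key=lambda fact: fact.when)           # ORDER BY date


facts = [
    Fact("witness saw a car leave", date(2018, 10, 7)),
    Fact("suspect owns a red car"),
    Fact("car was rented", date(2018, 10, 5)),
]
for fact in timeline(facts):
    print(fact.when, fact.description)
```

Once the information has a date field, the re-organization is a one-line sort; the hard part is parsing the free-form text into the structure in the first place.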

An interesting result: I can read articles about topics I'm familiar with more quickly than I can read articles about topics I'm not familiar with. For the former, I can often skim through as background knowledge I already know is repeated, but for the latter, I have to look things up as I reach things I'm not aware of.

Things I haven't figured out how to keep track of in an efficient format yet:

The things that are going on in the lives of people I care about. People usually have multiple threads going on in their lives, e.g. their relationships and their health. If I just write down some notes after speaking with someone and record a timestamp, then each of the person's life threads will be jumbled together with the other information in those notes. Without modifying the notes, to go back and read to understand an individual thread, it will be necessary to read unrelated notes about other threads. If the information were broken out into tabular form, then each note could be broken out into sub-notes that would each contain a field that indicates the relevant life thread. However, maybe there's a better way to structure that information.
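
The tabular idea could look something like the sketch below. It's only one possible shape: the SubNote fields, the function name, and the sample notes are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class SubNote:
    person: str
    thread: str  # the life thread this sub-note belongs to, e.g. "health"
    when: date
    text: str


def threads_for(notes: List[SubNote], person: str) -> Dict[str, List[SubNote]]:
    """Group one person's sub-notes by life thread, oldest first within each thread."""
    threads = defaultdict(list)
    for note in sorted(notes, key=lambda n: n.when):
        if note.person == person:
            threads[note.thread].append(note)
    return dict(threads)


notes = [
    SubNote("Sam", "health", date(2018, 9, 20), "knee surgery scheduled"),
    SubNote("Sam", "work", date(2018, 9, 20), "interviewing for a new job"),
    SubNote("Sam", "health", date(2018, 9, 1), "knee still hurting"),
]
for note in threads_for(notes, "Sam")["health"]:
    print(note.when, note.text)
```

Reading one thread no longer requires wading through the others, at the cost of splitting each conversation into sub-notes when I write it down.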

What I like about computer programs compared to human language text:

A computer program is a set of steps to follow, encoded in a programming language. Programmers have encoded their thoughts about how to convert inputs to outputs into a format that a computer can read and execute.
If the output isn't as you expect, then you know you didn't encode your thoughts properly into that programming language. You can provide an input, run the program, and review the output to see if it's what you expected. If it's not what you expected, then you can review the steps of the program to find what went wrong. That's the purpose of unit testing - create a well-defined set of inputs and outputs and make sure that the program matches your expectation.
Reading free-form human-produced text takes more effort, because you have to have an understanding of the world and the state of that world before the text occurred, and then mentally execute the text-based "program" and mentally modify your internal representation of the state, or produce new states (e.g. when first being introduced to a branch of mathematics you've never been exposed to before), as the text progresses. Sentences that contain too much information are difficult to process because we may need to read the sentence multiple times or stop in the middle of a sentence in order to mentally break it down into units of understanding. Programs define the variables at the beginning, whereas human-written text doesn't usually introduce the world before it walks through the steps. If you write for people like you write for computers, then it may be easier for humans to follow your writing. (Although I recognize this blog post isn't exceptionally well-organized - I just had a bunch of ideas and wanted to get them out before I forgot.)
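
As a small illustration of the unit testing point, here is a sketch using Python's built-in unittest module. The function under test is a made-up example; the point is the well-defined list of inputs and expected outputs.

```python
import unittest


def celsius_to_fahrenheit(celsius):
    """The "encoded thought" under test: how to convert an input to an output."""
    return celsius * 9 / 5 + 32


class CelsiusToFahrenheitTest(unittest.TestCase):
    def test_known_pairs(self):
        # A well-defined set of inputs and the outputs I expect for them.
        for celsius, fahrenheit in [(0, 32), (100, 212), (-40, -40)]:
            with self.subTest(celsius=celsius):
                self.assertEqual(celsius_to_fahrenheit(celsius), fahrenheit)
```

Saved in a file, this can be run with "python -m unittest"; a failure points at the exact input where the program and my expectation disagree.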

SJSU CS157A DBMS1 Dr. Lin 2018-08-28T

Posted on 2018-08-29 by violet

Go to the DB1 project page: http://xanadu.cs.sjsu.edu/~drtylin/classes/cs157A/Project/DB1/
Download and run the create_FIRST1a.rtf file http://xanadu.cs.sjsu.edu/~drtylin/classes/cs157A/Project/DB1/create_FIRST1a.rtf on your DB.
Then download and run "populate_FIRST1a .rtf" http://xanadu.cs.sjsu.edu/~drtylin/classes/cs157A/Project/DB1/populate_FIRST1a%20.rtf, the one that has a space in the name, because the one without a space in the name has "FIRST1a_ID_SEQ.nextval" in the queries, which the table creation file "create_FIRST1a.rtf" doesn't create.

Then execute these commands in turn and review the results:

select sum(salary) from first1a;
-- remember, this poor guy got cancer from the company's parts he was handling
delete from first1a where snum="s3" and pnum="p2";
select sum(salary) from first1a;

Dr. Lin said that this employee wanted time off but should still be paid. I think the point he was trying to make is that, because of the poor schema design, removing the employee from the list of employees who are available to continue working also removed him from sum(salary), so now we don't know how much the company is paying its employees: the database design can't properly represent that.

However, I believe Dr. Lin introduced a mistake when he used the aggregate function "sum(salary)", because each employee's salary gets counted once for each part that employee handles. What we really want to do is see each employee's salary once, and then sum up the results.

See each employee's salary once:

select distinct snum,salary
    from first1a;
+------+--------+
| snum | salary |
+------+--------+
| S1   |  40000 |
| S2   |  30000 |
| S3   |  30000 |
| S4   |  40000 |
| S7   |  60000 |
+------+--------+

Then, from that table, select sum(salary):

select sum(salary)
    from (
        -- note that this is the query from above
        select distinct snum,salary
            from first1a
    ) table_alias_1;
+-------------+
| sum(salary) |
+-------------+
|      200000 |
+-------------+

What this shows is that you can not only select from tables in the database, but you can also select from the results of a select statement, which is called a subquery. Note that "table_alias_1" is just a necessary part of the MariaDB syntax - without naming it that, an error would appear: "ERROR 1248 (42000): Every derived table must have its own alias".

Note that the ") table_alias_1" could have just as easily read ") AS table_alias_1". "AS" is optional. However, table_alias_1 is not a very good name for the subquery. Sometimes it's difficult to think of a good name for a subquery so we end up naming them things like "step1", "step2", etc., but in this case, an apt name could be "employee_salaries" or "distinct_employee_salaries".
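
To see the double counting without the class database, here is a small self-contained sketch. It makes two assumptions: it uses Python's built-in sqlite3 module as a stand-in for MariaDB, and the part assignments (pnum values) are made up. Only the snum and salary values come from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table first1a (snum text, pnum text, salary integer)")
# Salaries match the table above; the part assignments are hypothetical.
rows = [
    ("S1", "P1", 40000),
    ("S1", "P2", 40000),  # S1 handles two parts, so the salary appears twice
    ("S2", "P1", 30000),
    ("S3", "P2", 30000),
    ("S4", "P4", 40000),
    ("S7", "P5", 60000),
]
conn.executemany("insert into first1a values (?, ?, ?)", rows)

# Naive aggregate: counts a salary once per part the employee handles.
naive_total = conn.execute("select sum(salary) from first1a").fetchone()[0]

# Subquery: see each employee's salary once, then sum.
distinct_total = conn.execute(
    """
    select sum(salary)
        from (
            select distinct snum, salary
                from first1a
        ) as distinct_employee_salaries
    """
).fetchone()[0]

print(naive_total)     # 240000 with these made-up rows
print(distinct_total)  # 200000
conn.close()
```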

Connecting to MariaDB (especially in Python)

Posted on 2018-08-29 by violet

The Password File

First, set up your .my.cnf file: https://dev.mysql.com/doc/refman/8.0/en/option-files.html
Here is an example:

[client]
host=localhost
port=3306
user=mica
password='your user db password here'
database=db1

This file allows you to connect to the DB more easily. With it, all you have to type (assuming your db is on the same machine you're running commands from) is "mysql db1". Without it, you would have to type "mysql -u mica db1 -p" and then type your password at the prompt every time you want to connect. (Note that you should never type your password in plaintext on the command line using "-pmypassword" or "--password=mypassword", because other processes on the computer can see what you typed in and steal your password. Use the .my.cnf file or use the "-p|--password" without an argument and type it in at the prompt where your password is hidden from view.)

Connecting to MariaDB in Python

Original webpage: https://mariadb.com/resources/blog/how-connect-python-programs-mariadb
MySQL-python client: https://pypi.org/project/MySQL-python/

If you have anaconda (see my post Jupyter Notebook Introduction and Basic Installation for installing anaconda):

# this creates an environment called "mariadb" that installs a library called "mysqlclient" using the channel "bioconda".
conda create -n mariadb -c bioconda mysqlclient

Otherwise, create a virtualenv by following this tutorial: https://packaging.python.org/guides/installing-using-pip-and-virtualenv/

# note that "~" is synonymous with your home folder, e.g. /Users/mica in mac or /home/mica in linux.
# on windows replace "~" with "YOUR_HOME_FOLDER", wherever your home folder is.

# create the environment.
virtualenv ~/python3

# activate the environment
source ~/python3/bin/activate

# confirm that the environment is active.
which pip
# should return something like ~/python3/bin/pip

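# install the client library. note that MySQL-python (from the link above) only supports python 2;
# in a python 3 environment, install its fork "mysqlclient", which provides the same MySQLdb module:
pip install mysqlclient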

Finally, some python code to confirm that the installation is working:
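
A minimal sketch of such a check, assuming the MySQLdb module (which both mysqlclient and MySQL-python provide) and the ~/.my.cnf file from above. The function names are made up for illustration, and I'm relying on the client library reading the [client] section of the option file.

```python
import os


def server_version(connection):
    """Run a trivial query to prove the connection works; returns the server version."""
    cursor = connection.cursor()
    try:
        cursor.execute("select version()")
        (version,) = cursor.fetchone()
        return version
    finally:
        cursor.close()


def check_connection(option_file="~/.my.cnf"):
    """Connect using the option file, print the server version, and disconnect."""
    import MySQLdb  # imported here so the helper above can be used without the driver

    # expanduser turns "~" into the real home folder before the path is handed over.
    connection = MySQLdb.connect(read_default_file=os.path.expanduser(option_file))
    try:
        print("connected to", server_version(connection))
    finally:
        connection.close()
```

Calling check_connection() from a notebook cell or a python prompt should print the server version if the installation and the option file are both working.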

Additional Notes

This demonstrates proper use of query parameters. It properly formats the data types for the query (e.g. dates/integers/strings/etc.), and is also the proper use in production code for preventing injection attacks:

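cursor.execute(
    'select * from companies where SNUM=%(snum)s',
    # pass the parameters positionally: the keyword name differs between drivers
    # (e.g. "args" in MySQLdb, "params" in mysql.connector)
    {
        'snum': 'S1',
    },
)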

Introduction to the Command Line

Posted on 2018-06-29 by violet

My hope for this post is to lower the barrier to entry for using the shell for people who are otherwise overwhelmed by not knowing where to begin. I will outline a series of commands that will allow you to become comfortable with the basics of the command line on unix/linux/Mac and learn enough to be able to explore and learn more about it on your own (think of this as your launchpad).

Study Progress

Posted on 2018-06-12 by violet

Abbreviations:
ISLR: Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, ISBN 978-1-4614-7137-0
Hands-on: Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron, ISBN 978-1-491-96229-9

2018-08-08
Created my first database replication server, a Postgres replication server from server to laptop, following the book "PostgreSQL Replication" by Hans-Jürgen Schönig and Zoltan Böszörmenyi.

2018-06-27
Wrote an API using python, falcon, and psycopg2 for storing data from garage sensors into a postgresql database.

2018-06-26T
Built an LSTM using Keras on TensorFlow for predicting the number of parking spaces available in each parking garage.

2018-06-24U
On parking garage data, added circular statistics to hour of the day and day of the year, and increased accuracy on a generalized linear model by 1 percentage point, from 77% to 78%, using h2o Flow. Started learning the basics of TensorFlow, computational graphs, constants/variables/placeholders, sessions.
Reading Hands-on, chapters 2 (data processing, scikit-learn pipelines), 9 (intro to TensorFlow), and 14 (RNNs and LSTMs).
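One common way to make a time feature circular is to map it onto a circle with sine and cosine, so that hour 23 ends up next to hour 0. The sketch below shows that encoding as an illustration; the function name is made up, and this is an assumption about the technique, not the exact code used on the garage data.

```python
import math


def cyclical_features(value, period):
    """Map a periodic value (e.g. hour 0-23 with period 24) to a point on the unit circle."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def circular_distance(a, b, period):
    """Straight-line distance between two encoded values; small means close in time."""
    return math.dist(cyclical_features(a, period), cyclical_features(b, period))


# 23:00 and 00:00 are neighbors on the circle, unlike in the raw 0-23 encoding.
print(circular_distance(23, 0, 24) < circular_distance(0, 12, 24))
```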

2018-06-19T
SpotMe, viewing and cleaning parking garage data so that it can be used to train a machine learning model. Learned about a tool called h2o.ai.

2018-06-13W
Started working for SJSU student-led startup SpotMe Solutions.

2018-06-12T
ISLR Chapter 4 Logistic Regression (page 136).
Hands-on Chapter 2 End-to-End Machine Learning Project (page 49).

Potential resources:
Things on my links page (https://violeteldridge.com/links/).
https://www.deeplearning.ai/ (Deep Learning Coursera Specialization by Andrew Ng).

San Jose Parking Garage Data

Posted on 2018-05-26 by violet
