Projects:

BZAVN

A short script to essentially shred various kinds of files that I need to mangle. The inspiration for it came from a super-secret thing I was trying to do, where I needed a file to be recognised correctly by a certain program but needed the actual content of the file to be completely junked beyond recognition -- to sneakily get past the check.
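The script itself isn't reproduced here, but the core trick amounts to something like the sketch below -- leave enough of the header intact for format detection to succeed and junk everything after it. The HEADER_BYTES constant and the command-line handling are assumptions for illustration, not the real program.

```python
# Hypothetical sketch of the idea: keep the first HEADER_BYTES of the file so
# the target program still recognises the format, then junk everything else.
import os
import sys

HEADER_BYTES = 512  # assumption: how much of the file the target program sniffs

def mangle(path: str, header_bytes: int = HEADER_BYTES) -> None:
    size = os.path.getsize(path)
    if size <= header_bytes:
        return  # nothing to junk beyond the header
    with open(path, "r+b") as f:
        f.seek(header_bytes)
        # Overwrite the body with random bytes so the content is unrecoverable.
        f.write(os.urandom(size - header_bytes))

if __name__ == "__main__":
    for p in sys.argv[1:]:
        mangle(p)
```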

This program never wound up being the best program ever made, but it was interesting enough to be included here with my other projects.

Model

The 3D model I made for this project is, for now, unfortunately much cooler than the project itself. To live up to the immensely cool three claws tearing through a sheet of A4 paper, I have decided to make this program even more useful and modern. And maybe even make it using a language other than Python, if I'm feeling explorative that day.

The other projects I have are far more involved than this one!

Coral Death

In this project I wanted to task myself with sharpening up my machine learning expertise using publicly available coral bleaching data. It wasn't hard to find a scientific site that had the CSVs available for download. I had never used Flask, nor had I ever used machine learning for anything real before, so it was quite the exercise.

ML Pipeline

I had decided to use the commonly used Anaconda, and after doing basic ETL with Pandas I managed to make a very coherent and well-tidied CSV file for ingestion into my ML model. Since this was such a rudimentary exercise, I wasn't too experienced at producing a very coherent observational trend of how bleached the coral ecosystem was going to be come 20xx. This was compounded by the fact that much of the data from the 1960s-1980s was quite incomplete, as perhaps not many people thought it prudent to check the different coral reefs for bleaching. It was certainly not as severe back then.
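A minimal sketch of that sort of pipeline is below, assuming hypothetical column names rather than the real dataset's schema, and with a plain linear trend standing in for whatever model you'd actually reach for:

```python
# A minimal sketch of the ETL-then-trend idea; the column names ("YEAR",
# "PERCENT_BLEACHED") and file names are assumptions, not the real schema.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("bleaching.csv")

# Basic ETL: keep only the fields we need and drop incomplete rows
# (many of the 1960s-1980s records are missing values).
df = df[["YEAR", "PERCENT_BLEACHED"]].dropna()

# Aggregate to one observation per year, then fit a simple trend.
yearly = df.groupby("YEAR", as_index=False)["PERCENT_BLEACHED"].mean()
model = LinearRegression().fit(yearly[["YEAR"]], yearly["PERCENT_BLEACHED"])

# Extrapolate the trend out to some future year and save a tidy CSV.
future = pd.DataFrame({"YEAR": range(2025, 2051)})
future["PREDICTED_BLEACHING"] = model.predict(future[["YEAR"]])
future.to_csv("tidy_predictions.csv", index=False)
```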

Flask

After I had my data, I thought about how to plot it. A few columns in the CSV were dedicated to the latitude and longitude of each coral reef checked for bleaching, so I had the idea of using Plotly to map it on a globe with a slider to show the future 'predicted' progression of bleaching.
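In Plotly Express terms that boils down to something like the sketch below, where the file and column names are placeholders rather than the project's real ones: an orthographic projection gives the globe, and the animation frame gives the slider.

```python
# A rough sketch of the globe-with-slider idea; column names are assumptions.
import pandas as pd
import plotly.express as px

df = pd.read_csv("tidy_predictions_by_reef.csv")  # hypothetical: one row per reef per year

fig = px.scatter_geo(
    df,
    lat="LATITUDE",
    lon="LONGITUDE",
    color="PREDICTED_BLEACHING",   # severity drives the colour scale
    animation_frame="YEAR",        # the slider stepping through the years
    projection="orthographic",     # renders the map as a globe
)
fig.show()
```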

I then had a dedicated page to show my exact process of how I came to the conclusion I did with the data I had found. It outlines the entire process from beginning to end, even including the Pandas commands run. It was quite detailed.

The link below is the repository for the project, as the site itself no longer exists. Perhaps if you clone it and manage to get it to run, you can see the page with the more detailed outline -- "Coral-Death/pages/process.py" specifically, where there is a large chunk of markdown describing my process alongside the image assets.

J-Comm

ジェイ・コム

The J-Comm ecosystem didn't begin in earnest until the late 2010s, but its inception was in 2012. I decided it was high time to put myself on the world wide web outside of the rapidly toxifying social media environment -- my HTML Myspace in 2005 was not going to cut it. The first iteration of my website was blootron.com, a small static site made by me just writing HTML and CSS, with Bootstrap as a simple enhancer. I felt very proud of that fledgling site. Over many iterations afterwards I would experiment artistically with what it should look like, but I always avoided JavaScript, as I thought it really had no place on any website I created. So I got really good at writing raw HTML and CSS, a skill that's 'haunted' me to this day.

I also decided not to leave my JS-less brethren behind and ensured that each site in my constellation of subdomains has an HTML-only equivalent. I switched over from blootron.com when I discovered they had made new TLDs for different languages; my name was completely free, and there were so many domains for the taking. But as for my email I stuck with blootron, as emails from a [email protected] look very much like spam, do they not? All the 3D models, textures, and even most of the GIFs were made by me in Blender and GIMP.

jason.コム

My first foray into the weeds of complexity began when I came across the framework Three.js. I decided that I should have a website that demonstrates something beautiful and unique, and even though that would require JS, I would do my best to keep it consonant with my values by keeping everything tidy and simple, bending whatever rules I could to save whatever bytes I could. In the end I had a very interesting website that I really fell in love with for its harkening back to the good ole' days of PS1 domination.

etch.jason.コム

A simple blog site made with a custom bash script modified from bashblog. I wound up pulling patches into my own fork from people trying to submit merges to mainline; they were good changes and they helped calm some of the headaches I was running into. I wanted something absolutely future-proof, and what's more future-proof than bash? Before that I had experimented with having a full-blown back-end, written in Golang, and a Django front-end. It worked marvellously, but with inflation and the increasing costs of everything under the sun I opted to end the little experiment and go for a straightforward compile-on-my-machine, then serve the resulting HTML, kind of subdomain.

info.jason.コム

I made the hard decision to build a custom HTML compiler using bash alone, to output tables of information I needed to have public or at the very least easily accessible. The process was very, very fun; around this time I began to use my favourite text editor, Acme, in earnest, and it made the process easier, if you can believe it. Who needs syntax highlighting, really? It turns out I work much better without it.
I started with a small script that produced a single-page output of books I have read and want to read, grabbed from a CSV file. It eventually morphed into a generic, all-purpose table compiler where each table is on a different page and the main index is its own page with links directing the user.
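The real compiler is a bash script, but its shape is roughly this Python illustration, with made-up file and directory names: each CSV becomes one HTML page holding one table, and an index page links them all together.

```python
# Illustration only (the real thing is bash): compile every CSV in tables/
# into its own HTML table page, plus an index page of links.
import csv
import glob
import html
from pathlib import Path

Path("out").mkdir(exist_ok=True)
pages = []

for csv_path in sorted(glob.glob("tables/*.csv")):
    name = Path(csv_path).stem
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    # Escape each cell and wrap the rows in table markup.
    cells = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    Path(f"out/{name}.html").write_text(f"<table>\n{cells}\n</table>\n")
    pages.append(name)

# Main index page linking to every generated table.
links = "\n".join(f'<a href="{p}.html">{p}</a><br>' for p in pages)
Path("out/index.html").write_text(links + "\n")
```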

hjkl.jason.コム

This site, the one you're reading right now, started as an inkling of an idea: what if I made a site that showcased my projects in a stylish way and presented information that's interesting to look at? I thought of doing the back-end fiasco that was the prototype version of etch, but I learned my lesson about cost-saving the first time. I decided to take the plunge and do a full-blown React site with react-three-fiber. Is this the simplest site? No, but I think it describes my commitment to design and the marriage between art and technology very well.
I used Papaparse for reading the CSV files behind the big table of projects I added; the best part of the site is that whenever I want to add a new project it's pretty trivial, thankfully.
One of the hardest things I had to do for this site was manipulating the vertex and fragment shaders to get an intriguing result for the 'hologram' thing in the towers on the right side of the screen (if you're looking at this on desktop). I had a lot of fun trying to figure out the trigonometry needed to make the result look the way it does.

I also made a simple HTML compiler for this site to gather all the text on one page for the JS-less among us.
This site riffed a lot on both Ace Combat and R4; it wasn't easy to try to emulate their styles, but the Y2K aesthetic was just too enticing not to go for.

Errata

In the future I'll probably redo the main site to use react-three-fiber, as it made everything so much easier when working with Three.js.
hjkl is in honour of Moolenaar and Vim, my first command-line text-editing love.

Pasadena Litter Club

This started as a project to clean up one of my favourite parks in Pasadena (Cali), Eaton Sunnyslope, which sits across from another park; both are flanked by convenient bus stops. Each tree in this park is named after a mayor of Pasadena or something like that, and it's beyond beautiful in the evening, but I digress.
The idea came to me because each time I stopped by the park I saw a never-ending pile of trash, no matter how many times the city had cleaned it since my last visit. I thought it would be interesting to graph the trash in the area: the types of trash, and where the trash was, on a Plotly map perhaps. I searched for something that could easily export to a CSV and found an app called Litterati; basically, it works as a way to classify data for the company while you clean up your surroundings.

Choosing Litterati

After a month and a half or so of committing to this experiment, I decided to finish at about 1000 pieces of litter. When I exported the CSV it came out quite malformed; I was pretty surprised, as it made no sense to have the data so disjointed for no reason. Anyway, I cleaned it up with Pandas and a Jupyter notebook (Anaconda, I think?). It made me quite upset that this company had such poor output when there was no reason for it; they should really refine their process. Then it was on to the visualisation.
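The tidy-up itself was nothing fancier than something like this sketch; the column names here are assumptions, and the real Litterati export's fields may differ.

```python
# A small sketch of the clean-up, not the actual notebook; the export's real
# column names ("lat", "lon", "tags") are assumed for illustration.
import pandas as pd

df = pd.read_csv("litterati_export.csv")

# Normalise column names and coerce the coordinate fields into usable numbers.
df.columns = [c.strip().lower() for c in df.columns]
df["lat"] = pd.to_numeric(df["lat"], errors="coerce")
df["lon"] = pd.to_numeric(df["lon"], errors="coerce")

# Drop duplicated uploads and rows missing coordinates or a litter tag.
df = df.drop_duplicates().dropna(subset=["lat", "lon", "tags"])
df.to_csv("litter_clean.csv", index=False)
```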

Flask

I decided on Flask early on, as this project came right after the Coral Death project, where I had just worked through some cool machine learning shenanigans. I simply copied the project bones over to this one and rewrote the front-end to be more in line with what I wanted it to look like.

Then I took my Plotly code and did the unthinkable: I used an external request to download the Plotly JavaScript onto the user's computer -- something I don't like to do often, as I like to stay far removed from external dependencies -- and used some basic visualisations to sorta, kinda see where things were heading in the park (treemaps and the like).
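The treemap side of it boils down to something like this sketch; again, the file and column names are assumptions rather than the real export's.

```python
# Roughly the kind of view described above (a sketch; column names assumed).
import pandas as pd
import plotly.express as px

df = pd.read_csv("litter_clean.csv")

# Count pieces of litter per tag and show the proportions as a treemap.
counts = df["tags"].value_counts().reset_index()
counts.columns = ["tag", "pieces"]

fig = px.treemap(counts, path=["tag"], values="pieces")
fig.show()
```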

The main offending litter of the project was Swisher Sweets, which I had never heard of before this. It's some sort of sickly cigarillo or something. Very disgusting, and it seems whoever smokes them has no concept of throwing away their own trash; where's the civic responsibility?

Government Action

After I made the Flask app to my liking, I decided on Heroku to host it. I used their free dyno, back when they still provided that, during the ZIRP period of our economic history...
It ran well and was relatively fast once the dyno started up. I was getting a decent number of hits a day.

After a few weeks had passed I went back to have In-N-Out at the park as usual, and it seems someone had noticed something, because there were now at least three more permanent trash cans placed in the corners of the park. And guess what? The amount of trash -- at least from my own observations -- seemed to decrease dramatically. Mission accomplished!

Russian Arms Trading

Российская Торговля Оружием

This project was in cooperation with the C4ADS team in Washington. The premise of this experiment was to see if machine learning -- before the useless AI hype cycle -- could be applied to data to extrapolate and flag shipments coming out of Russia that were almost certainly illicit arms trades. When we first started working, our team -- five people -- was excited to be working on such big data. It was a couple of dozen GBs if I remember correctly, all hosted in S3 buckets for easy retrieval and manipulation. Their acquisition of said data came from a hush-hush source I probably shouldn't write about here. It was certainly grey market, and it looked it: columns messed up, entries doubled, just a mangled mess of Cyrillic. We spent a good amount of time just combing through the data, looking at average values and seeing how many duplicates there were and which columns should've been unique -- all in Russian, too!
I have polylingual tendencies, so I had a veritable ball doing magic during discovery -- we spent a few short weeks organising and sorting through it all. It's been a while since this project, but I remember us deciding to split the data in a ratio of perhaps 20/80, training to real data, in order to have our model work correctly. We also accidentally spent $700 in one shot, in ten minutes, training the model... whoops! They were completely cool about it, though.
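The first pass looked roughly like the sketch below; the Parquet file, the column names, and the clean 20/80 call to train_test_split are illustrative stand-ins, not the project's actual setup.

```python
# A sketch of first-pass discovery checks; "shipments.parquet" and
# "declaration_id" are hypothetical names, and the split is illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_parquet("shipments.parquet")  # hypothetical slice pulled from S3

# Quick profile: summary stats, duplicate rows, and uniqueness of a key column.
print(df.describe(include="all"))
print("duplicate rows:", df.duplicated().sum())
print("duplicate declaration ids:", df["declaration_id"].duplicated().sum())

# Hold out roughly 20% for training and keep the rest as the real data.
train, rest = train_test_split(df, train_size=0.2, random_state=42)
```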

Cleaning Routine

After 'discovery' we moved on to cleaning the data. Here was one of my main contributions: a Unicode, regex-based cleanup routine in SageMaker, with some unholy regex sequence to get rid of entries so mangled they had no hope of being useful, in between putting the wrong entries into the right columns for 'shifted' rows (if you've ever worked with any significant amount of data you know the ones: the rows where someone fat-fingered an extra column in their Windows XP tabular entry program [how obvious is it that I've never been in a job like that?]).
To some people -- looking at you, what became an incredible waste of time: an interview for some company that's certainly still fleecing customers on a useless product -- it may not seem like much, but it took a while: you need to know a different language and have a reference of its Unicode values, Russian governmental acronyms(!), and the delineations between places and abbreviations of places... it was tough.
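None of this is the real SageMaker routine, but its flavour was something like the sketch below, with made-up column names and a far tamer regex than the real one.

```python
# Flavour of the cleanup only: a Cyrillic-aware regex filter plus a crude fix
# for 'shifted' rows. Column names ("weight_kg", etc.) are made up.
import re
import pandas as pd

# Keep only values made of Cyrillic letters, digits, whitespace, and common
# punctuation; anything else is treated as hopelessly mangled.
VALID = re.compile(r"^[\u0400-\u04FF0-9\s.,:;/()№-]+$")

def is_salvageable(value) -> bool:
    return isinstance(value, str) and bool(VALID.match(value))

df = pd.read_csv("manifests_raw.csv", dtype=str)

# Crude 'shifted row' repair: if the weight column holds text instead of a
# number, assume later values slid one column to the right and slide them back.
shifted = ~df["weight_kg"].str.fullmatch(r"\d+(?:\.\d+)?", na=False)
df.loc[shifted, ["consignee", "weight_kg"]] = df.loc[shifted, ["weight_kg", "value_usd"]].values

# Finally drop entries whose description is beyond saving.
df = df[df["description"].map(is_salvageable)]
df.to_csv("manifests_clean.csv", index=False)
```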

Final Hour

After we had the stage set, we finally started to figure out which Amazon tools we needed to fully run our model in a good amount of time, at the lowest cost, as efficiently as possible. I helped with the Python code for this, but a lot of the final iteration of the work was done by another (smart and physics-minded) member of the team after I had to step away from the project on-and-off for personal reasons.
In the end we had to present the results of it all to the stakeholders of this project; cue a room full of three-letter spooks that looked like carbon copies of one another. One had a question that tickled me: why was so much of the data removed before the final process of training the model and then running it on the rest of the data? He seemed a bit annoyed, but we tried explaining how filthy the data was, just a real mess -- when you get data from a 'grey-market' source, what do you expect? Vladimir to input the cargo manifests perfectly? C'mon now.
This project nets a 100 on the Scale counter; it was the largest project I've done and it required some heavy firepower to accomplish. For enjoyment it's a 75: fun was had, but I had a tough time near the end, and figuring out Amazon tools doesn't strike me as the most exciting thing in the world.

Below is a link to a fellow who worked on the project with me, if you would like to read another account; he goes into our adventures with Parquet files, AWS Glue, and all the things in between.