How to Make a Donut

Cory Reed and Chris Dieringer - DonutJS November 2017

Transcript

Greetings, everyone! My name is Chris Dieringer.

I'm Cory!

We're super excited to be here and talk to you. We only have 10 minutes, and this is very problematic, because we could talk all day about this. So we're gonna hustle. But in that process, we're probably gonna drop some things. So please, if we confuse you or forget to say something, just raise your hand, because we don't want to leave you confused. Next, Cory... Yeah. So I'm Chris. That's Cory. Both guys. And we went to college together like a decade ago. That was a lot of fun. And we also used to work together at a brain research company. We've been hacking together for a long time. We have good pair synergy. Pairing? Pair programming? Okay. Bloopers? Got 'em. The monster?

Growling.

Good. How about the hacks?

They're hackening.

I did the rest already.

All right. Okay. So Emily said we're here to talk to you about a donut generator. That's true, and we're going to, but, you know... That was really inspired by some prior work. The donut generator came out of a different project. Cory, tell us about it.

We worked at the Mind Research Network, which is a non-profit neuroscience company. And we worked on this project called COINSTAC, which attempted to solve some big problems in neuroscience data sharing. So the first problem is the constraints on sharing imaging data. Researchers have to commit to maintaining participant privacy, and study facilitators oftentimes just don't allow file sharing, because it's been shown that people can reverse engineer the images and derive personally identifiable information from them. So this creates data silos, which is bad, because that neuroscience imaging could be used for many different projects.

Yeah, it's crazy. It's 2016. Obama says -- you have to share your data if it's publicly funded. But we've got a bunch of brain imaging data and you can't share it! Another problem is that brain imaging data is just huge. Oftentimes when you're doing brain scans, you don't take... One of the processes is you take just hundreds and hundreds of photos of different slices of the brain. And then you do it from different perspectives too. So after you just take a series of scans on one participant in a research study, you might be looking at like a gig to two gigs of data. And then you multiply that, if you've got a couple hundred, a couple thousand people in your research study... You're looking at terabytes of data sometimes. And you can't conveniently share it. You can put it over the network, but it's slow. You've got to pay for bandwidth. What seriously happens today is people literally ship hard drives to each other. It's pretty hilarious. I feel bad for the lab techs that have to plug these things in all the time. Because they need to aggregate the data to do research.

Yeah. So what did we do? We came up with this really cool system. Again, it's called COINSTAC. Essentially it says -- you're not gonna share data. Rather than bringing the data to the analysis, we're gonna bring the analysis to you. So if you've got data, I'm gonna ship you an analysis and you're gonna run it there. I'm gonna ship you an analysis and you're gonna run it there, and when we've all done our analyses we'll ship the metadata about the analyses up to the cloud and do an analysis over the analyses. This doesn't generally work with a lot of numerical methods in data science today, but we have some expert machine learning wizards who are pretty bleeding edge. It's really cool. And to make sure that evildoers can't mythically, magically reverse engineer the results of that data, we add a bunch of noise -- terrible GIF -- to make the data completely obscure, so no one can ever figure it out. So how did we do it, Cory?
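A minimal sketch of the "bring the analysis to the data" idea described above, not the project's actual algorithm. It assumes each site computes a simple mean, and that the "noise" is Laplace noise; the talk does not say which kind is used. All function names here are hypothetical.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def local_analysis(site_data, noise_scale, rng):
    """Runs at the site that owns the data; only a noisy summary leaves the site."""
    noisy_mean = statistics.fmean(site_data) + laplace_noise(noise_scale, rng)
    return {"n": len(site_data), "mean": noisy_mean}


def aggregate(summaries):
    """Runs centrally: an 'analysis over the analyses' (a size-weighted mean)."""
    total = sum(s["n"] for s in summaries)
    return sum(s["mean"] * s["n"] for s in summaries) / total


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [[1.0, 2.0, 3.0], [10.0, 12.0], [4.0, 5.0, 6.0, 7.0]]
    summaries = [local_analysis(data, noise_scale=0.1, rng=rng) for data in sites]
    print(aggregate(summaries))
```

The raw measurements never leave `local_analysis`; the central step only ever sees a count and a perturbed mean per site.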

How did we do it? We used JavaScript. We're JavaScript engineers. So we made an application powered by Electron. It uses React and Redux on the frontend and is powered by Node and CouchDB, and then we roll our neuroscience algorithms into Docker containers, because they're written in Python.

We had some screenshots. I don't know why they're not showing up.

It's fine.

So why are we here? We're here to show it to you. The real version is still in the works by our old pal back at our last job. And it's a little more complicated. We wanted to give you a simpler version of it. While scientists are off answering really important questions like "What factors lead to schizophrenia?", we're answering a much, much more important question: What makes a donut super good? So we're gonna give you a sneak peek into how the product works with a fun donut demo and talk about a tiny little bit of machine learning on donuts.

Cool. So we made a web application. And you can see the URL there. The QR code that points towards it.

Phones out, everybody. We need you to participate.

Yeah, you can participate.

Please. It's gonna be great. It's gonna be great. You're gonna love it. If you're into donuts.

How do I get to it?

It's the second tab. Give them a second to look at the link. I think the QR is being truncated on the bottom. It's just a URL. You can just go right there. This is for your convenience. We're very user-centric. We're frontend JavaScript developers, for crying out loud. I don't even know how to... I don't have a QR app. Got it? Is everyone seeing it? We've got people doing it. Cory, you're driving, man. This is you.

How do you... Oh, there we go.

Apparently I already have some donuts saved. But this is you.

What am I talking about here? I'm talking about the webapp that we made. How do I get to... I don't know.

The cursor is terrible. Use your mouse, buddy!

Where is it? I don't see it.

I'll do it. I've got this. The screen... You think that the screen would be on the correct side. There you go.

All right. Cool. So we asked ourselves: How do you visualize a donut? What is a donut? It's composed of basic shapes. Sprinkles, these little pill shapes, and concentric circles. What in the browser is good at making shapes and manipulating them? SVG is. It's great. So we made our... Let me see if I can get to the sliders here. Reverse. We made the donut out of SVG, and this is a React app. Everything is just a component. And you can kind of control the aspects of the donut using the sliders. Don't worry about the faces right now.

Yeah, ignore these emojis. You should make some donuts that match your own heart's content and make sure to save them.

I can't see!

That's okay. Let's go back.

Cool.

So tinker around with that for a minute. I'm just gonna continue talking here. All right. But we're not just making donuts. We're learning about donuts. So a big part of our app here is a false objective to discover the worldly phenomenon that explains what really makes a donut delicious. It's a contrived problem. But data science! Right? So you guys are making all these donuts, and we're gonna have you submit them to us momentarily. And we have a series of systems in our backend that are going to ingest your donuts, and then predict what the ideal donut is. Our services have... We haven't cheated or anything. They have no idea. But when you're doing data science, there's a lot of... You generally have an objective. You take data as input and you want to come to some conclusion on the output. One of the strategies that you can do in data science is just kind of a classic regression. Regression says: Give me your data, and from that data, I will give you a function that explains the data.

And this is kind of a real simple case. You can imagine that all these troughs and peaks are actual donut data points. The more data that you give the regression, the better it's going to be able to actually create a mathematical formula that well fits the data. So we're gonna do some learning together to predict the world's most delicious donut. It's gonna be great. All right. Cory, launch the hacks. I use Colemak keyboard, and Cory can't type on my laptop. Don't do this on your machine.
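As a rough illustration of "give me your data, and I will give you a function", here is a least-squares line fit on a single feature. The function name and the `penalty` parameter are hypothetical; a nonzero `penalty` shrinks the slope, which is the ridge variant of the same fit. It is a sketch, not the code behind the demo.

```python
def fit_line(xs, ys, penalty=0.0):
    """Fit y = intercept + slope * x and return the fitted function.

    penalty=0 is ordinary least squares; penalty>0 is a ridge fit that
    shrinks the slope toward zero (the intercept is not penalized).
    Raises ZeroDivisionError if all xs are equal and penalty is 0.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data: sprinkle coverage vs. how much people liked the donut.
    coverage = [0.0, 1.0, 2.0, 3.0]
    rating = [1.0, 3.0, 5.0, 7.0]
    predict = fit_line(coverage, rating)
    print(predict(4.0))
```

The more points you pass in, the less any single odd donut can drag the fitted function around.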

Please don't do this. You have to enable hack mode.

Hack mode totally enabled.

We have an admin mode. This is Cory's stuff.

So I'm going to... Man, how did you...

I thought this thing was gonna be a little further up.

Cool. So we have this thing called the Donut Monster, who accepts all your donuts. So if you go to the donut view, you should see an upload button. Go ahead and try to go to the... Oh, gosh. Where is it? I can't... Agh!

I'll do it. I'm a great clicking guy. I'm really good at clicking. Pew! All right. So these are donuts that you are submitting.

This is all happening via the magic of Socket.IO. So your donuts are going to our Node.js server, and that's passing them off to a Docker container.

So as your donuts are coming in, we're taking those, piping them into a series of Docker containers. That was teeny. Kind of like a little Pip's donuts. And we use a Python stack for machine learning. Everyone in machine learning uses Python, because apparently data scientists think Python is fast. Really, tangent... It's FORTRAN that makes Python fast in machine learning, because FORTRAN has all of the low-level bindings for linear algebra. And matrix math is what exploits all the characteristics of your CPU that make it fast. And those bindings exist in JavaScript too. Anyway... The first thing we do is we take your donuts and we make a model. We try to make an equation that explains what makes your donut delicious. Once we've got that mathematical model, we run another process that kind of explores... It kind of walks those peaks and troughs of that mathematical formula and it tries to find a maximum. When it finds a maximum, it says -- ah-ha! That's the best scoring donut. I know what characteristics make that donut really delicious, like coverage and thickness and radius. And I send it back to Cory and he paints it into his totally sweet animation. We're using two different machine learning techniques. We're using a ridge regression here and KNN.
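The two steps above (build a model, then look for its maximum) can be sketched roughly like this. It assumes each donut is a (coverage, thickness, radius) tuple with a score between 0 and 1, uses KNN as the model, and swaps in a plain random search for the "walk the peaks and troughs" step, since the talk does not say how that search works. All names are hypothetical.

```python
import math
import random


def knn_predict(donuts, scores, query, k=3):
    """Predict a score for `query` as the mean score of its k nearest donuts."""
    by_distance = sorted(range(len(donuts)), key=lambda i: math.dist(donuts[i], query))
    nearest = by_distance[:k]
    return sum(scores[i] for i in nearest) / len(nearest)


def find_best_donut(donuts, scores, candidates, k=3):
    """Return the candidate donut with the highest predicted score."""
    return max(candidates, key=lambda c: knn_predict(donuts, scores, c, k))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical submissions: (coverage, thickness, radius), each in [0, 1].
    donuts = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.8, 0.9, 0.9), (0.5, 0.2, 0.7)]
    scores = [0.1, 0.9, 1.0, 0.4]
    # Random search: sample candidate donuts and keep the best predicted one.
    candidates = [tuple(rng.random() for _ in range(3)) for _ in range(500)]
    print(find_best_donut(donuts, scores, candidates, k=2))
```

With only a handful of submissions the prediction is coarse; it gets more useful as the submitted donuts spread out across the feature space.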

I'm not gonna go into what those are, because... Again, 10 minutes. Those are literally your donuts. This actually learned... Cory, can you scroll down a little bit?

Can I?

Mine is scrolling backwards. This actually learned really quickly. We ran a bunch of tests. This usually took a minute. But right now, the KNN is scoring pretty high. It can predict... A perfect donut scores 1. And it's guessing something that's 0.94. So this is pretty close to the ideal donut. You can see this method is kind of struggling. We actually built a... Show them the fire hose. Usually it takes a lot more data. But I guess you guys have created a bunch more evenly distributed, diverse donuts than we did. We're kind of donut bigots, if you will. So Cory made this totally rad thing that generates a ton of random donuts and spits them on the screen. We limit how many donuts go on the screen. All the sprinkles are individual DOM nodes. And when you have that many in the browser, it shuts down.

Basically have to use Chrome. Yeah.

That's a lot. Yeah. We just talked about that next. What's our time? Someone have time?

You want to show the bloopers?

Yeah. So we have some bloopers. Geometry is fun. Geometry is like... Also hard. Oh, damn it. This is gonna work better. Thanks. So this will be rad. So we probably have, like, 20 of these. We're not gonna go through all of them. But these are where we just screwed up placing sprinkles on donuts or sizing the sprinkles. Some of them are pretty hilarious. That's straightforward. Kind of fun. This one was interesting. It kind of looks like a bunch of buzzing bees. Yeah. They should fire us now. This one is amazing. This was actually not a bug. This worked. We had to make sure that the sprinkles are evenly distributed, not spaced too close to the inner radius, not spaced too close to the outer radius. Whatever. Whoops. Fixed it. It's kind of beautiful. This one is really interesting, because what you're observing is... Not quite randomness. There's a lot of trigonometry, obviously. We're converting... The DOM works in rectilinear coordinates. X and Y. But when you're putting sprinkles on, you have to convert to polar coordinates, with sine and cosine. And you can see some pretty interesting patterns emerge. Donut bloopers. Pretty great. That was great! We've got to see that one. Pretty fun. These are all on our website. You guys can go and make donuts all you want in the future. It's all open source. We'll give you the link at the end. We're probably gonna turn off the donut machine learner thing. But whatever.
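The sprinkle placement described above can be sketched as follows: pick a random angle and radius in polar coordinates, keep a margin away from the inner and outer edges, then convert to X and Y with cosine and sine. This is a guess at the approach, not the app's code; the function name, the `margin` parameter, and the square-root trick for even coverage are all assumptions.

```python
import math
import random


def place_sprinkles(count, inner_radius, outer_radius, margin, rng):
    """Return (x, y, rotation_degrees) tuples for sprinkles on a donut ring.

    Coordinates are relative to the donut's center.
    """
    lo = inner_radius + margin
    hi = outer_radius - margin
    if lo >= hi:
        raise ValueError("margin leaves no room for sprinkles")
    sprinkles = []
    for _ in range(count):
        theta = rng.uniform(0, 2 * math.pi)
        # Sampling r^2 uniformly (then taking the square root) spreads sprinkles
        # evenly by area; sampling r directly would crowd the inner edge.
        r = math.sqrt(rng.uniform(lo * lo, hi * hi))
        x = r * math.cos(theta)
        y = r * math.sin(theta)
        rotation = rng.uniform(0, 180)
        sprinkles.append((x, y, rotation))
    return sprinkles


if __name__ == "__main__":
    for x, y, rotation in place_sprinkles(5, 40, 100, 8, random.Random(0)):
        print(f"x={x:.1f} y={y:.1f} rotate={rotation:.0f}")
```

Forgetting the square root, or mixing up degrees and radians, is exactly the kind of slip that produces "not quite random" patterns.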

Plugs. Math. We're plugging math, I guess. We use Cleaver for the slideshow. It's a really great tool. No configuration. You pass it a markdown file and it spits out a slide show. What else?

Docker is cool. If anyone has used Docker, with the Node.js bindings into Docker, good luck. It's great, but it's hard. Questions? Salsa?

It was in your bio.

No, salsa, the condiment. SenorSalsa.org. There's also a plug. Yeah. I love salsa. I lived in New Mexico for a year. So I got some intense salsa vibes. She's just putting her shirt on. That wasn't a raised hand. Dang it. Going once, going twice? Thank you.

(applause)

Live captioning by Mirabai Knight
