Data Mining: How You're Revealing More Than You Think

Data mining recently made big news with the Cambridge Analytica scandal, but it is not just for ads and politics. It can help doctors spot fatal infections and it can even predict massacres in the Congo. Hosted by: Stefan Chin
Jake (1 month ago)
Also, what’s scary about data mining? If Target knows you are pregnant, what’s the harm? That you get access to coupons that save money on the things you are gonna purchase for your baby anyway? I remember a time when I bought a sous vide, added Modernist Cuisine and an iSi dispenser to my wishlist, watched a few cool science videos on liquid nitrogen, and suddenly a video was suggested about cryosteak. It’s a really cool thing; I learned about something I didn’t know before. The coolest part was that something was suggested to me that was right in line with my interest at the time. This sounds like education at its finest. I know it’s education targeting my wallet: I went straight to Amazon and added a liquid nitrogen dewar to my wishlist. Nonetheless, I learned something extremely relevant to my interests because a huge set of data about people with similar interests was both available and examined by computers to suggest that video. So maybe I’m missing something. I like learning, so please point it out if I am. But what’s so scary about collecting data? It’s not like that judgy suburban housewife down the street is poking through the data collected about me so she can judge more. And if she is, what do I care if she’s got nothing better to do than judge others? I don’t expect she can be a very enlightened, satisfied, happy person if that’s what she is choosing to spend her time doing. And I can comfort myself with the fact that I’m having a far nicer time than she is. Please, someone, tell me why I should be scared and why I should prevent the tracking at all costs. What am I missing?
Jake (1 month ago)
With all this data mining and predictive ability, how come every store in Minnesota still sells out of vehicle snow brushes every year with the first snowfall?
B Blessed (2 months ago)
Wake up, sheeple: no info is safe, period. Facecrook, Google... if you surf the web, you're being surfed. Sad but true.
Kari Rakitan (4 months ago)
It's a gross breach of privacy for Target to reveal a daughter's pregnancy to her parents before she's ready to tell them. The man should not have apologized to Target but sued them.
BEPEC (4 months ago)
yeah, we can use data mining mostly everywhere
Turtle Von Nurtle (5 months ago)
Target sent me those pregnancy ads when I was 17. I am a man, and I did not even live in a country that had Target. I also did not shop online.
kimberley beaumont (5 months ago)
You're Crazy if you think its pronounced Data not Data!
Alias Fakename (5 months ago)
New tech for same fuckery (to paraphrase Sagan) only want suggestions for things I already like so I can stifle potential growth in taste/knowledge. ...and inflation of crime stats in poor hoods from disproportionate policing used as justification for more cops & arrests there.
Rice Guy (5 months ago)
data mine Ya
Jeffrey Spinner (5 months ago)
Please never lump weather(wo)men and their fake models together with real science trying to predict things in an overwhelmingly rigged world. I do appreciate your description of virtually all EDA and machine learning algos as applied statistics, 'cause as a man with an advanced degree in Applied Math and Statistics, that's ALL I've seen. Idk about the XAI (Explainable AI) stuff... I have to look at it soon.
Val Parish (5 months ago)
I've been trying to datamine this old game that I used to play on the first Xbox. It's called "Area 51" and it was made by Midway Studios. Can someone help? It was ported to PC, and that's what I'm trying to mine.
cooper512atx (5 months ago)
So target thought I was probably pregnant...
LUMiNO (5 months ago)
it's data not dayda, gosh
Jascha R. (5 months ago)
One of the best videos on the topic, thanks for the overview!!
Benimation (5 months ago)
data data data data data data data data, batman!
Julia Gustepa (5 months ago)
shocking accuracy? I kept seeing baby stuff ads based solely on my gender and age I guess. I worked hard marking all diapers ads as sexually explicit. now I'm getting dresses and jewelry ads and webinars on how to find myself, find myself a man and be happy. it's nowhere near being accurate, it's pretty much generic. forget that I'm a qualified professional with tons of hobbies, who cares. they are leading me to buying diapers.
Existenceisillusion (5 months ago)
I'm just starting to study this in school
Finanov (5 months ago)
I took the Dual Enrollment Computer Science class at UT Austin and one of the modules explained data-mining and data collection. I learned so much from that class. Data-mining doesn't surprise me anymore as a result. And it makes sense: Google is practically the King of data-mining. Even if you're not using any of their services, if you visit a website with Google Ads, unless you turn off third-party cookies, they can still collect your data. Be careful of what you do on the Internet.
Wes Tolson (5 months ago)
"The messy human stuff." Is what I expected this video to explore.
Nice Harry Potter reference.
ubermench3000 (6 months ago)
Hierarchical clustering/Hierarchical linear modelling is very useful in research of schools.
zack james (6 months ago)
It's pronounced data not data 🤣
C2L Redstone (6 months ago)
I don't care how much data companies get about me as long as it's not my passwords or credit card info, because I always do research before decisions. Common sense absolutely destroys targeted ads.
C2L Redstone (6 months ago)
The only problem is that I don't get coupons because what I want are fairly priced PC parts.
STMoody6 (6 months ago)
"Data Mining is more about spotting patterns than explain them." That's not very accurate. Data mining is very much about explaining patterns. That's why decision trees & Bayesian classifiers are data mining algorithms and something such as a neural network is not. Though all can produce similar results, the former are considered data mining algorithms because they do describe the patterns they find, while a neural network obfuscates anything it discovers.
Jan Kowalski (6 months ago)
That video is screaming not acceptable. Like, literally 406...
ssmmaacca (6 months ago)
He's talking about Target and baby due dates when we all know that data mining is being used for a lot more than that. It's scary how much information these companies have on us, and it's definitely not right.
crimfan (6 months ago)
Nice video. Regression and classification are indeed very, very similar. The math is almost identical. Anomaly detection is like model diagnosis, looking for points that don't fit the model. The one big thing people need to recall is Goodhart's Law: "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." So these regularities will frequently die when they start being used for some purpose.
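To make the comment's point about anomaly detection concrete, here is a minimal sketch in plain Python: fit a straight line by least squares, then flag the points that do not fit it. The function names, the toy data, and the cutoff of two residual standard deviations are assumptions chosen for illustration, not anything taken from the video.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def flag_anomalies(xs, ys, threshold=2.0):
    """Return indices of points whose residual is unusually large.

    Assumption: "unusual" means more than `threshold` times the
    standard deviation of all residuals from the fitted line.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = (sum(r ** 2 for r in residuals) / len(residuals)) ** 0.5
    if spread == 0:
        return []  # every point sits exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Hypothetical toy data: a clean linear trend with one corrupted reading.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] += 20
print(flag_anomalies(xs, ys))
```

Note that the outlier also pulls the fitted line toward itself, which is one reason real model diagnostics often use more robust fits than this sketch does.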
Katie Kane (6 months ago)
So these ads I'm getting for burial plots...
garudagal23 (6 months ago)
i wish you'd do a show on how videos on youtube that increase the speed of talking and remove the natural pauses stress or annoy people---often yours are too fast--and i quit watching
Joshua Boylan (6 months ago)
So they essentially count cards for data?
Kit Coffey (6 months ago)
This is market research on steroids. I think it included inventorying (association learning), no? It should be regulated, but you can't ban it, any more than you can ban advertising itself. It should not be used for ill. Vote with your dollars.
Michael DiJoseph (6 months ago)
0:33 more like "Zucked up"
Ryan King (6 months ago)
Limp Bizkit joke made this vid
acegeek (6 months ago)
No idea why, but Target's algorithm decided I was preggagenent several years ago. It was wrong. Have never been pregnant, yet they kept sending me coupons anyways.
Alexis Gamboa (6 months ago)
The Google Flu Trends case shows how big data implementations might fail with the wrong assumptions.
Kris Gardner (6 months ago)
Remember kids, in the words of Zakk Wylde “Limp Bizcut Sucks Cock!”
MrMegaPussyPlayer (6 months ago)
4:44 So ... what do I need to buy at Target (/what do I need to do) so that they think I am pregnant?* Hey, I'm only thinking about how I can rig these systems. *= I am a guy
THE G. (6 months ago)
I love the shirt, cuz I drive Peugeot 406!
Anton-Constantin (6 months ago)
we're like robots
I G (6 months ago)
All of this just gets under my skin.
mohdsukri binramli (6 months ago)
I thought it was Bit coin mining
LilyMyLolita (6 months ago)
As a machine learning practitioner, I think Stefan said it pretty well 😉
Mel Tee (6 months ago)
I swear it's not just text data they mine, I swear someone (something) is listening to my cell phone microphone. I was talking with a co-worker who asked about Sirius (with my cell on the desk), and we were talking about the Octane station. I mentioned that I hadn't heard an Alter Bridge or Tremonti song on there yet. That very evening at work, I went on to youtube to play a music mix, and you want to know what band popped up in "my mix" that hadn't been in there before? Yep. Alter Bridge. This isn't the first time, which makes it not just a coincidence. I swear I'm going to make a tin foil hat and a bunker soon.
birbfanchannel tighto (6 months ago)
I heard sex workers had a similar issue. FB was locating them and matching clients with their non-sex-work profiles, endangering the workers. But the thing is, the SWs didn't live near the clients, didn't go to any similar places, and never took their private phones with them to encounters. They refused to admit they were locating them somehow, even though there is no way it was a coincidence, since it happened to so many workers.
Jasmine Wood (6 months ago)
Wow... I've not been keeping close track but 5+ m subs is a lot! Has it been growing really fast lately or have I not paid enough attention? Gratz on 5m! Awesome and deserved, lots of hard work and time but it's paid off!
fidelio (6 months ago)
"don't tell my mother i work in data mining, she thinks i play piano in a whorehouse"
Mike, from Texas (6 months ago)
"If you had an off switch, Doctor, would you not keep it secret?" -Data
confusedwhale (6 months ago)
Why are you mentioning Spotify so much? Is this a hidden promotion thing like... Spotify did with Drake?
Mike G (6 months ago)
I don't like this channel because their videos are too drawn out. I wish their videos were more condensed.
ResortDog (6 months ago)
It's no longer the beginning of the end, it IS the brave new world, like it or lump it.
Gordon Lawrence (6 months ago)
Isaac Asimov wrote about this in the Foundation series way back in the 1940s and '50s.
Michael (6 months ago)
damned clickbait title
Brendan Berg (6 months ago)
Best description of k-means I have heard so far (-;
Matt Kuhn (6 months ago)
Great episode! A minor quibble: I would have put a bit more emphasis on the fact that clustering doesn't require any labeled training data, whereas both classification and regression usually do, though there are many unsupervised methods for all of these tasks.
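The point about clustering needing no labels can be sketched in a few lines of plain Python. This is a bare-bones k-means for 2-D points. The function names, the toy points, and the fixed random seed are assumptions made for illustration. Notice that the input is only coordinates: nothing tells the algorithm which group a point "should" belong to.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iterations=20, seed=0):
    """Plain k-means: alternate assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the points themselves
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            (sum(x for x, _ in members) / len(members),
             sum(y for _, y in members) / len(members))
            if members else centroids[i]  # keep a centroid that lost all points
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # nothing moved, so the clustering is stable
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical toy data: two well-separated blobs, with no labels attached.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The order in which the two clusters come back depends on the random starting centroids, so compare clusters as sets of points, not by position.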
Cethavi Joseph (6 months ago)
Data mining never works on a girlfriend; she can't be predicted
Edgar G. Kozlova (6 months ago)
Clearly explained. Good job!
Larry Thielen (6 months ago)
Data mining can reveal a whole lot and can be pretty reliable, but then everyone hates it if people use profiling methods. Even though profiling, using a ton of data points on people (including race, gender, age, family status etc), can reveal the ones most likely to bring a bomb on a plane, or have mental issues, shoot up a school, or commit other crimes. This can speed up long security lines and lower costs. I for one am perfectly fine with profiling.
Nate (6 months ago)
Missed a real opportunity to sneak in a picture of Commander Data during the "Data, data, data, data!" bit at the start.
Lenard Segnitz (6 months ago)
Go automation go. Now instead of sneaking around for data we ought to give it more data than we think it can handle. Wearable medical monitors to feed an AI doctor with everything to do with our health. If enough of us feed an AI doctor it could learn "healthy" numbers from "sick" numbers, and further learn "worrying" numbers from "good trend" numbers. Then instead of waiting for worrying trends to become a chronic condition we can head it off with simple preventative treatments.
Lenard Segnitz (6 months ago)
Since I do no thinking any amount of revelation is more than I think.
nAlle Alle (6 months ago)
Great one, thank you !
Gergely Hornich (6 months ago)
"I cannot make bricks without clay"
Don Honas (6 months ago)
That has to be the most positive insurance example I have heard... EVER come to think about it.
Laura E.T. (6 months ago)
Please tell me I'm not the only one that thought of Data from Star Trek Next Generation every time he said Data in the intro of this video. I kind of want to edit it so Data appears on screen every time he says Data. Not going to, but it would be funny :)
James Mitchell (6 months ago)
Please do a video on 432 Hertz and whether it’s better for tuning musical instruments. Does it truly have an effect on our brains? Is there merit to the conspiracy theories?
Game On! (6 months ago)
Nerdcore could rise up, it could get elevated.
slade meme (6 months ago)
*FbI oPeN UP*
lasarousi (6 months ago)
I get targeted with Chinese women's products and I'm a male adult from Guatemala, lmao. Probably because I stopped using most social networks, except a meme app (which only gave me ads for itself) and Twitter. This still has a long way to go; it works for simpletons and simple-minded people who literally spread their likes in easy-to-follow patterns even a human could see.
Frank Richards (6 months ago)
Once in awhile you guys have a really great video.
Danbo (6 months ago)
You wear that shirt so much, what's it mean?
Eric Lord (6 months ago)
OH! the [406] is ironic!!! I get it now. You know, cause he's the squarest looking square, ever. :-P
Music is Subjective (6 months ago)
Spotify doesn't data mine. All that they do is try to recommend Drake's new album to you and that's about it.
Robert Nester (6 months ago)
I'm a big data engineer :( Most companies don't know what data is or how to use it. They hire Indians, and don't understand why analysing data doesn't help predict the market or optimize the company's work.
Art Curious (6 months ago)
Data Mining> the practice of sifting through mountains of data in order to market stalk you and sell your personal information to China.
grennbalze (6 months ago)
🎶L I M P, Limp Biscuit right here🎶
noxabellus (6 months ago)
What's a data lake?
noxabellus (6 months ago)
"A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data."
David T (6 months ago)
a little creepy????
ZaBiMaRuSz (6 months ago)
The Limp Bizkit joke killed it. My god, you got me out of my concentration state at work. I was LOLing my ass off. Thanks, SciShow
Celilhan Karaarslan (6 months ago)
logistic regression*
Albus Du (6 months ago)
Well, Google got me wrong: they put a thirteen-year-old in the “soon to retire” category. They keep trying to sell me 401K accounts
Yo Soy (6 months ago)
Y'know, I straight-up filled out google's ad targeting and interest questionnaire because I hoped it would stop showing me (under 21, a vegetarian, no driver's license) ads that were irrelevant to me, but I still get cheeseburgers driving specific cars down to the local brewery. Kinda wish the algorithms were even better at creepily mining all my deepest darkest secrets.
thanrose (6 months ago)
Data mining at its most trivial is annoying. If I like a certain 12 string guitar, it may actually be the voice that goes with it, rather than the 12 string. If I source a plumbing bit, it may be for an art piece. If I want to see conductors' batons, I really am not looking for Harry Potter wands. The end result is homogeneity in tastes and interests. An interest in guerrilla gardening does not indicate a particular cant or esthetic or avocation. Maybe I'm just researching bentonite.
Ivan (6 months ago)
but what if i'm interested in wednesday's weather
Siva Kumar (6 months ago)
Wow.. how can one put Machine Learning in layman's terms simpler than this?..👌👌 good video guys
Steven Bale (6 months ago)
Solid episode. Script needed a bit of editing.
thel vadamee (6 months ago)
Considering if they are not data mining they are so nerdy they are most likely fapping to loli henti I say let them mine......it's more constructive😎
Volker Herbst (6 months ago)
Yeah about Spotify being soooo good in predicting what I want to hear: Why is it always trying to force that silly gangsta rap playlist on me then while I'm listening to Heavy Metal?
Volker Herbst (4 months ago)
Yeah, so? That's the whole point, you think Spotify is selling their subscription to a 30 year old mostly listening to Heavy Metal, Punk and Indie Rock by forcing him to listen to a genre that is completely irrelevant to him? You could argue that it's a negative reinforcement to buy a subscription to get rid of the ads. However positive reinforcement ALWAYS works better (e.g. "Oh look, Spotify really knows my taste and I didn't have to dig for hours for these really sweet tracks!") But even when I'm not subscribing, they can sell their commercial breaks for a fortune to external companies, IF they can prove it's aimed at the right demographic. And obviously they don't even try...
Craig Stone (4 months ago)
Don't forget it's trying to sell you stuff
AsHalt (6 months ago)
sounds like Westworld is leaking...
Ramon Quiroz (6 months ago)
Tbasko sauce (6 months ago)
"daida" ?? u mean D-A-T-A
Number Eight or Nine? (6 months ago)
Why can't we go back to the 50s/60s when everyone was the same?
Yellow Penetrator (6 months ago)
I always knew that Tuesday’s weather was important
Kevan808 (6 months ago)
As YouTube mines my liked videos...
Thomas Vinters (6 months ago)
Some algorithms can spot people with bipolar disorder, and also recognize when they're going to be manic - and display ads for expensive-but-exciting stuff when the user is likely to be not in control of their own actions.
Chad Olthoff (6 months ago)
It's Data not Data. You said it wrong. Lol
Just In (6 months ago)
Mine all they want. All those ads are doing is annoying the snot out of me.
mitkoogrozev (6 months ago)
It's data, not data. Data is an android character's name and the other is not.
Andrei Tache (6 months ago)
In theory I’m all for data mining if it makes the product better, but I’d really like if companies would just ask me for it and I would give them the data that I want instead of them just finding every little scrap of information (like age and sexual orientation... that’s just creepy)
Kit Coffey (6 months ago)
Consent and control over one's personal data is important
Kit Coffey (6 months ago)
Andrei Tache Yes, the vid should have discussed ethics and policy more, as you point out.
J PumpkinKing (6 months ago)
WHO CARES! If it makes life better, and if you have nothing to hide, why does ANYONE care about data mining?
Xeno Phon (6 months ago)
The ones being watched 24/7, scrutinized and restricted should be the ones that can do the most harm to the nation: the government.
LazorGasm (6 months ago)
His forehead is so shiny that I think Stefan has officially become a rare Pokemon. By the way, what a cool episode, thanks SciShow ! :D

