Ask Microsoft: Are you using our personal data to train AI?
The regulatory architecture of "big tech" is an unmitigated disaster.
You can't have two entities effectively control every touchpoint with the digital domain for a major fraction of the planet.
There is absolutely no reason to trust that they will not abuse this position in opaque and impossible-to-trace ways. These are trillion-dollar for-profit entities, with armies of lawyers and lobbyists that can intimidate medium-sized countries. They will exploit every weakness of incompetent, confused and captured regulatory/political systems, because that's what they are legally obliged to do for their shareholders. And those shareholders care zilch if this duopolistic, fingers-in-all-honeypots design undermines our entire digital future. They just want some tech "winners" in their portfolios.
The longer nothing serious is done the harder it becomes to do anything.
This is the most Kafkaesque thread I've ever seen on Hacker News.
Microsoft definitely uses analytics, if that counts as personal data - https://www.microsoft.com/insidetrack/blog/microsoft-uses-an...
Is Microsoft reading your Gmail account, Word documents or porn activity and feeding them to OpenAI? Not according to the terms and conditions: https://learn.microsoft.com/en-us/legal/cognitive-services/o...
Is Microsoft generally doing unknown unknowns? Yes.
This has been my objection to Microsoft's maze-like privacy policies for a long time.
I once asked - on another forum and before the recent "AI" coding assistants were widely available - whether Microsoft's privacy policy allowed them to upload and do things with your own code if you used VS Code with telemetry enabled.
At the time I was downvoted to invisibility and told I was being silly. But not one person could point to any wording in Microsoft's terms or privacy policy that clearly and transparently limited the scope of the data processing to exclude that kind of thing.
Today there's a bit of an obsession with training ML models on any large data set available, and perhaps my caution from yesteryear wouldn't look so silly to the critics now.
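For what it's worth, VS Code's telemetry can at least be dialed down in user settings. A minimal settings.json fragment (the setting name below comes from VS Code's documentation; verify it against your version, as older builds used a different key):

```jsonc
{
  // "off" disables usage-data and error/crash reporting.
  // Older VS Code builds used "telemetry.enableTelemetry": false instead.
  "telemetry.telemetryLevel": "off"
}
```

Of course, this only controls what the client claims to send; the original point about unclear policy wording still stands.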
For reference, here is the text of Microsoft's upcoming Services Agreement:
https://www.microsoft.com/en-us/servicesagreement/upcoming.a...
And here is their summary of what has changed:
https://www.microsoft.com/en-us/servicesagreement/upcoming-u...
Honest question: I've worked in advertising, and the laws around PII are interesting. In California, the CCPA requires companies to provide a form through which residents can request deletion of their PII, to be honored within 90 (?) days.
What happens if a company trains a model on a California resident's PII, a deletion request comes in, and three months later someone asks the model about that person and it spits out the PII that was supposedly "removed"? I'm assuming it's a matter of this going to court to be decided, but I'd be curious if any California legal nerds have some reasons why no one has started trying to target these things for settlements, if nothing else.
How does M$'s legal team accomplish such a feat? Are there layers of linguistic abstraction built up such that only a sufficiently large team (Microsoft, grand jury) has the bandwidth to extract any meaning? Red herrings with gotchas hidden in seemingly innocuous places? Do they just talk in circles and never give an exact answer?
If it does not specifically exclude that use, then you can be assured that the data will eventually be used in training if it's useful. That goes for every company warehousing data on the planet.
This should be everyone’s base assumption, and should be basically accurate unless laws are put in place, but even then jurisdictional bypass may make them irrelevant for historical data and will only weakly protect new data.
Welcome to the new oil.
There is a difference between personal data and personally identifiable data. In many ways it is unavoidable to use personal data. Predictive text uses personal data (it 'learns' from everyone) - is the sentence and paragraph personal? Search engines record what you have typed in - do you classify your search query as 'personal'? I can copy-paste the first comment from this thread - is that personal data? I may be flat-out wrong and shouldn't type anything into a web page ever.
Those four lawyers and three privacy experts didn't seem to come to a conclusion on what personal data is. Does big tech feed data created by people into 'AI' tools? Yes. Does little tech? Yes.
I'm happy to join Mozilla with my pitchfork if I know what it's about. I would like people who are clear about how they are looking after my interests, rather than just getting the mob riled up. Use of data, any data, is subject to an agreement. Go and read the 'legal' you agreed to by using Hacker News - they have a whole section on how they use personal information. Do we get our pitchforks out for HN, or are they cool?
Which of these things is "training AI"?
"Training AI" is an implementation detail that probably doesn't belong in a privacy policy most of the time. It would be nice if Mozilla were more explicit about the behavior they are concerned with:
* an abuse detection system is trained on user behavior and data to flag bad actors for human review
* when I perform a web search, the ranking function is trained on all users' behavior
* the email composer suggests completions based on my own typing history
* a public chat interface to an LLM is trained on my private emails
Yes, yes they are, and will. If we think some giant corpo has our best interests in mind and has corporate governance all buttoned up: HA, capital H A.
"130 products"
Does that include LinkedIn? What stops LinkedIn from sharing data with Microsoft?
The easy solution would be that everything created by AI (code, books, letters, art) should be licensed Creative Commons (or something more akin to the AGPL). I don't care if you use my code to write your own, but you have to share it too.
> We had four lawyers, three privacy experts, and two campaigners look at Microsoft's new Service Agreement, which will go into effect on 30 September, and none of our experts could tell if Microsoft [will use your data] to train its AI models.
* in the USA, I assume?
Under the GDPR, if a use isn't a defined purpose, then the answer is no. In the USA, I hear some states now have similar laws, but for a blanket statement with no defined region (not even a country), I'm not surprised they can't give a definitive "no".
> Ask Microsoft: Are you using our personal data to train AI?
According to Microsoft: "Your privacy is very important for us". So the answer is yes.
Could this also affect the self-hosted ChatGPT in Azure? I've been trying to convince everyone to host a model themselves so they can use that data...
Of course it is going to use it; otherwise you won't be able to use any of their products... the sad world we are living in.
The fact that they turn on photo scanning on the photos uploaded on OneDrive tells me everything.
Of course they are. Why do you think MS Office practically begs you to save your documents on OneDrive?? Saving to OneDrive is the default!!
>and none of our experts could tell if Microsoft plans on using your personal data to train its AI models
This means nothing. You don't know if someone is going to do something unless they say they are going to do it. No one knows if Bethesda is going to take down every video of Starfield on Youtube tomorrow that is monetized with ads. Sure you can speculate what someone will do, but you will never know for sure.
I bet Adobe is doing the same as well.
If a corporation has your data, you can be damn sure they are mining it for all it is worth, which these days includes feeding it into AI training. They work for governments, helping to surveil us. We "voluntarily" tell Google, Microsoft, Apple, and Facebook everything that we would object to telling Big Brother. Boy do I have news for you!
Ah yes, Mozilla, the most respectful company when it comes to personal data.
Has nobody put MS Service agreement into ChatGPT and asked for an answer?
Would Microsoft really risk training on government administration data? It generates billions in revenue from government accounts and from the vast industry supporting government that is compelled to use Microsoft products.
Where are all the assholes who say things like: "I'm loving the new Microsoft", "Microsoft is not the evil company it was back in the 90's", etc.
I'd love to hear how you justify this.
Official answer: no. Right answer: yes, of course!
Thank you Mozilla.
So they will add a clause in their contract.
Can't we just ask ChatGPT-4 for a summary, since it's passed the bar exam?
So much of modern technology is a trojan horse these days. This is basically the enshittification of cloud services. Just a few years ago people would say you were being a little paranoid if you worried about your data passing through a company's servers unencrypted, but here we are now.
If companies do start training models on what people consider to be private documents, then the issues we already have with AI taking the jobs and purposes of humans is going to become significantly worse.
Scientists working on papers will essentially not be able to trust that their work won't get out before they have published it. A competing research team, asking prompts in the right way, could chance upon a reply that gives them a clue as to what the other team has found or is doing. The same goes for competing companies and engineering teams, or authors writing the next book in a series. Other people, drawing on that trained data, could use AI to produce a cheap rip-off of that next paper, patent, or book.
And that will completely demotivate humans from actually doing stuff. Because what's the point? No one will pay you for it, and a poor-quality, second-rate product is obtainable much more cheaply.
At that point I think we'll discover what the real limitations of AI are, as we, as a society, have to get used to using it over humans. And I somehow doubt we will be better off.
If nine experts in privacy can't understand what Microsoft does with your data,
then in my opinion a court should step in and declare it void so that Microsoft isn't allowed to use any private data until they get their act together.
If it's so vague that it becomes meaningless, that should default to granting no rights. Otherwise, why not publish your all-rights-granting privacy policy in Klingon, in a locked drawer in a toilet basement? ;)
Samsung's ToS:
> The Sites may allow you to share things like comments, photos, messages, or documents with us or with other users. When you share content, you continue to own the intellectual property rights to your content and you are free to share the content with anyone else wherever you want. However, to use your content on our Sites, you need to grant us a license for any content that you create or upload using our Sites. When you upload, transmit, create, post, display or otherwise provide any information, materials, documents, media files or other content on or through our Sites (“User Content”) you grant us an irrevocable, unlimited, worldwide, royalty-free, and non-exclusive license to copy, reproduce, adapt, modify, edit, distribute, translate, publish, publicly perform and publicly display the User Content (“User Content License”), to the full extent allowed by Applicable Law.
It's my understanding that "Sites" is all Samsung products as it is vaguely referenced in the ToS itself.
While we are talking about it... can we make ToS illegal?
Why do we have to abide by rules while browsing the web? Why do businesses fear litigation so much that they hire lawyers to write and maintain a huge document nobody can ever read or understand?
This is a failure of governments that can't set basic rules for human interaction.
All this
If they don't explicitly say in the ToS that they aren't going to use it in any particular way, then you can be sure that they are if there is potential for commercial gain.
Even if they do explicitly say that they aren't going to use it, I'm going to be sceptical. There will be a nice pile of caveats and exclusions within the legalese, and if not they might just use it anyway and hope they can afford to ride out any resulting legal action if people notice.
If we need to ask, don't we know the answer?
Time to cancel my subscription to Microsoft's online products. Never buying another Windows machine again. Apple or Linux only from here on out. Apple's software and/or LibreOffice are more than enough.
All this privacy noise feels a lot like Whataboutism.
Firefox has telemetry and studies enabled by default. All while Mozilla calls out x, y, z for the same thing.
No one from the US is going to see this.
How can we define "personal" data?
Is data from public LinkedIn accounts considered personal?
However, I believe that our Office 365 personal data should be prohibited from being used to train AI, as it is sensitive information.