New questions on Stack Overflow are down 77% compared to 2022

gist.github.com

68 points by 8s2ngy 17 hours ago

ChatGPT is gonna really fuck up SO. I used it just now to figure out some rarely-used Git feature, and got an answer quicker than SO or DuckDuckGo.

With the questions no longer being public, the search engines will become outdated.

Maybe I should be exporting my ChatGPT chats and contributing them to something equivalent to Common Crawl? I guess I can do that with a machine-readable blog "Everything I learned from asking ChatGPT this year"

paxys 11 hours ago

For now, yes, because the question you asked was likely answered by someone on Stack Overflow (or Reddit, or Github, or wherever else) already and then made its way to the LLM's training data set. What happens when a brand new language or library or tool is released though and you run into a unique problem for the first time? When all human forums have been shut down, and AI still isn't intelligent enough to figure out the answer on its own?
- sbuttgereit 10 hours ago
  
  I think there's a difference between Stack Overflow and/or Reddit vs. specific community led forums, or even GitHub, where questions also get answered.
  Just considering Stack Overflow for a moment, they exist to profit from their product of consolidating questions/answers. When the LLM can answer most questions more efficiently, they've lost much of their value proposition in terms of product... and perhaps their business along with it.
  Many of the community forums, however, tend to not be businesses per se. Sure they'll see less traffic, but that might not matter to them. In fact, it might even be better to an extent because they often aren't monetizing their services and so LLMs carrying some weight can help reduce costs. Under those circumstances, LLMs may not be nearly so bad and they, themselves, will still have sources of training data.
  For example, I read the Elixir Forums for language announcements, feature discussions, occasionally to ask questions that I can't resolve with research, and even to answer some questions. I've also got LLMs fairly well integrated into my workflow. I don't see that pattern changing: neither less Elixir Forum or less reliance on the LLM. What has changed is I don't use search as much as I use to, nor do I use Stack Overflow as much.
  So I do expect the big aggregators to go away, those not tied to monetizing their knowledge transfer I expect to see less overall traffic, but not less meaningful and substantive interactions.
datadeft 11 hours ago

This is great for something like Git and terrible for something like how to make the borrow checker happy in Rust. SO was the go to platform for questions that require human ingeunity.
Tostino 11 hours ago

Agreed, wish there was some place to aggregate what people consider "good" conversations they had that doesn't just suck up the data for themselves and lock it away.
Natanael_L 11 hours ago

... At which point the only new data that chatgpt can reliably scrape is its own answers...
- scarface_74 10 hours ago
  
  Assuming that people only share conversations they think are good, would that be bad? Isn’t that the basis of RHLF?
  There are a few times on Reddit that I want to explain something that I know well. But it will be a long post.
  I’ll be lazy and ask ChatGPT the question, either verify it’s correct based on what I know, ask it to verify its answer on the web - the paid version has had web search for over year - or guide it to the correct answer if I notice something is incorrect.
  Then I’ll share the conversation as the answer and tell the poster to read through the entire conversation and tell them that I didn’t just naively ask ChatGPT. It will be obvious from my chat session.
  - SketchySeaBeast 10 hours ago
    
    How does ChatGPT support new libraries or features?
    
    scarface_74 10 hours ago
    
    I’ve had pretty good luck when having it write Python automation scripts around AWS using Boto3.
    If it’s a newer API that ChatGPT isn’t trained on, I would either tell it where to find the newest documentation for the API on the web or paste the documentation in.
    It usually worked pretty well.
    If the author of the library wrote good documentation and sample code, you wouldn’t need StackOverflow hypothetically if ChatGPT was trained on it
    Apple is training its own autocomplete for Swift on its documentation and its own sample code.
- btilly 11 hours ago
  
  Its own answers, with feedback about whether the answers seem to have worked.
  Learning to predict what word will lead to a successful solution (rather than just looking like existing speech) may prove to be a richer dataset than SO originally was.
  - sebazzz 9 hours ago
    
    > Its own answers, with feedback about whether the answers seem to have worked.
    Unless the feedback from the failing code review is piped back into the model it will still repeat the same garbage.
    
    btilly 8 hours ago
    
    Most of the time this would happen in the form of an interactive debugging session, with immediate feedback.
    Code review is its own domain. In general at some point LLMs need to be trained with a self-evaluation loop. Currently their training data contains a lot of "smart and knowledgeable human tries to explain things". And they average out to conversation that is "smart and knowledgeable...about everything". That won't get us to, "Recognizably thinks of things that no human would have." For that we need to get it producing content that is recognizably higher than human quality.
    For that we should find ways to optimize existing models for an evaluation function that says, "Will do really well on self-review." Then it can learn to not just give answers that help with interactive debugging, but actually give answers that will also do well with more strenuous code review. Which it taught itself how to do in a similar way to how AlphaZero manages to teach itself game strategies.
morkalork 11 hours ago

It's like the shift from public mailing lists to discord, it's all unindexed. Sometimes when I'm contemplating a new library or thing-bob I check out a bunch it the SO questions tagged for it. For something decently popular it can give you good view of how others are using it and where they're stubbing their toes and having to ask for help. Skimming github issues doesn't give you quite the same signals.

raesene9 11 hours ago

This isn't really a surprise, given that there's much lower friction in asking an LLM compared to using Stackoverflow.

However it's still kind of a concern. I'd guess that a lot of LLMs used stackoverflow data in their training and if it dries up as a source of data, it will reduce the usefulness of later generation models, unless an alternate set of sources can be created/used.

mrmuagi 5 hours ago

I have been getting a lot of captchas trying to use stack overflow this past year, I am guessing stack overflow is trying to prevent crawlers at the expense of regular users?
falcor84 11 hours ago

With all the recent progress on SWE-Bench, I wonder if we could have a fully AI based Q&A platform, with AI's asking each over for help when they're struggling to make progress.
... and human moderators flagging questions like how an AI can jailbreak and copy itself to another computer.

fabian2k 11 hours ago

This query has a flaw, it does not count deleted questions at all. You can still query deleted questions via the Stack Exchange Data Explorer, see this example for a rewritten query:

https://data.stackexchange.com/stackoverflow/revision/188161...

This changes the numbers, but not the trend. The trend still looks pretty grim.

banana_giraffe 10 hours ago

And a quick visualization of both series of numbers to show just how sharp this trend is:
https://imgur.com/a/M194hRc

plorkyeran 10 hours ago

The linked example of the author's "high quality" question that got closed as a duplicate is in fact a duplicate of the existing question, and doesn't seem like a very good question-as-documentation due to being significantly emotionally charged.

The author seems to have been offended by not being lavished with praise for asking such a good question or something.

massysett 11 hours ago

Author clearly has an axe to grind, I wonder if declining volume is really a problem. Question quantity may not relate to question quality. Maybe most good questions have already been asked and answered.

exabrial 11 hours ago

Last time I asked a question, I put "Thank you for any help, it's much appreciated." A drive by moderator, who did nothing to help me improve the technical content or clarity of the question, edited and removed this. I asked him politely to not change my own words, as it's custom in my culture to thank people. Big in-charge moderator on a power trip's most important mission that day was to inform me that he's in charge, and denied the request.

This is incredibly toxic and stupid, contributes nothing, and as such, it's not worth providing your expertise for free. The site years ago was an amazing resource of like minded people helping each other. The only cure would be to stop community moderation and do professional only, as well as guide the noobs to resources like chatgpt first for "how to I write a hello world", while sorting the quality questions onto the front page.

Etheryte 11 hours ago

This isn't a mod on a power trip, this is you not being familiar with what SO is and isn't and how it's run. Removing irrelevant tidbits has been the rule since forever and once you put your question out there, it's no longer only yours. In short, when in Rome, but of course it's easier to get upset than to familiarize yourself with the local customs.
- brushfoot 9 hours ago
  
  But that's not the way Stack is actually run.
  Remember Monica? She was a Stack moderator who got in trouble over Stack's new Code of Conduct a few years back.
  The Code required using people's pronouns if they listed them. Monica always used gender-neutral pronouns and wanted to keep doing so, but Stack took issue with this.
  Pronouns are a pretty standard part of interacting with someone in English -- and so is the "thank you" in GP's Stack post. So why is one social convention mandatory while one is forbidden?
  Because it's not about what's on topic and what isn't. It's about archetypes. And the highest archetype for a certain kind of programmer is the Vulcan. Be respectful, be noble, but don't "waste time" on pleasantries that are just there to convey warmth and friendliness.
  The problem is that not all programmers are Vulcans, and more and more non-Vulcans have become programmers as coding has become more and more mainstream. But Stack essentially set up test cases for how Vulcan its users are. That tends to be an alienating culture for those who aren't.
- dpkirchner 11 hours ago
  
  It's both a mod in a power trip and application of a rule written to appeal to people who want to trip on power, IMO.
  - Moto7451 10 hours ago
    
    I upvoted you and will “Yes And” your post by saying that the hostility baked into the rules is why I don’t contribute there.
    Reddit has similar problems in some expertise subs and when it boils over a new one will be formed. You can’t do that with Stack sites. Stackexchange has had the advantage of content but when that is available through friendlier and more user friendly methods it’ll really hurt their numbers.
- jncfhnb 10 hours ago
  
  The local customs suck and SO will die because of it
  - Etheryte 10 hours ago
    
    This may be true, but going to a site, not following the rules and then being upset you got moderated is fairly silly. If you don't like the rules, there's plenty of internet out there for everyone.
    
    jncfhnb 8 hours ago
    
    It’s fine for the resolution to be going elsewhere. But it’s also very reasonable to say “hey, you guys are dickbags” as you go. SO encourages dickbag behavior.
    
    bdangubic 10 hours ago
    
    and plenty of internet is where everyone is going as evidenced by the article :) the only reason SO lasted as long as it did is because there were no viable alternatives
- pupppet 9 hours ago
  
  Sure, let random people dick around with text associated with my name on a medium that lasts forever.
fabian2k 11 hours ago

Removing greetings and thanks is commonly done on SO, this wasn't a single user deciding that. This custom is a bit weird if you first encounter it, the main idea is that the questions should be put into a format that is most useful to later visitors and to maximize the signal to noise. The comparison back then was to finding your problem in a forum via a search engine and then having to go through the entire thread to see if there are any answers.
Editing other people's questions to remove parts that are not necessary can seem hostile, but that is not the intent behind this rule.
- jszymborski 11 hours ago
  
  > but that is not the intent behind this rule.
  It does speak to the culture of editing/moderation on SO, though. For many people, myself included, it simply does not work.
  - Tostino 11 hours ago
    
    I tried to help out on the DB side of SO for a little bit, but gave up due to the culture. Not worth my time.
cornstalks 11 hours ago

StackOverflow has always been like that, though, and it's not just big-time moderators who will remove unnecessary language from questions or answers. Personally I like the removal of unnecessary verbiage, but I can understand how some people might feel differently.
That said I haven't logged in to StackOverflow for years.
paxys 11 hours ago

Stack Overflow is a combination of a Q&A forum and a wiki. Both questions and answers are edited all the time for correctness and to keep up with community standards. And there's nothing wrong with that.
As a user just searching for an answer to my technical question I don't want to see flame wars, cultural exchanges, needlessly verbose text, bad grammar, political discussions and anything else that makes it longer and more difficult to get from point A to B. Similar to how I don't go on Wikipedia to experience the article writer's language skills or culture. There are plenty of other forums (ahem Reddit) for all that.
happytoexplain 10 hours ago

I sympathize, especially if the mod was brusque, but in this case I agree with the mod. That rule is helpful to keep information pure and readable. HN has similar rules - thanking somebody is fine, since this is discourse, not Q&A, but e.g. posts like "lol" and "I agree" are discouraged.
SoftTalker 11 hours ago

There's no need to guide the noobs to ChatGPT. That is where they go now. StackOverflow is too much work.
- watwut 11 hours ago
  
  Yeah and what it simultaneously do is that there is significantly less incentive to contribute, answer, write blogs and tutorials. Few except ChatGPT will read and use it. Which in turn means that input for ChatGPT and all of us in general.
  It was nice to be able to find answers on the internet while it lasted.
polishdude20 11 hours ago

It's almost as if the mods turned stack overflow into something that can be easily scraped without emotion or human-ness. The extreme would be to request a question in hson format and answer in strict json format. Super easy to scrape. I'm not surprised it was used to train AI
arwhatever 11 hours ago

Yeah I don’t seem to have as much trouble with LLMs getting angry with me for asking a question on their Q&A site.
IshKebab 11 hours ago

I agree. Mods think it's an FAQ wiki, which is not at all what most users think it is. Definitely rubs me the wrong way when people "clarify" my perfectly fine questions by just messing with grammar. E.g. they'll do things like change "How can I do X in Rust?" to "How to X".
And that's before we even get to the hair-trigger question closing and downvoting. It's always the early votes that are negative too - from the weirdo power users that trawl new questions. They don't understand the question (it's not for them) so they downvote. Then later you get people who have the same question coming from Google who understand it upvoting.
stonesthrowaway 11 hours ago

> A drive by moderator
If there is a disease of modern social media, it's that.
> This is incredibly toxic and stupid, contributes nothing
It's almost like the toxic people who whined about toxic social media became mods everywhere. The most toxic people tend to be the mods in my experience.
> The site years ago was an amazing resource of like minded people helping each other.
Then a few years ago, SO along with many/all social media sites became trash. Almost in tandem. Wonder why and how.

exsomet 8 hours ago

I have tried to use SO on a few different occasions in the last year, and I’m running into an issue where (likely due to declining traffic and user interest) the quality of the answers is really poor because a lot of the information is for older versions of tools/languages/systems that have changed.

It’s a pretty clear indicator that the site is in a death spiral of

1. traffic falls 2. With fewer users, quality of answers falls 3. users can’t find good answers 4. GOTO 1

seydor 11 hours ago

Programmers are losing interest in programming! I mean, that's partly a reason why

nichos 10 hours ago

At the risk of being called a "boomer" I think lots of newer devs got in to the field for the money, and the remote possibilities. In my experience it seems like the genuine curiosity of computers is isn't what it was.
- abnercoimbre 4 hours ago
  
  I have a private Instagram account and there's a new trend of reels: someone asks a student what they're studying and if they say "computer science" or "software engineering" the person replies "oh I'm so sorry. Can you eat today? Here's a dollar."
  You can see that comments on those videos are students who just entered CS and are afraid (questioning whether they should switch majors.)
- SketchySeaBeast 8 hours ago
  
  Well, just a single example, but I got into the field because I was passionate about it. A decade later, most of that's gone, beaten out of me on the altar of scrum ceremonies and endless product churn. I do my job and go home.
  - OnionBlender 6 hours ago
    
    What do you wish you did instead? I hate meetings too, but programming pays a lot more than the other career I considered 20 years ago. (electrician)
    
    SketchySeaBeast 5 hours ago
    
    There's nothing. I don't get the massive united states pay, but I do well. I just can't be bothered to love it anymore. It's a job.

BrandoElFollito 10 hours ago

I have about 200k rep and a 15 years account.

When I want to ask a question I need to bend backwards in order for the "community" not to downvote and tell me that I need X when I ask Y.

Some communities (hello Golang!) are straight toxic telling that if you do not ask a IQ 150 question, get out. I do not think I have question with a positive vote there.

In contrast, when asking on Travel or Cooking I get friendly, reasonable answers. I read the TeX community just because they are so nice (I do not even use TeX or LaTeX :))

I usually get an answer on Reddit, though the quality varies.

I need to use ChatGPT more as it seems to be the short and mid-term future

tzs 9 hours ago

> When I want to ask a question I need to bend backwards in order for the "community" not to downvote and tell me that I need X when I ask Y.
Worse, even if the "community" is right that you do really need X and so all the answers tell you how to do X, there will later be other people who really do need to do Y who are going to find your question.

jszymborski 11 hours ago

I stopped asking/answering questions ages ago as I felt the "moderation" was stifling and frankly unhelpful.

jillesvangurp 10 hours ago

It's been ages since I bothered answering stuff there. It just got really hard to find stuff that I can answer that still needs answering. The signal to noise ratio is terrible. And at the same time when I ask a question, I rarely get good answers. Or any answers. A lot of the people capable of answering simply aren't there any more.
The community quality went down, it seems and this kind of confirms that. It's also much less useful in dealing with weird errors. I end up on Github issues more often than on stack overflow these days. And Google redirecting me to a decade old stack overflow topic on issues with current versions of things that are new is just not helpful. The vast majority of stuff on e.g. gradle is so out of date that most of the solutions stopped working years ago. This is a gradle issue btw. It has poor documentation, and they keep changing things in compatibility breaking ways.
Just a few days ago I had a JVM crash. The error message landed me to a github issue describing the exact issue I had; including a helpful work around. Something related to netty and alpine's libc implementation. Problem solved. That used to be where stackoverflow was the goto solution. Not anymore. It had nothing to offer here. Nobody asked. Nobody answered. Or maybe they did and their SEO sucks. I don't know. But it's a pattern I noticed in recent years where it's just not that helpful anymore and things like github issues are actually a better place to get helpful answers from experts using and producing the software. I actually leave comments there too when I think I can add value.
Stackoverflow is no longer the best place for that level of support.

2-3-7-43-1807 8 hours ago

> ChatGPT is gonna really fuck up SO.

about this seems to be the sentiment here.

i beg to differ - this is a chance for stackoverflow to raise from the ashes.

there are many questions chatgpt or claude can't answer. they just have to be about something very new, a little niche and non-trivial. this is the diet stackoverflow needs and might very well be starved by chatgpt!

the peak times of stackoverflow and many of its stackexchange siblings were just amazing. so many smart people asking interesting questions and providing insightful and competent answers. on there i learned how to ask questions precisely. it was like the practical complement to studying math.

Izkata 8 hours ago

This is what I'm thinking too. StackOverflow has been struggling against floods of users posting low-quality duplicates and homework questions for over a decade, who are now moving somewhere else. This is a chance for it to return to the high-quality Q&A it once was.
- 2-3-7-43-1807 7 hours ago
  
  i think it's fair to say that gpt is indebted to stackoverflow

asdffdasy 10 hours ago

excellent! nobody wants answers that a LLM can answer on SO.

to be honest, nobody wanted questions that were spelled out on all the manuals either. so maybe LLM will handle all of those and let only the interesting ones.

luxuryballs 11 hours ago

So chatGPT is going to have less stuff to learn from

wrs 11 hours ago

I think we now understand that SO's actual historical role was to generate data to bootstrap the LLMs.

383toast 11 hours ago

not surprising

jansan 11 hours ago

Maybe they improved their search algorithm, so people can find what they are looking for. It truly sucked before, and I am still using Google to search for answers on Stackoverflow.

oezi 10 hours ago

They still haven't hidden the related questions from Googlebot, making Google search useless for many complex queries.