wizzwizz4

The current implementation looks quite bad, but there's a lot to praise in this announcement. It also seems I have many thoughts about it. (My kingdom for <details> / <summary>!)

  • You've clearly set out the original goals of the project.

    • Onboarding new users who have questions. This involves:
      • Distinguishing between novel questions, and those already answered on the network, so that they can be handled differently.

        • This has been the goal of quite a lot of design decisions in the past, especially as regards the Ask Question flow. A chat interface has the potential to do a much better job, especially if we have systems that can identify when a question is novel.
        • A system that can identify when questions are novel could be repurposed in other ways, such as duplicate detection. However, unless you're constructing an auxiliary database (à la Wikidata or Wikifunctions), the low-hanging fruit for duplicate detection can be accomplished better and more easily by a domain expert using a basic search engine (a sketch of that baseline follows this list).
        • Novice askers often require more support than experienced askers, and different genres of question require different templates. A chat interface combining fact-finding (à la the Ask Wizard) and FAQs, perhaps with some rules to catch common errors (like a real-time Staging Ground lite), could cover some bases that the current form-based Ask Wizard doesn't.
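
        On the duplicate-detection point, a minimal sketch of the "basic search engine" baseline: rank existing questions by TF-IDF cosine similarity and hand the top candidates to a domain expert. The scikit-learn calls and the toy question list are assumptions for illustration, not part of the original suggestion.

        ```python
        # Hypothetical baseline: rank existing questions by TF-IDF similarity to a
        # new question, then let a domain expert review the top candidates.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        existing_questions = [
            "How do I undo the most recent local commits in Git?",
            "What is the difference between 'git pull' and 'git fetch'?",
            "How do I check out a remote Git branch?",
        ]

        def duplicate_candidates(new_question: str, top_k: int = 3):
            vectorizer = TfidfVectorizer(stop_words="english")
            matrix = vectorizer.fit_transform(existing_questions + [new_question])
            scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
            ranked = sorted(zip(scores, existing_questions), reverse=True)
            return ranked[:top_k]  # candidates only; a human decides what's a duplicate

        for score, title in duplicate_candidates("undo last git commit"):
            print(f"{score:.2f}  {title}")
        ```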
      • Presenting users with existing material, in a form where they understand that it solves their problems.

        • A system that provides users with incorrect information is worse than useless: it's actively harmful. Proper attribution allows us to remove that information from the database (see Bryan Krause's answer), and – more importantly – prevents the AI agent from confabulating new and exciting errors. (Skeptics Stack Exchange can handle the same inaccurate claim repeated widely, but would not be able to cope with many different inaccurate claims repeated a few times each.)
      • Imparting expertise to users, so they need less hand-holding in future.

        To use an example from programming: many newbies don't really get that variable names are functionally irrelevant, nor how completely the computer ignores comments and style choices, so if an example looks too different from their code, they can't interpret it. This skill can be learned, but some people need a bit of a push.

        • This is teaching, and therefore hard. I'd be tempted to declare this out of scope, although there are ways that a chat interface could help with this: see, for example, Rust error codes (which are conceptually a dialogue between teacher and student – see E0562 or E0565). Future versions of stackoverflow.ai could do this kind of thing.

        • Next-token prediction systems are particularly bad at teaching, because they do not possess the requisite ability to model human psychology. This is a skill that precious few humans possess – although many teachers who don't have this skill can still get good outcomes by using and adapting the work of those who do (which is a skill in itself).

        • Y'know what is good at teaching, in text form? Books! (And written explanations, more generally.) A good book can explain things as well as, or even better than, a teacher, especially when you start getting deep into a topic (where not much is fundamentals any more, and readers who don't immediately understand an explanation can usually work it out themselves). But finding good books is quite hard. And Stack Exchange is a sort of library…

        • Stack Exchange is not currently well-suited for beginner questions. When people ask a question that's already been answered, we usually close it as a duplicate (and rightly so!), so encouraging such users to post new questions is (as it stands) the wrong approach. However, beginners often require things to be explained in multiple ways, before it clicks. Even if one question has multiple answers from different perspectives, the UI isn't particularly suited for that.

          I suspect that Q&A pairs aren't the right way to represent beginner-help: instead, it should be more like a decision tree, where we try to identify what misunderstandings a user has, and address them. Handling this manually gets quite old, since most people have the same few misconceptions: a computer could handle this part (a sketch follows below). But some people have rarer misconceptions: these could be directed to the community, and then worked into the decision tree once addressed.

          As far as getting the rarer misconceptions addressed, it might be possible to shoe-horn this into the existing Q&A system, by changing the duplicate system. (Duplicates that can remain open? Or perhaps a policy change would suffice, if we can reliably ensure that the different misconceptions are clear in a question's body.)
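
          A minimal sketch of that decision-tree idea, assuming a small hand-curated tree of common misconceptions with a fall-through to the community for rare ones; the node structure and the example questions are illustrative, not a proposed schema.

          ```python
          # Illustrative only: a tiny hand-curated decision tree for triaging common
          # beginner misconceptions, falling through to the community for rare ones.
          from dataclasses import dataclass

          @dataclass
          class Node:
              question: str                       # yes/no question put to the beginner
              if_yes: "Node | str | None" = None  # next node, a canned explanation, or escalate
              if_no: "Node | str | None" = None

          tree = Node(
              question="Did you rename the variables in the example to match your code?",
              if_no="Variable names are arbitrary: rename them consistently and re-run.",
              if_yes=Node(
                  question="Does the error mention an undefined name?",
                  if_yes="You probably renamed a variable in one place but not everywhere.",
                  if_no=None,  # rare misconception: route to the community, then fold
                               # the eventual answer back into the tree
              ),
          )

          def triage(node: Node, answers: list[bool]) -> str:
              for yes in answers:
                  branch = node.if_yes if yes else node.if_no
                  if branch is None:
                      return "No match: escalate to the community."
                  if isinstance(branch, str):
                      return branch
                  node = branch
              return node.question  # ran out of answers: ask the next question

          print(triage(tree, [False]))        # canned explanation
          print(triage(tree, [True, False]))  # escalation path
          ```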

    • Imitating ChatGPTs' interfaces, for familiarity.
      • I'm not sure why "conversational search and discovery" has an additional list item, since this seems to me like the same thing. (Functional specification versus implementation?)
    • Competing with ChatGPTs, by being more useful.
      • I think focusing on differentiation, and playing to our strengths (not competing with theirs), is key here: I'm really glad you're moving in this direction. An OverflowAI that was just a ChatGPT was, I think, a huge mistake.
  • You've finally acknowledged that LLM output is neither attributed, nor really attributable. Although,

    • LLMs cannot return attribution reliably

      GPT models cannot return attribution at all. I'm still trying to wrap my head around what attribution would even mean for GPT output. Next-token generative language models compress the space of prose in a way that makes low-frequency provenance information rather difficult to preserve, even in principle – and while high-frequency / local provenance information could in principle be preserved, the GPT architecture doesn't even try to preserve it. (I expect quantisation-like schemes could reduce high-frequency provenance overhead to manageable levels in the final model, but I think you'd have to do something clever to train an attributing model without a factor-of-a-billion overhead.)

      Embedding all posts (or, all paragraphs?) on the network into a vector space with useful similarity properties would cut the provenance overhead from exponential (i.e., linear space) to linear (i.e., constant space). This scheme only allows you to train a language model to fake provenance quite well, which isn't attribution either: that's essentially just a search algorithm. (We're back where we started: I don't expect this to be better than more traditional search algorithms.)
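
      To make that concrete, a minimal sketch of the "essentially just a search algorithm" version: embed every paragraph once, then return the nearest stored paragraphs for a generated sentence. The sentence-transformers library and model name are assumptions for illustration; this retrieves plausible sources, it does not establish provenance.

      ```python
      # Sketch: nearest-neighbour lookup over paragraph embeddings. This finds
      # plausible sources for generated text -- it is search, not attribution.
      import numpy as np
      from sentence_transformers import SentenceTransformer  # assumed library choice

      model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model

      paragraphs = [
          "Use git revert to create a new commit that undoes an earlier one.",
          "git reset --hard moves the branch pointer and discards local changes.",
      ]
      index = model.encode(paragraphs, normalize_embeddings=True)  # one vector per paragraph

      def plausible_sources(generated_sentence: str, top_k: int = 2):
          query = model.encode([generated_sentence], normalize_embeddings=True)[0]
          scores = index @ query                # cosine similarity (unit-norm vectors)
          best = np.argsort(scores)[::-1][:top_k]
          return [(float(scores[i]), paragraphs[i]) for i in best]

      print(plausible_sources("You can undo a commit with git revert."))
      ```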

    • analyzes for correctness and comprehensiveness in order to supplement it with knowledge from the LLM

      There is no "knowledge from the LLM". That knowledge is always from somewhere else. (The rare exceptions, novel valid connections between ideas that the language model has made, are drowned out by the novel invalid connections that the language model has made: philosophically, I'd argue that this is not knowledge.) Maybe you still don't quite get it, yet.

  • Your implementation is still deficient:

    A response is created using multiple steps via RAG + multiple rounds of LLM processing

    We created an AI Agent to act as an “answer auditor”: it reads the user’s search, the quotes from SO & SE content, and analyzes for correctness and comprehensiveness

    You're using the generative model as a "god of the gaps". Anything you don't (yet) know how to do properly, you're giving to the language model. And while the LLM introduces significant problems, I cannot find it in me to be upset about this approach: if something's worth making, it's worth making badly. Where you aren't familiar with the existing techniques for producing chat-like interfaces (and there is copious literature on the subject), filling in the gaps with what you have to hand… kinda makes sense?

    But all the criticisms that the phrase "god of the gaps" was originally coined to describe apply to this approach just as well. There are better ways to fill in these gaps, and I hope you'll take them just as soon as you know what they are.
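
    For reference, a minimal sketch of the pipeline described in the quotes above (retrieve quotes, draft an answer, then run an "auditor" pass). The function names and the llm() placeholder are mine, not the actual implementation; the structure is only inferred from the announcement's wording.

    ```python
    # Sketch of a retrieve -> draft -> audit loop, inferred from the announcement.
    # retrieve_quotes() and llm() are placeholders, not real APIs; each one marks a
    # gap currently being filled by the generative model.
    def retrieve_quotes(query: str) -> list[str]:
        """Placeholder for search over SO/SE content (the RAG step)."""
        raise NotImplementedError

    def llm(prompt: str) -> str:
        """Placeholder for a call to some language model."""
        raise NotImplementedError

    def answer(query: str) -> str:
        quotes = retrieve_quotes(query)
        draft = llm(f"Answer using only these quotes:\n{quotes}\n\nQuestion: {query}")
        audit = llm(
            "Check this draft for correctness and comprehensiveness against the quotes.\n"
            f"Quotes: {quotes}\nDraft: {draft}\nReply OK or give a revision."
        )
        return draft if audit.strip() == "OK" else audit
    ```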

  • You've identified some ways people are using stackoverflow.ai. These include:

    • traditional technical searches

      • help with error messages,

      • how to build certain functions,

      • what code snippets do

    • comparing different approaches and libraries

    • asking for help architecting and structuring apps

    • learning about different libraries and concepts.

    • The majority of queries are technical

    This is extremely valuable information: you can use it as phase 1 of a Wizard of Oz design. However, I don't think you benefit much from keeping the information secret, since only a few players are currently positioned to take advantage of it, and they're all better able to gather it than you are.

    Letting us at this dataset (redacted, of course) for mostly-manual perusal would let us construct expert systems, which could be chained together à la DuckDuckHack. Imagine a system like Alexa, but with the cohesiveness (and limited scope) of the Linux kernel. Making something like this work, and work well, requires identifying the low-hanging fruit: it's one great big application of the 80/20 rule.
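
    A minimal sketch of what chained expert systems could look like here: small, narrow handlers tried in order, each claiming only the queries it can answer well, with plain search as the fallback. The handler names and regex patterns are illustrative, not a proposal for specific rules.

    ```python
    # Sketch of DuckDuckHack-style chaining: narrow handlers tried in order, each
    # claiming only the queries it can handle, with plain search as the fallback.
    import re
    from typing import Callable, Optional

    Handler = Callable[[str], Optional[str]]

    def error_message_handler(query: str) -> Optional[str]:
        # Fires only on queries that look like a pasted error message.
        if re.search(r"traceback|exception|error", query, re.IGNORECASE):
            return f"Looks like an error message; searching known errors for {query!r}"
        return None

    def comparison_handler(query: str) -> Optional[str]:
        if re.search(r"\bvs\.?\b|\bcompare\b", query, re.IGNORECASE):
            return f"Comparison query; pulling library comparisons for {query!r}"
        return None

    def fallback_search(query: str) -> Optional[str]:
        return f"Plain search for {query!r}"

    HANDLERS: list[Handler] = [error_message_handler, comparison_handler, fallback_search]

    def route(query: str) -> str:
        for handler in HANDLERS:
            result = handler(query)
            if result is not None:
                return result
        return "No handler matched."  # unreachable while fallback_search is last

    print(route("numpy vs pandas for tabular data"))
    print(route("TypeError: 'NoneType' object is not subscriptable"))
    ```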

  • The demographic also shows that it's a different set of users than stackoverflow.com, so there are good signs here for acquiring new community members over time.

    This doesn't follow. We've long known that most users never actively interact with the site (this is a good thing, for much the same reason that many readers are not authors). There's no reason to believe you can – or, more pertinently, should – be "acquiring" them as community members. (As users, maybe: that's your choice whether to push them to register accounts, so long as it doesn't hurt the community.)