
After realizing, by the service's own admission, that AI Assist is able to answer questions about topics covered across the whole network, I decided to have some fun asking it a question taken straight from Arqade.SE.

I tested this twice, and both times the service first referenced the actual content of the question and posted an answer… then, a second later, it removed the answer and replaced it with a warning:

The error message reads:

Sorry, I can't answer that. To ask a new question, please start a new chat. Try asking about coding, development, or topics on the Stack Exchange network.

Animated GIF showcasing the issue: a question is asked and an answer is generated; ten seconds later the answer is replaced by the error message.

This seems like a form of self-censorship, probably triggered by the fact that the source material contains "sensitive" words like "killing" (mobs in a video game -_-'), "bomb" (still within the video game), and so on.

If this is indeed the case, I fear we have two issues:

  • first, the bot's censorship logic fails to differentiate between legitimate use of "sensitive" words and topics in the context of a game and, for example, asking how to build a bomb in real life. I wonder what would happen if I asked "Why did Mami own a book about building bombs in Scene 0 of Puella Magi Madoka Magica?"

  • second, and far more important, whoever built this feature seems to lack the understanding that attempting to censor LLM-generated content after it has already been sent to the user isn't exactly the smartest move: it accomplishes nothing, since the user already got a reply they weren't supposed to get (see the sketch below this list).
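
To illustrate the second point, here is a minimal sketch of the order of operations a server-side filter would use. Everything in it is hypothetical: llm_generate, moderation_check, and the keyword list are placeholders I made up, not anything AI Assist actually runs. The only point is that the check happens before the answer is returned, so a rejected answer never reaches the browser.

    def llm_generate(question: str) -> str:
        # Stand-in for the real model call; returns a canned answer.
        return "Lure the creeper away before it explodes, then keep killing zombies."

    def moderation_check(text: str) -> bool:
        # Hypothetical filter. Note that a naive keyword list like this
        # one would also flag harmless game talk, which is exactly the
        # first issue described above.
        blocked = {"bomb", "killing", "explode"}
        return not any(word in text.lower() for word in blocked)

    def handle_question(question: str) -> str:
        answer = llm_generate(question)
        if not moderation_check(answer):
            # The rejected answer is discarded here; the client never sees it.
            return "Sorry, I can't answer that."
        return answer

    print(handle_question("How do I deal with creepers in Minecraft?"))
    # -> "Sorry, I can't answer that." (the game answer is wrongly blocked,
    #    but at least it was blocked before being displayed)

What AI Assist appears to do instead is serve the answer first and run the check afterwards, which only works if you assume the browser will politely forget what it has already displayed.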

I advise reading "Case 87: The Concubine's Fog" of the Codeless Code collection.

Said Banzen, “Fog makes an excellent curtain, but a poor wall.”

  • What's worse is that when asked on-topic programming questions, it also goes into this "Sorry, I'm going to lie down and cry" mode. After asking an advanced programming question, I mostly get unrelated nonsense as a reply. Then I ask it to clarify itself. More unrelated nonsense. Clarification. "Sorry, I'm going to lie down and cry." It gives up. This can be easily avoided by using chatgpt.com instead.
  • Regarding censorship, I believe that's built into the AI and not specific to SO's bumbling attempts. I played around with ChatGPT at one point, asking it to generate fiction, then prompted it to keep the story going (it was some story about someone stealing a book). It refused, thinking I was asking it how to commit crimes. Then I just prompted it: "No, you dummy, this is still fiction, remember?" And then it merrily remembered what it was doing and went on with it. Remember: talking with ChatGPT is exactly like talking with a person with severe dementia.
  • @Lundin just in case this isn't clear, the issue is not the censorship. The issue is that the censorship is applied as a separate step AFTER the content has already been served to the client. That is something I have never seen before in any online LLM demo. Basically, the bot gets your question, answers it, and posts the answer to the client so it gets briefly displayed in the browser; only after this does it perform some filtering to check whether it should have answered the question, retroactively hiding the already displayed answer if that check fails.
  • @Lundin to continue with your example, this is like asking ChatGPT how to steal a book in real life, having it provide guidance, and then having it refresh the page and say "sorry, I wasn't supposed to say that". Content filtering should happen server-side BEFORE the answer is sent to the client. If you have already sent the client bad / unsafe / NSFW content, it is too late to try to "delete" it. The client already got the info.
  • This doesn't just happen for responses about games and similar content containing words that trip the filter. The behaviour is the same if the bot outputs racial slurs or swear words. For example, the bot can output the n-word, will happily print it on your screen, and then replace the output with the "Sorry" message afterwards. At least the metaSE mods, if not SE staff, have been aware of this since the start of the "experiment", as they suspended me for 7 and then 30 days for reporting that issue in a manner that was not nice enough (and I accept that what I wrote may have been against the CoC, but so is the bot output).
  • @l4mpi I can't really comment on the suspension since I don't have the required context, but even if you were rude, it would have made sense to still look into the issue you had pointed out.
