Questions tagged [screen-scraping]
Screen scraping is the act of "scraping", or copying, content from a Stack Exchange site and republishing it on a different site. Stack Exchange content is licensed under Creative Commons BY-SA 4.0 and can be freely redistributed as long as the attribution requirements are met. Use this tag for posts about sites that use Stack Exchange content without proper attribution.
39 questions
9 votes, 1 answer, 240 views
The FAQ for scrapers contradicts the Acceptable Use Policy for scraping
At the time of writing, the FAQ for scraping states:
What is a "scraper" and why is that bad?
Historically, SCRAPER here on Stack Exchange meant "Stack Content Republishers Attributing ...
25 votes, 5 answers, 863 views
Should we block OpenAI's Atlas web browser?
The 2024 changes to the data dump were made because:
Simultaneously, we know that companies have scraped or otherwise ingested Stack Overflow and Stack Exchange data to train models without proper ...
14 votes, 0 answers, 212 views
"so long as they follow the Creative Commons attribution requirements" - but they don't
I reported two scrapers which were violating the license* recently, as I'd done some time last year as well. I got this email in return:
Hello,
All content on Stack Exchange is licensed under either ...
11 votes, 1 answer, 840 views
Send a follow-up email when a user reports that some Stack Exchange content is plagiarised by another website
I reported a website (Quora) copying a significant amount of Stack Exchange content via https://meta.stackexchange.com/contact and got the following response from the Stack Overflow support team via ...
-8 votes, 1 answer, 180 views
Is there any software which can scrape review queues?
Is there any software one can use to scrape Stack Exchange review queues of choice and e.g. print a summary to standard output?
A possible input document:
- stackoverflow.com
- triage
- first ...
1 vote, 1 answer, 104 views
Does Stack Overflow scrape other question sites, forums or mailing lists?
I have seen some old posts on Stack Overflow that are repeated in forums and mailing lists. Does Stack Overflow scrape other question sites, forums or mailing lists?
9 votes, 1 answer, 225 views
What's the deal with Altmetric? How reliable is its scraping, and how often does it happen?
Altmetric is a service that attempts to track the online impact of scholarly articles, which it does by keeping score of mentions on Wikipedia, Facebook, Twitter, blogs, and the like and then using ...
1 vote, 0 answers, 24 views
Is there anything specific I should do when encountering an SE scraper? [duplicate]
I've seen a lot of scraped and repackaged content in my time, but I just ran across http://readquestion.com which seems to have scraped the entire network of sites and re-posted the entire thing ...
7 votes, 1 answer, 151 views
How does Stack Exchange identify another website outranking them on Google?
According to the Stack Exchange scraper policy, you shouldn't report sites that:
They follow all the attribution requirements, and don't outrank us on Google
I've been thinking of several ways to use SE ...
4 votes, 2 answers, 326 views
How, and why, did the code in my question end up in Pastebin?
I recently posted a question on Mathematica SE, "Can a package append its context to $DistributedContexts?" Out of idle curiosity I then googled for some of the code in the question, and I was very ...
6 votes, 0 answers, 102 views
Stack Exchange content scraping with obfuscation
The topic of content scraping from Stack Exchange and Stack Overflow is very well known. I thought the admins would like to know that the scrapers have boosted their efforts and are now obfuscating ...
7 votes, 0 answers, 55 views
What is "Trello Answers" and why is it scraping the SE sites? [duplicate]
Recently, I answered a question on Stack Overflow that is very specific to a certain platform. I tried Googling the same question to see who else was having this issue, and if the answer was available ...
11 votes, 1 answer, 700 views
Are page requests rate-limited (throttled)?
I'm building an application that detects plagiarized answers on Stack Overflow, so I need to retrieve the content of answers programmatically.
I know I can do this using the Stack Exchange API, but ...
5 votes, 0 answers, 733 views
Scraping the chatroom
I've been web scraping the chat room with the goal of retrieving some statistics (have a look at this meta post on security.SE: What do you guys want to know about what is said in the DMZ?).
After indexing ...
2 votes, 0 answers, 38 views
Hidden scraper bet6e.com [duplicate]
I have a Google alert on my name and this turned up the link http://meta.bet6e.com/users/512728/jan-doggen:
If you go there, you get redirected to a supposed WordPress blogging site about ...