Questions tagged [text-mining]

Refers to a subset of data mining concerned with extracting information from text data by recognizing patterns. The goal of text mining is often to classify a given document automatically into one of a number of categories, and to improve classification performance dynamically, which makes it an example of machine learning. One example of this type of text mining is the spam filter used for email.

636 questions

1 vote

0 answers

59 views

Statistical test for author determination in corpus

How do I determine authorship in a corpus written by anonymous authors? Edit: Determining the number of distinct authors of a corpus of letters and whether that number differs significantly from what random chance would give.

statistical-significance
text-mining
language-models

guest01

11

asked Jan 13 at 16:37

2 votes

0 answers

41 views

Unsupervised clustering of short texts with covariates

I posted this on the Data Science Stack Exchange and didn’t get any responses (that site seems pretty dead). So I’m trying here! I'm working on a project where I have to categorise short texts. I don'...

clustering
natural-language
unsupervised-learning
text-mining
topic-models

James

45

asked Jul 30, 2024 at 8:23

0 votes

1 answer

688 views

Using Word Embeddings in Clustering and Topic Modelling

I am new to the field of NLP and would appreciate any guidance please. I am trying to understand how word embeddings can be used in clustering and topic modelling. If I create word embeddings for ...

clustering
natural-language
text-mining
word-embeddings
topic-models

osckt

31

asked Jul 6, 2023 at 14:19

0 votes

0 answers

96 views

Adjusted TF-IDF where many terms appear in every document

Struggling with something so hoped the brilliant minds of the internet could help me out. I have a large dataset of job postings from which I have extracted the skill demand (no. of times a skill is ...

r
clustering
natural-language
text-mining
tf-idf

Dandae

3

asked Mar 5, 2023 at 18:09

1 vote

0 answers

96 views

Similarity between two text corpora

I have two separate corpora of text, and I would like to understand whether they are similar or not using cosine similarity. I'm not sure how to approach this problem, but I was thinking as a ...

r
data-mining
text-mining
cosine-similarity

user373562

33

asked Nov 22, 2022 at 10:59
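
One possible starting point for the question above, as a minimal sketch under stated assumptions: treat each corpus as a single bag of words, build raw term-count vectors, and take the cosine of the angle between them. The function name `cosine_similarity`, the whitespace tokenization, and the use of raw counts (rather than TF-IDF or embeddings) are illustrative choices and do not come from the question. Python is used here for illustration even though the question is tagged `r`.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-count vectors of two token lists."""
    counts_a = Counter(tokens_a)
    counts_b = Counter(tokens_b)

    # Only words that occur in both corpora contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty corpus has no direction; report 0.0 by convention (assumption).
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpora, tokenized naively on whitespace.
corpus_a = "the cat sat on the mat".split()
corpus_b = "the dog sat on the log".split()
similarity = cosine_similarity(corpus_a, corpus_b)
```

A value near 1 means the two corpora use words in similar proportions, and a value of 0 means they share no words. Raw counts are dominated by very frequent words, so in practice some weighting or stop-word removal is likely to be needed before the number is informative.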

1 vote

1 answer

542 views

How to avoid underflow of a sentence's probability when calculating the perplexity of a corpus

I am looking at this post How to find the perplexity of a corpus. I understand the whole post, but the probability of a sentence appearing in a corpus, in a unigram model, is given by $p(s)=\prod_{i=1}^{n} p(w_i)$, ...

natural-language
text-mining
perplexity

Qqqq

13

asked Sep 19, 2022 at 1:24
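
A common way to sidestep the underflow described in this question is to work in log space: sum $\log p(w_i)$ instead of multiplying the $p(w_i)$, and exponentiate only the per-token average at the end. The sketch below illustrates that idea for a unigram model. The function name `unigram_perplexity`, the add-one smoothing, and the single extra vocabulary slot for unseen words are assumptions made for the example, not details taken from the question or the post it links to.

```python
import math
from collections import Counter


def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of test_tokens under an add-one-smoothed unigram model."""
    counts = Counter(train_tokens)
    total = sum(counts.values())
    # Assumption: reserve one extra vocabulary slot for all unseen words.
    vocab_size = len(counts) + 1

    log_prob = 0.0
    for word in test_tokens:
        p = (counts[word] + 1) / (total + vocab_size)
        # Summing logs avoids the underflow of a long product of small numbers.
        log_prob += math.log(p)

    # Perplexity = exp(-(1/N) * sum_i log p(w_i)).
    return math.exp(-log_prob / len(test_tokens))


# Hypothetical toy data: the naive product underflows, the log-space sum does not.
train = ["a", "b"]
test = ["a"] * 100_000
naive_product = 0.4 ** len(test)       # underflows to 0.0 in double precision
perplexity = unigram_perplexity(train, test)  # stays finite, approximately 2.5
```

Because the log probabilities are averaged before exponentiating, the result depends on the per-token likelihood rather than on the length of the text, which is why it stays in a representable range.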

1 vote

0 answers

34 views

Processing natural language that describes time frequency with R

I'm dealing with data that describe the onset frequency of a symptom. The text in each cell was not in the same format. For example: ...

r
natural-language
text-mining
frequency

Ian Wang

93

asked Sep 10, 2022 at 8:39

1 vote

0 answers

61 views

Does average or max pooling actually summarise the sentence?

I am working on a multi-label text classification problem at work and adapted the model architecture from this notebook for the Toxic Comment Classification challenge on Kaggle. I have trained the model, a ...

neural-networks
natural-language
text-mining
pooling

Naveen Reddy Marthala

298

asked Jun 13, 2022 at 10:01

2 votes

1 answer

108 views

Text similarity for badly written text

Consider the following scenario: Suppose two lists of words $L_{1}$ and $L_{2}$ are given. $L_{1}$ contains just badly written phrases (like '4ge' instead of 'age' or 'blwe' instead of 'blue' etc.). On ...

natural-language
text-mining
word-embeddings
word2vec
embeddings

Ramiro Hum-Sah

367

asked May 19, 2022 at 0:47

1 vote

1 answer

87 views

How to extract numerical features that separate documents belonging to two different classes well?

I have a group of texts belonging to two different classes. I would like to extract numerical features that separate the two classes well. Right now I have implemented a classic TF-IDF with a document ...

machine-learning
natural-language
text-mining
tf-idf

inginging

21

asked Feb 9, 2022 at 22:43

2 votes

0 answers

59 views

ML generated word choice to create distinct "speakers" [closed]

How hard a project would it be to use ML to assist a single author/script writer in writing dialog where each "speaker" sounds like a distinct person? Is that something that a professional ...

machine-learning
text-mining
multi-class
text-generation

BCS

131

asked Jan 22, 2022 at 5:07

1 vote

3 answers

545 views

How to improve a language model (e.g., BERT) on text unseen in training?

I am using a pre-trained language model for binary classification. I fine-tune the model by training on data from my downstream task. The results are good, almost 98% F-measure. However, when I remove a ...

classification
natural-language
text-mining
language-models

Injy Sarhan

11

asked Dec 22, 2021 at 12:56

1 vote

0 answers

87 views

How to statistically compare the frequencies of two different words in a single corpus

Suppose I have a large corpus of text data and I would like to compare the frequencies of words $w_1$ and $w_2$. How would I go about testing whether or not their respective frequencies, $f_1$ and $...

hypothesis-testing
statistical-significance
natural-language
proportion
text-mining

Joshua

11

asked Dec 10, 2021 at 2:11

0 votes

1 answer

132 views

Search, rank and recommend in large text datasets

Imagine you are Spotify and you have billions of songs. Assume that each of these songs is transcribed into text. How do you design your search and recommendation pipeline such that when somebody ...

machine-learning
text-mining
ranking
recommender-system
cosine-similarity

mhsnk

317

asked Dec 2, 2021 at 22:35

0 votes

1 answer

80 views

How to extract FSAs from postal codes when there is no match?

I would like to extract Canadian FSAs from unstructured data. I want to pull only the first instance of each match. The problem: Some data don't include a postal code and my function won't produce the ...

r
text-mining
information-extraction

sometimes_r

3

asked Oct 23, 2021 at 15:57

Site design / logo © 2025 Stack Exchange Inc; user contributions licensed under CC BY-SA . rev 2025.12.10.37894