All Questions
57 questions
3 votes · 3 answers · 130 views
Increase time efficiency when writing arrays to CSV file
I have the following code to amend two rows of "test_base.csv" with the entries of the arrays "a_temp" and "b_temp", saving the result into "result.csv". ...
8 votes · 5 answers · 828 views
Adding data to a CSV file for it to be read
I'm making a program that lets you enter a name and a house, adds them to the CSV file, and then reads the file back to print out "Tre is in house Dragon", etc. The code works; I'm just wondering if ...
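A minimal sketch of the append-then-read-back pattern described above, using only the standard `csv` module. The function names `add_member` and `describe_members` are hypothetical, and the file is assumed to hold two columns (name, house) with no header row.

```python
import csv


def add_member(path, name, house):
    """Append one (name, house) row to the CSV file, creating the file if needed."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([name, house])


def describe_members(path):
    """Read every row back and build the 'X is in house Y' sentences."""
    with open(path, newline="", encoding="utf-8") as f:
        return [f"{name} is in house {house}" for name, house in csv.reader(f)]
```

Usage: `add_member("houses.csv", "Tre", "Dragon")` followed by `for line in describe_members("houses.csv"): print(line)` prints `Tre is in house Dragon`. Opening the file in append mode avoids rewriting existing rows on every new entry.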
4 votes · 1 answer · 284 views
Reorder the Columns in a CSV File in Descending Order
I wrote a script to reorder the columns in a CSV file in descending order and then write to another CSV file. My script needs to be able to handle several tens of millions of records, and I would like ...
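One way to keep memory flat for tens of millions of records is to stream rows through `csv.reader`/`csv.writer` and compute the column permutation once from the header. This sketch assumes "descending order" means sorting the columns by header name, largest first; `reorder_columns_desc` is a hypothetical name.

```python
import csv


def reorder_columns_desc(src_path, dst_path):
    """Copy src to dst with columns sorted by header name, descending.

    Assumes the first row is a header. Rows are streamed one at a time,
    so memory use does not grow with the number of records.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return  # empty input: nothing to write
        # Column indices ordered by header name, largest first; computed once.
        order = sorted(range(len(header)), key=lambda i: header[i], reverse=True)
        writer.writerow([header[i] for i in order])
        for row in reader:
            writer.writerow([row[i] for i in order])
```

Computing `order` once and reusing it for every row keeps the per-row work to a single list comprehension.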
3 votes · 1 answer · 102 views
Calculating the total daily amount of confirmed cases of Coronavirus
I'm writing a small program to plot new COVID-19 infections. As of right now, I have it so the program reads the given data file, pulls out the daily cases and dates for each country, and adds ...
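To get a worldwide daily total, the per-country daily cases can be summed per date. This is a minimal sketch assuming the data has already been pulled into `(date, country, new_cases)` tuples with ISO-formatted date strings; that layout and the name `total_daily_cases` are assumptions, not the original data file's format.

```python
from collections import defaultdict


def total_daily_cases(records):
    """Sum new cases across all countries for each date.

    `records` is an iterable of (date, country, new_cases) tuples; this
    layout is an assumption, not the original data file's format.
    """
    totals = defaultdict(int)
    for date, _country, cases in records:
        totals[date] += cases
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return dict(sorted(totals.items()))
```

The returned dictionary is ordered by date, so its keys and values can be passed straight to a plotting call as the x and y series.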
3 votes · 3 answers · 2k views
Efficiently convert a 60 GB JSON file to a CSV file
Description
Simply take a JSON file as input and convert the data in it into a CSV file. I won't describe the functionality in too much detail since I have reasonable docstrings for that. As you can ...
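A file this size cannot be loaded with a single `json.load`, so it has to be streamed. This sketch assumes the input is JSON Lines (one flat JSON object per line); if it is instead one giant JSON array, a streaming parser would be needed and this approach does not apply. The function name `jsonl_to_csv` is hypothetical.

```python
import csv
import json


def jsonl_to_csv(src_path, dst_path):
    """Stream a JSON Lines file (one flat JSON object per line) into CSV.

    Only one line is held in memory at a time. Column names come from the
    first object; keys missing from later objects become empty cells, and
    keys not present in the first object are dropped.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = None
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            obj = json.loads(line)
            if writer is None:
                # Header is fixed by the first record seen.
                writer = csv.DictWriter(dst, fieldnames=list(obj), extrasaction="ignore")
                writer.writeheader()
            writer.writerow(obj)
```

Fixing the header from the first record keeps the conversion single-pass; if later records can introduce new keys, a first pass to collect the full key set would be required instead.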
9 votes · 3 answers · 609 views
Performance: read a large number of XMLs and load them into a single CSV
I am dealing with a large number of XML files, which I obtained from here: https://clinicaltrials.gov/ct2/resources/download#DownloadAllData. The download yields around 300,000 XML files of similar ...
10 votes · 2 answers · 2k views
Are there ways to speed up this string fuzzy matching in Golang?
I have a piece of Python code doing fuzzy matching which works very well and is pretty fast. For reference, it uses the following files:
https://raw.githubusercontent.com/datasets/s-and-p-500-...
4 votes · 1 answer · 2k views
Reading multiple CSV files into a single dataframe
I have a lot of compressed CSV files in a directory. I want to read all those files into a single dataframe. This is what I have done so far:
...
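A common pandas pattern for this is to read each file and concatenate once at the end, rather than appending to a growing dataframe inside the loop. This sketch assumes gzip-compressed files named `*.csv.gz` with identical columns; the `pattern` default and the name `read_compressed_csvs` are hypothetical. `pd.read_csv` infers the compression from the file extension by default.

```python
import glob
import os

import pandas as pd


def read_compressed_csvs(directory, pattern="*.csv.gz"):
    """Read every matching compressed CSV in `directory` into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    if not paths:
        return pd.DataFrame()  # pd.concat raises on an empty list
    # read_csv's default compression="infer" picks gzip for ".gz" files.
    frames = [pd.read_csv(p) for p in paths]
    # A single concat avoids repeatedly copying a growing DataFrame.
    return pd.concat(frames, ignore_index=True)
```

`ignore_index=True` gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.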
2 votes · 1 answer · 1k views
Multithreading to process requests and save results in Python
I was presented with a task to come up with a script that generates a CSV of postal codes via brute force (I'm in Brazil, so for us that means CEP).
Points to note:
I'm using an external library, but ...
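Since each lookup is I/O-bound, a thread pool can run many requests concurrently while a single thread writes the CSV. In this sketch `lookup` is a placeholder for whatever call the external library actually provides (assumed to return `None` for an invalid code); `scan_codes` and `max_workers=16` are hypothetical choices.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def scan_codes(codes, lookup, out_path, max_workers=16):
    """Run `lookup` on every candidate code in worker threads and save hits to CSV.

    `lookup` is a placeholder: it takes one code and returns a result, or
    None when the code is invalid. Only the main thread writes the file,
    so no lock is needed around the csv writer.
    """
    def probe(code):
        # Runs in a worker thread; pairs each code with its result.
        return code, lookup(code)

    with ThreadPoolExecutor(max_workers=max_workers) as pool, \
         open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["code", "result"])
        # Executor.map submits every task up front and yields results in
        # input order; for very large ranges, feed it in chunks instead.
        for code, result in pool.map(probe, codes):
            if result is not None:
                writer.writerow([code, result])
```

Keeping all file writes in the main thread avoids interleaved rows without any explicit synchronization.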
2 votes · 1 answer · 55 views
Batch retrieve formatted address along with geometry (lat/long) and output to csv
I have a CSV file with 3 fields, two of which interest me: Merchant_Name and City.
My goal was to output multiple CSV ...
2 votes · 1 answer · 91 views
Grouping sales transactions by person
I have a CSV file with sales transactions. Each transaction includes person identifiers (which are sometimes/often missing) and transaction data. Person identifiers are fname, lname, phone, email and ...
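One possible approach (not necessarily the asker's) is a union-find over transactions: any two transactions sharing a non-missing identifier value are merged, so chains of shared phones or emails end up in one group. This sketch assumes each transaction is a dict and links only on `phone` and `email` by default; that choice of linking fields and the name `group_by_person` are hypothetical.

```python
def group_by_person(transactions, keys=("phone", "email")):
    """Cluster transactions that share any non-missing identifier value.

    Uses a union-find (disjoint set) over transaction indices: two
    transactions land in the same group if a chain of shared values in
    `keys` connects them. Returns a list of index groups.
    """
    parent = list(range(len(transactions)))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # (field, value) -> index of first transaction with it
    for idx, tx in enumerate(transactions):
        for field in keys:
            value = tx.get(field)
            if not value:
                continue  # missing identifier: nothing to link on
            key = (field, value)
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups = {}
    for idx in range(len(transactions)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())
```

Linking only on fields that are close to unique (phone, email) avoids merging different people who merely share a first or last name.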
3 votes · 1 answer · 115 views
Parse data into CSV prior to a bulk transaction into Neo4j
I'm trying to parse data into a CSV prior to a bulk transaction into Neo4j. I'm using vast amounts of data in the relationships and wondered if anyone could help speed up the transactions below. ...
1 vote · 1 answer · 83 views
Taking text from a file and formatting it
My code takes numbers from a large text file, then splits it to organise the spacing and to place it into a 2-dimensional array. The code is used to get data for a job scheduler that I'm building.
...
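Splitting on arbitrary whitespace handles irregular spacing without any manual cleanup. This minimal sketch assumes the file holds whitespace-separated integers, one row per line; `parse_number_grid` is a hypothetical name, and the file contents would be passed in via `open(path).read()`.

```python
def parse_number_grid(text):
    """Turn whitespace-separated integers into a list of rows (a 2-D array).

    str.split() with no argument collapses runs of spaces and tabs, and
    blank lines are skipped.
    """
    return [
        [int(tok) for tok in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]
```

For example, `parse_number_grid("1  2 3\n\n4 5   6\n")` returns `[[1, 2, 3], [4, 5, 6]]`.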
4 votes · 2 answers · 468 views
Parsing contents of a large zip file into an HTML parser into a .csv file
I have some zip files, somewhere on the order of 2 GB+, containing only HTML files. Each zip contains about 170,000 HTML files.
My code reads the files without extracting them,
passes the ...
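Reading members straight out of the archive with `zipfile` avoids extracting anything to disk. This sketch uses the standard-library `html.parser` and, as a stand-in for whatever fields the real code extracts, collects only each page's `<title>`; `TitleParser` and `zip_html_to_csv` are hypothetical names.

```python
import csv
import zipfile
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside <title> (a stand-in for the real fields)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # html.parser lowercases tag names
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def zip_html_to_csv(zip_path, csv_path):
    """Read each .html/.htm member from the archive and write one CSV row per file."""
    with zipfile.ZipFile(zip_path) as zf, \
         open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "title"])
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith((".html", ".htm")):
                continue
            # zf.read() decompresses one member into memory; nothing touches disk.
            html = zf.read(info).decode("utf-8", errors="replace")
            parser = TitleParser()
            parser.feed(html)
            parser.close()
            writer.writerow([info.filename, parser.title.strip()])
```

Only one member is decompressed at a time, so memory use is bounded by the largest single HTML file rather than the 2 GB archive.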
8 votes · 2 answers · 10k views
String Similarity using fuzzywuzzy on big data
I have a file in which I need to check the string similarity among the names in a particular column. I use the fuzzywuzzy token sort ratio algorithm, as it is required for my use case. Here is the code; is ...
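The token sort idea is: normalise each string, sort its tokens, rejoin them, then compare the results so word order stops mattering. This sketch reimplements that idea with the standard-library `difflib.SequenceMatcher` instead of fuzzywuzzy, so scores are close to but not guaranteed identical to `fuzz.token_sort_ratio` (especially when python-Levenshtein is installed). The helper names and the `threshold=90` default are hypothetical, and normalisation here is ASCII-only.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def sorted_tokens(s):
    """Lowercase, replace non-alphanumerics (ASCII only) with spaces, sort tokens."""
    tokens = re.sub(r"[^0-9a-z]+", " ", s.lower()).split()
    return " ".join(sorted(tokens))


def token_sort_ratio(a, b):
    """Score 0-100: similarity of the two strings after token sorting."""
    return round(100 * SequenceMatcher(None, sorted_tokens(a), sorted_tokens(b)).ratio())


def similar_pairs(names, threshold=90):
    """Return (i, j, score) for every pair of names scoring >= threshold.

    Each name is normalised once up front instead of once per comparison;
    the pairwise loop itself is still O(n^2) in the number of names.
    """
    prepared = [sorted_tokens(n) for n in names]
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        score = round(100 * SequenceMatcher(None, prepared[i], prepared[j]).ratio())
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```

Pre-normalising the column is a cheap win, but for very large columns the quadratic pair loop dominates, so some form of blocking (only comparing names that share a token or prefix) is usually needed on top.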