25
votes
Accepted
Transcode UCS-4BE to UTF-8
Efficient file I/O
By default, files opened with fopen() are buffered, meaning that not every call to fread() or ...
24
votes
Transcode UCS-4BE to UTF-8
log is already declared in <math.h>. You don't need to declare it yourself. In fact, it could be harmful.
As stated in ...
13
votes
Transcode UCS-4BE to UTF-8
This program reads 4 byte codepoints (in BIG ENDIAN) from a file strictly called "input.data" and creates another file called "ENCODED.data" with the relative encoding in UTF8.
Needless to say, ...
12
votes
Accepted
Printing Command Line Unicode Chess Board
numpy
In this case there is no need to use. For an 8 by 8 board, filled with strings, there is no advantage to using it, apart from the possibility to index row and column at the same time
enums
You ...
11
votes
Accepted
Subclassed Python Counter for a more visually user-friendly __str__() method, for anagrams
1. Review
There is no docstring. What kind of object is a LetterCounter?
The Python style guide recommends restricting lines to 79 characters. If you did this, then ...
10
votes
Accepted
A String View Library in C
Only include the header files that are required:
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
It is useless to include both ...
9
votes
Transcode UCS-4BE to UTF-8
regarding:
ptr = fopen("input.data", "rb");
out = fopen("ENCODED.data", "wb");
always check (!=NULL) the returned value to assure the operation was successful....
9
votes
Transcode UCS-4BE to UTF-8
Code fails to detect invalid code points
There are 1,112,064 valid unicode code points, not 232.
The valid range is [0x0 - 0x10FFFF] except the sub-range of [0xD800 - 0xDFFF]. This later sub-range ...
8
votes
Pretty printing of the numpy ndarrays
If A.ndim is not in 1, 2, 3, your code tries to return a non-existing string s. It would be ...
8
votes
Transcode UCS-4BE to UTF-8
As others have said, don't use floating point math, but in some sense that's reviewing the wrong layer. The real issue behind that is that you don't need to be branching on a derived quantity, the ...
8
votes
UTF-8 to UTF-16 (char8_t string to char16_t string)
Good job on general organization: use of namespace and nested detail namespace, use of different sized char types, marking things with noexcept, etc.
Error: Why do you return ...
7
votes
Printing Command Line Unicode Chess Board
Piece dictionary
chrs is a very generic name. Since this is all about chess, you should be more specific by calling it something like ...
7
votes
Accepted
Universal string conversion
First, you should not be using Python 2 anymore, if at all possible. It will reach end of support at the end of the year.
In Python 3 your code would just be:
...
7
votes
Accepted
Convert UTF8 string to UTF32 string in C
There are two main things to talk about: checking the input, and buffer handling (your malloc question).
It's a very bad idea to do things like ...
7
votes
JavaScript string to Unicode (Hex)
This is regarding the edge-cases and test cases mentioned in the question:
[...characters] // or Array.from(characters)
handles splitting the characters of string ...
7
votes
Client server communications through unix signals in C
Why Unicode works
The reason why unicode works is that UTF-8 encoding uses one or more chars per unicode character. Thus, UTF-8 encoded strings fit into a C string, ...
6
votes
Convert UTF8 string to UTF32 string in C
Magic numbers
The implementation uses them a lot. While the bit-notation helps with indicating what is happening, it doesn't show the intention. What reads clearer:
...
6
votes
Accepted
Game of Life improvement
I'm unsure what you mean by saying "improve the user interface". AFAIK Game of life doesn't have much of a user interface as it's a simulation that only displays the grid and whatever happens on it.
...
6
votes
Printing Command Line Unicode Chess Board
Python-specific improvements:
creating bw_checkers (in get_checkers function). Instead of appending repeatedly to previously ...
6
votes
UTF-8 to UTF-16 (char8_t string to char16_t string)
Adding to JDługosz's comments:
Rename utf8_to_utf16() to convert()
Why? Because the function's argument types already restrict ...
6
votes
UTF-8 to UTF-16 (char8_t string to char16_t string)
Re-implementing this kind of low-level conversion seems a waste of effort when C++ goes to some length to provide these facilities for us:
...
5
votes
Accepted
Preliminary draft of a data structure for a goodies tracker
It's weird that your get_capacity() returns a string. I'm guessing that it's to facilitate string concatenation in the print() ...
5
votes
Accepted
UTF-8 to UTF-16 (char8_t string to char16_t string)
suggested architecture
① decode UTF-8 stream
Construct with a range of UTF-8 encoded bytes, present the object as an iterator that provides complete 32-bit characters. It has another member function ...
5
votes
Unicode transfer format conversions as range adapters
The input range mystery
I’ve been planning to write this review for a few days now. The thing that’s held me back is that there is a mystery I haven’t been able to solve: why you were unable to use <...
4
votes
Convert accented character to user name
PHP supports ICU via ext/intl and this includes transliteration.
...
4
votes
Convert UTF8 string to UTF32 string in C
Welcome back to C
Binary constants are not part of standard C - yet
// 0b10000000
0x80
Lack of error detection
i += 2;, <...
4
votes
4
votes
Accepted
Convert to UTF-8 all files in a directory
The code in the post uses the chardet library to determine the encoding of the file, but then the only use it makes of that information is to decide whether or not ...
4
votes
Filter out strings that are short or that are outside the alphabet of a language
It took me a while to decipher what you meant with 'less than n values'. What your function actually does: it filters out all words that are either shorter than the specified length (...
4
votes
Accepted
Normalize a GSM short message text
1. Review
Normally, a class is used to implement a collection of objects with similar behaviour, and an object is used to implement some kind of persistent thing or data structure. But what kind of ...
Only top scored, non community-wiki answers of a minimum length are eligible
Related Tags
unicode × 78python × 19
strings × 19
c++ × 14
c × 11
performance × 9
utf-8 × 8
javascript × 7
regex × 7
i18n × 7
php × 6
parsing × 6
python-3.x × 5
java × 4
c# × 4
beginner × 3
c++11 × 3
console × 3
chess × 3
object-oriented × 2
ruby × 2
vba × 2
python-2.x × 2
unit-testing × 2
hash-map × 2