I'm trying to parse a large semicolon-separated CSV-style file (UnicodeData.txt, to be exact) into a Map from numbers to objects. Because the file is big and may take a while to download, the code parses it while it is still downloading, so that the work is spread out instead of all happening at once after the download finishes. Here's the code:
```javascript
// Uses top-level await, so this needs to run in an ES module (or the DevTools console).
// Assumed: `ucd` is defined elsewhere in the real code, roughly like this.
const ucd = { codePoints: new Map() };

const unicodeDataReader = (await fetch("data/ucd/UnicodeData.txt")).body.getReader();
const decoder = new TextDecoder();
let chunk, done;
let codePoint, column = 0, fieldBytes = [], codePointObj = {};
while ({ value: chunk, done } = await unicodeDataReader.read(), !done) {
  for (const byte of chunk) {
    if (byte === 0x3B) { // ";" ends a field
      const a = new Uint8Array(fieldBytes);
      const field = decoder.decode(a);
      switch (column) {
        case 0:
          codePoint = Number.parseInt(field, 16);
          break;
        case 1:
          codePointObj.name = field;
          break;
        case 2:
          codePointObj.generalCategory = field;
          break;
      }
      fieldBytes.length = 0;
      column++;
    } else if (byte === 0x0A) { // "\n" ends a line (one record)
      ucd.codePoints.set(codePoint, codePointObj);
      fieldBytes.length = 0;
      column = 0;
      codePointObj = {};
    } else {
      fieldBytes.push(byte);
    }
  }
}
```
However, this code performs very poorly (even when downloading from localhost) and I don't know why. Chrome DevTools says that the lines that take the most time to execute are:

```javascript
const a = new Uint8Array(fieldBytes);
const field = decoder.decode(a);
```
The weirdest thing is that this similar approach seems to work much better, but it may break if a character is split between two chunks. (This doesn't happen with this file, because it contains only ASCII characters, but I'm planning on adapting this code for other similar files.)
```javascript
const unicodeDataReader = (await fetch("data/ucd/UnicodeData.txt")).body.getReader();
const decoder = new TextDecoder();
let chunk, done;
let codePoint, column = 0, field = "", codePointObj = {};
while ({ value: chunk, done } = await unicodeDataReader.read(), !done) {
  for (const char of decoder.decode(chunk)) {
    if (char === ";") { // ";" ends a field
      switch (column) {
        case 0:
          codePoint = Number.parseInt(field, 16);
          break;
        case 1:
          codePointObj.name = field;
          break;
        case 2:
          codePointObj.generalCategory = field;
          break;
      }
      field = "";
      column++;
    } else if (char === "\n") { // "\n" ends a line (one record)
      ucd.codePoints.set(codePoint, codePointObj);
      field = "";
      column = 0;
      codePointObj = {};
    } else {
      field += char;
    }
  }
}
```
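On the split-character concern with this second approach: `TextDecoder` can handle it itself when each chunk is decoded with `{ stream: true }`, because it then keeps an incomplete multi-byte sequence buffered until the next call. A minimal sketch, using two hand-made chunks (made up for illustration) that cut the UTF-8 encoding of "é" (bytes 0xC3 0xA9) in half:

```javascript
// Sketch: a character split across two chunks, decoded with and without streaming.
const decoder = new TextDecoder(); // UTF-8 by default

// "a", then the first byte of "é" | the second byte of "é", then "b"
const chunks = [new Uint8Array([0x61, 0xC3]), new Uint8Array([0xA9, 0x62])];

// With { stream: true }, the dangling 0xC3 is held back until the next call.
let text = "";
for (const chunk of chunks) {
  text += decoder.decode(chunk, { stream: true });
}
text += decoder.decode(); // flush whatever is still buffered at end of input
console.log(text); // "aéb"

// Decoding each chunk on its own turns each half of "é" into U+FFFD.
const naive = chunks.map((c) => new TextDecoder().decode(c)).join("");
console.log(naive); // "a\uFFFD\uFFFDb" (two replacement characters)
```

In the loop above, that would mean writing `decoder.decode(chunk, { stream: true })` and calling `decoder.decode()` once after the loop ends.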
I thought the problem was that decoder.decode() was being called too often, or that creating a Uint8Array was slow. However, that doesn't seem to be the issue, because this code (10,000 decode() calls on freshly built 100-byte arrays) runs very fast:
```javascript
const td = new TextDecoder();
const decoded = [];
for (let i = 0; i < 10000; i++) {
  // Generate a Uint8Array with 100 random bytes
  const a = new Uint8Array(function* () {
    for (let j = 0; j < 100; j++) yield Math.floor(256 * Math.random());
  }());
  // Copy it byte by byte into a plain array, as the parser does with fieldBytes
  const b = [];
  for (const byte of a) b.push(byte);
  const c = new Uint8Array(b);
  decoded.push(td.decode(c));
}
```
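One way to test the "called too often" idea more directly is to scale the benchmark up toward one `decode()` call per field and compare it with a single call over the same bytes. This is a rough timing sketch: `FIELD_COUNT` and `FIELD_BYTES` are hypothetical values, not measurements of the real file, and the timings will differ between engines and machines.

```javascript
// Rough timing sketch: many small decode() calls vs. one large call over the
// same bytes. FIELD_COUNT and FIELD_BYTES are hypothetical; tune them to
// resemble the real file (number of fields, typical field length).
const FIELD_COUNT = 200_000;
const FIELD_BYTES = 8;

const td = new TextDecoder();
// All-ASCII input (the letter "A" repeated), like the question's file.
const all = new Uint8Array(FIELD_COUNT * FIELD_BYTES).fill(0x41);

// Variant 1: one decode() per field, each on a view into the shared buffer.
let t0 = performance.now();
let totalLength = 0;
for (let i = 0; i < FIELD_COUNT; i++) {
  const field = all.subarray(i * FIELD_BYTES, (i + 1) * FIELD_BYTES);
  totalLength += td.decode(field).length;
}
const smallMs = performance.now() - t0;

// Variant 2: a single decode() over all of the bytes.
t0 = performance.now();
const big = td.decode(all);
const bigMs = performance.now() - t0;

console.log(`${FIELD_COUNT} small decode() calls: ${smallMs.toFixed(1)} ms`);
console.log(`1 large decode() call: ${bigMs.toFixed(1)} ms`);
```

If the per-field variant is much slower at this scale, that would point to per-call overhead rather than to the amount of data being decoded.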
How can I improve the performance of my code?
P.S.: I don't have network throttling enabled. The code is slow, and the main thread freezes for several seconds.
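For comparison, here is a sketch of a different structure (not necessarily the right fix): decode each network chunk once with `{ stream: true }`, split the decoded text into lines, carry only an incomplete last line over to the next chunk, and split each complete line on ";". It assumes UTF-8 input, "\n" line endings, and that only the first three fields are needed; `parseUnicodeData` is a hypothetical name, and the `{ name, generalCategory }` records mirror the question's `codePointObj`.

```javascript
// Sketch, assuming UTF-8 input with "\n" line endings and ";"-separated fields
// (as in UnicodeData.txt). parseUnicodeData is a hypothetical helper name.
async function parseUnicodeData(stream) {
  const codePoints = new Map();
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let pending = ""; // incomplete last line carried over between chunks

  function handleLine(line) {
    if (line === "") return; // skip empty lines, incl. the leftover after a final "\n"
    // Only the first three fields are needed; the limit drops the rest.
    const [hex, name, generalCategory] = line.split(";", 3);
    codePoints.set(Number.parseInt(hex, 16), { name, generalCategory });
  }

  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    // { stream: true } keeps a multi-byte character that is split across
    // chunks buffered inside the decoder until the next call.
    const lines = (pending + decoder.decode(value, { stream: true })).split("\n");
    pending = lines.pop(); // may be a partial line; finish it with the next chunk
    for (const line of lines) handleLine(line);
  }
  // Flush the decoder and handle a last line that has no trailing "\n".
  handleLine(pending + decoder.decode());
  return codePoints;
}
```

With the question's setup it would be called as `await parseUnicodeData((await fetch("data/ucd/UnicodeData.txt")).body)`. Each chunk still becomes one string and one array of lines, but `decode()` runs once per chunk instead of once per field.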