this post was submitted on 24 Feb 2025
412 points (98.6% liked)

[–] [email protected] 10 points 3 days ago* (last edited 3 days ago) (1 children)

How do devs make this mistake

it can happen many different ways if you're not explicitly watching out for these types of things

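for example, let's say you have a csv file with a bunch of names

id,last_name
1,schaffer
2,thornton
3,NULL
4,smith
5,"NULL"

if you use the following to import into postgres

COPY user_data (id, last_name)
FROM '/path/to/data.csv'
WITH (FORMAT csv, HEADER true, NULL 'NULL');

number 5 will be imported as the string "NULL" but number 3 will be imported as a NULL value, because COPY only treats the unquoted null string as NULL. (note there are no spaces after the commas: in csv format postgres treats every character as significant, so " NULL" with a leading space would not match.) of course, this is why you sanitize the data (GIGO) but I can imagine this happening countless times at companies all over the country

there are easy fixes if you're paying attention

COPY user_data (id, last_name)
FROM '/path/to/data.csv'
WITH (FORMAT csv, HEADER true, NULL '');

makes only an unquoted empty field a NULL value. that is also the default for FORMAT csv, so with it both number 3 and number 5 come in as the string "NULL".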
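
to make the quoting rule concrete, here's a rough js sketch of how a loader could decide between a NULL and a string for each field. this is not postgres's actual parser: parseCsvLine and nullString are made-up names, and it assumes simple lines with no escaped quotes and no commas inside quotes. it only mirrors the documented rule that an unquoted field equal to the null string becomes NULL while a quoted field is always a string.

```javascript
// Hypothetical sketch, not postgres's real parser.
// Assumes simple lines: no escaped quotes, no commas inside quotes.
function parseCsvLine(line, nullString = "") {
  return line.split(",").map((field) => {
    const quoted =
      field.length >= 2 && field.startsWith('"') && field.endsWith('"');
    if (quoted) {
      // quoted field: always a real string, even if it spells the null string
      return field.slice(1, -1);
    }
    // unquoted field: exact comparison, every character counts
    return field === nullString ? null : field;
  });
}

const rows = ['3,NULL', '5,"NULL"', '6,'];

// null string is the text NULL: only the unquoted NULL in row 3 becomes null
console.log(rows.map((r) => parseCsvLine(r, "NULL")));

// null string is empty: only the empty field in row 6 becomes null
console.log(rows.map((r) => parseCsvLine(r, "")));
```

the point of the sketch is that the same file gives different answers depending on one setting, so it's worth deciding on purpose which text stands for "missing".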


example with js

fetch('/api/user/1')
  .then(response => response.json())
  .then(data => {
    if (data.lastName == "null") {
      console.log("No last name found");
    } else {
      console.log("Last name is:", data.lastName);
    }
  });

if data is

data = {
  id: 5,
  lastName: "null"
};

then the if statement will trigger, as if there was no last name, even though the last name really is the string "null". that's why you gotta know the language you're using and the potential pitfalls

now you may ask -- why not just do

if (data.lastName === null)

instead? But what if the value got turned into a string somewhere upstream? JSON.parse() itself keeps a JSON null as a real null, but String(null), template literals, string concatenation and localStorage all turn null into the string "null". once that has happened in the system you're working on, it's a very natural move to check for the string "null"

obviously if you're paying attention and understand the pitfalls of certain languages (like javascript's type coercion and the ways a value can get stringified before it reaches you) it becomes easy but it's something that is honestly very easy to overlook
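
a small sketch of the difference, assuming a payload shaped like the one above. describeLastName is a made-up helper, not something from a real codebase. it treats only a truly missing value as "no last name", and it shows where the string "null" tends to come from.

```javascript
// Hypothetical helper: only null or undefined count as "no last name".
function describeLastName(data) {
  if (data.lastName === null || data.lastName === undefined) {
    return "No last name found";
  }
  return `Last name is: ${data.lastName}`;
}

// JSON.parse keeps a JSON null as a real null
const missing = JSON.parse('{"id": 3, "lastName": null}');
// a person whose last name really is the four letters n-u-l-l
const mrNull = JSON.parse('{"id": 5, "lastName": "null"}');

console.log(describeLastName(missing)); // No last name found
console.log(describeLastName(mrNull));  // Last name is: null

// the string "null" shows up when a null gets stringified somewhere upstream
console.log(String(null) === "null"); // true
console.log(`${null}` === "null");    // true
```

if the data has already been stringified before it reaches you, no check on the receiving end can tell the two cases apart, so the fix has to happen where the stringification occurs.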

[–] [email protected] 3 points 3 days ago (1 children)

Like you said, GIGO, but I can't say I'm familiar with any csv looking like that. Maybe I'm living a lucky life, but true null would generally be an empty string, which of course would still be less than ideal. From a general csv perspective, NULL without quotes is still a string.

If "NULL" string, then lord help us, but I would be inclined to handle it as defined unless instructed otherwise. I guess it's up to the dev to point it out and not everyone cares enough to do so. My point is these things should be caught early.

I'll admit I'm much more versed in mysql than postgres.

[–] [email protected] 2 points 3 days ago

really it's a cautionary tale about the intersections of different technologies. for example, csv going into a sql database and then querying that database from another language (whether it's JS or C# or whatever)

when i was 16 and in driver's ed, I remember the day where the instructor told us that we were going to go drive on the highway. I told him I was worried because the highway sounds scary- everybody is going so fast. he told me something that for some weird reason stuck with me: the highway is one of the safest places to be because everybody is going straight in the same direction.

the most dangerous places to be, and the data backs this up, are actually intersections. the points where different roads converge. why? well, it's pretty intuitive. it's where you have a lot of cars in close proximity. the more cars in a specific square footage the higher probability of a car hitting another car.

that logic follows with software too. in a lot of ways devs are traffic engineers controlling the flow of data. that's why, like you said, it's up to the devs to catch these things early. intersections are the points where different technologies meet and all data flows through these technologies. it's important to be extra careful at these points. like in the example i gave above..

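the difference between telling COPY that the text NULL means a missing value

WITH (FORMAT csv, HEADER true, NULL 'NULL');

and only treating an empty field as missing

WITH (FORMAT csv, HEADER true, NULL '');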

could be the difference between one guy living a normal life and another guy receiving thousands of speeding tickets https://www.wired.com/story/null-license-plate-landed-one-hacker-ticket-hell/