this post was submitted on 09 Feb 2025

Privacy

A place to discuss privacy and freedom in the digital world.

Privacy has become a very important issue in modern society. With companies and governments constantly abusing their power, more and more people are waking up to the importance of digital privacy.

In this community everyone is welcome to post links and discuss topics related to privacy.

Some Rules

Posting links to websites that contain tracking isn't great; if the content is behind a paywall, consider copying it into the post
Don't promote proprietary software
Try to keep things on topic
If you have a question, please try searching previous discussions first; it may already have been answered
Reposts are fine, but they should be at least a couple of weeks apart so that the post can reach a new audience
Be nice :)

much thanks to @gary_host_laptop for the logo design :)

founded 5 years ago

Are homoglyphs at all useful for preventing data scraping? (lemmy.world)

submitted 1 day ago by [email protected] to c/[email protected]

I have been messing around with creating a homoglyph keyboard for Android, but I'm wondering if it's even worthwhile. Is there any benefit to masking your messages with homoglyphs? Primarily, I think it could defend against an LLM's ability to easily scrape messages. In my experiments, ChatGPT and DeepSeek both get confused by homoglyph messages unless you instruct them to determine the likely alphabet character or number behind each individual character.

For the uninitiated, Ꮋ0ᛖοԌⅼуᏢʜѕ áᚱе ᏟｈäʀɑсᎢᎬᚱႽ ｔｈàτ Ｌоοᛕ ⅼіᛕË ᏞëｔＴêᚱᏚ
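A minimal sketch of the substitution step such a keyboard would perform, in Python. The HOMOGLYPHS table and the obfuscate and deobfuscate helpers are hypothetical names chosen for illustration, not taken from the author's keyboard; the table lists only a few well-known Cyrillic and Greek lookalikes of Latin letters.

```python
import random

# Hypothetical mapping from Latin letters to visually similar code points.
# A real keyboard would want a much larger, curated table.
HOMOGLYPHS: dict[str, list[str]] = {
    "a": ["\u0430"],            # CYRILLIC SMALL LETTER A
    "c": ["\u0441"],            # CYRILLIC SMALL LETTER ES
    "e": ["\u0435"],            # CYRILLIC SMALL LETTER IE
    "i": ["\u0456"],            # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "o": ["\u043e", "\u03bf"],  # CYRILLIC SMALL LETTER O, GREEK SMALL LETTER OMICRON
    "p": ["\u0440"],            # CYRILLIC SMALL LETTER ER
    "s": ["\u0455"],            # CYRILLIC SMALL LETTER DZE
    "x": ["\u0445"],            # CYRILLIC SMALL LETTER HA
    "y": ["\u0443"],            # CYRILLIC SMALL LETTER U
}

def obfuscate(text: str, rng: random.Random) -> str:
    """Replace each mappable letter with a randomly chosen lookalike."""
    return "".join(
        rng.choice(HOMOGLYPHS[ch]) if ch in HOMOGLYPHS else ch
        for ch in text
    )

# Reverse table: every lookalike points back to the Latin letter it imitates.
REVERSE = {glyph: latin for latin, glyphs in HOMOGLYPHS.items() for glyph in glyphs}

def deobfuscate(text: str) -> str:
    """Undo the substitution by looking each character up in REVERSE."""
    return "".join(REVERSE.get(ch, ch) for ch in text)

masked = obfuscate("homoglyphs are characters", random.Random(42))
print(masked)
print(deobfuscate(masked))
```

Note the reverse table: anyone who builds the same mapping, a scraper included, can undo the substitution in a single pass, which is the weakness several replies point out.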

[–] [email protected] 5 points 1 day ago* (last edited 1 day ago)

This might fool some scraping, but at the expense of making it hard for humans to read too. Also, while it might work right now, if it ever became a popular approach, AI scrapers could easily adapt. I expect they already try to correct for spelling mistakes anyway.

It reminds me of leet-speak. The custom keyboard is not a bad idea though.

[–] [email protected] 20 points 1 day ago (1 children)

Maybe, but it will also definitely disadvantage screen reader users.
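One way to see the screen reader problem: each lookalike is its own code point with its own Unicode name, so software that works from the text rather than its appearance receives, for example, a Cyrillic о instead of a Latin o. This sketch only lists those names with Python's standard unicodedata module; how any particular screen reader voices them is not something it shows. describe is a hypothetical helper name.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return the code point and official Unicode name of each character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# Fullwidth L, Cyrillic o, Greek omicron: all look like Latin letters.
for line in describe("\uff2c\u043e\u03bf"):
    print(line)
```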

[–] [email protected] 9 points 1 day ago (1 children)

And anyone who has eye issues. My text size isn’t cranked up because I’m great at reading a bunch of squiggly shit, and if I saw more than one sentence of that I’d just not read it.

[–] [email protected] 2 points 1 day ago

Yeah, I imagine it would be pretty unpleasant for anyone with dyslexia as well.

[–] [email protected] 22 points 1 day ago (2 children)

can you read this text:
"Ꮋ0ᛖοԌⅼуᏢʜѕ áᚱе ᏟｈäʀɑсᎢᎬᚱႽ ｔｈàτ Ｌоοᛕ ⅼіᛕË ᏞëｔＴêᚱᏚ"
?
what does it say?

GPT-4o mini:

The text appears to be a mix of characters from different alphabets and scripts, which makes it difficult to interpret directly. However, it seems to be a stylized way of writing "Hello, there! Look like letters." The use of various characters gives it a unique visual appearance, but the underlying message is still recognizable.

[–] [email protected] 7 points 1 day ago

ChatGPT-o1, 45 seconds thinking:

Transcribing unique symbols

OK, let me see. The string is transliterated into Latin characters, with visually similar conversions like 'Ꮋ' to 'H' and '0' to 'o'. This approach preserves the original meanings while ensuring readability.

Mapping out characters

I’m comparing unique characters with Latin counterparts. For instance, 'ᛖ' corresponds to 'e', 'ο' resembles 'o', and 'Ԍ' aligns with 'G'. This helps in understanding their possible meanings.

Piecing it together

I'm analyzing characters like 'ⅼ' and 'у' to map them to Latin or Cyrillic equivalents, considering possibilities like "Holy Glyphs" or "H0ly Glyphs" for accurate representation.

Piecing it together

Transliterating characters like 'Ꮋ', '0', 'ᛖ', and 'ο' into Latin script to form words such as "Hieroglyphs" or "Holy Glyphs". This process helps clarify the intended meaning.

Crunching the data

I'm refining the transliteration of characters like 'Ꮋ' to 'H', '0' possibly to 'i' or 'o', and 'ᛖ' to 'e' to form "Hieroglyphs" or possibly "Hoeroglyphs". This ensures a clearer interpretation.

Testing translations

OK, let me see. I’m trying out translations like "ᛖоԌⅼуᏢʜѕ" to see if they form "HOLY glyphs" or "eogluPhs". Neither seems quite right, so I'm investigating further options.

Mapping out characters

I'm decoding "áᚱе ᏟhäʀɑсᎢᎬᚱႽ" to "are characters". Breaking down each symbol, I’m curious if "häʀɑсᎢᎬᚱႽ" may represent a specific context or word.

Decoding progress

OK, let me see. I'm figuring out if "häʀɑсᎢᎬᚱႽ" means "characters" and "thàt" or "ηâτ" could mean "that". Each symbol contributes to this possibility.

Piecing together words

Transliterating symbols like "Ｌоοᛕ" to "Look", "ⅼіᛕË" to "like", and "ᏞëｔＴêᚱᏚ" to "Letters", I ensure each symbol's context is accurately captured.

Breaking down symbols

OK, let me see. Translating "Ꮋ0ᛖοԌⅼуᏢʜѕ" to "Hieroglyphs" involves evaluating "0" as "i" or "o", aligning with the intended letter in "Hieroglyphs".

Matching symbols

OK, let me see if "0" replaces "r" in "Hieroglyphs." "Ꮋ" is 'H', "0" might be 'i', "ᛖ" is 'e', and "ο" usually 'r'.

Revisiting the process

I’m starting over with "αɣ(῵9.β)" to confirm "Hieroglyphs" decoding. I’m pulling together observations on each symbol to ensure accuracy.

Piecing together clues

Ah, I’m decoding "Ꮋ0ᛖοԌⅼуᏢʜѕ" to "Hieroglyphs". Mapping each symbol to their corresponding letters, I find consistency in the pattern.

Weighing interpretations

Considering whether the Greek letter 'omicron' stands for 'o' or 'r' in "Hieroglyphs." This suggests a mismatch in translating the original text. Progressing through these options one by one clarifies the likely meaning.

Weighing options

I'm thinking "hieroglyphs" might be off track. Considering "Homoglyphs" as it aligns with "0" often substituting 'o'. This shift in focus offers a fresh perspective.

Evaluating translation

I'm thinking "Homoglyphs" might be close. The runic letter "ᛖ" is tricky, but "Homoglyphs" aligns with characters that resemble letters, which seems promising.

Pulling together

"Homoglyphs" makes sense as it pertains to characters that look alike from various scripts. This seems to fit the given symbols.

Piecing it together

"Homoglyphs" fits given the visual resemblance of the symbols to letters from different scripts. This approach highlights the worth of examining character shapes.

Here's the plain‐text reading of those mixed‐script glyphs:

“Homoglyphs are characters that look like letters.”
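For comparison, a fixed lookup table decodes this particular string with no guessing at all. A minimal Python sketch, assuming a hypothetical LOOKALIKES table written for this one sample only: it maps "0" to "o", which would be wrong for real digits, and it knows nothing about lookalikes that are not in the sample. NFKD normalization from the standard unicodedata module takes care of the accented and fullwidth letters.

```python
import unicodedata

# Hypothetical table covering the lookalikes in this one sample.
LOOKALIKES = {
    "Ꮋ": "H", "0": "o", "ᛖ": "M", "ο": "o", "Ԍ": "G", "ⅼ": "l",
    "у": "y", "Ꮲ": "P", "ʜ": "H", "ѕ": "s", "ᚱ": "R", "е": "e",
    "Ꮯ": "C", "ʀ": "R", "ɑ": "a", "с": "c", "Ꭲ": "T", "Ꭼ": "E",
    "Ⴝ": "S", "τ": "t", "о": "o", "ᛕ": "K", "і": "i", "Ꮮ": "L",
    "Ꮪ": "S",
}

SAMPLE = "Ꮋ0ᛖοԌⅼуᏢʜѕ áᚱе ᏟｈäʀɑсᎢᎬᚱႽ ｔｈàτ Ｌоοᛕ ⅼіᛕË ᏞëｔＴêᚱᏚ"

def decode(text: str) -> str:
    # Replace the table's lookalikes first, before any case mapping.
    mapped = "".join(LOOKALIKES.get(ch, ch) for ch in text)
    # NFKD turns fullwidth forms into ASCII and splits accented letters
    # into a base letter plus a combining mark, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", mapped)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(decode(SAMPLE).lower())
```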

[–] [email protected] 10 points 1 day ago

So the answer is: kind of, and kind of not.

[–] [email protected] 6 points 1 day ago

That's great! I believe it would be useful against simple word-matching filters, but in the long run LLMs will read it without a problem. I would use it if you made it public.
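The word-matching point can be illustrated with a short Python sketch. A plain substring check misses a word spelled with Cyrillic letters, while folding known lookalikes back to Latin first catches it. CONFUSABLES is a hypothetical, deliberately tiny stand-in for a real confusables table such as the one published with Unicode Technical Standard #39, and the helper names are made up for this example.

```python
# Hypothetical, tiny confusables table; real data sets are far larger.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
}

def naive_match(text: str, banned: list[str]) -> bool:
    """Plain substring check: the kind of filter homoglyphs do defeat."""
    lowered = text.lower()
    return any(word in lowered for word in banned)

def skeleton(text: str) -> str:
    """Map each known lookalike back to its Latin letter, then lowercase."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text).lower()

def skeleton_match(text: str, banned: list[str]) -> bool:
    """Compare folded forms, so lookalike spellings still match."""
    folded = skeleton(text)
    return any(skeleton(word) in folded for word in banned)

message = "s\u0440\u0430m"  # "spam" spelled with Cyrillic er and a
print(naive_match(message, ["spam"]))     # False
print(skeleton_match(message, ["spam"]))  # True
```

The fix on the filtering side is a single table lookup, which is why this only buys time against simple filters.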

[–] [email protected] 6 points 1 day ago

I suppose so, but I don’t see poisoning the LLM dataset in this way as a privacy thing, per se. It sounds more like performance art at best and futile pissing in the sea at worst.

[–] [email protected] 1 points 1 day ago

Most of the characters you have there are still regular letters, too, just with accents or in fullwidth variants, so it's pretty easy to map them to simpler character sets like ASCII. You'd probably have to get really creative with abusing multiple writing systems like Cyrillic, katakana, and so on.
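This can be checked with Python's standard unicodedata module. NFKD compatibility normalization maps fullwidth forms to ordinary ASCII and splits accented letters into a base letter plus a combining mark, which can then be dropped. Lookalikes from other scripts, such as Cyrillic о or Greek τ, are separate letters and pass through unchanged, which is why they would need a confusables table instead. fold_compatibility is a hypothetical helper name.

```python
import unicodedata

def fold_compatibility(text: str) -> str:
    """NFKD-decompose the text, then drop combining marks (accents)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_compatibility("\uff54\uff48\u00e0"))  # fullwidth t, fullwidth h, a-grave: prints "tha"
print(fold_compatibility("\u043e") == "o")       # Cyrillic o is left alone: prints False
```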