this post was submitted on 27 Oct 2023
4 points (100.0% liked)

LibreOffice

460 readers
4 users here now

LibreOffice is a powerful free and open source office suite, used by millions around the world.

founded 2 years ago
MODERATORS
 

I have a csv file of orders from a business I run the web shop for, around 30 thousand entries total. I would like to FIND all entries which have a duplicate in the base: same customer, same total due amount and placed on the same day (some customers make repeating orders but over longer timeframe).

I found a help article regarding removing duplicate values, but it doesn't really apply to my situation, because I want to remove UNIQUE values.

How would I go about that?

top 5 comments
sorted by: hot top controversial new old
[–] [email protected] 2 points 1 year ago

Add a helper column concatenating cell contents (in the following example of column A and B, multiple cells or ranges are possible, see TEXTJOIN() online help) with a delimiter that does not occur in data, here the | vertical line or pipe symbol:

C1: =TEXTJOIN("|";0;A1:B1)

Copy-paste or pull down formula C1 to the desired range of rows.

For each row count the number of occurrences, assuming data in rows 1:99 here:

D1: =COUNTIF(C$1:C$99;C1)

Copy-paste or pull down formula D1 to the desired range of rows.

Filter on column D >1, or sort data in columns A:D on column D descending and at the bottom delete rows with value 1 in column D.

[–] [email protected] 1 points 1 year ago (1 children)

Maybe you could concatenate every relevant field/criteria with “ - “ between each element into a new column . then add a filter on the new column for duplicates?

I do this quite often

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

what if I sorted the orders by names and then for each one check if the one above and the one below it have the same name, date and amount due using 3 columns of IFs, and then filter out those which meet all three of these criteria by multiplying the outputs of IFs in another column? that should work I think? the only problem is last step filtering may fuck up the existing IF functions

[–] [email protected] 1 points 1 year ago (1 children)

That would work for a 1 dimensional table. If you have many columns, you either mess up the following columns or you get back to the beginning. With lists that works wonderfully

[–] [email protected] 2 points 1 year ago

It seems to have worked