Linked – Using Near-Duplication to Dedupe Document Collections Can be Dangerous

By Mike McBride | November 9, 2015 | November 8, 2015 | Reading Time: 2 minutes

“The three major distinctions are:
– Per Family (email + attachment) vs. Per Document
Deduplication is performed on the family level, while near-duplication is performed on the document level.
–Textual Analysis vs. File Analysis
Near-duplicate detection uses only the text AND white space to compare documents, but deduplication uses a set of criteria based on the actual metadata of the files.
–Duplicates vs. Similarities
Deduplication removes identical document families, while near-duplicate detection groups documents together by similarity.”

Deduplication is not the same as identifying near-duplicates. That said, there are plenty of reasons to do both, as long as you understand the differences and what you are trying to accomplish with each.
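To make the distinction concrete, here is a minimal Python sketch. It is not taken from the linked article. It assumes exact deduplication is done by comparing a SHA-256 hash of the file bytes (real review platforms may instead hash selected metadata fields and work per family), and it uses the standard library's `difflib.SequenceMatcher` as a stand-in for a near-duplicate scoring engine. The function names and sample text are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def file_hash(content: bytes) -> str:
    """Fingerprint for exact deduplication (assumption: SHA-256 of the raw bytes)."""
    return hashlib.sha256(content).hexdigest()


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1, based only on the text of each document."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical sample documents: one true copy, one revision with a single change.
original = "The parties agree to the terms set out in Schedule A."
exact_copy = "The parties agree to the terms set out in Schedule A."
revised = "The parties agree to the terms set out in Schedule B."

# Exact deduplication: only byte-identical content shares a hash.
is_exact_duplicate = file_hash(original.encode()) == file_hash(exact_copy.encode())
is_revised_duplicate = file_hash(original.encode()) == file_hash(revised.encode())

# Near-duplicate detection: the revision scores as highly similar, not identical.
revised_similarity = text_similarity(original, revised)
```

In this sketch the revision is not a duplicate by hash, yet it scores very close to the original on similarity. Dropping it as if it were a duplicate would discard the one change that might matter, which is the danger the linked article's title points to.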

I’m a big fan of using near-duplication technologies to cluster similar content together. Our brains simply function better when we can focus on one subject at a time, so document review done in this manner is more efficient, period.
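As a rough illustration of clustering similar content for review, the sketch below groups documents whose text is similar to the first document in an existing group. It is a simplified greedy approach using the standard library's `difflib.SequenceMatcher`, not how any particular review platform works. The `group_similar` function, the 0.8 threshold, and the document IDs are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def group_similar(docs: Dict[str, str], threshold: float = 0.8) -> List[List[str]]:
    """Greedily place each document into the first group whose
    representative (its first member) is at least `threshold` similar."""
    groups: List[List[str]] = []
    for doc_id, text in docs.items():
        for group in groups:
            representative = docs[group[0]]
            if SequenceMatcher(None, representative, text).ratio() >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([doc_id])
    return groups


# Hypothetical documents: two drafts of one report and an unrelated note.
docs = {
    "DOC-001": "Quarterly sales report for the northeast region, draft one.",
    "DOC-002": "Lunch order for Friday: two pizzas and a salad.",
    "DOC-003": "Quarterly sales report for the northeast region, draft two.",
}
review_batches = group_similar(docs)
```

Here the two report drafts land in the same batch so a reviewer can read them back to back, while the unrelated note sits in its own batch. Nothing is removed from the collection; the documents are only grouped.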

Using Near-Duplication to Dedupe Document Collections Can be Dangerous

Follow these topics: Links, LitigationSupport

LitigationSupport | Tech

Magic Databases
By Mike McBride | January 13, 2011 | Reading Time: 3 minutes

I enjoyed reading this case review from Josh Gilliland about a ruling regarding the production of data from a database. As it turns out, the information being requested wasn’t actually tracked in the database. As Josh says: However, there is a bigger issue: A database contains only what data it…

LitigationSupport | Personal

LexMonitor
By Mike McBride | June 21, 2008 | July 20, 2014 | Reading Time: 1 minute

Looks like the fine folks who run the LexBlog service are also getting into the same kind of blog aggregation that the ABA Journal’s Blawg Directory has offered for a little while now, with the “soft” launch of LexMonitor today. I’ve been seeing some things on Kevin’s blog about the work being done on it,…

Links | Security

Linked: Most organizations that paid a ransom were hit with a second ransomware attack
By Mike McBride | June 9, 2022 | September 3, 2023 | Reading Time: 1 minute

Whatever you choose to do, though, the next step needs to be doing everything possible to make sure it doesn’t happen again instead of breathing a sigh of relief that you got your data back and continuing business as usual. That would seem to be the common mistake here.

Don’t make that mistake.

Security pros, where do you fall on the debate on paying or not paying, and does this report change your thinking?

Career | Links

Linked: As the 9-to-5 work day disappears, our lives are growing more out of sync
By Mike McBride | December 2, 2019 | Reading Time: 3 minutes

This is probably not something we think about often, but they’re finding it in Australia, and even thinking it could be a problem. As our working lives become increasingly 24-7, our new research suggests there’s now an additional task to do in our families and friendships. We need to work harder than before to get…

Career | Links

Linked: The Term ‘Bullying’ Doesn’t Easily Fit the Workplace
By Mike McBride | October 30, 2021 | Reading Time: 2 minutes

It’s true, what we define as bullying among school children with no option to simply leave school doesn’t really fit when talking about the workplace, though it is the height of privilege to not recognize that many low-paid workers don’t necessarily have that same level of freedom to do so.

But, as the quote points out, it doesn’t matter what we call it; unprofessional behavior that hurts coworkers and employees has no place in the workplace:

Links

What I’m Sharing (weekly)
By Mike McBride | April 14, 2019 | Reading Time: 1 minute

Interview: Matthew Geaghan of Nuix on using total data intelligence for compliance and HR purposes
Sensei Launches a New Blog: The Digital Forensics Dispatch
Technology Doesn’t Change Who You Are… It Magnifies Who You Are
“Technology makes people who are good at their jobs better. Technology only makes people who are bad at their jobs…

