How to Remove Duplicate Lines in Vim

Is opening, sorting, and managing a large file too much to ask of a lightweight text editor like Vim? Definitely not! If you have a file with many lines that you want to sort and deduplicate, Vim is well equipped to do the job.

In this guide, I will explore how to remove duplicate lines from a file with and without sorting using different methods in the Vim editor.

Use Vim to remove duplicate lines

There are often several ways to perform the same task in Vim, and finding and removing duplicate lines is no exception. Let us look at these methods.

For this tutorial, I have created a CSV file containing the contents shown below:

Remove Duplicate Lines in Vim Using Sort Command

The fastest way to remove duplicate lines from a file in Vim is to use the :sort command with the u option:

:sort u

The :sort command sorts the lines of the file, while the u option stands for unique: when several lines are identical, only the first of them is kept.

As can be seen, 2 duplicate lines have been removed, but the order of the entries has also changed because of the sort. That is not the result you want unless sorting is acceptable.
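To see the reordering side effect outside Vim, here is a minimal sketch using the external sort -u filter, which behaves much like :sort u (you can also run it from inside Vim as :%!sort -u). The sample data and the dedupe_sorted function name are hypothetical, and LC_ALL=C is set so lines are compared by byte value, which is close to the default behavior of :sort.

```shell
# Hypothetical sample data (not the CSV file used in this article).
sample='id,name
3,carol
1,alice
3,carol
2,bob
1,alice'

# Sort by byte value and keep one copy of each line, similar to Vim's :sort u.
dedupe_sorted() {
  LC_ALL=C sort -u
}

printf '%s\n' "$sample" | dedupe_sorted
```

The duplicates are gone, but the header line id,name has moved to the bottom, which is exactly the kind of reordering that may be unwanted.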

Limitation

However, this method has a limitation: although it removes duplicate lines from the file, it also sorts the entries, which may be unnecessary or unwanted. There are alternative methods that remove only the duplicate lines without sorting.

For more help with the sort command in Vim, use the help command:

:help :sort

Remove duplicate lines in Vim using uniq command

Duplicate lines can also be removed using the uniq command, which filters out repeated lines. However, uniq only removes repeated lines that are adjacent to each other, so I would advise you to avoid this method unless adjacent repeated lines are exactly what you need to remove.

The uniq command will not delete anything from the following file:

This is because the repeated entries are not adjacent; look at lines 2 and 5, or 3 and 7. The uniq command only deletes the second and subsequent lines of a run of adjacent identical lines.
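The adjacency rule is easy to check in a shell, because !uniq in Vim simply pipes the selected lines through the same external uniq program. A minimal sketch with hypothetical data:

```shell
# Hypothetical sample: "apple" repeats, but its copies are not adjacent,
# while the two "cherry" lines are.
sample='apple
banana
apple
cherry
cherry'

# uniq collapses only runs of identical adjacent lines:
# prints apple, banana, apple, cherry
printf '%s\n' "$sample" | uniq

# Sorting first makes every duplicate adjacent, so uniq then removes them all
# (at the cost of reordering): prints apple, banana, cherry
printf '%s\n' "$sample" | sort | uniq
```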

Let's modify the file so that the repeated lines are adjacent:

Now, enter visual line mode by pressing Shift+V and select all the lines.

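Now, run the !uniq command on the selection and press Return (after you press :, Vim fills in the '<,'> range of the selection, so the full command reads :'<,'>!uniq):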

As can be seen, the repeated lines have been removed from the file. Use the :w command to save the file.

Remove Duplicate Lines in Vim Using Awk Command

If you want to remove duplicate lines without sorting the file entries, use the awk command as follows:

awk '!i[$0]++' <file_name>

In the above command:

! is the negation operator.
$0 refers to the current line. Similarly, $1 refers to the first field (fields are separated by whitespace by default), $2 to the second, and so on.
i is the name of an associative array; it can be any name, such as a, b, x, or y.
++ is the increment operator.

The duplicate lines have been removed, and the remaining lines stay in their original order. Now, let us understand how this one-liner produces that output.

In the above command, awk processes the file in the following way:

For each line of the file:

if the line has not been processed before,

then print it,

and record the line as processed.

When the pattern in an awk command evaluates to true and no action is given, awk performs the default action, which is to print the current line. Our objective is to print unique lines and eliminate duplicates.
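The per-line steps described above can be written out as a longer awk program. This is a sketch for illustration, not a replacement for the one-liner: the function name dedupe_keep_order and the array name seen are hypothetical, and the program uses awk's in operator to test for a key without creating it.

```shell
# Long-form equivalent of awk '!i[$0]++' that mirrors the per-line steps.
dedupe_keep_order() {
  awk '
    {
      if (!($0 in seen)) {   # line not processed before?
        print                # then print it
        seen[$0] = 1         # and record it as processed
      }
    }
  ' "$@"
}

# Prints b, a, c: first occurrences only, original order kept.
printf 'b\na\nb\nc\na\n' | dedupe_keep_order
```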

To understand this awk one-liner, you have to understand associative arrays. An associative array holds a set of key-value pairs, where the values are accessed using unique string keys rather than integer indexes.

Note that each key in an associative array is unique; it cannot be duplicated.

Let us look at how the above awk command works.

First, set aside the increment (++) part and consider the following expression:

!i[$0]

Here, i[$0] uses the current line as a key in the associative array i.

Now bring the increment back: i[$0]++ returns the value stored for the current line before incrementing it. The first time a line appears, that value is empty (treated as 0), so the expression is false; the increment operator (++) then adds 1 to the stored value, so every later occurrence of the same line returns a non-zero value, which is true.

The ! operator negates the return value of the above expression, turning false into true and true into false. When the pattern is true, awk executes its default action and prints the current line.

The bottom line is that the awk command implicitly loops through all the lines of the file and builds an associative array. For each line, i[$0]++ returns false if the line is not already in the array and true otherwise. The negation operator turns false into true, so awk performs its default action and prints the line only on its first occurrence.
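To watch the counter at work, the sketch below prints the value that i[$0]++ returns for each line, that is, the stored value before the increment. The function name show_counts and the variable before are hypothetical. Lines that show 0 are exactly the ones that !i[$0]++ lets through.

```shell
show_counts() {
  awk '{
    before = i[$0]++               # old value; empty (numeric 0) on first sight
    printf "%d: %s\n", before, $0  # count seen so far, then the line itself
  }' "$@"
}

# Prints:
# 0: x
# 0: y
# 1: x
# 2: x
printf 'x\ny\nx\nx\n' | show_counts
```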

This command only prints the result to the terminal; use the redirection operator to save the output to a new file:

awk '!i[$0]++' <file_name> > <new_filename>
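One pitfall: redirecting the output back to the input file (for example, awk '!i[$0]++' data.txt > data.txt) empties the file, because the shell truncates it before awk reads it. Below is a minimal sketch of a safe in-place variant that goes through a temporary file; the function name dedupe_file_in_place is hypothetical. If you use GNU awk 4.1 or later, its gawk -i inplace extension is another option.

```shell
# Hypothetical helper: deduplicate a file in place without reordering it.
dedupe_file_in_place() {
  tmpfile=$(mktemp) || return 1
  # Write the result to a temporary file first; redirecting straight to "$1"
  # would truncate the input before awk could read it.
  if awk '!i[$0]++' "$1" > "$tmpfile"; then
    # Copy back with cat so the original file keeps its permissions.
    cat "$tmpfile" > "$1"
    status=$?
  else
    status=1
  fi
  rm -f "$tmpfile"
  return "$status"
}
```

Usage: dedupe_file_in_place data.txt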

Delete Duplicate Lines in Vim Using Perl Commands

The same operation can also be done with Perl, which is well suited to text manipulation:

perl -ne 'print if !$i{$_}++' <file_name>

This command also creates an associative array (a hash) and implicitly loops through all the lines of the file.

$_ is a special Perl variable that holds the default input and is also known as "it" or the topic. The -n flag tells Perl to loop over every input line using the given code, and the -e flag supplies a small Perl program to be executed from the command line.

Use the -i flag to save the modifications to the same file:

perl -i -ne 'print if !$i{$_}++' <file_name>

Or use the redirection operator to save the modified version in a separate file:

perl -ne 'print if !$i{$_}++' <file_name> > <new_filename>

Conclusion

Vim can be an incredibly useful editor for editing and managing files with lots of data, thanks to its vast functionality. It provides all the operations needed to create and edit any type of file directly in the terminal window.

Effective data management is an important process for organizing and managing data accurately, and removing duplicate lines is a common part of it. Duplicate lines can be removed using the :sort u command, where the u option stands for unique; however, it sorts the list while removing duplicates. Other methods, such as awk, perl, or regex, can remove duplicate lines without sorting a file. In my opinion, awk is the best way to remove duplicate lines from any type of file.


By Ranjan