Thursday 21 March 2019

How to Remove CTRL-M characters From a File in UNIX and Linux? Example

If you're working with text files in a UNIX or Linux environment, you may run into unwanted control-M characters (shown as ^M), which can break scripts and clutter the display. These are carriage return characters (ASCII 13, written '\r'). They usually appear when a file created on Windows, which ends each line with a carriage return followed by a line feed (CRLF), is used on UNIX, which ends lines with a line feed (LF) alone. They can be removed with a few simple commands in the terminal. In this article, we'll go over how to identify and remove control-M characters from your files using UNIX and Linux tools.

Identifying Control-M Characters

Before we can remove control-M characters from our files, we need to be able to identify them. One way to do this is by using the cat command with the -v option, which displays non-printing characters in a file.

cat -v file.txt


This command displays the contents of file.txt, showing each control-M character as "^M", usually at the end of a line. If you see these characters in your file, you'll need to remove them.
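If you want to check for control-M characters from a script rather than by eye, the same check can be sketched in Python. This is a minimal sketch, not a standard tool: lines_with_cr() and report() are hypothetical helper names, and the file is read as raw bytes so that Python does not translate the carriage returns away.

```python
from pathlib import Path


def lines_with_cr(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that contain a carriage return."""
    # Split on LF only, so any CR stays attached to the line it belongs to.
    return [n for n, line in enumerate(data.split(b"\n"), start=1) if b"\r" in line]


def report(path: str) -> None:
    # Read raw bytes: text mode would quietly translate CRLF into LF.
    hits = lines_with_cr(Path(path).read_bytes())
    if hits:
        print(f"{path}: {len(hits)} line(s) contain ^M, first at line {hits[0]}")
    else:
        print(f"{path}: no ^M characters found")


print(lines_with_cr(b"one\r\ntwo\nthree\r\n"))  # [1, 3]
```

Calling report("file.txt") prints a one-line summary for that file.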

Method 1: Removing Control-M Characters

Once you've identified the control-M characters in your file, you can remove them using the tr command. The tr command translates or deletes characters, but it reads only from standard input and writes to standard output, so the file is supplied through redirection.

tr -d '\r' < file.txt > file_new.txt


This command removes every control-M character from file.txt and saves the result to file_new.txt; file.txt itself is left unchanged. The "-d" option tells tr to delete the listed characters, and '\r' is the escape sequence for the carriage return (control-M) character. The "<" and ">" symbols redirect the command's input and output, respectively.
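Because tr works byte by byte, it never needs to hold the whole file in memory. The hypothetical delete_cr() function below sketches the same behaviour in Python, assuming both streams are opened in binary mode: it copies the input in fixed-size chunks and drops every '\r' byte, wherever it appears.

```python
import io
from typing import BinaryIO


def delete_cr(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst, deleting every CR byte; return how many were removed."""
    removed = 0
    # Deleting a single byte value is safe across chunk boundaries,
    # so the file never has to fit in memory at once.
    while chunk := src.read(chunk_size):
        cleaned = chunk.replace(b"\r", b"")
        removed += len(chunk) - len(cleaned)
        dst.write(cleaned)
    return removed


# In-memory demo; for real files open them with "rb" and "wb".
out = io.BytesIO()
print(delete_cr(io.BytesIO(b"one\r\ntwo\r\n"), out), out.getvalue())  # 2 b'one\ntwo\n'
```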

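Alternatively, you can use the sed command to remove control-M characters from the file. The '\r' escape in the pattern is understood by GNU sed, the default on Linux; BSD sed, including the one on macOS, may not recognise it.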

sed 's/\r//g' file.txt > file_new.txt


This command uses sed's 's' (substitute) command: the pattern \r matches a carriage return, the empty replacement deletes it, and the 'g' flag applies the substitution to every match on a line rather than just the first. To strip only the carriage return at the end of each line, use sed 's/\r$//' instead.
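A substitution can delete every carriage return (s/\r//g) or only one that sits at the end of a line (s/\r$//). Python's re module treats these two patterns the same way, which makes the difference easy to see on a sample string; the sample text here is made up for illustration.

```python
import re

text = "col1\rcol2\r\nline two\r\n"

# Same idea as sed 's/\r//g': delete every carriage return, wherever it is.
all_removed = re.sub(r"\r", "", text)

# Same idea as sed 's/\r$//': delete a carriage return only at the end of a line.
# With re.MULTILINE, $ matches just before each "\n" as well as at the very end.
trailing_removed = re.sub(r"\r$", "", text, flags=re.MULTILINE)

print(repr(all_removed))       # 'col1col2\nline two\n'
print(repr(trailing_removed))  # 'col1\rcol2\nline two\n'
```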

Method 2: Removing Control-M characters from multiple files

If you have multiple files with control-M characters that need to be removed, you can use a for loop in the terminal to iterate through each file and remove the characters using the tr command.

for file in *.txt; do
    tr -d '\r' < "$file" > "${file}_new"
done


This loop runs tr on every .txt file in the current directory. Each result is saved to a new file named after the original with "_new" appended, so notes.txt becomes notes.txt_new, and the original files are left untouched.
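The same batch job can be written in Python if you prefer a script to a shell loop. This is a minimal sketch: strip_cr_in_dir() is a hypothetical name, it follows the loop's naming scheme (notes.txt becomes notes.txt_new), and it reads each file fully into memory, which is fine for ordinary text files.

```python
import tempfile
from pathlib import Path


def strip_cr_in_dir(directory: str = ".", pattern: str = "*.txt") -> list[Path]:
    """Write <name>_new next to each matching file, with every CR removed."""
    written = []
    # sorted() builds the full list before any new file is created.
    for src in sorted(Path(directory).glob(pattern)):
        dst = src.with_name(src.name + "_new")  # notes.txt -> notes.txt_new
        dst.write_bytes(src.read_bytes().replace(b"\r", b""))
        written.append(dst)
    return written


# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_bytes(b"x\r\ny\r\n")
    for out in strip_cr_in_dir(tmp):
        print(out.name, out.read_bytes())  # a.txt_new b'x\ny\n'
```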


Method 3: Removing Control-M characters from a large file
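If you want to clean a large file without managing a separate output file, you can use the sed command with its "in-place" option to edit the file directly. This is a convenience rather than a speed-up: sed still writes a temporary copy behind the scenes, and tr is typically at least as fast, so you also need enough free disk space for a second copy of the file.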

sed -i 's/\r//g' large_file.txt


This command removes the control-M characters from large_file.txt and replaces the original with the result, so no separate output file is left behind. The "-i" option requests in-place editing, and 's/\r//g' is the same substitution used in the earlier sed example. This form is for GNU sed; BSD and macOS sed expect a backup suffix after -i, which may be empty (sed -i '' ...).
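The in-place pattern is worth seeing in full, because "in place" normally still means "write a new file, then swap it in". The sketch below assumes a POSIX filesystem and uses a hypothetical strip_cr_in_place() helper: it streams the input into a temporary file in the same directory, copies the original permission bits, and renames the copy over the original with os.replace(). This is similar in spirit to what GNU sed -i does, not a reimplementation of it.

```python
import os
import shutil
import tempfile


def strip_cr_in_place(path: str, chunk_size: int = 1 << 16) -> None:
    """Remove every CR from path via a temporary copy renamed over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".strip_cr_")
    try:
        with os.fdopen(fd, "wb") as tmp, open(path, "rb") as src:
            # Stream in chunks so a large file never has to fit in memory.
            while chunk := src.read(chunk_size):
                tmp.write(chunk.replace(b"\r", b""))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap the cleaned copy into place
    except BaseException:
        # Remove the half-written temporary file before re-raising.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Usage is a single call such as strip_cr_in_place("large_file.txt").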

Method 4: Removing Control-M characters from a file on a remote server
If you need to remove control-M characters from a file on a remote server, you can use the ssh command to connect to the server and run the necessary commands.

ssh user@server "tr -d '\r' < file.txt > file_new.txt"


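This command connects to the remote server as "user" and runs tr there, removing the control-M characters from file.txt and saving the output to file_new.txt on the remote server. Don't pipe the file through cat -v first: cat -v turns each carriage return into the two visible characters "^M", which tr -d '\r' would then leave in place.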
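When a file name on the remote server contains spaces or shell metacharacters, it has to be quoted for the remote shell, because ssh hands the command string to that shell. The sketch below builds the command with Python's shlex.quote(); remote_strip_command() and strip_cr_remote() are hypothetical helpers, and user@server stands for your own login.

```python
import shlex
import subprocess


def remote_strip_command(src: str, dst: str) -> str:
    """Build the tr command for the remote shell, quoting both file names for it."""
    return f"tr -d '\\r' < {shlex.quote(src)} > {shlex.quote(dst)}"


def strip_cr_remote(host: str, src: str, dst: str) -> None:
    # ssh hands the command string to the remote user's shell, so the names
    # must be quoted for that shell; check=True raises if ssh or tr fails.
    subprocess.run(["ssh", host, remote_strip_command(src, dst)], check=True)


print(remote_strip_command("my notes.txt", "file_new.txt"))
# tr -d '\r' < 'my notes.txt' > file_new.txt
```

For example, strip_cr_remote("user@server", "file.txt", "file_new.txt") runs the same tr command as the ssh example above.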

Method 5: Using the dos2unix command
The dos2unix command is a small utility that converts text files from DOS/Windows format to Unix/Linux format (its companion, unix2dos, converts the other way). It removes the control-M characters from line endings in a single step, although on some systems it has to be installed first.

dos2unix file.txt


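This command converts file.txt to Unix/Linux format, removing the control-M character at the end of each line, and by default overwrites the original file with the result. To keep the original, write to a new file instead with dos2unix -n file.txt file_new.txt.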
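Unlike tr -d '\r', which deletes every carriage return, dos2unix is aimed at line endings: it turns each CRLF pair into a plain LF, and by default it skips files it considers binary. The sketch below imitates only those two ideas in Python; crlf_to_lf() and convert_file() are hypothetical names, and the NUL-byte test is a simple stand-in for dos2unix's own binary detection, not a copy of it.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    # Only a CR immediately followed by LF is changed; a stray CR elsewhere survives.
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Convert path in place; return False and leave it untouched if it looks binary."""
    p = Path(path)
    data = p.read_bytes()
    # Hypothetical heuristic: treat any NUL byte as a sign of a binary file.
    if b"\x00" in data:
        return False
    p.write_bytes(crlf_to_lf(data))
    return True


print(crlf_to_lf(b"a\r\nb\rc\r\n"))  # b'a\nb\rc\n'
```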

Method 6: Using awk to remove control-M characters
The awk command is a powerful tool for processing and manipulating text files. We can use awk to remove control-M characters by deleting the carriage return at the end of each line.

awk '{ sub("\r$", ""); print }' file.txt > file_new.txt


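For each line, awk's sub() function replaces a carriage return at the end of the line ("\r$") with an empty string, and print writes the line back out followed by a newline. The result is saved to file_new.txt. Carriage returns elsewhere in a line are left alone.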
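The awk version works one line at a time, which keeps memory use low and touches only the carriage return at the end of each line. A rough Python equivalent of that per-line idea, using a hypothetical strip_trailing_cr() function on binary streams, looks like this; unlike awk's print, it does not add a newline to a final line that lacked one.

```python
import io
from typing import BinaryIO


def strip_trailing_cr(src: BinaryIO, dst: BinaryIO) -> None:
    """Copy src to dst line by line, dropping one CR at the end of each line."""
    for line in src:  # each line keeps its b"\n", except possibly the last
        if line.endswith(b"\r\n"):
            line = line[:-2] + b"\n"
        elif line.endswith(b"\r"):  # final line without a trailing newline
            line = line[:-1]
        dst.write(line)


out = io.BytesIO()
strip_trailing_cr(io.BytesIO(b"a\r\nb\rc\r\n"), out)
print(out.getvalue())  # b'a\nb\rc\n'
```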

Method 7: Using Perl to remove control-M characters
Perl is a powerful scripting language that can be used for a wide range of tasks, including text processing. We can use Perl to remove control-M characters from a file using a regular expression.

perl -pi -e 's/\r//g' file.txt


This command uses Perl's substitution operator s/\r//g to replace every control-M character with an empty string. The -p option makes Perl loop over each line and print it after the substitution, -i edits the file in place, and -e supplies the code on the command line.
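Perl's -i option also accepts a backup suffix, so perl -pi.bak -e 's/\r//g' file.txt keeps the untouched original as file.txt.bak. The sketch below shows the same keep-a-backup idea in Python; strip_cr_with_backup() is a hypothetical helper that copies the file first and then rewrites it without carriage returns.

```python
import shutil
import tempfile
from pathlib import Path


def strip_cr_with_backup(path: str, suffix: str = ".bak") -> Path:
    """Save a copy of path as path + suffix, then rewrite path without CRs."""
    original = Path(path)
    backup = original.with_name(original.name + suffix)
    shutil.copy2(original, backup)  # copy2 also preserves timestamps
    original.write_bytes(original.read_bytes().replace(b"\r", b""))
    return backup


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "file.txt")
    target.write_bytes(b"one\r\ntwo\r\n")
    backup = strip_cr_with_backup(str(target))
    print(target.read_bytes(), backup.read_bytes())  # b'one\ntwo\n' b'one\r\ntwo\r\n'
```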

Method 8: Using Vim to remove control-M characters
Vim is a popular text editor that can also remove control-M characters from a file. The process involves opening the file in Vim and running a substitution command from command-line mode, which you reach by typing ":" in normal mode.

vim file.txt


Once you have opened the file in Vim, type the following command and press Enter:

:%s/\r//g


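This command uses Vim's substitute command to replace every control-M character with an empty string. The "%" range applies it to every line in the file, and the "g" flag replaces all matches on each line rather than only the first. If every line of the file ends in CRLF, Vim may open it in "dos" file format and hide the ^M characters, in which case the substitution finds nothing; run :set fileformat=unix before saving instead.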

Then save the file and exit by entering:

:wq


This saves the modified file and exits Vim.
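If :%s/\r//g reports that the pattern was not found even though cat -v shows ^M, the usual reason is that every line ends in CRLF and Vim has opened the file in "dos" format, hiding the carriage returns. The hypothetical line_ending_style() helper below classifies a file's bytes as dos, unix, or mixed so you can tell which case you are in; it is a rough approximation for illustration, not Vim's actual detection logic.

```python
def line_ending_style(data: bytes) -> str:
    """Return 'dos' if every line ends in CRLF, 'unix' if none does, else 'mixed'."""
    lines = data.split(b"\n")[:-1]  # ignore whatever follows the final LF
    crlf = sum(1 for line in lines if line.endswith(b"\r"))
    if crlf == 0:
        return "unix"
    if crlf == len(lines):
        return "dos"
    return "mixed"


for sample in (b"a\r\nb\r\n", b"a\nb\n", b"a\r\nb\n"):
    print(line_ending_style(sample))  # dos, unix, mixed (one per line)
```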

Method 9: Using Python to remove control-M characters
Python is a versatile programming language that can be used for a wide range of tasks, including text processing. We can use Python to remove control-M characters from a file using a regular expression.

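import re

# newline='' turns off universal-newline translation, so '\r' survives the read
with open('file.txt', 'r', newline='') as f:
    data = f.read()

data = re.sub(r'\r', '', data)

# newline='' on write too, so Python never adds '\r' back (for example on Windows)
with open('file_new.txt', 'w', newline='') as f:
    f.write(data)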

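This code reads file.txt into a string, uses the re module to replace every control-M character with an empty string, and writes the result to file_new.txt. Passing newline='' to open() matters: without it, Python's universal-newline mode translates line endings to \n when reading, so the regular expression never sees the carriage returns, and on Windows Python would add them back when writing.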

Method 10: Using the col command to remove control-M characters
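The col command is a filter originally meant for cleaning up reverse line feeds in nroff output. It treats a carriage return as a move back to the start of the line, so a control-M at the end of a line simply disappears from its output, which makes it usable for this job too.

col -bx < file.txt > file_new.txt


This command passes file.txt through col and saves the result to file_new.txt. The -b option tells col not to output backspaces, and -x makes it keep spaces as spaces, since by default col replaces runs of spaces with tabs. Be aware that a carriage return in the middle of a line lets the text after it overwrite the text before it, so tr or dos2unix is usually the safer choice for arbitrary files.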

There are many different ways to remove control-M characters from a file in UNIX and Linux. By using different tools and techniques, you can find the one that works best for your particular situation and make sure that your files are clean and consistent. Whether you prefer to use a command-line tool like dos2unix or awk, a text editor like Vim, or a programming language like Python, the important thing is to be able to identify and remove these characters quickly and efficiently.

