Word Count in Linux: Understanding Its Usage in Practical Scenarios

Last Blog Review

In the last blog, we learned how to use the “find” command in Linux. Linux arranges its files and directories as an inverted tree, and a system can hold so many of them that locating a particular file by hand is very difficult. With find, we can get the desired files and directories quickly, and, as we saw in that post, it lets us apply many filters to narrow down the search.

What Exactly Is Word Count?

Word Count (wc) is a command that counts the number of lines, words, and bytes (or, with an option, characters) in each file passed to it as an argument; if no file is given, it reads standard input instead. It may not look like an exciting feature, but it is a lifesaver when you deal with lots of files, when you combine it with tools such as grep to count how often specific words appear, or when you write a script that has to check whether a word limit has been exceeded.
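Run without options, wc prints the line, word, and byte counts, in that order, followed by the file name; when it reads standard input, the name is left out. A minimal sketch, assuming a throwaway file called notes.txt (a hypothetical name created by the example itself):

```shell
# Create a small, hypothetical sample file: 2 lines, 5 words, 27 bytes
printf 'hello world\nwc counts this\n' > notes.txt

# Default output: lines, words, bytes, then the file name
wc notes.txt

# Reading from standard input: the same counts, but no file name
wc < notes.txt

# Split the three numbers into positional parameters (works in plain sh too)
set -- $(wc < notes.txt)
echo "lines=$1 words=$2 bytes=$3"    # prints: lines=2 words=5 bytes=27
```

The exact column padding of the first two commands differs between wc implementations, which is why the last step splits the output on whitespace instead of relying on fixed positions.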

Let’s understand it practically →

A. To count the number of lines

wc -l <file.name>

Ex. wc -l text.txt
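One detail worth knowing: wc -l counts newline characters rather than lines as a person sees them, so a last line without a trailing newline is not counted. A quick check you can paste into a terminal:

```shell
# Three lines of text, but the last one has no trailing newline
printf 'one\ntwo\nthree' | wc -l      # prints 2

# With a trailing newline, all three lines are counted
printf 'one\ntwo\nthree\n' | wc -l    # prints 3
```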

B. To count the number of words

wc -w <file.name>

Ex. wc -w text.txt
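For wc -w, a word is any run of non-whitespace characters separated by whitespace. Punctuation therefore stays attached to its neighbour, and a lone symbol counts as a word of its own, which matters for files like log.txt below where fields are separated by " - ". A small illustration:

```shell
# "Hello," "world!" "It's" and "wc." are four words; extra spaces do not matter
echo "Hello,   world!  It's wc." | wc -w            # prints 4

# A lone "-" is also a word, so log-style separators are counted too
echo "10:01:02 - INFO - started" | wc -w            # prints 5
```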

C. To count the number of bytes

wc -c <file.name>

Ex. wc -c text.txt
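Note that wc -c counts bytes, which equals the number of characters only for plain ASCII text. To count characters, wc also has a -m option that counts them according to the current locale. A sketch, assuming a UTF-8 locale for the -m result; the word "héllo" is written with octal escapes so the example does not depend on how this file itself is encoded:

```shell
# "héllo" with é written as its two UTF-8 bytes (octal 303 251)
printf 'h\303\251llo' | wc -c    # prints 6: é takes two bytes

# -m counts characters: 5 in a UTF-8 locale, 6 in the C/POSIX locale
printf 'h\303\251llo' | wc -m
```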

D. To print the length of the longest line in a file

wc -L <file.name>

Ex. wc -L text.txt
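The -L option comes from GNU coreutils and is not part of the POSIX wc specification, so it may be missing on some non-GNU systems. GNU wc reports the display width of the longest line (tabs are expanded). Where -L is unavailable, awk can produce a similar number; the sketch below assumes that counting characters rather than display width is acceptable, which can differ for lines containing tabs. The file name lines.txt is hypothetical:

```shell
# Hypothetical sample with a short line and a longer one
printf 'short\na much longer line\n' > lines.txt

# GNU wc: length of the longest line
wc -L < lines.txt    # prints 18

# Portable alternative: keep the maximum line length seen so far in awk
awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' lines.txt    # prints 18
```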

E. To count the number of (non-hidden) files and folders in a directory

ls <directory> | wc -l

Ex. ls log | wc -l
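When its output goes into a pipe, ls prints one entry per line, which is why wc -l works here. It skips hidden entries (names starting with a dot) by default, though, so the count can be lower than expected. Adding -A includes them while still leaving out . and .., and find from the last blog gives the same number. A sketch in a throwaway directory; like any line-based count, it assumes no file name contains a newline:

```shell
# Build a throwaway directory: two files, one hidden file, one subdirectory
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/.hidden"
mkdir "$dir/sub"

ls "$dir" | wc -l       # prints 3: a.txt, b.txt, sub (the hidden file is skipped)
ls -A "$dir" | wc -l    # prints 4: hidden entries included, . and .. excluded

# Same count with find: direct children only, not the directory itself
find "$dir" -mindepth 1 -maxdepth 1 | wc -l    # prints 4

rm -r "$dir"
```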

Practical use cases →

  1. Suppose I have a log.txt file that contains some error messages, and I want to count how many times the word “error” occurs.

     root@ubuntu-host ~ ➜  cat log.txt
     2023-01-24 10:01:02 - INFO - Application started successfully
     2023-01-24 10:01:05 - WARN - Database connection attempt took longer than expected
     2023-01-24 10:01:10 - ERROR - Unable to fetch user data from database: [Errno 110] Connection timed out
     2023-01-24 10:01:12 - INFO - User 'john.doe' successfully logged in
     2023-01-24 10:01:15 - ERROR - Invalid input provided for parameter 'amount' in function 'calculate_total'
     2023-01-24 10:01:18 - INFO - File 'report.txt' generated successfully
     2023-01-24 10:01:20 - WARN - Network latency detected on request to external API
     2023-01-24 10:01:23 - INFO - Processing user request successfully
     2023-01-24 10:01:25 - ERROR -  Error processing image: Image format not supported
     2023-01-24 10:01:28 - INFO - System health check passed
     2023-01-24 10:01:30 - WARN - Cache miss for key 'recent_products'
    
     root@ubuntu-host ~ ➜  grep -o -i "error" log.txt | wc -w
     4
    
     root@ubuntu-host ~ ➜
    
  2. While web scraping, to count the number of words on an HTML page (note that the count includes the HTML tags as well as the visible text)

     root@ubuntu-host ~ ➜  curl -s https://google.com 
     <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
     <TITLE>301 Moved</TITLE></HEAD><BODY>
     <H1>301 Moved</H1>
     The document has moved
     <A HREF="https://www.google.com/">here</A>.
     </BODY></HTML>
    
     root@ubuntu-host ~ ➜  curl -s https://google.com | wc -w
     14
    
  3. If we only want to back up a file when it contains a good number of words, we can use wc in a shell script to check that first

     root@ubuntu-host ~ ➜  cat log.txt
     2023-01-24 10:01:02 - INFO - Application started successfully
     2023-01-24 10:01:05 - WARN - Database connection attempt took longer than expected
     2023-01-24 10:01:10 - ERROR - Unable to fetch user data from database: [Errno 110] Connection timed out
     2023-01-24 10:01:12 - INFO - User 'john.doe' successfully logged in
     2023-01-24 10:01:15 - ERROR - Invalid input provided for parameter 'amount' in function 'calculate_total'
     2023-01-24 10:01:18 - INFO - File 'report.txt' generated successfully
     2023-01-24 10:01:20 - WARN - Network latency detected on request to external API
     2023-01-24 10:01:23 - INFO - Processing user request successfully
     2023-01-24 10:01:25 - ERROR -  Error processing image: Image format not supported
     2023-01-24 10:01:28 - INFO - System health check passed
     2023-01-24 10:01:30 - WARN - Cache miss for key 'recent_products'
    
     root@ubuntu-host ~ ➜  cat wordcountshell.sh
     #!/bin/bash
    
     # Count the words in log.txt; redirecting the file into wc prints only the number
     word_count=$(wc -w < log.txt)
    
     if [ "$word_count" -lt 500 ]; then
       echo "File has too few words to back up"
     else
       echo "File has enough words, you can take a backup"
     fi
    
     root@ubuntu-host ~ ➜  bash wordcountshell.sh
     File has too few words to back up
    
     root@ubuntu-host ~ ➜
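
For the first use case (counting error words), grep -o prints each match on its own line, so wc -l gives the same result as wc -w there. If you want the number of lines that mention an error instead, grep -c counts matching lines, and a line containing both “ERROR” and “Error” is counted once; on the log.txt above it would report 3 lines rather than 4 occurrences. A sketch on a small hypothetical sample.log:

```shell
# Hypothetical sample.log with a few lines in the same format as log.txt
cat > sample.log <<'EOF'
2023-01-24 10:01:10 - ERROR - Unable to fetch user data
2023-01-24 10:01:23 - INFO - Processing user request successfully
2023-01-24 10:01:25 - ERROR -  Error processing image
EOF

# Occurrences: -o puts every match on its own line, so count the lines
grep -o -i "error" sample.log | wc -l    # prints 3

# Matching lines: -c counts each line that contains at least one match
grep -c -i "error" sample.log            # prints 2
```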
    
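In the web scraping use case, the 14 words reported for the Google page include tag fragments such as <HTML><HEAD><meta. A rough way to count only the visible text is to replace every tag with a space before counting. This is a sketch, not an HTML parser: it assumes tags do not span lines, and it would still count the contents of script and style elements. It runs on a saved copy of the page shown above (page.html is a hypothetical file name):

```shell
# Save the redirect page shown above to a local file (hypothetical name: page.html)
cat > page.html <<'EOF'
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
EOF

# Count everything, tags included (same as the curl example)
wc -w < page.html                          # prints 14

# Replace each <...> tag with a space, then count what is left
sed 's/<[^>]*>/ /g' page.html | wc -w      # prints 10

# Live version (needs network access):
# curl -s https://google.com | sed 's/<[^>]*>/ /g' | wc -w
```

The remaining 10 words are "301 Moved 301 Moved The document has moved here ." and, as the last one shows, stray punctuation left between tags still counts as a word.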

Conclusion →

So, here we saw that the word count command is very simple to use, yet it helps in a lot of ways when analyzing files and folders. It counts lines, words, characters, and bytes, and it can be used in shell scripts to automate tasks as well.

💡
That’s a wrap for today’s post! I hope this has given you some valuable insights. Be sure to explore more articles on our blog for further tips and advice. See you in the next post!