Nixpert: July 2019

Top 10 CPU/Mem consuming processes in Linux (command line)

Top 10 CPU consuming processes in Linux:

# ps -eo pmem,pcpu,pid,args | head -1;ps -eo pmem,pcpu,pid,args | tail -n +2 | sort -rnk 2 | head -10

# ps aux | head -1;ps aux | sort -rnk 3 | head -10

Top 10 Memory consuming processes in Linux:

# ps -eo pmem,pcpu,pid,args | head -1;ps -eo pmem,pcpu,pid,args | tail -n +2 | sort -rnk 1 | head -10

# ps aux | head -1;ps aux | sort -rnk 4 | head -10
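One wrinkle with the "ps aux" one-liners above is that the header line is sorted together with the data rows. A minimal sketch of a helper that prints the header first and sorts only the rows could look like this. The function name top_by_column is made up for illustration, and it assumes ps-style input with exactly one header line.

```shell
# top_by_column COLUMN [COUNT]
# Hypothetical helper: reads ps-style output on stdin, prints the header line
# unchanged, then the COUNT rows (default 10) with the largest numeric value
# in COLUMN.
top_by_column() {
    local column=$1 count=${2:-10} header
    IFS= read -r header || return 1      # first line is the column header
    printf '%s\n' "$header"
    sort -rn -k "${column},${column}" | head -n "$count"
}

# Example: top 10 by %CPU (column 3 of "ps aux") or by %MEM (column 4)
# ps aux | top_by_column 3
# ps aux | top_by_column 4
```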

Understanding “Top” Command - Linux Process Monitoring

Top

The Linux top command is a performance monitoring program that many system administrators use to monitor Linux performance, and it is available on many Linux and Unix-like operating systems. The top command displays the running processes in real time as an ordered list and updates it regularly. It displays CPU usage, memory usage, swap memory, cache size, buffer size, process PID, user, command and much more. It also shows which running processes have high memory and CPU utilization. The top command is very useful for system administrators who need to monitor a system and take corrective action when required. Let’s see the top command in action.

There are a number of variants of top, but in the rest of this article, we will talk about the most common variant — the one that comes with the “procps” package. You can verify this by running:

[root@server ~]# top -v

top: procps version 3.2.8

Understanding top’s interface: the summary area

As we have previously seen, top’s output is divided into two different sections. In this part of the article, we’re going to focus on the elements in the top half of the output. This region is also called the “summary area”.

System time, uptime and user sessions:

At the very top left of the screen (as marked in the screenshot above), top displays the current time. This is followed by the system uptime, which tells us how long the system has been running. For instance, in our example, the current time is "12:48:02" and the system has been running for 1 hour and 21 minutes. Next comes the number of active user sessions. In this example, there is one active user session. These sessions may be made either on a TTY (physically on the system, through the command line or a desktop environment) or on a PTY (such as a terminal emulator window or over SSH).

If you want to get more details about the active user sessions, use the "who" command.

Tasks:

The “Tasks” section shows statistics regarding the processes running on your system. The “total” value is simply the total number of processes. For example, in the above screenshot, there are 27 processes running. To understand the rest of the values, we need a little bit of background on how the Linux kernel handles processes.

Processes perform a mix of I/O-bound work (such as reading disks) and CPU-bound work (such as performing arithmetic operations). The CPU is idle when a process performs I/O, so OSes switch to executing other processes during this time. In addition, the OS allows a given process to execute for a very small amount of time, and then it switches over to another process. This is how OSes appear to be “multitasking”. Doing all this requires the OS to keep track of the “state” of a process. In Linux, a process may be in one of these states:

Runnable (R): A process in this state is either executing on the CPU, or it is present on the run queue, ready to be executed.
Interruptible sleep (S): Processes in this state are waiting for an event to complete.
Uninterruptible sleep (D): In this case, a process is waiting for an I/O operation to complete.
Stopped (T): These processes have been stopped by a job control signal (such as by pressing Ctrl+Z) or because they are being traced.
Zombie (Z): The kernel maintains various data structures in memory, known as the "process table", to keep track of processes. A process may create a number of child processes, and they may exit while the parent is still around. However, these data structures must be kept around until the parent obtains the status of the child processes. Such terminated processes whose records are still in the process table are called zombies.

Processes in the R state are counted as “running”, those in the D and S states as “sleeping”, and those in the T state as “stopped”. The number of zombies is shown as the “zombie” value.
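To see how these counts come about, the states can be tallied by hand from ps output. This is only a sketch: count_states is a hypothetical helper name, and it assumes one state string per line, which is what "ps -eo stat=" prints. Extra characters after the first letter (such as "s", "+" or "<") are ignored.

```shell
# count_states: reads one process state string per line on stdin and prints
# each leading state letter (R, S, D, T, Z, ...) followed by its count.
count_states() {
    awk 'NF { count[substr($1, 1, 1)]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Example:
# ps -eo stat= | count_states
```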

CPU usage:

The CPU usage section shows the percentage of CPU time spent on various tasks. The us value is the time the CPU spends executing processes in userspace. Similarly, the sy value is the time spent on running kernelspace processes.

Linux uses a “nice” value to determine the priority of a process. A process with a high “nice” value is “nicer” to other processes, and gets a low priority. Similarly, processes with a lower “nice” value get a higher priority. As we shall see later, the default “nice” value can be changed. The time spent on executing processes with a manually set “nice” value appears as the ni value.

This is followed by id, which is the time the CPU remains idle. Most operating systems put the CPU on a power saving mode when it is idle. Next comes the wa value, which is the time the CPU spends waiting for I/O to complete.

Interrupts are signals to the processor about an event that requires immediate attention. Hardware interrupts are typically used by peripherals to tell the system about events, such as a keypress on a keyboard. On the other hand, software interrupts are generated due to specific instructions executed on the processor. In either case, the OS handles them, and the time spent on handling hardware and software interrupts is given by hi and si respectively.

In a virtualized environment, a part of the CPU resources is given to each virtual machine (VM). The OS detects when it has work to do but cannot perform it because the CPU is busy on some other VM. The amount of time lost in this way is the “steal” time, shown as st.
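The same categories can be derived by hand from the aggregate "cpu" line of /proc/stat, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal. The sketch below takes two samples of that line and prints each category as a percentage of the elapsed CPU time. The function name cpu_breakdown is made up for this example, and top’s own arithmetic may differ in detail.

```shell
# cpu_breakdown: reads two samples of the aggregate "cpu" line from
# /proc/stat on stdin (older sample first) and prints the share of time
# spent in each category between the two samples.
cpu_breakdown() {
    awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - prev[i]; total += delta[i] }
            if (total == 0) exit 1
            # /proc/stat order: user nice system idle iowait irq softirq steal
            split("us ni sy id wa hi si st", name, " ")
            for (i = 2; i <= 9; i++)
                printf "%s=%.1f%s", name[i - 1], 100 * delta[i] / total, (i < 9 ? " " : "\n")
        }'
}

# Example: sample one second apart
# { grep '^cpu ' /proc/stat; sleep 1; grep '^cpu ' /proc/stat; } | cpu_breakdown
```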

Load average:

The load average section represents the average “load” over one, five and fifteen minutes.

System load (or CPU load) is a measure of CPU over- or under-utilization in a Linux system: the number of processes that are being executed by the CPU or waiting to run. On Linux, processes in uninterruptible sleep (D) are counted as well.
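A load figure is easier to judge relative to the number of CPUs. As a rough sketch, the hypothetical helper below divides a load average by the CPU count. Reading a result above 1.00 as "more runnable work than CPUs" is a common rule of thumb and an assumption here, not something top itself reports.

```shell
# load_per_cpu LOAD NCPU
# Hypothetical helper: divides a load average by the number of CPUs.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) exit 1
        printf "%.2f\n", load / ncpu
    }'
}

# Example: 1-minute load average relative to the CPU count
# load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```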

Memory usage:

The “memory” section shows information regarding the memory usage of the system. The lines marked “Mem” and “Swap” show information about RAM and swap space respectively. Simply put, swap space is a part of the hard disk that is used like RAM. When RAM is nearly full, infrequently used regions of RAM are written into the swap space, ready to be retrieved later when needed. However, because disk access is slow, relying too much on swapping can harm system performance.

As you would naturally expect, the “total”, “free” and “used” values have their usual meanings. The “avail mem” value is the amount of memory that can be allocated to processes without causing more swapping.

The Linux kernel also tries to reduce disk access times in various ways. It maintains a “disk cache” in RAM, where frequently used regions of the disk are stored. In addition, disk writes are stored to a “disk buffer”, and the kernel eventually writes them out to the disk. The total memory consumed by them is the “buff/cache” value. It might sound like a bad thing, but it really isn’t — memory used by the cache will be allocated to processes if needed.
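The figures in this section ultimately come from /proc/meminfo. As a small sketch, the hypothetical helper below pulls out the total, free and available values and converts them from kB to MiB. It assumes a kernel recent enough to report MemAvailable, and it does not try to reproduce top’s “used” or “buff/cache” arithmetic.

```shell
# mem_summary: reads /proc/meminfo-style input on stdin and prints the
# total, free and available memory in MiB.
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { printf "total=%d free=%d avail=%d\n", total / 1024, free / 1024, avail / 1024 }'
}

# Example:
# mem_summary < /proc/meminfo
```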

Understanding top’s interface: the task area

The task area is comparatively simple, and it contains a list of processes. In this section, we will learn about the different columns shown in top’s default output.

PID

This is the process ID, a unique positive integer that identifies a process.

USER

This is the “effective” username (which maps to a user ID) of the user who started the process. Linux assigns a real user ID and an effective user ID to processes; the latter allows a process to act on behalf of another user. (For example, a non-root user can elevate to root in order to install a package.)

PR and NI

The “NI” field shows the “nice” value of a process. The “PR” field shows the scheduling priority of the process from the perspective of the kernel. The nice value affects the priority of a process.

VIRT, RES, SHR and %MEM

These fields relate to the memory consumption of the processes. “VIRT” is the total amount of memory consumed by a process. This includes the program’s code, the data stored by the process in memory, as well as any regions of memory that have been swapped to the disk. “RES” is the memory consumed by the process in RAM, and “%MEM” expresses this value as a percentage of the total RAM available. Finally, “SHR” is the amount of memory shared with other processes.

S

As we have seen before, a process may be in various states. This field shows the process state in the single-letter form.

TIME+

This is the total CPU time used by the process since it started, precise to the hundredths of a second.

COMMAND

The COMMAND column shows the name of the processes.

Top command usage examples

So far, we have discussed top’s interface. However, it can also manage processes, and you can control various aspects of top’s output. In this section, we’re going to take a look at a few examples.

In most of the examples below, you have to press a key while top is running. Keep in mind that these key-presses are case sensitive — so if you press “k” while Caps Lock is on, you have actually pressed a “K”, and the command won’t work, or do something else entirely.

Killing processes

If you want to kill a process, simply press ‘k’ when top is running. This will bring up a prompt asking for the process ID of the process to kill. Type it in and press enter.

Next, enter the signal to send to the process. If you leave this blank, top uses SIGTERM, which allows processes to terminate gracefully. If you want to kill a process forcefully, you can type in SIGKILL here. You can also type in the signal number instead. For example, the number for SIGTERM is 15 and for SIGKILL it is 9.

If you leave the process ID blank and hit enter directly, it will terminate the topmost process in the list. As we’ve mentioned previously, you can scroll using the arrow keys, and change the process you want to kill in this way.
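The same "ask nicely first, then force" idea can be used outside top. The following is a minimal sketch, not part of top itself: stop_gracefully is a hypothetical helper that sends SIGTERM, waits a few seconds, and only then falls back to SIGKILL. The 5-second default is an arbitrary choice.

```shell
# stop_gracefully PID [TIMEOUT]
# Hypothetical helper: sends SIGTERM (15), waits up to TIMEOUT seconds
# (default 5) for the process to exit, then falls back to SIGKILL (9).
stop_gracefully() {
    local pid=$1 timeout=${2:-5} waited=0
    kill -TERM "$pid" 2>/dev/null || return 1     # no such process, or not permitted
    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # signal 0 only checks that the PID exists
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$pid" 2>/dev/null                 # still there: force it
    return 0
}

# Example:
# stop_gracefully 12345 10
```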

Sorting the process list

One of the most frequent reasons to use a tool like top is to find out which process is consuming the most resources. You can press the following keys to sort the list:

‘M’ to sort by memory usage
‘P’ to sort by CPU usage
‘N’ to sort by process ID
‘T’ to sort by the running time

By default, top displays all results in descending order. However, you can switch to ascending order by pressing ‘R’.

You can also sort the list with the -o switch. For example, if you want to sort processes by CPU usage, you can do so with:

top -o %CPU

You can sort the list by any of the columns in the task area in the same way.

Showing a list of threads instead of processes

We have previously touched upon how Linux switches between processes. Unfortunately, processes do not share memory or other resources, making such switches rather slow. Linux, like many other operating systems, supports a “lightweight” alternative, called a “thread”. They are part of a process and share certain regions of memory and other resources, but they can be run concurrently like processes.

By default, top shows a list of processes in its output. If you want to list the threads instead, press ‘H’ when top is running. Notice that the “Tasks” line says “Threads” instead, and shows the number of threads instead of processes.

You may have noticed how none of the attributes in the process list changed. How is that possible, given that processes differ from threads? Inside the Linux kernel, threads and processes are handled using the same data structures. Thus, every thread has its own ID, state and so on.

If you want to switch back to the process view, press ‘H’ again. In addition, you can use the -H switch to display threads by default.

top -H

Showing full paths

By default, top does not show the full path to the program, or make a distinction between kernelspace processes and userspace processes. If you need this information, press ‘c’ while top is running. Press ‘c’ again to go back to the default.

Kernelspace processes are marked with square brackets around them. As an example, in the above screenshot there are two kernel processes, kthreadd and khelper. On most Linux installations, there will usually be a few more of them.

Alternatively, you can also start top with the -c argument:

top -c

Listing processes from a user

To list processes from a certain user, press ‘u’ when top is running. Then, type in the username, or leave it blank to display processes for all users.

Alternatively, you can run the top command with the -u switch. In this example, we have listed all processes from the root user.

top -u root

Linux Network Commands Used In Network Troubleshooting

Check Network Connectivity Using ping Command

The ping command is one of the most used Linux network commands in network troubleshooting. It is used to check whether or not a specific IP address can be reached.

The ping command works by sending an ICMP echo request to check the network connectivity.

$ ping google.com

[root@server ~]# ping google.com

PING google.com (172.217.166.174) 56(84) bytes of data.

64 bytes from bom07s20-in-f14.1e100.net (172.217.166.174): icmp_seq=1 ttl=56 time=7.55 ms

64 bytes from bom07s20-in-f14.1e100.net (172.217.166.174): icmp_seq=2 ttl=56 time=9.89 ms

--- google.com ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4738ms

rtt min/avg/max/mdev = 7.555/8.715/9.892/0.863 ms

These results show a successful ping: each reply line is the round trip of an echo request sent by our system to google.com.

The summary reports the minimum, average and maximum response times. If there is no response, one of the following is likely:

There is a physical problem on the network itself.
The location might be incorrect or non-functional.
The ping request is blocked by the target.
There is a problem with the routing table.

If you want to limit the number of echo requests made to 3, you can do it like this:

$ ping -c 3 google.com

Here the ping command stops sending echo requests after 3 cycles.

Keep a few things in mind when reading ping results. A slow ping does not necessarily mean that there is a problem; it may simply be due to:

Distance to the target: if you live in the U.S. and you ping a server in Asia, you should expect this ping to take much longer than pinging a server in the U.S.

The connection speed: if your connection is slow, ping will take a longer time than if you have a fast connection.

The hop count: this refers to the routers and servers that the echo request travels across before reaching its destination.

The general rule about ping is that a lower ping time is always desirable.
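If you check the same targets regularly, it can help to reduce ping’s output to the two numbers that matter most. The sketch below is one way to do that. The helper name ping_summary is made up, and it assumes the Linux (iputils) summary format shown earlier, where the last line starts with "rtt"; other ping implementations word that line differently.

```shell
# ping_summary: reads ping output on stdin and prints the packet loss
# percentage and the average round-trip time in milliseconds.
ping_summary() {
    awk '
        /packet loss/ { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
        /^rtt /       { split($4, rtt, "/"); avg = rtt[2] }   # min/avg/max/mdev
        END           { printf "loss=%s avg_ms=%s\n", loss, avg }'
}

# Example:
# ping -c 3 google.com | ping_summary
```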

Get DNS Records Using dig and host Commands

You can use the dig command to verify DNS mappings, host addresses, MX records, and all other DNS records for a better understanding of DNS topography.

The dig command was developed to replace the nslookup command.

$ dig google.com

By default, the dig command looks up A records. You can also request specific record types, such as MX records or NS records.

$ dig google.com MX

[root@server ~]# dig google.com MX

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.el6 <<>> google.com MX

;; global options: +cmd

;; Got answer:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23427

;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 4, ADDITIONAL: 8

;; QUESTION SECTION:

;google.com. IN MX

;; ANSWER SECTION:

google.com. 600 IN MX 20 alt1.aspmx.l.google.com.

google.com. 600 IN MX 40 alt3.aspmx.l.google.com.

google.com. 600 IN MX 50 alt4.aspmx.l.google.com.

google.com. 600 IN MX 10 aspmx.l.google.com.

google.com. 600 IN MX 30 alt2.aspmx.l.google.com.

;; AUTHORITY SECTION:

google.com. 172608 IN NS ns3.google.com.

google.com. 172608 IN NS ns2.google.com.

google.com. 172608 IN NS ns1.google.com.

google.com. 172608 IN NS ns4.google.com.

;; ADDITIONAL SECTION:

ns2.google.com. 172608 IN A 216.239.34.10

ns2.google.com. 172608 IN AAAA 2001:4860:4802:34::a

ns1.google.com. 172608 IN A 216.239.32.10

ns1.google.com. 172608 IN AAAA 2001:4860:4802:32::a

ns3.google.com. 172608 IN A 216.239.36.10

ns3.google.com. 172608 IN AAAA 2001:4860:4802:36::a

ns4.google.com. 172608 IN A 216.239.38.10

ns4.google.com. 172608 IN AAAA 2001:4860:4802:38::a

;; Query time: 67 msec

;; SERVER: 192.168.0.10#53(192.168.0.10)

;; WHEN: Tue Jul 23 12:55:33 2019

;; MSG SIZE rcvd: 384
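The MX records in the answer section come back in no particular order, while mail is delivered to the lowest preference value first. As a small sketch, the hypothetical helper below filters dig’s output down to the MX answers and sorts them by preference. It assumes the default dig output layout shown above.

```shell
# mx_by_preference: reads dig output on stdin and prints "preference host"
# for every MX record, lowest preference first.
mx_by_preference() {
    awk '$3 == "IN" && $4 == "MX" { print $5, $6 }' | sort -n
}

# Example:
# dig google.com MX | mx_by_preference
```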

You can ask for all types of records by using an ANY query, although many DNS servers now return only a minimal response to such queries.

$ dig google.com ANY

The dig command can also perform a reverse lookup with the -x option, like this:

[root@server ~]# dig -x 8.8.8.8

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.el6 <<>> -x 8.8.8.8

;; global options: +cmd

;; Got answer:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22755

;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:

;8.8.8.8.in-addr.arpa. IN PTR

;; ANSWER SECTION:

8.8.8.8.in-addr.arpa. 21560 IN PTR dns.google.

;; Query time: 10 msec

;; SERVER: 8.8.8.8#53(8.8.8.8)

;; WHEN: Tue Jul 23 12:56:35 2019

;; MSG SIZE rcvd: 62

The dig command sends its queries to the servers listed in /etc/resolv.conf.

The host command is similar to the dig command.

# host -a google.com

Also, you can perform reverse lookups using the host command.

[root@server ~]# host 8.8.8.8

8.8.8.8.in-addr.arpa domain name pointer dns.google.

So both commands work in a similar way, but the dig command provides more advanced options.

Diagnose Network Latency Using traceroute Command

The traceroute command is one of the most useful Linux network commands. It shows the path to your target and where the delay comes from. This command basically helps with:

Providing the names and the identity of every device on the path.
Reporting network latency and identifying the device at which the latency occurs.

$ traceroute google.com

The first line of the output shows the specified host, its IP address, the maximum number of hops, and the size of the packets that will be used. For each hop, you can then see the hop number, the hostname, the IP address, and the packet travel times.

To avoid reverse DNS lookups, use the -n option.

$ traceroute -n google.com

By using the traceroute command, you can identify network bottlenecks. Asterisks in the output mean that no reply to a probe arrived within the timeout. This can indicate packet loss or dropped packets on the route to that host, although some devices simply do not answer traceroute probes.

By default, the traceroute command sends UDP packets, but it can also send TCP and ICMP probes.

To send ICMP packets, use the -I option:

$ sudo traceroute -I google.com

To use the TCP variant, use the -T option:

$ sudo traceroute -T google.com

Some servers block UDP requests, so switching to ICMP or TCP probes can get around the problem.
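When a trace is long, it can be handy to list only the hops that never answered. The following sketch does that for traceroute’s usual line format (hop number first, "*" for each unanswered probe). The helper name silent_hops is made up for illustration, and a silent hop is not necessarily a faulty one.

```shell
# silent_hops: reads traceroute output on stdin and prints the number of
# every hop for which all probes timed out (shown as "*").
silent_hops() {
    awk '$1 ~ /^[0-9]+$/ && NF > 1 {
        silent = 1
        for (i = 2; i <= NF; i++) if ($i != "*") silent = 0
        if (silent) print $1
    }'
}

# Example:
# traceroute -n google.com | silent_hops
```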

mtr Command (Realtime Tracing)

This command is an alternative to the traceroute command.

[root@server ~]# mtr google.com

The best thing about the mtr command is that it displays real-time data, unlike traceroute.

Furthermore, you can use the mtr command with the --report option. This sends 10 packets to each hop found on the way and prints a summary like this:

[root@server ~]# mtr --report google.com

HOST: server.example.com Loss% Snt Last Avg Best Wrst StDev

1. 192.168.0.1 0.0% 10 1.8 4.3 1.1 18.5 5.2

2. 100.69.0.1 0.0% 10 4.4 8.5 2.9 21.8 7.4

3. Pune-Core01.youbroadband.in 0.0% 10 6.5 9.4 5.5 31.4 7.9

4. 58-215-187-203.static.youbro 0.0% 10 7.7 7.7 6.9 8.6 0.5

5. 209.85.175.108 0.0% 10 8.2 8.1 6.8 11.0 1.3

6. 108.170.248.193 0.0% 10 7.7 8.6 7.1 11.4 1.3

7. 74.125.253.107 0.0% 10 16.2 11.7 8.2 25.6 5.4

8. bom07s20-in-f14.1e100.net 0.0% 10 7.2 9.3 6.9 16.2 2.8

This command gives far more detail than traceroute.

If this command doesn’t run under a normal user account, run it as root, since some distros restrict this binary to the root user.
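A report like the one above can also be processed further. As a sketch, the hypothetical helper below picks the hop with the highest average latency. It assumes the column order shown in the report, with Avg as the fourth column from the right, so it should be checked against the mtr version you are running.

```shell
# worst_hop: reads "mtr --report" output on stdin and prints the host with
# the highest average latency, followed by that average in milliseconds.
worst_hop() {
    awk 'BEGIN { max = -1 }
         $1 ~ /^[0-9]+\./ {
             avg = $(NF - 3) + 0          # columns end with: Last Avg Best Wrst StDev
             if (avg > max) { max = avg; host = $2 }
         }
         END { if (host != "") print host, max }'
}

# Example:
# mtr --report google.com | worst_hop
```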

Checking Connection Performance Using ss Command

The socket statistics command, ss, is a replacement for netstat. It is faster than netstat and gives more information.

The ss command gets its information directly from the kernel instead of relying on the /proc directory like the netstat command does.

[root@server ~]# ss | less

This command outputs all TCP, UDP, and UNIX socket connections and pipes the result to the less command for better display.

You can combine this command with -t to show TCP sockets, -u to show UDP sockets, or -x to show UNIX sockets. Add the -a option to any of these to show both connected and listening sockets.

[root@server ~]# ss -ta

State Recv-Q Send-Q Local Address:Port Peer Address:Port

LISTEN 0 128 *:43433 *:*

LISTEN 0 128 :::sunrpc :::*

LISTEN 0 128 *:sunrpc *:*

LISTEN 0 3 192.168.0.10:domain *:*

LISTEN 0 3 127.0.0.1:domain *:*

LISTEN 0 128 :::ssh :::*

LISTEN 0 128 *:ssh *:*

LISTEN 0 128 127.0.0.1:ipp *:*

LISTEN 0 128 ::1:ipp :::*

LISTEN 0 128 :::60600 :::*

LISTEN 0 128 *:4505 *:*

LISTEN 0 100 ::1:smtp :::*

LISTEN 0 100 127.0.0.1:smtp *:*

LISTEN 0 128 ::1:rndc :::*

LISTEN 0 128 127.0.0.1:rndc *:*

LISTEN 0 128 *:4506 *:*

ESTAB 0 0 192.168.0.10:ssh 192.168.0.102:52204

To list all established TCP sockets for IPv4, use the following command:

$ ss -t4 state established

To list all TCP sockets in the closed state:

$ ss -t4 state closed

You can use the ss command to show all connections whose remote (destination) address is a specific IP:

$ ss dst XXX.XXX.XXX.XXX

And you can filter by a specific port like this:

$ ss dst XXX.XXX.XXX.XXX:22
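For a quick overview, the state column of ss can be tallied instead of read line by line. This is only a sketch: socket_states is a hypothetical helper, and it assumes output with a header line and the state in the first column, as printed by "ss -ta" above.

```shell
# socket_states: reads "ss -ta"-style output on stdin, skips the header,
# and prints each socket state followed by how many sockets are in it.
socket_states() {
    awk 'NR > 1 && NF { count[$1]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Example:
# ss -ta | socket_states
```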

Install and Use iftop Command For Traffic Monitoring

The iftop utility monitors network traffic and displays the results in real time.

You can download the tool like this:

$ wget http://www.ex-parrot.com/pdw/iftop/download/iftop-0.17.tar.gz

Then extract it:

$ tar zxvf iftop-0.17.tar.gz

Then compile it:

$ cd iftop-0.17

$  ./configure

$ make

$ make install

If you get any errors about libpcap, install its development package. On yum-based distributions, it is called libpcap-devel:

$ sudo yum install libpcap-devel

And you can run the tool as the root user like this:

$ sudo iftop -i <interface>

And you will see this table with real-time data about your traffic.

Add the -P option to iftop to show ports.

$ sudo iftop -P

You can use the -B option to display the output in bytes instead of bits, which is the default.

$ sudo iftop -B

There are a lot of options; you can check them with man iftop.

arp Command

Systems keep a table of IP addresses and their corresponding MAC addresses; this table is called the ARP lookup table. When you try to connect to an IP address on the local network, your system first checks this table for the matching MAC address. If the address is already cached, no new ARP request needs to be sent.

To view the arp table, use the arp command:

$ arp

[root@server ~]# arp

Address HWtype HWaddress Flags Mask Iface

192.168.0.102 ether f4:8c:50:bb:a0:fd C eth1

192.168.0.1 ether 70:4f:57:21:f5:2e C eth1

[root@server ~]#

By default, the arp command shows hostnames. You can show IP addresses instead like this:

$ arp -n

You can delete an entry from the arp table by giving its IP address or hostname:

$ arp -d <ip_address>
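To look up a single entry rather than scan the whole table, the arp output can be filtered. A minimal sketch, assuming the column layout shown above (address first, hardware address third); mac_for_ip is a made-up helper name.

```shell
# mac_for_ip IP: reads "arp -n"-style output on stdin and prints the
# hardware (MAC) address cached for the given IP address, if any.
mac_for_ip() {
    awk -v ip="$1" '$1 == ip { print $3 }'
}

# Example:
# arp -n | mac_for_ip 192.168.0.1
```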

Packet Analysis with tcpdump

One of the most important Linux network commands is tcpdump. The tcpdump command captures the traffic that is passing through your network interface.

This kind of access to the packets, the deepest level of the network, can be vital when troubleshooting the network.

$ tcpdump -i <network_device>

You can also specify a protocol (TCP, UDP, ICMP and others) like this:

$ tcpdump -i <network_device> tcp

Also, you can specify the port:

$ tcpdump -i <network_device> port 80

tcpdump keeps running until it is interrupted (for example with Ctrl+C), so it is often better to use the -c option to capture a predetermined number of packets, like this:

$ tcpdump -c 20 -i <network_device>

You can also capture only the traffic coming from an IP with the src option, or going to an IP with the dst option.

$ tcpdump -c 20 -i <network_device> src XXX.XXX.XXX.XXX

You can obtain the device names like this:

$ ifconfig

You can save the traffic captured by tcpdump to a file with the -w option and read it back later with the -r option.

$ tcpdump -w /path -i <network_device>

And to read that file:

$ tcpdump -r /path

I hope that the Linux network commands we’ve discussed in this post help you troubleshoot some of your network problems and make the right decisions.

Bidirectional tcpdump between two hosts, written to a ring of 10 capture files of about 300 MB each:

nohup tcpdump -nli eth0 -W 10 -C 300 -w /apps/tcpdump/file.pcap host <host1> and <host2> &

Thank You.

Nixpert