Tuesday
May 1, 2012

Kill processes older than 5 minutes

The script below lists smbd processes and kills any that are older than 300 seconds. Both the process name and the age threshold can be adjusted as needed.
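#!/usr/bin/perl
use strict;
use warnings;

# PIDs (second column of `ps -ef`) for lines mentioning smbd, excluding grep itself
my @pids = map { (split)[1] } grep { /smbd/ && !/grep/ } `ps -ef`;

foreach my $pid (@pids)
{
        # Approximate the process start time from the mtime of /proc/<pid>
        my $start = (stat "/proc/$pid")[9];

        # Skip processes that exited after the ps listing was taken
        next unless defined $start;

        # Calculate difference between now and process start time
        my $diff = time() - $start;

        # If more than 300 seconds, kill the process
        if ($diff > 300)
        {
                kill 'KILL', $pid;
        }
}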
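
The same check can be sketched in shell by letting ps report each process's age directly. This is a minimal sketch, assuming a procps-ng ps that supports the etimes (elapsed seconds) column, which older releases lack. stale_pids is a hypothetical helper name, and it only prints candidate PIDs rather than killing anything:

```shell
# Print the PIDs of processes with a given command name that have been
# running longer than a threshold.
# $1 = command name, $2 = maximum age in seconds
# stdin = lines of "PID ELAPSED_SECONDS COMMAND"
stale_pids() {
    awk -v name="$1" -v max="$2" '$3 == name && ($2 + 0) > (max + 0) { print $1 }'
}

# Usage (assumes ps supports the etimes column):
# ps -eo pid=,etimes=,comm= | stale_pids smbd 300
```

Reviewing the printed list before passing it to kill avoids surprises, and matching on the command name column avoids hitting unrelated lines that merely contain "smbd".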
Tuesday
May 10, 2011

LFTP Mirroring

I routinely use the LFTP utility in scripts for file transfers to and from remote sites. It's a full-featured CLI file transfer utility which supports FTP, FTPS, and SFTP. I also recently discovered its mirroring feature, which works quite nicely in situations where rsync isn't an option.

The following will upload file.txt to the sftp.domain.com SFTP server:

[root@saturn ~]# lftp -p 22 -u user,pass -e "cd dropoff; put file.txt; ls; quit" sftp://sftp.domain.com

The following will mirror /data on ftp.domain.com to /data on the local machine while keeping the file ownership consistent:

[root@saturn ~]# lftp -u user,pass -e "mirror -e -v --allow-chown /data /data;quit" ftp.domain.com
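
Because mirror -e removes local files that no longer exist on the remote side, it can be worth previewing a run first. This is a minimal sketch, assuming the installed lftp supports the --dry-run option of mirror. The function name mirror_cmds is hypothetical and simply builds the command string passed to -e:

```shell
# Build the lftp command string for mirroring remote dir $1 to local dir $2.
# Pass "preview" as $3 to add --dry-run, so the planned actions are printed
# but nothing is transferred or deleted.
mirror_cmds() {
    opts="-e -v --allow-chown"
    if [ "${3:-}" = "preview" ]; then
        opts="$opts --dry-run"
    fi
    printf 'mirror %s %s %s; quit' "$opts" "$1" "$2"
}

# Usage (same placeholder host and credentials as above):
# lftp -u user,pass -e "$(mirror_cmds /data /data preview)" ftp.domain.com
```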
Friday
Mar 18, 2011

Finding Linux server installation date

Here are a few ways to determine the install date on a Linux system (RHEL/CentOS/Fedora).

Check the timestamp of files generated by the Anaconda installer:

[root@saturn ~]# ll /root
...
-rw------- 1 root root  1262 Feb 17 11:06 anaconda-ks.cfg
-rw-r--r-- 1 root root 38088 Feb 17 11:06 install.log
-rw-r--r-- 1 root root  4507 Feb 17 11:03 install.log.syslog

Check the root filesystem creation time:

[root@saturn ~]# tune2fs -l /dev/mapper/VolGroup00-root | grep created
Filesystem created:       Thu Feb 17 10:52:59 2011
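
To use that timestamp in a script, the value can be cut out of the tune2fs output. This is a small sketch, assuming the "Filesystem created:" label shown above; fs_created is a hypothetical helper name:

```shell
# Print only the timestamp from the "Filesystem created:" line.
# stdin = output of "tune2fs -l <device>"
fs_created() {
    sed -n 's/^Filesystem created:[[:space:]]*//p'
}

# Usage (device name taken from the example above):
# tune2fs -l /dev/mapper/VolGroup00-root | fs_created
```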
Sunday
Feb 13, 2011

Analyzing Linux Kernel Dumps

In a default RHEL install, the kdump utility is configured to reserve an area of memory to capture kernel dumps in the event of a kernel panic. The kernel line of the GRUB boot entry carries a parameter such as "crashkernel=128M@16M", which tells the kernel to reserve a chunk of memory for this special capture kernel.
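
To check whether a running system was actually booted with a reservation, the parameter can be picked out of the kernel command line. A minimal sketch; crashkernel_param is a hypothetical helper name:

```shell
# Print the crashkernel= parameter, if any, from a kernel command line.
# stdin = a kernel command line such as the contents of /proc/cmdline
crashkernel_param() {
    tr ' ' '\n' | grep '^crashkernel='
}

# Usage:
# crashkernel_param < /proc/cmdline
```

No output means the kernel was booted without a crashkernel reservation.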

There's a utility called "crash" which can analyze the contents of these dumps.    The dumps themselves are located in /var/crash/YYYY-MM-DD-HH:MM/vmcore.

First check to see if the utility is installed (if not you can install it using yum):

[root@saturn ~]# which crash
/usr/bin/crash

The crash utility requires specific kernel debug packages based on the kernel that was running when the crash occurred.   They aren't in the default yum repositories for some reason.  They can be found here for RHEL 5: ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/

[root@saturn ~]# uname -a
Linux saturn.coldcache.com 2.6.18-194.26.1.el5 #1 SMP Fri Oct 29 14:21:16 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@saturn ~]# rpm -ivh ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/kernel-debuginfo-common-2.6.18-194.26.1.el5.x86_64.rpm
[root@saturn ~]# rpm -ivh ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/kernel-debuginfo-2.6.18-194.26.1.el5.x86_64.rpm
Once those are installed, we're ready to run crash and point it at the vmcore file:
[root@saturn ~]# crash /usr/lib/debug/lib/modules/2.6.18-194.26.1.el5/vmlinux /var/crash/2011-02-12-07\:04/vmcore

crash 4.1.2-8.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
      
      KERNEL: /usr/lib/debug/lib/modules/2.6.18-194.26.1.el5/vmlinux
    DUMPFILE: /var/crash/2011-02-12-07:04/vmcore
        CPUS: 8
        DATE: Sat Feb 12 07:03:02 2011
      UPTIME: 00:01:06
LOAD AVERAGE: 0.77, 0.25, 0.09
       TASKS: 211
    NODENAME: saturn.coldcache.com
     RELEASE: 2.6.18-194.26.1.el5
     VERSION: #1 SMP Fri Oct 29 14:21:16 EDT 2010
     MACHINE: x86_64  (2813 Mhz)
      MEMORY: 39.4 GB
       PANIC: "Oops: 0002 [1] SMP " (check log for details)
         PID: 5534
     COMMAND: "insmod"
        TASK: ffff81087f591080  [THREAD_INFO: ffff81087cc5e000]
         CPU: 4
       STATE: TASK_RUNNING (PANIC)

crash> 
The above output shows that there was a kernel panic caused by an "insmod" command with PID 5534 at 7:03am. You're then dropped into a crash command prompt which lets you run other commands to get more information. View the contents of dmesg at that time by typing "log":
crash> log

(output truncated for brevity)

kobject_add failed for ipmi_bmc.17 with -EEXIST, don't try to register things with the 
same name in the same directory.

Call Trace:
 [] kobject_add+0x170/0x19b
 [] device_add+0x85/0x372
 [] platform_device_add+0xd8/0x129
 [] :ipmi_msghandler:ipmi_register_smi+0x5cc/0xab7
 [] autoremove_wake_function+0x0/0x2e
 [] :ipmi_si:try_smi_init+0x494/0x685
 [] :ipmi_si:ipmi_pci_probe+0xa0/0x17f
 [] pci_device_probe+0x104/0x184
 [] driver_probe_device+0x52/0xaa
 [] __driver_attach+0x65/0xb6
 [] __driver_attach+0x0/0xb6
 [] bus_for_each_dev+0x43/0x6e
 [] bus_add_driver+0x76/0x110
 [] __pci_register_driver+0x51/0xa6
 [] :ipmi_si:init_ipmi_si+0x5f6/0x746
 [] sys_init_module+0xaf/0x1f2
 [] tracesys+0xd5/0xe0

ipmi_msghandler: Unable to register bmc device: -17
ipmi_si: Unable to register device: error -17
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [] _spin_lock+0x0/0xa
PGD 36b8c4067 PUD 36b9b4067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /devices/pci0000:40/0000:40:0b.0/0000:4f:00.0/host1/rport-1:0-1/target1:0:1/1:0:1:1/timeout
CPU 4
Modules linked in: ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) autofs4 hidp l2cap bluetooth 
lockd sunrpc dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter i2c_ec i2c_core dell_wmi 
wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc 
lp parport ide_cd k8temp serio_raw shpchp cdrom bnx2 hwmon k8_edac edac_mc sg hpilo pcspkr 
dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod 
qla2xxx scsi_transport_fc cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5534, comm: insmod Tainted: G      2.6.18-194.26.1.el5 #1
RIP: 0010:[]  [] _spin_lock+0x0/0xa
RSP: 0018:ffff81087cc5fc90  EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff81037994cc38 RCX: ffff81037fe1d800
RDX: ffff81037f96b000 RSI: ffffffff801510b5 RDI: 0000000000000000
RBP: ffff81037994cc10 R08: ffff81087cc5e000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000000
R13: ffffffff8033bce0 R14: 0000000000000000 R15: 0000000000000000
FS:  00002b05aa7f9210(0000) GS:ffff81068710d440(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000373272000 CR4: 00000000000006e0
Process insmod (pid: 5534, threadinfo ffff81087cc5e000, task ffff81087f591080)
Stack:  ffffffff8028397c ffffffffffffffff ffff81037994cc00 ffff81036e2ab000
 ffffffff801c5f3f ffff81036e2ab000 ffff81037994cc00 ffff81037f96b000
 ffff81036e2ab000 ffff81037f96b000 ffffffff801c96ce ffff81037f96b034
Call Trace:
 [] klist_del+0x15/0x2a
 [] device_del+0x22/0x1a9
 [] platform_device_unregister+0x9/0x12
 [] :ipmi_msghandler:cleanup_bmc_device+0xde/0xe9
 [] :ipmi_msghandler:cleanup_bmc_device+0x0/0xe9
 [] kref_put+0x6f/0x7a
 [] :ipmi_msghandler:ipmi_bmc_unregister+0x6a/0x79
 [] :ipmi_msghandler:ipmi_unregister_smi+0xc/0xf4
 [] :ipmi_si:try_smi_init+0x59e/0x685
 [] :ipmi_si:ipmi_pci_probe+0xa0/0x17f
Show the process tree for PID 5534:
crash> ps -p 5534
PID: 0      TASK: ffffffff80308b60  CPU: 0   COMMAND: "swapper"
 PID: 1      TASK: ffff81010c4a97a0  CPU: 4   COMMAND: "init"
  PID: 3141   TASK: ffff81037fd260c0  CPU: 6   COMMAND: "rc"
   PID: 4358   TASK: ffff81087f1450c0  CPU: 2   COMMAND: "S91hpasm"
    PID: 4373   TASK: ffff81087f521100  CPU: 6   COMMAND: "sh"
     PID: 5306   TASK: ffff81087f71a100  CPU: 6   COMMAND: "sh"
      PID: 5434   TASK: ffff81087f11f7e0  CPU: 6   COMMAND: "hp-OpenIPMI"
       PID: 5534   TASK: ffff81087f591080  CPU: 4   COMMAND: "insmod"
The above shows that the culprit is an IPMI kernel driver, loaded by the S91hpasm startup script, which invokes another script named hp-OpenIPMI. Disabling or removing that driver resolved the kernel panic issue for this server.
Monday
Dec 27, 2010

Using MTR to troubleshoot network issues

MTR is a network diagnostic tool combining the features of the ping and traceroute commands.  It uses ICMP traffic to deduce the route to a particular host.

I use this tool often at work to troubleshoot connectivity issues between two endpoints of a VPN tunnel. Since the return path often differs from the forward path, it is useful to collect MTR statistics from both ends.

[root@saturn ~]# mtr --report coldcache.com
HOST: saturn                      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 192.168.10.1                  0.0%    10    0.8   1.0   0.8   2.7   0.6
  2. 96.179.244.1                  0.0%    10    7.6   7.8   7.2   8.7   0.6
  3. ge-6-8-ur02.pittsburgh.pa.pi  0.0%    10    7.6   8.5   7.2  10.3   1.0
  4. te-8-3-ur01.pennhills.pa.pit  0.0%    10    9.6   9.5   8.3  11.3   0.9
  5. 68.85.75.41                   0.0%    10    9.6  13.5   8.3  47.9  12.1
  6. 68.85.75.193                  0.0%    10    9.2  10.4   8.3  18.1   2.9
  7. te-3-0-0-1-cr01.ashburn.va.i  0.0%    10   17.2  17.2  15.8  19.3   0.9
  8. xe-9-0-0.edge1.Washington1.L  0.0%    10   16.6  18.1  15.6  31.3   4.7
  9. vlan99.csw4.Washington1.Leve  0.0%    10   20.8  23.4  17.1  28.8   4.2
 10. ae-92-92.ebr2.Washington1.Le  0.0%    10   28.6  19.6  15.5  30.8   5.4
 11. ae-5-5.ebr2.Washington12.Lev  0.0%    10   22.5  17.9  16.4  22.5   1.7
 12. 4.69.148.49                   0.0%    10   22.1  22.1  21.3  23.2   0.7
 13. ae-91-91.csw4.NewYork1.Level  0.0%    10   37.5  26.8  21.7  37.5   5.1
 14. ae-4-99.edge1.NewYork1.Level  0.0%    10   24.1  32.1  21.6 115.1  29.2
 15. PEER-1-NETW.edge1.NewYork1.L  0.0%    10   23.0  22.7  21.8  23.3   0.5
 16. squarespace.com               0.0%    10   23.6  24.4  23.2  26.3   0.9

One important thing to remember is that routers which de-prioritize ICMP will show up as packet loss in tools such as MTR, which rely on ICMP packets. So a hop showing packet loss may only be dropping ICMP traffic and not necessarily legitimate TCP/UDP traffic. Loss that carries through to the final hop is the more meaningful signal.
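
To pick out the hops worth a closer look in a long report, the Loss% column can be filtered. This is a minimal sketch, assuming the report layout shown above (hop number, host, Loss% as the first three columns); lossy_hops is a hypothetical helper name:

```shell
# Print hop number, host and loss for every hop with non-zero packet loss.
# stdin = output of "mtr --report <host>"
lossy_hops() {
    awk '$1 ~ /^[0-9]+\./ {
        loss = $3
        sub(/%/, "", loss)            # "20.0%" -> "20.0"
        if (loss + 0 > 0) print $1, $2, $3
    }'
}

# Usage:
# mtr --report coldcache.com | lossy_hops
```

A hop listed here is only a candidate: if the hops after it show no loss, the router is most likely just rate-limiting its ICMP replies.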

The tool is available as a package on most Linux distributions and the official page contains links to the source code. A Windows port called WinMTR is also available.

Wednesday
Nov 10, 2010

Red Hat Enterprise Linux 6 

After a long wait, Red Hat officially released RHEL 6 today (press release).    It's built on the 2.6.32 kernel.

Along with the new version, they've made some changes to the certification process.   The RHCT is being replaced by the RHCSA (Red Hat Certified Systems Administrator).    The RHCSA is a separate exam and a prerequisite to the RHCE.

I anticipate that the CentOS team will release CentOS 6 in the next month or two.

Sunday
Nov 7, 2010

Determining filesystem type

Query the device file:

[root@saturn ~]# file -s /dev/sdb1
/dev/sdb1: Linux rev 1.0 ext4 filesystem data (needs journal recovery) (extents) (large files) (huge files)

List the mounted filesystems along with their types:

[root@saturn ~]# df -Th
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/vg_rhce-lv_root
              ext4    6.5G  1.7G  4.4G  28% /
tmpfs        tmpfs    246M     0  246M   0% /dev/shm
/dev/sda1     ext4    485M   57M  403M  13% /boot
/dev/md0      ext3    4.6G  138M  4.3G   4% /raiddisk
/dev/mapper/vg1-lv1
              ext3    5.0G  139M  4.6G   3% /newlvm
/dev/sdb1     ext4    9.2G  149M  8.6G   2% /newdisk

[root@saturn ~]# mount
/dev/mapper/vg_rhce-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext4 (rw)
/dev/md0 on /raiddisk type ext3 (rw)
/dev/mapper/vg1-lv1 on /newlvm type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/sdb1 on /newdisk type ext4 (rw,acl)
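
For scripting, the type can also be read from /proc/mounts, where each line lists the device, mount point, and filesystem type as its first three fields. A minimal sketch; fs_type_of is a hypothetical helper name:

```shell
# Print the filesystem type for a given mount point.
# $1 = mount point
# stdin = lines in /proc/mounts format: "device mountpoint type options ..."
fs_type_of() {
    awk -v mp="$1" '$2 == mp { print $3 }'
}

# Usage:
# fs_type_of /boot < /proc/mounts
```

If a mount point appears on more than one line of /proc/mounts, one type is printed per line.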
Friday
Oct 29, 2010

Linux Find Examples

Show all files last modified more than 30 days ago:
find . -type f -mtime +30 -exec ls -lh {} \;

Show all files under /app1 over 500MB:
find /app1 -type f -size +500M -exec ls -lh {} \;

Show all tar.gz files under the current directory:
find . -type f -name "*.tar.gz" -exec ls -lh {} \;
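
When a search matches many files, ending -exec with + instead of \; passes the matches to ls in batches rather than starting one ls per file. A small sketch wrapping the first example; old_files is a hypothetical helper name:

```shell
# List regular files under a directory that were last modified more than
# N days ago, running ls once per batch of matches.
# $1 = directory to search, $2 = age in days
old_files() {
    find "$1" -type f -mtime +"$2" -exec ls -lh {} +
}

# Usage:
# old_files /var/log 30
```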

Wednesday
Oct 20, 2010

HP Array Configuration Utility CLI

HP's Array Configuration Utility includes a CLI that lets you query the controller for the status of its drives and arrays:
[root@saturn ~]# /usr/sbin/hpacucli ctrl all show status
Smart Array P400 in Slot 9
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK
[root@saturn ~]# /usr/sbin/hpacucli ctrl all show config

Smart Array P400 in Slot 9                (sn: XXXXXXXXXXXXXX)

   array A (SAS, Unused Space: 0 MB)

      logicaldrive 1 (68.3 GB, RAID 1, OK)

      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 72 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 72 GB, OK)

   array B (SAS, Unused Space: 0 MB)

      logicaldrive 2 (273.4 GB, RAID 5, OK)

      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 146 GB, OK)
      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 146 GB, OK)
      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 146 GB, OK)
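
For a monitoring check, the config listing can be filtered down to drives that are not reporting OK. This is a minimal sketch, assuming each drive line ends with its status inside the closing parenthesis as in the output above; not_ok_drives is a hypothetical helper name:

```shell
# Print logical and physical drive lines whose status is anything but OK.
# stdin = output of "hpacucli ctrl all show config"
not_ok_drives() {
    grep -E 'logicaldrive|physicaldrive' | grep -v ', OK)'
}

# Usage:
# /usr/sbin/hpacucli ctrl all show config | not_ok_drives
```

Empty output means every listed drive reported OK.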
Sunday
Oct 17, 2010

CPOSC 2010 - Central PA Open Source Conference

I attended CPOSC this past Saturday with my friend Joe.  It's a relatively new and smaller conference, but there were some quality sessions.   I even won an O'Reilly book as one of the door prizes.  Slides from the presentations should be posted here in the coming days.

One of the speakers mentioned Shodan as a useful site for identifying misconfigured servers exposed to the Interwebs. It's basically a search engine that shows port scan results for HTTP, SSH, Telnet, and SNMP. Some of the search results may be old since it's not doing real-time scans. But after doing a couple of searches, it's scary how many hits you get from servers where no one ever bothered to change the default password.