webpointmorpheus Linux Info
Performance and Problems
©2005 - material compiled by Bob Carnaghi, www.webpointmorpheus.com
- System Troubleshooting
- This section lists common problem areas of the Linux operating system, each paired with common corrective measures. The outline below shows how the topics are organized. For additional help, check the man pages, online documentation, and FAQs that pertain to the specific problem.
-
- System Monitoring
- Proactive Maintenance
- Reactive Maintenance
-
- Hardware Problems
- Software Problems
- Application Problems
- Operating System Problems
- Documentation
- System Monitoring
- System monitoring is a set of routine steps performed by the system administrator, including reviewing log files and running performance utilities. These practices help detect potential problems before they reduce system productivity or cause a failure.
- Definitions
-
- baseline - a collection of statistics taken from a system during a no-load or light-load period under normal circumstances, which serves as a reference for comparison with system performance under various production loads and situations.
- bus mastering - the practice of giving peripheral hardware its own controller so that it can transfer data over the bus directly, relieving the CPU of a certain amount of processing load.
- jabbering - a syndrome where an aged and malfunctioning piece of hardware sends large amounts of unnecessary information to the CPU thereby slowing system performance.
- page file - a hard drive or partitioned section of a hard drive that serves as a temporary storage space for data when the system RAM becomes filled or overtaxed.
- paging - the process by which the system moves data between system RAM and the page file when RAM becomes filled or overtaxed.
- Best General Practices
- The general practices below should be considered when setting up the hardware for a Linux system. The list is not exhaustive and should be applied alongside good overall system configuration practices.
-
- Abundant system RAM - decreases the need for swap file use, improves general system speed and response.
- Replace slower hard drives, use disk striping RAID for faster hard drive access time.
- SCSI is generally faster than IDE.
- CD-ROM is slower than hard drive - don't place them both on the same IDE controller
- Keep the kernel lean - eliminate unnecessary modules or recompile a lighter version.
- Establish baseline system statistics for reference during troubleshooting.
- Watch and become familiar with the contents of the /proc directory.
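- As a rough illustration of establishing a baseline, the sketch below copies a few /proc files into a timestamped directory for later comparison. The function name save_baseline, the PROC_ROOT override, and the choice of files are assumptions made for this example, not part of any standard tool.

```shell
# Save a timestamped copy of selected /proc files as a baseline.
# save_baseline, PROC_ROOT, and the file selection are hypothetical.
save_baseline() {
    out_dir=$1
    proc_root=${PROC_ROOT:-/proc}
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$out_dir/baseline-$stamp"
    mkdir -p "$dest" || return 1
    for name in loadavg meminfo stat interrupts; do
        # Skip files that this kernel does not provide.
        if [ -r "$proc_root/$name" ]; then
            cat "$proc_root/$name" > "$dest/$name"
        fi
    done
    # Print the directory that now holds the baseline.
    echo "$dest"
}

# Usage during a light-load period:
#   save_baseline /root/baselines
```

Storing the baseline on a separate disk or machine keeps it available if the system disk fails.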
- Commands and Utilities
- System performance utilities are listed below. Note that the flags passed to a command can change its output, sometimes dramatically.
-
sysstat - System Statistics package that monitors the system from the /proc directory and system devices. It provides the mpstat, iostat, and sar utilities.
mpstat - Multiple Processor Statistics utility. See the header table below for the output columns. With no flags, it reports average values since the last boot. Pass an interval and a count (such as mpstat 1 10) for interval sampling (as given, a 1-second interval, 10 times).
iostat - Input/Output Statistics. Reports CPU utilization and read/write activity for disk devices.
vmstat - Reports virtual memory statistics.
sar - System Activity Reporter. An enhanced utility that combines many of the features of the *stat utilities and writes them to log files. Check the /var/log/sa directory and the files in it. sar has a full array of useful flags and options; see the man pages for details.
free - Displays utilization in KB for both RAM and swap.
top - Displays a full array of process statistics dynamically.
ps - Gives a snapshot of the current processes on the system.
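The interval sampling that mpstat performs can be approximated by hand from two readings of the aggregate "cpu" line in /proc/stat. The sketch below is only an illustration: the function name cpu_idle_pct is hypothetical, and it assumes the usual field order after the label (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# Print the percentage of idle CPU time between two "cpu" lines
# taken from /proc/stat. cpu_idle_pct is a hypothetical helper.
cpu_idle_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); later guest fields are
            # already counted inside user and nice.
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5
        }
        NR == 1 { t0 = total; i0 = idle }
        NR == 2 {
            dt = total - t0
            if (dt <= 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * (idle - i0) / dt
        }'
}

# Usage on a live system (1-second interval):
#   a=$(grep "^cpu " /proc/stat); sleep 1; b=$(grep "^cpu " /proc/stat)
#   cpu_idle_pct "$a" "$b"
```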
-
*stat headers
| Header | Definition |
| %user | Percentage of CPU time spent running user-initiated programs and daemons.* |
| %nice | Percentage of CPU time spent running processes and daemons started with a nondefault nice (priority) value.* |
| %system | Percentage of CPU time the system spent maintaining itself so that it could execute user programs and daemons.* |
| %iowait | Percentage of time the CPU stood idle while there were outstanding I/O requests.** |
| %irq | Percentage of time the CPU spent responding to hardware interrupts.** |
| %soft | Percentage of time the CPU spent responding to software interrupts (softirqs), which can run on multiple CPUs at once.** |
| %idle | Percentage of time the CPU was not processing requests.*** |
| intr/s | The number of interrupts per second received by the CPU from peripheral devices.**** |
Notes:
*Watch for a high %system value compared to the %user and %nice values. This condition would indicate too many resource-intensive programs.
**Watch the values for %iowait, %irq, and %soft. A rapid increase over time indicates that the CPU is not keeping pace with the number of requests sent from software.
***%idle should be on average greater than 25%. Lesser values in short bursts are acceptable, but sustained values less than 25% indicate a need for faster or additional CPUs.
****The production value is best compared to a baseline value. When this comparison shows an excessive interrupt rate, a jabbering hardware condition may exist.
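The 25% guideline for %idle can be checked mechanically once a series of samples has been collected. The sketch below assumes one %idle value per line on standard input; the function name check_idle and the output wording are hypothetical. Extracting the %idle column from mpstat output is left to the reader, because the column position depends on the installed sysstat version.

```shell
# Read one %idle value per line on stdin and report whether the
# average falls below a limit (default 25). check_idle is hypothetical.
check_idle() {
    awk -v limit="${1:-25}" '
        NF { sum += $1; n++ }
        END {
            if (n == 0) { print "no samples"; exit 2 }
            avg = sum / n
            if (avg < limit) { printf "LOW avg %%idle %.1f\n", avg; exit 1 }
            printf "OK avg %%idle %.1f\n", avg
        }'
}

# Usage with values saved one per line in a file:
#   check_idle 25 < idle-samples.txt
```

Averaging over many samples matches the note that short bursts below the limit are acceptable while sustained low values are not.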
- Proactive Maintenance
- Proactive system maintenance is defined as those measures taken to identify, reduce, or eliminate problems before they reduce system performance. System documentation is best kept separate from the files which constitute the system itself in case of filesystem or hard disk failure.
-
- System backups
- Identify and isolate potential problems
- Reactive Maintenance
- Examine log files, examine the /proc file system, and run information utilities (ps or mount).
- NOTE: the command tail -f 'name-of-log-file' prints new entries as they are written to the log file. Use this method for immediate troubleshooting of an application or service in progress.
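- For scripted checks, a non-following variant of the same idea can scan the end of a log for suspicious entries. This is a minimal sketch; the function name recent_log_hits, the default of 200 lines, and the search pattern are assumptions chosen for illustration.

```shell
# Print lines among the last N of a log file that match a pattern.
# recent_log_hits and its defaults are hypothetical.
recent_log_hits() {
    log_file=$1
    pattern=${2:-'error|fail|warn'}
    lines=${3:-200}
    tail -n "$lines" "$log_file" | grep -E -i "$pattern"
}

# Interactive equivalent while reproducing a problem:
#   tail -f /var/log/messages | grep -E -i 'error|fail|warn'
```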
- Tips to identify and correct problems
-
- Prioritize problems
- Find and solve the root cause of the problem. Know the system hardware and configuration, and ask what the empirical cause of the problem is.
- Resolving System Problems
- Problems will typically fall into one of two broad categories. The categories will further subdivide, but identifying the first general category will channel corrective measures in the right direction.
-
- Hardware Problems
- Software Problems
-
- Application Software
- Operating System Software
- Hardware Type Problems
-
- damaged, defective, or decayed hardware
- improper hardware or software configuration
- Check that SCSI drives are properly terminated
- Check that video card and monitor settings are configured properly
- Check the Hardware Compatibility List (HCL)
- IRQ/IO address conflict
-
- Neither device will work. Check for boot errors with dmesg, /var/log/boot.log, or /var/log/messages.
- Look for non-PnP devices that claim resources assigned to another device. The device may be configured manually, or there may be a chip on the device itself.
- Check the /proc directory for hardware conflicts.
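- One quick check in /proc is to list the IRQ lines in /proc/interrupts that name more than one device. The sketch below is an assumption-laden illustration: shared_irqs is a hypothetical helper, it relies on the kernel separating device names with a comma and a space, and a shared IRQ is only a candidate for a conflict (PCI devices routinely share interrupts without trouble).

```shell
# List IRQ lines that name more than one device.
# shared_irqs is hypothetical; input defaults to /proc/interrupts.
shared_irqs() {
    awk '/^ *[0-9]+:/ && /, / { sub(/^ +/, ""); print }' "${1:-/proc/interrupts}"
}

# Usage:
#   shared_irqs                # inspect the running system
#   shared_irqs saved-copy     # inspect a saved baseline copy
```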
- Device drivers
-
- Hardware and drivers are typically checked at boot time by the kudzu program
- Drivers are configured manually by loading the appropriate kernel module or recompiling the kernel
- Hard drive failure
-
- Is it hardware or software RAID?
- RAID level 1 or 5 will permit data to be regenerated. Check the configuration of the RAID utility.
- Restoring non critical directories after hard drive failure
-
- Turn the power off to the computer. Remove the AC power cable from the back of the unit. Open the cover. Remove and replace the failed hard drive unit.
- Reboot the system.
- Use fdisk to create partitions as necessary on the new hard drive.
- Use the command mkfs to create filesystems on the new partitions.
- Using a back-up utility, restore the original data.
- Enter the appropriate data into the /etc/fstab system configuration file.
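- The software side of the steps above can be sketched as a dry run that only prints the commands, so they can be reviewed before anything touches a disk. Every name here is hypothetical: the plan_restore helper, the device /dev/hdb1, the mount point, the ext3 filesystem type, and the tar archive path. The disk name is derived by stripping the trailing partition number, which suits hda/sda-style names only.

```shell
# Print (do not run) the commands for rebuilding one data partition.
# plan_restore and all example arguments are hypothetical.
plan_restore() {
    dev=$1; mnt=$2; fstype=$3; archive=$4
    # Strip the trailing partition number to get the whole-disk name.
    disk=$(printf '%s' "$dev" | sed 's/[0-9]*$//')
    echo "fdisk $disk"
    echo "mkfs -t $fstype $dev"
    echo "mount -t $fstype $dev $mnt"
    echo "tar -xpf $archive -C $mnt"
    echo "# /etc/fstab entry:"
    echo "$dev $mnt $fstype defaults 1 2"
}

# Usage:
#   plan_restore /dev/hdb1 /home ext3 /backup/home.tar
```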
- Restoring critical system directories after hard drive failure
-
- Turn the power off to the computer. Remove the AC power cable from the back of the unit. Open the cover. Remove and replace the failed hard drive unit.
- Reinstall the operating system. Mount file systems from additional hard drives as necessary.
- Using a back-up utility, restore the original data.
- Software Type Problems
- Once a problem has been determined to be in the software category there are two general types that are possible. The question then becomes whether the software problem pertains to application software, or to operating system software.
-
- Application Software Problems
- Often caused by missing program library files, by process limitations, and/or by conflicting applications.
- Software dependency problems - package managers test dependencies during install and uninstall procedures. Source packages typically check dependencies in a pre-compile step and refuse to proceed with the build if a requirement is not met. Run the command rpm -V to verify installed packages, including their dependencies.
- Use the ldd command to check a specific program for dependent library files.
- Typical locations for library files are /lib or /usr/lib. Also refer to /etc/ld.so.conf, which lists additional library directories, and /etc/ld.so.cache, which is rebuilt at install time with the ldconfig utility.
- Utilities and software typically install with an install.sh script. If this script is not available, shared libraries will need to be manually copied to the above locations, and the configuration files manually updated.
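- A small filter makes the ldd check easier to read by printing only the libraries that could not be resolved. This is a sketch that assumes the usual "name => not found" wording in ldd output; missing_libs is a hypothetical helper name.

```shell
# Read ldd output on stdin and print the names of unresolved libraries.
# missing_libs is hypothetical.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd /usr/bin/some-program | missing_libs
```

An empty result means every shared library the program lists was found.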
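- Linux processes can be a source of problematic behavior. Runaway child processes can consume excessive resources, and accumulated zombie processes can fill the process table, to the point of rendering the system sluggish or locked. Use the PID to identify the offending process. Killing the parent process clears its zombie children, which are then reaped by init; children that are still running may need to be killed individually.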
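- Zombie processes and their parents can be picked out of ps output by the process state column. The sketch below assumes input in the form produced by ps -eo pid,ppid,stat,comm (a header row, then one process per row); list_zombies is a hypothetical helper name.

```shell
# Read "PID PPID STAT COMMAND" rows on stdin (with a header row) and
# print PID, parent PID, and command for zombie processes (state Z).
# list_zombies is hypothetical.
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

The second column of the result is the parent PID, which is the process to investigate.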
- Linux processes that open many or excessive filehandles can cause problems with system resources and operation. The default per-process limit of 1024 open filehandles, as well as the number of processes a user may run, can be adjusted with the ulimit command. See the man pages for details.
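- As an illustration, the soft limit on open files can be changed for the current shell and the programs started from it. The helper name raise_open_files is hypothetical; the sketch caps the request at the hard limit, since an unprivileged user cannot exceed it.

```shell
# Set the soft open-file limit for this shell, capped at the hard
# limit, and print the resulting value. raise_open_files is hypothetical.
raise_open_files() {
    want=$1
    hard=$(ulimit -H -n)
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        want=$hard
    fi
    ulimit -S -n "$want" && ulimit -S -n
}

# Usage (affects this shell and its children only):
#   raise_open_files 4096
# In bash, ulimit -u reports the per-user process limit in the same way.
```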
- Does the specific application have a dedicated log file? Check /var/log to explore this option. Also check for the existence of /var/log/*application* for additional log files.
- Problems can arise when a process attempts to access resources, or when different processes attempt to access the same resources at the same time. This can often be resolved by restarting the process with a SIGHUP signal. Try starting the process in Single User Mode to observe it in a simpler environment.
- Keep software up to date with patches, updates, upgrades, etc. for best performance. Always verify software compatibility with other software, especially when the processes run at the same time.
- Operating System Software Problems
- Often caused by missing or corrupt boot loaders, file systems, serial devices, etc.
- Boot loaders can be problematic when loading the operating system. The LILO boot loader can be simplified by using linear mode instead of compact. Look in /etc/lilo.conf for the configuration.
- Create a boot disk with the mkbootdisk command to ensure that the system can be booted for repair in an emergency.
- Filesystem corruption is the cause of many problems. If the / root filesystem becomes corrupt, the system will become unstable. Use the command fsck to check and repair a filesystem. Do not restore data to a corrupt filesystem. To recreate a filesystem that is damaged or corrupted beyond repair, perform the steps listed below.
- Repairing a filesystem on a non-critical drive:
- Unmount the filesystem.
- Run the fsck -f command on the filesystem.
- If the fsck command fails to repair the filesystem, recreate the filesystem with the mkfs command.
- Restore the data to the filesystem from a back-up utility.
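- Whether fsck repaired the filesystem can be read from its exit status, which is a sum of flag values (1 errors corrected, 2 reboot suggested, 4 errors left uncorrected, 8 operational error). The sketch below decodes those four flags; fsck_status and its output labels are hypothetical. A result containing errors-uncorrected is the case that calls for recreating the filesystem with mkfs.

```shell
# Decode the low bits of an fsck exit status into readable labels.
# fsck_status and the label wording are hypothetical.
fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then echo "clean"; return 0; fi
    msg=""
    if [ $((code & 1)) -ne 0 ]; then msg="$msg errors-corrected"; fi
    if [ $((code & 2)) -ne 0 ]; then msg="$msg reboot-needed"; fi
    if [ $((code & 4)) -ne 0 ]; then msg="$msg errors-uncorrected"; fi
    if [ $((code & 8)) -ne 0 ]; then msg="$msg operational-error"; fi
    # Higher bits (usage error, cancelled, and so on) are not decoded here.
    if [ -z "$msg" ]; then msg=" other-failure"; fi
    echo "${msg# }"
}

# Usage, with a hypothetical unmounted partition /dev/hdb1:
#   fsck -f /dev/hdb1; fsck_status $?
```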
- To repair a filesystem on a system-critical drive:
- Place the installation CD into the CD drive and boot the system. Ensure that the system boots to the CD.
- At the welcome screen, or when prompted, type linux rescue and press Enter. Unless necessary, do not enable network support.
- If prompted, and if desired, the rescue process will look for and mount existing installations. Choose this step if necessary. If restoring a filesystem is the only intended outcome, skip this process.
- Use the mkfs command on the CD-ROM Linux system to create the new / root filesystem.
- Use a back-up utility on the CD-ROM (tar, dump, or cpio) to restore the original data to the recreated / root filesystem.
- Type exit to end the shell and reboot the system.
- A complete bootable Linux operating system is available on CD by using the Knoppix or BBC Linux utilities. These are available by download, and provide a complete system with repair utilities for working through system problems that exist on the hard drive. See the distribution section for more info.
- Serial devices are common sources of problems. Serial port configuration for most devices is done automatically; however, some devices do not respond to PnP. When conflicts occur, the setserial command can be used to adjust device parameters. Some options are listed below, or see the man pages for further info.
port n - Sets the I/O address to n for a serial device.
irq n - Sets the IRQ to n for the serial device.
auto_irq - Attempts to automatically detect the IRQ setting for a serial device.
spd_normal - Assigns 38.4 kb/s speed to the serial port.
spd_hi - Assigns 57.6 kb/s speed to the serial port.
spd_vhi - Assigns 115 kb/s speed to the serial port.
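The options above combine on a single setserial command line. The sketch below only builds and prints such a command so it can be reviewed first; serial_cmd is a hypothetical helper, and the device, I/O address, and IRQ in the usage line are example values that must be replaced with the settings of the actual hardware.

```shell
# Build (print, not run) a setserial command from its parts.
# serial_cmd is hypothetical; only three speed flags are accepted here.
serial_cmd() {
    dev=$1; port=$2; irq=$3; speed=${4:-spd_normal}
    case $speed in
        spd_normal|spd_hi|spd_vhi) ;;
        *) echo "unknown speed flag: $speed" >&2; return 1 ;;
    esac
    echo "setserial $dev port $port irq $irq $speed"
}

# Usage with example values:
#   serial_cmd /dev/ttyS1 0x2f8 3 spd_vhi
```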
- Documentation
- A general rule of over-documentation will serve better in a time of emergency than one of under-documentation.
-
- what are good documentation guidelines?
- what should be documented?
Other Documents in this Series
- Introduction and History
- Installation, Advanced Installation, and Usage
- The Linux Kernel and the Boot Process
- Filesystems - Management & Administration
- The BASH and Other Shells
- System Initialization and the X Environment
- Linux Processes
- Linux Administration, Peripherals, and Hardware
- Software Installation and Management
- Backups and Log Files
- Performance and Problems
- Network Configuration
- Security
- Key Linux Commands
- Essential Linux Definitions
This page was last modified: Wednesday January 03, 2007 10:53 AM