What version of Solaris am I running?

	me@mysunserver:/$ uname -a
	SunOS mysunserver 5.10 Generic_125100-06 sun4v sparc SUNW,Sun-Fire-T1000 Solaris
	me@mysunserver:/$

	Solaris 10 uses the SunOS 5.10 kernel, Solaris 9 uses the SunOS 5.9 kernel...

	me@mysunserver:/$ cat /etc/release 
	                       Solaris 10 11/06 s10s_u3wos_10 SPARC
	           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
	                        Use is subject to license terms.
	                           Assembled 14 November 2006
	me@mysunserver:/$ 
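
If you want to make that translation in a script, here is a minimal sketch. "solaris_name" is a made-up helper (not a standard Solaris command), and it assumes a POSIX shell such as ksh or bash rather than the legacy /bin/sh. It relies on the naming rule above, plus the fact that SunOS 5.6 and earlier were sold as Solaris 2.x:

```shell
# Map a SunOS release number (the third field of "uname -a", or the
# output of "uname -r") to the Solaris marketing name.
# solaris_name is a hypothetical helper, not a standard Solaris command.
solaris_name() {
    case "$1" in
        5.[0-6]|5.5.1) echo "Solaris 2.${1#5.}" ;;   # SunOS 5.6  -> Solaris 2.6
        5.*)           echo "Solaris ${1#5.}" ;;     # SunOS 5.10 -> Solaris 10
        *)             echo "unknown" ;;
    esac
}

# Usage on a live system:
#   solaris_name "$(uname -r)"
```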

What Solaris build am I using?

	me@mysunserver:/$ showrev
	Hostname: mysunserver
	Hostid: 848233a4
	Release: 5.10
	Kernel architecture: sun4v
	Application architecture: sparc
	Hardware provider: Sun_Microsystems
	Domain: 
	Kernel version: SunOS 5.10 Generic_125100-06
	me@mysunserver:/$ 

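	Kernel architectures:
		sun4c (early SPARCstations: SPARCstation 1, 2, IPC, IPX)
		sun4d (SuperSPARC multiprocessor servers: SPARCserver 1000, SPARCcenter 2000)
		sun4m (microSPARC, SuperSPARC)
		sun4u (UltraSPARC) (~1995)
		sun4v (UltraSPARC T1 and later) (~2005)
	(Solaris 10 only supports sun4u and later)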

How can I see which Solaris patches I have installed?

	me@mysunserver:/$ showrev -p
	Patch: 118367-04 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
	Patch: 118371-07 Obsoletes: 119265-02 Requires:  Incompatibles:  Packages:
	SUNWcsu, SUNWcsl, SUNWtoo
	Patch: 118373-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
	Patch: 118872-04 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
	.
	.
	.
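
The list gets long, so it can help to ask about one patch at a time. The sketch below is one possible way to do that, assuming the "Patch: NNNNNN-RR" layout shown above. "has_patch" is a made-up helper name, and on Solaris you would likely need "nawk" or /usr/xpg4/bin/awk in place of "awk", because the old /usr/bin/awk lacks the -v option:

```shell
# has_patch BASE MINREV
# Reads "showrev -p" output on standard input and succeeds (exit 0) if
# patch BASE is listed at revision MINREV or newer.
# has_patch is a hypothetical helper, not a standard Solaris command.
has_patch() {
    awk -v base="$1" -v min="$2" '
        $1 == "Patch:" {
            split($2, id, "-")    # "118371-07" -> id[1]="118371", id[2]="07"
            if (id[1] == base && id[2] + 0 >= min + 0) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a live system:
#   showrev -p | has_patch 118371 7 && echo "patch is installed"
```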

How do I disable a CPU?


	It is safe to take CPUs (or "cores") out of the multitasking context
	rotation at any time with "psradm". This can sometimes be a useful
	trick for checking for SMP-based race conditions or for performance
	analysis and tuning.

	Let's first check on CPU #6 with "psrinfo":

	root@mysunserver:~$ psrinfo -v 6
	Status of virtual processor 6 as of: 07/26/2007 11:45:30
	  on-line since 07/26/2007 11:39:55.
	  The sparcv9 processor operates at 1000 MHz,
	        and has a sparcv9 floating point processor.
	root@mysunserver:~$ 

	Okay, let's go ahead and take it offline (it's safe, really!)

	root@mysunserver:~$ psradm -f 6
	root@mysunserver:~$ 

	Now, if we run "psrinfo" again, it should show as off-line...

	root@mysunserver:~$ psrinfo -v 6
	Status of virtual processor 6 as of: 07/26/2007 11:45:38
	  off-line since 07/26/2007 11:45:34.
	  The sparcv9 processor operates at 1000 MHz,
	        and has a sparcv9 floating point processor.
	root@mysunserver:~$ 

	So let's bring it back up:

	root@mysunserver:~$ psradm -n 6
	root@mysunserver:~$ 

	And now it'll show as on-line!

	root@mysunserver:~$ psrinfo -v 6
	Status of virtual processor 6 as of: 07/26/2007 11:46:03
	  on-line since 07/26/2007 11:45:57.
	  The sparcv9 processor operates at 1000 MHz,
	        and has a sparcv9 floating point processor.
	root@mysunserver:~$ 
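
If you do this often (say, to re-run a test with one CPU missing), the off-line/on-line pair can be wrapped around a command. This is only a sketch: "with_cpu_offline" and the PSRADM variable are made-up names, it uses just the "psradm -f" and "psradm -n" calls shown above, and it must be run as root:

```shell
# Both names below are hypothetical conveniences for this sketch.
# PSRADM can be pointed at a stub command for a dry run.
PSRADM=${PSRADM:-psradm}

# with_cpu_offline CPU COMMAND [ARGS...]
# Takes CPU off-line, runs COMMAND, then brings CPU back on-line.
# Returns the exit status of COMMAND.
with_cpu_offline() {
    cpu=$1
    shift
    "$PSRADM" -f "$cpu" || return 1   # take the CPU off-line
    "$@"                              # run the workload
    status=$?
    "$PSRADM" -n "$cpu"               # bring the CPU back on-line
    return $status
}

# Usage on a live system (./my_test is a placeholder for your own program):
#   with_cpu_offline 6 ./my_test
```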

How much RAM does my Sun have?

	root@mysunserver:~$ prtdiag | grep Memory
	Memory size: 2040 Megabytes
	root@mysunserver:~$ 

	(This machine has 2 gigs.)

What hardware is in my Sun?

	root@mysunserver:~$ prtdiag
		.
		.
		.

What does load average mean?

The "load average" of the UNIX system is the average number of processes assigned to the specified processor set that are in the system run queue, averaged over various periods of time.

For example, your UNIX system may have 100 processes in the process table, but at a given moment, 98 of them may be "sleeping", such as a web server which just "sleeps" while waiting for a web browser to connect to it.

These processes use no CPU time while they "sleep", so a single-CPU system will spend all of its time flipping ("context switching") between the two "running" processes, giving each a little slice of processing time before switching back to the other. The subset of a UNIX system's processes which are actually "running" makes up the "run queue".

If three processes are running, and are all demanding the full CPU's time, each will have to share the CPU's time with the other processes, and thus the average number of processes in the run queue will be about 3.0.

If there is only one process "running", and it spends half of its time processing ("thinking") and half of its time sleeping, then the system load average would be about 0.5.

The UNIX kernel keeps a running count of the system process "load" and can report it as an average expressed over the last 1 minute, 5 minutes, and 15 minutes, using the "uptime" command:

	root@mysunserver:~$ uptime
	  3:29pm  up 7 days 18:13,  1 user,  load average: 0.01, 0.01, 0.01
	root@mysunserver:~$ 
	                                                   ^     ^     ^
	                                                   1     5     15
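
On a machine with several CPUs, it can be handy to look at the load per CPU rather than the raw figure. The sketch below is one way to do that under a few assumptions: "load_per_cpu" is a made-up helper name, the "uptime" line looks like the one above, and "awk" may need to be "nawk" on Solaris:

```shell
# load_per_cpu NCPU
# Reads one line of "uptime" output on standard input and prints the
# 1-, 5-, and 15-minute load averages divided by NCPU.
# load_per_cpu is a hypothetical helper, not a Solaris command.
load_per_cpu() {
    sed 's/.*load average: *//' |
        awk -F', *' -v n="$1" '{ printf "%.2f %.2f %.2f\n", $1 / n, $2 / n, $3 / n }'
}

# Usage on a live system ("psrinfo" prints one line per virtual processor):
#   uptime | load_per_cpu "$(psrinfo | wc -l)"
```

A result near 1.00 would suggest that, on average, every CPU had about one process wanting it.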
	

Why is my Sun so slow?

Whether you're unsatisfied with your Solaris system's performance or just want to get the most out of what the machine is capable of, improving it is a cyclical process: determine where the processing slow-down is occurring (the bottleneck), fix it, then repeat until the most significant bottlenecks are reduced.

First, we should learn about some tools to help us monitor system performance. SAR (the system activity reporter) is the time-honored (and very cryptic) standard UNIX performance monitoring tool. How do we use SAR to see what the Sun server has been doing?

SAR collects data on the state of the system using a program called "sadc" (the system activity data collector). Two standardized shell scripts drive it: "sa1" runs sadc to record the samples, and "sa2" runs "sar" to write a daily report from them. Both can be launched from "cron" as frequently as you want to collect performance data, by adding a few lines to your crontab with:

	mysunserver# crontab -e

Now add these lines:

	# Starting at 8am collect system activity records
	# every 20 minutes for 12 hours
	# 20 minutes = 1200 seconds
	# 12 hours with 3 samples each hour = 36 loops
	0 8 * * 1-5 /usr/lib/sa/sa1 1200 36


	# After the 12 hour period,
	# collect a system activity report
	30 20 * * 1-5 /usr/lib/sa/sa2 -A
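
The two numbers passed to sa1 are just the arithmetic from the comments above. If you want a different schedule, a small sketch like this can check the figures before you edit the crontab ("sa1_args" is a made-up helper name, and it assumes a POSIX shell such as ksh or bash):

```shell
# sa1_args MINUTES HOURS
# Prints the two arguments for sa1: the sampling interval in seconds and
# the number of samples needed to cover HOURS hours.
# sa1_args is a hypothetical helper for checking the arithmetic.
sa1_args() {
    interval=$(( $1 * 60 ))
    count=$(( $2 * 60 / $1 ))
    echo "$interval $count"
}

# Example: a sample every 20 minutes for 12 hours
#   sa1_args 20 12
```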

After a full day of systems monitoring, we can use "sar" to extract the collected data:

	root@mysunserver:~$ sar

	SunOS mysunserver 5.10 Generic_125100-06 sun4v    07/25/2007

	23:00:01    %usr    %sys    %wio   %idle
	23:20:00       0       0       0     100
	23:40:00       0       0       0     100
	00:00:00       0       0       0     100
	00:20:00       0       0       0     100
	00:40:02       0       1       0      99
	01:00:01       0       2       0      98
	01:20:05       0       2       0      98
	   .
	   .
	   .
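
A full day of samples is a lot to read by eye. As a rough sketch, the sample with the lowest %idle figure can be picked out with awk. "busiest_interval" is a made-up helper name, it assumes the five-column layout shown above, and "awk" may need to be "nawk" on Solaris:

```shell
# busiest_interval
# Reads default "sar" output on standard input and prints the timestamp
# and %idle value of the sample with the lowest %idle (first one wins on
# a tie). busiest_interval is a hypothetical helper, not a Solaris command.
busiest_interval() {
    awk '
        # Only data rows: a HH:MM:SS timestamp and a numeric fifth column.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $5 ~ /^[0-9]+$/ {
            if (min == "" || $5 + 0 < min) { min = $5 + 0; when = $1 }
        }
        END { if (when != "") print when, min }
    '
}

# Usage on a live system:
#   sar | busiest_interval
```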

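In the default SAR display, we see, at every sample interval, how the computer was spending its time.

Each of the four columns shows how the CPU's time was being divided:

	%usr - percentage of time spent running user-mode (application) code
	%sys - percentage of time spent running kernel code (system calls, interrupts)
	%wio - percentage of time spent idle while waiting for I/O to complete
	%idle - percentage of time spent idle with no I/O outstanding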

NOTE: These statistics are not quite the same as the system "load average", which, as described earlier, is the AVERAGE number of PROCESSES in the run queue, whether RUNNING on a CPU or WAITING for one. On a single-CPU system, the "load average" resembles a percentage of processor utilization that can exceed 100% when the system is overutilized (load avg. > 1.00) or fall below 100% when the CPU is not fully utilized (load avg. < 1.00). Theoretically, a single-CPU system that sustained a load average of 0.50 would operate with the same apparent performance if the CPU ran at half the clock speed. On a multi-CPU machine, compare the load average against the number of CPUs instead of against 1.00.

It may be best to think of load average as CPU demand, and to think of the SAR statistics as actual CPU utilization.

If SAR reports the CPU is more than 10% idle, then your system is probably not being held back by a slow processor or slow storage devices. The culprit may just be inefficiently written software that is not optimized for the platform it runs on.

Time spent by the system waiting for I/O (%wio) is one of the most important figures to attack. This "wasted" time is sometimes called a "wait state". The I/O itself is necessary; the time is only "wasteful" because the CPU just sits around doing nothing until the disk has had a chance to fetch the requested data, write the data out, or finish whatever other I/O activity the UNIX process requested.


What's my Sun server doing at this moment?

Solaris comes with a process monitoring program called "prstat" (which works a lot like "top" on Linux):

	# prstat

"prstat" will list as many UNIX processes as it can fit on the screen, sorted so that those using the most CPU time come first.

We can change the sort key "prstat" uses to sort the processes. For example, to see which processes are using the most memory:

	# prstat -s size

Summary of Solaris tools:

	psrinfo - Describe CPUs
	psradm - activate/de-activate CPUs
	prtdiag - shows amount of RAM on machine, hardware inventory

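Solaris monitoring:
	mpstat - individual CPU usage
	vmstat - virtual memory, paging, and run queue statistics
	iostat - disk and terminal I/O statistics
	nfsstat - NFS function statistics
	sar - system activity reports (historical CPU, I/O, and memory data)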

Other:
	dtrace


Solaris debugging:
	apptrace
	truss
	snoop
	prstat - like 'top'
	kstat - status variables for all kernel modules

		kstat -m bge

Free Replacement drivers to use on Solaris:
	ppp

Find an error or omission? Sorry about that! Please e-mail Eric at eric@ericshalov.com and let him know!

All of Eric's Tech Notes are provided on an as-is basis, and may contain errors or omissions. No statement is made as to their suitability for any particular purpose, and no warranty is given. Use at your own risk! All trademarks are the property of their respective owners.
No duplication of the above information is permitted without prior written permission of the author(s).
©Copyright 2007 Eric Shalov. All Rights Reserved.