
Defending Against Spam Hosts

License: GPL

This script should be placed into a cron job and run once a day. The only modification that the script may need is the path to the iptables utility.

The script will create an iptables chain called "Spamhaus", which will hold all malicious IP addresses.
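
As a hedged example (the path and file name below are assumptions; point them at wherever you saved the attached script), a daily root cron entry could look like this:

    # /etc/cron.d/spamhaus -- refresh the Spamhaus chain once a day at 02:30
    30 2 * * * root /usr/local/sbin/spamhaus-update.sh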

Attachment: security.txt (957 bytes)

Recordings for the Free Novell Training Webinar Series, Vols. 1-11


In an effort to increase Novell product training opportunities, Novell, GroupLink, GWAVA, BrainStorm, Omni, Messaging Architects and other Novell Partners are proud to bring you an exciting new series of webinars. These monthly webinars highlight successes using Novell and Novell partner products. Six webinars have been held so far in 2009. The response has been great!

Each webinar has four sessions highlighting different Novell Technologies. This year’s 26 sessions of webinars have included the following topics (click the link to view the recordings):

*************
Sessions which SHOWCASE Novell customer successes (taught by the Novell customer):

“K-12 Novell Technology Success, including collaboration, security, mobility and service desk” Greg Long, Frankfort Community Schools http://www.grouplink.net/redir.asp?id=2009021302

"Migrating from Netware to OES2" Mike Faris, First Data
http://www.grouplink.net/redir.asp?id=2009030604

"Migrating from Netware to OES2, part 2" Mike Faris, First Data
http://www.grouplink.net/redir.asp?id=2009090304

"Tips and best practices for using OES2" Mike Faris, First Data
http://www.grouplink.net/redir.asp?id=2009121104

"Using Teaming + Conferencing in the education process for k12" Tim Leerhoff from TIES (a K12 conglomerate in Minnesota)
http://www.grouplink.net/redir.asp?id=2009041702

"Novell and things that make you go hmm" Tim Leerhoff from TIES (a K12 conglomerate in Minnesota)
http://www.grouplink.net/redir.asp?id=2009011504

"CRM Success Story with Adventist Risk Management, GroupWise integrated CRM, ContactWise" Charles Mendoza, Adventist Risk
http://www.grouplink.net/redir.asp?id=2009041703

"GroupWise 8 - Migration Tips and Tricks” Tim Leerhoff, TIES (Minnesota education technology collaborative)
http://www.grouplink.net/redir.asp?id=2009051503

*************
Sessions TAUGHT by Novell personnel which SHOWCASE Novell products:
"Teaming 2 and Conferencing" Travis Grandpre, Product Marketing Manager, Novell
http://www.grouplink.net/redir.asp?id=2009011505

"Novell Pulse" Karen Victory, Novell
http://www.grouplink.net/redir.asp?id=2009121103

"Latest Release Features - ZCM 10.2" - Pete Green, Novell Technical Specialist - ZENworks
http://www.grouplink.net/redir.asp?id=2009090302

"Advantages of Upgrading/Migrating to GroupWise 8“ - Travis Grandpre/Dean Lythgoe, GroupWise Team http://www.grouplink.net/redir.asp?id=2009072305

"eDirectory Overview" - Kamal Nayaran, Product Specialist
http://www.grouplink.net/redir.asp?id=2009061901

“Using Teaming + Conferencing in K12” – Phil Karren, Collaboration Product Manager http://www.grouplink.net/redir.asp?id=2009051502

“Managing your Assets with Zen 10” Dave Carter Novell Technical Specialist http://www.grouplink.net/redir.asp?id=2009011605

“Managing your Assets with Zen 10 Part 2” Dave Carter Novell Technical Specialist http://www.grouplink.net/redir.asp?id=2009030605

“Teaming + Conferencing” Tracy Smith, Product Manager, Novell Teaming http://www.grouplink.net/redir.asp?id=2009021305

"Overview of ZENworks 10 and the everything HelpDesk ZENWorks 10 integration" Dave Carter, Novell Technical Specialist and Casey Trujillo GroupLink
http://www.grouplink.net/redir.asp?id=2009041701

“OES Overview for K12” Jason Williams, Product Manager OES
http://www.grouplink.net/redir.asp?id=2009051505

"SLED 11 Latest Release Features" Gary Ekker, Desktop OEM Senior Product Manager
http://www.grouplink.net/redir.asp?id=2009100804

*************
Sessions taught by Novell INTEGRATORS (channel partners) which SHOWCASE Novell products:
"Novell Secure Login" Thom Kerby, EOS Systems
http://www.grouplink.net/redir.asp?id=2009100802

“Tips for ZENworks 7 to ZCM Migration” Thom Kerby, EOS Systems
http://www.grouplink.net/redir.asp?id=2009090301

“Migrating from ZENworks 7 to ZENworks 10” – Norm O’Neal, Integrity Solutions http://www.grouplink.net/redir.asp?id=2009072302

“XenServer, Data Center Virtualization and how easy it is to spin up a HelpDesk” Norm O'Neal with Integrity a Novell Channel Partner http://www.grouplink.net/redir.asp?id=2009011606

"Using Novell technologies to streamline your business process needs" Eliot Lanes, CNE, Viable Solutions, Novell Channel Partner
http://www.grouplink.net/redir.asp?id=2009041705

*************
Sessions taught by Novell Training or Technology Partners, which showcase LEVERAGING Novell products:
"The ZENworks 10 integrated HelpDesk" Gus Hytonen, GroupLink http://www.grouplink.net/redir.asp?id=2009011503

"The Novell Integrated HelpDesk, featuring ZENworks 10 and GroupWise 8 Integration”
– Mike Nielson, Product Manager, everything HelpDesk
http://www.grouplink.net/redir.asp?id=2009072304

"GWAVA 4.5" Taylor Cochrane, GWAVA
http://www.grouplink.net/redir.asp?id=2009090303

"GroupWise Mobility Training" Paul DePond, President, Notify
http://www.grouplink.net/redir.asp?id=2009061903

“Password Synchronization and Universal Password” – Mike Weaver, Concensus Consulting http://www.grouplink.net/redir.asp?id=2009051504

“What is New in GroupWise 8” presented by Derek Adams from BrainStorm, Inc. your Novell training partner http://www.grouplink.net/redir.asp?id=2009011603

“What is New in GroupWise 8 part 2” presented by Derek Adams from BrainStorm, Inc. http://www.grouplink.net/redir.asp?id=2009030603

“What is New in GroupWise 8 part 3” presented by Derek Adams from BrainStorm, Inc.
http://www.grouplink.net/redir.asp?id=2009061902

"A Simple Guide to Single Message Recovery" GWAVA Retain, Willem Bachgus
http://www.grouplink.net/redir.asp?id=2009061904

“Making sure your GroupWise Server Never Goes Down” GWAVA Reload http://www.grouplink.net/redir.asp?id=2009011604

""Email Archiving - Reduce the size and cost of your email archiving" GWAVA Retain, Taylor Cochrane, GWAVA http://www.grouplink.net/redir.asp?id=2009030602

“GroupWise Storage Optimization and Compliance” Ranjit Sarai, Product Manager M+Archive, Messaging Architects http://www.grouplink.net/redir.asp?id=2009100805

"Document Management with DocXchanger" Doug Ouztz, Condrey Corporation
http://www.grouplink.net/redir.asp?id=2009100803

These webinars are brought to you by everything HelpDesk, the Novell Integrated HelpDesk solution. To learn more about this product and to receive a free 30 day download trial visit: http://www.grouplink.net/forms/ehdtrial/

Clustering XEN with Heartbeat and Advanced HASI



About this guide

This document focuses on a design where XEN virtual machines (domU) are centrally managed according to a set policy and kept highly available. What is written here is the real thing, not a demo; it has been working in production for over 18 months. We use this technology for hosting many UNIX services, including DNS, web proxy, SMTP, NFS, VPN and CUPS, as well as traditional NetWare services with OES2, although in this document I only present the solution for hosting home directories for UNIX users via NFS (HASI).

It is inspired by and based on the original Novell demo introduced by Jo De Baer, whom I thank for his support of and effort on this piece of technology.
http://wiki.novell.com/images/3/37/Exploring_HASF.pdf
http://wiki.novell.com/images/c/c8/Tut323_bs2007.pdf

HASI (High Availability Storage Infrastructure) is fully supported by Novell on SLES10:
http://www.novell.com/products/server/ha_storage.html

Please read the guides above to gain a better understanding of this technology. This guide assumes that you already have hands-on experience with SLES10, Heartbeat, EVMS and the other components, and it is intended for experienced system administrators.

Overview

We have a 2-node XEN cluster (HP DL360 G5) running on small local SAS storage, with connectivity to our fibre channel SAN where all our resources (XEN virtual machines) reside. We run SLES10 SP2 on both cluster member nodes and configure Heartbeat (high availability) services on top of them (the XEN hosts). We set a policy that treats each virtual machine as an individual primitive resource, monitors each virtual machine alongside its network connectivity, and acts according to events.

Point

This is a 2-node cluster, which is not a big deal to manage, but HA (Heartbeat) supports up to 16 nodes in one cluster, and sharing storage between that many nodes is dangerous. Because HA is configured on all cluster member nodes, it tracks each resource (XEN domU), and through its monitoring operations it knows where a certain resource is running, what it is doing, and so on.

It protects you from your own mistakes, for instance starting up a second instance of an already running virtual machine, which would corrupt its storage instantly.

HA is your control center: you must do everything there, including starting, stopping and migrating your virtual machines, otherwise HA gets confused. It is not just the safe way of managing virtual machines; it is also very good for DR and business continuity.

Should you lose one of your virtual machines, should one of them crash or freeze your host, or should your LAN switch go faulty, HA will restart (stop/start) or migrate (even live, if possible) your XEN domU resource to another healthy host along with all its dependent services.

Remember that you have to eliminate single points of failure and build in as much redundancy as possible everywhere else in this scenario. Resources can only be highly available if you also have redundancy in your storage, servers, switches, power supply (UPS and emergency generator), HA communication paths, and so on.

Storage

Both hosts see both LUNs, even if the diagram (not reproduced here) is a bit confusing.

I have LUN2 (currently 100GB) for virtual machines. I split this up with EVMS, and through a CSM container we can share the same block-based storage layer between both XEN hosts (HP DL360 G5).

It is somewhat simpler compared to a file-image-based setup, where you would need an extra layer to mount the images; this setup is also well tested, works seamlessly even for live migration, and delivers unbeatable I/O performance.

LUN1 (currently 2TB) is for user data (home directories), which I do NOT manage on the XEN hosts. I actually forced EVMS to manage (and see) only the 100GB LUN2, because I want to manage the 2TB array inside my NFS virtual machine. We just map LUN1 to our NFS virtual machine (a sketch of such a mapping follows).
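
As a hedged sketch only (the volume and device names are illustrative, not taken from my actual configuration), the relevant lines in the NFS domU's Xen configuration could look like this:

    # System disk comes from the EVMS-managed LUN2; user data is the raw multipath device for LUN1
    disk = [ 'phy:/dev/evms/san2/nfsroot,xvda,w',
             'phy:/dev/mapper/mpath1,xvdb,w' ]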

What makes this advanced compared to the original Novell demo is as follows:

  • Fiber channel SAN for storage (HP EVA 6000)
  • SLES10 SP2 (refreshed HA, EVMS, XEN, OCFS2, etc. components)
  • Block device based XEN virtual machines (unbeatable I/O performance)
  • Live migration ready XEN virtual machines
  • HA includes network connection monitoring
  • HA includes adjusted timing for XEN virtual machines (hypervisor friendly)
  • HA includes HP iLO STONITH resource for increased protection (fencing)

Configuration

I installed a fairly cut-down copy of SLES10 SP2 on my XEN hosts, the LUNs are presented to both nodes, and so on. I configured the first NIC (eth0) with a class C private IP address, which is the main connection back to our private LAN. The other NIC is configured with a class A private IP address that I use solely for HA communication. I connected these to separate switches (redundancy), but if your hosts are close to each other you could use a crossover cable as well. I already had several virtual machines, so creating them is not covered in this guide; there are plenty of notes about that already. I mainly run SLES10 SP2 XEN virtual machines where possible, although I have some Debian Linux 4.0 virtual machines as well.

  1. NTP Setup

    The time on the two physical machines needs to be synchronized; several components in the HASF stack require this. I have configured both nodes to use our internal NTP servers (3 of them) in addition to the other node, which gives us fairly decent redundancy.

    host1:~ # vi /etc/sysconfig/ntp
    NTPD_INITIAL_NTPDATE="ntp2.domain.co.nz ntp3.domain.co.nz ntp1.domain.co.nz"
    NTPD_ADJUST_CMOS_CLOCK="no"
    NTPD_OPTIONS="-u ntp -L -I eth0 -4"
    NTPD_RUN_CHROOTED="yes"
    NTPD_CHROOT_FILES=""
    NTP_PARSE_LINK=""
    NTP_PARSE_DEVICE=""
    NTPD_START="yes"

    Remember that after making changes under the /etc/sysconfig directory from the command line, you need to run SuSEconfig to apply them:

    host1:~ # SuSEconfig
    host1:~ # vi /etc/ntp.conf
    server 127.127.1.0
    fudge 127.127.1.0 flag1 0 flag2 0 flag3 0 flag4 0 stratum 5
    server ntp2.domain.co.nz iburst
    server ntp3.domain.co.nz iburst
    server ntp1.domain.co.nz iburst
    server host2.domain.co.nz iburst
    driftfile /var/lib/ntp/drift/ntp.drift
    logfile /var/log/ntp
    
    

    This was set up by the YaST GUI module and contains mainly the defaults. I added the servers with the iburst option to reduce the initial sync time and made the local NTP server stratum 5.
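
    After restarting ntpd, a quick sanity check is to confirm that the peers respond and that one of them gets selected for synchronization (marked with an asterisk in the ntpq output):

    host1:~ # rcntp restart
    host1:~ # ntpq -p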

    Both nodes need to reach each other without DNS; I added the iLO IP addresses here too:

    host1:~ # vi /etc/hosts
    192.168.1.1 host1.domain.co.nz host1
    192.168.1.2 host2.domain.co.nz host2
    10.0.1.1 host1.domain.co.nz host1
    10.0.1.2 host2.domain.co.nz host2
    172.16.1.1 host1-ilo.domain.co.nz host1-ilo
    172.16.1.2 host2-ilo.domain.co.nz host2-ilo
    
    

    Naturally, the same needs to be done on the other node as well.

  2. Multipathing

    I have redundant SAN switches and controllers, hence for proper redundancy we need to configure this service; left unconfigured, the duplicate paths could confuse EVMS as well. There is a guide from HP, but it requires the HP drivers to be installed. I prefer using the SuSE stock kernel drivers, because they are maintained by Novell and work pretty much out of the box. Using the HP drivers may require you to reinstall or update them every time you receive a kernel update. The tools we need:

    host1:~ # rpm -qa | grep -E 'mapper|multi'
    device-mapper-1.02.13-6.14 
    multipath-tools-0.4.7-34.38 
    

    Find out what parameters the stock kernel driver supports.

    host1:~ # modinfo qla2xxx| grep parm

    We need as quick a response as possible in an emergency, hence we instruct the stock driver to disable the HBA built-in failover and propagate failures up to the dm I/O layer. The stock driver does support this, as shown by the list above. Activate it:

    host1:~ # echo "options qla2xxx qlport_down_retry=1">> /etc/modprobe.conf.local

    Update the ramdisk image then reboot the server:

    host1:~ # mkinitrd && reboot
    
    

    After reboot, ensure that modules for multipathing are loaded:

    host1:~ # lsmod | grep 'dm'
    dm_round_robin 	7424 1
    dm_multipath 	30344 2 dm_round_robin
    dm_mod 		67504 39 dm_multipath
    
    

    Your SAN devices should be visible by now (2 for each LUN), in my case /dev/sda, sdb, sdc, sdd. Note: this will dynamically change when you add additional LUNs to the machine or get any other disk managed by dm.

    Find out your WWID numbers; they are needed for the multipath configuration:

    host1:~ # for disk in sda sdb sdc sdd; do scsi_id -g -s /block/$disk; done
    3600508b4001046490000700000360000
    3600508b40010464900007000000c0000
    3600508b4001046490000700000360000
    3600508b40010464900007000000c0000
    
    

    Multipathing is somewhat complex and not easy to understand at first, but we need to pay attention to it, otherwise it will give us a hard time. It is well documented in the SLES storage guide:

    http://www.novell.com/documentation/sles10/stor_evms/data/bookinfo.html

    The problem I found is that dm tries to take over and manage pretty much every block device (after the SP2 update), therefore we need to blacklist everything except the SAN LUNs, including CD/DVD drives, the local cciss devices (HP Smart Array), and so on.

    Configure the multipath service (/etc/multipath.conf) according to your WWID numbers. Remember that each WWID appears twice in the list above because we have two paths to each LUN:

    host1:~ # vi /etc/multipath.conf
    #
    ## Global settings: should be sufficient, change at your own risk
    #
    defaults { 
    	multipath_tool 		"/sbin/multipath -v0" 
    	udev_dir 		/dev 
    	polling_interval 	5 
    	default_selector 	"round-robin 0" 
    	default_path_grouping_policy 	multibus 
    	default_getuid_callout 	"/sbin/scsi_id -g -u -s /block/%n" 
    	default_prio_callout 	/bin/true 
    	default_features 	"0" 
    	rr_min_io 		100 
    	failback 		immediate
    }
    #
    ## Local devices MUST NOT be managed by multipathd
    #
    blacklist { 
    	devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*" 
    	devnode "^hd[a-z][0-9]*" 
    	devnode "^cciss!c[0-9]d[0-9]*" 
    	devnode "^sd[e-z][0-9]*" 
    	devnode "^xvdp*"
    } 
    
    multipaths { 
    	# 
    	## Mpath1: Storage area for user's home within NFS domU 
    	# 
    	multipath { 
    		wwid 			3600508b40010464900007000000c0000 
    		alias 			mpath1 
    		path_grouping_policy 	multibus 
    		path_checker 		readsector0 
    		path_selector 		"round-robin 0" 
    	} 
    	# 
    	## Mpath2: Storage area for clustered domUs via EVMS 
    	# 
    	multipath { 
    		wwid 			3600508b4001046490000700000360000 
    		alias 			mpath2 
    		path_grouping_policy 	multibus 
    		path_checker 		readsector0 
    		path_selector 		"round-robin 0" 
    	}
    }
    #
    ## Device settings: should be sufficient, change at your own risk
    #
    devices { 
    	device { 
    		vendor 			"HP" 
    		product 		"HSV200" 
    		path_grouping_policy 	group_by_prio 
    		getuid_callout 		"/sbin/scsi_id -g -u -s /block/%n" 
    		path_checker 		tur 
    		path_selector 		"round-robin 0" 
    		prio_callout 		"/sbin/mpath_prio_alua %n" 
    		failback 		immediate 
    		rr_weight 		uniform 
    		rr_min_io 		100 
    		no_path_retry 		60 
    	}
    }
    

    Enable services upon reboot:

    host1:~ # insserv boot.device-mapper boot.multipath multipathd

    Remember that every time you change the multipath.conf file you must rebuild your initrd and reboot the server:

    host1:~ # mkinitrd -f mpath && reboot

    After reboot check your multipaths:

    host1:~ # multipath -ll
    mpath2 (3600508b4001046490000700000360000) dm-1 HP,HSV200
    [size=100G][features=1 queue_if_no_path][hwhandler=0]
    \_ round-robin 0 [prio=2][active] 
    \_ 0:0:1:1 sdc 8:32 [active][ready] 
    \_ 0:0:0:1 sda 8:0 [active][ready]
    mpath1 (3600508b40010464900007000000c0000) dm-0 HP,HSV200
    [size=2.0T][features=1 queue_if_no_path][hwhandler=0]
    \_ round-robin 0 [prio=2][active] 
    \_ 0:0:1:2 sdd 8:48 [active][ready] 
    \_ 0:0:0:2 sdb 8:16 [active][ready]
    
    

    Check blacklists with verbose output:

    host1:~ # multipath -ll -v3

    For further information on HP technology please refer to the original HP guide:
    http://h20000.www2.hp.com/bc/docs/support/SupportM...
    Do exactly the same on the other node as well; I copied multipath.conf over to the other host and then set up the services as shown above.

    Multipath for root device:

    If you want to boot your XEN host off the SAN, you would have an extra LUN for the host's OS. The method is to install the system onto one of the disks (paths) of your OS LUN; once you have a running system you can build multipath on top of that. The only difference is that you do not create an alias for your host's OS disk LUN.
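
    For illustration only (the WWID below is a placeholder), the multipaths entry for such an OS LUN would simply omit the alias line, so the map keeps its WWID-based name:

    multipath { 
    	wwid 			<WWID-of-your-OS-LUN> 
    	path_grouping_policy 	multibus 
    	path_checker 		readsector0 
    	path_selector 		"round-robin 0" 
    }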

    Further information:

    http://www.novell.com/documentation/sles10/stor_ev...
    http://www.novell.com/support/php/search.do?cmd=di...

  3. Heartbeat

    Note: HA has been going through a transformation lately in order to support both the OpenAIS and Heartbeat cluster stacks equally. The resource manager (crm) was extracted out of the HA package and became an individual project named Pacemaker.

    What you see in SLES10 at the time of writing (version 2.1.4) is a special Novell build for SLES10 customers only, bundled with new features and bug fixes added since the project split. Ultimately this will change in SLES11: HA will be replaced with OpenAIS and will follow the packaging and naming conventions introduced by the recent changes in the project.

    More information:

    http://www.novell.com/linux/volumemanagement/strategy.html
    http://www.clusterlabs.org

    Heartbeat (referred to as HA) is a very powerful, versatile, open source clustering solution for Linux. SuSE and IBM are big contributors (of code) to this project, for which I am personally thankful.

    HA will be our central database and control center; it will manage our resources (domUs) and their dependencies according to a set policy. Some services can be configured via LSB scripts for certain runlevels, but HA will basically take this over for most services necessary for domU management. By the way, EVMS, which we will configure in a minute, does not maintain cluster memberships; we need HA to maintain memberships and activate the EVMS volumes on our nodes at startup.

    Install the heartbeat package first (there are plenty of ways of doing this):

    host1:~ # yast2 sw_single &

    Change the filter to Patterns, then select the entire High Availability group, since we will need other components of it later on.
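
    If you prefer the command line, installing the packages directly should work as well; the package names below are the usual SLES10 heartbeat 2.x ones, so treat this as a sketch and adjust it to your repositories:

    host1:~ # yast2 -i heartbeat heartbeat-pils heartbeat-stonith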

    HA at present still supports the v1 configuration (haresources file), but we use the new v2 style (crm with XML files), which is known to be better and more powerful.

    In this section I only show the initial setup of HA; the resources and the complete cluster setup will be discussed later on. Since evmsd is started by HA, I have to present this part before I can discuss the EVMS volumes and disk configuration.

    Configuration:

    host1:~ # vi /etc/ha.d/ha.cf
    udpport 694 	                # Default
    use_logd on 		        # Need powerful logging
    keepalive 2 	                # Interval of the members checking each other
    initdead 50 		                # Patience for services starting up after boot
    deadtime 50 		        # Time to clarify a member host dead
    deadping 50 		        # Time to clarify ping host dead
    autojoin none 		        # We add members manually only
    crm true 		                # We want v2 configuration
    coredumps true 		        # Useful
    auto_failback off                # Prefer resources staying wherever they are
    ucast eth0 192.168.1.2	# Other member host to talk to (NIC1)
    ucast eth1 10.0.1.2	        # Other member host to talk to (NIC2)
    node host1 host2	        # Cluster member host names (hostnames - not FQDN)
    ping 192.168.1.254 	        # Ping node to talk to
    apiauth evms uid=hacluster,root
    respawn root /sbin/evmsd

    The last two lines (apiauth and respawn) tell heartbeat to bring up evmsd at startup; evmsd is a remote extension of the EVMS engine without the brain. I configured unicast simply because I prefer it over broadcast and multicast. Last but not least, I configured one ping node because I only care about the one connection back to my private LAN; the other NIC is reserved for HA communication only.

    Configure authentication for the member nodes' communication:

    host1:~ # sha1sum
    yoursecretpassword
    7769bf61f294d7bb91dd3583198d2e16acd8cd76 -
    host1:~ # vi /etc/ha.d/authkeys
    auth 1
    1 sha1 7769bf61f294d7bb91dd3583198d2e16acd8cd76
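
    Heartbeat refuses to start if the authkeys file is readable by anyone other than root, so tighten its permissions on both nodes:

    host1:~ # chmod 600 /etc/ha.d/authkeys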
    
    

    Configure logging:

    host1:~ # vi /etc/ha.d/ha_logd.cf
    logfacility local7
    debugfile /var/log/ha-debug 
    logfile /var/log/ha-log 
    
    host1:~ # ln -s /etc/ha.d/ha_logd.cf /etc/logd.cf
    
    

    Configure log rotation for your HA logs:

    host1:~ # vi /etc/logrotate.d/heartbeat 
    /var/log/ha-debug { 
    	weekly 
    	missingok 
    	compress 
    	rotate 4 
    	copytruncate 
    } 
    /var/log/ha-log { 
    	weekly 
    	missingok 
    	compress 
    	rotate 4 
    	copytruncate 
    } 
    
    

    The reason we use HA's internal logging daemon is that on busy clusters there is otherwise a good chance of losing log messages.

    Logs are a vital part of a well-functioning system, so I have a central log server to which I send all cluster member node logs in addition to the local files.

    You don't need this part if you are not planning to use central logging.
    host1:~ # vi /etc/syslog-ng/syslog-ng.conf.in
    options { long_hostnames(off); sync(0); perm(0640); stats(3600); 
    check_hostname(no); dns_cache(yes); dns_cache_size(100); 
    log_fifo_size(4096); keep_hostname(yes); chain_hostnames(no); };
    -snip-
    ## ------------------------------------- ##
    ## HA logs sent to remote server as well ##
    ## ------------------------------------- ##
    filter f_ha { facility(local7) and not level(debug); };
    destination ha_tcp { tcp("yourserver" port(514)); };
    log { source(src); filter(f_ha); destination(ha_tcp); };
    
    

    I modified the global options as shown above for performance reasons and added the second part to the bottom of the configuration file. I used my log server's DNS name and excluded debug messages to my own taste; you may want to adjust those. When done, rebuild syslog-ng's configuration file:

    host1:~ # SuSEconfig --module syslog-ng

    Do not forget to prepare your remote syslog server to accept these log entries! The following page is a bit outdated and does not include the performance settings, but it is related reading:

    http://wiki.linux-ha.org/SyslogNgConfiguration

    Start and enable HA at boot:

    host1:~ # rcheartbeat start && insserv heartbeat

    Don't forget the other node (everything is the same except the IP addresses):

    host2:~ # vi /etc/ha.d/ha.cf
    -snip-
    ucast eth0 192.168.1.1	# Other member host to talk to (NIC1)
    ucast eth1 10.0.1.1	        # Other member host to talk to (NIC2)
    -snip-
    
    

    Configure logging, authentication, everything just like for the first host then:

    host2:~ # rcheartbeat start && insserv heartbeat

    Ensure they see each other before proceeding; they may need a few minutes to get in sync:

    host1:~ # crmadmin -N
    normal node: host1 (91275fec-a9f4-442d-9875-27e9b7233f33)
    normal node: host2 (d94a39a4-dcb5-4305-99f0-a52c8236380a)
    
    host2:~ # crmadmin -N
    normal node: host1 (91275fec-a9f4-442d-9875-27e9b7233f33)
    normal node: host2 (d94a39a4-dcb5-4305-99f0-a52c8236380a)
    
    
  4. EVMS

    EVMS is a great, open source, enterprise-class volume manager, again with significant support from IBM and SuSE. It includes a feature called CSM (Cluster Segment Manager), which we use to manage shared LUNs and to distribute the block devices (partitions) and the complete storage arrangement identically between the dom0 nodes. On top of CSM we use LVM2 volume management to create, resize and extend logical volumes.

    Since you could have a different LVM2 or EVMS arrangement within your domUs, I include in the evms.conf file only the device I want to manage on the XEN host. This is the 100GB LUN for the virtual machines only. I want to hide the rest from the host system; I do not want EVMS to discover or interfere with disks I am not planning to use or manage from the XEN host.

    The multipath -ll command (shown earlier) tells you the device-mapper (dm) device number you need for this.

    Note: this numbering changes as you add or remove disks, LUNs, etc. When you do so, it is advisable to reboot the host to see what the new dm layout looks like, then update the EVMS configuration accordingly.
    host1:~ # vi /etc/evms.conf
    engine { 
    	mode 	= readwrite 
    	debug_level 	= default 
    	log_file 	= /var/log/evms-engine.log 
    	metadata_backup_dir 	= /var/evms/metadata_backups 
      auto_metadata_backup	= yes 
      remote_request_timeout 	= 12 
    } 
    -snip-
    sysfs_devices { 
    	include = [ dm-1 ] 
    	exclude = [ iseries!vcd* ]
    }
    -snip- 
    
    

    It is a very good idea to set up automatic metadata backups; just uncomment the corresponding line (auto_metadata_backup) as shown above.

    Remember that when you save the configuration you update the metadata on disk, and the utilities read the metadata from the disk. The evms.conf file really only controls the global behavior of the engine.

    LVM2 versus EVMS:
    Remember that I have another 2TB LUN which I want to manage with LVM2 solely within the NFS domU, hence I also disabled LVM2 on the XEN host to avoid interfering with the EVMS-managed disks and with the domU's LVM2 configuration later on.

    The reason is that when you map a block device, file image, CD-ROM, etc. to a XEN virtual machine, it remains visible on the XEN host just as it is within the domU. You can do pretty much the same things from both sides (mount it, partition it, and so on), hence I always make sure that I only manage disks on the XEN host which actually need to be managed on the host. In a complex environment this could otherwise create confusion.

    Change the filter to exclude everything from LVM2 discovery:

    host1:~ # vi /etc/lvm/lvm.conf
    devices {
    	dir = "/dev"
    	scan = [ "/dev" ]
    	filter = [ "r|.*|" ]
    	cache = "/etc/lvm/.cache"
    	write_cache_state = 1
    	sysfs_scan = 1
    	md_component_detection = 1
    }
    -snip-
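
    A quick way to confirm that LVM2 no longer sees any devices on the host (the exact wording of the output may differ between versions):

    host1:~ # pvscan
    No matching physical volumes found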
    
    

    File images versus block devices:

    So why all this fuss? What is wrong with the original HASI design, and why don't we use file-image-based virtual machines?

    Well, it is a long story that started more than 2 years ago, when I first began playing with this technology. At that time the loop-mounted file images were slow; we simply could not afford that. Today the blktap driver shipped with SLES10 SP2 provides nearly native disk I/O performance on file images out of the box.

    http://wiki.xensource.com/xenwiki/blktap

    You could then say that this EVMS complexity is unnecessary and makes the domUs less portable, which I would disagree with. I think file images add one extra layer of complexity to our storage and involve OCFS2. I often had issues with OCFS2, especially when new versions came along, hence I only use it for my configstore.

    The other reason is that block devices are much like file images, just virtual or special files. I can create a copy of them any time with dd, redirecting the output to a file to create an identical copy of my block device (partition). Last but not least, this setup has been working for nearly 2 years in production without a single glitch.

    Now we can create the volumes. Note: I present my configuration here just for reference; if you need a step-by-step guide please read this document:

    http://wiki.novell.com/images/0/01/CHASF_preview_Nov172006.pdf

    I strongly recommend visiting the project's home page as well:

    http://evms.sourceforge.net
    http://evms.sourceforge.net/clustering

    The EVMS layout (the original screenshots are not reproduced here) covers the disks, the segments, the CSM container with LVM2 on top, the regions and finally the volumes.

    After you have created your EVMS volumes (with the evmsn or evmsgui utilities), save the configuration. To activate the changes (create the devices in the file system) on all XEN hosts immediately, we need to run evms_activate on every other node, simply because the default behavior of EVMS is to apply changes on the local node only.

    I have 2 nodes at this stage and I want to activate only the other node:

    host1:~ # evms_activate -n host2

    This is where the evmsd process we started with HA becomes important. It is our engine handler on the remote node; without it we would not be able to create the devices in the remote host's file system.

    What if I had 16 nodes? That would be a bit overwhelming, so here is a quick way to do it on all nodes (there are many other ways):

    host1:~ # for n in `grep node /etc/ha.d/ha.cf | cut -d ' ' -f2-`; do evms_activate -n $n; done
    
    

    If you are unfamiliar with the power of the UNIX shell: yes, that is a funny-looking one-line command above.

    Reference: http://wiki.xensource.com/xenwiki/EVMS-HAwSAN-SLES10

    The future of EVMS in SLES:

    It will disappear in SLES11. Novell will support it for the lifetime of SLES10, and probably the same for OES2, since at this stage NSS volumes can only be created with EVMS. I did ask Novell about the transition from EVMS to cLVM (when you upgrade from SLES10 to SLES11), and as usual there will be tools, procedures and support for this.

    It is your call whether you want to use today a technology that will be discontinued or unsupported in future releases, but as mentioned on the page below, customers should not be put off by this decision and are still encouraged to use it.

    More information:
    http://www.novell.com/linux/volumemanagement/strategy.html

  5. OCFS2 cluster file system for XEN domU configurations

    We need a fairly small volume for the XEN virtual machine configurations. The best choice for this is OCFS2, Oracle's cluster file system. We will mount it under the default /etc/xen/vm directory on all member nodes. Both XEN nodes will see the same files, and both will be able to change them or create new ones on the same file system.

    I will not provide a step-by-step solution for this; it has been discussed many times and there is already a lot written about it.

    http://wiki.novell.com/images/3/37/Exploring_HASF.pdf (page 72.)
    http://www.novell.com/coolsolutions/feature/18287.html (section 6.)
    http://wiki.novell.com/index.php/SUSE_Linux_Enterprise_Server#High_Availability_Storage_Infrastructure

    Outline:

    • create a small EVMS volume (32MB is plenty)
      (actual device on my cluster is /dev/evms/san2/cfgpool)
    • create cluster configuration (vi /etc/ocfs2/cluster.conf or ocfs2console GUI)
    • ensure the configuration is identical on all other XEN hosts at the same location
      (GUI Propagate Configuration option copies the created config to all nodes via ssh)
    • enable user space OCFS2 cluster membership settings on all nodes
    • create the OCFS2 file system (a hedged example follows below)
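
    For the last step in the outline, a minimal sketch (the label is my own choice; -N 2 matches the two node slots of this cluster):

    host1:~ # mkfs.ocfs2 -N 2 -L cfgpool /dev/evms/san2/cfgpool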

    That is all you need to do for now. The last step will be to integrate OCFS2 into the Heartbeat v2 cluster (mounting the device under /etc/xen/vm), which I will discuss in the next chapter.

    Configuration for reference:

    host1:~ # vi /etc/ocfs2/cluster.conf
    node: 
    	ip_port = 7777 
    	ip_address = 192.168.1.1 
    	number = 0 
    	name = host1 
    	cluster = ocfs2
    node: 
    	ip_port = 7777 
    	ip_address = 192.168.1.2 
    	number = 1 
    	name = host2 
    	cluster = ocfs2
    cluster: 
    	node_count = 2 
    	name = ocfs2

    The configuration is identical on the other XEN host.

    In SLES10 SP1 there was a timing issue with Heartbeat-managed OCFS2 on XEN hosts. In SLES10 SP2 the defaults changed to fix this, but it is worth mentioning the solution in this guide (you probably do not need to do this).

    Sometimes the networking on a XEN host takes more time to come up (remember that xend modifies the network configuration, creates virtual bridges, etc.); maybe your switches are busy or STP is enabled, or perhaps something else is causing a slight delay. OCFS2 is very sensitive to this. We need to ensure that OCFS2 gives the members enough time to handshake if there is a delay on the network:

    host1:~ # vi /etc/sysconfig/o2cb
    -snip-
    O2CB_HEARTBEAT_THRESHOLD=50
    O2CB_HEARTBEAT_MODE="user"
    O2CB_IDLE_TIMEOUT_MS=30000
    O2CB_RECONNECT_DELAY_MS=2000
    O2CB_KEEPALIVE_DELAY_MS=5000

    Ensure that OCFS2 is restarted if the settings changed and that the services are enabled:

    host1:~ # SuSEconfig
    host1:~ # rco2cb restart
    host1:~ # insserv o2cb
    host1:~ # insserv ocfs2
    host1:~ # chkconfig -l o2cb
    o2cb 		0:off 1:off 2:on 3:on 4:off 5:on 6:off
    host1:~ # chkconfig -l ocfs2
    ocfs2 		0:off 1:off 2:on 3:on 4:off 5:on 6:off

    This needs to be applied on all XEN hosts. Related Novell TID:
    http://www.novell.com/support/php/search.do?cmd=displayKC&docType=kc&externalId=7001469&sliceId=2&docTypeID=DT_TID_1_1&dialogID=13902114&stateId=0%200%2013900539

  6. Heartbeat Cluster Configuration

    HA is blank and empty at this stage. We have configured it to bring up the evmsd process when it starts and told it which other cluster members it needs to talk to; the nodes should already be in sync. This chapter explains the cluster's operation and policy: which resources we want HA to manage and how they should react to certain events. I shall try to explain everything as clearly as possible, but there will be details not covered in this guide.

    One of the biggest issues with HA is the documentation. There is some, but it is usually outdated and hard to find. I understand that the project is trying hard to improve it, but it is still far from good; the product is complex and changing rapidly. The best documentation I found was Novell's, and I encourage everybody to read it to gain a decent understanding of the product:
    http://www.novell.com/it-it/documentation/sles10/heartbeat/data/b3ih73g.html

    The best source of information is still the mailing list, though; you will want to join it, or at least browse the archives, if you are serious about HA: http://wiki.linux-ha.org/ContactUs

    As mentioned previously, we will use the new v2 (crm) type of configuration with XML files. It is not as nice to read as a plain text file, but it is easy to get used to, and any decent text editor nowadays can recognize XML and help you with syntax highlighting.

    Outline:

    • create and save XML entry for each resource or policy
    • load them into the cluster one by one
    • monitor the cluster for reaction
    • backup the final (complete) configuration

    Note: HA does have a GUI (hb_gui), but as of today it is really just for basic operations and is still not useful for complex configurations. I only use it for monitoring, perhaps to start/stop a resource or put a node into standby. Therefore the configuration method presented in this guide is mainly CLI (command line) based.

    The cluster configuration is replicated amongst all member nodes, therefore you do not need to repeat this on the other nodes; it has to be done once, from any node, although my preference is always the DC (designated controller) node. You can find out which node that is from the monitoring commands (hb_gui, crm_mon) or alternatively:

    host2:~ # crmadmin -D
    Designated Controller is: host1 
    
    

    Global settings:

    HA has very good default configuration/behavior, therefore we have very little to change here:

    host1:~ # vi cibbootstrap.xml
    <cluster_property_set id="cibbootstrap"> 
      <attributes> 
        <nvpair id="cibbootstrap-01" name="cluster-delay" value="60"/> 
        <nvpair id="cibbootstrap-02" name="default-resource-stickiness" value="INFINITY"/> 
        <nvpair id="cibbootstrap-03" name="default-resource-failure-stickiness" value="-500"/> 
        <nvpair id="cibbootstrap-04" name="stonith-enabled" value="true"/> 
        <nvpair id="cibbootstrap-05" name="stonith-action" value="reboot"/> 
        <nvpair id="cibbootstrap-06" name="symmetric-cluster" value="true"/> 
        <nvpair id="cibbootstrap-07" name="no-quorum-policy" value="stop"/> 
        <nvpair id="cibbootstrap-08" name="stop-orphan-resources" value="true"/> 
        <nvpair id="cibbootstrap-09" name="stop-orphan-actions" value="true"/> 
        <nvpair id="cibbootstrap-10" name="is-managed-default" value="true"/> 
      </attributes>
    </cluster_property_set>
    
    

    Usually id="something" is a name given by you; it can be anything, although you should keep it meaningful and easy to read (and keep the indents), without symbols, etc.

    What is worth mentioning here is that we enable STONITH with a default action of reboot. It is our power switch, which ensures that in case of node failure (a reboot, a freeze, a network issue, or any occasion when heartbeat stops receiving signals from the other node) the misbehaving node is rebooted. It is therefore extremely important that you have multiple communication paths (multiple NICs, for example) between your nodes to avoid serious problems.

    For more information: http://wiki.linux-ha.org/SplitBrain

    The resource-stickiness setting ensures that if a failing node comes back online after a reboot, the resource (virtual machine) which was moved will NOT move back to its original location (where it was started). It is a safety feature and saves you from resources bouncing between failing nodes.

    Generally a resource stays where it started unless its node is rebooted or put into standby, or the resource is forced to move by the administrator. HA will always try to keep balance and harmony in your cluster. This means, for example, that if you have 10 domUs to load into the cluster (or to start up in a brand new one), HA will balance them amongst all nodes (2-node cluster = 5 each), unless you configure the policy with preferred locations for certain domUs. That sort of thing is outside the scope of this guide because it is not very useful for XEN clustering, which this guide is supposed to be about; at least it was not for me.

    For more information about preferred locations, please visit this page: http://wiki.linux-ha.org/ciblint/crm_config

    Load the created XML file into the cluster:

    host1:~ # cibadmin -C -o crm_config -x cibbootstrap.xml 

    STONITH resources:

    So far we have just enabled this globally in the cluster; we do not have power switches yet. These are like daemons running on each member node which execute the reboot command, depending on the STONITH resource type and the global action we set. Remember that we are talking about rebooting XEN cluster member nodes, not resources (virtual machines).

    HA ships with a test STONITH agent which executes the reboot via ssh. It is not for production use, but I configured it because I would rather have more than one. It can do the job as long as the failing node is still responding (not frozen).

    Failing, as a term, can mean anything in HA. You may have a perfectly working XEN cluster member node, but if HA cannot start a new domU (resource) on it (because, say, you made a typo in the XEN configuration file), that is severe from HA's point of view, because by default its job is to keep all your resources up and running. As a result it would migrate (or stop and start) all existing resources from the node where the startup failed to another node and would reboot the misbehaving node. Of course it would then try to start that resource on the other node, and it would fail there too. Once the rebooted node came back online, HA would migrate all resources away from the other node and try the new resource (with the typo in it) again; of course it would fail once more, so from this point on it waits for admin interaction, keeps running whatever is healthy, and marks the misbehaving resource as failed. This can take a significant amount of time if you have more nodes, as it tries the same thing on all nodes, one by one. It may not be an issue in a test environment, but it can be severe in production.

    Outline:

    • always triple check all configuration changes, entries, etc. before activation (a quick check is shown after this list)
    • always check dependent configuration files (XEN domU), etc. you refer to
    • perhaps set the resource to unmanaged so HA will not bother if it fails to start
    • put the resource back to managed mode if everything is working as expected
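
    A cheap safeguard for the first point is to let the cluster validate its configuration before and after loading changes; crm_verify ships with heartbeat (-L checks the live CIB, -V adds verbosity):

    host1:~ # crm_verify -L -V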

    ssh test STONITH agent:

    We need passwordless login (via ssh keys) between all nodes for the root user. Setting this up is outside the scope of this document (the original HASI guide discusses it); before you configure the agent, ensure that you can log into every node as root without a password from every node.

    host1:~ # vi stonithcloneset.xml
    <clone id="STONITH" globally_unique="false"> 
    	<instance_attributes id="STONITH-ia"> 
    		<attributes> 
    			<nvpair id="STONITH-ia-01" name="clone_max" value="2"/> 
    			<nvpair id="STONITH-ia-02" name="clone_node_max" value="1"/> 
    		</attributes> 
    	</instance_attributes> 
    	<primitive id="STONITH-child" class="stonith" type="external/ssh" provider="heartbeat"> 
    		<operations> 
    			<op id="STONITH-child-op-01" name="monitor" interval="5s" timeout="20s" prereq="nothing"/> 
    			<op id="STONITH-child-op-02" name="start" timeout="20s" prereq="nothing"/> 
    		</operations> 
    		<instance_attributes id="STONITH-child-ia"> 
    			<attributes> 
    				<nvpair id="STONITH-child-ia-01" name="hostlist" value="host1,host2"/> 
    			</attributes> 
    		</instance_attributes> 
    	</primitive>
    </clone> 
    
    

    It is a clone resource, meaning that we will have a running copy on each member node and their characteristics will be identical. As shown, I configured the maximum number of clones (the number of nodes I have) and the maximum copies per node (usually one on each node). We set up monitoring as well: should something happen to my ssh daemon so that this resource cannot log into one of my nodes, I will be notified about it.

    Load the resource into the cluster:

    host1:~ # cibadmin -C -o resources -x stonithcloneset.xml

    More information: http://wiki.linux-ha.org/v2/Concepts/Clones

    The last thing is to enable the at daemon. The way the ssh STONITH agent works is that in case of a node failure, an intact node logs into the failing one (assuming that is possible) and schedules a reboot via the at daemon:

    host1:~ # insserv atd && rcatd start

    riloe STONITH agent

    The next thing is to configure a much more production-ready STONITH agent for my servers. The best approach is to use something independent of the operating system, like iLO on HP hardware. At the time of writing, HA comes with agents for most hardware vendors (HP, IBM, etc.).

    So which STONITH agent will actually act and execute the reboot when disaster strikes, if I have more than one? The answer is any of them; HA will pick one randomly.

    The common-sense design is to configure STONITH agents as clone resources. For clusters with many nodes this actually causes minor issues, because:

    The iLO resource is configured on all nodes (even the one where that iLO is physically installed). It makes sense to log into the failing node's iLO from another node and execute the reboot, right? (Suicide is not actually allowed by default.) So when the monitoring operation is due (every 30 seconds, or whatever you set), the agents on all nodes will try logging into the same iLO device.

    Occasionally a race condition develops when 2 nodes try to log into the same iLO device at once, which they cannot, causing weird behavior, errors, etc.

    According to a recent discussion on the linux-ha mailing list, this should be fine by now and fixed regardless of which method you use, but I just could not see any point in having a copy on the node where the iLO device is physically installed, even if suicide is not allowed (which is safe).

    I think it is nonsense to run an iLO STONITH agent on all nodes, regardless of how many nodes you have, because we only need one on a healthy node: the DC (it is always the stonithd on the DC that receives the fencing request) will instruct the node where the agent is running to execute the reboot on the corresponding iLO device (the one installed in the failing node). Hence I took a different approach, due to the nature of the iLO device, and:

    • created one primitive iLO STONITH resource
    • configured the cluster to run this anywhere but on the node where it is installed

    On a 2-node cluster it will obviously be the other node, but on a many-node cluster it could run anywhere, depending on node availability, cluster load, etc. This solution has been working seamlessly for me for quite some time.

    Create the policy first for iLO on host1:

    host1:~ # vi stonithhost1_constraint.xml
    <rsc_location id="STONITH-iLO-host1:anywhere" rsc="STONITH-iLO-host1"> 
    	<rule id="STONITH-iLO-host1:anywhere-r1" score="-INFINITY"> 
    		<expression id="STONITH-iLO-host1:anywhere-r1-e1" attribute="#uname" operation="eq" value="host1"/> 
    	</rule>
    </rsc_location>
    
    

    As usual we create a unique id for the rule, then tell the cluster that the STONITH-iLO-host1 resource (which does not exist yet) has a score of -INFINITY. In HA, preference is always expressed by scores, and -INFINITY is the lowest possible one, meaning the resource can never run where the rule matches. For the rule we then create an expression (also with a unique id) after the rule line, telling the cluster where the rule applies: the node where that iLO interface is installed.

    Create the policy now for iLO on host2:

    host1:~ # vi stonithhost2_constraint.xml
    <rsc_location id="STONITH-iLO-host2:anywhere" rsc="STONITH-iLO-host2"> 
    	<rule id="STONITH-iLO-host2:anywhere-r1" score="-INFINITY"> 
    		<expression id="STONITH-iLO-host2:anywhere-r1-e1" attribute="#uname" operation="eq" value="host2"/> 
    	</rule>
    </rsc_location>
    
    

    Load both policies into the cluster:

    host1:~ # cibadmin -C -o constraints -x stonithhost1_constraint.xml
    host1:~ # cibadmin -C -o constraints -x stonithhost2_constraint.xml

    At this stage we do not yet have the resources in the cluster that these policies refer to, so warning messages will appear in the logs; ignore them, that is normal.

    Create the primitive resource for iLO on host1:

    host1:~ # vi stonithhost1.xml
    <primitive id="STONITH-iLO-host1" class="stonith" type="external/riloe" provider="heartbeat"> 
    	<operations> 
    		<op id="STONITH-iLO-host1-op-01" name="monitor" interval="30s" timeout="20s" prereq="nothing"/> 
    		<op id="STONITH-iLO-host1-op-02" name="start" timeout="60s" prereq="nothing"/> 
    	</operations> 
    	<instance_attributes id="STONITH-iLO-host1-ia"> 
    		<attributes> 
    			<nvpair id="STONITH-iLO-host1-ia-01" name="hostlist" value="host1"/> 
    			<nvpair id="STONITH-iLO-host1-ia-02" name="ilo_hostname" value="host1-ilo"/> 
    			<nvpair id="STONITH-iLO-host1-ia-03" name="ilo_user" value="Administrator"/> 
    			<nvpair id="STONITH-iLO-host1-ia-04" name="ilo_password" value="CLEARTEXTPASSWORD"/> 
    			<nvpair id="STONITH-iLO-host1-ia-05" name="ilo_can_reset" value="true"/> 
    			<nvpair id="STONITH-iLO-host1-ia-06" name="ilo_protocol" value="2.0"/> 
    			<nvpair id="STONITH-iLO-host1-ia-07" name="ilo_powerdown_method" value="power"/> 
    		</attributes> 
    	</instance_attributes>
    </primitive> 
    
    

    Alongside the normal operations we configure instance_attributes as well, which describe the details of our iLO device. This is for iLO v2; if you happen to need it for the older iLO v1, the difference would be:

    -snip-
    <nvpair id="STONITH-iLO-host1-ia-05" name="ilo_can_reset" value="false"/> 
    <nvpair id="STONITH-iLO-host1-ia-06" name="ilo_protocol" value="1.2/>
    -snip-
    
    

    iLO v1 cannot cold reset a node; since we used ilo_powerdown_method with the value "power",
    iLO v1 will stop and then start the server (in that order), which is pretty much the same thing.

    Create the primitive resource for iLO on host2:

    host1:~ # vi stonithhost2.xml
    <primitive id="STONITH-iLO-host2" class="stonith" type="external/riloe" provider="heartbeat"> 
    	<operations> 
    		<op id="STONITH-iLO-host2-op-01" name="monitor" interval="30s" timeout="20s" prereq="nothing"/> 
    		<op id="STONITH-iLO-host2-op-02" name="start" timeout="60s" prereq="nothing"/> 
    	</operations> 
    	<instance_attributes id="STONITH-iLO-host2-ia"> 
    		<attributes> 
    			<nvpair id="STONITH-iLO-host2-ia-01" name="hostlist" value="host2"/> 
    			<nvpair id="STONITH-iLO-host2-ia-02" name="ilo_hostname" value="host2-ilo"/> 
    			<nvpair id="STONITH-iLO-host2-ia-03" name="ilo_user" value="Administrator"/> 
    			<nvpair id="STONITH-iLO-host2-ia-04" name="ilo_password" value="CLEARTEXTPASSWORD"/> 
    			<nvpair id="STONITH-iLO-host2-ia-05" name="ilo_can_reset" value="true"/> 
    			<nvpair id="STONITH-iLO-host2-ia-06" name="ilo_protocol" value="2.0"/> 
    			<nvpair id="STONITH-iLO-host2-ia-07" name="ilo_powerdown_method" value="power"/> 
    		</attributes> 
    	</instance_attributes>
    </primitive>
    
    

    Load both resources into the cluster:

    host1:~ # cibadmin -C -o resources -x stonithhost1.xml
    host1:~ # cibadmin -C -o resources -x stonithhost2.xml

    This can take a few moments to come up green; be patient for a while and monitor the cluster.

    Hint:

    If it does not come up for some reason, or stops itself after a while (this rarely happens), select "stop", wait a few seconds, then "clean the resource on all nodes" in the GUI. A few seconds later select the "default" option in the GUI, which should start it up fine after all. I have slow hubs connecting my iLO network; perhaps that is what occasionally causes this minor issue.
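
    The same cleanup can also be done from the command line if you prefer (resource id and node name as used above; check crm_resource --help for the exact options of your version):

    host1:~ # crm_resource -C -r STONITH-iLO-host1 -H host1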

    Reference: http://wiki.linux-ha.org/CIB/Idioms/RiloeStonith

    Ping daemon resource:

    At this point we are still just configuring our cluster for general behavior, setting up rescue and safety tools and resources. The last one on the list is the ping daemon. We already specified the gateway IP address of my private LAN (in ha.cf), which is reachable via the eth0 interface. In my setup the eth1 interface is purely for HA communication, so if I cannot ping my gateway IP via eth0, something is wrong. Since all my domU resources share the eth0 interface, it is crucial to monitor it and make sure it is working, otherwise all my resources (virtual machines) could become unreachable.

    Outline:

    • create a clone resource for pingd
    • configure the cluster to score ping (network) connectivity
    • configure the cluster to run resources only where ping connectivity is defined

    Each resource (domU) in the cluster will have to be individually configured for pingd connectivity, hence I will leave that for later; right now I just configure the clone resource:

    host1:~ # vi pingdcloneset.xml
    <clone id="pingd" globally_unique="false"> 
    	<instance_attributes id="pingd-ia"> 
    		<attributes> 
    			<nvpair id="pingd-ia-01" name="clone_max" value="2"/> 
    			<nvpair id="pingd-ia-02" name="clone_node_max" value="1"/> 
    		</attributes> 
    	</instance_attributes> 
    	<primitive id="pingd-child" provider="heartbeat" class="ocf" type="pingd"> 
    		<operations> 
    			<op id="pingd-child-op-01" name="monitor" interval="20s" timeout="40s" prereq="nothing"/> 
    			<op id="pingd-child-op-02" name="start" prereq="nothing"/> 
    		</operations> 
    		<instance_attributes id="pingd-child-ia"> 
    			<attributes> 
    				<nvpair id="pingd-child-ia-01" name="dampen" value="5s"/> 
    				<nvpair id="pingd-child-ia-02" name="multiplier" value="100"/> 
    				<nvpair id="pingd-child-ia-03" name="user" value="root"/> 
    				<nvpair id="pingd-child-ia-04" name="pidfile" value="/var/run/pingd.pid"/> 
    			</attributes> 
    		</instance_attributes> 
    	</primitive>
    </clone>
    
    

    Load it in:

    host1:~ # cibadmin -C -o resources -x pingdcloneset.xml

    More information: http://wiki.linux-ha.org/v2/faq/pingd

    Now we should see something like this:

    host1:~ # crm_mon -1
    ============
    Last updated: Thu Jan 8 14:59:21 2009
    Current DC: host1 (91275fec-a9f4-442d-9875-27e9b7233f33)
    2 Nodes configured.
    4 Resources configured.
    ============
    Node: host1 (91275fec-a9f4-442d-9875-27e9b7233f33): online
    Node: host2 (d94a39a4-dcb5-4305-99f0-a52c8236380a): online
    Clone Set: STONITH 
    		STONITH-child:0 (stonith:external/ssh): Started host1 
    		STONITH-child:1 (stonith:external/ssh): Started host2
    STONITH-iLO-host1 (stonith:external/riloe): Started host2
    STONITH-iLO-host2 (stonith:external/riloe): Started host1
    Clone Set: pingd 
    		pingd-child:0 (ocf::heartbeat:pingd): Started host1 
    		pingd-child:1 (ocf::heartbeat:pingd): Started host2
    
    

    crm_mon is a great utility; it displays information about the cluster in various ways and can even provide output for the Nagios monitoring system. See the man page for further information.

    EVMS resource:

    On SLES10 the /dev directory is actually tmpfs, meaning that it is a temporary file system populated by udev every time the system starts. It also means that the next time we boot the servers, our EVMS devices will not be available under /dev/evms/...

    Remember that when we saved the EVMS disk configuration we had to run evms_activate on the other node to make the devices available (created). This is exactly what needs to happen at boot, and luckily HA ships with an OCF resource agent that does exactly this for us. The other good thing about handling EVMS this way is that we can make all the other resources depend on it.

    The benefit of this is that HA will ensure that evms_activate was run, devices are in place before starting dependent resources (domU).
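
    To see what the resource agent will be doing for us, the same thing can be done by hand (the volume names are the ones used in this setup):

    host1:~ # ls /dev/evms/san2    # after a fresh boot this may be empty or missing
    host1:~ # evms_activate        # recreates the EVMS device nodes under /dev/evms
    host1:~ # ls /dev/evms/san2    # cfgpool, vm2, ... should now be present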

    Create EVMS clone resource:

    host1:~ # vi evmscloneset.xml
    <clone id="evms" notify="true" globally_unique="false"> 
    		<instance_attributes id="evms-ia"> 
    			<attributes> 
    				<nvpair id="evms-ia-01" name="clone_max" value="2"/> 
    				<nvpair id="evms-ia-02" name="clone_node_max" value="1"/> 
    			</attributes> 
    		</instance_attributes> 
    	<primitive id="evms-child" class="ocf" type="EvmsSCC" provider="heartbeat"> 
    	</primitive>
    </clone>
    
    

    All it does is run evms_activate on all nodes when the resource starts up.

    Load the resource into the cluster:

    host1:~ # cibadmin -C -o resources -x evmscloneset.xml

    Note: SLES10 ships with LSB init scripts (in /etc/init.d) that do the same thing, creating the EVMS devices during the boot process, but they do not work with CSM containers. They run without errors, yet the devices are not created, perhaps because evmsd is not running at that point. Whatever the cause, according to the project's page the designed way (at the time of writing) is to manage cluster membership with HA. Even if you don't plan to deploy clustered XEN domUs and just want to share storage between two bare-metal servers with EVMS and CSM, you would need the same setup up to this point, except for the ping daemon.

    OCFS2 cluster file system resource:

    By now we should have our OCFS2 cluster up and running: file system created and ready to mount, services set to start at boot, etc. The last step is to mount the actual device, which we will do with HA and its file system resource agent. The reason is that it has to be done on more than one node, and doing it with HA makes it:

    • cluster aware (every node will know when a node leaves, joins the cluster)
    • simple (single configuration for multiple mounts)

    This again needs to be a clone resource, one that mounts the
    /dev/evms/san2/cfgpool volume on /etc/xen/vm on each node in the cluster:

    host1:~ # vi configpoolcloneset.xml
    <clone id="configpool" notify="true" globally_unique="false"> 
    	<instance_attributes id="configpool-ia"> 
    		<attributes> 
    			<nvpair id="configpool-ia-01" name="clone_max" value="2"/> 
    			<nvpair id="configpool-ia-02" name="clone_node_max" value="1"/> 
    		</attributes> 
    	</instance_attributes> 
    	<primitive id="configpool-child" class="ocf" type="Filesystem" provider="heartbeat"> 
    		<operations> 
    			<op id="configpool-child-op-01" name="monitor" interval="20s" timeout="60s" prereq="nothing"/> 
    			<op id="configpool-child-op-02" name="stop" timeout="60s" prereq="nothing"/> 
    		</operations> 
    		<instance_attributes id="configpool-child-ia"> 
    			<attributes> 
    				<nvpair id="configpool-child-ia-01" name="device" value="/dev/evms/san2/cfgpool"/> 
    				<nvpair id="configpool-child-ia-02" name="directory" value="/etc/xen/vm"/> 
    				<nvpair id="configpool-child-ia-03" name="fstype" value="ocfs2"/> 
    			</attributes> 
    		</instance_attributes> 
    	</primitive>
    </clone>
    
    

    Again we are configuring an anonymous clone set, so the globally_unique parameter is (again) set to false. Since this clone set contains an OCFS2 file system resource agent, we want to enable notify for it, so that the clones (each agent on each node) receive notifications from the cluster and are therefore informed about the cluster membership status. To enable notifications, set notify to true for the clone set. We also configure the monitor operation, so that the cluster checks every 20 seconds whether the mount is still there.

    The most important part of the XML blob is the attributes section of the configpool primitive. Set the device parameter to the OCFS2 file system that needs to be mounted, directory to the directory on which this file system must be mounted, and fstype to ocfs2 for obvious reasons. In a cloned file system RA (resource agent), any other value is forbidden because OCFS2 is the only supported cluster-aware file system at this stage.

    Load the resource into the cluster:

    host1:~ # cibadmin -C -o resources -x configpoolcloneset.xml

    Without the EVMS volumes this resource wouldn't be able to start, so we have to make sure that EVMS starts first by making the OCFS2 resource dependent on the EVMS resource:

    host1:~ # vi configpool_to_evms_order.xml

    <rsc_order id="configpool_depends_evms" from="configpool" to="evms" score="0"/>

    Something to mention here about the score at the end: it's very important for version 2.1.3 and above. Without it, domUs can randomly restart after a successful live migration. Reading the documentation, I realized that it's harmless and should be in the CIB anyway.

    Reading:

    http://www.clusterlabs.org/wiki/images/a/ae/Ordering_Explained_-_White.pdf
    http://www.gossamer-threads.com/lists/linuxha/users/52913

    Load the policy into the cluster:

    host1:~ # cibadmin -C -o constraints -x configpool_to_evms_order.xml

    More OCFS2 issues:

    This is the stage where I had issues with file image based domUs. I basically had the 100GB LUN formatted as one big OCFS2 file system, mounted with HA in a similar fashion. It was a while ago and I must admit that it may be fixed by now, but when domUs started migrating to the other node, things went wrong.

    According to my observations and logs, it looked like a locking issue with live migration: OCFS2 couldn't hand over the lock to the other node, or release the image file, once XEN had finished the handover and writing to the image file was about to continue on the other node.

    Of course it was a severe error and the nodes started receiving STONITH actions until things settled down. Without live migration, HA would have stopped the domUs first and then started them on the new location, which would have worked perfectly; the original HASI was based on this idea. I just couldn't afford that for storage which holds user data and is mounted (multiple times) all the time, and last but not least it's not what I want for a production environment.

    FYI:

    Another interesting issue we found recently is that if you have the findutils-locate package installed, it does not work very well on OCFS2. A cron job runs every day to build the database of all files found on the system, but when it reaches the OCFS2 volumes, it hangs. We made a support call about this; no updates yet.

    OCFS2 is a great file system, I like its features and the support Novell builds into its products for it, but I am not yet convinced that it's suitable for XEN clustering and live migration in a production environment.

    NFS virtual machine (domU) resource:

    Finally it is time to play with virtual machines and HA. As mentioned earlier in this guide, I will only present the solution for the NFS domU that shares a large disk with UNIX users, but on the same principle I run around 15 other domUs on 2 clusters, hosting various UNIX services.

    Creating a virtual machine is out of the scope of this guide; there's plenty on the net about it. In fact I don't install domUs anymore: I maintain and run a plain copy on one of my clusters and clone that whenever I need a new one, as sketched below.
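
    A rough outline of how such a clone happens (illustrative only: the new volume and file names here are made up, and the name, uuid, vif MAC and disk lines must of course be unique per domU):

    host1:~ # cp /etc/xen/vm/nfs.xm /etc/xen/vm/newvm.xm
    host1:~ # uuidgen                    # generate a fresh uuid for the new configuration
    host1:~ # vi /etc/xen/vm/newvm.xm    # change name, uuid, the vif MAC and the disk lines
    host1:~ # dd if=/dev/evms/san2/vm2 of=/dev/evms/san2/newvm bs=1M    # copy the disk while the source domU is shut down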

    Note: the XEN domU configuration is still text based, not stored in xenstore. To my knowledge it's the only way of doing XEN clustering, because syncing the xenstore database amongst all nodes is tricky at the time of writing.

    Related reading:

    http://wiki.xensource.com/xenwiki/XenStore

    For reference, here is my NFS XEN domU configuration:

    host1:~ # vi /etc/xen/vm/nfs.xm
    ostype="sles10"
    name="nfs"
    memory=512
    vcpus=1
    uuid="86436fde-1613-4e12-8a94-093d1c3f962e"
    on_crash="destroy"
    on_poweroff="destroy"
    on_reboot="restart"
    localtime=0
    builder="linux"
    bootloader="/usr/lib/xen/boot/domUloader.py"
    bootargs="--entry=xvda1:/boot/vmlinuz-xenpae,/boot/initrd-xenpae"
    extra="TERM=xterm"
    disk = [ 'phy:/dev/evms/san2/vm2,xvda,w', 'phy:/dev/mapper/mpath1,xvdc,w' ]
    vif = [ 'mac=00:16:3e:32:8b:12', ]
    vfb = [ "type=vnc,vncunused=1" ]
    
    

    I assigned a fairly small (5GB) EVMS volume to this domU and added the 2TB LUN as well, as is, without any modification, just as it comes off the dm layer with its alias name mpath1. The 5GB might look a bit tight, but my domUs are crafted for their purpose, run purely in runlevel 3 (no GUI) and are really just cut-down SLES copies. At the time I built this cluster there was no JEOS release available; today it may be a good choice for a domU:
    http://www.novell.com/it-it/linux/appliance

    Create XEN domU resource:

    host1:~ # vi xenvmnfs.xm
    <primitive id="nfs" class="ocf" type="Xen" provider="heartbeat"> 
    	<operations> 
    		<op id="xen-nfs-op-01" name="start" timeout="60s"/> 
    		<op id="xen-nfs-op-02" name="stop" timeout="90s"/> 
    		<op id="xen-nfs-op-03" name="monitor" timeout="60s" interval="10s"/> 
    		<op id="xen-nfs-op-04" name="migrate_to" timeout="90s"/> 
    	</operations> 
    	<instance_attributes id="nfs-ia"> 
    		<attributes> 
    			<nvpair id="xen-nfs-ia-01" name="xmfile" value="/etc/xen/vm/nfs.xm"/> 
    		</attributes> 
    	</instance_attributes> 
    	<meta_attributes id="nfs-ma"> 
    		<attributes> 
    			<nvpair id="xen-nfs-ma-01" name="allow_migrate" value="true"/> 
    		</attributes> 
    	</meta_attributes>
    </primitive>
    
    

    We set the standard start, stop, monitor and migrate operations with their timeouts, then the location of the XEN domU configuration, and enabled live migration (which needs to be a meta attribute).

    Instance attribute: settings about the cluster resource for the local (on the node) resource (ie. what IP address to use)

    Meta attribute: settings about the resource for the cluster (ie. should the resource be started or not)

    One of the features of this HASI is the hypervisor-friendly resource timing, which I can best explain with a second domU configuration running within the same cluster:

    <primitive id="sles" class="ocf" type="Xen" provider="heartbeat"> 
    	<operations> 
    		<op id="xen-sles-op-01" name="start" timeout="60s" start_delay="10s"/> 
    		<op id="xen-sles-op-02" name="stop" timeout="60s" start_delay="10s"/> 
    		<op id="xen-sles-op-03" name="monitor" timeout="60s" interval="10s" start_delay="10s"/> 
    		<op id="xen-sles-op-04" name="migrate_to" timeout="90s" start_delay="10s"/> 
    	</operations> 
    	<instance_attributes id="sles-ia"> 
    		<attributes> 
    			<nvpair id="xen-sles-ia-01" name="xmfile" value="/etc/xen/vm/sles.xm"/> 
    		</attributes> 
    	</instance_attributes> 
    	<meta_attributes id="sles-ma"> 
    		<attributes> 
    			<nvpair id="xen-sles-ma-01" name="allow_migrate" value="true"/> 
    		</attributes> 
    	</meta_attributes>
    </primitive>
    
    

    The second domU above has a 10 second start_delay set for all its operations. When HA starts up, it starts all resources in the cluster according to the ordering and only waits for dependencies to complete. If I had 20 domUs in my cluster, that would hammer the system and its hypervisor and could lead some domUs to crash. I understand that SLES and xend have protection against this sort of problem, however I had issues when some of my heavily loaded domUs started migrating all at the same time; some crashed occasionally.

    http://wiki.linux-ha.org/ClusterInformationBase/Actions

    This small delay postpones the second domU's operations, and in most cases 10 seconds is enough for start, stop or migration to complete unless the domU is heavily loaded or has a large amount of RAM allocated. Mind you, 10 seconds per resource can add up to a significant delay in a cluster loaded with 20 domUs: the last resource would be delayed by more than 3 minutes, so it's your call how you configure or adjust these delays. You have to craft them for your environment; I cannot give you a "one size fits all" solution.

    You could though:

    • reduce the time and only delay each domU for 5 seconds if you have many
    • delay a pair of domUs (or more) with either similar purpose or characteristics
    • delay just the ones you know being resource intensive or busy
    • don't use delays at all if you have tested your cluster well and had no issues

    You have to make sure that the domU runs, the configuration is typo free and the disk descriptions are correct before loading it into the cluster.

    You could simply test the domU outside of the cluster (start it up with the traditional XEN utilities on one of the member nodes) or, better yet, test it in a separate test environment if you can afford one. If you load it into your HA cluster and it doesn't start, or crashes after a while, HA will issue a STONITH action for that node (reboot), which could affect other, already running and working services. That is probably not something you want on a production system.
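
    For example, with the traditional tools on one node only, while the resource is not yet defined in HA (standard xm commands; the domU name comes from the configuration above):

    host1:~ # xm create -c /etc/xen/vm/nfs.xm    # start it with a console attached and watch it boot
    host1:~ # xm list                            # confirm it is running
    host1:~ # xm shutdown nfs                    # stop it cleanly before handing it over to the cluster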

    If you are ready, load it into the cluster:

    host1:~ # cibadmin -C -o resources -x xenvmnfs.xm 

    Now we create a policy to make this domU resource dependent on EVMS and configpool. Since those two are already dependent on each other, it's enough to make the domU dependent on the configpool resource:

    host1:~ # vi nfs_to_configpool_order.xml
    <rsc_order id="nfs_depends_configpool" from="nfs" to="configpool" score="0"/>
    
    

    It has no effect on an already running domU resource, so it is safe to load into the cluster while operating; the scoring was already discussed with the configpool/EVMS ordering constraint earlier.

    Create another policy to make the NFS domU dependent on working networking on eth0:

    host1:~ # vi nfs_to_pingd_constraint.xml
    <rsc_location id="nfs:connected" rsc="nfs"> 
    	<rule id="nfs:connected-rule-01" score="-INFINITY" boolean_op="or"> 
    		<expression id="nfs:connected-rule-01-expr-01" 
    			attribute="pingd" operation="not_defined"/> 
    		<expression id="nfs:connected-rule-01-expr-02" 
    			attribute="pingd" operation="lte" value="0"/> 
    	</rule>
    </rsc_location>
    
    

    pingd is already running and scoring; this rule instructs the cluster not to run (i.e. to stop) this domU resource anywhere in the cluster where there's no networking (pingd connectivity).

    Related reading:

    http://wiki.linux-ha.org/CIB/Idioms/PingdStopOnConnectivityLoss

    Certainly in our scenario we have live migration enabled, so the cluster will not just stop the resources as described on the page above: it will either start them on another node (assuming that node has network connectivity) or live migrate them if there's some other Ethernet connectivity between the cluster member nodes (there should be!).

    It is not the best way of handling network connectivity scoring, as written on the page above, and I'm well aware of that, but the other method (preference scoring for better connectivity) could cause domUs to move between nodes quite often depending on load. I don't want that; I want my resources to stay where they are as long as there's networking available.

    Should you lose networking on all your nodes, HA will shut down all domUs and keep scoring continuously in the background. Once connectivity comes back, it should start them up again, although I haven't tested this behavior.

    The last thing we need to do is load these policies into the cluster:

    host1:~ # cibadmin -C -o constraints -x nfs_to_configpool_order.xml
    host1:~ # cibadmin -C -o constraints -x nfs_to_pingd_constraint.xml

    Remember that these policies will need to be individually configured for each domU resource you plan to run within an HA cluster.

Operating Hints

Caveats:

HA is our central database, it tracks and monitors each resource in the cluster, informs all member nodes about changes and synchronizes the CIB (cluster information base) amongst all nodes.

You must stop using any traditional XEN domU management utility including:

  • virsh (libvirt)
  • virtmanager (libvirt GUI)
  • xm

Anything you do with the domUs must be done by informing HA. If you stop one of your domUs with a traditional utility, HA will not know what happened to it and will start the resource up again; but what if you start it yourself at the same time? Yes, corrupted storage. XEN will happily start multiple instances of the same domU, it will not warn you or complain, that's just what it does, but files written concurrently from two instances without a cluster file system will be corrupted.

You can only use the tools above for monitoring, gathering information and perhaps testing domUs outside of the cluster, nothing else. The cluster's job is to keep the domUs running; should you want to change that status, tell the cluster through its built-in, HA-aware utilities.

To make this easy from the CLI, I wrote a basic script which translates basic xm management commands to HA-aware commands:
http://www.novell.com/communities/node/2573/xen-ocf-resource-management-script-ha-stack

The other common mistake is a typo in the configuration files, particularly when you don't install the domU but clone it. Some typos will be harmless but some can be quite destructive, hence:

You have to make sure that the XEN domU configuration has the correct disk descriptions and that they point to the right devices. This is a must regardless of whether you use an EVMS setup or file images on OCFS2.

If you forget to change the disk line of the configuration after cloning and then start the domU up, there's a very good chance of corrupting the disk of an already running production domU...

"Crafting" is a word that appears in this guide quite often. This is what makes the difference: create harmony in your cluster and you won't need to change much from the defaults.

Basic principles: turn off unused services, no firewall, no AppArmor, patch regularly, install only the software packages you need, keep an eye on RAM and CPU usage, no unnecessary accounts, use runlevel 3, etc.

Heartbeat GUI:

You should see lots of nice little green lamps in your cluster now, so let's talk a bit about the GUI interface. It's very basic and can only do certain things, though it gets better with every release. I use it mainly to get an overview of the services or for basic operations, and I strongly recommend you do the same.

To authenticate, you either have to reset the password for the hacluster system user (nonsense) or make your own account a member of the haclient group (better):

host1:~ # groupmod -A <yourusername> haclient
host2:~ # groupmod -A <yourusername> haclient

Note: this is something you will need to do on all cluster member nodes unless you are using centralized user management (LDAP). The GUI can be started from any member node within the cluster and will display, work and behave the same way.
host1:~ # hb_gui &
or
host2:~ # hb_gui & 

Now you should be able to authenticate with your credentials, learn and get used to the interface:

Start, stop resources with HA:

You can configure the status of a resource in the XML file before loading it in, for example:

<nvpair id="xen-sles-op-05" name="target_role" value="stopped"/>

or

<nvpair id="xen-sles-op-05" name="target_role" value="started"/>

In the CIB this becomes the default for that particular resource. Frankly, I don't see much point in loading something into the CIB with a stopped status, and doing it with started is pointless because that's the HA default action anyway.

When we stop or start a resource, either with the GUI or with my script, we in fact insert an interim attribute with a generated id into the CIB, without making it the default.

This is important because if you want to start the stopped resource again you could simply select start:

But then you just replace the stopped interim attribute with a started one and still leave interim bits in your CIB. To do it properly, select default, which actually removes the interim attributes and applies the HA default status to the resource, which is started:

Deleting the target_role attribute inserted by the GUI from the right panel has the same effect as default. I personally don't like interim entries in my CIB; I like to keep it nice and clean. Note: at this stage my script doesn't have an option for default!

Safe way of testing new resource configurations:

This is essential for a production environment, even if you are certain that things will work. The best way is to tell the cluster not to manage the resource, in the initial XML file:

<nvpair id="xen-sles-op-05" name="is_managed" value="false"/>

Should the new resource fail to start up, or crash after a while, the cluster will ignore the failure and will not issue fencing for the node where the failure occurred. Once it has become stable you can remove this attribute with the GUI or edit the CIB with cibadmin (more on that later).

Disable monitoring operation:

You may need this at some stage, I haven't used it so far. It's just an extra attribute:

<op id="xen-sles-op-03" name="monitor" timeout="60s" interval="10s" enabled="false"/>

Notice that I removed the monitor start_delay from the previous example just to avoid breaking the line and to maintain readability; otherwise it would be there. You can remove this attribute later, the same way as the previous one.

Editing the CIB:

You can only use the GUI or the cibadmin utility. You must not edit the CIB by hand even if you know where it is located on the file system.

The GUI is simple: you just edit or delete the particular part you need. The command line is different. Assuming you saved all the XML files you loaded into the CIB, just make the change you like and load the file back into the cluster, but use the replace (-R) option instead of create (-C), for example:

host1:~ # cibadmin -R -o resources -x xenvmnfs.xm

Backup, Restore the CIB:

You can backup the entire CIB:

host1:~ # cibadmin -Q > cib.bak.xml

It includes all the LRM parts, which you wouldn't normally need. LRM stands for local resource manager, basically the part which handles everything for the corresponding node locally. The CRM is replicated across all nodes, whereas the LRM is the component which performs the local actions on each node. Hence I prefer to back up my CIB by object type instead:

host1:~ # cibadmin -Q -o resources > resources.bak.xml
host1:~ # cibadmin -Q -o constraints > constraints.bak.xml

Should you need to restore them:

host1:~ # cibadmin -R -o resources -x resources.bak.xml
host1:~ # cibadmin -R -o constraints -x constraints.bak.xml
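
If you want this done regularly, a minimal sketch of a pair of root cron entries (paths and schedule are just an example; the /root/cib-backup directory is assumed to exist):

0 2 * * * /usr/sbin/cibadmin -Q -o resources > /root/cib-backup/resources.bak.xml
5 2 * * * /usr/sbin/cibadmin -Q -o constraints > /root/cib-backup/constraints.bak.xml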

Migration with HA:

It's a little bit different, therefore I would like to talk about it. In HA, when you tell your cluster to do something (stop, start) you either set target_role or set a certain preference (migration). Preference is managed by scoring, as you may have realized already, and when you migrate a resource you actually instruct the cluster either NOT to prefer (score -INFINITY) or to prefer the MOST (score +INFINITY) a certain resource on a particular node.

Using my script to migrate, or right-clicking on any resource in the GUI and selecting the "migrate resource" option, do the same thing: they apply interim scoring and insert a rule into the cluster. This nature of HA applies to all resource types, not just to virtual machines!

Ultimately you are cluttering the CIB, which you will need to clean up at some point. As you can see, a clean-up option is already built into the interface (option down below) and my script also includes a subcommand for it, but it's still not my preferred way of doing this.

Standby is the way to go:

In the last 2 years that I have been running these clusters, I have only had to migrate resources when I wanted to update (patch) or do maintenance on the servers. The best approach is to put the node into standby. It's designed for this: it migrates all resources which are migration capable (and need to be running), stops and restarts elsewhere the ones which are not, and stops the remaining resources which don't need to be running.

As usual my script includes a built-in option to do this, or you could use the GUI:

It takes some time depending on your setup, the configured time delays, the number of resources and their load, so just be patient. Once HA reports that the node is in running-standby, resources are stopped and the domUs are running on the other node, you can basically do whatever you feel like. You can patch the XEN host, upgrade it, shut it down for maintenance, upgrade firmware and so forth.

When you have finished, just make it an active node again, with the GUI or the script, it's your choice:
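
If you prefer the plain command line, the crm_standby utility that ships with Heartbeat can do the same thing (a sketch; check the man page of your version for the exact options):

host1:~ # crm_standby -U host1 -v on     # put host1 into standby
host1:~ # crm_standby -U host1 -v off    # make host1 active again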

Backups:

How do I actually back up these complex systems? What would a restore be like? We use CommVault7 to back up data partitions (within the domUs) but I use tar for everything else. The dom0 is simple: it logs to a remote server as well and barely changes, hence I back it up (full) once a week to a remote backup server via NFS, where it gets stored onto tape.
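
A rough sketch of the kind of weekly dom0 job I mean (the NFS mount point and the exclude list are illustrative, adjust them to your layout):

host1:~ # tar czf /mnt/backupserver/host1-full.tar.gz --exclude=/proc --exclude=/sys --exclude=/mnt /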

The restore would be simple too. Should the system fail to boot, I would boot from the original SLES install DVD, select rescue mode, configure networking and then restore from the remote NFS server. We actually keep lots of system backups online in the remote backup server's disk cache.

For the domUs I run tar based daily differential backups and a full backup once a week. Restoring might seem more challenging due to the nature of the setup but it's actually fairly easy. You can fix, restore, back up or modify anything you need on any domU disk from the XEN dom0 host, regardless of whether you use an EVMS, partition or file image based setup. I already published a guide on how to access domU disks from the host, which is the key to this solution:

http://www.novell.com/communities/node/2697/mountaccess-files-residing-xen-virtual-machines

I tested this solution many times when I accidentally deleted domU disks or corrupted them during development. It actually takes less than 10 minutes to fully recover a domU this way which is pretty good.

If you can afford the domU being offline, you can copy or clone any domU disk with the dd utility, create an image backup, a daily snapshot or whatever you want.

Update regime:

Due to the nature of the system it makes sense not to be a version freak and update every time a patch is released. Simply follow Murphy and don't try to fix a non-existing problem, although it's good to patch occasionally to save yourself from software bugs.

I recommend the following, applied with common sense:

  • avoid upgrading during working hours, the load needs to be as small as possible
  • sign up for patch notification emails
  • always read them thoroughly, understand what is fixed, see if you could be affected
  • update regularly but not too often; I usually update every 8-12 weeks, unless I am affected or the patch is really important for a healthy environment

I'm not too concerned about security fixes, since my systems are protected by various layers of tools, firewalling, etc., but you may need them urgently if a package is affected by a critical bug that could remotely affect systems providing public services over the Internet.

Updating as a process is more challenging in this environment, especially when software components receive feature updates, newer versions, etc.

For an HA managed XEN cluster, the standard procedure would be as follows:

  • put host1 node into standby
  • apply patches and reboot (assuming that within 8-12 weeks you receive a kernel update)
  • put host1 node back into active mode (no domU should be running there)
  • observe the system for a while, monitor logs, ensure full functionality
  • update the least important domU first then stop it (running on host2)
  • start the least important domU up immediately (it should start up on host1)
  • monitor behavior for a while, maybe a day or so depending on your requirements
  • if things look good put host2 into standby BUT don't patch just yet
  • all domUs will move to the newly patched host1 (should be no issues)
  • host2 remains in standby until all domUs are fully patched and restarted
  • proceed with the patching of the rest of the domUs running on host1
  • once all domUs are fully patched, restarted on host1 and fully functional for a while proceed with host2 patching
  • reboot host2, then put it back into active mode if it behaves well for a while

This is the safest method I have figured out over the years and it has always worked except on one occasion: when SUSE updated HA from 2.0.7 to 2.1.3 and my configuration at that time didn't have certain scoring settings. This was already discussed briefly with the ordering constraints earlier.

I had odd OCFS2 issues as well when a new version was released (1.4.x). For clustering solutions it's quite common that nodes cannot establish a connection with nodes running different software versions; it's just the way it is.

You have to pick the right time for upgrades and maintenance. The load needs to be as small as possible to avoid issues. For example: once I updated one of my cluster nodes during working hours, and when I put the node back into active mode, the EVMS resource failed due to a timeout. There must have been a big load either on the SAN or on the volumes presented to the nodes. As a result STONITH actions were fired, reboots, etc.

RAM usage generally:

This idea assumes that you keep track of the amount of RAM you consume, limit the usage of your dom0 and all your domUs to a certain amount, and never run more resources than a single node can handle. It's one of the reasons why a 2 node cluster is inefficient: one node is pretty much wasted, because you can only ever run as much as one node can handle.
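
As a rough illustration (numbers invented for the example): with 16 GB of RAM per node and 1 GB reserved for each dom0, 15 GB remain per node for domUs; if every domU is allocated 1 GB, you should not run more than roughly 14-15 domUs in the whole two node cluster, because after a node failure all of them must fit on the single surviving node.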

Of course if you have more nodes it's a lot easier to spread the domUs across them; alternatively you could set up policies to shut down certain resources to accommodate the extra load, but that is out of the scope of this guide.

Persistent device names:

It's a bit off topic, so take it as optional reading; it should not affect users who run SP1 or SP2 SLES10 installations. I have upgraded mine since the GA release, and at that time persistent device names were not the default during installation. I made some notes on how to do this by hand.

http://www.novell.com/communities/node/6691/create-convert-disks-persistent-device-names

Limiting dom0's RAM:

I think the maximum limit is still a good idea and I'm still doing it:

http://www.novell.com/documentation/sles10/xen_admin/data/sec_xen_config_bootloader.html
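
For reference, the bootloader side typically ends up looking something like this in /boot/grub/menu.lst (the value is illustrative; see the documentation linked above for the exact procedure):

kernel /boot/xen.gz dom0_mem=1024M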

The maximum limit is not enough by itself; I also recommend setting the minimum limit. This way your dom0 (your controller) always gets what it needs, and according to my tests 1G is enough for even the busiest environments. Smaller systems could try 512M for a start:

host1:~ # vi /etc/xen/xend-config.sxp
-snip-
(dom0-min-mem 1024)
-snip-
host1:~ # rcxend restart

Be careful restarting xend on production systems; I haven't tested this during normal operation and it might cause trouble.

Drawing dependency graph:

It's very useful for visualizing the HA cluster, its dependencies and it should be part of your cluster documentation. I published this separately:

http://www.novell.com/communities/node/5880/visualizing-heartbeat-ha-managed-resource-dependencies

Unlimited shell history:

I found it useful especially when I forgot how I did certain things in the past. You could track your root account's activity during the system's lifetime:

http://www.novell.com/communities/node/6658/unlimited-bash-history

Managing GUI applications remotely:

This is an article I published recently and it can be very useful for virtual environments. The GUI components take a lot of resources and the server doesn't actually need to run them, so we can turn them off and still use and take advantage of the GUI tools developed for SUSE.

http://www.novell.com/communities/node/6669/rdp-linux-managing-gui-displays-remotely

Monitoring HP hardware:

Protect your XEN host from hardware failure and monitor it with the HP provided tools. It may not be necessary for initial setups, but it is for production environments.

http://www.novell.com/communities/node/6690/hpproliantsupportpacksles10

Relocation host settings:

You will need this for live migration. If you are unfamiliar with it, please visit this page:

http://www.novell.com/documentation/sles10/xen_admin/index.html?page=/documentation/sles10/xen_admin/data/bookinfo.html
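
The relevant knobs live in xend-config.sxp; a minimal sketch (values are illustrative, and the allowed hosts should be restricted to your cluster member nodes):

host1:~ # vi /etc/xen/xend-config.sxp
-snip-
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-hosts-allow '^host1$ ^host2$')
-snip-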

XEN networking:

It's a very important and sensitive topic, so pay attention to it when you design your cluster. I personally prefer bridges, since bridging is a layer 2 operation in the OSI model, easy to set up, and doesn't cause too much overhead on the host even though it's software based; but it may not suit your environment, in which case you will need routing.

If you decide to use bridging and want multiple bridges for more than one NIC, you can create a small wrapper script to manage them when xend starts up:

host1:~ # vi /etc/xen/scripts/multi-network-bridge
#!/bin/sh
# PRELOADER (Wrapper) SCRIPT FOR 'network-bridge'
# Modified script from wiki.xensource.org (for more
# than 1 bridge) IAV, Sep 2006
#
# Modified to suit SLES10-SP2. IAV, Aug 2008
dir=$(dirname "$0")
"$dir/network-bridge""$@" netdev=eth0
"$dir/network-bridge""$@" netdev=eth1

Modify your xend configuration to reflect this change:

host1:~ # vi /etc/xen/xend-config.sxp
-snip-
(network-script multi-network-bridge)
-snip-
host1:~ # rcxend restart

It basically runs the standard network-bridge script once for every NIC specified. You will need to set this up the same way on all dom0 hosts. We use it to separate private LAN and DMZ traffic for infrastructure domUs requiring access to both.
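
After restarting xend you can verify that a bridge exists for each NIC (bridge names depend on your setup; with the default SLES10 script they are usually xenbr0, xenbr1, ...):

host1:~ # brctl show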

Common problems with bridges:
http://www.novell.com/support/php/search.do?cmd=displayKC&docType=kc&externalId=7001989&sliceId=1&docTypeID=DT_TID_1_1&dialogID=18715317&stateId=0%200%2018707874

Issues with multiple NICs:
http://www.novell.com/support/php/search.do?cmd=displayKC&docType=kc&externalId=7000058&sliceId=1&docTypeID=DT_TID_1_1&dialogID=18715317&stateId=0%200%2018707874

Upgrading from SLES10 SP1 to SP2:
http://www.novell.com/support/php/search.do?cmd=displayKC&docType=kc&externalId=7000608&sliceId=1&docTypeID=DT_TID_1_1&dialogID=18715317&stateId=0%200%2018707874

XEN knowledge base master reference:
http://www.novell.com/support/php/search.do?cmd=displayKC&docType=ex&bbid=TSEBB_1221753215744&url=&stateId=0%200%2018707874&dialogID=18715317&docTypeID=DT_TID_1_1&externalId=7001362&sliceId=2&rfId=

Cluster status via web:

hb_gui is great, but what if you don't have it at hand? Cluster status is also available over HTTP with a web browser: the crm_mon command line utility is capable of creating output in HTML format. A web server could run on the nodes themselves, but that is rather pointless; my preference is to run the web server somewhere else, equipped with a small CGI script that retrieves the output from the nodes remotely. You can ask any node, they all return the same output, but your CGI script must ask another node if one of them is down for maintenance. This solution is designed for a 2 node cluster and might not be the best, but it works. It assumes that you have already created, with your favorite tool, a user (for example "monitor") on both hosts.

Generate keys then copy them to the other node, same location:

monitor@host1:~> ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/monitor/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/monitor/.ssh/id_rsa.
Your public key has been saved in /home/monitor/.ssh/id_rsa.pub.
The key fingerprint is:
05:d7:71:82:a5:14:d6:a1:c6:1c:2e:3a:ba:5b:9c:cc monitor@host1
monitor@host1:~> cd .ssh
monitor@host1:~/.ssh> ln -s id_rsa.pub authorized_keys
monitor@host1:~/.ssh> cd ..
monitor@host1:~> scp -r .ssh host2:~
Password: 
authorized_keys		100% 	926 	0.9KB/s	00:00
id_rsa			100% 	1675 	1.6KB/s	00:00
id_rsa.pub		100% 	395 	0.4KB/s	00:00
known_hosts		100% 	395 	0.4KB/s	00:00
monitor@host1:~>

Both nodes should now be ready for remote passwordless login (monitor user only).

My webserver is not a SLES box at this stage, but the process should be very similar; the difference is just the running user and maybe the locations. On the webserver we have to unlock the account of the running user by giving it a valid shell. Then we copy the private key across and put it in the default $HOME/.ssh location:

webserver:~ # getent passwd apache
apache:x:48:48:Apache:/var/www:/sbin/nologin
webserver:~ # usermod -s /bin/bash apache
webserver:~ # getent passwd apache
apache:x:48:48:Apache:/var/www:/bin/bash
webserver:~ # su - apache
apache@webserver:~> pwd
/var/www
apache@webserver:~> mkdir .ssh
apache@webserver:~> scp monitor@host1:~/.ssh/id_rsa .ssh/
Password:
id_rsa			100% 	1675 	1.5KB/s	00:00 

Execute some remote test commands from the webserver as the running user on both nodes. It's necessary to test the authentication and to accept the initial authenticity for both systems.

apache@webserver:~> ssh -l monitor host1.domain.co.nz 'uptime' 
2:41pm up 5 days 18:08, 4 users, load average: 0.09, 0.08, 0.02
apache@webserver:~> ssh -l monitor host2.domain.co.nz 'uptime' 
2:42pm up 5 days 23:26, 6 users, load average: 0.38, 0.36, 0.29

When you have finished, log out and lock the running user's account again; it doesn't need a valid shell anymore:

apache@webserver:~> exit
webserver:~ # usermod -s /sbin/nologin apache
webserver:~ # getent passwd apache
apache:x:48:48:Apache:/var/www:/sbin/nologin
webserver:~ # su - apache
This account is currently not available.

Create a shell script in your cgi-bin directory depending on your webserver's configuration with the following content:

#!/bin/sh
NODE1="host1"
NODE2="host2"
KEY=/var/www/.ssh/id_rsa
USER="monitor"
DOMAIN="domain.co.nz"
CMD="/usr/sbin/crm_mon -1 -w"

# This function is testing the node availability
ping_node() {
	/bin/ping -c3 -w5 -q $1 > /dev/null
	/bin/echo $?
}

# This function is used to randomize the NODE variable
randomize() {
	i=1
	while [ $i -ne 0 ]; do
		# use epoch seconds (%s) rather than %S: no leading zeros to upset the
		# base-10 arithmetic, and it avoids the shell's special SECONDS variable
		SEC=`/bin/date +%s`
		TEST=$(( $SEC % 2 ))
		if [ $TEST -ne 0 ]; then
			NODE=$NODE1
		else
			NODE=$NODE2
		fi
		i=`ping_node $NODE`
	done
}

randomize && /usr/bin/ssh -l $USER -i $KEY $NODE.$DOMAIN $CMD

The webserver configuration is out of the scope of this document; just remember that it may take a few seconds to load the page. The delay is caused by the host checking (ping) method built into the CGI script.

Output should look like this:

Testing the cluster

Testing is just as important as any other safety feature we built into the cluster. I hope you have read the links provided in this guide by now and that they were all working. Jo's original HASI discussed testing in some detail; I'm not planning to duplicate that, but you should:

Test multipath:

This depends on your hardware, hence I cannot give you a recipe. In my case I did have some unplanned outages and controller failures which allowed me to test the service in real life. At some point one of my servers lost both paths to the SAN while the other, for some reason, didn't. Interestingly, HA must have noticed something, because the following day I found all my domUs running on the healthy node. There's nothing in my cluster monitoring the SAN, but something obviously reacted and saved all my resources from becoming unavailable, which I hadn't even noticed at the time since it happened after hours...

Test STONITH:

This is the most important part, so ensure it's operating 100%. You can kill the HA process and see whether the resources move to the other node and whether the node you killed the process on actually reboots (after a while, depending on your check interval, deadtime, etc.):

host2:~ # pkill heartbeat

It can take some time if the ssh agent was chosen, so be patient and monitor the logs in the meantime:

host2:~ # tail -f /var/log/ha-debug

You can test the iLO agent like this from the CLI:

host2:~ # stonith -t external/riloe -p hostlist ilo_hostname ilo_user 
ilo_password ilo_can_reset ilo_protocol ilo_powerdown_method -T reset

It's a one line command; use the parameters you hardcoded into your XML file. It should cold reboot the node within a few seconds.

Note: the STONITH action is considered an emergency, hence the iLO agent will just pull the cord; your journaling filesystem should take care of the unclean shutdown. The ssh agent is different: it executes a reboot command within the OS, which performs a clean shutdown and therefore needs a significant amount of time to complete, depending on your setup.

Remember: if you have implemented timing delays as explained in this guide and have many domUs, starting the heartbeat process at boot and stopping it at shutdown will take some time. Don't force it, it will complete; it's just the nature of the cluster.

I do recommend stopping the ssh STONITH agent and doing the proper test (killing the HA process as above) to ensure that iLO STONITH works as well.

Resources:

Stop some domUs randomly with xm or virt-manager, see if HA brings them back up after a while, and monitor the logs.

Network failover:

Test the cluster against network issues. The best and easiest way is to pull the cord out of one of your member nodes (eth0 only; we monitor only the NIC going back to our private LAN). Resources should start restarting or migrating onto the other node after a few moments. If you don't have access to the hardware or don't want to pull the cord, you can block the returning ping packets to your node, which should have the same effect:

host2:~ # iptables -I INPUT -p icmp -s 192.168.1.254 -d 192.168.1.2 -j DROP

The -s parameter is my gateway (the returning packet's source) and -d is the server itself (the destination of the packets). It filters only the ICMP protocol (-p) and simply inserts (-I) this rule into the standard INPUT chain. You could add more options to match these packets even more precisely, but this should do the job safely without the risk of blocking something you didn't mean to. The server should not reboot; your domU resources should move to the other node. To get rid of this interim firewall rule, you can restart the XEN host or issue the following command:

host2:~ # iptables -D INPUT -p icmp -s 192.168.1.254 -d 192.168.1.2 -j DROP

Standby mode:

Test the cluster with standby mode and watch the resources move. Reboot the nodes, do some maintenance, etc. Once finished, put the node back into active mode; resources should stay where they are and the logs should contain no errors.

After a few reboots of both nodes, ensure that the DC (designated controller) role does change over in the HA cluster and that the EVMS volumes get discovered and activated properly on all nodes.

Check the configuration from time to time:

It's a good idea to do ad-hoc configuration checks, particularly when the configuration has changed:

host1:~ # crm_verify -LV
host1:~ #

An empty prompt is a good sign; any problems would be displayed...

Proof of concept

The NFS domU hosts user home directories exported via NFS to store user data; for testing it is set up with 512MB of RAM. The domU is running on host1 and everything is as presented earlier in this document.

Writing a 208MB file to the NFS export from my desktop:

geeko@workstation:~> ls -lh /private/ISO/i386cd-3.1.iso
-rw-r--r-- 1 geeko geeko 208M Nov 3 2006 /private/ISO/i386cd-3.1.iso
geeko@workstation:~> md5sum /private/ISO/i386cd-3.1.iso
b4d4bb353693e6008f2fc48cd25958ed /private/ISO/i386cd-3.1.iso 
geeko@workstation:~> mount -t nfs -o rsize=8196,wsize=8196 nfs:/home/geeko /mnt
geeko@workstation:~> time cp /private/ISO/i386cd-3.1.iso /mnt
real 0m20.918s
user 0m0.015s
sys 0m0.737s 

It wasn't very fast because my uplink was limited to 100Mbit/s at that time, but that's not what we are concerned about right now. Redo the test, but migrate the domain (put host1 into standby) while writing the same file to the NFS export:

geeko@workstation:~> time cp /private/ISO/i386cd-3.1.iso /mnt
real 0m41.221s
user 0m0.020s
sys 0m0.772s

Meanwhile on host1 (snippet):

host1:~ # xentop
xentop - 12:02:23 Xen 3.0.4_13138-0.47
2 domains: 1 running, 0 blocked, 0 paused, 0 crashed, 0 dying, 1 shutdown
Mem: 14677976k total, 1167488k used, 13510488k free CPUs: 4 @ 3000MHz 
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) 
Domain-0 -----r 2754 47.6 524288 3.6 no limit n/a 4 4 1282795 4132024 
migrating-nfs -s---- 8 0.0 524288 3.6 532480 3.6 1 1 17813

It was twice as long but:

nfs:~ # md5sum /home/geeko/i386cd-3.1.iso
b4d4bb353693e6008f2fc48cd25958ed /home/geeko/i386cd-3.1.iso

The md5 hash matches up and that is what I wanted to see from the NFS domU. I tested the file system just in case: (NFS domU itself used LVM2 on top of xvdc (mpath1) with XFS)

nfs:~ # umount /home
nfs:~ # xfs_check /dev/mapper/san1-nfshome
nfs:~ #

No corruption found.

For the record:

As of today the NFS domU uses 1G of RAM and shares the home space amongst many users. The disk still uses the XFS filesystem because of the CommVault7 backup software we use for the data partition, and XFS is in any case well suited to this purpose. Apart from minor adjustments (more RAM, an increased number of nfsd processes as we added more users) the system runs with nearly default settings.

We needed to tune the mount options for the users' NFS clients too. These were necessary to ensure data safety and instant reconnection if the NFS server rebooted for some reason such as kernel update. The mount command (one line) would look like this:

geeko@sled:~> mount -t nfs -o rw,hard,intr,proto=tcp,rsize=32768,wsize=32768,nfsvers=3,timeo=14 nfs:/home/geeko /home

All our UNIX systems are nowadays configured with autofs, which retrieves a special record from our LDAP about the user's home location and then mounts it instantly. It has been working absolutely hassle free; the domU has been migrated many times between the nodes under heavy access without any issues. The filesystem is clean, the data is safe and the shared storage is quite efficient, although NFS is not my favorite protocol.

I/O Statistics:

Native SLES10 SP2 on HP DL585G2 hardware with EVA6000 SAN
     Write: ~156M/s
     Read: ~140M/s

     CommVault7 best backup throughput was ~130G/hour
     CommVault7 best restore throughput was ~75G/hour

XEN SLES10 SP2 domU on HP DL360G5 hardware with EVA6000 SAN
     Write: ~132M/s
     Read: ~121M/s

     CommVault7 best backup throughput was ~116G/hour
     CommVault7 best restore throughput was ~61G/hour

NFS performance from a SLED10 SP2 desktop (not as good, but still about as fast as a local disk)
     Write: ~49M/s
     Read: ~27M/s

These measurements were done mostly after hours without much load, showing an average at that time. The SAN has a fairly large cache hence the throughput may be bogus.

XEN host stats with average load across 8 domUs:

Load average: 0.4-0.8
I/O: ~3000 blocks/sec
Interrupts: ~2200/sec
Context switch: ~2500/sec
Idle: 91% 

Conclusion

For many there may be an argument between XEN and VMware; no doubt both have their strengths and their place on the market. XEN is the child of Linux, and there is no question about its strengths, efficiency and cost effectiveness, but unfortunately it lacks all the fancy features, tools and utilities that VMware offers.

The XEN tools are getting better and evolve fast, but it will take a significant amount of time for the developers to catch up with others who have been doing this for a long time. Novell's effort to improve on this is quite clear, the support we get is very good, and I am very happy with what I managed to work out over the last 2 years.

For us, the deciding factors were efficiency and stability, not particularly the cost. At that time we had a verbal agreement to purchase ESX for some infrastructure developments; I believe we still own one copy but it is not being used as far as I know.

I don't have VirtualCenter, P2V, dynamic provisioning and many other great features, but the reality is that I don't need them. The main components are built into this solution:

  • live migration without service interruption
  • clustering and high availability
  • auto failover
  • efficient, centralized resource management, etc.

We needed static virtualization, domUs with a dedicated purpose, and that's what most small and mid-sized businesses would want, at least for a start. XEN can serve these needs well in a very good, very efficient and cost-effective way.

I have to admit that I didn't plan on using SLES for this project in the early days, and this is not marketing for any flavor, just an individual opinion.

I tested Debian Linux, Fedora Core 5 and NetBSD 3 for both dom0 and domU, but SLES turned out to be the best for the dom0. That is what matters after all: the dom0 has to be rock solid and perfect. The components used in this guide were also developed by SUSE people as individual projects, so there is no doubt that it's really the best you can use for this sort of thing.

Unlike for the dom0, I still prefer to run Internet facing core systems on Debian Linux and I wouldn't use anything else. You wonder why? Cleanness of code, community support, a vast amount of packages, discipline and, last but not least, their standard policy of no feature upgrades within a release.

For example our SMTP gateway domU consumes 387M of storage space for the installed system without logs, keeps away an average of 20K spam messages daily (most rejected on the spot, so they don't waste my CPU time), exchanges zillions of legitimate messages, checks for viruses and, last but not least, our false positive ratio is very low. Half of the filters would not be available for SLES and maintaining them could become an unnecessary hassle or overhead after a while.

Thanks to

In no particular order:

Jo De Baer
Andrew Beekhof
Lars Marowsky Bree
Dejan Muhamedagic
Alan Robertson
Novell support
Users from linux-ha mailing list

Attachment: hasi-secure.pdf (2.25 MB)

RPM Package Verification and Repair


As everybody who uses a computer in a multi-user environment eventually finds out, permissions on files and directories sometimes get in your way. This is typically a good thing, since they also get in the way of things like viruses, malware and other users on the system trying to get your stuff or break your computer. Windows users, well, you perhaps haven't dealt with this much, since Windows is a bit of a red light district (and I mean that in a bad way), as you probably already know.

So when permissions become torn up on your precious application files because of a bad set of instructions, a typo, or some other glitch in software (it happens, even with Linux), what is a user to do? The first thing that comes to mind may be to have a backup that you can restore, but that implies several things which I do not want to imply, but will list them for grins.

First, it implies you know that you should backup, so essentially you are an IT person. Second, it implies you actually took a backup, which implies you don't really exist. Third, it implies you took a backup recently, and now we're pushing the limits of quantum theory. Fourth, it implies your backup knows about permissions (not that hard to believe) and fifth that it wasn't lost in the same operation that messed up your permissions.

This article is primarily made for when you mess up application files permissions, like those that actually run the software (OpenOffice, Pidgin, Thunderbird, Firefox, etc.). These permissions are set to typically be world-usable but not world-writable. Most of the time (with SUSE) they are installed via RPMs and are owned by 'root' and while 'root' can tinker with the actual contents of the files most regular users can only access the files (execute, read) which keeps them safe from your own bad decisions. In those RARE cases where you become 'root' and are following directions on how to configure some random package and you mistype with a 'chmod' or 'chown' command you can quickly make your system unusable just like you can with any bad command in any OS. A few options are available to help us in these cases. The RPM software has been around for quite a while and provides options out of the box... yes, without third-party software, Linux Just Does That.

So let's dive in. For those of you not inclined to use the command line, welcome to computers. Not just microsoft computers, but real computers. If you hadn't been tinkering as 'root' in the first place this might not be a problem now but the only way to efficiently troubleshoot any system (including windows or Mac, in my opinion) is from the command line. I'll explain why in another article someday, but for now I'll just be very clear in what needs to be done.

The 'rpm' command is the interface to the RPM database which holds information about files/directories and tons of metadata about your application and other files. The command has a lot of options to use, but a few (when learned) provide the bulk of everything necessary (use 'man' pages for the rest when needed of course).

A few of the options you should know include -q (query), -V (verify), -i (install), -U (upgrade), -F (freshen), and two new ones that I just learned, --setperms and --setugids (set permissions and set ownership, respectively). So how do you use these?

First, the -q option: this is the basic starting point for RPM if you already have your system installed. It basically lets you query the RPM database (store of information about installed stuff) or a package on your machine for information. This option is combined with other flags to get more data and in this it can be confusing. For example there is a -i parameter that lets you install packages, but if you use -qi it gets information about whatever you are querying. I'll provide some examples for clarification.

rpm -qa   #Query All, as in everything in your RPM database.  By default this dumps package names to the screen
ab@mybox:~/Desktop> rpm -qa | head -3
terminfo-base-5.6-90.55
libwnck-lang-2.24.1-2.22
libgnome-lang-2.24.1-2.28

Notice that I selected three results and they happened to be just three packages from my system, retrieved in who-knows-what order. The last one looks to be related to Gnome, which makes sense since I use it primarily. If I select that one specifically and use the -qi (query information) option I get the following:

ab@mybox:~/Desktop> rpm -qi libgnome-lang
Name        : libgnome-lang                Relocations: (not relocatable)
Version     : 2.24.1                            Vendor: SUSE LINUX Products GmbH, Nuernberg, Germany
Release     : 2.28                          Build Date: Wed 25 Feb 2009 03:41:23 AM MST
Install Date: Fri 06 Mar 2009 11:37:04 PM MST      Build Host: baur
Group       : System/Localization           Source RPM: libgnome-2.24.1-2.28.src.rpm
Size        : 2565938                          License: GPL v2 or later; LGPL v2.1 or later
Signature   : RSA/8, Wed 25 Feb 2009 03:41:54 AM MST, Key ID e3a5c360307e3d54
Packager    : http://bugs.opensuse.org
URL         : http://www.gnome.org/
Summary     : Languages for package libgnome
Description :
Provides translations to the package libgnome
Distribution: SUSE Linux Enterprise 11

Okay, now this is getting interesting. I just found that my box is SLE(D) 11 and this is a libgnome-lang package version 2.24.1, built on some box named 'baur' from my vendor (SUSE) in Germany (where SUSE is from). I see its size, a signature (so I know it's really what it says it is), a license, and all kinds of good information. Let's go a step further: I can query all of the files that the RPM database knows about that came from this package. The '-ql' option lets me List all files from this package:

rpm -ql libgnome-lang |head -3

/usr/share/locale/am/LC_MESSAGES/libgnome-2.0.mo
/usr/share/locale/ar/LC_MESSAGES/libgnome-2.0.mo
/usr/share/locale/as/LC_MESSAGES/libgnome-2.0.mo

Okay, now I can see three (of however-many) files that were laid down by this package. Where are we going with all of this? Well, the RPM database can be (and often is) told to keep track of metadata about these files, such as ownerships, permissions, checksums, etc. Let's try the -V (verify) option with the command:

rpm -V libgnome-lang

After a short delay I'm right back at my prompt with no output. This means that nothing was found to be different. You could add a lower-case 'v' to that in order to get more-verbose output and see a bunch of lines like this if desired, or you could just `echo $?` and see that the return code from the command was zero (0), which typically means everything is good regardless of the command used:

........    /usr/share/locale/am/LC_MESSAGES/libgnome-2.0.mo                                                   
........    /usr/share/locale/ar/LC_MESSAGES/libgnome-2.0.mo                                                   
........    /usr/share/locale/as/LC_MESSAGES/libgnome-2.0.mo                                                   
........    /usr/share/locale/az/LC_MESSAGES/libgnome-2.0.mo

Notice the leading dots? They would be replaced by characters indicating a problem if a discrepancy were detected. For example, configuration files are often changed by users or by the system, and they therefore often show up as having a different size or md5 checksum from the original, but this is expected... they're configuration files. As an example, I know something has modified my /etc/profile file incorrectly (no, you shouldn't be modifying this directly... create your own user-level scripts in ~/.profile or else create your own system-wide scripts in the /etc/profile.d directory), so let's see which package owns it:

rpm -qf /etc/profile
aaa_base-11-6.3

The result is that aaa_base owns this file. If I were to get a new patch to aaa_base tomorrow my custom changes would be lost when the files were updated. Let's run the Verify option against aaa_base:

rpm -V aaa_base
.......T  c /etc/csh.login
S.5....T  c /etc/inittab
S.5....T  c /etc/mailcap
S.5....T  c /etc/mime.types
S.5....T  c /etc/profile

So here we can see that the Size, the md5sum, and the Timestamp are changed on several files, and only the Timestamp is changed on /etc/csh.login. How were they changed? Who knows, but I know where they originated and now I know that something in them may be wrong. If I were troubleshooting an issue and found that some important files, perhaps libraries or executables (binaries), were changed then I would be very suspicious. Configuration files change all the time, but binary files typically should not unless the package itself changes.
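
For reference, each position in that flag column reports one attribute; here is a quick legend (based on the rpm version shipping with SLE at the time, so newer rpm releases may add columns):

rpm -V aaa_base      # sample verify run; read each flag column as follows:
                     #   S  file Size differs
                     #   M  Mode differs (permissions and file type)
                     #   5  MD5 checksum differs
                     #   D  Device major/minor number mismatch
                     #   L  symbolic Link path differs
                     #   U  User ownership differs
                     #   G  Group ownership differs
                     #   T  modification Time differs
                     # the 'c' before the file name marks it as a configuration file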

Since permissions were how I led into all of this let's pretend that those are wrong. I'll demo that by hacking my permissions badly on purpose:

chmod 777 /etc/profile

rpm -V aaa_base
.......T  c /etc/csh.login
S.5....T  c /etc/inittab
S.5....T  c /etc/mailcap
S.5....T  c /etc/mime.types
SM5....T  c /etc/profile

Notice that now my Mode (permissions) is flagged as wrong. Currently the permissions are set so that any average user can come in and add anything they want to my default profile, which could include a bad command like 'rm' with recursive and force flags (nuking my hard drive quickly the next time I log in as 'root' for some bad reason). I need to fix this, but I just don't know what the permissions used to be (I really don't... I should have backed this up.... if my demo fails then this article is going to be a really costly one for me). The options to set these back to what the RPM database knows are --setperms and --setugids. In my example I have the wrong permissions on this one file, so I'll have RPM set the permissions for all of the files in that package back for me:

rpm --setperms aaa_base
chmod: cannot access `/var/adm/fillup-templates/gshadow.aaa_base': No such file or directory

Now this is a good example for two reasons. First, my default bash prompt tells me the return code for my previous command and it is '0', so all is well; verify your own with `echo $?` before you do anything else and you should get a '0' back if everything went properly. Second, the warning above is interesting. What does that message about gshadow.aaa_base mean? Apparently that file (/var/adm/fillup-templates/gshadow.aaa_base) should exist per this package, but it couldn't be found. That's not really related to my initial problem, though, so let's run the Verify again:

root@mybox:~# rpm -V aaa_base
.......T  c /etc/csh.login
S.5....T  c /etc/inittab
S.5....T  c /etc/mailcap
S.5....T  c /etc/mime.types
S.5....T  c /etc/profile

Look at that... the 'M' (mode) is now gone because it matches what the RPM database has recorded for it. So doing this for any packages that I know I have broken lets me put permissions back to normal. We saw that there is a --setugids option that fixes ownership so let's break that next:

chown ab:wheel /etc/profile

root@mybox:~# rpm -V aaa_base
.......T  c /etc/csh.login
S.5....T  c /etc/inittab
S.5....T  c /etc/mailcap
S.5....T  c /etc/mime.types
S.5..UGT  c /etc/profile

So now we see that the User and Group ownership options are wrong for my /etc/profile file. It's owned by my user ('ab') and the 'wheel' group. According to my assumption I should be able to run one command to fix this nicely for me:

rpm --setugids aaa_base
chown: cannot access `/var/adm/fillup-templates/gshadow.aaa_base': No such file or directory
chgrp: cannot access `/var/adm/fillup-templates/gshadow.aaa_base': No such file or directory

root@mybox:~# rpm -V aaa_base
.......T  c /etc/csh.login
S.5....T  c /etc/inittab
S.5....T  c /etc/mailcap
S.5....T  c /etc/mime.types
S.5....T  c /etc/profile

root@mybox:~# ll /etc/profile
-rw-r--r-- 1 root root 9663 Mar 25 21:28 /etc/profile

Voila, the ownerships are back to 'root' (user) and 'root' (group) and the Verify comes back clean(er) again.
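
Pulling the whole sequence together, here is a sketch of the workflow just demonstrated (nothing new, just the same commands chained for a file you know is mangled):

PKG=$(rpm -qf /etc/profile)   # find the package that owns the broken file (aaa_base in this case)
rpm -V "$PKG"                 # see exactly which attributes differ from the RPM database
rpm --setperms "$PKG"         # reset permissions on every file in that package
rpm --setugids "$PKG"         # reset user/group ownership on every file in that package
rpm -V "$PKG"                 # verify again; the M, U and G flags should now be gone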

So what does this all really mean? It's possible, because Linux Just Does That, to fix permissions, ownerships, and other things recorded by the RPM database (and stored there from the package that was used for the installation) out of the box. The system keeps track of these for you (among who knows how many other things) so it can verify the integrity of applications that came in this way. Now there are a few caveats to note. I believe (though have not confirmed) that the builder of the RPM package must specify which of the attributes (Size, Mode, md5, User-owner, Group-owner, Timestamp, etc.) are saved by the package when the package is actually built. This is a developer concern that you as an end user generally don't need to worry about, since I've seen very few packages NOT store all of these attributes for all of their files, but keep it in mind. If I were to create a package and didn't save these data as I built the package then the Verify option of RPM wouldn't be as useful to you. Also, as we mentioned, configuration files change size/contents all the time since that is their purpose.

Also it is possible (and it regularly happens) that files are not owned by a specific RPM even though that RPM may have created them. This usually applies to configuration files rather than more-useful (in this case) files like libraries and other binaries or application files. As an end user I also create documents all the time that obviously don't come from any given RPM. /etc/passwd is a critical file that fits this profile on my system; it is not owned by any package but it had better be there and had better have the right permissions. As a configuration file it changes from time to time, so keep this in mind.

Finally even applications can be shipped without an RPM package. For example I use Thunderbird regularly and I just extract it to an 'apps' directory under my user's home directory so it is only accessible by my user. Since it didn't come with an RPM there is no way to do the tests above (any of them) for this package. That's my own prerogative and it's something to be aware of since many great applications may not come in RPM format. Anything you get via the SUSE repositories (or similar for other RPM-based distributions) should be fine since they are based around the concepts of packages.

That leads to another topic: what about other distributions and operating systems? RPM has had this functionality for years but there are others... Debian and Ubuntu use .deb packages. Gentoo uses a tool called 'emerge' to build its software. Solaris uses PKG commands (pkginfo, for example) and its own type of packages. There are a lot of package options out there and this is all fairly common stuff for package management, but I do not know the details for other systems. Fedora (another RPM-based distribution) obviously has it, and a site I found while researching this led me to the pkgchk command on Solaris: http://www.cyberciti.biz/tips/reset-rhel-centos-fedora-package-file-permission.html. A bit more use of Google revealed 'debsums', which appears to be available for Debian-based distributions (Debian, Ubuntu, Backtrack 4, etc.).
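
For the Debian side, a rough equivalent (hedged, since I have not tested it myself; check `man debsums` for the exact behavior on your release) looks something like this:

# On a Debian/Ubuntu box with the debsums package installed:
debsums coreutils        # prints each file from the package with OK or FAILED next to it
debsums -s coreutils     # silent mode: only report files whose checksums do not match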

So what can we learn from all of this? Well, a few things come to mind. First, this is about troubleshooting and fixing issues that you know are broken. I wouldn't typically recommend running something like the following to magically reset permissions on all files on your system that come from packages. Will it hurt things? Maybe not, but it's not a good idea (just as running anything as 'root' without being very careful and understanding the consequences is not a good idea):

rpm -qa | xargs rpm --setugids --setperms   #No seriously, don't run this unless you're ready to rebuild your box.

With that said, we can use these skills to fix things. When you try to SSH to a machine you unknowingly mangled and find you cannot log in because the SSH server enforces some decent security, you can fix the files using the rpm command. You can also reinstall the software, of course, but sometimes that is not the best option; a chisel is always preferable to the jackhammer or sledgehammer nearby, as long as the chisel does the job (feel free to ask any archaeologist).
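
As a concrete sketch of that SSH scenario (the package name is an assumption; on SLE the sshd binary and its config normally come from the 'openssh' package, but verify with rpm -qf first):

rpm -qf /usr/sbin/sshd /etc/ssh/sshd_config | sort -u    # confirm which package(s) own the SSH pieces
rpm -V openssh                                           # see what was mangled
rpm --setperms --setugids openssh                        # put permissions and ownership back from the RPM database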

While the information here is perhaps a bit complex it's meant to illustrate a point more than to help just anybody start messing with permissions. Troubleshooting is a skill. Tracing a symptom to a cause is a skill. Hopefully with a bit of the information here and some time and effort a system that you otherwise think is lost can be recovered and restored to full health. Reinstalling software shouldn't be the norm... it should be the exception.

Another point I find myself making over and over is to check other references. I mentioned the 'man' pages above; in case you are not aware there is a command in Linux/Unix called 'man' that accesses a built-in manual of data (there is also 'info' that gives a different manual). If I wanted to know more about the 'rpm' command I would type the following on any machine with the 'rpm' command installed and could likely get more than my heart's content about the command, its options, its development history, and whatever else the developers threw in there:

man rpm
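
If you don't know the exact page name, 'man' can search its index for you, and 'info' often has additional material:

man -k rpm      # search the manual page index for anything mentioning rpm (same as 'apropos rpm')
info rpm        # read the info documentation, when the package provides one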

'man' itself is a command on most (if not all) machines. Another way to find out more about a given command is to append '--help' as its first parameter:

rpm --help

The following output results from executing that command; this parameter (or a variant of it) works with nearly every command I have used:

ab@mybox:~/Desktop> rpm --help                                
Usage: rpm [OPTION...]                                                       
  --quiet                                                                    

Query options (with -q or --query):
  -c, --configfiles                list all configuration files
  -d, --docfiles                   list all documentation files
  --dump                           dump basic file information 
  -l, --list                       list files in package       
  -P, --patches                    list patches or patched files 
  --queryformat=QUERYFORMAT        use the following query format
  -s, --state                      display the states of the listed files
  -a, --all                        query/verify all packages             
  -f, --file                       query/verify package(s) owning file   
  -g, --group                      query/verify package(s) in group      
  -p, --package                    query/verify a package file           
  -W, --ftswalk                    query/verify package(s) from TOP file tree
                                   walk                                      
  --pkgid                          query/verify package(s) with package      
                                   identifier                                
  --hdrid                          query/verify package(s) with header       
                                   identifier                                
  --fileid                         query/verify package(s) with file identifier
  --specfile                       query a spec file                           
  --triggeredby                    query the package(s) triggered by the       
                                   package                                     
  --whatrequires                   query/verify the package(s) which require a 
                                   dependency                                  
  --whatprovides                   query/verify the package(s) which provide a 
                                   dependency                                  
  --nomanifest                     do not process non-package files as         
                                   manifests                                   

Verify options (with -V or --verify):
  --nomd5                          don't verify MD5 digest of files
  --nofiles                        don't verify files in package   
  --nodeps                         don't verify package dependencies
  --noscript                       don't execute verify script(s)   
  -a, --all                        query/verify all packages        
  -f, --file                       query/verify package(s) owning file
  -g, --group                      query/verify package(s) in group   
  -p, --package                    query/verify a package file        
  -W, --ftswalk                    query/verify package(s) from TOP file tree
                                   walk                                      
  --pkgid                          query/verify package(s) with package      
                                   identifier                                
  --hdrid                          query/verify package(s) with header       
                                   identifier                                
  --fileid                         query/verify package(s) with file identifier
  --specfile                       query a spec file                           
  --triggeredby                    query the package(s) triggered by the       
                                   package                                     
  --whatrequires                   query/verify the package(s) which require a 
                                   dependency                                  
  --whatprovides                   query/verify the package(s) which provide a 
                                   dependency                                  
  --nomanifest                     do not process non-package files as         
                                   manifests                                   

File tree walk options (with --ftswalk):
  --comfollow                      FTS_COMFOLLOW: follow command line symlinks
  --logical                        FTS_LOGICAL: logical walk                  
  --nochdir                        FTS_NOCHDIR: don't change directories      
  --nostat                         FTS_NOSTAT: don't get stat info            
  --physical                       FTS_PHYSICAL: physical walk                
  --seedot                         FTS_SEEDOT: return dot and dot-dot         
  --xdev                           FTS_XDEV: don't cross devices              
  --whiteout                       FTS_WHITEOUT: return whiteout information  

Signature options:
  --addsign                        sign package(s) (identical to --resign)
  -K, --checksig                   verify package signature(s)            
  --delsign                        delete package signatures              
  --import                         import an armored public key           
  --resign                         sign package(s) (identical to --addsign)
  --nodigest                       don't verify package digest(s)          
  --nosignature                    don't verify package signature(s)       

Database options:
  --initdb                         initialize database
  --rebuilddb                      rebuild database inverted lists from
                                   installed package headers           

Install/Upgrade/Erase options:
  --aid                            add suggested packages to transaction
  --allfiles                       install all files, even configurations
                                   which might otherwise be skipped      
  --allmatches                     remove all packages which match <package>
                                   (normally an error is generated if       
                                   <package> specified multiple packages)   
  --badreloc                       relocate files in non-relocatable package
  -e, --erase=<package>+           erase (uninstall) package                
  --excludedocs                    do not install documentation             
  --excludepath=<path>             skip files with leading component <path> 
  --fileconflicts                  detect file conflicts between packages   
  --force                          short hand for --replacepkgs --replacefiles
  -F, --freshen=<packagefile>+     upgrade package(s) if already installed    
  -h, --hash                       print hash marks as package installs (good 
                                   with -v)                                   
  --ignorearch                     don't verify package architecture          
  --ignoreos                       don't verify package operating system      
  --ignoresize                     don't check disk space before installing   
  -i, --install                    install package(s)                         
  --justdb                         update the database, but do not modify the 
                                   filesystem                                 
  --nodeps                         do not verify package dependencies         
  --nomd5                          don't verify MD5 digest of files           
  --nocontexts                     don't install file security contexts       
  --noorder                        do not reorder package installation to     
                                   satisfy dependencies                       
  --nosuggest                      do not suggest missing dependency          
                                   resolution(s)                              
  --noscripts                      do not execute package scriptlet(s)        
  --notriggers                     do not execute any scriptlet(s) triggered  
                                   by this package                            
  --oldpackage                     upgrade to an old version of the package   
                                   (--force on upgrades does this             
                                   automatically)                             
  --percent                        print percentages as package installs      
  --prefix=<dir>                   relocate the package to <dir>, if          
                                   relocatable                                
  --relocate=<old>=<new>           relocate files from path <old> to <new>    
  --repackage                      save erased package files by repackaging   
  --replacefiles                   ignore file conflicts between packages     
  --replacepkgs                    reinstall if the package is already present
  --test                           don't install, but tell if it would work or
                                   not                                        
  -U, --upgrade=<packagefile>+     upgrade package(s)                         

Common options for all rpm modes and executables:
  -D, --define='MACRO EXPR'        define MACRO with value EXPR
  -E, --eval='EXPR'                print macro expansion of EXPR
  --macros=<FILE:...>              read <FILE:...> instead of default file(s)
  --nodigest                       don't verify package digest(s)            
  --nosignature                    don't verify package signature(s)         
  --rcfile=<FILE:...>              read <FILE:...> instead of default file(s)
  -r, --root=ROOT                  use ROOT as top level directory (default: 
                                   "/")                                      
  --querytags                      display known query tags                  
  --showrc                         display final rpmrc and macro configuration
  --quiet                          provide less detailed output
  -v, --verbose                    provide more detailed output
  --version                        print the version of rpm being used

Options implemented via popt alias/exec:
  --scripts                        list install/erase scriptlets from
                                   package(s)
  --setperms                       set permissions of files in a package
  --setugids                       set user/group ownership of files in a
                                   package
  --conflicts                      list capabilities this package conflicts
                                   with
  --obsoletes                      list other packages removed by installing
                                   this package
  --provides                       list capabilities that this package provides
  --requires                       list capabilities required by package(s)
  --suggests                       list capabilities this package suggests
  --recommends                     list capabilities this package recommends
  --enhances                       list capabilities this package enhances
  --supplements                    list capabilities this package supplements
  --basedon                        list packages this patch-rpm is based on
  --info                           list descriptive information from package(s)
  --changelog                      list change logs for this package
  --xml                            list metadata in xml
  --triggers                       list trigger scriptlets from package(s)
  --last                           list package(s) by install time, most
                                   recent first
  --dupes                          list duplicated packages
  --filesbypkg                     list all files from each package
  --fileclass                      list file names with classes
  --filecolor                      list file names with colors
  --filecontext                    list file names with security context from
                                   header
  --fscontext                      list file names with security context from
                                   file system
  --recontext                      list file names with security context from
                                   policy RE
  --fileprovide                    list file names with provides
  --filerequire                    list file names with requires
  --buildpolicy=<policy>           set buildroot <policy> (e.g. compress man
                                   pages)
  --with=<option>                  enable configure <option> for build
  --without=<option>               disable configure <option> for build

Help options:
  -?, --help                       Show this help message
  --usage                          Display brief usage message
  

The resources for most commands are plentiful in the computing world. Forums, IRC, product-specific e-mail lists, etc. all focus on helping with learning, and most of it is free.

Good luck.

Move IT: Migrating to Novell Open Enterprise 2 (Linux)


The following information will help you develop a comprehensive understanding of Open Enterprise Server 2 and equip you to serve the needs of our customers. The materials listed here are also available at the upgrade website beginning April 28th; subscribe to this page for notifications as we post new materials throughout the year.

Please use the comments option to provide feedback and let us know what you may be looking for that you can't find.

Webcasts

Novell Open Enterprise Server 2 Upgrade Assurance Program (April 2009)
In this recorded webcast, Product Marketing Manager Sophia Germanides shares the details of this Upgrade program to help you engage with our customers in their efforts to upgrade their NetWare services platforms to Novell Open Enterprise Server 2.
View the Webcast

CEO Message to NetWare Customers

Whiteboard Academies:

coming soon

Documents:

OES2 Documentation Website
Competitive Brief
Upgrade vs. Migrate White Paper
Gartner Upgrade vs. Migrate Research Note
Move IT Sales Deck
NetWare Lifecycle FAQ
Move IT Teleweb "Cheat Sheet"

Online Training:

May
May 19th 2009 Register Now (EMEA)
May 20th 2009 Register Now (EMEA)
May 21st 2009 Register Now (Americas)
May 22nd 2009 Register Now (Americas)

June
June 9th 2009 Register Now (EMEA)
June 10th 2009 Register Now (EMEA)
June 11th 2009 Register Now (Americas)
June 12th 2009 Register Now (Americas)

Other Items of Interest

Related Articles
Same Server Data Migration - NW to OES2SP1
Materials to Help You Upgrade NetWare Customers to OES2 on Linux
Boost Sales with New Upgrade Promotions!

A SUSE Dragon Wallpaper

Introduction to Novell Support Advisor 1.0.0


Lately if you have been in the Novell Communities pages or reading the Novell Press Release information it is likely you have come across a blurb or two about a new application or tool known as Novell Support Advisor (NSA).

While the press and articles released by the Marketing folks are likely full of pretty screen shots and terms that let you know it will increase your productivity by millions of percent (give or take), I imagine a few in the audience really want to know what it is and how it works, and some of the details I've come across on my own are not yet obvious to the casual observer.

I also have an agenda; I want this to become something that those interested can dive into and start using actively, offering suggestions for improvements, getting it to work just right for them, and basically do whatever is best for them. The tool is cool, that much is certain, so let me show you why it's cool, how it works, and what you can do to make it yours.

So what is NSA? From the NSA website:

What is Novell Support Advisor - Novell Support Advisor is an automated self-help tool used to support and diagnose SUSE® Linux Enterprise Server, Open Enterprise Server (Linux) and associated products. This tool provides customers with a streamlined way to perform both pro-active and reactive system diagnostic tasks typically provided by Novell Technical Services, but in a local, secure and automated fashion. Novell Support Advisor will be incrementally and automatically updated every two to three months. Customer feedback is essential and welcomed as we consider the next set of features for implementation. You can view the planned set of features and projected timelines below.

Let's start with the high level stuff and get a bit more serious towards the end. NSA is a graphical tool that checks your environment(s) for known issues. This helps to speed up resolution times of issues.

With its ability to auto-update and get current data via patterns it is a way to be informed of relevant, applicable configuration or security issues quickly and specifically for your systems. It's a way to help cut your costs and get you focusing on things that cannot be done by a brainless computer.

NSA, if implemented properly and things work the way everybody wants them to, will find known issues in your environment based on data that can be collected via supportconfig but that otherwise you (as an administrator) do not fully understand or know how to find.

The collective knowledge of everybody who writes a pattern is transferred to the person running that pattern against the data gathered from the server. For example, if condition A and condition B can cause badThing C, and those conditions can be detected via a supportconfig, then a pattern can be written to check for the conditions and throw a warning before badThing C even takes place. Not only can solutions be found via NSA when it is used reactively, but problems can be prevented when NSA is used proactively.

To go into a bit of the background for the NSA tool we need to know a bit more about supportconfig. For those who are NetWare experts and know about the 'config.nlm' module supportconfig is designed to be the same sort of thing, but for Linux. When it runs it goes to various parts of the system by running commands, capturing file contents, and basically poking all around, to get data that will be useful for troubleshooting. These data are put into text files that can be analyzed by administrators, support engineers, developers, or any others interested in the data.

For more information or the ability to download the tool separately check out the supportconfig webpage: http://www.novell.com/communities/node/2332/supportconfig-linux

So, getting into some of the nuts and bolts of the tool, we find that supportconfig must somehow be placed on the server (or workstation) from which the data are being collected. When driven from within the NSA tool, this is currently done via SSH. The NSA client is supported on Linux (or even Windows) client machines and from there can reach out to SLES servers (currently the only supported target) to upload/download data or run commands.

When NSA is launched the user (a system administrator most-likely) is able to enter an IP address or DNS information for a remote system. Armed with these data (the connection information and credentials for authentication) the NSA client then reaches out over the network and connects to the server. The client installs the RPM which contains supportconfig (the RPM is currently named supportutils) and eventually runs supportconfig with some predetermined parameters.

The net result of running supportconfig via the NSA client is a gzipped TAR file which is then copied back to the NSA client where the rest of the analysis is completed (by default supportconfig creates bzipped TAR files, so this is a slight difference with the 1.0.0 release of the NSA tool).

The NSA tool keeps an archive of all of the "configs" it retrieves and holds them forever, broken down by date, so you can run patterns against a config from yesterday and another from today and see what may be different (perhaps after you applied patches or implemented fixes). After running the tool against a server and having the "config" locally in the client, the connection to the server is closed and the rest of the magic is executed via patterns.

Going into this process a little, administrators who have been running *nix for a while may be wondering how the RPM is installed, how supportconfig is executed, and how the various other steps happen. Which credentials are used for this? What if I don't want to use 'root' specifically because that account is disabled for direct SSH access (as it should be on all systems)?

Well those are all fair questions. The default credentials entered are for 'root' and the default port for access is 22, though both are configurable. If a user besides the 'root' user is to be used then that user must have a fair bit of power granted via 'sudo'. From my own system the 'nsauser' is granted rights via the following line in /etc/sudoers:

Syntax: <username> <machineHostname> = NOPASSWD: /bin/uname, /bin/mkdir, /bin/rpm, /usr/bin/env, /sbin/supportconfig, /bin/mv, /bin/ls, /bin/chmod, /bin/rm
Example: nsauser systemNameHere = NOPASSWD: /bin/uname, /bin/mkdir, /bin/rpm, /usr/bin/env, /sbin/supportconfig, /bin/mv, /bin/ls, /bin/chmod, /bin/rm

The documentation for NSA actually instructs the user to set things up a bit more generically, though that is simply because it's the minimal way to get things going. The settings above were found by checking the /var/log/messages file after running with the line below to see which 'sudo' commands were executed by the user.

nsauser systemNameHere=(ALL) NOPASSWD: ALL

To see the interaction via /var/log/messages the following may be useful; keep in mind this is for the 1.0.0 version of the NSA client:

Apr  2 13:01:30 systemNameHere sshd[11949]: Accepted keyboard-interactive/pam for nsauser from 123.45.67.89 port 55898 ssh2
Apr  2 13:01:30 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/uname -srpon
Apr  2 13:01:30 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/mkdir /opt/nsa/
Apr  2 13:01:30 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/mkdir /var/log/nsa/
Apr  2 13:01:31 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/rpm -e ntsutils
Apr  2 13:01:31 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/rpm -Uvh /home/nsauser//supportutils-1.01-40.1.noarch.rpm
Apr  2 13:01:31 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/mkdir /opt/nsa/
Apr  2 13:01:32 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/mv /home/nsauser/nsa.conf /opt/nsa/
Apr  2 13:01:32 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/usr/bin/env SC_CONF=/opt/nsa/nsa.conf /sbin/supportconfig -bg
Apr  2 13:03:09 systemNameHere zmd: NetworkManagerModule (WARN): Failed to connect to NetworkManager
Apr  2 13:03:20 systemNameHere sshd[11946]: fatal: Timeout before authentication for 164.99.194.2
Apr  2 13:03:31 systemNameHere zmd: Daemon (WARN): Not starting remote web server
Apr  2 13:04:14 systemNameHere kernel: device-mapper: table: 253:0: mirror: Device lookup failure
Apr  2 13:04:14 systemNameHere kernel: device-mapper: ioctl: error adding target to table
Apr  2 13:04:40 systemNameHere kernel: pnp: Device 00:09 disabled.
Apr  2 13:04:40 systemNameHere kernel: parport 0x378 (WARNING): CTR: wrote 0x0c, read 0xff
Apr  2 13:04:40 systemNameHere kernel: parport 0x378 (WARNING): DATA: wrote 0xaa, read 0xff
Apr  2 13:04:40 systemNameHere kernel: parport 0x378: You gave this address, but there is probably no parallel port there!
Apr  2 13:04:40 systemNameHere kernel: parport0: PC-style at 0x378 [PCSPP,TRISTATE]
Apr  2 13:04:40 systemNameHere kernel: pnp: Device 00:09 activated.
Apr  2 13:04:40 systemNameHere kernel: parport: PnPBIOS parport detected.
Apr  2 13:04:40 systemNameHere kernel: parport0: PC-style at 0x378 (0x778), irq 7, using FIFO [PCSPP,TRISTATE,COMPAT,EPP,ECP]
Apr  2 13:04:40 systemNameHere kernel: lp0: using parport0 (interrupt-driven).
Apr  2 13:04:40 systemNameHere kernel: ppa: Version 2.07 (for Linux 2.4.x)
Apr  2 13:04:52 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/ls /var/log/nsa/
Apr  2 13:04:52 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/chmod ugo+r /var/log/nsa/nts_systemNameHere_090402_1301.tgz
Apr  2 13:04:53 systemNameHere sudo:  nsauser : TTY=unknown ; PWD=/home/nsauser ; USER=root ; COMMAND=/bin/rm -r /var/log/nsa/
Apr  2 13:04:53 systemNameHere sshd[11952]: Received disconnect from 123.45.67.89: 11: Closed due to user request.

As you can see, the client connects and then runs 'sudo' to do a variety of operations: making a directory (specifiable in the NSA's settings), removing the previous supportconfig package, upgrading the current supportconfig package, moving things around, and finally running supportconfig. At the end you can see it cleans up the data on the system. The log entries in between the sudo sections are probably caused by running supportconfig (perhaps via the hwinfo command).
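
If you want to double-check on the server side exactly what the NSA user has been granted, a quick sanity check (run as 'root'; nothing NSA-specific about it) might look like this:

grep nsauser /etc/sudoers    # show the line(s) granting nsauser its commands
sudo -l -U nsauser           # newer sudo versions can also list another user's privileges directly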

While the log shown above illustrates a connection to the server via username and password, it is also possible to use the NSA client to bind via SSH using public/private keys, which SSH supports natively. This is set up on the server side like any other key-based SSH login, via the ~/.ssh/authorized_keys file for the identity being logged in. In the NSA client, before connecting, simply check the 'Use SSH Keys' checkbox and optionally (depending on whether or not the private key is protected by a passphrase) enter the passphrase for the private key file you will specify. Browse to the key file and then you should be able to connect without using a password at all.
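
Setting that up is ordinary SSH key work on the client machine; a minimal sketch (key file name and hostname are placeholders to adjust for your environment):

ssh-keygen -t rsa -f ~/.ssh/nsa_key                          # generate a key pair; add a passphrase if desired
ssh-copy-id -i ~/.ssh/nsa_key.pub nsauser@systemNameHere     # append the public key to ~nsauser/.ssh/authorized_keys
ssh -i ~/.ssh/nsa_key nsauser@systemNameHere 'uname -a'      # confirm the key works before pointing NSA at it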

This addresses concerns regarding running with the 'root' user directly, but it may also be a concern that these privileges are ever given to a user who can access the system in this way. Some of the commands required are indeed very powerful and without further restriction could let a malicious user damage the system. It may be that you would like to give one person with NSA the ability to run patterns but keep them from having any way of accessing the system directly in any fashion.

For these cases, recall that the NSA client stores the configs in an archive from which it can read them to run patterns over and over again. Out of the box this archive is organized first by hostname and then by time, but the client simply displays whatever directory structure the archive has, so the archive can be created by you and then pointed to later for access via the client.

In this way it is possible to have one user (or a cron job) with the necessary rights run supportconfig manually or automatically and then store the supportconfig results in a directory accessible by other users. An administrator can set up supportconfig to automatically upload its results to a file server from which a helpdesk user could pull the read-only archives and run patterns against them as part of a daily routine.
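
As a rough sketch of that idea (the schedule, the shared directory, and the assumption that supportconfig drops an nts_*.tbz archive into /var/log are all mine; check `supportconfig -h` on your version before relying on this):

# /etc/cron.d/nsa-collect -- run supportconfig nightly and copy the newest archive to a read-only share
0 2 * * * root /sbin/supportconfig >/dev/null 2>&1 && cp -p $(ls -t /var/log/nts_*.tb? | head -1) /srv/nsa-archive/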

Security of the supportconfig output is important because of the configuration information contained therein but not as important as having full 'root' privileges secured on a server. supportconfig already supports the ability to upload its results to a repository of some kind securely. See its documentation for more information.

So, going back to the NSA client a bit more: I mentioned it is supported on Linux and even on Windows. Written in Flex and AIR (Adobe) as well as in Java (Sun, er, Oracle), cross-platform support comes naturally. On the server side supportconfig must be able to install and run, which requires the use of RPMs. I believe supportconfig works on both SLES and Red Hat, though NSA is only currently supported with SLES 9, 10, and 11. It can also run on SLED, though currently it is targeted at servers more than desktops.

The patterns themselves are written in Perl and can be viewed at any time in the NSA client's 'patterns' directory. Writing new patterns of your own for your environment can be accomplished with a little Perl and supportconfig knowledge; stay tuned for an article on the same. Once a custom pattern is developed place it in the ~/.supportAdvisor/custompatterns directory.

This option is not in the 1.0.0 release but is in subsequent releases. Custom patterns in 1.0.0 require a little more tinkering that I'll cover later if there is enough interest, but in the meantime using a newer client will probably be much simpler. Analyzing the shipping Novell-created patterns is possible by simply viewing the script files in ~/.supportAdvisor/patterns with your favorite text editor. Patterns are divided into directories based on what they do; for example, there is currently a directory each for Clustering, Connectivity, eDirectory, OES, Print, SLE, Update, Security, and Virtualization.

When either an old or a new config is loaded, the list of patterns NSA knows about is shown. The patterns that come from Novell are sent from a Novell update server if automatic updates are enabled or when updates are requested from the NSA client. The update feature will update not only the patterns but can also update the full client or the Novell Support Advisor piece (or all of the above).

If an upgrade to the full client is needed the NSA client prompts for privileged user credentials for the client machine and then starts the installer automatically. The patterns that come from Novell are developed primarily by Novell employees (exclusively by them, so far) and are tested against the configs that have been retrieved previously to ensure valid results or at least a lack of potential damage to the system on which they run.

Pattern ideas as well as full patterns can be proposed/submitted via Novell's BugZilla site (http://bugzilla.novell.com/). Patterns must be able to determine what is needed from just the output of supportconfig so if that is not possible supportconfig can also be enhanced to retrieve data (if possible) to better meet this goal.

The supportconfig tool via NSA supports plugins which can be used to gather data outside what supportconfig is designed to retrieve out of the box so that may help significantly with implementations specific to your environment.

As mentioned earlier the client and its patterns are made to run on a client machine and not on servers directly (though that is possible as well). While this limits the options of the NSA tool for data retrieval it also means that the client cannot hurt the server easily and that the patterns in the client never have access to the remote system.

Poorly written patterns could potentially have adverse effects on the client system, though there is no reason to run the NSA client as a privileged user, so that should be a minimal risk for the most part. Part of the reason for testing patterns against all of the previously-collected configs is to ensure the output is appropriate for the data sent into the pattern. Patterns typically should not error out (have a syntax/runtime error) and should account for possible data in any form (including no data).

The NSA tool on the client does require a privileged user to do the client install due to dependencies on Adobe Air and Flex. The installation is located in the /opt section of the filesystem. Once running as an unprivileged user the NSA client keeps user files in the user's home directory including patterns, archive, and runtime data. The following is a directory listing from the 1.0.2 beta client on my SLED 11 laptop:

aburgemeister@ablaptop0:~> ll ./.supportAdvisor/
total 16
-rw-r--r--  1 ab users  585 2009-04-27 18:33 advisorConfig.xml
drwxr-xr-x 30 ab users 4096 2009-04-24 12:44 archive
drwxr-xr-x  2 ab users    6 2009-04-02 10:05 custompatterns
drwxr-xr-x 12 ab users 4096 2009-04-15 07:54 patterns
drwxr-xr-x  2 ab users   27 2009-03-17 16:50 servers
drwxr-xr-x  3 ab users   34 2009-04-27 18:33 work

The advisorConfig.xml file stores configuration information and looks like the following on my system:

<?xml version="1.0" encoding="UTF-8"?><AdvisorConfig>
  <buildid>17332441844015282</buildid>
  <remote_config_path>/opt/nsa/</remote_config_path>
  <remote_rpm_path>/root/</remote_rpm_path>
  <archive>/home/ab/.supportAdvisor/archive/</archive>
  <pattern_url>https://nu.novell.com/NSA/</pattern_url>
  <perl_path>/usr/bin/perl</perl_path>
  <update_enabled>yes</update_enabled>
  <custom_pattern_path>/home/aburgemeister/.supportAdvisor/custompatterns/</custom_pattern_path>
  <custom_patterns>false</custom_patterns>
  <survey_enabled>yes</survey_enabled>
</AdvisorConfig>

The 'work' directory is for the current run of the NSA client against a config and can be opened afterwards to view the supportconfig output in its extracted form for manual review of the files themselves. The 'servers' directory contains a serverList.xml file which holds the list of servers that have been added to the NSA client for config retrieval and analysis. Passwords, passphrases, and keys are currently never stored by the NSA client.

My 'work' directory currently looks like the following (recursive listing):

ab@mybox0:~/.supportAdvisor> ls -alR ./work/
work/:                                                  
total 4                                                 
drwxr-xr-x 3 ab@mybox users   34 2009-04-27 18:33 .
drwxr-xr-x 8 ab@mybox users  124 2009-04-27 18:33 ..
drwxr-xr-x 2 ab@mybox users 4096 2009-04-27 18:33 nts_lab35_090317_1940

work/nts_lab35_090317_1940:
total 12848                
drwxr-xr-x 2 ab@mybox users    4096 2009-04-27 18:33 .
drwxr-xr-x 3 ab@mybox users      34 2009-04-27 18:33 ..
-rw-r--r-- 1 ab@mybox users    2076 2009-04-27 18:33 basic-environment.txt
-rw-r--r-- 1 ab@mybox users   15429 2009-04-27 18:33 basic-health-check.txt
-rw-r--r-- 1 ab@mybox users    1335 2009-04-27 18:33 basic-health-report.txt
-rw-r--r-- 1 ab@mybox users  152434 2009-04-27 18:33 boot.txt               
-rw-r--r-- 1 ab@mybox users    7239 2009-04-27 18:33 chkconfig.txt          
-rw-r--r-- 1 ab@mybox users     172 2009-04-27 18:33 cimom.txt              
-rw-r--r-- 1 ab@mybox users     175 2009-04-27 18:33 crash.txt              
-rw-r--r-- 1 ab@mybox users    9125 2009-04-27 18:33 cron.txt               
-rw-r--r-- 1 ab@mybox users      81 2009-04-27 18:33 dns.txt                
-rw-r--r-- 1 ab@mybox users   40870 2009-04-27 18:33 env.txt                
-rw-r--r-- 1 ab@mybox users  148766 2009-04-27 18:33 etc.txt                
-rw-r--r-- 1 ab@mybox users   50254 2009-04-27 18:33 evms.txt
-rw-r--r-- 1 ab@mybox users    1078 2009-04-27 18:33 fs-autofs.txt
-rw-r--r-- 1 ab@mybox users    8078 2009-04-27 18:33 fs-diskio.txt
-rw-r--r-- 1 ab@mybox users     425 2009-04-27 18:33 fs-iscsi.txt
-rw-r--r-- 1 ab@mybox users     568 2009-04-27 18:33 fs-softraid.txt
-rw-r--r-- 1 ab@mybox users  257014 2009-04-27 18:33 hardware.txt
-rw-r--r-- 1 ab@mybox users      86 2009-04-27 18:33 ha.txt
-rw-r--r-- 1 ab@mybox users      81 2009-04-27 18:33 ib.txt
-rw-r--r-- 1 ab@mybox users    1542 2009-04-27 18:33 ldap.txt
-rw-r--r-- 1 ab@mybox users   54028 2009-04-27 18:33 lvm.txt
-rw-r--r-- 1 ab@mybox users   17301 2009-04-27 18:33 memory.txt
-rw-r--r-- 1 ab@mybox users  381206 2009-04-27 18:33 messages.txt
-rw-r--r-- 1 ab@mybox users  415323 2009-04-27 18:33 modules.txt
-rw-r--r-- 1 ab@mybox users   11607 2009-04-27 18:33 mpio.txt
-rw-r--r-- 1 ab@mybox users   33580 2009-04-27 18:33 network.txt
-rw-r--r-- 1 ab@mybox users    1173 2009-04-27 18:33 nfs.txt
-rw-r--r-- 1 ab@mybox users     175 2009-04-27 18:33 novell-edir.txt
-rw-r--r-- 1 ab@mybox users      87 2009-04-27 18:33 novell-lum.txt
-rw-r--r-- 1 ab@mybox users      91 2009-04-27 18:33 novell-ncp.txt
-rw-r--r-- 1 ab@mybox users     100 2009-04-27 18:33 novell-ncs.txt
-rw-r--r-- 1 ab@mybox users      87 2009-04-27 18:33 novell-nss.txt
-rw-r--r-- 1 ab@mybox users    1113 2009-04-27 18:33 ntp.txt
-rw-r--r-- 1 ab@mybox users      88 2009-04-27 18:33 ocfs2.txt
-rw-r--r-- 1 ab@mybox users  398558 2009-04-27 18:33 open-files.txt
-rw-r--r-- 1 ab@mybox users   12736 2009-04-27 18:33 pam.txt
-rw-r--r-- 1 ab@mybox users   11482 2009-04-27 18:33 print.txt
-rw-r--r-- 1 ab@mybox users   32200 2009-04-27 18:33 proc.txt
-rw-r--r-- 1 ab@mybox users  118521 2009-04-27 18:33 rpm.txt
-rw-r--r-- 1 ab@mybox users   12452 2009-04-27 18:33 samba.txt
-rw-r--r-- 1 ab@mybox users      84 2009-04-27 18:33 sar.txt
-rw-r--r-- 1 ab@mybox users   10678 2009-04-27 18:33 security-apparmor.txt
-rw-r--r-- 1 ab@mybox users      86 2009-04-27 18:33 slert.txt
-rw-r--r-- 1 ab@mybox users    9178 2009-04-27 18:33 slp.txt
-rw-r--r-- 1 ab@mybox users      80 2009-04-27 18:33 smt.txt
-rw-r--r-- 1 ab@mybox users    1485 2009-04-27 18:33 ssh.txt
-rw-r--r-- 1 ab@mybox users    6573 2009-04-27 18:33 supportconfig.txt
-rw-r--r-- 1 ab@mybox users  484310 2009-04-27 18:33 sysconfig.txt
-rw-r--r-- 1 ab@mybox users   20682 2009-04-27 18:33 udev.txt
-rw-r--r-- 1 ab@mybox users  132665 2009-04-27 18:33 updates-daemon.txt
-rw-r--r-- 1 ab@mybox users  181666 2009-04-27 18:33 updates.txt
-rw-r--r-- 1 ab@mybox users       0 2009-04-27 18:33 updates.txt.rug.1237340515.SEMAPHORE
-rw-r--r-- 1 ab@mybox users      16 2009-04-27 18:33 updates.txt.rug.1237340515.SEMAPHORE.out
-rw-r--r-- 1 ab@mybox users     970 2009-04-27 18:33 updates.txt.rug.1237340515.SEMAPHORE.sh
-rw-r--r-- 1 ab@mybox users      84 2009-04-27 18:33 web.txt
-rw-r--r-- 1 ab@mybox users      86 2009-04-27 18:33 xen.txt
-rw-r--r-- 1 ab@mybox users   47255 2009-04-27 18:33 x.txt
-rw-r--r-- 1 ab@mybox users 9894176 2009-04-27 18:33 y2log.txt
2009-04-27 22:52:26 Jobs:0 Err:0
ab@mybox0:~/.supportAdvisor>

You can see that the supportconfig output is extracted here under a directory named for the host plus the timestamp information used to name the supportconfig output. These files are purely text-based, so reading their contents is trivial. The 'patterns' directory listing follows:

ab@mybox@mybox:~/.supportAdvisor> ll patterns
total 124
drwxr-xr-x 2 ab@mybox users    44 2009-04-15 07:54 Clustering
drwxr-xr-x 2 ab@mybox users  4096 2009-04-15 07:54 Connectivity
drwxr-xr-x 2 ab@mybox users  4096 2009-04-15 07:54 eDirectory
-rw-r--r-- 1 ab@mybox users  1263 2009-04-15 07:54 nsa.conf
drwxr-xr-x 2 ab@mybox users    26 2009-04-15 07:54 OES
drwxr-xr-x 2 ab@mybox users    26 2009-04-15 07:54 Print
drwxr-xr-x 2 ab@mybox users    48 2009-04-15 07:54 SDP
drwxr-xr-x 2 ab@mybox users  4096 2009-04-15 07:54 Security
drwxr-xr-x 2 ab@mybox users  4096 2009-04-15 07:54 SLE
-rw-r--r-- 1 ab@mybox users 34649 2009-04-15 07:54 SPMan.xml
-rw-r--r-- 1 ab@mybox users 63252 2009-04-15 07:54 supportutils-1.01-40.1.noarch.rpm
drwxr-xr-x 2 ab@mybox users  4096 2009-04-15 07:54 Update
drwxr-xr-x 2 ab@mybox users    69 2009-04-15 07:54 Virtualization
2009-04-27 22:55:40 Jobs:0 Err:0
ab@mybox@mybox:~/.supportAdvisor>

The 'nsa.conf' file may be one of interest in this case. It specifies all of the options that may be used by NSA when running supportconfig by default. A snippet follows:

####################################
# Default Options
####################################
OPTION_APPARMOR=1
OPTION_AUTOFS=1
OPTION_BOOT=1
OPTION_CHKCONFIG=1
OPTION_CRASH=1
OPTION_CRON=1
OPTION_DISK=1
OPTION_DNS=1
OPTION_EDIR=1
OPTION_ENV=1
OPTION_ETC=1
OPTION_EVMS=1
OPTION_HA=1
OPTION_ISCSI=1
OPTION_LDAP=1
OPTION_LUM=1
OPTION_LVM=1
*snip*
ADD_OPTION_RPM_VFULL=0
ADD_OPTION_SLP_FULL=0

VAR_OPTION_BIN_TIMEOUT_SEC=300
VAR_OPTION_CONTACT_COMPANY=""
VAR_OPTION_CONTACT_EMAIL=""
VAR_OPTION_CONTACT_NAME=""
VAR_OPTION_CONTACT_PHONE=""
VAR_OPTION_LINE_COUNT=500
VAR_OPTION_LOG_DIRS="/var/log/nsa /tmp"
VAR_OPTION_MSG_MAXSIZE=500000
VAR_OPTION_PENGINE_FILES_LIMIT=250
VAR_OPTION_SAR_FILES_LIMIT=30
VAR_OPTION_SILENT=0
VAR_OPTION_UPLOAD_TARGET="ftp.novell.com:/incoming"

As you can see, this specifies which modules in supportconfig run and, optionally, where a resulting supportconfig can be sent. The current supportconfig version is also stored in the patterns directory and is updated should something change and a new version be pushed out to customers.

Some current scripts do things such as determine if time is drifting far from the NTP sources that are configured. Another reports on high memory utilization in the eDirectory process. Another checks for existing core files from eDirectory so you can be informed of previous crashes, especially crashes within the past week. Others check for known security issues based on versions of the product, installed components, and the system kernel. The options for patterns are vast and can be extended to almost any part of the system.
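
As a very rough illustration of the kind of check a pattern performs (this is not the NSA pattern API, just a manual pass over the same extracted text files shown earlier; the search string is only an example):

cd ~/.supportAdvisor/work/nts_lab35_090317_1940
grep -i 'segfault\|oops' messages.txt    # a crude stand-in for a "look for previous crashes" pattern
less basic-health-report.txt             # the summarized health report is worth reading on its own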

After a script is executed there are a few possible results. The three main results are Good, Warning, and Critical. Good is, not surprisingly, indicated by a green color, and so it is naturally passed over; a green entry still provides some information, but typically it is not the point of focus for the NSA tool. Warning shows up as a yellow entry in the table of results and is something to check into that is more important than a green entry but not yet deemed Critical by the writer of the pattern. A Critical pattern result means something is definitely wrong according to the pattern logic; this is a situation needing to be resolved, or perhaps something to be purposefully skipped in the future because it is irrelevant to the environment.

There are other result states that basically indicate a pattern could not run or errored out in a way it never should have, which we'll skip for now. The table entries, regardless of their color, currently provide a button labeled 'View Solution' that will take you to a TID or other online document with the solution for the issue.

For example, if you are having an eDirectory problem where your ndsd service is chewing up too much RAM, the linked TID is one that helps you tune your memory settings for the ndsd process. By providing a link to the solution after the analysis of a particular issue, experienced Linux administrators can work on fixing things with their background and Novell's documented expertise, and new or migrating users can at least jump in, see how troubleshooting is done, and possibly get an issue resolved without having to call Novell directly.

Another bit of integration with the rest of Novell is done via the Novell Customer Center (NCC). This portal is a central place for a customer to see their products, entitlements, others in their company tied with the support organization, Service Requests (SR), and multiple other aspects of their Novell relationship.

The shipping 1.0.0 version of NSA has its own tab dedicated to integration with the NCC so that opening an SR or looking at your systems is possible and easy. The client even tells you, via a little padlock icon at the top of the screen, whether or not you are authenticated to Novell's site. Note that the only reasons you may need Internet access while using the NSA tool are this feature, receiving updates, and otherwise interacting with third parties over the Internet. To run, the NSA tool only requires access to the supportconfig output.

In the future some major enhancements are planned for the NSA tool. For example currently all patterns chosen are executed against the config even if they do not make sense. A future plan is to check for required packages on the system before running the patterns so errors about Samba or NFS not being configured are ignored on machines where there is no need for Samba or NFS.

Another project in the works is the ability to analyze configs that come directly from NetWare and provide patterns for that platform for the same purpose as for Linux. Support for other platforms including Windows is also being investigated, though without a robust shell and command-line environment in all Windows versions (like the various shells available in Linux and Unix) the data collection needs to be done via other means.

Other enhancements may include support for automatically running patterns and returning data, or integration with other Novell products for a more complete overall solution. With the supportconfig output, the presence of applied patches can be confirmed, or system errors could be sent to a centralized helpdesk system for automatic ticket creation and resolution.

Work is also in progress for the creation of a tool that will allow those without programming experience to start with pattern development. The BugZilla system is open to the public and ideas are encouraged to be submitted to the development team at any time.

Feedback is also encouraged at runtime of NSA. As the survey setting in the XML configuration file shown above suggests, there is a button within NSA that provides a way to respond directly about the most recent use of NSA when closing the client down. After selecting a pattern's result and seeing the resulting Solution button, you are also prompted for feedback regarding the accuracy of the patterns, to increase accuracy and usability.

The NSA tool also has its own public forum, which you are encouraged to use for any questions you have; it is located at http://forums.novell.com/novell-product-support-forums/novell-support-advisor/ for review or contributions.

Hopefully this provides a decent introduction to the various components of NSA. Feel free to ask questions via the comments section of this article if there are bits I have missed.

DNSBLchecker 4.3

license: 
free

home page url: http://www.kvy.com.ua
download url: http://www.kvy.com.ua/products/

DNSBLchecker is a program for checking hosts on the Internet: enter a domain name or the IP address of a host, and it will tell you who it is! The program also includes additional tests:

  • Checking websites hosted on one IP address.
  • Ping
  • Trace route

1. Question:  What is DNSBLchecker?
    Answer: The DNSBLchecker program is a tool for testing hosts against DNS blacklists ("DNSBL") and Whois databases. For that operation the program uses DNS and Whois queries to the servers storing those databases (see the sketch after this list for what a raw DNSBL query looks like).

2. Question: What distinguishes DNSBLchecker from existing similar DNSBL programs?
    Answer:

  • To search for information the program uses Internet Whois and DNSBL databases.
  • The program works in Microsoft Windows or SUSE Linux environments.
  • You can attach any number of DNSBL servers to the program for testing.
  • For testing you can enter an IP address or domain name of the host you want
    to check. If the host's domain name resolves to several IP addresses, the
    program will check all of those addresses and write the results to the
    HTML report.
  • Since v.3.0 the program has been enhanced with additional tests (for details see the program Help).

3. Question: What requirements are for the program?
    Answer:  The program needs Microsoft .NET Framework 2.0 (or later) or Mono 2.0 (or later), which you can download for free at http://msdn.microsoft.com or http://www.go-mono.com.

4. Question:  Who can work with the program?
    Answer:  The program is recommended for GroupWise and Internet administrators.

5. Question: How can I get the program?
    Answer:  The program is free to download from our site: http://www.kvy.com.ua/products. It includes built-in Help.

Main window of the program:



LJDT: iSCSI for Shared Storage

$
0
0

So first let's cover the basics. Shared storage, as its name implies, is storage shared among multiple machines. You can probably lump Network-Attached Storage (NAS) into this, but the typical high-performance acronym is SAN, which stands for Storage Area Network. A SAN typically works at the block level (lower) while a NAS environment works at the file level (higher), which also contributes to the performance benefits of a SAN. Tied to a SAN are mystical terms like Fibre Channel, which is essentially one way to attach machines to high-speed storage.

A SAN is usually a hardware-based solution where machineA and machineB have a dedicated network to attach to sanC and share the storage on there in the form of LUNs (Logical Unit Numbers, a fairly common SCSI term). SANs, because of their performance and ability to do things quickly and well, are typically expensive. Surprisingly enough I don't have one next to my desk here at work, or even one at home (for some reason). I have decent-sized non-SCSI non-RAID hard drives in my machines but that doesn't exactly help me with shared storage, or at least that was my mistaken impression until several days ago. iSCSI is actually one implementation of SAN technology, with Fibre Channel being the other popular implementation I know about, though one that requires proprietary hardware, including cabling.

iSCSI stands for Internet SCSI (Small Computer Systems Interface) and basically means a way of doing shared storage without proprietary hardware. Why would you want to do this? Well, cost is the big reason. Many will likely tell you that iSCSI is nothing more than fecal remains from cheap non-SCSI implementers, and while it is inexpensive I do not believe that means it does not have its place. The performance of Fibre Channel is made to be top of the line and so is the price. iSCSI may not be able to perform with the best of them, but so far my experience with it has been just fine for my purposes, and many companies and organizations implement it in their largest production environments. I'll leave the performance debate for those who have more time and expertise in the area; for now let's just leave it at this: it works well enough for almost anything, and the rest is outside my budget to even consider anyway.

So what is Shared Storage used for? High Availability (HA) systems need this in many cases so that when one server explodes another server can pick up with the exact same dataset and continue processing. If you need five-nines (99.999%) uptime the leeway for failure is too small to wait for somebody to actually go and turn the computer back on, much less replace it if hardware is bad or identify the problem if it is not known. Shared storage means you can have all of your user data in one location (RAID array on a dedicated system, for example) and then have access to it from anywhere else simultaneously. Reading the shared data simultaneously is usually not a big problem though shared storage systems must address the potential for multiple writers. I'll probably save most of that for another day but suffice it to say that iSCSI allows multiple writers of data simultaneously and leaves it up to the filesystem and other components to implement locking properly.

One example of shared storage that is in my area product-wise at Novell is for eDirectory and Identity Manager. Historically (for about seventeen years now) eDirectory has natively supported fail-over and other aspects of high availability through replication of data to multiple locations. "Partitioning" the tree and placing replicas at various locations is trivial and even done automatically by eDirectory so if one server explodes it is still possible for clients to hit another server and just keep on running. eDirectory is, in LDAP terms, a multi-master setup which means each replica can be written to simultaneously, but this is not done via shared storage. Adding Novell Identity Manager (IDM) to the mix means quite a bit more complexity.

IDM is typically replica-specific in its operations, so for its short-term use it doesn't care what other replicas hold (that's a simplification for this article; see the documentation for details), and a given driver configuration will typically run on one system. When the system running IDM fails, the various applications provisioning identities to and from the Identity Vault suddenly stop provisioning. This can be painful in environments where users are actively making changes; for example, if I change my password in one environment and IDM is supposed to change my password in various other systems automatically (saving me from changing passwords in a dozen systems on my own), then I cannot log in to those other systems with my new password until IDM does its job. If it is down then I call the helpdesk, which is not what the helpdesk wants. Enter shared storage with eDirectory.

With shared storage and a properly-configured eDirectory/IDM environment it is possible to have two physical machines (node0 and node1) both running the same eDirectory Data InfoBase (DIB) in an active/passive configuration. node0 runs and runs until it suddenly gets kicked by the janitor and then node1, detecting its partner's demise, fires up the same eDirectory "instance" in its own circuitry within a negligible timeframe. Both servers are essentially acting as the same eDirectory server and to do that shared storage is utilized. In this example a Fibre Channel or iSCSI SAN environment could be used to keep the data stored in a safe environment (clear of cleaning crews' cleats) so when node0 dies and node1 picks up it basically looks like a quick eDirectory restart instead of a complete rebuild of a server that was broken. This also means that the data are stored on, presumably, better storage devices that are faster and dedicated to the task of data storage and delivery. Since IDM is particular about its eDirectory replicas and is only working within the one eDirectory server it can keep on functioning after one of the nodes running it has been drowned in the ocean, as long as another node can replace it.

So with all of this prelude how does it actually work? iSCSI (or any other shared storage really) needs to make the Operating System (OS) think it has a local disk when, in fact, it does not. This lets all of the other layers above the hardware just work as if nothing was different; there is no need for funky network protocols, strange file transfer rules, nighttime synchronization schedules, etc. In the example above eDirectory literally believes that /dev/sdb1 is local and an independent disk dedicated to holding eDirectory information.

The "client" side of the iSCSI interaction is referred to as the iSCSI initiator. It is the one that initiates the connection to the server side of the connection referred to as the iSCSI target. The target is the one targeted by the client and which actually has the data accessible locally somehow. A target can be accessed by multiple initiators at the same time for the storage to be shared.

The actual client/server (hereafter initiator/target) connection is done over a typical TCP/IP network. For best performance the network used should be one dedicated to the task using the best hardware possible (gigabit, or ten-gigabit ethernet). On the security side of it adding encryption to a SAN usually means a lot of overhead that you do not want, and so iSCSI does not typically implement it. This means on the network side it is important to have the data secured between the initiator and the target as it is possible to see the traffic on the wire with a network capturing utility, or even possible to inject new data into the transfer corrupting data on the target without the initiator knowing immediately. Authentication is possible between the initiator and target to keep somebody from connecting to a target that is not their own.

So we have the basics... a client, a server, optional authentication, and data moving between the two over the (preferably) dedicated network. The case that led me to wanting to do this in the first place revolved around setting up Oracle Real Application Clusters (RAC) which requires shared storage. In this case Oracle has their own filesystem (Oracle Cluster File System 2 (OCFS2)) which controls the multiple-writers issue along with the Oracle Database (DB) software so I'll show how I managed to set that up. To get the Oracle side going I used the following link which is a bit dated but gets the point across, though with a distinct lack of brevity:

http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html

To accomplish this on SLES 10 (SP2) I had to add a few packages that let OCFS2 and iSCSI work nicely, but they all come with SLES so that was fairly trivial. The resulting packages follow:

ab@mybox0:~> rpm -qa | grep -i ocfs
ocfs2console-1.4.0-0.3
ocfs2-tools-1.4.0-0.3
ocfs2-tools-devel-1.4.0-0.3

ab@mybox0:~> rpm -qa | grep -i iscsi
yast2-iscsi-server-2.13.26-0.3
iscsitarget-0.4.15-9.4
yast2-iscsi-client-2.14.42-0.3
open-iscsi-2.0.707-0.44

As you can see I have packages for OCFS2 to work properly as well as some iSCSI packages which give me the tools I needed there. As a note, the iscsitarget package is specific to the target machine. In my setup I wanted to limit machines as much as possible while still having a true "cluster" so I am using one physical machine to host two Virtual Machines (VM). mybox0 is the iSCSI target and also runs an iSCSI initiator on the same box so it can be a client to itself. mybox1 is simply an iSCSI initiator. Both machines in my setup are running Oracle using the OCFS2 filesystem on the iSCSI-accessible partitions, though that will not matter much to most. The virtual machines, in my case, are Xen VMs on a SLES 11 x86_64 host, though using another virtualization technology like KVM/QEMU or VMware would also be acceptable and does not really affect the outcome of this exercise except perhaps in terms of performance.
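
If any of these happen to be missing on your system, they should be available from the SLES media or installation sources; something like the following one-liner ought to pull them in, though the YaST Software Management module works just as well (a sketch, not the only way):

sudo /sbin/yast -i ocfs2-tools ocfs2console iscsitarget open-iscsi yast2-iscsi-server yast2-iscsi-client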

The next question that came up was how in the world to get a disk that I wanted to share. Unfortunately I built these VMs for generic use and did not have any extra space on the disk for things like shared storage. I could have added a second "hard drive" to the VM without too much effort, but an easier solution exists and can be implemented instantly. Using the 'dd' command I created a sparse file of the size I wanted, placed it in /var/iscsidata for access later (with a size less than the free space on the VM's hard drive, of course) and then pointed to that with my iSCSI target configuration. For those not aware, 'dd' is a great tool, and one you should learn about immediately. For brevity I will just tell you that the command below creates a forty-gigabyte sparse file, meaning a file that natively takes up very little space but can grow to forty gigabytes:

ab@mybox:~> sudo dd if=/dev/zero of=/var/iscsidata/data0 bs=1 count=1 seek=40G

The command takes almost no time at all to run (less than one second) and you end up with one file located where you specified in the 'of' (outfile) parameter. To anything that wants to use this we have just created forty gigabytes of nothing. The filesystem where the file is located on the host (/var/iscsidata) will handle giving it more blocks as it needs to grow and in the meantime the internals do not matter to the host, which is to our benefit. Now we will create the rest of the iSCSI target setup.
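
Before moving on, a quick hedged way to convince yourself the file really is sparse is to compare its apparent size with the blocks actually allocated on disk:

# Apparent size (what the initiator will eventually see) vs. blocks actually used
ls -lh /var/iscsidata/data0
du -h /var/iscsidata/data0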

ab@mybox:~> sudo /sbin/yast iscsi-server

Going into YaST directly into the iSCSI "server" (target) configuration we have options to control the server, other global settings, and finally the various targets.

Under the Service section it makes sense to have this service start on system startup so it is always available. Also it makes sense to have the system open the firewall port for us for iSCSI traffic so we do not need to do that later after hours of hair-tugging troubleshooting.

Under the Global section we can setup authentication if desired, though for my test I abstained from this (it is an isolated VM network so I'm not concerned about intruders at this point).

Finally, Targets is where the real meat resides. The default target on my system is named with a big long value of 'iqn.2009-06.novell.lab:d7998b3b-6622-4c90-9128-0f8d65d1', and within that target I can define the LUNs which will be mounted on the initiator when a connection is made to the target. Selecting the 'Edit' option I specify a new LUN (number 0) and point it to the file I created with the 'dd' command above by either entering, or browsing to, the path. It is also possible to create additional targets with their own LUNs, so there are a lot of configuration options out of the box.
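
Behind the scenes YaST writes this out to the iscsitarget configuration, which on my SLES 10 system should be /etc/ietd.conf. A sketch of what the generated stanza looks like, using the IQN and backing file from this example (your auto-generated IQN will differ):

Target iqn.2009-06.novell.lab:d7998b3b-6622-4c90-9128-0f8d65d1
        Lun 0 Path=/var/iscsidata/data0,Type=fileio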

When the LUNs are configured I 'Finish' to exit out of YaST and should have a service running and listening as shown below:

ab@mybox0:~> ps aux | grep ietd
root 9184 0.0 0.1 6200 800 ? Ss 23:44 0:00 /usr/sbin/ietd

ab@mybox0:~> netstat -anp | grep ietd
tcp 0 0 0.0.0.0:3260 0.0.0.0:* LISTEN 9184/ietd

Because I asked YaST to open the firewall port for me, port 3260 is open in the firewall as well as listening and bound to the 'ietd' process which, not coincidentally, comes from the iscsitarget package:

ab@mybox0:~> sudo /usr/sbin/iptables-save | grep 3260
-A input_ext -p tcp -m limit --limit 3/min -m tcp --dport 3260 --tcp-flags FIN,SYN,RST,ACK SYN -j LOG --log-prefix "SFW2-INext-ACC-TCP " --log-tcp-options --log-ip-options
-A input_ext -p tcp -m tcp --dport 3260 -j ACCEPT

ab@mybox0:~> rpm -qf /usr/sbin/ietd
iscsitarget-0.4.15-9.4

Going to my client machine I want to ensure everything is working correctly and that I can reach my server. As I mentioned earlier there are two machines that will be initiators, or clients, and one of them is also the target machine. This is a bit non-standard but it works well for my get-it-going-quickly test. The same test applies to both machines, though the local test would not be affected by the default firewall even if it did happen to still be in the way:

ab@mybox0:~> netcat -zv mybox0 3260
mybox0 [123.45.67.89] 3260 (iscsi-target) open

So as we can see the TCP port 3260 is open and reachable from the local system. Running the same command from all other initiators also works so I know the firewall is not in the way of traffic on this port currently. Configuring the initiator is as easy as configuring the target. Let's start by going into YaST:

ab@mybox0:~> sudo /sbin/yast iscsi-client

Once inside YaST I am again given options for Service and this time I also have options for Connected Targets as well as Discovered Targets.

Under Service I go ahead and set it to start 'When Booting' as that makes sense to me.

Under Connected Targets I currently have nothing, though that will change soon as I proceed to Discovered Targets. Selecting the option for 'Discovery' in this section I enter the IP address of my target system along with the port, which is already defaulted to 3260. Setting up authentication as configured on the target would be trivial here as well, but for this example I did not do it, so I select 'Next' which takes me to the list of what it discovered. I can now click 'Log In' and log in to the target to get access to its LUNs. At this point if I go to Connected Targets I see that I am logged into my target properly. The only thing left is to set this to be 'Automatic' instead of 'Manual' so that on a reboot the service not only starts but this connection is re-established. Once this is done and I use 'Finish' to exit YaST I should have everything set up and running on the iSCSI side.
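
For those who prefer the command line to YaST, the open-iscsi package provides the iscsiadm tool, and a rough equivalent of the steps above looks something like the following sketch (the IQN is the one from the earlier target example; substitute your own):

# Ask the portal which targets it offers
sudo /sbin/iscsiadm -m discovery -t sendtargets -p mybox0:3260
# Log in to the discovered target
sudo /sbin/iscsiadm -m node -T iqn.2009-06.novell.lab:d7998b3b-6622-4c90-9128-0f8d65d1 -p mybox0:3260 --login
# Re-establish the session automatically at boot
sudo /sbin/iscsiadm -m node -T iqn.2009-06.novell.lab:d7998b3b-6622-4c90-9128-0f8d65d1 -p mybox0:3260 --op update -n node.startup -v automatic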

There are a few things left to do, like making sure it's working. Looking for established TCP connections to the system is an easy place to start:

ab@mybox0:~> netstat -anp | grep 'ESTAB' | grep 3260
tcp 0 0 151.155.131.8:1740 151.155.131.8:3260 ESTABLISHED 2142/iscsid
tcp 0 0 151.155.131.8:3260 151.155.131.8:1740 ESTABLISHED 9184/ietd

Also I can use the 'fdisk' command to list all the drives and partitions available to my system:

ab@mybox0:~> sudo /sbin/fdisk -l
Disk /dev/xvda: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 5221 cylinders
Units = cylinders of 16065 * 512 = 16450560 bytes

Device Boot Start End Blocks Id System
/dev/xvda1 1 131 1052226 82 Linux swap / Solaris
/dev/xvda2 132 5221 81770850 83 Linux

Disk /dev/sda: 42.8 GB, 42949672960 bytes
64 heads, 32 sectors/track, 40960 cylinders
Units = cylinders of 4096 * 512 = 2097152 bytes

Disk /dev/sda doesn't contain a valid partition table

From the text above we can see that we essentially have two disks. /dev/xvda is the virtual disk backing my Xen VM's storage (the VM is paravirtualized, in case that matters). The other disk now shows up as /dev/sda, the first SCSI disk. fdisk properly reports that it is forty gigabytes and does not have a valid partition table. That's easy to take care of, though:

ab@mybox0:~> sudo /sbin/fdisk /dev/sda
root's password:
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 40960.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-40960, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-40960, default 40960):
Using default value 40960

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

So after loading fdisk I selected 'n' (new), specified a primary partition with 'p', made it number '1', and then took all of the space (the defaults). Once completed I used 'w' to write the partition table and update the system. Now I should have a bit more useful data from fdisk:

ab@mybox0:~> sudo /sbin/fdisk /dev/sda
Disk /dev/sda: 42.8 GB, 42949672960 bytes
64 heads, 32 sectors/track, 20480 cylinders
Units = cylinders of 4096 * 512 = 2097152 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 40960 41943008 83 Linux

The text above is truncated for brevity but it gives you the idea. Now I have /dev/sda1, which is ready for a filesystem. In this case I am going to set up an OCFS2 filesystem in that space for use by Oracle. Creating the filesystem, like the partitioning, only needs to be done on one of the two nodes, so I might as well do it here now before configuring the other node's iSCSI initiator:

ab@mybox0:~> sudo /sbin/mkfs.ocfs2 /dev/sda1
mkfs.ocfs2 1.4.0
Filesystem label=
Block size=4096 (bits=12)
Cluster size=4096 (bits=12)
Volume size=42949672960 (1052226 clusters) (1052226 blocks)
163 cluster groups (tail covers 17404 clusters, rest cover 32256 clusters)
Journal size=134213632
Initial number of node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 3 block(s)
Formatting Journals: done
Writing lost+found: done
mkfs.ocfs2 successful

After about five seconds I have my volume set up and all that is left to do is to create a mount point and then mount it:

ab@mybox0:~> sudo mkdir /mnt/oraracdata
ab@mybox0:~> sudo mount -t ocfs2 /dev/sda1 /mnt/oraracdata

As this completes let's check to see if everything shows up as it should:

ab@mybox0:~> ls -la /mnt/oraracdata/
total 8
drwxr-xr-x 3 root root 4096 2009-06-10 00:26 .
drwxr-xr-x 3 root root 23 2009-06-05 00:49 ..
drwxr-xr-x 2 root root 4096 2009-06-10 00:26 lost+found

ab@mybox0:~> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda2 39G 1.9G 38G 5% /
udev 257M 112K 256M 1% /dev
/dev/sda1 40G 519M 40G 3% /mnt/oraracdata
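
One step glossed over above, and one the second node will also need before it can mount anything: OCFS2 expects its cluster membership to be configured and the o2cb cluster stack running on every node. The ocfs2console package listed earlier can do this graphically, and the resulting /etc/ocfs2/cluster.conf looks roughly like the sketch below (the node names are the two from this example; the addresses are placeholders for whatever the hosts use on the storage network, and the real file is picky about its tab-indented formatting):

node:
        ip_port = 7777
        ip_address = 192.168.1.100
        number = 0
        name = mybox0
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.101
        number = 1
        name = mybox1
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2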

So at this point we have everything mounted on the box running both the initiator and the target. With the target completely set up, partitioned, and with the filesystem of my choice on there, I am going to do the same initiator setup ONLY on the second client machine (no fdisk, no mkfs, etc.) followed by the same mkdir and mount commands to see if I get the same output. Once I do I can create a file on one side and make sure it shows up on the other as follows (notice the different hosts in the two tests):

ab@mybox0:~> sudo touch /mnt/oraracdata/test0

ab@mybox1:~> ls -l /mnt/oraracdata
total 4
drwxr-xr-x 2 root root 4096 2009-06-10 00:26 lost+found
-rw-r--r-- 1 root root 0 2009-06-10 00:37 test0

In these tests each host is mounting the shared storage at /mnt/oraracdata in its respective filesystem, so the exact same 'ls -l' command shows the same output on each system. Creating files on each node works and we are up and running. Restarting the system brings the iSCSI initiators up again, but one additional, trivial step is required to make the mount point automatic. Adding the following to /etc/fstab on each system takes care of it, since each is mounting the iSCSI location as /dev/sda1:

/dev/sda1 /mnt/oraracdata ocfs2 _netdev,datavolume,nointr 0 0

The mount options specified are specific to OCFS2, so with other filesystems they would not necessarily be the same, but they are needed here. Finally we have shared storage working. The joy of this system is that it is all within one physical machine (a SLES 11 x86_64 Xen host), two VMs (SLES 10 SP2 x86_64 Xen paravirtualized VMs) and without any shared disks, thanks to the "disk" created in the filesystem with the 'dd' command. For a test system this is a quick and dirty way to get things going. For production, adding some real hardware into the mix and a dedicated network can give you great performance overall with minimal work and cost. The demonstration above was done with SUSE-based distributions, but iSCSI is one of those things that every Linux distribution can, and probably does, do out of the box. In SLES's case the setup is really easy (taking out all the fluff, verification, etc. we're talking about a dozen steps for all of it including both machines) and easy to manage via YaST. In my case the next steps are to set up Oracle and get RAC working which, unfortunately, is much more involved than just getting the disks working via iSCSI.

LJDT: Taking Advantage of Screen

$
0
0

Last week I was asked if there was a way to start an application at the command line and later come back to it from somewhere else. Normally in Linux/Unix (*nix) it is possible to 'background' a process and then return to it later on, but only if you are still in the same session where the process was sent to the background. This is useful for having something run while you do other things, but reconnecting to that session later cannot be done with just the shell. Thankfully 'screen' is on Linux systems by default (all of them I've used anyway) and as a result, Linux Just Does That.

So let's go into the basics of the 'screen' command. I'll be using my SLED 11 laptop and SLES 10 servers for demonstration, though this applies equally to openSUSE and probably every other current Linux distribution you can find. As mentioned, there is sometimes a need to run a command and come back later to check on it, and not always after you have been able to stay in front of the box or consistently connected for hours or days on end. There may be a multi-gigabyte download you are trying to pull down, or a file you are tailing to see what is happening over time, or just some process you run over and over and need to see how that continues without running it manually again. Another use case that is slightly different is one where information needs to be shared between or among individuals who are not necessarily near each other or wanting to get near each other to see something happen on a machine. The screen application lets you have as many people as you would like viewing the same terminal, all able to provide their own individual contributions at the same time. Another use case is when you don't want to remotely access a machine multiple times but instead just want to have multiple shells, and you do not have a GUI in which you can spawn a dozen xterm windows. A final scenario may be the simple desire to not have programs stop when a network connection randomly dies. I'm sure there are others but these are the ones I use most often. I'll elaborate a bit more on each of them here shortly.

Going back to the original introduction there is the concept of the 'background' and 'foreground' for processes. It is possible, for example, to start a download with wget or curl (two commands used for such things) and then background it with Ctrl+Z followed by 'bg' and the number of the job. This looks like the following:

ab@mybox0:/tmp> wget ftp://ftp.novell.com/outgoing/imanager273idm361.zip
--23:12:34--  ftp://ftp.novell.com/outgoing/imanager273idm361.zip
           => `imanager273idm361.zip'
Resolving ftp.novell.com... 130.57.1.88
Connecting to ftp.novell.com|130.57.1.88|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /outgoing ... done.
==> PASV ... done.    ==> RETR imanager273idm361.zip ... done.
Length: 149,585,409 (143M) (unauthoritative)

 2% [===>                                                                                                                                                            ] 4,304,904     10.26M/s
[1]+  Stopped                 wget ftp://ftp.novell.com/outgoing/imanager273idm361.zip
2009-06-18 23:12:35 Jobs:1 Err:148
ab@mybox0:/tmp> bg 1
[1]+ wget ftp://ftp.novell.com/outgoing/imanager273idm361.zip&

As you can see by the results, I pressed Ctrl+Z after about two percent of the download was completed. My (customized) prompt shows me the number of jobs I have running, and it shows I have one. When I paused the job it also gave me a job number in brackets, '[1]', so I knew which job to background with the 'bg' command right after that. The last line shows that I had backgrounded the job. This is fine, and I can then foreground it again with 'fg 1' whenever I want to do so, but only if I am still in the same session that did the backgrounding. Why that is (even when I'm the same user, and even when I'm 'root') is covered online, and that is about as far as I would like to get into it here. Suffice it to say it is not currently possible (from everything I've found) to restore a process to a shell unless it came from that shell.

This is where one of the use cases comes in. Using screen I can start the process and then detach from the screen and go along my merry way. The benefit here is that I can reconnect to the screen (it's an application made to do that after all) and see how my progress is after minutes, hours, days, or however long it takes for me to be satisfied enough with the results to do something else. This looks slightly different.

ab@mybox0:/tmp> screen -q      #The -q skips a screen of information that isn't that useful for demonstration purposes

2009-06-18 23:18:02 Jobs:0 Err:0
ab@mybox0:/tmp> wget ftp://ftp.novell.com/outgoing/imanager273idm361.zip
--23:18:05--  ftp://ftp.novell.com/outgoing/imanager273idm361.zip
           => `imanager273idm361.zip'
Resolving ftp.novell.com... 130.57.1.88 
Connecting to ftp.novell.com|130.57.1.88|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /outgoing ... done.
==> PASV ... done.    ==> RETR imanager273idm361.zip ... done.
Length: 149,585,409 (143M) (unauthoritative)                                     

 6% [=========>

[detached]
2009-06-18 23:18:06 Jobs:0 Err:0
ab@mybox0:/tmp>
2009-06-18 23:30:06 Jobs:0 Err:0
ab@mybox0:/tmp> screen -x


2009-06-18 23:18:02 Jobs:0 Err:0
ab@mybox0:/tmp> wget ftp://ftp.novell.com/outgoing/imanager273idm361.zip
--23:18:05--  ftp://ftp.novell.com/outgoing/imanager273idm361.zip
           => `imanager273idm361.zip'
Resolving ftp.novell.com... 130.57.1.88
Connecting to ftp.novell.com|130.57.1.88|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /outgoing ... done.
==> PASV ... done.    ==> RETR imanager273idm361.zip ... done.
Length: 149,585,409 (143M) (unauthoritative)

100%[=========================================================================================>] 149,585,409   11.22M/s    ETA 00:00

23:18:19 (10.74 MB/s) - `imanager273idm361.zip' saved [149585409]

2009-06-18 23:20:19 Jobs:0 Err:0
ab@mybox0:/tmp> exit

[screen is terminating]
2009-06-18 23:31:22 Jobs:0 Err:0
ab@mybox0:/tmp> 

As you can see, at first I ran 'screen -q' and then I was in the application at an otherwise fairly normal shell. Next I ran my wget command as before, but then something weird happened and the terminal/console displayed '[detached]'. This was not just a coincidence. Right before that happened I pressed Ctrl+a followed by 'd' which, according to the 'screen' man page, has the following explanation: 'C-a d (detach) Detach screen from this terminal.' By using this key sequence I have just been bounced back to my original shell where I was before running screen (which created a new shell within the 'screen' application).

From here I can do just about anything, including log out of the system, without affecting my download. The beauty of the application is that I can not only disconnect without affecting things (essentially the same thing that backgrounding a process does) but I can reconnect and check on its progress or continue with more commands. This is shown where I typed 'screen -x'. The screen command uses '-x' to reconnect to an existing screen in multi-user mode (vs. -r to just reconnect), prompting you for which screen to join if there are multiples, and then shows that entire session in your shell. As shown I see everything including the start of my download and its completion (which is why the timestamps from my shell are out of order between joining/exiting the screen). This is the basic start of what screen does, as we're basically enhancing what we can do with one command on our own, which is useful but far from the end of the story. Keep in mind that it's possible to use the original shell after detaching from the screen for whatever purpose, so while you may be SSH'd into a system only once, you can start a screen, do something, detach, and do something else without interfering with the first task. This can be done repeatedly with multiple screens or with multiple 'windows' in multiple screens as I'll show later.

Going to the next example let's pretend I need to help a coworker, or I want to access a system that a customer is already in without having them lose their environment variables from the middle of their test. If I can access the system as the same user they are using then I can join a screen that they previously started and are now in, all while they are still in it.

The result is something like using Elluminate, Bomgar, webex, or a similar application, but with just a shell. For those familiar with the Novell Remote Manager, RConJ, rconsole, rconip, or FreeCon (or similar) utilities for NetWare, which would essentially let multiple users on the same system see the same output simultaneously and remotely, 'screen' lets you do the same thing so you can see each other's work (Linux natively, and by default, lets multiple users work on one system without interfering with one another). This really becomes nice when there are no alternative remote-access utilities around. It's very nice when a network connection on one side or another is slow or has high latency, where full graphical sharing environments do not perform well. With a text-based interface (SSH for example) for the various participants each change to the display sends just the amount of data needed to show that one change. There are no moving windows to draw, popups to distract, etc. As the other party types on the screen all are shown the updates. By default everybody in the session can also manipulate the text in the session as if they were on their own. In my own life this is useful when I have broken my system and need help from developers to undo the mayhem I have inflicted upon myself. As they SSH to my box and share their session via screen I can learn by watching, and even help them get around my system at any time if desired. There is not a great way to show this except with the same commands as were shown above. The windows that the two users would see would be identical so I'll let you try that one on your own. One thing to note is that if EITHER user types 'exit' the screen will close entirely and both users will be dumped back to the shell from which they connected to (or launched) screen. To disconnect leaving the screen intact be sure to use Ctrl+a followed by 'd'.

Consider another possibility where you just always have a bad connection to a machine and you need to not have something break when the connection finally blips into nonexistence. 'screen' is made to be attached to and detached from, and by default it will detach anybody who isn't there and keep on chugging. This means that if you use screen consistently you never need to lose changes to a file being edited, or your place in a manual or other document being read, whether you disconnect voluntarily or accidentally.

The final situation gives the most room to show off other features, which is why it is saved for last. Being somewhat particular about my systems' operations I started watching my logs regularly (continually), and originally it was just the /var/log/messages file. As I was just tailing the file, eventually I had to disconnect, lose my place, reconnect later, and then try to find where I was in this fairly large file. Using screen easily solves that problem since I can just SSH in, reconnect to my screen, and see which lines on my display are newer than the last time I was perusing. So far, so good. Some other issues cropped up on me and it became important for me to watch other aspects of my system such as disk space, memory utilization, my ability to ping the outside world, and I started liking to have 'top' just run and show me what was taking up my processor. This is still solvable by running screen over and over and trying to remember which screen to connect to for which service, but that requires me to have a bunch of windows open on my client machine, or to detach and reattach to various screens when I just want to look at something quickly with as few keystrokes as possible. Checking the 'screen' man page, the concept of a window (within 'screen') is what we will discuss next. Demonstrating this without a video is tricky but I'll do my best.

First, let's isolate the commands to be used for system monitoring:

#Watch my top processes, refreshing once per second (from the default of once every three seconds)
top -d1
#Tail my system syslog output located in /var/log/messages filtering out 'STATS:' because I don't care about them and I'm too lazy to make it both STATS: and syslog-ng
sudo tail -f /var/log/messages | grep -v 'STATS:'
#Run a script I have that simply executes 'free -l' and 'df -h' every minute or so indefinitely to keep track of disk space and memory utilization
#The script is literally the following after the shebang line: while [ 1 ] ; do free -l && echo ''&& df -h && echo -e "\n\n"; sleep 60; done
#For those not familiar with bash at all it basically says: while true (forever), run free -l, print a blank line, run df -h, print two blank lines, wait a minute and repeat (the full script is written out just after this list).
/home/ab/bin/spacemon.sh
#See if I can ping Google since they have better uptimes than I do on my little boxes
ping google.com
#Watch my firewall log for anything strange
sudo tail -f /var/log/firewall
#Watch my ndsd.log for my eDirectory instance in case something strange shows up in it that I should know about
sudo tail -f /var/opt/novell/eDirectory/mytree/myserver-a/log/ndsd.log
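
For completeness, here is the spacemon.sh script from that list written out in full (a trivial sketch; adjust the sleep interval to taste):

#!/bin/bash
# Report memory and disk usage once a minute, forever.
while [ 1 ] ; do
    free -l
    echo ''
    df -h
    echo -e "\n\n"
    sleep 60
done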

So what I have above is a set of several commands I want to have running all the time and be able to check quickly, without worrying about a network disconnect. Ideally I'd like these to run 24x7 (they don't take much processing time) and be as simple to start as possible. In my case starting these up isn't a big deal since the servers only reboot when a Service Pack (SP) for SLES comes out and I reinstall everything; however, having everything start as easily as possible would be ideal. Let's see how screen handles this currently. I'm going to steal a paragraph from the 'screen' man page to define what I'd like to demonstrate:

<quote>
When screen is called, it creates a single window with a shell in it (or the specified command) and then gets out of your way so that you can use the program as you normally would. Then, at any time, you can create new (full-screen) windows with other programs in them (including more shells), kill existing windows, view a list of windows, turn output logging on and off, copy- and-paste text between windows, view the scrollback history, switch between windows in whatever manner you wish, etc. All windows run their programs completely independent of each other. Programs continue to run when their window is currently not visible and even when the whole screen session is detached from the user's terminal. When a program terminates, screen (per default) kills the window that contained it. If this window was in the foreground, the display switches to the previous window; if none are left, screen exits. Shells usually distinguish between running as login-shell or sub-shell. Screen runs them as sub-shells, unless told otherwise (See "shell" .screenrc command).
</quote>

The first thing to note is that screen normally creates a single window. If you just load screen by accident one day you won't even realize there are windows there to manipulate, so getting started does not mean having to learn every feature of the application.

To see which window you are in let's use the window number, which starts out at (unsurprisingly) zero. So from within screen press Ctrl+a followed by 0 (zero). Notice that at the bottom of your terminal the following text is shown: 'This IS window 0 (bash).'. As you can see you are in window zero already and the window is named bash because that was the default application launched when you loaded screen. From here on out I'm going to use the notation that 'screen' uses in its man page for simplicity. Ctrl+a followed by 0 is simply C-a 0. Note that case does matter, as it does everywhere else on a computer. Adding additional windows within the same instance of screen is an easy task. Press C-a C-c (Ctrl+a followed by Ctrl+c). You may notice a slight flicker of your terminal and now you are in window number one. Test out changing to the current window, to get the same prompt as before but telling you that you are already in window 1, with C-a 1 (Ctrl+a followed by 1). You can change to zero with C-a 0 and back with C-a 1. You will not get the note that you are already somewhere unless you really are already there. Just to keep on testing let's create four more windows by pressing C-a C-c four more times. Once completed you should be able to use C-a followed by zero through five to change to any of the six windows within this one screen session. Notice that if you try a numeral that is not valid (six through nine currently) you get a listing like the following (you also get this list when you do C-a C-w):

0$ bash  1$ bash  2$ bash  3-$ bash  4$ bash  5*$ bash

What this is telling you is that you have windows zero through five, they are all named 'bash' (we'll get to names more later), I am currently on window five (denoted by the '*') and I was previously on window 3 (denoted by the '-'). Your own results may be slightly different, but if you were to push C-a 3 followed by C-a 5 and you had six default windows in the screen then your results should match my own. At this point let's prove these are all within the same screen by detaching (C-a d) and then re-attaching (`screen -x`). Now we should be back in 'screen', so press C-a C-w and we should see the listing of windows again. To show the screen sharing with others again, if you open up another terminal somewhere (or access the box as this user remotely) and then run `screen -x`, you are now in the same screen with the six windows as the other terminal. You may quickly find that your display does not always match that of the other user, which is important to note. While screen lets both people on the same window (remember, only one window is created by default within screen) see the same things and type in the same area, screen does not, at least by default, ensure that everybody within screen is on the same window. Also, when you have multiple windows, typing 'exit' in one of them closes bash within that window (assuming you're in a shell that the 'exit' command closes and 'bash' was what was opened in the window originally) and does not close the entire screen. Screen only exits when all of the windows within it are closed. When you finally exit screen you should always get the note that screen is terminated, so that should help in recognizing that particular event vs. just a single window closing.

So we have a screen with six windows in it. Now let's see if we can get all of my commands in there so we can actually use the thing for something useful. Go to window zero (C-a 0) and type in the first command from my list above. Go to window one (C-a 1) and enter the second command. Continue through the last one. For the script you can either create your own script in your own ~/bin directory or you can just input the while loop directly and have it run. For the ndsd.log file either run all of these on a machine with eDirectory or choose another log file of interest to you which you can tail. When completed experiment with connecting, disconnecting, and changing among the various screens to see what is going on. Another interesting set of keystrokes is C-a C-a which takes you to the previous window. Using it over and over switches you back and forth between two windows indefinitely.

Once that is completed you may find that using C-a 6 gives you the usual list of windows but still with no details to help you differentiate one window from another except for their identifying integer. While each one started with 'bash' they are all doing their own thing now and a name that YOU care about would be nice. Thankfully the 'screen' devs are smart folks. Try the following, noticing the change of case: C-a A. After pressing Ctrl+a and then 'A' (capital A) the bottom of the terminal changes and lets you enter a name for the window. Backspace over the old name and put in something descriptive for you. For window zero perhaps 'top' would be appropriate. Go to the next window (C-a 1) and then change its name (C-a A) to something you prefer. Do this for all your windows. Disconnect, reconnect, and see the window names stick. This removes a lot of ambiguity from life, which is nice for our particular task. With window names in place you can also switch to windows by name using C-a ' (yes, Ctrl+a followed by a single-quote or tick) and then entering the name of the window you would like to try.

For navigation there are a few other tricks that are worth mentioning. Iterating through windows is trivial in a number of ways. From the man page:

C-a space
C-a n
C-a C-n     (next)        Switch to the next window.

On the other hand to go backward in the order:

C-a backspace
C-a h
C-a p
C-a C-p     (prev)        Switch to the previous window (opposite of C-a n).

Switching through windows like crazy you may eventually forget where you are. Find out:

C-a N       (number)      Show the number (and title) of the current window.

Also at some point you may see a message from screen at the bottom of the terminal that goes away before you get a chance to read it. That's not a problem:

C-a m
C-a C-m     (lastmsg)     Repeat the last message displayed in the message line.

Screen also has the ability to write its contents (display) to a file, read in data from a file, change modes (to command mode to give complex commands to 'screen'), change key bindings, lock (password-protect) windows, and basically customize itself to be flexible for you. See the man page for full details on all of the interactive options but be aware they are, out of the box, extensive. Now that we've covered working within screen to a decent extent what about using screen from the command line? There are numerous ways to work with screen before even getting into the application itself including ways to start it without attaching, starting it and running applications automatically, and doing various other things. Some of the options I have found useful (so far) follow.

First let's start with some information gathering. The 'screen' command has a -list (-ls) parameter that will show available screens for the current user. Here is some sample output from various invocations when having no screens, one screen, and two screens running for the current user:

ab@mybox0:~/Desktop> screen -list
No Sockets found in /var/run/uscreens/S-ab

2009-06-19 10:22:25 Jobs:0 Err:1
ab@mybox0:~/Desktop> screen -ls
There is a screen on:
        31435.pts-3.mybox0   (Attached)
1 Socket in /var/run/uscreens/S-ab

2009-06-19 10:22:29 Jobs:0 Err:1
ab@mybox0:~/Desktop> screen -ls
There are screens on:
        31435.pts-3.mybox0   (Detached)
        31447.pts-3.mybox0   (Attached)
2 Sockets in /var/run/uscreens/S-ab

2009-06-19 10:22:35 Jobs:0 Err:1
ab@mybox0:~/Desktop>

So from here we can see that my user has two screens. One is attached to by a user, and the other is not attached to at all. The first column there is actually the PID of the screen process, for those interested in seeing that directly, since that can be useful sometimes. The `pstree -p` output also shows my screens and the multiple windows underneath them.

So what is interesting is that when you have multiple screens the `screen -x` command takes a little more to actually join one. When there is one screen to join it joins it, but when there are multiples it needs to be told which to join. By default using the PID is easy enough. It may be more convenient at times, though, to have names for connecting to a screen, just like you do when changing to a window within a screen. Let's do a test by creating a new screen named 'testme' after closing my previous screens out.

screen -S testme

When I detach and get a new listing this is now what I see:

ab@mybox0:~/Desktop> screen -ls
There is a screen on:
        5358.testme     (Detached)
1 Socket in /var/run/uscreens/S-ab

That's neat and all but the real use is when we re-attach. No longer are we using things like `screen -x 5358` though that still works; I can directly connect to the screen using the name that makes sense to me (the mere human) instead of the number that makes sense to the computer:

screen -x testme

So what if we want to go to a specific window within a specific screen, all by names or numbers? All possible. The new switch is -p and it takes either the number or the name. If you have duplicate names it takes the last one you were in (if applicable and the system knows, since it only knows about the two most recent windows at most) or else the first one in the list, in my experience, so use a number if that is your situation for some reason, or just give the windows unique names:

screen -x testme -p windowname
screen -x testme -p 3

Either command above works the same way if window '3' is also named 'windowname'. If they are different windows then connecting works to whichever window is specified. Doing this you can connect to your screen named 'monitor' and your window named 'top' from anywhere quickly without having to pull up a list of screens, then a list of windows, etc. Another option that is useful for starting things up (automating our 'monitor' screen on system startup) is to have a .screenrc file for a given user. For my user, for example, I may want the following ~/.screenrc file to exist:

screen -t top top -d1
screen -t messagesmon sudo tail -f /var/log/messages | grep -v 'STATS:'
screen -t spacemon /home/ab/bin/spacemon.sh
screen -t pinggoog ping google.com
screen -t fwmon sudo tail -f /var/log/firewall
screen -t edirmon sudo tail -f /var/opt/novell/eDirectory/mytree0/myserver0-a/log/ndsd.log

This tells 'screen' to load up six windows every time it is loaded. This is not really what I want but it shows how you can customize screen for every time it loads. Instead, for my purposes, I would probably create a /home/ab/.screenrcmonitor file and then I would have a script load when the system started that essentially ran the following:

sudo -u ab screen -d -m -S monitor -c /home/ab/.screenrcmonitor

This tells screen to run as my user and start a new screen named 'monitor' using my customized screen config file as the way to load up screen. The only thing left to do here is to make sure /etc/sudoers is set up so that my user can run these commands without being prompted for the password. If I am prompted for the password I can enter them the first time I join the screen, but that means the commands are not really running until I log in and set that up. Putting the sudo command above in an init script would be fairly trivial and could then just be set to be the last thing to load with my system. One more consideration at this point may be that anybody who hacks my account suddenly has the ability to tail the firewall, messages, and eDirectory log files, which would normally require 'root' privileges. Blocking that is fairly easy to do as well, using a crypt'ed password in the same screen configuration file with the following line for the password 'password1':

password OHBn.WKL7Xtrc

With this line at the end of the screen configuration file attaching to the screen will just require the password to be entered once before access is granted. I was hoping to find a way to do that in the configuration file per window (and not for the entire screen) but haven't worked that out yet so if anybody finds it please share the wealth.
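
As an aside, the value above is a standard crypt(3) hash, so one hedged way to generate your own hash for a different password is with the openssl command-line tool:

# Produces a crypt(3)-style hash suitable for the 'password' line above
openssl passwd -crypt yournewpassword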

The password option leads nicely into the last section I'd like to cover. Normally screen is nice for sharing with others who have your username and password on a system (often, 'root' for administrative things at least), or at least can become your user somehow which could include any root-enabled user. It may be useful, though, to share a screen with a user who does not have the ability to become your user somehow. As luck would have it 'screen' has this ability built in as well. I'll only cover doing this on the fly since that makes the most sense to me but feel free to check the man page for other information as you need it.

First, there is a change I needed to make on the system for this to work. The 'screen' command needed to be SUID'd, which means whoever runs it now runs it as the user owning the executable (by default this is 'root'). For the security conscious this means giving the process, and anything it lets the user do, full power over the system. Thankfully 'screen' is written well and I do not know of any bugs that would let a user do just anything unless they were already 'root', but keep in mind that this change must be made. The simplest way to change the default permissions on my SLED box from r-xr-xr-x (0555) to r-sr-xr-x (4555) is with the following command:

sudo chmod 4555 /usr/bin/screen

After entering the 'root' password the file is now set to run as 'root' when invoked. Once done we can start up our screen and use a new set of control sequences, namely C-a : (Ctrl+a followed by a colon (:)). This puts you in 'command mode' within the screen and from here we will enter some commands, substituting in values as needed. In my case I am SSH-ing to my machine as the 'guestuser' user whom I want to join my (ab) user's screen. You will need to, as the owner of the screen, run C-a : before each of the following commands:

multiuser on
acladd guestuser OHBn.WKL7Xtrc

Once that is done the user 'guestuser', now on the system having the 'screen' command with the SUID bit set along with you, runs the following:

screen -x ab/monitor

Remember that my screen was named 'monitor' which is why it is shown above. The alternative would be to use the long format for the same as mentioned in the documentation:

screen -x ab/15875.pts-3.mybox0

The longer format is available by running the '-ls' switch as the original user, but basically it's a lot easier to have a named screen. Either way, after running the command the user (guestuser) is prompted for a password, which you give them. Keep in mind that you are essentially letting anybody who joins your screen have as much power as you have by default. 'acladd' is made to give the user the power to join and essentially full privileges from there. The password parameter on the end (which still uses a crypt'd password) is optional, so you can omit it if that suits you. To restrict permissions there is an extensive set of ACLs possible with the 'chacl' command. Some examples include the ability to give only read rights (only see the screen), write rights (the ability to type in the screen) and execute rights (the ability to execute commands to the screen like you did to enable multiuser and to add a user to join your screen). The abilities are more granular than that, but this is all getting into somewhat advanced stuff that probably isn't needed by most individuals. For a great set of instructions see the man page.

Hopefully this helps you get started with another great command in Linux. It is available in source form so feel free to add it to other systems. Even Windows can benefit from screen if you have something like Cygwin installed, though by default there is no 'screen' there to enjoy. If you have ways that you use screen please feel free to add comments at the bottom. If you have other commands or switches that make your life easier, share them so others may learn as well, or write up your own notes and post them on this same site for free. Suggestions for other tips/tricks are welcome as well. Enjoy.

NetWare to Open Enterprise Server 2 Migration – Lessons Learned

$
0
0

Over the past couple of months, I have been involved with migrating remote NetWare servers to Open Enterprise Server 2 (OES2) Linux. During this time, some lessons have been learned that I hope will help you, the reader, perform your own migrations more smoothly and avoid some of the pitfalls we encountered.

For organizational purposes, I have grouped the issues and resolutions under headings based on purpose, such as printing or DHCP. General issues, and how we overcame them, have their own heading.

There are now many sources of information on migrating NetWare to OES2, several of which I have authored, so your choices are not limited to the Novell documentation (although it is very helpful). The forums are a very good source for "What do I do?" and "What does this mean?" questions. I will be addressing specific issues that were encountered repeatedly and what was done to correct or resolve each problem.

DNS-DHCP

Locater Objects and disappearing Zones

When installing or configuring OES 2 in a production tree with DHCP already present, make sure you specify the upper-most location of the existing DHCP objects when specifying the context of the dhcpLocator and dhcpGroup objects. If you do not, then when you run your DNS/DHCP utility the zones will appear to be... gone.

If this happens, don't panic: your zones are still there in the tree. You have simply defined a new locator object, and it doesn't know about the existing zones. If you use iManager or ConsoleOne, you'll still see the zone and subnet objects in the tree.

Remove the newly created locator objects in the contexts you specified when you installed or configured DNS/DHCP, then start your DNS/DHCP utility and everything will appear as it should.

DHCP Daemon Fails to Load

Running dhcpd on the OES2 Linux server for the first time fails and reports the error, "insufficient credentials."

The admin account that was used to initially install and/or configure DHCP in OES2 has changed his/her password in eDirectory or is no longer valid.

OES2 services on Linux that use eDirectory require an admin or equivalent account in order to access the directory. This credential pair is encrypted and stored in a hash on the SLES server and is used by these services; if the account's credentials change, the access fails.

Launch YaST, open the OES Configuration, and enter credentials for an account. We used a system-type account with admin equivalence that is exempt from Universal Password policies, to ensure normal password change intervals are not enforced.

Upon saving your configuration, the service loads normally.

Location of DHCP Objects Post-Migration

After migrating a location that includes DHCP, you will notice that in the base container, or your base Organization, there is an object called:

dhcpservice_OESDHCP_[NW-SERVERNAME]

This is the DHCP Service object for OES DHCP on Linux. The object can be moved to any desired container as long as the tree can be walked from the root down to the DHCP Server object itself. In other words, if your DHCP Server object is located in OU=NewYork,OU=Eastern,O=US, you can place the DHCP Service object in any of those containers. If you placed your DHCP Service object in OU=DHCPStuff,OU=Eastern,O=US, the DHCP Server object would have to search a lateral container and would not find it.

If you create a new DHCP Service object in the tree, you can specify the container in which the object is created.

Both scenarios were performed during the migration, and we found that creating new DHCP services and subnets was more efficient than using the migration utility.

iPrint

Migrating from NDPS to iPrint produced its own challenges. Once again, some migrations were performed with the migration utility and some were performed by simply creating new IPP printer objects. And again, it proved more efficient, in both reliability and time, to create the iPrint objects (Driver Store, Print Manager and Printers) from scratch, especially in smaller locations. If you have a large location, 100+ printers, definitely use the migration utility.

Printer assignment started with running an LDAP search against the users in a particular container and retrieving their current NDPS printer assignments.

ldapsearch -x -h [servername] -D "cn=admin,o=org" -W -b "ou=newyork,o=org"  objectclass=inetorgperson ndpsPrinterInstallList ndpsDefaultPrinter >newyork.ldif

This command creates a text file with each user's DN and the printers assigned to them.

It also informs you which printer is the user's default printer.

It will look something like this:

## newyork.ldif

# extended LDIF
#
# LDAPv3
# base <ou=NewYork,o=Org> with scope subtree
# filter: objectClass=inetorgperson
# requesting: ndpsPrinterInstallList ndpsDefaultPrinter 
#

# TKing, NewYork, Org 
dn: cn=TKing,ou=NewYork,o=Org
ndpsPrinterInstallList: cn=HP4000-1,ou=NewYork,o=Org
ndpsPrinterInstallList: cn=HP4000-2,ou=NewYork,o=Org
ndpsDefaultPrinter: cn=HP4000-1,ou=NewYork,o=Org#1246908645#1

# RJames, NewYork, Org
dn: cn=RJames,ou=NewYork,o=Org
ndpsPrinterInstallList: cn=HP4000-1,ou=NewYork,o=Org
ndpsPrinterInstallList: cn=Xerox5632-3,ou=NewYork,o=Org
ndpsDefaultPrinter: cn=Xerox5632-3,ou=NewYork,o=Org#1246728603#1

What we want to do is take this list, modify it to reflect the new IPP printers, and then do an LDAP modify to add them to each user.

When we create an IPP printer and assign it to a user, an eDirectory pointer is assigned for that printer. In ConsoleOne, if we look at the user to whom we assigned the IPP printer and click the Other tab, we can see this pointer.

Look at the attribute iPrintiCMPrinterList. Expand the attribute and there will be a 10-digit number; write this number down. Below that 10-digit number will be a numeric value of 1 or 5: 1 means the printer is assigned to that user, and 5 means that printer is the user's default printer.

Create a new LDIF file; I used the existing output from the query. Replace ndpsPrinterInstallList: with iPrintiCMPrinterList:, append #[ten-digit-code]# to the end of each entry, and follow it with a 1 or a 5, remembering that only one printer can be the default. It should look similar to this:

iPrintiCMPrinterList: cn=HP4000-1,ou=NewYork,o=Org#1246908645#1

Add iPrintiCMPrinterFlags: 1 to each user entry; this allows the modification of the user's IPP printer assignments.

Remove the line stating the ndpsDefaultPrinter.

The above user entries should look similar to this:

## newyork.in

# extended LDIF
#
# LDAPv3
# base <ou=NewYork,o=Org> with scope subtree
# filter: objectClass=inetorgperson


# TKing, NewYork, Org 
dn: cn=TKing,ou=NewYork,o=Org
iPrintiCMPrinterList: cn=HP4000-iPrint-1,ou=NewYork,o=Org#1246908645#5
iPrintiCMPrinterList: cn=HP4000-iPrint-2,ou=NewYork,o=Org#1246908645#1
iPrintiCMPrinterFlags: 1

# RJames, NewYork, Org
dn: cn=RJames,ou=NewYork,o=Org
iPrintiCMPrinterList: cn=HP4000-iPrint-1,ou=NewYork,o=Org#1246908645#1
iPrintiCMPrinterList: cn=Xerox5632-iPrint-3,ou=NewYork,o=Org#1246908645#5
iPrintiCMPrinterFlags: 1

Now, once you have your printers created and the PrintManager and Driver Store are loaded and running, run the following command from the OES 2 Linux server to add the users' printers.

ldapmodify -x -h [servername] -D cn=admin,o=org -W -f /path/newyork.in

If you don't get any errors, you can verify the settings in ConsoleOne or through iManager - iPrint User Management.

Removal of the old NDPS printers from the user's local profile is fairly simple. Once your migration is complete, remove the NDPS Manager and the NDPS printers will be removed from the users' workstation the next time they log in. You must still delete the NDPS objects from the tree manually.

The above process for printer query and assignment was developed for our team by Michael Bruner. It is a pretty sweet process and a big time saver.

File System

File system migration is fairly straightforward, keeping in mind some gotchas that could pose a potential hair-pulling session.

  • Make sure that any virus scanning software on both the source and target servers is offline and not running. Although it would catch any potentially infected files, some of its features could prevent the file copy altogether and ultimately cause your migration to fail.
  • Migrate non-user data first. Directories like ZENworks and other static directories can be migrated without worrying about them changing while the migration is in progress. We migrated this data during the day and the user data after hours, which saved a lot of time waiting for 10+ GB of data to copy from one server to the other.
  • Migrate user data separately: perform a preliminary migration during regular hours, then perform a sync of the data after hours to catch any files that might have changed.
  • Once all the data has been migrated to the new OES 2 Linux server, dismount your data volume on the NetWare server. This prevents users from accidentally mapping a drive or keeping a local mount point to the old server.
  • Unless you are doing a Transfer ID migration, change your login script to reflect the new server name once the data is migrated. This also prevents remote access users (VPN, etc.) from logging in to the old server while you are still finishing up. Disabling login on the old server will only prevent you from migrating anything further.

General

Let your users know that the first time they log in the next day, the login will make some back-end modifications and, if you run the ZENworks Novell Application Launcher (NAL), the client will re-cache everything, which is going to take a few minutes. Also, SLP and iPrint will take a few moments to install any printers assigned prior to migration.

If you use rsync for backup, make sure that port 873 is open on the OES 2 Linux server. The errors you get otherwise are confusing and look DNS related when the traffic is simply being blocked at the server.
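A quick sanity check, assuming the rsync daemon is already running on the OES 2 server (host names below are placeholders):

#On the OES 2 server: confirm something is listening on 873.
netstat -tln | grep ':873 '
#From the backup host: confirm the port is reachable through any firewalls.
telnet oes2-server.example.com 873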

Post Migration Issues

There are a couple of things that have been observed post migration, mostly related to the clients, mainly SLP. OES 2 uses OpenSLP, and if you have an UNSCOPED setup you'll need to add the DAs to each workstation's registry, either manually or through an application push. The same goes for the iPrint client. Do these items one to two weeks prior to migrating and you should see no "Tree or Server Cannot be Found" errors the morning after migration to OES 2 Linux.
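On the server side you can at least confirm that the directory agents and the tree are visible to OpenSLP before the cut-over. These are standard slptool queries; service:directory-agent and service:ndap.novell are the usual service types for DAs and eDirectory, so verify they match what your tree registers:

slptool findsrvs service:directory-agent
slptool findsrvs service:ndap.novell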

We did not migrate iFolder, NTP or FTP, as these were not part of our environment; NTP was set up independently when the servers were built.

Conclusion

Migrating to OES 2 Linux has been a wonderful learning experience and also a lot of fun. We had some nail-biting moments and gained far more Linux knowledge than we had when we started. I hope this information is helpful to you in your own migration process.

Enjoy!

Branch Server Redundancy

$
0
0

Structure:

For each branch to have DHCP redundancy, the subnet's address pool must be split. Take the following example for Branch 1.

The subnet is 172.xx.45.0. The existing pool of addresses would be changed to 172.xx.45.50 – 172.xx.45.145. A new subnet object would be created at 172.xx.45.146, with a new pool of available addresses in the range 172.xx.45.147 – 172.xx.45.240. The newly created subnet would be assigned to a DHCP server at our corporate office. Once these modifications are made, DHCP services require a restart on both the Branch 1 and corporate servers. It is normally recommended to use the 80/20 rule for split scopes, meaning 80% of the available addresses go to the primary server and 20% to the secondary server. But since the largest branch has only 38 workstations, it should be safe to assign 50% to each, as there are a little more than twice as many available addresses as workstations.
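For reference, here is a minimal sketch of the same split expressed in ISC dhcpd terms. The OES DHCP server is actually configured through eDirectory (iManager/DNS-DHCP console) rather than by hand-editing dhcpd.conf, and the 172.18.45.x values and the .1 gateway below are placeholders standing in for the masked branch subnet:

# Branch 1 server: lower half of the pool.
subnet 172.18.45.0 netmask 255.255.255.0 {
  range 172.18.45.50 172.18.45.145;
  option routers 172.18.45.1;
}

# Corporate server: upper half of the same subnet.
subnet 172.18.45.0 netmask 255.255.255.0 {
  range 172.18.45.147 172.18.45.240;
  option routers 172.18.45.1;
}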

While this configuration would not make up for a shortage of IP addresses should a large number of workstations be added, it will not make the situation any worse, either. If a server has no addresses left in its scope, it will simply ignore requests from clients for an address. When both servers are up and running, they can both reply to requests if they have available addresses. Likewise, when the servers run out of addresses, they run out. It is no different than how a single DHCP server would operate in this respect.

DHCP Helper:

A DHCP helper address must be configured on the router at each branch to forward DHCP requests to the external DHCP server that holds the second pool for that branch. Client machines broadcast a request for an address; the local DHCP server receives it directly, and the router forwards a copy on to the secondary DHCP server. Since the server on the local network usually replies faster than the remote server, the clients should receive their addresses from the local server. Should the local server run out of addresses or crash, it will no longer service the requests and the clients will receive their addresses from the remote server. Once the local server is back online or has addresses available again, it will begin to service requests once more.
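On a typical Cisco IOS router, for example, this is just a helper address on the branch LAN interface; the interface name and corporate server address below are placeholders:

interface FastEthernet0/0
 ! forward client DHCP broadcasts to the corporate server holding the second pool
 ip helper-address 172.18.1.10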

SLP

It is extremely important that you ensure, without a doubt, that SLP is functioning 100%, either via DHCP or by setting static SLP settings in the Novell Client. It is also extremely important that the Server field under the Novell Client's advanced settings be left blank. If it is filled in with the branch server name, the client will attempt to log in to that server, and if that server is down the login will simply fail. If the field is blank, the client will attempt login to the servers it receives from DHCP, in the order listed. If the branch server is listed first and it is down, the client will then attempt login to the secondary server.

DHCP Options:

DHCP plays a vital role in the failover functionality for the services. The following options should be set in DHCP.

Server Options:

Authoritative   yes
Pool Options:
3	routers		xxx.xxx.xxx.xxx
6	domain-name-servers xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx
15	domain-name	acmeinc.com
78	slp-directory-agent     	xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx
79	slp-service-scope	YOURSCOPE
85	nds-servers	xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx
86	nds-tree-name	ACME_TREE

Option 85 lists the Novell servers to log in to, in the order you would like the clients to attempt login. The branch server's address should be listed first.

Client DNS Redundancy:

Each workstation receives three DNS server entries. If one of the servers is down, the client moves on to the next, so this redundancy is built naturally into the way the client resolves names.

File Access Redundancy:

Each branch server has its user data rsync'd back to NWHQ-PRD-1 on a nightly basis. Should the branch server fail or become unavailable, the user drive mappings can be automatically switched to the NWHQ-PRD-1 server. Which server the client maps to is determined by the login server, and the login server is chosen from the list of servers given out by DHCP. If the branch server is up, it is the first server in the list and the client logs in to the branch server; if the branch server is down, the client logs in to NWHQ-PRD-1. There is a built-in login script variable called %FILE_SERVER, which is populated based on the server the client logged in to. Therefore, we can create directives in the login script like the sample that follows:

IF "%FILE_SERVER" == "LATL-PRD-1" THEN
    MAP ROOT U:=\\LATL-PRD-1\VOL1\USERS\%CN
ELSE
    MAP ROOT U:=\\NWHQ-PRD-1\VOL1\ATLANTA\USERS\%CN
END

This can be repeated for each drive to be mapped.

If a branch is down for an extended amount of time, a "manual" reverse rsync should be performed to copy the files that were updated on NWHQ-PRD-1 back to the branch server.
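A minimal sketch of such a reverse sync, assuming the NSS volumes are mounted in the usual /media/nss locations and that ssh access between the servers is allowed (host names and paths are placeholders, and note that a plain rsync does not carry NSS trustee assignments):

#Run on NWHQ-PRD-1 once the branch server is reachable again; drop --dry-run when the file list looks right.
rsync -avz --dry-run /media/nss/VOL1/ATLANTA/USERS/ root@latl-prd-1:/media/nss/VOL1/USERS/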

Printing:

As a general rule, printing is not as much of a show stopper as the other items mentioned above. If there were a complete hardware failure of the branch server, and it appeared that the downtime would extend beyond a day, the print services could be re-created on another server and the users could manually install the printers through the iPrint web page on that server until the branch server could be brought back online.

Base64-to-hexadecimal converter

$
0
0
license: 
GPL 3.0

Earlier, a need came up to convert base64-encoded values to the corresponding hexadecimal string, and it had to be done multiple times from the command line in a scripted fashion. Unfortunately the latter requirement prevented me from using a website that does the conversions (at least, not without a bit of work), as I usually use this one:

http://home2.paulschou.net/tools/xlate/

So off to Perl to see how quickly I could conjure something up. With the help of the pre-installed MIME modules for Perl, and some tips from Google on the best way to print data in its hex form, this little script came into being. It is not an especially high performer, as it only takes one string per invocation, but it could easily be adapted to read from another source and process multiple values without loading a new Perl environment every time. In the meantime it either takes base64-encoded values on STDIN or it will prompt for them. Everything except the actual hash is written to STDERR, so you can filter that out and process from there if needed.

ab@mybox0: ~> echo 'RR19QlE2ED2IDSMhVKKjc0mpe4Lxx7nhHgyXBQ1wbKY=' | ./base64tohex.pl

ab@mybox0: ~> ./base64tohex.pl RR19QlE2ED2IDSMhVKKjc0mpe4Lxx7nhHgyXBQ1wbKY=
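If you just need a quick one-off and the script is not handy, a reasonably recent coreutils plus vim's xxd can do the same conversion from the shell (od -An -tx1 works similarly if xxd is not installed):

echo 'RR19QlE2ED2IDSMhVKKjc0mpe4Lxx7nhHgyXBQ1wbKY=' | base64 -d | xxd -p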

Attachment	Size
base64tohex.pl.txt	635 bytes

LJDT: The 'watch' command

$
0
0

Do you ever find yourself running a command, pressing the up-arrow, then [Enter], then the up-arrow, then [Enter], then the up-arrow.... this is a great exercise that will get your fingers used to typing random odd key sequences without your eyes paying attention, but otherwise it's a complete waste of time. A command I have come to love helps me point my OCD tendencies to other pointless tasks while running commands over and over for me. Introducing 'watch', because Linux Just Does That.

The first time this command saved my life was when I ran a command and had to wait several seconds to several minutes for a port to start listening for incoming data. A LAN trace would not satisfy the task... I had to use netstat over and over and over just to see if the TCP port would ever listen properly (no alternatives for seeing the list of open ports at the time). So off I went with the following command:

#get the output from netstat -anp (all output, numerically, showing process information for anything that is found so I know what is actually using each socket),
#then redirect away any errors because I'm not 'root', and finally grep for 'LISTEN ' which gets me just TCP output.

netstat -anp 2>/dev/null| grep 'LISTEN '

Up-arrow, [Enter], up-arrow, [Enter].... eventually I broke down and wrote a really dumb loop:

#Same as above, once per second.
while [ 1 ] ; do
  netstat -anp 2>/dev/null | grep 'LISTEN '
  sleep 1
done

This worked well enough, but it made my screen scroll indefinitely and it still required the ability to write a loop. So somebody pointed out a command called watch which, surprisingly enough, lets you "watch" the output of a command over time. Give it an interval and away you go:

#Same as above, but easier and nicer formatting.

watch --interval=1 "netstat -anp 2>/dev/null | grep 'LISTEN '"

The output is something like the following:

Every 1.0s: netstat -anp 2>/dev/null | grep 'LISTEN '                                      Tue Sep 22 13:46:02 2009

tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      -
tcp        0      0 :::111                  :::*                    LISTEN      -
tcp        0      0 :::22                   :::*                    LISTEN      -
tcp        0      0 ::1:631                 :::*                    LISTEN      -

Another example is watching for a file to change sizes. When waiting for an application to write something to a file the same thing applies... either I can use the up arrow to its destruction or I can just watch the output of a terminal that is regularly refreshed.

watch --interval=1 'ls -l /path/to/some/file.log'

Notice that not only does it show the output (once, not over and over), which is nice, it also shows the command that is running, how often it is running, and a current timestamp indicating the last time the command was executed (within the last second or so with this particular interval). By default 'watch' runs the specified command every two seconds instead of every one, but here I wanted up-to-the-second output. The command(s) specified can be fairly elaborate: anything executable on the system will do. Like a good command it works within a 'screen', so you could have one of these running perpetually, watching for certain data in certain files and acting on it, and you could connect to and disconnect from it as often as you like. The 'watch' command comes with the 'procps' package on SUSE, the same package that brings a few other useful tools including, but not limited to, 'ps', 'w', 'vmstat', 'free', and 'pgrep'.
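A detached, named screen running a watch perpetually might look like the following sketch (the session name and command are arbitrary examples):

#Start a detached screen session named 'portwatch' that refreshes the listener list every 2 seconds.
screen -dmS portwatch watch -n 2 "netstat -anp 2>/dev/null | grep 'LISTEN '"
#Attach to it whenever you like; detach again with Ctrl-a d.
screen -r portwatch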

There are other ways to write little 'watch'-like loops, as shown above, but this is obviously something people stumble on and want to do regularly, so do it the quick and easy way and use the right tool. Loops are decent tools, but a command that meets your exact need is a great tool, and having a wide array of great tools is what Linux Just Does.

How to Add Additional Perl Modules to Your Linux Installation

$
0
0

Depending on whether you're using openSUSE, SUSE Linux Enterprise Desktop, SUSE Linux Enterprise Server or Open Enterprise Server, and which version you're running, you may be lucky and find that you can easily add the required Perl module via YaST.

Here packages are named perl-<module>, where double colons (::) are replaced by a hyphen (-); for example, the Net::SNMP module can be added by installing the perl-Net-SNMP package.

However, the majority of Perl modules are not available through YaST, so they have to be added another way. Fortunately Perl on Linux provides a way of automating the manual process.

As root run

# perl -MCPAN -e 'shell'

If it's the first time this has been done you need to answer some questions - you can either choose to autoconfigure these or run through manually and choose defaults. If you get prompted for a CPAN mirror URL use something suitable from http://search.cpan.org/mirror

Once done you can then install the additional modules - case is important!

For example, to install the Net::SMTP module do

cpan[#]> install Net::SMTP

at the end of which, if this is the first time using the shell, it will probably note that there's a newer CPAN so then

cpan[#]> install CPAN

which may take a while!

Afterwards it will prompt you to reload CPAN so

cpan[#]> reload CPAN

To exit

cpan[#]> quit
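As a side note, once CPAN is configured the same installs can also be run non-interactively, which is handy in scripts; a minimal example (the module name is just an illustration):

# perl -MCPAN -e 'install Net::SNMP'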


ECMAScript/JavaScript Development Without a Web Browser

$
0
0

Have you ever been coding Java and suddenly wanted to do something using loosely-typed variables or worked out an issue with a little less Java-ness? Have you ever wanted to take advantage of the familiarity people have with some languages (ECMAScript/JavaScript) while still using something that is not a web browser as an environment? Have you ever wanted to debug your JavaScript without refreshing a web page and your cache that just won't seem to ever refresh properly for you? This and more will be covered today with practical examples in both the Novell Identity Manager (IDM) and Novell Sentinel applications plus any other Java-based applications that you may have around your own environment.

I think it may be prudent to review briefly what everything we'll be discussing really is. For example, the title mentions ECMAScript and JavaScript together almost as if they are the same thing; that is actually a nice little coincidence as they are the same thing. ECMAScript is the proper name for the language standard as approved by (you guessed it) ECMA but JavaScript is the common name of the language that we all know and love. JavaScript has the word 'Java' in it but another common clarification is that JavaScript is not, in any way, directly related to Java. It is not meant to be a way to script Java and has no ties to the Java language with regard to data types, interpreters, etc. That it is very similar to Java syntactically is completely unrelated to its name; it is as much similar to C++, PHP or other C-ish languages. So as a recap ECMAScript is JavaScript and neither is related to the language and binaries from Sun known as 'Java'.

Another notable term used throughout this document will be 'Rhino' which does not refer to the large mammal with horns on its head but rather to the Mozilla project which makes a JavaScript interpreter available within a Java environment. At some point later I'll also mention Eclipse, the open source Integrated Development Environment (IDE) originally from IBM. Finally the 'Aptana' plugins for Eclipse make the IDE a bit more friendly. They are not required for anything specifically to work but they are recommended when doing Sentinel SDK development which is largely based on JavaScript. All of these tools and technologies are (in some form or another) free for use and most if not all are also open source.

I'm using a few examples from other sites I found online in preparing this which you may want to review for second opinions and clarifications on the topics covered herein. I'll try to mention them as I borrow from them but they include W3Schools, Jo Edkins' JavaScript tutorial, and the Rhino project page. Later as we get into IDM and Sentinel-specific tasks we'll be doing things that may reference the Sentinel SDK specifically.

In the introduction I mentioned a few reasons for going into browser-less JavaScript. Despite my disclaimer earlier that JavaScript and Java are not related (that is true: they are not) they can be used together thanks to the project known as Rhino. Rhino comes in the form of a JAR file which can be added to a Java application's classpath statement. Once there then the ability to run JavaScript code within Java (with some limitations) is present and makes up the rest of this document. There are some notable differences between JavaScript in a browser and JavaScript within Rhino that will be obvious to anybody who has used JavaScript before.

First, there is no Document Object Model (DOM) with which to interact. For example, 'document.write("stuff");' completely fails because there is (by default) no 'document' object. Other little debugging tricks like using 'alert' also fail. On the good side, though, there is the ability to run JavaScript interactively which can be nice for quick development and immediate feedback on simple problems with syntax or other runtime issues (invalid objects, methods, etc.). Instead of adding an alert statement a developer can simply use the print() method and put the data to be printed in there and see it without refreshing a browser or saving a file. I also mentioned the possibility of using what is essentially becoming a commodity programming language; everybody who has done basic web development has come across JavaScript and likely written a little bit. The syntax is familiar and examples of ways to do many things are plentiful. JavaScript is at the heart of AJAX and large frameworks such as the Google Web Toolkit (GWT). Needing to hire a ten-year-veteran of a proprietary programming language is not necessary for simple development work.

Let's dive right in. I've debated putting the JavaScript stuff before/after the Rhino stuff and have decided to put JavaScript first as that is what this is primarily about. If you want to get the Rhino side of things setup it is trivial and will be covered below so jump down, get that, then come back to walk through everything on your own computer as you read it. I will be separating out code examples so it should be easy to read this then setup the environment, and just run code like crazy later on. First the basics of JavaScript are similar to other programming languages you may have used. There are operators similar to those you have probably used before, variables, conditionals, loops, and JavaScript is also object-oriented. Unlike Java and some other languages the variables can change data types or be treated as different data types very easily. In the case of the Rhino implementation of JavaScript any classes available to the Java environment running Rhino are also available within JavaScript as will be shown later.

Variable declaration takes place with a name made up of alphanumeric characters plus underscores (A-Za-z0-9_), and variables must start with a non-numeric character from what I've found. Unlike some other languages such as Perl or PHP, there is no character preceding a variable to mark it as a variable. A variable's scope can either be limited by using the 'var' keyword, or the variable can be globally scoped by simply declaring and instantiating it without the 'var' keyword. Declaring a variable just means telling the interpreter that the variable exists; initializing it means not only telling the interpreter that the variable exists but also giving it a value, which is then set into the variable. Declaring a scoped variable named 'myUserName' and initializing it to the value 'ab' all in one command would look like the following:

var myUserName = 'ab';   //Declare 'myUserName' and set it to the string value 'ab'.

var myAge = 30;    //Declare the 'myAge' variable and initialize it to the numeric value 30.

var isCool = true;    //Declare and initialize isCool to the boolean value true.

Note in the examples above that the first one included single quotes around the value 'ab' while the others did not have quotes around numerics or booleans. Like many languages, quotation marks should be used around strings but are not needed for numeric or boolean data types. Something else potentially of note is the use of comments at the end of the line. As in Java, C and other languages, single-line comments start with two slashes (sometimes known redundantly as forward-slashes) and comment out the line from the point of the two slashes until the next line starts. Multi-line comments are also similar to C: they start with /* and end with */ and everything in between is a comment. Also note the use of whitespace. Between keywords and variables ('var' and 'myUserName') whitespace must exist, but around the equal sign and between the semicolon and the comments the whitespace is optional and just there for readability. This brings up another point: newlines do not end a statement in most cases, as semicolons do that. In the example above newlines were needed to end the comments, but otherwise the three variable initializations could look like the following and have exactly the same outcome:

var myUserName='ab';var myAge=30;var isCool=true;

Moving along it is possible to declare but not initialize a variable by simply using the 'var' keyword with the variable name (or names) and then a semicolon to end the line. The three variables above could be declared in the following way though they would all be null:

var myUserName, myAge, isCool;    //Declare but do not initialize explicitly.

In many languages declaring a variable without initializing it sets the value to a null value or a similar default and JavaScript is no exception to that convention. Because I was taught to always initialize variables when declaring them (to keep from getting frustrated when using languages that do not automatically initialize variables for you, such as C++) all of the examples here will declare and initialize variables and I encourage that behavior as it is a good practice overall.

So we have variables, let's do a little bit with them (I recommend having them set for these examples as it is more fun than using uninitialized variables). First, let's do the basic printing stuff:

print('Hello World!');   //Basic Hello World program... a one-liner in JavaScript; doing this in a browser would simply replace 'print' with 'alert'.
print(myUserName);  //Print whatever is currently in the myUserName variable.  Notice there are no quotes around the characters that make up the variable.
print(myAge);        //Print '30' to the screen.
print(myUserName + ' ' + myAge);    //Print 'ab 30' by concatenating myUserName's value with a single space and then with myAge's value.
print('It is a true statement that ' + myUserName + ', who is age ' + myAge + ', has an isCool variable set to ' + isCool + '.');

Okay we have beat these variables and printing commands to death and it all seems to work. Let's play with operators besides the assignment operator (also known as the equals sign). So far we have just set variables to a value and that is it. It is time to change the values. You have already seen that '+' can join strings (concatenate) together but it is also available for arithmetic when numeric types are used exclusively:

myAge = myAge + 3;   //myAge should now be 33 if it was 30 before
myAge += 3;    //Functionally equivalent to the line above, but shorter.  myAge is now 36.
myAge /= 12;   //Divide myAge by twelve and set the result back into myAge... should be 3 now.
myAge *= 12;   //Multiply myAge by twelve and set the result back into myAge... should be 36 again.
--myAge;   //Decrement the value of myAge by one and set it then returning the value (35);
myAge--;   //Return the value of myAge and THEN decrement it by one so the NEXT time myAge is called it will show the value of 34.
myAge++;   //Increment the value of myAge after returning the current value (34).
++myAge;   //Increment the value of myAge and return the value (36);
myAge = 0XFF;   //Set myAge equal to 0XFF which is a hexadecimal representation of the decimal number we all know and love as 255;
myAge = 3.00e2;   //Use scientific notation to set myAge to 300.

Most of these examples apply to just about any programming language you can find so let's move over to arrays. An array is a set of multiple values all available via a single variable name. The different values are accessed individually by using an index or an offset from the variable name itself. For example the zeroth offset [0] from 'myFavoriteThings' will be 'piano' while the first offset [1] will be 'book'. See below for the code examples:

var myFavoriteThings = ['piano', 'book'];   //Initialize the myFavoriteThings array with two values.
var myFavoriteThings = new Array('piano', 'book');    //Functionally equivalent to the previous initialization of the array.
print(myFavoriteThings[0]);   //Print 'piano'
print(myFavoriteThings[1]);   //Print 'book'
print(myFavoriteThings[2]);   //Print 'undefined' since this index does not reference anything that is as of yet defined in the array.

Something nice about a few of these languages is that Arrays are defined as 'sparse' arrays meaning that they only allocate memory for elements which have data in them. For example you could define an array with ninety-nine slots (places, indexes, offsets, whatever) for strings but only those indexes which were populated would actually take up space. The arrays in JavaScript (and other languages too) are also variable in size allowing you to add or remove elements on the fly without redefining the array's size in a static way which is very nice.

Another note that always makes me happy is that JavaScript arrays are naturally 'associative' arrays which means that the index does not always need to be just a simple integer but can be a 'key' which then has a 'value' referred-to by the key. Initializing an associative array is a little different but still simple to understand; notice the use of the braces around the key/value pairs instead of brackets around the values alone:

var myFavoriteThings = {'instrument':'piano', 'media':'book'};
print(myFavoriteThings['instrument']);   //Prints 'piano'.
print(myFavoriteThings['media']);   //Prints 'book'.

The initialization sets up the associative array (Perl programmers will recognize the term 'hash' meaning the same thing) of myFavoriteThings with a key of 'instrument' which refers to the value 'piano' and the key 'media' which refers to the value 'book'. Just to mention a quick example about multi-dimensional arrays JavaScript supports these as well and the syntax just nests what we have already learned to provide the functionality. This gets confusing for beginners so touching on this is probably all I'll do:

var myPets = {'cats':['rosco', 'tiger'], 'horses':['mikey', 'shasta']};    //Initialize the myPets associative array with a 'cats' key referring to an array of names of cats, and a 'horses' key referring to an array of names of horses.
print(myPets['cats'][0]);    //Prints 'rosco';
print(myPets['horses'][1]);     //Prints 'shasta';

With the basics of variables, arrays, and operators under our belt we can start doing fun things like conditionals and loops. The basic conditional executes code based on conditions that are checked. For example writing code to tease somebody for being "old" is trivial:

if(myAge > 29) {    //Check for the variable 'myAge' being greater-than the numeric value 29.
  print('Oh boy you sure are old....');    //Print a message
} else if (myAge > 25) {
  print('The joys of middle age... hopefully you are no longer dumb and acting like it but are still not too old either.');    //If the previous if statements haven't matched try this one.
} else {    //If the previous if statements did not match then the 'else' branch is used instead, assuming an else branch exists that is....
  print('Whippersnapper!  Get off my lawn.');
}

For a lot of possibilities for a single 'if' statement sometimes a 'switch' or 'case' statement is advantageous.

myResponse = 'optimistic';
switch(myResponse) {
  case 'pessimistic':
    //do something here for pessimistic responses
    break;   //skip to the end
  case 'optimistic':
    //do something here for optimistic responses
    break;   //skip to the end
  default:
    //Do whatever you want here for those times the previous cases were not met.
    break;
}

After conditionals are understood loops seem like a good place to venture next. We'll go over the 'for' and 'while' loops briefly. The 'for' loop starts out and typically sets an initial value for a counter/iterator, then sets a condition for how long the loop should run, and finally ends with a statement about how to increment the counter or iterator. All of the code within the braces is then run as long as the condition within the definition of the 'for' loop is true:

//Setup ctr0 to the value 0 and loop while ctr0 is less-than 10 incrementing it by one (++ctr0) at the start of each iteration.
var factorial = 1;
for(var ctr0 = 0; ctr0 < 10; ++ctr0) {
  print(ctr0);    //Print the current version of the ctr0 variable
  if(ctr0 != 0) {
    factorial *= ctr0;
  }
}
print(factorial);

The loop above combines a loop and a conditional as well. It initializes ctr0 to 0 and then loops while that variable is less than 10, incrementing it by one every iteration. Inside the loop it prints the value of ctr0 and also, if ctr0 is not equal to 0, multiplies the current 'factorial' variable (which was originally initialized to 1) by it; after the entire loop is completed it prints out the factorial. Functionally this is a little script that, with a lot of extra output, gives us the value of 9!, or nine factorial (the product of all integers from one to nine), which is 362880. Another kind of loop is the while loop, which leaves control of counters or iterators to you and just loops while the user-defined condition at the beginning is true:

//Another factorial generator, though this one stops when the product reaches a certain value rather than when the factors reach a certain value.
var ctr0 = 1;
var factorial = 1;
while(factorial < 320000) {
  factorial *= ctr0;
  ++ctr0;
}
print(factorial);

Another type of loop that is very handy is one that loops through values of an object. Remember those arrays? What if you wanted to print everything in an array? Turns out that is also quite easy:

//Print all of the pet types.
for(var onePet in myPets) {
  print(onePet);
}

Note that if you still have the multi-dimensional array setup from above this only prints the keys for the first array and does not go into the nested arrays or other data structures. This is typically useful as you could easily get into those now by nesting another loop within the first loop that went through all of the values for any object found during this loop's iteration.

//Print all of the pet types and then the pets.
for(var firstElement in myPets) {    //Loop through all of the types of pets as those are the keys for the myPets associative array (hash).
  print(firstElement);   //Print this type of pet.
  for(var secondElement in myPets[firstElement]) {    //Loop through the keys of the nested arrays of the top arrays to get the names of the type of pet through which we are currently looping.
    print('  ' + myPets[firstElement][secondElement]);    //Print a couple spaces and then the name of the current pet.
  }
}

So with very little code we can get a list of everything within nested associative arrays.

After about ten minutes of programming (if that) you will probably find that you do the same things over and over. For example you probably print a lot, you may want to handle data types or objects in certain ways, or whatever. Programming has the concept of functions or methods, which (typically) take parameters and return a value or some output after doing a specific bit of processing on the available data. This is useful as it encourages code reuse, so you are not writing the same ten lines of code thousands of times in a program or script and then having to modify all of those thousands of instances the second you find a glitch in your logic or your requirements change. Creating a method (function) in JavaScript is easy and it acts a lot like a variable. A method has a keyword ('function') followed by a name (myFunction) and then the data which are used within the method. In the case of a method the code is also executed when the method is called, but otherwise methods are very similar to variables:

//Define a method named addTwoNums which, when sent two values, adds them together and returns the result.
function addTwoNums(num0, num1) {
  return num0 + num1;
}

The sample method above could be called like the following, which I expect would end up printing the value 25 to the screen:

print(addTwoNums(18, 7));   //Print the sum of 18 and 7 to the screen.
print(addTwoNums(12345, 678910));   //Print the sum of '12345' and '678910' which is 691255.
print(addTwoNums(myUserName + ' has isCool set to ', isCool));   //Concatenates because of loose interpretation of data types in JavaScript and the overloaded '+' operator which also concatenates data.

Another benefit of using methods/functions is you can describe complicated code in a simple fashion. As an example it is easier to name a method something about returning a random integer from zero to four billion and change than it is to figure out what the code does sometimes:

//Method to get a random integer from zero to 2^32.
function getRandomInteger() {
  return (Math.floor((Math.random()*(Math.pow(2, 32)))));
}

print(getRandomInteger());

Finally this is a nice introduction to objects. In the previous example I made three calls to something called 'Math' which is a math object in JavaScript that is just waiting to be used for fun things that otherwise are not always fun to code. Specifically I used Math.floor() (round to the next integer below the value passed into the method), Math.random() (returns a decimal number between zero and one) and Math.pow() which is used for getting the results of one number taken to some power (two to the third power, for example, is eight). The Math object also returns results for things like pi (3.14159...) and other numbers that are fairly constant in the universe. It includes ways to calculate the sine, cosine, and other geometric things. Objects, in the software sense, are instances of certain classes and therefore take on the attributes and methods of those classes. The Math object exists to provide calculated results to the programmer as shown in the previous example. In JavaScript it is possible to create classes for your own purposes. A common class example is the 'Person' class. Instance variables often include things like eye color, first name, government identification number, e-mail address, etc. Instantiating an object named 'johnSmith' of the Person class could then have the variables set with data specific to John Smith. Benefits of doing so mean the ability to contain all of the attributes related to a single object all in one place instead of trying to maintain different arrays of the data for the object.

One thing that is interesting about variables in JavaScript is that you do not need to declare them as a specific object type (class) for them to be treated like a certain class. As an example, any old string variable can immediately use the methods of the String class. Consider the following example, where a simple string is created out of thin air and then the toUpperCase() method is used to return the string all in upper case while the length attribute returns the length of the string:

var myUserName = 'ab';    //Initialize the variable.
print(myUserName.toUpperCase() + ' is ' + myUserName.length + ' characters long.');    //Print a sentence showing the string all in upper case and the length of the string.

Other methods of the String class include indexOf(), which returns the offset of a given substring, and match(), which lets the programmer test whether the string matches a given pattern known as a regular expression. Some examples follow:

print('"b" is found in the myUserName variable at index ' + myUserName.indexOf('b'));    //Print a sentence along with the offset of the 'b' character.
print('"b" will be replaced by " qwerty" as a result of this example');
print(myUserName.replace('b', ' qwerty'));   //Replace 'b' with ' qwerty' and print the resulting 'a qwerty' but do not set myUserName to this new value.

As another example of a built-in object we have the Date object which, as you probably guessed, deals with date and time functions. With this we can get the current time as it is set in different timezones around the world, in different formats (two and four-digit years for example), and other different ways. A date object can be set to the current date (and is by default) but can also be set to another arbitrary point in time. By comparing a Date object initialized before an operation and then after the operation is completed the time taken to perform the operation can be ascertained. As an example I'll create a Date object, count for a while, then create another Date object, and compare them to see the number of milliseconds between them:

var ctr0 = 0;
var myDate0 = new Date();      //Create an object to help timing a counter from zero to some number specified in the for loop.
for(ctr0 = 0; ctr0 < 1000000; ++ctr0) { }
var myDate1 = new Date();      //Create an object at the completion of the counting operation.
print('Counting took ' + (myDate1 - myDate0) + ' milliseconds.');   //Print the time taken in milliseconds.

The resulting text on my laptop was 'Counting took 139 milliseconds.' Some other basic calls to the Date object will show the month, day, year, hours, day of the year, seconds since 1970-01-01, etc. One really neat property of JavaScript objects is the ability to convert them back to the source which created them. For example the following calls show the code to create the myDate0 and myDate1 objects above:

print(myDate0.toSource());    //Show the source of the myDate0 object.
print(myDate1.toSource());    //Show the source of the myDate1 object.
print(Date().toSource());    //Show the source of a new Date object as it would be created right now.

The result of the three commands above (for me right now) follows:

(new Date(1257391274428))
(new Date(1257391274567))
(new String("Wed Nov 04 2009 20:31:03 GMT-0700 (MST)"))

One value here is that an object could be saved out to a string and read in by another script or a later incarnation of the same script. It also aids in learning about the objects' creation as well as seeing how the objects are currently setup. Also remember all of those Daylight Savings Time perils that crop up every year or so, or maybe every few months? Creating a Date object that is powered by Java and therefore knows everything Java knows (as updated by tzupdater) is trivial and seeing what the time is per that Java invocation does not require coding anything in pure Java then compiling and running. Just create a date object and see what the time is relative to GMT, or see how the date object looks in its source to see if the current time is inside the range of dates for daylight savings time (MDT for Mountain Daylight Time) or outside that range (MST for Mountain Standard Time). As any old Java implementation can be used to load the Rhino environment in which we do all of this current stuff checking one Java implementation and then another is trivial.

So now we have a basic overview of the JavaScript language it is probably a good idea to talk about how to do this outside a browser finally. It's time to show off the Rhino environment in which all of these examples are currently running. In order to get Rhino working on your system one simply needs a Java runtime installed (chances are one is already there, but if not get one from http://java.sun.com/ or your system's favorite installation source) and then the JAR file from the Rhino project that provides the desired functionality. With Java (including the 'java' executable in your user's PATH) and the JAR in place the following command should get things going for you.

java -jar /path/to/js.jar
Rhino 1.7 release 2 2009 03 22
js>

As you can see the prompt is now changed; here we can type or paste input and have it executed in realtime. As an example of both the input and the output:

js> var ctr0 = 0;
js> var myDate0 = new Date();      //Create an object to help timing a counter from zero to some number specified in the for loop.
js> for(ctr0 = 0; ctr0 < 1000000; ++ctr0) { }
js> var myDate1 = new Date();      //Create an object at the completion of the counting operation.
js> print('Counting took ' + (myDate1 - myDate0) + ' milliseconds.');   //Print the time taken in milliseconds.
Counting took 823 milliseconds.

To get started, call the help() method, which lists some built-in commands that will help over time. I have called it in the example below to give some idea of the output and the options available by default beyond what was covered above:

js> help()

Command                Description
=======                ===========
help()                 Display usage and help messages.
defineClass(className) Define an extension using the Java class
                       named with the string argument.
                       Uses ScriptableObject.defineClass().
load(["foo.js", ...])  Load JavaScript source files named by
                       string arguments.
loadClass(className)   Load a class named by a string argument.
                       The class must be a script compiled to a
                       class file.
print([expr ...])      Evaluate and print expressions.
quit()                 Quit the shell.
version([number])      Get or set the JavaScript version number.
gc()                   Runs the garbage collector.
spawn(arg)             Evaluate function or script name on a new thread
sync(function)         Creates a synchronized version of the function,
                       where the synchronization object is "this"
readFile(fileName [, encoding])
                       Returns the content of the file as a string.
                       Encoding of the string can be optionally specified.
readUrl(url [, encoding])
                       Similar to readFile, reads the contents of the url.
runCommand(name ...)   Runs a specified shell command. Additional arguments are
                       passed to the command
seal(args ...)         Seals the supplied objects
toint32(arg)           Converts the argument into a 32-bit integer
serialize(obj, fileName)
                      Serializes an object and saves it to a file
deserialize(fileName)  Reconstructs a serialized object
environment            Returns the current environment object
history                Displays the shell command history

Some of these you should know already, such as print(). Another that will stand out to Java aficionados is gc(), which lets you call the garbage collector to force memory to be cleaned up. quit() is also useful; you MUST have the parentheses () at the end of all of these, including quit(), in order to invoke the method, or else you will simply get the definition (code) of the method back, which isn't that helpful when trying to close your JS environment. Also included is readUrl(), which reads the URL given to it and returns whatever it finds there, which could include remote JS code, XML for parsing, or anything else like that. The following command, for example, pulls Google's homepage HTML into the googleHTML variable:

var googleHTML = readUrl('http://www.google.com/');

The load() method will also read in local files as JS code, so you can include functions from other files to keep your code modular and flexible. Also recall that this entire environment is running within Java, so we still have access to any classes available to the JRE as long as it can load them and they are referenced within JS. Making other classes available is simply a matter of modifying the calling JRE's classpath to include the paths to the relevant classes; the 'java' command accepts the '-cp' parameter followed by a colon-delimited list of JARs or class directories. Once that is done the JS environment must still be told to make those packages available in one way or another. The following are examples that should work, assuming the necessary JAR files are present for the non-native Java classes:

importPackage(java.lang);   //Import the java.lang package so classes within this package are available.

importPackage(Packages.com.novell.xml.util);    //Import the com.novell.xml.util package; notice Packages must lead the actual package when the package itself is not a part of the java.* hierarchy of packages.

Once these are executed the following commands also work taking advantage of native Java within the JavaScript:

var mylong0 = java.lang.Long.parseLong(binstring, 2);
var base64c = new Packages.com.novell.xml.util.Base64Codec();

With that the wide world of Java is instantly available in a scripting fashion. Notice that 'mylong0' and 'base64c' are declared as one type or another in JavaScript though they refer to specific objects of certain classes in Java. As you have probably noticed the second command in either of the two preceding code blocks are using a class that comes from Novell. The reason for this is that these are part of the Novell Identity Manager set of classes. Originally when I was told how to use Rhino it was in the context of IDM which supports ECMASCript along with the more traditional DirXML/IDM policy and XSLT. A quick little shell script pulls in all of IDM's basic classes and launches the Rhino shell in one quick step. That script's code follows:

#!/bin/bash
#Script from development to load in the IDM environment for Rhino testing; saved as ~/bin/dxrhino.
#Requires rlwrap as well.  Takes one value (JavaScript I believe) as a parameter.
PATH=/opt/novell/eDirectory/lib/nds-modules/jre/bin:$PATH

CP=
for i in /opt/novell/eDirectory/lib/dirxml/classes/*.jar
do
    CP=$CP:$i
done
rlwrap java -cp $CP com.novell.soa.script.mozilla.javascript.tools.shell.Main "$@"

Notice that the PATH and CP variables are set to include a root-based installation of eDirectory's IDM engine classes. These can be modified to suit your needs. The most-important part is that a valid Java environment is used (which PATH helps with above) and that the JAR files are included (as loaded in via CP above). One other note is that rlwrap is included on that line where Java is called; among possible other things this lets the up-arrow key work properly to move back to previous commands entered in Rhino. If this is not there it is likely that you will experience garbage characters at the JS prompt when trying to use the up arrow to go to previous commands. rlwrap is trivial to add to the system even if it is from source so I would add it to any system that will be used even remotely. My compiled build shows version 0.30 in case that helps.
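If you are using a stock Rhino download rather than the IDM-bundled copy, the same wrapper pattern applies; here is a minimal sketch with placeholder paths (the shell class inside the standard js.jar is org.mozilla.javascript.tools.shell.Main):

#!/bin/bash
#Launch the standard Rhino shell with any extra JARs you want on the classpath.
rlwrap java -cp /path/to/js.jar:/path/to/extra.jar org.mozilla.javascript.tools.shell.Main "$@"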

In a later article I'll be discussing some in-depth uses of this technology both within the Novell Identity Manager and Novell Sentinel-based products along with other little scripts that can now be easily created using JS and Java. If you have not done so by this point set up what has been discussed and see what potential uses apply to your own environment. The environment is painless to setup and works reliably thanks to Java.

Show All SLP Services in Linux

$
0
0
license: 
free
home page url: 
www.identitymgmt.net

The Problem

Back in the good old days of NetWare, if you wanted to see all registered services in SLP, you'd simply type: display slp services and the full list would appear on your screen.

Fast forward to Linux and OpenSLP and you can't do that very easily. Using a combination of slptool switches, you can eventually get all the same information, but it is cumbersome and time consuming.

The Fix

The following script was written to make that job simple and quick. It will write all the services to your screen and allow you to scroll up and down the list at will. It will also write the results to a file in the /tmp directory so you can look at them again without having to run the script a second time. This also allows you to quickly grab the results from multiple computers for comparison.

Just copy the script to each server you'd like to run it on. I recommend putting it somewhere in your PATH, such as /usr/local/bin. Then make the script executable (chmod +x slpshowall.sh) and you are ready to run it.
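
The complete script is attached below. As a rough sketch of the general approach (not the attached script itself, which may differ in detail), something along these lines would do the job, assuming slptool and a pager such as less are available:

<code interpreter='/bin/bash' scriptfile='slpshowall-sketch.sh'>
#!/bin/bash
# Sketch only: the attached slpshowall.sh may differ.
# Gather every registered SLP service into a file under /tmp,
# then page through the results on screen.
OUT=/tmp/slpshowall.$(hostname).txt
: > "$OUT"
# findsrvtypes may return a comma-separated list, so split on commas as well
for TYPE in $(slptool findsrvtypes | tr ',' '\n')
do
    echo "===== $TYPE =====" >> "$OUT"
    slptool findsrvs "$TYPE"  >> "$OUT"
    echo                      >> "$OUT"
done
less "$OUT"</code>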

slpshowall.sh
Attachment  Size
slpshowall.tgz  561 bytes

Forcing Printers to Users after iPrint Migration


During the iPrint migration, all of the drivers are migrated, the existing printers are renamed with _NW appended to their names, and the new iPrint printers are created. The users assigned to the old printers are assigned to the new printers as well, but there is a caveat: that assignment only allows the users to install the printers from the web page at http://serveraddress/ipp. It does not force-install the printers for the users. Here is how you can work around this.

First, download the Softerra LDAP Browser from http://www.softerra.com/download.htm. Next, create a connection to an eDirectory server that holds a partition containing the OU where the existing NDPS printers exist. Now we need to export some data that we will manipulate later, so from the export menu select LDIF.

Next, fill out the export form; note that we are exporting only the nDPSPrinterInstallList attribute.

Once the export is complete, open the file in your favorite text editor. TextPad and UltraEdit are two that allow multi-line editing, and this is a requirement for this process.
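
For illustration, a raw export entry might look something like this (the DN and printer names are purely hypothetical; note how LDIF folds a long value onto a continuation line that begins with a single space):

dn: cn=jdoe,ou=users,o=acme
nDPSPrinterInstallList: cn=ACCOUNTING_HPLJ4000_N
 W,ou=printers,o=acme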

The first thing we want to do is unfold these wrapped values by removing the line breaks. A replace operation in UltraEdit handles this: search for a line break followed by a space (and whatever data follows it), and replace that with just the data following the space, so each value ends up on a single line.

Once this replace is complete, some cleanup remains: the file still contains a stray line-break character, which shows up in UltraEdit as a small square. The easiest way to remove it is to highlight that character, press CTRL-R for replace, and replace it with nothing.

Next we need to strip the _NW suffix: another replace, this time literally searching for _NW and replacing it with nothing. Finally, change the attribute name from nDPSPrinterInstallList to iPrintICMPrinterList in the same way.
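
After those three replacements, the hypothetical entry from above ends up looking like this:

dn: cn=jdoe,ou=users,o=acme
iPrintICMPrinterList: cn=ACCOUNTING_HPLJ4000,ou=printers,o=acme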

Now all we have to do is get the information back into eDirectory. Copy the file to your Linux server and run the following command from there.

ldapmodify -x -h eDirectoryServerIP -D cn=admin,o=org -W -f filename

Here -x selects simple authentication, -h is the eDirectory server address, -D is the admin DN to bind as, -W prompts for that password, and -f points at the cleaned-up LDIF file.

Now you should be able to go into iManager, select a user, and see the old printers reflected as new iPrint printers. As long as the iPrint client is already installed, the new printers should be pushed down to users the next time they log into eDirectory.

Watch for another article coming soon on how to use LDAP to set the default printer as well.

SLED11 SP1 Addon CD/DVD

download url: 
http://www.pcc-services.com/sled_rpms.html
license: 
GPL, others
home page url: 
http://www.pcc-services.com/

To help you use SUSE Linux Enterprise Desktop 11 to the fullest, I have created a couple of add-on discs that enhance SLED11 Service Pack 1. The discs include most of the libraries needed for full multimedia playback, along with many of the applications I use on a regular basis that are not available on the official media.

You can download the addon discs at:
http://www.pcc-services.com/sled_rpms.html
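
Once you have downloaded an ISO, one simple way to make its packages available (a sketch only; the ISO filename, mount point, and repository alias are placeholders, and YaST's Add-On Products module works just as well) is to loop-mount it and register it as a local directory repository, as root:

mkdir -p /mnt/sled11-addon
mount -o loop sled11-sp1-addon.iso /mnt/sled11-addon
zypper ar dir:///mnt/sled11-addon "SLED11-SP1-Addon"
zypper refresh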

Getpass 2.1 - Universal Password Retrieval Utility **UPDATED**

license: 
GPL

This Linux utility retrieves a user's password from eDirectory and displays it in plain text.

This tool has been updated to work on all versions of SLES without library conflicts.

** Now works on both 32-bit and 64-bit installations **

The 2.1 release of this program also features a rebuilt installer system: it will not overwrite existing libraries on your system, and you can choose your desired installation path.

This tool is very handy for verifying password synchronization in an IdM environment, migrating from one tree to another, etc.

As the author, I must ask that you refrain from using this tool in a malicious manner. I am not responsible if you violate company policies and get fired; you alone are responsible for your actions.

The libopenssl-devel package is required; it can be found in the SLES11 SDK repository (the equivalent package on SLES10 is openssl-dev).
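
Assuming the SDK repository is already configured, installing the prerequisite on SLES11 is a one-liner (package name taken from the note above):

zypper install libopenssl-devel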

If you have any questions regarding this utility or its use, please contact me via e-mail.

Attachment  Size
getpass-2.1-2010-07-28.tgz  3.47 MB

