Tuesday 11 December 2012

How to calculate the total number of FireWall Logs per second

### Posting this here for the time being since the support site's SK is broken.

Edit: Not sure why CP documents this as three separate commands... just copy/paste this one-liner and you'll get your numbers (it sleeps for 120 seconds):

SLEEP_TIME=120;SIZE_BEFORE=$(ls -l $FWDIR/log/fw.logptr | awk '{print $5}') ; sleep $SLEEP_TIME ; SIZE_AFTER=$(ls -l $FWDIR/log/fw.logptr | awk '{print $5}');expr \( $SIZE_AFTER - $SIZE_BEFORE \) \/ \( 4 \* $SLEEP_TIME \)

#######

Follow these steps to calculate the total number of FireWall logs per second arriving at this Security Management Server from all of its managed Security Gateways:
  1. Connect to the CLI on the Security Management Server - over SSH or console.

    Note:
    On Multi-Domain Management Server, go to the context of the relevant Domain Management Server: [Expert@HostName]# mdsenv [Domain_Name|Domain_IP]
  2. Go to the Log directory:

    [Expert@HostName]# cd $FWDIR/log
  3. Check how much the size of the pointer file grows over a specific interval
    (the interval should be long enough to accumulate a meaningful number of logs - e.g., 120 sec, 180 sec, etc.):

    [Expert@HostName]# ls -l fw.logptr ; sleep SLEEP_TIME ; ls -l fw.logptr
  4. Calculate the log rate using this formula (each log record appends a 4-byte entry to the pointer file, hence the division by 4):

    RATE = ( SIZE_AFTER - SIZE_BEFORE ) / ( 4 * SLEEP_TIME )

    Use these three commands to automate the calculations:

    [Expert@HostName]# SLEEP_TIME=number_of_seconds

    [Expert@HostName]# SIZE_BEFORE=$(ls -l fw.logptr | awk '{print $5}') ; sleep $SLEEP_TIME ; SIZE_AFTER=$(ls -l fw.logptr | awk '{print $5}')

    [Expert@HostName]# expr \( $SIZE_AFTER - $SIZE_BEFORE \) \/ \( 4 \* $SLEEP_TIME \)


    Note: if the rate value needs to be used in a shell script, use this syntax:
    [Expert@HostName]# RATE=$(expr \( $SIZE_AFTER - $SIZE_BEFORE \) \/ \( 4 \* $SLEEP_TIME \))
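
If you'd rather keep this around as a script, here's a small wrapper along the same lines - a sketch only, assuming $FWDIR is set (as it is in the Check Point shell) and that GNU stat is available (it is on SPLAT/GAIA):

#!/bin/bash
# lograte.sh - estimate the incoming log rate on a Security Management Server.
# Usage: ./lograte.sh [interval_in_seconds]   (defaults to 120)
SLEEP_TIME=${1:-120}
PTR="$FWDIR/log/fw.logptr"

SIZE_BEFORE=$(stat -c%s "$PTR")
sleep "$SLEEP_TIME"
SIZE_AFTER=$(stat -c%s "$PTR")

# Each log record appends a 4-byte entry to the pointer file.
echo "$(( (SIZE_AFTER - SIZE_BEFORE) / (4 * SLEEP_TIME) )) logs/sec"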

Friday 7 December 2012

VSX: Policy installation failing due to "Can't open..."

Hi everyone,

Had a new issue happen to me this morning while pushing policy to an R67 VSLS cluster. During the push, one MVS reported that all of its configuration files, plus those of its VSs, were missing.

This output was taken from $CPDIR/log/cpd.elg; the message within Dashboard was nearly identical:

[7 Dec 12:46:04] file_digest: Can't open /opt/CPsuite-V40/fw1/policy/local.dt
[7 Dec 12:46:04] file_digest: Can't open /opt/CPsuite-V40/fw1/policy/local.scv
[7 Dec 12:46:04] file_digest: Can't open /opt/CPsuite-V40/fw1/policy/local.lp
[7 Dec 12:46:04] file_digest: Can't open /opt/CPsuite-V40/fw1/policy/local.cfg
[7 Dec 12:46:04] file_digest: Can't open /opt/CPsuite-V40/fw1/CTX/CTX00002/policy/local.dt
[7 Dec 12:46:04] file_digest: Can't open /opt/CPsuite-V40/fw1/CTX/CTX00002/policy/local.scv
[7 Dec 12:46:04] file_digest: Can't open /opt/CPsuite-V40/fw1/CTX/CTX00002/policy/local.lp
[7 Dec 12:46:04] file_digest: Can't open /opt/CPsuite-V40/fw1/CTX/CTX00002/policy/local.cfg

etc., etc., for all 13 VSs.

A quick 'ls' for any of those files returned no results...

I'm still not sure how the issue occurred (and why on only one of the three MVSs in the cluster), but running this will correct the issue by creating all of the missing files at once:

cat $CPDIR/log/cpd.elg | grep "file_digest: Can't open" | awk '{print "touch",$7}' | sh
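
If you're (rightly) wary of piping straight into sh, you can preview the generated commands first - same pipeline, minus the final stage (sort -u collapses any duplicate entries from repeated pushes; touch is idempotent anyway, so this is just tidiness):

grep "file_digest: Can't open" $CPDIR/log/cpd.elg | awk '{print "touch",$7}' | sort -u

Once the output looks sane, append | sh to run it.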

Friday 23 November 2012

SPLAT + GAIA : Inaccessible via Console/SSH/GUI

Hi Everyone,

Over the last few months I've seen a large number of SPLAT appliances (R71 through R75) become completely inaccessible via "normal" methods. Upon further investigation, all of them turned out to be suffering from the same problem - strange, given that the three access methods use separate authentication schemes.

## It should be noted that *only* CP-branded appliances have been experiencing the issue; open servers seem to be safe from this.

Edit: Upon further investigation, it appears that all SPLAT and GAIA devices can be affected by this bug.


Debugging an SSH authentication attempt, we can see the firewall immediately close the connection with a FIN, without ever re-prompting as it would for a mistyped password:

SSH attempt with the -vv flags:
admin@<firewall_IP>'s password:
debug2: we sent a password packet, wait for reply
Connection closed by <host_IP>

TCPDUMP:

<firewall_IP>.22 > <host_IP>.33506: F 1393:1393(0) ack 1269 win 79 <nop,nop,timestamp 3558805529 352019454> (DF)

After trying multiple ways of "breaking in", we gave up, rebooted one of the devices, and attempted to access it via maintenance mode, which was successful.

Looking at /var/log/messages, we can see faillog is broken in some way:

cp_pam_tally[23237]: /var/log/faillog is either world writable or not a normal file


Looking at the file in detail, we saw that it was completely corrupted (filled with ASCII/hex symbols).

Replacing the file with a fresh, empty copy (or simply removing it entirely) corrects the issue.

However, the next time /usr/bin/faillog is called to do a rollover, the file becomes corrupted once again and all access is lost...

To prevent this from happening, I've implemented a 'manual' rollover via cron so that faillog never performs it itself (set up from maintenance mode, obviously, after the issue occurred):

To create a cron entry:
crontab -e

The editor on SPLAT is still vi(m), so press 'i' to enter insert mode, and type:

* * * * * /bin/bash /home/admin/faillog_rollover.sh

This has crond run the faillog_rollover.sh script every minute; you can grab the script here (remember to chmod +x it):

faillog_rollover.sh

Make sure to adjust the path for the script in crontab if you don't place it in /home/admin/
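
For reference, the core idea of the script is roughly the following - a sketch only, with an arbitrary threshold; the linked faillog_rollover.sh is the authoritative version:

#!/bin/bash
# Sketch: rotate /var/log/faillog manually so faillog's own (broken) rollover never runs.
FAILLOG=/var/log/faillog
MAX_BYTES=1048576   # 1 MB - adjust to taste

if [ -f "$FAILLOG" ] && [ "$(stat -c%s "$FAILLOG")" -gt "$MAX_BYTES" ]; then
    cp -p "$FAILLOG" "$FAILLOG.$(date +%d%m%y)"   # keep a dated copy
    : > "$FAILLOG"                                # truncate in place, preserving owner/perms
fi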

Check Point R&D is also aware of the issue and is working on a corrected faillog binary (shadow-utils, really); for the time being, though, this is definitely an easy fix. Until the fixed binary is included in a normal release, I'd *highly* recommend having this cron job installed on any SPLAT-based appliance, since fixing the issue via maintenance mode once it has occurred isn't the easiest thing to schedule.

Follow-up: CP has released a fixed shadow-utils RPM that addresses the issue; however, they have confirmed that GAIA is currently susceptible, and that the rollover fix will be incorporated into the next release of GAIA.

Thanks for reading,

Wednesday 15 August 2012

SPLAT/GAIA Static-Route migration scripts

Hi Everyone,

So I recently came across a situation where I needed to accomplish two things quite quickly:
1) Remove all active interfaces from a device and reconfigure them into load-sharing LACP bonds
2) Restore the previous routing configuration to the device post-interface removal.

Since #2 involved redoing over 1000 static routes, I of course didn't want to do this manually :)

I've created two sets of scripts: one for backing up the current configuration, and one for restoring it after the config change.

(I'd suggest using wget to pull the raw files, but copy them however you'd like):


Backups:
GAIA:
route_backup_gaia.sh
SPLAT:
route_backup_splat.sh

Restoring:
GAIA:
route_rebuild_gaia.sh
SPLAT:
route_rebuild_splat.sh


As for how to use them, I'll give you a basic scenario: currently, most routes on my test box are via eth2, but I want to move this link into a bond for better throughput and availability.


GAIA1# clish -c "show route"
Codes: C - Connected, S - Static, R - RIP, B - BGP,
       O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA)
       A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed

S     0.0.0.0/0           via 192.168.0.1, eth0, cost 0, age 4 
C     10.100.100.0/24     is directly connected, eth2 
S     10.100.101.0/24     via 10.100.100.2, eth2, cost 0, age 5993 
S     10.100.102.0/24     via 10.100.100.2, eth2, cost 0, age 1090 
S     10.100.103.0/24     via 10.100.100.2, eth2, cost 0, age 1087 
S     10.100.104.0/24     via 10.100.100.2, eth2, cost 0, age 1084 
C     127.0.0.0/8         is directly connected, lo 
C     192.168.0.0/24      is directly connected, eth0 


Prior to making our changes, I run the backup script like so:
[Expert@GAIA1]# ./route_backup_gaia.sh
Backing up routes now...

DONE

You can find your routes in /home/admin/150812_195030_GAIA1_routes.txt


Looking through the route file, you can see that the script has parsed everything into a useful format:
[Expert@GAIA1]# cat 150812_195030_GAIA1_routes.txt
0.0.0.0/0 192.168.0.1
10.100.101.0/24 10.100.100.2
10.100.102.0/24 10.100.100.2
10.100.103.0/24 10.100.100.2
10.100.104.0/24 10.100.100.2
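
For the curious, the parsing is nothing exotic - the GAIA backup essentially boils down to something like this (a sketch, not the linked script verbatim):

# Extract static routes from 'show route' as "destination next-hop" pairs.
# Static entries look like: S  10.100.101.0/24  via 10.100.100.2, eth2, cost 0, age 5993
clish -c "show route" | awk '/^S/ {gsub(",", "", $4); print $2, $4}' > $(date +%d%m%y_%H%M%S)_$(hostname)_routes.txt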


We'll make our interface changes now (remove eth2 - migrate to bond0)

After the change we can see that we now have bond0 on 10.100.100.0/24; however, all of our static routes are gone:

GAIA1> show route
Codes: C - Connected, S - Static, R - RIP, B - BGP,
       O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA)
       A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed

S     0.0.0.0/0           via 192.168.0.1, eth0, cost 0, age 5490 
C     10.100.100.0/24     is directly connected, bond0 
C     127.0.0.0/8         is directly connected, lo 
C     192.168.0.0/24      is directly connected, eth0 
 

Now we want to restore our previous routes:
[Expert@GAIA1]# ./route_rebuild_gaia.sh
Hello, please enter the correct log file to analyze
150812_195030_GAIA1_routes.txt
150812_195030_GAIA1_routes.txt
Thank you - Rebuilding the routing table now
Finished rebuilding the routing table...

Please remember to verify if the routes were rebuilt correctly!!
Goodbye
[Expert@GAIA1]# clish -c "show route"
Codes: C - Connected, S - Static, R - RIP, B - BGP,
       O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA)
       A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed

S     0.0.0.0/0           via 192.168.0.1, eth0, cost 0, age 5669 
C     10.100.100.0/24     is directly connected, bond0 
S     10.100.101.0/24     via 10.100.100.2, bond0, cost 0, age 12 
S     10.100.102.0/24     via 10.100.100.2, bond0, cost 0, age 12 
S     10.100.103.0/24     via 10.100.100.2, bond0, cost 0, age 12 
S     10.100.104.0/24     via 10.100.100.2, bond0, cost 0, age 12 
C     127.0.0.0/8         is directly connected, lo 
C     192.168.0.0/24      is directly connected, eth0


And there you have it - nice and simple :)
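
The restore side is just the reverse: read each pair back and feed it to the same CLISH command covered in the CLISH basics below (again a sketch - the linked script adds the prompting and reminders you saw above):

# Re-add each "destination gateway" pair, then save.
while read -r DEST GW; do
    clish -c "set static-route $DEST nexthop gateway address $GW on"
done < 150812_195030_GAIA1_routes.txt
clish -c "save config"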

SPLAT works the same way, though the scripts themselves are of course different, since on GAIA everything goes through CLISH.

If you want to get this to work on IPSO, the GAIA script would only need very minor modifications to how it deals with write-locks. If you need some help, let me know :)

GAIA CLISH Basics (Interfaces, Routes, Bonds, Saving)

Here are some really 'basic' GAIA CLISH commands everyone should know:


Basic configuration for an interface via CLISH (ifconfig/ethtool still work within the expert shell, in case you prefer those):

Configure the interface with an appropriate ipv4 address and netmask
GAIA1> set interface eth2 ipv4-address 10.100.100.1 mask-length 24
Interface comments
GAIA1> set interface eth2 comments "Internal Interface"
Interface speed hardcoding (use 'auto-negotiation on' instead if required)
GAIA1> set interface eth2 link-speed 1000M/full
Turn the interface on
GAIA1> set interface eth2 state on
Show current information
GAIA1> show interface eth2      
link-speed 1000M/full
ipv6-autoconfig Not configured
speed 1000M
mac-addr 00:0c:29:38:9f:6d
state on
duplex full
type ethernet
comments {Internal Interface}
mtu 1500
auto-negotiation Not configured
ipv4-address 10.100.100.1/24
ipv6-address Not Configured

Statistics:
TX bytes:0 packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 packets:0 errors:0 dropped:0 overruns:0 frame:0
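
The corresponding delete exists as well - handy before moving an interface into a bond, since bond slaves can't carry addresses (syntax from memory; tab-completion will confirm it on your build):

GAIA1> delete interface eth2 ipv4-address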


Adding static routes in GAIA CLISH:

Destination of 10.100.101/24 via 10.100.100.2
GAIA1> set static-route 10.100.101.0/24 nexthop gateway address 10.100.100.2 on

GAIA1> show route
Codes: C - Connected, S - Static, R - RIP, B - BGP,
       O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA)
       A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed

S     0.0.0.0/0           via 192.168.0.1, eth0, cost 0, age 2413 
C     10.100.100.0/24     is directly connected, eth2 
S     10.100.101.0/24     via 10.100.100.2, eth2, cost 0, age 37 
S     10.100.102.0/24     via 10.100.100.2, eth2, cost 0, age 20 
S     10.100.103.0/24     via 10.100.100.2, eth2, cost 0, age 17 
S     10.100.104.0/24     via 10.100.100.2, eth2, cost 0, age 14 
S     10.100.105.0/24     via 10.100.100.2, eth2, cost 0, age 11 
S     10.100.106.0/24     via 10.100.100.2, eth2, cost 0, age 8 
S     10.100.107.0/24     via 10.100.100.2, eth2, cost 0, age 5 
S     10.100.108.0/24     via 10.100.100.2, eth2, cost 0, age 2 
C     127.0.0.0/8         is directly connected, lo 
C     192.168.0.0/24      is directly connected, eth0
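
Removing a route is the mirror image - switching the entry off deletes it (again, verify with tab-completion if unsure):

GAIA1> set static-route 10.100.108.0/24 off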


Creating a bond from CLISH:

# Create the bond and assign a slave interface in one command:
GAIA1> add bonding group 0 interface eth1
 Enter an interface to add to the bond group.
 Only ethernet interfaces can be added to a bond group.
 The interface shouldn't have any IP addresses or aliases configured.
 Hit tab to obtain the available interfaces that can be added to the bond group.
# Set the "mode" of the bond (I chose 8023AD here - aka LACP)
GAIA1> set bonding group 0 mode 8023AD
# Set the bond's primary interface:
GAIA1> set bonding group 0 primary eth1
# View your bond:
GAIA1> show bonding group 0
Bond Configuration
    xmit-hash-policy layer2
    down-delay 200
    primary eth1
    lacp-rate slow
    mode 8023AD
    up-delay 200
    mii-interval 100
    Bond Interfaces
        eth1

# This information is also available from Expert mode via /proc:
[Expert@GAIA1]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 17
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0c:29:38:9f:63
Aggregator ID: 1
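
Adding more slaves is just the same add command with the next interface; removing one is the corresponding delete (syntax from memory here too):

GAIA1> add bonding group 0 interface eth2
GAIA1> delete bonding group 0 interface eth1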


Saving your configuration:

GAIA1> save config
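
Remember that anything not saved is lost on reboot. If memory serves, you can also export the running configuration as a plain list of CLISH commands, which is handy for offline review or rebuilding a box:

GAIA1> save configuration config_backup.txt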

Friday 6 July 2012

SPLAT/GAIA: How to determine bond status (link/LACP etc)

Hi Everyone,

Had this question asked today: "How do you determine if your LACP (or XOR) bond is up and running, and what state is it in?"

Since ethtool and ifconfig don't provide LACP details, you have to check via /proc, like so (MACs removed for privacy):


Looking at bond0 here:
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 2
        Actor Key: 17
        Partner Key: 32773
        Partner Mac Address: **************

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: **************
Aggregator ID: 2

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: **************
Aggregator ID: 2

Slave Interface: eth4
MII Status: down
Link Failure Count: 0
Permanent HW addr: **************
Aggregator ID: 3

Slave Interface: eth5
MII Status: down
Link Failure Count: 1
Permanent HW addr: **************
Aggregator ID: 1

You can also see how Check Point's clustering monitors the bonds with cphaconf show_bond:
# cphaconf show_bond -a

                                      |Slaves     |Slaves |Slaves  
Bond name  |Mode               |State |configured |in use |required
-----------+-------------------+------+-----------+-------+--------
bond0      | Load Sharing      | UP   | 4         | 4     | 3      
bond1      | Load Sharing      | UP   | 4         | 4     | 3      

Legend:
-------
UP!               - Bond interface state is UP, yet attention is required
Slaves configured - number of slave interfaces configured on the bond
Slaves in use     - number of operational slaves
Slaves required   - minimal number of operational slaves required for bond to be UP

The steps in sk69180 should also be followed to ensure the slave interfaces have been added correctly.
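
To eyeball every bond at once from Expert mode, a quick loop over /proc works nicely:

# Quick status of all bonds and their slaves:
for B in /proc/net/bonding/bond*; do
    echo "== $B =="
    grep -E "^(Bonding Mode|MII Status|Slave Interface)" "$B"
done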

Wednesday 27 June 2012

CheckPoint HA: How to force a failover (ClusterXL/VRRP)

Hi Everyone,

Based on some recent conversations I've had, it seems most people don't know how to force or test a failover with Check Point HA.

There is a single requirement for non-SPLAT/GAIA systems: FW-1 Monitoring State needs to be enabled. If you're running IPSO, you can enable this via the VRRP configuration page.

To force a failover, run the following commands on the current cluster master:

This creates a pnote (problem notification) that is in problem state:
cphaprob -d fail -s problem -t 0 register
Verify it's in problem state with
cphaprob stat
and
cphaprob -i list
(you should see 'fail' in problem state)

Once you've finished your testing, run these two to reset it:
cphaprob -d fail -s ok report
cphaprob -d fail unregister

Make sure to verify that the pnote has been removed correctly before you log off.
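
Putting it all together, a full test run on the current master looks like this:

# Force the failover:
cphaprob -d fail -s problem -t 0 register
cphaprob stat            # this member should no longer be active
cphaprob -i list         # 'fail' should be listed in problem state

# ...run your failover tests...

# Clean up:
cphaprob -d fail -s ok report
cphaprob -d fail unregister
cphaprob -i list         # confirm the 'fail' pnote is gone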

That's it!

Wednesday 29 February 2012

Upgrade to R70.50 from R70.30/40 fails due to licensing errors on IPSO

Hi everyone,

It's been a while since I've posted anything, but tonight I experienced something I hadn't seen before, so I figured I should share:

I was in the process of upgrading three separate clusters from R70.30/R70.40 to R70.50 and was presented with this error upon running the UnixInstallScript:

# ./UnixInstallScript

***********************************************************
Welcome to Check Point R70.50 Installation
***********************************************************
In order to install Check Point R70.50 you must first install Check Point R70 Software Blades
For additional information please refer to the release notes.


I checked the licenses and noticed that I was in fact already using Blade Licenses on some clusters, while on one I was still using the old system.

This perplexed me greatly. Upon digging around in /opt I noticed that CPshared was *completely* missing (if anyone knows why this is, please let me know):
# cd /opt
# ls -lah CPshared
ls: CPshared: No such file or directory

Upon realizing that this was the case, I moved to correct the issue (CPshared is really just a link to /svn):
# mkdir CPshared
# cd CPshared
# ln -s /opt/CPsuite-R70/svn 5.0
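
A quick sanity check that the link resolves as intended:

# ls -ld /opt/CPshared/5.0

The 5.0 entry should show as a symlink pointing at /opt/CPsuite-R70/svn.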

Once this was complete and 5.0 was linked to /svn, I ran the script once again (with success!):

***********************************************************
Welcome to Check Point R70.50 Installation
***********************************************************
The following components will be installed:
 * R70.50
Installation Application is about to stop all Check Point Processes.
Do you wish to continue (y/n) [y] ?


I've seen a few unanswered posts around the community with the same error, so hopefully this will help someone out :)

Cheers,