6850 Stack loses backplane

Post by Tyson »

I have an OS-6850-U24X and an OS-6850-48 stacked. On two occasions the primary unit has dropped offline. The secondary unit picks up and keeps running, but all connectivity is lost on the primary. I am hoping someone can guide me through troubleshooting or point me to the correct log to help determine the cause. The only thing that jumps out at me in the logs is "No superv msg recvd":

== SM == No superv msg for last 5 sec. kickin SM flood msg in link B
*************** Stack Flood Counter *********************
sm flood req sent on link B ==> 1
sm flood res sent on link B ==> 7
sm flood req recv on link B ==> 7
sm flood res recv on link B ==> 0
------- Stack Debug counters ---------
sm_qdisp_recv_cnt (A & B) : 2020365
sm_drv_sent_cnt (A & B) : 2020358
sm_drv_sup_recv_cnt[1] on Link B: 2020358
== SM == No superv msg recvd in the last 10 secs on stacking link B
== SM == Stacking link B Tx disabled
== SM == Stacking link B Rx disabled
T8: Ni(2) extraction detected
== HSM == csHsmUtilNiCtxBrdSend() nsm CS_HSM_NSM_ST_OP, poweroff 0 Ni2
CS_NI_DOWN/CS_NI_NOTPRESENT msg Rx for slot 2
Del connection for slot 2
NI_DOWN 2
Del connection for slot 2
VMC: Sent Pause for slot 2
Close NI 2 ESM, socket 170 OK
Start Timer on ni_down 2 @ 1310143356
Close NI 2 NiSUp, socket 152 OK
Timer closed skt on ni 1 @ 1310143356
Timer closed skt for GVRP NI on ni 1 @ 1310143356
GMC: Sent Pause for slot 2
Start Timer on ni_down 2 @ 1310143356
Timer closed skt on ni 1 @ 1310143356
== SM == No superv msg for last 5 sec. kickin SM flood msg in link B
*************** Stack Flood Counter *********************
sm flood req sent on link B ==> 1
sm flood res sent on link B ==> 7
sm flood req recv on link B ==> 7
sm flood res recv on link B ==> 0
------- Stack Debug counters ---------
sm_qdisp_recv_cnt (A & B) : 2020424
sm_drv_sent_cnt (A & B) : 2020417
sm_drv_sup_recv_cnt[1] on Link B: 2020417
*************** Stack Flood Counter *********************
sm flood req sent on link B ==> 1
sm flood res sent on link B ==> 7
sm flood req recv on link B ==> 7
sm flood res recv on link B ==> 0
------- Stack Debug counters ---------
sm_qdisp_recv_cnt (A & B) : 2020424
sm_drv_sent_cnt (A & B) : 2020417
sm_drv_sup_recv_cnt[1] on Link B: 2020417
== SM == No superv msg recvd in the last 10 secs on stacking link B
== SM == Stacking link B Tx disabled
== SM == Stacking link B Rx disabled
== SM == Stacking link B status changed: DOWN
== SM == Stacking link B status changed: UP
== SM == Stacking link B Tx enabled
== SM == Stacking link B Rx enabled
== SM == Stacking link B status changed: DOWN
== SM == Stacking link B status changed: UP
== SM == Stacking link B Tx enabled
== SM == Stacking link B Rx enabled
Refuse to connect NI 2:152 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:152 after unpowered
== CSM == P54_1 - Startup hold request received
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
T8: Ni(2) insertion detected
Refuse to connect NI 2:324 after unpowered
== HSM == csHsmUtilNiCtxBrdSend() nsm CS_HSM_NSM_ST_OP, poweroff 0 Ni2
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:170 after unpowered
Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered


Thanks for your help.

J.Tyson
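
For anyone working through a similar excerpt, here is a minimal Python sketch that tallies the repeating messages in a pasted log so the pattern is easier to see. It relies only on the message wording visible in the excerpt above; the function name `summarize_stack_log` and the short sample text are illustrative assumptions, not part of any Alcatel-Lucent tool.

```python
import re
from collections import Counter

# Patterns copied from the message wording in the excerpt above.
REFUSE_RE = re.compile(r"Refuse to connect NI (\d+):(\d+) after unpowered")
LINK_RE = re.compile(r"Stacking link ([AB]) status changed: (UP|DOWN)")
SUPERV_RE = re.compile(
    r"No superv msg recvd in the last \d+ secs on stacking link [AB]"
)


def summarize_stack_log(log_text):
    """Count refused NI connections and list stacking-link transitions.

    finditer() scans the whole text, so records that were pasted onto a
    single line are still counted individually.
    """
    refused = Counter(
        (int(m.group(1)), int(m.group(2))) for m in REFUSE_RE.finditer(log_text)
    )
    link_events = [(m.group(1), m.group(2)) for m in LINK_RE.finditer(log_text)]
    superv_timeouts = len(SUPERV_RE.findall(log_text))
    return {
        "refused": refused,
        "link_events": link_events,
        "superv_timeouts": superv_timeouts,
    }


# Hypothetical sample assembled from lines of the kind shown above.
sample = """\
== SM == No superv msg recvd in the last 10 secs on stacking link B
== SM == Stacking link B status changed: DOWN
== SM == Stacking link B status changed: UP
Refuse to connect NI 2:152 after unpowered
Refuse to connect NI 2:170 after unpowered Refuse to connect NI 2:323 after unpowered
Refuse to connect NI 2:170 after unpowered
"""

summary = summarize_stack_log(sample)
print("supervision timeouts:", summary["superv_timeouts"])
print("link transitions:", summary["link_events"])
for (slot, socket_id), count in sorted(summary["refused"].items()):
    print(f"slot {slot} socket {socket_id}: refused {count} time(s)")
```

Grouping the refusals by slot and socket makes it easier to judge whether the flapping link or the repeated reconnect attempts dominate the excerpt.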

Re: 6850 Stack loses backplane

Post by one6f »

Hi Tyson,

Do you have any .pmd files from after this issue?
If you are an Alcatel-Lucent Business Partner, you can open an SR with Alcatel-Lucent.
They'll ask you for the "minimum information" needed to open a case, which will include the Crash.pmd.
From the output you provided, I can't tell which software version you're running.
I'd guess the issue may be caused by parity errors and is already fixed in a later release.
Alcatel-Lucent support's answer will likely be "upgrade".
PS: As an alternative, check your stacking cable connections.

Re: 6850 Stack loses backplane

Post by Tyson »

I don't have access to Alcatel-Lucent support on this device unfortunately.

I do have a .pmd. Any advice on how to read it or use it for troubleshooting would be appreciated.

I have posted the software version below, along with health statistics. Memory utilization is quite high. This output was taken right after the unit crashed again today, and I see the CPU utilization is quite high as well.

Software Version 6.3.1.1052.R01 Service Release, December 11, 2008.

Copyright(c), 1994-2007 Alcatel-Lucent. All Rights reserved.

OmniSwitch(TM) is a trademark of Alcatel-Lucent registered
in the United States Patent and Trademark Office.

-> show system
System:
Description: 6.3.1.1052.R01 Service Release, December 11, 2008.,
Object ID: 1.3.6.1.4.1.6486.800.1.1.2.1.7.1.10,


Flash Space:
Primary CMM:
Available (bytes): 7458816,
Comments : None

-> show health
* - current value exceeds threshold

Device                           1 Min  1 Hr  1 Hr
Resources         Limit   Curr   Avg    Avg   Max
-----------------+-------+------+------+-----+----
Receive           80      01     01     00    01
Transmit/Receive  80      01     01     00    01
Memory            90      91*    91     91    92
Cpu               80      21     16     22    96

->
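
If you want to track this output over time rather than eyeball it, a minimal Python sketch along the following lines could parse the rows and flag anything over its limit. It assumes only the six-column row layout shown in the `show health` output above; `parse_health` and the dictionary keys are hypothetical names chosen for illustration.

```python
def parse_health(text):
    """Parse 'show health' rows into {resource: {limit, curr, ...}}.

    A data row is assumed to be a name followed by five integers; the
    current value may carry a trailing '*' when it exceeds the threshold.
    Header, separator, and prompt lines are skipped.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue
        name, raw_values = parts[0], parts[1:]
        try:
            limit, curr, avg_1min, avg_1hr, max_1hr = (
                int(value.rstrip("*")) for value in raw_values
            )
        except ValueError:
            continue  # not a data row (for example the column header)
        rows[name] = {
            "limit": limit,
            "curr": curr,
            "avg_1min": avg_1min,
            "avg_1hr": avg_1hr,
            "max_1hr": max_1hr,
        }
    return rows


# Rows copied from the output above.
sample = """\
* - current value exceeds threshold
Resources Limit Curr Avg Avg Max
-----------------+-------+------+------+-----+----
Receive 80 01 01 00 01
Transmit/Receive 80 01 01 00 01
Memory 90 91* 91 91 92
Cpu 80 21 16 22 96
"""

health = parse_health(sample)
over_limit = [name for name, row in health.items() if row["curr"] > row["limit"]]
print("over limit:", over_limit)
print("CPU 1-hour max:", health["Cpu"]["max_1hr"])
```

Splitting on whitespace keeps the parser indifferent to column alignment, so it works on output pasted from a terminal or from a forum post.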

Re: 6850 Stack loses backplane

Post by Tyson »

A little more info. After performing a takeover, memory and CPU utilization on both units dropped back to normal levels.


-> show health
* - current value exceeds threshold

Device                           1 Min  1 Hr  1 Hr
Resources         Limit   Curr   Avg    Avg   Max
-----------------+-------+------+------+-----+----
Receive           80      01     01     01    01
Transmit/Receive  80      01     01     01    01
Memory            90      70     70     69    70
Cpu               80      25     19     17    100

-> show health all memory
* - current value exceeds threshold

                                 1 Min  1 Hr  1 Hr
Memory            Limit   Curr   Avg    Avg   Max
-----------------+-------+------+------+-----+----
01                90      67     67     67    67
02                90      70     70     69    70

-> show health all cpu
* - current value exceeds threshold

                                 1 Min  1 Hr  1 Hr
Cpu               Limit   Curr   Avg    Avg   Max
-----------------+-------+------+------+-----+----
01                80      12     12     18    100
02                80      19     19     29    100

->

Re: 6850 Stack loses backplane

Post by one6f »

Hi Tyson,
Sorry for the last response. Constantly rising memory utilization probably indicates a memory leak on your stack. Only Alcatel-Lucent tech support can analyze your .pmd files and memory problems. If you have a Business Partner account or support contract, you can try opening a case anyway; it does not depend on the device. Before that, you may need to upgrade to a newer software version, since 6.3.1 is quite old.
PS: If it is still relevant, you can try replacing both stack cables.
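
To put a number on "constantly rising", one rough approach is to collect the memory percentage at regular intervals and fit a straight line through the samples. The Python sketch below does that with a plain least-squares slope. The sample values and the `min_slope` threshold are hypothetical, chosen only to illustrate the idea; they are not measurements from this stack.

```python
def slope_per_sample(samples):
    """Least-squares slope of evenly spaced samples (units per interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    covariance = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    return covariance / variance


def looks_like_leak(samples, min_slope=0.5):
    """Flag a steady climb: positive trend and the newest sample is the peak."""
    return slope_per_sample(samples) >= min_slope and samples[-1] == max(samples)


# Hypothetical memory-utilization readings (percent), one per polling interval.
climbing = [70, 75, 80, 85]
steady = [70, 71, 69, 70]

print("climbing slope:", slope_per_sample(climbing))
print("climbing looks like a leak:", looks_like_leak(climbing))
print("steady looks like a leak:", looks_like_leak(steady))
```

A steady positive slope across many intervals points toward a leak, whereas a flat line with occasional spikes suggests ordinary load; either way the trend is only a hint, not a diagnosis.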

Re: 6850 Stack loses backplane

Post by Tyson »

Thanks for the advice, one6f. I have been working with a vendor to gain access to a newer firmware release. For the moment, however, things have stabilized. Strangely, performing a takeover, which moved the secondary CMM to primary, seems to have stopped the memory leak. That makes me wonder whether I have a hardware problem with the primary unit. Either way, I'm working on getting the software updated and then, hopefully, a support ticket to determine the root cause.

Thanks
Tyson

Re: 6850 Stack loses backplane

Post by Tyson »

Just for closure on this thread, I have been running successfully without issue for the last 10 days on the latest maintenance release of firmware. It seems that the firmware upgrade has resolved the memory leak that was apparently causing the stack to crash.

Thanks for your help.

J. Tyson