OS6900 Troubleshooting

Post by thermseeker » 17 Jan 2019 12:33

Hello all,

Edited Jan 23 to include code delimiters.

Sorry for the long post.

For about a year now, since a major upgrade to our networking infrastructure, I've been documenting, cleaning up and tweaking the mess left by the company that did the job. I've reached a point where almost everything is well known, organized and under control. Now I'm studying and gathering info to go deeper into the core switch, which is (are) two OS6900-T40 operating in virtual chassis mode.

I'm running on 8.4.1.229.R02 GA

Problem 1:

My major concern is that PMD files are being generated, meaning that something bad is going on, though so far I don't know what. I scheduled a system shutdown for next Tuesday night, when I intend to reboot the thing and see what happens after that. If PMDs continue to show up I'll open a support case with A-L.

I noticed that master and slave have significantly different uptimes, the shorter one (chassis 1) roughly matching a general power shutdown we did for maintenance on our 69 kV substation. In theory the UPSs held up well during the outage, but for some reason chassis 1 shut down or rebooted.

Could it be that this is the cause for the PMDs? I erase the old ones to keep flash space below the threshold, but new ones continue to show up. The switch is not rebooting, at least not continuously:

Chassis 2: 311 days 20 hours 42 minutes and 16 seconds
Chassis 1: 98 days 7 hours 42 minutes and 46 seconds

Chassis 2 became master after chassis 1 rebooted.

Any experiences you wanna share about this? Should I try to reboot, or am I at risk of the switch not coming up again? That would be baaaad.

Problem 2:

I have two static routes in the configuration:

Code:

! IP Route Manager:
ip static-route 0.0.0.0/0 gateway 10.1.119.251 metric 1
ip static-route 10.1.0.0/18 gateway 10.1.119.251 metric 1

Initially, when the equipment was installed, we had scenario 1 where the route through 10.1.119.251 was used to access other company branches, all of them within the 10.1.0.0/18 range. Internet access was provided through another path.

Later, in scenario 2, internet access was shifted to flow through our central administration, thus sharing the route through 10.1.119.251.

Could it be that the person who configured these routes forgot to remove the first static route (10.1.0.0/18) after scenarios changed from 1 to 2? It's very likely given his profile. I'm not a routing expert, but it seems to me that 0.0.0.0/0 includes 10.1.0.0/18. Does it make sense to have both?

Thank you so much for any reply.

Tales Maschio
Last edited by thermseeker on 23 Jan 2019 12:03, edited 2 times in total.

Post by thermseeker » 18 Jan 2019 05:32

Edited on Jan 23 to include code delimiters

If it helps in any way: I set the log level to warning; here is what has been logged from yesterday until now. Doesn't look very good at all :shock:

Code:

faradmsw0254 show log swlog
2019 Jan 17 13:53:08 faradmsw0254 syslogd started: BusyBox v1.19.3
2019 Jan 17 14:11:04 faradmsw0254 sshd[2580]: Received keyboard-interactive/pam for admin from 10.1.114.13 port 57023 ssh2
2019 Jan 17 14:18:18 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547745498.948522] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 14:18:19 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547745499.237232] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 15:18:16 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547749096.878977] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 15:18:17 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547749097.191903] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 16:18:13 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547752693.311023] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 16:18:13 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547752693.664607] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 17:18:23 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547756303.641019] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 17:18:23 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547756303.925986] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 18:18:17 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547759897.149033] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 18:18:17 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547759897.509986] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 19:18:16 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547763496.343983] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 19:18:16 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547763496.665847] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 20:18:17 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547767097.779922] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 20:18:18 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547767098.096856] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 21:18:24 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547770704.134676] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 21:18:24 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547770704.467097] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 22:18:14 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547774294.914733] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 22:18:15 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547774295.223589] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 23:18:14 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547777894.741478] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 17 23:18:15 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547777895.042188] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 00:18:15 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547781495.945309] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 00:18:16 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547781496.270154] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 01:00:07 faradmsw0254 kernel: [26983449.523053] lldpNi[7246]: unhandled signal 11 at 00000000 nip 0f695cf4 lr 0ffdf35c code 30001
2019 Jan 18 01:00:07 faradmsw0254 swlogd: COREDUMPER  alarm(1) [1547784007.916818] Dumping core for task lldpNi
2019 Jan 18 01:00:07 faradmsw0254 swlogd: COREDUMPER  alarm(1) [1547784007.917431] Starting PMD, 1463926784 bytes free in flash
2019 Jan 18 01:00:26 faradmsw0254 swlogd: portMgrNi main warning(4) [1547784026.322310] : [pmReacDisconnectCB:271] Socket 83 disconnected
2019 Jan 18 01:00:26 faradmsw0254 swlogd: portMgrNi main warning(4) [1547784026.322379] : [pmAppDelAllEntriesForSockId:2905] Deleting All Entries for Socket :83 app:LL      DP(42)
2019 Jan 18 01:00:26 faradmsw0254 swlogd: portMgrNi main error(2) [1547784026.322416] PM-NI-TODO : [destroyTodoList:131] Err: :No todo list of the given socketId : 83
2019 Jan 18 01:00:26 faradmsw0254 swlogd: ChassisSupervisor appMgr alert(3) [1547784026.324320] appMgrClientTerminated: restarting task
2019 Jan 18 01:00:26 faradmsw0254 swlogd: ChassisSupervisor appMgr alert(3) [1547784026.324379] Failed App /bin/electric-fence /bin/lldpNi
2019 Jan 18 01:00:26 faradmsw0254 swlogd: lldpNi library(vcmLib) info(5) [1547784026.405014] vcmlib_connect_socket_backend@5483: Server 0/1/17651 (socket 29, vrf 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: COREDUMPER  alarm(1) [1547784030.416942] Core dump (size 272568320) completed, connecting to CS
2019 Jan 18 01:00:30 faradmsw0254 swlogd: ChassisSupervisor appMgr alarm(1) [1547784030.417823] recv CORE (528 bytes) on socket 166
2019 Jan 18 01:00:30 faradmsw0254 swlogd: COREDUMPER  alarm(1) [1547784030.418188] Dump ack received, exiting.
2019 Jan 18 01:00:30 faradmsw0254 swlogd: ChassisSupervisor appMgr alarm(1) [1547784030.419322] send ack to dumper ret=12, socket=166
2019 Jan 18 01:00:30 faradmsw0254 swlogd: confd library(pmdlib) info(5) [1547784030.419907] this client is file /flash/pmd/work/0/client_confd-1979
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(pmdlib) info(5) [1547784030.420147] this client is file /flash/pmd/work/0/client_vcmCmm-1982
2019 Jan 18 01:00:30 faradmsw0254 swlogd: rcdd library(pmdlib) info(5) [1547784030.420537] this client is file /flash/pmd/work/0/client_rcdd-3777
2019 Jan 18 01:00:30 faradmsw0254 swlogd: mplscmmd library(pmdlib) info(5) [1547784030.421220] this client is file /flash/pmd/work/0/client_mplscmmd-3883
2019 Jan 18 01:00:30 faradmsw0254 swlogd: ipmscmm library(pmdlib) info(5) [1547784030.421670] this client is file /flash/pmd/work/0/client_ipmscmm-3619
2019 Jan 18 01:00:30 faradmsw0254 swlogd: dhcpsrv library(pmdlib) info(5) [1547784030.421885] this client is file /flash/pmd/work/0/client_dhcpsrv-3949
2019 Jan 18 01:00:30 faradmsw0254 swlogd: iprm library(pmdlib) info(5) [1547784030.421839] this client is file /flash/pmd/work/0/client_iprm-3982
2019 Jan 18 01:00:30 faradmsw0254 swlogd: grm library(pmdlib) info(5) [1547784030.422030] this client is file /flash/pmd/work/0/client_grm-3920
2019 Jan 18 01:00:30 faradmsw0254 swlogd: svcCmm library(pmdlib) info(5) [1547784030.422874] this client is file /flash/pmd/work/0/client_svcCmm-3818
2019 Jan 18 01:00:30 faradmsw0254 swlogd: dhcpv6srv library(pmdlib) info(5) [1547784030.424361] this client is file /flash/pmd/work/0/client_dhcpv6srv-3938
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(pmdlib) info(5) [1547784030.424318] this client is file /flash/pmd/work/0/client_vcmNi-4004
2019 Jan 18 01:00:30 faradmsw0254 swlogd: qoscmmd library(pmdlib) info(5) [1547784030.425046] this client is file /flash/pmd/work/0/client_qoscmmd-3797
2019 Jan 18 01:00:30 faradmsw0254 swlogd: bcmd library(pmdlib) info(5) [1547784030.425728] this client is file /flash/pmd/work/0/client_bcmd-4039
2019 Jan 18 01:00:30 faradmsw0254 swlogd: bcd2 library(pmdlib) info(5) [1547784030.433319] this client is file /flash/pmd/work/0/client_bcd2-4079
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcspNi library(pmdlib) info(5) [1547784030.434456] this client is file /flash/pmd/work/0/client_vcspNi-4127
2019 Jan 18 01:00:30 faradmsw0254 swlogd: ipmsni library(pmdlib) info(5) [1547784030.435571] this client is file /flash/pmd/work/0/client_ipmsni-4170
2019 Jan 18 01:00:30 faradmsw0254 swlogd: etherNi library(pmdlib) info(5) [1547784030.439481] this client is file /flash/pmd/work/0/client_etherNi-4096
2019 Jan 18 01:00:30 faradmsw0254 swlogd: erpNi library(pmdlib) info(5) [1547784030.440172] this client is file /flash/pmd/work/0/client_erpNi-4130
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(plApi) error(2) [1547784030.583037] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(plApi) error(2) [1547784030.583273] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(plApi) error(2) [1547784030.583392] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(plApi) error(2) [1547784030.583490] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(plApi) error(2) [1547784030.583995] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(plApi) error(2) [1547784030.584141] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(plApi) error(2) [1547784030.584246] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmNi library(plApi) error(2) [1547784030.585024] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(plApi) error(2) [1547784030.619975] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(plApi) error(2) [1547784030.620181] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(plApi) error(2) [1547784030.620350] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(plApi) error(2) [1547784030.620515] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(plApi) error(2) [1547784030.621268] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(plApi) error(2) [1547784030.621431] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(plApi) error(2) [1547784030.621593] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:30 faradmsw0254 swlogd: vcmCmm library(plApi) error(2) [1547784030.621754] plGetVflStringFromIfIndex@11171: Invalid ifIndex 0 (context 0)
2019 Jan 18 01:00:32 faradmsw0254 PMD: collect -  args = 0 /flash/pmd/work/core.lldpNi.7246.b229 /flash/pmd/pmd-lldpNi-01.18.2019-01.00.32 coredump
2019 Jan 18 01:00:33 faradmsw0254 PMD-COPY-SWLOGS: not 8572
2019 Jan 18 01:00:39 faradmsw0254 PMD-TAR2: tar: removing leading '/' from member names
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: /flash/working/boot.cfg: No such file or directory
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: /flash/ni?/swlog*: No such file or directory
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: /flash/ni?/pmd/pmdNI*: No such file or directory
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: /flash/swlog_archive/*.tar: No such file or directory
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: /flash/certified/boot.cfg: No such file or directory
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: /flash/tech_support*.log: No such file or directory
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: /flash/running.cfg: No such file or directory
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: /tmp/pmd.collect: No such file or directory
2019 Jan 18 01:00:58 faradmsw0254 PMD-TAR2: tar: error exit delayed from previous errors
2019 Jan 18 01:00:58 faradmsw0254 PMD-RM2: rm: can't remove '/tmp/pmd.collect': No such file or directory
2019 Jan 18 01:01:01 faradmsw0254 swlogd: PMD main alert(3) [1547784061.682719] PMD generated at /flash/pmd/pmd-lldpNi-01.18.2019-01.00.32
2019 Jan 18 01:18:23 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547785103.680021] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 01:18:24 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547785104.025115] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 02:18:14 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547788694.027963] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 02:18:14 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547788694.405176] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 03:18:19 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547792299.884164] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 03:18:20 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547792300.151707] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 04:18:15 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547795895.634804] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 04:18:16 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547795896.019619] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 05:18:20 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547799500.985513] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 05:18:21 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547799501.343551] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 06:18:15 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547803095.285509] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 06:18:15 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547803095.558442] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 07:18:15 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547806695.325235] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 07:18:15 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547806695.609580] plGetIfIndexType@1390: Invalid ifIndex 0 type
2019 Jan 18 07:22:01 faradmsw0254 sshd[10924]: Received keyboard-interactive/pam for admin from 10.1.114.13 port 55227 ssh2

faradmsw0254
Last edited by thermseeker on 23 Jan 2019 12:02, edited 1 time in total.
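
For a long excerpt like the one above, a quick tally per application and severity can make the outliers stand out. The following is a minimal Python sketch, not an official tool: the regular expression is inferred only from the swlogd lines quoted in this thread, and the names `SWLOG_RE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the swlogd lines quoted in this thread (assumption:
# other AOS releases may format these lines differently).
SWLOG_RE = re.compile(
    r"swlogd: (?P<app>\S+)\s+(?P<sub>.*?)\s*(?P<level>[a-z]+)\((?P<num>\d)\) \["
)


def summarize(lines):
    """Count swlogd messages per (application, severity name)."""
    counts = Counter()
    for line in lines:
        match = SWLOG_RE.search(line)
        if match:  # kernel, sshd and PMD helper lines do not match and are skipped
            counts[(match["app"], match["level"])] += 1
    return counts


# A few lines copied from the excerpt above.
sample = [
    "2019 Jan 17 14:18:18 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547745498.948522] plGetIfIndexType@1390: Invalid ifIndex 0 type",
    "2019 Jan 17 14:18:19 faradmsw0254 swlogd: mvrpCmm library(plApi) error(2) [1547745499.237232] plGetIfIndexType@1390: Invalid ifIndex 0 type",
    "2019 Jan 18 01:00:07 faradmsw0254 kernel: [26983449.523053] lldpNi[7246]: unhandled signal 11 at 00000000 nip 0f695cf4 lr 0ffdf35c code 30001",
    "2019 Jan 18 01:00:07 faradmsw0254 swlogd: COREDUMPER  alarm(1) [1547784007.916818] Dumping core for task lldpNi",
    "2019 Jan 18 01:00:26 faradmsw0254 swlogd: portMgrNi main warning(4) [1547784026.322310] : [pmReacDisconnectCB:271] Socket 83 disconnected",
    "2019 Jan 18 01:00:26 faradmsw0254 swlogd: ChassisSupervisor appMgr alert(3) [1547784026.324320] appMgrClientTerminated: restarting task",
]

for (app, level), count in summarize(sample).most_common():
    print(f"{count:4d}  {app:20s} {level}")
```

Run against the full excerpt, a tally like this would likely show the hourly mvrpCmm errors as the bulk of the noise, with the COREDUMPER alarms for lldpNi as the entries that deserve attention.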

Post by silvio » 19 Jan 2019 13:10

Hi,
I am not sure about the log... there are error messages, but I don't know whether they are the cause of the trouble.
I would prefer to do an update during the downtime. Then you can check whether the messages are still there and whether the PMDs have stopped. In that case you don't need to open a ticket.

Problem 1: Have you found any information in the log about the reason for the reboot? You can check with "show log swlog timestamp 11/23/2018 00:00:00 | more" (with your own time shortly before the reboot). If there is no output, you can upload the tar file from the archive folder, unzip it and search all the files for the reason, e.g. with Notepad++.
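
To pick "your own time shortly before the reboot", the uptime reported by the chassis can be subtracted from the moment it was read. Below is a rough Python sketch under stated assumptions: the read time is taken to be the time of the first forum post, which is only an approximation of when the uptime was actually displayed, the 30-minute margin is arbitrary, and `parse_uptime` and `boot_time` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Uptime format as quoted in this thread, e.g.
# "98 days 7 hours 42 minutes and 46 seconds".
UPTIME_RE = re.compile(r"(\d+) days (\d+) hours (\d+) minutes and (\d+) seconds")


def parse_uptime(text):
    """Convert an uptime string in the format above into a timedelta."""
    match = UPTIME_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognized uptime: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def boot_time(uptime_text, read_at):
    """Estimate when the chassis booted, given when the uptime was read."""
    return read_at - parse_uptime(uptime_text)


# Assumption: uptime read at roughly the time of the first post.
read_at = datetime(2019, 1, 17, 12, 33)
boot = boot_time("98 days 7 hours 42 minutes and 46 seconds", read_at)

# Start looking half an hour before the estimated boot (arbitrary margin).
start = boot - timedelta(minutes=30)
log_command = "show log swlog timestamp " + start.strftime("%m/%d/%Y %H:%M:%S")
print(log_command)
```

The date format follows the example in the command above; the switch clock and the forum clock may differ, so the result is only a starting point for the search.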

I have never heard of a PMD file being generated without a reboot. So it sounds strange to me that new ones keep appearing after you delete them... Maybe the update will help.

btw: it is normal behavior that the switch with the shorter uptime will be the slave.

Problem 2: you are correct - with the first route in place, the second one is not necessary, so you can delete it. But this route causes no problem for the switch, so deleting it is not important.
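
The redundancy can be checked with a small longest-prefix-match model. This is only an illustrative Python sketch using the standard `ipaddress` module, not how the switch implements routing; the destination addresses 10.1.5.5 and 8.8.8.8 and the function name `next_hop` are chosen for the example.

```python
import ipaddress

default = ipaddress.ip_network("0.0.0.0/0")
specific = ipaddress.ip_network("10.1.0.0/18")

# The two static routes from the configuration quoted in the first post.
routes = {
    default: "10.1.119.251",
    specific: "10.1.119.251",
}

# The same table with the more specific route removed.
default_only = {default: "10.1.119.251"}


def next_hop(table, destination):
    """Longest-prefix match: the most specific route containing the address wins."""
    addr = ipaddress.ip_address(destination)
    candidates = [net for net in table if addr in net]
    if not candidates:
        return None
    best = max(candidates, key=lambda net: net.prefixlen)
    return table[best]


# 10.1.0.0/18 is fully contained in the default route...
print(specific.subnet_of(default))

# ...and both tables pick the same gateway for a branch address and an
# internet address (example destinations, not taken from the thread).
for destination in ("10.1.5.5", "8.8.8.8"):
    print(destination, next_hop(routes, destination), next_hop(default_only, destination))
```

Because both routes point to the same gateway with the same metric, removing the /18 changes which entry matches but not where the traffic goes. The /18 would only matter if the default route were later pointed somewhere else.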

Besides the reboot 100 days ago... are there other problems in your network?

best regards
Silvio

Post by thermseeker » 22 Jan 2019 19:04

Hi Silvio,

Checking the logs for the reason was useless because they were quickly getting filled with info messages like failed authentication attempts and supposed DoS attacks. I'm not yet 100% sure where those are coming from, but I suppose from some NMS running in our central administration trying to poll data using a (non-existent) default SNMPv2 community like public or private.

When I started my research I set logging to Warning level, so I would have useful information to search for problems. After I finish organizing everything I'll go back to Info level.

And I just finished bringing everything up after a scheduled shutdown. I rebooted the VC once, hoping to clear any remaining mess from the ongoing problems, then applied a new image, rebooted again and certified everything. I went from 8.4.1.229.R02 GA to 8.5.255.R02 GA. The system has been running for more than 1.5 hours now and everything looks smooth. No PMDs so far :)

Initially after upgrading I wasn't able to ssh to chassis 2; it simply didn't connect and the command gave no feedback. Checking the logs I noticed something was wrong with the keys, so I erased .ssh/known_hosts and then I was able to connect. I think the keys got swapped after the weird partial reboot when the slave took over. So the stored entry was correct once, only no longer valid.

Thank you very much for taking the time to read my original post and share your considerations about my problem.

Best regards,
Tales

Code:

faradmsw0254 show system
System:
  Description:  Alcatel-Lucent Enterprise OS6900-T40 8.5.255.R02 GA, August 29, 2018.,
  Object ID:    1.3.6.1.4.1.6486.801.1.1.2.1.10.1.4,
  Up Time:      0 days 1 hours 30 minutes and 36 seconds,
  Contact:      Tales Maschio - 6305/6020,
  Name:         faradmsw0254,
  Location:     TRAFAR - Datacenter,
  Services:     78,
  Date & Time:  TUE JAN 22 2019 21:00:12 (.4)
Flash Space:
    Primary CMM:
      Available (bytes):  1265807360,
      Comments         :  None

faradmsw0254 show hardware-info
Chassis 1
CPU Manufacturer                  : Freescale Semiconductor
CPU Model                         : P2040
Flash Manufacturer                : Micron Technology
Flash size                        : 1997938688 bytes
RAM Manufacturer                  : Micron
RAM size                          : 3950020kB
FPGA 1 version                    : 1.6.0
FPGA 2 version                    : 0.0.0
U-Boot Version                    : 7.3.2.134.R01
Power Supplies Present            : 1,2
NIs Present                       : 1,2

Chassis 2
CPU Manufacturer                  : Freescale Semiconductor
CPU Model                         : P2040
Flash Manufacturer                : Micron Technology
Flash size                        : 1997938688 bytes
RAM Manufacturer                  : Micron
RAM size                          : 3950020kB
FPGA 1 version                    : 1.6.0
FPGA 2 version                    : 0.0.0
U-Boot Version                    : 7.3.2.134.R01
Power Supplies Present            : 1,2
NIs Present                       : 1,2

faradmsw0254 show microcode
   /flash/working
   Package           Release                 Size     Description
-----------------+-------------------------+---------+-----------------------------------
Tos.img           8.5.255.R02               229271876 Alcatel-Lucent OS

faradmsw0254 show pmd

########### PMD files(Chassis 1 /flash/pmd) ##########


########### PMD files(Chassis 2 /flash/pmd) ##########


0 PMD files found
Last edited by thermseeker on 23 Jan 2019 12:04, edited 1 time in total.

Post by thermseeker » 22 Jan 2019 19:27

EDITED: Forget it, I already erased the misplaced certified folder.

Something else: in the mess that is (was) this 6900 filesystem I found a "certified" folder containing old image and configuration files inside /flash/app-signature. This was not supposed to be there, right? Probably some "expert" accidentally copied the certified folder to that location.

Code:

faradmsw0254 ls -l app-signature
-rw-r--r--    1 admin    user       4082359 Sep 30  2016 UAppSig.upgrade_kit
-rw-rw-rw-    1 root     root          1310 Jan 22 19:32 app-regex.txt
drwxr-xr-x    2 admin    user          4096 Sep 15  2016 certified


faradmsw0254 ls -l app-signature/certified/
-rw-r--r--    1 admin    user     212056284 Aug 18  2016 Tos.img
-rw-r--r--    1 admin    user            41 Sep  1  2016 boot.md5
-rw-r--r--    1 admin    user           438 Dec 17  2013 software.lsm
-rw-r--r--    1 admin    user         30202 Sep 15  2016 vcboot.cfg
-rw-r--r--    1 admin    user         29893 Aug 31  2016 vcboot.cfg.sav
-rw-r--r--    1 admin    user           596 Aug 24  2016 vcsetup.cfg

Inside the same folder in chassis 2 there is only app-regex.txt.

Thanks again
Tales

Post by silvio » 23 Jan 2019 12:28

Hi,
I think so too. You can delete this folder.
Not important, but: log levels run from alarm (most important) through warning and info up to debug1...debug3 (least important).
If you change from info to warning you get fewer messages (no info messages). The info level includes warning and everything more important.
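
The threshold idea can be written down in a few lines. This is a simplified Python model, not the switch's actual logic: the numbers for alarm through info are the ones printed in the swlog excerpt earlier in this thread (e.g. `error(2)`, `warning(4)`), while the numbering of debug1 to debug3 is an assumption, and `is_logged` is a hypothetical name.

```python
# Severity numbers as they appear in the swlog lines quoted in this thread.
SEVERITY = {"alarm": 1, "error": 2, "alert": 3, "warning": 4, "info": 5}

# Assumption: the debug levels simply continue the numbering.
SEVERITY.update({"debug1": 6, "debug2": 7, "debug3": 8})


def is_logged(message_level, configured_level):
    """A message is kept when it is at least as important as the configured level."""
    return SEVERITY[message_level] <= SEVERITY[configured_level]


for level in ("alarm", "error", "warning", "info", "debug1"):
    print(f"{level:8s} kept at warning: {is_logged(level, 'warning')!s:5s} "
          f"kept at info: {is_logged(level, 'info')}")
```

Note that the excerpt posted after the level was set to warning still contains a few info(5) lines, so the real behavior seems to have per-application detail that this sketch ignores.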
best regards
Silvio

Post by thermseeker » 29 Jan 2019 05:09

Hi!

After almost one week (6 days 11 hours now) I have zero PMDs in the switch. So I think this can already be considered a win. I still wonder what caused the problem and the reboot on chassis 1, but OK, the satisfaction of seeing it work well is greater than the need to know the cause.

There are two things however that I'd like to learn about monitoring these devices:

- When I poll the VC for uptime, what does it reply when uptimes are different on both chassis? Is there a way to poll both uptimes?
- Is there an OID that would indicate the presence of PMD files in both master and slave?

Thanks again for all your support. This is a great, great place to learn about and get help on A-L equipment.

Tales

Post by silvio » 02 Feb 2019 06:33

I don't know of an OID to get the uptime from all the chassis. But if you connect via ssh to the VC you always reach the master (e.g. chassis 1). From there you can open a new ssh connection to any of the slaves (e.g. chassis 2). As destination IP you can use the EMP (if configured and attached) or the internal IP. You can find the latter with "debug show virtual-chassis connection".
Also, I don't think there is an OID for listing the filesystem. In the CLI guide there is also no MIB object for the command "show pmd-files".
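
Without an OID, one workaround would be to capture the CLI output over ssh and parse it. The Python sketch below covers only the parsing step and relies on the summary line "0 PMD files found" seen in the "show pmd" output earlier in this thread; how the output is collected (ssh client, scheduling) is left out, the layout of the output when files are present is not shown in the thread, and `pmd_file_count` is a hypothetical helper name.

```python
import re

# Summary line as printed at the end of the "show pmd" output in this thread.
SUMMARY_RE = re.compile(r"^(\d+) PMD files found", re.MULTILINE)


def pmd_file_count(show_pmd_output):
    """Return the number of PMD files reported, or None if no summary line is found."""
    match = SUMMARY_RE.search(show_pmd_output)
    return int(match.group(1)) if match else None


# Output copied from the post made after the upgrade.
sample_output = """\
########### PMD files(Chassis 1 /flash/pmd) ##########


########### PMD files(Chassis 2 /flash/pmd) ##########


0 PMD files found
"""

count = pmd_file_count(sample_output)
if count is None:
    print("could not parse output")
elif count > 0:
    print(f"warning: {count} PMD files present")
else:
    print("no PMD files")
```

Returning None for unparseable output keeps "nothing found" and "could not check" apart, which matters for a monitoring check.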
regards
Silvio

Post by thermseeker » 12 Sep 2019 07:04

Reactivating this thread... good morning everyone!

The logs on my OS6900 have been filled with what I suppose are trap replay messages every 5 minutes. As if the switch were replaying the stored traps every 5 minutes:

Code:

2019 Sep 12 07:40:26.516 faradmsw0254 swlogd flashManager FlashMgr Main INFO: flashMgrValidateImage_exec: valid
2019 Sep 12 07:40:41.981 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO: ---- Logging MIP_SET type, command to be sent:
2019 Sep 12 07:40:41.981 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO: MIP_SETF(14) msg_id(-1362231232) (APPID_SNMP_AGENT(68/0) -> APPID_TRAP_MGR(72)) values:
2019 Sep 12 07:40:41.981 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO: Table(73738/0): trapStationTable
2019 Sep 12 07:40:41.981 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO:  Index: 172 17 20 1
2019 Sep 12 07:40:41.981 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO:   Object(6) Name: trapStationReplay
2019 Sep 12 07:40:41.982 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO:    Object Value(uint32): 1488
2019 Sep 12 07:40:41.982 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO: ----------------

The thing is, I don't get the replayed traps on OmniVista, so I'm not sure if it is actually replaying them, or if there are any stored traps. Also, I don't see anything in vcboot.cfg that would trigger such behavior.

- Is this expected default behavior, and maybe someone messed with the log configurations so now these messages are showing?
- Is there a way to check and/or clear stored traps?

I tried to find reference guides to the log messages but got none.

Thanks.
Tales - thermseeker

Post by silvio » 12 Sep 2019 08:59

In my opinion this log entry is generated every time your OmniVista (172.17.20.1) checks for updates, which it does frequently.
You can change the interval via network>discovery>settings>frequencies
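
The station address can be read straight from the log group: the "Index: 172 17 20 1" line appears to carry the IPv4 address of the trap station, one octet per field. A small Python sketch that extracts it together with the replay value follows; the interpretation of the index as an IP address is an assumption based on the reply above, and `parse_trap_replay` is a made-up name.

```python
import re

INDEX_RE = re.compile(r"Index: (\d+) (\d+) (\d+) (\d+)\s*$")
VALUE_RE = re.compile(r"Object Value\(uint32\): (\d+)")


def parse_trap_replay(lines):
    """Return (station_ip, replay_value) from one MIP_SET log group, or None."""
    station = value = None
    saw_replay = False
    for line in lines:
        index = INDEX_RE.search(line)
        if index:
            # Assumption: the four index fields are the octets of the station IP.
            station = ".".join(index.groups())
        if "Name: trapStationReplay" in line:
            saw_replay = True
        found = VALUE_RE.search(line)
        if found:
            value = int(found.group(1))
    if saw_replay and station is not None and value is not None:
        return station, value
    return None


# Lines copied from the log excerpt in the previous post.
sample = [
    "2019 Sep 12 07:40:41.981 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO: Table(73738/0): trapStationTable",
    "2019 Sep 12 07:40:41.981 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO:  Index: 172 17 20 1",
    "2019 Sep 12 07:40:41.981 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO:   Object(6) Name: trapStationReplay",
    "2019 Sep 12 07:40:41.982 faradmsw0254 swlogd MIP_GATEWAY mipgwd INFO:    Object Value(uint32): 1488",
]

print(parse_trap_replay(sample))
```

Grouping such entries by station address over a day would show whether a single management station accounts for all of the 5-minute entries.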
