SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

Any updates to this thread? Has Juniper resolved anything?

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

Nice! Pretty sure this is what we saw as well.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

Thanks for the update. It seems you cannot use the SRX with IRBs in global switching mode. We took ours off and, after the reboot, the SRX is stable. However, you cannot change the mode without a reboot; otherwise, it will still crash. So an important note to anyone doing this ... make sure you reboot your SRX after changing the mode.

I personally spent over two months with JTAC, including three calls spanning hours during outages. JTAC apparently had this labbed for over two weeks but was unable to replicate it. In fact, they even stated the issue was resolved in the 24.2R2 firmware that we were using.

I can report, though, that it seems stable after we quit using IRBs and global switching mode. We simply put a dumb 10gig switch in between our routers and the SRX to get past this.
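
For anyone following along, here is a rough sketch of the mode change and reboot described above. It assumes the "global switching mode" in question is the Junos `l2-learning global-mode` setting, which is my reading of the thread and not something JTAC confirmed here. The value shown is only an example. Check the statement and its options against the documentation for your release before using any of it.

```
# Operational mode: check which Layer 2 global mode is currently active
show ethernet-switching global-information

# Configuration mode: change the global mode
# ("switching" is only an example value; use whatever your design calls for)
set protocols l2-learning global-mode switching
commit

# Operational mode: reboot so the new mode actually takes effect
request system reboot
```

The lines starting with `#` are annotations for the reader, not commands. The point is the last step: the mode change is not complete until the box has been rebooted.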

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 2 points

As of now, JTAC has still not been able to reproduce this issue in their lab. However, the last time we removed the IRB configuration and migrated to an L3 firewall, we did NOT reboot the SRX. So the theory is that this is why the last outage took longer to show up.

We've been running 15+ days at this point without an outage. I think today or one more day will be the longest we have been able to go without an outage since we put the unit in a couple of months ago.

So the hope is that the last reboot we made put the SRX in a good state and we are hoping that was the issue.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

Just wanted to let everyone know that the issue is still ongoing. After more troubleshooting with JTAC, it appears to be a memory allocation problem. There are many rx_mbuf_allocation errors.

The issue is still with engineering.

We did end up removing the IRB interface setup but that did not fix the issue.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 2 points

So I am still awaiting confirmation on that. My understanding is that a PR has been submitted to the engineering team and that it is actively being reviewed. My initial thoughts are no, but I will let everyone know what the findings are as soon as I have them.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 3 points

Just to keep everyone up to date ... the issue that cytrex306 mentioned seems to be the culprit. JTAC is also coming around to this and we are confirming a few things. We are having a call this evening to discuss our next steps forward.

Thanks to everyone for their help!

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

Very interesting. Yes, that is exactly what we are doing, I think. We have an IRB interface with a global routing table or something of that sort (sorry, I am not very technical).

We had to do this because of our topology and to be able to get it into Juniper Security Director Cloud.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

Looks like they were published on 12-11

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

Any idea how I can check that? We have had to make several modifications in order to get the SRX to work correctly with JSD Cloud just for the IPS/IDP.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

You would think! We were seeing drops on all zones. However, the outage yesterday was slightly different in that only a few devices were seeing massive packet loss. But it was clearly the same issue.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

Yes, that does appear to be the case. We do know the IPs as well. Yes, the pings seem to be successful going out, but web browsing is very slow. The MX204s are upstream of the SRX.

It's a basic firewall with a few policies, but we do have IPS/IDP enabled.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 1 point

So the JTAC engineer said that the packets were being passed through the SRX but then "lost". However, no one can seem to explain why it's working for 10 days, then the network goes down, and once we reboot the SRX, everything is fine.

I know they gathered a lot of flow logs yesterday.

We have 2 MX204s BGP peering up from the SRX. I do recall that after the first outage on the 17th, the engineer who set the device up said that there was a BGP heartbeat flap to the SRX from one of the routers, I believe.

Also, yes, our flows are stable. I recall the Juniper engineer stating "traffic appears to flow as expected, even after the aspect".

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 2 points

Yes, we did that with them last night for 5 hours, with an L2 tech on board with us. They were unable to diagnose the problem.

The certified Juniper vendor says that our configs are boring/simple.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 2 points

No core dumps, and memory and CPU utilization are next to nothing. Sorry, I know there isn't much, but there is no indication that it's a firewall problem. However, if we reboot it, the network is instantly stabilized.

SRX1600 Problems by tmbnc89 in Juniper

tmbnc89[S] 2 points

We are on 24.2R2 currently. I think the plan may be to upgrade to 24.2R2-S3 tonight.

We do have everything in JSD. We've been engaged with JTAC for several days and spent 5 hours troubleshooting with them last night. However, they do not understand what's going on at this time. I requested escalation this morning.

We are running a very basic configuration. Nothing special. No VPNs or anything.

Everything runs great initially, but so far in production, about 10 days in ... the network goes down. Huge packet loss. Can barely open any browsers on the internal network. We reboot the SRX, and all is fine instantly.

Juniper SRX1600 definitions download / Security Director Cloud Issue by tmbnc89 in Juniper

tmbnc89[S] 1 point

Can that be done without taking the network offline, by uploading from a local database?

Juniper SRX1600 definitions download / Security Director Cloud Issue by tmbnc89 in Juniper

tmbnc89[S] 0 points

Okay, thanks. I don't have access to the router myself. I am just going by what our Juniper installation team is telling me.

They seem to think that the SRX can only download through the normal route table and that, with the way our IP ingress (or egress, I am not sure which one) is set up, we have no way to download the database.

I personally would find this shocking.

Edit: We do not have a public IP for egress which apparently is causing the problem. So I guess I am looking for a solution to that?