This session is for DBAs that want to understand more about networking.
Once data leaves your SQL Server, do you know what happens, or is the world of networking a black box to you? Would you like to know how data is packaged up and transmitted to other systems, and what to do when things go wrong? Are you tired of being frustrated with the network team?
In this session, we introduce how data moves between systems on networks, then look at TCP/IP internals. We’ll discuss real world scenarios showing you how your network’s performance impacts the performance of your SQL Server and even your recovery objectives.
Why I Want to Present This Session:
My career started on the network side. I’ve moved up the stack and I’m now a data professional. I can take that experience, and share it with you…so that you can understand how networking can impact your SQL Server’s availability and performance.
Brent Ozar: Alright, so in this session at GroupBy, Anthony Nocentino is going to be coming to you live from my office talking about networking internals for the SQL Server professional. Take it away, Anthony.
Anthony Nocentino: Thank you sir, thanks for having me at GroupBy, Brent, again, thanks for the office. But this is networking internals for the SQL Server professional, and I am trying to figure out this new rig I have. So that’s me, I’m Anthony Nocentino, which you can see me in person as well, so that’s kind of interesting. I’m a consultant and trainer and the founder of Centino Systems. I specialize in system architecture and performance, so I like to build systems and I like to make them go fast. That’s one of the specialties that I do. I have an undergraduate degree and a Masters degree in computer science, I’m trying to finish a Ph.D., and you’ve probably heard this joke before, but I’ve been in the last year of my Ph.D. for the last five years.
Data platform MVP, friend of Redgate, and there’s all my contact information. Please contact me, I love having discussions about SQL Server, Linux, PowerShell, kind of my pet topics. Follow me on Twitter, that’s the main way that I get information out to community, blog posts, things like that. Speaking of blog, there it is. And I’m also a Pluralsight author, and so this course, this presentation is built off of a course that I did do for Pluralsight, Linux internals – Linux networking internals, and I kind of took that and crossed the domain into the SQL Server world. So if you want to go and learn about Linux networking internals for about four hours, it’s super fun, go there and check out that course. It is on Linux, but the concepts still all apply.
And so here’s where we’re going to head today, guys. We’re going to talk about network topologies and the OSI model, and I’ve given this presentation before, and I’ve been accused of being a little bit too academic about it, but I think if you understand the core foundations behind things, like the actual plumbing about how data moves in networks, you’ll become a more general problem solver, so having those concepts laid bare for you guys is something that – it’s how I like to teach. If I just teach you one way to solve a problem with one particular tool, well yes, you can solve that problem, but if you really understand the plumbing, then you can really become a more general problem solver.
So we’ll start out with a little bit of theory in network topologies and the OSI model, but then we’ll shift into the things that you’re actually using in your data center on your SQL Servers because you know, SQL Server really wouldn’t work very well if it couldn’t send its data somewhere. And the way that it does that is – well, it’s over a network, right? And the idea behind this talk is that you guys can have that meaningful conversation with your networking team. So the core technologies that you guys are probably using in your data center are going to be Ethernet, IP, and TCP.
Of course there wouldn’t be a technical presentation without a ton of demos and I have a ton of demonstrations and if you guys want to kind of go in a particular direction and get a little bit nerdier about stuff that we’re doing in the demos, please do tell, or do ask. And then I’m going to apply that directly to SQL Server. We’re going to talk about some performance problems that can percolate up, and kind of one of my pet topics when I do presentations in the SQL community is availability problems. Things like AGs, log shipping, backup, all that stuff writes data to the network and if you have a poorly performing network, well, you’re probably going to have availability problems.
So this entire presentation could end with this slide. We need to talk about three core concepts in networking. There’s throughput, latency, and reliability. When we’re talking about throughput, we’re talking about how much data I can move in a time interval. So things like megabytes per second, bits per second, Gbits per second, that’s the unit of measure for how much I’m able to move in a unit of time. Latency is how long that transmission takes from the beginning of the transmission until the end of the transmission, and these things are very tightly interrelated and we’re going to talk about how that works a little bit later. And reliability, certainly if I transmit a message I need it to get to where it needs to go, and these three things are really the core of the entire presentation, and we could just stop here and as long as those three things are good, your network’s going to work well. But we’re going to look at why they’re interrelated throughout the presentation.
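A minimal sketch of how throughput and latency interact, assuming a deliberately simplified model in which delivery time is one-way latency plus the time to serialize the bytes onto the link. The function names and the LAN/WAN latency figures are illustrative assumptions, not measurements from the session, and the model ignores TCP windowing and retransmissions.

```python
def transfer_time_seconds(payload_bytes, bandwidth_bits_per_sec, latency_sec):
    """Simplified delivery time: one-way latency plus serialization time."""
    serialization = (payload_bytes * 8) / bandwidth_bits_per_sec
    return latency_sec + serialization


def effective_throughput_bps(payload_bytes, bandwidth_bits_per_sec, latency_sec):
    """Bits actually delivered per second once latency is included."""
    total = transfer_time_seconds(payload_bytes, bandwidth_bits_per_sec, latency_sec)
    return (payload_bytes * 8) / total


if __name__ == "__main__":
    size = 64 * 1024          # a 64 KB message (illustrative)
    link = 1_000_000_000      # a 1 Gbit/s link
    # Hypothetical latencies: 0.5 ms for a LAN, 50 ms for a WAN.
    for label, latency in (("LAN", 0.0005), ("WAN", 0.05)):
        mbps = effective_throughput_bps(size, link, latency) / 1_000_000
        print(f"{label}: {mbps:.1f} Mbit/s effective")
```

The same link speed yields very different effective throughput for small messages once latency grows, which is one way to see why the two measures are tied together.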
So I love this slide. When I did this presentation earlier this year, everyone kind of was like, “Wow, this is an internals talk, why are you showing me this basic diagram?” And I intentionally made it look like a little kid drew it. And so, I wanted to just get some nomenclature out of the way… There’s a LAN and a WAN, right? And the main function behind a LAN is to interchange data between hosts at very high speed and low latency. So the computation and the data – we want those things physically adjacent so they can exchange data very, very quickly. Now, a wide area network, or WAN, generally speaking, is going to interconnect geographically dispersed sites. I’m going to connect the data center to the corporate office, the corporate office to the remote office, or whatever that is, and when they exchange data across those geographically dispersed distances, it’s going to take time, which impacts throughput and our ability to exchange information between our systems. So when we’re designing systems, we generally don’t want to put the producer and the consumer of information very far apart because you’ll likely have performance problems.
I’m speaking in general terms when I talk about that. We can certainly do special things. We can have layer two networks that bridge over WANs, or we can have layer three networks that are routing inside of a data center, all that stuff, but we’re talking about center cut technologies like the main stuff that most people are doing. If you have those guru network ninjas on your team that do those fancy things and make your life a little easier, take advantage of that.
So the OSI model, the Open Systems Interconnection model, is the classic way to represent what a network looks like, and we’re going to break it up into layers, there are seven layers. Each of those layers has a primary function of doing one particular job and doing it really well, and at the very bottom of the layers, the physical layer, it’s the actual wire, it’s the bits and the bytes, the signals, the electrons that hit the wire, the light that hits the fiber, that stuff.
Most commonly you’re going to hear the term Ethernet thrown around at this layer, so people give you an Ethernet cable, things like that. But that’s really not quite what Ethernet is. Ethernet really functions at the data link layer, and the data link layer is responsible for encoding messages, when does the message start, when does the message stop, and that’s what Ethernet really is. It’s the protocol, it’s the messaging protocol at layer two, or the data link layer, and that unit of transfer is the frame, so you hear that term Ethernet frame. Ethernet is also responsible for access control. Not in the sense of security, but in the sense of who gets to transmit on a wire at a particular point in time, and it has impacts on performance, which we’ll talk about in a little bit.
Then there’s the network layer. The network layer’s primary responsibility is addressing and routing. How do I get from where I am right now to where I need to go, and uniquely identifying me in that network of computers. And the primary unit of transfer is the packet. People throw the term packet around a lot, up and down the stack, I’m losing packets here, I’m dropping packets there, but from a nomenclature standpoint, transfer unit, networking layer is packet. And so we’re going to focus on IP and some other devices that function at this layer, there are going to be routers and your actual computers, which we’ll introduce a little bit later.
And as we keep moving up the stack, we have the transport layer, and this is pretty much one of the cooler – I don’t want to say cooler layers, that’s really nerdy, but you know, whatever. It’s responsible for data segmentation and delivery. The concept is the operating system is going to write byte stream data into what’s called a socket, and the socket’s responsibility is to hand that information down into TCP/IP, or TCP, and TCP’s primary function is to segment that data, to break it up into smaller transmittable chunks so it can pass them down to the network layer to be shipped to where they need to go. And so it’s the responsibility of the transport layer to do that. So those transfer units are called segments, or TCP segments. So you want to use the right term when you’re talking about data at this layer. I don’t say losing, because it’s not necessarily a bad thing, but the unit you’re addressing at this layer is what’s called a segment, and the protocols that you’re going to see at this layer are TCP and UDP. We’re going to zoom in a lot on TCP today. If you want to talk about UDP we can but we’re not going to focus on it that much.
As we move up the stack, we talked a little bit about the application level socket. That’s what the application will write into, and it functions up at the session layer, so the actual byte stream data that’s represented in-memory inside of our applications gets written to the socket. And the socket is a facility of the operating system, so the applications don’t have to care about what’s going on under the hood in the transport layer. The application just writes the data to the socket and the socket does the work to break it up, put it on the network, put it in the frames, and then put it on the wire and send those signals to where they need to go.
A little bit higher is the presentation layer, that’s for the translation of data into the application. So it’s basically what this data might look like if there’s any special – I want to say framing, but formatting that needs to be presented up into the app. There’s another model that people use other than the OSI model, it’s called the TCP/IP model, and the application, presentation and session layers are all kind of smushed together into what’s called the application, and the remaining layers are transport, network and data link, which kind of makes sense because if I’m a network engineer, if someone wrote data into that socket and I’m able to read it, that’s kind of where my job starts. And so those are the two models, but this is the one I learned in school, and this is the one I’m teaching you guys.
So let’s talk about how data actually moves through a network, and the core concept is encapsulation. So if I’m an application, I need to do something. In this case let’s say I’m a web browser, functioning at the application layer there, so that’s the OSI layer, application. I’m going to request something like a webpage and that’s going to push that request down into the session, and the session is going to write that byte stream data into the socket, which is going to go put that data into the transport layer. So we see the translation from application level constructs, like an HTTP GET. It’s going to be broken up into chunks, segmented, and put into the transport layer, and what happens now is we take that data, that byte stream data, and put a TCP header on it. And a TCP header’s responsibility is to say this is where I’m coming from, from an application standpoint, and this is where I’m going to go to on my destination system from the application standpoint. And we’ll look at the internals of that a little bit later, the header itself.
Once that information is written at the transport layer, then I need to go and figure out where this packet now needs to go, so I have a source and destination IP address, I put that inside the header, and then I figure out what I’m going to do with that. I write that down into, in this case, Ethernet, so we’re going to write that down into the data link layer, and we tack on a header and a trailer. So the header information has source and destination information from my Ethernet network, and there’s also a trailer to make sure the electrons that hit the wire actually come out the other end. It has a CRC check, and when I read them back in I can check to see if those actually are the things that were written to the wire on the other side of the fence. And there are our electron binary things at the physical layer. It writes that across the wire, transmits to the other host, and now this is on the other side of the fence, those things are read up off of the wire, and the framing occurs, so we know when it starts and stops based on this function at the data link layer, and then we hit the network layer. We strip the frame off, and we figure out from the IP information where that needs to go from a host’s standpoint.
Now there’s an interesting case right here. There’s one of two things that can happen. If this piece of data is destined for a host, I’m just going to pass it up to the transport layer and go ahead and de-encapsulate the information. But if this piece of information needs to go to another network, it’s the responsibility of the network to decide where this thing needs to go, and so the router would read this information, make a new decision on the destination of this particular IP packet, and potentially forward it out another interface to the router and push it back down the data link layer. So you can see it go up to layer three, a decision is made, and it would get re-encapsulated and pushed back down and signaled to the next hop device. But let’s just say this was destined for a particular machine, that machine reads the information in the IP packet, strips off the header, strips off the TCP information; from that it knows which application to send that information to on the remote host. It goes ahead and processes the HTTP GET, get it a webpage, and the server responds to that request. That’s how data moves through a network.
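The encapsulation walk-through above can be sketched in a few lines. This is a toy model under stated assumptions: the headers carry only the source and destination fields mentioned in the talk (real TCP, IP, and Ethernet headers have many more fields), the Ethernet CRC trailer is left out to keep it short, and all addresses, ports, and function names are made up for illustration.

```python
import socket
import struct


def mac_bytes(mac):
    """Turn 'aa:bb:cc:dd:ee:01' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def tcp_segment(src_port, dst_port, payload):
    # Toy transport header: just the source and destination ports.
    return struct.pack("!HH", src_port, dst_port) + payload


def ip_packet(src_ip, dst_ip, segment):
    # Toy network header: just the source and destination IPv4 addresses.
    return socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + segment


def ethernet_frame(src_mac, dst_mac, packet):
    # Toy data link header: destination MAC first, then source MAC.
    return mac_bytes(dst_mac) + mac_bytes(src_mac) + packet


def decapsulate(frame):
    """Strip each toy header in turn, as the receiving host would."""
    packet = frame[12:]                       # drop the two 6-byte MACs
    src_ip = socket.inet_ntoa(packet[0:4])
    dst_ip = socket.inet_ntoa(packet[4:8])
    segment = packet[8:]
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return {"src_ip": src_ip, "dst_ip": dst_ip,
            "src_port": src_port, "dst_port": dst_port,
            "payload": segment[4:]}


if __name__ == "__main__":
    request = b"GET / HTTP/1.1\r\n\r\n"
    frame = ethernet_frame("aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02",
                           ip_packet("192.168.1.10", "192.168.1.20",
                                     tcp_segment(50000, 80, request)))
    print(decapsulate(frame))
```

Each layer only prepends its own addressing to whatever the layer above handed it, and the receiver peels the layers off in the reverse order.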
There are some things that obviously participate in this conversation. There’s the network interface controller, or the NIC. Ethernet’s functioning at layer two, and our routers and PCs – I put PCs in there because I had … it didn’t have to be a PC, but your computers and servers and whatnot will function at layer three. The TCP ports, that’s how we establish where it’s coming from and where it needs to go to. The socket, we talked about how the application writes into the socket. The socket is an operating system construct that abstracts the network so we don’t have to worry about the plumbing under the hood when we’re actually writing things like code inside of our applications, and the actual running processes function inside the application, and those are just exchanging data. They don’t really know that they have to deal with anything in all these different layers that are operating under the hood. Encapsulation, it’s so exciting.
If there are already questions, do interrupt me. I know I kind of just jumped in the deep end there. If you do have any questions, I would prefer to get those out. So let’s zoom in on the data link layer. We’re going to look at the specific technology that I think you guys are using in your data centers today. And so the primary thing that we are using at the data link layer, or layer two, in our data centers is Ethernet. And we talked about how it’s responsible for encoding the messages and transmitting the frames on our local area network, exchanging those things, source and destination, at a very high data rate.
Systems or end stations are uniquely identified on an Ethernet network by what’s called a MAC Address, or Media Access Control address. Media Access Control, I told you one of the responsibilities of Ethernet or the data link layer is access control.
MTU, we’ll talk about MTU now. MTU is the total length of the frame payload. It’s how much data I can put into our frame, and it’s not fragmentable, which means if I have hosts on the same LAN, they all have to communicate with the same MTU because we need to be able to know when the message starts and when the message stops. Oftentimes, people talk about things like jumbo frames, and really what jumbo frames bring to the table is they minimize the number of messages that I have to send at this layer. So if I have larger frames, I can put more data into the frames, and it’s workload dependent. If your workload benefits from the fact that it can fully populate jumbo frames every single time, then it might be something that you want to evaluate. Just test it out, but know that you can’t just change it at one host. The sender and the receiver and all other machines on that same layer two segment have to have the same MTU. With traffic between layer two segments, if you need to get to another layer two segment and you do have a different MTU in that segment, the traffic would have to pass through a router. We’ll talk about that in a little bit.
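A quick back-of-the-envelope sketch of the jumbo frame point, assuming each frame carries a 20-byte IPv4 header and a 20-byte TCP header with no options, and that the sender can always fill every frame. The function name and the 1 MB transfer size are illustrative assumptions.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_TCP_HEADER_BYTES = 40


def frames_needed(payload_bytes, mtu):
    """Number of full-size frames required to carry the payload."""
    data_per_frame = mtu - IP_TCP_HEADER_BYTES
    return math.ceil(payload_bytes / data_per_frame)


if __name__ == "__main__":
    transfer = 1_000_000  # 1 MB of application data (illustrative)
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {frames_needed(transfer, mtu)} frames")
```

With these assumptions the same transfer takes 685 frames at an MTU of 1500 and 112 frames at 9000, which is the reduction in message count that jumbo frames offer when the workload can actually fill them.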
And so the two things that we really want out of our data link layer are very high bandwidth and very low latency. So the devices that we’re going to see at this layer are going to be switches and networking. And one of the ways that we do provide very high speed, very low latency networks is that really the protocol is very simple. There’s not a lot of action going on inside of the Ethernet frame. Where I’m going to, where I’m coming from, so my source and destination MAC Address, my type, that’s the type of traffic coming from the layer three protocol. So in most cases, that’s going to be IP or maybe ICMP if I’m doing ping, things like that. So that’s going to be a marker encoded in two bytes right there. The actual data is the payload, and there’s that CRC check on the end that makes sure that the data that’s transmitted on the wire is actually what’s read off the wire on the destination.
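The frame layout described above (destination MAC, source MAC, a two-byte type, payload, CRC) can be sketched as follows. This is an illustration under assumptions: `zlib.crc32` stands in for the frame check sequence, the little-endian packing of the trailer is a choice made for this sketch, and the function names and MAC values are invented. Note that a ping would show up with the IPv4 type here, since ICMP travels inside an IP packet.

```python
import struct
import zlib

# A few well-known EtherType values.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Header (6 + 6 + 2 bytes), payload, then a 4-byte CRC trailer."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, ethertype)
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack("<I", crc)


def parse_frame(frame):
    """Split a frame into its fields and re-check the CRC."""
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    (stored_crc,) = struct.unpack("<I", frame[-4:])
    return {
        "dst_mac": dst_mac.hex(":"),
        "src_mac": src_mac.hex(":"),
        "type": ETHERTYPES.get(ethertype, hex(ethertype)),
        "payload": frame[14:-4],
        "crc_ok": zlib.crc32(frame[:-4]) == stored_crc,
    }


if __name__ == "__main__":
    good = build_frame(bytes.fromhex("aabbccddee02"),
                       bytes.fromhex("aabbccddee01"),
                       0x0800, b"hello")
    print(parse_frame(good))
    # Flip one bit in the payload to mimic corruption on the wire.
    bad = bytearray(good)
    bad[15] ^= 0x01
    print(parse_frame(bytes(bad))["crc_ok"])
```

The receiver never asks for the damaged frame again; it only knows the check failed, which is why retransmission has to come from a protocol higher in the stack.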
So when we’re talking about layer two networking, the things that we need to look out for are, well, do we have a link? Is the wire actually there? The green lights, are those blinking properly? We also have to worry about speed and duplex mismatches, and this is huge from a performance standpoint because if we’re off in either one of these, we basically don’t know how to encode messages or how to do access control on the wire, and that impacts throughput, because what will happen is we’ll have to start retransmitting data, and it’s not the responsibility of Ethernet to retransmit data. Higher protocols in the stack have to retransmit the data and that might have a penalty on our performance. Certainly will have a penalty on our performance.
And so this is really important when we are deploying systems. I need to make sure that the thing that I purchase, my one Gbit, ten Gbit, 40Gbit Ethernet is actually giving me the throughput that I need. I don’t want to go into production and not have tested that because the last thing you want to find out when you’re in production is that the 40Gbit link that you have is giving you like a gig a second. Or the other case, other than pre-production testing, is something changed. Did you get a new network interface controller on your server? Did somebody change out the switch port? You saw a cliff in performance, right, and these are the kind of things that we need to look out for. Most of the time, if you are having a duplex or speed issue, you’ll see this on the switch ports. They’ll have things called CRC errors and runts and giants because the port wasn’t able to signal the frame correctly. You also might see this in higher end interface cards where they’ll have performance data inside of Windows, or Linux, whatever operating system you’re using, and you can get that information out of there, but generally, the switch is probably the way to go.
Another thing to look out for at layer two is CPU saturation on switches. Enterprise switches have application specific integrated circuits, ASICs, built into the boards that manage the switching function. They read the frames, they write the frames, and they do it really well, and really, really fast. They also have buffers per port. Lower quality or lower end switches, those that are not tier one switches, share that stuff. They share those resources, the ASICs and the memory buffers, amongst a collection of ports. So it’s possible that if someone saturated a port it could really impact the collection of ports, and we’re not going to have a lot of insight into this, but if you’ve troubleshot the upper layers and you’re at the point where you’re having performance anomalies inside of your application and your systems, this might be something that you want to bring up to your networking team and say, “Hey listen, under extreme load, I’m seeing this but it’s happening maybe to a collection of systems that are physically adjacent in a switch inside of our data center.”
So IP, layer three is the network layer, and what we’re going to be using most in our data centers is IP. Maybe 20 years ago there was that whole IPX/SPX thing, but now we’re just showing our age. And the responsibility of IP, or the network layer really, is addressing and routing between networks, uniquely identifying hosts and being able to determine where to go next. The unit of transfer is a packet, and inside that packet, it’s going to have addressing information and our layer four data. We’re going to encapsulate that data from layer four, stick it in an IP packet, put a source and destination on there, and ship it off to where it needs to go.
The devices that function at this layer will be routers, firewalls, and your actual servers. Firewalls are routers that look a little bit further into the packet and into the transport layer and things like that as they go up the stack, which helps them make more specific decisions about what to do. They’re always going to be multi-interfaced devices where they receive traffic on one interface, make a decision on that traffic and write it to another interface. And also your machines, your servers and your workstations, every computer that you own is a router as well, even if it has one interface, because it has to make the determination, is the data that I’m sending destined for this network that I’m on or do I need to send it to another network? And we’ll look at the IP routing decision process here in a few seconds.
So the function of IP is to connect networks. It’s connectionless, meaning that IP has no concept of state. If I drop the packet, IP isn’t going to request a retransmission. It’s just going to drop the packet. It’s the responsibility, again, of upper layer protocols to make sure this information gets to where it needs to go. Inside of IP, the source and destination are defined in the IP header, but the network is what determines the path. There’s no pathing information inside the IP packet. I write the packet onto the network and the routers, the network itself, determine what to do with that information. We’ve discussed already that routers forward packets. They’ll read on one interface, they’ll look at the information the packet has, look at what’s called a routing table, make a decision, and write it out another interface. Literally, that concept is read and write, it’s moving data between interfaces. Not only your PCs and your servers, but actual routers, the devices themselves, make that decision with what’s called a routing table. It defines what to do with the packet. So I have a packet, I have the source and destination information, I look at the destination, I have this thing called a network mask that helps me specify the narrowness I guess you could say, or the breadth of a network, and I have the next hop interface that I’m going to write it down to, and that’s it. That’s what the routing table has, and it has that information for all of the potentially connected networks on that router, and a concept that’s called a default route, which we’ll explore in a second. And so your routing process is pretty straightforward. Look at the information, look at the routing table, make a decision, write it to that next interface.
Routers connect networks, and we introduced this when we talked about the difference between a LAN and a WAN. Routers connect networks with generally higher latency and lower bandwidth connections, so it’s up to us to make sure that we’re putting the resources in the right places so the producers and consumers potentially aren’t traversing a wide area network link through a router, or even if they are, that we know what that latency is and whether there’s potentially an impact to our applications.
So ARP, ARP’s an interesting one. I kind of toy around with where to put this as a concept. Do I put it back in layer two, or would I leave it in layer three? So I tested it out and I’m going to try it in layer three today. You guys let me know how I do. So ARP. ARP is short for Address Resolution Protocol, and its primary function is to bind the layer two MAC Address of an individual network interface controller to the layer three IP address of that host that’s assigned to that controller, because if I’m sending data to another machine on a local segment and I need to transmit a piece of information from IP 1 to IP 2, there’s no real routing process that can occur. We’re not really functioning at layer three, we’re kind of functioning at layer two, and I need to figure out how to find the MAC Address to put into my Ethernet frame for that particular IP that I’m looking for. I need to be able to say, “Who has this IP address on my local segment?” Because we’re not routing that packet, we’re not asking someone to make a decision. So what happens with ARP is the Ethernet controller for the device that’s making a transmission broadcasts out to all hosts in the layer two segment, so those directly connected, Ethernet-connected devices, and says, “Who has this IP address?” And the machine that has it will unicast, or individually send, a frame from itself back to the sender and say, “I have that address, here’s my MAC Address” and then the transmission can occur.
So the IP routing decision process, this is how each and every machine and each and every router in the world routes packets. It looks at the destination address in the IP header, and we’re going to look at the internals of an IP header here in a second. It compares that with the network mask to determine whether the system I’m trying to communicate with is local or remote, is it on a destination network? If it’s on a destination network then I’ve got to do something specific with it. If it’s on my local network, well, we just talked about what it does with it with ARP. If it’s local delivery, we just put it out the directly connected interface, we ARP out and that transmission occurs. But if it’s not on my local network, if it’s somewhere else, that has to be determined by looking at the destination address and the network mask, and it makes the determination if it’s local or not based on the bit values between those two. It does some AND/OR operations and determines if it’s local or not.
If it’s not local, then it’s going to look at the local routing table on the machine and determine what to do with it next and if it has a specific route, it will put it down on that network interface. We talked about how routers would read, they look at the frame or they look at the packet, and they’ll write it on a particular interface. It’s the exact same process there.
If it doesn’t have a route, if there’s no route defined in the routing table, no very specific information, if there’s no route at all, it sends it to what’s called a default route or your default gateway. This means I don’t know what to do with this piece of information, so I send it to the default gateway, and we really hope that that default gateway knows what to do with the packet, with the IP I’m trying to communicate to, and that’s kind of the concept that the network determines the path. We don’t know, so we have to go and pass it a little bit closer to where it needs to go, and that process will occur from router to router to router to router until we get to the destination.
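The routing decision just described can be sketched with Python's standard `ipaddress` module. The networks, gateway addresses, and function names here are hypothetical, and picking the most specific matching route (longest prefix) is the conventional behavior assumed for this sketch rather than something spelled out in the talk.

```python
import ipaddress


def is_local(my_ip, netmask, dst_ip):
    """AND both addresses with the mask; equal results mean same network."""
    mask = int(ipaddress.IPv4Address(netmask))
    mine = int(ipaddress.IPv4Address(my_ip)) & mask
    theirs = int(ipaddress.IPv4Address(dst_ip)) & mask
    return mine == theirs


# Hypothetical host configuration and routing table.
LOCAL_NET = ipaddress.ip_network("192.168.1.0/24")
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/16"), "eth1 via 192.168.1.254"),
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway 192.168.1.1"),
]


def next_hop(dst_ip, routes=ROUTES, local_net=LOCAL_NET):
    """Local delivery, a specific route, or fall back to the default route."""
    dst = ipaddress.ip_address(dst_ip)
    if dst in local_net:
        return "deliver locally (ARP for the MAC address)"
    matches = [(net, hop) for net, hop in routes if dst in net]
    if not matches:
        return "no route to host"
    # Assumption: the most specific (longest prefix) match wins.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


if __name__ == "__main__":
    for target in ("192.168.1.20", "10.1.2.3", "8.8.8.8"):
        print(target, "->", next_hop(target))
```

The default route `0.0.0.0/0` matches every destination, so it only wins when nothing more specific does.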
So inside of an IP packet, there’s a bunch of information other than just source and destination address. You can see those there at the bottom. There’s version information, we have IPv4 and IPv6, so right up there in the beginning is the version information. We have the header’s length, headers can be of variable size, so that’s recorded right there. Type of service or DiffServ is a way for network engineers to apply policy at the IP level, so I can mark traffic in a particular way, I need it to behave this way, so that’s a little bit out of scope for this conversation, but that’s what that is.
The total length, so I didn’t really bring up fragmentation yet, but an IP transmission can be broken up into smaller chunks and the primary reason is there are variances in MTU. There’s an MTU on one particular media… might have a 9000-byte transmission, break that into smaller chunks, well, I need to be able to reassemble that if it’s routed so that it can get to where it needs to go and be represented properly. So the total length is the length of the entire non-fragmented transmission. The identifier is a numeric identifier for where we are in the transmission. Flags determine what type of transmission this is, so we can set a flag that says don’t fragment, which is common when you’re doing some testing and need to probe around things like MTU size. Fragment offset is actually the byte stream value in the IP transmission that I’m at, so with those three pieces of information, total length, identifier, and fragment offset, I can determine and reassemble an IP datagram if I need to.
Time to live is a key concept of the internet, so we talked about how the network determines the path and it just kind of hands that data off to the next router. Well, if there’s a routing anomaly, well, that can go on forever because I’m just going to keep passing that thing closer to where I think that it needs to go when I’m making that routing decision, so what time to live does is each time a routing decision is made on an IP packet, it decrements this value, which is an integer value that’s going to start at either 128 or 64 depending on your operating system. So if you get down to one and it hits a router, what’s going to happen is the router is going to decrement that value and go, I’m done with you, it’s going to discard that packet and it’s going to send a message back to the original sender, right from the source address, and say time to live exceeded. So we’ve probably all seen that when we’re working on a networking issue. Well, that’s why it occurs. We have protocol – sir.
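A small sketch of how a large IP payload might be cut into fragments for a smaller MTU. It assumes the IPv4 header layout from RFC 791, where each fragment carries its own total length, all fragments share one identification value, and the offset is counted in 8-byte units; a fixed 20-byte header with no options is also assumed, and the function and key names are illustrative.

```python
def fragment(payload_len, mtu, header_len=20):
    """Describe the fragments needed to carry payload_len bytes of IP payload."""
    # Data per fragment must be a multiple of 8 bytes (except the last one).
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        fragments.append({
            "offset_units": offset // 8,          # fragment offset field
            "data_bytes": size,
            "total_length": size + header_len,    # length of this fragment
            "more_fragments": offset + size < payload_len,
        })
        offset += size
    return fragments


if __name__ == "__main__":
    # A 9000-byte packet (20-byte header + 8980 bytes) crossing a 1500 MTU link.
    for frag in fragment(8980, 1500):
        print(frag)
```

The receiver can put the pieces back in order from the offsets and knows it has the last piece when the more-fragments flag is clear.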
Erik Darling: We do have a couple questions over in the Slack if you want to answer those or if you want to get through this slide first.
Anthony Nocentino: We can do them now. Go ahead, do you want to read them or should I pull them out?
Erik Darling: I’ll go read them. No problem at all. Vladi wants to know if you are going to cover layer eight. CIO said, “Our new firewall will monitor that.”
Anthony Nocentino: I hope that’s a joke.
Erik Darling: It’s a Russian name so I’m not sure whether that’s a joke or not. There’s another question…
Anthony Nocentino: If they are maybe I missed something so.
Erik Darling: Another is Brent, Brent assures us that it’s a joke. Mark V asks, “How do you measure throughput latency in the third parameter you mentioned? It will be great to measure before versus after making network changes.”
Anthony Nocentino: I’m going to show you some demonstrations how to measure. We’re going to do that together at the end.
Erik Darling: Okay, and then Vladi asks again, “What is ARP poisoning?”
Anthony Nocentino: Alright, so that’s I think where I can inject or reply to an ARP request. I don’t know specifically because that kind of leans towards security topics which I am not going to act like I know what I’m talking about over there.
Erik Darling: No problem, I am going to go back into my shell.
Anthony Nocentino: Okay, thank you. So protocol, that’s going to be things coming from the top, so like TCP and whatnot, what’s getting passed down from the upper layers of our protocol stack. Header checksum, we need to make sure that this thing doesn’t get changed in any way. Source and destination address, IP options and padding. Padding is just going to be a little bit of fluff because we need to fill out the entire data structure because we do have a variable length.
So IP, right, so things to look out for. There’s a lot of action here. So routing issues we talked about, and I say routing issues not only in the sense that we’re going to troubleshoot routing issues, but routing issues in the sense of: did the network change from what I expect it to be? It was functioning one way, and now it functions another way, and that can potentially have an impact on the performance of my system.
One of the things that’s interesting nowadays is you guys might have heard the term MPLS, or Multiprotocol Label Switching. That actually abstracts out the physical network from us, so we’ll basically pop in one side of the network and pop out the other side of the network and have it look like one hop from a networking standpoint, and that can be very challenging to troubleshoot. So if you are using that, you probably need to be hanging out with your network people a lot more just to make sure that you guys are on the same page about what’s going on there.
Time to live, we introduced that concept. We looked at the internals of an IP packet and that can cause you an issue because simply, if you can’t reach it and your router’s not going to pass your traffic and you’re discarding your data, well, that’s going to be pretty easy to see if you’re using the tool like traceroute, but that information might not percolate up into your application, so things like Wireshark can yield that information to you and we’ll look at that in a demo.
Bad network masks, or mismatching network masks more accurately: there’s a network mask that’s defined throughout a layer three segment, and is that configured correctly on my device or other devices on the network? If it’s not, then I could have discovery issues, or discoverability issues, where I can’t really determine where my network starts and stops because I’m not able to take that information in the IP address with the network mask and make the right decision on is this local or remote. And then dropped packets, right, simple: if my datagrams are getting – or my packets are getting discarded by intermediate routers, maybe because they’re having issues, it’s kind of hard for me to figure that out, but if we are using a tool like Wireshark, you can kind of sniff those things out, pun intended, and we’ll look at that a little bit later.
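The local-or-remote decision described here can be sketched in a few lines of Python with the standard `ipaddress` module. The function name `next_hop` is made up for illustration, and the host address 192.168.1.100 and the remote address 192.168.2.2 are assumed values, not addresses confirmed for the demo network.

```python
import ipaddress

def next_hop(my_ip_with_mask, dest_ip, gateway):
    """Decide where a packet is handed off: straight to the destination or to the router."""
    iface = ipaddress.ip_interface(my_ip_with_mask)
    if ipaddress.ip_address(dest_ip) in iface.network:
        return dest_ip   # local: resolve the destination itself on this segment
    return gateway       # remote: hand the packet to the router

# Correct /24 mask: the other subnet is remote, so the router gets the packet.
print(next_hop("192.168.1.100/24", "192.168.2.2", "192.168.1.1"))

# Wrong /16 mask on the same host: the remote subnet now looks local,
# so the host tries to reach it directly and the traffic goes nowhere.
print(next_hop("192.168.1.100/16", "192.168.2.2", "192.168.1.1"))
```

The second call is the mismatched-mask case: nothing is "down", the host simply makes the wrong decision about where its network stops.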
Misplaced resources, we kind of beat that concept a little bit when I talked about producers and consumers and those being positioned correctly in our network. You want to make sure that those things are physically adjacent if we need high bandwidth, low latency communications. Routers are choke points. Simple as that. If you think about the concept of what a router does, it takes all the information on a subnet and sends it out one interface so that it can get a little bit closer to its destination. So oftentimes, those things can be bandwidth or CPU constrained. Maybe they’re misprovisioned initially, maybe we just grew over time, and those things can change, so we need to make sure, and like check in every once in a while and say, “How are we doing on bandwidth and CPU?” If you are having bandwidth and CPU problems on your router, you’re going to see things like dropped packets, you’re going to see an increase in latency. And those issues will percolate up higher into your protocol stack and cause you application issues.
One thing that’s commonly overlooked in networking is name resolution latency. How long it took for my DNS server to reply to the query, like I asked for a domain, that thing is 40 milliseconds away and I’ve got to wait 40 milliseconds for a reply back. That’s 40 milliseconds my application had to wait before it could start an IP connection, or get that thing closer to where it needs to go. And so name resolution latency is an often overlooked performance issue.
Windows systems cache, so after that first query they’ll cache the information locally. Linux systems by default actually do not cache. You have to add a thing called the name service caching daemon, or nscd, to your Linux system to cache that information. Most commonly you’ll see that used in scenarios where you have centralized authentication like AD in, I guess you could say, a cross-platform role. So look out for that when you’re troubleshooting, and that will pop up pretty evidently either in your application using an APM or application monitoring facility, or in Wireshark as well.
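If you want a rough feel for name resolution latency from the application’s point of view, a small Python timer around the resolver call is one way to sketch it. The helper name `time_lookup`, the host name `sql-c.example.internal`, and the fake 40 millisecond resolver are all assumptions for illustration; the fake resolver just stands in for a DNS server that is 40 milliseconds away so the example runs without touching the network.

```python
import socket
import time

def time_lookup(host, resolver=socket.getaddrinfo):
    """Return how many milliseconds one name lookup took."""
    start = time.perf_counter()
    resolver(host, None)
    return (time.perf_counter() - start) * 1000.0

def fake_slow_resolver(host, port):
    """Stand-in for a DNS server roughly 40 ms away (simulated, no network used)."""
    time.sleep(0.04)
    return []

simulated_ms = time_lookup("sql-c.example.internal", resolver=fake_slow_resolver)
print(f"simulated lookup took {simulated_ms:.1f} ms")
```

Calling `time_lookup("some-host-name")` without the `resolver` argument times a real lookup through the operating system. Running it twice in a row is a quick way to see whether anything on the box is caching the answer.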
So at layer four, TCP, this is where it’s going to get really fun, you guys. TCP is simply – TCP and IP are like why the internet works, and this is the heartbeat of what goes on inside of the internet, and if these things didn’t exist, I literally wouldn’t be talking to you right now.
So we talked about in the beginning that layer four is responsible for segmentation, ordering of data, and reliable delivery. We’re adding a few dimensions there in addition to segmentation. So segmentation, right, taking that byte stream data from the socket, from the application, breaking it up into chunks, making sure that chunk gets to where it needs to go in the right order, right, I don’t need to be shuffling the bits and the bytes around because if I did and I write it to a buffer in the application and it’s not in the right order, that’s not good, right? And I also need to make sure that it gets there. Reliable delivery, did that thing that I transmitted at the application level get to where it needs to go? The unit of transfer is a segment. We talked about how application data is broken up into segments, we push that data down into IP, and IP is responsible for the delivery of this information coming up from layer four.
So we talked about in the beginning that TCP can uniquely identify where that information goes, what application it came from, what application it needs to go to, and this is how it does that, right. It uniquely identifies the connection by these four things. Source IP, source port, right, there’s TCP ports, those things that get thrown around, a numeric identifier, basically, of what socket is being opened by the operating system, and the destination IP and the destination port. So with those four things, I write data into one side and it pops out the other side of the network at the socket level on the client that I’m transmitting the data to. So devices we’re going to see at this layer are going to be your operating system. It’s your operating system’s responsibility to do what layer four does, right, segmentation, ordering, reliable delivery. The things we’re also going to see at this layer are application firewalls, load balancers, and WAN optimizers, and we’re not really going to go into a deep dive conversation about any of these things, but I strongly caution you, if you see any of these buzzwords fly up in change management, pay attention, because these things can intercept and interact with layer four TCP and start responding to your network requests and their connections and changing the way that they operate. So if you see these things pop up in your route, application firewalls, load balancers, WAN optimizers, red flag, talk to your network people and figure out what’s going on.
I was involved in a situation a couple years ago where someone put a WAN optimizer in, and it was one of those databases where it was a proprietary database over Windows networking, and what happened is the WAN optimizer would respond to the write request of the remote database and it would – the database certainly didn’t see the write request because the WAN optimizer was completing the write at the network layer. It’s crazy bananas, and the database got corrupt. So these things are definitely red flags. Not that we’re going to dive into this, but if you see these things pop up, go talk to your network people, they’re probably not going to talk to you. Figure out what product was bought, go get the manual, read it, that’s what I do, and figure out how it’s going to impact your application.
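Those four values can be seen directly from code. This is a minimal Python sketch that opens a TCP connection over the loopback interface and prints the four-tuple from both ends; using 127.0.0.1 and an operating-system-chosen port is an assumption made so the example is self-contained.

```python
import socket

# Listener on loopback; port 0 asks the operating system to pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# Open a connection to the listener and accept it on the other side.
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

# (source IP, source port, destination IP, destination port) as each side sees it.
client_view = client.getsockname() + client.getpeername()
server_view = conn.getpeername() + conn.getsockname()
print("client side:", client_view)
print("server side:", server_view)

for s in (client, conn, server):
    s.close()
```

Both ends report the same four values, which is what lets the operating system hand the bytes to the right socket, and it is the same information netstat lists per connection.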
So TCP’s killer performance optimization. TCP requires positive acknowledgment for each and every segment transmitted, right, we talked reliable, ordered segmentation, those things, right, so we need to make sure that what it’s transmitting gets to where it needs to go. And if I needed to acknowledge every single segment, right, before I transmitted the next segment, I would never be able to fully utilize that link’s capacity between the sender and the receiver. I have to write a segment in, I have to wait, and then I get the reply. Think like synchronous transmissions in AGs conceptually, but we’re not going to do that because of the latency. It’s slow if I had to wait for that: I’d only be able to transmit a segment size worth of data at a time. So what TCP does is allow you to have a number of unacknowledged segments transmitted between the client and sender. Sender and client. So this is how bandwidth and latency become tightly correlated. We’re going to go through that now.
So the number of unacknowledged segments that TCP can set … Literally, it’ll write data in the socket and make multiple segments and start transmitting those, right, and if they’re unacknowledged, then what it can do is send more and more and more data and more fully utilize your link’s capacity. But this is really a function of latency, because what’ll happen is if TCP doesn’t get replies back to those segments that it’s transmitting in a timely fashion, it will kind of start restricting how many unacknowledged segments it can transmit in a time interval. And it also reacts if you drop any packets. So if you drop packets, what’s called your congestion window, that’s the concept that I’m describing here, your congestion window will shrink, and it will shrink very rapidly too, so you’ll see a dramatic reduction in throughput if you drop a packet.
So if you’re ever downloading something on the internet, do you ever see how it kind of starts off slow and gets fast over time? Well, that’s what’s happening, right. TCP is realizing that your network connection is reliable, and it’s able to figure out the number of unacknowledged segments that it can have in flight at a particular point in time and starts fully utilizing your link’s capacity.
What’s interesting though, when I talk about latency and change in latency that can impact throughput, is it’s not just change in latency but even static latency, meaning if I have a WAN connection between two sites and it’s 10, 20, 30, 40 milliseconds away, the combination of the round-trip latency and the actual interface’s link capacity determines how much data an individual TCP stream can consume. So I was involved in a project late last year, and I was brought in to install or deploy a multi-site availability group configuration, so synchronous within the data center, asynchronous to the remote data center DR and – sir.
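The "starts slow, gets fast, collapses on a drop" behavior can be sketched with a toy model in Python. This is a deliberately simplified, textbook-style model (double each round trip until a threshold, then grow by one, halve on loss); real operating systems use more sophisticated algorithms, and the function name, the threshold of 64 segments, and the loss round are assumptions for illustration.

```python
def simulate_cwnd(rounds, loss_rounds=(), ssthresh=64):
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh      # a dropped packet shrinks the window sharply
        elif cwnd < ssthresh:
            cwnd *= 2            # ramp-up: double every round trip
        else:
            cwnd += 1            # steady state: probe gently for more capacity
    return history

print("clean link:  ", simulate_cwnd(10))
print("loss in rt 7:", simulate_cwnd(10, loss_rounds={7}))
```

Because each step in the history takes one round trip, a longer round-trip time stretches the whole ramp, which is one way to see why latency and throughput end up tied together.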
Erik Darling: Before you dig into this slide, I really apologize, there was a question that I missed from about 10 or so minutes ago, it was, which layer would NIC drivers fit in?
Anthony Nocentino: NIC drivers…
Erik Darling: [crosstalk] layer eight since I got…
Anthony Nocentino: So I guess it really would fit into a layer right, the driver’s responsibility is to expose the device to the operating system, so your operating system – I guess I can see where you’re heading. So it’s the function of the operating system as a device to be able to write byte stream, its own bytes into the device, and that’s going to happen at a much lower level inside of the kernel itself. It’s not necessarily – I mean, I guess you could argue like layer one, layer two conceptually, but…
Erik Darling: Okay.
Anthony Nocentino: Cool. Let’s talk about bad networking decisions. So what happened was they provisioned a 250Mbit link between their primary site and DR, and 250Mbits, pretty spicy meatball when we’re talking about throughput between remote data centers, right? 42 milliseconds of round-trip time, so I stood up the first AG in dev and I did a load test, because this is what we do. We test things out to make sure they’re going to perform in the way that we think they’re going to perform, right? The data center was 42 milliseconds away, 250Mbits, and I was only able to move 7.2Mbits a second between the primary and the secondary data center, because the database mirroring endpoint, right, which is the primary substrate to move information between two AG replicas, functions over TCP and it’s a single TCP stream, right? And I could only move 7MB, which is not good because it needed about 30MB to actually be able to meet the recovery objectives, right, to make sure that the data that they’re writing in the primary site is actually replicating to their DR in a timely fashion. So those two things are very tightly correlated.
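The relationship between round-trip time and single-stream throughput can be worked through as arithmetic: a stream can have at most one window of unacknowledged data in flight per round trip. The Python sketch below uses the 250Mbit, 42 millisecond, and 7.2Mbit figures from the story; the function names are made up, and treating the window as the only limiting factor is an assumption, since the talk does not say what window size was actually in effect.

```python
def max_stream_mbits(window_bytes, rtt_ms):
    """Upper bound on one TCP stream: one window of data per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000

def window_needed_bytes(link_mbits, rtt_ms):
    """Bytes that must be in flight to keep a link of this speed full."""
    return link_mbits * 1_000_000 / 8 * (rtt_ms / 1000)

# Window implied by moving 7.2 Mbit/s over a 42 ms round trip.
implied_window = window_needed_bytes(7.2, 42)
print(f"implied window: {implied_window:,.0f} bytes")

# Window one stream would need to fill the 250 Mbit link at 42 ms.
print(f"needed to fill the link: {window_needed_bytes(250, 42):,.0f} bytes")

# The same implied window over a shorter round trip moves more data.
print(f"at 25 ms: {max_stream_mbits(implied_window, 25):.1f} Mbit/s")
```

Under that assumption the observed rate corresponds to a few tens of kilobytes in flight, while filling the link at that distance would take well over a megabyte, which is why a single stream left most of the circuit idle.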
Now, it’s not that we’d only ever be able to use 7Mbits; a single TCP stream will only be able to use 7Mbits, so they were – you know, they weren’t banking on the fact that AGs had this. For the record, I was brought in after the circuit was purchased. And so you know, you can have multiple TCP streams, so if I had a collection of AGs, right, I’d have multiple TCP streams and be able to consume that link more consumably. So we had to make good decisions there, make sure we load test, and make sure that what we’re getting is what we’re paying for, or conceptually what is actually accurate from a technology standpoint.
Alternate – so we had to fix that. We actually had to have the link physically re-provisioned, they got it down to the mid-20s, and we’re moving about 15Mbits a second now, right, even though it’s a 250Mbit link: single TCP stream, 250Mbits per second.
So flow control. The congestion window controls the unacknowledged segments from the sender’s side of the world, and the sliding window is the receiver’s side of the world. So a similar concept, if you’re interested in AG internals, happens here as well, in that the receiver of the information can signal back to the sender and say, “I’m having a resource constraint internally, I need you to back off,” and it’ll shrink what’s called the sliding window and reduce the number of unacknowledged segments. AGs effectively have the same concept of flow control on their insides as well.
So the way that we transmit data between systems, we need to initiate a conversation, right? Your TCP connection establishment is the three-way handshake, right? And this is going to be a critical technique if we’re troubleshooting connectivity issues between systems, and we’re going to look at that a little bit more closely in the demos today. But if I’m starting a communication between a sender and a receiver, I’m going to put a segment on the wire with a flag set inside of the TCP header that’s called the synchronization flag, right, and I’m going to put a numeric value, in this case, N, I’m going to write that value into the TCP header and transmit that to my client that I’m trying to open a connection with, right? This is an establishment. That state is called SYN_SENT, right, and this is going to become like your absolute tool that you can go back to your network people with and say this is not my problem. You’re in SYN_SENT, something is happening below me that’s preventing me from being able to transmit. Put that in your memory bank for your long-term ninja fighting – network ninja fighting skills.
What’s going to happen is I’m going to receive that segment on the receiving side, and two things are going to occur on the receiving side. We’re going to acknowledge that initial packet, see I did, I mixed it up right there, segment, I’m going to acknowledge that segment N, right? The value that I received, I’m going to increment that value by one, write that information into the header, but I’m also going to send another synchronization flag inside the same segment and put another value M into there and send that back to the original sender, and that’s SYN_RECEIVED, and you’re not going to be in SYN_RECEIVED very often. That’s going to be a very quick transition in state, and that information is transmitted back to the original sender. We’re going to acknowledge M, right? So now we have two values. We have N and M, and what’s happened here is we’ve built the connection bi-directionally. I opened a connection to – from client to sender and the sender – from sender to client, excuse me, and the receiver opened the connection back up to me, so there’s bi-directional communication and we’ve established two initial synchronization values, N and M, and the state that we’re in now is established, which means I can actually transmit data between my two systems now bi-directionally.
So what happened here, it’s super critical, right? We established what are called the initial sequence numbers, N and M. They’re integer values, and it totally makes sense when you say this thing out loud, it’s critical to ordered, reliable delivery in both directions. Order, right? I can certainly establish the order of a transmission because I have numeric values that represent where it needs to go, and also if I’m missing packet 27 in a stream, I can say, I lost packet 27, very easily, right? And so that information is readily available simply by this very simple, elegant technique that’s going to help us be able to track state.
TCP does have a concept of retransmissions, so I can go re-request 27 if I lost it, but that would impact throughput like we’ve said. There are several other states, but these are the primary ones that occur during connection setup. When we tear down connections there are certain states, and there are also states for if there’s an anomaly and things need to go south in a particular fashion. So at the TCP layer, the thing that we need to look out for is latency at any layer. If anything below me is injecting latency, whether it’s the physical media, like that 250Mbit circuit, whether it’s a dropped frame, dropped packet, all those things are – excuse me, I say dropped frame or dropped packet, but a buffer overrun on a switch or CPU latency on a router, right? Anything that’s going to inject latency, you’re going to see a variance in throughput, right? They’re tightly correlated and TCP responds very quickly to anomalies. Dropped packets are obviously bad news because that’ll impact our sliding window and cause retransmits.
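The N and M bookkeeping from the handshake can be written down as a small Python sketch. The dictionaries are only a model of the three segments, not real packets; the function names and the example values 1000 and 5000 are assumptions. One simplification to flag: real TCP sequence numbers count bytes rather than packets, so "packet 27" in the second helper is the talk’s shorthand.

```python
SEQ_SPACE = 2 ** 32   # sequence numbers are 32-bit values and wrap around

def three_way_handshake(n, m):
    """Model the three segments exchanged when a connection opens."""
    syn = {"from": "sender", "flags": "SYN", "seq": n, "ack": None}
    syn_ack = {"from": "receiver", "flags": "SYN+ACK", "seq": m,
               "ack": (n + 1) % SEQ_SPACE}
    ack = {"from": "sender", "flags": "ACK", "seq": (n + 1) % SEQ_SPACE,
           "ack": (m + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

def missing_numbers(received, first, last):
    """Numbered pieces make a gap obvious: list whatever never arrived."""
    seen = set(received)
    return [s for s in range(first, last + 1) if s not in seen]

segments = three_way_handshake(n=1000, m=5000)
for seg in segments:
    print(seg)

gaps = missing_numbers([25, 26, 28, 29], first=25, last=29)
print("missing:", gaps)
```

A sender that never gets the second segment back stays in SYN_SENT, which is the state to point at when the problem sits below you in the stack.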
We talked a little bit about OS queuing, right? We talked about flow control. What can happen is inside of the operating system they can start queuing TCP segments … there’s a resource constraint internally. Firewalls will always forever be the pain in our side, we need to open communications to our systems, hopefully, we’re using firewalls on our operating systems nowadays as a defense-in-depth strategy, I’m going to show you how to debug that a little bit in the demos. And now you know why row by agonizing row, I guess you say development, is so bad, right? Because now what I have to do, right, if I’m writing a row and I have to transmit that thing across the network, I have to write it to a socket, put it into a TCP segment, put it into an IP header, put it into a frame, send it across the wire, and do that business again and de-encapsulate that data. If I had to do that for each and every transmission, that overhead is high, right?
If I just took all that information and fully packed a segment, and that broke up into fully populated IP packets, and those broke into fully populated Ethernet frames at the MTU that I have for that network, you can see why that would be more efficient, right? I put more people on less train cars, essentially.
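The train-car comparison can be put into numbers with a short Python sketch. Everything here is an assumed, simplified model: 20-byte TCP and IPv4 headers with no options, 18 bytes of Ethernet header and checksum, a 1460-byte segment payload, and a made-up workload of 10,000 rows of 100 bytes each. It also ignores acknowledgments, round trips, and any coalescing the operating system might do.

```python
HEADER_BYTES = 20 + 20 + 18   # TCP + IPv4 + Ethernet header and checksum (assumed, no options)
FULL_SEGMENT = 1460           # payload that fits a 1500-byte MTU under those assumptions

def wire_bytes(total_payload, payload_per_segment):
    """Total bytes sent when the payload is cut into segments of the given size."""
    segments = -(-total_payload // payload_per_segment)   # ceiling division
    return total_payload + segments * HEADER_BYTES

rows, row_size = 10_000, 100
payload = rows * row_size

one_row_per_segment = wire_bytes(payload, row_size)
fully_packed = wire_bytes(payload, FULL_SEGMENT)
print("one row per segment:", one_row_per_segment, "bytes on the wire")
print("fully packed:       ", fully_packed, "bytes on the wire")
```

The header cost is only part of the story; each small send can also mean its own trip down and back up the stack, which this sketch does not count.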
One of the things that I do want to discuss, because there’s a corollary, is protocol overhead, and I didn’t really build this into the deck but it’s popped into my head right now. Conceptually speaking, if I do have something like a 1Gbit link, I’m never going to be able to fully realize 1Gbit, right? Not because of resource constraints, but because of protocol overhead, right? TCP will roughly – if you’re at 93% of link capacity, you’re as good as it gets. You can do a little bit of massaging to get close to like 97, 98, but again, we’re shooting for kind of the middle of the ground of what’s going on inside of corporate data centers today. So put that number in your head, 93%, and you should be happy.
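A back-of-the-envelope Python calculation shows where most of that overhead comes from. The header sizes are assumptions: IPv4 and TCP at 20 bytes each, 12 bytes of TCP options, and the usual Ethernet framing costs. The result lands near, not exactly on, the 93% rule of thumb, which is a measured figure and also absorbs things this arithmetic ignores, such as acknowledgments and retransmits.

```python
def tcp_goodput_fraction(mtu=1500, ip_hdr=20, tcp_hdr=20, tcp_opts=12):
    """Share of the bytes on the wire that is application data (header sizes assumed)."""
    payload = mtu - ip_hdr - tcp_hdr - tcp_opts
    # Ethernet header (14) + checksum (4) + preamble (8) + inter-frame gap (12).
    on_wire = mtu + 14 + 4 + 8 + 12
    return payload / on_wire

standard = tcp_goodput_fraction()
jumbo = tcp_goodput_fraction(mtu=9000)
print(f"1500-byte MTU: {standard:.1%}")
print(f"9000-byte MTU: {jumbo:.1%}")
```

Larger frames spread the same fixed header cost over more data, which is one of the knobs that can push the number higher.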
So kind of to recap, data encapsulation, right? So we have data, right? We slap on TCP headers, we have data that goes into a IP segment, IP packet, I did it again, and then we put that thing in our frame and it gets off to where it needs to go. Man, Brent you are right, it is a lot faster when you’re just in here by yourself, or with people watching.
Erik Darling: It just flies by.
Anthony Nocentino: It’s demo time. So seriously, if you guys want to do something or you see something in the demos today that you want to look at a little bit closer, just raise your hand and say it because I’d rather have this be a discussion than me blasting all this nerdiness at you.
I was just talking about iPerf. I think it’s the coolest thing in the world.
Anthony Nocentino: Yes dude, it can totally like visualize exactly what’s occurring. It’s very…
Erik Darling: Server on one, and another thing on the other and just blast packets around you’re like, cool, it works.
Anthony Nocentino: Totally fun. So we’re going to use Wireshark. The network sniffers are going to give you a very, very high-level view of that. We’re going to look at an ARP request, right? Because if we don’t – had we had the inability to complete an ARP request, then we can’t transmit data, right? We’re going to look at traceroute and we’re going to look specifically at time to live. We’re going to see…
Erik Darling: So, real quick. There was a question, it said – you said something about 93% that someone missed.
Anthony Nocentino: Okay, so 93%, so protocol overhead of a TCP transmission, right? So if I have – if I’m writing data between two systems, if I hit 93% of my link’s capacity, so if I have like a 1Gbit link or a 10Gbit link and I’m hitting 930Mbits a second, or 9300Mbits per second for a 10Gbit link, that’s about as good as it’s going to get without any like performance tuning at the network level. So if you’re hitting like mid-90s or low-90s on your throughput, the measured throughput between client and sender, you’re as good as it gets.
Erik Darling: Which is pretty good. I think 93% is pretty decent.
Anthony Nocentino: Yes, and I mean, then throw in like virtualization overhead and resource contention and things like that, like if you’re in a VM, honestly, if I’m at a customer site and the VMs are in like the 600, 700 range, I’m like “You’re good” on 1Gbit links, right?
Erik Darling: I’m with you on that, man. VMs are weird.
Anthony Nocentino: Yes, but I mean, you know what you’re getting into when you have shared resources, so if…
Erik Darling: Some people do.
Anthony Nocentino: Okay.
Erik Darling: Like, what are those virtualized everything, I need a virtual box, got this and that, I’m good.
Anthony Nocentino: Yes, totally. Thank you, sir. We’ll look at traceroute, we’ll look at layer three just to visualize how things move through a network. Layer four we’ll look at netstat, right? So that we can look at that – I guess you’d say, tuple? The IP connection right, how we can identify a connection between two systems with the source IP, destination IP, source port and destination port. We’ll look at connections specifically blocked by firewall, so you can very quickly determine if that’s your issue. We’ll look at iPerf, we’re going to measure the impact of network latency. We’re literally going to inject some latency into a network today and we’re going to see a reduction in throughput, and similarly with reliability issues, we’re going to drop some packets today as well.
So on my laptop here I have two data centers. I make a really bad joke and I say data center one is under the left speaker and data center two is under the right speaker, right? And those two things are connected by a simulated WAN, and so I have a collection of VMs that make this happen. I have a virtual network that connects SQL A and B, so conceptually they’re on the same layer two segment, right? I have data center two where I have one SQL Server, SQL C, right? That is on its own layer two segment that’s connected by a router in between, it’s a Linux box that’s got a network interface on both networks, right? Because remember, routers connect networks, so I’m connecting DC1 and DC2 with this Linux machine.
On that Linux machine, I can inject latency, and I can inject packet loss, reliability issues, and I’ll show you guys how to do that. I have some Shell scripts that I wrote to wrap around because the commands are hard to type in demos, but if I don’t show you that code remind me to show you the code.
We’re also going to look at – I added this today, I added it earlier this week, but we’re going to look at today, the impact of latency and reliability issues on AG throughput. It’s going to be very obvious, and one of those things I want you guys to be able to see.
So I can see you guys now because I’m not in presentation mode anymore. So this is Wireshark. How are we doing from a font size standpoint? You guys good or you want it a little bigger?
Erik Darling: Good, thank you.
Anthony Nocentino: So this is Wireshark right, and so right off the bat we have to choose which interface we want to listen on, right, because we’re going to be pulling frames off of the wire and doing what’s called protocol analysis and it’s the job of Wireshark to decode that stuff for us. And so we are going to function on – we’re going to listen on Ethernet…
Daniel Hutmacher: Everyone says here, “Anthony, are you using VyOS for the router?”
Anthony Nocentino: Am I using what?
Daniel Hutmacher: VyOS for the router.
Anthony Nocentino: No, it’s actually just a machine that I configured to route the information between the two networks. You can see here we can go a little nerdier because that’s how we roll, and I just have a collection of network connections on here that will connect, so you can see this is Ethernet, effectively, Ethernet zero on my network, 167, whatever, network manager. So we have – that’s my outside connection, so that’s Brent’s local IP scheme guys, pay attention.
So inside that network I have 192.168.1.1, so that’s the network interface that’s on DC1, our data center one, and that’s the network interface that connects data center two, and I simply use this machine configured as a router to connect the two together. So that’s sysctl, I go into ip_forward, and literally, this is all you need to route packets between them. Since they’re directly connected, there are no complex routing schemes or reachability to other networks. This will just work almost out of the box to connect the machines together on a virtual network like this.
That got nerdy fast. So we’re going to sniff packets off Ethernet zero, and I’m going to put a filter on this because otherwise Dropbox, all those things will fly by and I want you guys to be able to zoom in on what we want to look at. So this is simply an OR’d filter that says, give me all of my ARP requests, give me all the traffic for this particular IP address, give me all the traffic that meets what’s called ICMP, or Internet Control Message Protocol. Things like ping and traceroute will pop up there.
So the first thing we’re going to do is we’re going to go ahead and fire this thing up, and I have to tell it my secret password because it’s switching into what’s called promiscuous mode, no jokes please, and go over to – is that a glass of wine?
Erik Darling: Yes.
Anthony Nocentino: It is almost five o clock in New York, right? So it’s good.
Erik Darling: Yes, we hit close enough. I’m under promiscuous mode too so.
Anthony Nocentino: I’m not judging, I am jealous. So the first thing we’re going to do is we’re going to look at our ARP table, right? So we have an ARP table, and we get that information when we type in arp -a. Remember, ARP’s function is to map layer three addresses into the layer two address or the MAC address, so what we’re going to get is IP addresses mapped to physical addresses. Super zany, but these are the network interface MAC addresses for the devices on the network. You can see there’s two different types, dynamic and static, and static are the ones that are – dynamic are the ones that are static – get it out, Anthony. Dynamic typed ARP entries are the ones that are built because of ARP requests, so we can see things like 192.168.1.1, which is the router on my network, right? 192.168.1.110 is a domain controller on my network, and 192.168.1.120 is the other – SQL B. It’s the other host on my local segment, and the way that I get information over to SQL C, which is on 1921681.1.2, is it’s got to go through the router, so those things will be sent to 1.1, so I don’t have an ARP entry for that. Flowing downstream and stuff.
So the next thing that we’re going to do is I’m going to copy and paste the command because I’m really bad at typing, I’m going to drop this in here, but I am going to tell you what it does if it works. So what this command does, it’s going to delete the ARP entry for 192.168.1.1, so you see arp -d, then double ampersands, so if that command succeeds, execute the next command, which is going to ping the same address immediately twice, and we’re going to do that, but before we do that we’re going to restart Wireshark because I don’t want to have to sift through all this data, do this business, and come back here and stop this.
So let’s go down and find the ARP request. And so the ARP request, right, this is internals guys, so it’s going to get nerdy. The ARP request comes from my Ethernet controller, right? Remember, I told you it broadcasts out to everything, so we see the broadcast there. Now, you see the change in the value, we get a little bit like VMware_16. That’s called the, I think, OUI. Each and every manufacturer has a specified six bytes that are theirs, right? So in this case, VMware_16, those are the bytes that belong to VMware, and so you see if we go down here to the actual framing data, it kind of resolves that name because it has that information inside of Wireshark, but really it’s 00, 0C, that very first part, and then the remainder of that is 0A94. The broadcast, right, so it broadcasted out, so that’s all F’s, so everyone will listen to the request and one person will reply, and that’s going to be a unicast transmission, so you can see there, it goes from one MAC specifically to another MAC, and it says, the MAC address for 192.168.1.1 is going to be 3594, right? It’s going to be the address that I’m going to receive, and now I’m actually going to transmit data to you, right, and I’m going to do that, in this case, with a ping packet.
So you see the request, it goes out to that particular IP, and then they reply right away. They wouldn’t be able to do that right, if it wasn’t able to take that data at layer three, write it for layer two, and transmit it locally on the segment. Okay, let’s look at trace route, now we’re going to move up the stack.
Okay, so traceroute. Let’s start capturing packets again. I stopped that only because it will just generate buckets of data. You can put capture filters on in Wireshark, but I just did not today. So, traceroute: you guys now know how traceroute works. I already told you how it works, actually. What happens is, the network determines the path, right? So how is it that traceroute can figure out the intermediate routers when that information is not being communicated back to the sender? What I did here is traceroute minus D. What minus D does is it doesn’t try to resolve the IP addresses of the intermediate routers to names, right? So it just gives me the IPs. This would take eons if I took that off, but sometimes resolving names yields valuable information because people encode information into their routers’ host names. So what’s happening here – I’m going to stop traceroute but I am going to go back to this output.
What’s happening here is traceroute is trying to ping 4.2.2.2, right? And what it does in that very first ping – I’m going to kill this because obviously, we’re not going to get a lot of information out of this traceroute today. What happens in traceroute is I try to ping 4.2.2.2 and I try to do it multiple times, and I measure how long those replies took to come back. That’s what this is: min, max and average. So super peppy when I’m on my local segment, right? Less than a millisecond.
The very first time I pinged 4.2.2.2, I had the TTL set to one, and that first router decremented the TTL to zero and replied back to me, right? Because normally in a transmission, the network determines the path. I’m exchanging the A to the U, I’m not going to know what router I passed through under normal – like at all, right? But in this case, I’m using the TTL to spoof the router to say, “Hey, I need you to send me some data back about you.” The only way it can do that is if it discards my packet and sends a message back to me that says TTL exceeded. When I look at that datagram that it sent back to me, I can pull out its source address and build the path across the network, right? The way that I get to the next router, I just set the TTL to two, which means I pass through that first router, I get to the second router, the second router discards the packet and messages back to me, I record the value and I keep going on.
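Editor's note: the TTL trick described above can be sketched without touching a real network. This is a simulation only, assuming a made-up path; the router at 10.0.0.1 is hypothetical and the function names are the editor's, not part of any tool.

```python
def send_probe(path, ttl):
    """Simulate one probe across `path` (router addresses, destination last).

    Every router decrements the TTL. A router that takes it to zero discards
    the packet and answers with a 'TTL exceeded' message from its own address.
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return ("ttl_exceeded", router)
    return ("echo_reply", path[-1])

def trace(path, max_hops=30):
    """Discover the path by sending probes with TTL 1, 2, 3, ..."""
    hops = []
    for ttl in range(1, max_hops + 1):
        kind, address = send_probe(path, ttl)
        hops.append(address)          # source address of whatever answered
        if kind == "echo_reply":      # the destination itself replied: done
            break
    return hops

# Hypothetical path: local router, one intermediate router, then the target.
print(trace(["192.168.1.1", "10.0.0.1", "4.2.2.2"]))
```

A router that refuses to send the TTL exceeded message would simply leave a gap at that position, which is what the stars in real traceroute output represent.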
What’s happening here is something in the network is blocking what’s called ICMP, so it’s blocking that reply message back, and unfortunately you’ll see this sometimes. You might actually see stars for a little bit and then replies might come back, because once the TTL has increased the probe passes through that router. So let’s go look at this on the inside.
So if we go down, there we go. So we see ping, right? It’s trying to ping 4.2.2.2 from my machine, 192.168.1.10. We were looking at Ethernet a second ago; if we look inside of IP, we see our TTL is one, right? So I hit that very first router, that router replies, and you can see the reply that came back, that TTL exceeded. It came from 1.1, I received that information, and now the traceroute program can build the path by incrementing the TTL through the network. So now there’s an anomaly. You can go to your network folks and be like, “Dude, I can’t even reach the host. What happened?” Right? But you can also use traceroutes to build that path. If it was this way one day and it was that way another day, something changed, and you can use that information to have a meaningful conversation.
Erik Darling: One of my favorite patch night things to do was when I restarted the machine I would do ping minus T and just ping it until I got a response, then I would – like my heart would finally resume beating.
Anthony Nocentino: Nowadays, like if you were doing like a physical server we’re so spoilt, right? With VMs it’s like, two packets dropped. And like I got a build like a real server, I’m like…
Erik Darling: You know, it’s really funny, there’s a guy who tells a story about how when he first started working with databases, he would go in in the morning, 45 minutes before work started and turn on the machine because that’s how long it would take to boot up. And now 30 some odd years later, whenever he needs to reboot a machine, he goes in 45 minutes earlier to do it because that’s how long it would take to startup and count all the RAM and go through everything, so it’s like you came full circle with how long it takes to get up what’s considered a big machine up.
Anthony Nocentino: Totally. I took SQLskills Immersion Event 3, HADR, and you know, that’s one of the things they talk about. You need to take that into account: how long does it take for that massive SQL Server to go through all the memory it needs to go through before it comes online, right?
Erik Darling: And then you know, even just to do like a startup on the database and everything…
Anthony Nocentino: Right, there’s that part. So route print – I’m going to pipe that into more so it doesn’t shoot off the screen really quickly. Every machine on your network is a router, except the switch. Like, anything layer three and above. This is the routing information for my system here, right? I have a couple of NICs, most of which are bogus, but I have one real network adapter right here, this Intel Gbit network interface controller from VMware, and that is, you know, 0A94 for its MAC address. So as we go through this information here, this is stuff that should totally make sense to you guys now, right?
From a routing table standpoint, the very first entry that we have at the top here is what’s called the default route. Zeros represent unknown, right? So I have an unknown network destination. If I don’t know where this thing needs to go, I’m going to send it to the default gateway, or what do they call – yes, default gateway. That’s 192.168.1.1 for this particular interface that I’m on, which is 1.10. That’s the whole point of routing: where does it need to go, what’s the next hop, which interface does it need to go out of?
We see 192.168.1.0 right there, that’s my local segment, right, so if I’m transmitting this stuff on my local segment, it’s on-link, I’m just going to send it down to layer two and transmit it locally. 1.10, I think that’s me, and then this one here is the IP broadcast, 1.255, so I can broadcast on IP. These 224 things are multicast. I never really worked with multicast, so I’m kind of a network ninja, but maybe not. Don’t judge.
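Editor's note: the routing decision walked through above (match the destination against the table, prefer the most specific route, fall back to the default route) can be sketched with Python's standard `ipaddress` module. The table below is a simplified, assumed version of what `route print` might show on the demo machine, not a copy of the real output.

```python
import ipaddress

# (destination network, gateway, interface). Gateway None means "on-link".
# Simplified, assumed entries for a host at 192.168.1.10.
ROUTES = [
    ("0.0.0.0/0",        "192.168.1.1", "192.168.1.10"),  # default route
    ("192.168.1.0/24",   None,          "192.168.1.10"),  # local segment
    ("192.168.1.255/32", None,          "192.168.1.10"),  # IP broadcast
    ("224.0.0.0/4",      None,          "192.168.1.10"),  # multicast
]

def next_hop(destination, routes=ROUTES):
    """Pick the matching route with the longest prefix (most specific wins)."""
    address = ipaddress.ip_address(destination)
    best = None
    for network_text, gateway, interface in routes:
        network = ipaddress.ip_network(network_text)
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, gateway, interface)
    if best is None:
        raise LookupError(f"no route to {destination}")
    _, gateway, interface = best
    return (gateway or "on-link", interface)

print(next_hop("192.168.1.120"))  # same segment: hand it to layer two directly
print(next_hop("192.168.2.110"))  # other subnet: send it to the default gateway
```

The default route matches every destination, which is why it only wins when nothing more specific does.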
The network route that comes from our default gateway that we configure in the user interface, or PowerShell, however, you do these things, actually adds this kind of persistent route so you can see how that’s defined there.
So let’s move up the stack into layer four now. The command that we’re going to use most often at this layer is netstat, and what that’s going to do is it’s going to give us information, right, those four pieces of information, right? The source port, destination port, source address, destination address, to uniquely identify a connection and its state. So if we’re looking at this information here, this stuff – anybody take a wild guess what is running on 5022? Somebody say it. It’s an AG, database mirroring endpoint, right? So I have a database mirroring endpoint on this…
Erik Darling: [crosstalk]
Anthony Nocentino: Thank you. So we have TCP 5022 listening on this one, and it’s connected to SQL B, right, with an established connection. It’s connected from SQL A to SQL B and it’s exchanging data. But I told you also that it’s bi-directional, right? So if we go further down, we should have a connection from B to A. See if I can find the other side of that connection. I can’t find it. Hopefully – I did not check my AGs when I turned on my laptop. But we have established connections between our systems. You can see I have that SSH session open to my router here, and we can see it’s uniquely identified right there, 192.168.1.10 to 1.1, protocol request switch.
If we do netstat minus AN, the A is going to list all connections regardless of state, and the N is going to stop it trying to resolve hostnames from IPs, because that could just take a long time. So here we see a lot more information. We have all the ports that we’re listening on, so we see things like 3389 for RDP, 1433 for SQL Server, and 5022 and 5023, which are two AG database mirroring endpoints. Those are up and listening, waiting for connections, right? So those are going to be in that listening TCP state. If we go down, we can see the rest of the information, all of our established connections and whatnot. Down here, we have some TIME_WAITs, which is another TCP state where the sender has initiated a close but is going to hold the port open for a little bit in the event that they need to open that connection again. So it’s kind of an optimization to make sure that if someone needs to reconnect, they still can.
One of the things that we need to do often is to be able to affiliate a port to a process, and I totally just flew past that. Netstat minus BANO – BANO, for some reason that just sticks in my head and it’s not going away – lets me figure out what ports are bound to which processes, right? Super important stuff if we need to figure out whether an application is listening on a particular port that I define. If the application’s not listening, certainly whatever is trying to connect to it won’t be able to. So here you can see, 5022 belongs to SQL Server and its process ID and it’s listening, so everything’s happy, up and running. We also have TCP 1433 for SQL Server right there, that’s for just regular old database data and things like that.
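Editor's note: netstat answers "is anything listening on this port, and which process owns it?" from the server itself. A complementary check from the client side is simply to attempt the TCP handshake. A minimal sketch using Python's standard `socket` module; the function name is the editor's, and the host name in the usage note is hypothetical.

```python
import socket

def is_listening(host, port, timeout=1.0):
    """Return True if a TCP handshake to host:port completes within `timeout` seconds.

    This only proves that something accepted the connection. Unlike
    netstat -bano run on the server, it cannot tell you which process owns the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error number otherwise.
        return probe.connect_ex((host, port)) == 0
```

For example, `is_listening("sqla", 1433)` would test the default SQL Server port on a host named `sqla`. A False result alone does not say whether the port is closed, a firewall is dropping the traffic, or the host is unreachable; that is where the rest of the tooling in this session comes in.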
So now we’re going to do some iPerf stuff. So I have like 99 windows open because I have all of those machines, but I’m using PowerShell remoting, rather than having to jump between VMs and RDP sessions and things like that. If you guys don’t know what PowerShell remoting is, it’s awesome. So this is on SQL A. So I want you guys to orient yourselves, because we have a lot of windows open here. This is going to be a PowerShell remote session into SQL B, but I’m just on the same window, so I’m not jumping around. Essentially I have a remote terminal into the system, if you’re not familiar with PowerShell remoting. SQL C there and another SQL C right here, and we’re going to walk through all of that stuff together.
So the first thing that we’re going to do is we’re going to launch iPerf minus S on SQL A. We’re going to jump over to SQL B and we’re going to do iPerf minus C – don’t worry about the syntax, I’ll figure out a way to get this code to you, just focus on the concepts right now. So I’m on B, right, and I tell iPerf, which is iPerf three, that it’s going to connect to this hostname. And what it’s going to do is send a bunch of data across TCP to SQL A, right. So it’s hopefully doing that right now, and we’ll get some performance numbers from our system. So you can see right there, it’s humming along at about 1.67Gbits per second. Lowercase B, huge difference. That’s pretty good. I’m moving a lot of data between those two systems that are locally connected, right, in that same LAN in my virtual data center, underneath my left speaker.
And now, let’s go ahead and do that same exact test, but we’re going to do it from SQL C this time. We’re going to go over to SQL C and do iPerf3 minus C SQL A. The only thing that’s different right now is that the Linux router, the thing that’s connecting the two networks, has to process this data. It’s going to take the data from SQL C, right, 192.168.2.110, and send it to SQL A, 192.168.1.110. Let’s go see what our performance numbers are for that connection. And we lost, I’d say, a good 40% of the throughput. We’re flirting with about a Gbit a second right there, 980Mbits across the entire transfer. So that’s about a 40% drop in throughput, just by injecting the routing process. And it’s not good or bad, right. In this case, it is what it is because that’s how fast the machine I’ve provisioned can move that data.
If you’re in your data center, and this is, you know you perform this task and you see throughput that’s just not right for what you have provisioned, then you need to pay attention. It’s not that this is a good number, this is a bad number, but it impacts your throughput because that device has to do that work.
So now, let’s go back onto the router. Control L clears your console, so all those SQL on Linux people, get used to control L; I love it. So I have a couple of shell scripts on here. We’re going to do add latency first. Let’s look at the shell script so you guys can see what I’m actually doing here. Inside of here I have a program called tc, or Traffic Control. What it does is inject latency, 40 milliseconds of latency, on this particular interface, eno50332208, which is one of the physical adapters on my network. And so it will queue the data, hold it and then dispatch it after 40 milliseconds. So I’m injecting delay, that’s what’s going on here. I just don’t want to have to type this for you guys, so I wrapped it up into a shell script.
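Editor's note: the shell scripts themselves are not reproduced in the transcript. A common way to do this with tc is the `netem` queueing discipline, so the sketch below assumes that is what the scripts wrap. It only builds the command lines and does not run anything; the helper name is hypothetical.

```python
def netem_argv(interface, setting=None):
    """Build a `tc` command line as an argv list. Nothing is executed here.

    setting is a (kind, value) pair such as ("delay", "40ms") or ("loss", "25%").
    Passing None builds the command that removes the root qdisc again.
    Assumption: the demo scripts use netem; the transcript does not show them.
    """
    if setting is None:
        return ["tc", "qdisc", "del", "dev", interface, "root"]
    kind, value = setting
    if kind not in ("delay", "loss"):
        raise ValueError(f"unsupported netem setting: {kind}")
    return ["tc", "qdisc", "add", "dev", interface, "root", "netem", kind, value]

IFACE = "eno50332208"  # interface name as mentioned in the session
print(" ".join(netem_argv(IFACE, ("delay", "40ms"))))  # add 40 ms of latency
print(" ".join(netem_argv(IFACE)))                     # remove it again
```

On a real router these commands need root privileges, and only one root qdisc can be attached to an interface at a time, which is why the delay is removed before a different impairment is added.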
So let’s perform that same exact test from SQL C back to SQL A. We were at 983Mbits a second with that data transfer. Alright, what’s the over or under? It’s a gamble now, put some money down on what they think this value’s going to be on this transmission. Put it up in the chat window.
Erik Darling: Everybody’s way more interested in when you’re going to drink the tequila.
Anthony Nocentino: I told you, during the AG demos. 36Mbits per second, right. So I injected 40 milliseconds of delay. So think about it. If your data center was 40 milliseconds away and it had a 1Gbit connection – in this case it’s 1Gbit – that 36Mbits is the maximum throughput I’d ever be able to achieve between my two data centers, without doing something special. [crosstalk] Well no, not even that; it’s asynchronous. Yes, synchronous would cause – that would be log throughput delay, and I’d have to pay the penalty in HADR [crosstalk]. But if I’d loaded the table or re-indexed the table, anything like that that’s going to generate a bunch of transaction log, I’d tap out at 36Mbits per second. So I’d have a queue on the sender and I’d have to deal with that. And we’re going to do a demo of that in a few minutes.
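Editor's note: one way to reason about why added delay caps throughput is the window-over-round-trip-time relationship: a single TCP connection can have at most one window of unacknowledged data in flight per round trip. The arithmetic below is a back-of-the-envelope sketch, assuming the 40 milliseconds acts as the round-trip time and that the transfer is window-limited; neither is confirmed in the session.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound for one TCP connection: one window per round trip, in bits/s."""
    return window_bytes * 8 / rtt_seconds

def bandwidth_delay_product_bytes(link_bps, rtt_seconds):
    """Bytes that must be in flight to keep a link of this speed full."""
    return link_bps * rtt_seconds / 8

RTT = 0.040  # assumed round-trip time in seconds

# To fill a 1 Gbit/s link at 40 ms you would need about 5 MB in flight.
print(bandwidth_delay_product_bytes(1_000_000_000, RTT))

# Working backwards: 36 Mbit/s at 40 ms corresponds to roughly 180 KB in flight.
print(36_000_000 * RTT / 8)
```

Under these assumptions, the "something special" needed to go faster is anything that puts more data in flight at once, such as larger windows or multiple parallel streams.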
So this is real stuff, guys. This isn’t like clouds and arrows, this actually impacts what we do. So I’m going to go ahead and remove my delay with that shell script, and now I’m going to add instability. So I have a shell script in here, and this shell script simply – same program, tc – what it will do is drop 25% of our packets, right. Ping minus T, drop a packet, you know, you lose that data. That’s not necessarily going to impact ICMP, or ping, because that’s not TCP data, it’s not going to affect the throughput of that thing that I’m sending. But on a transmission that we generate with something like iPerf, which functions over TCP, we’re going to see a significant impact here. So right off the bat – you’ll maybe not notice it because you didn’t do this demo 100 times, like I did – there’s a significant delay in how fast this even builds the connection. It’s going to take a long time for me to even get the data back, whereas before I was able to build the connection quickly and transmit the data. Because what’s happening is every fourth packet is getting tossed, so it has to retransmit and then wait and get that information. So hopefully this will come soon, or we can cheat and go over – see, we’re not even getting transmissions showing up from a throughput standpoint on the server side. The server will either write its I/O during the transfer, and I believe the client will write its I/O – oh, we’ve got a connection timeout. I bet you I’ve left my firewall up, give me a second.
Erik Darling: It’s like the first thing I always run into when I set up like VM farms to do anything. Like, I’ve got to turn this entire firewall off or it’s going to mess me up for a week…
Anthony Nocentino: Here, I’ll restack the deck from a demo standpoint – I’ll show you guys how to troubleshoot this. So let’s go and start that connection, then we’ll open up that other window. This is the other window I had – they probably gave up – oh no, it actually went through this time. Okay, my firewall wasn’t up. But look at the throughput number, so obviously it failed to even build a connection before. So this time, the second time, it went through.
We’re talking about a connection which, in the original test that we did when we had a clean line, moved 1Gbit a second. We added 40 milliseconds of delay: 36Mbits a second. We took the delay out and just injected instability by dropping packets, and we’re in the single digits on Mbits. It even floated in KB. So if you see dropped packets, this severely impacts your throughput. So if you have an unstable WAN link to a data center, this is certainly going to impact your recovery, right, if you have a database that uses something like database mirroring, log shipping and all that fancy stuff, because it’s all transmitted over the network. What can happen, especially in this case, let’s say there’s network maintenance going on or there’s a wonky WAN link that just can’t figure it out, and you expect to have 30Mbits a second of throughput and you’re getting 798K: your log’s going to get bigger and your queues are going to get bigger, you might fill up a disk, your database is going to shut down and then you are no longer employed; things like that.
So let’s go ahead and remove delay. I have my handy cheat sheet over here that you all can’t see. And let’s do some AG testing now. So I have a script here, because I don’t trust anything that comes out of the HADR replica states when it comes to throughput. So I wrote a script that reads the data from the perfmon counters, this is probably the standard stuff that you guys are used to seeing. I’m reading the data out of the perfmon counters so that I can pull information that looks like this.
I have to do one thing real fast, avert your eyes. See what happened there? I want A to be the primary for this demo and so I’m going to flip that over real fast. So the AG configuration, I’ll walk you guys through this while trying to dig out of this hole here. It’s synchronous replicas in DC1, the left speaker, and asynchronous to DC2, over on the right speaker, which is what we have that WAN link connected to. So we’re going to go and see how this really impacts our system now.
So we failed over… So let me rerun this. so what’s happening here is I’m using SQL CMD to get the data from all the replicas in my AG. So I’m just taking the same query, executing it in all the replicas so I can pull back that information. The primary is going to have a different view of the world, because it knows about all of the other replicas that are participating in the AG party, and then the secondaries are going to track their own information.
So let’s go ahead and add a workload. We’re going to add a subtle workload and we’re going to go ahead and look at some data. And the data that we have is things like the amount of log that’s being generated, so about 3.5MB per second. The reason why I don’t trust the DMVs at all is because I’m certainly not moving 2,100,000KB per second when I’m generating 3,400KB per second of log throughput. Perfmon on my local replicas isn’t going to report anything, so if I go down to the remote replicas, you can see I’m transmitting about 3,411KB per second in perfmon. That certainly correlates a lot more tightly with the log throughput that’s generated on the primary.
Same thing on that SQL C, that last row of data there, that log send rate in KB from perfmon. I have no latency either. I take the queue size and divide it by the send rate, and I have basically a concept of latency: how far behind is my AG replica? There’s all the other cool data that we have in our AGs, and I also track redo latency in this script. So if you guys want the script, it’s actually on my website.
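Editor's note: the latency estimate described above is queue size divided by send rate. A minimal sketch of that arithmetic; the function name and the sample numbers are illustrative, and this is not the speaker's script.

```python
def send_latency_seconds(log_send_queue_kb, log_send_rate_kb_per_s):
    """Estimate how far behind a replica is: queued log divided by send rate.

    With nothing queued the replica is caught up (0 seconds). With log queued
    but nothing being sent, the estimate is unbounded.
    """
    if log_send_queue_kb <= 0:
        return 0.0
    if log_send_rate_kb_per_s <= 0:
        return float("inf")
    return log_send_queue_kb / log_send_rate_kb_per_s

# Illustrative numbers: 34,110 KB queued while sending 3,411 KB/s is 10 seconds behind.
print(send_latency_seconds(34_110, 3_411))
```

The same division works for redo latency if you substitute the redo queue size and redo rate.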
So now, let’s do this. Let’s go ahead and add instability. So we’re going to drop every fourth packet, and I want to light it up with a lot more users. So we’re going to be inserting a bunch of data. What’s actually going on under the hood is I’m inserting a bunch of data into a GUID clustered index, so it’s just shredding my transaction log with page splits and generating a bunch of superfluous junk to replicate over my WAN link.
Watch this, SQL CMD, right, it’s going to take longer to even hit that other server. And just so we can see, if I do SQL minus C, we should see it drop some packets. There we go, it’s going to drop that packet. So we’re just suffering here from a network standpoint, right. So let’s go ahead and look at some of the data. So that process has been running for 30 or 40 seconds; as long as I’ve been babbling trying to kill some time to get some meaningful data to show you guys. You can see, it struggles to connect to C, right, because SQL CMD is going to connect over the network, so it’s going to take a while to even get that data.
Now, with the network suffering my log send queue is now 0.5GB, in the amount of time that we’ve just been blabbing about SQL Server stuff. And so, this is – this can be a big deal if you’re generating this kind of load and you’re expecting that reliability between your sites. So it can put you in a pretty nasty situation. So let’s go ahead and – so what’s happening under the hood is that data’s got to get there, right. So it’s got to retransmit it, our congestion window shrinks, throughput suffers and we just get back to sending, essentially, one segment at a time. That’s what that would regress to if we looked at it. You could actually use Wireshark if you really want to get into it. If you go down here, TCP stream graphs, you can go through here and actually calculate all that if we strip off all this traffic, if we can get some cool graphs. We can look at window scaling, so when TCP decided to make adjustments to the window size, that information would be captured there so we can see that.
So if you are suffering a network anomaly, you do have the tooling to be able to discover that very quickly using Wireshark. It’s a fantastic tool. I highly suggest everyone using it in their networks. So let’s get back to the hand-wavy stuff like this.
Daniel Hutmacher: A couple of users previously mentioned in the Slack channel that you should probably ask your client or employer for permission before using Wireshark. It could land you in trouble if they don’t like those types of applications.
Anthony Nocentino: Okay, I guess like – Brent, do you mind if I use Wireshark at your house?
Erik Darling: Freewheeling consultants need not apply for permission.
Anthony Nocentino: Thanks, Brent…
Brent Ozar: Just so everyone’s aware, probably the reason I said yes is because I’ve started my tequila already, whereas he’s still drinking his water, so go ahead and carry on.
Anthony Nocentino: You guys see the regular deck now, right, we’re good? Alright, the big reveal, yay… So what did I tell you about fundamental concepts, hours ago when we started this thing? I want you guys to understand how this stuff moves around. Because if you understand how it moves around, you’ll be able to put the pieces together about where the anomaly is, right. I missed a demo – I have to do it, sorry guys. Because of the firewall thing I forgot to show you that.
So let’s go back on SQL A, I’m going to kill this. I have some PowerShell; I’m looking at you, Erik. All the PowerShell does is turn on the firewall. I’m going to start up iPerf again and I’m going to go back to SQL C. I’m going to try to make that connection. I’m going to grab that second terminal I had to SQL C. Really, SQL C? Then I’m going to go do netstat. And the status of that connection between 1.130 and 1.110 is SYN_SENT. So remember, if you’re here, you go to the network people and say something below layer four is not functioning properly; help me figure this out please. If you come to them and say, hey, my TCP status is sitting in SYN_SENT, they’re probably going to listen to you a little more seriously than if you’re like, “Dude, I can’t reach it.”
And so understanding how that handshake occurs puts you in a position where you can understand that. So now let’s get back to this. If you understand those fundamental concepts, we can build our own troubleshooting methodology. We’re not looking at a single command to help us solve our problem, but we can figure out where the anomaly is with the tooling and things that I showed you guys today. We know that that was a firewall issue, but at least we could say, “Network guys, I’m in SYN_SENT, just help me figure out where we’re falling down underneath this. It could be layer three; I might not be able to reach it from a routing standpoint. It could be layer two; I might not be able to ARP to that system.” So you just kind of go down the stack, right. SYN_SENT, we knew that was a firewall issue. Then the next thing I would do is ping it. Maybe if I got a TTL exceeded then I know there’s a routing anomaly. If I’m not even getting a reply from ping then I check ARP. This is literally how I troubleshoot connectivity issues. I go down to ARP and I say ARP minus A. Do I have a MAC address for the system that I’m trying to communicate with, whether it’s the local router or the actual machine that I’m trying to exchange frames with on my local segment? That’s how we figure these things out, just walking down the stack when I can’t connect.
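Editor's note: the walk down the stack described above can be written as a small decision function. This is a sketch of the reasoning only, not a diagnostic tool; the argument values and the wording of the results are the editor's, and real cases will be messier.

```python
def diagnose(tcp_state, ping_result, arp_entry_present):
    """Walk down the stack: layer 4 (TCP state), layer 3 (ping), layer 2 (ARP).

    tcp_state:         e.g. "ESTABLISHED" or "SYN_SENT", as shown by netstat
    ping_result:       "reply", "ttl_exceeded", or "no_reply"
    arp_entry_present: True if `arp -a` lists a MAC for the next hop
                       (the target itself, or the local router)
    """
    if tcp_state == "ESTABLISHED":
        return "connected: look at throughput, latency and drops instead"
    # Handshake never completed, so start checking the layers underneath.
    if ping_result == "reply":
        return "layer 3 is fine: suspect a firewall or nothing listening on the port"
    if ping_result == "ttl_exceeded":
        return "layer 3: routing anomaly"
    if not arp_entry_present:
        return "layer 2: no MAC address for the next hop on the local segment"
    return "next hop resolves locally: ask the network team what is dropping traffic beyond it"

print(diagnose("SYN_SENT", "reply", True))
```

The value of writing it down this way is that each answer names a layer, which is the language the network team works in.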
Maybe even up the stack a little bit. So you saw SYN_SENT, right? If I’m trying to connect to a port that doesn’t exist on a remote host, same condition, so then we use netstat to figure out, is the application even listening on that port? So it’s taking those concepts and stitching it all together, right, now that we know how this all works. So hopefully, with the demos today, you guys are going to help yourselves isolate these bottlenecks or anomalies. You can use iPerf – and to the person that talked about Wireshark: talk to your network people before you run iPerf too. Because you don’t want to be blasting data across the network and making them sad, plus you also want to get good data and make sure that your test is a controlled test, so you’re not competing with something else that’s consuming the bandwidth you’re trying to test.
So if we are suffering from network performance issues or networking issues, your performance and your reliability will suffer. We looked at performance with iPerf, we looked at reliability with our AGs; this will impact you guys. And so luckily we have some stuff to help us out inside of SQL Server. So application performance, we know that can suffer. Anytime I have to exchange data with my SQL Server, it’s going to happen over the network, so it’s going to impact how my application performs, right. And that’s what ASYNC_NETWORK_IO yields. What ASYNC_NETWORK_IO yields, as far as I understand, hopefully I’m correct here, is when SQL Server wants to write data into the network socket and it’s got to take time for the other side to reply back. It’s that time it took from the write to the acknowledgment of that write coming across the network. Then there’s the receiving of that traffic if you’re using network-based disks.
Erik Darling: Quick question, Mark V wants to know, “Can you go over how you did the baseline check one more time? There was a third parameter you want to track, so there was throughput, latency and then…”
Anthony Nocentino: Reliability. So if you’re dropping packets. Yeah, so that’s what the add instability stuff, where I was dropping 25 – you can see those things are all tightly correlated. It’s literally like a triangle.
So if you’re using network-based disks, like iSCSI or NFS, basically if you’re using VMware, you’re using one of these two things, right, if you’re not using Fibre Channel – and this actually applies to Fibre Channel as well. But if you’re experiencing latency in a network, that can percolate up into SQL Server as disk latency issues. And you can have things like slow systems, locking, blocking, database errors, 15-second I/Os, you know, and all those things that make us sad DBAs. So again, if you’re seeing lots of shaky disk I/O and it’s not because of you, it might be because of the network. So go talk to the network people.
Availability problems – we went through that in pretty good detail in the demonstrations. So AGs allow us to track latency, and we have wait types like HADR_SYNC_COMMIT. That measures the replication latency between two replicas that are configured in synchronous availability mode, right. It’s the measure of the time it takes to write that stuff on the primary, write it on the secondary and reply back. So you have to pay the penalty of the disk I/O locally and also the network I/O, right – and potentially the disk I/O on that remote system as well. But that network stuff in between is all accounted for in that wait type. HADR_SYNCHRONIZING_THROTTLE is a wait where the secondary synchronous replica maybe goes offline for a period of time, comes back online and is catching up, right. The synchronizing throttle’s going to measure how long it takes to catch up. If you’re seeing that, maybe your AGs are falling offline because of a connectivity issue and they’re having to catch back up. So you want to keep an eye on that.
HADR_TRANSPORT_FLOW_CONTROL – that’s where a secondary is messaging back to the primary and saying, I’m having a resource constraint, slow down. And those concepts all apply to mirroring and log shipping as well. Not necessarily the wait types, but the impact of throughput and latency. Backups can suffer as well. Hopefully, you’re writing backups to remote systems, not to the same system, and that’s probably going to happen over the network. If you have backups that you’re trying to write and you’re experiencing network performance anomalies, certainly they can impact you there. And most importantly, we take backups because we need to do restores. If I need to be able to do a restore in a controlled amount of time across the network, I need to know that stuff.
So what do you need to do? Capacity planning, that’s the biggest thing for me. Know what you’re paying for. If you have a network that is expected to perform at a certain level because you bought that stuff and you knew the specifications for your equipment, make sure that it does, right. Baseline your environment, so that you know what your normal workload looks like. And also benchmark it so that you know that you’re getting what you pay for, so you can be sure that your applications actually function the way you want them to function. Also make sure that your latency-sensitive or high-performance applications are physically close to each other and on properly sized interconnects. Think about where you’re placing applications and database servers so they’re all where they need to be from a resource consumption standpoint.
So key takeaways guys, know these fundamentals. I don’t say review it, but know these fundamentals. And knowing this stuff is going to make you a more general problem solver, so you can do things like isolate performance issues and reliability issues using the things and the techniques that we showed today. All the demos, hopefully, I’ll be able to make available to you guys somehow, I don’t know how we exchange that stuff but I’ll get it to you probably on Twitter and stuff. And SQL is going to suffer if your network performance is suffering and reliability is questionable. 99 times out of 100, your applications aren’t going to be on your SQL Server – or if I built it for you then it’s not going to be on your SQL Server. I’d put it somewhere else because I want your SQL Server to be the SQL Server, right. So we have to go to the network to be able to exchange data.
So here’s a link to the Pluralsight course that I did on this. We’re going to go into this in much more detail, talking about TCP internals, IP internals, routing, making routing decisions. It’s all in Wireshark. I strongly encourage you to watch that if you want to get into it. If you don’t have a Pluralsight subscription, shoot me a note. I can give you a 30-day card. Not because I’m trying to sell it to you, but I want you to be able to get access to this information. You can get it for free, I can give you a trial card for 30 days.
I wrote a blog post that kind of dives into more detail about that AG demo that I did. So if you guys are interested in learning about that, that’s there, or here at my blog. And there’s my contact info. Follow me on Twitter, please, so we can be like nerd friends on the interwebs. And are there any questions?
Erik Darling: Yes, why haven’t you drank yet? If you dump it in a plant, I’m going to know.
Brent Ozar: How did you get started using Wireshark and why? Or like dealing with any kind of network traffic, what got you into it?
Anthony Nocentino: So I – when I first got out of college I got hired to be a system admin at a hospital in the Washington suburbs here, right. And they were like, “Hey, we want to make our internal EMR that we developed publicly available.” So we bought a colo, and they were like, “Anthony, put a network in.” So I was like, okay. And so I did the whole Cisco learning academy CCNA, CCNP, CCIE, all that stuff. I took the CCIE, I failed. BGP – you know what – I don’t have that certification, but that was the path that I was going down for a long time. It was real hardcore networking bits and bytes, under the hood stuff. So since then I just basically worked my way up the stack. So I’m at, I guess, layer five now – I guess I’ll work until I’m layer eight.
Erik Darling: Funny – so as a DBA, if you’re picking out hardware, usually the line item you don’t look at is a network card. Usually, you’re like CPU, RAM, what kind of disks, whatever. What are the most important factors to consider when picking out a network card for a SQL Server?
Anthony Nocentino: So generally speaking, the stuff that’s going to make it into server-class hardware is okay; it’s okay enough. If you find that you need more advanced capabilities, for instance, one of our common advanced capabilities is iSCSI offload, right. The network adapter itself will maintain that information for you. What that means is your operating system doesn’t have to maintain that connection state, and the adapter presents blocks up to your devices in your OS, and that’s a big one.
Erik Darling: So say, you know, you’re going for the AG type setup and you know that your network path from one to the other is decent, you know, there’s a pretty low latency. Any old network card going in there should support your data throughput; or what would you look at?
Anthony Nocentino: I think the major decision point there would be, do I dedicate a separate interface before I really start sifting through which type of controller. Because oftentimes it’s not up to us, right. We have to go with what the server team or the network team’s going to tell us to do. But have a conversation with them about what those capabilities are. So it’s not necessarily that I would say you need to use this network card, but what I would say is what are the networking card’s standards and review those specifications to make sure that you can operate with what is being provided.
Erik Darling: Sure, just because there are such specific recommendations about core count and speed and RAM amount and speed – I just thought maybe there was some similar approach to network cards as well.
Anthony Nocentino: You can research it.
Brent Ozar: Mark is really keenly troubleshooting problems that he’s having in networking and storage in the cloud, and he’s kind of looking for – he goes, “Go over your baseline one more time.” But what I think he’s really asking is, if you’re watching an existing server, how do you gather metrics and know that it’s okay or not? Without being invasive, starting Wireshark, whatever?
Anthony Nocentino: So it’s – for me, I use tools that would capture performance over time at a higher level, right. And when I smell something funny at a particular level, then you start using more invasive techniques. So if you were using like an APM, like New Relic or those kinds of things, you’re going to see things over time, you’re going to see a deviation, up or down. Similarly, if you’re using one of the SQL-specific tools, again, if you see a deviation or a spike or a shift in wait stats distribution, then you would go to the next level.
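As a rough illustration of "see a deviation, up or down", the sketch below compares a new measurement against a stored baseline using a simple standard-deviation threshold. The function name, the sample latency values, and the threshold of three standard deviations are assumptions chosen for the example; monitoring tools use their own methods.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, current, threshold=3.0):
    """Return True when current is more than `threshold` sample standard
    deviations away from the mean of the baseline measurements."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change counts as a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical round-trip latencies in milliseconds during normal load.
    baseline_ms = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
    for sample in (1.1, 5.0):
        flagged = deviates_from_baseline(baseline_ms, sample)
        print(f"{sample} ms -> investigate further: {flagged}")
```

Only when a check like this fires would you move down to a more invasive tool such as a packet capture.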
You’re not going to jump to Wireshark because the [inaudible] ratio is extremely high. Is that the right analogy? High or low? Anyway, you get the point. So you’re not going to jump to Wireshark right away, but for today’s demonstration, to kind of like show you guys, sift around that information; that’s why I used Wireshark.
Erik Darling: So what if there’s like no deviation. What if things were never good?
Anthony Nocentino: Things were never good?
Erik Darling: If you were unhappy with it from day one, what would you look at as ways to troubleshoot or improve it?
Anthony Nocentino: So I would honestly, at that point, I would review the utilization, I would review the specifications of the licenses involved. Like literally, there’s a funny story – I was involved in an issue, literally at the C-level of an organization for a major hospital system, and they just couldn’t figure out why their database server was performing poorly. So I get involved and I literally walk through the data center, like okay. And I looked at the cabling layout of their UCS deployment, and they were aggregating all this stuff into a top-of-rack, I think they were called Fabric Interconnects. I barely know what UCS is and does, but I know enough about how it’s stitched together.
So what I did is I went, I downloaded the manual, all 500 pages of it, and read the damn thing, so I could figure out what the specifications were, what are the expectations of the interconnects, right; what are the inputs and outputs. And went to the CEO and I’m like, “You’re over provisioned on the switch; you either get more switches or you get faster ports.” It’s as simple as that. If you’re in a condition where things are consistently bad, then there is some fundamental decision or some decision that’s not correct or your requirements have shifted, or you’ve got bad data. There’s some fundamental assumption that’s not correct about your deployment.
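The "inputs and outputs" comparison in that story can be sketched as a ratio of the bandwidth servers can push into a switch versus the bandwidth leaving it. The port counts and speeds below are made-up numbers for illustration, not the actual deployment described, and the function name is hypothetical.

```python
def oversubscription_ratio(server_port_gbps, uplink_port_gbps):
    """Ratio of total server-facing bandwidth to total uplink bandwidth.

    A value above 1.0 means the servers can collectively offer more
    traffic than the uplinks can carry.
    """
    total_down = sum(server_port_gbps)
    total_up = sum(uplink_port_gbps)
    if total_up <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return total_down / total_up


if __name__ == "__main__":
    servers = [10] * 16  # hypothetical: 16 servers at 10 Gbps each
    # Two candidate uplink layouts, mirroring the "faster ports" option.
    for label, uplinks in (("2 x 10 Gbps", [10, 10]), ("4 x 40 Gbps", [40] * 4)):
        ratio = oversubscription_ratio(servers, uplinks)
        print(f"uplinks {label}: {ratio:.1f}:1")
```

A high ratio is not automatically a problem, since servers rarely all transmit at line rate at once, but it is the kind of specification worth checking when performance has been poor from day one.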
Erik Darling: It’s just so tough in the cloud because networking is at such a premium. And often going up to some gigantic instance size, it’s hard to get enough throughput sometimes.
Brent Ozar: And it’s shared, and you can’t see what’s going on in the other VMs.
Anthony Nocentino: Yeah, and that was kind of the concept behind the MPLS commentary in the beginning – the physical infrastructure is abstracted out for you, and you’re sharing the physical infrastructure under you. You don’t have that visibility anymore [crosstalk]. When you say cloud, though, you assume that risk. You understand that assumption that you’re going to have variances. Like the commentary that we had about using VMs, right…
Brent Ozar: Which I also think is where going more invasive really pays off, you know, starting to build in some kind of tests and, you know, before you run SQL Server on it, go see how bad the VM really is.
Anthony Nocentino: Yeah, I mean baseline. Know what you’re getting into, and then when there’s a deviation, figure it out.
Erik Darling: Someone in the WebEx chat has a question. “RDMA was very shiny to me in offloading CPU usage to NICs, but I’ve never used it. Is it still important?”
Anthony Nocentino: So, I haven’t – I don’t have a lot of experience with RDMA. Conceptually, this is kind of like one of those edge cases, fancy stuff that I threw out the warning or the caveat about in the beginning. Conceptually it makes a lot of sense. Fundamentally, what I think is going on is it’s offloading or short-circuiting the transmission, the exchange of data between memory and the actual controller, bypassing certain parts of the data transmission. It has to occur inside the chassis itself, going from the bus – I think that’s what’s going on. It’s certainly an interesting technology, I just haven’t dove into it to get a rugged understanding of it, yet. There’s a similar concept in cluster computing with InfiniBand, if you guys are familiar with that as an interconnect. Basically, what’s occurring is a direct memory transfer from memory to the actual controller itself, bypassing the CPU. And we all know that in SQL Server, the way that we get good bandwidth is we have fast CPUs. Similarly, if we can go around that thing, it makes sense, right.
Brent Ozar: Well thanks, sir. Excellent presentation, lots of Q&A back and forth in Slack as well. Edwin says he sees your Mac temperature at 206 degrees Fahrenheit, so I think it’s 207 now, probably time to quit.
Anthony Nocentino: Yeah it’s five or six VMs. Thank you, guys, for coming, I appreciate it.
Brent Ozar: Thank you, and thanks for literally coming to my house.