Issues with Sentinel · Issue #1427 · StackExchange/StackExchange.Redis · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with Sentinel #1427

Closed
ctlajoie opened this issue Apr 10, 2020 · 50 comments
Closed

Issues with Sentinel #1427

ctlajoie opened this issue Apr 10, 2020 · 50 comments

Comments

@ctlajoie
Copy link

I implemented support for Redis Sentinel in an application yesterday and I had a rough time getting it to work. I had to actually dig into the library code to figure out what I was doing wrong. If you make a few minor tweaks to the API you could entirely avoid the problems I ran into.
Allow me to explain. This is not the exact code I had; I modified it to illustrate the point.

IConnectionMultiplexer sentinel = ConnectionMultiplexer.Connect(new ConfigurationOptions {
    ServiceName = _redisSentinelServiceName,
    EndPoints = { { _redisSentinelHost, 26379 } },
    CommandMap = CommandMap.Sentinel
});

IConnectionMultiplexer master = sentinel.GetSentinelMasterConnection(new ConfigurationOptions {
    ServiceName = _redisSentinelServiceName
});

Looks like it should work... but there are 2 problems with this code.

  1. IConnectionMultiplexer does not have a GetSentinelMasterConnection function.
  2. The sentinel ConfigurationOptions MUST have TieBreaker = "" to work.

I have a few suggestions:

  1. Add GetSentinelMasterConnection to IConnectionMultiplexer
  2. Have a separate ConnectionMultiplexer.ConnectSentinel(...) function, which sets CommandMap and TieBreaker to the required values, and make endpoints default to port 26379.
  3. Just use the ServiceName provided in the Sentinel configuration, so I don't have to provide it twice

I realize there may be any number of reasons why these suggestions might not make sense (this isn't my code), but please consider if there's some improvements to be made with regard to the library's Sentinel functionality.

@marcelbeeker
Copy link

marcelbeeker commented Apr 11, 2020

I have a rough time too getting this to work for the Redis Sentinel running on my local machine. I'm glad to hear I'm not the only one.

I can't figure out how I can insert records in my Sentinel Cluster. T'm running 2 master Pods and 3 Sentinel and 3 Redis pods on my Cluster.

I'm using this piece of code:

var options = new ConfigurationOptions()
{
CommandMap = CommandMap.Sentinel,
EndPoints = {
{ mylocalip, mysentinelservicenodeport },
},
TieBreaker = "",
ServiceName = mysentinelservicename,
};

var masterOptions = new ConfigurationOptions()
{
ServiceName = mysentinelservicename
};

        var conn = Conn.GetSentinelMasterConnection(options);

The GetSentinelMasterConnection method call fails with an exception: "Unable to determine master"

@mgravell
Copy link
Collaborator

Ultimately, this is one of the key concerns I had about merging the sentinel code. Fundamentally, we (the core maintainers) aren't routinely sentinel users, so it is hard to say "yes, this is the right API and does what it needs to do". So we held off on the sentinel code for ages. Eventually we merged it, and: it looks like there's already some gaps and problem areas. Now I'm in that awkward "what option is best now" position again :/

@ctlajoie
Copy link
Author

ctlajoie commented Apr 11, 2020

@mgravell That makes sense. Now I understand why this functionality was only recently merged in. I still believe the API can be made much better with a few "minor" changes.

A refinement of my original suggestions would be to have an interface like the following.

public interface ISentinel {
    IConnectionMultiplexer GetMaster(ConfigurationOptions options = null);
}

ConnectionMultiplexer implements this interface, and has a static method ISentinel ConnectSentinel(options) which forces CommandMap and TieBreaker to the correct values, and defaults the endpoint ports to 26379 if they are not specified. That way from a user perspective we could write code like:

var sentinel = ConnectionMultiplexer.ConnectSentinel(new ConfigurationOptions {
    ServiceName = _redisSentinelServiceName,
    EndPoints = { { _redisSentinelHost, 26379 } }
});
var master = sentinel.GetMaster();

This would largely avoid the confusion caused by having sentinel be an instance of ConnectionMultiplexer with all of the other methods that come with it (and are unusable). You can still cast it to ConnectionMultiplexer but you would have to go out of your way to do so.

@ejsmith
Copy link
Contributor

ejsmith commented Apr 12, 2020

@mgravell I get what you are saying and it's not fair for the community to expect it, but you guys are pretty much THE Redis library for .NET and using sentinel is a very common thing to do especially for HA scenarios. If you guys don't personally use it and don't want to maintain it then I would hope that you would defer to the community to do so and trust them. I can try and help out more.

As @ctlajoie said, there are definitely some usability issues currently. I ran into the exact same issues. I will try and get a PR together to address these issues without affecting current usages of the library. Any feedback on @ctlajoie 's suggestions before I implement them?

ejsmith added a commit to ejsmith/StackExchange.Redis that referenced this issue Apr 12, 2020
…r for working with sentinel setups (StackExchange#1427) Fix issue with duplicate endpoints being added in the UpdateSentinelAddressList method (StackExchange#1430).
ejsmith added a commit to ejsmith/StackExchange.Redis that referenced this issue Apr 13, 2020
…r for working with sentinel setups (StackExchange#1427) Fix issue with duplicate endpoints being added in the UpdateSentinelAddressList method (StackExchange#1430).

Add string configuration overloads for sentinel connect methods. Remove password from sentinel servers as it seems the windows port does not support it. Add some new tests.


Oops commented out the wrong line
ejsmith added a commit to ejsmith/StackExchange.Redis that referenced this issue Apr 13, 2020
…r for working with sentinel setups (StackExchange#1427) Fix issue with duplicate endpoints being added in the UpdateSentinelAddressList method (StackExchange#1430).

Add string configuration overloads for sentinel connect methods. Remove password from sentinel servers as it seems the windows port does not support it. Add some new tests.


Oops commented out the wrong line


Again
ejsmith added a commit to ejsmith/StackExchange.Redis that referenced this issue Apr 13, 2020
…r for working with sentinel setups (StackExchange#1427) Fix issue with duplicate endpoints being added in the UpdateSentinelAddressList method (StackExchange#1430).

Add string configuration overloads for sentinel connect methods. Remove password from sentinel servers as it seems the windows port does not support it. Add some new tests.
ejsmith added a commit to ejsmith/StackExchange.Redis that referenced this issue Apr 13, 2020
…r for working with sentinel setups (StackExchange#1427) Fix issue with duplicate endpoints being added in the UpdateSentinelAddressList method (StackExchange#1430).

Add string configuration overloads for sentinel connect methods. Remove password from sentinel servers as it seems the windows port does not support it. Add some new tests.
ejsmith added a commit to ejsmith/StackExchange.Redis that referenced this issue Apr 13, 2020
…r for working with sentinel setups (StackExchange#1427) Fix issue with duplicate endpoints being added in the UpdateSentinelAddressList method (StackExchange#1430).

Add string configuration overloads for sentinel connect methods. Remove password from sentinel servers as it seems the windows port does not support it. Add some new tests.
@ejsmith
Copy link
Contributor

ejsmith commented Apr 13, 2020

@ctlajoie I just submitted a PR that addresses these issues #1431 Take a look and see what you think.

@mrmartan
Copy link

There is also this - #1067 (comment)

The transparency issues weren't solved. I have tried giving Sentinel a go with Microsoft.Extensions.Caching.Distributed, Microsoft.Extensions.DependencyInjection.AddStackExchangeRedisCache and Polly.Caching, and it regrettably (but not surprisingly) failed:

StackExchange.Redis.RedisConnectionException: It was not possible to connect to the redis server(s). ProtocolFailure (None, last-recv: 1535) on dcvmredwebdev23.parfums.local:26379/Interactive, Flushed/ComputeResult, last: ECHO, origin: SetResult, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablishing, mgr: 7 of 10 available, last-heartbeat: never, global: 37s ago, v: 2.1.28.64774
   at StackExchange.Redis.ConnectionMultiplexer.ConnectImplAsync(Object configuration, TextWriter log) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 862
   at Microsoft.Extensions.Caching.StackExchangeRedis.RedisCache.ConnectAsync(CancellationToken token)
   at Microsoft.Extensions.Caching.StackExchangeRedis.RedisCache.GetAndRefreshAsync(String key, Boolean getData, CancellationToken token)
   at Microsoft.Extensions.Caching.StackExchangeRedis.RedisCache.GetAsync(String key, CancellationToken token)

Improving the API is worthy, but I still believe it is important that ConnectionMultiplexer.ConectAsync abstracts Sentinel away, based only on the information in the connection string or on negotiation with Redis/Sentinel.

I could understand this was out of scope of #1067. Should we create a new issue for that?

@mrmartan
Copy link

@mgravell @ejsmith

I get what you are saying and it's not fair for the community to expect it, but you guys are pretty much THE Redis library for .NET

IMHO StackExchange.Redis should be a member of the .NET Foundation.

@mgravell
Copy link
Collaborator

mgravell commented Apr 14, 2020

IMHO StackExchange.Redis should be a member of the .NET Foundation.

we... are? https://www.dotnetfoundation.org/blog/2019/03/07/welcoming-miniprofiler-stackexchangeredis-protobuf-net-and-sign-service

@mrmartan
Copy link

Totally missed that.

@mgravell
Copy link
Collaborator

y, but I still believe it is important that ConnectionMultiplexer.ConectAsync abstracts Sentinel away, based only on the information in the connection string or on negotiation with Redis/Sentinel

I'm inclined to agree; if we step back (many) years, this actually was the case in BookSleeve (the ancient ancestor of SE-Redis). That is why the service name existed in the config.

Broadly speaking, I'm open to re-enabling this, but I am concerned about it changing semantics, in particular what happens when people have a value in there that currently does nothing. However! I am open to doing something more controlled here, and maybe this is something we should do that changes the direction of #1431 - i.e. instead of adding methods like SentinelMasterConnect, maybe we should be changing the config string / config object with something new (to avoid the problem above), so that if we did (via config or string):

var config = // ... blah, but including "some new use sentinel flag" or something
using var con = ConnectionMultiplexer.Connect(config);

with Connect[Async] essentially doing:

if (config.NewFlag)
    return SentinelMasterConnect(config); // this is now a private method
return // ... old logic

thoughts?

@mrmartan
Copy link

mrmartan commented Apr 14, 2020

How about Connect[Async] essentially doing:

try
{
  // current connection methods
}
catch (SorryIAmNotRedisIAmSentinelException)
{
  return SentinelMasterConnect(config); // this is now a private method
}

That is, you would not need a new flag. You could tell from the protocol, that the connection string points to Sentinel instances.

@mgravell
Copy link
Collaborator

mgravell commented Apr 14, 2020

or more simply we could just do:

var conn = current connection methods
if (!string.IsNullOrWhiteSpace(config.ServiceName) &&
    conn.ServerSelectionStrategy.ServerType == ServerType.Sentinel)
{
    using (conn) { return conn.SomeMethod(config.ServiceName); }
}
return conn;

If you don't want to do this flip (i.e. you want to connect to the sentinel itself): do not supply a ServiceName. That works...?

@mrmartan
Copy link

mrmartan commented Apr 14, 2020

Sounds good.

Just to be clear, since the method is called SentinelMasterConnect, how does this expect to work with slaves?

That is, will the CommandFlags.PreferSlave and CommandFlags.DemandSlave be respected?

@mgravell
Copy link
Collaborator

Good question, and I'm open to suggestions; should it be issuing both SENTINEL slaves {svc} and SENTINEL get-master-addr-by-name {svc}? again, open to thoughts here

@mrmartan
Copy link

I'd be inclined to it. Can't see now what it would mean for implementation.

I would tell though, that Sentinel should only affect the way the pool of PhysicalBridges/PhysicalConnections gets managed and maybe how commands are assigned to them.

As a side note. There could be an opportunity here to introduce load balancing, even it it was a simple round-robin. Currently we are using 3-node Sentinel managed "clusters", the connection strings point to Redis nodes directly, oblivious of Sentinel. Bulk of requests (say 90%) ends up on the slave that occurs first in the connection string (or whatever slave PhysicalConnection is deemed first after Sentinel failover).

@mgravell
Copy link
Collaborator

(it should already round-robin as long as it knows it is talking to replicas; it can't round-robin to the primary)

@mrmartan
Copy link

mrmartan commented Apr 14, 2020

What could cause it not knowing it is talking to replicas? Could that affect, or be affected by, (future implementation of) Sentinel?

@mgravell
Copy link
Collaborator

the server not responding to info replication (by having info disabled) would be the most likely cause

@mgravell
Copy link
Collaborator

however, the thought occurs; if we were getting primary/replica info from sentinel - we could actually pre-seed the expected server types based on which list it is coming from; then it doesn't matter whether the server respects that command

@ejsmith
Copy link
Contributor

ejsmith commented Apr 14, 2020

I am going to try and re-work my PR #1431 to make it so that you can connect to a sentinel managed master with just a connection string and not have to go through a different connect method. I do think that you should be able to explicitly connect to a redis sentinel server though and have the commands limited to the sentinel commands. So I think we should keep the SentinelConnect and SentinelMasterConnect methods. But we can add a flag to the configuration options that will let you automatically get a managed master connection using a sentinel server using the normal Connect method.

@mrmartan
Copy link

That still leaves out the slaves. What would happen here?

using var redis = await ConnectionMultiplexer.ConnectAsync("mySentinel:port,ServiceName=someService");
var db = await redis.GetDatabase();

var result = await db.StringGetAsync(someRedisKey, CommandFlags.DemandSlave);

I do think that you should be able to explicitly connect to a redis sentinel server though and have the commands limited to the sentinel commands.

Makes sense. Although I don't think you need extra flags or connect methods, besides the two:

ConnectionMultiplexer.ConnectAsync();
ConnectionMultiplexer.SentinelConnectAsync();

The client calling ConnectionMultiplexer.ConnectAsync() already knows whether he passes Redis or Sentinel endpoints to it. The two are not interchangeable, and mixing them together should raise an exception. I would naively expect that should be enough, but there is also, as @mgravell suggested, the flag in the form of config.ServiceName. Do you see a reason not to use that?

@mrmartan
Copy link

mrmartan commented Apr 14, 2020

If there was to be a new flag, I am thinking, use it to do away with any extra connect methods while helping users to avoid a pitfall.

Consider:

ConnectionMultiplexer.ConnectAsync("mySentinel:port,ServiceName=someService");

vs (this would allow you to connect to a redis sentinel server though and have the commands limited to the sentinel commands, but this is also where I can see the pitfall)

ConnectionMultiplexer.ConnectAsync("mySentinel:port");

vs (here imagine the ServiceName optional)

ConnectionMultiplexer.ConnectAsync("mySentinel:port,ServiceName=someService,SentinelManagedConnection=true");

As a user how would you image these methods to behave under various conditions?
And the benefit is obvious, no new public API.

@mgravell
Copy link
Collaborator

My view here is that if the server connected to is clearly a sentinel, and if a service name is supplied: it is fine to bounce automatically without any extra change, so no extra config option is needed. Anyone historically using a redis server (not sentinel) endpoint and supplying a service name will be unaffected. Anyone who wants to connect just to the sentinel: shouldn't supply a service name.

Re replicas: we should try and identify master and replicas from the sentinel, and prime all the endpoints. The replicstion role (primary/replica) of each can probably be set in advance based on this info.

@ejsmith
Copy link
Contributor

ejsmith commented Apr 16, 2020

I think it’s a completely valid scenario to want to connect directly to the sentinel server without getting redirected. How would I go about doing that? Would we just connect and sniff to see if it’s a sentinel and change the command map on the fly? I am in favor of leaving the other methods for intuitiveness, but also making it possible to connect with just a connection string.

I don’t know how the read scenarios currently work. Does the client know about all the masters and slaves and a flag in each command tells it which set of servers to use?

@mgravell
Copy link
Collaborator

How would I go about doing that?

Just don't set a service name in the config

Does the client know about all the masters and slaves and a flag in each command tells it which set of servers to use?

The library supports that usage, yes. Right now I think the sentinel usage only looks up he master, but it could lookup everything.

@ctlajoie
Copy link
Author

I like direction you took this. Would the presence of a service name in the config also cause the ports to default to 26379?

@mgravell
Copy link
Collaborator

@ctlajoie I'd be hesitant to do that; the key aim here is to not break people who may have set that value without knowing the much deferred intent of it, so if their config is currently routing them to a redis-server instance, I don't want to break them.

@ejsmith
Copy link
Contributor

ejsmith commented Apr 22, 2020

@mgravell but that is a brand new config option, isn't it? It seems to me that if we are going to trigger sentinel mode from the servicename setting then it should trigger the default port setting as well. Of course if you specify a port then that would override the default.

@mgravell
Copy link
Collaborator

This is not a new config option, so people could today have it specified. So what I suggest is: if all we have is a service name and an endpoint, don't make any guesses until we have spoken to the endpoint. If it is a redis server: ignore the service name and connect like we always have; if it is a sentinel: sure, resolving the service makes sense.

@ctlajoie
Copy link
Author

In Kubernetes it is common to run sentinel and redis in the same pod. Thus they will have the same IP address, but the services are running on different ports. I can only speak for myself, but IMO it would be best to have it attempt to connnect on 26379 first, if a ServiceName is provided.

What was the ServiceName property used for, historically?

ejsmith added a commit to ejsmith/StackExchange.Redis that referenced this issue Apr 26, 2020
…r for working with sentinel setups (StackExchange#1427) Fix issue with duplicate endpoints being added in the UpdateSentinelAddressList method (StackExchange#1430).

Add string configuration overloads for sentinel connect methods. Remove password from sentinel servers as it seems the windows port does not support it. Add some new tests.
@ejsmith
Copy link
Contributor

ejsmith commented May 1, 2020

@ctlajoie @mrmartan I updated the sentinel improvements PR #1431 Take a look and see if you think it solves these problems.

@mrmartan
Copy link

mrmartan commented May 4, 2020

Scanning through it and can't see how it manages the slaves ("GetMaster" methods), but that can be simply my ignorance. The API seems great now. I can run it through some stress tests on our setup and trigger Sentinel failovers if that would be helpful.

@ejsmith
Copy link
Contributor

ejsmith commented May 4, 2020

@mrmartan Are you asking about being able to connect read only to the slaves?

@mrmartan
Copy link

mrmartan commented May 4, 2020

No, not particularly.
I am asking about what I was going on about here several times (sorry 😅).

That is:

var conn = ConnectionMultiplexer.Connect("localhost,serviceName=mymaster");
var db = conn.GetDatabase();
db.StringGet("key", CommandFlag.DemandSlave);

would route the REDIS GET key command to any one of the current replicas of mymaster.

Does that make sense? Also here #1427 (comment) and here #1427 (comment)

@ejsmith
Copy link
Contributor

ejsmith commented May 4, 2020

@mrmartan yeah, I'm not sure how that works. I was just focused on getting the core sentinel support working. I think there are several more things that would be nice to have like what you are showing where it would send that command to a slave instance. I am thinking that will probably work though. Did you give it a try?

@mrmartan
Copy link

mrmartan commented May 4, 2020

No, I didn't. I can try tomorrow.
If we want that to work, shouldn't there be a test for it?

@mrmartan
Copy link

mrmartan commented May 5, 2020

So the slaves don't work.
image
Changing that to PreferSlave makes it work (since it just goes to the master).

Also, and I don't know whether that is expected, server selection strategy is standalone rather than sentinel.

image

While the actual configuration I am feeding to it is dcvmredwebdev11:26379,dcvmredwebdev13:26379,dcvmredwebdev12:26379,serviceName=mymaster,syncTimeout=500,connectTimeout=750,connectRetry=0

@ejsmith
Copy link
Contributor

ejsmith commented May 5, 2020

@mrmartan hmm.. not sure on that. I didn't change any of that part of the code. My PR was just trying to get the basics to work. I would think that if that code currently works in a master/slave setup then it should work for a sentinel setup since sentinel is just a leader election quorum that sits on top of a master/slave config and automatically switches the master. @NickCraver @mgravell thoughts?

@mrmartan
Copy link

mrmartan commented May 5, 2020

Normally you would connect the multiplexer to dataNode1:6379,dataNode2:6379,dataNode3:6379 which sets up 3 servers in appropriate roles, and if these are managed by Sentinel and there is a failover, the connection multiplexer will switch the roles. The multiplexer you are returning has only one server, i.e. the master. It does not know about any other (slave) servers and so (I assume) any existing code can't manage that.

@ejsmith
Copy link
Contributor

ejsmith commented May 5, 2020

@mrmartan I can take a look at that and improve things. I would like to get the current PR merged first though because as of right now I can't use SE.Redis in a sentinel configuration at all because it blows up trying to manage the server list (#1430).

@mrmartan
Copy link

mrmartan commented May 6, 2020

I see. Although I would argue that giving users a version that can be integrated with 3rd party caching libraries and Sentinel while then working only with masters could lead to some performance pitfalls, i.e. it locks you out of horizontally scaling your sentinel setup. Could you release the fix for #1430 separately?

Of course, the decision is ultimately with @mgravell and @NickCraver

@ejsmith
Copy link
Contributor

ejsmith commented May 7, 2020

@mrmartan I just updated my PR (#1431) and added the slaves and verified that you can execute read commands against slaves with a sentinel managed connection.

@mrmartan
Copy link

mrmartan commented May 7, 2020

I have yet to try a failover scenarios, but after one quick test it seems to work.

@ejsmith
Copy link
Contributor

ejsmith commented May 7, 2020

@mrmartan that's great. Would love to hear how all your testing goes. It would be a big help to get more testing on this.

@mrmartan
Copy link

Seems to be working great. Probably gonna make canary deploy to production. FYI, loadbalacing seems fishy, Like it works with 3 replicas (including master) while the commands are only eligible for 2 (due to CommandFlags.DemandSlave).
image
The instance with few commands is the master.
image

@ejsmith
Copy link
Contributor

ejsmith commented May 12, 2020

That's great. Thanks for testing that out! Did you also verify that failover is working correctly?

You would need to ask @NickCraver or @mgravell about how it load balances the slave reads.

@mrmartan
Copy link

I'll open a new issue about the load balancing.

In other news, I think I've found an issue. After forcing a failover (SENTINEL failover mymaster) the original master, now slave, enters a period of replication (which in our case takes minutes) during which it is loading data into memory (I think Sentinel exposes this as "link down") but SE.Redis still routes traffic to it. This means that a portion of all traffic fails with RedisServerException "LOADING Redis is loading the dataset in memory"

NickCraver pushed a commit that referenced this issue Jun 8, 2020
Add SentinelConnect and SentinelMasterConnect to ConnectionMultiplexer for working with sentinel setups (#1427)
Fix issue with duplicate endpoints being added in the UpdateSentinelAddressList method (#1430).

This change makes it a lot easier and more discoverable how to connect to a sentinel server while also allowing connecting with just a connection string change which allows existing libs that are using SE.Redis to be used in sentinel mode.

Adding a `serviceName` parameter to the connection string triggers sentinel mode. It will connect to the sentinel and discover the current master and return a managed connection that follows the master. 

```csharp
var conn = ConnectionMultiplexer.Connect("localhost,serviceName=mymaster");
var db = conn.GetDatabase();
db.StringSet("key", "value");
```
@nsalikhov
Copy link

We are also facing with strange load balacing on replicas

image

Yellow line is RPS on replica and it's two times more than any other replica (excluding green one, which is Master with write only commands)

@mgravell @NickCraver can you help with this?

@masums
Copy link

masums commented Nov 5, 2020

I am also observing the same condition. I have three read replicas. First node gets high amount of traffic. 2nd node gets half amount of traffic than first node. 3rd node gets half amount of traffic than 2nd node. Is there any way to distribute same traffic to all the read replicas?
I am using Amazon ElastiCache server 1 primary node with 3 read replicas. Calling with command flag Demand Replica.

@NickCraver
Copy link
Collaborator

Since this was scoped to the other issue (resolved now), let's please open any load balancing issues separately, so we can figure out a path there!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants