- Notifications
You must be signed in to change notification settings - Fork635
Fix State machine sub-regions do not resume from last state after restore from persistence #811#998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:2.5.x
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
chutch commentedSep 13, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Hey@ShvetsovDV , thanks for looking at it. I think there are actually 3 different bugs that are contributing to this issue and needs to be addressed. I'm lately looking into this issue on my project and would like to help to push those fixes through. Maybe we can do it one by one? Resuming
You did fix it alreadyhere. Going through all regions is wrong, because the reset is anyway recursive.
Fixedhere. So all regions we iterate over are going to be always overwritten by last context. PersistingThere is also an issue with persisting the context of regions. The context of region overwrites the main context. Now, I'm not sure if the solution proposed is the correct one. Let me explain what I have observed. I have implemented my own The OOTB However, for Now the "Persisting" issues can be quickly fixed by using custom persister and persisting on So, I have two questions... do you think we can split the fixes for persisting & resuming? How can I help to push these things through? I can gladly take over. |
ShvetsovDV commentedSep 14, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@chutch, thanks for the answer! Indeed, when solving the problem, several errors were fixed in different parts of the code. I think we need to commit the edits with one commit, otherwise there will be no positive effect. I would also like to receive a comment on my commits from@jvalkeal .
Here fixed a bug in determining the machine ID for the context before saving it. Perhaps your effects are due to the fact that I would be glad to receive examples of cases where my solution does not work. |
chutch left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, I'll try to rephrase to explain my concern. Let's assume that we have following simple State Machine:
┌─────►X1─────►X2────┐ │ │start ───►A─┤ ┌┘►B────►Z │ │ └─────►Y1─────►Y2───┘Where:
start, A, B, Zare states of the root State Machine, with idexampleX1, X2are states of a sub-machine, a region calledxtherefore idexample#xY1, Y2is state of another sub-machine, a region calledy
Now, let's assume state change fromX1 ->X2. From the debugger I see that whenpostStateChange &preStateChange are called for the same state change, they are called with different parameters
preStateChange
public void preStateChange(State<S, E> state, Message<E> message, Transition<S, E> transition, StateMachine<S, E> stateMachine, StateMachine<S, E> rootStateMachine) {// with "values"state=X2stateMachine=examplerootStateMachine=example// the very same SM object, root onestateMachine == rootStateMachinepostStateChange
public void postStateChange(State<S, E> state, Message<E> message, Transition<S, E> transition, StateMachine<S, E> stateMachine, StateMachine<S, E> rootStateMachine) {// with "values"state=X2stateMachine=example#x <== correct SM, because the state change happened within a regionrootStateMachine=example// different SM objectsstateMachine != rootStateMachineThat makes me think, that maybe trying to fix this issue by "reverse-engineering" withfindSmIdByRegion is not a proper way to fix it.
For some reason for the very same state changepostStateChange method receives proper SM objects. Then passing those 2 objects tobuildStateMachineContext() methods works perfectly fine without any changes in the code. So persisting onpostStateChange seems to be working fine.
WhereaspreStateChange receive 2x root SM, the very same object, the very same reference.
So this makes me think that maybe the fix should be performed somewhere deeper? Please mind this is just a hunch and maybe I am wrong. I think I would a lot more time to understand the inner workings of SM to see where the bug is. I am sure your code is working and doing the job. I am just wondering if it is not going to seal a deeper issue.
Let me know what you think about it. Hope this makes sense.
| id =getDeepState(state); | ||
| }elseif (state.isOrthogonal()) { | ||
| if (stateMachine.getState().isOrthogonal()) { | ||
| //if (stateMachine.getState().isOrthogonal()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't understand this change, can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
In this part of the code, it is important for us to write the child state machine identifiers into the parent context.
When we have a state machine with nested state machines, we need to get a context like this:
DefaultStateMachineContext [ id=testid , childs= [ DefaultStateMachineContext [ id=testid#FIRST , childs=[] , childRefs=[] , state=S21 , historyStates={} , event=E3 , eventHeaders={id=9ab47504-7da3-1853-8a2d-4a7e667a2def, timestamp=1633085509033} , extendedState=DefaultExtendedState [variables={}]] , DefaultStateMachineContext [ id=testid#SECOND , childs=[] , childRefs=[] , state=S31 , historyStates={} , event=E1 , eventHeaders={id=9638028e-a69e-4adf-92d4-bb4655f4c2a2, timestamp=1633085509039} , extendedState=DefaultExtendedState [variables={}]]] , childRefs=[] , state=S2 , historyStates={} , event=null , eventHeaders=null , extendedState=DefaultExtendedState [variables={}]]This context for the parent statemachine must include child identifiers (in our example,id = testid # FIRST andid = testid # SECOND). Using these identifiers, we will restore the context from the database (from the "state" table by the "machine_id" field) for the child state machines.
In the line of code} else if (state.isOrthogonal ()) { we determine if ourtarget state is ramified.
In the line of codeif (stateMachine.getState (). IsOrthogonal ()) { we determine if oursource state is ramified.
The fact that thesource state and thetarget state at the same time together looks illogical and looks like a configuration error for the statemachine.
Due to the lineif (stateMachine.getState (). IsOrthogonal ()) { child state machine IDs are not saved in the parent context.
For example, consider thetestPersistRegionsAndRestore test.
You are right that the error goes somewhere deeper. But the source code is difficult to write and needs a serious rewrite. Such elaboration takes time. To carry out serious rewrite without coordination and explanation of certain decisions by the owner of the source code, is fraught with new errors and loss of time. In my implementation, I am locally parsing the structure of the statemachine in the findSmIdByRegion method. This fix will not introduce new bugs, although it looks pretty crutch. |
In a serious revision, to simplify the code, I would like methods for analyzing the structure of the state machine similar to findSmIdByRegion (may be findNextState, getAvailableStates, findEventsFromSourceState, ...) into a separate static class. |
kiran-mallineni commentedJul 25, 2024
yes there is a bug AbstractStateMachine which is calling AbstractPersistingStateMachineInterceptor with incorrect state machine. can we prioritise this to fix in the library? we fixed some parts by extending the library. however AbstractStateMachine is so complicated as there are no proper extension points |
Uh oh!
There was an error while loading.Please reload this page.
Fixes#811
I think, for reason this problem, we need to fix two bags in AbstractPersistingStateMachineInterceptor:
Here we need to look at the line 171. I'am think condition "stateMachine.getState().isOrthogonal()" is unnecessary.
The fact is that all changes are saved for the main state machine(with a main state machine id). We need to fix it.
And second part of fix it is edit resetStateMachine in AbstractStateMachine