In previous posts I described some of the challenges that you might encounter when working with a solution built using the
“JSON over HTTP” Service Oriented Architecture pattern. In this part we are going to look at one common solution that tries to fix those problems.
Let’s look at the original figure that shows our imaginary system and all its dependencies. Remember that each colored line shows a different user story.
When looking at this figure it is easy to say “We have way too many dependencies”. If you don’t stop to analyze why too many dependencies is a problem, then the common solution is simply to remove all of them. This can be done by introducing another system into the architecture. This system is often called a Proxy or a (Service) Router. You could compare it to the Mediator or Facade pattern in software development, or perhaps to a Reverse Proxy in web architecture.
As you can see, our new Service Router has removed all the direct dependencies between the systems. Since it is hard to reason about an imaginary figure of the architecture, let’s see what our original user story would look like when using the Service Router.
First we have the original user story from part two.
And now the same thing using the “Service Router”. Note the “double” numbering of the steps. There are now two calls per step because everything goes through the “Service Router”.
As you can see, the dependencies are (at least on paper) gone, but what does that actually mean? Let’s look at the problems listed in previous parts and see which of those we have fixed.
Every System Knows Every Other System
We could argue that this has been fixed. After all, there is no longer a direct connection between e.g. “Bank Services” and “Customer & Account”. In reality, though, all that has changed is the address where we send the request.
| Before | HTTP POST account.com/account/123/deposit |
| After | HTTP POST servicerouter.com/account/123/deposit |
If we traced all our user stories by drawing lines from system to system, we would see that the only change is that every request now goes through the “Service Router”.
Again, even though on paper we removed all the dependencies, they still exist. Remember that in part three we wrote down the “definition of a dependency”:
- There is a dependency if system cannot complete the user story without calling another system.
- There is a dependency if system cannot complete the request without calling another system.
The “Bank Services” still needs to call “Customer & Account”. The fact that it does it by using “Service Router” doesn’t change anything.
In part three we also wrote down the definition for hidden dependency:
There is a hidden dependency between systems if taking system A offline affects system B even though there is no direct connection between them.
Same thing here. Remember the hidden dependency between “Bank Services” and “Invoicing”? It is still there. If you take down Invoicing you will see an error in “Bank Services” no matter how many “Service Routers” you put between them.
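The two definitions can be checked with a few lines of code. The sketch below is a minimal model, assuming the call chain from the payment user story (Bank Services calls Customer & Account, which calls Invoicing). The names `DIRECT`, `via_router` and `affected_by_outage` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Set

ROUTER = "Service Router"

# Logical calls in the payment user story: which system each system
# must call before it can complete its own request.
DIRECT: Dict[str, List[str]] = {
    "Bank Services": ["Customer & Account"],
    "Customer & Account": ["Invoicing"],
    "Invoicing": [],
}


def via_router(calls: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rewrite every call A -> B as A -> (router hop) -> B."""
    routed: Dict[str, List[str]] = {}
    for caller, targets in calls.items():
        routed.setdefault(caller, [])
        for target in targets:
            hop = f"{ROUTER} -> {target}"
            routed[caller].append(hop)
            routed[hop] = [target]
    return routed


def affected_by_outage(calls: Dict[str, List[str]], down: str) -> Set[str]:
    """Return every node that cannot complete its request while `down` is offline."""
    affected: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for caller, targets in calls.items():
            if caller == down or caller in affected:
                continue
            if any(t == down or t in affected for t in targets):
                affected.add(caller)
                changed = True
    return affected


systems = set(DIRECT)
without_router = affected_by_outage(DIRECT, "Invoicing")
with_router = affected_by_outage(via_router(DIRECT), "Invoicing") & systems
```

In this model both `without_router` and `with_router` contain “Bank Services” and “Customer & Account”. The extra hop changes the path a request takes, not the set of systems that break when Invoicing goes offline.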
Big Ball of Mud
As you can see the figure looks so much nicer without all the lines. Each system only touches the new “Service Router” because that is the only visible dependency.
I think this is even worse now. We have introduced yet another system into our architecture, and now every single system depends on it. In addition, it looks like there are no dependencies between the “blue parts” even though in reality the dependencies and hidden dependencies are still in place.
Instead of having system-specific error handling in every system, we could put it into the “Service Router”. It would take care of the retry logic etc. There are a few issues with that:
- We still need system-specific error handling because the “Service Router” itself might be down and we need to resend the request.
- The actual error handling logic is just code. We can put it into a shared library that every system can use. There is no need to copy-paste code. If you are using something like HttpClient you can implement the whole logic pretty nicely.
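To illustrate the second point, here is a minimal sketch of what such a shared retry helper could look like. It is written in Python for brevity rather than with HttpClient, and `TransientError`, `call_with_retry` and its parameters are hypothetical names, not part of any real library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying (timeouts, HTTP 503, ...)."""


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on TransientError with exponential backoff.

    The last failure is re-raised so the calling system can still decide
    what to do when every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Every system can wrap its outgoing calls in `call_with_retry(lambda: ...)`. The `sleep` parameter is injectable only so the helper can be exercised in tests without actually waiting.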
The “Service Router” could store the failed requests (in local storage) and we could use some other tool or scheduler to resend them. This would work just fine if you only need to store the actual HTTP request and nothing else.
With this kind of solution there is no need to have system-specific storage for the failed requests. It is not without challenges, though.
When the “Customer & Account” service is down, what should the “Service Router” return to Streaming? Since the (long-term) retry logic is now baked into the “Service Router”, this is not an error case. One way is to “lie” and just return the same HTTP 200, or we could use HTTP 202:
The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
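The store-and-resend idea can be sketched as follows. This is a simplified in-memory model under stated assumptions, not a real router: `StoreAndForwardRouter`, `Request` and the injected `send` callable are hypothetical, and a target that is down is assumed to show up as a `ConnectionError`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    body: str = ""


class StoreAndForwardRouter:
    """Forwards requests and keeps the ones that could not be delivered."""

    def __init__(self, send: Callable[[Request], int]) -> None:
        # `send` is an assumed transport: it returns the target's HTTP
        # status code and raises ConnectionError when the target is down.
        self._send = send
        self.pending: Deque[Request] = deque()

    def forward(self, request: Request) -> int:
        try:
            return self._send(request)
        except ConnectionError:
            # Store the raw request and tell the caller it was accepted.
            self.pending.append(request)
            return 202

    def resend_pending(self) -> int:
        """Try each stored request once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self.pending)):
            request = self.pending.popleft()
            try:
                self._send(request)
                delivered += 1
            except ConnectionError:
                self.pending.append(request)
        return delivered
```

A scheduler would call `resend_pending()` periodically. Note that the caller only sees 202 and cannot tell from the status alone when, or whether, the request was finally executed.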
If you need or want to change the actual return codes, beware of the leaky abstraction that we talk about next.
The leaky abstraction is defined as follows:
A leaky abstraction is any implemented abstraction, intended to reduce (or hide) complexity, where the underlying details are not completely hidden. The term is most frequently used to call attention to a flaw in a software or hardware abstraction.
I would say there is a high probability that the details of the “Service Router” will quickly leak into other systems. One good example is the retry logic and HTTP codes. At some point there will be a piece of code in the Streaming service that looks like this:
var response = httpClient.Send(request);
if (response.StatusCode == HttpStatusCode.Accepted)
{
    // The Service Router accepted the request but
    // it was not executed. The "Customer & Account"
    // service is probably down so we need to...
}
The God object is defined as follows:
In object-oriented programming, a god object is an object that knows too much or does too much. The god object is an example of an anti-pattern.
Even though we are not talking about OOP, the same applies to the “Service Router”. Since it already knows each and every system, there is a high probability that some day the logic inside it will contain special cases based on the systems calling it or the systems it is calling.
Solving Problems in Production
At first it might sound like this would be a lot simpler with the “Service Router”, since we have a single place which contains:
- Logs for all the requests
- Logs for all the responses
- Logs (and implementation) of retry logic
In many cases those will certainly help you understand why something failed but they are missing one key piece of information: The actual business logic that happens inside each service. You get the raw HTTP requests and responses but there is no context.
Let’s look at the original user story implemented with “Service Router”.
- Connect to the bank to get the payment information.
- Match the payment to customer/account and increase the limit accordingly.
- Mark the invoice as paid.
Let’s look at the log files of raw HTTP requests:
HTTP GET bank.com/payments/ Successful with body:
Received HTTP 200 from bank.com/payments/
HTTP POST from bankservices.com to account.com/payments
HTTP PUT from account.com to invoicing.com/account/39302/payments
Received HTTP 200 from invoicing.com/account/39302/payments
Received HTTP 200 from account.com/payments
As you can see, we are logging every request and response in the “Service Router”. Now let’s compare this logging to traditional service-specific log files.
Payment ($100.00) with reference number 940403940 received from bank.
Sending deposit request to Account service (Ref #940403940, $100.00).
Received payment (Ref #940403940, Amount $100.00)
Account #39302 uses reference number 940403940.
Account #39302 current balance is $47.50. Adding $100.00 to it.
Notifying Invoicing of account update (Account #39302).
Account #39302 has received payment. Looking for invoices.
Found invoice #203093000 and marking it as paid.
It is quite obvious that solving problems in production is pretty hard with just the logs the “Service Router” provides. Raw HTTP logs can help you trace other things, like performance problems or configuration issues, but they don’t help you track down problems related to business logic.
Take it Down
How easy is it to mess with a single system now that we have the “Service Router”? As you may remember from the previous part, “mess with” means:
- Take the system down since other systems don’t depend on it
- Connect to it remotely, attach a debugger etc. without much effect on the other parts of the system or the system as a whole
- Bring the system back up and expect things just work after that
The “Service Router” doesn’t really help us. Sure, we can use it to return something like HTTP 503 Service Unavailable, but then we would need to implement additional logic in the calling service to handle it gracefully. That is what the “Service Router” should have done for us.
A fix that doesn’t really fix anything
When I started to write this part my initial thought was “This Service Router is going to be a disaster”. I was really surprised to find out that in some cases (handling long-term errors) it can provide functionality that might actually work and be useful. Sure, it is just one case and not without its own challenges, but still it was nice to find out that the whole solution is not as bad as I thought it would be.
The fact that the “Service Router” doesn’t fix most of the initial challenges we had with Service Oriented Architecture (and sometimes makes them even worse) means there will be at least one more part in the series.