With good tools it is easy to write regular expressions

by toni 7. October 2013 08:41

Most developers know the two phases of writing a regular expression: the hours spent fighting with the pattern, and the moment of triumph when it finally works.

Instead of learning regular expressions from Wikipedia, Linux man pages, or a tutorial, you can use one of the many tools that are available. Let's look at one such tool.

Assume you have hundreds of files named like this:

File_001
File_temp
File_2
File_00000123

Your job is to find the largest number. In this case the correct answer would be "123". You could write a lot of code, or you could use the following regular expression (no, I didn't test this, but you get the point) to extract the number after the underscore:

(?<=File_0*)[1-9]{1}[0-9]*
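
To show the pattern in context, here is a minimal C# sketch of how you might apply it. The file names are hard-coded for illustration (in practice they would come from e.g. Directory.GetFiles), and the pattern is the same untested one as above:

using System;
using System.Text.RegularExpressions;

public static class LargestFileNumber
{
    public static void Main()
    {
        // Hard-coded sample data; in practice use e.g. Directory.GetFiles.
        string[] files = { "File_001", "File_temp", "File_2", "File_00000123" };

        var pattern = new Regex(@"(?<=File_0*)[1-9]{1}[0-9]*");
        int largest = 0;
        foreach (var file in files)
        {
            var match = pattern.Match(file);
            if (match.Success)
            {
                largest = Math.Max(largest, int.Parse(match.Value));
            }
        }

        Console.WriteLine(largest); // prints 123
    }
}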

Writing something like this using a good tool is a matter of minutes. My tool of choice was Expresso (free, but you need to send an email to the developer to register it). It can analyze the regular expression:

expresso_main_screen

As you write the regular expression, the analyzer shows what it means. This helps a lot because you don't have to remember what each part of the expression does. Also, instead of trying to remember the cryptic syntax for "Match a prefix but exclude it from the capture", you can just look under Groups:

express_groups

Of course this is just one tool, and there are many alternatives such as RegexBuddy or Rad Software Regular Expression Designer. If you don't like the idea of installing software, you can use online tools like RegExr or RegEx101.com.

Hopefully these tools make it easier for you to write regular expressions!


About semicolons in JavaScript

by toni 16. September 2013 07:21

As you probably know, semicolons are optional in JavaScript, but for example the book JavaScript: The Good Parts recommends using them:

JavaScript has a mechanism that tries to correct faulty programs by automatically inserting semicolons. Do not depend on this. It can mask more serious errors.

JavaScript uses a C-like syntax, which requires the use of semicolons to delimit statements. JavaScript attempts to make semicolons optional with a semicolon insertion mechanism. This is dangerous.
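
The canonical demonstration of the danger is a return statement whose value accidentally ends up on the next line. Here is a minimal sketch in plain JavaScript (the function name is invented):

// Looks like it returns an object, but semicolon insertion turns
// this into "return;" followed by an unreachable block, so the
// function actually returns undefined.
function getStatus() {
    return
    {
        ok: true
    };
}

console.log(getStatus()); // undefined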

In many ways this reminds me of the C/C++ code I saw and wrote back in 2000 when I was working as a C++ developer.

Yoda Conditions

if (NORMAL == mode)

It is called a yoda condition because of the way you read it:

if (NORMAL == mode) reads "if normal is the mode"
if (mode == NORMAL) reads "if mode is normal"

Anyway, this was recommended because it prevented a lot of bugs:

// Bug: Assignment instead of comparison!
if (mode = NORMAL)

// Compiler error, because you cannot assign to a constant
if (NORMAL = mode)

Pointer initialization

It was recommended that every pointer be initialized to NULL. Sure, even back then some compilers did that automatically, but not on every platform. I remember writing code for HP-UX, where the compiler did this. After porting the code to SunOS and running it in release mode, I got burned.

Using braces { } with if statements

// Not recommended
if (NORMAL == mode)
    PrepareForNormalOperation();

// Recommended
if (NORMAL == mode)
{
    PrepareForNormalOperation();
}

Using braces was recommended because they prevented a lot of bugs when changes were made to the code base.

// Bug! SomeOtherMethod is always called
if (NORMAL == mode)
    PrepareForNormalOperation();
    SomeOtherMethod();

// Always using braces makes it harder to introduce
// bugs when modifying code.
if (NORMAL == mode)
{
    PrepareForNormalOperation();
    SomeOtherMethod();
}

Conclusion

So what has all this C++ got to do with JavaScript? These recommendations were widely used because they prevented a lot of bugs in the long run. This was especially true in the following cases:

  • A new developer or team starts working on the next version
  • A subcontractor works on the code
  • Junior developers make changes
  • A change request is implemented six months after the release

Sure, you can argue that code reviews, unit tests, etc. will spot these bugs, but so far I haven't seen any project where those have caught all of them before going into production. You can also argue that the Linux kernel does not follow these rules. That is true, but we are talking about the Linux kernel, which is a bit different from a typical software project.

In this light, the whole "I'm not using semicolons in JavaScript because they are optional and I know the special cases when I have to use them" attitude seems strange. Sure, the extra ; brings some unnecessary noise to your code, but is that really your biggest problem?

Did you write down a pros/cons list about semicolons and come to the conclusion that preventing this code noise is more important than any other long-term benefit you get from using them?

If so, then I can respect that;

Good tests give you confidence to change anything

by toni 26. November 2012 09:04

Introduction

The system has been under development for two years, and the purpose of the project is to replace the server side of an old legacy system. The clients and the server communicate using messages. The following figure shows the message flow.

legacy_message_flow

The normal communication flow is as follows:

  1. Start a session between the client and the server.
  2. Initialize with some additional data.
  3. Exchange any number of different messages.
  4. The client sends Stop, which ends the exchange of messages.

The first two messages, Start Session and Initialize, affect every other message that the server receives; any change to them affects all other messages. The server keeps some state information and does some additional work based on those two messages. Since this server is supposed to serve old legacy clients, we cannot change that logic.

After two years of development there was a change request that would change the functionality of the Start Session and Initialize messages. Since every other message depends on those, we can list a few challenges:

  • How do we analyze how this change will affect other messages?
  • How long does it take to make the changes?
  • How long does it take to test that everything works?

Test Suite

Imagine a system without any tests. It would be impossible to make this change without a huge effort. Remember, this functionality was implemented two years ago. What if it had been five years ago? What if the developers no longer work for the company?

Fortunately, that was not our situation, since our test suite contains a lot of integration tests. Yes, not unit tests but integration tests. In the scope of this post, the term integration test means a test that starts from the input (a message) received by the server and ends at the database.

legacy_integration_tests

If you had only unit tests, you could tell whether a single message was processed successfully, but you would have no idea how the system as a whole would behave. How would the change affect the processing of subsequent messages?

From the start we built integration tests based on what happens in production. The following figure shows a simple scenario from production.

legacy_message_flow_addpart

In the figure above, the scenario is that before AddPart can be sent, the messages Start Session, Initialize, and AddProduct must have been sent. This means that in our integration tests for AddPart we had the following code, which is executed before each test.

public void Initialize()
{
    this.Given_Session_Start_has_been_sent();
    this.Given_Initialize_has_been_sent();
    this.Given_AddProduct_has_been_sent();
}

This code actually sends the needed messages the same way they would be sent in the production environment. This gives us the following benefits:

  • It is easy to put the system into different states by simply sending messages in different orders.
  • The integration test works just like production, i.e. there is no need to put fake data into the database to get the system into a specific state.
  • There is a smaller chance of having bugs in the actual tests/test data.
  • It is easy to test situations where messages are received in the wrong order (see the sketch below).
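
For example, a wrong-order test might look roughly like this. This is only a sketch: the Given_* helpers match the setup code above, but Send_AddPart, the response type, and the assertion are invented for illustration.

public void AddPart_Without_AddProduct_Is_Rejected()
{
    // Deliberately skip Given_AddProduct_has_been_sent() so that
    // AddPart arrives in the wrong order.
    this.Given_Session_Start_has_been_sent();
    this.Given_Initialize_has_been_sent();

    // Hypothetical helper that sends the message and returns the server's response.
    var response = this.Send_AddPart();

    // The server should reject the message instead of corrupting its state.
    Assert.IsTrue(response.IsError);
}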

With this kind of integration test setup, making changes to the code is a lot easier. Even if you don't know the code base that well, you can be confident that you are not breaking anything when making changes.

Conclusion

Unit tests are great. If you have an algorithm that checks that the format of a bank account number is correct, you should write unit tests for it. But unit tests only tell you that a small piece of code works. They don't tell you whether your system still works or whether all the interactions between its different parts work. That's why you need integration tests.

There is one book that I always recommend: Growing Object-Oriented Software, Guided by Tests. Yes, it is a few years old and the code examples are in Java, but it is a great book. It shows how to build systems by starting with the integration tests, so that you know your end-to-end scenarios will work. And, most importantly, that they still work after you make changes.

Code Analysis helps you find issues with disposable resources

by toni 20. November 2012 07:14

A while ago I was working with an old code base and decided to run Code Analysis against it. Since Code Analysis had been used during development, I didn't expect to find anything major. I just wanted to see whether it had improved over time and whether it could find something new.

I was pretty surprised when it reported a CA2000 warning.

CA2000 Dispose objects before losing scope In method 'Connection.CreateConnection()', call System.IDisposable.Dispose on object 'resource' before all references to it are out of scope.

The code in question looked like this (the warning points at the line that creates the DisposableResource).

public void CreateConnection()
{
    var resource = new DisposableResource();
    this.legacyConnection = new LegacyConnectionWrapper(resource);
    this.legacyConnection.Connect();
}

First I didn’t understand why Code Analysis thought there was a problem since

  • Connection, LegacyConnectionWrapper and DisposableResource all implement IDisposable correctly.
  • LegacyConnectionWrapper owns DisposableResource and will dispose it.
  • Connection owns LegacyConnectionWrapper and will dispose it.

Then I decided to see what MSDN had to say about this warning:

If a disposable object is not explicitly disposed before all references to it are out of scope, the object will be disposed at some indeterminate time when the garbage collector runs the finalizer of the object. Because an exceptional event might occur that will prevent the finalizer of the object from running, the object should be explicitly disposed instead.

Now it all made sense. If an exception were thrown while constructing the LegacyConnectionWrapper or calling Connect(), the DisposableResource would not be disposed right away. In some cases that can cause really hard-to-find bugs. Assume you call CreateConnection() with an aggressive retry policy. The call keeps failing and you never release any resources. At some point, the actual reason for the failures might be that you have run out of some unmanaged resource, because the garbage collector hasn't freed those objects yet.
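
A common way to satisfy CA2000 in code like this is a temporary variable whose ownership is handed over explicitly, with a finally block covering the failure path. This is only a sketch of that pattern, assuming (as listed above) that LegacyConnectionWrapper disposes the resource once it owns it:

public void CreateConnection()
{
    DisposableResource resource = null;
    try
    {
        resource = new DisposableResource();
        this.legacyConnection = new LegacyConnectionWrapper(resource);
        // Ownership has been transferred; prevent the dispose below.
        resource = null;
        this.legacyConnection.Connect();
    }
    finally
    {
        // Runs only if constructing the wrapper threw.
        if (resource != null)
        {
            resource.Dispose();
        }
    }
}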

It is also worth mentioning that I only got this warning when I used the "All Rules" rule set. Using just the minimum or recommended rules did not produce it.

Beware of JSON over HTTP architecture Part 7

by toni 12. November 2012 05:11

In the previous part we came up with a solution to our problem: Queues.

queue_solution

The steps are:

  1. Connect to the bank and get the payment information.
  2. Notify others with a "Payment Received" event.
  3. Get the event, match the payment to a customer/account, and increase the limit accordingly.
  4. Notify others with a "Balance Increased" event.
  5. Get the event and mark the invoice as paid.

The new solution looks a lot better, but what about the problems we had? Did it fix any of them?

Solving Original Problems

Dependencies

In our original SOA-based architecture we found out that sooner or later every system knows every other system, because they need to call each other directly. Now that we have started to use queues, that problem doesn't exist anymore. Once a system has completed its part, it notifies the others. It doesn't have to care what happens next.
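
As a rough sketch of what "notify others and move on" looks like in code: the queue client, event type, and method names here are all invented for illustration, and a real broker API (RabbitMQ, Azure Service Bus, MSMQ) would differ in the details.

public void ProcessIncomingPayment(Payment payment)
{
    // Do our own part of the work first.
    this.SavePayment(payment);

    // Publish the event and move on. We neither know nor care
    // which systems consume "Payment Received".
    this.queue.Publish(new PaymentReceivedEvent(payment.Id, payment.Amount));
}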

Hidden Dependencies

We still have hidden dependencies, but they are a bit different now. Our design for the user story looked like this:

soa_get_payments2

In the third part we came to the following conclusion about hidden dependencies:

“…in the scope of the user story the Bank Services actually depends on Invoicing because if the invoicing service is down Bank Services will receive an error and the processing of payment fails.”

Now that we are using queues, we don't have that problem anymore. Of course the whole user story cannot complete if Invoicing is down, but thanks to queues we get the following benefits:

  • Other systems can complete their part even if Invoicing is down.
  • Other systems are not affected if Invoicing takes a lot longer to do its job, since they are no longer waiting for it.

Availability

Since there are no direct calls between our systems, we don't need 100% availability. If Invoicing is down, balances are still updated and the "Balance Increased" events are stored in the queue. As far as "Customer & Account" or "Bank Services" know, everything is working just fine. They are not affected by Invoicing's downtime.

Error Handling

Remember how in the previous solution we needed to implement error handling manually, e.g. retrying a request three times? Now the queue is part of our infrastructure. We can, for example, mirror it so that in case of a hardware failure the system just keeps running. If we run the system in the cloud (Amazon, Azure), we might still have to do some manual error handling ("retry the request n times"), but cloud providers have pretty good infrastructure, so you can be quite sure the queue just works.

Now that the different systems are clearly separated, it is a lot easier to handle possible errors. We only need to care about a single system at a time.

bugs_in_single_system

If there is a bug in Invoicing, it is limited to that system. We don't make any REST calls, so there is no need to handle errors that happen in other systems. That makes the system easier to debug, fix, and maintain.

Long-Term Errors

As long as you have free disk space, the queue will store the messages until we are ready to consume them. In case of a long-term error there might be tens of thousands of messages in the queue. Once the problem has been solved, we can temporarily increase the number of consumers in order to handle the backlog as fast as possible.

multiple_invoicing_processes

Solving Problems in Production

It is a lot easier to solve problems when one system does one thing and you don't have to think about scenarios like "when the payment is received we call system B, which then calls system C, and if the original input was this, then this fails because…".

When Invoicing does not work, we can be pretty sure the problem is actually in that service. Also, an error is reported inside a single system. With typical SOA-based systems, each system reports the failure in its own log, since they all log the responses they get from other systems.

Take it Down

Since we are using queues, it is a lot easier to "mess with" systems that are in production. For example, we can do the following to a single system:

  • Take it down for hours/days
  • Make changes to it (hot fixes, configuration changes)
  • Connect to it remotely, attach debuggers, etc., without affecting other systems

Challenges

It seems we have fixed most of the problems of the original architecture, but our solution is not all sunshine and unicorns. There are some challenges.

Poison messages

When messages are stored in a queue, it is very likely that you will encounter poison messages:

A poison message is a message in a queue that has exceeded the maximum number of delivery attempts to the receiving application. This situation can arise, for example, when an application reads a message from a queue as part of a transaction, but cannot process the message immediately because of errors.

If the problem with the message is not corrected, the receiving application can get stuck in an infinite loop, starting transactions to receive the message and then aborting them.

To handle those messages, make sure you have the means to do the following (a sketch of the first point follows the list):

  • Move poison messages into another queue (a retry queue or dead letter queue)
  • Analyze poison messages to see what is wrong (corrupted messages, missing data, etc.)
  • Fix the messages and resend them
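
As a sketch of the first point, a consumer can cap delivery attempts and move the message aside instead of looping forever. The queue client, message type, and attempt counter here are all invented for illustration; real brokers have their own mechanisms for this.

private const int MaxDeliveryAttempts = 5;

public void Consume(Message message)
{
    try
    {
        this.ProcessMessage(message);
        this.queue.Acknowledge(message);
    }
    catch (Exception)
    {
        if (message.DeliveryAttempts >= MaxDeliveryAttempts)
        {
            // Poison message: move it aside for later analysis
            // instead of retrying forever.
            this.deadLetterQueue.Publish(message);
            this.queue.Acknowledge(message);
        }
        else
        {
            // Put the message back so it can be retried.
            this.queue.Requeue(message);
        }
    }
}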

Different types of messages in a single queue

Once you start using queues, it is common to use a single queue to handle different types of messages. There isn't anything wrong with that, but you have to remember at least one possible downside. Assume the consumer of the messages has been unavailable and you have tens of thousands of messages in the queue. When the consumer comes back up, it starts to process those messages. Now, if you are using the same queue to handle messages with different priorities, the following happens.

important_messages_blocked

As you can see, your important messages are stuck in the middle of thousands of less important ones. Some queue solutions support a hybrid FIFO/priority queue, which might be handy in situations like these, or you might just use different queues for different message types. AMQP (Advanced Message Queuing Protocol) also supports priorities; just make sure that if your queue solution supports AMQP, it actually implements the priority support.

Inconsistent State

Since you don’t have direct REST calls between the systems you must plan for “inconsistent state”. In our example it can mean e.g. following.

  • Even if the payment has been processed by "Bank Services", the account balance may not have been updated yet.
  • Even if the account balance has been updated, the invoices may not have been processed yet.

In many cases this means that you need to store some kind of timestamp for the different operations, because at the very least the help desk must be able to tell the customer what has happened and when. If part of the system is down, they can tell the customer: "We have received your payment, but your balance was last updated last Friday, so the payment has not been processed yet."
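
As a minimal sketch, the record behind such a help desk view might look something like this; the type and property names are invented:

public class PaymentStatus
{
    // Null means the step has not happened yet.
    public DateTime? PaymentReceivedAt { get; set; }
    public DateTime? BalanceUpdatedAt { get; set; }
    public DateTime? InvoiceMarkedPaidAt { get; set; }
}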

Overusing Queues

I think this is perhaps the biggest problem. Now that we have our hammer, every problem looks like a nail. Understand why and when a queue is a good solution, and do not try to force every user story and system to use one. There are a lot of common examples of situations where using a queue (or service bus) is not a good idea, but they deserve a blog post of their own.

Final Words

This is the last part of the series. I think we managed to take a pretty good look at the challenges of the typical "JSON over HTTP" / SOA architecture, and we found a solution that fixes most of the problems.