Thursday, February 27, 2014

A new project: localize an open source application and get help from the crowd

I've volunteered to localize a desktop .NET application. By localize I mean extending the application so that it is functional in other languages, cultures, and geographic locations. At first glance this applies to the presentation layer, the visual interface that meets the end user, but I discovered that there are other details I need to address in this endeavor. The application I selected for my project was developed in English, and my goal is to modify it so that users preferring other languages, potentially languages that do not use the Roman alphabet, find the same functionality as the English-speaking users. I’m writing this text as I research the topic to help me formulate a strategy, measures of success and an execution plan. Hopefully you will find something valuable in it. I learned a lot from doing this.

Let’s start by defining a measure of success. At the outset the application is fully functional and written for English users. I’m not the developer of the application, but I have obtained the source code from a GitHub repository, meaning it is open source, and I do not have any personal relation to the original author. To consider my task successful I will need to accomplish a few things.

  1. The original author shall approve my code changes and merge my version of the application with the official version on GitHub. In GitHub this is normally done with a pull request.
  2. The default language shall still be English, but a compiled version shall allow the users to swap to “any” other language (for which a translation exists) and use it with the same functions as before.
  3. I can obviously only translate to languages that I speak, and therefore I will need to find help for other languages. For this I will use crowdsourcing and a service called Transifex. The idea is very simple: I will upload a file with English words, and anyone can suggest translations into a language of their choosing. Once a translation is complete I can download the newly translated words and add them to the application.
  4. Since the previous point depends on people I don’t know, I will still consider the project a success if no one volunteers to translate. As a proof of concept I will translate to one other language on Transifex, and thus make the application bilingual. The application code shall however be extended in such a way that another language can be added once a translation is available, and there shall be documentation for how to do so.
  5. The name of the application and where to find the source code, the Transifex repository and application executable shall be announced here and on other channels.

Next I define a strategy and break down what changes are required to the application code to accomplish this. The application is a .NET 4.5 application written in Visual Basic. Although Visual Basic is not my strongest programming language, I feel confident that I can solve this problem with my previous experience in .NET. I intend to use as much of the framework as possible to support this, so a few things mentioned here may be specific to .NET, but I believe the general concepts are applicable to any other software application.

  • The first change I need to make is to map and externalize all English strings of the application. The framework has concepts of Resource files, Satellite Assemblies, CultureInfo, and Resource Management, so I will rely on the framework and not reinvent things that already exist.
  • Words in different languages have different numbers of characters, which may cause string expansion or contraction. This means that an element of the application interface may grow or shrink in size as the locale is changed. The most straightforward solution is to enlarge the placeholder, the control, to fit the longest string.
  • Text in images can’t be translated, so any text inside an image needs to be refactored to a string variable if it shall be translated.
  • Concatenating strings that are presented in the UI is a bad idea and should be avoided. Some languages use a different word order, so fragments glued together in English order may not form a correct sentence.
  • All numbers shall be localized because some locales use the comma “,” as the decimal separator instead of the dot “.”. Furthermore, the character used to separate thousands differs between locales.
  • Sorting collections shall be dependent on the locale, as the sorting order may be different. Any sort shall rely on the sort function of the framework for the specific locale.
  • If the application has phone numbers, postal codes or any other locale-specific entity, they should be formatted and presented according to accepted conventions.
  • Fixed or hardcoded paths to Windows folders must be avoided. For example, the folder “C:\Program Files” may have a different name on an OS installed in a different language.
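
Several of the bullets above (numbers, word order, sorting) can be illustrated with a few lines of code. The project itself is .NET and Visual Basic, so the following is only a minimal sketch of the same ideas in Scala on the JVM, using the standard java.text classes as a stand-in for what CultureInfo and the framework formatting functions provide. The object name LocaleSketch and its method names are made up for this illustration.

```scala
import java.text.{Collator, MessageFormat, NumberFormat}
import java.util.Locale

// Hypothetical helper object, for illustration only.
object LocaleSketch {
  // Format a number with the decimal and grouping separators of the locale.
  def formatNumber(value: Double, locale: Locale): String = {
    val nf = NumberFormat.getNumberInstance(locale)
    nf.setMinimumFractionDigits(2)
    nf.setMaximumFractionDigits(2)
    nf.format(value)
  }

  // Build a UI message from a template with numbered placeholders instead of
  // concatenating fragments, so a translator can reorder {0} and {1}.
  def copiedMessage(template: String, file: String, folder: String): String =
    MessageFormat.format(template, file, folder)

  // Sort strings with the collation rules of the locale rather than by
  // raw character codes.
  def sortFor(locale: Locale, words: Seq[String]): Seq[String] = {
    val collator = Collator.getInstance(locale)
    words.sortWith((a, b) => collator.compare(a, b) < 0)
  }
}
```

For example, LocaleSketch.formatNumber(1234567.89, Locale.GERMANY) gives "1.234.567,89", while the same call with Locale.US gives "1,234,567.89". A template such as "Copied {0} to {1}" can be rewritten by a translator as "{1}: {0}" without any change to the code that fills it in.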

The above bullet points are not everything that may be considered, and some may not even apply to certain types of applications. It’s important to understand this and stay flexible when approaching the upcoming work. I started on this list before I examined the code and will remove or add things if I discover other aspects of the localization at a later stage. It’s important to test and iterate on this. One could even formulate test cases with the bullets and the application use cases in mind. Due to the size of the project I chose not to do this formally.

I expect the number of languages and cultures to grow over time as more translations are completed on Transifex. For a new language to be supported by the application, a resource file needs to be added to GitHub and a satellite assembly included with the application release. I'm still investigating the most straightforward way to accomplish this, other than manually repeating the steps.
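
The behavior I am after, English as the default with other languages filling in where a translation exists, can be sketched in a few lines. This is not how the .NET resource files work internally (as far as I understand, the framework's resource management handles the fallback on its own). It is only a plain Scala illustration of the lookup rule, with a hypothetical Translations object, made-up keys and a single sample translation.

```scala
// Hypothetical in-memory stand-in for resource files, for illustration only.
object Translations {
  // The English strings are the default and are always complete.
  private val english: Map[String, String] =
    Map("menu.file" -> "File", "menu.exit" -> "Exit")

  // Translations per language code; a language may be only partly translated.
  private val translated: Map[String, Map[String, String]] =
    Map("sv" -> Map("menu.file" -> "Arkiv"))

  // Look up the string for the language, fall back to English,
  // and finally to the key itself so that a gap is visible in the UI.
  def text(key: String, language: String): String =
    translated.get(language)
      .flatMap(_.get(key))
      .orElse(english.get(key))
      .getOrElse(key)
}
```

With this rule, adding a language only means adding one more entry to the translated map, and a partly finished translation still leaves the application usable.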




Friday, February 14, 2014

A word on mutability in Scala

If you are familiar with Scala you've probably been reminded a few times to use immutable objects instead of mutable ones. There are many reasons, and the one most often argued is that immutable state makes concurrency easier. This is true but it's not the only reason, and immutable objects should not always be used either. Some designs are more suited to mutable state, even in Scala.

To entertain myself and you readers I would like to show an example where a mutable object behaves perhaps unexpectedly, all for the purpose of getting more comfortable with Scala programming.

Let's begin with a simple class called PairOfShoes. This class overrides the hashCode and equals functions which is common for objects. Pay attention to the variables in the constructor and the mix function which can mix-up the pair.
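
The class listing itself is not reproduced here, so the following is a minimal sketch of what such a class might look like, not the original code. It assumes two mutable constructor variables named left and right, and a hashCode of left + 31 * right, chosen only because it reproduces the hash values 320 and 321 quoted further down.

```scala
// A sketch of the class under the assumptions stated above.
class PairOfShoes(var left: Int, var right: Int) {

  // Mix up the pair by replacing both sizes in place (this mutates the object).
  def mix(newLeft: Int, newRight: Int): Unit = {
    left = newLeft
    right = newRight
  }

  // Allows subclasses to opt out of being equal to a plain PairOfShoes.
  def canEqual(other: Any): Boolean = other.isInstanceOf[PairOfShoes]

  // Two pairs are equal if they are the same object or have the same sizes.
  override def equals(other: Any): Boolean = other match {
    case that: PairOfShoes =>
      (this eq that) ||
        (that.canEqual(this) && left == that.left && right == that.right)
    case _ => false
  }

  // Assumed formula: depends on the mutable fields, so it changes after mix.
  override def hashCode: Int = left + 31 * right
}
```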

The 'eq' is a member of AnyRef which is the parent to all (reference) classes, if you were wondering.

Next we will instantiate shoes of different sizes and put these in a HashMap.
  import scala.collection.mutable.HashMap
  val p1 = new PairOfShoes(10, 10)
  val p2 = new PairOfShoes(7, 7)
  val map = new HashMap[PairOfShoes, String]()
  map += (p1 -> "BigPair")
  map += (p2 -> "SmallPair")
Pretty boring so far, right? But here comes the fun part. Imagine that you are leaving a house party and looking for your shoes, which are size 10. You can be sure that the pair of size 7 is not yours, but by mistake you grab one size 10 shoe and one size 11 without noticing. In code, this means checking that the SmallPair does not equal the BigPair and then mixing up a pair (two, actually), leaving you with odd sizes.
  p1 == p2   // -> false  
  p1.mix(11, 10)  
You now have a different pair of shoes, and it's still a big size, but if you look up p1 in the HashMap an exception is thrown because the pair has been mutated.
  map(p1)   // -> throws NoSuchElementException  
This happens because the state of p1 has mutated and the hash has changed. To show this, Scala provides a brief syntax to evaluate the hash (the double pound):
  p1.## // -> 320 (before the mix-up)
  p1.mix(11, 10)
  p1.## // -> 321 (after the mix-up)
Let's create a third pair, also of size 10, use it with the HashMap and see what happens:
 val p3 = new PairOfShoes(10, 10)  
 map(p3)  // -> throws NoSuchElementException  
Even though the hashCode of p3 is the same as that of p1 before the mix-up, the map still throws a NoSuchElementException. The strictEquals function gives false since p1 has changed. Now try to set p3 like this instead:
 val p3 = new PairOfShoes(11, 10)  
The canEqual and equals methods do not help here, because the HashMap first uses the hash code to decide where to look. The entry for p1 was stored while its hash was 320, whereas a pair with sizes 11 and 10 hashes to 321, so the lookup searches in the wrong place and does not find the stored entry. Finally, remove the mix(11, 10) and set the sizes of p1 and p3 equal to each other, and the map finds a value for p3. This is confusing, and the point I'm getting to is to use immutable values where possible.

I hope you enjoyed this short example and that it gives you something to think about. I recommend and personally believe in two rules for handling equality in Scala, both of which the example didn't follow.
  1. If two objects are equal, they should have the same hashCode
  2. A hashCode computed for an object should not change for the life of the object.
To avoid this perhaps unwanted behavior, the constructor's arguments should be made immutable and the class modified to cater for this.
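
One way to do that, sketched under the assumption that a case class is acceptable for this design, is shown below. The names ImmutablePairOfShoes and ImmutableDemo are made up for the illustration. A case class has immutable fields by default and gets consistent equals and hashCode implementations generated from them, so "mixing" has to return a new pair instead of changing the old one.

```scala
import scala.collection.mutable.HashMap

// Hypothetical immutable variant: the fields are vals, so the hash of an
// instance cannot change during its lifetime.
final case class ImmutablePairOfShoes(left: Int, right: Int) {
  // Returns a new pair and leaves the original untouched.
  def mix(newLeft: Int, newRight: Int): ImmutablePairOfShoes =
    ImmutablePairOfShoes(newLeft, newRight)
}

object ImmutableDemo {
  val p1 = ImmutablePairOfShoes(10, 10)
  val map = HashMap(p1 -> "BigPair", ImmutablePairOfShoes(7, 7) -> "SmallPair")

  val mixed = p1.mix(11, 10)                             // p1 itself is unchanged
  val original = map.get(p1)                             // Some("BigPair")
  val lookalike = map.get(ImmutablePairOfShoes(10, 10))  // Some("BigPair")
  val mixedLookup = map.get(mixed)                       // None, never added
}
```

The key that was put into the map can always be found again, and a separately created pair with the same sizes finds the same value, which is the behavior most people expect from a map.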

If you are interested in a more detailed text on equality, I recommend this article:
http://www.artima.com/pins1ed/object-equality.html