Monday, May 5, 2014

Hacking with Electric Imp

I've spent most of a sunny San Francisco weekend at an indoor hackathon. It was a lot of fun and I got the chance to work with some very talented software and hardware engineers. Our team, which included engineers who during the week work for companies such as Twilio, Google, and Salesforce, tackled the challenge of building a project devoted to doing societal good. One team member from the hardware company Electric Imp brought a tiny connected device that, attached to a water pipe, can measure flow down to the milliliter, and we based our project on this cool little thing. The goal was to build a website that encourages households to donate money or excess water to communities in need. This was a weekend of play and we used a bunch of technology to put it together. Basically, the (water) flow went like this:

  1. A hardware meter sends pulses for water passing through a pipe to the Electric Imp network
  2. The readings are forwarded as JSON to Firebase
  3. They are forwarded further to Keen.io, where the data is aggregated and queried to present graphs on a website
  4. As an alternative, we used Highcharts for the charts
  5. A website shows monthly, daily, and per-minute water consumption
The project we presented is here: https://github.com/tombrew/flowbright
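Step 2 of the flow, forwarding a reading as JSON, boils down to serializing a small document and sending it over HTTP. A rough sketch in Scala; the field names and the URL shape are my assumptions, not the project's actual code:

```scala
object PulseReport {
  // Serialize one meter reading as a JSON document (field names are hypothetical)
  def toJson(device: String, pulses: Int, timestamp: Long): String =
    s"""{"device":"$device","pulses":$pulses,"timestamp":$timestamp}"""
}
```

The Imp agent would then PUT such a document to a Firebase REST endpoint (in the style of https://app.firebaseio.com/flow.json), from where it is forwarded on to Keen.io.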

We won the prize for scrappiest project since we used an array of APIs to get a simple prototype working. (We even had an SMS feature to receive water stats via the Twilio API.)

It was a lot of fun and I'd like to thank my team members for a creative and fun weekend.


Tuesday, April 22, 2014

Jython and Scala

It is possible to run Python libraries and scripts in a JVM application with Jython. Jython is an implementation of Python in Java, and in this post we'll look at how to use Python from Scala code.

Download Jython from here. I grabbed version 2.7 (the 2.7b1 beta).

In this example I'm going to create a simple REST server in Python and start it from a Scala application. The Python code:

 from flask import Flask
 app = Flask(__name__)

 @app.route('/')
 def hello():
   return 'Hello World!'

 @app.errorhandler(404)
 def page_not_found(e):
   return 'Sorry, nothing at this URL.', 404

 @app.errorhandler(500)
 def internal_error(e):
   return 'Sorry, unexpected error: {}'.format(e), 500

 if __name__ == "__main__":
   app.run()

Before you run this, make sure you install Flask:

$ pip install flask

Try it:

$ python restserver.py

You should see this output in your terminal:
* Running on http://127.0.0.1:5000/

Point your browser to this URL to confirm client access.

Now we are going to start the server from Scala using Jython. There are several ways to use this Python code from Scala (for example, the Python code can be embedded directly in the Scala file), but since we have already saved a .py file we'll load it as a stream.
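For completeness, the embedded variant looks roughly like this; a minimal sketch that only needs jython.jar on the classpath:

```scala
import org.python.util.PythonInterpreter

object InlineExample extends App {
  val interp = new PythonInterpreter()
  // Python source embedded directly in the Scala source file
  interp.exec("x = 6 * 7")
  // Values bound in the interpreter can be read back as PyObjects
  println(interp.get("x"))  // prints 42
}
```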

We set up a new sbt project as usual with the following folders/files:

build.sbt
src/main/python/restserver.py
src/main/scala/Program.scala
lib/jython.jar

The jython.jar is copied to the lib folder so sbt knows where to find it. Program.scala looks like this:

 import org.python.util.PythonInterpreter
 import org.python.core.{Py, PyString}

 object Program extends App {

   val sys = Py.getSystemState()
   sys.path.append(new PyString("jython2.7b1/Lib/"))
   sys.path.append(new PyString("/usr/local/lib/python2.7/dist-packages/"))

   val interp = new PythonInterpreter()
   val scriptFile = new java.io.FileInputStream("src/main/python/restserver.py")
   interp.execfile(scriptFile)

   println("press enter to exit...")
   readLine()
 }

You may need to update the paths to the Lib folder of your Jython installation and to the location where your Python modules are installed.

It's well known that Jython doesn't support C extensions, so for this to work you may need to install or upgrade MarkupSafe so that its plain-Python fallback is used. To list installed modules:

$ pip freeze

Look for a line with MarkupSafe. If it's not found, install it; otherwise upgrade:

$ pip install --upgrade MarkupSafe

You should see a confirmation that a plain-Python version is installed, i.e.:

==================================================
    WARNING: The C extension could not be compiled, speedups are not enabled.
    Plain-Python installation succeeded.
==================================================

Compile and run your Scala project with sbt. You can now visit the same URL.

To sum up, Jython takes the Python programming language and enables it to run on the Java platform, which allows fairly seamless integration of Python and Java based libraries and applications. I've shown a sample of a Python REST server started by a Scala application.

Saturday, March 22, 2014

Scala lifted and direct embedding and large column count

In the latest version of Slick (2.0), there are two separate APIs named lifted embedding and direct embedding. Lifted embedding is the more stable of the two and the one advised for production use, but we will look at both of them here. Lifted embedding is explained in the Slick manual as:

The name Lifted Embedding refers to the fact that you are not working with standard Scala types but with types that are lifted into the scala.slick.lifted.Rep type constructor.

Direct embedding exists as an alternative to lifted embedding, but it's clear that this is an experiment in Slick that will be further developed in coming versions. A database table is declared like this, with the lifted syntax first and then the direct embedding syntax.
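The declarations follow the pattern in the Slick 2.0 documentation; a sketch, with table and column names matching the query examples further down (exact imports vary with your driver):

```scala
// Lifted embedding: a table class plus a TableQuery value
import scala.slick.driver.H2Driver.simple._

class Coffees(tag: Tag) extends Table[(String, Double)](tag, "COFFEES") {
  def name  = column[String]("NAME")
  def price = column[Double]("PRICE")
  def * = (name, price)
}
val coffees = TableQuery[Coffees]

// Direct embedding: an annotated plain case class
import scala.slick.direct.AnnotationMapper._

@table(name = "COFFEES")
case class Coffee(
  @column(name = "NAME")  name: String,
  @column(name = "PRICE") price: Double)
```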

Both are self-explanatory: a table with the columns NAME and PRICE is declared. In the case of lifted embedding, all queries are lifted by an implicit conversion to a Rep type, in this case Rep[String] and Rep[Double]. For example, a query like this

val q = coffees.filter(_.price > 8.0).map(_.name)

lifts the price, 8.0, and name to Rep[..]. This is not of the highest importance and is often transparent when working with Slick. It can however give some confusing compiler errors if you are not aware of it. Queries in direct mode are written a bit differently, so let's take a look. There are two factory objects to execute queries against, Queryable and ImplicitQueryable. They both support a few familiar collection methods, but many are missing; the ones supported are drop, filter, flatMap, length, map and take. To use either one a SlickBackend must be created, but with ImplicitQueryable the backend and session objects only need to be assigned to a query once.

  db withDynSession {

    import scala.slick.direct.{SlickBackend, AnnotationMapper}
    val backend = new SlickBackend(scala.slick.driver.H2Driver, AnnotationMapper)
   
    val q1 = Queryable[Coffee]
    val q2 = q1.filter(_.price > 3.0).map(_.name)
    println(backend.result(q2.length, session) + ": " + backend.result(q2, session))

    val iq1 = ImplicitQueryable(Queryable[Coffee], backend, session)
    val iq2 = iq1.filter(_.price > 3.0)
    println(iq2.length + ": " + iq2.map(_.name).toSeq)

  }

If you plan to use direct embedding, access to the Scala compiler is required at run time, so another dependency must be declared, for example like this in sbt:

libraryDependencies <+= (scalaVersion)("org.scala-lang" % "scala-compiler" % _)

Direct embedding has some limitations in the 2.0 release. For example, type-safe database inserts are not supported, something that is easily done with lifted embedding. Furthermore, direct embedding only supports the primitive types String, Int, and Double in its column mapping.

I've used lifted embedding so far and don't yet see a reason to use direct embedding. There are some obvious limitations to its usage, but it will be interesting to follow how this alternative API develops in future versions of Slick.

Another well known limitation of Slick is that when using the code generator in a database-first scenario, a different implementation is used for tables with more than 22 columns. I first considered putting this in a separate post, but since I'm on the topic of Slick I might as well continue. The reason for the 22-column threshold is that Scala only supports tuples of up to 22 elements. Let's start by using the code generator on a table with 23 columns and see what happens. I'm still using an H2 database and have prepared a table with 23 columns of type Int.

  val slickDriver = "scala.slick.driver.H2Driver"
  val jdbcDriver = "org.h2.Driver"
  val url = "jdbc:h2:test"
  val outputFolder = "."
  val pkg = ""
  scala.slick.model.codegen.SourceCodeGenerator.main(
    Array(slickDriver, jdbcDriver, url, outputFolder, pkg))

This will create a file Tables.scala. The interesting lines are these:

  implicit def GetResultTable23Row(implicit e0: GR[Int]): GR[Table23Row] = GR{
    prs => import prs._
    Table23Row.tupled((<<[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int], <<?[Int]))
  }
    def * = (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c22, c23) <> (Table23Row.tupled, Table23Row.unapply)


The documentation also has an explicit warning about long compilation times for tables with more than 25 columns, so let's do a quick experiment with tables of 2, 23, and 200 columns and measure the time. I'll start with code generation to see if there are performance effects. This code snippet is used to time the code generations.
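A minimal timing harness for this kind of measurement could look as follows (my sketch, not the original snippet; it averages over a number of runs):

```scala
object Timing {
  // Run body `runs` times and return the average wall-clock time in seconds
  def averageSeconds(runs: Int)(body: => Unit): Double = {
    val start = System.nanoTime()
    (1 to runs).foreach(_ => body)
    (System.nanoTime() - start) / 1e9 / runs
  }
}
```

Wrapping the generator call above in `Timing.averageSeconds(10) { ... }` then yields the numbers below.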



The average times on my test computer are (in seconds) 0.0871, 0.0623 and 0.1222. The largest table runs roughly twice as long, but this is expected since the generated file is larger. The respective file sizes are 2, 5 and 27 KB. Considering that the file size grows by more than a factor of 10 (2 to 27 KB) while the execution time only doubles, the conclusion is that large tables do not severely impact code generation.
As a next step, let's look at compilation time. I create three new projects, called project2, project23 and project200, and add the generated code file for the respective table size.

The project with a code file containing a table with 2 columns compiled in 2 seconds; the project with a single file containing a table of 23 columns compiled in 18 seconds. The project with the large 200-column table did not compile at all, and I have yet to find a workaround for this.

Summary

The new Slick framework from Typesafe has two separate APIs for database queries, lifted embedding and direct embedding. Since the features of direct embedding are still limited, I don't yet see any reason to pick direct embedding over lifted embedding. The authors of Slick warn about slow compile times for larger table dimensions. In my test, a project containing a small table declaration compiled considerably faster than a project containing a medium-sized table declaration, and my test with a large table (200 columns) caused the compiler to fail; the project didn't even load in an IDE like Eclipse. Slick is an interesting tool in the ORM/FRM (Functional Relational Mapper) landscape, but it has limitations that are important to be aware of.

Monday, March 17, 2014

A view of the Mastercoin system

This is an attempt to get a bird's-eye view of the Mastercoin system. By collecting all addresses and, for each address, building up a list of all outgoing transactions, I have been able to produce a few graphs which I will show here. To render these graphs the Excel template NodeXL was used, and I will post a link to the Excel file at the end if anyone wants to play with this further. In March 2014 there were roughly 2700 unique Mastercoin addresses, out of which 2288 had a positive balance in MSC. In all graphs a vertex (circle) is an address, and an edge (arrow) is a transaction from one vertex to another.


The first graph shows a raw overview containing all collected data (click for a larger image). As you can see the graph is directed in the direction of sent payments.


As expected there are many outlier vertices with only one edge pointing to them, i.e. addresses that have only received payments from a single address. There are also vertices which appear to be more central, connected by a large number of edges. If you look closely you will see a few arrows that are perfect circles; these are addresses sending a payment to themselves.

The vertices connected by many edges are obviously of some significance, so I'll focus on them next. The next four graphs are sub-graphs of the four vertices with the highest number of outgoing edges, i.e. addresses which have sent many transactions.




They look very similar and resemble a hub-and-spoke pattern where one vertex in the center connects to many, while the connected vertices themselves connect to few or none. Edges between vertices in the spokes, where they exist, are included; examples are seen primarily in the two graphs on the left.

Next I'll create four sub-graphs for the four vertices with the highest amount sent in MSC. Out of these four vertices only one is included in the four above. (What is special about this address?)





Next, I'll start with the full graph from the beginning but filter it by outgoing degree, i.e. sent transactions, and raise this number gradually. The size of each vertex is also scaled so that a higher number of transactions shows as a larger circle, and the result is presented in a grid with equal spacing between vertices.

Outgoing degree 1 and higher:


Outgoing degree 4 and higher:


Outgoing degree 10 and higher:


Outgoing degree 20 and higher:




To finish, I'll take a look at some graph-theory metrics:

Vertices*: 2288
Total edges: 3613 (number of payments sent)
Unique edges: 2530 (payments that only occurred once)
Self-loops: 4 (payments with the same sender and receiver)
Connected components: 5 (islands with no connection via payments)
Vertices in largest component: 2279
Diameter: 12 (largest shortest path between vertices, i.e. how far money has travelled)
Max out-degree: 245 (payments sent from one address)
Max in-degree: 150 (payments received by one address)
Max (eigenvector) centrality: 0.034 (payments between an address with a large number of payments and another with a large number of payments)
Avg (closeness) centrality: 0.003 (distance from one address to all other addresses; if equal to 1, every address would have made one payment to every other address)
Avg clustering: 0.017 (payments between receivers of a payment)
Reciprocal payments: 232 (receiver sent a payment back to the sender)

* as mentioned in the first paragraph
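Several of these counts fall straight out of a plain edge list; a small Scala sketch of the simplest ones (a hypothetical helper of mine, not part of the NodeXL workbook):

```scala
object GraphMetrics {
  type Edge = (String, String) // (sender address, receiver address)

  // Outgoing edges per vertex; its maximum value is the max out-degree
  def outDegrees(edges: Seq[Edge]): Map[String, Int] =
    edges.groupBy(_._1).map { case (v, es) => v -> es.size }

  // Payments where sender and receiver are the same address
  def selfLoops(edges: Seq[Edge]): Int =
    edges.count { case (from, to) => from == to }

  // Directed edges whose reverse edge also exists (both directions counted)
  def reciprocal(edges: Seq[Edge]): Int = {
    val set = edges.toSet
    set.count { case (a, b) => a != b && set((b, a)) }
  }
}
```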


Thursday, March 13, 2014

A very simple example of spray-client usage

This post is about Scala and how to send REST/HTTP requests with the spray-client API. I wrote it because the official tutorial here leaves out some details, in my opinion. I'll start with the dependencies required to compile an example program similar to the tutorial on the spray page. If you have worked with Scala you've probably also used sbt (Simple Build Tool), which is a great tool to manage dependencies. If configured correctly it does a lot of things for you, but I'll show an alternative, manual way to include the libraries needed for this example program.1

The first set of dependencies follows; the links take you to download pages on mvnrepository.com.

akka-actor 2.2.x
spray-can
spray-http
spray-httpx
spray-util

In addition to the above a few more jars are needed

parboiled-core
parboiled-scala
spray-client
spray-io
typesafe-config (part of Scala distribution, i.e. if you have Scala installed you already have this jar)


With these jars on the build path we can begin with the example code. The code declares a trait (or interface) for a web client with a get method, a class implementing the trait, and an application object. The application object starts the Akka system, which spray is built on, creates the client and sends an HTTP GET request to a URI. At some time in the future the reply is received. This is everything needed, and it shows the conciseness of Scala and spray in quite an eloquent manner.
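A sketch of such a program follows. The class and object names are my own, and the pipelining style is from spray 1.2, so details may differ slightly in other versions:

```scala
import scala.concurrent.Future
import akka.actor.ActorSystem
import spray.client.pipelining._
import spray.http.HttpResponse

// The trait (interface) of the web client
trait WebClient {
  def get(uri: String): Future[String]
}

// Implementation on top of spray-client's sendReceive pipeline
class SprayWebClient(implicit system: ActorSystem) extends WebClient {
  import system.dispatcher // execution context for the futures

  private val pipeline = sendReceive

  def get(uri: String): Future[String] =
    pipeline(Get(uri)).map((r: HttpResponse) => r.entity.asString)
}

// The application object: start Akka, send the request, print the reply
object Main extends App {
  implicit val system = ActorSystem("spray-client-example")
  import system.dispatcher

  new SprayWebClient().get("http://spray.io").onComplete { reply =>
    println(reply)
    system.shutdown()
  }
}
```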

There is of course much more to spray, and if futures, akka and actors are not familiar to you, a very smart next move is to study them before continuing with spray. An introduction to Scala futures and Akka should not take much time and will make you a lot more productive with spray.



1. When writing this I was on a network where sbt dependency resolution did not work, so manually downloading jar-files was a necessity for me.


Thursday, February 27, 2014

A new project, localize an open source application and get help from the crowd

I've volunteered to localize a desktop .NET application. By localize I mean to extend the application in such a way that it is functional in other languages, cultures, and geographic locations. On the surface this applies to the presentation layer, the visual interface that meets the end user, but I discovered that there are other details I need to address in this endeavor. The application I selected for my project was developed in English, and my goal is to modify it in such a way that users preferring other languages, potentially languages that do not use the Roman alphabet, find the same functionality in the application as the English-speaking users. I'm writing this text as I research the topic, to help me formulate a strategy, measures of success and an execution plan. Hopefully you will find something valuable in this text. I learned a lot from doing this.

Let's start by defining a measure of success. The application, as I start, is fully functional and written for English users. I'm not the developer of the application, but I have obtained the source code from a GitHub repository, meaning it is open source, and I do not have any personal relation to the original author. To consider my task successful I will need to accomplish a few things.

  1. The original author shall approve my code changes and merge my version of the application with the official version on GitHub. In GitHub this is normally done with a pull request.
  2. The default language shall still be English but a compiled version shall allow the users to swap to “any” other language (for which a translation exists) and use it with the same functions as before.
  3. I can obviously only translate to languages that I speak and therefore I will need to find help for other languages. For this I will use crowdsourcing and a service called Transifex. The idea here is very simple: I will upload a file with English words and anyone can suggest and translate to a language of their choosing. Once the translation is complete I can download the newly translated words and add them to the application.
  4. Since the previous point is dependent on people I don’t know, I will still consider the project a success if no one volunteers to translate. As a proof of concept I will translate on Transifex to one other language, and thus make the application bi-lingual. The application code shall however be extended in such a way that another language can be added once a translation is available and there shall be documentation for how to do so. 
  5. The name of the application and where to find the source code, the Transifex repository and application executable shall be announced here and on other channels.

Next I define a strategy and break down what changes are required to the application code to accomplish this. The application is a .NET 4.5 application and it's written in Visual Basic. Although Visual Basic is not my strongest programming language, I feel confident that I can solve this problem with my previous experience in .NET. I intend to use as much of the framework as possible to support this, so a few things mentioned here may be specific to .NET, but the general concepts I believe are applicable to any other software application.

  • The first change I need to make is to map and externalize all English strings of the application. The framework has concepts of Resource files, Satellite Assemblies, CultureInfo, and Resource Management so I will rely on the framework and not reinvent things already existing. 
  • Words in different languages have different numbers of characters, which may cause string expansion or contraction. This means that an element of the application interface may grow or shrink in size as the locale is changed. The most straightforward solution is to increase the placeholder, the control, to fit the longest string.
  • Text in images can’t be translated so any text inside an image needs to be refactored to a string variable if it shall be translated.
  • String concatenation of strings presented in the UI is a bad idea and should be avoided. Some languages use a reversed word order.
  • All numbers shall be localized, because the comma “,” can be used as the decimal separator instead of the dot “.”. Furthermore, some locales allow a comma to be used to separate thousands.
  • Sorting collections shall be dependent on the locale as the sorting order may be different. Any sort shall rely on the sort function of the framework for the specific locale.
  • If the application has phone numbers, postal codes or any locale specific entity they should be formatted and presented according to accepted conventions.
  • Fixed or hardcoded paths to Windows folders must be avoided. For example, the folder behind “C:/Program Files” may have a different name on an OS installed in a different language.
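The number-localization bullet is easy to demonstrate. The .NET equivalent goes through CultureInfo, but the same behaviour on the JVM, in Scala for illustration, looks like this:

```scala
import java.text.NumberFormat
import java.util.Locale

object Decimals {
  // Render a number using the locale's decimal and grouping separators
  def format(n: Double, locale: Locale): String =
    NumberFormat.getNumberInstance(locale).format(n)
}

// The same number renders differently per locale:
// Decimals.format(1234.5, Locale.US)      => "1,234.5"
// Decimals.format(1234.5, Locale.GERMANY) => "1.234,5"
```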

The above bullet points are not everything that may be considered, and some may not even apply for certain types of applications. It’s important to understand this and be dynamic when approaching the upcoming work. I started on this list before I examined the code and will remove or add things if I discover other aspects of the localization at a later stage. It’s important to test and iterate this. One could even formulate test cases with the bullets and the application use cases in mind. Due to the size of the project I chose not to do this formally.

I expect the number of languages and cultures to grow over time as more translations are completed on Transifex. For a new language to be supported by the application, a resource file needs to be added to GitHub and a satellite assembly included with the application release. I'm still investigating the most straightforward way, other than manually repeating the steps, to accomplish this.




Friday, February 14, 2014

A word on mutability in Scala

If you're familiar with Scala you've probably been reminded a few times to use immutable objects instead of mutable ones. There are many reasons, and the one most often argued is that immutable state makes concurrency easier. This is true, but it's not the only reason, and immutable objects should not always be used for that matter either. Some designs are better suited to mutable state, even in Scala.

To entertain myself and you readers, I would like to show an example where mutability behaves perhaps unexpectedly, all with the purpose of getting more comfortable with Scala programming.

Let's begin with a simple class called PairOfShoes. This class overrides the hashCode and equals functions, which is common for objects. Pay attention to the variables in the constructor and the mix function, which can mix up the pair.
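A sketch of such a class; this is my reconstruction, with the hash formula chosen to match the values printed later in the post:

```scala
class PairOfShoes(var left: Int, var right: Int) {
  // Mutates the pair in place -- this is what upsets the HashMap below
  def mix(l: Int, r: Int): Unit = { left = l; right = r }

  def canEqual(other: Any): Boolean = other.isInstanceOf[PairOfShoes]

  override def hashCode: Int = left + 31 * right

  override def equals(other: Any): Boolean = other match {
    case that: PairOfShoes =>
      (that eq this) ||
        ((that canEqual this) && that.left == left && that.right == right)
    case _ => false
  }
}
```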

The 'eq' method is a member of AnyRef, which is the parent of all (reference) classes, if you were wondering.

Next we will instantiate shoes of different sizes and put these in a HashMap.
  import scala.collection.mutable.HashMap

  val p1 = new PairOfShoes(10, 10)
  val p2 = new PairOfShoes(7, 7)
  val map = new HashMap[PairOfShoes, String]()
  map += (p1 -> "BigPair")
  map += (p2 -> "SmallPair")
Pretty boring so far, right? But here comes the fun part. Imagine that you are leaving a house party and you are looking for your shoes, which are size 10. You can be sure that the pair of size 7 is not yours, but by mistake you grab one shoe of size 10 and the other of size 11 without noticing. In code this translates to: you are sure the SmallPair does not equal the BigPair, and you mix up a pair (two actually), leaving you with odd sizes.
  p1 == p2   // -> false  
  p1.mix(11, 10)  
You'll have a different pair of shoes, and it's still a big size, but if you query the HashMap an exception will be thrown since the pair has mutated.
  map(p1)   // -> throws NoSuchElementException  
This happens because the state of p1 has mutated and the hash has changed. To show this, Scala provides a brief syntax to evaluate the hash (the double pound):
  p1.##           // -> 320
  p1.mix(11, 10)
  p1.##           // -> 321
Let's create a third pair, also of size 10, use it with the HashMap, and see what happens:
 val p3 = new PairOfShoes(10, 10)  
 map(p3)  // -> throws NoSuchElementException  
Even though the hashCode of p3 is the same as p1's before the mix-up, the map still gives a NoSuchElementException, because the equality check against the mutated p1 fails. Now try to set p3 like this instead:
 val p3 = new PairOfShoes(11, 10)  
The canEqual and equals methods do not help either: the map stored p1 in the bucket for its original hashCode, so a lookup with p3, which hashes to the new value, searches the wrong bucket. Finally, remove the mix(11, 10) call and set the sizes of p1 and p3 equal to each other, and the map finds a value for p3. This is confusing, and the point I'm getting at is to use immutable values where possible.

I hope you enjoyed this short example and that it gives you something to think about. I recommend and personally believe in two rules for handling equality in Scala, both of which the example didn't follow.
  1. If two objects are equal, they should have the same hashCode
  2. A hashCode computed for an object should not change for the life of the object.
To avoid this perhaps unwanted behavior, the constructor arguments should be made immutable and the class modified to cater for this.

If you are interested in a more detailed text of equality I recommend this article:
http://www.artima.com/pins1ed/object-equality.html

Wednesday, January 29, 2014

Super Bowl graph

I have not added anything to this blog for a while; work and studies at Coursera have consumed my attention. For a few weeks now I've been taking part in Social And Economic Networks: Models and Analysis at Stanford University, hosted by Coursera. I'm an amateur on this topic but I find it very interesting.

The course primarily uses two software packages for working with networks, Pajek and Gephi. There are other options, and the course forum has seen discussions about which software is better, whether the two aforementioned are good enough for the real world or are only used in academia, and so forth. I'll settle for linking to others such as NetworkX and UCINET, hoping that I one day will be able to use all of them, but for now I'll focus on Pajek and Gephi to get through the course and perhaps claim to be more than a novice in networks. Let's get back to the topic of the post: the NFL Super Bowl is this weekend (February 2nd), and even though I'm not an avid follower of American football it seemed like a fun exercise to visualize the historical Super Bowl matches in a graph. Gephi is useful for manually drawing nodes and edges, and this is exactly what I did. Many more things can be done with Gephi, but Pajek, which does not have this drawing feature, seems to be more geared towards heavy analysis.



The graph has many teams with only one link, meaning the team has only faced one opponent in the Super Bowl. This is an unweighted graph, so it doesn't reveal whether the teams that are connected by a link have played each other more than once. It's also clear that there are a few central teams with many links (high degree) which have faced many opponents. This graph is connected (meaning any node is connected to all other nodes through links), but it doesn't have to be. Can you think of how?

Two teams have six unique opponents, the New England Patriots and the Denver Broncos, but there are many more with only one link, as expected. The degree, which I mentioned already, is central to network analysis and is used in perhaps the simplest centrality measure, degree centrality. Other centrality measures exist, such as closeness centrality, betweenness centrality and eigenvector centrality. Even if these lack any useful meaning in the case of Super Bowl matches, they are commonly used in other graphs and need to be well understood.

This was a trivial analysis, and the graph may not be necessary to find out which team has historically faced the most opponents, but it was meant as an introduction to graphs and the Gephi and Pajek software. The files are available for download if anyone wants to extend the data set; for example, making it a weighted graph for teams that have played each other more than once, or making it directed to embed the winner of each match.

https://www.dropbox.com/s/jseiwvdclp7qe9j/super%20bowl%20graph.gephi
https://www.dropbox.com/s/1jl161340renfkv/super%20bowl%20graph.net