DigitallyCreated

Blog

Querying Azure Table Storage Data using FSharp.Azure

May 11, 2014 12:06 PM by Daniel Chambers (last modified on May 18, 2014 2:40 PM)

In my last post, I showed how to use FSharp.Azure to modify data in Azure table storage. FSharp.Azure is a library I recently released that allows F# developers to write idiomatic F# code to talk to Azure table storage. In this post, we’ll look at the opposite of data modification: data querying.

Getting Started

To use FSharp.Azure, install the NuGet package: FSharp.Azure. ~~At the time of writing the package is marked as beta, so you will need to include pre-releases by using the checkbox on the UI, or using the –Pre flag on the console.~~ (v1.0.0 has been released!)

Once you’ve installed the package, you need to open the TableStorage module to use the table storage functions:

open DigitallyCreated.FSharp.Azure.TableStorage

Compatible Types

To provide an idiomatic F# experience when querying table storage, FSharp.Azure supports the use of record types when querying. For example, the following record type would be used to read a table with columns that match the field names:

type Game = 
    { Name : string
      Developer : string
      HasMultiplayer : bool
      Notes : string }

We will use this record type in the examples below. We will also assume, for the sake of these examples, that the Developer field is also used as the PartitionKey and the Name field is used as the RowKey.

FSharp.Azure also supports querying class types that implement the Microsoft.WindowsAzure.Storage.Table.ITableEntity interface.

Setting up

The easiest way to use the FSharp.Azure API is to define a quick helper function that allows you to query for rows from a particular table:

open Microsoft.WindowsAzure.Storage
open Microsoft.WindowsAzure.Storage.Table

let account = CloudStorageAccount.Parse "UseDevelopmentStorage=true;" //Or your connection string here
let tableClient = account.CreateCloudTableClient()

let fromGameTable q = fromTable tableClient "Games" q

The fromGameTable function fixes the tableClient and table name parameters of the fromTable function, so you don't have to keep passing them. This technique is very common when using the FSharp.Azure API.

Getting everything

Here's how we'd query for all rows in the "Games" table:

let games = Query.all<Game> |> fromGameTable

games above is of type seq<Game * EntityMetadata>. The EntityMetadata type contains the Etag and Timestamp of each Game. Here's how you might work with that:

let gameRecords = games |> Seq.map fst
let etags = games |> Seq.map (fun game, metadata -> metadata.Etag)

The etags in particular are useful when updating those records in table storage, because they allow you to utilise Azure Table Storage's optimistic concurrency protection to ensure nothing else has changed the record since you queried for it.

Filtering with where

The Query.where function allows you to use an F# quotation of a lambda to specify what conditions you want to filter by. The lambda you specify must be of type:

'T -> SystemProperties -> bool

The SystemProperties type allows you to construct filters against system properties such as the Partition Key and Row Key, which are the only two properties that are indexed by Table Storage, and therefore the ones over which you will most likely be performing filtering.

For example, this is how we'd get an individual record by PartitionKey and RowKey:

let halo4, metadata = 
    Query.all<Game>
    |> Query.where <@ fun g s -> s.PartitionKey = "343 Industries" && s.RowKey = "Halo 4" @>
    |> fromGameTable
    |> Seq.head

You can, however, query over properties on your record type too. Be aware that queries over those properties are not indexed by Table Storage and as such will suffer performance penalties.

For example, if we wanted to find all multiplayer games made by Valve, we could write:

let multiplayerValveGames = 
    Query.all<Game>
    |> Query.where <@ fun g s -> s.PartitionKey = "Valve" && g.HasMultiplayer @>
    |> fromGameTable

The following operators/functions are supported for use inside the where lambda:

The =, <>, <, <=, >, >= operators
The not function

Taking only the first n rows

Table storage allows you to limit the query results to be only the first 'n' results it finds. Naturally, FSharp.Azure supports this.

Here's an example query that limits the results to the first 5 multiplayer games made by Valve:

let multiplayerValveGames = 
    Query.all<Game>
    |> Query.where <@ fun g s -> s.PartitionKey = "Valve" && g.HasMultiplayer @>
    |> Query.take 5
    |> fromGameTable

Query segmentation

Azure table storage may not return all the results that match the query in one go. Instead it may split the results over multiple segments, each of which must be queried for separately and sequentially. According to MSDN, table storage will start segmenting results if:

The resultset contains more than 1000 items
The query took longer than five seconds
The query crossed a partition boundary

FSharp.Azure supports handling query segmentation manually as well as automatically. The fromTable function we used in the previous examples returns a seq that will automatically query for additional segments as you iterate.

If you want to handle segmentation manually, you can use the fromTableSegmented function instead of fromTable. First, define a helper function:

let fromGameTableSegmented c q = fromTableSegmented tableClient "Games" c q

The fromGameTableSegmented function will have the type:

TableContinuationToken option -> EntityQuery<'T> -> List<'T * EntityMetadata> * TableContinuationToken option

This means it takes an optional continuation token and the query, and returns the list of results in that segment, and optionally the continuation token used to access the next segment, if any.

Here's an example that gets the first two segments of query results:

let query = Query.all<Game>

let games1, segmentToken1 = 
    query |> fromGameTableSegmented None //None means querying for the first segment (ie. no continuation)

//We're making the assumption segmentToken1 here is not None and therefore 
//there is another segment to read. In practice, this is a very poor assumption
//to make, since segmentation is performed arbitrarily by table storage
if segmentToken1.IsNone then failwith "No segment 2!"

let games2, segmentToken2 = 
    query |> fromGameTableSegmented segmentToken1

In practice, you'd probably write a recursive function or a loop to iterate through the segments until a certain condition.

Asynchronous support

FSharp.Azure also supports asynchronous equivalents of fromTable and fromTableSegmented. To use them, you would first create your helper functions:

let fromGameTableAsync q = fromTableAsync tableClient "Games" q
let fromGameTableSegmentedAsync c q = fromTableSegmentedAsync tableClient "Games" c q

fromTableAsync automatically and asynchronously makes requests for all the segments and returns all the results in a single seq. Note that unlike fromTable, all segments are queried for during the asynchronous operation, not during sequence iteration. (This is because seq doesn't support asynchronous iteration.)

Here's an example of using fromTableAsync:

let valveGames = 
    Query.all<Game> 
    |> Query.where <@ fun g s -> s.PartitionKey = "Valve" @> 
    |> fromGameTableAsync 
    |> Async.RunSynchronously

And finally, an example using the asynchronous segmentation variant:

let asyncOp = async {
    let query = Query.all<Game>

    let! games1, segmentToken1 = 
        query |> fromGameTableSegmentedAsync None //None means querying for the first segment (ie. no continuation)

    //We're making the assumption segmentToken1 here is not None and therefore 
    //there is another segment to read. In practice, this is a very poor assumption
    //to make, since segmentation is performed arbitrarily by table storage
    if segmentToken1.IsNone then failwith "No segment 2!"

    let! games2, segmentToken2 = 
        query |> fromGameTableSegmentedAsync segmentToken1

    return games1 @ games2
}

let games = asyncOp |> Async.RunSynchronously

Conclusion

In this post, we’ve covered the nitty gritty details of querying with FSharp.Azure. Hopefully you find this series of posts and the library itself useful; if you have, please do leave a comment or tweet to me at @danielchmbrs.