Trello to BigQuery

This page provides you with instructions on how to extract data from Trello and load it into Google BigQuery. (If this manual process sounds onerous, check out Stitch, which can do all the heavy lifting for you in just a few clicks.)

What is Google BigQuery?

Google BigQuery is a data warehouse that delivers super-fast results from SQL queries, which it accomplishes using a powerful engine dubbed Dremel. With BigQuery, there's no spinning up (and down) clusters of machines as you work with your data. That's why some say BigQuery prioritizes querying over administration: it's super fast, and that's the reason most folks use it.

Getting data out of Trello

Okay, let's get your data out of Trello. Step one is extracting it from Trello's servers, which you can do using the Trello API. Full API documentation is available online.

The Trello API is quite simple in the grand scheme of APIs. It offers endpoints that can provide information on boards, lists, cards, and actions. Use the information in the docs to retrieve all of the data you need.
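As a minimal sketch of that extraction step, the helpers below build an authenticated request URL for the Trello REST API and fetch a board's recent actions. The key and token are placeholders you'd generate from your own Trello account, and the board ID is hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.trello.com/1"

def trello_url(path, key, token, **params):
    """Build an authenticated URL for a Trello API endpoint."""
    params.update({"key": key, "token": token})
    return f"{API_BASE}{path}?{urllib.parse.urlencode(params)}"

def fetch_board_actions(board_id, key, token, limit=50):
    """Fetch a board's most recent actions as parsed JSON."""
    url = trello_url(f"/boards/{board_id}/actions", key, token, limit=limit)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

The same URL-building pattern works for the boards, lists, and cards endpoints; only the path changes.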

Sample Trello data

The Trello API returns JSON-formatted data. Below is an example of the kind of response you might see when querying for the actions on a list.


[{
    "id": "4efe314cc72846af4e00008a",
    "data": {
        "list": {
            "id": "4eea4ffc91e31d174600004a",
            "name": "To Do Soon"
        },
        "board": {
            "id": "4eea4ffc91e31d1746000046",
            "name": "Example Board"
        },
        "old": {
            "name": "To Do Later"
        }
    },
    "date": "2011-12-30T21:46:52.874Z",
    "idMemberCreator": "4ee7deffe582acdec80000ac",
    "type": "updateList",
    "memberCreator": {
        "id": "4ee7deffe582acdec80000ac",
        "avatarHash": null,
        "fullName": "Joe Tester",
        "initials": "JT",
        "username": "joetester"
    }
}, {
    "id": "4efe3147c72846af4e00006d",
    "data": {
        "list": {
            "id": "4eea4ffc91e31d174600004a",
            "name": "To Do Later"
        },
        "board": {
            "id": "4eea4ffc91e31d1746000046",
            "name": "Example Board"
        },
        "old": {
            "name": "To Do Eventually"
        }
    },
    "date": "2011-12-30T21:46:47.843Z",
    "idMemberCreator": "4ee7deffe582acdec80000ac",
    "type": "updateList",
    "memberCreator": {
        "id": "4ee7deffe582acdec80000ac",
        "avatarHash": null,
        "fullName": "Joe Tester",
        "initials": "JT",
        "username": "joetester"
    }
}]

Preparing Trello data

This part can get tricky: you need to parse the JSON in the API response and map each endpoint to a corresponding table in the destination database. Make sure you have a solid handle on the datatypes each endpoint returns. The Stitch Trello Docs can give you a sense of what datatypes will come through the API.
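One way to handle that parsing, sketched in Python against the sample response above: flatten each nested action record into a row whose columns map cleanly onto a warehouse table. The column names here are illustrative choices, not a fixed schema.

```python
import json

def flatten_action(action):
    """Map one nested Trello action record to a flat row."""
    data = action.get("data", {})
    return {
        "action_id": action["id"],
        "action_type": action.get("type"),
        "action_date": action.get("date"),
        "member_id": action.get("idMemberCreator"),
        "board_id": data.get("board", {}).get("id"),
        "board_name": data.get("board", {}).get("name"),
        "list_id": data.get("list", {}).get("id"),
        "list_name": data.get("list", {}).get("name"),
    }

# A trimmed version of the sample response shown earlier.
sample = json.loads("""[{
    "id": "4efe314cc72846af4e00008a",
    "type": "updateList",
    "date": "2011-12-30T21:46:52.874Z",
    "idMemberCreator": "4ee7deffe582acdec80000ac",
    "data": {
        "list": {"id": "4eea4ffc91e31d174600004a", "name": "To Do Soon"},
        "board": {"id": "4eea4ffc91e31d1746000046", "name": "Example Board"}
    }
}]""")

rows = [flatten_action(a) for a in sample]
```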

Loading data into Google BigQuery

Google Cloud Platform offers a helpful guide for loading data into BigQuery. You can use the bq command-line tool to upload the files to your awaiting datasets, adding the correct schema and data type information along the way. The bq load command is your friend here. You can find the syntax in the bq command-line tool quickstart guide. Iterate through this process as many times as it takes to load all of your tables into BigQuery.
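BigQuery can load newline-delimited JSON, so one workable approach is to write your rows in that format and hand the file to bq load. A minimal sketch, assuming hypothetical dataset, table, and schema-file names:

```python
import json

def write_ndjson(rows, path):
    """Write rows as newline-delimited JSON, a format bq load accepts."""
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

def bq_load_command(dataset, table, path, schema):
    """Return a bq load command for the file, with an explicit schema file."""
    return (
        "bq load --source_format=NEWLINE_DELIMITED_JSON "
        f"{dataset}.{table} {path} {schema}"
    )
```

You'd run the returned command with the bq command-line tool, repeating the write-and-load cycle for each table.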

Keeping Trello data up to date

Hooray! You've written a script to move Trello data into your data warehouse. Wouldn't it be great if that were all there was to it? But consider what happens when new data is created in Trello: it needs to make its way into your data warehouse, too.

One option, depending on the design of your script, is to load the entire dataset all over again. That's just about guaranteed to be slow and painful, and delays can be costly if you've got deadlines to meet.

The best thing you can do is build your script so it can identify new and updated records, then incrementally update the destination. You can do this by keying your logic off fields such as modified_at or updated_at timestamps, or an auto-incrementing ID. After that, you need to figure out a way to keep your script running continuously, even when you're on vacation. A cron job or a continuous loop are both options here.
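A minimal sketch of that incremental logic: because Trello's action dates are ISO 8601 strings in a uniform format, comparing them lexicographically orders them chronologically, so you can filter against a saved watermark from the last successful run. Where you persist that watermark is left out here.

```python
def new_or_updated(actions, last_sync_iso):
    """Keep only actions dated strictly after the last successful sync.

    ISO 8601 timestamps in the same format compare correctly as strings.
    """
    return [a for a in actions if a.get("date", "") > last_sync_iso]
```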

Other data warehouse options

BigQuery is really great, but sometimes you need to optimize for different things when you're choosing a data warehouse. Some folks choose to go with Postgres or Redshift, which are two RDBMSes that use similar SQL syntax. If you're interested in seeing the relevant steps for loading this data into Postgres or Redshift, check out To Redshift and To Postgres.

Easier and faster alternatives

If all this sounds a bit overwhelming, don’t be alarmed. If you have all the skills necessary to go through this process, chances are building and maintaining a script like this isn’t a very high-leverage use of your time.

Thankfully, products like Stitch were built to solve this problem automatically. With just a few clicks, Stitch starts extracting your Trello data via the API, structuring it in a way that is optimized for analysis, and inserting that data into your Google BigQuery data warehouse.