Showing posts with label Fitbit. Show all posts

Saturday, 24 September 2016

FODMAPs 01 – Data collection

There’s something in my diet that ain’t sitting right. It makes me feel bloated, fatigued and just damn uncomfortable. It’s been like this for years, though it’s been tolerable. Recently I went to a dietitian/nutritionist to learn more about what I should and should not shove down my mouth.

After describing my general diet, I received advice that will sound obvious to most. I need more fruits, vegetables, fibre and water.
“How many fruits and veges am I supposed to eat?”, I asked.
“Two serves of fruit, three serves of vegetables a day.”
“Oh, so the recommendation hasn’t changed since kindergarten?”. I was really hoping that it had been scaled back to two fruits per day. Or one magic fruit pill.

I took the advice as best as I could manage (who has time to eat five serves of fruit and vegetables a day? Takes so long to chew). There were marginal improvements. I felt less bloated and fatigued, so my decisions were leading me in the right direction. Similarly, I had stopped drinking coffee back in March and noted improvements. Each dietary change added an improvement.

However, I still feel uncomfortable. Years ago I attempted to rectify my dietary issues with data. I recorded what foods I was eating and what symptoms I felt day to day with the intention to analyse my way to a remedy. I planned to “net” what foods caused upset. I never got around to the analysis.

I’m getting around to it now. I have the right tools.

The nutritionist said I should try a low FODMAP diet. FODMAPs are a group of carbohydrates that are poorly digested. After at least 6 weeks on a low FODMAP diet, I’ll gradually reintroduce different FODMAP groups and note my tolerance. I can then identify my problem foods and avoid them. But not ice cream. If ice cream is a problem food, I’ll just take lactase beforehand.

I need an app that collects my food intake. I’ve used MyFitnessPal in the past. When I Googled for instructions on exporting my data, I either couldn’t find a clear guide or found that exporting was a paid option. I can log foods with the Fitbit app, but retrieving the data is also not easy. The Fitbit R scraper I use does not retrieve food data. I would have to access my data via an API.

Instead I’ll use the Memento Database app. Memento Database allows users to customise fields for data capture then easily export the data as CSV. My “Food” library captures the foods or ingredients I consume, with the current datetime captured upon entry. I will keep the labels for foods and ingredients as short and general as possible since I’d like to group the foods for analysis.

My “Symptoms” library captures a symptom with the datetime. I used to enter detailed symptom descriptions. I want to keep it brief. I'll include feelings of "Fatigue" or feeling "Bloated". The symptoms will be placed in a single-choice list. I expect that these symptoms will decrease as I persist with the lower FODMAP diet. The symptoms will increase when I reintroduce the problem FODMAP groups. Ice cream will totally be fine. Totally.

I will combine this food and symptom data with Fitbit data, namely calories burned, weight and sleep. I’m curious to see if my weight changes with the diet (assuming little change in the calories burned day-to-day) or if my sleep improves. 
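Here is a rough sketch of how that combining step might look in R: roll the symptom log up to one row per day, then join it to the daily Fitbit data. The column names (Datetime, Symptom) and every value below are made up for illustration; the real Memento export and Fitbit data frame may be laid out differently.

```r
# Hypothetical symptom log, as it might look after read.csv() on an export.
# Column names and values are illustrative assumptions, not real data.
symptoms <- data.frame(
  Datetime = c("2016-09-24 08:15", "2016-09-24 19:40", "2016-09-25 13:05"),
  Symptom  = c("Bloated", "Fatigue", "Bloated"),
  stringsAsFactors = FALSE
)

# Hypothetical daily Fitbit data frame with one row per date.
fitbit <- data.frame(
  Date   = as.Date(c("2016-09-24", "2016-09-25", "2016-09-26")),
  Weight = c(74.0, 74.2, 73.9)
)

# Reduce each entry to its calendar date.
symptoms$Date <- as.Date(substr(symptoms$Datetime, 1, 10))

# Count each symptom per day: one row per date, one column per symptom.
counts <- as.data.frame.matrix(table(symptoms$Date, symptoms$Symptom))
symptom_cols <- colnames(counts)
counts$Date <- as.Date(rownames(counts))

# Left join onto the Fitbit dates; days with no logged symptom become zero.
merged <- merge(fitbit, counts, by = "Date", all.x = TRUE)
for (col in symptom_cols) {
  merged[[col]][is.na(merged[[col]])] <- 0
}

print(merged)
```

Keeping the symptoms as a single-choice list pays off here, because each choice becomes its own tidy count column.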

In, say, 6 weeks’ time, I’ll have data to wrangle then analyse.

Sunday, 29 May 2016

Fitbit 03 – Getting and wrangling all data

Previous post in this series: Fitbit 02 – Getting and wrangling sleep data.

This post will wrap up the getting and wrangling of Fitbit data using fitbitScraper. This is the list of data that was gathered [1]:
  • Steps
  • Distance
  • Floors
  • Very active minutes (“MinutesVery”)
  • Calories burned
  • Resting heart rate (“RestingHeart”)
  • Sleep
  • Weight.

For each dataset, the data was gathered then wrangled into a separate tidy data frame. Each data frame contained one row per unique date. Most datasets required minimal wrangling. Previous posts outlined the extra effort required to wrangle sleep data due to split sleep sessions, and the extra looping needed to gather all weight data.

Each data frame contains a Date column. The data frames are joined by the unique dates to create one big happy data frame of Fitbitness. Each row is a date containing columns of fitness factors.
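A minimal sketch of that join, assuming every data frame has a Date column of class Date. The data frame names mirror the ones used in this series, the two step counts are from the first post, and the weight and sleep values are placeholders. It uses base R's merge() inside Reduce() rather than any particular package, so it is one possible way to do it, not necessarily what FitbitWrangling.R does.

```r
# Small illustrative data frames, one row per date each.
dfSteps  <- data.frame(Date = as.Date(c("2015-10-01", "2015-10-02")),
                       Steps = c(7496, 7450))
dfWeight <- data.frame(Date = as.Date(c("2015-10-02", "2015-10-03")),
                       Weight = c(73.6, 73.5))          # placeholder values
dfSleep  <- data.frame(Date = as.Date(c("2015-10-01", "2015-10-03")),
                       SleepMinAsleep = c(410, 385))    # placeholder values

# Full outer join of all data frames on Date, two at a time.
dfAll <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE),
                list(dfSteps, dfWeight, dfSleep))

# A tidy result has exactly one row per date.
stopifnot(!any(duplicated(dfAll$Date)))

print(dfAll)
```

Dates missing from one source simply show NA in that source's columns, which is why each input needs unique dates before the join.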

Now what? I feel like a falafel. I’m going to eat a falafel [2].

With this tidy dataset I will continue the analytics journey in future posts. For now, I wish to quickly visualise the data. Writing lines of code for plots in R is not-so-quick. Thankfully there’s a point-and-click visualisation package available called ggraptR. Installing and launching the package is achieved as follows. 
devtools::install_github('cargomoose/raptR', force = TRUE) # install
library("ggraptR") # load
ggraptR() # launch

My main hypothesis was that steps/distance may correlate with weight. There was no relationship observed on a scatter plot. This is preliminary; a future post will focus on exploratory data analysis. Prior to data analysis I need to ask some driving questions.


I plotted Date vs Weight. My weight fell gradually from October 2015 through to December. I was on a week-long Sydney to Adelaide road trip during the end of December, got a parking ticket in Adelaide and did not have recorded weights whilst on the road. My weight has steadily increased since. Not a lot of exercise, quite a lot of banana Tim Tams.



After sequential pointing-and-clicking, I overlaid this time plot with another factor - the “AwakeBetweenDuration”. In the previous post I noted I wake up in the middle of the night. It may take hours before I fall asleep again. The tidy dataset holds the number of minutes awake between such sessions. The bigger the bubble, the longer I was awake between sleep sessions.



Here’s a driving question: what accounts for the nights when I am awake for long durations? I was awake some nights in October, December (some of my road trip nights – I couldn’t drive for one of those days as I was exhausted), January and then April. February and March appeared almost blissful. Why? Tell me data, why?  

The Fitbit data wrangling code, FitbitWrangling.R, is published on GitHub at https://github.com/muhsinkarim/fitbit (replace “your_email” and “your_password” with the email and password you use to log into your Fitbit account and dashboard).


References and notes
1. The fitbitScraper function get_activity_data() will return rows of activities per day including walking and running. I only have activity data from 15th February 2016. Since I’m analysing data since October 2015 (where I have weight data from my Fitbit scales) I chose not to include activity data in the tidy dataset.
2. I ate two.

Sunday, 15 May 2016

Fitbit 02 – Getting and wrangling sleep data

Previous post in this series: Fitbit 01 – Getting and wrangling weight, steps and calories burned data.

Let’s continue to get and wrangle Fitbit data. This post will tackle sleep data and address my unattractive trait of not easily letting things go.

I applied the following R package fitbitScraper function wrapped in a data frame:
dfSleep <- as.data.frame(get_sleep_data(cookie, start_date = startDate, end_date = endDate))

There’s a lot of data. Below is the list of columns I want to keep:
  • df.startDateTime – The start datetime of sleep.
  • df.endDateTime – The end datetime of sleep.
  • df.sleepDuration – The minutes between the start and end time. df.sleepDuration = df.awakeDuration + df.restlessDuration + df.minAsleep.
  • df.awakeDuration – The minutes of wakefulness.
  • df.restlessDuration – The minutes of restlessness.
  • df.minAsleep – The minutes of actual sleep. I want to maximise this data field. More sleep less grumpiness.
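The relationship between those duration columns can be checked with a few lines of R. This is only a sketch on made-up numbers; the column names follow the list above, and whether the identity holds for every row of real Fitbit output is an assumption worth testing rather than a guarantee.

```r
# Made-up rows using the column names from the list above.
sleep <- data.frame(
  df.sleepDuration    = c(452, 300),
  df.awakeDuration    = c(12, 5),
  df.restlessDuration = c(20, 15),
  df.minAsleep        = c(420, 270)
)

# Sum the three components and compare against the reported duration.
parts <- sleep$df.awakeDuration + sleep$df.restlessDuration + sleep$df.minAsleep
sleep$mismatch <- sleep$df.sleepDuration - parts

# Rows where the components do not add up deserve a closer look.
print(sleep[sleep$mismatch != 0, ])
```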

If you’re a normalish human being, you would look at the sleep data and examine it for any anomalies. Your keen eye will note that some sleep durations are split across two or more sessions. That is, you went to bed, woke up in the middle of the night, then fell asleep again. Depending on how long you were awake for, Fitbit will record separate sessions.

I am working towards a tidy dataset with each row representing a unique date with the day’s Fitbit data. Separate sleep sessions cause duplicate dates. Being a normalish human being, you write code that will group the split sleep sessions back together. One row of sleep data per date keeps the data tidy.

If you’re not a normal human being, you examine the intricate nature of these split sleep sessions and spend way too much time writing code that groups the data back together.

I am not a normal human being.

Split sessions occur for two reasons. The first occurs when I wake up in the early hours of the morning. An example is below.



On the night of the 22nd March I fell asleep at 23:37. I woke up at 3:49 on the 23rd. After hating my life and eating morning chocolate, I finally fell asleep again at 4:51. Technically the second session date should be displayed as the 23rd of March, not the 22nd. These display dates are akin to the “date I tried to fall asleep”. The datetimes record the true date and times.

When I combine these separate sessions, the new sleep start time will be 23:37 and the new sleep end time will be 07:46. The SleepDuration, SleepAwakeDuration, SleepRestlessDuration, SleepMinAsleep, SleepAwakeCount and SleepRestlessCount values will be summed together. I would also like to note the number of minutes I spent awake between sessions and the number of separate sessions (two in this example).
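The arithmetic for this example can be sketched in R. The times are the ones quoted above; the year (2016) and the UTC time zone are assumptions added so the code runs, and each session's duration is taken as end minus start, which may differ slightly from Fitbit's own SleepDuration.

```r
# The two sessions from the example: asleep 23:37-03:49, then 04:51-07:46.
sessions <- data.frame(
  start = as.POSIXct(c("2016-03-22 23:37", "2016-03-23 04:51"), tz = "UTC"),
  end   = as.POSIXct(c("2016-03-23 03:49", "2016-03-23 07:46"), tz = "UTC")
)

# Minutes in each session (units fixed to minutes, never left to "auto").
session_mins <- as.numeric(difftime(sessions$end, sessions$start, units = "mins"))

# Minutes from the earliest start to the latest end.
span_mins <- as.numeric(difftime(max(sessions$end), min(sessions$start),
                                 units = "mins"))

# Whatever is not covered by a session was spent awake between sessions.
awake_between <- span_mins - sum(session_mins)

cat("Sessions:", length(session_mins), "\n")
cat("Minutes awake between sessions:", awake_between, "\n")
```

The combined night spans 489 minutes, the two sessions cover 252 and 175 minutes, and the remaining 62 minutes is the gap between 03:49 and 04:51.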

The second split sleep session type appears to be a glitch with sessions being separated by a difference of one minute. An example is below. 



On the 23rd of February I have sleep sessions ending at 03:47 then resuming at 03:38. There are multiple instances where this occurs in my datasets. As before, the sleep variables from both sessions will be summed. I don’t need to note the minutes between the sessions as the one minute difference is meaningless. Further, the number of sleep sessions should be recorded as one, not two.

There’s a final consideration with split sleep sessions. Consider the below.



According to this display, on the 8th April I slept from 22:23 to 5:27, then from 21:06 to 23:57 on the same day. No I didn’t! The sleep session datetimes are overlapping. The session displayed on the 8th April from 21:06 to 23:57 should read the 9th April, not the 8th. This session needs to be combined with the session displayed on the 9th April from 23:58 to 6:03. I may not be a normal human being, but I have my limits. For this last case, I did not write a patch of code that could group the data appropriately. Since there were few instances when such overlapping sessions occurred, I let it go – such sessions were removed from the dataset resulting in missing sleep data for that particular date.

Here is a quick plot of the average number of minutes asleep per weekday. Nothing out of the ordinary. I sleep an average of 7.6 hours on Sunday nights and an average of 6.4 hours on Wednesday nights. I can’t think of a reason why I get fewer hours on a Wednesday night.
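The per-weekday averages behind a plot like this can be computed roughly as follows. The sleep values here are invented purely to show the mechanics (they are not my data), and the Date and SleepMinAsleep column names assume the renamed columns used in the code at the end of this post.

```r
# Invented example: two Sundays and two Wednesdays in March 2016.
sleep <- data.frame(
  Date = as.Date(c("2016-03-06", "2016-03-09", "2016-03-13", "2016-03-16")),
  SleepMinAsleep = c(440, 380, 460, 400)
)

# Weekday names via POSIXlt's wday (0 = Sunday), which avoids locale issues.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
sleep$Weekday <- factor(day_names[as.POSIXlt(sleep$Date)$wday + 1],
                        levels = day_names)

# Mean minutes asleep per weekday, also expressed in hours.
avg <- aggregate(SleepMinAsleep ~ Weekday, data = sleep, FUN = mean)
avg$HoursAsleep <- avg$SleepMinAsleep / 60

print(avg)
```

Note that Date is the night I tried to fall asleep, so "Sunday" here means Sunday night.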



I spend most of my time wrangling data. I often come across problems in datasets like those described above, and write code to return the numbers back to reality as much as possible. When the costs (time and effort) outweigh the benefits (more clean data) I have to let some data go and remove it.


I have outlined the code at the end of this post for the avid reader.
#### Sleep        
        
    ### Get data
    
        library(fitbitScraper) # get_sleep_data()
        library(dplyr)         # %>%, group_by(), summarise(), n()

        dfSleep <- as.data.frame(get_sleep_data(cookie, start_date = startDate, end_date = endDate))


    ### Keep key columns
    
        dfSleep <- dfSleep[ , c("df.date", "df.startDateTime", "df.endDateTime", "df.sleepDuration", "df.awakeDuration", 
                                "df.restlessDuration", "df.minAsleep")]

        ## Rename colnames
        # Date is sleep date attempt
        colnames(dfSleep) <- c("Date","SleepStartDatetime", "SleepEndDatetime", "SleepDuration", 
                               "SleepAwakeDuration", "SleepRestlessDuration", "SleepMinAsleep")


    ### Combine the split sleep sessions
    
        ## Index the Dates that are duplicated along with their original
        duplicatedDates <- unique(dfSleep$Date[which(duplicated(dfSleep$Date))])
        dfSleep$Combine <- ""
        dfSleep$Combine[which(dfSleep$Date %in% duplicatedDates)] <- 
            dfSleep$Date[which(dfSleep$Date %in% duplicatedDates)]
        
        ## Subset the combine indexed rows
        dfSubset <- dfSleep[which(dfSleep$Combine != ""), ]

        ## Aggregate rows marked to combine
        dfSubset <- 
            dfSubset %>%
            group_by(Combine) %>%
            summarise(Date = unique(Date),
                      SleepStartDatetime = min(SleepStartDatetime), # Earliest datetime
                      SleepEndDatetime = max(SleepEndDatetime), # Latest datetime
                      SleepDuration = sum(SleepDuration),
                      SleepAwakeDuration = sum(SleepAwakeDuration),
                      SleepRestlessDuration = sum(SleepRestlessDuration),
                      SleepMinAsleep = sum(SleepMinAsleep),
                      SleepSessions = n() # Number of split sleep sessions
            )


    ### Get the minutes awake between split sessions    
    
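        ## Calculate the minutes from first start to last end, using floor (round down)
        ## Units are fixed to minutes; plain subtraction picks its units automatically
        dfSubset$AwakeBetweenDuration <- floor(as.numeric(difftime(as.POSIXct(dfSubset$SleepEndDatetime),
                                                                   as.POSIXct(dfSubset$SleepStartDatetime),
                                                                   units = "mins")))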
        dfSubset$AwakeBetweenDuration <- dfSubset$AwakeBetweenDuration - dfSubset$SleepDuration
        
        ## Set any duration less than five minutes to zero
        dfSubset$AwakeBetweenDuration[which(dfSubset$AwakeBetweenDuration < 5)] <- 0


    ### Remove rows of combined sleep sessions
    
        ## Remove any row with AwakeBetweenDuration greater than five hours
        ## Logical indexing is used as -which() drops every row when nothing matches
        dfSubset <- dfSubset[!(dfSubset$AwakeBetweenDuration > 60*5), ]


    ### Replace split sessions in dfSleep with dfSubset
    
        ## Prepare dfSubset for merging
        
        ## Remove duplicate dates from dfSleep
        ## Logical indexing is used as -which() drops every row when nothing matches
        dfSleep <- dfSleep[dfSleep$Combine == "", ]
        
        ## Add new columns
        dfSleep$SleepSessions <- 1
        dfSleep$AwakeBetweenDuration <- 0
        
        ## Remove Combine column
        dfSubset <- dfSubset[ , -which(colnames(dfSubset) == "Combine")]
        dfSleep <- dfSleep[ , -which(colnames(dfSleep) == "Combine")]
        
        ## Bind dfSubset
        dfSleep <- rbind.data.frame(dfSleep, dfSubset)

        
    ### Coerce as date
        
        dfSleep$Date <- as.Date(dfSleep$Date)

Sunday, 24 April 2016

Fitbit 01 – Getting and wrangling weight, steps and calories burned data

I love my Fitbit Charge HR [1]. It tracks my steps, it tracks my heartbeat. It tracks my sleep hours. Alas it does not keep me warm at night unless it malfunctions, bursts into flames then rudely singes my beard.

I’m keen to analyse my Fitbit data to gain some insight into my biology and behaviour. What days and times am I the most active? What drives my weight gain and loss? What do I need to do to get a decent night’s sleep? I will attempt to answer such questions through a series of Fitbit posts, each one taking a snapshot of the process from data to insights. This post is focussed on getting the Fitbit data, then wrangling it into tidy data for further visualisation and analysis.

Accessing my Fitbit data is made easy with the R package fitbitScraper. I have been using my Fitbit since March 2015. I have been recording my weight via the Fitbit Aria Wi-Fi Smart Scale since the end of September 2015. I will analyse data from October 2015 (the first complete month of recorded weights) to March 2016 (six months). These are the data variables of interest:
  • Weight 
  • Sleep 
  • Steps 
  • Distance 
  • Activity 
  • Calories burned 
  • Heartbeat. 

I will focus on weight, steps and calories burned for this post.

Weight

After authenticating [2], assigning my start and end date, I applied the get_weight_data function and received the following output:

 
> get_weight_data(cookie, start_date = startDate, end_date = endDate)
                  time weight
1  2015-09-27 23:59:59   72.8
2  2015-10-04 23:59:59   73.6
3  2015-10-11 23:59:59   74.5
4  2015-10-18 23:59:59   74.5
5  2015-10-25 23:59:59   74.5
6  2015-11-01 23:59:59   74.0
7  2015-11-08 23:59:59   73.3
8  2015-11-15 23:59:59   73.4
9  2015-11-22 23:59:59   74.3
10 2015-11-29 23:59:59   73.1
11 2015-12-06 23:59:59   72.3
12 2015-12-13 23:59:59   72.3
13 2015-12-20 23:59:59   72.5
14 2015-12-27 23:59:59   73.0
15 2016-01-10 23:59:59   72.6
16 2016-01-17 23:59:59   72.8
17 2016-01-24 23:59:59   72.7
18 2016-01-31 23:59:59   72.5
19 2016-02-07 23:59:59   72.8
20 2016-02-14 23:59:59   72.7
21 2016-02-21 23:59:59   73.1
22 2016-02-28 23:59:59   73.5
23 2016-03-06 23:59:59   74.0
24 2016-03-13 23:59:59   73.8
25 2016-03-20 23:59:59   73.7
26 2016-03-27 23:59:59   75.0
27 2016-04-03 23:59:59   74.6
28 2016-04-10 23:59:59   74.6

I have more weights recorded which are not being captured. I tried setting the dates within a single month (March 2016) and it returned the following:

> get_weight_data(cookie, start_date = "2016-03-01", end_date = "2016-03-31")
                  time weight
1  2016-02-29 20:46:18   74.1
2  2016-03-01 21:04:14   74.4
3  2016-03-02 07:24:03   73.7
4  2016-03-02 21:10:52   73.9
5  2016-03-03 21:55:57   74.2
6  2016-03-08 20:09:37   74.0
7  2016-03-09 22:19:34   74.8
8  2016-03-10 20:06:33   73.4
9  2016-03-12 20:40:28   73.5
10 2016-03-13 21:03:40   73.3
11 2016-03-14 21:02:22   73.5
12 2016-03-15 22:13:07   73.4
13 2016-03-16 18:54:18   73.2
14 2016-03-17 21:02:32   74.4
15 2016-03-18 20:27:08   74.4
16 2016-03-20 18:21:45   73.1
17 2016-03-23 21:54:01   75.3
18 2016-03-24 20:03:48   75.4
19 2016-03-25 14:09:23   74.3
20 2016-03-27 20:18:27   74.9
21 2016-03-29 21:02:56   74.9
22 2016-03-30 22:08:01   74.9
23 2016-03-31 22:22:25   74.2
24 2016-04-02 22:29:36   74.5

Huzzah! All my March weights are visible (as verified by checking against the Fitbit app). I wrote code that loops through each month and binds all weight data. I will make this available on GitHub soonish. I noted duplicate dates in the data frame. That is, on some days I recorded my weight twice. As this analysis will focus on daily Fitbit data, I must have unique dates per variable prior to merging all the variables together. I removed the duplicate dates, keeping the weight recorded later on a given day since I tend to weigh myself at night. The final data frame contains two columns: Date (as class Date, not POSIXct) and Weight. Done.
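The month-by-month loop and the de-duplication could be sketched along these lines. This is not the code from my repository; month_ranges(), fetch_all() and keep_latest_per_day() are hypothetical helper names, and the fetch argument is a stand-in so the example runs without logging in to Fitbit. With fitbitScraper it would presumably be something like function(s, e) get_weight_data(cookie, start_date = s, end_date = e).

```r
# One (start, end) pair per calendar month; start_date should be the 1st.
month_ranges <- function(start_date, end_date) {
  starts <- seq(as.Date(start_date), as.Date(end_date), by = "month")
  ends <- c(starts[-1] - 1, as.Date(end_date))
  data.frame(start = starts, end = ends)
}

# Call fetch(start, end) once per month and stack the results.
fetch_all <- function(ranges, fetch) {
  pieces <- lapply(seq_len(nrow(ranges)), function(i) {
    fetch(as.character(ranges$start[i]), as.character(ranges$end[i]))
  })
  do.call(rbind, pieces)
}

# Keep only the latest reading on each calendar date.
keep_latest_per_day <- function(df) {
  df <- df[order(df$time), ]
  df$Date <- as.Date(substr(format(df$time), 1, 10))
  df <- df[!duplicated(df$Date, fromLast = TRUE), ]
  data.frame(Date = df$Date, Weight = df$weight)
}

# Stand-in for the real download: three readings from the March output above,
# returned for every month to mimic rows that overlap month boundaries.
stub_fetch <- function(start_date, end_date) {
  data.frame(time = c("2016-03-01 21:04:14", "2016-03-02 07:24:03",
                      "2016-03-02 21:10:52"),
             weight = c(74.4, 73.7, 73.9),
             stringsAsFactors = FALSE)
}

ranges <- month_ranges("2016-03-01", "2016-04-30")
raw <- fetch_all(ranges, stub_fetch)
dfWeight <- keep_latest_per_day(raw)
print(dfWeight)
```

Because a monthly query can return a few readings from the neighbouring months (the March output above includes 29 February and 2 April), the same reading can arrive twice. Dropping duplicates by date handles that case as well as the twice-a-day weigh-ins.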


Steps

Getting daily steps is easy with the get_daily_data function. 

> dfSteps <- get_daily_data(cookie, what = "steps", startDate, endDate)
> head(dfSteps)
        time steps
1 2015-10-01  7496
2 2015-10-02  7450
3 2015-10-03  4005
4 2015-10-04  2085
5 2015-10-05  3101
6 2015-10-06 10413

The date was coerced to class Date, columns renamed and that’s it.
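For completeness, that tidy-up amounts to something like the following sketch. The first three rows of the output above are retyped by hand so it runs on its own; whether the real time column arrives as character or as a date-time class is an assumption, though as.Date() copes with either.

```r
# First three rows of the steps output, typed in by hand.
dfSteps <- data.frame(time = c("2015-10-01", "2015-10-02", "2015-10-03"),
                      steps = c(7496, 7450, 4005),
                      stringsAsFactors = FALSE)

# Coerce to class Date and rename the columns.
dfSteps$time <- as.Date(dfSteps$time)
colnames(dfSteps) <- c("Date", "Steps")

str(dfSteps)
```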

Calories burned

Using the same get_daily_data function, I got the calories burned and intake data.

> dfCaloriesBurned <- get_daily_data(cookie, what = "caloriesBurnedVsIntake", startDate, endDate)
> head(dfCaloriesBurned)
        time caloriesBurned caloriesIntake
1 2015-10-01           2428           2185
2 2015-10-02           2488           1790
3 2015-10-03           2353           2361
4 2015-10-04           2041           1899
5 2015-10-05           2213           2217
6 2015-10-06           4642           2474

As with the steps, the date was coerced to class Date and the colnames renamed. The function returns both the calories burned each day and the intake of calories. Calories intake is gathered from items entered into the food log. I only recently stopped recording my food. For items that did not have a barcode or were not easily identifiable in the database, I would resort to selecting the closest match and guesstimating serving sizes. I was underestimating how much I consumed each day as I was often below my target calories intake, yet I gained weight. I stopped recording food and will wait for a sensor to be surgically embedded in my stomach that quantifies the calories I shove down there. I disregarded the calories intake variable.

Merging the data 

The data frames for weight, steps and calories burned were merged using Date.

> df <- full_join(dfWeight, dfSteps, by = "Date")
> df <- full_join(df, dfCaloriesBurned, by = "Date")
> head(df)
        Date Weight Steps CaloriesBurned
1 2016-03-31   74.2 10069           2622
2 2016-03-30   74.9  7688           2538
3 2016-03-29   74.9  4643           2180
4 2016-03-27   74.9  9097           2510
5 2016-03-25   74.3 11160           2777
6 2016-03-24   75.4  8263           2488
 
I now have tidy data for three Fitbit variables of daily data. Here’s a quick plot of Steps vs Calories burned.




There’s an obvious relationship since I burn calories with each step. Most of my activities involve making steps – walking, jogging, getting brownies, fleeing swooping birds. I do not clock up steps when I kayak. I don’t know how Fitbit treats such activities. I’ll check the calorie count before and after next time.

Ultimately I’d like to observe whether any variables can account for outcomes such as weight and sleep. Here’s a Steps vs Weight plot.




There’s no relationship. It remains to be observed whether the inclusion of other Fitbit predictors could account for weight. More on this in future posts.
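One way to put a number on what the plots show is a correlation coefficient. The sketch below uses only the six rows printed by head(df) above, which is far too few to conclude anything, so treat it as a demonstration of the call rather than a result. On the full data frame the same two lines would apply.

```r
# The six rows shown by head(df) earlier in this post.
df <- data.frame(
  Date = as.Date(c("2016-03-31", "2016-03-30", "2016-03-29",
                   "2016-03-27", "2016-03-25", "2016-03-24")),
  Weight = c(74.2, 74.9, 74.9, 74.9, 74.3, 75.4),
  Steps = c(10069, 7688, 4643, 9097, 11160, 8263),
  CaloriesBurned = c(2622, 2538, 2180, 2510, 2777, 2488)
)

# Pearson correlations; complete.obs skips rows with missing values,
# which matters after a full join where some dates lack a weight.
r_steps_calories <- cor(df$Steps, df$CaloriesBurned, use = "complete.obs")
r_steps_weight <- cor(df$Steps, df$Weight, use = "complete.obs")

cat("Steps vs calories burned:", round(r_steps_calories, 2), "\n")
cat("Steps vs weight:", round(r_steps_weight, 2), "\n")
```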


References and notes
1. I am not affiliated with Fitbit. I’m just a fan. I will however accept gifted Fitbit items if you would like to get in touch with me at willselloutfortech@gmail.com
2. Authenticate with the login function; login(email, password, rememberMe = FALSE). Use your email address and password you use to access your Fitbit dashboard online.