A cornucopia is like a funnel that keeps on giving • cornucopia

The goal of cornucopia is to facilitate reporting on sponsored and organic activities across various platforms, including Facebook, Instagram, and (to a lesser extent, due to the lack of a usable API) LinkedIn. Other networks may be added in the future.

Why call it `cornucopia`?

For all you marketing folks, a cornucopia is like a funnel that keeps on giving. Also known as the “horn of plenty”, it is basically your wildest dream: a funnel that endlessly overflows with abundance.

Hence, the marketing slogan of cornucopia is:

turn your every funnel into a cornucopia!

The premise of cornucopia is that there are plenty of platforms, small and big, that interact with the APIs of the big “social networks”/advertising platforms (or connect them to other data visualisation or data processing platforms), but only a few open source packages in the R ecosystem cover this space. Some of these packages are of excellent quality, but overall there is a steep learning curve when joining this space, and there are few tools that help beginner-to-intermediate users look at these interrelated processes in an orderly fashion, or that facilitate routine work for more advanced users.

Long term, this is the purpose of cornucopia, which is currently at an early stage of development.

As these so-called “social networks” are all, at their core, marketing tools, throughout the package documentation I will use the language found on each platform’s website and use marketing terms by default. Needless to say, the package can also be used for reporting on non-profit websites, or for optimising conversions that are not purchases, etc.

Please enjoy cornucopia responsibly.

Experimental package warning

Again, keep in mind that this is an incomplete and not fully tested package. It currently uses only API calls that read data, so it should be safe to use. But:

it tries to cache data for efficiency and speed, but the caching mechanisms may not always work, so make sure the extracted data are fit for purpose (more testing will be introduced)

Also, the usual disclaimer: this free software comes with absolutely no warranty.

Installation

You can install the development version of cornucopia with:

remotes::install_github("giocomai/cornucopia")

Settings

You can use cc_set() to set the start and end dates used by all other functions, as well as tokens, user identifiers, and caching preferences. You can provide as many or as few settings as you like. You can also pass the same values as parameters to individual functions, without using cc_set() at all.

library("cornucopia")
dates_l <- cc_set(
  start_date = "2023-01-01",
  end_date = Sys.Date() - 1
)

dates_l$start_date
#> [1] "2023-01-01"

Here is a full list of parameters that can be set with cc_set():

start_date, end_date, fb_user_token, fb_page_token, fb_page_id, fb_business_id, fb_ad_account_id, fb_product_catalog_id, fb_user_id, ig_user_id, ga_email, ga_property_id

Meta / Facebook / Instagram

In order to get data out of the Meta ecosystem through APIs, you will need to create an app following the procedure on their Developer platform.

You can then get your token from your app page, after adding the “Marketing API”. When you retrieve your token, you can select permissions: you probably want to include both “ads_read” and “read_insights”, as they are read-only and hence safe, while you probably don’t want to tick “ads_management”, unless you really know what you are doing.

You can also get a token with a different set of options for customisation through Facebook’s Graph API Explorer.

Things, however, are not so easy, as you’ll need to go through additional steps to get long-lived tokens, Facebook page tokens, etc. - more details below.

Also, be mindful that the Meta APIs do not always return meaningful error messages, and the documentation has only a few examples. As some queries work or fail depending on the type of ad (its creatives, its format, etc.) or the type of organic post (is it a video? if so, is it a reel? etc.), there’s often some trial and error involved.

More broadly, whenever you get an error message that is unclear, think creatively about what could be wrong: has my token expired? did I exceed the query limit? did I include a field which is not available for this endpoint? did I use the wrong identifier (e.g. a post id instead of a video id, or vice versa)?
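The checklist above can be turned into a small lookup, sketched here in base R. `fb_error_hint()` is a hypothetical helper, not part of cornucopia, and the numeric codes are the ones commonly listed in Meta's error reference for these situations; treat the mapping as an assumption to verify against the reference for your API version, especially since, as noted above, the messages returned are not always reliable.

```r
# Hypothetical helper (not part of cornucopia): map a Graph API error code
# to a troubleshooting hint. The code-to-hint mapping is an assumption.
fb_error_hint <- function(code) {
  hints <- c(
    "190" = "token expired or invalid: generate a new one",
    "4"   = "app-level rate limit reached: wait and retry later",
    "17"  = "user-level rate limit reached: wait and retry later",
    "100" = "invalid parameter: check fields and identifiers",
    "10"  = "permission denied: check the permissions of your token"
  )
  hint <- hints[as.character(code)]
  # Codes not in the lookup return NA, so fall back to a generic message.
  if (is.na(hint)) {
    "unknown code: check token, limits, fields and identifiers in turn"
  } else {
    unname(hint)
  }
}

fb_error_hint(190)
```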

Organic - Facebook pages and access tokens

Data about Facebook pages need to be accessed with a Facebook page token, which is separate from the Facebook user token.

After retrieving the Facebook user token (as mentioned above, e.g. on the Graph API explorer), the first step is then to retrieve one’s own Facebook user id:

library("cornucopia")
cc_set(fb_user_token = "actual_token_here")

cc_get_fb_user()

And then use the Facebook user id to request all pages managed by that Facebook user, including the relevant Facebook page token:

cc_set(fb_user_id = "actual_user_id_as_retrieved_with_cc_get_fb_user")

cc_get_fb_managed_pages()

It is this page token that can then be used to retrieve information about a given page.

You may hope that things would just work and that you’d see all your pages when you run cc_get_fb_managed_pages(). Things, however, may not be so simple, because of granular permissions: you need to explicitly grant permissions to access pages. Facebook likes to move settings around, but you should be able to add this permission from the Graph API Explorer (for context, see also this answer on StackOverflow).

Select your app from the drop down menu, and then from the “User or Page” dropdown select: “Get page access token”. You will be asked to re-authenticate, and then you will be able to choose between:

“Opt in to all current and future Pages”
or select from a list of your pages

You will then see more permissions in the list, including:

pages_show_list
pages_read_engagement
ads_read
read_insights

You probably also want to add pages_read_user_content in order to retrieve information about your own posts. If you are managing pages through Business Manager, this is probably not enough, as you will also need to add the business_management permission. Also consider that if you want to interact with the Instagram account associated with this page you will also need:

instagram_basic
instagram_manage_insights

Do remember to re-generate your token after adding permissions to actually get access.
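Since missing permissions are a frequent cause of trouble, it may help to compare what you need with what you actually ticked. This is a minimal base R sketch; the `granted` vector is a made-up example that you would fill in by hand from the list shown in the Graph API Explorer.

```r
# Permissions named in this section for reading page data and own posts.
required <- c(
  "pages_show_list",
  "pages_read_engagement",
  "read_insights",
  "pages_read_user_content"
)

# Hypothetical example: permissions selected when the token was generated.
granted <- c(
  "pages_show_list",
  "pages_read_engagement",
  "ads_read",
  "read_insights"
)

# Anything listed here still needs to be added before re-generating the token.
missing_permissions <- setdiff(required, granted)
missing_permissions
```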

You can run cc_get_fb_managed_pages() and get the token for all of your pages, or if you know what you’re looking for, you can get the token for a specific page as follows:

cc_get_fb_managed_pages()

fb_page_token <- cc_get_fb_page_token(
  fb_user_id = cc_get_fb_user(),
  page_name = "My example page"
)

You should now be good to go.

But depending on how you created your token, you may well still be using a short-lived token. This may be fine, but perhaps you really want to have a long-lived token, so that you don’t need to constantly retrieve it again through web interfaces.

In order to get a long-lived user and page token, you first need to get a short-lived user token with the appropriate authorisations as described above, and then input your Facebook app id and secret, which you can retrieve from your app page.

cc_get_fb_long_user_token(
  fb_user_token = "your_short_term_token_here",
  fb_app_id = "your_fb_app_id_here",
  fb_app_secret = "your_fb_app_secret_here"
)

Then you can use this long-lived user token to get a long-lived Facebook page access token. With the following function you will get a data frame with access tokens to all pages to which your short-lived token has access (you can select just a few or include all from the web interface when you create the short-lived access token).

cc_get_fb_long_page_token(
  fb_user_id = "your_fb_user_id_here",
  fb_user_token = "your_long_term_token_here"
)

A word of caution about access tokens

Store these access tokens safely: while long-lived user access tokens expire after 60 days, long-lived page access tokens do not expire. We are solidly in password territory here: make sure you do not include these tokens in scripts you share, as they can be used to retrieve data and, especially if you’re not really careful when you create them, do all sorts of other things, including posting to your page or creating ads. In theory, these permissions are super-granular, but in practice, especially for the pages you control through a Business Manager account, getting the right set of access in place is not straightforward (and maybe not even possible). So you may eventually end up with access tokens with substantial access: treat them as you would treat secret passwords, and use them with care.

At this stage, all functions in this package use only read API, but the same access tokens may potentially be used for other APIs. So… take care.

Using the `keyring` package for storing tokens securely

Throughout this readme and the documentation, reference is made to tokens as if they were directly included in scripts. As mentioned above, this is less than ideal, as this potentially implies having tokens stored as plain-text in scripts, as well as in local history files or server logs.

A convenient and much safer approach relies on the keyring package, which lets you store tokens using your operating system’s credential store.

Here is how a keyring based workflow would work.

First, go to the Graph API Explorer page and be ready to retrieve your Facebook user token, then run the following command to input it interactively and store it in your operating system’s keyring.

library("cornucopia")

library("keyring")

keyring::key_set(service = "fb_user_token")

Notice that fb_user_token here is just the way I decide to name this in my local keyring: I can give this whatever name I like, or add the username argument if I plan to use more than one account.

Then I would usually need to add my Facebook user id. With the following command, I can retrieve and store the relevant id without even seeing it in the console.

keyring::key_set_with_value(
  service = "fb_user_id",
  password = cc_get_fb_user(
    fb_user_token = keyring::key_get(service = "fb_user_token")
  ) |>
    dplyr::pull(id)
)

In order to safely add your Facebook page token, you could then proceed as follows.

First, get the exact name or id of your Facebook page with:

cc_get_fb_managed_pages()

And store the relevant page id with:

keyring::key_set(
  service = "fb_page_id",
  username = "My example page"
)

Then retrieve and store the Facebook page token in a single command:

keyring::key_set_with_value(
  service = "fb_page_token",
  username = "My example page", # use your page name, if you manage more than one page
  password = cc_get_fb_page_token(
    fb_user_id = keyring::key_get(service = "fb_user_id"),
    fb_user_token = keyring::key_get(service = "fb_user_token"),
    page_id = keyring::key_get(
      service = "fb_page_id",
      username = "My example page"
    )
  )
)

Now that you have stored these tokens in your local keyring, you can include something like this at the beginning of your scripts, without worrying that your tokens will be shared involuntarily:

cc_set(
  fb_user_id = keyring::key_get(service = "fb_user_id"),
  fb_user_token = keyring::key_get(service = "fb_user_token"),
  fb_page_token = keyring::key_get(
    service = "fb_page_token",
    username = "My example page"
  ),
  fb_page_id = keyring::key_get(
    service = "fb_page_id",
    username = "My example page"
  )
)

For the sake of simplicity, this readme and the rest of the documentation include example code that suggests including your tokens as plain text: you now know that with keyring there is a better way. You have been warned.

Once you’ve got your tokens

Now that you have these tokens, you probably want to set them and let them be used throughout the current session:

cc_set(
  fb_page_token = fb_page_token,
  fb_page_id = fb_page_id
)

(If you prefer, you can explicitly pass the token to each function call, or use some other solution to make sure access tokens are not included in scripts, shared by mistake, etc.)
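One such alternative is reading tokens from environment variables, sketched below in base R. `get_token_from_env()` and the variable name `FB_PAGE_TOKEN` are hypothetical and not part of cornucopia. Note that a file such as `~/.Renviron` is still plain text on disk, so the keyring approach described above remains the safer option.

```r
# Hypothetical helper (not part of cornucopia): read a token from an
# environment variable and stop early if the variable is not set.
get_token_from_env <- function(name = "FB_PAGE_TOKEN") {
  token <- Sys.getenv(name, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", name, " is not set.", call. = FALSE)
  }
  token
}

# Demo only: in practice the variable would be set outside the script,
# e.g. in ~/.Renviron, so that the token never appears in the code.
Sys.setenv(FB_PAGE_TOKEN = "placeholder_token")

fb_page_token <- get_token_from_env("FB_PAGE_TOKEN")
```

The resulting `fb_page_token` can then be passed to cc_set() as in the example above.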

Good, now you have your Facebook page token, which you can use to get information about your page and posts. What do you do next? (Be mindful that the same token can also be used to actually post on your page, so treat it with due caution and make sure it remains private.)

Then, you probably want to get a list of all posts from your page.

posts_df <- cc_get_fb_page_posts()

Yes, you probably have a lot of posts, but this function caches results by default, so you will have to do this only once: afterwards, only newer posts will be retrieved, so it may be worth your time. If you don’t want to wait and just need a few posts, you can retrieve only the most recent posts with something like:

posts_df <- cc_get_fb_page_posts(
  max_pages = 10,
  cache = FALSE
)

This will retrieve the most recent 10 pages of posts (each page has 25 posts, so up to 250 posts in total).

Besides Facebook post id, you already get some other basic information about each of the posts, namely:

created_time, id, permalink_url, message, full_picture, icon, is_hidden, is_expired, instagram_eligibility, is_eligible_for_promotion, promotable_id, is_instagram_eligible, is_popular, is_published, is_spherical, parent_id, status_type, story, subscribed, sheduled_publish_time, updated_time

But you probably want to know more. There’s a bunch of different things you can find out, and these vary depending on the type of post. But the most common next step is probably to get some more information about these posts with:

cc_get_fb_page_post_insights()

[to do]

cc_get_fb_page_insights()

cc_get_fb_page_video()

cc_get_fb_video_insights()

Meta ads

For the time being, cornucopia partly relies on fbRads to get data about sponsored campaigns and store them locally (in the long term, all API calls will be made directly by cornucopia for consistency).

library("fbRads") # provides fbad_init()

token <- "looooooooooong_string"

account <- "00000000000000000"

fbad_init(
  accountid = account,
  token = token
)

ads_df <- cc_get_fb_ads()

Notice that if you ask for a lengthy period, you may hit the API query limit. The error message is however not helpful, as it apparently complains about fields. Just wait and try again after a few hours: all downloaded data are by default stored in a local folder and nothing will be lost, as queries will be made only for missing data.

If you’re hitting the API limits but want to proceed with writing your code as you wait, you can set the only_cached parameter to TRUE, so you can proceed with your analysis with the data you have until you are able to download more data.

ads_df <- cc_get_fb_ads(only_cached = TRUE)
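For unattended scripts, the wait-and-retry advice above can be sketched in base R as follows. `retry_later()` is a hypothetical helper, not part of cornucopia, and the one-hour default wait is an arbitrary assumption; the demo uses a stand-in function instead of a real API call.

```r
# Hypothetical helper (not part of cornucopia): call `fun` until it succeeds
# or `max_tries` is reached, sleeping `wait_seconds` between attempts.
retry_later <- function(fun, max_tries = 3, wait_seconds = 3600) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < max_tries) Sys.sleep(wait_seconds)
  }
  stop("All ", max_tries, " attempts failed.", call. = FALSE)
}

# Demo with a stand-in function that fails twice before succeeding.
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("simulated API limit")
  "data"
}

result <- retry_later(flaky, max_tries = 5, wait_seconds = 0)
```

In a real script, `fun` would wrap the actual call, e.g. `function() cc_get_fb_ads()`, with a wait of a few hours. Since downloaded data are stored locally, each new attempt only needs to request what is still missing.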

Notice that you can customise the fields to retrieve and that not all fields can be requested at the same time. See the embedded list cc_valid_fields_ad_insights for a list of valid fields, divided by broad categories (this subdivision has been made by the package author, not by Facebook itself). Caching of retrieved contents by type of fields will be added to future versions.

Not all ad-related information, however, can be retrieved through this endpoint.

Creatives

For example, if you want details about the creatives used in an ad, you will first need to make queries to retrieve the creative_id associated with each ad (see documentation of this endpoint), and only then query the ad creative endpoint to retrieve relevant information about the creative.

The first step of this process, i.e. retrieving the creative_id, can be achieved by passing a vector of ad_id to cc_get_fb_ad_creatives_id(). Data will be cached locally by default, on the assumption that creatives are mostly added at the time when the ad is created.

Action breakdowns

This is where the more interesting information is, but also where caching gets trickier. Say you are interested in actions by day and by ad: you can use cc_get_fb_ad_actions_by_day(), passing to it a vector of ad_id (or adset_id, for that matter; the APIs don’t seem to mind), and you’ll get a daily breakdown.

You can get even more details. For example, to find out how many of those viewing your video ads had the sound on:

cc_get_fb_ad_actions_by_day(
  ad_id = example_id,
  type = "actions",
  action_breakdowns = "action_video_sound"
) |>
  dplyr::filter(!is.na(action_video_sound))

For many such breakdowns, including this one, you get a meaningful breakdown only if your ad is relevant. For example, if the example_id above is of an ad that does not have any video, no information is returned. If you ask for product_id for an ad that is not based on catalogue, you won’t get anything. And so on.

See the official documentation for details.

Caching does not really work with this function at this stage, as no consistent approach for updating cached data has been implemented, yet.

Instagram

As for all things related to Meta’s API, you will need an app, and a valid token with all the needed permissions (not having the right permissions is the most frequent problem you’ll find, so when troubleshooting, take it from there).

So first set your ig_user_id and your token. If, as may well be the case, you don’t know your ig_user_id you can retrieve it with cc_get_instagram_user_id(), as long as you have the fb_page_id of the Facebook page associated with the given Instagram account and fb_user_token (notice, the user token, not the page token). For reference, see also step 5 of this official guide.

cc_set(
  ig_user_id = "00000000000000000", # probably about 17 digits, not the legacy Instagram id
  fb_user_token = "loooong_string"
) # the regular token, not the "page token"

And you can get some basic information about your profile:

cc_get_instagram_user()

Or just some specific fields:

cc_get_instagram_user(fields = c("username", "followers_count"))

In order to get detailed information about each of your posts, you first need to know their ig_media_id. You can get this id for all of your posts with the following command:

cc_get_instagram_media_id()

Be mindful that this may make many queries, as Instagram gives the result in batches of 25… if you have thousands of media, it may take some time. Data are however cached locally by default.

You can then pass the resulting ig_media_id to cc_get_instagram_media() to get more information about a given Instagram post.

cc_get_instagram_media()

Responses to cc_get_instagram_media() are cached by default, and updated with decreasing frequency as posts get older (data are refreshed every day for the last week, once a week for the last month, once a month for the last year, once a year for previous years). As a consequence, you should mostly be able to keep this in scripts and rely on it to auto-update data without much delay.
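To make the refresh schedule concrete, here is a minimal base R sketch of the decision it describes. `needs_refresh()` is a hypothetical function, not the package's actual implementation, and the exact thresholds (7, 31 and 365 days) are assumptions about how "week", "month" and "year" are counted.

```r
# Hypothetical sketch (not cornucopia's implementation): decide whether a
# cached entry is stale, given the post date and the date it was cached.
needs_refresh <- function(post_date, cached_on, today = Sys.Date()) {
  post_age <- as.numeric(today - as.Date(post_date))
  cache_age <- as.numeric(today - as.Date(cached_on))
  # The older the post, the longer a cached value is considered fresh.
  max_cache_age <- if (post_age <= 7) {
    1
  } else if (post_age <= 31) {
    7
  } else if (post_age <= 365) {
    31
  } else {
    365
  }
  cache_age >= max_cache_age
}

today <- as.Date("2024-06-30")
# Post from two days ago, cached yesterday: due for a refresh.
recent_post <- needs_refresh("2024-06-28", "2024-06-29", today = today)
# Post from twenty days ago, cached three days ago: still fresh.
older_post <- needs_refresh("2024-06-10", "2024-06-27", today = today)
```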

Retrieve leads from Meta ads

In order to retrieve leads from ad campaigns that rely on Meta’s native forms, you should make sure that your user token has the right permissions (it must include leads_retrieval).

Then you must get the identifier of your lead form: you can see it when you create it, when you download leads as a csv from a single ad, or from the dedicated section in the ads manager. Be aware that only the form id matters: all leads from that form will be retrieved, no matter the ad set or ad in which it is included (references to the ad set and ad are however included in the returned data).

cc_get_fb_leads(form_id = "insert_form_id_here")

You can of course feed these data into your own workflows, e.g. uploading them to Google Sheets with googlesheets4, and even do this recurrently, e.g. by running the script every hour, as facilitated by packages such as cronR or taskscheduleR.

LinkedIn

LinkedIn does not allow for exporting statistics about pages or ads systematically, if not by using one of a very small number of ridiculously expensive third party services.

This complicates the independent processing of the data, as well as their inclusion in third party dashboards.

To deal with both of these issues, cornucopia includes a set of functions that facilitate:

processing files with statistics exported from LinkedIn
storing them locally in a consistent manner
keeping them updated in a set of Google Sheets, in order to facilitate their integration with services such as Looker Studio

The user just needs to download relevant files and store them in a folder, without paying attention to anything else, really. Files for more than one page can be included in the same folder and no special attention needs to be dedicated to the time period included: cornucopia will always strive to include data for the longest possible period, always preferring the most recent data available. For this reason, it is usually easiest to export data about the last 365 days, and let this package deal with the rest.

Export files for all sorts of statistics from your LinkedIn page. They will have file names such as “pagename_followers_1684688073420.xls”. Throw all of them in a folder, which we’ll call “LinkedIn_stats”.

You can then retrieve some basic information about these files using:

cc_get_linkedin_stats_files(path = "LinkedIn_stats")

This lets you see when a file was exported, what the name of the page is, and what type of statistics it includes: all of this information can be gathered from the file name.
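As an illustration of how much the file name alone reveals, here is a base R sketch that splits it into its parts. `parse_linkedin_file_name()` is a hypothetical function, not the package's own parser, and it assumes that the trailing number is a Unix timestamp in milliseconds.

```r
# Hypothetical parser (not part of cornucopia) for file names such as
# "pagename_followers_1684688073420.xls".
# Assumption: the trailing number is a Unix timestamp in milliseconds.
parse_linkedin_file_name <- function(file_name) {
  pattern <- "^(.+)_([^_]+)_([0-9]+)\\.xlsx?$"
  parts <- regmatches(file_name, regexec(pattern, file_name))[[1]]
  if (length(parts) == 0) {
    stop("Unexpected file name: ", file_name, call. = FALSE)
  }
  data.frame(
    page = parts[2],
    type = parts[3],
    exported_at = as.POSIXct(
      as.numeric(parts[4]) / 1000, # milliseconds to seconds
      origin = "1970-01-01",
      tz = "UTC"
    )
  )
}

parsed <- parse_linkedin_file_name("pagename_followers_1684688073420.xls")
parsed
```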

Each of the statistics types exported from LinkedIn has its own data format, and indeed there is little consistency in these files, if not for the ridiculous insistence on including dates in the ridiculous US date format (month-day-year), no matter where you are in the world.
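If you ever need to handle such dates yourself, base R can parse them with an explicit format. This sketch assumes the dates are written with slashes, as in "05/21/2023"; adjust the `format` string if your files use a different separator.

```r
# Assumption: month/day/year with slashes, e.g. "05/21/2023".
us_dates <- c("05/21/2023", "12/01/2023")

# An explicit format avoids any ambiguity between month and day.
parsed_dates <- as.Date(us_dates, format = "%m/%d/%Y")

format(parsed_dates, "%Y-%m-%d")
```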

Anyway… we can now move on to the specific functions for each type of statistics. The following functions parse all relevant files and merge the data preferring the most recently downloaded data over older files (this may be irrelevant in many cases, but may well have some impact in statistics associated with a given post).

followers_df <- cc_get_linkedin_stats_followers(
  path = "LinkedIn_stats",
  page = "example-page"
)
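The idea of preferring the most recently downloaded data can be illustrated with a few lines of base R. This is a sketch of the general technique with made-up numbers, not the package's actual code: when two exports cover the same date, the row from the more recent export wins.

```r
# Two hypothetical exports with an overlapping date (2023-05-02).
older <- data.frame(
  date = as.Date(c("2023-05-01", "2023-05-02")),
  followers = c(10, 12),
  exported_at = as.Date("2023-05-03")
)
newer <- data.frame(
  date = as.Date(c("2023-05-02", "2023-05-03")),
  followers = c(13, 15),
  exported_at = as.Date("2023-05-21")
)

combined <- rbind(older, newer)
# Put the most recent export first, so that duplicated() marks the rows
# coming from older exports as the duplicates to drop.
combined <- combined[order(combined$exported_at, decreasing = TRUE), ]
merged <- combined[!duplicated(combined$date), ]
merged <- merged[order(merged$date), ]
merged
```

Here the value for 2023-05-02 comes from the newer export (13), while dates covered by only one file are kept as they are.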

Once we have these data, we can of course process them as we usually would. But for the sake of this post, we are imagining a workflow that requires us to upload these files to a Google Sheet, in order to facilitate data retrieval through Google Looker Studio.

We can do this manually, but, of course, we’d much rather use a set of convenience functions that will process data and upload them automatically to the same Google Sheet, updating the dataset if one was previously uploaded.

cc_drive_upload_linkedin_stats_followers(
  path = "LinkedIn_stats",
  page_name = "example-page"
)

cc_drive_upload_linkedin_stats_content(
  path = "LinkedIn_stats",
  page_name = "example-page"
)

cc_drive_upload_linkedin_stats_visitors(
  path = "LinkedIn_stats",
  page_name = "example-page"
)

Google Analytics

Only the most tentative integration with Google Analytics has so far been implemented, relying on googleAnalyticsR.

If you are interested in the ratio between two events (possibly calculated as a rolling average), consider something such as the following.

cc_set(ga_email = "example@example.com",
       ga_property_id = 123456789)

cc_get_ga_event_ratio(events = c("session_start", "purchase"))

cc_get_ga_event_ratio(events = c("session_start", "purchase"),
                      rolling = TRUE)
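To clarify what a rolling ratio means here, this base R sketch computes it by hand on made-up daily counts. It is not how cc_get_ga_event_ratio() is implemented: the direction of the ratio (purchases per session start), the 3-day window, and the choice to average daily ratios rather than divide rolling sums are all assumptions made for the example.

```r
# Made-up daily event counts.
daily <- data.frame(
  date = as.Date("2023-01-01") + 0:3,
  session_start = c(100, 200, 100, 400),
  purchase = c(10, 10, 20, 20)
)

# Daily ratio: purchases per session start.
daily$ratio <- daily$purchase / daily$session_start

# Trailing 3-day average of the daily ratio; the first two days are NA
# because the window is not yet complete.
daily$rolling_ratio <- as.numeric(
  stats::filter(daily$ratio, rep(1 / 3, 3), sides = 1)
)

daily
```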

This is a quick implementation, and further convenience functions based on googleAnalyticsR may be introduced, including local caching.

Disclaimer

I despise ad-tech, but I’ve got work to do.

License

cornucopia is released under an MIT license.
