Simplified GPU-powered cryptocurrency mining on Ubuntu 16.04

I recently won $12k in Microsoft Azure credits at a hackathon, which I didn’t have any real use for. Since I didn’t want them to expire either, I decided to at least mine some cryptocurrency on Azure’s GPU-powered NV-series VMs.
Getting everything set up on Ubuntu 16.04 proved to be a very manual and tedious process, especially when spinning up several VMs.

To that end, I created some scripts that make installing the required miners, incl. all dependencies, libs and frameworks, a lot easier.
In the /steps folder of the repo, you will find a few scripts that - when executed in the given order with root permissions - will set up everything you need to start mining with ccminer.

1. Prepare packages

We create a subfolder ~/mining in the user’s home directory, where we will install everything we’re going to need. We also have to install a number of packages that later steps depend on.

1_aptget.sh
mkdir ~/mining
apt-get update && apt-get -y dist-upgrade
apt-get install -y gcc g++ build-essential libssl-dev automake linux-headers-$(uname -r) git gawk libcurl4-openssl-dev libjansson-dev xorg libc++-dev libgmp-dev python-dev

2. Download Nvidia Driver and CUDA

At the time of writing, version 375.66 of the Nvidia driver and CUDA 8.0 were the most recent releases - change the version numbers accordingly if newer ones are available.

2_getinstaller.sh
cd ~/mining && wget http://us.download.nvidia.com/XFree86/Linux-x86_64/375.66/NVIDIA-Linux-x86_64-375.66.run
chmod +x ~/mining/NVIDIA-Linux-x86_64-375.66.run
cd ~/mining && wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb

3a. Disable Nouveau driver

In order to use the Nvidia driver, we first have to disable the default Nouveau driver. The installation of the Nvidia driver will fail on the first run, but it will also disable the Nouveau driver, so that after rebooting the machine, the installation will work.

3a_install-nvidia-driver.sh
# This first run is expected to fail - it disables the Nouveau driver; reboot afterwards and run step 3b
~/mining/NVIDIA-Linux-x86_64-375.66.run --accept-license --no-questions --disable-nouveau --no-install-compat32-libs

3b. Install Nvidia driver

The Nouveau driver has been disabled by the previous run of the Nvidia installer, so this run will now succeed.

3b_install-nvidia-driver.sh
~/mining/NVIDIA-Linux-x86_64-375.66.run --accept-license --no-questions --disable-nouveau --no-install-compat32-libs
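
If both runs went through, the driver should be loaded after the next reboot. A quick, optional sanity check (not part of the repo’s scripts) is nvidia-smi, which ships with the driver and should list the NV-series GPU(s):

nvidia-smi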

4. Install CUDA

Cryptocurrencies are mined by using computing power to check, for a huge number of candidate inputs, whether their hash matches certain criteria.
How hard it is to find a match is expressed as the difficulty of the currency; at the time of writing, Bitcoin’s difficulty was 595,921,917,085, which roughly means that only a vanishingly small fraction of checked hashes yields a valid block.
Obviously, doing more of these checks in parallel increases the mining yield.

That’s where GPUs come in - they have significantly more computing units, thus allowing a much higher rate of parallelism at checking hashes.
CUDA is a technology developed by Nvidia that enables using GPUs to process general purpose calculations (such as checking hashes).

This speeds up calculations significantly, simply due to the sheer number of cores available (4,096 CUDA cores on an Nvidia Tesla M60 board vs. 4 general-purpose cores in a current-gen Intel i7).

4_install-cuda.sh
dpkg -i ~/mining/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb

apt-get update
apt-get install -y cuda-toolkit-8-0
# Allow the current user to access the GPU devices
usermod -a -G video $(whoami)
# Make the CUDA 8.0 binaries and libraries available in future shells
echo "" >> ~/.bashrc
echo 'export PATH=/usr/local/cuda-8.0/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

dpkg might take a while. If the shell script hangs or dies, try executing the commands manually.

Reboot again
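
After this reboot, you can optionally verify the toolkit installation - the paths assume the default CUDA 8.0 locations exported above:

source ~/.bashrc
nvcc --version
nvidia-smi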

5. Install ccminer

After a reboot, we have all the drivers installed and ready. The only thing missing is mining software that knows how to do the actual calculations that will result in coins.
ccminer is one of the most widespread solutions and supports a large number of different currencies.

5_make-ccminer.sh
# Build and run the CUDA deviceQuery sample as a sanity check for the GPU setup
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
# Fetch and build ccminer from the linux branch
cd ~/mining
git clone https://github.com/tpruvot/ccminer
cd ccminer
git checkout linux
./autogen.sh
./configure
make
make install
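
Once make install has finished, mining boils down to pointing ccminer at a pool. The line below is only a sketch - the algorithm, pool URL and wallet address are placeholders that depend on the coin and pool you choose:

ccminer -a lyra2v2 -o stratum+tcp://pool.example.com:3333 -u YOUR_WALLET_ADDRESS -p x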

Other miners

Setting up other miners (e.g. Marlin for Siacoin - see 6_install-marlin.sh) should be fairly easy at this point, since most of the complexity lies in preparing the system to correctly utilize its GPUs.

Happy mining

Batch checking contents of an S3 bucket with AWS Lambda

A while ago, I faced the challenge of checking the contents of a few thousand folders in an S3 bucket to see if they meet certain criteria, e.g. the number and type of files per folder.
The best way to achieve this proved to be a node.js script run on Lambda with an API Gateway in front, for a number of reasons:

  • The aws-sdk is preinstalled for every Lambda function and can be required without any installation
  • Granting Lambda read access to a certain bucket is quite easy
  • Lambdas can have a number of triggers, e.g. any AWS event, HTTP requests through API Gateway or a cron-like schedule
  • API Gateway makes it easy to provide a simple REST interface for your endpoint or even offer a user-friendly UI (a minimalistic Angular 1 app hosted in S3 in my case)
  • Most things can be handled for you by the Serverless framework, without the need to fiddle around with AWS yourself

The Script

One thing worth mentioning is that AWS limits the number of objects per listObjectsV2 call to 1,000. If your bucket contains more elements (in my case up to 25,000), the API response will contain a field called NextContinuationToken, which allows you to fire another request that continues where the first one got capped.
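
For illustration, the same capping behaviour can be reproduced with the AWS CLI; bucket and prefix are placeholders, and I’m assuming the CLI’s --continuation-token option, which mirrors the API’s ContinuationToken parameter:

# First page: at most 1000 keys, plus a NextContinuationToken if the listing is truncated
aws s3api list-objects-v2 --bucket my-bucket --prefix some/prefix
# Next page: continue exactly where the previous call stopped
aws s3api list-objects-v2 --bucket my-bucket --prefix some/prefix --continuation-token "<NextContinuationToken from the previous response>"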

We use these tokens to call getObjects recursively until the listing is finished, and call handleObjectList on the elements of each response. An object outside the call scope can be used to collect data from each call and keep it for the onFinished function to calculate the final result.

In this example we also provide parameters to ignore certain elements when invoking the execution via REST call, or to set the bucket name dynamically. This is of course optional, but proved to be quite useful for my use case.

Another thing that I found quite useful was to publish the results to an SNS topic for further processing - this is optional as well, but I nonetheless left the code in the snippet.
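
If you go that route, wiring up a subscriber is a one-liner with the AWS CLI - the topic ARN and e-mail address below are placeholders:

aws sns subscribe --topic-arn arn:aws:sns:eu-central-1:123456789012:bucket-check-results --protocol email --notification-endpoint you@example.com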

The actual script is not really rocket science and looks as follows.

bucket_check.js
'use strict';

module.exports.checkBucket = (event, context, callback) => {
  const AWS = require('aws-sdk');
  const s3 = new AWS.S3();
  const sns = new AWS.SNS();

  const requestBody = JSON.parse(event.body);

  // [OPTIONAL] can be used to store results of the iterations
  const resultJSON = {};

  // [OPTIONAL] provided queryParams to control script, i.e. &ignore=1,2,3 could be used to ignore elements 1,2 and 3
  const scriptParams = {
    ignore: requestBody.ignore || []
  };

  console.log(scriptParams);

  const s3Params = {
    Bucket: requestBody.bucket,
    Prefix: 'your/folder/within/the/bucket' // limits the results to only a certain folder
  };

  let onFinished = () => {
    const result = {
      checkedBucket: s3Params.Bucket,
      // ..
    };

    // SNS is optional, but proved quite useful. If not needed, just call callback(null, response) directly
    sns.publish({
      TopicArn: "arn:aws:sns:<topic>",
      Message: JSON.stringify(result)
    }, (err, data) => {
      console.log(err ? 'error publishing to SNS' : 'Message published to SNS');

      // Required if you want to use an AWS API Gateway in front
      const response = {
        "statusCode": 200,
        "headers": {
          "Access-Control-Allow-Origin": "*", // Required for CORS support to work
          "Access-Control-Allow-Credentials": true // Required for cookies, authorization headers with HTTPS
        },
        "body": JSON.stringify(result)
      };

      callback(null, response);
    });
  };

  let handleObjectList = (data) => {
    // Keys of the form "a/b/c.jpg"
    const keys = data.Contents.map(c => c.Key);

    keys.forEach(key => {
      // do something with the filename, i.e. aggregate the data in resultJSON
    });
  };

  const getObjects = (token) => {
    if (token) {
      s3Params.ContinuationToken = token;
    }

    s3.listObjectsV2(s3Params, (err, objectsResponse) => {
      if (err) {
        console.log(err, err.stack); // an error occurred
      } else {
        handleObjectList(objectsResponse);

        if (objectsResponse.NextContinuationToken) {
          console.log('Continuing Request with Token ', objectsResponse.NextContinuationToken);
          // Recursive call with the ContinuationToken of the previous request
          getObjects(objectsResponse.NextContinuationToken);
        } else {
          onFinished();
        }
      }
    });
  };

  getObjects();
};

Deploy

The easiest way of putting this to work is to let the Serverless framework set everything up. In this example, it will

  • Zip, upload and deploy your code into a Lambda function
  • Create an API Gateway as an entry point
  • Wire everything together and set the correct permissions and roles
serverless.yml
service: bucket-checker

provider:
  name: aws
  runtime: nodejs6.10
  region: eu-central-1
  stage: stage
  profile: bucket-check
  memorySize: 1536
  timeout: 180
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "s3:Get*"
        - "s3:List*"
      Resource: "*"
    - Effect: "Allow"
      Action:
        - "sns:*"
      Resource: "*"

functions:
  checkBucket:
    handler: bucket_check.checkBucket
    name: ${self:provider.stage}-checkBucket
    description: Checks the contents of a given S3 Bucket
    events:
      - http:
          path: bucket/validate
          method: post
          cors: true
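
With the config above and the script from before in place, deployment boils down to a single command; the URL in the curl call is a placeholder for whatever endpoint the deploy output prints:

serverless deploy
curl -X POST -d '{"bucket": "my-bucket", "ignore": []}' https://<api-id>.execute-api.eu-central-1.amazonaws.com/stage/bucket/validate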

Regarding runtime

The script’s runtime depends on a number of factors, mostly the memory size of the Lambda function, the number of items in the bucket and the type of processing done on these items.

Using the maximum available memory size, a simple analysis of 25,000 items takes ~30s for my use case.

If you find the runtime to be slow, make sure that you are allocating the proper amount of memory. Also make sure to properly set the Prefix parameter in the S3 config, as this can greatly influence the number of items that have to be checked.

Distribution of Pokémon in Munich

After the launch of Pokémon Go at the beginning of July, everyone seemed to be on the lookout for Pokémon. While the hunt itself became tedious for me after a week or two, I found the idea of looking at the data behind the game a lot more interesting. To that end, I forked the very popular Pokémon Go Map project and added some logic to collect and store the data in a MySQL database.

I then used Tableau to build some fancy visualizations of the dataset at hand. All the data I will refer to later can be found here.

TL;DR: Did analytics and visualizations on Pokémon - see Interactive Tableau Sheet

The Data

The data collected by the script is pretty raw and simple: essentially a single table containing ~206k rows, each one representing the appearance of a single Pokémon. The area I scanned can best be described as a hexagon with a “radius” of around 3.65 km (= 34.6 km²) around the center of Munich, Germany. This already gives us some interesting statistics (within a certain margin of error of course - see my thoughts regarding the quality of the data below), e.g. a Pokémon appearance rate of 124 Pokémon / km² / h.
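
As a quick check, the area figure follows directly from the hexagon geometry (a regular hexagon with circumradius r has an area of (3·√3/2)·r²):

A = (3·√3/2) · (3.65 km)² ≈ 2.598 · 13.32 km² ≈ 34.6 km²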

Raw Pokémon data

This raw data is already interesting on its own, since it contains 3 relevant dimensions: the ID of the Pokémon as well as the time and location of its appearance. However, in order to allow for a more extensive and human-friendly representation, we need more data, which I - of course - found in the depths of the internet in the form of an unbelievable number of projects aiming at bringing Pokémon into a structured, queryable format. This allowed me to extract additional data like details, evolution levels or types and store them in my database as well.

Using some simple SQL statements, I created a single SQL table that made a few changes to the original data:

  • Added the type of the Pokémon (id + human-friendly name)
  • Added groupId and groupOrder, which are used to enable clustering on a finer level by putting Pokémon into logical groups, e.g. Bulbasaur and all its evolutions would have groupId=1 and the respective orders 1, 2, 3. The screenshot below illustrates this.
  • Added the German and English name of the Pokémon
  • Replaced the complicated alphanumeric encounter_id with a numeric encounterId
  • Omitted spawnpoint_id, which seemed to be only random, non-repeating IDs

We can now use this table and join it with the other tables to create the final result that we’ll use for the visualizations. The following query will give us everything we need - I’m sure it can be further optimized by proper indexes, temporary tables and SQL Voodoo, but since it completes in less than 20 seconds, I didn’t really bother.

aggregation.sql
SELECT
    p.enc_id AS encounterId,
    p.pokemon_id AS pokemonId,
    pd.name_de AS nameDE,
    pd.name_en AS nameEN,
    pd.groupId,
    IF(id < 4, id, (pd.id % pp.groupStartsAt) + 1) AS groupOrder,
    pt.type_id AS typeId,
    t.type AS pokemonType,
    p.latitude AS lat,
    p.longitude AS lng,
    UNIX_TIMESTAMP(p.disappear_time) AS disappearsAt
FROM
    (SELECT
        pp.*, @rownum:=@rownum + 1 AS enc_id
    FROM
        pokemon pp, (SELECT @rownum:=0) r
    WHERE
        pp.disappear_time > '2016-07-26 00:00:00'
    ORDER BY pp.disappear_time ASC) AS p
JOIN
    (pokemon_types pt,
    pokemon_details pd,
    types t,
    (SELECT
        groupId, id AS groupStartsAt
    FROM
        pokemon_details
    GROUP BY groupId) AS pp)
ON (p.pokemon_id = pt.pokemon_id
    AND p.pokemon_id = pd.id
    AND t.type_id = pt.type_id
    AND pp.groupId = pd.groupId)
ORDER BY encounterId, disappearsAt ASC

Enhanced Data

Adding the type of a Pokémon increased the number of entries from 205k to 322k, since a Pokémon can have multiple types and will thus be listed once for each type, e.g. Pidgey in the screenshot below. We will account for this later in Tableau by making sure we use type as a dimension in our visualizations.

Enhanced Pokémon data

Preprocessing

Tableau proved to be surprisingly clumsy in handling CSV as import data, making it hard for me to bring the data into the desired form. After a few unsatisfying attempts to properly import the CSV with Tableau, I decided to do the preprocessing in Excel and then use this sheet as a data source for the import to Tableau. This went significantly smoother than importing from CSV and gave me nice data to work with. The only things that remained to be adjusted manually were:

  • Transforming the Unix Time of disappear to a proper date using the following formula: DATEADD('hour',-8,(Date("1/1/1970") + ([disappears]/86400)))
  • Changing the type of ids from continuous to discrete numbers
  • Telling Tableau that the decimals are actually lat / long coordinates

Quality of the data

The available data had a few flaws that we should talk about before continuing.

Missing Data

While collecting the data, I paused the script multiple times for a short while. During these pauses, no data was collected at all. This is not a problem if we display the data on a map, but it distorts the visualization in time-based charts.

Missing data
Distorted visualization

Changing search radius

In addition to stopping the script several times, I also changed its parameters multiple times, e.g. increasing/decreasing the search radius. This had a direct influence on the number of Pokémon found in a certain interval. As a result, absolute numbers are not comparable across different timespans.

Total appearances per hour

Geographical distribution

Because of the way the algorithm works (simulating a Pokémon Go player running in an endless spiral from the inside to the outside and then starting over from the center again), the likelihood of missing the appearance of a Pokémon gets higher the further you move away from the center. We can see this by using Tableau’s cluster function with the number of appearances of a Pokémon as its only parameter. This gives us three clusters (low=red, medium=yellow, high=blue) for the number of total appearances.

Cluster

This clustering is also influenced by the change in search radius, but in my opinion mostly caused by the increased probability of missing a Pokémon in the areas further away from the center.

disappearTime

The data we use when aggregating over time is not actually the time when the Pokémon appeared on the map, but rather the time when it will disappear. This is again due to the way the script works: we neither know the actual time of appearance, nor does the Pokémon Go API tell us when a certain Pokémon appeared. However, since Niantic tells us the disappear time of each Pokémon, we simply assume that each Pokémon stays on the map for the same duration and can thus use the disappear time for our timeline, especially when looking for interesting patterns rather than concrete predictions.

Visualization

The Tableau Sheet I created is available here, and you can try out everything I’ll describe below - which you should absolutely do.

Count per type

This visualization shows all the Pokémon of a certain type (e.g. Bug, Fire, ...) on a map. Pokémon belonging to the same group (e.g. Weedle -> Kakuna -> Beedrill) have the same base color. The size of the marker on the map indicates the number of total appearances of this particular Pokémon. If you analyze the data, different patterns emerge; for Pokémon of type Bug, for example, you can see below that all three Pokémon of the Weedle -> Kakuna -> Beedrill evolution chain are found in exactly the same places, just with decreasing probability.

Bug-Pokémon found on map
Weedle found on map
Kakuna found on map
Beedrill found on map

Each pattern is interesting on its own and there’s not enough time to talk about each one here, but one I found particularly interesting is the distribution pattern of the rare Dragon Pokémon, which seems to be aligned with Munich’s heavily frequented Altstadtring and the river Isar.

Dragons found on map

Count per type and evolution

This is probably one of the neatest visualizations since it shows a lot of information in a visually very appealing way. It’s basically the same concept as the one we already talked about, but instead of using just one map, we use a grid of maps, where the X-axis is the evolution (aka groupOrder) of the Pokémon and the Y-axis is the type. Again, I really encourage you to explore the interactive Tableau Sheet yourself - you can do it all in your browser and apply your own filters / constraints.

Map grid

Spawns over time

That’s a cool one, too. It shows the spawns of all Pokémon on a timeline (in minutes), grouped by Pokémon groups. Each pixel-wide line marks the spawn time (actually disappearsAt - see above) of a Pokémon, color coded by group and pokemonId. You can see that there has barely been a minute in which no Pidgey spawned somewhere in the area we scanned. You can also see that spawn times seem to be distributed fairly evenly, although the frequency decreases the higher a Pokémon’s evolution level is.

Spawn over time

Statistical analysis

Another interesting way of looking at the available data is taking the coordinates for what they actually are: decimal numbers. This allows us to apply statistical metrics like the average or median to them and get valid lat/long coordinates back, which we can then visualize again, e.g. by showing both the average and median spawn points of each Pokémon on our map. The chart shows a higher density of spawn points in the center for the average compared to the median. This reflects what we saw in the previous visualizations as well: more Pokémon spawned in the center than towards the borders of the scanned area.

Median and average spawns of Pokémon

We can also look at metrics like the standard deviation of the spawn points of each Pokémon. This shows how spread out the individual spawn points are around the average spawn point of each Pokémon, meaning that the further a Pokémon sits to the right/top of the plot, the higher its deviation is. The plot shows the deviation in meters, which at such small distances can be derived fairly accurately from the lat/long differences without having to account for the curvature of the earth.
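
One way to do that conversion (not necessarily the exact one used for the plot) is the usual small-area approximation at Munich’s latitude of roughly 48.1° N:

1° of latitude ≈ 111.3 km
1° of longitude ≈ 111.3 km · cos(48.1°) ≈ 74.3 km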

When we look at the median of the standard deviations, we can claim (I know: this is very simplified and not really significant) that the average Pokémon appears 1.8km north/south and 1.4km east/west from its arithmetical center.

I also tried to find some clusters in the standard deviation plot, but neither type nor group nor groupId resulted in any significant clusters. The clustering in the screenshot below is solely based on the standard deviation of lat and long, which obviously finds (meaningless) clusters.

Median and average spawns of Pokémon

Conclusion

There’s no real conclusion, but we nonetheless found out a few interesting things about where, when and how Pokémon appear in Pokémon Go.

  • Spawn rate and ratio seem to be constant and are not changing over time.
  • Related Pokémon spawn in the same areas most of the time, although the higher the evolution level, the less frequently they appear.
  • Pokémon tend to appear near places that match their type, i.e. Water Pokémon are mostly found near rivers (The same applies for Ice and - weirdly enough - Fire Pokémon).
  • 95.6% of all recorded Pokémon are of the lowest evolution level (1), 4.1% have evolved to the second level and only 0.32% have reached evolution level 3.
  • There’s a Pidgey and Rattata epidemic in Munich, with each accounting for 14% of all Pokémon.
  • 28 Pokémon never showed up in Munich, 123 out of 151 however did.
  • There is a relation between evolution level and spawn rate/probability (not really surprising), but there is no statistically significant relation between evolution level and spread over the map.

I really enjoyed putting this together and hope you found it as fascinating as I did :).

Edit November 2016: Niantic has effectively locked out all crawlers and bots by now, so there will be no fresh data to collect.