Can credit models work in DeFi?

Yes, and here is how I made one…

Image made by Stable Diffusion. Prompted “Artificial Intelligence Decentralized Finance Credit”

Audience

  • Web3 and DeFi builders.
  • Data Scientists and Machine Learning Engineers.

Abstract

Machine Learning has become a popular technique to assess credit risk. Nowadays, lending businesses combine traditional credit scores such as FICO with additional data, feed everything into Machine Learning models, and generate scores to approve or reject loans.

DeFi is an exciting field where most lending activity still relies on heavy collateralization rather than credit scoring, but as the space matures, it will also become home to new kinds of digital identities and creditworthiness credentials.

R&D is flourishing with projects exploring this idea through different approaches:

  • Using on-chain wallet behaviour (RociFi for example);
  • Focusing on actual lending history (ARCx, Creda, Trava);
  • Leveraging KYC and bringing off-chain data into the blockchain realm (Quadrata);
  • Aggregating different scoring providers and combining all of the above (Spectral).

The goal of this article is to add a prototype to the cocktail above; start an open source repository that could serve as a benchmark and an open laboratory; and hopefully provide some useful insights.

The Model Objective

The goal of my model is to predict whether an Ethereum wallet interacting with a lending protocol will have its first liquidation event in the next 30 days.

What can it do?

  • Assess the risk of a wallet before approving a loan, adjust its interest rate over time, or set loan limits and other terms.
  • Monitor the health of a credit portfolio to anticipate deterioration in borrowers' risk levels.

Why?

Because ultimately, better risk assessment means better loan terms for end users and access for more people.

Within this vision, there are different ways to think about the algorithm and the target: why 30 days, whether to use survival modelling, and so on. But for this first iteration, I wanted to build the simplest thing that could possibly work and make it as generic as possible.

The Data Set

To come up with very generic features, I first ignored the lending aspect and focused on representing a wallet profile as independently as possible. I brute-forced hundreds of thousands of features related to Ether, ERC20 and NFT transfers, then made a front-end to explore the feature library as of the August 31st, 2022 snapshot: https://helga.datascience.art

Example of a query using the HELGA wallet searcher: wallets that used DeFi tokens in 2019 and are currently worth more than US$1 million. The tool shows some general statistics about the selected cohort, in addition to the different features available for Machine Learning modelling.

Once I had the data pipeline ready to build all these features at any given point in time, I downloaded a CSV from Dune Analytics with Aave and Compound borrowers: lending_accounts_20221010.csv.

The next step was to repeat each borrower at different snapshots between their first and last Aave/Compound events. For those who got liquidated, I stopped at the first liquidation event. So the model is trained only on the wallet profile at a given point in time and ignores protocol-specific lending behaviour.
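Here is a minimal sketch of that snapshot expansion in plain Python. The function name build_snapshots, the 30-day spacing between snapshots, and the handling of boundary dates are all assumptions for illustration; the real pipeline may space and cut snapshots differently.

```python
from datetime import date, timedelta


def build_snapshots(first_event, last_event, first_liquidation=None,
                    step_days=30, horizon_days=30):
    """Return (snapshot_date, label) pairs for one borrower.

    label is 1 when the first liquidation happens within `horizon_days`
    after the snapshot, else 0. No snapshot is taken on or after the
    first liquidation. `step_days` is a hypothetical snapshot spacing.
    """
    rows = []
    snapshot = first_event
    while snapshot <= last_event:
        if first_liquidation is not None and snapshot >= first_liquidation:
            break  # stop at the first liquidation event
        liquidated_soon = (
            first_liquidation is not None
            and (first_liquidation - snapshot).days <= horizon_days
        )
        rows.append((snapshot, int(liquidated_soon)))
        snapshot += timedelta(days=step_days)
    return rows


# A borrower active from January to June 2022, liquidated on March 15th.
rows = build_snapshots(date(2022, 1, 1), date(2022, 6, 1), date(2022, 3, 15))
for snapshot, label in rows:
    print(snapshot, label)
```

Only the last snapshot before the liquidation gets a positive label here, because it is the only one within 30 days of the event.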

Again, this is a deliberate choice to make the model user-centric and applicable to any wallet with a token and NFT history. From my experience building enterprise credit models, previous lending and repayment features are the most predictive variables; using those would have made my model stronger, but only applicable to wallets that have already engaged in DeFi and have a credit history.

Finally, I split the dataset into the following chunks:

  • Training: Only Aave records before 2022-04, filtering out all addresses who interacted with both Aave and Compound.
  • Validation: Aave records 2022-05.
  • Test: Records after 2022-06.

I further decomposed the test set into different variations to understand how well the model generalizes to: (a) a new protocol it has never seen and (b) wallets already present in the training set vs new ones.
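The split logic can be sketched as below. The function names, the lowercase protocol labels, and especially the inclusive month boundaries are assumptions; the boundaries are exposed as parameters so they can be adjusted to match the actual cut-offs.

```python
def assign_split(protocol, month, uses_both_protocols,
                 train_end="2022-04", valid_month="2022-05",
                 test_start="2022-06"):
    """Return 'train', 'validation', 'test', or None for one record.

    `month` is a 'YYYY-MM' string, so plain string comparison orders
    months correctly. Boundary inclusivity is an assumption.
    """
    if month >= test_start:
        return "test"  # test keeps every protocol
    if protocol != "aave":
        return None  # older Compound records are not used in the split
    if month == valid_month:
        return "validation"
    if month <= train_end and not uses_both_protocols:
        return "train"
    return None  # Aave wallets that also used Compound are filtered out


def segment_test_record(protocol, address, train_addresses):
    """Tag a test record by protocol and by whether the wallet was seen in training."""
    wallet = "seen_wallet" if address in train_addresses else "new_wallet"
    return protocol, wallet


train_addresses = {"0xaaa"}  # hypothetical addresses
print(assign_split("aave", "2022-03", False))
print(segment_test_record("compound", "0xbbb", train_addresses))
```

Tagging each test record this way makes it easy to report metrics separately for each protocol and for seen vs new wallets.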

Training the Model

First, I trained a mini XGBoost model on a random sample of 5k examples from the training set to select the top 400 features.

Then, after some manual fine-tuning, I trained my first XGBoost model without anything fancy.

Please enjoy the simplicity of this XGBoost training notebook.

Results

Grouping model scores by bucket, the lowest bucket has under 6% liquidation risk versus over 20% for the highest one, so the model is doing a good job at sloping risk.

Performance drops from 0.66 to 0.61 AUC when focusing on new wallets only. This means the model managed to proxy some Aave credit history through ERC20 token features such as Aave interest-bearing tokens.

There is another performance drop when moving from Aave to Compound, but the model is still sloping risk.
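For readers less familiar with these two metrics, here is a small dependency-free sketch of how AUC and per-bucket liquidation rates can be computed from labels and scores. The function names and the toy numbers are made up for illustration and are not taken from the model.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of rows, so this
    is only meant for small illustrative samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bucket_rates(labels, scores, n_buckets=5):
    """Sort rows by score, cut into equal-size buckets, return the
    observed positive rate per bucket (lowest scores first)."""
    if len(labels) < n_buckets:
        raise ValueError("need at least one row per bucket")
    ranked = sorted(zip(scores, labels))
    rates = []
    for b in range(n_buckets):
        lo = b * len(ranked) // n_buckets
        hi = (b + 1) * len(ranked) // n_buckets
        chunk = ranked[lo:hi]
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return rates


# Toy data: 1 = liquidated within the window, 0 = not.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(auc(labels, scores))
print(bucket_rates(labels, scores, n_buckets=2))
```

A model that slopes risk well shows rates that increase from the first bucket to the last.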

Challenge

The hardest challenge in building this model is: how to avoid learning macro conditions?

For context, it is well known that liquidation risk in DeFi protocols is tightly correlated with price volatility. So how can I ensure that my model is not learning to predict risk based on token prices?

Even if the features don’t incorporate prices directly, the model will proxy them and create wrong relationships. For example, when people traded ApeCoin, that’s when prices went down, and liquidation risk went up. This will not generalize in the future because there is no reason why risk will go up every time people trade more ApeCoin, or is there? Anyway, I just picked a random example, just like a model can learn useless relationships.

I scored Compound records before 2022-05, and even though the model had never seen those wallets or that protocol, its scores do follow the monthly macro levels of liquidation.

The model is also using the wallet features to proxy macro trends.

This is a model weakness, and of course it doesn’t generalize into the future. But it’s not very severe. The difference between AUC scores on Compound past and “future” data is only one point, which is pretty stable (AUC = 0.59 on past data vs 0.58 on future data).

One avenue for future improvement is to switch from the fixed 30-day liquidation target to a random window for each wallet, which would introduce noise in the macro signals and force the model to focus on the wallet signals.
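A rough sketch of what such a randomized target could look like follows. Nothing here has been tested on the real dataset: the function name and the 7 to 90 day range are arbitrary assumptions.

```python
import random
from datetime import date, timedelta


def random_window_label(snapshot, first_liquidation, rng,
                        min_days=7, max_days=90):
    """Draw a random horizon for one snapshot and label against it.

    Returns (horizon_days, label). The default range of 7 to 90 days
    is an arbitrary assumption, not a value from the experiments.
    """
    horizon_days = rng.randint(min_days, max_days)
    window_end = snapshot + timedelta(days=horizon_days)
    liquidated = (
        first_liquidation is not None
        and snapshot < first_liquidation <= window_end
    )
    return horizon_days, int(liquidated)


rng = random.Random(42)  # seeded so the sketch is reproducible
horizon, label = random_window_label(date(2022, 3, 1), date(2022, 3, 5), rng)
print(horizon, label)
```

One possible design choice, not evaluated here, is to pass the sampled horizon to the model as an extra feature so that it can condition its prediction on the window length.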

Insights

Shapley Values Violin Chart for the top 20 features of the model

The chart looks funny to the layman, but it’s actually a compact and powerful way to explain what the model is doing. Here’s how to read it:

  • Every dot represents a wallet.
  • Feature codes are on the left, descriptions are provided below.
  • Color: red means the feature is more frequent in the wallet; blue is less frequent; grey is not present at all.
  • Left means the model predicts less liquidation risk; right means it predicts more.
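To make the horizontal axis more concrete: each dot's position is the Shapley value of that feature for that wallet, meaning how much the feature moved the prediction away from a baseline. The sketch below computes exact Shapley values by brute force for a toy two-feature function. It illustrates the concept only; it is not the explainer used to produce the chart, and the toy model and feature meanings are invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row `x` against a `baseline` row.

    Features outside a coalition take their baseline value. The cost
    grows exponentially with the number of features, so this only
    works for toy examples.
    """
    n = len(x)

    def value(coalition):
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(row)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(row):
    # Invented model: risk rises with feature 0 and falls with feature 1.
    return 0.10 + 0.08 * row[0] - 0.05 * row[1]


phi = shapley_values(toy_risk, x=[1, 1], baseline=[0, 0])
print(phi)  # positive pushes the dot right, negative pushes it left
```

The values always add up to the difference between the prediction for the wallet and the prediction for the baseline.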
Top 20 feature codes and descriptions

#0 Received any amount of tokens with small activity (s…coins?). These wallets tend to be more prone to liquidation. But while those who never received any coins in this category are clearly less risky, the relationship is more complex for those who received them. The model is using this variable in interaction with other ones to generate different risk scores.

#1 Sent any amount of Ether. The effect of this variable is typical in credit models, where you’ll find Credit Age as one of the main features. It’s a natural proxy for how long the wallet has been active: the longer, the better.

#2 Received the equivalent of a six-digit US$ amount of DeFi tokens. Big players are generally less risky, but a few dots on the right indicate that this can interact with other variables to trigger higher risk scores. (High leverage degens?)

#4 and #5 The model likes the wallets who staked Ether or used Curve DAO.

#8 Model also likes the wallets who trade high activity (mainstream) tokens. Consistent with the first feature, #0 and #8 are basically saying that high activity tokens are good and low activity tokens are bad.

#17 Stablecoin users are less risky. Calling out this one because of the clean shape in the violin: red on the left, blue on the right.

These interpretations are to be taken with a grain of salt as the challenge above is not resolved yet in this model; but they provide an interesting baseline to understand this first prototype and improve upon.

Conclusion

In summary, the model presented here was trained with Aave outcomes, using only data before May 2022. It generalizes well up to 4 months into the future; to a different protocol (Compound); and also to the wallets that were not part of its training set.

Also, the dataset and modelling notebooks are available here, so if you’d like to experiment further or look at different aspects of the data, you can skip processing the 1.7 billion Ethereum transactions and jump right into the fun part.

DMs open on Twitter if you like this work and would like to contribute.
