GitHub
When analyzing a GitHub repository, SOMEF uses the GitHub REST API
(GET /repos/{owner}/{repo}) to retrieve metadata. The table below shows how GitHub API
fields map to SOMEF categories:
| SOMEF category | GitHub API field | Notes |
|---|---|---|
| code_repository | html_url | |
| owner | owner.login | agent_type is extracted from owner.type (User or Organization) |
| date_created | created_at | |
| date_updated | updated_at | |
| license | license | Nested object with spdx_id, name, url |
| description | description | |
| name | name | |
| full_name | full_name | Format: {owner}/{repo} |
| issue_tracker | issues_url | The {/number} suffix is stripped |
| forks_url | forks_url | |
| stargazers_count | stargazers_count | |
| keywords | topics | |
| forks_count | forks_count | |
| homepage | homepage | |
| programming_languages | languages | Additional GET to /repos/{owner}/{repo}/languages. Returns a dictionary with byte counts per language |
| releases | /repos/{owner}/{repo}/releases | Paginated results, mapped via release_crosswalk_table |
| download_url | (constructed) | Built as https://github.com/{owner}/{repo}/releases |
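
The mapping above can be illustrated with a minimal sketch that takes an already-decoded GET /repos/{owner}/{repo} payload and produces a dictionary keyed by SOMEF category. The function name `map_repo_metadata`, the `DIRECT_FIELDS` table, and the shape of the output (for example the `value`/`agent_type` keys under `owner`) are assumptions made for illustration, not SOMEF's actual internals.

```python
# Hypothetical lookup table: SOMEF category -> GitHub API field (direct copies only).
DIRECT_FIELDS = {
    "code_repository": "html_url",
    "date_created": "created_at",
    "date_updated": "updated_at",
    "description": "description",
    "name": "name",
    "full_name": "full_name",
    "forks_url": "forks_url",
    "stargazers_count": "stargazers_count",
    "keywords": "topics",
    "forks_count": "forks_count",
    "homepage": "homepage",
}


def map_repo_metadata(repo: dict) -> dict:
    """Map a decoded /repos/{owner}/{repo} payload to SOMEF-style categories."""
    # Copy every field that maps one-to-one and is present in the payload.
    result = {cat: repo[field] for cat, field in DIRECT_FIELDS.items() if field in repo}

    # owner.login becomes the owner; owner.type (User or Organization) the agent type.
    owner = repo.get("owner") or {}
    if "login" in owner:
        result["owner"] = {"value": owner["login"], "agent_type": owner.get("type")}

    # license is a nested object; keep only spdx_id, name and url.
    lic = repo.get("license")
    if lic:
        result["license"] = {key: lic.get(key) for key in ("spdx_id", "name", "url")}

    # issues_url is a URI template; strip the trailing "{/number}" part.
    issues_url = repo.get("issues_url")
    if issues_url:
        result["issue_tracker"] = issues_url.replace("{/number}", "")

    # download_url is not an API field; it is built from {owner}/{repo}.
    if "full_name" in repo:
        result["download_url"] = f"https://github.com/{repo['full_name']}/releases"

    return result
```

The languages and releases rows are left out of this sketch because they require additional requests to their own endpoints.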
Archive download
SOMEF downloads the repository archive from https://github.com/{owner}/{repo}/archive/{ref}.zip.
GitHub's responses for archive URLs do not include a Content-Length header, so SOMEF uses a streaming check:
it reads the response in 1 MB chunks and aborts if the total exceeds the configured size limit
(see --download-limit).
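
A streaming size check of this kind might look like the sketch below. It works on any iterable of byte chunks so that it can be shown without a network call; the names `read_with_limit` and `DownloadTooLarge` are hypothetical and not taken from SOMEF.

```python
from typing import Iterable

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks, as described above


class DownloadTooLarge(Exception):
    """Raised when the streamed body exceeds the configured size limit."""


def read_with_limit(chunks: Iterable[bytes], limit_bytes: int) -> bytes:
    """Accumulate chunks, aborting as soon as the running total exceeds the limit."""
    total = 0
    parts = []
    for chunk in chunks:
        total += len(chunk)
        if total > limit_bytes:
            # Stop reading immediately instead of downloading the whole archive.
            raise DownloadTooLarge(f"download exceeded {limit_bytes} bytes")
        parts.append(chunk)
    return b"".join(parts)
```

With an HTTP client that supports streaming, the chunks would come from the response body; with the requests library, for instance, that would be `response.iter_content(chunk_size=CHUNK_SIZE)` on a request made with `stream=True`.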
If the ref name is ambiguous (a branch and a tag share the same name), GitHub returns HTTP 300. SOMEF handles this by trying the following fallback URLs in order:
1. https://github.com/{owner}/{repo}/archive/{ref}.zip (short form)
2. https://github.com/{owner}/{repo}/archive/refs/heads/{ref}.zip (explicit branch)
3. https://github.com/{owner}/{repo}/archive/refs/tags/{ref}.zip (explicit tag)
4. https://github.com/{owner}/{repo}/archive/main.zip (legacy rename fallback)
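
The fallback order can be sketched as a list of candidate URLs tried one after another. Both function names are hypothetical, and `get_status` stands in for whatever HTTP call returns a status code for a URL. Treating every non-200 response (not only HTTP 300) as a reason to try the next candidate is a simplifying assumption of this sketch.

```python
from typing import Callable, List, Optional


def candidate_archive_urls(owner: str, repo: str, ref: str) -> List[str]:
    """Build the archive URLs in the order they are tried."""
    base = f"https://github.com/{owner}/{repo}/archive"
    return [
        f"{base}/{ref}.zip",             # short form
        f"{base}/refs/heads/{ref}.zip",  # explicit branch
        f"{base}/refs/tags/{ref}.zip",   # explicit tag
        f"{base}/main.zip",              # legacy rename fallback
    ]


def first_downloadable(urls: List[str], get_status: Callable[[str], int]) -> Optional[str]:
    """Return the first URL that answers 200, or None if every candidate fails."""
    for url in urls:
        if get_status(url) == 200:
            return url
    return None
```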
Limitations
- Private repositories: SOMEF cannot access private repositories without a valid token.
Enrichment via CODEOWNERS
When --reconcile_authors (-ra) is enabled, SOMEF fetches additional user details
(name, company, email) from GET https://api.github.com/users/{username} for the
repository owner and for each CODEOWNERS entry.
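
As a rough illustration of this enrichment step, the sketch below collects the usernames to look up, builds the user endpoint URL for each, and keeps only the three fields named above from a decoded response. All three function names are hypothetical, and the assumption that CODEOWNERS entries have already been reduced to plain usernames is made only to keep the example short.

```python
from typing import Iterable, List


def usernames_to_enrich(owner: str, codeowners: Iterable[str]) -> List[str]:
    """Return the owner followed by each CODEOWNERS username, without duplicates."""
    seen = set()
    ordered = []
    for username in [owner, *codeowners]:
        if username not in seen:
            seen.add(username)
            ordered.append(username)
    return ordered


def user_api_url(username: str) -> str:
    """Build the user endpoint URL for one username."""
    return f"https://api.github.com/users/{username}"


def extract_user_details(user: dict) -> dict:
    """Keep only name, company and email; missing fields become None."""
    return {key: user.get(key) for key in ("name", "company", "email")}
```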