FinanceBench (voyage-finance-2 embeddings)
Pre-computed embeddings for the FinanceBench corpus. Loading them instead of re-embedding from raw PDFs saves roughly $5-15 in Voyage API cost and about 30 minutes of ingest time. Intended consumer: the RAG agent at Rishabhmannu/financebench-rag-agent (install: pip install financebench-rag-agent).
What's in the box (frozen)
| Field | Value |
|---|---|
| Source corpus | FinanceBench (SEC filings: 10-K, 10-Q, 8-K, earnings releases) |
| Parser | pypdf (canonical) |
| Documents | 84 distinct SEC filings |
| Chunks | 68,059 |
| Dense vectors | voyage/voyage-finance-2 (1024-dim, cosine) |
| Sparse vectors | BM25 tokens (Qdrant fastembed) |
| File format | Parquet (snappy) + manifest.json |
| License | CC BY-NC 4.0 (inherited from FinanceBench) |
Sector coverage
| Sector | Chunks |
|---|---|
| Consumer Staples | 15,714 |
| Consumer Discretionary | 10,563 |
| Information Technology | 10,221 |
| Industrials | 7,095 |
| Financials | 7,030 |
| Health Care | 6,465 |
| Utilities | 5,534 |
| Communication Services | 3,568 |
| Materials | 1,869 |
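As a quick sanity check, the sector counts above add up to the 68,059 chunks listed earlier. The sketch below recomputes that total and each sector's share of the corpus; the names `SECTOR_CHUNKS` and `sector_shares` are illustrative and not part of the project.

```python
# Chunk counts per GICS sector, copied from the table above.
SECTOR_CHUNKS = {
    "Consumer Staples": 15_714,
    "Consumer Discretionary": 10_563,
    "Information Technology": 10_221,
    "Industrials": 7_095,
    "Financials": 7_030,
    "Health Care": 6_465,
    "Utilities": 5_534,
    "Communication Services": 3_568,
    "Materials": 1_869,
}


def sector_shares(counts):
    """Return each sector's share of all chunks as a percentage (one decimal)."""
    total = sum(counts.values())
    return {sector: round(100 * n / total, 1) for sector, n in counts.items()}


total_chunks = sum(SECTOR_CHUNKS.values())
shares = sector_shares(SECTOR_CHUNKS)

print(f"total chunks: {total_chunks}")
for sector, pct in shares.items():
    print(f"{sector:<24} {pct:5.1f}%")
```

The skew toward Consumer Staples is worth keeping in mind when sampling evaluation queries by sector.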
Use it
Prerequisites: install the CLI and bring up the local stack — see the project's docs/setup.md. The CLI command below assumes the api container is already running.
CLI consumer (recommended)
financebench seed --from-hf cmpunkmannu/financebench-voyage-finance-2-embeddings
Direct parquet (any RAG stack — no project setup needed)
from huggingface_hub import hf_hub_download
import pandas as pd
path = hf_hub_download(
repo_id="cmpunkmannu/financebench-voyage-finance-2-embeddings",
filename="chunks.parquet",
repo_type="dataset",
)
df = pd.read_parquet(path)
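Once the frame is loaded, the dense vectors can be searched without any vector database. The following is a minimal sketch of brute-force cosine ranking, matching the cosine metric listed for the dense vectors. The helper names `cosine_similarity` and `top_k` are hypothetical, the toy vectors are 3-dimensional stand-ins for the real 1024-dimensional ones, and producing a query embedding with voyage-finance-2 is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, rows, k=5):
    """Rank (chunk_id, dense_vector) pairs by cosine similarity to the query."""
    scored = [(cosine_similarity(query_vector, vec), chunk_id) for chunk_id, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Toy stand-ins for rows of chunks.parquet (assumed shapes, not real data).
toy_rows = [
    ("chunk-a", [1.0, 0.0, 0.0]),
    ("chunk-b", [0.0, 1.0, 0.0]),
    ("chunk-c", [0.7, 0.7, 0.0]),
]
results = top_k([1.0, 0.1, 0.0], toy_rows, k=2)
print(results)
```

With the real data, `rows` could be `zip(df["chunk_id"], df["dense_vector"])`. A pure-Python loop is only meant to show the logic; for all 68,059 chunks a vectorized library or a vector store would likely be a better fit.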
Schema (chunks.parquet)
| Column | Type | Description |
|---|---|---|
| chunk_id | string (UUID) | Stable point ID |
| content | string | Header-annotated chunk text |
| dense_vector | fixed_size_list[1024] | voyage-finance-2 embedding |
| sparse_indices | list | BM25 sparse vector indices |
| sparse_values | list | BM25 sparse vector values (parallel to indices) |
| doc_type | string | e.g. 10k, 10q, 8k |
| fb_company | string | FinanceBench canonical company |
| fb_doc_period | string | Fiscal period (year) |
| fb_gics_sector | string | GICS sector |
| financebench_doc_name | string | FB document ID |
| page_number | int32 | PDF page number |
| chunk_index | int32 | Position within document |
| source_file | string | PDF filename |
| company, company_name, fiscal_year, confidentiality, num_pages | mixed | Additional payload fields |
manifest.json contains pipeline config, the exact dense distance metric, and the snapshot timestamp.
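Because `sparse_indices` and `sparse_values` are parallel lists, each row's sparse vector has to be reassembled before scoring. The sketch below pairs them into a dictionary and computes a sparse dot product. The helper names are hypothetical, the numbers are toy values, and the code assumes indices are unique within a row. A query vector would need to come from the same BM25 tokenization that produced the stored indices; that step is not shown.

```python
def to_sparse_dict(indices, values):
    """Pair parallel index/value lists into an {index: weight} mapping."""
    if len(indices) != len(values):
        raise ValueError("sparse_indices and sparse_values must have equal length")
    return dict(zip(indices, values))


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[index] for index, weight in a.items() if index in b)


# Toy example: the two vectors share indices 20 and 30.
doc = to_sparse_dict([10, 20, 30], [1.5, 2.0, 0.5])
query = to_sparse_dict([20, 30, 40], [1.0, 2.0, 3.0])
score = sparse_dot(doc, query)
print(score)
```

For a DataFrame row, the call would look like `to_sparse_dict(row["sparse_indices"], row["sparse_values"])`.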
License and attribution
The chunk text is derived from FinanceBench (Islam et al., 2023), released under CC BY-NC 4.0. This dataset inherits that license — non-commercial use only.
Voyage AI's voyage-finance-2 is a commercial API. The embeddings are redistributed under Voyage AI's terms of service; cite Voyage AI when using them.
@article{islam2023financebench,
title={FinanceBench: A New Benchmark for Financial Question Answering},
author={Islam, Pranab and Kannappan, Anand and Kiela, Douwe and others},
journal={arXiv preprint arXiv:2311.11944},
year={2023}
}
Snapshot provenance (frozen)
- Snapshot generated: 2026-06-01T17:46:12.918581+00:00
- Generated by: financebench-rag-agent v0.3.0
- Source project: https://github.com/Rishabhmannu/financebench-rag-agent
- For current eval results, methodology, and pipeline updates: see the project repo's docs/. This README is frozen at snapshot time; the project may have moved on.