
FinanceBench (voyage-finance-2 embeddings)

Pre-computed embeddings for the FinanceBench corpus. Using them avoids roughly $5-15 of Voyage API cost and about 30 minutes of ingest time compared with re-embedding from the raw PDFs. The intended consumer is the RAG agent at Rishabhmannu/financebench-rag-agent (install with pip install financebench-rag-agent).

What's in the box (frozen)

| Field | Value |
|---|---|
| Source corpus | FinanceBench (SEC filings: 10-K, 10-Q, 8-K, earnings releases) |
| Parser | pypdf (canonical) |
| Documents | 84 distinct SEC filings |
| Chunks | 68,059 |
| Dense vectors | voyage/voyage-finance-2 (1024-dim, cosine) |
| Sparse vectors | BM25 tokens (Qdrant fastembed) |
| File format | Parquet (snappy) + manifest.json |
| License | CC BY-NC 4.0 (inherited from FinanceBench) |

Sector coverage

| Sector | Chunks |
|---|---|
| Consumer Staples | 15,714 |
| Consumer Discretionary | 10,563 |
| Information Technology | 10,221 |
| Industrials | 7,095 |
| Financials | 7,030 |
| Health Care | 6,465 |
| Utilities | 5,534 |
| Communication Services | 3,568 |
| Materials | 1,869 |
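
The per-sector counts above sum to the 68,059 chunks listed earlier, which makes them a convenient integrity check after download. The following is a minimal sketch, not part of the project: the helper name `sector_mismatches` and the constants are hypothetical, and it assumes only that sector labels arrive as an iterable of strings (for example `df["fb_gics_sector"].tolist()` once the parquet file is loaded).

```python
from collections import Counter

# Counts copied from the sector coverage table in this card.
EXPECTED_SECTOR_CHUNKS = {
    "Consumer Staples": 15_714,
    "Consumer Discretionary": 10_563,
    "Information Technology": 10_221,
    "Industrials": 7_095,
    "Financials": 7_030,
    "Health Care": 6_465,
    "Utilities": 5_534,
    "Communication Services": 3_568,
    "Materials": 1_869,
}
EXPECTED_TOTAL = 68_059


def sector_mismatches(sectors):
    """Return {sector: (expected, actual)} for every sector whose count differs."""
    actual = Counter(sectors)
    names = set(EXPECTED_SECTOR_CHUNKS) | set(actual)
    return {
        name: (EXPECTED_SECTOR_CHUNKS.get(name, 0), actual.get(name, 0))
        for name in names
        if EXPECTED_SECTOR_CHUNKS.get(name, 0) != actual.get(name, 0)
    }


# The published per-sector counts add up to the published chunk total.
assert sum(EXPECTED_SECTOR_CHUNKS.values()) == EXPECTED_TOTAL
```

An empty dictionary means the download matches the frozen snapshot; any entry points at a sector whose chunk count has drifted.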

Use it

Prerequisites: install the CLI and bring up the local stack — see the project's docs/setup.md. The CLI command below assumes the api container is already running.

CLI consumer (recommended)

financebench seed --from-hf cmpunkmannu/financebench-voyage-finance-2-embeddings

Direct parquet (any RAG stack — no project setup needed)

from huggingface_hub import hf_hub_download
import pandas as pd

path = hf_hub_download(
    repo_id="cmpunkmannu/financebench-voyage-finance-2-embeddings",
    filename="chunks.parquet",
    repo_type="dataset",
)
df = pd.read_parquet(path)
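
Once the frame is loaded, the dense vectors can be searched without any vector database. The sketch below is one possible brute-force approach under stated assumptions: `top_k_cosine` is a hypothetical helper, the demo data is synthetic, and a real query vector would have to be produced by the same voyage-finance-2 model for the scores to be meaningful. With the real data, the matrix could plausibly be built with `np.stack(df["dense_vector"].to_numpy())`.

```python
import numpy as np


def top_k_cosine(matrix, query, k=5):
    """Return (row_indices, scores) for the k rows most similar to query."""
    m = np.asarray(matrix, dtype=np.float32)
    q = np.asarray(query, dtype=np.float32)
    # Normalise rows and query so the dot product equals cosine similarity.
    m_unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)
    scores = m_unit @ q_unit
    k = min(k, len(scores))
    # Sort descending by score and keep the first k row positions.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Tiny synthetic demo (2-dim vectors instead of the real 1024-dim ones).
demo_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, sims = top_k_cosine(demo_matrix, [1.0, 0.1], k=2)
```

The returned indices are row positions, so `df.iloc[idx]` would recover the matching chunks and their metadata. A full scan over 68,059 rows of 1024 floats is small enough for this approach; an approximate index only becomes worthwhile at much larger scale.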

Schema (chunks.parquet)

| Column | Type | Description |
|---|---|---|
| chunk_id | string (UUID) | Stable point ID |
| content | string | Header-annotated chunk text |
| dense_vector | fixed_size_list[1024] | voyage-finance-2 embedding |
| sparse_indices | list | BM25 sparse vector indices |
| sparse_values | list | BM25 sparse vector values (parallel to indices) |
| doc_type | string | e.g. 10k, 10q, earnings |
| fb_company | string | FinanceBench canonical company |
| fb_doc_period | string | Fiscal period (year) |
| fb_gics_sector | string | GICS sector |
| financebench_doc_name | string | FB document ID |
| page_number | int32 | PDF page number |
| chunk_index | int32 | Position within document |
| source_file | string | PDF filename |
| company, company_name, fiscal_year, confidentiality, num_pages | mixed | Additional payload fields |
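
Because `sparse_indices` and `sparse_values` are parallel lists, position i in one corresponds to position i in the other. The sketch below shows how such a pair could be scored against a query-side sparse vector with a plain dot product. It is an illustration only: `sparse_dot` is a hypothetical helper, the values are made up, and the card does not specify how query-side sparse vectors are produced, so that step is assumed to happen elsewhere.

```python
def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product of two sparse vectors given as parallel index/value lists."""
    # Map each index of the first vector to its weight for O(1) lookups.
    weights_a = dict(zip(indices_a, values_a))
    # Only indices present in both vectors contribute to the score.
    return sum(weights_a.get(i, 0.0) * v for i, v in zip(indices_b, values_b))


# Illustrative values only; real rows hold between 3 and 289 entries.
doc_indices = [516115046, 783977789, 1263887312, 1632341525]
doc_values = [1.5, 1.5, 1.5, 1.5]
query_indices = [783977789, 999]
query_values = [1.0, 1.0]

score = sparse_dot(doc_indices, doc_values, query_indices, query_values)
```

Only the shared index 783977789 contributes here, so unmatched query terms simply add nothing to the score.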

manifest.json contains pipeline config, the exact dense distance metric, and the snapshot timestamp.

License and attribution

The chunk text is derived from FinanceBench (Islam et al., 2023), released under CC BY-NC 4.0. This dataset inherits that license — non-commercial use only.

Voyage AI's voyage-finance-2 is a commercial API. The embeddings are redistributed under Voyage AI's terms of service. Cite Voyage AI when using the embeddings.

@article{islam2023financebench,
  title={FinanceBench: A New Benchmark for Financial Question Answering},
  author={Islam, Pranab and Kannappan, Anand and Kiela, Douwe and others},
  journal={arXiv preprint arXiv:2311.11944},
  year={2023}
}

Snapshot provenance (frozen)

  • Snapshot generated: 2026-06-01T17:46:12.918581+00:00
  • Generated by: financebench-rag-agent v0.3.0
  • Source project: https://github.com/Rishabhmannu/financebench-rag-agent
  • For current eval results, methodology, and pipeline updates: see the project repo's docs/. This README is frozen at snapshot time; the project may have moved on.